What You Need to Know About Rate Limits

What is rate limiting?

Most APIs are subject to a limit on how many calls can be made per second (or minute, or other short time period), in order to protect servers from being overloaded and maintain high quality of service to many clients. In the case of FullContact, in addition to your monthly quota of profile matches, your plan has an associated rate limit that is tracked and enforced over a 60-second window.

What does FullContact rate limiting look like?

FullContact APIs use HTTP headers to communicate to your application what its rate limit is and how quickly it’s being used. Here’s an example:

$ curl -I 'https://api.fullcontact.com/v2/person.json?email=<email>&apiKey=<your-api-key>'
HTTP/1.1 200 OK
Date: Mon, 30 Mar 2015 01:58:31 GMT
Content-Type: application/json; charset=UTF-8
X-Rate-Limit-Limit: 30
X-Rate-Limit-Remaining: 11
X-Rate-Limit-Reset: 44

There are three important concepts here:

  • X-Rate-Limit-Limit indicates how many calls your application may make per time window. This time window is currently 60 seconds for most of our APIs, but for the most reliable code you should not assume this is the case — use the X-Rate-Limit-Reset header instead.
  • X-Rate-Limit-Reset indicates when the current window ends, in seconds from the current time.
  • X-Rate-Limit-Remaining indicates how many calls you have remaining in this window.

So in this example, the application has 11 more calls remaining in the next 44 seconds; after that, the counter, X-Rate-Limit-Remaining, will reset back to 30. If the application makes more calls than allowed, the API returns an HTTP 403 status code:

$ curl -I 'https://api.fullcontact.com/v2/person.json?email=<email>&apiKey=<your-api-key>'
HTTP/1.1 403 Forbidden
Date: Mon, 30 Mar 2015 01:58:31 GMT
X-Rate-Limit-Limit: 30
X-Rate-Limit-Remaining: 0
X-Rate-Limit-Reset: 11

What are some good techniques for dealing with rate limits?

The most reliable way to ensure your app stays within its rate limit is to look at the headers as responses come in, and use this information to slow down your calls to the FullContact API if needed.

Take for example the set of headers above — we’re allowed 11 more calls over the next 44 seconds, so ideally we’d like to make one call every 4 seconds so we get as close as possible to our rate limit without going over. The Python code below shows a simple implementation of this:

from datetime import datetime, timedelta
import time
import requests

API_KEY = 'your-api-key'  # your FullContact API key

class FullContactAdaptiveClient(object):

    # Rough estimate of how long each HTTP request takes, in seconds;
    # err on the small side. A real implementation would measure this.
    REQUEST_LATENCY = 0.2

    def __init__(self):
        self.next_req_time = datetime.fromtimestamp(0)

    def call_fullcontact(self, email):
        self._wait_for_rate_limit()
        r = requests.get('https://api.fullcontact.com/v2/person.json',
                         params={'email': email, 'apiKey': API_KEY})
        self._update_rate_limit(r.headers)
        return r.json()

    def _wait_for_rate_limit(self):
        now = datetime.now()
        if self.next_req_time > now:
            t = self.next_req_time - now
            time.sleep(t.total_seconds())

    def _update_rate_limit(self, hdr):
        remaining = float(hdr['X-Rate-Limit-Remaining'])
        reset = float(hdr['X-Rate-Limit-Reset'])
        spacing = reset / (1.0 + remaining)
        delay = spacing - self.REQUEST_LATENCY
        self.next_req_time = datetime.now() + timedelta(seconds=delay)

We divide the reset number by the remaining number to get the spacing we want between calls. The additional 1 in the denominator has little effect when remaining is large, but causes the calculation of spacing to err a bit on the short side as we get towards the end of a time window. Once remaining=0 our spacing will be equal to reset.
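To see the effect of that extra 1, here is the spacing calculation worked through with the numbers from the example headers above:

```python
def spacing(reset, remaining):
    # Desired gap between calls, in seconds.
    return reset / (1.0 + remaining)

# Early in the window: the +1 barely matters.
print(spacing(44, 11))   # 44 / 12, about 3.67 s, slightly tighter than 44 / 11 = 4 s

# End of the window: with no calls remaining, wait out the whole reset.
print(spacing(11, 0))    # 11.0 s
```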

If HTTP calls were instantaneous, we could just sleep for spacing seconds between each call, but thanks to that pesky speed of light we need to subtract out how long the previous request took (REQUEST_LATENCY in the code above). In a real implementation this would be measured from actual requests, but even hardcoding it works fairly well if you err on the small side.

Notice that this approach requires no coordination between clients that share a rate limit. For example, a high-volume application may want to have more than one server making queries to FullContact. Since we recalculate the delay after each response, this approach tracks the correct rate even when multiple instances of the client are using the same API key. Hence, for distributed client applications, following these rate limit headers is the easiest way to ensure the application stays under the limits.

Handle 403s gracefully

Even the best rate-limiting code is going to occasionally get an HTTP 403 error, so make sure your application handles them gracefully. In a backend application, this probably means slowing down (the example code above should handle this since X-Rate-Limit-Remaining is 0 on 403 responses) and trying again. In a frontend application, this probably means showing a nice error message to the user.
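For a backend, the "slow down and try again" logic can be a small wrapper. This is a sketch, not an official client: the function name `call_with_retry` is made up here, and it accepts any callable returning a response-like object (with `status_code` and `headers` attributes, as the requests library provides):

```python
import time

def call_with_retry(make_request, max_retries=3):
    """Retry a request when the server answers 403, waiting out the
    rate-limit window before each retry."""
    for attempt in range(max_retries + 1):
        resp = make_request()
        if resp.status_code != 403:
            return resp
        if attempt < max_retries:
            # 403 responses still carry rate-limit headers; sleep until
            # the window resets, then try again.
            time.sleep(float(resp.headers.get('X-Rate-Limit-Reset', 1)))
    return resp  # still 403 after all retries; let the caller decide
```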

Try to design the usage pattern that makes the fewest calls

For example, let’s say your application wants to use FullContact to show extra information about registered users in your webapp. You could make the API call when rendering the user page, but it would probably be more rate-limit friendly to do it just once, when the user first registers, and store the result alongside your user database.

Consider caching

Along the same lines, even if your usage pattern demands that you make FullContact API calls at display time, if there’s significant overlap where the same person is displayed more than once, adding a cache between your application and FullContact could be a big win. Consider using something like Memcached and refreshing data from the FullContact API only when it is older than, say, 30 days.
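The caching idea can be sketched in a few lines. Everything here is illustrative — the class name, the injected `fetch` callable, and the in-process dict stand in for a real FullContact call and a shared store like Memcached:

```python
import time

class CachingClient:
    """Cache person lookups and refresh entries older than 30 days."""

    TTL = 30 * 24 * 3600  # 30 days, in seconds

    def __init__(self, fetch):
        self.fetch = fetch   # callable that performs the real API call
        self.cache = {}      # email -> (timestamp, result)

    def lookup(self, email):
        entry = self.cache.get(email)
        if entry is not None and time.time() - entry[0] < self.TTL:
            return entry[1]            # still fresh: no API call made
        result = self.fetch(email)     # cache miss or stale: hit the API
        self.cache[email] = (time.time(), result)
        return result
```

Repeated lookups for the same person within the TTL then cost zero API calls, leaving the rate limit for people you haven’t seen recently.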


Hopefully this gives you a good starting point on how your application can gracefully deal with rate limits. Remember that if you use an official FullContact client library (Java only for now), most of this is taken care of automatically. If you have any thoughts on rate limiting based on your experience integrating with the FullContact API, we’d love to hear from you!