What is rate limiting and how do you implement it in a Python application?

I am experiencing an unexpectedly large amount of traffic.

Someone is brute-forcing my login endpoint.

I am under a DDoS attack.


If any of the quotes above sound familiar, you are in the right place. Rate limiting, also known as throttling, is the practice of restricting access to your application. Whether it is a client requirement or you learned the hard way by experiencing an attack such as DDoS, this topic is always worth looking into. This guide covers the main theoretical concepts and shows how to implement them in practice using the FastAPI and token-throttler packages. Let's dive in.

What is rate-limiting?

Since we are going to build an API to showcase the power of rate limiting, I'm going to stick with API terminology. As the introduction already stated, and since repetition is the mother of learning, rate limiting is the practice of restricting access to your application. What does that mean? It means that you, as an API creator, want to stop a specific user from sending bursts of requests to your endpoints. Why would you do that? Because you don't want to overload your servers or, in the worst case, suffer an outage, and end up with unnecessarily high costs as a result.

There are multiple techniques to tackle this problem, but in this guide we will stick with the token bucket technique, since the token-throttler package we are going to use is built on top of it.

Token bucket

Terms in the IT world often have intuitive, real-life names that help us understand the concepts behind them. This one is no exception, and I guess you already have a pretty good idea of what it might represent. Let's break it down further in as simple and intuitive a way as possible.

A token is nothing but a single unit that we award to a user. You can think of it as a drop of water or, for example, a unit of 1L (1 liter). A bucket is limited by its size: a 5L bucket holds only 5L of water, and if we pour in more, it overflows. You get the idea. In our use case, the equivalent is to create a bucket for each user with a maximum number of tokens and to allow access as long as the user stays within that limit. Every request takes one token out of the user's bucket, and the bucket is topped up to its maximum again once a set amount of time has passed. When a request arrives and the bucket is empty, we return a 429 Too Many Requests response.
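To make the idea concrete, here is a minimal sketch in plain Python. It assumes the common formulation in which the bucket starts full, each request takes tokens out, and the bucket is topped up again after a fixed interval. The class name SimpleTokenBucket and its clock parameter are made up for this illustration; this is not how token-throttler is implemented internally.

```python
import time


class SimpleTokenBucket:
    """Hypothetical token bucket, for illustration only."""

    def __init__(self, max_tokens, replenish_time, clock=time.monotonic):
        self.max_tokens = max_tokens          # capacity of the bucket
        self.replenish_time = replenish_time  # seconds between top-ups
        self._clock = clock                   # injectable so it can be tested
        self.tokens = max_tokens              # the bucket starts full
        self._last_refill = clock()

    def _replenish(self):
        # Top the bucket up once the replenish interval has elapsed.
        now = self._clock()
        if now - self._last_refill >= self.replenish_time:
            self.tokens = self.max_tokens
            self._last_refill = now

    def consume(self, cost=1):
        # Return True and take `cost` tokens if enough are left,
        # otherwise return False (the caller would answer with 429).
        self._replenish()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A request handler would call consume() once per request and return 429 Too Many Requests whenever it gets False back.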



Create a new directory with a name of your choice and, inside it, create two files: app.py and requirements.in. The contents of app.py are shown below.




from fastapi import FastAPI
from fastapi.responses import JSONResponse

from token_throttler import TokenBucket, TokenThrottler
from token_throttler.storage import RuntimeStorage

# One throttler for the whole app; every request costs one token.
throttler: TokenThrottler = TokenThrottler(cost=1, storage=RuntimeStorage())
# The hardcoded "user_id" may spend 5 tokens per 10-second window.
throttler.add_bucket(identifier="user_id", bucket=TokenBucket(replenish_time=10, max_tokens=5))

app = FastAPI()


# Route registration; the root path "/" is assumed from the handler's name.
@app.get("/")
async def root():
    # consume() is truthy while the bucket still has tokens to spend.
    if throttler.consume(identifier="user_id"):
        return JSONResponse(status_code=200, content={"message": "Hello User, you have access!"})
    return JSONResponse(status_code=429, content={"message": "You've reached the limit!"})

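requirements.in is a file that lists all the dependencies our application needs to run: judging by the imports and the run command used in this guide, that means fastapi, uvicorn, and token-throttler. Before installing them, it's best practice to set up a virtual environment, so let's do that.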

Create a virtual environment:

python3 -m venv venv

Activate the virtual environment:

source venv/bin/activate

Install dependencies:

pip install -r requirements.in

Before running the application, let's dig into the code a little. TokenThrottler is the class responsible for bucket management. It allows us to add and delete buckets. Two required parameters must be passed on initialization: cost and storage.

  • cost - How many tokens the user consumes per request.
  • storage - The type of storage to use. For this use case in-memory storage is enough; in a real-world application you would probably want to persist the data in something like Redis.

The add_bucket method takes two parameters, identifier and bucket.

  • identifier - A string value that represents an identity. In our case it is hardcoded to user_id, but in the real world it could be an actual user ID taken from, for example, an access token, or it could be an IP address.
  • bucket - A bucket of TokenBucket type, configured with:
    • replenish_time - The length of the window, in seconds, in which the user is allowed to consume max_tokens. If the user exceeds max_tokens before this time passes, a 429 Too Many Requests response is returned.
    • max_tokens - The number of tokens the user is allowed to consume within replenish_time.
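Two of these choices are easy to reason about in plain Python. The sketch below shows one possible way to pick an identifier (prefer a user ID, fall back to the IP address) and how cost and max_tokens together decide how many requests fit into one window. Both helper names, pick_identifier and requests_per_window, are hypothetical and not part of token-throttler; the second one assumes every request has the same cost.

```python
from typing import Optional


def pick_identifier(user_id: Optional[str], client_ip: str) -> str:
    """Prefer a stable user ID; fall back to the caller's IP address."""
    if user_id:
        return f"user:{user_id}"
    # Anonymous callers share a bucket per IP address.
    return f"ip:{client_ip}"


def requests_per_window(max_tokens: int, cost: int) -> int:
    """How many equal-cost requests fit into one replenish window."""
    if cost <= 0:
        raise ValueError("cost must be a positive integer")
    return max_tokens // cost
```

With the values used in this guide (max_tokens=5, cost=1), requests_per_window gives 5, which matches the five requests allowed per 10 seconds. The prefixes user: and ip: are only there to keep the two kinds of identifiers from colliding.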

Now that we've covered the basics of the code, let's run the app:

uvicorn app:app --reload

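You should now be able to access the application at the address uvicorn prints on startup (http://127.0.0.1:8000 by default). If you request it more than 5 times within 10 seconds, you should get a 429 response, and the uvicorn access log will show that status for the rejected requests.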
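If you'd rather not refresh the browser by hand, a small script can fire the requests for you. This is a minimal sketch using only the standard library; the function name collect_statuses and its opener parameter are my own, and the URL in the comment assumes uvicorn's default address.

```python
import urllib.error
import urllib.request


def collect_statuses(url, attempts, opener=urllib.request.urlopen):
    """Request `url` `attempts` times and return the HTTP status codes."""
    statuses = []
    for _ in range(attempts):
        try:
            with opener(url) as response:
                statuses.append(response.status)
        except urllib.error.HTTPError as error:
            # urllib raises for 4xx/5xx responses, including 429.
            statuses.append(error.code)
    return statuses


# Example call, with the app running locally (address is an assumption):
# print(collect_statuses("http://127.0.0.1:8000/", 6))
```

With the limits configured in this guide, the last of six quick requests should come back as 429.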


That's it, hope you liked it.

Thank you for reading.