Rate limiting is a critical mechanism in software architecture that controls resource usage and safeguards system security. By capping the number of requests handled within a given time frame, it prevents overload and keeps the system stable under heavy traffic. This article analyzes and implements several common rate-limiting algorithms, discusses their pros and cons, and finally explores rate-limiting designs in real-world business code.
Traffic Counter Mode
Assuming performance tests indicate that our system can handle a maximum load of 10 TPS, the most straightforward approach is to keep a counter that allows ten requests per second, reset it at every one-second boundary, and reject any excess traffic.
Code 1: Fixed Window Counter Example.
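In essence, the example boils down to something like this minimal single-process Python sketch (the class name, defaults, and locking are illustrative choices):

```python
import time
import threading

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int, window: float = 1.0):
        self.limit = limit
        self.window = window
        self.count = 0
        self.window_start = time.monotonic()
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:
            now = time.monotonic()
            # Reset the counter when we cross into a new window.
            if now - self.window_start >= self.window:
                self.window_start = now
                self.count = 0
            if self.count < self.limit:
                self.count += 1
                return True
            return False

if __name__ == "__main__":
    limiter = FixedWindowLimiter(limit=10)
    print([limiter.allow() for _ in range(12)])  # 10 True, then 2 False
```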
However, this approach has a flaw. Suppose 20 requests arrive between 0.9s and 1.1s: ten in the last tenth of the first second and ten in the first tenth of the next. Neither one-second window exceeds its 10-request threshold, yet the system absorbs 20 requests within a span of 0.2 seconds, far beyond 10 TPS. The root cause is that the count resets at discrete time boundaries, and to fix this, the "sliding window" rate-limiting mode was designed to smooth the time-based calculation.
Sliding Window Mechanism
The Sliding Window Algorithm is widely used in various fields of computer science, such as congestion control in TCP protocols. This algorithm divides time into smaller segments, each maintaining an independent counter. When a request arrives, it is assigned to the current time segment, and the corresponding counter is checked. If the counter has not reached its limit, the request is allowed, and the counter is incremented. Otherwise, the request is rejected. Older segments fade out of the window as time progresses while new segments are added.
Code 2: Sliding Window Example.
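A sketch of the same idea in Python, dividing a one-second window into ten sub-segments (the segment count and all names are illustrative):

```python
import time
import threading
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `limit` requests over the trailing `window` seconds,
    tracked across `buckets` equal sub-segments."""

    def __init__(self, limit: int, window: float = 1.0, buckets: int = 10):
        self.limit = limit
        self.segment = window / buckets   # width of one sub-segment
        self.buckets = buckets
        self.counts = deque()             # [segment_index, count] pairs
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:
            idx = int(time.monotonic() / self.segment)
            # Drop segments that have slid out of the window.
            while self.counts and self.counts[0][0] <= idx - self.buckets:
                self.counts.popleft()
            if sum(c for _, c in self.counts) >= self.limit:
                return False
            # Increment the current segment, creating it if needed.
            if self.counts and self.counts[-1][0] == idx:
                self.counts[-1][1] += 1
            else:
                self.counts.append([idx, 1])
            return True
```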
Although the Sliding Window Algorithm provides more granular traffic control, it still cannot absorb sudden bursts gracefully: a full quota's worth of requests can land within a single segment. Nor does it reshape the traffic curve or smooth out peaks. The two rate-limiting algorithms below are designed for more stringent control.
Leaky Bucket Algorithm
The Leaky Bucket Algorithm simulates a bucket with a fixed capacity that drains at a constant rate, regardless of the incoming traffic. Requests fill the bucket as they arrive, and any request that would overflow a full bucket is rejected. This ensures that requests are handled evenly, smoothing out traffic and preventing spikes.
Code 3: Leaky Bucket Example.
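A compact Python sketch of a "metered" leaky bucket, which tracks the water level and drains it lazily at a constant rate instead of running a background worker (the lazy-drain style and all names are illustrative):

```python
import time
import threading

class LeakyBucketLimiter:
    """Bucket of `capacity` requests that drains at `rate` requests/second."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.water = 0.0
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:
            now = time.monotonic()
            # Leak water at the constant outflow rate since the last check.
            self.water = max(0.0, self.water - (now - self.last) * self.rate)
            self.last = now
            if self.water + 1 <= self.capacity:
                self.water += 1
                return True
            return False  # bucket full: the request overflows and is rejected
```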
The Leaky Bucket Algorithm is well-suited for scenarios requiring steady request processing, such as network traffic shaping and API rate limiting. By enforcing a constant outflow rate, it effectively prevents system overload due to sudden traffic surges. The challenge lies in choosing the bucket size and outflow rate: if the bucket is too large, the system may still experience traffic surges; if it is too small, legitimate requests could be unnecessarily rejected.
Token Bucket Algorithm
The Token Bucket Algorithm is similar to real-world queueing systems, where tokens are issued at a fixed rate. For example, if we want to limit the system to a maximum of Y requests over X seconds, tokens are added to the bucket every X/Y seconds. Incoming requests must first acquire a token from the bucket to be processed. If no tokens are available, the request is rejected.
Code 4: Token Bucket Example.
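A Python sketch along the same lines, refilling lazily on each call, which is equivalent to adding one token every X/Y seconds (names and defaults are illustrative):

```python
import time
import threading

class TokenBucketLimiter:
    """Tokens accrue at `rate` per second up to `capacity`; each request
    consumes one token, so short bursts up to `capacity` are allowed."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:
            now = time.monotonic()
            # Refill lazily instead of running a background timer.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False
```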
The Token Bucket Algorithm is ideal for handling burst traffic, such as network communication or API calls. By tuning the token generation rate and the bucket's capacity, it balances traffic and prevents overload, while still serving bursts immediately from accumulated tokens when system resources allow. The token generation rate is the critical parameter and must be chosen carefully.
Rate Limiting in Real-world Applications
Distributed Rate Limiting Algorithms
The four rate-limiting algorithms discussed above keep their state in local memory, which works for single-node systems. In a microservice architecture, that makes them effective at the gateway level, where a single node sees all the cluster's traffic, but not for fine-grained control across individual service nodes. To coordinate rate limiting among multiple nodes, the state must live in shared storage. Below is an example of using Redis to implement a distributed token bucket algorithm; this is the method I often use in production.
Code 5: Distributed Token Bucket Example using Redis
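A sketch of the approach with the redis-py client: a short Lua script performs the refill-and-consume step atomically on the Redis server, so all nodes share one bucket (the key scheme, TTL, and wrapper class are illustrative, not a hardened production implementation):

```python
import time
import redis  # pip install redis

# Atomic check-and-consume for a token bucket stored in a Redis hash.
# KEYS[1] = bucket key; ARGV = capacity, refill rate, current time.
TOKEN_BUCKET_LUA = """
local bucket = redis.call('HMGET', KEYS[1], 'tokens', 'last')
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local tokens = tonumber(bucket[1]) or capacity
local last = tonumber(bucket[2]) or now
tokens = math.min(capacity, tokens + (now - last) * rate)
local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end
redis.call('HSET', KEYS[1], 'tokens', tokens, 'last', now)
redis.call('EXPIRE', KEYS[1], 60)  -- let idle buckets expire
return allowed
"""

class DistributedTokenBucket:
    def __init__(self, client: redis.Redis, key: str,
                 capacity: int, rate: float):
        self.script = client.register_script(TOKEN_BUCKET_LUA)
        self.key, self.capacity, self.rate = key, capacity, rate

    def allow(self) -> bool:
        # Uses the caller's clock; nodes should keep clocks roughly in sync.
        return self.script(keys=[self.key],
                           args=[self.capacity, self.rate, time.time()]) == 1

# Usage (hypothetical key and limits):
# limiter = DistributedTokenBucket(redis.Redis(), "rl:orders",
#                                  capacity=100, rate=50)
# if limiter.allow(): handle_request()
```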
Prioritization
Server resources are valuable, and during resource-constrained periods we want to direct them to VIP clients. A "quota" system built on distributed rate limiting offers a flexible solution. Unlike a traditional token bucket, where every decision is binary (allow/deny), a quota system lets users spend resources like a currency: VIP users are assigned larger quotas, while regular users face tighter limits. When a user's quota is depleted, they must request more tokens to continue accessing services.
Code 6: Quota System Example.
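A rough Python sketch of such a quota ledger on Redis: each user's balance is seeded once per period, spent per request, and topped up on demand (the tier table, period, and key scheme are all hypothetical):

```python
import redis  # pip install redis

# Hypothetical per-tier quotas: VIP users get a larger allowance per period.
QUOTAS = {"vip": 10_000, "regular": 1_000}
PERIOD_SECONDS = 3600  # quota resets every hour (illustrative choice)

class QuotaLimiter:
    def __init__(self, client: redis.Redis):
        self.client = client

    def consume(self, user_id: str, tier: str, cost: int = 1) -> bool:
        """Spend `cost` units of the user's quota; deny when depleted."""
        key = f"quota:{user_id}"
        # Seed a fresh balance if no period is active (NX = only if absent).
        self.client.set(key, QUOTAS[tier], nx=True, ex=PERIOD_SECONDS)
        remaining = self.client.decrby(key, cost)
        if remaining < 0:
            # Roll back the overdraft; the caller must top up or wait.
            self.client.incrby(key, cost)
            return False
        return True

    def top_up(self, user_id: str, amount: int) -> int:
        """Grant extra quota, e.g., after a purchase or admin approval."""
        return self.client.incrby(f"quota:{user_id}", amount)
```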
Conclusion
This article has covered several common rate-limiting algorithms, including traffic counters, sliding windows, leaky buckets, and token buckets, and analyzed their application scenarios. Traffic counters are simple but struggle with bursts at window boundaries. The sliding window algorithm smooths the counting but is still limited. The leaky bucket algorithm ensures steady request handling, while the token bucket algorithm handles burst traffic flexibly.
In real-world applications, especially in microservices, distributed rate limiting becomes essential. Using middleware like Redis, a distributed token bucket algorithm can effectively manage traffic across multiple nodes. Additionally, quota-based rate limiting offers greater flexibility and improves the experience for VIP customers. Rate limiting protects systems from overload and enhances overall stability and performance.