Getting Ready: Rate Limiter
Getting Ready: Rate Limiter
Why This Problem Matters
Every public API on the internet is protected by a Rate Limiter. When you see "429 Too Many Requests" from the GitHub API (5,000 requests/hour), the Twitter/X API (300 requests/15 minutes), or the Stripe API (100 requests/second), that's a rate limiter rejecting your request because you've exceeded the allowed throughput.
Rate limiters are critical infrastructure:
- AWS API Gateway uses Token Bucket to throttle API calls — 10,000 requests/second with burst support per account
- Cloudflare protects 30+ million websites using rate limiting to absorb DDoS attacks that can exceed 50 million requests per second
- NGINX uses Leaky Bucket at the reverse proxy layer to smooth traffic spikes before they reach backend services
- Redis-based limiters power rate limiting at Stripe, Shopify, and Discord for per-user and per-endpoint throttling
What makes the Rate Limiter an outstanding concurrency-focused LLD problem is the combination of:
- Algorithm selection — Three fundamentally different algorithms (Token Bucket, Sliding Window, Leaky Bucket) each with distinct trade-offs in accuracy, memory usage, and burst tolerance. Choosing the right one is a genuine engineering decision, not a stylistic preference.
- Thread safety under high contention — A rate limiter sits in the critical path of every request. When 1,000 concurrent requests hit
allowRequest()simultaneously, the token counter or timestamp log must remain consistent. A single race condition means either dropping legitimate traffic or allowing a flood through. - Strategy Pattern as a natural fit — The three algorithms share the same interface (
allowRequest(clientId) → boolean) but differ entirely in internal mechanics. This is the textbook case for the Strategy Pattern — and our design uses it to swap algorithms at runtime without touching client code. - Lazy vs. eager state management — Token Bucket can refill tokens on-demand (lazy, calculated at call time) or via a background thread (eager). This is a real-world design decision that affects complexity, accuracy, and CPU usage.
This problem appears in LLD interviews at companies like Google, Amazon, Stripe, and Cloudflare because it tests whether candidates understand concurrency primitives, algorithmic trade-offs, and clean pattern application — skills that directly transfer to building production infrastructure.
What You'll Learn
By the end of this design problem, you will be able to:
- Understand three rate limiting algorithms in depth — Token Bucket (burst-tolerant), Sliding Window (boundary-accurate), and Leaky Bucket (smooth-output) — with step-by-step traced examples and complexity analysis for each
- Apply the Strategy Pattern to swap rate limiting algorithms at runtime without modifying client code — the same interface, different internal mechanics
- Design thread-safe data structures using locks, atomic operations, and careful state management to handle concurrent
allowRequest()calls - Make lazy vs. eager trade-off decisions — understand when to compute token refills on demand vs. using a background refill thread
- Create a UML class diagram that captures the Strategy hierarchy, per-client state management, and configuration
- Write complete thread-safe implementations in Python and Java that are production-ready, not toy examples
Prerequisites
Before starting this problem, make sure you're comfortable with:
- Strategy Pattern — encapsulating a family of algorithms behind a common interface (Fundamentals S12). The rate limiter is one of the cleanest real-world Strategy applications.
- Concurrency Fundamentals — threads, locks (mutex), and race conditions (Fundamentals S14). Every algorithm in this problem needs thread-safe access to shared counters or timestamp logs.
- Thread-Safe Blocking Queue (DP33) — if you completed the Blocking Queue problem, you've already seen mutex + condition variable coordination. The Leaky Bucket uses a similar FIFO queue structure.
- OOP Relationships — composition (limiter owns its state), association (client ↔ limiter mapping) (Fundamentals S4)
- Time Complexity Analysis — we'll analyze O(1) vs. O(n) operations and explain why Token Bucket is faster than Sliding Window Log
Problem Scope
We will design: The core rate limiting engine — the algorithms, the class hierarchy, per-client state tracking, thread-safe implementations, and the Strategy interface that makes algorithms interchangeable.
We will NOT design: Distributed rate limiting (Redis clusters, consistent hashing across nodes), HTTP middleware integration, response headers (X-RateLimit-Remaining), persistent storage of rate state, or dashboard/monitoring UI. This is a Low Level Design (object-oriented, single-process) exercise, not a System Design (distributed infrastructure) problem.
What distinguishes this from DP33 (Blocking Queue)? The Blocking Queue is a general-purpose concurrent data structure — producer/consumer with blocking semantics. The Rate Limiter is an application-level component that uses concurrency primitives to solve a specific problem: controlling request throughput. Where the Blocking Queue focuses on wait/notify coordination, the Rate Limiter focuses on algorithm selection, time-based state management, and per-client isolation.