Implementing Rate Limiting in APIs

Why rate limit

Rate limiting protects backend resources from abuse, ensures fair usage across clients, and provides a first line of defence against application-layer DDoS.

Algorithms

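Fixed window: count requests per fixed time window (e.g., 100 per minute). Simple, but allows bursts across window boundaries: 100 requests at :59 plus 100 more at :01 of the next minute is 200 requests in about 2 seconds.
Sliding window log: store the timestamp of each request and count those inside the trailing window. Accurate, but memory-intensive at scale because every request is recorded.
Token bucket: a bucket holds up to a fixed number of tokens and refills at a constant rate; each request consumes one token and is rejected when none are left. Allows controlled bursts up to the bucket capacity. A common default choice.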
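The boundary burst described above can be reproduced with a small in-memory sketch. The class names, the allow method, and the count_allowed helper are hypothetical and chosen only for illustration; a production limiter would keep its state in a shared store rather than in process memory.

```python
from collections import deque


class FixedWindowLimiter:
    """Counts requests per fixed window; the count resets at each boundary."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.current_window = None  # index of the window being counted
        self.count = 0

    def allow(self, now):
        window = int(now // self.window_seconds)
        if window != self.current_window:
            # A new window started: the previous count is forgotten entirely.
            self.current_window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowLogLimiter:
    """Keeps one timestamp per accepted request in the trailing window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.log = deque()

    def allow(self, now):
        # Drop timestamps that have fallen out of the trailing window.
        while self.log and self.log[0] <= now - self.window_seconds:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False


def count_allowed(limiter, timestamps):
    """Feed timestamps (in seconds) to a limiter and count accepted requests."""
    return sum(limiter.allow(t) for t in timestamps)


# 100 requests at second 59 and 100 more at second 61, just across the boundary.
burst = [59.0] * 100 + [61.0] * 100
fixed_allowed = count_allowed(FixedWindowLimiter(100, 60), burst)
sliding_allowed = count_allowed(SlidingWindowLogLimiter(100, 60), burst)
print(fixed_allowed, sliding_allowed)  # prints "200 100"
```

The fixed window accepts all 200 requests because its counter resets at second 60, while the sliding log still sees the first 100 timestamps inside its trailing 60 seconds and rejects the rest. The cost is one stored timestamp per accepted request.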

Redis token bucket implementation

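-- Lua script for an atomic token bucket in Redis.
-- KEYS[1] is the bucket key; rate must be greater than 0.
local key = KEYS[1]
local rate = tonumber(ARGV[1])     -- tokens per second
local capacity = tonumber(ARGV[2]) -- max burst
local now = tonumber(ARGV[3])      -- current time in seconds, supplied by the caller

-- A missing key means a new client, which starts with a full bucket.
local bucket = redis.call("HMGET", key, "tokens", "last_refill")
local tokens = tonumber(bucket[1]) or capacity
local last = tonumber(bucket[2]) or now

-- Refill for the elapsed time; treat a clock that moved backwards as zero elapsed.
local elapsed = math.max(0, now - last)
tokens = math.min(capacity, tokens + elapsed * rate)
if tokens >= 1 then
  tokens = tokens - 1
  -- HSET takes multiple field/value pairs since Redis 4.0; HMSET is deprecated.
  redis.call("HSET", key, "tokens", tokens, "last_refill", math.max(now, last))
  -- Expire idle buckets once they would have refilled completely.
  redis.call("EXPIRE", key, math.ceil(capacity / rate) + 1)
  return 1  -- allowed
end
return 0  -- denied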
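To reason about how rate and capacity interact before deploying the script, the same refill-and-consume arithmetic can be mirrored in plain Python. This is only a sketch: the TokenBucket class is hypothetical, it keeps state in memory instead of Redis, and it ignores storage, key expiry, and clock handling.

```python
class TokenBucket:
    """In-memory mirror of the token bucket arithmetic, for experimentation."""

    def __init__(self, rate, capacity):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)  # a new bucket starts full
        self.last_refill = None       # time of the last accepted request

    def allow(self, now):
        last = now if self.last_refill is None else self.last_refill
        # Refill for the elapsed time, capped at the bucket capacity.
        tokens = min(self.capacity, self.tokens + (now - last) * self.rate)
        if tokens >= 1:
            # State is only written when a request is accepted.
            self.tokens = tokens - 1
            self.last_refill = now
            return True
        return False


bucket = TokenBucket(rate=2.0, capacity=5)
initial_burst = sum(bucket.allow(0.0) for _ in range(10))   # bucket starts full
one_second_later = sum(bucket.allow(1.0) for _ in range(10))  # 2 tokens refilled
after_idle = sum(bucket.allow(10.0) for _ in range(10))     # refill capped at 5
print(initial_burst, one_second_later, after_idle)  # prints "5 2 5"
```

With 2 tokens per second and a capacity of 5, a client can burst 5 requests at once, then sustain 2 per second, and never accumulates more than 5 tokens no matter how long it stays idle.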

Response headers

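Return X-RateLimit-Limit (the quota), X-RateLimit-Remaining (requests left), and X-RateLimit-Reset (when the quota resets) on every response, so clients can see how close they are to the limit and back off gracefully.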
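A minimal sketch of how these headers might be assembled is shown here. The rate_limit_headers function and its parameters are hypothetical. It assumes X-RateLimit-Reset carries a Unix timestamp in seconds, although some APIs send the number of seconds remaining instead, and it adds the standard Retry-After header on rejected requests, which the text above does not require.

```python
import math


def rate_limit_headers(limit, remaining, reset_at, now, allowed):
    """Build rate limit headers from limiter state (times in Unix seconds)."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        # Never report a negative or fractional number of requests.
        "X-RateLimit-Remaining": str(max(0, math.floor(remaining))),
        # Assumption: the reset time is an absolute Unix timestamp.
        "X-RateLimit-Reset": str(math.ceil(reset_at)),
    }
    if not allowed:
        # Sent with a 429 Too Many Requests response: seconds to wait.
        headers["Retry-After"] = str(max(0, math.ceil(reset_at - now)))
    return headers


rejected = rate_limit_headers(
    limit=100, remaining=0, reset_at=1700000060, now=1700000000.5, allowed=False
)
accepted = rate_limit_headers(
    limit=100, remaining=41.7, reset_at=1700000060, now=1700000000.5, allowed=True
)
print(rejected["Retry-After"], accepted["X-RateLimit-Remaining"])  # prints "60 41"
```

Whichever meaning is chosen for the reset value, it should be documented, since clients cannot tell a timestamp from a delay by looking at the header name alone.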
