Why rate limit

Rate limiting protects backend resources from abuse, ensures fair usage across clients, and provides a first line of defence at the application layer against DDoS attacks.

Algorithms

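  • Fixed window: count requests per fixed time window (e.g., 100 per minute). Simple, but allows bursts at window boundaries: 100 requests at :59 and 100 more at :01 of the next minute are all accepted, so 200 requests pass in about 2 seconds, twice the nominal limit.
  • Sliding window log: store the timestamp of each request and count those inside the trailing window. Accurate, but memory-intensive at scale, because memory grows with the number of requests per client per window.
  • Token bucket: a bucket refills at a constant rate up to a fixed capacity, and each request consumes a token; requests are rejected while the bucket is empty. Allows controlled bursting up to the capacity. A common choice in practice.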
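
The boundary burst described for the fixed window, and the way a sliding window log avoids it, can be reproduced with a minimal in-memory Python sketch. The class names, the allow method, and the explicit now argument are assumptions made for illustration; a real deployment would keep this state in a shared store rather than in process memory.

```python
from collections import deque


class FixedWindowLimiter:
    """Counts requests per aligned window of `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.current_window = None  # index of the window being counted
        self.count = 0

    def allow(self, now: float) -> bool:
        window_index = int(now // self.window)
        if window_index != self.current_window:
            # A new window started, so the counter resets to zero.
            self.current_window = window_index
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowLogLimiter:
    """Keeps one timestamp per accepted request from the last `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.log = deque()

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the trailing window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False


def count_allowed(limiter, timestamps):
    """Feed the timestamps to the limiter and count accepted requests."""
    return sum(limiter.allow(t) for t in timestamps)


# 100 requests at second 59, then 100 more at second 61 (:01 of the next minute).
burst = [59.0] * 100 + [61.0] * 100
fixed_allowed = count_allowed(FixedWindowLimiter(100, 60), burst)
sliding_allowed = count_allowed(SlidingWindowLogLimiter(100, 60), burst)
print(fixed_allowed, sliding_allowed)  # prints: 200 100
```

The fixed window accepts all 200 requests because its counter resets at second 60, while the log-based limiter still sees the first 100 timestamps inside its trailing 60 seconds and rejects the second batch.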

Redis token bucket implementation

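-- Lua script for an atomic token bucket in Redis.
-- Run it with EVAL or EVALSHA; Redis executes the whole script atomically.
local key = KEYS[1]
local rate = tonumber(ARGV[1])     -- refill rate, tokens per second
local capacity = tonumber(ARGV[2]) -- bucket size (max burst)
local now = tonumber(ARGV[3])      -- caller-supplied time, in seconds

local bucket = redis.call("HMGET", key, "tokens", "last_refill")
local tokens = tonumber(bucket[1]) or capacity -- a new bucket starts full
local last = tonumber(bucket[2]) or now

-- Refill for the elapsed time; the clamp stops clock skew between callers from removing tokens.
local elapsed = math.max(0, now - last)
tokens = math.min(capacity, tokens + elapsed * rate)

local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

-- HSET replaces the deprecated HMSET (the multi-field form needs Redis 4.0+).
redis.call("HSET", key, "tokens", tokens, "last_refill", math.max(now, last))
-- Expire idle buckets once they would have refilled completely, so keys do not accumulate.
redis.call("EXPIRE", key, math.ceil(capacity / rate) + 1)
return allowed -- 1 = allowed, 0 = denied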
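
To check the refill arithmetic without a Redis server, the same refill-then-consume steps can be mirrored in plain Python. This is a sketch only: the TokenBucket class and its allow(now) method are hypothetical names, the clamp on negative elapsed time is a defensive assumption, and the object is not shared across processes, which is the property the atomic Lua script provides.

```python
class TokenBucket:
    """In-process mirror of the script's refill-then-consume logic."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # a new bucket starts full
        self.last_refill = None   # set on the first call

    def allow(self, now: float) -> bool:
        if self.last_refill is None:
            self.last_refill = now
        # Refill for the elapsed time, never above capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = max(now, self.last_refill)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=2, capacity=5)
burst = [bucket.allow(0.0) for _ in range(6)]  # five tokens, six requests
after_refill = bucket.allow(0.5)               # 0.5 s * 2 tokens/s = 1 token
print(burst, after_refill)
```

With a capacity of 5, the first five requests at time 0 pass and the sixth is denied; half a second later one token has been refilled, so one more request passes.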

Response headers

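Return X-RateLimit-Limit (the quota), X-RateLimit-Remaining (requests left before the limit is hit), and X-RateLimit-Reset (when the quota resets) on every response so clients can back off gracefully. These names are a widely used convention rather than a standard, so document whether the reset value is a Unix timestamp or a number of seconds. Reject over-limit requests with 429 Too Many Requests and a Retry-After header.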
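
As a sketch of how a handler might populate these headers, the Python function below assumes the reset time is reported as a Unix timestamp (some APIs send the number of seconds remaining instead). The function rate_limit_headers and its parameters are hypothetical names for illustration; Retry-After is the standard HTTP header for telling a client how long to wait.

```python
import math


def rate_limit_headers(limit: int, remaining: int, reset_at: float,
                       now: float, allowed: bool) -> dict:
    """Build rate-limit response headers; header values are strings."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(math.ceil(reset_at)),  # assumed: Unix timestamp
    }
    if not allowed:
        # Seconds until the quota resets, rounded up so clients do not retry early.
        headers["Retry-After"] = str(max(0, math.ceil(reset_at - now)))
    return headers


# A rejected request: the handler would send these headers with a 429 status.
headers = rate_limit_headers(
    limit=100, remaining=0, reset_at=1_700_000_060, now=1_700_000_012.4, allowed=False
)
print(headers)
```

Rounding Retry-After up rather than down is a deliberate choice here: a client that waits slightly too long succeeds, while one that retries a fraction of a second early is rejected again.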