A compact, high-performance rate limiting service and library supporting Token Bucket and Leaky Bucket algorithms with per-IP/per-user/per-key limiting, comprehensive observability, and production-ready features.
- 🪣 Dual Algorithms: Token Bucket (burstiness) and Leaky Bucket (pacing)
- 🔑 Flexible Key Derivation: IP, JWT subject, or custom headers
- 📊 Full Observability: Prometheus metrics, Grafana dashboards
- 🌙 Shadow Mode: Log-only testing for safe rollouts
- 🏃 Multiple Backends: In-memory or Redis with atomic Lua scripts
- 📝 Standard Headers: X-RateLimit-* and IETF RateLimit-* styles
- 🚫 Penalty Box: Temporary bans for repeat violators
- 🎯 Hot Key Detection: Identify and monitor high-traffic keys
- 🔧 Admin API: Policy management, key inspection, health checks
- 🐳 Docker Ready: Complete stack with Docker Compose
# Setup environment
make setup
make config-env
# Start observability stack
make up
# Option 1: Memory backend (fastest setup)
make api
# Option 2: Redis backend (production-like)
make redis-up
make api-redis
# Run load tests
make load-smoke
make load-spike
# View dashboards
make urls

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Demo API     │    │   Rate Limiter   │    │     Storage     │
│                 │    │                  │    │                 │
│ ┌─────────────┐ │    │ ┌──────────────┐ │    │ ┌─────────────┐ │
│ │   aiohttp   │ │───▶│ │  Middleware  │ │───▶│ │   Memory/   │ │
│ │ middleware  │ │    │ │              │ │    │ │    Redis    │ │
│ └─────────────┘ │    │ └──────────────┘ │    │ └─────────────┘ │
│                 │    │ ┌──────────────┐ │    │                 │
│ ┌─────────────┐ │    │ │  Algorithms  │ │    │                 │
│ │  Endpoints  │ │    │ │  • Token     │ │    │                 │
│ │ • /api/echo │ │    │ │  • Leaky     │ │    │                 │
│ │• /api/orders│ │    │ └──────────────┘ │    │                 │
│ └─────────────┘ │    └──────────────────┘    └─────────────────┘
└─────────────────┘

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Monitoring    │    │   Load Testing   │    │   Management    │
│                 │    │                  │    │                 │
│ ┌─────────────┐ │    │ ┌──────────────┐ │    │ ┌─────────────┐ │
│ │ Prometheus  │ │    │ │  k6 Scripts  │ │    │ │  Admin API  │ │
│ └─────────────┘ │    │ │  • Smoke     │ │    │ │ • Policies  │ │
│ ┌─────────────┐ │    │ │  • Spike     │ │    │ │ • Hot Keys  │ │
│ │   Grafana   │ │    │ │  • Fairness  │ │    │ │ • Reset     │ │
│ └─────────────┘ │    │ └──────────────┘ │    │ └─────────────┘ │
└─────────────────┘    └──────────────────┘    └─────────────────┘
| Feature | Token Bucket | Leaky Bucket |
|---|---|---|
| Burst Handling | ✅ Allows bursts up to capacity | ❌ Strict pacing |
| Smoothing | ❌ Variable rate | ✅ Consistent rate |
| Use Case | API rate limiting, bursty traffic | Traffic shaping, QoS |
| Client Experience | Fast when tokens available | Predictable delays |
| Implementation | Simpler | More complex |
Token Bucket:
tokens = min(capacity, tokens + (time_elapsed * rate))
allow = tokens >= cost

Leaky Bucket:
queue_depth = max(0, current_depth - (time_elapsed * leak_rate))
allow = queue_depth + cost <= capacity
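The two update rules above can be sketched in Python (a minimal illustration; the class and field names here are assumptions, not the service's actual API):

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class TokenBucket:
    capacity: float   # maximum tokens (burst)
    rate: float       # tokens refilled per second
    tokens: float     # current token count
    updated: float    # timestamp of last refill

    def allow(self, cost: float = 1.0, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # tokens = min(capacity, tokens + (time_elapsed * rate))
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

@dataclass
class LeakyBucket:
    capacity: float   # maximum queue depth
    leak_rate: float  # units drained per second
    depth: float      # current queue depth
    updated: float    # timestamp of last drain

    def allow(self, cost: float = 1.0, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # queue_depth = max(0, current_depth - (time_elapsed * leak_rate))
        self.depth = max(0.0, self.depth - (now - self.updated) * self.leak_rate)
        self.updated = now
        if self.depth + cost <= self.capacity:
            self.depth += cost
            return True
        return False
```

Passing an explicit `now` keeps the math deterministic for tests; the real service additionally has to make the read-modify-write atomic (a lock in memory, a Lua script in Redis).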
defaults:
  algo: token
  rate_per_sec: 10
  burst: 30
  headers_style: x-ratelimit

routes:
  - id: public-api
    match:
      path_prefix: "/api/public"
    key_derivation: ip
    algo: token
    rate_per_sec: 8
    burst: 16
  - id: user-orders
    match:
      regex: "^/api/orders(/.*)?$"
    key_derivation: jwt.sub
    algo: token
    rate_per_sec: 5
    burst: 10
    method_costs:
      POST: 2
      DELETE: 3
    penalty_box:
      enabled: true
      fail_threshold: 5
      ttl_sec: 60
  - id: heavy-processing
    match:
      path_prefix: "/api/heavy"
    key_derivation: "header:X-Api-Key"
    algo: leaky
    leak_per_sec: 2
    capacity: 10
    shadow: false

# Core Configuration
WITH_REDIS=false # Use Redis backend
DEFAULT_ALGO=token # Default algorithm
HEADERS_STYLE=x-ratelimit # Header style
KEY_DERIVATION=ip # Default key derivation
# Shadow Mode
WITH_SHADOW=false # Global shadow mode
# Performance
LOCAL_RPS=400 # Target RPS for load tests
ACTORS=5000 # Number of simulated users
KEY_CARDINALITY=1000 # Number of unique keys
# Ports
API_PORT=8080 # Demo API
RL_PORT=8085 # Rate limiter service
PROM_PORT=9090 # Prometheus
GRAFANA_PORT=3000 # Grafana
REDIS_PORT=6379 # Redis
# Clock
CLOCK_SKEW_MS=0 # Simulated clock skew

`key_derivation: ip`
- Uses client IP address
- Supports X-Forwarded-For, X-Real-IP headers
- Good for: Public APIs, DDoS protection
`key_derivation: jwt.sub`
- Extracts the `sub` claim from the JWT token
- No signature verification (for performance)
- Good for: User-specific limits
`key_derivation: "header:X-Api-Key"`
- Uses the value of the specified header
- Good for: API key-based limiting
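The three strategies above can be sketched as one dispatch function (illustrative only; the service's real function names and header precedence may differ — note the JWT payload is decoded without signature verification, matching the behavior described above):

```python
import base64
import json

def derive_key(strategy: str, route_id: str, headers: dict, peer_ip: str) -> str:
    """Illustrative key derivation for ip / jwt.sub / header:NAME strategies."""
    if strategy == "ip":
        # Prefer proxy headers, fall back to the socket peer address
        ip = headers.get("X-Forwarded-For", headers.get("X-Real-IP", peer_ip))
        return f"{route_id}:ip:{ip.split(',')[0].strip()}"
    if strategy == "jwt.sub":
        token = headers.get("Authorization", "").removeprefix("Bearer ").strip()
        payload_b64 = token.split(".")[1]
        payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64url padding
        sub = json.loads(base64.urlsafe_b64decode(payload_b64))["sub"]
        return f"{route_id}:user:{sub}"
    if strategy.startswith("header:"):
        header_name = strategy.split(":", 1)[1]
        return f"{route_id}:key:{headers.get(header_name, 'anonymous')}"
    raise ValueError(f"unknown strategy: {strategy}")
```

Every derived key is prefixed with the route id, so the same client gets independent buckets on different routes.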
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 75
X-RateLimit-Reset: 1640995800
X-RateLimit-Cost: 2
Retry-After: 30

RateLimit-Limit: 100;w=3600
RateLimit-Remaining: 75
RateLimit-Reset: 1825
Retry-After: 30

Shadow mode allows testing rate limiting policies without blocking traffic:
routes:
  - id: test-route
    shadow: true  # Log blocks but don't enforce

Features:
- ✅ All requests allowed
- 📝 Logs would-be blocks
- 📊 Metrics recorded
- 🏷️ Special headers added
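The enforcement step then reduces to one branch (a sketch; the logger name and function signature are assumptions):

```python
import logging

logger = logging.getLogger("ratelimiter")

def apply_decision(allowed: bool, shadow: bool, route_id: str, key: str) -> bool:
    """Return the effective allow/deny after applying shadow mode.

    In shadow mode a would-be block is logged (and counted in metrics)
    but the request is still let through.
    """
    if allowed:
        return True
    if shadow:
        logger.info("shadow would-block route=%s key=%s", route_id, key)
        return True  # never enforce in shadow mode
    return False
```

Because the limiter still runs the full algorithm, shadow metrics reflect exactly what enforcement would have done.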
# Request metrics
rl_requests_total{route,decision,algo,backend}
rl_tokens_remaining{route,algo}
rl_retry_after_seconds_bucket{route,algo}
# Shadow mode
rl_shadow_would_block_total{route,algo}
# Penalty box
rl_penalty_box_total{route,action}
# Storage performance
rl_storage_latency_seconds_bucket{backend,operation}
rl_active_keys{route}
# Health
rl_backend_health{backend}
Access Grafana at http://localhost:3000 (admin/admin) for:
- Request Rate & Decisions: Allow/block rates by route
- Response Times: P50/P95/P99 latencies
- Token Levels: Remaining capacity by route
- Hot Keys: Most active rate limiting keys
- Storage Health: Redis/memory performance
- Penalty Box Activity: Temporary bans
- Shadow Mode Analysis: Would-block events
make load-smoke
Basic functionality test with light load.

make load-spike LOCAL_RPS=800 ACTORS=10000
High-load test with configurable parameters.

make load-fairness KEY_CARDINALITY=5000
Multi-user fairness and hot-key detection.
- `GET /healthz` - Health check
- `GET/POST /api/echo` - Echo endpoint (IP-based limiting)
- `GET /api/heavy?delay=N` - Heavy processing (leaky bucket)
- `GET /api/burst` - Burst testing
- `GET/POST/DELETE /api/orders` - Orders (JWT-based limiting)
- `GET /api/data` - Data endpoint (API key-based)

- `GET /admin/policies` - View policies
- `POST /admin/reload` - Reload configuration
- `GET /admin/keys/hot?limit=10` - Hot keys
- `POST /admin/keys/reset` - Reset a specific key
- `POST /admin/shadow` - Toggle shadow mode

- `GET /metrics` - Prometheus metrics
- `GET /healthz` - Health check
- ✅ Zero dependencies
- ✅ Low latency
- ✅ Lock-based concurrency
- ❌ Single instance only
- ❌ No persistence
- ✅ Distributed
- ✅ Atomic Lua scripts
- ✅ Persistent
- ✅ Clock skew tolerance
- ⚠️ Network latency
- ⚠️ Additional dependency
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rate-limiter
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rate-limiter
  template:
    metadata:
      labels:
        app: rate-limiter
    spec:
      containers:
        - name: rate-limiter
          image: rate-limiter-service:latest
          ports:
            - containerPort: 8085
          env:
            - name: WITH_REDIS
              value: "true"
            - name: REDIS_URL
              value: "redis://redis-service:6379"

- Adjust `cleanup_interval` for GC frequency
- Monitor the `rl_active_keys` metric
- Consider TTL values vs memory usage

- Use Redis cluster for scale
- Monitor Lua script performance
- Tune `clock_skew_tolerance_ms`
- Consider Redis pipeline for bulk operations

- Profile hot paths with `rl_request_latency_seconds`
- Monitor `rl_storage_latency_seconds`
- Scale horizontally behind a load balancer
High Latency
make debug-metrics | grep latency
Check storage backend performance.

Missing Rate Limit Headers
make debug-policies
Verify route configuration matches request paths.

Redis Connection Issues
make debug-redis
Check Redis connectivity and health.

Unexpected Blocking
curl http://localhost:8085/admin/keys/hot
Look for hot keys or penalty box activations.
make status # Service health
make debug-metrics # Current metrics
make debug-policies # Active policies
make debug-hot-keys # Hot keys
make debug-redis # Redis status

make setup-dev # Install dev dependencies
make check # Run linting and type checks
make test # Run test suite
make test-cov # Test coverage report

make lint # Flake8 linting
make format # Black formatting
make type-check # MyPy type checking

Unit Tests (test/unit/)
- Algorithm math correctness
- Key derivation logic
- Header formatting
- Shadow mode behavior
Integration Tests (test/integration/)
- Storage backend atomicity
- Policy loading and matching
- Concurrency under load
E2E Tests (test/e2e/)
- Full request flow
- k6 load test validation
- Multi-service interaction
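A unit test in the `test/unit/` style might pin down the refill math directly (illustrative; the real module and function names will differ):

```python
def refill(tokens: float, capacity: float, rate: float, elapsed: float) -> float:
    # tokens = min(capacity, tokens + elapsed * rate)
    return min(capacity, tokens + elapsed * rate)

def test_refill_caps_at_capacity() -> None:
    # Long idle periods must not accumulate tokens beyond the burst limit
    assert refill(tokens=1.0, capacity=10.0, rate=5.0, elapsed=10.0) == 10.0

def test_refill_is_proportional_to_elapsed_time() -> None:
    assert refill(tokens=0.0, capacity=10.0, rate=2.0, elapsed=1.5) == 3.0
```

Keeping the arithmetic in a pure function like this is what makes the algorithm testable without clocks or storage.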
from typing import Optional, Tuple

class CustomStorage:
    async def get(self, key: str) -> Optional[BucketState]:
        # Implement key retrieval
        pass

    async def set(self, key: str, state: BucketState, ttl: int) -> None:
        # Implement key storage
        pass

def custom_key_deriver(strategy: str, route_id: str, request_data: dict) -> str:
    if strategy == "custom:device_id":
        device_id = request_data["headers"].get("X-Device-ID")
        return f"{route_id}:device:{device_id}"
    raise KeyDerivationError(f"Unknown strategy: {strategy}")

class CustomBucket:
    def check_and_consume(self, state: BucketState, cost: int) -> Tuple[RateLimitResult, BucketState]:
        # Implement custom rate limiting logic
        pass

- Throughput: 50,000+ RPS (single instance)
- Latency: P95 < 1ms, P99 < 5ms
- Memory: ~100 bytes per active key
- Throughput: 10,000+ RPS (single Redis)
- Latency: P95 < 10ms, P99 < 50ms
- Overhead: ~200 bytes per key + network
- Memory: Linear scaling per instance
- Redis: Horizontal scaling with cluster
- Hot Keys: Automatic detection and monitoring
MIT License - see LICENSE file for details.
- Fork the repository
- Create feature branch
- Add tests for new functionality
- Ensure all tests pass (`make test`)
- Run code quality checks (`make check`)
- Submit pull request

- 📖 Full documentation in `/docs`
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📧 Security: security@example.com