
Rate Limiting Patterns

Strategies and headers for API rate limiting.

Standard Headers

Response Headers

X-RateLimit-Limit: 1000          # Max requests per window
X-RateLimit-Remaining: 847       # Requests remaining
X-RateLimit-Reset: 1698415200    # Unix timestamp when limit resets
Retry-After: 60                  # Seconds to wait (on 429)

Rate Limit Response (429)

{
  "error": {
    "code": "RATE_LIMIT_EXCEEDED",
    "message": "Too many requests",
    "retry_after": 60,
    "limit": 1000,
    "remaining": 0,
    "reset_at": "2024-10-27T12:00:00Z"
  }
}
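A server can assemble this response shape from the limit state it already tracks. The sketch below is framework-neutral; the helper name and the `{status, headers, body}` return shape are assumptions, to be adapted to whatever HTTP library is in use:

```javascript
// Build a 429 payload matching the headers and body shown above.
// resetEpochSeconds is the Unix timestamp when the window resets.
function buildRateLimitResponse(limit, resetEpochSeconds, nowMs = Date.now()) {
  const retryAfter = Math.max(0, Math.ceil(resetEpochSeconds - nowMs / 1000));
  return {
    status: 429,
    headers: {
      'X-RateLimit-Limit': String(limit),
      'X-RateLimit-Remaining': '0',
      'X-RateLimit-Reset': String(resetEpochSeconds),
      'Retry-After': String(retryAfter),
    },
    body: {
      error: {
        code: 'RATE_LIMIT_EXCEEDED',
        message: 'Too many requests',
        retry_after: retryAfter,
        limit,
        remaining: 0,
        reset_at: new Date(resetEpochSeconds * 1000).toISOString(),
      },
    },
  };
}
```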

Rate Limiting Strategies

Fixed Window

Simple: count requests in fixed time periods.

Window: 1 minute (00:00-00:59, 01:00-01:59, ...)
Limit: 100 requests

Pros: Simple to implement
Cons: Bursts at window edges: up to 2× the limit can land in seconds (e.g. 100 requests at 00:59 and 100 more at 01:00)
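A fixed-window limiter reduces to a per-client counter that resets each window. A minimal in-memory sketch (the class name and `Map` storage are illustrative; production systems usually keep counters in a shared store such as Redis):

```javascript
// Fixed-window counter keyed by client id.
class FixedWindowLimiter {
  constructor(limit, windowSeconds) {
    this.limit = limit;
    this.windowMs = windowSeconds * 1000;
    this.counts = new Map(); // clientId -> { windowStart, count }
  }

  allow(clientId, nowMs = Date.now()) {
    // Align the timestamp to the start of its fixed window.
    const windowStart = Math.floor(nowMs / this.windowMs) * this.windowMs;
    const entry = this.counts.get(clientId);
    if (!entry || entry.windowStart !== windowStart) {
      // New window: reset the count.
      this.counts.set(clientId, { windowStart, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false; // over limit -> 429
    entry.count += 1;
    return true;
  }
}
```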

Sliding Window

Smoother: weight the previous window's count by how much of it still overlaps the sliding window.

Current window: 50% through
Previous window: 60 requests
Current window: 40 requests

Weighted count = (60 × 0.5) + 40 = 70
Remaining = 100 - 70 = 30

Pros: Smoother limits
Cons: More complex, needs previous window data
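The weighted count from the example above is a one-line formula; the function name is illustrative:

```javascript
// Sliding-window approximation: the previous window's count is weighted
// by the fraction of it still covered by the sliding window.
function slidingWindowCount(prevCount, currCount, fractionElapsed) {
  return prevCount * (1 - fractionElapsed) + currCount;
}
```

With the numbers from the example (previous 60, current 40, 50% through), this yields the weighted count of 70.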

Token Bucket

Allows bursts with steady refill.

Bucket capacity: 100 tokens
Refill rate: 10 tokens/second

- Start with 100 tokens
- Each request costs 1 token
- Tokens refill at 10/sec
- Burst allowed up to 100, then steady 10/sec

Pros: Allows bursts, intuitive
Cons: More state to track
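The token-bucket rules above can be sketched with the state being just a token count and a last-refill timestamp (class and method names are illustrative):

```javascript
// Token bucket: starts full, refills continuously, each request costs 1 token.
class TokenBucket {
  constructor(capacity, refillPerSecond, nowMs = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full, so bursts up to capacity are allowed
    this.lastRefillMs = nowMs;
  }

  allow(nowMs = Date.now()) {
    // Refill based on elapsed time, capped at capacity.
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSecond
    );
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false; // out of tokens -> 429
  }
}
```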

Leaky Bucket

Fixed output rate, queue excess.

Processing rate: 10 requests/second
Queue size: 50

- Requests queue up
- Processed at constant rate
- Queue overflow = 429

Pros: Smooth output, protects backend
Cons: Adds latency
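Leaky-bucket admission can be approximated without a real queue by tracking how many requests are "in the bucket" and draining at the fixed rate; the class name and drain bookkeeping are illustrative:

```javascript
// Leaky bucket: queued requests drain at a constant rate;
// a full queue means rejection (maps to a 429).
class LeakyBucket {
  constructor(ratePerSecond, queueSize, nowMs = 0) {
    this.intervalMs = 1000 / ratePerSecond; // time to drain one request
    this.queueSize = queueSize;
    this.queued = 0;
    this.lastDrainMs = nowMs;
  }

  offer(nowMs) {
    // Drain whole "slots" that have elapsed since the last drain.
    const drained = Math.floor((nowMs - this.lastDrainMs) / this.intervalMs);
    if (drained > 0) {
      this.queued = Math.max(0, this.queued - drained);
      this.lastDrainMs += drained * this.intervalMs;
    }
    if (this.queued >= this.queueSize) return false; // overflow -> 429
    this.queued += 1;
    return true;
  }
}
```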

Rate Limit Tiers

By User/Plan

# Free tier
X-RateLimit-Limit: 100
X-RateLimit-Window: 3600   # per hour

# Pro tier
X-RateLimit-Limit: 10000
X-RateLimit-Window: 3600

By Endpoint

# Search (expensive)
X-RateLimit-Limit: 10
X-RateLimit-Window: 60

# Read (cheap)
X-RateLimit-Limit: 1000
X-RateLimit-Window: 60

By Operation Type

# Writes
POST/PUT/DELETE: 100/minute

# Reads
GET: 1000/minute
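Operation-type limits reduce to a lookup on the HTTP method, using the per-minute numbers above (the function name is illustrative):

```javascript
// Limit per minute by operation type: writes get a tighter budget.
function limitPerMinute(method) {
  const isWrite = ['POST', 'PUT', 'DELETE'].includes(method);
  return isWrite ? 100 : 1000;
}
```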

Implementation Headers

GitHub Style

X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4999
X-RateLimit-Reset: 1372700873
X-RateLimit-Used: 1
X-RateLimit-Resource: core

IETF Draft (RateLimit Header Fields for HTTP)

RateLimit-Limit: 100
RateLimit-Remaining: 50
RateLimit-Reset: 60

Client Handling

Retry Logic

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithRetry(url, options, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    const response = await fetch(url, options);

    if (response.status === 429) {
      // Header values are strings; parse, and fall back to 60s if absent.
      const retryAfter = Number(response.headers.get('Retry-After')) || 60;
      await sleep(retryAfter * 1000);
      continue;
    }

    return response;
  }
  throw new Error('Rate limit exceeded after retries');
}

Proactive Backoff

function checkRateLimit(response) {
  const remaining = Number(response.headers.get('X-RateLimit-Remaining'));
  const reset = Number(response.headers.get('X-RateLimit-Reset')); // epoch seconds

  if (remaining > 0 && remaining < 10) {
    // Spread the remaining requests evenly until the window resets.
    const waitMs = (reset * 1000 - Date.now()) / remaining;
    // Slow down: delay each subsequent request by waitMs
  }
}

Best Practices

Server Side

  1. Include rate limit headers in all responses
  2. Return 429 with clear error message
  3. Always include Retry-After
  4. Consider different limits per endpoint
  5. Log rate limit hits for monitoring

Client Side

  1. Respect Retry-After header
  2. Implement exponential backoff
  3. Monitor remaining quota
  4. Cache responses to reduce requests
  5. Batch operations when possible
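Exponential backoff (practice 2) is commonly combined with "full jitter" so that many clients hitting the limit at once do not retry in lockstep. A minimal sketch; the base delay and cap are assumed example values:

```javascript
// Exponential backoff with full jitter: the nominal delay doubles per
// attempt up to a cap, and a random fraction of it is actually waited.
function backoffDelayMs(attempt, baseMs = 500, capMs = 30000, rand = Math.random) {
  const nominal = Math.min(capMs, baseMs * 2 ** attempt);
  return rand() * nominal;
}
```

Passing `rand` explicitly keeps the function testable; real callers just use the `Math.random` default.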