# Optimization Patterns
Proven performance optimization strategies across the stack.
## Caching Strategies
### Cache Selection Decision Tree
```
What are you caching?
│
├─ Computation result (same input → same output)
│ ├─ In-process only
│ │ └─ In-memory cache (LRU map, memoization)
│ │ Eviction: LRU, LFU, TTL
│ │ Tools: lru-cache (Node), functools.lru_cache (Python), hashicorp/golang-lru (Go)
│ └─ Shared across processes/servers
│ └─ Redis / Memcached
│ TTL-based, key-value, sub-millisecond latency
│
├─ Database query result
│ ├─ Rarely changes, expensive to compute
│ │ └─ Materialized view (database-level cache)
│ │ Refresh: on schedule, on trigger, on demand
│ ├─ Changes with writes
│ │ └─ Cache-aside pattern (read: cache → DB, write: DB → invalidate cache)
│ └─ Read-heavy, tolerate slight staleness
│ └─ Read replica + cache with TTL
│
├─ API response
│ ├─ Same for all users
│ │ └─ CDN cache (Cloudflare, CloudFront)
│ │ Headers: Cache-Control, ETag, Last-Modified
│ ├─ Varies by user but cacheable
│ │ └─ Vary header + CDN or reverse proxy cache
│ └─ Personalized but expensive
│ └─ Server-side cache (Redis) with user-specific keys
│
├─ Static assets (JS, CSS, images)
│ └─ CDN + long cache + content-hashed filenames
│ Cache-Control: public, max-age=31536000, immutable
│ Bust cache by changing filename (app.a1b2c3.js)
│
└─ HTML pages
├─ Static content
│ └─ Pre-render at build time (SSG)
│ Cache-Control: public, max-age=3600
├─ Mostly static, some dynamic
│ └─ Stale-while-revalidate
│ Cache-Control: public, max-age=60, stale-while-revalidate=3600
└─ Fully dynamic
└─ Short TTL or no-cache + ETag for conditional requests
```
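The in-process branch of the tree above maps directly onto Python's `functools.lru_cache`, which the tree names. A minimal sketch with an illustrative pure function:

```python
from functools import lru_cache

@lru_cache(maxsize=256)  # LRU eviction once 256 distinct inputs are cached
def expensive(n: int) -> int:
    # stand-in for a pure computation: same input -> same output
    return sum(i * i for i in range(n))

expensive(10_000)              # computed (cache miss)
expensive(10_000)              # served from cache (hit)
info = expensive.cache_info()  # info.hits == 1, info.misses == 1
```

`maxsize` bounds memory; there is no TTL, so this fits only truly deterministic computations, not data that can go stale.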
### Cache Invalidation Patterns
```
Pattern: TTL (Time-To-Live)
├─ Simplest approach: cache expires after N seconds
├─ Pro: no coordination needed, self-healing
├─ Con: stale data for up to TTL duration
└─ Best for: session data, config, rate limits
Pattern: Cache-Aside (Lazy Loading)
├─ Read: check cache → miss → query DB → populate cache
├─ Write: update DB → delete cache key (not update)
├─ Pro: only caches what's actually requested
├─ Con: first request after invalidation is slow (cache miss)
└─ Best for: general purpose, most common pattern
Pattern: Write-Through
├─ Write: update cache AND DB in same operation
├─ Pro: cache always consistent with DB
├─ Con: write latency increases, caches unused data
└─ Best for: read-heavy data that must be fresh
Pattern: Write-Behind (Write-Back)
├─ Write: update cache, async flush to DB
├─ Pro: fast writes, batch DB operations
├─ Con: data loss risk if cache crashes before flush
└─ Best for: high-write-throughput, non-critical data (counters, analytics)
Pattern: Event-Driven Invalidation
├─ Publish event on data change, subscribers invalidate caches
├─ Pro: low latency invalidation, decoupled
├─ Con: eventual consistency, event delivery guarantees needed
└─ Best for: microservices, distributed systems
```
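The cache-aside pattern above can be sketched in a few lines; here plain dicts stand in for Redis and the database, and the key/value shapes are illustrative:

```python
cache: dict = {}                      # stands in for Redis
db = {"user:1": {"name": "Ada"}}      # stands in for the database

def read(key):
    # read path: check cache -> miss -> query DB -> populate cache
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value
    return value

def write(key, value):
    # write path: update DB first, then DELETE the cache key.
    # Deleting (not updating) avoids racing writers leaving stale values;
    # the next read repopulates the cache lazily.
    db[key] = value
    cache.pop(key, None)

read("user:1")                        # miss: populates cache
write("user:1", {"name": "Grace"})    # invalidates the key
```

The first read after a write pays the cache-miss cost, which is the trade-off the pattern list calls out.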
### Cache Anti-Patterns
| Anti-Pattern | Problem | Fix |
|-------------|---------|-----|
| Cache without eviction | Memory grows unbounded | Set max size + LRU/LFU eviction |
| Thundering herd | Cache expires, all requests hit DB simultaneously | Mutex/singleflight, stale-while-revalidate |
| Cache stampede | Hot key expires under high load | Background refresh before expiry |
| Inconsistent cache + DB | Update DB, crash before invalidating cache | Delete cache first (slightly stale reads), or use distributed transactions |
| Caching errors | Error response cached, served to all users | Only cache successful responses, or cache with very short TTL |
| Over-caching | Too many cache layers, hard to debug | Cache at one layer, usually closest to consumer |
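The mutex/singleflight fix for the thundering-herd row can be sketched with a per-key lock: only the first request for a missing key computes, the rest wait and then read the freshly populated cache. Names here are illustrative:

```python
import threading

_locks: dict = {}
_locks_guard = threading.Lock()
cache: dict = {}
db_calls = 0  # counts how often the "database" is actually hit

def load_from_db(key):
    # stand-in for the expensive query the herd would stampede
    global db_calls
    db_calls += 1
    return f"value-for-{key}"

def get(key):
    if key in cache:                  # fast path: no locking on a hit
        return cache[key]
    with _locks_guard:                # get or create the per-key lock
        lock = _locks.setdefault(key, threading.Lock())
    with lock:
        if key in cache:              # re-check: another thread may have filled it
            return cache[key]
        cache[key] = load_from_db(key)
        return cache[key]

threads = [threading.Thread(target=get, args=("hot",)) for _ in range(20)]
for t in threads: t.start()
for t in threads: t.join()
# db_calls == 1 despite 20 concurrent requests
```

Go's `golang.org/x/sync/singleflight` packages the same idea; in a distributed cache the lock becomes a short-TTL Redis key.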
## Database Optimization
### Indexing Strategy
```
Index selection decision tree:
│
├─ Query uses WHERE clause
│ ├─ Single column filter
│ │ └─ B-tree index on that column
│ ├─ Multiple column filter (AND)
│ │ └─ Composite index (most selective column first)
│ │ CREATE INDEX idx ON table(col_a, col_b, col_c)
│ │ Left-prefix rule: this index covers (a), (a,b), (a,b,c) queries
│ └─ Text search (LIKE '%term%')
│ └─ Full-text index (not B-tree, which only helps prefix LIKE 'term%')
│
├─ Query uses ORDER BY
│ └─ Index matching ORDER BY columns avoids filesort
│ Combine with WHERE columns: INDEX(where_col, order_col)
│
├─ Query uses JOIN
│ └─ Index on join columns of the inner table
│ ON a.id = b.a_id → index on b.a_id
│
├─ Query uses GROUP BY / DISTINCT
│ └─ Index matching GROUP BY columns
│
└─ High cardinality vs low cardinality
├─ High cardinality (many unique values): good index candidate
└─ Low cardinality (few unique values, e.g., boolean): partial index
CREATE INDEX idx ON orders(status) WHERE status = 'pending'
```
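The left-prefix rule from the tree above can be checked directly with SQLite's `EXPLAIN QUERY PLAN`; the table and column names are illustrative (`col_d` is a non-indexed column so the index is not covering):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (col_a INT, col_b INT, col_c INT, col_d TEXT)")
con.execute("CREATE INDEX idx ON t(col_a, col_b, col_c)")

def plan(where):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
    row = con.execute(f"EXPLAIN QUERY PLAN SELECT * FROM t WHERE {where}").fetchone()
    return row[3]

plan("col_a = 1")                # SEARCH ... USING INDEX idx  (left prefix)
plan("col_a = 1 AND col_b = 2")  # SEARCH ... USING INDEX idx  (left prefix)
plan("col_b = 2")                # SCAN: col_b alone is not a left prefix
```

The same experiment works in PostgreSQL/MySQL with `EXPLAIN`; only the plan output format differs.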
### Query Optimization Patterns
```sql
-- AVOID: SELECT * (fetches unnecessary columns)
SELECT * FROM users WHERE id = 1;
-- PREFER: select only needed columns
SELECT id, name, email FROM users WHERE id = 1;
-- AVOID: N+1 queries
-- Python/ORM: for user in users: user.orders (fires query per user)
-- PREFER: eager loading
-- SELECT * FROM users JOIN orders ON users.id = orders.user_id
-- AVOID: OFFSET for pagination on large tables
SELECT * FROM posts ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
-- PREFER: cursor-based pagination (keyset; tie-break on a unique column if created_at has duplicates)
SELECT * FROM posts WHERE created_at < '2025-01-01' ORDER BY created_at DESC LIMIT 20;
-- AVOID: functions on indexed columns in WHERE
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- PREFER: expression index or store normalized
CREATE INDEX idx_email_lower ON users(LOWER(email));
-- AVOID: implicit type conversion
SELECT * FROM users WHERE id = '123'; -- string vs integer
-- PREFER: correct types
SELECT * FROM users WHERE id = 123;
-- AVOID: correlated subqueries
SELECT *, (SELECT COUNT(*) FROM orders WHERE orders.user_id = users.id) FROM users;
-- PREFER: JOIN with GROUP BY
SELECT users.*, COUNT(orders.id) FROM users LEFT JOIN orders ON users.id = orders.user_id GROUP BY users.id;
```
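The N+1 fix above, as application code: instead of one query per user, fetch all orders in a single `IN (...)` query and group in memory. SQLite and the schema here are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, total REAL);
    INSERT INTO users  VALUES (1,'Ada'), (2,'Grace');
    INSERT INTO orders VALUES (1,1,9.99), (2,1,5.00), (3,2,7.50);
""")

users = con.execute("SELECT id, name FROM users").fetchall()

# N+1 (avoid): one orders query per user inside the loop.
# Eager loading (prefer): one query for every user's orders at once.
ids = [uid for uid, _ in users]
placeholders = ",".join("?" * len(ids))
rows = con.execute(
    f"SELECT user_id, id, total FROM orders "
    f"WHERE user_id IN ({placeholders}) ORDER BY id",
    ids,
).fetchall()

orders_by_user = {}
for user_id, order_id, total in rows:
    orders_by_user.setdefault(user_id, []).append((order_id, total))
```

ORMs expose the same idea as eager loading (`selectinload` in SQLAlchemy, `select_related`/`prefetch_related` in Django).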
### Connection Pooling
```
Connection pool sizing:
│
├─ Too small
│ └─ Requests queue waiting for connections
│ Symptom: latency spikes, "connection pool exhausted" errors
│
├─ Too large
│ └─ Database overwhelmed with connections
│ Symptom: high memory usage on DB, mutex contention, slower queries
│
└─ Right size
└─ Formula: connections = (core_count * 2) + effective_spindle_count
For SSD: connections ≈ core_count * 2-3
For cloud DB (e.g., RDS): check instance limits
Tools:
- PostgreSQL: PgBouncer (transaction pooling, session pooling)
- MySQL: ProxySQL
- Java: HikariCP (high-performance JVM pool)
- Python: SQLAlchemy pool (pool_size, max_overflow, pool_timeout)
- Node.js: built-in pool in pg, mysql2, knex
- Go: database/sql built-in pool (SetMaxOpenConns, SetMaxIdleConns)
```
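The sizing symptoms above can be made concrete with a toy fixed-size pool built on `queue.Queue`. This is an illustrative sketch, not a substitute for PgBouncer or a driver's built-in pool:

```python
import queue
import sqlite3

class Pool:
    """Minimal fixed-size connection pool (illustrative only)."""
    def __init__(self, size, timeout=1.0):
        self._q = queue.Queue()
        self._timeout = timeout
        for _ in range(size):
            self._q.put(sqlite3.connect(":memory:", check_same_thread=False))

    def acquire(self):
        # blocks until a connection is free; queue.Empty on timeout is
        # the "connection pool exhausted" symptom from the tree above
        return self._q.get(timeout=self._timeout)

    def release(self, conn):
        self._q.put(conn)

pool = Pool(size=2, timeout=0.1)
c1, c2 = pool.acquire(), pool.acquire()
exhausted = False
try:
    pool.acquire()            # pool exhausted: blocks, then times out
except queue.Empty:
    exhausted = True
pool.release(c1)
c3 = pool.acquire()           # succeeds once a connection is returned
```

Real pools add health checks, idle-connection reaping, and overflow limits (e.g., SQLAlchemy's `max_overflow`), but the queue-with-timeout core is the same.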
## Frontend Optimization
### Code Splitting
```
When to split:
│
├─ Route-based splitting (most impactful)
│ └─ Each page/route is a separate chunk
│ React: React.lazy(() => import('./Page'))
│ Next.js: automatic per-page
│ Vue: defineAsyncComponent(() => import('./Page.vue'))
│
└─ Component-based splitting
   └─ Heavy components loaded on demand
      Modal dialogs, rich text editors, charts, maps
```
### Critical Rendering Path
```
Optimization checklist:
│
├─ Critical CSS
│ ├─ Inline above-the-fold CSS in