# Performance Diagnosis Quick Reference

Symptom classification, tool selection, and common patterns for rapid performance triage.

## Performance Issue Decision Tree

```
What symptom are you observing?
|
+- High CPU usage
|  +- Sustained 100% on one core
|  |  +- CPU-bound: hot loop, regex backtracking, tight computation
|  |     -> Profile with flamegraph (py-spy, pprof, clinic flame, samply)
|  +- Sustained 100% across all cores
|  |  +- Parallelism gone wrong: fork bomb, unbounded workers, spin locks
|  |     -> Check process count, thread count, lock contention
|  +- Periodic spikes
|     +- GC pressure, cron job, batch processing, cache stampede
|        -> Correlate with GC logs, scheduled tasks, traffic patterns
|
+- High memory usage
|  +- Growing over time (never decreasing)
|  |  +- Memory leak: unclosed resources, growing caches, event listener accumulation
|  |     -> Heap snapshots over time, compare retained objects
|  +- Sudden large allocation
|  |  +- Unbounded buffer, loading full dataset into memory, large file read
|  |     -> Check allocation sizes, switch to streaming
|  +- High but stable
|     +- May be normal: in-memory cache, preloaded data, memory-mapped files
|        -> Verify with expected working set size
|
+- Slow responses / high latency
|  +- All endpoints slow
|  |  +- Systemic: resource exhaustion, GC pauses, DNS issues, TLS overhead
|  |     -> Check resource utilization, GC metrics, network path
|  +- Specific endpoint slow
|  |  +- Query-specific: N+1 queries, missing index, unoptimized algorithm
|  |     -> EXPLAIN ANALYZE, query logging, endpoint profiling
|  +- Intermittently slow (p99 spikes)
|     +- Contention: lock wait, connection pool exhaustion, noisy neighbor
|        -> Check lock metrics, pool sizes, correlated traffic
|
+- Low throughput
|  +- CPU not saturated
|  |  +- I/O bound: disk wait, network latency, blocking calls in async code
|  |     -> Check iowait, network RTT, ensure async throughout
|  +- CPU saturated
|  |  +- Compute bound: need algorithmic improvement or horizontal scaling
|  |     -> Profile hot paths, optimize or scale out
|  +- Queues backing up
|     +- Consumer too slow: batch size, consumer count, downstream bottleneck
|        -> Increase consumers, optimize processing, check downstream
|
+- Large bundle size (frontend)
|  +- Main bundle too large
|  |  +- Missing code splitting, tree shaking not working, barrel file imports
|  |     -> Bundle analyzer, check import patterns, add dynamic imports
|  +- Duplicate dependencies
|  |  +- Multiple versions of same library bundled
|  |     -> Dedupe, check peer dependencies, use resolutions
|  +- Large assets
|     +- Unoptimized images, embedded fonts, inline data URIs
|        -> Image optimization, font subsetting, external assets
|
+- Slow database queries
   +- Single slow query
   |  +- Missing index, suboptimal join order, full table scan
   |     -> EXPLAIN ANALYZE, add index, rewrite query
   +- Many small queries (N+1)
   |  +- ORM lazy loading, loop with individual queries
   |     -> Eager loading, batch queries, dataloader pattern
   +- Lock contention
      +- Long transactions, row-level locks, table locks
         -> Shorten transactions, check isolation level, advisory locks
```

## Profiling Tool Selection Matrix

| Problem | Node.js | Python | Go | Rust | Browser |
|---------|---------|--------|----|------|---------|
| **CPU hotspots** | clinic flame, 0x | py-spy, scalene | pprof (CPU) | cargo-flamegraph, samply | DevTools Performance |
| **Memory leaks** | clinic doctor, heap snapshot | memray, tracemalloc | pprof (heap) | DHAT, heaptrack | DevTools Memory |
| **Memory allocation** | --heap-prof | memray, scalene | pprof (allocs) | DHAT | DevTools Allocation |
| **Async bottlenecks** | clinic bubbleprof | asyncio debug mode | pprof (goroutine) | tokio-console | DevTools Performance |
| **I/O profiling** | clinic doctor | strace, py-spy | pprof (block) | strace, perf | Network tab |
| **GC pressure** | --trace-gc | gc.set_debug | GODEBUG=gctrace=1 | N/A (no GC) | Performance timeline |
| **Lock contention** | N/A | py-spy (threading) | pprof (mutex) | parking_lot stats | N/A |
| **Startup time** | --cpu-prof | python -X importtime | GODEBUG=inittrace=1 | hyperfine | Lighthouse |

## CPU Profiling Quick Reference

### Flamegraph Basics

```
Reading a flamegraph:
- X-axis: proportion of total samples (wider = more time); in classic
  flamegraphs frames are sorted alphabetically, so left-to-right is NOT time order
- Y-axis: call stack depth (bottom = entry point, top = leaf)
- Color: random (not meaningful) in most tools
- Look for: wide plateaus at the top (functions with high self time)
- Ignore: narrow towers (deep call chains that account for few samples)

Key actions:
1. Find the widest bars at the TOP of the graph
2. Trace down to see what calls them
3. Focus optimization on the widest frames near the top of the graph
4. Re-profile after each change to verify improvement
```

### Tool Quick Start

| Tool | Language | Command | Output |
|------|----------|---------|--------|
| **py-spy** | Python | `py-spy record -o profile.svg -- python app.py` | SVG flamegraph |
| **py-spy top** | Python | `py-spy top --pid PID` | Live top-like view |
| **pprof** | Go | `go tool pprof -http :8080 'http://localhost:6060/debug/pprof/profile?seconds=30'` | Interactive web UI |
| **clinic flame** | Node.js | `clinic flame -- node app.js` | HTML flamegraph |
| **0x** | Node.js | `0x app.js` | HTML flamegraph |
| **cargo-flamegraph** | Rust | `cargo flamegraph --bin myapp` | SVG flamegraph |
| **samply** | Rust/C/C++ | `samply record ./target/release/myapp` | Firefox Profiler UI |
| **perf** | Linux (any) | `perf record -g ./myapp && perf script \| inferno-collapse-perf \| inferno-flamegraph > out.svg` | SVG flamegraph |

## Memory Profiling Quick Reference

| Tool | Language | Command | What It Shows |
|------|----------|---------|---------------|
| **memray** | Python | `memray run -o output.bin script.py && memray flamegraph output.bin` | Allocation flamegraph, leak detection |
| **tracemalloc** | Python | `tracemalloc.start(); snapshot = tracemalloc.take_snapshot()` | Top allocators, allocation traceback |
| **scalene** | Python | `scalene script.py` | CPU + memory + GPU in one profiler |
| **heaptrack** | C/C++/Rust | `heaptrack ./myapp && heaptrack_gui heaptrack.myapp.*.zst` | Allocation timeline, flamegraph, leak candidates |
| **DHAT** | Rust | `valgrind --tool=dhat ./target/debug/myapp` | Allocation sites, short-lived allocs |
| **pprof (heap)** | Go | `go tool pprof http://localhost:6060/debug/pprof/heap` | Live heap, allocation counts |
| **Chrome heap** | JS/Browser | DevTools > Memory > Take heap snapshot | Object retention, detached DOM |
| **clinic doctor** | Node.js | `clinic doctor -- node app.js` | Memory + CPU + event loop diagnosis |

## Bundle Analysis Quick Reference

| Tool | Bundler | Command | Output |
|------|---------|---------|--------|
| **webpack-bundle-analyzer** | Webpack | `npx webpack-bundle-analyzer stats.json` | Interactive treemap |
| **source-map-explorer** | Any | `npx source-map-explorer bundle.js` | Treemap from source maps |
| **rollup-plugin-visualizer** | Rollup/Vite | Add plugin, build | HTML treemap |
| **vite-bundle-visualizer** | Vite | `npx vite-bundle-visualizer` | Treemap visualization |
| **bundlephobia** | npm | `npx bundlephobia ` | Package size analysis |
| **size-limit** | Any | Configure in package.json, run in CI | Size budget enforcement |

### Bundle Size Reduction Checklist

```
[ ] Dynamic imports for routes and heavy components
[ ] Tree shaking working (check the "sideEffects" field in package.json)
[ ] No barrel file re-exports pulling in entire modules
[ ] Lodash: use lodash-es or individual imports (lodash/debounce)
[ ] Moment.js replaced with date-fns or dayjs
[ ] Images optimized (WebP/AVIF, responsive sizes, lazy loading)
[ ] Fonts subsetted to used characters
[ ] Gzip/Brotli compression enabled on server
[ ] Source maps excluded from production bundle size
[ ] CSS purged of unused styles (PurgeCSS, Tailwind JIT)
```

## Database Performance Quick Reference

### EXPLAIN ANALYZE Interpretation

```
Key metrics in EXPLAIN ANALYZE output:
|
+- Seq Scan          -> Full table scan (often bad for large tables)
|  +- Fix: Add index on filter columns
+- Index Scan        -> Using index (good)
+- Bitmap Index Scan -> Builds a bitmap of matching rows; BitmapAnd/BitmapOr
|                       can combine several indexes (good)
+- Nested Loop       -> OK when the outer side is small and the inner side is
|  |                    indexed; bad for large joins
|  +- Fix: Add index on join column, consider Hash Join
+- Hash Join         -> Good for large equi-joins
+- Sort              -> Check if index can provide order
|  +- Fix: Add index matching ORDER BY
+- actual time       -> First row..last row in milliseconds, per loop
|                       (multiply by "loops")
+- rows              -> Actual rows vs planned (estimate accuracy)
+- buffers           -> shared hit (cache) vs read (OS cache or disk);
                        requires EXPLAIN (ANALYZE, BUFFERS)
```

### N+1 Detection

```
Symptoms:
- Many identical queries with different WHERE values
- Response time scales linearly with result count
- Query log shows repeated patterns

Detection:
- Django: django-debug-toolbar, nplusone
- Rails: Bullet gem
- SQLAlchemy: create_engine(..., echo=True), look for repeated patterns
- General: log ALL queries and count queries per request
  (each N+1 query is individually fast, so a slow query log misses them)

Fix:
- Eager loading (JOIN, prefetch, include)
- Batch queries (WHERE id IN (...))
- DataLoader pattern (batch + cache per request)
```

## Load Testing Quick Reference

| Tool | Language | Strengths | Command |
|------|----------|-----------|---------|
| **k6** | Go (JS scripts) | Scripted scenarios, thresholds, cloud | `k6 run script.js` |
| **artillery** | Node.js | YAML config, plugins, Playwright | `artillery run config.yml` |
| **vegeta** | Go | CLI piping, constant rate | `echo "GET http://localhost" \| vegeta attack -duration=30s -rate=100 \| vegeta report` |
| **wrk** | C | Lightweight, Lua scripts | `wrk -t4 -c100 -d30s http://localhost` |
| **autocannon** | Node.js | Programmatic, pipelining | `autocannon -c 100 -d 30 http://localhost` |
| **locust** | Python | Python classes, distributed | `locust -f locustfile.py` |

### Load Test Types

```
Test Type Selection:
|
+- Smoke Test
|  +- Minimal load (1-2 VUs) to verify system works
|     Duration: 1-5 minutes
|
+- Load Test
|  +- Expected production load
|     Duration: 15-60 minutes
|     Goal: Verify SLOs are met under normal conditions
|
+- Stress Test
|  +- Beyond expected load, find breaking point
|     Ramp up until errors or unacceptable latency
|     Goal: Know the system's limits
|
+- Spike Test
|  +- Sudden burst of traffic
|     Instant jump to high load, then drop
|     Goal: Test auto-scaling, queue behavior
|
+- Soak Test (Endurance)
|  +- Moderate load for extended period (hours)
|     Goal: Find memory leaks, resource exhaustion, GC issues
|
+- Breakpoint Test
   +- Continuously ramp up until failure
      Goal: Find maximum capacity
```

## Benchmarking Quick Reference

| Tool | Domain | Command | Notes |
|------|--------|---------|-------|
| **hyperfine** | CLI commands | `hyperfine 'cmd1' 'cmd2'` | Warm-up, statistical analysis, export |
| **criterion** | Rust | `cargo bench` (with criterion dep) | Statistical, HTML reports, regression detection |
| **testing.B** | Go | `go test -bench=. -benchmem` | Built-in, memory allocs, sub-benchmarks |
| **pytest-benchmark** | Python | `pytest --benchmark-only` | Statistical, histograms, comparison |
| **vitest bench** | JS/TS | `vitest bench` | Built-in to Vitest, Tinybench engine |
| **Benchmark.js** | JS | Programmatic setup | Statistical analysis, ops/sec |

### Benchmarking Best Practices

```
[ ] Warm up before measuring (JIT compilation, cache population)
[ ] Run multiple iterations (minimum 10, prefer 100+)
[ ] Report statistical summary (mean, median, stddev, min, max)
[ ] Control for system noise (close other apps, pin CPU frequency)
[ ] Compare against baseline (previous version, alternative impl)
[ ] Measure what matters (end-to-end, not micro-operations in isolation)
[ ] Profile before benchmarking (know WHAT to benchmark)
[ ] Document environment (hardware, OS, runtime version, flags)
```

## Optimization Patterns Quick Reference

| Pattern | When to Use | Example |
|---------|-------------|---------|
| **Caching** | Repeated expensive computations or I/O | Redis, in-memory LRU, CDN, memoization |
| **Lazy loading** | Resources not needed immediately | Dynamic imports, virtual scrolling, pagination |
| **Connection pooling** | Frequent DB/HTTP connections | PgBouncer, HikariCP, urllib3 pool |
| **Batch operations** | Many small operations on same resource | Bulk INSERT, DataLoader, batch API calls |
| **Pagination** | Large result sets | Cursor-based (not offset) for large datasets |
| **Compression** | Network transfer of text data | Brotli preferred over gzip for static assets; gzip for dynamic responses |
| **Streaming** | Processing large files or datasets | Line-by-line, chunk processing, async iterators |
| **Precomputation** | Predictable expensive calculations | Materialized views, build-time generation |
| **Denormalization** | Read-heavy with expensive joins | Duplicate data for read performance |
| **Index optimization** | Slow queries on large tables | Composite indexes matching query patterns |

## Common Gotchas

| Gotcha | Why It Hurts | Fix |
|--------|-------------|-----|
| Premature optimization | Wastes time on non-bottlenecks, adds complexity | Profile first, optimize the measured hot path |
| Micro-benchmarks misleading | JIT, caching, branch prediction differ from real workload | Benchmark realistic workloads, validate with production metrics |
| Profiling overhead | Profiler itself skews results (observer effect) | Prefer sampling profilers (py-spy, pprof) over tracing profilers |
| Cache invalidation | Stale data served, inconsistent state across nodes | TTL + event-based invalidation, cache-aside pattern |
| Optimizing cold path | Spending effort on rarely-executed code | Focus on hot paths identified by profiling |
| Ignoring tail latency | p50 looks great while p99 can be 10x worse | Measure and optimize p95/p99, not just averages |
| N+1 queries hidden by ORM | Each page load fires hundreds of queries | Enable query logging, use eager loading |
| Compression on small payloads | Overhead can exceed savings for very small payloads (e.g. <150 bytes) | Only compress above a minimum size threshold |
| Connection pool too large | Each connection uses memory and adds lock contention on the database | Size pool to roughly 2-3x the database's CPU cores, not hundreds |
| Missing async in I/O path | One blocking call stalls the event loop, serializing all concurrent requests | Audit entire request path for blocking calls |
| Benchmarking debug builds | Debug builds can be 10-100x slower, giving misleading results | Always benchmark release/optimized builds |
| Over-indexing database | Write performance degrades, storage bloats | Index only columns used by frequent WHERE, JOIN, or ORDER BY clauses |
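The Flamegraph Basics notes in the CPU Profiling section say a bar's width is the share of samples in which that frame appears, and that plateaus at the top mark high self time. As a minimal sketch of that arithmetic (the sample stacks below are hypothetical and listed root first), this snippet collapses samples into the folded `frame;frame;frame count` text that flamegraph.pl and `inferno-flamegraph` read, and computes each frame's total share (bar width) and self share (top plateau).

```python
from collections import Counter

def fold(samples):
    """Collapse stack samples (root first) into folded-stack lines."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

def frame_shares(samples):
    """Return {frame: (total_share, self_share)}.

    total_share: fraction of samples containing the frame (its bar width).
    self_share:  fraction where it is the leaf (a plateau at the top).
    """
    n = len(samples)
    total, self_ = Counter(), Counter()
    for stack in samples:
        for frame in set(stack):   # count a recursive frame once per sample
            total[frame] += 1
        self_[stack[-1]] += 1
    return {f: (total[f] / n, self_[f] / n) for f in total}

# Hypothetical samples from a web request handler.
samples = [
    ("main", "handle", "parse_json"),
    ("main", "handle", "parse_json"),
    ("main", "handle", "render"),
    ("main", "handle", "parse_json"),
]

for line in fold(samples):
    print(line)
# main;handle;parse_json 3
# main;handle;render 1
```

Here `parse_json` is the wide plateau (75% of samples, all of it self time), while `main` and `handle` span the full width with no self time: they are where you trace down from, not what you optimize.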
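The EXPLAIN ANALYZE interpretation guide in the Database Performance section is written for PostgreSQL. To show the "Seq Scan, then add an index on the filter column" fix without a database server, this sketch swaps in SQLite via Python's `sqlite3` module. SQLite's `EXPLAIN QUERY PLAN` reports a full table scan as `SCAN` and an index lookup as `SEARCH ... USING INDEX`; the exact wording varies across SQLite versions. The `orders` table and index name are hypothetical.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # columns: id, parent, notused, detail

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10_000)],
)

query = "SELECT id, total FROM orders WHERE customer_id = ?"
before = plan(conn, query, (42,))
print(before)  # a full scan, e.g. ['SCAN orders']

# Fix from the guide: add an index on the filter column.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, query, (42,))
print(after)   # an index lookup, e.g. ['SEARCH orders USING INDEX idx_orders_customer (customer_id=?)']
```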
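The N+1 Detection notes recommend counting queries per request and fixing with batch queries (`WHERE id IN (...)`). A minimal sketch, assuming an in-memory SQLite database with a hypothetical authors/posts schema: `sqlite3.Connection.set_trace_callback` is invoked for each SQL statement SQLite executes, which gives a simple per-call query counter for comparing a per-row lookup loop with a single batched query.

```python
import sqlite3

def setup():
    """Hypothetical schema: 50 authors, 100 posts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    """)
    conn.executemany("INSERT INTO authors VALUES (?, ?)",
                     [(i, f"author{i}") for i in range(1, 51)])
    conn.executemany("INSERT INTO posts (author_id, title) VALUES (?, ?)",
                     [(i % 50 + 1, f"post{i}") for i in range(100)])
    return conn

def count_queries(conn, fn):
    """Run fn(conn) and return (result, number of SQL statements executed)."""
    seen = []
    conn.set_trace_callback(seen.append)
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(seen)

def n_plus_one(conn):
    posts = conn.execute("SELECT author_id, title FROM posts").fetchall()
    # One extra query per post: the "N" in N+1.
    return [(title, conn.execute("SELECT name FROM authors WHERE id = ?",
                                 (author_id,)).fetchone()[0])
            for author_id, title in posts]

def batched(conn):
    posts = conn.execute("SELECT author_id, title FROM posts").fetchall()
    ids = sorted({author_id for author_id, _ in posts})
    placeholders = ",".join("?" * len(ids))
    # A single WHERE id IN (...) query fetches every author at once.
    names = dict(conn.execute(
        f"SELECT id, name FROM authors WHERE id IN ({placeholders})", ids))
    return [(title, names[author_id]) for author_id, title in posts]

conn = setup()
slow, slow_count = count_queries(conn, n_plus_one)
fast, fast_count = count_queries(conn, batched)
print(slow_count, fast_count)
```

Both functions return the same rows, but the loop version issues one query per post plus the initial list query, while the batched version issues two regardless of result count, which is the "scales linearly with result count" symptom removed.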
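The Benchmarking Best Practices checklist asks for warm-up runs, many iterations, a statistical summary, and a baseline comparison. A small sketch of those steps using only `timeit` and `statistics` from the Python standard library (not a replacement for pytest-benchmark or hyperfine); the two string-building functions are hypothetical baseline and candidate implementations.

```python
import statistics
import timeit

def bench(fn, *, warmup=3, repeat=20, number=100):
    """Time fn() with discarded warm-up runs, then summarize per-call seconds."""
    timer = timeit.Timer(fn)
    timer.repeat(repeat=warmup, number=number)          # warm-up: results discarded
    runs = timer.repeat(repeat=repeat, number=number)   # each entry: total for `number` calls
    per_call = [t / number for t in runs]
    return {
        "mean": statistics.mean(per_call),
        "median": statistics.median(per_call),
        "stdev": statistics.stdev(per_call),
        "min": min(per_call),
        "max": max(per_call),
    }

# Hypothetical baseline vs. candidate implementation.
def concat_baseline(n=1000):
    s = ""
    for i in range(n):
        s += str(i)
    return s

def concat_join(n=1000):
    return "".join(str(i) for i in range(n))

for name, fn in [("baseline", concat_baseline), ("join", concat_join)]:
    stats = bench(fn)
    print(f"{name:>8}: median={stats['median'] * 1e6:.1f} us  "
          f"stdev={stats['stdev'] * 1e6:.1f} us")
```

Reporting the median alongside stdev and min/max, as the checklist suggests, makes it easier to spot when background noise inflates a few runs.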
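The Optimization Patterns table recommends cursor-based rather than offset pagination for large datasets: `LIMIT n OFFSET k` makes the database step past `k` rows before returning any, while a keyset cursor (`WHERE id > last_id ORDER BY id LIMIT n`) seeks straight to the position through the primary-key index. A sketch using SQLite via Python's `sqlite3`, with a hypothetical `items` table; it also doubles as a chunked streaming reader.

```python
import sqlite3

def fetch_page_offset(conn, page, size):
    # The database still steps past page * size rows before returning any.
    return conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?", (size, page * size)
    ).fetchall()

def fetch_page_keyset(conn, after_id, size):
    # Seeks directly to the cursor position using the primary-key index.
    return conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?", (after_id, size)
    ).fetchall()

def iterate_keyset(conn, size=100):
    """Yield every row one page at a time, using the last seen id as the cursor."""
    last_id = 0  # assumes ids are positive integers
    while True:
        page = fetch_page_keyset(conn, last_id, size)
        if not page:
            return
        yield from page
        last_id = page[-1][0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items (name) VALUES (?)",
                 [(f"item{i}",) for i in range(1, 1001)])

print(sum(1 for _ in iterate_keyset(conn, size=64)))
```

The trade-off is that a keyset cursor cannot jump to an arbitrary page number, and it needs a unique, ordered column (here the integer primary key) to remain correct.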
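The Cache invalidation gotcha suggests TTL plus event-based invalidation with the cache-aside pattern. A minimal single-process sketch (not thread-safe, and not a substitute for Redis or another shared cache across nodes): on read, check the cache; on a miss or expiry, call the loader and store the value with an expiry time; on an update event, call `invalidate`. `TTLCache` and `load_user` are hypothetical names, and the clock is injectable so expiry can be shown deterministically.

```python
import time

class TTLCache:
    """Cache-aside helper: check cache, on miss call the loader and store with an expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        entry = self._data.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]                  # fresh hit
        value = loader(key)                  # miss or expired: go to the source
        self._data[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        """Event-based invalidation, e.g. call after the underlying record changes."""
        self._data.pop(key, None)

# Usage with a fake clock so expiry is deterministic.
fake_now = [0.0]
calls = []

def load_user(user_id):                      # hypothetical slow lookup
    calls.append(user_id)
    return {"id": user_id}

cache = TTLCache(ttl_seconds=30, clock=lambda: fake_now[0])
cache.get_or_load(1, load_user)              # miss: loads
cache.get_or_load(1, load_user)              # hit
fake_now[0] = 31.0
cache.get_or_load(1, load_user)              # expired: reloads
cache.invalidate(1)
cache.get_or_load(1, load_user)              # invalidated: reloads
print(len(calls))                            # 3
```

The TTL bounds how stale a value can get if an invalidation event is lost, while `invalidate` keeps the common case fresh without waiting for expiry.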
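The "Ignoring tail latency" gotcha says to measure p95/p99 rather than averages. A short sketch of the nearest-rank percentile (one common definition; tools that interpolate, such as NumPy's default, can report slightly different values) over synthetic latency data in which 3% of requests are slow.

```python
import math
import statistics

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Synthetic latencies in ms: 97 fast requests and 3 slow ones (e.g. lock waits).
latencies = [20] * 97 + [400, 450, 500]

summary = {
    "mean": statistics.mean(latencies),
    "p50": percentile(latencies, 50),
    "p95": percentile(latencies, 95),
    "p99": percentile(latencies, 99),
}
print(summary)  # {'mean': 32.9, 'p50': 20, 'p95': 20, 'p99': 450}
```

The mean (32.9 ms) and even p95 look healthy, yet 1 request in 100 takes over 20x the median, which is what users on the slow path actually experience.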
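The "Compression on small payloads" gotcha recommends compressing only above a minimum size. A sketch using Python's `gzip` module: bodies below a threshold are sent as-is, and bodies that do not actually shrink (for example, already-compressed images) are also left alone. `maybe_compress` is a hypothetical helper, and the 150-byte constant simply mirrors the table; the right value depends on your payloads.

```python
import gzip

MIN_COMPRESS_BYTES = 150  # threshold from the gotcha table; tune for your payloads

def maybe_compress(body: bytes):
    """Return (payload, content_encoding). Skip gzip for tiny or incompressible bodies."""
    if len(body) < MIN_COMPRESS_BYTES:
        return body, None
    compressed = gzip.compress(body)
    if len(compressed) >= len(body):    # compression did not help
        return body, None
    return compressed, "gzip"

tiny = b'{"ok":true}'
print(len(tiny), len(gzip.compress(tiny)))  # gzip output is larger than the input here
```

The gzip header and trailer alone add 18 bytes, so very small bodies can grow after compression, on top of the CPU cost on both ends.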
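The "Missing async in I/O path" gotcha notes that one blocking call serializes concurrent requests. A sketch with Python's `asyncio` in which `time.sleep` stands in for a blocking driver or HTTP call: five concurrent "requests" take about five times as long when the handler blocks the event loop, while `asyncio.sleep` (non-blocking I/O) or `asyncio.to_thread` (Python 3.9+, offloads a blocking call to a worker thread) lets them overlap. Handler names are hypothetical and the timings are approximate.

```python
import asyncio
import time

async def handler_blocking(delay):
    time.sleep(delay)             # blocking call: stalls the whole event loop

async def handler_async(delay):
    await asyncio.sleep(delay)    # yields to the loop while "waiting on I/O"

async def handler_offloaded(delay):
    # Workaround for a library with no async API: run the call in a worker thread.
    await asyncio.to_thread(time.sleep, delay)

async def serve(handler, n=5, delay=0.1):
    """Run n concurrent requests and return the wall-clock time in seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(n)))
    return time.perf_counter() - start

for h in (handler_blocking, handler_async, handler_offloaded):
    print(f"{h.__name__}: {asyncio.run(serve(h)):.2f}s")
```

Expect roughly 0.5 s for the blocking handler and roughly 0.1 s for the other two; auditing the request path means finding every call that behaves like the first one.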