SKILL.md 12 KB


name: perf-ops description: "Performance profiling and optimization orchestrator - diagnoses symptoms, dispatches profiling to language experts, manages before/after comparisons. Triggers on: performance, profiling, flamegraph, pprof, py-spy, clinic.js, memray, heaptrack, bundle size, webpack analyzer, load testing, k6, artillery, vegeta, locust, benchmark, hyperfine, criterion, slow query, EXPLAIN ANALYZE, N+1, caching, optimization, latency, throughput, p99, memory leak, CPU spike, bottleneck." license: MIT allowed-tools: "Read Edit Write Bash Glob Grep Agent TaskCreate TaskUpdate" metadata: author: claude-mods

related-skills: debug-ops, monitoring-ops, testing-ops, code-stats, postgres-ops

Performance Operations

Orchestrator for cross-language performance profiling and optimization. Classifies symptoms inline, dispatches profiling to language expert agents (background), and manages optimization with confirmation.

Architecture

User describes performance issue or requests profiling
    |
    +---> T1: Diagnose (inline, fast)
    |       +---> Classify symptom (decision tree)
    |       +---> Detect language/runtime from project
    |       +---> Check installed profiling tools
    |       +---> Determine production vs development
    |       +---> Gather system baseline (CPU/mem/disk)
    |       +---> Present: diagnosis + recommended profiling approach
    |
    +---> T2: Profile (dispatch to language expert, background)
    |       +---> Select expert agent from routing table
    |       +---> Build perf-focused dispatch prompt
    |       +---> Expert runs profiler, collects data, interprets results
    |       |       +---> Fallback: general-purpose with tool commands inlined
    |       +---> Returns: findings + bottleneck identification + suggestions
    |       |
    |       +---> [Optional parallel dispatch]:
    |             +---> CPU profiling agent  ---+
    |             +---> Memory profiling agent --+--> Consolidate findings
    |             +---> Baseline benchmark ------+
    |
    +---> T3: Optimize (dispatch to expert, foreground + confirm)
            +---> Expert proposes specific code changes
            +---> Preflight: what changes, expected impact, risks
            +---> User confirms
            +---> Apply changes
            +---> Re-benchmark for before/after delta

Safety Tiers

T1: Diagnose - Run Inline

No agent needed. Execute directly via Bash for instant results.

Operation Command / Method
Detect Python profilers which py-spy && which memray && which scalene
Detect Go profilers which go && go tool pprof -h 2>/dev/null
Detect Rust profilers which cargo-flamegraph && which samply
Detect Node profilers which clinic && which 0x
Detect benchmarking tools which hyperfine && which k6 && which vegeta
System CPU baseline top -bn1 -o %CPU \| head -20 (Linux) or wmic cpu get loadpercentage (Win)
System memory baseline free -h (Linux) or wmic OS get FreePhysicalMemory (Win)
Disk I/O check iostat -x 1 3 (Linux)
Identify language Check for package.json, go.mod, Cargo.toml, pyproject.toml, requirements.txt
Production vs dev Ask user or detect from environment (NODE_ENV, FLASK_ENV, etc.)
Read existing profiles Parse .prof, .svg, .bin files in project

Production safety rule: In production environments, only recommend sampling profilers (py-spy, pprof HTTP endpoint, perf). Never suggest attaching debuggers, tracing profilers, or tools that require process restart.

T2: Profile - Dispatch to Expert Agent

Gather context from T1 diagnosis, then dispatch to the appropriate language expert.

Language Expert Routing:

Detected Language Expert Agent Key Profiling Tools
Python (.py, pyproject.toml, requirements.txt) python-expert py-spy, memray, scalene, tracemalloc
Go (go.mod, .go files) go-expert pprof (CPU/heap/goroutine/mutex), benchstat
Rust (Cargo.toml, .rs files) rust-expert cargo-flamegraph, samply, DHAT, criterion
TypeScript/JavaScript (backend, package.json + server) javascript-expert clinic flame/doctor/bubbleprof, 0x
TypeScript/JavaScript (frontend, bundle issues) typescript-expert webpack-bundle-analyzer, Lighthouse, source-map-explorer
SQL / PostgreSQL postgres-expert EXPLAIN ANALYZE, pg_stat_statements, pgbench
General / unknown / CLI benchmarking general-purpose hyperfine, perf, strace

Dispatch template (T2):

You are handling a performance profiling task dispatched by the perf-ops orchestrator.

## Diagnosis (from T1)
- Symptom: {classified symptom from decision tree}
- Language/Runtime: {detected language}
- Environment: {production | development}
- Installed tools: {list from tool detection}
- System baseline: {CPU/memory/disk metrics}

## Profiling Task
{specific profiling request - e.g., "CPU profile the API server under load"}

## Target
- Process/file: {target application or endpoint}
- Expected workload: {how to generate representative load if needed}

## Domain Knowledge
Before starting, read the relevant profiling reference for this language:
- Read: skills/perf-ops/references/cpu-memory-profiling.md

For load testing tasks, also read:
- Read: skills/perf-ops/references/load-testing.md

For database profiling, also read:
- Read: skills/postgres-ops/SKILL.md (if PostgreSQL)

## Instructions
1. Run the appropriate profiler for this language and symptom
2. Collect sufficient samples (minimum 30 seconds for CPU, multiple snapshots for memory)
3. Interpret the results - identify the top 3-5 bottlenecks
4. For each bottleneck: explain what it is, why it's slow, and suggest a fix
5. Report findings in structured format with metrics

Execution mode:

Scenario Mode Why
User waiting for results run_in_background=False They need findings before continuing
User continuing other work run_in_background=True Don't block the main session
Quick benchmark (hyperfine) run_in_background=False Fast enough to wait
Load test (k6, artillery) run_in_background=True Takes minutes

T3: Optimize - Preflight Required

Dispatch to language expert with explicit instruction to produce a preflight report before any code changes.

Dispatch template (T3 preflight):

You are handling a performance optimization dispatched by the perf-ops orchestrator.

## Profiling Results (from T2)
{bottleneck findings, metrics, flamegraph interpretation}

## Optimization Request
{specific optimization - e.g., "Fix the N+1 query in UserController.list"}

IMPORTANT: Do NOT apply changes yet. Produce a Preflight Report:
1. Exactly what code/config changes you will make
2. Expected performance improvement (with reasoning)
3. Risks (correctness, side effects, edge cases)
4. How to verify the improvement (specific benchmark or test)
5. How to revert if the optimization causes issues

After user confirms: Re-dispatch with execute authority plus the before/after protocol.

Dispatch template (T3 execute + before/after):

User confirmed the optimization. Proceed with execution.

## Approved Changes
{exact changes from preflight report}

## Before/After Protocol
1. Record the current benchmark baseline: {specific command from T2}
2. Apply the approved changes
3. Run the same benchmark again
4. Report comparison:
   - Metric: before value -> after value (% change)
   - Include statistical confidence if tool supports it
5. If regression detected: revert and report

Parallel Profiling

When multiple independent symptoms are detected, or the user requests comprehensive profiling, dispatch parallel agents.

Parallelizable combinations:

Agent 1 Agent 2 Why Independent
CPU profiler Memory profiler Different tools, different data
CPU profiler Baseline benchmark Read vs measurement
Backend profiler Frontend bundle analysis Different runtimes
Service A profiler Service B profiler Different processes

NOT parallelizable:

Operation A Operation B Why Sequential
Profile Interpret results Dependency
Before benchmark After benchmark Requires code change between
Load test CPU profile same process Tool interference

Dispatch pattern for parallel profiling:

# Example: CPU + memory profiling in parallel
Agent(
    subagent_type="python-expert",
    model="sonnet",
    run_in_background=True,
    prompt="CPU profiling task: {cpu_prompt}"
)
Agent(
    subagent_type="python-expert",
    model="sonnet",
    run_in_background=True,
    prompt="Memory profiling task: {memory_prompt}"
)
# Both run simultaneously, consolidate findings when both complete

Fallback: When Expert Agent Is Unavailable

If the target language expert is not registered as a subagent type, fall back to general-purpose with profiling commands inlined.

Agent(
    subagent_type="general-purpose",
    model="sonnet",
    run_in_background=True,
    prompt="""You are acting as a performance profiling agent for {language}.

Use these specific tools and commands:
{tool commands from diagnosis-quickref.md for the detected language}

{original dispatch prompt}
"""
)

For simple benchmarks (hyperfine, single command timing), skip agent dispatch entirely and run inline via Bash.

Decision Logic

When a performance-related request arrives:

1. Classify the request:
   - Symptom description? -> Start at T1 (diagnose)
   - "Profile my app"? -> T1 (detect language + tools) then T2 (profile)
   - "Benchmark X vs Y"? -> T2 directly (hyperfine or language benchmark)
   - "Optimize this"? -> T2 (profile first) then T3 (optimize)
   - "Why is X slow"? -> T1 (diagnose) then T2 (targeted profile)

2. T1 Diagnose (always runs first for new issues):
   - Detect language/runtime
   - Check installed profiling tools
   - Classify symptom using decision tree (see diagnosis-quickref.md)
   - Determine production vs development
   - Present findings + recommend next step

3. T2 Profile (when diagnosis points to a specific bottleneck):
   - Route to appropriate language expert
   - Decide foreground vs background
   - Consider parallel dispatch if multiple symptoms
   - Consolidate findings from all agents

4. T3 Optimize (only when user wants changes applied):
   - Always produce preflight report first
   - Wait for explicit user confirmation
   - Execute with before/after comparison
   - Report delta with statistical confidence

Quick Reference

Task Tier Execution
Detect tools T1 Inline
Check system metrics T1 Inline
Classify symptom T1 Inline
Identify language T1 Inline
Run CPU profiler T2 Agent (bg)
Run memory profiler T2 Agent (bg)
Run load test T2 Agent (bg)
Run benchmark T2 Agent (bg or inline for hyperfine)
Bundle analysis T2 Agent (bg)
EXPLAIN ANALYZE T2 Agent (fg)
Before/after comparison T2 Agent (fg)
Apply optimization T3 Agent + confirm
Add index T3 Agent + confirm
Refactor hot path T3 Agent + confirm

Reference Files

File Contents
references/diagnosis-quickref.md Decision tree, tool selection matrix, quick references for all profiling domains, common gotchas
references/cpu-memory-profiling.md Deep flamegraph interpretation, language-specific CPU/memory profiling guides
references/load-testing.md k6, Artillery, vegeta, wrk, Locust methodology and CI integration
references/optimization-patterns.md Caching, database, frontend, API, concurrency, memory optimization strategies
references/ci-integration.md Performance budgets, regression detection, CI pipeline patterns, benchmark baselines

Load reference files when deeper tool-specific guidance is needed beyond what the dispatch prompt provides.

See Also

Skill When to Combine
debug-ops Root cause analysis for performance regressions
monitoring-ops Production metrics, alerting on latency/throughput
testing-ops Performance regression tests in CI, benchmark suites
code-stats Identify complex code that may be performance-sensitive
postgres-ops PostgreSQL-specific query optimization, indexing, EXPLAIN
container-orchestration Resource limits, pod scaling, container performance