---
name: iterate
description: "Autonomous improvement loop - modify, measure, keep or discard, repeat. Inspired by Karpathy's autoresearch. Triggers on: iterate, improve autonomously, run overnight, keep improving, autoresearch, improvement loop, iterate until done, autonomous iteration."
---
Inspired by Karpathy's autoresearch: constrain scope, clarify success with one mechanical metric, loop autonomously. The agent modifies code, measures the result, keeps improvements, discards regressions, and repeats - indefinitely or for N iterations.
The power is in the constraint. One metric. One scope. One loop. Git as memory.
Collect five inputs. If the user provides them inline, extract and proceed. If any are missing, ask once using AskUserQuestion with all missing fields batched together.
| Field | Required | What it is | Example |
|---|---|---|---|
| Goal | Yes | What you're improving, in plain language | "Increase test coverage to 90%" |
| Scope | Yes | File globs the agent may modify | src/**/*.ts |
| Verify | Yes | Shell command that outputs the metric (a number) | npm test -- --coverage \| grep "All files" |
| Direction | Yes | Is higher or lower better? | higher / lower |
| Guard | No | Command that must always pass (prevents regressions) | npm run typecheck |
Bounded mode: If the user includes Iterations: N, run exactly N iterations then stop with a summary. Otherwise, loop forever until interrupted.
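As a rough illustration of extracting the inputs from an inline request, the sketch below parses `Field: value` lines and reports which required fields are still missing, so they can be asked for in one batch. The function name `parse_config` and the line-per-field layout are assumptions made for this example, not part of the skill itself.

```python
import re
from typing import Dict, List, Tuple

REQUIRED = ("Goal", "Scope", "Verify", "Direction")
OPTIONAL = ("Guard", "Iterations")


def parse_config(text: str) -> Tuple[Dict[str, str], List[str]]:
    """Return (fields, missing_required) parsed from 'Field: value' lines."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        # Only the first colon separates the field name from its value,
        # so shell commands containing colons or pipes stay intact.
        match = re.match(r"\s*(\w+):\s*(.+)", line)
        if match and match.group(1) in REQUIRED + OPTIONAL:
            fields[match.group(1)] = match.group(2).strip()
    missing = [name for name in REQUIRED if name not in fields]
    return fields, missing


def is_bounded(fields: Dict[str, str]) -> bool:
    """Bounded mode applies only when an Iterations field was supplied."""
    return "Iterations" in fields
```

A request containing only `Goal: Make the API faster` would come back with `Scope`, `Verify`, and `Direction` listed as missing, which is the set to ask about in a single question.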
Once config is complete:
Create results.tsv with the header and baseline row, then print the configuration before starting:
Goal: Increase test coverage to 90%
Scope: src/**/*.ts
Verify: npm test -- --coverage | grep "All files"
Direction: higher
Guard: npm run typecheck
Baseline: 72.3%
Mode: unbounded
Starting iteration loop.
LOOP (forever, or N times):
1. REVIEW git log --oneline -10 + read results.tsv tail
Know what worked, what failed, what's untried.
2. IDEATE Pick ONE change. Write a one-sentence description
BEFORE touching any code. Consult git history -
don't repeat discarded approaches.
3. MODIFY Make ONE atomic change to in-scope files only.
Small, focused, explainable.
4. COMMIT git add <specific files> (never git add -A)
git commit -m "experiment: <description>"
Commit BEFORE verification. Enables clean rollback.
5. VERIFY Run the verify command. Extract the metric.
If guard is set and metric improved, run guard too.
6. DECIDE
Improved + guard passes (or no guard) -> KEEP
Improved + guard fails -> REVERT (git revert HEAD --no-edit)
Same or worse -> REVERT
Crashed -> attempt fix (max 3 tries), else REVERT
7. LOG Append row to results.tsv
8. REPEAT Go to 1. Print a one-line status every 5 iterations.
NEVER ask "should I continue?" - just keep going.
If bounded and iteration N reached, print summary and stop.
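The DECIDE step above reduces to a small pure function. The sketch below assumes the new metric is compared against the metric of the last kept state, which the loop description implies but does not spell out; the name `decide` is hypothetical.

```python
from typing import Optional


def decide(previous: float, current: float, direction: str,
           guard_passed: Optional[bool] = None) -> str:
    """Return 'keep' or 'revert' for one experiment.

    previous is the metric of the last kept state (an assumption here);
    guard_passed is None when no guard is configured.
    """
    if direction not in ("higher", "lower"):
        raise ValueError(f"direction must be 'higher' or 'lower', got {direction!r}")
    if direction == "higher":
        improved = current > previous
    else:
        improved = current < previous
    if not improved:
        return "revert"  # same or worse
    if guard_passed is False:
        return "revert"  # improved, but the guard failed
    return "keep"  # improved, and the guard passed or is absent
```

Crashes are handled before this point: the loop only reaches the decision once the verify command has produced a metric.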
Always use git revert HEAD --no-edit (preserves the experiment in history - the agent can learn from it). If revert conflicts, fall back to git reset --hard HEAD~1.
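A sketch of the revert-then-reset fallback is shown here. The `git` parameter is a hypothetical injection point that makes the logic testable without a repository, and the `git revert --abort` call before the reset is an addition of this example (a conflicted revert leaves the repository mid-operation), not something the rule above requires.

```python
import subprocess
from typing import Callable, Sequence


def run_git(args: Sequence[str]) -> int:
    """Run a git subcommand in the current directory and return its exit code."""
    return subprocess.run(["git", *args]).returncode


def rollback(git: Callable[[Sequence[str]], int] = run_git) -> str:
    """Undo the experiment commit at HEAD; return 'reverted' or 'reset'."""
    if git(["revert", "HEAD", "--no-edit"]) == 0:
        return "reverted"  # experiment stays visible in history
    # The revert failed, typically on a conflict: abandon it, then drop the commit.
    git(["revert", "--abort"])
    if git(["reset", "--hard", "HEAD~1"]) != 0:
        raise RuntimeError("git reset --hard HEAD~1 failed")
    return "reset"
```

Note that the reset path removes the experiment from history, so its outcome survives only in results.tsv.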
Read git log before ideating. Failed experiments stay visible in history via revert commits.
Tab-separated file: results.tsv
iteration commit metric status description
0 a1b2c3d 72.3 baseline initial state
1 b2c3d4e 74.1 keep add edge case tests for auth module
2 - 73.8 discard refactor test helpers (broke coverage)
3 c3d4e5f 75.0 keep add missing null checks in user service
4 - 0.0 crash switched to vitest (import errors)
Status values: baseline, keep, discard, crash
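One way to write rows in this format is Python's `csv` module with a tab delimiter. This is a sketch only: `write_header` and `append_row` are hypothetical names, and formatting the metric to one decimal place simply mirrors the sample rows.

```python
import csv
import io
from typing import Optional, TextIO

HEADER = ["iteration", "commit", "metric", "status", "description"]
STATUSES = {"baseline", "keep", "discard", "crash"}


def write_header(handle: TextIO) -> None:
    """Write the column header as the first line of results.tsv."""
    csv.writer(handle, delimiter="\t", lineterminator="\n").writerow(HEADER)


def append_row(handle: TextIO, iteration: int, commit: Optional[str],
               metric: float, status: str, description: str) -> None:
    """Append one experiment row; a missing commit hash is written as '-'."""
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([iteration, commit or "-", f"{metric:.1f}", status, description])
```

With a real file, opening it as `open("results.tsv", "a", newline="")` lets the `csv` module control the line endings.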
Every 5 iterations, print a brief status:
Iteration 15: metric 81.2 (baseline 72.3, +8.9) | 6 keeps, 8 discards, 1 crash
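A line in that shape can be built from the baseline, the current metric, and the list of statuses logged so far. The helper names below are hypothetical; the sketch assumes one decimal place, as in the example.

```python
from collections import Counter
from typing import Iterable


def _count(n: int, singular: str, plural: str) -> str:
    """Format a count with the matching singular or plural noun."""
    return f"{n} {singular if n == 1 else plural}"


def status_line(iteration: int, baseline: float, current: float,
                statuses: Iterable[str]) -> str:
    """Build the one-line progress report printed every 5 iterations."""
    counts = Counter(statuses)  # missing statuses count as 0
    delta = current - baseline
    return (
        f"Iteration {iteration}: metric {current:.1f} "
        f"(baseline {baseline:.1f}, {delta:+.1f}) | "
        f"{_count(counts['keep'], 'keep', 'keeps')}, "
        f"{_count(counts['discard'], 'discard', 'discards')}, "
        f"{_count(counts['crash'], 'crash', 'crashes')}"
    )
```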
When a bounded loop completes:
=== Iterate Complete (25/25) ===
Baseline: 72.3 -> Final: 88.7 (+16.4)
Keeps: 12 | Discards: 11 | Crashes: 2
Best iteration: #18 - add integration tests for payment flow (+3.2)
The pattern is universal. Change the five inputs, not the loop.
| Domain | Goal | Verify | Direction |
|---|---|---|---|
| Test coverage | Coverage to 90% | npm test -- --coverage | higher |
| Bundle size | Below 200KB | npm run build && stat -f%z dist/main.js | lower |
| Performance | Faster API response | npm run bench \| grep p95 | lower |
| ML training | Lower validation loss | uv run train.py && grep val_bpb run.log | lower |
| Lint errors | Zero warnings | npm run lint 2>&1 \| grep -c warning | lower |
| Lighthouse | Score above 95 | npx lighthouse --output=json \| jq .score | higher |
| Code quality | Reduce complexity | npx complexity-report \| grep average | lower |
The guard is an optional safety net - a command that must always pass regardless of what the main metric does.
If the metric improves but the guard fails, the change is reverted. The agent should note WHY the guard failed and adapt future attempts accordingly.
Common guards: npm test, tsc --noEmit, cargo check, pytest, go vet
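A guard check can be as simple as running the command and looking at its exit code while keeping the output, so the reason for a failure is available when planning the next attempt. In this sketch the name `run_guard` and the 600-second timeout are assumptions, not values taken from the skill.

```python
import subprocess
from typing import Optional, Tuple


def run_guard(command: Optional[str], timeout: float = 600.0) -> Tuple[bool, str]:
    """Return (passed, output). With no guard configured, the check passes."""
    if command is None:
        return True, ""
    try:
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        # A hung guard is treated as a failure (an assumption of this sketch).
        return False, f"guard timed out after {timeout:.0f}s"
    # Keep stdout and stderr so the failure reason can be noted.
    return result.returncode == 0, result.stdout + result.stderr
```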
/iterate
Goal: Increase test coverage from 72% to 90%
Scope: src/**/*.ts, src/**/*.test.ts
Verify: npm test -- --coverage | grep "All files" | awk '{print $10}'
Direction: higher
Guard: tsc --noEmit
Iterations: 30
/iterate
Goal: Make the API faster
Agent scans codebase for tooling, suggests scope/verify/direction, asks once, then goes.
/iterate
Goal: Reduce bundle size below 150KB
Scope: src/**/*.ts, webpack.config.js
Verify: npm run build 2>&1 | grep "main.js" | awk '{print $2}'
Direction: lower
Agent runs indefinitely. User interrupts in the morning. Results are in results.tsv and git history.