name: iterate
description: "Autonomous improvement loop - modify, measure, keep or discard, repeat. Inspired by Karpathy's autoresearch. Triggers on: iterate, improve autonomously, run overnight, keep improving, autoresearch, improvement loop, iterate until done, autonomous iteration."

allowed-tools: "Read Write Edit Glob Grep Bash Agent"

Iterate - Autonomous Improvement Loop

Inspired by Karpathy's autoresearch: constrain scope, clarify success with one mechanical metric, loop autonomously. The agent modifies code, measures the result, keeps improvements, discards regressions, and repeats - indefinitely or for N iterations.

The power is in the constraint. One metric. One scope. One loop. Git as memory.

Setup

Collect five inputs. If the user provides them inline, extract and proceed. If any are missing, ask once using AskUserQuestion with all missing fields batched together.

Field	Required	What it is	Example
Goal	Yes	What you're improving, in plain language	"Increase test coverage to 90%"
Scope	Yes	File globs the agent may modify	`src/**/*.ts`
Verify	Yes	Shell command that outputs the metric (a number)	`npm test -- --coverage | grep "All files"`
Direction	Yes	Is higher or lower better?	`higher` / `lower`
Guard	No	Command that must always pass (prevents regressions)	`npm run typecheck`

Bounded mode: If the user includes Iterations: N, run exactly N iterations then stop with a summary. Otherwise, loop forever until interrupted.
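
As an illustration of the setup step, the inline `Key: value` block could be parsed as sketched below. This is a minimal sketch, not the skill's actual implementation: `parse_config` and `REQUIRED` are hypothetical names, and it assumes one field per line with comma-separated scope globs.

```python
REQUIRED = ("goal", "scope", "verify", "direction")  # guard and iterations are optional


def parse_config(text):
    """Parse 'Key: value' lines into a dict and report missing required fields."""
    config = {}
    for line in text.splitlines():
        # Split on the first colon only, so commands containing ':' survive intact.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            config[key.strip().lower()] = value.strip()
    if "scope" in config:
        config["scope"] = [glob.strip() for glob in config["scope"].split(",")]
    # No "Iterations: N" line means unbounded mode.
    config["iterations"] = int(config["iterations"]) if "iterations" in config else None
    missing = [field for field in REQUIRED if field not in config]
    return config, missing
```

A non-empty `missing` list is the cue to ask the user once for everything that is absent, batched into a single question.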

Baseline

Once config is complete:

Read all in-scope files for full context
Run the verify command on the current state
Extract the metric value - this is iteration 0 (baseline)
Create results.tsv with the header and baseline row
Confirm setup to the user, then begin the loop

Goal:      Increase test coverage to 90%
Scope:     src/**/*.ts
Verify:    npm test -- --coverage | grep "All files"
Direction: higher
Guard:     npm run typecheck
Baseline:  72.3%
Mode:      unbounded

Starting iteration loop.
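
Extracting the metric from the verify output might look like the sketch below. It assumes the verify command prints the metric as the last number on stdout, which is why the usage examples pipe through `awk`. `extract_metric` and `run_verify` are hypothetical helper names.

```python
import re
import subprocess


def extract_metric(output):
    """Return the last number in the verify output as a float, or None if there is none."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", output)
    return float(numbers[-1]) if numbers else None


def run_verify(command):
    """Run the verify command in a shell and extract the metric from its stdout."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        return None  # the caller treats a failed verify run as a crash
    return extract_metric(result.stdout)
```

Returning `None` instead of raising keeps the crash case explicit for whatever code makes the keep-or-discard decision.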

The Loop

LOOP (forever, or N times):

  1. REVIEW    git log --oneline -10 + read results.tsv tail
              Know what worked, what failed, what's untried.

  2. IDEATE    Pick ONE change. Write a one-sentence description
              BEFORE touching any code. Consult git history -
              don't repeat discarded approaches.

  3. MODIFY    Make ONE atomic change to in-scope files only.
              Small, focused, explainable.

  4. COMMIT    git add <specific files> (never git add -A)
              git commit -m "experiment: <description>"
              Commit BEFORE verification. Enables clean rollback.

  5. VERIFY    Run the verify command. Extract the metric.
              If guard is set and metric improved, run guard too.

  6. DECIDE
              Improved + guard passes (or no guard) -> KEEP
              Improved + guard fails                -> REVERT (git revert HEAD --no-edit)
              Same or worse                         -> REVERT
              Crashed                               -> attempt fix (max 3 tries), else REVERT

  7. LOG       Append row to results.tsv

  8. REPEAT    Go to 1. Print a one-line status every 5 iterations.
              NEVER ask "should I continue?" - just keep going.
              If bounded and iteration N reached, print summary and stop.
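
The DECIDE step reduces to a small pure function. The sketch below follows the decision table in the loop; `decide` is a hypothetical name, a `None` metric stands for a crashed verify run, and the three fix attempts after a crash are left to the caller.

```python
def decide(previous, current, direction, guard_passed=None):
    """Return 'keep', 'discard', or 'crash' for one iteration.

    previous, current: metric before and after the change (current is None on a crash).
    direction: 'higher' or 'lower'.
    guard_passed: True or False, or None when no guard is configured.
    """
    if current is None:
        return "crash"
    if direction == "higher":
        improved = current > previous
    else:
        improved = current < previous
    # Keep only a strict improvement that did not break the guard.
    if improved and guard_passed is not False:
        return "keep"
    return "discard"
```

Note that an equal metric is discarded here, as in the table. The "simpler wins" rule would need an extra code-size comparison that this sketch does not attempt.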

Rollback

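Always use git revert HEAD --no-edit, which preserves the experiment in history so the agent can learn from it. If the revert conflicts, run git revert --abort to clear the half-finished revert, then fall back to git reset --hard HEAD~1. The reset removes the experiment commit from history, so use it only as a last resort.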
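
A possible wrapper for this rollback policy is sketched below. `rollback` is a hypothetical helper, and the `git revert --abort` call before the reset is an assumption about how to clear a conflicted revert. The command runner is injectable so the logic can be exercised without touching a real repository.

```python
import subprocess


def rollback(run=subprocess.run):
    """Revert the experiment commit; fall back to a hard reset if the revert fails."""
    revert = run(["git", "revert", "HEAD", "--no-edit"])
    if revert.returncode == 0:
        return "reverted"  # the experiment stays visible in history
    # Assumption: clear the half-finished revert before dropping the commit.
    run(["git", "revert", "--abort"])
    run(["git", "reset", "--hard", "HEAD~1"], check=True)
    return "reset"
```

The return value says which path was taken, which may be worth recording in the results log because a reset leaves no trace in `git log`.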

When Stuck (5+ consecutive discards)

Re-read ALL in-scope files from scratch
Re-read the original goal
Review entire results.tsv for patterns
Try combining two previously successful changes
Try the opposite of what hasn't been working
Try something radical - architectural changes, different algorithms
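
Detecting the stuck condition only needs the status column of the results log. In this sketch `consecutive_discards` and `is_stuck` are hypothetical names, and counting crashes as discards is an assumption.

```python
def consecutive_discards(statuses):
    """Count the trailing run of non-kept results in the status history."""
    count = 0
    for status in reversed(statuses):
        if status in ("keep", "baseline"):
            break
        count += 1  # assumption: a crash counts toward the streak like a discard
    return count


def is_stuck(statuses, threshold=5):
    """True once the last `threshold` iterations were all discarded."""
    return consecutive_discards(statuses) >= threshold
```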

Rules

One change per iteration. Atomic. If it breaks, you know exactly why.
Mechanical verification only. No "looks good." The number decides.
Git is memory. Commit before verify. Revert on failure. Read git log before ideating. Failed experiments stay visible in history via revert commits.
Simpler wins. Equal metric + less code = keep. Tiny improvement + ugly complexity = discard. Removing code for equal results is a win.
Never stop. Unbounded loops run until interrupted. Never ask permission to continue. The user may be asleep.
Read before write. Understand full context before each modification.
Scope is sacred. Only modify files matching the scope globs. Never touch verify/guard targets, test fixtures, or config outside scope.
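
The scope rule can be enforced mechanically before staging files. The sketch below is an approximation using the standard `fnmatch` module: `in_scope` is a hypothetical helper, and because `fnmatch` lets `*` cross directory separators it is slightly more permissive than shell globbing.

```python
from fnmatch import fnmatchcase


def in_scope(path, scope_globs):
    """Return True if a repo-relative path matches any of the scope globs."""
    for pattern in scope_globs:
        # "**/" should also match zero directories, so try the pattern without it too.
        if fnmatchcase(path, pattern) or fnmatchcase(path, pattern.replace("**/", "")):
            return True
    return False
```

Checking every modified path with a function like this before `git add` would stop an experiment from touching fixtures or config outside the declared scope.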

Results Log

Tab-separated file: results.tsv

iteration	commit	metric	status	description
0	a1b2c3d	72.3	baseline	initial state
1	b2c3d4e	74.1	keep	add edge case tests for auth module
2	-	73.8	discard	refactor test helpers (broke coverage)
3	c3d4e5f	75.0	keep	add missing null checks in user service
4	-	0.0	crash	switched to vitest (import errors)

Status values: baseline, keep, discard, crash
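
Appending to the log could be done as in the sketch below, which mirrors the column order shown above. `format_row` and `append_result` are hypothetical names; the `-` placeholder for a missing commit follows the example rows.

```python
import os

HEADER = "iteration\tcommit\tmetric\tstatus\tdescription"


def format_row(iteration, commit, metric, status, description):
    """Build one tab-separated row; tabs and newlines in the description are flattened."""
    clean = " ".join(description.split())
    return "\t".join([str(iteration), commit or "-", str(metric), status, clean])


def append_result(path, iteration, commit, metric, status, description):
    """Append a row to results.tsv, writing the header first if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", encoding="utf-8") as log:
        if is_new:
            log.write(HEADER + "\n")
        log.write(format_row(iteration, commit, metric, status, description) + "\n")
```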

Progress Output

Every 5 iterations, print a brief status:

Iteration 15: metric 81.2 (baseline 72.3, +8.9) | 6 keeps, 8 discards, 1 crash

When a bounded loop completes:

=== Iterate Complete (25/25) ===
Baseline: 72.3 -> Final: 88.7 (+16.4)
Keeps: 12 | Discards: 11 | Crashes: 2
Best iteration: #18 - add integration tests for payment flow (+3.2)
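
The periodic status line can be derived from the logged statuses alone. A minimal sketch, assuming one decimal place as in the examples; `status_line` and `plural` are hypothetical names.

```python
def plural(count, singular, plural_form):
    """Format a count with the right noun form, e.g. '1 crash' or '2 crashes'."""
    return f"{count} {singular if count == 1 else plural_form}"


def status_line(iteration, baseline, current, statuses):
    """Build the one-line progress summary printed every 5 iterations."""
    delta = current - baseline
    counts = ", ".join([
        plural(statuses.count("keep"), "keep", "keeps"),
        plural(statuses.count("discard"), "discard", "discards"),
        plural(statuses.count("crash"), "crash", "crashes"),
    ])
    return (f"Iteration {iteration}: metric {current:.1f} "
            f"(baseline {baseline:.1f}, {delta:+.1f}) | {counts}")
```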

Adapting to Any Domain

The pattern is universal. Change the five inputs, not the loop.

Domain	Goal	Verify	Direction
Test coverage	Coverage to 90%	`npm test -- --coverage`	higher
Bundle size	Below 200KB	`npm run build && wc -c < dist/main.js`	lower
Performance	Faster API response	`npm run bench | grep p95`	lower
ML training	Lower validation loss	`uv run train.py && grep val_bpb run.log`	lower
Lint errors	Zero warnings	`npm run lint 2>&1 | grep -c warning`	lower
Lighthouse	Score above 95	`npx lighthouse --output=json | jq .score`	higher
Code quality	Reduce complexity	`npx complexity-report | grep average`	lower

Guard: Preventing Regressions

The guard is an optional safety net - a command that must always pass regardless of what the main metric does.

Verify answers: "Did the metric improve?"
Guard answers: "Did anything else break?"

If the metric improves but the guard fails, the change is reverted. The agent should note WHY the guard failed and adapt future attempts accordingly.

Common guards: npm test, tsc --noEmit, cargo check, pytest, go vet
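
Running the guard comes down to checking an exit code and keeping enough output to explain a failure. In this sketch `run_guard` is a hypothetical helper, and the 20-line tail is an arbitrary choice.

```python
import subprocess


def run_guard(command, tail_lines=20):
    """Run the guard command; return (passed, output_tail).

    passed is None when no guard is configured, otherwise True or False.
    """
    if not command:
        return None, ""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    output = (result.stdout + result.stderr).splitlines()
    # Keep the end of the output so the agent can note WHY the guard failed.
    return result.returncode == 0, "\n".join(output[-tail_lines:])
```

Per the loop, the guard only needs to run when the metric improved, since a change that did not improve is reverted regardless.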

Usage Examples

Inline config (all fields provided)

/iterate
Goal: Increase test coverage from 72% to 90%
Scope: src/**/*.ts, src/**/*.test.ts
Verify: npm test -- --coverage | grep "All files" | awk '{print $10}'
Direction: higher
Guard: tsc --noEmit
Iterations: 30

Minimal (triggers interactive setup)

/iterate
Goal: Make the API faster

Agent scans codebase for tooling, suggests scope/verify/direction, asks once, then goes.

Unbounded overnight run

/iterate
Goal: Reduce bundle size below 150KB
Scope: src/**/*.ts, webpack.config.js
Verify: npm run build 2>&1 | grep "main.js" | awk '{print $2}'
Direction: lower

Agent runs indefinitely. User interrupts in the morning. Results are in results.tsv and git history.
