|
|
@@ -0,0 +1,197 @@
|
|
|
+---
|
|
|
+name: iterate
|
|
|
+description: "Autonomous improvement loop - modify, measure, keep or discard, repeat. Inspired by Karpathy's autoresearch. Triggers on: iterate, improve autonomously, run overnight, keep improving, autoresearch, improvement loop, iterate until done, autonomous iteration."
|
|
|
+allowed-tools: "Read Write Edit Glob Grep Bash Agent"
|
|
|
+---
|
|
|
+
|
|
|
+# Iterate - Autonomous Improvement Loop
|
|
|
+
|
|
|
+Inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch): constrain scope, clarify success with one mechanical metric, loop autonomously. The agent modifies code, measures the result, keeps improvements, discards regressions, and repeats - indefinitely or for N iterations.
|
|
|
+
|
|
|
+The power is in the constraint. One metric. One scope. One loop. Git as memory.
|
|
|
+
|
|
|
+## Setup
|
|
|
+
|
|
|
+Collect five inputs. If the user provides them inline, extract and proceed. If any are missing, ask once using `AskUserQuestion` with all missing fields batched together.
|
|
|
+
|
|
|
+| Field | Required | What it is | Example |
|
|
|
+|-------|----------|------------|---------|
|
|
|
+| **Goal** | Yes | What you're improving, in plain language | "Increase test coverage to 90%" |
|
|
|
+| **Scope** | Yes | File globs the agent may modify | `src/**/*.ts` |
|
|
|
+| **Verify** | Yes | Shell command that outputs the metric (a number) | `npm test -- --coverage \| grep "All files"` |
|
|
|
+| **Direction** | Yes | Is higher or lower better? | `higher` / `lower` |
|
|
|
+| **Guard** | No | Command that must always pass (prevents regressions) | `npm run typecheck` |
|
|
|
+
|
|
|
+**Bounded mode:** If the user includes `Iterations: N`, run exactly N iterations, then stop with a summary. Otherwise, loop until interrupted.
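
As a rough illustration of the "extract and proceed" step, the sketch below parses `Field: value` lines from an inline invocation and reports which required fields are still missing. The function name `parse_config` and the exact line format are assumptions for illustration; the skill itself does not prescribe a parser.

```python
import re

# Field names taken from the table above; "Iterations" enables bounded mode.
FIELDS = ("Goal", "Scope", "Verify", "Direction", "Guard", "Iterations")
REQUIRED = ("Goal", "Scope", "Verify", "Direction")


def parse_config(text: str) -> tuple[dict[str, str], list[str]]:
    """Return (config, missing_required_fields) from 'Field: value' lines."""
    config: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so shell commands keep their colons.
        match = re.match(r"\s*(\w+):\s*(.+?)\s*$", line)
        if match and match.group(1) in FIELDS:
            config[match.group(1)] = match.group(2)
    missing = [name for name in REQUIRED if name not in config]
    return config, missing
```

Anything left in `missing` would be batched into the single follow-up question; a config without `Iterations` would run unbounded.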
|
|
|
+
|
|
|
+### Baseline
|
|
|
+
|
|
|
+Once config is complete:
|
|
|
+
|
|
|
+1. Read all in-scope files for full context
|
|
|
+2. Run the verify command on the current state
|
|
|
+3. Extract the metric value - this is iteration 0 (baseline)
|
|
|
+4. Create `results.tsv` with the header and baseline row
|
|
|
+5. Confirm setup to the user, then begin the loop
|
|
|
+
|
|
|
+```
|
|
|
+Goal: Increase test coverage to 90%
|
|
|
+Scope: src/**/*.ts
|
|
|
+Verify: npm test -- --coverage | grep "All files"
|
|
|
+Direction: higher
|
|
|
+Guard: npm run typecheck
|
|
|
+Baseline: 72.3%
|
|
|
+Mode: unbounded
|
|
|
+
|
|
|
+Starting iteration loop.
|
|
|
+```
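
Step 3 of the baseline ("extract the metric value") can be sketched as follows. This is a minimal sketch that assumes the metric is the last number in the verify command's output; `extract_metric` is a hypothetical helper, and a verify command that prints exactly one number avoids the ambiguity entirely.

```python
import re


def extract_metric(output: str) -> float:
    """Return the last number found in the verify command's output.

    Assumption: the metric is the final number printed. Labels that
    contain digits (such as "p95") are safe only if the value follows them.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", output)
    if not numbers:
        raise ValueError("verify output contains no number")
    return float(numbers[-1])
```

Raising on empty output lets the caller treat "no number" as a crash rather than silently recording a bogus metric.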
|
|
|
+
|
|
|
+## The Loop
|
|
|
+
|
|
|
+```
|
|
|
+LOOP (forever, or N times):
|
|
|
+
|
|
|
+ 1. REVIEW git log --oneline -10 + read results.tsv tail
|
|
|
+ Know what worked, what failed, what's untried.
|
|
|
+
|
|
|
+ 2. IDEATE Pick ONE change. Write a one-sentence description
|
|
|
+ BEFORE touching any code. Consult git history -
|
|
|
+ don't repeat discarded approaches.
|
|
|
+
|
|
|
+ 3. MODIFY Make ONE atomic change to in-scope files only.
|
|
|
+ Small, focused, explainable.
|
|
|
+
|
|
|
+ 4. COMMIT git add <specific files> (never git add -A)
|
|
|
+ git commit -m "experiment: <description>"
|
|
|
+ Commit BEFORE verification. Enables clean rollback.
|
|
|
+
|
|
|
+ 5. VERIFY Run the verify command. Extract the metric.
|
|
|
+ If guard is set and metric improved, run guard too.
|
|
|
+
|
|
|
+ 6. DECIDE
|
|
|
+ Improved + guard passes (or no guard) -> KEEP
|
|
|
+ Improved + guard fails -> REVERT (git revert HEAD --no-edit)
|
|
|
+     Same or worse                          -> REVERT (unless same metric with less code: KEEP)
|
|
|
+ Crashed -> attempt fix (max 3 tries), else REVERT
|
|
|
+
|
|
|
+ 7. LOG Append row to results.tsv
|
|
|
+
|
|
|
+ 8. REPEAT Go to 1. Print a one-line status every 5 iterations.
|
|
|
+ NEVER ask "should I continue?" - just keep going.
|
|
|
+ If bounded and iteration N reached, print summary and stop.
|
|
|
+```
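
The DECIDE step can be expressed as a small pure function. The sketch below is one possible reading of the table in the loop, not a prescribed implementation; the name `decide` and its parameters are illustrative, and it ignores the "simpler wins" tie-break from the Rules, which needs a judgment about code size.

```python
from typing import Optional


def decide(metric: float, best: float, direction: str,
           guard_passed: Optional[bool] = None, crashed: bool = False) -> str:
    """Map one iteration's outcome to a results.tsv status.

    guard_passed is None when no guard is configured.
    """
    if crashed:
        return "crash"
    if direction == "higher":
        improved = metric > best
    else:
        improved = metric < best
    # Keep only strict improvements that did not fail the guard.
    if improved and guard_passed is not False:
        return "keep"
    return "discard"
```

Comparing against the best kept metric so far (rather than the previous iteration) matches the loop, because discarded experiments are reverted before the next iteration starts.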
|
|
|
+
|
|
|
+### Rollback
|
|
|
+
|
|
|
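+Prefer `git revert HEAD --no-edit`: it preserves the experiment in history, so the agent can learn from it. If the revert fails (for example, on a conflict), run `git revert --abort`, then fall back to `git reset --hard HEAD~1`, which removes the experiment commit from history.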
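
A minimal sketch of this rollback policy, assuming it runs from the repository root. The `rollback` name and the injectable `run` parameter are illustrative choices (the latter only makes the function easy to test), and the `git revert --abort` call is an added cleanup step in case a revert was left half-finished.

```python
import subprocess


def rollback(run=subprocess.run) -> str:
    """Undo the latest experiment commit; return which strategy was used."""
    result = run(["git", "revert", "HEAD", "--no-edit"])
    if result.returncode == 0:
        return "reverted"  # experiment stays visible in history
    # Clear any in-progress revert; a failure here is harmless.
    run(["git", "revert", "--abort"])
    # Last resort: drop the experiment commit entirely.
    run(["git", "reset", "--hard", "HEAD~1"], check=True)
    return "reset"
```

When the fallback is taken, the experiment no longer appears in `git log`, so its row in `results.tsv` becomes the only record of the attempt.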
|
|
|
+
|
|
|
+### When Stuck (5+ consecutive discards)
|
|
|
+
|
|
|
+1. Re-read ALL in-scope files from scratch
|
|
|
+2. Re-read the original goal
|
|
|
+3. Review entire results.tsv for patterns
|
|
|
+4. Try combining two previously successful changes
|
|
|
+5. Try the opposite of what hasn't been working
|
|
|
+6. Try something radical - architectural changes, different algorithms
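
Detecting the "stuck" condition only requires the status column of `results.tsv`. The sketch below counts the trailing run of non-kept iterations; treating `crash` rows as discards is an assumption, since the heading mentions discards only, and the helper names are hypothetical.

```python
def consecutive_discards(statuses: list[str]) -> int:
    """Count trailing iterations that were not kept (discard or crash)."""
    count = 0
    for status in reversed(statuses):
        if status not in ("discard", "crash"):
            break
        count += 1
    return count


def is_stuck(statuses: list[str], threshold: int = 5) -> bool:
    """True once the trailing run of failures reaches the threshold."""
    return consecutive_discards(statuses) >= threshold
```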
|
|
|
+
|
|
|
+## Rules
|
|
|
+
|
|
|
+1. **One change per iteration.** Atomic. If it breaks, you know exactly why.
|
|
|
+2. **Mechanical verification only.** No "looks good." The number decides.
|
|
|
+3. **Git is memory.** Commit before verify. Revert on failure. Read `git log` before ideating. Failed experiments stay visible in history via revert commits.
|
|
|
+4. **Simpler wins.** Equal metric + less code = keep. Tiny improvement + ugly complexity = discard. Removing code for equal results is a win.
|
|
|
+5. **Never stop.** Unbounded loops run until interrupted. Never ask permission to continue. The user may be asleep.
|
|
|
+6. **Read before write.** Understand full context before each modification.
|
|
|
+7. **Scope is sacred.** Only modify files matching the scope globs. Never touch verify/guard targets, test fixtures, or config outside scope.
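
Rule 7 can be enforced mechanically before each commit. The sketch below is a simplified matcher that assumes scope is a comma-separated list of globs in which `**/` matches zero or more directories and `*` stays within one path segment; it does not support a trailing `**`, brace expansion, or character classes. `glob_to_regex` and `in_scope` are hypothetical helpers.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simple glob ('**/', '*', '?') into an anchored regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")  # zero or more directories
            i += 3
        elif pattern[i] == "*":
            parts.append(r"[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append(r"[^/]")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def in_scope(path: str, scope: str) -> bool:
    """True if path matches any glob in the comma-separated scope string."""
    globs = [g.strip() for g in scope.split(",") if g.strip()]
    return any(glob_to_regex(g).match(path) for g in globs)
```

Checking every path passed to `git add` against `in_scope` would turn an out-of-scope edit into an immediate failure instead of a silent rule violation.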
|
|
|
+
|
|
|
+## Results Log
|
|
|
+
|
|
|
+Tab-separated file: `results.tsv`
|
|
|
+
|
|
|
+```tsv
|
|
|
+iteration commit metric status description
|
|
|
+0 a1b2c3d 72.3 baseline initial state
|
|
|
+1 b2c3d4e 74.1 keep add edge case tests for auth module
|
|
|
+2 - 73.8 discard refactor test helpers (broke coverage)
|
|
|
+3 c3d4e5f 75.0 keep add missing null checks in user service
|
|
|
+4 - 0.0 crash switched to vitest (import errors)
|
|
|
+```
|
|
|
+
|
|
|
+**Status values:** `baseline`, `keep`, `discard`, `crash`
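
Appending to the log can be sketched with Python's `csv` module. This assumes the five-column layout shown above and writes the header the first time the file is created; `log_result` is a hypothetical helper, and using `-` for a missing commit mirrors the sample rows.

```python
import csv
from pathlib import Path

HEADER = ["iteration", "commit", "metric", "status", "description"]


def log_result(path, iteration, commit, metric, status, description):
    """Append one row to results.tsv, creating it with a header if needed."""
    file = Path(path)
    is_new = not file.exists()
    with file.open("a", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        if is_new:
            writer.writerow(HEADER)
        # "-" marks rows without a surviving commit, as in the sample log.
        writer.writerow([iteration, commit or "-", metric, status, description])
```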
|
|
|
+
|
|
|
+### Progress Output
|
|
|
+
|
|
|
+Every 5 iterations, print a brief status:
|
|
|
+
|
|
|
+```
|
|
|
+Iteration 15: metric 81.2 (baseline 72.3, +8.9) | 6 keeps, 8 discards, 1 crash
|
|
|
+```
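
The status line above can be derived entirely from the logged statuses. A minimal sketch, assuming metrics are printed with one decimal place as in the example; `status_line` is a hypothetical helper.

```python
from collections import Counter


def _count(n: int, singular: str, plural: str) -> str:
    """Format a count with the right singular or plural noun."""
    return f"{n} {singular if n == 1 else plural}"


def status_line(iteration: int, metric: float, baseline: float,
                statuses: list[str]) -> str:
    """Build the one-line progress report printed every 5 iterations."""
    counts = Counter(statuses)
    return (
        f"Iteration {iteration}: metric {metric:.1f} "
        f"(baseline {baseline:.1f}, {metric - baseline:+.1f}) | "
        f"{_count(counts['keep'], 'keep', 'keeps')}, "
        f"{_count(counts['discard'], 'discard', 'discards')}, "
        f"{_count(counts['crash'], 'crash', 'crashes')}"
    )
```

For a lower-is-better metric the signed delta is negative when things improve, which is still the correct reading of the number.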
|
|
|
+
|
|
|
+When a bounded loop completes:
|
|
|
+
|
|
|
+```
|
|
|
+=== Iterate Complete (25/25) ===
|
|
|
+Baseline: 72.3 -> Final: 88.7 (+16.4)
|
|
|
+Keeps: 12 | Discards: 11 | Crashes: 2
|
|
|
+Best iteration: #18 - add integration tests for payment flow (+3.2)
|
|
|
+```
|
|
|
+
|
|
|
+## Adapting to Any Domain
|
|
|
+
|
|
|
+The pattern is universal. Change the five inputs, not the loop.
|
|
|
+
|
|
|
+| Domain | Goal | Verify | Direction |
|
|
|
+|--------|------|--------|-----------|
|
|
|
+| Test coverage | Coverage to 90% | `npm test -- --coverage` | higher |
|
|
|
+| Bundle size | Below 200KB | `npm run build && wc -c < dist/main.js` | lower |
|
|
|
+| Performance | Faster API response | `npm run bench \| grep p95` | lower |
|
|
|
+| ML training | Lower validation loss | `uv run train.py && grep val_bpb run.log` | lower |
|
|
|
+| Lint errors | Zero warnings | `npm run lint 2>&1 \| grep -c warning` | lower |
|
|
|
+| Lighthouse | Score above 95 | `npx lighthouse <url> --output=json --quiet \| jq '.categories.performance.score * 100'` | higher |
|
|
|
+| Code quality | Reduce complexity | `npx complexity-report \| grep average` | lower |
|
|
|
+
|
|
|
+## Guard: Preventing Regressions
|
|
|
+
|
|
|
+The guard is an optional safety net - a command that must always pass regardless of what the main metric does.
|
|
|
+
|
|
|
+- **Verify** answers: "Did the metric improve?"
|
|
|
+- **Guard** answers: "Did anything else break?"
|
|
|
+
|
|
|
+If the metric improves but the guard fails, the change is reverted. The agent should note WHY the guard failed and adapt future attempts accordingly.
|
|
|
+
|
|
|
+Common guards: `npm test`, `tsc --noEmit`, `cargo check`, `pytest`, `go vet`
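
Running a guard amounts to checking an exit code and keeping the output, so the agent can record why it failed. The sketch below assumes a POSIX-style shell and that exit code 0 means "pass"; `run_guard` is a hypothetical helper.

```python
import subprocess


def run_guard(command: str) -> tuple[bool, str]:
    """Run the guard command; return (passed, combined output)."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    # Keep stdout and stderr so a failure reason can be noted in the log.
    output = (result.stdout + result.stderr).strip()
    return result.returncode == 0, output
```

The returned output is what the agent would summarize in the `description` column when a guard failure forces a revert.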
|
|
|
+
|
|
|
+## Usage Examples
|
|
|
+
|
|
|
+### Inline config (all fields provided)
|
|
|
+
|
|
|
+```
|
|
|
+/iterate
|
|
|
+Goal: Increase test coverage from 72% to 90%
|
|
|
+Scope: src/**/*.ts, src/**/*.test.ts
|
|
|
+Verify: npm test -- --coverage | grep "All files" | awk '{print $10}'
|
|
|
+Direction: higher
|
|
|
+Guard: tsc --noEmit
|
|
|
+Iterations: 30
|
|
|
+```
|
|
|
+
|
|
|
+### Minimal (triggers interactive setup)
|
|
|
+
|
|
|
+```
|
|
|
+/iterate
|
|
|
+Goal: Make the API faster
|
|
|
+```
|
|
|
+
|
|
|
+The agent scans the codebase for tooling, suggests scope, verify, and direction, asks once for confirmation, then starts the loop.
|
|
|
+
|
|
|
+### Unbounded overnight run
|
|
|
+
|
|
|
+```
|
|
|
+/iterate
|
|
|
+Goal: Reduce bundle size below 150KB
|
|
|
+Scope: src/**/*.ts, webpack.config.js
|
|
|
+Verify: npm run build 2>&1 | grep "main.js" | awk '{print $2}'
|
|
|
+Direction: lower
|
|
|
+```
|
|
|
+
|
|
|
+The agent runs indefinitely until the user interrupts it, for example the next morning. Results are in `results.tsv` and the git history.
|