|
|
@@ -1,6 +1,6 @@
|
|
|
---
|
|
|
name: iterate
|
|
|
-description: "Autonomous improvement loop - modify, measure, keep or discard, repeat. Inspired by Karpathy's autoresearch. Triggers on: iterate, improve autonomously, run overnight, keep improving, autoresearch, improvement loop, iterate until done, autonomous iteration."
|
|
|
+description: "Autonomous improvement loop - modify, measure, keep or discard, repeat. Inspired by Karpathy's autoresearch. Triggers on: iterate, improve autonomously, run overnight, keep improving, autoresearch, improvement loop, iterate until done, autonomous iteration, batch experiments."
|
|
|
license: MIT
|
|
|
allowed-tools: "Read Write Edit Glob Grep Bash Agent TaskCreate TaskUpdate TaskList"
|
|
|
metadata:
|
|
|
@@ -9,7 +9,7 @@ metadata:
|
|
|
|
|
|
# Iterate - Autonomous Improvement Loop
|
|
|
|
|
|
-Inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch): constrain scope, clarify success with one mechanical metric, loop autonomously. The agent modifies code, measures the result, keeps improvements, discards regressions, and repeats - indefinitely or for N iterations.
|
|
|
+Inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch): constrain scope, clarify success with one mechanical metric, loop autonomously. The agent modifies code, measures the result, keeps improvements, discards regressions, and repeats - until any stop condition fires or the user interrupts.
|
|
|
|
|
|
The power is in the constraint. One metric. One scope. One loop. Git as memory.
|
|
|
|
|
|
@@ -19,7 +19,7 @@ Before the loop starts, do the work that makes the loop effective. Don't skip st
|
|
|
|
|
|
### 1. Collect Config
|
|
|
|
|
|
-Five inputs. If provided inline, extract and proceed. If any are missing, ask once using `AskUserQuestion` with all missing fields batched together.
|
|
|
+If the fields are provided inline, extract them and proceed. If required fields are missing, ask once using `AskUserQuestion` with all missing fields batched together.
|
|
|
|
|
|
| Field | Required | What it is | Example |
|
|
|
|-------|----------|------------|---------|
|
|
|
@@ -28,8 +28,13 @@ Five inputs. If provided inline, extract and proceed. If any are missing, ask on
|
|
|
| **Verify** | Yes | Shell command that outputs the metric (a number) | `npm test -- --coverage \| grep "All files"` |
|
|
|
| **Direction** | Yes | Is higher or lower better? | `higher` / `lower` |
|
|
|
| **Guard** | No | Command that must always pass (prevents regressions) | `npm run typecheck` |
|
|
|
+| **Batch** | No | Changes per iteration. >1 enables bisect-on-regression. Default `1`. | `3` |
|
|
|
+| **Iterations** | No | Hard cap on iteration count. | `30` |
|
|
|
+| **Until** | No | Stop when the metric reaches or passes this target, judged in the configured Direction. | `90` |
|
|
|
+| **Stagnation** | No | Stop after N consecutive iterations with no improvement. | `15` |
|
|
|
+| **Branch** | No | Branch isolation. `current` (default), `auto` (slug from goal), or explicit name. | `auto` |
|
|
|
|
|
|
-**Bounded mode:** If the user includes `Iterations: N`, run exactly N iterations then stop with a summary. Otherwise, loop forever until interrupted.
|
|
|
+**Stop conditions are OR'd**: any combination of `Iterations`, `Until`, `Stagnation` may be set. The loop stops when any one is satisfied. If none are set, the loop is unbounded - it runs until interrupted.
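
The OR semantics can be sketched as a single check run at the end of each iteration. This is a minimal illustration, not part of the skill: the function name `stop_reason`, its parameters, and the `stale` counter (consecutive iterations without improvement) are hypothetical, and it assumes `Until` counts as satisfied when the metric is at or past the target in the configured direction.

```python
from typing import Optional


def stop_reason(iteration: int, metric: float, stale: int, direction: str,
                iterations: Optional[int] = None,
                until: Optional[float] = None,
                stagnation: Optional[int] = None) -> Optional[str]:
    """Return the first stop condition that holds, or None to keep looping.

    Unset conditions (None) are skipped, so an all-None config is unbounded.
    """
    if iterations is not None and iteration >= iterations:
        return "bounded completion"
    if until is not None:
        # Assumption: "crosses" means >= for higher-is-better, <= for lower.
        reached = metric >= until if direction == "higher" else metric <= until
        if reached:
            return "target reached"
    if stagnation is not None and stale >= stagnation:
        return "stagnation cap"
    return None


print(stop_reason(23, 90.4, 0, "higher", iterations=50, until=90, stagnation=15))
# target reached
```

The checks run in the same order as the CHECK STOP step, so when two conditions hold at once the summary reports the first one.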
|
|
|
|
|
|
### 2. Plan
|
|
|
|
|
|
@@ -48,7 +53,25 @@ Check that `allowed-tools` cover what the loop needs. The verify and guard comma
|
|
|
- Dry-run the guard command (if set). Same check.
|
|
|
- If permissions are missing, suggest specific wildcard additions for `.claude/settings.local.json` and ask the user to approve before starting. Reference `/setperms` for a full setup.
|
|
|
|
|
|
-### 4. Tasks
|
|
|
+### 4. Branch Setup
|
|
|
+
|
|
|
+The `Branch` field controls where iteration commits land.
|
|
|
+
|
|
|
+| Value | Behavior |
|
|
|
+|-------|----------|
|
|
|
+| `current` (default) | Stay on the current branch. Commits land directly. |
|
|
|
+| `auto` | Create `iterate/<slug-from-goal>` from current HEAD and switch to it. |
|
|
|
+| `<explicit-name>` | Create branch with that exact name and switch to it. |
|
|
|
+
|
|
|
+**Slug derivation** (for `auto`): lowercase the Goal, replace non-alphanumeric runs with `-`, trim leading/trailing dashes, truncate to 40 chars. "Increase test coverage to 90%" → `iterate/increase-test-coverage-to-90`.
|
|
|
+
|
|
|
+**Collision**: if the branch already exists, suffix `-2`, `-3`, etc.
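
As a rough sketch of the slug and collision rules above (the helper names `slug_from_goal` and `branch_name` are hypothetical, and the set of existing branches is passed in rather than read from git):

```python
import re


def slug_from_goal(goal: str, max_len: int = 40) -> str:
    """Lowercase, collapse non-alphanumeric runs to '-', trim dashes, truncate."""
    slug = re.sub(r"[^a-z0-9]+", "-", goal.lower()).strip("-")
    # Assumption beyond the stated rule: truncation can leave a trailing dash,
    # so it is stripped again here.
    return slug[:max_len].rstrip("-")


def branch_name(goal: str, existing: set) -> str:
    """Return iterate/<slug>, appending -2, -3, ... while the name is taken."""
    base = "iterate/" + slug_from_goal(goal)
    if base not in existing:
        return base
    n = 2
    while f"{base}-{n}" in existing:
        n += 1
    return f"{base}-{n}"


print(branch_name("Increase test coverage to 90%", set()))
# iterate/increase-test-coverage-to-90
```

In a real run the `existing` set would presumably come from something like `git branch --list 'iterate/*'`; that wiring is left out here.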
|
|
|
+
|
|
|
+**Confirm before switching**: print the chosen branch name and source branch. Do not silently create a branch the user didn't ask for.
|
|
|
+
|
|
|
+**Cleanup**: never auto-delete the branch. The user decides whether to merge, open a PR, or `git branch -D` it. The skill's job ends at "branch exists with results."
|
|
|
+
|
|
|
+### 5. Tasks
|
|
|
|
|
|
Create a TaskList to track progress across iterations. This provides structure the user can check without reading the full results log.
|
|
|
|
|
|
@@ -60,7 +83,7 @@ TaskCreate: "Final summary and cleanup" (status: pending)
|
|
|
|
|
|
Update task status as the loop progresses. Mark the iteration task as `in_progress` when the loop starts, `completed` when it ends.
|
|
|
|
|
|
-### 5. Tests and Verification
|
|
|
+### 6. Tests and Verification
|
|
|
|
|
|
Before the first iteration, make sure verification actually works:
|
|
|
|
|
|
@@ -68,25 +91,27 @@ Before the first iteration, make sure verification actually works:
|
|
|
- Run the guard command (if set). If it fails on the current state, the codebase has pre-existing issues - flag to the user.
|
|
|
- If tests don't exist yet for the scope, consider writing them as iteration 0. Good tests make the loop more effective.
|
|
|
|
|
|
-### 6. Baseline
|
|
|
+### 7. Baseline
|
|
|
|
|
|
Record the starting point:
|
|
|
|
|
|
1. Run verify command, extract the metric - this is iteration 0
|
|
|
2. Create `results.tsv` with the header and baseline row
|
|
|
-3. Update the baseline task to `completed`
|
|
|
-4. Confirm setup to the user, then begin the loop
|
|
|
+3. Tag the baseline: `git tag -f iterate/best` (`-f` because a tag left over from an earlier run is never auto-deleted; the tag floats forward as the metric improves)
|
|
|
+4. Update the baseline task to `completed`
|
|
|
+5. Confirm setup to the user, then begin the loop
|
|
|
|
|
|
```
|
|
|
-Goal: Increase test coverage to 90%
|
|
|
-Scope: src/**/*.ts
|
|
|
-Verify: npm test -- --coverage | grep "All files"
|
|
|
-Direction: higher
|
|
|
-Guard: npm run typecheck
|
|
|
-Baseline: 72.3%
|
|
|
-Mode: unbounded
|
|
|
-Tasks: 3 created
|
|
|
-Permissions: verified (all commands pre-approved)
|
|
|
+Goal: Increase test coverage to 90%
|
|
|
+Scope: src/**/*.ts
|
|
|
+Verify: npm test -- --coverage | grep "All files"
|
|
|
+Direction: higher
|
|
|
+Guard: npm run typecheck
|
|
|
+Branch: iterate/increase-test-coverage-to-90 (created from main)
|
|
|
+Batch: 3
|
|
|
+Stop: Iterations 50 OR Until ≥ 90.0 OR Stagnation 15
|
|
|
+Baseline: 72.3
|
|
|
+Permissions: verified
|
|
|
|
|
|
Starting iteration loop.
|
|
|
```
|
|
|
@@ -94,43 +119,93 @@ Starting iteration loop.
|
|
|
## The Loop
|
|
|
|
|
|
```
|
|
|
-LOOP (forever, or N times):
|
|
|
+LOOP (until any stop condition met):
|
|
|
|
|
|
1. REVIEW git log --oneline -10 + read results.tsv tail
|
|
|
Know what worked, what failed, what's untried.
|
|
|
|
|
|
- 2. IDEATE Pick ONE change. Write a one-sentence description
|
|
|
- BEFORE touching any code. Consult git history -
|
|
|
+ 2. IDEATE Pick UP TO `Batch` independent changes. Each must stand on
|
|
|
+ its own and be applicable independently. Write a one-sentence
|
|
|
+ description per change BEFORE touching code. Consult git history -
|
|
|
don't repeat discarded approaches.
|
|
|
|
|
|
- 3. MODIFY Make ONE atomic change to in-scope files only.
|
|
|
- Small, focused, explainable.
|
|
|
+ 3. MODIFY+COMMIT
|
|
|
+ For each change in the batch (in order):
|
|
|
+ a. Apply the change to in-scope files only.
|
|
|
+ b. git add <specific files> (never git add -A)
|
|
|
+ c. git commit -m "experiment: <one-line description>"
|
|
|
+ Each change is its own commit. Non-negotiable - bisection
|
|
|
+ depends on it.
|
|
|
+
|
|
|
+ 4. VERIFY Run the verify command after the final commit of the batch.
|
|
|
+ Extract the metric. If guard is set, run it too.
|
|
|
+
|
|
|
+ 5. DECIDE
|
|
|
+ Improved + guard ok (or no guard)
|
|
|
+ -> KEEP entire batch
|
|
|
+ Regressed / unchanged / guard failed:
|
|
|
+ if Batch == 1 -> REVERT the one commit
|
|
|
+ if Batch > 1 -> BISECT (see below)
|
|
|
+     Crashed (verify exits non-zero or prints no metric; a failing guard is handled above)
|
|
|
+ -> attempt fix (max 3 tries), else REVERT entire batch
|
|
|
+
|
|
|
+ 6. LOG Append one row per change to results.tsv.
|
|
|
+
|
|
|
+  7. SNAPSHOT   If a kept result beats the previous best, force-update the tag:
|
|
|
+ git tag -f iterate/best
|
|
|
+
|
|
|
+ 8. CHECK STOP
|
|
|
+ Iterations cap reached? -> stop, summarize, exit.
|
|
|
+ Until target crossed? -> stop, summarize, exit.
|
|
|
+ Stagnation N reached? -> stop, summarize, exit.
|
|
|
+ Interrupted / fatal error? -> stop, summarize, exit.
|
|
|
+
|
|
|
+ 9. REPEAT Go to 1. Print a one-line status every 5 iterations.
|
|
|
+ NEVER ask "should I continue?" - just keep going.
|
|
|
+```
|
|
|
|
|
|
- 4. COMMIT git add <specific files> (never git add -A)
|
|
|
- git commit -m "experiment: <description>"
|
|
|
- Commit BEFORE verification. Enables clean rollback.
|
|
|
+### Bisection (Batch > 1, regression detected)
|
|
|
|
|
|
- 5. VERIFY Run the verify command. Extract the metric.
|
|
|
- If guard is set and metric improved, run guard too.
|
|
|
+When a batched verify fails, the loop must identify which commit(s) caused the regression - keeping the good ones, dropping the bad ones.
|
|
|
|
|
|
- 6. DECIDE
|
|
|
- Improved + guard passes (or no guard) -> KEEP
|
|
|
- Improved + guard fails -> REVERT (git revert HEAD --no-edit)
|
|
|
- Same or worse -> REVERT
|
|
|
- Crashed -> attempt fix (max 3 tries), else REVERT
|
|
|
+```
|
|
|
+1. Note C0 = the iteration's start commit (before the batch)
|
|
|
+ Note C1..CN = the batch commits in order
|
|
|
|
|
|
- 7. LOG Append row to results.tsv
|
|
|
+2. git reset --hard C0
|
|
|
|
|
|
- 8. REPEAT Go to 1. Print a one-line status every 5 iterations.
|
|
|
- NEVER ask "should I continue?" - just keep going.
|
|
|
- If bounded and iteration N reached, print summary and stop.
|
|
|
+3. For each Ci in order:
|
|
|
+ a. git cherry-pick Ci
|
|
|
+ b. Run verify
|
|
|
+ c. Improved or held + guard ok -> keep (commit stays in history)
|
|
|
+ Regressed or guard failed -> git reset --hard HEAD~1 (drop)
|
|
|
+
|
|
|
+4. Log each change's outcome to results.tsv:
|
|
|
+ status=bisect-keep -> commit kept
|
|
|
+ status=bisect-drop -> commit dropped
|
|
|
```
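
The replay in steps 2-4 can be modeled without touching git. The sketch below is only an illustration under stated assumptions: `replay_batch`, `verify`, and `guard` are hypothetical names, a cherry-pick is stood in for by appending to a list, commits are assumed to apply cleanly (no cherry-pick conflicts), and each commit is compared against the metric of the last kept state.

```python
from typing import Callable, List, Sequence, Tuple


def replay_batch(
    batch: Sequence[str],
    start_metric: float,
    direction: str,
    verify: Callable[[List[str]], float],
    guard: Callable[[List[str]], bool] = lambda kept: True,
) -> Tuple[List[str], List[Tuple[str, float, str]]]:
    """Re-apply batch commits one by one on top of C0, keeping those that
    improve or hold the metric and pass the guard."""
    kept: List[str] = []   # commits that survive the replay
    rows = []              # (commit, measured metric, status) for results.tsv
    best = start_metric    # metric of the last kept state
    for commit in batch:
        candidate = kept + [commit]        # stands in for: git cherry-pick Ci
        metric = verify(candidate)         # run verify on the candidate state
        held = metric >= best if direction == "higher" else metric <= best
        if held and guard(candidate):
            kept, best = candidate, metric
            rows.append((commit, metric, "bisect-keep"))
        else:                              # stands in for: git reset --hard HEAD~1
            rows.append((commit, metric, "bisect-drop"))
    return kept, rows


# Toy verify: each commit shifts the metric by a fixed, made-up amount.
deltas = {"c1": 0.5, "c2": -0.4, "c3": 0.4}
verify = lambda commits: round(74.1 + sum(deltas[c] for c in commits), 1)

kept, rows = replay_batch(["c1", "c2", "c3"], 74.1, "higher", verify)
print(kept)
# ['c1', 'c3']
```

Note that the loop calls `verify` once per commit, which is where the N extra verify runs come from.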
|
|
|
|
|
|
+**Cost**: worst case is N additional verify runs (1 batch + N individual). For `Batch: 3`, max 4 verifies. Worth it when batches are mostly good - i.e., when you're confident in the domain.
|
|
|
+
|
|
|
+**Rule of thumb**: use `Batch: 1` for exploratory work, `Batch: 3-5` for mechanical fixes (lint, obvious test gaps, dead code), `Batch: 5+` only when you've watched the loop succeed at smaller batches first.
|
|
|
+
|
|
|
### Rollback
|
|
|
|
|
|
-Always use `git revert HEAD --no-edit` (preserves the experiment in history - the agent can learn from it). If revert conflicts, fall back to `git reset --hard HEAD~1`.
|
|
|
+For single-commit reverts: `git revert HEAD --no-edit` (preserves the experiment in history). If revert conflicts, fall back to `git reset --hard HEAD~1`.
|
|
|
+
|
|
|
+For batch reverts (full crash): `git reset --hard <iteration-start-commit>` - drops all batch commits cleanly.
|
|
|
|
|
|
-### When Stuck (5+ consecutive discards)
|
|
|
+### Best Snapshot
|
|
|
+
|
|
|
+`iterate/best` is a force-updated tag pointing to the best-metric commit so far (highest or lowest, depending on `Direction`). It floats forward whenever a new best is reached.
|
|
|
+
|
|
|
+- Recovery from any later regression: `git checkout iterate/best`
|
|
|
+- Inspect what's pinned: `git log iterate/best -1`
|
|
|
+- The skill never deletes the tag - clear manually with `git tag -d iterate/best`
|
|
|
+
|
|
|
+The tag is updated in the SNAPSHOT step, which runs *after* LOG, so it always reflects the best state visible in `results.tsv`.
|
|
|
+
|
|
|
+### When Stuck (5+ consecutive discards or stagnation watermark)
|
|
|
|
|
|
1. Re-read ALL in-scope files from scratch
|
|
|
2. Re-read the original goal
|
|
|
@@ -139,15 +214,18 @@ Always use `git revert HEAD --no-edit` (preserves the experiment in history - th
|
|
|
5. Try the opposite of what hasn't been working
|
|
|
6. Try something radical - architectural changes, different algorithms
|
|
|
|
|
|
+If `Stagnation: N` is set and reached, the loop stops on its own. The "when stuck" protocol fires after 5 consecutive discards, so it acts as a course correction before the formal stop only when N is greater than 5.
|
|
|
+
|
|
|
## Rules
|
|
|
|
|
|
-1. **One change per iteration.** Atomic. If it breaks, you know exactly why.
|
|
|
+1. **One change per commit.** With `Batch: 1`, one change per iteration. With `Batch: N`, N commits per iteration - each independently bisectable. The atomicity invariant lives at the commit, not the iteration.
|
|
|
2. **Mechanical verification only.** No "looks good." The number decides.
|
|
|
3. **Git is memory.** Commit before verify. Revert on failure. Read `git log` before ideating. Failed experiments stay visible in history via revert commits.
|
|
|
4. **Simpler wins.** Equal metric + less code = keep. Tiny improvement + ugly complexity = discard. Removing code for equal results is a win.
|
|
|
-5. **Never stop.** Unbounded loops run until interrupted. Never ask permission to continue. The user may be asleep.
|
|
|
-6. **Read before write.** Understand full context before each modification.
|
|
|
-7. **Scope is sacred.** Only modify files matching the scope globs. Never touch verify/guard targets, test fixtures, or config outside scope.
|
|
|
+5. **Never stop early.** The loop runs until a stop condition fires or the user interrupts; with no stop condition set, it runs until interrupted. Never ask permission to continue.
|
|
|
+6. **Always summarize on exit.** When the loop ends for any reason - bounded completion, target reached, stagnation, interrupt, or fatal error - emit the final summary block before yielding control. The user might be asleep; they'll read it in the morning.
|
|
|
+7. **Read before write.** Understand full context before each modification.
|
|
|
+8. **Scope is sacred.** Only modify files matching the scope globs. Never touch verify/guard targets, test fixtures, or config outside scope.
|
|
|
|
|
|
## Results Log
|
|
|
|
|
|
@@ -158,32 +236,47 @@ iteration commit metric status description
|
|
|
0 a1b2c3d 72.3 baseline initial state
|
|
|
1 b2c3d4e 74.1 keep add edge case tests for auth module
|
|
|
2 - 73.8 discard refactor test helpers (broke coverage)
|
|
|
-3 c3d4e5f 75.0 keep add missing null checks in user service
|
|
|
+3.1 c3d4e5f 74.6 bisect-keep add null check in user service
|
|
|
+3.2 - 74.6 bisect-drop rename helper module (guard failed)
|
|
|
+3.3 d4e5f6a 75.0 bisect-keep add tests for token expiry
|
|
|
4 - 0.0 crash switched to vitest (import errors)
|
|
|
```
|
|
|
|
|
|
-**Status values:** `baseline`, `keep`, `discard`, `crash`
|
|
|
+**Status values**: `baseline`, `keep`, `discard`, `bisect-keep`, `bisect-drop`, `crash`
|
|
|
+
|
|
|
+**Iteration column**: integer for atomic iterations (`Batch: 1`), `<iter>.<change>` decimal for batched iterations (one row per change in the batch).
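
One way to read the log back, for example to build the progress line or the final summary, is sketched below. It is an assumption-laden illustration: `summarize` is a hypothetical helper, the file is assumed to be tab-separated with the header shown above, and the sample rows are abbreviated from the example log. The iteration column is kept as a string because `3.1` is an `<iter>.<change>` label, not a decimal number.

```python
import csv
import io
from collections import Counter

SAMPLE = (
    "iteration\tcommit\tmetric\tstatus\tdescription\n"
    "0\ta1b2c3d\t72.3\tbaseline\tinitial state\n"
    "1\tb2c3d4e\t74.1\tkeep\tadd edge case tests for auth module\n"
    "2\t-\t73.8\tdiscard\trefactor test helpers\n"
    "3.1\tc3d4e5f\t74.6\tbisect-keep\tadd null check in user service\n"
    "3.2\t-\t74.6\tbisect-drop\trename helper module\n"
    "3.3\td4e5f6a\t75.0\tbisect-keep\tadd tests for token expiry\n"
    "4\t-\t0.0\tcrash\tswitched to vitest\n"
)

KEPT = {"baseline", "keep", "bisect-keep"}


def summarize(tsv_text: str, direction: str = "higher"):
    """Count rows per status and find the best row among kept changes."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    counts = Counter(row["status"] for row in rows)
    kept = [row for row in rows if row["status"] in KEPT]
    pick = max if direction == "higher" else min
    best = pick(kept, key=lambda row: float(row["metric"]))
    return counts, best


counts, best = summarize(SAMPLE)
print(best["iteration"], best["metric"])
# 3.3 75.0
```

Only kept rows are considered for the best value, so a discarded or dropped change that briefly measured higher is ignored.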
|
|
|
|
|
|
### Progress Output
|
|
|
|
|
|
Every 5 iterations, print a brief status:
|
|
|
|
|
|
```
|
|
|
-Iteration 15: metric 81.2 (baseline 72.3, +8.9) | 6 keeps, 8 discards, 1 crash
|
|
|
+Iter 15: metric 81.2 (baseline 72.3, +8.9, best 81.2) | 6 keeps, 8 discards, 1 crash | stagnation 0/15
|
|
|
```
|
|
|
|
|
|
-When a bounded loop completes:
|
|
|
+### Final Summary (always emitted on exit)
|
|
|
+
|
|
|
+Whatever causes the stop - bounded completion, `Until` target, `Stagnation` cap, interrupt, fatal - print this block before yielding control:
|
|
|
|
|
|
```
|
|
|
-=== Iterate Complete (25/25) ===
|
|
|
-Baseline: 72.3 -> Final: 88.7 (+16.4)
|
|
|
-Keeps: 12 | Discards: 11 | Crashes: 2
|
|
|
-Best iteration: #18 - add integration tests for payment flow (+3.2)
|
|
|
+=== Iterate Complete ===
|
|
|
+Stopped: target reached (Until ≥ 90.0)
|
|
|
+Iterations: 23
|
|
|
+Baseline: 72.3 -> Final 90.4 (+18.1)
|
|
|
+Best: 90.4 @ iter 23 ("add integration tests for payment flow")
|
|
|
+Keeps: 14 | Discards: 8 | Crashes: 1
|
|
|
+Branch: iterate/increase-test-coverage-to-90
|
|
|
+Tag: iterate/best -> 7f8a9b2
|
|
|
+
|
|
|
+Next: review the branch, then merge / PR / cherry-pick / discard at your discretion.
|
|
|
+Recovery from regression: git checkout iterate/best
|
|
|
```
|
|
|
|
|
|
+The "Stopped" line names the trigger. Common values: `bounded completion`, `target reached`, `stagnation cap`, `user interrupt`, `fatal error`.
|
|
|
+
|
|
|
## Adapting to Any Domain
|
|
|
|
|
|
-The pattern is universal. Change the five inputs, not the loop.
|
|
|
+The pattern is universal. Change the inputs, not the loop.
|
|
|
|
|
|
| Domain | Goal | Verify | Direction |
|
|
|
|--------|------|--------|-----------|
|
|
|
@@ -202,13 +295,13 @@ The guard is an optional safety net - a command that must always pass regardless
|
|
|
- **Verify** answers: "Did the metric improve?"
|
|
|
- **Guard** answers: "Did anything else break?"
|
|
|
|
|
|
-If the metric improves but the guard fails, the change is reverted. The agent should note WHY the guard failed and adapt future attempts accordingly.
|
|
|
+If the metric improves but the guard fails, the change is reverted (or bisected, in batch mode). The agent should note WHY the guard failed and adapt future attempts accordingly.
|
|
|
|
|
|
Common guards: `npm test`, `tsc --noEmit`, `cargo check`, `pytest`, `go vet`
|
|
|
|
|
|
## Usage Examples
|
|
|
|
|
|
-### Inline config (all fields provided)
|
|
|
+### Inline config - full overnight run with target
|
|
|
|
|
|
```
|
|
|
/iterate
|
|
|
@@ -217,19 +310,39 @@ Scope: src/**/*.ts, src/**/*.test.ts
|
|
|
Verify: npm test -- --coverage | grep "All files" | awk '{print $10}'
|
|
|
Direction: higher
|
|
|
Guard: tsc --noEmit
|
|
|
-Iterations: 30
|
|
|
+Until: 90
|
|
|
+Stagnation: 15
|
|
|
+Batch: 3
|
|
|
+Branch: auto
|
|
|
```
|
|
|
|
|
|
-### Minimal (triggers interactive setup)
|
|
|
+Runs until coverage hits 90% OR 15 consecutive iterations show no improvement, whichever comes first. Each iteration applies 3 changes, which are bisected on regression. Commits land on a dedicated branch.
|
|
|
+
|
|
|
+### Bounded throughput run - mechanical fixes
|
|
|
+
|
|
|
+```
|
|
|
+/iterate
|
|
|
+Goal: Reduce lint warnings to zero
|
|
|
+Scope: src/**/*.ts
|
|
|
+Verify: npm run lint 2>&1 | grep -c warning || true
|
|
|
+Direction: lower
|
|
|
+Until: 0
|
|
|
+Iterations: 50
|
|
|
+Batch: 5
|
|
|
+```
|
|
|
+
|
|
|
+High-confidence mechanical fixes - batch aggressively. Stops at zero warnings or 50 iterations. Stays on current branch.
|
|
|
+
|
|
|
+### Minimal - interactive setup
|
|
|
|
|
|
```
|
|
|
/iterate
|
|
|
Goal: Make the API faster
|
|
|
```
|
|
|
|
|
|
-Agent scans codebase for tooling, suggests scope/verify/direction, asks once, then goes.
|
|
|
+Agent scans codebase, suggests scope/verify/direction, asks once, then goes. Defaults: `Batch: 1`, `Branch: current`, unbounded.
|
|
|
|
|
|
-### Unbounded overnight run
|
|
|
+### Unbounded overnight, atomic, isolated branch
|
|
|
|
|
|
```
|
|
|
/iterate
|
|
|
@@ -237,6 +350,7 @@ Goal: Reduce bundle size below 150KB
|
|
|
Scope: src/**/*.ts, webpack.config.js
|
|
|
Verify: npm run build 2>&1 | grep "main.js" | awk '{print $2}'
|
|
|
Direction: lower
|
|
|
+Branch: auto
|
|
|
```
|
|
|
|
|
|
-Agent runs indefinitely. User interrupts in the morning. Results are in `results.tsv` and git history.
|
|
|
+Runs indefinitely on `iterate/reduce-bundle-size-below-150kb`. User interrupts in the morning, reads the summary, decides whether to merge.
|