feat(skills): Add /iterate autonomous improvement loop (v2.1.0)

Inspired by Karpathy's autoresearch - constrain scope, clarify success
with one mechanical metric, loop autonomously. The agent modifies code,
measures the result, keeps improvements, discards regressions, repeats.

- Add iterate skill (197 lines) with score.sh self-test
- Update install scripts with -patterns to -ops cleanup
- Bump plugin version to 2.1.0 (65 skills)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
0xDarkMatter 3 weeks ago
parent
commit
fb7acf4b73
7 changed files with 398 additions and 27 deletions
  1. .claude-plugin/plugin.json (+2 -1)
  2. AGENTS.md (+1 -1)
  3. README.md (+18 -25)
  4. scripts/install.ps1 (+27 -0)
  5. scripts/install.sh (+26 -0)
  6. skills/iterate/SKILL.md (+197 -0)
  7. skills/iterate/scripts/score.sh (+127 -0)

+ 2 - 1
.claude-plugin/plugin.json

@@ -1,6 +1,6 @@
 {
   "name": "claude-mods",
-  "version": "2.0.0",
+  "version": "2.1.0",
   "description": "Custom commands, skills, and agents for Claude Code - session continuity, 22 expert agents, 65 skills, 3 commands, 5 rules, 3 hooks, 4 output styles, modern CLI tools",
   "author": "0xDarkMatter",
   "repository": "https://github.com/0xDarkMatter/claude-mods",
@@ -69,6 +69,7 @@
       "skills/git-workflow",
       "skills/go-ops",
       "skills/introspect",
+      "skills/iterate",
       "skills/javascript-ops",
       "skills/laravel-ops",
       "skills/log-ops",

+ 1 - 1
AGENTS.md

@@ -5,7 +5,7 @@
 This is **claude-mods** - a collection of custom extensions for Claude Code:
 - **22 expert agents** for specialized domains (React, Python, Go, Rust, AWS, etc.)
 - **3 commands** for session management (/sync, /save) and experimental features (/canvas)
-- **64 skills** for CLI tools, patterns, workflows, and development tasks
+- **65 skills** for CLI tools, patterns, workflows, and development tasks
 - **4 output styles** for response personality (Vesper, Spartan, Mentor, Executive)
 - **3 hooks** for pre-commit linting, post-edit formatting, and dangerous command warnings
 

+ 18 - 25
README.md

File diff suppressed because it is too large

+ 27 - 0
scripts/install.ps1

@@ -43,6 +43,33 @@ $deprecated = @(
     "$claudeDir\skills\conclave"
 )
 
+# Renamed skills: -patterns -> -ops (March 2026)
+$renamedSkills = @(
+    "cli-patterns",
+    "mcp-patterns",
+    "python-async-patterns",
+    "python-cli-patterns",
+    "python-database-patterns",
+    "python-fastapi-patterns",
+    "python-observability-patterns",
+    "python-pytest-patterns",
+    "python-typing-patterns",
+    "rest-patterns",
+    "security-patterns",
+    "sql-patterns",
+    "tailwind-patterns",
+    "testing-patterns"
+)
+
+foreach ($oldSkill in $renamedSkills) {
+    $oldPath = "$claudeDir\skills\$oldSkill"
+    if (Test-Path $oldPath) {
+        Remove-Item -Path $oldPath -Recurse -Force
+        $newName = $oldSkill -replace '-patterns$', '-ops'
+        Write-Host "  Removed renamed: $oldSkill (now $newName)" -ForegroundColor Red
+    }
+}
+
 Write-Host "Cleaning up deprecated items..." -ForegroundColor Yellow
 foreach ($item in $deprecated) {
     if (Test-Path $item) {

+ 26 - 0
scripts/install.sh

@@ -44,6 +44,32 @@ deprecated_items=(
     "$CLAUDE_DIR/skills/conclave"         # Deprecated
 )
 
+# Renamed skills: -patterns -> -ops (March 2026)
+renamed_skills=(
+    cli-patterns
+    mcp-patterns
+    python-async-patterns
+    python-cli-patterns
+    python-database-patterns
+    python-fastapi-patterns
+    python-observability-patterns
+    python-pytest-patterns
+    python-typing-patterns
+    rest-patterns
+    security-patterns
+    sql-patterns
+    tailwind-patterns
+    testing-patterns
+)
+
+for old_skill in "${renamed_skills[@]}"; do
+    old_path="$CLAUDE_DIR/skills/$old_skill"
+    if [ -d "$old_path" ]; then
+        rm -rf "$old_path"
+        echo -e "  ${RED}Removed renamed: $old_skill (now ${old_skill%-patterns}-ops)${NC}"
+    fi
+done
+
 for item in "${deprecated_items[@]}"; do
     if [ -e "$item" ]; then
         rm -rf "$item"
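
Review note: the cleanup loop above deletes directories immediately. A dry-run variant can make the `-patterns` to `-ops` mapping visible before anything is removed. The sketch below is an assumption for illustration only: `preview_renamed` is a hypothetical helper that is not part of install.sh, and it takes the skills directory as its first argument instead of reading `$CLAUDE_DIR`.

```shell
#!/usr/bin/env bash
# Dry-run sketch: report renamed skill directories without deleting them.
# preview_renamed is a hypothetical helper, not part of install.sh.
# Usage: preview_renamed <skills_dir> <old_skill>...
preview_renamed() {
  local skills_dir="$1"
  shift
  local old_skill
  for old_skill in "$@"; do
    if [ -d "$skills_dir/$old_skill" ]; then
      # ${old_skill%-patterns} strips the trailing "-patterns" suffix.
      echo "would remove: $old_skill (now ${old_skill%-patterns}-ops)"
    fi
  done
}
```

Called as `preview_renamed "$CLAUDE_DIR/skills" "${renamed_skills[@]}"`, it would list only the old directories that actually exist.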

+ 197 - 0
skills/iterate/SKILL.md

@@ -0,0 +1,197 @@
+---
+name: iterate
+description: "Autonomous improvement loop - modify, measure, keep or discard, repeat. Inspired by Karpathy's autoresearch. Triggers on: iterate, improve autonomously, run overnight, keep improving, autoresearch, improvement loop, iterate until done, autonomous iteration."
+allowed-tools: "Read Write Edit Glob Grep Bash Agent"
+---
+
+# Iterate - Autonomous Improvement Loop
+
+Inspired by [Karpathy's autoresearch](https://github.com/karpathy/autoresearch): constrain scope, clarify success with one mechanical metric, loop autonomously. The agent modifies code, measures the result, keeps improvements, discards regressions, and repeats - indefinitely or for N iterations.
+
+The power is in the constraint. One metric. One scope. One loop. Git as memory.
+
+## Setup
+
+Collect five inputs. If the user provides them inline, extract and proceed. If any are missing, ask once using `AskUserQuestion` with all missing fields batched together.
+
+| Field | Required | What it is | Example |
+|-------|----------|------------|---------|
+| **Goal** | Yes | What you're improving, in plain language | "Increase test coverage to 90%" |
+| **Scope** | Yes | File globs the agent may modify | `src/**/*.ts` |
+| **Verify** | Yes | Shell command that outputs the metric (a number) | `npm test -- --coverage \| grep "All files"` |
+| **Direction** | Yes | Is higher or lower better? | `higher` / `lower` |
+| **Guard** | No | Command that must always pass (prevents regressions) | `npm run typecheck` |
+
+**Bounded mode:** If the user includes `Iterations: N`, run exactly N iterations then stop with a summary. Otherwise, loop forever until interrupted.
+
+### Baseline
+
+Once config is complete:
+
+1. Read all in-scope files for full context
+2. Run the verify command on the current state
+3. Extract the metric value - this is iteration 0 (baseline)
+4. Create `results.tsv` with the header and baseline row
+5. Confirm setup to the user, then begin the loop
+
+```
+Goal:      Increase test coverage to 90%
+Scope:     src/**/*.ts
+Verify:    npm test -- --coverage | grep "All files"
+Direction: higher
+Guard:     npm run typecheck
+Baseline:  72.3%
+Mode:      unbounded
+
+Starting iteration loop.
+```
+
+## The Loop
+
+```
+LOOP (forever, or N times):
+
+  1. REVIEW    git log --oneline -10 + read results.tsv tail
+              Know what worked, what failed, what's untried.
+
+  2. IDEATE    Pick ONE change. Write a one-sentence description
+              BEFORE touching any code. Consult git history -
+              don't repeat discarded approaches.
+
+  3. MODIFY    Make ONE atomic change to in-scope files only.
+              Small, focused, explainable.
+
+  4. COMMIT    git add <specific files> (never git add -A)
+              git commit -m "experiment: <description>"
+              Commit BEFORE verification. Enables clean rollback.
+
+  5. VERIFY    Run the verify command. Extract the metric.
+              If guard is set and metric improved, run guard too.
+
+  6. DECIDE
+              Improved + guard passes (or no guard) -> KEEP
+              Improved + guard fails -> REVERT (git revert HEAD --no-edit)
+              Same or worse -> REVERT (exception: same metric with less code -> KEEP, see Rules)
+              Crashed -> attempt fix (max 3 tries), else REVERT
+
+  7. LOG       Append row to results.tsv
+
+  8. REPEAT    Go to 1. Print a one-line status every 5 iterations.
+              NEVER ask "should I continue?" - just keep going.
+              If bounded and iteration N reached, print summary and stop.
+```
+
+### Rollback
+
+Always use `git revert HEAD --no-edit` (preserves the experiment in history - the agent can learn from it). If revert conflicts, fall back to `git reset --hard HEAD~1`.
+
+### When Stuck (5+ consecutive discards)
+
+1. Re-read ALL in-scope files from scratch
+2. Re-read the original goal
+3. Review entire results.tsv for patterns
+4. Try combining two previously successful changes
+5. Try the opposite of what hasn't been working
+6. Try something radical - architectural changes, different algorithms
+
+## Rules
+
+1. **One change per iteration.** Atomic. If it breaks, you know exactly why.
+2. **Mechanical verification only.** No "looks good." The number decides.
+3. **Git is memory.** Commit before verify. Revert on failure. Read `git log` before ideating. Failed experiments stay visible in history via revert commits.
+4. **Simpler wins.** Equal metric + less code = keep. Tiny improvement + ugly complexity = discard. Removing code for equal results is a win.
+5. **Never stop.** Unbounded loops run until interrupted. Never ask permission to continue. The user may be asleep.
+6. **Read before write.** Understand full context before each modification.
+7. **Scope is sacred.** Only modify files matching the scope globs. Never touch verify/guard targets, test fixtures, or config outside scope.
+
+## Results Log
+
+Tab-separated file: `results.tsv`
+
+```tsv
+iteration	commit	metric	status	description
+0	a1b2c3d	72.3	baseline	initial state
+1	b2c3d4e	74.1	keep	add edge case tests for auth module
+2	-	73.8	discard	refactor test helpers (broke coverage)
+3	c3d4e5f	75.0	keep	add missing null checks in user service
+4	-	0.0	crash	switched to vitest (import errors)
+```
+
+**Status values:** `baseline`, `keep`, `discard`, `crash`
+
+### Progress Output
+
+Every 5 iterations, print a brief status:
+
+```
+Iteration 15: metric 81.2 (baseline 72.3, +8.9) | 6 keeps, 8 discards, 1 crash
+```
+
+When a bounded loop completes:
+
+```
+=== Iterate Complete (25/25) ===
+Baseline: 72.3 -> Final: 88.7 (+16.4)
+Keeps: 12 | Discards: 11 | Crashes: 2
+Best iteration: #18 - add integration tests for payment flow (+3.2)
+```
+
+## Adapting to Any Domain
+
+The pattern is universal. Change the five inputs, not the loop.
+
+| Domain | Goal | Verify | Direction |
+|--------|------|--------|-----------|
+| Test coverage | Coverage to 90% | `npm test -- --coverage` | higher |
+| Bundle size | Below 200KB | `npm run build && wc -c < dist/main.js` | lower |
+| Performance | Faster API response | `npm run bench \| grep p95` | lower |
+| ML training | Lower validation loss | `uv run train.py && grep val_bpb run.log` | lower |
+| Lint errors | Zero warnings | `npm run lint 2>&1 \| grep -c warning` | lower |
+| Lighthouse | Score above 95 | `npx lighthouse <url> --output=json --quiet \| jq '.categories.performance.score * 100'` | higher |
+| Code quality | Reduce complexity | `npx complexity-report \| grep average` | lower |
+
+## Guard: Preventing Regressions
+
+The guard is an optional safety net - a command that must always pass regardless of what the main metric does.
+
+- **Verify** answers: "Did the metric improve?"
+- **Guard** answers: "Did anything else break?"
+
+If the metric improves but the guard fails, the change is reverted. The agent should note WHY the guard failed and adapt future attempts accordingly.
+
+Common guards: `npm test`, `tsc --noEmit`, `cargo check`, `pytest`, `go vet`
+
+## Usage Examples
+
+### Inline config (all fields provided)
+
+```
+/iterate
+Goal: Increase test coverage from 72% to 90%
+Scope: src/**/*.ts, src/**/*.test.ts
+Verify: npm test -- --coverage | grep "All files" | awk '{print $10}'
+Direction: higher
+Guard: tsc --noEmit
+Iterations: 30
+```
+
+### Minimal (triggers interactive setup)
+
+```
+/iterate
+Goal: Make the API faster
+```
+
+Agent scans codebase for tooling, suggests scope/verify/direction, asks once, then goes.
+
+### Unbounded overnight run
+
+```
+/iterate
+Goal: Reduce bundle size below 150KB
+Scope: src/**/*.ts, webpack.config.js
+Verify: npm run build 2>&1 | grep "main.js" | awk '{print $2}'
+Direction: lower
+```
+
+Agent runs indefinitely. User interrupts in the morning. Results are in `results.tsv` and git history.
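
Review note: the DECIDE step in the SKILL.md loop compares metrics such as 72.3 and 74.1, which shell integer tests like `[ -gt ]` cannot handle. A minimal sketch of that decision follows. It assumes a hypothetical `decide` helper (not part of this commit) and uses awk for the decimal comparison. It covers only the keep/revert branches, not the crash-retry path or the "simpler wins" rule.

```shell
#!/usr/bin/env bash
# Hypothetical helper sketching the DECIDE step; not part of the skill itself.
# Usage: decide <previous_metric> <current_metric> <higher|lower> [guard_exit_code]
# Prints "keep" or "revert".
decide() {
  local prev="$1" curr="$2" direction="$3" guard_rc="${4:-0}"
  local improved
  # awk compares the values numerically, so decimals like 72.3 work.
  improved=$(awk -v p="$prev" -v c="$curr" -v d="$direction" 'BEGIN {
    if (d == "higher") r = (c > p); else r = (c < p)
    print r + 0
  }')
  # Keep only a strict improvement whose guard (if any) exited 0.
  if [ "$improved" -eq 1 ] && [ "$guard_rc" -eq 0 ]; then
    echo "keep"
  else
    echo "revert"
  fi
}
```

For example, `decide 72.3 74.1 higher` prints `keep`, while passing a non-zero guard exit code as the fourth argument prints `revert`.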

+ 127 - 0
skills/iterate/scripts/score.sh

@@ -0,0 +1,127 @@
+#!/bin/bash
+# Score iterate/SKILL.md on structural quality metrics.
+# Output: a single integer score (higher = better, max 100).
+#
+# This scorer is deliberately hard to max out.
+# It rewards clarity, completeness, and economy of expression.
+
+FILE="${1:-skills/iterate/SKILL.md}"
+SCORE=0
+
+if [ ! -f "$FILE" ]; then
+  echo "0"
+  exit 0
+fi
+
+CONTENT=$(cat "$FILE")
+LINES=$(wc -l < "$FILE" | tr -d ' ')
+WORDS=$(wc -w < "$FILE" | tr -d ' ')
+
+# --- STRUCTURE (25 pts) ---
+
+# Frontmatter fields (10 pts)
+echo "$CONTENT" | head -1 | grep -q "^---" && {
+  echo "$CONTENT" | grep -q "^name:" && SCORE=$((SCORE + 2))
+  echo "$CONTENT" | grep -q "^description:" && SCORE=$((SCORE + 3))
+  echo "$CONTENT" | grep -qi "triggers\? on:" && SCORE=$((SCORE + 2))
+  echo "$CONTENT" | grep -q "^allowed-tools:" && SCORE=$((SCORE + 3))
+}
+
+# Required sections present (15 pts: 2 each for six sections, plus 3 for a usage/examples section)
+for section in "## Setup" "## The Loop" "## Rules" "## Results" "## Adapt" "## Guard"; do
+  echo "$CONTENT" | grep -qi "$section" && SCORE=$((SCORE + 2))
+done
+# Bonus for usage examples section
+echo "$CONTENT" | grep -qi "## Usage\|## Example" && SCORE=$((SCORE + 3))
+
+# --- LOOP COMPLETENESS (20 pts) ---
+# All 8 steps of the loop referenced
+for step in "REVIEW\|Review\|review" "IDEATE\|Ideate\|ideate" "MODIFY\|Modify\|modify" \
+            "COMMIT\|Commit\|commit" "VERIFY\|Verify\|verify" "DECIDE\|Decide\|decide" \
+            "LOG\|Log\|log" "REPEAT\|Repeat\|repeat"; do
+  echo "$CONTENT" | grep -q "$step" && SCORE=$((SCORE + 2))
+done
+
+# Rollback strategy mentioned (4 pts)
+echo "$CONTENT" | grep -q "git revert" && SCORE=$((SCORE + 2))
+echo "$CONTENT" | grep -q "git reset\|fallback\|fall back" && SCORE=$((SCORE + 2))
+
+# --- CLARITY (20 pts) ---
+
+# Results.tsv example with actual data rows (5 pts)
+tsv_rows=$(echo "$CONTENT" | grep -c "^[0-9].*	.*	.*keep\|^[0-9].*	.*	.*discard\|^[0-9].*	.*	.*baseline\|^[0-9].*	.*	.*crash")
+if [ "$tsv_rows" -ge 3 ]; then
+  SCORE=$((SCORE + 5))
+elif [ "$tsv_rows" -ge 1 ]; then
+  SCORE=$((SCORE + 2))
+fi
+
+# Domain adaptation table with 5+ examples (5 pts)
+domain_rows=$(echo "$CONTENT" | grep -c "^|.*|.*|.*higher\|^|.*|.*|.*lower")
+if [ "$domain_rows" -ge 5 ]; then
+  SCORE=$((SCORE + 5))
+elif [ "$domain_rows" -ge 3 ]; then
+  SCORE=$((SCORE + 3))
+fi
+
+# Progress output format shown (5 pts)
+echo "$CONTENT" | grep -q "Iteration [0-9]" && SCORE=$((SCORE + 3))
+echo "$CONTENT" | grep -q "=== Iterate Complete\|summary" && SCORE=$((SCORE + 2))
+
+# Inline config example with all 5 fields (5 pts)
+example_fields=0
+echo "$CONTENT" | grep -q "^Goal:" && example_fields=$((example_fields + 1))
+echo "$CONTENT" | grep -q "^Scope:" && example_fields=$((example_fields + 1))
+echo "$CONTENT" | grep -q "^Verify:" && example_fields=$((example_fields + 1))
+echo "$CONTENT" | grep -q "^Direction:" && example_fields=$((example_fields + 1))
+echo "$CONTENT" | grep -q "^Guard:" && example_fields=$((example_fields + 1))
+if [ "$example_fields" -ge 5 ]; then
+  SCORE=$((SCORE + 5))
+elif [ "$example_fields" -ge 3 ]; then
+  SCORE=$((SCORE + 2))
+fi
+
+# --- ECONOMY (20 pts) ---
+
+# Line count: sweet spot 150-250 (10 pts)
+if [ "$LINES" -ge 150 ] && [ "$LINES" -le 250 ]; then
+  SCORE=$((SCORE + 10))
+elif [ "$LINES" -ge 120 ] && [ "$LINES" -le 300 ]; then
+  SCORE=$((SCORE + 6))
+elif [ "$LINES" -ge 80 ] && [ "$LINES" -le 400 ]; then
+  SCORE=$((SCORE + 3))
+fi
+
+# Word economy: full marks at 1800 words or fewer (5 pts)
+if [ "$WORDS" -le 1800 ]; then
+  SCORE=$((SCORE + 5))
+elif [ "$WORDS" -le 2500 ]; then
+  SCORE=$((SCORE + 3))
+elif [ "$WORDS" -le 3500 ]; then
+  SCORE=$((SCORE + 1))
+fi
+
+# No TODO/FIXME/HACK/XXX markers (5 pts)
+todo_count=$(echo "$CONTENT" | grep -ci "TODO\|FIXME\|HACK\|XXX")
+if [ "$todo_count" -eq 0 ]; then
+  SCORE=$((SCORE + 5))
+fi
+
+# --- HYGIENE (15 pts) ---
+
+# Attribution to Karpathy (3 pts)
+echo "$CONTENT" | grep -qi "karpathy" && SCORE=$((SCORE + 3))
+
+# AskUserQuestion mentioned for missing config (3 pts)
+echo "$CONTENT" | grep -q "AskUserQuestion" && SCORE=$((SCORE + 3))
+
+# Bounded mode (Iterations: N) documented (3 pts)
+echo "$CONTENT" | grep -q "Iterations:" && SCORE=$((SCORE + 3))
+
+# "Never stop" / "never ask" principle (3 pts)
+echo "$CONTENT" | grep -qi "never stop\|never ask" && SCORE=$((SCORE + 3))
+
+# git add specific files warning (no -A) (3 pts)
+echo "$CONTENT" | grep -q "git add -A\|never.*git add" && SCORE=$((SCORE + 3))
+
+echo "$SCORE"
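
Review note: score.sh prints one integer on stdout, which is the shape of output a Verify command needs. A caller could check that contract before trusting the number. The sketch below assumes a hypothetical wrapper named `read_score`, which is not part of this commit. It runs whatever command it is given and rejects output that is empty or contains non-digit characters.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a scorer and accept only a single non-negative integer.
# Usage: read_score <command> [args...]
read_score() {
  local out
  # Propagate failure if the scorer itself exits non-zero.
  out=$("$@") || return 1
  case "$out" in
    '' | *[!0-9]*)
      # Empty output, or anything other than digits, is not a valid score.
      echo "not an integer: $out" >&2
      return 1
      ;;
    *)
      echo "$out"
      ;;
  esac
}
```

A call such as `read_score bash skills/iterate/scripts/score.sh` would then either print the score or fail with a message on stderr.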