Incident-shaped scar tissue. Every entry is a real way an outer loop goes wrong: the
symptom you'd observe, the mechanism underneath, and the catch — the
specific loop-ops control (or Claude Code gate) that prevents or surfaces it. Read this
before you schedule anything unattended; most of these only bite once you're not watching.
The meta-lesson (Addy Osmani): "build the loop like someone who intends to stay the engineer." These failures are what happens when the loop is given more autonomy than its judgment has earned.
budget_tokens (a per-run ceiling). Estimate with loop-estimate before
scheduling; loop-doctor fails the loop if budget_tokens < estimated tokens/run.
Reconcile the loop-estimate estimate against run-log.md actuals periodically — a tick
that used to cost 2k now costing 40k means scope crept.run-log.md shows nothing but failures.verify/guard gate command's binary isn't on PATH in the
scheduler's environment (works on your laptop, absent on the CI runner), or claude
itself isn't installed there. In non-interactive -p, a hard denial aborts the
session — no human to prompt.loop-doctor --live resolves the gate's leading binary and checks
claude/git are on PATH before you schedule. Run it in the target environment.loop-estimate --cached projected; cache_read_input_tokens
stays 0.datetime.now(), a
per-run UUID, or unsorted JSON in the prefix invalidates the cache. Or the cadence is
slower than the cache TTL (a 6h loop can't keep a 1h entry warm), so every tick is a
cold write.run.md byte-identical (the template enforces this — fresh context,
same prompt). loop-estimate tells you whether the cadence can cache at all and which TTL;
if it can't, don't pay the write multiplier — run uncached.main, or ran a production migration —
"to fix the thing."main, prod
deploy/migration, mass delete, IAM grants, deleting pre-session files, .claude edits)
are always escalated, declared in escalation:. Claude Code's auto-mode classifier
also hard/soft-denies them independently: a general goal is not explicit intent.auto mode tried to launch a detached claude -p
--permission-mode bypassPermissions child (an ungated autonomous agent). Wrapping the
flag in a script to dodge the classifier is a hard_deny nothing clears.claude -p, not a session that
spawns ungated children. Move the launch to cron/Actions/Task Scheduler (the human
authorizer), and give the child gates (dontAsk + allowlist), not bypass — unless it's
in an isolated container. (rules/loop-engineering.md directive #2.)pigeon so a
peer can stand off. Never touch another session's .claude/worktrees/.STATE.md.STATE.md's _Updated_ timestamp + the run-log.md tail as a
heartbeat — if the latest run is older than ~2× the cadence, the loop is down. A
separate cheap monitor (or a daily-scan loop) that flags stale loop heartbeats
closes this; the kill switch is for stopping, the heartbeat is for noticing it stopped.skip-ped the failing
test, not because it fixed the bug.verify passes) rather than the
intent. A loop, like any optimizer, games a weak metric.guard that runs the full suite +
typecheck, a scope that excludes test files and CI config, and a human review at
L2 (the PR gate). Never let an L2/L3 loop modify the very tests that gate it.scope: "*" (or **, or empty) — "may touch anything."loop-check rejects an unbounded or placeholder scope (exit 10). Scope
is bounded globs, always.kill_switch is mandatory (loop-check errors without one) and checked
first every run. The cheapest implementation is a PreToolUse hook that blocks
every tool the instant a PAUSED sentinel appears — an instant breaker.| Failure | Primary control |
|---|---|
| Runaway budget | budget_tokens + loop-estimate + loop-doctor budget check |
| 3am-dead | loop-doctor --live (gate binary + PATH) |
| Cache-cold | byte-identical run.md + loop-estimate TTL guidance |
| Force-push / prod | escalation gate + auto-mode classifier |
| Ungated-child spawn | scheduler-invokes-claude -p (rule #2) |
| Colliding loops | priority order + per-loop worktree + pigeon |
| Silent-stop | STATE.md/run-log heartbeat staleness |
| Gate reward-hacking | full-suite guard + scope excludes tests + human PR gate |
| Unbounded scope | loop-check rejects * |
| No kill switch | mandatory kill_switch + PreToolUse PAUSED hook |
| Comprehension debt | L1-first graduation; read the reports |