name: loop-ops
description: "Design, scaffold, and safely run OUTER loops: scheduled discover→triage→implement→verify→escalate-or-land agent loops, the orchestration layer above a single run. Risk-tier ladder (L1 report → L2 assisted → L3 unattended) mapped onto Claude Code's permission model, a persistent STATE/run-log/budget spine, a production pattern catalog, multi-loop coordination, and a kill switch. Composes iterate (inner loop), fleet-worker (spawn), fleet-ops (land), and native /loop + /schedule. Triggers on: loop engineering, outer loop, loop design, design a loop, scheduled agent, autonomous loop, background agent loop, PR watch, CI watch, dependency bump, changelog gen, issue sort, daily scan, metric chase, regression watch, digest loop, backfill, monitor loop, freshness check, event-driven loop, channel webhook, loop check, loop estimate, loop readiness, ralph loop, agent harness, escalation gate, risk tier, kill switch, run it overnight on a schedule."
when_to_use: "Use when designing or running a recurring/scheduled agent loop rather than a one-shot task, e.g. 'set up a loop that triages PRs every 10 minutes', 'design an autonomous CI-failure recovery loop', 'how risky is this loop / is it ready to run unattended', 'estimate what this loop costs per month', 'build a loop-engineering setup'. For a single-session improvement loop against one metric, use iterate instead."
license: MIT
allowed-tools: "Read Write Edit Bash Glob Grep"
metadata:
  author: claude-mods

A loop is not a prompt. Turn-by-turn prompting puts you in the loop forever. Loop engineering inverts it: you design a recurring process with memory, verification, and boundaries that discovers work, hands it to agents, verifies the result, and decides — on a schedule or until a goal is met — whether to land it or escalate to a human.
"You shouldn't be prompting coding agents anymore. You should be designing the loops that prompt your agents." — Peter Steinberger
This skill is the outer loop: the orchestration layer above a single agent run. It
is the twin of iterate — iterate is the inner loop (one
metric, one session, git-as-memory); loop-ops is the design discipline for the loop
that schedules and gates inner runs. It does not reimplement spawning or landing; it
composes what this repo already ships.
Every durable loop rests on six primitives. The discipline is wiring them; the parts already exist:
| Primitive | What it is | Owned in claude-mods by |
|---|---|---|
| Schedule | fire the loop on a cadence | native-first: /loop (in-session), Desktop scheduled task (local, unattended), /schedule cloud routines (no local files); /goal is the native completion gate. External (cron/Task Scheduler + loop-run.sh) only for non-Claude-Code control |
| Worktree | isolated, discardable execution context | git-ops worktrees, fleet-worker (per-task worktree) |
| Skills | persistent project knowledge the run loads | this repo's skill layer + your CLAUDE.md |
| Sub-agents | maker/checker separation | Agent/Task; dispatching skills (review, testgen) |
| Connectors | reach tickets / CI / chat | MCP tools, gh, github-ops |
| + State | a durable spine outside the conversation | STATE.md + run-log + budget (this skill) |
The inner improvement loop is iterate; cheap parallel makers are fleet-worker; the
test-gated merge queue is fleet-ops; inter-loop signalling is pigeon. loop-ops is
the doctrine that connects them.
┌──────────────────────────────────────────────────────────────┐
│ SCHEDULE (cadence) │
│ └─▶ TRIAGE read STATE.md → pick the next unit of work │
│ └─▶ WORKTREE isolate (git worktree) │
│ └─▶ MAKER implementer run (or fleet-worker)│
│ └─▶ CHECKER verify gate + guard (tests) │
│ └─▶ GATE safe & allowlisted? │
│ ├─ yes → LAND (commit/PR) │
│ └─ no → ESCALATE (+context) │
│ └─▶ write STATE.md, append run-log, decrement budget ──────┘
The gate is the load-bearing decision. Everything before it is mechanical; the gate is where a loop earns the right to run unattended — or doesn't.
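
A minimal sketch of one tick of the diagram above, in Python. The names here (`LoopState`, `run_tick`, and the `maker` / `checker` / `allowlisted` callables) are hypothetical stand-ins rather than anything this skill ships; the sketch only illustrates the order of operations and that the gate alone decides between landing and escalating.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class LoopState:
    """Hypothetical in-memory stand-in for STATE.md + run-log + budget."""
    queue: List[str]
    budget_tokens: int
    run_log: List[str] = field(default_factory=list)


def run_tick(
    state: LoopState,
    maker: Callable[[str], str],
    checker: Callable[[str], bool],
    allowlisted: Callable[[str], bool],
    tokens_per_run: int,
) -> Optional[str]:
    """Run one tick: triage -> maker -> checker -> gate -> land or escalate."""
    if state.budget_tokens < tokens_per_run or not state.queue:
        state.run_log.append("skip")  # nothing to do, or budget exhausted
        return None
    unit = state.queue.pop(0)                  # TRIAGE: pick the next unit of work
    result = maker(unit)                       # MAKER: implementer run
    verified = checker(result)                 # CHECKER: verify gate + guard
    # GATE: land only if the result verified AND the unit is allowlisted.
    outcome = "land" if verified and allowlisted(unit) else "escalate"
    state.run_log.append(f"{unit}:{outcome}")  # append run-log
    state.budget_tokens -= tokens_per_run      # decrement budget
    return outcome


state = LoopState(queue=["fix-typo", "drop-table"], budget_tokens=250)
is_safe = lambda unit: unit == "fix-typo"      # toy allowlist for the demo
outcomes = [run_tick(state, str.upper, lambda r: True, is_safe, 100) for _ in range(3)]
print(outcomes)       # ['land', 'escalate', None]
print(state.run_log)  # ['fix-typo:land', 'drop-table:escalate', 'skip']
```

The third tick is skipped because the remaining budget (50) no longer covers a run, which is the budget acting as a stop condition rather than the loop running until something breaks.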
Never start a loop unattended. Graduate it. Each tier maps to a concrete Claude Code permission mode — full mapping, the headless-profile table, and the enumerate vs isolate fork in references/risk-tiers.md.
| Tier | Posture | Permission mode | May do | Lands by |
|---|---|---|---|---|
| L1 Report | read-only discovery + triage | plan / dontAsk+read allowlist | scan, summarize, propose; writes nothing | a human reads the report |
| L2 Assisted | suggest changes, human gates the merge | dontAsk+narrow allowlist, or auto | edit in a worktree, run tests, open a PR | a human approves the PR (or fleet-ops) |
| L3 Unattended | autonomous land within a denylist | bypassPermissions in an isolated container only | commit/merge allowlisted classes | the loop itself, inside its boundary |
The cardinal rule, straight from Claude Code's own gate model: an unattended loop is a
scheduler/script that invokes claude -p, not a Claude session that spawns ungated
children. A session in auto mode that tries to launch a --permission-mode
bypassPermissions child is blocked as Create Unsafe Agents — by design. See
references/risk-tiers.md and the repo's
auto-mode-classifier reference.
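
The tier table and the cardinal rule can be read as a small consistency check. The sketch below is an illustration under assumptions, not the logic of `loop-check.sh`: `TIER_MODES` is transcribed from the table above, while `mode_allowed` and its `isolated` flag are hypothetical names.

```python
# Permission modes per tier, transcribed from the tier table.
TIER_MODES = {
    "L1": {"plan", "dontAsk"},
    "L2": {"dontAsk", "auto"},
    "L3": {"bypassPermissions"},
}


def mode_allowed(tier: str, mode: str, isolated: bool = False) -> bool:
    """Return True if `mode` is consistent with `tier` (hypothetical helper)."""
    if mode not in TIER_MODES.get(tier, set()):
        return False
    # bypassPermissions is only acceptable inside an isolated container.
    if mode == "bypassPermissions" and not isolated:
        return False
    return True


print(mode_allowed("L1", "plan"))                              # True
print(mode_allowed("L2", "bypassPermissions"))                 # False
print(mode_allowed("L3", "bypassPermissions"))                 # False
print(mode_allowed("L3", "bypassPermissions", isolated=True))  # True
```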
What a loop may land vs what it must escalate is not a vibe — it mirrors Claude
Code's classifier tiers. Bake these into the config's escalation: field:
- main, production deploys or migrations, mass deletion, granting IAM/repo permissions, anything destroying pre-session files, editing .claude//settings (self-modification), curl | bash.
- gh pr merge out and land_via: fleet-ops in. Full connector/MCP-scope discipline + the auto-merge guard: references/risk-tiers.md.

A loop's memory lives outside the conversation, in three files (schemas + read/write contract in references/state-spine.md):

- STATE.md: the triage snapshot (priority / watch / noise + a readiness line). Read at the top of every run, rewritten at the end.
- run-log.md: append one line per run (timestamp, action, outcome, tokens). The audit trail that answers "what has this loop been doing?"
- loop.config.yaml: the loop's definition (goal, tier, cadence, scope, gate, budget, escalation). Scaffolded by loop-scaffold, scored by loop-check.

Patterns are compositions of three axes: trigger (cadence / event via a Channel
/ goal) × posture (L1/L2/L3) × locus (connector→cloud routine / local→Desktop task).
The named patterns are well-trodden points in that space; compose your own from the axes.
Full recipes + the morphology in references/pattern-catalog.md:
| Pattern | Trigger · Locus | Tier | One-line job |
|---|---|---|---|
| daily-scan | cadence · local | L1 | discover + prioritize, report only |
| pr-watch | event\|cadence · connector | L1 | watch review state, surface stuck PRs |
| ci-watch | event · local | L2 | triage build failures, propose a fix |
| dep-bump | cadence · local | L2 | patch-only bumps behind cooldown + guard |
| changelog-gen | event(tag)\|cadence · local | L1 | draft release notes for approval |
| merge-hygiene | cadence · local | L1 | dead branches, stale flags |
| issue-sort | cadence · connector | L1 | classify + label, propose only |
| metric-chase | goal · local | L2 | drive a metric (coverage/latency/eval) via iterate |
| regression-watch | cadence\|event · local | L1 | run a benchmark/eval, flag a regression |
| digest | cadence · connector | L1 | summarize email/Asana/news (cloud routine) |
| backfill | goal · local | L2 | drain a migration/queue to completion |
| monitor | event · local | L1 | error/deploy webhook → triage + page |
| freshness | cadence · local | L1 | re-check docs/data/deps vs reality |
Start any pattern at L1. Graduate to L2 only after the L1 reports prove its judgment.
Prefer event over cadence where a webhook exists (cheaper, faster than polling).
Running several loops? Two non-negotiables (detail in references/state-spine.md):
1. Priority order: CI Watch → PR Watch → Dependency Bump → Merge-Hygiene/Changelog → Daily Scan (off-peak). A higher-priority loop's worktree wins; lower-priority loops defer. Loops signal each other via pigeon.
2. Kill switch: a PAUSED sentinel file or a loop-pause label that every loop checks at the top of its run and exits on. No loop ships without one. Put it in kill_switch: and check it first.

| You need to… | Use | Not |
|---|---|---|
| improve one metric in one session | iterate | a hand-rolled inner loop |
| spawn cheap parallel makers | fleet-worker | bespoke claude -p plumbing |
| test-gate + land winning branches | fleet-ops | a manual merge step |
| fire on a cadence | native /loop · Desktop scheduled task · /schedule cloud routine; /goal for completion | a custom cron in this skill |
| commit / PR / release | git-ops, github-ops | raw git push |
| signal between loops | pigeon | a shared scratch file |

loop-ops is the design layer; these are the execution layers.
Five scripts, all following the Skill Resource Protocol
(stdout = data, semantic exit codes, --help with EXAMPLES, --json envelopes): init
(loop-scaffold.sh) scaffolds the loop, audit (loop-check.sh) scores whether the config is
well-formed, doctor (loop-doctor.sh) preflights whether it will actually run, cost
(loop-estimate.py) estimates spend (caching-aware), and check-pricing-sync gates pricing
drift in CI. The discipline before scheduling is
init → fill → cost → audit → doctor --live.

scripts/loop-scaffold.sh: scaffold a loop's state spine

Writes <dir>/<name>/ with five files from the bundled templates:
loop.config.yaml (assets/loop.config.template.yaml),
STATE.md (assets/STATE.template.md), run-log.md, run.md
(the headless run prompt, assets/run.template.md), and an
executable loop-run.sh (assets/run.sh.template) — the
runner-agnostic tick wrapper any scheduler invokes (cron / Windows Task Scheduler /
systemd / by hand), no GitHub Actions required. Pass a known --pattern
(pr-watch, ci-watch, dep-bump, …) and the config is seeded with that
pattern's scope/goal/escalation — and, at L2+, its gate — so you get a near-ready config to
review, not blank placeholders (it audits clean immediately). Doctrine holds: it still
scaffolds at L1 by default with a graduation block.
# Create .loops/pr-watch/ with config + STATE.md + run-log.md + run.md + loop-run.sh from templates:
bash scripts/loop-scaffold.sh --name pr-watch --pattern pr-watch --tier L1
# Custom dir + cadence, preview without writing:
bash scripts/loop-scaffold.sh --name dep-bump --pattern dep-bump \
--tier L2 --cadence 1d --dir .loops --dry-run
Refuses to overwrite a populated <dir>/<name>/ (exit 5) unless --force. Atomic
writes. --dry-run prints what it would create and writes nothing. stdout = the created
config path.

scripts/loop-check.sh: readiness scorer (run before you schedule)

The question this answers: is this loop safe to turn on at its declared tier? It scores
a loop.config.yaml against the readiness rubric — gate present, scope bounded,
escalation defined, guard + worktree at L2+, budget + kill switch set, permission mode
consistent with tier — and refuses a green light if any critical gap exists.
bash scripts/loop-check.sh .loops/pr-watch/loop.config.yaml # exit 0 ready, 10 not ready
bash scripts/loop-check.sh --json .loops/dep-bump/loop.config.yaml | jq '.data[] | select(.severity=="error")'
bash scripts/loop-check.sh --min 80 .loops/ci-watch/loop.config.yaml # raise the score bar
Exit 0 = ready (no errors, score ≥ --min), 10 = not ready (findings on stdout),
2 usage, 3 config not found, 4 config unparseable. --strict counts warnings
toward the not-ready signal.
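
To make the rubric concrete, here is a minimal Python sketch of the kind of checks described above. It is not the implementation of `loop-check.sh`: the function name `readiness_findings` is hypothetical, the `guard` and `worktree` keys are assumed field names, and the real scorer's weights, score, and permission-mode check are not reproduced.

```python
def readiness_findings(config: dict) -> list:
    """Return error strings; an empty list means no critical gap was found."""
    errors = []
    if not config.get("verify"):
        errors.append("verify gate is empty")
    scope = config.get("scope") or []
    if isinstance(scope, str):
        scope = [scope]  # accept a single glob as well as a list of globs
    if not scope or "*" in scope:
        errors.append("scope is missing or unbounded")
    for key in ("escalation", "budget_tokens", "kill_switch"):
        if not config.get(key):
            errors.append(f"{key} is not set")
    # Guard + worktree are only required from L2 upward.
    if config.get("tier") in ("L2", "L3"):
        for key in ("guard", "worktree"):
            if not config.get(key):
                errors.append(f"{key} is required at L2+")
    return errors


config = {
    "tier": "L2",
    "verify": "pytest -q",
    "scope": ["src/**/*.py"],
    "escalation": "open an issue",
    "budget_tokens": 200_000,
    "kill_switch": ".loops/PAUSED",
}
print(readiness_findings(config))
# ['guard is required at L2+', 'worktree is required at L2+']
```

The same config declared at L1 would pass this sketch, which mirrors the graduation idea: a higher tier demands more controls before a green light.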

scripts/loop-doctor.sh: live preflight (will it actually run?)

loop-check proves the config is well-formed; loop-doctor proves the loop will
execute — catching the "blocked at 3am" failures audit can't see. --offline (CI-safe):
the budget fits a tick's estimated tokens, the permission mode is achievable (not
interactive), an L3 bypass declares an isolation boundary. --live adds runtime preflight:
the verify/guard gate's leading binary resolves on PATH, claude/git are present,
the kill-switch sentinel's parent dir exists.
bash scripts/loop-doctor.sh --offline .loops/pr-watch/loop.config.yaml # CI gate
bash scripts/loop-doctor.sh --live .loops/ci-watch/loop.config.yaml # before scheduling
bash scripts/loop-doctor.sh --live --json .loops/dep-bump/loop.config.yaml | jq '.data[] | select(.state=="bad")'
Exit 0 = will run, 10 = a check predicts a runtime failure (gate binary missing,
bypass on host without isolation, budget too small for a tick), 2 usage, 3 not found,
4 unparseable, 5 missing core dep. Run it after loop-check and before scheduling.
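
The `--live` idea of resolving a gate's leading binary on PATH can be sketched in a few lines of Python. This is an illustration only, assuming the gate is a plain shell-style command string; `gate_binary` and `preflight` are hypothetical names, and the real `loop-doctor.sh` may parse commands differently.

```python
import shlex
import shutil


def gate_binary(command: str) -> str:
    """Return the leading binary of a gate command such as 'pytest -q tests/'."""
    return shlex.split(command)[0]


def preflight(gate_commands, required=("claude", "git")):
    """Return the binaries that do not resolve on PATH (empty list = will run)."""
    binaries = [gate_binary(cmd) for cmd in gate_commands] + list(required)
    return [name for name in binaries if shutil.which(name) is None]


print(gate_binary("pytest -q tests/"))  # pytest
missing = preflight(["definitely-not-a-real-binary-xyz --check"], required=())
print(missing)  # ['definitely-not-a-real-binary-xyz']
```

A limitation of this sketch: a command prefixed with an environment assignment (for example `FOO=1 pytest`) would report `FOO=1` as the binary, so a real preflight needs a more careful parse.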

scripts/loop-estimate.py: token/$ estimate by pattern × cadence × model (caching-aware)

Estimate spend before committing to a cadence: the cost of an outer loop is
runs/day × tokens/run × price, and sub-agents multiply it. It also models prompt
caching: a loop re-sends the same run.md+system prefix every tick (the Ralph
property), so the prefix should be cache-written once then read (~0.1×) — but only if the
tick interval fits the cache TTL. A loop slower than ~1h can't cache (the entry expires
between ticks); the estimator says so and recommends the TTL. Pricing reads from
assets/model-pricing.json (date-stamped; claude-api-ops
is the source of truth — run its check-model-table.py if you suspect drift).
python scripts/loop-estimate.py --pattern pr-watch --cadence 10m --model claude-haiku-4-5
python scripts/loop-estimate.py --pattern ci-watch --cadence 15m --model claude-sonnet-4-6 --days 30 --json
python scripts/loop-estimate.py --list-models # the pricing table + its as-of date
Exit 0 ok, 2 usage, 3 pricing file missing, 4 bad cadence/model. Output names
every assumption (runs/day, tokens/run, sub-agent multiplier) — it's an estimate, and it
says so.
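
The arithmetic behind the estimate can be sketched as follows. This is not `loop-estimate.py`: the function names, the split into `prefix_tokens` and `fresh_tokens`, and the price of 1.00 USD per million tokens are all illustrative assumptions, and the sketch covers input tokens only (it ignores output tokens and any cache-write premium).

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def runs_per_day(cadence: str) -> float:
    """Convert a cadence such as '10m' or '1d' into runs per day."""
    match = re.fullmatch(r"(\d+)([smhd])", cadence)
    if match is None:
        raise ValueError(f"bad cadence: {cadence!r}")
    seconds = int(match.group(1)) * UNIT_SECONDS[match.group(2)]
    return 86400 / seconds


def monthly_cost(cadence, prefix_tokens, fresh_tokens, usd_per_mtok,
                 cache_ttl_s=300, cache_read_factor=0.1, sub_agents=1, days=30):
    """runs/day x tokens/run x price, with the repeated prefix cached if possible."""
    per_day = runs_per_day(cadence)
    interval_s = 86400 / per_day
    # The prefix is billed at the cache-read rate only if ticks fit inside the TTL.
    prefix_factor = cache_read_factor if interval_s <= cache_ttl_s else 1.0
    tokens_per_run = (prefix_tokens * prefix_factor + fresh_tokens) * sub_agents
    return per_day * days * tokens_per_run * usd_per_mtok / 1_000_000


# Illustrative numbers only: 8k-token prefix, 2k fresh tokens, 1.00 USD per Mtok.
cold = monthly_cost("10m", 8_000, 2_000, 1.00)                    # 600 s tick > 300 s TTL
warm = monthly_cost("10m", 8_000, 2_000, 1.00, cache_ttl_s=3600)  # tick fits a 1h TTL
print(f"{cold:.2f}")  # 43.20
print(f"{warm:.2f}")  # 12.10
```

With these made-up inputs, a 10-minute cadence misses a 300-second cache window entirely, and moving to a longer TTL cuts the input-side estimate by roughly 70%. That is the trade-off the estimator's TTL recommendation is about.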

scripts/check-pricing-sync.py: offline drift guard (CI)

model-pricing.json is a copy of claude-api-ops's authoritative model table, and a copy
drifts silently. This offline verifier asserts every model in
assets/model-pricing.json matches claude-api-ops's "Current
Models" table (prices included). Both files are in-repo, so it's network-free and gates PR
CI via tests/check-resources.sh; live model-id drift is owned by claude-api-ops's
check-model-table.py.
python scripts/check-pricing-sync.py --offline # exit 0 in sync, 10 drift, 3 a file missing

1. custom). Start at L1.
2. bash scripts/loop-scaffold.sh --name <n> --pattern <p> --tier L1.
3. Fill in loop.config.yaml: the real goal, scope (bounded globs, never *), verify gate, escalation rule, budget_tokens, kill_switch.
4. python scripts/loop-estimate.py --pattern <p> --cadence <c> --model <m>: sanity-check the monthly spend against the value.
5. bash scripts/loop-check.sh .loops/<n>/loop.config.yaml: fix every error before scheduling. Don't schedule a loop that fails its own audit.
6. bash scripts/loop-doctor.sh --live .loops/<n>/loop.config.yaml: prove it will actually run (gate binary on PATH, budget fits a tick). Audit = well-formed; doctor = will-run.
7. claude -p every ~270 s), not /loop (which grows a session and chews tokens); fixed-criteria long task → /goal; quick supervised polling → /loop. (L1 is read-only: it just writes STATE.md + a report.)
8. fleet-ops landing) and re-audit at the higher tier.

A complete, audit + doctor-clean L1 loop ships at
assets/examples/pr-watch/: a filled
loop.config.yaml, a populated STATE.md, the run.md run prompt, a sample
run-log.md, the runner-agnostic loop-run.sh (the tick wrapper, with the
kill-switch gate and dontAsk + allowlist baked in — point cron / Task Scheduler at it),
and an optional github-actions.yml for repos already on GitHub. Copy the dir, adjust
scope/cadence, run loop-check + loop-doctor --live, then wire loop-run.sh to your
scheduler. The other patterns don't ship as
static dirs that rot — loop-scaffold --pattern <name> generates the same, seeded and
gate-clean, for any pattern at any tier. CI runs loop-check + loop-doctor on this
example every build, so it can't drift out of validity.
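
For readers who want to see the shape of a tick wrapper without opening the example, here is a hedged Python sketch of the same idea: check the kill switch first, then build the `claude -p` invocation for the scheduler to run. It is not the shipped `loop-run.sh`. The function name `build_tick_command` and the sentinel location (`PAUSED` inside the loop directory) are assumptions, and the allowlist flags are omitted because their exact form is not shown here.

```python
import tempfile
from pathlib import Path


def build_tick_command(loop_dir: Path, permission_mode: str = "dontAsk"):
    """Return the argv for one tick, or None if the kill switch is set."""
    if (loop_dir / "PAUSED").exists():  # kill switch: checked before anything else
        return None
    prompt = (loop_dir / "run.md").read_text(encoding="utf-8")
    return ["claude", "-p", prompt, "--permission-mode", permission_mode]


with tempfile.TemporaryDirectory() as tmp:
    loop_dir = Path(tmp)
    (loop_dir / "run.md").write_text("Triage open PRs.", encoding="utf-8")
    argv = build_tick_command(loop_dir)    # no sentinel yet: the tick may run
    (loop_dir / "PAUSED").touch()          # operator pauses the loop
    paused = build_tick_command(loop_dir)  # sentinel present: the tick exits

print(argv)    # ['claude', '-p', 'Triage open PRs.', '--permission-mode', 'dontAsk']
print(paused)  # None
```

The sketch only builds the argument list; an external scheduler would hand it to something like `subprocess.run`, which keeps the scheduler, not a Claude session, as the process that launches the run.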
The incident-shaped catalog — symptom → mechanism → the control that catches each — is references/failure-modes.md (runaway budget, the 3am-dead loop, cache-cold, force-push, ungated-child spawn, colliding loops, silent-stop, gate reward-hacking, …). The headline ones:
- claude -p --permission-mode bypassPermissions in a script to dodge the classifier is Auto-Mode Bypass, a hard_deny nothing clears. If an outcome is blocked, authorize it (a narrow allow rule, or run the scheduler outside the auto-mode session), never disguise it.
- auto mode is the wrong place to launch the loop. The scheduler/cron/Task-Scheduler/CI runner that invokes claude -p is the authorizer. See references/risk-tiers.md §"enumerate vs isolate".
- A loop whose verify: is empty is not a loop, it's an unsupervised typer. loop-check errors on it.
- scope: "*" means "may touch anything"; the audit rejects it.

/loop, /schedule, hooks, the scheduler pattern.