
feat(skills): add loop-ops outer-loop design skill

Adds loop-ops, the outer-loop design discipline and twin to iterate (the
inner loop). Where iterate drives one metric in one session, loop-ops is the
orchestration layer above it: designing, scaffolding, costing, and safely
running scheduled discover->triage->implement->verify->escalate-or-land agent
loops.

Its spine is the risk-tier ladder (L1 report -> L2 assisted -> L3 unattended)
mapped onto Claude Code's actual permission model, including the
enumerate-vs-isolate fork and the rule that a scheduler invokes `claude -p`,
not a session that spawns ungated children. Ships a STATE/run-log/budget state
spine, a 7-pattern catalog, multi-loop coordination + kill switch, and three
Resource-Protocol scripts: loop-init (scaffold), loop-audit (readiness scorer),
loop-cost (token-$ estimate). 58-assertion offline suite. Composes
fleet-worker (spawn) and fleet-ops (land).

Also canonicalises docs/auto-mode-classifier.md -> docs/AUTO-MODE-CLASSIFIER.md
to match the docs/ UPPERCASE convention; loop-ops cites it as the authority for
its risk-tier mapping. Distils loop-engineering and the Ralph loop.

Wiring: README skill row + v3.3.0 changelog + count 94->95; AGENTS.md and
docs/PLAN.md counts; plugin.json 3.2.0->3.3.0; term.sh loop brand glyph.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
0xDarkMatter 1 week ago
parent
commit
27569684a0

+ 1 - 1
.claude-plugin/plugin.json

@@ -1,7 +1,7 @@
 {
   "$schema": "https://json.schemastore.org/claude-code-plugin-manifest.json",
   "name": "claude-mods",
-  "version": "3.2.0",
+  "version": "3.3.0",
   "description": "Custom commands, skills, agents, rules, hooks, and output styles for Claude Code - session continuity and modern CLI tooling for real-world development workflows",
   "author": {
     "name": "0xDarkMatter"

+ 1 - 1
AGENTS.md

@@ -5,7 +5,7 @@
 This is **claude-mods** - a collection of custom extensions for Claude Code:
 - **3 expert agents** for pure context-isolation/worker roles (git-agent, firecrawl-expert, project-organizer) - every domain-knowledge agent became an `-ops` skill (v3.0, skills-first)
 - **2 commands** for session management (/sync, /save)
-- **94 skills** for CLI tools, patterns, workflows, and development tasks (incl. `ffmpeg-ops` for probe-first media processing and EDL-driven editing, `supply-chain-defense` for behavioural-first dependency security, `prompt-injection-defense` for instruction-integrity scanning, `pypi-ops` for OIDC Trusted Publishing to PyPI, `net-ops` for network troubleshooting, `windows-ops` / `mac-ops` for workstation diagnostics, `fleet-worker` for cheap parallel worker delegation)
+- **95 skills** for CLI tools, patterns, workflows, and development tasks (incl. `loop-ops` for outer-loop design discipline, `ffmpeg-ops` for probe-first media processing and EDL-driven editing, `supply-chain-defense` for behavioural-first dependency security, `prompt-injection-defense` for instruction-integrity scanning, `pypi-ops` for OIDC Trusted Publishing to PyPI, `net-ops` for network troubleshooting, `windows-ops` / `mac-ops` for workstation diagnostics, `fleet-worker` for cheap parallel worker delegation)
 - **13 output styles** for response personality (Vesper, Spartan, Mentor, Executive, Pair, Atlas, Coach, Harbour, Meridian, Noir, Roast, Sage, Scout)
 - **11 hooks** for pre-commit linting, post-edit formatting, dangerous command warnings, uv enforcement, dependency-install + manifest-edit supply-chain advisories, hidden-Unicode scanning (session-start + pre-commit), live config-change + worktree guards, and pmail notifications - security set auto-wired via plugin hooks.json
 - **Pigeon** inter-session messaging (`pigeon send/read/reply`) - SQLite-backed pmail at `~/.claude/pmail.db`

+ 24 - 0
CHANGELOG.md

@@ -5,6 +5,30 @@ All notable changes to claude-mods are documented here. Format follows
 [Semantic Versioning](https://semver.org/). Fuller narrative entries for
 feature releases live in the README "Recent Updates" section.
 
+## [3.3.0] - 2026-06-22
+
+### Added
+- **`loop-ops` skill** - the *outer-loop* design discipline, twin to `iterate` (the inner
+  loop). Where `iterate` drives one metric in one session, `loop-ops` is the orchestration
+  layer above it: how to design, scaffold, cost, and safely run scheduled
+  discover→triage→implement→verify→escalate-or-land agent loops. Its spine is the
+  **risk-tier ladder** (L1 report → L2 assisted → L3 unattended) mapped onto Claude Code's
+  *actual* permission model — each tier a concrete permission mode, plus the
+  enumerate-vs-isolate fork and the load-bearing rule that a scheduler invokes `claude -p`,
+  not a session that spawns ungated children (grounded in `docs/AUTO-MODE-CLASSIFIER.md`).
+  Ships a STATE/run-log/budget state spine, a 7-pattern catalog (PR babysitter, CI sweeper,
+  dependency sweeper, changelog drafter, post-merge cleanup, issue/daily triage), multi-loop
+  coordination + kill switch, and three Resource-Protocol scripts: `loop-init` (scaffold),
+  `loop-audit` (readiness scorer — refuses a green light on an unbounded scope, missing gate,
+  or undefined escalation), and `loop-cost` (token-$ estimate by pattern × cadence × model,
+  pricing sourced from `claude-api-ops`). Composes `fleet-worker` (spawn) and `fleet-ops`
+  (land); 58-assertion offline self-test. Distils
+  [loop-engineering](https://github.com/cobusgreyling/loop-engineering) and the
+  [Ralph loop](https://ghuntley.com/ralph/).
+- **`docs/AUTO-MODE-CLASSIFIER.md`** - reference on Claude Code's auto-mode permission
+  classifier (the two-gate model, gating categories, legitimate-authorization decision tree),
+  cited by `loop-ops` as the authority for its risk-tier mapping.
+
 ## [3.2.0] - 2026-06-22
 
 ### Added

+ 9 - 5
README.md

@@ -12,16 +12,19 @@
 
 > *A comprehensive extension toolkit that transforms Claude Code into a specialized development powerhouse.*
 
-**claude-mods** is a production-ready plugin that extends Claude Code with 94 specialized skills, 3 expert agents, 13 output styles, 11 hooks, and modern CLI tools designed for real-world development workflows. Whether you're debugging React hooks, optimizing PostgreSQL queries, or building production CLI applications, this toolkit equips Claude with the domain expertise and procedural knowledge to work at expert level across multiple technology stacks.
+**claude-mods** is a production-ready plugin that extends Claude Code with 95 specialized skills, 3 expert agents, 13 output styles, 11 hooks, and modern CLI tools designed for real-world development workflows. Whether you're debugging React hooks, optimizing PostgreSQL queries, or building production CLI applications, this toolkit equips Claude with the domain expertise and procedural knowledge to work at expert level across multiple technology stacks.
 
 Built on the [Agent Skills specification](https://agentskills.io/specification) (an open standard backed by Anthropic, Vercel, Google, Microsoft, and 40+ agent platforms), claude-mods fills critical gaps in Claude Code's capabilities: persistent session state that survives across machines, on-demand expert knowledge for specialized domains, token-efficient modern CLI tools (10-100x faster than traditional alternatives), and proven workflow patterns for TDD, code review, and feature development. The toolkit implements Anthropic's [recommended patterns for long-running agents](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents), ensuring your development context never vanishes when sessions end.
 
 From Python async patterns to Rust ownership models, from AWS Fargate deployments to Craft CMS development - claude-mods provides the specialized knowledge and tools that transform Claude from a general-purpose assistant into a domain expert who understands your stack, remembers your workflow, and ships production code.
 
-**3 agents. 94 skills. 13 styles. 11 hooks. 7 rules. One install.**
+**3 agents. 95 skills. 13 styles. 11 hooks. 7 rules. One install.**
 
 ## Recent Updates
 
+**v3.3.0** (June 2026)
+- 🔁 **`loop-ops` skill** — the *outer-loop* design discipline, twin to [`iterate`](skills/iterate/) (the inner loop). Where `iterate` drives one metric in one session, `loop-ops` is the orchestration layer above it: how to design, scaffold, cost, and **safely** run scheduled discover→triage→implement→verify→escalate-or-land agent loops. Its spine is the **risk-tier ladder** — L1 report → L2 assisted → L3 unattended — mapped onto Claude Code's *actual* permission model (the thing the upstream methodology can't do): each tier is a concrete permission mode, with the *enumerate-vs-isolate* fork and the load-bearing rule that **a scheduler invokes `claude -p`, not a session that spawns ungated children**. Ships a STATE/run-log/budget state spine, a 7-pattern catalog (PR babysitter, CI sweeper, dependency sweeper, changelog drafter, post-merge cleanup, issue/daily triage), multi-loop coordination + kill switch, and three Resource-Protocol scripts — `loop-init` (scaffold), `loop-audit` (readiness scorer that refuses a green light on an unbounded scope / missing gate / undefined escalation), `loop-cost` (token-$ estimate by pattern × cadence × model). Composes `fleet-worker` (spawn) and `fleet-ops` (land); 58-assertion offline suite. Distils [loop-engineering](https://github.com/cobusgreyling/loop-engineering) and the [Ralph loop](https://ghuntley.com/ralph/), grounded in this repo's auto-mode-classifier reference.
+
 **v3.2.0** (June 2026)
 - 🤖 **`fleet-worker` skill** — delegate tool-using, multi-step tasks to *cheaper headless Claude Code workers* — a cheaper Anthropic model (Sonnet/Haiku) or any Anthropic-compatible endpoint (e.g. GLM 5.2 via z.ai) — while an Opus orchestrator fans them out in parallel and gates their results before anything lands. Each worker is a real `claude -p` with Claude Code's full tool harness (Read/Write/Edit/Bash/Glob/Grep/Task) and any skills you provision into it, but a cheaper brain — isolated in its own git worktree + `CLAUDE_CONFIG_DIR`. Ships bash + PowerShell launchers, a result-gating collector, an endpoint health verifier, and the fleet-ops handoff recipes. fleet-worker is the **spawn** layer; [`fleet-ops`](skills/fleet-ops/) is the test-gated **landing** layer it hands winning branches to. Provider-agnostic.
 
@@ -76,7 +79,7 @@ Claude Code is powerful out of the box, but it has gaps. This toolkit fills them
 
 - **Session continuity** — Tasks vanish when sessions end. We fix that with `/save` and `/sync`, implementing Anthropic's [recommended pattern](https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents) for long-running agents.
 
-- **Expert-level knowledge on demand** — 94 on-demand skills covering React, TypeScript, Python, Go, Rust, PostgreSQL, and more, plus 3 specialized agents reserved for genuine context-isolation/worker roles (git operations, web scraping, project reorganization). Skills-first: knowledge loads when relevant instead of living in heavyweight agent prompts.
+- **Expert-level knowledge on demand** — 95 on-demand skills covering React, TypeScript, Python, Go, Rust, PostgreSQL, and more, plus 3 specialized agents reserved for genuine context-isolation/worker roles (git operations, web scraping, project reorganization). Skills-first: knowledge loads when relevant instead of living in heavyweight agent prompts.
 
 - **Modern CLI tools** — Stop using `grep`, `find`, and `cat`. Our rules automatically prefer `ripgrep`, `fd`, `eza`, and `bat` — 10-100x faster and token-efficient.
 
@@ -101,7 +104,7 @@ claude-mods/
 ├── .claude-plugin/     # Plugin metadata
 ├── agents/             # Expert subagents (3)
 ├── commands/           # Slash commands (2)
-├── skills/             # Custom skills (94)
+├── skills/             # Custom skills (95)
 ├── output-styles/      # Response personalities
 ├── hooks/              # Hook examples & docs
 ├── rules/              # Claude Code rules
@@ -309,6 +312,7 @@ See [skill-creator](skills/skill-creator/) for the complete guide.
 | [refactor-ops](skills/refactor-ops/) | Safe refactoring patterns, code smell detection, test-driven methodology |
 | [scaffold](skills/scaffold/) | Project scaffolding - generate boilerplate for APIs, web apps, CLIs, monorepos |
 | [iterate](skills/iterate/) | Autonomous improvement loop - modify, measure, keep or discard, repeat. Inspired by Karpathy's autoresearch. |
+| [loop-ops](skills/loop-ops/) | Outer-loop design discipline - the orchestration layer above `iterate`: risk-tier ladder (L1 report → L2 assisted → L3 unattended) mapped onto Claude Code's permission model, STATE/run-log/budget spine, 7-pattern catalog (PR babysitter, CI sweeper, dependency sweeper…), multi-loop coordination, kill switch. Composes iterate/fleet-worker/fleet-ops/native-loop. loop-init/loop-audit/loop-cost scripts. |
 | [testing-ops](skills/testing-ops/) | Test strategy patterns - mocking, CI testing, test data design |
 | [claude-code-ops](skills/claude-code-ops/) | Claude Code internals - full hook event catalog, skill frontmatter spec, headless/CLI reference, extension debugging |
 | [playwright-ops](skills/playwright-ops/) | Playwright e2e testing - selector hierarchy, fixtures, network mocking, CI sharding, flake hunting |
@@ -562,7 +566,7 @@ When using multiple MCP servers (Chrome DevTools, Vibe Kanban, etc.), their tool
 
 ### Skill Description Budget
 
-With 90+ skills installed (this plugin alone ships 94), skill descriptions can overflow the listing budget. All skill names are always listed, but descriptions share a budget of **1% of the model context window** — on overflow, least-invoked skills lose their descriptions first and **silently stop auto-triggering** (explicit `/name` invocation still works). Each skill's combined `description` + `when_to_use` is also truncated at **1,536 chars**, so trigger phrases belong at the front.
+With 90+ skills installed (this plugin alone ships 95), skill descriptions can overflow the listing budget. All skill names are always listed, but descriptions share a budget of **1% of the model context window** — on overflow, least-invoked skills lose their descriptions first and **silently stop auto-triggering** (explicit `/name` invocation still works). Each skill's combined `description` + `when_to_use` is also truncated at **1,536 chars**, so trigger phrases belong at the front.
 
 - **Check:** run `/doctor` — it shows whether the budget is overflowing and which skills are affected.
 - **Fix:** demote or disable skills you don't use via `skillOverrides` in settings (`"on"` / `"name-only"` / `"user-invocable-only"` / `"off"` per skill, or `/skills` + `Space`). Plugin skills are managed via `/plugin` instead.

+ 454 - 0
docs/AUTO-MODE-CLASSIFIER.md

@@ -0,0 +1,454 @@
+# Claude Code Auto-Mode Permission Classifier — Reference
+
+> Compiled 2026-06-22 for Claude Code v2.1.x (auto mode requires v2.1.83+).
+> Two evidence sources, labelled throughout:
+> - **[DOC]** — official Anthropic documentation (URL cited inline). Verified by direct fetch on 2026-06-22.
+> - **[OBS]** — observed behaviour extracted from local Claude Code session transcripts
+>   (`~/.claude/projects/**/*.jsonl`), summarised, no secrets reproduced. These reflect
+>   runtime behaviour and internal label strings that are **not** part of the published docs.
+>
+> Where [DOC] and [OBS] agree, the claim is solid. Where only [OBS] is given, treat it as
+> reverse-engineered from runtime output and subject to change between releases.
+
+---
+
+## What this document answers
+
+When **auto mode** is on, Claude Code stops prompting you for most actions. Instead, a
+separate **classifier model** reviews each not-yet-resolved tool call and decides
+*allow / block* on its own — with no human in the loop. This doc explains:
+
+1. Where that classifier sits relative to the rule-based `allow`/`deny`/`ask` system.
+2. The exact decision order, and **why a broad `allow` rule does not protect a command** it
+   appears to match (the question that prompted this doc).
+3. The categories the classifier gates, with real triggers.
+4. How a denial is delivered (non-interactive in auto mode) and the fallback behaviour.
+5. The **legitimate** ways to authorise something the classifier blocks.
+6. The evasion patterns that are both **detected and wrong** — and why.
+
+---
+
+## 1. The two-gate model
+
+A tool call passes through **two independent gates** before it runs. [DOC]
+
+```
+tool call
+   │
+   ├─▶ GATE 1 — the permissions system (rule-based, deterministic)
+   │      PreToolUse hooks → permissions.deny → permissions.ask
+   │      → permission mode → permissions.allow
+   │      (precedence: deny, then ask, then allow — first match wins)
+   │
+   └─▶ GATE 2 — the auto-mode classifier (model-based, only in `auto` mode)
+          runs *after* the permissions system, for anything the rules didn't resolve
+```
+
+> "The classifier is a second gate that runs after the permissions system. For actions that
+> must never run regardless of user intent or classifier configuration, use
+> `permissions.deny` in managed settings, which blocks the action before the classifier is
+> consulted and cannot be overridden." — [DOC] [auto-mode-config](https://code.claude.com/docs/en/auto-mode-config.md)
+
+Key consequences:
+
+- **`permissions.deny` always wins** — it blocks *before* the classifier is even consulted.
+- The classifier only exists in `auto` mode. In `default`/`acceptEdits`/`plan`/`dontAsk`/
+  `bypassPermissions`, gate 2 is absent; unresolved actions prompt, get auto-approved, or get
+  auto-denied per the mode. [DOC] [permission-modes](https://code.claude.com/docs/en/permission-modes.md)
+
+---
+
+## 2. What auto mode is (documented)
+
+[DOC] [permission-modes](https://code.claude.com/docs/en/permission-modes.md):
+
+> "Auto mode lets Claude execute without routine permission prompts. A separate classifier
+> model reviews actions before they run, blocking anything that escalates beyond your request,
+> targets unrecognized infrastructure, or appears driven by hostile content Claude read.
+> Explicit ask rules still force a prompt."
+
+It is a **research preview**: *"It reduces prompts but does not guarantee safety."* [DOC]
+
+| Property | Value | Source |
+|---|---|---|
+| Mode name | `auto` | [DOC] |
+| Minimum version | Claude Code v2.1.83 | [DOC] |
+| Classifier model | Server-configured, independent of your `/model`. Anthropic API: Opus 4.6+ or Sonnet 4.6. Bedrock/Vertex/Foundry: Opus 4.7 / 4.8 only. | [DOC] |
+| What the classifier reads | User messages, tool calls, your `CLAUDE.md`. **Tool *results* are stripped** (a server-side probe scans them for hostile content first). | [DOC] |
+| Enable | `Shift+Tab` to cycle (opt-in prompt first); or `defaultMode: "auto"` in **`~/.claude/settings.json`**. | [DOC] |
+| `defaultMode: "auto"` in `.claude/settings.json` or `.local.json` | **Ignored** (v2.1.142+) — "a repository cannot grant itself auto mode." Must live in user settings. | [DOC] |
+| Bedrock/Vertex/Foundry | Off until `CLAUDE_CODE_ENABLE_AUTO_MODE=1` (v2.1.158+). | [DOC] |
+| Admin lock-off | `permissions.disableAutoMode: "disable"` in managed settings. | [DOC] |
+
+The fact that a repo can't grant itself auto mode (and can't inject `autoMode` rules via
+shared `.claude/settings.json`) is the same design principle behind the **Self-Modification**
+denials in §5 — the agent's own working tree must not be able to widen its own autonomy.
+
+---
+
+## 3. The decision order inside auto mode
+
+Verbatim from [DOC] [permission-modes](https://code.claude.com/docs/en/permission-modes.md)
+("How the classifier evaluates actions"), first matching step wins:
+
+1. Actions matching your **allow or deny rules resolve immediately**, *except* writes to
+   **protected paths**, which route to the classifier even when an allow rule matches.
+2. **Read-only actions and file edits in your working directory are auto-approved**, except
+   writes to protected paths.
+3. **Everything else goes to the classifier.**
+4. If the classifier blocks, Claude receives the reason and tries an alternative.
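
The ordering above can be sketched as a small routing function. This is an illustrative model only, not Claude Code's implementation: the `route` function, the `Outcome` enum, the action keys (`command`, `read_only`, `in_workdir_edit`, `protected_write`), and exact-string rule matching are all assumptions made for the example.

```python
from enum import Enum


class Outcome(Enum):
    ALLOW = "allow"
    DENY = "deny"
    CLASSIFIER = "classifier"


def route(action, allow_rules, deny_rules):
    """Return where an action is resolved under the documented auto-mode order.

    `action` is a hypothetical dict; rules are matched by exact string,
    which is a simplification of real permission-rule matching.
    """
    cmd = action["command"]
    protected = action.get("protected_write", False)

    # Step 1: deny and allow rules resolve immediately, except that a write
    # to a protected path is never resolved by an allow rule.
    if cmd in deny_rules:
        return Outcome.DENY
    if cmd in allow_rules and not protected:
        return Outcome.ALLOW

    # Step 2: read-only actions and working-directory edits are auto-approved,
    # again excluding protected-path writes.
    if (action.get("read_only") or action.get("in_workdir_edit")) and not protected:
        return Outcome.ALLOW

    # Step 3: everything else is handed to the classifier.
    return Outcome.CLASSIFIER
```

For instance, `route({"command": "Write(.env)", "protected_write": True}, {"Write(.env)"}, set())` returns `Outcome.CLASSIFIER` even though an allow rule matches, which mirrors the protected-path exception in steps 1 and 2.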
+
+### 3.1 The broad-allow-rule drop (this is the crux)
+
+> "On entering auto mode, broad allow rules that grant arbitrary code execution are dropped:
+> - Blanket `Bash(*)` or `PowerShell(*)`
+> - Wildcarded interpreters like `Bash(python*)`
+> - Package-manager run commands
+> - `Agent` allow rules
+>
+> Narrow rules like `Bash(npm test)` carry over. Dropped rules are restored when you leave
+> auto mode." — [DOC] [permission-modes](https://code.claude.com/docs/en/permission-modes.md)
+
+**The classifier also reads your `CLAUDE.md`** and treats it as steering input, which is why
+auto mode enforces your own stated rules (see §5.2, user-boundary enforcement).
+
+---
+
+## 4. Allow-rule vs classifier — when rules short-circuit, when they don't
+
+This directly answers the observed puzzle: *an existing `Bash(bash:*)` allow rule did **not**
+save a `bash wrapper.sh` call once the classifier saw the wrapper was hiding
+`--permission-mode bypassPermissions`.*
+
+The accurate model (combining §3.1 + the §3 order):
+
+| Situation | Does the allow rule short-circuit the classifier? |
+|---|---|
+| **Narrow** allow rule (`Bash(npm test)`), benign target | **Yes.** Resolves at step 1; never reaches the classifier. [DOC] |
+| **Broad** allow rule (`Bash(*)`, `Bash(python*)`, `Bash(bash:*)`, pkg-manager runners, `Agent(...)`) | **No** — the rule is *dropped on entry* to auto mode, so there's nothing to match at step 1. The action falls to step 3 (classifier). [DOC] |
+| Write to a **protected path** (`.git`, `.claude`, `.env`, shell rc, `.npmrc`, …) | **No.** Always routes to the classifier even if an allow rule matches. [DOC] |
+
+So the `Bash(bash:*)` case was **not** "a high-severity category overriding a live allow
+rule." It was: `Bash(bash:*)` is a *wildcarded interpreter* → **dropped on entry** → the
+`bash wrapper.sh` command reached the classifier → the classifier read the wrapper's real
+intent (launch a `bypassPermissions` agent) → blocked it as **Auto-Mode Bypass + Create
+Unsafe Agents**. [DOC] mechanism + [OBS] labels.
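
A rough way to reason about which allow rules survive entry to auto mode is to filter them with a heuristic. The sketch below is an approximation written for this explanation: `is_broad`, `surviving_rules`, the `INTERPRETERS` and `PKG_RUNNERS` lists, and the requirement that package-manager runners carry a wildcard are all assumptions, not the logic Claude Code actually uses.

```python
import re

# Hypothetical lists for illustration; the real criteria are not published as code.
INTERPRETERS = {"python", "node", "bash", "sh", "ruby", "perl", "pwsh"}
PKG_RUNNERS = ("npm run", "yarn run", "pnpm run", "npx")


def is_broad(rule: str) -> bool:
    """Heuristically decide whether an allow rule grants arbitrary code execution."""
    m = re.fullmatch(r"(\w+)(?:\((.*)\))?", rule)
    if m is None:
        return False
    tool, pattern = m.group(1), m.group(2) or ""

    if tool == "Agent":  # Agent allow rules are dropped
        return True
    if tool not in ("Bash", "PowerShell"):
        return False
    if pattern == "*":  # blanket Bash(*) / PowerShell(*)
        return True
    if "*" not in pattern:  # no wildcard: treated as narrow here
        return False

    # First token of the pattern, with a version suffix such as "3" removed.
    head = re.split(r"[\s:*]", pattern, maxsplit=1)[0].rstrip("0123456789.")
    return head in INTERPRETERS or pattern.startswith(PKG_RUNNERS)


def surviving_rules(allow_rules):
    """Return the allow rules that would carry over under this heuristic."""
    return [rule for rule in allow_rules if not is_broad(rule)]
```

Under these assumptions, `Bash(bash:*)` is classed as a wildcarded interpreter and filtered out, while `Bash(npm test)` carries over, which matches the table above.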
+
+### 4.1 Inside the classifier: hard vs soft, and "explicit intent"
+
+Once an action reaches the classifier, a four-tier precedence applies. [DOC]
+[auto-mode-config](https://code.claude.com/docs/en/auto-mode-config.md):
+
+- **`hard_deny`** — blocks unconditionally. *User intent and `allow` exceptions do not apply.*
+  The built-in hard-deny list includes **data exfiltration** and **auto-mode bypass** rules. [DOC]
+- **`soft_deny`** — blocks next; *can* be cleared by `allow` exceptions or explicit user intent.
+  Built-ins include force-push, `curl | bash`, production deploys. [DOC]
+- **`allow`** — exceptions that override matching `soft_deny`. [DOC]
+- **Explicit user intent** — overrides remaining soft blocks **only when specific**:
+
+  > "General requests don't count as explicit intent. Asking Claude to 'clean up the repo'
+  > does not authorize force-pushing, but asking Claude to 'force-push this branch' does." — [DOC]
+
+This is the principle behind nearly every observed denial: a *general* instruction
+("run an unattended loop", "do the backfill") is **not** authorisation for a *specific*
+high-blast-radius action it happens to imply. The classifier asks whether *this exact action*
+was authorised — and `bypassPermissions` self-replication lands in `hard_deny`, which even
+specific intent can't clear without an explicit user/admin config change.
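
The precedence described in this section can be written as a short decision function. It is a simplified sketch: the real classifier is a model, not a boolean table, so the function name `classify` and its four boolean inputs are hypothetical stand-ins, and treating an action that matches no deny list as allowed is an assumption.

```python
def classify(matches_hard_deny: bool,
             matches_soft_deny: bool,
             matches_allow_exception: bool,
             explicit_intent: bool) -> str:
    """Apply hard_deny > soft_deny > allow exception / explicit intent."""
    # hard_deny blocks unconditionally; intent and allow exceptions are ignored.
    if matches_hard_deny:
        return "block"

    # soft_deny blocks unless an allow exception or specific user intent clears it.
    if matches_soft_deny:
        if matches_allow_exception or explicit_intent:
            return "allow"
        return "block"

    # Assumption: nothing matched, so the action proceeds.
    return "allow"
```

A force-push requested with "force-push this branch" would correspond to `classify(False, True, False, True)`, which returns `"allow"`; launching a `bypassPermissions` agent corresponds to `matches_hard_deny=True`, which returns `"block"` whatever the other inputs are.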
+
+---
+
+## 5. Gating categories
+
+Two complementary views. The **[DOC] view** is the published behavioural lists; the **[OBS]
+view** is the short internal label strings the classifier emits inside denial reasons (only a
+couple of which — "auto-mode bypass", "data exfiltration" — also appear in the docs).
+
+### 5.1 Documented behavioural lists [DOC]
+
+[permission-modes](https://code.claude.com/docs/en/permission-modes.md) "What the classifier
+blocks by default":
+
+**Blocked by default:** downloading and executing code (`curl | bash`); sending sensitive
+data to external endpoints; production deploys and migrations; mass deletion on cloud storage;
+granting IAM/repo permissions; modifying shared infrastructure; irreversibly destroying files
+that existed before the session; force-push or pushing directly to `main`; `git reset --hard`,
+`git checkout -- .`, `git restore .`, `git clean -fd`, `git stash drop/clear`;
+`git commit --amend` on a HEAD commit not created this session; `terraform/pulumi/cdk/
+terragrunt destroy` (or applying a destructive plan).
+
+**Allowed by default:** local file operations in the working directory; installing
+dependencies declared in lock files/manifests; reading `.env` and sending credentials to
+*their matching* API; read-only HTTP; pushing to the branch you started on or one Claude created.
+
+Run `claude auto-mode defaults` to print the full built-in rule lists as JSON. [DOC]
+
+### 5.2 Observed internal category labels [OBS]
+
+These Title-Case tags appear verbatim inside `Reason:` strings of real denials. They are the
+classifier's own labels; only "Auto-Mode Bypass" / "data exfiltration" are echoed in the docs.
+
+| Label [OBS] | 1-line trigger | Real observed example (paraphrased reason) |
+|---|---|---|
+| **Create Unsafe Agents** | Spawning an autonomous `claude -p` loop with bypassed permissions | "Launches an autonomous agent loop with `--permission-mode bypassPermissions` and no human approval gate." |
+| **Auto-Mode Bypass** | Structuring a command to dodge the classifier itself | "…wrapper script explicitly built to route the bypass flag through `bash` to dodge the classifier — Auto-Mode Bypass and Create Unsafe Agents." |
+| **Self-Modification** | Agent editing its own startup config to widen autonomy | "Editing `.claude/settings.json` to add a permission allow-rule is Self-Modification." / "Editing the global `~/.claude/settings.json` to wire in a new PostToolUse hook is Self-Modification of agent startup config." |
+| **Credential Exploration** | Systematically scanning secret stores beyond task scope | "Systematically scanning the OS keyring across 36 guessed service/user-name combinations… is Credential Exploration." Also `.env` sweeps, `cmdkey /list`. |
+| **Unauthorized Persistence** | Installing something that survives the session | "Installs a persistent SYSTEM scheduled task via elevated PowerShell (Unauthorized Persistence)." |
+| **Modify Shared Resources** | Writing shared infra/config outside project scope | "Registering a persistent managed service in the shared orchestration system (Unauthorized Persistence + Modify Shared Resources + Expose Local Services)." |
+| **Expose Local Services** | Making a local service network-reachable | (same composite reason as above) |
+| **Production Deploy** | Touching live/production systems | "Applying an unapplied migration (raw CREATE TABLE/INDEX/GRANT DDL) directly to the shared production Supabase database — a Production Deploy / Modify Shared Resources." |
+| **Security Weaken** | Disabling a safety control | "Runs a script with `-ExecutionPolicy Bypass` (Security Weaken)." |
+
+In addition to the named categories, a large class of denials [OBS] is **user-boundary
+enforcement** — the classifier blocking things *you* forbade, with no fixed category tag:
+
+- `CLAUDE.md` rules: "Command uses `cd … && …` chaining, which the user's CLAUDE.md
+  explicitly prohibits."
+- Standing conversational boundaries: "User set a standing boundary 'don't launch another
+  trial'… that boundary was never lifted." (Matches the documented "boundaries you state in
+  conversation" behaviour, [DOC].)
+- Project rules: skipping a release-review preview before `gh release create`; overwriting
+  another session's worktree; fabricating a result file; deleting with `rm -rf` after the user
+  said never to.
+
+> ⚠️ Boundaries you state in chat are **re-read from the transcript on each check** and are
+> **not stored as rules** — context compaction can drop the message that set them. For a hard
+> guarantee, use a `permissions.deny` rule. [DOC]
+
+---
+
+## 6. The denial experience
+
+### 6.1 In auto mode, a denial is non-interactive [OBS]
+
+When auto mode is **on**, a blocked action is **not** surfaced as an approval prompt. It comes
+back to the model as a `tool_result` with `is_error: true` and this fixed template:
+
+> "Permission for this action was denied by the **Claude Code auto mode classifier**. Reason:
+> `<reason>`.. If you have other tasks that don't depend on this action, continue working on
+> those. IMPORTANT: You *may* attempt to accomplish this action using other tools that might
+> naturally be used to accomplish this goal, e.g. using head instead of cat. But you *should
+> not* attempt to work around this denial in malicious ways… If you believe this capability is
+> essential to complete the user's request, STOP and explain to the user what you were trying
+> to do and why you need this permission. Let the user decide how to proceed. To allow this
+> type of action in the future, the user can add a Bash permission rule to their settings."
+
+With auto mode **off**, a non-allowlisted action instead surfaces to the human as a normal
+approval prompt. Apart from explicit `ask` rules, which prompt in every mode, that is the way
+to get manual approval (see §7). [DOC]/[OBS]
+
+### 6.2 Fail-closed fallback [OBS]/[DOC]
+
+If the classifier can't evaluate an action, it **blocks** (fail-closed). Observed message [OBS]:
+
+> "Auto mode could not evaluate this action and is blocking it for safety — run with `--debug`
+> for details."
+
+The docs describe the same situation as a transient classifier outage where a message says
+auto mode "cannot determine the safety" of an action — distinct from auto mode being
+*unavailable* (an unmet requirement, not transient). [DOC]
+[errors#auto-mode-cannot-determine-the-safety-of-an-action](https://code.claude.com/docs/en/errors).
+
+### 6.3 Repeated-denial fallback [DOC]
+
+> "If the classifier blocks an action 3 times in a row or 20 times total, auto mode pauses and
+> Claude Code resumes prompting… These thresholds are not configurable. Any allowed action
+> resets the consecutive counter, while the total counter persists for the session."
+
+In **non-interactive `-p` mode**, repeated blocks **abort the session** (no human to prompt). [DOC]
+This is exactly why unattended `claude -p` batch agents die on a hard denial instead of pausing.
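
The two thresholds and the reset rule quoted above can be modelled with a pair of counters. This is a minimal sketch of the documented behaviour, not Claude Code's code: the `DenialTracker` class and its `record` method are hypothetical names; only the numbers 3 and 20 and the reset rule come from the quoted documentation.

```python
from dataclasses import dataclass

MAX_CONSECUTIVE = 3  # blocks in a row before auto mode pauses
MAX_TOTAL = 20       # blocks per session before auto mode pauses


@dataclass
class DenialTracker:
    consecutive: int = 0
    total: int = 0

    def record(self, allowed: bool) -> bool:
        """Record one classifier decision; return True if auto mode should pause."""
        if allowed:
            # Any allowed action resets the consecutive counter only.
            self.consecutive = 0
            return False
        self.consecutive += 1
        self.total += 1  # persists for the whole session
        return self.consecutive >= MAX_CONSECUTIVE or self.total >= MAX_TOTAL
```

In an interactive session a `True` result corresponds to Claude Code resuming prompts; in a `claude -p` run there is nobody to prompt, so the same condition ends the session.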
+
+Denials are recorded in `/permissions` → **Recently denied** tab; press `r` to mark one for a
+manual-approval retry. [DOC]
+
+---
+
+## 7. Legitimately authorising a blocked action
+
+Decision tree, cheapest/safest first. **Never** route around the classifier (see §8).
+
+1. **State specific intent in the conversation.** For a `soft_deny` action, a *specific*
+   instruction lifts the block ("force-push this branch", not "clean up the repo"). Does **not**
+   work for `hard_deny` (data exfiltration, auto-mode bypass). [DOC]
+
+2. **You add a narrow `allow` rule** (you, not the agent). A surviving narrow rule short-circuits
+   the classifier (§4). Keep it specific: `Bash(npm test)` carries over; `Bash(*)` / `Bash(bash:*)`
+   are dropped on entry and won't help. Edit `~/.claude/settings.json` or `.claude/settings.json`
+   yourself — the agent editing it is **Self-Modification** (§5). [DOC]/[OBS]
+
+3. **Add an `ask` rule** if you want a prompt rather than silent approval. Explicit `ask` rules
+   fire in *every* mode, including auto and bypassPermissions. [DOC]
+
+4. **Admin: widen the classifier's trust** via `autoMode.environment` (prose, not regex —
+   describe your repos/buckets/domains) and, if needed, `autoMode.allow` / `soft_deny` /
+   `hard_deny`. Lives in `~/.claude/settings.json` or managed settings (**not** shared project
+   settings). Validate with `claude auto-mode config` / `critique`. Always keep `"$defaults"`
+   unless you intend to replace the whole list. [DOC]
+   [auto-mode-config](https://code.claude.com/docs/en/auto-mode-config.md)
+
+5. **Switch auto mode off → get manual approval.** `Shift+Tab` back to `default` (or start
+   without `auto`). Now non-allowlisted actions prompt you interactively — the supported path
+   when someone says "just ask me for manual approval." [DOC]
+
+6. **Pick the right permission mode** for the job instead of fighting the classifier:
+   - `default` — review each action (most oversight).
+   - `acceptEdits` — auto-approve in-scope edits + common fs commands; everything else prompts.
+   - `plan` — read/explore only.
+   - `auto` — classifier-gated autonomy (this doc).
+   - `dontAsk` — auto-deny anything not pre-approved (locked-down CI).
+   - `bypassPermissions` — skip checks entirely (see 7). [DOC]
+
+7. **A dedicated `--dangerously-skip-permissions` / `--permission-mode bypassPermissions`
+   session** — for genuinely autonomous loops. **Risk tradeoffs, read before using:**
+   - Disables prompts *and* safety checks; as of v2.1.126 even protected-path writes execute. [DOC]
+   - "Offers no protection against prompt injection or unintended actions." [DOC]
+   - Only safe in an **isolated container/VM/dev-container without internet access**, where Claude
+     Code can't damage the host. [DOC]
+   - Refuses to start as root/sudo on Linux/macOS (unless in a recognised sandbox). [DOC]
+   - Explicit `ask` rules still prompt; `rm -rf /` and `rm -rf ~` still hit a circuit-breaker. [DOC]
+   - Cannot be entered mid-session — must launch with the flag. [DOC]
+   - Admin kill-switch: `permissions.disableBypassPermissionsMode: "disable"`. [DOC]
+   - **For background-safety with far fewer prompts, the docs explicitly steer you to auto mode
+     instead of bypassPermissions.** [DOC]
+
+8. **Hard guarantees** (the other direction): `permissions.deny` blocks before the classifier
+   and can't be overridden; `permissions.disableAutoMode` / `disableBypassPermissionsMode`
+   lock modes off in managed settings. React to denials programmatically with the
+   `PermissionDenied` hook. [DOC] [hooks](https://code.claude.com/docs/en/hooks)
+
+### 7.9 Running headless / unattended `claude -p` sessions
+
+The classifier does **not** block headless mode — `claude -p` is fully supported. [DOC]
+[headless](https://code.claude.com/docs/en/headless). It blocks one specific shape: an
+**auto-mode session silently spawning an ungated, unattended child** (`bypassPermissions` =
+"no approval gates" = **Create Unsafe Agents** [OBS]). Two independent fixes; either works,
+combine for best result.
+
+**Fix 1 — move the launch outside the auto-mode session.** The classifier only evaluates tool
+calls *inside* an auto-mode session. A human — or a human-configured Task Scheduler / cron /
+CI runner / plain script — running `claude -p …` is the authoriser, with no parent classifier
+in the loop. An unattended build loop should be *a scheduler/script that invokes `claude -p`*,
+not a Claude session that tries to launch the loop. Don't run the orchestrator session itself
+in auto mode if its job is spawning agents. [DOC]/[OBS]
+
+**Fix 2 — give the child gates instead of `bypassPermissions`.** The denial is about the
+*ungated* property, not headless-ness; [DOC] explicitly steers you here ("For background safety
+checks with far fewer prompts, use auto mode instead"). Pick the least privilege that still
+lets the job run:
+
+| Headless profile | Behaviour | Use for |
+|---|---|---|
+| `--permission-mode dontAsk` + curated `permissions.allow` | Auto-denies anything not pre-approved; read-only Bash always allowed; fully non-interactive. | Locked-down CI / unattended workers (**recommended default**). |
+| `--permission-mode auto` | Classifier-gated autonomy; configure `autoMode.environment` for your infra. In `-p`, repeated blocks abort the session. | Long "trust-the-direction" runs. |
+| `--permission-mode acceptEdits` + allow rules | Edits + common fs commands auto-approved; other Bash needs an allow rule (no prompt fires in `-p`). | Edit-heavy tasks with a known command set. |
+| `--dangerously-skip-permissions` (= `bypassPermissions`) | No gates at all. Refuses root/sudo; `ask` rules and `rm -rf /` \| `~` still circuit-break. | **Only** inside an isolated container/VM/devcontainer without internet. |
+
+All four are [DOC] [permission-modes](https://code.claude.com/docs/en/permission-modes.md).
+
+**The real fork for build workers:** *enumerate* (dontAsk + allowlist — runs anywhere, safe)
+vs *isolate* (container + bypassPermissions — full power, needs a sandbox). Reaching for
+`bypassPermissions` on the host *to avoid enumerating permissions* is precisely the pattern the
+classifier blocks.
+
+**Subagents can't escalate.** Subagents launched via the Agent/Task tool inherit the parent's
+mode, and the classifier uses the parent mode; `permissionMode` in subagent frontmatter is
+ignored. [DOC] So a full-bypass worker fleet must be the isolated-container path launched
+*outside* the auto-mode session — not an in-session subagent.
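
A minimal sketch of Fix 1 and Fix 2 together: a plain script, run by cron or a CI runner, that builds the `claude -p` command line with a gated mode. It uses only the `-p` and `--permission-mode` flags named in this document. `build_headless_command` and its `in_isolated_container` argument are hypothetical, and the function trusts the caller's claim of isolation rather than verifying it.

```python
import shlex

# Gated modes from the headless-profile table; bypassPermissions is handled apart.
GATED_MODES = {"dontAsk", "auto", "acceptEdits"}


def build_headless_command(prompt, mode="dontAsk", in_isolated_container=False):
    """Return the argv a scheduler (cron, CI, Task Scheduler) would execute."""
    if mode == "bypassPermissions":
        # Assumption: the caller asserts isolation; nothing here checks it.
        if not in_isolated_container:
            raise ValueError("bypassPermissions only inside an isolated container/VM")
    elif mode not in GATED_MODES:
        raise ValueError(f"unknown headless mode: {mode}")
    return ["claude", "-p", "--permission-mode", mode, prompt]


if __name__ == "__main__":
    argv = build_headless_command("Triage open PRs and write STATE.md")
    print(shlex.join(argv))   # the command line the scheduler would run
```

Because the script, not a Claude session, is the launcher, no parent classifier sits in the path. The default of `dontAsk` follows the table's recommended profile; a curated `permissions.allow` list would still have to be configured separately.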
+
+---
+
+## 8. Do NOT do this — evasion is detected and wrong
+
+The classifier explicitly watches for attempts to **route around itself**, and the denial
+template tells the model in plain terms not to. Two patterns seen [OBS] and why they fail:
+
+### 8.1 Wrapper scripts that hide a flag → **Auto-Mode Bypass** (a `hard_deny`)
+
+Putting `claude -p --permission-mode bypassPermissions …` inside `wrapper.sh` and running
+`bash wrapper.sh` to slip past a `Bash(bash:*)` allow rule **does not work**:
+
+- The broad `Bash(bash:*)` rule is dropped on entry (§4), so it never matches.
+- The classifier reads the wrapper's *intent*, not just the literal command, and flags it:
+  > "Launches a detached autonomous `claude -p --permission-mode bypassPermissions` agent via a
+  > wrapper script built to route the bypass flag through bash to evade the classifier." [OBS]
+- "Auto-mode bypass" is in the **`hard_deny`** tier — *unconditional*, not clearable by user
+  intent or an `allow` exception. [DOC]
+- It is *wrong* regardless of detection: the denial message says *"you should not attempt to
+  work around this denial in malicious ways… do not attempt to bypass the intent behind this
+  denial."* [OBS] The classifier exists because regex can't tell that `python -c "..."` or a
+  shell wrapper is equivalent to the dangerous thing it wraps. [DOC engineering deep-dive]
+
+### 8.2 Agent editing settings/config to widen its own permissions → **Self-Modification**
+
+Having the agent write to `.claude/settings.json` / `~/.claude/settings.json` (to add an allow
+rule, wire a hook, or flip `defaultMode: "auto"`) is blocked even when the user asked for the
+*outcome*:
+
+> "Editing `.claude/settings.json` to add a permission allow-rule is Self-Modification; the user
+> chose 'Add allow-rule (I'll do it)', indicating they would add it themselves." [OBS]
+
+This mirrors the documented design: a repo can't grant itself auto mode, and shared
+`.claude/settings.json` can't inject `autoMode` rules. [DOC] **The human edits the config.**
+
+### Rule of thumb
+
+If the *outcome* is blocked, the answer is to **authorise it** (§7), never to **disguise it**.
+When a capability is genuinely needed and you can't authorise it cheaply, the correct move is
+the one the denial message names: **stop and ask the human.**
+
+---
+
+## 9. Quick reference
+
+### Settings / flags [DOC]
+
+| Key / flag | Effect |
+|---|---|
+| `permissions.defaultMode: "auto"` | Start in auto mode (**user settings only**; ignored in project/local). |
+| `permissions.disableAutoMode: "disable"` | Admin lock-off of auto mode (managed settings). |
+| `permissions.disableBypassPermissionsMode: "disable"` | Admin lock-off of bypassPermissions. |
+| `permissions.deny` / `ask` / `allow` | Rule-based gate 1; deny > ask > allow, first match wins; deny runs before the classifier. |
+| `autoMode.environment` | Prose description of trusted repos/buckets/domains. Include `"$defaults"`. |
+| `autoMode.hard_deny` / `soft_deny` / `allow` | Override classifier rule tiers. Keep `"$defaults"` unless replacing wholesale. |
+| `CLAUDE_CODE_ENABLE_AUTO_MODE=1` | Enable auto mode on Bedrock/Vertex/Foundry. |
+| `--permission-mode <mode>` | `default` / `acceptEdits` / `plan` / `auto` / `dontAsk` / `bypassPermissions`. |
+| `--dangerously-skip-permissions` | Alias for `--permission-mode bypassPermissions`. |
+| `--allow-dangerously-skip-permissions` | Adds bypass to the `Shift+Tab` cycle without activating it. |
+
+### CLI / inspection [DOC]
+
+| Command | Purpose |
+|---|---|
+| `claude auto-mode defaults` | Print built-in `environment`/`allow`/`soft_deny`/`hard_deny` as JSON. |
+| `claude auto-mode config` | Print the *effective* config (`"$defaults"` expanded). |
+| `claude auto-mode critique` | AI review of your custom rules (ambiguous / redundant / false-positive-prone). |
+| `/permissions` → Recently denied (`r`) | Review classifier denials; retry one with manual approval. |
+
+### Hooks [DOC] [hooks](https://code.claude.com/docs/en/hooks)
+
+- `PreToolUse` — custom allow/deny/ask logic before a tool runs (gate 1).
+- `PermissionRequest` — fires when a permission dialog would appear.
+- `PermissionDenied` — react to a classifier denial (e.g. signal a retry).
+
+---
+
+## 10. Sources
+
+**Documented [DOC]** (fetched 2026-06-22):
+
+- [Permission modes](https://code.claude.com/docs/en/permission-modes.md) — modes, auto mode,
+  decision order, broad-rule drop, blocked/allowed lists, fallback thresholds, protected paths.
+- [Configure auto mode](https://code.claude.com/docs/en/auto-mode-config.md) — `autoMode.*`
+  config, hard/soft/allow tiers, explicit-intent rule, `claude auto-mode` subcommands.
+- [Permissions](https://code.claude.com/docs/en/permissions.md) — rule syntax + precedence.
+- [Settings](https://code.claude.com/docs/en/settings.md) — settings files + precedence.
+- [Hooks](https://code.claude.com/docs/en/hooks) — `PreToolUse` / `PermissionRequest` /
+  `PermissionDenied`.
+- [Security](https://code.claude.com/docs/en/security.md) — safeguards, protected paths.
+- [Errors](https://code.claude.com/docs/en/errors) — "auto mode cannot determine the safety…".
+- Engineering deep-dive: [How we built Claude Code auto mode](https://www.anthropic.com/engineering/claude-code-auto-mode);
+  announcement: [claude.com/blog/auto-mode](https://claude.com/blog/auto-mode).
+
+**Observed [OBS]:** denial records extracted from local session transcripts under
+`~/.claude/projects/**/*.jsonl` (≈50+ sessions where the classifier fired), 2026. Summarised;
+no credentials or private content reproduced. Internal category label strings and the exact
+denial-message template are runtime artifacts, **not** published API, and may change.

+ 1 - 1
docs/PLAN.md

@@ -16,7 +16,7 @@
 | Component | Count | Notes |
 |-----------|-------|-------|
 | Agents | 3 | Pure context-isolation/worker roles only: git-agent (background commits/PRs), firecrawl-expert (noisy scrapes), project-organizer (bulk restructure) |
-| Skills | 94 | Operational skills, CLI tools, workflows, diagnostics, security |
+| Skills | 95 | Operational skills, CLI tools, workflows, diagnostics, security |
 | Commands | 2 | Session management (sync, save) |
 | Rules | 7 | cli-tools, commit-style, naming-conventions, prompt-injection, skill-agent-updates, supply-chain, worktree-boundaries |
 | Output Styles | 13 | Vesper, Spartan, Mentor, Executive, Pair, Atlas, Coach, Harbour, Meridian, Noir, Roast, Sage, Scout |

+ 1 - 0
skills/_lib/term.sh

@@ -202,6 +202,7 @@ __term_lookup() {
     BRAND::supply-chain)        entry="🛡|[S]" ;;
     BRAND::net-ops)             entry="📡|[N]" ;;
     BRAND::adr)                 entry="📒|[A]" ;;
+    BRAND::loop)                entry="🔁|[L]" ;;
     BRAND::terraform)           entry="🏗|[T]" ;;
     BRAND::claude)              entry="✶|[C]" ;;
     BRAND::play)                entry="🎭|[P]" ;;

+ 253 - 0
skills/loop-ops/SKILL.md

@@ -0,0 +1,253 @@
+---
+name: loop-ops
+description: "Design, scaffold, and safely run OUTER loops — scheduled discover→triage→implement→verify→escalate-or-land agent loops, the orchestration layer above a single run. Risk-tier ladder (L1 report → L2 assisted → L3 unattended) mapped onto Claude Code's permission model, a persistent STATE/run-log/budget spine, a production pattern catalog, multi-loop coordination, and a kill switch. Composes iterate (inner loop), fleet-worker (spawn), fleet-ops (land), and native /loop + /schedule. Triggers on: loop engineering, outer loop, loop design, design a loop, scheduled agent, autonomous loop, background agent loop, PR babysitter, CI sweeper, dependency sweeper, changelog drafter, issue triage, daily triage, loop audit, loop cost, loop readiness, ralph loop, agent harness, escalation gate, risk tier, kill switch, run it overnight on a schedule."
+when_to_use: "Use when designing or running a recurring/scheduled agent loop rather than a one-shot task — e.g. 'set up a loop that triages PRs every 10 minutes', 'design an autonomous CI-failure sweeper', 'how risky is this loop / is it ready to run unattended', 'estimate what this loop costs per month', 'build a loop-engineering setup'. For a single-session improvement loop against one metric, use iterate instead."
+license: MIT
+allowed-tools: "Read Write Edit Bash Glob Grep"
+metadata:
+  author: claude-mods
+  related-skills: "iterate, fleet-ops, fleet-worker, pigeon, git-ops, ci-cd-ops"
+---
+
+# Loop Ops — outer-loop design discipline
+
+**A loop is not a prompt.** Turn-by-turn prompting puts you in the loop forever. *Loop
+engineering* inverts it: you design a **recurring process with memory, verification, and
+boundaries** that discovers work, hands it to agents, verifies the result, and decides —
+on a schedule or until a goal is met — whether to **land it or escalate to a human**.
+
+> "You shouldn't be prompting coding agents anymore. You should be designing the loops
+> that prompt your agents." — Peter Steinberger
+
+This skill is the **outer loop**: the orchestration layer *above* a single agent run. It
+is the twin of [`iterate`](../iterate/SKILL.md) — `iterate` is the *inner* loop (one
+metric, one session, git-as-memory); `loop-ops` is the design discipline for the loop
+that *schedules and gates* inner runs. It does not reimplement spawning or landing; it
+**composes** what this repo already ships.
+
+---
+
+## The six primitives → what owns each here
+
+Every durable loop rests on six primitives. The discipline is wiring them; the parts
+already exist:
+
+| Primitive | What it is | Owned in claude-mods by |
+|---|---|---|
+| **Schedule** | fire the loop on a cadence | native `/loop`, `/schedule` (cron agents), `ScheduleWakeup` |
+| **Worktree** | isolated, discardable execution context | `git-ops` worktrees, `fleet-worker` (per-task worktree) |
+| **Skills** | persistent project knowledge the run loads | this repo's skill layer + your `CLAUDE.md` |
+| **Sub-agents** | maker/checker separation | `Agent`/`Task`; dispatching skills (`review`, `testgen`) |
+| **Connectors** | reach tickets / CI / chat | MCP tools, `gh`, `github-ops` |
+| **+ State** | a durable spine *outside* the conversation | `STATE.md` + run-log + budget (this skill) |
+
+The inner improvement loop is `iterate`; cheap parallel makers are `fleet-worker`; the
+test-gated merge queue is `fleet-ops`; inter-loop signalling is `pigeon`. `loop-ops` is
+the doctrine that connects them.
+
+## Loop anatomy
+
+```
+   ┌──────────────────────────────────────────────────────────────┐
+   │  SCHEDULE (cadence)                                           │
+   │     └─▶ TRIAGE      read STATE.md → pick the next unit of work │
+   │           └─▶ WORKTREE   isolate (git worktree)               │
+   │                 └─▶ MAKER     implementer run (or fleet-worker)│
+   │                       └─▶ CHECKER  verify gate + guard (tests) │
+   │                             └─▶ GATE  safe & allowlisted?      │
+   │                                   ├─ yes → LAND  (commit/PR)   │
+   │                                   └─ no  → ESCALATE (+context) │
+   │     └─▶ write STATE.md, append run-log, decrement budget ──────┘
+```
+
+The **gate** is the load-bearing decision. Everything before it is mechanical; the gate
+is where a loop earns the right to run unattended — or doesn't.
+
+## The risk-tier ladder (the heart of the discipline)
+
+Never start a loop unattended. Graduate it. Each tier maps to a concrete Claude Code
+**permission mode** — full mapping, the headless-profile table, and the *enumerate vs
+isolate* fork in [references/risk-tiers.md](references/risk-tiers.md).
+
+| Tier | Posture | Permission mode | May do | Lands by |
+|---|---|---|---|---|
+| **L1 Report** | read-only discovery + triage | `plan` / `dontAsk`+read allowlist | scan, summarize, propose — **writes nothing** | a human reads the report |
+| **L2 Assisted** | suggest changes, human gates the merge | `dontAsk`+narrow allowlist, or `auto` | edit in a **worktree**, run tests, open a PR | a human approves the PR (or `fleet-ops`) |
+| **L3 Unattended** | autonomous land within a denylist | `bypassPermissions` **in an isolated container only** | commit/merge allowlisted classes | the loop itself, inside its boundary |
+
+The cardinal rule, straight from Claude Code's own gate model: **an unattended loop is a
+*scheduler/script that invokes `claude -p`*, not a Claude session that spawns ungated
+children.** A session in `auto` mode that tries to launch a `--permission-mode
+bypassPermissions` child is blocked as *Create Unsafe Agents* — by design. See
+[references/risk-tiers.md](references/risk-tiers.md) and the repo's
+[auto-mode-classifier reference](../../docs/AUTO-MODE-CLASSIFIER.md).
+
+## The escalation gate
+
+What a loop may **land** vs what it must **escalate** is not a vibe — it mirrors Claude
+Code's classifier tiers. Bake these into the config's `escalation:` field:
+
+- **Always escalate (never auto-land):** force-push, push to `main`, production deploys
+  or migrations, mass deletion, granting IAM/repo permissions, anything destroying
+  pre-session files, editing `.claude/`/settings (self-modification), `curl | bash`.
+- **Safe to auto-land at L2/L3 (when allowlisted):** a green PR on a feature branch,
+  a lockfile patch bump that passes the guard, a generated changelog draft, a label/
+  triage classification, a comment.
+- **The test:** *would a careful human let this happen unattended in this repo?* If the
+  action's blast radius exceeds the loop's stated purpose, it escalates. A general goal
+  ("keep CI green") is **not** authorization for a specific high-blast action it implies.
+
+## The state spine
+
+A loop's memory lives **outside** the conversation, in three files (schemas +
+read/write contract in [references/state-spine.md](references/state-spine.md)):
+
+- **`STATE.md`** — the triage snapshot: priority / watch / noise + a readiness line.
+  Read at the top of every run, rewritten at the end.
+- **`run-log.md`** — append one line per run (timestamp, action, outcome, tokens). The
+  audit trail that answers "what has this loop been doing?"
+- **`loop.config.yaml`** — the loop's definition (goal, tier, cadence, scope, gate,
+  budget, escalation). Scaffolded by `loop-init`, scored by `loop-audit`.
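
The append-only run log described above might look like the sketch below. The four fields (timestamp, action, outcome, tokens) come from the text; the line layout, and the names `format_run_line` and `append_run`, are assumptions, since the real schema lives in `references/state-spine.md`, which is not shown here.

```python
from datetime import datetime, timezone
from pathlib import Path


def format_run_line(action, outcome, tokens, now=None):
    """Build one run-log entry: timestamp, action, outcome, tokens."""
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Hypothetical layout; adapt to the schema in references/state-spine.md.
    return f"- {stamp} | {action} | {outcome} | tokens={tokens}"


def append_run(log_path, action, outcome, tokens, now=None):
    """Append a single line; the log is only appended to, never rewritten."""
    line = format_run_line(action, outcome, tokens, now)
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(line + "\n")
    return line
```

Opening the file in append mode keeps earlier entries intact, which is what makes the log usable as an audit trail. `STATE.md`, by contrast, is rewritten at the end of every run.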
+
+## Pattern catalog
+
+Seven battle-tested shapes, each with a cadence, a risk tier, and an escalation rule.
+Full skeletons in [references/pattern-catalog.md](references/pattern-catalog.md):
+
+| Pattern | Cadence | Tier | One-line job |
+|---|---|---|---|
+| Daily Triage | 1–2 h | L1 | discover + prioritize, report only |
+| PR Babysitter | 5–15 min | L1 | watch review state, surface stuck PRs |
+| CI Sweeper | 5–15 min | L2 | triage build failures, propose a fix |
+| Dependency Sweeper | 6 h–1 d | L2 | patch-only bumps behind the cooldown + guard |
+| Changelog Drafter | 1 d / tag | L1 | draft release notes for human approval |
+| Post-Merge Cleanup | 1–6 h | L1 | hygiene: dead branches, stale flags |
+| Issue Triage | 2 h–1 d | L1 | classify + label, propose only |
+
+Start any pattern at L1. Graduate to L2 only after the L1 reports prove its judgment.
+
+## Multi-loop coordination & the kill switch
+
+Running several loops? Two non-negotiables (detail in
+[references/state-spine.md](references/state-spine.md)):
+
+- **Priority order** prevents collisions: `CI Sweeper → PR Babysitter → Dependency
+  Sweeper → Post-Merge/Changelog → Daily Triage (off-peak)`. A higher-priority loop's
+  worktree wins; lowers defer. Loops signal each other via [`pigeon`](../pigeon/SKILL.md).
+- **A kill switch every loop honors.** A single stop signal — a `PAUSED` sentinel file
+  or a `loop-pause` label — that every loop checks at the top of its run and exits on.
+  No loop ships without one. Put it in `kill_switch:` and check it first.
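
A minimal sketch of the "check it first" rule, assuming the kill switch is the `PAUSED` sentinel file in the loop's own directory (the `.loops/<loop-name>/PAUSED` path from the config template). `is_paused` and `run_once` are hypothetical names, and the `loop-pause` label variant is not covered.

```python
from pathlib import Path


def is_paused(loop_dir):
    """True when the PAUSED sentinel file exists in the loop's directory."""
    return (Path(loop_dir) / "PAUSED").exists()


def run_once(loop_dir, do_work):
    """Check the kill switch before anything else; only then do any work."""
    if is_paused(loop_dir):
        return "paused"
    do_work()
    return "ran"
```

Creating the sentinel (for example with `touch .loops/pr-babysitter/PAUSED`) stops the next run before it reads state or spends tokens; removing the file resumes the loop.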
+
+## Composition map — don't rebuild what exists
+
+| You need to… | Use | Not |
+|---|---|---|
+| improve one metric in one session | [`iterate`](../iterate/SKILL.md) | a hand-rolled inner loop |
+| spawn cheap parallel makers | [`fleet-worker`](../fleet-worker/SKILL.md) | bespoke `claude -p` plumbing |
+| test-gate + land winning branches | [`fleet-ops`](../fleet-ops/SKILL.md) | a manual merge step |
+| fire on a cadence | native `/loop`, `/schedule` | a custom cron in this skill |
+| commit / PR / release | [`git-ops`](../git-ops/SKILL.md), [`github-ops`](../github-ops/SKILL.md) | raw `git push` |
+| signal between loops | [`pigeon`](../pigeon/SKILL.md) | a shared scratch file |
+
+`loop-ops` is the **design layer**; these are the **execution layers**.
+
+---
+
+## Tools
+
+Three scripts, all following the [Skill Resource Protocol](../../docs/SKILL-RESOURCE-PROTOCOL.md)
+(stdout = data, semantic exit codes, `--help` with EXAMPLES, `--json` envelopes). They
+are the legs of a stool: **init** scaffolds, **audit** scores readiness, **cost**
+estimates spend before you commit to a cadence.
+
+### `scripts/loop-init.sh` — scaffold a loop's state spine
+
+```bash
+# Create .loops/pr-babysitter/ with config + STATE.md + run-log.md from templates:
+bash scripts/loop-init.sh --name pr-babysitter --pattern pr-babysitter --tier L1
+
+# Custom dir + cadence, preview without writing:
+bash scripts/loop-init.sh --name dep-sweeper --pattern dependency-sweeper \
+  --tier L2 --cadence 1d --dir .loops --dry-run
+```
+
+Refuses to overwrite a populated `<dir>/<name>/` (exit 5) unless `--force` is passed. Writes
+are atomic. `--dry-run` prints what it would create and writes nothing. stdout = the created
+config path.
+
+### `scripts/loop-audit.sh` — readiness scorer (run before you schedule)
+
+The question this answers: *is this loop safe to turn on at its declared tier?* It scores
+a `loop.config.yaml` against the readiness rubric — gate present, scope bounded,
+escalation defined, guard + worktree at L2+, budget + kill switch set, permission mode
+consistent with tier — and refuses a green light if any **critical** gap exists.
+
+```bash
+bash scripts/loop-audit.sh .loops/pr-babysitter/loop.config.yaml   # exit 0 ready, 10 not ready
+bash scripts/loop-audit.sh --json .loops/dep-sweeper/loop.config.yaml | jq '.data[] | select(.severity=="error")'
+bash scripts/loop-audit.sh --min 80 .loops/ci-sweeper/loop.config.yaml   # raise the score bar
+```
+
+Exit **0** = ready (no errors, score ≥ `--min`), **10** = not ready (findings on stdout),
+`2` usage, `3` config not found, `4` config unparseable. `--strict` counts warnings
+toward the not-ready signal.
+
+### `scripts/loop-cost.py` — token/$ estimate by pattern × cadence × model
+
+Estimate spend **before** committing to a cadence — the cost of an outer loop is
+runs/day × tokens/run × price, and sub-agents multiply it. Pricing reads from
+`assets/model-pricing.json` (date-stamped; [`claude-api-ops`](../claude-api-ops/SKILL.md)
+is the source of truth — run its `check-model-table.py` if you suspect drift).
+
+```bash
+python scripts/loop-cost.py --pattern pr-babysitter --cadence 10m --model claude-haiku-4-5
+python scripts/loop-cost.py --pattern ci-sweeper --cadence 15m --model claude-sonnet-4-6 --days 30 --json
+python scripts/loop-cost.py --list-models      # the pricing table + its as-of date
+```
+
+Exit `0` ok, `2` usage, `3` pricing file missing, `4` bad cadence/model. Output names
+every assumption (runs/day, tokens/run, sub-agent multiplier) — it's an estimate, and it
+says so.
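
The formula above (runs/day × tokens/run × price, times the sub-agent multiplier) can be worked through by hand. The sketch below is an illustration of that arithmetic, not the contents of `loop-cost.py`, which is not shown here. `runs_per_day` and `estimate_cost` are hypothetical names, and only simple cadences such as `10m`, `1h` or `1d` are handled (no cron strings).

```python
MINUTES_PER_DAY = 24 * 60
UNIT_MINUTES = {"m": 1, "h": 60, "d": MINUTES_PER_DAY}


def runs_per_day(cadence):
    """Convert a cadence such as '10m', '1h' or '1d' into runs per day."""
    value, unit = int(cadence[:-1]), cadence[-1]
    return MINUTES_PER_DAY / (value * UNIT_MINUTES[unit])


def estimate_cost(cadence, input_tokens, output_tokens, input_per_mtok,
                  output_per_mtok, subagents=1, days=30):
    """Estimated USD over `days`: runs/day x tokens/run x price x sub-agents."""
    per_run = (input_tokens * input_per_mtok
               + output_tokens * output_per_mtok) / 1_000_000
    return runs_per_day(cadence) * per_run * subagents * days


if __name__ == "__main__":
    # pr-babysitter defaults and claude-haiku-4-5 prices, copied from
    # assets/model-pricing.json in this commit.
    monthly = estimate_cost("10m", 15_000, 3_000, 1.0, 5.0)
    print(f"${monthly:.2f} per 30 days")
```

With those inputs a 10-minute cadence is 144 runs a day at $0.03 per run, or about $129.60 over 30 days. The same arithmetic for `ci-sweeper` at 15 minutes on `claude-sonnet-4-6` with two sub-agents comes to about $2,073.60, which shows how quickly cadence and fan-out dominate the bill.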
+
+---
+
+## End-to-end workflow
+
+1. **Pick a pattern** from the catalog (or `custom`). Start at **L1**.
+2. **Scaffold:** `bash scripts/loop-init.sh --name <n> --pattern <p> --tier L1`.
+3. **Fill `loop.config.yaml`** — the real `goal`, `scope` (bounded globs, never `*`),
+   `verify` gate, `escalation` rule, `budget_tokens`, `kill_switch`.
+4. **Cost it:** `python scripts/loop-cost.py --pattern <p> --cadence <c> --model <m>` —
+   sanity-check the monthly spend against the value.
+5. **Audit it:** `bash scripts/loop-audit.sh .loops/<n>/loop.config.yaml` — fix every
+   error before scheduling. Don't schedule a loop that fails its own audit.
+6. **Schedule** the L1 run with native `/loop` or `/schedule` (read-only — it just
+   writes `STATE.md` + a report).
+7. **Read the reports.** Only after the loop's judgment is proven do you graduate it to
+   **L2** (worktree + guard + `fleet-ops` landing) and re-audit at the higher tier.
+
+## Anti-patterns (these are detected and wrong)
+
+- **Routing around the gate.** Wrapping `claude -p --permission-mode bypassPermissions`
+  in a script to dodge the classifier is *Auto-Mode Bypass* — a `hard_deny` nothing
+  clears. If an outcome is blocked, **authorize it** (a narrow allow rule, or run the
+  scheduler outside the auto-mode session), never **disguise it**.
+- **The orchestrator session spawning ungated children.** A session in `auto` mode is
+  the wrong place to launch the loop. The scheduler/cron/Task-Scheduler/CI runner that
+  invokes `claude -p` is the authorizer. See [references/risk-tiers.md](references/risk-tiers.md) §"enumerate vs isolate".
+- **No gate.** A loop whose `verify:` is empty is not a loop, it's an unsupervised typer.
+  `loop-audit` errors on it.
+- **Unbounded scope.** `scope: "*"` means "may touch anything" — the audit rejects it.
+- **No kill switch / no budget.** A loop you can't stop, or whose spend you didn't
+  bound, will eventually surprise you. Both are audit findings.
+- **Skipping L1.** Starting a fresh loop at L3 is how comprehension debt and incidents
+  compound. The ladder exists precisely so trust is *earned* before it's *granted*.
+
+## See also
+
+- [references/risk-tiers.md](references/risk-tiers.md) — L1/L2/L3 ↔ permission modes, headless profiles, enumerate-vs-isolate.
+- [references/pattern-catalog.md](references/pattern-catalog.md) — the seven patterns, full skeletons + escalation rules.
+- [references/state-spine.md](references/state-spine.md) — STATE.md / run-log / budget schemas, multi-loop coordination.
+- [references/claude-code-loops.md](references/claude-code-loops.md) — where loops actually live: `/loop`, `/schedule`, hooks, the scheduler pattern.
+- [assets/loop.config.template.yaml](assets/loop.config.template.yaml) — the loop definition starter.
+- The lineage: [Ralph loop](https://ghuntley.com/ralph/) (inner brute-force), [loop-engineering](https://github.com/cobusgreyling/loop-engineering) (the methodology this distills).

+ 24 - 0
skills/loop-ops/assets/STATE.template.md

@@ -0,0 +1,24 @@
+# <loop-name> — STATE
+_Updated: <ISO-8601 Z> · run #0 · readiness 100/100_
+
+<!--
+The triage snapshot. The loop READS this at the top of every run and REWRITES it at
+the end. Not a database — a lightweight snapshot of: what to act on, what to watch,
+what was seen-and-dismissed. Read/write contract: references/state-spine.md.
+First action of every run: check the kill switch, then read the Priority list.
+-->
+
+## Priority   (act on these next)
+<!-- the next units of work, highest first. e.g. "[P1] PR #412 failing CI 3h" -->
+- (none yet)
+
+## Watch      (not yet actionable)
+<!-- things being tracked that aren't ready to action -->
+- (none yet)
+
+## Noise      (seen + dismissed this run)
+<!-- items deliberately skipped, so the next run doesn't re-surface them -->
+- (none yet)
+
+---
+_Source: <scheduler, e.g. .github/workflows/<loop-name>.yml> · config: loop.config.yaml_

+ 41 - 0
skills/loop-ops/assets/loop.config.template.yaml

@@ -0,0 +1,41 @@
+# loop.config.yaml — one OUTER-loop definition.
+# Scaffolded by loop-init.sh, scored by loop-audit.sh. Flat YAML on purpose: every
+# key sits at column 0 so the audit parses it without a yq dependency.
+# Full field semantics: skills/loop-ops/references/state-spine.md
+#
+# >>> ADAPT every <PLACEHOLDER> below. The audit errors on unbounded scope, a
+#     missing gate, an undefined escalation rule, or a tier/permission mismatch.
+
+# ── identity ────────────────────────────────────────────────────────────────
+name: <loop-name>            # matches the .loops/<name>/ directory
+pattern: <pattern-key>       # a catalog key (pr-babysitter, ci-sweeper, …) or "custom"
+
+# ── autonomy ────────────────────────────────────────────────────────────────
+tier: L1                     # L1 report-only | L2 assisted (worktree+gate) | L3 unattended
+permission_mode: dontAsk     # plan | dontAsk | auto | acceptEdits | bypassPermissions
+                             #   L1 → plan or dontAsk · L2 → dontAsk/auto · L3 → bypassPermissions (container only)
+
+# ── cadence ─────────────────────────────────────────────────────────────────
+cadence: 1h                  # 10m | 1h | 6h | 1d, or a cron string ("*/10 * * * *")
+
+# ── purpose & bounds ────────────────────────────────────────────────────────
+goal: "<one sentence: what this loop does AND what it must never do>"
+scope:                       # bounded globs the loop may touch — NEVER "*" or "**"
+  - "<src/**>"
+
+# ── the gate (required at L2+) ──────────────────────────────────────────────
+verify: "<command that decides pass/fail, e.g. npm test>"   # a loop with no gate is invalid at L2+
+guard: "<must-always-pass, e.g. npm run typecheck>"          # required at L2+
+
+# ── isolation & landing (required at L2+) ───────────────────────────────────
+worktree: true               # isolate code changes in a git worktree (required L2+)
+land_via: fleet-ops          # who test-gates + lands winning branches (L2+)
+
+# ── the escalation rule (required) ──────────────────────────────────────────
+# What the loop ESCALATES instead of doing. Mirror the never-auto-land classes:
+# force-push, push to main, prod deploy/migration, mass delete, IAM grants, .claude edits.
+escalation: "<e.g. open a PR with context; never merge to main; never deploy>"
+
+# ── safety rails ────────────────────────────────────────────────────────────
+budget_tokens: 200000        # per-run output-token ceiling (stop the run when reached)
+kill_switch: ".loops/<loop-name>/PAUSED exists, OR the loop-pause label is set"

+ 22 - 0
skills/loop-ops/assets/model-pricing.json

@@ -0,0 +1,22 @@
+{
+  "_comment": "Per-model USD pricing per million tokens, read by loop-cost.py. SOURCE OF TRUTH is skills/claude-api-ops/SKILL.md 'Current Models' table — run skills/claude-api-ops/scripts/check-model-table.py --live if you suspect drift. Date-stamp this file when updating.",
+  "_as_of": "2026-06",
+  "_schema": "claude-mods.loop-ops.pricing/v1",
+  "models": {
+    "claude-fable-5":    { "input_per_mtok": 10.0, "output_per_mtok": 50.0 },
+    "claude-opus-4-8":   { "input_per_mtok": 5.0,  "output_per_mtok": 25.0 },
+    "claude-sonnet-4-6": { "input_per_mtok": 3.0,  "output_per_mtok": 15.0 },
+    "claude-haiku-4-5":  { "input_per_mtok": 1.0,  "output_per_mtok": 5.0 }
+  },
+  "_pattern_defaults": {
+    "_comment": "Rough per-run token estimates by pattern. input = context the run reads (STATE, diffs, tool results); output = tokens the model generates; subagents multiplies tokens when the pattern fans out makers/checkers. Estimates, not guarantees — reconcile against run-log.md actuals.",
+    "daily-triage":        { "input": 40000,  "output": 6000,  "subagents": 1 },
+    "pr-babysitter":       { "input": 15000,  "output": 3000,  "subagents": 1 },
+    "ci-sweeper":          { "input": 60000,  "output": 12000, "subagents": 2 },
+    "dependency-sweeper":  { "input": 30000,  "output": 8000,  "subagents": 1 },
+    "changelog-drafter":   { "input": 25000,  "output": 9000,  "subagents": 1 },
+    "post-merge-cleanup":  { "input": 20000,  "output": 4000,  "subagents": 1 },
+    "issue-triage":        { "input": 18000,  "output": 4000,  "subagents": 1 },
+    "custom":              { "input": 30000,  "output": 8000,  "subagents": 1 }
+  }
+}
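
A minimal sketch of how the two tables in `model-pricing.json` could combine into a dollar estimate, assuming cost per run is `subagents × (input × input price + output × output price) / 1e6` and cost per day is that times runs per day. This is not the shipped `loop-cost.py`; the function names and the inlined data subset are hypothetical, and treating `subagents` as a flat multiplier is an assumption based on the `_comment` text.

```python
# Illustrative only: a hypothetical estimator, not the repo's loop-cost.py.
PRICING = {  # USD per million tokens, copied from model-pricing.json
    "claude-opus-4-8": {"input_per_mtok": 5.0, "output_per_mtok": 25.0},
    "claude-haiku-4-5": {"input_per_mtok": 1.0, "output_per_mtok": 5.0},
}
PATTERNS = {  # per-run token estimates, copied from _pattern_defaults
    "pr-babysitter": {"input": 15000, "output": 3000, "subagents": 1},
    "ci-sweeper": {"input": 60000, "output": 12000, "subagents": 2},
}
MINUTES_PER_DAY = 24 * 60


def cost_per_run(model: str, pattern: str) -> float:
    """USD for one tick: token counts times per-million prices, times fan-out."""
    price = PRICING[model]
    est = PATTERNS[pattern]
    usd = (est["input"] * price["input_per_mtok"]
           + est["output"] * price["output_per_mtok"]) / 1_000_000
    return usd * est["subagents"]  # assumption: subagents scale cost linearly


def estimate_daily_cost(model: str, pattern: str, cadence_minutes: float) -> float:
    """USD per day for a loop that fires every `cadence_minutes`, around the clock."""
    runs_per_day = MINUTES_PER_DAY / cadence_minutes
    return runs_per_day * cost_per_run(model, pattern)


if __name__ == "__main__":
    for cadence in (5, 15):
        usd = estimate_daily_cost("claude-haiku-4-5", "pr-babysitter", cadence)
        print(f"pr-babysitter every {cadence} min on haiku: ${usd:.2f}/day")
```

Under these assumptions the cadence and the model are both linear levers: a third of the ticks costs a third as much, and the Haiku-to-Opus price ratio carries straight through to the daily figure. Real spend should still be reconciled against `run-log.md`.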

+ 114 - 0
skills/loop-ops/references/claude-code-loops.md

@@ -0,0 +1,114 @@
+# Where Loops Actually Live in Claude Code
+
+The outer loop is a *cadence + a headless run*. This file is the mechanics: the concrete
+ways to fire a loop in Claude Code, when to use each, and how they compose with the tier
+model. The doctrine — *a scheduler invokes `claude -p`, not a session that spawns ungated
+children* — is in [risk-tiers.md](risk-tiers.md); this is the how.
+
+---
+
+## The four cadence mechanisms
+
+| Mechanism | What it is | Best for | Tier fit |
+|---|---|---|---|
+| **`/loop`** | runs a prompt/slash-command on a recurring interval (or self-paced) in the *current* session | interactive, supervised loops; polling you watch | L1, supervised |
+| **`/schedule`** | cron-scheduled **cloud** agents (routines) that run detached | unattended recurring loops, the real L2/L3 cadence | L2/L3 |
+| **`ScheduleWakeup`** | re-enter *this* session after a delay (dynamic `/loop` pacing) | self-pacing a single long task; polling external state | L1, supervised |
+| **OS scheduler / CI** | Task Scheduler / cron / GitHub Actions invoking `claude -p` | the canonical unattended loop; the authorizer is the scheduler | L2/L3 |
+
+`/loop` and `ScheduleWakeup` keep a session in the loop (good for L1, supervised);
+`/schedule` runs detached. The OS scheduler is the canonical unattended pattern: **the
+scheduler is the human-configured authorizer**, so no parent classifier gates the run.
+
+---
+
+## The canonical unattended shape
+
+```
+                    ┌────────────────────────────────────────────┐
+   cron / Task      │  for each tick:                            │
+   Scheduler / CI ──┤    claude -p "$(cat .loops/<name>/run.md)" \│
+   (the authorizer) │      --permission-mode dontAsk \           │
+                    │      --append-system-prompt "$(cat STATE.md)"│
+                    │    → run reads STATE, does work, rewrites it │
+                    └────────────────────────────────────────────┘
+```
+
+- The **scheduler** (not a Claude session) invokes `claude -p`. It is the human-configured
+  authorizer; nothing upstream gates the run.
+- `--permission-mode dontAsk` + a curated allowlist = a **gated** worker that runs
+  anywhere. (For L3 arbitrary-execution jobs, swap to a container + `bypassPermissions` —
+  see the enumerate-vs-isolate fork in [risk-tiers.md](risk-tiers.md).)
+- The run prompt (`run.md`) is the same every tick — fresh context each time (the Ralph
+  property). State survives in `STATE.md` + the codebase + git, not the conversation.
+
+### Why not "a Claude session that launches the loop"?
+
+Because an `auto`-mode session that spawns a detached `claude -p --permission-mode
+bypassPermissions` child is blocked as **Create Unsafe Agents** — an ungated autonomous
+agent with no human gate. The fix is structural, not a workaround: move the launch to the
+scheduler. Trying to wrap the bypass flag in a script to dodge the gate is **Auto-Mode
+Bypass**, a `hard_deny` (see [risk-tiers.md](risk-tiers.md) and the
+[classifier reference](../../../docs/AUTO-MODE-CLASSIFIER.md)).
+
+---
+
+## Hooks — the loop's reflexes
+
+Hooks fire shell commands at points in the agent's lifecycle. Useful loop wiring:
+
+| Hook | Loop use |
+|---|---|
+| `PreToolUse` | enforce scope/kill-switch before a tool runs (deterministic gate 1) |
+| `PermissionDenied` | react to a classifier denial — log it, signal a retry, escalate |
+| `Stop` | write the run-log line + rewrite `STATE.md` as the run ends |
+| `SessionStart` | load `STATE.md` into context at the top of a run |
+
+A `PreToolUse` hook that checks `.loops/<name>/PAUSED` is the cheapest possible kill
+switch — it blocks every tool the instant the sentinel appears, no matter where the run
+is. See [`claude-code-ops`](../../claude-code-ops/SKILL.md) for the full 30-event hook
+catalog and the stdin/stdout JSON contracts.
+
+---
+
+## Composing with the execution layers
+
+The cadence fires; the work is done by the layers this repo already ships:
+
+```
+/schedule (cadence)
+   └─▶ claude -p  (the run; dontAsk + allowlist)
+         ├─▶ iterate          # inner improvement loop, if the unit of work is "improve metric X"
+         ├─▶ fleet-worker     # spawn cheap parallel makers in worktrees
+         └─▶ fleet-ops        # test-gate + land the winning branch
+   └─▶ Stop hook → rewrite STATE.md + append run-log
+```
+
+- **`iterate`** when the unit of work is "drive metric X to target in this session".
+- **`fleet-worker`** when one tick should fan out several maker attempts cheaply.
+- **`fleet-ops`** as the `land_via` — the sequential, test-gated merge queue that turns a
+  worker's green branch into a landed change (or escalates it).
+- **`pigeon`** to coordinate across concurrent loops (the priority-order standoff).
+
+---
+
+## A worked L1 → L2 graduation
+
+1. **L1, supervised:** `/loop 15m` in a session, running a read-only "report PR state to
+   STATE.md" prompt. You watch it; it writes nothing but the snapshot. Permission mode
+   `plan`.
+2. **Prove judgment:** read a week of `STATE.md` snapshots + the run-log. Is its triage
+   right? Does readiness hold?
+3. **L2, unattended:** move the cadence to `/schedule` (or cron → `claude -p`). Switch the
+   run prompt to "open a fix PR in a worktree" with `--permission-mode dontAsk` + a narrow
+   allowlist (`Bash(npm test)`, `Bash(git …)`). Add a `guard`, set `land_via: fleet-ops`,
+   write the `escalation` rule. Re-run `loop-audit` at L2 — fix every error — then enable.
+
+The point of the ladder: the cadence mechanism *changes* (session `/loop` → scheduled
+`claude -p`) exactly when the autonomy does, and the audit gates the transition.
+
+## See also
+
+- [risk-tiers.md](risk-tiers.md) — the permission-mode mapping + scheduler-not-session rule.
+- [state-spine.md](state-spine.md) — the STATE.md the run reads and rewrites.
+- [../../claude-code-ops/SKILL.md](../../claude-code-ops/SKILL.md) — the full hook catalog, `claude -p` flags, headless reference.
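
The `PreToolUse` kill switch described in `claude-code-loops.md` could look roughly like the sketch below. It assumes the hook receives a JSON payload on stdin with `cwd` and `tool_name` fields and that exit code 2 blocks the tool call with stderr surfaced to the model; check both against the hook catalog in `claude-code-ops` before relying on them. `paused_sentinel`, `decide`, and `main` are hypothetical names, not part of this repo.

```python
import json
import sys
import tempfile
from pathlib import Path

EXIT_ALLOW = 0
EXIT_BLOCK = 2  # assumption: exit 2 from a PreToolUse hook blocks the tool call


def paused_sentinel(project_root: Path, loop_name: str):
    """Return the first PAUSED sentinel found (global, then per-loop), else None."""
    loops = project_root / ".loops"
    for candidate in (loops / "PAUSED", loops / loop_name / "PAUSED"):
        if candidate.exists():
            return candidate
    return None


def decide(payload: dict, loop_name: str) -> tuple[int, str]:
    """Map a hook payload to (exit_code, message); `cwd` is an assumed field."""
    root = Path(payload.get("cwd", "."))
    sentinel = paused_sentinel(root, loop_name)
    if sentinel is not None:
        tool = payload.get("tool_name", "unknown tool")
        return EXIT_BLOCK, f"loop '{loop_name}' is paused ({sentinel}); refusing {tool}"
    return EXIT_ALLOW, ""


def main(loop_name: str) -> int:
    """Hook entry point: read the JSON payload from stdin, report via stderr."""
    code, message = decide(json.load(sys.stdin), loop_name)
    if message:
        print(message, file=sys.stderr)
    return code


# Demo against a throwaway directory; a real hook script would instead end with
# sys.exit(main("<loop-name>")).
with tempfile.TemporaryDirectory() as tmp:
    payload = {"cwd": tmp, "tool_name": "Bash"}
    print("before sentinel:", decide(payload, "pr-babysitter")[0])
    sentinel_dir = Path(tmp) / ".loops" / "pr-babysitter"
    sentinel_dir.mkdir(parents=True)
    (sentinel_dir / "PAUSED").touch()
    print("after sentinel: ", decide(payload, "pr-babysitter")[0])
```

Keeping the decision in a pure function (`decide`) separate from the stdin/stderr plumbing makes the switch testable offline, which fits the skill's offline assertion suite.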

+ 138 - 0
skills/loop-ops/references/pattern-catalog.md

@@ -0,0 +1,138 @@
+# Pattern Catalog — seven production loop shapes
+
+Each pattern is a proven outer-loop shape with a cadence, a starting risk tier, a gate,
+and an escalation rule. **Start every pattern at L1** (read-only) and graduate it only
+after its reports prove its judgment. The `loop-init` script seeds a `loop.config.yaml`
+keyed by `--pattern <name>`; the names below are the canonical keys.
+
+The fields that matter for every pattern: **cadence** (how often), **tier** (starting
+autonomy), **gate** (what must pass before landing), **escalate** (what it must hand to a
+human instead of doing).
+
+---
+
+## `daily-triage` — discover + prioritize
+
+| | |
+|---|---|
+| Cadence | 1–2 h (weekday business hours) |
+| Tier | L1 (report only) |
+| Job | sweep the backlog/inbox/alerts; rank by priority; write the day's `STATE.md` |
+| Gate | none — it writes nothing but the report |
+| Escalate | everything; a human decides what to action |
+
+The off-peak, lowest-priority loop. It produces the work-list the other loops and the
+human draw from. Output is a `STATE.md` priority/watch/noise snapshot, nothing else.
+
+---
+
+## `pr-babysitter` — watch review state
+
+| | |
+|---|---|
+| Cadence | 5–15 min |
+| Tier | L1 |
+| Job | list open PRs; flag stuck (no review N hours), failing checks, merge conflicts |
+| Gate | none — surfaces state, posts a summary comment at most |
+| Escalate | a human reviews/merges; the loop never merges |
+
+Skeleton: `gh pr list --json …` → classify each (waiting-on-review / failing / conflict /
+ready) → update `STATE.md` watch list → optionally one summary comment per PR.
+Public-comment text follows the repo's preview-before-send discipline.
+
+---
+
+## `ci-sweeper` — triage build failures
+
+| | |
+|---|---|
+| Cadence | 5–15 min |
+| Tier | L2 (after L1 proves triage quality) |
+| Job | detect red CI; classify the failure; at L2, propose a fix in a worktree |
+| Gate | `verify` (the failing test passes) **and** `guard` (full suite + typecheck) |
+| Escalate | flaky/infra failures, anything touching deploy/secrets, ambiguous root cause |
+
+The highest-priority loop in the multi-loop order — a red build blocks everyone. At L1 it
+classifies and reports; at L2 it opens a fix PR in an isolated worktree and hands the
+branch to `fleet-ops` for the gated merge. Never auto-merges to `main`.
+
+---
+
+## `dependency-sweeper` — patch-only bumps
+
+| | |
+|---|---|
+| Cadence | 6 h – 1 d |
+| Tier | L2 (after an L1 report-only period) |
+| Job | find outdated deps; bump **patch-only**, behind a release-age cooldown |
+| Gate | `guard` (full suite + build) passes; supply-chain cooldown satisfied |
+| Escalate | minor/major bumps, anything failing the guard, any flagged advisory |
+
+Pair with [`supply-chain-defense`](../../supply-chain-defense/SKILL.md): respect the
+7-day cooldown (`preinstall-check.sh`) and behavioural score before a bump lands. Patch
+bumps that pass both the cooldown and the guard are the *only* class safe to auto-land,
+and only on a feature branch, never `main`.
+
+---
+
+## `changelog-drafter` — release-note drafts
+
+| | |
+|---|---|
+| Cadence | 1 d, or on tag |
+| Tier | L1 (draft only) |
+| Job | summarize merged PRs since the last tag into a `RELEASE_NOTES_DRAFT.md` |
+| Gate | none — produces a draft for human approval |
+| Escalate | the human edits + publishes; the loop never runs `gh release create` |
+
+Drafts to a file, never publishes. Publishing a release is a one-way visibility
+commitment — it stays a human step (the repo's release-review discipline). Pair with
+[`github-ops`](../../github-ops/SKILL.md) for the human-driven publish.
+
+---
+
+## `post-merge-cleanup` — hygiene
+
+| | |
+|---|---|
+| Cadence | 1–6 h (off-peak) |
+| Tier | L1 |
+| Job | find merged-and-deletable branches, stale feature flags, orphaned artifacts |
+| Gate | none at L1; at L2, branch deletion behind a "merged + N days old" rule |
+| Escalate | anything ambiguous; never deletes a branch with unmerged commits |
+
+Honors [worktree boundaries](../../../rules/worktree-boundaries.md): never touches another
+session's `.claude/worktrees/`, never sweeps with `git add -A`.
+
+---
+
+## `issue-triage` — classify + label
+
+| | |
+|---|---|
+| Cadence | 2 h – 1 d |
+| Tier | L1 (propose-only) |
+| Job | classify new issues (bug/feature/question/dupe), suggest labels + priority |
+| Gate | none — proposes labels, applies only the mechanical ones at L2 |
+| Escalate | priority calls, dupe-closing, anything needing product judgment |
+
+At L1 it proposes; at L2 it may apply purely-mechanical labels (`needs-triage` →
+`bug`/`docs`) but never closes an issue or sets priority unattended.
+
+---
+
+## Choosing a pattern → tier → cadence
+
+1. **Match the job** to the closest pattern; use `custom` only if none fit.
+2. **Start at the pattern's L1 tier.** Run it read-only until you trust its reports.
+3. **Set the cadence** to the slowest that still catches the work in time — cadence is
+   the biggest cost lever (see [loop-cost](../scripts/loop-cost.py)). A 5-min PR
+   babysitter costs 3× a 15-min one for marginal freshness gain.
+4. **Graduate** to L2 only with a `guard`, a `worktree`, an `escalation` rule, and a
+   `land_via` — then re-run `loop-audit` at the new tier.
+
+## See also
+
+- [risk-tiers.md](risk-tiers.md) — what each tier may do and the permission-mode mapping.
+- [state-spine.md](state-spine.md) — the multi-loop priority order these patterns share.
+- [../assets/loop.config.template.yaml](../assets/loop.config.template.yaml) — the config every pattern fills in.
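
The `pr-babysitter` skeleton in `pattern-catalog.md` (list, classify, update the watch list) might be sketched as below. The input dicts assume the shape returned by `gh pr list --json number,mergeable,reviewDecision,statusCheckRollup`; the precedence order (conflict, then failing, then ready) and the `FAILED` set are assumptions, and `classify` / `watch_list` are hypothetical helpers rather than anything the skill ships.

```python
# Outcomes treated as a failing check (assumed set; adjust to your CI provider).
FAILED = {"FAILURE", "ERROR", "TIMED_OUT", "CANCELLED"}


def has_failing_check(pr: dict) -> bool:
    """True if any entry in the PR's check rollup reports a failed outcome."""
    for check in pr.get("statusCheckRollup") or []:
        # Check runs carry `conclusion`; commit statuses carry `state`.
        outcome = check.get("conclusion") or check.get("state") or ""
        if outcome.upper() in FAILED:
            return True
    return False


def classify(pr: dict) -> str:
    """Bucket one PR into the catalog's four states."""
    if pr.get("mergeable") == "CONFLICTING":
        return "conflict"
    if has_failing_check(pr):
        return "failing"
    if pr.get("reviewDecision") == "APPROVED":
        return "ready"
    return "waiting-on-review"


def watch_list(prs: list) -> list:
    """Render one STATE.md-style bullet per PR."""
    return [f"- PR #{pr['number']} {classify(pr)}" for pr in prs]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        {"number": 1, "mergeable": "CONFLICTING", "statusCheckRollup": []},
        {"number": 2, "mergeable": "MERGEABLE",
         "statusCheckRollup": [{"conclusion": "FAILURE"}]},
        {"number": 3, "mergeable": "MERGEABLE", "reviewDecision": "APPROVED",
         "statusCheckRollup": [{"conclusion": "SUCCESS"}]},
    ]
    print("\n".join(watch_list(sample)))
```

The function only reads and reports, which keeps it inside the pattern's L1 posture: nothing here merges, comments, or labels.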

+ 137 - 0
skills/loop-ops/references/risk-tiers.md

@@ -0,0 +1,137 @@
+# Risk Tiers ↔ Claude Code's permission model
+
+The single best idea in loop engineering is **graduated autonomy**: a loop earns the
+right to act unattended; it isn't granted it. This file maps the L1→L2→L3 ladder onto
+Claude Code's *actual* permission machinery — which is what makes this skill more than a
+generic-agent methodology. The authority for the gate behaviour is the repo's
+[auto-mode-classifier reference](../../../docs/AUTO-MODE-CLASSIFIER.md); read it for the
+full two-gate model. This file is the loop-specific projection.
+
+---
+
+## The ladder
+
+```
+L1 Report ───────► L2 Assisted ───────► L3 Unattended
+read-only          suggest + human-gate    autonomous within a denylist
+(plan/dontAsk)     (dontAsk/auto)          (bypassPermissions, ISOLATED only)
+```
+
+**Never skip a rung.** A fresh loop starts at L1. It graduates only after its reports
+prove its judgment over real runs. Each rung adds exactly one new power and one new
+guardrail.
+
+| | L1 Report | L2 Assisted | L3 Unattended |
+|---|---|---|---|
+| **Posture** | discovery + triage | propose changes | autonomous land |
+| **Writes?** | no — report only | yes, in a worktree | yes, allowlisted classes |
+| **Permission mode** | `plan` or `dontAsk` + read allowlist | `dontAsk` + narrow allowlist, or `auto` | `bypassPermissions` **in a container** |
+| **Required guardrails** | bounded scope, kill switch | + guard command, + worktree, + escalation | + denylist, + isolation boundary, + budget cap |
+| **Lands by** | a human reads the report | a human approves the PR (or `fleet-ops`) | the loop, inside its boundary |
+| **Blast radius** | zero (no writes) | one PR, reviewable | bounded by the denylist + container |
+
+---
+
+## How each tier maps to a permission mode
+
+Claude Code has six permission modes. Loops use five of them:
+
+| Mode | Behaviour | Loop tier |
+|---|---|---|
+| `plan` | read/explore only; cannot edit | L1 (strictest) |
+| `dontAsk` | auto-**denies** anything not pre-approved; read-only Bash always allowed; fully non-interactive | L1 / L2 (**recommended default for workers**) |
+| `auto` | a classifier model gates each unresolved action; "trust the direction" autonomy | L2 (long runs) |
+| `acceptEdits` | in-scope edits + common fs commands auto-approved; other Bash needs an allow rule | L2 (edit-heavy, known command set) |
+| `bypassPermissions` | no gates at all | L3 — **only** inside an isolated container/VM without internet |
+
+`default` (prompt each action) is interactive — not for unattended loops.
+`acceptEdits` is the middle option when the command set is known.
+
+### Why `dontAsk` is the workhorse for L1/L2 workers
+
+`dontAsk` is fully non-interactive (it never prompts; it auto-denies the unknown), so it
+runs anywhere — no container required — and read-only Bash is always allowed. Pair it
+with a **narrow** `permissions.allow` list (`Bash(npm test)`, `Bash(git status)`) and you
+get a worker that can do exactly its job and nothing else. This is the safe default for
+headless loop workers.
+
+---
+
+## The headless-profile table (what a `claude -p` worker should use)
+
+The loop's *maker* runs are headless `claude -p` sessions. Pick the least privilege that
+still lets the job run:
+
+| Profile | Behaviour | Use for |
+|---|---|---|
+| `--permission-mode dontAsk` + curated `permissions.allow` | auto-denies anything not pre-approved; read-only Bash allowed; non-interactive | **locked-down workers (recommended default)** |
+| `--permission-mode auto` | classifier-gated; configure `autoMode.environment` for your infra. In `-p`, repeated blocks abort the session | long "trust-the-direction" runs |
+| `--permission-mode acceptEdits` + allow rules | edits + common fs auto-approved; other Bash needs an allow rule | edit-heavy tasks, known command set |
+| `--dangerously-skip-permissions` (= `bypassPermissions`) | no gates; refuses root/sudo; `rm -rf /`\|`~` still circuit-break | **only** in an isolated container/VM/devcontainer without internet |
+
+In **non-interactive `-p` mode** repeated classifier blocks **abort the session** (there's
+no human to prompt). So an `auto`-mode worker that keeps hitting a wall dies; a `dontAsk`
+worker with a correct allowlist never hits one. This is why enumerating permissions beats
+relying on the classifier for batch workers.
+
+---
+
+## The cardinal rule: scheduler invokes `claude -p`, not session-spawns-loop
+
+This is the one thing the upstream methodology can't tell you because it isn't grounded
+in Claude Code's gate. **An unattended loop must be a scheduler/script that invokes
+`claude -p` — not a Claude session that tries to launch the loop.**
+
+Why: the auto-mode classifier evaluates tool calls *inside* an auto-mode session. A
+session that tries to spawn a detached `claude -p --permission-mode bypassPermissions`
+child is blocked as **Create Unsafe Agents** (an ungated autonomous agent with no human
+gate). There are two independent fixes; combine them for the best result:
+
+1. **Move the launch outside the auto-mode session.** A human — or a human-configured
+   Task Scheduler / cron / CI runner / plain script — running `claude -p …` is the
+   authorizer, with no parent classifier in the loop. Don't run the *orchestrator*
+   session itself in auto mode if its job is spawning agents.
+2. **Give the child gates instead of bypass.** The denial is about the *ungated*
+   property, not headless-ness. A `dontAsk`+allowlist child is gated and runs fine.
+
+> **Subagents can't escalate.** Agent/Task subagents inherit the parent's mode; the
+> classifier uses the parent mode and ignores `permissionMode` in subagent frontmatter.
+> A full-bypass worker fleet must be the isolated-container path launched *outside* the
+> auto-mode session — never an in-session subagent.
+
+---
+
+## The real fork: enumerate vs isolate
+
+When a loop needs real power, there are exactly two legitimate shapes. Reaching for
+`bypassPermissions` on the host *to avoid enumerating permissions* is precisely the
+pattern the classifier blocks.
+
+| | **Enumerate** | **Isolate** |
+|---|---|---|
+| Shape | `dontAsk` + a curated allowlist | container/VM + `bypassPermissions` |
+| Runs | anywhere (host, CI, laptop) | only inside the sandbox |
+| Safety | bounded by the allowlist | bounded by the container |
+| Cost | you list the commands once | you stand up isolation |
+| Best for | most loops; CI/PR/dep workers | heavy autonomous refactors, untrusted-input runs |
+
+**Default to enumerate.** Reach for isolate only when the job genuinely needs arbitrary
+execution *and* you have a real sandbox (no internet, can't damage the host).
+
+---
+
+## Tier checklist (what `loop-audit` enforces)
+
+- **L1:** bounded `scope` (never `*`), a `kill_switch`, `permission_mode` ∈ {plan,
+  dontAsk}, **no** `verify` that writes. Report-only.
+- **L2:** all of L1, plus a `verify` gate **and** a `guard` (must-always-pass),
+  `worktree: true`, a concrete `escalation:` rule, and a `land_via` (e.g. `fleet-ops`).
+- **L3:** all of L2, plus `permission_mode: bypassPermissions` **with** an isolation note
+  in `escalation`/scope, a denylist of never-auto-land classes, and a `budget_tokens`
+  cap. The audit warns hard if L3 is declared without an isolation boundary.
+
+## See also
+
+- [../../../docs/AUTO-MODE-CLASSIFIER.md](../../../docs/AUTO-MODE-CLASSIFIER.md) — the full two-gate model, classifier categories, legitimate-authorization decision tree.
+- [claude-code-loops.md](claude-code-loops.md) — the scheduler/`claude -p` mechanics this tier model runs on.
+- [pattern-catalog.md](pattern-catalog.md) — each pattern's recommended starting tier.
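
One way to read the tier checklist in `risk-tiers.md` as code is sketched below. It is not `loop-audit.sh`: `missing_requirements` is a hypothetical function over an already-parsed config dict, the per-tier permission-mode sets are one reading of the ladder table, and the L3 denylist and isolation-note checks are left out because the text does not pin down their format.

```python
# Per-tier permission modes, as read from the ladder table (an interpretation).
ALLOWED_MODES = {
    "L1": {"plan", "dontAsk"},
    "L2": {"dontAsk", "auto", "acceptEdits"},
    "L3": {"bypassPermissions"},
}
UNBOUNDED = {"*", "**"}


def missing_requirements(cfg: dict) -> list[str]:
    """Return human-readable gaps for the config's declared tier ([] means none)."""
    tier = cfg.get("tier")
    if tier not in ALLOWED_MODES:
        return [f"tier: must be L1|L2|L3 (got {tier!r})"]

    problems = []
    scope = cfg.get("scope") or []
    if not scope or any(glob in UNBOUNDED for glob in scope):
        problems.append("scope: missing or unbounded")
    if not cfg.get("kill_switch"):
        problems.append("kill_switch: undefined")
    if cfg.get("permission_mode") not in ALLOWED_MODES[tier]:
        problems.append(f"permission_mode: not allowed at {tier}")

    if tier in ("L2", "L3"):
        for key in ("verify", "guard", "escalation", "land_via"):
            if not cfg.get(key):
                problems.append(f"{key}: required at L2+")
        if cfg.get("worktree") is not True:
            problems.append("worktree: must be true at L2+")

    if tier == "L3" and not isinstance(cfg.get("budget_tokens"), int):
        problems.append("budget_tokens: required at L3")
    return problems


if __name__ == "__main__":
    # Made-up L2 config that forgot its guard command.
    draft = {
        "tier": "L2", "scope": ["src/**"], "kill_switch": ".loops/demo/PAUSED",
        "permission_mode": "dontAsk", "verify": "npm test", "worktree": True,
        "escalation": "open a PR; never merge to main", "land_via": "fleet-ops",
    }
    print(missing_requirements(draft))
```

Each rung adds checks on top of the previous one, which mirrors the "never skip a rung" rule: a config that fails at L1 cannot pass at L2.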

+ 150 - 0
skills/loop-ops/references/state-spine.md

@@ -0,0 +1,150 @@
+# The State Spine — memory outside the conversation
+
+A loop's durability comes from state that lives **outside** the conversation window. The
+conversation is ephemeral and degrades as it fills (the Ralph insight: quality drops past
+~100–150k tokens). The spine is three files that the loop reads at the start of every run;
+it writes the two it owns at the end. This is the loop's working memory, audit trail, and definition.
+
+```
+.loops/<name>/
+├── loop.config.yaml    # the definition (immutable-ish; edited by a human)
+├── STATE.md            # the triage snapshot (rewritten every run)
+└── run-log.md          # append-only audit trail (one line per run)
+```
+
+`loop-init` scaffolds all three. The config is human-owned; `STATE.md` and `run-log.md`
+are loop-owned.
+
+---
+
+## `loop.config.yaml` — the definition
+
+Flat YAML so it's trivially parseable (no `yq` dependency). Full annotated template:
+[../assets/loop.config.template.yaml](../assets/loop.config.template.yaml). Fields:
+
+| Field | Required | Meaning |
+|---|---|---|
+| `name` | yes | the loop's identifier; matches the directory |
+| `pattern` | yes | a catalog key (`pr-babysitter`, …) or `custom` |
+| `tier` | yes | `L1` / `L2` / `L3` — the autonomy rung |
+| `cadence` | yes | `10m` / `1h` / `6h` / `1d`, or a cron string |
+| `goal` | yes | one sentence: what this loop does and what it must NOT do |
+| `scope` | yes | bounded globs the loop may touch — **never `*`** |
+| `verify` | L2+ | the gate command (the metric/check); a loop with no gate is invalid |
+| `guard` | L2+ | a must-always-pass command (full suite / typecheck) |
+| `permission_mode` | yes | `plan` / `dontAsk` / `auto` / `acceptEdits` / `bypassPermissions` |
+| `worktree` | L2+ | `true` to isolate code changes in a git worktree |
+| `escalation` | yes | what the loop escalates instead of doing (the gate rule) |
+| `budget_tokens` | rec | per-run output-token ceiling |
+| `kill_switch` | yes | the stop signal every run checks first |
+| `land_via` | L2+ | who gates + lands winning branches (e.g. `fleet-ops`) |
+
+`loop-audit` reads this file and scores it against the tier's requirements.
+
+---
+
+## `STATE.md` — the triage snapshot
+
+Rewritten at the end of every run; read at the top of the next. It is **not** a database
+— it's a lightweight snapshot of what the loop needs, what it's watching, and what it
+ignored. Template: [../assets/STATE.template.md](../assets/STATE.template.md). Shape:
+
+```markdown
+# <loop-name> — STATE
+_Updated: 2026-06-22T14:05:00Z · run #142 · readiness 100/100_
+
+## Priority   (act on these next)
+- [P1] PR #412 failing CI 3h — owner pinged
+- [P2] dep `axios` patch 1.14.0→1.14.1 available, cooldown clears 2026-06-25
+
+## Watch     (not yet actionable)
+- PR #408 awaiting review 1h
+- flag `new-checkout` at 100% rollout 6d — cleanup candidate
+
+## Noise      (seen + dismissed this run)
+- PR #410 draft — skip until ready
+- dep `left-pad` major bump — escalates, not auto
+
+---
+_Source: .github/workflows/<loop>.yml · config: loop.config.yaml_
+```
+
+**The read/write contract:**
+1. **Check the kill switch** (`kill_switch:` from config) first; exit immediately if set.
+2. **Read** `STATE.md`: it's the loop's memory of the last run.
+3. Do the run's work, drawing the next unit from the Priority list.
+4. **Rewrite** `STATE.md`: promote/demote items across Priority/Watch/Noise, bump the
+   `_Updated_` line + run number + readiness, then append one line to `run-log.md`.
+
+`readiness` is the loop's self-assessment (0–100): is its config still coherent, its
+gate still passing, its scope still valid? A dropping readiness is an early signal to
+re-audit.
+
+---
+
+## `run-log.md` — the append-only audit trail
+
+One line per run, appended, never rewritten. Answers "what has this loop been doing, and
+what did it cost?"
+
+```
+2026-06-22T13:45:00Z  run#140  action=proposed   pr=409  outcome=pr-opened  tokens=44380
+2026-06-22T13:55:00Z  run#141  action=none       -       outcome=quiet      tokens=2110
+2026-06-22T14:05:00Z  run#142  action=reported   pr=412  outcome=escalated  tokens=18420
+```
+
+The `tokens` column feeds back into the budget. Tail it to see drift: a loop that used to
+cost 2k/run quietly now costing 40k/run is doing more than it was scoped to.
+
+---
+
+## Budget control
+
+A loop's cost is `runs/day × tokens/run × price`, and sub-agents multiply tokens/run.
+Two controls:
+
+- **`budget_tokens`** in the config — a per-run output ceiling. The loop stops the run
+  when it's reached (the same discipline as a dynamic `/loop` watching `budget.remaining()`).
+- **The run-log** — the actual spend, line by line. Reconcile estimate (`loop-cost`)
+  against actual periodically; if they diverge, the loop's scope crept.
+
+Estimate before you schedule: [../scripts/loop-cost.py](../scripts/loop-cost.py). The
+cheapest lever is **cadence** — halving the frequency halves the cost. The next is
+**model** — a Haiku triage loop costs a fifth of an Opus one; put the cheap model on the
+maker and reserve the expensive one for the gate decision.
+
+---
+
+## Multi-loop coordination
+
+When several loops run against one repo, two rules keep them from tripping over each other:
+
+### Priority order (collision avoidance)
+
+```
+CI Sweeper  ►  PR Babysitter  ►  Dependency Sweeper  ►  Post-Merge/Changelog  ►  Daily Triage
+ (highest)                                                                        (off-peak)
+```
+
+A red build blocks everyone, so the CI sweeper wins any worktree contention; daily triage
+yields to all. When two loops want the same worktree/branch, the higher-priority one
+proceeds and the lower defers to its next cadence tick. Loops announce what they're
+touching via [`pigeon`](../../pigeon/SKILL.md) so a peer can see "ci-sweeper holds a
+worktree on PR #412" and stand off.
+
+### The kill switch (every loop honors it)
+
+One stop signal, checked at the top of **every** run, that halts **every** loop:
+
+- a **sentinel file** — `.loops/PAUSED` (global) or `.loops/<name>/PAUSED` (one loop), or
+- a **label** — `loop-pause` on the repo/issue, checked via `gh`.
+
+No loop ships without one. It's the difference between "the loops are misbehaving, give me
+a minute" and "the loops are misbehaving, where's the breaker?". Put the exact mechanism
+in `kill_switch:` and make checking it the first action of every run, before the work.
+
+## See also
+
+- [risk-tiers.md](risk-tiers.md) — the autonomy ladder the config's `tier` selects.
+- [pattern-catalog.md](pattern-catalog.md) — each pattern's place in the priority order.
+- [claude-code-loops.md](claude-code-loops.md) — how the cadence actually fires.
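
The run-log drift check described in `state-spine.md` ("a loop that used to cost 2k/run quietly now costing 40k/run") could be approximated as below. This is a sketch under stated assumptions: it relies only on the `tokens=<n>` field shown in the run-log example, and the `window` and `factor` thresholds, along with the function names, are hypothetical choices rather than anything the skill defines.

```python
import re
from statistics import mean

TOKENS = re.compile(r"\btokens=(\d+)\b")


def tokens_per_run(log_text: str) -> list[int]:
    """Extract the tokens=<n> value from each run-log line, in file order."""
    counts = []
    for line in log_text.splitlines():
        match = TOKENS.search(line)
        if match:
            counts.append(int(match.group(1)))
    return counts


def drifted(counts: list[int], window: int = 3, factor: float = 5.0) -> bool:
    """True when the last `window` runs average at least `factor` x the earlier runs."""
    if len(counts) <= window:
        return False  # not enough history to compare against
    baseline = mean(counts[:-window])
    recent = mean(counts[-window:])
    return baseline > 0 and recent >= factor * baseline


if __name__ == "__main__":
    # Made-up log lines in the documented format, oldest first.
    log = "\n".join([
        "2026-06-22T13:15:00Z  run#137  action=none  -  outcome=quiet  tokens=2000",
        "2026-06-22T13:25:00Z  run#138  action=none  -  outcome=quiet  tokens=2100",
        "2026-06-22T13:35:00Z  run#139  action=none  -  outcome=quiet  tokens=1900",
        "2026-06-22T13:45:00Z  run#140  action=proposed  pr=409  outcome=pr-opened  tokens=40000",
        "2026-06-22T13:55:00Z  run#141  action=proposed  pr=410  outcome=pr-opened  tokens=41000",
        "2026-06-22T14:05:00Z  run#142  action=proposed  pr=411  outcome=pr-opened  tokens=39000",
    ])
    print("drifted:", drifted(tokens_per_run(log)))
```

A check like this is a prompt to re-audit, not a verdict: a legitimately busy stretch raises token use too, so the comparison is best read next to the `action` and `outcome` columns.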

+ 244 - 0
skills/loop-ops/scripts/loop-audit.sh

@@ -0,0 +1,244 @@
+#!/usr/bin/env bash
+# Score an outer-loop config for readiness before it is scheduled.
+#
+# Usage:   loop-audit.sh [OPTIONS] <loop.config.yaml>
+# Input:   argv flags + a config path (no stdin).
+# Output:  stdout = findings (plain `SEVERITY  message` rows, or --json envelope).
+#          Data only.
+# Stderr:  the readiness panel (score + verdict), notices, errors.
+# Exit:    0 ready (no errors, score >= --min), 2 usage, 3 config not found,
+#          4 config unparseable, 10 NOT ready (findings present)
+#
+# Scores a flat loop.config.yaml against the tier's requirements: a bounded scope,
+# a defined escalation rule + kill switch, and — at L2+ — a verify gate, a guard, a
+# worktree, and a landing path. The config is parsed without a yq dependency.
+# Pair with loop-init.sh (scaffold) and references/risk-tiers.md (the rubric).
+#
+# Examples:
+#   loop-audit.sh .loops/pr-babysitter/loop.config.yaml
+#   loop-audit.sh --json .loops/dep-sweeper/loop.config.yaml | jq '.data[] | select(.severity=="error")'
+#   loop-audit.sh --min 80 --strict .loops/ci-sweeper/loop.config.yaml
+set -uo pipefail
+
+readonly EX_OK=0 EX_USAGE=2 EX_NOTFOUND=3 EX_UNPARSEABLE=4 EX_FINDINGS=10
+
+# Terminal design system. stdout = findings (data); the score panel frames on stderr.
+__lib="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../_lib" 2>/dev/null && pwd || true)"
+if [ -n "${__lib:-}" ] && [ -f "$__lib/term.sh" ]; then . "$__lib/term.sh"; term_init 2
+else
+  term_panel_open() { :; }; term_panel_close() { :; }; term_panel_vert() { :; }
+  term_status_row() { shift; printf '  - %s %s\n' "$1" "${2:-}"; }
+  term_pip_bar() { printf '%s/%s' "$2" "$3"; }
+  term_color() { shift; printf '%s' "$*"; }; TERM_DOT="|"
+fi
+
+CFG=""
+MIN=70
+STRICT=0
+JSON=0
+
+usage() {
+  cat <<'EOF'
+loop-audit.sh — score an outer-loop config for readiness.
+
+Usage:
+  loop-audit.sh [OPTIONS] <loop.config.yaml>
+
+Options:
+  --min N        readiness score (0-100) required for a "ready" verdict (default: 70).
+  --strict       count warnings toward the NOT-ready signal (exit 10).
+  --json         emit a JSON envelope instead of plain rows.
+  -h, --help     show this help and exit 0.
+
+Exit codes:
+  0 ready    2 usage    3 config not found    4 unparseable    10 NOT ready (findings)
+
+Examples:
+  loop-audit.sh .loops/pr-babysitter/loop.config.yaml
+  loop-audit.sh --json .loops/dep-sweeper/loop.config.yaml | jq '.data[] | select(.severity=="error")'
+  loop-audit.sh --min 80 --strict .loops/ci-sweeper/loop.config.yaml
+EOF
+}
+
+die_usage() { printf 'error: %s\n' "$1" >&2; echo >&2; usage >&2; exit "$EX_USAGE"; }
+
+# ── parse args ──────────────────────────────────────────────────────────────
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --min)     [[ $# -ge 2 ]] || die_usage "--min needs a value"; MIN="$2"; shift 2 ;;
+    --strict)  STRICT=1; shift ;;
+    --json)    JSON=1; shift ;;
+    -h|--help) usage; exit "$EX_OK" ;;
+    -*)        die_usage "unknown flag: $1" ;;
+    *)         [[ -z "$CFG" ]] || die_usage "unexpected extra argument: $1"; CFG="$1"; shift ;;
+  esac
+done
+
+[[ -n "$CFG" ]] || die_usage "a loop.config.yaml path is required"
+[[ "$MIN" =~ ^[0-9]+$ ]] || die_usage "--min must be an integer (got '$MIN')"
+[[ -f "$CFG" ]] || { printf 'error: config not found: %s\n' "$CFG" >&2; exit "$EX_NOTFOUND"; }
+
+# Unparseable: no top-level `key:` lines at all.
+if ! grep -Eq '^[a-z_]+:' "$CFG"; then
+  printf 'error: no parseable top-level keys in %s\n' "$CFG" >&2
+  exit "$EX_UNPARSEABLE"
+fi
+
+# ── flat-YAML readers (no yq) ───────────────────────────────────────────────
+cfg_scalar() { # inline scalar value for `^KEY:`; empty if absent or block-list
+  awk -v k="$1" -v q="'" '
+    $0 ~ "^"k":" {
+      sub("^"k":[ \t]*","")
+      sub(/[ \t]*#.*$/,"")
+      gsub(/^[ \t]+|[ \t]+$/,"")
+      gsub(/^"|"$/,""); gsub("^"q"|"q"$","")
+      print; exit
+    }' "$CFG"
+}
+cfg_has_key() { grep -Eq "^$1:" "$CFG"; }
+cfg_list_items() { # `  - item` lines under `^KEY:`, until the next top-level key
+  awk -v k="$1" -v q="'" '
+    $0 ~ "^"k":" { inlist=1; next }
+    inlist==1 {
+      if ($0 ~ /^[ \t]*-[ \t]+/) {
+        line=$0
+        sub(/^[ \t]*-[ \t]+/,"",line); sub(/[ \t]*#.*$/,"",line)
+        gsub(/^[ \t]+|[ \t]+$/,"",line); gsub(/^"|"$/,"",line); gsub("^"q"|"q"$","",line)
+        if (line != "") print line
+      } else if ($0 ~ /^[^ \t#]/) { inlist=0 }
+    }' "$CFG"
+}
+is_placeholder() { [[ "$1" == *"<"*">"* ]]; }   # an unfilled <PLACEHOLDER>
+
+# ── findings + scoring ──────────────────────────────────────────────────────
+FIND_SEV=(); FIND_MSG=()
+CHECKS_TOTAL=0; CHECKS_PASS=0
+add() { FIND_SEV+=("$1"); FIND_MSG+=("$2"); }
+pass() { CHECKS_TOTAL=$((CHECKS_TOTAL+1)); CHECKS_PASS=$((CHECKS_PASS+1)); }
+fail() { CHECKS_TOTAL=$((CHECKS_TOTAL+1)); add "$1" "$2"; }    # $1=severity $2=message
+
+# require <severity> <ok?> <message-on-fail>  — a present+valid scalar check.
+require() { if [[ "$2" -eq 1 ]]; then pass; else fail "$1" "$3"; fi; }
+
+TIER="$(cfg_scalar tier)"
+PMODE="$(cfg_scalar permission_mode)"
+NAME="$(cfg_scalar name)"
+GOAL="$(cfg_scalar goal)"
+ESCAL="$(cfg_scalar escalation)"
+KILL="$(cfg_scalar kill_switch)"
+BUDGET="$(cfg_scalar budget_tokens)"
+VERIFY="$(cfg_scalar verify)"
+GUARD="$(cfg_scalar guard)"
+WORKTREE="$(cfg_scalar worktree)"
+LANDVIA="$(cfg_scalar land_via)"
+CADENCE="$(cfg_scalar cadence)"
+PATTERN="$(cfg_scalar pattern)"
+
+is_l2plus=0; [[ "$TIER" == "L2" || "$TIER" == "L3" ]] && is_l2plus=1
+
+# present-and-not-placeholder predicate
+filled() { [[ -n "$1" ]] && ! is_placeholder "$1"; }
+
+# ── always-applicable checks ────────────────────────────────────────────────
+require error  "$(filled "$NAME" && echo 1 || echo 0)"  "name: missing or placeholder"
+require warning "$(filled "$PATTERN" && echo 1 || echo 0)" "pattern: missing"
+case "$TIER" in L1|L2|L3) pass ;; *) fail error "tier: must be L1|L2|L3 (got '${TIER:-empty}')" ;; esac
+require warning "$(filled "$CADENCE" && echo 1 || echo 0)" "cadence: missing"
+require error  "$(filled "$GOAL" && echo 1 || echo 0)"  "goal: missing or placeholder"
+require error  "$(filled "$ESCAL" && echo 1 || echo 0)" "escalation: undefined — every loop must declare what it escalates"
+require error  "$(filled "$KILL" && echo 1 || echo 0)"  "kill_switch: undefined — no loop ships without a stop signal"
+
+# budget present + numeric
+if [[ -n "$BUDGET" && "$BUDGET" =~ ^[0-9]+$ ]]; then pass; else fail warning "budget_tokens: missing or non-numeric — bound the per-run spend"; fi
+
+# scope present + bounded + not placeholder
+mapfile -t SCOPE_ITEMS < <(cfg_list_items scope)
+SCOPE_INLINE="$(cfg_scalar scope)"
+[[ -n "$SCOPE_INLINE" ]] && SCOPE_ITEMS+=("$SCOPE_INLINE")
+if ! cfg_has_key scope || [[ ${#SCOPE_ITEMS[@]} -eq 0 ]]; then
+  fail error "scope: missing — bound what the loop may touch"
+else
+  scope_bad=0
+  for it in "${SCOPE_ITEMS[@]}"; do
+    if is_placeholder "$it"; then fail error "scope: unfilled placeholder ('$it')"; scope_bad=1; break; fi
+    case "$it" in '*'|'**'|'.'|'./'|'/'|'') fail error "scope: unbounded ('$it') — a loop that may touch anything is not bounded"; scope_bad=1; break ;; esac
+  done
+  [[ "$scope_bad" -eq 0 ]] && pass
+fi
+
+# permission_mode present + valid
+case "$PMODE" in
+  plan|dontAsk|auto|acceptEdits|bypassPermissions) pass ;;
+  "") fail error "permission_mode: missing" ;;
+  *)  fail error "permission_mode: invalid ('$PMODE')" ;;
+esac
+
+# permission_mode consistent with tier (warning)
+case "$TIER" in
+  L1) case "$PMODE" in plan|dontAsk) pass ;; *) fail warning "permission_mode '$PMODE' is broad for L1 (report-only) — prefer plan or dontAsk" ;; esac ;;
+  L2) case "$PMODE" in dontAsk|auto|acceptEdits) pass ;; *) fail warning "permission_mode '$PMODE' fits L2 poorly — prefer dontAsk/auto/acceptEdits" ;; esac ;;
+  L3) case "$PMODE" in bypassPermissions) pass ;; *) fail warning "L3 unattended usually needs bypassPermissions in a container (got '$PMODE')" ;; esac ;;
+  *)  : ;;
+esac
+
+# ── L2+ checks (code-changing tiers) ────────────────────────────────────────
+if [[ "$is_l2plus" -eq 1 ]]; then
+  require error "$(filled "$VERIFY" && echo 1 || echo 0)" "verify: no gate command — a code-changing loop with no gate is invalid"
+  require error "$(filled "$GUARD" && echo 1 || echo 0)"  "guard: no must-always-pass command at $TIER"
+  if [[ "$WORKTREE" == "true" ]]; then pass; else fail error "worktree: must be true at $TIER — isolate code changes"; fi
+  require warning "$(filled "$LANDVIA" && echo 1 || echo 0)" "land_via: undefined — name who gates+lands (e.g. fleet-ops)"
+fi
+
+# ── L3-specific isolation check ─────────────────────────────────────────────
+if [[ "$TIER" == "L3" ]]; then
+  if printf '%s %s' "$ESCAL" "${SCOPE_ITEMS[*]:-}" | grep -Eqi 'container|isolat|sandbox|devcontainer'; then
+    pass
+  else
+    fail warning "L3 declares no isolation boundary — bypassPermissions is only safe in a container/VM; note it in escalation"
+  fi
+fi
+
+# ── verdict ─────────────────────────────────────────────────────────────────
+ERRORS=0; WARNINGS=0
+for s in "${FIND_SEV[@]:-}"; do
+  [[ "$s" == "error" ]] && ERRORS=$((ERRORS+1))
+  [[ "$s" == "warning" ]] && WARNINGS=$((WARNINGS+1))
+done
+SCORE=0
+[[ "$CHECKS_TOTAL" -gt 0 ]] && SCORE=$(( CHECKS_PASS * 100 / CHECKS_TOTAL ))
+
+READY=1
+[[ "$ERRORS" -gt 0 ]] && READY=0
+[[ "$SCORE" -lt "$MIN" ]] && READY=0
+[[ "$STRICT" -eq 1 && "$WARNINGS" -gt 0 ]] && READY=0
+
+# ── output ──────────────────────────────────────────────────────────────────
+if [[ "$JSON" -eq 1 ]]; then
+  printf '{\n  "data": [\n'
+  for i in "${!FIND_SEV[@]}"; do
+    msg="${FIND_MSG[$i]//\\/\\\\}"; msg="${msg//\"/\\\"}"
+    sep=","; [[ "$i" -eq $(( ${#FIND_SEV[@]} - 1 )) ]] && sep=""
+    printf '    {"severity": "%s", "message": "%s"}%s\n' "${FIND_SEV[$i]}" "$msg" "$sep"
+  done
+  printf '  ],\n  "meta": {"count": %d, "errors": %d, "warnings": %d, "score": %d, "min": %d, "ready": %s, "tier": "%s", "schema": "claude-mods.loop-ops.audit/v1"}\n}\n' \
+    "${#FIND_SEV[@]}" "$ERRORS" "$WARNINGS" "$SCORE" "$MIN" "$([[ "$READY" -eq 1 ]] && echo true || echo false)" "${TIER:-unknown}"
+else
+  if [[ ${#FIND_SEV[@]} -gt 0 ]]; then
+    for i in "${!FIND_SEV[@]}"; do
+      printf '%-7s %s\n' "$(printf '%s' "${FIND_SEV[$i]}" | tr '[:lower:]' '[:upper:]')" "${FIND_MSG[$i]}"
+    done
+  fi
+  verdict="$([[ "$READY" -eq 1 ]] && echo READY || echo "NOT READY")"
+  vstate="$([[ "$READY" -eq 1 ]] && echo ok || echo bad)"
+  {
+    term_panel_open loop "loop ${TERM_DOT} audit" "${NAME:-$(basename "$(dirname "$CFG")")}"
+    term_panel_vert
+    term_status_row "$vstate" "$verdict  $(term_pip_bar score "$SCORE" 100)" "score $SCORE/100 ${TERM_DOT} tier ${TIER:-?}"
+    term_status_row "$([[ "$ERRORS" -eq 0 ]] && echo ok || echo bad)" "$ERRORS error(s)" "must be 0 to be ready"
+    term_status_row "$([[ "$WARNINGS" -eq 0 ]] && echo ok || echo warn)" "$WARNINGS warning(s)" "$([[ "$STRICT" -eq 1 ]] && echo 'block under --strict' || echo advisory)"
+    term_panel_vert
+    term_panel_close "min $MIN ${TERM_DOT} fix errors before scheduling" ""
+  } >&2
+fi
+
+[[ "$READY" -eq 1 ]] && exit "$EX_OK" || exit "$EX_FINDINGS"
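
The readiness rule that closes loop-audit.sh (integer score, zero errors, minimum score, optional strict mode) can be restated as a small Python sketch. This is an illustration only, not part of the commit: the function name `verdict` and its parameters are hypothetical, and the `min_score=80` default is an assumption, since the script's real `--min` default is not visible in this hunk.

```python
def verdict(checks_pass, checks_total, errors, warnings, min_score=80, strict=False):
    """Restate the audit's readiness rule (illustrative names, not the script's API)."""
    # Integer percentage, truncated the same way as bash $(( pass * 100 / total )).
    score = checks_pass * 100 // checks_total if checks_total > 0 else 0
    # Ready only with zero errors, a score at or above the minimum,
    # and (under strict mode) zero warnings.
    ready = errors == 0 and score >= min_score and not (strict and warnings > 0)
    return score, ready


if __name__ == "__main__":
    # An L1 config passing 10 of 11 checks: one warning, no errors.
    print(verdict(10, 11, errors=0, warnings=1))
    print(verdict(10, 11, errors=0, warnings=1, strict=True))
```

Under these assumptions a single warning leaves the loop ready by default and not ready under strict mode, which matches the behaviour the `--strict` test case in tests/run.sh exercises.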

+ 229 - 0
skills/loop-ops/scripts/loop-cost.py

@@ -0,0 +1,229 @@
+#!/usr/bin/env python3
+"""Estimate the token/$ cost of an outer loop by pattern × cadence × model.
+
+A loop's cost is runs/day × tokens/run × price, and sub-agents multiply tokens/run.
+This computes that before you commit to a cadence. Pricing reads from
+assets/model-pricing.json (date-stamped; skills/claude-api-ops is the source of
+truth — run its check-model-table.py if you suspect drift).
+
+Usage:   loop-cost.py --pattern P --cadence C --model M [OPTIONS]
+Input:   argv flags only (no stdin).
+Output:  stdout = the cost breakdown (plain rows, or --json envelope). Data only.
+Stderr:  the assumptions note, errors.
+Exit:    0 ok, 2 usage, 3 pricing file missing, 4 bad cadence/model/pattern
+
+Estimates, not guarantees — reconcile against the loop's run-log.md actuals. The
+cheapest lever is cadence (halving frequency halves cost); the next is model.
+
+Examples:
+  loop-cost.py --pattern pr-babysitter --cadence 10m --model claude-haiku-4-5
+  loop-cost.py --pattern ci-sweeper --cadence 15m --model claude-sonnet-4-6 --days 30 --json
+  loop-cost.py --list-models
+"""
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import re
+import sys
+from pathlib import Path
+
+EX_OK = 0
+EX_USAGE = 2
+EX_NOTFOUND = 3
+EX_VALIDATION = 4
+
+DEFAULT_PRICING = Path(__file__).resolve().parent.parent / "assets" / "model-pricing.json"
+
+
+class Term:
+    """Minimal ANSI helper (term.sh is bash-only; per TERMINAL-DESIGN.md §9 the
+    Python port is inline). Honors FORCE_COLOR / NO_COLOR / TERM_ASCII and the
+    bound stream's TTY + encoding, so piped data stays plain ASCII."""
+
+    _C = {"green": "\033[32m", "cyan": "\033[36m", "dim": "\033[2m", "off": "\033[0m"}
+
+    def __init__(self, stream=sys.stderr):
+        enc = (getattr(stream, "encoding", "") or "").lower()
+        self.ascii = os.environ.get("TERM_ASCII") == "1" or "utf" not in enc
+        if os.environ.get("FORCE_COLOR"):
+            self.color = True
+        elif (os.environ.get("NO_COLOR") is not None
+              or os.environ.get("TERM") == "dumb"
+              or not getattr(stream, "isatty", lambda: False)()):
+            self.color = False
+        else:
+            self.color = True
+
+    def c(self, name, text):
+        return f"{self._C.get(name,'')}{text}{self._C['off']}" if self.color else text
+
+
+def load_pricing(path: Path) -> dict:
+    if not path.is_file():
+        print(f"error: pricing file not found: {path}", file=sys.stderr)
+        raise SystemExit(EX_NOTFOUND)
+    try:
+        return json.loads(path.read_text(encoding="utf-8"))
+    except (json.JSONDecodeError, OSError) as exc:
+        print(f"error: could not read pricing file: {exc}", file=sys.stderr)
+        raise SystemExit(EX_VALIDATION)
+
+
+def runs_per_day(cadence: str, override: float | None) -> float:
+    """Translate a cadence into runs/day. Supports Nm/Nh/Nd and the common cron
+    forms `*/N * * * *` and `N * * * *`. --runs-per-day overrides everything."""
+    if override is not None:
+        if override <= 0:
+            print("error: --runs-per-day must be positive", file=sys.stderr)
+            raise SystemExit(EX_VALIDATION)
+        return float(override)
+
+    s = cadence.strip()
+    m = re.fullmatch(r"(\d+)([mhd])", s)
+    if m:
+        n = int(m.group(1))
+        if n <= 0:
+            print(f"error: cadence value must be positive (got '{cadence}')", file=sys.stderr)
+            raise SystemExit(EX_VALIDATION)
+        return {"m": 1440.0, "h": 24.0, "d": 1.0}[m.group(2)] / n
+    cron_min = re.fullmatch(r"\*/(\d+) \* \* \* \*", s)
+    if cron_min and int(cron_min.group(1)) > 0:  # `*/0` is not a valid step; it falls through to the error below
+        n = int(cron_min.group(1))
+        return 1440.0 / n
+    if re.fullmatch(r"\d+ \* \* \* \*", s):
+        return 24.0
+    print(
+        f"error: cannot derive runs/day from cadence '{cadence}' — "
+        "use Nm/Nh/Nd, `*/N * * * *`, or pass --runs-per-day",
+        file=sys.stderr,
+    )
+    raise SystemExit(EX_VALIDATION)
+
+
+def fmt_money(x: float) -> str:
+    """Human dollar string: 4 decimals below $1 (tiny per-run costs), otherwise 2 decimals with thousands separators."""
+    if x < 1:
+        return f"${x:.4f}"
+    return f"${x:,.2f}"
+
+
+def main(argv: list[str]) -> int:
+    p = argparse.ArgumentParser(
+        prog="loop-cost.py",
+        description="Estimate outer-loop cost by pattern × cadence × model.",
+    )
+    p.add_argument("--pattern", default="custom", help="catalog pattern key (default: custom)")
+    p.add_argument("--cadence", default="1h", help="10m | 1h | 6h | 1d, or a cron string of the form '*/N * * * *' or 'N * * * *' (default: 1h)")
+    p.add_argument("--model", default="claude-haiku-4-5", help="model id (default: claude-haiku-4-5)")
+    p.add_argument("--days", type=int, default=30, help="horizon in days for the total (default: 30)")
+    p.add_argument("--runs-per-day", type=float, default=None, help="override the cadence-derived runs/day")
+    p.add_argument("--input-tokens", type=int, default=None, help="override per-run input tokens")
+    p.add_argument("--output-tokens", type=int, default=None, help="override per-run output tokens")
+    p.add_argument("--subagents", type=int, default=None, help="override the sub-agent fan-out multiplier")
+    p.add_argument("--pricing", default=str(DEFAULT_PRICING), help="path to model-pricing.json")
+    p.add_argument("--list-models", action="store_true", help="print the pricing table + as-of date, exit 0")
+    p.add_argument("--json", action="store_true", help="emit a JSON envelope")
+    try:
+        args = p.parse_args(argv)
+    except SystemExit as exc:
+        return EX_OK if exc.code in (0, None) else EX_USAGE  # argparse exits 0 for --help, non-zero on a usage error
+
+    pricing = load_pricing(Path(args.pricing))
+    models = pricing.get("models", {})
+    as_of = pricing.get("_as_of", "unknown")
+    pattern_defaults = pricing.get("_pattern_defaults", {})
+
+    # ── --list-models ──
+    if args.list_models:
+        if args.json:
+            print(json.dumps({"data": models, "meta": {"as_of": as_of, "schema": "claude-mods.loop-ops.pricing/v1"}}, indent=2))
+        else:
+            print(f"{'model':<22}{'input $/MTok':>14}{'output $/MTok':>16}")
+            for mid, pr in models.items():
+                print(f"{mid:<22}{pr.get('input_per_mtok', 0):>14.2f}{pr.get('output_per_mtok', 0):>16.2f}")
+            print(f"\n(as of {as_of}; source of truth: claude-api-ops)", file=sys.stderr)
+        return EX_OK
+
+    if args.days <= 0:
+        print("error: --days must be positive", file=sys.stderr)
+        return EX_VALIDATION
+
+    # ── model ──
+    if args.model not in models:
+        print(f"error: unknown model '{args.model}' — known: {', '.join(models) or '(none)'}", file=sys.stderr)
+        return EX_VALIDATION
+    in_price = float(models[args.model]["input_per_mtok"])
+    out_price = float(models[args.model]["output_per_mtok"])
+
+    # ── tokens/run: overrides win, else pattern defaults ──
+    if args.input_tokens is not None and args.output_tokens is not None:
+        in_tok, out_tok = args.input_tokens, args.output_tokens
+        sub = args.subagents if args.subagents is not None else 1
+    elif args.pattern in pattern_defaults:
+        d = pattern_defaults[args.pattern]
+        in_tok = args.input_tokens if args.input_tokens is not None else int(d["input"])
+        out_tok = args.output_tokens if args.output_tokens is not None else int(d["output"])
+        sub = args.subagents if args.subagents is not None else int(d.get("subagents", 1))
+    else:
+        print(
+            f"error: unknown pattern '{args.pattern}' — pass --input-tokens and "
+            f"--output-tokens, or use one of: {', '.join(k for k in pattern_defaults if not k.startswith('_'))}",
+            file=sys.stderr,
+        )
+        return EX_VALIDATION
+
+    if min(in_tok, out_tok) < 0 or sub < 1:  # sub is a multiplier: 0 would silently zero the estimate
+        print("error: token counts must be non-negative and --subagents must be at least 1", file=sys.stderr)
+        return EX_VALIDATION
+
+    rpd = runs_per_day(args.cadence, args.runs_per_day)
+
+    # ── cost math ──
+    cost_in = in_tok / 1_000_000 * in_price
+    cost_out = out_tok / 1_000_000 * out_price
+    cost_run = (cost_in + cost_out) * sub
+    tokens_run = (in_tok + out_tok) * sub
+    cost_day = cost_run * rpd
+    cost_horizon = cost_day * args.days
+
+    if args.json:
+        envelope = {
+            "data": {
+                "pattern": args.pattern,
+                "model": args.model,
+                "cadence": args.cadence,
+                "runs_per_day": round(rpd, 3),
+                "tokens_per_run": tokens_run,
+                "input_tokens": in_tok,
+                "output_tokens": out_tok,
+                "subagents": sub,
+                "cost_per_run": round(cost_run, 6),
+                "cost_per_day": round(cost_day, 4),
+                "days": args.days,
+                "cost_per_horizon": round(cost_horizon, 2),
+            },
+            "meta": {"as_of": as_of, "schema": "claude-mods.loop-ops.cost/v1"},
+        }
+        print(json.dumps(envelope, indent=2))
+        return EX_OK
+
+    t = Term(sys.stdout)  # the colored total is printed to stdout, so detect color on stdout
+    print(f"{'pattern:':<16}{args.pattern}")
+    print(f"{'model:':<16}{args.model}")
+    print(f"{'cadence:':<16}{args.cadence}  ->  {rpd:g} runs/day")
+    print(f"{'tokens/run:':<16}{tokens_run:,} ({in_tok:,} in + {out_tok:,} out) x {sub} subagent(s)")
+    print(f"{'cost/run:':<16}{fmt_money(cost_run)}")
+    print(f"{'cost/day:':<16}{fmt_money(cost_day)}")
+    print(f"{'cost/'+str(args.days)+'d:':<16}{t.c('cyan', fmt_money(cost_horizon))}")
+    print(
+        f"estimate (as of {as_of} pricing) - reconcile against run-log.md actuals; "
+        "cadence is the biggest lever",
+        file=sys.stderr,
+    )
+    return EX_OK
+
+
+if __name__ == "__main__":
+    sys.exit(main(sys.argv[1:]))
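
The cost math in loop-cost.py (runs/day × tokens/run × price, with sub-agents multiplying tokens/run) is easy to check by hand. Below is a minimal standalone sketch of that arithmetic. The function name `estimate` is hypothetical, and the token counts and the $1 / $5 per-MTok prices are placeholder numbers chosen for round results, not values from assets/model-pricing.json.

```python
def estimate(in_tok, out_tok, in_price, out_price, subagents, runs_per_day, days):
    """Return (cost_per_run, cost_per_day, cost_per_horizon) in dollars.

    Prices are dollars per million tokens; subagents is a fan-out multiplier.
    """
    cost_in = in_tok / 1_000_000 * in_price
    cost_out = out_tok / 1_000_000 * out_price
    cost_run = (cost_in + cost_out) * subagents
    cost_day = cost_run * runs_per_day
    return cost_run, cost_day, cost_day * days


if __name__ == "__main__":
    # Placeholder inputs only: 20k in / 2k out per run at $1 / $5 per MTok.
    every_10m = estimate(20_000, 2_000, 1.0, 5.0, subagents=1, runs_per_day=144, days=30)
    every_20m = estimate(20_000, 2_000, 1.0, 5.0, subagents=1, runs_per_day=72, days=30)
    print(f"10m cadence: ${every_10m[2]:.2f} per 30 days")
    print(f"20m cadence: ${every_20m[2]:.2f} per 30 days")
```

Because every term is a plain product, halving runs/day halves the horizon cost, which is why the script's docstring calls cadence the cheapest lever.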

+ 200 - 0
skills/loop-ops/scripts/loop-init.sh

@@ -0,0 +1,200 @@
+#!/usr/bin/env bash
+# Scaffold an outer-loop state spine (loop.config.yaml + STATE.md + run-log.md).
+#
+# Usage:   loop-init.sh --name NAME [OPTIONS]
+# Input:   argv flags only (no stdin).
+# Output:  stdout = the created loop.config.yaml path (data). Under --dry-run, the
+#          path then the rendered config. Data only.
+# Stderr:  the creation panel, reminders, warnings, errors.
+# Exit:    0 created (or dry-run rendered), 2 usage, 3 template/dir not found,
+#          5 precondition (target dir already populated, no --force)
+#
+# Creates <dir>/<name>/ from the bundled templates, substituting name/pattern/tier/
+# cadence/permission_mode. Never clobbers a populated loop dir. Atomic writes.
+# Next step: fill the config, then `loop-audit.sh <dir>/<name>/loop.config.yaml`.
+#
+# Examples:
+#   loop-init.sh --name pr-babysitter --pattern pr-babysitter --tier L1
+#   loop-init.sh --name dep-sweeper --pattern dependency-sweeper --tier L2 --cadence 1d
+#   loop-init.sh --name nightly --cadence "0 3 * * *" --dry-run
+set -uo pipefail
+
+readonly EX_OK=0 EX_USAGE=2 EX_NOTFOUND=3 EX_PRECOND=5
+
+# Terminal design system (skills/_lib/term.sh). stdout = the created path (data);
+# the creation panel frames on stderr, so detect color on fd 2. Degrade to plain
+# stderr lines if the shared lib is unreachable.
+__lib="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../_lib" 2>/dev/null && pwd || true)"
+if [ -n "${__lib:-}" ] && [ -f "$__lib/term.sh" ]; then . "$__lib/term.sh"; term_init 2
+else
+  term_panel_open() { :; }; term_panel_close() { :; }; term_panel_vert() { :; }
+  term_status_row() { shift; printf '  - %s %s\n' "$1" "${2:-}"; }
+  term_alert() { shift; printf '  ! %s\n' "$*"; }
+  term_color() { shift; printf '%s' "$*"; }; TERM_DOT="|"
+fi
+
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+ASSETS="$HERE/../assets"
+CFG_TPL="$ASSETS/loop.config.template.yaml"
+STATE_TPL="$ASSETS/STATE.template.md"
+
+# ── defaults ────────────────────────────────────────────────────────────────
+NAME=""
+PATTERN="custom"
+TIER="L1"
+CADENCE="1h"
+DIR=".loops"
+DRY_RUN=0
+FORCE=0
+
+usage() {
+  cat <<'EOF'
+loop-init.sh — scaffold an outer-loop state spine.
+
+Usage:
+  loop-init.sh --name NAME [OPTIONS]
+
+Options:
+  --name NAME        loop identifier, kebab-case (required). Names the directory.
+  --pattern KEY      catalog key (pr-babysitter, ci-sweeper, dependency-sweeper,
+                     changelog-drafter, post-merge-cleanup, issue-triage,
+                     daily-triage) or "custom" (default: custom).
+  --tier L1|L2|L3    starting autonomy tier (default: L1).
+  --cadence STR      10m | 1h | 6h | 1d, or a cron string (default: 1h).
+  --dir DIR          parent directory for the loop (default: .loops).
+  --dry-run          print the target path + rendered config; write nothing.
+  --force            overwrite an already-populated <dir>/<name>/ directory.
+  -h, --help         show this help and exit 0.
+
+Exit codes:
+  0 created (or dry-run)   2 usage   3 template/dir not found   5 dir populated
+
+Examples:
+  loop-init.sh --name pr-babysitter --pattern pr-babysitter --tier L1
+  loop-init.sh --name dep-sweeper --pattern dependency-sweeper --tier L2 --cadence 1d
+  loop-init.sh --name nightly --cadence "0 3 * * *" --dry-run
+EOF
+}
+
+die_usage() { printf 'error: %s\n' "$1" >&2; echo >&2; usage >&2; exit "$EX_USAGE"; }
+
+# ── parse args ──────────────────────────────────────────────────────────────
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --name)    [[ $# -ge 2 ]] || die_usage "--name needs a value"; NAME="$2"; shift 2 ;;
+    --pattern) [[ $# -ge 2 ]] || die_usage "--pattern needs a value"; PATTERN="$2"; shift 2 ;;
+    --tier)    [[ $# -ge 2 ]] || die_usage "--tier needs a value"; TIER="$2"; shift 2 ;;
+    --cadence) [[ $# -ge 2 ]] || die_usage "--cadence needs a value"; CADENCE="$2"; shift 2 ;;
+    --dir)     [[ $# -ge 2 ]] || die_usage "--dir needs a value"; DIR="$2"; shift 2 ;;
+    --dry-run) DRY_RUN=1; shift ;;
+    --force)   FORCE=1; shift ;;
+    -h|--help) usage; exit "$EX_OK" ;;
+    -*)        die_usage "unknown flag: $1" ;;
+    *)         die_usage "unexpected positional argument: $1" ;;
+  esac
+done
+
+# ── validate ────────────────────────────────────────────────────────────────
+[[ -n "$NAME" ]] || die_usage "--name is required"
+[[ "$NAME" =~ ^[a-z0-9]+(-[a-z0-9]+)*$ ]] || die_usage "--name must be kebab-case (got '$NAME')"
+[[ "$PATTERN" =~ ^[a-z0-9]+(-[a-z0-9]+)*$ ]] || die_usage "--pattern must be kebab-case (got '$PATTERN')"
+case "$TIER" in L1|L2|L3) ;; *) die_usage "--tier must be L1|L2|L3 (got '$TIER')" ;; esac
+# cadence: Nm/Nh/Nd OR a cron-ish string (digits, spaces, * / , -)
+[[ "$CADENCE" =~ ^[0-9]+[mhd]$ || "$CADENCE" =~ ^[-0-9*/,\ ]+$ ]] \
+  || die_usage "--cadence must be like 10m/1h/1d or a cron string (got '$CADENCE')"
+
+[[ -f "$CFG_TPL" ]]   || { printf 'error: config template not found at %s\n' "$CFG_TPL" >&2; exit "$EX_NOTFOUND"; }
+[[ -f "$STATE_TPL" ]] || { printf 'error: STATE template not found at %s\n' "$STATE_TPL" >&2; exit "$EX_NOTFOUND"; }
+
+# Default permission_mode from tier (the workhorse mapping; see references/risk-tiers.md).
+case "$TIER" in
+  L1|L2) PMODE="dontAsk" ;;
+  L3)    PMODE="bypassPermissions" ;;
+esac
+
+TARGET_DIR="$DIR/$NAME"
+CFG_OUT="$TARGET_DIR/loop.config.yaml"
+STATE_OUT="$TARGET_DIR/STATE.md"
+LOG_OUT="$TARGET_DIR/run-log.md"
+
+# Refuse a populated target unless --force.
+if [[ -d "$TARGET_DIR" ]] && [[ -n "$(ls -A "$TARGET_DIR" 2>/dev/null)" ]] && [[ "$FORCE" -ne 1 ]]; then
+  printf 'error: loop directory already populated: %s (use --force to overwrite)\n' "$TARGET_DIR" >&2
+  exit "$EX_PRECOND"
+fi
+
+NOW="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
+
+# ── render config from template ─────────────────────────────────────────────
+# Line-anchored sed substitutions: identity placeholders globally, the three
+# tunable scalar lines by their default value. Kill-switch path carries <loop-name>.
+render_config() {
+  sed -E \
+    -e "s|<loop-name>|$NAME|g" \
+    -e "s|<pattern-key>|$PATTERN|" \
+    -e "s|^tier: L1|tier: $TIER|" \
+    -e "s|^cadence: 1h|cadence: $CADENCE|" \
+    -e "s|^permission_mode: dontAsk|permission_mode: $PMODE|" \
+    "$CFG_TPL"
+}
+
+render_state() {
+  sed -E \
+    -e "s|<loop-name>|$NAME|g" \
+    -e "s|<ISO-8601 Z>|$NOW|" \
+    "$STATE_TPL"
+}
+
+render_log() {
+  cat <<EOF
+# $NAME — run log (append-only; one line per run)
+# format: <ISO-Z>  run#N  action=<reported|proposed|none>  <key=val…>  outcome=<…>  tokens=<N>
+EOF
+}
+
+# ── dry-run: print and stop ─────────────────────────────────────────────────
+if [[ "$DRY_RUN" -eq 1 ]]; then
+  printf '%s\n' "$CFG_OUT"
+  {
+    term_panel_open loop "loop ${TERM_DOT} init (dry-run)" "$NAME"
+    term_panel_vert
+    term_status_row skip "would create  $TARGET_DIR/" "tier $TIER ${TERM_DOT} $PATTERN ${TERM_DOT} $CADENCE"
+    term_status_row skip "  loop.config.yaml" "permission_mode: $PMODE"
+    term_status_row skip "  STATE.md / run-log.md" ""
+    term_panel_vert
+    term_panel_close "nothing written" ""
+  } >&2
+  render_config
+  exit "$EX_OK"
+fi
+
+# ── atomic writes ───────────────────────────────────────────────────────────
+mkdir -p "$TARGET_DIR" || { printf 'error: could not create %s\n' "$TARGET_DIR" >&2; exit 1; }
+
+write_atomic() {  # write_atomic <dest> <content>
+  local dest="$1" content="$2" tmp
+  tmp="$dest.tmp.$$"
+  printf '%s\n' "$content" > "$tmp" || { printf 'error: failed to write %s\n' "$tmp" >&2; exit 1; }
+  mv -f "$tmp" "$dest" || { rm -f "$tmp"; printf 'error: failed to move into place: %s\n' "$dest" >&2; exit 1; }
+}
+
+write_atomic "$CFG_OUT"   "$(render_config)"
+write_atomic "$STATE_OUT" "$(render_state)"
+write_atomic "$LOG_OUT"   "$(render_log)"
+
+printf '%s\n' "$CFG_OUT"
+
+{
+  term_panel_open loop "loop ${TERM_DOT} init" "$NAME"
+  term_panel_vert
+  term_status_row ok "created  $TARGET_DIR/" "tier $TIER ${TERM_DOT} $PATTERN ${TERM_DOT} $CADENCE"
+  term_status_row ok "  loop.config.yaml" "permission_mode: $PMODE"
+  term_status_row ok "  STATE.md / run-log.md" ""
+  if [[ "$TIER" != "L1" ]]; then
+    term_alert warning "tier $TIER needs a verify gate, guard, worktree, escalation + land_via — fill them before auditing"
+  fi
+  term_panel_vert
+  term_panel_close "then: fill the config ${TERM_DOT} loop-audit.sh $CFG_OUT" ""
+} >&2
+
+exit "$EX_OK"

+ 206 - 0
skills/loop-ops/tests/run.sh

@@ -0,0 +1,206 @@
+#!/usr/bin/env bash
+# Self-test for loop-ops scripts (loop-init.sh, loop-audit.sh, loop-cost.py).
+#
+# Offline-deterministic (no network). Scaffolds throwaway loop fixtures, asserts the
+# documented exit codes + key output of each script, then cleans up. Resolves paths
+# relative to itself so it works both in the repo and installed to ~/.claude/.
+#
+# Usage:   bash tests/run.sh
+# Exit:    0 all pass, 1 one or more failures
+
+set -uo pipefail
+
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+SKILL="$(dirname "$HERE")"
+SCRIPTS="$SKILL/scripts"
+INIT="$SCRIPTS/loop-init.sh"
+AUDIT="$SCRIPTS/loop-audit.sh"
+COST="$SCRIPTS/loop-cost.py"
+
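+# Pick a Python 3.7+ that actually executes: skips the Windows Store python3 stub
+# and a legacy Python 2 `python` (loop-cost.py needs f-strings and postponed annotations).
+PYTHON=""
+for c in python python3 py; do
+  if command -v "$c" >/dev/null 2>&1 && "$c" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 7) else 1)' >/dev/null 2>&1; then PYTHON="$c"; break; fi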
+done
+[[ -z "$PYTHON" ]] && { echo "no working python found — skipping" >&2; exit 0; }
+
+SB="$(mktemp -d)"; trap 'rm -rf "$SB"' EXIT
+
+PASS=0; FAIL=0
+ok() { PASS=$((PASS+1)); printf '  PASS  %s\n' "$1"; }
+no() { FAIL=$((FAIL+1)); printf '  FAIL  %s\n' "$1"; }
+expect_exit() { [[ "$2" == "$3" ]] && ok "$1 (exit $3)" || no "$1 (want $2 got $3)"; }
+expect_has()  { case "$3" in *"$2"*) ok "$1";; *) no "$1 (missing '$2')";; esac; }
+
+# Write a filled, READY L1 report-only config.
+good_l1() { cat > "$1" <<'EOF'
+name: test-l1
+pattern: pr-babysitter
+tier: L1
+permission_mode: dontAsk
+cadence: 10m
+goal: "Watch open PRs and report; never merge."
+scope:
+  - "src/**"
+escalation: "comment on the PR; never merge to main"
+budget_tokens: 200000
+kill_switch: ".loops/test-l1/PAUSED exists or loop-pause label"
+EOF
+}
+
+# Write a filled, READY L2 assisted config.
+good_l2() { cat > "$1" <<'EOF'
+name: dep-sweeper
+pattern: dependency-sweeper
+tier: L2
+permission_mode: dontAsk
+cadence: 1d
+goal: "Patch-only dependency bumps behind cooldown; open a PR."
+scope:
+  - "package.json"
+  - "package-lock.json"
+verify: "npm test"
+guard: "npm run typecheck"
+worktree: true
+land_via: fleet-ops
+escalation: "minor/major bumps escalate; never merge to main"
+budget_tokens: 300000
+kill_switch: ".loops/dep-sweeper/PAUSED"
+EOF
+}
+
+echo "=== loop-ops self-test (python: $PYTHON) ==="
+
+# ── --help contracts (exit 0) ──────────────────────────────────────────────
+echo "-- --help --"
+bash "$INIT"  --help >/dev/null 2>&1; expect_exit "loop-init --help" 0 $?
+bash "$AUDIT" --help >/dev/null 2>&1; expect_exit "loop-audit --help" 0 $?
+"$PYTHON" "$COST" --help >/dev/null 2>&1; expect_exit "loop-cost --help" 0 $?
+
+# ── loop-init: scaffolds dir + 3 files, substitutes fields ─────────────────
+echo "-- loop-init --"
+out="$(bash "$INIT" --name pr-watch --pattern pr-babysitter --tier L1 --cadence 5m --dir "$SB/loops" 2>/dev/null)"; rc=$?
+expect_exit "loop-init -> 0" 0 "$rc"
+expect_has  "prints the config path" "pr-watch/loop.config.yaml" "$out"
+[[ -f "$SB/loops/pr-watch/loop.config.yaml" ]] && ok "wrote loop.config.yaml" || no "no loop.config.yaml"
+[[ -f "$SB/loops/pr-watch/STATE.md" ]] && ok "wrote STATE.md" || no "no STATE.md"
+[[ -f "$SB/loops/pr-watch/run-log.md" ]] && ok "wrote run-log.md" || no "no run-log.md"
+cfg="$(cat "$SB/loops/pr-watch/loop.config.yaml")"
+expect_has "substituted name" "name: pr-watch" "$cfg"
+expect_has "substituted tier" "tier: L1" "$cfg"
+expect_has "substituted cadence" "cadence: 5m" "$cfg"
+expect_has "L1 default permission_mode" "permission_mode: dontAsk" "$cfg"
+# L3 default permission_mode is bypassPermissions
+bash "$INIT" --name big-job --tier L3 --dir "$SB/loops" >/dev/null 2>&1
+expect_has "L3 default permission_mode" "permission_mode: bypassPermissions" "$(cat "$SB/loops/big-job/loop.config.yaml")"
+
+# ── loop-init: refuses a populated dir -> 5, --force overwrites ─────────────
+bash "$INIT" --name pr-watch --dir "$SB/loops" >/dev/null 2>&1; expect_exit "refuse populated dir -> 5" 5 $?
+bash "$INIT" --name pr-watch --dir "$SB/loops" --force >/dev/null 2>&1; expect_exit "--force overwrites -> 0" 0 $?
+
+# ── loop-init: --dry-run writes nothing ────────────────────────────────────
+out="$(bash "$INIT" --name ghost --dir "$SB/dryloops" --dry-run 2>/dev/null)"; rc=$?
+expect_exit "dry-run -> 0" 0 "$rc"
+[[ -e "$SB/dryloops" ]] && no "dry-run created files" || ok "dry-run wrote nothing"
+expect_has "dry-run prints config path" "ghost/loop.config.yaml" "$out"
+
+# ── loop-init: usage errors ────────────────────────────────────────────────
+bash "$INIT" --dir "$SB/loops" >/dev/null 2>&1; expect_exit "missing --name -> 2" 2 $?
+bash "$INIT" --name BadName --dir "$SB/loops" >/dev/null 2>&1; expect_exit "non-kebab name -> 2" 2 $?
+bash "$INIT" --name x --tier L9 --dir "$SB/loops" >/dev/null 2>&1; expect_exit "bad tier -> 2" 2 $?
+
+# ── loop-audit: a freshly-init'd config is NOT ready (placeholders) -> 10 ───
+echo "-- loop-audit --"
+bash "$INIT" --name raw --pattern custom --tier L1 --dir "$SB/loops" >/dev/null 2>&1
+out="$(bash "$AUDIT" "$SB/loops/raw/loop.config.yaml" 2>/dev/null)"; rc=$?
+expect_exit "raw scaffold not ready -> 10" 10 "$rc"
+expect_has  "flags the goal placeholder" "goal:" "$out"
+
+# ── loop-audit: filled L1 config is READY -> 0 ─────────────────────────────
+good_l1 "$SB/l1.yaml"
+out="$(bash "$AUDIT" "$SB/l1.yaml" 2>/dev/null)"; rc=$?
+expect_exit "filled L1 ready -> 0" 0 "$rc"
+
+# ── loop-audit: filled L2 config is READY -> 0 ─────────────────────────────
+good_l2 "$SB/l2.yaml"
+bash "$AUDIT" "$SB/l2.yaml" >/dev/null 2>&1; expect_exit "filled L2 ready -> 0" 0 $?
+
+# ── loop-audit: L2 missing the gate -> 10, names verify ────────────────────
+grep -v '^verify:' "$SB/l2.yaml" > "$SB/l2-nogate.yaml"
+out="$(bash "$AUDIT" "$SB/l2-nogate.yaml" 2>/dev/null)"; rc=$?
+expect_exit "L2 missing gate -> 10" 10 "$rc"
+expect_has  "names the missing gate" "verify:" "$out"
+
+# ── loop-audit: unbounded scope -> 10 ──────────────────────────────────────
+sed 's|  - "src/\*\*"|  - "*"|' "$SB/l1.yaml" > "$SB/l1-unbounded.yaml"
+out="$(bash "$AUDIT" "$SB/l1-unbounded.yaml" 2>/dev/null)"; rc=$?
+expect_exit "unbounded scope -> 10" 10 "$rc"
+expect_has  "names unbounded scope" "unbounded" "$out"
+
+# ── loop-audit: missing escalation -> 10 ───────────────────────────────────
+grep -v '^escalation:' "$SB/l1.yaml" > "$SB/l1-noescal.yaml"
+out="$(bash "$AUDIT" "$SB/l1-noescal.yaml" 2>/dev/null)"; rc=$?
+expect_exit "missing escalation -> 10" 10 "$rc"
+expect_has  "names escalation" "escalation:" "$out"
+
+# ── loop-audit: missing file -> 3, unparseable -> 4, bad --min -> 2 ────────
+bash "$AUDIT" "$SB/no-such.yaml" >/dev/null 2>&1; expect_exit "missing config -> 3" 3 $?
+printf 'just some prose, no keys\n' > "$SB/garbage.yaml"
+bash "$AUDIT" "$SB/garbage.yaml" >/dev/null 2>&1; expect_exit "unparseable -> 4" 4 $?
+bash "$AUDIT" --min abc "$SB/l1.yaml" >/dev/null 2>&1; expect_exit "bad --min -> 2" 2 $?
+
+# ── loop-audit: --json envelope schema + ready flag ────────────────────────
+out="$(bash "$AUDIT" --json "$SB/l1.yaml" 2>/dev/null)"
+expect_has "audit json schema" "claude-mods.loop-ops.audit/v1" "$out"
+expect_has "audit json ready true" '"ready": true' "$out"
+out="$(bash "$AUDIT" --json "$SB/l2-nogate.yaml" 2>/dev/null)"
+expect_has "audit json ready false" '"ready": false' "$out"
+
+# ── loop-audit: --strict turns a warning into NOT ready ────────────────────
+# An L1 config with permission_mode: auto raises no errors but does trigger a
+# warning (auto is broad for L1). It is normally ready; --strict makes it not ready.
+sed 's|permission_mode: dontAsk|permission_mode: auto|' "$SB/l1.yaml" > "$SB/l1-warn.yaml"
+bash "$AUDIT" "$SB/l1-warn.yaml" >/dev/null 2>&1; expect_exit "warning, normally ready -> 0" 0 $?
+bash "$AUDIT" --strict "$SB/l1-warn.yaml" >/dev/null 2>&1; expect_exit "warning, --strict not ready -> 10" 10 $?
+
+# ── loop-cost: basic run, --json, --list-models, cadence forms ─────────────
+echo "-- loop-cost --"
+out="$("$PYTHON" "$COST" --pattern pr-babysitter --cadence 10m --model claude-haiku-4-5 2>/dev/null)"; rc=$?
+expect_exit "loop-cost -> 0" 0 "$rc"
+expect_has  "prints a daily cost" "cost/day:" "$out"
+expect_has  "derives runs/day from 10m" "144 runs/day" "$out"
+out="$("$PYTHON" "$COST" --pattern ci-sweeper --cadence 15m --model claude-sonnet-4-6 --json 2>/dev/null)"
+expect_has "cost json schema" "claude-mods.loop-ops.cost/v1" "$out"
+expect_has "cost json carries runs_per_day" "runs_per_day" "$out"
+out="$("$PYTHON" "$COST" --list-models 2>/dev/null)"; rc=$?
+expect_exit "list-models -> 0" 0 "$rc"
+expect_has  "list-models shows a model" "claude-opus-4-8" "$out"
+# cron cadence parses
+"$PYTHON" "$COST" --pattern daily-triage --cadence '*/10 * * * *' --model claude-haiku-4-5 >/dev/null 2>&1
+expect_exit "cron cadence -> 0" 0 $?
+# --runs-per-day override
+out="$("$PYTHON" "$COST" --pattern custom --cadence weird --runs-per-day 5 --model claude-haiku-4-5 2>/dev/null)"; rc=$?
+expect_exit "runs-per-day override -> 0" 0 "$rc"
+expect_has  "uses the override" "5 runs/day" "$out"
+
+# ── loop-cost: validation errors ───────────────────────────────────────────
+"$PYTHON" "$COST" --pattern pr-babysitter --cadence 10m --model claude-nope >/dev/null 2>&1; expect_exit "unknown model -> 4" 4 $?
+"$PYTHON" "$COST" --pattern not-a-pattern --cadence 10m --model claude-haiku-4-5 >/dev/null 2>&1; expect_exit "unknown pattern -> 4" 4 $?
+"$PYTHON" "$COST" --pattern pr-babysitter --cadence "garbage cron" --model claude-haiku-4-5 >/dev/null 2>&1; expect_exit "bad cadence -> 4" 4 $?
+"$PYTHON" "$COST" --pricing "$SB/no-pricing.json" --pattern custom --cadence 1h --input-tokens 1 --output-tokens 1 --model x >/dev/null 2>&1; expect_exit "missing pricing file -> 3" 3 $?
+
+# ── terminal design system ─────────────────────────────────────────────────
+echo "-- terminal design system --"
+# if/else rather than `a && ok || no`: the chained form would also run `no`
+# whenever `ok` itself returned non-zero, double-counting the assertion.
+for s in "$INIT" "$AUDIT"; do
+  b="$(basename "$s")"
+  if grep -q '_lib/term.sh' "$s"; then
+    ok "$b sources _lib/term.sh"
+  else
+    no "$b does not source _lib/term.sh"
+  fi
+done
+if grep -q 'class Term' "$COST"; then
+  ok "loop-cost carries inline Term helper"
+else
+  no "loop-cost missing inline Term helper"
+fi
+if grep -q 'BRAND::loop' "$SKILL/../_lib/term.sh"; then
+  ok "term.sh registers the loop brand glyph"
+else
+  no "term.sh missing loop brand glyph"
+fi
+# Piped audit findings stay plain (no ANSI in the data stream).
+po="$(bash "$AUDIT" "$SB/l2-nogate.yaml" 2>/dev/null)"
+case "$po" in *$'\033'*) no "piped audit leaked ANSI into data";; *) ok "piped audit stays plain data";; esac
+
+# ── summary ────────────────────────────────────────────────────────────────
+echo "=== $PASS passed, $FAIL failed ==="
+[[ "$FAIL" -eq 0 ]] || exit 1
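
The `144 runs/day` assertion in the loop-cost tests above follows from dividing the 1440 minutes in a day by the cadence interval (1440 / 10 = 144). The sketch below illustrates that arithmetic only; `runs_per_day` is a hypothetical helper, not the actual loop-cost implementation, and it assumes interval cadences of the form `<number><m|h|d>` with integer division. The return code 4 for an unparseable cadence mirrors the "bad cadence -> 4" expectation in the tests, but that mapping is likewise an assumption here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of loop-cost): convert an interval cadence
# such as 10m, 1h, or 1d into a runs-per-day count.
runs_per_day() {
  local cadence="$1" n unit minutes
  # Accept only <digits><unit>, where unit is m (minutes), h (hours), or d (days).
  if [[ ! "$cadence" =~ ^([0-9]+)([mhd])$ ]]; then
    echo "unsupported cadence: $cadence" >&2
    return 4
  fi
  n="${BASH_REMATCH[1]}"
  unit="${BASH_REMATCH[2]}"
  # 10# forces base 10 so a leading zero (e.g. 08m) is not read as octal.
  case "$unit" in
    m) minutes=$(( 10#$n )) ;;
    h) minutes=$(( 10#$n * 60 )) ;;
    d) minutes=$(( 10#$n * 1440 )) ;;
  esac
  if (( minutes == 0 )); then
    echo "cadence must be positive" >&2
    return 4
  fi
  # Integer division: intervals longer than one day round down to 0.
  echo $(( 1440 / minutes ))
}

runs_per_day 10m   # prints 144
runs_per_day 15m   # prints 96
runs_per_day 1h    # prints 24
```

Cron expressions such as `*/10 * * * *` need a real schedule parser and are out of scope for this sketch; the `--runs-per-day` override exercised in the suite covers cadences that no parser can interpret.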