
feat(skills): loop-ops Tier 1 — loop-doctor, caching-aware cost, companion rule

Three world-class upgrades from the deep review:

- loop-doctor.sh — a live preflight (--offline/--live) modeled on fleet-doctor:
  loop-audit proves the config is well-formed, loop-doctor proves the loop will
  RUN. Catches the "blocked at 3am" failures: the verify/guard gate binary not on
  PATH, a budget too small to fit a tick, an interactive permission mode, an L3
  bypass with no isolation boundary. Exit 10 on a predicted runtime failure.

- loop-cost.py is now caching-aware. A loop re-sends the same run.md+system prefix
  every tick (the Ralph property) — the textbook prompt-caching case. It models the
  prefix as a cache entry (write once 1.25-2x, read ~0.1x) AND the load-bearing
  TTL-vs-cadence rule: a loop slower than the 1h max TTL can't cache (the entry
  expires between ticks). Teaches the #1 loop cost lever instead of overstating cost.

- rules/loop-engineering.md — a companion always-on directive (like supply-chain.md /
  prompt-injection.md) carrying graduated autonomy (L1->L2->L3 never start unattended),
  scheduler-invokes-claude-p-not-a-session, the escalation gate, and kill-switch +
  budget into every session — not only when loop-ops is explicitly invoked.

Wiring: rules 7->8 (README header + Rules row + PLAN); SKILL.md documents doctor +
caching + the init->fill->cost->audit->doctor workflow. Suite 68->81; doc-drift,
check-resources, validate all green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
0xDarkMatter 1 day ago
Parent
Commit
328fd6ffcb

+ 9 - 0
CHANGELOG.md

@@ -28,6 +28,15 @@ feature releases live in the README "Recent Updates" section.
 - **`docs/AUTO-MODE-CLASSIFIER.md`** - reference on Claude Code's auto-mode permission
   classifier (the two-gate model, gating categories, legitimate-authorization decision tree),
   cited by `loop-ops` as the authority for its risk-tier mapping.
+- **loop-ops hardening (world-class pass)**: `loop-doctor.sh` - a live preflight
+  (`--offline`/`--live`) that proves a loop will *run* (gate binary on PATH, budget fits a
+  tick, permission mode achievable, L3 isolation present), complementing loop-audit's
+  *well-formed* check; `loop-cost.py` is now **caching-aware** - it models the static
+  run-prompt prefix as a cache entry and the TTL-vs-cadence rule (a loop slower than ~1h
+  can't cache), the key loop economics lever; and a companion **`rules/loop-engineering.md`**
+  carries the graduated-autonomy directive (L1→L2→L3, scheduler-not-session, escalation
+  gate, kill switch + budget) into every session, not just when the skill is invoked.
+  Suite now 81 assertions.
 
 ## [3.2.0] - 2026-06-22
 

+ 2 - 1
README.md

@@ -18,7 +18,7 @@ Built on the [Agent Skills specification](https://agentskills.io/specification)
 
 From Python async patterns to Rust ownership models, from AWS Fargate deployments to Craft CMS development - claude-mods provides the specialized knowledge and tools that transform Claude from a general-purpose assistant into a domain expert who understands your stack, remembers your workflow, and ships production code.
 
-**3 agents. 95 skills. 13 styles. 11 hooks. 7 rules. One install.**
+**3 agents. 95 skills. 13 styles. 11 hooks. 8 rules. One install.**
 
 ## Recent Updates
 
@@ -395,6 +395,7 @@ See [skill-creator](skills/skill-creator/) for the complete guide.
 | [skill-agent-updates.md](rules/skill-agent-updates.md) | Mandatory docs check before creating/updating skills or agents |
 | [supply-chain.md](rules/supply-chain.md) | Behavioural-first dependency hygiene - scan before adding, day-zero cooldown, OIDC audit, persistence-hook awareness |
 | [worktree-boundaries.md](rules/worktree-boundaries.md) | Never touch other sessions' worktrees - no rm -rf, no git add -A sweeping gitlinks |
+| [loop-engineering.md](rules/loop-engineering.md) | Graduated-autonomy discipline for scheduled/autonomous agent loops - L1→L2→L3, scheduler-not-session, escalation gate, kill switch + budget; companion to loop-ops |
 
 ### Tools & Hooks
 

+ 1 - 1
docs/PLAN.md

@@ -18,7 +18,7 @@
 | Agents | 3 | Pure context-isolation/worker roles only: git-agent (background commits/PRs), firecrawl-expert (noisy scrapes), project-organizer (bulk restructure) |
 | Skills | 95 | Operational skills, CLI tools, workflows, diagnostics, security |
 | Commands | 2 | Session management (sync, save) |
-| Rules | 7 | cli-tools, commit-style, naming-conventions, prompt-injection, skill-agent-updates, supply-chain, worktree-boundaries |
+| Rules | 8 | cli-tools, commit-style, naming-conventions, prompt-injection, skill-agent-updates, supply-chain, worktree-boundaries, loop-engineering |
 | Output Styles | 13 | Vesper, Spartan, Mentor, Executive, Pair, Atlas, Coach, Harbour, Meridian, Noir, Roast, Sage, Scout |
 | Hooks | 11 | lint, format, safety, uv, install-scan, manifest-scan, pmail, unicode-scan ×2, config-change guard, worktree guard |
 

+ 83 - 0
rules/loop-engineering.md

@@ -0,0 +1,83 @@
+# Loop Engineering — graduated-autonomy discipline for agent loops
+
+Companion to the [`loop-ops`](../skills/loop-ops/SKILL.md) skill (the full playbook +
+`loop-init`/`loop-audit`/`loop-doctor`/`loop-cost` scripts). This file is the *directive*
+— what to do every time you design or run a **recurring / scheduled / autonomous** agent
+loop, in any project: a `/loop`, a `/schedule` routine, a cron `claude -p`, an `iterate`
+run, a `fleet-worker` fan-out.
+
+## The rule
+
+**A loop is a recurring process you grant standing authority to. Grant it the *least*
+authority that does the job, earn each increase with evidence, and never let it act on a
+blast radius bigger than its stated purpose.** Three non-negotiables:
+
+1. **Graduated autonomy — never start unattended.** `L1 report → L2 assisted → L3
+   unattended`. A fresh loop runs read-only (L1) until its reports prove its judgment;
+   only then does it earn write access (L2, human-gated merge), and only then autonomous
+   landing (L3, inside an isolation boundary). Starting at L3 is how incidents and
+   comprehension debt compound.
+2. **A scheduler invokes `claude -p` — a session does not spawn ungated children.** The
+   authorizer of an unattended loop is a human-configured cron / Task Scheduler / CI
+   runner, *outside* any auto-mode session. An `auto`-mode session that launches a
+   `--permission-mode bypassPermissions` child is hard-denied as *Create Unsafe Agents* —
+   by design. Give the headless child *gates* (`dontAsk` + a narrow allowlist), not bypass
+   — unless it runs in an isolated container.
+3. **No gate, no kill switch, no budget → no loop.** Every loop has a `verify` gate (the
+   check that decides land-vs-discard), a kill switch every run checks first, and a
+   per-run token budget. A loop missing any of these doesn't get scheduled.
+
+## Why this matters
+
+Unattended loops amplify both good judgment and mistakes, and they do it on a schedule
+while you're not watching. The failure modes are not hypothetical: a loop that force-pushes,
+that burns a day's budget in an hour, that "fixes" CI by deleting the failing test, that
+collides with another loop's worktree, or that silently stops triggering. The controls
+above are what make a loop's authority *recoverable*: a kill switch stops it, a budget
+bounds it, a gate keeps bad changes out, and the tier ladder means you only ever granted
+the authority you'd already seen it use well.
+
+## Directives — apply whenever a loop is involved
+
+| Situation | Directive |
+|---|---|
+| Designing any scheduled/autonomous loop | Start at **L1 (read-only)**. Scaffold with `loop-init`; fill a bounded `scope` (never `*`), a `verify` gate, an `escalation` rule, a `kill_switch`, a `budget_tokens`. |
+| Before scheduling a loop | Run **`loop-audit`** (config sane?) **then `loop-doctor --live`** (will it actually run — gate binary on PATH, budget fits a tick, permission mode achievable?). Don't schedule a loop that fails either. |
+| Choosing the permission mode | Default to **`dontAsk` + a narrow allowlist** (runs anywhere, fully gated). Reserve `bypassPermissions` for an **isolated container** (the enumerate-vs-isolate fork). Never `default` (interactive) for a headless loop. |
+| Wiring the cadence | A **scheduler** runs `claude -p` (the authorizer). Do **not** run an orchestrator session in auto mode whose job is spawning the loop. |
+| Setting the cadence + cost | Cadence is the biggest cost lever; **caching** is the next (a loop re-sends the same prompt — cache the static prefix, and note a loop slower than ~1h can't cache). Estimate with `loop-cost` before committing. |
+| Running several loops | Give them a **priority order** (CI > PR > deps > cleanup > triage) and a **shared kill switch**; coordinate via `pigeon` so they don't collide on a worktree. |
+| Anything high-blast-radius | **Escalate, don't act** (see below). A general goal is *not* authorization for a specific destructive action it implies. |
+
+## The escalation gate — never auto-land
+
+Bake into every loop's `escalation:` field. These **always** go to a human, regardless of
+the loop's goal: force-push · push to `main` · production deploy/migration · mass deletion ·
+granting IAM/repo permissions · destroying files that predate the run · editing `.claude/`
+or settings (self-modification) · `curl | bash`. Safe to auto-land at L2/L3 *when
+allowlisted*: a green PR on a feature branch, a lockfile patch bump past the guard, a
+generated draft, a label/triage classification, a comment.
+
+## Self-check before wiring a loop
+
+- Is it starting at L1? If you're reaching for L3 on a fresh loop, stop.
+- Does `loop-audit` pass and `loop-doctor --live` say it will run?
+- Is the child **gated** (`dontAsk`+allowlist) or genuinely **isolated** (container)? If
+  you're using `bypassPermissions` on the host to avoid enumerating permissions, that's the
+  exact pattern the auto-mode classifier blocks — authorize it properly or isolate it.
+- Can you stop it (kill switch) and does it have a budget?
+
+## When the playbook is needed
+
+For the full operational workflow — the risk-tier ↔ permission-mode mapping, the STATE/
+run-log/budget spine, the seven production patterns, multi-loop coordination, the
+scheduler mechanics, and the `loop-init`/`loop-audit`/`loop-doctor`/`loop-cost` tools —
+**invoke the [`loop-ops`](../skills/loop-ops/SKILL.md) skill.**
+
+## Cross-reference
+
+- `~/.claude/skills/loop-ops/SKILL.md` — full playbook + scripts.
+- `~/.claude/skills/loop-ops/references/risk-tiers.md` — the L1/L2/L3 ↔ permission-mode mapping.
+- `~/.claude/docs/AUTO-MODE-CLASSIFIER.md` — the two-gate model behind directive #2.
+- `worktree-boundaries.md` — never let a loop touch another session's `.claude/worktrees/`.
+- `iterate` / `fleet-worker` / `fleet-ops` — the inner-loop, spawn, and land layers a loop composes.
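
The "no gate, no kill switch, no budget → no loop" rule in the file above can be read as a simple pre-scheduling check. The sketch below is an illustration only, not how `loop-audit` is implemented: it assumes the config has already been parsed into a flat dict keyed by the field names in the directives table (`scope`, `verify`, `escalation`, `kill_switch`, `budget_tokens`), and the function name `scheduling_blockers` is hypothetical.

```python
from __future__ import annotations

# Field names come from the directives table; the flat-dict shape is an assumption.
REQUIRED_FIELDS = ("scope", "verify", "escalation", "kill_switch", "budget_tokens")


def scheduling_blockers(config: dict) -> list[str]:
    """Return reasons this loop must not be scheduled (empty list = none found)."""
    # A field that is absent or empty counts as missing.
    blockers = [f"missing {name}" for name in REQUIRED_FIELDS if not config.get(name)]
    # The directive says scope must be bounded, never "*".
    if config.get("scope") == "*":
        blockers.append("scope is unbounded ('*')")
    # A present budget must be a positive integer token count.
    budget = config.get("budget_tokens")
    if budget and (not isinstance(budget, int) or budget <= 0):
        blockers.append("budget_tokens must be a positive integer")
    return blockers
```

A caller would refuse to schedule whenever the returned list is non-empty. The real tools check considerably more (risk tier, permission mode, isolation), so this covers only the third non-negotiable.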

+ 36 - 8
skills/loop-ops/SKILL.md

@@ -155,10 +155,12 @@ Running several loops? Two non-negotiables (detail in
 
 ## Tools
 
-Three scripts, all following the [Skill Resource Protocol](../../docs/SKILL-RESOURCE-PROTOCOL.md)
-(stdout = data, semantic exit codes, `--help` with EXAMPLES, `--json` envelopes). They
-are the legs of a stool: **init** scaffolds, **audit** scores readiness, **cost**
-estimates spend before you commit to a cadence.
+Five scripts, all following the [Skill Resource Protocol](../../docs/SKILL-RESOURCE-PROTOCOL.md)
+(stdout = data, semantic exit codes, `--help` with EXAMPLES, `--json` envelopes): **init**
+scaffolds the loop, **audit** scores whether the config is *well-formed*, **doctor**
+preflights whether it will actually *run*, **cost** estimates spend (caching-aware), and
+**check-pricing-sync** gates pricing drift in CI. The discipline before scheduling is
+`init → fill → cost → audit → doctor --live`.
 
 ### `scripts/loop-init.sh` — scaffold a loop's state spine
 
@@ -198,10 +200,33 @@ Exit **0** = ready (no errors, score ≥ `--min`), **10** = not ready (findings
 `2` usage, `3` config not found, `4` config unparseable. `--strict` counts warnings
 toward the not-ready signal.
 
-### `scripts/loop-cost.py` — token/$ estimate by pattern × cadence × model
+### `scripts/loop-doctor.sh` — live preflight (will it actually run?)
+
+`loop-audit` proves the config is *well-formed*; `loop-doctor` proves the loop will
+*execute* — catching the "blocked at 3am" failures audit can't see. `--offline` (CI-safe):
+the budget fits a tick's estimated tokens, the permission mode is achievable (not
+interactive), an L3 bypass declares an isolation boundary. `--live` adds runtime preflight:
+the `verify`/`guard` gate's leading binary resolves on PATH, `claude`/`git` are present,
+the kill-switch sentinel's parent dir exists.
+
+```bash
+bash scripts/loop-doctor.sh --offline .loops/pr-babysitter/loop.config.yaml   # CI gate
+bash scripts/loop-doctor.sh --live .loops/ci-sweeper/loop.config.yaml          # before scheduling
+bash scripts/loop-doctor.sh --live --json .loops/dep-sweeper/loop.config.yaml | jq '.data[] | select(.state=="bad")'
+```
+
+Exit **0** = will run, **10** = a check predicts a runtime failure (gate binary missing,
+bypass on host without isolation, budget too small for a tick), `2` usage, `3` not found,
+`4` unparseable, `5` missing core dep. Run it **after** `loop-audit` and before scheduling.
+
+### `scripts/loop-cost.py` — token/$ estimate by pattern × cadence × model (caching-aware)
 
 Estimate spend **before** committing to a cadence — the cost of an outer loop is
-runs/day × tokens/run × price, and sub-agents multiply it. Pricing reads from
+runs/day × tokens/run × price, and sub-agents multiply it. It also models **prompt
+caching**: a loop re-sends the same `run.md`+system prefix every tick (the Ralph
+property), so the prefix should be cache-written once then read (~0.1×) — *but only if the
+tick interval fits the cache TTL*. A loop slower than ~1h can't cache (the entry expires
+between ticks); the estimator says so and recommends the TTL. Pricing reads from
 `assets/model-pricing.json` (date-stamped; [`claude-api-ops`](../claude-api-ops/SKILL.md)
 is the source of truth — run its `check-model-table.py` if you suspect drift).
 
@@ -240,9 +265,12 @@ python scripts/check-pricing-sync.py --offline   # exit 0 in sync, 10 drift, 3 a
    sanity-check the monthly spend against the value.
 5. **Audit it:** `bash scripts/loop-audit.sh .loops/<n>/loop.config.yaml` — fix every
    error before scheduling. Don't schedule a loop that fails its own audit.
-6. **Schedule** the L1 run with native `/loop` or `/schedule` (read-only — it just
+6. **Doctor it:** `bash scripts/loop-doctor.sh --live .loops/<n>/loop.config.yaml` — prove
+   it will actually *run* (gate binary on PATH, budget fits a tick). Audit = well-formed;
+   doctor = will-run.
+7. **Schedule** the L1 run with native `/loop` or `/schedule` (read-only — it just
    writes `STATE.md` + a report).
-7. **Read the reports.** Only after the loop's judgment is proven do you graduate it to
+8. **Read the reports.** Only after the loop's judgment is proven do you graduate it to
    **L2** (worktree + guard + `fleet-ops` landing) and re-audit at the higher tier.
 
 ## Anti-patterns (these are detected and wrong)
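
For a wrapper that gates scheduling on `loop-doctor`, the exit codes and the `--json` envelope documented in this diff could be consumed roughly as follows. This is a hedged sketch: the exit-code meanings are taken from the SKILL.md text above and the `state` key from its `jq` example, but the `check` and `detail` keys in the sample rows are assumed from the TSV column names, and `may_schedule` / `bad_checks` are hypothetical helper names.

```python
from __future__ import annotations

import json

# Exit-code meanings as documented for loop-doctor.sh in the diff above.
DOCTOR_EXIT = {
    0: "will run",
    2: "usage error",
    3: "config not found",
    4: "config unparseable",
    5: "missing core dependency",
    10: "predicted runtime failure",
}


def may_schedule(exit_code: int) -> bool:
    """Only a clean preflight (exit 0) clears a loop for scheduling."""
    return exit_code == 0


def bad_checks(envelope_json: str) -> list[dict]:
    """Python equivalent of: jq '.data[] | select(.state=="bad")'."""
    rows = json.loads(envelope_json).get("data", [])
    return [row for row in rows if row.get("state") == "bad"]
```

Treating every non-zero code as "do not schedule" is the conservative reading: codes 2 to 5 mean the preflight itself could not complete, which says nothing about whether the loop would run.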

+ 132 - 42
skills/loop-ops/scripts/loop-cost.py

@@ -2,22 +2,28 @@
 """Estimate the token/$ cost of an outer loop by pattern × cadence × model.
 
 A loop's cost is runs/day × tokens/run × price, and sub-agents multiply tokens/run.
-This computes that before you commit to a cadence. Pricing reads from
-assets/model-pricing.json (date-stamped; skills/claude-api-ops is the source of
-truth — run its check-model-table.py if you suspect drift).
+This computes that - and, crucially, models **prompt caching**: a loop re-sends the
+SAME run.md + system prefix every tick (the Ralph property), which is the textbook
+caching case. Whether caching helps depends on cadence vs cache TTL, so this picks the
+TTL and reports the cached projection alongside the naive one.
+
+Pricing reads from assets/model-pricing.json (date-stamped; skills/claude-api-ops is
+the source of truth - run its check-model-table.py if you suspect drift).
 
 Usage:   loop-cost.py --pattern P --cadence C --model M [OPTIONS]
 Input:   argv flags only (no stdin).
 Output:  stdout = the cost breakdown (plain rows, or --json envelope). Data only.
-Stderr:  the assumptions note, errors.
+Stderr:  the assumptions + caching note, errors.
 Exit:    0 ok, 2 usage, 3 pricing file missing, 4 bad cadence/model/pattern
 
-Estimates, not guarantees — reconcile against the loop's run-log.md actuals. The
-cheapest lever is cadence (halving frequency halves cost); the next is model.
+Estimates, not guarantees - reconcile against the loop's run-log.md actuals. Levers in
+order of impact: cadence (halving frequency halves cost), prompt caching (model below),
+model tier.
 
 Examples:
   loop-cost.py --pattern pr-babysitter --cadence 10m --model claude-haiku-4-5
   loop-cost.py --pattern ci-sweeper --cadence 15m --model claude-sonnet-4-6 --days 30 --json
+  loop-cost.py --pattern daily-triage --cadence 6h --model claude-opus-4-8   # too slow to cache
   loop-cost.py --list-models
 """
 from __future__ import annotations
@@ -36,11 +42,26 @@ EX_VALIDATION = 4
 
 DEFAULT_PRICING = Path(__file__).resolve().parent.parent / "assets" / "model-pricing.json"
 
+# Prompt-caching multipliers vs base input price (claude-api-ops/references/caching-and-cost.md).
+CACHE_WRITE_5M = 1.25   # write a 5-minute-TTL entry
+CACHE_WRITE_1H = 2.0    # write a 1-hour-TTL entry
+CACHE_READ = 0.1        # read any cached entry
+
+# Minimum cacheable prefix (tokens) - below this the cache_control marker is silently
+# ignored (caching-and-cost.md). A loop whose static prefix is smaller can't cache.
+MIN_PREFIX = {
+    "claude-fable-5": 512,
+    "claude-opus-4-8": 1024,
+    "claude-sonnet-4-6": 1024,
+    "claude-haiku-4-5": 4096,
+}
+DEFAULT_MIN_PREFIX = 1024
+
 
 class Term:
-    """Minimal ANSI helper (term.sh is bash-only; per TERMINAL-DESIGN.md §9 the
-    Python port is inline). Honors FORCE_COLOR / NO_COLOR / TERM_ASCII and the
-    bound stream's TTY + encoding, so piped data stays plain ASCII."""
+    """Minimal ANSI helper (term.sh is bash-only; per TERMINAL-DESIGN.md §9 the Python
+    port is inline). Honors FORCE_COLOR / NO_COLOR / TERM_ASCII and the bound stream's
+    TTY + encoding, so piped data stays plain ASCII."""
 
     _C = {"green": "\033[32m", "cyan": "\033[36m", "dim": "\033[2m", "off": "\033[0m"}
 
@@ -95,15 +116,63 @@ def runs_per_day(cadence: str, override: float | None) -> float:
     if re.fullmatch(r"\d+ \* \* \* \*", s):
         return 24.0
     print(
-        f"error: cannot derive runs/day from cadence '{cadence}'  "
+        f"error: cannot derive runs/day from cadence '{cadence}' - "
         "use Nm/Nh/Nd, `*/N * * * *`, or pass --runs-per-day",
         file=sys.stderr,
     )
     raise SystemExit(EX_VALIDATION)
 
 
+def caching_projection(in_tok, out_tok, sub, in_price, out_price, rpd, model,
+                       prefix_frac, ttl_choice):
+    """Model prompt-caching of the static run-prompt prefix across ticks.
+
+    Returns a dict: ttl, beneficial, reason, cost_per_run/day, prefix_tokens.
+    The cache stays warm only when the tick interval is <= the TTL (reads refresh it);
+    a loop slower than the 1h max TTL writes a cold entry every tick - caching can't help.
+    """
+    interval_min = 1440.0 / rpd if rpd > 0 else 1e9
+    prefix_tokens = int(round(in_tok * prefix_frac))
+    variable_in = in_tok - prefix_tokens
+    min_prefix = MIN_PREFIX.get(model, DEFAULT_MIN_PREFIX)
+
+    # Pick TTL: smallest that stays warm at this cadence.
+    if ttl_choice == "5m":
+        ttl, warm = "5m", interval_min <= 5
+    elif ttl_choice == "1h":
+        ttl, warm = "1h", interval_min <= 60
+    else:  # auto
+        if interval_min <= 5:
+            ttl, warm = "5m", True
+        elif interval_min <= 60:
+            ttl, warm = "1h", True
+        else:
+            ttl, warm = None, False
+
+    out_cost_day = out_tok / 1e6 * out_price * rpd
+
+    if prefix_tokens < min_prefix:
+        return {"ttl": ttl, "beneficial": False,
+                "reason": f"static prefix ~{prefix_tokens} tok < {model} minimum {min_prefix} tok "
+                          "- cache marker silently ignored; enlarge the run prompt/system or skip caching",
+                "prefix_tokens": prefix_tokens, "cost_per_day": None, "cost_per_run": None}
+    if not warm or ttl is None:
+        return {"ttl": ttl, "beneficial": False,
+                "reason": f"tick interval ~{interval_min:.0f} min exceeds the cache TTL "
+                          "- the entry expires between ticks, so every tick is a cold write; caching won't help",
+                "prefix_tokens": prefix_tokens, "cost_per_day": None, "cost_per_run": None}
+
+    write_mult = CACHE_WRITE_5M if ttl == "5m" else CACHE_WRITE_1H
+    # Per day, warm: ~1 cache write of the prefix + (rpd-1) reads; variable input + output full price.
+    prefix_day = prefix_tokens / 1e6 * in_price * (write_mult + max(rpd - 1, 0) * CACHE_READ)
+    variable_day = variable_in / 1e6 * in_price * rpd
+    cost_day = (prefix_day + variable_day + out_cost_day) * sub
+    return {"ttl": ttl, "beneficial": True, "reason": "",
+            "prefix_tokens": prefix_tokens, "write_mult": write_mult,
+            "cost_per_day": cost_day, "cost_per_run": cost_day / rpd if rpd else cost_day}
+
+
 def fmt_money(x: float) -> str:
-    """Human dollar string: cents below $100, 4 decimals below $1 for tiny per-run costs."""
     if x < 1:
         return f"${x:.4f}"
     return f"${x:,.2f}"
@@ -112,7 +181,7 @@ def fmt_money(x: float) -> str:
 def main(argv: list[str]) -> int:
     p = argparse.ArgumentParser(
         prog="loop-cost.py",
-        description="Estimate outer-loop cost by pattern × cadence × model.",
+        description="Estimate outer-loop cost by pattern × cadence × model, with prompt caching.",
     )
     p.add_argument("--pattern", default="custom", help="catalog pattern key (default: custom)")
     p.add_argument("--cadence", default="1h", help="10m | 1h | 6h | 1d, or a cron string (default: 1h)")
@@ -122,6 +191,11 @@ def main(argv: list[str]) -> int:
     p.add_argument("--input-tokens", type=int, default=None, help="override per-run input tokens")
     p.add_argument("--output-tokens", type=int, default=None, help="override per-run output tokens")
     p.add_argument("--subagents", type=int, default=None, help="override the sub-agent fan-out multiplier")
+    p.add_argument("--cache-prefix-frac", type=float, default=0.6,
+                   help="fraction of input that is the static, cacheable run-prompt prefix (default: 0.6)")
+    p.add_argument("--cache-ttl", choices=["auto", "5m", "1h"], default="auto",
+                   help="cache TTL to model (default: auto - pick by cadence)")
+    p.add_argument("--no-cache", action="store_true", help="report the uncached cost only")
     p.add_argument("--pricing", default=str(DEFAULT_PRICING), help="path to model-pricing.json")
     p.add_argument("--list-models", action="store_true", help="print the pricing table + as-of date, exit 0")
     p.add_argument("--json", action="store_true", help="emit a JSON envelope")
@@ -135,7 +209,6 @@ def main(argv: list[str]) -> int:
     as_of = pricing.get("_as_of", "unknown")
     pattern_defaults = pricing.get("_pattern_defaults", {})
 
-    # ── --list-models ──
     if args.list_models:
         if args.json:
             print(json.dumps({"data": models, "meta": {"as_of": as_of, "schema": "claude-mods.loop-ops.pricing/v1"}}, indent=2))
@@ -149,26 +222,27 @@ def main(argv: list[str]) -> int:
     if args.days <= 0:
         print("error: --days must be positive", file=sys.stderr)
         return EX_VALIDATION
+    if not (0.0 <= args.cache_prefix_frac <= 1.0):
+        print("error: --cache-prefix-frac must be between 0 and 1", file=sys.stderr)
+        return EX_VALIDATION
 
-    # ── model ──
     if args.model not in models:
-        print(f"error: unknown model '{args.model}'  known: {', '.join(models) or '(none)'}", file=sys.stderr)
+        print(f"error: unknown model '{args.model}' - known: {', '.join(models) or '(none)'}", file=sys.stderr)
         return EX_VALIDATION
     in_price = float(models[args.model]["input_per_mtok"])
     out_price = float(models[args.model]["output_per_mtok"])
 
-    # ── tokens/run: overrides win, else pattern defaults ──
     if args.input_tokens is not None and args.output_tokens is not None:
         in_tok, out_tok = args.input_tokens, args.output_tokens
         sub = args.subagents if args.subagents is not None else 1
-    elif args.pattern in pattern_defaults:
+    elif args.pattern in pattern_defaults and not args.pattern.startswith("_"):
         d = pattern_defaults[args.pattern]
         in_tok = args.input_tokens if args.input_tokens is not None else int(d["input"])
         out_tok = args.output_tokens if args.output_tokens is not None else int(d["output"])
         sub = args.subagents if args.subagents is not None else int(d.get("subagents", 1))
     else:
         print(
-            f"error: unknown pattern '{args.pattern}'  pass --input-tokens and "
+            f"error: unknown pattern '{args.pattern}' - pass --input-tokens and "
             f"--output-tokens, or use one of: {', '.join(k for k in pattern_defaults if not k.startswith('_'))}",
             file=sys.stderr,
         )
@@ -180,7 +254,7 @@ def main(argv: list[str]) -> int:
 
     rpd = runs_per_day(args.cadence, args.runs_per_day)
 
-    # ── cost math ──
+    # ── uncached (naive) ──
     cost_in = in_tok / 1_000_000 * in_price
     cost_out = out_tok / 1_000_000 * out_price
     cost_run = (cost_in + cost_out) * sub
@@ -188,25 +262,32 @@ def main(argv: list[str]) -> int:
     cost_day = cost_run * rpd
     cost_horizon = cost_day * args.days
 
+    # ── cached projection ──
+    cache = None
+    if not args.no_cache:
+        cache = caching_projection(in_tok, out_tok, sub, in_price, out_price, rpd,
+                                   args.model, args.cache_prefix_frac, args.cache_ttl)
+
     if args.json:
-        envelope = {
-            "data": {
-                "pattern": args.pattern,
-                "model": args.model,
-                "cadence": args.cadence,
-                "runs_per_day": round(rpd, 3),
-                "tokens_per_run": tokens_run,
-                "input_tokens": in_tok,
-                "output_tokens": out_tok,
-                "subagents": sub,
-                "cost_per_run": round(cost_run, 6),
-                "cost_per_day": round(cost_day, 4),
-                "days": args.days,
-                "cost_per_horizon": round(cost_horizon, 2),
-            },
-            "meta": {"as_of": as_of, "schema": "claude-mods.loop-ops.cost/v1"},
+        data = {
+            "pattern": args.pattern, "model": args.model, "cadence": args.cadence,
+            "runs_per_day": round(rpd, 3), "tokens_per_run": tokens_run,
+            "input_tokens": in_tok, "output_tokens": out_tok, "subagents": sub,
+            "cost_per_run": round(cost_run, 6), "cost_per_day": round(cost_day, 4),
+            "days": args.days, "cost_per_horizon": round(cost_horizon, 2),
         }
-        print(json.dumps(envelope, indent=2))
+        if cache is not None:
+            if cache["beneficial"]:
+                cd = cache["cost_per_day"]
+                data["caching"] = {
+                    "beneficial": True, "ttl": cache["ttl"], "prefix_tokens": cache["prefix_tokens"],
+                    "cost_per_day": round(cd, 4), "cost_per_horizon": round(cd * args.days, 2),
+                    "savings_pct": round((cost_day - cd) / cost_day * 100, 1) if cost_day else 0.0,
+                }
+            else:
+                data["caching"] = {"beneficial": False, "reason": cache["reason"],
+                                   "prefix_tokens": cache["prefix_tokens"]}
+        print(json.dumps({"data": data, "meta": {"as_of": as_of, "schema": "claude-mods.loop-ops.cost/v1"}}, indent=2))
         return EX_OK
 
     t = Term(sys.stderr)
@@ -216,12 +297,21 @@ def main(argv: list[str]) -> int:
     print(f"{'tokens/run:':<16}{tokens_run:,} ({in_tok:,} in + {out_tok:,} out) x {sub} subagent(s)")
     print(f"{'cost/run:':<16}{fmt_money(cost_run)}")
     print(f"{'cost/day:':<16}{fmt_money(cost_day)}")
-    print(f"{'cost/'+str(args.days)+'d:':<16}{t.c('cyan', fmt_money(cost_horizon))}")
-    print(
-        f"estimate (as of {as_of} pricing) - reconcile against run-log.md actuals; "
-        "cadence is the biggest lever",
-        file=sys.stderr,
-    )
+    print(f"{'cost/'+str(args.days)+'d:':<16}{fmt_money(cost_horizon)}  (uncached)")
+    if cache is not None:
+        if cache["beneficial"]:
+            cd, ch = cache["cost_per_day"], cache["cost_per_day"] * args.days
+            save = (cost_day - cd) / cost_day * 100 if cost_day else 0.0
+            print(f"{'cached/'+str(args.days)+'d:':<16}{t.c('cyan', fmt_money(ch))}  "
+                  f"({t.c('green', f'-{save:.0f}%')}, TTL {cache['ttl']}, prefix ~{cache['prefix_tokens']:,} tok)")
+            print(f"recommendation: cache the static run.md+system prefix at TTL {cache['ttl']} "
+                  f"-> ~-{save:.0f}%/mo. Keep run.md BYTE-IDENTICAL every tick or the cache never hits.",
+                  file=sys.stderr)
+        else:
+            print("caching: not beneficial here", file=sys.stderr)
+            print(f"  why: {cache['reason']}", file=sys.stderr)
+    print(f"estimate (as of {as_of} pricing) - reconcile against run-log.md actuals; "
+          "cadence is the biggest lever, then caching, then model tier", file=sys.stderr)
     return EX_OK
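
To make the caching arithmetic in `caching_projection` concrete, here is a minimal standalone sketch of the same model for a single agent (no sub-agent fan-out, no minimum-prefix check). The multipliers and the one-write-then-reads assumption are copied from the diff; the function names `pick_ttl` and `daily_cost` and any numbers you feed it are illustrative, not taken from `model-pricing.json`.

```python
from __future__ import annotations

# Multipliers copied from the diff above; everything else is illustrative.
CACHE_WRITE_5M = 1.25
CACHE_WRITE_1H = 2.0
CACHE_READ = 0.1


def pick_ttl(interval_min: float) -> str | None:
    """Smallest TTL that stays warm at this tick interval, or None if none does."""
    if interval_min <= 5:
        return "5m"
    if interval_min <= 60:
        return "1h"
    return None


def daily_cost(in_tok, out_tok, in_price, out_price, runs_per_day, prefix_frac=0.6):
    """Return (uncached, cached_or_None) dollars per day; prices are per million tokens."""
    uncached = (in_tok * in_price + out_tok * out_price) / 1e6 * runs_per_day
    ttl = pick_ttl(1440.0 / runs_per_day)
    if ttl is None:
        # Slower than the 1h TTL: every tick is a cold write, so no cached figure.
        return uncached, None
    write_mult = CACHE_WRITE_5M if ttl == "5m" else CACHE_WRITE_1H
    prefix = round(in_tok * prefix_frac)
    variable = in_tok - prefix
    # One cache write per day, then (runs - 1) cache reads, as the diff models it.
    prefix_day = prefix / 1e6 * in_price * (write_mult + (runs_per_day - 1) * CACHE_READ)
    variable_day = variable / 1e6 * in_price * runs_per_day
    out_day = out_tok / 1e6 * out_price * runs_per_day
    return uncached, prefix_day + variable_day + out_day
```

With made-up inputs of 10,000 input and 1,000 output tokens per run at $3 / $15 per million tokens, a 10-minute cadence (144 runs/day) works out to $6.48/day uncached versus about $4.18/day cached at the 1h TTL, roughly 35% less. The same loop at a 6h cadence (4 runs/day) returns `None` for the cached figure, which is the TTL-vs-cadence rule the commit message describes.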
 
 

+ 227 - 0
skills/loop-ops/scripts/loop-doctor.sh

@@ -0,0 +1,227 @@
+#!/usr/bin/env bash
+# Preflight a loop config - will this loop actually RUN, or die at 3am?
+#
+# loop-audit checks the config is well-formed; loop-doctor checks the loop will
+# execute: the gate command's binary resolves, claude/git are on PATH, the budget
+# can fit a tick, and the permission mode is achievable from where it launches.
+# Modeled on fleet-worker/scripts/fleet-doctor.sh.
+#
+# Usage:   loop-doctor.sh [--offline|--live] [--json] [-q] <loop.config.yaml>
+# Input:   argv flags + a config path (no stdin).
+# Output:  stdout = check rows (TSV: state<TAB>check<TAB>detail), or a --json envelope.
+# Stderr:  the preflight panel, notices, errors.
+# Exit:    0 ok, 2 usage, 3 config not found, 4 unparseable, 5 missing core dep,
+#          10 a check predicts a runtime failure (a gate binary missing, bypass on
+#          host without isolation, budget too small for a tick)
+#
+#   --offline (default): no PATH/exec - config-shape + budget-vs-cost + permission/
+#                        isolation coherence. Safe for PR CI.
+#   --live:              adds runtime preflight - claude/git on PATH, the verify/guard
+#                        leading binary resolvable, the kill-switch path's parent exists.
+#
+# Examples:
+#   loop-doctor.sh --offline .loops/pr-babysitter/loop.config.yaml
+#   loop-doctor.sh --live .loops/ci-sweeper/loop.config.yaml
+#   loop-doctor.sh --live --json .loops/dep-sweeper/loop.config.yaml | jq '.data[] | select(.state=="bad")'
+set -uo pipefail
+
+readonly EX_OK=0 EX_USAGE=2 EX_NOTFOUND=3 EX_UNPARSEABLE=4 EX_MISSING_DEP=5 EX_FINDINGS=10
+
+__lib="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../_lib" 2>/dev/null && pwd || true)"
+if [ -n "${__lib:-}" ] && [ -f "$__lib/term.sh" ]; then . "$__lib/term.sh"; term_init 2
+else
+  term_panel_open() { :; }; term_panel_close() { :; }; term_panel_vert() { :; }
+  term_status_row() { shift; printf '  - %s %s\n' "$1" "${2:-}"; }
+  term_color() { shift; printf '%s' "$*"; }; TERM_DOT="|"
+fi
+
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PRICING="$HERE/../assets/model-pricing.json"
+
+CFG=""; MODE="offline"; JSON=0; QUIET=0
+
+usage() {
+  cat <<'EOF'
+loop-doctor.sh - preflight a loop config (will it actually run?).
+
+Usage:
+  loop-doctor.sh [--offline|--live] [--json] [-q] <loop.config.yaml>
+
+Options:
+  --offline      config-shape + budget-vs-cost + permission coherence (default; no PATH/exec).
+  --live         adds runtime preflight: claude/git on PATH, verify/guard binary resolvable.
+  --json         emit a JSON envelope.
+  -q, --quiet    suppress the stderr panel.
+  -h, --help     show this help and exit 0.
+
+Exit codes:
+  0 ok   2 usage   3 not found   4 unparseable   5 missing dep   10 predicted runtime failure
+
+Examples:
+  loop-doctor.sh --offline .loops/pr-babysitter/loop.config.yaml
+  loop-doctor.sh --live .loops/ci-sweeper/loop.config.yaml
+  loop-doctor.sh --live --json .loops/dep-sweeper/loop.config.yaml | jq '.data[] | select(.state=="bad")'
+EOF
+}
+die_usage() { printf 'error: %s\n' "$1" >&2; echo >&2; usage >&2; exit "$EX_USAGE"; }
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+    --offline) MODE="offline"; shift ;;
+    --live)    MODE="live"; shift ;;
+    --json)    JSON=1; shift ;;
+    -q|--quiet) QUIET=1; shift ;;
+    -h|--help) usage; exit "$EX_OK" ;;
+    -*)        die_usage "unknown flag: $1" ;;
+    *)         [[ -z "$CFG" ]] || die_usage "unexpected extra argument: $1"; CFG="$1"; shift ;;
+  esac
+done
+
+command -v awk  >/dev/null 2>&1 || { echo "loop-doctor: awk required" >&2; exit "$EX_MISSING_DEP"; }
+command -v grep >/dev/null 2>&1 || { echo "loop-doctor: grep required" >&2; exit "$EX_MISSING_DEP"; }
+
+[[ -n "$CFG" ]] || die_usage "a loop.config.yaml path is required"
+[[ -f "$CFG" ]] || { printf 'error: config not found: %s\n' "$CFG" >&2; exit "$EX_NOTFOUND"; }
+grep -Eq '^[a-z_]+:' "$CFG" || { printf 'error: no parseable keys in %s\n' "$CFG" >&2; exit "$EX_UNPARSEABLE"; }
+
+# Pick a working python for the budget-vs-cost check (skipped gracefully if none).
+PY=""
+for c in python python3 py; do
+  if command -v "$c" >/dev/null 2>&1 && "$c" -c "" >/dev/null 2>&1; then PY="$c"; break; fi
+done
+
+# ── flat-YAML readers (no yq), same contract as loop-audit.sh ────────────────
+cfg_scalar() {
+  awk -v k="$1" -v q="'" '
+    $0 ~ "^"k":" { sub("^"k":[ \t]*",""); sub(/[ \t]*#.*$/,""); gsub(/^[ \t]+|[ \t]+$/,"");
+      gsub(/^"|"$/,""); gsub("^"q"|"q"$",""); print; exit }' "$CFG"
+}
+cfg_list_items() {
+  awk -v k="$1" -v q="'" '
+    $0 ~ "^"k":" { inlist=1; next }
+    inlist==1 { if ($0 ~ /^[ \t]*-[ \t]+/) { line=$0; sub(/^[ \t]*-[ \t]+/,"",line); sub(/[ \t]*#.*$/,"",line);
+        gsub(/^[ \t]+|[ \t]+$/,"",line); gsub(/^"|"$/,"",line); gsub("^"q"|"q"$","",line); if (line!="") print line }
+      else if ($0 ~ /^[^ \t#]/) { inlist=0 } }' "$CFG"
+}
+
+TIER="$(cfg_scalar tier)"; PMODE="$(cfg_scalar permission_mode)"; PATTERN="$(cfg_scalar pattern)"
+VERIFY="$(cfg_scalar verify)"; GUARD="$(cfg_scalar guard)"; BUDGET="$(cfg_scalar budget_tokens)"
+KILL="$(cfg_scalar kill_switch)"; ESCAL="$(cfg_scalar escalation)"
+is_l2plus=0; [[ "$TIER" == "L2" || "$TIER" == "L3" ]] && is_l2plus=1
+
+# ── findings ─────────────────────────────────────────────────────────────
+ROWS=()       # "state\tcheck\tdetail"
+FINDING=0
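+# Always returns 0 so callers can use it in '&& ... ||' chains without a non-"bad" row falling through.
+row() { ROWS+=("$1"$'\t'"$2"$'\t'"$3"); if [[ "$1" == "bad" ]]; then FINDING=1; fi; return 0; }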
+
+# leading binary of a command string: the first whitespace token containing no '='
+# (so leading VAR=value assignments are skipped)
+lead_bin() { awk '{ for(i=1;i<=NF;i++){ if($i !~ /=/){print $i; exit} } }' <<<"$1"; }
+
+# ── OFFLINE checks ───────────────────────────────────────────────────────
+# Permission mode achievability.
+case "$PMODE" in
+  default) row bad "permission_mode" "default is interactive - a headless 'claude -p' tick can't answer prompts; use dontAsk/auto/bypassPermissions" ;;
+  "")      row bad "permission_mode" "missing" ;;
+  *)       row ok  "permission_mode" "$PMODE" ;;
+esac
+# L3 bypass needs an isolation boundary.
+if [[ "$TIER" == "L3" && "$PMODE" == "bypassPermissions" ]]; then
+  if printf '%s %s' "$ESCAL" "$(cfg_list_items scope | tr '\n' ' ')" | grep -Eqi 'container|isolat|sandbox|devcontainer'; then
+    row ok "isolation" "L3 bypass declares an isolation boundary"
+  else
+    row bad "isolation" "L3 + bypassPermissions with no container/sandbox note - only safe in an isolated VM/container"
+  fi
+fi
+# Budget vs estimated tokens/run.
+if [[ -n "$BUDGET" && "$BUDGET" =~ ^[0-9]+$ && -n "$PY" && -n "$PATTERN" && -f "$PRICING" ]]; then
+  TPR="$(PR="$PRICING" PAT="$PATTERN" "$PY" -c "import json,os
+try:
+ d=json.load(open(os.environ['PR']))['_pattern_defaults'].get(os.environ['PAT'])
+ print((int(d['input'])+int(d['output']))*int(d.get('subagents',1)) if d else '')
+except Exception: print('')" 2>/dev/null)"
+  if [[ -n "$TPR" && "$TPR" =~ ^[0-9]+$ ]]; then
+    if [[ "$BUDGET" -lt "$TPR" ]]; then
+      row bad "budget" "budget_tokens $BUDGET < ~$TPR est. tokens/run for $PATTERN - a tick can't complete"
+    else
+      row ok "budget" "budget_tokens $BUDGET >= ~$TPR est. tokens/run"
+    fi
+  fi
+fi
+
+# ── LIVE checks ──────────────────────────────────────────────────────────
+if [[ "$MODE" == "live" ]]; then
+  if command -v claude >/dev/null 2>&1; then row ok "claude" "on PATH"; else row warn "claude" "not on PATH - the scheduler that runs 'claude -p' must have it"; fi
+  if command -v git >/dev/null 2>&1; then
+    row ok "git" "on PATH"
+    if [[ "$is_l2plus" -eq 1 ]] && ! git worktree list >/dev/null 2>&1; then
+      row warn "worktree" "'git worktree' unavailable here - L2+ isolates changes in a worktree"
+    fi
+  elif [[ "$is_l2plus" -eq 1 ]]; then
+    row bad "git" "git not on PATH - L2+ needs it for worktree isolation + landing"
+  else
+    row warn "git" "git not on PATH"
+  fi
+  # verify / guard leading binary resolvable
+  for pair in "verify:$VERIFY" "guard:$GUARD"; do
+    label="${pair%%:*}"; cmd="${pair#*:}"
+    [[ -z "$cmd" ]] && continue
+    case "$cmd" in *"<"*">"*) continue ;; esac   # unfilled placeholder - audit's job
+    bin="$(lead_bin "$cmd")"
+    [[ -z "$bin" ]] && continue
+    if [[ "$bin" == */* ]]; then
+      if [[ -x "$bin" ]]; then row ok "$label" "$bin executable"; else row bad "$label" "$bin not executable - the gate can't run"; fi
+    elif command -v "$bin" >/dev/null 2>&1; then
+      row ok "$label" "$bin resolves"
+    else
+      row bad "$label" "'$bin' not on PATH - the gate command can't run at tick time"
+    fi
+  done
+  # kill-switch path parent exists (only when it clearly names a path)
+  ks_path="$(grep -oE '[^ "'"'"']*/[^ "'"'"']*' <<<"$KILL" | head -1)"
+  if [[ -n "$ks_path" ]]; then
+    parent="$(dirname "$ks_path")"
+    if [[ -d "$parent" || "$parent" == "." ]]; then row ok "kill_switch" "sentinel path parent exists ($parent)"
+    else row warn "kill_switch" "sentinel parent dir missing ($parent) - create it so the switch works"; fi
+  fi
+fi
+
+# ── output ───────────────────────────────────────────────────────────────
+n_bad=0; n_warn=0; n_ok=0
+for r in "${ROWS[@]:-}"; do
+  case "${r%%$'\t'*}" in bad) n_bad=$((n_bad+1));; warn) n_warn=$((n_warn+1));; ok) n_ok=$((n_ok+1));; esac
+done
+
+if [[ "$JSON" -eq 1 ]]; then
+  printf '{\n  "data": [\n'
+  if [[ ${#ROWS[@]} -gt 0 ]]; then
+   for i in "${!ROWS[@]}"; do
+    IFS=$'\t' read -r st ck dt <<<"${ROWS[$i]}"
+    dt="${dt//\\/\\\\}"; dt="${dt//\"/\\\"}"
+    sep=","; [[ "$i" -eq $(( ${#ROWS[@]} - 1 )) ]] && sep=""
+    printf '    {"state": "%s", "check": "%s", "detail": "%s"}%s\n' "$st" "$ck" "$dt" "$sep"
+   done
+  fi
+  printf '  ],\n  "meta": {"mode": "%s", "ok": %d, "warn": %d, "bad": %d, "will_run": %s, "tier": "%s", "schema": "claude-mods.loop-ops.doctor/v1"}\n}\n' \
+    "$MODE" "$n_ok" "$n_warn" "$n_bad" "$([[ "$FINDING" -eq 0 ]] && echo true || echo false)" "${TIER:-unknown}"
+else
+  if [[ ${#ROWS[@]} -gt 0 ]]; then
+    for r in "${ROWS[@]}"; do
+      IFS=$'\t' read -r st ck dt <<<"$r"
+      printf '%-5s %-14s %s\n' "$st" "$ck" "$dt"
+    done
+  fi
+  if [[ "$QUIET" -eq 0 ]]; then
+    verdict="$([[ "$FINDING" -eq 0 ]] && echo "WILL RUN" || echo "WILL FAIL")"
+    vstate="$([[ "$FINDING" -eq 0 ]] && echo ok || echo bad)"
+    {
+      term_panel_open loop "loop ${TERM_DOT} doctor ($MODE)" "$(basename "$(dirname "$CFG")")"
+      term_panel_vert
+      term_status_row "$vstate" "$verdict" "$n_bad blocking ${TERM_DOT} $n_warn advisory ${TERM_DOT} $n_ok ok"
+      [[ "$MODE" == "offline" ]] && term_status_row skip "run --live before scheduling" "checks gate binaries + PATH"
+      term_panel_vert
+      term_panel_close "audit = well-formed ${TERM_DOT} doctor = will-run" ""
+    } >&2
+  fi
+fi
+
+[[ "$FINDING" -eq 0 ]] && exit "$EX_OK" || exit "$EX_FINDINGS"
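The header documents stdout as TSV rows (`state<TAB>check<TAB>detail`). As a rough sketch of how a wrapper might pull out only the blocking checks, assuming that three-column format: the helper name `blocking_checks` and the sample rows are hypothetical, not part of the commit.

```shell
# Hypothetical helper: print the check names of blocking ("bad") rows
# read from loop-doctor-style TSV (state<TAB>check<TAB>detail) on stdin.
blocking_checks() {
  awk -F '\t' '$1 == "bad" { print $2 }'
}

# Illustrative rows standing in for real loop-doctor output.
sample_rows() {
  printf 'ok\tpermission_mode\tdontAsk\n'
  printf 'bad\tbudget\tbudget too small for a tick\n'
  printf 'warn\tclaude\tnot on PATH\n'
}

sample_rows | blocking_checks
```

In real use the input would come from `loop-doctor.sh ... 2>/dev/null`, with exit code 10 signalling that at least one such row exists.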

+ 32 - 1
skills/loop-ops/tests/run.sh

@@ -17,6 +17,7 @@ INIT="$SCRIPTS/loop-init.sh"
 AUDIT="$SCRIPTS/loop-audit.sh"
 COST="$SCRIPTS/loop-cost.py"
 SYNC="$SCRIPTS/check-pricing-sync.py"
+DOCTOR="$SCRIPTS/loop-doctor.sh"
 
 # Pick a python that actually executes — skips the Windows Store python3 stub.
 PYTHON=""
@@ -187,6 +188,36 @@ expect_exit "cron cadence -> 0" 0 $?
 out="$("$PYTHON" "$COST" --pattern custom --cadence weird --runs-per-day 5 --model claude-haiku-4-5 2>/dev/null)"; rc=$?
 expect_exit "runs-per-day override -> 0" 0 "$rc"
 expect_has  "uses the override" "5 runs/day" "$out"
+# caching: a fast loop (10m -> 1h TTL) projects a cached saving
+out="$("$PYTHON" "$COST" --pattern ci-sweeper --cadence 10m --model claude-sonnet-4-6 2>&1)"
+expect_has "fast loop shows a cached projection" "cached/" "$out"
+# caching: a slow loop (6h > 1h TTL) is not cache-beneficial
+out="$("$PYTHON" "$COST" --pattern daily-triage --cadence 6h --model claude-opus-4-8 2>&1)"
+expect_has "slow loop: caching not beneficial" "not beneficial" "$out"
+# --no-cache suppresses the cached projection
+out="$("$PYTHON" "$COST" --pattern ci-sweeper --cadence 10m --model claude-sonnet-4-6 --no-cache 2>&1)"
+case "$out" in *"cached/"*) no "--no-cache still showed caching";; *) ok "--no-cache suppresses caching";; esac
+# json caching block present for a cacheable loop
+out="$("$PYTHON" "$COST" --pattern ci-sweeper --cadence 5m --model claude-sonnet-4-6 --json 2>/dev/null)"
+expect_has "cost json carries caching block" '"caching"' "$out"
+
+# ── loop-doctor: preflight (offline budget, live binary), json ─────────────
+echo "-- loop-doctor --"
+bash "$DOCTOR" --help >/dev/null 2>&1; expect_exit "loop-doctor --help -> 0" 0 $?
+bash "$DOCTOR" --offline "$SB/l1.yaml" >/dev/null 2>&1; expect_exit "doctor offline healthy L1 -> 0" 0 $?
+bash "$DOCTOR" --live "$SB/l1.yaml" >/dev/null 2>&1; expect_exit "doctor live healthy L1 -> 0" 0 $?
+# budget too small for the pattern -> bad -> 10
+sed 's/^budget_tokens: 300000/budget_tokens: 100/' "$SB/l2.yaml" > "$SB/l2-poor.yaml"
+out="$(bash "$DOCTOR" --offline "$SB/l2-poor.yaml" 2>/dev/null)"; rc=$?
+expect_exit "doctor budget-too-small -> 10" 10 "$rc"
+expect_has  "doctor names the budget gap" "tokens/run" "$out"
+# live: a verify gate whose binary is missing -> bad -> 10
+sed 's/^verify: "npm test"/verify: "totally-missing-binary-zzz run"/' "$SB/l2.yaml" > "$SB/l2-nobin.yaml"
+bash "$DOCTOR" --live "$SB/l2-nobin.yaml" >/dev/null 2>&1; expect_exit "doctor missing gate binary -> 10" 10 $?
+# missing config -> 3, json schema
+bash "$DOCTOR" --offline "$SB/no-such.yaml" >/dev/null 2>&1; expect_exit "doctor missing config -> 3" 3 $?
+out="$(bash "$DOCTOR" --offline --json "$SB/l1.yaml" 2>/dev/null)"
+expect_has "doctor json schema" "claude-mods.loop-ops.doctor/v1" "$out"
 
 # ── loop-cost: validation errors ───────────────────────────────────────────
 "$PYTHON" "$COST" --pattern pr-babysitter --cadence 10m --model claude-nope >/dev/null 2>&1; expect_exit "unknown model -> 4" 4 $?
@@ -208,7 +239,7 @@ expect_has "pricing-sync json in_sync" '"in_sync": true' "$out"
 
 # ── terminal design system ─────────────────────────────────────────────────
 echo "-- terminal design system --"
-for s in "$INIT" "$AUDIT"; do
+for s in "$INIT" "$AUDIT" "$DOCTOR"; do
   b="$(basename "$s")"
   grep -q '_lib/term.sh' "$s" && ok "$b sources _lib/term.sh" || no "$b does not source _lib/term.sh"
 done
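The missing-gate-binary test above depends on the doctor picking the leading binary out of a `verify`/`guard` command string. A minimal sketch of that rule, mirroring the script's `lead_bin` but using a pipe instead of a here-string (an assumption made so it also runs under plain POSIX `sh`); the sample commands are illustrative.

```shell
# Sketch of the leading-binary rule: print the first whitespace-separated
# token that contains no '=', so leading VAR=value assignments are skipped.
lead_bin() {
  printf '%s\n' "$1" | awk '{ for (i = 1; i <= NF; i++) if ($i !~ /=/) { print $i; exit } }'
}

lead_bin 'npm test'
lead_bin 'CI=1 pytest -q'
lead_bin './scripts/check.sh --fast'
```

Because any token containing `=` is skipped, a command whose binary path itself contains `=` would be misread; that edge case is left unhandled here, as in the original.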