# Edit-as-Code — EDL-driven editing The pattern behind Anthropic's Fable launch video (edited entirely through Claude Code — transcription → shot-selection JSON → ffmpeg → LUTs, no NLE): **the edit is files, not timeline state.** Every stage's output is a reviewable, rerunnable, diffable artifact. ## The pipeline ``` raw takes (+ script if scripted) │ ▼ 1 TRANSCRIBE word-level JSON per take → work/transcripts/*.json ▼ 2 SELECT reason over transcripts → work/final-edit.json (EDL) ▼ 3 CUT cut-from-edl.py --execute → work/edl-cuts/ + final.mp4 ▼ 4 VERIFY re-transcribe the output → no clipped words, no filler ▼ 5 GRADE gen-luts.py + HUMAN picks → graded.mp4 ▼ 6 PACKAGE loudnorm pass-2, faststart, gates → deliverable ``` Stages 5–6 are [color-grading.md](color-grading.md) and [audio.md](audio.md)/[quality-metrics.md](quality-metrics.md); transcription is [stt-whisper.md](stt-whisper.md). This file owns stages 2–4. ## The EDL is the deliverable artifact Schema: [../assets/edl-schema.json](../assets/edl-schema.json). The load-bearing field is `selection_rationale` — *why* each take won, written down: ```json { "scene": 1, "title": "Part 1: Intro", "candidate_takes": ["C001", "C002", "C003", "C017 (re-shoot)"], "selection_rationale": "C017 disqualified - 5.8s dead pause mid-sentence. C003 is the cleanest complete take: zero ums, clean ending.", "clips": [{ "file": "takes/A004C003.mp4", "start": 1.89, "end": 60.81, "first_words": "Hey everyone, it's..." }] } ``` A reviewer reads the rationale instead of scrubbing footage. Git diffs of the EDL *are* the edit history. Re-running `cut-from-edl.py` regenerates the output identically. ## Shot-selection heuristics (multi-take footage) The agent reasons over **transcripts, not frames** — it cannot watch video. Per scene, read every candidate take's transcript and apply: - **Fewest filler words** ("um", "uh", restarts) wins, all else equal. - **Prefer later takes** — speakers warm up; the last full take is usually best. - **Disqualify** takes with dead pauses > ~2 s mid-sentence or that never complete the scripted line. - **Trim warm-up openers** ("Hey [name]…" used to start a sentence warm): cut at the silent gap *after* the warm-up, never mid-word. - Record disqualifications in the rationale too — the search space is part of the review. For many scenes, fan out one agent per scene (each reads only its candidates) and have a verifier pass check the assembled EDL — this maps directly onto the Workflow tool's pipeline+verify pattern. ## The two verification rules **1. Every cut boundary must land in silence.** Words are clipped by cuts that "look right" numerically. Mechanically check each in/out against measured silence: ```bash python skills/ffmpeg-ops/scripts/detect-segments.py --silence --json take.mp4 \ | jq --argjson t 60.81 '.data.silences[] | select(.start <= $t and .end >= $t)' # empty result = the proposed cut at 60.81 is NOT in silence — move it ``` **2. Re-transcribe the output.** After `cut-from-edl.py --execute`, run the final video back through transcription and assert: every scene's `first_words` appears, no filler words survived, no sentence is truncated at a boundary. This catches off-by-keyframe and timestamp-unit errors that no amount of EDL review will. ## Cut mode choice `cut-from-edl.py` re-encodes by default (frame-accurate, normalizes mixed sources, concat always safe). Use `--copy` only when the EDL was authored against measured keyframes (`probe-media.py --keyframes-near` for every in-point) — e.g. when the source is an all-intra mezzanine ([encoding.md](encoding.md)). ## Human gates Taste calls stay human: the grade pick ([color-grading.md](color-grading.md)), final timing, sound design. The agent's job ends at presenting options with evidence — never auto-select past these gates.