Edit-as-Code — EDL-driven editing

The pattern behind Anthropic's Fable launch video (edited entirely through Claude Code — transcription → shot-selection JSON → ffmpeg → LUTs, no NLE): the edit is files, not timeline state. Every stage's output is a reviewable, rerunnable, diffable artifact.

The pipeline

raw takes (+ script if scripted)
  │
  ▼ 1 TRANSCRIBE   word-level JSON per take          → work/transcripts/*.json
  ▼ 2 SELECT       reason over transcripts            → work/final-edit.json (EDL)
  ▼ 3 CUT          cut-from-edl.py --execute          → work/edl-cuts/ + final.mp4
  ▼ 4 VERIFY       re-transcribe the output           → no clipped words, no filler
  ▼ 5 GRADE        gen-luts.py + HUMAN picks          → graded.mp4
  ▼ 6 PACKAGE      loudnorm pass-2, faststart, gates  → deliverable

Stages 5–6 are color-grading.md and audio.md/quality-metrics.md; transcription is stt-whisper.md. This file owns stages 2–4.

The EDL is the deliverable artifact

Schema: ../assets/edl-schema.json. The load-bearing field is selection_rationale — why each take won, written down:

{
  "scene": 1,
  "title": "Part 1: Intro",
  "candidate_takes": ["C001", "C002", "C003", "C017 (re-shoot)"],
  "selection_rationale": "C017 disqualified - 5.8s dead pause mid-sentence. C003 is the cleanest complete take: zero ums, clean ending.",
  "clips": [{ "file": "takes/A004C003.mp4", "start": 1.89, "end": 60.81,
              "first_words": "Hey everyone, it's..." }]
}

A reviewer reads the rationale instead of scrubbing footage. Git diffs of the EDL are the edit history. Re-running cut-from-edl.py regenerates the output identically.

Shot-selection heuristics (multi-take footage)

The agent reasons over transcripts, not frames — it cannot watch video. Per scene, read every candidate take's transcript and apply:

Fewest filler words ("um", "uh", restarts) wins, all else equal.
Prefer later takes — speakers warm up; the last full take is usually best.
Disqualify takes with dead pauses > ~2 s mid-sentence or that never complete the scripted line.
Trim warm-up openers ("Hey [name]…" used to start a sentence warm): cut at the silent gap after the warm-up, never mid-word.
Record disqualifications in the rationale too — the search space is part of the review.

For many scenes, fan out one agent per scene (each reads only its candidates) and have a verifier pass check the assembled EDL — this maps directly onto the Workflow tool's pipeline+verify pattern.

The two verification rules

1. Every cut boundary must land in silence. Words are clipped by cuts that "look right" numerically. Mechanically check each in/out against measured silence:

python skills/ffmpeg-ops/scripts/detect-segments.py --silence --json take.mp4 \
  | jq --argjson t 60.81 '.data.silences[] | select(.start <= $t and .end >= $t)'
# empty result = the proposed cut at 60.81 is NOT in silence — move it

2. Re-transcribe the output. After cut-from-edl.py --execute, run the final video back through transcription and assert: every scene's first_words appears, no filler words survived, no sentence is truncated at a boundary. This catches off-by-keyframe and timestamp-unit errors that no amount of EDL review will.

Cut mode choice

cut-from-edl.py re-encodes by default (frame-accurate, normalizes mixed sources, concat always safe). Use --copy only when the EDL was authored against measured keyframes (probe-media.py --keyframes-near for every in-point) — e.g. when the source is an all-intra mezzanine (encoding.md).

Human gates

Taste calls stay human: the grade pick (color-grading.md), final timing, sound design. The agent's job ends at presenting options with evidence — never auto-select past these gates.

edit-as-code.md 3.9 KB لینک دائمی تاريخچه خام