# Visualization: audio-reactive video, audiograms, spectrograms

Turning sound into pixels: podcast clips for social, waveform "audiograms", debugging audio by looking at it.

## Waveform video (the podcast audiogram)

```bash
# animated waveform over a brand background + episode title:
ffmpeg -i episode.mp3 -loop 1 -i bg_1080x1920.png -filter_complex \
  "[0:a]showwaves=s=1080x300:mode=cline:colors=white:rate=30[w];
   [1:v][w]overlay=0:1200:shortest=1,drawtext=text='EP 42 — Title':fontsize=56:fontcolor=white:x=(w-text_w)/2:y=320" \
  -c:v libx264 -crf 21 -preset fast -pix_fmt yuv420p -c:a aac -b:a 128k -shortest audiogram.mp4
```

`shortest=1` on the overlay and `-shortest` on the output both end the encode when the audio ends; without them the looped image would run forever. `showwaves` modes: `cline` (centered vertical lines, which read as filled: the podcast look), `line`, `p2p`, `point`.

## Spectrum styles

```bash
# each line below is a -filter_complex fragment; finish the command with e.g.
#   ffmpeg -i in.mp3 -filter_complex "<fragment>" -map "[v]" -map 0:a -c:v libx264 -pix_fmt yuv420p -c:a aac out.mp4

# frequency bars (the "visualizer" look):
"[0:a]showfreqs=s=1280x420:mode=bar:fscale=log[v]"

# scrolling spectrogram (also the debugging view, see below):
"[0:a]showspectrum=s=1280x720:mode=combined:color=intensity:scale=log:slide=scroll[v]"

# musical/CQT spectrum (notes align to a pitch axis, lovely for music):
"[0:a]showcqt=s=1280x720[v]"

# stereo phase scope (for a plain volume meter use showvolume instead):
"[0:a]avectorscope=s=720x720:zoom=1.5[v]"
```

All consume `[0:a]` and produce a video stream, so you can overlay/hstack them like any other video ([filtergraph.md](filtergraph.md)).

## Static waveform / spectrogram images

```bash
# waveform PNG (one image of the whole file: episode art, quick inspection):
ffmpeg -i in.mp3 -filter_complex "showwavespic=s=1920x480:colors=#3aa3ff" -frames:v 1 wave.png

# spectrogram PNG, the audio-debugging x-ray:
ffmpeg -i in.wav -lavfi "showspectrumpic=s=1920x1080:scale=log" -frames:v 1 spec.png
```

Reading the spectrogram: a hard ceiling at ~16 kHz = the audio very likely passed through a low-bitrate lossy encode (128k MP3 is the classic case) regardless of its current extension; mains hum = a solid horizontal line at 50/60 Hz, often with fainter harmonics above it (`highpass` removes the fundamental; harmonics above the cutoff need a notch such as `bandreject`); clicks = vertical needles.
Faster than ears for "is this 'lossless' file actually lossless?".

## Audio-reactive overlays (beyond fixed shapes)

ffmpeg-only reactivity is limited to the built-in scopes. For brand-grade audio-reactive motion (pulsing logos, beat-synced glow), render with a composition tool (hyperframes' audio-reactive bindings or Remotion's `useAudioData`) and use ffmpeg for the I/O around it: extract the audio (`-vn`), supply stems, encode/package the rendered result ([encoding.md](encoding.md)).

## Comparison grids (encode A/B, model-output review)

```bash
# 2x2 labelled grid of four variants:
ffmpeg -i a.mp4 -i b.mp4 -i c.mp4 -i d.mp4 -filter_complex \
  "[0:v]drawtext=text='crf20':fontsize=36:fontcolor=white:box=1:boxcolor=black@0.5:x=12:y=12[a];
   [1:v]drawtext=text='crf26':fontsize=36:fontcolor=white:box=1:boxcolor=black@0.5:x=12:y=12[b];
   [2:v]drawtext=text='nvenc':fontsize=36:fontcolor=white:box=1:boxcolor=black@0.5:x=12:y=12[c];
   [3:v]drawtext=text='av1':fontsize=36:fontcolor=white:box=1:boxcolor=black@0.5:x=12:y=12[d];
   [a][b][c][d]xstack=inputs=4:layout=0_0|w0_0|0_h0|w0_h0" -an grid.mp4
```

Inputs must share dimensions (scale first if not). The 2-input case (`hstack` + difference blend) lives in [quality-metrics.md](quality-metrics.md).
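
The "scale first" advice for mismatched inputs can be sketched as a small helper that builds the filtergraph string, putting a `scale` in front of each `drawtext`. This is a minimal sketch rather than part of the recipe above: `build_grid_filter`, the 960x540 cell size, and the `v0`..`v3` pad labels are illustrative names. It assumes all four inputs share an aspect ratio (a fixed `scale=W:H` stretches anything that does not) and that labels contain no characters that `drawtext` needs escaped, such as colons or quotes.

```bash
#!/bin/sh
# Sketch: build a 2x2 xstack filtergraph that scales every input to one cell
# size before labelling it. Function name, cell size and pad labels are
# illustrative assumptions, not ffmpeg conventions.

build_grid_filter() {
  # usage: build_grid_filter CELL_W CELL_H LABEL0 LABEL1 LABEL2 LABEL3
  if [ "$#" -ne 6 ]; then
    echo "need cell width, cell height and exactly four labels" >&2
    return 1
  fi
  cell_w=$1
  cell_h=$2
  shift 2
  graph=""
  stack=""
  i=0
  for label in "$@"; do
    # one chain per input: scale -> square pixels -> burned-in label
    graph="${graph}[${i}:v]scale=${cell_w}:${cell_h},setsar=1,drawtext=text='${label}':fontsize=36:fontcolor=white:box=1:boxcolor=black@0.5:x=12:y=12[v${i}];"
    stack="${stack}[v${i}]"
    i=$((i + 1))
  done
  # same 2x2 layout as the hand-written graph: top row, then bottom row
  printf '%s%sxstack=inputs=4:layout=0_0|w0_0|0_h0|w0_h0\n' "$graph" "$stack"
}

# Only invoke ffmpeg when the script is given four input files.
if [ "$#" -eq 4 ]; then
  ffmpeg -i "$1" -i "$2" -i "$3" -i "$4" \
    -filter_complex "$(build_grid_filter 960 540 crf20 crf26 nvenc av1)" \
    -an grid.mp4
fi
```

Run it with four files as arguments (for example `sh grid.sh a.mp4 b.mp4 c.mp4 d.mp4`, where `grid.sh` is a hypothetical filename); four 960x540 cells give a 1920x1080 grid. Generating the string also keeps the four `drawtext` chains from drifting apart when you tweak the styling.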