Visualization — audio-reactive video, audiograms, spectrograms

Turning sound into pixels: podcast clips for social, waveform "audiograms", debugging audio by looking at it.

Waveform video (the podcast audiogram)

# scrolling waveform over a brand background + episode title:
ffmpeg -i episode.mp3 -loop 1 -i bg_1080x1920.png -filter_complex \
  "[0:a]showwaves=s=1080x300:mode=cline:colors=white:rate=30[w];
   [1:v][w]overlay=0:1200:shortest=1,drawtext=text='EP 42 — Title':fontsize=56:fontcolor=white:x=(w-text_w)/2:y=320" \
  -c:v libx264 -crf 21 -preset fast -pix_fmt yuv420p -c:a aac -b:a 128k -shortest audiogram.mp4

shortest=1 on the overlay + -shortest at the end stop the looped image from running forever. showwaves modes: cline (filled, the podcast look), line, p2p, point.

Spectrum styles

# frequency bars (the "visualizer" look):
"[0:a]showfreqs=s=1280x420:mode=bar:fscale=log[v]"

# scrolling spectrogram (also the debugging view — see below):
"[0:a]showspectrum=s=1280x720:mode=combined:color=intensity:scale=log:slide=scroll[v]"

# musical/CQT spectrum (notes align to rows — lovely for music):
"[0:a]showcqt=s=1280x720[v]"

# minimal volume meter / phase scope:
"[0:a]avectorscope=s=720x720:zoom=1.5[v]"

All consume [0:a] and produce a video stream — overlay/hstack them like any other video (filtergraph.md).

Static waveform / spectrogram images

# waveform PNG (one image of the whole file — episode art, quick inspection):
ffmpeg -i in.mp3 -filter_complex "showwavespic=s=1920x480:colors=#3aa3ff" -frames:v 1 wave.png

# spectrogram PNG — the audio-debugging x-ray:
ffmpeg -i in.wav -lavfi "showspectrumpic=s=1920x1080:scale=log" -frames:v 1 spec.png

Reading the spectrogram: a hard ceiling at ~16 kHz = the file was once a lossy 128k MP3 regardless of its current extension; mains hum = a solid line at 50/60 Hz (kill with highpass); clicks = vertical needles. Faster than ears for "is this 'lossless' file actually lossless".

Audio-reactive overlays (beyond fixed shapes)

ffmpeg-only reactivity is limited to the built-in scopes. For brand-grade audio-reactive motion (pulsing logos, beat-synced glow), render with a composition tool (hyperframes' audio-reactive bindings or Remotion's useAudioData) and use ffmpeg for the I/O around it: extract the audio (-vn), supply stems, encode/package the rendered result (encoding.md).

Comparison grids (encode A/B, model-output review)

# 2x2 labelled grid of four variants:
ffmpeg -i a.mp4 -i b.mp4 -i c.mp4 -i d.mp4 -filter_complex \
  "[0:v]drawtext=text='crf20':fontsize=36:fontcolor=white:box=1:boxcolor=black@0.5:x=12:y=12[a];
   [1:v]drawtext=text='crf26':fontsize=36:fontcolor=white:box=1:boxcolor=black@0.5:x=12:y=12[b];
   [2:v]drawtext=text='nvenc':fontsize=36:fontcolor=white:box=1:boxcolor=black@0.5:x=12:y=12[c];
   [3:v]drawtext=text='av1':fontsize=36:fontcolor=white:box=1:boxcolor=black@0.5:x=12:y=12[d];
   [a][b][c][d]xstack=inputs=4:layout=0_0|w0_0|0_h0|w0_h0" -an grid.mp4

Inputs must share dimensions (scale first if not). The 2-input case (hstack + difference blend) lives in quality-metrics.md.

visualization.md 3.2 KB لینک دائمی تاريخچه خام