name: log-ops description: "Log analysis and JSONL processing - structured extraction, cross-log correlation, timeline reconstruction, pattern search" allowed-tools: "Read Edit Write Bash Glob Grep Agent"

related-skills: [data-processing, debug-ops, monitoring-ops, file-search, introspect]

Log Operations

Practical patterns for analyzing log files -- especially JSONL format used in agent conversation logs, benchmark outputs, and structured application logs.

Log Format Decision Tree

Unknown Log File
│
├─ Is it one JSON object per line?
│  ├─ Yes ──────────────────────── JSONL
│  │  ├─ Small file (<100MB)
│  │  │  └─ jq for extraction, jq -s for aggregation
│  │  ├─ Large file (100MB-1GB)
│  │  │  └─ rg prefilter then pipe to jq
│  │  └─ Huge file (>1GB)
│  │     └─ split + parallel jq, or jq --stream
│  │
│  └─ No
│     ├─ Is it one large JSON object/array?
│     │  └─ Yes ──────────────── Single JSON
│     │     └─ jq --stream for SAX-style, or jq directly if fits in memory
│     │
│     ├─ Does it have key=value pairs?
│     │  └─ Yes ──────────────── Structured (logfmt / key-value)
│     │     └─ rg for search, awk/sd for extraction, angle-grinder for aggregation
│     │
│     ├─ Does it follow syslog format? (timestamp hostname service[pid]: message)
│     │  └─ Yes ──────────────── Syslog
│     │     └─ rg for search, awk for column extraction, lnav for interactive
│     │
│     ├─ Is it space/tab delimited with consistent columns?
│     │  └─ Yes ──────────────── Column-based (access logs, CSV)
│     │     └─ awk for extraction, mlr for CSV, rg for pattern search
│     │
│     └─ Mixed or unstructured
│        └─ Plain text ─────────── Freeform
│           └─ rg for search, rg -A/-B for context, lnav for exploration

Prerequisites

Required (must be installed):

rg (ripgrep) - text search, prefiltering. Install: cargo install ripgrep / choco install ripgrep
jq - JSON/JSONL extraction and transformation. Install: brew install jq / choco install jq

Optional (enhanced capabilities, gracefully degraded without):

lnav - interactive log exploration with SQL queries. Install: brew install lnav / WSL: apt install lnav
agrind (angle-grinder) - pipeline aggregation syntax. Install: cargo install ag
mlr (Miller) - CSV/TSV log analysis. Install: brew install miller / choco install miller
GNU parallel - parallel processing of split files. Install: brew install parallel

All patterns in this skill work with just rg + jq. Optional tools add interactive exploration (lnav), pipeline aggregation (agrind), and tabular analysis (mlr).

Tool Selection Matrix

Tool	Best For	Speed	Required?
`rg` (ripgrep)	Raw pattern matching in any format	Fastest	Yes
`jq`	JSONL structured extraction and transformation	Fast	Yes
`jq -s`	JSONL aggregation (slurp all lines into array)	Medium (loads all into memory)	Yes (part of jq)
`lnav`	Interactive exploration, SQL over logs	Interactive	Optional
`agrind` (angle-grinder)	Pipeline aggregation and counting	Fast	Optional
`awk`	Column-based log formats, field extraction	Fast	Pre-installed
`mlr` (Miller)	CSV/TSV log analysis, statistics	Fast	Optional
`fd` + `rg`	Searching across many log directories	Fast	Pre-installed in dev-shell
`GNU parallel`	Splitting large files for parallel processing	N/A (orchestrator)	Optional

When to Use What

Need to...
│
├─ Find lines matching a pattern
│  └─ rg (always fastest for text search)
│
├─ Extract specific fields from JSONL
│  └─ jq -r '[.field1, .field2] | @tsv'
│
├─ Count/aggregate over JSONL
│  └─ jq -sc 'group_by(.field) | map({key: .[0].field, n: length})'
│
├─ Search JSONL by value then format results
│  └─ rg '"error"' file.jsonl | jq -r '.message'  (two-stage)
│
├─ Explore interactively with filtering/SQL
│  └─ lnav file.log
│
├─ Aggregate with pipeline syntax
│  └─ agrind '* | parse "* * *" as ts, level, msg | count by level'
│
├─ Extract columns from space-delimited logs
│  └─ awk '{print $1, $4, $7}' access.log
│
└─ Process CSV/TSV logs with headers
   └─ mlr --csv filter '$status >= 400' then stats1 -a count -f status

JSONL Quick Reference

The most common format for structured logs. One JSON object per line, no trailing commas, no wrapping array.

Stream Filtering (line by line, constant memory)

# Filter by field value
jq -c 'select(.level == "error")' app.jsonl

# Filter by nested field
jq -c 'select(.request.method == "POST")' app.jsonl

# Filter by multiple conditions
jq -c 'select(.level == "error" and .status >= 500)' app.jsonl

# Filter by array contains
jq -c 'select(.tags | index("critical"))' app.jsonl

# Filter by field existence
jq -c 'select(.stack_trace != null)' app.jsonl

# Negate a filter
jq -c 'select(.level != "debug")' app.jsonl

Field Extraction

# Extract single field
jq -r '.message' app.jsonl

# Extract multiple fields as TSV
jq -r '[.timestamp, .level, .message] | @tsv' app.jsonl

# Extract with default for missing fields
jq -r '.error_code // "none"' app.jsonl

# Extract nested field safely
jq -r '.response.headers["content-type"] // "unknown"' app.jsonl

Aggregation (requires slurp: loads entire file)

# Count by field value
jq -sc 'group_by(.level) | map({level: .[0].level, count: length})' app.jsonl

# Top-N most common values
jq -sc '[.[].error_type] | group_by(.) | map({type: .[0], count: length}) | sort_by(-.count) | .[:10]' app.jsonl

# Sum a numeric field
jq -sc 'map(.duration_ms) | add' app.jsonl

# Average
jq -sc 'map(.duration_ms) | add / length' app.jsonl

# Min and max
jq -sc 'map(.duration_ms) | {min: min, max: max}' app.jsonl

Nested Extraction (agent logs, complex structures)

# Extract tool calls from conversation logs
jq -c '.content[]? | select(.type == "tool_use") | .name' conversation.jsonl

# De-escape nested JSON strings
jq -c '.content | fromjson' app.jsonl

# Flatten nested arrays
jq -c '[.events[]? | .action]' app.jsonl

# Extract from arrays of objects
jq -c '.results[]? | select(.passed == false) | {test: .name, error: .message}' results.jsonl

Two-Stage Pipeline (rg for speed, jq for structure)

# Fast prefilter then structured extraction
rg '"error"' app.jsonl | jq -r '[.timestamp, .message] | @tsv'

# Search for specific value then aggregate
rg '"timeout"' app.jsonl | jq -sc 'length'

# Pattern match then extract
rg '"user_id":"u-123"' app.jsonl | jq -c '{ts: .timestamp, action: .action}'

Time-Range Filtering

# Filter by timestamp range (ISO 8601 string comparison works)
jq -c 'select(.timestamp > "2026-03-08T10:00" and .timestamp < "2026-03-08T11:00")' app.jsonl

# Events in the last N minutes (using epoch seconds)
jq -c --arg cutoff "$(date -d '30 minutes ago' +%s)" 'select((.timestamp | sub("\\.[0-9]+Z$"; "Z") | fromdate) > ($cutoff | tonumber))' app.jsonl

# Extract hour for histogram
jq -r '.timestamp | split("T")[1] | split(":")[0]' app.jsonl | sort | uniq -c

Cross-File Join

# Extract IDs from one file, search in another
jq -r '.request_id' errors.jsonl | while read id; do
  rg "\"$id\"" responses.jsonl | jq -c '{id: .request_id, status: .status}'
done

# Faster: build lookup, then join
jq -r '.request_id' errors.jsonl | sort -u > /tmp/error_ids.txt
rg -Ff /tmp/error_ids.txt responses.jsonl | jq -c '{id: .request_id, status: .status}'

# Join two JSONL files by key using jq --slurpfile
jq --slurpfile lookup <(jq -sc 'map({(.id): .}) | add' lookup.jsonl) \
  '. + ($lookup[0][.ref_id] // {})' main.jsonl

Plain Text Log Patterns

Pattern Search with Context

# Show 5 lines before and after each match
rg -B5 -A5 "OutOfMemoryError" app.log

# Show only matching files
rg -l "FATAL" /var/log/

# Count matches per file
rg -c "ERROR" /var/log/*.log | sort -t: -k2 -rn

# Multiline patterns (stack traces)
rg -U "Exception.*\n(\s+at .*\n)+" app.log

Column Extraction with awk

# Apache/nginx access log: extract status codes
awk '{print $9}' access.log | sort | uniq -c | sort -rn

# Extract specific time range from syslog
awk '$0 >= "Mar  8 10:00" && $0 <= "Mar  8 11:00"' syslog

# Calculate average response time (column 11)
awk '{sum += $11; n++} END {print sum/n}' access.log

# Filter by status code and show URL + response time
awk '$9 >= 500 {print $7, $11"ms"}' access.log

Live Monitoring

# Follow with filtering
tail -f app.log | rg --line-buffered "ERROR"

# Follow JSONL and extract fields
tail -f app.jsonl | jq --unbuffered -r '[.timestamp, .level, .message] | @tsv'

# Follow multiple files
tail -f /var/log/service-*.log | rg --line-buffered "error|warn"

Timeline Reconstruction

Extracting and Sorting by Timestamp

# Merge multiple log files by timestamp
sort -t' ' -k1,2 service-a.log service-b.log > timeline.log

# JSONL: sort by timestamp field
jq -sc 'sort_by(.timestamp)[]' combined.jsonl > sorted.jsonl

# Extract timestamps and calculate gaps
jq -r '.timestamp' app.jsonl | awk '
  NR > 1 {
    cmd = "date -d \"" prev "\" +%s"; cmd | getline t1; close(cmd)
    cmd = "date -d \"" $0 "\" +%s"; cmd | getline t2; close(cmd)
    gap = t2 - t1
    if (gap > 5) print gap "s gap before " $0
  }
  { prev = $0 }
'

# Quick duration between first and last event
jq -sc '{start: .[0].timestamp, end: .[-1].timestamp}' app.jsonl

Calculating Durations Between Events

# Duration between paired events (start/end)
jq -sc '
  group_by(.request_id) |
  map(
    (map(select(.event == "start")) | .[0].timestamp) as $start |
    (map(select(.event == "end")) | .[0].timestamp) as $end |
    {id: .[0].request_id, start: $start, end: $end}
  )
' events.jsonl

# Identify the slowest phase
jq -sc '
  sort_by(.timestamp) |
  [range(1; length) | {
    from: .[.-1].event,
    to: .[.].event,
    gap: ((.[.].ts_epoch) - (.[.-1].ts_epoch))
  }] |
  sort_by(-.gap) | .[0]
' events.jsonl

Cross-Log Correlation

By Correlation ID

# Find a request across all service logs
fd -e jsonl . /var/log/services/ -x rg "\"req-abc-123\"" {}

# Build a timeline for a single request
fd -e jsonl . /var/log/services/ -x rg "\"req-abc-123\"" {} \; | jq -sc 'sort_by(.timestamp)[] | [.timestamp, .service, .event] | @tsv'

By Timestamp Window

# Find events within 2 seconds of a known event
# First get the target timestamp
TARGET="2026-03-08T14:23:15"
jq -c --arg t "$TARGET" '
  select(
    .timestamp > ($t | sub("15$"; "13")) and
    .timestamp < ($t | sub("15$"; "17"))
  )
' other-service.jsonl

By Session/User

# Reconstruct a user session across log files
fd -e jsonl . /var/log/ -x rg "\"user-42\"" {} \; |
  jq -sc 'sort_by(.timestamp)[] | [.timestamp, .service, .action] | @tsv'

Large File Strategies

Search Recent Only

# Last 10,000 lines (fast for append-only logs)
tail -n 10000 huge.log | rg "pattern"

# Last N lines of JSONL with structured extraction
tail -n 5000 huge.jsonl | jq -c 'select(.level == "error")'

Split for Parallel Processing

# Split into 100K-line chunks
split -l 100000 huge.jsonl /tmp/chunk_

# Process in parallel
fd 'chunk_' /tmp/ -x jq -c 'select(.level == "error")' {} > errors.jsonl

# With GNU parallel
split -l 100000 huge.jsonl /tmp/chunk_
ls /tmp/chunk_* | parallel 'jq -c "select(.level == \"error\")" {} >> /tmp/errors.jsonl'

Streaming for Huge Single JSON

# SAX-style processing of a huge JSON array
jq --stream 'select(.[0][0] == "results" and .[0][-1] == "status") | .[1]' huge.json

# Extract items from a huge array without loading all
jq -cn --stream 'fromstream(1 | truncate_stream(inputs))' huge-array.json

Two-Stage Always

# ALWAYS faster: rg filters text, jq parses survivors
rg '"error"' huge.jsonl | jq -r '.message'

# vs. SLOW: jq reads and parses every line
jq -r 'select(.level == "error") | .message' huge.jsonl

Search Across Directories

Multi-Directory Patterns

# Find all JSONL files with errors across trial directories
fd -e jsonl . trials/ -x rg -l '"error"' {}

# Count errors per log file across directories
fd -e jsonl . trials/ -x bash -c 'echo "$(rg -c "\"error\"" "$1" 2>/dev/null || echo 0) $1"' _ {}

# Extract and aggregate across directories
fd -e jsonl . trials/ -x jq -c 'select(.level == "error") | {file: input_filename, msg: .message}' {}

# Build summary table from multiple runs
for dir in trials/*/; do
  total=$(wc -l < "$dir/results.jsonl")
  errors=$(rg -c '"error"' "$dir/results.jsonl" 2>/dev/null || echo 0)
  echo -e "$dir\t$total\t$errors"
done | column -t -N DIR,TOTAL,ERRORS

Common Gotchas

Gotcha	Why It Hurts	Fix
`jq -s` on huge files loads everything into memory	OOM crash or swap thrashing on files over ~500MB	Use streaming: `rg` prefilter, `jq --stream`, or `split` + parallel
JSONL with embedded newlines in string values	Line-by-line tools (rg, awk, head) split a single record across lines	Use `jq -c` to re-compact, or `jq -R 'fromjson?'` to skip malformed lines
rg matches JSON keys, not just values	`rg "error"` matches `{"error_count": 0}` which is not an error	Use `rg '"level":"error"'` or pipe to `jq 'select(.level == "error")'`
Timezone mismatches in timestamp comparisons	Events appear out of order or time ranges miss data	Normalize to UTC before comparing: `jq '.timestamp
Unicode and escape sequences in log messages	jq chokes on invalid UTF-8 or double-escaped strings	Prefilter with `rg -a` (binary mode), or use `jq -R` for raw strings
Inconsistent JSON schemas across log lines	`jq` errors on lines missing expected fields	Use `//` operator for defaults: `.field // "missing"` and `?` for optional: `.arr[]?`
Forgetting `-c` flag with jq on JSONL	jq pretty-prints each line, output is no longer valid JSONL	Always use `jq -c` when output feeds into another JSONL consumer
tail -f with jq buffering	Output appears delayed or not at all	Use `jq --unbuffered` or `stdbuf -oL jq`
Sorting JSONL by timestamp without slurp	`sort` command does lexicographic sort on whole lines, not by field	Either `jq -sc 'sort_by(.timestamp)[]'` or extract timestamp prefix first
Assuming log files are complete	Logs may be rotated, compressed, or still being written	Check for `.gz` rotated files: `fd -e gz . /var/log/ -x zcat {} \\| rg pattern`
Single quotes in jq on Windows	PowerShell/cmd do not handle single quotes the same as bash	Use double quotes with escaped inner quotes, or write jq filter to a file

Reference Files

File	Contents	Lines
`references/jsonl-patterns.md`	JSONL extraction, aggregation, transformation, comparison, and performance patterns	~700
`references/analysis-workflows.md`	Agent conversation analysis, application log analysis, benchmark result parsing, cross-directory workflows	~600
`references/tool-setup.md`	Installation and configuration for jq, lnav, angle-grinder, rg, awk, GNU parallel, Miller	~450

SKILL.md 16 KB

Permalink History Raw

related-skills: [data-processing, debug-ops, monitoring-ops, file-search, introspect]

Log Operations

Log Format Decision Tree

Prerequisites

Tool Selection Matrix

When to Use What

JSONL Quick Reference

Stream Filtering (line by line, constant memory)

Field Extraction

Aggregation (requires slurp: loads entire file)

Nested Extraction (agent logs, complex structures)

Two-Stage Pipeline (rg for speed, jq for structure)

Time-Range Filtering

Cross-File Join

Plain Text Log Patterns

Pattern Search with Context

Column Extraction with awk

Live Monitoring

Timeline Reconstruction

Extracting and Sorting by Timestamp

Calculating Durations Between Events

Cross-Log Correlation

By Correlation ID

By Timestamp Window

By Session/User

Large File Strategies

Search Recent Only

Split for Parallel Processing

Streaming for Huge Single JSON

Two-Stage Always

Search Across Directories

Multi-Directory Patterns

Common Gotchas

Reference Files

See Also

SKILL.md 16 KB Permalink History Raw

related-skills: [data-processing, debug-ops, monitoring-ops, file-search, introspect]

Log Operations

Log Format Decision Tree

Prerequisites

Tool Selection Matrix

When to Use What

JSONL Quick Reference

Stream Filtering (line by line, constant memory)

Field Extraction

Aggregation (requires slurp: loads entire file)

Nested Extraction (agent logs, complex structures)

Two-Stage Pipeline (rg for speed, jq for structure)

Time-Range Filtering

Cross-File Join

Plain Text Log Patterns

Pattern Search with Context

Column Extraction with awk

Live Monitoring

Timeline Reconstruction

Extracting and Sorting by Timestamp

Calculating Durations Between Events

Cross-Log Correlation

By Correlation ID

By Timestamp Window

By Session/User

Large File Strategies

Search Recent Only

Split for Parallel Processing

Streaming for Huge Single JSON

Two-Stage Always

Search Across Directories

Multi-Directory Patterns

Common Gotchas

Reference Files

See Also

SKILL.md 16 KB

Permalink History Raw