Installation, configuration, and key commands for log-analysis tools. Each section covers install commands for the major platforms, the most useful flags, and integration patterns.
## jq

The primary tool for structured log analysis. Processes JSONL line by line (streaming) or reads the whole input into a single array first (slurp).
# macOS
brew install jq
# Ubuntu/Debian
sudo apt install jq
# Windows
choco install jq
# or
winget install jqlang.jq
# Verify
jq --version
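The streaming-versus-slurp distinction above matters for memory use: streaming touches one record at a time, slurp loads everything. A minimal sketch on a two-line sample (the `/tmp/demo.jsonl` path is just for illustration):

```shell
# Create a tiny two-record JSONL file
printf '%s\n' '{"n":1}' '{"n":2}' > /tmp/demo.jsonl

# Streaming: each line is its own input, processed one at a time
jq -c '.n' /tmp/demo.jsonl              # emits 1, then 2

# Slurp: all lines loaded into one array first (fine for small files)
jq -c -s 'map(.n) | add' /tmp/demo.jsonl   # emits 3
```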
| Flag | Purpose | Example |
|---|---|---|
| -c | Compact output (one JSON per line) | jq -c '.' file.jsonl |
| -r | Raw string output (no quotes) | jq -r '.message' file.jsonl |
| -s | Slurp: read all lines into array | jq -s 'length' file.jsonl |
| -S | Sort object keys | jq -S '.' file.json |
| -e | Exit 1 if output is false/null | jq -e '.ok' file.json |
| -R | Read lines as raw strings | jq -R 'fromjson?' messy.jsonl |
| -n | Null input (use with inputs) | jq -n '[inputs]' file.jsonl |
| --arg | Pass string variable | jq --arg id "42" 'select(.id == $id)' |
| --argjson | Pass JSON variable | jq --argjson n 10 'select(.count > $n)' |
| --slurpfile | Load file as variable | jq --slurpfile ids ids.json 'select(.id \| IN($ids[0][]))' |
| --stream | SAX-style path/value output | jq --stream '.' huge.json |
| --unbuffered | Flush after each output | tail -f f.jsonl \| jq --unbuffered '.' |
| --tab | Use tabs for indentation | jq --tab '.' file.json |
# Pretty print a single JSON object
jq '.' file.json
# Validate JSONL (report bad lines)
jq -c '.' file.jsonl > /dev/null 2>&1 || echo "Invalid JSON detected"
# Find and show invalid lines with their line numbers
nl -ba file.jsonl | while read -r n line; do
  printf '%s\n' "$line" | jq . >/dev/null 2>&1 || echo "$n: $line"
done
# Better: use jq -R to print invalid lines directly
jq -rR '. as $line | try (fromjson | empty) catch $line' file.jsonl | head
# Count lines in JSONL
jq -sc 'length' file.jsonl
# Get unique keys across all objects
jq -sc '[.[] | keys[]] | unique' file.jsonl
# Get schema (keys and types) from first line
head -1 file.jsonl | jq '[to_entries[] | {key, type: (.value | type)}]'
# Reformat JSONL with consistent key ordering
jq -cS '.' file.jsonl > normalized.jsonl
# Use debug to print intermediate values to stderr
jq '.items[] | debug | select(.active)' file.json
# Use @text (i.e. tostring) to render any value as a string
jq '.field | @text' file.json
# Use type to check value types
jq '.field | type' file.json
# Build expressions incrementally
jq '.' file.json # Start: see full structure
jq '.items' file.json # Navigate to array
jq '.items[]' file.json # Iterate array
jq '.items[] | .name' file.json # Extract field
jq '.items[] | select(.active)' file.json # Filter
# Common error: "Cannot iterate over null"
# Fix: use ? operator
jq '.items[]?' file.json # Won't error if items is null
# Common error: "null is not iterable"
# Fix: default empty array
jq '(.items // [])[]' file.json
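The two fixes behave the same on missing data but can be checked directly on inline input, no file needed:

```shell
# .items is missing (null): plain iteration raises "Cannot iterate over null"
echo '{}' | jq '.items[]' 2>/dev/null || echo "errored"

# The ? operator suppresses the error and emits nothing (exit 0)
echo '{}' | jq '.items[]?'

# Defaulting to [] also emits nothing, and still iterates when .items exists
echo '{"items":[1]}' | jq '(.items // [])[]'   # emits 1
```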
# rg prefilter then jq parse
rg '"error"' app.jsonl | jq -r '.message'
# jq output to column for alignment
jq -r '[.name, .status, .duration] | @tsv' app.jsonl | column -t
# jq output to sort/uniq for frequency
jq -r '.error_type' errors.jsonl | sort | uniq -c | sort -rn
# jq to CSV for spreadsheet import
jq -r '[.timestamp, .level, .message] | @csv' app.jsonl > export.csv
# jq with xargs for per-line processing
jq -r '.file_path' manifest.jsonl | xargs wc -l
## lnav

Interactive terminal-based log viewer with SQL support, automatic format detection, a timeline view, and filtering. Ideal for exploratory analysis.
# macOS
brew install lnav
# Ubuntu/Debian
sudo apt install lnav
# Windows (via Chocolatey)
choco install lnav
# From source
curl -LO https://github.com/tstack/lnav/releases/download/v0.12.2/lnav-0.12.2-linux-musl-x86_64.zip
unzip lnav-0.12.2-linux-musl-x86_64.zip
sudo cp lnav-0.12.2/lnav /usr/local/bin/
# Verify
lnav -V
| Feature | Access | Description |
|---|---|---|
| Auto-detect format | Automatic | Recognizes syslog, Apache, nginx, JSON, and many more |
| SQL queries | ; then SQL | Run SQL against log data |
| Filter in/out | i / o | Interactive include/exclude filters |
| Bookmarks | m | Mark lines for later reference |
| Timeline | t | Show time histogram |
| Pretty print | p | Toggle pretty-printing JSON |
| Headless mode | -n -c "..." | Non-interactive command execution |
| Compressed files | Automatic | Handles .gz, .bz2, .xz transparently |
# Open log file(s)
lnav app.log
lnav /var/log/syslog /var/log/auth.log # multiple files, merged by timestamp
# Open JSONL logs
lnav app.jsonl
# Open compressed logs
lnav app.log.gz
# Open all logs in a directory
lnav /var/log/myapp/
# Headless mode: run query and output results
lnav -n -c ";SELECT count(*) FROM logline WHERE log_level = 'error'" app.log
# Headless mode: filter and export
lnav -n -c ";SELECT log_time, log_body FROM logline WHERE log_level = 'error'" \
-c ":write-csv-to errors.csv" app.log
# Headless mode: get stats
lnav -n -c ";SELECT log_level, count(*) as cnt FROM logline GROUP BY log_level ORDER BY cnt DESC" app.log
-- Error count by hour
SELECT strftime('%Y-%m-%d %H', log_time) as hour, count(*) as errors
FROM logline WHERE log_level = 'error'
GROUP BY hour ORDER BY hour;
-- Top error messages
SELECT log_body, count(*) as cnt
FROM logline WHERE log_level = 'error'
GROUP BY log_body ORDER BY cnt DESC LIMIT 10;
-- Time between events
SELECT log_time, log_body,
julianday(log_time) - julianday(lag(log_time) OVER (ORDER BY log_time)) as gap_days
FROM logline WHERE log_level = 'error';
-- Log volume over time
SELECT strftime('%Y-%m-%d %H:%M', log_time) as minute, count(*) as lines
FROM logline
GROUP BY minute ORDER BY minute;
// ~/.lnav/formats/installed/myapp.json
{
"myapp_log": {
"title": "My Application Log",
"regex": {
"std": {
"pattern": "^(?<timestamp>\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2})\\s+\\[(?<level>\\w+)\\]\\s+(?<body>.*)"
}
},
"timestamp-format": ["%Y-%m-%dT%H:%M:%S"],
"level": {
"error": "ERROR",
"warning": "WARN",
"info": "INFO",
"debug": "DEBUG"
}
}
}
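Before installing the format (lnav can install a format file directly with `lnav -i myapp.json`), the regex can be sanity-checked against a sample line with `grep -P` (GNU grep; the sample line and the simplified pattern without named groups are illustrative):

```shell
# A line the custom format is meant to match
sample='2026-03-08T14:30:05 [ERROR] connection refused'

# Same shape as the "std" pattern above, minus the PCRE named groups
echo "$sample" \
  | grep -P '^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\s+\[\w+\]\s+.*' \
  && echo "format regex matches"
```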
| Key | Action |
|---|---|
| / | Search forward (regex) |
| n / N | Next/previous search match |
| i | Toggle filter: show only matching lines |
| o | Toggle filter: hide matching lines |
| TAB | Switch between views (log, text, help) |
| t | Toggle timeline histogram |
| m | Set bookmark on current line |
| u / U | Next/previous bookmark |
| z / Z | Zoom in/out on timeline |
| p | Toggle pretty-print for JSON |
| e / E | Next/previous error |
| w / W | Next/previous warning |
| : | Enter command mode |
| ; | Enter SQL query mode |
## angle-grinder (agrind)

Pipeline-based aggregation tool designed for log analysis: SQL-like queries expressed in a streaming pipeline syntax.
# Via cargo (all platforms)
cargo install ag
# macOS
brew install angle-grinder
# Verify
agrind --version
<input_pattern> | <operator1> | <operator2> | ...
# Count log levels
cat app.log | agrind '* | parse "* [*] *" as ts, level, msg | count by level'
# Top URLs
cat access.log | agrind '* | parse "* * * * * * *" as ip, _, _, ts, method, url, status | count by url | sort by _count desc | limit 10'
# Average response time by endpoint
cat access.log | agrind '* | parse "* *ms" as prefix, duration | average(duration) by prefix'
# Error frequency over time
cat app.log | agrind '* | parse "*T*:*:* [ERROR]*" as date, hour, min, sec, msg | count by hour'
# Filter then aggregate
cat app.log | agrind '* | where level == "error" | count by msg | sort by _count desc'
# JSON log fields
cat app.jsonl | agrind '* | json | where level == "error" | count by message'
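When agrind isn't installed, the first example above (count by level) can be approximated with awk by splitting on the brackets (sample lines inline; the file path is illustrative):

```shell
printf '%s\n' \
  '2026-03-08T14:00:01 [ERROR] boom' \
  '2026-03-08T14:00:02 [INFO] ok' \
  '2026-03-08T14:00:03 [ERROR] boom again' > /tmp/app.log

# Equivalent of: agrind '* | parse "* [*] *" as ts, level, msg | count by level'
# -F'[][]' splits fields on "[" or "]", so $2 is the level
awk -F'[][]' '{count[$2]++} END {for (l in count) print count[l], l}' /tmp/app.log
```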
| Operator | Purpose | Example |
|---|---|---|
| parse | Extract fields with pattern | parse "* [*] *" as a, b, c |
| json | Parse JSON log lines | json |
| where | Filter rows | where level == "error" |
| count | Count (optionally by group) | count by level |
| sum | Sum a field | sum(bytes) |
| average | Average a field | average(duration) |
| min / max | Min/max of a field | min(response_time) |
| sort | Sort results | sort by _count desc |
| limit | Limit results | limit 10 |
| count_distinct | Count unique values | count_distinct(user_id) |
| percentile | Percentile calc | p50(duration), p99(duration) |
## ripgrep (rg)

Already covered extensively in the file-search skill; below are log-specific flags and patterns.
| Flag | Purpose | Example |
|---|---|---|
| -c | Count matches per file | rg -c "ERROR" /var/log/*.log |
| -l | List files with matches | rg -l "timeout" /var/log/ |
| --files-without-match | List files without matches | rg --files-without-match "healthy" /var/log/ |
| --stats | Show match statistics | rg --stats "error" app.log |
| -A N | Show N lines after match | rg -A5 "Exception" app.log |
| -B N | Show N lines before match | rg -B3 "FATAL" app.log |
| -C N | Show N lines of context | rg -C5 "crash" app.log |
| -U | Multiline matching | rg -U "Error.*\n.*at " app.log |
| --json | JSON output format | rg --json "error" app.log |
| -a | Search binary files | rg -a "pattern" binary.log |
| --line-buffered | Flush per line (for tail) | tail -f app.log \| rg --line-buffered "error" |
| -F | Fixed string (no regex) | rg -F "stack[0]" app.log |
| -f FILE | Patterns from file | rg -f patterns.txt app.log |
| -v | Invert match | rg -v "DEBUG" app.log |
# Find errors across all log files recursively
rg "ERROR|FATAL|CRITICAL" /var/log/
# Count errors per file, sorted
rg -c "ERROR" /var/log/ 2>/dev/null | sort -t: -k2 -rn
# Find stack traces (multiline)
rg -U "Exception.*\n(\s+at .*\n)+" app.log
# Extract timestamps of errors
rg "ERROR" app.log | rg -o "^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"
# Search compressed log files
rg -z "error" app.log.gz
# Search JSONL for specific field value (text-level, fast but approximate)
rg '"level":"error"' app.jsonl
# Search JSONL for value in specific key (avoid matching wrong key)
rg '"user_id":"user-42"' app.jsonl
# Negative lookahead (requires -P for PCRE2): errors that are NOT timeouts
rg -P "ERROR(?!.*timeout)" app.log
# Time-bounded search (extract lines between two timestamps)
rg "2026-03-08T1[4-5]:" app.log
# Get structured output from rg (useful for programmatic processing)
rg --json "error" app.log | jq -c 'select(.type == "match") | {file: .data.path.text, line: .data.line_number, text: .data.lines.text}'
# Count matches with file info
rg --json "error" app.log | jq -c 'select(.type == "summary") | .data.stats'
## awk

Pre-installed on all Unix systems. Best for space/tab-delimited logs with a consistent column structure.

# Apache/nginx combined log format, as awk sees it with default splitting:
# $1=IP $2=ident $3=user $4=[date:time $5=tz] $6="method $7=path $8=proto" $9=status $10=size
# Status code distribution
awk '{print $9}' access.log | sort | uniq -c | sort -rn
# Requests per IP
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20
# 5xx errors with paths
awk '$9 >= 500 {print $1, $9, $7}' access.log
# Average response size
awk '{sum += $10; n++} END {printf "Avg: %.0f bytes\n", sum/n}' access.log
# Requests per minute
awk '{print substr($4, 2, 17)}' access.log | sort | uniq -c | tail -20
# Bandwidth by path
awk '{bytes[$7] += $10} END {for (p in bytes) printf "%10d %s\n", bytes[p], p}' access.log | sort -rn | head -20
# Custom delimiter (e.g., pipe-separated)
awk -F'|' '{print $3, $5}' custom.log
# Time-range filter (syslog format)
awk '/^Mar 8 14:/ {print}' syslog
# Show the first and last timestamps in the file
awk 'NR==1 {first=$1" "$2} END {last=$1" "$2; print "From:", first, "To:", last}' app.log
# Parse key=value format (logfmt): average of duration
awk '{
  for (i = 1; i <= NF; i++) {
    split($i, kv, "=")
    if (kv[1] == "duration") { sum += kv[2]; n++ }
  }
} END {print "avg_duration=" sum/n}' app.log
# Extract lines by logfmt key (+0 forces a numeric comparison)
awk '{
  for (i = 1; i <= NF; i++) {
    split($i, kv, "=")
    if (kv[1] == "status" && kv[2]+0 >= 500) print $0
  }
}' app.log
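A self-contained run of the logfmt average (sample lines inline; the values are illustrative):

```shell
printf '%s\n' \
  'ts=1 status=200 duration=10' \
  'ts=2 status=500 duration=30' > /tmp/logfmt.log

# Walk each field, split on "=", accumulate only the duration values
awk '{
  for (i = 1; i <= NF; i++) {
    split($i, kv, "=")
    if (kv[1] == "duration") { sum += kv[2]; n++ }
  }
} END {print "avg_duration=" sum/n}' /tmp/logfmt.log   # avg_duration=20
```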
## GNU parallel

Splits work across CPU cores when processing large log files.
# macOS
brew install parallel
# Ubuntu/Debian
sudo apt install parallel
# Verify
parallel --version
# Process multiple log files in parallel
ls /var/log/app-*.jsonl | parallel "jq -c 'select(.level == \"error\")' {} > {.}_errors.jsonl"
# Split large file and process chunks in parallel
split -l 200000 huge.jsonl /tmp/chunk_
ls /tmp/chunk_* | parallel "jq -r '.message' {}" | sort | uniq -c | sort -rn | head -20
# Parallel grep across many files
fd -e jsonl . /var/log/ | parallel "rg -c '\"error\"' {} 2>/dev/null" | sort -t: -k2 -rn
# Pipe-based parallelism (no temp files)
parallel --pipe -L 50000 "jq -c 'select(.level == \"error\")'" < huge.jsonl > errors.jsonl
# Parallel with progress bar
ls /var/log/app-*.jsonl | parallel --bar "jq -sc 'length' {}" | awk '{sum+=$1} END {print sum, "total lines"}'
# Number of jobs (default: CPU cores)
ls *.jsonl | parallel -j 4 "jq -c 'select(.status >= 500)' {}"
# Full workflow: split, process in parallel, merge results
FILE=huge.jsonl
CHUNKS=/tmp/log_chunks
mkdir -p "$CHUNKS"
# Split
split -l 100000 "$FILE" "$CHUNKS/chunk_"
# Process in parallel
ls "$CHUNKS"/chunk_* | parallel "jq -r 'select(.level == \"error\") | .message' {}" |
sort | uniq -c | sort -rn > error_summary.txt
# Cleanup
rm -rf "$CHUNKS"
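If GNU parallel isn't available, `xargs -P` gives similar fan-out for the per-file step (the chunk files below are stand-ins, and grep stands in for the per-chunk command):

```shell
# Two fake chunks to fan out over
printf 'error\nok\n'    > /tmp/pchunk_a
printf 'error\nerror\n' > /tmp/pchunk_b

# Up to 4 concurrent jobs, one file per job ("$1" is the file name)
ls /tmp/pchunk_* | xargs -P 4 -I{} sh -c 'grep -c error "$1"' sh {}
```

Output order is nondeterministic under `-P`, so pipe through `sort` before comparing runs.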
## Miller (mlr)

Like awk, sed, and jq combined, but aimed specifically at structured record data (CSV, TSV, JSON).
# macOS
brew install miller
# Ubuntu/Debian
sudo apt install miller
# Windows
choco install miller
# Verify
mlr --version
# View CSV with headers
mlr --csv head -n 10 access_log.csv
# Filter rows
mlr --csv filter '$status >= 400' access_log.csv
# Sort by column (numeric, descending)
mlr --csv sort -nr duration access_log.csv
# Statistics
mlr --csv stats1 -a min,max,mean,p95 -f duration access_log.csv
# Group by with stats
mlr --csv stats1 -a count,mean -f duration -g endpoint access_log.csv
# Convert formats
mlr --c2j cat access_log.csv # CSV to JSON
mlr --c2t cat access_log.csv # CSV to TSV (table)
mlr --j2c cat access_log.json # JSON to CSV
mlr --c2p cat access_log.csv # CSV to pretty-print table
# Top-N by group
mlr --csv top -n 5 -f duration -g endpoint access_log.csv
# Add computed fields
mlr --csv put '$error = ($status >= 400 ? "yes" : "no")' access_log.csv
# Random sample of rows (reservoir sampling, not every-Nth decimation)
mlr --csv sample -k 100 huge_log.csv
# Uniq count
mlr --csv count-distinct -f status access_log.csv
# Histogram of a numeric field
mlr --csv histogram -f duration --lo 0 --hi 1000 --nbins 10 access_log.csv
# Join two CSV files
mlr --csv join -j user_id -f users.csv then sort -f user_id access_log.csv
# Extract JSONL to TSV, then use mlr for analysis
jq -r '[.timestamp, .level, .duration_ms, .path] | @tsv' app.jsonl > /tmp/extracted.tsv
mlr --tsvlite --implicit-csv-header --from /tmp/extracted.tsv \
  label timestamp,level,duration_ms,path then \
  filter '$level == "error"' then \
  stats1 -a count,mean -f duration_ms -g path then \
  sort -nr duration_ms_count
## Tool combinations

# fd + rg + jq: find files, prefilter, extract
fd -e jsonl . logs/ | xargs rg -l '"error"' | xargs jq -c 'select(.level == "error") | {ts: .timestamp, msg: .message}'
# rg + jq + column: search, extract, format
rg '"timeout"' app.jsonl | jq -r '[.timestamp, .service, .message] | @tsv' | column -t
# jq + sort + uniq: aggregate without slurp
jq -r '.error_type' errors.jsonl | sort | uniq -c | sort -rn
# tail + rg + jq: live monitoring with extraction
tail -f app.jsonl | rg --line-buffered '"error"' | jq --unbuffered -r '[.timestamp, .message] | @tsv'
# fd + parallel + jq: parallel extraction across many files
fd -e jsonl . logs/ | parallel "jq -c 'select(.level == \"error\")' {}" > all_errors.jsonl
# jq + mlr: structured extraction then statistical analysis
jq -r '[.path, .duration_ms, .status] | @csv' app.jsonl | \
  mlr --csv --implicit-csv-header label path,duration,status then \
  stats1 -a p50,p95,p99 -f duration -g path then \
  sort -nr duration_p95
# lnav + headless SQL: non-interactive queries
lnav -n -c ";SELECT log_level, count(*) FROM logline GROUP BY log_level" app.log
## Which tool for which task

Task: Explore unknown log file
--> lnav (interactive, auto-detects format)
Task: Quick search for pattern
--> rg "pattern" file.log
Task: Extract fields from JSONL
--> jq -r '[.field1, .field2] | @tsv' file.jsonl
Task: Count/aggregate JSONL (<100MB)
--> jq -sc 'group_by(.x) | map(...)' file.jsonl
Task: Count/aggregate JSONL (>100MB)
--> jq -r '.field' file.jsonl | sort | uniq -c | sort -rn
Task: Search large JSONL then extract
--> rg "pattern" file.jsonl | jq -r '.field'
Task: CSV/TSV log statistics
--> mlr --csv stats1 -a mean,p95 -f duration file.csv
Task: Process many log files in parallel
--> fd -e jsonl . dir/ | parallel "jq ..."
Task: Pipeline aggregation on text logs
--> cat file.log | agrind '* | parse ... | count by ...'
Task: Live monitoring with filtering
--> tail -f file.jsonl | rg --line-buffered "x" | jq --unbuffered '.'
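Putting the guide together on a tiny synthetic log (grep stands in for rg here, since both play the same cheap-prefilter role):

```shell
printf '%s\n' \
  '{"level":"error","error_type":"timeout"}' \
  '{"level":"info","error_type":null}' \
  '{"level":"error","error_type":"timeout"}' \
  '{"level":"error","error_type":"refused"}' > /tmp/pipeline.jsonl

# Prefilter with a cheap text match, then extract and rank (streaming, no slurp)
grep '"level":"error"' /tmp/pipeline.jsonl \
  | jq -r '.error_type' \
  | sort | uniq -c | sort -rn
```

This keeps jq's JSON parsing off the lines that can't match, which is the main reason the prefilter pattern pays off on large files.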