Systematic Debugging Methods

Structured approaches to finding and fixing bugs, from scientific method to team-based protocols.

Scientific Debugging Method

The most rigorous approach: treat debugging as an experiment.

The Cycle

Observe → Hypothesize → Predict → Test → Conclude
   ↑                                        │
   └────────────────────────────────────────┘
           (if hypothesis rejected)

Step-by-Step

Observe: Gather all available evidence without interpretation
- Error messages, stack traces, logs
- System state (memory, CPU, disk, network)
- User-reported behavior vs expected behavior
- When it started (correlate with deployments, config changes)
Hypothesize: Form a specific, falsifiable explanation
- "The query times out because the users table lacks an index on email"
- NOT "something is wrong with the database" (too vague)
Predict: State what should happen if the hypothesis is true
- "If I add an index on email, the query should complete in <100ms"
- "If I run EXPLAIN on the query, it should show a sequential scan"
Test: Run the smallest experiment that distinguishes true from false
- Run EXPLAIN ANALYZE on the query
- Check if the index exists: \di users_email_idx
Conclude: Accept or reject the hypothesis based on evidence
- If confirmed: proceed to fix
- If rejected: return to step 1 with new information

Example Walkthrough

OBSERVATION:
  API endpoint POST /api/orders returns 500 after deploying v2.3.1
  Error log: "TypeError: Cannot read property 'id' of undefined"
  Stack trace points to orders.controller.js:47

HYPOTHESIS 1:
  "The user object is null because the auth middleware is not
   attaching it to the request in the new version"

PREDICTION:
  "If I log req.user in the auth middleware, it will be undefined
   for the failing requests"

TEST:
  Added console.log(req.user) in auth middleware
  Result: req.user IS defined and correct

CONCLUSION:
  Hypothesis 1 REJECTED. The user object exists.

HYPOTHESIS 2:
  "The order.customer field changed from an object to a string ID
   in the new schema, so order.customer.id fails"

PREDICTION:
  "If I inspect the order document, customer will be a string, not
   an object with an .id property"

TEST:
  db.orders.findOne({_id: "failing-order-id"})
  Result: { customer: "user_123", ... }  -- string, not object

CONCLUSION:
  Hypothesis 2 CONFIRMED. Schema migration changed customer from
  embedded object to reference ID. Fix: update controller to handle
  both formats or ensure migration is complete.

Binary Search Debugging

Divide the search space in half with each step. Works for both code and history.

Git Bisect (Automated)

Find the exact commit that introduced a bug:

# Start bisect
git bisect start

# Mark current (broken) state as bad
git bisect bad

# Mark a known-good commit
git bisect good v2.2.0

# Automate with a test script (exit 0 = good, exit 1 = bad)
git bisect run ./test-bug.sh

Example test script:

#!/bin/bash
# test-bug.sh - exits 0 if bug is absent, 1 if present

# Build the project (skip if not needed)
npm install --silent 2>/dev/null
npm run build --silent 2>/dev/null

# Run the specific test that catches the bug
npm test -- --grep "order creation" 2>/dev/null
exit $?

# After bisect completes:
# "abc1234 is the first bad commit"

# View the offending commit
git show abc1234

# Clean up
git bisect reset

Git Bisect (Manual)

git bisect start
git bisect bad HEAD
git bisect good v2.2.0

# Git checks out a middle commit
# Test manually, then mark:
git bisect good   # if this commit works
git bisect bad    # if this commit is broken

# Repeat until the first bad commit is found
# ~10 steps for 1000 commits (log2(1000) ≈ 10)

Manual Code Bisection

When the bug is in a single file or function:

1. Comment out the bottom half of the suspect function
2. Test → still broken? Bug is in the top half
3. Uncomment bottom half, comment out top half of remaining suspect code
4. Repeat until you isolate the exact line(s)

This is particularly effective for:

Long functions with no clear fault location
Template/config files where errors are positional
CSS debugging (comment out rule blocks)

Wolf Fence Algorithm

Named after the strategy of placing a fence across a territory to determine which side the wolf is on.

Concept

Place a "probe" (assertion, log, breakpoint) at the midpoint of execution. Check if the state is correct at that point. If correct, the bug is downstream. If incorrect, it is upstream. Repeat.

Implementation

def process_order(order):
    validated = validate(order)
    # PROBE 1: is validated correct here?
    assert validated.total > 0, f"Probe 1 failed: total={validated.total}"

    enriched = enrich_with_inventory(validated)
    # PROBE 2: is enriched correct here?
    assert enriched.items_available, f"Probe 2 failed: items={enriched.items}"

    charged = charge_payment(enriched)
    # PROBE 3: is charged correct here?
    assert charged.payment_id, f"Probe 3 failed: payment={charged.payment_id}"

    return finalize(charged)

Strategic Probe Placement

Place probes at:
├─ Function entry/exit boundaries
├─ Before/after external calls (DB, API, filesystem)
├─ Before/after data transformations
├─ At conditional branches (which path was taken?)
└─ At loop boundaries (iteration count, accumulator value)

Rubber Duck Debugging

Explaining the problem forces you to examine your assumptions.

Structured Rubber Duck Template

1. WHAT I EXPECT TO HAPPEN:
   [describe the correct behavior in detail]

2. WHAT ACTUALLY HAPPENS:
   [describe the observed behavior precisely]

3. THE GAP:
   [what is different between expected and actual?]

4. MY CODE DOES THIS:
   [walk through the relevant code line by line]
   - Line 1: "First, we fetch the user by ID..."
   - Line 2: "Then we check if the user has permission..."
   - Line 3: "Wait... we check user.role but role could be
              undefined if the user was created before we
              added roles... THAT'S THE BUG"

5. ASSUMPTIONS I AM MAKING:
   [list every assumption, then question each one]
   - "The user always has a role" ← IS THIS TRUE?
   - "The database returns results in insertion order" ← IS THIS TRUE?
   - "The config file is loaded before this function runs" ← IS THIS TRUE?

Why It Works

Forces sequential reasoning instead of pattern-matching
Exposes implicit assumptions
Catches "it obviously works" blind spots
The bug is usually found in step 4 or 5

Differential Debugging

Compare what works against what does not.

Method

Working State          vs.          Broken State
─────────────                      ────────────
Environment A                      Environment B
Input set X                        Input set Y
Version N-1                        Version N
Config A                           Config B

Practical Comparison Commands

# Compare environment variables
diff <(env | sort) <(ssh prod 'env | sort')

# Compare installed packages
diff <(pip list --format=freeze | sort) <(ssh prod 'pip list --format=freeze | sort')

# Compare config files
diff local.env production.env
difft config-working.yaml config-broken.yaml  # semantic diff

# Compare database schemas
diff <(pg_dump --schema-only dbA) <(pg_dump --schema-only dbB)

# Compare API responses
diff <(curl -s localhost:3000/api/users | jq .) <(curl -s prod:3000/api/users | jq .)

# Compare directory structures
diff <(fd -t f . ./working/ | sort) <(fd -t f . ./broken/ | sort)

Environment Diff Checklist

[ ] OS and version
[ ] Runtime version (node --version, python --version, go version)
[ ] Dependency versions (package-lock.json, requirements.txt, go.sum)
[ ] Environment variables (especially secrets, API keys, feature flags)
[ ] Config files (compare byte-for-byte)
[ ] Database schema and seed data
[ ] Network configuration (firewall, DNS, proxy)
[ ] File system (permissions, case sensitivity, available disk)
[ ] System resources (memory, CPU, file descriptors)
[ ] Time and timezone

Delta Debugging

Systematically reduce the input or code to find the minimal failing case.

Concept (ddmin Algorithm)

Given: A failing input of N elements
Goal:  Find the smallest subset that still triggers the failure

1. Split input into 2 halves
2. Test each half:
   - If one half fails alone → recurse on that half
   - If neither half fails alone → the bug requires elements from both
     → try removing each quarter, then each eighth, etc.
3. Stop when no single element can be removed without fixing the bug

Practical Application

# Reduce a failing test input file
# Start: 1000-line input.json that causes a crash

# Test: does the first half crash?
head -500 input.json > test.json && ./program test.json

# If yes: recurse on first 500 lines
# If no: test second half
tail -500 input.json > test.json && ./program test.json

# Continue halving until minimal input found

Code Reduction

1. Start with the full failing program
2. Remove half the code (e.g., half the imports, half the functions)
3. Does it still fail?
   - Yes → keep removing from what remains
   - No → restore that half, remove the other half
4. Result: minimal code that reproduces the failure

Tools

C-Reduce (creduce): Automated C/C++ test case reduction
Perses: Language-agnostic program reducer
picireny: Python-based delta debugging framework
Manual reduction is often fastest for small programs

Trace-Based Debugging

Follow the execution path through the system.

Strategic Trace Points

import functools
import time
import json

def trace(func):
    """Decorator that traces function entry, exit, and timing."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_id = id(args) % 10000
        arg_summary = json.dumps(args[:3], default=str)[:100]
        print(f"[TRACE {call_id}] ENTER {func.__name__}({arg_summary})")
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            elapsed = (time.perf_counter() - start) * 1000
            result_summary = str(result)[:100]
            print(f"[TRACE {call_id}] EXIT  {func.__name__} -> {result_summary} ({elapsed:.1f}ms)")
            return result
        except Exception as e:
            elapsed = (time.perf_counter() - start) * 1000
            print(f"[TRACE {call_id}] ERROR {func.__name__}: {e} ({elapsed:.1f}ms)")
            raise
    return wrapper

System Call Tracing

# Linux: trace system calls
strace -f -e trace=network,file -p PID

# Trace a command from start
strace -f -o trace.log ./my-program

# Count system calls (find the hot path)
strace -c ./my-program

# macOS: dtrace/dtruss
sudo dtruss -p PID

# Trace file access only
strace -e trace=open,openat,read,write -p PID

Structured Trace Output

Timestamp   | Component     | Event    | Detail
------------|---------------|----------|---------------------------
10:23:01.001| auth-service  | ENTRY    | validateToken(tok_abc...)
10:23:01.003| auth-service  | CALL     | redis.get("session:abc")
10:23:01.015| auth-service  | RETURN   | redis -> {user: "u123"}
10:23:01.016| auth-service  | EXIT     | validateToken -> valid
10:23:01.017| order-service | ENTRY    | createOrder(user=u123)
10:23:01.018| order-service | CALL     | db.query("SELECT ...")
10:23:01.250| order-service | RETURN   | db -> timeout after 232ms  ← PROBLEM

Time-Travel Debugging

Record execution and replay it, stepping forwards AND backwards.

rr (Linux only)

# Record the execution
rr record ./my-program arg1 arg2

# Replay (starts at the end, you can go backwards)
rr replay

# Inside the rr session (gdb-like interface):
(rr) continue           # run forward
(rr) reverse-continue   # run backward to previous breakpoint
(rr) reverse-next       # step backward one line
(rr) reverse-step       # step backward into function calls
(rr) watch -l var       # break when var changes (works in reverse too)

# Set a breakpoint and reverse-continue to find what set a value
(rr) break my_function
(rr) continue           # hit the breakpoint going forward
(rr) watch -l result    # watch the variable
(rr) reverse-continue   # go back to where result was last set

When to Use Time-Travel Debugging

Bug manifests late but root cause is early in execution
You need to find "what set this variable to the wrong value?"
Intermittent bugs that are hard to reproduce (record once, replay many times)
Complex multi-step state corruption

Alternatives by Language

Language	Tool	Notes
C/C++	rr	Best-in-class, Linux only
C/C++	UDB (UndoDB)	Commercial, cross-platform
JavaScript	Chrome DevTools	"Step backward" in Sources panel (limited)
Python	`epdb` / `pdb++`	Post-mortem with history, not true time-travel
Java	IntelliJ IDEA	Limited reverse debugging
.NET	Visual Studio	IntelliTrace (Enterprise edition)

Hypothesis-Driven Debugging

Maintain a structured log of hypotheses to avoid going in circles.

Tracking Template

## Bug: [Brief description]

### Evidence Collected
- [ ] Error message: "..."
- [ ] Stack trace captured
- [ ] Logs reviewed for timeframe: X to Y
- [ ] Reproduction rate: N/M attempts

### Hypotheses

| # | Hypothesis | Prediction | Test | Result | Status |
|---|-----------|------------|------|--------|--------|
| 1 | Missing index on users.email | EXPLAIN shows seq scan | Run EXPLAIN ANALYZE | Shows index scan | REJECTED |
| 2 | Connection pool exhausted | Active connections = max | Check pg_stat_activity | 47/50 connections | INVESTIGATING |
| 3 | | | | | |

### Current Best Hypothesis: #2

### What I Tried That Didn't Work
- Restarting the service (symptom returned after 5 min)
- Increasing query timeout (different error, same root cause)

Rules

Write hypotheses down before testing them
Define the prediction before running the test
Record results even for rejected hypotheses (prevents retesting)
If you have tested 5+ hypotheses without progress, step back and re-examine assumptions

Debugging Checklists

Pre-Debug Checklist

[ ] Read the full error message and stack trace
[ ] Check if this is a known issue (search issue tracker, logs)
[ ] Identify when it started (correlate with recent changes)
[ ] Verify you are on the correct branch/version
[ ] Check recent commits: git log --oneline -10
[ ] Check recent deployments or config changes
[ ] Reproduce the bug locally
[ ] Set a time limit (30 min before seeking help)

During-Debug Log

Time    | Action Taken              | Result           | Next Step
--------|---------------------------|------------------|----------
10:00   | Reproduced locally        | Fails 3/3 times  | Check logs
10:05   | Checked error logs        | Found stack trace | Form hypothesis
10:10   | Hypothesis: null user obj | Test: add assert  | Test
10:15   | Assert passed (user OK)   | Rejected H1      | New hypothesis
10:20   | Hypothesis: stale cache   | Test: clear cache | Test
10:22   | Cache cleared, bug gone   | Confirmed H2     | Find root cause
10:30   | Cache TTL was 0 (never expires) | Root cause found | Fix

Post-Debug Retrospective

[ ] Root cause documented
[ ] Fix applied and verified
[ ] Regression test added
[ ] Could this bug class exist elsewhere? (search for similar patterns)
[ ] Was the debugging process efficient? What would I do differently?
[ ] Knowledge shared with team (if applicable)
[ ] Monitoring/alerting added to catch recurrence

When to Stop Debugging

Diminishing Returns Signals

You have been debugging for 2+ hours without new information
You are retesting hypotheses you already rejected
You are making changes "just to see what happens"
You are frustrated and making mistakes

Decision Framework

Can you reproduce it?
├─ No → Workaround + monitoring + move on
└─ Yes
   ├─ Is the impact high? (data loss, security, outage)
   │  └─ Yes → Keep debugging, escalate if needed
   └─ Is the impact low? (cosmetic, edge case, rare)
      └─ Yes → Workaround + backlog ticket + move on

Escalation Criteria

You have spent 2x your initial time estimate
You need access to systems/data you do not have
The bug crosses team boundaries (your service + another team's service)
You suspect a bug in a third-party library or runtime

Debugging in Teams

Pair Debugging

Two people, one screen. One drives (types), one navigates (thinks strategically).

Driver focuses on tactical execution
Navigator watches for wrong turns, suggests hypotheses
Switch roles every 20-30 minutes
Navigator should resist the urge to grab the keyboard

Fresh Eyes Protocol

When stuck, explain the problem to a colleague who has NOT been debugging it:

Describe the expected behavior
Describe the actual behavior
List hypotheses tested and their results
Ask: "What am I missing?"

The fresh person often spots an assumption the original debugger has gone blind to.

Knowledge Transfer After Debugging

Share with the team:
├─ What the bug was (root cause, not just symptom)
├─ How it was found (which technique worked)
├─ Why existing tests/monitoring did not catch it
├─ What was added to prevent recurrence
└─ Any broader lessons (design patterns, common pitfalls)

systematic-methods.md 17 KB History Raw