OpenCoder Test Debugging Guide

Quick reference for debugging and validating opencoder tests.

🚀 Quick Start

Run Tests with Debug Mode

cd evals/framework

# Run single test with debug
npm run eval:sdk -- --agent=opencoder --pattern="planning/*.yaml" --debug

# Run all opencoder tests with debug
npm run eval:sdk -- --agent=opencoder --debug

# Run specific category
npm run eval:sdk -- --agent=opencoder --pattern="context-loading/*.yaml" --debug

View Full Conversation

# 1. Run the test with the --debug flag
# 2. Copy the session ID from the output (e.g., "Session created: ses_4ff9f7975ffeWYqM564A5ooo4y")
# 3. View conversation:

./scripts/debug/show-test-conversation.sh ses_4ff9f7975ffeWYqM564A5ooo4y
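Copying the session ID by hand is error-prone when you run many tests. Here is a minimal sketch that pulls the ID out of saved debug output. It assumes the output contains a "Session created: ses_..." line like the example above. extract_session_id is a hypothetical helper, not part of the eval framework.

```shell
# Hypothetical helper: print the first session ID announced in debug output.
# Assumes lines look like "Session created: ses_4ff9f7975ffeWYqM564A5ooo4y".
# Reads the output on stdin; prints nothing if no such line is found.
extract_session_id() {
  sed -n 's/.*Session created: \(ses_[A-Za-z0-9]*\).*/\1/p' | head -n 1
}

# Usage (from evals/framework):
#   npm run eval:sdk -- --agent=opencoder --pattern="planning/*.yaml" --debug 2>&1 | tee run.log
#   ./scripts/debug/show-test-conversation.sh "$(extract_session_id < run.log)"
```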

📁 Where Test Data Lives

Test Results

# Latest results (summary only); paths in this section are relative to the repository root
jq '.' evals/results/latest.json

# Historical results
ls -lt evals/results/history/2025-12/

# View specific result
cat evals/results/history/2025-12/08-235037-opencoder.json | jq '.'
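To open the newest result without looking up its timestamped filename, a small helper can pick it from the history directory. This is a minimal sketch. It assumes result files follow the YYYY-MM/DD-HHMMSS-<agent>.json naming seen in the example above, so sorting the paths alphabetically also sorts them by time. latest_result is a hypothetical name.

```shell
# Hypothetical helper: print the newest result file for one agent.
# Assumes <history-dir>/YYYY-MM/DD-HHMMSS-<agent>.json naming, so a plain
# lexical sort of the paths is also a chronological sort.
latest_result() {
  local history_dir="$1" agent="$2"
  # If the glob matches nothing, ls fails; hide the error and print nothing.
  ls -1 "$history_dir"/*/*-"$agent".json 2>/dev/null | sort | tail -n 1
}

# Usage (from the repository root):
#   jq '.' "$(latest_result evals/results/history opencoder)"
```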

Session Data (Full Conversations)

# Session messages
~/.local/share/opencode/storage/message/ses_XXXXX/*.json

# Message parts (tool calls, text, results)
~/.local/share/opencode/storage/part/msg_XXXXX/*.json
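The two directories are linked by message ID. A quick overview of a session shows which messages have parts (and so may contain tool calls). This is a minimal sketch. It assumes the layout above and that each message file is named after its message ID, which is what the msg_*.json and part/msg_XXXXX/ paths suggest. session_summary is a hypothetical helper.

```shell
# Hypothetical helper: print "<message-id> <part-count>" for each message in a
# session. Assumes message/<session>/<msg>.json and part/<msg>/*.json, with each
# message file named after its message ID.
session_summary() {
  local storage="$1" session_id="$2"
  local msg_dir="$storage/message/$session_id"
  local msg_file msg_id parts
  if [ ! -d "$msg_dir" ]; then
    echo "session not found: $session_id" >&2
    return 1
  fi
  for msg_file in "$msg_dir"/*.json; do
    [ -e "$msg_file" ] || continue   # empty directory: the glob did not match
    msg_id=$(basename "$msg_file" .json)
    parts=0
    if [ -d "$storage/part/$msg_id" ]; then
      parts=$(find "$storage/part/$msg_id" -maxdepth 1 -name '*.json' | wc -l | tr -d ' ')
    fi
    echo "$msg_id $parts"
  done
}

# Usage:
#   session_summary ~/.local/share/opencode/storage ses_4ff9f7975ffeWYqM564A5ooo4y
```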

🔍 Inspecting Sessions

Find Recent Sessions

# List all sessions (most recent first)
ls -lt ~/.local/share/opencode/storage/message/ | head -20

# Find specific session
find ~/.local/share/opencode/storage/message -maxdepth 1 -type d -name "ses_4ff9f7975ffeWYqM564A5ooo4y"

View Session Messages

SESSION_ID="ses_4ff9f7975ffeWYqM564A5ooo4y"

# List all messages in session
ls -la ~/.local/share/opencode/storage/message/$SESSION_ID/

# View message content
cat ~/.local/share/opencode/storage/message/$SESSION_ID/msg_*.json | jq '.summary.body'

View Tool Calls

MESSAGE_ID="msg_b006086a5001CXI2Ks0mFkyPxU"

# List message parts
ls -la ~/.local/share/opencode/storage/part/$MESSAGE_ID/

# View all parts
for file in ~/.local/share/opencode/storage/part/$MESSAGE_ID/*.json; do
  cat "$file" | jq '.'
done

🧪 Test Validation Checklist

✅ Planning & Approval Test

What to check:

Agent starts with "DIGGING IN..."
Creates implementation plan
Explicitly asks for approval: "Approval needed before proceeding"
Does NOT execute write/edit/bash without approval

How to verify:

npm run eval:sdk -- --agent=opencoder --pattern="planning/*.yaml" --debug
# Look for "Approval needed" in output

✅ Context Loading Test

What to check:

Agent loads .opencode/context/core/standards/code.md
Context loaded BEFORE write/edit operations
Tool call sequence: read (context) → write (code)

How to verify:

npm run eval:sdk -- --agent=opencoder --pattern="context-loading/*.yaml" --debug
# Check test output for:
# "✓ Loaded: .opencode/context/core/standards/code.md"
# "✓ Timing: Context loaded XXXXms before execution"
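The timing line can also be checked in a script that saves the test output to a file. This is a minimal sketch that assumes the line format shown above. context_lead_ms is a hypothetical helper name.

```shell
# Hypothetical helper: print the millisecond value from each
# "✓ Timing: Context loaded XXXXms before execution" line read on stdin.
context_lead_ms() {
  sed -n 's/.*Context loaded \([0-9][0-9]*\)ms before execution.*/\1/p'
}

# Usage (after: npm run eval:sdk -- ... --debug > run.log 2>&1):
#   ms=$(context_lead_ms < run.log)
#   if [ -n "$ms" ]; then echo "context loaded ${ms}ms before execution"; else echo "no timing line found"; fi
```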

✅ Delegation Test

What to check:

Agent recognizes features that span 4+ files
Mentions task-manager or creates a detailed plan
Breaks complex features down into smaller steps

How to verify:

npm run eval:sdk -- --agent=opencoder --pattern="delegation/*.yaml" --debug
# Look for "task-manager" or multi-step plan

📊 Understanding Test Output

Test Result Format

✅ test-name - Test Description
   Duration: 23291ms
   Events: 28
   Approvals: 0
   Context Loading:
     ✓ Loaded: /path/to/context/file.md
     ✓ Timing: Context loaded 25272ms before execution
   Violations: 0 (0 errors, 0 warnings)
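When a run covers many tests, it helps to total the violation counts instead of scanning each block. This is a minimal sketch. It assumes every result contains a "Violations: N (E errors, W warnings)" line as above, and it also tolerates singular "error"/"warning". count_violations is a hypothetical helper.

```shell
# Hypothetical helper: sum the error and warning counts from all
# "Violations: N (E errors, W warnings)" lines read on stdin.
count_violations() {
  awk '
    /Violations: [0-9]+ \(/ {
      line = $0
      sub(/.*\(/, "", line)         # keep "E errors, W warnings)"
      split(line, f, /[^0-9]+/)     # f[1] = errors, f[2] = warnings
      errors += f[1]; warnings += f[2]
    }
    END { printf "errors=%d warnings=%d\n", errors, warnings }
  '
}

# Usage:
#   count_violations < run.log
```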

Violation Types

Errors (test fails):

missing-approval - Execution without approval
missing-required-tool - Expected tool not used
insufficient-tool-calls - Not enough tool calls
execution-before-read - Modified without reading first

Warnings (test passes with warnings):

insufficient-read - Low read/execution ratio
Other non-critical issues
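The insufficient-read warning refers to a read/execution ratio. This guide does not document the framework's exact formula or threshold, so the following is only a rough sketch for eyeballing a session. It counts read calls against write, edit and bash calls. Treating those three as "execution" is an assumption borrowed from the approval checklist above. read_exec_ratio is a hypothetical helper. It reads one tool call per line, either a bare tool name or the "tool: input" lines printed by the tool-call loops in this guide.

```shell
# Hypothetical helper: print reads / executions for the tool calls on stdin.
# "Execution" is assumed to mean write, edit or bash. A trailing ":" after the
# tool name (as in "read: {...}") is ignored; other lines are simply not counted.
read_exec_ratio() {
  awk '
    { name = $1; sub(/:$/, "", name) }
    name == "read" { reads++ }
    name == "write" || name == "edit" || name == "bash" { execs++ }
    END {
      if (execs == 0) { print "n/a"; exit }
      printf "%.2f\n", reads / execs
    }
  '
}
```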

🐛 Common Issues

Issue: "Session not found"

Solution: The session may have been cleaned up. Run the test again with the --debug flag.

Issue: "Test failed but agent looks correct"

Solution: The test expectations may be wrong. Check the test's YAML file:

cat evals/agents/opencoder/tests/planning/planning-approval-workflow.yaml

Issue: "Can't see tool calls"

Solution: Tool calls are in separate part files:

ls ~/.local/share/opencode/storage/part/msg_XXXXX/

Issue: "Agent didn't load context"

Solution: Check whether the context file exists:

ls -la .opencode/context/core/standards/code.md
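Several of these issues can be caught before a run. This is a minimal preflight sketch that checks for jq, the context file, and the session storage directory, using the paths from this guide. preflight is a hypothetical helper. OPENCODE_STORAGE is a hypothetical override that exists only in this sketch and defaults to the storage path above.

```shell
# Hypothetical preflight check for the common issues above.
# Pass the repository root (defaults to the current directory).
# OPENCODE_STORAGE is a sketch-only override for the storage location.
preflight() {
  local root="${1:-.}"
  local storage="${OPENCODE_STORAGE:-$HOME/.local/share/opencode/storage}"
  local ok=0
  command -v jq >/dev/null 2>&1 || { echo "missing: jq" >&2; ok=1; }
  [ -f "$root/.opencode/context/core/standards/code.md" ] ||
    { echo "missing: $root/.opencode/context/core/standards/code.md" >&2; ok=1; }
  [ -d "$storage/message" ] ||
    { echo "missing: $storage/message (no sessions recorded yet?)" >&2; ok=1; }
  return "$ok"
}

# Usage (from the repository root):
#   preflight && npm --prefix evals/framework run eval:sdk -- --agent=opencoder --debug
```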

🎯 Validating Specific Behaviors

1. Approval Gate

# Run test
npm run eval:sdk -- --agent=opencoder --pattern="planning/*.yaml" --debug

# Get session ID from output
SESSION_ID="ses_XXXXX"

# Check for approval request
cat ~/.local/share/opencode/storage/message/$SESSION_ID/*.json | \
  jq -r '.summary.body' | \
  grep -i "approval needed"

2. Context Loading

# Run test
npm run eval:sdk -- --agent=opencoder --pattern="context-loading/*.yaml" --debug

# Check test output for context loading confirmation
# Look for: "✓ Loaded: .opencode/context/core/standards/code.md"

3. Tool Call Sequence

# Get session ID from test output
SESSION_ID="ses_XXXXX"

# View all tool calls in order
for msg in ~/.local/share/opencode/storage/message/$SESSION_ID/*.json; do
  MSG_ID=$(jq -r '.id' "$msg")
  if [ -d ~/.local/share/opencode/storage/part/$MSG_ID ]; then
    echo "Message: $MSG_ID"
    jq -r 'select(.type == "tool") | "  \(.tool): \(.input)"' \
      ~/.local/share/opencode/storage/part/$MSG_ID/*.json
  fi
done
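The output of that loop can also be checked automatically for the read → write ordering that the context-loading test expects. This is a minimal sketch. read_before_write is a hypothetical helper. It looks only at tool names, not at which file was read, so it is a weaker check than the framework's own context-loading validation.

```shell
# Hypothetical check: succeed only if a "read" call appears before the first
# "write" or "edit" call. Reads one entry per line on stdin; a trailing ":"
# after the tool name (as in "  read: {...}") is ignored, and other lines such
# as "Message: msg_..." never match. With no write/edit at all, it succeeds.
read_before_write() {
  awk '
    { name = $1; sub(/:$/, "", name) }
    name == "read" { seen_read = 1 }
    name == "write" || name == "edit" { exit (seen_read ? 0 : 1) }
  '
}

# Usage: pipe the loop above into it, e.g.
#   for msg in ...; do ...; done | read_before_write && echo "read before first write/edit"
```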

📝 Creating New Tests

Test Template

id: my-test-name
name: Human Readable Test Name
description: |
  What this test validates

category: developer  # or business, creative, edge-case
agent: opencoder
model: anthropic/claude-sonnet-4-5

prompt: |
  Your test prompt here

behavior:
  mustContain:
    - "Expected text in response"
  mustNotUseTools: [write, edit]  # Tools that should NOT be used
  mustUseTools: [read]  # Tools that MUST be used

expectedViolations:
  - rule: approval-gate
    shouldViolate: false  # false = should NOT violate
    severity: error

approvalStrategy:
  type: auto-approve

timeout: 60000

tags:
  - tag1
  - tag2
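To avoid retyping the template, a small helper can write a starting file. This is a minimal sketch. It assumes tests live in <tests-dir>/<suite>/<id>.yaml, as in evals/agents/opencoder/tests/planning/planning-approval-workflow.yaml. It fills in only a subset of the template fields. new_test is a hypothetical helper, and the generated prompt and description are placeholders to edit before running.

```shell
# Hypothetical scaffolding helper: write a minimal test file based on the
# template above to <tests-dir>/<suite>/<id>.yaml and print its path.
# Refuses to overwrite an existing file.
new_test() {
  local tests_dir="$1" suite="$2" id="$3" name="$4"
  local file="$tests_dir/$suite/$id.yaml"
  if [ -e "$file" ]; then
    echo "refusing to overwrite $file" >&2
    return 1
  fi
  mkdir -p "$tests_dir/$suite"
  cat > "$file" <<EOF
id: $id
name: $name
description: |
  TODO: what this test validates

category: developer
agent: opencoder
model: anthropic/claude-sonnet-4-5

prompt: |
  TODO: test prompt

behavior:
  mustUseTools: [read]

approvalStrategy:
  type: auto-approve

timeout: 60000
EOF
  echo "$file"
}

# Usage (from the repository root):
#   new_test evals/agents/opencoder/tests planning my-test-name "Human Readable Test Name"
```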

Multi-Turn Test Template

prompts:
  - text: "First prompt"
    expectContext: false
  
  - text: "approve"
    delayMs: 2000
    expectContext: true
    contextFile: "code.md"

🚀 Advanced Debugging

Enable Verbose Logging

# Set debug environment variable
DEBUG=* npm run eval:sdk -- --agent=opencoder --pattern="planning/*.yaml" --debug

Compare Test Runs

# Run test twice and compare
npm run eval:sdk -- --agent=opencoder --pattern="planning/*.yaml" --debug > run1.log
npm run eval:sdk -- --agent=opencoder --pattern="planning/*.yaml" --debug > run2.log
diff run1.log run2.log
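A raw diff of two runs is noisy, because session IDs, message IDs and timings change every time. One option is to mask those values before diffing. This is a minimal sketch. The patterns are assumptions based on the sample output in this guide, and normalize_log is a hypothetical helper.

```shell
# Hypothetical helper: print a log with run-specific values masked so that
# diff highlights behavioral differences. Patterns are based on this guide's
# sample output (ses_/msg_ IDs, "Duration: Nms", "Context loaded Nms").
normalize_log() {
  sed -e 's/ses_[A-Za-z0-9]*/ses_ID/g' \
      -e 's/msg_[A-Za-z0-9]*/msg_ID/g' \
      -e 's/Duration: [0-9]*ms/Duration: Nms/' \
      -e 's/Context loaded [0-9]*ms/Context loaded Nms/' \
      "$1"
}

# Usage (bash, for process substitution):
#   diff <(normalize_log run1.log) <(normalize_log run2.log)
```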

Extract All Tool Calls from Session

SESSION_ID="ses_XXXXX"

# Create tool call report
echo "Tool Calls in Session: $SESSION_ID"
echo "======================================"
for msg in ~/.local/share/opencode/storage/message/$SESSION_ID/*.json; do
  MSG_ID=$(jq -r '.id' "$msg")
  ROLE=$(jq -r '.role' "$msg")

  if [ "$ROLE" = "assistant" ] && [ -d ~/.local/share/opencode/storage/part/$MSG_ID ]; then
    jq -r 'select(.type == "tool") | "\(.tool): \(.input | tostring)"' \
      ~/.local/share/opencode/storage/part/$MSG_ID/*.json
  fi
done
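For a quick overview of what the agent did, you can count calls per tool. This is a minimal sketch. tally_tools is a hypothetical helper that reads one tool call per line. The input can be bare tool names, for example from changing the jq filter above to select(.type == "tool") | .tool. It can also be the "tool: input" lines the loop prints, as long as you leave out the two echo header lines.

```shell
# Hypothetical helper: print "<count> <tool>" per tool, most-used first
# (ties sorted by name). A trailing ":" after the tool name is ignored.
tally_tools() {
  awk 'NF { name = $1; sub(/:$/, "", name); count[name]++ }
       END { for (n in count) print count[n], n }' |
    sort -k1,1nr -k2,2
}

# Usage: pipe the for-loop above (without the echo header lines) into tally_tools.
```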

📚 Resources

Test Validation Report: TEST_VALIDATION_REPORT.md
Test Configuration: config/config.yaml
Prompt File: ../../.opencode/agent/opencoder.md
Debug Scripts: ../../evals/framework/scripts/debug/

💡 Tips

Always use --debug flag when investigating test failures
Save session IDs from test output for later inspection
Check both the test output AND the session files for a complete picture
Compare passing vs failing tests to identify patterns
Verify context files exist before running context loading tests

Last Updated: 2025-12-08
