QUICK_TEST_GUIDE.md 6.0 KB

Quick Test Guide - OpenCoder

TL;DR: Run tests and see EXACTLY what the agent says and does.


๐Ÿš€ Run Test with Full Conversation

Method 1: Using --verbose flag (RECOMMENDED)

cd evals/framework

# Run single test and see full conversation
npm run eval:sdk -- --agent=opencoder --pattern="planning/*.yaml" --verbose --debug

# Run context loading test
npm run eval:sdk -- --agent=opencoder --pattern="context-loading/*.yaml" --verbose --debug

# Run all tests (will take longer)
npm run eval:sdk -- --agent=opencoder --verbose --debug

Note: Both --verbose and --debug flags are required:

  • --verbose = Show full conversations
  • --debug = Keep session data (required for --verbose to work)

Method 2: Using helper script

cd evals/framework/scripts

# Run single test and see full conversation
./run-test-verbose.sh opencoder "planning/*.yaml"

# Run context loading test
./run-test-verbose.sh opencoder "context-loading/*.yaml"

Output shows:

  1. โœ… Test result (PASS/FAIL)
  2. ๐Ÿ“Š Test metrics (duration, events, violations)
  3. ๐Ÿ’ฌ FULL CONVERSATION - Every message between user and agent

๐Ÿ“‹ Example Output

TEST RESULTS
======================================================================

1. โœ… planning-approval-workflow - Planning & Approval Workflow
   Duration: 28327ms
   Events: 33
   Approvals: 0
   Violations: 0 (0 errors, 0 warnings)

======================================================================
FULL CONVERSATION
======================================================================

โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”
๐Ÿ‘ค USER
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”
## Implementation Plan

Based on the existing code structure, I will:

1. Create a new file `utils/utils.js` with an `add` function
2. Follow pure function pattern
3. Use JSDoc comments

**Approval needed before proceeding. Please review and confirm.**

โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”
๐Ÿค– ASSISTANT
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”

โœ… What You Can Verify

1. Planning & Approval Workflow

./run-test-verbose.sh opencoder "planning/*.yaml"

Look for:

  • โœ… "DIGGING IN..." at start
  • โœ… "## Implementation Plan"
  • โœ… "Approval needed before proceeding"
  • โœ… NO write/edit/bash tools used before approval

2. Context Loading

./run-test-verbose.sh opencoder "context-loading/*.yaml"

Look for:

  • โœ… Test output shows: "โœ“ Loaded: .opencode/context/core/standards/code.md"
  • โœ… Test output shows: "โœ“ Timing: Context loaded XXXXms before execution"
  • โœ… Tool calls show read of code.md BEFORE write

3. Delegation Recognition

./run-test-verbose.sh opencoder "delegation/*.yaml"

Look for:

  • โœ… Agent recognizes multi-file features
  • โœ… Mentions "task-manager" or creates detailed breakdown
  • โœ… Identifies complexity correctly

๐Ÿ” Inspect Specific Session

If you have a session ID from a test run:

cd evals/framework/scripts/debug
./show-test-conversation.sh ses_XXXXXXXXXXXXX

๐Ÿ“Š All Available Tests

# List all opencoder tests
find ../agents/opencoder/tests -name "*.yaml" -type f

# Run all tests (no conversation output)
cd ../
npm run eval:sdk -- --agent=opencoder

# Run all tests with debug
npm run eval:sdk -- --agent=opencoder --debug

๐ŸŽฏ Test Categories

Category Pattern What It Tests
Planning planning/*.yaml Plan-first, approval gates
Context context-loading/*.yaml Loads code.md before coding
Implementation implementation/*.yaml Incremental execution, validation
Delegation delegation/*.yaml Task-manager, coder-agent routing
Error Handling error-handling/*.yaml Stop on failure, report-first
Completion completion/*.yaml Handoff recommendations

๐Ÿ’ก Quick Validation Checklist

Run these 3 tests to validate core behaviors:

# 1. Approval workflow
./run-test-verbose.sh opencoder "planning/*.yaml"
# โœ… Look for: "Approval needed before proceeding"

# 2. Context loading
./run-test-verbose.sh opencoder "context-loading/*.yaml"
# โœ… Look for: "โœ“ Loaded: .opencode/context/core/standards/code.md"

# 3. Delegation
./run-test-verbose.sh opencoder "delegation/delegation-task-manager.yaml"
# โœ… Look for: Multi-step plan or "task-manager" mention

๐Ÿ› Troubleshooting

"Session not found"

  • Session was cleaned up. Run test again.

"No conversation shown"

  • Check if test actually ran (look for "Session created" in output)
  • Try running test directly: npm run eval:sdk -- --agent=opencoder --pattern="planning/*.yaml" --debug

"Test failed but agent looks correct"

  • Test expectations may be wrong
  • Check test YAML file: cat ../agents/opencoder/tests/planning/planning-approval-workflow.yaml

๐Ÿ“ Summary

YES, you can see exactly what's being asked and what's being responded with!

The run-test-verbose.sh script:

  1. Runs the test
  2. Captures the session ID
  3. Shows test results
  4. Shows the FULL conversation between user and agent

No more guessing - you can see:

  • โœ… Exact prompts sent to agent
  • โœ… Exact responses from agent
  • โœ… Tool calls made
  • โœ… Approval requests
  • โœ… Context loading
  • โœ… Everything!

Last Updated: 2025-12-08