TL;DR: Run tests and see EXACTLY what the agent says and does.
cd evals/framework
# Run single test and see full conversation
npm run eval:sdk -- --agent=opencoder --pattern="planning/*.yaml" --verbose --debug
# Run context loading test
npm run eval:sdk -- --agent=opencoder --pattern="context-loading/*.yaml" --verbose --debug
# Run all tests (will take longer)
npm run eval:sdk -- --agent=opencoder --verbose --debug
Note: Both --verbose and --debug flags are required:
- --verbose = Show full conversations
- --debug = Keep session data (required for --verbose to work)
cd evals/framework/scripts
# Run single test and see full conversation
./run-test-verbose.sh opencoder "planning/*.yaml"
# Run context loading test
./run-test-verbose.sh opencoder "context-loading/*.yaml"
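Since `--verbose` only works together with `--debug`, a wrapper that always passes both flags avoids the mistake of forgetting one. The function below is a minimal sketch of that idea and is not the actual contents of `run-test-verbose.sh`; the name `build_eval_cmd` is hypothetical, and it only prints the command (a dry run) rather than executing it.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; NOT the real run-test-verbose.sh.

# Print the eval command for an agent and an optional test pattern.
# --verbose and --debug are always appended together, because the
# guide states that --verbose needs --debug to work.
build_eval_cmd() {
  local agent="$1"
  local pattern="${2:-}"
  local cmd="npm run eval:sdk -- --agent=${agent}"
  if [ -n "$pattern" ]; then
    # Keep the pattern quoted so the shell does not expand the glob.
    cmd="${cmd} --pattern=\"${pattern}\""
  fi
  printf '%s --verbose --debug\n' "$cmd"
}

# Dry run: show the command instead of running it.
build_eval_cmd opencoder "planning/*.yaml"
```

Calling it without a second argument prints the "run all tests" form of the command.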
Output shows:
TEST RESULTS
======================================================================
1. ✅ planning-approval-workflow - Planning & Approval Workflow
Duration: 28327ms
Events: 33
Approvals: 0
Violations: 0 (0 errors, 0 warnings)
======================================================================
FULL CONVERSATION
======================================================================
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
๐ค USER
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
## Implementation Plan
Based on the existing code structure, I will:
1. Create a new file `utils/utils.js` with an `add` function
2. Follow pure function pattern
3. Use JSDoc comments
**Approval needed before proceeding. Please review and confirm.**
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
๐ค ASSISTANT
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
./run-test-verbose.sh opencoder "planning/*.yaml"
Look for: "Approval needed before proceeding"
./run-test-verbose.sh opencoder "context-loading/*.yaml"
Look for:
read of code.md BEFORE write
./run-test-verbose.sh opencoder "delegation/*.yaml"
Look for: Multi-step plan or "task-manager" mention
If you have a session ID from a test run:
cd evals/framework/scripts/debug
./show-test-conversation.sh ses_XXXXXXXXXXXXX
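If the output of a run was saved to a file, the session ID can be pulled out with `grep` rather than copied by hand. This is a sketch under two assumptions: that the ID appears somewhere in the run output, and that it has the form `ses_` followed by letters and digits (as the placeholder above suggests). The function name and the sample log lines are made up for illustration.

```shell
# Extract unique session IDs (ses_ followed by letters/digits) from stdin.
# Assumption: the IDs are printed in the run output; the exact log
# format is not documented in this guide.
extract_session_ids() {
  grep -oE 'ses_[A-Za-z0-9]+' | sort -u
}

# Example with made-up log lines (prints each distinct ID once):
printf 'Session created: ses_abc123\nsession ses_abc123 finished\n' | extract_session_ids
```

The resulting ID can then be passed to `./show-test-conversation.sh` as shown above.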
# List all opencoder tests
find ../agents/opencoder/tests -name "*.yaml" -type f
# Run all tests (no conversation output)
cd ../
npm run eval:sdk -- --agent=opencoder
# Run all tests with debug
npm run eval:sdk -- --agent=opencoder --debug
| Category | Pattern | What It Tests |
|---|---|---|
| Planning | `planning/*.yaml` | Plan-first, approval gates |
| Context | `context-loading/*.yaml` | Loads code.md before coding |
| Implementation | `implementation/*.yaml` | Incremental execution, validation |
| Delegation | `delegation/*.yaml` | Task-manager, coder-agent routing |
| Error Handling | `error-handling/*.yaml` | Stop on failure, report-first |
| Completion | `completion/*.yaml` | Handoff recommendations |
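To see how many tests each category in the table contains, the `find` command used earlier can be run once per category. The sketch below assumes each pattern corresponds to a subdirectory of the agent's tests directory (as the path `tests/planning/planning-approval-workflow.yaml` suggests); the function name is hypothetical.

```shell
# Print "<category> <count>" for each category in the table above.
# Assumption: each category is a subdirectory of the tests directory.
count_tests_by_category() {
  local tests_dir="$1" category count
  for category in planning context-loading implementation \
                  delegation error-handling completion; do
    if [ -d "$tests_dir/$category" ]; then
      count=$(find "$tests_dir/$category" -name '*.yaml' -type f | wc -l | tr -d ' ')
    else
      count=0  # Missing directory: report zero instead of failing.
    fi
    echo "$category $count"
  done
}

# Usage:
#   count_tests_by_category ../agents/opencoder/tests
```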
Run these 3 tests to validate core behaviors:
# 1. Approval workflow
./run-test-verbose.sh opencoder "planning/*.yaml"
# ✅ Look for: "Approval needed before proceeding"
# 2. Context loading
./run-test-verbose.sh opencoder "context-loading/*.yaml"
# ✅ Look for: "Loaded: .opencode/context/core/standards/code.md"
# 3. Delegation
./run-test-verbose.sh opencoder "delegation/delegation-task-manager.yaml"
# ✅ Look for: Multi-step plan or "task-manager" mention
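The "look for" checks above can also be done mechanically by piping a run's output through `grep`. The helper below is a sketch, not part of the framework: `check_marker` is a hypothetical name, and it assumes the marker text appears verbatim in the verbose output.

```shell
# Report whether the text on stdin contains an expected marker string.
# -F treats the marker as a fixed string, so characters such as "." or
# "*" in it are matched literally.
check_marker() {
  local label="$1" marker="$2"
  if grep -qF -- "$marker"; then
    echo "PASS: $label"
  else
    echo "FAIL: $label (missing: $marker)"
    return 1
  fi
}

# Demo with a sample line taken from the conversation output shown earlier:
printf '**Approval needed before proceeding. Please review and confirm.**\n' \
  | check_marker "approval workflow" "Approval needed before proceeding"
```

With a real run, the same check would look like `./run-test-verbose.sh opencoder "planning/*.yaml" 2>&1 | check_marker "approval workflow" "Approval needed before proceeding"`.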
npm run eval:sdk -- --agent=opencoder --pattern="planning/*.yaml" --debug
cat ../agents/opencoder/tests/planning/planning-approval-workflow.yaml
YES, you can see exactly what's being asked and what's being responded with!
The run-test-verbose.sh script:
No more guessing - you can see:
Last Updated: 2025-12-08