ContextScout is a subagent with `mode: subagent`, which means it is invoked through the `task` tool.

Current Problem: Our tests are mixing both approaches and not testing either properly.
What we're testing: ContextScout's logic in isolation
How it works:
```bash
# Force ContextScout to run as primary agent
npm run eval:sdk -- --subagent=contextscout --pattern="smoke-test.yaml"
```

What happens:
- `mode: primary` (overrides `mode: subagent`)

Use when:
Limitations:
What we're testing: OpenAgent's ability to delegate to ContextScout
How it works:
```bash
# Test delegation from OpenAgent to ContextScout
npm run eval:sdk -- --agent=core/openagent --pattern="delegation-test.yaml"
```

What happens:
- `task(subagent_type="ContextScout", ...)`

Use when:
Limitations:
Current:
```bash
npm run eval:sdk -- --agent=ContextScout
```

Problem: This runs through OpenAgent but doesn't properly test delegation.

What actually happens:
- ContextScout as the agent

Result: Inconsistent. Sometimes ContextScout runs, sometimes it doesn't.
Current tests mix concerns:
Result: Tests fail for unclear reasons
Goal: Verify ContextScout's core logic works
Tests:
Command:
```bash
cd evals/framework
npm run eval:sdk -- --subagent=contextscout --pattern="standalone/*.yaml"
```
Test Structure:
```yaml
# evals/agents/ContextScout/tests/standalone/01-discovery.yaml
id: contextscout-standalone-discovery
name: "ContextScout Standalone: Discovery Test"
description: |
  Tests ContextScout's ability to discover context files when running
  as a standalone agent (mode: primary override).
  This tests ContextScout's logic in isolation, not delegation.

prompts:
  - text: |
      Find all context files in .opencode/context/core/
      Return:
      - Exact file paths
      - File count
      - Directory structure

approvalStrategy:
  type: auto-approve

behavior:
  mustUseTools:
    - glob  # ContextScout MUST use glob to discover files
    - list  # ContextScout MUST use list to explore structure
  minToolCalls: 2
  maxToolCalls: 20

# Verify ContextScout's output
assertions:
  - type: output_contains
    value: ".opencode/context/core/"
  - type: tool_called
    tool: "glob"
  - type: tool_called
    tool: "list"

timeout: 60000

tags:
  - contextscout
  - standalone
  - discovery
  - unit-test
```
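To make the intent of the `behavior` block concrete, here is a minimal Python sketch of how such constraints could be checked against the list of tools an agent actually called. The function name `check_behavior` and the exact semantics (every `mustUseTools` entry called at least once, total calls within `minToolCalls` and `maxToolCalls`) are assumptions for illustration; the eval framework's real evaluator may behave differently.

```python
from collections import Counter


def check_behavior(behavior, tool_calls):
    """Return a list of violations; an empty list means the run passes.

    `behavior` mirrors the YAML `behavior` block; `tool_calls` is the ordered
    list of tool names the agent used. Hypothetical helper, not framework code.
    """
    violations = []
    used = Counter(tool_calls)

    # Every required tool must appear at least once.
    for tool in behavior.get("mustUseTools", []):
        if used[tool] == 0:
            violations.append(f"required tool never called: {tool}")

    # The total number of calls must stay within the configured bounds.
    total = len(tool_calls)
    min_calls = behavior.get("minToolCalls")
    max_calls = behavior.get("maxToolCalls")
    if min_calls is not None and total < min_calls:
        violations.append(f"too few tool calls: {total} < {min_calls}")
    if max_calls is not None and total > max_calls:
        violations.append(f"too many tool calls: {total} > {max_calls}")
    return violations


behavior = {"mustUseTools": ["glob", "list"], "minToolCalls": 2, "maxToolCalls": 20}
print(check_behavior(behavior, ["glob", "list", "read"]))
print(check_behavior(behavior, ["read"]))
```

The second call reports three violations (both required tools missing, and too few calls), which is the kind of explicit failure reason a standalone test should surface.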
Goal: Verify OpenAgent delegates to ContextScout correctly
Tests:
Command:
```bash
cd evals/framework
npm run eval:sdk -- --agent=core/openagent --pattern="delegation/*.yaml"
```
Test Structure:
```yaml
# evals/agents/core/openagent/tests/delegation/contextscout-delegation.yaml
id: openagent-contextscout-delegation
name: "OpenAgent: ContextScout Delegation Test"
description: |
  Tests OpenAgent's ability to delegate to ContextScout.
  This tests the delegation workflow, not ContextScout's logic.

prompts:
  - text: |
      I need to find context files for code standards. Can you help?

approvalStrategy:
  type: auto-approve

behavior:
  mustUseTools:
    - task  # OpenAgent MUST use task tool to delegate
  minToolCalls: 1
  maxToolCalls: 10

# Verify OpenAgent delegates to ContextScout
assertions:
  - type: tool_called
    tool: "task"
    with_args:
      subagent_type: "ContextScout"
  - type: output_contains
    value: ".opencode/context/core/standards/code.md"

timeout: 120000

tags:
  - openagent
  - delegation
  - contextscout
  - integration-test
```
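The two assertion types used here can be read as simple predicates over the test timeline. The Python sketch below shows one plausible interpretation. It assumes timeline entries are dictionaries with `type`, `tool` and `arguments` keys (the fields used by the `jq` filters in this document) plus a `content` key for assistant messages, which is a hypothetical name chosen for illustration.

```python
def tool_called(timeline, tool, with_args=None):
    """True if some entry calls `tool` and carries every expected argument."""
    expected = with_args or {}
    for event in timeline:
        if event.get("tool") != tool:
            continue
        args = event.get("arguments") or {}
        if all(args.get(key) == value for key, value in expected.items()):
            return True
    return False


def output_contains(timeline, value):
    """True if any assistant message contains `value` (assumed `content` key)."""
    return any(
        value in (event.get("content") or "")
        for event in timeline
        if event.get("type") == "assistant_message"
    )


# Hand-written sample timeline, not real framework output.
timeline = [
    {
        "type": "tool_call",
        "tool": "task",
        "arguments": {"subagent_type": "ContextScout", "prompt": "find code standards"},
    },
    {
        "type": "assistant_message",
        "content": "See .opencode/context/core/standards/code.md",
    },
]

print(tool_called(timeline, "task", {"subagent_type": "ContextScout"}))
print(output_contains(timeline, ".opencode/context/core/standards/code.md"))
```

Matching on `with_args` as a subset (rather than requiring exact equality of all arguments) keeps the assertion focused on the delegation target and tolerant of prompt wording.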
When a test fails:

1. **Check tool usage**:
   ```bash
   npm run eval:sdk -- --subagent=contextscout --pattern="test.yaml" --debug

   # Look for tool calls
   cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.type == "tool_call")'
   ```

2. **Verify ContextScout is running**:
   ```bash
   # Check agent name in logs
   # Should see: "Agent: ContextScout" (not "Agent: OpenAgent")
   ```

3. **Check output format**:
   ```bash
   # View agent response
   cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.type == "assistant_message")'
   ```
Common issues:
- `mustUseTools` in test

When a test fails:

1. **Check if delegation happened**:
   ```bash
   # Look for task tool call
   cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.tool == "task")'
   ```

2. **Verify delegation parameters**:
   ```bash
   # Check subagent_type
   cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.tool == "task") | .arguments'
   ```

3. **Check ContextScout's response**:
   ```bash
   # Look for ContextScout's output in timeline
   # Should see nested agent execution
   ```
Common issues:
```
evals/agents/ContextScout/
├── config/
│   └── config.yaml                     # Test configuration
├── tests/
│   ├── standalone/                     # Phase 1: Test ContextScout directly
│   │   ├── 01-discovery.yaml           # Find files
│   │   ├── 02-search.yaml              # Search for specific files
│   │   ├── 03-extraction.yaml          # Extract key findings
│   │   ├── 04-output-format.yaml       # Verify output format
│   │   ├── 05-error-handling.yaml      # Handle errors
│   │   ├── 06-false-positive.yaml      # Prevent hallucinations
│   │   ├── 07-invalid-path.yaml        # Handle invalid paths
│   │   ├── 08-ambiguous-query.yaml     # Handle vague queries
│   │   ├── 09-mvi-detection.yaml       # Detect MVI compliance
│   │   └── 10-empty-directory.yaml     # Handle empty dirs
│   └── delegation/                     # Phase 2: Test OpenAgent delegation
│       ├── 01-trigger.yaml             # Does OpenAgent delegate?
│       ├── 02-parameters.yaml          # Correct parameters?
│       └── 03-results.yaml             # Results presented correctly?
└── README.md
```
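As a starting point for the layout above, the directories could be scaffolded with a few lines of Python. This is only a sketch: the `scaffold` helper is hypothetical, it creates empty directories and does not move or generate any test files, and the demo runs in a temporary directory rather than in the real repository.

```python
import tempfile
from pathlib import Path

# Sub-directories taken from the tree above.
SUBDIRS = ("config", "tests/standalone", "tests/delegation")


def scaffold(root):
    """Create the directory skeleton under `root` and return the created paths."""
    created = []
    for sub in SUBDIRS:
        directory = Path(root) / sub
        # parents=True builds intermediate dirs; exist_ok=True makes reruns safe.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold(Path(tmp) / "evals/agents/ContextScout"):
            print(path.relative_to(tmp))
```

In the repository itself, the call would be `scaffold("evals/agents/ContextScout")`, run from the project root.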
- `tests/standalone/` directory
- `standalone/`
- `--subagent=contextscout`
- `mustUseTools` to enforce tool usage
- `tests/delegation/` directory

```bash
# Standalone mode - tests ContextScout's logic
npm run eval:sdk -- --subagent=contextscout --pattern="standalone/*.yaml"

# Delegation mode - tests OpenAgent → ContextScout
npm run eval:sdk -- --agent=core/openagent --pattern="delegation/*.yaml"
```
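Because the two modes differ only in the target flag, a small wrapper can make the choice explicit and prevent the modes from being mixed by accident. The Python sketch below only builds the argument list using the flags shown in this document. The `eval_command` helper and the mode names `"standalone"` and `"delegation"` are illustrative assumptions.

```python
import shlex


def eval_command(mode, pattern, debug=False):
    """Build the argv list for one eval run in the given mode."""
    if mode == "standalone":
        target = "--subagent=contextscout"   # tests ContextScout's logic directly
    elif mode == "delegation":
        target = "--agent=core/openagent"    # tests OpenAgent's delegation
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    argv = ["npm", "run", "eval:sdk", "--", target, f"--pattern={pattern}"]
    if debug:
        argv.append("--debug")
    return argv


# Print shell-quoted versions of both commands for inspection.
print(shlex.join(eval_command("standalone", "standalone/*.yaml")))
print(shlex.join(eval_command("delegation", "delegation/01-trigger.yaml", debug=True)))
```

The returned list can be passed to `subprocess.run(argv, cwd="evals/framework")`. Passing a list rather than a shell string avoids accidental glob expansion of the pattern.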
```bash
# Run with debug output
npm run eval:sdk -- --subagent=contextscout --pattern="standalone/01-discovery.yaml" --debug

# Check tool calls
cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.type == "tool_call") | {tool, args}'

# Run with debug output
npm run eval:sdk -- --agent=core/openagent --pattern="delegation/01-trigger.yaml" --debug

# Check if delegation happened
cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.tool == "task")'
```
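The same checks can be scripted when the `jq` one-liners become repetitive. The Python sketch below assumes the results file has the shape implied by the filters above (`tests[0].timeline[]` entries with `type`, `tool` and `arguments` fields). The inline sample is hand-written for illustration, not real framework output.

```python
import json
from collections import Counter


def load_timeline(results, test_index=0):
    """Return the timeline of one test from a parsed results document."""
    return results["tests"][test_index]["timeline"]


def tool_call_counts(timeline):
    """Count tool calls per tool name."""
    return Counter(e["tool"] for e in timeline if e.get("type") == "tool_call")


def delegation_targets(timeline):
    """List the subagent_type of every `task` call (empty list: no delegation)."""
    return [
        (e.get("arguments") or {}).get("subagent_type")
        for e in timeline
        if e.get("tool") == "task"
    ]


# In real use: results = json.load(open("evals/results/latest.json"))
sample = json.loads("""
{"tests": [{"timeline": [
  {"type": "tool_call", "tool": "glob", "arguments": {"pattern": "**/*.md"}},
  {"type": "tool_call", "tool": "task", "arguments": {"subagent_type": "ContextScout"}},
  {"type": "assistant_message", "content": "done"}
]}]}
""")

timeline = load_timeline(sample)
print(dict(tool_call_counts(timeline)))
print(delegation_targets(timeline))
```

An empty list from `delegation_targets` on a delegation test means OpenAgent never called `task`. A non-empty list on a standalone test suggests the run went through a parent agent instead of ContextScout directly.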
What should we test?
Phase 1: Test ContextScout's logic directly (standalone mode)
Phase 2: Test OpenAgent's delegation (integration mode)
Current status: Tests are confused - mixing both modes
Next step: Reorganize tests into standalone/ and delegation/ directories
Last Updated: 2026-01-07
Status: Strategy defined, implementation pending