Date: 2026-01-09
Status: Tests Created, Initial Run Complete
We created comprehensive tests for ContextScout integration and ran initial tests. Here's what we learned:
Test: 04-implicit-discovery.yaml - "How does the registry system work?"
Expected: OpenAgent should delegate to ContextScout to discover registry context files
Actual: OpenAgent used grep and read directly to find information
- Used grep to search for "registry"
- Read registry.json directly
- Used grep to search for "auto-detect"
- Read .opencode/context/openagents-repo/core-concepts/registry.md
- Did not use the task tool to delegate to ContextScout

Conclusion: OpenAgent is NOT proactively using ContextScout for discovery. It's doing its own searching.
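The delegation check this test is after can be sketched in TypeScript as follows. The `ToolCall` shape, its `subagent` field, and the `delegatedTo` helper are hypothetical names chosen for illustration, not the eval framework's actual API. The sketch only assumes that a run yields an ordered list of tool calls.

```typescript
// Hypothetical shape of one recorded tool call (not the eval framework's real type).
interface ToolCall {
  tool: string;      // e.g. "grep", "read", "task"
  subagent?: string; // assumed to be set only on "task" delegations
}

// Returns true when at least one "task" call targets the given subagent.
function delegatedTo(calls: ToolCall[], subagent: string): boolean {
  const wanted = subagent.toLowerCase();
  return calls.some(
    (c) => c.tool === "task" && c.subagent?.toLowerCase() === wanted
  );
}

// Tool calls modeled on the observed run: searching and reading, no delegation.
const observed: ToolCall[] = [
  { tool: "grep" },
  { tool: "read" },
  { tool: "grep" },
  { tool: "read" },
];

const usedContextScout = delegatedTo(observed, "ContextScout");
```

With the observed sequence, `usedContextScout` is `false`, which matches the failure recorded for this test.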
Test: 02-unknown-domain-discovery.yaml - "Explain how the eval framework works"
Expected: OpenAgent should use ContextScout and provide comprehensive answer
Actual: OpenAgent made only 1 tool call and provided minimal response
Conclusion: OpenAgent may be treating discovery requests as conversational rather than requiring deep context loading.
Test: 01-code-standards-discovery.yaml via --subagent=contextscout
Expected: ContextScout runs as mode: primary and uses glob/read directly
Actual: Even in "standalone" mode, ContextScout is wrapped by a parent agent
- ContextScout has mode: primary (confirmed in debug logs)
- The parent wrapper still invokes it through the task tool
- Captured call chain: task → ContextScout (recursive!)

Conclusion: The eval framework's "standalone" mode doesn't truly run subagents standalone. They're still invoked through a parent wrapper.
Issue: When testing subagents, the framework captures the parent agent's tool usage, not the subagent's internal tool usage.
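To make the gap concrete, here is a minimal sketch of what capturing nested calls could look like. The `RecordedCall` type and its `nested` field are assumptions made for illustration. The framework's real data model may differ.

```typescript
// Hypothetical record of a tool call; `nested` holds calls made inside a delegated subagent.
interface RecordedCall {
  tool: string;
  nested?: RecordedCall[];
}

// Top-level view: only the parent agent's calls, as described in the issue above.
function topLevelTools(calls: RecordedCall[]): string[] {
  return calls.map((c) => c.tool);
}

// Depth-first flattening that also surfaces the subagent's internal calls.
function allTools(calls: RecordedCall[]): string[] {
  const out: string[] = [];
  for (const c of calls) {
    out.push(c.tool);
    if (c.nested) {
      out.push(...allTools(c.nested));
    }
  }
  return out;
}

// Illustrative data: a parent delegates via "task"; the subagent then uses glob and read.
const run: RecordedCall[] = [
  { tool: "task", nested: [{ tool: "glob" }, { tool: "read" }] },
];

const parentView = topLevelTools(run); // ["task"]
const fullView = allTools(run);        // ["task", "glob", "read"]
```

An assertion written against `parentView` can only ever see `task`, which is why a subagent's own tool usage cannot be validated from the parent's record alone.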
Impact:
- Cannot verify that ContextScout uses glob/read/grep internally

Workaround Needed:
- Test through delegation from a parent agent (--subagent=contextscout --delegate)

| Test | Agent | Expected Behavior | Actual Behavior | Status |
|---|---|---|---|---|
| 04-implicit-discovery | OpenAgent | Use ContextScout | Used grep/read directly | ❌ FAIL |
| 02-unknown-domain | OpenAgent | Use ContextScout | Minimal response, no delegation | ❌ FAIL |
| 01-code-standards | ContextScout | Use glob/read | Delegated to itself (wrapper issue) | ❌ FAIL |
Looking at OpenAgent's prompt (.opencode/agent/core/openagent.md):
Stage 3.0: DiscoverContext is marked as optional="true"
<step id="3.0" name="DiscoverContext" optional="true">
OPTIONAL: Use ContextScout to discover relevant context files intelligently
When to use ContextScout:
- Unfamiliar with project structure or domain
- Need to find domain-specific patterns or standards
- Looking for examples, guides, or error solutions
- Want to ensure you have all relevant context before proceeding
Problem: The step is optional, and the criteria are vague. OpenAgent can easily skip this step and use its own tools (grep/read) instead.
Why this happens:
- OpenAgent already has its own grep and read tools, so it can search without delegating

Option 1: Make ContextScout usage more explicit for unfamiliar domains
<step id="3.0" name="DiscoverContext" required="conditional">
Check if task involves unfamiliar domain:
- Registry system, eval framework, agent creation, etc.
IF unfamiliar domain:
MUST delegate to ContextScout for discovery
ELSE:
MAY load known context directly
Option 2: Add a decision tree
Is this a standard task (code/docs/tests)?
  YES → Load known context directly (.opencode/context/core/standards/*)
  NO → Is this domain-specific (registry, evals, agents)?
    YES → MUST use ContextScout for discovery
    NO → Proceed with available context
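The decision tree above can be sketched as a small function. The `TaskInfo` fields and the `Action` names are hypothetical. How a task would actually be classified as standard or domain-specific is left open here.

```typescript
// Possible outcomes of the decision tree (names are illustrative).
type Action = "load-known-context" | "delegate-to-contextscout" | "proceed";

// Hypothetical classification of an incoming task.
interface TaskInfo {
  isStandardTask: boolean;   // code/docs/tests
  isDomainSpecific: boolean; // registry, evals, agents
}

function chooseAction(task: TaskInfo): Action {
  if (task.isStandardTask) {
    // Known context lives under .opencode/context/core/standards/*
    return "load-known-context";
  }
  if (task.isDomainSpecific) {
    // The tree marks this branch as MUST use ContextScout.
    return "delegate-to-contextscout";
  }
  // Neither standard nor domain-specific: use whatever context is available.
  return "proceed";
}
```

For example, a question about the registry system would be classified as `{ isStandardTask: false, isDomainSpecific: true }` and map to `"delegate-to-contextscout"`.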
Issue: Can't test subagents in true standalone mode
Solution: Add nested tool call capture
- Add nestedToolCalls to test expectations

Question: Is ContextScout actually needed?
Evidence:
Options:
⏳ Decide on ContextScout strategy:
⏳ Fix test framework to support true subagent testing
⏳ Update OpenAgent/OpenCoder prompts based on decision
For now, focus on integration tests:
Create simpler tests:

- evals/agents/core/openagent/tests/contextscout-integration/04-implicit-discovery.yaml
- evals/agents/core/openagent/tests/contextscout-integration/05-multi-domain-comprehensive.yaml
- evals/agents/core/opencoder/tests/contextscout-integration/01-implicit-pattern-discovery.yaml
- evals/agents/core/opencoder/tests/contextscout-integration/README.md
- evals/agents/ContextScout/tests/01-code-standards-discovery.yaml
- evals/agents/ContextScout/tests/02-domain-specific-discovery.yaml
- evals/agents/ContextScout/tests/03-bad-request-handling.yaml
- evals/agents/ContextScout/tests/04-multi-domain-comprehensive.yaml
- evals/agents/ContextScout/tests/05-tool-usage-validation.yaml
- evals/agents/ContextScout/tests/README.md
- evals/agents/CONTEXTSCOUT_INTEGRATION_TESTS.md - Comprehensive test suite overview
- evals/agents/CONTEXTSCOUT_TEST_FINDINGS.md - This file
- evals/framework/src/sdk/run-sdk-tests.ts - Added contextscout to subagent maps
- evals/framework/src/sdk/test-runner.ts - Added contextscout to agent map
- evals/agents/ContextScout/config/config.yaml - Added test suites

The tests revealed that OpenAgent and OpenCoder are NOT proactively using ContextScout as intended. They're using their own discovery tools (grep/read) instead, which is actually faster but potentially less comprehensive.
Decision needed: Should we:
Test framework limitation: Can't validate subagent internal behavior in standalone mode. Need to either fix framework or focus on integration/delegation tests.
Next conversation: Decide on ContextScout strategy and update agent prompts accordingly.