# ContextScout Integration Test Findings

**Date**: 2026-01-09

**Status**: Tests Created, Initial Run Complete

---

## Summary

We created a test suite for ContextScout integration and completed an initial test run. Here's what we learned:

---

## Key Findings

### 1. OpenAgent Does NOT Use ContextScout Proactively ❌

**Test**: `04-implicit-discovery.yaml` - "How does the registry system work?"

**Expected**: OpenAgent should delegate to ContextScout to discover registry context files.

**Actual**: OpenAgent used `grep` and `read` directly to find the information:
- Used `grep` to search for "registry"
- Read `registry.json` directly
- Used `grep` to search for "auto-detect"
- Eventually found `.opencode/context/openagents-repo/core-concepts/registry.md`
- **Did NOT use the `task` tool to delegate to ContextScout**

**Conclusion**: OpenAgent is NOT proactively using ContextScout for discovery. It does its own searching instead.

---

### 2. OpenAgent Provides Minimal Responses for Discovery Requests ⚠️

**Test**: `02-unknown-domain-discovery.yaml` - "Explain how the eval framework works"

**Expected**: OpenAgent should use ContextScout and provide a comprehensive answer.

**Actual**: OpenAgent made only one tool call and provided a minimal response:
- Very short execution time (14.5s)
- Only 1 tool call (insufficient for discovery)
- Did NOT delegate to ContextScout

**Conclusion**: OpenAgent may be treating discovery requests as conversational rather than as tasks that require deep context loading.

---

### 3. ContextScout Cannot Be Tested in True Standalone Mode 🔧

**Test**: `01-code-standards-discovery.yaml` via `--subagent=contextscout`

**Expected**: ContextScout runs as `mode: primary` and uses `glob`/`read` directly.

**Actual**: Even in "standalone" mode, ContextScout is wrapped by a parent agent:
- The test runner forces `mode: primary` (confirmed in debug logs)
- But ContextScout still tries to delegate to itself via the `task` tool
- Tool calls show: `task → ContextScout` (recursive!)
- The test framework captures the parent agent's tool calls, not the nested subagent's

**Conclusion**: The eval framework's "standalone" mode doesn't truly run subagents standalone; they are still invoked through a parent wrapper.

---

### 4. Test Framework Limitations 📊

**Issue**: When testing subagents, the framework captures the parent agent's tool usage, not the subagent's internal tool usage.

**Impact**:
- Can't validate that ContextScout uses `glob`/`read`/`grep` internally
- Can only validate that the parent agent delegates to ContextScout
- Behavior expectations (`mustUseTools`) check the parent, not the subagent

**Workaround Needed**:
- Option A: Test ContextScout via delegation mode (`--subagent=contextscout --delegate`)
- Option B: Modify the test framework to capture nested subagent tool calls
- Option C: Create integration tests that check end-to-end behavior

---

## Test Results Summary

| Test | Agent | Expected Behavior | Actual Behavior | Status |
|------|-------|-------------------|-----------------|--------|
| 04-implicit-discovery | OpenAgent | Use ContextScout | Used grep/read directly | ❌ FAIL |
| 02-unknown-domain | OpenAgent | Use ContextScout | Minimal response, no delegation | ❌ FAIL |
| 01-code-standards | ContextScout | Use glob/read | Delegated to itself (wrapper issue) | ❌ FAIL |

---

## Root Cause Analysis

### Why OpenAgent Doesn't Use ContextScout

Looking at OpenAgent's prompt (`.opencode/agent/core/openagent.md`):

**Stage 3.0: DiscoverContext** is marked as `optional="true"`:

```xml
OPTIONAL: Use ContextScout to discover relevant context files intelligently

When to use ContextScout:
- Unfamiliar with project structure or domain
- Need to find domain-specific patterns or standards
- Looking for examples, guides, or error solutions
- Want to ensure you have all relevant context before proceeding
```

**Problem**: The step is optional, and the criteria are vague. OpenAgent can easily skip this step and use its own tools (`grep`/`read`) instead.

**Why this happens**:

1.
   OpenAgent has access to `grep` and `read` tools
2. Using `grep`/`read` is faster than delegating to ContextScout
3. The "when to use" criteria are subjective
4. There is no enforcement mechanism to ensure ContextScout is used

---

## Recommendations

### Short Term: Fix OpenAgent Prompt

**Option 1**: Make ContextScout usage more explicit for unfamiliar domains

```xml
Check if task involves unfamiliar domain:
- Registry system, eval framework, agent creation, etc.

IF unfamiliar domain:
  MUST delegate to ContextScout for discovery
ELSE:
  MAY load known context directly
```

**Option 2**: Add a decision tree

```
Is this a standard task (code/docs/tests)?
  YES → Load known context directly (.opencode/context/core/standards/*)
  NO  → Is this domain-specific (registry, evals, agents)?
          YES → MUST use ContextScout for discovery
          NO  → Proceed with available context
```

### Medium Term: Improve Test Framework

**Issue**: Subagents can't be tested in true standalone mode.

**Solution**: Add nested tool call capture:
- Track tool calls from the parent AND its subagents
- Validate parent and subagent behavior separately
- Add `nestedToolCalls` to test expectations

### Long Term: Evaluate ContextScout Value

**Question**: Is ContextScout actually needed?

**Evidence**:
- OpenAgent can discover context using `grep`/`read` directly
- ContextScout adds delegation overhead
- Agents aren't using it proactively

**Options**:
1. **Keep ContextScout**: Make usage mandatory for specific scenarios
2. **Simplify**: Remove ContextScout and improve direct discovery
3. **Hybrid**: Use ContextScout only for complex multi-domain queries

---

## Next Steps

### Immediate Actions

1. ✅ **Document findings** (this file)
2. ⏳ **Decide on ContextScout strategy**:
   - Make it mandatory for unfamiliar domains?
   - Remove it and improve direct discovery?
   - Keep it optional but improve prompts?
3. ⏳ **Fix test framework** to support true subagent testing
4.
   ⏳ **Update OpenAgent/OpenCoder prompts** based on the decision

### Test Strategy Going Forward

**For now, focus on integration tests**:
- Test that OpenAgent CAN delegate to ContextScout (delegation mode)
- Test end-to-end behavior (does the agent find the correct context?)
- Don't worry about internal tool usage until the framework supports it

**Create simpler tests**:
- Does OpenAgent find the correct context files (regardless of how)?
- Does OpenAgent load context before execution?
- Does OpenAgent provide accurate answers?

---

## Files Created

### OpenAgent Integration Tests

- `evals/agents/core/openagent/tests/contextscout-integration/04-implicit-discovery.yaml`
- `evals/agents/core/openagent/tests/contextscout-integration/05-multi-domain-comprehensive.yaml`

### OpenCoder Integration Tests

- `evals/agents/core/opencoder/tests/contextscout-integration/01-implicit-pattern-discovery.yaml`
- `evals/agents/core/opencoder/tests/contextscout-integration/README.md`

### ContextScout Functionality Tests

- `evals/agents/ContextScout/tests/01-code-standards-discovery.yaml`
- `evals/agents/ContextScout/tests/02-domain-specific-discovery.yaml`
- `evals/agents/ContextScout/tests/03-bad-request-handling.yaml`
- `evals/agents/ContextScout/tests/04-multi-domain-comprehensive.yaml`
- `evals/agents/ContextScout/tests/05-tool-usage-validation.yaml`
- `evals/agents/ContextScout/tests/README.md`

### Documentation

- `evals/agents/CONTEXTSCOUT_INTEGRATION_TESTS.md` - Comprehensive test suite overview
- `evals/agents/CONTEXTSCOUT_TEST_FINDINGS.md` - This file

### Framework Updates

- Updated `evals/framework/src/sdk/run-sdk-tests.ts` - Added contextscout to subagent maps
- Updated `evals/framework/src/sdk/test-runner.ts` - Added contextscout to agent map
- Updated `evals/agents/ContextScout/config/config.yaml` - Added test suites

---

## Conclusion

**The tests revealed that OpenAgent and OpenCoder are NOT proactively using ContextScout as intended.** They're using their own discovery tools (`grep`/`read`)
instead, which is faster but potentially less comprehensive.

**Decision needed**: Should we:
1. Enforce ContextScout usage for unfamiliar domains?
2. Remove ContextScout and improve direct discovery?
3. Keep the current approach (optional ContextScout)?

**Test framework limitation**: We can't validate subagent internal behavior in standalone mode. We need to either fix the framework or focus on integration/delegation tests.

---

**Next conversation**: Decide on ContextScout strategy and update agent prompts accordingly.
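---

## Appendix: Nested Tool-Call Capture Sketch

The Medium Term recommendation is to capture nested tool calls and add `nestedToolCalls` to test expectations. It can be sketched in TypeScript, the eval framework's language. This is a minimal, hypothetical sketch. `ToolCall`, `Expectations`, `topLevelTools`, `toolsUsedBy`, `missingTools`, and `validate` are illustrative names, not the actual types or functions in `run-sdk-tests.ts` or `test-runner.ts`. The `"parent"` agent name and the trace itself are invented for illustration and were not captured from a real run. The sketch assumes the runner could record tool calls as a tree, with each subagent's calls nested under the `task` call that spawned it.

```typescript
// Hypothetical types: the real eval framework's structures are not shown
// in this document.
interface ToolCall {
  tool: string; // e.g. "grep", "read", "glob", "task"
  agent: string; // agent that issued the call
  children: ToolCall[]; // calls made by a subagent spawned via `task`
}

// Hypothetical expectation shape: `mustUseTools` applies to the top-level
// agent only; `nestedToolCalls` applies to a named subagent at any depth.
interface Expectations {
  mustUseTools?: string[];
  nestedToolCalls?: { agent: string; mustUseTools: string[] };
}

// What a parent-only capture sees: just the top-level calls.
function topLevelTools(calls: ToolCall[]): string[] {
  return calls.map((c) => c.tool);
}

// Depth-first walk collecting every tool used by `agent`, at any nesting depth.
function toolsUsedBy(calls: ToolCall[], agent: string): string[] {
  const out: string[] = [];
  const visit = (c: ToolCall): void => {
    if (c.agent === agent) out.push(c.tool);
    c.children.forEach(visit);
  };
  calls.forEach(visit);
  return out;
}

// Required tools that do not appear in `used`.
function missingTools(used: string[], required: string[]): string[] {
  return required.filter((t) => used.indexOf(t) === -1);
}

// Returns human-readable failures; an empty array means all expectations passed.
function validate(calls: ToolCall[], exp: Expectations): string[] {
  const failures: string[] = [];
  if (exp.mustUseTools) {
    for (const t of missingTools(topLevelTools(calls), exp.mustUseTools)) {
      failures.push(`top-level agent did not use ${t}`);
    }
  }
  if (exp.nestedToolCalls) {
    const { agent, mustUseTools } = exp.nestedToolCalls;
    for (const t of missingTools(toolsUsedBy(calls, agent), mustUseTools)) {
      failures.push(`${agent} did not use ${t}`);
    }
  }
  return failures;
}

// Invented trace: a parent wrapper delegates via `task`, and the subagent
// then uses glob/read internally.
const trace: ToolCall[] = [
  {
    tool: "task",
    agent: "parent",
    children: [
      { tool: "glob", agent: "ContextScout", children: [] },
      { tool: "read", agent: "ContextScout", children: [] },
    ],
  },
];

const topLevelOnly = validate(trace, { mustUseTools: ["glob", "read"] });
// returns ["top-level agent did not use glob", "top-level agent did not use read"]

const nested = validate(trace, {
  nestedToolCalls: { agent: "ContextScout", mustUseTools: ["glob", "read"] },
});
// returns []
```

The first check reproduces the limitation from Finding 4. A top-level `mustUseTools` check sees only the parent's `task` call, so it fails even though, in this hypothetical trace, the subagent used `glob` and `read`. The nested check walks the tree and passes. Keeping the two expectation kinds separate mirrors the recommendation to validate parent and subagent behavior independently.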