CONTEXTSCOUT_TEST_FINDINGS.md 8.5 KB

ContextScout Integration Test Findings

Date: 2026-01-09
Status: Tests Created, Initial Run Complete


Summary

We created comprehensive tests for ContextScout integration and ran initial tests. Here's what we learned:


Key Findings

1. OpenAgent Does NOT Use ContextScout Proactively ❌

Test: 04-implicit-discovery.yaml - "How does the registry system work?"

Expected: OpenAgent should delegate to ContextScout to discover registry context files

Actual: OpenAgent used grep and read directly to find information

  • Used grep to search for "registry"
  • Read registry.json directly
  • Used grep to search for "auto-detect"
  • Eventually found .opencode/context/openagents-repo/core-concepts/registry.md
  • Did NOT use the task tool to delegate to ContextScout

Conclusion: OpenAgent is NOT proactively using ContextScout for discovery. It's doing its own searching.


2. OpenAgent Provides Minimal Responses for Discovery Requests ⚠️

Test: 02-unknown-domain-discovery.yaml - "Explain how the eval framework works"

Expected: OpenAgent should use ContextScout and provide comprehensive answer

Actual: OpenAgent made only 1 tool call and provided minimal response

  • Very short execution time (14.5s)
  • Only 1 tool call (insufficient for discovery)
  • Did NOT delegate to ContextScout

Conclusion: OpenAgent may be treating discovery requests as conversational rather than requiring deep context loading.


3. ContextScout Cannot Be Tested in True Standalone Mode 🔧

Test: 01-code-standards-discovery.yaml via --subagent=contextscout

Expected: ContextScout runs as mode: primary and uses glob/read directly

Actual: Even in "standalone" mode, ContextScout is wrapped by a parent agent

  • Test runner forces mode: primary (confirmed in debug logs)
  • But ContextScout still tries to delegate to itself via task tool
  • Tool calls show: task → ContextScout (recursive!)
  • The test framework captures parent agent's tool calls, not nested subagent's

Conclusion: The eval framework's "standalone" mode doesn't truly run subagents standalone. They're still invoked through a parent wrapper.


4. Test Framework Limitations 📊

Issue: When testing subagents, the framework captures the parent agent's tool usage, not the subagent's internal tool usage.

Impact:

  • Can't validate that ContextScout uses glob/read/grep internally
  • Can only validate that parent agent delegates to ContextScout
  • Behavior expectations (mustUseTools) check parent, not subagent

Workaround Needed:

  • Option A: Test ContextScout via delegation mode (--subagent=contextscout --delegate)
  • Option B: Modify test framework to capture nested subagent tool calls
  • Option C: Create integration tests that check end-to-end behavior

Test Results Summary

Test Agent Expected Behavior Actual Behavior Status
04-implicit-discovery OpenAgent Use ContextScout Used grep/read directly ❌ FAIL
02-unknown-domain OpenAgent Use ContextScout Minimal response, no delegation ❌ FAIL
01-code-standards ContextScout Use glob/read Delegated to itself (wrapper issue) ❌ FAIL

Root Cause Analysis

Why OpenAgent Doesn't Use ContextScout

Looking at OpenAgent's prompt (.opencode/agent/core/openagent.md):

Stage 3.0: DiscoverContext is marked as optional="true"

<step id="3.0" name="DiscoverContext" optional="true">
  OPTIONAL: Use ContextScout to discover relevant context files intelligently
  
  When to use ContextScout:
  - Unfamiliar with project structure or domain
  - Need to find domain-specific patterns or standards
  - Looking for examples, guides, or error solutions
  - Want to ensure you have all relevant context before proceeding

Problem: The step is optional, and the criteria are vague. OpenAgent can easily skip this step and use its own tools (grep/read) instead.

Why this happens:

  1. OpenAgent has access to grep and read tools
  2. Using grep/read is faster than delegating to ContextScout
  3. The "when to use" criteria are subjective
  4. No enforcement mechanism to ensure ContextScout is used

Recommendations

Short Term: Fix OpenAgent Prompt

Option 1: Make ContextScout usage more explicit for unfamiliar domains

<step id="3.0" name="DiscoverContext" required="conditional">
  Check if task involves unfamiliar domain:
  - Registry system, eval framework, agent creation, etc.
  
  IF unfamiliar domain:
    MUST delegate to ContextScout for discovery
  ELSE:
    MAY load known context directly

Option 2: Add a decision tree

Is this a standard task (code/docs/tests)? 
  YES → Load known context directly (.opencode/context/core/standards/*)
  NO → Is this domain-specific (registry, evals, agents)?
    YES → MUST use ContextScout for discovery
    NO → Proceed with available context

Medium Term: Improve Test Framework

Issue: Can't test subagents in true standalone mode

Solution: Add nested tool call capture

  • Track tool calls from parent AND subagents
  • Separate validation for parent vs subagent behavior
  • Add nestedToolCalls to test expectations

Long Term: Evaluate ContextScout Value

Question: Is ContextScout actually needed?

Evidence:

  • OpenAgent can discover context using grep/read directly
  • ContextScout adds delegation overhead
  • Agents aren't using it proactively

Options:

  1. Keep ContextScout: Make usage mandatory for specific scenarios
  2. Simplify: Remove ContextScout, improve direct discovery
  3. Hybrid: Use ContextScout only for complex multi-domain queries

Next Steps

Immediate Actions

  1. Document findings (this file)
  2. Decide on ContextScout strategy:

    • Make it mandatory for unfamiliar domains?
    • Remove it and improve direct discovery?
    • Keep it optional but improve prompts?
  3. Fix test framework to support true subagent testing

  4. Update OpenAgent/OpenCoder prompts based on decision

Test Strategy Going Forward

For now, focus on integration tests:

  • Test that OpenAgent CAN delegate to ContextScout (delegation mode)
  • Test end-to-end behavior (does agent find correct context?)
  • Don't worry about internal tool usage until framework supports it

Create simpler tests:

  • Does OpenAgent find correct context files? (regardless of how)
  • Does OpenAgent load context before execution?
  • Does OpenAgent provide accurate answers?

Files Created

OpenAgent Integration Tests

  • evals/agents/core/openagent/tests/contextscout-integration/04-implicit-discovery.yaml
  • evals/agents/core/openagent/tests/contextscout-integration/05-multi-domain-comprehensive.yaml

OpenCoder Integration Tests

  • evals/agents/core/opencoder/tests/contextscout-integration/01-implicit-pattern-discovery.yaml
  • evals/agents/core/opencoder/tests/contextscout-integration/README.md

ContextScout Functionality Tests

  • evals/agents/ContextScout/tests/01-code-standards-discovery.yaml
  • evals/agents/ContextScout/tests/02-domain-specific-discovery.yaml
  • evals/agents/ContextScout/tests/03-bad-request-handling.yaml
  • evals/agents/ContextScout/tests/04-multi-domain-comprehensive.yaml
  • evals/agents/ContextScout/tests/05-tool-usage-validation.yaml
  • evals/agents/ContextScout/tests/README.md

Documentation

  • evals/agents/CONTEXTSCOUT_INTEGRATION_TESTS.md - Comprehensive test suite overview
  • evals/agents/CONTEXTSCOUT_TEST_FINDINGS.md - This file

Framework Updates

  • Updated evals/framework/src/sdk/run-sdk-tests.ts - Added contextscout to subagent maps
  • Updated evals/framework/src/sdk/test-runner.ts - Added contextscout to agent map
  • Updated evals/agents/ContextScout/config/config.yaml - Added test suites

Conclusion

The tests revealed that OpenAgent and OpenCoder are NOT proactively using ContextScout as intended. They're using their own discovery tools (grep/read) instead, which is actually faster but potentially less comprehensive.

Decision needed: Should we:

  1. Enforce ContextScout usage for unfamiliar domains?
  2. Remove ContextScout and improve direct discovery?
  3. Keep current approach (optional ContextScout)?

Test framework limitation: Can't validate subagent internal behavior in standalone mode. Need to either fix framework or focus on integration/delegation tests.


Next conversation: Decide on ContextScout strategy and update agent prompts accordingly.