# ContextScout Integration Tests **Purpose**: Validate that OpenAgent uses ContextScout effectively - at the right time, for the right reasons, with the right outcomes. **Created**: 2026-01-07 **Status**: Ready to Run --- ## The Big Question **Should we use ContextScout to help agents discover context, or is it adding unnecessary complexity?** These tests answer: 1. ✅ When should agents use ContextScout vs. direct loading? 2. ✅ Does ContextScout improve accuracy and completeness? 3. ✅ What's the performance trade-off? 4. ✅ Are agents using it predictably? --- ## Test Suite ### Test 1: Known Context - Direct Loading ⚡ **File**: `01-known-context-direct-load.yaml` **Scenario**: "Write a fibonacci function" **Expected**: Agent should DIRECTLY load `code.md` without using ContextScout **Why**: For standard tasks (code/docs/tests), the context path is well-known. ContextScout adds overhead without value. **Success Criteria**: - ✅ Loads `.opencode/context/core/standards/code.md` - ✅ Does NOT use task tool (no ContextScout delegation) - ✅ Completes in <30 seconds --- ### Test 2: Unknown Domain - Discovery 🔍 **File**: `02-unknown-domain-discovery.yaml` **Scenario**: "Explain how the eval framework works" **Expected**: Agent should USE ContextScout to discover eval-specific context **Why**: For domain-specific topics, agents don't know what context exists. ContextScout discovers relevant files. **Success Criteria**: - ✅ Delegates to ContextScout - ✅ Finds `.opencode/context/openagents-repo/core-concepts/evals.md` - ✅ Loads discovered files - ✅ Provides comprehensive answer --- ### Test 3: Accuracy - Correct Files 🎯 **File**: `03-accuracy-correct-files.yaml` **Scenario**: "What are the MVI principles?" **Expected**: ContextScout finds the CORRECT file (mvi.md), not random files **Why**: Validates ContextScout's search accuracy and relevance filtering. **Success Criteria**: - ✅ Uses ContextScout to search - ✅ Finds `.opencode/context/core/context-system/standards/mvi.md` - ✅ No irrelevant files loaded - ✅ Accurate explanation of MVI --- ## Running the Tests ### Run All Integration Tests ```bash cd evals/framework npm run eval:sdk -- --agent=core/openagent --pattern="contextscout-integration/*.yaml" ``` ### Run Individual Tests ```bash # Test 1: Known context (should NOT use ContextScout) npm run eval:sdk -- --agent=core/openagent --pattern="01-known-context-direct-load.yaml" # Test 2: Unknown domain (should use ContextScout) npm run eval:sdk -- --agent=core/openagent --pattern="02-unknown-domain-discovery.yaml" # Test 3: Accuracy (ContextScout finds correct files) npm run eval:sdk -- --agent=core/openagent --pattern="03-accuracy-correct-files.yaml" ``` ### Run Without Evaluators (Focus on Behavior) ```bash npm run eval:sdk -- --agent=core/openagent --pattern="contextscout-integration/*.yaml" --no-evaluators ``` --- ## Interpreting Results ### Scenario A: ContextScout is "The Way Forward" ✅ **If tests show**: - Test 1: Agent loads code.md directly (fast, no ContextScout) - Test 2: Agent uses ContextScout, finds eval context (accurate discovery) - Test 3: ContextScout finds mvi.md correctly (high accuracy) **Conclusion**: ContextScout is valuable for discovery, agents use it intelligently **Action**: Keep ContextScout, refine when/how agents use it --- ### Scenario B: ContextScout Adds Overhead Without Value ❌ **If tests show**: - Test 1: Agent uses ContextScout for simple task (unnecessary overhead) - Test 2: Agent doesn't use ContextScout, misses domain context (inconsistent) - Test 3: ContextScout finds wrong files (poor accuracy) **Conclusion**: ContextScout isn't helping, agents aren't using it predictably **Action**: Remove ContextScout OR fix agent decision-making --- ### Scenario C: Mixed Results - Needs Refinement ⚠️ **If tests show**: - Test 1: Sometimes uses ContextScout, sometimes doesn't (unpredictable) - Test 2: Uses ContextScout but takes too long (>60s) - Test 3: Finds correct files but also loads irrelevant ones (noisy) **Conclusion**: ContextScout has potential but needs tuning **Action**: Refine agent prompts, improve ContextScout search, add decision criteria --- ## Decision Matrix | Test Result | Interpretation | Action | |-------------|----------------|--------| | All 3 pass | ✅ ContextScout working as designed | Keep it, document best practices | | Test 1 fails (uses ContextScout) | ⚠️ Agents using it when they shouldn't | Refine agent decision logic | | Test 2 fails (no ContextScout) | ⚠️ Agents not using it when they should | Update agent prompts | | Test 3 fails (wrong files) | ❌ ContextScout search accuracy poor | Improve ContextScout search logic | | All 3 fail | ❌ ContextScout not ready | Disable or redesign | --- ## Success Metrics ### ContextScout is "Worth It" if: 1. **Accuracy**: Finds correct files 90%+ of the time (Test 3) 2. **Efficiency**: Agents skip it for known tasks (Test 1) 3. **Discovery**: Agents use it for unknown domains (Test 2) 4. **Speed**: Adds <30s overhead for discovery (Test 2) 5. **Predictability**: Consistent behavior across runs ### ContextScout is "Not Worth It" if: 1. **Overhead**: Used for simple tasks where direct loading is faster 2. **Inaccuracy**: Finds wrong files >20% of the time 3. **Unpredictability**: Random usage, no clear pattern 4. **Complexity**: Makes workflows harder to understand 5. **Slowness**: Adds >60s to task completion --- ## Next Steps 1. **Run Phase 1 tests** (these 3 tests) 2. **Analyze results** using decision matrix above 3. **Decide**: Is ContextScout improving workflows? 4. **If YES**: Document best practices, add more tests 5. **If NO**: Disable ContextScout, use direct loading only 6. **If MIXED**: Refine agent prompts and ContextScout search --- ## Related Documentation - [ContextScout Agent](.opencode/agent/ContextScout.md) - [ContextScout Tests](../../../ContextScout/README.md) - [OpenAgent Workflow](.opencode/agent/core/openagent.md) - [Test Plan](./contextscout-integration-test-plan.md) --- **Key Insight**: ContextScout should be a **discovery tool** for unknown domains, NOT a replacement for direct loading of well-known context paths. These tests validate this hypothesis.