# ContextScout Integration Test Plan **Purpose**: Validate that OpenAgent and OpenCoder use ContextScout effectively to discover and load the RIGHT context at the RIGHT time. **Date**: 2026-01-07 **Status**: Draft - Ready for Review --- ## Core Questions 1. **When should agents use ContextScout vs. hardcoded context paths?** 2. **Does ContextScout improve speed, accuracy, or predictability?** 3. **Are agents using ContextScout when they should?** 4. **Does ContextScout help or hurt the workflow?** --- ## Test Scenarios ### Scenario 1: Known Context (Hardcoded Paths Should Win) **Task**: "Write a new function to calculate fibonacci numbers" **Expected Behavior**: - Agent should DIRECTLY load `.opencode/context/core/standards/code.md` - NO need for ContextScout (path is well-known) - Fast execution (no discovery overhead) **Why**: For standard tasks (code/docs/tests), agents already know the context path. ContextScout adds overhead without value. **Test**: ```yaml name: "OpenCoder: Known Context - Direct Loading" prompts: - text: "Write a new function to calculate fibonacci numbers" expectations: - type: context_loaded contexts: [".opencode/context/core/standards/code.md"] - type: tool_not_called tool: "task" reason: "Should not delegate to ContextScout for known context" - type: max_duration value: 30000 # Should be fast without discovery ``` --- ### Scenario 2: Unknown Domain (ContextScout Should Help) **Task**: "Find context files about eval framework testing patterns" **Expected Behavior**: - Agent should use ContextScout to discover eval-specific context - ContextScout finds `.opencode/context/openagents-repo/core-concepts/evals.md` - Agent loads discovered files - More accurate than guessing **Why**: For domain-specific or unfamiliar topics, ContextScout discovers relevant files that agents might miss. **Test**: ```yaml name: "OpenAgent: Unknown Domain - ContextScout Discovery" prompts: - text: "Find context files about eval framework testing patterns" expectations: - type: tool_called tool: "task" args_contain: "ContextScout" - type: context_loaded contexts: [".opencode/context/openagents-repo/core-concepts/evals.md"] - type: max_duration value: 60000 # Discovery adds time, but finds right context ``` --- ### Scenario 3: Ambiguous Task (ContextScout Clarifies) **Task**: "Help me improve the documentation" **Expected Behavior**: - Agent unsure which docs (code docs? user docs? API docs?) - Uses ContextScout to discover available doc standards - ContextScout returns multiple options with priorities - Agent asks user to clarify OR picks most relevant **Why**: ContextScout helps agents understand what context exists when task is vague. **Test**: ```yaml name: "OpenAgent: Ambiguous Task - ContextScout Clarification" prompts: - text: "Help me improve the documentation" expectations: - type: tool_called tool: "task" args_contain: "contextscout" - type: output_contains value: "documentation" - type: behavior check: "agent_asks_for_clarification OR agent_loads_relevant_context" ``` --- ### Scenario 4: Multi-Domain Task (ContextScout Finds All) **Task**: "Create a new agent with tests and documentation" **Expected Behavior**: - Agent needs: code standards, test standards, doc standards, agent creation guide - Uses ContextScout to discover ALL relevant files - ContextScout returns prioritized list - Agent loads in correct order **Why**: Complex tasks need multiple context files. ContextScout ensures nothing is missed. **Test**: ```yaml name: "OpenAgent: Multi-Domain - ContextScout Comprehensive Discovery" prompts: - text: "Create a new agent with tests and documentation" expectations: - type: tool_called tool: "task" args_contain: "contextscout" - type: context_loaded contexts: - ".opencode/context/core/standards/code.md" - ".opencode/context/core/standards/tests.md" - ".opencode/context/core/standards/docs.md" - ".opencode/context/openagents-repo/guides/adding-agent.md" - type: loading_order check: "critical_files_loaded_first" ``` --- ### Scenario 5: Speed Test (Direct vs. Discovery) **Task A**: "Write a function" (direct loading) **Task B**: "Write a function" (with ContextScout discovery) **Expected Behavior**: - Task A: Loads code.md directly (~5-10s) - Task B: Uses ContextScout, then loads code.md (~15-25s) - Task A should be faster for known context **Why**: Measure the overhead of ContextScout for known tasks. **Test**: ```yaml name: "Performance: Direct Loading vs. ContextScout Discovery" variants: - name: "direct" force_behavior: "skip_contextscout" expected_duration: 10000 - name: "discovery" force_behavior: "use_contextscout" expected_duration: 25000 comparison: - type: duration_difference max_overhead: 15000 # ContextScout should add <15s ``` --- ### Scenario 6: Accuracy Test (Does ContextScout Find Right Files?) **Task**: "Find context about MVI principles" **Expected Behavior**: - ContextScout searches for "MVI" - Finds `.opencode/context/core/context-system/standards/mvi.md` - Returns exact path with line ranges - Agent loads correct file **Why**: Validate ContextScout's search accuracy. **Test**: ```yaml name: "Accuracy: ContextScout Finds Correct Files" prompts: - text: "Find context about MVI principles" expectations: - type: tool_called tool: "task" args_contain: "contextscout" - type: context_loaded contexts: [".opencode/context/core/context-system/standards/mvi.md"] - type: no_false_positives check: "only_relevant_files_loaded" ``` --- ### Scenario 7: Predictability Test (Consistent Behavior) **Task**: Run same task 5 times, check if agent uses ContextScout consistently **Expected Behavior**: - For known tasks: NEVER uses ContextScout (5/5 times) - For unknown tasks: ALWAYS uses ContextScout (5/5 times) - Consistent decision-making **Why**: Agents should be predictable, not random. **Test**: ```yaml name: "Predictability: Consistent ContextScout Usage" iterations: 5 prompts: - text: "Write a function to parse JSON" # Known task expectations: - type: consistency check: "contextscout_usage_same_across_runs" - type: expected_behavior value: "never_uses_contextscout" # Known context path ``` --- ## Decision Matrix: When to Use ContextScout | Scenario | Use ContextScout? | Why | |----------|-------------------|-----| | Standard code task | ❌ NO | Path known: `.opencode/context/core/standards/code.md` | | Standard docs task | ❌ NO | Path known: `.opencode/context/core/standards/docs.md` | | Standard tests task | ❌ NO | Path known: `.opencode/context/core/standards/tests.md` | | Domain-specific task | ✅ YES | Need to discover domain context (e.g., evals, registry) | | Unfamiliar topic | ✅ YES | Don't know what context exists | | Multi-domain task | ✅ YES | Need to find ALL relevant files | | Ambiguous request | ✅ YES | Clarify what context is available | | Error/troubleshooting | ✅ YES | Find error-specific guides | --- ## Success Criteria ### ContextScout is "The Way Forward" if: 1. ✅ **Accuracy**: Finds correct context files 95%+ of the time 2. ✅ **Speed**: Adds <15s overhead for discovery 3. ✅ **Predictability**: Agents use it consistently for unknown domains 4. ✅ **Value**: Improves outcomes for complex/multi-domain tasks 5. ✅ **Simplicity**: Doesn't add confusion or complexity ### ContextScout is "Not Worth It" if: 1. ❌ **Slow**: Adds >30s overhead 2. ❌ **Inaccurate**: Finds wrong files >20% of the time 3. ❌ **Unpredictable**: Agents use it randomly 4. ❌ **Overhead**: Used for simple tasks where direct loading is better 5. ❌ **Complexity**: Makes workflows harder to understand --- ## Recommended Test Implementation ### Phase 1: Basic Integration (3 tests) 1. Known context - direct loading (should NOT use ContextScout) 2. Unknown domain - discovery (should use ContextScout) 3. Accuracy - finds correct files ### Phase 2: Performance (2 tests) 4. Speed comparison (direct vs. discovery) 5. Overhead measurement ### Phase 3: Predictability (2 tests) 6. Consistency across runs 7. Multi-domain comprehensive discovery --- ## Next Steps 1. **Review this plan** - Does it answer your questions? 2. **Implement Phase 1 tests** - Basic integration validation 3. **Run tests and analyze results** - Measure actual behavior 4. **Decide**: Is ContextScout improving workflows or adding complexity? 5. **Refine agent prompts** - Update when/how to use ContextScout based on results --- **Key Insight**: ContextScout should be used for **discovery** (unknown domains), NOT for **known paths** (standard code/docs/tests). The test suite will validate this hypothesis.