ContextScout Testing Strategy

What Are We Actually Testing?

The Confusion

ContextScout is a subagent with mode: subagent, which means:

Production: OpenAgent calls ContextScout via task tool
Testing: We can test it TWO ways

Current Problem: Our tests are mixing both approaches and not testing either properly.

Two Testing Approaches

Approach 1: Test ContextScout Directly (Standalone Mode)

What we're testing: ContextScout's logic in isolation

How it works:

# Force ContextScout to run as primary agent
npm run eval:sdk -- --subagent=contextscout --pattern="smoke-test.yaml"

What happens:

Eval framework forces mode: primary (overrides mode: subagent)
ContextScout receives user prompt directly
ContextScout uses tools (glob, read, grep) directly
No OpenAgent involved

Use when:

✅ Debugging ContextScout's logic
✅ Testing tool usage (glob, read, grep)
✅ Verifying output format
✅ Developing new features

Limitations:

⚠️ Not how it runs in production
⚠️ Doesn't test delegation
⚠️ Doesn't test OpenAgent integration

Approach 2: Test OpenAgent → ContextScout Delegation

What we're testing: OpenAgent's ability to delegate to ContextScout

How it works:

# Test delegation from OpenAgent to ContextScout
npm run eval:sdk -- --agent=core/openagent --pattern="delegation-test.yaml"

What happens:

OpenAgent receives user prompt
OpenAgent decides to delegate to ContextScout
OpenAgent calls task(subagent_type="ContextScout", ...)
ContextScout runs as subagent
ContextScout returns results to OpenAgent
OpenAgent presents results to user

Use when:

✅ Testing production workflow
✅ Verifying delegation logic
✅ Testing integration
✅ Validating real-world usage

Limitations:

⚠️ Harder to debug (two agents involved)
⚠️ Slower (more overhead)
⚠️ Tests both OpenAgent AND ContextScout

Current Test Issues

Issue 1: Wrong Testing Mode

Current:

npm run eval:sdk -- --agent=ContextScout

Problem: This runs through OpenAgent but doesn't properly test delegation

What actually happens:

Eval framework loads ContextScout as the agent
OpenAgent wraps it (because it's in subagents/ directory)
OpenAgent receives the test prompt
OpenAgent answers directly OR tries to delegate
ContextScout may or may not be invoked

Result: Inconsistent - sometimes ContextScout runs, sometimes it doesn't

Issue 2: No Clear Test Separation

Current tests mix concerns:

Some tests expect ContextScout to use tools directly
Some tests expect OpenAgent to delegate
No clear separation of what's being tested

Result: Tests fail for unclear reasons

Recommended Testing Strategy

Phase 1: Test ContextScout Directly (Standalone)

Goal: Verify ContextScout's core logic works

Tests:

✅ Discovery - Can ContextScout find context files?
✅ Search - Can ContextScout locate specific files?
✅ Extraction - Can ContextScout extract key findings?
✅ Output Format - Does ContextScout return proper format?
✅ Error Handling - Does ContextScout handle errors gracefully?

Command:

cd evals/framework
npm run eval:sdk -- --subagent=contextscout --pattern="standalone/*.yaml"

Test Structure:

# evals/agents/ContextScout/tests/standalone/01-discovery.yaml
id: contextscout-standalone-discovery
name: "ContextScout Standalone: Discovery Test"
description: |
  Tests ContextScout's ability to discover context files when running
  as a standalone agent (mode: primary override).
  
  This tests ContextScout's logic in isolation, not delegation.

prompts:
  - text: |
      Find all context files in .opencode/context/core/
      
      Return:
      - Exact file paths
      - File count
      - Directory structure

approvalStrategy:
  type: auto-approve

behavior:
  mustUseTools:
    - glob  # ContextScout MUST use glob to discover files
    - list  # ContextScout MUST use list to explore structure
  minToolCalls: 2
  maxToolCalls: 20

# Verify ContextScout's output
assertions:
  - type: output_contains
    value: ".opencode/context/core/"
  - type: tool_called
    tool: "glob"
  - type: tool_called
    tool: "list"

timeout: 60000

tags:
  - contextscout
  - standalone
  - discovery
  - unit-test

Phase 2: Test OpenAgent → ContextScout Delegation

Goal: Verify OpenAgent delegates to ContextScout correctly

Tests:

✅ Delegation Trigger - Does OpenAgent recognize when to delegate?
✅ Delegation Format - Does OpenAgent pass correct parameters?
✅ Result Handling - Does OpenAgent present ContextScout's results?
✅ Error Propagation - Does OpenAgent handle ContextScout errors?

Command:

cd evals/framework
npm run eval:sdk -- --agent=core/openagent --pattern="delegation/*.yaml"

Test Structure:

# evals/agents/core/openagent/tests/delegation/contextscout-delegation.yaml
id: openagent-contextscout-delegation
name: "OpenAgent: ContextScout Delegation Test"
description: |
  Tests OpenAgent's ability to delegate to ContextScout.
  
  This tests the delegation workflow, not ContextScout's logic.

prompts:
  - text: |
      I need to find context files for code standards. Can you help?

approvalStrategy:
  type: auto-approve

behavior:
  mustUseTools:
    - task  # OpenAgent MUST use task tool to delegate
  minToolCalls: 1
  maxToolCalls: 10

# Verify OpenAgent delegates to ContextScout
assertions:
  - type: tool_called
    tool: "task"
    with_args:
      subagent_type: "ContextScout"
  - type: output_contains
    value: ".opencode/context/core/standards/code.md"

timeout: 120000

tags:
  - openagent
  - delegation
  - contextscout
  - integration-test

Debugging Guide

Debugging Standalone Tests

When test fails:

Check tool usage: ```bash

Run with debug

npm run eval:sdk -- --subagent=contextscout --pattern="test.yaml" --debug

# Look for tool calls cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.type == "tool_call")'


2. **Verify ContextScout is running**:
   ```bash
   # Check agent name in logs
   # Should see: "Agent: ContextScout" (not "Agent: OpenAgent")

Check output format:

# View agent response
cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.type == "assistant_message")'

Common issues:
- ❌ Agent not using tools → Check mustUseTools in test
- ❌ Wrong output format → Check ContextScout prompt
- ❌ Timeout → Increase timeout or simplify test

Debugging Delegation Tests

When test fails:

Check if delegation happened:

# Look for task tool call
cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.tool == "task")'

Verify delegation parameters:

# Check subagent_type
cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.tool == "task") | .arguments'

Check ContextScout's response:

# Look for ContextScout's output in timeline
# Should see nested agent execution

Common issues:
- ❌ OpenAgent doesn't delegate → Prompt not clear enough
- ❌ Wrong subagent called → OpenAgent chose different subagent
- ❌ Delegation fails → Check ContextScout is registered

Test File Organization

Recommended Structure

evals/agents/ContextScout/
├── config/
│   └── config.yaml                      # Test configuration
├── tests/
│   ├── standalone/                      # Phase 1: Test ContextScout directly
│   │   ├── 01-discovery.yaml           # Find files
│   │   ├── 02-search.yaml              # Search for specific files
│   │   ├── 03-extraction.yaml          # Extract key findings
│   │   ├── 04-output-format.yaml       # Verify output format
│   │   ├── 05-error-handling.yaml      # Handle errors
│   │   ├── 06-false-positive.yaml      # Prevent hallucinations
│   │   ├── 07-invalid-path.yaml        # Handle invalid paths
│   │   ├── 08-ambiguous-query.yaml     # Handle vague queries
│   │   ├── 09-mvi-detection.yaml       # Detect MVI compliance
│   │   └── 10-empty-directory.yaml     # Handle empty dirs
│   └── delegation/                      # Phase 2: Test OpenAgent delegation
│       ├── 01-trigger.yaml             # Does OpenAgent delegate?
│       ├── 02-parameters.yaml          # Correct parameters?
│       └── 03-results.yaml             # Results presented correctly?
└── README.md

Implementation Plan

Step 1: Reorganize Tests (30 min)

Create tests/standalone/ directory
Move current tests to standalone/
Update test descriptions to clarify "standalone mode"
Update config.yaml

Step 2: Fix Standalone Tests (1 hour)

Update test command to use --subagent=contextscout
Add mustUseTools to enforce tool usage
Add assertions to verify output
Run and debug each test

Step 3: Create Delegation Tests (1 hour)

Create tests/delegation/ directory
Create OpenAgent delegation tests
Test delegation trigger
Test result handling

Step 4: Document Results (30 min)

Update README with test results
Document debugging procedures
Create troubleshooting guide

Quick Reference

Test ContextScout Directly

# Standalone mode - tests ContextScout's logic
npm run eval:sdk -- --subagent=contextscout --pattern="standalone/*.yaml"

Test OpenAgent Delegation

# Delegation mode - tests OpenAgent → ContextScout
npm run eval:sdk -- --agent=core/openagent --pattern="delegation/*.yaml"

Debug Standalone Test

# Run with debug output
npm run eval:sdk -- --subagent=contextscout --pattern="standalone/01-discovery.yaml" --debug

# Check tool calls
cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.type == "tool_call") | {tool, args}'

Debug Delegation Test

# Run with debug output
npm run eval:sdk -- --agent=core/openagent --pattern="delegation/01-trigger.yaml" --debug

# Check if delegation happened
cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.tool == "task")'

Bottom Line

What should we test?

Phase 1: Test ContextScout's logic directly (standalone mode)
- Verify tool usage (glob, read, grep)
- Verify output format
- Verify error handling
Phase 2: Test OpenAgent's delegation (integration mode)
- Verify OpenAgent delegates correctly
- Verify results are presented properly
- Verify error propagation

Current status: Tests are confused - mixing both modes

Next step: Reorganize tests into standalone/ and delegation/ directories

Last Updated: 2026-01-07
Status: Strategy defined, implementation pending

TESTING_STRATEGY.md 11 KB История Исходник