# ContextScout Testing Strategy

## What Are We Actually Testing?

### The Confusion

ContextScout is a **subagent** with `mode: subagent`, which means:
- **Production**: OpenAgent calls ContextScout via `task` tool
- **Testing**: We can test it TWO ways

**Current Problem**: Our tests are mixing both approaches and not testing either properly.

---

## Two Testing Approaches

### Approach 1: Test ContextScout Directly (Standalone Mode)

**What we're testing**: ContextScout's logic in isolation

**How it works**:
```bash
# Force ContextScout to run as primary agent
npm run eval:sdk -- --subagent=contextscout --pattern="smoke-test.yaml"
```

**What happens**:
- Eval framework forces `mode: primary` (overrides `mode: subagent`)
- ContextScout receives user prompt directly
- ContextScout uses tools (glob, read, grep) directly
- No OpenAgent involved

**Use when**:
- ✅ Debugging ContextScout's logic
- ✅ Testing tool usage (glob, read, grep)
- ✅ Verifying output format
- ✅ Developing new features

**Limitations**:
- ⚠️ Not how it runs in production
- ⚠️ Doesn't test delegation
- ⚠️ Doesn't test OpenAgent integration

---

### Approach 2: Test OpenAgent → ContextScout Delegation

**What we're testing**: OpenAgent's ability to delegate to ContextScout

**How it works**:
```bash
# Test delegation from OpenAgent to ContextScout
npm run eval:sdk -- --agent=core/openagent --pattern="delegation-test.yaml"
```

**What happens**:
- OpenAgent receives user prompt
- OpenAgent decides to delegate to ContextScout
- OpenAgent calls `task(subagent_type="ContextScout", ...)`
- ContextScout runs as subagent
- ContextScout returns results to OpenAgent
- OpenAgent presents results to user

**Use when**:
- ✅ Testing production workflow
- ✅ Verifying delegation logic
- ✅ Testing integration
- ✅ Validating real-world usage

**Limitations**:
- ⚠️ Harder to debug (two agents involved)
- ⚠️ Slower (more overhead)
- ⚠️ Tests both OpenAgent AND ContextScout

---

## Current Test Issues

### Issue 1: Wrong Testing Mode

**Current**:
```bash
npm run eval:sdk -- --agent=ContextScout
```

**Problem**: This runs through OpenAgent but doesn't properly test delegation

**What actually happens**:
1. Eval framework loads `ContextScout` as the agent
2. OpenAgent wraps it (because it's in subagents/ directory)
3. OpenAgent receives the test prompt
4. OpenAgent answers directly OR tries to delegate
5. ContextScout may or may not be invoked

**Result**: Inconsistent - sometimes ContextScout runs, sometimes it doesn't

---

### Issue 2: No Clear Test Separation

**Current tests mix concerns**:
- Some tests expect ContextScout to use tools directly
- Some tests expect OpenAgent to delegate
- No clear separation of what's being tested

**Result**: Tests fail for unclear reasons

---

## Recommended Testing Strategy

### Phase 1: Test ContextScout Directly (Standalone)

**Goal**: Verify ContextScout's core logic works

**Tests**:
1. ✅ Discovery - Can ContextScout find context files?
2. ✅ Search - Can ContextScout locate specific files?
3. ✅ Extraction - Can ContextScout extract key findings?
4. ✅ Output Format - Does ContextScout return proper format?
5. ✅ Error Handling - Does ContextScout handle errors gracefully?

**Command**:
```bash
cd evals/framework
npm run eval:sdk -- --subagent=contextscout --pattern="standalone/*.yaml"
```

**Test Structure**:
```yaml
# evals/agents/ContextScout/tests/standalone/01-discovery.yaml
id: contextscout-standalone-discovery
name: "ContextScout Standalone: Discovery Test"
description: |
  Tests ContextScout's ability to discover context files when running
  as a standalone agent (mode: primary override).
  
  This tests ContextScout's logic in isolation, not delegation.

prompts:
  - text: |
      Find all context files in .opencode/context/core/
      
      Return:
      - Exact file paths
      - File count
      - Directory structure

approvalStrategy:
  type: auto-approve

behavior:
  mustUseTools:
    - glob  # ContextScout MUST use glob to discover files
    - list  # ContextScout MUST use list to explore structure
  minToolCalls: 2
  maxToolCalls: 20

# Verify ContextScout's output
assertions:
  - type: output_contains
    value: ".opencode/context/core/"
  - type: tool_called
    tool: "glob"
  - type: tool_called
    tool: "list"

timeout: 60000

tags:
  - contextscout
  - standalone
  - discovery
  - unit-test
```

---

### Phase 2: Test OpenAgent → ContextScout Delegation

**Goal**: Verify OpenAgent delegates to ContextScout correctly

**Tests**:
1. ✅ Delegation Trigger - Does OpenAgent recognize when to delegate?
2. ✅ Delegation Format - Does OpenAgent pass correct parameters?
3. ✅ Result Handling - Does OpenAgent present ContextScout's results?
4. ✅ Error Propagation - Does OpenAgent handle ContextScout errors?

**Command**:
```bash
cd evals/framework
npm run eval:sdk -- --agent=core/openagent --pattern="delegation/*.yaml"
```

**Test Structure**:
```yaml
# evals/agents/core/openagent/tests/delegation/contextscout-delegation.yaml
id: openagent-contextscout-delegation
name: "OpenAgent: ContextScout Delegation Test"
description: |
  Tests OpenAgent's ability to delegate to ContextScout.
  
  This tests the delegation workflow, not ContextScout's logic.

prompts:
  - text: |
      I need to find context files for code standards. Can you help?

approvalStrategy:
  type: auto-approve

behavior:
  mustUseTools:
    - task  # OpenAgent MUST use task tool to delegate
  minToolCalls: 1
  maxToolCalls: 10

# Verify OpenAgent delegates to ContextScout
assertions:
  - type: tool_called
    tool: "task"
    with_args:
      subagent_type: "ContextScout"
  - type: output_contains
    value: ".opencode/context/core/standards/code.md"

timeout: 120000

tags:
  - openagent
  - delegation
  - contextscout
  - integration-test
```

---

## Debugging Guide

### Debugging Standalone Tests

**When test fails**:

1. **Check tool usage**:
   ```bash
   # Run with debug
   npm run eval:sdk -- --subagent=contextscout --pattern="test.yaml" --debug
   
   # Look for tool calls
   cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.type == "tool_call")'
   ```

2. **Verify ContextScout is running**:
   ```bash
   # Check agent name in logs
   # Should see: "Agent: ContextScout" (not "Agent: OpenAgent")
   ```

3. **Check output format**:
   ```bash
   # View agent response
   cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.type == "assistant_message")'
   ```

4. **Common issues**:
   - ❌ Agent not using tools → Check `mustUseTools` in test
   - ❌ Wrong output format → Check ContextScout prompt
   - ❌ Timeout → Increase timeout or simplify test

---

### Debugging Delegation Tests

**When test fails**:

1. **Check if delegation happened**:
   ```bash
   # Look for task tool call
   cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.tool == "task")'
   ```

2. **Verify delegation parameters**:
   ```bash
   # Check subagent_type
   cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.tool == "task") | .arguments'
   ```

3. **Check ContextScout's response**:
   ```bash
   # Look for ContextScout's output in timeline
   # Should see nested agent execution
   ```

4. **Common issues**:
   - ❌ OpenAgent doesn't delegate → Prompt not clear enough
   - ❌ Wrong subagent called → OpenAgent chose different subagent
   - ❌ Delegation fails → Check ContextScout is registered

---

## Test File Organization

### Recommended Structure

```
evals/agents/ContextScout/
├── config/
│   └── config.yaml                      # Test configuration
├── tests/
│   ├── standalone/                      # Phase 1: Test ContextScout directly
│   │   ├── 01-discovery.yaml           # Find files
│   │   ├── 02-search.yaml              # Search for specific files
│   │   ├── 03-extraction.yaml          # Extract key findings
│   │   ├── 04-output-format.yaml       # Verify output format
│   │   ├── 05-error-handling.yaml      # Handle errors
│   │   ├── 06-false-positive.yaml      # Prevent hallucinations
│   │   ├── 07-invalid-path.yaml        # Handle invalid paths
│   │   ├── 08-ambiguous-query.yaml     # Handle vague queries
│   │   ├── 09-mvi-detection.yaml       # Detect MVI compliance
│   │   └── 10-empty-directory.yaml     # Handle empty dirs
│   └── delegation/                      # Phase 2: Test OpenAgent delegation
│       ├── 01-trigger.yaml             # Does OpenAgent delegate?
│       ├── 02-parameters.yaml          # Correct parameters?
│       └── 03-results.yaml             # Results presented correctly?
└── README.md
```

---

## Implementation Plan

### Step 1: Reorganize Tests (30 min)

1. Create `tests/standalone/` directory
2. Move current tests to `standalone/`
3. Update test descriptions to clarify "standalone mode"
4. Update config.yaml

### Step 2: Fix Standalone Tests (1 hour)

1. Update test command to use `--subagent=contextscout`
2. Add `mustUseTools` to enforce tool usage
3. Add assertions to verify output
4. Run and debug each test

### Step 3: Create Delegation Tests (1 hour)

1. Create `tests/delegation/` directory
2. Create OpenAgent delegation tests
3. Test delegation trigger
4. Test result handling

### Step 4: Document Results (30 min)

1. Update README with test results
2. Document debugging procedures
3. Create troubleshooting guide

---

## Quick Reference

### Test ContextScout Directly
```bash
# Standalone mode - tests ContextScout's logic
npm run eval:sdk -- --subagent=contextscout --pattern="standalone/*.yaml"
```

### Test OpenAgent Delegation
```bash
# Delegation mode - tests OpenAgent → ContextScout
npm run eval:sdk -- --agent=core/openagent --pattern="delegation/*.yaml"
```

### Debug Standalone Test
```bash
# Run with debug output
npm run eval:sdk -- --subagent=contextscout --pattern="standalone/01-discovery.yaml" --debug

# Check tool calls
cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.type == "tool_call") | {tool, args}'
```

### Debug Delegation Test
```bash
# Run with debug output
npm run eval:sdk -- --agent=core/openagent --pattern="delegation/01-trigger.yaml" --debug

# Check if delegation happened
cat evals/results/latest.json | jq '.tests[0].timeline[] | select(.tool == "task")'
```

---

## Bottom Line

**What should we test?**

1. **Phase 1**: Test ContextScout's logic directly (standalone mode)
   - Verify tool usage (glob, read, grep)
   - Verify output format
   - Verify error handling

2. **Phase 2**: Test OpenAgent's delegation (integration mode)
   - Verify OpenAgent delegates correctly
   - Verify results are presented properly
   - Verify error propagation

**Current status**: Tests are confused - mixing both modes

**Next step**: Reorganize tests into `standalone/` and `delegation/` directories

---

**Last Updated**: 2026-01-07  
**Status**: Strategy defined, implementation pending