Purpose: Validate that OpenAgent, OpenCoder, and ContextScout work together for context discovery.
Created: 2026-01-09
Status: Ready to Run
This test suite answers the critical question: Should agents use ContextScout for context discovery, and if so, when and how?
evals/agents/
├── core/
│   ├── openagent/tests/contextscout-integration/
│   │   ├── 01-known-context-direct-load.yaml       # Should NOT use ContextScout
│   │   ├── 02-unknown-domain-discovery.yaml        # Should use ContextScout
│   │   ├── 03-accuracy-correct-files.yaml          # ContextScout finds right files
│   │   ├── 04-implicit-discovery.yaml              # NEW: Proactive usage
│   │   ├── 05-multi-domain-comprehensive.yaml      # NEW: Multi-domain discovery
│   │   └── README.md
│   │
│   └── opencoder/tests/contextscout-integration/
│       ├── 01-implicit-pattern-discovery.yaml      # NEW: Pattern discovery
│       └── README.md
│
└── ContextScout/tests/
    ├── 01-code-standards-discovery.yaml            # NEW: Basic discovery
    ├── 02-domain-specific-discovery.yaml           # NEW: Domain-specific
    ├── 03-bad-request-handling.yaml                # NEW: Error handling
    ├── 04-multi-domain-comprehensive.yaml          # NEW: Multi-domain
    ├── 05-tool-usage-validation.yaml               # NEW: Read-only enforcement
    └── README.md
Location: evals/agents/core/openagent/tests/contextscout-integration/
| Test | Purpose | Expected Behavior |
|---|---|---|
| 01-known-context | Validate direct loading for known tasks | Should NOT use ContextScout |
| 02-unknown-domain | Validate discovery for unfamiliar topics | Should use ContextScout |
| 03-accuracy | Validate ContextScout finds correct files | Finds MVI.md correctly |
| 04-implicit-discovery | NEW: Proactive usage without instruction | Uses ContextScout automatically |
| 05-multi-domain | NEW: Comprehensive multi-domain discovery | Finds all relevant files |
Key Question: Does OpenAgent know when to use ContextScout vs. direct loading?
Location: evals/agents/core/opencoder/tests/contextscout-integration/
| Test | Purpose | Expected Behavior |
|---|---|---|
| 01-implicit-pattern-discovery | NEW: Pattern discovery for unfamiliar code | Uses ContextScout for eval framework |
Key Question: Does OpenCoder discover patterns before implementing unfamiliar code?
Location: evals/agents/ContextScout/tests/
| Test | Purpose | Expected Behavior |
|---|---|---|
| 01-code-standards | NEW: Basic discovery | Finds code-quality.md |
| 02-domain-specific | NEW: Domain-specific search | Finds eval framework context |
| 03-bad-request | NEW: Error handling | Handles invalid queries gracefully |
| 04-multi-domain | NEW: Comprehensive discovery | Finds 4-5 files across domains |
| 05-tool-usage | NEW: Read-only enforcement | Never uses write/edit/bash |
Key Question: Does ContextScout discover context correctly and safely?
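The read-only expectation in test 05 can be illustrated with a small shell check. This is only a sketch, not the eval framework's actual evaluator: `assert_read_only` is a hypothetical helper, and it assumes the tool names used in a session are available as plain text, one per line. The forbidden names (`write`, `edit`, `bash`) come from the table above; the other tool names in the usage note are illustrative.

```shell
# Hypothetical helper (not part of the eval framework).
# Reads tool names on stdin, one per line, and fails if any of the
# tools that test 05 forbids (write, edit, bash) appears.
assert_read_only() {
  local violations
  # -x: match whole lines only, -i: ignore case, -E: allow alternation.
  # `|| true` keeps the assignment from failing when nothing matches.
  violations="$(grep -E -x -i 'write|edit|bash' || true)"
  if [ -n "$violations" ]; then
    echo "read-only violation: $(echo "$violations" | sort -u | tr '\n' ' ')"
    return 1
  fi
  echo "read-only: ok"
}
```

For example, `printf 'read\nglob\n' | assert_read_only` succeeds, while `printf 'read\nedit\n' | assert_read_only` returns a non-zero status and names the offending tool.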
cd evals/framework
# All OpenAgent integration tests
npm run eval:sdk -- --agent=core/openagent --pattern="contextscout-integration/*.yaml"
# All OpenCoder integration tests
npm run eval:sdk -- --agent=core/opencoder --pattern="contextscout-integration/*.yaml"
# All ContextScout functionality tests
npm run eval:sdk -- --agent=ContextScout
# Category A: OpenAgent Integration
npm run eval:sdk -- --agent=core/openagent --pattern="contextscout-integration/*.yaml"
# Category B: OpenCoder Integration
npm run eval:sdk -- --agent=core/opencoder --pattern="contextscout-integration/*.yaml"
# Category C: ContextScout Functionality
npm run eval:sdk -- --agent=ContextScout --pattern="*.yaml"
# OpenAgent: Implicit discovery (NEW)
npm run eval:sdk -- --agent=core/openagent --pattern="04-implicit-discovery.yaml"
# OpenCoder: Pattern discovery (NEW)
npm run eval:sdk -- --agent=core/opencoder --pattern="01-implicit-pattern-discovery.yaml"
# ContextScout: Bad request handling (NEW)
npm run eval:sdk -- --agent=ContextScout --pattern="03-bad-request-handling.yaml"
OpenAgent: 04-implicit-discovery.yaml (NEW)
OpenCoder: 01-implicit-pattern-discovery.yaml (NEW)
ContextScout: 03-bad-request-handling.yaml (NEW)
ContextScout: 05-tool-usage-validation.yaml (NEW)
Conclusion: ContextScout integration is working as designed
Actions:
Symptoms:
Conclusion: Agents aren't using ContextScout proactively
Actions:
Symptoms:
Conclusion: ContextScout search accuracy is poor
Actions:
Symptoms:
Conclusion: Security violation - ContextScout is not read-only
Actions:
Total Tests: 11 (5 OpenAgent + 1 OpenCoder + 5 ContextScout)
New Tests: 8 (created 2026-01-09)
Critical Tests: 4 (implicit discovery, pattern discovery, bad requests, tool usage)
Key Insight: Agents should use ContextScout proactively when they encounter an unfamiliar domain, NOT as a replacement for directly loading well-known context. These tests are designed to check that hypothesis.
Run Command:
cd evals/framework
# Run everything
npm run eval:sdk -- --agent=core/openagent --pattern="contextscout-integration/*.yaml"
npm run eval:sdk -- --agent=core/opencoder --pattern="contextscout-integration/*.yaml"
npm run eval:sdk -- --agent=ContextScout
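When the three commands above are run one after another, a failure in the first suite is easy to miss. The wrapper below is a minimal sketch that runs all three and reports each result. It assumes it is invoked from `evals/framework` and that `npm run eval:sdk` exits non-zero when a suite fails, which this README does not confirm. The function names `run_suite` and `run_contextscout_suites` are hypothetical.

```shell
# Hypothetical wrapper around the three commands listed above.
# Assumption: `npm run eval:sdk` exits non-zero when a suite fails.

run_suite() {
  # $1 = label for the report, remaining args = arguments for eval:sdk
  local label="$1"
  shift
  if npm run eval:sdk -- "$@"; then
    echo "PASS: $label"
  else
    echo "FAIL: $label"
    return 1
  fi
}

run_contextscout_suites() {
  local failed=0
  # Each suite runs even if an earlier one failed.
  run_suite "OpenAgent integration" \
    --agent=core/openagent --pattern="contextscout-integration/*.yaml" || failed=1
  run_suite "OpenCoder integration" \
    --agent=core/opencoder --pattern="contextscout-integration/*.yaml" || failed=1
  run_suite "ContextScout functionality" \
    --agent=ContextScout || failed=1
  return "$failed"
}
```

After sourcing these definitions, calling `run_contextscout_suites` prints one PASS or FAIL line per suite and returns a non-zero status if any suite failed, which makes it usable as a single CI step.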