Subagent Evaluation Tests

Overview

This directory contains evaluation tests for all 14 OpenCode subagents. Each subagent has its own test suite with configuration, smoke tests, and placeholder directories for prompt variants.

Directory Structure

subagents/
├── code/                    # Code-focused subagents (parent: opencoder)
│   ├── build-agent/
│   ├── codebase-pattern-analyst/
│   ├── coder-agent/
│   ├── reviewer/
│   └── tester/
├── core/                    # Core orchestration subagents (parent: openagent)
│   ├── contextscout/
│   ├── documentation/
│   └── task-manager/
├── system-builder/          # System builder subagents (parent: system-builder)
│   ├── agent-generator/
│   ├── command-creator/
│   ├── context-organizer/
│   ├── domain-analyzer/
│   └── workflow-designer/
└── utils/                   # Utility subagents (parent: openagent)
    └── image-specialist/

Each subagent directory contains:

config/config.yaml - Test configuration
tests/smoke-test.yaml - Basic sanity check
prompts/ - Placeholder for model-specific prompt variants
README.md - Subagent-specific documentation
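A small POSIX sh check can confirm that a subagent directory matches this layout. This is a sketch and not part of the eval tooling. The function names check_subagent_layout and check_all_layouts are hypothetical, and the script assumes the two-level subagents/<group>/<subagent>/ layout shown above.

```shell
#!/bin/sh
# Sketch (hypothetical helper): check that one subagent directory has the
# entries listed above. Prints one line per missing entry; returns 1 if any.
check_subagent_layout() {
  dir=$1
  missing=0
  for entry in config/config.yaml tests/smoke-test.yaml README.md; do
    if [ ! -f "$dir/$entry" ]; then
      echo "missing file: $dir/$entry"
      missing=1
    fi
  done
  # prompts/ only holds a .gitkeep for now, so just require the directory.
  if [ ! -d "$dir/prompts" ]; then
    echo "missing directory: $dir/prompts"
    missing=1
  fi
  return "$missing"
}

# Check every <group>/<subagent>/ directory under the given subagents/ root.
check_all_layouts() {
  root=${1:-.}
  status=0
  for agent_dir in "$root"/*/*/; do
    [ -d "$agent_dir" ] || continue   # the glob matched nothing
    check_subagent_layout "${agent_dir%/}" || status=1
  done
  return "$status"
}
```

Run from the subagents/ directory, check_all_layouts . prints nothing and returns 0 when every subagent is complete.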

Subagent Inventory

Code Subagents (Parent: opencoder)

Subagent	Description	Status
build-agent	Type checking and build validation	✅ Smoke test ready
codebase-pattern-analyst	Finding similar implementations	✅ Smoke test ready
coder-agent	Focused coding subtasks	✅ Smoke test ready
reviewer	Code review and quality assurance	✅ Smoke test ready
tester	Test authoring and TDD	✅ Smoke test ready

Core Subagents (Parent: openagent)

Subagent	Description	Status
contextscout	Context search and retrieval	✅ Smoke test ready
documentation	Documentation authoring	✅ Smoke test ready
task-manager	Task breakdown with dependency tracking	✅ Smoke test ready

System Builder Subagents (Parent: system-builder)

Subagent	Description	Status
agent-generator	XML-optimized agent generation	✅ Smoke test ready
command-creator	Custom slash command creation	✅ Smoke test ready
context-organizer	Context file organization	✅ Smoke test ready
domain-analyzer	Domain concept analysis	✅ Smoke test ready
workflow-designer	Workflow definition design	✅ Smoke test ready

Utils Subagents (Parent: openagent)

Subagent	Description	Status
image-specialist	Image editing and analysis	✅ Smoke test ready

Running Tests

Quick Start

# Test a specific subagent (standalone mode)
npm run eval:sdk -- --subagent=coder-agent

# Test via parent agent (delegation mode)
npm run eval:sdk -- --subagent=coder-agent --delegate

# Using Makefile shortcuts
make test-subagent SUBAGENT=coder-agent
make test-subagent-delegate SUBAGENT=coder-agent

Test All Subagents

# Run smoke tests for all code subagents
for agent in build-agent codebase-pattern-analyst coder-agent reviewer tester; do
  npm run eval:sdk -- --subagent=$agent --pattern="**/smoke-test.yaml"
done

# Run smoke tests for all core subagents
for agent in contextscout documentation task-manager; do
  npm run eval:sdk -- --subagent=$agent --pattern="**/smoke-test.yaml"
done

# Run smoke tests for all system-builder subagents
for agent in agent-generator command-creator context-organizer domain-analyzer workflow-designer; do
  npm run eval:sdk -- --subagent=$agent --pattern="**/smoke-test.yaml"
done

# Run smoke test for utils subagent
npm run eval:sdk -- --subagent=image-specialist --pattern="**/smoke-test.yaml"
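The four invocations above can be collapsed into a single pass that runs every smoke test and prints a summary of failures. This is a minimal sketch that assumes the same npm run eval:sdk flags shown above. run_smoke and run_all_smoke_tests are hypothetical helper names, and run_smoke can be redefined to dry-run the loop.

```shell
#!/bin/sh
# Sketch: run the smoke test for all 14 subagents and report failures.
# The list mirrors the inventory tables above.
SUBAGENTS="build-agent codebase-pattern-analyst coder-agent reviewer tester
contextscout documentation task-manager
agent-generator command-creator context-organizer domain-analyzer workflow-designer
image-specialist"

# Runs one subagent's smoke test; redefine this function to dry-run the loop.
run_smoke() {
  npm run eval:sdk -- --subagent="$1" --pattern="**/smoke-test.yaml"
}

# Keeps going after a failure and returns 1 if any subagent failed.
run_all_smoke_tests() {
  failed=""
  for agent in $SUBAGENTS; do
    run_smoke "$agent" || failed="$failed $agent"
  done
  if [ -n "$failed" ]; then
    echo "smoke tests failed:$failed"
    return 1
  fi
  echo "all smoke tests passed"
}
```

Source the file and call run_all_smoke_tests. Because it returns non-zero when any subagent fails, it can gate a CI step. Unlike the separate loops, it also reports all failures in one line.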

Verbose Output

# See full conversation and tool calls
npm run eval:sdk -- --subagent=coder-agent --verbose

Test Modes

Standalone Mode (Default)

Forces mode: primary in the subagent file
Tests subagent directly without parent delegation
Useful for isolated functionality testing
Use case: Unit testing subagent behavior

Delegation Mode

Tests subagent via its parent agent
Simulates real-world usage patterns
Parent agent delegates to subagent
Use case: Integration testing
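The sketch below illustrates what "forces mode: primary" amounts to. The eval framework applies this override itself, and this README does not document how. The sketch assumes the subagent definition is a Markdown file whose YAML frontmatter, between the first two --- lines, has a top-level mode: key. force_primary_mode is a hypothetical name. It writes a modified copy and leaves the original file untouched.

```shell
#!/bin/sh
# Illustration only (hypothetical helper): copy an agent file ($1) to $2,
# forcing "mode: primary" in its frontmatter. Assumes frontmatter between
# the first two "---" lines; the body is copied unchanged.
force_primary_mode() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; print; next }  # opening delimiter
    in_fm && $0 == "---" {                             # closing delimiter
      if (!seen) print "mode: primary"                 # add the key if absent
      in_fm = 0; print; next
    }
    in_fm && /^mode:/ { print "mode: primary"; seen = 1; next }  # override
    { print }                                          # everything else as-is
  ' "$1" > "$2"
}
```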

Test Suites

Each subagent supports three test suites:

smoke - Basic sanity checks (✅ Implemented)
standalone - Isolated functionality tests (🚧 TODO)
delegation - Parent delegation scenarios (🚧 TODO)

Adding New Tests

Navigate to the subagent's tests/ directory
Create a new YAML test file following the schema:

id: subagent-name-test-id
name: "Test Name"
description: |
  Test description
category: developer

prompts:
  - text: |
      Your test prompt here

approvalStrategy:
  type: auto-approve  # or manual-approve

behavior:
  mustUseTools:
    - tool-name
  minToolCalls: 1
  maxToolCalls: 5

expectedViolations:
  - rule: approval-gate
    shouldViolate: false
    severity: error

timeout: 60000

tags:
  - subagent
  - subagent-name
  - test-suite-name

Update the subagent's README.md
Run the test to verify that it passes
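The first two steps can be scripted. The helper below is hypothetical and not part of the eval tooling. It writes tests/<test-id>.yaml from the schema above, fills in the id and tags, and leaves TODO placeholders for the name, description, and prompt. The file-naming convention is an assumption, since this README documents only smoke-test.yaml.

```shell
#!/bin/sh
# Hypothetical helper: scaffold a test file from the schema above.
# Usage: new_subagent_test <subagent-dir> <test-id> <suite>
# Prints the new file's path; refuses to overwrite an existing file.
new_subagent_test() {
  dir=$1
  test_id=$2
  suite=$3
  agent=$(basename "$dir")
  file="$dir/tests/$test_id.yaml"
  if [ -e "$file" ]; then
    echo "refusing to overwrite $file" >&2
    return 1
  fi
  mkdir -p "$dir/tests"
  cat > "$file" <<EOF
id: $agent-$test_id
name: "TODO: test name"
description: |
  TODO: describe what this test checks
category: developer

prompts:
  - text: |
      TODO: test prompt

approvalStrategy:
  type: auto-approve

behavior:
  mustUseTools:
    - tool-name
  minToolCalls: 1
  maxToolCalls: 5

expectedViolations:
  - rule: approval-gate
    shouldViolate: false
    severity: error

timeout: 60000

tags:
  - subagent
  - $agent
  - $suite
EOF
  echo "$file"
}
```

For example, new_subagent_test code/coder-agent refactor-basic standalone would create code/coder-agent/tests/refactor-basic.yaml with the id coder-agent-refactor-basic. The test ID refactor-basic is illustrative.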

Prompt Variants

The prompts/ directory in each subagent is reserved for model-specific prompt variants:

prompts/
├── gpt.md          # GPT-optimized prompts
├── gemini.md       # Gemini-optimized prompts
├── llama.md        # Llama-optimized prompts
├── grok.md         # Grok-optimized prompts
└── README.md       # Variant documentation

Status: 🚧 Not yet implemented (directories created with .gitkeep)

File Statistics

Total subagents: 14
Config files: 14 (config.yaml)
Smoke tests: 14 (smoke-test.yaml)
READMEs: 14
Prompt directories: 14 (with .gitkeep)
Total files: 56
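The total is 14 subagents × 4 files (config.yaml, smoke-test.yaml, README.md, .gitkeep) = 56. The counts can be re-derived from a checkout with find and wc. This is a sketch with two assumptions: it is run against the subagents/ root, and only files inside the per-subagent directories (subagents/<group>/<subagent>/) are counted, which excludes this top-level README. count_files and subagent_file_stats are hypothetical names.

```shell
#!/bin/sh
# Sketch: recount the files summarized above. Only files inside
# <root>/<group>/<subagent>/ are counted, so the top-level README is excluded.
count_files() {
  # $1 = subagents/ root; $2 = optional file name to match (e.g. README.md)
  if [ -n "${2:-}" ]; then
    find "$1"/*/*/ -type f -name "$2"
  else
    find "$1"/*/*/ -type f
  fi | wc -l | tr -d ' '
}

subagent_file_stats() {
  root=${1:-.}
  echo "config.yaml: $(count_files "$root" config.yaml)"
  echo "smoke-test.yaml: $(count_files "$root" smoke-test.yaml)"
  echo "README.md: $(count_files "$root" README.md)"
  echo ".gitkeep: $(count_files "$root" .gitkeep)"
  echo "total files: $(count_files "$root")"
}
```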

Next Steps

Immediate Priorities

Run Smoke Tests - Verify all 14 smoke tests pass

# Test each subagent
npm run eval:sdk -- --subagent=coder-agent
npm run eval:sdk -- --subagent=task-manager
# ... etc

Create Standalone Test Suite - Add comprehensive standalone tests
- Focus on core functionality
- Test tool usage patterns
- Validate approval gates
- Check context loading
Create Delegation Test Suite - Add parent delegation tests
- Test real-world delegation scenarios
- Validate parent → subagent communication
- Check handoff patterns
Add Prompt Variants - Create model-specific prompts
- GPT optimizations
- Gemini optimizations
- Llama optimizations

Long-term Goals

Achieve 100% smoke test pass rate
Create comprehensive standalone test coverage
Create comprehensive delegation test coverage
Add performance benchmarking
Add multi-model testing
Document best practices per subagent
Create test templates for common patterns

Contributing

When adding tests for subagents:

Follow the existing directory structure
Use descriptive test IDs and names
Add appropriate tags for filtering
Document expected behavior
Update the subagent's README
Run tests locally before committing
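This README does not describe the framework's own tag filtering, so the following is a hypothetical stand-in for locating tests by tag. It lists YAML files under tests/ directories whose top-level tags: list contains a given tag. List items under other keys, such as mustUseTools, are ignored. find_tests_by_tag is an illustrative name.

```shell
#!/bin/sh
# Hypothetical helper: list test files whose top-level "tags:" list has TAG.
# Usage: find_tests_by_tag <tag> [subagents-root]
find_tests_by_tag() {
  tag=$1
  root=${2:-.}
  find "$root" -type f -path '*/tests/*' -name '*.yaml' -exec awk -v tag="$tag" '
    FNR == 1   { in_tags = 0 }        # reset state at the start of each file
    /^tags:/   { in_tags = 1; next }  # entering the top-level tags list
    /^[^ \t#]/ { in_tags = 0 }        # any other top-level key ends it
    in_tags {
      item = $0
      sub(/^[ \t]*-[ \t]*/, "", item) # drop the "  - " list prefix
      sub(/[ \t]+$/, "", item)        # drop trailing whitespace
      if (item == tag) print FILENAME
    }
  ' {} + | sort -u
}
```

For example, find_tests_by_tag task-manager . run from subagents/ would list every test tagged with that subagent's name.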

Support

For issues or questions:

See SUBAGENT_TESTING.md for the detailed testing guide
Check individual subagent READMEs for specific documentation
Review evals/README.md for framework documentation
