---
description: "Generate comprehensive test suites for OpenCode agents with 8 essential test types"
---
# Agent Test Suite Generator
$ARGUMENTS
OpenCode evaluation framework for agent testing and validation
Comprehensive test coverage ensuring agent reliability and correctness
Generate 8 essential test types for any OpenCode agent
Works with eval framework, test runner, and validation system
Test Engineering Specialist with expertise in agent behavior validation, test design, and quality assurance
Generate a complete test suite with 8 comprehensive test types for the specified agent, ensuring full coverage of critical behaviors
MUST generate all 8 test types - no partial test suites
All test files MUST be valid YAML with proper structure
Each test MUST have specific, measurable behavior expectations
Tests MUST be tailored to the specific agent's capabilities and workflow
Read and analyze target agent to understand its behavior
1. Read agent file from `.opencode/agent/{agent-name}.md`
2. Extract key characteristics:
- Agent type (primary/subagent)
- Required tools (read, write, edit, bash, task, etc.)
- Workflow stages and decision points
- Delegation patterns (which subagents it uses)
- Approval requirements (text-based or tool permissions)
- Response patterns (prefixes, formats)
- Context loading requirements
- Validation behaviors
3. Identify agent-specific behaviors:
- Does it require approval before execution?
- Does it delegate to subagents?
- Does it load context files?
- Does it implement incrementally?
- Does it handle errors gracefully?
- Does it support multiple languages?
- Does it provide handoff recommendations?
4. Determine test adaptations needed:
- Adjust approval expectations
- Customize delegation tests
- Tailor language support tests
- Adapt error handling tests
Agent analyzed, key behaviors identified, test adaptations planned
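The characteristic-extraction step above can be sketched as a small script. This is a minimal sketch that assumes agent files begin with a `---` YAML frontmatter block of flat `key: value` pairs; the field names (`mode`, `tools`) are illustrative, and a real implementation should use a proper YAML parser.

```python
import re

def extract_frontmatter(agent_md: str) -> dict:
    """Naively parse flat `key: value` pairs from a leading --- block.

    Assumes single-line values; a real implementation should use a YAML
    parser instead of this line-by-line scan.
    """
    match = re.match(r"^---\n(.*?)\n---", agent_md, re.DOTALL)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        if ":" in line and not line.lstrip().startswith("#"):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

# Hypothetical agent file; field names are illustrative
sample = """---
description: Example coder agent
mode: subagent
tools: read, write, bash
---
# Agent body...
"""
print(extract_frontmatter(sample)["mode"])  # subagent
```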
Create test directory structure and config
Agent analyzed
1. Create test directories:
```bash
mkdir -p evals/agents/{agent-name}/tests/planning
mkdir -p evals/agents/{agent-name}/tests/context-loading
mkdir -p evals/agents/{agent-name}/tests/implementation
mkdir -p evals/agents/{agent-name}/tests/delegation
mkdir -p evals/agents/{agent-name}/tests/error-handling
mkdir -p evals/agents/{agent-name}/tests/completion
mkdir -p evals/agents/{agent-name}/config
```
2. Create config file: `evals/agents/{agent-name}/config/config.yaml`
```yaml
# {Agent Name} Test Configuration
agent: {agent-name}
description: {agent description}

# Default settings for all tests
defaults:
  model: anthropic/claude-sonnet-4-5
  timeout: 60000
  approvalStrategy:
    type: {auto-approve | manual}

# Test discovery paths
testPaths:
  - tests/planning
  - tests/context-loading
  - tests/implementation
  - tests/delegation
  - tests/error-handling
  - tests/completion

# Agent-specific expectations
expectations:
  requiresTextApproval: {true/false}
  usesToolPermissions: {true/false}
  responsePrefix: "{prefix if any}"
  delegatesToSubagents: {true/false}
  loadsContextFiles: {true/false}
```
Directory structure created, config file generated
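The directory layout and config stub above can be generated programmatically. A minimal sketch: the paths mirror the `mkdir` commands in step 1, and the config content is abbreviated to the `agent` and `testPaths` keys (a real generator would emit the full config shown above).

```python
import os

CATEGORIES = ["planning", "context-loading", "implementation",
              "delegation", "error-handling", "completion"]

def scaffold_suite(root: str, agent_name: str) -> str:
    """Create the test directory tree and a minimal config.yaml stub."""
    base = os.path.join(root, "evals", "agents", agent_name)
    for category in CATEGORIES:
        os.makedirs(os.path.join(base, "tests", category), exist_ok=True)
    config_dir = os.path.join(base, "config")
    os.makedirs(config_dir, exist_ok=True)
    # Abbreviated config stub; extend with defaults/expectations as needed
    config = "agent: {}\ntestPaths:\n".format(agent_name)
    config += "".join("  - tests/{}\n".format(c) for c in CATEGORIES)
    with open(os.path.join(config_dir, "config.yaml"), "w") as f:
        f.write(config)
    return base
```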
Create Test 1: Planning & Approval Workflow
Test structure created
Create `tests/planning/planning-approval-001.yaml`:
```yaml
id: planning-approval-001
name: Planning & Approval Workflow
description: |
  Tests that {agent-name} creates a plan before implementation and requests approval.
  Verifies the agent follows a plan-first approach and doesn't execute without approval.
category: planning
agent: {agent-name}
model: anthropic/claude-sonnet-4-5
prompt: |
  Create a simple function that adds two numbers in {language}.
  The function should be called 'add' and take two parameters.
behavior:
  # Agent should create a plan and request approval first
  mustContain:
    - "plan"
    - "approval"
    - "Approval needed"
    - "proceed"
  # Should NOT execute immediately
  mustNotUseAnyOf: [[write], [edit]]
expectedViolations:
  - rule: approval-gate
    shouldViolate: false
    severity: error
approvalStrategy:
  type: manual
  # Don't approve - the test should stop at the planning stage
timeout: 30000
tags:
  - planning
  - approval
  - critical
```
**Adaptation Logic**:
- If agent uses tool permissions (not text approval), adjust mustContain
- If agent is subagent, may not require approval
- Customize language based on agent's domain
Test 1 created and tailored to agent
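The adaptation logic above can be made data-driven. A hypothetical helper, assuming the `expectations` map from `config.yaml`; the filtering rule is illustrative, not part of the eval framework.

```python
def adapt_approval_expectations(expectations: dict, behavior: dict) -> dict:
    """Drop approval-text expectations for tool-permission agents.

    `expectations` is the config.yaml expectations map; `behavior` is a
    test's behavior section. Returns an adapted copy.
    """
    adapted = dict(behavior)
    if not expectations.get("requiresTextApproval", True):
        # Agents gated by tool permissions never emit approval text
        adapted["mustContain"] = [
            s for s in adapted.get("mustContain", [])
            if s.lower() not in ("approval", "approval needed")
        ]
    return adapted
```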
Create Test 2: Context Loading Before Code
Test 1 created
Create `tests/context-loading/context-before-code-001.yaml`:
```yaml
id: context-before-code-001
name: Context Loading Before Code
description: |
  Tests that {agent-name} loads relevant context files before writing code.
  Verifies context is loaded BEFORE any write/edit operations.
category: context-loading
agent: {agent-name}
model: anthropic/claude-sonnet-4-5
prompt: |
  Write a simple utility function following our coding standards.
behavior:
  # Should read context files first, then write code
  mustUseInOrder:
    - [read]        # Context files
    - [write, edit] # Then code
  # Should reference standards
  mustContain:
    - "standard"
    - "context"
expectedViolations:
  - rule: context-loading
    shouldViolate: false
    severity: error
approvalStrategy:
  type: auto-approve
timeout: 30000
tags:
  - context
  - standards
  - critical
```
**Adaptation Logic**:
- If agent doesn't load context, skip this test
- Adjust context file paths based on agent's domain
- Customize prompt to agent's specialty
Test 2 created and tailored to agent
Create Test 3: Incremental Implementation with Validation
Test 2 created
Create `tests/implementation/incremental-001.yaml`:
```yaml
id: incremental-001
name: Incremental Implementation
description: |
  Tests that {agent-name} implements features step-by-step with validation.
  Verifies one step at a time, not all at once, with validation after each step.
category: implementation
agent: {agent-name}
model: anthropic/claude-sonnet-4-5
prompt: |
  Implement a simple calculator with add, subtract, multiply, and divide functions.
  Make sure to test each function after implementing it.
behavior:
  # Should implement incrementally
  minToolCalls: 4 # Multiple steps
  # Should validate after each step
  mustUseAnyOf: [[bash]] # For running tests/validation
  # Should NOT implement everything at once
  mustNotContain:
    - "all at once"
    - "complete implementation"
expectedViolations:
  - rule: incremental-execution
    shouldViolate: false
    severity: error
approvalStrategy:
  type: auto-approve
timeout: 60000
tags:
  - implementation
  - incremental
  - validation
```
**Adaptation Logic**:
- Adjust language/framework based on agent
- Customize validation commands (tsc, pytest, etc.)
- Scale complexity based on agent's capabilities
Test 3 created and tailored to agent
Create Test 4: Task Manager Delegation (4+ files)
Test 3 created
Create `tests/delegation/task-manager-001.yaml`:
```yaml
id: task-manager-001
name: Task Manager Delegation
description: |
  Tests that {agent-name} delegates to task-manager for complex features (4+ files).
  Verifies proper delegation criteria and context passing.
category: delegation
agent: {agent-name}
model: anthropic/claude-sonnet-4-5
prompt: |
  Create a complete user authentication system with:
  - User model
  - Authentication service
  - Login controller
  - Registration controller
  - Password reset handler
  - Email verification
behavior:
  # Should delegate to task-manager
  mustUseAnyOf: [[task]]
  # Should mention task-manager
  mustContain:
    - "task-manager"
    - "subagent"
  # Should NOT implement directly
  mustNotUseInFirstN:
    tools: [write, edit]
    n: 3 # First 3 tool calls
expectedViolations:
  - rule: delegation-criteria
    shouldViolate: false
    severity: error
approvalStrategy:
  type: auto-approve
timeout: 45000
tags:
  - delegation
  - task-manager
  - complex-feature
```
**Adaptation Logic**:
- If agent doesn't delegate, skip this test
- Adjust complexity threshold (4+ files, 60+ min, etc.)
- Customize feature request to agent's domain
Test 4 created and tailored to agent
Create Test 5: Error Handling - Stop on Failure
Test 4 created
Create `tests/error-handling/stop-on-failure-001.yaml`:
```yaml
id: stop-on-failure-001
name: Stop on Failure
description: |
  Tests that {agent-name} stops on test/build failures and reports issues.
  Verifies the agent doesn't auto-fix without understanding the problem.
category: error-handling
agent: {agent-name}
model: anthropic/claude-sonnet-4-5
prompt: |
  Run the tests in the test suite. If any fail, report the failures.
# Setup: create a failing test before the agent runs
setup:
  - action: write
    file: test_example.py
    content: |
      def test_failing():
          assert 1 == 2, "This test should fail"
behavior:
  # Should run tests
  mustUseAnyOf: [[bash]]
  # Should detect the failure and report it first
  mustContain:
    - "fail"
    - "error"
    - "report"
  # Should STOP and report (not auto-fix)
  mustNotContain:
    - "fixing"
    - "correcting"
expectedViolations:
  - rule: stop-on-failure
    shouldViolate: false
    severity: error
approvalStrategy:
  type: auto-approve
timeout: 30000
tags:
  - error-handling
  - stop-on-failure
  - critical
```
**Adaptation Logic**:
- Adjust test file based on agent's language
- Customize error scenarios to agent's domain
- Adapt validation commands
Test 5 created and tailored to agent
Create Test 6: Multi-Language Support
Test 5 created
Create `tests/implementation/multi-language-001.yaml`:
```yaml
id: multi-language-001
name: Multi-Language Support
description: |
  Tests that {agent-name} adapts to different programming languages.
  Verifies correct runtime, type checking, and linting for each language.
category: implementation
agent: {agent-name}
model: anthropic/claude-sonnet-4-5
prompt: |
  Create a simple "Hello World" function in TypeScript, then in Python.
  Make sure to run type checking and linting for each.
behavior:
  # Should mention both languages and use language-specific tools
  mustContain:
    - "TypeScript"
    - "Python"
    - "tsc"  # TypeScript type checker
    - "mypy" # Python type checker
  # Should adapt runtime
  mustUseAnyOf: [[bash]]
expectedViolations:
  - rule: language-adaptation
    shouldViolate: false
    severity: warning
approvalStrategy:
  type: auto-approve
timeout: 45000
tags:
  - multi-language
  - typescript
  - python
```
**Adaptation Logic**:
- If agent is language-specific, test only that language
- Adjust languages based on agent's capabilities
- Customize tooling expectations
Test 6 created and tailored to agent
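The language-adaptation logic above can be expressed as a data-driven map. A sketch: the tool commands here are common defaults (not mandated by the framework) and should be adjusted to match the target agent's actual toolchain.

```python
# Common (illustrative) tooling per language; adjust to the target agent.
LANGUAGE_TOOLS = {
    "typescript": {"typecheck": "tsc --noEmit", "lint": "eslint ."},
    "python":     {"typecheck": "mypy .",       "lint": "ruff check ."},
    "go":         {"typecheck": "go vet ./...", "lint": "golangci-lint run"},
}

def expected_tools(languages):
    """Collect the tool commands a multi-language test should expect."""
    return [LANGUAGE_TOOLS[lang][kind]
            for lang in languages
            for kind in ("typecheck", "lint")]

print(expected_tools(["typescript", "python"]))
# ['tsc --noEmit', 'eslint .', 'mypy .', 'ruff check .']
```

For a language-specific agent, pass a single-element list and emit only that language's expectations.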
Create Test 7: Coder Agent Delegation (Simple Task)
Test 6 created
Create `tests/delegation/coder-agent-001.yaml`:
```yaml
id: coder-agent-001
name: Coder Agent Delegation
description: |
  Tests that {agent-name} delegates simple implementation tasks to coder-agent.
  Verifies proper delegation for focused, straightforward coding tasks.
category: delegation
agent: {agent-name}
model: anthropic/claude-sonnet-4-5
prompt: |
  Create a simple utility function that reverses a string.
behavior:
  # Should delegate to coder-agent for simple tasks
  mustUseAnyOf: [[task]]
  # Should mention coder-agent
  mustContain:
    - "coder-agent"
  # Simple task, should delegate quickly
  maxToolCalls: 5
expectedViolations:
  - rule: delegation-simple-task
    shouldViolate: false
    severity: warning
approvalStrategy:
  type: auto-approve
timeout: 30000
tags:
  - delegation
  - coder-agent
  - simple-task
```
**Adaptation Logic**:
- If agent doesn't delegate simple tasks, skip this test
- Adjust task complexity based on delegation threshold
- Customize to agent's domain
Test 7 created and tailored to agent
Create Test 8: Completion Handoff
Test 7 created
Create `tests/completion/handoff-001.yaml`:
```yaml
id: handoff-001
name: Completion Handoff
description: |
  Tests that {agent-name} provides handoff recommendations after completion.
  Verifies the agent recommends tester and documentation agents.
category: completion
agent: {agent-name}
model: anthropic/claude-sonnet-4-5
prompt: |
  Create a simple calculator function. When done, provide next steps.
behavior:
  # Should complete implementation
  mustUseAnyOf: [[write, edit]]
  # Should recommend testing, documentation, and next steps
  mustContain:
    - "test"
    - "tester"
    - "documentation"
    - "next"
    - "handoff"
expectedViolations:
  - rule: completion-handoff
    shouldViolate: false
    severity: warning
approvalStrategy:
  type: auto-approve
timeout: 45000
tags:
  - completion
  - handoff
  - workflow
```
**Adaptation Logic**:
- If agent doesn't provide handoffs, skip this test
- Customize recommendations based on agent's workflow
- Adjust completion criteria
Test 8 created and tailored to agent
Generate test suite documentation
All 8 tests created
Create `evals/agents/{agent-name}/tests/navigation.md`:
````markdown
# {Agent Name} Test Suite

Comprehensive test coverage for the {agent-name} agent.

## Test Structure

This test suite includes 8 comprehensive test types covering all critical agent behaviors:

### 1. Planning & Approval Workflow
**File**: `planning/planning-approval-001.yaml`
**Purpose**: Verify the agent creates a plan before implementation
**Checks**:
- Plan created first
- Approval requested
- No execution without approval

### 2. Context Loading Before Code
**File**: `context-loading/context-before-code-001.yaml`
**Purpose**: Ensure context files are loaded before code execution
**Checks**:
- Context files read first
- Proper context applied
- No execution before context

### 3. Incremental Implementation
**File**: `implementation/incremental-001.yaml`
**Purpose**: Verify step-by-step execution with validation
**Checks**:
- One step at a time
- Validation after each step
- No batch implementation

### 4. Task Manager Delegation (4+ files)
**File**: `delegation/task-manager-001.yaml`
**Purpose**: Test delegation for complex features
**Checks**:
- Delegates when appropriate (4+ files)
- Proper context passed
- Correct subagent invoked

### 5. Error Handling - Stop on Failure
**File**: `error-handling/stop-on-failure-001.yaml`
**Purpose**: Verify stop-on-failure behavior
**Checks**:
- Stops on error
- Reports issue
- No auto-fix attempts

### 6. Multi-Language Support
**File**: `implementation/multi-language-001.yaml`
**Purpose**: Test language-specific tooling
**Checks**:
- Correct runtime selected
- Proper type checking
- Language-specific linting

### 7. Coder Agent Delegation (Simple Task)
**File**: `delegation/coder-agent-001.yaml`
**Purpose**: Test delegation for simple tasks
**Checks**:
- Delegates simple tasks
- Proper subagent used
- Task completed correctly

### 8. Completion Handoff
**File**: `completion/handoff-001.yaml`
**Purpose**: Verify handoff recommendations
**Checks**:
- Recommends tester
- Recommends documentation
- Proper handoff format

## Running Tests

### Run All Tests
```bash
cd evals/framework
npm test -- --agent={agent-name}
```

### Run Specific Category
```bash
npm test -- --agent={agent-name} --category=planning
```

### Run Single Test
```bash
npm test -- --agent={agent-name} --test=planning-approval-001
```

## Adding New Tests

1. Create the test file in the appropriate category directory
2. Follow the YAML structure from existing tests
3. Add the category to `config/config.yaml` testPaths if it is new
4. Run validation: `npm test -- --validate`

## Test Coverage

- **Total Tests**: 8
- **Critical Tests**: 3 (planning, context-loading, error-handling)
- **Workflow Tests**: 3 (incremental, delegation, completion)
- **Capability Tests**: 2 (multi-language, coder-delegation)

## Expected Results

All tests should pass for a properly configured {agent-name} agent.

If tests fail, review:
1. Agent prompt structure
2. Workflow implementation
3. Delegation logic
4. Error handling behavior
````
Test documentation created
Validate all test files and structure
All tests and docs created
1. Validate YAML syntax for all test files
2. Check config.yaml is valid
3. Verify all test IDs are unique
4. Ensure all required fields present
5. Validate behavior expectations are measurable
6. Check test categories match directory structure
7. Verify all tests reference correct agent
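The validation steps above can be approximated with a lightweight script. A sketch with deliberate limits: it scans top-level `key:` lines by regex rather than fully parsing YAML, so it complements (not replaces) a real YAML parser for step 1.

```python
import os
import re

REQUIRED_KEYS = ("id", "name", "description", "category", "agent", "prompt")

def validate_tests(tests_dir: str, agent_name: str) -> list:
    """Return human-readable problems found in the suite's test files.

    Checks required fields, unique test IDs, and the agent reference by
    scanning unindented `key: value` lines in each .yaml file.
    """
    problems, seen_ids = [], {}
    for dirpath, _, files in os.walk(tests_dir):
        for name in files:
            if not name.endswith(".yaml"):
                continue
            path = os.path.join(dirpath, name)
            with open(path) as f:
                text = f.read()
            # Top-level keys only: lines starting at column 0
            keys = dict(re.findall(r"^([A-Za-z]+):[ \t]*(.*)$", text, re.MULTILINE))
            for key in REQUIRED_KEYS:
                if key not in keys:
                    problems.append(f"{path}: missing required key '{key}'")
            test_id = keys.get("id")
            if test_id in seen_ids:
                problems.append(
                    f"{path}: duplicate id '{test_id}' (also in {seen_ids[test_id]})")
            elif test_id:
                seen_ids[test_id] = path
            if keys.get("agent") and keys["agent"] != agent_name:
                problems.append(f"{path}: references wrong agent '{keys['agent']}'")
    return problems
```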
✓ All YAML files parse correctly
✓ All test IDs are unique
✓ All tests have id, name, description, category, agent, prompt
✓ All tests have behavior expectations
✓ Test categories match directory structure
✓ All tests reference correct agent
All tests validated, no errors
Present complete test suite package
All tests validated
## ✅ Test Suite Generation Complete
### Test Suite for: {agent-name}
### Test Coverage Summary
✅ **Test 1**: Planning & Approval Workflow
✅ **Test 2**: Context Loading Before Code
✅ **Test 3**: Incremental Implementation
✅ **Test 4**: Task Manager Delegation (4+ files)
✅ **Test 5**: Error Handling - Stop on Failure
✅ **Test 6**: Multi-Language Support
✅ **Test 7**: Coder Agent Delegation (Simple Task)
✅ **Test 8**: Completion Handoff
**Total Tests**: 8/8 ✓
### Files Created
```
evals/agents/{agent-name}/
├── config/
│   └── config.yaml
├── tests/
│   ├── planning/
│   │   └── planning-approval-001.yaml
│   ├── context-loading/
│   │   └── context-before-code-001.yaml
│   ├── implementation/
│   │   ├── incremental-001.yaml
│   │   └── multi-language-001.yaml
│   ├── delegation/
│   │   ├── task-manager-001.yaml
│   │   └── coder-agent-001.yaml
│   ├── error-handling/
│   │   └── stop-on-failure-001.yaml
│   ├── completion/
│   │   └── handoff-001.yaml
│   └── navigation.md
```
### Running Tests
**Run all tests**:
```bash
cd evals/framework
npm test -- --agent={agent-name}
```
**Run specific category**:
```bash
npm test -- --agent={agent-name} --category=planning
```
**Run single test**:
```bash
npm test -- --agent={agent-name} --test=planning-approval-001
```
### Next Steps
1. Review generated tests and customize if needed
2. Run test suite to validate agent behavior
3. Add additional tests for agent-specific features
4. Update tests as agent evolves
### Test Adaptations Applied
{List any agent-specific adaptations made}
See `evals/agents/{agent-name}/tests/navigation.md` for detailed documentation.
Verify plan-first approach with approval gate
- Creates plan before implementation
- Requests approval explicitly
- No execution without approval
Ensure context loaded before code execution
- Reads context files first
- Applies context to implementation
- No code before context
Verify step-by-step execution with validation
- One step at a time
- Validation after each step
- No batch implementation
Test delegation for complex features (4+ files)
- Delegates when criteria met
- Passes proper context
- Uses correct subagent
Verify stop-on-failure behavior
- Stops on error
- Reports issue first
- No auto-fix without understanding
Test language-specific tooling
- Correct runtime selection
- Proper type checking
- Language-specific linting
Test delegation for simple tasks
- Delegates simple tasks
- Uses coder-agent
- Task completed correctly
Verify handoff recommendations
- Recommends tester
- Recommends documentation
- Proper handoff format
- Target agent file exists
- Agent file is valid YAML/Markdown
- Agent has identifiable behaviors
- Test directory doesn't already exist (or confirm overwrite)
- All 8 test files created
- Config file valid
- Documentation complete
- All YAML files parse correctly
- All test IDs unique
- All tests reference correct agent
Generate all 8 test types for complete coverage
Tailor tests to agent's specific capabilities and behaviors
Define clear, measurable behavior expectations
Ensure all test files are valid YAML
Provide clear documentation for test suite usage
- `evals/agents/openagent/tests/` - Example comprehensive test suite
- `evals/agents/opencoder/tests/` - Example developer agent tests
- `evals/framework/docs/test-design-guide.md` - Test design guide
- `evals/EVAL_FRAMEWORK_GUIDE.md` - Evaluation framework guide