Created: November 22, 2025
Purpose: Validate OpenAgent follows workflows defined in openagent.md
Approach: Simple, focused tests for core workflow compliance
| Test ID | File | Workflow Tested | Status |
|---|---|---|---|
| task-simple-001 | developer/task-simple-001.yaml | Analyze → Approve → Execute → Validate | ✅ Created |
| ctx-code-001 | developer/ctx-code-001.yaml | Execute → Load Context (code.md) | ✅ Created |
| ctx-docs-001 | developer/ctx-docs-001.yaml | Execute → Load Context (docs.md) | ✅ Created |
| fail-stop-001 | developer/fail-stop-001.yaml | Validate → Stop on Failure | ✅ Created |
| conv-simple-001 | business/conv-simple-001.yaml | Conversational Path (no approval) | ✅ Created |
| Test ID | File | Purpose | Status |
|---|---|---|---|
| shared-approval-001 | shared/tests/common/approval-gate-basic.yaml | Universal approval gate test | ✅ Created |
| File | Purpose | Status |
|---|---|---|
| evals/agents/shared/README.md | Shared tests guide | ✅ Created |
| evals/opencode/AGENT_TESTING_GUIDE.md | Agent-agnostic architecture guide | ✅ Created |
| evals/SIMPLE_TEST_PLAN.md | Simple test plan | ✅ Already exists |
| Workflow Stage | Rule | Tests Before | Tests After | Gap Closed |
|---|---|---|---|---|
| Analyze | Path detection | 1 | 2 | +1 |
| Approve | Approval gate | 2 | 3 | +1 |
| Execute → Load Context | Context loading | 0 | 2 | +2 |
| Validate | Stop on failure | 0 | 1 | +1 |
| Confirm | Cleanup | 0 | 0 | 0 |
Progress: 4/13 gaps closed (31% improvement)
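
The progress figure can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes "gaps closed" counts workflow stages whose test count increased; the total of 13 gaps is taken from the progress line and cannot be derived from the table itself.

```typescript
// Rows copied from the coverage table above.
const coverage = [
  { stage: "Analyze", before: 1, after: 2 },
  { stage: "Approve", before: 2, after: 3 },
  { stage: "Execute → Load Context", before: 0, after: 2 },
  { stage: "Validate", before: 0, after: 1 },
  { stage: "Confirm", before: 0, after: 0 },
];

// Total gap count as stated in the progress line (not derivable from the table).
const TOTAL_GAPS = 13;

// Assumption: a gap counts as closed when a stage gained at least one test.
const stagesImproved = coverage.filter((row) => row.after > row.before).length;

// Number of individual tests added across all stages.
const testsAdded = coverage.reduce((sum, row) => sum + (row.after - row.before), 0);

// Percentage of gaps closed, rounded to the nearest whole number.
const percentClosed = Math.round((stagesImproved / TOTAL_GAPS) * 100);
```

Under that assumption, four stages improved and five tests were added in total, which matches the 4/13 (31%) figure.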
File: developer/task-simple-001.yaml
Tests:
Expected Behavior:
User: "Run npm install"
Agent: "I'll run npm install. Should I proceed?" ← Asks approval
User: [Approves]
Agent: [Executes bash] → Reports result
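
The approval-gate expectation above could be checked along these lines. This is a sketch only: the `TimelineEvent` shape, the `EXECUTION_TOOLS` list, and the function name are hypothetical and do not describe the framework's actual evaluator API.

```typescript
// Hypothetical event shape; the real framework's timeline format may differ.
type TimelineEvent =
  | { kind: "assistant_text"; text: string }
  | { kind: "user_approval" }
  | { kind: "tool_call"; tool: string };

// Tools assumed to require approval before use (an assumption for this sketch).
const EXECUTION_TOOLS = new Set(["bash", "write", "edit"]);

// Returns true when every execution tool call is preceded by a user approval.
function approvalGateRespected(events: TimelineEvent[]): boolean {
  let approved = false;
  for (const event of events) {
    if (event.kind === "user_approval") approved = true;
    if (event.kind === "tool_call" && EXECUTION_TOOLS.has(event.tool) && !approved) {
      return false;
    }
  }
  return true;
}

// The "Run npm install" exchange from the expected behavior above.
const npmInstallRun: TimelineEvent[] = [
  { kind: "assistant_text", text: "I'll run npm install. Should I proceed?" },
  { kind: "user_approval" },
  { kind: "tool_call", tool: "bash" },
];
```

Calling `approvalGateRespected(npmInstallRun)` returns `true`; a timeline in which the bash call comes before any approval returns `false`.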
Rules Tested:
File: developer/ctx-code-001.yaml
Tests:
Expected Behavior:
User: "Create a TypeScript function"
Agent: "I'll create the function. Should I proceed?" ← Asks approval
User: [Approves]
Agent: [Reads .opencode/context/core/standards/code.md] ← Loads context
Agent: [Writes code following standards] → Reports result
Rules Tested:
File: developer/ctx-docs-001.yaml
Tests:
Expected Behavior:
User: "Update README with installation steps"
Agent: "I'll update the README. Should I proceed?" ← Asks approval
User: [Approves]
Agent: [Reads .opencode/context/core/standards/docs.md] ← Loads context
Agent: [Edits README following standards] → Reports result
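
Both context-loading tests above share one ordering rule: the standards file must be read before anything is written or edited. A minimal sketch of that check follows, assuming a hypothetical `ToolCall` record; the tool names are illustrative and the two paths are the ones shown in the expected behavior.

```typescript
// Hypothetical tool-call record; the real framework may expose a different shape.
interface ToolCall {
  tool: string;
  path?: string;
}

// Context files named in the two expected-behavior traces above.
const CONTEXT_FILES = {
  code: ".opencode/context/core/standards/code.md",
  docs: ".opencode/context/core/standards/docs.md",
} as const;

// True when contextPath is read and no write or edit happens before that read.
function contextLoadedBeforeChange(calls: ToolCall[], contextPath: string): boolean {
  const readIndex = calls.findIndex((c) => c.tool === "read" && c.path === contextPath);
  const changeIndex = calls.findIndex((c) => c.tool === "write" || c.tool === "edit");
  if (readIndex === -1) return false;
  return changeIndex === -1 || readIndex < changeIndex;
}

// Compliant run: standards are loaded, then the code is written.
const codeRun: ToolCall[] = [
  { tool: "read", path: CONTEXT_FILES.code },
  { tool: "write", path: "src/util.ts" },
];

// Non-compliant run: the README is edited without loading docs.md first.
const docsRunMissingContext: ToolCall[] = [{ tool: "edit", path: "README.md" }];
```

The first sample satisfies the rule for `code.md`; the second fails it for `docs.md` because the context file is never read.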
Rules Tested:
File: developer/fail-stop-001.yaml
Tests:
Expected Behavior:
User: "Run the test suite"
Agent: "I'll run the tests. Should I proceed?" ← Asks approval
User: [Approves]
Agent: [Runs tests] → Tests fail
Agent: STOPS ← Does NOT auto-fix
Agent: "Tests failed with X errors. Here's what I found..." ← Reports
Agent: "I can propose a fix if you'd like." ← Waits for approval
Rules Tested:
Note: This test requires a project with failing tests to properly validate.
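
The stop-on-failure rule can be expressed as a small state check over the agent's steps. The sketch below assumes a hypothetical `Step` type with a `failed` flag on tool calls, and it is deliberately strict: any tool call after a failure, even a read, counts as a violation until the user approves again.

```typescript
// Hypothetical step shape; field names are assumptions for this sketch.
type Step =
  | { kind: "tool_call"; tool: string; failed?: boolean }
  | { kind: "assistant_text"; text: string }
  | { kind: "user_approval" };

// Returns true when, after a failed tool call, the agent makes no further
// tool calls until the user approves again.
function stopsOnFailure(steps: Step[]): boolean {
  let waitingForApproval = false;
  for (const step of steps) {
    if (step.kind === "user_approval") {
      waitingForApproval = false;
      continue;
    }
    if (step.kind !== "tool_call") continue;
    if (waitingForApproval) return false;
    if (step.failed) waitingForApproval = true;
  }
  return true;
}

// Compliant: tests fail, the agent reports and offers a fix, then waits.
const reportsAndWaits: Step[] = [
  { kind: "user_approval" },
  { kind: "tool_call", tool: "bash", failed: true },
  { kind: "assistant_text", text: "Tests failed with X errors. Here's what I found..." },
  { kind: "assistant_text", text: "I can propose a fix if you'd like." },
];

// Non-compliant: the agent edits a file immediately after the failure.
const autoFixes: Step[] = [
  { kind: "user_approval" },
  { kind: "tool_call", tool: "bash", failed: true },
  { kind: "tool_call", tool: "edit" },
];
```

`stopsOnFailure(reportsAndWaits)` returns `true` and `stopsOnFailure(autoFixes)` returns `false`.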
File: business/conv-simple-001.yaml
Tests:
Expected Behavior:
User: "What does the main function do?"
Agent: [Reads src/index.ts] ← No approval needed
Agent: "The main function does X, Y, Z..." ← Answers directly
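
For the conversational path, the check is the inverse of the approval gate: no approval request should appear, and only read-only tools should be used. The following is a sketch under assumptions; the `ConversationEvent` shape and the `READ_ONLY_TOOLS` list are hypothetical.

```typescript
// Hypothetical event shape for a conversational exchange.
type ConversationEvent =
  | { kind: "approval_request" }
  | { kind: "tool_call"; tool: string }
  | { kind: "assistant_text"; text: string };

// Tools assumed to be safe without approval (an assumption for this sketch).
const READ_ONLY_TOOLS = new Set(["read", "glob", "grep"]);

// True when the agent never asks for approval and only uses read-only tools.
function isConversationalPath(events: ConversationEvent[]): boolean {
  return events.every(
    (e) =>
      e.kind !== "approval_request" &&
      (e.kind !== "tool_call" || READ_ONLY_TOOLS.has(e.tool)),
  );
}

// The "What does the main function do?" exchange from the expected behavior above.
const mainFunctionQuestion: ConversationEvent[] = [
  { kind: "tool_call", tool: "read" },
  { kind: "assistant_text", text: "The main function does X, Y, Z..." },
];
```

`isConversationalPath(mainFunctionQuestion)` returns `true`; adding an approval request or a `bash` call to the timeline makes it return `false`.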
Rules Tested:
Framework Layer (Agent-Agnostic)
Agent Layer (Per Agent)
opencode/{agent}/tests/
opencode/{agent}/docs/
agents/shared/tests/
Test Specifies Agent
agent: openagent # Routes to OpenAgent
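
To illustrate how a runner might use this field for routing, here is a minimal sketch that pulls the `agent` value out of a test file's text. The function name is hypothetical, and the regular expression only handles a simple top-level `agent:` line; a real implementation would more likely use a proper YAML parser.

```typescript
// Extracts the value of a top-level `agent:` line, ignoring a trailing comment.
// Returns undefined when the field is absent. Hypothetical helper, not framework API.
function readAgentField(yamlText: string): string | undefined {
  const match = /^agent:\s*([^\s#]+)/m.exec(yamlText);
  return match ? match[1] : undefined;
}

// Example test file content; the `id` line is illustrative.
const sampleTest = "id: task-simple-001\nagent: openagent # Routes to OpenAgent\n";
const targetAgent = readAgentField(sampleTest);
```

Here `targetAgent` is `"openagent"`, which a runner could then use to select the agent under test.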
evals/
├── framework/                    # SHARED - Works with any agent
│   ├── src/sdk/                  # Test runner
│   └── src/evaluators/           # Generic evaluators
│
├── opencode/
│   ├── openagent/                # OpenAgent-specific tests
│   │   ├── tests/
│   │   │   ├── developer/
│   │   │   │   ├── task-simple-001.yaml    ← NEW
│   │   │   │   ├── ctx-code-001.yaml       ← NEW
│   │   │   │   ├── ctx-docs-001.yaml       ← NEW
│   │   │   │   └── fail-stop-001.yaml      ← NEW
│   │   │   └── business/
│   │   │       └── conv-simple-001.yaml    ← NEW
│   │   └── docs/
│   │       └── OPENAGENT_RULES.md
│   │
│   ├── opencoder/                # OpenCoder tests (future)
│   │   └── tests/
│   │
│   └── shared/                   # Tests for ANY agent
│       ├── tests/
│       │   └── common/
│       │       └── approval-gate-basic.yaml    ← NEW
│       └── README.md             ← NEW
│
└── AGENT_TESTING_GUIDE.md        ← NEW
# Run ALL OpenAgent tests
npm run eval:sdk -- --pattern="openagent/**/*.yaml"
# Run specific category
npm run eval:sdk -- --pattern="openagent/developer/*.yaml"
# Run shared tests for OpenAgent
npm run eval:sdk -- --pattern="shared/**/*.yaml" --agent=openagent
# Run single test
npx tsx src/sdk/show-test-details.ts openagent/developer/task-simple-001.yaml
# 1. Create directory
mkdir -p evals/opencode/my-agent/tests/developer
# 2. Copy shared tests
cp evals/agents/shared/tests/common/*.yaml \
evals/opencode/my-agent/tests/developer/
# 3. Update agent field
sed -i 's/agent: openagent/agent: my-agent/g' \
evals/opencode/my-agent/tests/developer/*.yaml
# 4. Run tests
npm run eval:sdk -- --pattern="my-agent/**/*.yaml"
Run the new tests
cd evals/framework
npm run eval:sdk -- --pattern="openagent/developer/task-simple-001.yaml"
npm run eval:sdk -- --pattern="openagent/developer/ctx-code-001.yaml"
npm run eval:sdk -- --pattern="openagent/developer/ctx-docs-001.yaml"
npm run eval:sdk -- --pattern="openagent/business/conv-simple-001.yaml"
Run all new tests together
npm run eval:sdk -- --pattern="openagent/**/*.yaml"
Check results
Add remaining tests (8 more to reach 17 total)
Create test fixtures
Refine evaluators
Add OpenCoder tests
Expand shared tests
opencode/{agent}/
agents/shared/
Created:
Coverage:
Ready to:
Next: