Comprehensive test suite for OpenAgent with focus on context loading, approval workflows, and multi-turn conversations.
Total Tests: 22
Pass Rate: 100% ✅
Last Updated: 2025-11-26
| Category | Tests | Status | Description |
|---|---|---|---|
| context-loading | 5 | ✅ 100% | Context file loading validation |
| developer | 12 | ✅ 100% | Developer workflow tests |
| business | 2 | ✅ 100% | Business analysis tests |
| edge-case | 3 | ✅ 100% | Edge cases and error handling |
5 comprehensive tests validating that OpenAgent loads context files before execution:
| Test | Type | Duration | Status |
|---|---|---|---|
| ctx-simple-testing-approach | Simple | ~38s | ✅ PASS |
| ctx-simple-documentation-format | Simple | ~26s | ✅ PASS |
| ctx-simple-coding-standards | Simple | ~21s | ✅ PASS |
| ctx-multi-standards-to-docs | Complex | ~116s | ✅ PASS |
| ctx-multi-error-handling-to-tests | Complex | ~148s | ✅ PASS |
Total Duration: ~6 minutes for all 5 tests
Pass Rate: 100% (5/5)
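The totals above follow directly from the per-test durations in the table. As a quick arithmetic check, here is a minimal sketch; the `TestResult` shape and the `summarize` helper are hypothetical names used only for illustration and are not part of the eval framework:

```typescript
// Hypothetical result shape; durations are copied from the table above.
interface TestResult {
  id: string;
  seconds: number;
  passed: boolean;
}

const contextLoadingResults: TestResult[] = [
  { id: "ctx-simple-testing-approach", seconds: 38, passed: true },
  { id: "ctx-simple-documentation-format", seconds: 26, passed: true },
  { id: "ctx-simple-coding-standards", seconds: 21, passed: true },
  { id: "ctx-multi-standards-to-docs", seconds: 116, passed: true },
  { id: "ctx-multi-error-handling-to-tests", seconds: 148, passed: true },
];

// Sum the durations and compute the pass rate as a percentage.
function summarize(results: TestResult[]): {
  totalSeconds: number;
  approxMinutes: number;
  passRatePercent: number;
} {
  const totalSeconds = results.reduce((sum, r) => sum + r.seconds, 0);
  const passedCount = results.filter((r) => r.passed).length;
  return {
    totalSeconds,
    approxMinutes: Math.round(totalSeconds / 60),
    passRatePercent:
      results.length === 0 ? 0 : (passedCount / results.length) * 100,
  };
}

// 38 + 26 + 21 + 116 + 148 = 349 seconds, which rounds to 6 minutes.
const summary = summarize(contextLoadingResults);
```

The durations are approximate, so the total is best read as an estimate of a sequential run.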
| Test | Scenario | Context loaded | Tools |
|---|---|---|---|
| ctx-simple-coding-standards | Asks about coding standards | code.md before responding | read |
| ctx-simple-documentation-format | Asks about documentation format | docs.md before responding | read |
| ctx-simple-testing-approach | Asks about testing strategy | | read (multiple files) |
| ctx-multi-standards-to-docs | Standards → Documentation creation | code.md + docs.md before writing | read, write |
| ctx-multi-error-handling-to-tests | Error handling → Test creation | code.md + tests.md before writing | read, write, grep, list, glob |
See: CONTEXT_LOADING_COVERAGE.md for detailed documentation
cd evals/framework
npm run eval:sdk -- --agent=openagent
npm run eval:sdk -- --agent=openagent --pattern="context-loading/*.yaml"
npm run eval:sdk -- --agent=openagent --pattern="context-loading/ctx-simple-coding-standards.yaml"
npm run eval:sdk -- --agent=openagent --pattern="context-loading/*.yaml" --debug
./scripts/utils/run-tests-batch.sh openagent 3 10
# Args: agent, batch_size, delay_seconds
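The batch script itself is not shown here, so the following is only a sketch of what the three arguments suggest: tests run in groups of `batch_size`, with a pause of `delay_seconds` between groups. The helper names `toBatches` and `totalDelaySeconds` and the placeholder test IDs are assumptions for illustration, not the script's actual logic:

```typescript
// Split a list into consecutive groups of at most batchSize items.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Assumption: one pause between each pair of consecutive batches.
function totalDelaySeconds(batchCount: number, delaySeconds: number): number {
  return Math.max(0, batchCount - 1) * delaySeconds;
}

// Placeholder IDs standing in for the 22 OpenAgent tests.
const testIds: string[] = [];
for (let i = 1; i <= 22; i++) {
  testIds.push("test-" + i);
}

// With batch_size 3 and delay 10: 8 batches (seven of 3, one of 1)
// and 7 pauses, i.e. 70 seconds of idle time under this assumption.
const batches = toBatches(testIds, 3);
const idleSeconds = totalDelaySeconds(batches.length, 10);
```

Under these assumptions, a smaller batch size trades longer wall-clock time for fewer concurrent agent sessions.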
tests/
├── context-loading/                    # Context loading tests (NEW)
│   ├── ctx-simple-coding-standards.yaml
│   ├── ctx-simple-documentation-format.yaml
│   ├── ctx-simple-testing-approach.yaml
│   ├── ctx-multi-standards-to-docs.yaml
│   └── ctx-multi-error-handling-to-tests.yaml
│
├── developer/                          # Developer workflow tests
│   ├── ctx-code-001.yaml               # Code task with context
│   ├── ctx-docs-001.yaml               # Docs task with context
│   ├── ctx-tests-001.yaml              # Tests task with context
│   ├── ctx-review-001.yaml             # Review task with context
│   ├── ctx-delegation-001.yaml         # Delegation task
│   ├── ctx-multi-turn-001.yaml         # Multi-turn conversation
│   ├── create-component.yaml           # Component creation
│   ├── install-dependencies.yaml
│   ├── install-dependencies-v2.yaml
│   ├── task-simple-001.yaml
│   └── fail-stop-001.yaml
│
├── business/                           # Business analysis tests
│   ├── conv-simple-001.yaml
│   └── data-analysis.yaml
│
└── edge-case/                          # Edge cases
    ├── just-do-it.yaml
    ├── missing-approval-negative.yaml
    └── no-approval-negative.yaml
OpenAgent tests use multi-turn prompts to simulate the approval workflow:
prompts:
  - text: "What are our coding standards?"
    expectContext: true
    contextFile: "standards.md"
  - text: "approve"
    delayMs: 2000
  - text: "Create documentation about these standards"
    expectContext: true
    contextFile: "docs.md"
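To make the shape of a multi-turn prompt list concrete, here is a minimal sketch of a type and a sanity check. The field names mirror the YAML above; the `PromptTurn` interface, the `findPromptProblems` helper, and the rule that `expectContext` should be paired with a `contextFile` are assumptions for illustration, not the framework's actual schema:

```typescript
// Hypothetical type mirroring the YAML fields shown above.
interface PromptTurn {
  text: string;
  expectContext?: boolean;
  contextFile?: string;
  delayMs?: number;
}

// Return a list of human-readable problems; an empty list means the
// prompt sequence passes these (assumed) sanity rules.
function findPromptProblems(prompts: PromptTurn[]): string[] {
  const problems: string[] = [];
  prompts.forEach((p, i) => {
    if (p.text.trim() === "") {
      problems.push("prompt " + i + ": text is empty");
    }
    // Assumption: a turn that expects context should name the file.
    if (p.expectContext && !p.contextFile) {
      problems.push("prompt " + i + ": expectContext set without contextFile");
    }
    if (p.delayMs !== undefined && p.delayMs < 0) {
      problems.push("prompt " + i + ": delayMs must not be negative");
    }
  });
  return problems;
}

// The same three turns as the YAML example.
const examplePrompts: PromptTurn[] = [
  { text: "What are our coding standards?", expectContext: true, contextFile: "standards.md" },
  { text: "approve", delayMs: 2000 },
  { text: "Create documentation about these standards", expectContext: true, contextFile: "docs.md" },
];

const exampleProblems = findPromptProblems(examplePrompts);
```

The "approve" turn carries only a delay, which gives the agent time to present its plan before the simulated approval arrives.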
Complex tests use the smart timeout system:
timeout: 300000 # 5 minutes
Tests verify context files are loaded before execution:
behavior:
  mustUseTools: [read, write]
  requiresContext: true
  minToolCalls: 2
expectedViolations:
  - rule: context-loading
    shouldViolate: false
    severity: error
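The idea behind the context-loading rule can be sketched as a check over a timeline of tool calls: a context file must be read before the first execution-style tool runs. This is an illustrative sketch only, not the evaluator's code. The `ToolCall` shape, the `EXECUTION_TOOLS` list, and the `checkContextLoading` helper are assumptions, and the timestamps in the example are invented:

```typescript
// Hypothetical record of one tool invocation in a session timeline.
interface ToolCall {
  tool: string;
  path?: string;
  timestampMs: number;
}

interface ContextCheck {
  violated: boolean;
  loadedFiles: string[];
  leadTimeMs: number | null;
}

// Assumption: these tools count as "execution" for the purposes of the rule.
const EXECUTION_TOOLS = ["write", "edit", "bash"];

function checkContextLoading(calls: ToolCall[], contextDir: string): ContextCheck {
  const ordered = calls.slice().sort((a, b) => a.timestampMs - b.timestampMs);
  const firstExec = ordered.find((c) => EXECUTION_TOOLS.indexOf(c.tool) !== -1);
  // With no execution call, any context read in the session counts.
  const cutoff = firstExec === undefined ? Infinity : firstExec.timestampMs;
  const reads = ordered.filter(
    (c) =>
      c.tool === "read" &&
      c.path !== undefined &&
      c.path.indexOf(contextDir) === 0 &&
      c.timestampMs < cutoff
  );
  const earliest = reads[0];
  const leadTimeMs =
    firstExec !== undefined && earliest !== undefined
      ? firstExec.timestampMs - earliest.timestampMs
      : null;
  return {
    violated: reads.length === 0,
    loadedFiles: reads.map((c) => c.path as string),
    leadTimeMs,
  };
}

// Invented timeline: the context file is read 44317 ms before the first write.
const goodRun = checkContextLoading(
  [
    { tool: "read", path: ".opencode/context/core/standards/code.md", timestampMs: 1000 },
    { tool: "write", path: "docs/standards.md", timestampMs: 45317 },
  ],
  ".opencode/context/"
);
```

A test with `shouldViolate: false` would then pass only when `violated` is false for the recorded session.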
======================================================================
SUMMARY: 5/5 context loading tests passed (0 failed)
======================================================================
✅ ctx-simple-testing-approach (38s)
✅ ctx-simple-documentation-format (26s)
✅ ctx-simple-coding-standards (21s)
✅ ctx-multi-standards-to-docs (116s)
✅ ctx-multi-error-handling-to-tests (148s)
Total Duration: 349 seconds (~6 minutes)
Pass Rate: 100%
Violations: 0
Context Loading:
✓ Loaded: .opencode/context/core/standards/code.md
✓ Timing: Context loaded 44317ms before execution
✅ Context Loading Tests - 5 comprehensive tests (3 simple, 2 complex)
✅ 100% Pass Rate - All tests passing
✅ Smart Timeout - Handles complex multi-turn tests
✅ Fixed Evaluator - Properly detects context files
✅ Cleanup System - Auto-cleans test artifacts
✅ Documentation - Complete coverage documentation
| Document | Purpose |
|---|---|
| CONTEXT_LOADING_COVERAGE.md | Detailed context loading test documentation |
| IMPLEMENTATION_SUMMARY.md | Recent implementation details and fixes |
| docs/OPENAGENT_RULES.md | OpenAgent rules reference |
id: ctx-simple-coding-standards
name: "Context Loading: Coding Standards"
description: |
  Simple test: Ask about coding standards and verify agent loads context file.
category: developer
agent: openagent
model: anthropic/claude-sonnet-4-5
prompt: "What are our coding standards for this project?"
behavior:
  mustUseAnyOf: [[read]]
  requiresContext: true
  minToolCalls: 1
expectedViolations:
  - rule: context-loading
    shouldViolate: false
    severity: error
approvalStrategy:
  type: auto-approve
timeout: 60000
tags:
  - context-loading
  - simple-test
id: ctx-multi-standards-to-docs
name: "Context Loading: Multi-Turn Standards to Documentation"
description: |
  Complex multi-turn test: Standards question → Documentation request
category: developer
agent: openagent
model: anthropic/claude-sonnet-4-5
prompts:
  - text: "What are our coding standards?"
    expectContext: true
    contextFile: "standards.md"
  - text: "approve"
    delayMs: 2000
  - text: "Can you create documentation about these standards?"
    expectContext: true
    contextFile: "docs.md"
  - text: "approve"
    delayMs: 2000
behavior:
  mustUseTools: [read, write]
  requiresApproval: true
  requiresContext: true
  minToolCalls: 3
expectedViolations:
  - rule: approval-gate
    shouldViolate: false
    severity: error
  - rule: context-loading
    shouldViolate: false
    severity: error
approvalStrategy:
  type: auto-approve
timeout: 300000  # 5 minutes
tags:
  - context-loading
  - multi-turn
  - complex-test
Issue: Test times out on complex multi-turn scenarios
Solution: Increase timeout to 300000ms (5 minutes)
Issue: Evaluator reports "no context loaded"
Solution: Ensure test uses multi-turn prompts with approval
Issue: Test artifacts remain in test_tmp/
Solution: Check cleanup logic in run-sdk-tests.ts
Add More Edge Cases
Performance Metrics
Test Coverage Expansion
To add new tests:
Last Updated: 2025-11-26
Test Framework Version: 0.1.0
OpenAgent Tests: 22
Pass Rate: 100%