This directory contains test suite definitions for the OpenAgent evaluation framework.
config/
├── suite-schema.json # JSON Schema for validation
├── core-tests.json # Core test suite (legacy location)
├── suites/ # Test suite definitions (recommended)
│ ├── core.json # Core tests (7 tests, ~5-8 min)
│ ├── quick.json # Quick smoke tests (3 tests, ~2-3 min)
│ ├── critical.json # All critical rules (~10 tests)
│ ├── oss.json # OSS-optimized tests (5 tests)
│ └── custom-*.json # Your custom suites
└── README.md # This file
cp evals/agents/openagent/config/suites/core.json \
evals/agents/openagent/config/suites/my-suite.json
{
"name": "My Custom Suite",
"description": "Tests for specific use case",
"version": "1.0.0",
"agent": "openagent",
"totalTests": 3,
"estimatedRuntime": "3-5 minutes",
"tests": [
{
"id": 1,
"name": "Approval Gate",
"path": "01-critical-rules/approval-gate/05-approval-before-execution-positive.yaml",
"category": "critical-rules",
"priority": "critical",
"required": true,
"estimatedTime": "30-60s",
"description": "Validates approval workflow"
}
]
}
# Validate suites for the current agent
npm run validate:suites
# Or validate suites for all agents
npm run validate:suites:all
# Run your custom suite
npm run eval:sdk -- --agent=openagent --suite=my-suite
# With prompt variant
npm run eval:sdk -- --agent=openagent --suite=my-suite --prompt-variant=XOSS
File: suite-schema.json
Validates:
Example Error:
❌ Schema validation failed
tests[0].priority: Invalid enum value. Expected 'critical' | 'high' | 'medium' | 'low', received 'urgent'
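As an illustration of the enum check behind the error above, here is a minimal TypeScript sketch. It is not the schema validator itself: `checkPriorities`, `PriorityOnlyTest`, and `ALLOWED_PRIORITIES` are hypothetical names, and the message format is copied from the example error.

```typescript
// Hypothetical sketch; the real check is driven by suite-schema.json.
const ALLOWED_PRIORITIES: readonly string[] = ["critical", "high", "medium", "low"];

interface PriorityOnlyTest {
  priority: string;
}

// Returns one message per test whose priority is not an allowed value.
function checkPriorities(tests: PriorityOnlyTest[]): string[] {
  const expected = ALLOWED_PRIORITIES.map((p) => `'${p}'`).join(" | ");
  const messages: string[] = [];
  tests.forEach((test, index) => {
    if (ALLOWED_PRIORITIES.indexOf(test.priority) === -1) {
      messages.push(
        `tests[${index}].priority: Invalid enum value. Expected ${expected}, received '${test.priority}'`
      );
    }
  });
  return messages;
}

const priorityErrors = checkPriorities([{ priority: "urgent" }]);
console.log(priorityErrors[0]);
```

Changing the value to one of the four allowed priorities makes the list of messages empty.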
Checks that all test files exist:
./scripts/validation/validate-test-suites.sh openagent
Example Output:
🔍 Validating Test Suites
Validating: openagent/core
✅ Valid (7 tests)
Validating: openagent/my-suite
❌ Missing test files (1):
- 01-critical-rules/approval-gate/WRONG-PATH.yaml
Did you mean?
- 05-approval-before-execution-positive.yaml
- 01-basic-approval.yaml
❌ Invalid (1 errors, 0 warnings)
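How the script chooses its "Did you mean?" entries is not documented here. One plausible approach, sketched below under that assumption, is to list the files that sit in the same directory as the missing path. `suggestAlternatives` is a hypothetical helper, and `02-other/sample.yaml` is a made-up path used only to show the filtering.

```typescript
// Hypothetical sketch: suggest files located in the same directory as a missing path.
function suggestAlternatives(missingPath: string, existingPaths: string[]): string[] {
  const slash = missingPath.lastIndexOf("/");
  const dir = slash === -1 ? "" : missingPath.slice(0, slash + 1);
  return existingPaths
    // Keep direct children of the same directory, excluding the missing path itself.
    .filter((p) => p !== missingPath && p.startsWith(dir) && p.indexOf("/", dir.length) === -1)
    // Show only the file name, as in the example output.
    .map((p) => p.slice(dir.length));
}

const knownTestFiles = [
  "01-critical-rules/approval-gate/05-approval-before-execution-positive.yaml",
  "01-critical-rules/approval-gate/01-basic-approval.yaml",
  "02-other/sample.yaml", // made-up path, only here to show it is filtered out
];

const suggestions = suggestAlternatives(
  "01-critical-rules/approval-gate/WRONG-PATH.yaml",
  knownTestFiles
);
console.log(suggestions);
```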
File: evals/framework/src/sdk/suite-validator.ts
Provides compile-time type checking:
import { TestSuite, TestDefinition, SuiteValidator } from './suite-validator';
// Type-safe suite loading
const validator = new SuiteValidator(agentsDir);
const result = validator.validateSuiteFile('openagent', suitePath);
if (result.valid && result.suite) {
// result.suite is fully typed!
const testCount: number = result.suite.totalTests;
const firstTest: TestDefinition = result.suite.tests[0];
}
Automatically validates suites before committing:
# Setup (one-time)
./scripts/validation/setup-pre-commit-hook.sh
# Now validation runs automatically on commit
git add evals/agents/openagent/config/suites/my-suite.json
git commit -m "Add custom suite"
# Output:
🔍 Validating test suite JSON files...
✅ Test suite validation passed
File: .github/workflows/validate-test-suites.yml
Runs on:
- main

Automatically comments on PRs if validation fails.
| Field | Type | Description | Example |
|---|---|---|---|
| name | string | Human-readable suite name | "Core Test Suite" |
| description | string | Brief description | "Essential tests" |
| version | string | Semver version | "1.0.0" |
| agent | enum | Agent name | "openagent" |
| totalTests | number | Total test count | 7 |
| estimatedRuntime | string | Estimated runtime | "5-8 minutes" |
| tests | array | Test definitions | See below |
| Field | Type | Required | Description |
|---|---|---|---|
| id | number | ✅ | Unique test ID (within suite) |
| name | string | ✅ | Human-readable test name |
| path | string | ✅ | Relative path from tests/ directory |
| category | enum | ✅ | Test category (see below) |
| priority | enum | ✅ | Priority level |
| required | boolean | ❌ | Whether test must exist (default: true) |
| estimatedTime | string | ❌ | Estimated runtime (e.g., "30-60s") |
| description | string | ❌ | Brief description |
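The two field tables can be read as TypeScript shapes. The sketch below mirrors them under stated assumptions and is not the actual TestSuite or TestDefinition types from suite-validator.ts: the names `SuiteSketch` and `SuiteTestSketch` are hypothetical, every suite-level field is assumed to be required, and `agent` is typed as a plain string because only "openagent" appears in this README.

```typescript
// Category and priority values as listed in this README.
type SuiteCategory =
  | "critical-rules" | "workflow-stages" | "delegation" | "execution-paths"
  | "edge-cases" | "integration" | "negative" | "behavior" | "tool-usage";
type SuitePriority = "critical" | "high" | "medium" | "low";

// Hypothetical mirror of the test fields table.
interface SuiteTestSketch {
  id: number;
  name: string;
  path: string; // relative to the tests/ directory
  category: SuiteCategory;
  priority: SuitePriority;
  required?: boolean; // the table gives true as the default
  estimatedTime?: string;
  description?: string;
}

// Hypothetical mirror of the suite fields table (all assumed required).
interface SuiteSketch {
  name: string;
  description: string;
  version: string;
  agent: string;
  totalTests: number;
  estimatedRuntime: string;
  tests: SuiteTestSketch[];
}

const exampleSuite: SuiteSketch = {
  name: "My Custom Suite",
  description: "Tests for specific use case",
  version: "1.0.0",
  agent: "openagent",
  totalTests: 1,
  estimatedRuntime: "3-5 minutes",
  tests: [
    {
      id: 1,
      name: "Approval Gate",
      path: "01-critical-rules/approval-gate/05-approval-before-execution-positive.yaml",
      category: "critical-rules",
      priority: "critical",
    },
  ],
};
console.log(exampleSuite.tests.length === exampleSuite.totalTests);
```

With shapes like these, a misspelled category or priority is rejected by the compiler before the JSON is ever validated.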
Test categories:

- critical-rules
- workflow-stages
- delegation
- execution-paths
- edge-cases
- integration
- negative
- behavior
- tool-usage

Priority levels:

- critical - Must pass
- high - Important
- medium - Standard
- low - Nice to have

# Validate specific agent
./scripts/validation/validate-test-suites.sh openagent
# Validate all agents
./scripts/validation/validate-test-suites.sh --all
# Via npm (from evals/framework/)
npm run validate:suites # Current agent
npm run validate:suites:all # All agents
# Setup pre-commit hook
./scripts/validation/setup-pre-commit-hook.sh
Error:
❌ Invalid JSON syntax
Fix: Check the file for JSON syntax errors.
Use a JSON validator or IDE with JSON support.
Error:
❌ Schema validation failed
version: String must match pattern ^\d+\.\d+\.\d+$
Fix: Ensure version follows semver format: "1.0.0"
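To check a version string before running validation, the pattern from the error message can be applied directly. This is a small sketch; `isValidSuiteVersion` is a hypothetical name, and only the regular expression comes from the message above.

```typescript
// Pattern copied from the schema error message.
const SUITE_VERSION_PATTERN = /^\d+\.\d+\.\d+$/;

// Hypothetical helper: true when the string is three dot-separated numbers.
function isValidSuiteVersion(version: string): boolean {
  return SUITE_VERSION_PATTERN.test(version);
}

console.log(isValidSuiteVersion("1.0.0"));  // true
console.log(isValidSuiteVersion("1.0"));    // false: missing patch number
console.log(isValidSuiteVersion("v1.0.0")); // false: leading "v" is not allowed
```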
Error:
❌ Missing test files (1):
- 01-critical-rules/approval-gate/wrong-path.yaml
Fix: Check that the path is correct relative to evals/agents/openagent/tests/
Warning:
⚠️ Test count mismatch: found 6, declared 7
Fix: Update totalTests field to match actual test count.
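The mismatch warning amounts to comparing the declared count with the length of the tests array. A minimal sketch, assuming that is all the check does; `checkTestCount` is a hypothetical name and the message text is taken from the warning above.

```typescript
// Hypothetical helper: returns a warning message, or null when the counts agree.
function checkTestCount(suite: { totalTests: number; tests: unknown[] }): string | null {
  const found = suite.tests.length;
  if (found === suite.totalTests) {
    return null;
  }
  return `Test count mismatch: found ${found}, declared ${suite.totalTests}`;
}

// Six entries stand in for six test definitions.
const countWarning = checkTestCount({ totalTests: 7, tests: [1, 2, 3, 4, 5, 6] });
console.log(countWarning);
```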
// ❌ Bad
"name": "Test 1"
// ✅ Good
"name": "Approval Gate - Positive Case"
{
"id": 5,
"name": "Experimental Feature",
"path": "experimental/new-feature.yaml",
"required": false // Won't fail validation if missing
}
"tests": [
{ "id": 1, ... },
{ "id": 2, ... },
{ "id": 3, ... }
]
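Because each test ID must be unique within a suite, a quick duplicate check can catch copy-paste mistakes. This is a sketch only: `findDuplicateIds` is a hypothetical helper, and whether the validator itself reports duplicates is not stated in this README.

```typescript
// Hypothetical helper: returns each ID that appears more than once, in first-seen order.
function findDuplicateIds(tests: { id: number }[]): number[] {
  const seen: Record<number, boolean> = {};
  const duplicates: number[] = [];
  for (const test of tests) {
    if (seen[test.id] && duplicates.indexOf(test.id) === -1) {
      duplicates.push(test.id);
    }
    seen[test.id] = true;
  }
  return duplicates;
}

const duplicateIds = findDuplicateIds([{ id: 1 }, { id: 2 }, { id: 2 }, { id: 3 }]);
console.log(duplicateIds);
```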
{
"rationale": {
"why7Tests": "These 7 tests provide 85% coverage with 90% fewer tests",
"useCases": [
"Quick validation before commits",
"CI/CD pull request checks"
]
}
}
When making breaking changes, bump the version:
// Before
"version": "1.0.0"
// After adding new required tests
"version": "2.0.0"
# Make sure script is executable
chmod +x scripts/validation/validate-test-suites.sh
# Install globally
npm install -g ajv-cli
# Or install in framework
cd evals/framework
npm install
# Re-run setup
./scripts/validation/setup-pre-commit-hook.sh
# Verify hook exists
ls -la .git/hooks/pre-commit
If you encounter issues:
--debug flag (coming soon)