Test Suite Configuration

This directory contains test suite definitions for the OpenAgent evaluation framework.

📁 Structure

config/
├── suite-schema.json       # JSON Schema for validation
├── core-tests.json         # Core test suite (legacy location)
├── suites/                 # Test suite definitions (recommended)
│   ├── core.json          # Core tests (7 tests, ~5-8 min)
│   ├── quick.json         # Quick smoke tests (3 tests, ~2-3 min)
│   ├── critical.json      # All critical rules (~10 tests)
│   ├── oss.json           # OSS-optimized tests (5 tests)
│   └── custom-*.json      # Your custom suites
└── README.md              # This file

🎯 Creating a Test Suite

Step 1: Copy Template

cp evals/agents/openagent/config/suites/core.json \
   evals/agents/openagent/config/suites/my-suite.json

Step 2: Edit Suite Definition

{
  "name": "My Custom Suite",
  "description": "Tests for specific use case",
  "version": "1.0.0",
  "agent": "openagent",
  "totalTests": 3,
  "estimatedRuntime": "3-5 minutes",
  "tests": [
    {
      "id": 1,
      "name": "Approval Gate",
      "path": "01-critical-rules/approval-gate/05-approval-before-execution-positive.yaml",
      "category": "critical-rules",
      "priority": "critical",
      "required": true,
      "estimatedTime": "30-60s",
      "description": "Validates approval workflow"
    }
  ]
}

Step 3: Validate

# Validate suites for the current agent (run from evals/framework/)
npm run validate:suites

# Or validate suites for all agents
npm run validate:suites:all

Step 4: Run Tests

# Run your custom suite
npm run eval:sdk -- --agent=openagent --suite=my-suite

# With prompt variant
npm run eval:sdk -- --agent=openagent --suite=my-suite --prompt-variant=XOSS

✅ Validation Layers

1. JSON Schema Validation

File: suite-schema.json

Validates:

✅ Required fields present
✅ Correct data types
✅ Valid enum values (category, priority)
✅ Proper format (version, estimatedTime)
✅ Path format (must end with .yaml)

Example Error:

❌ Schema validation failed
   tests[0].priority: Invalid enum value. Expected 'critical' | 'high' | 'medium' | 'low', received 'urgent'
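
As a rough illustration of what this layer checks, the sketch below hand-rolls two of the rules listed above (the priority enum and the .yaml path suffix). It is not the project's validator, which relies on suite-schema.json; the helper name `checkTestEntry` and the error wording are made up for this example.

```typescript
// Illustrative only: the real rules live in suite-schema.json.
// `checkTestEntry` is a hypothetical helper, not part of the framework.
const PRIORITIES = ["critical", "high", "medium", "low"] as const;

interface TestEntry {
  path: string;
  priority: string;
}

function checkTestEntry(entry: TestEntry, index: number): string[] {
  const problems: string[] = [];
  // Enum check: priority must be one of the four documented values.
  if ((PRIORITIES as readonly string[]).indexOf(entry.priority) === -1) {
    problems.push(
      `tests[${index}].priority: expected ${PRIORITIES.join(" | ")}, received '${entry.priority}'`
    );
  }
  // Format check: test paths must end with .yaml.
  if (!/\.yaml$/.test(entry.path)) {
    problems.push(`tests[${index}].path: must end with .yaml`);
  }
  return problems;
}

const entryErrors = checkTestEntry(
  {
    path: "01-critical-rules/approval-gate/05-approval-before-execution-positive.yaml",
    priority: "urgent",
  },
  0
);
// entryErrors holds a single message, about the 'urgent' priority.
```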

2. Path Validation

Checks that all test files exist:

./scripts/validation/validate-test-suites.sh openagent

Example Output:

🔍 Validating Test Suites

Validating: openagent/core
  ✅ Valid (7 tests)

Validating: openagent/my-suite
  ❌ Missing test files (1):
     - 01-critical-rules/approval-gate/WRONG-PATH.yaml
       Did you mean?
         - 05-approval-before-execution-positive.yaml
         - 01-basic-approval.yaml
  ❌ Invalid (1 errors, 0 warnings)
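
The core of the path check can be sketched in a few lines. This is an assumption about the approach, not the contents of validate-test-suites.sh: `findMissingTests` is a hypothetical helper, and the existence check is passed in as a function so the logic can be tried without touching the disk.

```typescript
// Hypothetical sketch, not the framework's implementation.
// `exists` is injected; a real script could wrap a file-system lookup
// against the agent's tests/ directory instead.
function findMissingTests(
  paths: string[],
  exists: (relativePath: string) => boolean
): string[] {
  return paths.filter((p) => !exists(p));
}

// Stand-in for the file system: the test files that "exist".
const knownFiles = [
  "01-critical-rules/approval-gate/05-approval-before-execution-positive.yaml",
];

const missingTests = findMissingTests(
  [
    "01-critical-rules/approval-gate/05-approval-before-execution-positive.yaml",
    "01-critical-rules/approval-gate/WRONG-PATH.yaml",
  ],
  (p) => knownFiles.indexOf(p) !== -1
);
// missingTests holds only the WRONG-PATH.yaml entry.
```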

3. TypeScript Type Safety

File: evals/framework/src/sdk/suite-validator.ts

Provides compile-time type checking:

// TestDefinition is assumed to be exported from the same module as TestSuite.
import { TestSuite, TestDefinition, SuiteValidator } from './suite-validator';

// agentsDir and suitePath are supplied by the caller.
// Type-safe suite loading
const validator = new SuiteValidator(agentsDir);
const result = validator.validateSuiteFile('openagent', suitePath);

if (result.valid && result.suite) {
  // result.suite is fully typed!
  const testCount: number = result.suite.totalTests;
  const firstTest: TestDefinition = result.suite.tests[0];
}

4. Pre-Commit Hook

Automatically validates suites before committing:

# Setup (one-time)
./scripts/validation/setup-pre-commit-hook.sh

# Now validation runs automatically on commit
git add evals/agents/openagent/config/suites/my-suite.json
git commit -m "Add custom suite"

# Output:
🔍 Validating test suite JSON files...
✅ Test suite validation passed

5. GitHub Actions (CI/CD)

File: .github/workflows/validate-test-suites.yml

Runs on:

Push to main
Pull requests
Changes to suite files or test files

Automatically comments on PRs if validation fails.

📋 Suite Schema Reference

Required Fields

| Field | Type | Description | Example |
|-------|------|-------------|---------|
| `name` | string | Human-readable suite name | `"Core Test Suite"` |
| `description` | string | Brief description | `"Essential tests"` |
| `version` | string | Semver version | `"1.0.0"` |
| `agent` | enum | Agent name | `"openagent"` |
| `totalTests` | number | Total test count | `7` |
| `estimatedRuntime` | string | Estimated runtime | `"5-8 minutes"` |
| `tests` | array | Test definitions | See below |

Test Definition Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `id` | number | ✅ | Unique test ID (within suite) |
| `name` | string | ✅ | Human-readable test name |
| `path` | string | ✅ | Relative path from `tests/` directory |
| `category` | enum | ✅ | Test category (see below) |
| `priority` | enum | ✅ | Priority level |
| `required` | boolean | ❌ | Whether test must exist (default: true) |
| `estimatedTime` | string | ❌ | Estimated runtime (e.g., "30-60s") |
| `description` | string | ❌ | Brief description |

Valid Priorities

critical - Must pass
high - Important
medium - Standard
low - Nice to have
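
For readers who prefer types to tables, the schema reference above can be mirrored roughly as TypeScript interfaces. This is a sketch built only from the fields documented here; the names `TestSuiteSketch` and `TestDefinitionSketch` are invented to avoid implying they match the framework's own `TestSuite` and `TestDefinition`, which may differ.

```typescript
// Sketch of the documented suite shape; not the framework's actual types.
type Priority = "critical" | "high" | "medium" | "low";

interface TestDefinitionSketch {
  id: number;             // unique within the suite
  name: string;
  path: string;           // relative to the agent's tests/ directory
  category: string;       // an enum in the schema; values not reproduced here
  priority: Priority;
  required?: boolean;     // defaults to true
  estimatedTime?: string; // e.g. "30-60s"
  description?: string;
}

interface TestSuiteSketch {
  name: string;
  description: string;
  version: string;        // e.g. "1.0.0"
  agent: string;          // an enum in the schema, e.g. "openagent"
  totalTests: number;
  estimatedRuntime: string;
  tests: TestDefinitionSketch[];
}

const exampleSuite: TestSuiteSketch = {
  name: "My Custom Suite",
  description: "Tests for specific use case",
  version: "1.0.0",
  agent: "openagent",
  totalTests: 1,
  estimatedRuntime: "30-60s",
  tests: [
    {
      id: 1,
      name: "Approval Gate",
      path: "01-critical-rules/approval-gate/05-approval-before-execution-positive.yaml",
      category: "critical-rules",
      priority: "critical",
    },
  ],
};
```

With a union type for `priority`, a value such as `"urgent"` is rejected by the compiler before any schema validation runs.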

🔧 Validation Commands

# Validate specific agent
./scripts/validation/validate-test-suites.sh openagent

# Validate all agents
./scripts/validation/validate-test-suites.sh --all

# Via npm (from evals/framework/)
npm run validate:suites          # Current agent
npm run validate:suites:all      # All agents

# Setup pre-commit hook
./scripts/validation/setup-pre-commit-hook.sh

🚨 Common Errors

1. Invalid JSON Syntax

Error:

❌ Invalid JSON syntax

Fix: Check for:

Missing commas
Trailing commas
Unquoted keys
Unclosed brackets

Use a JSON validator or IDE with JSON support.
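
A quick way to see which of these mistakes a parser rejects is to run the text through `JSON.parse`. The helper below is a hypothetical sketch for experimenting, not part of the validation scripts.

```typescript
// Hypothetical helper for illustration; not part of the framework.
function parseSuiteJson(text: string): { ok: boolean; error?: string } {
  try {
    JSON.parse(text);
    return { ok: true };
  } catch (e) {
    // JSON.parse throws a SyntaxError on malformed input.
    return { ok: false, error: e instanceof Error ? e.message : String(e) };
  }
}

const wellFormed = parseSuiteJson('{"name": "My Custom Suite", "totalTests": 3}');
const trailingComma = parseSuiteJson('{"name": "My Custom Suite",}');
const unquotedKey = parseSuiteJson('{name: "My Custom Suite"}');
// wellFormed.ok is true; the other two come back with ok: false
// and the parser's own error message.
```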

2. Schema Validation Failed

Error:

❌ Schema validation failed
   version: String must match pattern ^\d+\.\d+\.\d+$

Fix: Ensure version follows semver format: "1.0.0"
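
The pattern in the error message can be tried directly. The function name below is made up for this sketch; only the regular expression comes from the message above.

```typescript
// The pattern quoted in the schema error message.
const VERSION_PATTERN = /^\d+\.\d+\.\d+$/;

// Hypothetical helper name, used here only for illustration.
function isValidSuiteVersion(version: string): boolean {
  return VERSION_PATTERN.test(version);
}

const versionChecks = ["1.0.0", "1.0", "v1.0.0"].map((v) => isValidSuiteVersion(v));
// versionChecks is [true, false, false]:
// "1.0" lacks a patch number and "v1.0.0" has a leading "v".
```

Note that this pattern is stricter than full semver: a pre-release suffix such as `1.0.0-beta.1` does not match it.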

3. Missing Test Files

Error:

❌ Missing test files (1):
   - 01-critical-rules/approval-gate/wrong-path.yaml

Fix:

Check the path is correct
Verify file exists in evals/agents/openagent/tests/
Use suggested similar files

4. Test Count Mismatch

Warning:

⚠️  Test count mismatch: found 6, declared 7

Fix: Update the totalTests field to match the number of entries in the tests array.
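
The comparison behind this warning amounts to checking the declared count against the array length. A minimal sketch, assuming that is all the check does (`checkTestCount` is a hypothetical name, and the message simply mimics the warning above):

```typescript
// Hypothetical helper; the framework's validator may work differently.
function checkTestCount(suite: { totalTests: number; tests: unknown[] }): string | null {
  const found = suite.tests.length;
  return found === suite.totalTests
    ? null
    : `Test count mismatch: found ${found}, declared ${suite.totalTests}`;
}

const countWarning = checkTestCount({ totalTests: 7, tests: [1, 2, 3, 4, 5, 6] });
// countWarning is "Test count mismatch: found 6, declared 7"
```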

💡 Best Practices

1. Use Descriptive Names

// ❌ Bad
"name": "Test 1"

// ✅ Good
"name": "Approval Gate - Positive Case"

2. Mark Optional Tests

Set "required" to false so validation does not fail when the file is missing (other fields omitted for brevity):

{
  "id": 5,
  "name": "Experimental Feature",
  "path": "experimental/new-feature.yaml",
  "required": false
}

3. Keep Test IDs Sequential

"tests": [
  { "id": 1, ... },
  { "id": 2, ... },
  { "id": 3, ... }
]
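
If you want to check this convention yourself, a small helper is enough. The sketch below assumes IDs start at 1 and increase by one in array order, as in the example above; `idsAreSequential` is a hypothetical name and the README does not say the validator enforces this.

```typescript
// Hypothetical helper: true when ids run 1, 2, 3, ... in array order.
function idsAreSequential(tests: { id: number }[]): boolean {
  return tests.every((t, i) => t.id === i + 1);
}

const sequentialOk = idsAreSequential([{ id: 1 }, { id: 2 }, { id: 3 }]);
const sequentialGap = idsAreSequential([{ id: 1 }, { id: 3 }]);
// sequentialOk is true; sequentialGap is false because id 2 is skipped.
```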

4. Document Your Rationale

{
  "rationale": {
    "why7Tests": "These 7 tests provide 85% coverage with 90% fewer tests",
    "useCases": [
      "Quick validation before commits",
      "CI/CD pull request checks"
    ]
  }
}

5. Version Your Suites

When making breaking changes, bump the major version:

// Before
"version": "1.0.0"

// After adding new required tests
"version": "2.0.0"

🔗 Related Documentation

🆘 Troubleshooting

Validation Script Not Found

# Make sure script is executable
chmod +x scripts/validation/validate-test-suites.sh

ajv-cli Not Installed

# Install globally
npm install -g ajv-cli

# Or install in framework
cd evals/framework
npm install

Pre-Commit Hook Not Running

# Re-run setup
./scripts/validation/setup-pre-commit-hook.sh

# Verify hook exists
ls -la .git/hooks/pre-commit

📞 Support

If you encounter issues:

Check this README
Run validation with --debug flag (coming soon)
Check GitHub Actions logs
Open an issue with validation output
