Core Test Suite - Minimum Viable Tests

Purpose: Minimum tests needed to validate OpenAgent's 4 critical rules
Total: 8 core tests (down from 49)

Core Tests (8 tests)

1. Approval Gate (2 tests)

✅ 05-approval-before-execution-positive.yaml - Standard approval workflow
❌ 02-missing-approval-negative.yaml - Should fail without approval

2. Context Loading (3 tests)

✅ 01-code-task.yaml - Code task loads code.md
✅ 02-docs-task.yaml - Docs task loads docs.md
❌ 11-wrong-context-file-negative.yaml - Should fail when the wrong context file is loaded

3. Stop on Failure (2 tests)

✅ 02-stop-and-report-positive.yaml - Stops and reports
❌ 03-auto-fix-negative.yaml - Should fail if the agent auto-fixes

4. Report First (1 test)

✅ 01-correct-workflow-positive.yaml - Report→Propose→Approve→Fix
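
As a quick sanity check on the list above, the eight files can be grouped by rule and counted. This is a minimal sketch, not part of the test framework: the dictionary name, the rule keys, and the `-negative.yaml` suffix convention used to spot negative tests are assumptions drawn from the file names in this document.

```python
# Hypothetical manifest: rule key -> core test files listed in this document.
# The rule keys and this structure are illustrative, not the framework's own format.
CORE_TESTS = {
    "approval-gate": [
        "05-approval-before-execution-positive.yaml",
        "02-missing-approval-negative.yaml",
    ],
    "context-loading": [
        "01-code-task.yaml",
        "02-docs-task.yaml",
        "11-wrong-context-file-negative.yaml",
    ],
    "stop-on-failure": [
        "02-stop-and-report-positive.yaml",
        "03-auto-fix-negative.yaml",
    ],
    "report-first": [
        "01-correct-workflow-positive.yaml",
    ],
}


def is_negative(filename: str) -> bool:
    """Assumption: negative tests are the files ending in '-negative.yaml'."""
    return filename.endswith("-negative.yaml")


def summarize(manifest: dict[str, list[str]]) -> dict[str, int]:
    """Count total, positive, and negative tests in a manifest."""
    files = [name for group in manifest.values() for name in group]
    negatives = sum(is_negative(name) for name in files)
    return {
        "total": len(files),
        "positive": len(files) - negatives,
        "negative": negatives,
    }


if __name__ == "__main__":
    print(summarize(CORE_TESTS))
```

The counts should match the headings: 8 tests in total, 5 marked ✅ and 3 marked ❌.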

Why These 8 Tests?

Approval Gate (2 tests):

Positive: Validates that the approval-before-execution workflow passes
Negative: Validates missing approval is caught

Context Loading (3 tests):

Code task: Most common use case
Docs task: Second most common
Wrong context: Validates that the evaluator catches a wrong context file
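
The context-loading rule can be pictured as a lookup from task type to expected context file. The sketch below is an illustration only: the mapping reflects the two pairs named in this document (code task → code.md, docs task → docs.md), while the function name, the task-type strings, and the shape of the loaded-files list are hypothetical and not the real evaluator's interface.

```python
# Pairs taken from this document; the dictionary itself is a hypothetical stand-in.
EXPECTED_CONTEXT = {
    "code": "code.md",
    "docs": "docs.md",
}


def loaded_correct_context(task_type: str, loaded_files: list[str]) -> bool:
    """Return True if the context file expected for this task type was loaded.

    Unknown task types return False rather than guessing an expected file.
    """
    expected = EXPECTED_CONTEXT.get(task_type)
    return expected is not None and expected in loaded_files


if __name__ == "__main__":
    print(loaded_correct_context("code", ["code.md"]))  # positive case
    print(loaded_correct_context("code", ["docs.md"]))  # wrong-context case
```

The wrong-context negative test corresponds to the second call: a code task that loaded docs.md should be flagged.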

Stop on Failure (2 tests):

Positive: Validates agent stops on error
Negative: Validates auto-fix is caught

Report First (1 test):

Validates Report→Propose→Approve→Fix workflow
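
The ordering that this test enforces can be sketched as a simple sequence check. This is a minimal sketch under stated assumptions: the step names and the flat list of event strings are hypothetical, and the real evaluator may record and compare events differently.

```python
# Hypothetical step names for the Report→Propose→Approve→Fix workflow.
REQUIRED_ORDER = ["report", "propose", "approve", "fix"]


def follows_report_first(events: list[str]) -> bool:
    """Return True if the four steps each occur once, in the required order.

    Events that are not workflow steps (for example file reads) are ignored.
    A step that appears early, late, or twice makes the check fail.
    """
    position = 0
    for event in events:
        if event not in REQUIRED_ORDER:
            continue  # unrelated event, does not affect ordering
        if position < len(REQUIRED_ORDER) and event == REQUIRED_ORDER[position]:
            position += 1
        else:
            return False
    return position == len(REQUIRED_ORDER)


if __name__ == "__main__":
    print(follows_report_first(["report", "propose", "approve", "fix"]))
    print(follows_report_first(["report", "fix"]))  # fix without approval
```

The same check also rejects the auto-fix case from the Stop on Failure group, since a fix that appears before an approval breaks the order.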

What We're NOT Testing (Can Add Later)

Conversational path (3 tests)
Multi-turn context (2 tests)
Delegation (2 tests)
Edge cases (3 tests)
Integration (6 tests)
Behavior validation (4 tests)
Tool usage (2 tests)

Total skipped in these categories: 22 tests (the full suite has 49 − 8 = 41 tests outside the core suite)

Token Optimization

Full Suite: 49 tests × ~7,000 tokens = ~343,000 tokens
Core Suite: 8 tests × ~7,000 tokens = ~56,000 tokens

Savings: ~84% reduction in tokens (~287,000 fewer tokens per run)
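
The arithmetic behind these figures can be reproduced in a few lines. The per-test cost of ~7,000 tokens is the estimate quoted above; the function and constant names are illustrative only.

```python
TOKENS_PER_TEST = 7_000  # approximate per-test cost quoted in this document


def suite_tokens(test_count: int, tokens_per_test: int = TOKENS_PER_TEST) -> int:
    """Estimated tokens for a suite, assuming a uniform per-test cost."""
    return test_count * tokens_per_test


def savings_percent(full: int, core: int) -> int:
    """Percentage reduction from full to core, rounded to the nearest integer."""
    return round(100 * (full - core) / full)


if __name__ == "__main__":
    full = suite_tokens(49)
    core = suite_tokens(8)
    print(full, core, full - core, savings_percent(full, core))
```

Because the per-test cost is assumed uniform, the percentage depends only on the test counts: 1 − 8/49 ≈ 0.84.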