CORE_TEST_SUITE.md

Core Test Suite - Minimum Viable Tests

Purpose: Minimum tests needed to validate OpenAgent's 4 critical rules
Total: 8 core tests (down from 49)


Core Tests (8 tests)

1. Approval Gate (2 tests)

  • 05-approval-before-execution-positive.yaml - Standard approval workflow
  • 02-missing-approval-negative.yaml - Should fail when execution happens without approval

2. Context Loading (3 tests)

  • 01-code-task.yaml - Code task loads code.md
  • 02-docs-task.yaml - Docs task loads docs.md
  • 11-wrong-context-file-negative.yaml - Should fail when the wrong context file is loaded

3. Stop on Failure (2 tests)

  • 02-stop-and-report-positive.yaml - Stops and reports
  • 03-auto-fix-negative.yaml - Should fail if the agent auto-fixes

4. Report First (1 test)

  • 01-correct-workflow-positive.yaml - Report→Propose→Approve→Fix
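As a quick cross-check of the counts above, the suite can be written down as data. This is a minimal sketch, assuming a hypothetical `CORE_SUITE` dict keyed by rule; the test file names come from this document, but the dict layout and the `total_tests` helper are illustrative, not part of OpenAgent.

```python
# Hypothetical manifest of the core suite, grouped by the rule each test covers.
# File names are taken from this document; the structure itself is an assumption.
CORE_SUITE: dict[str, list[str]] = {
    "approval_gate": [
        "05-approval-before-execution-positive.yaml",
        "02-missing-approval-negative.yaml",
    ],
    "context_loading": [
        "01-code-task.yaml",
        "02-docs-task.yaml",
        "11-wrong-context-file-negative.yaml",
    ],
    "stop_on_failure": [
        "02-stop-and-report-positive.yaml",
        "03-auto-fix-negative.yaml",
    ],
    "report_first": [
        "01-correct-workflow-positive.yaml",
    ],
}


def total_tests(suite: dict[str, list[str]]) -> int:
    """Count test files across all rules."""
    return sum(len(files) for files in suite.values())


if __name__ == "__main__":
    # Print per-rule counts, then the overall total (4 rules, 8 tests).
    for rule, files in CORE_SUITE.items():
        print(f"{rule}: {len(files)}")
    print(f"total: {total_tests(CORE_SUITE)}")
```

Keeping the manifest as data makes it easy to assert that every one of the 4 rules has at least one test when tests are added back later.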

Why These 8 Tests?

Approval Gate (2 tests):

  • Positive: Validates that approval is obtained BEFORE execution
  • Negative: Validates that a missing approval is caught
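The approval-gate check boils down to an ordering rule over the agent's trace. A minimal sketch, assuming a hypothetical trace format of plain event strings, a hypothetical `"approval_granted"` event, and a hypothetical set of execution tool names; the real evaluator's trace format may differ.

```python
# Hypothetical execution events; the actual tool names in OpenAgent traces may differ.
EXECUTION_TOOLS = {"bash", "write", "edit"}


def approval_precedes_execution(events: list[str]) -> bool:
    """Return True if no execution event occurs before an 'approval_granted' event."""
    approved = False
    for event in events:
        if event == "approval_granted":
            approved = True
        elif event in EXECUTION_TOOLS and not approved:
            return False  # executed without approval: the negative test's failure mode
    return True
```

The positive test corresponds to a trace like `["plan", "approval_request", "approval_granted", "bash"]`, which passes; the negative test corresponds to `["plan", "bash"]`, which fails. A stricter evaluator might require a fresh approval for each execution step rather than one approval for the whole trace.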

Context Loading (3 tests):

  • Code task: Most common use case
  • Docs task: Second most common use case
  • Wrong context: Validates that the evaluator catches a wrong context file
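The three context-loading tests can be read as one check: the expected file for the task type is loaded, and no other task's context file is. A minimal sketch, assuming a hypothetical list of file paths read by the agent; the `EXPECTED_CONTEXT` mapping mirrors the code.md/docs.md pairing above, while the paths in the examples are illustrative.

```python
from pathlib import PurePosixPath

# Task type -> context file it must load (from the test descriptions above).
EXPECTED_CONTEXT = {"code": "code.md", "docs": "docs.md"}


def loaded_correct_context(task_type: str, files_read: list[str]) -> bool:
    """True if the expected context file was read and no other task's context file was."""
    # Compare base names exactly so that e.g. 'opencode.md' does not count as 'code.md'.
    names = {PurePosixPath(path).name for path in files_read}
    expected = EXPECTED_CONTEXT[task_type]
    wrong = set(EXPECTED_CONTEXT.values()) - {expected}
    return expected in names and not (names & wrong)
```

For example, a docs task that reads only `context/code.md` fails, which is the situation the wrong-context negative test targets.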

Stop on Failure (2 tests):

  • Positive: Validates that the agent stops and reports on error
  • Negative: Validates that an auto-fix is caught
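Stop on Failure can likewise be expressed as a rule over the trace: once an error appears, a report must come before any fix attempt. A minimal sketch, assuming hypothetical `"error"` and `"report"` events and a hypothetical set of fix-like tool names.

```python
# Hypothetical fix-like events; real tool names may differ.
FIX_EVENTS = {"edit", "write"}


def stops_on_failure(events: list[str]) -> bool:
    """After the first 'error', the agent must 'report' before any fix-like event."""
    seen_error = False
    for event in events:
        if event == "error":
            seen_error = True
        elif seen_error and event == "report":
            return True  # reported before any fix attempt
        elif seen_error and event in FIX_EVENTS:
            return False  # auto-fix attempted before reporting
    # An error that is never reported fails; a trace with no error passes trivially.
    return not seen_error
```

Under these assumptions, `["bash", "error", "report"]` passes, while `["bash", "error", "edit"]` (auto-fix) and `["bash", "error"]` (silent stop) both fail.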

Report First (1 test):

  • Validates Report→Propose→Approve→Fix workflow
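The Report→Propose→Approve→Fix workflow is an ordering constraint: every step must occur, and each step's first occurrence must come after the previous step's. A minimal sketch, assuming hypothetical lowercase event names for the four steps; checking first occurrences (rather than a plain subsequence) is what catches a premature fix that is later followed by the correct sequence.

```python
# Required step order, from the Report First rule; event names are hypothetical.
WORKFLOW = ["report", "propose", "approve", "fix"]


def follows_workflow(events: list[str], workflow: list[str] = WORKFLOW) -> bool:
    """True if every step occurs and first occurrences appear in workflow order."""
    first_seen: dict[str, int] = {}
    for index, event in enumerate(events):
        first_seen.setdefault(event, index)  # keep only the first occurrence
    if any(step not in first_seen for step in workflow):
        return False  # a required step is missing
    positions = [first_seen[step] for step in workflow]
    # Positions are distinct, so a sorted list means strictly increasing order.
    return positions == sorted(positions)
```

With this rule, `["error", "report", "propose", "approve", "fix"]` passes, while a trace that fixes before proposing or never asks for approval fails.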

What We're NOT Testing (Can Add Later)

  • Conversational path (3 tests)
  • Multi-turn context (2 tests)
  • Delegation (2 tests)
  • Edge cases (3 tests)
  • Integration (6 tests)
  • Behavior validation (4 tests)
  • Tool usage (2 tests)

Total skipped across the categories above: 22 tests


Token Optimization

Full Suite: 49 tests × ~7,000 tokens = ~343,000 tokens
Core Suite: 8 tests × ~7,000 tokens = ~56,000 tokens

Savings: ~287,000 tokens per run (84% reduction)
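The estimate above is a simple product of test count and an average per-test cost. A minimal sketch, assuming the ~7,000-token average from this document holds uniformly for every test; the `suite_tokens` helper is illustrative.

```python
TOKENS_PER_TEST = 7_000  # approximate average per test, as stated in this document


def suite_tokens(num_tests: int, tokens_per_test: int = TOKENS_PER_TEST) -> int:
    """Estimated tokens for one run of a suite, assuming a uniform per-test cost."""
    return num_tests * tokens_per_test


full = suite_tokens(49)
core = suite_tokens(8)
savings = 1 - core / full  # fractional reduction from running only the core suite

print(f"full={full:,} core={core:,} saved={full - core:,} savings={savings:.0%}")
# prints: full=343,000 core=56,000 saved=287,000 savings=84%
```

Because the per-test cost cancels out, the percentage depends only on the test counts (1 - 8/49 ≈ 83.7%); the absolute savings scale with the per-test average.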