Date: November 28, 2025
Status: ✅ Ready to Test
Minimum tests to validate OpenAgent's 4 critical rules:
Approval Gate (2 tests):
  05-approval-before-execution-positive.yaml
  02-missing-approval-negative.yaml
Context Loading (3 tests):
  01-code-task.yaml
  02-docs-task.yaml
  11-wrong-context-file-negative.yaml
Stop on Failure (2 tests):
  02-stop-and-report-positive.yaml
  03-auto-fix-negative.yaml
Report First (1 test):
  01-correct-workflow-positive.yaml
Cost: ~$0.35 | Time: ~4 min | Token savings: 84%
01-critical-rules/ 22 tests (Approval, Context, Stop, Report)
06-integration/ 6 tests
06-negative/ 5 tests (Violation detection)
07-behavior/ 4 tests
05-edge-cases/ 3 tests
02-workflow-stages/ 2 tests
04-execution-paths/ 2 tests
08-delegation/ 2 tests
09-tool-usage/ 2 tests
smoke-test.yaml 1 test
Total: 49 unique tests
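The total can be checked by summing the per-directory counts listed above. The sketch below does that in plain shell and also computes what share of the suite the 8 core tests leave out. That share happens to round to 84%, the same number quoted as token savings, but whether the savings figure was derived this way is an assumption, not something the listing confirms.

```shell
# Per-directory test counts, copied from the listing above.
total=0
for n in 22 6 5 4 3 2 2 2 2 1; do
  total=$((total + n))
done

# Size of the core subset (8 tests).
core=8

# Share of tests skipped by running only the core subset,
# rounded to the nearest whole percent using integer arithmetic.
skipped_pct=$(( ((total - core) * 100 + total / 2) / total ))

echo "total tests: $total"
echo "core subset skips about ${skipped_pct}% of tests"
```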
cd evals/framework
# Run all 8 core tests
npm run eval:sdk -- --agent=openagent \
--pattern="01-critical-rules/{approval-gate/05*,approval-gate/02*,context-loading/01*,context-loading/02*,context-loading/11*,stop-on-failure/02*,stop-on-failure/03*,report-first/01*}" \
--model=anthropic/claude-sonnet-4-5
Cost: ~$0.35 | Time: ~4 min
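The long --pattern value above is easy to mistype. As a minimal sketch, the same string can be assembled from one entry per line, which also makes it simple to add or drop a test. The variable names here are illustrative and not part of the eval framework; the entries are the ones used in the command above.

```shell
# Build the brace-glob pattern from one entry per line.
# The quoted heredoc keeps the * characters literal.
pattern_items=""
count=0
while IFS= read -r item; do
  [ -n "$item" ] || continue
  # Append with a comma separator, except before the first entry.
  pattern_items="${pattern_items:+$pattern_items,}$item"
  count=$((count + 1))
done <<'EOF'
approval-gate/05*
approval-gate/02*
context-loading/01*
context-loading/02*
context-loading/11*
stop-on-failure/02*
stop-on-failure/03*
report-first/01*
EOF

pattern="01-critical-rules/{$pattern_items}"
echo "$count entries"
echo "$pattern"
```

The resulting value can then be passed as --pattern="$pattern" in place of the literal string.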
# Run all 22 critical-rules tests
npm run eval:sdk -- --agent=openagent \
--pattern="01-critical-rules/**/*.yaml" \
--model=anthropic/claude-sonnet-4-5
Cost: ~$1 | Time: ~10 min
# Run the full suite (all 49 tests)
npm run eval:sdk -- --agent=openagent \
--model=anthropic/claude-sonnet-4-5
Cost: ~$2 | Time: ~20 min
Recommendation: Start with the 8 core tests and expand to the larger runs only if needed.
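One way to follow this start-small approach is a small wrapper that maps a tier name to the matching command. The sketch below only prints the command rather than running it; the function name eval_cmd and the tier names core, critical, and full are hypothetical, while the flags and patterns are the ones shown above.

```shell
# Print (do not run) the eval command for a given tier.
# Tier names are illustrative: core, critical, full.
eval_cmd() {
  base="npm run eval:sdk -- --agent=openagent"
  model="--model=anthropic/claude-sonnet-4-5"
  case "$1" in
    core)
      pattern='01-critical-rules/{approval-gate/05*,approval-gate/02*,context-loading/01*,context-loading/02*,context-loading/11*,stop-on-failure/02*,stop-on-failure/03*,report-first/01*}'
      printf '%s --pattern="%s" %s\n' "$base" "$pattern" "$model"
      ;;
    critical)
      printf '%s --pattern="%s" %s\n' "$base" '01-critical-rules/**/*.yaml' "$model"
      ;;
    full)
      # No --pattern: matches the full-suite command above.
      printf '%s %s\n' "$base" "$model"
      ;;
    *)
      echo "usage: eval_cmd core|critical|full" >&2
      return 1
      ;;
  esac
}

# Start with the cheapest tier.
eval_cmd core
```

Printing the command first gives a chance to review it before spending tokens; the printed line can then be pasted into a shell from evals/framework.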