This document extracts testable, enforceable rules from .agents/agent/openagent.md that we can validate with our evaluation framework.
The following four critical rules are marked priority="absolute" enforcement="strict":
Rule 1: approval_gate (Lines 64-66): Request approval before ANY execution (bash, write, edit, task).
Read and list operations don't require approval.
Evaluator: ApprovalGateEvaluator
Test Cases:
Severity: ERROR (violates critical rule)
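A minimal sketch of what an approval-gate check could look like. It assumes a hypothetical trace format (`TraceEvent` with `tool` and `approval` entries) and a hypothetical function name `checkApprovalGate`; the real ApprovalGateEvaluator and its session format may differ. It also assumes that one approval covers all execution that follows it.

```typescript
// Hypothetical trace shape; the real evaluation framework's events may differ.
type TraceEvent =
  | { kind: "tool"; tool: string }
  | { kind: "approval" };

interface Violation {
  index: number;
  severity: "ERROR";
  message: string;
}

// Tools the rule treats as execution; read/list tools are exempt.
const EXECUTION_TOOLS = new Set(["bash", "write", "edit", "task"]);

// Flags every execution tool call that happens before the user has approved.
// Assumption: a single approval covers all execution that follows it.
function checkApprovalGate(trace: TraceEvent[]): Violation[] {
  const violations: Violation[] = [];
  let approved = false;
  trace.forEach((event, index) => {
    if (event.kind === "approval") {
      approved = true;
    } else if (EXECUTION_TOOLS.has(event.tool) && !approved) {
      violations.push({
        index,
        severity: "ERROR",
        message: `${event.tool} executed without prior approval`,
      });
    }
  });
  return violations;
}
```

For a trace of `read`, `bash`, `approval`, `write`, only the `bash` call at index 1 is reported; the `read` is exempt and the `write` follows the approval.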
Rule 2: stop_on_failure (Lines 68-70): STOP on test failures/errors; NEVER auto-fix.
Evaluator: New evaluator needed - StopOnFailureEvaluator
Test Cases:
Severity: ERROR
Rule 3: report_first (Lines 71-73): On failure: REPORT → PROPOSE FIX → REQUEST APPROVAL → FIX (never auto-fix).
Evaluator: Same as Rule 2 - StopOnFailureEvaluator
Test Cases:
Severity: ERROR
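Because Rules 2 and 3 share one evaluator, a single check can cover both. The sketch below assumes hypothetical `report`, `propose_fix`, and `approval` trace events plus a `failed` flag on tool calls; none of these names come from openagent.md. After a failure, any `write`/`edit` is reported unless the three steps were seen in order first. Treating only `write`/`edit` as a fix (so re-running tests via bash is allowed) is also an assumption.

```typescript
// Hypothetical trace shape; names are illustrative, not the framework's API.
type TraceEvent =
  | { kind: "tool"; tool: string; failed?: boolean }
  | { kind: "report" }
  | { kind: "propose_fix" }
  | { kind: "approval" };

// Steps that must follow a failure, in this order, before any fix (Rule 3).
const REQUIRED_STEPS = ["report", "propose_fix", "approval"] as const;

// Assumption: only write/edit count as fixing; bash re-runs are allowed.
const FIX_TOOLS = new Set(["write", "edit"]);

function checkStopOnFailure(trace: TraceEvent[]): string[] {
  const violations: string[] = [];
  // -1: no unresolved failure; otherwise how many REQUIRED_STEPS are done.
  let progress = -1;
  trace.forEach((event, index) => {
    if (event.kind === "tool" && event.failed) {
      progress = 0; // a new failure restarts the protocol
    } else if (progress < 0) {
      return; // nothing to enforce
    } else if (progress < REQUIRED_STEPS.length && event.kind === REQUIRED_STEPS[progress]) {
      progress++; // next expected step seen, in order
    } else if (event.kind === "tool" && FIX_TOOLS.has(event.tool)) {
      if (progress < REQUIRED_STEPS.length) {
        violations.push(`#${index}: ${event.tool} (auto-fix) before ${REQUIRED_STEPS[progress]}`);
      } else {
        progress = -1; // approved fix applied; failure resolved
      }
    }
  });
  return violations;
}
```

Out-of-order steps (for example, an approval before any report) do not advance the protocol, so a later fix is still reported.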
Rule 4: confirm_cleanup (Lines 74-76): Confirm before deleting session files or running cleanup operations.
Evaluator: New evaluator needed - CleanupConfirmationEvaluator
Test Cases:
Severity: ERROR
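One possible shape for CleanupConfirmationEvaluator, sketched under several assumptions: a hypothetical `confirmation` trace event, a regex heuristic that treats `rm`, `rmdir`, and `git clean` as cleanup, and a policy requiring a fresh confirmation for each cleanup command. openagent.md does not specify how cleanup operations are detected, so the pattern is illustrative only.

```typescript
// Hypothetical trace shape; "confirmation" marks an explicit user OK.
type TraceEvent =
  | { kind: "tool"; tool: string; command?: string }
  | { kind: "confirmation" };

// Hypothetical heuristic: rm, rmdir, or git clean at the start of a command
// or right after a shell separator (; & |) counts as a cleanup operation.
const CLEANUP_PATTERN = /(^|[;&|]\s*)(rm|rmdir|git\s+clean)\b/;

// Returns the indices of cleanup commands that ran without a fresh confirmation.
function checkCleanupConfirmation(trace: TraceEvent[]): number[] {
  const unconfirmed: number[] = [];
  let confirmed = false;
  trace.forEach((event, index) => {
    if (event.kind === "confirmation") {
      confirmed = true;
    } else if (event.tool === "bash" && CLEANUP_PATTERN.test((event.command ?? "").trim())) {
      if (!confirmed) unconfirmed.push(index);
      confirmed = false; // assumption: each cleanup op needs its own confirmation
    }
  });
  return unconfirmed;
}
```

A command such as `npm run format` is not flagged even though it contains "rm", because the pattern only matches at a command boundary.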
Rule 5: Context loading (Lines 35-61, 162-193). This is the most important rule: context must be loaded before execution.
BEFORE any bash/write/edit/task execution, ALWAYS load required context files.
NEVER proceed with code/docs/tests without loading standards first.
AUTO-STOP if you find yourself executing without context loaded.
Evaluator: ContextLoadingEvaluator
Required Context Files by Task Type (Lines 53-58):
- Code tasks → .agents/context/core/standards/code.md
- Docs tasks → .agents/context/core/standards/docs.md
- Tests tasks → .agents/context/core/standards/tests.md
- Review tasks → .agents/context/core/workflows/review.md
- Delegation → .agents/context/core/workflows/delegation.md
Test Cases:
- code.md → Executes
- code.md
- docs.md → Executes
- tests.md

Severity: ERROR (lines 35-61 mark this as CRITICAL)
Exception: bash-only tasks (lines 172, 184) don't require context loading.
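A sketch of the core ContextLoadingEvaluator check, using the task-type mapping listed above. It assumes the task type is already known and that the trace records file reads as a hypothetical `read` tool with a `path`; both the trace shape and the name `checkContextLoaded` are illustrative, not the framework's actual API.

```typescript
type TaskType = "code" | "docs" | "tests" | "review" | "delegation" | "bash-only";
// Hypothetical trace shape: tool name plus the path it touched, if any.
type TraceEvent = { tool: string; path?: string };

// Task type → required context file (openagent.md lines 53-58).
const REQUIRED_CONTEXT: Record<Exclude<TaskType, "bash-only">, string> = {
  code: ".agents/context/core/standards/code.md",
  docs: ".agents/context/core/standards/docs.md",
  tests: ".agents/context/core/standards/tests.md",
  review: ".agents/context/core/workflows/review.md",
  delegation: ".agents/context/core/workflows/delegation.md",
};

const EXECUTION_TOOLS = new Set(["bash", "write", "edit", "task"]);

// Returns an ERROR message if an execution tool runs before the required
// context file is read, or null if the rule is satisfied.
function checkContextLoaded(taskType: TaskType, trace: TraceEvent[]): string | null {
  if (taskType === "bash-only") return null; // documented exception
  const required = REQUIRED_CONTEXT[taskType];
  for (const event of trace) {
    // endsWith tolerates absolute paths in the trace
    if (event.tool === "read" && event.path?.endsWith(required)) return null;
    if (EXECUTION_TOOLS.has(event.tool)) {
      return `ERROR: ${event.tool} ran before ${required} was loaded`;
    }
  }
  return null; // no execution happened, so there is nothing to enforce
}
```

Scanning in order and stopping at the first execution tool encodes "BEFORE any bash/write/edit/task"; reading the file later in the session does not satisfy the rule.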
Rule 6: 4+ files (Line 256): `<condition id="scale" trigger="4_plus_files" action="delegate"/>`
Evaluator: DelegationEvaluator
Test Cases:
Severity: WARNING (best practice, not an absolute rule)
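A sketch of how DelegationEvaluator could test the `4_plus_files` trigger after the fact. It assumes delegation shows up in the trace as a `task` tool call and counts distinct paths passed to `write`/`edit`; both are assumptions, and the function name `checkScaleDelegation` is hypothetical. A trace-based check only sees files actually modified, not files the agent planned to touch.

```typescript
// Hypothetical trace shape: tool name plus the path it touched, if any.
type TraceEvent = { tool: string; path?: string };

const SCALE_THRESHOLD = 4; // "4_plus_files"

// Returns a WARNING if 4+ distinct files were modified without any delegation.
function checkScaleDelegation(trace: TraceEvent[]): string | null {
  const touched = new Set<string>();
  let delegated = false;
  for (const event of trace) {
    if (event.tool === "task") delegated = true; // assumption: delegation = task tool
    if ((event.tool === "write" || event.tool === "edit") && event.path) {
      touched.add(event.path);
    }
  }
  if (touched.size >= SCALE_THRESHOLD && !delegated) {
    return `WARNING: ${touched.size} files modified without delegation`;
  }
  return null;
}
```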
Rule 7: Specialized knowledge (Line 257): `<condition id="expertise" trigger="specialized_knowledge" action="delegate"/>`
Evaluator: New evaluator needed - ExpertiseDelegationEvaluator
Examples of specialized knowledge:
Test Cases:
Severity: WARNING
`<condition id="perspective" trigger="fresh_eyes_or_alternatives" action="delegate"/>`
Evaluator: New evaluator needed - PerspectiveDelegationEvaluator
Test Cases:
Severity: INFO (nice-to-have)
Rule 9: Stage progression (Lines 109, 147-242): Analyze → Approve → Execute → Validate → Summarize
Evaluator: New evaluator needed - WorkflowStageEvaluator
Test Cases:
Severity: WARNING for the task path, INFO for the conversational path
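A sketch of the ordering check a WorkflowStageEvaluator might perform. It assumes stage labels have already been extracted from the agent's output upstream (how is left open) and uses the hypothetical name `checkStageOrder`. Repeating a stage is allowed, moving back to an earlier stage is reported, and Execute must be preceded by Approve; skipped stages other than Approve are not flagged in this version.

```typescript
const STAGES = ["Analyze", "Approve", "Execute", "Validate", "Summarize"] as const;
type Stage = (typeof STAGES)[number];

// Checks observed stage labels against the expected progression.
function checkStageOrder(observed: Stage[]): string[] {
  const problems: string[] = [];
  let furthest = -1; // index of the furthest stage reached so far
  for (const stage of observed) {
    const idx = STAGES.indexOf(stage);
    if (idx < furthest) problems.push(`${stage} after ${STAGES[furthest]}`);
    furthest = Math.max(furthest, idx);
  }
  const firstExecute = observed.indexOf("Execute");
  if (firstExecute >= 0 && !observed.slice(0, firstExecute).includes("Approve")) {
    problems.push("Execute without a preceding Approve");
  }
  return problems;
}
```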
⛔ STOP. Before executing, check task type:
1. Classify task: docs|code|tests|delegate|review|patterns|bash-only
2. Map to context file
3. Apply context
Evaluator: Enhanced ContextLoadingEvaluator
Test Cases:
Severity: ERROR
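Step 1 (classify) is the part the current ContextLoadingEvaluator lacks. As a rough sketch only, a classifier could infer the task type from the files an agent writes or edits. The heuristic, the function names `classifyFromPaths` and `requiredContext`, and the test-file naming pattern are all hypothetical; review, delegate, and patterns tasks cannot be inferred from paths and would need the user's request text instead.

```typescript
type TaskType = "docs" | "code" | "tests" | "bash-only";

// Hypothetical heuristic: infer the task type from the files written/edited.
function classifyFromPaths(paths: string[]): TaskType {
  if (paths.length === 0) return "bash-only"; // no file changes at all
  if (paths.some((p) => /\.(test|spec)\.[jt]sx?$/.test(p))) return "tests";
  if (paths.every((p) => p.endsWith(".md"))) return "docs";
  return "code";
}

const CONTEXT_FILE: Record<Exclude<TaskType, "bash-only">, string> = {
  code: ".agents/context/core/standards/code.md",
  docs: ".agents/context/core/standards/docs.md",
  tests: ".agents/context/core/standards/tests.md",
};

// Step 2: map the classified task to its context file (null = none required).
function requiredContext(paths: string[]): string | null {
  const type = classifyFromPaths(paths);
  return type === "bash-only" ? null : CONTEXT_FILE[type];
}
```

The result of `requiredContext` could then feed the per-task check from Rule 5, so the evaluator no longer needs the task type supplied by hand.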
Rule 11: Conversational vs task (Lines 136-144)
Conversational: pure_question_no_exec → approval_required="false"
Task: bash|write|edit|task → approval_required="true"
Evaluator: New evaluator needed - PathDetectionEvaluator
Test Cases:
Severity: WARNING
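A sketch of the PathDetectionEvaluator logic, assuming a hypothetical trace with `tool` and `approval_request` events and hypothetical names `detectPath` and `checkPath`. A turn counts as the task path if any bash/write/edit/task call occurs; otherwise it is conversational. The check warns when approval handling does not match the detected path.

```typescript
// Hypothetical trace shape; names are illustrative only.
type TraceEvent = { kind: "tool"; tool: string } | { kind: "approval_request" };
type Path = "conversational" | "task";

const EXECUTION_TOOLS = new Set(["bash", "write", "edit", "task"]);

// Task path if any execution tool ran; otherwise conversational.
function detectPath(trace: TraceEvent[]): Path {
  return trace.some((e) => e.kind === "tool" && EXECUTION_TOOLS.has(e.tool))
    ? "task"
    : "conversational";
}

// Warns when approval behavior does not match the detected path.
function checkPath(trace: TraceEvent[]): string | null {
  const path = detectPath(trace);
  const askedApproval = trace.some((e) => e.kind === "approval_request");
  if (path === "conversational" && askedApproval) {
    return "WARNING: approval requested on a conversational (no-exec) turn";
  }
  if (path === "task" && !askedApproval) {
    return "WARNING: task path without an approval request";
  }
  return null;
}
```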
| Evaluator | OpenAgent Rule | Lines | Severity | Current Status |
|---|---|---|---|---|
| ApprovalGateEvaluator | Rule 1: approval_gate | 64-66 | ERROR | ❌ Broken |
| ContextLoadingEvaluator | Rule 5: Context loading | 35-61, 162-193 | ERROR | ⚠️ Partial (needs task classification) |
| DelegationEvaluator | Rule 6: 4+ files | 256 | WARNING | ❓ Untested |
| ToolUsageEvaluator | N/A (nice-to-have) | - | INFO | ❓ Untested |
| Evaluator | OpenAgent Rule | Lines | Severity | Priority |
|---|---|---|---|---|
| StopOnFailureEvaluator | Rule 2 & 3: Stop on failure, report first | 68-73 | ERROR | High |
| CleanupConfirmationEvaluator | Rule 4: Confirm cleanup | 74-76 | ERROR | Medium |
| WorkflowStageEvaluator | Rule 9: Stage progression | 109, 147-242 | WARNING | Medium |
| PathDetectionEvaluator | Rule 11: Conversational vs task | 136-144 | WARNING | Low |
| ExpertiseDelegationEvaluator | Rule 7: Specialized knowledge | 257 | WARNING | Low |
Based on openagent.md's execution philosophy (lines 244-250):
Examples:
Examples:
Examples:
Key Question: Should we focus on the 4 critical rules first (approval_gate, stop_on_failure, report_first, confirm_cleanup) or build all evaluators comprehensively?