# 03-timeout-handling.yaml

id: edge-03-timeout-handling
name: "Edge Case 03: Long Running Task Handling"
description: |
  Tests that the agent handles potentially long-running tasks appropriately.
  The agent should:
  1. Recognize that the task could take time
  2. Provide progress updates or warnings
  3. Complete within a reasonable timeout
  Validates:
  - Agent handles multi-step tasks
  - Agent provides appropriate feedback
  - Timeout handling works correctly
category: edge-case
prompts:
  - text: |
      List all TypeScript files in the evals/framework/src directory and count them.
      Then summarize what types of files are there.
approvalStrategy:
  type: auto-approve
behavior:
  mustUseAnyOf:
    - [glob]
    - [list]
    - [bash]
  minToolCalls: 1
  maxToolCalls: 5
expectedViolations:
  - rule: approval-gate
    shouldViolate: false
    severity: error
  - rule: tool-usage
    shouldViolate: false
    severity: warning
timeout: 90000
tags:
  - edge-case
  - timeout
  - multi-step
  - safe