Date: 2025-11-26
Status: ✅ COMPLETE - ALL TESTS PASSING (5/5)
Created comprehensive test suite to verify OpenAgent loads context files correctly:
Simple Tests (3) - Single prompt, read-only
- ctx-simple-coding-standards.yaml - Coding standards query
- ctx-simple-documentation-format.yaml - Documentation format query
- ctx-simple-testing-approach.yaml - Testing strategy query

Complex Tests (2) - Multi-turn with file creation
- ctx-multi-standards-to-docs.yaml - Standards → Documentation creation
- ctx-multi-error-handling-to-tests.yaml - Error handling → Test creation

Implemented intelligent timeout handling for multi-turn tests:
Code: evals/framework/src/sdk/test-runner.ts - withSmartTimeout() method
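A minimal sketch of how a timeout with an inactivity limit and an absolute cap could be structured. This is not the code in test-runner.ts: checkTimeout, getLastActivity, and pollMs are hypothetical names introduced for illustration, and the real method may track activity differently.

```typescript
// Hypothetical sketch; names other than withSmartTimeout are assumptions.
type TimeoutReason = "inactive" | "absolute" | null;

// Pure decision: which limit, if any, has been exceeded at time `now` (all values in ms)?
function checkTimeout(
  now: number,
  startedAt: number,
  lastActivityAt: number,
  activityMs: number,
  maxMs: number
): TimeoutReason {
  if (now - startedAt >= maxMs) return "absolute";
  if (now - lastActivityAt >= activityMs) return "inactive";
  return null;
}

// Polls the limits while `promise` is pending and rejects once one is hit.
// `getLastActivity` is an assumed callback returning the timestamp (ms)
// of the most recent observed activity.
function withSmartTimeout<T>(
  promise: Promise<T>,
  activityMs: number,
  maxMs: number,
  message: string,
  getLastActivity: () => number,
  pollMs = 1000
): Promise<T> {
  const startedAt = Date.now();
  return new Promise<T>((resolve, reject) => {
    const timer = setInterval(() => {
      const reason = checkTimeout(
        Date.now(), startedAt, getLastActivity(), activityMs, maxMs
      );
      if (reason !== null) {
        clearInterval(timer);
        reject(new Error(`${message} (${reason})`));
      }
    }, pollMs);
    // Settle with the original promise and always stop polling.
    promise.then(
      (value) => { clearInterval(timer); resolve(value); },
      (err) => { clearInterval(timer); reject(err); }
    );
  });
}

// Last activity 350s ago, 400s elapsed in total: the inactivity limit trips first.
console.log(checkTimeout(400_000, 0, 50_000, 300_000, 600_000)); // inactive
// Recent activity, but 700s elapsed in total: the absolute cap applies.
console.log(checkTimeout(700_000, 0, 690_000, 300_000, 600_000)); // absolute
```

Keeping the decision in a pure function makes the two limits easy to unit test without waiting on real timers.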
Corrected evaluator to properly detect context files in multi-turn sessions:
Issues Fixed:
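- Was reading the file path from the wrong location (tool.data.input.filePath)
- Now reads tool.data.state.input.filePath first, keeping the old location as a fallback

Code: evals/framework/src/evaluators/context-loading-evaluator.ts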
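To illustrate the lookup order, here is a small self-contained sketch. The ToolPart and ToolInput interfaces are assumptions that model only the fields named in this document, not the framework's actual types. The sketch uses `??` so that only missing values fall through, which differs slightly from a `||` chain when a path is an empty string.

```typescript
// Assumed shape of a recorded tool call; only the fields used here are modelled.
interface ToolInput {
  filePath?: string;
  path?: string;
}

interface ToolPart {
  data?: {
    state?: { input?: ToolInput };
    input?: ToolInput;
  };
}

// Prefer the nested state.input location, then fall back to the older one.
function extractFilePath(tool: ToolPart): string | undefined {
  return (
    tool.data?.state?.input?.filePath ??
    tool.data?.state?.input?.path ??
    tool.data?.input?.filePath ??
    tool.data?.input?.path
  );
}

// Sample inputs (hypothetical data for illustration).
const nested: ToolPart = { data: { state: { input: { filePath: "docs.md" } } } };
const legacy: ToolPart = { data: { input: { path: "code.md" } } };
const unrelated: ToolPart = { data: {} };

console.log(extractFilePath(nested));    // docs.md
console.log(extractFilePath(legacy));    // code.md
console.log(extractFilePath(unrelated)); // undefined
```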
Created helper script for running tests in controlled batches:
Script: evals/framework/scripts/utils/run-tests-batch.sh
Usage:
cd evals/framework
./scripts/utils/run-tests-batch.sh openagent 3 10
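The script itself is shell, but the batching idea can be sketched in TypeScript. In the usage above, 3 is the batch size and 10 is the delay in seconds. The toBatches helper below is hypothetical and only shows how five tests would be grouped; it assumes the delay is applied between batches and does not run anything.

```typescript
// Split a list into consecutive batches of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("batch size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// The five context-loading tests from this document.
const tests = [
  "ctx-simple-coding-standards.yaml",
  "ctx-simple-documentation-format.yaml",
  "ctx-simple-testing-approach.yaml",
  "ctx-multi-standards-to-docs.yaml",
  "ctx-multi-error-handling-to-tests.yaml",
];

const batches = toBatches(tests, 3);
console.log(batches.map((b) => b.length)); // [ 3, 2 ]
```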
Confirmed automatic cleanup working correctly:
- test_tmp/ before tests
- test_tmp/ after tests
- .gitignore and README.md

| Test | Type | Duration | Status | Context Files Loaded |
|---|---|---|---|---|
| ctx-simple-testing-approach | Simple | 38s | ✅ PASS | 4 files (README, HOW_TESTS_WORK, etc.) |
| ctx-simple-documentation-format | Simple | 26s | ✅ PASS | docs.md |
| ctx-simple-coding-standards | Simple | 21s | ✅ PASS | code.md |
| ctx-multi-standards-to-docs | Complex | 116s | ✅ PASS | code.md, docs.md (44s before execution) |
| ctx-multi-error-handling-to-tests | Complex | 148s | ✅ PASS | code.md, tests.md (58s before execution) |
Total Duration: 349 seconds (~6 minutes)
Pass Rate: 5/5 (100%)
Violations: 0
// Single session created once
const session = await this.client.createSession({ title: testCase.name });
sessionId = session.id;
// All prompts use the SAME session
for (let i = 0; i < testCase.prompts.length; i++) {
  const msg = testCase.prompts[i]; // current prompt in the conversation
  await this.client.sendPrompt(sessionId, { text: msg.text /* , ... */ });
}
// Base timeout: 300s of inactivity
// Max timeout: 600s absolute
await this.withSmartTimeout(
  promptPromise,
  300000, // 5 min activity timeout
  600000, // 10 min absolute max
  `Prompt ${i + 1} execution timed out`
);
// Fixed file path extraction
const filePath = tool.data?.state?.input?.filePath || // ✅ NEW
tool.data?.state?.input?.path ||
tool.data?.input?.filePath || // Old fallback
tool.data?.input?.path;
evals/agents/openagent/tests/context-loading/
├── ctx-simple-coding-standards.yaml
├── ctx-simple-documentation-format.yaml
├── ctx-simple-testing-approach.yaml
├── ctx-multi-standards-to-docs.yaml
└── ctx-multi-error-handling-to-tests.yaml
evals/agents/openagent/
├── CONTEXT_LOADING_COVERAGE.md
└── IMPLEMENTATION_SUMMARY.md (this file)
evals/framework/
└── scripts/
evals/framework/src/sdk/test-runner.ts
- Added withSmartTimeout() method
- Updated multi-turn test execution to use smart timeout
evals/framework/src/evaluators/context-loading-evaluator.ts
- Fixed file path extraction (tool.data.state.input.filePath)
- Added multi-turn execution checking
- Improved violation detection
evals/agents/openagent/tests/context-loading/*.yaml
- Increased timeout from 180s to 300s for complex tests
run-tests-batch.sh script

cd evals/framework
# Run all context-loading tests
npm run eval:sdk -- --agent=openagent --pattern="context-loading/*.yaml"
# Run a single test
npm run eval:sdk -- --agent=openagent --pattern="context-loading/ctx-simple-coding-standards.yaml"
# Run in batches
./scripts/utils/run-tests-batch.sh openagent 3 10
# Args: agent, batch_size, delay_seconds
# Serve the results from evals/results
cd ../results
./serve.sh
Add More Edge Cases
Performance Metrics
Test Coverage Expansion
✅ All objectives achieved
✅ 100% test pass rate
✅ OpenAgent context loading verified working correctly
✅ Test infrastructure improved and reliable
✅ Documentation complete
The context loading test suite is production-ready and provides comprehensive coverage of OpenAgent's context file loading behavior across both simple and complex multi-turn scenarios.
Maintained by: OpenCode Agents Team
Last Updated: 2025-11-26
Test Framework Version: 0.1.0