Integration Tests - Eval Pipeline

Overview

Comprehensive integration tests for the OpenCode evaluation framework that validate the complete pipeline from test execution through evaluation and reporting.

Test File

Location: src/__tests__/eval-pipeline-integration.test.ts

Test Count: 14 comprehensive integration tests

Status: ✅ All tests passing (14/14)

Test Coverage

1. Single Test Execution (3 tests)

Tests basic test case execution and evaluation:

Simple test case end-to-end: Validates basic prompt execution, event capture, evaluation, and scoring
Test with tool execution: Validates tool execution detection and evaluation
Approval gate violations: Validates approval denial detection

2. Multiple Test Execution (1 test)

Tests batch execution capabilities:

Execute multiple tests in sequence: Validates sequential test execution with proper session isolation

3. Evaluator Integration (2 tests)

Tests evaluator coordination and aggregation:

Multiple evaluators on same session: Validates that multiple evaluators can analyze the same session
Violation aggregation: Validates that violations from multiple evaluators are properly aggregated and counted
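
As an illustration of what the aggregation test checks, the following is a minimal sketch of merging violations from several evaluators and counting them. The `Violation`, `EvaluatorResult`, and `ViolationSummary` shapes and the `aggregateViolations` helper are hypothetical and may not match the framework's real types.

```typescript
// Hypothetical shapes for illustration; the real framework types may differ.
interface Violation {
  type: string;
  severity: "error" | "warning" | "info";
  message: string;
}

interface EvaluatorResult {
  evaluator: string;
  violations: Violation[];
}

interface ViolationSummary {
  total: number;
  bySeverity: Record<Violation["severity"], number>;
  all: Array<Violation & { evaluator: string }>;
}

// Flatten violations from every evaluator into one list, tagging each with
// the evaluator that reported it, and keep running counts per severity.
function aggregateViolations(results: EvaluatorResult[]): ViolationSummary {
  const summary: ViolationSummary = {
    total: 0,
    bySeverity: { error: 0, warning: 0, info: 0 },
    all: [],
  };
  for (const result of results) {
    for (const violation of result.violations) {
      summary.all.push({ ...violation, evaluator: result.evaluator });
      summary.bySeverity[violation.severity] += 1;
      summary.total += 1;
    }
  }
  return summary;
}
```

Tagging each violation with its evaluator keeps the merged list traceable, which is useful when two evaluators flag the same event.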

4. Session Data Collection (2 tests)

Tests session data collection and timeline building:

Complete session timeline: Validates timeline building from session data
Session with no tool execution: Validates handling of text-only sessions

5. Error Handling (2 tests)

Tests error scenarios and edge cases:

Test timeout handling: Validates graceful timeout handling
Invalid session ID: Validates error handling for non-existent sessions
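
One common way to get graceful timeout handling is to race the work against a timer. The sketch below shows that general technique only; `withTimeout` and `TimeoutError` are hypothetical names, and the framework may implement its timeouts differently.

```typescript
// Hypothetical error type so callers can tell a timeout from other failures.
class TimeoutError extends Error {
  constructor(public readonly timeoutMs: number) {
    super(`Operation timed out after ${timeoutMs} ms`);
    this.name = "TimeoutError";
  }
}

// Resolve with the work's value, or reject with TimeoutError if the work
// has not settled within timeoutMs. The timer is always cleared afterwards.
async function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(timeoutMs)), timeoutMs);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

Note that racing does not cancel the underlying work; a real runner would also need to abort or clean up the session that timed out.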

6. Result Validation (2 tests)

Tests result structure and validation:

Result structure validation: Validates complete result object structure
Overall score calculation: Validates score aggregation from multiple evaluators
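
This document does not state how the overall score is derived. Purely as an illustration, the sketch below assumes each evaluator reports a score from 0 to 100, that the overall score is the rounded mean, and that a session passes only if every evaluator passes. The `EvaluatorScore`, `OverallResult`, and `summarizeScores` names are hypothetical.

```typescript
// Hypothetical per-evaluator outcome; scores are assumed to be 0-100.
interface EvaluatorScore {
  evaluator: string;
  score: number;
  passed: boolean;
}

interface OverallResult {
  overallScore: number;
  passed: boolean;
}

// Assumed rule: overall score is the rounded mean of evaluator scores,
// and the session passes only when every evaluator passed.
function summarizeScores(scores: EvaluatorScore[]): OverallResult {
  if (scores.length === 0) {
    // With no evaluators there is nothing to vouch for the session.
    return { overallScore: 0, passed: false };
  }
  const sum = scores.reduce((acc, s) => acc + s.score, 0);
  return {
    overallScore: Math.round(sum / scores.length),
    passed: scores.every((s) => s.passed),
  };
}
```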

7. Report Generation (2 tests)

Tests report generation capabilities:

Text report generation: Validates single-session report generation
Batch summary report: Validates multi-session summary generation

Running the Tests

Run Integration Tests Only

cd evals/framework
SKIP_INTEGRATION=false npm test -- src/__tests__/eval-pipeline-integration.test.ts --run

Run All Tests (Including Integration)

cd evals/framework
SKIP_INTEGRATION=false npm test -- --run

Skip Integration Tests (Default)

Integration tests are skipped by default in CI environments and when SKIP_INTEGRATION=true:

cd evals/framework
npm test -- --run  # Integration tests skipped
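
The commands above suggest that the suite runs only when SKIP_INTEGRATION is explicitly set to false. A minimal sketch of such a gate, assuming exactly that rule (the `shouldSkipIntegration` helper is hypothetical and the real test file may decide differently):

```typescript
type Env = Record<string, string | undefined>;

// Assumed rule: integration tests run only when SKIP_INTEGRATION is
// explicitly "false"; an unset variable or any other value means skip.
function shouldSkipIntegration(env: Env): boolean {
  return env.SKIP_INTEGRATION !== "false";
}

// Possible usage, assuming the suite is run with Vitest:
//   describe.skipIf(shouldSkipIntegration(process.env))("Eval Pipeline", () => { ... });
```

Taking the environment as a parameter, rather than reading process.env inside the function, keeps the rule easy to unit test.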

Test Requirements

Integration tests require:

OpenCode CLI installed: The opencode command must be available
Local server: Tests start their own server instance, so none needs to be running beforehand
Network access: Tests communicate with the local server
Time: Integration tests take ~60 seconds to complete

Test Architecture

Components Tested

TestRunner: Orchestrates test execution
TestExecutor: Executes individual test cases
SessionReader: Reads session data from storage
TimelineBuilder: Builds event timelines from sessions
EvaluatorRunner: Runs evaluators and aggregates results
Individual Evaluators: ApprovalGate, ContextLoading, ToolUsage, etc.

Test Flow

Test Case → TestRunner → TestExecutor → Agent Execution
                                              ↓
                                        Session Data
                                              ↓
                                     SessionReader
                                              ↓
                                    TimelineBuilder
                                              ↓
                                    EvaluatorRunner
                                              ↓
                                    Multiple Evaluators
                                              ↓
                                    Aggregated Results
                                              ↓
                                    Report Generation
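
The flow above can be sketched as a chain of stages. Everything in this example is hypothetical: the interfaces, the `runPipeline` function, and the synchronous signatures are simplifications for illustration and do not reflect the actual TestRunner, SessionReader, TimelineBuilder, or EvaluatorRunner APIs, which are presumably asynchronous.

```typescript
// Hypothetical, simplified data shapes for each stage of the flow.
interface SessionData {
  sessionId: string;
  messages: string[];
}

interface TimelineEvent {
  index: number;
  text: string;
}

interface EvaluationResult {
  evaluator: string;
  passed: boolean;
}

type Evaluator = (timeline: TimelineEvent[]) => EvaluationResult;

interface PipelineStages {
  execute(prompt: string): SessionData; // stands in for execution + session reading
  buildTimeline(session: SessionData): TimelineEvent[];
  evaluators: Evaluator[];
}

interface PipelineReport {
  sessionId: string;
  passed: boolean;
  text: string;
}

// Run each stage in order and produce a small text report.
function runPipeline(prompt: string, stages: PipelineStages): PipelineReport {
  const session = stages.execute(prompt);
  const timeline = stages.buildTimeline(session);
  const results = stages.evaluators.map((evaluate) => evaluate(timeline));
  const lines = results.map((r) => `${r.passed ? "PASS" : "FAIL"} ${r.evaluator}`);
  return {
    sessionId: session.sessionId,
    passed: results.every((r) => r.passed),
    text: [`Session ${session.sessionId}`, ...lines].join("\n"),
  };
}
```

Passing the stages in as a parameter makes it possible to exercise the flow with stubs, without starting a server.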

Key Validations

Execution Phase

✅ Session creation and management
✅ Event stream handling
✅ Approval strategy execution
✅ Tool execution detection
✅ Timeout handling
✅ Error handling

Evaluation Phase

✅ Timeline building from session data
✅ Multiple evaluators running on same session
✅ Violation detection and tracking
✅ Evidence collection
✅ Score calculation
✅ Pass/fail determination

Reporting Phase

✅ Result structure validation
✅ Violation aggregation
✅ Score aggregation
✅ Text report generation
✅ Batch summary generation

Test Isolation

Each test:

Creates its own session
Runs independently
Cleans up after completion
Does not affect other tests

Sessions are tracked in the sessionIds array and cleaned up in the afterAll hook.
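
A minimal sketch of this tracking pattern is shown below. The `SessionTracker` class and the `DeleteSession` callback are hypothetical stand-ins; the real test file keeps a plain sessionIds array and uses whatever deletion call the framework provides.

```typescript
// Hypothetical callback that deletes one session by ID.
type DeleteSession = (sessionId: string) => Promise<void>;

class SessionTracker {
  private readonly sessionIds: string[] = [];

  // Record a session so it can be removed later; returns the ID for chaining.
  track(sessionId: string): string {
    this.sessionIds.push(sessionId);
    return sessionId;
  }

  // Delete every tracked session. A failed deletion does not stop the rest;
  // the IDs that could not be deleted are returned to the caller.
  async cleanup(deleteSession: DeleteSession): Promise<string[]> {
    const failed: string[] = [];
    for (const id of this.sessionIds.splice(0)) {
      try {
        await deleteSession(id);
      } catch {
        failed.push(id);
      }
    }
    return failed;
  }
}

// Possible usage inside a test file:
//   afterAll(async () => { await tracker.cleanup(deleteSessionById); });
```

Continuing past a failed deletion means one stale session does not leave the others behind.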

Performance

Total Duration: ~60 seconds for all 14 tests
Average per test: ~4 seconds
Longest test: Batch execution (~8 seconds)
Shortest test: Error handling (~2 seconds)

Debugging

To enable debug output:

cd evals/framework
DEBUG_VERBOSE=true SKIP_INTEGRATION=false npm test -- src/__tests__/eval-pipeline-integration.test.ts --run

This will show:

Detailed event logs
Evaluator execution details
Session data
Timeline events
Violation details

Future Enhancements

Potential additions to integration tests:

Multi-turn conversation tests: Test complex multi-message interactions
Delegation tests: Test subagent delegation scenarios
Context loading tests: Test context file loading and validation
Performance benchmarks: Test execution speed and resource usage
Parallel execution: Test concurrent test execution
Custom evaluator tests: Test custom evaluator registration and execution
