# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.1.0-alpha.1] - 2025-11-26

### Added

#### SDK-Based Evaluation Framework
- Complete test execution framework using OpenCode SDK
- Support for openagent and opencoder testing
- Real agent testing with session management
- Smart timeout system with activity monitoring
- Multi-turn conversation support

#### Modular Architecture
- Refactored `test-runner.ts` (884 lines) into 4 focused modules:
  - `test-runner.ts` (411 lines): Thin orchestrator
  - `test-executor.ts` (392 lines): Core execution logic
  - `result-validator.ts` (253 lines): Validation logic
  - `event-logger.ts` (128 lines): Logging utilities
- Improved Single Responsibility Principle compliance
- Enhanced testability through dependency injection

#### Test Infrastructure
- 26 test cases across multiple categories:
  - OpenAgent: Developer (12), Context Loading (5), Business (2), Edge Cases (3)
  - OpenCoder: Developer (4)
- BehaviorEvaluator for validating expected agent actions
- Comprehensive evaluators: approval-gate, context-loading, delegation, tool-usage

#### Interactive Results Dashboard
- Real-time test results visualization
- Filtering by agent, category, status
- Detailed violation tracking
- CSV export functionality
- Historical results tracking
- One-command deployment (`./serve.sh`)

#### Documentation
- `ARCHITECTURE.md`: Comprehensive system review (456 lines)
- `GETTING_STARTED.md`: Quick start guide (435 lines)
- `SDK_EVAL_README.md`: Complete SDK guide (298 lines)
- Test design guide and architecture overview
- Documentation cleanup (removed 3 outdated files)

#### Script Organization
- Organized 12 scripts into logical directories:
  - `scripts/debug/`: Session debugging tools (4 files)
  - `scripts/test/`: Test execution scripts (6 files)
  - `scripts/utils/`: Utility scripts (2 files)
- Comprehensive scripts/README.md with usage examples

#### Monorepo Structure
- Root package.json with convenient npm scripts
- Easy agent selection (openagent, opencoder)
- Easy model selection (grok, claude, gpt-4)
- Quick dashboard access from root
- No folder navigation required

#### CI/CD
- GitHub Actions workflow for automated testing
- Pre-merge validation for agent changes
- Fast smoke tests for both agents
- Automated test result reporting

#### Agent Improvements
- Enhanced openagent with better context loading
- New opencoder agent with test suite
- Improved subagent invocation patterns
- Ultra-compact context index system

### Changed
- Reorganized evaluation framework structure
- Improved test case schema with behavior expectations
- Enhanced context loading detection

### Removed
- Outdated documentation files (`TESTING_CONFIDENCE.md`, `TEST_REVIEW.md`, `SESSION_STORAGE_FIX.md`)
- Redundant test files

### Fixed
- Context loading evaluator detection accuracy
- Multi-turn prompt handling
- Test artifact cleanup

---

## Version Format

```
v0.1.0-alpha.1
 │ │ │ │     │
 │ │ │ │     └─ Build/Iteration number
 │ │ │ └─────── Release stage (alpha, beta, rc)
 │ │ └───────── Patch version
 │ └─────────── Minor version
 └───────────── Major version (0 = pre-release)
```

### Version Progression

- **Alpha** (`v0.x.0-alpha.N`): Early development, unstable
- **Beta** (`v0.x.0-beta.N`): Feature-complete, under testing
- **RC** (`v0.x.0-rc.N`): Release candidate, stable
- **Stable** (`v1.x.x`): Production ready
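
The version format and progression above can be sketched in code. The following TypeScript is an illustrative helper only, not part of the project: the names `parseVersion`, `compareVersions`, and `STAGE_ORDER` are hypothetical, and it assumes tags always follow the `vMAJOR.MINOR.PATCH[-stage.N]` shape shown in the diagram.

```typescript
// Hypothetical helper, not part of the project.
// Parses tags such as "v0.1.0-alpha.1" and orders them by release stage.
type Stage = "alpha" | "beta" | "rc" | "stable";

interface ParsedVersion {
  major: number;
  minor: number;
  patch: number;
  stage: Stage;
  iteration: number; // 0 for stable releases, which carry no suffix
}

// Stages in the order a release moves through them.
const STAGE_ORDER: Stage[] = ["alpha", "beta", "rc", "stable"];

function parseVersion(tag: string): ParsedVersion {
  // The leading "v" is optional so both tags and changelog headings parse.
  const match = /^v?(\d+)\.(\d+)\.(\d+)(?:-(alpha|beta|rc)\.(\d+))?$/.exec(tag);
  if (match === null) {
    throw new Error(`Unrecognized version tag: ${tag}`);
  }
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
    stage: (match[4] ?? "stable") as Stage,
    iteration: match[5] === undefined ? 0 : Number(match[5]),
  };
}

// Negative when a precedes b, positive when a follows b, zero when equal.
function compareVersions(a: string, b: string): number {
  const x = parseVersion(a);
  const y = parseVersion(b);
  return (
    x.major - y.major ||
    x.minor - y.minor ||
    x.patch - y.patch ||
    STAGE_ORDER.indexOf(x.stage) - STAGE_ORDER.indexOf(y.stage) ||
    x.iteration - y.iteration
  );
}
```

Under these assumptions, `compareVersions("v0.1.0-alpha.1", "v0.1.0-beta.1")` is negative, so sorting a list of tags with `tags.sort(compareVersions)` places alpha builds before beta, release candidates, and the stable release of the same version.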

[0.1.0-alpha.1]: https://github.com/darrenhinde/OpenAgents/releases/tag/v0.1.0-alpha.1