🚀 OpenCode Agents - Quick Start

📋 Available Agents

  • openagent - Full-featured development agent (22+ tests)

    • Developer tests: Code, docs, tests, delegation
    • Context loading tests: Standards, patterns, workflows
    • Business tests: Conversations, data analysis
    • Edge cases: Approval gates, negative tests
  • opencoder - Specialized coding agent (4+ tests)

    • Developer tests: Bash execution, file operations
    • Multi-tool workflows

🧪 Running Tests

Test All Agents

npm test                              # All agents, all tests (default)
npm run test:all                      # Explicit all agents

Test Specific Agent

npm run test:openagent                # OpenAgent only
npm run test:opencoder                # OpenCoder only

Test with Different Models

OpenAgent

npm run test:openagent:grok           # Grok (free tier, fast)
npm run test:openagent:claude         # Claude Sonnet 4.5 (best quality)
npm run test:openagent:gpt4           # GPT-4 Turbo (OpenAI)

OpenCoder

npm run test:opencoder:grok           # Grok (free tier, fast)
npm run test:opencoder:claude         # Claude Sonnet 4.5 (best quality)
npm run test:opencoder:gpt4           # GPT-4 Turbo (OpenAI)

All Agents

npm run test:all:grok                 # All agents with Grok
npm run test:all:claude               # All agents with Claude
npm run test:all:gpt4                 # All agents with GPT-4

🎯 Test Specific Categories

OpenAgent Categories

npm run test:openagent:developer      # Developer tests (code, docs, tests)
npm run test:openagent:context        # Context loading tests
npm run test:openagent:business       # Business/conversation tests

OpenCoder Categories

npm run test:opencoder:developer      # Developer tests
npm run test:opencoder:bash           # Bash execution tests

Custom Patterns

npm run test:pattern -- "developer/*.yaml"              # All developer tests
npm run test:pattern -- "context-loading/*.yaml"        # Context tests
npm run test:pattern -- "edge-case/*.yaml"              # Edge cases
npm run test:openagent -- --pattern="developer/ctx-*"   # OpenAgent context tests
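
To preview which files a glob such as developer/*.yaml would select before running it, a small shell helper can expand the pattern directly. This is only a sketch: preview_pattern is a hypothetical helper, and it assumes patterns are resolved relative to an agent's tests/ directory using ordinary shell glob rules, which the test runner may not follow exactly.

```shell
# preview_pattern PATTERN [TESTS_DIR]
# Hypothetical helper: lists the files a glob pattern expands to.
# Assumption: patterns are relative to the agent's tests/ directory.
preview_pattern() {
  (
    cd "${2:-evals/agents/openagent/tests}" || exit 1
    # $1 is intentionally unquoted so the shell expands the glob.
    for f in $1; do
      # An unmatched glob stays literal, so skip anything that does not exist.
      if [ -e "$f" ]; then
        printf '%s\n' "$f"
      fi
    done
  )
}
```

For example, preview_pattern "developer/ctx-*" prints the matching files under evals/agents/openagent/tests/, one per line, and prints nothing when the pattern matches no file.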

📊 View Results

Dashboard (Recommended)

npm run dashboard                     # Launch interactive dashboard
npm run dashboard:open                # Launch and auto-open browser

The dashboard provides:

  • ✅ Real-time test results visualization
  • ✅ Filter by agent, category, status
  • ✅ Detailed violation tracking
  • ✅ CSV export functionality
  • ✅ Historical results tracking

Command Line

npm run results:openagent             # Recent OpenAgent results
npm run results:opencoder             # Recent OpenCoder results
npm run results:latest                # Latest test summary (JSON)

🐛 Debug Mode

npm run test:debug                    # Run with debug output
npm run test:openagent -- --debug     # Debug OpenAgent tests
npm run test:opencoder -- --debug     # Debug OpenCoder tests

Debug mode shows:

  • Detailed event logging
  • Tool call details
  • Session information
  • Evaluation progress

🔧 Development

npm run dev:setup                     # Install dependencies
npm run dev:build                     # Build framework
npm run dev:test                      # Run unit tests
npm run dev:clean                     # Clean and reinstall

📈 Version Management

npm run version                       # Show current version
npm run version:bump alpha            # Bump alpha version
npm run version:bump beta             # Bump to beta
npm run version:bump rc               # Bump to release candidate

📁 Test Structure

evals/agents/
├── openagent/tests/
│   ├── developer/          # Code, docs, tests (12 tests)
│   │   ├── ctx-code-001.yaml
│   │   ├── ctx-docs-001.yaml
│   │   ├── ctx-tests-001.yaml
│   │   ├── ctx-delegation-001.yaml
│   │   └── ...
│   ├── context-loading/    # Context loading (5 tests)
│   │   ├── ctx-simple-coding-standards.yaml
│   │   ├── ctx-simple-documentation-format.yaml
│   │   └── ...
│   ├── business/           # Conversations (2 tests)
│   │   ├── conv-simple-001.yaml
│   │   └── data-analysis.yaml
│   └── edge-case/          # Edge cases (3 tests)
│       ├── just-do-it.yaml
│       ├── missing-approval-negative.yaml
│       └── no-approval-negative.yaml
│
└── opencoder/tests/
    └── developer/          # Bash, file ops (4 tests)
        ├── bash-execution-001.yaml
        ├── file-read-001.yaml
        ├── multi-tool-001.yaml
        └── simple-bash-test.yaml

💡 Common Workflows

Quick Test (Free Tier)

npm run test:openagent:grok           # Fast, free
npm run test:opencoder:grok           # Fast, free

Quality Test (Best Model)

npm run test:openagent:claude         # Best quality
npm run test:opencoder:claude         # Best quality

Full Test Suite

npm run test:all:claude               # All agents, best model

Continuous Development

# 1. Run tests in debug mode
npm run test:openagent:developer -- --debug

# 2. View results in dashboard
npm run dashboard:open

# 3. Iterate on agent prompts
# Edit .opencode/agent/openagent.md

# 4. Re-run tests
npm run test:openagent:developer

CI/CD Smoke Tests

npm run test:ci                       # Fast smoke tests for both agents
npm run test:ci:openagent             # OpenAgent smoke test
npm run test:ci:opencoder             # OpenCoder smoke test

🎯 Test Results

After running tests, results are saved to:

  • evals/results/latest.json - Latest test run
  • evals/results/history/YYYY-MM/DD-HHMMSS-{agent}.json - Historical results
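
Because history files are named YYYY-MM/DD-HHMMSS-{agent}.json, sorting the paths alphabetically also sorts them chronologically. The sketch below uses that property to find the newest historical result for one agent; latest_history is a hypothetical helper, and it assumes every file follows the zero-padded naming shown above.

```shell
# latest_history AGENT [HISTORY_DIR]
# Hypothetical helper: prints the newest history file for an agent,
# or nothing if there is none.
latest_history() {
  agent="$1"
  root="${2:-evals/results/history}"
  latest=""
  for f in "$root"/*/*-"$agent".json; do
    [ -e "$f" ] || continue
    latest="$f"    # glob results are sorted, so the last match is the newest
  done
  if [ -n "$latest" ]; then
    printf '%s\n' "$latest"
  fi
}
```

For example, latest_history openagent prints the path of the most recent OpenAgent run under evals/results/history/.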

View in dashboard: npm run dashboard:open


🔍 Understanding Test Results

Test Status

  • ✅ PASSED - All checks passed, no violations
  • ❌ FAILED - Test failed (execution error or violations)

Evaluators

Each test is checked by several evaluators:

  • approval-gate - Checks if agent requested approval when required
  • context-loading - Validates context files were loaded before execution
  • delegation - Checks if agent delegated to subagents appropriately
  • tool-usage - Validates correct tool usage
  • behavior - Checks if agent performed expected actions

Violations

  • Error - Critical issues that cause test failure
  • Warning - Non-critical issues
  • Info - Informational messages
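
One way to read the status and severity rules together is sketched below. It is an interpretation, not the framework's actual logic: it assumes a test fails on an execution error or on any Error-level violation, and that Warning and Info violations alone do not fail it. test_status and its arguments are hypothetical.

```shell
# test_status ERROR_COUNT [EXEC_FAILED]
# Hypothetical helper: derives PASSED/FAILED from violation counts.
# Assumption: only Error-level violations (or an execution error) fail a test.
test_status() {
  errors="$1"
  exec_failed="${2:-0}"    # 1 if the run itself crashed, 0 otherwise
  if [ "$exec_failed" -ne 0 ] || [ "$errors" -gt 0 ]; then
    echo FAILED
  else
    echo PASSED
  fi
}
```

Under these assumptions, test_status 0 reports PASSED, while test_status 2 and test_status 0 1 both report FAILED.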

📚 Additional Resources


🆘 Troubleshooting

Tests not running?

# Ensure dependencies are installed
npm run dev:setup

# Build the framework
npm run dev:build

Dashboard not loading?

# Check if results exist
ls -la evals/results/

# Try launching manually
cd evals/results && ./serve.sh
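
The "check if results exist" step can be made explicit with a small pre-flight check. This sketch assumes the dashboard has nothing to show until evals/results/latest.json exists and is non-empty; results_ready is a hypothetical helper.

```shell
# results_ready [RESULTS_DIR]
# Hypothetical helper: succeeds only if latest.json exists and is non-empty.
results_ready() {
  dir="${1:-evals/results}"
  if [ -s "$dir/latest.json" ]; then
    echo "ok: $dir/latest.json"
  else
    echo "missing: $dir/latest.json (run the tests first)" >&2
    return 1
  fi
}
```

It can be chained in front of the launch command, for example results_ready && npm run dashboard:open, so the dashboard only starts when there is a result file to display.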

Version mismatch?

# Check current version
npm run version

# Sync VERSION file with package.json
# (--silent keeps npm's own script banner out of the redirected output)
npm run --silent version > VERSION

🎉 Getting Help

  • Check evals/GETTING_STARTED.md for detailed guides
  • Review test examples in evals/agents/*/tests/
  • Run tests in debug mode: npm run test:debug
  • View results dashboard: npm run dashboard:open

Current Version: 0.1.0-alpha.1
Last Updated: 2025-11-26