Prompt Library System

Multi-model prompt variants with an integrated evaluation framework for testing and validation.


🎯 Quick Start

Testing a Prompt Variant

# Test with eval framework (recommended)
cd evals/framework
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --suite=smoke-test

# Test with specific model
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --model=ollama/llama3.2 --suite=core-tests

# View results
open ../results/index.html

Using a Variant Permanently

# Switch to a variant
./scripts/prompts/use-prompt.sh --agent=openagent --variant=llama

# Restore default (canonical agent file)
./scripts/prompts/use-prompt.sh --agent=openagent --variant=default
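The switch script itself isn't reproduced in this README; as a rough sketch (hypothetical paths, and assuming the script backs up the canonical agent file before overwriting it), its core behavior might look like:

```shell
# Sketch of what a prompt-switching script might do (hypothetical;
# the real use-prompt.sh may differ). "default" restores the agent
# file from a backup taken before the first switch.
use_prompt() {
  agent="$1"; variant="$2"
  target=".opencode/agent/${agent}.md"
  backup=".opencode/agent/${agent}.md.default"
  if [ "$variant" = "default" ]; then
    # Restore the canonical default, if we ever switched away from it.
    if [ -f "$backup" ]; then cp "$backup" "$target"; fi
  else
    # Preserve the canonical default exactly once, then overwrite.
    if [ ! -f "$backup" ]; then cp "$target" "$backup"; fi
    cp ".opencode/prompts/${agent}/${variant}.md" "$target"
  fi
}
```

Switching copies the variant over the agent file in place, which is why restoring the default is always one command away.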

📁 Structure

.opencode/
├── agent/                      # Canonical agent prompts (defaults)
│   ├── openagent.md            # OpenAgent default (Claude-optimized)
│   └── opencoder.md            # OpenCoder default
└── prompts/                    # Model-specific variants
    ├── navigation.md           # This file
    ├── openagent/              # OpenAgent variants
    │   ├── gpt.md              # GPT-4 optimized
    │   ├── gemini.md           # Gemini optimized
    │   ├── grok.md             # Grok optimized
    │   ├── llama.md            # Llama/OSS optimized
    │   ├── TEMPLATE.md         # Template for new variants
    │   ├── navigation.md       # Variant documentation
    │   └── results/            # Per-variant test results
    │       ├── default-results.json  # Default (agent file) results
    │       ├── gpt-results.json
    │       ├── gemini-results.json
    │       └── llama-results.json
    └── opencoder/              # OpenCoder variants
        └── ...

Architecture:

  • Agent files (.opencode/agent/*.md) = Canonical defaults (source of truth)
  • Prompt variants (.opencode/prompts/<agent>/<model>.md) = Model-specific optimizations
  • Results always saved to .opencode/prompts/<agent>/results/ (including default)
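That layout suggests a simple resolution rule, sketched below (illustrative only; the framework's real lookup logic is not shown in this README):

```shell
# Map an (agent, variant) pair to the prompt file the layout above
# implies: named variants live under .opencode/prompts/, while
# "default" falls back to the canonical agent file.
resolve_prompt() {
  agent="$1"; variant="${2:-default}"
  if [ "$variant" = "default" ]; then
    echo ".opencode/agent/${agent}.md"
  else
    echo ".opencode/prompts/${agent}/${variant}.md"
  fi
}
```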

🧪 Evaluation Framework Integration

Running Tests with Variants

The eval framework automatically:

  • ✅ Switches to the specified variant
  • ✅ Runs your test suite
  • ✅ Tracks results per variant
  • ✅ Restores the default prompt after tests

# Smoke test (1 test, ~30s)
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --suite=smoke-test

# Core suite (7 tests, ~5-8min)
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --suite=core-tests

# Custom test pattern
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --pattern="01-critical-rules/**/*.yaml"

Auto-Model Detection

Variants specify recommended models in their YAML frontmatter:

---
model_family: llama
recommended_models:
  - ollama/llama3.2
  - ollama/qwen2.5
---

If you don't specify --model, the framework uses the first recommended model.
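As an illustration of how that first model could be pulled out of the frontmatter (a sketch with awk, not the framework's actual parser):

```shell
# Print the first entry under recommended_models: in a variant's YAML
# frontmatter (the block delimited by the leading "---" lines).
first_model() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }
    in_fm && $0 == "---"   { exit }          # end of frontmatter
    in_fm && found && $1 == "-" { print $2; exit }
    in_fm && $1 == "recommended_models:" { found = 1 }
  ' "$1"
}
```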

Results Tracking

Results are saved in two locations:

  1. Main results: evals/results/latest.json (includes prompt_variant field)
  2. Per-variant: .opencode/prompts/{agent}/results/{variant}-results.json

View in dashboard: evals/results/index.html (filter by variant)


📝 Creating a New Variant

Step 1: Copy Template

cp .opencode/prompts/openagent/TEMPLATE.md .opencode/prompts/openagent/my-variant.md

Step 2: Edit Metadata

---
model_family: oss
recommended_models:
  - ollama/my-model
status: experimental
maintainer: your-name
description: Optimized for my specific use case
---
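A quick sanity check that a variant declares the keys listed above can be sketched as follows (illustrative; CI may enforce a different or stricter set):

```shell
# Report any of the metadata keys from this README that a variant
# file fails to declare; returns non-zero if something is missing.
check_frontmatter() {
  file="$1"; ok=0
  for key in model_family recommended_models status maintainer description; do
    grep -q "^${key}:" "$file" || { echo "missing: ${key}"; ok=1; }
  done
  return "$ok"
}
```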

Step 3: Customize Prompt

Edit the prompt content below the frontmatter for your target model.

Step 4: Validate

# Validate the variant exists and metadata is correct
cd evals/framework
npm run eval:sdk -- --agent=openagent --prompt-variant=my-variant --suite=smoke-test

Step 5: Test Thoroughly

# Run core test suite
npm run eval:sdk -- --agent=openagent --prompt-variant=my-variant --suite=core-tests

# Check results
open ../results/index.html

Step 6: Document Results

Update .opencode/prompts/openagent/navigation.md with:

  • Test results (pass rate, timing)
  • Known issues or limitations
  • Recommended use cases

🎯 Available Variants

OpenAgent

| Variant | Model Family | Status | Best For |
|---------|--------------|--------|----------|
| default | Claude | ✅ Stable | Production use, Claude models |
| gpt | GPT | ✅ Stable | GPT-4, GPT-4o |
| gemini | Gemini | ✅ Stable | Gemini 2.0, Gemini Pro |
| grok | Grok | ✅ Stable | Grok models (free tier) |
| llama | Llama/OSS | ✅ Stable | Llama, Qwen, DeepSeek, other OSS |

See openagent/navigation.md for detailed test results.

OpenCoder

Coming soon.


🔧 Advanced Usage

Custom Test Suites

Create custom test suites for your variant:

# Create suite
cp evals/agents/openagent/config/smoke-test.json \
   evals/agents/openagent/config/my-suite.json

# Edit suite (add your tests)
# Validate suite
cd evals/framework && npm run validate:suites openagent

# Run with your variant
npm run eval:sdk -- --agent=openagent --prompt-variant=my-variant --suite=my-suite

See evals/TEST_SUITE_VALIDATION.md for details.
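The suite file's schema isn't documented in this README; purely as a hypothetical sketch, a minimal custom suite pairing a name with test patterns might be written like this (field names are illustrative, not the framework's actual schema):

```shell
# Write a hypothetical my-suite.json; the keys shown here are
# assumptions for illustration, not the framework's real schema.
cat > my-suite.json <<'EOF'
{
  "name": "my-suite",
  "description": "Targeted checks for the my-variant prompt",
  "tests": [
    "01-critical-rules/**/*.yaml"
  ]
}
EOF
```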

Comparing Models

Test the same variant with different models:

# Test Llama 3.2
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --model=ollama/llama3.2 --suite=core-tests

# Test Qwen 2.5
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --model=ollama/qwen2.5 --suite=core-tests

# Compare results in dashboard
open evals/results/index.html

📊 Understanding Results

Dashboard Features

The results dashboard (evals/results/index.html) shows:

  • ✅ Filter by prompt variant
  • ✅ Filter by model
  • ✅ Pass/fail rates per variant
  • ✅ Test execution times
  • ✅ Detailed test results

Result Files

Main results (evals/results/latest.json):

{
  "meta": {
    "agent": "openagent",
    "model": "ollama/llama3.2",
    "prompt_variant": "llama",
    "model_family": "llama"
  },
  "summary": {
    "total": 7,
    "passed": 7,
    "failed": 0,
    "pass_rate": 1
  }
}

Per-variant results (.opencode/prompts/openagent/results/llama-results.json):

  • Tracks all test runs for this variant
  • Shows trends over time
  • Helps identify regressions

🚀 For Contributors

Creating a Variant for PR

  1. Create your variant in .opencode/prompts/<agent>/<model>.md
  2. Test thoroughly with eval framework
  3. Document results in agent README
  4. Submit PR with variant file only

PR Requirements

  • ✅ Variant has YAML frontmatter with metadata
  • ✅ Variant passes core test suite (≥85% pass rate)
  • ✅ Results documented in agent README
  • ✅ Agent file unchanged (unless updating default)
  • ✅ No default.md files in prompts directory
  • ✅ CI validation passes
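The pass-rate gate could, in principle, be checked straight from latest.json; here is an illustrative sketch using grep and awk rather than the project's actual CI script (it assumes the results layout shown under Understanding Results):

```shell
# Exit non-zero when the "pass_rate" in a results file is below 0.85.
pass_rate_ok() {
  # Pull the numeric value after "pass_rate": out of the JSON.
  rate=$(grep -o '"pass_rate":[^,}]*' "$1" | head -1 | cut -d: -f2 | tr -d ' ')
  # Compare as a number; awk exits 0 only when the gate is met.
  awk -v r="$rate" 'BEGIN { exit (r >= 0.85 ? 0 : 1) }'
}
```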

Validation

# Validate your variant
cd evals/framework
npm run eval:sdk -- --agent=openagent --prompt-variant=your-variant --suite=core-tests

# Ensure PR uses default
./scripts/prompts/validate-pr.sh

🎓 Design Principles

1. Agent Files are Canonical Defaults

  • Agent files (.opencode/agent/*.md) are the source of truth
  • Tested and production-ready
  • Optimized for Claude (primary model)
  • Modified through normal PR process

2. Variants are Model-Specific Optimizations

  • Stored in .opencode/prompts/<agent>/<model>.md
  • Optimized for specific models/use cases
  • May have different trade-offs
  • Results documented transparently

3. Results are Tracked

  • Every test run tracked per variant
  • Dashboard shows variant performance
  • Easy to compare variants

4. Easy to Test

  • One command to test any variant
  • Automatic model detection
  • Results saved automatically

5. Safe to Experiment

  • Variants don't affect default
  • Easy to switch and restore
  • Test before committing

📚 Related Documentation


🆘 Troubleshooting

Variant Not Found

# List available variants
ls .opencode/prompts/openagent/*.md

# Check variant exists
npm run eval:sdk -- --agent=openagent --prompt-variant=your-variant --suite=smoke-test

Tests Failing

  1. Check variant metadata (YAML frontmatter)
  2. Verify recommended model is available
  3. Run with debug flag: --debug
  4. Check results in dashboard

Model Not Available

# Check available models
# For Ollama: ollama list
# For OpenRouter: check openrouter.ai/models

# Specify model explicitly
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --model=ollama/llama3.2

💡 Tips

  • Start with smoke-test - Fast validation (1 test, ~30s)
  • Use core-tests for thorough testing - 7 tests, ~5-8min
  • Check dashboard regularly - Visual feedback on variant performance
  • Document your findings - Help others by sharing results
  • Test with multiple models - Same variant may perform differently

🔮 Future Enhancements

  • Automated variant comparison reports
  • Performance benchmarking across variants
  • Variant recommendation based on model
  • Historical trend analysis
  • A/B testing framework

Questions? See openagent/navigation.md or open an issue.