Prompt Library System

Multi-model prompt variants with an integrated evaluation framework for testing and validation.


🎯 Quick Start

Testing a Prompt Variant

# Test with eval framework (recommended)
cd evals/framework
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --suite=smoke-test

# Test with specific model
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --model=ollama/llama3.2 --suite=core-tests

# View results
open ../results/index.html

Using a Variant Permanently

# Switch to a variant
./scripts/prompts/use-prompt.sh --agent=openagent --variant=llama

# Restore default (canonical agent file)
./scripts/prompts/use-prompt.sh --agent=openagent --variant=default
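The switch script itself isn't reproduced in this README; as a rough sketch (hypothetical paths, and assuming the script backs up the canonical agent file before overwriting it), its core behavior might look like:

```shell
# Sketch of what a prompt-switching script might do (hypothetical;
# the real use-prompt.sh may differ). "default" restores the agent
# file from a backup taken before the first switch.
use_prompt() {
  agent="$1"; variant="$2"
  target=".opencode/agent/${agent}.md"
  backup=".opencode/agent/${agent}.md.default"
  if [ "$variant" = "default" ]; then
    # Restore the canonical default, if we ever switched away from it.
    if [ -f "$backup" ]; then cp "$backup" "$target"; fi
  else
    # Preserve the canonical default exactly once, then overwrite.
    if [ ! -f "$backup" ]; then cp "$target" "$backup"; fi
    cp ".opencode/prompts/${agent}/${variant}.md" "$target"
  fi
}
```

Switching copies the variant over the agent file in place, which is why restoring the default is always one command away.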

📁 Structure

.opencode/
├── agent/                      # Canonical agent prompts (defaults)
│   ├── openagent.md            # OpenAgent default (Claude-optimized)
│   └── opencoder.md            # OpenCoder default
└── prompts/                    # Model-specific variants
    ├── navigation.md           # This file
    ├── openagent/              # OpenAgent variants
    │   ├── gpt.md              # GPT-4 optimized
    │   ├── gemini.md           # Gemini optimized
    │   ├── grok.md             # Grok optimized
    │   ├── llama.md            # Llama/OSS optimized
    │   ├── TEMPLATE.md         # Template for new variants
    │   ├── navigation.md       # Variant documentation
    │   └── results/            # Per-variant test results
    │       ├── default-results.json  # Default (agent file) results
    │       ├── gpt-results.json
    │       ├── gemini-results.json
    │       └── llama-results.json
    └── opencoder/              # OpenCoder variants
        └── ...

Architecture:

  • Agent files (.opencode/agent/*.md) = Canonical defaults (source of truth)
  • Prompt variants (.opencode/prompts/<agent>/<model>.md) = Model-specific optimizations
  • Results always saved to .opencode/prompts/<agent>/results/ (including default)
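That layout suggests a simple resolution rule, sketched below (illustrative only; the framework's real lookup logic is not shown in this README):

```shell
# Map an (agent, variant) pair to the prompt file the layout above
# implies: named variants live under .opencode/prompts/, while
# "default" falls back to the canonical agent file.
resolve_prompt() {
  agent="$1"; variant="${2:-default}"
  if [ "$variant" = "default" ]; then
    echo ".opencode/agent/${agent}.md"
  else
    echo ".opencode/prompts/${agent}/${variant}.md"
  fi
}
```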

🧪 Evaluation Framework Integration

Running Tests with Variants

The eval framework automatically:

  • ✅ Switches to the specified variant
  • ✅ Runs your test suite
  • ✅ Tracks results per variant
  • ✅ Restores the default prompt after tests

# Smoke test (1 test, ~30s)
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --suite=smoke-test

# Core suite (7 tests, ~5-8min)
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --suite=core-tests

# Custom test pattern
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --pattern="01-critical-rules/**/*.yaml"

Auto-Model Detection

Variants specify recommended models in their YAML frontmatter:

---
model_family: llama
recommended_models:
  - ollama/llama3.2
  - ollama/qwen2.5
---

If you don't specify --model, the framework uses the first recommended model.
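As an illustration of how that first model could be pulled out of the frontmatter (a sketch with awk, not the framework's actual parser):

```shell
# Print the first entry under recommended_models: in a variant's YAML
# frontmatter (the block delimited by the leading "---" lines).
first_model() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }
    in_fm && $0 == "---"   { exit }          # end of frontmatter
    in_fm && found && $1 == "-" { print $2; exit }
    in_fm && $1 == "recommended_models:" { found = 1 }
  ' "$1"
}
```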

Results Tracking

Results are saved in two locations:

  1. Main results: evals/results/latest.json (includes prompt_variant field)
  2. Per-variant: .opencode/prompts/{agent}/results/{variant}-results.json

View in dashboard: evals/results/index.html (filter by variant)


📝 Creating a New Variant

Step 1: Copy Template

cp .opencode/prompts/openagent/TEMPLATE.md .opencode/prompts/openagent/my-variant.md

Step 2: Edit Metadata

---
model_family: oss
recommended_models:
  - ollama/my-model
status: experimental
maintainer: your-name
description: Optimized for my specific use case
---
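A quick sanity check that a variant declares the keys listed above can be sketched as follows (illustrative; CI may enforce a different or stricter set):

```shell
# Report any of the metadata keys from this README that a variant
# file fails to declare; returns non-zero if something is missing.
check_frontmatter() {
  file="$1"; ok=0
  for key in model_family recommended_models status maintainer description; do
    grep -q "^${key}:" "$file" || { echo "missing: ${key}"; ok=1; }
  done
  return "$ok"
}
```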

Step 3: Customize Prompt

Edit the prompt content below the frontmatter for your target model.

Step 4: Validate

# Validate the variant exists and metadata is correct
cd evals/framework
npm run eval:sdk -- --agent=openagent --prompt-variant=my-variant --suite=smoke-test

Step 5: Test Thoroughly

# Run core test suite
npm run eval:sdk -- --agent=openagent --prompt-variant=my-variant --suite=core-tests

# Check results
open ../results/index.html

Step 6: Document Results

Update .opencode/prompts/openagent/navigation.md with:

  • Test results (pass rate, timing)
  • Known issues or limitations
  • Recommended use cases

🎯 Available Variants

OpenAgent

| Variant | Model Family | Status | Best For |
|---------|--------------|--------|----------|
| default | Claude | ✅ Stable | Production use, Claude models |
| gpt | GPT | ✅ Stable | GPT-4, GPT-4o |
| gemini | Gemini | ✅ Stable | Gemini 2.0, Gemini Pro |
| grok | Grok | ✅ Stable | Grok models (free tier) |
| llama | Llama/OSS | ✅ Stable | Llama, Qwen, DeepSeek, other OSS |

See openagent/navigation.md for detailed test results.

OpenCoder

Coming soon.


🔧 Advanced Usage

Custom Test Suites

Create custom test suites for your variant:

# Create suite
cp evals/agents/openagent/config/smoke-test.json \
   evals/agents/openagent/config/my-suite.json

# Edit suite (add your tests)
# Validate suite
cd evals/framework && npm run validate:suites openagent

# Run with your variant
npm run eval:sdk -- --agent=openagent --prompt-variant=my-variant --suite=my-suite

See evals/TEST_SUITE_VALIDATION.md for details.
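The suite file's schema isn't documented in this README; purely as a hypothetical sketch, a minimal custom suite pairing a name with test patterns might be written like this (field names are illustrative, not the framework's actual schema):

```shell
# Write a hypothetical my-suite.json; the keys shown here are
# assumptions for illustration, not the framework's real schema.
cat > my-suite.json <<'EOF'
{
  "name": "my-suite",
  "description": "Targeted checks for the my-variant prompt",
  "tests": [
    "01-critical-rules/**/*.yaml"
  ]
}
EOF
```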

Comparing Models

Test the same variant with different models:

# Test Llama 3.2
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --model=ollama/llama3.2 --suite=core-tests

# Test Qwen 2.5
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --model=ollama/qwen2.5 --suite=core-tests

# Compare results in dashboard
open evals/results/index.html

📊 Understanding Results

Dashboard Features

The results dashboard (evals/results/index.html) shows:

  • ✅ Filter by prompt variant
  • ✅ Filter by model
  • ✅ Pass/fail rates per variant
  • ✅ Test execution times
  • ✅ Detailed test results

Result Files

Main results (evals/results/latest.json):

{
  "meta": {
    "agent": "openagent",
    "model": "ollama/llama3.2",
    "prompt_variant": "llama",
    "model_family": "llama"
  },
  "summary": {
    "total": 7,
    "passed": 7,
    "failed": 0,
    "pass_rate": 1
  }
}

Per-variant results (.opencode/prompts/openagent/results/llama-results.json):

  • Tracks all test runs for this variant
  • Shows trends over time
  • Helps identify regressions

🚀 For Contributors

Creating a Variant for PR

  1. Create your variant in .opencode/prompts/<agent>/<model>.md
  2. Test thoroughly with eval framework
  3. Document results in agent README
  4. Submit PR with variant file only

PR Requirements

  • ✅ Variant has YAML frontmatter with metadata
  • ✅ Variant passes core test suite (≥85% pass rate)
  • ✅ Results documented in agent README
  • ✅ Agent file unchanged (unless updating default)
  • ✅ No default.md files in prompts directory
  • ✅ CI validation passes
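The pass-rate gate could, in principle, be checked straight from latest.json; here is an illustrative sketch using grep and awk rather than the project's actual CI script (it assumes the results layout shown under Understanding Results):

```shell
# Exit non-zero when the "pass_rate" in a results file is below 0.85.
pass_rate_ok() {
  # Pull the numeric value after "pass_rate": out of the JSON.
  rate=$(grep -o '"pass_rate":[^,}]*' "$1" | head -1 | cut -d: -f2 | tr -d ' ')
  # Compare as a number; awk exits 0 only when the gate is met.
  awk -v r="$rate" 'BEGIN { exit (r >= 0.85 ? 0 : 1) }'
}
```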

Validation

# Validate your variant
cd evals/framework
npm run eval:sdk -- --agent=openagent --prompt-variant=your-variant --suite=core-tests

# Ensure PR uses default
./scripts/prompts/validate-pr.sh

🎓 Design Principles

1. Agent Files are Canonical Defaults

  • Agent files (.opencode/agent/*.md) are the source of truth
  • Tested and production-ready
  • Optimized for Claude (primary model)
  • Modified through normal PR process

2. Variants are Model-Specific Optimizations

  • Stored in .opencode/prompts/<agent>/<model>.md
  • Optimized for specific models/use cases
  • May have different trade-offs
  • Results documented transparently

3. Results are Tracked

  • Every test run tracked per variant
  • Dashboard shows variant performance
  • Easy to compare variants

4. Easy to Test

  • One command to test any variant
  • Automatic model detection
  • Results saved automatically

5. Safe to Experiment

  • Variants don't affect default
  • Easy to switch and restore
  • Test before committing

📚 Related Documentation


🆘 Troubleshooting

Variant Not Found

# List available variants
ls .opencode/prompts/openagent/*.md

# Check variant exists
npm run eval:sdk -- --agent=openagent --prompt-variant=your-variant --suite=smoke-test

Tests Failing

  1. Check variant metadata (YAML frontmatter)
  2. Verify recommended model is available
  3. Run with debug flag: --debug
  4. Check results in dashboard

Model Not Available

# Check available models
# For Ollama: ollama list
# For OpenRouter: check openrouter.ai/models

# Specify model explicitly
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --model=ollama/llama3.2

💡 Tips

  • Start with smoke-test - Fast validation (1 test, ~30s)
  • Use core-tests for thorough testing - 7 tests, ~5-8min
  • Check dashboard regularly - Visual feedback on variant performance
  • Document your findings - Help others by sharing results
  • Test with multiple models - Same variant may perform differently

🔮 Future Enhancements

  • Automated variant comparison reports
  • Performance benchmarking across variants
  • Variant recommendation based on model
  • Historical trend analysis
  • A/B testing framework

Questions? See openagent/navigation.md or open an issue.