Multi-model prompt variants with integrated evaluation framework for testing and validation.
```bash
# Test with eval framework (recommended)
cd evals/framework
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --suite=smoke-test

# Test with specific model
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --model=ollama/llama3.2 --suite=core-tests

# View results
open ../results/index.html
```

```bash
# Switch to a variant
./scripts/prompts/use-prompt.sh --agent=openagent --variant=llama

# Restore the default (canonical agent file)
./scripts/prompts/use-prompt.sh --agent=openagent --variant=default
```
```
.opencode/
├── agent/                      # Canonical agent prompts (defaults)
│   ├── openagent.md            # OpenAgent default (Claude-optimized)
│   └── opencoder.md            # OpenCoder default
└── prompts/                    # Model-specific variants
    ├── navigation.md           # This file
    ├── openagent/              # OpenAgent variants
    │   ├── gpt.md              # GPT-4 optimized
    │   ├── gemini.md           # Gemini optimized
    │   ├── grok.md             # Grok optimized
    │   ├── llama.md            # Llama/OSS optimized
    │   ├── TEMPLATE.md         # Template for new variants
    │   ├── navigation.md       # Variant documentation
    │   └── results/            # Per-variant test results
    │       ├── default-results.json   # Default (agent file) results
    │       ├── gpt-results.json
    │       ├── gemini-results.json
    │       └── llama-results.json
    └── opencoder/              # OpenCoder variants
        └── ...
```
Architecture:

- Agent files (`.opencode/agent/*.md`) = canonical defaults (the source of truth)
- Variants (`.opencode/prompts/<agent>/<model>.md`) = model-specific optimizations
- Results are written to `.opencode/prompts/<agent>/results/` (including the default)

The eval framework handles variant loading and per-variant result tracking automatically.
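The resolution rule above (the default comes from the canonical agent file, everything else from the prompts directory) can be sketched as follows. `resolvePromptPath` is a hypothetical helper for illustration, not the framework's actual API.

```typescript
// Sketch of variant resolution, based on the repository layout described above.
// resolvePromptPath is illustrative only, not part of the eval framework.
function resolvePromptPath(agent: string, variant: string): string {
  if (variant === "default") {
    // The canonical agent file is the source of truth.
    return `.opencode/agent/${agent}.md`;
  }
  // Model-specific variant lives under the prompts directory.
  return `.opencode/prompts/${agent}/${variant}.md`;
}

console.log(resolvePromptPath("openagent", "default")); // .opencode/agent/openagent.md
console.log(resolvePromptPath("openagent", "llama"));   // .opencode/prompts/openagent/llama.md
```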
```bash
# Smoke test (1 test, ~30s)
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --suite=smoke-test

# Core suite (7 tests, ~5-8min)
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --suite=core-tests

# Custom test pattern
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --pattern="01-critical-rules/**/*.yaml"
```
Variants specify recommended models in their YAML frontmatter:
```yaml
---
model_family: llama
recommended_models:
  - ollama/llama3.2
  - ollama/qwen2.5
---
```
If you don't specify `--model`, the framework uses the first recommended model.
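That fallback can be sketched as below. The hand-rolled frontmatter parsing is for illustration only; the real framework presumably uses a proper YAML parser, and `firstRecommendedModel` is a hypothetical name.

```typescript
// Minimal sketch: pick the default model from a variant's YAML frontmatter
// when --model is not given. Illustrative parsing, not the framework's code.
function firstRecommendedModel(variantFile: string): string | undefined {
  const match = variantFile.match(/^---\n([\s\S]*?)\n---/);
  if (!match) return undefined;
  const lines = match[1].split("\n");
  const start = lines.findIndex((l) => l.trim() === "recommended_models:");
  if (start === -1) return undefined;
  const item = lines[start + 1]?.trim();
  // The first "- <model>" entry under recommended_models wins.
  return item?.startsWith("- ") ? item.slice(2).trim() : undefined;
}

const variant = `---
model_family: llama
recommended_models:
  - ollama/llama3.2
  - ollama/qwen2.5
---
Prompt text...`;

console.log(firstRecommendedModel(variant)); // ollama/llama3.2
```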
Results are saved in two locations:

- `evals/results/latest.json` (includes a `prompt_variant` field)
- `.opencode/prompts/{agent}/results/{variant}-results.json`

View them in the dashboard: `evals/results/index.html` (filter by variant).
```bash
cp .opencode/prompts/openagent/TEMPLATE.md .opencode/prompts/openagent/my-variant.md
```
```yaml
---
model_family: oss
recommended_models:
  - ollama/my-model
status: experimental
maintainer: your-name
description: Optimized for my specific use case
---
```
Edit the prompt content below the frontmatter for your target model.
```bash
# Validate the variant exists and metadata is correct
cd evals/framework
npm run eval:sdk -- --agent=openagent --prompt-variant=my-variant --suite=smoke-test

# Run core test suite
npm run eval:sdk -- --agent=openagent --prompt-variant=my-variant --suite=core-tests

# Check results
open ../results/index.html
```
Update `.opencode/prompts/openagent/navigation.md` with:

| Variant | Model Family | Status | Best For |
|---|---|---|---|
| `default` | Claude | ✅ Stable | Production use, Claude models |
| `gpt` | GPT | ✅ Stable | GPT-4, GPT-4o |
| `gemini` | Gemini | ✅ Stable | Gemini 2.0, Gemini Pro |
| `grok` | Grok | ✅ Stable | Grok models (free tier) |
| `llama` | Llama/OSS | ✅ Stable | Llama, Qwen, DeepSeek, other OSS |
See `openagent/navigation.md` for detailed test results.
Coming soon.
Create custom test suites for your variant:
```bash
# Create suite
cp evals/agents/openagent/config/smoke-test.json \
   evals/agents/openagent/config/my-suite.json

# Edit suite (add your tests)

# Validate suite
cd evals/framework && npm run validate:suites openagent

# Run with your variant
npm run eval:sdk -- --agent=openagent --prompt-variant=my-variant --suite=my-suite
```
See `evals/TEST_SUITE_VALIDATION.md` for details.
Test the same variant with different models:
```bash
# Test Llama 3.2
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --model=ollama/llama3.2 --suite=core-tests

# Test Qwen 2.5
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --model=ollama/qwen2.5 --suite=core-tests

# Compare results in dashboard
open evals/results/index.html
```
The results dashboard (`evals/results/index.html`) lets you filter and compare results by variant and model.
Main results (`evals/results/latest.json`):
```json
{
  "meta": {
    "agent": "openagent",
    "model": "ollama/llama3.2",
    "prompt_variant": "llama",
    "model_family": "llama"
  },
  "summary": {
    "total": 7,
    "passed": 7,
    "failed": 0,
    "pass_rate": 1
  }
}
```
Per-variant results are written to `.opencode/prompts/openagent/results/llama-results.json` in the same format.
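With that schema, comparing variants is straightforward. A minimal sketch, assuming the `meta`/`summary` fields shown above; the data is inlined sample objects here, whereas in practice you would read the JSON files from the results directory.

```typescript
// Sketch: rank prompt variants by pass rate using the result schema above.
// Sample data is inlined for illustration; real runs read the results files.
interface VariantResult {
  meta: { prompt_variant: string; model: string };
  summary: { total: number; passed: number; failed: number; pass_rate: number };
}

function rankVariants(results: VariantResult[]): string[] {
  return [...results]
    .sort((a, b) => b.summary.pass_rate - a.summary.pass_rate)
    .map((r) => `${r.meta.prompt_variant}: ${(r.summary.pass_rate * 100).toFixed(0)}%`);
}

const sample: VariantResult[] = [
  { meta: { prompt_variant: "llama", model: "ollama/llama3.2" },
    summary: { total: 7, passed: 7, failed: 0, pass_rate: 1 } },
  { meta: { prompt_variant: "gpt", model: "gpt-4o" },
    summary: { total: 7, passed: 6, failed: 1, pass_rate: 0.857 } },
];

console.log(rankVariants(sample)); // [ 'llama: 100%', 'gpt: 86%' ]
```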
- Keep variants in `.opencode/prompts/<agent>/<model>.md`
- Do not add `default.md` files to the prompts directory

```bash
# Validate your variant
cd evals/framework
npm run eval:sdk -- --agent=openagent --prompt-variant=your-variant --suite=core-tests

# Ensure the PR uses the default
./scripts/prompts/validate-pr.sh
```
- Agent files (`.opencode/agent/*.md`) are the source of truth
- Variants live in `.opencode/prompts/<agent>/<model>.md`

```bash
# List available variants
ls .opencode/prompts/openagent/*.md

# Check that the variant exists
npm run eval:sdk -- --agent=openagent --prompt-variant=your-variant --suite=smoke-test
```
Run with `--debug` for verbose output.

```bash
# Check available models
# For Ollama: ollama list
# For OpenRouter: check openrouter.ai/models

# Specify the model explicitly
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --model=ollama/llama3.2
```
Questions? See `openagent/navigation.md` or open an issue.