# Prompt Library System

**Multi-model prompt variants with integrated evaluation framework for testing and validation.**

---

## 🎯 Quick Start

### Testing a Prompt Variant

```bash
# Test with eval framework (recommended)
cd evals/framework
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --suite=smoke-test

# Test with specific model
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --model=ollama/llama3.2 --suite=core-tests

# View results
open ../results/index.html
```

### Using a Variant Permanently

```bash
# Switch to a variant
./scripts/prompts/use-prompt.sh --agent=openagent --variant=llama

# Restore default (canonical agent file)
./scripts/prompts/use-prompt.sh --agent=openagent --variant=default
```

---

## 📁 Structure

```
.opencode/
├── agent/                    # Canonical agent prompts (defaults)
│   ├── openagent.md          # OpenAgent default (Claude-optimized)
│   └── opencoder.md          # OpenCoder default
└── prompts/                  # Model-specific variants
    ├── navigation.md         # This file
    ├── openagent/            # OpenAgent variants
    │   ├── gpt.md            # GPT-4 optimized
    │   ├── gemini.md         # Gemini optimized
    │   ├── grok.md           # Grok optimized
    │   ├── llama.md          # Llama/OSS optimized
    │   ├── TEMPLATE.md       # Template for new variants
    │   ├── navigation.md     # Variant documentation
    │   └── results/          # Per-variant test results
    │       ├── default-results.json   # Default (agent file) results
    │       ├── gpt-results.json
    │       ├── gemini-results.json
    │       └── llama-results.json
    └── opencoder/            # OpenCoder variants
        └── ...
```

**Architecture:**

- **Agent files** (`.opencode/agent/*.md`) = Canonical defaults (source of truth)
- **Prompt variants** (`.opencode/prompts/{agent}/{variant}.md`) = Model-specific optimizations
- **Results** are always saved to `.opencode/prompts/{agent}/results/` (including for the default)

---

## 🧪 Evaluation Framework Integration

### Running Tests with Variants

The eval framework automatically:

- ✅ Switches to the specified variant
- ✅ Runs your test suite
- ✅ Tracks results per variant
- ✅ Restores the default prompt after tests

```bash
# Smoke test (1 test, ~30s)
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --suite=smoke-test

# Core suite (7 tests, ~5-8min)
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --suite=core-tests

# Custom test pattern
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --pattern="01-critical-rules/**/*.yaml"
```

### Auto-Model Detection

Variants specify recommended models in their YAML frontmatter:

```yaml
---
model_family: llama
recommended_models:
  - ollama/llama3.2
  - ollama/qwen2.5
---
```

If you don't specify `--model`, the framework uses the first recommended model.

### Results Tracking

Results are saved in two locations:

1. **Main results:** `evals/results/latest.json` (includes a `prompt_variant` field)
2. **Per-variant:** `.opencode/prompts/{agent}/results/{variant}-results.json`

View them in the dashboard: `evals/results/index.html` (filter by variant)

---

## 📝 Creating a New Variant

### Step 1: Copy the Template

```bash
cp .opencode/prompts/openagent/TEMPLATE.md .opencode/prompts/openagent/my-variant.md
```

### Step 2: Edit the Metadata

```yaml
---
model_family: oss
recommended_models:
  - ollama/my-model
status: experimental
maintainer: your-name
description: Optimized for my specific use case
---
```

### Step 3: Customize the Prompt

Edit the prompt content below the frontmatter for your target model.
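Putting Steps 2 and 3 together: a variant file is YAML frontmatter plus a prompt body. As a rough illustration of the auto-model detection described above, here is a minimal, hypothetical sketch of pulling the first recommended model out of that frontmatter (`parseVariantFrontmatter` is an assumed helper for illustration, not the framework's actual code):

```javascript
// Hypothetical sketch: extract variant metadata from YAML frontmatter,
// roughly as an eval runner's auto-model detection might do.
// Not the framework's real API -- a simplified parser for this shape only.
function parseVariantFrontmatter(fileText) {
  const match = fileText.match(/^---\n([\s\S]*?)\n---/);
  if (!match) return null; // no frontmatter block at the top of the file

  const meta = { recommended_models: [] };
  for (const line of match[1].split("\n")) {
    const kv = line.match(/^(\w+):\s*(.*)$/);       // e.g. "model_family: llama"
    const item = line.match(/^\s*-\s*(.+)$/);       // e.g. "  - ollama/llama3.2"
    if (kv && kv[2]) meta[kv[1]] = kv[2];
    else if (item) meta.recommended_models.push(item[1]);
  }
  return meta;
}

const variant = [
  "---",
  "model_family: llama",
  "recommended_models:",
  "  - ollama/llama3.2",
  "  - ollama/qwen2.5",
  "---",
  "Prompt body...",
].join("\n");

const meta = parseVariantFrontmatter(variant);
console.log(meta.model_family);          // "llama"
console.log(meta.recommended_models[0]); // "ollama/llama3.2" -- used when --model is omitted
```

The real framework presumably uses a full YAML parser; the point is only that the metadata is machine-readable, so tooling can select a sensible default model per variant.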
### Step 4: Validate

```bash
# Validate that the variant exists and its metadata is correct
cd evals/framework
npm run eval:sdk -- --agent=openagent --prompt-variant=my-variant --suite=smoke-test
```

### Step 5: Test Thoroughly

```bash
# Run the core test suite
npm run eval:sdk -- --agent=openagent --prompt-variant=my-variant --suite=core-tests

# Check results
open ../results/index.html
```

### Step 6: Document Results

Update `.opencode/prompts/openagent/navigation.md` with:

- Test results (pass rate, timing)
- Known issues or limitations
- Recommended use cases

---

## 🎯 Available Variants

### OpenAgent

| Variant | Model Family | Status | Best For |
|---------|--------------|--------|----------|
| `default` | Claude | ✅ Stable | Production use, Claude models |
| `gpt` | GPT | ✅ Stable | GPT-4, GPT-4o |
| `gemini` | Gemini | ✅ Stable | Gemini 2.0, Gemini Pro |
| `grok` | Grok | ✅ Stable | Grok models (free tier) |
| `llama` | Llama/OSS | ✅ Stable | Llama, Qwen, DeepSeek, other OSS |

See [openagent/navigation.md](openagent/navigation.md) for detailed test results.

### OpenCoder

Coming soon.

---

## 🔧 Advanced Usage

### Custom Test Suites

Create custom test suites for your variant:

```bash
# Create a suite
cp evals/agents/openagent/config/smoke-test.json \
   evals/agents/openagent/config/my-suite.json

# Edit the suite (add your tests)

# Validate the suite
cd evals/framework && npm run validate:suites openagent

# Run with your variant
npm run eval:sdk -- --agent=openagent --prompt-variant=my-variant --suite=my-suite
```

See [evals/TEST_SUITE_VALIDATION.md](../../evals/TEST_SUITE_VALIDATION.md) for details.
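Step 4's smoke test is the authoritative check, but a quick local pre-flight can catch missing metadata before any eval spins up. A minimal sketch, assuming the fields shown in Step 2 are required (this checker and its field list are hypothetical, not part of the framework):

```javascript
// Hypothetical pre-flight check (not framework code): verify that a new
// variant file carries the frontmatter fields shown in Step 2.
const REQUIRED_FIELDS = ["model_family", "recommended_models", "status"];

function validateVariantText(text) {
  const fm = text.match(/^---\n([\s\S]*?)\n---/);
  if (!fm) return ["missing YAML frontmatter block"];

  const errors = [];
  for (const field of REQUIRED_FIELDS) {
    // Look for "field:" at the start of a line inside the frontmatter
    if (!new RegExp(`^${field}:`, "m").test(fm[1])) {
      errors.push(`missing required field: ${field}`);
    }
  }
  return errors;
}

const ok = [
  "---",
  "model_family: oss",
  "recommended_models:",
  "  - ollama/my-model",
  "status: experimental",
  "---",
  "Prompt...",
].join("\n");
const bad = "No frontmatter here";

console.log(validateVariantText(ok));  // no errors
console.log(validateVariantText(bad)); // one error: missing frontmatter
```

A check like this is cheap enough to wire into a pre-commit hook, leaving the smoke test to verify behavior rather than file shape.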
### Comparing Models

Test the same variant with different models:

```bash
# Test Llama 3.2
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --model=ollama/llama3.2 --suite=core-tests

# Test Qwen 2.5
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --model=ollama/qwen2.5 --suite=core-tests

# Compare results in the dashboard
open evals/results/index.html
```

---

## 📊 Understanding Results

### Dashboard Features

The results dashboard (`evals/results/index.html`) shows:

- ✅ Filter by prompt variant
- ✅ Filter by model
- ✅ Pass/fail rates per variant
- ✅ Test execution times
- ✅ Detailed test results

### Result Files

**Main results** (`evals/results/latest.json`):

```json
{
  "meta": {
    "agent": "openagent",
    "model": "ollama/llama3.2",
    "prompt_variant": "llama",
    "model_family": "llama"
  },
  "summary": {
    "total": 7,
    "passed": 7,
    "failed": 0,
    "pass_rate": 1
  }
}
```

**Per-variant results** (`.opencode/prompts/openagent/results/llama-results.json`):

- Track all test runs for this variant
- Show trends over time
- Help identify regressions

---

## 🚀 For Contributors

### Creating a Variant for a PR

1. **Create your variant** in `.opencode/prompts/{agent}/{variant}.md`
2. **Test thoroughly** with the eval framework
3. **Document results** in the agent README
4. **Submit a PR** with the variant file only

### PR Requirements

- ✅ Variant has YAML frontmatter with metadata
- ✅ Variant passes the core test suite (≥85% pass rate)
- ✅ Results documented in the agent README
- ✅ Agent file unchanged (unless updating the default)
- ✅ No `default.md` files in the prompts directory
- ✅ CI validation passes

### Validation

```bash
# Validate your variant
cd evals/framework
npm run eval:sdk -- --agent=openagent --prompt-variant=your-variant --suite=core-tests

# Ensure the PR leaves the default prompt in place
./scripts/prompts/validate-pr.sh
```

---

## 🎓 Design Principles

### 1. Agent Files Are Canonical Defaults

- Agent files (`.opencode/agent/*.md`) are the source of truth
- Tested and production-ready
- Optimized for Claude (the primary model)
- Modified through the normal PR process

### 2. Variants Are Model-Specific Optimizations

- Stored in `.opencode/prompts/{agent}/{variant}.md`
- Optimized for specific models/use cases
- May have different trade-offs
- Results documented transparently

### 3. Results Are Tracked

- Every test run is tracked per variant
- The dashboard shows variant performance
- Easy to compare variants

### 4. Easy to Test

- One command to test any variant
- Automatic model detection
- Results saved automatically

### 5. Safe to Experiment

- Variants don't affect the default
- Easy to switch and restore
- Test before committing

---

## 📚 Related Documentation

- [Eval Framework Guide](../../evals/EVAL_FRAMEWORK_GUIDE.md) - How to run tests
- [Test Suite Validation](../../evals/TEST_SUITE_VALIDATION.md) - Creating test suites
- [OpenAgent Variants](openagent/navigation.md) - OpenAgent-specific docs
- [Contributing Guide](../../docs/contributing/CONTRIBUTING.md) - Contribution guidelines

---

## 🆘 Troubleshooting

### Variant Not Found

```bash
# List available variants
ls .opencode/prompts/openagent/*.md

# Check that the variant runs
npm run eval:sdk -- --agent=openagent --prompt-variant=your-variant --suite=smoke-test
```

### Tests Failing

1. Check the variant metadata (YAML frontmatter)
2. Verify the recommended model is available
3. Run with the debug flag: `--debug`
4. Check results in the dashboard

### Model Not Available

```bash
# Check available models
# For Ollama:
ollama list
# For OpenRouter: check openrouter.ai/models

# Specify the model explicitly
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --model=ollama/llama3.2
```

---

## 💡 Tips

- **Start with `smoke-test`** - Fast validation (1 test, ~30s)
- **Use `core-tests` for thorough testing** - 7 tests, ~5-8min
- **Check the dashboard regularly** - Visual feedback on variant performance
- **Document your findings** - Help others by sharing results
- **Test with multiple models** - The same variant may perform differently on each

---

## 🔮 Future Enhancements

- [ ] Automated variant comparison reports
- [ ] Performance benchmarking across variants
- [ ] Variant recommendation based on model
- [ ] Historical trend analysis
- [ ] A/B testing framework

---

**Questions?** See [openagent/navigation.md](openagent/navigation.md) or open an issue.