
Prompt Library System

Multi-model prompt variants with an integrated evaluation framework for testing, validation, and continuous improvement.

Last Updated: 2025-12-08 | Status: ✅ Production Ready


📋 Quick Links


Overview

The Prompt Library System maintains model-specific prompt variants for an agent, and tests and validates each variant through the integrated evaluation framework.

Key Features

  • ✅ Multi-Model Support - Variants for Claude, GPT-4, Gemini, Grok, Llama/OSS
  • ✅ Integrated Testing - Test variants with the eval framework
  • ✅ Results Tracking - Per-variant and per-model results
  • ✅ Easy Switching - Switch between variants with one command
  • ✅ Validation - JSON Schema + TypeScript validation
  • ✅ Dashboard - Visual results with variant filtering

Quick Start

# Test a variant
cd evals/framework
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --suite=smoke-test

# View results
open ../results/index.html

System Status

Completed Features:

  • ✅ Prompt variant management (PromptManager)
  • ✅ Evaluation framework integration (--prompt-variant flag)
  • ✅ Results tracking (dual save: main + per-variant)
  • ✅ Dashboard filtering (variant badges and filters)
  • ✅ Test suite validation (JSON Schema + Zod)
  • ✅ CLI validation tool
  • ✅ GitHub Actions workflow
  • ✅ Comprehensive documentation

Tested & Working:

  • ✅ All 5 variants (default, gpt, gemini, grok, llama)
  • ✅ Smoke test suite (1 test)
  • ✅ Core test suite (7 tests)
  • ✅ Grok model integration
  • ✅ Results dashboard
  • ✅ Suite validation

Documentation

See the comprehensive documentation files:

  1. Main Prompts README
    • Quick start guide
    • Creating variants
    • Testing workflow
    • Advanced usage

  2. OpenAgent Variants README
    • Capabilities matrix
    • Variant details
    • Test results
    • Best practices

  3. Eval Framework Guide
    • How tests work
    • Running tests
    • Understanding results

  4. Test Suite Validation
    • Creating test suites
    • Validation system
    • JSON Schema reference

  5. Validation Quick Reference
    • Quick commands
    • Common fixes
    • Troubleshooting

Architecture

Components

┌──────────────────────────────────────────────────────────────────┐
│                      Prompt Library System                       │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐      ┌────────────────┐      ┌──────────────┐  │
│  │   Variants   │─────▶│ Eval Framework │─────▶│  Dashboard   │  │
│  │  (.md files) │      │ (Test Runner)  │      │  (Results)   │  │
│  └──────────────┘      └────────────────┘      └──────────────┘  │
│         │                      │                      │          │
│         │                      │                      │          │
│  ┌──────▼───────┐      ┌───────▼────────┐      ┌──────▼───────┐  │
│  │   Metadata   │      │  Test Suites   │      │   Results    │  │
│  │ (YAML Front) │      │  (JSON files)  │      │ (JSON files) │  │
│  └──────────────┘      └────────────────┘      └──────────────┘  │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

Key Files

Prompt Variants:

  • .opencode/prompts/core/openagent/*.md - Variant files
  • .opencode/prompts/core/openagent/results/*.json - Per-variant results

Test Suites:

  • evals/agents/openagent/config/*.json - Suite definitions
  • evals/agents/openagent/config/suite-schema.json - JSON Schema

Framework:

  • evals/framework/src/sdk/prompt-manager.ts - Prompt switching
  • evals/framework/src/sdk/suite-validator.ts - Suite validation
  • evals/framework/src/sdk/run-sdk-tests.ts - Test runner

Results:

  • evals/results/latest.json - Main results
  • evals/results/index.html - Dashboard

Usage Examples

Testing Variants

# Run these from evals/framework (as in Quick Start)

# Quick smoke test (1 test, ~30s)
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --suite=smoke-test

# Core test suite (7 tests, ~5-8min)
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --suite=core-tests

# With specific model
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --model=ollama/llama3.2 --suite=core-tests

# Custom test pattern
npm run eval:sdk -- --agent=openagent --prompt-variant=llama --pattern="01-critical-rules/**/*.yaml"
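
To run the same suite against several variants, the command line can be assembled programmatically. The following is only a sketch: `EvalOptions` and `buildEvalArgs` are hypothetical names that are not part of the framework, and the only flags emitted are the ones that appear in the commands above.

```typescript
// Hypothetical helper (not part of the framework): assembles the arguments
// for `npm run eval:sdk` using only flags shown in this document.
interface EvalOptions {
  agent: string;
  promptVariant: string;
  suite?: string;   // e.g. "smoke-test" or "core-tests"
  model?: string;   // e.g. "ollama/llama3.2"
  pattern?: string; // e.g. "01-critical-rules/**/*.yaml"
}

function buildEvalArgs(opts: EvalOptions): string[] {
  const args = [
    "run",
    "eval:sdk",
    "--",
    `--agent=${opts.agent}`,
    `--prompt-variant=${opts.promptVariant}`,
  ];
  if (opts.model) args.push(`--model=${opts.model}`);
  if (opts.suite) args.push(`--suite=${opts.suite}`);
  if (opts.pattern) args.push(`--pattern=${opts.pattern}`);
  return args;
}

// One command per variant named under "Tested & Working".
const variants = ["default", "gpt", "gemini", "grok", "llama"];
const commands = variants.map((v) =>
  ["npm", ...buildEvalArgs({ agent: "openagent", promptVariant: v, suite: "smoke-test" })].join(" ")
);
console.log(commands.join("\n"));
```

Keeping the arguments as an array rather than a single string means a glob passed via `--pattern` does not need shell quoting if the array is handed directly to a process spawner.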

Creating Variants

# 1. Copy template (from the repository root)
cp .opencode/prompts/core/openagent/TEMPLATE.md .opencode/prompts/core/openagent/my-variant.md

# 2. Edit metadata and content
# 3. Test (from evals/framework, as in Quick Start)
cd evals/framework
npm run eval:sdk -- --agent=openagent --prompt-variant=my-variant --suite=smoke-test

# 4. Validate the test suites
npm run validate:suites openagent

Creating Test Suites

# 1. Copy existing suite
cp evals/agents/openagent/config/smoke-test.json \
   evals/agents/openagent/config/my-suite.json

# 2. Edit suite
# 3. Validate
cd evals/framework && npm run validate:suites openagent

# 4. Run
npm run eval:sdk -- --agent=openagent --suite=my-suite

API Reference

PromptManager

class PromptManager {
  constructor(projectRoot: string);
  variantExists(agent: string, variant: string): boolean;
  listVariants(agent: string): string[];
  readMetadata(agent: string, variant: string): PromptMetadata;
  switchToVariant(agent: string, variant: string): SwitchResult;
  restoreDefault(agent: string): boolean;
}
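
Because `switchToVariant` changes which prompt is active, callers will usually want `restoreDefault` to run even when a test fails. The sketch below shows one way to guarantee that with `try`/`finally`. It is an illustration under stated assumptions: `PromptSwitcher`, `withVariant`, and `FakePromptManager` are hypothetical names, `SwitchResult` is treated as opaque because its fields are not listed here, and the fake is an in-memory stand-in rather than the real class.

```typescript
// The three PromptManager methods this sketch relies on. The return type of
// switchToVariant is left as `unknown` because SwitchResult is not documented here.
interface PromptSwitcher {
  variantExists(agent: string, variant: string): boolean;
  switchToVariant(agent: string, variant: string): unknown;
  restoreDefault(agent: string): boolean;
}

// Hypothetical helper: run `task` with a variant active, then always restore the default.
function withVariant<T>(
  manager: PromptSwitcher,
  agent: string,
  variant: string,
  task: () => T
): T {
  if (!manager.variantExists(agent, variant)) {
    throw new Error(`Unknown variant "${variant}" for agent "${agent}"`);
  }
  manager.switchToVariant(agent, variant);
  try {
    return task();
  } finally {
    manager.restoreDefault(agent); // runs even if the task throws
  }
}

// In-memory stand-in so the sketch runs without the real framework.
class FakePromptManager implements PromptSwitcher {
  active = "default";
  private readonly variants: string[];
  constructor(variants: string[]) {
    this.variants = variants;
  }
  variantExists(_agent: string, variant: string): boolean {
    return this.variants.indexOf(variant) !== -1;
  }
  switchToVariant(_agent: string, variant: string): unknown {
    this.active = variant;
    return undefined;
  }
  restoreDefault(_agent: string): boolean {
    this.active = "default";
    return true;
  }
}

const fake = new FakePromptManager(["default", "llama"]);
const activeDuringTask = withVariant(fake, "openagent", "llama", () => fake.active);
console.log(activeDuringTask, fake.active);
```

The helper is synchronous to keep the sketch short. If the task returns a promise, the helper would need to be `async` and `await` the task inside the `try` block, otherwise the default is restored before the task finishes.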

SuiteValidator

class SuiteValidator {
  constructor(agentsDir: string);
  loadSuite(agent: string, suiteName: string): TestSuite;
  validateSuite(agent: string, suite: TestSuite): ValidationResult;
  getTestPaths(agent: string, suite: TestSuite): string[];
}
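
The three `SuiteValidator` methods suggest a load, validate, resolve sequence. The sketch below illustrates that sequence only. The fields of `ValidationResult` and `TestSuite` are not listed in this document, so `AssumedValidationResult` (with `valid` and `errors`) and `FakeSuite` are assumptions, and `SuiteChecker`, `resolveSuitePaths`, and `fakeChecker` are hypothetical names with made-up data.

```typescript
// Assumed shape: this document names ValidationResult but does not list its fields.
interface AssumedValidationResult {
  valid: boolean;
  errors: string[];
}

// The three SuiteValidator methods listed above, generic over the suite type
// because the fields of TestSuite are not documented here.
interface SuiteChecker<S> {
  loadSuite(agent: string, suiteName: string): S;
  validateSuite(agent: string, suite: S): AssumedValidationResult;
  getTestPaths(agent: string, suite: S): string[];
}

// Hypothetical helper: load, validate, and only then resolve test paths.
function resolveSuitePaths<S>(
  checker: SuiteChecker<S>,
  agent: string,
  suiteName: string
): string[] {
  const suite = checker.loadSuite(agent, suiteName);
  const result = checker.validateSuite(agent, suite);
  if (!result.valid) {
    throw new Error(`Suite "${suiteName}" is invalid: ${result.errors.join("; ")}`);
  }
  return checker.getTestPaths(agent, suite);
}

// In-memory stand-in with a made-up suite shape, so the sketch runs on its own.
type FakeSuite = { name: string; tests: string[] };
const fakeChecker: SuiteChecker<FakeSuite> = {
  loadSuite: (_agent, suiteName) => ({
    name: suiteName,
    tests: suiteName === "empty" ? [] : ["example-test.yaml"],
  }),
  validateSuite: (_agent, suite) =>
    suite.tests.length > 0
      ? { valid: true, errors: [] }
      : { valid: false, errors: ["suite lists no tests"] },
  getTestPaths: (agent, suite) => suite.tests.map((t) => `${agent}/${t}`),
};

console.log(resolveSuitePaths(fakeChecker, "openagent", "smoke-test"));
```

Failing before path resolution keeps an invalid suite from reaching the test runner, which is the same role the `validate:suites` command plays on the command line.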

Test Results

All five variants were tested with the core test suite (7 tests). Every run used the same model, so the table compares prompts rather than models:

| Variant | Pass Rate  | Model Tested            | Status    |
|---------|------------|-------------------------|-----------|
| default | 7/7 (100%) | opencode/grok-code-fast | ✅ Stable |
| gpt     | 7/7 (100%) | opencode/grok-code-fast | ✅ Stable |
| gemini  | 7/7 (100%) | opencode/grok-code-fast | ✅ Stable |
| grok    | 7/7 (100%) | opencode/grok-code-fast | ✅ Stable |
| llama   | 7/7 (100%) | opencode/grok-code-fast | ✅ Stable |
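
The Pass Rate column is the number of passing tests over the number of tests run, with a rounded percentage. The sketch below shows that arithmetic and a per-variant grouping. It is an illustration only: `TestOutcome`, `formatPassRate`, and `summarize` are hypothetical names, and the layout of the real results JSON files is not described in this document.

```typescript
// Hypothetical record: one pass/fail outcome per test in a suite run.
interface TestOutcome {
  variant: string;
  passed: boolean;
}

// Formats a pass rate in the style of the Pass Rate column, e.g. "7/7 (100%)".
function formatPassRate(passed: number, total: number): string {
  if (total === 0) return "0/0 (n/a)";
  return `${passed}/${total} (${Math.round((passed / total) * 100)}%)`;
}

// Groups outcomes by variant and returns one formatted rate per variant.
function summarize(outcomes: TestOutcome[]): Map<string, string> {
  const counts = new Map<string, { passed: number; total: number }>();
  for (const o of outcomes) {
    const c = counts.get(o.variant) || { passed: 0, total: 0 };
    c.total += 1;
    if (o.passed) c.passed += 1;
    counts.set(o.variant, c);
  }
  const summary = new Map<string, string>();
  counts.forEach((c, variant) => summary.set(variant, formatPassRate(c.passed, c.total)));
  return summary;
}

// Illustrative input only, not real results.
const sample: TestOutcome[] = [
  { variant: "example-a", passed: true },
  { variant: "example-a", passed: false },
  { variant: "example-b", passed: true },
];
summarize(sample).forEach((rate, variant) => console.log(variant, rate));
```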

Future Enhancements

  • Automated variant comparison reports
  • Performance benchmarking across variants
  • Variant recommendation based on model
  • Historical trend analysis
  • A/B testing framework
  • Automated regression detection

Related Documentation


Questions? Open an issue or see the main README.