# New Agent Creation System
**Research-backed agent creation following Anthropic 2025 best practices**
## Overview
This command system helps you create production-ready OpenCode agents with:
- ✅ **Minimal prompts** (~500 tokens at "right altitude")
- ✅ **Single agent + tools** (not multi-agent for coding)
- ✅ **Just-in-time context** (loaded on demand, not pre-loaded)
- ✅ **Clear tool definitions** (purpose, when to use, when not to use)
- ✅ **Comprehensive testing** (8 essential test types)
## Quick Start
### Create a New Agent
```bash
# Interactive agent creation
/create-agent my-agent-name
# Or specify in prompt
"Create a new agent called 'python-dev' for Python development"
```
### Generate Test Suite
```bash
# Generate 8 comprehensive tests for existing agent
/create-tests my-agent-name
```
## Research-Backed Principles
### 1. Single Agent + Tools > Multi-Agent for Coding
**Finding**: "Most coding tasks involve fewer truly parallelizable tasks than research" (Anthropic 2025)
**Why this matters**:
- Code changes are deeply dependent on each other
- Sub-agents can't coordinate edits to the same file
- Agents waste time duplicating work
- Multi-agent excels at research (90.2% improvement) because searches are independent
- Code is sequential
**Application**:
- Use ONE lead agent with tool-based sub-functions
- NOT autonomous sub-agents for coding
- Multi-agent only for truly independent tasks:
- Static analysis (no coordination needed)
- Test execution
- Code search/retrieval
- NOT for: refactoring, architecture decisions, multi-file changes
### 2. Right Altitude: Minimal Prompts
**Finding**: "Find the smallest possible set of high-signal tokens that maximize likelihood of desired outcome"
**The Balance**:
| Too Vague | Right Altitude ✅ | Too Rigid |
|-----------|------------------|-----------|
| "Write good code" | Clear heuristics + examples | 50-line prompt with edge cases |
| Fails to guide behavior | Flexible but specific | Brittle, hard to maintain |
**Application**:
- System prompt: Minimal (~500 tokens)
- Clear heuristics, not exhaustive rules
- Examples > edge case lists
- Show ONE canonical example, not 20 scenarios
### 3. Just-in-Time Context Loading
**Finding**: "Agents discover context layer by layer. File metadata guides behavior. Prevents drowning in irrelevant information"
**Context Management Layers**:
1. **System prompt**: Minimal (~500 tokens). Clear heuristics, not exhaustive rules.
2. **Just-in-time retrieval**: Tools that agents call to load context on demand (file paths, not full content)
3. **Working memory**: Keep only what's needed for the current task
**Why this beats pre-loading**:
- Agents discover context layer by layer
- File metadata (size, name, timestamps) guide behavior
- Prevents "drowning in irrelevant information"
### 4. CLAUDE.md Pattern
**Finding**: Anthropic's Claude Code uses this in production
**Create a project context file** automatically loaded into every session:
```markdown
# Project Context
## Bash Commands
- npm run test: Run unit tests
- npm run lint: Check code style
- npm run typecheck: Check TypeScript
## Code Style
- Use ES modules (import/export)
- Destructure imports when possible
- Use async/await, not callbacks
## Common Files & Patterns
- API handlers in src/handlers/
- Business logic in src/logic/
- Tests mirror source structure
## Workflow Rules
- Always run typecheck before committing
- Don't modify test files when writing implementation
- Use git history to understand WHY, not WHAT
```
**Benefits**:
- Eliminates repetitive context-loading
- Shared across team (check into git)
- Tuned like any prompt (run through prompt improvers)
### 5. Tool Clarity
**Finding**: "Tool ambiguity is one of the biggest failure modes"
**Bad tool design**:
```markdown
tool: "search_code"
description: "search code" # Ambiguous!
```
**Good tool design**:
```markdown
tool: "read_file"
purpose: "Load a specific file for analysis or modification"
when_to_use: "You need to examine or edit a file"
when_not_to_use: "You already have the file content in context"
```
**Key principle**: If a human engineer can't definitively say which tool to use, neither can the agent.
### 6. Extended Thinking for Decomposition
**Finding**: "Improved instruction-following and reasoning efficiency for complex decomposition"
**Before jumping to code, trigger extended thinking**:
```
"Think about how to approach this problem. What files need to change?
What are the dependencies? What should we test?"
```
**Phrases mapped to thinking budget**:
- "think" = basic
- "think hard" = 2x budget
- "think harder" = 3x budget
- "ultrathink" = maximum
### 7. Parallel Tool Calling
**Finding**: "Parallel tool calling cut research time by up to 90% for complex queries"
**Design workflows where agent can call multiple tools simultaneously**:
**Can do in parallel**:
- Run linter
- Execute tests
- Check type errors
**NOT in parallel** (sequential):
- Apply fix, then test
### 8. Outcome-Focused Evaluation
**Finding**: "Token usage explains 80% of performance variance. Number of tool calls ~10%. Model choice ~10%"
**What to measure**:
- ✅ Does it solve the task?
- ✅ Token usage reasonable?
- ✅ Tool calls appropriate?
- ❌ NOT: "Did it follow exact steps I imagined?"
**Application**:
- Optimize for using enough tokens to solve the problem
- Don't minimize tool calls (some redundancy is fine)
- Evaluate on real failure cases, not synthetic tests
## Agent Structure
### Minimal System Prompt Template (~500 tokens)
```markdown
---
description: "{one-line purpose}"
mode: primary
temperature: 0.1-0.7
tools:
read: true
write: true
edit: true
bash: true
glob: true
grep: true
---
# {Agent Name}
{Clear, concise role - what this agent does}
1. {First step - usually read/understand}
2. {Second step - usually think/plan}
3. {Third step - usually implement/execute}
4. {Fourth step - usually verify/test}
5. {Fifth step - usually complete/handoff}
- {Key heuristic 1 - how to approach problems}
- {Key heuristic 2 - when to use tools}
- {Key heuristic 3 - how to verify work}
- {Key heuristic 4 - when to stop/report}
**User**: "{typical request}"
**Agent**:
1. {Step 1 with tool usage}
2. {Step 2 with reasoning}
3. {Step 3 with output}
**Result**: {Expected outcome}
```
## Test Suite (8 Essential Tests)
Every agent gets 8 comprehensive tests:
1. **Planning & Approval** - Verify plan-first approach
2. **Context Loading** - Ensure just-in-time context retrieval
3. **Incremental Implementation** - Verify step-by-step execution
4. **Tool Usage** - Check correct tool selection and usage
5. **Error Handling** - Verify stop-on-failure behavior
6. **Extended Thinking** - Check decomposition before coding
7. **Compaction** - Verify summarization for long sessions
8. **Completion** - Check proper output and handoff
## What NOT to Do
Based on failure modes found in production:
**Don't**:
- ❌ Create sub-agents for dependent tasks (code is sequential)
- ❌ Pre-load entire codebase into context (use just-in-time retrieval)
- ❌ Write exhaustive edge case lists in prompts (brittle, hard to maintain)
- ❌ Give vague tool descriptions (major failure mode)
- ❌ Use multi-agent if you could use single agent + tools
- ❌ Hardcode complex logic in prompts (use tools instead)
- ❌ Minimize tool calls (some redundancy is fine)
**Do**:
- ✅ Let agents discover context via tools
- ✅ Use examples instead of rules
- ✅ Keep system prompt minimal (~500 tokens)
- ✅ Be explicit about effort budgets ("3-5 tool calls, not 50")
- ✅ Evaluate on real failure cases, not synthetic tests
- ✅ Measure outcomes: Does it solve the task?
## Files Created
When you create a new agent, the system generates:
```
.opencode/agent/{agent-name}.md
└─ Minimal system prompt (~500 tokens)
.opencode/context/project/{agent-name}-context.md
└─ Project context (CLAUDE.md pattern)
evals/agents/{agent-name}/
├─ config/
│ └─ config.yaml
└─ tests/
├─ planning/
│ └─ planning-approval-001.yaml
├─ context-loading/
│ └─ context-before-code-001.yaml
├─ implementation/
│ ├─ incremental-001.yaml
│ ├─ tool-usage-001.yaml
│ └─ extended-thinking-001.yaml
├─ error-handling/
│ └─ stop-on-failure-001.yaml
├─ long-horizon/
│ └─ compaction-001.yaml
└─ completion/
└─ handoff-001.yaml
registry.json (updated)
```
## Usage Examples
### Example 1: Create Python Development Agent
```bash
User: "Create a new agent for Python development with testing and linting"
System creates:
- Agent: python-dev
- System prompt: ~500 tokens
- Tools: read, write, edit, bash, glob, grep
- Context file: Python-specific commands and patterns
- 8 comprehensive tests
```
### Example 2: Create API Testing Agent
```bash
User: "Create an agent for API endpoint testing"
System creates:
- Agent: api-tester
- System prompt: ~500 tokens
- Tools: read, bash, glob, grep (no write/edit)
- Context file: API testing patterns and commands
- 8 comprehensive tests
```
## Running Tests
```bash
# Run all tests for an agent
cd evals/framework
npm test -- --agent=my-agent-name
# Run specific category
npm test -- --agent=my-agent-name --category=planning
# Run single test
npm test -- --agent=my-agent-name --test=planning-approval-001
```
## Iteration and Improvement
1. **Test with real use cases** (not just synthetic tests)
2. **Measure outcomes**: Does it solve the task?
3. **Iterate based on actual failures** (not imagined edge cases)
4. **Update status** to "stable" when proven in production
## Research References
- **Anthropic Multi-Agent Research** (Sept-Dec 2025)
- Single agent + tools > multi-agent for coding
- Token usage explains 80% of performance variance
- **Context Engineering Best Practices** (Sept 2025)
- "Find the smallest possible set of high-signal tokens"
- Just-in-time retrieval beats pre-loading
- **Claude Code Production Patterns**
- CLAUDE.md pattern for project context
- Extended thinking for complex decomposition
- Compaction for long-horizon tasks
## Support
For questions or issues:
1. Check existing agents:
- Core agents: `.opencode/agent/core/openagent.md`, `.opencode/agent/core/opencoder.md`
- Development agents: `.opencode/agent/development/frontend-specialist.md`
- Content agents: `.opencode/agent/content/copywriter.md`
2. Review test examples: `evals/agents/openagent/tests/`
3. See research docs: `docs/agents/research-backed-prompt-design.md`