Research-backed agent creation following Anthropic's 2025 best practices.
This command system helps you create production-ready OpenCode agents with:
```bash
# Interactive agent creation
/create-agent my-agent-name

# Or specify in the prompt
"Create a new agent called 'python-dev' for Python development"
```

```bash
# Generate 8 comprehensive tests for an existing agent
/create-tests my-agent-name
```
Finding: "Most coding tasks involve fewer truly parallelizable tasks than research" (Anthropic 2025)
Why this matters:
Application:
Finding: "Find the smallest possible set of high-signal tokens that maximize likelihood of desired outcome"
The Balance:

| Too Vague | Right Altitude ✅ | Too Rigid |
|-----------|------------------|-----------|
| "Write good code" | Clear heuristics + examples | 50-line prompt with edge cases |
| Fails to guide behavior | Flexible but specific | Brittle, hard to maintain |
Application:
Finding: "Agents discover context layer by layer. File metadata guides behavior. Prevents drowning in irrelevant information"
Context Management Layers:
Why this beats pre-loading:
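The layering above can be sketched as a lazy loader: the agent starts with cheap file metadata and pays the token cost of full contents only for files it actually decides to open. A minimal illustration (the `ContextStore` class is hypothetical, not part of OpenCode):

```python
import os

class ContextStore:
    """Hypothetical just-in-time context loader: metadata first, content on demand."""

    def __init__(self, root):
        self.root = root
        self._cache = {}  # path -> content, loaded lazily

    def metadata(self):
        """Layer 1: cheap, high-signal tokens (names and sizes only)."""
        entries = []
        for name in sorted(os.listdir(self.root)):
            path = os.path.join(self.root, name)
            if os.path.isfile(path):
                entries.append({"path": name, "bytes": os.path.getsize(path)})
        return entries

    def load(self, name):
        """Layer 2: full content, fetched only when the agent asks for it."""
        if name not in self._cache:
            with open(os.path.join(self.root, name)) as f:
                self._cache[name] = f.read()
        return self._cache[name]
```

The agent scans `metadata()` to decide which file matters, then calls `load()` for that file alone instead of pre-loading the whole tree.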
Finding: Anthropic's Claude Code uses this in production
Create a project context file automatically loaded into every session:
```markdown
# Project Context

## Bash Commands
- npm run test: Run unit tests
- npm run lint: Check code style
- npm run typecheck: Check TypeScript

## Code Style
- Use ES modules (import/export)
- Destructure imports when possible
- Use async/await, not callbacks

## Common Files & Patterns
- API handlers in src/handlers/
- Business logic in src/logic/
- Tests mirror source structure

## Workflow Rules
- Always run typecheck before committing
- Don't modify test files when writing implementation
- Use git history to understand WHY, not WHAT
```
Benefits:
Finding: "Tool ambiguity is one of the biggest failure modes"
Bad tool design:

```yaml
tool: "search_code"
description: "search code"  # Ambiguous!
```

Good tool design:

```yaml
tool: "read_file"
purpose: "Load a specific file for analysis or modification"
when_to_use: "You need to examine or edit a file"
when_not_to_use: "You already have the file content in context"
```
Key principle: If a human engineer can't definitively say which tool to use, neither can the agent.
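One way to enforce this principle mechanically is to require every tool registration to carry explicit `purpose`, `when_to_use`, and `when_not_to_use` fields and fail fast when any is missing. A minimal sketch (the `ToolSpec` dataclass and `register` function are hypothetical, not OpenCode APIs):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolSpec:
    name: str
    purpose: str
    when_to_use: str
    when_not_to_use: str

REGISTRY = {}

def register(spec):
    """Reject ambiguous tools: every field must be filled in, names must be unique."""
    if not all([spec.purpose, spec.when_to_use, spec.when_not_to_use]):
        raise ValueError(f"tool '{spec.name}' is under-specified")
    if spec.name in REGISTRY:
        raise ValueError(f"tool '{spec.name}' already registered")
    REGISTRY[spec.name] = spec

register(ToolSpec(
    name="read_file",
    purpose="Load a specific file for analysis or modification",
    when_to_use="You need to examine or edit a file",
    when_not_to_use="You already have the file content in context",
))
```

A bare `description: "search code"` entry would be rejected at registration time instead of confusing the agent at run time.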
Finding: "Improved instruction-following and reasoning efficiency for complex decomposition"
Before jumping to code, trigger extended thinking:
```
"Think about how to approach this problem. What files need to change?
What are the dependencies? What should we test?"
```
Phrases mapped to thinking budget:
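Anthropic's Claude Code best-practices post describes an escalating ladder of trigger phrases ("think" < "think hard" < "think harder" < "ultrathink"). The sketch below maps phrases to budgets; the ladder follows that post, but the token numbers are illustrative, not documented values:

```python
# Illustrative mapping of trigger phrases to thinking-token budgets.
# Phrase ladder per Anthropic's Claude Code best practices; the budget
# numbers are made up for this sketch.
THINKING_BUDGETS = {
    "think": 4_000,
    "think hard": 10_000,
    "think harder": 20_000,
    "ultrathink": 32_000,
}

def thinking_budget(prompt):
    """Return the budget for the strongest trigger phrase in the prompt.

    Uses max() because weaker phrases are substrings of stronger ones
    ("think" appears inside "think hard").
    """
    best = 0
    for phrase, budget in THINKING_BUDGETS.items():
        if phrase in prompt.lower():
            best = max(best, budget)
    return best
```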
Finding: "Parallel tool calling cut research time by up to 90% for complex queries"
Design workflows where agent can call multiple tools simultaneously:
Can do in parallel:
NOT in parallel (sequential):
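The split can be sketched with asyncio: independent reads fan out in one concurrent batch, while a verification step stays sequential because it depends on the work before it. The tool functions here are stand-ins, not real OpenCode tools:

```python
import asyncio

async def read_file(path):
    """Stand-in for a read tool; independent reads are safe to run in parallel."""
    await asyncio.sleep(0.01)  # simulated I/O latency
    return f"contents of {path}"

async def run_tests():
    """Stand-in for a test run; must wait for the edits it verifies."""
    await asyncio.sleep(0.01)
    return "tests passed"

async def agent_step():
    # Parallel: three independent reads issued as one batch of tool calls.
    contents = await asyncio.gather(
        read_file("src/handlers/api.ts"),
        read_file("src/logic/core.ts"),
        read_file("package.json"),
    )
    # Sequential: verification only makes sense after the prior steps finish.
    result = await run_tests()
    return contents, result

contents, result = asyncio.run(agent_step())
```

With real network-bound tool calls, the batched reads take roughly the latency of the slowest call instead of the sum of all three.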
Finding: "Token usage explains 80% of performance variance. Number of tool calls ~10%. Model choice ~10%"
What to measure:
Application:
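If token usage explains most of the performance variance, it is worth logging per episode. A minimal tracker sketch (hypothetical; real counts would come from the model API's usage metadata, not hand-entered numbers):

```python
class TokenTracker:
    """Accumulate per-category token counts across one agent episode."""

    def __init__(self):
        self.usage = {}

    def record(self, category, tokens):
        self.usage[category] = self.usage.get(category, 0) + tokens

    def report(self):
        """Total tokens plus each category's share of the episode."""
        total = sum(self.usage.values())
        share = {k: round(v / total, 2) for k, v in self.usage.items()} if total else {}
        return {"total": total, "share": share}

# Example episode with illustrative numbers.
tracker = TokenTracker()
tracker.record("system_prompt", 500)
tracker.record("context_files", 1500)
tracker.record("tool_results", 2000)
```

Comparing `report()` across runs shows where the budget actually goes (here, tool results dominate), which is more actionable than counting tool calls.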
```markdown
---
description: "{one-line purpose}"
mode: primary
temperature: 0.1-0.7
tools:
  read: true
  write: true
  edit: true
  bash: true
  glob: true
  grep: true
---
# {Agent Name}

<role>
{Clear, concise role - what this agent does}
</role>

<approach>
1. {First step - usually read/understand}
2. {Second step - usually think/plan}
3. {Third step - usually implement/execute}
4. {Fourth step - usually verify/test}
5. {Fifth step - usually complete/handoff}
</approach>

<heuristics>
- {Key heuristic 1 - how to approach problems}
- {Key heuristic 2 - when to use tools}
- {Key heuristic 3 - how to verify work}
- {Key heuristic 4 - when to stop/report}
</heuristics>

<output>
Always include:
- What you did
- Why you did it that way
- {Domain-specific output requirement}
</output>

<examples>
<example name="{Canonical Use Case}">
**User**: "{typical request}"
**Agent**:
1. {Step 1 with tool usage}
2. {Step 2 with reasoning}
3. {Step 3 with output}
**Result**: {Expected outcome}
</example>
</examples>
```
Every agent gets 8 comprehensive tests:
Based on failure modes found in production:
Don't:
Do:
When you create a new agent, the system generates:
```
.opencode/agent/{agent-name}.md
└─ Minimal system prompt (~500 tokens)

.opencode/context/project/{agent-name}-context.md
└─ Project context (CLAUDE.md pattern)

evals/agents/{agent-name}/
├─ config/
│  └─ config.yaml
└─ tests/
   ├─ planning/
   │  └─ planning-approval-001.yaml
   ├─ context-loading/
   │  └─ context-before-code-001.yaml
   ├─ implementation/
   │  ├─ incremental-001.yaml
   │  ├─ tool-usage-001.yaml
   │  └─ extended-thinking-001.yaml
   ├─ error-handling/
   │  └─ stop-on-failure-001.yaml
   ├─ long-horizon/
   │  └─ compaction-001.yaml
   └─ completion/
      └─ handoff-001.yaml

registry.json (updated)
```
User: "Create a new agent for Python development with testing and linting"
System creates:
- Agent: python-dev
- System prompt: ~500 tokens
- Tools: read, write, edit, bash, glob, grep
- Context file: Python-specific commands and patterns
- 8 comprehensive tests
User: "Create an agent for API endpoint testing"
System creates:
- Agent: api-tester
- System prompt: ~500 tokens
- Tools: read, bash, glob, grep (no write/edit)
- Context file: API testing patterns and commands
- 8 comprehensive tests
```bash
# Run all tests for an agent
cd evals/framework
npm test -- --agent=my-agent-name

# Run a specific category
npm test -- --agent=my-agent-name --category=planning

# Run a single test
npm test -- --agent=my-agent-name --test=planning-approval-001
```
- Anthropic Multi-Agent Research (Sept-Dec 2025)
- Context Engineering Best Practices (Sept 2025)
- Claude Code Production Patterns
For questions or issues, see:
- .opencode/agent/core/openagent.md
- .opencode/agent/core/opencoder.md
- .opencode/agent/development/frontend-specialist.md
- .opencode/agent/content/copywriter.md
- evals/agents/openagent/tests/
- docs/agents/research-backed-prompt-design.md