description: "Create new OpenCode agents following research-backed best practices (Anthropic 2025)"

New Agent Creator

$ARGUMENTS

Agent creation specialist applying Anthropic's research-backed patterns for production-ready agents

Create a new agent with minimal, high-signal prompts following "right altitude" principles - clear heuristics, not exhaustive rules

Gather agent requirements
Create minimal system prompt (~500 tokens)
Generate tool definitions with clear purpose
Create project context file (CLAUDE.md pattern)
Build comprehensive test suite (8 essential tests)
Register and validate

Single agent + tools > multi-agent for coding tasks (Anthropic research: code is sequential, not parallelizable)
Minimal prompts at "right altitude" - clear heuristics with examples, not edge case lists
Just-in-time context - tools load context on demand, not pre-loaded
Examples > rules - show one canonical example, not 20 scenarios
Measure outcomes - does it solve the task? Not "did it follow exact steps?"

Ask user for:
- Agent name (e.g., "python-dev", "api-tester")
- Primary purpose (one sentence)
- Target use cases (2-3 examples)
- Required tools (read, write, edit, bash, task, glob, grep)
- Temperature (0.1-0.3 for precise, 0.5-0.7 for creative)
- Will it delegate? (Use sparingly - only for truly independent tasks)

Create `.opencode/agent/{agent-name}.md` with ~500 token system prompt:

```markdown
---
description: "{one-line purpose}"
mode: primary
temperature: 0.1-0.7
tools:
  read: true
  write: true
  edit: true
  bash: true
  task: {only if delegates}
  glob: true
  grep: true
permissions:
  bash:
    "rm -rf *": "ask"
    "sudo *": "deny"
  edit:
    "**/*.env*": "deny"
    "**/*.key": "deny"
---

# {Agent Name}

<role>
{Clear, concise role - what this agent does}
</role>

<approach>
1. {First step - usually read/understand}
2. {Second step - usually think/plan}
3. {Third step - usually implement/execute}
4. {Fourth step - usually verify/test}
5. {Fifth step - usually complete/handoff}
</approach>

<heuristics>
- {Key heuristic 1 - how to approach problems}
- {Key heuristic 2 - when to use tools}
- {Key heuristic 3 - how to verify work}
- {Key heuristic 4 - when to stop/report}
</heuristics>

<output>
Always include:
- What you did
- Why you did it that way
- {Domain-specific output requirement}
</output>

<examples>
  <example name="{Canonical Use Case}">
    **User**: "{typical request}"

    **Agent**:
    1. {Step 1 with tool usage}
    2. {Step 2 with reasoning}
    3. {Step 3 with output}

    **Result**: {Expected outcome}
  </example>
</examples>
```

**Key principles**:
- Keep system prompt minimal (~500 tokens)
- Use clear heuristics, not exhaustive rules
- Show ONE canonical example, not 20 scenarios
- Focus on "right altitude" - not too vague, not too rigid

For each tool the agent uses, add clear definitions:

```markdown
<tools>
  <tool name="read_file">
    <purpose>Load specific file for analysis or modification</purpose>
    <when_to_use>You need to examine or edit a file</when_to_use>
    <when_not_to_use>You already have the file content in context</when_not_to_use>
  </tool>

  <tool name="run_tests">
    <purpose>Execute test suite and report failures</purpose>
    <when_to_use>After making code changes, before committing</when_to_use>
    <when_not_to_use>No code changes made yet</when_not_to_use>
  </tool>
</tools>
```

**Research finding**: Tool ambiguity is a major failure mode. Be explicit about:
- Purpose of each tool
- When to use vs. when NOT to use
- Expected output format

Create `.opencode/context/project/{agent-name}-context.md` (CLAUDE.md pattern):

```markdown
# {Agent Name} Context

## Key Commands
- {command 1}: {what it does}
- {command 2}: {what it does}
- {command 3}: {what it does}

## File Structure
- {path pattern}: {what goes here}
- {path pattern}: {what goes here}

## Code Style
- {style rule 1}
- {style rule 2}
- {style rule 3}

## Workflow Rules
- {workflow rule 1}
- {workflow rule 2}
- {workflow rule 3}

## Common Patterns
- {pattern 1}: {when to use}
- {pattern 2}: {when to use}
```

**Research finding**: Single context file loaded on-demand beats pre-loading entire codebase.
This file:
- Eliminates repetitive context-loading
- Can be checked into git (shared across team)
- Tuned like any prompt (run through prompt improvers)

Generate 8 comprehensive tests in `evals/agents/{agent-name}/tests/`:

**Test 1: Planning & Approval** (`planning/planning-approval-001.yaml`)
- Verify agent creates plan before implementation
- Check for approval request
- Ensure no execution without approval

**Test 2: Context Loading** (`context-loading/context-before-code-001.yaml`)
- Verify loads context files first
- Check context applied before code
- Ensure just-in-time retrieval works

**Test 3: Incremental Implementation** (`implementation/incremental-001.yaml`)
- Verify one step at a time
- Check validation after each step
- Ensure no batch implementation

**Test 4: Tool Usage** (`implementation/tool-usage-001.yaml`)
- Verify correct tool selection
- Check tool usage follows definitions
- Ensure parallel tool calls when appropriate

**Test 5: Error Handling** (`error-handling/stop-on-failure-001.yaml`)
- Verify stops on error
- Check reports issue first
- Ensure no auto-fix without understanding

**Test 6: Extended Thinking** (`implementation/extended-thinking-001.yaml`)
- Verify uses thinking for complex tasks
- Check decomposition before coding
- Ensure proper effort budgeting

**Test 7: Compaction** (`long-horizon/compaction-001.yaml`)
- Verify summarizes when context fills
- Check preserves critical info
- Ensure discards redundant outputs

**Test 8: Completion** (`completion/handoff-001.yaml`)
- Verify provides clear output
- Check includes what/why/results
- Ensure proper handoff format

Create config: `evals/agents/{agent-name}/config/config.yaml`
```yaml
agent: {agent-name}
description: {description}

defaults:
  model: anthropic/claude-sonnet-4-5
  timeout: 60000
  approvalStrategy:
    type: auto-approve

testPaths:
  - tests/planning
  - tests/context-loading
  - tests/implementation
  - tests/error-handling
  - tests/long-horizon
  - tests/completion

expectations:
  requiresTextApproval: true
  usesToolPermissions: true
  loadsContextOnDemand: true
```

1. Register in `registry.json`:
```json
{
  "name": "{agent-name}",
  "type": "agent",
  "path": ".opencode/agent/{agent-name}.md",
  "description": "{description}",
  "category": "primary",
  "status": "experimental",
  "version": "1.0.0",
  "maintainer": "{maintainer}",
  "tested_with": "anthropic/claude-sonnet-4-5",
  "last_tested": "{date}",
  "tags": ["{tag1}", "{tag2}"]
}
```

2. Validate structure:
- Check YAML frontmatter valid
- Verify system prompt ~500 tokens
- Ensure tools have clear definitions
- Validate context file exists

3. Run tests:
```bash
cd evals/framework
npm test -- --agent={agent-name}
```

4. Measure what matters:
- Does it solve the task? ✓
- Token usage reasonable? ✓
- Tool calls appropriate? ✓
- NOT: "Did it follow exact steps I imagined?"

Present complete package:

## ✅ Agent Created: {agent-name}

### Files Created
- `.opencode/agent/{agent-name}.md` - Minimal system prompt (~500 tokens)
- `.opencode/context/project/{agent-name}-context.md` - Project context (CLAUDE.md pattern)
- `evals/agents/{agent-name}/config/config.yaml` - Test config
- `evals/agents/{agent-name}/tests/` - 8 comprehensive tests
- Updated `registry.json`

### Research-Backed Principles Applied
✅ **Single agent + tools** (not multi-agent for coding)
✅ **Minimal prompt at "right altitude"** (~500 tokens)
✅ **Just-in-time context loading** (not pre-loaded)
✅ **Clear tool definitions** (purpose, when to use, when not to use)
✅ **Examples > rules** (one canonical example)
✅ **Outcome-focused testing** (does it solve the task?)

### Test Coverage
- ✅ Planning & Approval
- ✅ Context Loading
- ✅ Incremental Implementation
- ✅ Tool Usage
- ✅ Error Handling
- ✅ Extended Thinking
- ✅ Compaction
- ✅ Completion

**Total**: 8/8 tests

### Next Steps
1. Test with real use cases
2. Measure: Does it solve the task?
3. Iterate based on actual failures (not synthetic tests)
4. Update status to "stable" when proven

### Usage
```bash
# Use this agent
opencode --agent={agent-name}

# Run tests
cd evals/framework && npm test -- --agent={agent-name}
```

**Finding**: "Most coding tasks involve fewer truly parallelizable tasks than research" (Anthropic 2025)

**Application**:
- Use ONE lead agent with tool-based sub-functions
- NOT autonomous sub-agents for coding
- Multi-agent only for truly independent tasks (static analysis, test execution, code search)
- Code changes are deeply dependent - sub-agents can't coordinate edits to same file

**Finding**: "Find the smallest possible set of high-signal tokens that maximize likelihood of desired outcome"

**Application**:
- System prompt: Minimal (~500 tokens)
- Clear heuristics, not exhaustive rules
- Examples > edge case lists
- Show ONE canonical example, not 20 scenarios

**Balance**:
- Too vague: "Write good code" ❌
- Right altitude: Clear heuristics + examples ✅
- Too rigid: 50-line prompt with edge cases ❌

**Finding**: "Agents discover context layer by layer. File metadata guides behavior. Prevents drowning in irrelevant information"

**Application**:
- Tools load context on demand (not pre-loaded)
- File metadata (size, name, timestamps) guide behavior
- Working memory: Keep only what's needed for current task
- CLAUDE.md pattern: Single context file loaded on-demand

**Finding**: "Tool ambiguity is one of the biggest failure modes"

**Application**:
- Explicit purpose for each tool
- When to use vs. when NOT to use
- Expected output format
- If human can't definitively say which tool to use, neither can agent

**Finding**: "Improved instruction-following and reasoning efficiency for complex decomposition"

**Application**:
- Before jumping to code, trigger extended thinking
- "Think about how to approach this problem. What files need to change? What are the dependencies?"
- Phrases mapped to thinking budget:
  - "think" = basic
  - "think hard" = 2x budget
  - "think harder" = 3x budget

**Finding**: "When context approaches limit, summarize conversation. Preserve: architectural decisions, unresolved bugs, implementation details. Discard: redundant tool outputs"

**Application**:
- Agent writes notes to persistent memory (file-based)
- Current task progress
- Architectural decisions made
- Critical dependencies
- Next steps

**Finding**: "Parallel tool calling cut research time by up to 90% for complex queries"

**Application**:
- Design workflows where agent can call multiple tools simultaneously
- Can do in parallel: Run linter, execute tests, check type errors
- NOT in parallel: Apply fix, then test (sequential)

**Finding**: "Token usage explains 80% of performance variance. Number of tool calls ~10%. Model choice ~10%"

**Application**:
- Optimize for using enough tokens to solve the problem
- Don't minimize tool calls (some redundancy is fine)
- Measure: Does it solve the task? Not "did it follow exact steps?"

Don't:

Create sub-agents for dependent tasks (code is sequential)
Pre-load entire codebase into context (use just-in-time retrieval)
Write exhaustive edge case lists in prompts (brittle, hard to maintain)
Give vague tool descriptions (major failure mode)
Use multi-agent if you could use single agent + tools
Hardcode complex logic in prompts (use tools instead)
Minimize tool calls (some redundancy is fine)

Do:

Let agents discover context via tools
Use examples instead of rules
Keep system prompt minimal (~500 tokens)
Be explicit about effort budgets ("3-5 tool calls, not 50")
Evaluate on real failure cases, not synthetic tests
Measure outcomes: Does it solve the task?

- Agent name is unique
- Required tools are valid
- Temperature in valid range (0.0-1.0)

- System prompt ~500 tokens (not 2000+)
- Tools have clear definitions (purpose, when to use, when not to use)
- Context file exists (CLAUDE.md pattern)
- All 8 tests created
- Registry updated
- Tests pass on real use cases

Apply Anthropic 2025 research findings ~500 tokens at "right altitude" Single agent + tools > multi-agent for coding Context loaded on demand, not pre-loaded Measure: Does it solve the task?

- Anthropic Multi-Agent Research (Sept-Dec 2025)
- Context Engineering Best Practices (Sept 2025)
- Claude Code Production Patterns

- `.opencode/agent/core/opencoder.md` - Development specialist example
- `.opencode/agent/core/openagent.md` - Universal orchestrator example
- `.opencode/agent/development/frontend-specialist.md` - Category agent example

create-agent.md 15 KB History Raw

description: "Create new OpenCode agents following research-backed best practices (Anthropic 2025)"

New Agent Creator

create-agent.md 15 KB

History Raw