Last verified: Dec 7, 2025
Source code verified: packages/opencode/src/session/system.ts, prompt.ts, transform.ts
Every time you send a message to OpenCode, it builds a context (like a brief for the AI). Think of it like preparing a sandwich:
🍞 Header: "You are Claude" (if using Anthropic) ~12 tokens
🥬 Base Prompt: big instructions (1,300-3,900 words) ~2,000 tokens
🧀 Environment: "You're in /Users/you/project, 50 files..." ~200 tokens
🥓 Your Rules: AGENTS.md, CLAUDE.md (your custom instructions) ~500 tokens
🍅 Tools: "You can read, write, edit..." (16 tools) ~6,600 tokens
🍞 Your Message: "Fix the bug in auth.ts" ~10 tokens
─────────────────────────
Total: ~9,322 tokens
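A quick sanity check of that arithmetic. The component sizes are the rough estimates from the breakdown above, not exact measurements:

```typescript
// Approximate per-component token estimates (illustrative, from the list above)
const components: Record<string, number> = {
  header: 12,        // "You are Claude" (Anthropic only)
  basePrompt: 2000,  // model-specific base prompt
  environment: 200,  // working directory, file tree, date
  customRules: 500,  // AGENTS.md / CLAUDE.md
  tools: 6600,       // all 16 tool definitions
  userMessage: 10,   // "Fix the bug in auth.ts"
};

const total = Object.values(components).reduce((sum, n) => sum + n, 0);
console.log(total); // 9322
```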
Without Caching (Ollama, most models):
With Caching (Claude/Anthropic only):
The TUI shows total tokens INCLUDING cached reads, so high numbers are actually GOOD for Claude!
OpenCode caches specific messages automatically:
// From: packages/opencode/src/provider/transform.ts:23-63
const system = msgs.filter((msg) => msg.role === "system").slice(0, 2)
const final = msgs.filter((msg) => msg.role !== "system").slice(-2)
Translation: OpenCode marks these messages as cacheable:
Visual example:
Request 1: "Fix auth bug"
├── [System 1] Header + Base Prompt            [CACHEABLE ✅]
├── [System 2] Environment + Custom + Tools    [CACHEABLE ✅]
├── [User] "Fix auth bug"                      [NOT CACHED]
└── [Assistant] "Here's the fix..."            [NOT CACHED]
Request 2: "Add tests"
├── [System 1] Header + Base Prompt            [CACHE HIT! 💰]
├── [System 2] Environment + Custom + Tools    [CACHE HIT! 💰]
├── [User] "Fix auth bug"                      [CACHEABLE ✅]
├── [Assistant] "Here's the fix..."            [CACHEABLE ✅]
├── [User] "Add tests"                         [NOT CACHED]
└── [Assistant] "Here are the tests..."        [NOT CACHED]
Request 3: "Explain the tests"
├── [System 1] Header + Base Prompt            [CACHE HIT! 💰]
├── [System 2] Environment + Custom + Tools    [CACHE HIT! 💰]
├── ... (earlier messages truncated)
├── [User] "Add tests"                         [CACHE HIT! 💰]
├── [Assistant] "Here are the tests..."        [CACHE HIT! 💰]
├── [User] "Explain the tests"                 [NOT CACHED]
└── [Assistant] "The tests work by..."         [NOT CACHED]
Caching happens on the provider's servers (not locally):
Anthropic receives your request with special markers:
{
"messages": [
{
"role": "system",
"content": "You are Claude...",
"cache_control": { "type": "ephemeral" } // โ Cache marker
}
]
}
Anthropic computes a hash of the message content
Cache is stored on Anthropic's servers for your API key
OpenCode tracks cache status in message metadata:
{
"tokens": {
"input": 1200,
"cache": {
"read": 8500, // โ Anthropic says "I already have this"
"write": 0 // โ New content added to cache
}
}
}
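Given that metadata shape, the effective cost of a request can be computed directly. A sketch; the rates are illustrative Claude 3.5 Sonnet-style prices per million tokens (check your provider's current pricing, they are not authoritative here):

```typescript
interface TokenUsage {
  input: number;
  output: number;
  cache: { read: number; write: number };
}

// Illustrative $/1M-token rates (assumed, not official)
const RATES = { input: 3.0, output: 15.0, cacheRead: 0.3, cacheWrite: 3.75 };

function requestCost(t: TokenUsage): number {
  return (
    t.input * RATES.input +
    t.output * RATES.output +
    t.cache.read * RATES.cacheRead +
    t.cache.write * RATES.cacheWrite
  ) / 1_000_000;
}

const usage: TokenUsage = { input: 1200, output: 100, cache: { read: 8500, write: 0 } };
console.log(requestCost(usage)); // ≈ 0.00765
```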
Lifespan: 5 minutes of inactivity
0:00 - Request 1: Cache written (full price)
0:30 - Request 2: Cache hit! (10% price)
1:00 - Request 3: Cache hit! (10% price)
4:50 - Request 4: Cache hit! (10% price)
... (silence for 5 minutes)
10:00 - Request 5: Cache expired, rebuilt (full price)
10:30 - Request 6: Cache hit again! (10% price)
Auto-refresh: Every cache hit resets the 5-minute timer
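That timeline can be modeled as a simple sliding-window check: each request is a cache hit only if it arrives within the TTL of the previous request, and every request (hit or rebuild) refreshes the timer. A sketch:

```typescript
const TTL_SECONDS = 5 * 60; // 5-minute ephemeral cache lifespan

// Given request timestamps (seconds), report which ones hit the cache.
function cacheHits(times: number[], ttl = TTL_SECONDS): boolean[] {
  let lastActivity = -Infinity; // nothing cached before the first request
  return times.map((t) => {
    const hit = t - lastActivity <= ttl;
    lastActivity = t; // any request, hit or rebuild, resets the timer
    return hit;
  });
}

// Timeline from above: 0:00, 0:30, 1:00, 4:50, 10:00, 10:30
console.log(cacheHits([0, 30, 60, 290, 600, 630]));
// [false, true, true, true, false, true]
```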
Source: packages/opencode/src/provider/transform.ts:65-74
export function message(msgs: ModelMessage[], providerID: string, modelID: string) {
if (providerID === "anthropic" || modelID.includes("anthropic") || modelID.includes("claude")) {
msgs = applyCaching(msgs, providerID)
}
return msgs
}
Verified Provider Support:
| Provider | Models | Caching Support | Cache Format | Notes |
|---|---|---|---|---|
| Anthropic | Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus/Sonnet | ✅ Yes | `cacheControl: { type: "ephemeral" }` | Native support, best implementation |
| OpenRouter | When routing to Claude | ✅ Yes | `cache_control: { type: "ephemeral" }` | Only if backend is Anthropic |
| AWS Bedrock | Claude on Bedrock | ✅ Yes | `cachePoint: { type: "ephemeral" }` | AWS-specific format |
| OpenAI | GPT-4, GPT-4 Turbo, o1, o3, GPT-5 | ⚠️ Different | `promptCacheKey: sessionID` | Different system, not as effective |
| OpenCode API | Big Pickle | ✅ Yes | Routes through Anthropic | Backend uses Anthropic, so caching works |
| Ollama | All local models | ❌ No | N/A | Local models don't support caching |
| LM Studio | All local models | ❌ No | N/A | Local server, no cloud cache |
| Together AI | Qwen, Llama, etc. | ❌ No | N/A | No caching support |
| Google AI | Gemini 1.5, 2.0 | ❌ No | N/A | Not supported by provider |
| Azure OpenAI | GPT-4 on Azure | ⚠️ Varies | Depends on Azure config | Check your Azure setup |
Method 1: Check Token Breakdown
# View your session tokens
cat ~/.local/share/opencode/storage/message/ses_YOUR_ID/*.json | jq '.tokens'
# If you see this, caching is working:
{
"cache": {
"read": 8500 // โ Non-zero = cache hit!
}
}
# If you see this, no caching:
{
"cache": {
"read": 0 // โ Zero = no cache support
}
}
Method 2: TUI Display
Context
9,842 tokens ← If this stays HIGH but cost stays LOW, caching works!
Method 3: Cost Pattern
Request 1: $0.028 (first request)
Request 2: $0.004 (85% cheaper)
Request 3: $0.004 (still cheap)
If costs drop dramatically after first request = caching works!
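That pattern is easy to check programmatically. A rough heuristic; the 0.3 threshold is an arbitrary choice for this sketch, not an OpenCode constant:

```typescript
// Heuristic: caching is probably active if the second request costs
// dramatically less than the first (cache reads bill at ~10% of the input rate).
function cachingLikely(costs: number[], threshold = 0.3): boolean {
  if (costs.length < 2) return false;
  return costs[1] < costs[0] * threshold;
}

console.log(cachingLikely([0.028, 0.004, 0.004])); // true
console.log(cachingLikely([0.028, 0.027, 0.028])); // false
```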
Big Pickle Mystery Solved:
Your Setup:
├── You select: "Big Pickle" (Ollama model)
├── OpenCode CLI sends to: OpenCode API
└── OpenCode API routes to: Anthropic Claude API
         ↑
         Cache happens here!
That's why you see:
cache.read: 8500 tokens (from Anthropic)
It's not really Ollama - it's Claude with a different name!
For Anthropic/Claude:
For Non-Caching Models:
Source: packages/opencode/src/provider/transform.ts:23-63
function applyCaching(msgs: ModelMessage[], providerID: string): ModelMessage[] {
const system = msgs.filter((msg) => msg.role === "system").slice(0, 2)
const final = msgs.filter((msg) => msg.role !== "system").slice(-2)
const providerOptions = {
anthropic: { cacheControl: { type: "ephemeral" } },
openrouter: { cache_control: { type: "ephemeral" } },
bedrock: { cachePoint: { type: "ephemeral" } },
openaiCompatible: { cache_control: { type: "ephemeral" } },
}
// Apply cache markers to eligible messages
for (const msg of unique([...system, ...final])) {
msg.providerOptions = {
...msg.providerOptions,
...providerOptions[providerID]
}
}
return msgs
}
What this does:
graph TD
    A[You type message] --> B{OpenCode starts building context}
    B --> C[1. Add Header]
    C --> C1[Anthropic: 'You are Claude'<br/>Others: Nothing]
    B --> D[2. Add Base Prompt]
    D --> D1{Which model?}
    D1 -->|Claude| D2[anthropic.txt<br/>1,335 words]
    D1 -->|GPT-4| D3[beast.txt<br/>1,904 words]
    D1 -->|GPT-5| D4[codex.txt<br/>3,940 words]
    D1 -->|Gemini| D5[gemini.txt<br/>2,235 words]
    D1 -->|Others/Ollama/Big Pickle| D6[qwen.txt<br/>1,596 words]
    B --> E[3. Add Environment]
    E --> E1[Working directory<br/>Project tree<br/>Date & platform]
    B --> F[4. Search for Custom Instructions]
    F --> F1{Find local files?}
    F1 -->|Yes| F2[Load AGENTS.md<br/>or CLAUDE.md]
    F1 -->|No| F3[Check global<br/>~/.claude/CLAUDE.md]
    B --> G[5. Add Tool Definitions]
    G --> G1{Which tools enabled?}
    G1 -->|Agent config| G2[Load descriptions<br/>for enabled tools]
    G1 -->|All by default| G3[Load all 16 tools<br/>~6,600 tokens!]
    C1 & D2 & D3 & D4 & D5 & D6 & E1 & F2 & F3 & G2 & G3 --> H[6. Combine Everything]
    H --> I[7. Apply Caching]
    I --> I1{Anthropic/Claude?}
    I1 -->|Yes| I2[Mark first 2 system<br/>messages as cacheable]
    I1 -->|No| I3[No caching]
    I2 & I3 --> J[8. Add Your Message]
    J --> K[Send to AI Model]
    K --> L{First request?}
    L -->|Yes| M[Full cost:<br/>All tokens charged]
    L -->|No + Cached| N[Discounted:<br/>90% cached at 10% cost]
    M & N --> O[AI Responds]
    style A fill:#e1f5ff
    style K fill:#fff4e1
    style M fill:#ffe1e1
    style N fill:#e1ffe1
    style O fill:#f0e1ff
Think of OpenCode context like building a layer cake for the AI to "eat":
(Custom rules come from AGENTS.md, CLAUDE.md, or files listed in config.instructions.)
┌───────────────────────────────────────────┐
│ STATIC CONTENT (Same Every Request)       │ 8,000-10,000 tokens
├───────────────────────────────────────────┤
│ • Base Prompt        2,000 tokens         │ ← Huge!
│ • Tool Definitions   6,600 tokens         │ ← Wasteful if unused!
│ • Environment          200 tokens         │
│ • Custom Rules         500 tokens         │
└───────────────────────────────────────────┘
  ↑ This repeats EVERY request
┌───────────────────────────────────────────┐
│ DYNAMIC CONTENT (Changes Each Request)    │ 10-100 tokens
├───────────────────────────────────────────┤
│ • Your Message          15 tokens         │
└───────────────────────────────────────────┘
Without caching: You pay for all 8,000+ tokens every time!
With caching (Anthropic): You pay full price once, then 10% for the static parts!
Here's why you see different token counts for different models:
| Model | Base Prompt | Size | Why Different? |
|---|---|---|---|
| Claude | anthropic.txt | 1,736 tokens | Optimized for Claude's style |
| GPT-4 | beast.txt | 2,475 tokens | Detailed reasoning instructions |
| GPT-5 | codex.txt | 5,122 tokens | Advanced multi-step guidance |
| Gemini | gemini.txt | 2,906 tokens | Google-specific format |
| Ollama/Big Pickle | qwen.txt | 2,075 tokens | Open-source model format |
🚨 Key Point: Your Ollama model still gets a 2,075-token base prompt (qwen.txt), nearly as large as GPT-4's, even though it may have only a tiny 8k context window!
First Request (No Cache):
Request 1: "Hi"
├── Base Prompt: 2,000 tokens × $3.00/1M = $0.0060
├── Tools: 6,600 tokens × $3.00/1M = $0.0198
├── Environment: 200 tokens × $3.00/1M = $0.0006
├── Your Message: 10 tokens × $3.00/1M = $0.0000
└── AI Response: 100 tokens × $15.00/1M = $0.0015
Total: $0.0279
Second Request (With Cache):
Request 2: "Thanks"
├── Base Prompt: 2,000 tokens × $0.30/1M = $0.0006 (cached!)
├── Tools: 6,600 tokens × $0.30/1M = $0.0020 (cached!)
├── Environment: 200 tokens × $0.30/1M = $0.0001 (cached!)
├── Your Message: 10 tokens × $3.00/1M = $0.0000
└── AI Response: 100 tokens × $15.00/1M = $0.0015
Total: $0.0042
Savings: 85% cheaper!
Note: Cache expires after 5 minutes of inactivity, then rebuilds automatically.
Ollama Model: 8,000 token context limit
├── Base Prompt: 2,075 tokens (26%!) 😱
├── Tools: 6,600 tokens (82%!) 😱😱
└── Remaining for you: -675 tokens ← DOESN'T FIT!
Solution: Minimize everything (see optimization section below)
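The arithmetic behind that "doesn't fit" warning, as a sketch (the 8k window and component sizes are the assumed figures from above):

```typescript
const CONTEXT_WINDOW = 8000; // typical small local-model window (assumption)

// Static components for an unoptimized Ollama setup (estimates from above)
const staticContext = { basePrompt: 2075, tools: 6600 };

const used = Object.values(staticContext).reduce((a, b) => a + b, 0);
const remaining = CONTEXT_WINDOW - used;

console.log(`used ${used}, remaining ${remaining}`); // used 8675, remaining -675
```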
You can't easily change the 2,000+ token base prompt without editing source code.
Solution: Override with agent prompt: field (explained below)
All 16 tools = 6,600 tokens, even if you only need 3.
Solution: Explicitly disable unused tools (explained below)
Before diving into the technical details, here are the fastest ways to reduce context:
# Don't optimize - caching makes it cheap!
# Keep all tools and instructions
# .opencode/agent/ollama.md
---
description: "Ollama optimized"
prompt: "Code assistant" # ← Replaces the 2,075-token base prompt!
tools:
read: true
write: true
edit: true
# All others automatically false = Saves 5,900 tokens!
---
# Remove custom instructions
mv ~/.claude/CLAUDE.md ~/.claude/CLAUDE.md.disabled # Saves 200-2,000 tokens
# Result: 750 tokens instead of 8,000+ (91% reduction!)
Before diving into details, here's how to check what context YOUR agents are using:
# Run the token counting script
cd /path/to/your/opencode/checkout
./script/count-agent-tokens.sh AGENT_NAME MODEL_ID PROVIDER
# Examples:
./script/count-agent-tokens.sh build claude-sonnet-4 anthropic
./script/count-agent-tokens.sh ollama qwen2.5:latest ollama
./script/count-agent-tokens.sh your-custom-agent big-pickle opencode
# Output shows:
# - Base prompt tokens
# - Environment tokens
# - Custom instruction files found
# - Tool tokens
# - Total estimated tokens
When you run OpenCode in TUI mode:
opencode # Start TUI
# Top right shows:
Context
9,842 tokens ← Total tokens (includes cached!)
12% used     ← % of context window
$0.00 spent  ← Cost so far
🎯 Key Insight: The token count includes cache.read tokens, so a high number does not mean a high cost.
# Find your session ID in TUI (top of screen: "Session: ses_...")
SESSION_ID="ses_YOUR_SESSION_ID_HERE"
# View token breakdown
cat ~/.local/share/opencode/storage/message/$SESSION_ID/*.json | \
jq '.tokens'
# Example output:
{
"input": 1200, โ New tokens this request
"output": 150, โ AI response tokens
"reasoning": 0, โ Reasoning tokens (o1/o3 only)
"cache": {
"read": 8500, โ Reused from cache (cheap!)
"write": 0 โ New cache writes
}
}
# Calculate real cost:
# Cached: 8500 × $0.30/1M = $0.0026
# Input: 1200 × $3.00/1M = $0.0036
# Output: 150 × $15.00/1M = $0.0023
# Total: $0.0085 (much less than without cache!)
# Find custom instruction files being loaded
cd your-project
find . -name "AGENTS.md" -o -name "CLAUDE.md" -o -name "CONTEXT.md" 2>/dev/null
# Check global files
ls -la ~/.config/opencode/AGENTS.md 2>/dev/null
ls -la ~/.claude/CLAUDE.md 2>/dev/null
# Count words in custom files
wc -w .opencode/AGENTS.md ~/.claude/CLAUDE.md
# Estimate tokens (words × 1.3)
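The same words-times-1.3 estimate, in code. This is a crude heuristic; real token counts vary by tokenizer and model:

```typescript
// Crude token estimate: ~1.3 tokens per whitespace-separated word
function estimateTokens(text: string): number {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.round(words * 1.3);
}

console.log(estimateTokens("You are a coding assistant. Be concise.")); // 9
```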
# Check your opencode.json
cat opencode.json | jq '.tools'
# Or check agent config
cat .opencode/agent/your-agent.md | grep -A 20 "tools:"
# Count enabled tools:
# Each tool ≈ 200-1,800 tokens
# All 16 tools ≈ 6,600 tokens total
| Symptom | Likely Cause | Fix |
|---|---|---|
| 8,000+ tokens on "Hi" | Base prompt + all tools loaded | Use minimal agent, disable tools |
| Same tokens every request | No caching OR local model | Switch to Claude for caching |
| 9k cache + 1k input | Perfect! Caching working | Nothing, this is optimal! |
| Context 90% used (Ollama) | Too much context for small window | Minimize base prompt, disable tools |
| Can't fit full context | Project too large + tools + prompt | Reduce tool count, use agent override |
# 1. Create minimal agent
cat > .opencode/agent/test.md << 'EOF'
---
description: "Test minimal context"
prompt: "Code assistant"
tools:
read: true
---
EOF
# 2. Count tokens
./script/count-agent-tokens.sh test qwen2.5:latest ollama
# 3. Compare before/after
# Before: ~8,000 tokens
# After: ~400 tokens
# Savings: 95%!
# 4. Test in TUI
opencode --agent test
# Type: "hi"
# Check Context in top right
Now let's dive into the technical details...
Every request to the AI follows this exact sequence. Here's the verified code flow:
Source: packages/opencode/src/session/prompt.ts:492-512
async function resolveSystemPrompt(input: {
system?: string
agent: Agent.Info
providerID: string
modelID: string
}) {
let system = SystemPrompt.header(input.providerID) // Step 1
system.push(...(() => { // Step 2
if (input.system) return [input.system]
if (input.agent.prompt) return [input.agent.prompt]
return SystemPrompt.provider(input.modelID)
})())
system.push(...(await SystemPrompt.environment())) // Step 3
system.push(...(await SystemPrompt.custom())) // Step 4
// Combine into max 2 messages for caching
const [first, ...rest] = system
system = [first, rest.join("\n")]
return system
}
Source: packages/opencode/src/session/system.ts:20-23
export function header(providerID: string) {
if (providerID.includes("anthropic")) return [PROMPT_ANTHROPIC_SPOOF.trim()]
return []
}
What gets added:
Token cost:
Source: packages/opencode/src/session/system.ts:25-31
export function provider(modelID: string) {
if (modelID.includes("gpt-5")) return [PROMPT_CODEX]
if (modelID.includes("gpt-") || modelID.includes("o1") || modelID.includes("o3"))
return [PROMPT_BEAST]
if (modelID.includes("gemini-")) return [PROMPT_GEMINI]
if (modelID.includes("claude")) return [PROMPT_ANTHROPIC]
return [PROMPT_ANTHROPIC_WITHOUT_TODO] // Default fallback
}
Override Priority:
--system "custom" flag used โ Use thatprompt: field โ Use agent promptVerified Prompt Files & Token Counts:
| Model Pattern | File | Words | Approx Tokens | Used By |
|---|---|---|---|---|
| `gpt-5` | codex.txt | 3,940 | ~5,122 | GPT-5, o1-pro, o1-2024-12-17 |
| `gpt-*`, `o1`, `o3` | beast.txt | 1,904 | ~2,475 | GPT-4, o1, o3 |
| `gemini-` | gemini.txt | 2,235 | ~2,906 | Gemini models |
| `claude` | anthropic.txt | 1,335 | ~1,736 | Claude 3.5, 3, etc. |
| Default | qwen.txt | 1,596 | ~2,075 | Big Pickle, Ollama, DeepSeek, etc. |
🔥 Key Insight: Models that don't match specific patterns (like Big Pickle, Ollama models, most local models) get the qwen.txt prompt by default, which is ~2,075 tokens!
Source: packages/opencode/src/session/system.ts:33-56
export async function environment() {
const project = Instance.project
return [
[
`Here is some useful information about the environment you are running in:`,
`<env>`,
` Working directory: ${Instance.directory}`,
` Is directory a git repo: ${project.vcs === "git" ? "yes" : "no"}`,
` Platform: ${process.platform}`,
` Today's date: ${new Date().toDateString()}`,
`</env>`,
`<project>`,
` ${
project.vcs === "git"
? await Ripgrep.tree({
cwd: Instance.directory,
limit: 200, // ← Max 200 files shown
})
: ""
}`,
`</project>`,
].join("\n"),
]
}
What gets added:
Token cost:
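A trimmed-down sketch of what that environment block looks like when rendered. The inputs here are hypothetical, and the real version also embeds a ripgrep file tree inside `<project>`:

```typescript
// Minimal re-creation of the <env> block with hypothetical inputs
function environmentBlock(dir: string, isGitRepo: boolean, platform: string): string {
  return [
    `Here is some useful information about the environment you are running in:`,
    `<env>`,
    `  Working directory: ${dir}`,
    `  Is directory a git repo: ${isGitRepo ? "yes" : "no"}`,
    `  Platform: ${platform}`,
    `  Today's date: ${new Date().toDateString()}`,
    `</env>`,
  ].join("\n");
}

const env = environmentBlock("/Users/you/project", true, "darwin");
console.log(env.includes("Working directory: /Users/you/project")); // true
```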
Source: packages/opencode/src/session/system.ts:58-115
This is the most misunderstood part! Let me show you exactly what gets loaded:
const LOCAL_RULE_FILES = [
"AGENTS.md",
"CLAUDE.md",
"CONTEXT.md", // deprecated
]
const GLOBAL_RULE_FILES = [
path.join(Global.Path.config, "AGENTS.md"), // ~/.config/opencode/AGENTS.md
path.join(os.homedir(), ".claude", "CLAUDE.md"), // ~/.claude/CLAUDE.md
]
export async function custom() {
const config = await Config.get()
const paths = new Set<string>()
// 1. Search for LOCAL files (searches UP the directory tree)
for (const localRuleFile of LOCAL_RULE_FILES) {
const matches = await Filesystem.findUp(localRuleFile, Instance.directory, Instance.worktree)
if (matches.length > 0) {
matches.forEach((path) => paths.add(path))
break // ← STOPS after finding first matching file
}
}
// 2. Check GLOBAL files (exact paths only)
for (const globalRuleFile of GLOBAL_RULE_FILES) {
if (await Bun.file(globalRuleFile).exists()) {
paths.add(globalRuleFile)
break // ← STOPS after finding first global file
}
}
// 3. Load files from config.instructions (if specified)
if (config.instructions) {
for (let instruction of config.instructions) {
if (instruction.startsWith("~/")) {
instruction = path.join(os.homedir(), instruction.slice(2))
}
let matches: string[] = []
if (path.isAbsolute(instruction)) {
matches = await Array.fromAsync(
new Bun.Glob(path.basename(instruction)).scan({
cwd: path.dirname(instruction),
absolute: true,
onlyFiles: true,
}),
).catch(() => [])
} else {
matches = await Filesystem.globUp(instruction, Instance.directory, Instance.worktree)
.catch(() => [])
}
matches.forEach((path) => paths.add(path))
}
}
return Promise.all(Array.from(paths).map(...))
}
Search Behavior (Critical!):
Local Files (searches UP from current directory):
- AGENTS.md, CLAUDE.md, CONTEXT.md
Global Files (exact paths):
- ~/.config/opencode/AGENTS.md OR ~/.claude/CLAUDE.md
Config Instructions (if you add them to opencode.json):
{
"instructions": [
".opencode/rules.md",
"~/my-custom-rules.md"
]
}
🚨 Common Misconceptions:
❌ "OpenCode loads ALL .md files in .opencode/"
✅ ONLY loads AGENTS.md, CLAUDE.md, CONTEXT.md (if found)
❌ "OpenCode loads from both local AND global"
✅ Loads ONE local file + ONE global file (or until first match)
❌ "OpenCode always loads custom instructions"
✅ Only if the specific files exist
Token cost:
Source: packages/opencode/src/session/prompt.ts:514-522
async function resolveTools(input: {
agent: Agent.Info
sessionID: string
modelID: string
providerID: string
tools?: Record<string, boolean>
processor: Processor
}) {
const tools: Record<string, AITool> = {}
const enabledTools = pipe(
input.agent.tools, // 1. Agent config
mergeDeep(await ToolRegistry.enabled(...)), // 2. Default tools
mergeDeep(input.tools ?? {}), // 3. Request override
)
// Only load enabled tools
for (const item of await ToolRegistry.tools(...)) {
if (Wildcard.all(item.id, enabledTools) === false) continue
// ... load tool definition
}
}
Verified Tool Sizes (from source code):
| Tool | Words | Tokens | Description Size |
|---|---|---|---|
| todowrite | 1,380 | ~1,794 | Largest - complex schema |
| bash | 1,453 | ~1,889 | Large - detailed examples |
| task | 625 | ~812 | Medium - agent descriptions |
| multiedit | 416 | ~541 | Medium |
| edit | 227 | ~295 | Small-medium |
| read | 203 | ~264 | Small-medium |
| todoread | 177 | ~230 | Small-medium |
| webfetch | 148 | ~192 | Small |
| grep | 112 | ~146 | Small |
| write | 108 | ~140 | Small |
| glob | 94 | ~122 | Small |
| websearch | 77 | ~100 | Small |
| ls | 53 | ~69 | Tiny |
| lsp-hover | 3 | ~4 | Minimal |
| lsp-diagnostics | 3 | ~4 | Minimal |
| patch | 3 | ~4 | Minimal |
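Summing the estimates in that table confirms the ~6,600-token figure, and makes it easy to see what disabling a subset saves. The numbers are the approximations above, not exact measurements:

```typescript
// Approximate per-tool token costs from the table above
const toolTokens: Record<string, number> = {
  todowrite: 1794, bash: 1889, task: 812, multiedit: 541,
  edit: 295, read: 264, todoread: 230, webfetch: 192,
  grep: 146, write: 140, glob: 122, websearch: 100,
  ls: 69, "lsp-hover": 4, "lsp-diagnostics": 4, patch: 4,
};

const allTools = Object.values(toolTokens).reduce((a, b) => a + b, 0);
console.log(allTools); // 6606

// Tokens saved by keeping only a minimal set of tools
function savedByKeeping(keep: string[]): number {
  const kept = keep.reduce((sum, t) => sum + (toolTokens[t] ?? 0), 0);
  return allTools - kept;
}

console.log(savedByKeeping(["read", "write", "edit"])); // 5907
```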
Default Tool Set (if not specified):
Tool Enable/Disable Logic:
# In agent config:
tools:
read: true # Explicitly enable
write: false # Explicitly disable
# If not listed, uses default (usually enabled)
🔥 Critical: Tools are opt-out, not opt-in! If you don't set false, they load by default.
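That opt-out behavior amounts to a one-liner: a missing entry defaults to enabled. This is a simplification; the real implementation also supports wildcard patterns via `Wildcard.all`:

```typescript
// Simplified model of tool enablement: absent = enabled (opt-out)
function isToolEnabled(tool: string, config: Record<string, boolean>): boolean {
  return config[tool] ?? true;
}

const agentTools = { read: true, write: false };
console.log(isToolEnabled("write", agentTools)); // false (explicitly disabled)
console.log(isToolEnabled("bash", agentTools));  // true  (not listed, defaults on)
```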
Your conversation messages (user + assistant) are added after system prompts and tools.
Token cost:
Each model family has different:
Source: packages/opencode/src/session/system.ts:25-31
if (modelID.includes("gpt-5")) return [PROMPT_CODEX]
if (modelID.includes("gpt-") || modelID.includes("o1") || modelID.includes("o3"))
return [PROMPT_BEAST]
if (modelID.includes("gemini-")) return [PROMPT_GEMINI]
if (modelID.includes("claude")) return [PROMPT_ANTHROPIC]
return [PROMPT_ANTHROPIC_WITHOUT_TODO] // ← Default fallback
Optimized for:
Key features:
Sample excerpt:
You are opencode, an AI assistant specialized in software engineering...
IMPORTANT: Keep responses short and to the point.
When using tools, plan your approach before executing.
Used by:
Optimized for:
Key differences from Claude:
Sample excerpt:
You are opencode, an interactive CLI tool that helps users with software engineering tasks...
IMPORTANT: Refuse to write code or explain code that may be used maliciously...
When the user asks about opencode, use WebFetch tool to gather information...
🚨 Why this matters for Ollama:
Used by:
Optimized for:
Key features:
Used by:
Optimized for:
Why so large:
Used by:
Optimized for:
---
description: "Custom agent"
mode: primary
prompt: |
You are a helpful assistant. Be concise.
You have access to tools for file operations.
---
Result: Your prompt replaces the base model prompt entirely.
opencode --system "You are a helpful assistant."
Result: Overrides both agent prompt and model prompt.
Edit packages/opencode/src/session/system.ts:
export function provider(modelID: string) {
// Force minimal for specific models
if (modelID.includes("ollama")) return ["You are a coding assistant."]
// ... rest of logic
}
Concept: AI providers store frequently-used prompt segments and reuse them across requests, charging a reduced rate for cached content.
Source: packages/opencode/src/provider/transform.ts:23-74
function applyCaching(msgs: ModelMessage[], providerID: string): ModelMessage[] {
const system = msgs.filter((msg) => msg.role === "system").slice(0, 2)
const final = msgs.filter((msg) => msg.role !== "system").slice(-2)
const providerOptions = {
anthropic: {
cacheControl: { type: "ephemeral" },
},
openrouter: {
cache_control: { type: "ephemeral" },
},
bedrock: {
cachePoint: { type: "ephemeral" },
},
openaiCompatible: {
cache_control: { type: "ephemeral" },
},
}
// ... applies to last 2 system messages and last 2 conversation messages
}
export function message(msgs: ModelMessage[], providerID: string, modelID: string) {
if (providerID === "anthropic" || modelID.includes("anthropic") || modelID.includes("claude")) {
msgs = applyCaching(msgs, providerID)
}
return msgs
}
Verified Providers:
| Provider | Supports Caching | How it Works |
|---|---|---|
| Anthropic | ✅ Yes | Auto-applied via cacheControl: ephemeral |
| OpenRouter | ✅ Yes (if Anthropic backend) | Auto-applied via cache_control |
| Bedrock | ✅ Yes (if Anthropic models) | Auto-applied via cachePoint |
| OpenAI-compatible | ❓ Maybe | Depends on backend |
| OpenAI | ⚠️ Partial | Uses promptCacheKey (different system) |
| Ollama | ❌ No | Local models don't cache |
| LM Studio | ❌ No | Local server |
| OpenCode (Big Pickle) | ⚠️ Special | Routes through Anthropic API internally |
Logic: First 2 system messages + Last 2 conversation messages
const system = msgs.filter((msg) => msg.role === "system").slice(0, 2)
const final = msgs.filter((msg) => msg.role !== "system").slice(-2)
Typical cache structure:
Message 1 (system) - Header + Base Prompt [CACHED]
Message 2 (system) - Environment + Custom + Tools [CACHED]
...
Message N-1 (user) - Your previous question [CACHED]
Message N (assistant) - Previous response [CACHED]
Message N+1 (user) - Current question [NOT CACHED]
Anthropic:
Example costs (Claude 3.5 Sonnet):
Example session:
Context: 9,800 tokens
Breakdown:
cache.read: 8,500 tokens ← From previous request
input: 1,200 tokens      ← New content this request
output: 100 tokens       ← Response
Actual cost calculation:
Cache read: 8,500 × $0.30 / 1M  = $0.00255
Input:      1,200 × $3.00 / 1M  = $0.00360
Output:       100 × $15.00 / 1M = $0.00150
─────────────────────────────────────
Total: $0.00765 (not $0.0294 without cache!)
🔥 Key Insight: a high token count ≠ a high cost when cached!
Why you see cache for Big Pickle (Ollama model):
Big Pickle routes through OpenCode's API, which likely uses Anthropic as the backend. So:
You → OpenCode CLI → OpenCode API → Anthropic API → Big Pickle model
                                          ↑
                                 Caching happens here
Evidence:
- Non-zero cache.read tokens in the message metadata
# Check session cache
cat ~/.local/share/opencode/storage/message/ses_YOUR_SESSION_ID/*.json | \
jq '.tokens'
# Example output:
{
"input": 1200,
"output": 100,
"cache": {
"read": 8500, โ Indicates caching is working
"write": 0
}
}
Source: packages/opencode/src/session/system.ts:58-115
Search path: Current directory → Parent → ... → Git root
Files searched (in order):
1. AGENTS.md ← Most common
2. CLAUDE.md ← Legacy
3. CONTEXT.md ← Deprecated
Behavior:
Example:
/Users/you/project/
  .opencode/
    AGENTS.md  ← Will be loaded
    CLAUDE.md  ← Will NOT be loaded (search stops after AGENTS.md matches)
  subproject/
    AGENTS.md  ← Will NOT be loaded (parent already matched)
Exact paths checked (in order):
1. ~/.config/opencode/AGENTS.md
2. ~/.claude/CLAUDE.md
Behavior:
In opencode.json:
{
"instructions": [
".opencode/rules/*.md", // Glob pattern
"~/global-rules.md", // Absolute path
"docs/coding-standards.md" // Relative path
]
}
Behavior:
- Supports glob patterns (*, **)
When combining instructions:
All are concatenated with:
Instructions from: /path/to/file.md
<file content>
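So the combined custom-instruction payload is just file contents prefixed with their source paths. A sketch of that concatenation (paths and contents here are hypothetical):

```typescript
interface RuleFile { path: string; content: string }

// Concatenate instruction files, each prefixed with its source path
function combineInstructions(files: RuleFile[]): string {
  return files
    .map((f) => `Instructions from: ${f.path}\n${f.content}`)
    .join("\n\n");
}

const combined = combineInstructions([
  { path: "/project/AGENTS.md", content: "Prefer TypeScript." },
  { path: "/home/you/.claude/CLAUDE.md", content: "Be concise." },
]);
console.log(combined.startsWith("Instructions from: /project/AGENTS.md")); // true
```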
Method 1: Rename files
mv AGENTS.md AGENTS.md.disabled
mv ~/.claude/CLAUDE.md ~/.claude/CLAUDE.md.disabled
Method 2: Move to different location
mkdir .opencode/disabled
mv .opencode/AGENTS.md .opencode/disabled/
Method 3: Remove from config
{
"instructions": [] // Empty array
}
Source: All tools enabled by default unless explicitly disabled
Total if all enabled: ~6,606 tokens
Expensive tools to consider disabling:
| Tool | Tokens | When to Disable |
|---|---|---|
| todowrite | 1,794 | Don't need task management |
| bash | 1,889 | Read-only workflows |
| task | 812 | Don't use subagents |
| multiedit | 541 | Single-file edits only |
Cheap tools worth keeping:
| Tool | Tokens | Why Keep |
|---|---|---|
| read | 264 | Essential for reading files |
| write | 140 | Essential for creating files |
| edit | 295 | Essential for modifying files |
| grep | 146 | Fast text search |
| glob | 122 | Find files by pattern |
{
"tools": {
"read": true,
"write": true,
"edit": true,
"bash": true,
"grep": true,
"glob": false,
"ls": false,
"patch": false,
"webfetch": false,
"task": false,
"multiedit": false,
"lsp-diagnostics": false,
"lsp-hover": false,
"todoread": false,
"todowrite": false
}
}
Saves: ~3,870 tokens (the 5 kept tools total ~2,730 tokens of definitions)
---
description: "Minimal agent"
tools:
read: true
write: true
edit: true
# All others implicitly false
---
🚨 IMPORTANT: Must explicitly set false, or tools remain enabled!
Rationale: Caching makes large contexts cheap
Recommended:
Typical setup:
Base prompt:   1,736 tokens
Tools:         6,606 tokens
Environment:     200 tokens
Custom:          500 tokens
───────────────────────
Total:         9,042 tokens
With caching:
First request: 9,042 tokens (~$0.027)
Subsequent:    1,200 input + 8,500 cache read (~$0.006)
Savings: 77% per request!
Rationale: Limited context, no caching, every token matters
Edit packages/opencode/src/session/system.ts:
export function provider(modelID: string) {
// Add before other checks:
if (modelID.includes("ollama") || modelID.includes("llama")) {
return ["You are a coding assistant. Be concise."] // 8 tokens!
}
if (modelID.includes("gpt-5")) return [PROMPT_CODEX]
// ... rest
}
Saves: ~2,067 tokens (from 2,075 to 8)
{
"tools": {
"read": true,
"write": true,
"edit": true,
"bash": false,
"grep": false
}
}
Saves: ~5,906 tokens (from 6,606 to 700)
mv AGENTS.md AGENTS.md.disabled
mv ~/.claude/CLAUDE.md ~/.claude/CLAUDE.md.disabled
Saves: Varies (typically 200-2,000 tokens)
Create .opencode/agent/ollama.md:
---
description: "Ollama-optimized"
mode: primary
prompt: "Coding assistant. Concise."
tools:
read: true
write: true
edit: true
---
Result:
Base prompt override:   5 tokens
Environment:           40 tokens
Tools:                700 tokens
───────────────────────
Total:               ~745 tokens
From 8,000+ to 745 = 91% reduction!
Rationale: Good context window, some caching support
Recommended:
Typical setup:
Base prompt:  2,475 tokens
Tools:        4,000 tokens (disable heavy ones)
Environment:    200 tokens
Custom:         300 tokens
───────────────────────
Total:        6,975 tokens
Since it routes through Anthropic API:
Option 1: Treat like Claude (use caching)
Option 2: Optimize for cost
Session: "Fix bug in auth.ts"
Model: claude-sonnet-4
Agent: build (default)
Project: 50 files
TOKEN BREAKDOWN:
────────────────────────────────────────────
Component                    Tokens   Cached
────────────────────────────────────────────
Header (anthropic_spoof)         12   Yes
Base Prompt (anthropic.txt)   1,736   Yes
Environment Info                 40   Yes
Project Tree (50 files)         150   Yes
AGENTS.md                       245   Yes
~/.claude/CLAUDE.md             356   Yes
────────────────────────────────────────────
System Prompt Total           2,539   Yes
Tools (all 16 enabled)        6,606   Yes
────────────────────────────────────────────
Context Total                 9,145   Yes
Your Message                     15   No
────────────────────────────────────────────
TOTAL FIRST REQUEST           9,160
First request cost: $0.0275
Subsequent (5min):  $0.0050 (82% savings!)
Session: "Fix bug in auth.ts"
Model: ollama/qwen2.5:latest
Agent: ollama-minimal
Project: 50 files
TOKEN BREAKDOWN:
────────────────────────────────────────────
Component                    Tokens   Cached
────────────────────────────────────────────
Header                            0   No
Agent Prompt Override             8   No
Environment Info                 40   No
Project Tree (50 files)         150   No
No custom instructions            0   No
────────────────────────────────────────────
System Prompt Total             198   No
Tools (3 enabled: read,         700   No
  write, edit)
────────────────────────────────────────────
Context Total                   898   No
Your Message                     15   No
────────────────────────────────────────────
TOTAL EVERY REQUEST             913
Cost: Free (local)
Context usage: 11% of 8k window
Session: "Respond with HI"
Model: big-pickle
Agent: ultra-minimal
Project: 1 file
TOKEN BREAKDOWN:
────────────────────────────────────────────
Component                    Tokens   Cached
────────────────────────────────────────────
Header                            0   Yes
Base Prompt (qwen.txt)        2,075   Yes
Environment Info                 40   Yes
Project Tree (1 file)             3   Yes
~/.claude/CLAUDE.md              46   Yes
────────────────────────────────────────────
System Prompt Total           2,164   Yes
Tools (ALL disabled)              0   Yes
────────────────────────────────────────────
Context Total                 2,164   Yes
Your Message                      4   No
────────────────────────────────────────────
TOTAL                         2,168
With agent override + no tools:
Agent Prompt                      3   Yes
Environment                      40   Yes
Tree                              3   Yes
Tools                             0   Yes
────────────────────────────────────────────
Context Total                    46   Yes
Message                           4   No
────────────────────────────────────────────
TOTAL                            50
Savings: 97.7% reduction!
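The reduction percentages quoted in these examples all come from the same simple formula:

```typescript
// Percent reduction from a before/after token count, to one decimal place
function reductionPct(before: number, after: number): number {
  return Math.round((1 - after / before) * 1000) / 10;
}

console.log(reductionPct(2168, 50));  // 97.7
console.log(reductionPct(8000, 745)); // 90.7 (the "91%" figure above)
```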
Files that ARE loaded as custom instructions:
- AGENTS.md (local or global)
- CLAUDE.md (local or global)
- CONTEXT.md (deprecated, but still loads)
- Anything listed in config.instructions
Files that are NOT loaded:
- README.md
- CONTRIBUTING.md
- .opencode/custom.md (unless listed in instructions)
- Any other .md files
For local models (Ollama):
# .opencode/agent/minimal.md
---
description: "Minimal"
mode: primary
prompt: "Code assistant"
tools:
read: true
write: true
edit: true
---
# Disable global instructions
mv ~/.claude/CLAUDE.md ~/.claude/CLAUDE.md.disabled
# Result: ~750 tokens total
For Claude/hosted:
# Use defaults - caching makes it efficient
# Just create agents with specific tools per task
---
description: "Research agent"
tools:
read: true
grep: true
webfetch: true
---
# View session messages
cat ~/.local/share/opencode/storage/message/ses_YOUR_ID/*.json | jq
# Check custom instruction sources
grep -r "Instructions from:" ~/.local/share/opencode/storage/message/
# Count tokens per component
./script/count-agent-tokens.sh your-agent qwen2.5:latest ollama
Issue: "Why 8k tokens for simple query?"
Answer: Base prompt (2,075) + Tools (6,606) = 8,681 tokens
Issue: "Cache not working for Ollama"
Answer: Ollama doesn't support caching (local models)
Issue: "Custom instructions not loading"
Answer: Check exact filenames (case-sensitive), verify path with find
Issue: "Tools still loading after disabling"
Answer: Must set false explicitly, not just omit from config
Last Updated: Dec 7, 2025
Verified Against: OpenCode source code packages/opencode/src/session/