# LM Studio Optimal Settings for OpenCode
This guide provides battle-tested configurations for running Qwen3-Coder and GPT-OSS-20B with OpenCode via LM Studio.
## Quick Start
- Copy the configuration from `lmstudio-config-example.json` to your `opencode.json`
- Download your models in LM Studio
- Start the LM Studio server on port 1234 (a quick connectivity check follows this list)
- Launch OpenCode and use `/models` to select your local model
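For that connectivity check, a minimal sketch using only the Python standard library, assuming LM Studio's default local endpoint:

```python
# List the models LM Studio is currently serving; a successful
# response confirms the server is reachable on the default port.
import json
import urllib.request

BASE_URL = "http://127.0.0.1:1234/v1"

with urllib.request.urlopen(f"{BASE_URL}/models", timeout=5) as resp:
    for model in json.load(resp)["data"]:
        print(model["id"])  # IDs usable as model names in opencode.json
```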
## Model-Specific Settings
### Qwen3-Coder-30B (Recommended Primary Model)
Best for: Precise tool calling, code generation, debugging
```json
{
  "limit": {
    "context": 24000,
    "output": 4000
  },
  "options": {
    "temperature": 0.1,
    "topP": 0.8,
    "minP": 0.01,
    "repetitionPenalty": 1.05
  }
}
```
Why these settings:
- Temperature 0.1: Maximizes deterministic tool calling reliability. Use 0.2-0.3 for more creative exploration.
- Top-P 0.8: Constrains token diversity appropriately for coding tasks
- Min-P 0.01: Lower than llama.cpp default (0.1) for better tool use
- Repetition Penalty 1.05: Prevents infinite loops during multi-step tool calls
- Context 24000: Handles large codebases without frequent compaction
- Output 4000: Sufficient for most code generation tasks
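As a request-level sanity check, the same sampler values can be sent straight to LM Studio's OpenAI-compatible endpoint. Note the REST field names are snake_case; whether `min_p` and `repeat_penalty` are honored over this API is an assumption and may depend on your LM Studio version:

```python
# Hypothetical smoke test: one completion with the Qwen3-Coder sampler
# settings. min_p / repeat_penalty pass-through is an assumption and
# may be silently ignored by some LM Studio builds.
import json
import urllib.request

payload = {
    "model": "qwen3-coder-30b",  # must match an ID from /v1/models
    "messages": [{"role": "user", "content": "Write a one-line hello world in Python."}],
    "temperature": 0.1,
    "top_p": 0.8,
    "min_p": 0.01,           # assumed pass-through parameter
    "repeat_penalty": 1.05,  # assumed pass-through parameter
    "max_tokens": 200,
}
req = urllib.request.Request(
    "http://127.0.0.1:1234/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=60) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])
```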
### GPT-OSS-20B (Alternative/Backup Model)
Best for: General coding, conversation, when you need higher creativity
```json
{
  "limit": {
    "context": 16000,
    "output": 4000
  },
  "options": {
    "temperature": 0.4,
    "topP": 0.9,
    "minP": 0.05,
    "repetitionPenalty": 1.05
  }
}
```
Why these settings:
- Temperature 0.4: Higher than Qwen3 due to its different architecture; still reliable for tool calls
- Top-P 0.9: More diversity for MoE (Mixture of Experts) architecture
- Min-P 0.05: Slightly higher for better creative balance
- Repetition Penalty 1.05: Same as Qwen3 for loop prevention
- Context 16000: Sufficient for most tasks, adjust based on VRAM
## LM Studio Application Settings
### GPU Acceleration (Critical)
In LM Studio Settings → Hardware:
GPU Offload Layers: Set to the maximum your GPU can handle (a rough estimation sketch follows this subsection):
- RTX 4060 8GB: 36 layers
- RTX 4070 12GB: 40 layers
- RTX 4090 24GB: All layers
- Mac M1/M2/M3: All layers (MLX preferred)
Keep Model in VRAM: ✅ Enable
Offload KV Cache to GPU: ✅ Enable (4x speedup on compatible hardware)
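If your card isn't listed above, a back-of-the-envelope sketch can suggest a starting layer count; every constant in it is an assumption for illustration, not a measured value:

```python
# Back-of-the-envelope GPU layer estimate. All constants below are
# assumptions; actual usage depends on quantization, context length,
# and KV cache size.
def max_offload_layers(vram_gb: float, total_layers: int,
                       model_size_gb: float, overhead_gb: float = 2.0) -> int:
    """Layers that fit in VRAM, reserving overhead_gb for the KV cache,
    activations, and the display compositor."""
    per_layer_gb = model_size_gb / total_layers
    usable_gb = max(vram_gb - overhead_gb, 0.0)
    return min(total_layers, int(usable_gb / per_layer_gb))

# e.g. an assumed ~19 GB quantized model with 48 layers on a 24 GB card:
print(max_offload_layers(vram_gb=24, total_layers=48, model_size_gb=19))  # -> 48 (all layers)
```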
### Context Settings
- Context Length: Match or exceed your config (24000 for Qwen3, 16000 for GPT-OSS)
- Batch Size: 512 (default) or higher if VRAM allows
- Threads: Set to CPU cores minus 2 (e.g., 14 threads for a 16-core CPU; see the snippet below)
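The thread rule is easy to derive programmatically, for example:

```python
# The "CPU cores minus 2" thread rule, with a floor of 1.
import os

threads = max((os.cpu_count() or 4) - 2, 1)
print(f"Suggested LM Studio thread count: {threads}")
```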
### Speculative Decoding (Advanced)
For 30B+ models, enable speculative decoding:
- Draft Model: Use a small 1-3B model from the same family
- Speedup: 1.5x-3x without quality loss
## OpenCode Integration
### Full Configuration Example
```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "lmstudio": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "LM Studio (Local)",
      "options": {
        "baseURL": "http://127.0.0.1:1234/v1"
      },
      "models": {
        "qwen3-coder-30b": {
          "name": "Qwen3-Coder-30B (Local)",
          "tools": true,
          "limit": {
            "context": 24000,
            "output": 4000
          },
          "options": {
            "temperature": 0.1,
            "topP": 0.8,
            "minP": 0.01,
            "repetitionPenalty": 1.05
          }
        },
        "gpt-oss-20b": {
          "name": "GPT-OSS-20B (Local)",
          "tools": true,
          "limit": {
            "context": 16000,
            "output": 4000
          },
          "options": {
            "temperature": 0.4,
            "topP": 0.9,
            "minP": 0.05,
            "repetitionPenalty": 1.05
          }
        }
      }
    }
  },
  "model": "lmstudio/qwen3-coder-30b",
  "agents": {
    "build": {
      "mode": "primary",
      "description": "Main build agent"
    }
  }
}
```
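Before launching OpenCode, it can be worth checking that the file parses and that every configured model is actually being served. A minimal sketch, assuming `opencode.json` sits in the current directory (the path is an assumption):

```python
# Hypothetical sanity check for opencode.json: the file must parse,
# and each configured model should appear in LM Studio's model list.
# Served IDs can legitimately differ from your config keys, so treat
# a mismatch as a prompt to verify, not necessarily an error.
import json
import urllib.request
from pathlib import Path

config = json.loads(Path("opencode.json").read_text())
lmstudio = config["provider"]["lmstudio"]
base_url = lmstudio["options"]["baseURL"]

with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
    served = {m["id"] for m in json.load(resp)["data"]}

for model_id in lmstudio["models"]:
    status = "ok" if model_id in served else "not found in LM Studio"
    print(f"{model_id}: {status}")
```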
### Switching Models
Use the `/models` command in OpenCode to switch between your configured models without restarting.
## Troubleshooting
### Tool Calls Not Working
- Increase context window in LM Studio to 16k-32k minimum
- Verify temperature is set correctly (0.1 for Qwen3, 0.4 for GPT-OSS)
- Check repetition penalty is set to 1.05
- Restart the LM Studio server after changing settings (see the test sketch below)
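If the checklist doesn't resolve it, a hypothetical end-to-end test can isolate whether tool calling works at the API level at all. The tool schema below is illustrative, and `tools` support over the REST API depends on your LM Studio version and the loaded model:

```python
# Hypothetical tool-call test. The get_weather tool is illustrative;
# whether the model emits tool_calls depends on the model and your
# LM Studio version.
import json
import urllib.request

payload = {
    "model": "qwen3-coder-30b",
    "messages": [{"role": "user", "content": "What is the weather in Berlin?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "temperature": 0.1,
}
req = urllib.request.Request(
    "http://127.0.0.1:1234/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=120) as resp:
    message = json.load(resp)["choices"][0]["message"]

# A healthy setup returns a structured tool_calls entry, not prose.
print(message.get("tool_calls") or message.get("content"))
```

If the model answers in prose instead of emitting `tool_calls`, revisit the context window and temperature items above.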
### Slow Performance
- Maximize GPU layers - check LM Studio logs for "offloaded X/Y layers"
- Enable KV cache offload in GPU settings
- Reduce context length if hitting VRAM limits
- Try speculative decoding with a draft model
### Out of Memory
- Reduce context length: 16000 → 12000 → 8000
- Reduce GPU layers: Start at 50% and increase
- Switch to a smaller quantization: Q6 → Q5 → Q4 (approximate sizes are sketched below)
- Close other applications using VRAM
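For intuition on how much the quantization step saves, a rough sketch using approximate bits-per-weight figures for llama.cpp-style quants (the exact sizes vary by quant scheme and model):

```python
# Rough weight-memory estimate per quantization level for a 30B model.
# Bits-per-weight values are approximations; real GGUF files also carry
# metadata and a few higher-precision tensors.
PARAMS = 30e9
BITS_PER_WEIGHT = {"Q8_0": 8.5, "Q6_K": 6.6, "Q5_K_M": 5.7, "Q4_K_M": 4.8}

for quant, bits in BITS_PER_WEIGHT.items():
    gib = PARAMS * bits / 8 / 1024**3
    print(f"{quant}: ~{gib:.1f} GiB of weights")
```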
### Model Repeating Itself
- Increase repetition penalty: 1.05 → 1.10 → 1.15
- Lower temperature slightly: 0.1 → 0.05
- Check min-P setting: Should be 0.01-0.05
## Hardware Recommendations
### Minimum Specs (Qwen3-Coder-30B)
- GPU: 12GB VRAM (RTX 4070, RTX 3080 12GB)
- RAM: 16GB system RAM
- Quantization: Q4_K_M or Q5_K_M
### Recommended Specs (Qwen3-Coder-30B)
- GPU: 16-24GB VRAM (RTX 4080, RTX 4090)
- RAM: 32GB system RAM
- Quantization: Q6_K or Q8
### Minimum Specs (GPT-OSS-20B)
- GPU: 8GB VRAM (RTX 4060)
- RAM: 16GB system RAM
- Quantization: Q4_K_M
### Mac Users
- MLX versions strongly recommended over GGUF
- Significantly faster on Apple Silicon
- Use LM Studio's MLX support or native MLX inference
- M1/M2/M3 with 16GB+ unified memory works well
## Performance Expectations
### Qwen3-Coder-30B (Q5_K_M on RTX 4080)
- Tokens/second: 15-25 t/s
- Context loading: 2-3 seconds
- Tool call reliability: 95%+
### GPT-OSS-20B (Q5_K_M on RTX 4060)
- Tokens/second: 20-30 t/s
- Context loading: 1-2 seconds
- Tool call reliability: 90%+
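These figures are hardware-dependent, so it's worth measuring your own throughput. A minimal sketch, assuming the response carries an OpenAI-style `usage` block (LM Studio normally returns one):

```python
# Time one completion and compute tokens/second from the usage block.
import json
import time
import urllib.request

payload = {
    "model": "qwen3-coder-30b",  # swap in the model under test
    "messages": [{"role": "user", "content": "Explain binary search in detail."}],
    "max_tokens": 512,
    "temperature": 0.1,
}
req = urllib.request.Request(
    "http://127.0.0.1:1234/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
start = time.perf_counter()
with urllib.request.urlopen(req, timeout=300) as resp:
    body = json.load(resp)
elapsed = time.perf_counter() - start

tokens = body.get("usage", {}).get("completion_tokens")
if tokens:
    print(f"{tokens} tokens in {elapsed:.1f}s = {tokens / elapsed:.1f} t/s")
else:
    print("Response had no usage block; cannot compute t/s.")
```

Note that the elapsed time includes prompt processing, so this slightly understates pure generation speed.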
## Settings Comparison Table

| Setting | Qwen3-Coder | GPT-OSS-20B | Reasoning |
|---|---|---|---|
| Temperature | 0.1 | 0.4 | Qwen3 needs lower for tool calling |
| Top-P | 0.8 | 0.9 | MoE models benefit from more diversity |
| Min-P | 0.01 | 0.05 | Lower for deterministic tool use |
| Repetition Penalty | 1.05 | 1.05 | Prevents loops in both |
| Context | 24000 | 16000 | Qwen3 handles larger contexts better |
| Output | 4000 | 4000 | Standard for code generation |
## When to Adjust Settings
### For More Creativity
- Increase temperature: 0.1 → 0.3 (Qwen3) or 0.4 → 0.6 (GPT-OSS)
- Increase top-P: 0.8 → 0.9 or 0.9 → 0.95
### For More Precision
- Decrease temperature: 0.1 → 0.05 (careful: may reduce quality)
- Decrease top-P: 0.8 → 0.7
### For Handling Repetition
- Increase repetition penalty: 1.05 → 1.10
- Add frequency penalty: 0 → 0.3
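If you switch between these profiles often, it may help to keep them as named presets and paste the relevant block into a model's `options` in `opencode.json`. A hypothetical sketch (the `frequencyPenalty` key name is an assumption; check OpenCode's schema for the exact spelling):

```python
# Hypothetical sampler presets mirroring the adjustments above; merge
# one preset into a model's "options" block in opencode.json.
import json

PRESETS = {
    "default":     {"temperature": 0.1,  "topP": 0.8, "minP": 0.01, "repetitionPenalty": 1.05},
    "creative":    {"temperature": 0.3,  "topP": 0.9, "minP": 0.01, "repetitionPenalty": 1.05},
    "precise":     {"temperature": 0.05, "topP": 0.7, "minP": 0.01, "repetitionPenalty": 1.05},
    "anti-repeat": {"temperature": 0.1,  "topP": 0.8, "minP": 0.01,
                    "repetitionPenalty": 1.10, "frequencyPenalty": 0.3},  # key name assumed
}

print(json.dumps(PRESETS["creative"], indent=2))
```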
## Notes
- These settings are optimized for tool calling reliability with OpenCode
- Raw performance benchmarks suggest Ollama may be faster, but its tool calling is unreliable
- LM Studio's proper parameter handling makes it the recommended choice for OpenCode
- Settings can be adjusted per-use-case, but these defaults work for 90% of coding tasks