Stop waiting for Opus on every grep.
93.8% of Claude Code tokens go to Opus unnecessarily. better-model routes tasks to the right model — shifts ~60% of subagent work to Sonnet 4.6 (~1.4× faster, ~5× cheaper, ~91% of Opus quality on routine coding) and reserves Opus 4.7 for multi-file refactoring, architecture, and security.
npx better-model init

You pay for Max or Team Premium. You get Opus on every task. Sounds great — until you notice:
- File search? Opus. 3–5 seconds wait.
- Grep for a function name? Opus. 3–5 seconds wait.
- Write a single test? Opus. 10+ seconds wait.
- Rename a variable? Opus. 10+ seconds wait.
Sonnet 4.6 handles all of these at ~91% of Opus quality, ~1.4× faster, and 5× cheaper.
| Metric | Opus 4.7 | Sonnet 4.6 | Haiku 4.5 | Notes |
|---|---|---|---|---|
| SWE-bench Verified | 87.6% | 79.6% | — | Gap 8.0 pts |
| SWE-bench Pro | 64.3% | n/a | — | Agentic coding; Opus 4.7 +10.9 pts gen-on-gen |
| GPQA Diamond | 94.2% | 74.1% | — | Gap 20.1 pts (reasoning, where Opus earns it) |
| Terminal-Bench 2.0 | 69.4% | n/a | — | Tool-use / agentic |
| Context window | 1M | 1M | 200K | Opus regression >500K |
| Price (input / output) | $5 / $25 | $3 / $15 | $1 / $5 | per MTok |
| Relative speed | baseline | ~1.4× faster | ~2× faster | subjective |
Opus 4.7 caveats: new tokenizer produces 1.0–1.35× tokens vs 4.6 on identical text (effective cost on long prompts may rise up to ~35%; prompt caching is more valuable than before). Documented "lost in the middle" regression past ~500K tokens — for large-context tasks, prefer Sonnet 4.6 or chunk.
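A quick arithmetic check of what the tokenizer multiplier does to effective pricing (a sketch — 1.20× is the mid-range figure assumed later in the methodology, not a measured constant):

```python
# Effective Opus 4.7 list price once the new tokenizer is factored in.
# Anthropic's figure is "up to 1.35x more tokens for the same fixed text";
# code-heavy prompts trend toward the top end.
LIST_IN, LIST_OUT = 5.0, 25.0  # $/MTok, Opus 4.7 list price

for mult in (1.00, 1.20, 1.35):
    eff_in, eff_out = LIST_IN * mult, LIST_OUT * mult
    print(f"{mult:.2f}x tokenizer -> ${eff_in:.2f} in / ${eff_out:.2f} out per MTok")
```

At the 1.35× worst case the effective output price is $33.75/MTok rather than $25 — which is why prompt caching matters more on 4.7 than it did on 4.6.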
The gap only matters for architecture, security audits, multi-file refactoring, and novel problem-solving. That's ~20% of tasks. better-model routes the other 80% to where they belong.
Step 1. Run npx better-model init in your project.
Step 2. It creates two optimized agents (sonnet-coder and haiku-explorer), drops a decision matrix into docs/BETTER-MODEL.md, adds a CRITICAL routing block to CLAUDE.md with xhigh/max effort mapping for Opus 4.7 tasks, and injects model:/effort: frontmatter into any existing .claude/agents/ and .claude/skills/.
Step 3. Claude Code reads the routing block at session start and dispatches subagent tasks to the right model — Sonnet for coding, Haiku for search, Opus 4.7 + xhigh for multi-file work and code review, Opus 4.7 + max for architecture/security/novel algorithms.
That's it. No dependencies, no proxies, no hooks. Two agents, one decision matrix, correct frontmatter.
No black box. The routing logic is a single function at src/fix.js:10-57, applied to the lowercased `<agent-name> <agent-description>` string. First-match-wins:
1. Haiku tier → model: haiku (no effort field — Haiku 4.5 does not support it)
keywords: explore, search, scan, grep, find, discover,
verify, health, check, status, monitor
2. Opus + max → model: opus, effort: max
keywords: architect, security, novel, algorithm, ultraplan
rationale: frontier reasoning — GPQA Diamond 94.2% vs 74.1%
3. Opus + xhigh → model: opus, effort: xhigh
keywords: audit, migrate, migration, migrator, review,
orchestrate, orchestrator, advisor
rationale: Anthropic-recommended starting point for Opus 4.7 coding/agentic;
covers multi-agent orchestration and the "Advisor strategy"
pattern from Code with Claude 2026; "review" subsumes
"ultrareview" by substring
4. Sonnet + high → model: sonnet, effort: high
keywords: lint, debug, investigate, diagnose
rationale: needs rigor on isolated bugs
5. Sonnet + medium → model: sonnet, effort: medium
keywords: test, format, deploy, build, generate, refactor, pipeline
rationale: standard coding work
6. Default fallback → model: sonnet, effort: medium
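In Python terms, the whole matcher is roughly this (a sketch of the logic only — the real implementation lives in src/fix.js):

```python
# Sketch of the first-match-wins keyword router. Tier order and keywords
# mirror the list above; earlier tiers win ties.
TIERS = [
    ({"explore", "search", "scan", "grep", "find", "discover",
      "verify", "health", "check", "status", "monitor"},
     {"model": "haiku"}),                                   # Haiku: no effort field
    ({"architect", "security", "novel", "algorithm", "ultraplan"},
     {"model": "opus", "effort": "max"}),
    ({"audit", "migrate", "migration", "migrator", "review",
      "orchestrate", "orchestrator", "advisor"},
     {"model": "opus", "effort": "xhigh"}),
    ({"lint", "debug", "investigate", "diagnose"},
     {"model": "sonnet", "effort": "high"}),
    ({"test", "format", "deploy", "build", "generate", "refactor", "pipeline"},
     {"model": "sonnet", "effort": "medium"}),
]

def route(name: str, description: str = "") -> dict:
    text = f"{name} {description}".lower()
    for keywords, frontmatter in TIERS:
        # Substring match, so "ultrareview" is caught by "review"
        if any(k in text for k in keywords):
            return dict(frontmatter)
    return {"model": "sonnet", "effort": "medium"}          # default fallback
```

Because earlier tiers win, an agent named `security-reviewer` routes to Opus + max (tier 2) even though `review` would also match tier 3.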
Read the source, fork it, tweak it. Adding a keyword to a tier is a one-line change. The full evidence-based mapping with benchmark citations lives in templates/BETTER-MODEL.md.
Normalized "task unit" = 300K input tokens + 1M output tokens at medium-effort baseline. Same task across three routing strategies:
"Vanilla Claude Code" = stock Claude Code without better-model installed, on a Pro/Max subscription. Since Claude Code v2.1.118 (Apr 23, 2026) the default model is Opus 4.7 and the default effort is
high— applied to every task: main agent turns, every subagent dispatch, every grep, every test write. That's the baseline we compare against.
| Scenario | Cost / task | Quality (SWE-bench Verified blend) | Speed (relative) |
|---|---|---|---|
| Vanilla Claude Code (Max default: Opus 4.7 + high everywhere) | ~$47 | 87.6% | 1.0× baseline |
| Always Opus 4.7 + max effort | ~$122 | ~87.6%¹ | ~0.5× (much slower) |
| better-model routing (Sonnet 55.6% / Opus 32.8% / Haiku 11.7%) | ~$38 | ~82.6% | ~1.4× faster avg |
| → savings vs Vanilla | −18% | −5.0 pts | +40% faster |
| → savings vs Always-max | −68% | similar quality | ~2.8× faster |
¹ max effort doesn't improve quality on most coding work — Anthropic explicitly warns it overthinks on structured-output tasks like code review.
Methodology — every assumption, openly
- Task unit: 300K input tokens + 1M output tokens at "medium" effort baseline. Effort multipliers scale only the output side.
- Opus 4.7 tokenizer: 1.20× multiplier on both sides (per Anthropic pricing docs, "up to 1.35× more tokens for the same fixed text" — 1.20× is the mid-range; code-heavy prompts trend toward 1.35×).
- Effort multipliers (output side, approximating Anthropic guidance — actual variance depends on workload): `low` 0.6×, `medium` 1.0×, `high` 1.5×, `xhigh` 2.5×, `max` 4.0×.
- Within-tier mix: Sonnet 80% medium / 20% high (debug-heavy work); Opus 80% xhigh / 20% max (most coding stays at `xhigh` per Anthropic).
- Routing distribution: empirical May 2026 (n=961 subagent calls across four better-model-installed projects). See "Field data" below.
- Quality blend: SWE-bench Verified weighted by routing share, coding-only tasks (88.3% of routed work). Haiku-tier search tasks excluded — not benchmarked.
- Prompt cache: NOT included. Claude Code v2.1.133 gives a 3× reduction on subagent caches, but it applies equally across all three scenarios, so it cancels out of the comparison.
- Per-project variance: a Telegram-automation project routes ~73% to Haiku; a content-app project routes ~62% to Sonnet. Your savings depend on your task mix.
Reproduce on your own numbers:
SONNET_SHARE, OPUS_SHARE, HAIKU_SHARE = 0.556, 0.328, 0.117 # your distribution
SONNET_HIGH_RATIO, OPUS_MAX_RATIO = 0.20, 0.20 # within-tier mix
MULT = {"low": 0.6, "medium": 1.0, "high": 1.5, "xhigh": 2.5, "max": 4.0}
IN_PRICE = {"haiku": 1, "sonnet": 3, "opus": 5} # $/MTok input
OUT_PRICE = {"haiku": 5, "sonnet": 15, "opus": 25} # $/MTok output
TOKENIZER = {"opus": 1.20, "sonnet": 1.0, "haiku": 1.0}
def cost(model, effort):
inp = 0.3 * TOKENIZER[model] * IN_PRICE[model]
out = 1.0 * MULT[effort] * TOKENIZER[model] * OUT_PRICE[model]
return inp + out
sonnet = 0.80 * cost("sonnet", "medium") + 0.20 * cost("sonnet", "high")
opus = 0.80 * cost("opus", "xhigh") + 0.20 * cost("opus", "max")
haiku = cost("haiku", "medium") # no effort
better = SONNET_SHARE*sonnet + OPUS_SHARE*opus + HAIKU_SHARE*haiku
vanilla = cost("opus", "high") # Max-subscriber default
always_max = cost("opus", "max")
print(f"Vanilla: ${vanilla:6.2f}") # ~$46.80
print(f"Always-max: ${always_max:6.2f}") # ~$121.80
print(f"better-model: ${better:6.2f}") # ~$38.44

Refined methodology — subagent-only calls (Agent tool invocations, controlled by the routing block) in projects where better-model is installed. Excludes main-session `/model` choices, which depend on the user's manual selection and don't reflect routing behaviour.
Measured on a single Max subscriber across the projects where better-model was installed (platonmamatov.com, scandal, TA, better-model):
| | Pre-install (Mar 1 – Apr 4) | v0.5.x era (Apr 12 – Apr 15) | v0.6.x era (Apr 16 – Apr 24) | Δ pre → v0.6 |
|---|---|---|---|---|
| Subagent calls | 44,319 | 1,704 | 1,266 | — |
| Opus | 52.7% | 49.2% | 46.1% | −6.6 pp |
| Sonnet | 3.8% | 46.2% | 45.5% | +41.7 pp (12×) |
| Haiku | 42.4% | 4.6% | 8.5% | −33.9 pp |
The headline: Sonnet share in subagent dispatch went from 3.8% to ~46% — a 12× increase. Most of that shift came out of Haiku (42.4% → ~9%) — routine coding tasks that were previously handled by the native Explore-agent Haiku are now routed to sonnet-coder where code quality matters. Opus share moved only -6.6 pp, confirming the tool doesn't suppress legitimate Opus-tier work.
Caveats: Numbers are from one user across 4 projects. Pre-install Haiku share (42.4%) reflects the native Claude Code Explore agent, not a missing baseline. The v0.6 era sample is smaller (1,266 calls over 9 days) than pre-install (44,319 calls over ~5 weeks). Main-session `/model` choices and projects without better-model installed are excluded. The previous v0.5.0 field test numbers (published in the v0.5.0 README) mixed main-session and subagent calls — the refined subagent-only aggregate above is a cleaner measure of what the routing block actually controls.
You don't have to take the published field data on faith. Run npx better-model stats in any project where better-model is installed and you get the same measurement, computed locally and read-only, against your own session history.
$ npx better-model stats
better-model stats — /Users/alice/Projects/payment-service
Window: last 7 days (2026-05-05T19:00:00.000Z → 2026-05-12T19:00:00.000Z)
Source: 3 session files, 47 Agent calls
Main agent (your Claude Code setting — better-model does NOT control):
Opus 100.0% (412 turns)
Subagent dispatch (controlled by better-model routing):
Sonnet 55.3% (26 calls)
Opus 31.9% (15 calls)
Haiku 12.8% (6 calls)
Compared to README target (Sonnet 55.6% / Opus 32.8% / Haiku 11.7%):
✓ Sonnet -0.3 pp
✓ Opus -0.9 pp
✓ Haiku +1.1 pp

The ✓/⚠/✗ markers reflect distance from the README target: within ±5 pp gets a ✓, 5–15 pp gets ⚠, beyond 15 pp gets ✗.
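The marker thresholds are simple enough to restate as code (a sketch of the rule as described above, not the tool's source — boundary handling at exactly 5 and 15 pp is a guess here):

```python
# Sketch of the stats marker rule: distance from the README target,
# in percentage points, maps to a check / warning / cross.
def marker(delta_pp: float) -> str:
    distance = abs(delta_pp)
    if distance <= 5:
        return "✓"
    if distance <= 15:
        return "⚠"
    return "✗"

print(marker(-0.3), marker(7.2), marker(-20.0))  # ✓ ⚠ ✗
```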
The two blocks are separate on purpose. Main agent is whichever model you pick in Claude Code's settings — better-model has no way to swap it mid-session (Claude Code's harness reserves that for explicit user /model keystrokes). Subagent dispatch is what the routing block in CLAUDE.md actually controls — the model chosen for each Agent() tool call your main agent makes. Keeping them visually separate prevents the "100% Opus → better-model is broken" misread.
$ npx better-model stats --days 30 # 30-day rolling window
$ npx better-model stats --all-projects # aggregate across every CC project
$ npx better-model stats --json # stable schema for scripts / CI

The `--json` schema is stable across releases (additions only): top-level `project`, `window_days`, `from`, `to`, `sessions`, `main_agent.{total,counts}`, `subagent_dispatch.{total,counts,percentages}`, `readme_target`.
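That stability makes the output safe to script against. A sketch of a CI gate over the schema (the sample payload and its numbers are illustrative; only the field names come from the schema above):

```python
import json

# Illustrative --json payload following the documented stable schema;
# the concrete values are made up for this example.
payload = """{
  "project": "payment-service",
  "window_days": 7,
  "from": "2026-05-05T19:00:00.000Z",
  "to": "2026-05-12T19:00:00.000Z",
  "sessions": 3,
  "main_agent": {"total": 412, "counts": {"opus": 412}},
  "subagent_dispatch": {
    "total": 47,
    "counts": {"sonnet": 26, "opus": 15, "haiku": 6},
    "percentages": {"sonnet": 55.3, "opus": 31.9, "haiku": 12.8}
  },
  "readme_target": {"sonnet": 55.6, "opus": 32.8, "haiku": 11.7}
}"""

stats = json.loads(payload)
opus_share = stats["subagent_dispatch"]["percentages"].get("opus", 0.0)

# Fail the build if Opus subagent share drifts past a chosen budget
OPUS_BUDGET_PP = 45.0
assert opus_share <= OPUS_BUDGET_PP, f"Opus share {opus_share}% over budget"
```

In a real pipeline you would pipe `npx better-model stats --json` into this instead of the inline sample.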
| ✓ better-model controls | ✗ better-model does not control |
|---|---|
| `model` (and `effort` for Opus/Sonnet) in every `Agent()` subagent dispatch — via the routing block hint in CLAUDE.md | Your main agent model — that's your Claude Code setting |
| Two ready-to-use subagent agents: `sonnet-coder`, `haiku-explorer` | Your main agent effort — same |
| `model:` frontmatter injection in `.claude/agents/` and `.claude/skills/` (via `audit --fix`) | When the main agent decides to spawn a subagent (the main agent's call) |
| | Mid-session model switching — Claude Code's harness reserves `/model` for the user |
| | Whether the main agent respects the routing block hint (it's a hint — sample data shows Explore subagents occasionally dispatched on Sonnet when the agent judged the task needed more rigor) |
Where savings actually come from. In Plan Mode on a typical task, the main agent runs on Opus + xhigh end-to-end and spawns 1–3 Explore subagents — better-model can route those to Haiku, saving ~5–10% of the session cost. In /loop autonomous mode the main agent spawns more variety (haiku-explorer, sonnet-coder, code-reviewer, architect), and savings rise to the ~15–30% range consistent with the field data above. better-model shines when your workflow is subagent-heavy; if you spend all session in main-agent direct edits, the savings are necessarily smaller.
| Mode | Command | What it does |
|---|---|---|
| Enforcement (default) | `npx better-model init` | Agents + routing block + inject `model:`/`effort:` into agents/skills (opus-tier → xhigh/max) |
| Soft | `npx better-model init --soft` | Matrix as reference only — no agents, no frontmatter changes |
Tip
In a field test, a Claude Code session read the decision matrix in soft mode and proactively updated agent configs on its own — applying the correct model to all 8 agents and skills without audit --fix being run.
| Command | Description |
|---|---|
| `npx better-model init` | Install with enforcement (default) |
| `npx better-model init --soft` | Install soft mode — reference only |
| `npx better-model audit` | Report agents/skills missing model settings |
| `npx better-model audit --fix` | Auto-inject model/effort frontmatter |
| `npx better-model stats` | Show recent Agent-call model distribution (last 7 days) |
| `npx better-model stats --days N` | Same, with a custom window |
| `npx better-model stats --all-projects` | Aggregate across every project under `~/.claude/projects/` |
| `npx better-model stats --json` | Machine-readable output for scripts and CI |
| `npx better-model reset` | Remove better-model and restore defaults |
| `npx better-model status` | Check installation status |
The decision matrix organizes tasks into three tiers based on published benchmarks:
Tier 1 — Haiku 4.5 (~20% of tasks)
Codebase exploration, file search, pattern matching. Short, focused subagent tasks that require no reasoning.
Limitation: unreliable beyond ~15 turns. Use only for quick subagent bursts. Note: Haiku 4.5 does not support the effort parameter — set model: haiku without any effort field.
Tier 2 — Sonnet 4.6 (~60% of tasks)
The default for most coding: code generation, feature implementation, test writing, simple refactoring (1–2 files), single-file debugging.
Sonnet 4.6 delivers ~91% of Opus 4.7 coding quality (SWE-bench Verified 79.6% vs 87.6%) at roughly a fifth of the per-task cost — list price is $3/$15 vs $5/$25 per MTok, and the gap widens once Opus effort and tokenizer multipliers are applied. Default effort: medium — Anthropic's recommended balance of speed, cost, and performance for agentic coding.
Tier 3 — Opus 4.7 (~20% of tasks)
Reserved for tasks where Sonnet has documented failure modes: multi-file refactoring (3+ files), cross-file debugging, architecture design, security audits, code review, novel algorithm design, migrations.
Default effort: xhigh (Anthropic-recommended starting point for coding and agentic work on Opus 4.7). Reserve max for architecture, security audits, and novel algorithms only — on structured-output tasks like code review, max can overthink.
The GPQA gap (20.1 points) and the SWE-bench Pro lead (64.3% vs 53.4% on Opus 4.6 generation) are real — Opus 4.7 earns its place here.
- Default to Sonnet + medium effort — covers ~60% of tasks.
- Escalate to Opus 4.7 + `xhigh` when the task spans 3+ files, is multi-step agentic, or needs multi-file coherence.
- Escalate to Opus 4.7 + `max` only for architecture design, security audits, and novel algorithm design.
- Downgrade to Haiku for search and pattern-matching subagents (Haiku 4.5 does not support the effort parameter, so omit the effort field).
- On Sonnet failure, escalate to Opus 4.7 — don't retry Sonnet at higher effort. A stronger model at lower effort outperforms a weaker model at higher effort.
- Avoid Opus 4.7 on >500K tokens of live context — documented lost-in-the-middle regression; chunk the task or use Sonnet 4.6.
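Taken together, the rules above compress into one small function. This is a sketch with hypothetical task labels (`files`, `task`, and `context_tokens` are illustrative inputs, not part of better-model):

```python
# Sketch of the escalation ladder above. Task labels are hypothetical.
def pick(files: int, task: str, context_tokens: int = 0) -> tuple:
    if task in {"search", "pattern-match"}:
        return ("haiku", None)                 # Haiku 4.5: omit effort entirely
    if task in {"architecture", "security-audit", "novel-algorithm"}:
        return ("opus", "max")
    if files >= 3 and context_tokens <= 500_000:
        return ("opus", "xhigh")               # multi-file coherence
    return ("sonnet", "medium")                # default; escalate to Opus on failure

print(pick(1, "feature"))             # ('sonnet', 'medium')
print(pick(4, "refactor"))            # ('opus', 'xhigh')
print(pick(4, "refactor", 600_000))   # ('sonnet', 'medium') -- >500K: avoid Opus
print(pick(0, "search"))              # ('haiku', None)
```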
See the full decision matrix for complete details and evidence.
You can! better-model is just a well-researched starting point:
- Evidence-based: every routing rule cites published benchmarks (Anthropic, LLM-Stats, CodeRabbit), not vibes
- Ships ready-to-use agents: `sonnet-coder` (model: sonnet, effort: medium) and `haiku-explorer` (model: haiku, no effort field) — 100% compliance vs ~70% from CLAUDE.md alone
- Inference engine: maps agent names to the right tier automatically (review → Opus + xhigh, architect → Opus + max, scan → Haiku without effort)
- Maintained: as models and benchmarks evolve, `npx better-model@latest init` gets you the updated matrix — v0.5 → v0.6 auto-upgrades in place
- Reversible: `npx better-model reset` removes everything cleanly
- SWE-bench Verified — Opus 4.7 87.6% vs Sonnet 4.6 79.6% (Opus 4.7 release April 16, 2026)
- SWE-bench Pro — Opus 4.7 64.3% (+10.9 pts vs Opus 4.6)
- GPQA Diamond — Opus 4.7 94.2% vs Sonnet 4.6 74.1%
- Terminal-Bench 2.0 — Opus 4.7 69.4%
- MCP-Atlas — Opus 4.7 77.3% (agentic tool use)
- CodeRabbit — Opus 4.7 code review study, 68/100 pass rate (+24% vs baseline)
- Anthropic effort docs — `xhigh` recommended for Opus 4.7 coding/agentic
- Claude Code changelog — `xhigh` + `/effort` slider shipped in v2.1.111 (April 16, 2026)
- Anthropic Models overview — official specs
- RouteLLM — model routing research (ICLR)
- Claude Code Issue #27665 — real token usage data from Max subscribers
npx better-model init

Then start a Claude Code session. Watch it pick Sonnet for your next grep — and Opus 4.7 + xhigh for your next multi-file refactor.
better-model fits whichever package manager your project already uses:
pnpm dlx better-model@latest init # pnpm
yarn dlx better-model@latest init # yarn berry
bunx better-model@latest init # bun

If you run npx better-model init inside a pnpm, yarn, or bun project, better-model notices your lockfile or packageManager field and prints a one-line tip with the native command — so your next run stays quiet and fits into your existing toolchain. No hint appears in plain npm projects; you only see it when it's actually useful.
Why we went out of our way for this. Many pnpm projects keep pnpm-only keys in .npmrc — node-linker, auto-install-peers, strict-peer-dependencies, enable-pre-post-scripts. npm 11 already prints "Unknown project config" warnings for those, and npm 12 will refuse to start. We didn't want you to discover that the hard way through a cryptic npx better-model failure six months from now. Running through pnpm dlx / yarn dlx / bunx sidesteps the warnings today; the canonical long-term fix is to move those keys into pnpm-workspace.yaml in camelCase (nodeLinker: isolated, autoInstallPeers: true, …) and keep .npmrc for auth and registry only.
npx better-model@latest init

The v0.7.0 init recognises your v0.6.x routing block (which carried effort: "low" for Haiku — now known to be unsupported by Haiku 4.5 per Anthropic effort docs) and upgrades it in place. Your existing haiku-explorer.md is left untouched — better-model never overwrites user files. Run `npx better-model audit` to see stale effort fields on Haiku agents flagged with ⚠, and edit them manually if you want a clean report.
npx better-model@latest init

The v0.7.0 init recognises your v0.5.x routing block and upgrades it in place — no reset needed. Agents (sonnet-coder, haiku-explorer) remain unchanged; only the CLAUDE.md routing block is updated to the Opus 4.7 + xhigh/max mapping (with Haiku correctly omitting effort).
npx better-model@latest init

The single-line reference from v0.4.x is automatically replaced with the full v0.7.0 routing block in a single step — no data loss, no manual edits.
Found it useful? Star the repo — it helps others find it.
Found a bug? Open an issue.
Want to improve the matrix? See CONTRIBUTING.md.
- Node.js 18+
- A project using Claude Code