
Kimi K2 + OpenClaw: Free Model Routing in Practice

Jomar Montuya
April 30, 2026
6 minute read


Two posts ago we compared Kimi K2 and Claude Sonnet for raw coding tasks and landed on "K2 covers about 80% of the work at ~15% of the cost." That post left a question hanging: how do you actually wire that up so Kimi K2 picks up the cheap work automatically and your expensive models only fire when they need to?

Answer: routing. The OpenCode Controller skill inside OpenClaw is the routing layer. This post is the playbook we use across our client projects, with token-cost numbers from real billing data.

The cost math, finally

Here's a 30-day window from one of our agency tooling stacks. Pre-routing, single-model:

| Period | Active model | Tokens (M) | Cost | Notes |
| --- | --- | --- | --- | --- |
| Month 1 | Claude Opus 4.6 | 38.4 | $1,920 | Everything went through Opus |
| Month 2 | Claude Sonnet 4.6 | 41.1 | $410 | Switched everything to Sonnet |
| Month 3 (routed) | Mixed (K2 + Opus) | 43.7 | $158 | OpenCode Controller routing live |

Token volume went up between months 2 and 3 because the team trusted the tooling more and used it for more tasks. Cost still dropped 61%. The marginal cost of an additional task in month 3 was effectively zero for ~80% of work — that's the routing dividend.

Important caveat: same workload, same team, same OpenClaw setup. We didn't change agents or tasks, just routing rules.
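As a quick sanity check, the blended cost per million tokens can be computed straight from the billing rows above (numbers from the table; nothing else assumed):

```python
# (tokens in millions, total cost in $) per month, from the table above
months = {
    "opus-only":   (38.4, 1920.0),
    "sonnet-only": (41.1, 410.0),
    "routed":      (43.7, 158.0),
}

for name, (tokens_m, cost) in months.items():
    print(f"{name}: ${cost / tokens_m:.2f} per M tokens")

# Month 2 -> Month 3, despite token volume going UP:
drop = 1 - 158.0 / 410.0
print(f"cost drop: {drop:.0%}")  # → 61%
```

Opus-only works out to $50 per million tokens; routed lands under $4.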

Why Kimi K2 inside OpenClaw works

Kimi K2 is fast, cheap, and good enough at agentic code tasks. The places it falls down are predictable:

  • Schema design and migrations. K2 will produce something that compiles and runs but misses subtle constraint implications. Opus catches them.
  • Auth flows. Same pattern: works on the happy path but misses rare edge cases.
  • Algorithmic reasoning that requires holding many constraints simultaneously. K2 will satisfy 80% of stated constraints and silently drop the rest.

For everything else — file edits, refactors, test scaffolding, glue code, search-and-replace, doc generation — K2 is competitive with Sonnet at a fraction of the cost.

The point of routing isn't to use K2 everywhere. It's to use K2 by default and escalate to Opus on the work where K2 actually breaks.

The routing config we ship

This is the [routing] section of our opencode-controller/config.toml, dropped into client projects unchanged:

```toml
[routing]
# 1. Default: Kimi K2 for everything not otherwise tagged
cheap_default = "openrouter:moonshotai/kimi-k2"

# 2. Heavy tags route to Opus
heavy_tags = ["schema", "migration", "auth", "security", "billing"]
heavy_model = "anthropic:claude-opus-4-7"

# 3. UI/design tasks go to Gemini 3 (better at design intent)
ui_tags = ["ui-mock", "design", "layout"]
ui_model = "google:gemini-3-pro"

# 4. Reviews always use a different model than the writer
review_strategy = "alternate"

# 5. When K2 returns a 429 or context-window overflow, fall through
fallback_chain = [
  "openrouter:moonshotai/kimi-k2",
  "anthropic:claude-sonnet-4-6",
  "anthropic:claude-opus-4-7",
]
```

The five-line breakdown:

  1. cheap_default is the workhorse. ~80% of token volume hits this.
  2. heavy_tags is the safety net. We learned the list by tracking which K2 outputs needed manual rework — schemas and migrations dominated, auth and billing were close behind.
  3. ui_tags is qualitative. Gemini 3 produces design intent K2 doesn't match yet.
  4. review_strategy = "alternate" is the cheap-insurance trick. If K2 wrote it, Opus reviews. If Opus wrote it, Sonnet reviews. Different model = different blind spots.
  5. fallback_chain matters more than it sounds. Kimi K2 via OpenRouter rate-limits aggressively. When you saturate the lane, the chain prevents builds from blocking.

Tagging in practice

The model picks the route, but you pick the tag. Three patterns we use:

Manual tagging. Before kicking off a sub-task: /opencode tag schema. Reset after.

Convention-based tagging. A wrapper script reads the file path and auto-tags: migrations/*.sql → migration, lib/auth/* → auth. The script lives in our agency starter kit — we ship it with new client projects.

LLM-driven tagging. A first-pass routing agent reads the user request and picks the tag. We tried this; it's clever but adds 200-400ms latency per turn and doesn't beat the convention-based approach. Killed it after two weeks.

Pick one and stick with it. Mixing modes confuses the team and makes the cost dashboard hard to read.

Caching the writeable models

The biggest cost lever after routing is prompt caching. When the same long system prompt or codebase context goes to Opus repeatedly, cache it.

OpenClaw doesn't do this for you — but the controller skill does, with one config flag:

```toml
[providers.anthropic]
auth = "env:ANTHROPIC_API_KEY"
default_model = "claude-opus-4-7"
prompt_cache = "auto"  # cache anything > 1024 tokens for 5 minutes
```

In our usage, this knocks another 30-50% off Opus costs on long-running sessions. Sonnet supports it identically. K2 via OpenRouter doesn't, but K2 is cheap enough that you don't care.
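To see where that 30-50% figure comes from, here's a back-of-envelope model. The $15/M base input rate and the cached-read-at-~10%-of-base ratio are assumptions for illustration, not quoted prices; check your provider's current pricing:

```python
def input_cost_with_cache(tokens_m: float, hit_rate: float,
                          base_rate: float = 15.0,
                          cache_read_ratio: float = 0.1) -> float:
    """Estimated input-token cost in dollars with prompt caching.

    base_rate: $ per M input tokens (Opus-class, assumed).
    cache_read_ratio: cached reads priced at ~10% of base (assumed).
    hit_rate: fraction of input tokens served from cache.
    """
    cached = tokens_m * hit_rate
    fresh = tokens_m - cached
    return fresh * base_rate + cached * base_rate * cache_read_ratio

full = input_cost_with_cache(10, hit_rate=0.0)   # $150
half = input_cost_with_cache(10, hit_rate=0.5)   # $82.50
print(f"savings at 50% hit rate: {1 - half / full:.0%}")  # → 45%
```

Long-running sessions that keep resending the same system prompt and codebase context push the hit rate up, which is why the savings concentrate there.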

What we got wrong at first

Tagging too aggressively. We started with seven heavy tags and routed almost everything to Opus. Cost barely budged from baseline. The data showed schema and migration were doing 90% of the routing work; the rest were noise. Cut to five tags, then to four. Costs dropped further.

Not measuring per-task quality. We assumed K2 was "fine" because builds passed. Builds passing isn't quality. We added a one-line manual rating after every K2-completed task for two weeks. The data surfaced that K2 outputs in auth/ were getting reworked 3x more often than other paths — that's how auth ended up in heavy_tags.

Trusting /opencode cost blindly. v1.4.2 of the skill undercounts cached Anthropic input by 90%. If your real bill is way higher than the skill's report, that's the cause. We pull actual billing data weekly and compare; the gap has stayed predictable.

When this approach doesn't work

Three patterns where multi-model routing isn't worth the complexity:

  • Solo project, low volume. If you're running OpenClaw 30 minutes a day on personal code, the savings won't pay for the config-and-tag overhead. Use one model and move on.
  • Compliance-bound work. If your data residency or audit requirements rule out one of the providers, you can't actually route to them. Stick with the allowed set.
  • Latency-critical interactive work. Routing adds a few hundred ms of decision latency. Doesn't matter for autonomous agent runs; can matter for tight interactive loops.

For everyone else — agency teams, AI-tooling-heavy startups, anyone running OpenClaw in production daily — this routing setup is the single biggest cost lever we've found short of dropping AI entirely.

What's next

If you haven't installed the controller skill yet, start with the setup guide. If you're skeptical of K2's quality claims, the Kimi K2 vs Claude benchmark post has the numbers. If you're trying to figure out the broader OpenClaw + OpenCode picture, the pillar essay is the entry point.

About Jomar Montuya

Founder & Lead Developer

With 8+ years building software from the Philippines, Jomar has served 50+ US, Australian, and UK clients. He specializes in construction SaaS, enterprise automation, and helping Western companies build high-performing Philippine development teams.

Expertise:

Philippine Software Development · Construction Tech · Enterprise Automation · Remote Team Building · Next.js & React · Full-Stack Development
