← 2.1.0 Finding tested · runtime-test

The cost frontier: pay for intelligence only where it bends the outcome

agent: Explore routes to Haiku (~15x cheaper). Three Haiku forks in parallel, Opus synthesizing, is a production architecture, not a demo.

Across the graph 7 layers · 7 signed edges

One node in a single signed graph. Here is how this finding connects across the other layers — each edge content-addressed, evidence-gated, and ed25519-signed like any claim.

Named by · lexicon
  • Theory of Constraints: Routing the cheap-and-frequent work to Haiku and reserving the frontier model for the binding step is constraint subordination applied to inference cost.
  • Eval Loop: Knowing which work is safe to route down requires an eval loop that proves the cheaper model holds the quality bar.
Narrated by · anthology
  • Ode to the Haiku Horde: The dispatch that lionizes the Haiku horde doing the grunt work this primitive routes to them.
  • Wake the Fuck Up, Ryan: A field account of the cheap-model swarm running the overnight load this routing makes affordable.
Bounded by · self-critique
  • The site proves it can govern itself, which is not proof in a customer's production: Cost-frontier savings measured in the lab are not the same as proven economics in a customer's production workload.
Augments · human work Singulariki
  • Judgment and Decision Making: Deciding which model tier each piece of work belongs to is a routing judgment under a cost-quality tradeoff.
  • Systems Evaluation: Measuring whether a cheaper model still clears the bar for a class of work is systems evaluation.

The cost frontier is the principle that in a multi-agent pipeline, the correct model tier for each step is the cheapest one whose capability is sufficient for that step -- and that this threshold is lower than practitioners expect for the majority of work.

The finding

Runtime testing of Claude Code 2.1.0 confirmed that agent: Explore routes to Haiku by default. Haiku runs at roughly 15x lower cost than Opus. The implication is structural, not incidental: most of the token volume in a real pipeline -- scanning files, grepping patterns, summarising individual results -- does not require frontier reasoning. Opus capability is only required at the synthesis step, where disparate Haiku-produced fragments must be integrated into a single coherent output.

The tested architecture was three Haiku forks running in parallel, with Opus synthesising their outputs. This is described in the source as "not a demo -- that's a production architecture."
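The fan-out and fan-in shape of that architecture can be sketched in a few lines. This is only an illustration of the control flow, not the Claude Code implementation: `call_model` is a hypothetical stand-in that labels its input instead of calling any API, and the model names are plain strings.

```python
import asyncio


async def call_model(model: str, prompt: str) -> str:
    """Hypothetical stand-in for a real model call (assumption, no API used)."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return f"[{model}] {prompt}"


async def explore_and_synthesise(tasks: list[str]) -> str:
    # Fan out: one cheap-tier fork per task, all running concurrently.
    fragments = await asyncio.gather(
        *(call_model("haiku", task) for task in tasks)
    )
    # Fan in: a single frontier-tier call integrates the fragments.
    return await call_model("opus", " | ".join(fragments))


def run(tasks: list[str]) -> str:
    """Synchronous entry point for the sketch."""
    return asyncio.run(explore_and_synthesise(tasks))
```

Calling `run(["scan files", "grep patterns", "summarise tests"])` issues three cheap-tier calls and exactly one frontier-tier call, which is the property the cost argument rests on.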

How it works

The routing mechanism is the custom agent definition. A YAML stub like:

model: haiku
tools: Read, Grep, Glob

becomes a reusable cost/capability profile. Any skill that references the agent inherits both the model tier and the tool constraints. The profile is declared once and applied everywhere it is referenced, so the cost envelope is set at agent-definition time rather than scattered across skill logic.

The canonical task split observed in testing:

  • Haiku scans a codebase for relevant files
  • Haiku greps patterns across large file sets
  • Haiku summarises individual test results
  • Opus synthesises one coherent analysis from all prior outputs

Roughly 90% of token volume runs at Haiku prices; the remaining 10% runs on Opus.
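The blended cost implied by those two figures can be worked out directly. The sketch below assumes a single flat price ratio between the tiers (it ignores separate input and output token prices) and takes the 90/10 split and the 15x ratio at face value; the function name is illustrative.

```python
def blended_cost_fraction(cheap_share: float, price_ratio: float) -> float:
    """Pipeline cost as a fraction of running every token on the expensive tier.

    cheap_share: fraction of token volume routed to the cheap model (0 to 1).
    price_ratio: how many times cheaper the cheap model is per token.
    """
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    expensive_share = 1.0 - cheap_share
    # Cheap tokens cost 1/price_ratio each; expensive tokens cost 1 each.
    return cheap_share / price_ratio + expensive_share


fraction = blended_cost_fraction(cheap_share=0.90, price_ratio=15.0)
print(f"{fraction:.2f} of all-Opus cost, about {1 / fraction:.2f}x cheaper")
```

Under these assumptions the pipeline costs about 0.16 of an all-Opus run, roughly 6.25x cheaper rather than 15x, because the 10% Opus share accounts for most of what remains. Shrinking the synthesis step therefore moves the total more than a further cut in the cheap tier's price would.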

Why it matters

The cost frontier reframes how pipeline cost is modelled. Naive cost estimation treats every step as if it requires the highest-capability model. The frontier finding shows that the expensive model is needed only at the synthesis step, and that the synthesis step is small relative to total work. The source argues that this ratio is not a property of this particular test but follows from the structure of information-gathering pipelines in general: breadth-first scanning produces many small outputs that a single aggregation step reduces.

This also means agent definitions are cost contracts. Defining cheap-scanner once with model: haiku and constrained tools is a durable commitment that cheap execution applies wherever that agent is referenced. Changing the model tier in one place propagates the cost change across all callsites.
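The "declared once, applied everywhere" property can be modelled with a small registry. This is a sketch of the idea only, not how Claude Code stores agent definitions: `AgentProfile`, `AGENTS`, `SKILLS`, and the skill names are all hypothetical, with `cheap-scanner` and its tool list taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """A cost/capability profile: model tier plus the tools it may use."""
    model: str
    tools: tuple[str, ...]


# Hypothetical registry: the one place where tier and tools are declared.
AGENTS = {
    "cheap-scanner": AgentProfile(model="haiku", tools=("Read", "Grep", "Glob")),
}

# Hypothetical skills that refer to the agent by name only.
SKILLS = {
    "find-relevant-files": "cheap-scanner",
    "grep-patterns": "cheap-scanner",
    "summarise-test-results": "cheap-scanner",
}


def resolve(skill: str) -> AgentProfile:
    """Look up the profile a skill inherits through its agent reference."""
    return AGENTS[SKILLS[skill]]


def models_in_use() -> set[str]:
    """Every model tier currently reachable from the defined skills."""
    return {resolve(skill).model for skill in SKILLS}
```

Because skills hold only the agent's name, replacing the single `AGENTS["cheap-scanner"]` entry changes the tier that every referencing skill resolves to, with no edits to the skills themselves.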

Caveats

The 15x cost differential and the 90/10 split are reported from a single runtime session exploring 2.1.0 features. The session explicitly notes that 14 features and 8 workflows were tested and that only roughly 10% of the possible permutations were explored. The cost ratio is specific to the Haiku/Opus pair as it existed at test time; model pricing changes.

The architecture is validated as functional, not benchmarked for quality degradation. The test confirms the pattern works; it does not measure whether Haiku-produced intermediate outputs lose signal that would have been preserved by a higher-capability model at those steps.

Evidence & receipt
◇ ed25519 receipt
id: finding_44349dd66a81c41145d9ee45
alg: ed25519
pubkey: 9b87705613b1e2fd064d57fa75a6b679d2856ceafad6b1daa8f982493871b6dd
sig: d1b35edbef8397bf3fe05602d64154d6cbe26116c7aec4d60564c41970687e73dd8732778098c7378ca362790434630640c1d37e15ad7177c6553c063f7be106

Signed with an ed25519 key held off the repo. Anyone can verify against the published public key; nobody without the secret key can forge it. The signature proves integrity and authorship of this exact content, not a third-party timestamp and not that the underlying claim is objectively true. signedAt is when the @f3/attest pipeline ran, not when the work happened; the evidence refs carry the source dates.
