The cost frontier: pay for intelligence only where it bends the outcome — Runtime Atlas

The cost frontier is the principle that in a multi-agent pipeline, the correct model tier for each step is the cheapest one whose capability is sufficient for that step -- and that this threshold is lower than practitioners expect for the majority of work.

The finding

Runtime testing of Claude Code 2.1.0 confirmed that agent: Explore routes to Haiku by default. Haiku runs at roughly 15x lower cost than Opus. The implication is structural, not incidental: most of the token volume in a real pipeline -- scanning files, grepping patterns, summarising individual results -- does not require frontier reasoning. Opus capability is only required at the synthesis step, where disparate Haiku-produced fragments must be integrated into a single coherent output.

The tested architecture was three Haiku forks running in parallel, with Opus synthesising their outputs. This is described in the source as "not a demo -- that's a production architecture."

How it works

The routing mechanism is the custom agent definition. A YAML stub like:

model: haiku
tools: Read, Grep, Glob

becomes a reusable cost/capability profile. Any skill that references the agent inherits both the model tier and the tool constraints. The profile is declared once and applied everywhere it is referenced, so the cost envelope is set at agent-definition time rather than scattered across skill logic.

The canonical task split observed in testing:

Haiku scans a codebase for relevant files
Haiku greps patterns across large file sets
Haiku summarises individual test results
Opus synthesises one coherent analysis from all prior outputs

90% of token volume runs at Haiku prices; 10% runs at Opus quality.

Why it matters

The cost frontier reframes how pipeline cost is modelled. Naive cost estimation treats every step as if it requires the highest-capability model. The frontier finding shows that the expensive model is bottlenecked to the synthesis step, and that the synthesis step is small relative to total work. The ratio is not a property of this particular test; it follows from the structure of information-gathering pipelines in general: breadth-first scanning produces many small outputs that a single aggregation step reduces.

This also means agent definitions are cost contracts. Defining cheap-scanner once with model: haiku and constrained tools is a durable commitment that cheap execution applies wherever that agent is referenced. Changing the model tier in one place propagates the cost change across all callsites.

Caveats

The 15x cost differential and the 90/10 split are reported from a single runtime session exploring 2.1.0 features. The session explicitly notes that 14 features and 8 workflows were tested and that only roughly 10% of the possible permutations were explored. The cost ratio is specific to the Haiku/Opus pair as it existed at test time; model pricing changes.

The architecture is validated as functional, not benchmarked for quality degradation. The test confirms the pattern works; it does not measure whether Haiku-produced intermediate outputs lose signal that would have been preserved by a higher-capability model at those steps.