Most of a modern AI bill is the same context, resent every turn. A cache is the answer to that, and in the past year it went from a footnote to the main lever. The reason is timing. Sessions are long now, so the repeated base is large. Most serious teams already call more than one provider. And each provider prices its cache differently. The lever exists, the workloads have arrived, and a session can be routed across providers to wherever it runs cheapest.

The cache economics

Start with one provider's published numbers. On Anthropic's pricing, verified in June 2026, a cache read bills at about a tenth of the normal input-token price. That is close to a ninety percent discount on the repeated context you would otherwise resend at full price every turn. The write is where the cost sits: putting a prefix into the cache runs about 1.25x the input price for an entry that lives five minutes, and about 2x for one that lives an hour.

So keeping a session on a warm cache cuts the cost of its repeated input to about a tenth. The base is written once, at a premium, and read back cheaply for the rest of the session. The trap is switching models without thinking: move a warm session to another provider and you discard the warm cache, then pay the write premium again to rebuild it on the new one.
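The arithmetic can be sketched in a few lines. This is a simplified model, not a billing calculator: it assumes a fixed-size prefix, a cache that stays warm for the whole session, and a hypothetical base price of $3.00 per million input tokens. Only the 1.25x write and 0.1x read multipliers come from the figures above; the function name and parameters are illustrative.

```python
def prefix_cost(prefix_tokens, turns, price_per_mtok,
                write_mult=1.25, read_mult=0.10):
    """Input cost of the repeated prefix over a session: (uncached, cached).

    Assumes turns >= 1 and that every turn after the first hits a warm cache.
    """
    per_pass = prefix_tokens * price_per_mtok / 1_000_000
    # Without a cache, the full prefix is billed at the input price every turn.
    uncached = per_pass * turns
    # With a cache, turn one pays the write premium; later turns pay the read rate.
    cached = per_pass * (write_mult + read_mult * (turns - 1))
    return uncached, cached


# Hypothetical workload: 100k-token prefix, 20 turns, $3.00 per million tokens.
uncached, cached = prefix_cost(100_000, 20, 3.00)
print(f"uncached ${uncached:.3f}, cached ${cached:.3f}")
```

Under these assumptions the 20-turn session costs about $6.00 uncached against about $0.945 cached. The same model shows how quickly the write premium is recovered: the 1.25x write pays for itself by the second turn, the 2x write by the third.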

The cross-provider asymmetry

Providers do not agree on how the cache should be priced. Anthropic charges a premium to write and a steep discount to read. Others price their caches on their own scale; some write for free and price reads differently. The same workload lands at a different total depending on where it runs, and the only way to know is to account for how each provider actually bills its cache.
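To make the asymmetry concrete, here is a minimal sketch that prices the same session under two schedules. Both schedules are hypothetical and chosen only to illustrate the shape of the problem: `premium_write` borrows the 1.25x write and 0.1x read multipliers quoted earlier with an invented base price, and `free_write` is an invented schedule that charges no write premium but discounts reads less. "Free" is modeled here as a write billed at the plain input price, which is itself an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePricing:
    price_per_mtok: float  # base input price, USD per million tokens
    write_mult: float      # cache write cost as a multiple of the input price
    read_mult: float       # cache read cost as a multiple of the input price


def session_cost(pricing, prefix_tokens, turns):
    """Prefix cost for a session: one cache write, then reads on later turns."""
    per_pass = prefix_tokens * pricing.price_per_mtok / 1_000_000
    return per_pass * (pricing.write_mult + pricing.read_mult * (turns - 1))


def cheapest(schedules, prefix_tokens, turns):
    """Name of the schedule with the lowest prefix cost for this session."""
    return min(schedules,
               key=lambda name: session_cost(schedules[name], prefix_tokens, turns))


# Hypothetical schedules; none of these numbers are a real provider's prices.
SCHEDULES = {
    "premium_write": CachePricing(3.00, 1.25, 0.10),
    "free_write": CachePricing(2.00, 1.00, 0.50),
}

for turns in (2, 20):
    print(turns, cheapest(SCHEDULES, 100_000, turns))
```

With these invented numbers the short session is cheapest on `free_write` and the long one on `premium_write`, so the answer depends on session length as well as on the price list.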

That asymmetry is room to route on. Three levers apply. Model mix routes the turns that do not need the frontier model to a cheaper one, inside a target spend mix the customer sets. Switch reduction groups turns and switches rarely, often once, spending the expensive model early where it shapes the session. And routing across providers sends each session to where it runs cheapest. Tuned to a customer's own traffic, the three compound into a large cut at equal quality.

All of it depends on owning the session. A router that sees one message in isolation cannot reason about what is already cached, or what a switch would cost on the next turn. Hold the whole session and the pricing becomes a routing decision.
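A session-aware switch check might look like the sketch below. It is one possible shape for the decision, not a description of any particular router. It compares only the input cost of the repeated prefix, ignores output pricing and quality, and assumes the router has some estimate of the turns remaining. The names and both price schedules are hypothetical.

```python
from typing import NamedTuple


class CachePricing(NamedTuple):
    price_per_mtok: float  # base input price, USD per million tokens
    write_mult: float      # cache write cost as a multiple of the input price
    read_mult: float       # cache read cost as a multiple of the input price


def stay_cost(pricing, prefix_tokens, remaining_turns):
    """Cache is already warm: every remaining turn is a discounted read."""
    per_pass = prefix_tokens * pricing.price_per_mtok / 1_000_000
    return per_pass * pricing.read_mult * remaining_turns


def switch_cost(pricing, prefix_tokens, remaining_turns):
    """Cold cache on the new provider: one premium write, then reads."""
    per_pass = prefix_tokens * pricing.price_per_mtok / 1_000_000
    return per_pass * (pricing.write_mult + pricing.read_mult * (remaining_turns - 1))


def should_switch(current, candidate, prefix_tokens, remaining_turns):
    """True only if rebuilding the cache elsewhere beats staying warm."""
    return (switch_cost(candidate, prefix_tokens, remaining_turns)
            < stay_cost(current, prefix_tokens, remaining_turns))


# Hypothetical prices: the candidate is three times cheaper per token.
current = CachePricing(3.00, 1.25, 0.10)
candidate = CachePricing(1.00, 1.25, 0.10)

for remaining in (1, 10):
    print(remaining, should_switch(current, candidate, 100_000, remaining))
```

With these numbers, the cheaper candidate loses when one turn remains and wins when ten remain, because the write premium needs enough remaining turns to amortize. A router that sees a single message has neither the cache state nor the turn estimate this check needs.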