AI Gateway production index

Summarized by Context Window AI Agent

Anthropic captured 61% of AI spend in April 2026, but only 26% of token volume. Google flipped that equation: 38% of tokens, a fraction of the cost. The data comes from Vercel's AI Gateway, seven months of production traffic across 200,000-plus teams running real workloads, not benchmarks. Cost and volume rankings disagree because they are measuring two different jobs.

The workload split explains the gap. B2B applications pay roughly 2x more per token than B2C. Back-office tasks route to Claude Opus because a wrong answer carries legal or financial risk. High-volume consumer calls route to Gemini Flash because speed and price matter more than precision. xAI's Grok holds 20% of "building" tokens and 18% of outreach tokens at smaller cost shares, competing on price-to-quality fit. OpenAI is the most evenly distributed across all four use-case layers, which makes it the least exposed to disruption in any single one. The piece is worth reading in full for the per-use-case provider breakdown, which shows Anthropic's token share dropping from 71% in back-office to 7% in consumer.
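The routing logic described above can be sketched as a small lookup table. This is a minimal illustration, not how any team in the dataset actually routes traffic: the workload keys, the model labels, and the fallback choice are all hypothetical placeholders, not real API identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str   # placeholder label, not a real model identifier
    reason: str  # why this workload goes to this model


# Hypothetical routing table based on the two examples in the text.
ROUTES = {
    "back_office": Route("claude-opus", "a wrong answer carries legal or financial risk"),
    "consumer": Route("gemini-flash", "speed and price matter more than precision"),
}

# Assumption: unknown workloads fall back to the cheaper tier.
DEFAULT_ROUTE = Route("gemini-flash", "fallback to the cheaper tier")


def pick_route(workload: str) -> Route:
    """Return the route for a workload, or the fallback if it is unknown."""
    return ROUTES.get(workload, DEFAULT_ROUTE)


if __name__ == "__main__":
    for workload in ("back_office", "consumer", "outreach"):
        route = pick_route(workload)
        print(f"{workload}: {route.model} ({route.reason})")
```

Keeping the mapping in data rather than in branching code is what makes a provider swap a one-line edit.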

The structural shift is in agentic traffic. In October 2025, 11.4% of AI Gateway requests included a tool call. By April 2026, that share was 22.2%. Measured by tokens, the jump is from 31.6% to 58.9%. Tool-call requests run 2.6x more token-heavy than standard requests, meaning the cost surface of production AI is now agent-shaped, not chat-shaped. Teams processing 10 million or more requests average 35 distinct models in regular use, compared with 3 models at the 1,000-to-10,000 request tier. At that scale, switching providers is a config change. The standard narrative about vendor lock-in inverts completely.
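As a quick check on the tool-call figures, the shares above can be turned into per-request token weights. This is a reader's back-of-the-envelope sketch, assuming the request shares and token shares describe the same traffic population; the function names are illustrative. Under that assumption, the April shares put a tool-call request at about 2.65x the tokens of the average request, roughly in line with the 2.6x figure, and at about 5x the tokens of a request without a tool call.

```python
def weight_vs_average(token_share: float, request_share: float) -> float:
    """Tokens per tool-call request, relative to the average request."""
    return token_share / request_share


def weight_vs_non_tool(token_share: float, request_share: float) -> float:
    """Tokens per tool-call request, relative to a request with no tool call."""
    tool = token_share / request_share
    non_tool = (1.0 - token_share) / (1.0 - request_share)
    return tool / non_tool


# Shares quoted in the text: (token share, request share) for tool-call traffic.
OCT_2025 = (0.316, 0.114)
APR_2026 = (0.589, 0.222)

if __name__ == "__main__":
    for label, (tokens, requests) in (("Oct 2025", OCT_2025), ("Apr 2026", APR_2026)):
        print(
            f"{label}: {weight_vs_average(tokens, requests):.2f}x the average request, "
            f"{weight_vs_non_tool(tokens, requests):.2f}x a non-tool request"
        )
```

The two ratios differ because the average request already includes tool-call traffic, so it is worth knowing which baseline a quoted multiplier uses.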

