Claude Code is growing faster than Anthropic can manage its rate limits. Boris Cherny, head of Claude Code at Anthropic, tells Big Technology that usage has exploded to the point where token consumption and infrastructure constraints are active engineering problems, not future ones. The interview covers tokenmaxxing, the practice of maximizing token usage per task, and whether that demand is a real signal or waste.
The conversation gets specific in ways most AI coverage avoids. Cherny explains what happens when users run hundreds of agents in parallel, how Claude instances prompt other Claude instances, and why rate limits are generating genuine user frustration. The SaaS disruption angle (what Cherny calls a potential saaspocalypse) and the question of whether current models actually understand the consequences of their actions are addressed directly, not hand-waved.
The final chapters on self-improving AI and world models are worth watching for Cherny's candor about what today's agents can and cannot do. The debate over whether this is sustainable agent infrastructure or a speculative fever dream is left productively unresolved. This is a primary source interview with someone building the system, not analyzing it from outside.