Vercel's AI Gateway now offers a fast mode for Claude Opus 4.7, delivering approximately 2.5x faster output token generation at 6x the standard price: $30 per million input tokens and $150 per million output tokens, compared to $5 and $25 at standard rates.
The feature is in research preview and requires explicit opt-in. Developers pass speed: 'fast' in the Anthropic provider options when calling anthropic/claude-opus-4.7. Claude Code users enable it via two environment variables, CLAUDE_CODE_SKIP_FAST_MODE_ORG_CHECK and CLAUDE_CODE_ENABLE_OPUS_4_7_FAST_MODE, or through ~/.claude/settings.json. Standard pricing multipliers, including prompt caching discounts, stack on top of the fast mode rates.
The original changelog is worth reading for the exact SDK integration steps and how fast mode interacts with AI Gateway's existing caching and routing layer. The cost-to-speed tradeoff math will determine whether this is viable for latency-sensitive production workloads or remains a niche tool for high-value, time-critical inference.
[READ ORIGINAL →]