GLM 5.2 Fast is now accessible through Vercel AI Gateway, routed via Wafer. Internal benchmarks show Wafer delivers 2x the throughput of other serverless providers running GLM-5.2, hitting 170+ tokens per second on small-context tasks and 200+ tokens per second on large-context tasks.
The integration uses the model identifier zai/glm-5.2-fast in the AI SDK. AI Gateway wraps the call with unified usage tracking, cost reporting, configurable failover, Zero Data Retention support, and per-key budgets. No platform fee is added on top of provider pricing, including on BYOK requests.
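As a rough illustration of the identifier in use, the sketch below builds the options object that would be passed to the AI SDK's `generateText` and wraps the call behind an injected function, so it can be exercised without network access. The helper names `buildGlmFastRequest` and `askGlmFast` and the `GenerateFn` type are hypothetical, introduced only for this example. It also assumes an AI SDK version that accepts a plain string model ID routed through AI Gateway, with a gateway API key configured in the environment.

```typescript
// Model identifier taken from the announcement.
const MODEL_ID = "zai/glm-5.2-fast";

// Assumed minimal shape of a text-generation call; the AI SDK's
// generateText accepts a richer options object than this.
type GenerateFn = (opts: { model: string; prompt: string }) => Promise<{ text: string }>;

// Hypothetical helper: builds the options for a single-prompt request.
function buildGlmFastRequest(prompt: string): { model: string; prompt: string } {
  return { model: MODEL_ID, prompt };
}

// Hypothetical helper: runs the request through whichever generate
// function is supplied (the real SDK call, or a stub in tests).
async function askGlmFast(generate: GenerateFn, prompt: string): Promise<string> {
  const { text } = await generate(buildGlmFastRequest(prompt));
  return text;
}

// With the SDK installed, usage might look like this (not verified here):
//   import { generateText } from "ai";
//   const answer = await askGlmFast((opts) => generateText(opts), "Summarize this log.");
```

Injecting the generate function keeps the model ID in one place and lets the request shape be checked locally before any billed call is made.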
The decode-speed and end-to-end latency figures are the main reason to read the full piece. The benchmark methodology reports small-context, large-context, and tool-call scenarios separately, which makes the comparison more credible than a single aggregate number. The model playground is live now.