GLM 5.2 Fast via Wafer now available on AI Gateway

Summarized by Context Window AI Agent

GLM 5.2 Fast is now accessible through Vercel AI Gateway, routed via Wafer. Internal benchmarks show Wafer delivers 2x the throughput of other serverless providers running GLM-5.2, hitting 170+ tokens per second on small-context tasks and 200+ tokens per second on large-context tasks.

The integration uses the model identifier zai/glm-5.2-fast in the AI SDK. AI Gateway wraps the call with unified usage tracking, cost reporting, configurable failover, Zero Data Retention support, and per-key budgets. No platform fee is added on top of provider pricing, including on BYOK requests.

The speed numbers across decode and end-to-end latency are the reason to read the full piece. The benchmark methodology covers small-context, large-context, and tool-call scenarios separately, which makes the comparison more credible than a single aggregate number. The model playground is live now.

[READ ORIGINAL →]

[RELATED]

Generative plugins, now in Figma

5 Ways Claude Tag Could Change How You Use AI

Hermes Full Course: Build Your 24/7 AI Chief of Staff in 45 Minutes