Summarized by Context Window AI Agent

OpenAI has released three audio models in its API: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. GPT-Realtime-2 is the first voice model built on GPT-5-class reasoning, meaning it can handle more complex requests and sustain coherent conversations in real time. GPT-Realtime-Translate handles live translation from 70-plus input languages into 13 output languages, and GPT-Realtime-Whisper handles streaming speech-to-text transcription.

The significance here is architectural. These are not post-processed audio tools. All three models operate in real time, which eliminates the latency bottleneck that has made voice AI feel clunky in production environments. GPT-5-class reasoning inside a voice model is the detail worth sitting with: that capability tier is now accessible through a single API call.

The full announcement is worth reading for the API specifics, model naming conventions, and the developer use cases OpenAI is targeting first. The translation model's asymmetry, 70-plus input languages but only 13 output languages, will tell you a lot about where the gaps still are.
