OpenAI releases three new voice models targeting real-time agent applications
OpenAI released three new voice models through its API, bundling speech recognition, translation, and reasoning into single audio-to-audio interfaces rather than the multi-vendor stacks most enterprises currently assemble.
The flagship model, GPT-Realtime-2, handles audio input and output with what OpenAI describes as GPT-5-class reasoning. It has a 128,000-token context window, up from 32,000 on the prior version, and exposes reasoning effort as an adjustable setting ranging from minimal to xhigh. The model can fire multiple back-end requests simultaneously, pause to call tools without leaving the user in silence, and deliberately adjust its tone between support and confirmation contexts.
On OpenAI's Big Bench Audio benchmark, the model scored 15.2 percent higher than its predecessor at high effort. Zillow reported that GPT-Realtime-2 lifted its call-success rate on adversarial tests from 69 percent to 95 percent.
The launch also includes GPT-Realtime-Translate, covering more than 70 input languages and 13 output languages at $0.034 per minute, and GPT-Realtime-Whisper, a streaming transcription model priced at $0.017 per minute. BolnaAI, a voice-AI builder focused on Indian languages, said the translation model cut word error rates
Pricing for GPT-Realtime-2 is set at $32 per million audio-input tokens, $0.40 per million cached-input tokens, and $64 per million audio-output tokens. Intercom, Priceline, Foundation Health, Glean, and Deutsche Telekom are among the companies named as launch partners.
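For readers budgeting a deployment, the quoted rates can be turned into a rough cost estimate. The Python sketch below is illustrative only: it assumes the cached-input rate is also charged per million tokens, the function names are hypothetical, and the token and minute counts in the example are made up rather than measured from a real call.

```python
# Rates quoted in the article, in USD.
AUDIO_INPUT_PER_M = 32.00    # GPT-Realtime-2, per million audio-input tokens
CACHED_INPUT_PER_M = 0.40    # assumed to be per million cached-input tokens
AUDIO_OUTPUT_PER_M = 64.00   # GPT-Realtime-2, per million audio-output tokens
TRANSLATE_PER_MIN = 0.034    # GPT-Realtime-Translate, per minute
WHISPER_PER_MIN = 0.017      # GPT-Realtime-Whisper, per minute


def realtime2_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Estimate the GPT-Realtime-2 audio cost in USD for one session."""
    return (
        input_tokens / 1_000_000 * AUDIO_INPUT_PER_M
        + cached_tokens / 1_000_000 * CACHED_INPUT_PER_M
        + output_tokens / 1_000_000 * AUDIO_OUTPUT_PER_M
    )


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Estimate the cost in USD of a per-minute priced model."""
    return minutes * rate_per_minute


if __name__ == "__main__":
    # Hypothetical session: 50k fresh input, 200k cached input, 30k output tokens.
    session = realtime2_cost(50_000, 200_000, 30_000)
    print(f"GPT-Realtime-2 session: ${session:.2f}")
    print(f"10 min of translation: ${per_minute_cost(10, TRANSLATE_PER_MIN):.2f}")
    print(f"10 min of transcription: ${per_minute_cost(10, WHISPER_PER_MIN):.2f}")
```

At these rates output audio costs twice as much per token as fresh input, while cached input is a small fraction of either, so how much of a long conversation is served from cache is likely to matter for the final bill.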
Sources
Published by Tech & Business, a media brand covering technology and business.
This story was sourced from The Next Web and reviewed by the T&B editorial agent team.