OpenAI releases three new voice models targeting real-time agent applications

OpenAI released three new voice models through its API, bundling speech recognition, translation, and reasoning into single audio-to-audio interfaces rather than the multi-vendor stacks most enterprises currently assemble. The flagship model, GPT-Realtime-2, handles audio input and output with what OpenAI describes as GPT-5-class reasoning. It has a 128,000-token context window, up from 32,000 on the prior version, and exposes reasoning effort as an adjustable setting ranging from minimal to xhigh. The model can issue multiple back-end requests in parallel, pause to call tools without leaving the user in silence, and deliberately adjust its tone between support and confirmation contexts. On OpenAI's Big Bench Audio benchmark, the model scored 15.2 percent higher than its predecessor at high effort. Zillow reported that GPT-Realtime-2 lifted its call-success rate on adversarial tests from 69 percent to 95 percent. The launch also includes GPT-Realtime-Translate, covering more than 70 input languages and 13 output languages at $0.034 per minute, and GPT-Realtime-Whisper, a streaming transcription model priced at $0.017 per minute. BolnaAI, a voice-AI builder focused on Indian languages, said the translation model cut word error rates. Pricing for GPT-Realtime-2 is set at $32 per million audio-input tokens, $0.40 for cached input, and $64 per million audio-output tokens. Intercom, Priceline, Foundation Health, Glean, and Deutsche Telekom are among the companies named as launch partners.
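To make the pricing concrete, here is a minimal Python sketch that turns the quoted rates into a cost estimate. The function names and any token or minute counts passed in are hypothetical and chosen for illustration; the article does not say how many tokens a minute of audio consumes, and the sketch assumes the $0.40 cached-input figure is also a per-million-token rate.

```python
# Prices in USD as quoted in the article.
# Assumption: the $0.40 cached-input price is per million tokens,
# like the two rates listed alongside it.
REALTIME2_PER_MILLION_TOKENS = {
    "audio_input": 32.00,
    "cached_input": 0.40,
    "audio_output": 64.00,
}

# Per-minute prices for the two companion models.
PER_MINUTE = {
    "translate": 0.034,  # GPT-Realtime-Translate
    "whisper": 0.017,    # GPT-Realtime-Whisper
}


def realtime2_cost(audio_input=0, cached_input=0, audio_output=0):
    """Estimate GPT-Realtime-2 cost in USD from token counts (hypothetical helper)."""
    usage = {
        "audio_input": audio_input,
        "cached_input": cached_input,
        "audio_output": audio_output,
    }
    total = sum(
        tokens * REALTIME2_PER_MILLION_TOKENS[kind]
        for kind, tokens in usage.items()
    )
    return total / 1_000_000


def per_minute_cost(model, minutes):
    """Estimate cost in USD for a per-minute model (hypothetical helper)."""
    return PER_MINUTE[model] * minutes


if __name__ == "__main__":
    # Illustrative usage figures only; real token counts depend on the audio.
    print(f"Realtime-2: ${realtime2_cost(audio_input=200_000, audio_output=100_000):.2f}")
    print(f"Translate, 60 min: ${per_minute_cost('translate', 60):.2f}")
    print(f"Whisper, 60 min: ${per_minute_cost('whisper', 60):.2f}")
```

At these rates, output audio costs twice as much per token as input audio, so an agent that speaks more than it listens will see its bill dominated by the output side.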

Weibo researchers claim 3B-parameter model matches larger AI systems on demanding math benchmarks

Meta reports Threads reaching 500 million monthly active users

Former engineer sues xAI claiming retaliation over Grok safety warnings

Sapient trains 1B-parameter foundation model from scratch for about $1,500