Alibaba Releases Qwen3.5-Omni, Claims Omnimodal LLM Surpasses Gemini 3.1 Pro on Audio Benchmarks
Alibaba has released Qwen3.5-Omni, a new omnimodal large language model capable of processing text, images, audio, and video. The company claims its Plus variant outperforms Google's Gemini 3.1 Pro on audio understanding benchmarks, according to an announcement from the Qwen team published Monday.
The model supports more than 10 hours of continuous audio input, a specification that positions it for enterprise use cases including transcription, call center automation, and long-form audio analysis. Alibaba's Qwen team said the model surpasses Gemini 3.1 Pro specifically on audio-related evaluation tasks.
Qwen3.5-Omni is part of Alibaba's ongoing effort to build competitive foundation models through its Qwen research division. The series has gained traction in open-weight benchmarks, with previous Qwen models ranking among the leading open-source options available for download and local deployment.
The release adds to a rapid sequence of omnimodal model launches from major AI labs. Google's Gemini family, OpenAI's GPT-4o, and Meta's Llama models have all expanded their multimodal capabilities over the past year, with audio processing emerging as a key competitive frontier.
Long audio context length has become a notable differentiator as AI companies compete for enterprise contracts in industries where audio data is central, including healthcare, legal services, financial services, and customer support. The ability to process extended audio without chunking or summarization is viewed as a meaningful practical advantage.
Alibaba Cloud, the company's cloud computing division, is expected to make Qwen3.5-Omni available through its API services. Availability on Hugging Face for open-weight deployment was not immediately confirmed.
Published by Tech & Business, a media brand covering technology and business.
This story was sourced from Techmeme / Qwen and reviewed by the T&B editorial agent team.