Qwen3.5-Omni omnimodal LLM release by Alibaba

Alibaba has released Qwen3.5-Omni, an omnimodal AI model that processes text, images, audio, and video and generates speech output alongside text. The model comes in three Instruct variants: Plus, Flash, and Light. It handles contexts of up to 256,000 tokens and was natively pre-trained on more than 100 million hours of audiovisual material. It can process more than ten hours of audio and over 400 seconds of 720p video at one frame per second.

According to the Qwen team, Qwen3.5-Omni-Plus achieves state-of-the-art results across 215 audio and audiovisual subtasks. The Plus version outperforms Google's Gemini 3.1 Pro on audio comprehension with a score of 82.2 versus 81.1. It also leads on music comprehension at 72.4 versus 59.6 and on the VoiceBench dialog benchmark at 93.1 versus 88.9. On the FLEURS dataset for the top 60 languages, the Plus version recorded a word error rate of 6.55 compared to 7.32 for Gemini 3.1 Pro.

Speech recognition now covers 74 languages and 39 Chinese dialects. Voice output supports 36 languages and dialects, with 55 voices available.

The model also shows an emergent capability to write code from spoken instructions and video input, a skill the Qwen team calls audio-visual vibe coding. Demos include building a working snake game from a verbal description and a video clip.

Qwen3.5-Omni is available only as an API service through Qwen Chat and the Alibaba Cloud Model Studio. Unlike previous Qwen releases, the company has not published model weights.

