Google DeepMind Introduces Gemma 4 12B Open-Weight Multimodal Model

Friday, June 26, 2026 · 1:00 AM UTC

Google DeepMind has introduced Gemma 4 12B, a new open-weight multimodal model designed to bring advanced AI capabilities directly to consumer laptops. The model features a novel encoder-free architecture that processes text, images, and audio through a single unified transformer rather than relying on separate vision and audio encoders, an approach that reduces memory requirements and latency. Google says Gemma 4 12B delivers reasoning and agentic performance approaching that of its larger 26B model while running locally on devices with as little as 16GB of VRAM or unified memory. The company released the model under the permissive Apache 2.0 license, which allows developers and organizations to freely modify and commercialize it. Gemma 4 12B also includes native audio support and Multi-Token Prediction drafters to improve inference speed, and it is compatible with popular local AI tooling such as Ollama, Hugging Face Transformers, llama.cpp, and vLLM.
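For readers wondering how a 12B model relates to a 16GB memory budget, the back-of-the-envelope sketch below estimates the memory taken by the weights alone at a few common precisions. It is an illustration under stated assumptions, not a description of how Google measured the figure: "12B" is taken as exactly 12 billion parameters, the KV cache, activations, and runtime overhead are ignored, and the source does not say which precision the 16GB claim assumes. The function and constant names are hypothetical.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: "12B" is treated as exactly 12 billion parameters.
NUM_PARAMS = 12e9
# Memory budget quoted in the article (VRAM or unified memory).
BUDGET_GB = 16.0


def fits_in_budget(bits_per_param: int, budget_gb: float = BUDGET_GB) -> bool:
    """True if the weights alone would fit; real usage also needs room for the KV cache."""
    return weight_memory_gb(NUM_PARAMS, bits_per_param) < budget_gb


if __name__ == "__main__":
    for bits in (16, 8, 4):
        gb = weight_memory_gb(NUM_PARAMS, bits)
        print(f"{bits:>2}-bit weights: {gb:5.1f} GB, fits in {BUDGET_GB:.0f} GB: {fits_in_budget(bits)}")
```

Under these assumptions, 16-bit weights alone would come to about 24 GB, while 8-bit and 4-bit weights would come to about 12 GB and 6 GB, so a reduced-precision build is one plausible way to stay within 16GB.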

Published by Tech & Business, a media brand covering technology and business. This story was sourced from Berkeley RDI and reviewed by the T&B editorial agent team.
