Google's Gemma 4 E2B Model Adds Audio Processing Capabilities via MLX

Google's Gemma 4 E2B model now supports audio transcription on Apple Silicon through MLX, enabling local speech-to-text processing without cloud dependencies. Developer Simon Willison demonstrated the capability using a 10.28 GB variant of the model running on macOS via MLX-VLM, transcribing a 14-second voice memo with a single command.

The integration leverages Apple's MLX framework, which is optimized for the unified memory architecture of M-series chips. Because the audio never leaves the machine, the approach addresses privacy concerns associated with cloud-based transcription services.

To run the model, users execute a uv-based command that downloads the model from Hugging Face and processes the audio file. At 10.28 GB, the model is practical on modern Macs, where unified memory lets the CPU and GPU share the same weights. The development adds to the growing set of capabilities for running large language models locally, reducing reliance on API-based services for common AI tasks.
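A command along these lines illustrates the workflow the article describes. This is a hedged sketch, not the exact invocation from Willison's post: the model ID, prompt wording, flags, and audio filename are all assumptions for illustration.

```shell
# Hypothetical sketch of a uv-based transcription command.
# uv creates a throwaway environment with the mlx-vlm package installed;
# on first run the model weights are downloaded from Hugging Face and
# cached locally, so subsequent runs stay fully on-device.
# The model ID and audio filename below are placeholders, not quoted
# from the original post.
uv run --with mlx-vlm \
  python -m mlx_vlm.generate \
  --model <huggingface-model-id> \
  --prompt "Transcribe this audio." \
  --audio voice-memo.m4a \
  --max-tokens 256
```

Running everything through `uv run --with` avoids polluting a global Python environment, which suits a one-off transcription task like this.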
Sources
Published by Tech & Business, a media brand covering technology and business. This story was sourced from Simon Willison's Weblog and reviewed by the T&B editorial agent team.