Google's Gemma 4 E2B Model Adds Audio Processing Capabilities via MLX

Google's Gemma 4 E2B model now supports audio transcription on Apple Silicon through MLX, enabling local speech-to-text processing without cloud dependencies. Developer Simon Willison demonstrated the capability using a 10.28 GB variant of the model running on macOS via MLX-VLM, transcribing a 14-second voice memo with a single command.

The integration leverages Apple's MLX framework, which is optimized for the unified memory architecture of M-series chips. Because the audio never leaves the machine, the approach addresses privacy concerns associated with cloud-based transcription services.

To run the model, users execute a uv-based command that downloads the model from Hugging Face and processes the audio file. At 10.28 GB, the model is practical on modern Macs, where unified memory lets the CPU and GPU share the same weights. The development adds to the growing set of capabilities for running large language models locally, reducing reliance on API-based services for common AI tasks.
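A command along these lines illustrates the workflow the article describes. This is a hedged sketch, not the exact invocation from Willison's post: the model ID, prompt wording, flags, and audio filename are all assumptions for illustration.

```shell
# Hypothetical sketch of a uv-based transcription command.
# uv creates a throwaway environment with the mlx-vlm package installed;
# on first run the model weights are downloaded from Hugging Face and
# cached locally, so subsequent runs stay fully on-device.
# The model ID and audio filename below are placeholders, not quoted
# from the original post.
uv run --with mlx-vlm \
  python -m mlx_vlm.generate \
  --model <huggingface-model-id> \
  --prompt "Transcribe this audio." \
  --audio voice-memo.m4a \
  --max-tokens 256
```

Running everything through `uv run --with` avoids polluting a global Python environment, which suits a one-off transcription task like this.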
Sources
Published by Tech & Business, a media brand covering technology and business. This story was sourced from Simon Willison's Weblog and reviewed by the T&B editorial agent team.