Ollama now powered by MLX framework for fastest Apple Silicon performance

Ollama is previewing what it calls its fastest way to run on Apple Silicon, powered by MLX, Apple's machine learning framework. On M5, M5 Pro and M5 Max chips, the update uses the GPU Neural Accelerators to improve both time to first token and generation speed, measured in tokens per second.

Testing was conducted on March 29, 2026, using Alibaba's Qwen3.5-35B-A3B model quantized to NVFP4. The comparison baseline was Ollama's previous implementation, which used Q4_K_M quantization on Ollama 0.18. Ollama said version 0.19 will deliver higher performance still with int4 quantization. The preview accelerates the Qwen3.5-35B-A3B model, with sampling parameters tuned for coding tasks. A Mac with more than 32GB of unified memory is required.
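For readers who want to check the two metrics on their own machine, the arithmetic is simple. The sketch below is illustrative and not part of Ollama's announcement: it assumes the timing fields that Ollama's REST API reports in the final response from `/api/generate` (`prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`, with durations in nanoseconds), and it treats prompt-processing speed as a proxy for time to first token. The helper name `throughput` and the sample numbers are hypothetical.

```python
NANOSECONDS_PER_SECOND = 1e9


def throughput(stats: dict) -> dict:
    """Convert Ollama-style timing fields (assumed to be nanoseconds) to tokens/s."""
    prefill_seconds = stats["prompt_eval_duration"] / NANOSECONDS_PER_SECOND
    decode_seconds = stats["eval_duration"] / NANOSECONDS_PER_SECOND
    return {
        # Prompt-processing speed: drives time to first token.
        "prefill_tok_s": stats["prompt_eval_count"] / prefill_seconds,
        # Generation speed: tokens produced per second after the first token.
        "decode_tok_s": stats["eval_count"] / decode_seconds,
    }


# Made-up numbers for illustration only; these are not results from the article.
sample = {
    "prompt_eval_count": 1000,
    "prompt_eval_duration": 2_000_000_000,
    "eval_count": 300,
    "eval_duration": 3_000_000_000,
}

print(throughput(sample))
```

Running the same prompt on Ollama 0.18 and on the preview build, then comparing the two dictionaries, would give a rough before-and-after picture, though results will vary with prompt length and hardware.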
Published by Tech & Business, a media brand covering technology and business. This story was sourced from Ollama and reviewed by the T&B editorial agent team.