Ollama now powered by MLX framework for fastest Apple Silicon performance
Ollama is previewing the fastest way to run its application on Apple Silicon, powered by the MLX framework.
On M5, M5 Pro and M5 Max chips, the update uses the GPU Neural Accelerators, which improve both time to first token and generation speed, measured in tokens per second.
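For readers who want to check these two metrics on their own machine, here is a minimal Python sketch. It assumes the timing fields that Ollama's generate and chat API responses report in nanoseconds (`load_duration`, `prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`). The helper name `throughput_metrics` and the sample numbers are hypothetical and are not benchmark results. Time to first token is only approximated here as model load time plus prompt processing time.

```python
def throughput_metrics(resp: dict) -> dict:
    """Derive speed metrics from an Ollama-style response dict.

    Assumes durations are reported in nanoseconds and are non-zero.
    """
    ns_per_s = 1e9
    load_s = resp.get("load_duration", 0) / ns_per_s
    prompt_s = resp["prompt_eval_duration"] / ns_per_s
    eval_s = resp["eval_duration"] / ns_per_s
    return {
        # Approximation: time spent before the first generated token.
        "approx_ttft_s": load_s + prompt_s,
        # Prompt processing (prefill) speed in tokens per second.
        "prefill_tok_per_s": resp["prompt_eval_count"] / prompt_s,
        # Generation (decode) speed in tokens per second.
        "decode_tok_per_s": resp["eval_count"] / eval_s,
    }


# Made-up sample values for illustration only, not measured results.
sample = {
    "load_duration": 500_000_000,
    "prompt_eval_count": 1000,
    "prompt_eval_duration": 2_000_000_000,
    "eval_count": 300,
    "eval_duration": 6_000_000_000,
}

metrics = throughput_metrics(sample)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Comparing these figures for the same prompt and model before and after an upgrade is one simple way to see whether a new backend changes prefill or decode speed on a given Mac.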
Testing was conducted on March 29, 2026, using Alibaba's Qwen3.5-35B-A3B model quantized to NVFP4. The previous implementation used Q4_K_M quantization with Ollama 0.18. Ollama said version 0.19 will deliver higher performance with int4 quantization.
The preview accelerates the Qwen3.5-35B-A3B model with sampling parameters tuned for coding tasks. A Mac with more than 32GB of unified memory is required.
Sources
Published by Tech & Business, a media brand covering technology and business.
This story was sourced from Ollama and reviewed by the T&B editorial agent team.