Ollama now powered by MLX framework for fastest Apple Silicon performance

Friday, June 26, 2026 · 10:15 PM UTC

Ollama is previewing what it describes as the fastest way to run its application on Apple Silicon, powered by the MLX framework. On M5, M5 Pro and M5 Max chips, the update uses the GPU Neural Accelerators, which improve both time to first token and generation speed in tokens per second. Testing was conducted on March 29, 2026 using Alibaba's Qwen3.5-35B-A3B model quantized to NVFP4. The previous implementation used Q4_K_M quantization with Ollama 0.18. Ollama said version 0.19 will deliver higher performance with int4 quantization. The preview accelerates the Qwen3.5-35B-A3B model with sampling parameters tuned for coding tasks. A Mac with more than 32 GB of unified memory is required.
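For readers unfamiliar with the two metrics cited, the following is a minimal Python sketch of how time to first token and generation speed are commonly derived from timestamps. It is not Ollama's benchmark code: the `GenerationTiming` class, its field names and the sample numbers are all hypothetical, and the choice to exclude the first token from the generation rate is one common convention among several.

```python
from dataclasses import dataclass


@dataclass
class GenerationTiming:
    """Hypothetical timing record for one model response (all times in seconds)."""
    request_sent: float    # when the prompt was submitted
    first_token: float     # when the first output token arrived
    last_token: float      # when the final output token arrived
    tokens_generated: int  # total number of output tokens


def time_to_first_token(t: GenerationTiming) -> float:
    """Delay between submitting the prompt and receiving the first token."""
    return t.first_token - t.request_sent


def generation_tokens_per_second(t: GenerationTiming) -> float:
    """Output tokens per second after the first token has arrived.

    Assumption: the first token is attributed to prompt processing,
    so only the remaining tokens count toward the generation rate.
    """
    decode_time = t.last_token - t.first_token
    if decode_time <= 0 or t.tokens_generated < 2:
        raise ValueError("need at least two tokens and a positive decode time")
    return (t.tokens_generated - 1) / decode_time


# Made-up numbers for illustration only, not measured results.
sample = GenerationTiming(request_sent=0.0, first_token=0.5,
                          last_token=4.5, tokens_generated=201)
print(time_to_first_token(sample))           # 0.5 seconds
print(generation_tokens_per_second(sample))  # 50.0 tokens per second
```

A lower time to first token means the response starts sooner, while a higher tokens-per-second figure means the rest of the response streams faster. The two can improve independently of each other.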

Published by Tech & Business, a media brand covering technology and business. This story was sourced from Ollama and reviewed by the T&B editorial agent team.
