# Ollama now powered by MLX framework for fastest Apple Silicon performance

_Friday, June 26, 2026 at 6:15 PM EDT · AI · Latest · Tier 2 — Notable_

![Ollama now powered by MLX framework for fastest Apple Silicon performance — Primary](https://ollama.com/public/og.png)

Ollama is previewing a version of its application for Apple Silicon built on MLX, Apple's machine learning framework, which it describes as the fastest way to run Ollama on those chips. Building on MLX lets the implementation take advantage of Apple's unified memory architecture. Ollama says this yields a large speedup on all Apple Silicon devices.

On M5, M5 Pro and M5 Max chips, the update also uses the GPU Neural Accelerators, which improve both time to first token and generation speed, measured in tokens per second.
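For readers who want to check these two metrics on their own machine, the sketch below derives them from the timing fields that Ollama's generate API returns with a completed response (`load_duration`, `prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`, with durations in nanoseconds). The `throughput` helper and the sample numbers are hypothetical and for illustration only. The time-to-first-token figure is an approximation that assumes it is dominated by model load plus prompt processing.

```python
NS_PER_S = 1_000_000_000  # Ollama reports durations in nanoseconds


def throughput(stats: dict) -> dict:
    """Derive speed metrics from Ollama-style timing fields."""
    prefill_s = stats["prompt_eval_duration"] / NS_PER_S
    decode_s = stats["eval_duration"] / NS_PER_S
    return {
        # Prompt tokens processed per second (drives time to first token)
        "prefill_tok_per_s": stats["prompt_eval_count"] / prefill_s,
        # Generated tokens per second
        "decode_tok_per_s": stats["eval_count"] / decode_s,
        # Rough time to first token: model load plus prompt processing
        "approx_ttft_s": (stats.get("load_duration", 0)
                          + stats["prompt_eval_duration"]) / NS_PER_S,
    }


# Hypothetical values, not measured results from any Ollama release.
sample = {
    "load_duration": 500_000_000,
    "prompt_eval_count": 1000,
    "prompt_eval_duration": 2_000_000_000,
    "eval_count": 300,
    "eval_duration": 6_000_000_000,
}
metrics = throughput(sample)
print(metrics)
```

Running the same prompt against two Ollama versions and comparing these derived values is one simple way to see whether a speedup shows up in prompt processing, in generation, or in both.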

Ollama ran its tests on March 29, 2026, using Alibaba's Qwen3.5-35B-A3B model quantized to NVFP4. The comparison baseline was Ollama's previous implementation, which used Q4_K_M quantization on Ollama 0.18. Ollama said version 0.19 will deliver even higher performance with int4 quantization.

The preview accelerates the Qwen3.5-35B-A3B model, with sampling parameters tuned for coding tasks. It requires a Mac with more than 32GB of unified memory.
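A back-of-the-envelope estimate suggests one plausible reason for the memory requirement, though Ollama's reasoning is not stated here. The sketch below assumes roughly 35 billion total parameters (inferred from the "35B" in the model name) and an NVFP4-style layout of 4-bit values plus one 8-bit scale per 16-value block, or about 4.5 bits per weight. Both figures are assumptions, and the actual on-disk and in-memory sizes may differ.

```python
def weight_gib(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given parameter count."""
    return params * bits_per_weight / 8 / 2**30


PARAMS = 35e9             # assumption: total parameters, from "35B" in the name
NVFP4_BITS = 4 + 8 / 16   # assumption: 4-bit values + 8-bit scale per 16 values

estimate = weight_gib(PARAMS, NVFP4_BITS)
print(f"Approximate weight footprint: {estimate:.1f} GiB")
```

Under these assumptions the weights alone take a bit over 18 GiB, before counting the KV cache, the operating system and other applications that share the same unified memory, which would leave little headroom on a 32GB machine.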

## Sources

- [Ollama](https://ollama.com/blog/mlx)

---
Canonical: https://techandbusiness.org/newswire/X0O85GNlLhBSz1ObTpD22K
Retrieved: 2026-06-27T04:16:32.293Z
Publisher: Tech & Business (techandbusiness.org)