TurboQuant: Redefining AI efficiency with extreme compression

Google Research scientists Amir Zandieh and Vahab Mirrokni introduced TurboQuant on March 24, 2026. Together with two companion algorithms, Quantized Johnson-Lindenstrauss (QJL) and PolarQuant, it targets the memory overhead of vector quantization in large language models and vector search engines. TurboQuant compresses the high-dimensional vectors used in key-value (KV) caches and similarity search. QJL applies the Johnson-Lindenstrauss Transform and keeps only a single sign bit per projected component, with zero added memory overhead. PolarQuant converts vectors to polar coordinates, which removes the need for a data normalization step.

The techniques were evaluated on LongBench, Needle In A Haystack, ZeroSCROLLS, RULER, and L-Eval using Gemma and Mistral models. TurboQuant quantized KV caches to 3 bits without training or fine-tuning and without accuracy loss on these benchmarks. On H100 GPUs it computed attention logits up to 8 times faster than with 32-bit keys. In vector search tests against PQ and RaBitQ, TurboQuant achieved higher 1@k recall ratios. The methods require no dataset-specific tuning and support faster index building. TurboQuant, QJL, and PolarQuant are scheduled for presentation at ICLR 2026 and AISTATS 2026.
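The sign-bit idea behind QJL can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a dense Gaussian projection matrix, a query that stays unquantized, and one stored norm per key (an assumption of this sketch). The function names make_projection, quantize_key, and estimate_dot are hypothetical. The estimator uses the standard Gaussian identity E[(s·q)·sign(s·k)] = sqrt(2/π)·(q·k)/‖k‖.

```python
import math
import random


def make_projection(m, d, seed=0):
    """Return an m x d matrix of i.i.d. standard Gaussian entries."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]


def project(S, x):
    """Compute S @ x with plain Python lists."""
    return [sum(s_i * x_i for s_i, x_i in zip(row, x)) for row in S]


def quantize_key(S, key):
    """Keep one sign bit per projected component, plus the key's norm.

    Storing the norm is an assumption of this sketch, not a claim about
    how the published method lays out memory.
    """
    signs = [1 if p >= 0 else -1 for p in project(S, key)]
    norm = math.sqrt(sum(v * v for v in key))
    return signs, norm


def estimate_dot(S, query, signs, norm):
    """Estimate <query, key> from the key's sign bits and norm."""
    proj_q = project(S, query)
    m = len(S)
    scale = math.sqrt(math.pi / 2.0) * norm / m
    return scale * sum(p * s for p, s in zip(proj_q, signs))


# Small demonstration with hypothetical sizes.
demo_rng = random.Random(42)
demo_S = make_projection(m=1024, d=16, seed=7)
demo_key = [demo_rng.gauss(0.0, 1.0) for _ in range(16)]
demo_query = [k + 0.5 * demo_rng.gauss(0.0, 1.0) for k in demo_key]
demo_signs, demo_norm = quantize_key(demo_S, demo_key)
print("exact   :", sum(q * k for q, k in zip(demo_query, demo_key)))
print("estimate:", estimate_dot(demo_S, demo_query, demo_signs, demo_norm))
```

The estimate's error shrinks roughly as 1/sqrt(m), so the number of projections m trades memory for accuracy. In this sketch each key costs m bits plus one float.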
