AI

Mamba-3

Mamba-3 is a new state space model designed with inference efficiency as the primary goal. It introduces a more expressive recurrence formula, complex-valued state tracking, and a multi-input multi-output (MIMO) variant that improves accuracy without slowing decoding. At the 1.5B-parameter scale, the model achieves lower prefill-plus-decode latency than Mamba-2, Gated DeltaNet, and Llama-3.2-1B across sequence lengths.

The researchers open-sourced the kernels, which were built with Triton, TileLang, and CuTe DSL. The work stems from a collaboration among teams at Carnegie Mellon University, Princeton University, Cartesia AI, and Together AI, and it is cross-posted on the Goomba Lab blog.

Mamba-3 updates the architecture with QKNorm for training stability, RoPE for expressing complex-valued states, and MIMO projections. It also removes the short causal convolution used in prior versions and adopts interleaved MLP layers. According to the source description, these changes maintain inference latency while expanding the expressivity of the underlying state space mechanism.
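
One way to read "RoPE for expressing complex-valued states" is through a standard identity: multiplying a complex number by e^(iθ) is the same as rotating a pair of real numbers by the angle θ. The sketch below is a toy illustration of that identity in plain Python, not the Mamba-3 kernels. The function names, the fixed `decay`, the scalar state, and the random inputs are all assumptions made for illustration, and the actual Mamba-3 parameterization may differ.

```python
import cmath
import math
import random


def complex_scan(decay, thetas, inputs):
    """Toy recurrence h_t = decay * exp(i * theta_t) * h_{t-1} + x_t on one complex state."""
    h = 0j
    states = []
    for theta, x in zip(thetas, inputs):
        h = decay * cmath.exp(1j * theta) * h + x
        states.append(h)
    return states


def rotary_scan(decay, thetas, inputs):
    """Same recurrence on a real pair (a, b), using a 2x2 rotation instead of complex math."""
    a, b = 0.0, 0.0
    states = []
    for theta, x in zip(thetas, inputs):
        c, s = math.cos(theta), math.sin(theta)
        # Rotate (a, b) by theta, scale by decay, then add the real input to the first slot.
        a, b = decay * (c * a - s * b) + x, decay * (s * a + c * b)
        states.append((a, b))
    return states


# Hypothetical inputs: random rotation angles and real-valued inputs.
rng = random.Random(0)
thetas = [rng.uniform(-math.pi, math.pi) for _ in range(16)]
inputs = [rng.gauss(0.0, 1.0) for _ in range(16)]

complex_states = complex_scan(0.9, thetas, inputs)
rotary_states = rotary_scan(0.9, thetas, inputs)

# Largest gap between the two formulations; only floating-point rounding should remain.
max_err = max(
    abs(z.real - a) + abs(z.imag - b)
    for z, (a, b) in zip(complex_states, rotary_states)
)
print(max_err)
```

Both loops track the same state, so a model can carry complex-valued dynamics while storing only real numbers and applying rotations to them, which is the operation RoPE performs.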
Published by Tech & Business, a media brand covering technology and business. This story was sourced from Together AI and reviewed by the T&B editorial agent team.