New open models: Introducing NVIDIA Nemotron 3 Super
NVIDIA has released Nemotron-3-Super-120B-A12B-FP8 as part of its Nemotron family of open models. The model contains 120 billion parameters in total with 12 billion active parameters. It features a hybrid architecture that combines Mamba-2 layers, mixture-of-experts components, attention layers, and multi-token prediction layers.
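To put the parameter counts in perspective, here is a rough back-of-the-envelope sketch. It assumes every weight is stored as an 8-bit float (1 byte), which the "FP8" in the model name suggests but the announcement does not spell out; real checkpoints often keep some layers at higher precision, and memory for activations and caches comes on top. The function names are illustrative only.

```python
# Rough arithmetic on the parameter counts quoted in the article.
TOTAL_PARAMS = 120e9       # total parameters, as stated
ACTIVE_PARAMS = 12e9       # active parameters, as stated
BYTES_PER_PARAM_FP8 = 1    # assumption: all weights stored as 8-bit floats


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that is active at once."""
    return active / total


def weight_footprint_gb(params: float, bytes_per_param: float) -> float:
    """Approximate storage for the weights alone, in gigabytes (10^9 bytes)."""
    return params * bytes_per_param / 1e9


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
footprint = weight_footprint_gb(TOTAL_PARAMS, BYTES_PER_PARAM_FP8)

print(f"Active share of parameters: {fraction:.0%}")
print(f"Approximate weight footprint: {footprint:.0f} GB")
```

Under these assumptions, roughly a tenth of the parameters are in use at a time, which is the usual appeal of a mixture-of-experts design: compute scales with the active parameters while capacity scales with the total.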
Pre-training involved more than 25 trillion tokens drawn from crawled and synthetic sources covering code, math, science, and general knowledge. Supervised fine-tuning followed on synthetic data for code, math, science, tool calling, instruction following, structured outputs, and general knowledge. The process concluded with reinforcement learning using asynchronous group relative policy optimization in environments focused on math, code, science, and multi-step tool use.
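For readers unfamiliar with group relative policy optimization, the core idea can be sketched in a few lines: several answers are sampled for the same prompt, and each answer's reward is scored relative to the group's average rather than against a separately trained value model. The snippet below is a generic illustration of that normalization step, not NVIDIA's training code; the asynchronous scheduling mentioned above is omitted, and the use of the population standard deviation and the `eps` term are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    rewards: scores for several answers sampled from the same prompt.
    eps: small constant (an assumption here) that avoids division by zero
         when every answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four answers: two judged correct (1.0), two not (0.0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that beat the group average receive a positive advantage and are reinforced; those below it receive a negative one.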
The model supports English, French, German, Italian, Japanese, Spanish, and Chinese. It handles sequences up to 1 million tokens in length. NVIDIA states that the model is optimized for collaborative agents and high-volume workloads such as IT ticket automation.
It responds to queries
Sources
Published by Tech & Business, a media brand covering technology and business.
This story was sourced from NVIDIA and reviewed by the T&B editorial agent team.