New open models: Introducing NVIDIA Nemotron 3 Super

NVIDIA has released Nemotron-3-Super-120B-A12B-FP8 as part of its Nemotron family of open models. The model has 120 billion total parameters, of which 12 billion are active. Its hybrid architecture combines Mamba-2 layers, mixture-of-experts components, attention layers, and multi-token prediction layers.

Pre-training used more than 25 trillion tokens of crawled and synthetic data covering code, math, science, and general knowledge. Supervised fine-tuning followed on synthetic data for code, math, science, tool calling, instruction following, structured outputs, and general knowledge. Training concluded with reinforcement learning using asynchronous group relative policy optimization (GRPO) in environments focused on math, code, science, and multi-step tool use.

The model supports English, French, German, Italian, Japanese, Spanish, and Chinese, and handles sequences of up to 1 million tokens. NVIDIA states that it is optimized for collaborative agents and high-volume workloads such as IT ticket automation. It responds to queries
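For readers unfamiliar with group relative policy optimization, the core idea can be sketched in a few lines: several answers are sampled for the same prompt, and each answer's reward is scored relative to the rest of its group rather than against a learned value function. The sketch below is illustrative only and is not NVIDIA's implementation. The function name, the example rewards, and the use of the population standard deviation are assumptions, and the asynchronous scheduling mentioned above is not shown.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to the mean and spread of its own group.

    `rewards` holds one scalar reward per sampled answer to the same prompt.
    `eps` (an assumed small constant) avoids division by zero when every
    answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one math prompt
# (1.0 = verified correct, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Answers that beat the group average receive a positive advantage and are reinforced, while those below it receive a negative one. When every answer in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.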

Sequential Attention: Making AI models leaner and faster without sacrificing accuracy

Claude AI agents build C compiler from scratch

Study: Platforms that rank the latest LLMs can be unreliable

Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems