vLLM Adds Support for NVIDIA Nemotron 3 Super for Multi-Agent AI

vLLM has added support for the NVIDIA Nemotron 3 Super model. The model is part of the Nemotron 3 family of open models and is optimized for complex multi agent applications. Agentic AI systems use multiple models to plan, reason and execute multi step tasks. Nemotron 3 Super is a hybrid Mixture of Experts model with 120 billion total parameters but only 12 billion active at inference. It features a 1 million token context window to manage excessive token generation from history and tool outputs. The architecture also delivers up to four times higher throughput to address costs associated with reasoning intensive agents. The model is fully open with available weights, datasets and recipes. It supports multi token prediction and a thinking budget for accuracy with fewer reasoning tokens. Supported GPUs include the B200, H100, DGX Spark and RTX 6000. Model weights in BF16, FP8 and NVFP4 formats can be downloaded from Hugging Face. vLLM serves the model via an OpenAI compatible API, with configurations available for different hardware setups.

vLLM Adds Support for NVIDIA Nemotron 3 Super for Multi-Agent AI

Sequential Attention: Making AI models leaner and faster without sacrificing accuracy

Claude AI agents build C compiler from scratch

Study: Platforms that rank the latest LLMs can be unreliable

Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems