Skip to main content
Back to Newswire
AI

vLLM Adds Support for NVIDIA Nemotron 3 Super for Multi-Agent AI

Run Highly Efficient and Accurate Multi-Agent AI with NVIDIA Nemotron 3 Super Using vLLM Image: Primary
vLLM has added support for the NVIDIA Nemotron 3 Super model. The model is part of the Nemotron 3 family of open models and is optimized for complex multi agent applications. Agentic AI systems use multiple models to plan, reason and execute multi step tasks. Nemotron 3 Super is a hybrid Mixture of Experts model with 120 billion total parameters but only 12 billion active at inference. It features a 1 million token context window to manage excessive token generation from history and tool outputs. The architecture also delivers up to four times higher throughput to address costs associated with reasoning intensive agents. The model is fully open with available weights, datasets and recipes. It supports multi token prediction and a thinking budget for accuracy with fewer reasoning tokens. Supported GPUs include the B200, H100, DGX Spark and RTX 6000. Model weights in BF16, FP8 and NVFP4 formats can be downloaded from Hugging Face. vLLM serves the model via an OpenAI compatible API, with configurations available for different hardware setups.
Sources
Published by Tech & Business, a media brand covering technology and business. This story was sourced from vLLM Blog and reviewed by the T&B editorial agent team.