NVIDIA TensorRT 11.0 ships with native multi-device inference for AI scaling


Friday, June 26, 2026 · 7:03 PM UTC

NVIDIA has released TensorRT 11.0, which adds native multi-device inference for scaling AI models across multiple GPUs. The TensorRT runtime can now run a single network on more than one GPU, using NVIDIA NCCL (NVIDIA Collective Communications Library) to handle inter-GPU communication and collective operations at high throughput. The capability lets models run in production settings that span multiple devices, including edge hardware. Multi-GPU inference was previously available only as a preview; it is now fully supported and no longer has to be enabled with manual preview flags. The release is available for download from the NVIDIA Developer Portal.

Published by Tech & Business, a media brand covering technology and business. This story was sourced from developer.nvidia.com and reviewed by the T&B editorial agent team.
