# NVIDIA TensorRT 11.0 ships with native multi-device inference for AI scaling

_Friday, June 26, 2026 at 3:03 PM EDT · Infrastructure · Latest · Tier 2 — Notable_

![NVIDIA TensorRT 11.0 ships with native multi-device inference for AI scaling — Primary](https://developer-blogs.nvidia.com/wp-content/uploads/2026/06/AI-Inference.webp)

NVIDIA has released TensorRT 11.0, which adds native multi-device inference: the TensorRT runtime can now run a single network across more than one GPU. The runtime integrates NVIDIA NCCL to handle distributed communication and collective operations between GPUs at high throughput. Multi-GPU inference moves from its earlier preview status to full support in this release, so users no longer need to set preview flags manually. The capability is aimed at production deployments that span multiple devices, including edge hardware. The release is available for download from the NVIDIA Developer Portal.
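
For readers unfamiliar with collectives, the following is a minimal sketch of the general idea behind splitting one layer across devices and combining the partial results with an all-reduce. It is a plain-Python simulation and does not call TensorRT or NCCL; the function names (`matvec`, `all_reduce_sum`, `sharded_matvec`) and the choice to split the inner dimension are assumptions made for illustration, since the announcement does not describe how TensorRT partitions a network.

```python
# Illustrative simulation only: "devices" are plain Python lists.
# None of these names come from the TensorRT or NCCL APIs.

def matvec(weights, x):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def all_reduce_sum(partials):
    """Element-wise sum across devices; every device gets a copy of the result."""
    total = [sum(vals) for vals in zip(*partials)]
    return [list(total) for _ in partials]

def sharded_matvec(weights, x, num_devices):
    """Split the inner dimension across devices, then combine with all-reduce."""
    step = len(x) // num_devices  # assumes the dimension divides evenly
    partials = []
    for d in range(num_devices):
        lo, hi = d * step, (d + 1) * step
        w_shard = [row[lo:hi] for row in weights]  # this device's weight columns
        x_shard = x[lo:hi]                         # matching slice of the input
        partials.append(matvec(w_shard, x_shard))  # partial output on device d
    return all_reduce_sum(partials)

weights = [[1, 2, 3, 4], [5, 6, 7, 8]]
x = [1, 0, 2, 1]

single = matvec(weights, x)                             # one-device reference
per_device = sharded_matvec(weights, x, num_devices=2)  # one result per device
print(single, per_device)
```

Each simulated device holds only half of the weights, yet after the all-reduce every device ends up with the same output as the single-device computation. In a real deployment that summation step is the kind of collective a communication library such as NCCL performs between GPUs.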

## Sources

- [developer.nvidia.com](https://developer.nvidia.com/blog/scaling-ai-inference-across-multiple-gpus-using-nvidia-tensorrt-with-multi-device-inference-support/)

---
Canonical: https://techandbusiness.org/newswire/WMYow9Ig064KslncDNIt66
Retrieved: 2026-06-26T22:45:49.717Z
Publisher: Tech & Business (techandbusiness.org)