# NVIDIA Blackwell inference software delivers up to 5× DeepSeek V4 performance improvement in one month, reducing token costs to one-fifth of prior levels

_Wednesday, July 1, 2026 at 9:05 PM EDT · Infrastructure · Latest · Tier 2 — Notable_

NVIDIA on June 30 announced that software optimizations for its Blackwell platform improved inference performance on the DeepSeek V4 model by up to 5 times within one month, cutting token costs to roughly one-fifth of prior levels. In a post on X, the company said its inference software stack compounds improvements across runtimes, kernels, networking and hardware, delivering up to 20 times higher throughput on the same GPU.

A blog post by NVIDIA's Amr Elmeleegy, published the same day, detailed the stack's three-layer architecture, which connects production operations, application acceleration and hardware optimization. NVIDIA said the stack is co-designed with its GPUs, CPUs, networking and systems, and is powered by CUDA-native open source frameworks.

The post named Baseten, Cognition, Deep Infra, DigitalOcean, Hippocratic AI, Together AI and Cursor as companies seeing compounding value from the software. Baseten reported up to 50 percent more tokens per second serving DeepSeek V4 Pro on Blackwell with NVIDIA's TensorRT-LLM library. DigitalOcean helped Hippocratic AI increase inference throughput by 30 percent across 10 million patient calls while keeping response times under half a second.

## Sources

- [NVIDIA (official, X)](https://x.com/nvidia/status/2071979909199577560)

---
Canonical: https://techandbusiness.org/newswire/aD45H1NEbb1bqELwlSMRdS
Retrieved: 2026-07-02T04:14:33.340Z
Publisher: Tech & Business (techandbusiness.org)