DeepSeek Releases DSpark: Speculative Decoding Makes V4 Up to 85 Percent Faster

Tuesday, June 30, 2026 · 10:48 PM UTC

DeepSeek on June 27 released DSpark, an inference optimization framework built on speculative decoding. The company says it makes its V4-Flash model generate responses up to 85 percent faster than the prior single-token baseline, without retraining the model, changing its weights, or adding new hardware. The framework is now live across V4-Flash and V4-Pro and is available as open-source code under an MIT license. DeepSeek also released DeepSpec, a full-stack codebase for training and evaluating speculative decoding draft models that targets the Qwen3 and Gemma model families, on GitHub under the same license. The deployed configuration, called DSpark-5, uses a five-token draft block. In DeepSeek's internal production data, DSpark-5 improved per-user generation speed
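
For readers unfamiliar with the technique, the general idea behind speculative decoding with a five-token draft block can be sketched as follows. This is a toy illustration of the generic greedy draft-and-verify scheme, not DSpark's actual implementation, which the article does not describe. The functions `target_next` and `draft_next` are hypothetical stand-ins for a large model and a cheaper draft model, and a real system would verify the whole block in one batched forward pass rather than position by position.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def target_next(ctx: List[Token]) -> Token:
    """Hypothetical toy 'large' model: deterministic next token."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_next(ctx: List[Token]) -> Token:
    """Hypothetical toy draft model: agrees with the target except
    when the context length is a multiple of 4."""
    guess = target_next(ctx)
    return guess if len(ctx) % 4 else (guess + 1) % 50


def greedy_decode(prompt: List[Token], n_new: int, target: NextToken) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out


def speculative_decode(
    prompt: List[Token],
    n_new: int,
    target: NextToken,
    draft: NextToken,
    block: int = 5,  # five-token draft block, as in the DSpark-5 name
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns (tokens, verification passes)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(out) < goal:
        # 1. The draft model cheaply proposes a block of tokens.
        proposal: List[Token] = []
        for _ in range(block):
            proposal.append(draft(out + proposal))

        # 2. The target checks the block (simulated sequentially here).
        passes += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # replace first mismatch, stop
                break
            accepted.append(tok)
        else:
            # Whole block accepted: the target contributes one extra token.
            accepted.append(target(out + accepted))

        out.extend(accepted)
    return out[:goal], passes


if __name__ == "__main__":
    tokens, passes = speculative_decode([1, 2, 3], 20, target_next, draft_next)
    print(tokens == greedy_decode([1, 2, 3], 20, target_next), passes)
```

Because every kept token is the one the target model would have chosen anyway, the output matches plain greedy decoding. Any speedup comes from needing fewer target passes than generated tokens, which depends on how often the draft model guesses correctly.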

Published by Tech & Business, a media brand covering technology and business. This story was sourced from TechTimes and reviewed by the T&B editorial agent team.
