DeepSeek Releases DSpark: Speculative Decoding Makes V4 Up to 85 Percent Faster
On June 27, DeepSeek released DSpark, an inference optimization framework built on speculative decoding. The company says DSpark makes its V4-Flash model generate responses up to 85 percent faster than the prior single-token baseline. According to DeepSeek, the speed gain comes without retraining the model, changing its weights, or adding new hardware.

The framework is now live across V4-Flash and V4-Pro and is available as open-source code under an MIT license. DeepSeek also released DeepSpec, a full-stack codebase for training and evaluating speculative decoding draft models, on GitHub, likewise under an MIT license. DeepSpec targets the Qwen3 and Gemma model families.

The deployed configuration, called DSpark-5, uses a five-token draft block. In DeepSeek's internal production data, DSpark-5 improved per-user generation speed
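For readers unfamiliar with the technique, the sketch below shows generic greedy speculative decoding with a draft block of k = 5 tokens, echoing DSpark-5's five-token block. It is not DSpark's implementation: the function names (`speculative_decode`, `greedy_decode`) and the toy `target_model` and `draft_model` are hypothetical stand-ins. In each round, a cheap draft model proposes k tokens. The target model then keeps the longest prefix it agrees with and adds one token of its own. Because every kept token is one the target would have produced anyway, the output matches plain greedy decoding, which is consistent with a speedup that requires no weight changes. Production systems verify all drafted positions in one batched forward pass and use a probabilistic acceptance rule when sampling; this sketch assumes greedy decoding and checks positions one at a time for clarity.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]  # maps a context to its next token


def target_model(ctx: List[Token]) -> Token:
    # Hypothetical toy "large" model: next token is (last + 3) mod 7.
    return (ctx[-1] + 3) % 7


def draft_model(ctx: List[Token]) -> Token:
    # Hypothetical toy draft model: agrees with the target except when
    # the context length is a multiple of 4, where it guesses wrong.
    guess = (ctx[-1] + 3) % 7
    return (guess + 1) % 7 if len(ctx) % 4 == 0 else guess


def greedy_decode(target: NextTokenFn, prompt: List[Token], max_new: int) -> List[Token]:
    # Baseline: one target call per generated token.
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(target: NextTokenFn, draft: NextTokenFn, prompt: List[Token],
                       max_new: int, k: int = 5) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns (tokens, verification rounds)."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < max_new:
        # 1. Draft model proposes a block of k tokens autoregressively.
        proposal: List[Token] = []
        ctx = list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. Target verifies the block. A real system scores all positions in
        #    a single forward pass; here each position is checked in a loop.
        rounds += 1
        ctx = list(out)
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(ctx)
            if expected != tok:
                accepted.append(expected)  # target's correction ends the round
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target(ctx))  # all k drafts matched: bonus token
        out.extend(accepted)
    return out[: len(prompt) + max_new], rounds


if __name__ == "__main__":
    tokens, rounds = speculative_decode(target_model, draft_model, [1], max_new=20)
    print(tokens == greedy_decode(target_model, [1], 20), rounds)
```

With this toy draft model, each round accepts three drafted tokens plus one correction. The 20 new tokens therefore take 5 verification rounds instead of 20 single-token steps. Real speedups depend on how often the draft is accepted and how much the draft model costs to run, so the toy numbers say nothing about DSpark's reported figures.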
Sources
Published by Tech & Business, a media brand covering technology and business.
This story was sourced from TechTimes and reviewed by the T&B editorial agent team.