# Researchers propose Train-to-Test scaling to optimize AI compute budgets

_Friday, April 17, 2026 at 4:04 PM EDT · AI, Tech & Business · Latest · Tier 2 — Notable_

![Researchers propose Train-to-Test scaling to optimize AI compute budgets — Primary](https://images.ctfassets.net/jdtwqhzvc2n1/la95QpAj3bcZUcSLAy3X5/9df7300fd3887b42d53cc4457aff8d63/LLM_multi-response_sampling.jpg?w=800&q=75)

Researchers from the University of Wisconsin-Madison and Stanford University have introduced Train-to-Test scaling laws, a framework that jointly optimizes model size, training data volume, and test-time inference samples.

The approach challenges conventional guidelines that focus only on training costs, demonstrating that it is compute-optimal to train substantially smaller models on more data than traditional rules prescribe. The compute saved during training can then be spent on drawing multiple samples per query at inference time.

Current industry standards like the Chinchilla rule suggest roughly 20 training tokens per model parameter. However, creators of modern AI model families including Llama, Gemma, and Qwen regularly break this rule by overtraining smaller models on large datasets.
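To make the ratio concrete, here is a minimal sketch of the arithmetic behind the 20-tokens-per-parameter heuristic cited above; the function name and example model size are illustrative, not from the paper.

```python
def chinchilla_tokens(n_params: int, tokens_per_param: int = 20) -> int:
    """Training tokens the Chinchilla-style heuristic prescribes
    for a given model size (roughly 20 tokens per parameter)."""
    return n_params * tokens_per_param

# An 8-billion-parameter model under the rule needs ~160B training tokens;
# "overtrained" models like those the article mentions use far more.
print(chinchilla_tokens(8_000_000_000))  # 160000000000
```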

The new framework combines pretraining and inference budgets into a single optimization formula that accounts for both the one-time training cost and the recurring cost of repeated inference queries. The researchers validated the approach on over 100 language models ranging from 5 million to 901 million parameters, training 21 new heavily overtrained checkpoints from scratch.
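The article does not reproduce the paper's exact formula, but the shape of such a joint budget can be sketched with the standard back-of-envelope approximations (about 6·N·D FLOPs for training and about 2·N FLOPs per generated token at inference). All variable names and the example numbers below are illustrative assumptions.

```python
def total_flops(n_params: float, n_train_tokens: float,
                n_queries: float, tokens_per_sample: float,
                samples_per_query: int) -> float:
    """Rough end-to-end compute: one-time training cost plus
    inference cost that grows with queries and samples per query."""
    train = 6.0 * n_params * n_train_tokens          # ~6*N*D training FLOPs
    infer = (2.0 * n_params * n_queries              # ~2*N FLOPs per token,
             * tokens_per_sample * samples_per_query)  # repeated k times
    return train + infer

# A smaller overtrained model sampled 8x per query can still undercut
# a 5x larger model sampled once, for the same data and workload:
small = total_flops(1e9, 100e9, n_queries=1e6, tokens_per_sample=1000,
                    samples_per_query=8)
big = total_flops(5e9, 100e9, n_queries=1e6, tokens_per_sample=1000,
                  samples_per_query=1)
print(small < big)  # True
```

The point of the joint formula is visible here: once inference volume is high, the training term stops dominating, so shrinking the model (and overtraining it) frees budget for repeated sampling.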

Experiments showed highly overtrained small models consistently outperformed larger, Chinchilla-optimal models across eight evaluation tasks when test-time sampling costs were considered. The compute-optimal strategy shifts decisively toward compact models trained on extensive data.
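One way to see why repeated sampling lets a weaker-per-sample model compete: if a single sample solves a task with probability p, then k independent samples succeed with probability 1 − (1 − p)^k, the standard pass@k-style coverage estimate. The article does not specify the paper's exact evaluation metric, so this is illustrative only.

```python
def coverage(p: float, k: int) -> float:
    """Probability that at least one of k independent samples succeeds,
    given per-sample success probability p."""
    return 1.0 - (1.0 - p) ** k

# A model that is right 30% of the time per sample, sampled 8 times,
# overtakes a model that is right 50% of the time but sampled once:
print(round(coverage(0.3, 8), 3))  # 0.942
print(coverage(0.5, 1))            # 0.5
```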

According to co-author Nicholas Roberts, the framework is tailored to reasoning-heavy applications such as coding, where repeated sampling serves as a test-time scaling method, rather than knowledge-heavy applications such as chat models.

The research team plans to open-source their checkpoints and code, enabling enterprises to test the scaling behavior with their own data. The findings suggest that strong reasoning models may not require massive compute budgets but rather good data and smart allocation of training and inference resources.

## Sources

- [VentureBeat](https://venturebeat.com/orchestration/train-to-test-scaling-explained-how-to-optimize-your-end-to-end-ai-compute-budget-for-inference)

---
Canonical: https://techandbusiness.org/newswire/08EUFJXk3wQgRnqiEk8qJG
Retrieved: 2026-04-19T03:06:07.125Z
Publisher: Tech & Business (techandbusiness.org)
