# Study: Platforms that rank the latest LLMs can be unreliable

_Friday, June 26, 2026 at 9:58 PM EDT · AI · Latest · Tier 2 — Notable_

![Study: Platforms that rank the latest LLMs can be unreliable — Primary](https://news.mit.edu/sites/default/files/images/202602/MIT-LLM-Rankings-01-press.jpg)

MIT researchers found that platforms ranking large language models can be skewed by a small number of user interactions. Their study shows that removing a tiny fraction of crowdsourced data can change which models rank at the top.

The researchers developed a fast approximation method to test these platforms and pinpoint the individual votes most responsible for shifts in rankings. In one case involving more than 57,000 votes, dropping just two votes changed which model ranked first. A separate platform that uses expert annotators and higher-quality prompts proved more robust: flipping its top result required removing 83 of 2,575 evaluations, or about 3 percent.
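To make the idea of a ranking that flips when a few votes are dropped concrete, here is a minimal toy sketch in Python. It is not the researchers' approximation method, which the article does not describe in detail. It assumes a simplified leaderboard that ranks models by raw win rate over pairwise votes and uses a naive greedy search; the function names (`win_rates`, `top_model`, `drops_needed_to_flip`) and the vote data are hypothetical.

```python
from collections import Counter

def win_rates(votes):
    """Return each model's share of the pairwise matchups it won."""
    wins, games = Counter(), Counter()
    for winner, loser in votes:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return {model: wins[model] / games[model] for model in games}

def top_model(votes):
    """Return the model with the highest win rate (ties broken alphabetically)."""
    rates = win_rates(votes)
    return min(rates, key=lambda model: (-rates[model], model))

def drops_needed_to_flip(votes, max_drops=10):
    """Greedily drop votes won by the current leader until the leader changes.

    Returns the number of votes dropped, or None if max_drops is not enough.
    This is a naive illustration, not an efficient or optimal search.
    """
    leader = top_model(votes)
    remaining = list(votes)
    for dropped in range(1, max_drops + 1):
        # Find the first remaining vote that the original leader won.
        idx = next((i for i, (w, _) in enumerate(remaining) if w == leader), None)
        if idx is None:
            return None
        del remaining[idx]
        if not remaining:
            return None
        if top_model(remaining) != leader:
            return dropped
    return None

# Hypothetical (winner, loser) votes for three made-up models.
votes = (
    [("A", "C")] * 5
    + [("B", "C")] * 4
    + [("C", "A")]
    + [("C", "B")]
)

print(top_model(votes))             # A
print(drops_needed_to_flip(votes))  # 2
```

In this toy data, model A leads model B by a narrow margin, so removing two of A's wins is enough to put B on top. Real platforms typically fit a statistical model to the votes rather than using raw win rates, so the sketch only conveys the intuition behind the sensitivity test.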

Tamara Broderick, an associate professor at MIT and senior author of the study, said the platforms proved more sensitive than expected. She noted that if the top-ranked model depends on only two or three pieces of feedback out of tens of thousands, users cannot assume it will consistently outperform others when deployed. The work will be presented at the International Conference on Learning Representations.

The researchers suggest platforms gather more detailed feedback, such as confidence levels for each vote, to reduce the impact of noise or user error. They also propose using human mediators to assess crowdsourced responses. The study was funded in part by the Office of Naval Research, the MIT-IBM Watson AI Lab, the National Science Foundation, Amazon, and a CSAIL seed award.

## Sources

- [MIT News](https://news.mit.edu/2026/study-platforms-rank-latest-llms-can-be-unreliable-0209)

---
Canonical: https://techandbusiness.org/newswire/WMYow9Ig064KslncDOg7lI
Retrieved: 2026-06-27T06:23:02.867Z
Publisher: Tech & Business (techandbusiness.org)
