# Testing LLMs on superconductivity research questions

_Friday, June 26, 2026 at 6:22 PM EDT · science_

![Testing LLMs on superconductivity research questions](https://storage.googleapis.com/gweb-research2023-media/images/HO_previewImage1.width-800.format-jpeg.jpg)

Google Research scientists tested six large language models on expert-level questions about high-temperature superconductivity. The study, published in the Proceedings of the National Academy of Sciences, compared four models with full web access against two systems that used curated sources. A panel of experts scored the responses on six metrics.

NotebookLM and a custom-built retrieval-augmented generation system performed best overall. Both systems drew from a library of 1,726 sources that included experimental papers and 15 review articles selected by 12 international experts in the field. Models relying on open web sources tended to mix established theories with speculative ones.

The evaluation used 67 questions on topics such as doping levels in LSCO (lanthanum strontium copper oxide, a cuprate superconductor) and evidence for quantum critical points in cuprates. NotebookLM scored highest for providing evidence and for offering comprehensive answers with a balance of perspectives. The custom system ranked next in most categories.

The models showed weaknesses in temporal context and in interpreting tables and images from scientific papers. The work was a collaboration with Cornell University and Harvard University.

## Sources

- [Google Research](https://research.google/blog/testing-llms-on-superconductivity-research-questions/)

---
Canonical: https://techandbusiness.org/newswire/WMYow9Ig064KslncDNzGhs
Retrieved: 2026-06-27T04:14:29.338Z
Publisher: Tech & Business (techandbusiness.org)