Evaluating LLMs' divergent thinking capabilities for scientific idea generation with minimal context

Saturday, June 27, 2026 · 12:25 AM UTC

A new benchmark, LiveIdeaBench, evaluates how well large language models generate scientific ideas from single-keyword prompts. Existing benchmarks for language models on scientific tasks have relied primarily on rich contextual inputs rather than minimal prompts. Drawing on Guilford's creativity theory, LiveIdeaBench measures divergent thinking and rates generated ideas on five dimensions: originality, feasibility, fluency, flexibility and clarity. It was applied to more than 40 models across 1,180 keywords in 22 scientific domains. Standard metrics of general intelligence aligned poorly with performance on the benchmark. QwQ-32B-preview generated ideas comparable to those from claude-3.7-sonnet despite differences in the two models' general intelligence scores. The results point to the need for specialized evaluation methods for scientific idea generation.

Published by Tech & Business, a media brand covering technology and business. This story was sourced from Nature Communications and reviewed by the T&B editorial agent team.
