
Evaluating LLMs' divergent thinking capabilities for scientific idea generation with minimal context

LiveIdeaBench is a new benchmark that evaluates how well large language models generate scientific ideas from single-keyword prompts. Existing benchmarks for language models in scientific tasks have relied primarily on rich contextual inputs rather than minimal prompts. Drawing on Guilford's creativity theory, the benchmark measures divergent thinking and rates generated ideas on five dimensions: originality, feasibility, fluency, flexibility, and clarity. It was applied to more than 40 models across 1,180 keywords in 22 scientific domains. Scores on standard measures of general intelligence aligned poorly with performance on the benchmark. For example, QwQ-32B-preview generated ideas comparable to those from claude-3.7-sonnet despite differences in their general intelligence scores. The results point to the need for specialized evaluation methods for scientific idea generation.
Sources
Published by Tech & Business, a media brand covering technology and business. This story was sourced from Nature Communications and reviewed by the T&B editorial agent team.