NVIDIA AI-Q deep research agent reaches #1 on DeepResearch Bench I and II using Nemotron models
The NVIDIA AI-Q deep research agent achieved first place on DeepResearch Bench with a score of 55.95. It also led DeepResearch Bench II with a score of 54.50. Both benchmarks assess research agents on report quality, information recall, analysis and presentation.
The agent uses a multi-agent architecture with planner, researcher and orchestrator components. It runs on the NVIDIA NeMo Agent Toolkit and fine-tuned NVIDIA Nemotron 3 Super models. Specialist subagents handle evidence gathering, causal exploration, benchmarking, critique and trend scanning in parallel.
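The planner, researcher and orchestrator split described above can be pictured with a minimal Python sketch. It is not the NVIDIA NeMo Agent Toolkit API and not the AI-Q implementation: the functions `plan`, `research` and `orchestrate` are hypothetical names, and the researcher is a stub that returns a placeholder string instead of calling a model. Only the five specialist roles come from the article.

```python
import asyncio

# Specialist roles named in the article; everything else is hypothetical.
SPECIALISTS = [
    "evidence gathering",
    "causal exploration",
    "benchmarking",
    "critique",
    "trend scanning",
]

def plan(question: str) -> list[str]:
    """Hypothetical planner: create one task per specialist role."""
    return [f"{role}: {question}" for role in SPECIALISTS]

async def research(task: str) -> str:
    """Hypothetical researcher stub; a real one would call a model and tools."""
    await asyncio.sleep(0)  # yields control, standing in for network I/O
    return f"brief for [{task}]"

async def orchestrate(question: str) -> list[str]:
    """Run all specialist tasks concurrently and return briefs in task order."""
    tasks = plan(question)
    return await asyncio.gather(*(research(t) for t in tasks))

briefs = asyncio.run(orchestrate("How do deep research agents differ?"))
```

`asyncio.gather` is used here only because it runs the subagent coroutines concurrently and keeps results in input order, which makes the later merge step easier to reason about.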
Training used about 67,000 trajectories drawn from open datasets such as OpenScholar, ResearchQA and Fathom-DeepResearch-SFT. An NVIDIA Nemotron-3-Super-120B-A12B model received supervised fine-tuning for one epoch across 16
An optional ensemble merges outputs from parallel agents. A report refiner step can draw on raw researcher briefs to improve the final document.
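As a rough illustration of these two steps, the sketch below merges several candidate reports and then refines the result against raw briefs. The article does not say how AI-Q merges or refines, so this swaps in a plain deduplication rule as a stand-in: `merge_reports` and `refine` are hypothetical helpers, and reports are modeled simply as lists of paragraphs.

```python
def merge_reports(reports: list[list[str]]) -> list[str]:
    """Union of paragraphs across candidate reports; first occurrence wins."""
    seen: set[str] = set()
    merged: list[str] = []
    for report in reports:
        for para in report:
            key = para.strip().lower()  # case-insensitive duplicate check
            if key and key not in seen:
                seen.add(key)
                merged.append(para.strip())
    return merged

def refine(report: list[str], raw_briefs: list[str]) -> list[str]:
    """Append any raw brief whose text does not already appear in the report."""
    text = " ".join(report).lower()
    extra = [b.strip() for b in raw_briefs if b.strip().lower() not in text]
    return report + extra

# Two hypothetical parallel agents produced overlapping reports.
candidates = [["Finding A.", "Finding B."], ["finding b.", "Finding C."]]
merged = merge_reports(candidates)
final = refine(merged, ["Finding C.", "Detail D."])
```

A production refiner would more plausibly ask a model to rewrite the report with the briefs as context; exact string matching is used here only to keep the example self-contained.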
Sources
Published by Tech & Business, a media brand covering technology and business.
This story was sourced from NVIDIA and reviewed by the T&B editorial agent team.