# Alibaba's Metis agent reduces redundant AI tool calls by 96%

_Thursday, April 30, 2026 at 8:19 PM EDT · AI · Latest · Tier 2 — Notable_

![Alibaba's Metis agent reduces redundant AI tool calls by 96% — Primary](https://images.ctfassets.net/jdtwqhzvc2n1/5adrVJG12DsZYPv3bAT3Kk/786e22dcb26f295b11a3de9d91a97ac3/LLM_tool-use_abstention.jpg?w=800&q=75)

Researchers at Alibaba have developed a new training framework that dramatically reduces unnecessary AI tool calls while maintaining accuracy. The system, called Hierarchical Decoupled Policy Optimization (HDPO), trains agent models to balance execution efficiency with task correctness.

Current AI agents often suffer from what the researchers call a "metacognitive deficit": they struggle to decide when to rely on internal knowledge versus querying external tools. The models tend to invoke tools and APIs indiscriminately, creating latency bottlenecks, unnecessary costs, and degraded reasoning from environmental noise.

Previous reinforcement learning methods attempted to address this by combining task accuracy and execution efficiency into a single reward signal. The researchers found this creates an optimization dilemma: if efficiency penalties are too strict, models suppress necessary tool use and sacrifice correctness; if too lenient, the signal fails to prevent tool overuse.
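The dilemma can be sketched with a toy combined reward. Everything here is illustrative (the weight name `lambda_eff` and the reward terms are assumptions, not the paper's formulation); it only shows why one penalty weight cannot serve both goals.

```python
# Sketch of the single-reward approach the researchers critique: accuracy
# and efficiency are fused into one scalar, so a single penalty weight
# governs both objectives at once.

def combined_reward(correct: bool, tool_calls: int, lambda_eff: float) -> float:
    """Single-channel reward: task accuracy minus a per-call efficiency penalty."""
    return (1.0 if correct else 0.0) - lambda_eff * tool_calls

# With a strict penalty, a correct answer that genuinely needed 3 tool
# calls scores below a wrong answer that used none:
strict = combined_reward(correct=True, tool_calls=3, lambda_eff=0.5)   # -0.5
lazy = combined_reward(correct=False, tool_calls=0, lambda_eff=0.5)    # 0.0
```

A lenient `lambda_eff` flips the failure mode: the penalty becomes too small to discourage redundant calls at all.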

HDPO separates accuracy and efficiency into independent optimization channels. The accuracy channel maximizes task correctness across all model rollouts, while the efficiency channel minimizes unnecessary tool calls. Training signals are computed independently and only combined at the final loss computation stage. This design prevents incorrect responses from being rewarded simply for being fast or using fewer tools.
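The decoupling can be sketched as two independently computed signals that meet only at the final loss. This is a minimal illustration under assumptions of our own (the correctness gate on the efficiency term and the weights are hypothetical, not the paper's exact loss); it captures the stated property that an incorrect rollout is never rewarded for using fewer tools.

```python
from dataclasses import dataclass

@dataclass
class Rollout:
    correct: bool
    tool_calls: int

def hierarchical_loss(rollouts: list[Rollout], w_eff: float) -> float:
    # Accuracy channel: fraction of correct rollouts (computed independently).
    acc_signal = sum(1.0 if r.correct else 0.0 for r in rollouts) / len(rollouts)
    # Efficiency channel: mean tool calls, counted only over correct rollouts,
    # so fewer calls never compensate for a wrong answer.
    n_correct = sum(r.correct for r in rollouts)
    eff_signal = sum(r.tool_calls for r in rollouts if r.correct) / max(1, n_correct)
    # The two channels are combined only here, at the final loss stage.
    return -acc_signal + w_eff * eff_signal
```

With this gating, a batch of fast but wrong rollouts contributes nothing through the efficiency channel; only correct rollouts can improve their score by trimming calls.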

The framework also creates an implicit cognitive curriculum. Early in training, accuracy dominates as the model learns correct reasoning. As reasoning capabilities mature, the efficiency signal scales up, allowing the model to refine its self-reliance by avoiding redundant API calls.
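One simple way to realize such a curriculum is a ramped weight on the efficiency signal; the linear schedule below is an assumption for illustration, not the mechanism the paper describes (where the shift is implicit in the training dynamics).

```python
def efficiency_weight(step: int, total_steps: int, w_max: float = 1.0) -> float:
    """Scale the efficiency signal from 0 toward w_max as training progresses,
    so accuracy dominates early and self-reliance is refined later."""
    return w_max * min(1.0, step / total_steps)
```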

To support HDPO, the researchers built a multi-stage data curation pipeline for both supervised fine-tuning and reinforcement learning. The pipeline filters tool-augmented multimodal trajectory datasets to remove low-quality examples containing execution failures or inconsistencies.
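A filter in the spirit of that pipeline might look like the following; the field names (`tool_steps`, `status`, `final_answer`, `reference_answer`) are hypothetical stand-ins for whatever schema the trajectory data actually uses.

```python
def keep_trajectory(traj: dict) -> bool:
    """Drop trajectories with tool execution failures or an answer
    inconsistent with the reference."""
    steps_ok = all(step.get("status") == "ok" for step in traj["tool_steps"])
    consistent = traj["final_answer"] == traj["reference_answer"]
    return steps_ok and consistent

sample = [
    {"tool_steps": [{"status": "ok"}], "final_answer": "42", "reference_answer": "42"},
    {"tool_steps": [{"status": "error"}], "final_answer": "42", "reference_answer": "42"},
]
clean = [t for t in sample if keep_trajectory(t)]  # keeps only the first
```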

The multimodal model trained with HDPO, called Metis, reduced redundant tool invocations from 98% to 2% while setting a new state of the art in reasoning accuracy on key industry benchmarks. The researchers say the approach enables responsive, cost-effective agentic systems that know when to abstain from using tools.

## Sources

- [VentureBeat](https://venturebeat.com/orchestration/alibabas-metis-agent-cuts-redundant-ai-tool-calls-from-98-to-2-and-gets-more-accurate-doing-it)

---
Canonical: https://techandbusiness.org/newswire/3j8mtG0mARJOasjzjH9k1I
Retrieved: 2026-05-01T02:32:15.853Z
Publisher: Tech & Business (techandbusiness.org)
