Improving AI models' ability to explain their predictions

MIT researchers developed a method to improve concept bottleneck models, a class of AI systems that base each prediction on a set of human-understandable concepts. The technique extracts concepts that a pretrained computer vision model has already learned and converts them into explanations that humans can understand. A sparse autoencoder first identifies the relevant features inside the target model. A multimodal large language model then describes these features in plain language and annotates images, and those annotations are used to train a concept recognition module. When tested on bird species prediction and skin lesion identification, the method achieved the highest accuracy among the compared approaches. The concepts it produced were also more applicable to the images in each dataset. To keep explanations understandable, the model is restricted to using only five concepts per prediction. The research involves collaborators from the Polytechnic University of Milan and the MIT Computer Science and Artificial Intelligence Laboratory. It will be presented at the International Conference on Learning Representations.
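To make the five-concept restriction concrete, here is a minimal Python sketch of a prediction that is only allowed to see a few concepts. It is an illustration under stated assumptions, not the researchers' implementation: the function names, the rule of keeping the highest-scoring concepts, the linear classifier, and all numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def top_k_concepts(scores: Sequence[float], k: int = 5) -> List[int]:
    """Return the indices of the k highest concept scores, highest first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def predict_with_bottleneck(
    scores: Sequence[float],
    weights: Sequence[Sequence[float]],
    k: int = 5,
) -> Tuple[int, List[int]]:
    """Classify using only the k kept concepts (assumed selection rule).

    scores:  one score per concept for a single image (hypothetical values).
    weights: one row per class, one weight per concept (hypothetical values).
    """
    kept = top_k_concepts(scores, k)
    # Concepts outside `kept` contribute nothing to any class score.
    logits = [sum(row[i] * scores[i] for i in kept) for row in weights]
    predicted = max(range(len(logits)), key=lambda c: logits[c])
    return predicted, kept


# Eight made-up concept scores for one image and two made-up classes.
scores = [0.9, 0.1, 0.8, 0.05, 0.7, 0.6, 0.5, 0.02]
weights = [
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # class 0
    [0.0, 5.0, 0.0, 5.0, 1.0, 1.0, 0.0, 0.0],  # class 1
]
predicted, kept = predict_with_bottleneck(scores, weights, k=5)
```

Because the prediction depends only on the concepts in `kept`, the explanation for any single prediction can list at most five items, which is the property the restriction is meant to guarantee.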
