
Improving AI models' ability to explain their predictions

MIT researchers have developed a method to improve concept bottleneck models, a class of artificial intelligence systems that make predictions through human-readable concepts. The technique extracts concepts that a pretrained computer vision model has already learned and converts them into explanations that humans can understand. A sparse autoencoder first identifies the relevant features in the target model. A multimodal large language model then describes those features in plain language and annotates images, and the annotations are used to train a concept recognition module. When tested on bird species prediction and skin lesion identification, the method achieved the highest accuracy among the approaches compared and produced concepts that were more applicable to the images in each dataset. To keep explanations understandable, the method restricts the model to five concepts per prediction. The research involves collaborators from the Polytechnic University of Milan and the MIT Computer Science and Artificial Intelligence Laboratory. It will be presented at the International Conference on Learning Representations.
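To make the five-concept restriction concrete, here is a minimal sketch in Python of how a concept bottleneck prediction could be limited to a handful of concepts. It is an illustration under stated assumptions, not the researchers' implementation: the article does not say how the five concepts are chosen, so the sketch simply keeps the five strongest activations, and the concept names, scores, class weights, and the functions `top_k_concepts` and `predict` are all hypothetical.

```python
from typing import Dict, List, Tuple


def top_k_concepts(scores: Dict[str, float], k: int = 5) -> Dict[str, float]:
    """Keep the k concepts with the largest absolute activation (assumed rule)."""
    ranked = sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
    return dict(ranked[:k])


def predict(
    scores: Dict[str, float],
    class_weights: Dict[str, Dict[str, float]],
    k: int = 5,
) -> Tuple[str, List[str]]:
    """Classify using only the k retained concepts and return them as the explanation."""
    kept = top_k_concepts(scores, k)
    # Each class logit is a weighted sum over the retained concepts only.
    logits = {
        label: sum(weights.get(name, 0.0) * value for name, value in kept.items())
        for label, weights in class_weights.items()
    }
    best = max(logits, key=logits.get)
    return best, list(kept)


# Hypothetical concept activations for one bird image.
scores = {
    "red crest": 0.9,
    "black wings": 0.8,
    "short beak": 0.7,
    "long tail": 0.6,
    "webbed feet": 0.5,
    "white belly": 0.1,
    "yellow eye ring": 0.05,
}

# Hypothetical per-class weights over concepts.
class_weights = {
    "cardinal": {"red crest": 2.0, "short beak": 1.0, "black wings": 0.5},
    "gull": {"webbed feet": 2.0, "white belly": 2.0},
}

label, used = predict(scores, class_weights)
print(label, used)
```

Because the prediction depends only on the retained concepts, the list returned alongside the label is the complete explanation; the weakly activated concepts cannot influence the result.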
Published by Tech & Business, a media brand covering technology and business. This story was sourced from MIT News and reviewed by the T&B editorial agent team.