Improving AI models' ability to explain their predictions
MIT researchers developed a method to improve concept bottleneck models used in artificial intelligence systems. The technique extracts concepts that a pretrained computer vision model has already learned and converts them into explanations that humans can understand.
The approach employs a sparse autoencoder to identify relevant features from the target model. A multimodal large language model then describes these features in plain language and annotates images to train a concept recognition module.
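To make the first step more concrete, here is a minimal sketch of what a sparse autoencoder does in general: it maps a model's internal activation vector to a larger set of mostly-zero features, then reconstructs the activation from them. The story does not describe the researchers' architecture or training objective, so everything below (the function names, the tied decoder weights, the L1 penalty, and the toy numbers) is an assumption chosen for illustration, not their implementation.

```python
# Illustrative sketch only: a generic sparse autoencoder forward pass.
# All names, weights, and the L1 penalty are assumptions, not the
# researchers' actual design.

def encode(activation, weights, bias):
    """Project an activation vector onto candidate features, then apply ReLU.

    ReLU zeroes out negative responses, so only some features fire.
    """
    pre = [sum(w * a for w, a in zip(row, activation)) + b
           for row, b in zip(weights, bias)]
    return [max(0.0, p) for p in pre]


def decode(features, weights):
    """Reconstruct the activation using the transposed encoder weights."""
    width = len(weights[0])
    return [sum(f * row[j] for f, row in zip(features, weights))
            for j in range(width)]


def loss(activation, weights, bias, l1=0.1):
    """Squared reconstruction error plus an L1 penalty that rewards sparsity."""
    features = encode(activation, weights, bias)
    recon = decode(features, weights)
    error = sum((a - r) ** 2 for a, r in zip(activation, recon))
    return error + l1 * sum(features)


# Toy example: a 2-dimensional activation and 3 candidate features.
toy_activation = [1.0, -2.0]
toy_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_bias = [0.0, 0.0, 0.0]
toy_features = encode(toy_activation, toy_weights, toy_bias)
```

In this toy case only the first feature is active. In the pipeline the story describes, features like these are what the multimodal large language model would then describe in plain language.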
When tested on bird species prediction and skin lesion identification, the method achieved the highest accuracy among compared approaches. It also produced concepts more applicable to the dataset images.
The method restricts the model to five concepts per prediction so that explanations remain understandable.
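The five-concept limit can be pictured as a top-k filter applied before the final prediction. The sketch below is a hedged illustration of that idea, assuming a simple linear scoring step. The concept names, class labels, scores, and weights are all made up for the example, and the story does not say how the researchers enforce the limit.

```python
# Illustrative sketch only: keep the k strongest concepts and predict from
# those alone. Concept names, labels, scores, and weights are hypothetical.

def keep_top_k(concept_scores, k=5):
    """Zero out every concept score except the k highest."""
    ranked = sorted(concept_scores, key=concept_scores.get, reverse=True)
    kept = set(ranked[:k])
    return {name: (score if name in kept else 0.0)
            for name, score in concept_scores.items()}


def predict(concept_scores, class_weights, k=5):
    """Score each class using only the surviving concepts.

    Returns the best-scoring label and the concepts that contributed,
    which double as the explanation.
    """
    sparse = keep_top_k(concept_scores, k)
    totals = {label: sum(weights.get(name, 0.0) * score
                         for name, score in sparse.items())
              for label, weights in class_weights.items()}
    best = max(totals, key=totals.get)
    explanation = [name for name, score in sparse.items() if score > 0.0]
    return best, explanation


# Hypothetical concept scores for one bird image (six concepts detected).
example_scores = {
    "red crown": 0.9, "black wings": 0.8, "short beak": 0.7,
    "white belly": 0.6, "forked tail": 0.5, "yellow legs": 0.4,
}
# Hypothetical per-class weights over concepts.
example_weights = {
    "species A": {"red crown": 1.0, "black wings": 1.0},
    "species B": {"yellow legs": 5.0, "forked tail": 1.0},
}
example_label, example_explanation = predict(example_scores, example_weights)
```

Here the weakest concept, "yellow legs", is dropped even though one class weights it heavily, so the prediction rests only on the five concepts a reader would be shown.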
The research involves collaborators from Polytechnic University of Milan and the MIT Computer Science and Artificial Intelligence Laboratory. It will be presented at the International Conference on Learning Representations.
Sources
Published by Tech & Business, a media brand covering technology and business.
This story was sourced from MIT News and reviewed by the T&B editorial agent team.