On the Properties of Feature Attribution for Supervised Contrastive Learning
Leonardo Arrighi, Julia Eva Belloni, Aurélie Gallet, Ivan Gentile, Matteo Lippi + 1 more
TLDR
This paper shows that Supervised Contrastive Learning (SCL) yields higher-quality feature attribution explanations for image classification models than training with Cross-Entropy (CE).
Key contributions
- Empirically demonstrates SCL improves feature attribution quality for image classification NNs.
- Highlights SCL's superiority over CE in attribution faithfulness, complexity, and continuity.
- Reinforces SCL's potential for developing more trustworthy and transparent neural networks.
- Offers guidance for practitioners on selecting training objectives for model transparency.
Why it matters
Supervised Contrastive Learning (SCL) is attractive for safety-critical AI thanks to its adversarial robustness and out-of-distribution detection. This paper shows SCL also yields higher-quality, more trustworthy feature attribution explanations than traditional Cross-Entropy training. This finding supports the development of more transparent AI and helps practitioners choose training objectives for explainable models.
Original Abstract
Most Neural Networks (NNs) for classification are trained using Cross-Entropy (CE) as a loss function. This approach requires the model to have an explicit classification layer. However, there exist alternative approaches, such as Contrastive Learning (CL). Instead of explicitly performing a classification, CL has the NN produce an embedding space where projections of similar data are pulled together, while projections of dissimilar data are pushed apart. In the case of Supervised CL (SCL), labels are adopted as similarity criteria, thus creating an embedding space where the projected data points are well-clustered. SCL provides crucial advantages over CE with regard to adversarial robustness and out-of-distribution detection, thus making it a more natural choice in safety-critical scenarios. In the present paper, we empirically show that NNs for image classification trained with SCL present higher-quality feature attribution explanations than CE with regard to faithfulness, complexity, and continuity. These results reinforce previous findings about CL-based approaches when targeting more trustworthy and transparent NNs and can guide practitioners in the selection of training objectives targeting not only accuracy, but also transparency of the models.
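To make the pull-together/push-apart objective in the abstract concrete, here is a minimal NumPy sketch of a SupCon-style supervised contrastive loss. The function name, the single-view simplification, and the omission of numerical-stability tricks are assumptions of this sketch, not details from the paper:

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Toy supervised contrastive loss over embeddings z of shape (n, d).

    For each anchor, embeddings with the same label (positives) are pulled
    together and all others are pushed apart via a temperature-scaled softmax.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # project onto unit sphere
    sim = (z @ z.T) / tau                              # pairwise cosine similarities
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # positives: same label as the anchor, excluding the anchor itself
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    # exclude self-similarity from the softmax denominator
    sim = np.where(self_mask, -np.inf, sim)
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # mean negative log-probability of positives per anchor
    # (anchors with no positives contribute 0; denominator clamped to 1)
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return per_anchor.mean()
```

As a sanity check, a batch whose same-label embeddings already point in similar directions should score a lower loss than one whose same-label embeddings point apart, mirroring the well-clustered embedding space SCL aims for.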