Measuring the Sensitivity of Classification Models with the Error Sensitivity Profile
TLDR
This paper introduces the Error Sensitivity Profile (ESP) to quantify how data errors impact classification model performance, aiding data cleaning.
Key contributions
- Proposes the Error Sensitivity Profile (ESP) to measure model sensitivity to data errors.
- ESP quantifies performance degradation from errors in single or multiple features.
- Enables prioritization of data-cleaning efforts based on error impact.
- Introduces `\dirty`, a tool suite for computing the Error Sensitivity Profile.
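The idea behind the profile can be sketched in a few lines: corrupt one feature at a time, retrain, and record the drop in test accuracy. This is a minimal illustration, not the paper's actual method; the error model (zeroing a random fraction of cells), the function name, and the scikit-learn setup are all assumptions for the sake of the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def error_sensitivity(model, X_train, y_train, X_test, y_test,
                      feature, error_rate=0.2, seed=0):
    """Illustrative stand-in for an ESP entry: inject errors into one
    feature of the training data (here, zero out a random fraction of
    cells) and return the resulting drop in test accuracy."""
    rng = np.random.default_rng(seed)
    baseline = accuracy_score(y_test, model.fit(X_train, y_train).predict(X_test))
    X_dirty = X_train.copy()
    mask = rng.random(len(X_dirty)) < error_rate  # rows to corrupt
    X_dirty[mask, feature] = 0.0                  # hypothetical error model
    corrupted = accuracy_score(y_test, model.fit(X_dirty, y_train).predict(X_test))
    return baseline - corrupted

# Build a toy profile over all features of a synthetic dataset.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
profile = [error_sensitivity(LogisticRegression(max_iter=1000),
                             X_tr, y_tr, X_te, y_te, f)
           for f in range(X.shape[1])]
```

A real profile would sweep error types and rates per feature (and feature combinations), but even this toy version shows how the resulting numbers can rank features by cleaning priority.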
Why it matters
Data quality is crucial for ML, but identifying which errors actually hurt a model is hard. ESP provides a systematic way to quantify error impact and prioritize data-cleaning effort, improving model reliability and efficiency by focusing work where it matters most.
Original Abstract
The quality of training data is critical to the performance of machine learning models. In this paper, the Error Sensitivity Profile (ESP) is proposed. It quantifies the sensitivity of model performance to errors in a single feature or in multiple features. By leveraging ESP, data-cleaning efforts can be prioritized based on error types and features most likely to affect model performance. To support the computation of this metric, an integrated suite of tools, called \dirty, is created. We conduct an extensive experimental study on two widely used datasets using 14 classification models, revealing that performance degradation is not always predictable from simple correlations with the target variable.