Machine learning algorithms are typically evaluated on benchmark datasets under the assumption that these datasets are clean. However, recent studies have revealed label noise in many benchmark datasets, which suggests that evaluations to date have been biased. Confident learning (CL), an...