This is the official repository for our CVPR 2022 paper: Scalable Penalized Regression for Noise Detection in Learning with Noisy Labels.
SPR is a theoretically guaranteed framework that detects and removes noisy data when learning with noisy labels. It fits a penalized regression that models the linear relation between network features and one-hot labels; the noisy data are the samples whose mean-shift parameters are non-zero in the solution of the regression model. We provide a non-asymptotic probabilistic condition under which SPR correctly identifies the noisy data. SPR can also be combined with a semi-supervised algorithm to further exploit the detected noisy data as unlabeled data.
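To make the mean-shift idea concrete, here is a minimal NumPy sketch. It is not the implementation in this repository. It assumes the model Y = XB + G + noise, where each row of G is the mean-shift parameter of one sample, and it penalizes the row norms of G with a weight `lam`. The solver (alternating least squares and row-wise soft-thresholding), the function names, the toy data, and the value of `lam` are all illustrative assumptions.

```python
import numpy as np


def make_toy_data(n=300, d=8, c=3, n_flip=30, seed=0):
    """Hypothetical toy data: features are nearly linear in the true class,
    and n_flip samples get a wrong one-hot label."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, c, size=n)
    A = rng.normal(size=(c, d))                      # class-to-feature map
    X = np.eye(c)[labels] @ A + 0.01 * rng.normal(size=(n, d))
    flipped = rng.choice(n, size=n_flip, replace=False)
    noisy = labels.copy()
    noisy[flipped] = (noisy[flipped] + 1) % c        # always a different class
    Y = np.eye(c)[noisy]                             # noisy one-hot labels
    return X, Y, np.sort(flipped)


def detect_noisy_rows(X, Y, lam, n_iter=100):
    """Minimize 0.5 * ||Y - X B - G||_F^2 + lam * sum_i ||G_i||_2
    by alternating between B and G; return rows with non-zero G_i."""
    X_pinv = np.linalg.pinv(X)
    G = np.zeros_like(Y, dtype=float)
    for _ in range(n_iter):
        B = X_pinv @ (Y - G)                         # least squares given G
        R = Y - X @ B                                # residual of each sample
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        # Row-wise soft-thresholding: rows with residual norm <= lam become 0.
        scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
        G = scale * R
    return np.flatnonzero(np.linalg.norm(G, axis=1) > 0)


if __name__ == "__main__":
    X, Y, flipped = make_toy_data()
    detected = detect_noisy_rows(X, Y, lam=0.5)
    print("flipped :", flipped)
    print("detected:", detected)
```

In this formulation a larger `lam` shrinks more rows of G to zero, so fewer samples are flagged; a smaller `lam` flags more.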
python==3.7.6
numpy==1.19.1
scipy==1.6.0
scikit-learn==0.23.2
torch==1.5.1
torchvision==0.6.0a0+35d732a
MNIST and CIFAR-10 can be downloaded using torchvision. The other two datasets can be downloaded from their official links: ANIMAL10, WebVision.
The datasets are expected to be stored in the folder ../data, or in the location given by the root parameter, and arranged as follows:
│data/
├── MNIST/
│ ├── ......
├── CIFAR10/
│ ├── ......
├── animal10/
│ ├── training/
│ │ ├── ......
│ ├── testing/
│ │ ├── ......
├── webvision/
│ ├── info/
│ │ ├── ......
│ ├── google/
│ │ ├── ......
│ ├── val_images_256/
│ │ ├── ......
(Optional)
├── imagenet/
│ ├── meta.mat
│ ├── ILSVRC2012_validation_ground_truth.txt
│ ├── val/
│ │ ├── ......
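Before launching a run, it can help to confirm that the folders are where the code expects them. The following is a small sketch, not part of the repository: the helper name `missing_dataset_dirs` is hypothetical, the folder names are copied from the tree above, and the optional imagenet/ folder is deliberately not checked.

```python
from pathlib import Path

# Sub-folders copied from the layout above (imagenet/ is optional, so it is skipped).
EXPECTED_DIRS = [
    "MNIST",
    "CIFAR10",
    "animal10/training",
    "animal10/testing",
    "webvision/info",
    "webvision/google",
    "webvision/val_images_256",
]


def missing_dataset_dirs(root="../data"):
    """Return the expected sub-folders that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All expected dataset folders were found.")
```

If you only use some of the datasets, the folders for the others will be reported as missing; that is expected and can be ignored.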
The pretrained models can be downloaded from here and should be placed in the folder ckpt.
Example training commands are listed in the folder scripts. You can start with the following commands.
Note: To train with SPR but without using CutMix, you should set --cutmix 1 and --cutmix_prob 0.
Train SPR on MNIST with different noise settings:
python scripts/train_mnist.py
Train SPR on CIFAR10 with different noise settings:
python scripts/train_cifar.py
Train SPR on Animal10:
python scripts/train_animal.py
Train SPR on WebVision:
python scripts/train_webvision.py
Example evaluation commands are listed in the folder scripts. You can start with the following commands.
Test SPR on MNIST with different noise settings:
python scripts/eval_mnist.py
Test SPR on CIFAR10 with different noise settings:
python scripts/eval_cifar.py
Test SPR on Animal10:
python scripts/eval_animal.py
Test SPR on WebVision:
python scripts/eval_webvision.py
Thanks to everyone who makes their code and models available. In particular,
For issues using SPR, please submit a GitHub issue.
If you find the provided code useful, please cite our work:
@inproceedings{wang2022scalable,
title={Scalable Penalized Regression for Noise Detection in Learning with Noisy Labels},
author={Wang, Yikai and Sun, Xinwei and Fu, Yanwei},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2022}
}