ICPR2020 Paper link
Neural Networks Paper link
- Requirements
- Dataset
- Folders
- How-to-train
- Usage-of-P-DIFF-layer
- Complexity-of-P-DIFF
- Experiment-environment
- Experiment-settings
- Model-list
Training and testing datasets:
The structure of the code folders:
File/Folder | Description |
---|---|
README.md | The detailed instruction for P-DIFF reproduction. |
train.sh | The training entry script of P-DIFF. |
test.sh | The testing entry script of P-DIFF. |
caffe | The compiled official caffe repo. |
code | Data downloading and processing scripts are saved in this folder. |
data | The training and testing datasets used in paper. |
layer | The implementation of the P-DIFF layer in Caffe. |
log | The folder used to save training logs. |
models | The folder used to save training models. |
prototxt | The prototxt files used to train or test models in different datasets. |
We demonstrate the training process on the cifar10 dataset with 50% symmetric noise.
Pipeline:
Step 1. Clone the caffe repo into the ./caffe folder and compile it after installing its requirements.
cd caffe
mkdir build
cd build
cmake ..
make -j8
Step 2. Add the P-DIFF layer to the caffe layers and recompile the caffe project (see the sketch below).
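A minimal sketch of this step, run from the repository root and assuming ./layer ships the usual Caffe layer sources (header and .cpp/.cu files); the actual file names may differ, so check ./layer first. If ./layer also provides a patch that registers p_diff_param in caffe.proto, apply it to ./caffe/src/caffe/proto/caffe.proto before rebuilding.
cp ./layer/*.hpp ./caffe/include/caffe/layers/
cp ./layer/*.cpp ./caffe/src/caffe/layers/
cd ./caffe/build
make -j8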
Step 3. Download the mnist, cifar-10, cifar-100 and cloth1m datasets. (You can contact the authors to obtain miniimage.)
python ./code/download.py --dataset=cifar10
Step 4. Corrupt the labels of the training dataset with the ./code/corrupt.py script.
python ./code/corrupt.py --dataset=cifar10 --noise_type=SYMMETRY --noise_rate=0.50
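The exact logic of ./code/corrupt.py is not reproduced here; the snippet below is only a sketch of how symmetric label noise is commonly injected (a fraction noise_rate of the labels is flipped uniformly to a different class). The function name corrupt_symmetric and its signature are illustrative, not the script's API.

```python
import numpy as np

def corrupt_symmetric(labels, noise_rate, num_classes, seed=0):
    """Flip a fraction `noise_rate` of labels uniformly to another class."""
    rng = np.random.RandomState(seed)
    labels = np.asarray(labels).copy()
    n = len(labels)
    flip_idx = rng.choice(n, size=int(noise_rate * n), replace=False)
    for i in flip_idx:
        choices = [c for c in range(num_classes) if c != labels[i]]
        labels[i] = rng.choice(choices)
    noise_flag = np.zeros(n, dtype=np.int64)
    noise_flag[flip_idx] = 1  # could serve as the optional second (noise) label
    return labels, noise_flag

# e.g. cifar10 with 50% symmetric noise:
# noisy_labels, noise_flag = corrupt_symmetric(train_labels, 0.50, num_classes=10)
```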
Step 5. Generate the corresponding lmdb with caffe's converting tool (multi-label support is required).
bash ./code/convert.sh cifar10 SYMMETRY 50
Step 6. Configure the training dataset path in the train_val.prototxt file.
Edit the ${noise_type} and ${noise_rate} parameters and the p_diff_layer settings in ./prototxt/train_val.prototxt.cifar10.
Step 7. Train the model with caffe.
bash ./train.sh cifar10
Step 8. Test the model with caffe.
bash ./test.sh cifar10 SYMMETRY 50
The usage of the P-DIFF layer in train_val.prototxt is described below:
layer {
name: "fix_prob"
type: "PDIFF"
# input of this layer
# bottom[0] is used for the forward pass of the network
bottom: "prob"
# bottom[1] is used for computing sample weights; it may differ from bottom[0]
bottom: "prob"
# bottom[2] is the class label, which indicates the sample's category
bottom: "label000"
# bottom[3] (optional) is the noise label, which indicates whether the sample is noisy;
# it is only used for drawing pdf_clean and pdf_noise, not for training.
# This second label is generated by the multi-label lmdb converter.
# It can be discarded in general.
#bottom: "label001"
# output of this layer
top: "fix_prob"
# parameters of this layer
p_diff_param {
# We use a queue to maintain the delta distribution; the queue size is slide_batch_num x batch_size
slide_batch_num: 100
# The number of iterations per epoch.
# Its value equals the total number of training samples divided by the batch size,
# here 50,000 / 128.
epoch_iters: 390
# This switch controls whether the noise rate is estimated automatically:
# true means this layer computes the noise rate automatically,
# false means this layer uses the specified noise rate.
use_auto_noise_ratio: false
# If use_auto_noise_ratio is false, we need to set the noise rate explicitly.
noise_ratio: 0.50
# Print training information such as the noise rate, the zeta threshold, the pcf and the weights,
# which are used for debugging and drawing figures.
# This switch is usually turned off.
#debug: true
# The prefix of the log file name that stores this information in debug mode.
#debug_prefix: "cifar10_noise_symmetric_50"
}
}
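For intuition, below is a rough NumPy sketch of the bookkeeping the layer performs, pieced together from the parameter comments above (a sliding queue of probability differences, a pcf histogram, and a zeta threshold derived from the noise ratio). It is not the Caffe C++ implementation, and the real layer uses a more refined weighting scheme than the hard 0/1 weights shown here; the class name PDiffSketch is illustrative only.

```python
import numpy as np
from collections import deque

class PDiffSketch:
    """Illustrative-only sketch of the P-DIFF bookkeeping."""

    def __init__(self, slide_batch_num=100, batch_size=128,
                 bin_size=200, noise_ratio=0.5):
        # sliding queue of probability differences, size slide_batch_num x batch_size
        self.queue = deque(maxlen=slide_batch_num * batch_size)
        self.bin_size = bin_size
        self.noise_ratio = noise_ratio

    def sample_weights(self, prob, labels):
        # prob:   (batch, num_classes) softmax output (bottom[1])
        # labels: (batch,) integer class labels       (bottom[2])
        idx = np.arange(len(labels))
        p_label = prob[idx, labels]
        p_rest = prob.copy()
        p_rest[idx, labels] = -np.inf
        delta = p_label - p_rest.max(axis=1)   # probability difference in [-1, 1]
        self.queue.extend(delta)

        # histogram of deltas over the sliding window and its cumulative form (pcf)
        hist, edges = np.histogram(list(self.queue), bins=self.bin_size,
                                   range=(-1.0, 1.0))
        pcf = np.cumsum(hist) / max(len(self.queue), 1)
        # zeta: delta value below which roughly noise_ratio of the samples fall
        zeta = edges[np.searchsorted(pcf, self.noise_ratio)]

        # hard 0/1 weights for illustration only
        return (delta >= zeta).astype(np.float32)
```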
Given the batch size, the slide batch number and the bin size (200 in our training):
The time complexity of P-DIFF per iteration is
O(batch_size x (bin_size + slide_batch_num + k)),
where k is a constant.
The space complexity of P-DIFF per iteration is
O(slide_batch_num x batch_size + k x bin_size),
where k is a constant.
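As a concrete check of these formulas, using the example values from the prototxt above (batch_size = 128, slide_batch_num = 100) and bin_size = 200: the sliding queue holds 100 x 128 = 12,800 probability differences, each iteration costs on the order of 128 x (200 + 100 + k) operations, and the extra memory is about 12,800 + 200k values.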
- GPU: 8 cards of GeForce GTX TITAN X
- CPU: 48 cores of Intel(R) Xeon(R) CPU E5-2678 v3 @ 2.50GHz
- Memory: 512GB
- OS: Ubuntu 18.04.2
- mnist: gray images of size 28x28, without cropping, 60,000 training images and 10,000 test images in total. This is a clean dataset; we corrupt it with three types of noise.
- cifar10: color images of size 32x32, without cropping, 50,000 training images and 10,000 test images in total. This is a clean dataset; we corrupt it with three types of noise.
- cifar100: color images of size 32x32, without cropping, 50,000 training images and 10,000 test images in total. This is a clean dataset; we corrupt it with three types of noise.
- miniimage: color images of size 84x84, without cropping, 50,000 training images and 10,000 test images in total. This is a clean dataset; we corrupt it with three types of noise.
- cloth1m: color images resized to 256x256 with a cropped 224x224 region, 1,047,571 training images and 10,526 test images in total. This is a noisy dataset. It should be emphasized that the original clothing1m contains 1,000,000 noisy training images and 47,571 clean training images.
- Optimizer: SGD
- Learning rate: 0.001
- Momentum: 0.9
- Batch size: 128
- ${T}_{max}$: 200
- ${T}_{k}$: 20
- CoNet (a 9-layer CNN) for mnist, cifar10, cifar100, miniimage and cloth1m.cnn; the network structure is shown in the supplementary material.
- ResNet101 for cloth1m.resnet101
- ${\zeta}$: 0.9
- M: 0.2
@inproceedings{P-DIFF,
author = {Wei Hu and QiHao Zhao and Yangyu Huang and Fan Zhang},
title = {{P-DIFF:} Learning Classifier with Noisy Labels based on Probability Difference Distributions},
booktitle = {{ICPR}},
pages = {1882--1889},
publisher = {{IEEE}},
year = {2020}
}
@article{zhao2021p,
title={P-DIFF+: Improving learning classifier with noisy labels by Noisy Negative Learning loss},
author={Zhao, QiHao and Hu, Wei and Huang, Yangyu and Zhang, Fan},
journal={Neural Networks},
volume={144},
pages={1--10},
year={2021},
publisher={Elsevier}
}