In short, you train a GAN on a dataset, then use the trained GAN to generate a second dataset that is fed to the CNN. It works like ordinary data augmentation, where you create training samples that don't exist in the original data.
In tests on Fashion MNIST, this method scored ~2% higher on F1 than training on the normal dataset alone. The tests also showed that it doesn't work with some datasets (such as normal MNIST). All tests ran for 200 epochs with the dataset limited to 2,000 images. Details:
- 200 Epochs on normal dataset: 0.8122 F1 score.
- 200 Epochs on generated dataset: 0.8305 F1 score.
To use it, make sure Jupyter Notebook is installed on your computer. The code could be written in a `.py` file, but a notebook makes it easier to check the output of individual pieces of code and keep control over the models.
- Make sure you have `keras`, `sklearn`, `numpy`, `cv2` and `matplotlib` installed.
- Extract `dataset.zip` into the folder.
- Make a folder called `Samples`.
- Run jupyter notebook in the directory and open the `SelfTrainer.ipynb` file.
- Configure the code, if there's something that might need a little tweaking.
- Run the code cells and check their output as needed.
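
Before opening the notebook, the setup steps above can be checked with a small script. This is only a sketch and not part of the project: the helper names `missing_modules` and `ensure_samples_dir` are made up for illustration, and the list holds the import names from the steps above, which may differ from the package names you install.

```python
import importlib.util
import os

# Import names listed in the setup steps (import names, not necessarily
# the names of the packages you install).
REQUIRED_MODULES = ["keras", "sklearn", "numpy", "cv2", "matplotlib"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ensure_samples_dir(base_dir="."):
    """Create the Samples folder under base_dir if it is not there yet."""
    path = os.path.join(base_dir, "Samples")
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
    print("Samples folder:", ensure_samples_dir("."))
```

Run it from the project folder; it only reports what is missing and creates `Samples` if needed, it does not install anything.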
The method can also help with unbalanced datasets: you can use a GAN to generate extra samples and balance the classes. In the test below, this improved the F1 score by ~4%.
Samples per class: 134, 59, 18, 479, 31, 526, 510, 125, 33, 81

Total samples: 1996
Type | F1 score |
---|---|
Dataset only | 74.59% |
With augmented data from GAN | 78.73% |
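
To get a feel for how much data the GAN has to supply, the sketch below computes how many synthetic samples each class would need. It assumes every class is topped up to the size of the largest class; the source does not say which target was actually used, so treat `samples_to_generate` and its default as illustrative only.

```python
# Per-class sample counts from the unbalanced test above.
SAMPLES_PER_CLASS = [134, 59, 18, 479, 31, 526, 510, 125, 33, 81]


def samples_to_generate(counts, target=None):
    """Return how many synthetic samples each class needs to reach target.

    Assumption: with no target given, every class is topped up to the size
    of the largest class. Classes already at or above the target get 0.
    """
    if target is None:
        target = max(counts)
    return [max(target - count, 0) for count in counts]


deficits = samples_to_generate(SAMPLES_PER_CLASS)
print(deficits)       # [392, 467, 508, 47, 495, 0, 16, 401, 493, 445]
print(sum(deficits))  # 3264
```

Under that assumption the GAN would contribute more samples than the original dataset contains, so it is worth checking the generated images before relying on them.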
- This method works better with limited data. These tests used 2,000 images.
- This method doesn't work with every dataset; normal MNIST is one example.
- If you want to test it, be sure to try both the GAN and GAN + DATASET options, since performance may vary.
- When training the GAN, samples of its output go into the `Samples` folder, so you can check the GAN's performance.
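
One way to set up the GAN and GAN + DATASET comparison from the notes above is to build each training set explicitly and train the same CNN on each. The sketch below is not taken from the notebook: `build_training_sets` is a hypothetical helper, and it assumes the generated images already come with labels (for example, one GAN per class or a conditional GAN).

```python
import numpy as np


def build_training_sets(real_x, real_y, gen_x, gen_y, seed=0):
    """Return shuffled (images, labels) pairs for each training option."""
    rng = np.random.default_rng(seed)

    def shuffled(x, y):
        # Use one permutation for both arrays so images keep their labels.
        order = rng.permutation(len(x))
        return x[order], y[order]

    combined_x = np.concatenate([real_x, gen_x], axis=0)
    combined_y = np.concatenate([real_y, gen_y], axis=0)
    return {
        "DATASET": shuffled(real_x, real_y),
        "GAN": shuffled(gen_x, gen_y),
        "GAN + DATASET": shuffled(combined_x, combined_y),
    }


# Placeholder arrays standing in for real and generated 28x28 images.
real_x = np.zeros((6, 28, 28), dtype=np.float32)
real_y = np.arange(6) % 3
gen_x = np.ones((4, 28, 28), dtype=np.float32)
gen_y = np.arange(4) % 3

sets = build_training_sets(real_x, real_y, gen_x, gen_y)
for name, (x, y) in sets.items():
    print(name, x.shape, y.shape)
```

Training on each entry with the same model and epoch count keeps the F1 comparison fair.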