Aloukik Aditya ([email protected])
Sarthak Rawat ([email protected])
Find the top-1 accuracy on the following datasets: CIFAR-100, CIFAR-10, Caltech-256, and Caltech-101.
❖ Caltech256 : https://www.vision.caltech.edu/Image_Datasets/Caltech256/
❖ Caltech101 : https://www.vision.caltech.edu/Image_Datasets/Caltech101/
❖ CIFAR-100 :
➢ The dataset can be downloaded directly in Python:
➢ from tensorflow.keras.datasets import cifar100
➢ (train_imgs, train_labels), (test_imgs, test_labels) = cifar100.load_data()
❖ CIFAR-10 :
➢ The dataset can be downloaded directly in Python:
➢ from tensorflow.keras.datasets import cifar10
➢ (train_imgs, train_labels), (test_imgs, test_labels) = cifar10.load_data()
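The Caltech datasets are distributed as archives of per-class folders, so they have to be loaded from disk. A minimal sketch of one possible way to do this with Keras is shown below; the extraction path, batch size, and validation split are illustrative assumptions, not necessarily the pipeline we used.

import tensorflow as tf

# Hypothetical path: assumes the Caltech-256 archive was extracted here,
# with one sub-folder per category.
caltech256_dir = "data/256_ObjectCategories"

# Load images resized to the VGG16 input resolution; hold out 20% for evaluation.
train_ds = tf.keras.utils.image_dataset_from_directory(
    caltech256_dir, image_size=(224, 224), batch_size=32,
    validation_split=0.2, subset="training", seed=42)
test_ds = tf.keras.utils.image_dataset_from_directory(
    caltech256_dir, image_size=(224, 224), batch_size=32,
    validation_split=0.2, subset="validation", seed=42)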
We used the VGG16 model in our project. The topmost fully connected (prediction) layer is removed, and new layers are added according to the number of categories in the dataset.
● We kept the two fully connected layers of VGG16.
● We changed the number of neurons in our new FC layers according to the dataset, as listed below (a Keras sketch of the head construction follows the list).
● Caltech-256: We used the following structure:
○ 900 Dense neurons (with L1 regularizer)
○ Dropout with rate 0.5
○ 257 Dense neurons (number of classes)
● Caltech-101:
○ 600 Dense neurons (with L1 regularizer)
○ Dropout with rate 0.5
○ 102 Dense neurons (number of classes)
● CIFAR-100:
○ 1000 Dense neurons (with L1 regularizer)
○ Dropout with rate 0.3
○ 100 Dense neurons (number of classes)
● CIFAR-10:
○ 800 Dense neurons (with L1 regularizer)
○ Dropout with rate 0.3
○ 10 Dense neurons (number of classes)
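As a concrete illustration, a minimal Keras sketch of the Caltech-256 head is given below: it keeps VGG16's fc1/fc2 layers and replaces the 1000-way prediction layer with the new Dense/Dropout/Dense stack. The L1 regularization factor and the ReLU/softmax activations are assumptions; they are not stated in the report.

from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l1

# Load VGG16 with its two fully connected layers (fc1, fc2) and drop the
# 1000-way ImageNet prediction layer by branching off the fc2 output.
base = VGG16(weights="imagenet", include_top=True)
fc2_output = base.get_layer("fc2").output

# New head for Caltech-256: 900 Dense (L1) -> Dropout(0.5) -> 257-way softmax.
x = Dense(900, activation="relu", kernel_regularizer=l1(1e-5))(fc2_output)  # L1 factor assumed
x = Dropout(0.5)(x)
outputs = Dense(257, activation="softmax")(x)

model = Model(inputs=base.input, outputs=outputs)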
For preprocessing and normalization of the images, we used the built-in VGG16 preprocessing function from Keras, e.g.:
from tensorflow.keras.applications.vgg16 import preprocess_input
train_imgs = preprocess_input(train_imgs)
Both the test and train images were normalized using this function.
The images in CIFAR-10/100 are resized to (224, 224, 3) so that they can be used with the VGG16 model. This step takes up a lot of RAM (a Google Cloud VM with 60 GB of RAM was used so the resizing could run without issues).
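A minimal sketch of this resizing step is shown below; resizing in chunks with tf.image.resize is an assumption about the implementation. Even so, the output array is large (50,000 training images at 224x224x3 in float32 is roughly 30 GB), which is why the high-memory VM was needed.

import numpy as np
import tensorflow as tf

def resize_for_vgg16(images, chunk_size=1000):
    # Resize 32x32 CIFAR images to 224x224 in chunks to limit peak memory use.
    resized = np.empty((len(images), 224, 224, 3), dtype=np.float32)
    for start in range(0, len(images), chunk_size):
        batch = images[start:start + chunk_size]
        resized[start:start + chunk_size] = tf.image.resize(batch, (224, 224)).numpy()
    return resized

train_imgs = resize_for_vgg16(train_imgs)
test_imgs = resize_for_vgg16(test_imgs)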
We used one image augmentation technique to increase the number of training samples in Caltech-256 and Caltech-101. We used only an image flip (mirror image) for augmentation:
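A minimal sketch of this mirroring step, assuming the training images are held in a NumPy array of shape (N, height, width, 3) with matching labels (variable names are illustrative):

import numpy as np

# Mirror each training image along the width axis and append it to the training set.
flipped_imgs = train_imgs[:, :, ::-1, :]
train_imgs = np.concatenate([train_imgs, flipped_imgs], axis=0)
train_labels = np.concatenate([train_labels, train_labels], axis=0)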
● Step 1: We freeze the VGG16 layers so that they are not trained initially.
● Step 2: We train only our fully connected layers for 2 epochs (so that they start to learn from the VGG16 features).
● Step 3: We unfreeze the VGG16 layers and train the full model with a low, variable learning rate for a larger number of epochs (a code sketch follows below).
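A minimal sketch of this three-step schedule, continuing from the head-construction sketch above (it reuses `base` and `model`); the loss function and batch size are assumptions:

from tensorflow.keras.optimizers import Adam

# Step 1: freeze the pretrained VGG16 layers.
for layer in base.layers:
    layer.trainable = False

# Step 2: train only the new fully connected head for 2 epochs.
model.compile(optimizer=Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy",  # assumed loss
              metrics=["accuracy"])
model.fit(train_imgs, train_labels, epochs=2, batch_size=64,  # batch size assumed
          validation_data=(test_imgs, test_labels))

# Step 3: unfreeze all layers and continue training with a low learning rate.
for layer in base.layers:
    layer.trainable = True
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])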
● For the initial 2 epochs, we used the Adam optimizer with LR = 0.001.
● For training the full model, we used a variable learning rate (example: epochs 0-3: LR = 1e-4, epochs 4-8: LR = 1e-5, epochs 8-10: LR = 1e-6); a scheduler sketch follows below.
● For the CIFAR datasets, we used a dropout rate of 0.3.
● For the Caltech datasets, we used a dropout rate of 0.5.
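One way to implement the variable learning rate is with a Keras LearningRateScheduler callback. The sketch below follows the example schedule above; the exact epoch boundaries, total epoch count, and batch size are assumptions:

from tensorflow.keras.callbacks import LearningRateScheduler

def lr_schedule(epoch, lr):
    # Piecewise-constant schedule based on the example above (boundaries assumed).
    if epoch <= 3:
        return 1e-4
    if epoch <= 7:
        return 1e-5
    return 1e-6

model.fit(train_imgs, train_labels, epochs=10, batch_size=64,  # counts assumed
          callbacks=[LearningRateScheduler(lr_schedule)],
          validation_data=(test_imgs, test_labels))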
We ran every model 3 times and took the average accuracy of the three runs.
Results for the different datasets are as follows:
● CIFAR-100
● CIFAR-10
● Caltech-256
● Caltech-101
CIFAR-100
Accuracy after the first two epochs (training only the FC layers):
After this, we unfreeze all layers and further train our model starting from these weights.
Training the full model (all layers trainable):
Average accuracy for 3 runs: 79.69%
CIFAR-10
Accuracy after the first two epochs (training only the FC layers):
After this, we unfreeze all layers and further train our model starting from these weights.
Training the full model (all layers trainable):
Average accuracy for 3 runs: 93.70%
Caltech-256
Accuracy after the first two epochs (training only the FC layers):
After this, we unfreeze all layers and further train our model starting from these weights.
Training the full model (all layers trainable):
Average accuracy for 3 runs: 78.86%
Caltech-101
Accuracy after the first two epochs (training only the FC layers):
After this, we unfreeze all layers and further train our model starting from these weights.
Training the full model (all layers trainable):
Average accuracy for 3 runs: 92.99%
Final Results
● CALTECH-256 : 78.86%
● CALTECH-101 : 92.99%
● CIFAR-100 : 79.69%
● CIFAR-10 : 93.70%
● 16-core CPU
● 60 GB RAM
● NVIDIA P100 GPU (16 GB)
We trained our models on the following datasets: CIFAR-10, CIFAR-100, Caltech-256, and Caltech-101, using transfer learning with the VGG16 model. Transfer learning has great potential because the pretrained weights were already trained to extract important image features across many categories from a much larger dataset (ImageNet). Freezing the VGG16 layers for the first two epochs helped us reach better accuracy faster, probably because our fully connected layers first learned to interpret the features produced by the VGG16 network, which then boosted accuracy during the subsequent training of the full model.