Arkonil Dhar Saumyadip Bhowmick Shreya Pramanik
Shubha Sankar Banerjee Souvik Bhattacharyya
M.Sc. Statistics
Indian Institute of Technology Kanpur
Duration: July 2021
The MNIST (Modified National Institute of Standards and Technology) dataset is a large database of handwritten digits, which we have chosen as the dataset for testing different image classification techniques.
The data is provided as MATLAB files, which need to be loaded into the Jupyter notebook in due course.
The data is in two sets: Unrotated and Rotated.
The unrotated dataset, as the name suggests, consists of observations where the digits are upright from the perspective of the viewer.
The rotated dataset, on the other hand, consists of observations where the digits have been rotated clockwise or anti-clockwise.
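As a minimal sketch of the loading step, the MATLAB files can be read in Python with `scipy.io.loadmat`. The file name `mnist_unrotated.mat` and the variable names `X` and `y` below are assumptions for illustration, not the actual names in the project files; a tiny synthetic `.mat` file is written first so the sketch is self-contained.

```python
# Sketch of loading the MATLAB (.mat) data files into Python / Jupyter.
# File and variable names here are illustrative assumptions.
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Write a tiny synthetic .mat file so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "mnist_unrotated.mat")
savemat(path, {"X": np.zeros((100, 784)), "y": np.arange(100) % 10})

data = loadmat(path)                 # dict mapping variable names to arrays
X, y = data["X"], data["y"].ravel()  # pixel features and digit labels
print(X.shape, y.shape)              # (100, 784) (100,)
```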
We have evaluated the efficacy of a variety of learning algorithms to judge their usability on this particular dataset.
In particular we have used the following learning algorithms:
- Discriminant Analysis
  - Linear Discriminant Analysis
  - Quadratic Discriminant Analysis
- Decision Trees
  - Application of Cost Complexity Pruning
- Random Forest
- Support Vector Machine
- Neural Network
- Convolutional Neural Network
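The report does not name the libraries used, but assuming the classical models were built with scikit-learn (with the neural networks typically coming from Keras/TensorFlow), the list above maps roughly onto the following estimators:

```python
# Hedged sketch: plausible scikit-learn counterparts of the classical
# models listed above. Hyperparameter choices are illustrative only.
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    # ccp_alpha=0 yields the complete tree; a positive value applies
    # cost-complexity pruning.
    "Decision Tree": DecisionTreeClassifier(ccp_alpha=0.0),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
}
```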
In this project, we have applied the aforementioned learning algorithms to predict the labels of the observations on both the unrotated and the merged (unrotated + rotated) data.
- Trained the Linear and Quadratic Discriminant models on the unrotated training dataset.
- Tried to predict the labels of the observations in the unrotated test dataset.
- Tried to predict the labels of the observations in the rotated test dataset.
- Applied Principal Component Analysis (PCA) as a means of dimension reduction and then tried to repeat the above steps using the principal components as observations.
- Merged the training sets of rotated and unrotated datasets.
- Trained LDA and QDA models on this merged dataset.
- Tried to predict the labels of the merged test dataset.
- Used PCA on this merged dataset to reduce the dimension of the feature space and tried to repeat the above steps using the principal components as observations.
- Applied the Decision Tree algorithm on the original dataset, with and without dimension reduction via PCA.
- Applied Decision Tree with Cost Complexity Pruning on the same dataset.
- Trained a Random Forest Model on the unrotated dataset.
- Tested the above three models on the rotated test dataset.
- Merged the training sets of scaled rotated and unrotated datasets.
- Applied Decision Tree Algorithm on the merged dataset with and without Cost Complexity Pruning.
- Tested the above models on the merged test set.
- Trained a Random Forest model on the merged dataset and tested it on the merged test set.
- Trained an SVM model on the unrotated training dataset using the RBF kernel.
- Tried to predict the labels of the observations in the unrotated test dataset.
- Tried to predict the labels of the observations in the rotated test dataset.
- Applied Principal Component Analysis (PCA) as a means of dimension reduction and then tried to repeat the above steps using the principal components as observations.
- Merged the training sets of scaled rotated and unrotated datasets.
- Trained an SVM model on this merged dataset using the RBF kernel.
- Tried to predict the labels using the merged test dataset.
- Used PCA on this merged dataset to reduce the dimension of the feature space and tried to repeat the above steps using the principal components as observations.
- Trained Neural Network models on the unrotated training dataset with varying architectures.
- Selected the model yielding best validation accuracy and tried to predict the labels of the observations in the unrotated test dataset.
- Tried to predict the labels of the observation in the rotated test dataset.
- Applied Principal Component Analysis (PCA) as a means of dimension reduction and then tried to repeat the above steps using the principal components as observations.
- Repeated the above procedure for the merged train dataset and tried to predict the labels of the observations in the merged test dataset.
- Trained a Convolutional Neural Network on the unrotated training dataset with manually chosen convolution and pooling layers.
- Selected the model yielding the best validation accuracy via RandomSearch hyperparameter tuning.
- Predicted the labels of the observations in the unrotated test dataset using the best model from RandomSearch.
- Merged the training and test sets of unrotated and rotated datasets.
- Applied the previously obtained best selected CNN model on the merged dataset.
- Predicted the labels of observations in the merged dataset.
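The train-on-unrotated, test-on-rotated evaluation loop described above can be sketched as follows. Synthetic arrays stand in for the MNIST images, rotation is simulated with `np.rot90`, and the PCA + LDA combination is used as one concrete instance; the other models slot into the same loop. All names and sizes here are illustrative assumptions.

```python
# Minimal sketch of the evaluation loop, assuming scikit-learn.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X_train = rng.random((200, 28, 28)); y_train = rng.integers(0, 10, 200)
X_test = rng.random((50, 28, 28));   y_test = rng.integers(0, 10, 50)
X_test_rot = np.rot90(X_test, k=1, axes=(1, 2))   # simulate rotated digits

flat = lambda X: X.reshape(len(X), -1)            # 28x28 images -> 784 features

# PCA for dimension reduction, then LDA on the principal components.
model = make_pipeline(PCA(n_components=50), LinearDiscriminantAnalysis())
model.fit(flat(X_train), y_train)

for name, (X, y) in {"unrotated": (X_test, y_test),
                     "rotated": (X_test_rot, y_test)}.items():
    print(name, accuracy_score(y, model.predict(flat(X))))
```

With random stand-in data the printed accuracies are near chance; on the real MNIST arrays the same loop reproduces the kind of unrotated-vs-rotated gap reported in the tables below.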
The following table shows the test accuracies of the different learning algorithms when the models have been trained on data obtained from the unrotated dataset.
| Learning Algorithm | Unrotated test data (trained on given data) | Rotated test data (trained on given data) | Unrotated test data (trained on principal components) |
|---|---|---|---|
| LDA | 87.22% | 9.39% | 87.22% |
| QDA | 54.28% | 9.97% | 13.63% |
| Decision Tree (complete) | 87.8% | 10.26% | 80.13% |
| Decision Tree (cost complexity pruning) | 88.31% | - | - |
| Random Forest | 96.87% | 10.42% | - |
| Support Vector Machine | 97.88% | 9.96% | 96.85% |
| Neural Network | 97.91% | 34.21% | 96.98% |
| Convolutional Neural Network | 99.09% | - | - |
The following table shows the test accuracies of the different learning algorithms when the models have been trained on the merged data obtained from the unrotated and rotated datasets.
| Learning Algorithm | Trained on given data | Trained using principal components |
|---|---|---|
| LDA | 58.41% | 56.76% |
| QDA | 17.125% | 14.98% |
| Decision Tree (complete) | 68.89% | - |
| Decision Tree (cost complexity pruning) | 69.42% | - |
| Random Forest | 87.58% | - |
| Support Vector Machine | 90.25% | 88.51% |
| Neural Network | 91.54% | - |
| Convolutional Neural Network | 96.53% | - |