This repository contains my submissions/labs for the Fall 2019 CS5361 Machine Learning course. All resources and instructions are provided on the instructor's site.
Lab 1: K-nearest Neighbors
The `lab1` directory contains the following files:
- `_instr.pdf`, which contains the instructions for the assignment
- `knn.py`, which is the provided script to modify for building a k-nearest-neighbors predictor model
- `mnist.py`, which is the provided script to load the MNIST dataset for training/testing
- `zeroR.py`, which is the provided script containing a predictor model
- `report.py`, which is my submitted report for this assignment

To run the program:
- Modify the `dir` variable in the `knn.py` script to point to your downloaded datasets.
- Run the `mnist.py` code first.
- Run `knn.py`.

See datasets for any additional files required to run the program.

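For readers unfamiliar with the technique, here is a minimal sketch of a k-nearest-neighbors classifier in plain Python. It is not the provided `knn.py` script: the function name `knn_predict`, the Euclidean distance metric, and the toy data are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance from the query to every training point
    distances = [(math.dist(x, query), label) for x, label in zip(train_x, train_y)]
    # Keep the k nearest neighbors (smallest distances first)
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data, made up for illustration (not from the course datasets)
train_x = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
train_y = ["a", "a", "b", "b"]
prediction = knn_predict(train_x, train_y, (0.2, 0.1), k=3)
```

With this toy data, two of the three nearest neighbors of the query carry the label `"a"`, so that is the predicted class.
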
The datasets used for this program are the MNIST dataset and the solar particle dataset, as provided by the instructor on the course webpage.
Experimental results for this assignment can be found in this Google Sheets* document, in the `lab1-knn` sheet.
*This document may not be available after the course end date.
Lab 2: Decision Trees
The `lab2` directory contains the following files:
- `_instr.pdf`, which contains the instructions for the assignment
- `magic04.txt`, which is the provided dataset
- `decision_tree.py`, which is the provided script to modify for building a decision tree classification model
- `regression_tree.py`, which is the provided script to modify for building a decision tree regression model

To run the program:
- Modify the `dir` variable in the `regression_tree.py` program to point to your solar particle dataset.
- Run the `decision_tree.py` program, the `regression_tree.py` program, or both, as you prefer.

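As background on what a decision tree classifier does at each node, the sketch below picks a threshold on one numeric feature by maximizing information gain. It is an illustration only: the helper names `entropy` and `best_threshold`, the use of entropy as the impurity measure, and the toy values are assumptions, and the provided `decision_tree.py` may work differently.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, information_gain) for the best split 'value <= threshold'."""
    parent = entropy(labels)
    best = (None, 0.0)
    # Candidate thresholds: midpoints between consecutive distinct sorted values
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        # Weighted average entropy of the two children
        children = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = parent - children
        if gain > best[1]:
            best = (threshold, gain)
    return best

# Toy feature column and labels, made up for illustration
values = [1.0, 2.0, 3.0, 4.0]
labels = ["g", "g", "h", "h"]
threshold, gain = best_threshold(values, labels)
```

Here the midpoint 2.5 separates the two classes perfectly, so it yields the largest gain of the three candidate thresholds.
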
The dataset used for this program is provided by the instructor on the course webpage.
Experimental results for this assignment can be found in this Google Sheets* document, in the `lab2-dectree` sheet.
*This document may not be available after the course end date.
Lab 3: Decision and Regression Trees
The `lab3` directory contains the following files:
- `_instr.pdf`, which contains the instructions for the assignment
- `decision_tree.py`, which is the provided script to modify for building a decision tree classification model
- `regression_tree.py`, which is the provided script to modify for building a decision tree regression model

To run the program:
- Modify the `dir` variables in the `regression_tree.py` and `decision_tree.py` programs to point to a dataset of your choice.
- Run the `decision_tree.py` program, the `regression_tree.py` program, or both, as you prefer.

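For the regression case, a tree typically chooses the split that leaves the target values in each child as close to their mean as possible. The sketch below shows that idea for one numeric feature. The helper names `sse` and `best_regression_split`, the squared-error criterion, and the toy values are assumptions for illustration, not the contents of the provided `regression_tree.py`.

```python
def sse(targets):
    """Sum of squared errors of the targets around their mean."""
    if not targets:
        return 0.0
    mean = sum(targets) / len(targets)
    return sum((t - mean) ** 2 for t in targets)

def best_regression_split(values, targets):
    """Return (threshold, total_sse) minimizing the children's summed squared error."""
    # Start from the unsplit node; threshold stays None if no split improves on it
    best = (None, sse(targets))
    # Candidate thresholds: midpoints between consecutive distinct sorted values
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [t for x, t in zip(values, targets) if x <= threshold]
        right = [t for x, t in zip(values, targets) if x > threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Toy feature column and targets, made up for illustration
values = [1.0, 2.0, 3.0, 4.0]
targets = [10.0, 10.0, 20.0, 20.0]
threshold, error = best_regression_split(values, targets)
```

In this toy case, splitting at 2.5 puts identical targets in each child, so the remaining squared error is zero.
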
The dataset used for this program is provided by the instructor on the course webpage.
Experimental results for this assignment can be found in this Google Sheets* document, in the `lab2-dectree` sheet.
*This document may not be available after the course end date.
Lab 4: The scikit library
The `lab4` directory contains the following files:
- `_instr.pdf`, which contains the instructions for the assignment
- `__init__.py`, which is the main script to run the program
- `dataset.py`, which contains the Dataset class that loads and stores the datasets for use in the program
- `dectree.py`, which contains the classification and regression predictor models for decision trees
- `forest.py`, which contains the classification and regression predictor models for forests
- `knn.py`, which contains the classification and regression predictor models for k-nearest neighbors
- `logreg.py`, which contains the classification and regression predictor models for logistic regression
- `svm.py`, which contains the classification and regression predictor models for support vector machines

For information about the other files in this directory, see the Results section below.
To run the program:
- Modify the `dataset.py` script to access the dataset(s) of your choice.
- Run the `__init__.py` program.

The datasets used for this program are provided by the instructor on the course webpage.
Experimental results for this assignment can be found in the `res.txt` and `results.txt` files included in the `lab4` directory.
Lab 5: The keras library
The `lab5` directory contains the following files:
- `_instr.pdf`, which contains the instructions for the assignment
- `__init__.py`, which is the main script to run the program
- `cnn.py`, which contains the code to develop and test convolutional neural networks on two datasets: MNIST and CIFAR-10
- `dnn.py`, which contains the code to develop and test fully connected dense neural networks on two datasets: solar particle and gamma ray

For information about the other files in this directory, see the Results section below.
To run the program:
- Modify the `dataset.py` script to access the dataset(s) of your choice.
- Run the `__init__.py` program.

The datasets used for this program are either provided by the instructor on the course webpage (solar particle and gamma ray) or imported via the keras library (MNIST and CIFAR-10).
Experimental results for this assignment can be found in the `lab5\lab5.txt` file or, for specific runs, in the respective `lab5\results` directory containing `test##.txt` files.
Lab 6: Learning to Predict Sequences
The `lab6` directory contains the following files:
- `_instr.pdf`, which contains the instructions for the assignment
- `__init__.py`, which is the main script to run the program
- `results-base.txt`, which contains the accuracy results of the predictions by the baseline model
- `results-lstm.txt`, which contains the accuracy results of the predictions by the LSTM model
- `results-conv.txt`, which contains the accuracy results of the predictions by the convolutional model

For information about the other files in this directory, see the Results section below.
To run the program:
- Modify the `dataset.py` script to access the dataset(s) of your choice.
- Run the `__init__.py` program.

The dataset used for this program is provided by the instructor on the course webpage (solar dataset: `xrp.npy`).
Experimental results for this assignment can be found in the `npy` files. Accuracy results are presented in the `results-XXXX.txt` files.
Predicting Personality through Text
The `project` directory contains the following files:
- `baseline.py`, which contains the code to build and train the baseline model
- `main.py`, which contains the code to build and train the Naive Bayes classifiers
- `doc2vec.py`, which contains the code to build Doc2Vec embeddings for the dataset
- `dataset.py`, which contains the code to read and load the dataset

To run the program:
1. For the first run (in both `baseline.py` and `main.py`), modify the call to the `Dataset` constructor to include `first_time=False`. This loads the data on your system (make sure `d` in `dataset.py` reflects your system configuration) and creates npy files for NumPy to use in future runs.
2. For the second run (in both `baseline.py` and `main.py`), remove the `first_time=False` modification from (1). Then change the main method to call `second_run(...)`.
3. For any future runs (in both `baseline.py` and `main.py`), comment out the line of code from (2) and run as normal.