Predict Stock

Classification model for predicting stock market trends, based on machine learning techniques such as Extremely Randomized Trees, K-Means, Support Vector Machines and K-Fold Cross-Validation.

Code presented as part of the Capstone Project in Computational Engineering at the Universidade Federal de Juiz de Fora.

📈 Problem presentation

This project aims to demonstrate an application of machine learning methods to predicting stock market oscillations. Several techniques are combined to build a more robust model and improve prediction accuracy.

The pipeline and a short description of each method are as follows:

Data acquisition: Acquire historical stock prices using pandas-datareader.
Data preparation: Remove missing and unnecessary data using pandas.
Apply indicators: Apply financial indicators to the collected data using pandas.
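The preparation and indicator steps can be sketched with pandas alone. This is a minimal sketch, not the project's `Indicators.py`. It assumes a DataFrame with a `Close` column like the daily quotes pandas-datareader returns, and it uses synthetic prices so it runs offline. The helper `add_indicators` and the choice of indicators (simple moving average, momentum and a 14-day RSI computed with simple rolling means) are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def add_indicators(df: pd.DataFrame, window: int = 14) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with a few example indicator columns."""
    out = df.copy()
    close = out["Close"]
    # Simple moving average of the closing price
    out["SMA"] = close.rolling(window).mean()
    # Momentum: price change over the window
    out["Momentum"] = close.diff(window)
    # RSI from simple rolling averages of daily gains and losses
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    out["RSI"] = 100 - 100 / (1 + gain / loss)
    return out


# Synthetic business-day prices standing in for downloaded data
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=120)
prices = pd.DataFrame(
    {"Close": 100 + rng.normal(0, 1, len(dates)).cumsum(),
     "Volume": rng.integers(1_000, 5_000, len(dates))},
    index=dates,
)
prices.iloc[5, 0] = np.nan  # simulate a missing quote

# Data preparation: keep only the needed column and drop missing rows
prepared = prices[["Close"]].dropna()

# Apply indicators, then drop the warm-up rows left empty by the rolling windows
features = add_indicators(prepared).dropna()
print(features.tail())
```

The `dropna()` after the indicators matters: rolling windows leave the first rows undefined, and those rows cannot be fed to the later steps.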
Feature selection: Extremely Randomized Trees

Supervised method used to solve classification and regression problems. It is a variation of the classic Random Forests that adds more randomization to node splitting: split thresholds are drawn at random instead of being optimized, and by default each tree is trained on the whole training set rather than on a bootstrap sample. These changes aim to reduce both the bias and the variance of the model, alleviating underfitting and overfitting, respectively.

In the present problem, this method was used as a feature selector, measuring the importance of each financial indicator in the prediction.
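A minimal sketch of this kind of feature selection with scikit-learn's `ExtraTreesClassifier`, assuming synthetic data in which only the first two columns carry information about the up/down label. The indicator names and the "keep features above the mean importance" rule are illustrative assumptions, not necessarily the criterion used in this project.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic stand-in: 300 days x 5 indicator values; only the first two
# columns actually influence the label
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # 1 = up, 0 = down
names = ["SMA", "Momentum", "RSI", "MACD", "Volume"]  # illustrative names

forest = ExtraTreesClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Rank indicators by impurity-based importance (the scores sum to 1)
ranking = sorted(zip(names, forest.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name:10s} {score:.3f}")

# Keep the indicators whose importance is above the mean importance
selected = [name for name, score in zip(names, forest.feature_importances_)
            if score > forest.feature_importances_.mean()]
print("selected:", selected)
```

scikit-learn's `SelectFromModel` wraps the same idea if the selection should live inside a pipeline.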
Clusterization: K-Means

Unsupervised method used for partitioning, or clustering, which organizes the elements of a set into groups (clusters) so that elements in the same cluster resemble each other. The number of clusters k must be defined beforehand and, together with the initial cluster centers, is the starting point of the method.

This method was employed to cluster the data and reduce the number of support vectors in the next step.
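One way to use K-Means for this purpose is to cluster each class separately and train the SVM on the cluster centroids instead of on every sample, so the model can never have more support vectors than centroids. The sketch below assumes that approach; it is not necessarily the exact scheme used in this project or in the cited references, and the helper `compress_with_kmeans` and the value of 25 clusters per class are hypothetical choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.svm import SVC


def compress_with_kmeans(X, y, clusters_per_class=25, seed=0):
    """Hypothetical helper: replace each class's samples by its K-Means centroids."""
    centers, labels = [], []
    for cls in np.unique(y):
        km = KMeans(n_clusters=clusters_per_class, n_init=10, random_state=seed)
        km.fit(X[y == cls])
        centers.append(km.cluster_centers_)
        labels.append(np.full(clusters_per_class, cls))
    return np.vstack(centers), np.concatenate(labels)


# Synthetic two-class data standing in for indicator features and up/down labels
X, y = make_classification(n_samples=1000, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)

X_small, y_small = compress_with_kmeans(X, y)

full = SVC(kernel="rbf").fit(X, y)
reduced = SVC(kernel="rbf").fit(X_small, y_small)

print("training points:", len(X), "->", len(X_small))
print("support vectors:", full.n_support_.sum(), "->", reduced.n_support_.sum())
print("accuracy of the reduced model on all samples:", reduced.score(X, y))
```

The trade-off is controlled by `clusters_per_class`: fewer centroids give a smaller, faster model but a coarser picture of the class boundary.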
Classification: Support Vector Machines

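Supervised method used to solve classification and regression problems with linear or nonlinear data. This method aims to find the hyperplane that separates the training samples into their respective classes with the largest possible margin; nonlinear data is handled by implicitly mapping it into a higher-dimensional space through a kernel function.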

This is the main step of the pipeline, in which each sample is classified as an upward or a downward stock movement.
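A minimal sketch of the classification step with scikit-learn's `SVC`, assuming synthetic closing prices and, purely for illustration, the last five daily returns as features instead of the selected indicators. The label is 1 when the next day's close is higher than today's and 0 otherwise; the model is trained on the older 80% of days and tested on the most recent 20%.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic closing prices standing in for a downloaded series
rng = np.random.default_rng(1)
close = 100 + rng.normal(0, 1, 500).cumsum()

# returns[i] is the relative change from day i to day i + 1
returns = np.diff(close) / close[:-1]
window = 5
# Features for day i: the returns that led up to day i (no look-ahead)
X = np.array([returns[i - window:i] for i in range(window, len(returns))])
# Label for day i: 1 if the next move is upward, 0 if downward
y = (returns[window:] > 0).astype(int)

split = int(0.8 * len(X))  # train on the past, test on the most recent days
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X[:split], y[:split])

print("test accuracy:", model.score(X[split:], y[split:]))
print("predictions for the last 5 days:", model.predict(X[-5:]))
```

Because these prices are a random walk, test accuracy near 0.5 is expected; the sketch only shows the mechanics of building labels and fitting the classifier. Scaling the features first matters for the RBF kernel, which is distance based.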
Parameter tuning: K-Fold Cross-validation

Finally, we need a method to evaluate the parameters of the chosen model and determine their best combination.

This method randomly splits the dataset into K subsets (folds). In each iteration, one fold is used for testing and the remaining K-1 folds are used for training, which makes it possible to measure the accuracy and tune the parameters.
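A minimal sketch of K-fold parameter tuning with scikit-learn's `KFold` and `GridSearchCV`, assuming synthetic data and an RBF-kernel SVM. The grid values for `C` and `gamma` are illustrative assumptions, not the ranges used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the selected indicator features and up/down labels
X, y = make_classification(n_samples=400, n_features=6, random_state=0)

# K = 5 shuffled folds: every parameter combination is trained on 4 folds
# and scored on the remaining one, five times over
cv = KFold(n_splits=5, shuffle=True, random_state=0)
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": ["scale", 0.01, 0.1, 1],
}
search = GridSearchCV(make_pipeline(StandardScaler(), SVC(kernel="rbf")),
                      param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

print("best parameters:", search.best_params_)
print("mean cross-validated accuracy:", round(search.best_score_, 3))
```

Shuffled K-fold matches the random split described above. For time-ordered prices, scikit-learn's `TimeSeriesSplit` is a common alternative because its training indices always precede its test indices.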

📝 Dependencies

Besides Python itself, you will need the NumPy library for numerical operations, the Matplotlib library for plotting, pandas and pandas-datareader to handle datasets, and scikit-learn to run the machine learning algorithms themselves.

You may install all dependencies with the following command:

pip3 install numpy matplotlib pandas pandas-datareader scikit-learn

🏃 How to run

After installing the dependencies, open a terminal in the folder where you want to clone the project and run:

git clone https://github.com/LorranSutter/PredictStock-SVM.git

First, you will need to acquire stock data. The following command uses the file db/NASDAQ.csv as the list of stocks to download. If you do not want data for all the listed stocks, remove the unwanted ones from that file before running it.

python3 initGetData.py

Once the stock data has been acquired, it is stored in the db/stocks folder. Then set the variable ticker inside the code to the desired ticker and run the main code:

python3 main.py

💻 Technologies

Python - interpreted, high-level, general-purpose programming language
pandas - data analysis and manipulation tool
pandas-datareader - data access for pandas
scikit-learn - machine learning library
NumPy - general-purpose array-processing package
Matplotlib - plotting library for Python

📖 Main references

VO, V.; LUO, J.; VO, B. Time series trend analysis based on k-means and support vector machine. v. 35, p. 111–127, 1 2016
LEE, M.-C. Using support vector machine with a hybrid feature selection method to the stock trend prediction. Expert Systems with Applications, v. 36, n. 8, p. 10896 – 10904, 2009. ISSN 0957-4174
XU, Y.; LI, Z.; LUO, L. A study on feature selection for trend prediction of stock trading price. Jun 2013
LIMA, M. L. Um modelo para predição de bolsa de valores baseado em mineração de opinião [A model for stock market prediction based on opinion mining], 2016. Master's dissertation (Programa de Pós-Graduação em Engenharia de Eletricidade), UFMA (Universidade Federal do Maranhão), São Luís, Brazil

🍪 Credits

Thanks to Bruno Franca for the indicators implementation (pandasImpl.py).
