
PCA_in_Python

Principal Component Analysis on a data set of 10 stocks in order to reduce dimensionality to 2 principal components. Numpy, Pandas.

Today we will perform PCA on a set of 10 stocks I've chosen: the top 5 holdings of the Nasdaq and the top 5 holdings of the Dow Jones as of August 2018. The data is included in the repository as 'PCAstocks.csv'.

Nasdaq: Apple, Amazon, Microsoft, Google, Facebook

Dow: Boeing, UnitedHealth Group, Goldman Sachs, 3M, Home Depot

We first decide the number of dimensions we want to reduce our data set to. We can do this by naively guessing and checking, or we can write a loop and see where the diminishing returns of adding dimensions (or principal components) kick in.
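The loop can be sketched roughly as below. This is a minimal illustration, not the repo's exact code: the synthetic return matrix stands in for the returns computed from 'PCAstocks.csv', and is built with two dominant common factors so the scree drops off after the second component, as in the repo's figure.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the repo's 10-stock daily returns (250 days),
# constructed from two common factors plus idiosyncratic noise.
rng = np.random.default_rng(0)
returns = rng.normal(size=(250, 2)) @ rng.normal(size=(2, 10)) \
          + 0.1 * rng.normal(size=(250, 10))

# Fit PCA with all components once; explained_variance_ratio_ holds the
# marginal share of variance each additional principal component explains.
pca = PCA()
pca.fit(returns)
for i, ratio in enumerate(pca.explained_variance_ratio_, start=1):
    print(f"PC{i}: {ratio:.1%}")
```

Plotting these ratios against the component index gives the scree graph below; we look for the point where the curve flattens.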

Here is what our graph looks like: figure_1

As we can see, the effect of adding principal components drops off significantly after the second (from ~15% to below 8%). With this in mind we will make our desired number of dimensions 2, which is also nice because it allows for easy visualization.

We fit our data using the number of dimensions we want and scikit-learn's PCA(). Here is a printout of the desired dimensions and the variability of the data they explain: figure_2
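The fitting step looks roughly like this, again using a synthetic stand-in for the repo's return matrix rather than the actual CSV:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the 10-stock return matrix (250 days x 10 stocks).
rng = np.random.default_rng(0)
returns = rng.normal(size=(250, 2)) @ rng.normal(size=(2, 10)) \
          + 0.1 * rng.normal(size=(250, 10))

# Fit with the 2 dimensions chosen above; explained_variance_ratio_ now
# reports only the share of variance the two kept components explain.
pca = PCA(n_components=2)
pca.fit(returns)
print(pca.explained_variance_ratio_)
```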

Now we calculate our factor returns and factor exposures (risk variables). We can then plot the factor exposures on our first principal component against those on our second. Here's the result: figure_3
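In scikit-learn terms, the factor returns are the data projected onto the components (`transform`), and the factor exposures are each stock's loadings on the components (`components_`). A sketch under the same synthetic stand-in data:

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to display interactively
import matplotlib.pyplot as plt

# Synthetic stand-in for the 10-stock return matrix (250 days x 10 stocks).
rng = np.random.default_rng(0)
returns = rng.normal(size=(250, 2)) @ rng.normal(size=(2, 10)) \
          + 0.1 * rng.normal(size=(250, 10))

pca = PCA(n_components=2).fit(returns)
factor_returns = pca.transform(returns)  # (250, 2): daily return of each PC
factor_exposures = pca.components_.T     # (10, 2): each stock's loading on PC1/PC2

# One point per stock; in the real data these separate into two groups.
plt.scatter(factor_exposures[:, 0], factor_exposures[:, 1])
plt.xlabel("PC1 exposure")
plt.ylabel("PC2 exposure")
plt.savefig("exposures.png")
```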

Very interesting! We can see two rather distinct groupings of the data. Not coincidentally, they group into their respective tech and Dow industries, from which the data was pulled (top 5 holdings of QQQ, top 5 of DIA).

PCA allows us to reduce the number of dimensions we are working with, which with financial data can often be extremely numerous (the curse of dimensionality). This allows for easier visualization, computation, and intuition.

We could further pair this with k-means clustering as a way to cluster and classify our factor exposures, and then try to classify any new data as one risk exposure or the other.
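That follow-up could look roughly like this. The exposure matrix here is hypothetical, standing in for the (10, 2) factor exposures computed from the real data, with two loose groups mimicking the tech/Dow split:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (10, 2) factor-exposure matrix: two loose groups of 5 stocks.
rng = np.random.default_rng(1)
exposures = np.vstack([
    rng.normal(loc=[0.4, 0.3], scale=0.05, size=(5, 2)),   # "tech-like" group
    rng.normal(loc=[0.2, -0.3], scale=0.05, size=(5, 2)),  # "Dow-like" group
])

# Cluster the exposures into two groups, then classify a new stock's
# (illustrative) exposures as one risk profile or the other.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(exposures)
print(km.labels_)                  # cluster id assigned to each stock
print(km.predict([[0.38, 0.28]]))  # which cluster a new point falls into
```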

PCA is one of the most common dimensionality reduction techniques in data science and rather easy to implement.
