My professional portfolio with some of my best data science projects.

tkh5044/portfolio

Troy Hepper - Data Science Portfolio

Projects

Predicting MLB Game Attendance

For this project, I examined historical game data to see which factors play a role in fan turnout. I scraped game data from Baseball-Reference for every MLB game from 1990 to 2016 and, based on my findings, built a regression model to predict the attendance for a given game. I built a single model that can predict attendance for any team. While that model was accurate, the same process could be repeated for a specific team to improve performance. A team could then use the results to anticipate low-attendance nights and develop effective marketing or promotional strategies.

Scraping Indeed.com and Predicting Data Scientist Salaries

One of the most interesting projects I have worked on recently involved collecting salary information for data science jobs in order to predict salaries from the location, title, and job summary. The project was a real test of several newly acquired data science skills: gathering data from a webpage, performing natural language processing on text data, and building a classification model. After collecting the data, I built a classification model to predict whether a job's salary would be above or below the median salary for a data scientist.
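As a minimal sketch of how the classification target described above could be constructed, the snippet below labels each posting as above (1) or at-or-below (0) the median salary. The salary figures are made up for illustration, and the variable names are hypothetical; the actual project code may differ.

```python
from statistics import median

# Hypothetical annual salaries (USD) parsed from scraped job postings.
salaries = [85000, 120000, 95000, 150000, 110000, 70000]

# The median splits the postings into two roughly balanced classes.
median_salary = median(salaries)

# Binary target: 1 if the salary is above the median, otherwise 0.
labels = [1 if s > median_salary else 0 for s in salaries]

print(median_salary)
print(labels)
```

Splitting at the median keeps the two classes roughly the same size, so a classifier's accuracy can be compared against a baseline of about 50%.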

Predicting Iowa Liquor Sales

The state of Iowa provides many datasets on its website, including one that contains transactions for all liquor stores in the state from January 2015 through March 2016. With this information available, my goal was to analyze the data and build a linear regression model to predict sales for the rest of 2016. I created a model that describes the relationship between first-quarter 2015 sales and first-quarter 2016 sales. Then, using that model, I predicted sales for quarters 2 through 4 of 2016 from the sales for quarters 2 through 4 of 2015.
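The approach above can be sketched roughly as follows: fit a line on first-quarter sales from both years, then apply it to a later 2015 quarter. The per-store figures are invented for illustration, the helper `fit_line` is hypothetical, and the sketch assumes the year-over-year relationship seen in quarter 1 carries over to later quarters.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up per-store sales in dollars (not taken from the Iowa dataset).
q1_2015 = [1000.0, 2000.0, 3000.0, 4000.0]
q1_2016 = [1150.0, 2250.0, 3350.0, 4450.0]

# Learn the year-over-year relationship from the quarter both years share.
intercept, slope = fit_line(q1_2015, q1_2016)

# Apply the same relationship to a 2015 quarter with no 2016 data yet.
q2_2015 = [1200.0, 2100.0, 3300.0, 4100.0]
q2_2016_predicted = [intercept + slope * s for s in q2_2015]

print(round(intercept, 2), round(slope, 2))
print([round(p, 2) for p in q2_2016_predicted])
```

The same two lines of prediction code would be repeated for quarters 3 and 4. Whether to aggregate by store, by county, or statewide is a modeling choice this sketch does not settle.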

Analyzing Global Terrorism

The Global Terrorism Database (GTD) is an open-source database that includes information on terrorist attacks around the world from 1970 through 2015. The GTD includes systematic data on domestic as well as international terrorist incidents that occurred during this period and now includes more than 150,000 cases. For this project, I analyzed the 45 years of global terrorism data and created numerous visualizations in order to better understand the exceptionally large dataset. I applied Bayesian statistics to compare two countries and see whether one was significantly more dangerous than the other during the 21st century. Additionally, the year 1993 is missing from the dataset, so I attempted to estimate the number of bombings for that year using a time series model.
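One way a Bayesian comparison of two countries could look is sketched below, assuming yearly attack counts follow a Poisson distribution with a Gamma prior on the rate. This model choice, the prior parameters, and the counts are all assumptions made for illustration; they are not taken from the GTD or from the project itself.

```python
import random

def posterior_samples(counts, prior_shape=1.0, prior_rate=0.001, n=20000, seed=0):
    """Draw samples of the yearly attack rate under a Gamma-Poisson model."""
    rng = random.Random(seed)
    # Conjugate update: shape grows by the total count, rate by the number of years.
    shape = prior_shape + sum(counts)
    rate = prior_rate + len(counts)
    # random.gammavariate takes a scale parameter, which is 1 / rate.
    return [rng.gammavariate(shape, 1.0 / rate) for _ in range(n)]

# Made-up yearly attack counts for two hypothetical countries.
country_a = [120, 135, 110, 140, 125]
country_b = [60, 75, 58, 70, 66]

samples_a = posterior_samples(country_a, seed=1)
samples_b = posterior_samples(country_b, seed=2)

# Estimated posterior probability that country A has the higher attack rate.
prob_a_higher = sum(a > b for a, b in zip(samples_a, samples_b)) / len(samples_a)
print(prob_a_higher)
```

A probability close to 1 or 0 would indicate a clear difference between the two countries, while a value near 0.5 would mean the data cannot tell them apart.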
