Linear Regression Analysis on Domestic Film Revenue in the US

ar2849/Metis-Project-Two

Metis-Project-Two

Linear Regression Analysis - Metis Project 2: Predicting US Domestic Gross Total Revenues

Description of project goals

The project required web scraping and regression analysis of the resulting data. The form of regression was chosen based on the data collected and the feature engineering needed to prepare it for modeling. This project used linear regression with K-fold cross-validation to predict the target from the features listed below.
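As a rough illustration of linear regression evaluated with K-fold cross-validation, here is a minimal sketch using scikit-learn. The data is synthetic and stands in for the scraped film data; the number of folds, the random seeds, and the R² scoring metric are assumptions, not settings taken from the project notebooks.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (NOT the project's dataset): 200 rows, 3 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coefs = np.array([2.0, -1.0, 0.5])
y = X @ true_coefs + rng.normal(scale=0.1, size=200)

# 5 folds is an assumed choice; shuffling avoids any ordering effects in the rows.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

# Fit and score a fresh linear model on each train/validation split.
scores = cross_val_score(LinearRegression(), X, y, cv=kfold, scoring="r2")
mean_r2 = scores.mean()
print(f"R^2 per fold: {scores}")
print(f"Mean R^2: {mean_r2:.3f}")
```

Averaging the per-fold scores gives a less optimistic estimate of out-of-sample performance than a single train/test split would.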

Features and Target Variables

Target:

  • Total Domestic Gross

Features:

  • Months
  • Years
  • Distributor
  • MPAA Rating
  • Runtime
  • Budget
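Distributor and MPAA Rating are categorical, so they need to be converted to numbers before a linear model can use them. One common approach is one-hot encoding, sketched below with pandas. The column names and the rows are hypothetical, and this is not necessarily the encoding used in the project's notebooks.

```python
import pandas as pd

# Hypothetical rows and column names, for illustration only.
films = pd.DataFrame({
    "month": [5, 12, 7],
    "year": [2015, 2018, 2019],
    "distributor": ["Studio A", "Studio B", "Studio A"],
    "mpaa_rating": ["PG-13", "R", "PG"],
    "runtime_min": [120, 95, 140],
    "budget_usd": [1.5e8, 2.0e7, 9.0e7],
})

# One-hot encode the categorical columns. drop_first=True drops one level per
# column so the dummy columns are not perfectly collinear with the intercept.
encoded = pd.get_dummies(
    films, columns=["distributor", "mpaa_rating"], drop_first=True
)
print(encoded.columns.tolist())
```

Month could be treated the same way if seasonality is expected to be non-linear, though that is a modeling choice rather than something the feature list specifies.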

Data Used

  • Box Office Mojo by IMDb
  • The Numbers

Tools Used

  • NumPy
  • pandas
  • pickle
  • Matplotlib
  • seaborn
  • Beautiful Soup
  • scikit-learn
  • Requests

Intended impacts within the scope of the project:

  • To create a prediction model for total domestic gross in the United States
  • To determine the salient features for predicting Domestic Box Office Revenues in the United States

Workflow

The datasets were created by scraping The Numbers and Box Office Mojo, as documented in their respective Jupyter notebooks. The datasets were then joined, and cleaning, preprocessing, EDA, and linear regression were carried out in the Regression notebook.
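The join step can be sketched with pandas as follows. The two small frames stand in for the scraped Box Office Mojo and The Numbers tables. The join key (`title`), the other column names, and all values are hypothetical; the actual notebooks may join on different or additional keys.

```python
import pandas as pd

# Hypothetical stand-ins for the two scraped tables.
mojo = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "domestic_gross_usd": [3.0e8, 4.5e7, 1.2e8],
})
numbers = pd.DataFrame({
    "title": ["Film A", "Film B", "Film D"],
    "budget_usd": [1.5e8, 2.0e7, 6.0e7],
})

# An inner join keeps only films that appear in both sources.
merged = mojo.merge(numbers, on="title", how="inner")
print(merged)
```

In practice, titles scraped from two sites often differ in punctuation or casing, so normalizing the key (and possibly also matching on release year) before merging tends to reduce missed matches.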