🎬 BoxOfficePrediction

Develop an advanced predictive model to forecast a film's box office revenue with precision and confidence. Utilizing a myriad of parameters, including budget, cast, genre, and past performance, our task is to leverage the power of machine learning to unravel the intricacies of box office dynamics and provide actionable insights for studios and filmmakers.

🚀 Motivation

Numerous recommendation systems have been built on the TMDB_5000 dataset from Kaggle, yet much of the dataset's potential remains untapped. This project uses it to predict a film's expected revenue from a range of parameters and engineered features, so that studios and filmmakers can make more informed decisions.

📄 Documentation

This section contains detailed information about the approach, experimentation results, and inferences derived from the project. I have created a blog explaining the approach and execution. Please visit my blog:

🛠️ Technology Stack

Frontend	Backend	ML Library	MLOps Tools	Deployment	Version Control

📊 Implementation Overview

Data:

TMDB 5000 Movie Dataset => Kaggle
Average Ticket Prices => (Made by me) : Download

🔧 Preprocessing:

Converted the dataset's complex nested fields into simple, trainable features.
Assigned scores to high-cardinality categorical features such as crew, hero, and heroine, based on the cumulative popularity and weighted rating of their previous work, to quantify their impact on revenue/footfall.
Used one-hot encoding for ordinary categorical features with fewer unique values.
Used a log transformation to handle skewed data and outliers.
Normalized the data with StandardScaler.
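
The last two steps can be illustrated with a minimal sketch. It uses only the Python standard library as a stand-in for StandardScaler, and the budget values are hypothetical; the exact columns the project transforms are not stated here, so treat this as an illustration of the technique rather than the project's pipeline.

```python
import math
import statistics


def log_transform(values):
    """Apply log(1 + x) to compress a right-skewed, non-negative feature."""
    return [math.log1p(v) for v in values]


def standardize(values):
    """Scale to zero mean and unit variance using the population standard
    deviation, which is also what StandardScaler uses."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical budgets in dollars, spanning several orders of magnitude.
budgets = [1_000_000, 5_000_000, 20_000_000, 150_000_000, 300_000_000]
scaled = standardize(log_transform(budgets))
print([round(v, 3) for v in scaled])
```

Taking the log first keeps a handful of very large budgets from dominating the mean and variance used in the scaling step.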

🎯 Target Metric: Footfall Prediction

To predict expected revenue, we introduced a novel approach by considering footfall (number of tickets sold) as a target metric. While revenue is subject to various external factors such as ticket prices and distribution deals, footfall provides a more consistent and direct measure of a movie's popularity and audience engagement.

expected revenue = predicted footfall * current avg_ticket_price
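
As a rough sketch, the formula above can be written as a small function. The helper `footfall_from_revenue` is an assumption about how the training target might be derived (historical revenue divided by the average ticket price of the release year); the source states only the forward formula. All numbers are hypothetical.

```python
def footfall_from_revenue(revenue: float, avg_ticket_price: float) -> float:
    """Assumed derivation of the training target: tickets sold, estimated as
    revenue divided by the average ticket price in the release year."""
    if avg_ticket_price <= 0:
        raise ValueError("avg_ticket_price must be positive")
    return revenue / avg_ticket_price


def expected_revenue(predicted_footfall: float, current_avg_ticket_price: float) -> float:
    """expected revenue = predicted footfall * current avg_ticket_price"""
    if predicted_footfall < 0 or current_avg_ticket_price < 0:
        raise ValueError("inputs must be non-negative")
    return predicted_footfall * current_avg_ticket_price


# Hypothetical film: $100M gross when tickets averaged $8.00.
footfall = footfall_from_revenue(100_000_000, 8.0)
# The same audience size valued at a hypothetical current price of $10.00.
print(expected_revenue(footfall, 10.0))
```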

🤖 Model Selection

Models trained:

Model	Best Model
RandomForestRegressor
DecisionTreeRegressor
GradientBoostingRegressor
LinearRegression
XGBRegressor	Yes
CatBoostRegressor
AdaBoostRegressor

📈 Best Model Metrics

Metric	Value
RMSE	0.012
neg_mean_squared_error	-0.00024

⚙️ Best Model Parameters

Parameter	Value
colsample_bytree	0.3
learning_rate	0.11
max_depth	4
n_estimators	444

🔍 Hyperparameter Tuning

Method: RandomizedSearchCV
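
A minimal sketch of a randomized search with scikit-learn follows. To keep it free of extra dependencies it tunes GradientBoostingRegressor on synthetic data instead of XGBRegressor on the project's features, and the search ranges are hypothetical; `max_features` is only a loose analogue of XGBoost's `colsample_bytree`. It also shows how an RMSE can be recovered from the `neg_mean_squared_error` score.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data; the project trains on features derived from TMDB 5000.
X, y = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=0)

# Hypothetical search space (not the project's actual ranges).
param_distributions = {
    "n_estimators": randint(50, 150),     # integers in [50, 149]
    "learning_rate": uniform(0.01, 0.2),  # floats in [0.01, 0.21]
    "max_depth": randint(2, 6),           # integers in [2, 5]
    "max_features": uniform(0.3, 0.7),    # fraction of features, [0.3, 1.0]
}

search = RandomizedSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,                             # number of sampled configurations
    scoring="neg_mean_squared_error",     # higher (closer to 0) is better
    cv=3,
    random_state=0,
)
search.fit(X, y)

# The score is a negated MSE, so flip the sign before taking the square root.
rmse = (-search.best_score_) ** 0.5
print(search.best_params_)
print(f"Cross-validated RMSE: {rmse:.4f}")
```

Swapping in XGBRegressor should only require changing the estimator and the parameter names in the search space.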

📑 MLflow Experiment Logs

All the experiment results and models are logged in MLflow for a clearer understanding and detailed inference: View here

📸 Screenshots

Home Page	Form Page	Result

🖥️ Run Locally

Clone the project

  git clone https://github.com/uvaishnav/BoxOfficePrediction.git

Create a conda environment after opening the repository

  conda create -n boxoffice python=3.9 -y

  conda activate boxoffice

Install requirements

  pip install -r requirements.txt

Start the server

  python app.py

Now open your local host at the port the server is running on.

🔧 For Usage/Modification

1. Clone the project

  git clone https://github.com/uvaishnav/BoxOfficePrediction.git

2. Create a conda environment after opening the repository

  conda create -n boxoffice python=3.9 -y

  conda activate boxoffice

3. Install requirements

  pip install -r requirements.txt

4. Create a Kaggle account, download your kaggle.json API token, and store it in the .kaggle folder in your home directory (needed for the data_ingestion pipeline)

5. Add Environment Variables

For model evaluation pipeline,

Connect the repository to DagsHub
Get the MLflow URI and credentials
Update the config.yaml file with your MLflow URI
Then add these variables (credentials from DagsHub) to your environment

export MLFLOW_TRACKING_URI="<your mlflow uri>"
export MLFLOW_TRACKING_USERNAME="<your username>"
export MLFLOW_TRACKING_PASSWORD="<your password>"
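
Before running the model evaluation pipeline, it can help to confirm that these variables are actually set. The following is a small, hypothetical helper (it is not part of the repository) that reports any that are missing or empty:

```python
import os

REQUIRED_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_TRACKING_USERNAME",
    "MLFLOW_TRACKING_PASSWORD",
)


def missing_mlflow_vars(environ=None):
    """Return the names of required MLflow variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_mlflow_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All MLflow environment variables are set.")
```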

6. Run all the pipelines using DVC

dvc init
dvc repro

🎥 Demo

My.Movie.2-720p30.mov

🚀 Deployment

To Deploy this Project on Heroku

1. Dockerize the Project

Update the Dockerfile as needed and build the Docker image. You need to install Docker Desktop first.

docker build -t boxoffice .

2. Update Secret Variables in GitHub to Deploy Using GitHub Actions

Create an account on Heroku and create an app.
In your GitHub repository, navigate to Settings -> Secrets and Variables -> Actions. Add the secret keys referenced by the main.yaml file in your workflows folder:

HEROKU_API_KEY
HEROKU_APP_NAME
HEROKU_EMAIL

The build then runs and a new version of your project is deployed every time you push changes to GitHub.

📈 Scope of Improvement

Our current model predicts expected revenue based on factors like budget, cast, release month, and genres.

Optimizing Cast Selection and Release Timing

We can enhance its utility by optimizing cast selection and release timing. By analyzing historical data, we can identify optimal combinations of actors and crew members that synergize well, thereby maximizing revenue potential. Additionally, refining our model to recommend the best release windows can help avoid high competition periods and leverage seasonal trends, further boosting a film’s success.

🙏 Acknowledgements

TMDB_5000 dataset from Kaggle
247wallst.com, for the data used to prepare the ticket prices dataset

📜 License

This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
