SC1015 DSAI Project: AniFame

Our Motivation:

Anime is an outlet for relaxation and escape for people of all ages. However, while viewers love watching anime, studios struggle to turn a profit on many of the anime they produce because of high production costs.
According to Eric (2015), an average 13-episode anime season costs around $2 million USD, and many anime cannot recoup this expense. To make a title sell, advertisements, events, and merchandise are essential to a studio's profit margin, and all of these depend on the anime's popularity with viewers.
Hence, it is important to know whether an anime that a studio is producing will be profitable, so that studios can maximise their profits and ensure their survival in the industry.

Project Goal:

This project aims to help studios maximise the profits of the anime they produce by estimating an anime's 'mean' rating and predicting its probability of 'success' before production, giving studios the ability to fine-tune an anime before it is made.

Dataset Used:

We used the MyAnimeList (MAL) API to scrape data on anime released from 2000 to 2021, then cleaned and processed it for Exploratory Data Analysis and Machine Learning.

DataSet Folder

Note: Some datasets are scraped but are not included in the final project (e.g. the various ranking datasets)

Jupyter Notebooks:

Note: Some Jupyter Notebooks are used but are not included in the final project (e.g. anomaly detection, helpers, scraper)

Slide Deck:

Presentation Slides

Overview of DataScience Pipeline:

1. Data collection:

Used the MAL API to recursively scrape data on thousands of anime from 2000 to 2021
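
A minimal sketch of what paginated collection from the MAL API could look like. It is not the project's actual scraper: the seasonal endpoint, the `X-MAL-CLIENT-ID` header, and the `data` / `paging.next` response layout are assumptions about the public v2 API, and `fetch_json`, `collect_all`, and the client ID are hypothetical names used only for illustration.

```python
import json
import urllib.request

# Assumed endpoint shape for the MAL v2 seasonal listing (not taken from the project).
BASE_URL = "https://api.myanimelist.net/v2/anime/season/{year}/{season}?limit=100"


def fetch_json(url, client_id):
    """Fetch one page of JSON; client_id is a placeholder for your own MAL client ID."""
    request = urllib.request.Request(url, headers={"X-MAL-CLIENT-ID": client_id})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def collect_all(first_url, fetch):
    """Follow 'paging.next' links recursively until no further page is reported."""
    page = fetch(first_url)
    records = [item["node"] for item in page.get("data", [])]
    next_url = page.get("paging", {}).get("next")
    if next_url:
        records.extend(collect_all(next_url, fetch))
    return records
```

Passing the fetch function in as an argument keeps the paging logic testable without network access. A real call might look like `collect_all(BASE_URL.format(year=2021, season="winter"), lambda url: fetch_json(url, "YOUR_CLIENT_ID"))`.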

2. Data cleaning and preprocessing:

Removing useless features, handling missing values
JSON conversion and manipulation
Feature engineering and generation
Creating 'genres' time series data
Export data as CSV
One-hot Encoding
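
The preprocessing steps above can be sketched in plain Python. This assumes each raw record holds 'genres' as a list of `{"id", "name"}` objects and a nested 'start_season' object; the record layout, the sample values, and the helper names `flatten`, `one_hot_genres`, and `genre_counts_by_year` are illustrative assumptions, not the project's code.

```python
from collections import Counter, defaultdict


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts: {'start_season': {'year': 2020}} -> {'start_season_year': 2020}."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def one_hot_genres(records):
    """Replace each record's 'genres' list with one 0/1 column per genre name."""
    names = sorted({g["name"] for r in records for g in r.get("genres", [])})
    rows = []
    for r in records:
        present = {g["name"] for g in r.get("genres", [])}
        row = {k: v for k, v in r.items() if k != "genres"}
        for name in names:
            row[f"genres_{name}"] = int(name in present)
        rows.append(row)
    return rows


def genre_counts_by_year(records):
    """Count how often each genre appears per start year (a simple 'genres' time series)."""
    series = defaultdict(Counter)
    for r in records:
        year = r["start_season_year"]
        series[year].update(g["name"] for g in r.get("genres", []))
    return {year: dict(counts) for year, counts in series.items()}


# Hypothetical records for illustration only.
raw = [
    {"id": 1, "start_season": {"year": 2020, "season": "fall"},
     "genres": [{"id": 1, "name": "Action"}, {"id": 4, "name": "Comedy"}]},
    {"id": 2, "start_season": {"year": 2021, "season": "winter"},
     "genres": [{"id": 1, "name": "Action"}]},
]
flat = [flatten(r) for r in raw]
encoded = one_hot_genres(flat)
trend = genre_counts_by_year(flat)
```

Flattening with an underscore separator is what would yield column names such as 'start_season_season'.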

3. EDA & Visualization:

Explored, visualized, and generated insights for the following:

'genres' + 'genres' time series
'studios'
'mean' rating vs 'source', 'media_type', 'nsfw', 'rating', 'genre', and 'studios'
Relationship between 'mean', 'rank', 'popularity', 'positive_viewership_fraction', and 'negative_viewership_fraction'
'num_episodes' and 'average_episode_duration' overview trend
'start_season_season'

4. Regression:

Models:

Linear Regression
Lasso Regression
Ridge Regression (Best)

Metrics:

Explained Variance (R^2)
Mean Squared Error, Root Mean Squared Error
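
For reference, the regression metrics listed above can be computed by hand as follows. This is a small sketch using the standard definitions; the ratings and predictions are made-up numbers, not results from the project.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot; assumes y_true is not constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up 'mean' ratings and predictions, for illustration only.
actual = [6.0, 7.0, 8.0, 9.0]
predicted = [6.5, 7.0, 7.5, 9.0]
scores = {
    "mse": mse(actual, predicted),
    "rmse": rmse(actual, predicted),
    "r2": r_squared(actual, predicted),
}
```

Because RMSE is expressed in rating points, it is often the easiest of the three to interpret against the 'mean' rating scale.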

5. Classification:

Models:

LinearSVC
Decision Tree
Random Forest (Best - 4th version)

Metrics:

TPR, TNR, Confusion Matrix
Precision, Recall (TPR), F-score
Out-of-bag score
ROC AUC score
K-fold cross validation standard deviation
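
The confusion-matrix metrics above follow directly from the four cell counts. The sketch below assumes binary labels where 1 means 'success', that both classes occur, and that at least one positive prediction is made; the labels are invented for illustration and `binary_metrics` is a hypothetical helper name.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def binary_metrics(y_true, y_pred):
    """Compute TPR (recall), TNR, precision, and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn)          # true positive rate, identical to recall
    tnr = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * tpr / (precision + tpr)
    return {"tpr": tpr, "tnr": tnr, "precision": precision, "f1": f1}


# Invented labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Reporting TPR and TNR together guards against a model that looks accurate only because one class dominates the data.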

6. Key Insights & Recommendations:

Studios should:

Focus on quality over quantity of anime
Broadcast anime regardless of season
Not focus on producing anime that generate more positive views through fan-service
Produce anime movie franchises

Important features that determine the success of an anime:

‘average_episode_duration’
‘num_episodes’
‘source_manga’
‘media_type_movie’
‘rating_pg_13’

What we learnt from this project:

Data collection:

Scraping data using API calls

Data cleaning and preprocessing:

Feature Engineering & Feature generation
JSON manipulation techniques
Generating time-series data

EDA & Visualization:

Visualizing plots with a large number of data points:
- By reducing the data point size,
- By reducing the opacity of data points, or
- By introducing random sampling
‘genres’ time-series EDA
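
The three techniques for dense scatter plots can be combined in one helper. This is a sketch under assumptions: matplotlib is assumed to be installed, and `sample_points`, `scatter_dense`, and the default values for `k`, `size`, and `alpha` are hypothetical choices rather than the settings used in the notebooks.

```python
import random


def sample_points(xs, ys, k, seed=0):
    """Randomly keep at most k (x, y) pairs, preserving their pairing."""
    pairs = list(zip(xs, ys))
    if len(pairs) <= k:
        return pairs
    return random.Random(seed).sample(pairs, k)


def scatter_dense(xs, ys, k=5000, size=2, alpha=0.2):
    """Scatter plot combining random sampling, small markers, and low opacity."""
    import matplotlib.pyplot as plt  # imported lazily; assumed to be installed

    pairs = sample_points(xs, ys, k)
    fig, ax = plt.subplots()
    ax.scatter([x for x, _ in pairs], [y for _, y in pairs], s=size, alpha=alpha)
    return fig, ax
```

Using a fixed seed keeps the sampled subset reproducible between notebook runs.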

Machine Learning:

Machine Learning Models:
- Ridge Regression, Lasso Regression, Random Forest, LinearSVC
Classification Performance Metrics:
- F-score (Precision & Recall), out-of-bag (OOB) score, ROC AUC score

Contributions:

Data Collection: Jing Qiang and Jing Hua
Data cleaning and preprocessing: Jing Qiang, Jing Hua, and YinFeng
EDA and visualization: Jing Qiang and Jing Hua
Regression: Jing Hua
Classification: Jing Qiang
Presentation Script: Jing Qiang
Presentation Voice Over + Editing: Jing Hua
Slide Deck: Jing Qiang, Jing Hua, YinFeng
GitHub ReadMe: Jing Qiang

Done but not included in the final product:

Ranking dataset EDA: YinFeng
Anomaly Detection: Jing Qiang, YinFeng
