This analysis extracts movie data from Wikipedia and Kaggle, transforms and cleans the data using Python, and finally loads it into a readable file to be used in PostgreSQL.

Wall-E28/movies_ETL

Extract, Transform, Load Movie Data

Overview

The purpose of this project was to gather movie data from both Wikipedia and Kaggle and create a database that students can use to perform their own analysis. To do this, I extracted the necessary data from Wikipedia and Kaggle, used Python's pandas library to transform the data into a working DataFrame, and then loaded it into PostgreSQL for students to use.
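The extract, transform, and load steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample records, the column names (`imdb_id`, `title`, `box_office`, `runtime`), and the `transform` and `load` helpers are all hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the sketch runs without a database server.

```python
import sqlite3

import pandas as pd

# Hypothetical records standing in for the extracted Wikipedia data.
wiki_records = [
    {"imdb_id": "tt0000001", "title": "Example Movie A", "box_office": "$1.5 million"},
    {"imdb_id": "tt0000002", "title": "Example Movie B", "box_office": None},
    {"imdb_id": "tt0000001", "title": "Example Movie A", "box_office": "$1.5 million"},
]

# Hypothetical records standing in for the extracted Kaggle metadata.
kaggle_records = [
    {"imdb_id": "tt0000001", "runtime": 95.0},
    {"imdb_id": "tt0000002", "runtime": 120.0},
]


def transform(wiki, kaggle):
    """Drop duplicate movies, then merge the two sources on a shared ID."""
    wiki_df = pd.DataFrame(wiki).drop_duplicates(subset="imdb_id")
    kaggle_df = pd.DataFrame(kaggle)
    return wiki_df.merge(kaggle_df, on="imdb_id", how="inner")


def load(df, conn, table="movies"):
    """Write the DataFrame to a SQL table, replacing any existing table."""
    df.to_sql(table, conn, if_exists="replace", index=False)


movies_df = transform(wiki_records, kaggle_records)

# SQLite is an assumption made for portability; the project loads into PostgreSQL.
conn = sqlite3.connect(":memory:")
load(movies_df, conn)
row_count = conn.execute("SELECT COUNT(*) FROM movies").fetchone()[0]
```

To target PostgreSQL instead, `to_sql` would typically be given a SQLAlchemy engine rather than a SQLite connection; the connection details depend on the local setup and are not shown here.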

Results

After completing the extract, transform, and load process, I had a database with two tables: one for movies and one for ratings. The movies table consists of 31 columns holding a range of information for 6048 different movies.

Movies_Query

The ratings table consists of 5 columns holding information on over 26 million individual user ratings.

Rating_Query

Together, these tables give students the chance to perform a variety of analyses and derive their own insights.
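As one example of the kind of analysis the two tables allow, the sketch below joins movies to ratings and computes an average rating per movie. The column names (`movie_id`, `title`, `user_id`, `rating`) and the sample rows are assumptions made for illustration, since the actual schema is not listed here, and SQLite again stands in for PostgreSQL so the example is self-contained.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")

# Hypothetical, simplified versions of the movies and ratings tables.
conn.executescript(
    """
    CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating REAL);
    INSERT INTO movies VALUES (1, 'Example Movie A'), (2, 'Example Movie B');
    INSERT INTO ratings VALUES (10, 1, 4.0), (11, 1, 5.0), (10, 2, 3.0);
    """
)

# Join each movie to its ratings, then aggregate per movie.
query = """
SELECT m.title,
       COUNT(r.rating) AS num_ratings,
       AVG(r.rating)   AS avg_rating
FROM movies AS m
JOIN ratings AS r ON r.movie_id = m.movie_id
GROUP BY m.movie_id, m.title
ORDER BY avg_rating DESC;
"""
results = conn.execute(query).fetchall()

for title, num_ratings, avg_rating in results:
    print(f"{title}: {num_ratings} ratings, average {avg_rating:.2f}")
```

The same `SELECT` statement should run unchanged in PostgreSQL once the table and column names are adjusted to match the real schema.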