Movies-ETL

Overview

Amazing Prime loves the dataset and wants to keep it updated on a daily basis. Britta needs your help to create an automated pipeline that takes in new data, performs the appropriate transformations, and loads the data into existing tables. You'll need to refactor the code from this module into one function that takes in three files (the Wikipedia data, the Kaggle metadata, and the MovieLens rating data) and performs the ETL process, loading the results into a PostgreSQL database.

Deliverable

Deliverable 1: Write an ETL Function to Read Three Data Files
Deliverable 2: Extract and Transform the Wikipedia Data
Deliverable 3: Extract and Transform the Kaggle Data
Deliverable 4: Create the Movie Database
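The deliverables above can be sketched as a single function. This is a minimal illustration, not the code in the notebooks: the function name `extract_transform_load`, the `imdb_id`, `title`, and `director` fields, and the table layouts are assumptions made for the example. It also uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so that it runs without a database server.

```python
import csv
import json
import sqlite3


def extract_transform_load(wiki_file, kaggle_file, ratings_file, conn):
    """Read the three source files, clean them, and load two tables.

    `conn` is a DB-API connection; SQLite stands in for PostgreSQL here.
    """
    # Extract: the Wikipedia data is JSON, the Kaggle metadata is CSV.
    with open(wiki_file, encoding="utf-8") as f:
        wiki_movies = json.load(f)
    with open(kaggle_file, newline="", encoding="utf-8") as f:
        kaggle_rows = list(csv.DictReader(f))

    # Transform: keep Wikipedia entries that have an IMDb id, then join
    # them to the Kaggle rows on that id (hypothetical key: "imdb_id").
    wiki_by_id = {m["imdb_id"]: m for m in wiki_movies if m.get("imdb_id")}
    movies = [
        (row["imdb_id"], row["title"], wiki_by_id[row["imdb_id"]].get("director"))
        for row in kaggle_rows
        if row.get("imdb_id") in wiki_by_id
    ]

    # Load: rebuild the movies table from the merged rows.
    conn.execute("DROP TABLE IF EXISTS movies")
    conn.execute("CREATE TABLE movies (imdb_id TEXT, title TEXT, director TEXT)")
    conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", movies)

    # Load: stream the ratings file row by row instead of reading it all
    # into memory, since it is by far the largest of the three inputs.
    conn.execute("DROP TABLE IF EXISTS ratings")
    conn.execute(
        "CREATE TABLE ratings "
        "(user_id INTEGER, movie_id INTEGER, rating REAL, timestamp INTEGER)"
    )
    with open(ratings_file, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        conn.executemany(
            "INSERT INTO ratings VALUES (?, ?, ?, ?)",
            ((r["userId"], r["movieId"], r["rating"], r["timestamp"]) for r in reader),
        )
    conn.commit()
```

A call might look like `extract_transform_load("wikipedia-movies.json", "movies_metadata.csv", "ratings.csv", sqlite3.connect("movies.db"))`, where the two CSV file names are placeholders. Against PostgreSQL, the same structure applies, but the connection and the `?` parameter placeholders would follow the chosen driver's conventions.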

Results

The resulting database consists of two tables (see the images below):

The movie table has 6,051 rows, one per movie title, and 31 columns of production-related information.

The ratings table has 26,024,289 rows, each representing one rating on a 1-5 scale, and five columns of data.
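Row and column counts like these can be checked directly against the database after a load. The helper below is a small sketch of one way to do that. The name `table_shape` is hypothetical, and the demo assumes a generic DB-API connection (tested here with the standard-library `sqlite3` module, not PostgreSQL).

```python
import sqlite3


def table_shape(conn, table):
    """Return (row_count, column_count) for a table."""
    # Table names cannot be passed as query parameters, so accept only
    # plain identifiers before interpolating the name into the SQL.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")

    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    n_rows = cur.fetchone()[0]

    # An empty SELECT still populates cursor.description with one
    # entry per column, which gives the column count without a scan.
    cur.execute(f"SELECT * FROM {table} LIMIT 0")
    n_cols = len(cur.description)
    return n_rows, n_cols
```

Comparing the returned tuples with the expected figures after each daily run is a cheap way to notice a load that silently dropped rows or columns.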
