Emmanuel-Ncube/Udacity-Data-Analyst-Nanodegree


This repository serves as a showcase of my skills, a platform to share my projects, and a way to track my progress in Data Analytics and Data Science-related topics.

Installation

This project uses Python 3 and is designed to be completed in Jupyter Notebook. Installing Python through the Anaconda distribution is highly recommended, since the distribution includes most of the necessary Python libraries as well as Jupyter Notebook. The following libraries are expected to be used in this project:

  • NumPy
  • pandas
  • Matplotlib
  • seaborn
  • tweepy
  • json
  • requests
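
As a quick check before opening the notebooks, the sketch below reports which of the libraries listed above cannot be found in the current environment. It uses only the standard library. The helper name `missing_libraries` is hypothetical, and the last entry is assumed to refer to the `requests` package.

```python
import importlib.util

# Import names for the libraries listed above.
# Assumption: the last list entry refers to the "requests" package.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "tweepy", "json", "requests"]


def missing_libraries(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_libraries(REQUIRED)
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All listed libraries are available.")
```

Any library reported as missing can usually be installed with `conda install` or `pip install` before running the notebooks.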

Portfolio Projects

This section briefly describes each data analytics project.

Project 1 - Investigate A Dataset

This project analyses a dataset from Kaggle containing information about 10,866 movies collected from The Movie Database (TMDb), including user ratings, popularity, revenue, budget, cast and genres.

Project 2 - Wrangle and Analyze Data

The dataset I wrangled, analyzed and visualized is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs. WeRateDogs is a Twitter account that rates people's dogs with a humorous comment about the dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10: 11/10, 12/10, 13/10, etc. Why? Because "they're good dogs Brent." WeRateDogs has over 4 million followers and has received international media coverage.

Goal: Wrangle WeRateDogs Twitter data to create interesting and trustworthy analyses and visualizations. The Twitter archive is great, but it only contains very basic tweet information. Additional gathering, then assessing and cleaning is required for "Wow!"-worthy analyses and visualizations.
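
To illustrate the rating format described above, here is a minimal sketch of pulling a numerator and denominator out of tweet text with a regular expression. It is not the notebook's actual cleaning code. The function name `extract_rating` and the sample tweets are invented for illustration, and a real archive would likely need extra handling for tweets that contain dates, fractions, or more than one rating.

```python
import re

# Matches the first "<numerator>/<denominator>" pair, allowing a decimal numerator.
RATING_PATTERN = re.compile(r"(\d+(?:\.\d+)?)/(\d+)")


def extract_rating(text):
    """Return (numerator, denominator) for the first rating found, or None."""
    match = RATING_PATTERN.search(text)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


# Hypothetical sample tweets, not taken from the real archive.
sample_tweets = [
    "This is Bella. She loves the beach. 13/10 would pet",
    "Meet Max. A very good boy. 12/10",
    "No rating in this one",
]

for tweet in sample_tweets:
    print(tweet, "->", extract_rating(tweet))
```

Returning `None` for tweets without a rating keeps those rows easy to flag during the assessing step.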

Project 3 - Communicate Data Findings

The Prosper loan dataset contains 113,937 loans with 81 variables on each loan, including loan amount, Prosper ratings, estimated loss, Prosper and credit scores, borrower rate (or interest rate), current loan status, borrower income, and many others. A data dictionary explains the features in the dataset. The project is not expected to explore every variable in the dataset; instead, it focuses on about 10-15 visualizations.
