Investigate a Dataset

Udacity Data Analyst Nanodegree

Project Overview

In this project, you will analyze a dataset and then communicate your findings about it. You will use the Python libraries NumPy, pandas, and Matplotlib to make your analysis easier.

Python Packages to be installed

pandas
NumPy
Matplotlib
csv (part of the Python standard library, so no separate installation is needed)

Why this project?

In this project, you'll go through the data analysis process and see how everything fits together.

You'll use the Python libraries NumPy, pandas, and Matplotlib, which make writing data analysis code in Python a lot easier. These skills are also sought after by employers.

What will I learn?

After completing the project, you will:

Know all the steps involved in a typical data analysis process
Be comfortable posing questions that can be answered with a given dataset and then answering those questions
Know how to investigate problems in a dataset and wrangle the data into a format you can use
Have experience communicating the results of your analysis
Be able to use vectorized operations in NumPy and pandas to speed up your data analysis code
Be familiar with pandas' Series and DataFrame objects, which let you access your data more conveniently
Know how to use Matplotlib to produce plots showing your findings
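
To make "vectorized operations" concrete, here is a minimal sketch comparing an element-by-element loop with the equivalent single pandas expression. The `ages` values are made up for illustration and are not taken from any project dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, not taken from the project dataset.
ages = pd.Series([5, 34, 62, 18, 47])

# Loop version: checks one element at a time.
is_adult_loop = [age >= 18 for age in ages]

# Vectorized version: one comparison applied to the whole Series at once.
is_adult_vec = ages >= 18

# NumPy reductions work the same way on the underlying array.
mean_age = np.mean(ages.to_numpy())

print(is_adult_vec.tolist())
print(mean_age)
```

Both versions give the same result, but the vectorized form is shorter and usually much faster on large datasets.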

Introduction

For the final project, you will conduct your own data analysis and create a file to share that documents your findings. You should start by taking a look at your dataset and brainstorming what questions you could answer using it. Then you should use pandas and NumPy to answer the questions you are most interested in, and create a report sharing the answers. You will not be required to use inferential statistics or machine learning to complete this project, but you should make it clear in your communications that your findings are tentative.

Step One - Choose Your Data Set

Click this link (available in a Google Doc here) to open a document with links and information about data sets that you can investigate for this project. You must choose one of these datasets to complete the project.

Step Two - Get Organized

Eventually you'll want to submit your project (and share it with friends, family, and employers). Get organized before you begin. We recommend creating a single folder that will eventually contain:

The report communicating your findings
Any Python code you wrote as part of your analysis
The data set you used (which you will not need to submit)

Step Three - Analyze Your Data

Brainstorm some questions you could answer using the data set you chose, then start answering those questions. You can find some questions in the data set options to help you get started.

Try to pose questions that encourage looking at relationships between multiple variables. You should aim to analyze at least one dependent variable and three independent variables in your investigation. Make sure you use NumPy and pandas where they are appropriate!
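
One way to relate a dependent variable to an independent variable is to group by the independent variable and average the dependent one. The sketch below is only an illustration: the rows are invented, the column names `Scholarship` and `No_show` are borrowed from the dataset selected for this project, and it assumes that `No_show` holds the string "Yes" when an appointment was missed. The helper column `missed` is hypothetical.

```python
import pandas as pd

# Invented rows for illustration; not real records from the dataset.
df = pd.DataFrame({
    "Scholarship": [0, 0, 1, 1, 0, 1],
    "No_show": ["No", "Yes", "Yes", "No", "No", "Yes"],
})

# Assumption: "Yes" means the patient missed the appointment.
# Encode the dependent variable as 0/1 so it can be averaged.
df["missed"] = (df["No_show"] == "Yes").astype(int)

# Share of missed appointments for each value of the independent variable.
rate_by_scholarship = df.groupby("Scholarship")["missed"].mean()

print(rate_by_scholarship)
```

The same pattern can be repeated for each independent variable of interest. A difference between groups in a summary like this is descriptive only and does not by itself establish a cause.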

Selected dataset: No-show appointments

Dataset Description: The Brazilian public health system, known in Portuguese as SUS (Sistema Único de Saúde, or Unified Health System), is one of the world's largest, with government spending amounting to almost 9% of GDP. Its operation, however, is not uniform, and residents in different parts of the country have differing perceptions of its quality. This project uses the no-show appointments dataset, which contains information from about 100k medical appointments in Brazil. It focuses on whether or not patients show up for their appointments and records a collection of patient attributes, one per column. Some of the key variables in the dataset are:

ScheduledDay: The date on which the patient booked the medical appointment
Neighbourhood: The area of the region in which the patient resides
Scholarship: Whether or not the patient was sponsored for the appointment
No_show: The class (response) variable, which indicates whether or not the patient showed up for the appointment
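
A minimal sketch of loading and tidying these columns with pandas follows. To stay self-contained it reads a two-row inline sample instead of the real CSV file. The sample rows, the placeholder neighbourhood names, the timestamp format, and the assumption that the raw header is spelled `No-show` (renamed here to `No_show`) are all illustrative and should be checked against the actual file.

```python
import io

import pandas as pd

# Inline stand-in for the CSV file; rows are invented for illustration.
sample_csv = io.StringIO(
    "ScheduledDay,Neighbourhood,Scholarship,No-show\n"
    "2016-04-29T18:38:08Z,AREA_A,0,No\n"
    "2016-04-27T08:36:51Z,AREA_B,1,Yes\n"
)

df = pd.read_csv(sample_csv)

# Assumption: the raw header uses a hyphen; rename it to a valid identifier.
df = df.rename(columns={"No-show": "No_show"})

# Parse the booking timestamp so date components can be used in analysis.
df["ScheduledDay"] = pd.to_datetime(df["ScheduledDay"])

print(df.dtypes)
print(df["No_show"].value_counts())
```

With the real file, `pd.read_csv` would be given the path to the CSV instead of the inline sample.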

Step Four - Share Your Findings

Once you have finished analyzing the data, create a report that shares the findings you found most interesting. If you use a Jupyter notebook, share your findings alongside the code you used to perform the analysis. Make sure that your report text is contained in Markdown cells to clearly distinguish your comments and findings from your code work. You should also feel free to use other tools and software to craft your final report, but make sure that you can submit your report as an HTML or PDF file so that it can be opened easily.
