PISA Data 2012

This was the fifth and last project of Udacity's Data Analyst Nanodegree.

Dataset

PISA (the OECD's Programme for International Student Assessment) is a survey that assesses how well students in compulsory education are prepared for life after school. This investigation focuses on the 2012 PISA survey, with data from around 500K students in 65 countries.

The file provided by Udacity for this dataset contains data on 485,490 students, described by 636 columns. It includes not only the exam results in each category, but also extensive information on the students' backgrounds, such as country of residence, number of family members and their level of education, and possessions or access to different facilities at home and at school.

The main features of interest in this dataset are the scores obtained by the students in each discipline. They make it possible to explore how different background factors relate to these scores, and therefore to the level of preparation of students around the world.

Analyzing such a complex dataset in full would require a lot of time. For this project, I simplify the dataset by keeping only the following selection of background variables:

Student's country of residence.
Economic level of the student's country (OECD or not).
Student's gender.
Student's parents' level of education.
Access to internet at school and at home.

In addition to these independent variables I keep the results for the three main disciplines: mathematics, reading and science.
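
As a rough illustration of this simplification step, the sketch below keeps only a handful of columns from a CSV file using the standard library. The column names in `KEEP` are assumptions based on the naming style of the PISA 2012 codebook, the helper `select_columns` is hypothetical, and the sample rows are made up; the notebooks in this repository may select different columns and use different tools.

```python
import csv
import io

# Assumed column names (country, OECD flag, gender, and one score column
# per discipline). The exact columns kept in the project are not listed
# in this README, so treat this list as illustrative.
KEEP = ["CNT", "OECD", "ST04Q01", "PV1MATH", "PV1READ", "PV1SCIE"]


def select_columns(csv_file, keep=KEEP):
    """Read an open CSV file object and keep only the requested columns."""
    reader = csv.DictReader(csv_file)
    return [{name: row[name] for name in keep} for row in reader]


# Tiny in-memory stand-in for the real file; the values are made up.
sample = io.StringIO(
    "CNT,OECD,ST04Q01,OTHER,PV1MATH,PV1READ,PV1SCIE\n"
    "Albania,Non-OECD,Female,x,406.8,249.6,341.7\n"
    "Spain,OECD,Male,y,512.3,498.0,505.1\n"
)
subset = select_columns(sample)
```

With the real file, the same function could be called on a file opened with `open(path, newline="")`; values are returned as strings and would still need converting to numbers before analysis.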

Summary of Findings

The exploratory analysis provided the following findings:

The 3 score variables are approximately normally distributed, ranging roughly from 200 to 800 points each.

Not all countries are equally represented in the survey. While many of the 65 countries in the dataset have a similar number of entries (between 0.9% and 1.5% of the survey population each), about 15 of them fall well above or below this range.

The 3 score variables are linearly correlated with one another. The correlation is slightly weaker between math and reading scores and stronger between math and science scores.

No clear relationship is found between students' gender and their scores, except in reading, where female students seem to perform better than male students.

Students from OECD countries obtain better exam results than students from non-OECD countries.

Students whose parents have a higher level of education also obtain better scores.

Using the internet at school doesn't seem to lead to better test results. On the contrary, in non-OECD countries, students who report using the internet at school show slightly lower performance in all 3 disciplines.
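
The kind of check behind the correlation and group-comparison findings can be sketched in plain Python. This is a minimal illustration on made-up records, not the code used in the notebooks: the record layout and the helper names `pearson` and `mean_by_group` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_by_group(records, group_key, score_key):
    """Average the score_key values for each distinct value of group_key."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [running sum, count]
    for rec in records:
        entry = totals[rec[group_key]]
        entry[0] += rec[score_key]
        entry[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Made-up records for illustration only; they are not taken from the survey.
records = [
    {"oecd": "OECD", "math": 520.0, "science": 530.0},
    {"oecd": "OECD", "math": 480.0, "science": 470.0},
    {"oecd": "Non-OECD", "math": 410.0, "science": 420.0},
    {"oecd": "Non-OECD", "math": 390.0, "science": 380.0},
]

math_scores = [r["math"] for r in records]
science_scores = [r["science"] for r in records]

# Strength of the linear relationship between two score variables.
r_math_science = pearson(math_scores, science_scores)

# Average math score for each group of countries.
math_by_oecd = mean_by_group(records, "oecd", "math")
```

A coefficient close to 1 indicates a strong positive linear relationship. Comparing group means is only a first look; the violin and box plots mentioned in this README show the full distributions.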

Key Insights for Presentation

In the presentation, I first show the correlation between the scores obtained by students in each of the 3 disciplines. We see this through a series of scatterplots comparing each discipline to the other two.

Afterwards, I focus on the relationship between each of the 4 independent background variables and the scores, using the violin and box plots obtained in the exploratory visualization. The plots are reused without modification.
