This is a project for "Data Management" and "Data visualization" courses - Master's degree in Data Science, University Milano-Bicocca

rconfa/Airline-On-Time-Performance

Data Management and Visualization
Airline on-time performance project

Overview

Final project for the "Data Management" and "Data Visualization" courses, in which we tried to answer the following questions: "Is there a better time or airline to travel? Are there any differences between years?"
To answer these questions, we used flight data from US airlines for the years 2018-2019. These data are publicly available on a US government website. To allow for more complete views, we integrated them with other sources. For example, the altitude and full name of each airport were added by matching on the unique airport code obtained from the BTS website.
Once the data files were consistent both quantitatively and qualitatively, we focused on storing them in MongoDB, using three shards to partition the data and so handle its large volume (~5 GB).
Finally, we focused on the data visualization part, trying to create valid infographics that answer the initial questions.
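The integration step described above (enriching flights with airport names and altitudes through the airport code) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual script: the column names (`ORIGIN`, `CODE`, `NAME`, `ALTITUDE`), the airport codes and all sample values are made up for the example.

```python
import csv
import io

# Hypothetical sample data: column names, codes and values are assumptions,
# not the real schema of the flight or airport files.
FLIGHTS_CSV = """ORIGIN,CARRIER,DEP_DELAY
AAA,C1,12
BBB,C2,-3
ZZZ,C1,5
"""

AIRPORTS_CSV = """CODE,NAME,ALTITUDE
AAA,Example Airport A,100
BBB,Example Airport B,250
"""


def load_airports(text):
    """Index the airport lookup table by its unique airport code."""
    return {row["CODE"]: row for row in csv.DictReader(io.StringIO(text))}


def enrich_flights(flights_text, airports):
    """Left join: keep every flight, add name and altitude when the code is known."""
    enriched = []
    for flight in csv.DictReader(io.StringIO(flights_text)):
        airport = airports.get(flight["ORIGIN"])
        flight["ORIGIN_NAME"] = airport["NAME"] if airport else None
        flight["ORIGIN_ALTITUDE"] = int(airport["ALTITUDE"]) if airport else None
        enriched.append(flight)
    return enriched


airports = load_airports(AIRPORTS_CSV)
flights = enrich_flights(FLIGHTS_CSV, airports)
```

A left join keeps flights whose airport code is missing from the lookup table (here `ZZZ`), so unmatched codes show up as empty fields instead of silently dropping rows.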

Software

The project was carried out using Python for the scripts in the Data Management part and Tableau for building the data visualizations.

File

  • Report: Describes all the steps and choices made; written in Italian.
  • Script: Contains all the Python scripts implemented for this project.
  • File: Contains all files used in this project, for both the data management and data visualization parts.
  • Sharding: Contains the Docker Compose file used to build the shards in MongoDB and a report of the execution times measured for four different queries during the upload. It also describes the sharding strategy we used.
  • Json: Contains the schema for the JSON files and some sample files.
  • Visualization: Contains the Tableau project file and dashboard images exported in PNG format.
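To give an idea of what setting up a sharded collection involves, here is a minimal sketch that builds the two MongoDB admin commands usually needed (`enableSharding` and `shardCollection`). The database name, collection name, shard key field and the choice of a hashed key are all hypothetical; the strategy actually used is the one described in the Sharding folder.

```python
def build_sharding_commands(database, collection, shard_key_field):
    """Return the admin commands that enable sharding on a database and
    shard one of its collections on a hashed key.

    The command name must be the first key of each document; Python dicts
    preserve insertion order, so that holds here.
    """
    namespace = f"{database}.{collection}"
    return [
        {"enableSharding": database},
        {"shardCollection": namespace, "key": {shard_key_field: "hashed"}},
    ]


# Hypothetical names, chosen only for illustration.
commands = build_sharding_commands("flights_db", "flights", "FL_DATE")

# Possible usage with pymongo against a running mongos router
# (the address is an assumption):
#
# from pymongo import MongoClient
# client = MongoClient("mongodb://localhost:27017")
# for command in commands:
#     client.admin.command(command)
```

A hashed key tends to spread documents evenly across the shards, while a ranged key keeps nearby values together and may suit range queries better; which one fits depends on the queries being run.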

About us

Riccardo Confalonieri - Data Science Student @ University of Milano-Bicocca

Rebecca Picarelli - Data Science Student @ University of Milano-Bicocca

Silvia Ranieri - Data Science Student @ University of Milano-Bicocca