Skip to content

marcogx/taxi-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NYC-Taxi-Data-Analysis

In this final group project we analyze NYC taxi data on topic of "Understanding Taxi Economics", it is implemented in Map-Reduce Algorithm with Hadoop Streamming API and Python.

Question Investigated

  1. How does revenue vary across neighborhoods and how does it correlate with the median household income in the neighborhood?
  2. How does revenue vary over time? Are the months or seasons when taxi companies make more (or less) money?
  3. How long do cab drives ride without passengers? How does this vary over time?
  4. Are revenues affected during major events? E.g., parades, presidential visits, storms

Data Source

2013 Taxi data
Trip data: https://chriswhong.com/wp-content/uploads/2014/06/nycTaxiTripData2013.torrent
Fare data: https://chriswhong.com/wp-content/uploads/2014/06/nycTaxiFareData2013.torrent

Census data
Demographics: https://www.nyc.gov/html/dcp/html/census/demo_tables_2010.shtml
Income information: https://www.nyc.gov/html/dcp/html/census/socio_tables.shtml
Shape files for census tracts: https://www.nyc.gov/html/dcp/html/bytes/districts_download_metadata.shtml (search for "tract")

Weather data
https://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets
https://www7.ncdc.noaa.gov/CDO/dataproduct -- select "Surface Data, Hourly Global", and then when it comes to select the region, choose NY and the three main stations (Central Park, JFK and LaGuardia).

Result Snapshot











Author

Shaopeng Zhang
Hao Chen
Guang Xiong

About

Big Data Analysis on NYC Taxi in MapReduce/Hadoop

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published