Author: Meilan Ou
Source code: https://github.com/meilanou/NYCSubway
Last update: March 8th 2015
By annual ridership, the New York City Subway is the busiest rapid transit rail system in the United States and in the Americas, and the seventh busiest in the world. It offers rail service 24 hours a day, every day of the year. It is also the largest system in the world by number of stations, with 468 stations in operation (421, if stations connected by transfers are counted as single stations).
Understanding NYC Subway ridership may help government officials and planners better identify the potential of space and resources, and further improve the quality of life in urban settings. One approach to understanding ridership is to analyze the turnstile entry and exit data that the MTA collects at different stations and remote units.
In this project, I analyze the NYC Subway turnstile data with statistical methods and predict ridership from common factors such as time of day, day of week, and weather records such as whether it was raining.
Original raw data can be found at:
While data scientists spend, on average, 70% of their time on data munging, for this project the Udacity course team luckily provides an improved data set that combines ridership and weather conditions for each reading. It's available at: https://www.dropbox.com/s/1lpoeh2w6px4diu/improved-dataset.zip?dl=0
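As a rough illustration of what "ridership by key factors" can look like in practice, here is a minimal sketch that averages hourly entries per value of a factor such as rain or day of week. It runs on a few made-up rows rather than the real file, and the column names (`UNIT`, `hour`, `day_week`, `rain`, `ENTRIESn_hourly`) and the helper `mean_entries_by` are assumptions for illustration; check them against the actual data set before reusing this.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up sample rows for illustration only; the column names are
# assumptions and may not match the real improved data set.
SAMPLE_CSV = """UNIT,hour,day_week,rain,ENTRIESn_hourly
R001,0,0,0,100
R001,4,0,1,50
R002,0,5,0,300
R002,4,5,1,150
"""


def mean_entries_by(rows, factor, target="ENTRIESn_hourly"):
    """Return the mean of `target` for each distinct value of `factor`."""
    groups = defaultdict(list)
    for row in rows:
        # csv.DictReader yields strings, so convert the target to a number.
        groups[row[factor]].append(float(row[target]))
    return {key: mean(values) for key, values in groups.items()}


# Parse the in-memory CSV into a list of dictionaries, one per reading.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

by_rain = mean_entries_by(rows, "rain")      # keys are the strings "0" and "1"
by_day = mean_entries_by(rows, "day_week")   # keys are day-of-week codes as strings

print(by_rain)
print(by_day)
```

To try this on the downloaded data, replace `io.StringIO(SAMPLE_CSV)` with an open file handle for the unzipped CSV. A difference in group means is only a first look; whether it is statistically meaningful is a separate question for the later sections.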
Click on section 0: Explore ridership and weather data to get started!