In this notebook we will clean the Subway Stations dataset made available by the MTA.
Let's start by opening and examining it.
In [2]:
import pandas as pd
stations = pd.read_csv('data/DOITT_SUBWAY_STATION_01_13SEPT2010.csv')
stations.head(4)
Out[2]:
Let's extract the latitude and longitude from the dataset. For that we will use add_coord_columns(), which is defined in coordinates.py. Notice that the coordinates are stored in reversed order, as (longitude, latitude).
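Since coordinates.py is not shown in this notebook, here is a minimal sketch of what such a helper might look like. It assumes that the_geom holds strings of the form 'POINT (longitude latitude)'. The function name add_coord_columns_sketch, its reversed_order parameter, and the toy values are hypothetical, and the real add_coord_columns() may work differently.

```python
import pandas as pd

def add_coord_columns_sketch(df, column, sep=' ', reversed_order=True):
    """Hypothetical stand-in for coord.add_coord_columns (the real code is not shown)."""
    # Keep only the text between the parentheses, e.g. '-73.991 40.730'.
    inner = df[column].str.extract(r'\(([^)]*)\)')[0].str.strip()
    # Split the pair on the separator and convert both halves to floats.
    parts = inner.str.split(sep, expand=True).astype(float)
    if reversed_order:
        # The source stores (longitude, latitude).
        df['longitude'], df['latitude'] = parts[0], parts[1]
    else:
        df['latitude'], df['longitude'] = parts[0], parts[1]
    return df

# Toy rows with made-up values, only to show the expected shape of the input.
toy = pd.DataFrame({'the_geom': ['POINT (-73.991 40.730)', 'POINT (-74.000 40.719)']})
add_coord_columns_sketch(toy, 'the_geom')
```

Like the call in the next cell, this sketch modifies the DataFrame in place by adding latitude and longitude columns.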
In [3]:
import coordinates as coord
coord.add_coord_columns(stations, 'the_geom', sep=' ', _reversed=True)
stations.loc[:, ('latitude', 'longitude')].head()
Out[3]:
Now let's clean the DataFrame.
In [4]:
stations.rename(columns={'NAME': 'station', 'LINE': 'lines', 'NOTES': 'notes'}, inplace=True)
relevant_cols = ['station', 'latitude', 'longitude', 'lines', 'notes']
# Take an explicit copy so the in-place sort below does not act on a slice of `stations`
stations_cleaned = stations.loc[:, relevant_cols].copy()
stations_cleaned.sort_values(by='station', inplace=True)
stations_cleaned.head()
Out[4]:
Let's quickly plot the station coordinates to get a feel for their geographical location:
In [9]:
!pip install folium
In [12]:
import folium
stations_map = folium.Map([40.729, -73.9], zoom_start=11, tiles='CartoDB positron', width='60%')
for i, station in stations_cleaned.iterrows():
    marker = folium.CircleMarker([station['latitude'], station['longitude']],
                                 popup=station['station'], color='FireBrick',
                                 fill_color='FireBrick', radius=2)
    marker.add_to(stations_map)
stations_map.save('maps/all_entrances.html')
stations_map
Out[12]:
The interactive map is available here.
Now let's save the cleaned DataFrame as a pickle binary file for later use in the recommender notebook.
In [11]:
stations_cleaned.to_pickle('pickle/stations_locations.p')
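For reference, here is a small sketch of how the pickle could be read back with pd.read_pickle(), for example in the recommender notebook. It uses a toy DataFrame with made-up values and a temporary directory instead of the real pickle/stations_locations.p file, so those names are assumptions for illustration only. Keep in mind that pickle files should only be loaded from sources you trust.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for stations_cleaned (values are made up).
toy = pd.DataFrame({'station': ['Station A', 'Station B'],
                    'latitude': [40.730, 40.719],
                    'longitude': [-73.991, -74.000]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'stations_locations.p')
    toy.to_pickle(path)               # write the DataFrame to disk
    restored = pd.read_pickle(path)   # read it back, dtypes and index included

restored.head()
```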