In [13]:
import os
os.getcwd()
os.chdir(r"C:\Vindico\Projects\Data\Course\Python\Udacity\Introduction to Data Science\Lesson 2\Exercise")
os.getcwd()
Out[13]:
In [14]:
import pandas
def get_hourly_entries(filename):
    '''
    The MTA Subway Turnstile data reports the cumulative number of entries
    and exits per row. Assume that you have a dataframe called df that
    contains only the rows for a particular turnstile machine
    (i.e., unique SCP, C/A, and UNIT). This function should change
    these cumulative entry numbers to a count of entries since the last reading
    (i.e., entries since the previous row in the dataframe).

    More specifically, you want to do two things:
       1) Create a new column called ENTRIESn_hourly
       2) Assign to the column the difference between ENTRIESn of the current row
          and the previous row. If there is any NaN, fill/replace it with 1.

    You may find the pandas functions shift() and fillna() to be helpful in this exercise.

    Examples of what your dataframe should look like at the end of this exercise:

         C/A  UNIT       SCP     DATEn     TIMEn    DESCn  ENTRIESn   EXITSn  ENTRIESn_hourly
    0   A002  R051  02-00-00  05-01-11  00:00:00  REGULAR   3144312  1088151                1
    1   A002  R051  02-00-00  05-01-11  04:00:00  REGULAR   3144335  1088159               23
    2   A002  R051  02-00-00  05-01-11  08:00:00  REGULAR   3144353  1088177               18
    3   A002  R051  02-00-00  05-01-11  12:00:00  REGULAR   3144424  1088231               71
    4   A002  R051  02-00-00  05-01-11  16:00:00  REGULAR   3144594  1088275              170
    5   A002  R051  02-00-00  05-01-11  20:00:00  REGULAR   3144808  1088317              214
    6   A002  R051  02-00-00  05-02-11  00:00:00  REGULAR   3144895  1088328               87
    7   A002  R051  02-00-00  05-02-11  04:00:00  REGULAR   3144905  1088331               10
    8   A002  R051  02-00-00  05-02-11  08:00:00  REGULAR   3144941  1088420               36
    9   A002  R051  02-00-00  05-02-11  12:00:00  REGULAR   3145094  1088753              153
    10  A002  R051  02-00-00  05-02-11  16:00:00  REGULAR   3145337  1088823              243
    ...
    ...
    '''
    # your code here
    # Load the turnstile readings from the CSV file.
    df = pandas.read_csv(filename)
    # Keep only REGULAR readings; copy() avoids pandas' SettingWithCopyWarning
    # when the new column is assigned below.
    df = df[df['DESCn'] == 'REGULAR'].copy()
    # Difference between each cumulative reading and the previous row's reading.
    df['ENTRIESn_hourly'] = df['ENTRIESn'] - df['ENTRIESn'].shift(1)
    # The first row has no previous reading, so replace its NaN with 1.
    df['ENTRIESn_hourly'] = df['ENTRIESn_hourly'].fillna(1)
    return df
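
To see what shift() and fillna() do without needing the CSV file, here is a minimal sketch on a small in-memory dataframe. The name `sample` is hypothetical, and the four ENTRIESn values are simply the first four readings from the docstring example above; this is an illustration of the technique, not part of the exercise solution.

```python
import pandas

# Hypothetical sample: the first four cumulative ENTRIESn readings
# taken from the docstring example.
sample = pandas.DataFrame({"ENTRIESn": [3144312, 3144335, 3144353, 3144424]})

# shift(1) moves every value down one row, so row i lines up with
# the reading from row i - 1. The first row gets NaN.
previous = sample["ENTRIESn"].shift(1)

# Subtracting gives the entries since the previous reading; the first
# row has no predecessor, so its NaN is replaced with 1.
sample["ENTRIESn_hourly"] = (sample["ENTRIESn"] - previous).fillna(1)

print(sample["ENTRIESn_hourly"].tolist())  # [1.0, 23.0, 18.0, 71.0]
```

The values come out as floats because the intermediate NaN forces the column to a floating-point dtype. If integers are preferred, appending `.astype(int)` after `fillna(1)` is one option.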
In [15]:
get_hourly_entries("output.csv")
Out[15]: