Fileset (1,5 GB): https://data.sfgov.org/api/views/nuek-vuh3/rows.csv?accessType=DOWNLOAD
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [ ]:
In [1]:
import pandas as pd
df = pd.read_csv(
r'C:\Users\Markus\Downloads\Fire_Department_Calls_for_Service.csv')
df.head()
C:\dev\apps\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:2717: DtypeWarning: Columns (19,20,25) have mixed types. Specify dtype option on import or set low_memory=False.
interactivity=interactivity, compiler=compiler, result=result)
Out[1]:
Call Number
Unit ID
Incident Number
Call Type
Call Date
Watch Date
Received DtTm
Entry DtTm
Dispatch DtTm
Response DtTm
...
ALS Unit
Call Type Group
Number of Alarms
Unit Type
Unit sequence in call dispatch
Fire Prevention District
Supervisor District
Neighborhooods - Analysis Boundaries
Location
RowID
0
1030101
E18
306091
Medical Incident
04/12/2000
04/12/2000
04/12/2000 09:00:29 PM
04/12/2000 09:01:40 PM
04/12/2000 09:02:00 PM
NaN
...
False
NaN
1
ENGINE
1.0
8
4
Sunset/Parkside
(37.7487247711275, -122.495504020186)
001030101-E18
1
1030104
M14
30612
Medical Incident
04/12/2000
04/12/2000
04/12/2000 09:09:02 PM
04/12/2000 09:10:17 PM
04/12/2000 09:10:29 PM
04/12/2000 09:12:11 PM
...
True
NaN
1
MEDIC
2.0
8
4
Sunset/Parkside
(37.7540326780595, -122.502185504543)
001030104-M14
2
1030106
M36
30614
Medical Incident
04/12/2000
04/12/2000
04/12/2000 09:09:44 PM
04/12/2000 09:10:56 PM
04/12/2000 09:11:47 PM
NaN
...
False
NaN
1
MEDIC
1.0
2
6
Tenderloin
(37.7764405100838, -122.418481123408)
001030106-M36
3
1030107
E01
30615
Alarms
04/12/2000
04/12/2000
04/12/2000 09:13:47 PM
04/12/2000 09:13:51 PM
04/12/2000 09:14:13 PM
04/12/2000 09:15:58 PM
...
False
NaN
1
ENGINE
3.0
3
6
Tenderloin
(37.7825474000421, -122.412247935495)
001030107-E01
4
1030108
RS1
30616
Medical Incident
04/12/2000
04/12/2000
04/12/2000 09:14:43 PM
04/12/2000 09:16:11 PM
04/12/2000 09:16:24 PM
04/12/2000 09:18:20 PM
...
False
NaN
1
RESCUE SQUAD
2.0
1
3
Financial District/South Beach
(37.7863072236365, -122.405294845215)
001030108-RS1
5 rows × 34 columns
In [8]:
df.columns
Out[8]:
Index(['Call Number', 'Unit ID', 'Incident Number', 'Call Type', 'Call Date',
'Watch Date', 'Received DtTm', 'Entry DtTm', 'Dispatch DtTm',
'Response DtTm', 'On Scene DtTm', 'Transport DtTm', 'Hospital DtTm',
'Call Final Disposition', 'Available DtTm', 'Address', 'City',
'Zipcode of Incident', 'Battalion', 'Station Area', 'Box',
'Original Priority', 'Priority', 'Final Priority', 'ALS Unit',
'Call Type Group', 'Number of Alarms', 'Unit Type',
'Unit sequence in call dispatch', 'Fire Prevention District',
'Supervisor District', 'Neighborhooods - Analysis Boundaries',
'Location', 'RowID'],
dtype='object')
In [10]:
len(df)
Out[10]:
4548659
There are over 4 million rows in the DataFrame and it takes ~15-20 seconds to do a full read of it, maybe longer if your request gets blocked on AWS.
Now run the count again and see how long it takes on the cached data - it should take less than a second.
In [22]:
len(df['Call Type'].unique())
Out[22]:
32
In [20]:
%matplotlib inline
df['Call Type'].value_counts().head(10).plot(kind='pie', figsize=(10,10))
Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x20059c17208>
In [21]:
df['Call Type'].value_counts().head(10).plot(kind='bar', figsize=(10,5))
Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x20059d5d518>
In [27]:
df.groupby('Call Type')["Unit ID"].count().sort_values(ascending=False)
Out[27]:
Call Type
Medical Incident 2945292
Structure Fire 604773
Alarms 486055
Traffic Collision 186090
Other 73402
Citizen Assist / Service Call 68879
Outside Fire 53059
Vehicle Fire 22291
Water Rescue 21689
Gas Leak (Natural and LP Gases) 16807
Electrical Hazard 12695
Odor (Strange / Unknown) 12277
Elevator / Escalator Rescue 11905
Smoke Investigation (Outside) 9982
Fuel Spill 5343
HazMat 3802
Industrial Accidents 2785
Explosion 2524
Aircraft Emergency 1511
Assist Police 1312
Train / Rail Incident 1148
High Angle Rescue 1147
Watercraft in Distress 884
Extrication / Entrapped (Machinery, Vehicle) 665
Oil Spill 516
Confined Space / Structure Collapse 467
Mutual Aid / Assist Outside Agency 423
Marine Fire 356
Suspicious Package 299
Administrative 262
Train / Rail Fire 10
Lightning Strike (Investigation) 9
Name: Unit ID, dtype: int64
In [28]:
df.head()
Out[28]:
Call Number
Unit ID
Incident Number
Call Type
Call Date
Watch Date
Received DtTm
Entry DtTm
Dispatch DtTm
Response DtTm
...
ALS Unit
Call Type Group
Number of Alarms
Unit Type
Unit sequence in call dispatch
Fire Prevention District
Supervisor District
Neighborhooods - Analysis Boundaries
Location
RowID
0
1030101
E18
306091
Medical Incident
04/12/2000
04/12/2000
04/12/2000 09:00:29 PM
04/12/2000 09:01:40 PM
04/12/2000 09:02:00 PM
NaN
...
False
NaN
1
ENGINE
1.0
8
4
Sunset/Parkside
(37.7487247711275, -122.495504020186)
001030101-E18
1
1030104
M14
30612
Medical Incident
04/12/2000
04/12/2000
04/12/2000 09:09:02 PM
04/12/2000 09:10:17 PM
04/12/2000 09:10:29 PM
04/12/2000 09:12:11 PM
...
True
NaN
1
MEDIC
2.0
8
4
Sunset/Parkside
(37.7540326780595, -122.502185504543)
001030104-M14
2
1030106
M36
30614
Medical Incident
04/12/2000
04/12/2000
04/12/2000 09:09:44 PM
04/12/2000 09:10:56 PM
04/12/2000 09:11:47 PM
NaN
...
False
NaN
1
MEDIC
1.0
2
6
Tenderloin
(37.7764405100838, -122.418481123408)
001030106-M36
3
1030107
E01
30615
Alarms
04/12/2000
04/12/2000
04/12/2000 09:13:47 PM
04/12/2000 09:13:51 PM
04/12/2000 09:14:13 PM
04/12/2000 09:15:58 PM
...
False
NaN
1
ENGINE
3.0
3
6
Tenderloin
(37.7825474000421, -122.412247935495)
001030107-E01
4
1030108
RS1
30616
Medical Incident
04/12/2000
04/12/2000
04/12/2000 09:14:43 PM
04/12/2000 09:16:11 PM
04/12/2000 09:16:24 PM
04/12/2000 09:18:20 PM
...
False
NaN
1
RESCUE SQUAD
2.0
1
3
Financial District/South Beach
(37.7863072236365, -122.405294845215)
001030108-RS1
5 rows × 34 columns
In [32]:
FORMAT = "%m/%d/%Y"
df["Call Date TS"] = pd.to_datetime(df['Call Date'], format=FORMAT)
df.head()
Out[32]:
Call Number
Unit ID
Incident Number
Call Type
Call Date
Watch Date
Received DtTm
Entry DtTm
Dispatch DtTm
Response DtTm
...
Call Type Group
Number of Alarms
Unit Type
Unit sequence in call dispatch
Fire Prevention District
Supervisor District
Neighborhooods - Analysis Boundaries
Location
RowID
Call Date TS
0
1030101
E18
306091
Medical Incident
04/12/2000
04/12/2000
04/12/2000 09:00:29 PM
04/12/2000 09:01:40 PM
04/12/2000 09:02:00 PM
NaN
...
NaN
1
ENGINE
1.0
8
4
Sunset/Parkside
(37.7487247711275, -122.495504020186)
001030101-E18
2000-04-12
1
1030104
M14
30612
Medical Incident
04/12/2000
04/12/2000
04/12/2000 09:09:02 PM
04/12/2000 09:10:17 PM
04/12/2000 09:10:29 PM
04/12/2000 09:12:11 PM
...
NaN
1
MEDIC
2.0
8
4
Sunset/Parkside
(37.7540326780595, -122.502185504543)
001030104-M14
2000-04-12
2
1030106
M36
30614
Medical Incident
04/12/2000
04/12/2000
04/12/2000 09:09:44 PM
04/12/2000 09:10:56 PM
04/12/2000 09:11:47 PM
NaN
...
NaN
1
MEDIC
1.0
2
6
Tenderloin
(37.7764405100838, -122.418481123408)
001030106-M36
2000-04-12
3
1030107
E01
30615
Alarms
04/12/2000
04/12/2000
04/12/2000 09:13:47 PM
04/12/2000 09:13:51 PM
04/12/2000 09:14:13 PM
04/12/2000 09:15:58 PM
...
NaN
1
ENGINE
3.0
3
6
Tenderloin
(37.7825474000421, -122.412247935495)
001030107-E01
2000-04-12
4
1030108
RS1
30616
Medical Incident
04/12/2000
04/12/2000
04/12/2000 09:14:43 PM
04/12/2000 09:16:11 PM
04/12/2000 09:16:24 PM
04/12/2000 09:18:20 PM
...
NaN
1
RESCUE SQUAD
2.0
1
3
Financial District/South Beach
(37.7863072236365, -122.405294845215)
001030108-RS1
2000-04-12
5 rows × 35 columns
In [33]:
df['Call Date TS'].dt.year.value_counts()
Out[33]:
2017 312471
2016 303967
2015 296244
2014 280905
2013 273097
2011 268689
2012 265596
2010 256174
2008 249690
2009 244633
2003 240457
2007 235856
2006 235440
2004 235353
2005 232919
2002 225951
2001 220328
2000 158154
2018 12735
Name: Call Date TS, dtype: int64
In [43]:
from datetime import datetime
july4 = datetime(2016, 7, 4, 0, 0, 0)
june27 = datetime(2016, 6, 27, 0, 0, 0)
subset = df[(df['Call Date TS'] >= june27) & (df['Call Date TS'] <= july4)]
subset.head()
Out[43]:
Call Number
Unit ID
Incident Number
Call Type
Call Date
Watch Date
Received DtTm
Entry DtTm
Dispatch DtTm
Response DtTm
...
Call Type Group
Number of Alarms
Unit Type
Unit sequence in call dispatch
Fire Prevention District
Supervisor District
Neighborhooods - Analysis Boundaries
Location
RowID
Call Date TS
4021645
161842586
E36
16072812
Alarms
07/02/2016
07/02/2016
07/02/2016 05:31:27 PM
07/02/2016 05:32:41 PM
07/02/2016 05:33:10 PM
NaN
...
Alarm
1
ENGINE
2.0
2
8
Mission
(37.76558714518, -122.424297645357)
161842586-E36
2016-07-02
4021674
161842586
T07
16072812
Alarms
07/02/2016
07/02/2016
07/02/2016 05:31:27 PM
07/02/2016 05:32:41 PM
07/02/2016 05:33:10 PM
NaN
...
Alarm
1
TRUCK
1.0
2
8
Mission
(37.76558714518, -122.424297645357)
161842586-T07
2016-07-02
4055628
161794046
E07
16070984
Structure Fire
06/27/2016
06/27/2016
06/27/2016 10:25:42 PM
06/27/2016 10:27:07 PM
06/27/2016 10:27:34 PM
06/27/2016 10:28:16 PM
...
Alarm
1
ENGINE
7.0
2
8
Mission
(37.7664900497694, -122.423775038029)
161794046-E07
2016-06-27
4055658
161794046
E36
16070984
Structure Fire
06/27/2016
06/27/2016
06/27/2016 10:25:42 PM
06/27/2016 10:27:07 PM
06/27/2016 10:27:34 PM
06/27/2016 10:28:27 PM
...
Alarm
1
ENGINE
10.0
2
8
Mission
(37.7664900497694, -122.423775038029)
161794046-E36
2016-06-27
4055762
161801114
T03
16071118
Vehicle Fire
06/28/2016
06/28/2016
06/28/2016 09:49:23 AM
06/28/2016 09:51:01 AM
06/28/2016 09:51:18 AM
06/28/2016 09:53:10 AM
...
Fire
1
TRUCK
2.0
4
3
Nob Hill
(37.7907373942464, -122.419872740428)
161801114-T03
2016-06-28
5 rows × 35 columns
In [54]:
subset.groupby(pd.Grouper(key='Call Date TS', freq="D")).count()
Out[54]:
Call Number
Unit ID
Incident Number
Call Type
Call Date
Watch Date
Received DtTm
Entry DtTm
Dispatch DtTm
Response DtTm
...
ALS Unit
Call Type Group
Number of Alarms
Unit Type
Unit sequence in call dispatch
Fire Prevention District
Supervisor District
Neighborhooods - Analysis Boundaries
Location
RowID
Call Date TS
2016-06-27
809
809
809
809
809
809
809
809
809
789
...
809
809
809
809
809
809
809
809
809
809
2016-06-28
753
753
753
753
753
753
753
753
753
733
...
753
753
753
753
753
753
753
753
753
753
2016-06-29
731
731
731
731
731
731
731
731
731
711
...
731
731
731
731
731
731
731
731
731
731
2016-06-30
797
797
797
797
797
797
797
797
797
772
...
797
797
797
797
797
797
797
797
797
797
2016-07-01
847
847
847
847
847
847
847
847
847
816
...
847
847
847
847
847
847
847
847
847
847
2016-07-02
729
729
729
729
729
729
729
729
729
714
...
729
729
729
729
729
729
729
729
729
729
2016-07-03
797
797
797
797
797
797
797
797
797
786
...
797
797
797
797
797
797
797
797
797
797
2016-07-04
958
958
958
958
958
958
958
958
958
939
...
958
958
958
958
958
958
958
958
958
958
8 rows × 34 columns
In [62]:
subset.groupby(subset['Call Date TS'].dt.weekday_name).count()
Out[62]:
Call Number
Unit ID
Incident Number
Call Type
Call Date
Watch Date
Received DtTm
Entry DtTm
Dispatch DtTm
Response DtTm
...
Call Type Group
Number of Alarms
Unit Type
Unit sequence in call dispatch
Fire Prevention District
Supervisor District
Neighborhooods - Analysis Boundaries
Location
RowID
Call Date TS
Call Date TS
Friday
847
847
847
847
847
847
847
847
847
816
...
847
847
847
847
847
847
847
847
847
847
Monday
1767
1767
1767
1767
1767
1767
1767
1767
1767
1728
...
1767
1767
1767
1767
1767
1767
1767
1767
1767
1767
Saturday
729
729
729
729
729
729
729
729
729
714
...
729
729
729
729
729
729
729
729
729
729
Sunday
797
797
797
797
797
797
797
797
797
786
...
797
797
797
797
797
797
797
797
797
797
Thursday
797
797
797
797
797
797
797
797
797
772
...
797
797
797
797
797
797
797
797
797
797
Tuesday
753
753
753
753
753
753
753
753
753
733
...
753
753
753
753
753
753
753
753
753
753
Wednesday
731
731
731
731
731
731
731
731
731
711
...
731
731
731
731
731
731
731
731
731
731
7 rows × 35 columns
In [ ]:
Content source: feststelltaste/software-analytics
Similar notebooks: