Exploratory Data Analysis (EDA) Round 2

The purpose of this notebook is to take another pass at the fire risk data, examine the whole dataset, and get a sense of how far back it goes. To do this, we use the `$offset` parameter in the query URL (now that we know how to use it properly).


In [1]:
from __future__ import division, print_function # always future proof for python3
import pandas as pd

In [2]:
# for simplicity we'll store the URL as a template string and insert a new offset into it on each request
query_url = 'https://data.sfgov.org/resource/wbb6-uh78.json?$order=close_dttm%20DESC&$offset={}&$limit=1000'

In [3]:
df = pd.read_json(query_url.format('0'))

In [4]:
# I'm curious: how many records, and how many pages of 1,000, does this dataset have?
# we could find out programmatically by paging until the results run out,
# and at a certain point we will need to do that, but for now
# we cheated and looked at the url https://data.sfgov.org/Public-Safety/Fire-Incidents/wr8u-xric
# which shows 403,988 rows of data. At 1,000 rows per page that is 404 pages (offsets 0 through 403000)
# that we'd need to page through if we wanted to create one big database.
# Certainly something we will do, but for now let's grab the very last page and see what we've got
df = pd.read_json(query_url.format('403000'))
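
A minimal sketch of working out the page offsets programmatically instead of by eye. The helper name `page_offsets` is hypothetical; the row count (403,988) and the page size (1,000) are the figures quoted in the comments above.

```python
import math


def page_offsets(total_rows, page_size=1000):
    """Return the starting offset of every page needed to cover total_rows."""
    n_pages = int(math.ceil(total_rows / float(page_size)))
    return [page * page_size for page in range(n_pages)]


offsets = page_offsets(403988)
print(len(offsets), offsets[-1])  # 404 403000
```

Each offset can then be passed to `query_url.format(...)` in the same way as the cell above.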

In [5]:
df.head()


Out[5]:
action_taken_other action_taken_primary action_taken_secondary address alarm_dttm area_of_fire_origin arrival_dttm automatic_extinguishing_system_present automatic_extinguishing_sytem_failure_reason automatic_extinguishing_sytem_perfomance ... other_units primary_situation property_use station_area structure_status structure_type supervisor_district suppression_personnel suppression_units zipcode
0 - 86 - investigate - 951 Eddy St. 2003-01-11T02:57:40.000 NaN 2003-01-11T03:01:06.000 NaN NaN NaN ... 0 733 - smoke detector activation/malfunction 429 - multifamily dwellings 05 NaN NaN 5 11 3 94102
1 - 86 - investigate - 30th St. / Dolores St. 2003-01-11T02:20:29.000 NaN 2003-01-11T02:25:32.000 NaN NaN NaN ... 0 311 - medical assist, assist ems crew 960 - street, other 11 NaN NaN 8 4 1 94110
2 - 86 - investigate - Market St. / Steuart St. 2003-01-11T02:08:45.000 NaN 2003-01-11T02:15:11.000 NaN NaN NaN ... 0 711 - municipal alarm system, street box false 960 - street, other 13 NaN NaN 3 4 1 94105
3 - 86 - investigate - 10th St. / Market St. 2003-01-11T01:12:53.000 NaN 2003-01-11T01:15:22.000 NaN NaN NaN ... 0 710 - malicious, mischievous false call, other 963 - street or road in commercial area 36 NaN NaN 6 9 2 94103
4 - 11 - extinguish - 373 Ellis St. 2003-01-10T23:33:30.000 NaN 2003-01-10T23:36:17.000 NaN NaN NaN ... 0 100 - fire, other 429 - multifamily dwellings 03 NaN NaN 6 34 10 94102

5 rows × 62 columns


In [6]:
# let's take a look at what cleanup work we need to do.
# Generally a field that is just labeled as an "object" is something we wish to clean up
# This code will eventually find its way into a pipeline and be moved to src
df.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 988 entries, 0 to 987
Data columns (total 62 columns):
action_taken_other                              988 non-null object
action_taken_primary                            988 non-null object
action_taken_secondary                          988 non-null object
address                                         988 non-null object
alarm_dttm                                      988 non-null object
area_of_fire_origin                             60 non-null object
arrival_dttm                                    988 non-null object
automatic_extinguishing_system_present          36 non-null object
automatic_extinguishing_sytem_failure_reason    36 non-null object
automatic_extinguishing_sytem_perfomance        36 non-null object
automatic_extinguishing_sytem_type              36 non-null object
battalion                                       988 non-null object
box                                             20 non-null float64
call_number                                     988 non-null int64
city                                            988 non-null object
civilian_fatalities                             988 non-null int64
civilian_injuries                               988 non-null int64
close_dttm                                      988 non-null object
detector_alerted_occupants                      988 non-null object
detector_effectiveness                          36 non-null object
detector_failure_reason                         36 non-null object
detector_operation                              36 non-null object
detector_type                                   36 non-null object
detectors_present                               36 non-null object
ems_personnel                                   988 non-null int64
ems_units                                       988 non-null int64
estimated_contents_loss                         988 non-null int64
estimated_property_loss                         988 non-null int64
exposure_number                                 988 non-null int64
fire_fatalities                                 988 non-null int64
fire_injuries                                   988 non-null int64
fire_spread                                     36 non-null object
first_unit_on_scene                             867 non-null object
floor_of_fire_origin                            16 non-null float64
heat_source                                     60 non-null object
human_factors_associated_with_ignition          60 non-null object
ignition_cause                                  60 non-null object
ignition_factor_primary                         60 non-null object
ignition_factor_secondary                       60 non-null object
incident_date                                   988 non-null object
incident_number                                 988 non-null int64
item_first_ignited                              60 non-null object
location                                        775 non-null object
mutual_aid                                      988 non-null object
neighborhood_district                           774 non-null object
no_flame_spead                                  13 non-null float64
number_of_floors_with_extreme_damage            16 non-null float64
number_of_floors_with_heavy_damage              16 non-null float64
number_of_floors_with_minimum_damage            16 non-null float64
number_of_floors_with_significant_damage        16 non-null float64
number_of_sprinkler_heads_operating             16 non-null float64
other_personnel                                 988 non-null int64
other_units                                     988 non-null int64
primary_situation                               988 non-null object
property_use                                    988 non-null object
station_area                                    988 non-null object
structure_status                                36 non-null object
structure_type                                  36 non-null object
supervisor_district                             774 non-null float64
suppression_personnel                           988 non-null int64
suppression_units                               988 non-null int64
zipcode                                         774 non-null float64
dtypes: float64(10), int64(15), object(37)
memory usage: 486.3+ KB
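
Several of the `object` columns listed above (`alarm_dttm`, `arrival_dttm`, `close_dttm`, `incident_date`) hold ISO-style timestamp strings. The following is a minimal sketch of one possible cleanup step, not the eventual pipeline code. The helper name `parse_timestamp_columns` and the rule of matching columns by suffix are assumptions made for illustration, and the sample rows are copied from the `head()` output above.

```python
import pandas as pd


def parse_timestamp_columns(frame, suffixes=("_dttm", "_date")):
    """Return a copy with every column whose name ends in one of `suffixes` parsed as datetimes."""
    out = frame.copy()
    for name in out.columns:
        if str(name).endswith(suffixes):
            # errors="coerce" turns unparseable strings into NaT instead of raising
            out[name] = pd.to_datetime(out[name], errors="coerce")
    return out


sample = pd.DataFrame({
    "alarm_dttm": ["2003-01-11T02:57:40.000", "2003-01-11T02:20:29.000"],
    "address": ["951 Eddy St.", "30th St. / Dolores St."],
})
print(parse_timestamp_columns(sample).dtypes)
```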

In [7]:
# As the dtypes summary near the bottom of the output above shows, there are 10 float64 columns, 15 int64 columns, and 37 generic object columns.
# We will want to fix that.
# first, what are the date values and all?

# df.describe(include='all')
# Uh-oh! The describe() function above failed with a type error,
# stating that there's an unhashable type of `dict` in our data. We'll need to find that and fix it
# before we can do a proper analysis

# let's look at a single record
df.iloc[0]


Out[7]:
action_taken_other                                                                              -
action_taken_primary                                                             86 - investigate
action_taken_secondary                                                                          -
address                                                                              951 Eddy St.
alarm_dttm                                                                2003-01-11T02:57:40.000
area_of_fire_origin                                                                           NaN
arrival_dttm                                                              2003-01-11T03:01:06.000
automatic_extinguishing_system_present                                                        NaN
automatic_extinguishing_sytem_failure_reason                                                  NaN
automatic_extinguishing_sytem_perfomance                                                      NaN
automatic_extinguishing_sytem_type                                                            NaN
battalion                                                                                     B02
box                                                                                           NaN
call_number                                                                              30110032
city                                                                                           SF
civilian_fatalities                                                                             0
civilian_injuries                                                                               0
close_dttm                                                                2003-01-11T03:08:24.000
detector_alerted_occupants                                                                      -
detector_effectiveness                                                                        NaN
detector_failure_reason                                                                       NaN
detector_operation                                                                            NaN
detector_type                                                                                 NaN
detectors_present                                                                             NaN
ems_personnel                                                                                   0
ems_units                                                                                       0
estimated_contents_loss                                                                         0
estimated_property_loss                                                                         0
exposure_number                                                                                 0
fire_fatalities                                                                                 0
                                                                      ...                        
first_unit_on_scene                                                                           T05
floor_of_fire_origin                                                                          NaN
heat_source                                                                                   NaN
human_factors_associated_with_ignition                                                        NaN
ignition_cause                                                                                NaN
ignition_factor_primary                                                                       NaN
ignition_factor_secondary                                                                     NaN
incident_date                                                             2003-01-11T00:00:00.000
incident_number                                                                           3003103
item_first_ignited                                                                            NaN
location                                        {u'type': u'Point', u'coordinates': [-122.4232...
mutual_aid                                                                                   none
neighborhood_district                                                            Western Addition
no_flame_spead                                                                                NaN
number_of_floors_with_extreme_damage                                                          NaN
number_of_floors_with_heavy_damage                                                            NaN
number_of_floors_with_minimum_damage                                                          NaN
number_of_floors_with_significant_damage                                                      NaN
number_of_sprinkler_heads_operating                                                           NaN
other_personnel                                                                                 0
other_units                                                                                     0
primary_situation                                     733 - smoke detector activation/malfunction
property_use                                                          429 - multifamily dwellings
station_area                                                                                   05
structure_status                                                                              NaN
structure_type                                                                                NaN
supervisor_district                                                                             5
suppression_personnel                                                                          11
suppression_units                                                                               3
zipcode                                                                                     94102
Name: 0, dtype: object
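
Rather than scanning a single record by eye, the offending columns can also be located directly. This is a minimal sketch; the helper name `unhashable_columns` is hypothetical, and it only checks for the container types (`dict`, `list`, `set`) that commonly come out of parsed JSON.

```python
import pandas as pd


def unhashable_columns(frame):
    """Return the names of columns that contain at least one dict, list, or set value."""
    bad = []
    for name in frame.columns:
        has_container = frame[name].apply(lambda v: isinstance(v, (dict, list, set)))
        if has_container.any():
            bad.append(name)
    return bad


sample = pd.DataFrame({
    "address": ["951 Eddy St."],
    "location": [{"type": "Point", "coordinates": [-122.42329, 37.78251]}],
    "zipcode": [94102],
})
print(unhashable_columns(sample))
```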

In [8]:
# even though there are so many columns that some are hidden, we can see above that location is a dict field.
# so we should either consider removing it, or doing an eval to get it into the database
df.iloc[0]['location']


Out[8]:
{u'coordinates': [-122.42329, 37.78251], u'type': u'Point'}

In [9]:
# pretty simple: it's just a geometry type and a coordinate pair. I'm not sure we need the type "Point", but let's keep it for now
# just to see what we get
# doing a simple check on http://www.latlong.net/
# confirmed the general lat long for SF is 37.774929, -122.419416 respectively,
# so the pair is stored with longitude first and latitude second (the order GeoJSON uses).
# let's convert the location into separate columns (using ast/eval if the values turn out to be strings)

# we first should confirm that we are dealing with strings or dicts
type(df.iloc[0]['location'])


Out[9]:
dict

In [10]:
# the values are already dicts, so no ast/eval is needed
# to_dict() gives {row index: location dict}; building a frame from that and transposing yields one column per key
temp_df = df.join(pd.DataFrame(df["location"].to_dict()).T)
# TODO: this is fine as an example, but later consider a faster solution
temp_df.head()


Out[10]:
action_taken_other action_taken_primary action_taken_secondary address alarm_dttm area_of_fire_origin arrival_dttm automatic_extinguishing_system_present automatic_extinguishing_sytem_failure_reason automatic_extinguishing_sytem_perfomance ... property_use station_area structure_status structure_type supervisor_district suppression_personnel suppression_units zipcode coordinates type
0 - 86 - investigate - 951 Eddy St. 2003-01-11T02:57:40.000 NaN 2003-01-11T03:01:06.000 NaN NaN NaN ... 429 - multifamily dwellings 05 NaN NaN 5 11 3 94102 [-122.42329, 37.78251] Point
1 - 86 - investigate - 30th St. / Dolores St. 2003-01-11T02:20:29.000 NaN 2003-01-11T02:25:32.000 NaN NaN NaN ... 960 - street, other 11 NaN NaN 8 4 1 94110 [-122.424233, 37.742237] Point
2 - 86 - investigate - Market St. / Steuart St. 2003-01-11T02:08:45.000 NaN 2003-01-11T02:15:11.000 NaN NaN NaN ... 960 - street, other 13 NaN NaN 3 4 1 94105 [-122.394751, 37.79448] Point
3 - 86 - investigate - 10th St. / Market St. 2003-01-11T01:12:53.000 NaN 2003-01-11T01:15:22.000 NaN NaN NaN ... 963 - street or road in commercial area 36 NaN NaN 6 9 2 94103 [-122.417505, 37.77654] Point
4 - 11 - extinguish - 373 Ellis St. 2003-01-10T23:33:30.000 NaN 2003-01-10T23:36:17.000 NaN NaN NaN ... 429 - multifamily dwellings 03 NaN NaN 6 34 10 94102 [-122.412295, 37.784858] Point

5 rows × 64 columns
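
An alternative way to unpack the `location` column, shown as a minimal sketch. It pulls each key out with `apply` and treats any non-dict value (such as the NaN rows) as missing, which avoids building a frame from a mix of dicts and scalars. The helper name `expand_location` is hypothetical, and the sample value is the record shown in Out[8].

```python
import pandas as pd


def expand_location(frame, column="location"):
    """Return a copy of `frame` with `coordinates` and `type` columns taken from the dicts in `column`."""
    def part(value, key):
        # rows without a location hold NaN/None rather than a dict
        return value.get(key) if isinstance(value, dict) else None

    out = frame.copy()
    out["coordinates"] = out[column].apply(lambda v: part(v, "coordinates"))
    out["type"] = out[column].apply(lambda v: part(v, "type"))
    return out


sample = pd.DataFrame({
    "location": [{"type": "Point", "coordinates": [-122.42329, 37.78251]}, None],
})
print(expand_location(sample)[["coordinates", "type"]])
```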


In [11]:
temp_df.iloc[0]['coordinates']


Out[11]:
[-122.42329, 37.78251]

In [12]:
# let's quickly check if the `type` is necessary, or if it only contains point and we should just delete it
temp_df.type.value_counts(dropna=False)


Out[12]:
Point    775
NaN      213
Name: type, dtype: int64

In [13]:
# let's check whether the rows with null values look important
temp_df[temp_df.type.isnull()].tail()


Out[13]:
action_taken_other action_taken_primary action_taken_secondary address alarm_dttm area_of_fire_origin arrival_dttm automatic_extinguishing_system_present automatic_extinguishing_sytem_failure_reason automatic_extinguishing_sytem_perfomance ... property_use station_area structure_status structure_type supervisor_district suppression_personnel suppression_units zipcode coordinates type
942 - 22 - rescue, remove from harm - 573 26th Av. 2003-01-01T03:36:46.000 NaN 2003-01-01T03:40:54.000 NaN NaN NaN ... 000 - property use, other 14 NaN NaN NaN 4 1 NaN NaN NaN
949 - 86 - investigate - Buchanan St. / Moulton St. 2003-01-01T03:10:40.000 80 - vehicle area, other 2003-01-01T03:12:12.000 NaN NaN NaN ... 962 - residential street, road or residential dr 16 NaN NaN NaN 4 1 NaN NaN NaN
958 - 33 - provide advanced life support (als) - Coleridge St. / Eugenia Av. 2003-01-01T01:30:02.000 NaN 2003-01-01T01:37:10.000 NaN NaN NaN ... 962 - residential street, road or residential dr 32 NaN NaN NaN 6 2 NaN NaN NaN
971 - 10 - fire, other - 2 Parker Av. 2003-01-01T00:43:53.000 - 2003-01-01T00:47:00.000 - - - ... 960 - street, other 10 - - NaN 4 1 NaN NaN NaN
985 - 86 - investigate - 33rd Av. / Noriega St. 2003-01-01T00:18:19.000 - 2003-01-01T00:21:15.000 - - - ... 962 - residential street, road or residential dr 18 - - NaN 4 1 NaN NaN NaN

5 rows × 64 columns


In [14]:
# Can we assume that if type is null, so is coordinates, in which case location may also be null?
temp_df[temp_df.type.isnull()].coordinates.value_counts(dropna=False)


Out[14]:
NaN    213
Name: coordinates, dtype: int64
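
The same question can be asked in both directions at once. A minimal sketch, with the hypothetical helper `null_together`, that reports whether two columns are missing on exactly the same rows:

```python
import pandas as pd


def null_together(frame, first, second):
    """Return True when `first` and `second` are null on exactly the same rows."""
    return bool((frame[first].isnull() == frame[second].isnull()).all())


sample = pd.DataFrame({
    "type": ["Point", None, "Point"],
    "coordinates": [[-122.42329, 37.78251], None, [-122.424233, 37.742237]],
})
print(null_together(sample, "type", "coordinates"))  # True
```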

In [15]:
# OK, so the above shows that type carries no extra information, so we can drop it along with the original location column
# we'll also overwrite the original dataframe and delete temp_df
df = temp_df.drop(['type','location'],axis=1)

In [16]:
del temp_df

In [17]:
# TODO: when we circle back and do our next notebook and import, we should consider using int32, float32 to save memory
# TODO: use a publicly available database to fill in zipcode and lat lon with a geo lookup
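
A minimal sketch of the memory-saving idea in the first TODO, using `pd.to_numeric` with its `downcast` option. The helper name `downcast_numeric` is hypothetical. Note that `downcast="integer"` picks the smallest integer type that fits (which may be smaller than int32), and that float32 keeps only about seven significant digits, so it may not be appropriate for latitude and longitude values.

```python
import pandas as pd


def downcast_numeric(frame):
    """Return a copy with int64/float64 columns stored in the smallest dtype that holds their values."""
    out = frame.copy()
    for name in out.select_dtypes(include=["int64"]).columns:
        out[name] = pd.to_numeric(out[name], downcast="integer")
    for name in out.select_dtypes(include=["float64"]).columns:
        out[name] = pd.to_numeric(out[name], downcast="float")
    return out


sample = pd.DataFrame({
    "suppression_units": [3, 1, 1],
    "supervisor_district": [5.0, 8.0, 3.0],
})
print(downcast_numeric(sample).dtypes)
```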

In [18]:
# now that we've taken a look, remember that these are the OLDEST records in the dataset, so it may have been overzealous to delete columns
# or decide that type wasn't useful.
# let's quickly check the most recent 1,000 rows and see what data we get there.
temp_df = pd.read_json(query_url.format('0'))

In [19]:
temp_df = temp_df.join(pd.DataFrame(temp_df["location"].to_dict()).T)

In [20]:
temp_df.iloc[-1]


Out[20]:
action_taken_other                                                                        NaN
action_taken_primary                                                           86 investigate
action_taken_secondary                                                                    NaN
address                                                                  7th St/folsom Street
alarm_dttm                                                            2016-06-24T16:21:32.000
area_of_fire_origin                                                                       NaN
arrival_dttm                                                          2016-06-24T16:27:13.000
automatic_extinguishing_system_present                                                    NaN
automatic_extinguishing_sytem_perfomance                                                  NaN
automatic_extinguishing_sytem_type                                                        NaN
battalion                                                                                 B03
box                                                                                      2314
call_number                                                                         161762630
city                                                                            San Francisco
civilian_fatalities                                                                         0
civilian_injuries                                                                           0
close_dttm                                                            2016-06-24T16:28:44.000
detector_alerted_occupants                                                                NaN
detector_effectiveness                                                                    NaN
detector_operation                                                                        NaN
detector_type                                                                             NaN
detectors_present                                                                         NaN
ems_personnel                                                                               0
ems_units                                                                                   0
estimated_contents_loss                                                                   NaN
estimated_property_loss                                                                   NaN
exposure_number                                                                             0
fire_fatalities                                                                             0
fire_injuries                                                                               0
floor_of_fire_origin                                                                      NaN
heat_source                                                                               NaN
human_factors_associated_with_ignition                                                    NaN
ignition_cause                                                                            NaN
ignition_factor_primary                                                                   NaN
incident_date                                                         2016-06-24T00:00:00.000
incident_number                                                                      16069482
item_first_ignited                                                                        NaN
location                                    {u'type': u'Point', u'coordinates': [-122.4078...
mutual_aid                                                                             n none
neighborhood_district                                                         South of Market
no_flame_spead                                                                             na
number_of_sprinkler_heads_operating                                                       NaN
other_personnel                                                                             0
other_units                                                                                 0
primary_situation                                        700 false alarm or false call, other
property_use                                962 residential street, road or residential dr...
station_area                                                                                1
structure_status                                                                          NaN
structure_type                                                                            NaN
supervisor_district                                                                         6
suppression_personnel                                                                       9
suppression_units                                                                           2
zipcode                                                                                 94103
coordinates                                                          [-122.407844, 37.776746]
type                                                                                    Point
Name: 999, dtype: object

In [21]:
temp_df.type.value_counts(dropna=False)


Out[21]:
Point    999
NaN        1
Name: type, dtype: int64

In [22]:
# it appears that type is always Point or null,
# so for now let's drop it
# one of the challenges with this dataset is that there are a LOT of columns
# we should determine which ones to trim down.
# since we now have the oldest 988 rows in `df` and the most recent 1,000 rows in `temp_df`, let's concat the two into a single dataframe
temp_df = temp_df.drop(['type','location'],axis=1)

In [23]:
df = pd.concat([temp_df, df])

In [24]:
df.head()


Out[24]:
action_taken_other action_taken_primary action_taken_secondary address alarm_dttm area_of_fire_origin arrival_dttm automatic_extinguishing_system_present automatic_extinguishing_sytem_failure_reason automatic_extinguishing_sytem_perfomance ... other_units primary_situation property_use station_area structure_status structure_type supervisor_district suppression_personnel suppression_units zipcode
0 NaN 86 investigate NaN 105 Aptos Avenue 2016-07-10T21:50:58.000 NaN 2016-07-10T21:54:42.000 NaN NaN NaN ... 0 733 smoke detector activation due to malfunction 215 high school/junior high school/middle school 15 NaN NaN 7 11 3 94127
1 NaN 86 investigate NaN 8th St/bryant Street 2016-07-10T19:49:15.000 NaN 2016-07-10T19:52:06.000 NaN NaN NaN ... 0 711 municipal alarm system, malicious false alarm nnn none 29 NaN NaN 6 9 2 94103
2 NaN 61 restore municipal services NaN 2nd St/brannan Street 2016-07-10T19:44:29.000 NaN 2016-07-10T19:47:35.000 NaN NaN NaN ... 0 711 municipal alarm system, malicious false alarm 960 street, other 8 NaN NaN 6 9 2 94107
3 NaN 86 investigate NaN 8400 Oceanview Trail 2016-07-10T19:24:41.000 NaN 2016-07-10T19:28:56.000 NaN NaN NaN ... 0 733 smoke detector activation due to malfunction 429 multifamily dwelling 33 NaN NaN 7 11 3 94132
4 NaN 86 investigate NaN 636 Velasco Av A 2016-07-10T18:56:31.000 NaN 2016-07-10T19:00:13.000 NaN NaN NaN ... 1 531 smoke or odor removal 419 1 or 2 family dwelling 43 NaN NaN 10 29 8 94134

5 rows × 62 columns


In [25]:
# NOW let's try describe()
# it would still fail, because the coordinates column we created holds lists, which are unhashable just like dicts
# let's fix that by splitting it into two numeric columns
# remember, the first value is the longitude and the second value is the latitude
# skip the null values
mask = df.coordinates.notnull()
df.loc[mask, 'long'] = df.loc[mask, 'coordinates'].apply(lambda x: x[0])

In [26]:
df.loc[mask, 'lat'] = df.loc[mask, 'coordinates'].apply(lambda x: x[1])
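
The concatenated frame keeps the row labels of both pages, so labels such as 0 appear twice, and label-aligned assignment can misbehave on repeated labels in some pandas versions. The following is a minimal sketch of the same longitude/latitude split that first resets the index; the helper name `split_coordinates` is hypothetical and this is not the notebook's own code.

```python
import pandas as pd


def split_coordinates(frame, column="coordinates"):
    """Return a copy with `long` and `lat` columns in place of the [longitude, latitude] list column."""
    out = frame.reset_index(drop=True)  # give every row a unique label

    def item(value, position):
        # rows without coordinates hold None/NaN rather than a list
        return value[position] if isinstance(value, (list, tuple)) else float("nan")

    out["long"] = out[column].apply(lambda v: item(v, 0))
    out["lat"] = out[column].apply(lambda v: item(v, 1))
    return out.drop(columns=[column])


sample = pd.DataFrame(
    {"coordinates": [[-122.407844, 37.776746], None, [-122.413562, 37.797038]]},
    index=[0, 0, 1],  # repeated labels, as after pd.concat without ignore_index
)
print(split_coordinates(sample))
```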

In [27]:
df.iloc[-1]


Out[27]:
action_taken_other                                                                             -
action_taken_primary                                                            86 - investigate
action_taken_secondary                                                                         -
address                                                                Broadway St. / Taylor St.
alarm_dttm                                                               2003-01-01T00:02:13.000
area_of_fire_origin                                                                            -
arrival_dttm                                                             2003-01-01T00:06:13.000
automatic_extinguishing_system_present                                                         -
automatic_extinguishing_sytem_failure_reason                                                   -
automatic_extinguishing_sytem_perfomance                                                       -
automatic_extinguishing_sytem_type                                                             -
battalion                                                                                    B01
box                                                                                         1442
call_number                                                                             30010002
city                                                                                          SF
civilian_fatalities                                                                            0
civilian_injuries                                                                              0
close_dttm                                                               2003-01-01T00:06:37.000
coordinates                                                             [-122.413562, 37.797038]
detector_alerted_occupants                                                                     -
detector_effectiveness                                                                         -
detector_failure_reason                                                                        -
detector_operation                                                                             -
detector_type                                                                                  -
detectors_present                                                                              -
ems_personnel                                                                                  0
ems_units                                                                                      0
estimated_contents_loss                                                                        0
estimated_property_loss                                                                        0
exposure_number                                                                                0
                                                                      ...                       
floor_of_fire_origin                                                                         NaN
heat_source                                                                                    -
human_factors_associated_with_ignition                                                         -
ignition_cause                                                                                 -
ignition_factor_primary                                                                        -
ignition_factor_secondary                                                                      -
incident_date                                                            2003-01-01T00:00:00.000
incident_number                                                                          3000003
item_first_ignited                                                                             -
mutual_aid                                                                                  none
neighborhood_district                                                                   Nob Hill
no_flame_spead                                                                               NaN
number_of_floors_with_extreme_damage                                                         NaN
number_of_floors_with_heavy_damage                                                           NaN
number_of_floors_with_minimum_damage                                                         NaN
number_of_floors_with_significant_damage                                                     NaN
number_of_sprinkler_heads_operating                                                          NaN
other_personnel                                                                                0
other_units                                                                                    0
primary_situation                                 711 - municipal alarm system, street box false
property_use                                    962 - residential street, road or residential dr
station_area                                                                                  02
structure_status                                                                               -
structure_type                                                                                 -
supervisor_district                                                                            3
suppression_personnel                                                                          4
suppression_units                                                                              1
zipcode                                                                                    94133
long                                                                                    -122.414
lat                                                                                       37.797
Name: 987, dtype: object

In [28]:
# now we can delete the coordinates column
df = df.drop(['coordinates'], axis=1)

In [29]:
# again, let's try to describe the data
df.describe(include='all')


Out[29]:
action_taken_other action_taken_primary action_taken_secondary address alarm_dttm area_of_fire_origin arrival_dttm automatic_extinguishing_system_present automatic_extinguishing_sytem_failure_reason automatic_extinguishing_sytem_perfomance ... property_use station_area structure_status structure_type supervisor_district suppression_personnel suppression_units zipcode long lat
count 996 1988 1095 1985 1988 108 1988 50 36 41 ... 1985 1988 50 53 1760.000000 1988.000000 1988.000000 1760.000000 1774.000000 1774.000000
unique 7 75 24 1689 1988 36 1986 6 1 4 ... 131 87 5 7 NaN NaN NaN NaN NaN NaN
top - 86 - investigate - 1 Sf Intl Airport 2016-06-27T09:47:26.000 - 2003-01-03T11:40:20.000 - - - ... 429 multifamily dwelling 1 - - NaN NaN NaN NaN NaN NaN
freq 987 698 963 10 1 19 2 23 36 35 ... 218 108 22 20 NaN NaN NaN NaN NaN NaN
mean NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN 5.793182 8.836519 2.391851 94113.381250 -122.424762 37.770250
std NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN 2.740296 6.354400 2.132943 10.204425 0.026985 0.026594
min NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN 1.000000 0.000000 0.000000 94102.000000 -122.513984 37.616901
25% NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN 3.000000 4.000000 1.000000 94105.000000 -122.435977 37.759051
50% NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN 6.000000 9.000000 2.000000 94110.000000 -122.419111 37.778016
75% NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN 8.000000 11.000000 3.000000 94121.000000 -122.407486 37.786679
max NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN 11.000000 50.000000 38.000000 94158.000000 -122.337061 37.866731

11 rows × 63 columns


In [30]:
# yay! it worked! we are getting closer to clean data, but we are still a long way from it.
# one important thing to check is close_dttm, since that is the column we used to order the data,
# but it appears our dates are stored as plain strings (object dtype) rather than datetimes
df.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 1988 entries, 0 to 987
Data columns (total 63 columns):
action_taken_other                              996 non-null object
action_taken_primary                            1988 non-null object
action_taken_secondary                          1095 non-null object
address                                         1985 non-null object
alarm_dttm                                      1988 non-null object
area_of_fire_origin                             108 non-null object
arrival_dttm                                    1988 non-null object
automatic_extinguishing_system_present          50 non-null object
automatic_extinguishing_sytem_failure_reason    36 non-null object
automatic_extinguishing_sytem_perfomance        41 non-null object
automatic_extinguishing_sytem_type              41 non-null object
battalion                                       1988 non-null object
box                                             1020 non-null object
call_number                                     1988 non-null int64
city                                            1975 non-null object
civilian_fatalities                             1988 non-null int64
civilian_injuries                               1988 non-null int64
close_dttm                                      1988 non-null object
detector_alerted_occupants                      1032 non-null object
detector_effectiveness                          37 non-null object
detector_failure_reason                         36 non-null object
detector_operation                              40 non-null object
detector_type                                   40 non-null object
detectors_present                               50 non-null object
ems_personnel                                   1988 non-null int64
ems_units                                       1988 non-null int64
estimated_contents_loss                         1018 non-null float64
estimated_property_loss                         1026 non-null float64
exposure_number                                 1988 non-null int64
fire_fatalities                                 1988 non-null int64
fire_injuries                                   1988 non-null int64
fire_spread                                     36 non-null object
first_unit_on_scene                             867 non-null object
floor_of_fire_origin                            30 non-null float64
heat_source                                     108 non-null object
human_factors_associated_with_ignition          108 non-null object
ignition_cause                                  108 non-null object
ignition_factor_primary                         108 non-null object
ignition_factor_secondary                       60 non-null object
incident_date                                   1988 non-null object
incident_number                                 1988 non-null int64
item_first_ignited                              108 non-null object
mutual_aid                                      1988 non-null object
neighborhood_district                           1760 non-null object
no_flame_spead                                  1013 non-null object
number_of_floors_with_extreme_damage            16 non-null float64
number_of_floors_with_heavy_damage              16 non-null float64
number_of_floors_with_minimum_damage            16 non-null float64
number_of_floors_with_significant_damage        16 non-null float64
number_of_sprinkler_heads_operating             17 non-null float64
other_personnel                                 1988 non-null int64
other_units                                     1988 non-null int64
primary_situation                               1988 non-null object
property_use                                    1985 non-null object
station_area                                    1988 non-null object
structure_status                                50 non-null object
structure_type                                  53 non-null object
supervisor_district                             1760 non-null float64
suppression_personnel                           1988 non-null int64
suppression_units                               1988 non-null int64
zipcode                                         1760 non-null float64
long                                            1774 non-null float64
lat                                             1774 non-null float64
dtypes: float64(12), int64(13), object(38)
memory usage: 994.0+ KB

In [31]:
# fortunately the datetime columns all have 'dttm' in their names,
# so let's use that to find them and then convert
for col in df.columns:
    if 'dttm' in col:
        print(col)


alarm_dttm
arrival_dttm
close_dttm

In [32]:
# so there are three... hmm, I thought there were more? Let's start by converting these and go from there
# first, a quick look at the docstring for pd.to_datetime (IPython help syntax)
pd.to_datetime?

In [33]:
df['alarm_dttm'] = pd.to_datetime(df['alarm_dttm'])

In [34]:
df['arrival_dttm'] = pd.to_datetime(df['arrival_dttm'])
df['close_dttm'] = pd.to_datetime(df['close_dttm'])
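
The three conversions above could also be written as a single loop over every column whose name contains 'dttm'. The sketch below runs on a tiny stand-in frame called sample (the values are illustrative, not pulled from the real df), and errors='coerce' is an added assumption so that an unparseable string becomes NaT instead of raising.

```python
import pandas as pd

# stand-in for df: two 'dttm' columns plus one unrelated column (illustrative values)
sample = pd.DataFrame({
    'alarm_dttm': ['2003-01-11T02:57:40.000', '2003-01-11T02:20:29.000'],
    'close_dttm': ['2003-01-11T03:10:00.000', 'not a date'],
    'battalion': ['B02', 'B03'],
})

# same filter as the loop above: every column with 'dttm' in its name
dttm_cols = [col for col in sample.columns if 'dttm' in col]

# convert each one; errors='coerce' turns unparseable strings into NaT
for col in dttm_cols:
    sample[col] = pd.to_datetime(sample[col], errors='coerce')

print(sample.dtypes)
```

Counting the NaT values afterwards (sample['close_dttm'].isnull().sum()) is a quick way to see how many strings failed to parse.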

In [35]:
df.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 1988 entries, 0 to 987
Data columns (total 63 columns):
action_taken_other                              996 non-null object
action_taken_primary                            1988 non-null object
action_taken_secondary                          1095 non-null object
address                                         1985 non-null object
alarm_dttm                                      1988 non-null datetime64[ns]
area_of_fire_origin                             108 non-null object
arrival_dttm                                    1988 non-null datetime64[ns]
automatic_extinguishing_system_present          50 non-null object
automatic_extinguishing_sytem_failure_reason    36 non-null object
automatic_extinguishing_sytem_perfomance        41 non-null object
automatic_extinguishing_sytem_type              41 non-null object
battalion                                       1988 non-null object
box                                             1020 non-null object
call_number                                     1988 non-null int64
city                                            1975 non-null object
civilian_fatalities                             1988 non-null int64
civilian_injuries                               1988 non-null int64
close_dttm                                      1988 non-null datetime64[ns]
detector_alerted_occupants                      1032 non-null object
detector_effectiveness                          37 non-null object
detector_failure_reason                         36 non-null object
detector_operation                              40 non-null object
detector_type                                   40 non-null object
detectors_present                               50 non-null object
ems_personnel                                   1988 non-null int64
ems_units                                       1988 non-null int64
estimated_contents_loss                         1018 non-null float64
estimated_property_loss                         1026 non-null float64
exposure_number                                 1988 non-null int64
fire_fatalities                                 1988 non-null int64
fire_injuries                                   1988 non-null int64
fire_spread                                     36 non-null object
first_unit_on_scene                             867 non-null object
floor_of_fire_origin                            30 non-null float64
heat_source                                     108 non-null object
human_factors_associated_with_ignition          108 non-null object
ignition_cause                                  108 non-null object
ignition_factor_primary                         108 non-null object
ignition_factor_secondary                       60 non-null object
incident_date                                   1988 non-null object
incident_number                                 1988 non-null int64
item_first_ignited                              108 non-null object
mutual_aid                                      1988 non-null object
neighborhood_district                           1760 non-null object
no_flame_spead                                  1013 non-null object
number_of_floors_with_extreme_damage            16 non-null float64
number_of_floors_with_heavy_damage              16 non-null float64
number_of_floors_with_minimum_damage            16 non-null float64
number_of_floors_with_significant_damage        16 non-null float64
number_of_sprinkler_heads_operating             17 non-null float64
other_personnel                                 1988 non-null int64
other_units                                     1988 non-null int64
primary_situation                               1988 non-null object
property_use                                    1985 non-null object
station_area                                    1988 non-null object
structure_status                                50 non-null object
structure_type                                  53 non-null object
supervisor_district                             1760 non-null float64
suppression_personnel                           1988 non-null int64
suppression_units                               1988 non-null int64
zipcode                                         1760 non-null float64
long                                            1774 non-null float64
lat                                             1774 non-null float64
dtypes: datetime64[ns](3), float64(12), int64(13), object(35)
memory usage: 994.0+ KB
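
One thing the output above shows is that incident_date is still an object column: its name has no 'dttm' in it, so the filter skipped it. A possible way to catch stragglers like this, sketched on a made-up stand-in frame, is to look for columns whose names contain 'date' and that are not yet datetimes. Matching on the substring 'date' is an assumption about naming, not something the dataset guarantees.

```python
import pandas as pd

# stand-in for df (illustrative values)
sample = pd.DataFrame({
    'incident_date': ['2003-01-01T00:00:00.000', '2003-01-02T00:00:00.000'],
    'mutual_aid': ['none', 'none'],
    'incident_number': [3000003, 3000004],
})

# columns that look like dates by name but have not been converted yet
date_like = [
    col for col in sample.columns
    if 'date' in col and not pd.api.types.is_datetime64_any_dtype(sample[col])
]

for col in date_like:
    sample[col] = pd.to_datetime(sample[col])

print(date_like)
print(sample.dtypes)
```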

In [36]:
# now that we are starting to clean up our data, it's time to discard some of the unnecessary columns
# let's do a quick value_counts on each column to see if any contain nothing but blank fields
for col in df.columns:
    print("\n", col.title(), "\n")
    print(df[col].value_counts(dropna=False), "\n")
    print("*"*20)


 Action_Taken_Other 

NaN                                               992
-                                                 987
34 transport person                                 2
63 restore fire alarm system                        2
86 investigate                                      2
62 restore sprinkler or fire protection system      1
20 - search & rescue, other                         1
84 refer to proper authority                        1
Name: action_taken_other, dtype: int64 

********************

 Action_Taken_Primary 

86 - investigate                                     698
86 investigate                                       556
11 - extinguish                                       79
11 extinguishment by fire service personnel           64
00 action taken, other                                51
63 restore fire alarm system                          50
70 assistance, other                                  33
63 - restore fire alarm system                        25
10 fire control or extinguishment, other              25
71 assist physically disabled                         24
31 provide first aid & check for injuries             23
32 provide basic life support (bls)                   21
73 provide manpower                                   19
64 shut down system                                   17
93 - cancelled enroute                                15
70 - assistance, other                                14
00 - action taken, other                              14
10 - fire, other                                      14
30 - emergency medical services, other                13
45 - remove hazard                                    13
71 - assist physically disabled                       13
73 - provide manpower                                 12
61 restore municipal services                         11
30 emergency medical services, other                  11
87 investigate fire out on arrival                    11
64 - shut down system                                 10
31 - provide first aid & check for injuries            9
23 extricate, disentangle                              9
33 provide advanced life support (als)                 9
22 - rescue, remove from harm                          7
                                                    ... 
62 - restore fire protection system                    3
66 remove water                                        3
40 hazardous condition, other                          3
45 remove hazard                                       3
43 - hazmat spill control and confinement              3
12 - salvage & overhaul                                2
92 standby                                             2
65 secure property                                     2
62 restore sprinkler or fire protection system         2
61 - restore municipal services                        2
81 incident command                                    2
75 provide equipment                                   2
23 - extricate, disentangle                            2
60 - systems and services, other                       1
87 - investigate - fire out on arrival                 1
81 - incident command                                  1
80 - information/invest. & enforcement, other          1
75 - provide equipment                                 1
74 - provide apparatus                                 1
74 provide apparatus                                   1
44 - hazmat leak, control & containment                1
44 hazardous materials leak control & containment      1
21 search                                              1
34 transport person                                    1
16 control fire (wildland)                             1
47 decontaminate occupancy or area                     1
65 - secure property                                   1
92 - standby                                           1
12 salvage & overhaul                                  1
84 refer to proper authority                           1
Name: action_taken_primary, dtype: int64 

********************
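
The counts above split what looks like the same action across two spellings, for example '86 - investigate' (698) and '86 investigate' (556). One possible cleanup, sketched here on a handful of labels copied from the output, is to pull out the leading code and count on that instead of the full label. Treating the first whitespace-delimited token as the code is an assumption about how these fields are laid out, and the action_code name is made up for this example.

```python
import pandas as pd

# a few labels copied from the value counts above
labels = pd.Series([
    '86 - investigate',
    '86 investigate',
    '11 - extinguish',
    '11 extinguishment by fire service personnel',
    '63 restore fire alarm system',
])

# assumption: the code is the first whitespace-delimited token of each label
action_code = labels.str.split().str[0]

print(action_code.value_counts())
```

Note that the '-' placeholder would come through as the code '-', so it still needs to be handled separately.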

 Action_Taken_Secondary 

-                                                        963
NaN                                                      893
63 restore fire alarm system                              45
86 investigate                                            29
86 - investigate                                          18
64 shut down system                                        6
82 notify other agencies.                                  4
12 salvage & overhaul                                      4
12 - salvage & overhaul                                    4
62 restore sprinkler or fire protection system             3
11 extinguishment by fire service personnel                3
30 emergency medical services, other                       2
60 systems and services, other                             2
42 hazmat detection, monitoring, sampling, & analysis      1
45 remove hazard                                           1
62 - restore fire protection system                        1
65 secure property                                         1
32 provide basic life support (bls)                        1
51 ventilate                                               1
85 enforce codes                                           1
34 transport person                                        1
34 - transport person                                      1
92 - standby                                               1
66 remove water                                            1
40 hazardous condition, other                              1
Name: action_taken_secondary, dtype: int64 

********************

 Address 

1 Sf Intl Airport                 10
Stockton St/north Point Street     6
Taylor St/eddy Street              5
3251 20th Av.                      5
99 Grove St.                       5
1 South Van Ness Av.               5
140 Jones Street                   4
3rd St. / Howard St.               4
Hyde St/pine Street                4
Grant Av/jackson Street            4
363 Noe Street                     4
Schwerin St. / Visitacion Av.      4
10th St. / Market St.              4
114 Sansome St.                    4
Ellis St. / Jones St.              4
2576 Harrison Street               4
22nd St. / Mission St.             3
Street                             3
23rd St. / Shotwell St.            3
2150 The Embarcadero Nor           3
280nb 6th St Of/brannan Street     3
7th St/howard Street               3
2051 3rd St.                       3
NaN                                3
Golden Gate Av. / Laguna St.       3
21st Av/noriega Street             3
900 North Point Street             3
19th Av/holloway Avenue            3
Eddy St. / Taylor St.              3
1 Drumm Street                     3
                                  ..
1483 45th Av.                      1
3377 Pacific Avenue                1
1238 Buchanan St.                  1
1201 Larkin Street                 1
23rd St. / Pennsylvania Av.        1
Taylor St/geary Street             1
986 Broadway Street                1
15th St/san Bruno Avenue           1
Polk St. / Sutter St.              1
45 Castro St.                      1
Golden Gate Av/gough Street        1
Leland Av/elliot Street            1
3rd St. / Williams Av.             1
939 Ellis Street                   1
33rd Av. / Noriega St.             1
3685 17th St 14 Street             1
111 Taylor Street                  1
50 Drumm St.                       1
Bright St. / Randolph St.          1
3rd St/innes Avenue                1
1130 Guerrero St 3                 1
Polk St/geary Street               1
330 Parnassus Avenue               1
856 Stockton Street                1
Buchanan St. / Waller St.          1
1657 Clement St 3                  1
2659 45th Av.                      1
2425 Geary Bl                      1
50 Chenery St.                     1
Mission St/fremont Street          1
Name: address, dtype: int64 

********************

 Alarm_Dttm 

2003-01-01 14:23:34    1
2003-01-02 05:55:40    1
2003-01-11 01:12:53    1
2016-07-04 01:31:18    1
2016-06-29 10:21:27    1
2003-01-07 19:02:53    1
2003-01-09 21:08:22    1
2003-01-06 17:24:36    1
2003-01-06 08:51:53    1
2003-01-06 21:53:18    1
2016-06-26 17:38:30    1
2016-07-02 08:01:36    1
2016-07-08 13:04:38    1
2003-01-06 00:28:07    1
2016-06-29 00:37:43    1
2016-06-24 21:41:06    1
2003-01-09 15:26:31    1
2003-01-03 14:55:43    1
2016-06-30 05:58:15    1
2003-01-05 23:08:01    1
2016-06-30 23:04:49    1
2003-01-04 21:36:48    1
2003-01-03 17:01:13    1
2016-07-07 10:15:16    1
2003-01-01 01:35:26    1
2016-07-01 06:26:43    1
2016-06-30 04:47:46    1
2003-01-05 11:14:33    1
2016-06-26 14:04:07    1
2016-07-08 15:37:08    1
                      ..
2016-06-26 19:32:39    1
2003-01-08 00:22:40    1
2016-07-09 18:45:09    1
2016-07-07 21:52:33    1
2016-07-04 09:34:54    1
2003-01-03 11:55:38    1
2016-07-02 14:49:45    1
2016-06-24 22:21:32    1
2016-06-29 17:12:02    1
2003-01-07 13:57:25    1
2016-07-01 19:17:37    1
2016-06-26 22:01:53    1
2016-07-04 17:33:23    1
2016-07-04 21:50:32    1
2016-07-07 14:35:39    1
2003-01-05 11:37:11    1
2016-07-08 06:28:58    1
2016-07-09 13:01:53    1
2003-01-01 19:40:15    1
2003-01-03 10:22:59    1
2003-01-01 19:40:39    1
2003-01-09 13:22:47    1
2003-01-10 07:03:16    1
2003-01-06 07:12:39    1
2003-01-06 13:20:14    1
2003-01-03 21:03:24    1
2003-01-04 14:55:36    1
2016-07-01 18:10:58    1
2016-07-08 13:53:28    1
2016-07-07 21:33:20    1
Name: alarm_dttm, dtype: int64 

********************

 Area_Of_Fire_Origin 

NaN                                                      1880
-                                                          19
24 cooking area, kitchen                                    6
83 - engine area, running gear, wheel area                  6
80 - vehicle area, other                                    6
92 - highway, parking lot, street: on or n                  5
90 outside area, other                                      5
90 - outside area, other                                    5
24 - cooking area, kitchen                                  4
94 open area, outside; included are farmland, field         4
76 wall surface: exterior                                   4
uu undetermined                                             4
00 other area of fire origin                                4
14 - common room, den, family/living room                   3
70 structural area, other                                   3
21 - bedroom-<5 persons; inc. jail or pris                  3
25 - bathroom, checkroom, locker room                       2
98 vacant structural area                                   2
83 engine area, running gear, wheel area                    2
26 laundry area, wash house (laundry)                       2
00 - other                                                  2
93 courtyard, patio, terrace                                2
92 highway, parking lot, street: on or near                 1
22 - bedroom-5+persons; inc. barrack/dorms                  1
25 bathroom, checkroom, lavatory, locker room               1
01 - corridor, mall                                         1
50 service facilities, other                                1
84 - fuel tank, fuel line                                   1
26 - laundry area, wash house (laundry)                     1
72 exterior balcony, unenclosed porch                       1
01 hallway corridor, mall                                   1
uu - undetermined                                           1
80 vehicle area, other                                      1
13 assembly area - less than 100 persons                    1
02 exterior stairway, ramp, or fire escape                  1
37 projection room, spotlight area                          1
21 bedroom - < 5 persons; included are jail or prison       1
Name: area_of_fire_origin, dtype: int64 

********************

 Arrival_Dttm 

2003-01-04 16:46:58    2
2003-01-03 11:40:20    2
2016-07-10 11:51:05    1
2003-01-06 08:55:05    1
2016-07-05 00:45:37    1
2016-07-09 06:09:44    1
2016-07-04 23:32:44    1
2016-07-03 17:00:35    1
2016-07-02 12:55:24    1
2016-06-24 18:00:43    1
2016-07-03 14:11:01    1
2003-01-07 11:46:27    1
2003-01-07 05:40:02    1
2003-01-06 12:34:08    1
2016-07-04 22:22:03    1
2016-07-09 01:19:32    1
2003-01-03 19:50:27    1
2003-01-09 12:14:25    1
2016-07-05 09:23:00    1
2003-01-01 23:14:18    1
2016-07-04 22:23:31    1
2003-01-07 14:15:37    1
2003-01-02 12:27:11    1
2003-01-01 14:06:39    1
2016-07-03 12:26:26    1
2016-07-04 12:38:35    1
2003-01-02 06:16:38    1
2003-01-02 20:40:00    1
2016-06-29 04:22:38    1
2016-07-02 08:07:44    1
                      ..
2003-01-02 04:17:12    1
2003-01-04 18:43:21    1
2003-01-01 02:38:17    1
2016-07-03 15:23:49    1
2003-01-01 15:37:38    1
2016-07-04 21:24:15    1
2003-01-05 23:56:25    1
2016-07-06 17:56:20    1
2003-01-02 18:59:00    1
2016-06-27 01:49:26    1
2003-01-10 04:08:03    1
2016-06-29 00:15:48    1
2016-07-09 13:09:13    1
2003-01-03 20:39:43    1
2003-01-03 14:33:24    1
2003-01-03 15:46:51    1
2016-07-01 08:28:42    1
2016-06-28 08:24:13    1
2016-07-02 16:15:16    1
2003-01-06 20:46:12    1
2003-01-02 15:57:19    1
2016-06-28 08:25:25    1
2003-01-01 14:56:43    1
2016-06-27 20:49:22    1
2016-07-04 12:16:00    1
2016-06-29 07:39:30    1
2016-07-04 22:30:27    1
2003-01-10 21:19:05    1
2003-01-05 03:16:10    1
2016-07-03 06:22:57    1
Name: arrival_dttm, dtype: int64 

********************

 Automatic_Extinguishing_System_Present 

NaN                1938
-                    23
n -none present      10
n none present        6
1 present             5
u undetermined        3
1 -present            3
Name: automatic_extinguishing_system_present, dtype: int64 

********************

 Automatic_Extinguishing_Sytem_Failure_Reason 

NaN    1952
-        36
Name: automatic_extinguishing_sytem_failure_reason, dtype: int64 

********************
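
The column above contains nothing but NaN and the '-' placeholder, which is exactly the kind of column we are hunting for. Rather than reading every printout, a rough way to list such columns, sketched on a small made-up frame, is to flag a column when every value is either null or '-'. Treating '-' as equivalent to missing is an assumption based on how it appears in these outputs.

```python
import numpy as np
import pandas as pd

# stand-in for df (illustrative values)
sample = pd.DataFrame({
    'automatic_extinguishing_sytem_failure_reason': [np.nan, '-', '-'],
    'detectors_present': [np.nan, '-', '1 present'],
    'ems_units': [0, 1, 0],
})

# a column is "blank" if every value is either null or the '-' placeholder
blank_cols = [
    col for col in sample.columns
    if (sample[col].isnull() | sample[col].isin(['-'])).all()
]

print(blank_cols)
```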

 Automatic_Extinguishing_Sytem_Perfomance 

NaN                                     1947
-                                         35
3 Fire too small to activate system        4
1 System operated and was effective        1
3 -Fire too small to activate system       1
Name: automatic_extinguishing_sytem_perfomance, dtype: int64 

********************

 Automatic_Extinguishing_Sytem_Type 

NaN                            1947
-                                35
1 wet-pipe sprinkler system       5
1 -wet-pipe sprinkler             1
Name: automatic_extinguishing_sytem_type, dtype: int64 

********************

 Battalion 

B02    342
B03    299
B01    227
B04    210
B10    175
B05    167
B08    152
B09    140
B06    140
B07    113
B99     23
Name: battalion, dtype: int64 

********************

 Box 

NaN       968
1456       11
1453       11
6913       10
1545        9
2314        9
5236        6
1262        6
2931        6
0545        6
5252        6
6374        5
8422        5
1562        5
1546        5
1455        5
2342        5
2145        5
2211        5
1644        4
1462        4
1623        4
3325        4
1461        4
6244        4
1312        4
5233        4
2153        4
3264        4
1553        4
         ... 
1335        1
4262        1
6576        1
7453        1
2324        1
5624        1
6642        1
6474        1
2154        1
2152        1
6247        1
3554        1
2626.0      1
1153        1
8332        1
3552        1
5533.0      1
3133        1
3645        1
3642        1
2522        1
5124        1
2551        1
3322        1
2556        1
4133        1
3246        1
4136        1
5213        1
3413        1
Name: box, dtype: int64 

********************
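
A couple of the box values above show up as '2626.0' and '5533.0' while the rest look like four-digit codes such as '0545', which suggests the column holds a mix of representations. A possible normalization, sketched on made-up stand-in values, is to cast everything to string, drop a trailing '.0', and left-pad to four digits. The four-digit width is an assumption read off the output above, and normalize_box is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# stand-in for df['box']: strings, a float, and a missing value (illustrative)
box = pd.Series(['1456', '0545', 2626.0, np.nan], dtype=object)

def normalize_box(value):
    """Return the box code as a zero-padded 4-character string; leave nulls alone."""
    if pd.isnull(value):
        return value
    text = str(value)
    if text.endswith('.0'):
        text = text[:-2]
    return text.zfill(4)  # assumption: box codes are four digits wide

box_clean = box.map(normalize_box)
print(box_clean.tolist())
```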

 Call_Number 

161862754    1
30080325     1
161922349    1
161901873    1
30080307     1
161842484    1
30010528     1
30010388     1
161822008    1
161811771    1
30080317     1
30070078     1
161811505    1
30080321     1
161871170    1
30070441     1
30070080     1
30080331     1
161894019    1
161840461    1
161772504    1
30010521     1
30080337     1
30070098     1
161842516    1
161811799    1
161842520    1
161793370    1
30070108     1
30080351     1
            ..
161911473    1
30100196     1
161764019    1
161860276    1
30030339     1
30010402     1
161830813    1
30020281     1
30100154     1
30020289     1
161850051    1
161802607    1
161922820    1
161860294    1
161821383    1
30010056     1
161782474    1
30100172     1
30030243     1
161911505    1
161852118    1
161772247    1
30020313     1
30030367     1
30070389     1
30030165     1
161903458    1
161770208    1
30020321     1
161792002    1
Name: call_number, dtype: int64 

********************

 City 

SF               983
San Francisco    968
NaN               13
Presidio          10
Treasure Isla      6
Yerba Buena        2
TI                 2
Fort Mason         1
HP                 1
SFO                1
FM                 1
Name: city, dtype: int64 

********************
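
The city counts above are split almost evenly between 'SF' and 'San Francisco', which presumably mean the same thing. A minimal sketch of merging them, on made-up stand-in values, is a replace with a small mapping. Only the 'SF' mapping is included here; what the other abbreviations (TI, HP, FM) stand for is not confirmed by the data, so they are left untouched.

```python
import pandas as pd

# stand-in for df['city'] (illustrative values)
city = pd.Series(['SF', 'San Francisco', 'Presidio', 'TI', None])

# only the mapping we are reasonably sure about; extend once the others are verified
city_map = {'SF': 'San Francisco'}

city_clean = city.replace(city_map)
print(city_clean.value_counts(dropna=False))
```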

 Civilian_Fatalities 

0    1987
1       1
Name: civilian_fatalities, dtype: int64 

********************

 Civilian_Injuries 

0    1987
1       1
Name: civilian_injuries, dtype: int64 

********************

 Close_Dttm 

2016-06-26 11:46:09    2
2016-06-29 20:22:25    2
2016-07-04 17:46:04    2
2003-01-09 11:45:19    1
2016-06-30 15:50:55    1
2016-07-02 12:59:40    1
2016-07-02 17:59:17    1
2003-01-02 20:39:52    1
2003-01-06 22:20:44    1
2016-07-01 16:14:19    1
2003-01-03 09:17:39    1
2016-07-04 10:13:17    1
2016-07-07 09:05:15    1
2016-06-28 11:17:56    1
2016-06-27 04:46:03    1
2003-01-05 23:13:45    1
2003-01-06 18:47:13    1
2003-01-03 08:51:42    1
2003-01-04 20:21:51    1
2003-01-09 07:52:31    1
2003-01-11 02:25:32    1
2016-06-28 20:30:24    1
2003-01-01 03:47:23    1
2003-01-05 13:29:21    1
2016-07-03 04:58:13    1
2016-07-04 21:17:18    1
2003-01-03 17:31:25    1
2016-07-10 19:19:14    1
2003-01-07 19:16:05    1
2003-01-03 09:15:26    1
                      ..
2003-01-01 04:00:58    1
2003-01-02 19:03:00    1
2016-06-27 22:29:57    1
2016-06-29 13:35:57    1
2016-07-06 02:18:11    1
2016-07-04 01:07:01    1
2016-07-03 01:54:36    1
2016-07-03 01:55:24    1
2003-01-06 21:53:23    1
2003-01-03 14:33:24    1
2003-01-05 21:28:07    1
2003-01-01 20:58:10    1
2003-01-05 17:49:10    1
2016-07-01 19:25:29    1
2016-06-30 14:43:00    1
2003-01-05 23:56:25    1
2016-06-24 21:17:13    1
2016-06-27 01:50:22    1
2016-07-01 17:01:19    1
2016-07-09 18:03:09    1
2003-01-05 10:32:40    1
2003-01-04 08:16:34    1
2003-01-01 14:55:31    1
2016-06-25 02:13:09    1
2016-06-26 20:58:58    1
2003-01-03 10:55:19    1
2003-01-02 17:49:13    1
2016-07-09 16:51:42    1
2003-01-03 02:08:46    1
2016-07-03 10:02:48    1
Name: close_dttm, dtype: int64 

********************

 Detector_Alerted_Occupants 

NaN                                     956
-                                       890
u - unknown                              69
1 detector alerted occupants             29
1 - detector alerted occupants           23
2 detector did not alert occupants        9
2 - detector did not alert occupants      6
u unknown                                 6
Name: detector_alerted_occupants, dtype: int64 

********************

 Detector_Effectiveness 

NaN                                                  1951
-                                                      32
1 -alerted occupants, occupants responded               4
1 detector alerted occupants, occupants responded       1
Name: detector_effectiveness, dtype: int64 

********************

 Detector_Failure_Reason 

NaN    1952
-        36
Name: detector_failure_reason, dtype: int64 

********************

 Detector_Operation 

NaN                                      1948
-                                          31
2 -detector operated                        4
1 fire too small to activate detector       3
u -undetermined                             1
2 detector operated                         1
Name: detector_operation, dtype: int64 

********************

 Detector_Type 

NaN                                           1948
-                                               31
1 smoke                                          3
1 -smoke                                         3
u -undetermined                                  1
u undetermined                                   1
3 -combination smoke & heat in single unit       1
Name: detector_type, dtype: int64 

********************

 Detectors_Present 

NaN                1938
-                    22
u -undetermined       7
u undetermined        6
1 -present            5
n none present        4
1 present             4
n -not present        2
Name: detectors_present, dtype: int64 

********************

 Ems_Personnel 

0     1764
2      155
3       27
1       18
4       13
5        7
6        2
11       1
7        1
Name: ems_personnel, dtype: int64 

********************

 Ems_Units 

0    1764
1     175
2      38
3       9
6       1
4       1
Name: ems_units, dtype: int64 

********************

 Estimated_Contents_Loss 

 0        988
NaN       970
 50         9
 100        3
 500        2
 200        2
 1000       2
 5000       1
 50000      1
 250        1
 1500       1
 25         1
 20         1
 10         1
 6          1
 5          1
 2          1
 1          1
 2000       1
Name: estimated_contents_loss, dtype: int64 

********************

 Estimated_Property_Loss 

 0         987
NaN        962
 50          9
 100         5
 5000        4
 500         3
 1000        3
 200         2
 2000        2
 15000       2
 1500        2
 10000       1
 400         1
 300         1
 200000      1
 10          1
 235000      1
 2500        1
Name: estimated_property_loss, dtype: int64 

********************

 Exposure_Number 

0    1988
Name: exposure_number, dtype: int64 

********************

 Fire_Fatalities 

0    1988
Name: fire_fatalities, dtype: int64 

********************

 Fire_Injuries 

0    1988
Name: fire_injuries, dtype: int64 

********************

 Fire_Spread 

NaN                                           1952
-                                               29
00 -item first ignited, other                    2
uu -undetermined                                 1
95 -film, residue, including paint & resin       1
17 -structural member or framing                 1
32 -bedding; blanket, sheet, comforter           1
81 -electrical wire, cable insulation            1
Name: fire_spread, dtype: int64 

********************

 First_Unit_On_Scene 

NaN    1121
E36      47
E05      42
E03      40
E17      35
E06      32
E07      29
E01      24
E16      24
E21      22
E13      21
E10      21
T03      20
T01      19
E29      17
E38      17
E44      16
E19      16
E33      16
E25      16
T13      15
E18      14
E43      14
E08      13
E12      13
E35      12
E31      11
E14      11
E11      11
E32      11
       ... 
T12       3
T02       3
B09       3
B07       3
E48       2
B05       2
M36       2
RC4       2
T09       2
T08       2
M12       2
M14       2
B06       1
FB2       1
RC1       1
RC2       1
RC3       1
M41       1
M28       1
M43       1
M05       1
M07       1
RS1       1
91        1
93        1
92        1
94        1
M17       1
M18       1
D3        1
Name: first_unit_on_scene, dtype: int64 

********************

 Floor_Of_Fire_Origin 

NaN    1958
 2       12
 1        9
 3        4
 0        2
 5        1
 6        1
 4        1
Name: floor_of_fire_origin, dtype: int64 

********************

 Heat_Source 

NaN                                                          1880
-                                                              19
uu - undetermined                                              14
uu undetermined                                                14
54 fireworks                                                    5
10 - heat from powered equipment, other                         5
61 cigarette                                                    4
61 - cigarette                                                  4
10 heat from powered equipment, other                           4
63 - heat from undetermined smoking materi                      3
13 electrical arcing                                            3
00 - heat source: other                                         3
81 heat from direct flame, convection currents                  3
65 lighter: cigarette, cigar                                    3
12 - radiated/conducted heat operating equ                      3
12 radiated or conducted heat from operating equipment          3
60 heat from other open flame or smoking materials, other       2
64 - match                                                      2
11 spark, ember, or flame from operating equipment              2
40 - hot or smoldering object, other                            2
60 - heat; other open flame/smoking materi                      2
43 hot ember or ash                                             2
70 - chemical, natural heat source, other                       1
66 candle                                                       1
11 - spark/ember/flame from operating equi                      1
42 molten, hot material                                         1
13 - arcing                                                     1
63 heat from undetermined smoking material                      1
Name: heat_source, dtype: int64 

********************

 Human_Factors_Associated_With_Ignition 

NaN                                                                       1880
-                                                                           44
n none                                                                      37
3 unattended or unsupervised person                                          8
3 - unattended or unsupervised person                                        8
n - none                                                                     7
1§3 asleep§unattended or unsupervised person                                 1
2§4 possibly impaired by alcohol or drugs§possibly mentally disabled         1
2 - possibly impaired by alcohol or drugs                                    1
2 possibly impaired by alcohol or drugs                                      1
Name: human_factors_associated_with_ignition, dtype: int64 

********************

 Ignition_Cause 

NaN                                           1880
2 unintentional                                 29
-                                               19
2 - unintentional                               16
u cause undetermined after investigation         8
3 - failure of equipment or heat source          8
1 intentional                                    7
1 - intentional                                  6
u - cause undetermined after investigation       4
0 - cause, other                                 3
5 - cause under investigation                    3
3 failure of equipment or heat source            2
4 act of nature                                  1
5 cause under investigation                      1
4 - act of nature                                1
Name: ignition_cause, dtype: int64 

********************

 Ignition_Factor_Primary 

NaN                                                1880
-                                                    35
uu undetermined                                      12
nn none                                              11
11 abandoned or discarded materials or products       5
53 equipment unattended                               5
11 - abandoned or discarded materials or p            4
53 - equipment unattended                             3
20 - mechanical failure, malfunction, othe            3
12 - heat source too close to combustibles            3
00 factors contributing to ignition, other            3
19 playing with heat source                           3
10 misuse of material or product, other               2
34 unspecified short-circuit arc                      2
10 - misuse of material or product, other             2
25 - worn out                                         2
12 heat source too close to combustibles.             1
20 mechanical failure, malfunction, other             1
00 - other factor contributed to ignition             1
60 natural condition, other                           1
23 - leak or break                                    1
uu - undetermined                                     1
41 - design deficiency                                1
36 - arc, spark from operating equipment              1
18 - improper container or storage                    1
52 - accidentally turned on, not turned of            1
30 electrical failure, malfunction, other             1
14 - flammable liquid or gas spilled                  1
37 fluorescent light ballast                          1
Name: ignition_factor_primary, dtype: int64 

********************

 Ignition_Factor_Secondary 

NaN    1928
-        60
Name: ignition_factor_secondary, dtype: int64 

********************

 Incident_Date 

2003-01-01T00:00:00.000    143
2003-01-03T00:00:00.000    120
2003-01-07T00:00:00.000    104
2003-01-06T00:00:00.000    100
2003-01-05T00:00:00.000     95
2003-01-09T00:00:00.000     91
2003-01-04T00:00:00.000     90
2016-07-04T00:00:00.000     87
2003-01-02T00:00:00.000     84
2016-07-03T00:00:00.000     80
2003-01-10T00:00:00.000     79
2016-06-25T00:00:00.000     77
2003-01-08T00:00:00.000     77
2016-06-30T00:00:00.000     72
2016-07-01T00:00:00.000     70
2016-06-27T00:00:00.000     68
2016-07-02T00:00:00.000     67
2016-06-26T00:00:00.000     60
2016-07-06T00:00:00.000     60
2016-06-28T00:00:00.000     59
2016-07-07T00:00:00.000     57
2016-06-29T00:00:00.000     56
2016-07-05T00:00:00.000     44
2016-07-09T00:00:00.000     42
2016-07-08T00:00:00.000     42
2016-07-10T00:00:00.000     30
2016-06-24T00:00:00.000     29
2003-01-11T00:00:00.000      5
Name: incident_date, dtype: int64 

********************

 Incident_Number 

3000319     1
16074038    1
16069908    1
16071959    1
3001501     1
3001625     1
16069914    1
3001631     1
3001633     1
16071971    1
3001637     1
3001639     1
16071977    1
3001643     1
3001651     1
3001655     1
16074220    1
16071616    1
16069948    1
3001663     1
16069952    1
16069954    1
16072003    1
16069956    1
3002995     1
16069966    1
16072015    1
16069970    1
16074070    1
3000364     1
           ..
16075312    1
3000932     1
3002929     1
16071218    1
16075316    1
16074842    1
16071222    1
16075322    1
3002939     1
3002943     1
16071234    1
3000900     1
3002949     1
3002953     1
3001019     1
16073291    1
3000908     1
16073293    1
3000912     1
3002961     1
3000914     1
3000916     1
3002965     1
16071254    1
16071256    1
16073307    1
16072193    1
3001274     1
3000930     1
3002368     1
Name: incident_number, dtype: int64 

********************

 Item_First_Ignited 

NaN                                                   1880
-                                                       19
72 light vegetation - not crop, including grass          8
uu - undetermined                                        8
uu undetermined                                          7
81 - electrical wire, cable insulation                   5
00 item first ignited, other                             5
76 cooking materials, including edible materials         5
00 - item first ignited, other                           5
76 - cooking materials, inc. edible materi               4
62 - flam. liq/gas-in/from engine or burne               3
41 - christmas tree                                      3
51 - box, carton, bag, basket, barrel                    2
25 appliance housing or casing                           2
30 soft goods, wearing apparel, other                    2
31 - mattress, pillow                                    2
99 multiple items first ignited                          2
96 rubbish, trash, waste                                 2
81 electrical wire, cable insulation                     2
12 exterior sidewall covering, surface, finish           2
10 structural component or finish, other                 2
42 decoration                                            1
96 - rubbish, trash, or waste                            1
59 rolled, wound material (paper and fabrics)            1
94 - dust/fiber/lint. inc. sawdust, excels               1
10 - structural component or finish, other               1
32 - bedding; blanket, sheet, comforter                  1
86 fence, pole                                           1
15 - int. wall cover  exclude drapes, etc.               1
13 exterior trim, including doors                        1
20 furniture, utensils, other                            1
92 - magazine, newspaper, writing paper                  1
70 - organic materials, other                            1
88 pyrotechnics, explosives                              1
62 flammable liquid/gas - in/from engine or burner       1
21 - upholstered sofa, chair, vehicle seat               1
91 book                                                  1
26 household utensils                                    1
36 - curtains, blinds, drapery, tapestry                 1
Name: item_first_ignited, dtype: int64 

********************

 Mutual_Aid 

n none    1000
none       988
Name: mutual_aid, dtype: int64 

********************

 Neighborhood_District 

NaN                               228
Tenderloin                        182
Mission                           172
Financial District/South Beach    168
South of Market                   123
Western Addition                  101
Bayview Hunters Point              80
Castro/Upper Market                63
Hayes Valley                       56
Marina                             51
Nob Hill                           48
Pacific Heights                    48
Potrero Hill                       46
Sunset/Parkside                    44
North Beach                        42
Outer Richmond                     42
Russian Hill                       40
Chinatown                          38
Lakeshore                          34
Haight Ashbury                     31
West of Twin Peaks                 28
Presidio Heights                   27
Oceanview/Merced/Ingleside         27
Inner Sunset                       25
Mission Bay                        23
Excelsior                          23
Noe Valley                         22
Lone Mountain/USF                  21
Visitacion Valley                  21
Inner Richmond                     19
Bernal Heights                     19
Outer Mission                      17
Golden Gate Park                   15
Japantown                          13
Portola                            12
Presidio                           10
Treasure Island                     8
Glen Park                           7
Twin Peaks                          5
McLaren Park                        4
Lincoln Park                        3
Seacliff                            2
Name: neighborhood_district, dtype: int64 

********************

 No_Flame_Spead 

NaN    975
na     952
no      47
2.0      7
1.0      6
yes      1
Name: no_flame_spead, dtype: int64 

********************

 Number_Of_Floors_With_Extreme_Damage 

NaN    1972
 0       16
Name: number_of_floors_with_extreme_damage, dtype: int64 

********************

 Number_Of_Floors_With_Heavy_Damage 

NaN    1972
 0       16
Name: number_of_floors_with_heavy_damage, dtype: int64 

********************

 Number_Of_Floors_With_Minimum_Damage 

NaN    1972
 1       12
 0        4
Name: number_of_floors_with_minimum_damage, dtype: int64 

********************

 Number_Of_Floors_With_Significant_Damage 

NaN    1972
 0       14
 1        2
Name: number_of_floors_with_significant_damage, dtype: int64 

********************

 Number_Of_Sprinkler_Heads_Operating 

NaN    1971
 0       16
 1        1
Name: number_of_sprinkler_heads_operating, dtype: int64 

********************

 Other_Personnel 

0    1889
2      71
1      22
3       4
4       2
Name: other_personnel, dtype: int64 

********************

 Other_Units 

0    1889
1      93
2       6
Name: other_units, dtype: int64 

********************

 Primary_Situation 

711 - municipal alarm system, street box false            234
700 false alarm or false call, other                      139
711 municipal alarm system, malicious false alarm         104
700 - false alarm or false call, other                     72
743 smoke detector activation, no fire - unintentional     56
730 - system malfunction, other                            50
740 - unintentional alarm, other                           47
118 - trash or rubbish fire, contained                     45
311 - medical assist, assist ems crew                      43
735 alarm system sounded due to malfunction                38
745 alarm system activation, no fire - unintentional       37
113 cooking fire, confined to container                    35
500 service call, other                                    35
151 - outside rubbish, trash or waste fire                 33
735 - alarm system sounded due to malfunction              31
322 motor vehicle accident with injuries                   31
113 - cooking fire, confined to container                  31
151 outside rubbish, trash or waste fire                   30
745 - alarm system sounded/no fire-accidental              30
150 outside rubbish fire, other                            28
600 good intent call, other                                24
733 smoke detector activation due to malfunction           24
651 - smoke scare, odor of smoke                           22
730 system malfunction, other                              22
522 water or steam leak                                    21
500 - service call, other                                  20
740 unintentional transmission of alarm, other             20
100 - fire, other                                          18
554 assist invalid                                         18
131 - passenger vehicle fire                               17
                                                         ... 
672 biological hazard investigation, none found             1
161 outside storage fire                                    1
812 - flood assessment                                      1
220 - rupture from air or gas, other                        1
661 - ems call, party transported by non-fire ag            1
200 overpressure rupture, explosion, overheat other         1
413 oil or other combustible liquid spill                   1
512 ring or jewelry removal                                 1
11 -                                                        1
352 extrication of victim(s) from vehicle                   1
162 outside equipment fire                                  1
911 citizen complaint                                       1
423 - refrigeration leak                                    1
321 - ems excluding veh. accident w/injuries                1
736 - co detector activation/malfunction                    1
351 extrication of victim(s) from building/structure        1
652 - steam/vapor/fog/dust mistaken for smoke               1
652 steam, vapor, fog or dust thought to be smoke           1
361 swimming/recreational water areas rescue                1
364 - surf rescue                                           1
251 excessive heat, scorch burns with no ignition           1
213 - steam rupture, pressure/process vessel                1
210 - steam rupture, steam, other                           1
741 - sprinkler activation/no fire-accidental               1
140 - natural vegetation fire, other                        1
463 vehicle accident, general cleanup                       1
521 - water evacuation                                      1
561 unauthorized burning                                    1
410 - flammable gas or liquid condition, other              1
200 - overpressure rupture/explosion, overheat              1
Name: primary_situation, dtype: int64 

********************

 Property_Use 

429 multifamily dwelling                                  218
962 - residential street, road or residential dr          190
960 - street, other                                       177
429 - multifamily dwellings                               152
960 street, other                                         111
419 1 or 2 family dwelling                                104
419 - 1 or 2 family dwelling                              100
963 - street or road in commercial area                    87
962 residential street, road or residential driveway       81
400 residential, other                                     57
599 - business office                                      53
963 street or road in commercial area                      47
000 - property use, other                                  43
nnn none                                                   42
500 mercantile, business, other                            37
439 boarding/rooming house, residential hotels             32
400 - residential, other                                   28
599 business office                                        26
961 highway or divided highway                             20
439 - boarding/rooming house, res. hotels                  19
000 property use, other                                    19
uuu undetermined                                           17
900 outside or special property, other                     15
331 - hospital - medical or psychiatric                    15
213 elementary school, including kindergarten              15
150 - public or government, other                          15
449 hotel/motel, commercial                                14
uuu - undetermined                                         12
449 - hotel/motel, commercial                              10
331 hospital - medical or psychiatric                       9
                                                         ... 
182 - auditorium or concert hall                            1
122 convention center, exhibition hall                      1
539 household goods, sales, repairs                         1
170 - passenger terminal, other                             1
183 - movie theater                                         1
579 - vehicle/boat; sales, service or repair                1
311 - 24-hr care nurse homes, 4 or more person              1
300 - health care/detentino & correction, oth.              1
215 - middle/junior or high school                          1
151 library                                                 1
340 clinics, doctors offices, hemodialysis cntr, other      1
952 - railroad yard                                         1
571 - service station, gas station                          1
460 dormitory-type residence, other                         1
140 - clubs, other                                          1
639 - communications center                                 1
254 - day care, in commercial property                      1
557 - per. service; inc. barber/beauty shop                 1
342 doctor, dentist or oral surgeon office                  1
122 - convention center, exhibition hall                    1
4191 -                                                      1
807 - outside material storage area                         1
155 courthouse                                              1
151 - library                                               1
936 - vacant lot                                            1
973 aircraft taxiway                                        1
938 - graded and cared-for plots of land                    1
921 bridge, trestle                                         1
322 alcohol or substance abuse recovery center              1
940 - water area, other                                     1
Name: property_use, dtype: int64 

********************

 Station_Area 

1     108
01     97
3      73
36     60
03     59
36     57
05     51
6      49
7      46
06     45
17     44
07     42
13     41
28     36
5      35
8      35
10     34
16     33
13     33
16     32
10     29
38     28
41     27
21     25
44     25
28     24
35     23
19     23
33     23
33     23
     ... 
34     13
22     12
2      12
40     12
09     12
26     11
14     11
24     11
15     11
34     10
23     10
51     10
02     10
9       9
4       9
39      9
42      9
40      8
48      8
12      8
23      7
24      7
32      7
39      7
18      6
26      5
20      3
20      2
48      2
H1      1
Name: station_area, dtype: int64 

********************

 Structure_Status 

NaN                          1938
-                              22
2 in normal use                13
2 -in normal use               13
5 vacant and secured            1
4 -under major renovation       1
Name: structure_status, dtype: int64 

********************

 Structure_Type 

NaN                         1935
-                             20
1 enclosed building           14
1 -enclosed building          13
0 -structure type, other       2
0 structure type, other        2
3 open structure               1
5 -tent                        1
Name: structure_type, dtype: int64 

********************

 Supervisor_District 

 6     432
 3     234
NaN    228
 5     218
 2     171
 10    164
 8     142
 9     134
 1      81
 7      79
 11     62
 4      43
Name: supervisor_district, dtype: int64 

********************

 Suppression_Personnel 

4     635
11    387
9     342
10    235
5     124
29     29
6      27
12     23
1      23
14     18
8      15
30     15
33     10
28      9
3       9
15      8
32      7
2       7
16      7
34      6
36      6
19      5
13      5
20      4
7       3
37      3
18      3
43      3
0       3
50      2
22      2
23      2
31      2
38      2
44      1
24      1
26      1
27      1
35      1
39      1
21      1
Name: suppression_personnel, dtype: int64 

********************

 Suppression_Units 

1     779
3     663
2     383
8      45
4      37
9      24
10     15
5      15
11      7
6       7
7       5
0       3
38      2
12      2
18      1
Name: suppression_units, dtype: int64 

********************

 Zipcode 

NaN       228
 94102    213
 94103    196
 94109    152
 94110    132
 94107     88
 94115     84
 94124     79
 94117     75
 94114     69
 94105     68
 94133     63
 94112     60
 94118     55
 94123     54
 94132     52
 94121     44
 94122     44
 94108     39
 94134     37
 94111     35
 94131     29
 94116     28
 94104     22
 94127     13
 94158     11
 94129     10
 94130      8
Name: zipcode, dtype: int64 

********************

 Long 

NaN            214
-122.384160     11
-122.410955      8
-122.410501      7
-122.417316      6
-122.406657      6
-122.476826      6
-122.417505      5
-122.413943      5
-122.418928      5
-122.432911      5
-122.412393      5
-122.439481      4
-122.473006      4
-122.422925      4
-122.428124      4
-122.396707      4
-122.412074      4
-122.406959      4
-122.400474      4
-122.405501      4
-122.426721      4
-122.456060      4
-122.400699      4
-122.401847      4
-122.412784      4
-122.435188      4
-122.412305      4
-122.415565      4
-122.417210      4
              ... 
-122.404280      1
-122.458418      1
-122.471571      1
-122.412573      1
-122.425476      1
-122.423536      1
-122.413404      1
-122.415174      1
-122.410932      1
-122.431854      1
-122.484969      1
-122.410079      1
-122.455430      1
-122.407302      1
-122.438694      1
-122.446061      1
-122.392860      1
-122.426449      1
-122.437699      1
-122.507414      1
-122.413956      1
-122.459122      1
-122.437851      1
-122.465866      1
-122.397476      1
-122.481796      1
-122.406448      1
-122.421232      1
-122.392303      1
-122.421602      1
Name: long, dtype: int64 

********************

 Lat 

NaN           214
 37.616901     11
 37.784140      8
 37.780478      7
 37.806963      7
 37.796024      6
 37.728471      6
 37.778083      6
 37.776540      5
 37.781050      5
 37.784054      5
 37.763220      5
 37.774844      5
 37.785930      4
 37.760888      4
 37.774807      4
 37.773100      4
 37.805858      4
 37.756213      4
 37.790046      4
 37.785208      4
 37.784866      4
 37.789435      4
 37.762671      4
 37.754145      4
 37.791370      4
 37.715106      4
 37.783309      4
 37.779032      4
 37.785029      4
             ... 
 37.732715      1
 37.761097      1
 37.775811      1
 37.779545      1
 37.731129      1
 37.759387      1
 37.799229      1
 37.759560      1
 37.760338      1
 37.785893      1
 37.800160      1
 37.803990      1
 37.751972      1
 37.718175      1
 37.765054      1
 37.786416      1
 37.778110      1
 37.765964      1
 37.754397      1
 37.727527      1
 37.788093      1
 37.714691      1
 37.781321      1
 37.791842      1
 37.774707      1
 37.753967      1
 37.776599      1
 37.750186      1
 37.786696      1
 37.783992      1
Name: lat, dtype: int64 

********************
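One thing the counts above make obvious: the same category shows up under several spellings (for example `1 detector alerted occupants` vs `1 - detector alerted occupants`, or `2 in normal use` vs `2 -in normal use`), and `Station_Area` lists values like `1` and `01` separately. Below is a minimal sketch of how we might collapse those, assuming Python 3 and assuming the leading code is the stable key while the description text varies. The helper names `split_code_label` and `normalize_station` are hypothetical, not part of the notebook.

```python
import re

# A leading code ("1", "uu", "429"), then either a dash with flexible
# spacing or plain whitespace, then the free-text description.
_LABEL_RE = re.compile(r'^(\w+)(?:\s*-\s*|\s+)(.*)$')


def split_code_label(value):
    """Split '2 -in normal use' into ('2', 'in normal use').

    Non-strings (e.g. NaN) and the bare '-' placeholder give (None, None).
    Strings with no recognizable code give (None, original_text).
    """
    if not isinstance(value, str):
        return (None, None)
    text = value.strip()
    if text in ('', '-'):
        return (None, None)
    match = _LABEL_RE.match(text)
    if match is None:
        return (None, text)
    code, description = match.groups()
    return (code, description.strip() or None)


def normalize_station(value):
    """Zero-pad purely numeric station ids so 1, '1' and '01' all become '01'."""
    if value is None or (isinstance(value, float) and value != value):
        return None  # None or NaN
    text = str(value).strip()
    return text.zfill(2) if text.isdigit() else text
```

Something like `df['property_use'].map(lambda v: split_code_label(v)[0])` would then give a code-only column to count on, which should merge the 2003-style and 2016-style labels. This is only a sketch; values that carry more than one code would need their own handling.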

In [37]:
df.tail()


Out[37]:
action_taken_other action_taken_primary action_taken_secondary address alarm_dttm area_of_fire_origin arrival_dttm automatic_extinguishing_system_present automatic_extinguishing_sytem_failure_reason automatic_extinguishing_sytem_perfomance ... property_use station_area structure_status structure_type supervisor_district suppression_personnel suppression_units zipcode long lat
983 - 86 - investigate - 291 10th St. 2003-01-01 00:19:50 - 2003-01-01 00:26:23 - - - ... 322 - alcohol/sub. abuse recovery center 36 - - 6 11 3 94103 -122.412942 37.773001
984 - 32 - provide basic life support (bls) - Market St. / Spear St. 2003-01-01 00:07:32 - 2003-01-01 00:07:46 - - - ... 419 - 1 or 2 family dwelling 13 - - 6 0 0 94105 -122.395630 37.793790
985 - 86 - investigate - 33rd Av. / Noriega St. 2003-01-01 00:18:19 - 2003-01-01 00:21:15 - - - ... 962 - residential street, road or residential dr 18 - - NaN 4 1 NaN NaN NaN
986 - 87 - investigate - fire out on arrival - 3rd St. / Harrison St. 2003-01-01 00:08:13 - 2003-01-01 00:11:19 - - - ... 960 - street, other 08 - - 6 4 1 94107 -122.397389 37.782554
987 - 86 - investigate - Broadway St. / Taylor St. 2003-01-01 00:02:13 - 2003-01-01 00:06:13 - - - ... 962 - residential street, road or residential dr 02 - - 3 4 1 94133 -122.413562 37.797038

5 rows × 63 columns


In [38]:
# For the time being, drop the columns listed below. A narrower frame is easier to inspect,
# and it should make it easier to pull the entire dataset later, not just these few rows.

cols_to_drop = ["automatic_extinguishing_sytem_failure_reason",
                "automatic_extinguishing_sytem_type",
                "battalion",
                "box",
                "call_number",
                "detector_effectiveness",
                "detector_failure_reason",
                "ems_personnel",
                "ems_units",
                "exposure_number",
                "first_unit_on_scene",
                "ignition_factor_secondary",
                "mutual_aid",
                "no_flame_spead",
                "other_personnel",
                "other_units",
                "station_area",
                "supervisor_district"]
df = df.drop(cols_to_drop, axis=1)

In [39]:
df.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 1988 entries, 0 to 987
Data columns (total 45 columns):
action_taken_other                          996 non-null object
action_taken_primary                        1988 non-null object
action_taken_secondary                      1095 non-null object
address                                     1985 non-null object
alarm_dttm                                  1988 non-null datetime64[ns]
area_of_fire_origin                         108 non-null object
arrival_dttm                                1988 non-null datetime64[ns]
automatic_extinguishing_system_present      50 non-null object
automatic_extinguishing_sytem_perfomance    41 non-null object
city                                        1975 non-null object
civilian_fatalities                         1988 non-null int64
civilian_injuries                           1988 non-null int64
close_dttm                                  1988 non-null datetime64[ns]
detector_alerted_occupants                  1032 non-null object
detector_operation                          40 non-null object
detector_type                               40 non-null object
detectors_present                           50 non-null object
estimated_contents_loss                     1018 non-null float64
estimated_property_loss                     1026 non-null float64
fire_fatalities                             1988 non-null int64
fire_injuries                               1988 non-null int64
fire_spread                                 36 non-null object
floor_of_fire_origin                        30 non-null float64
heat_source                                 108 non-null object
human_factors_associated_with_ignition      108 non-null object
ignition_cause                              108 non-null object
ignition_factor_primary                     108 non-null object
incident_date                               1988 non-null object
incident_number                             1988 non-null int64
item_first_ignited                          108 non-null object
neighborhood_district                       1760 non-null object
number_of_floors_with_extreme_damage        16 non-null float64
number_of_floors_with_heavy_damage          16 non-null float64
number_of_floors_with_minimum_damage        16 non-null float64
number_of_floors_with_significant_damage    16 non-null float64
number_of_sprinkler_heads_operating         17 non-null float64
primary_situation                           1988 non-null object
property_use                                1985 non-null object
structure_status                            50 non-null object
structure_type                              53 non-null object
suppression_personnel                       1988 non-null int64
suppression_units                           1988 non-null int64
zipcode                                     1760 non-null float64
long                                        1774 non-null float64
lat                                         1774 non-null float64
dtypes: datetime64[ns](3), float64(11), int64(7), object(24)
memory usage: 714.4+ KB

In [40]:
df.head()


Out[40]:
action_taken_other action_taken_primary action_taken_secondary address alarm_dttm area_of_fire_origin arrival_dttm automatic_extinguishing_system_present automatic_extinguishing_sytem_perfomance city ... number_of_sprinkler_heads_operating primary_situation property_use structure_status structure_type suppression_personnel suppression_units zipcode long lat
0 NaN 86 investigate NaN 105 Aptos Avenue 2016-07-10 21:50:58 NaN 2016-07-10 21:54:42 NaN NaN San Francisco ... NaN 733 smoke detector activation due to malfunction 215 high school/junior high school/middle school NaN NaN 11 3 94127 -122.466270 37.728947
1 NaN 86 investigate NaN 8th St/bryant Street 2016-07-10 19:49:15 NaN 2016-07-10 19:52:06 NaN NaN San Francisco ... NaN 711 municipal alarm system, malicious false alarm nnn none NaN NaN 9 2 94103 -122.406971 37.772527
2 NaN 61 restore municipal services NaN 2nd St/brannan Street 2016-07-10 19:44:29 NaN 2016-07-10 19:47:35 NaN NaN San Francisco ... NaN 711 municipal alarm system, malicious false alarm 960 street, other NaN NaN 9 2 94107 -122.392082 37.781846
3 NaN 86 investigate NaN 8400 Oceanview Trail 2016-07-10 19:24:41 NaN 2016-07-10 19:28:56 NaN NaN San Francisco ... NaN 733 smoke detector activation due to malfunction 429 multifamily dwelling NaN NaN 11 3 94132 -122.467447 37.709976
4 NaN 86 investigate NaN 636 Velasco Av A 2016-07-10 18:56:31 NaN 2016-07-10 19:00:13 NaN NaN San Francisco ... NaN 531 smoke or odor removal 419 1 or 2 family dwelling NaN NaN 29 8 94134 -122.417959 37.709630

5 rows × 45 columns


In [41]:
df[df.zipcode.isnull()].iloc[-1]


Out[41]:
action_taken_other                                                                         -
action_taken_primary                                                        86 - investigate
action_taken_secondary                                                                     -
address                                                               33rd Av. / Noriega St.
alarm_dttm                                                               2003-01-01 00:18:19
area_of_fire_origin                                                                        -
arrival_dttm                                                             2003-01-01 00:21:15
automatic_extinguishing_system_present                                                     -
automatic_extinguishing_sytem_perfomance                                                   -
city                                                                                      SF
civilian_fatalities                                                                        0
civilian_injuries                                                                          0
close_dttm                                                               2003-01-01 00:21:21
detector_alerted_occupants                                                                 -
detector_operation                                                                         -
detector_type                                                                              -
detectors_present                                                                          -
estimated_contents_loss                                                                    0
estimated_property_loss                                                                    0
fire_fatalities                                                                            0
fire_injuries                                                                              0
fire_spread                                                                                -
floor_of_fire_origin                                                                     NaN
heat_source                                                                                -
human_factors_associated_with_ignition                                                     -
ignition_cause                                                                             -
ignition_factor_primary                                                                    -
incident_date                                                        2003-01-01T00:00:00.000
incident_number                                                                      3000014
item_first_ignited                                                                         -
neighborhood_district                                                                    NaN
number_of_floors_with_extreme_damage                                                     NaN
number_of_floors_with_heavy_damage                                                       NaN
number_of_floors_with_minimum_damage                                                     NaN
number_of_floors_with_significant_damage                                                 NaN
number_of_sprinkler_heads_operating                                                      NaN
primary_situation                             711 - municipal alarm system, street box false
property_use                                962 - residential street, road or residential dr
structure_status                                                                           -
structure_type                                                                             -
suppression_personnel                                                                      4
suppression_units                                                                          1
zipcode                                                                                  NaN
long                                                                                     NaN
lat                                                                                      NaN
Name: 985, dtype: object

In [42]:
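# After some searching I found the GeoNames API for reverse geocoding: given a lat/long it returns the nearest zipcodes.
# We'll use it to fill in missing zipcodes, but try to be polite about how many requests we send,
# so first check how many distinct coordinates are actually missing a zipcode.
# Example request: http://api.geonames.org/findNearbyPostalCodesJSON?lat=37.728947&lng=-122.466270&username=demo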

df[df.zipcode.isnull()].groupby(['lat','long'])['address'].value_counts(dropna=False)


Out[42]:
lat        long         address              
37.616901  -122.384160  1 Sf Intl Airport        10
                        1                         1
37.812941  -122.477899  1019                      1
37.820633  -122.337061  800 80 Eb Bay Br Z Yb     1
37.866731  -122.432600  1 Angel Island Dr, Ai     1
dtype: int64

In [43]:
grouped = df[df.zipcode.isnull()].groupby(['lat','long'])

In [44]:
for name, group in grouped:
    print("")
    print(name)

    print(group['address'].value_counts(dropna=False))


(37.616900999999999, -122.38415999999999)
1 Sf Intl Airport    10
1                     1
Name: address, dtype: int64

(37.812941000000002, -122.47789899999999)
1019    1
Name: address, dtype: int64

(37.820633000000001, -122.33706100000001)
800 80 Eb Bay Br Z Yb    1
Name: address, dtype: int64

(37.866731000000001, -122.43259999999999)
1 Angel Island Dr, Ai    1
Name: address, dtype: int64

In [48]:
geo_url = "http://api.geonames.org/findNearbyPostalCodesJSON?lat={}&lng={}&username={}"
username = 'mikezawitkowski' # TODO: hide this in a config file not in source control

temp_df = pd.read_json(geo_url.format('37.616900999999999', '-122.38415999999999', username))
temp_df.head()


Out[48]:
postalCodes
0 {u'distance': u'0', u'countryCode': u'US', u'p...
1 {u'distance': u'2.41637', u'countryCode': u'US...
2 {u'distance': u'3.97939', u'countryCode': u'US...
3 {u'distance': u'4.04594', u'countryCode': u'US...
4 {u'distance': u'4.68373', u'countryCode': u'US...

In [51]:
temp_df.iloc[0]['postalCodes']


Out[51]:
{u'adminCode1': u'CA',
 u'adminCode2': u'081',
 u'adminName1': u'California',
 u'adminName2': u'San Mateo',
 u'countryCode': u'US',
 u'distance': u'0',
 u'lat': 37.616901,
 u'lng': -122.38416,
 u'placeName': u'San Francisco',
 u'postalCode': u'94128'}
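
The response above is a JSON object whose `postalCodes` list holds one dict per nearby postal code, with `distance` given as a string. A minimal sketch of picking the closest entry from the decoded JSON without going through a DataFrame. The helper name `nearest_postal_code` is hypothetical, and the second entry in `sample` is invented purely for illustration; only the 94128 entry comes from the output above.

```python
def nearest_postal_code(payload):
    """Return the postalCode of the closest entry in a GeoNames-style payload, or None."""
    candidates = payload.get('postalCodes') or []
    if not candidates:
        return None
    # distances arrive as strings (e.g. u'2.41637'), so convert before comparing
    closest = min(candidates, key=lambda entry: float(entry['distance']))
    return closest['postalCode']


# hand-written sample: the second entry is made up for illustration
sample = {'postalCodes': [
    {'distance': '2.41637', 'postalCode': '00000'},
    {'distance': '0', 'postalCode': '94128'},
]}
print(nearest_postal_code(sample))
```

Taking the minimum explicitly avoids relying on the service returning results sorted by distance, which the output above suggests but does not guarantee.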

In [55]:
# OK, so let's populate the missing zipcodes
grouped = df[df.zipcode.isnull()].groupby(['lat','long'])

geo_url = "http://api.geonames.org/findNearbyPostalCodesJSON?lat={}&lng={}&username={}"
username = 'mikezawitkowski' # TODO: hide this in a config file not in source control

for name, group in grouped:
    lat, lon = name  # each group key is a (lat, long) tuple
    print("lat: {}, long: {}".format(lat, lon))

    # one request per distinct coordinate pair, not one per row
    temp_df = pd.read_json(geo_url.format(lat, lon, username))
    mask = ((df.lat == lat) &
            (df['long'] == lon) &
            (df.zipcode.isnull()))

    # take the first postal code returned (distance 0 in the example above)
    df.loc[mask, 'zipcode'] = temp_df.iloc[0]['postalCodes']['postalCode']


lat: 37.616901, long: -122.38416
lat: 37.812941, long: -122.477899
lat: 37.820633, long: -122.337061
lat: 37.866731, long: -122.4326
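
Only four distinct coordinates needed a lookup here, but across the full dataset the same loop could send many requests. One way to stay polite, sketched under assumptions: look up each distinct coordinate pair only once and pause between requests. `lookup_all`, `fetch`, and the one-second default delay are all hypothetical choices, not anything GeoNames prescribes.

```python
import time


def lookup_all(coords, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(lat, lon) once per distinct coordinate pair, pausing between calls."""
    results = {}
    for lat, lon in coords:
        key = (lat, lon)
        if key in results:
            continue  # already looked up; don't hit the service again
        if results:
            sleep(delay_seconds)  # pause before every request after the first
        results[key] = fetch(lat, lon)
    return results


def fake_fetch(lat, lon):
    # stand-in for the real HTTP call so the sketch runs offline
    return '94128'


print(lookup_all([(37.616901, -122.38416), (37.616901, -122.38416)],
                 fake_fetch, delay_seconds=0))
```

Passing the fetch function in as an argument keeps the throttling logic separate from the HTTP call, so it can be tried without touching the network.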

In [58]:
df[df.zipcode.isnull()].shape  # 214 rows are still missing lat, long AND zipcode


Out[58]:
(214, 45)

In [67]:
mask = (df.zipcode.isnull())
geocolumns = ['address','city','neighborhood_district','zipcode','lat','long']
df[mask][geocolumns]


Out[67]:
address city neighborhood_district zipcode lat long
515 908 Connecticut Street San Francisco NaN NaN NaN NaN
8 1001 Larkin St. SF NaN NaN NaN NaN
15 86 Rossi Av. SF NaN NaN NaN NaN
18 6239 Geary Bl. SF NaN NaN NaN NaN
29 Andover St. / Tompkins Av. SF NaN NaN NaN NaN
32 Paris St. / Russia Av. SF NaN NaN NaN NaN
34 Blanken Av. / Wheeler Av. SF NaN NaN NaN NaN
35 1700 Steiner St. SF NaN NaN NaN NaN
40 840 Van Ness Av. SF NaN NaN NaN NaN
42 Eucalyptus Dr. / Melba Av. SF NaN NaN NaN NaN
46 Bright St. / Sargent St. SF NaN NaN NaN NaN
47 2255 41st Av. SF NaN NaN NaN NaN
62 46 Idora Av. SF NaN NaN NaN NaN
73 Schwerin St. / Visitacion Av. SF NaN NaN NaN NaN
75 Schwerin St. / Visitacion Av. SF NaN NaN NaN NaN
80 2603 33rd Av. SF NaN NaN NaN NaN
92 1100 Lake Merced Bl. SF NaN NaN NaN NaN
94 50 Stanyan Bl. SF NaN NaN NaN NaN
100 2659 45th Av. SF NaN NaN NaN NaN
103 Brewster St. / Joy St. SF NaN NaN NaN NaN
106 1 I280nb C Chavez Of SF NaN NaN NaN NaN
107 2427 28th Av. SF NaN NaN NaN NaN
108 351 Buena Vista East Av. SF NaN NaN NaN NaN
113 935 Golden Gate Av. SF NaN NaN NaN NaN
117 1451 Thomas Av. SF NaN NaN NaN NaN
119 Blanken Av. / Wheeler Av. SF NaN NaN NaN NaN
120 1101 Van Ness Av. SF NaN NaN NaN NaN
128 Corbett Av. / Iron Al. SF NaN NaN NaN NaN
149 5700 3rd St. SF NaN NaN NaN NaN
150 Divisadero St. / Waller St. SF NaN NaN NaN NaN
... ... ... ... ... ... ...
809 Evans Av. / Rankin St. SF NaN NaN NaN NaN
826 1250 La Playa St. SF NaN NaN NaN NaN
838 1260 Masonic Av. SF NaN NaN NaN NaN
839 Broderick St. / Bush St. SF NaN NaN NaN NaN
840 1436 8th Av. SF NaN NaN NaN NaN
845 2941 23rd Av. SF NaN NaN NaN NaN
846 1540 Sunnydale Av. SF NaN NaN NaN NaN
853 419 Ivy St. SF NaN NaN NaN NaN
855 Cesar Chavez St. / Minnesota St. SF NaN NaN NaN NaN
868 319 Lower Fort Mason St. FM NaN NaN NaN NaN
876 Goettingen St. / Ward St. SF NaN NaN NaN NaN
878 1345 44th Av. SF NaN NaN NaN NaN
881 50 Laurel St. SF NaN NaN NaN NaN
883 Brookdale Av. / Santos St. SF NaN NaN NaN NaN
892 Bay View St. / Latona St. SF NaN NaN NaN NaN
893 2169 26th Av. SF NaN NaN NaN NaN
911 1208 Buchanan St. SF NaN NaN NaN NaN
913 2354 22nd Av. SF NaN NaN NaN NaN
915 Hawes St. / Revere Av. SF NaN NaN NaN NaN
923 30 States St. SF NaN NaN NaN NaN
926 132 Brookdale Av. SF NaN NaN NaN NaN
927 Keith St. / Newcomb Av. SF NaN NaN NaN NaN
931 1251 Eddy St. SF NaN NaN NaN NaN
937 Geneva Av. / Santos St. SF NaN NaN NaN NaN
938 18th St. / Rhode Island St. SF NaN NaN NaN NaN
942 573 26th Av. SF NaN NaN NaN NaN
949 Buchanan St. / Moulton St. SF NaN NaN NaN NaN
958 Coleridge St. / Eugenia Av. SF NaN NaN NaN NaN
971 2 Parker Av. SF NaN NaN NaN NaN
985 33rd Av. / Noriega St. SF NaN NaN NaN NaN

214 rows × 6 columns


In [69]:
df[mask].iloc[0][geocolumns]


Out[69]:
address                  908 Connecticut Street
city                              San Francisco
neighborhood_district                       NaN
zipcode                                     NaN
lat                                         NaN
long                                        NaN
Name: 515, dtype: object

In [73]:
# let's try building a request for the street2coordinates endpoint, which expects a URL like
# "http://www.datasciencetoolkit.org/street2coordinates/2543+Graystone+Place%2c+Simi+Valley%2c+CA+93065"
address = '+'.join(df[mask].iloc[0]['address'].split())
address = address + '%2c+San+Francisco%2c+CA'
s2c_url = "http://www.datasciencetoolkit.org/street2coordinates/"
temp_df = pd.read_json(s2c_url + address)
temp_df


Out[73]:
908 Connecticut Street, San Francisco, CA
confidence 0.902
country_code US
country_code3 USA
country_name United States
fips_county 06075
latitude 37.7537
locality San Francisco
longitude -122.398
region CA
street_address 908 Connecticut St
street_name Connecticut St
street_number 908

In [82]:
temp_df.loc['latitude'][0]


Out[82]:
37.753653
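
The URL above was assembled by hand, joining words with `+` and writing the comma as `%2c`. A small sketch of the same construction using the standard library's `quote_plus`, which handles spaces, commas, and any other reserved characters. The helper name `build_s2c_url` and its default city and state are assumptions for illustration.

```python
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2

S2C_URL = "http://www.datasciencetoolkit.org/street2coordinates/"


def build_s2c_url(street, city="San Francisco", state="CA"):
    """Build a street2coordinates URL, percent-encoding the full address."""
    full_address = "{}, {}, {}".format(street, city, state)
    return S2C_URL + quote_plus(full_address)


print(build_s2c_url("908 Connecticut Street"))
```

Note that `quote_plus` emits `%2C` in upper case, which is equivalent to the hand-written `%2c`.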

In [74]:
# normalize the city name so that 'SF' and 'San Francisco' rows are treated the same
df.loc[df.city == 'SF', 'city'] = 'San Francisco'

In [110]:
mask = (df.zipcode.isnull())
grouped = df[mask].groupby(['address','city'])
s2c_url = "http://www.datasciencetoolkit.org/street2coordinates/"

for name, group in grouped:
    street, city = name  # each group key is an (address, city) tuple
    if '/' in str(street):  # a slash marks an intersection; deal with those separately
        continue
    # build e.g. "908+Connecticut+Street%2c+San+Francisco%2c+CA"
    address = '+'.join(street.split())
    address = address + '%2c+' + '+'.join(city.split()) + '%2c+CA'
    try:
        temp_df = pd.read_json(s2c_url + address)
    except ValueError as err:
        # read_json could not build a frame from this response; report it and move on
        print(err)
        print(name)
        continue
    # fill in coordinates for every row with this address and city
    row_mask = ((df.address == street) & (df.city == city))
    df.loc[row_mask, 'lat'] = temp_df.loc['latitude'].iloc[0]
    df.loc[row_mask, 'long'] = temp_df.loc['longitude'].iloc[0]


If using all scalar values, you must pass an index
(u'319 Lower Fort Mason St.', u'FM')
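
The `ValueError` above is what pandas raises when asked to build a DataFrame from a dict whose values are all scalars. That would be consistent with the service answering with a null entry for an address it cannot resolve, but this is an assumption and has not been verified here. Under that assumption, a sketch that parses the response body with `json` and reports a miss as `None`. The helper `extract_coordinates` is hypothetical, and both sample bodies are hand-written rather than captured from the service.

```python
import json


def extract_coordinates(response_text):
    """Return (latitude, longitude) from a street2coordinates-style body, or None."""
    payload = json.loads(response_text)
    for result in payload.values():
        # assumed: an unresolved address maps to null instead of a result dict
        if isinstance(result, dict) and 'latitude' in result and 'longitude' in result:
            return result['latitude'], result['longitude']
    return None


# hand-written sample bodies, not real service output
found = '{"908 Connecticut Street, San Francisco, CA": {"latitude": 37.753653, "longitude": -122.398}}'
missing = '{"319 Lower Fort Mason St., FM, CA": null}'
print(extract_coordinates(found))
print(extract_coordinates(missing))
```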

In [112]:
df[df.zipcode.isnull() & (df.lat.isnull())]


Out[112]:
action_taken_other action_taken_primary action_taken_secondary address alarm_dttm area_of_fire_origin arrival_dttm automatic_extinguishing_system_present automatic_extinguishing_sytem_perfomance city ... number_of_sprinkler_heads_operating primary_situation property_use structure_status structure_type suppression_personnel suppression_units zipcode long lat
29 - 86 - investigate - Andover St. / Tompkins Av. 2003-01-10 16:42:53 NaN 2003-01-10 16:46:25 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
32 - 86 - investigate - Paris St. / Russia Av. 2003-01-10 16:05:35 NaN 2003-01-10 16:09:34 NaN NaN San Francisco ... NaN 700 - false alarm or false call, other 960 - street, other NaN NaN 4 1 NaN NaN NaN
34 - 86 - investigate - Blanken Av. / Wheeler Av. 2003-01-10 15:59:28 NaN 2003-01-10 16:02:29 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 963 - street or road in commercial area NaN NaN 9 2 NaN NaN NaN
42 - 86 - investigate - Eucalyptus Dr. / Melba Av. 2003-01-10 14:46:03 NaN 2003-01-10 14:49:23 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 9 2 NaN NaN NaN
46 - 86 - investigate - Bright St. / Sargent St. 2003-01-10 13:56:24 NaN 2003-01-10 13:59:24 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 9 2 NaN NaN NaN
73 - 86 - investigate - Schwerin St. / Visitacion Av. 2003-01-10 03:15:30 NaN 2003-01-10 03:19:30 NaN NaN San Francisco ... NaN 700 - false alarm or false call, other 215 - middle/junior or high school NaN NaN 30 9 NaN NaN NaN
75 - 86 - investigate - Schwerin St. / Visitacion Av. 2003-01-10 03:17:04 NaN 2003-01-10 03:23:15 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
103 - 86 - investigate - Brewster St. / Joy St. 2003-01-09 19:14:57 NaN 2003-01-09 19:20:31 NaN NaN San Francisco ... NaN 651 - smoke scare, odor of smoke 962 - residential street, road or residential dr NaN NaN 23 6 NaN NaN NaN
119 - 86 - investigate - Blanken Av. / Wheeler Av. 2003-01-09 15:47:17 NaN 2003-01-09 15:50:35 NaN NaN San Francisco ... NaN 100 - fire, other 000 - property use, other NaN NaN 4 1 NaN NaN NaN
128 - 86 - investigate - Corbett Av. / Iron Al. 2003-01-09 14:35:04 NaN 2003-01-09 14:39:27 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 960 - street, other NaN NaN 4 1 NaN NaN NaN
150 - 86 - investigate - Divisadero St. / Waller St. 2003-01-09 10:12:06 NaN 2003-01-09 10:15:07 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
163 - 86 - investigate - Peabody St. / Visitacion Av. 2003-01-09 03:05:21 NaN 2003-01-09 03:06:58 NaN NaN San Francisco ... NaN 710 - malicious, mischievous false call, other 960 - street, other NaN NaN 4 1 NaN NaN NaN
164 - 86 - investigate - Schwerin St. / Visitacion Av. 2003-01-09 02:59:42 NaN 2003-01-09 03:05:07 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
165 - 86 - investigate - Buchanan St. / Waller St. 2003-01-09 02:40:41 NaN 2003-01-09 02:43:42 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 11 3 NaN NaN NaN
170 - 10 - fire, other - Collingwood St. / Market St. 2003-01-09 01:44:08 NaN 2003-01-09 01:46:43 NaN NaN San Francisco ... NaN 118 - trash or rubbish fire, contained 960 - street, other NaN NaN 4 1 NaN NaN NaN
180 - 11 - extinguish - Clayton St. / Fell St. 2003-01-08 23:27:03 NaN 2003-01-08 23:30:36 NaN NaN San Francisco ... NaN 151 - outside rubbish, trash or waste fire 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
226 - 86 - investigate - Bright St. / Sargent St. 2003-01-08 14:01:47 NaN 2003-01-08 14:04:17 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 960 - street, other NaN NaN 4 2 NaN NaN NaN
239 - 86 - investigate - Bush St. / Webster St. 2003-01-08 10:30:13 NaN 2003-01-08 10:33:44 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 960 - street, other NaN NaN 9 2 NaN NaN NaN
272 - 86 - investigate - 29th St. / Noe St. 2003-01-07 19:43:38 NaN 2003-01-07 19:46:47 NaN NaN San Francisco ... NaN 311 - medical assist, assist ems crew 419 - 1 or 2 family dwelling NaN NaN 4 1 NaN NaN NaN
288 - 66 - remove water - Ivy St. / Octavia St. 2003-01-07 17:44:33 NaN 2003-01-07 17:47:22 NaN NaN San Francisco ... NaN 522 - water or steam leak 429 - multifamily dwellings NaN NaN 11 3 NaN NaN NaN
296 - 86 - investigate - Bennington St. / Ellert St. 2003-01-07 16:06:39 NaN 2003-01-07 16:11:13 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 960 - street, other NaN NaN 5 1 NaN NaN NaN
314 - 86 - investigate - Congo St. / Monterey Bl. 2003-01-07 12:26:53 NaN 2003-01-07 12:30:10 NaN NaN San Francisco ... NaN 700 - false alarm or false call, other 419 - 1 or 2 family dwelling NaN NaN 11 3 NaN NaN NaN
324 - 45 - remove hazard - 39th Av. / Balboa St. 2003-01-07 10:29:51 NaN 2003-01-07 10:29:51 NaN NaN San Francisco ... NaN 400 - hazardous condition, other 210 - schools, non-adult NaN NaN 5 1 NaN NaN NaN
329 - 86 - investigate - 19th Av. / Anza St. 2003-01-07 09:30:44 NaN 2003-01-07 09:34:03 NaN NaN San Francisco ... NaN 411 - gasoline or other flammable liquid spill 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
342 - 86 - investigate - Bradford St. / Cortland Av. 2003-01-07 05:32:48 NaN 2003-01-07 05:40:02 NaN NaN San Francisco ... NaN 400 - hazardous condition, other 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
346 - 86 - investigate - Schwerin St. / Visitacion Av. 2003-01-07 02:26:18 NaN 2003-01-07 02:30:39 NaN NaN San Francisco ... NaN 710 - malicious, mischievous false call, other 960 - street, other NaN NaN 4 1 NaN NaN NaN
352 - 11 - extinguish - 18th St. / Oakwood St. 2003-01-07 00:15:03 NaN 2003-01-07 00:19:31 NaN NaN San Francisco ... NaN 151 - outside rubbish, trash or waste fire 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
357 - 11 - extinguish - Kirkwood Av. / Newhall St. 2003-01-06 22:36:40 NaN 2003-01-06 22:39:03 NaN NaN San Francisco ... NaN 118 - trash or rubbish fire, contained 960 - street, other NaN NaN 4 1 NaN NaN NaN
376 - 31 - provide first aid & check for injuries - 101nb C Chavez On / Cesar Chavez S 2003-01-06 18:19:54 NaN 2003-01-06 18:24:17 NaN NaN San Francisco ... NaN 311 - medical assist, assist ems crew 961 - highway or divided highway (street) NaN NaN 4 1 NaN NaN NaN
379 - 86 - investigate - Garlington Ct. / La Salle Av. 2003-01-06 19:13:02 NaN 2003-01-06 19:15:27 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 9 2 NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
641 - 86 - investigate - Farallones St. / San Jose Av. 2003-01-03 23:38:55 NaN 2003-01-03 23:42:53 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 963 - street or road in commercial area NaN NaN 4 1 NaN NaN NaN
642 - 86 - investigate - Euclid Av. / Palm Av. 2003-01-03 23:24:14 NaN 2003-01-03 23:27:26 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 963 - street or road in commercial area NaN NaN 4 1 NaN NaN NaN
661 - 86 - investigate - 13th St. / Gateview Av. 2003-01-03 20:15:57 NaN 2003-01-03 20:18:37 NaN NaN TI ... NaN 151 - outside rubbish, trash or waste fire 429 - multifamily dwellings NaN NaN 4 1 NaN NaN NaN
673 - 86 - investigate - Arch St. / Brotherhood Wy. 2003-01-03 17:46:08 NaN 2003-01-03 17:51:48 NaN NaN San Francisco ... NaN 463 - vehicle accident, general cleanup 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
674 - 86 - investigate - Girard St. / Wayland St. 2003-01-03 18:14:30 NaN 2003-01-03 18:17:04 NaN NaN San Francisco ... NaN 672 - biological hazard investigation,none found 962 - residential street, road or residential dr NaN NaN 11 3 NaN NaN NaN
735 - 86 - investigate - Central Av. / Turk Bl. 2003-01-03 08:48:44 NaN 2003-01-03 08:51:15 NaN NaN San Francisco ... NaN 700 - false alarm or false call, other 960 - street, other NaN NaN 9 2 NaN NaN NaN
746 - 86 - investigate - Octavia St. / Pine St. 2003-01-03 05:10:13 NaN 2003-01-03 05:12:53 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
751 - 86 - investigate - 17th St. / Florida St. 2003-01-03 02:42:53 NaN 2003-01-03 02:46:16 NaN NaN San Francisco ... NaN 651 - smoke scare, odor of smoke 960 - street, other NaN NaN 4 1 NaN NaN NaN
760 - 11 - extinguish - Ceres St. / Williams Av. 2003-01-03 00:22:59 NaN 2003-01-03 00:27:12 NaN NaN San Francisco ... NaN 118 - trash or rubbish fire, contained 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
767 - 11 - extinguish - Connecticut St. / Wisconsin St. 2003-01-02 22:53:12 NaN 2003-01-02 22:56:18 NaN NaN San Francisco ... NaN 118 - trash or rubbish fire, contained 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
772 - 86 - investigate - Hawes St. / Revere Av. 2003-01-02 20:57:24 NaN 2003-01-02 20:59:13 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 9 2 NaN NaN NaN
780 - 86 - investigate - Lyon St. / Oak St. 2003-01-02 19:00:22 NaN 2003-01-02 19:02:22 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 9 2 NaN NaN NaN
787 - 86 - investigate - Byxbee St. / Shields St. 2003-01-02 18:28:53 NaN 2003-01-02 18:33:11 NaN NaN San Francisco ... NaN 700 - false alarm or false call, other 150 - public or government, other NaN NaN 4 1 NaN NaN NaN
792 - 86 - investigate - California St. / Lyon St. 2003-01-02 18:09:15 NaN 2003-01-02 18:11:30 NaN NaN San Francisco ... NaN 440 - elec. wiring/equip. problem, other 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
793 - 86 - investigate - Hazelwood Av. / Yerba Buena Av. 2003-01-02 17:44:24 NaN 2003-01-02 17:47:22 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 960 - street, other NaN NaN 9 2 NaN NaN NaN
808 - 86 - investigate - Lyon St. / Marina Bl. 2003-01-02 13:10:31 NaN 2003-01-02 13:13:59 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 963 - street or road in commercial area NaN NaN 4 1 NaN NaN NaN
809 - 86 - investigate - Evans Av. / Rankin St. 2003-01-02 11:53:14 NaN 2003-01-02 11:54:05 NaN NaN San Francisco ... NaN 531 - smoke or odor removal 891 - warehouse NaN NaN 15 9 NaN NaN NaN
839 - 86 - investigate - Broderick St. / Bush St. 2003-01-02 01:49:01 NaN 2003-01-02 01:52:21 NaN NaN San Francisco ... NaN 100 - fire, other 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
855 - 86 - investigate - Cesar Chavez St. / Minnesota St. 2003-01-01 20:23:51 NaN 2003-01-01 20:26:17 NaN NaN San Francisco ... NaN 551 - assist pd or other govern. agency 960 - street, other NaN NaN 4 1 NaN NaN NaN
868 - 45 - remove hazard - 319 Lower Fort Mason St. 2003-01-01 19:19:24 NaN 2003-01-01 19:28:18 NaN NaN FM ... NaN 411 - gasoline or other flammable liquid spill 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
876 - 86 - investigate - Goettingen St. / Ward St. 2003-01-01 18:45:01 NaN 2003-01-01 18:47:54 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 960 - street, other NaN NaN 4 1 NaN NaN NaN
883 - 86 - investigate - Brookdale Av. / Santos St. 2003-01-01 18:10:22 NaN 2003-01-01 18:15:03 NaN NaN San Francisco ... NaN 700 - false alarm or false call, other 960 - street, other NaN NaN 4 1 NaN NaN NaN
892 - 11 - extinguish - Bay View St. / Latona St. 2003-01-01 16:54:24 NaN 2003-01-01 16:56:37 NaN NaN San Francisco ... NaN 151 - outside rubbish, trash or waste fire 963 - street or road in commercial area NaN NaN 4 1 NaN NaN NaN
915 - 86 - investigate - Hawes St. / Revere Av. 2003-01-01 13:49:53 NaN 2003-01-01 13:51:39 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 9 2 NaN NaN NaN
927 - 86 - investigate - Keith St. / Newcomb Av. 2003-01-01 08:49:30 NaN 2003-01-01 08:52:01 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 9 2 NaN NaN NaN
937 - 62 - restore fire protection system - Geneva Av. / Santos St. 2003-01-01 04:47:20 NaN 2003-01-01 04:51:34 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 960 - street, other NaN NaN 9 2 NaN NaN NaN
938 - 86 - investigate - 18th St. / Rhode Island St. 2003-01-01 04:45:41 NaN 2003-01-01 04:48:37 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
949 - 86 - investigate - Buchanan St. / Moulton St. 2003-01-01 03:10:40 80 - vehicle area, other 2003-01-01 03:12:12 NaN NaN San Francisco ... NaN 131 - passenger vehicle fire 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
958 - 33 - provide advanced life support (als) - Coleridge St. / Eugenia Av. 2003-01-01 01:30:02 NaN 2003-01-01 01:37:10 NaN NaN San Francisco ... NaN 322 - vehicle accident with injuries 962 - residential street, road or residential dr NaN NaN 6 2 NaN NaN NaN
985 - 86 - investigate - 33rd Av. / Noriega St. 2003-01-01 00:18:19 - 2003-01-01 00:21:15 - - San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr - - 4 1 NaN NaN NaN

75 rows × 45 columns
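
Nearly all of the rows still missing coordinates are intersections, written as two street names separated by a slash, which the loop above deliberately skipped. A small sketch of splitting them apart as a first step toward handling them separately. How a geocoder expects an intersection to be phrased is not addressed here, `split_intersection` is a hypothetical helper, and it assumes the address is already a string.

```python
def split_intersection(address):
    """Split 'A St. / B St.' into ('A St.', 'B St.'); return None for anything else."""
    parts = [part.strip() for part in address.split('/')]
    if len(parts) != 2 or not all(parts):
        return None
    return tuple(parts)


print(split_intersection("Andover St. / Tompkins Av."))
print(split_intersection("908 Connecticut Street"))
```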


In [63]:
# now, we go back and we fill in the zipcodes using lat long
geo_url = "http://api.geonames.org/findNearbyPostalCodesJSON?lat={}&lng={}&username={}"


Out[63]:
action_taken_other                                                                       NaN
action_taken_primary                                                          86 investigate
action_taken_secondary                                                                   NaN
address                                                                     105 Aptos Avenue
alarm_dttm                                                               2016-07-10 21:50:58
area_of_fire_origin                                                                      NaN
arrival_dttm                                                             2016-07-10 21:54:42
automatic_extinguishing_system_present                                                   NaN
automatic_extinguishing_sytem_perfomance                                                 NaN
city                                                                           San Francisco
civilian_fatalities                                                                        0
civilian_injuries                                                                          0
close_dttm                                                               2016-07-10 22:08:15
detector_alerted_occupants                                                               NaN
detector_operation                                                                       NaN
detector_type                                                                            NaN
detectors_present                                                                        NaN
estimated_contents_loss                                                                  NaN
estimated_property_loss                                                                  NaN
fire_fatalities                                                                            0
fire_injuries                                                                              0
fire_spread                                                                              NaN
floor_of_fire_origin                                                                     NaN
heat_source                                                                              NaN
human_factors_associated_with_ignition                                                   NaN
ignition_cause                                                                           NaN
ignition_factor_primary                                                                  NaN
incident_date                                                        2016-07-10T00:00:00.000
incident_number                                                                     16076046
item_first_ignited                                                                       NaN
neighborhood_district                                                     West of Twin Peaks
number_of_floors_with_extreme_damage                                                     NaN
number_of_floors_with_heavy_damage                                                       NaN
number_of_floors_with_minimum_damage                                                     NaN
number_of_floors_with_significant_damage                                                 NaN
number_of_sprinkler_heads_operating                                                      NaN
primary_situation                           733 smoke detector activation due to malfunction
property_use                                215 high school/junior high school/middle school
structure_status                                                                         NaN
structure_type                                                                           NaN
suppression_personnel                                                                     11
suppression_units                                                                          3
zipcode                                                                                94127
long                                                                                -122.466
lat                                                                                  37.7289
Name: 0, dtype: object
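
A minimal sketch of the zipcode backfill described in cell 63, assuming the GeoNames response is a JSON object with a "postalCodes" list whose entries carry a "postalCode" field. The helper names (fetch_json, nearest_zipcode, fill_zipcodes) are hypothetical and not part of the notebook, rows are plain dicts with None for missing values rather than a DataFrame with NaN, and the username must be a registered GeoNames account name.

```python
import json
from urllib.request import urlopen

# Same URL template as geo_url in cell 63.
GEO_URL = "http://api.geonames.org/findNearbyPostalCodesJSON?lat={}&lng={}&username={}"


def fetch_json(url):
    """Download a URL and decode the body as JSON (hypothetical helper)."""
    with urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def nearest_zipcode(lat, lng, username, fetch=fetch_json):
    """Return the first postal code reported for a coordinate, or None.

    Assumes the response looks like {"postalCodes": [{"postalCode": "..."}]}.
    """
    payload = fetch(GEO_URL.format(lat, lng, username))
    codes = payload.get("postalCodes") or []
    return codes[0].get("postalCode") if codes else None


def fill_zipcodes(rows, username, fetch=fetch_json):
    """Fill 'zipcode' in place for rows that lack one but have lat/long."""
    for row in rows:
        has_coords = row.get("lat") is not None and row.get("long") is not None
        if row.get("zipcode") is None and has_coords:
            row["zipcode"] = nearest_zipcode(row["lat"], row["long"], username, fetch)
    return rows
```

Passing the fetch function as an argument keeps the lookup testable offline and makes it easy to add caching or rate limiting later. Rows with no coordinates are left untouched, so they still need a separate geocoding step from the address.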

In [109]:
df[df.zipcode.isnull()]


Out[109]:
action_taken_other action_taken_primary action_taken_secondary address alarm_dttm area_of_fire_origin arrival_dttm automatic_extinguishing_system_present automatic_extinguishing_sytem_perfomance city ... number_of_sprinkler_heads_operating primary_situation property_use structure_status structure_type suppression_personnel suppression_units zipcode long lat
515 NaN 30 emergency medical services, other NaN 908 Connecticut Street 2016-07-01 22:34:15 NaN 2016-07-01 22:39:20 NaN NaN San Francisco ... NaN 300 rescue, ems incident, other 960 street, other NaN NaN 10 3 NaN NaN NaN
8 - 86 - investigate - 1001 Larkin St. 2003-01-10 23:49:00 NaN 2003-01-10 23:52:13 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 960 - street, other NaN NaN 9 2 NaN -122.418271 37.787037
15 - 86 - investigate - 86 Rossi Av. 2003-01-10 21:14:33 NaN 2003-01-10 21:19:05 NaN NaN San Francisco ... NaN 400 - hazardous condition, other 419 - 1 or 2 family dwelling NaN NaN 4 1 NaN NaN NaN
18 - 10 - fire, other - 6239 Geary Bl. 2003-01-10 20:09:19 NaN 2003-01-10 20:12:07 NaN NaN San Francisco ... NaN 113 - cooking fire, confined to container 429 - multifamily dwellings NaN NaN 20 5 NaN NaN NaN
29 - 86 - investigate - Andover St. / Tompkins Av. 2003-01-10 16:42:53 NaN 2003-01-10 16:46:25 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
32 - 86 - investigate - Paris St. / Russia Av. 2003-01-10 16:05:35 NaN 2003-01-10 16:09:34 NaN NaN San Francisco ... NaN 700 - false alarm or false call, other 960 - street, other NaN NaN 4 1 NaN NaN NaN
34 - 86 - investigate - Blanken Av. / Wheeler Av. 2003-01-10 15:59:28 NaN 2003-01-10 16:02:29 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 963 - street or road in commercial area NaN NaN 9 2 NaN NaN NaN
35 - 86 - investigate - 1700 Steiner St. 2003-01-10 15:44:31 NaN 2003-01-10 15:47:07 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 1 1 NaN -122.434730 37.784943
40 - 86 - investigate - 840 Van Ness Av. 2003-01-10 15:08:39 NaN 2003-01-10 15:11:09 NaN NaN San Francisco ... NaN 730 - system malfunction, other 429 - multifamily dwellings NaN NaN 11 3 NaN NaN NaN
42 - 86 - investigate - Eucalyptus Dr. / Melba Av. 2003-01-10 14:46:03 NaN 2003-01-10 14:49:23 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 9 2 NaN NaN NaN
46 - 86 - investigate - Bright St. / Sargent St. 2003-01-10 13:56:24 NaN 2003-01-10 13:59:24 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 9 2 NaN NaN NaN
47 - 71 - assist physically disabled - 2255 41st Av. 2003-01-10 13:37:53 NaN 2003-01-10 13:42:47 NaN NaN San Francisco ... NaN 554 - assist invalid 419 - 1 or 2 family dwelling NaN NaN 4 1 NaN -122.499542 37.744710
62 - 86 - investigate - 46 Idora Av. 2003-01-10 08:37:14 NaN 2003-01-10 08:43:06 NaN NaN San Francisco ... NaN 733 - smoke detector activation/malfunction 419 - 1 or 2 family dwelling NaN NaN 11 3 NaN NaN NaN
73 - 86 - investigate - Schwerin St. / Visitacion Av. 2003-01-10 03:15:30 NaN 2003-01-10 03:19:30 NaN NaN San Francisco ... NaN 700 - false alarm or false call, other 215 - middle/junior or high school NaN NaN 30 9 NaN NaN NaN
75 - 86 - investigate - Schwerin St. / Visitacion Av. 2003-01-10 03:17:04 NaN 2003-01-10 03:23:15 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
80 - 62 - restore fire protection system - 2603 33rd Av. 2003-01-10 00:33:28 NaN 2003-01-10 00:37:43 NaN NaN San Francisco ... NaN 710 - malicious, mischievous false call, other 429 - multifamily dwellings NaN NaN 11 3 NaN -122.490506 37.738642
92 - 86 - investigate - 1100 Lake Merced Bl. 2003-01-09 22:10:48 NaN 2003-01-09 22:17:21 NaN NaN San Francisco ... NaN 463 - vehicle accident, general cleanup 962 - residential street, road or residential dr NaN NaN 4 1 NaN -122.485384 37.711318
94 - 86 - investigate - 50 Stanyan Bl. 2003-01-09 21:38:32 NaN 2003-01-09 21:42:29 NaN NaN San Francisco ... NaN 600 - good intent call, other 429 - multifamily dwellings NaN NaN 11 3 NaN NaN NaN
100 - 86 - investigate - 2659 45th Av. 2003-01-09 20:00:35 NaN 2003-01-09 20:05:54 NaN NaN San Francisco ... NaN 600 - good intent call, other 419 - 1 or 2 family dwelling NaN NaN 4 1 NaN -122.503302 37.736988
103 - 86 - investigate - Brewster St. / Joy St. 2003-01-09 19:14:57 NaN 2003-01-09 19:20:31 NaN NaN San Francisco ... NaN 651 - smoke scare, odor of smoke 962 - residential street, road or residential dr NaN NaN 23 6 NaN NaN NaN
106 - 32 - provide basic life support (bls) - 1 I280nb C Chavez Of 2003-01-09 17:40:20 NaN 2003-01-09 18:03:37 NaN NaN San Francisco ... NaN 311 - medical assist, assist ems crew 961 - highway or divided highway (street) NaN NaN 6 2 NaN -122.452869 37.742996
107 - 30 - emergency medical services, other - 2427 28th Av. 2003-01-09 17:16:04 NaN 2003-01-09 17:20:11 NaN NaN San Francisco ... NaN 311 - medical assist, assist ems crew 419 - 1 or 2 family dwelling NaN NaN 4 1 NaN -122.485382 37.742126
108 - 86 - investigate - 351 Buena Vista East Av. 2003-01-09 18:33:52 NaN 2003-01-09 18:37:04 NaN NaN San Francisco ... NaN 700 - false alarm or false call, other 429 - multifamily dwellings NaN NaN 10 3 NaN NaN NaN
113 - 86 - investigate - 935 Golden Gate Av. 2003-01-09 17:52:04 NaN 2003-01-09 17:54:29 NaN NaN San Francisco ... NaN 113 - cooking fire, confined to container 419 - 1 or 2 family dwelling NaN NaN 1 1 NaN NaN NaN
117 - 64 - shut down system - 1451 Thomas Av. 2003-01-09 16:31:50 NaN 2003-01-09 16:34:47 NaN NaN San Francisco ... NaN 522 - water or steam leak 419 - 1 or 2 family dwelling NaN NaN 5 1 NaN -122.388349 37.728774
119 - 86 - investigate - Blanken Av. / Wheeler Av. 2003-01-09 15:47:17 NaN 2003-01-09 15:50:35 NaN NaN San Francisco ... NaN 100 - fire, other 000 - property use, other NaN NaN 4 1 NaN NaN NaN
120 - 86 - investigate - 1101 Van Ness Av. 2003-01-09 15:26:31 NaN 2003-01-09 15:29:03 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 429 - multifamily dwellings NaN NaN 9 3 NaN -122.421393 37.785686
128 - 86 - investigate - Corbett Av. / Iron Al. 2003-01-09 14:35:04 NaN 2003-01-09 14:39:27 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 960 - street, other NaN NaN 4 1 NaN NaN NaN
149 - 86 - investigate - 5700 3rd St. 2003-01-09 10:31:16 NaN 2003-01-09 10:34:30 NaN NaN San Francisco ... NaN 740 - unintentional alarm, other 599 - business office NaN NaN 4 1 NaN NaN NaN
150 - 86 - investigate - Divisadero St. / Waller St. 2003-01-09 10:12:06 NaN 2003-01-09 10:15:07 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
809 - 86 - investigate - Evans Av. / Rankin St. 2003-01-02 11:53:14 NaN 2003-01-02 11:54:05 NaN NaN San Francisco ... NaN 531 - smoke or odor removal 891 - warehouse NaN NaN 15 9 NaN NaN NaN
826 - 00 - action taken, other - 1250 La Playa St. 2003-01-02 08:10:26 NaN 2003-01-02 08:16:42 NaN NaN San Francisco ... NaN 500 - service call, other 000 - property use, other NaN NaN 4 18 NaN -122.509273 37.763073
838 - 86 - investigate - 1260 Masonic Av. 2003-01-02 01:57:17 NaN 2003-01-02 02:02:02 NaN NaN San Francisco ... NaN 520 - water problem, other 429 - multifamily dwellings NaN NaN 4 1 NaN -122.445181 37.769642
839 - 86 - investigate - Broderick St. / Bush St. 2003-01-02 01:49:01 NaN 2003-01-02 01:52:21 NaN NaN San Francisco ... NaN 100 - fire, other 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
840 - 86 - investigate - 1436 8th Av. 2003-01-02 01:40:29 NaN 2003-01-02 01:42:20 NaN NaN San Francisco ... NaN 423 - refrigeration leak 419 - 1 or 2 family dwelling NaN NaN 11 3 NaN -122.465001 37.761542
845 - 86 - investigate - 2941 23rd Av. 2003-01-01 23:58:48 NaN 2003-01-02 00:02:02 NaN NaN San Francisco ... NaN 445 - arcing, shorted electrical equipment 419 - 1 or 2 family dwelling NaN NaN 11 3 NaN -122.479398 37.733732
846 - 10 - fire, other - 1540 Sunnydale Av. 2003-01-01 23:38:34 NaN 2003-01-01 23:47:46 NaN NaN San Francisco ... NaN 100 - fire, other 000 - property use, other NaN NaN 4 1 NaN -122.416332 37.712110
853 - 31 - provide first aid & check for injuries - 419 Ivy St. 2003-01-01 20:45:41 NaN 2003-01-01 20:47:58 NaN NaN San Francisco ... NaN 311 - medical assist, assist ems crew 429 - multifamily dwellings NaN NaN 6 2 NaN NaN NaN
855 - 86 - investigate - Cesar Chavez St. / Minnesota St. 2003-01-01 20:23:51 NaN 2003-01-01 20:26:17 NaN NaN San Francisco ... NaN 551 - assist pd or other govern. agency 960 - street, other NaN NaN 4 1 NaN NaN NaN
868 - 45 - remove hazard - 319 Lower Fort Mason St. 2003-01-01 19:19:24 NaN 2003-01-01 19:28:18 NaN NaN FM ... NaN 411 - gasoline or other flammable liquid spill 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
876 - 86 - investigate - Goettingen St. / Ward St. 2003-01-01 18:45:01 NaN 2003-01-01 18:47:54 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 960 - street, other NaN NaN 4 1 NaN NaN NaN
878 - 30 - emergency medical services, other - 1345 44th Av. 2003-01-01 18:35:39 NaN 2003-01-01 18:37:42 NaN NaN San Francisco ... NaN 311 - medical assist, assist ems crew 419 - 1 or 2 family dwelling NaN NaN 4 1 NaN -122.503944 37.761540
881 - 93 - cancelled enroute - 50 Laurel St. 2003-01-01 18:28:53 NaN 2003-01-01 18:33:27 NaN NaN San Francisco ... NaN 611 - dispatched & canceled en route - NaN NaN 11 3 NaN NaN NaN
883 - 86 - investigate - Brookdale Av. / Santos St. 2003-01-01 18:10:22 NaN 2003-01-01 18:15:03 NaN NaN San Francisco ... NaN 700 - false alarm or false call, other 960 - street, other NaN NaN 4 1 NaN NaN NaN
892 - 11 - extinguish - Bay View St. / Latona St. 2003-01-01 16:54:24 NaN 2003-01-01 16:56:37 NaN NaN San Francisco ... NaN 151 - outside rubbish, trash or waste fire 963 - street or road in commercial area NaN NaN 4 1 NaN NaN NaN
893 - 86 - investigate - 2169 26th Av. 2003-01-01 16:52:51 NaN 2003-01-01 16:55:53 NaN NaN San Francisco ... NaN 531 - smoke or odor removal 419 - 1 or 2 family dwelling NaN NaN 11 3 NaN -122.483586 37.747016
911 - 86 - investigate - 1208 Buchanan St. 2003-01-01 14:03:16 NaN 2003-01-01 14:06:39 NaN NaN San Francisco ... NaN 743 - smoke detector, no fire, accidental 429 - multifamily dwellings NaN NaN 11 3 NaN -122.428638 37.781051
913 - 86 - investigate - 2354 22nd Av. 2003-01-01 13:06:30 NaN 2003-01-01 13:08:48 NaN NaN San Francisco ... NaN 311 - medical assist, assist ems crew 419 - 1 or 2 family dwelling NaN NaN 7 2 NaN -122.478903 37.743751
915 - 86 - investigate - Hawes St. / Revere Av. 2003-01-01 13:49:53 NaN 2003-01-01 13:51:39 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 9 2 NaN NaN NaN
923 - 86 - investigate - 30 States St. 2003-01-01 10:32:23 NaN 2003-01-01 10:35:23 NaN NaN San Francisco ... NaN 600 - good intent call, other 429 - multifamily dwellings NaN NaN 11 3 NaN -122.435742 37.763501
926 - 71 - assist physically disabled - 132 Brookdale Av. 2003-01-01 09:26:39 NaN 2003-01-01 09:32:28 NaN NaN San Francisco ... NaN 500 - service call, other 429 - multifamily dwellings NaN NaN 5 1 NaN -122.421501 37.712122
927 - 86 - investigate - Keith St. / Newcomb Av. 2003-01-01 08:49:30 NaN 2003-01-01 08:52:01 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 9 2 NaN NaN NaN
931 - 86 - investigate - 1251 Eddy St. 2003-01-01 07:36:08 NaN 2003-01-01 07:39:19 NaN NaN San Francisco ... NaN 113 - cooking fire, confined to container 429 - multifamily dwellings NaN NaN 50 10 NaN -122.428047 37.781875
937 - 62 - restore fire protection system - Geneva Av. / Santos St. 2003-01-01 04:47:20 NaN 2003-01-01 04:51:34 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 960 - street, other NaN NaN 9 2 NaN NaN NaN
938 - 86 - investigate - 18th St. / Rhode Island St. 2003-01-01 04:45:41 NaN 2003-01-01 04:48:37 NaN NaN San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
942 - 22 - rescue, remove from harm - 573 26th Av. 2003-01-01 03:36:46 NaN 2003-01-01 03:40:54 NaN NaN San Francisco ... NaN 510 - person in distress, other 000 - property use, other NaN NaN 4 1 NaN NaN NaN
949 - 86 - investigate - Buchanan St. / Moulton St. 2003-01-01 03:10:40 80 - vehicle area, other 2003-01-01 03:12:12 NaN NaN San Francisco ... NaN 131 - passenger vehicle fire 962 - residential street, road or residential dr NaN NaN 4 1 NaN NaN NaN
958 - 33 - provide advanced life support (als) - Coleridge St. / Eugenia Av. 2003-01-01 01:30:02 NaN 2003-01-01 01:37:10 NaN NaN San Francisco ... NaN 322 - vehicle accident with injuries 962 - residential street, road or residential dr NaN NaN 6 2 NaN NaN NaN
971 - 10 - fire, other - 2 Parker Av. 2003-01-01 00:43:53 - 2003-01-01 00:47:00 - - San Francisco ... NaN 463 - vehicle accident, general cleanup 960 - street, other - - 4 1 NaN -122.454774 37.786244
985 - 86 - investigate - 33rd Av. / Noriega St. 2003-01-01 00:18:19 - 2003-01-01 00:21:15 - - San Francisco ... NaN 711 - municipal alarm system, street box false 962 - residential street, road or residential dr - - 4 1 NaN NaN NaN

214 rows × 45 columns


In [113]:
# TODO NEXT
# it's clear from the above that the steps being used for cleaning are working fine.
# What's next is we need to refactor the above into a cleaning script
# and move that to src
# we also should figure out a more lightweight method of getting
# the credentials from any .env file, perhaps a simple .json is fine here
# the biggest problem is how to make this work on someone else's machine.
# Do we instruct one to download the dstk image?
# Is there a container that's lightweight that they can install?
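
One possible shape for the "simple .json" idea in the TODO above, as a sketch only. The file name credentials.json, the key geonames_username, and the function load_credentials are all hypothetical choices, not something the notebook defines.

```python
import json


def load_credentials(path="credentials.json", required=("geonames_username",)):
    """Read API credentials from a small JSON file.

    Raises KeyError if any required key is absent or empty, so a missing
    username fails here rather than partway through the geocoding loop.
    """
    with open(path) as fh:
        creds = json.load(fh)
    missing = [key for key in required if not creds.get(key)]
    if missing:
        raise KeyError("missing credential(s): " + ", ".join(missing))
    return creds
```

The file would hold something like {"geonames_username": "your_account"} and should be kept out of version control, the same way a .env file would be.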

In [ ]: