In [46]:
"""
author: mikezawitkowski
I had a conversation with someone who is working with the 
LA Fire Department to figure out how important ambient 
temperature is to predicting the outbreak of fire.
I wanted to figure out if this was also important
in predicting fire for San Francisco.
We'll try a simple seaborn and pandas
correlation plot to see.
"""
from __future__ import division, print_function
import pandas as pd
%matplotlib inline
import seaborn as sns

In [2]:
query_url = 'https://data.sfgov.org/resource/wbb6-uh78.json?$order=close_dttm%20DESC&$offset={}&$limit={}'
offset = 0
limit = 100000
df = pd.read_json(query_url.format(offset, limit))
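
The query above leaves `offset` and `limit` as placeholders, which suggests paging through the endpoint. The following is a minimal sketch of that idea, not something the notebook actually ran. The function name `fetch_all`, the `page_size` and `max_rows` defaults, and the injectable `reader` argument are all assumptions added for illustration.

```python
import pandas as pd

QUERY_URL = ('https://data.sfgov.org/resource/wbb6-uh78.json'
             '?$order=close_dttm%20DESC&$offset={}&$limit={}')


def fetch_all(page_size=50000, max_rows=200000, reader=pd.read_json):
    """Page through the endpoint until a short page or max_rows is reached.

    `reader` is any callable that takes a URL and returns a DataFrame;
    it defaults to pd.read_json, as in the cell above. (Hypothetical helper.)
    """
    pages = []
    offset = 0
    while offset < max_rows:
        page = reader(QUERY_URL.format(offset, page_size))
        if len(page) == 0:
            break  # nothing left to fetch
        pages.append(page)
        if len(page) < page_size:
            break  # a short page means this was the last one
        offset += page_size
    if not pages:
        return pd.DataFrame()
    return pd.concat(pages, ignore_index=True)
```

Passing the reader in as an argument keeps the loop easy to try out against a stub before pointing it at the live endpoint.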

In [3]:
cols_to_drop = ["automatic_extinguishing_sytem_failure_reason",
                "automatic_extinguishing_sytem_type",
                "battalion",
                "box",
                "call_number",
                "detector_effectiveness",
                "detector_failure_reason",
                "ems_personnel",
                "ems_units",
                "exposure_number",
                "first_unit_on_scene",
                "ignition_factor_secondary",
                "mutual_aid",
                "no_flame_spead",
                "other_personnel",
                "other_units",
                "station_area",
                "supervisor_district"]
df = df.drop(cols_to_drop, axis=1)

In [5]:
for col in df.columns:
    if 'dttm' in col:
        df[col] = pd.to_datetime(df[col])

In [8]:
df.alarm_dttm.min()


Out[8]:
Timestamp('2013-02-10 19:19:30')

In [9]:
df.alarm_dttm.max()


Out[9]:
Timestamp('2016-07-14 21:09:18')

In [13]:
d = df.alarm_dttm.min()

In [17]:
import json

In [24]:
with open('../../config.json', 'r') as fh:
    weather_api_key = json.load(fh)['weatherunderground']

In [23]:



Out[23]:
u'945a4462cb26de37'

In [25]:
# weather_underground developer key limits you to 500 calls per day and 10 calls per minute
url = "http://api.wunderground.com/api/{}/history_{}/q/CA/San_Francisco.json"
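
As a rough sketch of how the rate limits in the comment above could be respected, the snippet below builds one history URL per day and spaces out the calls. The helper names `history_urls` and `fetch_throttled`, the `YYYYMMDD` date format, and the injectable `fetch` and `sleep` arguments are assumptions for illustration; the notebook itself does not show how the requests were made.

```python
import time
from datetime import date, timedelta

URL = "http://api.wunderground.com/api/{}/history_{}/q/CA/San_Francisco.json"


def history_urls(api_key, start, end):
    """Yield one history URL per calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        # Assumes the endpoint expects the date as YYYYMMDD.
        yield URL.format(api_key, day.strftime('%Y%m%d'))
        day += timedelta(days=1)


def fetch_throttled(urls, fetch, per_minute=10, per_day=500, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between calls to stay under
    per_minute, and stopping once per_day calls have been made."""
    pause = 60.0 / per_minute
    results = []
    for n, u in enumerate(urls):
        if n >= per_day:
            break  # daily quota used up; resume from here tomorrow
        if n > 0:
            sleep(pause)
        results.append(fetch(u))
    return results
```

With `requests`, `fetch` could be something like `lambda u: requests.get(u).json()`. The incident data above spans 2013-02-10 through 2016-07-14, which is 1,251 days, so at 500 calls per day a full backfill would take at least three days of quota.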

In [26]:
import requests

In [27]:
df.head()


Out[27]:
  | action_taken_other | action_taken_primary | action_taken_secondary | address | alarm_dttm | area_of_fire_origin | arrival_dttm | automatic_extinguishing_system_present | automatic_extinguishing_sytem_perfomance | city | ... | number_of_floors_with_minimum_damage | number_of_floors_with_significant_damage | number_of_sprinkler_heads_operating | primary_situation | property_use | structure_status | structure_type | suppression_personnel | suppression_units | zipcode
0 | NaN | 32 provide basic life support (bls) | NaN | 16th St/capp Street | 2016-07-14 21:09:18 | NaN | 2016-07-14 21:11:29 | NaN | NaN | San Francisco | ... | NaN | NaN | NaN | 322 motor vehicle accident with injuries | 962 residential street, road or residential dr... | NaN | NaN | 4 | 1 | 94110
1 | NaN | 86 investigate | NaN | 524 Central Avenue | 2016-07-14 20:27:21 | NaN | 2016-07-14 20:30:23 | NaN | NaN | San Francisco | ... | NaN | NaN | NaN | 700 false alarm or false call, other | 429 multifamily dwelling | NaN | NaN | 34 | 10 | 94117
2 | NaN | 63 restore fire alarm system | NaN | 1250 Sunnydale Avenue | 2016-07-14 18:56:12 | NaN | 2016-07-14 19:03:35 | NaN | NaN | San Francisco | ... | NaN | NaN | NaN | 745 alarm system activation, no fire - uninten... | 429 multifamily dwelling | NaN | NaN | 11 | 3 | 94134
3 | NaN | 60 systems and services, other | NaN | 36th Av/ulloa Street | 2016-07-14 17:09:22 | NaN | 2016-07-14 17:17:15 | NaN | NaN | San Francisco | ... | NaN | NaN | NaN | 500 service call, other | 900 outside or special property, other | NaN | NaN | 4 | 1 | 94116
4 | NaN | 86 investigate | NaN | 2301 Harrison Street | 2016-07-14 13:50:36 | NaN | 2016-07-14 13:54:46 | NaN | NaN | San Francisco | ... | NaN | NaN | NaN | 745 alarm system activation, no fire - uninten... | 419 1 or 2 family dwelling | NaN | NaN | 11 | 3 | 94110

5 rows × 44 columns


In [28]:
df.estimated_property_loss.value_counts(dropna=False)


Out[28]:
NaN         96335
 50           487
 1000         252
 1            246
 10           239
 500          233
 100          216
 5000         206
 10000        131
 2000         129
 5            127
 0            114
 25            84
 200           82
 20000         71
 20            60
 3000          56
 1500          53
 15000         53
 50000         51
 25000         48
 750           40
 2500          40
 100000        38
 300           30
 30000         25
 30            22
 75000         18
 250           18
 8000          17
            ...  
 6025           1
 5533           1
 2900000        1
 5429           1
 5400           1
 5299           1
 5100           1
 4999           1
 4710           1
 4100           1
 8600           1
 8700           1
 140000         1
 9129           1
 16200          1
 14993          1
 14642          1
 14609          1
 14000          1
 800000         1
 12400          1
 11900          1
 11594          1
 11000          1
 10500          1
 9841           1
 9750           1
 9500           1
 9425           1
 8328           1
Name: estimated_property_loss, dtype: int64

In [31]:
# of the 100,000 rows, 96,335 are null; compute the fraction from the data
df.estimated_property_loss.isnull().sum() / df.shape[0]


Out[31]:
0.96335

Check In

Switching back to ambient temperature, I found this resource for downloading data from SF-based weather stations going back to 2013: http://www.ncdc.noaa.gov/cgi-bin/cdo/cdoprod.pl

I'll add this to the external data folder. For future reference, here's the link to download the requested data from 2013 through 2016: http://www.ncdc.noaa.gov/orders/isd/CDO5991787088242.html

Still waiting for the data file to become available at the above link.


UPDATE 7/22: The data was made available and downloaded to the directory /data/external/noaa/

Next, let's look at the data and how it correlates with the number of fires.

We'll start this in a separate notebook, using the data file that was shared on 7/20.
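
As a preview of that comparison, here is a minimal sketch of one way to correlate daily incident counts with daily temperature. The layout of the NOAA file is not shown here, so the `weather` frame and its `date` and `temp_f` columns are hypothetical, as are the helper names. Note also that this counts every incident in the table, not only fires.

```python
import pandas as pd


def daily_incident_counts(incidents, dttm_col='alarm_dttm'):
    """Count incidents per calendar day, indexed by date."""
    days = pd.to_datetime(incidents[dttm_col]).dt.normalize()
    counts = days.value_counts().sort_index()
    counts.name = 'incident_count'
    counts.index.name = 'date'
    return counts


def temp_correlation(incidents, weather, date_col='date', temp_col='temp_f'):
    """Pearson correlation between daily incident count and temperature.

    `weather` is assumed to hold one row per day with a date column and a
    temperature column (hypothetical names). Days missing from either
    table are dropped by the inner join.
    """
    counts = daily_incident_counts(incidents).reset_index()
    weather = weather.copy()
    weather[date_col] = pd.to_datetime(weather[date_col]).dt.normalize()
    merged = counts.merge(weather, left_on='date', right_on=date_col,
                          how='inner')
    return merged['incident_count'].corr(merged[temp_col])
```

Because of the inner join, days with no recorded incidents drop out of the calculation instead of counting as zero, which is worth revisiting if the data is filtered down to a rarer incident type.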


In [ ]: