This notebook uses a new set of cleaned interim data provided by Satya, and looks for correlations with weather variables such as ambient temperature.
The main question we're trying to answer is whether there's a correlation between weather data and FD service calls.
First we'll look at some simple correlations, just to see if there's a relationship in general between temperature and the number of calls. The null hypothesis is that there is no relationship.
After exploring that, we'll look at other ways to examine these variables, such as rate of change:
The alternative hypothesis is that the number of FD service calls increases significantly after the temperature becomes an outlier (hot or cold) for that time of day and day of week, compared with the same time of day and day of week in prior periods.
The null hypothesis is that there is no significant correlation between temperature outliers and service calls.
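As a rough sketch of how that outlier test could be set up (assuming an hourly dataframe with a datetime index, a TEMP column, and an incident_count column like the one built later in this notebook; the z-score threshold is an arbitrary choice):
In [ ]:
def flag_temp_outliers(hourly_df, z_thresh=2.0):
    # z-score each temperature against its (day of week, hour of day) group
    grp = hourly_df.groupby([hourly_df.index.dayofweek, hourly_df.index.hour])['TEMP']
    z = (hourly_df['TEMP'] - grp.transform('mean')) / grp.transform('std')
    return z.abs() > z_thresh

# then compare mean call volume during outlier hours vs. typical hours, e.g.:
# outliers = flag_temp_outliers(hourly_df)
# hourly_df.groupby(outliers)['incident_count'].mean()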
In [338]:
from __future__ import division, print_function
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
In [21]:
# we'll use the fresh download of the fire call csv, and do a little exploration
cols_to_use = ['Incident.Date',
               'Call.Number',
               'Zipcode',
               'Alarm.DtTm',
               'Location',
               'Incident.Number',
               'Close.DtTm',
               'Action.Taken.Primary',
               'type',
               'Address',
               'Neighborhood..District']
fdf = pd.read_csv('../../data/interim/Clean_Fire_Incidents.csv', usecols=cols_to_use)
In [26]:
# rename the columns so that it's easier to use built-in dot notation functions
fdf = fdf.rename(columns={'Incident.Number': 'incident_number',
                          'Incident.Date': 'incident_date',
                          'Call.Number': 'call_number',
                          'Zipcode': 'zipcode',
                          'Alarm.DtTm': 'alarm_dttm',
                          'Location': 'location',
                          'Close.DtTm': 'close_dttm',
                          'Action.Taken.Primary': 'action_taken_primary',
                          'type': 'call_type',
                          'Address': 'address',
                          'Neighborhood..District': 'neighborhood_district'})
In [29]:
fdf.incident_date = pd.to_datetime(fdf.incident_date)
In [30]:
fdf.head()
Out[30]:
In [31]:
fdf.info()
In [23]:
# based on above we have some typing to do
for col in fdf.columns:
    if 'dttm' in col.lower():
        fdf[col] = pd.to_datetime(fdf[col])
In [33]:
fdf.loc[0]
Out[33]:
action_taken_primary
Those values could be split, parsed, and mapped to an index, making the field a simple int. Something to try.
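As a rough sketch of that idea (just one way to do the mapping, not something run here), pandas can factorize the strings into integer codes plus a lookup array:
In [ ]:
# map each distinct action_taken_primary value to an integer code;
# `labels` keeps the original strings so the codes can be decoded later
codes, labels = pd.factorize(fdf['action_taken_primary'])
fdf['action_code'] = codes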
In [159]:
# we'll do the same thing with the weather csv, do a little exploration on it too
tdf = pd.read_table('../../data/external/noaa/5991787088242dat.txt', delim_whitespace=True)
In [160]:
# TODO: columns 4 and 21 have mixed dtypes, so that must be fixed. It looks like TEMP uses **** instead of a NaN value
In [161]:
# we need to fix the YR--MODAHRMN column
# YR--MODAHRMN = YEAR-MONTH-DAY-HOUR-MINUTE IN GREENWICH MEAN TIME (GMT)
tdf['data_time'] = pd.to_datetime(tdf['YR--MODAHRMN'], format='%Y%m%d%H%M')
In [162]:
tdf['data_time'].min() # kind of a bummer that the entire dataset only goes back to 2013. Why did we do that??
# TODO: requery NOAA for records going back to 2000.
Out[162]:
In [163]:
# since column 21 is TEMP and that is really important to us, we'll fix that one first
mask = tdf['TEMP'] == '****'
tdf.loc[mask, 'TEMP'] = np.nan
In [164]:
tdf.TEMP = pd.to_numeric(tdf.TEMP)
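A one-step alternative (just a note, not what was run above) is to let pandas coerce the '****' sentinels to NaN directly:
In [ ]:
# any non-numeric value, including '****', becomes NaN
tdf['TEMP'] = pd.to_numeric(tdf['TEMP'], errors='coerce')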
In [165]:
tdf.info()
In [166]:
tdf.head()
Out[166]:
In [167]:
# to make this a little easier to work with let's grab the station ids and limit our data points to only 1 station,
# perhaps the one with the fewest null values?
tdf.groupby('USAF').count()['TEMP']
Out[167]:
In [168]:
# 724940 is at SF International Airport, which is a little too far south for my taste. Let's see if we can get closer
# oh, it occurs to me that if the temp is absent, we should drop it from our dataset.
tdf = tdf[tdf.TEMP.notnull()]
tdf = tdf[tdf.USAF != 724940]
In [169]:
# the second-most records come from 998011, which is near Fairfield. We should get rid of those too
tdf = tdf[tdf.USAF != 998011]
In [170]:
tdf.groupby('USAF').count()['TEMP'].sort_values(ascending=False)
Out[170]:
In [171]:
# 992640 is from 18 nautical miles off the coast of SF, so not useful, we'll nix that too
tdf = tdf[tdf.USAF != 992640]
In [172]:
# 994016 is in the water off of Fort Mason, so a little better; we'll hold onto that one
# 998479 is pier 1 near the ferry building
# This is disappointing because it means we don't have any readings from inland in the city:
# we only have 2 station locations that are close enough to SF to give a good idea of what's going on
tdf.groupby('USAF').count()['TEMP'].sort_values(ascending=False)
Out[172]:
In [173]:
tdf.iloc[-1]
Out[173]:
In [174]:
tdf.iloc[-1].head()
Out[174]:
In [175]:
tdf.iloc[0].head()
Out[175]:
In [176]:
# of all the station locations, let's just start with the ferry building and see if there are correlations there
tdf.groupby('USAF').min()['data_time'].sort_values(ascending=False)
Out[176]:
In [177]:
tdf.groupby('USAF').max()['data_time'].sort_values(ascending=False)
Out[177]:
In [178]:
# since the min and max values are the same for both weather stations, I'll take the ferry building for now.
tdf = tdf[tdf.USAF == 998479]
In [179]:
tdf.shape
Out[179]:
In [180]:
sns.distplot(tdf.TEMP);
In [181]:
tdf.describe()
Out[181]:
In [182]:
tdf.tail()
Out[182]:
In [183]:
fdf.columns
Out[183]:
In [291]:
# now let's try to map both dataframes together, and run groupby functions and correlations
# in the interest of time we'll just focus on the TEMP field and go from there
# We could also make a mapping function to map the temperature for each call to the closest datetime value
# (a sketch of that follows this cell); it appears the granularity of the weather data is hourly.
# one simple way of dealing with this is to create a new dataframe, df,
# whose index is the datetime by the hour,
# then do a groupby for the number of incidents in that hour.
# so each item will have datetime, temperature, incident count
df = fdf.set_index('alarm_dttm').groupby(pd.TimeGrouper('H')).count()['incident_number']
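The "closest datetime value" mapping mentioned in the comment above isn't what's done here, but as a rough sketch (assuming a pandas version with pd.merge_asof and that both datetime columns are sorted), the nearest hourly temperature could be attached to each individual call:
In [ ]:
# attach the nearest weather reading (within an hour) to each call
calls_with_temp = pd.merge_asof(
    fdf.sort_values('alarm_dttm'),
    tdf[['data_time', 'TEMP']].sort_values('data_time'),
    left_on='alarm_dttm', right_on='data_time',
    direction='nearest', tolerance=pd.Timedelta('1H'))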
In [292]:
df = df.reset_index()
In [293]:
df.head()
Out[293]:
In [294]:
df.min()
Out[294]:
In [295]:
# I realized that if I don't have temperature data for 2003, I need to drop those rows, so let's limit df
# to only those dates where we have a temp value
tdf.USAF.value_counts()
Out[295]:
In [296]:
tdf.iloc[-1]
Out[296]:
In [297]:
temp_df = tdf[['data_time','TEMP']]
temp_df.head()
Out[297]:
In [298]:
df = df.rename(columns={'alarm_dttm':'dttm'})
In [299]:
temp_df = temp_df.rename(columns={'data_time':'dttm'})
In [300]:
df = df.merge(temp_df)
In [301]:
df.head()
Out[301]:
In [302]:
df.info()
In [303]:
df = df.rename(columns={'incident_number':'incident_count'})
In [304]:
sns.distplot(df.incident_count)
Out[304]:
In [305]:
sns.distplot(df.TEMP)
Out[305]:
In [306]:
sns.jointplot(x="TEMP", y="incident_count", data=df);
In [307]:
df = df.set_index('dttm')
In [308]:
ndf = df.groupby(pd.TimeGrouper('D')).mean()
In [309]:
sns.jointplot(x="TEMP", y="incident_count", data=ndf);
In [310]:
with sns.axes_style("white"):
sns.jointplot(x="TEMP", y="incident_count", data=ndf, kind='hex', color='k');
In [311]:
# TODO: look only at when the incident count is high
# TODO: did it wrong! rebuild with 'D' separately and merge
df = fdf.set_index('alarm_dttm').groupby(pd.TimeGrouper('D')).count()['incident_number']
df = df.reset_index()
In [312]:
temp_df = tdf[['data_time','TEMP']]
temp_df = temp_df.set_index('data_time').groupby(pd.TimeGrouper('d')).mean()['TEMP']
temp_df.head()
Out[312]:
In [313]:
temp_df = temp_df.reset_index().rename(columns={'data_time':'dttm'})
In [314]:
temp_df.head()
Out[314]:
In [315]:
df.head()
Out[315]:
In [316]:
df = df.rename(columns={'incident_number':'incident_count'})
In [317]:
df = df.rename(columns={'alarm_dttm':'dttm'})
In [318]:
df = df.merge(temp_df)
In [319]:
df.head()
Out[319]:
In [320]:
sns.jointplot(x="TEMP", y="incident_count", data=df);
In [321]:
sns.jointplot(x="TEMP", y="incident_count", data=df, kind='reg');
In [322]:
with sns.axes_style("white"):
sns.jointplot(x="TEMP", y="incident_count", data=ndf, kind='hex', color='k');
In [323]:
# TODO: explore the extreme outliers, what's up with them?
In [324]:
sns.jointplot(x="TEMP", y="incident_count", data=ndf, kind='kde', color='k');
In [325]:
df.head()
Out[325]:
In [326]:
df.describe()
Out[326]:
In [327]:
df[:365].set_index('dttm').TEMP.plot()
Out[327]:
In [328]:
df[:365].set_index('dttm').incident_count.plot()
Out[328]:
In [330]:
df.shape
Out[330]:
In [331]:
df.incident_count.sum()
Out[331]:
In [332]:
df.set_index('dttm').plot(kind='scatter', y='incident_count', x='TEMP')
Out[332]:
In [335]:
df.set_index('dttm').plot(kind='hexbin', y='incident_count', x='TEMP')
Out[335]:
In [334]:
# nothing is jumping out here, but check the pearsonr
# TODO: rolling mean and rolling std
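A quick sketch of those two checks (assuming scipy is available and that the daily df above has no remaining NaNs in TEMP or incident_count):
In [ ]:
from scipy import stats

# Pearson correlation between daily mean temperature and daily call count
r, p = stats.pearsonr(df['TEMP'], df['incident_count'])
print(r, p)

# 7-day rolling mean and std of call counts to smooth out day-of-week effects
rolling = df.set_index('dttm')['incident_count'].rolling(window=7)
rolling.mean().plot()
rolling.std().plot();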
In aggregate, when looking at daily averages, and also when looking at hourly data, there is no strong evidence that would suggest we can reject the null hypothesis.
Next steps: try some other variables from NOAA to see if there are correlations there. Make a master dataframe with all data by the hour from fdf and merge it with all numerical data from NOAA.
Do a proper cleaning and filling of NA values with averages and zeros, instead of just omitting them from the dataset.
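A rough sketch of that master-dataframe step (the numeric-column selection and the fill strategy here are assumptions about how it might be done, not code that was run):
In [ ]:
# hourly incident counts from the fire data
hourly_calls = (fdf.set_index('alarm_dttm')
                   .groupby(pd.TimeGrouper('H'))['incident_number']
                   .count()
                   .rename('incident_count'))

# hourly means of every numeric NOAA column
hourly_weather = (tdf.set_index('data_time')
                     .select_dtypes(include=[np.number])
                     .groupby(pd.TimeGrouper('H'))
                     .mean())

master = hourly_weather.join(hourly_calls, how='inner')

# missing call counts mean no calls that hour; fill weather gaps with column means
master['incident_count'] = master['incident_count'].fillna(0)
master = master.fillna(master.mean())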
In [ ]: