Traffic Incident Reports in San Francisco

Visualize traffic incident reports in San Francisco.

Data sources:



In [1]:

    
from cartoframes.auth import set_default_credentials, Credentials
from cartoframes.viz import Map, Layer, Source
import pandas as pd
import geopandas as gpd

If you have a CARTO account, you can set your credentials in the following cell. This allows you to upload the dataset and share the final visualization through your account.



In [2]:

    
set_default_credentials('creds.json')

Load incident reports

Using pandas, we can read an external data source, which is converted to a dataframe. Let's see which columns we have:



In [3]:

    
incident_reports_df = pd.read_csv('http://data.sfgov.org/resource/wg3w-h783.csv')
incident_reports_df.head()









    Out[3]:







  
    
      
      incident_datetime
      incident_date
      incident_time
      incident_year
      incident_day_of_week
      report_datetime
      row_id
      incident_id
      incident_number
      cad_number
      ...
      :@computed_region_6qbp_sg9q
      :@computed_region_qgnn_b9vv
      :@computed_region_26cr_cadq
      :@computed_region_ajp5_b2md
      :@computed_region_nqbw_i6c3
      :@computed_region_2dwj_jsy4
      :@computed_region_h4ep_8xdi
      :@computed_region_y6ts_4iup
      :@computed_region_jg9y_a9du
      :@computed_region_6pnf_4xz7
    
  
  
    
      0
      2020-02-03T14:45:00.000
      2020-02-03T00:00:00.000
      14:45
      2020
      Monday
      2020-02-03T17:50:00.000
      89881675000
      898816
      200085557
      200342870.0
      ...
      41.0
      10.0
      8.0
      16.0
      NaN
      NaN
      NaN
      NaN
      NaN
      2.0
    
    
      1
      2020-02-03T03:45:00.000
      2020-02-03T00:00:00.000
      03:45
      2020
      Monday
      2020-02-03T03:45:00.000
      89860711012
      898607
      200083749
      200340316.0
      ...
      53.0
      3.0
      2.0
      20.0
      3.0
      NaN
      NaN
      NaN
      NaN
      2.0
    
    
      2
      2020-02-03T10:00:00.000
      2020-02-03T00:00:00.000
      10:00
      2020
      Monday
      2020-02-03T10:06:00.000
      89867264015
      898672
      200084060
      200340808.0
      ...
      19.0
      5.0
      3.0
      8.0
      NaN
      35.0
      NaN
      NaN
      NaN
      2.0
    
    
      3
      2020-01-19T17:12:00.000
      2020-01-19T00:00:00.000
      17:12
      2020
      Sunday
      2020-02-01T13:01:00.000
      89863571000
      898635
      206024187
      NaN
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      4
      2020-01-05T00:00:00.000
      2020-01-05T00:00:00.000
      00:00
      2020
      Sunday
      2020-02-03T16:09:00.000
      89877368020
      898773
      200085193
      200342341.0
      ...
      103.0
      4.0
      6.0
      30.0
      NaN
      NaN
      NaN
      NaN
      NaN
      1.0
    
  

5 rows × 36 columns



In [4]:

    
incident_reports_df.columns









    Out[4]:





Index(['incident_datetime', 'incident_date', 'incident_time', 'incident_year',
       'incident_day_of_week', 'report_datetime', 'row_id', 'incident_id',
       'incident_number', 'cad_number', 'report_type_code',
       'report_type_description', 'filed_online', 'incident_code',
       'incident_category', 'incident_subcategory', 'incident_description',
       'resolution', 'intersection', 'cnn', 'police_district',
       'analysis_neighborhood', 'supervisor_district', 'latitude', 'longitude',
       'point', ':@computed_region_6qbp_sg9q', ':@computed_region_qgnn_b9vv',
       ':@computed_region_26cr_cadq', ':@computed_region_ajp5_b2md',
       ':@computed_region_nqbw_i6c3', ':@computed_region_2dwj_jsy4',
       ':@computed_region_h4ep_8xdi', ':@computed_region_y6ts_4iup',
       ':@computed_region_jg9y_a9du', ':@computed_region_6pnf_4xz7'],
      dtype='object')

Some of the latitude and longitude values are NaN, in the next step we get rid of them. After that, we create a dataset from the dataframe and use it in a Layer to visualize the data:



In [5]:

    
incident_reports_df = incident_reports_df[incident_reports_df.longitude == incident_reports_df.longitude]
incident_reports_df = incident_reports_df[incident_reports_df.latitude == incident_reports_df.latitude]

incident_reports_gdf = gpd.GeoDataFrame(
    incident_reports_df,
    geometry=gpd.points_from_xy(incident_reports_df.longitude, incident_reports_df.latitude)
)

incident_reports_gdf.head()









    Out[5]:







  
    
      
      incident_datetime
      incident_date
      incident_time
      incident_year
      incident_day_of_week
      report_datetime
      row_id
      incident_id
      incident_number
      cad_number
      ...
      :@computed_region_qgnn_b9vv
      :@computed_region_26cr_cadq
      :@computed_region_ajp5_b2md
      :@computed_region_nqbw_i6c3
      :@computed_region_2dwj_jsy4
      :@computed_region_h4ep_8xdi
      :@computed_region_y6ts_4iup
      :@computed_region_jg9y_a9du
      :@computed_region_6pnf_4xz7
      geometry
    
  
  
    
      0
      2020-02-03T14:45:00.000
      2020-02-03T00:00:00.000
      14:45
      2020
      Monday
      2020-02-03T17:50:00.000
      89881675000
      898816
      200085557
      200342870.0
      ...
      10.0
      8.0
      16.0
      NaN
      NaN
      NaN
      NaN
      NaN
      2.0
      POINT (-122.47604 37.72695)
    
    
      1
      2020-02-03T03:45:00.000
      2020-02-03T00:00:00.000
      03:45
      2020
      Monday
      2020-02-03T03:45:00.000
      89860711012
      898607
      200083749
      200340316.0
      ...
      3.0
      2.0
      20.0
      3.0
      NaN
      NaN
      NaN
      NaN
      2.0
      POINT (-122.41517 37.75244)
    
    
      2
      2020-02-03T10:00:00.000
      2020-02-03T00:00:00.000
      10:00
      2020
      Monday
      2020-02-03T10:06:00.000
      89867264015
      898672
      200084060
      200340808.0
      ...
      5.0
      3.0
      8.0
      NaN
      35.0
      NaN
      NaN
      NaN
      2.0
      POINT (-122.40734 37.78456)
    
    
      4
      2020-01-05T00:00:00.000
      2020-01-05T00:00:00.000
      00:00
      2020
      Sunday
      2020-02-03T16:09:00.000
      89877368020
      898773
      200085193
      200342341.0
      ...
      4.0
      6.0
      30.0
      NaN
      NaN
      NaN
      NaN
      NaN
      1.0
      POINT (-122.44025 37.78711)
    
    
      5
      2020-02-03T08:36:00.000
      2020-02-03T00:00:00.000
      08:36
      2020
      Monday
      2020-02-03T08:36:00.000
      89876268020
      898762
      200083909
      200340826.0
      ...
      6.0
      3.0
      8.0
      NaN
      NaN
      NaN
      NaN
      NaN
      1.0
      POINT (-122.39951 37.79693)
    
  

5 rows × 37 columns



In [7]:

    
Layer(incident_reports_gdf)

Now, we are going to use a helper method to color by category, and the category is 'Day of Week' (incident_day_of_week)



In [20]:

    
from cartoframes.viz import Layer, color_category_style

Layer(incident_reports_gdf, color_category_style('incident_day_of_week', top=7), title='Day of Week')

As we can see in the legend, the days are sorted by frequency, which means that there're less incidents on Thursdays and More on Tuesdays. Since our purpose is not to visualize the frequency and we want to see the days properly sorted from Monday to Sunday in the legend, we can modify the helper and set the categories we want to visualize in the desired position:



In [21]:

    
from cartoframes.viz import color_category_style


Layer(
    incident_reports_gdf, 
    color_category_style(
        'incident_day_of_week',
        cat=[
            'Monday',
            'Tuesday',
            'Wednesday',
            'Thursday',
            'Friday',
            'Saturday',
            'Sunday'
        ]
    ),
    title='Day of Week'
)

Now, we want to look for traffic incidents, and then use these categories to visualize those incidents:



In [10]:

    
incident_reports_df.incident_category.unique()









    Out[10]:





array(['Missing Person', 'Stolen Property', 'Non-Criminal',
       'Miscellaneous Investigation',
       'Offences Against The Family And Children', 'Other Miscellaneous',
       'Lost Property', 'Larceny Theft', 'Warrant', 'Assault', 'Fraud',
       'Burglary', 'Traffic Violation Arrest', 'Weapons Carrying Etc',
       'Malicious Mischief', 'Motor Vehicle Theft', 'Drug Offense',
       'Other', 'Other Offenses', 'Recovered Vehicle', 'Robbery',
       'Suspicious Occ', 'Disorderly Conduct', 'Weapons Offense',
       'Vandalism', 'Embezzlement', 'Courtesy Report', 'Sex Offense',
       'Drug Violation', 'Traffic Collision', 'Prostitution',
       'Forgery And Counterfeiting', 'Case Closure', 'Homicide', 'Arson',
       'Suicide', 'Vehicle Impounded', 'Liquor Laws'], dtype=object)



In [22]:

    
from cartoframes.viz import Layer, size_category_style

Layer(
    incident_reports_gdf,
    size_category_style(
        'incident_category',
        cat=['Traffic Collision', 'Traffic Violation Arrest']
    ),
    title='Traffic Incidents'
)

In CARTO we have a dataset we can use for the next step, named 'sfcta_congestion_roads'. We are going to set the Credentials for this dataset. To have more control over this dataset, if you have a CARTO account you can import it to have everything together, and it won't be needed to create a different source for this Dataset.

Once we've the data source created, we're going to combine two helper methods. The first one uses the Source with the roads data from CARTO, and the second one the traffic incident reports.



In [23]:

    
from cartoframes.viz import Layer, color_continuous_style, size_category_style

sfcta_congestion_roads_source=Source(
    'sfcta_congestion_roads',
    Credentials(
        base_url='https://cartovl.carto.com',
        api_key='default_public'
    )
)

Map([
    Layer(
        sfcta_congestion_roads_source,
        color_continuous_style('auto_speed'),
        title='Recorded vehicle speeds'
    ),
    Layer(
        incident_reports_gdf,
        size_category_style(
            'incident_category',
            cat=['Traffic Collision', 'Traffic Violation Arrest']
        ),
        title='Traffic Incidents'
    )
])

We are going to add information about traffic signals, by getting data from a different source:



In [30]:

    
traffic_signals_df = pd.read_csv('http://data.sfgov.org/resource/c8ue-f4py.csv')
traffic_signals_df.head()









    Out[30]:







  
    
      
      objectid
      cnn
      code
      cnn_1
      street1
      street2
      street3
      street4
      detection
      sup_dist
      ...
      percent_po
      point
      point_address
      point_city
      point_state
      point_zip
      :@computed_region_6qbp_sg9q
      :@computed_region_qgnn_b9vv
      :@computed_region_26cr_cadq
      :@computed_region_ajp5_b2md
    
  
  
    
      0
      2400
      26392000
      NaN
      26392000
      MASONIC
      TURK
      NaN
      NaN
      NaN
      1,2,5
      ...
      NaN
      POINT (-122.44702658724947 37.778617364507106)
      NaN
      NaN
      NaN
      NaN
      97.0
      7.0
      11.0
      18.0
    
    
      1
      2737
      20421000
      NaN
      20421000
      GENEVA
      SANTOS
      NaN
      NaN
      NaN
      10
      ...
      NaN
      POINT (-122.42006985840429 37.70830632733309)
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      9.0
      NaN
    
    
      2
      2981
      24018000
      NaN
      24018000
      16TH ST
      UTAH
      NaN
      NaN
      NaN
      10
      ...
      NaN
      POINT (-122.40653961377903 37.76583732660274)
      NaN
      NaN
      NaN
      NaN
      54.0
      3.0
      9.0
      20.0
    
    
      3
      2934
      9941000
      NaN
      9941000
      OLYMPIA
      MIDBLOCK AT
      DELLBROOK (N)
      DELLBROOK (S)
      NaN
      7
      ...
      NaN
      POINT (-122.45364260943721 37.75120299724448)
      NaN
      NaN
      NaN
      NaN
      48.0
      7.0
      8.0
      38.0
    
    
      4
      2922
      26932000
      NaN
      26932000
      BEAUMONT
      GEARY
      NaN
      NaN
      NaN
      1,2
      ...
      NaN
      POINT (-122.4552675286903 37.781453718876065)
      NaN
      NaN
      NaN
      NaN
      12.0
      8.0
      4.0
      18.0
    
  

5 rows × 44 columns



In [31]:

    
traffic_signals_df.columns









    Out[31]:





Index(['objectid', 'cnn', 'code', 'cnn_1', 'street1', 'street2', 'street3',
       'street4', 'detection', 'sup_dist', 'veh_actuat', 'aps', 'ped_signal',
       'ped_actuat', 'tbc', 'preempt_pr', 'd_ate2070', 'project_ne',
       'project_ol', 'upgraded', 'yr_of_cont', 'last_upgra', 'new_signal',
       'mod_projec', 'full_upgra', 'beacon_fla', 'funding', 'rlcam',
       'startyear', 'caltrans_r', 'caltrans', 'percent_c', 'sf', 'percent_sf',
       'percent_po', 'point', 'point_address', 'point_city', 'point_state',
       'point_zip', ':@computed_region_6qbp_sg9q',
       ':@computed_region_qgnn_b9vv', ':@computed_region_26cr_cadq',
       ':@computed_region_ajp5_b2md'],
      dtype='object')



In [32]:

    
traffic_signals_df.code.unique()









    Out[32]:





array([nan])

Since there is no latitude and longitude columns, we can use the point column to create a GeoDataFrame.



In [33]:

    
from shapely import wkt

traffic_signals_df['point'] = traffic_signals_df['point'].apply(wkt.loads)
traffic_signals_df = traffic_signals_df.rename(columns={'point': 'geometry'}).set_geometry('geometry')
trafic_signals_gdf = gpd.GeoDataFrame(traffic_signals_df, geometry='geometry')



In [34]:

    
Map(Layer(trafic_signals_gdf))



In [36]:

    
from cartoframes.viz import Layer, color_continuous_style, size_category_style, basic_style

Map([
    Layer(
        sfcta_congestion_roads_source,
        color_continuous_style('auto_speed'),
        title='Recorded vehicle speeds'
    ),
    Layer(
        incident_reports_gdf,
        size_category_style(
            'incident_category',
            cat=['Traffic Collision', 'Traffic Violation Arrest']
        ),
        title='Traffic Incidents'
    ),
    Layer(
        trafic_signals_gdf,
        basic_style(color='blue', size=1),
        title='Traffic Signals'
    )
],
layer_selector=True)

	incident_datetime	incident_date	incident_time	incident_year	incident_day_of_week	report_datetime	row_id	incident_id	incident_number	cad_number	...	:@computed_region_6qbp_sg9q	:@computed_region_qgnn_b9vv	:@computed_region_26cr_cadq	:@computed_region_ajp5_b2md	:@computed_region_nqbw_i6c3	:@computed_region_2dwj_jsy4	:@computed_region_h4ep_8xdi	:@computed_region_y6ts_4iup	:@computed_region_jg9y_a9du	:@computed_region_6pnf_4xz7
0	2020-02-03T14:45:00.000	2020-02-03T00:00:00.000	14:45	2020	Monday	2020-02-03T17:50:00.000	89881675000	898816	200085557	200342870.0	...	41.0	10.0	8.0	16.0	NaN	NaN	NaN	NaN	NaN	2.0
1	2020-02-03T03:45:00.000	2020-02-03T00:00:00.000	03:45	2020	Monday	2020-02-03T03:45:00.000	89860711012	898607	200083749	200340316.0	...	53.0	3.0	2.0	20.0	3.0	NaN	NaN	NaN	NaN	2.0
2	2020-02-03T10:00:00.000	2020-02-03T00:00:00.000	10:00	2020	Monday	2020-02-03T10:06:00.000	89867264015	898672	200084060	200340808.0	...	19.0	5.0	3.0	8.0	NaN	35.0	NaN	NaN	NaN	2.0
3	2020-01-19T17:12:00.000	2020-01-19T00:00:00.000	17:12	2020	Sunday	2020-02-01T13:01:00.000	89863571000	898635	206024187	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	2020-01-05T00:00:00.000	2020-01-05T00:00:00.000	00:00	2020	Sunday	2020-02-03T16:09:00.000	89877368020	898773	200085193	200342341.0	...	103.0	4.0	6.0	30.0	NaN	NaN	NaN	NaN	NaN	1.0

	objectid	cnn	code	cnn_1	street1	street2	street3	street4	detection	sup_dist	...	percent_po	point	point_address	point_city	point_state	point_zip	:@computed_region_6qbp_sg9q	:@computed_region_qgnn_b9vv	:@computed_region_26cr_cadq	:@computed_region_ajp5_b2md
0	2400	26392000	NaN	26392000	MASONIC	TURK	NaN	NaN	NaN	1,2,5	...	NaN	POINT (-122.44702658724947 37.778617364507106)	NaN	NaN	NaN	NaN	97.0	7.0	11.0	18.0
1	2737	20421000	NaN	20421000	GENEVA	SANTOS	NaN	NaN	NaN	10	...	NaN	POINT (-122.42006985840429 37.70830632733309)	NaN	NaN	NaN	NaN	NaN	NaN	9.0	NaN
2	2981	24018000	NaN	24018000	16TH ST	UTAH	NaN	NaN	NaN	10	...	NaN	POINT (-122.40653961377903 37.76583732660274)	NaN	NaN	NaN	NaN	54.0	3.0	9.0	20.0
3	2934	9941000	NaN	9941000	OLYMPIA	MIDBLOCK AT	DELLBROOK (N)	DELLBROOK (S)	NaN	7	...	NaN	POINT (-122.45364260943721 37.75120299724448)	NaN	NaN	NaN	NaN	48.0	7.0	8.0	38.0
4	2922	26932000	NaN	26932000	BEAUMONT	GEARY	NaN	NaN	NaN	1,2	...	NaN	POINT (-122.4552675286903 37.781453718876065)	NaN	NaN	NaN	NaN	12.0	8.0	4.0	18.0