In [1]:

    
import collections
import functools
from imposm.parser import OSMParser
import itertools
from IPython.display import HTML
import json
from matplotlib import collections as mc
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cmx
from numpy import nan
import numpy as np
import pandas as pd
import pyproj
import requests
import scipy as sp
import rtree
import seaborn as sb
from scipy import signal
# import shapely
import shapely.geometry
%pylab inline

import data_munging









    



Populating the interactive namespace from numpy and matplotlib






    



WARNING: pylab import has clobbered these variables: ['colors']
`%matplotlib` prevents importing * from pylab and numpy

Measurement of Street Quality by Citizens

Zane Blanton

Project Pavement

Pain Points

Point of Indifference

Pleasure Points

Why should citizens measure infrastructure themselves?

But Who Exactly Are We?

"a group of thousands of designers, academic researchers, data journalists, activists, policy wonks, web developers and curious citizens who want to make our city more just, equitable, transparent and delightful to live in through data, design and technology."



In [14]:

    
HTML('<iframe src="http://chihacknight.org/" width=800 height=400></iframe>')









    Out[14]:



In [17]:

    
HTML('<iframe src="http://project-pavement.herokuapp.com/" width=800 height=400></iframe>')









    Out[17]:

Empower the People!

Open data for individual cyclists and planner -- Show road quality by data journalism routing corridor planning road resurfacing -- anyone can take our code and measure pavement quality in their cities -- can extend our apps to gather more data



In [55]:

    
rides, readings = data_munging.read_raw_data()
raw_reading = readings.loc[0, :]
# This is here temporarily both to create a static dataset and because
# some of the newer android readings are bogus and need to be removed.
rides = rides.loc[rides['created_at'] < '2016-03-01', :]
readings = readings.loc[readings['created_at'] < '2016-03-01', :]
n, p = readings.shape

Data Collection

Servers

Data Description

Every second we collect lat/lon and 100 readings from x, y, and z axes



In [11]:

    
raw_reading









    Out[11]:





id                                                             5611
created_at                               2015-10-22 23:12:41.285151
updated_at                               2015-10-27 21:13:49.066056
start_lat                                                   41.8968
start_lon                                                  -87.6342
end_lat                                                     41.8967
end_lon                                                    -87.6342
acceleration_x    ---\n- 0.0809326171875\n- 0.171295166015625\n-...
acceleration_y    ---\n- -0.6805877685546875\n- -0.6985168457031...
acceleration_z    ---\n- -0.207244873046875\n- -0.50392150878906...
angle_x                                                     1.59645
angle_y                                                     2.48285
angle_z                                                     2.16167
ride_id                                                          41
start_time                                              1.44556e+09
end_time                                                1.44556e+09
mean_g                                                     0.346415
Name: 0, dtype: object



In [57]:

    
# chi_readings = data_munging.filter_readings_to_chicago(readings)
# chi_rides = list(set(chi_readings.ride_id))



In [41]:

    
readings = readings.loc[np.logical_and(readings.gps_mph > 5, readings.gps_mph < 30), :]



In [182]:

    
# Visualization of Accelerometer Time Series
for i in random_indices:
    colors.append(np.random.random(3))
    sb.tsplot(readings['num_accel_' + axis][i][0:100], alpha=0.8, color=colors[-1])
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)
plt.xlabel('Accelerometer Values')
plt.ylabel('Force (Gravities)')
plt.title('Random sample of ' + str(sample_size) + ' ' + axis + ' Accelerometer Time Series')
plt.savefig('../dat/plots_for_presentation/z_accel.png')



In [56]:

    
readings = data_munging.clean_readings(readings)
readings = data_munging.add_proj_to_readings(readings, data_munging.NAD83)



In [177]:

    
axis = 'z'
sample_size = 10
random_indices = np.random.choice(readings.index, sample_size)
colors = []

Mapped Data

http://project-pavement.herokuapp.com/map/chi

What to do with Fourier Series?

Could filter out the high frequency components as noise?
Could try to back out pavement quality by modeling the bicycle/cyclist system as an FIR?
Could determine the most subjectively unpleasant frequencies and model those?

I couldn't find anything practical to do with these.

Data Notes

Data is sampled at 100Hz
Z access is measured with respect to the phone and

Justifying Roughness Measure

Find all pairs of readings that are close
Check candidate "roughness" measures for consistency



In [42]:

    
readings_idx = data_munging.insert_readings_rtree(readings)

Finding Geographically Close Readings



In [ ]:

    
total_intersections = []



In [175]:

    
for sample_in in range(100):
    random_idx, random_point = data_munging.select_random_point(readings)
    random_bb = data_munging.point_to_bb(*random_point, side_length=0.2)
    intersecting_segments = list(readings_idx.intersection(random_bb))
    intersecting_segments = [i for i in intersecting_segments if data_munging.calc_reading_diffs(readings.ix[i, :], readings.ix[random_idx, :]) < 0.4]
    if len(intersecting_segments) > 1:
        for i in intersecting_segments:
            plt.plot([readings.ix[i, 'start_x'], readings.ix[i, 'end_x']], [readings.ix[i, 'start_y'], readings.ix[i, 'end_y']], alpha=0.5)
        fig = plt.gcf()
        fig.set_size_inches(18.5, 10.5)
        plt.title('Intersections of ' + str(intersecting_segments))
        plt.show()



In [44]:

    
# Use a threshold for 0.4

def calc_reading_diffs(reading0, reading1):
    start0 = reading0[['start_x', 'start_y']].values
    start1 = reading1[['start_x', 'start_y']].values
    end0 = reading0[['end_x', 'end_y']].values
    end1 = reading1[['end_x', 'end_y']].values
    diff0 = np.linalg.norm(start0 - start1) + np.linalg.norm(end0 - end1)
    diff1 = np.linalg.norm(start0 - end1) + np.linalg.norm(end0 - start1)
    diff = min(diff0, diff1)
    dist0 = np.linalg.norm(start0 - end0)
    dist1 = np.linalg.norm(start1 - end1)
    if dist0 == 0 or dist1 == 0:
        return np.inf
    return diff / (dist0 + dist1)

Candidate Roughness Metrics

Mean Absolute Acceleration

$$ \bar{a} = \frac{a_1 + ... + a_{100}}{100} \space \text{where} \space a_i = (x_i^2 + y_i^2 + z_i^2)^{\frac{1}{2}}$$



In [46]:

    
pairwise_comps = pd.DataFrame({'id0': ids0, 'id1': ids1})
for i in ('0', '1'):
    for var in ('std_z', 'gps_speed', 'abs_mean_total'):
        pairwise_comps[var + i] = readings[var][pairwise_comps['id' + i]].values
for i in ('0', '1'):
    pairwise_comps['abs_mean_over_speed' + i] = pairwise_comps['abs_mean_total' + i] /  pairwise_comps['gps_speed' + i]



In [190]:

    
pairwise_comps.plot(x='abs_mean_total0', y='abs_mean_total1', kind='scatter', alpha=0.5, title='Comparing Mean Absolute Acceleration')
fig = plt.gcf()
fig.set_size_inches(12.5, 12.5)
plt.show()

plt.savefig('../dat/plots_for_presentation/mean_abs_accel.png')

Mean Absolute Acceleration Over Speed

$$ \frac{\bar{a}}{v} \text{ where } v \space \text{is distance in meters over one second}$$



In [191]:

    
pairwise_comps.plot(x='abs_mean_over_speed0', y='abs_mean_over_speed1', kind='scatter', alpha=0.5, title='Comparing Absolute Acceleration Over Speed')
fig = plt.gcf()
fig.set_size_inches(12.5, 12.5)
plt.show()

fig.savefig('../dat/plots_for_presentation/mean_abs_accel_over_speed.png')

Final Comments on Roughness Measure

Ideally would like to validate based on user experience. Options include

User feedback on app (during ride and after the fact)
External rating of road bumpiness (tagging sections manually)



In [73]:

    
clean_chi_readings = pd.read_csv(data_munging.data_dir + 'clean_chi_readings.csv')

Snapping Data to Road Segments

Used Open Street Map (OSM) data along with the Open Street Routing Machine (OSRM) to snap our GPS trails.

Sometimes it worked well

Sometimes it did not



In [66]:

    
ride_ids = set(clean_chi_readings.ride_id)
print(ride_ids)









    



set([128.0, 129.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 13.0, 14.0, 15.0, 16.0, 18.0, 131.0, 174.0, 133.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 171.0, 44.0, 45.0, 46.0, 172.0, 130.0, 173.0, 169.0, 165.0, 115.0, 119.0, 120.0, 121.0, 122.0, 123.0, 124.0, 125.0, 126.0, 127.0])



In [60]:

    
ride_id = 35



In [68]:

    
for ride_id in ride_ids:
    try:
        ax = clean_chi_readings.loc[readings['ride_id'] == ride_id, :].plot(x='start_lon', y='start_lat', title='Snapping GPS Trails Using OSRM')
        clean_chi_readings.loc[readings['ride_id'] == ride_id, :].plot(x='snapped_lon', y='snapped_lat', ax=ax)
        fig = plt.gcf()
        fig.set_size_inches(18.5, 10.5)
        plt.show()
    except:
        'oh well'









    












    












    












    












    












    












    












    












    












    












    












    












    












    












    












    



/opt/conda/lib/python2.7/site-packages/matplotlib/axes/_base.py:2787: UserWarning: Attempting to set identical left==right results
in singular transformations; automatically expanding.
left=-87.6320669575, right=-87.6320669575
  'left=%s, right=%s') % (left, right))



In [115]:

    
road_bumpiness = collections.defaultdict(list)
for index, reading in clean_chi_readings.iterrows():
    if reading['gps_mph'] < 30 and reading['gps_mph'] > 5:
        osm_segment = [(reading['snapped_lat'], reading['snapped_lon']),
                      (reading['next_snapped_lat'], reading['next_snapped_lon'])]
        osm_segment = sorted(osm_segment)
        if all([(not pd.isnull(lat_lon[0])) and (not pd.isnull(lat_lon[0])) for lat_lon in osm_segment]):
            road_bumpiness[tuple(osm_segment)].append(reading['abs_mean_over_speed'])



In [117]:

    
total_road_readings = dict((osm_segment, len(road_bumpiness[osm_segment])) for osm_segment in road_bumpiness)



In [118]:

    
agg_road_bumpiness = dict((osm_segment, np.mean(road_bumpiness[osm_segment])) for osm_segment in road_bumpiness)



In [105]:

    
def find_seg_dist(lat_lon):
    return data_munging.calc_dist(lat_lon[0][1], lat_lon[0][0], lat_lon[1][1], lat_lon[1][0])



In [135]:

    
dist_of_road = dict((osm_segment, find_seg_dist(osm_segment)) for osm_segment in road_bumpiness if find_seg_dist(osm_segment) < 200)



In [144]:

    
thickness_of_road = dict()
for osm_segment in dist_of_road:
    thickness_of_road[osm_segment] = n(total_road_readings[osm_segment] / (dist_of_road[osm_segment] + 1)) ** 0.5



In [146]:

    
np.percentile(thickness_of_road.values(), np.arange(0, 100, 11))









    Out[146]:





array([ 0.07147919,  0.31805193,  0.33973884,  0.35418201,  0.36933322,
        0.38517204,  0.40458076,  0.42709482,  0.4744099 ,  0.79620523])



In [127]:

    
import matplotlib.colors as colors
plasma = cm = plt.get_cmap('plasma')
cNorm  = colors.Normalize(vmin=0, vmax=1.0)
scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=plasma)

Unfortunately, We Don't Have Much Coverage Yet

Readings Per OSM Segment



In [187]:

    
plt.hist(total_road_readings.values(), bins= 40)
fig = plt.gcf()
fig.set_size_inches(12.5, 10.5)

fig.savefig('../dat/plots_for_presentation/readings_per_osm_segment.png')



In [158]:

    
std_road_bumpiness = dict((osm_segment, np.std(road_bumpiness[osm_segment])) for osm_segment in road_bumpiness)

We Finally Have All The Ingredients to get an Overall Measure of Pavement Quality

Pavement Quality in Chicago



In [192]:

    
for osm_segment in dist_of_road:
    plt.plot([osm_segment[0][1], osm_segment[1][1]], [osm_segment[0][0], osm_segment[1][0]],
        color=scalarMap.to_rgba(agg_road_bumpiness[osm_segment]), linewidth=2*thickness_of_road[osm_segment] + 4)
fig = plt.gcf()
fig.set_size_inches(12, 16)
plt.savefig('../dat/plots_for_presentation/final_agg_readings_per_osm_segment.png')

Future Directions

Show pavement quality in Leaflet on web
Serve up routing in OSRM based on pavement quality
Interface with policy community

Bring Ride Report to Chicago!

Same idea as Pavement, but with subjective experience and a budget.
\$5000 early adopter fee for a dashboard view of bicycle patterns in city

Thanks!!!!

Name	Role	Website
Michael Hassin	Rails and iOS	https://github.com/strangerloops
Joshua Shin	Android	https://github.com/jjshin85
Nate Hutcheson	Group Dad	https://www.linkedin.com/in/nate-hutcheson-2946444
Stephen Vance	Data and Bike Policy Guru, Photos	http://www.stevevance.net/

Name	Role	Website
Stephen Liu	UX and Project Management	http://www.iamstephenliu.com/
William Henderson	Backend (Ride Report)	https://twitter.com/quicklywilliam
Evan Heidtmann	Backend (Ride Report)	https://twitter.com/evanheidtmann



In [45]:

    
total_data_points = 0
total_tries = 0
examined_ids = set()
ids0 = list()
ids1 = list()
while total_data_points < 5000 and total_tries < 100000:
    total_tries += 1
    random_idx, random_point = data_munging.select_random_point(readings)
    random_bb = data_munging.point_to_bb(*random_point, side_length=0.2)
    intersecting_segments = list(readings_idx.intersection(random_bb))
    intersecting_segments = set([i for i in intersecting_segments if data_munging.calc_reading_diffs(readings.ix[i, :], readings.ix[random_idx, :]) < 0.4])
    intersecting_segments.difference_update(examined_ids)
    examined_ids.update(intersecting_segments)
    if intersecting_segments > 1:
        for idx_pair in itertools.combinations(intersecting_segments, 2):
            ids0.append(idx_pair[0])
            ids1.append(idx_pair[1])
            total_data_points += 1

Appendix

Icy Roads

Goal

Get a measure of pavement quality and an idea of this measure's accuracy all for road segments as defined in Open Street Map

Construction on Roads

Here's how I get to work

https://www.google.com/maps/dir/4015+N+Avers+Ave,+Chicago,+IL+60618,+USA/600+W+Chicago+Ave,+Chicago,+IL/@41.9253389,-87.7185152,13z/data=!3m1!4b1!4m14!4m13!1m5!1m1!1s0x880fcdb8e1bb9bb9:0x68f076f1528bf96a!2m2!1d-87.7236269!2d41.954289!1m5!1m1!1s0x880fd333f2e9f011:0x10660e26e7c76a5c!2m2!1d-87.6436198!2d41.8973006!3e1

suffices to look at OSM level since segments always take into account intersections. Negligible to consider differing pavement quality for different parts of a road segment for start and destination

Possible Obstacles

bike/person combination responds to pavement bumpiness differently (think of this in terms of modeling vibrations in a system)
People take different routes over pavement (swerving to avoid bumps)
Potholes are measured by large instantaneous force as opposed to the constant vibration of riding over gravel
Roads change over time
Weather conditions might change road

Obligatory Fourier Transform of Data

Possible Obstacles

Fortunately for me, we don't have enough data yet to be able to address any of these problems. We are currently ignoring the "pothole detection" problem and focusing on high average vibration.

Environment
Health
Happiness
Economic

Research Question

What is the best way to transform accelerometer and GPS data into a measure of pavement quality?

What observations to throw out?

How to Get Measure of Pavement Quality

Take Fourier transform of frequency data and

Normalizing Data

How do we aggregate pavement data across multiple riders and rides?

Future Directions

Integrate pavement data into a routing service via OSRM
Make data available to policy-makers, transportation planners, and cyclists
Associate subjective comfort data with physical readings



In [7]:

    
for color, i in zip(colors, indices):
    f, Pxx_den = signal.periodogram(readings['num_accel_' + axis][i][0:100])
    plt.plot(f, Pxx_den, color=color, alpha=0.8)
    plt.title('Power Spectrum for ' + axis + ' axis')
    plt.xlabel('frequency [Hz]')
    plt.ylabel('Power Spectrum Density')
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)
plt.show()



In [8]:

    
sample_size = 1000
indices = np.random.choice(n, sample_size)
for axis in ['z']:
    for i in indices:
        f, Pxx_den = signal.periodogram(readings['num_accel_' + axis][i][0:100])b
        plt.plot(f, Pxx_den, alpha=0.5)
        plt.title('Power Spectrum for ' + axis + ' axis')
        plt.xlabel('frequency [Hz]')
        plt.ylabel('Power Spectrum Density')
    fig = plt.gcf()
    fig.set_size_inches(18.5, 10.5)
    plt.show()









    



  File "<ipython-input-8-54bcfd4dc2e5>", line 5
    f, Pxx_den = signal.periodogram(readings['num_accel_' + axis][i][0:100])b
                                                                            ^
SyntaxError: invalid syntax

Justification for Cycling

What is Chi Hack Night Ethos: empower cyclists and provide open data and tools to improve the urban cycling experience Collective of civically-minded hackers who meet at the Merch Mart

Obstacles to Cycling Congested roads No bicycle lanes Icy bicycle lanes Potholes/divots in road/cracked pavement Road closings

Pain Points Western and Ashland versus Damen Open Grate Bridges

Personalized Bicycle Routing Mark up OSM with relevant information Collect Surface Quality Data (Project Pavement) Collect Subjective Experience (Ride-Report and Stress Map) Integrate all of these data sources with OSRM and Leaflet-Routing-Machine

Project Pavement

Ride-Report

OSRM

Policy Implications Show bike equity Talk about how usability score could determine bicycle deserts

Future Directions

Ride Report Evan and William

Michael Hassin- iOS and rails app Steven Vance - data guy and bicycle outreach expert Nate Hutcheson - project management Joshua Shin - Android App (give him a job!)

Appendix and Cutting Room Floor

Pain Points

https://www.google.com/maps/dir/4015+N+Avers+Ave,+Chicago,+IL+60618,+USA/41.9450308,-87.6547654/@41.9539148,-87.7068186,14z/am=t/data=!3m1!4b1!4m9!4m8!1m5!1m1!1s0x880fcdb8e1bb9bb9:0x68f076f1528bf96a!2m2!1d-87.7236269!2d41.954289!1m0!3e1

We want more people on bicycles

Congested Roads

Who am I?

Zane Blanton, a data scientist at Tempus (formerly at Allstate) and a bicycle commuter.

Unprotected (or absent!) bike lanes