SF Open Data Portal

Browse the portal for public datasets: https://data.sfgov.org/

Use Python to retrieve data from the API, just like we learned last week: https://data.sfgov.org/developers

Look at data on bike parking in SF: https://data.sfgov.org/Transportation/Bicycle-Parking-Public-/2e7e-i7me


In [1]:
import pandas as pd
import requests

In [2]:
# use the API endpoint for bike parking in SF
endpoint_url = 'https://data.sfgov.org/resource/dd7x-3h4a.json'

# send a GET request to the endpoint and fail loudly on an HTTP error
response = requests.get(endpoint_url)
response.raise_for_status()

# parse the JSON response: a list with one dict per bike-parking record
data = response.json()
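Socrata's developer documentation describes a default page size of 1,000 rows per request, so `data` may hold only part of the dataset. Below is a minimal sketch of paging through the endpoint with the SoQL `$limit` and `$offset` parameters. The helper names `page_url` and `fetch_all`, the page size, and ordering by the `:id` system field (to keep pages stable) are assumptions for illustration. The fetch step is passed in as a function so the paging logic can be tried without a network connection.

```python
from urllib.parse import urlencode

ENDPOINT_URL = 'https://data.sfgov.org/resource/dd7x-3h4a.json'

def page_url(offset, page_size=1000, endpoint_url=ENDPOINT_URL):
    """Build the query URL for one page (hypothetical helper).

    Assumes the endpoint accepts the SoQL paging parameters $limit and
    $offset; $order=:id is an assumed choice to keep page order stable.
    """
    params = {'$limit': page_size, '$offset': offset, '$order': ':id'}
    return endpoint_url + '?' + urlencode(params)

def fetch_all(fetch_page, page_size=1000):
    """Collect rows page by page until a short (or empty) page signals the end.

    fetch_page(offset, page_size) must return a list of records. With the
    requests library it could be, for example:
        lambda off, size: requests.get(page_url(off, size)).json()
    """
    rows = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size
```

Stopping on the first short page avoids needing the total row count in advance; the cost is one extra empty request when the row count is an exact multiple of the page size.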

In [3]:
data[0]


Out[3]:
{'address': '4634 03RD ST',
 'location': 'VeloBrews Cafe & Cycling Community Center',
 'month_installed': '12',
 'number_of_racks': '5',
 'number_of_spaces': '10',
 'object_id': '37',
 'placement': 'ROADWAY',
 'street': '03RD ST',
 'year_installed': '2014'}

In [4]:
# turn the json data into a dataframe
df = pd.DataFrame(data)
df.head()


Out[4]:
   address            geom                                               location                                   month_installed  number_of_racks  number_of_spaces  object_id  placement  street      year_installed
0  4634 03RD ST       NaN                                                VeloBrews Cafe & Cycling Community Center  12               5                10                37         ROADWAY    03RD ST     2014
1  2023 MISSION ST    {'coordinates': [-122.419104, 37.764429], 'typ...  Mission Plaza                              0                1                2                 438        SIDEWALK   MISSION     2002
2  70 04TH ST         {'coordinates': [-122.404896, 37.784708], 'typ...  Cole Hardware                              7                2                4                 853        SIDEWALK   04TH ST     2015
3  714 MONTGOMERY ST  {'coordinates': [-122.403068, 37.795902], 'typ...  Bubble Lounge                              9                1                2                 2407       SIDEWALK   MONTGOMERY  2010
4  1740 OFARRELL ST   {'coordinates': [-122.433316, 37.783481], 'typ...  Fat Angel Wine Bar                         11               1                2                 265        SIDEWALK   O'FARRELL   2010
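Two details in this output are worth handling before analysis. Record 0 has no `geom` key, so `pd.DataFrame` fills that cell with `NaN`. And because the JSON delivers every field as a string (see `'number_of_racks': '5'` in Out[3]), numeric-looking columns such as `number_of_racks` are text, not numbers. A minimal sketch of converting them with `pd.to_numeric` follows. The sample records are abbreviated copies of rows shown above, and the list of columns to convert is an assumption based on the column names.

```python
import pandas as pd

# Two abbreviated records copied from the outputs above; the first has no 'geom' key.
records = [
    {'address': '4634 03RD ST', 'number_of_racks': '5',
     'number_of_spaces': '10', 'year_installed': '2014'},
    {'address': '2023 MISSION ST', 'number_of_racks': '1',
     'number_of_spaces': '2', 'year_installed': '2002',
     'geom': {'coordinates': [-122.419104, 37.764429], 'type': 'Point'}},
]
df = pd.DataFrame(records)

# A key missing from a record becomes NaN in the combined frame.
print(df['geom'].isna().tolist())  # [True, False]

# Assumed list of numeric columns; errors='coerce' turns unparseable text into NaN.
numeric_cols = ['number_of_racks', 'number_of_spaces', 'year_installed']
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors='coerce')
print(df[numeric_cols].dtypes)
```

Using `errors='coerce'` keeps one malformed value from stopping the whole conversion; the trade-off is that bad values silently become `NaN`, so it is worth checking `isna()` counts afterwards.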

In [5]:
# each non-null geom value is a GeoJSON Point dict; its coordinates are [longitude, latitude]
df['geom'][1]


Out[5]:
{'coordinates': [-122.419104, 37.764429], 'type': 'Point'}
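GeoJSON (RFC 7946) orders a Point's coordinates as `[longitude, latitude]`, which is why latitude sits at index 1 and longitude at index 0 in the dict above. A small sketch of a helper that makes that swap explicit and tolerates missing values follows; the name `point_to_lat_lon` is hypothetical.

```python
import math

def point_to_lat_lon(geom):
    """Return (lat, lon) from a GeoJSON-style Point dict, or (nan, nan).

    GeoJSON stores [longitude, latitude], so the two values are swapped
    when building the (lat, lon) pair. Anything that is not a Point dict
    (for example NaN from a missing geom) yields (nan, nan).
    """
    if isinstance(geom, dict) and geom.get('type') == 'Point':
        lon, lat = geom['coordinates'][:2]
        return lat, lon
    return math.nan, math.nan

geom = {'coordinates': [-122.419104, 37.764429], 'type': 'Point'}
print(point_to_lat_lon(geom))          # (37.764429, -122.419104)
print(point_to_lat_lon(float('nan')))  # (nan, nan)
```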

In [6]:
# create lists to hold the latitudes and longitudes as I extract them
latitudes = []
longitudes = []

# loop through each row in df; if geom is not null, pull lat and lon from it
# (GeoJSON coordinates are ordered [longitude, latitude])
for _, row in df.iterrows():
    if pd.notnull(row['geom']):
        latitudes.append(row['geom']['coordinates'][1])
        longitudes.append(row['geom']['coordinates'][0])
    else:
        latitudes.append(None)
        longitudes.append(None)
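The loop above works, but `iterrows` builds a new Series for every row. An alternative sketch applies a small function to the `geom` column directly with `Series.apply`. The helper name `coord` is hypothetical, the two-row frame mirrors rows 0 and 1 of Out[4], and returning `math.nan` (rather than `None`) keeps each result a plain float column.

```python
import math
import pandas as pd

# Small frame mirroring rows 0 and 1 above: one missing geom, one Point.
df = pd.DataFrame({'geom': [math.nan,
                            {'coordinates': [-122.419104, 37.764429], 'type': 'Point'}]})

def coord(geom, index):
    # GeoJSON order is [lon, lat]: index 0 is longitude, index 1 is latitude.
    return geom['coordinates'][index] if isinstance(geom, dict) else math.nan

df['lat'] = df['geom'].apply(coord, args=(1,))
df['lon'] = df['geom'].apply(coord, args=(0,))
print(df[['lat', 'lon']])
```

The `isinstance(geom, dict)` check plays the same role as `pd.notnull(row['geom'])` in the loop: it skips rows where the API returned no location.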

In [7]:
df['lat'] = latitudes
df['lon'] = longitudes

In [8]:
df2 = df[['lat', 'lon']]
df2.head()


Out[8]:
         lat         lon
0        NaN         NaN
1  37.764429 -122.419104
2  37.784708 -122.404896
3  37.795902 -122.403068
4  37.783481 -122.433316
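Row 0 has no coordinates, so anything that plots or maps `df2` needs to handle the `NaN` rows. A minimal sketch of counting and dropping them with `dropna(subset=...)` follows, using the first three rows of Out[8] as sample data.

```python
import math
import pandas as pd

# Values copied from the first three rows of Out[8] above.
df2 = pd.DataFrame({'lat': [math.nan, 37.764429, 37.784708],
                    'lon': [math.nan, -122.419104, -122.404896]})

# A row counts as missing if either coordinate is NaN.
missing = df2[['lat', 'lon']].isna().any(axis=1)
print(missing.sum(), 'row(s) without coordinates')

# Keep only rows where both lat and lon are present; the original index is preserved.
located = df2.dropna(subset=['lat', 'lon'])
print(len(located))  # 2
```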
