Convert a pandas dataframe to geojson for web-mapping

Author: Geoff Boeing

Original: pandas-to-geojson


In [1]:
import pandas as pd, requests, json

First download data from the city of Berkeley's API. You can use Socrata's $limit parameter to specify how many rows to grab (otherwise the default is 1,000 rows of data): https://dev.socrata.com/docs/paging.html

Example request: https://data.cityofberkeley.info/resource/k489-uv4i.json?$limit=5


In [2]:
# API endpoint for city of Berkeley's 311 calls
endpoint_url = 'https://data.cityofberkeley.info/resource/k489-uv4i.json?$limit=20'

In [3]:
# fetch the URL and load the data
response = requests.get(endpoint_url)
data = response.json()

Next, turn the json data into a dataframe and clean it up a bit: drop unnecessary columns and any rows that lack lat-long data. We want to make our json file as small as possible (prefer under 5 mb) so that it can be loaded over the Internet to anyone viewing your map, without taking forever to download a huge file.


In [4]:
# turn the json data into a dataframe and see how many rows and what columns we have
df = pd.DataFrame(data)

print('We have {} rows'.format(len(df)))
str(df.columns.tolist())


We have 20 rows
Out[4]:
"['apn', 'city', 'indbdate', 'issue_description', 'issue_type', 'latitude', 'location', 'longitude', 'neighborhood_district', 'object_type', 'secondary_issue_type', 'state', 'street_address', 'ticket_closed_date_time', 'ticket_created_date_time', 'ticket_id', 'ticket_status']"

In [5]:
# convert lat-long to floats and change address from ALL CAPS to regular capitalization
df['latitude'] = df['latitude'].astype(float)
df['longitude'] = df['longitude'].astype(float)
df['street_address'] = df['street_address'].str.title()

In [6]:
# we don't need all those columns - only keep useful ones
cols = ['issue_description', 'issue_type', 'latitude', 'longitude', 'street_address', 'ticket_status']
df_subset = df[cols]

In [7]:
# drop any rows that lack lat/long data
df_geo = df_subset.dropna(subset=['latitude', 'longitude'], axis=0, inplace=False)

print('We have {} geotagged rows'.format(len(df_geo)))
df_geo.tail()


We have 8 geotagged rows
Out[7]:
issue_description issue_type latitude longitude street_address ticket_status
12 Application Business License 37.857073 -122.258698 2918 Florence St Closed
13 Residential Bulky Pickup Refuse and Recycling 37.857077 -122.240570 47 Alvarado Rd Closed
14 Commercial Special Pickup Refuse and Recycling 37.865041 -122.264094 2230 Haste St Closed
16 Illegal Dumping - City Property Streets, Utilities, and Transportation 37.864497 -122.293439 2240 Ninth St Closed
19 Residential Service Stop Refuse and Recycling 37.876000 -122.293046 1127 Hopkins St Closed

In [8]:
# what is the distribution of issue types?
df_geo['issue_type'].value_counts()


Out[8]:
Refuse and Recycling                      6
Streets, Utilities, and Transportation    1
Business License                          1
Name: issue_type, dtype: int64

Finally, convert each row in the dataframe to a geojson-formatted feature and save the result as a file. The format is pretty simple and you can see it here: http://geojson.org/


In [9]:
def df_to_geojson(df, properties, lat='latitude', lon='longitude'):
    # create a new python dict to contain our geojson data, using geojson format
    geojson = {'type':'FeatureCollection', 'features':[]}

    # loop through each row in the dataframe and convert each row to geojson format
    for _, row in df.iterrows():
        # create a feature template to fill in
        feature = {'type':'Feature',
                   'properties':{},
                   'geometry':{'type':'Point',
                               'coordinates':[]}}

        # fill in the coordinates
        feature['geometry']['coordinates'] = [row[lon],row[lat]]

        # for each column, get the value and add it as a new feature property
        for prop in properties:
            feature['properties'][prop] = row[prop]
        
        # add this feature (aka, converted dataframe row) to the list of features inside our dict
        geojson['features'].append(feature)
    
    return geojson

In [10]:
cols = ['street_address', 'issue_description', 'issue_type', 'ticket_status']
geojson = df_to_geojson(df_geo, cols)

In nteract, we can display geojson directly with the built-in leaflet renderer.


In [11]:
import IPython
IPython.display.display({'application/geo+json': geojson}, raw=True)