Convert a pandas dataframe to geojson for web mapping


In [1]:
import pandas as pd, requests, json

First, download the data from the City of Berkeley's API. You can use Socrata's $limit parameter to specify how many rows to grab (otherwise the default is 1,000 rows): https://dev.socrata.com/docs/paging.html

Example request: https://data.cityofberkeley.info/resource/k489-uv4i.json?$limit=5
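
If you need more rows than one request comfortably returns, the Socrata paging docs linked above describe combining $limit with $offset. Here is a minimal sketch of that pattern, using the requests import from above; the page size and loop structure are illustrative, not part of the original notebook:

base_url = 'https://data.cityofberkeley.info/resource/k489-uv4i.json'
page_size = 1000  # illustrative page size
offset = 0
all_rows = []
while True:
    # fetch one page of results at a time via $limit and $offset
    page = requests.get(base_url, params={'$limit': page_size, '$offset': offset}).json()
    all_rows.extend(page)
    if len(page) < page_size:
        break  # a short page means we've reached the end of the dataset
    offset += page_size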


In [2]:
# API endpoint for city of Berkeley's 311 calls
endpoint_url = 'https://data.cityofberkeley.info/resource/k489-uv4i.json?$limit=2000'

In [3]:
# fetch the URL and load the data
response = requests.get(endpoint_url)
data = response.json()

Next, turn the json data into a dataframe and clean it up a bit: drop unnecessary columns and any rows that lack lat-long data. We want to make our json file as small as possible (preferably under 5 MB) so that it can be loaded over the Internet by anyone viewing your map, without taking forever to download a huge file.
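
As a quick sanity check (this snippet isn't part of the original notebook), you can roughly estimate how large the raw, untrimmed response would be if serialized straight to JSON, using the json import from above:

# rough size estimate of the raw, untrimmed response serialized as JSON
raw_size_mb = len(json.dumps(data)) / 1024 ** 2
print('Raw response is roughly {:.2f} MB as JSON'.format(raw_size_mb))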


In [4]:
# turn the json data into a dataframe and see how many rows and what columns we have
df = pd.DataFrame(data)

print('We have {} rows'.format(len(df)))
str(df.columns.tolist())


We have 2000 rows
Out[4]:
"['apn', 'city', 'indbdate', 'issue_description', 'issue_type', 'latitude', 'location', 'longitude', 'neighborhood_district', 'object_type', 'secondary_issue_type', 'state', 'street_address', 'ticket_closed_date_time', 'ticket_created_date_time', 'ticket_id', 'ticket_status']"

In [5]:
df.head()


Out[5]:
apn city indbdate issue_description issue_type latitude location longitude neighborhood_district object_type secondary_issue_type state street_address ticket_closed_date_time ticket_created_date_time ticket_id ticket_status
0 NaN Berkeley 2016-10-13T04:17:52 Parking Meter Repair Traffic and Transportation NaN NaN NaN Berkeley Individual Parking CA NaN 2011-08-23T11:27:06 2011-03-01T12:48:53 121000042212 Closed
1 054 170801100 Berkeley 2016-10-13T04:13:37 Residential Service Start Refuse and Recycling 37.86165226 {'longitude': '-122.25267724', 'latitude': '37... -122.25267724 Berkeley Property Residential CA 2717 FOREST AVE 2013-05-09T09:35:31 2013-05-07T13:00:10 121000141381 Closed
2 053 160400800 Berkeley 2016-10-13T04:18:09 Residential Lost or Stolen Cart Refuse and Recycling 37.85527017 {'longitude': '-122.27364915', 'latitude': '37... -122.27364915 Berkeley Property Residential CA 2910 ELLIS ST 2014-10-01T17:32:04 2014-09-25T09:33:44 121000214450 Closed
3 052 143700100 Berkeley 2016-10-13T04:13:29 Commercial Cart Size Decrease Refuse and Recycling 37.84693953 {'longitude': '-122.27597171', 'latitude': '37... -122.27597171 Berkeley Property Commercial CA 1538 SIXTY-THIRD ST 2015-09-18T11:51:12 2015-07-06T09:14:01 121000252506 Closed
4 057 202901300 Berkeley 2016-10-13T04:13:26 Illegal Dumping - City Property Streets, Utilities, and Transportation 37.86833388 {'longitude': '-122.26742331', 'latitude': '37... -122.26742331 Berkeley Property Clean City Program CA 2255 SHATTUCK AVE 2015-06-25T08:33:30 2015-06-24T09:14:52 121000251283 Closed

In [6]:
# convert lat-long to floats and change address from ALL CAPS to Regular Capitalization
df['latitude'] = df['latitude'].astype(float)
df['longitude'] = df['longitude'].astype(float)
df['street_address'] = df['street_address'].str.title()

In [7]:
# we don't need all those columns - only keep useful ones
useful_cols = ['issue_description', 'issue_type', 'latitude', 'longitude', 'street_address', 'ticket_status']
df_subset = df[useful_cols]

In [8]:
# drop any rows that lack lat/long data
df_geo = df_subset.dropna(subset=['latitude', 'longitude'], axis=0, inplace=False)

print('We have {} geotagged rows'.format(len(df_geo)))
df_geo.tail()


We have 985 geotagged rows
Out[8]:
issue_description issue_type latitude longitude street_address ticket_status
1989 Residential Missed Pickup Refuse and Recycling 37.884110 -122.248453 1438 Grizzly Peak Blvd Closed
1991 Residential Bulky Pickup Refuse and Recycling 37.857363 -122.274100 1726 Stuart St Closed
1992 Commercial Site Inspection Refuse and Recycling 37.854159 -122.259605 3075 Telegraph Ave Com Closed
1993 Miscellaneous Internet Request General Questions/information 37.869606 -122.273367 2100 M L King Jr Way Closed
1995 Commercial Bin Size Increase Refuse and Recycling 37.851065 -122.291845 2950 Seventh St Closed

In [9]:
# what is the distribution of issue types?
df_geo['issue_type'].value_counts()


Out[9]:
Refuse and Recycling                            709
General Questions/information                   103
Streets, Utilities, and Transportation           96
Parks, Trees and Vegetation                      39
Environmental Services and Programs              14
Business License                                  7
Traffic and Transportation                        7
Facilities, Electrical & Property Management      5
Graffiti and Vandalism                            3
Other Account Services and Billing                1
Equipment Maintenance                             1
Name: issue_type, dtype: int64

Finally, convert each row in the dataframe to a geojson-formatted feature and save the result as a file. The format is pretty simple and you can see it here: http://geojson.org/
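
For reference, a single geojson Feature for one point looks roughly like the Python dict below (the values here are made up for illustration). Note that geojson orders coordinates as [longitude, latitude], not [latitude, longitude]:

# an illustrative geojson Feature for one point (values are made up)
example_feature = {
    'type': 'Feature',
    'geometry': {
        'type': 'Point',
        'coordinates': [-122.2727, 37.8716]  # [lon, lat], not [lat, lon]
    },
    'properties': {
        'street_address': '2180 Milvia St',
        'issue_type': 'Refuse and Recycling',
        'ticket_status': 'Closed'
    }
}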


In [10]:
def df_to_geojson(df, properties, lat='latitude', lon='longitude'):
    """
    Turn a dataframe containing point data into a geojson formatted python dictionary
    
    df : the dataframe to convert to geojson
    properties : a list of columns in the dataframe to turn into geojson feature properties
    lat : the name of the column in the dataframe that contains latitude data
    lon : the name of the column in the dataframe that contains longitude data
    """
    
    # create a new python dict to contain our geojson data, using geojson format
    geojson = {'type':'FeatureCollection', 'features':[]}

    # loop through each row in the dataframe and convert each row to geojson format
    for _, row in df.iterrows():
        # create a feature template to fill in
        feature = {'type':'Feature',
                   'properties':{},
                   'geometry':{'type':'Point',
                               'coordinates':[]}}

        # fill in the coordinates
        feature['geometry']['coordinates'] = [row[lon],row[lat]]

        # for each column, get the value and add it as a new feature property
        for prop in properties:
            feature['properties'][prop] = row[prop]
        
        # add this feature (aka, converted dataframe row) to the list of features inside our dict
        geojson['features'].append(feature)
    
    return geojson

In [11]:
useful_columns = ['street_address', 'issue_description', 'issue_type', 'ticket_status']
geojson_dict = df_to_geojson(df_geo, properties=useful_columns)
geojson_str = json.dumps(geojson_dict, indent=2)

In [12]:
# save the geojson result to a file
output_filename = 'dataset.js'
with open(output_filename, 'w') as output_file:
    output_file.write('var dataset = {};'.format(geojson_str))
    
# how many features did we save to the geojson file?
print('{} geotagged features saved to file'.format(len(geojson_dict['features'])))


985 geotagged features saved to file
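
The .js wrapper above is handy because the data can then be loaded in a web page with a plain script tag. If you also want a standard .geojson file that GIS tools and web-mapping libraries can read directly, a minimal variation (reusing geojson_dict from above; the filename is just an example) would be:

# alternatively, write plain geojson with no JavaScript variable wrapper
with open('dataset.geojson', 'w') as f:
    json.dump(geojson_dict, f, indent=2)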
