This notebook describes how to use Python to read data from the SeeClickFix API into a pandas DataFrame.
In [1]:
import json
import urllib2
import pandas as pd
from jq import jq
import matplotlib.pyplot as plt
import numpy as np
import nhrc2
In [2]:
%matplotlib inline
In [2]:
nhrc2.__file__
Out[2]:
The SeeClickFix Public API does not include the category in the output. Furthermore, the categories are non-standard (i.e., they vary from municipality to municipality). However, there is a solution! We can request all the categories from New Haven from the SeeClickFix API, and then request the issues for each category individually. This will allow us to associate a category with each issue.
In [3]:
#json_cats = json.load(urllib2.urlopen('https://seeclickfix.com/api/v2/issues/new?address=New+Haven,+CT'))
json_cats = json.load(open('/Users/matt/projects/nhrc2/data/scf_cats.json', 'r'))
In [4]:
scf_cat_rule = '[.[] | .[] | {title: .title, url: .url, organization: .organization}]'
In [5]:
scf_cat_df = pd.DataFrame(jq(scf_cat_rule).transform(json_cats))
scf_cat_df['type'] = [urlstr.split('/')[-1] for urlstr in scf_cat_df['url']]
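For readers without the jq bindings installed, the category rule above can be mirrored in plain Python. The payload below is hypothetical and only mimics the nesting the rule expects (a JSON object whose values are lists of category records); the real response may carry more keys.

```python
# Pure-Python equivalent of the jq category rule:
#   '[.[] | .[] | {title: .title, url: .url, organization: .organization}]'
# sample_cats is a made-up stand-in for the real API response.
sample_cats = {
    'request_types': [
        {'title': 'Pothole', 'url': 'https://seeclickfix.com/api/v2/request_types/116',
         'organization': 'City of New Haven', 'extra': 'ignored'},
        {'title': 'Graffiti', 'url': 'https://seeclickfix.com/api/v2/request_types/117',
         'organization': 'City of New Haven'},
    ]
}

# Outer .[] walks the top-level values, inner .[] walks each list,
# and the object construction keeps only the three fields we want.
cats = [{'title': c['title'], 'url': c['url'], 'organization': c['organization']}
        for group in sample_cats.values() for c in group]

# The numeric request-type id is the last path segment of the URL
types = [c['url'].split('/')[-1] for c in cats]
print(types)  # → ['116', '117']
```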
In [6]:
print('Number of categories for New Haven: {}'.format(len(scf_cat_df)))
scf_cat_df.head(5)
Out[6]:
In [7]:
#json_data = json.load(urllib2.urlopen('https://seeclickfix.com/api/v2/issues?place_url=new-haven&page=1&per_page=10'))
json_data = json.load(urllib2.urlopen('https://seeclickfix.com/api/v2/issues?request_types=116&per_page=100000'))
In [12]:
#json_data
In [8]:
json_data.keys()
Out[8]:
In [5]:
json_data['issues'][0]
Out[5]:
In [ ]:
scf_iss_rule = '.issues | {id: .[] .id, status: .[] .status, lat: .[] .lat, lng: .[] .lng}'
scf_issues = jq(scf_iss_rule).transform(json_data)
In [6]:
#scf_issues
In [7]:
dict_lst_rule = '.issues | {id: [.[] .id], status: [.[] .status]}'
#jq(dict_lst_rule).transform(json_data)
In [9]:
record_rule = ("[.issues | .[] | {"
"id: .id, "
"status: .status,"
"summary: .summary,"
"description: .description,"
"address: .address,"
"lat: .lat,"
"lng: .lng,"
"closed_at: .closed_at,"
"acknowledged_at: .acknowledged_at,"
"created_at: .created_at,"
"updated_at: .updated_at,"
"shortened_url: .shortened_url,"
"reporter_id: .reporter.id,"
"reporter_name: .reporter.name,"
               "reporter_role: .reporter.role"
               "}]")
#jq(record_rule).transform(json_data)
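The record rule flattens each issue, pulling the nested reporter fields up to the top level. A pure-Python sketch of the same transformation is below; the sample issue is hypothetical, and the `.get` calls (which tolerate missing keys, much as jq yields `null` for absent fields) are an assumption about how defensive you want to be.

```python
# Hypothetical stand-in for one element of json_data['issues']
sample_data = {
    'issues': [
        {'id': 1, 'status': 'Open', 'summary': 'Pothole on Elm St',
         'lat': 41.31, 'lng': -72.92,
         'reporter': {'id': 42, 'name': 'A. Resident', 'role': 'Registered User'}},
    ]
}

def flatten_issue(iss):
    # Lift the nested reporter object into flat reporter_* columns,
    # mirroring what the jq record rule does with .reporter.id etc.
    rep = iss.get('reporter', {})
    return {
        'id': iss.get('id'),
        'status': iss.get('status'),
        'summary': iss.get('summary'),
        'lat': iss.get('lat'),
        'lng': iss.get('lng'),
        'reporter_id': rep.get('id'),
        'reporter_name': rep.get('name'),
        'reporter_role': rep.get('role'),
    }

records = [flatten_issue(iss) for iss in sample_data['issues']]
print(records[0]['reporter_name'])  # → A. Resident
```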
In [44]:
scf_df = pd.DataFrame(jq(record_rule).transform(json_data))
In [26]:
print('number of records: {}'.format(len(scf_df)))
scf_df.head(3)
Out[26]:
In [39]:
i = 0
print(scf_cat_df.loc[i, 'title'], scf_cat_df.loc[i, 'type'])
issurl = 'https://seeclickfix.com/api/v2/issues?request_types='+scf_cat_df.loc[i, 'type']+'&per_page=100000'
json_data = json.load(urllib2.urlopen(issurl))
scf_iss_df = pd.DataFrame(jq(record_rule).transform(json_data))
In [41]:
print(len(scf_iss_df))
In [42]:
scf_df = scf_df.append(scf_iss_df, ignore_index=True)
In [43]:
len(scf_df)
Out[43]:
So far we have seen how to get the SeeClickFix issue categories for a municipality, request all issues in a particular category, and append the results to an existing DataFrame. The next step is to loop over all categories, request their issues, and collect everything in one large DataFrame.
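One caveat before the loop: calling `DataFrame.append` on every iteration copies the whole frame each time. An alternative, sketched below with stand-in data rather than live API calls, is to collect the per-category frames in a list and concatenate once at the end.

```python
import pandas as pd

# Stand-ins for the per-category results; in the real loop each frame
# would come from pd.DataFrame(jq(record_rule).transform(json_data)).
per_category = [
    pd.DataFrame({'id': [1, 2], 'category': ['Pothole'] * 2}),
    pd.DataFrame({'id': [3], 'category': ['Graffiti']}),
]

# Concatenate once instead of append-in-a-loop
scf_all = pd.concat(per_category, ignore_index=True)
print(len(scf_all))  # → 3
```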
In [45]:
tstdf = pd.DataFrame(columns=['id', 'status', 'summary', 'address', 'lat', 'lng', 'closed_at', 'acknowledged_at',
'created_at', 'updated_at', 'shortened_url', 'issue_id', 'category'])
tstdf
Out[45]:
In [50]:
scf_df = pd.DataFrame(columns=['id', 'status', 'summary', 'address', 'lat', 'lng', 'closed_at', 'acknowledged_at',
'created_at', 'updated_at', 'shortened_url',
'reporter_id', 'reporter_name', 'reporter_role',
'issue_id', 'category'])
for i in scf_cat_df.index:
    print(scf_cat_df.loc[i, 'title'], scf_cat_df.loc[i, 'type'])
    issurl = 'https://seeclickfix.com/api/v2/issues?request_types='+scf_cat_df.loc[i, 'type']+'&per_page=10000'
    json_data = json.load(urllib2.urlopen(issurl))
    scf_iss_df = pd.DataFrame(jq(record_rule).transform(json_data))
    scf_iss_df['issue_id'] = scf_cat_df.loc[i, 'type']
    scf_iss_df['category'] = scf_cat_df.loc[i, 'title']
    scf_df = scf_df.append(scf_iss_df, ignore_index=True)
In [45]:
print(len(scf_df))
scf_df.head(3)
Out[45]:
In [52]:
csv_scf_df = scf_df.copy()
In [53]:
csv_scf_df.head(3)
Out[53]:
In [54]:
# Strip the trailing UTC-offset field (e.g. '-04:00') from each timestamp string
for col in ['created_at', 'acknowledged_at', 'closed_at', 'updated_at']:
    csv_scf_df[col] = ['-'.join(str(ndatetime).split('-')[:-1]) if ndatetime is not None else None
                       for ndatetime in csv_scf_df[col]]
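Once the offset suffix is stripped, the strings parse cleanly with `pd.to_datetime`. A quick check with hypothetical timestamps is below; note that the split-on-`'-'` trick assumes the offset is written with a minus sign (as it is for US Eastern time), since a `+hh:mm` offset would not be removed this way.

```python
import pandas as pd

# Hypothetical raw timestamps in the style returned by the API
raw = ['2013-05-01T12:00:00-04:00', None]

# Same transformation as above: drop the trailing UTC-offset field
cleaned = ['-'.join(str(ts).split('-')[:-1]) if ts is not None else None for ts in raw]
print(cleaned[0])  # → 2013-05-01T12:00:00

# None becomes NaT; the cleaned strings parse without a timezone
parsed = pd.to_datetime(pd.Series(cleaned))
print(parsed[0])   # → 2013-05-01 12:00:00
```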
In [55]:
csv_scf_df.head(3)
Out[55]:
In [56]:
csv_scf_df.to_csv('/Users/matt/projects/nhrc2/data/scf_data_full.csv', sep=',', encoding='utf-8')
In [ ]: