read_seeclickfix_api

This notebook describes how to use python to read in the seeclickfix API to a pandas DataFrame


In [1]:
import json
import urllib2
import pandas as pd
from jq import jq
import matplotlib.pyplot as plt
import numpy as np
import nhrc2

In [2]:
%matplotlib inline

In [2]:
nhrc2.__file__


Out[2]:
'/Users/matt/projects/nhrc2/__init__.pyc'

Extracting the Categories

The SeeClickFix Public API does not include the category in the output. Furthermore, the categories are non-standard (i.e., they vary from municipality to municipality). However, there is a solution! We can request all the categories from New Haven from the SeeClickFix API, and then request the issues for each category individually. This will allow us to associate a category with each issue.


In [3]:
#json_cats = json.load(urllib2.urlopen('https://seeclickfix.com/api/v2/issues/new?address=New+Haven,+CT'))
json_cats = json.load(open('/Users/matt/projects/nhrc2/data/scf_cats.json', 'r'))

In [4]:
scf_cat_rule = '[.[] | .[] | {title: .title, url: .url, organization: .organization}]'

In [5]:
scf_cat_df = pd.DataFrame(jq(scf_cat_rule).transform(json_cats))
scf_cat_df['type'] = [urlstr.split('/')[-1] for urlstr in scf_cat_df['url']]

In [6]:
print('Number of categories for New Haven: {}'.format(len(scf_cat_df)))
scf_cat_df.head(5)


Number of categories for New Haven: 25
Out[6]:
organization title url type
0 City of New Haven SNOW RELATED https://seeclickfix.com/api/v2/request_types/8516 8516
1 City of New Haven Bins for Trash & Recycling https://seeclickfix.com/api/v2/request_types/5743 5743
2 City of New Haven Graffiti https://seeclickfix.com/api/v2/request_types/122 122
3 City of New Haven Hangers https://seeclickfix.com/api/v2/request_types/6215 6215
4 City of New Haven Health Complaints https://seeclickfix.com/api/v2/request_types/5185 5185

Extracting the Issues


In [7]:
#json_data = json.load(urllib2.urlopen('https://seeclickfix.com/api/v2/issues?place_url=new-haven&page=1&per_page=10'))
json_data = json.load(urllib2.urlopen('https://seeclickfix.com/api/v2/issues?request_types=116&per_page=100000'))

In [12]:
#json_data

In [8]:
json_data.keys()


Out[8]:
[u'errors', u'issues', u'metadata']

In [5]:
json_data['issues'][0]


Out[5]:
{u'acknowledged_at': u'2015-03-30T16:06:17-04:00',
 u'address': u'Fairmont Ave  New Haven, Connecticut',
 u'closed_at': None,
 u'comment_url': u'https://seeclickfix.com/api/v2/issues/1561282/comments',
 u'created_at': u'2015-03-30T15:10:43-04:00',
 u'description': u'From Fairmount Ave right down to Forbes Ave the pot holes are like driving through a mine field. Something has to be done. Been pointing this stretch for years and nothing is getting done. "NOTHING"! ',
 u'flag_url': u'https://seeclickfix.com/api/v2/issues/1561282/flag',
 u'html_url': u'https://seeclickfix.com/issues/1561282',
 u'id': 1561282,
 u'lat': 41.2967237,
 u'lng': -72.8939387,
 u'media': {u'image_full': None,
  u'image_square_100x100': None,
  u'representative_image_url': u'https://seeclickfix.com/assets/categories_trans/no-image.png',
  u'video_url': None},
 u'point': {u'coordinates': [-72.8939387, 41.2967237], u'type': u'Point'},
 u'rating': 3,
 u'reporter': {u'avatar': {u'full': u'https://seeclickfix.com/assets/no-avatar-100-0e1ba3fad46692e1a3e9a4b333e91ce5.png',
   u'square_100x100': u'https://seeclickfix.com/assets/no-avatar-100-0e1ba3fad46692e1a3e9a4b333e91ce5.png'},
  u'civic_points': 860,
  u'id': 5248,
  u'name': u'dmrowka',
  u'role': u'Registered User',
  u'witty_title': u'Municipal Avenger'},
 u'shortened_url': u'http://bit.ly/1abCoSL',
 u'status': u'Acknowledged',
 u'summary': u'Potholes',
 u'transitions': {},
 u'updated_at': u'2015-03-30T16:06:17-04:00',
 u'url': u'https://seeclickfix.com/api/v2/issues/1561282'}

In [ ]:
scf_iss_rule = '.issues | {id: .[] .id, status: .[] .status, lat: .[] .lat, lng: .[] .lng}'
scf_issues = jq(scf_iss_rule).transform(json_data)

In [6]:
#scf_issues

In [7]:
dict_lst_rule = '.issues | {id: [.[] .id], status: [.[] .status]}'
#jq(dict_lst_rule).transform(json_data)

In [9]:
record_rule = ("[.issues | .[] | {"
               "id: .id, "
               "status: .status,"
               "summary: .summary,"
               "description: .description,"
               "address: .address,"
               "lat: .lat,"
               "lng: .lng,"
               "closed_at: .closed_at,"
               "acknowledged_at: .acknowledged_at,"
               "created_at: .created_at,"
               "updated_at: .updated_at,"
               "shortened_url: .shortened_url,"
               "reporter_id: .reporter.id,"
               "reporter_name: .reporter.name,"
               "reporter_role: .reporter.role,"
               "}]")
#jq(record_rule).transform(json_data)

In [44]:
scf_df = pd.DataFrame(jq(record_rule).transform(json_data))

In [26]:
print('number of records: {}'.format(len(scf_df)))
scf_df.head(3)


number of records: 835
Out[26]:
acknowledged_at address closed_at created_at description id lat lng reporter_id reporter_name reporter_role shortened_url status summary updated_at
0 None 204-210 Edgewood Avenue New Haven, CT 06511, USA None 2015-04-10T08:41:59-04:00 Can the secondary question of block be removed... 1582951 41.312336 -72.940659 2 Ben Admin http://bit.ly/1chAu3J Open Potholes 2015-04-10T08:41:59-04:00
1 None 331-341 Clifton Street New Haven, CT 06513, USA None 2015-04-09T17:57:21-04:00 Huge pothole and collapsed curb at intersection 1581476 41.308563 -72.873829 92781 Tech75 Registered User http://bit.ly/1Ix47eT Open Potholes 2015-04-10T06:32:48-04:00
2 None 63-85 Eastern Street New Haven, CT 06513, USA None 2015-04-09T17:55:14-04:00 Deep hole 4 feet long 1581470 41.308647 -72.873724 92781 Tech75 Registered User http://bit.ly/1Ix3JwH Open Potholes 2015-04-09T17:55:14-04:00

Merge two DataFrames


In [39]:
i=0
print(scf_cat_df.loc[i, 'title'], scf_cat_df.loc[i, 'type'])
issurl = 'https://seeclickfix.com/api/v2/issues?request_types='+scf_cat_df.loc[i, 'type']+'&per_page=100000'
json_data = json.load(urllib2.urlopen(issurl))
scf_iss_df = pd.DataFrame(jq(record_rule).transform(json_data))


(u'SNOW RELATED', u'8516')

In [41]:
print(len(scf_iss_df))


569

In [42]:
scf_df = scf_df.append(scf_iss_df, ignore_index=True)

In [43]:
len(scf_df)


Out[43]:
1539

Looping over all issue categories

So far we have seen how to get the SeeClickFix issue categories for a municipality, request all issues for a particular category, and merge the results to an existing DataFrame. The next step is to loop over all categories, request all issues, and save them all to a huge DataFrame.


In [45]:
tstdf = pd.DataFrame(columns=['id', 'status', 'summary', 'address', 'lat', 'lng', 'closed_at', 'acknowledged_at',
                              'created_at', 'updated_at', 'shortened_url', 'issue_id', 'category'])
tstdf


Out[45]:
id status summary address lat lng closed_at acknowledged_at created_at updated_at shortened_url

In [50]:
scf_df = pd.DataFrame(columns=['id', 'status', 'summary', 'address', 'lat', 'lng', 'closed_at', 'acknowledged_at',
                              'created_at', 'updated_at', 'shortened_url', 
                              'reporter_id', 'reporter_name', 'reporter_role',
                              'issue_id', 'category'])

for i in scf_cat_df.index:
    print(scf_cat_df.loc[i, 'title'], scf_cat_df.loc[i, 'type'])
    issurl = 'https://seeclickfix.com/api/v2/issues?request_types='+scf_cat_df.loc[i, 'type']+'&per_page=10000'
    json_data = json.load(urllib2.urlopen(issurl))
    scf_iss_df = pd.DataFrame(jq(record_rule).transform(json_data))
    scf_iss_df['issue_id'] = scf_cat_df.loc[i, 'type']
    scf_iss_df['category'] = scf_cat_df.loc[i, 'title']
    scf_df = scf_df.append(scf_iss_df, ignore_index=True)


(u'SNOW RELATED', u'8516')
(u'Bins for Trash & Recycling', u'5743')
(u'Graffiti', u'122')
(u'Hangers', u'6215')
(u'Health Complaints', u'5185')
(u'Illegal Dumping', u'1250')
(u'Other', u'374')
(u'Other - city responsibility', u'3018')
(u'Parking Meter', u'372')
(u'Parking Violation/Abandoned Auto', u'121')
(u'Parks Request', u'126')
(u'Policing Issue', u'2626')
(u'Potholes', u'116')
(u'Public Space, Streets and Drains', u'1249')
(u'Private Property Issue', u'1251')
(u'Sidewalks and Curb damage', u'117')
(u'Signs / Bus Shelters / Pavement Markings', u'373')
(u'Street Lamp', u'124')
(u'Traffic/Road Safety', u'2625')
(u'Traffic Signal / Pedestrian Signal', u'51')
(u'Trash & Recycling', u'1966')
(u'Tree Trimming', u'1853')
(u'Request for volunteers', u'5998')
(u'General Bus Request/Incident', u'4947')
(u'Post to Neighbors', u'other')

In [45]:
print(len(scf_df))
scf_df.head(3)


835
Out[45]:
acknowledged_at address closed_at created_at description id lat lng reporter_id reporter_name reporter_role shortened_url status summary updated_at
0 None 204-210 Edgewood Avenue New Haven, CT 06511, USA None 2015-04-10T08:41:59-04:00 Can the secondary question of block be removed... 1582951 41.312336 -72.940659 2 Ben Admin http://bit.ly/1chAu3J Open Potholes 2015-04-10T08:41:59-04:00
1 None 331-341 Clifton Street New Haven, CT 06513, USA None 2015-04-09T17:57:21-04:00 Huge pothole and collapsed curb at intersection 1581476 41.308563 -72.873829 92781 Tech75 Registered User http://bit.ly/1Ix47eT Open Potholes 2015-04-10T06:32:48-04:00
2 None 63-85 Eastern Street New Haven, CT 06513, USA None 2015-04-09T17:55:14-04:00 Deep hole 4 feet long 1581470 41.308647 -72.873724 92781 Tech75 Registered User http://bit.ly/1Ix3JwH Open Potholes 2015-04-09T17:55:14-04:00

Remove timezone from DateTimes for querying


In [52]:
csv_scf_df = scf_df.copy()

In [53]:
csv_scf_df.head(3)


Out[53]:
acknowledged_at address closed_at created_at description id lat lng reporter_id reporter_name reporter_role shortened_url status summary updated_at
0 None 204-210 Edgewood Avenue New Haven, CT 06511, USA None 2015-04-10T08:41:59-04:00 Can the secondary question of block be removed... 1582951 41.312336 -72.940659 2 Ben Admin http://bit.ly/1chAu3J Open Potholes 2015-04-10T08:41:59-04:00
1 None 331-341 Clifton Street New Haven, CT 06513, USA None 2015-04-09T17:57:21-04:00 Huge pothole and collapsed curb at intersection 1581476 41.308563 -72.873829 92781 Tech75 Registered User http://bit.ly/1Ix47eT Open Potholes 2015-04-10T06:32:48-04:00
2 None 63-85 Eastern Street New Haven, CT 06513, USA None 2015-04-09T17:55:14-04:00 Deep hole 4 feet long 1581470 41.308647 -72.873724 92781 Tech75 Registered User http://bit.ly/1Ix3JwH Open Potholes 2015-04-09T17:55:14-04:00

In [54]:
csv_scf_df['created_at'] = ['-'.join(str(ndatetime).split('-')[:-1]) if ndatetime is not None else None for ndatetime in csv_scf_df['created_at']]
csv_scf_df['acknowledged_at'] = ['-'.join(str(ndatetime).split('-')[:-1]) if ndatetime is not None else None for ndatetime in csv_scf_df['acknowledged_at']]
csv_scf_df['closed_at'] = ['-'.join(str(ndatetime).split('-')[:-1]) if ndatetime is not None else None for ndatetime in csv_scf_df['closed_at']]
csv_scf_df['updated_at'] = ['-'.join(str(ndatetime).split('-')[:-1]) if ndatetime is not None else None for ndatetime in csv_scf_df['updated_at']]

In [55]:
csv_scf_df.head(3)


Out[55]:
acknowledged_at address closed_at created_at description id lat lng reporter_id reporter_name reporter_role shortened_url status summary updated_at
0 None 204-210 Edgewood Avenue New Haven, CT 06511, USA None 2015-04-10T08:41:59 Can the secondary question of block be removed... 1582951 41.312336 -72.940659 2 Ben Admin http://bit.ly/1chAu3J Open Potholes 2015-04-10T08:41:59
1 None 331-341 Clifton Street New Haven, CT 06513, USA None 2015-04-09T17:57:21 Huge pothole and collapsed curb at intersection 1581476 41.308563 -72.873829 92781 Tech75 Registered User http://bit.ly/1Ix47eT Open Potholes 2015-04-10T06:32:48
2 None 63-85 Eastern Street New Haven, CT 06513, USA None 2015-04-09T17:55:14 Deep hole 4 feet long 1581470 41.308647 -72.873724 92781 Tech75 Registered User http://bit.ly/1Ix3JwH Open Potholes 2015-04-09T17:55:14

In [56]:
csv_scf_df.to_csv('/Users/matt/projects/nhrc2/data/scf_data_full.csv', sep=',', encoding='utf-8')

In [ ]: