This notebook describes how to use Python to read data from the SeeClickFix API into a pandas DataFrame.
In [1]:
import json
import urllib2
import pandas as pd
from jq import jq
import matplotlib.pyplot as plt
import numpy as np
import nhrc2
In [2]:
%matplotlib inline
In [2]:
nhrc2.__file__
Out[2]:
The SeeClickFix Public API does not include the category in the output. Furthermore, the categories are non-standard (i.e., they vary from municipality to municipality). However, there is a solution! We can request all the categories from New Haven from the SeeClickFix API, and then request the issues for each category individually. This will allow us to associate a category with each issue.
In [3]:
#json_cats = json.load(urllib2.urlopen('https://seeclickfix.com/api/v2/issues/new?address=New+Haven,+CT'))
json_cats = json.load(open('/Users/matt/projects/nhrc2/data/scf_cats.json', 'r'))
In [4]:
scf_cat_rule = '[.[] | .[] | {title: .title, url: .url, organization: .organization}]'
In [5]:
scf_cat_df = pd.DataFrame(jq(scf_cat_rule).transform(json_cats))
scf_cat_df['type'] = [urlstr.split('/')[-1] for urlstr in scf_cat_df['url']]
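For readers without the jq bindings installed, the category rule above can be mirrored in plain Python. The payload below is hypothetical and only mimics the nesting the rule expects (a JSON object whose values are lists of category records); the real response may carry more keys.

```python
# Pure-Python equivalent of the jq category rule:
#   '[.[] | .[] | {title: .title, url: .url, organization: .organization}]'
# sample_cats is a made-up stand-in for the real API response.
sample_cats = {
    'request_types': [
        {'title': 'Pothole', 'url': 'https://seeclickfix.com/api/v2/request_types/116',
         'organization': 'City of New Haven', 'extra': 'ignored'},
        {'title': 'Graffiti', 'url': 'https://seeclickfix.com/api/v2/request_types/117',
         'organization': 'City of New Haven'},
    ]
}

# Outer .[] walks the top-level values, inner .[] walks each list,
# and the object construction keeps only the three fields we want.
cats = [{'title': c['title'], 'url': c['url'], 'organization': c['organization']}
        for group in sample_cats.values() for c in group]

# The numeric request-type id is the last path segment of the URL
types = [c['url'].split('/')[-1] for c in cats]
print(types)  # → ['116', '117']
```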
In [6]:
print('Number of categories for New Haven: {}'.format(len(scf_cat_df)))
scf_cat_df.head(5)
Out[6]:
In [7]:
#json_data = json.load(urllib2.urlopen('https://seeclickfix.com/api/v2/issues?place_url=new-haven&page=1&per_page=10'))
json_data = json.load(urllib2.urlopen('https://seeclickfix.com/api/v2/issues?request_types=116&per_page=100000'))
In [12]:
#json_data
In [8]:
json_data.keys()
Out[8]:
In [5]:
json_data['issues'][0]
Out[5]:
In [ ]:
scf_iss_rule = '.issues | {id: .[] .id, status: .[] .status, lat: .[] .lat, lng: .[] .lng}'
scf_issues = jq(scf_iss_rule).transform(json_data)
In [6]:
#scf_issues
In [7]:
dict_lst_rule = '.issues | {id: [.[] .id], status: [.[] .status]}'
#jq(dict_lst_rule).transform(json_data)
In [9]:
record_rule = ("[.issues | .[] | {"
"id: .id, "
"status: .status,"
"summary: .summary,"
"description: .description,"
"address: .address,"
"lat: .lat,"
"lng: .lng,"
"closed_at: .closed_at,"
"acknowledged_at: .acknowledged_at,"
"created_at: .created_at,"
"updated_at: .updated_at,"
"shortened_url: .shortened_url,"
"reporter_id: .reporter.id,"
"reporter_name: .reporter.name,"
               "reporter_role: .reporter.role"
               "}]")
#jq(record_rule).transform(json_data)
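The record rule flattens each issue, pulling the nested reporter fields up to the top level. A pure-Python sketch of the same transformation is below; the sample issue is hypothetical, and the `.get` calls (which tolerate missing keys, much as jq yields `null` for absent fields) are an assumption about how defensive you want to be.

```python
# Hypothetical stand-in for one element of json_data['issues']
sample_data = {
    'issues': [
        {'id': 1, 'status': 'Open', 'summary': 'Pothole on Elm St',
         'lat': 41.31, 'lng': -72.92,
         'reporter': {'id': 42, 'name': 'A. Resident', 'role': 'Registered User'}},
    ]
}

def flatten_issue(iss):
    # Lift the nested reporter object into flat reporter_* columns,
    # mirroring what the jq record rule does with .reporter.id etc.
    rep = iss.get('reporter', {})
    return {
        'id': iss.get('id'),
        'status': iss.get('status'),
        'summary': iss.get('summary'),
        'lat': iss.get('lat'),
        'lng': iss.get('lng'),
        'reporter_id': rep.get('id'),
        'reporter_name': rep.get('name'),
        'reporter_role': rep.get('role'),
    }

records = [flatten_issue(iss) for iss in sample_data['issues']]
print(records[0]['reporter_name'])  # → A. Resident
```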
In [44]:
scf_df = pd.DataFrame(jq(record_rule).transform(json_data))
In [26]:
print('number of records: {}'.format(len(scf_df)))
scf_df.head(3)
Out[26]:
In [39]:
i = 0
print(scf_cat_df.loc[i, 'title'], scf_cat_df.loc[i, 'type'])
issurl = 'https://seeclickfix.com/api/v2/issues?request_types='+scf_cat_df.loc[i, 'type']+'&per_page=100000'
json_data = json.load(urllib2.urlopen(issurl))
scf_iss_df = pd.DataFrame(jq(record_rule).transform(json_data))
In [41]:
print(len(scf_iss_df))
In [42]:
scf_df = scf_df.append(scf_iss_df, ignore_index=True)
In [43]:
len(scf_df)
Out[43]:
So far we have seen how to get the SeeClickFix issue categories for a municipality, request all issues in a particular category, and append the results to an existing DataFrame. The next step is to loop over all categories, request their issues, and collect everything in one large DataFrame.
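One caveat before the loop: calling `DataFrame.append` on every iteration copies the whole frame each time. An alternative, sketched below with stand-in data rather than live API calls, is to collect the per-category frames in a list and concatenate once at the end.

```python
import pandas as pd

# Stand-ins for the per-category results; in the real loop each frame
# would come from pd.DataFrame(jq(record_rule).transform(json_data)).
per_category = [
    pd.DataFrame({'id': [1, 2], 'category': ['Pothole'] * 2}),
    pd.DataFrame({'id': [3], 'category': ['Graffiti']}),
]

# Concatenate once instead of append-in-a-loop
scf_all = pd.concat(per_category, ignore_index=True)
print(len(scf_all))  # → 3
```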
In [45]:
tstdf = pd.DataFrame(columns=['id', 'status', 'summary', 'address', 'lat', 'lng', 'closed_at', 'acknowledged_at',
'created_at', 'updated_at', 'shortened_url', 'issue_id', 'category'])
tstdf
Out[45]:
In [50]:
scf_df = pd.DataFrame(columns=['id', 'status', 'summary', 'address', 'lat', 'lng', 'closed_at', 'acknowledged_at',
'created_at', 'updated_at', 'shortened_url',
'reporter_id', 'reporter_name', 'reporter_role',
'issue_id', 'category'])
for i in scf_cat_df.index:
    print(scf_cat_df.loc[i, 'title'], scf_cat_df.loc[i, 'type'])
    issurl = 'https://seeclickfix.com/api/v2/issues?request_types='+scf_cat_df.loc[i, 'type']+'&per_page=10000'
    json_data = json.load(urllib2.urlopen(issurl))
    scf_iss_df = pd.DataFrame(jq(record_rule).transform(json_data))
    scf_iss_df['issue_id'] = scf_cat_df.loc[i, 'type']
    scf_iss_df['category'] = scf_cat_df.loc[i, 'title']
    scf_df = scf_df.append(scf_iss_df, ignore_index=True)
In [45]:
print(len(scf_df))
scf_df.head(3)
Out[45]:
In [52]:
csv_scf_df = scf_df.copy()
In [53]:
csv_scf_df.head(3)
Out[53]:
In [54]:
# Strip the trailing UTC-offset field (e.g. '-04:00') from each timestamp string
for col in ['created_at', 'acknowledged_at', 'closed_at', 'updated_at']:
    csv_scf_df[col] = ['-'.join(str(ndatetime).split('-')[:-1]) if ndatetime is not None else None
                       for ndatetime in csv_scf_df[col]]
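Once the offset suffix is stripped, the strings parse cleanly with `pd.to_datetime`. A quick check with hypothetical timestamps is below; note that the split-on-`'-'` trick assumes the offset is written with a minus sign (as it is for US Eastern time), since a `+hh:mm` offset would not be removed this way.

```python
import pandas as pd

# Hypothetical raw timestamps in the style returned by the API
raw = ['2013-05-01T12:00:00-04:00', None]

# Same transformation as above: drop the trailing UTC-offset field
cleaned = ['-'.join(str(ts).split('-')[:-1]) if ts is not None else None for ts in raw]
print(cleaned[0])  # → 2013-05-01T12:00:00

# None becomes NaT; the cleaned strings parse without a timezone
parsed = pd.to_datetime(pd.Series(cleaned))
print(parsed[0])   # → 2013-05-01 12:00:00
```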
In [55]:
csv_scf_df.head(3)
Out[55]:
In [56]:
csv_scf_df.to_csv('/Users/matt/projects/nhrc2/data/scf_data_full.csv', sep=',', encoding='utf-8')
In [ ]: