In [1]:
import requests
import json
import numpy as np
import pandas as pd
from pandas import DataFrame, Series
URIBASE = 'http://java.epa.gov/chemview/'
In [2]:
uri = URIBASE + 'uses'
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)
In [3]:
print(len(j))
In [4]:
DataFrame(j)
Out[4]:
In [5]:
uri = URIBASE + 'uses/124470' # "Flame retardant"
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)
j
Out[5]:
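Every cell in this notebook repeats the same request-and-parse pattern, so a small helper could cut the repetition. The sketch below is one possible shape and is not part of the original notebook: the name get_json and its get parameter are hypothetical, and the Accept header is the one already used in the cells above. Unlike those cells, it also raises on HTTP error statuses.

```python
URIBASE = 'http://java.epa.gov/chemview/'
HEADERS = {'Accept': 'application/json, */*'}

def get_json(path, get=None):
    """Fetch URIBASE + path and return the decoded JSON body.

    `get` is a hypothetical hook for swapping in a different HTTP function
    (handy for testing); by default it is requests.get.
    """
    if get is None:
        import requests  # imported lazily so the helper can be defined without it
        get = requests.get
    r = get(URIBASE + path, headers=HEADERS)
    r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
    return r.json()       # raises an exception if the body is not valid JSON

# Usage, mirroring the two requests above:
# uses = get_json('uses')
# flame_retardant = get_json('uses/124470')
```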
Unfortunately, the monster URI that the documentation provides (item 3, p. 3) doesn't really do much, or I am not using it correctly.
In [6]:
uri = URIBASE + 'chemicals/datatable?isTemplateFilter=false&chemicalIds=&snurUseIds=&useIds=124470&groupIds=&categoryIds=&endpointKeys=&synonymIds=&sourceIds='
# &sEcho=4&iColumns=6&sColumns=&iDisplayStart=0&iDisplayLength=10&mDataProp_0=0&mDataProp_1=1\
# &mDataProp_2=2&mDataProp_3=3&mDataProp_4=4&mDataProp_5=5&sSearch=&bRegex=false&sSearch_0=\
# &bRegex_0=false&bSearchable_0=true&sSearch_1=&bRegex_1=false&bSearchable_1=true&sSearch_2=\
# &bRegex_2=false&bSearchable_2=true&sSearch_3=&bRegex_3=false&bSearchable_3=true&sSearch_4=\
# &bRegex_4=false&bSearchable_4=true&sSearch_5=&bRegex_5=false&bSearchable_5=true&iSortCol_0=0\
# &sSortDir_0=asc&iSortingCols=1&bSortable_0=false&bSortable_1=true&bSortable_2=false\
# &bSortable_3=false&bSortable_4=false&bSortable_5=false'
print(uri)
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)
j
Out[6]:
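A query string this long is easier to tweak when it is built from a dict instead of typed by hand. The sketch below rebuilds the URI from cell 6 with the standard library; the function name datatable_uri is hypothetical, and the parameter names are copied from that URI rather than from any documentation I have verified.

```python
from urllib.parse import urlencode

URIBASE = 'http://java.epa.gov/chemview/'

def datatable_uri(**filters):
    """Build a chemicals/datatable URI, leaving unspecified filters blank."""
    # Parameter names and order are taken from the URI used in cell 6.
    params = {
        'isTemplateFilter': 'false',
        'chemicalIds': '', 'snurUseIds': '', 'useIds': '', 'groupIds': '',
        'categoryIds': '', 'endpointKeys': '', 'synonymIds': '', 'sourceIds': '',
    }
    unknown = set(filters) - set(params)
    if unknown:
        raise ValueError('unexpected filter(s): ' + ', '.join(sorted(unknown)))
    params.update({name: str(value) for name, value in filters.items()})
    return URIBASE + 'chemicals/datatable?' + urlencode(params)

uri = datatable_uri(useIds=124470)
print(uri)
```

Rejecting unknown filter names is a design choice to catch typos such as useIDs early; it does not say anything about which parameters the server actually honours.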
Trying something different: learn from the URIs that ChemView generates when you do a search and export the results.
I changed mediaType=xls to mediaType=json in the resulting URI to retrieve JSON instead. This doesn't work either.
In [7]:
uri = URIBASE + 'datatable?mediaType=json&useIds=124470&sourceIds=2-5-6-7-3-10-9-8-1-16-4-11-1981377'
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
r.text
Out[7]:
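When a request like this "doesn't work", it helps to see whether the server sent an error status, an HTML page, or an empty body before handing the text to json.loads. A minimal sketch follows; the function name describe_response and the keys of the summary dict are made up for illustration.

```python
import json

def describe_response(status_code, content_type, text):
    """Summarise a response so it is clear why JSON parsing might fail."""
    try:
        json.loads(text)
        is_json = True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        is_json = False
    return {
        'status': status_code,
        'content_type': content_type,
        'is_json': is_json,
        'preview': text[:80],  # first characters of the body, for eyeballing
    }

# With a requests response r, this could be called as:
# describe_response(r.status_code, r.headers.get('Content-Type', ''), r.text)
```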
In [8]:
uri = URIBASE + 'sources'
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)
In [9]:
sources_df = DataFrame(j)
In [10]:
sources_df
Out[10]:
In [11]:
# Calculate the number of items in the 'chemicals' field for each source.
sources_df['num_chems'] = sources_df['chemicals'].apply(len)
sources_df[['sourceId', 'sourceDesc', 'num_chems']]
Out[11]:
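The count above assumes every source record carries a list in its 'chemicals' field; apply(len) would raise a TypeError on a null value. I have not checked whether the API ever returns one, so the following is only a defensive sketch in plain Python. The function name chemical_counts and the sample records are hypothetical; only the field names come from the response used above.

```python
def chemical_counts(sources):
    """Return (sourceId, sourceDesc, number of chemicals), largest count first."""
    rows = []
    for s in sources:
        chems = s.get('chemicals') or []  # treat a missing or null field as empty
        rows.append((s.get('sourceId'), s.get('sourceDesc'), len(chems)))
    return sorted(rows, key=lambda row: row[2], reverse=True)

# Hypothetical records shaped like the 'sources' response; not real ChemView data.
sample = [
    {'sourceId': 1, 'sourceDesc': 'Example source A', 'chemicals': [{'id': 10}, {'id': 11}]},
    {'sourceId': 2, 'sourceDesc': 'Example source B', 'chemicals': None},
    {'sourceId': 3, 'sourceDesc': 'Example source C', 'chemicals': [{'id': 12}] * 3},
]
for row in chemical_counts(sample):
    print(row)
```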
In [12]:
# .ix was removed in pandas 1.0; .iloc selects row 9 by position
sources_df.iloc[9]
Out[12]:
In [13]:
# Position-based replacement for the removed .ix indexer (row 9, first column)
DataFrame(sources_df.iloc[9, 0])
Out[13]:
This tells us that if you ask ChemView for information from SNUR sources, you will get information about... just two chemicals?
In [18]:
uri = URIBASE + 'chemicals/f&sourceIds=1' #&chemicalIds=&snurUseIds=&useIds=&groupIds=&categoryIds=&endpointKeys=&synonymIds='
# &sEcho=4&iColumns=6&sColumns=&iDisplayStart=0&iDisplayLength=10&mDataProp_0=0&mDataProp_1=1\
# &mDataProp_2=2&mDataProp_3=3&mDataProp_4=4&mDataProp_5=5&sSearch=&bRegex=false&sSearch_0=\
# &bRegex_0=false&bSearchable_0=true&sSearch_1=&bRegex_1=false&bSearchable_1=true&sSearch_2=\
# &bRegex_2=false&bSearchable_2=true&sSearch_3=&bRegex_3=false&bSearchable_3=true&sSearch_4=\
# &bRegex_4=false&bSearchable_4=true&sSearch_5=&bRegex_5=false&bSearchable_5=true&iSortCol_0=0\
# &sSortDir_0=asc&iSortingCols=1&bSortable_0=false&bSortable_1=true&bSortable_2=false\
# &bSortable_3=false&bSortable_4=false&bSortable_5=false'
print(uri)
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)
j
Out[18]:
What if we look up info about one of these chemicals, specifying SNURs as the source?
In [14]:
uri = URIBASE + 'chemicals/3554283?sourceIds=1'
print(uri)
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)
j
Out[14]:
That returned nothing.
OK, we also know that chemical ID 3565112 corresponds to PMN Number P-11-0607 and that ChemView has a record of the SNURs linked to this substance...
In [15]:
uri = URIBASE + 'chemicals/3565112?sourceIds=1&synonymIds='
print(uri)
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)
j
Out[15]:
In [16]:
print(j['sources'][0]['chemicals'][0]['externalLink'])
That did return some actual information. The external links about the specific chemicals both point to a PDF of the SNURs published in the Federal Register. We already know that this is not the extent of EPA's public data on these SNURs, so where is it in ChemView?
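Cell 16 prints only the first link. To check every chemical under every source at once, the nesting j['sources'][i]['chemicals'][k]['externalLink'] can be walked in a loop. This is a sketch: the function name external_links is hypothetical, and the sample record uses placeholder URLs, not real ChemView output.

```python
def external_links(record):
    """Collect each distinct externalLink under record['sources'][*]['chemicals'][*]."""
    links = []
    for source in record.get('sources', []):
        for chem in source.get('chemicals', []):
            link = chem.get('externalLink')
            if link and link not in links:  # skip blanks and duplicates, keep order
                links.append(link)
    return links

# Hypothetical record with the same nesting as the response above.
sample = {'sources': [{'chemicals': [
    {'externalLink': 'http://example.com/snur.pdf'},
    {'externalLink': 'http://example.com/snur.pdf'},
    {'externalLink': None},
]}]}
print(external_links(sample))
```

On the real response this would be called as external_links(j); a single entry in the result would confirm that all chemicals point to the same document.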
I navigated to the ChemView record for PMN number P-09-0248 and clicked on it to get a summary of the SNUR:
Below, I copied the link that it gives you when you click "E-mail Url", but added &mediaType=json.
In [17]:
uri = 'http://java.epa.gov/chemview?tf=0&ch=P-09-0248&su=2-5-6-7&as=3-10-9-8&ac=1-16&ma=4-11-1981377&tds=0&tdl=10&tas1=1&tas2=asc&tas3=undefined&tss=&modal=template&modalId=3517608&modalSrc=1&modalDetailId=3517610&mediaType=json'
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)
j
Out[17]:
Apparently these data are not available through the API yet.
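Before tweaking links like this by hand, it can help to split the query string into its individual parameters. The sketch below does that for the URI from cell 17 using only the standard library; it lists the parameters without making any claim about what ChemView does with each one.

```python
from urllib.parse import urlsplit, parse_qs

# The "E-mail Url" link from cell 17, with &mediaType=json appended.
uri = ('http://java.epa.gov/chemview?tf=0&ch=P-09-0248&su=2-5-6-7&as=3-10-9-8'
       '&ac=1-16&ma=4-11-1981377&tds=0&tdl=10&tas1=1&tas2=asc&tas3=undefined&tss='
       '&modal=template&modalId=3517608&modalSrc=1&modalDetailId=3517610&mediaType=json')

# keep_blank_values=True keeps parameters such as tss= that have no value.
params = parse_qs(urlsplit(uri).query, keep_blank_values=True)
for name, values in params.items():
    print(name, values)
```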
Trying something else by tweaking the URL from a different search...
In [22]:
uri = 'http://java.epa.gov/chemview?tf=1&su=2-5-6-7&as=3-10-9-8&ac=1-16&ma=4-11-1981377&tds=0&tdl=10&tas1=1&tas2=asc&tas3=undefined&tss=&modal=template&modalId=103298&modalSrc=3&modalDetailId=5636434&modalVae=0-0-1-0-0&mediaType=json'
r = requests.get(uri, headers = {'Accept': 'application/json, */*'})
j = json.loads(r.text)
j
Out[22]: