The openaq
api is an easy-to-use wrapper built around the OpenAQ Api. Complete API documentation can be found on their website.
There are no keys or rate limits (as of March 2017), so working with the API is straight forward. If building a website or app, you may want to just use the python wrapper and interact with the data in json format. However, the rest of this tutorial will assume you are interested in analyzing the data. To get more out of it, I recommend installing seaborn
for manipulating the asthetics of plots, and working with data as DataFrames using pandas
. For more information on these, check out the installation section of this documentation.
From this point forward, I assume you have at least a basic knowledge of python and matplotlib. This documentation was built using the following versions of all packages:
In [32]:
import pandas as pd
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
import openaq
import warnings
warnings.simplefilter('ignore')
%matplotlib inline
# Set major seaborn asthetics
sns.set("notebook", style='ticks', font_scale=1.0)
# Increase the quality of inline plots
mpl.rcParams['figure.dpi']= 500
print ("pandas v{}".format(pd.__version__))
print ("matplotlib v{}".format(mpl.__version__))
print ("seaborn v{}".format(sns.__version__))
print ("openaq v{}".format(openaq.__version__))
The OpenAQ APi has only eight endpoints that we are interested in:
For detailed documentation about each one in the context of this API wrapper, please check out the API documentation.
Real quick, let's go ahead and initiate an instance of the openaq.OpenAQ
class so we can begin looking at data:
In [2]:
api = openaq.OpenAQ()
In [3]:
resp = api.cities(df=True, limit=10000)
# display the first 10 rows
resp.info()
So we retrieved 1400+ entries from the database. We can then take a look at them:
In [4]:
print (resp.head(10))
Let's try to find out which ones are in India:
In [5]:
print (resp.query("country == 'IN'"))
Great! For the rest of the tutorial, we are going to focus on Delhi, India. Why? Well..because there are over 500,000 data points and my personal research is primarily in India. We will also take a look at some $SO_2$ data from Hawai'i later on (another great research locale).
In [7]:
res = api.countries(limit=10000, df=True)
print (res.head())
If you are interested in getting information pertaining to the individual data fetch operations, go ahead and use this endpoint. Most people won't need to use this. This API method does not allow the df
parameter; if you would like it to be added, drop me a message.
Otherwise, here is how you can access the json-formatted data:
In [9]:
status, resp = api.fetches(limit=1)
# Print out the meta info
resp['meta']
Out[9]:
In [10]:
res = api.parameters(df=True)
print (res)
In [11]:
res = api.sources(df=True)
# Print out the first one
res.ix[0]
Out[11]:
In [12]:
res = api.locations(df=True)
res.info()
In [13]:
# print out the first one
res.ix[0]
Out[13]:
What if we just want to grab the locations in Delhi?
In [14]:
res = api.locations(city='Delhi', df=True)
res.ix[0]
Out[14]:
What about just figuring out which locations in Delhi have $PM_{2.5}$ data?
In [15]:
res = api.locations(city='Delhi', parameter='pm25', df=True)
res.ix[0]
Out[15]:
In [17]:
res = api.latest(city='Delhi', parameter='pm25', df=True)
res.head()
Out[17]:
What about the most recent $SO_2$ data in Hawii?
In [18]:
res = api.latest(city='Hilo', parameter='so2', df=True)
res
Out[18]:
In [19]:
res = api.measurements(city='Delhi', parameter='pm25', limit=10000, df=True)
# Print out the statistics on a per-location basiss
res.groupby(['location'])['value'].describe()
Out[19]:
Clearly, we should be doing some serious data cleaning ;) Why don't we go ahead and plot all of these locations on a figure.
In [38]:
fig, ax = plt.subplots(1, figsize=(10, 6))
for group, df in res.groupby('location'):
# Query the data to only get positive values and resample to hourly
_df = df.query("value >= 0.0").resample('1h').mean()
_df.value.plot(ax=ax, label=group)
ax.legend(loc='best')
ax.set_ylabel("$PM_{2.5}$ [$\mu g m^{-3}$]", fontsize=20)
ax.set_xlabel("")
sns.despine(offset=5)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.show()