Retrieving Data from the TataAQ Server

Data can be accessed using the API via this Python library (py-tata). It can also be retrieved in many other ways using the API directly; however, that isn't documented yet and probably won't be in the foreseeable future.

To retrieve data, you must have an API key, which you can find through your account on the TataAQ website. Here, I will show how to retrieve data for one of the instruments and export it to an external file format (e.g. csv or feather).

Initialize the API Wrapper


In [1]:
import tataaq

YOUR_API_KEY_HERE = ""  # paste the API key from your TataAQ account here

api = tataaq.TataAQ(apikey=YOUR_API_KEY_HERE)

# Ping the server to see if we have valid auth credentials
resp = api.ping()

print(resp.status_code)


200
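
If the key is missing or invalid, the server will return an error status instead of 200. The exact error code isn't documented, so a minimal guard just checks for non-200:


In [ ]:
# Fail fast on bad credentials: anything other than 200 most likely
# means the API key is missing or invalid
if resp.status_code != 200:
    raise RuntimeError("Authentication failed; check your API key")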

Import Things

We are going to import some other libraries that are handy for working with data.


In [2]:
import pandas as pd
import feather

Retrieve Information about a Device

To retrieve information about a device, you need to know its Device ID. This can be found by looking at the website.

The device API endpoint returns a Response object from the Python requests library; you can learn more about these in the requests documentation. All you really need to know is shown below.

Example: grab information for device_id="EBAM001"


In [3]:
# Request device information for EBAM001
resp = api.device("EBAM001")

Access the status code of the previous response


In [4]:
resp.status_code


Out[4]:
200

Access the header information


In [5]:
resp.headers


Out[5]:
{'Server': 'nginx/1.4.6 (Ubuntu)', 'Date': 'Sun, 26 Mar 2017 17:18:28 GMT', 'Content-Type': 'application/json', 'Content-Length': '358', 'Connection': 'keep-alive', 'ETag': '"8a01d67b6b8ce76bf0efee3b23fe6f3f"'}

Access the JSON data


In [6]:
resp.json()


Out[6]:
{'city': 'Delhi',
 'country': 'IN',
 'last_updated': '2017-03-26T17:10:31',
 'latitude': '28.6257',
 'location': 'Connaught Place',
 'longitude': '77.2276',
 'model': 'E-BAM',
 'name': 'E-BAM',
 'outdoors': True,
 'sn': 'EBAM001',
 'timezone': 'Asia/Kolkata',
 'url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001'}
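
Since resp.json() is just a Python dictionary, individual fields are easy to pull out. For example, the coordinates come back as strings, so cast them before doing any math:


In [ ]:
info = resp.json()

# latitude/longitude are returned as strings; cast them to floats
lat, lon = float(info['latitude']), float(info['longitude'])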

Retrieve the Actual Data

First, we are going to retrieve the data and return it in JSON format.


In [7]:
# Request the data
resp = api.data("EBAM001")

In [8]:
# Print the meta information
resp.json()['meta']


Out[8]:
{'first_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?per_page=50&page=1&expand=1',
 'last_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?per_page=50&page=245&expand=1',
 'next_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?per_page=50&expand=1&page=2',
 'page': 1,
 'pages': 245,
 'per_page': 50,
 'prev_page': None,
 'total': 12205}

We can get the actual data by accessing the "data" key in the resp.json() dictionary:


In [9]:
# Print the first row of data
resp.json()['data'][0]


Out[9]:
{'alarm': 0,
 'ambient temperature': 27.5,
 'conc_hr': 0.046,
 'conc_rt': 0.077,
 'flowrate': 16.7,
 'instrument': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001',
 'parameter': 'pm25',
 'rh external': 64.0,
 'rh internal': 43.0,
 'timestamp': '2016-05-30T02:20:00',
 'timestamp_local': '2016-05-30T07:50:00+05:30',
 'unit': 'mg/m3',
 'wind direction': 36.0,
 'wind speed': 2.7}

We can also add keywords to our request. The most useful ones are the following:

  • per_page: alter the number of data points returned per page (the default is 50)
  • page: request a specific page of results, which lets you iterate over all pages (see the pagination sketch after the filter example below)
  • filter: a complex, very powerful keyword; examples are shown below

The filter keyword allows you to select on any column in the database. The most useful filters query over ranges of time. For example, if we wanted to return all data for EBAM001 after 2017-01-01, we would use the filter keyword as follows:

filter="timestamp,gt,2017-01-01"

We can also combine multiple filter arguments by separating them with a semicolon. For example, to return all data from the month of January 2017:

filter="timestamp,gt,2017-01-01;timestamp,lt,2017-02-01"

See below for working examples.


In [10]:
# return data after 2017-01-01

resp = api.data("EBAM001", per_page=100, filter="timestamp,gt,2017-01-01")

resp.json()['meta']


Out[10]:
{'first_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?filter=timestamp%2Cgt%2C2017-01-01&per_page=100&page=1&expand=1',
 'last_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?filter=timestamp%2Cgt%2C2017-01-01&per_page=100&page=14&expand=1',
 'next_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?per_page=100&expand=1&page=2',
 'page': 1,
 'pages': 14,
 'per_page': 100,
 'prev_page': None,
 'total': 1302}
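
The meta block tells us this query spans 14 pages. To collect them all, we can iterate with the page keyword. This is a minimal sketch; it assumes api.data forwards page to the request just like per_page and filter:


In [ ]:
# Collect every row by walking all pages (a sketch; assumes api.data
# accepts the `page` keyword listed above)
rows = []
for page in range(1, resp.json()['meta']['pages'] + 1):
    r = api.data("EBAM001", per_page=100, page=page,
                 filter="timestamp,gt,2017-01-01")
    rows.extend(r.json()['data'])

len(rows)  # should equal the 'total' field in the meta block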

Utilizing the Magical DataFrame

We can also retrieve the data and have it returned as a pandas DataFrame, which is far more useful. When you add the dataframe=True argument to the request, it returns the meta information as a dictionary and the data as a pandas DataFrame.

Let's see how this works:


In [11]:
meta, df = api.data("EBAM001", dataframe=True)

meta


Out[11]:
{'first_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?per_page=50&page=1&expand=1',
 'last_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?per_page=50&page=245&expand=1',
 'next_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?per_page=50&expand=1&page=2',
 'page': 1,
 'pages': 245,
 'per_page': 50,
 'prev_page': None,
 'total': 12205}

Let's take a look at our data now:


In [12]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 14 columns):
alarm                  50 non-null int64
ambient temperature    50 non-null float64
conc_hr                50 non-null float64
conc_rt                50 non-null float64
flowrate               50 non-null float64
instrument             50 non-null object
parameter              50 non-null object
rh external            50 non-null float64
rh internal            50 non-null float64
timestamp              50 non-null datetime64[ns]
timestamp_local        50 non-null datetime64[ns]
unit                   50 non-null object
wind direction         50 non-null float64
wind speed             50 non-null float64
dtypes: datetime64[ns](2), float64(8), int64(1), object(3)
memory usage: 5.5+ KB

Note that the previous request only returned the first page of results (50 rows). Let's get all of the E-BAM's data for 2017 by raising per_page and filtering on the timestamp:


In [13]:
meta, df = api.data("EBAM001", per_page=10000, filter="timestamp,gt,2017-01-01", dataframe=True)

df.index = df['timestamp_local']

df.info()


<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1302 entries, 2017-01-13 21:00:00 to 2017-03-27 04:00:00
Data columns (total 14 columns):
alarm                  1302 non-null int64
ambient temperature    1302 non-null float64
conc_hr                1302 non-null float64
conc_rt                1302 non-null float64
flowrate               1302 non-null float64
instrument             1302 non-null object
parameter              1302 non-null object
rh external            1302 non-null float64
rh internal            1302 non-null float64
timestamp              1302 non-null datetime64[ns]
timestamp_local        1302 non-null datetime64[ns]
unit                   1302 non-null object
wind direction         1302 non-null float64
wind speed             1302 non-null float64
dtypes: datetime64[ns](2), float64(8), int64(1), object(3)
memory usage: 152.6+ KB

In [14]:
# Delete a couple of columns so we can easily peek at the data

del df['instrument']
del df['timestamp']

df.head()


Out[14]:
                     alarm  ambient temperature  conc_hr  conc_rt  flowrate  parameter  rh external  rh internal      timestamp_local   unit  wind direction  wind speed
timestamp_local
2017-01-13 21:00:00      0                 19.9     0.07    0.097      16.7       pm25         23.0         13.0  2017-01-13 21:00:00  mg/m3           151.0         1.3
2017-01-13 21:50:00      0                 18.6     0.07    0.071      16.7       pm25         25.0         15.0  2017-01-13 21:50:00  mg/m3           151.0         1.2
2017-01-16 14:50:00    256                 13.7     0.00    0.000      14.2       pm25         55.0         33.0  2017-01-16 14:50:00  mg/m3           279.0         0.8
2017-01-16 15:20:00    256                 15.4     0.00    0.163      16.7       pm25         53.0         33.0  2017-01-16 15:20:00  mg/m3           196.0         0.6
2017-01-16 15:40:00    256                 16.3     0.18    0.191      16.7       pm25         51.0         30.0  2017-01-16 15:40:00  mg/m3           199.0         0.9
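
Because the index is now a DatetimeIndex, time-series operations like resampling become one-liners. As an illustration (not part of the original workflow), here is a daily mean of the hourly concentration, converted from mg/m3 to ug/m3:


In [ ]:
# Daily mean PM2.5 concentration; multiply by 1000 to convert
# from mg/m3 to ug/m3
daily_conc = df['conc_hr'].resample('1D').mean() * 1000.

daily_conc.head()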

Export Data

Once in a DataFrame, it is super easy to export and save your data. I personally recommend feather, as it is fast and language agnostic: there are libraries for R, Python, and Julia, making it easy to analyze your data in whichever of those languages you prefer (OSS only, obviously).

To export the dataframe to feather, do the following:


In [15]:
%time feather.write_dataframe(df, "EBAM001_2017_data.feather")


CPU times: user 2.12 ms, sys: 463 µs, total: 2.59 ms
Wall time: 2.58 ms
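
If you would rather have a csv, pandas can write one directly from the same DataFrame:


In [ ]:
# Export the same DataFrame to csv instead
df.to_csv("EBAM001_2017_data.csv")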
