Retrieving Data from the TataAQ Server

Data can be accessed using the API via this Python library (py-tata). It can also be retrieved in many other ways using the API directly; however, that isn't documented yet and probably won't be in the foreseeable future.

To retrieve data, you must have an API key, which you can find through your account on the TataAQ website. Here, I will show how to retrieve data for one of the instruments and export it to an external file format (e.g. csv or feather).

Initialize the API Wrapper


In [1]:
import tataaq

YOUR_API_KEY_HERE = ""  # paste the API key from your TataAQ account here

api = tataaq.TataAQ(apikey=YOUR_API_KEY_HERE)

# Ping the server to see if we have valid auth credentials
resp = api.ping()

print(resp.status_code)


200
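
If the key is missing or invalid, the server will return an error status instead of 200. The exact error code isn't documented, so a minimal guard just checks for non-200:


In [ ]:
# Fail fast on bad credentials: anything other than 200 most likely
# means the API key is missing or invalid
if resp.status_code != 200:
    raise RuntimeError("Authentication failed; check your API key")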

Import Things

We are going to import some other libraries that are handy for working with data.


In [2]:
import pandas as pd
import feather

Retrieve Information about a Device

To retrieve information about a device, you need to know its Device ID. This can be found by looking at the website.

The device API endpoint returns a Response object from the Python requests library; you can learn more about these in the requests documentation. All you really need to know is shown below.

Example: grab information for device_id="EBAM001"


In [3]:
# Request device information for EBAM001
resp = api.device("EBAM001")

Access the status code of the previous response


In [4]:
resp.status_code


Out[4]:
200

Access the header information


In [5]:
resp.headers


Out[5]:
{'Server': 'nginx/1.4.6 (Ubuntu)', 'Date': 'Sun, 26 Mar 2017 17:18:28 GMT', 'Content-Type': 'application/json', 'Content-Length': '358', 'Connection': 'keep-alive', 'ETag': '"8a01d67b6b8ce76bf0efee3b23fe6f3f"'}

Access the JSON data


In [6]:
resp.json()


Out[6]:
{'city': 'Delhi',
 'country': 'IN',
 'last_updated': '2017-03-26T17:10:31',
 'latitude': '28.6257',
 'location': 'Connaught Place',
 'longitude': '77.2276',
 'model': 'E-BAM',
 'name': 'E-BAM',
 'outdoors': True,
 'sn': 'EBAM001',
 'timezone': 'Asia/Kolkata',
 'url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001'}
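
Since resp.json() is just a Python dictionary, individual fields are easy to pull out. For example, the coordinates come back as strings, so cast them before doing any math:


In [ ]:
info = resp.json()

# latitude/longitude are returned as strings; cast them to floats
lat, lon = float(info['latitude']), float(info['longitude'])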

Retrieve the Actual Data

First, we are going to retrieve the data and return it in JSON format.


In [7]:
# Request the data
resp = api.data("EBAM001")

In [8]:
# Print the meta information
resp.json()['meta']


Out[8]:
{'first_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?per_page=50&page=1&expand=1',
 'last_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?per_page=50&page=245&expand=1',
 'next_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?per_page=50&expand=1&page=2',
 'page': 1,
 'pages': 245,
 'per_page': 50,
 'prev_page': None,
 'total': 12205}

We can get the actual data by accessing the "data" key in the resp.json() dictionary:


In [9]:
# Print the first row of data
resp.json()['data'][0]


Out[9]:
{'alarm': 0,
 'ambient temperature': 27.5,
 'conc_hr': 0.046,
 'conc_rt': 0.077,
 'flowrate': 16.7,
 'instrument': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001',
 'parameter': 'pm25',
 'rh external': 64.0,
 'rh internal': 43.0,
 'timestamp': '2016-05-30T02:20:00',
 'timestamp_local': '2016-05-30T07:50:00+05:30',
 'unit': 'mg/m3',
 'wind direction': 36.0,
 'wind speed': 2.7}

We can also add keywords to our request. The most useful ones are the following:

  • per_page: alter the number of data points returned per page (the default is 50)
  • page: request a specific page of results, which lets you iterate over all pages (see the pagination sketch after the filter example below)
  • filter: a complex, very powerful keyword; examples are shown below

The filter keyword allows you to select on any column in the database. The most useful filters query over ranges of time. For example, if we wanted to return all data for EBAM001 after 2017-01-01, we would use the filter keyword as follows:

filter="timestamp,gt,2017-01-01"

We can also combine multiple filter arguments by separating them with a semicolon. For example, to return all data from the month of January 2017:

filter="timestamp,gt,2017-01-01;timestamp,lt,2017-02-01"

See below for working examples.


In [10]:
# return data after 2017-01-01

resp = api.data("EBAM001", per_page=100, filter="timestamp,gt,2017-01-01")

resp.json()['meta']


Out[10]:
{'first_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?filter=timestamp%2Cgt%2C2017-01-01&per_page=100&page=1&expand=1',
 'last_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?filter=timestamp%2Cgt%2C2017-01-01&per_page=100&page=14&expand=1',
 'next_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?per_page=100&expand=1&page=2',
 'page': 1,
 'pages': 14,
 'per_page': 100,
 'prev_page': None,
 'total': 1302}
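
The meta block tells us this query spans 14 pages. To collect them all, we can iterate with the page keyword. This is a minimal sketch; it assumes api.data forwards page to the request just like per_page and filter:


In [ ]:
# Collect every row by walking all pages (a sketch; assumes api.data
# accepts the `page` keyword listed above)
rows = []
for page in range(1, resp.json()['meta']['pages'] + 1):
    r = api.data("EBAM001", per_page=100, page=page,
                 filter="timestamp,gt,2017-01-01")
    rows.extend(r.json()['data'])

len(rows)  # should equal the 'total' field in the meta block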

Utilizing the Magical DataFrame

We can also retrieve the data and have it returned as a pandas DataFrame, which is far more useful. When you add the dataframe=True argument to the request, it returns the meta information as a dictionary and the data as a pandas DataFrame.

Let's see how this works:


In [11]:
meta, df = api.data("EBAM001", dataframe=True)

meta


Out[11]:
{'first_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?per_page=50&page=1&expand=1',
 'last_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?per_page=50&page=245&expand=1',
 'next_url': 'https://tatacenter-airquality.mit.edu/api/v1.0/device/EBAM001/data/?per_page=50&expand=1&page=2',
 'page': 1,
 'pages': 245,
 'per_page': 50,
 'prev_page': None,
 'total': 12205}

Let's take a look at our data now:


In [12]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 14 columns):
alarm                  50 non-null int64
ambient temperature    50 non-null float64
conc_hr                50 non-null float64
conc_rt                50 non-null float64
flowrate               50 non-null float64
instrument             50 non-null object
parameter              50 non-null object
rh external            50 non-null float64
rh internal            50 non-null float64
timestamp              50 non-null datetime64[ns]
timestamp_local        50 non-null datetime64[ns]
unit                   50 non-null object
wind direction         50 non-null float64
wind speed             50 non-null float64
dtypes: datetime64[ns](2), float64(8), int64(1), object(3)
memory usage: 5.5+ KB

Note that the previous request only returned the first page of results (50 rows). Let's get all of the E-BAM's data for 2017 by raising per_page and filtering on the timestamp:


In [13]:
meta, df = api.data("EBAM001", per_page=10000, filter="timestamp,gt,2017-01-01", dataframe=True)

df.index = df['timestamp_local']

df.info()


<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1302 entries, 2017-01-13 21:00:00 to 2017-03-27 04:00:00
Data columns (total 14 columns):
alarm                  1302 non-null int64
ambient temperature    1302 non-null float64
conc_hr                1302 non-null float64
conc_rt                1302 non-null float64
flowrate               1302 non-null float64
instrument             1302 non-null object
parameter              1302 non-null object
rh external            1302 non-null float64
rh internal            1302 non-null float64
timestamp              1302 non-null datetime64[ns]
timestamp_local        1302 non-null datetime64[ns]
unit                   1302 non-null object
wind direction         1302 non-null float64
wind speed             1302 non-null float64
dtypes: datetime64[ns](2), float64(8), int64(1), object(3)
memory usage: 152.6+ KB

In [14]:
# Delete a couple of columns so we can easily peek at the data

del df['instrument']
del df['timestamp']

df.head()


Out[14]:
                     alarm  ambient temperature  conc_hr  conc_rt  flowrate  parameter  rh external  rh internal      timestamp_local   unit  wind direction  wind speed
timestamp_local
2017-01-13 21:00:00      0                 19.9     0.07    0.097      16.7       pm25         23.0         13.0  2017-01-13 21:00:00  mg/m3           151.0         1.3
2017-01-13 21:50:00      0                 18.6     0.07    0.071      16.7       pm25         25.0         15.0  2017-01-13 21:50:00  mg/m3           151.0         1.2
2017-01-16 14:50:00    256                 13.7     0.00    0.000      14.2       pm25         55.0         33.0  2017-01-16 14:50:00  mg/m3           279.0         0.8
2017-01-16 15:20:00    256                 15.4     0.00    0.163      16.7       pm25         53.0         33.0  2017-01-16 15:20:00  mg/m3           196.0         0.6
2017-01-16 15:40:00    256                 16.3     0.18    0.191      16.7       pm25         51.0         30.0  2017-01-16 15:40:00  mg/m3           199.0         0.9
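
Because the index is now a DatetimeIndex, time-series operations like resampling become one-liners. As an illustration (not part of the original workflow), here is a daily mean of the hourly concentration, converted from mg/m3 to ug/m3:


In [ ]:
# Daily mean PM2.5 concentration; multiply by 1000 to convert
# from mg/m3 to ug/m3
daily_conc = df['conc_hr'].resample('1D').mean() * 1000.

daily_conc.head()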

Export Data

Once in a DataFrame, it is super easy to export and save your data. I personally recommend feather, as it is fast and language agnostic: there are libraries for R, Python, and Julia, making it easy to analyze your data in whichever of those languages you prefer (OSS only, obviously).

To export the dataframe to feather, do the following:


In [15]:
%time feather.write_dataframe(df, "EBAM001_2017_data.feather")


CPU times: user 2.12 ms, sys: 463 µs, total: 2.59 ms
Wall time: 2.58 ms
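
If you would rather have a csv, pandas can write one directly from the same DataFrame:


In [ ]:
# Export the same DataFrame to csv instead
df.to_csv("EBAM001_2017_data.csv")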
