Using Treasure Data with Python and Pandas

Treasure Data has a python client, which means pandas/python users can connect directly from their iPython Notebooks.

All you need is a Treasure Data account, which you can get from here


In [2]:
import tdclient
import pandas as pd
import numpy as np
%matplotlib inline

Getting Treasure Data's apikey

You need to get your Treasure Data API key. There are two ways to fetch your API keys after you sign up for Treasure Data.

  1. From web console: Please access this URL. At the right most column, you can retrieve the API key. You want to use the Normal, not Write-Only API keys to run queries.
  2. From CLI: If you are the td command user, running the following command exposes your API key.
     td apikey:show

In [3]:
apikey = 'Your API key here' # Setting your API key

In [4]:
client = tdclient.Client(apikey) # instantiating the client

Running a query against the sample dataset

As you can see below, running queries is easy. Just use the query method, which accepts three arguments.

  1. The first argument is the name of the database
  2. The second argument is the query string (Make sure you use single quotes if you are using the Presto engine!)
  3. The optional keyword arguments. I am using type='presto' here to use Presto and not Hive.

In [5]:
job = client.query('sample_datasets',
                   "SELECT TD_TIME_FORMAT(time, 'yyyy') AS t, SUM(volume) "
                   "FROM nasdaq "
                   "WHERE symbol='AMZN' "
                   "GROUP BY TD_TIME_FORMAT(time, 'yyyy') "
                   "ORDER BY t", type='presto')

Asynchronous execution

Your query creates a job asynchronously. Please check the job is

  1. finished (job.finished() should return True)
  2. successful (job.status() should return success)

In [6]:
[job.status(), job.finished()]


Out[6]:
[u'success', True]

In [7]:
results = [r for r in job.result()]

In [8]:
results_df = pd.DataFrame.from_records(results, columns=('year', 'AMZN trade volume'))

In [9]:
results_df.plot(x='year')


Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0xba64748>