Datasets: Downloading Data from Google Trends

28th May 2014

Neil Lawrence

This data set collection was inspired by an IPython notebook from sahuguet, which made queries to Google Trends and downloaded the results. We've modified the download to cache the results of each query. Making multiple calls to the Google API results in a block due to terms of service violations, and caching the data locally prevents this from happening.
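To illustrate the caching idea, here is a minimal sketch of wrapping a query function so that repeated calls read from a local file rather than hitting the remote service again. This is not the code used inside pods: the names `cached_query` and `fake_fetch`, and the choice of JSON files keyed on the query terms, are assumptions made for illustration only.

```python
import json
import os
import tempfile


def cached_query(terms, fetch, cache_dir):
    """Return fetch(terms), reusing a local JSON copy when one exists.

    Hypothetical helper for illustration; not part of pods.
    """
    # Build a file name from the query terms, keeping their order.
    key = "_".join(t.replace(" ", "-") for t in terms)
    path = os.path.join(cache_dir, key + ".json")
    if os.path.exists(path):
        # Cache hit: read the stored result, no remote call is made.
        with open(path) as f:
            return json.load(f)
    # Cache miss: make the (expensive, rate-limited) call once and store it.
    result = fetch(terms)
    with open(path, "w") as f:
        json.dump(result, f)
    return result


# A stand-in for the remote query that records how often it is called.
calls = []


def fake_fetch(terms):
    calls.append(list(terms))
    return {"terms": list(terms), "values": [1, 2, 3]}


cache_dir = tempfile.mkdtemp()
first = cached_query(["big data", "data science"], fake_fetch, cache_dir)
second = cached_query(["big data", "data science"], fake_fetch, cache_dir)
print(len(calls))  # the remote stand-in was only called once
```

The second call returns the same result as the first, but it is served from disk, which is the behaviour that avoids repeated requests to the service.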


In [ ]:
import pods
%matplotlib inline

In [ ]:
# calling without arguments uses the default query terms
data = pods.datasets.google_trends()

The default query terms are 'big data', 'data science' and 'machine learning'. The dictionary returned from the call contains the standard 'X' and 'Y' keys that are ready to be used in the GPy toolkit as inputs to the Gaussian process. In this case the 'X' variables are the time (first column) and an index representing the query (second column).


In [ ]:
print(data['X'][284, :])

So the 284th element of X is the 34th time point of query term 2, which in this case is the 34th time point of the 'machine learning' time series. The value of the time series at that point is given by the corresponding row of Y.


In [ ]:
print(data['Y'][284, :])
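Because every query term shares the same X and Y arrays, it can be handy to pull out the rows belonging to a single term. The sketch below uses a small synthetic array rather than the downloaded data, and it assumes the layout described above: time in the first column of X, the query index in the second, and the series stacked one term after another. The variable names are illustrative.

```python
import numpy as np

# Synthetic stand-in for the stacked layout: 3 query terms, 5 time points each.
n_terms, n_times = 3, 5
times = np.tile(np.arange(n_times, dtype=float), n_terms)    # 0..4 repeated per term
index = np.repeat(np.arange(n_terms, dtype=float), n_times)  # 0,0,..,1,1,..,2,2,..
X = np.column_stack([times, index])
Y = np.arange(n_terms * n_times, dtype=float)[:, None]

# Select every row whose query index (second column of X) equals 2.
mask = X[:, 1] == 2
X_term = X[mask, 0]  # time points for query term 2
Y_term = Y[mask, 0]  # series values for query term 2

print(X_term)
print(Y_term)
```

The same boolean mask could be applied to `data['X']` and `data['Y']`, provided the returned arrays follow the layout assumed here.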

The dictionary also contains a pandas data frame of the trend data, which is in line with what sahuguet originally returned.


In [ ]:
data['data frame'].describe()

And we can plot the trends data to see how interest in each query term has changed over time.


In [ ]:
data['data frame'].set_index('Date', inplace=True) # Set date column as index
data['data frame'].plot()
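One thing to be aware of with the cell above: because `set_index` is called with `inplace=True`, the stored data frame is modified, so running the cell a second time raises a `KeyError` ('Date' is no longer a column). A non-mutating alternative is sketched below on a small made-up data frame; the numbers are invented for illustration and are not Google Trends values.

```python
import pandas as pd

# Made-up values standing in for the downloaded trends data frame.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2014-01-05", "2014-01-12", "2014-01-19"]),
    "cats": [70, 72, 75],
    "dogs": [80, 79, 83],
})

# Without inplace=True, set_index returns a new frame and leaves df untouched,
# so this line can be re-run safely.
indexed = df.set_index("Date")

print(list(df.columns))       # the original frame still has its Date column
print(list(indexed.columns))  # the new frame uses Date as its index
```

Calling `indexed.plot()` would then give the same kind of plot, with the date on the horizontal axis.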

Dogs, Cats and Rabbits

Another data set we might consider downloading from Google Trends compares different pets. Below we consider cats, dogs and rabbits.


In [ ]:
data = pods.datasets.google_trends(['cats', 'dogs', 'rabbits'])
data['data frame'].set_index('Date', inplace=True)
data['data frame'].plot()

Here we've plotted the data in the same manner as sahuguet suggested in his original notebook, using the plotting facility of pandas.

Games Consoles

Finally, we can compare the popularity of different games consoles.


In [ ]:
data = pods.datasets.google_trends(['xbox one', 'wii u', 'ps4'])

In [ ]:
data['data frame'].set_index('Date', inplace=True)
data['data frame'].plot()