Homework 1

Google Trends is pretty awesome, except that on the site you cannot do more than overlay plots. Here we'll play with search term data downloaded from Google and draw our own conclusions.

Data from: https://www.google.com/trends/explore#q=spring%20break%2C%20textbooks%2C%20norad%2C%20skiing%2C%20global%20warming&cmpt=q&tz=Etc%2FGMT%2B4

We will be using numpy and matplotlib to explore the data. Remember that you can import both into the interactive namespace at once with the IPython magic:


In [2]:
%pylab inline


Populating the interactive namespace from numpy and matplotlib

1. Use the "trends.csv" file and csv2rec() to import the data and reproduce this plot:


In [3]:
# we can import the CSV data as a numpy rec array
from matplotlib.pylab import csv2rec
trends = csv2rec('trends.csv')
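
csv2rec() has since been deprecated and removed from matplotlib, so the cell above only runs on older releases. The sketch below is one possible replacement. It uses np.genfromtxt to build a similar record array with fields such as trends.spring_break. It assumes trends.csv has a single header row like the made-up SAMPLE_CSV: a "Week" column of ISO dates followed by one column per term. The real export may have extra header lines or a different date format, so adjust skip_header or the date parsing to match. The helper name load_trends is hypothetical.

```python
import io
import numpy as np

# Hypothetical stand-in for trends.csv (made-up numbers, not real Trends data).
# The real file's header and date format may differ.
SAMPLE_CSV = """Week,spring break,textbooks,norad,skiing,global warming
2004-01-04,10,60,2,80,20
2004-01-11,15,70,1,85,22
2004-01-18,20,40,1,90,21
"""

def load_trends(source):
    """Read a trends-style CSV into a NumPy record array (hypothetical helper).

    names=True takes field names from the header row. Spaces become
    underscores and case_sensitive='lower' lowercases the names, so a
    column like "spring break" is available as trends.spring_break.
    """
    data = np.genfromtxt(source, delimiter=',', names=True, dtype=None,
                         encoding='utf-8', case_sensitive='lower')
    rec = data.view(np.recarray)  # attribute-style field access, like csv2rec
    # Assumes ISO dates (YYYY-MM-DD) in the first column.
    weeks = rec.week.astype('datetime64[D]')
    return rec, weeks

trends, weeks = load_trends(io.StringIO(SAMPLE_CSV))
print(trends.dtype.names)
print(trends.spring_break, weeks)
```

With the real file, load_trends('trends.csv') takes the place of the csv2rec call.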

In [ ]:

2. For each of the five search terms (including "global_warming"), determine the week of each year in which that term reached its peak. What patterns can you spot for any of the terms?
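A minimal sketch of the per-year peak search follows. It assumes the week dates are available as a datetime64[D] array aligned with one term's values. It groups samples by calendar year, takes np.argmax within each group, and reports the peak date plus its ISO week number from datetime.date.isocalendar(). The helper name peak_weeks and the synthetic series are illustrative only.

```python
import numpy as np

def peak_weeks(weeks, values):
    """Return {year: (date_of_peak, iso_week_number)} for one search term."""
    weeks = np.asarray(weeks, dtype='datetime64[D]')
    values = np.asarray(values)
    # Calendar year of each sample (datetime64[Y] counts years since 1970).
    years = weeks.astype('datetime64[Y]').astype(int) + 1970
    peaks = {}
    for year in np.unique(years):
        idx = np.flatnonzero(years == year)   # samples that fall in this year
        i = idx[np.argmax(values[idx])]       # first maximum wins ties
        date = weeks[i].item()                # -> datetime.date
        peaks[int(year)] = (weeks[i], date.isocalendar()[1])
    return peaks

# Synthetic example (not real Trends data): 156 weekly samples from 2010-01-03.
weeks = np.datetime64('2010-01-03') + 7 * np.arange(156)
values = np.ones(156)
values[10] = 5.0   # spike in March 2010
values[60] = 7.0   # spike in February 2011
for year, (date, week) in sorted(peak_weeks(weeks, values).items()):
    print(year, date, week)
```

ISO week numbers can assign dates in the first days of January to week 52 or 53 of the previous year. If that matters, report the peak date itself or the sample index within the year instead.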


In [ ]:

3. Which term has the largest scatter about its median value? Which term has the smallest scatter? Use $\sigma_{\rm median}^2 = \sum_i \left(x_i - {\rm median}(x)\right)^2$, where the sum runs over all weeks of a term's series $x$.
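The scatter formula translates almost directly into numpy. A minimal sketch follows. The helper names median_scatter and rank_by_scatter and the toy series are hypothetical. Every term covers the same weeks, so the unnormalized sum ranks the terms the same way a per-week average would.

```python
import numpy as np

def median_scatter(x):
    """sigma_median^2 = sum_i (x_i - median(x))^2 for one series."""
    x = np.asarray(x, dtype=float)
    return float(np.sum((x - np.median(x)) ** 2))

def rank_by_scatter(series):
    """Return (term_with_largest_scatter, term_with_smallest_scatter)."""
    scatter = {name: median_scatter(x) for name, x in series.items()}
    return max(scatter, key=scatter.get), min(scatter, key=scatter.get)

# Toy series (made-up numbers, not real Trends data).
toy = {'flat': [5, 5, 6, 5], 'spiky': [1, 2, 3, 10]}
print(rank_by_scatter(toy))
```

For the homework data, a dict such as {name: trends[name] for name in trends.dtype.names[1:]} could be passed in. That expression assumes the first field is the date column.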


In [ ]:

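4. Determine the time lag, in weeks, that maximizes the cross-correlation between "skiing" and "spring break". Repeat this for "norad" and "spring break".

numpy has tools for cross-correlations. For example, the autocorrelation of "spring break" (the series correlated with itself) can be computed and plotted against lag with:

    result = np.correlate(trends.spring_break, trends.spring_break, mode='full')
    plot(arange(result.size) - result.size // 2, result)

With mode='full' and two series of equal length, zero lag falls at index result.size // 2. Integer division keeps the lag axis in whole weeks.


In [ ]:

5. Download the trend data for two terms of your choosing and redo the questions above. You can obtain the data from: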

https://www.google.com/trends/
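For the lag question (question 4), here is a sketch of a reusable helper that can also be rerun on the terms you download. It relies on the documented behaviour of np.correlate(a, v, mode='full'): output index i corresponds to lag k = i - (len(v) - 1), where the correlation at lag k is the sum over n of a[n+k] * v[n]. Subtracting each series' mean first is an added choice not in the hint above. Without it, the overall search volume dominates the correlation. The helper name best_lag is hypothetical.

```python
import numpy as np

def best_lag(a, v):
    """Lag k (in samples) that maximizes sum_n a[n+k] * v[n].

    A positive k means features in `a` occur k samples after the
    matching features in `v`.
    """
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    # Remove the means so the peak reflects co-variation, not overall level.
    a = a - a.mean()
    v = v - v.mean()
    c = np.correlate(a, v, mode='full')
    lags = np.arange(c.size) - (v.size - 1)   # lag for each output index
    return int(lags[np.argmax(c)]), lags, c

# Synthetic check (random data, not Trends data): `a` is `v` delayed by 3 samples.
rng = np.random.default_rng(0)
v = rng.normal(size=200)
a = np.concatenate([np.zeros(3), v[:-3]])
print(best_lag(a, v)[0])
```

For question 4, best_lag(trends.skiing, trends.spring_break)[0] would give the lag of "skiing" relative to "spring break", assuming the rec-array fields are named as in the hint. Seasonal series like these can also correlate strongly at lags about a year away from the main peak. Plotting c against lags is a useful check before trusting the argmax alone.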


In [ ]: