In [5]:
import pandas as pd
import datetime
import numpy as np
import scipy as sp
import os
import matplotlib.pyplot as plt
import matplotlib
# from ggplot import geom_point
%matplotlib inline
# font = {'size' : 18}
# matplotlib.rc('font', **font)
matplotlib.rcParams['figure.figsize'] = (12.0, 6.0)
os.chdir("/root/Envs/btc-project/btc-price-analysis")
The Google Trends data appears to have been updated since the original Google Trends paper. Currently, we are unable to get absolute search-volume data. Instead, for each week we get a normalized search interest index, ranging from 0 to 100. The index is normalized against the search region, so it reflects a search "density" rather than a raw count of searches.
This change may make an exact replication impossible.
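As a rough sketch of how such an index behaves (this is an assumed normalization for illustration, not Google's published formula), each week's volume is scaled so that the busiest week in the window maps to 100:
def to_interest_index(weekly_volume):
    # Hypothetical normalization: scale so the peak week maps to 100.
    weekly_volume = np.asarray(weekly_volume, dtype=float)
    return np.round(100 * weekly_volume / weekly_volume.max())
to_interest_index([120, 300, 600, 450])  # -> [20., 50., 100., 75.]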
In [6]:
# Parse the week label from the Google Trends export: each row is labeled
# with a date range, and we keep the end of the range.
def parseWeek(w):
    return w.split(" - ")[1]
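Google Trends weekly exports label each row with a date range running Sunday through Saturday, so parseWeek keeps the Saturday that ends the week (the string below is an illustrative example, not a row from our file):
parseWeek("2014-01-05 - 2014-01-11")  # -> '2014-01-11', the Saturday ending the week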
In this replication, we focus on a single keyword, 'Bitcoin'.
In [7]:
trend = pd.read_csv("./data/trend.csv", converters={0:parseWeek})
trend['Week'] = pd.to_datetime(trend['Week'])
trend.set_index(['Week'], inplace=True)
trend.columns = ['search']
trend.head()
Out[7]:
We use the Bitcoin price index (BPI) data from Coinbase and resample it weekly, with each week ending on Saturday, to match the Google Trends data. That makes every Friday our action day, on which we either buy or sell Bitcoin.
In [8]:
time_format = "%Y-%m-%dT%H:%M:%S"
data = pd.read_csv("./data/price.csv", names=['time', 'price'], index_col='time',
parse_dates=[0], date_parser=lambda x: datetime.datetime.strptime(x, time_format))
bpi = data.resample('w-sat', how='ohlc')
bpi.index.name = 'Week'
bpi = pd.DataFrame(bpi['price']['close'])
bpi.head()
Out[8]:
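A side note for readers on pandas 0.18 or later, where the how= keyword of resample was removed: the equivalent would be the method-chained form below (a sketch, not what this notebook ran).
bpi_modern = data.resample('W-SAT').ohlc()  # 'W-SAT' buckets end on Saturdays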
In [9]:
trend_bpi = pd.merge(trend, bpi, how='right', left_index=True, right_index=True)
trend_bpi.columns = ['search', 'close_price']
trend_bpi = trend_bpi['2012':]
trend_bpi.head()
Out[9]:
In [10]:
# DataFrame.plot creates its own figure, so a bare plt.figure() here would
# only open an empty extra one.
ax = trend_bpi.plot(secondary_y=['close_price'])
Correlation, as given by Pearson's coefficient:
In [11]:
trend_bpi.corr()
Out[11]:
In [12]:
trend_bpi.pct_change().plot()
Out[12]:
Again, the correlation, now on weekly returns:
In [13]:
trend_bpi.pct_change().corr()
Out[13]:
The Pearson correlation coefficient decreases once we examine returns instead of the raw values of the two variables. Does this matter?
We first take the moving average of the search interest index (SII).
In [14]:
delta_t = 3
# Moving average of the search interest index over the past delta_t weeks
# (pd.rolling_mean is the pre-0.18 pandas API).
trend_bpi['rolling_SII'] = pd.rolling_mean(trend_bpi.search, delta_t)
trend_bpi.head()
Out[14]:
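For readers on newer pandas: pd.rolling_mean was deprecated in 0.18 and later removed; the equivalent spelling there would be the .rolling() accessor (a sketch):
trend_bpi['rolling_SII'] = trend_bpi.search.rolling(delta_t).mean()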
We shift the moving average one week forward, as we are trying to predict the BPI based on past search interest.
In [15]:
trend_bpi['rolling_SII_shifted'] = trend_bpi.rolling_SII.shift(1)
trend_bpi.head()
Out[15]:
If this week's search interest is less than the moving average of the interest over the past three weeks (delta_t), people are searching less about Bitcoin, so this is likely the time to buy in; conversely, if people start searching more about Bitcoin this week, it is time to sell.
We generate order data. If the order is 1, we buy BTC that week and sell it the next week. If it is -1, we sell it this week and buy it back the next week.
We assign the order signal by comparing this week's search interest with the rolling mean of the previous three weeks' search interest, as in the sketch below.
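As a toy illustration of the rule with made-up numbers, before applying it to the real data:
search_this_week = 42.0
rolling_mean_prev_3_weeks = 50.0
# Interest fell below its recent average, so the rule says buy (order = 1);
# had it risen to or above the average, the rule would say sell (order = -1).
order = -1 if search_this_week >= rolling_mean_prev_3_weeks else 1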
In [16]:
trend_bpi['order'] = 0
trend_bpi['SII_diff'] = trend_bpi.search - trend_bpi.rolling_SII_shifted
## SII_diff >= 0 => search interest rose this week => we expect the price to fall next week, so sell
trend_bpi.loc[trend_bpi.SII_diff >= 0, 'order'] = -1
## SII_diff < 0 => search interest fell this week => we expect the price to rise next week, so buy
trend_bpi.loc[trend_bpi.SII_diff < 0, 'order'] = 1
trend_bpi.head()
Out[16]:
In [17]:
# Weekly strategy log return: the order sign times the week-over-week log price change.
trend_bpi['log_returns'] = trend_bpi.order * (np.log(trend_bpi.close_price.shift(-1)) -
                                              np.log(trend_bpi.close_price))
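This is order_t * (ln P_{t+1} - ln P_t). A quick check with hypothetical prices: buying (order = 1) at a close of 100 before a rise to 110, and selling (order = -1) at 100 before a fall to 90, both yield positive log returns:
1 * (np.log(110.0) - np.log(100.0))    # ~0.0953: long before a rise
-1 * (np.log(90.0) - np.log(100.0))    # ~0.1054: short before a fall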
In [18]:
trend_bpi.log_returns.head()
Out[18]:
A positive return indicates a gain in that week.
In [19]:
trend_bpi[trend_bpi.log_returns>0].close_price.count()
Out[19]:
A negative return indicates a loss in that week.
In [20]:
trend_bpi[trend_bpi.log_returns<0].close_price.count()
Out[20]:
Plot cumulative returns over time.
In [21]:
(np.exp(trend_bpi.log_returns.cumsum()) - 1).plot()
Out[21]:
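Exponentiating the cumulative sum of log returns gives the compounded gross return, so subtracting 1 yields the overall simple return. A two-week toy check:
r = np.log([1.10, 0.95])          # +10% then -5% weekly returns
np.exp(r.cumsum())[-1] - 1        # ~0.045, i.e. 1.10 * 0.95 - 1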
Plot the cumulative return for 2015 only.
In [22]:
(np.exp(trend_bpi['2015'].log_returns.cumsum()) - 1).plot()
Out[22]:
Plot 2014 only.
In [23]:
(np.exp(trend_bpi['2014'].log_returns.cumsum()) - 1).plot()
Out[23]:
Plot 2013 only.
In [24]:
(np.exp(trend_bpi['2013'].log_returns.cumsum()) - 1).plot()
Out[24]:
This strategy seems to rely heavily on the performance of the earliest periods (e.g. 2013) rather than later ones. If the early periods do well, the overall performance looks very good; with the early periods cut off, the performance is much worse.
Label each week as up (1) or down (-1) by comparing its closing price with that of the previous week.
In [25]:
def trend_label(cur, prev):
    # 1 if the price rose from the previous week, -1 if it fell, 0 if unchanged.
    if cur == prev:
        return 0
    elif cur > prev:
        return 1
    else:
        return -1
trend_bpi['truth'] = np.vectorize(trend_label)(trend_bpi.close_price, trend_bpi.close_price.shift(1))
trend_bpi.head()
Out[25]:
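An equivalent, more idiomatic way to build the same labels is the sign of the first difference. Note that the first week comes out NaN here, whereas the vectorized comparison above labels it -1 (comparisons against NaN are False, so the else branch fires):
labels = np.sign(trend_bpi.close_price.diff())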
In [26]:
trend_bpi.groupby('truth').truth.count()
Out[26]:
Note that the confusion-matrix code below treats "1" (an up week) as the positive class and "-1" (a down week) as the negative class.
In [27]:
trend_bpi.groupby('order').truth.count()
Out[27]:
In [28]:
# Exclude the first delta_t weeks, where the shifted rolling mean is still NaN.
trend_bpi_exclude_init = trend_bpi[delta_t:]
true_prediction = trend_bpi_exclude_init.truth == trend_bpi_exclude_init.order
correct_ratio = trend_bpi_exclude_init[true_prediction].close_price.count() / float(trend_bpi_exclude_init.close_price.count())
print "Correctly predicting trend: %f" % correct_ratio
In [29]:
true_positive = trend_bpi[(trend_bpi.truth==1)&(trend_bpi.order==1)].order.count()
false_negative = trend_bpi[(trend_bpi.truth==1)&(trend_bpi.order==-1)].order.count()
false_positive = trend_bpi[(trend_bpi.truth==-1)&(trend_bpi.order==1)].order.count()
true_negative = trend_bpi[(trend_bpi.truth==-1)&(trend_bpi.order==-1)].order.count()
print "TP: %d, FN: %d, FP: %d, TN: %d" % (true_positive, false_negative, false_positive, true_negative)
In [30]:
tp_rate = float(true_positive) / (true_positive + false_negative)
fp_rate = float(false_positive) / (true_negative + false_positive)
print "TPR: %f, FPR: %f" % (tp_rate, fp_rate)