In [46]:
import GPy
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from IPython.display import display
%matplotlib inline
In [24]:
tsla = pd.read_csv('../data/raw/financial/tsla.csv').sort_index(ascending=False)
tsla['Date'] = pd.to_datetime(tsla['Date'])
tsla.head()
Out[24]:
In [36]:
# regress over days since the first date seen
# .values replaces as_matrix(), which was removed in pandas 1.0
dates = tsla['Date'].values
first_date = dates.min()
# elapsed time in (float) days, as a column vector; no Python loop needed
T = ((dates - first_date) / np.timedelta64(1, 'D')).reshape(-1, 1)
T[:5].T
Out[36]:
In [37]:
# .values replaces as_matrix(), which was removed in pandas 1.0
price = tsla['Open'].values.reshape(-1, 1)
price[:5].T
Out[37]:
In [45]:
plt.plot(T, price)
plt.title('TSLA Daily Open Price')
plt.xlabel('Days Since {}'.format(pd.to_datetime(first_date).strftime('%d-%m-%Y')))
Out[45]:
Now let's try running Gaussian process regression on these stock prices, using a poor choice of hyperparameters. Within the RBF (a.k.a. squared exponential) kernel,
$$ \kappa(x,x') = \sigma_f^2 \exp\left(-\frac{\|x - x'\|^2}{2\ell^2}\right) $$
the lengthscale $\ell$ sets how far apart two inputs can be while still influencing each other; when it is small, each point depends only on the points very close to it. Our data has a point at roughly every integer along an interval, so with a lengthscale of 0.75 the covariance with the nearest neighbor, one day away, is $\sigma_f^2 \exp(-1/(2 \cdot 0.75^2)) \approx 0.41\,\sigma_f^2$, and it drops below $0.03\,\sigma_f^2$ two days away. This means that our function will be pulled to the mean $\mu=0$ very quickly away from the data, resulting in a plot that looks almost piecewise linear.
The variance $\sigma_f^2$ controls the vertical scale of the function, that is, how far it typically strays from the mean; how 'wiggly' the function is depends mainly on the lengthscale. Increasing the variance does not slow the return to the mean: with a small lengthscale $\ell$, the function is still pulled back to the mean very quickly.
We can see these effects below.
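As a quick sanity check on the kernel weights discussed above, the sketch below evaluates the exponential part of the RBF kernel at gaps of one, two, and three days. The helper name `rbf_weight` and the 30-day comparison lengthscale are illustrative assumptions, not part of the notebook or of GPy.

```python
import numpy as np

def rbf_weight(distance, lengthscale):
    """RBF covariance between two inputs, as a fraction of the kernel variance.

    Hypothetical helper for illustration; it is not a GPy function.
    """
    return np.exp(-distance**2 / (2.0 * lengthscale**2))

# gaps between neighboring observations, in days
distances = np.array([1.0, 2.0, 3.0])

# the deliberately poor lengthscale used in this notebook
weights_short = rbf_weight(distances, lengthscale=0.75)
# an assumed, much longer lengthscale for comparison
weights_long = rbf_weight(distances, lengthscale=30.0)

for d, w_s, w_l in zip(distances, weights_short, weights_long):
    print("gap = {:.0f} days: weight {:.4f} (l=0.75) vs {:.4f} (l=30)".format(d, w_s, w_l))
```

With the short lengthscale the weight is already negligible after a couple of days, which is why the posterior falls back to the zero mean wherever there is even a small gap in the data.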
In [50]:
kernel = GPy.kern.RBF(input_dim=1, variance=1.5, lengthscale=0.75)
model = GPy.models.GPRegression(T, price, kernel=kernel, noise_var=0.2)
display(model)
In [51]:
model.plot()
Out[51]:
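Now let us find the optimal hyperparameters by maximizing the marginal likelihood of our data. This results in a very smooth function. Note that the optimal lengthscale is about a month, and the optimal variance is about 25,000. Since this is a variance of prices, its units are squared dollars, which corresponds to a standard deviation of roughly \$160.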
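To make the objective concrete, here is a minimal NumPy sketch of the log marginal likelihood of a zero-mean GP with an RBF kernel and Gaussian noise, computed via a Cholesky factorization. It is not GPy's implementation; the function names and the toy data are assumptions made for illustration only.

```python
import numpy as np

def rbf_kernel(X1, X2, variance, lengthscale):
    """RBF covariance matrix between the rows of X1 and the rows of X2."""
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    # clip tiny negative values caused by floating-point round-off
    return variance * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / lengthscale**2)

def log_marginal_likelihood(X, y, variance, lengthscale, noise_var):
    """log p(y | X) for a zero-mean GP with RBF kernel and Gaussian noise."""
    n = X.shape[0]
    K = rbf_kernel(X, X, variance, lengthscale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    data_fit = -0.5 * np.sum(y * alpha)
    complexity = -np.sum(np.log(np.diag(L)))       # equals -0.5 * log|K|
    constant = -0.5 * n * np.log(2.0 * np.pi)
    return data_fit + complexity + constant

# toy data (hypothetical): a smooth curve sampled once per "day"
X_toy = np.arange(30, dtype=float).reshape(-1, 1)
y_toy = np.sin(X_toy / 5.0)

lml_short = log_marginal_likelihood(X_toy, y_toy, variance=1.5, lengthscale=0.75, noise_var=0.2)
lml_long = log_marginal_likelihood(X_toy, y_toy, variance=1.5, lengthscale=5.0, noise_var=0.2)
print("lengthscale 0.75:", lml_short)
print("lengthscale 5.0: ", lml_long)
```

On this smooth toy curve the longer lengthscale scores a higher log marginal likelihood. An optimizer exploits the same trade-off between data fit and model complexity, though a gradient-based search is only guaranteed to reach a local optimum.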
In [54]:
model.optimize()
display(model)
In [53]:
model.plot()
Out[53]: