We load the data from Quandl (http://www.quandl.com).
In [1]:
import sys
from os import path
sys.path.append(path.abspath('../src/regression'))
import quandl
import numpy as np
from linear_regression import *
%matplotlib inline
# We use the London market (LBMA) daily fixes for gold and silver
gold = quandl.get("LBMA/GOLD", returns="numpy", start_date="2015-01-01")
silver = quandl.get("LBMA/SILVER", returns="numpy", start_date="2015-01-01")
# CHRIS/CME_HG3 is the COMEX copper futures continuous contract (CME_SI3 is silver)
copper = quandl.get("CHRIS/CME_HG3", returns="numpy", start_date="2015-01-01")
We downloaded the daily London gold and silver fixes in dollars for the past two years. For each day (x_gold or x_silver), we have the gold value (y_gold) and the silver value (y_silver) in dollars.
We visualize the gold and silver values over time (left plot) and how the gold value depends on the silver value (right plot).
In [ ]:
# Retrieve gold and silver values in $ by day
XY_gold = stock_arr_to_XY(gold)
XY_silver = stock_arr_to_XY(silver)
XY_copper = stock_arr_to_XY(copper)
# Filter the arrays so that gold, silver, and copper share the same Xs
XY_gold, XY_silver = filter_on_same_X(XY_gold, XY_silver)
XY_gold, XY_copper = filter_on_same_X(XY_gold, XY_copper)
XY_copper, XY_silver = filter_on_same_X(XY_copper, XY_silver)
x_gold, y_gold = XY_gold
x_silver, y_silver = XY_silver
x_copper, y_copper = XY_copper
# Plot the data
plot_data(XY_silver, XY_gold, 'silver', 'gold')
plot_data(XY_copper, XY_gold, 'copper', 'gold')
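`stock_arr_to_XY`, `filter_on_same_X`, and `plot_data` live in the local `linear_regression` module, so their implementations are not shown here. As a rough sketch of what the alignment step might do, assuming each series is an `(x, y)` pair of 1-D NumPy arrays, the helper could be written as:

```python
import numpy as np

def filter_on_same_X(XY_a, XY_b):
    """Restrict two (x, y) series to the x values they have in common.

    Hypothetical sketch, not the module's actual implementation.
    """
    x_a, y_a = XY_a
    x_b, y_b = XY_b
    # np.intersect1d returns the sorted common values together with the
    # indices of those values in each input array.
    common, idx_a, idx_b = np.intersect1d(x_a, x_b, return_indices=True)
    return (common, y_a[idx_a]), (common, y_b[idx_b])
```

After the three pairwise calls above, all three series are sampled on the same set of days, which is what lets us treat one metal's price as a function of another's.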
(In case of doubt, see http://cs229.stanford.edu/notes/cs229-notes1.pdf.)
We want to know whether the gold and silver values are correlated. We hypothesize that the price of gold (y_gold, or $Y$) depends linearly on the price of silver (y_silver, or $X$):

$$Y \approx P = w_0 X + w_1$$
We are going to set the weight and bias to some initial values and compute how far our predictions $P$ are from the true gold values $Y$. There are many distances we can choose from, one of which is the mean of squared errors (L2 distance):

$$L = \frac{1}{2m}\sum_{i=1}^{m}\left(y_i - p_i\right)^2$$
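To minimize the loss by gradient descent, the code below needs the partial derivatives of $L$ with respect to the two parameters. With $P = w_0 X + w_1$, the chain rule gives

$$\frac{\partial L}{\partial P} = -(Y - P),\qquad
\frac{\partial L}{\partial w_0} = \frac{1}{m}\sum_{i=1}^{m} -(y_i - p_i)\,x_i,\qquad
\frac{\partial L}{\partial w_1} = \frac{1}{m}\sum_{i=1}^{m} -(y_i - p_i)$$

which correspond term by term to `dl_dp`, `dl_dw0`, and `dl_dw1` in the next cell.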
In [ ]:
# Rename y_silver to X and y_gold to Y
X, Y = [np.array(y_silver), ], np.array(y_gold)
# Initialize the parameters (weight w0 and bias w1)
Ws = [0.5, 0.5]
alphas = (0.0001, 0.01)
# Load Trainer
t = Trainer(X, Y, Ws, alphas)
# Define Prediction and Loss
t.pred = lambda X: np.multiply(X[0], t.Ws[0]) + t.Ws[1]
t.loss = lambda: (np.power(t.Y - t.pred(X), 2) / 2.).mean()
# Define the gradient functions
dl_dp = lambda: -(t.Y - t.pred(X))
dl_dw0 = lambda: np.multiply(dl_dp(), X[0]).mean()
dl_dw1 = lambda: dl_dp().mean()
t.dWs = (dl_dw0, dl_dw1)
# Start training
anim = t.animated_train(is_notebook=True)
# Show it
from IPython.display import HTML
HTML(anim.to_html5_video())
In [ ]:
print("Final Loss is %f" % t.loss())
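`Trainer` and `animated_train` are specific to the local `linear_regression` module. As a sanity check on the update rule, the same batch gradient descent can be written as a short self-contained loop (`fit_line` is a hypothetical helper, not part of the notebook's module; the learning rates are placeholders to be tuned to the data's scale):

```python
import numpy as np

def fit_line(x, y, w0=0.5, w1=0.5, lr_w=0.0001, lr_b=0.01, steps=5000):
    """Batch gradient descent on P = w0 * x + w1 with the L2 loss above."""
    for _ in range(steps):
        p = w0 * x + w1
        err = -(y - p)                # dL/dP for each sample
        w0 -= lr_w * (err * x).mean() # dL/dw0
        w1 -= lr_b * err.mean()       # dL/dw1
    return w0, w1
```

On well-scaled data this recovers the line's parameters; on raw dollar prices the learning rates would need to be much smaller (hence the `alphas = (0.0001, 0.01)` used above).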