Consider a ridge regression problem as follows:
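$$\min_{w \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^n \frac{1}{2} \left( w^\top x_i - y_i \right)^2 + \frac{\lambda}{2} \|w\|_2^2,$$

where $(x_i, y_i)_{i=1}^n$ are the training samples and $\lambda > 0$ is the regularization coefficient. (This is one standard formulation, written to match the squared loss and squared L2 regularizer used below.)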
This problem is a special case of a more general family of problems called regularized empirical risk minimization, where the objective function typically comprises two parts: a sum of loss terms and a regularization term.
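In symbols, the general objective takes the form

$$\min_{\theta} \; \frac{1}{n} \sum_{i=1}^n L\big(f_\theta(x_i), y_i\big) + R(\theta),$$

where $f_\theta$ is the prediction model, $L$ the loss function, and $R$ the regularizer. Ridge regression is recovered with a linear predictor, the squared loss, and squared L2 regularization.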
Now, we show how to use the package SGDOptim to solve such a problem. First, we have to prepare some simulation data:
In [2]:
w = [3.0, -4.0, 5.0]; # the underlying model coefficients
In [3]:
n = 10000; X = randn(3, n); # generate 10000 sample features
In [4]:
sig = 0.1; y = vec(w'X) + sig * randn(n); # generate the responses, adding some noise
Now, let's try to estimate $w$ from the data. This can be done with the SGDOptim package:
In [5]:
using SGDOptim
The first step is to construct a risk model, which comprises a prediction model and a loss function.
In [8]:
rmodel = riskmodel(LinearPred(3),  # use linear prediction x -> w'x; 3 is the input dimension
                   SqrLoss())      # use squared loss: loss(u, y) = (u - y)^2 / 2
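Conceptually, the risk model just composes the prediction with the loss. Here is a minimal plain-Julia sketch of what it evaluates for a single sample (illustrative only, not SGDOptim's internal code):

# risk of parameters w on one sample (x, y): squared loss of the linear prediction w'x
risk(w, x, y) = 0.5 * (dot(w, x) - y)^2

risk(w, X[:, 1], y[1])  # small, since w holds the true coefficients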
Now, we are ready to solve the problem:
In [10]:
w_e = sgd(rmodel,
          zeros(3),                        # the initial guess
          minibatch_seq(X, y, 10),         # supply the data in mini-batches, each with 10 samples
          reg = SqrL2Reg(1.0e-4),          # add a squared L2 regularizer with coefficient 1.0e-4
          lrate = t -> 1.0 / (100.0 + t),  # set the learning rate schedule
          cbinterval = 100,                # invoke the callback every 100 iterations
          callback = simple_trace)         # print the optimization trace in the callback
Note that the 10000 samples are partitioned into 1000 mini-batches of size 10, so the optimization ran for 1000 iterations, each processing one mini-batch.
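The learning rate rule above decays as the iterations proceed. A quick illustrative check in plain Julia (re-evaluating the same anonymous function passed to `sgd`; not a package API):

lrate = t -> 1.0 / (100.0 + t)  # the same schedule as above
lrate(1)     # = 1/101  ≈ 0.0099,   step size at the first iteration
lrate(1000)  # = 1/1100 ≈ 0.00091,  step size at the last iteration

So the step size shrinks roughly tenfold over the run; such $1/t$-style schedules are a standard choice for SGD.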
Now, let's compare the estimated solution with the ground truth:
In [12]:
sumabs2(w_e - w) / sumabs2(w)  # relative squared error of the estimate
The result looks quite accurate. We are done!
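As an optional cross-check, ridge regression also admits a closed-form solution via the normal equations. Here is a minimal sketch in plain Julia (independent of SGDOptim), assuming the averaged-loss objective shown at the top; if the package sums the loss instead, `n * lambda` becomes `lambda`, a numerically negligible difference at this scale:

lambda = 1.0e-4
w_cf = (X * X' + n * lambda * eye(3)) \ (X * y)  # solve (X X' + nλ I) w = X y
sumabs2(w_cf - w) / sumabs2(w)                   # relative squared error; should also be tiny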