In gradient-based optimization algorithms, we update the parameters (or weights) using the gradients in each iteration. We call this updating function an Optimizer.
The main method of an optimizer is update(weight, grad), which updates an NDArray weight using an NDArray gradient. But given that a multi-layer neural network often has more than one weight, we assign each weight a unique integer index. Furthermore, an optimizer may need space to store auxiliary state, such as momentum, so we also allow a user-defined state to be passed in when updating. In summary, an optimizer has two major methods:

create_state(index, weight): creates the auxiliary state for the index-th weight.

update(index, weight, grad, state): updates the index-th weight given the gradient and auxiliary state. The state can also be updated.

MXNet has already implemented several popular optimizers in python/mxnet/optimizer.py. A convenient way to create one is by using optimizer.create(name, args...).
The following code creates a standard SGD updater, which does
weight = weight - learning_rate * grad
In [1]:
import mxnet as mx
opt = mx.optimizer.create('sgd', learning_rate=.1)
Then we can use the update function.
In [2]:
grad = mx.nd.ones((2,3))
weight = mx.nd.ones((2,3))
index = 0
opt.update(index, weight, grad, state=None)
print(weight.asnumpy())
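Since both weight and grad are filled with ones, every entry of the updated weight should be 1 - 0.1 * 1 = 0.9.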
When momentum is non-zero, the sgd optimizer needs extra state.
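Roughly speaking (ignoring weight decay and gradient rescaling), momentum SGD keeps a running state for each weight and updates it as state = momentum * state - learning_rate * grad, then applies weight = weight + state.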
In [3]:
mom_opt = mx.optimizer.create('sgd', learning_rate=.1, momentum=.01)
state = mom_opt.create_state(index, weight)
mom_opt.update(index, weight, grad, state)
print(state.asnumpy())
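For reference, the same two-method interface can be written by hand. The class below is a minimal illustrative sketch, not MXNet's actual SGD implementation; the class name SimpleSGD and the exact momentum formulation are assumptions made for this example.

import mxnet as mx

class SimpleSGD(object):
    # Illustrative optimizer following the create_state/update interface above.
    def __init__(self, learning_rate=0.01, momentum=0.0):
        self.lr = learning_rate
        self.momentum = momentum

    def create_state(self, index, weight):
        # No auxiliary state is needed when momentum is zero.
        if self.momentum == 0.0:
            return None
        return mx.nd.zeros(weight.shape, weight.context)

    def update(self, index, weight, grad, state):
        if state is None:
            # plain SGD step
            weight[:] = weight - self.lr * grad
        else:
            # one common momentum formulation (weight decay omitted)
            state[:] = self.momentum * state - self.lr * grad
            weight[:] = weight + state

Such a class could be used in the same way as above: create it, call create_state(index, weight) once per weight, and then call update(index, weight, grad, state) in each iteration.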