In gradient-based optimization algorithms, we update the parameters (or weights) in each iteration using the gradients. We call this updating function an optimizer.
The main method of an optimizer is update(weight, grad), which updates an NDArray weight using an NDArray gradient. But since a multi-layer neural network often has more than one weight, we assign each weight a unique integer index. Furthermore, an optimizer may need space to store auxiliary state, such as momentum, so we also allow a user-defined state to be passed to the update. In summary, an optimizer has two major methods:
create_state(index, weight): create the auxiliary state for the index-th weight.
update(index, weight, grad, state): update the index-th weight given the gradient and the auxiliary state. The state can also be updated.

MXNet has already implemented several popular optimizers in python/mxnet/optimizer.py. A convenient way to create one is optimizer.create(name, args...). The following code creates a standard SGD updater, which does
weight = weight - learning_rate * grad
In [1]:
import mxnet as mx
opt = mx.optimizer.create('sgd', learning_rate=.1)
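Here the name 'sgd' selects the SGD optimizer implemented in python/mxnet/optimizer.py, and the remaining keyword arguments are forwarded to that optimizer's constructor. As a rough sketch, this should be equivalent to constructing the class directly (assuming it is exposed as mx.optimizer.SGD):
In [ ]:
# Sketch: direct construction, assumed equivalent to optimizer.create('sgd', ...)
sgd_direct = mx.optimizer.SGD(learning_rate=.1)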
Then we can use the update function.
In [2]:
grad = mx.nd.ones((2,3))
weight = mx.nd.ones((2,3))
index = 0
opt.update(index, weight, grad, state=None)
print(weight.asnumpy())
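Following the rule above, each entry of weight goes from 1 to 1 - 0.1 * 1 = 0.9, so the printed array should contain all 0.9s.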
When momentum is non-zero, the sgd optimizer needs extra state.
In [3]:
mom_opt = mx.optimizer.create('sgd', learning_rate=.1, momentum=.01)
state = mom_opt.create_state(index, weight)
mom_opt.update(index, weight, grad, state)
print(state.asnumpy())
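The two methods create_state and update are all that a new optimizer needs to provide. The following is a minimal stand-alone sketch of that interface. It is not registered with MXNet's optimizer factory (so optimizer.create cannot find it), and its momentum rule is only illustrative; it may differ in detail from the built-in 'sgd' implementation.
In [ ]:
import mxnet as mx

class SimpleSGD(object):
    """Illustrative optimizer exposing the create_state/update interface."""
    def __init__(self, learning_rate=0.01, momentum=0.0):
        self.learning_rate = learning_rate
        self.momentum = momentum

    def create_state(self, index, weight):
        # No auxiliary state is needed when momentum is zero.
        if self.momentum == 0.0:
            return None
        # One momentum buffer per weight, with the same shape as the weight.
        return mx.nd.zeros(weight.shape)

    def update(self, index, weight, grad, state):
        if state is None:
            # Plain SGD: weight = weight - learning_rate * grad
            weight[:] = weight - self.learning_rate * grad
        else:
            # Momentum SGD (illustrative sign convention): refresh the state
            # in place, then apply it to the weight.
            state[:] = self.momentum * state - self.learning_rate * grad
            weight[:] = weight + state

my_opt = SimpleSGD(learning_rate=.1, momentum=.01)
w = mx.nd.ones((2, 3))
g = mx.nd.ones((2, 3))
s = my_opt.create_state(0, w)
my_opt.update(0, w, g, s)
print(w.asnumpy())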