Optimizer

In gradient-based optimization algorithms, we update the parameters (or weights) using their gradients in each iteration. We call this updating function an Optimizer.

The main method of an optimizer is update(weight, grad), which updates an NDArray weight using an NDArray gradient. But given that a multi-layer neural network often has more than one weight, we assign each weight a unique integer index. Furthermore, an optimizer may need space to store auxiliary state, such as momentum, so we also allow a user-defined state for updating. In summary, an optimizer has two major methods:

  • create_state(index, weight): create auxiliary state for the index-th weight.
  • update(index, weight, grad, state): update the index-th weight given the gradient and auxiliary state. The state can also be updated.

Basic Usage

Create and Update

MXNet has already implemented several popular optimizers in python/mxnet/optimizer.py. A convenient way to create one is with optimizer.create(name, args...). The following code creates a standard SGD updater, which performs

weight = weight - learning_rate * grad

In [1]:
import mxnet as mx
opt = mx.optimizer.create('sgd', learning_rate=.1)

Then we can use the update function.


In [2]:
grad = mx.nd.ones((2,3))
weight = mx.nd.ones((2,3))
index = 0
opt.update(index, weight, grad, state=None)
print(weight.asnumpy())


[[ 0.89999998  0.89999998  0.89999998]
 [ 0.89999998  0.89999998  0.89999998]]

When momentum is non-zero, the SGD optimizer needs extra state to hold the momentum buffer. Ignoring weight decay, each update computes state = momentum * state - learning_rate * grad and then adds the state to the weight, so a freshly created state equals -learning_rate * grad after the first update.


In [3]:
mom_opt = mx.optimizer.create('sgd', learning_rate=.1, momentum=.01)
state = mom_opt.create_state(index, weight)
mom_opt.update(index, weight, grad, state)
print(state.asnumpy())


[[-0.1 -0.1 -0.1]
 [-0.1 -0.1 -0.1]]

Flexible Learning Rate
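
The learning rate does not have to stay fixed during training. One option is to attach a learning rate scheduler when creating the optimizer. The cell below is a minimal sketch; it assumes the lr_scheduler argument and mx.lr_scheduler.FactorScheduler are available in your MXNet version.

In [ ]:
# decay the learning rate by a factor of 0.9 every 100 updates (assumed scheduler behavior)
scheduler = mx.lr_scheduler.FactorScheduler(step=100, factor=0.9)
sched_opt = mx.optimizer.create('sgd', learning_rate=.1, lr_scheduler=scheduler)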

More optimizers
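
Other built-in optimizers can be created by name in the same way. The cell below assumes that 'adam' and 'rmsprop' are registered optimizer names in your MXNet version; their hyperparameters differ from SGD's.

In [ ]:
adam_opt = mx.optimizer.create('adam', learning_rate=.001)
rmsprop_opt = mx.optimizer.create('rmsprop', learning_rate=.001)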

Customized Optimizer
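
A customized optimizer only needs to provide the two methods described above, create_state and update, and can then be registered so that optimizer.create finds it by name. The cell below is a minimal sketch; it assumes the module-level mx.optimizer.register decorator and the self.lr attribute set by the Optimizer base class, and MySGD is a hypothetical name.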


In [ ]:
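import mxnet as mx

@mx.optimizer.register
class MySGD(mx.optimizer.Optimizer):
    """A hypothetical plain-SGD optimizer used for illustration."""
    def create_state(self, index, weight):
        # plain SGD keeps no auxiliary state
        return None

    def update(self, index, weight, grad, state):
        # self.lr is assumed to be set by the Optimizer base class
        weight[:] = weight - self.lr * grad

# the new optimizer is registered under the lowercased class name (assumed behavior)
my_opt = mx.optimizer.create('mysgd', learning_rate=.1)
my_opt.update(index, weight, grad, my_opt.create_state(index, weight))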