In gradient-based optimization algorithms, we update the parameters (or weights) in each iteration using the gradients. We call this updating function an optimizer.
The main method of an optimizer is update(weight, grad), which updates an NDArray weight using an NDArray gradient. But since a multi-layer neural network often has more than one weight, we assign each weight a unique integer index. Furthermore, an optimizer may need space to store auxiliary state, such as momentum, so we also allow a user-defined state to be passed to the update. In summary, an optimizer has two major methods:
create_state(index, weight): create the auxiliary state for the index-th weight.
update(index, weight, grad, state): update the index-th weight given the gradient and the auxiliary state. The state can also be updated.

MXNet has already implemented several popular optimizers in python/mxnet/optimizer.py. A convenient way to create one is optimizer.create(name, args...). The following code creates a standard SGD updater, which does
weight = weight - learning_rate * grad
In [1]:
import mxnet as mx
opt = mx.optimizer.create('sgd', learning_rate=.1)
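Here the name 'sgd' selects the SGD optimizer implemented in python/mxnet/optimizer.py, and the remaining keyword arguments are forwarded to that optimizer's constructor. As a rough sketch, this should be equivalent to constructing the class directly (assuming it is exposed as mx.optimizer.SGD):
In [ ]:
# Sketch: direct construction, assumed equivalent to optimizer.create('sgd', ...)
sgd_direct = mx.optimizer.SGD(learning_rate=.1)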
Then we can use the update function.
In [2]:
grad = mx.nd.ones((2,3))
weight = mx.nd.ones((2,3))
index = 0
opt.update(index, weight, grad, state=None)
print(weight.asnumpy())
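Following the rule above, each entry of weight goes from 1 to 1 - 0.1 * 1 = 0.9, so the printed array should contain all 0.9s.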
When momentum is non-zero, the sgd optimizer needs extra state.
In [3]:
mom_opt = mx.optimizer.create('sgd', learning_rate=.1, momentum=.01)
state = mom_opt.create_state(index, weight)
mom_opt.update(index, weight, grad, state)
print(state.asnumpy())
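The two methods create_state and update are all that a new optimizer needs to provide. The following is a minimal stand-alone sketch of that interface. It is not registered with MXNet's optimizer factory (so optimizer.create cannot find it), and its momentum rule is only illustrative; it may differ in detail from the built-in 'sgd' implementation.
In [ ]:
import mxnet as mx

class SimpleSGD(object):
    """Illustrative optimizer exposing the create_state/update interface."""
    def __init__(self, learning_rate=0.01, momentum=0.0):
        self.learning_rate = learning_rate
        self.momentum = momentum

    def create_state(self, index, weight):
        # No auxiliary state is needed when momentum is zero.
        if self.momentum == 0.0:
            return None
        # One momentum buffer per weight, with the same shape as the weight.
        return mx.nd.zeros(weight.shape)

    def update(self, index, weight, grad, state):
        if state is None:
            # Plain SGD: weight = weight - learning_rate * grad
            weight[:] = weight - self.learning_rate * grad
        else:
            # Momentum SGD (illustrative sign convention): refresh the state
            # in place, then apply it to the weight.
            state[:] = self.momentum * state - self.learning_rate * grad
            weight[:] = weight + state

my_opt = SimpleSGD(learning_rate=.1, momentum=.01)
w = mx.nd.ones((2, 3))
g = mx.nd.ones((2, 3))
s = my_opt.create_state(0, w)
my_opt.update(0, w, g, s)
print(w.asnumpy())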