MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix the flavours of symbolic and imperative programming to maximize efficiency and productivity. At its core is a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. The library is portable and lightweight, and it scales to multiple GPUs and multiple machines.
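As a concrete illustration of the imperative flavour (a minimal sketch, assuming MXNet.jl is installed; the symbolic flavour is what the rest of this notebook uses to define the network), NDArray operations execute immediately:

using MXNet

a = mx.ones(2, 3)    # 2x3 NDArray filled with ones
b = a + a            # evaluated eagerly, element-wise
println(copy(b))     # copy the NDArray back into a Julia Array to inspect it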
In [40]:
#=
using MNIST, Images
trainX, trainY = traindata()
testX, testY = testdata()

# Reshape the d-th flattened test image (a 784-vector) into a 28x28 grayscale image.
function showtestdigit(d::Int64)
    a = zeros(28, 28)
    j = 1
    for i = 1:28:length(testX[:, d]) - 27
        a[:, j] = testX[:, d][i:i+27]
        j = j + 1
    end
    grayim(a)
end

# Same as above, but for the d-th training image.
function showtraindigit(d::Int64)
    a = zeros(28, 28)
    j = 1
    for i = 1:28:length(trainX[:, d]) - 27
        a[:, j] = trainX[:, d][i:i+27]
        j = j + 1
    end
    grayim(a)
end
=#
In [42]:
# 1 <= n <= 60000 (index into the 60,000 training images)
n = 60000
@show trainY[n]
showtraindigit(n)
Out[42]:
In [11]:
# GPU-specific configuration: point MXNet.jl at the desired libmxnet build and recompile the package
ENV["MXNET_HOME"] = joinpath(Pkg.dir("MXNet"), "deps", "usr", "lib")
Base.compilecache("MXNet")
using MXNet
Create a placeholder for the data.
In [12]:
data = mx.Variable(:data)
Out[12]:
We now define a simple multi-layer perceptron: three fully-connected layers with ReLU activations in between, ending in 10 output units, one per digit class.
In [13]:
fc1 = mx.FullyConnected(data = data, name=:fc1, num_hidden=128)
act1 = mx.Activation(data = fc1, name=:relu1, act_type=:relu)
fc2 = mx.FullyConnected(data = act1, name=:fc2, num_hidden=64)
act2 = mx.Activation(data = fc2, name=:relu2, act_type=:relu)
fc3 = mx.FullyConnected(data = act2, name=:fc3, num_hidden=10)
Out[13]:
We then add a final SoftmaxOutput operation to turn the 10-dimensional predictions into proper probability values for the 10 classes.
In [14]:
mlp = mx.SoftmaxOutput(data = fc3, name=:softmax)
Out[14]:
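To see what this does numerically, here is a plain-Julia sketch (not part of the model; the scores below are made up) of how softmax maps a raw 10-dimensional score vector to probabilities that are non-negative and sum to one:

# toy softmax on a hypothetical 10-dimensional score vector
function softmax(z)
    e = [exp(v - maximum(z)) for v in z]   # subtract the max for numerical stability
    return e / sum(e)
end

scores = randn(10)       # raw outputs, e.g. from the fc3 layer for one image
p = softmax(scores)
sum(p)                   # ≈ 1.0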
As we can see, the MLP is just a chain of layers. For cases like this, we can also use the @mx.chain macro. The same architecture can be defined as:
In [15]:
mlp = @mx.chain mx.Variable(:data) =>
      mx.FullyConnected(name=:fc1, num_hidden=128) =>
      mx.Activation(name=:relu1, act_type=:relu) =>
      mx.FullyConnected(name=:fc2, num_hidden=64) =>
      mx.Activation(name=:relu2, act_type=:relu) =>
      mx.FullyConnected(name=:fc3, num_hidden=10) =>
      mx.SoftmaxOutput(name=:softmax)
Out[15]:
After defining the architecture, we are ready to load the MNIST data. MXNet.jl provides built-in data providers for the MNIST dataset, which will automatically download the data into Pkg.dir("MXNet")/data/mnist if necessary.
In [16]:
batch_size = 100
include(Pkg.dir("MXNet", "examples", "mnist", "mnist-data.jl"))
train_provider, eval_provider = get_mnist_providers(batch_size)
Out[16]:
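Each provider can be iterated over mini-batches; as a quick sanity check (a sketch that only peeks at the first batch; the names :data and :softmax_label match the provider used here), the data and labels come back as NDArrays sized by batch_size:

for batch in eval_provider
    x = mx.get(eval_provider, batch, :data)           # pixel data, 784 x batch_size
    y = mx.get(eval_provider, batch, :softmax_label)  # digit labels 0-9, length batch_size
    @show size(x) size(y)
    break                                             # only inspect the first batch
end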
Given the architecture and data, we can instantiate a model to do the actual training. mx.FeedForward is the built-in model that is suitable for most feed-forward architectures. When constructing the model, we also specify the context on which the computation should be carried out.
In [17]:
model = mx.FeedForward(mlp, context=mx.gpu())
Out[17]:
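The context argument is what switches the computation between devices; for comparison (a sketch, assuming the listed devices exist on the machine), the same model could be built for the CPU or split across two GPUs:

model_cpu  = mx.FeedForward(mlp, context=mx.cpu())               # run on the CPU instead
model_2gpu = mx.FeedForward(mlp, context=[mx.gpu(0), mx.gpu(1)]) # data parallelism over two GPUs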
The last thing we need to specify is the optimization algorithm (a.k.a. optimizer) to use. We use basic SGD with a fixed learning rate of 0.1, momentum of 0.9, and a small weight decay:
In [18]:
optimizer = mx.SGD(lr=0.1, momentum=0.9, weight_decay=0.00001)
Out[18]:
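Conceptually, momentum SGD keeps a velocity per parameter and uses it to smooth the gradient steps; other optimizers shipped with MXNet.jl can be swapped in through the same interface (the ADAM learning rate below is an assumption, not the notebook's setting):

# momentum SGD, per parameter W (conceptual update rule):
#   v <- momentum * v - lr * (grad + weight_decay * W)
#   W <- W + v
# an alternative optimizer with the same interface:
adam = mx.ADAM(lr=0.001)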
Now we can do the training. Here the n_epoch parameter specifies that we want to train for 20 epochs. We also supply eval_data to monitor accuracy on the validation set.
In [19]:
@time mx.fit(model, optimizer, train_provider, n_epoch=20, eval_data=eval_provider)
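mx.fit also accepts callbacks, which is handy for longer runs; a sketch (the checkpoint prefix "mnist-mlp" is an arbitrary choice) that logs training speed and saves the parameters after every epoch:

mx.fit(model, optimizer, train_provider, n_epoch=20, eval_data=eval_provider,
       callbacks=[mx.speedometer(), mx.do_checkpoint("mnist-mlp")])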
In [20]:
probs = mx.predict(model, eval_provider)
Out[20]:
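mx.predict runs a forward pass over the whole provider, so probs should come back as a 10 x N matrix with one probability column per evaluation image; a quick sanity check:

size(probs)    # expected: (10, 10000) for the MNIST evaluation set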
In [21]:
# collect all labels from eval data
labels = Array[]
for batch in eval_provider
    push!(labels, copy(mx.get(eval_provider, batch, :softmax_label)))
end
labels = cat(1, labels...)
Out[21]:
In [22]:
# Now we compute the accuracy
correct = 0
for i = 1:length(labels)
    # labels are 0...9
    if indmax(probs[:, i]) == labels[i] + 1
        correct += 1
    end
end
accuracy = 100 * correct / length(labels)
println(mx.format("Accuracy on eval set: {1:.2f}%", accuracy))
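Equivalently (a compact alternative with the same logic as the loop above), the accuracy can be computed in one line:

accuracy = 100 * mean([indmax(probs[:, i]) == labels[i] + 1 for i = 1:length(labels)])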
In [37]:
# inspect one example from the evaluation set
n = 144
println(Int(labels[n]))
showtestdigit(n)
Out[37]:
In [39]:
# training time in seconds for the CPU and GPU runs
t = [1639, 23]
using DataFrames
df = DataFrame(Names=["CPU", "GPU"], Time=t, Accuracy=[97.31, 97.69])
using Gadfly
p1 = Gadfly.plot(x=df[:Names], y=df[:Time], Guide.ylabel("Time in sec"), Geom.bar, Guide.title("Performance."))
#p2 = Gadfly.plot(x=df[:Names], y=df[:Accuracy], Guide.ylabel("Accuracy in %"), Geom.bar, Guide.title("Accuracy measure."))
Out[39]: