Fast Sign Adversary Generation Example

This notebook demonstrates how to find adversarial examples with MXNet Gluon by using the gradient of the loss with respect to the input, following the fast gradient sign method of [1].

[1] Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014). https://arxiv.org/abs/1412.6572


In [1]:
%matplotlib inline
import mxnet as mx
import numpy as np

import matplotlib.pyplot as plt
import matplotlib.cm as cm

from mxnet import gluon

Build a simple CNN for the MNIST digit recognition task


In [17]:
ctx = mx.gpu() if mx.context.num_gpus() else mx.cpu()
batch_size = 128

Data Loading


In [3]:
transform = lambda x,y: (x.transpose((2,0,1)).astype('float32')/255., y)

train_dataset = gluon.data.vision.MNIST(train=True).transform(transform)
test_dataset = gluon.data.vision.MNIST(train=False).transform(transform)

train_data = gluon.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=5)
test_data = gluon.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
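The `transform` lambda above reorders each image's axes and rescales its pixels. The sketch below reproduces the same two operations with NumPy on a made-up image, assuming the raw samples arrive as `(height, width, channel)` arrays of `uint8` values; the random image and the variable names are illustrative only.

```python
import numpy as np

# A fake 28x28 single-channel image in the assumed raw layout:
# (height, width, channel) with uint8 pixel values in [0, 255].
rng = np.random.RandomState(0)
image_hwc = rng.randint(0, 256, size=(28, 28, 1)).astype(np.uint8)

# Same operations as the `transform` lambda, written with NumPy:
# move the channel axis to the front and scale pixels to [0, 1].
image_chw = image_hwc.transpose((2, 0, 1)).astype('float32') / 255.

print(image_hwc.shape, image_hwc.dtype)
print(image_chw.shape, image_chw.dtype)
```

The channel-first layout is what the convolution layers defined below expect by default, and keeping pixels in [0, 1] makes the perturbation size used later easy to interpret as a fraction of the full intensity range.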

Create the network


In [4]:
net = gluon.nn.HybridSequential()
with net.name_scope():
    net.add(
        gluon.nn.Conv2D(kernel_size=5, channels=20, activation='tanh'),
        gluon.nn.MaxPool2D(pool_size=2, strides=2),
        gluon.nn.Conv2D(kernel_size=5, channels=50, activation='tanh'),
        gluon.nn.MaxPool2D(pool_size=2, strides=2),
        gluon.nn.Flatten(),
        gluon.nn.Dense(500, activation='tanh'),
        gluon.nn.Dense(10)
    )

Initialize training


In [5]:
net.initialize(mx.initializer.Uniform(), ctx=ctx)
net.hybridize()

In [6]:
loss = gluon.loss.SoftmaxCELoss()

In [7]:
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1, 'momentum':0.95})

Training loop


In [8]:
epoch = 3
for e in range(epoch):
    train_loss = 0.
    acc = mx.metric.Accuracy()
    for i, (data, label) in enumerate(train_data):
        data = data.as_in_context(ctx)
        label = label.as_in_context(ctx)
        
        with mx.autograd.record():
            output = net(data)
            l = loss(output, label)
            
        l.backward()
        trainer.update(data.shape[0])
        
        train_loss += l.mean().asscalar()
        acc.update(label, output)
    
    print("Train Accuracy: %.2f\t Train Loss: %.5f" % (acc.get()[1], train_loss/(i+1)))


Train Accuracy: 0.92	 Train Loss: 0.32142
Train Accuracy: 0.97	 Train Loss: 0.16773
Train Accuracy: 0.97	 Train Loss: 0.14660

Perturbation

We first run a validation batch and measure the resulting accuracy. We then perturb this batch by moving every input pixel a small, fixed step in the direction of the sign of the gradient of the loss with respect to the input, which increases the loss.
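The update applied in the cells below is the fast gradient sign step from [1]: x_adv = x + epsilon * sign(dL/dx). Here is a minimal sketch of that step on a hypothetical logistic regression model, where the input gradient has a closed form. The weights, input, and helper names are made up for illustration and are unrelated to the CNN trained above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_input_grad(w, b, x, y):
    """Binary cross-entropy loss and its gradient with respect to the input x."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]  # dL/dx for logistic regression
    return loss, grad_x

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_step(x, grad_x, epsilon):
    """Move every input coordinate by epsilon in the direction that increases the loss."""
    return [xi + epsilon * sign(gi) for xi, gi in zip(x, grad_x)]

# Hypothetical model and sample (illustrative values only).
w, b = [2.0, -3.0, 1.0], 0.0
x, y = [0.5, 0.1, 0.2], 1
epsilon = 0.15

loss_clean, grad = loss_and_input_grad(w, b, x, y)
x_adv = fgsm_step(x, grad, epsilon)
loss_adv, _ = loss_and_input_grad(w, b, x_adv, y)

print("loss before: %.4f, loss after: %.4f" % (loss_clean, loss_adv))
```

In the notebook itself the gradient is not derived by hand: `data.attach_grad()` together with `mx.autograd.record()` and `l.backward()` fills `data.grad` with the same quantity for the trained network.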


In [9]:
# Get a batch from the testing set
for data, label in test_data:
    data = data.as_in_context(ctx)
    label = label.as_in_context(ctx)
    break

# Attach gradient to it to get the gradient of the loss with respect to the input
data.attach_grad()
with mx.autograd.record():
    output = net(data)    
    l = loss(output, label)
l.backward()

acc = mx.metric.Accuracy()
acc.update(label, output)

print("Validation batch accuracy {}".format(acc.get()[1]))


Validation batch accuracy 0.96875

Now we perturb the input by adding 0.15 times the sign of the input gradient to every pixel


In [10]:
data_perturbated = data + 0.15 * mx.nd.sign(data.grad)

output = net(data_perturbated)    

acc = mx.metric.Accuracy()
acc.update(label, output)

print("Validation batch accuracy after perturbation {}".format(acc.get()[1]))


Validation batch accuracy after perturbation 0.40625
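One way to see why a per-pixel change of only 0.15 can have such a large effect is the linear argument in [1]: for a linear score w·x, a step of epsilon * sign(w) changes the score by epsilon times the L1 norm of w, which grows with the number of input dimensions. The sketch below checks this on a hypothetical linear score with random weights. It is not the trained CNN, and the weight range is an arbitrary assumption.

```python
import random

random.seed(0)
n_pixels = 28 * 28   # same input size as an MNIST image
epsilon = 0.15       # same step size as the cell above

# Hypothetical weights and input, made up for illustration only.
w = [random.uniform(-0.1, 0.1) for _ in range(n_pixels)]
x = [random.random() for _ in range(n_pixels)]

def sign(v):
    return (v > 0) - (v < 0)

def score(weights, inputs):
    return sum(wi * xi for wi, xi in zip(weights, inputs))

# For a linear score, the gradient with respect to x is w itself,
# so the sign-of-gradient step is epsilon * sign(w).
x_adv = [xi + epsilon * sign(wi) for xi, wi in zip(x, w)]

shift = score(w, x_adv) - score(w, x)
l1_norm = sum(abs(wi) for wi in w)
max_change = max(abs(a - c) for a, c in zip(x_adv, x))

print("largest per-pixel change: %.4f" % max_change)
print("change in score:          %.4f" % shift)
print("epsilon * ||w||_1:        %.4f" % (epsilon * l1_norm))
```

No single pixel moves by more than epsilon, yet the many small changes all push the score the same way, so their sum can be large.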

Visualization

Let's visualize an example after perturbation.

We can see that the prediction is often incorrect.


In [16]:
from random import randint
idx = randint(0, batch_size-1)

plt.imshow(data_perturbated[idx, :].asnumpy().reshape(28,28), cmap=cm.Greys_r)
print("true label: %d" % label.asnumpy()[idx])
print("predicted: %d" % np.argmax(output.asnumpy(), axis=1)[idx])


true label: 1
predicted: 3