In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

TensorFlow Addons Optimizers: ConditionalGradient


Overview

This notebook demonstrates how to use the Conditional Gradient optimizer from the Addons package.

ConditionalGradient

Constraining the parameters of a neural network has been shown to be beneficial in training because of the underlying regularization effects. Parameters are often constrained via a soft penalty (which never guarantees constraint satisfaction) or via a projection operation (which is computationally expensive). The conditional gradient (CG) optimizer, on the other hand, enforces the constraints strictly without the need for an expensive projection step. It works by minimizing a linear approximation of the objective within the constraint set, as the sketch below illustrates. In this notebook, we demonstrate the application of a Frobenius norm constraint via the CG optimizer on the MNIST dataset. CG is now available as a TensorFlow Addons API. More details of the optimizer are available at https://arxiv.org/pdf/1803.06453.pdf
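To make the update rule concrete, here is a minimal sketch of one conditional gradient (Frank-Wolfe style) step on a single weight tensor constrained to a Frobenius-norm ball of radius `lambda_`. It is an illustration of the idea only, not the library implementation; the convention that a `learning_rate` close to 1 keeps most of the previous iterate mirrors how the hyperparameters are chosen later in this notebook.

import tensorflow as tf

def cg_step(var, grad, learning_rate=0.99949, lambda_=203.0, epsilon=1e-7):
    """One illustrative conditional gradient step on a single weight tensor.

    The linear minimizer over the Frobenius ball {w : ||w||_F <= lambda_} is
    -lambda_ * grad / ||grad||_F; the new iterate is a convex combination of
    the old iterate and that minimizer.
    """
    grad_norm = tf.norm(grad)                    # Frobenius norm of the gradient
    s = -lambda_ * grad / (grad_norm + epsilon)  # linear-minimization oracle solution
    return learning_rate * var + (1 - learning_rate) * s

# Example call; if `var` is already inside the ball, the update stays inside it.
w = tf.random.normal([784, 64])
g = tf.random.normal([784, 64])
w_new = cg_step(w, g)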

Setup


In [0]:
import tensorflow as tf
import tensorflow_addons as tfa
from matplotlib import pyplot as plt



In [0]:
# Hyperparameters
batch_size = 64
epochs = 10

Build the Model


In [0]:
model_1 = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(784,), activation='relu', name='dense_1'),
    tf.keras.layers.Dense(64, activation='relu', name='dense_2'),
    tf.keras.layers.Dense(10, activation='softmax', name='predictions'),
])

Prep the Data


In [0]:
# Load MNIST dataset as NumPy arrays
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Preprocess the data
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255


Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step

Define a Custom Callback Function


In [0]:
def frobenius_norm(m):
    """Compute the Frobenius norm over a list of weight tensors.

    Args:
        m: A list of weight tensors, one per layer parameter.
    """
    total_reduce_sum = 0
    for w in m:
        total_reduce_sum += tf.math.reduce_sum(w**2)
    return total_reduce_sum**0.5
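As an optional sanity check (not part of the original tutorial), the result should agree with `tf.linalg.global_norm`, which computes the same square root of the summed squared entries over a list of tensors:

weights = model_1.trainable_weights
print(frobenius_norm(weights).numpy())
print(tf.linalg.global_norm(weights).numpy())  # should match up to floating-point precision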

In [0]:
CG_frobenius_norm_of_weight = []
CG_get_weight_norm = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: CG_frobenius_norm_of_weight.append(
        frobenius_norm(model_1.trainable_weights).numpy()))
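If you prefer an explicit callback class over a lambda, an equivalent sketch (using only the standard tf.keras.callbacks.Callback API; the WeightNormLogger name is ours) would look like this:

class WeightNormLogger(tf.keras.callbacks.Callback):
    """Records the Frobenius norm of the model's trainable weights after each epoch."""

    def __init__(self, history):
        super().__init__()
        self.history = history  # list collecting one norm value per epoch

    def on_epoch_end(self, epoch, logs=None):
        # Keras sets self.model when the callback is passed to fit().
        self.history.append(frobenius_norm(self.model.trainable_weights).numpy())

# Usage: model_1.fit(..., callbacks=[WeightNormLogger(CG_frobenius_norm_of_weight)])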

Train and Evaluate: Using CG as Optimizer

Simply replace a typical Keras optimizer with the new TFA optimizer.


In [0]:
# Compile the model
model_1.compile(
    optimizer=tfa.optimizers.ConditionalGradient(
        learning_rate=0.99949, lambda_=203),  # Utilize TFA optimizer
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy'])

history_cg = model_1.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    validation_data=(x_test, y_test),
    epochs=epochs,
    callbacks=[CG_get_weight_norm])


Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 5s 85us/sample - loss: 0.3745 - accuracy: 0.8894 - val_loss: 0.2323 - val_accuracy: 0.9275
Epoch 2/10
60000/60000 [==============================] - 3s 50us/sample - loss: 0.1908 - accuracy: 0.9430 - val_loss: 0.1538 - val_accuracy: 0.9547
Epoch 3/10
60000/60000 [==============================] - 3s 49us/sample - loss: 0.1497 - accuracy: 0.9548 - val_loss: 0.1473 - val_accuracy: 0.9560
Epoch 4/10
60000/60000 [==============================] - 3s 49us/sample - loss: 0.1306 - accuracy: 0.9612 - val_loss: 0.1215 - val_accuracy: 0.9609
Epoch 5/10
60000/60000 [==============================] - 3s 49us/sample - loss: 0.1211 - accuracy: 0.9636 - val_loss: 0.1114 - val_accuracy: 0.9660
Epoch 6/10
60000/60000 [==============================] - 3s 48us/sample - loss: 0.1125 - accuracy: 0.9663 - val_loss: 0.1260 - val_accuracy: 0.9640
Epoch 7/10
60000/60000 [==============================] - 3s 50us/sample - loss: 0.1108 - accuracy: 0.9665 - val_loss: 0.1009 - val_accuracy: 0.9697
Epoch 8/10
60000/60000 [==============================] - 3s 51us/sample - loss: 0.1081 - accuracy: 0.9676 - val_loss: 0.1129 - val_accuracy: 0.9647
Epoch 9/10
60000/60000 [==============================] - 3s 50us/sample - loss: 0.1065 - accuracy: 0.9675 - val_loss: 0.1058 - val_accuracy: 0.9683
Epoch 10/10
60000/60000 [==============================] - 3s 51us/sample - loss: 0.1039 - accuracy: 0.9683 - val_loss: 0.1126 - val_accuracy: 0.9646

Train and Evaluate: Using SGD as Optimizer


In [0]:
model_2 = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(784,), activation='relu', name='dense_1'),
    tf.keras.layers.Dense(64, activation='relu', name='dense_2'),
    tf.keras.layers.Dense(10, activation='softmax', name='predictions'),
])

In [0]:
SGD_frobenius_norm_of_weight = []
SGD_get_weight_norm = tf.keras.callbacks.LambdaCallback(
    on_epoch_end=lambda epoch, logs: SGD_frobenius_norm_of_weight.append(
        frobenius_norm(model_2.trainable_weights).numpy()))

In [0]:
# Compile the model
model_2.compile(
    optimizer=tf.keras.optimizers.SGD(0.01),  # Utilize SGD optimizer
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy'])

history_sgd = model_2.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    validation_data=(x_test, y_test),
    epochs=epochs,
    callbacks=[SGD_get_weight_norm])


Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] - 3s 46us/sample - loss: 0.9498 - accuracy: 0.7523 - val_loss: 0.4306 - val_accuracy: 0.8844
Epoch 2/10
60000/60000 [==============================] - 2s 41us/sample - loss: 0.3851 - accuracy: 0.8916 - val_loss: 0.3298 - val_accuracy: 0.9068
Epoch 3/10
60000/60000 [==============================] - 3s 42us/sample - loss: 0.3230 - accuracy: 0.9064 - val_loss: 0.2917 - val_accuracy: 0.9150
Epoch 4/10
60000/60000 [==============================] - 2s 41us/sample - loss: 0.2897 - accuracy: 0.9169 - val_loss: 0.2676 - val_accuracy: 0.9241
Epoch 5/10
60000/60000 [==============================] - 3s 43us/sample - loss: 0.2658 - accuracy: 0.9237 - val_loss: 0.2485 - val_accuracy: 0.9288
Epoch 6/10
60000/60000 [==============================] - 2s 41us/sample - loss: 0.2467 - accuracy: 0.9301 - val_loss: 0.2374 - val_accuracy: 0.9285
Epoch 7/10
60000/60000 [==============================] - 3s 42us/sample - loss: 0.2308 - accuracy: 0.9343 - val_loss: 0.2201 - val_accuracy: 0.9358
Epoch 8/10
60000/60000 [==============================] - 2s 41us/sample - loss: 0.2169 - accuracy: 0.9388 - val_loss: 0.2096 - val_accuracy: 0.9388
Epoch 9/10
60000/60000 [==============================] - 2s 42us/sample - loss: 0.2046 - accuracy: 0.9421 - val_loss: 0.2009 - val_accuracy: 0.9404
Epoch 10/10
60000/60000 [==============================] - 2s 41us/sample - loss: 0.1939 - accuracy: 0.9448 - val_loss: 0.1900 - val_accuracy: 0.9442

Frobenius Norm of Weights: CG vs SGD

The current implementation of the CG optimizer is based on the Frobenius norm, treating the Frobenius norm as a regularizer in the target function. Therefore, we compare CG's regularization effect with that of the SGD optimizer, which does not impose a Frobenius-norm regularizer.


In [0]:
plt.plot(
    CG_frobenius_norm_of_weight,
    color='r',
    label='CG_frobenius_norm_of_weights')
plt.plot(
    SGD_frobenius_norm_of_weight,
    color='b',
    label='SGD_frobenius_norm_of_weights')
plt.xlabel('Epoch')
plt.ylabel('Frobenius norm of weights')
plt.legend(loc=1)


Out[0]:
<matplotlib.legend.Legend at 0x7f2cd4f5f6d8>

Train and Validation Accuracy: CG vs SGD


In [0]:
plt.plot(history_cg.history['accuracy'], color='r', label='CG_train')
plt.plot(history_cg.history['val_accuracy'], color='g', label='CG_test')
plt.plot(history_sgd.history['accuracy'], color='pink', label='SGD_train')
plt.plot(history_sgd.history['val_accuracy'], color='b', label='SGD_test')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(loc=4)


Out[0]:
<matplotlib.legend.Legend at 0x7f2cd4a3f668>