Today’s experiment is based on the paper "Squeeze-and-Excitation Networks" by Jie Hu, Li Shen and Gang Sun.
We will compare a standard ResNet with a ResNet that uses squeeze-and-excitation (SE) blocks.
To begin with, let's figure out what an SE block is.
It models channel interdependencies at almost no additional computational cost. The main idea is to add parameters to each channel of a convolutional block so that the network can adaptively adjust the weighting of each feature map.
Let’s take a closer look at the structure of the block:
Firstly, each channel's feature map is squeezed into a single numeric value via global average pooling. This results in a vector of size C, where C is the number of channels.
Afterwards, this vector is fed through a small two-layer neural network, which outputs a vector of the same size C. These values can now be used as weights for the original feature maps, scaling each channel based on its importance.
A more detailed explanation is presented in the blog post on Medium.
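Before moving on to the experiment, here is a minimal sketch of an SE block in plain TensorFlow 1.x. It only illustrates the squeeze, excitation and scale steps and is not the BatchFlow implementation used below; the reduction factor ratio is an assumed default of 16, as in the paper.

import tensorflow as tf

def se_block(x, ratio=16):
    # Toy SE block for illustration: squeeze -> excite -> scale
    channels = x.get_shape().as_list()[-1]
    # Squeeze: global average pooling over the spatial dimensions -> (batch, C)
    squeezed = tf.reduce_mean(x, axis=[1, 2])
    # Excitation: a small two-layer network with a bottleneck of C / ratio
    hidden = tf.layers.dense(squeezed, channels // ratio, activation=tf.nn.relu)
    weights = tf.layers.dense(hidden, channels, activation=tf.nn.sigmoid)
    # Scale: reweight each channel of the original feature maps
    return x * tf.reshape(weights, [-1, 1, 1, channels])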
In [1]:
import sys
import numpy as np
import tensorflow as tf
from tqdm import tqdm_notebook as tqn
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('seaborn-poster')
plt.style.use('ggplot')
sys.path.append('../../..')
from batchflow import B, V
from batchflow.opensets import MNIST
from batchflow.models.tf import ResNet
sys.path.append('../../utils')
import utils
As always, let's create a dataset with MNIST data
In [2]:
dset = MNIST()
We will use the standard ResNet from the BatchFlow models.
For comparison, we will create a classic ResNet and a ResNet with SE blocks. Both models have the same number of blocks. An SE block is added to the model by setting the key 'body/block/se_block' to True in the config, as shown below.
In [3]:
ResNet_config = {
    'inputs': {'images': {'shape': (28, 28, 1)},
               'labels': {'classes': 10,
                          'transform': 'ohe',
                          'dtype': 'int32',
                          'name': 'targets'}},
    'input_block/inputs': 'images',
    'body/num_blocks': [1, 1, 1, 1],
    'body/filters': [64, 128, 256, 512],
    'body/block/bottleneck': True,
    'body/block/post_activation': tf.nn.relu,
    'body/block/layout': 'cna cna cn',
    'loss': 'ce',
    'optimizer': 'Adam',
}

SE_config = {
    **ResNet_config,
    'body/block/se_block': True,
}
Now let's create the pipelines with the given configuration for the simple ResNet model.
In [4]:
res_train_ppl = (dset.train.p
                 .init_model('dynamic',
                             ResNet,
                             'resnet',
                             config=ResNet_config)
                 .train_model('resnet',
                              feed_dict={'images': B('images'),
                                         'labels': B('labels')}))

res_test_ppl = (dset.test.p
                .init_variable('resloss', init_on_each_run=list)
                .import_model('resnet', res_train_ppl)
                .predict_model('resnet',
                               fetches='loss',
                               feed_dict={'images': B('images'),
                                          'labels': B('labels')},
                               save_to=V('resloss'),
                               mode='a'))
And now the model with SE blocks
In [5]:
se_train_ppl = (dset.train.p
                .init_model('dynamic',
                            ResNet,
                            'se_block',
                            config=SE_config)
                .train_model('se_block',
                             feed_dict={'images': B('images'),
                                        'labels': B('labels')}))

se_test_ppl = (dset.test.p
               .init_variable('seloss', init_on_each_run=list)
               .import_model('se_block', se_train_ppl)
               .predict_model('se_block',
                              fetches='loss',
                              feed_dict={'images': B('images'),
                                         'labels': B('labels')},
                              save_to=V('seloss'),
                              mode='a'))
After that, let's train our models.
In [6]:
for i in tqn(range(500)):
    res_train_ppl.next_batch(300, n_epochs=None, shuffle=2)
    res_test_ppl.next_batch(300, n_epochs=None, shuffle=2)
    se_train_ppl.next_batch(300, n_epochs=None, shuffle=2)
    se_test_ppl.next_batch(300, n_epochs=None, shuffle=2)
It’s time to plot the entire learning process.
In [7]:
ResNet_loss = res_test_ppl.get_variable('resloss')
SE_loss = se_test_ppl.get_variable('seloss')
utils.draw(ResNet_loss, 'ResNet', SE_loss, 'Squeeze and excitation')
On this plot, it is very difficult to see the difference between the models. Let’s zoom in on the last 200 iterations.
In [8]:
utils.draw(ResNet_loss, 'ResNet', SE_loss, 'Squeeze and excitation', bound=[300, 500, 0, 0.3])
Because of the large variance, it is again hard to tell which model is better. We can smooth the curves and see how the loss behaves.
In [9]:
utils.draw(ResNet_loss, 'ResNet', SE_loss, 'Squeeze and excitation', window=50, bound=[300, 500, 0, 0.3])
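The smoothing itself is hidden inside utils.draw, a helper from this repository. Assuming it is a simple moving average over the given window, an equivalent could be sketched as follows (smooth is a hypothetical stand-in, not the actual helper).

def smooth(values, window=50):
    # Hypothetical moving-average smoothing over `window` consecutive points
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode='valid')

plt.plot(smooth(ResNet_loss), label='ResNet')
plt.plot(smooth(SE_loss), label='Squeeze and excitation')
plt.legend()
plt.show()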
It's now clearer that the squeeze-and-excitation block gives, on average, a lower loss than the simple ResNet, while SE-ResNet has approximately the same number of parameters.
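To sanity-check the parameter counts, one could sum the sizes of the trainable variables in each model's graph. The count_params helper below is a hypothetical sketch, not part of the original notebook.

def count_params(graph):
    # Hypothetical helper: total number of trainable parameters in a graph
    with graph.as_default():
        return sum(int(np.prod(v.get_shape().as_list())) for v in tf.trainable_variables())

print('ResNet parameters:   ', count_params(res_test_ppl.get_model_by_name('resnet').session.graph))
print('SE-ResNet parameters:', count_params(se_test_ppl.get_model_by_name('se_block').session.graph))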
While SE blocks have been empirically shown to improve network performance, let's understand how the self-gating excitation mechanism operates in practice. To provide a clearer picture of the behavior of SE blocks, we will plot the activation values from our SE-ResNet and examine how their distribution differs between classes: for each class we average the activations of its examples and compare the resulting curves.
In [10]:
def get_maps(graph, ppl, sess):
    # Find operations belonging to the model's head and take the tensor we need
    operations = graph.get_operations()
    head_operations = [oper for oper in operations if 'head' in oper.name]
    oper_name = head_operations[1].name + ':0'
    # Run one test batch through the network and fetch the activations
    next_batch = ppl.next_batch()
    maps = sess.run(oper_name,
                    feed_dict={'ResNet/inputs/images:0': next_batch.images,
                               'ResNet/inputs/labels:0': next_batch.labels,
                               'ResNet/globals/is_training:0': False})
    return maps, next_batch.labels
Load our activation maps and the corresponding labels.
In [11]:
res_sess = res_test_ppl.get_model_by_name("resnet").session
res_graph = res_sess.graph
se_sess = se_test_ppl.get_model_by_name('se_block').session
se_graph = se_sess.graph
res_maps, res_answers = get_maps(res_graph, res_test_ppl, res_sess)
se_maps, se_answers = get_maps(se_graph, se_test_ppl, se_sess)
Let's plot the distribution of feature map activations after global average pooling (GAP) for individual classes. Each line is the distribution for one class.
In [12]:
def draw_avgpooling(maps, answers, model=True):
    import seaborn as sns
    import pandas as pd
    col = sns.color_palette("Set2", 8) + sns.color_palette(["#9b59b6", "#3498db"])
    # Average the activation maps over all examples of each class
    indices = np.array([np.where(answers == i)[0] for i in range(10)])
    filters = np.array([np.mean(maps[indices[i]], axis=0).reshape(-1) for i in range(10)])
    for i in range(10):
        # Exponentially weighted smoothing of the per-class activation curve
        smoothed = pd.Series(filters[i]).ewm(span=350, adjust=False).mean()
        plt.plot(smoothed, color=col[i], label=str(i))
    plt.title("Distribution of average pooling in " + ("SE ResNet" if model else 'simple ResNet'))
    plt.legend(fontsize=16, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
    plt.ylabel('Activation value', fontsize=18)
    plt.xlabel('Feature map index', fontsize=18)
    plt.axis([0, 2060, 0., 1.])
    plt.show()
In [13]:
draw_avgpooling(se_maps, se_answers)
draw_avgpooling(res_maps, res_answers, False)
In the first plot, the activation values of the feature maps vary greatly depending on the class. In the second plot, this behavior is not observed: the activation value of each feature map is practically independent of the object's class.
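To back this observation up with a number, one could measure, for every feature map, how much its class-averaged activation varies across the 10 classes and then average over maps. The class_spread helper below is a hypothetical sketch, not part of the original experiment.

def class_spread(maps, answers):
    # Average activations over the examples of each class, then take the
    # standard deviation across classes for every feature map
    class_means = np.stack([maps[answers == i].mean(axis=0).reshape(-1) for i in range(10)])
    return class_means.std(axis=0).mean()

print('SE ResNet spread across classes:    ', class_spread(se_maps, se_answers))
print('Simple ResNet spread across classes:', class_spread(res_maps, res_answers))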