Experiment:

  • Opposite of Hebbian Pruning: prune the connections with the highest coactivation instead of the lowest.
  • Opposite of Hebbian Growth: grow connections by allowing gradient flow on the connections with the lowest coactivation instead of the highest (see the sketch below).
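
A minimal sketch of the two inverted rules, assuming each layer keeps a binary connection mask and a coactivation matrix of the same shape (the function and argument names below are illustrative, not the actual DSNNMixedHeb implementation):

import torch

def inverse_hebbian_prune(weight_mask, coactivations, prune_fraction):
    # Zero out the *most* coactivated active connections
    # (the opposite of Hebbian pruning, which removes the least coactivated).
    active = weight_mask.nonzero(as_tuple=True)
    num_prune = int(prune_fraction * len(active[0]))
    if num_prune > 0:
        order = torch.argsort(coactivations[active], descending=True)
        rows, cols = active[0][order[:num_prune]], active[1][order[:num_prune]]
        weight_mask[rows, cols] = 0
    return weight_mask

def inverse_hebbian_grow(weight_mask, coactivations, num_grow):
    # Re-enable gradient flow on the *least* coactivated inactive connections
    # (the opposite of Hebbian growth, which adds the most coactivated).
    inactive = (weight_mask == 0).nonzero(as_tuple=True)
    order = torch.argsort(coactivations[inactive], descending=False)
    rows, cols = inactive[0][order[:num_grow]], inactive[1][order[:num_grow]]
    weight_mask[rows, cols] = 1
    return weight_mask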

Motivation:

  • Verify the relevance of the most coactivated connections by checking the impact on the model when they are pruned
  • Verify the relevance of the least coactivated connections by checking the impact on the model when they are added

Conclusions:

  • The inverted Hebbian pruning logic, with weight pruning set to 0, clearly hurts model performance.
  • Acc when pruning is fully applied at each step is ~0.965 {(1,0), (0,1), (1,1)}, where tuples are (hebbian_prune_perc, weight_prune_perc)
  • Acc with no pruning is 0.977 {(0,0)}
  • Best acc is still achieved with magnitude-based (weight) pruning only {(0, 0.2), (0, 0.4)}
  • Inverted Hebbian pruning alone (removing the connections with the highest coactivation) is harmful to the model, with acc equal to or worse than full pruning, even at a pruning fraction as low as 0.2
  • Inverted Hebbian growth (adding the connections with the lowest coactivation) reduces acc by ~0.002 on average

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys
sys.path.append("../../")

In [3]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import glob
import tabulate
import pprint
import click
import numpy as np
import pandas as pd
from ray.tune.commands import *
from dynamic_sparse.common.browser import *

Load and check data


In [4]:
exps = ['neurips_debug_test10', 'neurips_debug_test11']
paths = [os.path.expanduser("~/nta/results/{}".format(e)) for e in exps]
df = load_many(paths)

In [5]:
df.head(5)


Out[5]:
Experiment Name train_acc_max train_acc_max_epoch train_acc_min train_acc_min_epoch train_acc_median train_acc_last val_acc_max val_acc_max_epoch val_acc_min ... momentum network num_classes on_perc optim_alg pruning_early_stop test_noise use_kwinners weight_decay weight_prune_perc
0 0_hebbian_prune_perc=None,weight_prune_perc=None 0.987767 29 0.921683 0 0.984892 0.987767 0.9764 17 0.9629 ... 0.9 MLPHeb 10 0.2 SGD 0 False False 0.0001 NaN
1 1_hebbian_prune_perc=0.2,weight_prune_perc=None 0.931967 1 0.852817 22 0.876142 0.866717 0.9653 0 0.9016 ... 0.9 MLPHeb 10 0.2 SGD 0 False False 0.0001 NaN
2 2_hebbian_prune_perc=0.4,weight_prune_perc=None 0.925267 0 0.842883 13 0.868217 0.860283 0.9648 0 0.9008 ... 0.9 MLPHeb 10 0.2 SGD 0 False False 0.0001 NaN
3 3_hebbian_prune_perc=0.6,weight_prune_perc=None 0.922650 0 0.810317 22 0.869442 0.854883 0.9612 0 0.8888 ... 0.9 MLPHeb 10 0.2 SGD 0 False False 0.0001 NaN
4 4_hebbian_prune_perc=0.8,weight_prune_perc=None 0.926633 0 0.208517 28 0.878492 0.397800 0.9647 0 0.2306 ... 0.9 MLPHeb 10 0.2 SGD 0 False False 0.0001 NaN

5 rows × 42 columns


In [6]:
# replace NaN with 0.0 in the pruning percentage columns
df['hebbian_prune_perc'] = df['hebbian_prune_perc'].replace(np.nan, 0.0, regex=True)
df['weight_prune_perc'] = df['weight_prune_perc'].replace(np.nan, 0.0, regex=True)

In [7]:
df.columns


Out[7]:
Index(['Experiment Name', 'train_acc_max', 'train_acc_max_epoch',
       'train_acc_min', 'train_acc_min_epoch', 'train_acc_median',
       'train_acc_last', 'val_acc_max', 'val_acc_max_epoch', 'val_acc_min',
       'val_acc_min_epoch', 'val_acc_median', 'val_acc_last', 'epochs',
       'experiment_file_name', 'trial_time', 'mean_epoch_time', 'batch_norm',
       'data_dir', 'dataset_name', 'debug_sparse', 'debug_weights', 'device',
       'hebbian_grow', 'hebbian_prune_perc', 'hidden_sizes', 'input_size',
       'learning_rate', 'lr_gamma', 'lr_milestones', 'lr_scheduler', 'model',
       'momentum', 'network', 'num_classes', 'on_perc', 'optim_alg',
       'pruning_early_stop', 'test_noise', 'use_kwinners', 'weight_decay',
       'weight_prune_perc'],
      dtype='object')

In [8]:
df.shape


Out[8]:
(217, 42)

In [9]:
df.iloc[1]


Out[9]:
Experiment Name           1_hebbian_prune_perc=0.2,weight_prune_perc=None
train_acc_max                                                    0.931967
train_acc_max_epoch                                                     1
train_acc_min                                                    0.852817
train_acc_min_epoch                                                    22
train_acc_median                                                 0.876142
train_acc_last                                                   0.866717
val_acc_max                                                        0.9653
val_acc_max_epoch                                                       0
val_acc_min                                                        0.9016
val_acc_min_epoch                                                      22
val_acc_median                                                    0.92105
val_acc_last                                                       0.9216
epochs                                                                 30
experiment_file_name    /Users/lsouza/nta/results/neurips_debug_test10...
trial_time                                                        19.7118
mean_epoch_time                                                  0.657061
batch_norm                                                           True
data_dir                                        /home/ubuntu/nta/datasets
dataset_name                                                        MNIST
debug_sparse                                                         True
debug_weights                                                        True
device                                                               cuda
hebbian_grow                                                        False
hebbian_prune_perc                                                    0.2
hidden_sizes                                                          100
input_size                                                            784
learning_rate                                                         0.1
lr_gamma                                                              0.1
lr_milestones                                                          60
lr_scheduler                                                  MultiStepLR
model                                                        DSNNMixedHeb
momentum                                                              0.9
network                                                            MLPHeb
num_classes                                                            10
on_perc                                                               0.2
optim_alg                                                             SGD
pruning_early_stop                                                      0
test_noise                                                          False
use_kwinners                                                        False
weight_decay                                                       0.0001
weight_prune_perc                                                       0
Name: 1, dtype: object

In [10]:
df.groupby('model')['model'].count()


Out[10]:
model
DSNNMixedHeb    217
Name: model, dtype: int64

Analysis

Experiment Details

base_exp_config = dict(
    device="cuda",
    # dataset related
    dataset_name="MNIST",
    data_dir=os.path.expanduser("~/nta/datasets"),
    input_size=784,
    num_classes=10,
    # network related
    network="MLPHeb",
    hidden_sizes=[100, 100, 100],
    batch_norm=True,
    use_kwinners=tune.grid_search([True, False]),
    # model related
    model="DSNNMixedHeb",
    on_perc=0.2,
    optim_alg="SGD",
    momentum=0.9,
    weight_decay=1e-4,
    learning_rate=0.1,
    lr_scheduler="MultiStepLR",
    lr_milestones=[30, 60, 90],
    lr_gamma=0.1,
    # sparse related
    hebbian_prune_perc=tune.grid_search([0, 0.1, 0.2, 0.3, 0.4, 0.5]),
    pruning_early_stop=0,
    hebbian_grow=tune.grid_search([True, False]),
    # additional validation
    test_noise=False,
    # debugging
    debug_weights=True,
    debug_sparse=True,
    stop={"training_iteration": 30},
)
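
As a rough sanity check on the number of rows loaded above, assuming the sweep actually run here used pruning fractions 0 through 1.0 in steps of 0.2 for both pruning types, hebbian_grow in {True, False}, and 3 samples per configuration (the sample count is an assumption inferred from the group counts below, not shown in the pasted config):

hebbian_prune_grid = [0, 0.2, 0.4, 0.6, 0.8, 1.0]  # values seen in the results tables
weight_prune_grid = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
hebbian_grow_grid = [True, False]
samples_per_config = 3  # assumed
print(len(hebbian_prune_grid) * len(weight_prune_grid)
      * len(hebbian_grow_grid) * samples_per_config)  # 216 complete trials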

In [11]:
# Did any trials fail?
df[df["epochs"]<30]["epochs"].count()


Out[11]:
1

In [12]:
# Removing failed or incomplete trials
df_origin = df.copy()
df = df_origin[df_origin["epochs"]>=30]
df.shape


Out[12]:
(216, 42)

In [13]:
# which ones failed?
# failed, or still ongoing?
df_origin['failed'] = df_origin["epochs"]<30
df_origin[df_origin['failed']]['epochs']


Out[13]:
108    1
Name: epochs, dtype: int64

In [14]:
# helper functions
def mean_and_std(s):
    # format a series as "mean ± std"
    return "{:.3f} ± {:.3f}".format(s.mean(), s.std())

def round_mean(s):
    # mean rounded to the nearest integer (used for epoch counts)
    return "{:.0f}".format(round(s.mean()))

stats = ['min', 'max', 'mean', 'std']

def agg(columns, filter=None, round=3):
    # group by the given columns (optionally restricted by a boolean filter)
    # and summarize best validation accuracy and the epoch at which it occurred
    if filter is None:
        return (df.groupby(columns)
             .agg({'val_acc_max_epoch': round_mean,
                   'val_acc_max': stats,
                   'model': ['count']})).round(round)
    else:
        return (df[filter].groupby(columns)
             .agg({'val_acc_max_epoch': round_mean,
                   'val_acc_max': stats,
                   'model': ['count']})).round(round)
What is the impact of removing the connections with the highest coactivation?

In [15]:
# trials using random growth (hebbian_grow == False)
random_grow = (df['hebbian_grow'] == False)

In [16]:
agg(['hebbian_prune_perc'], random_grow)


Out[16]:
val_acc_max_epoch val_acc_max model
round_mean min max mean std count
hebbian_prune_perc
0.0 21 0.964 0.982 0.977 0.006 18
0.2 12 0.961 0.980 0.973 0.007 18
0.4 15 0.959 0.980 0.973 0.008 18
0.6 17 0.957 0.980 0.973 0.009 18
0.8 16 0.960 0.981 0.974 0.008 18
1.0 15 0.964 0.982 0.975 0.008 18

In [17]:
agg(['weight_prune_perc'], random_grow)


Out[17]:
val_acc_max_epoch val_acc_max model
round_mean min max mean std count
weight_prune_perc
0.0 4 0.957 0.978 0.966 0.006 18
0.2 22 0.976 0.982 0.979 0.002 18
0.4 21 0.977 0.982 0.979 0.001 18
0.6 23 0.978 0.981 0.979 0.001 18
0.8 23 0.976 0.981 0.979 0.001 18
1.0 3 0.959 0.967 0.963 0.002 18
What is the optimal combination of both pruning types?

In [18]:
pd.pivot_table(df[random_grow], 
              index='hebbian_prune_perc',
              columns='weight_prune_perc',
              values='val_acc_max',
              aggfunc=mean_and_std)


Out[18]:
weight_prune_perc 0.0 0.2 0.4 0.6 0.8 1.0
hebbian_prune_perc
0.0 0.977 ± 0.001 0.981 ± 0.001 0.981 ± 0.001 0.980 ± 0.001 0.980 ± 0.000 0.965 ± 0.001
0.2 0.964 ± 0.001 0.978 ± 0.002 0.978 ± 0.000 0.979 ± 0.001 0.977 ± 0.001 0.963 ± 0.001
0.4 0.963 ± 0.001 0.978 ± 0.001 0.979 ± 0.001 0.979 ± 0.001 0.979 ± 0.001 0.962 ± 0.003
0.6 0.960 ± 0.003 0.979 ± 0.000 0.979 ± 0.002 0.980 ± 0.001 0.979 ± 0.001 0.963 ± 0.002
0.8 0.965 ± 0.000 0.980 ± 0.001 0.980 ± 0.001 0.979 ± 0.000 0.979 ± 0.001 0.961 ± 0.001
1.0 0.964 ± 0.000 0.981 ± 0.001 0.980 ± 0.001 0.980 ± 0.001 0.980 ± 0.001 0.965 ± 0.002
  • The inverted Hebbian pruning logic, with weight pruning set to 0, clearly hurts model performance.
  • Acc when pruning is fully applied at each step is ~0.965 {(1,0), (0,1), (1,1)}, where tuples are (hebbian_prune_perc, weight_prune_perc)
  • Acc with no pruning is 0.977 {(0,0)}
  • Best acc is still achieved with magnitude-based (weight) pruning only {(0, 0.2), (0, 0.4)}
  • Inverted Hebbian pruning alone (removing the connections with the highest coactivation) is harmful to the model, with acc equal to or worse than full pruning, even at a pruning fraction as low as 0.2
What is the impact of adding the connections with the lowest coactivation?

In [19]:
# with and without hebbian grow
agg('hebbian_grow')


Out[19]:
val_acc_max_epoch val_acc_max model
round_mean min max mean std count
hebbian_grow
False 16 0.957 0.982 0.974 0.008 108
True 13 0.956 0.979 0.972 0.007 108

In [20]:
# with and without hebbian grow
pd.pivot_table(df, 
              index=['hebbian_grow', 'hebbian_prune_perc'],
              columns='weight_prune_perc',
              values='val_acc_max',
              aggfunc=mean_and_std)


Out[20]:
weight_prune_perc 0.0 0.2 0.4 0.6 0.8 1.0
hebbian_grow hebbian_prune_perc
False 0.0 0.977 ± 0.001 0.981 ± 0.001 0.981 ± 0.001 0.980 ± 0.001 0.980 ± 0.000 0.965 ± 0.001
0.2 0.964 ± 0.001 0.978 ± 0.002 0.978 ± 0.000 0.979 ± 0.001 0.977 ± 0.001 0.963 ± 0.001
0.4 0.963 ± 0.001 0.978 ± 0.001 0.979 ± 0.001 0.979 ± 0.001 0.979 ± 0.001 0.962 ± 0.003
0.6 0.960 ± 0.003 0.979 ± 0.000 0.979 ± 0.002 0.980 ± 0.001 0.979 ± 0.001 0.963 ± 0.002
0.8 0.965 ± 0.000 0.980 ± 0.001 0.980 ± 0.001 0.979 ± 0.000 0.979 ± 0.001 0.961 ± 0.001
1.0 0.964 ± 0.000 0.981 ± 0.001 0.980 ± 0.001 0.980 ± 0.001 0.980 ± 0.001 0.965 ± 0.002
True 0.0 0.976 ± 0.001 0.978 ± 0.001 0.979 ± 0.001 0.975 ± 0.001 0.974 ± 0.001 0.963 ± 0.002
0.2 0.961 ± 0.001 0.977 ± 0.001 0.978 ± 0.001 0.977 ± 0.001 0.975 ± 0.000 0.964 ± 0.003
0.4 0.962 ± 0.002 0.978 ± 0.001 0.976 ± 0.000 0.976 ± 0.000 0.974 ± 0.001 0.962 ± 0.001
0.6 0.963 ± 0.001 0.977 ± 0.001 0.977 ± 0.000 0.977 ± 0.001 0.974 ± 0.001 0.961 ± 0.002
0.8 0.963 ± 0.003 0.977 ± 0.000 0.976 ± 0.001 0.976 ± 0.001 0.974 ± 0.001 0.961 ± 0.004
1.0 0.958 ± 0.002 0.977 ± 0.001 0.977 ± 0.001 0.976 ± 0.001 0.972 ± 0.001 0.961 ± 0.002
  • Inverted Hebbian growth (adding the connections with the lowest coactivation) reduces acc by ~0.002 on average

In [ ]: