Experiment:

  • Opposite of Hebbian Pruning: prune the connections with the highest coactivation instead of the lowest.
  • Opposite of Hebbian Growth: grow connections by allowing gradient flow on the connections with the lowest coactivation instead of the highest (see the sketch below).
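
A minimal sketch of the two inverted rules, assuming each layer keeps a binary connection mask and a coactivation matrix of the same shape (the function and argument names below are illustrative, not the actual DSNNMixedHeb implementation):

import torch

def inverse_hebbian_prune(weight_mask, coactivations, prune_fraction):
    # Zero out the *most* coactivated active connections
    # (the opposite of Hebbian pruning, which removes the least coactivated).
    active = weight_mask.nonzero(as_tuple=True)
    num_prune = int(prune_fraction * len(active[0]))
    if num_prune > 0:
        order = torch.argsort(coactivations[active], descending=True)
        rows, cols = active[0][order[:num_prune]], active[1][order[:num_prune]]
        weight_mask[rows, cols] = 0
    return weight_mask

def inverse_hebbian_grow(weight_mask, coactivations, num_grow):
    # Re-enable gradient flow on the *least* coactivated inactive connections
    # (the opposite of Hebbian growth, which adds the most coactivated).
    inactive = (weight_mask == 0).nonzero(as_tuple=True)
    order = torch.argsort(coactivations[inactive], descending=False)
    rows, cols = inactive[0][order[:num_grow]], inactive[1][order[:num_grow]]
    weight_mask[rows, cols] = 1
    return weight_mask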

Motivation:

  • Verify the relevance of the most coactivated connections by checking the impact on the model when they are pruned
  • Verify the relevance of the least coactivated connections by checking the impact on the model when they are added

Conclusions:

  • The inverted Hebbian pruning logic, with weight pruning set to 0, clearly hurts model performance.
  • Acc when pruning is fully applied at each step is ~0.965 {(1,0), (0,1), (1,1)}, where tuples are (hebbian_prune_perc, weight_prune_perc)
  • Acc with no pruning is 0.977 {(0,0)}
  • Best acc is still achieved with magnitude-based (weight) pruning only {(0, 0.2), (0, 0.4)}
  • Inverted Hebbian pruning alone (removing the connections with the highest coactivation) is harmful to the model, with acc equal to or worse than full pruning, even at a pruning fraction as low as 0.2
  • Inverted Hebbian growth (adding the connections with the lowest coactivation) reduces acc by ~0.002 on average

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys
sys.path.append("../../")

In [3]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import glob
import tabulate
import pprint
import click
import numpy as np
import pandas as pd
from ray.tune.commands import *
from dynamic_sparse.common.browser import *

Load and check data


In [4]:
exps = ['neurips_debug_test10', 'neurips_debug_test11']
paths = [os.path.expanduser("~/nta/results/{}".format(e)) for e in exps]
df = load_many(paths)

In [5]:
df.head(5)


Out[5]:
Experiment Name train_acc_max train_acc_max_epoch train_acc_min train_acc_min_epoch train_acc_median train_acc_last val_acc_max val_acc_max_epoch val_acc_min ... momentum network num_classes on_perc optim_alg pruning_early_stop test_noise use_kwinners weight_decay weight_prune_perc
0 0_hebbian_prune_perc=None,weight_prune_perc=None 0.987767 29 0.921683 0 0.984892 0.987767 0.9764 17 0.9629 ... 0.9 MLPHeb 10 0.2 SGD 0 False False 0.0001 NaN
1 1_hebbian_prune_perc=0.2,weight_prune_perc=None 0.931967 1 0.852817 22 0.876142 0.866717 0.9653 0 0.9016 ... 0.9 MLPHeb 10 0.2 SGD 0 False False 0.0001 NaN
2 2_hebbian_prune_perc=0.4,weight_prune_perc=None 0.925267 0 0.842883 13 0.868217 0.860283 0.9648 0 0.9008 ... 0.9 MLPHeb 10 0.2 SGD 0 False False 0.0001 NaN
3 3_hebbian_prune_perc=0.6,weight_prune_perc=None 0.922650 0 0.810317 22 0.869442 0.854883 0.9612 0 0.8888 ... 0.9 MLPHeb 10 0.2 SGD 0 False False 0.0001 NaN
4 4_hebbian_prune_perc=0.8,weight_prune_perc=None 0.926633 0 0.208517 28 0.878492 0.397800 0.9647 0 0.2306 ... 0.9 MLPHeb 10 0.2 SGD 0 False False 0.0001 NaN

5 rows × 42 columns


In [6]:
# replace NaN with 0.0 in the pruning percentage columns
df['hebbian_prune_perc'] = df['hebbian_prune_perc'].replace(np.nan, 0.0, regex=True)
df['weight_prune_perc'] = df['weight_prune_perc'].replace(np.nan, 0.0, regex=True)

In [7]:
df.columns


Out[7]:
Index(['Experiment Name', 'train_acc_max', 'train_acc_max_epoch',
       'train_acc_min', 'train_acc_min_epoch', 'train_acc_median',
       'train_acc_last', 'val_acc_max', 'val_acc_max_epoch', 'val_acc_min',
       'val_acc_min_epoch', 'val_acc_median', 'val_acc_last', 'epochs',
       'experiment_file_name', 'trial_time', 'mean_epoch_time', 'batch_norm',
       'data_dir', 'dataset_name', 'debug_sparse', 'debug_weights', 'device',
       'hebbian_grow', 'hebbian_prune_perc', 'hidden_sizes', 'input_size',
       'learning_rate', 'lr_gamma', 'lr_milestones', 'lr_scheduler', 'model',
       'momentum', 'network', 'num_classes', 'on_perc', 'optim_alg',
       'pruning_early_stop', 'test_noise', 'use_kwinners', 'weight_decay',
       'weight_prune_perc'],
      dtype='object')

In [8]:
df.shape


Out[8]:
(217, 42)

In [9]:
df.iloc[1]


Out[9]:
Experiment Name           1_hebbian_prune_perc=0.2,weight_prune_perc=None
train_acc_max                                                    0.931967
train_acc_max_epoch                                                     1
train_acc_min                                                    0.852817
train_acc_min_epoch                                                    22
train_acc_median                                                 0.876142
train_acc_last                                                   0.866717
val_acc_max                                                        0.9653
val_acc_max_epoch                                                       0
val_acc_min                                                        0.9016
val_acc_min_epoch                                                      22
val_acc_median                                                    0.92105
val_acc_last                                                       0.9216
epochs                                                                 30
experiment_file_name    /Users/lsouza/nta/results/neurips_debug_test10...
trial_time                                                        19.7118
mean_epoch_time                                                  0.657061
batch_norm                                                           True
data_dir                                        /home/ubuntu/nta/datasets
dataset_name                                                        MNIST
debug_sparse                                                         True
debug_weights                                                        True
device                                                               cuda
hebbian_grow                                                        False
hebbian_prune_perc                                                    0.2
hidden_sizes                                                          100
input_size                                                            784
learning_rate                                                         0.1
lr_gamma                                                              0.1
lr_milestones                                                          60
lr_scheduler                                                  MultiStepLR
model                                                        DSNNMixedHeb
momentum                                                              0.9
network                                                            MLPHeb
num_classes                                                            10
on_perc                                                               0.2
optim_alg                                                             SGD
pruning_early_stop                                                      0
test_noise                                                          False
use_kwinners                                                        False
weight_decay                                                       0.0001
weight_prune_perc                                                       0
Name: 1, dtype: object

In [10]:
df.groupby('model')['model'].count()


Out[10]:
model
DSNNMixedHeb    217
Name: model, dtype: int64

Analysis

Experiment Details

base_exp_config = dict(
    device="cuda",
    # dataset related
    dataset_name="MNIST",
    data_dir=os.path.expanduser("~/nta/datasets"),
    input_size=784,
    num_classes=10,
    # network related
    network="MLPHeb",
    hidden_sizes=[100, 100, 100],
    batch_norm=True,
    use_kwinners=tune.grid_search([True, False]),
    # model related
    model="DSNNMixedHeb",
    on_perc=0.2,
    optim_alg="SGD",
    momentum=0.9,
    weight_decay=1e-4,
    learning_rate=0.1,
    lr_scheduler="MultiStepLR",
    lr_milestones=[30, 60, 90],
    lr_gamma=0.1,
    # sparse related
    hebbian_prune_perc=tune.grid_search([0, 0.1, 0.2, 0.3, 0.4, 0.5]),
    pruning_early_stop=0,
    hebbian_grow=tune.grid_search([True, False]),
    # additional validation
    test_noise=False,
    # debugging
    debug_weights=True,
    debug_sparse=True,
    stop={"training_iteration": 30},
)
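
As a rough sanity check on the number of rows loaded above, assuming the sweep actually run here used pruning fractions 0 through 1.0 in steps of 0.2 for both pruning types, hebbian_grow in {True, False}, and 3 samples per configuration (the sample count is an assumption inferred from the group counts below, not shown in the pasted config):

hebbian_prune_grid = [0, 0.2, 0.4, 0.6, 0.8, 1.0]  # values seen in the results tables
weight_prune_grid = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
hebbian_grow_grid = [True, False]
samples_per_config = 3  # assumed
print(len(hebbian_prune_grid) * len(weight_prune_grid)
      * len(hebbian_grow_grid) * samples_per_config)  # 216 complete trials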

In [11]:
# Did any trials fail?
df[df["epochs"]<30]["epochs"].count()


Out[11]:
1

In [12]:
# Removing failed or incomplete trials
df_origin = df.copy()
df = df_origin[df_origin["epochs"]>=30]
df.shape


Out[12]:
(216, 42)

In [13]:
# which ones failed?
# failed, or still ongoing?
df_origin['failed'] = df_origin["epochs"]<30
df_origin[df_origin['failed']]['epochs']


Out[13]:
108    1
Name: epochs, dtype: int64

In [14]:
# helper functions
def mean_and_std(s):
    # format a series as "mean ± std"
    return "{:.3f} ± {:.3f}".format(s.mean(), s.std())

def round_mean(s):
    # mean rounded to the nearest integer (used for epoch counts)
    return "{:.0f}".format(round(s.mean()))

stats = ['min', 'max', 'mean', 'std']

def agg(columns, filter=None, round=3):
    # group by the given columns (optionally restricted by a boolean filter)
    # and summarize best validation accuracy and the epoch at which it occurred
    if filter is None:
        return (df.groupby(columns)
             .agg({'val_acc_max_epoch': round_mean,
                   'val_acc_max': stats,
                   'model': ['count']})).round(round)
    else:
        return (df[filter].groupby(columns)
             .agg({'val_acc_max_epoch': round_mean,
                   'val_acc_max': stats,
                   'model': ['count']})).round(round)
What is the impact of removing the connections with the highest coactivation?

In [15]:
# trials using random growth (hebbian_grow == False)
random_grow = (df['hebbian_grow'] == False)

In [16]:
agg(['hebbian_prune_perc'], random_grow)


Out[16]:
val_acc_max_epoch val_acc_max model
round_mean min max mean std count
hebbian_prune_perc
0.0 21 0.964 0.982 0.977 0.006 18
0.2 12 0.961 0.980 0.973 0.007 18
0.4 15 0.959 0.980 0.973 0.008 18
0.6 17 0.957 0.980 0.973 0.009 18
0.8 16 0.960 0.981 0.974 0.008 18
1.0 15 0.964 0.982 0.975 0.008 18

In [17]:
agg(['weight_prune_perc'], random_grow)


Out[17]:
val_acc_max_epoch val_acc_max model
round_mean min max mean std count
weight_prune_perc
0.0 4 0.957 0.978 0.966 0.006 18
0.2 22 0.976 0.982 0.979 0.002 18
0.4 21 0.977 0.982 0.979 0.001 18
0.6 23 0.978 0.981 0.979 0.001 18
0.8 23 0.976 0.981 0.979 0.001 18
1.0 3 0.959 0.967 0.963 0.002 18
What is the optimal combination of both pruning types?

In [18]:
pd.pivot_table(df[random_grow], 
              index='hebbian_prune_perc',
              columns='weight_prune_perc',
              values='val_acc_max',
              aggfunc=mean_and_std)


Out[18]:
weight_prune_perc 0.0 0.2 0.4 0.6 0.8 1.0
hebbian_prune_perc
0.0 0.977 ± 0.001 0.981 ± 0.001 0.981 ± 0.001 0.980 ± 0.001 0.980 ± 0.000 0.965 ± 0.001
0.2 0.964 ± 0.001 0.978 ± 0.002 0.978 ± 0.000 0.979 ± 0.001 0.977 ± 0.001 0.963 ± 0.001
0.4 0.963 ± 0.001 0.978 ± 0.001 0.979 ± 0.001 0.979 ± 0.001 0.979 ± 0.001 0.962 ± 0.003
0.6 0.960 ± 0.003 0.979 ± 0.000 0.979 ± 0.002 0.980 ± 0.001 0.979 ± 0.001 0.963 ± 0.002
0.8 0.965 ± 0.000 0.980 ± 0.001 0.980 ± 0.001 0.979 ± 0.000 0.979 ± 0.001 0.961 ± 0.001
1.0 0.964 ± 0.000 0.981 ± 0.001 0.980 ± 0.001 0.980 ± 0.001 0.980 ± 0.001 0.965 ± 0.002
  • The inverted Hebbian pruning logic, with weight pruning set to 0, clearly hurts model performance.
  • Acc when pruning is fully applied at each step is ~0.965 {(1,0), (0,1), (1,1)}, where tuples are (hebbian_prune_perc, weight_prune_perc)
  • Acc with no pruning is 0.977 {(0,0)}
  • Best acc is still achieved with magnitude-based (weight) pruning only {(0, 0.2), (0, 0.4)}
  • Inverted Hebbian pruning alone (removing the connections with the highest coactivation) is harmful to the model, with acc equal to or worse than full pruning, even at a pruning fraction as low as 0.2
What is the impact of adding the connections with the lowest coactivation?

In [19]:
# with and without hebbian grow
agg('hebbian_grow')


Out[19]:
val_acc_max_epoch val_acc_max model
round_mean min max mean std count
hebbian_grow
False 16 0.957 0.982 0.974 0.008 108
True 13 0.956 0.979 0.972 0.007 108

In [20]:
# with and without hebbian grow
pd.pivot_table(df, 
              index=['hebbian_grow', 'hebbian_prune_perc'],
              columns='weight_prune_perc',
              values='val_acc_max',
              aggfunc=mean_and_std)


Out[20]:
weight_prune_perc 0.0 0.2 0.4 0.6 0.8 1.0
hebbian_grow hebbian_prune_perc
False 0.0 0.977 ± 0.001 0.981 ± 0.001 0.981 ± 0.001 0.980 ± 0.001 0.980 ± 0.000 0.965 ± 0.001
0.2 0.964 ± 0.001 0.978 ± 0.002 0.978 ± 0.000 0.979 ± 0.001 0.977 ± 0.001 0.963 ± 0.001
0.4 0.963 ± 0.001 0.978 ± 0.001 0.979 ± 0.001 0.979 ± 0.001 0.979 ± 0.001 0.962 ± 0.003
0.6 0.960 ± 0.003 0.979 ± 0.000 0.979 ± 0.002 0.980 ± 0.001 0.979 ± 0.001 0.963 ± 0.002
0.8 0.965 ± 0.000 0.980 ± 0.001 0.980 ± 0.001 0.979 ± 0.000 0.979 ± 0.001 0.961 ± 0.001
1.0 0.964 ± 0.000 0.981 ± 0.001 0.980 ± 0.001 0.980 ± 0.001 0.980 ± 0.001 0.965 ± 0.002
True 0.0 0.976 ± 0.001 0.978 ± 0.001 0.979 ± 0.001 0.975 ± 0.001 0.974 ± 0.001 0.963 ± 0.002
0.2 0.961 ± 0.001 0.977 ± 0.001 0.978 ± 0.001 0.977 ± 0.001 0.975 ± 0.000 0.964 ± 0.003
0.4 0.962 ± 0.002 0.978 ± 0.001 0.976 ± 0.000 0.976 ± 0.000 0.974 ± 0.001 0.962 ± 0.001
0.6 0.963 ± 0.001 0.977 ± 0.001 0.977 ± 0.000 0.977 ± 0.001 0.974 ± 0.001 0.961 ± 0.002
0.8 0.963 ± 0.003 0.977 ± 0.000 0.976 ± 0.001 0.976 ± 0.001 0.974 ± 0.001 0.961 ± 0.004
1.0 0.958 ± 0.002 0.977 ± 0.001 0.977 ± 0.001 0.976 ± 0.001 0.972 ± 0.001 0.961 ± 0.002
  • Inverted Hebbian growth (adding the connections with the lowest coactivation) reduces acc by ~0.002 on average

In [ ]: