Experiment:

Compare pruning by Hebbian learning and by weight magnitude.

Motivation:

Verify whether pruning by Hebbian learning outperforms pruning by weight magnitude.

Conclusions:

  • No pruning, (hebbian_prune_perc, weight_prune_perc) = (0, 0), reaches a validation accuracy of 0.976
  • Pruning all connections at every epoch, (1, 0), reaches 0.964
  • The best-performing model still uses no Hebbian pruning, with weight pruning set to 0.2 (0.981)
  • Pruning by Hebbian learning alone decreases accuracy
  • Combining Hebbian and weight-magnitude pruning is not an improvement over weight-magnitude pruning alone (a toy sketch of the two criteria follows after this list)
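The toy sketch below (a hypothetical illustration, not the DSNNMixedHeb implementation used in this experiment) shows the two pruning criteria being compared: dropping the connections with the smallest weight magnitude versus dropping the connections with the weakest Hebbian coactivation.

In [ ]:
# toy illustration of the two pruning criteria (hypothetical, not the model's actual code)
import numpy as np

rng = np.random.RandomState(0)
weights = rng.randn(8, 8)                 # dense weights of one layer
coactivations = np.abs(rng.randn(8, 8))   # stand-in for accumulated Hebbian coactivations

def prune_mask(scores, prune_perc):
    # keep roughly the (1 - prune_perc) highest-scoring connections
    k = int(round(prune_perc * scores.size))
    if k == 0:
        return np.ones_like(scores, dtype=bool)
    threshold = np.sort(scores, axis=None)[k - 1]
    return scores > threshold

# weight-magnitude pruning: drop the smallest-magnitude weights
weights_magnitude_pruned = weights * prune_mask(np.abs(weights), prune_perc=0.2)
# Hebbian pruning: drop the least coactive connections, regardless of weight size
weights_hebbian_pruned = weights * prune_mask(coactivations, prune_perc=0.2)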

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys
sys.path.append("../../")

In [26]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import glob
import tabulate
import pprint
import click
import numpy as np
import pandas as pd
from ray.tune.commands import *
from dynamic_sparse.common.browser import *

Load and check data


In [27]:
exps = ['neurips_debug_test6', ]
paths = [os.path.expanduser("~/nta/results/{}".format(e)) for e in exps]
df = load_many(paths)

In [28]:
df.head(5)


Out[28]:
Experiment Name train_acc_max train_acc_max_epoch train_acc_min train_acc_min_epoch train_acc_median train_acc_last val_acc_max val_acc_max_epoch val_acc_min ... momentum network num_classes on_perc optim_alg pruning_early_stop test_noise use_kwinners weight_decay weight_prune_perc
0 0_hebbian_prune_perc=None,weight_prune_perc=None 0.988333 28 0.923450 0 0.985358 0.988000 0.9768 29 0.9614 ... 0.9 MLPHeb 10 0.2 SGD 0 False False 0.0001 NaN
1 1_hebbian_prune_perc=0.2,weight_prune_perc=None 0.974583 27 0.924417 0 0.970733 0.974483 0.9753 5 0.9609 ... 0.9 MLPHeb 10 0.2 SGD 0 False False 0.0001 NaN
2 2_hebbian_prune_perc=0.4,weight_prune_perc=None 0.968250 25 0.926067 0 0.963083 0.967533 0.9710 20 0.9623 ... 0.9 MLPHeb 10 0.2 SGD 0 False False 0.0001 NaN
3 3_hebbian_prune_perc=0.6,weight_prune_perc=None 0.957933 23 0.926083 0 0.952508 0.957533 0.9673 23 0.9589 ... 0.9 MLPHeb 10 0.2 SGD 0 False False 0.0001 NaN
4 4_hebbian_prune_perc=0.8,weight_prune_perc=None 0.943033 22 0.923467 2 0.936533 0.935983 0.9665 9 0.9514 ... 0.9 MLPHeb 10 0.2 SGD 0 False False 0.0001 NaN

5 rows × 42 columns


In [30]:
df['on_perc'].unique()


Out[30]:
array([0.2])

In [6]:
# replace NaN pruning percentages with 0 (no pruning) so they can be grouped and filtered
df['hebbian_prune_perc'] = df['hebbian_prune_perc'].replace(np.nan, 0.0, regex=True)
df['weight_prune_perc'] = df['weight_prune_perc'].replace(np.nan, 0.0, regex=True)

In [7]:
df.columns


Out[7]:
Index(['Experiment Name', 'train_acc_max', 'train_acc_max_epoch',
       'train_acc_min', 'train_acc_min_epoch', 'train_acc_median',
       'train_acc_last', 'val_acc_max', 'val_acc_max_epoch', 'val_acc_min',
       'val_acc_min_epoch', 'val_acc_median', 'val_acc_last', 'epochs',
       'experiment_file_name', 'trial_time', 'mean_epoch_time', 'batch_norm',
       'data_dir', 'dataset_name', 'debug_sparse', 'debug_weights', 'device',
       'hebbian_grow', 'hebbian_prune_perc', 'hidden_sizes', 'input_size',
       'learning_rate', 'lr_gamma', 'lr_milestones', 'lr_scheduler', 'model',
       'momentum', 'network', 'num_classes', 'on_perc', 'optim_alg',
       'pruning_early_stop', 'test_noise', 'use_kwinners', 'weight_decay',
       'weight_prune_perc'],
      dtype='object')

In [8]:
df.shape


Out[8]:
(108, 42)

In [9]:
df.iloc[1]


Out[9]:
Experiment Name           1_hebbian_prune_perc=0.2,weight_prune_perc=None
train_acc_max                                                    0.974583
train_acc_max_epoch                                                    27
train_acc_min                                                    0.924417
train_acc_min_epoch                                                     0
train_acc_median                                                 0.970733
train_acc_last                                                   0.974483
val_acc_max                                                        0.9753
val_acc_max_epoch                                                       5
val_acc_min                                                        0.9609
val_acc_min_epoch                                                       0
val_acc_median                                                     0.9694
val_acc_last                                                       0.9672
epochs                                                                 30
experiment_file_name    /Users/lsouza/nta/results/neurips_debug_test6/...
trial_time                                                        19.5668
mean_epoch_time                                                  0.652226
batch_norm                                                           True
data_dir                                        /home/ubuntu/nta/datasets
dataset_name                                                        MNIST
debug_sparse                                                         True
debug_weights                                                        True
device                                                               cuda
hebbian_grow                                                        False
hebbian_prune_perc                                                    0.2
hidden_sizes                                                          100
input_size                                                            784
learning_rate                                                         0.1
lr_gamma                                                              0.1
lr_milestones                                                          60
lr_scheduler                                                  MultiStepLR
model                                                        DSNNMixedHeb
momentum                                                              0.9
network                                                            MLPHeb
num_classes                                                            10
on_perc                                                               0.2
optim_alg                                                             SGD
pruning_early_stop                                                      0
test_noise                                                          False
use_kwinners                                                        False
weight_decay                                                       0.0001
weight_prune_perc                                                       0
Name: 1, dtype: object

In [10]:
df.groupby('model')['model'].count()


Out[10]:
model
DSNNMixedHeb    108
Name: model, dtype: int64

Analysis

Experiment Details

base_exp_config = dict(
    device="cuda",
    # dataset related
    dataset_name="MNIST",
    data_dir=os.path.expanduser("~/nta/datasets"),
    input_size=784,
    num_classes=10,
    # network related
    network="MLPHeb",
    hidden_sizes=[100, 100, 100],
    batch_norm=True,
    use_kwinners=tune.grid_search([True, False]),
    # model related
    model="DSNNMixedHeb",
    on_perc=0.2,
    optim_alg="SGD",
    momentum=0.9,
    weight_decay=1e-4,
    learning_rate=0.1,
    lr_scheduler="MultiStepLR",
    lr_milestones=[30, 60, 90],
    lr_gamma=0.1,
    # sparse related
    hebbian_prune_perc=tune.grid_search([0, 0.1, 0.2, 0.3, 0.4, 0.5]),
    pruning_early_stop=0,
    hebbian_grow=tune.grid_search([True, False]),
    # additional validation
    test_noise=False,
    # debugging
    debug_weights=True,
    debug_sparse=True,
    stop={"training_iteration": 30},
)
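As a quick consistency check (a hypothetical follow-up cell, not part of the original run), the 108 rows loaded above should correspond to a fixed number of samples for each combination of the two pruning percentages seen in the tables below:

In [ ]:
# count trials per (hebbian_prune_perc, weight_prune_perc) combination;
# 108 rows over a 6 x 6 grid would mean 3 samples per configuration
df.groupby(['hebbian_prune_perc', 'weight_prune_perc'])['model'].count().unique()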

In [11]:
# Did any trials fail?
df[df["epochs"]<30]["epochs"].count()


Out[11]:
0

In [12]:
# Removing failed or incomplete trials
df_origin = df.copy()
df = df_origin[df_origin["epochs"]>=30]
df.shape


Out[12]:
(108, 42)

In [13]:
# which trials failed, or are still ongoing?
df_origin['failed'] = df_origin["epochs"]<30
df_origin[df_origin['failed']]['epochs']


Out[13]:
Series([], Name: epochs, dtype: int64)

In [14]:
# helper functions
def mean_and_std(s):
    """Format a series as 'mean ± std'."""
    return "{:.3f} ± {:.3f}".format(s.mean(), s.std())

def round_mean(s):
    """Mean of a series, rounded to the nearest integer."""
    return "{:.0f}".format(round(s.mean()))

stats = ['min', 'max', 'mean', 'std']

def agg(columns, filter=None, round=3):
    """Group by `columns` (optionally filtered) and aggregate validation accuracy stats."""
    if filter is None:
        return (df.groupby(columns)
             .agg({'val_acc_max_epoch': round_mean,
                   'val_acc_max': stats,
                   'model': ['count']})).round(round)
    else:
        return (df[filter].groupby(columns)
             .agg({'val_acc_max_epoch': round_mean,
                   'val_acc_max': stats,
                   'model': ['count']})).round(round)
What are the optimal levels of Hebbian and weight pruning?

In [15]:
# ignore experiments where weight_prune_perc == 1; those results are not reliable
filter = (df['weight_prune_perc'] < 1)

In [16]:
agg(['hebbian_prune_perc'], filter)


Out[16]:
val_acc_max_epoch val_acc_max model
round_mean min max mean std count
hebbian_prune_perc
0.0 23 0.976 0.982 0.979 0.002 15
0.2 17 0.973 0.982 0.978 0.003 15
0.4 21 0.970 0.981 0.977 0.004 15
0.6 20 0.967 0.982 0.977 0.004 15
0.8 18 0.964 0.981 0.977 0.006 15
1.0 19 0.963 0.982 0.977 0.007 15
  • No relevant difference in val_acc_max across hebbian_prune_perc levels

In [18]:
filter = (df['weight_prune_perc'] < 1)
agg(['weight_prune_perc'], filter)


Out[18]:
val_acc_max_epoch val_acc_max model
round_mean min max mean std count
weight_prune_perc
0.0 13 0.963 0.977 0.970 0.005 18
0.2 23 0.978 0.982 0.980 0.001 18
0.4 22 0.977 0.982 0.980 0.001 18
0.6 20 0.978 0.981 0.979 0.001 18
0.8 20 0.976 0.980 0.978 0.001 18
  • The optimal weight-pruning level is between 0.2 and 0.4 (consistent with previous experiments and with the SET paper, where 0.3 is reported as optimal)

In [24]:
# weight-magnitude pruning only (no Hebbian pruning), up to weight_prune_perc of 0.4
magonly = (df['hebbian_prune_perc'] == 0.0) & (df['weight_prune_perc'] < 0.6)
agg(['weight_prune_perc'], magonly)


Out[24]:
val_acc_max_epoch val_acc_max model
round_mean min max mean std count
weight_prune_perc
0.0 22 0.976 0.977 0.976 0.000 3
0.2 25 0.979 0.982 0.981 0.001 3
0.4 24 0.980 0.981 0.980 0.000 3
What is the optimal combination of both?

In [20]:
pd.pivot_table(df[filter], 
              index='hebbian_prune_perc',
              columns='weight_prune_perc',
              values='val_acc_max',
              aggfunc=mean_and_std)


Out[20]:
weight_prune_perc 0.0 0.2 0.4 0.6 0.8
hebbian_prune_perc
0.0 0.976 ± 0.000 0.981 ± 0.001 0.980 ± 0.000 0.980 ± 0.001 0.979 ± 0.001
0.2 0.974 ± 0.001 0.979 ± 0.001 0.979 ± 0.003 0.979 ± 0.002 0.977 ± 0.001
0.4 0.971 ± 0.001 0.980 ± 0.001 0.979 ± 0.001 0.979 ± 0.001 0.978 ± 0.001
0.6 0.969 ± 0.001 0.979 ± 0.001 0.980 ± 0.001 0.979 ± 0.001 0.978 ± 0.001
0.8 0.966 ± 0.001 0.980 ± 0.001 0.980 ± 0.002 0.980 ± 0.001 0.978 ± 0.000
1.0 0.964 ± 0.001 0.981 ± 0.001 0.981 ± 0.001 0.980 ± 0.001 0.980 ± 0.001

In [25]:
pd.pivot_table(df[filter], 
              index='hebbian_prune_perc',
              columns='weight_prune_perc',
              values='val_acc_last',
              aggfunc=mean_and_std)


Out[25]:
weight_prune_perc 0.0 0.2 0.4 0.6 0.8
hebbian_prune_perc
0.0 0.974 ± 0.003 0.980 ± 0.001 0.977 ± 0.002 0.978 ± 0.001 0.976 ± 0.001
0.2 0.969 ± 0.002 0.978 ± 0.002 0.976 ± 0.001 0.977 ± 0.001 0.975 ± 0.001
0.4 0.967 ± 0.000 0.977 ± 0.002 0.978 ± 0.002 0.975 ± 0.002 0.977 ± 0.003
0.6 0.967 ± 0.003 0.977 ± 0.001 0.978 ± 0.001 0.976 ± 0.002 0.975 ± 0.001
0.8 0.961 ± 0.000 0.979 ± 0.001 0.978 ± 0.001 0.978 ± 0.002 0.976 ± 0.002
1.0 0.954 ± 0.001 0.979 ± 0.000 0.979 ± 0.001 0.978 ± 0.001 0.976 ± 0.002
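To read the best-performing combination directly off the val_acc_max pivot above, a small follow-up cell (hypothetical, reusing the df and filter already defined in this notebook) could locate the cell with the highest mean:

In [ ]:
# find the (hebbian_prune_perc, weight_prune_perc) pair with the highest mean val_acc_max
pt = pd.pivot_table(df[filter],
                    index='hebbian_prune_perc',
                    columns='weight_prune_perc',
                    values='val_acc_max',
                    aggfunc='mean')
best = pt.stack().idxmax()   # (hebbian_prune_perc, weight_prune_perc)
print(best, pt.stack().max())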

In [21]:
df.shape


Out[21]:
(108, 42)

Conclusions:

  • No pruning, (hebbian_prune_perc, weight_prune_perc) = (0, 0), reaches a validation accuracy of 0.976
  • Pruning all connections at every epoch, (1, 0), reaches 0.964
  • The best-performing model still uses no Hebbian pruning, with weight pruning set to 0.2 (0.981)
  • Pruning by Hebbian learning alone decreases accuracy
  • Combining Hebbian and weight-magnitude pruning is not an improvement over weight-magnitude pruning alone

In [ ]: