From ~/Downloads/github/pylearn2/pylearn2/scripts/tutorials/stacked_autoencoders


In [1]:
import os
os.environ['PYLEARN2_DATA_PATH'] = '/Users/udi/Downloads/lisa/data'

Stacked Autoencoders

by Mehdi Mirza

Introduction

This notebook shows how to perform layer-wise pre-training using denoising autoencoders (DAEs), and then stack the pre-trained layers to form a multilayer perceptron (MLP) that can be fine-tuned with supervised training. For more detail, see the Theano tutorial on training DAEs and the companion tutorial that covers the stacked version.

The methods used here can easily be adapted to other models such as contractive auto-encoders (CAEs) or restricted Boltzmann machines (RBMs) with only small modifications.
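
To make the denoising objective concrete, here is a minimal NumPy sketch of one forward pass of a DAE. It is an illustration under stated assumptions, not Pylearn2's implementation: the helper names corrupt and dae_reconstruct are hypothetical, the parameter names W, hb, Wprime and vb are borrowed from the training log further down, and the loss (squared error summed over pixels and averaged over the batch) is one common choice.

```python
import numpy as np

rng = np.random.RandomState(0)

def corrupt(x, corruption_level, rng):
    """Masking noise: zero each input with probability corruption_level."""
    mask = rng.binomial(n=1, p=1.0 - corruption_level, size=x.shape)
    return x * mask

def dae_reconstruct(x, W, hb, Wprime, vb, corruption_level, rng):
    """Encode a corrupted copy of x with tanh, then decode linearly."""
    x_tilde = corrupt(x, corruption_level, rng)
    h = np.tanh(x_tilde.dot(W) + hb)
    return h.dot(Wprime) + vb

nvis, nhid = 784, 500
x = rng.uniform(0.0, 1.0, size=(4, nvis))             # stand-in for a batch of images
W = rng.uniform(-0.05, 0.05, size=(nvis, nhid))       # encoder weights
Wprime = rng.uniform(-0.05, 0.05, size=(nhid, nvis))  # decoder weights (untied)
hb = np.zeros(nhid)                                   # hidden bias
vb = np.zeros(nvis)                                   # visible bias

x_hat = dae_reconstruct(x, W, hb, Wprime, vb, 0.2, rng)
# The error is measured against the clean input, not the corrupted one.
loss = np.mean(np.sum((x - x_hat) ** 2, axis=1))
print(loss)
```

The key point is that the model sees the corrupted input but is scored on how well it recovers the clean one.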

First layer

The first layer and its training algorithm are defined in the file dae_l1.yaml. Here we load the model and set some of its hyperparameters.
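
The hyperparameters are filled in with Python's dictionary-based % string formatting. The sketch below shows the mechanism on a three-line stand-in template; the placeholder names and format codes are assumptions chosen to match the keys used in this notebook, and the real dae_l1.yaml is longer.

```python
# Hypothetical stand-in for a few lines of the YAML template.
template = (
    "stop: %(train_stop)i\n"
    "nhid : %(nhid)i\n"
    'save_path: "%(save_path)s/dae_l1.pkl"\n'
)

# Each %(name)... placeholder is looked up by key in this dictionary.
hyper_params = {'train_stop': 50000, 'nhid': 500, 'save_path': '.'}

filled = template % hyper_params
print(filled)
```

A key that appears in the template but not in the dictionary raises a KeyError, which catches misspelled hyperparameter names early.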


In [2]:
# Read the YAML template for the first layer.
with open('dae_l1.yaml', 'r') as f:
    layer1_yaml = f.read()
# Values substituted into the template's named placeholders.
hyper_params_l1 = {'train_stop' : 50000,
                   'batch_size' : 100,
                   'monitoring_batches' : 5,
                   'nhid' : 500,
                   'max_epochs' : 10,
                   'save_path' : '.'}
layer1_yaml = layer1_yaml % hyper_params_l1
print(layer1_yaml)


!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        # TODO: the one_hot: 1 is only necessary because one_hot: 0 is
        # broken, remove it after one_hot: 0 is fixed.
        one_hot: 1,
        start: 0,
        stop: 50000
    },
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis : 784,
        nhid : 500,
        irange : 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .2,
        },
        act_enc: "tanh",
        act_dec: null,    # Linear activation on the decoder side.
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate : 1e-3,
        batch_size : 100,
        monitoring_batches : 5,
        monitoring_dataset : *train,
        cost : !obj:pylearn2.costs.autoencoder.MeanSquaredReconstructionError {},
        termination_criterion : !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: 10,
        },
    },
    save_path: "./dae_l1.pkl",
    save_freq: 1
}
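
Each !obj: tag in the YAML above names a Python class by its dotted path, and the mapping that follows supplies the constructor's keyword arguments. The sketch below illustrates that idea with a hypothetical instantiate helper and a standard-library class; it is not Pylearn2's parser, which also handles nesting, anchors such as &train, and !pkl: tags.

```python
import importlib

def instantiate(dotted_path, **kwargs):
    """Import the module part of dotted_path and call the named class with kwargs."""
    module_name, _, attr = dotted_path.rpartition('.')
    module = importlib.import_module(module_name)
    cls = getattr(module, attr)
    return cls(**kwargs)

# Roughly analogous to: !obj:fractions.Fraction { numerator: 3, denominator: 4 }
frac = instantiate('fractions.Fraction', numerator=3, denominator=4)
print(frac)
```

Read this way, the printed configuration is a nested constructor call: the Train object receives a dataset, a model and an algorithm, each built from its own tag.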

Now we can train the model using the YAML string in the same way as the previous tutorials:


In [3]:
from pylearn2.config import yaml_parse
train = yaml_parse.load(layer1_yaml)
train.main_loop()


/Users/udi/Downloads/github/pylearn2/pylearn2/utils/call_check.py:99: UserWarning: the `one_hot` parameter is deprecated. To get one-hot encoded targets, request that they live in `VectorSpace` through the `data_specs` parameter of MNIST's iterator method. `one_hot` will be removed on or after September 20, 2014.
  return to_call(**kwargs)
/Users/udi/anaconda/lib/python2.7/site-packages/theano/sandbox/rng_mrg.py:1188: UserWarning: MRG_RandomStreams Can't determine #streams from size (Shape.0), guessing 60*256
  nstreams = self.n_streams(size)
Parameter and initial learning rate summary:
	vb: 0.001
	hb: 0.001
	W: 0.001
	Wprime: 0.001
Compiling sgd_update...
/Users/udi/Downloads/github/pylearn2/pylearn2/models/model.py:72: UserWarning: The <class 'pylearn2.models.autoencoder.DenoisingAutoencoder'> Model subclass seems not to call the Model constructor. This behavior may be considered an error on or after 2014-11-01.
  warnings.warn("The " + str(type(self)) + " Model subclass "
Compiling sgd_update done. Time elapsed: 10.284412 seconds
compiling begin_record_entry...
compiling begin_record_entry done. Time elapsed: 0.543545 seconds
Monitored channels: 
	learning_rate
	objective
	total_seconds_last_epoch
	training_seconds_this_epoch
Compiling accum...
graph size: 19
Compiling accum done. Time elapsed: 3.985918 seconds
Monitoring step:
	Epochs seen: 0
	Batches seen: 0
	Examples seen: 0
	learning_rate: 0.001
	objective: 89.189888493
	total_seconds_last_epoch: 0.0
	training_seconds_this_epoch: 0.0
Time this epoch: 7.978305 seconds
Monitoring step:
	Epochs seen: 1
	Batches seen: 500
	Examples seen: 50000
	learning_rate: 0.001
	objective: 30.2296487294
	total_seconds_last_epoch: 0.0
	training_seconds_this_epoch: 7.978305
Saving to ./dae_l1.pkl...
Saving to ./dae_l1.pkl done. Time elapsed: 0.788898 seconds
Time this epoch: 6.294500 seconds
Monitoring step:
	Epochs seen: 2
	Batches seen: 1000
	Examples seen: 100000
	learning_rate: 0.001
	objective: 22.9704822504
	total_seconds_last_epoch: 12.095553
	training_seconds_this_epoch: 6.2945
Saving to ./dae_l1.pkl...
Saving to ./dae_l1.pkl done. Time elapsed: 0.606921 seconds
Time this epoch: 5.781151 seconds
Monitoring step:
	Epochs seen: 3
	Batches seen: 1500
	Examples seen: 150000
	learning_rate: 0.001
	objective: 19.3272344275
	total_seconds_last_epoch: 10.68441
	training_seconds_this_epoch: 5.781151
Saving to ./dae_l1.pkl...
Saving to ./dae_l1.pkl done. Time elapsed: 0.587990 seconds
Time this epoch: 5.826722 seconds
Monitoring step:
	Epochs seen: 4
	Batches seen: 2000
	Examples seen: 200000
	learning_rate: 0.001
	objective: 17.0729861682
	total_seconds_last_epoch: 9.526738
	training_seconds_this_epoch: 5.826722
Saving to ./dae_l1.pkl...
Saving to ./dae_l1.pkl done. Time elapsed: 0.631785 seconds
Time this epoch: 5.467358 seconds
Monitoring step:
	Epochs seen: 5
	Batches seen: 2500
	Examples seen: 250000
	learning_rate: 0.001
	objective: 15.5457219544
	total_seconds_last_epoch: 9.541899
	training_seconds_this_epoch: 5.467358
Saving to ./dae_l1.pkl...
Saving to ./dae_l1.pkl done. Time elapsed: 0.573065 seconds
Time this epoch: 5.347703 seconds
Monitoring step:
	Epochs seen: 6
	Batches seen: 3000
	Examples seen: 300000
	learning_rate: 0.001
	objective: 14.4348570196
	total_seconds_last_epoch: 9.192685
	training_seconds_this_epoch: 5.347703
Saving to ./dae_l1.pkl...
Saving to ./dae_l1.pkl done. Time elapsed: 0.589890 seconds
Time this epoch: 5.440438 seconds
Monitoring step:
	Epochs seen: 7
	Batches seen: 3500
	Examples seen: 350000
	learning_rate: 0.001
	objective: 13.5968633264
	total_seconds_last_epoch: 9.076605
	training_seconds_this_epoch: 5.440438
Saving to ./dae_l1.pkl...
Saving to ./dae_l1.pkl done. Time elapsed: 0.564320 seconds
Time this epoch: 5.469406 seconds
Monitoring step:
	Epochs seen: 8
	Batches seen: 4000
	Examples seen: 400000
	learning_rate: 0.001
	objective: 12.9241679727
	total_seconds_last_epoch: 9.071418
	training_seconds_this_epoch: 5.469406
Saving to ./dae_l1.pkl...
Saving to ./dae_l1.pkl done. Time elapsed: 0.579426 seconds
Time this epoch: 5.066210 seconds
Monitoring step:
	Epochs seen: 9
	Batches seen: 4500
	Examples seen: 450000
	learning_rate: 0.001
	objective: 12.3858239701
	total_seconds_last_epoch: 9.461765
	training_seconds_this_epoch: 5.06621
Saving to ./dae_l1.pkl...
Saving to ./dae_l1.pkl done. Time elapsed: 0.560447 seconds
Time this epoch: 5.325952 seconds
Monitoring step:
	Epochs seen: 10
	Batches seen: 5000
	Examples seen: 500000
	learning_rate: 0.001
	objective: 11.9513600238
	total_seconds_last_epoch: 8.866708
	training_seconds_this_epoch: 5.325952
Saving to ./dae_l1.pkl...
Saving to ./dae_l1.pkl done. Time elapsed: 0.557787 seconds
Saving to ./dae_l1.pkl...
Saving to ./dae_l1.pkl done. Time elapsed: 0.558132 seconds

Second layer

The second layer takes the output of the first layer as its input. Hence we must first apply the first layer's transformations to the raw data using datasets.transformer_dataset.TransformerDataset. This class takes two arguments:

  • raw: the raw data
  • transformer: a Pylearn2 block that transforms the raw data; in our case, the trained first-layer model saved to dae_l1.pkl in the previous step
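
The idea behind these two arguments can be sketched in NumPy as follows. This is an illustration only: EncoderBlock and TransformedData are hypothetical stand-ins, not Pylearn2 classes, and the random weights take the place of the trained parameters stored in dae_l1.pkl.

```python
import numpy as np

class EncoderBlock(object):
    """Hypothetical stand-in for the trained first layer (tanh encoder)."""
    def __init__(self, W, hb):
        self.W = W
        self.hb = hb

    def __call__(self, x):
        return np.tanh(x.dot(self.W) + self.hb)

class TransformedData(object):
    """Hypothetical wrapper: serves raw batches after passing them through a block."""
    def __init__(self, raw, transformer):
        self.raw = raw
        self.transformer = transformer

    def iter_batches(self, batch_size):
        for start in range(0, len(self.raw), batch_size):
            yield self.transformer(self.raw[start:start + batch_size])

rng = np.random.RandomState(0)
raw = rng.uniform(0.0, 1.0, size=(300, 784))   # stand-in for raw MNIST pixels
layer1 = EncoderBlock(rng.uniform(-0.05, 0.05, size=(784, 500)), np.zeros(500))

data_l2 = TransformedData(raw, layer1)
first_batch = next(data_l2.iter_batches(100))
print(first_batch.shape)
```

The second layer therefore never sees pixels. Its inputs are the 500-dimensional hidden codes of the first layer, which is why its nvis is set to the first layer's nhid.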

To train the second layer, we load the YAML file as before and set the hyperparameters before starting the training loop.


In [4]:
# Read the YAML template for the second layer.
with open('dae_l2.yaml', 'r') as f:
    layer2_yaml = f.read()
# The second layer's input size equals the first layer's hidden size.
hyper_params_l2 = {'train_stop' : 50000,
                   'batch_size' : 100,
                   'monitoring_batches' : 5,
                   'nvis' : hyper_params_l1['nhid'],
                   'nhid' : 500,
                   'max_epochs' : 10,
                   'save_path' : '.'}
layer2_yaml = layer2_yaml % hyper_params_l2
print(layer2_yaml)


!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.transformer_dataset.TransformerDataset {
        raw: !obj:pylearn2.datasets.mnist.MNIST {
            which_set: 'train',
            # TODO: the one_hot: 1 is only necessary because one_hot: 0 is
            # broken, remove it after one_hot: 0 is fixed.
            one_hot: 1,
            start: 0,
            stop: 50000
        },
        transformer: !pkl: "./dae_l1.pkl"
    },
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis : 500,
        nhid : 500,
        irange : 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .3,
        },
        act_enc: "tanh",
        act_dec: null,    # Linear activation on the decoder side.
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate : 1e-3,
        batch_size : 100,
        monitoring_batches : 5,
        monitoring_dataset : *train,
        cost : !obj:pylearn2.costs.autoencoder.MeanSquaredReconstructionError {},
        termination_criterion : !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: 10,
        },
    },
    save_path: "./dae_l2.pkl",
    save_freq: 1
}


In [5]:
train = yaml_parse.load(layer2_yaml)
train.main_loop()


Parameter and initial learning rate summary:
	vb: 0.001
	hb: 0.001
	W: 0.001
	Wprime: 0.001
Compiling sgd_update...
Compiling sgd_update done. Time elapsed: 2.543316 seconds
compiling begin_record_entry...
compiling begin_record_entry done. Time elapsed: 0.055552 seconds
Monitored channels: 
	learning_rate
	objective
	total_seconds_last_epoch
	training_seconds_this_epoch
Compiling accum...
graph size: 19
Compiling accum done. Time elapsed: 0.212405 seconds
Monitoring step:
	Epochs seen: 0
	Batches seen: 0
	Examples seen: 0
	learning_rate: 0.001
	objective: 52.3473862576
	total_seconds_last_epoch: 0.0
	training_seconds_this_epoch: 0.0
Time this epoch: 5.916752 seconds
Monitoring step:
	Epochs seen: 1
	Batches seen: 500
	Examples seen: 50000
	learning_rate: 0.001
	objective: 20.403730567
	total_seconds_last_epoch: 0.0
	training_seconds_this_epoch: 5.916752
Saving to ./dae_l2.pkl...
Saving to ./dae_l2.pkl done. Time elapsed: 0.445217 seconds
Time this epoch: 8.161488 seconds
Monitoring step:
	Epochs seen: 2
	Batches seen: 1000
	Examples seen: 100000
	learning_rate: 0.001
	objective: 13.3085194431
	total_seconds_last_epoch: 10.261751
	training_seconds_this_epoch: 8.161488
Saving to ./dae_l2.pkl...
Saving to ./dae_l2.pkl done. Time elapsed: 0.446662 seconds
Time this epoch: 7.390480 seconds
Monitoring step:
	Epochs seen: 3
	Batches seen: 1500
	Examples seen: 150000
	learning_rate: 0.001
	objective: 9.98722106485
	total_seconds_last_epoch: 13.724457
	training_seconds_this_epoch: 7.39048
Saving to ./dae_l2.pkl...
Saving to ./dae_l2.pkl done. Time elapsed: 0.402655 seconds
Time this epoch: 5.774639 seconds
Monitoring step:
	Epochs seen: 4
	Batches seen: 2000
	Examples seen: 200000
	learning_rate: 0.001
	objective: 8.00958744431
	total_seconds_last_epoch: 11.817375
	training_seconds_this_epoch: 5.774639
Saving to ./dae_l2.pkl...
Saving to ./dae_l2.pkl done. Time elapsed: 0.457210 seconds
Time this epoch: 6.381065 seconds
Monitoring step:
	Epochs seen: 5
	Batches seen: 2500
	Examples seen: 250000
	learning_rate: 0.001
	objective: 6.75177105446
	total_seconds_last_epoch: 9.740588
	training_seconds_this_epoch: 6.381065
Saving to ./dae_l2.pkl...
Saving to ./dae_l2.pkl done. Time elapsed: 0.441539 seconds
Time this epoch: 6.016833 seconds
Monitoring step:
	Epochs seen: 6
	Batches seen: 3000
	Examples seen: 300000
	learning_rate: 0.001
	objective: 5.90950617494
	total_seconds_last_epoch: 10.551165
	training_seconds_this_epoch: 6.016833
Saving to ./dae_l2.pkl...
Saving to ./dae_l2.pkl done. Time elapsed: 0.462518 seconds
Time this epoch: 5.173134 seconds
Monitoring step:
	Epochs seen: 7
	Batches seen: 3500
	Examples seen: 350000
	learning_rate: 0.001
	objective: 5.3139621876
	total_seconds_last_epoch: 9.939718
	training_seconds_this_epoch: 5.173134
Saving to ./dae_l2.pkl...
Saving to ./dae_l2.pkl done. Time elapsed: 0.402607 seconds
Time this epoch: 5.052310 seconds
Monitoring step:
	Epochs seen: 8
	Batches seen: 4000
	Examples seen: 400000
	learning_rate: 0.001
	objective: 4.88844513578
	total_seconds_last_epoch: 8.790959
	training_seconds_this_epoch: 5.05231
Saving to ./dae_l2.pkl...
Saving to ./dae_l2.pkl done. Time elapsed: 0.404440 seconds
Time this epoch: 4.980679 seconds
Monitoring step:
	Epochs seen: 9
	Batches seen: 4500
	Examples seen: 450000
	learning_rate: 0.001
	objective: 4.57154658534
	total_seconds_last_epoch: 9.117831
	training_seconds_this_epoch: 4.980679
Saving to ./dae_l2.pkl...
Saving to ./dae_l2.pkl done. Time elapsed: 0.402997 seconds
Time this epoch: 5.802203 seconds
Monitoring step:
	Epochs seen: 10
	Batches seen: 5000
	Examples seen: 500000
	learning_rate: 0.001
	objective: 4.33914081349
	total_seconds_last_epoch: 8.836268
	training_seconds_this_epoch: 5.802203
Saving to ./dae_l2.pkl...
Saving to ./dae_l2.pkl done. Time elapsed: 0.398750 seconds
Saving to ./dae_l2.pkl...
Saving to ./dae_l2.pkl done. Time elapsed: 0.391285 seconds

Supervised fine-tuning

Now that we have two pre-trained layers, we can stack them to form an MLP that can be trained in a supervised fashion. We use the MLP class as usual, except that the hidden layers are instances of models.mlp.PretrainedLayer, which lets us pass in our pre-trained layers (as pickle files) through the layer_content argument. A softmax layer is added on top for classification.
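
As a rough picture of the stacked network, the sketch below runs a forward pass through two tanh encoders followed by a softmax layer. The function names are hypothetical and random weights stand in for the pre-trained parameters; the softmax weights are set to zero as a limiting case of a very small initialization.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mlp_forward(x, layers, softmax_W, softmax_b):
    """Apply each pre-trained tanh encoder in turn, then the softmax output layer."""
    h = x
    for W, hb in layers:
        h = np.tanh(h.dot(W) + hb)
    return softmax(h.dot(softmax_W) + softmax_b)

rng = np.random.RandomState(0)
x = rng.uniform(0.0, 1.0, size=(5, 784))
layers = [
    (rng.uniform(-0.05, 0.05, size=(784, 500)), np.zeros(500)),  # stand-in for h1
    (rng.uniform(-0.05, 0.05, size=(500, 500)), np.zeros(500)),  # stand-in for h2
]
softmax_W = np.zeros((500, 10))
softmax_b = np.zeros(10)

probs = mlp_forward(x, layers, softmax_W, softmax_b)
# With a uniform output the negative log-likelihood is ln(10) for any label.
nll = -np.mean(np.log(probs[:, 0]))
print(nll)
```

Before any fine-tuning the softmax output is close to uniform over the 10 classes, so the negative log-likelihood starts near ln(10), about 2.3026. This is consistent with the valid_y_nll value reported at epoch 0 in the training output below.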


In [6]:
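# Read the YAML template for the stacked MLP.
with open('dae_mlp.yaml', 'r') as f:
    mlp_yaml = f.read()
# Training examples are [0, train_stop); validation examples are [train_stop, valid_stop).
hyper_params_mlp = {'train_stop' : 50000,
                    'valid_stop' : 60000,
                    'batch_size' : 100,
                    'max_epochs' : 50,
                    'save_path' : '.'}
mlp_yaml = mlp_yaml % hyper_params_mlp
print(mlp_yaml)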


!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        one_hot: 1,
        start: 0,
        stop: 50000
    },
    model: !obj:pylearn2.models.mlp.MLP {
        batch_size: 100,
        layers: [
                 !obj:pylearn2.models.mlp.PretrainedLayer {
                     layer_name: 'h1',
                     layer_content: !pkl: "./dae_l1.pkl"
                 },
                 !obj:pylearn2.models.mlp.PretrainedLayer {
                     layer_name: 'h2',
                     layer_content: !pkl: "./dae_l2.pkl"
                 },
                 !obj:pylearn2.models.mlp.Softmax {
                     max_col_norm: 1.9365,
                     layer_name: 'y',
                     n_classes: 10,
                     irange: .005
                 }
                ],
        nvis: 784
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate: .05,
        learning_rule: !obj:pylearn2.training_algorithms.learning_rule.Momentum {
            init_momentum: .5,
        },
        monitoring_dataset:
            {
                'valid' : !obj:pylearn2.datasets.mnist.MNIST {
                              which_set: 'train',
                              one_hot: 1,
                              start: 50000,
                              stop: 60000
                          },
            },
        cost: !obj:pylearn2.costs.mlp.Default {},
        termination_criterion: !obj:pylearn2.termination_criteria.And {
            criteria: [
                !obj:pylearn2.termination_criteria.MonitorBased {
                    channel_name: "valid_y_misclass",
                    prop_decrease: 0.,
                    N: 100
                },
                !obj:pylearn2.termination_criteria.EpochCounter {
                    max_epochs: 50
                }
            ]
        },
        update_callbacks: !obj:pylearn2.training_algorithms.sgd.ExponentialDecay {
            decay_factor: 1.00004,
            min_lr: .000001
        }
    },
    extensions: [
        !obj:pylearn2.training_algorithms.learning_rule.MomentumAdjustor {
            start: 1,
            saturate: 250,
            final_momentum: .7
        }
    ]
}
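
The ExponentialDecay and MomentumAdjustor entries in the configuration above define the learning-rate and momentum schedules. The sketch below is a reading of those settings, not the Pylearn2 source: it assumes the learning rate is divided by decay_factor once per batch (and never drops below min_lr), and that momentum is interpolated linearly between epochs start and saturate. The function names are hypothetical.

```python
def decayed_lr(init_lr, decay_factor, min_lr, batches_seen):
    """Learning rate after batches_seen updates, assuming one division per batch."""
    return max(init_lr / decay_factor ** batches_seen, min_lr)

def adjusted_momentum(init_momentum, final_momentum, start, saturate, epochs_seen):
    """Linear ramp from init_momentum (at start) to final_momentum (at saturate)."""
    alpha = float(epochs_seen - start) / float(saturate - start)
    alpha = min(max(alpha, 0.0), 1.0)
    return init_momentum * (1.0 - alpha) + final_momentum * alpha

# 50000 training examples / batch size 100 = 500 batches per epoch.
lr_after_epoch_1 = decayed_lr(0.05, 1.00004, 1e-6, 500)
momentum_at_epoch_2 = adjusted_momentum(0.5, 0.7, 1, 250, 2)
print(lr_after_epoch_1)
print(momentum_at_epoch_2)
```

Under these assumptions the two values agree with the learning_rate reported after one epoch and the momentum reported after two epochs in the monitoring output below, which suggests the reading is at least consistent with this run.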


In [7]:
train = yaml_parse.load(mlp_yaml)
train.main_loop()


Parameter and initial learning rate summary:
	vb: 0.05
	hb: 0.05
	W: 0.05
	Wprime: 0.05
	vb: 0.05
	hb: 0.05
	W: 0.05
	Wprime: 0.05
	softmax_b: 0.05
	softmax_W: 0.05
Compiling sgd_update...
Compiling sgd_update done. Time elapsed: 18.207897 seconds
compiling begin_record_entry...
compiling begin_record_entry done. Time elapsed: 0.071935 seconds
Monitored channels: 
	learning_rate
	momentum
	total_seconds_last_epoch
	training_seconds_this_epoch
	valid_objective
	valid_y_col_norms_max
	valid_y_col_norms_mean
	valid_y_col_norms_min
	valid_y_max_max_class
	valid_y_mean_max_class
	valid_y_min_max_class
	valid_y_misclass
	valid_y_nll
	valid_y_row_norms_max
	valid_y_row_norms_mean
	valid_y_row_norms_min
Compiling accum...
graph size: 63
Compiling accum done. Time elapsed: 10.991465 seconds
Monitoring step:
	Epochs seen: 0
	Batches seen: 0
	Examples seen: 0
	learning_rate: 0.05
	momentum: 0.5
	total_seconds_last_epoch: 0.0
	training_seconds_this_epoch: 0.0
	valid_objective: 2.3025427138
	valid_y_col_norms_max: 0.0650026130651
	valid_y_col_norms_mean: 0.0641744853852
	valid_y_col_norms_min: 0.0624679393698
	valid_y_max_max_class: 0.105517846447
	valid_y_mean_max_class: 0.102751944167
	valid_y_min_max_class: 0.101061860389
	valid_y_misclass: 0.9045
	valid_y_nll: 2.3025427138
	valid_y_row_norms_max: 0.0125483545665
	valid_y_row_norms_mean: 0.00897718040255
	valid_y_row_norms_min: 0.00411555936503
Time this epoch: 5.424466 seconds
Monitoring step:
	Epochs seen: 1
	Batches seen: 500
	Examples seen: 50000
	learning_rate: 0.0490099532688
	momentum: 0.5
	total_seconds_last_epoch: 0.0
	training_seconds_this_epoch: 5.424466
	valid_objective: 0.285522835091
	valid_y_col_norms_max: 1.37933177742
	valid_y_col_norms_mean: 1.26006992057
	valid_y_col_norms_min: 1.1055487119
	valid_y_max_max_class: 0.999642237207
	valid_y_mean_max_class: 0.891380619054
	valid_y_min_max_class: 0.366655260832
	valid_y_misclass: 0.0816
	valid_y_nll: 0.285522835091
	valid_y_row_norms_max: 0.305729549433
	valid_y_row_norms_mean: 0.173910043492
	valid_y_row_norms_min: 0.0764016240587
Time this epoch: 5.914395 seconds
Monitoring step:
	Epochs seen: 2
	Batches seen: 1000
	Examples seen: 100000
	learning_rate: 0.0480395103882
	momentum: 0.500803212851
	total_seconds_last_epoch: 5.916628
	training_seconds_this_epoch: 5.914395
	valid_objective: 0.247152456183
	valid_y_col_norms_max: 1.53966505735
	valid_y_col_norms_mean: 1.40253319321
	valid_y_col_norms_min: 1.25545532251
	valid_y_max_max_class: 0.999809042373
	valid_y_mean_max_class: 0.91413155486
	valid_y_min_max_class: 0.396858131932
	valid_y_misclass: 0.0693
	valid_y_nll: 0.247152456183
	valid_y_row_norms_max: 0.349859366019
	valid_y_row_norms_mean: 0.193156755311
	valid_y_row_norms_min: 0.0768834575651
Time this epoch: 5.384064 seconds
Monitoring step:
	Epochs seen: 3
	Batches seen: 1500
	Examples seen: 150000
	learning_rate: 0.0470882831836
	momentum: 0.501606425703
	total_seconds_last_epoch: 6.467402
	training_seconds_this_epoch: 5.384064
	valid_objective: 0.209608763896
	valid_y_col_norms_max: 1.67361362258
	valid_y_col_norms_mean: 1.5175975166
	valid_y_col_norms_min: 1.4172014207
	valid_y_max_max_class: 0.999854004396
	valid_y_mean_max_class: 0.925858884522
	valid_y_min_max_class: 0.406040902917
	valid_y_misclass: 0.0606
	valid_y_nll: 0.209608763896
	valid_y_row_norms_max: 0.39865097889
	valid_y_row_norms_mean: 0.208246054914
	valid_y_row_norms_min: 0.0794959523134
Time this epoch: 6.462137 seconds
Monitoring step:
	Epochs seen: 4
	Batches seen: 2000
	Examples seen: 200000
	learning_rate: 0.0461558911667
	momentum: 0.502409638554
	total_seconds_last_epoch: 5.87754
	training_seconds_this_epoch: 6.462137
	valid_objective: 0.182002507511
	valid_y_col_norms_max: 1.88661737526
	valid_y_col_norms_mean: 1.62843307965
	valid_y_col_norms_min: 1.44947897479
	valid_y_max_max_class: 0.999894541276
	valid_y_mean_max_class: 0.934676843166
	valid_y_min_max_class: 0.424385874472
	valid_y_misclass: 0.0517
	valid_y_nll: 0.182002507511
	valid_y_row_norms_max: 0.445090109865
	valid_y_row_norms_mean: 0.222652142005
	valid_y_row_norms_min: 0.0809203018493
Time this epoch: 7.260037 seconds
Monitoring step:
	Epochs seen: 5
	Batches seen: 2500
	Examples seen: 250000
	learning_rate: 0.0452419613832
	momentum: 0.503212851406
	total_seconds_last_epoch: 7.185899
	training_seconds_this_epoch: 7.260037
	valid_objective: 0.15993592079
	valid_y_col_norms_max: 1.93649990002
	valid_y_col_norms_mean: 1.72137351889
	valid_y_col_norms_min: 1.47563705063
	valid_y_max_max_class: 0.999887268664
	valid_y_mean_max_class: 0.940857921767
	valid_y_min_max_class: 0.425844421556
	valid_y_misclass: 0.0443
	valid_y_nll: 0.15993592079
	valid_y_row_norms_max: 0.469129537434
	valid_y_row_norms_mean: 0.234461782035
	valid_y_row_norms_min: 0.0818907474095
Time this epoch: 6.109988 seconds
Monitoring step:
	Epochs seen: 6
	Batches seen: 3000
	Examples seen: 300000
	learning_rate: 0.0443461282636
	momentum: 0.504016064257
	total_seconds_last_epoch: 7.832857
	training_seconds_this_epoch: 6.109988
	valid_objective: 0.143053170079
	valid_y_col_norms_max: 1.93127251362
	valid_y_col_norms_mean: 1.79741241555
	valid_y_col_norms_min: 1.52140954332
	valid_y_max_max_class: 0.999934626403
	valid_y_mean_max_class: 0.948279377144
	valid_y_min_max_class: 0.448804897479
	valid_y_misclass: 0.0378
	valid_y_nll: 0.143053170079
	valid_y_row_norms_max: 0.50656723117
	valid_y_row_norms_mean: 0.244059832993
	valid_y_row_norms_min: 0.0839453832919
Time this epoch: 6.274909 seconds
Monitoring step:
	Epochs seen: 7
	Batches seen: 3500
	Examples seen: 350000
	learning_rate: 0.043468033477
	momentum: 0.504819277108
	total_seconds_last_epoch: 6.641488
	training_seconds_this_epoch: 6.274909
	valid_objective: 0.12899298538
	valid_y_col_norms_max: 1.93629631377
	valid_y_col_norms_mean: 1.8413293273
	valid_y_col_norms_min: 1.56393053427
	valid_y_max_max_class: 0.999935186738
	valid_y_mean_max_class: 0.952691727165
	valid_y_min_max_class: 0.457475747077
	valid_y_misclass: 0.0371
	valid_y_nll: 0.12899298538
	valid_y_row_norms_max: 0.522436534903
	valid_y_row_norms_mean: 0.249350677872
	valid_y_row_norms_min: 0.0835793194683
Time this epoch: 5.780168 seconds
Monitoring step:
	Epochs seen: 8
	Batches seen: 4000
	Examples seen: 400000
	learning_rate: 0.0426073257879
	momentum: 0.50562248996
	total_seconds_last_epoch: 6.848056
	training_seconds_this_epoch: 5.780168
	valid_objective: 0.123507539177
	valid_y_col_norms_max: 1.93649990004
	valid_y_col_norms_mean: 1.87322047941
	valid_y_col_norms_min: 1.61648739436
	valid_y_max_max_class: 0.999963122695
	valid_y_mean_max_class: 0.954592708979
	valid_y_min_max_class: 0.463553841674
	valid_y_misclass: 0.0348
	valid_y_nll: 0.123507539177
	valid_y_row_norms_max: 0.530573236568
	valid_y_row_norms_mean: 0.253283284212
	valid_y_row_norms_min: 0.0839513317413
Time this epoch: 5.862484 seconds
Monitoring step:
	Epochs seen: 9
	Batches seen: 4500
	Examples seen: 450000
	learning_rate: 0.0417636609155
	momentum: 0.506425702811
	total_seconds_last_epoch: 6.263422
	training_seconds_this_epoch: 5.862484
	valid_objective: 0.119219852832
	valid_y_col_norms_max: 1.93649990006
	valid_y_col_norms_mean: 1.89298464921
	valid_y_col_norms_min: 1.65981380041
	valid_y_max_max_class: 0.999965706697
	valid_y_mean_max_class: 0.956153143933
	valid_y_min_max_class: 0.464692993356
	valid_y_misclass: 0.0325
	valid_y_nll: 0.119219852832
	valid_y_row_norms_max: 0.534978569943
	valid_y_row_norms_mean: 0.255624289812
	valid_y_row_norms_min: 0.083866479842
Time this epoch: 5.537590 seconds
Monitoring step:
	Epochs seen: 10
	Batches seen: 5000
	Examples seen: 500000
	learning_rate: 0.040936701396
	momentum: 0.507228915663
	total_seconds_last_epoch: 6.380026
	training_seconds_this_epoch: 5.53759
	valid_objective: 0.107567259811
	valid_y_col_norms_max: 1.9364999001
	valid_y_col_norms_mean: 1.9034125219
	valid_y_col_norms_min: 1.70328608813
	valid_y_max_max_class: 0.999951104974
	valid_y_mean_max_class: 0.960034365701
	valid_y_min_max_class: 0.468327175933
	valid_y_misclass: 0.0301
	valid_y_nll: 0.107567259811
	valid_y_row_norms_max: 0.542294959221
	valid_y_row_norms_mean: 0.256801448936
	valid_y_row_norms_min: 0.0852080089068
Time this epoch: 5.906574 seconds
Monitoring step:
	Epochs seen: 11
	Batches seen: 5500
	Examples seen: 550000
	learning_rate: 0.0401261164479
	momentum: 0.508032128514
	total_seconds_last_epoch: 6.028702
	training_seconds_this_epoch: 5.906574
	valid_objective: 0.107947427591
	valid_y_col_norms_max: 1.93649990007
	valid_y_col_norms_mean: 1.91489786344
	valid_y_col_norms_min: 1.76231726369
	valid_y_max_max_class: 0.999973748357
	valid_y_mean_max_class: 0.959634515782
	valid_y_min_max_class: 0.474093275273
	valid_y_misclass: 0.03
	valid_y_nll: 0.107947427591
	valid_y_row_norms_max: 0.549992925871
	valid_y_row_norms_mean: 0.258155136976
	valid_y_row_norms_min: 0.0866275711933
Time this epoch: 5.412370 seconds
Monitoring step:
	Epochs seen: 12
	Batches seen: 6000
	Examples seen: 600000
	learning_rate: 0.0393315818394
	momentum: 0.508835341365
	total_seconds_last_epoch: 6.476826
	training_seconds_this_epoch: 5.41237
	valid_objective: 0.099866818925
	valid_y_col_norms_max: 1.93649990003
	valid_y_col_norms_mean: 1.92230766664
	valid_y_col_norms_min: 1.80934364428
	valid_y_max_max_class: 0.999977163009
	valid_y_mean_max_class: 0.964608656832
	valid_y_min_max_class: 0.500261865705
	valid_y_misclass: 0.0275
	valid_y_nll: 0.099866818925
	valid_y_row_norms_max: 0.558781354335
	valid_y_row_norms_mean: 0.259100870088
	valid_y_row_norms_min: 0.0880999968826
Time this epoch: 6.015195 seconds
Monitoring step:
	Epochs seen: 13
	Batches seen: 6500
	Examples seen: 650000
	learning_rate: 0.0385527797588
	momentum: 0.509638554217
	total_seconds_last_epoch: 5.900172
	training_seconds_this_epoch: 6.015195
	valid_objective: 0.0978410464664
	valid_y_col_norms_max: 1.93649990007
	valid_y_col_norms_mean: 1.92794140834
	valid_y_col_norms_min: 1.86112450513
	valid_y_max_max_class: 0.999976969849
	valid_y_mean_max_class: 0.964530701679
	valid_y_min_max_class: 0.493560643367
	valid_y_misclass: 0.0282
	valid_y_nll: 0.0978410464664
	valid_y_row_norms_max: 0.564768602442
	valid_y_row_norms_mean: 0.259801533503
	valid_y_row_norms_min: 0.0896851288266
Time this epoch: 5.360026 seconds
Monitoring step:
	Epochs seen: 14
	Batches seen: 7000
	Examples seen: 700000
	learning_rate: 0.0377893986872
	momentum: 0.510441767068
	total_seconds_last_epoch: 6.57288
	training_seconds_this_epoch: 5.360026
	valid_objective: 0.0951468312278
	valid_y_col_norms_max: 1.93649990004
	valid_y_col_norms_mean: 1.93098837983
	valid_y_col_norms_min: 1.90879801473
	valid_y_max_max_class: 0.999983550414
	valid_y_mean_max_class: 0.965579195639
	valid_y_min_max_class: 0.49387690176
	valid_y_misclass: 0.028
	valid_y_nll: 0.0951468312278
	valid_y_row_norms_max: 0.572342419681
	valid_y_row_norms_mean: 0.260207611713
	valid_y_row_norms_min: 0.0907168875795
Time this epoch: 6.135147 seconds
Monitoring step:
	Epochs seen: 15
	Batches seen: 7500
	Examples seen: 750000
	learning_rate: 0.0370411332743
	momentum: 0.51124497992
	total_seconds_last_epoch: 5.840326
	training_seconds_this_epoch: 6.135147
	valid_objective: 0.0946865767775
	valid_y_col_norms_max: 1.9364999001
	valid_y_col_norms_mean: 1.93528101845
	valid_y_col_norms_min: 1.93273820413
	valid_y_max_max_class: 0.99998445477
	valid_y_mean_max_class: 0.966292634743
	valid_y_min_max_class: 0.497419916571
	valid_y_misclass: 0.0264
	valid_y_nll: 0.0946865767775
	valid_y_row_norms_max: 0.575220483639
	valid_y_row_norms_mean: 0.26081324661
	valid_y_row_norms_min: 0.0913843843503
Time this epoch: 5.447133 seconds
Monitoring step:
	Epochs seen: 16
	Batches seen: 8000
	Examples seen: 800000
	learning_rate: 0.0363076842159
	momentum: 0.512048192771
	total_seconds_last_epoch: 6.627608
	training_seconds_this_epoch: 5.447133
	valid_objective: 0.0891035756577
	valid_y_col_norms_max: 1.93649990007
	valid_y_col_norms_mean: 1.93567546133
	valid_y_col_norms_min: 1.93226078666
	valid_y_max_max_class: 0.99998511463
	valid_y_mean_max_class: 0.968229560194
	valid_y_min_max_class: 0.503004856839
	valid_y_misclass: 0.0255
	valid_y_nll: 0.0891035756577
	valid_y_row_norms_max: 0.577980787976
	valid_y_row_norms_mean: 0.260947657836
	valid_y_row_norms_min: 0.0923057387534
Time this epoch: 6.057061 seconds
Monitoring step:
	Epochs seen: 17
	Batches seen: 8500
	Examples seen: 850000
	learning_rate: 0.0355887581344
	momentum: 0.512851405622
	total_seconds_last_epoch: 5.923046
	training_seconds_this_epoch: 6.057061
	valid_objective: 0.0881537906447
	valid_y_col_norms_max: 1.93649990003
	valid_y_col_norms_mean: 1.93483799999
	valid_y_col_norms_min: 1.93229350347
	valid_y_max_max_class: 0.999977816061
	valid_y_mean_max_class: 0.968547838073
	valid_y_min_max_class: 0.503032249459
	valid_y_misclass: 0.026
	valid_y_nll: 0.0881537906447
	valid_y_row_norms_max: 0.580327280586
	valid_y_row_norms_mean: 0.260898516117
	valid_y_row_norms_min: 0.0930101525575
Time this epoch: 5.353301 seconds
Monitoring step:
	Epochs seen: 18
	Batches seen: 9000
	Examples seen: 900000
	learning_rate: 0.0348840674612
	momentum: 0.513654618474
	total_seconds_last_epoch: 6.544159
	training_seconds_this_epoch: 5.353301
	valid_objective: 0.0849988168168
	valid_y_col_norms_max: 1.93649990007
	valid_y_col_norms_mean: 1.93541869502
	valid_y_col_norms_min: 1.93201480261
	valid_y_max_max_class: 0.999984295216
	valid_y_mean_max_class: 0.969765470533
	valid_y_min_max_class: 0.510558770721
	valid_y_misclass: 0.0242
	valid_y_nll: 0.0849988168168
	valid_y_row_norms_max: 0.581088622209
	valid_y_row_norms_mean: 0.26107829656
	valid_y_row_norms_min: 0.0923959534237
Time this epoch: 6.384324 seconds
Monitoring step:
	Epochs seen: 19
	Batches seen: 9500
	Examples seen: 950000
	learning_rate: 0.034193330322
	momentum: 0.514457831325
	total_seconds_last_epoch: 5.837694
	training_seconds_this_epoch: 6.384324
	valid_objective: 0.0860030965262
	valid_y_col_norms_max: 1.93649990004
	valid_y_col_norms_mean: 1.93476177366
	valid_y_col_norms_min: 1.93239041045
	valid_y_max_max_class: 0.999982263957
	valid_y_mean_max_class: 0.968567755339
	valid_y_min_max_class: 0.500306941533
	valid_y_misclass: 0.0244
	valid_y_nll: 0.0860030965262
	valid_y_row_norms_max: 0.583485659518
	valid_y_row_norms_mean: 0.261092936398
	valid_y_row_norms_min: 0.0937852230236
Time this epoch: 5.956484 seconds
Monitoring step:
	Epochs seen: 20
	Batches seen: 10000
	Examples seen: 1000000
	learning_rate: 0.0335162704237
	momentum: 0.515261044177
	total_seconds_last_epoch: 6.863189
	training_seconds_this_epoch: 5.956484
	valid_objective: 0.0827991311558
	valid_y_col_norms_max: 1.93649990006
	valid_y_col_norms_mean: 1.9356068121
	valid_y_col_norms_min: 1.932339644
	valid_y_max_max_class: 0.999988465173
	valid_y_mean_max_class: 0.970974120441
	valid_y_min_max_class: 0.512024650555
	valid_y_misclass: 0.0253
	valid_y_nll: 0.0827991311558
	valid_y_row_norms_max: 0.585478955937
	valid_y_row_norms_mean: 0.261291684144
	valid_y_row_norms_min: 0.0935873984313
Time this epoch: 6.912120 seconds
Monitoring step:
	Epochs seen: 21
	Batches seen: 10500
	Examples seen: 1050000
	learning_rate: 0.0328526169442
	momentum: 0.516064257028
	total_seconds_last_epoch: 6.623968
	training_seconds_this_epoch: 6.91212
	valid_objective: 0.0818521815965
	valid_y_col_norms_max: 1.93649990004
	valid_y_col_norms_mean: 1.93554695608
	valid_y_col_norms_min: 1.93384153168
	valid_y_max_max_class: 0.999989665122
	valid_y_mean_max_class: 0.972759271353
	valid_y_min_max_class: 0.532786606556
	valid_y_misclass: 0.0242
	valid_y_nll: 0.0818521815965
	valid_y_row_norms_max: 0.582850746611
	valid_y_row_norms_mean: 0.261402125409
	valid_y_row_norms_min: 0.0943095869177
Time this epoch: 5.942600 seconds
Monitoring step:
	Epochs seen: 22
	Batches seen: 11000
	Examples seen: 1100000
	learning_rate: 0.0322021044239
	momentum: 0.51686746988
	total_seconds_last_epoch: 7.449036
	training_seconds_this_epoch: 5.9426
	valid_objective: 0.0818381087461
	valid_y_col_norms_max: 1.93649990004
	valid_y_col_norms_mean: 1.9354317889
	valid_y_col_norms_min: 1.93240682737
	valid_y_max_max_class: 0.999989762395
	valid_y_mean_max_class: 0.971525657813
	valid_y_min_max_class: 0.511311093519
	valid_y_misclass: 0.0237
	valid_y_nll: 0.0818381087461
	valid_y_row_norms_max: 0.583744391912
	valid_y_row_norms_mean: 0.261528473827
	valid_y_row_norms_min: 0.0954943063263
Time this epoch: 6.306351 seconds
Monitoring step:
	Epochs seen: 23
	Batches seen: 11500
	Examples seen: 1150000
	learning_rate: 0.0315644726594
	momentum: 0.517670682731
	total_seconds_last_epoch: 6.45007
	training_seconds_this_epoch: 6.306351
	valid_objective: 0.0783682477947
	valid_y_col_norms_max: 1.93649990006
	valid_y_col_norms_mean: 1.93526822051
	valid_y_col_norms_min: 1.93147578338
	valid_y_max_max_class: 0.99999011425
	valid_y_mean_max_class: 0.97293918071
	valid_y_min_max_class: 0.526539137394
	valid_y_misclass: 0.0227
	valid_y_nll: 0.0783682477947
	valid_y_row_norms_max: 0.582500635239
	valid_y_row_norms_mean: 0.261607855034
	valid_y_row_norms_min: 0.0961656137637
Time this epoch: 7.636177 seconds
Monitoring step:
	Epochs seen: 24
	Batches seen: 12000
	Examples seen: 1200000
	learning_rate: 0.0309394665998
	momentum: 0.518473895582
	total_seconds_last_epoch: 6.911065
	training_seconds_this_epoch: 7.636177
	valid_objective: 0.0787622672218
	valid_y_col_norms_max: 1.93649990001
	valid_y_col_norms_mean: 1.93575002012
	valid_y_col_norms_min: 1.93321450572
	valid_y_max_max_class: 0.99998974409
	valid_y_mean_max_class: 0.97318432332
	valid_y_min_max_class: 0.518712142169
	valid_y_misclass: 0.0225
	valid_y_nll: 0.0787622672218
	valid_y_row_norms_max: 0.583587850135
	valid_y_row_norms_mean: 0.261799057069
	valid_y_row_norms_min: 0.0973764887908
Time this epoch: 5.956405 seconds
Monitoring step:
	Epochs seen: 25
	Batches seen: 12500
	Examples seen: 1250000
	learning_rate: 0.0303268362444
	momentum: 0.519277108434
	total_seconds_last_epoch: 8.194458
	training_seconds_this_epoch: 5.956405
	valid_objective: 0.0773718887277
	valid_y_col_norms_max: 1.93649990004
	valid_y_col_norms_mean: 1.93532355362
	valid_y_col_norms_min: 1.93351647398
	valid_y_max_max_class: 0.99999145149
	valid_y_mean_max_class: 0.973759518777
	valid_y_min_max_class: 0.529670173287
	valid_y_misclass: 0.0233
	valid_y_nll: 0.0773718887277
	valid_y_row_norms_max: 0.582513357512
	valid_y_row_norms_mean: 0.26186702499
	valid_y_row_norms_min: 0.0979607422094
Time this epoch: 5.743074 seconds
Monitoring step:
	Epochs seen: 26
	Batches seen: 13000
	Examples seen: 1300000
	learning_rate: 0.0297263365425
	momentum: 0.520080321285
	total_seconds_last_epoch: 6.488384
	training_seconds_this_epoch: 5.743074
	valid_objective: 0.0761000400278
	valid_y_col_norms_max: 1.93649990003
	valid_y_col_norms_mean: 1.93572293722
	valid_y_col_norms_min: 1.93308728002
	valid_y_max_max_class: 0.999990865485
	valid_y_mean_max_class: 0.974327828318
	valid_y_min_max_class: 0.535549664623
	valid_y_misclass: 0.0222
	valid_y_nll: 0.0761000400278
	valid_y_row_norms_max: 0.582412817396
	valid_y_row_norms_mean: 0.262060779735
	valid_y_row_norms_min: 0.0987484252388
Time this epoch: 5.530118 seconds
Monitoring step:
	Epochs seen: 27
	Batches seen: 13500
	Examples seen: 1350000
	learning_rate: 0.029137727296
	momentum: 0.520883534137
	total_seconds_last_epoch: 6.300112
	training_seconds_this_epoch: 5.530118
	valid_objective: 0.0745081666292
	valid_y_col_norms_max: 1.93649990002
	valid_y_col_norms_mean: 1.93582241037
	valid_y_col_norms_min: 1.93385572353
	valid_y_max_max_class: 0.999992622678
	valid_y_mean_max_class: 0.974613548424
	valid_y_min_max_class: 0.528671878457
	valid_y_misclass: 0.0224
	valid_y_nll: 0.0745081666292
	valid_y_row_norms_max: 0.580291571914
	valid_y_row_norms_mean: 0.262198077457
	valid_y_row_norms_min: 0.0991215943111
Time this epoch: 5.768094 seconds
Monitoring step:
	Epochs seen: 28
	Batches seen: 14000
	Examples seen: 1400000
	learning_rate: 0.0285607730628
	momentum: 0.521686746988
	total_seconds_last_epoch: 6.053191
	training_seconds_this_epoch: 5.768094
	valid_objective: 0.0740452001005
	valid_y_col_norms_max: 1.93649990003
	valid_y_col_norms_mean: 1.93625501779
	valid_y_col_norms_min: 1.93551373359
	valid_y_max_max_class: 0.999993221811
	valid_y_mean_max_class: 0.975309055813
	valid_y_min_max_class: 0.533641371741
	valid_y_misclass: 0.0215
	valid_y_nll: 0.0740452001005
	valid_y_row_norms_max: 0.580056651905
	valid_y_row_norms_mean: 0.262354805648
	valid_y_row_norms_min: 0.100014501684
Time this epoch: 5.513620 seconds
Monitoring step:
	Epochs seen: 29
	Batches seen: 14500
	Examples seen: 1450000
	learning_rate: 0.0279952430625
	momentum: 0.522489959839
	total_seconds_last_epoch: 6.254529
	training_seconds_this_epoch: 5.51362
	valid_objective: 0.073504601426
	valid_y_col_norms_max: 1.93649990002
	valid_y_col_norms_mean: 1.93584892773
	valid_y_col_norms_min: 1.93416522858
	valid_y_max_max_class: 0.999993403152
	valid_y_mean_max_class: 0.975481711533
	valid_y_min_max_class: 0.529656340405
	valid_y_misclass: 0.0227
	valid_y_nll: 0.073504601426
	valid_y_row_norms_max: 0.5790139873
	valid_y_row_norms_mean: 0.262423752802
	valid_y_row_norms_min: 0.100872029594
Time this epoch: 5.889689 seconds
Monitoring step:
	Epochs seen: 30
	Batches seen: 15000
	Examples seen: 1500000
	learning_rate: 0.0274409110849
	momentum: 0.523293172691
	total_seconds_last_epoch: 6.052284
	training_seconds_this_epoch: 5.889689
	valid_objective: 0.0742676570822
	valid_y_col_norms_max: 1.93649990004
	valid_y_col_norms_mean: 1.93607586676
	valid_y_col_norms_min: 1.93510461646
	valid_y_max_max_class: 0.999993515244
	valid_y_mean_max_class: 0.975243305948
	valid_y_min_max_class: 0.529464250743
	valid_y_misclass: 0.0214
	valid_y_nll: 0.0742676570822
	valid_y_row_norms_max: 0.577061673794
	valid_y_row_norms_mean: 0.262574402565
	valid_y_row_norms_min: 0.101525157962
Time this epoch: 5.481031 seconds
Monitoring step:
	Epochs seen: 31
	Batches seen: 15500
	Examples seen: 1550000
	learning_rate: 0.0268975553984
	momentum: 0.524096385542
	total_seconds_last_epoch: 6.374491
	training_seconds_this_epoch: 5.481031
	valid_objective: 0.0728776691158
	valid_y_col_norms_max: 1.93649990004
	valid_y_col_norms_mean: 1.93537613609
	valid_y_col_norms_min: 1.93343938173
	valid_y_max_max_class: 0.999993167924
	valid_y_mean_max_class: 0.97504830818
	valid_y_min_max_class: 0.530201316588
	valid_y_misclass: 0.0212
	valid_y_nll: 0.0728776691158
	valid_y_row_norms_max: 0.575226588311
	valid_y_row_norms_mean: 0.262591911926
	valid_y_row_norms_min: 0.102804652481
Time this epoch: 5.813406 seconds
Monitoring step:
	Epochs seen: 32
	Batches seen: 16000
	Examples seen: 1600000
	learning_rate: 0.0263649586624
	momentum: 0.524899598394
	total_seconds_last_epoch: 6.010313
	training_seconds_this_epoch: 5.813406
	valid_objective: 0.0728836989039
	valid_y_col_norms_max: 1.93649990003
	valid_y_col_norms_mean: 1.93582905882
	valid_y_col_norms_min: 1.93383688687
	valid_y_max_max_class: 0.999994016591
	valid_y_mean_max_class: 0.976248888568
	valid_y_min_max_class: 0.523712212878
	valid_y_misclass: 0.0216
	valid_y_nll: 0.0728836989039
	valid_y_row_norms_max: 0.573600835574
	valid_y_row_norms_mean: 0.262762722983
	valid_y_row_norms_min: 0.103200994169
Time this epoch: 5.543042 seconds
Monitoring step:
	Epochs seen: 33
	Batches seen: 16500
	Examples seen: 1650000
	learning_rate: 0.0258429078396
	momentum: 0.525702811245
	total_seconds_last_epoch: 6.272276
	training_seconds_this_epoch: 5.543042
	valid_objective: 0.0711776701676
	valid_y_col_norms_max: 1.93649990002
	valid_y_col_norms_mean: 1.93601849576
	valid_y_col_norms_min: 1.9347452523
	valid_y_max_max_class: 0.999994804807
	valid_y_mean_max_class: 0.976937162353
	valid_y_min_max_class: 0.535316842768
	valid_y_misclass: 0.0215
	valid_y_nll: 0.0711776701676
	valid_y_row_norms_max: 0.57294316861
	valid_y_row_norms_mean: 0.26290736819
	valid_y_row_norms_min: 0.103953794527
Time this epoch: 5.721724 seconds
Monitoring step:
	Epochs seen: 34
	Batches seen: 17000
	Examples seen: 1700000
	learning_rate: 0.025331194111
	momentum: 0.526506024096
	total_seconds_last_epoch: 6.074407
	training_seconds_this_epoch: 5.721724
	valid_objective: 0.0699203328758
	valid_y_col_norms_max: 1.93649990002
	valid_y_col_norms_mean: 1.93592743587
	valid_y_col_norms_min: 1.93428107334
	valid_y_max_max_class: 0.999995376263
	valid_y_mean_max_class: 0.976599284745
	valid_y_min_max_class: 0.526191214552
	valid_y_misclass: 0.021
	valid_y_nll: 0.0699203328758
	valid_y_row_norms_max: 0.571380504077
	valid_y_row_norms_mean: 0.262986149627
	valid_y_row_norms_min: 0.10382617767
Time this epoch: 5.558886 seconds
Monitoring step:
	Epochs seen: 35
	Batches seen: 17500
	Examples seen: 1750000
	learning_rate: 0.0248296127924
	momentum: 0.527309236948
	total_seconds_last_epoch: 6.196367
	training_seconds_this_epoch: 5.558886
	valid_objective: 0.0703058351436
	valid_y_col_norms_max: 1.93649990004
	valid_y_col_norms_mean: 1.93556322238
	valid_y_col_norms_min: 1.93352817916
	valid_y_max_max_class: 0.999995617856
	valid_y_mean_max_class: 0.977215418205
	valid_y_min_max_class: 0.540389429489
	valid_y_misclass: 0.0214
	valid_y_nll: 0.0703058351436
	valid_y_row_norms_max: 0.56730498694
	valid_y_row_norms_mean: 0.263051858774
	valid_y_row_norms_min: 0.103871414144
Time this epoch: 5.726418 seconds
Monitoring step:
	Epochs seen: 36
	Batches seen: 18000
	Examples seen: 1800000
	learning_rate: 0.0243379632528
	momentum: 0.528112449799
	total_seconds_last_epoch: 6.09765
	training_seconds_this_epoch: 5.726418
	valid_objective: 0.0702599115727
	valid_y_col_norms_max: 1.93649990002
	valid_y_col_norms_mean: 1.93603299071
	valid_y_col_norms_min: 1.93538484492
	valid_y_max_max_class: 0.999995632432
	valid_y_mean_max_class: 0.97787637084
	valid_y_min_max_class: 0.538194459331
	valid_y_misclass: 0.0213
	valid_y_nll: 0.0702599115727
	valid_y_row_norms_max: 0.568113383761
	valid_y_row_norms_mean: 0.263222703155
	valid_y_row_norms_min: 0.104112327856
Time this epoch: 5.610193 seconds
Monitoring step:
	Epochs seen: 37
	Batches seen: 18500
	Examples seen: 1850000
	learning_rate: 0.0238560488335
	momentum: 0.528915662651
	total_seconds_last_epoch: 6.197008
	training_seconds_this_epoch: 5.610193
	valid_objective: 0.0700176350165
	valid_y_col_norms_max: 1.93649990002
	valid_y_col_norms_mean: 1.93587895351
	valid_y_col_norms_min: 1.93341619328
	valid_y_max_max_class: 0.999996322536
	valid_y_mean_max_class: 0.977628071153
	valid_y_min_max_class: 0.542568666563
	valid_y_misclass: 0.0209
	valid_y_nll: 0.0700176350165
	valid_y_row_norms_max: 0.566012260871
	valid_y_row_norms_mean: 0.263292142199
	valid_y_row_norms_min: 0.103009402164
Time this epoch: 5.764235 seconds
Monitoring step:
	Epochs seen: 38
	Batches seen: 19000
	Examples seen: 1900000
	learning_rate: 0.0233836767702
	momentum: 0.529718875502
	total_seconds_last_epoch: 6.1322
	training_seconds_this_epoch: 5.764235
	valid_objective: 0.0708193370316
	valid_y_col_norms_max: 1.93649990004
	valid_y_col_norms_mean: 1.93591896523
	valid_y_col_norms_min: 1.93506382472
	valid_y_max_max_class: 0.999996173337
	valid_y_mean_max_class: 0.978034880895
	valid_y_min_max_class: 0.541540751774
	valid_y_misclass: 0.0211
	valid_y_nll: 0.0708193370316
	valid_y_row_norms_max: 0.564171236477
	valid_y_row_norms_mean: 0.263400685781
	valid_y_row_norms_min: 0.103364593489
Time this epoch: 8.119455 seconds
Monitoring step:
	Epochs seen: 39
	Batches seen: 19500
	Examples seen: 1950000
	learning_rate: 0.0229206581152
	momentum: 0.530522088353
	total_seconds_last_epoch: 6.258956
	training_seconds_this_epoch: 8.119455
	valid_objective: 0.070476986816
	valid_y_col_norms_max: 1.93649990002
	valid_y_col_norms_mean: 1.93585109817
	valid_y_col_norms_min: 1.93342596378
	valid_y_max_max_class: 0.999996190721
	valid_y_mean_max_class: 0.978034580491
	valid_y_min_max_class: 0.547503878931
	valid_y_misclass: 0.0211
	valid_y_nll: 0.070476986816
	valid_y_row_norms_max: 0.561514221412
	valid_y_row_norms_mean: 0.263508182543
	valid_y_row_norms_min: 0.103347333576
Time this epoch: 6.312362 seconds
Monitoring step:
	Epochs seen: 40
	Batches seen: 20000
	Examples seen: 2000000
	learning_rate: 0.0224668076623
	momentum: 0.531325301205
	total_seconds_last_epoch: 8.714237
	training_seconds_this_epoch: 6.312362
	valid_objective: 0.0690301024797
	valid_y_col_norms_max: 1.93649990001
	valid_y_col_norms_mean: 1.93612443028
	valid_y_col_norms_min: 1.93513479545
	valid_y_max_max_class: 0.99999639704
	valid_y_mean_max_class: 0.978084482899
	valid_y_min_max_class: 0.548426862892
	valid_y_misclass: 0.0201
	valid_y_nll: 0.0690301024797
	valid_y_row_norms_max: 0.559634525084
	valid_y_row_norms_mean: 0.263633786722
	valid_y_row_norms_min: 0.102985762366
Time this epoch: 6.132795 seconds
Monitoring step:
	Epochs seen: 41
	Batches seen: 20500
	Examples seen: 2050000
	learning_rate: 0.0220219438726
	momentum: 0.532128514056
	total_seconds_last_epoch: 6.828239
	training_seconds_this_epoch: 6.132795
	valid_objective: 0.0694333851078
	valid_y_col_norms_max: 1.93649990003
	valid_y_col_norms_mean: 1.93617701772
	valid_y_col_norms_min: 1.93556864871
	valid_y_max_max_class: 0.999996268269
	valid_y_mean_max_class: 0.978082685929
	valid_y_min_max_class: 0.551582992126
	valid_y_misclass: 0.021
	valid_y_nll: 0.0694333851078
	valid_y_row_norms_max: 0.557070974796
	valid_y_row_norms_mean: 0.263730191482
	valid_y_row_norms_min: 0.103568235915
Time this epoch: 5.379670 seconds
Monitoring step:
	Epochs seen: 42
	Batches seen: 21000
	Examples seen: 2100000
	learning_rate: 0.0215858888017
	momentum: 0.532931726908
	total_seconds_last_epoch: 6.611762
	training_seconds_this_epoch: 5.37967
	valid_objective: 0.0682303426279
	valid_y_col_norms_max: 1.93649990001
	valid_y_col_norms_mean: 1.93583612318
	valid_y_col_norms_min: 1.93494503131
	valid_y_max_max_class: 0.99999624094
	valid_y_mean_max_class: 0.978472639253
	valid_y_min_max_class: 0.536463401277
	valid_y_misclass: 0.0207
	valid_y_nll: 0.0682303426279
	valid_y_row_norms_max: 0.556202545699
	valid_y_row_norms_mean: 0.263762636273
	valid_y_row_norms_min: 0.102883379419
Time this epoch: 5.930689 seconds
Monitoring step:
	Epochs seen: 43
	Batches seen: 21500
	Examples seen: 2150000
	learning_rate: 0.0211584680287
	momentum: 0.533734939759
	total_seconds_last_epoch: 5.896643
	training_seconds_this_epoch: 5.930689
	valid_objective: 0.0681362986241
	valid_y_col_norms_max: 1.93649990002
	valid_y_col_norms_mean: 1.93625812448
	valid_y_col_norms_min: 1.93568812209
	valid_y_max_max_class: 0.999996842896
	valid_y_mean_max_class: 0.978984041993
	valid_y_min_max_class: 0.545429982729
	valid_y_misclass: 0.0208
	valid_y_nll: 0.0681362986241
	valid_y_row_norms_max: 0.555366264275
	valid_y_row_norms_mean: 0.263893828294
	valid_y_row_norms_min: 0.10304527795
Time this epoch: 5.453186 seconds
Monitoring step:
	Epochs seen: 44
	Batches seen: 22000
	Examples seen: 2200000
	learning_rate: 0.0207395105865
	momentum: 0.53453815261
	total_seconds_last_epoch: 6.391853
	training_seconds_this_epoch: 5.453186
	valid_objective: 0.0677736444362
	valid_y_col_norms_max: 1.93649990001
	valid_y_col_norms_mean: 1.93613422245
	valid_y_col_norms_min: 1.93494219442
	valid_y_max_max_class: 0.999996661316
	valid_y_mean_max_class: 0.978851094724
	valid_y_min_max_class: 0.544526207014
	valid_y_misclass: 0.0201
	valid_y_nll: 0.0677736444362
	valid_y_row_norms_max: 0.554979558689
	valid_y_row_norms_mean: 0.263962321191
	valid_y_row_norms_min: 0.103195129463
Time this epoch: 5.829830 seconds
Monitoring step:
	Epochs seen: 45
	Batches seen: 22500
	Examples seen: 2250000
	learning_rate: 0.0203288488932
	momentum: 0.535341365462
	total_seconds_last_epoch: 5.97628
	training_seconds_this_epoch: 5.82983
	valid_objective: 0.0676661977602
	valid_y_col_norms_max: 1.93649990001
	valid_y_col_norms_mean: 1.93608486641
	valid_y_col_norms_min: 1.93489012164
	valid_y_max_max_class: 0.999997122489
	valid_y_mean_max_class: 0.979207311934
	valid_y_min_max_class: 0.541562838223
	valid_y_misclass: 0.0213
	valid_y_nll: 0.0676661977602
	valid_y_row_norms_max: 0.55242065089
	valid_y_row_norms_mean: 0.264012881039
	valid_y_row_norms_min: 0.102884048645
Time this epoch: 5.430088 seconds
Monitoring step:
	Epochs seen: 46
	Batches seen: 23000
	Examples seen: 2300000
	learning_rate: 0.0199263186853
	momentum: 0.536144578313
	total_seconds_last_epoch: 6.305132
	training_seconds_this_epoch: 5.430088
	valid_objective: 0.0665784955302
	valid_y_col_norms_max: 1.93649990001
	valid_y_col_norms_mean: 1.93639183725
	valid_y_col_norms_min: 1.93598163006
	valid_y_max_max_class: 0.999997269505
	valid_y_mean_max_class: 0.97923932851
	valid_y_min_max_class: 0.552940083313
	valid_y_misclass: 0.02
	valid_y_nll: 0.0665784955302
	valid_y_row_norms_max: 0.549802901625
	valid_y_row_norms_mean: 0.264148793386
	valid_y_row_norms_min: 0.102842307922
Time this epoch: 5.897824 seconds
Monitoring step:
	Epochs seen: 47
	Batches seen: 23500
	Examples seen: 2350000
	learning_rate: 0.0195317589517
	momentum: 0.536947791165
	total_seconds_last_epoch: 5.959428
	training_seconds_this_epoch: 5.897824
	valid_objective: 0.0681469031182
	valid_y_col_norms_max: 1.93649990002
	valid_y_col_norms_mean: 1.93598926751
	valid_y_col_norms_min: 1.93463964799
	valid_y_max_max_class: 0.999997257893
	valid_y_mean_max_class: 0.979613624216
	valid_y_min_max_class: 0.542315144824
	valid_y_misclass: 0.0201
	valid_y_nll: 0.0681469031182
	valid_y_row_norms_max: 0.548223465559
	valid_y_row_norms_mean: 0.264169306677
	valid_y_row_norms_min: 0.102381753864
Time this epoch: 5.525158 seconds
Monitoring step:
	Epochs seen: 48
	Batches seen: 24000
	Examples seen: 2400000
	learning_rate: 0.0191450118696
	momentum: 0.537751004016
	total_seconds_last_epoch: 6.35558
	training_seconds_this_epoch: 5.525158
	valid_objective: 0.0698892905822
	valid_y_col_norms_max: 1.93649990001
	valid_y_col_norms_mean: 1.93575084477
	valid_y_col_norms_min: 1.93358945446
	valid_y_max_max_class: 0.999997643543
	valid_y_mean_max_class: 0.979543125765
	valid_y_min_max_class: 0.542149109755
	valid_y_misclass: 0.0213
	valid_y_nll: 0.0698892905822
	valid_y_row_norms_max: 0.546325855752
	valid_y_row_norms_mean: 0.264193353918
	valid_y_row_norms_min: 0.10206665524
Time this epoch: 5.760655 seconds
Monitoring step:
	Epochs seen: 49
	Batches seen: 24500
	Examples seen: 2450000
	learning_rate: 0.0187659227412
	momentum: 0.538554216867
	total_seconds_last_epoch: 6.052614
	training_seconds_this_epoch: 5.760655
	valid_objective: 0.0663245377722
	valid_y_col_norms_max: 1.93649990002
	valid_y_col_norms_mean: 1.93616316362
	valid_y_col_norms_min: 1.93531479922
	valid_y_max_max_class: 0.999997640275
	valid_y_mean_max_class: 0.979900889515
	valid_y_min_max_class: 0.557046970771
	valid_y_misclass: 0.0205
	valid_y_nll: 0.0663245377722
	valid_y_row_norms_max: 0.546145159507
	valid_y_row_norms_mean: 0.264324724773
	valid_y_row_norms_min: 0.102420875042
Time this epoch: 5.505016 seconds
Monitoring step:
	Epochs seen: 50
	Batches seen: 25000
	Examples seen: 2500000
	learning_rate: 0.0183943399319
	momentum: 0.539357429719
	total_seconds_last_epoch: 6.23482
	training_seconds_this_epoch: 5.505016
	valid_objective: 0.0667406732564
	valid_y_col_norms_max: 1.93649990002
	valid_y_col_norms_mean: 1.93614341932
	valid_y_col_norms_min: 1.93520197097
	valid_y_max_max_class: 0.999997716942
	valid_y_mean_max_class: 0.980091116867
	valid_y_min_max_class: 0.549003720211
	valid_y_misclass: 0.0201
	valid_y_nll: 0.0667406732564
	valid_y_row_norms_max: 0.544320723906
	valid_y_row_norms_mean: 0.264379014826
	valid_y_row_norms_min: 0.102157939244
