A suggested experimental workflow is to name the clade 'environment' after a reference to a location in a notebook, which can then be used to keep track of experimental steps

  • The environment parameter is used to name the files generated by clade activities, and also names the folder in which those files are stored
  • Evernote or OneNote are useful notebooks for tracking experiment activities
    • Jupyter notebooks or relevant .py scripts can be stored at each experimental step to record (perhaps redundantly, which is fine in experimental notekeeping) the clade functions called or any edits made to .py scripts during that step

In [1]:
from environment import ex
import clades
import pandas as pd
import os

#limit the architectures that will be generated
two_layers_max = {'type': 'range', 'bounds': [1, 2]}
max_ten_units = {'type': 'range', 'bounds': [2, 10]}

#create a new Sacred run, which carries the config dictionary
n1e1p1b1_dict = ex.run(config_updates={'population_size': 3,
                                       'environment': 'lab3000_n1e1p1b1',
                                       'max_train_time': 5,
                                       'nb_layers': two_layers_max,
                                       'nb_units': max_ten_units})
#create a new clade object, passing in the config dictionary
n1e1p1b1_clade = clades.GAFC1(n1e1p1b1_dict.config)

#loading the data creates train, test, and validation sets
#and also creates a folder to store the output of clade activity 
n1e1p1b1_clade.load_data()


Using TensorFlow backend.
WARNING - DLGn1e1p1 - No observers have been added to this run
INFO - DLGn1e1p1 - Running command 'main'
INFO - DLGn1e1p1 - Started
INFO - DLGn1e1p1 - Completed after 0:00:00
Vectorizing sequence data...
x_ shape: (8982, 10000)
46 classes
Converting class vector to binary class matrix (for use with categorical_crossentropy)
  • Initially the output folder is empty (see the quick check after this list)
  • Generations are 0-indexed
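A quick way to confirm this is to list the experiment folder directly; a minimal sketch, assuming the folder created by load_data() is named after the environment parameter (the exact path is an assumption):

#hypothetical check of the experiment folder created by load_data()
#assumes the folder name matches the 'environment' config value
print(os.listdir('lab3000_n1e1p1b1'))   #expected to be empty at this point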

In [2]:
n1e1p1b1_clade.current_generation


Out[2]:
0
  • spawn() creates a pandas dataframe of genes which 'encode' the model architectures of a given population
  • the dataframe is saved as a property and also pickled into the experiment folder
    • Note that the pickled dataframe file name, as well as the gene and model names, includes a reference to the generation (Gen0); the sketch after this list shows one way to reload the pickle
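Because the genotypes dataframe is pickled into the experiment folder, it can be reloaded in a later session; a minimal sketch, where the pickle file name is an assumption extrapolated from the Gen0 naming convention above:

#hypothetical reload of the pickled genotypes dataframe
#the file name below is an assumption based on the Gen0 naming convention
genes = pd.read_pickle(os.path.join('lab3000_n1e1p1b1',
                                    'lab3000_n1e1p1b1+Gen0+genotypes.p'))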

In [3]:
n1e1p1b1_clade.spawn()

In [4]:
n1e1p1b1_clade.genotypes


Out[4]:
LR activations batch_size epochs gene_name layer_units loss model_name nb_layers optimizer
0 0.106895 [relu] 512 4 lab3000_n1e1p1b1+Gen0+gene0 [4] categorical_crossentropy lab3000_n1e1p1b1+Gen0+gene0+model.h5 1 Adadelta
0 0.065681 [relu, softplus] 512 16 lab3000_n1e1p1b1+Gen0+gene1 [4, 10] categorical_crossentropy lab3000_n1e1p1b1+Gen0+gene1+model.h5 2 RMSProp
0 0.004449 [softmax] 64 16 lab3000_n1e1p1b1+Gen0+gene2 [9] categorical_crossentropy lab3000_n1e1p1b1+Gen0+gene2+model.h5 1 Adam
  • seed_models() acts as an intermediary between genotypes and model evaluations, which are executed in grow_models()
  • compiled models are saved as .h5 files in the experiment folder (see the reload sketch below)
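Once seed_models() has run, the compiled (still untrained) models are ordinary Keras .h5 files and can be reloaded and inspected outside the clade; a sketch using the standard Keras loader, with the file path assumed from the model_name shown in the genotypes above:

from keras.models import load_model

#hypothetical reload of a compiled model created by seed_models()
model = load_model(os.path.join('lab3000_n1e1p1b1',
                                'lab3000_n1e1p1b1+Gen0+gene0+model.h5'))
model.summary()   #prints the layer stack encoded by gene0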

In [5]:
n1e1p1b1_clade.seed_models()

In [6]:
n1e1p1b1_clade.grow_models()


this is the index:  0
and this is the gene:  LR                                         0.106895
activations                                  [relu]
batch_size                                      512
epochs                                            4
gene_name               lab3000_n1e1p1b1+Gen0+gene0
layer_units                                     [4]
loss                       categorical_crossentropy
model_name     lab3000_n1e1p1b1+Gen0+gene0+model.h5
nb_layers                                         1
optimizer                                  Adadelta
Name: 0, dtype: object
Train on 8083 samples, validate on 899 samples
Epoch 1/4
8083/8083 [==============================] - 3s - loss: 3.8003 - acc: 0.1518 - val_loss: 3.7406 - val_acc: 0.2147
Epoch 2/4
8083/8083 [==============================] - 1s - loss: 3.6515 - acc: 0.3167 - val_loss: 3.5476 - val_acc: 0.4082
Epoch 3/4
7680/8083 [===========================>..] - ETA: 0s - loss: 3.3815 - acc: 0.4320_______Stopping after 5 seconds.
8083/8083 [==============================] - 1s - loss: 3.3726 - acc: 0.4328 - val_loss: 3.1841 - val_acc: 0.4705
2080/2246 [==========================>...] - ETA: 0sthis is the index:  1
and this is the gene:  LR                                        0.0656806
activations                        [relu, softplus]
batch_size                                      512
epochs                                           16
gene_name               lab3000_n1e1p1b1+Gen0+gene1
layer_units                                 [4, 10]
loss                       categorical_crossentropy
model_name     lab3000_n1e1p1b1+Gen0+gene1+model.h5
nb_layers                                         2
optimizer                                   RMSProp
Name: 0, dtype: object
Train on 8083 samples, validate on 899 samples
Epoch 1/16
8083/8083 [==============================] - 2s - loss: 3.9295 - acc: 0.0040 - val_loss: 3.7433 - val_acc: 0.0044
Epoch 2/16
8083/8083 [==============================] - 1s - loss: 3.5759 - acc: 0.0073 - val_loss: 3.4173 - val_acc: 0.0189
Epoch 3/16
8083/8083 [==============================] - 1s - loss: 3.2314 - acc: 0.1466 - val_loss: 3.0809 - val_acc: 0.3471
Epoch 4/16
7680/8083 [===========================>..] - ETA: 0s - loss: 2.8953 - acc: 0.4474_______Stopping after 5 seconds.
8083/8083 [==============================] - 1s - loss: 2.8852 - acc: 0.4506 - val_loss: 2.7559 - val_acc: 0.4461
2246/2246 [==============================] - 0s     
in the else
this is the index:  2
and this is the gene:  LR                                       0.00444869
activations                               [softmax]
batch_size                                       64
epochs                                           16
gene_name               lab3000_n1e1p1b1+Gen0+gene2
layer_units                                     [9]
loss                       categorical_crossentropy
model_name     lab3000_n1e1p1b1+Gen0+gene2+model.h5
nb_layers                                         1
optimizer                                      Adam
Name: 0, dtype: object
Train on 8083 samples, validate on 899 samples
Epoch 1/16
8083/8083 [==============================] - 3s - loss: 3.5798 - acc: 0.1379 - val_loss: 3.3895 - val_acc: 0.2191
Epoch 2/16
7936/8083 [============================>.] - ETA: 0s - loss: 3.2423 - acc: 0.2387_______Stopping after 5 seconds.
8083/8083 [==============================] - 2s - loss: 3.2388 - acc: 0.2380 - val_loss: 3.1041 - val_acc: 0.2570
2080/2246 [==========================>...] - ETA: 0sin the else

^^^verbose output of n1e1p1b1_clade.grow_models()

  • grow_models() trains the models and generates pickled 'growth analyses' dataframes, one for each model trained, which include train and validation loss and accuracy for each batch and epoch, as well as the time taken to run each batch and epoch
  • grow_models() also pickles, and saves as a property, a phenotypes dataframe, which summarizes the performance of each model
    • the misclassed dictionaries store the true and predicted classes for each misclassified datapoint (see the sketch after this list)
  • grow_models() also saves each trained model as a .h5 file
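The misclassed dictionaries support simple error analysis; a minimal sketch that relies only on the 'true_class' key visible in the phenotypes output below (any other keys are left to inspection):

#hypothetical error analysis on one model's misclassed dictionary
mc = n1e1p1b1_clade.phenotypes.iloc[0]['misclassed']
print(mc.keys())               #'true_class' plus the model-assigned labels
print(mc['true_class'][:10])   #true classes of the first ten misclassified points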

In [7]:
n1e1p1b1_clade.phenotypes


Out[7]:
gene_name misclassed test_accuracy test_loss time train_accuracy train_loss
0 lab3000_n1e1p1b1+Gen0+gene0 {'true_class': [3, 10, 1, 4, 3, 3, 3, 5, 1, 1,... 0.468833 3.168827 6.302856 0.467401 3.147856
0 lab3000_n1e1p1b1+Gen0+gene1 {'true_class': [10, 1, 4, 4, 5, 4, 1, 1, 11, 2... 0.455476 2.760931 6.278670 0.464803 2.702639
0 lab3000_n1e1p1b1+Gen0+gene2 {'true_class': [3, 10, 1, 3, 3, 3, 3, 3, 5, 1,... 0.253339 3.106247 5.817873 0.265124 3.091641
  • select_parents() selects, by default, the top 20% of models by test accuracy, plus 10% of models chosen at random; if the population is small, as in this demo, at least two parent models are selected (a sketch of this rule follows)
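For intuition, here is a purely illustrative sketch of that selection rule in plain pandas; it is not the library's implementation, and the helper name and defaults are hypothetical:

#illustrative sketch of the default parent-selection rule (not the library's code)
#phen is a phenotypes-style dataframe with a 'test_accuracy' column
def sketch_select_parents(phen, top_frac=0.2, rand_frac=0.1, min_parents=2):
    phen = phen.reset_index(drop=True)           #demo dataframes reuse index 0
    top = phen.nlargest(max(int(len(phen) * top_frac), 1), 'test_accuracy')
    rest = phen.drop(top.index)
    rand = rest.sample(min(int(len(phen) * rand_frac), len(rest)))
    parents = pd.concat([top, rand])
    if len(parents) < min_parents:               #small populations: keep the best two
        parents = phen.nlargest(min_parents, 'test_accuracy')
    return parents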

In [8]:
n1e1p1b1_clade.select_parents()

In [9]:
n1e1p1b1_clade.parent_genes


Out[9]:
LR activations batch_size epochs gene_name layer_units loss model_name nb_layers optimizer
0 0.106895 [relu] 512 4 lab3000_n1e1p1b1+Gen0+gene0 [4] categorical_crossentropy lab3000_n1e1p1b1+Gen0+gene0+model.h5 1 Adadelta
0 0.004449 [softmax] 64 16 lab3000_n1e1p1b1+Gen0+gene2 [9] categorical_crossentropy lab3000_n1e1p1b1+Gen0+gene2+model.h5 1 Adam
  • breed() generates a new population of genes, encoding a new generation of models; note that current_generation is incremented when clade.breed() is run (a recombination sketch follows)
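The Gen1 genotypes shown below mix fields from the two parents (for example, gene2's learning rate with gene0's activations), which suggests per-field recombination; the following is a purely illustrative sketch of that idea, not necessarily how breed() is actually implemented:

import random

#illustrative per-field recombination between two parent genes
#(a sketch only; not necessarily the library's breed() logic)
fields = ['LR', 'activations', 'batch_size', 'epochs',
          'layer_units', 'nb_layers', 'optimizer']

def sketch_crossover(parent_a, parent_b):
    #parent_a and parent_b are rows (pandas Series) from parent_genes
    return {f: random.choice([parent_a[f], parent_b[f]]) for f in fields}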

In [10]:
n1e1p1b1_clade.breed()

In [11]:
n1e1p1b1_clade.current_generation


Out[11]:
1

In [12]:
n1e1p1b1_clade.genotypes


Out[12]:
LR activations batch_size epochs gene_name layer_units model_name nb_layers optimizer
0 0.004449 [relu] 512 16 lab3000_n1e1p1b1+Gen1+gene0 [9] lab3000_n1e1p1b1+Gen1+gene0+model.h5 1 Adam
1 0.004449 [relu] 64 4 lab3000_n1e1p1b1+Gen1+gene1 [9] lab3000_n1e1p1b1+Gen1+gene1+model.h5 1 Adam
2 0.004449 [softmax] 64 4 lab3000_n1e1p1b1+Gen1+gene2 [4] lab3000_n1e1p1b1+Gen1+gene2+model.h5 1 Adadelta
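From here the per-generation cycle can simply be repeated; a minimal sketch of evolving several more generations, assuming the cycle demonstrated above (seed_models, grow_models, select_parents, breed) and that breed() produces each next generation's genotypes:

#minimal sketch of running several generations end to end
n_generations = 5   #hypothetical run length
for _ in range(n_generations):
    n1e1p1b1_clade.seed_models()     #compile models from the current genotypes
    n1e1p1b1_clade.grow_models()     #train and evaluate them
    n1e1p1b1_clade.select_parents()  #keep the best (plus some random) models
    n1e1p1b1_clade.breed()           #recombine parents into the next generation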

After model evolution is run interactively, the commands can be saved to the experiment notebook (here, Evernote).

