Predicting inter-cell statistics

This tutorial describes how to predict inter-cell statistics such as the mean methylation rate or variance across cells.

Initialization

We first initialize some variables that will be used throughout the tutorial. Set test_mode=1 for testing, which speeds up computations by using only a subset of the data. For real applications, use test_mode=0.


In [1]:
function run {
  # Print the given command between separators, then evaluate it.
  local cmd="$@"
  echo
  echo "#################################"
  echo "$cmd"
  echo "#################################"
  eval "$cmd"
}

test_mode=1 # Set to 1 for testing and 0 otherwise.
example_dir="../../data" # Directory with example data.
cpg_dir="$example_dir/cpg" # Directory with CpG profiles.
dna_dir="$example_dir/dna/mm10" # Directory with DNA sequences.

Creating DeepCpG data files

dcpg_data.py provides the arguments --cpg_stats and --win_stats to compute statistics across cells for single CpG sites, or in windows centred on CpG sites whose lengths are given by --win_stats_wlen, respectively. Supported statistics are described in the documentation and include the mean methylation rate (mean), the variance across cells (var), and whether a CpG site is differentially methylated (diff). With --cpg_stats_cov, per-CpG statistics will be computed only for CpG sites that are covered by at least the specified number of cells. If this number is too low, the estimated statistics may be unreliable in lowly covered regions. We will compute the mean methylation rate and variance across cells in windows of different lengths, and whether CpG sites with at least three observations are differentially methylated.


In [2]:
data_dir="./data"
cmd="dcpg_data.py
    --cpg_profiles $cpg_dir/*.tsv
    --dna_files $dna_dir
    --out_dir $data_dir
    --dna_wlen 1001
    --cpg_wlen 50
    --cpg_stats diff
    --cpg_stats_cov 3
    --win_stats mean var
    --win_stats_wlen 1001 2001 3001 4001 5001
"
if [[ $test_mode -eq 1 ]]; then
    cmd="$cmd
        --chromo 1 13
        --nb_sample_chromo 10000
        "
fi
run $cmd


#################################
dcpg_data.py --cpg_profiles ../../data/cpg/BS27_1_SER.tsv ../../data/cpg/BS27_3_SER.tsv ../../data/cpg/BS27_5_SER.tsv ../../data/cpg/BS27_6_SER.tsv ../../data/cpg/BS27_8_SER.tsv --dna_files ../../data/dna/mm10 --out_dir ./data --dna_wlen 1001 --cpg_wlen 50 --cpg_stats diff --cpg_stats_cov 3 --win_stats mean var --win_stats_wlen 1001 2001 3001 4001 5001 --chromo 1 13 --nb_sample_chromo 10000
#################################
INFO (2017-05-01 09:00:48,895): Reading CpG profiles ...
INFO (2017-05-01 09:00:48,895): ../../data/cpg/BS27_1_SER.tsv
INFO (2017-05-01 09:00:55,405): ../../data/cpg/BS27_3_SER.tsv
INFO (2017-05-01 09:01:00,122): ../../data/cpg/BS27_5_SER.tsv
INFO (2017-05-01 09:01:07,260): ../../data/cpg/BS27_6_SER.tsv
INFO (2017-05-01 09:01:12,711): ../../data/cpg/BS27_8_SER.tsv
INFO (2017-05-01 09:01:18,234): 20000 samples
INFO (2017-05-01 09:01:18,235): --------------------------------------------------------------------------------
INFO (2017-05-01 09:01:18,235): Chromosome 1 ...
INFO (2017-05-01 09:01:18,280): 10000 / 10000 (100.0%) sites matched minimum coverage filter
INFO (2017-05-01 09:01:22,054): Chunk 	1 / 1
INFO (2017-05-01 09:01:22,063): Computing per CpG statistics ...
INFO (2017-05-01 09:01:22,066): Extracting DNA sequence windows ...
INFO (2017-05-01 09:01:24,203): Extracting CpG neighbors ...
INFO (2017-05-01 09:01:25,366): Computing window-based statistics ...
INFO (2017-05-01 09:01:25,968): --------------------------------------------------------------------------------
INFO (2017-05-01 09:01:25,969): Chromosome 13 ...
INFO (2017-05-01 09:01:26,012): 10000 / 10000 (100.0%) sites matched minimum coverage filter
INFO (2017-05-01 09:01:28,264): Chunk 	1 / 1
INFO (2017-05-01 09:01:28,271): Computing per CpG statistics ...
INFO (2017-05-01 09:01:28,275): Extracting DNA sequence windows ...
INFO (2017-05-01 09:01:30,274): Extracting CpG neighbors ...
INFO (2017-05-01 09:01:31,433): Computing window-based statistics ...
INFO (2017-05-01 09:01:32,071): Done!
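To make these per-CpG statistics concrete, they can be sketched in pure Python on a toy cell-by-site matrix. This is an illustrative sketch under our own assumptions, not DeepCpG's actual implementation: the coverage threshold of three cells mirrors --cpg_stats_cov 3, None marks sites unobserved in a cell, and the simple "diff" indicator here just checks whether both methylation states were observed.

```python
# Toy binary methylation states for 4 cells x 3 CpG sites.
# None marks CpG sites not covered in a given cell.
cells = [
    [1,    0,    None],
    [1,    1,    None],
    [0,    None, 1],
    [None, 1,    None],
]

min_cov = 3  # mirrors --cpg_stats_cov 3


def site_stats(column):
    """Return (mean, var, diff) across cells for one CpG site,
    or None if fewer than min_cov cells cover the site."""
    obs = [x for x in column if x is not None]
    if len(obs) < min_cov:
        return None  # too few cells covered; the estimate would be unreliable
    mean = sum(obs) / len(obs)
    var = sum((x - mean) ** 2 for x in obs) / len(obs)
    # Crude differential-methylation indicator: both states observed.
    diff = int(0 in obs and 1 in obs)
    return mean, var, diff


for site in range(3):
    column = [row[site] for row in cells]
    print(site, site_stats(column))
```

Window-based statistics (--win_stats) follow the same idea, but pool the methylation states of all CpG sites within a window centred on the target site before computing the statistic.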

Model training

We will train a DNA model to predict mean methylation rates, cell-to-cell variance, and differentially methylated CpG sites from the DNA sequence alone. However, you could also train a CpG model or a Joint model to additionally use neighboring CpG sites for making predictions. To predict all per-CpG and window-based statistics computed by dcpg_data.py instead of methylation states, we run dcpg_train.py with --output_names 'cpg_stats/.*' 'win_stats/.*'. You could use --output_names '.*' to predict both methylation states and statistics.
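The --output_names arguments are regular expressions matched against the model's output names. As a hedged illustration (the name list below echoes the model summary in this tutorial; the matching logic is our sketch, not DeepCpG's code), the selection behaves roughly like this:

```python
import re

# Example output names as they appear in DeepCpG data files.
outputs = [
    'cpg/BS27_1_SER',        # per-cell methylation states
    'cpg_stats/diff',
    'win_stats/1001/mean',
    'win_stats/1001/var',
    'win_stats/5001/var',
]

# Patterns as passed via --output_names above.
patterns = ['cpg_stats/.*', 'win_stats/.*']

# Keep every output whose name matches at least one pattern.
selected = [name for name in outputs
            if any(re.match(p, name) for p in patterns)]
print(selected)
```

With --output_names '.*' every name matches, so per-cell methylation states ('cpg/...') would be predicted alongside the statistics.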


In [3]:
train_files=$(ls $data_dir/c{1,3,5,7,9}_*.h5 2> /dev/null)
echo "Training files:"
echo $train_files
echo

val_files=$(ls $data_dir/c{13,14,15,16,17}_*.h5 2> /dev/null)
echo "Validation files:"
echo $val_files


Training files:
./data/c1_000000-010000.h5

Validation files:
./data/c13_000000-010000.h5

In [4]:
model_dir="./model"

cmd="dcpg_train.py
    $train_files
    --val_files $val_files
    --dna_model CnnL2h128
    --out_dir $model_dir
    --output_names 'cpg_stats/.*' 'win_stats/.*'
    "
if [[ $test_mode -eq 1 ]]; then
    cmd="$cmd
        --nb_epoch 1
        --nb_train_sample 10000
        --nb_val_sample 10000
    "
else
    cmd="$cmd
        --nb_epoch 30
        "
fi
run $cmd


#################################
dcpg_train.py ./data/c1_000000-010000.h5 --val_files ./data/c13_000000-010000.h5 --dna_model CnnL2h128 --out_dir ./model --output_names 'cpg_stats/.*' 'win_stats/.*' --nb_epoch 1 --nb_train_sample 10000 --nb_val_sample 10000
#################################
Using TensorFlow backend.
INFO (2017-05-01 09:01:37,896): Building model ...
INFO (2017-05-01 09:01:37,901): Building DNA model ...
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
dna (InputLayer)                 (None, 1001, 4)       0                                            
____________________________________________________________________________________________________
dna/conv1d_1 (Conv1D)            (None, 991, 128)      5760        dna[0][0]                        
____________________________________________________________________________________________________
dna/activation_1 (Activation)    (None, 991, 128)      0           dna/conv1d_1[0][0]               
____________________________________________________________________________________________________
dna/max_pooling1d_1 (MaxPooling1 (None, 247, 128)      0           dna/activation_1[0][0]           
____________________________________________________________________________________________________
dna/conv1d_2 (Conv1D)            (None, 245, 256)      98560       dna/max_pooling1d_1[0][0]        
____________________________________________________________________________________________________
dna/activation_2 (Activation)    (None, 245, 256)      0           dna/conv1d_2[0][0]               
____________________________________________________________________________________________________
dna/max_pooling1d_2 (MaxPooling1 (None, 122, 256)      0           dna/activation_2[0][0]           
____________________________________________________________________________________________________
dna/flatten_1 (Flatten)          (None, 31232)         0           dna/max_pooling1d_2[0][0]        
____________________________________________________________________________________________________
dna/dense_1 (Dense)              (None, 128)           3997824     dna/flatten_1[0][0]              
____________________________________________________________________________________________________
dna/activation_3 (Activation)    (None, 128)           0           dna/dense_1[0][0]                
____________________________________________________________________________________________________
dna/dropout_1 (Dropout)          (None, 128)           0           dna/activation_3[0][0]           
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 1)             129         dna/dropout_1[0][0]              
____________________________________________________________________________________________________
dense_3 (Dense)                  (None, 1)             129         dna/dropout_1[0][0]              
____________________________________________________________________________________________________
dense_4 (Dense)                  (None, 1)             129         dna/dropout_1[0][0]              
____________________________________________________________________________________________________
dense_5 (Dense)                  (None, 1)             129         dna/dropout_1[0][0]              
____________________________________________________________________________________________________
dense_6 (Dense)                  (None, 1)             129         dna/dropout_1[0][0]              
____________________________________________________________________________________________________
cpg_stats/diff (Dense)           (None, 1)             129         dna/dropout_1[0][0]              
____________________________________________________________________________________________________
win_stats/1001/mean (Dense)      (None, 1)             129         dna/dropout_1[0][0]              
____________________________________________________________________________________________________
win_stats/1001/var (ScaledSigmoi (None, 1)             0           dense_2[0][0]                    
____________________________________________________________________________________________________
win_stats/2001/mean (Dense)      (None, 1)             129         dna/dropout_1[0][0]              
____________________________________________________________________________________________________
win_stats/2001/var (ScaledSigmoi (None, 1)             0           dense_3[0][0]                    
____________________________________________________________________________________________________
win_stats/3001/mean (Dense)      (None, 1)             129         dna/dropout_1[0][0]              
____________________________________________________________________________________________________
win_stats/3001/var (ScaledSigmoi (None, 1)             0           dense_4[0][0]                    
____________________________________________________________________________________________________
win_stats/4001/mean (Dense)      (None, 1)             129         dna/dropout_1[0][0]              
____________________________________________________________________________________________________
win_stats/4001/var (ScaledSigmoi (None, 1)             0           dense_5[0][0]                    
____________________________________________________________________________________________________
win_stats/5001/mean (Dense)      (None, 1)             129         dna/dropout_1[0][0]              
____________________________________________________________________________________________________
win_stats/5001/var (ScaledSigmoi (None, 1)             0           dense_6[0][0]                    
====================================================================================================
Total params: 4,103,563
Trainable params: 4,103,563
Non-trainable params: 0
____________________________________________________________________________________________________
INFO (2017-05-01 09:01:38,241): Computing output statistics ...
Output statistics:
               name | nb_tot | nb_obs | frac_obs | mean |  var
--------------------------------------------------------------
     cpg_stats/diff |  10000 |      3 |     0.00 | 0.00 | 0.00
win_stats/1001/mean |  10000 |  10000 |     1.00 | 0.74 | 0.17
 win_stats/1001/var |  10000 |  10000 |     1.00 | 0.02 | 0.00
win_stats/2001/mean |  10000 |  10000 |     1.00 | 0.74 | 0.16
 win_stats/2001/var |  10000 |  10000 |     1.00 | 0.03 | 0.00
win_stats/3001/mean |  10000 |  10000 |     1.00 | 0.74 | 0.15
 win_stats/3001/var |  10000 |  10000 |     1.00 | 0.03 | 0.01
win_stats/4001/mean |  10000 |  10000 |     1.00 | 0.74 | 0.14
 win_stats/4001/var |  10000 |  10000 |     1.00 | 0.04 | 0.01
win_stats/5001/mean |  10000 |  10000 |     1.00 | 0.74 | 0.13
 win_stats/5001/var |  10000 |  10000 |     1.00 | 0.04 | 0.01

Class weights:
cpg_stats/diff
--------------
        0=0.00
        1=1.00

INFO (2017-05-01 09:01:39,188): Loading data ...
INFO (2017-05-01 09:01:39,191): Initializing callbacks ...
INFO (2017-05-01 09:01:39,192): Training model ...

Training samples: 10000
Validation samples: 10000
Epochs: 1
Learning rate: 0.0001
====================================================================================================
Epoch 1/1
====================================================================================================
done (%) | time |   loss |    acc |    mse |    mae | cpg_stats/diff_loss | win_stats/1001/mean_loss | win_stats/1001/var_loss | win_stats/2001/mean_loss | win_stats/2001/var_loss | win_stats/3001/mean_loss | win_stats/3001/var_loss | win_stats/4001/mean_loss | win_stats/4001/var_loss | win_stats/5001/mean_loss | win_stats/5001/var_loss | cpg_stats/diff_acc | win_stats/1001/mean_acc | win_stats/2001/mean_acc | win_stats/3001/mean_acc | win_stats/4001/mean_acc | win_stats/5001/mean_acc | win_stats/1001/mean_mse | win_stats/1001/var_mse | win_stats/2001/mean_mse | win_stats/2001/var_mse | win_stats/3001/mean_mse | win_stats/3001/var_mse | win_stats/4001/mean_mse | win_stats/4001/var_mse | win_stats/5001/mean_mse | win_stats/5001/var_mse | win_stats/1001/mean_mae | win_stats/1001/var_mae | win_stats/2001/mean_mae | win_stats/2001/var_mae | win_stats/3001/mean_mae | win_stats/3001/var_mae | win_stats/4001/mean_mae | win_stats/4001/var_mae | win_stats/5001/mean_mae | win_stats/5001/var_mae
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     1.3 |  0.0 | 3.9243 | 0.5281 | 0.1093 | 0.2705 |              0.0000 |                   0.2252 |                  0.0153 |                   0.2113 |                  0.0149 |                   0.2029 |                  0.0144 |                   0.1932 |                  0.0140 |                   0.1885 |                  0.0136 |                nan |                  0.7266 |                  0.5938 |                  0.4609 |                  0.3672 |                  0.4922 |                  0.2252 |                 0.0153 |                  0.2113 |                 0.0149 |                  0.2029 |                 0.0144 |                  0.1932 |                 0.0140 |                  0.1885 |                 0.0136 |                  0.4566 |                 0.1233 |                  0.4325 |                 0.1208 |                  0.4188 |                 0.1186 |                  0.4029 |                 0.1163 |                  0.4007 |                 0.1144
    10.3 |  0.2 | 3.7188 | 0.6951 | 0.1012 | 0.2624 |              0.0000 |                   0.2098 |                  0.0150 |                   0.1955 |                  0.0150 |                   0.1889 |                  0.0136 |                   0.1760 |                  0.0141 |                   0.1717 |                  0.0128 |                nan |                  0.7119 |                  0.7041 |                  0.6836 |                  0.6816 |                  0.6943 |                  0.2098 |                 0.0150 |                  0.1955 |                 0.0150 |                  0.1889 |                 0.0136 |                  0.1760 |                 0.0141 |                  0.1717 |                 0.0128 |                  0.4398 |                 0.1219 |                  0.4190 |                 0.1210 |                  0.4084 |                 0.1145 |                  0.3903 |                 0.1159 |                  0.3837 |                 0.1100
    20.5 |  0.3 | 3.4898 | 0.5862 | 0.0917 | 0.2458 |              0.0000 |                   0.1916 |                  0.0143 |                   0.1760 |                  0.0156 |                   0.1698 |                  0.0122 |                   0.1574 |                  0.0157 |                   0.1526 |                  0.0117 |             0.0000 |                  0.7129 |                  0.7085 |                  0.6978 |                  0.6929 |                  0.7051 |                  0.1916 |                 0.0143 |                  0.1760 |                 0.0156 |                  0.1698 |                 0.0122 |                  0.1574 |                 0.0157 |                  0.1526 |                 0.0117 |                  0.4091 |                 0.1191 |                  0.3840 |                 0.1233 |                  0.3782 |                 0.1081 |                  0.3563 |                 0.1222 |                  0.3529 |                 0.1050
    30.8 |  0.5 | 3.3053 | 0.5928 | 0.0861 | 0.2285 |              0.0000 |                   0.1797 |                  0.0128 |                   0.1659 |                  0.0142 |                   0.1593 |                  0.0106 |                   0.1490 |                  0.0149 |                   0.1438 |                  0.0109 |             0.0000 |                  0.7168 |                  0.7152 |                  0.7083 |                  0.7044 |                  0.7122 |                  0.1797 |                 0.0128 |                  0.1659 |                 0.0142 |                  0.1593 |                 0.0106 |                  0.1490 |                 0.0149 |                  0.1438 |                 0.0109 |                  0.3738 |                 0.1119 |                  0.3558 |                 0.1174 |                  0.3495 |                 0.0988 |                  0.3325 |                 0.1184 |                  0.3264 |                 0.1010
    41.0 |  0.6 | 3.1635 | 0.5922 | 0.0843 | 0.2236 |              0.0000 |                   0.1775 |                  0.0115 |                   0.1640 |                  0.0125 |                   0.1571 |                  0.0095 |                   0.1470 |                  0.0130 |                   0.1407 |                  0.0106 |             0.0000 |                  0.7134 |                  0.7134 |                  0.7070 |                  0.7065 |                  0.7131 |                  0.1775 |                 0.0115 |                  0.1640 |                 0.0125 |                  0.1571 |                 0.0095 |                  0.1470 |                 0.0130 |                  0.1407 |                 0.0106 |                  0.3686 |                 0.1045 |                  0.3554 |                 0.1082 |                  0.3452 |                 0.0912 |                  0.3316 |                 0.1093 |                  0.3226 |                 0.0998
    51.3 |  0.8 | 3.0206 | 0.5939 | 0.0820 | 0.2188 |              0.0000 |                   0.1738 |                  0.0104 |                   0.1606 |                  0.0112 |                   0.1527 |                  0.0088 |                   0.1432 |                  0.0119 |                   0.1368 |                  0.0102 |             0.0000 |                  0.7164 |                  0.7154 |                  0.7082 |                  0.7084 |                  0.7150 |                  0.1738 |                 0.0104 |                  0.1606 |                 0.0112 |                  0.1527 |                 0.0088 |                  0.1432 |                 0.0119 |                  0.1368 |                 0.0102 |                  0.3662 |                 0.0982 |                  0.3508 |                 0.1008 |                  0.3410 |                 0.0852 |                  0.3270 |                 0.1023 |                  0.3192 |                 0.0974
    61.5 |  0.9 | 2.8945 | 0.5947 | 0.0808 | 0.2135 |              0.0000 |                   0.1727 |                  0.0095 |                   0.1599 |                  0.0102 |                   0.1510 |                  0.0082 |                   0.1412 |                  0.0109 |                   0.1348 |                  0.0096 |             0.0000 |                  0.7150 |                  0.7142 |                  0.7103 |                  0.7118 |                  0.7171 |                  0.1727 |                 0.0095 |                  0.1599 |                 0.0102 |                  0.1510 |                 0.0082 |                  0.1412 |                 0.0109 |                  0.1348 |                 0.0096 |                  0.3615 |                 0.0919 |                  0.3463 |                 0.0937 |                  0.3370 |                 0.0794 |                  0.3220 |                 0.0960 |                  0.3138 |                 0.0936
    71.8 |  1.1 | 2.7730 | 0.5954 | 0.0796 | 0.2105 |              0.0000 |                   0.1709 |                  0.0087 |                   0.1586 |                  0.0094 |                   0.1492 |                  0.0077 |                   0.1394 |                  0.0102 |                   0.1326 |                  0.0092 |             0.0000 |                  0.7160 |                  0.7139 |                  0.7112 |                  0.7125 |                  0.7192 |                  0.1709 |                 0.0087 |                  0.1586 |                 0.0094 |                  0.1492 |                 0.0077 |                  0.1394 |                 0.0102 |                  0.1326 |                 0.0092 |                  0.3603 |                 0.0868 |                  0.3461 |                 0.0885 |                  0.3347 |                 0.0755 |                  0.3204 |                 0.0913 |                  0.3113 |                 0.0904
    82.1 |  1.3 | 2.6631 | 0.5937 | 0.0791 | 0.2085 |              0.0000 |                   0.1709 |                  0.0082 |                   0.1585 |                  0.0089 |                   0.1484 |                  0.0075 |                   0.1382 |                  0.0097 |                   0.1316 |                  0.0089 |             0.0000 |                  0.7139 |                  0.7111 |                  0.7086 |                  0.7109 |                  0.7177 |                  0.1709 |                 0.0082 |                  0.1585 |                 0.0089 |                  0.1484 |                 0.0075 |                  0.1382 |                 0.0097 |                  0.1316 |                 0.0089 |                  0.3601 |                 0.0830 |                  0.3456 |                 0.0846 |                  0.3338 |                 0.0730 |                  0.3189 |                 0.0883 |                  0.3102 |                 0.0878
    92.3 |  1.4 | 2.5563 | 0.5930 | 0.0784 | 0.2069 |              0.0000 |                   0.1701 |                  0.0077 |                   0.1574 |                  0.0084 |                   0.1472 |                  0.0073 |                   0.1372 |                  0.0093 |                   0.1303 |                  0.0086 |             0.0000 |                  0.7125 |                  0.7101 |                  0.7081 |                  0.7106 |                  0.7169 |                  0.1701 |                 0.0077 |                  0.1574 |                 0.0084 |                  0.1472 |                 0.0073 |                  0.1372 |                 0.0093 |                  0.1303 |                 0.0086 |                  0.3601 |                 0.0800 |                  0.3449 |                 0.0814 |                  0.3328 |                 0.0710 |                  0.3179 |                 0.0860 |                  0.3087 |                 0.0858
   100.0 |  1.5 | 2.4809 | 0.5924 | 0.0780 | 0.2056 |              0.0000 |                   0.1695 |                  0.0075 |                   0.1566 |                  0.0082 |                   0.1466 |                  0.0072 |                   0.1368 |                  0.0091 |                   0.1298 |                  0.0084 |             0.0000 |                  0.7117 |                  0.7088 |                  0.7073 |                  0.7099 |                  0.7164 |                  0.1695 |                 0.0075 |                  0.1566 |                 0.0082 |                  0.1466 |                 0.0072 |                  0.1368 |                 0.0091 |                  0.1298 |                 0.0084 |                  0.3588 |                 0.0781 |                  0.3434 |                 0.0796 |                  0.3316 |                 0.0698 |                  0.3172 |                 0.0847 |                  0.3081 |                 0.0844
Epoch 00000: val_loss improved from inf to 1.44847, saving model to ./model/model_weights_val.h5

 split |   loss |    acc |    mse |    mae | cpg_stats/diff_loss | win_stats/1001/mean_loss | win_stats/1001/var_loss | win_stats/2001/mean_loss | win_stats/2001/var_loss | win_stats/3001/mean_loss | win_stats/3001/var_loss | win_stats/4001/mean_loss | win_stats/4001/var_loss | win_stats/5001/mean_loss | win_stats/5001/var_loss | cpg_stats/diff_acc | win_stats/1001/mean_acc | win_stats/2001/mean_acc | win_stats/3001/mean_acc | win_stats/4001/mean_acc | win_stats/5001/mean_acc | win_stats/1001/mean_mse | win_stats/1001/var_mse | win_stats/2001/mean_mse | win_stats/2001/var_mse | win_stats/3001/mean_mse | win_stats/3001/var_mse | win_stats/4001/mean_mse | win_stats/4001/var_mse | win_stats/5001/mean_mse | win_stats/5001/var_mse | win_stats/1001/mean_mae | win_stats/1001/var_mae | win_stats/2001/mean_mae | win_stats/2001/var_mae | win_stats/3001/mean_mae | win_stats/3001/var_mae | win_stats/4001/mean_mae | win_stats/4001/var_mae | win_stats/5001/mean_mae | win_stats/5001/var_mae
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 train | 2.4809 | 0.7109 | 0.0780 | 0.2056 |              0.0000 |                   0.1695 |                  0.0075 |                   0.1566 |                  0.0082 |                   0.1466 |                  0.0072 |                   0.1368 |                  0.0091 |                   0.1298 |                  0.0084 |                nan |                  0.7117 |                  0.7088 |                  0.7073 |                  0.7099 |                  0.7164 |                  0.1695 |                 0.0075 |                  0.1566 |                 0.0082 |                  0.1466 |                 0.0072 |                  0.1368 |                 0.0091 |                  0.1298 |                 0.0084 |                  0.3588 |                 0.0781 |                  0.3434 |                 0.0796 |                  0.3316 |                 0.0698 |                  0.3172 |                 0.0847 |                  0.3081 |                 0.0844
   val | 1.4485 | 0.7175 | 0.0665 | 0.1816 |              0.0007 |                   0.1509 |                  0.0046 |                   0.1351 |                  0.0054 |                   0.1252 |                  0.0058 |                   0.1161 |                  0.0063 |                   0.1092 |                  0.0062 |                nan |                  0.7109 |                  0.7124 |                  0.7173 |                  0.7206 |                  0.7264 |                  0.1509 |                 0.0046 |                  0.1351 |                 0.0054 |                  0.1252 |                 0.0058 |                  0.1161 |                 0.0063 |                  0.1092 |                 0.0062 |                  0.3299 |                 0.0556 |                  0.3131 |                 0.0571 |                  0.3004 |                 0.0564 |                  0.2900 |                 0.0668 |                  0.2803 |                 0.0665
====================================================================================================

Training set performance:
  loss |    acc |    mse |    mae | cpg_stats/diff_loss | win_stats/1001/mean_loss | win_stats/1001/var_loss | win_stats/2001/mean_loss | win_stats/2001/var_loss | win_stats/3001/mean_loss | win_stats/3001/var_loss | win_stats/4001/mean_loss | win_stats/4001/var_loss | win_stats/5001/mean_loss | win_stats/5001/var_loss | cpg_stats/diff_acc | win_stats/1001/mean_acc | win_stats/2001/mean_acc | win_stats/3001/mean_acc | win_stats/4001/mean_acc | win_stats/5001/mean_acc | win_stats/1001/mean_mse | win_stats/1001/var_mse | win_stats/2001/mean_mse | win_stats/2001/var_mse | win_stats/3001/mean_mse | win_stats/3001/var_mse | win_stats/4001/mean_mse | win_stats/4001/var_mse | win_stats/5001/mean_mse | win_stats/5001/var_mse | win_stats/1001/mean_mae | win_stats/1001/var_mae | win_stats/2001/mean_mae | win_stats/2001/var_mae | win_stats/3001/mean_mae | win_stats/3001/var_mae | win_stats/4001/mean_mae | win_stats/4001/var_mae | win_stats/5001/mean_mae | win_stats/5001/var_mae
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2.4809 | 0.7109 | 0.0780 | 0.2056 |              0.0000 |                   0.1695 |                  0.0075 |                   0.1566 |                  0.0082 |                   0.1466 |                  0.0072 |                   0.1368 |                  0.0091 |                   0.1298 |                  0.0084 |                nan |                  0.7117 |                  0.7088 |                  0.7073 |                  0.7099 |                  0.7164 |                  0.1695 |                 0.0075 |                  0.1566 |                 0.0082 |                  0.1466 |                 0.0072 |                  0.1368 |                 0.0091 |                  0.1298 |                 0.0084 |                  0.3588 |                 0.0781 |                  0.3434 |                 0.0796 |                  0.3316 |                 0.0698 |                  0.3172 |                 0.0847 |                  0.3081 |                 0.0844

Validation set performance:
  loss |    acc |    mse |    mae | cpg_stats/diff_loss | win_stats/1001/mean_loss | win_stats/1001/var_loss | win_stats/2001/mean_loss | win_stats/2001/var_loss | win_stats/3001/mean_loss | win_stats/3001/var_loss | win_stats/4001/mean_loss | win_stats/4001/var_loss | win_stats/5001/mean_loss | win_stats/5001/var_loss | cpg_stats/diff_acc | win_stats/1001/mean_acc | win_stats/2001/mean_acc | win_stats/3001/mean_acc | win_stats/4001/mean_acc | win_stats/5001/mean_acc | win_stats/1001/mean_mse | win_stats/1001/var_mse | win_stats/2001/mean_mse | win_stats/2001/var_mse | win_stats/3001/mean_mse | win_stats/3001/var_mse | win_stats/4001/mean_mse | win_stats/4001/var_mse | win_stats/5001/mean_mse | win_stats/5001/var_mse | win_stats/1001/mean_mae | win_stats/1001/var_mae | win_stats/2001/mean_mae | win_stats/2001/var_mae | win_stats/3001/mean_mae | win_stats/3001/var_mae | win_stats/4001/mean_mae | win_stats/4001/var_mae | win_stats/5001/mean_mae | win_stats/5001/var_mae
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1.4485 | 0.7175 | 0.0665 | 0.1816 |              0.0007 |                   0.1509 |                  0.0046 |                   0.1351 |                  0.0054 |                   0.1252 |                  0.0058 |                   0.1161 |                  0.0063 |                   0.1092 |                  0.0062 |                nan |                  0.7109 |                  0.7124 |                  0.7173 |                  0.7206 |                  0.7264 |                  0.1509 |                 0.0046 |                  0.1351 |                 0.0054 |                  0.1252 |                 0.0058 |                  0.1161 |                 0.0063 |                  0.1092 |                 0.0062 |                  0.3299 |                 0.0556 |                  0.3131 |                 0.0571 |                  0.3004 |                 0.0564 |                  0.2900 |                 0.0668 |                  0.2803 |                 0.0665
INFO (2017-05-01 09:03:51,333): Done!

Model evaluation

Finally, we use dcpg_eval.py to predict the statistics and evaluate the predictions. The predictions are written to the HDF5 file given by --out_data, and the evaluation metrics to the report file given by --out_report.


In [5]:
eval_dir="./eval"
mkdir -p $eval_dir

cmd="dcpg_eval.py
    $data_dir/c*.h5
    --model_files $model_dir
    --out_data $eval_dir/data.h5
    --out_report $eval_dir/report.csv
    "
run $cmd


#################################
dcpg_eval.py ./data/c13_000000-010000.h5 ./data/c1_000000-010000.h5 --model_files ./model --out_data ./eval/data.h5 --out_report ./eval/report.csv
#################################
Using TensorFlow backend.
INFO (2017-05-01 09:03:56,192): Loading model ...
INFO (2017-05-01 09:03:56,834): Loading data ...
INFO (2017-05-01 09:03:56,838): Predicting ...
INFO (2017-05-01 09:03:56,868):   128/20000 (0.6%)
INFO (2017-05-01 09:04:02,852):  2176/20000 (10.9%)
INFO (2017-05-01 09:04:08,858):  4224/20000 (21.1%)
INFO (2017-05-01 09:04:14,793):  6272/20000 (31.4%)
INFO (2017-05-01 09:04:20,718):  8320/20000 (41.6%)
INFO (2017-05-01 09:04:26,740): 10384/20000 (51.9%)
INFO (2017-05-01 09:04:32,829): 12432/20000 (62.2%)
INFO (2017-05-01 09:04:39,001): 14480/20000 (72.4%)
INFO (2017-05-01 09:04:45,036): 16528/20000 (82.6%)
INFO (2017-05-01 09:04:51,084): 18576/20000 (92.9%)
INFO (2017-05-01 09:04:55,628): 20000/20000 (100.0%)
/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/metrics/classification.py:1113: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 due to no predicted samples.
  'precision', 'predicted', average, warn_for)
/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/metrics/classification.py:516: RuntimeWarning: invalid value encountered in double_scalars
  mcc = cov_ytyp / np.sqrt(var_yt * var_yp)
                 output       auc       acc  tpr  tnr        f1  mcc       mse       mad       cor   kendall        n
1   win_stats/1001/mean  0.759164  0.711100  1.0  0.0  0.831161  0.0  0.153702  0.333584  0.566523  0.351679  20000.0
3   win_stats/2001/mean  0.747603  0.711150  1.0  0.0  0.831195  0.0  0.139558  0.319082  0.574080  0.348815  20000.0
5   win_stats/3001/mean  0.739240  0.713600  1.0  0.0  0.832866  0.0  0.129698  0.307023  0.575865  0.344773  20000.0
7   win_stats/4001/mean  0.733387  0.717100  1.0  0.0  0.835245  0.0  0.120615  0.296751  0.577660  0.339402  20000.0
9   win_stats/5001/mean  0.729933  0.722550  1.0  0.0  0.838931  0.0  0.113275  0.286596  0.576349  0.335275  20000.0
0        cpg_stats/diff  0.318182  0.846154  0.0  1.0  0.000000  0.0       NaN       NaN       NaN       NaN     13.0
2    win_stats/1001/var       NaN       NaN  NaN  NaN       NaN  NaN  0.004467  0.055278  0.047110  0.156537  20000.0
4    win_stats/2001/var       NaN       NaN  NaN  NaN       NaN  NaN  0.005197  0.056420  0.040851  0.194774  20000.0
6    win_stats/3001/var       NaN       NaN  NaN  NaN       NaN  NaN  0.005651  0.055848  0.027888  0.195326  20000.0
8    win_stats/4001/var       NaN       NaN  NaN  NaN       NaN  NaN  0.006332  0.067386  0.034885  0.194757  20000.0
10   win_stats/5001/var       NaN       NaN  NaN  NaN       NaN  NaN  0.006366  0.067522  0.036160  0.190275  20000.0
INFO (2017-05-01 09:04:56,053): Done!
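The regression columns in the table above (mse, mad, cor) are standard metrics comparing predicted and observed statistics. As an illustrative sketch only, with toy arrays rather than actual DeepCpG output, they can be computed as follows:

```python
# Illustrative only: how the regression metrics in the evaluation report
# (mse, mad, cor) relate to predicted and observed values.
# y_true and y_pred are toy arrays, not DeepCpG output.
import numpy as np

y_true = np.array([0.1, 0.4, 0.35, 0.8])
y_pred = np.array([0.15, 0.3, 0.4, 0.7])

mse = np.mean((y_true - y_pred) ** 2)    # mean squared error
mad = np.mean(np.abs(y_true - y_pred))   # mean absolute deviation
cor = np.corrcoef(y_true, y_pred)[0, 1]  # Pearson correlation

print(mse, mad, cor)  # mse = 0.00625, mad = 0.075 for these toy arrays
```

Note that for cpg_stats/diff several metrics are NaN in the table above, since only 13 CpG sites passed the --cpg_stats_cov coverage filter in this test run.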

In [6]:
cat $eval_dir/report.csv


metric	output	value
acc	cpg_stats/diff	0.8461538461538461
acc	win_stats/1001/mean	0.7111
acc	win_stats/2001/mean	0.71115
acc	win_stats/3001/mean	0.7136
acc	win_stats/4001/mean	0.7171
acc	win_stats/5001/mean	0.72255
auc	cpg_stats/diff	0.31818181818181823
auc	win_stats/1001/mean	0.7591639136300757
auc	win_stats/2001/mean	0.7476034296359877
auc	win_stats/3001/mean	0.7392403201486835
auc	win_stats/4001/mean	0.7333872428809352
auc	win_stats/5001/mean	0.7299328050362869
cor	win_stats/1001/mean	0.5665225871692309
cor	win_stats/1001/var	0.04710989485589752
cor	win_stats/2001/mean	0.5740795240820958
cor	win_stats/2001/var	0.04085099681709634
cor	win_stats/3001/mean	0.5758646227680748
cor	win_stats/3001/var	0.02788840114716501
cor	win_stats/4001/mean	0.5776600608207685
cor	win_stats/4001/var	0.03488458615307337
cor	win_stats/5001/mean	0.5763485749134751
cor	win_stats/5001/var	0.03616016798069942
f1	cpg_stats/diff	0.0
f1	win_stats/1001/mean	0.8311612413067616
f1	win_stats/2001/mean	0.831195394909856
f1	win_stats/3001/mean	0.8328664799253035
f1	win_stats/4001/mean	0.8352454720167725
f1	win_stats/5001/mean	0.8389306551333778
kendall	win_stats/1001/mean	0.35167866636565387
kendall	win_stats/1001/var	0.1565369577239087
kendall	win_stats/2001/mean	0.3488148874578765
kendall	win_stats/2001/var	0.1947737110933883
kendall	win_stats/3001/mean	0.3447726359965206
kendall	win_stats/3001/var	0.19532601476289968
kendall	win_stats/4001/mean	0.3394017215742897
kendall	win_stats/4001/var	0.19475738164613146
kendall	win_stats/5001/mean	0.3352746071160157
kendall	win_stats/5001/var	0.19027459921161938
mad	win_stats/1001/mean	0.3335840702056885
mad	win_stats/1001/var	0.055277857929468155
mad	win_stats/2001/mean	0.3190816342830658
mad	win_stats/2001/var	0.05642000213265419
mad	win_stats/3001/mean	0.3070233464241028
mad	win_stats/3001/var	0.0558478944003582
mad	win_stats/4001/mean	0.2967507839202881
mad	win_stats/4001/var	0.06738576292991638
mad	win_stats/5001/mean	0.28659600019454956
mad	win_stats/5001/var	0.06752248853445053
mcc	cpg_stats/diff	0.0
mcc	win_stats/1001/mean	0.0
mcc	win_stats/2001/mean	0.0
mcc	win_stats/3001/mean	0.0
mcc	win_stats/4001/mean	0.0
mcc	win_stats/5001/mean	0.0
mse	win_stats/1001/mean	0.15370191633701324
mse	win_stats/1001/var	0.004466880578547716
mse	win_stats/2001/mean	0.1395581066608429
mse	win_stats/2001/var	0.005196714773774147
mse	win_stats/3001/mean	0.12969765067100525
mse	win_stats/3001/var	0.005651499610394239
mse	win_stats/4001/mean	0.12061500549316406
mse	win_stats/4001/var	0.0063323709182441235
mse	win_stats/5001/mean	0.11327465623617172
mse	win_stats/5001/var	0.006365941371768713
n	cpg_stats/diff	13.0
n	win_stats/1001/mean	20000.0
n	win_stats/1001/var	20000.0
n	win_stats/2001/mean	20000.0
n	win_stats/2001/var	20000.0
n	win_stats/3001/mean	20000.0
n	win_stats/3001/var	20000.0
n	win_stats/4001/mean	20000.0
n	win_stats/4001/var	20000.0
n	win_stats/5001/mean	20000.0
n	win_stats/5001/var	20000.0
tnr	cpg_stats/diff	1.0
tnr	win_stats/1001/mean	0.0
tnr	win_stats/2001/mean	0.0
tnr	win_stats/3001/mean	0.0
tnr	win_stats/4001/mean	0.0
tnr	win_stats/5001/mean	0.0
tpr	cpg_stats/diff	0.0
tpr	win_stats/1001/mean	1.0
tpr	win_stats/2001/mean	1.0
tpr	win_stats/3001/mean	1.0
tpr	win_stats/4001/mean	1.0
tpr	win_stats/5001/mean	1.0