This tutorial describes how to predict inter-cell statistics such as the mean methylation rate or variance across cells.
We first initialize some variables that will be used throughout the tutorial. test_mode=1 should be used for testing purposes, since it speeds up computations by using only a subset of the data. For real applications, test_mode=0 should be used.
In [1]:
function run {
  # Print the command between separators, then evaluate it.
  local cmd="$@"
  echo
  echo "#################################"
  echo $cmd
  echo "#################################"
  eval $cmd
}
test_mode=1 # Set to 1 for testing and 0 otherwise.
example_dir="../../data" # Directory with example data.
cpg_dir="$example_dir/cpg" # Directory with CpG profiles.
dna_dir="$example_dir/dna/mm10" # Directory with DNA sequences.
dcpg_data.py provides the arguments --cpg_stats and --win_stats to compute statistics across cells either for single CpG sites or in windows of the lengths given by --win_stats_wlen centred on CpG sites, respectively. Supported statistics are described in the documentation and include the mean methylation rate (mean), the variance (var), and whether a CpG site is differentially methylated (diff). With --cpg_stats_cov, per-CpG statistics are computed only for CpG sites that are covered by at least the specified number of cells. If this number is too low, the estimated statistics might be unreliable in lowly covered regions. We will compute the mean methylation rate and variance across cells in windows of different lengths, and whether CpG sites with at least three observations are differentially methylated.
In [2]:
data_dir="./data"
cmd="dcpg_data.py
--cpg_profiles $cpg_dir/*.tsv
--dna_files $dna_dir
--out_dir $data_dir
--dna_wlen 1001
--cpg_wlen 50
--cpg_stats diff
--cpg_stats_cov 3
--win_stats mean var
--win_stats_wlen 1001 2001 3001 4001 5001
"
if [[ $test_mode -eq 1 ]]; then
cmd="$cmd
--chromo 1 13
--nb_sample_chromo 10000
"
fi
run $cmd
#################################
dcpg_data.py --cpg_profiles ../../data/cpg/BS27_1_SER.tsv ../../data/cpg/BS27_3_SER.tsv ../../data/cpg/BS27_5_SER.tsv ../../data/cpg/BS27_6_SER.tsv ../../data/cpg/BS27_8_SER.tsv --dna_files ../../data/dna/mm10 --out_dir ./data --dna_wlen 1001 --cpg_wlen 50 --cpg_stats diff --cpg_stats_cov 3 --win_stats mean var --win_stats_wlen 1001 2001 3001 4001 5001 --chromo 1 13 --nb_sample_chromo 10000
#################################
INFO (2017-05-01 09:00:48,895): Reading CpG profiles ...
INFO (2017-05-01 09:00:48,895): ../../data/cpg/BS27_1_SER.tsv
INFO (2017-05-01 09:00:55,405): ../../data/cpg/BS27_3_SER.tsv
INFO (2017-05-01 09:01:00,122): ../../data/cpg/BS27_5_SER.tsv
INFO (2017-05-01 09:01:07,260): ../../data/cpg/BS27_6_SER.tsv
INFO (2017-05-01 09:01:12,711): ../../data/cpg/BS27_8_SER.tsv
INFO (2017-05-01 09:01:18,234): 20000 samples
INFO (2017-05-01 09:01:18,235): --------------------------------------------------------------------------------
INFO (2017-05-01 09:01:18,235): Chromosome 1 ...
INFO (2017-05-01 09:01:18,280): 10000 / 10000 (100.0%) sites matched minimum coverage filter
INFO (2017-05-01 09:01:22,054): Chunk 1 / 1
INFO (2017-05-01 09:01:22,063): Computing per CpG statistics ...
INFO (2017-05-01 09:01:22,066): Extracting DNA sequence windows ...
INFO (2017-05-01 09:01:24,203): Extracting CpG neighbors ...
INFO (2017-05-01 09:01:25,366): Computing window-based statistics ...
INFO (2017-05-01 09:01:25,968): --------------------------------------------------------------------------------
INFO (2017-05-01 09:01:25,969): Chromosome 13 ...
INFO (2017-05-01 09:01:26,012): 10000 / 10000 (100.0%) sites matched minimum coverage filter
INFO (2017-05-01 09:01:28,264): Chunk 1 / 1
INFO (2017-05-01 09:01:28,271): Computing per CpG statistics ...
INFO (2017-05-01 09:01:28,275): Extracting DNA sequence windows ...
INFO (2017-05-01 09:01:30,274): Extracting CpG neighbors ...
INFO (2017-05-01 09:01:31,433): Computing window-based statistics ...
INFO (2017-05-01 09:01:32,071): Done!
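The computed statistics are stored alongside the methylation outputs in the HDF5 chunk files written by dcpg_data.py. As an optional check, you can list the datasets of one chunk file and filter for the statistics outputs (a minimal sketch, assuming the HDF5 command-line tools are installed; the exact dataset paths may vary between versions):

# List all datasets in one chunk file and keep only the statistics outputs.
h5ls -r $data_dir/c1_000000-010000.h5 | grep -i stats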
We will train a DNA model to predict mean methylation rates, cell-to-cell variance, and differentially methylated CpG sites from the DNA sequence alone. However, you could train a CpG model or Joint model to also use neighboring CpG sites for making predictions. To predict all per-CpG and window-based statistics computed by dcpg_data.py instead of methylation states, we run dcpg_train.py with --output_names 'cpg_stats/.*' 'win_stats/.*'. You could use --output_names '.*' to predict both methylation states and statistics.
In [3]:
train_files=$(ls $data_dir/c{1,3,5,7,9}_*.h5 2> /dev/null)
echo "Training files:"
echo $train_files
echo
val_files=$(ls $data_dir/c{13,14,15,16,17}_*.h5 2> /dev/null)
echo "Validation files:"
echo $val_files
Training files:
./data/c1_000000-010000.h5
Validation files:
./data/c13_000000-010000.h5
In [4]:
model_dir="./model"
cmd="dcpg_train.py
$train_files
--val_files $val_files
--dna_model CnnL2h128
--out_dir $model_dir
--output_names 'cpg_stats/.*' 'win_stats/.*'
"
if [[ $test_mode -eq 1 ]]; then
cmd="$cmd
--nb_epoch 1
--nb_train_sample 10000
--nb_val_sample 10000
"
else
cmd="$cmd
--nb_epoch 30
"
fi
run $cmd
#################################
dcpg_train.py ./data/c1_000000-010000.h5 --val_files ./data/c13_000000-010000.h5 --dna_model CnnL2h128 --out_dir ./model --output_names 'cpg_stats/.*' 'win_stats/.*' --nb_epoch 1 --nb_train_sample 10000 --nb_val_sample 10000
#################################
Using TensorFlow backend.
INFO (2017-05-01 09:01:37,896): Building model ...
INFO (2017-05-01 09:01:37,901): Building DNA model ...
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
dna (InputLayer) (None, 1001, 4) 0
____________________________________________________________________________________________________
dna/conv1d_1 (Conv1D) (None, 991, 128) 5760 dna[0][0]
____________________________________________________________________________________________________
dna/activation_1 (Activation) (None, 991, 128) 0 dna/conv1d_1[0][0]
____________________________________________________________________________________________________
dna/max_pooling1d_1 (MaxPooling1 (None, 247, 128) 0 dna/activation_1[0][0]
____________________________________________________________________________________________________
dna/conv1d_2 (Conv1D) (None, 245, 256) 98560 dna/max_pooling1d_1[0][0]
____________________________________________________________________________________________________
dna/activation_2 (Activation) (None, 245, 256) 0 dna/conv1d_2[0][0]
____________________________________________________________________________________________________
dna/max_pooling1d_2 (MaxPooling1 (None, 122, 256) 0 dna/activation_2[0][0]
____________________________________________________________________________________________________
dna/flatten_1 (Flatten) (None, 31232) 0 dna/max_pooling1d_2[0][0]
____________________________________________________________________________________________________
dna/dense_1 (Dense) (None, 128) 3997824 dna/flatten_1[0][0]
____________________________________________________________________________________________________
dna/activation_3 (Activation) (None, 128) 0 dna/dense_1[0][0]
____________________________________________________________________________________________________
dna/dropout_1 (Dropout) (None, 128) 0 dna/activation_3[0][0]
____________________________________________________________________________________________________
dense_2 (Dense) (None, 1) 129 dna/dropout_1[0][0]
____________________________________________________________________________________________________
dense_3 (Dense) (None, 1) 129 dna/dropout_1[0][0]
____________________________________________________________________________________________________
dense_4 (Dense) (None, 1) 129 dna/dropout_1[0][0]
____________________________________________________________________________________________________
dense_5 (Dense) (None, 1) 129 dna/dropout_1[0][0]
____________________________________________________________________________________________________
dense_6 (Dense) (None, 1) 129 dna/dropout_1[0][0]
____________________________________________________________________________________________________
cpg_stats/diff (Dense) (None, 1) 129 dna/dropout_1[0][0]
____________________________________________________________________________________________________
win_stats/1001/mean (Dense) (None, 1) 129 dna/dropout_1[0][0]
____________________________________________________________________________________________________
win_stats/1001/var (ScaledSigmoi (None, 1) 0 dense_2[0][0]
____________________________________________________________________________________________________
win_stats/2001/mean (Dense) (None, 1) 129 dna/dropout_1[0][0]
____________________________________________________________________________________________________
win_stats/2001/var (ScaledSigmoi (None, 1) 0 dense_3[0][0]
____________________________________________________________________________________________________
win_stats/3001/mean (Dense) (None, 1) 129 dna/dropout_1[0][0]
____________________________________________________________________________________________________
win_stats/3001/var (ScaledSigmoi (None, 1) 0 dense_4[0][0]
____________________________________________________________________________________________________
win_stats/4001/mean (Dense) (None, 1) 129 dna/dropout_1[0][0]
____________________________________________________________________________________________________
win_stats/4001/var (ScaledSigmoi (None, 1) 0 dense_5[0][0]
____________________________________________________________________________________________________
win_stats/5001/mean (Dense) (None, 1) 129 dna/dropout_1[0][0]
____________________________________________________________________________________________________
win_stats/5001/var (ScaledSigmoi (None, 1) 0 dense_6[0][0]
====================================================================================================
Total params: 4,103,563
Trainable params: 4,103,563
Non-trainable params: 0
____________________________________________________________________________________________________
INFO (2017-05-01 09:01:38,241): Computing output statistics ...
Output statistics:
name | nb_tot | nb_obs | frac_obs | mean | var
--------------------------------------------------------------
cpg_stats/diff | 10000 | 3 | 0.00 | 0.00 | 0.00
win_stats/1001/mean | 10000 | 10000 | 1.00 | 0.74 | 0.17
win_stats/1001/var | 10000 | 10000 | 1.00 | 0.02 | 0.00
win_stats/2001/mean | 10000 | 10000 | 1.00 | 0.74 | 0.16
win_stats/2001/var | 10000 | 10000 | 1.00 | 0.03 | 0.00
win_stats/3001/mean | 10000 | 10000 | 1.00 | 0.74 | 0.15
win_stats/3001/var | 10000 | 10000 | 1.00 | 0.03 | 0.01
win_stats/4001/mean | 10000 | 10000 | 1.00 | 0.74 | 0.14
win_stats/4001/var | 10000 | 10000 | 1.00 | 0.04 | 0.01
win_stats/5001/mean | 10000 | 10000 | 1.00 | 0.74 | 0.13
win_stats/5001/var | 10000 | 10000 | 1.00 | 0.04 | 0.01
Class weights:
cpg_stats/diff
--------------
0=0.00
1=1.00
INFO (2017-05-01 09:01:39,188): Loading data ...
INFO (2017-05-01 09:01:39,191): Initializing callbacks ...
INFO (2017-05-01 09:01:39,192): Training model ...
Training samples: 10000
Validation samples: 10000
Epochs: 1
Learning rate: 0.0001
====================================================================================================
Epoch 1/1
====================================================================================================
done (%) | time | loss | acc | mse | mae | cpg_stats/diff_loss | win_stats/1001/mean_loss | win_stats/1001/var_loss | win_stats/2001/mean_loss | win_stats/2001/var_loss | win_stats/3001/mean_loss | win_stats/3001/var_loss | win_stats/4001/mean_loss | win_stats/4001/var_loss | win_stats/5001/mean_loss | win_stats/5001/var_loss | cpg_stats/diff_acc | win_stats/1001/mean_acc | win_stats/2001/mean_acc | win_stats/3001/mean_acc | win_stats/4001/mean_acc | win_stats/5001/mean_acc | win_stats/1001/mean_mse | win_stats/1001/var_mse | win_stats/2001/mean_mse | win_stats/2001/var_mse | win_stats/3001/mean_mse | win_stats/3001/var_mse | win_stats/4001/mean_mse | win_stats/4001/var_mse | win_stats/5001/mean_mse | win_stats/5001/var_mse | win_stats/1001/mean_mae | win_stats/1001/var_mae | win_stats/2001/mean_mae | win_stats/2001/var_mae | win_stats/3001/mean_mae | win_stats/3001/var_mae | win_stats/4001/mean_mae | win_stats/4001/var_mae | win_stats/5001/mean_mae | win_stats/5001/var_mae
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1.3 | 0.0 | 3.9243 | 0.5281 | 0.1093 | 0.2705 | 0.0000 | 0.2252 | 0.0153 | 0.2113 | 0.0149 | 0.2029 | 0.0144 | 0.1932 | 0.0140 | 0.1885 | 0.0136 | nan | 0.7266 | 0.5938 | 0.4609 | 0.3672 | 0.4922 | 0.2252 | 0.0153 | 0.2113 | 0.0149 | 0.2029 | 0.0144 | 0.1932 | 0.0140 | 0.1885 | 0.0136 | 0.4566 | 0.1233 | 0.4325 | 0.1208 | 0.4188 | 0.1186 | 0.4029 | 0.1163 | 0.4007 | 0.1144
10.3 | 0.2 | 3.7188 | 0.6951 | 0.1012 | 0.2624 | 0.0000 | 0.2098 | 0.0150 | 0.1955 | 0.0150 | 0.1889 | 0.0136 | 0.1760 | 0.0141 | 0.1717 | 0.0128 | nan | 0.7119 | 0.7041 | 0.6836 | 0.6816 | 0.6943 | 0.2098 | 0.0150 | 0.1955 | 0.0150 | 0.1889 | 0.0136 | 0.1760 | 0.0141 | 0.1717 | 0.0128 | 0.4398 | 0.1219 | 0.4190 | 0.1210 | 0.4084 | 0.1145 | 0.3903 | 0.1159 | 0.3837 | 0.1100
20.5 | 0.3 | 3.4898 | 0.5862 | 0.0917 | 0.2458 | 0.0000 | 0.1916 | 0.0143 | 0.1760 | 0.0156 | 0.1698 | 0.0122 | 0.1574 | 0.0157 | 0.1526 | 0.0117 | 0.0000 | 0.7129 | 0.7085 | 0.6978 | 0.6929 | 0.7051 | 0.1916 | 0.0143 | 0.1760 | 0.0156 | 0.1698 | 0.0122 | 0.1574 | 0.0157 | 0.1526 | 0.0117 | 0.4091 | 0.1191 | 0.3840 | 0.1233 | 0.3782 | 0.1081 | 0.3563 | 0.1222 | 0.3529 | 0.1050
30.8 | 0.5 | 3.3053 | 0.5928 | 0.0861 | 0.2285 | 0.0000 | 0.1797 | 0.0128 | 0.1659 | 0.0142 | 0.1593 | 0.0106 | 0.1490 | 0.0149 | 0.1438 | 0.0109 | 0.0000 | 0.7168 | 0.7152 | 0.7083 | 0.7044 | 0.7122 | 0.1797 | 0.0128 | 0.1659 | 0.0142 | 0.1593 | 0.0106 | 0.1490 | 0.0149 | 0.1438 | 0.0109 | 0.3738 | 0.1119 | 0.3558 | 0.1174 | 0.3495 | 0.0988 | 0.3325 | 0.1184 | 0.3264 | 0.1010
41.0 | 0.6 | 3.1635 | 0.5922 | 0.0843 | 0.2236 | 0.0000 | 0.1775 | 0.0115 | 0.1640 | 0.0125 | 0.1571 | 0.0095 | 0.1470 | 0.0130 | 0.1407 | 0.0106 | 0.0000 | 0.7134 | 0.7134 | 0.7070 | 0.7065 | 0.7131 | 0.1775 | 0.0115 | 0.1640 | 0.0125 | 0.1571 | 0.0095 | 0.1470 | 0.0130 | 0.1407 | 0.0106 | 0.3686 | 0.1045 | 0.3554 | 0.1082 | 0.3452 | 0.0912 | 0.3316 | 0.1093 | 0.3226 | 0.0998
51.3 | 0.8 | 3.0206 | 0.5939 | 0.0820 | 0.2188 | 0.0000 | 0.1738 | 0.0104 | 0.1606 | 0.0112 | 0.1527 | 0.0088 | 0.1432 | 0.0119 | 0.1368 | 0.0102 | 0.0000 | 0.7164 | 0.7154 | 0.7082 | 0.7084 | 0.7150 | 0.1738 | 0.0104 | 0.1606 | 0.0112 | 0.1527 | 0.0088 | 0.1432 | 0.0119 | 0.1368 | 0.0102 | 0.3662 | 0.0982 | 0.3508 | 0.1008 | 0.3410 | 0.0852 | 0.3270 | 0.1023 | 0.3192 | 0.0974
61.5 | 0.9 | 2.8945 | 0.5947 | 0.0808 | 0.2135 | 0.0000 | 0.1727 | 0.0095 | 0.1599 | 0.0102 | 0.1510 | 0.0082 | 0.1412 | 0.0109 | 0.1348 | 0.0096 | 0.0000 | 0.7150 | 0.7142 | 0.7103 | 0.7118 | 0.7171 | 0.1727 | 0.0095 | 0.1599 | 0.0102 | 0.1510 | 0.0082 | 0.1412 | 0.0109 | 0.1348 | 0.0096 | 0.3615 | 0.0919 | 0.3463 | 0.0937 | 0.3370 | 0.0794 | 0.3220 | 0.0960 | 0.3138 | 0.0936
71.8 | 1.1 | 2.7730 | 0.5954 | 0.0796 | 0.2105 | 0.0000 | 0.1709 | 0.0087 | 0.1586 | 0.0094 | 0.1492 | 0.0077 | 0.1394 | 0.0102 | 0.1326 | 0.0092 | 0.0000 | 0.7160 | 0.7139 | 0.7112 | 0.7125 | 0.7192 | 0.1709 | 0.0087 | 0.1586 | 0.0094 | 0.1492 | 0.0077 | 0.1394 | 0.0102 | 0.1326 | 0.0092 | 0.3603 | 0.0868 | 0.3461 | 0.0885 | 0.3347 | 0.0755 | 0.3204 | 0.0913 | 0.3113 | 0.0904
82.1 | 1.3 | 2.6631 | 0.5937 | 0.0791 | 0.2085 | 0.0000 | 0.1709 | 0.0082 | 0.1585 | 0.0089 | 0.1484 | 0.0075 | 0.1382 | 0.0097 | 0.1316 | 0.0089 | 0.0000 | 0.7139 | 0.7111 | 0.7086 | 0.7109 | 0.7177 | 0.1709 | 0.0082 | 0.1585 | 0.0089 | 0.1484 | 0.0075 | 0.1382 | 0.0097 | 0.1316 | 0.0089 | 0.3601 | 0.0830 | 0.3456 | 0.0846 | 0.3338 | 0.0730 | 0.3189 | 0.0883 | 0.3102 | 0.0878
92.3 | 1.4 | 2.5563 | 0.5930 | 0.0784 | 0.2069 | 0.0000 | 0.1701 | 0.0077 | 0.1574 | 0.0084 | 0.1472 | 0.0073 | 0.1372 | 0.0093 | 0.1303 | 0.0086 | 0.0000 | 0.7125 | 0.7101 | 0.7081 | 0.7106 | 0.7169 | 0.1701 | 0.0077 | 0.1574 | 0.0084 | 0.1472 | 0.0073 | 0.1372 | 0.0093 | 0.1303 | 0.0086 | 0.3601 | 0.0800 | 0.3449 | 0.0814 | 0.3328 | 0.0710 | 0.3179 | 0.0860 | 0.3087 | 0.0858
100.0 | 1.5 | 2.4809 | 0.5924 | 0.0780 | 0.2056 | 0.0000 | 0.1695 | 0.0075 | 0.1566 | 0.0082 | 0.1466 | 0.0072 | 0.1368 | 0.0091 | 0.1298 | 0.0084 | 0.0000 | 0.7117 | 0.7088 | 0.7073 | 0.7099 | 0.7164 | 0.1695 | 0.0075 | 0.1566 | 0.0082 | 0.1466 | 0.0072 | 0.1368 | 0.0091 | 0.1298 | 0.0084 | 0.3588 | 0.0781 | 0.3434 | 0.0796 | 0.3316 | 0.0698 | 0.3172 | 0.0847 | 0.3081 | 0.0844
Epoch 00000: val_loss improved from inf to 1.44847, saving model to ./model/model_weights_val.h5
split | loss | acc | mse | mae | cpg_stats/diff_loss | win_stats/1001/mean_loss | win_stats/1001/var_loss | win_stats/2001/mean_loss | win_stats/2001/var_loss | win_stats/3001/mean_loss | win_stats/3001/var_loss | win_stats/4001/mean_loss | win_stats/4001/var_loss | win_stats/5001/mean_loss | win_stats/5001/var_loss | cpg_stats/diff_acc | win_stats/1001/mean_acc | win_stats/2001/mean_acc | win_stats/3001/mean_acc | win_stats/4001/mean_acc | win_stats/5001/mean_acc | win_stats/1001/mean_mse | win_stats/1001/var_mse | win_stats/2001/mean_mse | win_stats/2001/var_mse | win_stats/3001/mean_mse | win_stats/3001/var_mse | win_stats/4001/mean_mse | win_stats/4001/var_mse | win_stats/5001/mean_mse | win_stats/5001/var_mse | win_stats/1001/mean_mae | win_stats/1001/var_mae | win_stats/2001/mean_mae | win_stats/2001/var_mae | win_stats/3001/mean_mae | win_stats/3001/var_mae | win_stats/4001/mean_mae | win_stats/4001/var_mae | win_stats/5001/mean_mae | win_stats/5001/var_mae
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
train | 2.4809 | 0.7109 | 0.0780 | 0.2056 | 0.0000 | 0.1695 | 0.0075 | 0.1566 | 0.0082 | 0.1466 | 0.0072 | 0.1368 | 0.0091 | 0.1298 | 0.0084 | nan | 0.7117 | 0.7088 | 0.7073 | 0.7099 | 0.7164 | 0.1695 | 0.0075 | 0.1566 | 0.0082 | 0.1466 | 0.0072 | 0.1368 | 0.0091 | 0.1298 | 0.0084 | 0.3588 | 0.0781 | 0.3434 | 0.0796 | 0.3316 | 0.0698 | 0.3172 | 0.0847 | 0.3081 | 0.0844
val | 1.4485 | 0.7175 | 0.0665 | 0.1816 | 0.0007 | 0.1509 | 0.0046 | 0.1351 | 0.0054 | 0.1252 | 0.0058 | 0.1161 | 0.0063 | 0.1092 | 0.0062 | nan | 0.7109 | 0.7124 | 0.7173 | 0.7206 | 0.7264 | 0.1509 | 0.0046 | 0.1351 | 0.0054 | 0.1252 | 0.0058 | 0.1161 | 0.0063 | 0.1092 | 0.0062 | 0.3299 | 0.0556 | 0.3131 | 0.0571 | 0.3004 | 0.0564 | 0.2900 | 0.0668 | 0.2803 | 0.0665
====================================================================================================
Training set performance:
loss | acc | mse | mae | cpg_stats/diff_loss | win_stats/1001/mean_loss | win_stats/1001/var_loss | win_stats/2001/mean_loss | win_stats/2001/var_loss | win_stats/3001/mean_loss | win_stats/3001/var_loss | win_stats/4001/mean_loss | win_stats/4001/var_loss | win_stats/5001/mean_loss | win_stats/5001/var_loss | cpg_stats/diff_acc | win_stats/1001/mean_acc | win_stats/2001/mean_acc | win_stats/3001/mean_acc | win_stats/4001/mean_acc | win_stats/5001/mean_acc | win_stats/1001/mean_mse | win_stats/1001/var_mse | win_stats/2001/mean_mse | win_stats/2001/var_mse | win_stats/3001/mean_mse | win_stats/3001/var_mse | win_stats/4001/mean_mse | win_stats/4001/var_mse | win_stats/5001/mean_mse | win_stats/5001/var_mse | win_stats/1001/mean_mae | win_stats/1001/var_mae | win_stats/2001/mean_mae | win_stats/2001/var_mae | win_stats/3001/mean_mae | win_stats/3001/var_mae | win_stats/4001/mean_mae | win_stats/4001/var_mae | win_stats/5001/mean_mae | win_stats/5001/var_mae
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2.4809 | 0.7109 | 0.0780 | 0.2056 | 0.0000 | 0.1695 | 0.0075 | 0.1566 | 0.0082 | 0.1466 | 0.0072 | 0.1368 | 0.0091 | 0.1298 | 0.0084 | nan | 0.7117 | 0.7088 | 0.7073 | 0.7099 | 0.7164 | 0.1695 | 0.0075 | 0.1566 | 0.0082 | 0.1466 | 0.0072 | 0.1368 | 0.0091 | 0.1298 | 0.0084 | 0.3588 | 0.0781 | 0.3434 | 0.0796 | 0.3316 | 0.0698 | 0.3172 | 0.0847 | 0.3081 | 0.0844
Validation set performance:
loss | acc | mse | mae | cpg_stats/diff_loss | win_stats/1001/mean_loss | win_stats/1001/var_loss | win_stats/2001/mean_loss | win_stats/2001/var_loss | win_stats/3001/mean_loss | win_stats/3001/var_loss | win_stats/4001/mean_loss | win_stats/4001/var_loss | win_stats/5001/mean_loss | win_stats/5001/var_loss | cpg_stats/diff_acc | win_stats/1001/mean_acc | win_stats/2001/mean_acc | win_stats/3001/mean_acc | win_stats/4001/mean_acc | win_stats/5001/mean_acc | win_stats/1001/mean_mse | win_stats/1001/var_mse | win_stats/2001/mean_mse | win_stats/2001/var_mse | win_stats/3001/mean_mse | win_stats/3001/var_mse | win_stats/4001/mean_mse | win_stats/4001/var_mse | win_stats/5001/mean_mse | win_stats/5001/var_mse | win_stats/1001/mean_mae | win_stats/1001/var_mae | win_stats/2001/mean_mae | win_stats/2001/var_mae | win_stats/3001/mean_mae | win_stats/3001/var_mae | win_stats/4001/mean_mae | win_stats/4001/var_mae | win_stats/5001/mean_mae | win_stats/5001/var_mae
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1.4485 | 0.7175 | 0.0665 | 0.1816 | 0.0007 | 0.1509 | 0.0046 | 0.1351 | 0.0054 | 0.1252 | 0.0058 | 0.1161 | 0.0063 | 0.1092 | 0.0062 | nan | 0.7109 | 0.7124 | 0.7173 | 0.7206 | 0.7264 | 0.1509 | 0.0046 | 0.1351 | 0.0054 | 0.1252 | 0.0058 | 0.1161 | 0.0063 | 0.1092 | 0.0062 | 0.3299 | 0.0556 | 0.3131 | 0.0571 | 0.3004 | 0.0564 | 0.2900 | 0.0668 | 0.2803 | 0.0665
INFO (2017-05-01 09:03:51,333): Done!
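Before evaluating, you can inspect the model directory. According to the training log, the weights with the best validation loss were saved to ./model/model_weights_val.h5; a quick optional check:

# Show the files written by dcpg_train.py to the model directory.
ls -lh $model_dir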
Finally, we use dcpg_eval.py to predict the statistics and evaluate the predictions.
In [5]:
eval_dir="./eval"
mkdir -p $eval_dir
cmd="dcpg_eval.py
$data_dir/c*.h5
--model_files $model_dir
--out_data $eval_dir/data.h5
--out_report $eval_dir/report.csv
"
run $cmd
#################################
dcpg_eval.py ./data/c13_000000-010000.h5 ./data/c1_000000-010000.h5 --model_files ./model --out_data ./eval/data.h5 --out_report ./eval/report.csv
#################################
Using TensorFlow backend.
INFO (2017-05-01 09:03:56,192): Loading model ...
INFO (2017-05-01 09:03:56,834): Loading data ...
INFO (2017-05-01 09:03:56,838): Predicting ...
INFO (2017-05-01 09:03:56,868): 128/20000 (0.6%)
INFO (2017-05-01 09:04:02,852): 2176/20000 (10.9%)
INFO (2017-05-01 09:04:08,858): 4224/20000 (21.1%)
INFO (2017-05-01 09:04:14,793): 6272/20000 (31.4%)
INFO (2017-05-01 09:04:20,718): 8320/20000 (41.6%)
INFO (2017-05-01 09:04:26,740): 10384/20000 (51.9%)
INFO (2017-05-01 09:04:32,829): 12432/20000 (62.2%)
INFO (2017-05-01 09:04:39,001): 14480/20000 (72.4%)
INFO (2017-05-01 09:04:45,036): 16528/20000 (82.6%)
INFO (2017-05-01 09:04:51,084): 18576/20000 (92.9%)
INFO (2017-05-01 09:04:55,628): 20000/20000 (100.0%)
/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/metrics/classification.py:1113: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 due to no predicted samples.
'precision', 'predicted', average, warn_for)
/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/metrics/classification.py:516: RuntimeWarning: invalid value encountered in double_scalars
mcc = cov_ytyp / np.sqrt(var_yt * var_yp)
output auc acc tpr tnr f1 mcc mse mad cor kendall n
1 win_stats/1001/mean 0.759164 0.711100 1.0 0.0 0.831161 0.0 0.153702 0.333584 0.566523 0.351679 20000.0
3 win_stats/2001/mean 0.747603 0.711150 1.0 0.0 0.831195 0.0 0.139558 0.319082 0.574080 0.348815 20000.0
5 win_stats/3001/mean 0.739240 0.713600 1.0 0.0 0.832866 0.0 0.129698 0.307023 0.575865 0.344773 20000.0
7 win_stats/4001/mean 0.733387 0.717100 1.0 0.0 0.835245 0.0 0.120615 0.296751 0.577660 0.339402 20000.0
9 win_stats/5001/mean 0.729933 0.722550 1.0 0.0 0.838931 0.0 0.113275 0.286596 0.576349 0.335275 20000.0
0 cpg_stats/diff 0.318182 0.846154 0.0 1.0 0.000000 0.0 NaN NaN NaN NaN 13.0
2 win_stats/1001/var NaN NaN NaN NaN NaN NaN 0.004467 0.055278 0.047110 0.156537 20000.0
4 win_stats/2001/var NaN NaN NaN NaN NaN NaN 0.005197 0.056420 0.040851 0.194774 20000.0
6 win_stats/3001/var NaN NaN NaN NaN NaN NaN 0.005651 0.055848 0.027888 0.195326 20000.0
8 win_stats/4001/var NaN NaN NaN NaN NaN NaN 0.006332 0.067386 0.034885 0.194757 20000.0
10 win_stats/5001/var NaN NaN NaN NaN NaN NaN 0.006366 0.067522 0.036160 0.190275 20000.0
INFO (2017-05-01 09:04:56,053): Done!
In [6]:
cat $eval_dir/report.csv
metric output value
acc cpg_stats/diff 0.8461538461538461
acc win_stats/1001/mean 0.7111
acc win_stats/2001/mean 0.71115
acc win_stats/3001/mean 0.7136
acc win_stats/4001/mean 0.7171
acc win_stats/5001/mean 0.72255
auc cpg_stats/diff 0.31818181818181823
auc win_stats/1001/mean 0.7591639136300757
auc win_stats/2001/mean 0.7476034296359877
auc win_stats/3001/mean 0.7392403201486835
auc win_stats/4001/mean 0.7333872428809352
auc win_stats/5001/mean 0.7299328050362869
cor win_stats/1001/mean 0.5665225871692309
cor win_stats/1001/var 0.04710989485589752
cor win_stats/2001/mean 0.5740795240820958
cor win_stats/2001/var 0.04085099681709634
cor win_stats/3001/mean 0.5758646227680748
cor win_stats/3001/var 0.02788840114716501
cor win_stats/4001/mean 0.5776600608207685
cor win_stats/4001/var 0.03488458615307337
cor win_stats/5001/mean 0.5763485749134751
cor win_stats/5001/var 0.03616016798069942
f1 cpg_stats/diff 0.0
f1 win_stats/1001/mean 0.8311612413067616
f1 win_stats/2001/mean 0.831195394909856
f1 win_stats/3001/mean 0.8328664799253035
f1 win_stats/4001/mean 0.8352454720167725
f1 win_stats/5001/mean 0.8389306551333778
kendall win_stats/1001/mean 0.35167866636565387
kendall win_stats/1001/var 0.1565369577239087
kendall win_stats/2001/mean 0.3488148874578765
kendall win_stats/2001/var 0.1947737110933883
kendall win_stats/3001/mean 0.3447726359965206
kendall win_stats/3001/var 0.19532601476289968
kendall win_stats/4001/mean 0.3394017215742897
kendall win_stats/4001/var 0.19475738164613146
kendall win_stats/5001/mean 0.3352746071160157
kendall win_stats/5001/var 0.19027459921161938
mad win_stats/1001/mean 0.3335840702056885
mad win_stats/1001/var 0.055277857929468155
mad win_stats/2001/mean 0.3190816342830658
mad win_stats/2001/var 0.05642000213265419
mad win_stats/3001/mean 0.3070233464241028
mad win_stats/3001/var 0.0558478944003582
mad win_stats/4001/mean 0.2967507839202881
mad win_stats/4001/var 0.06738576292991638
mad win_stats/5001/mean 0.28659600019454956
mad win_stats/5001/var 0.06752248853445053
mcc cpg_stats/diff 0.0
mcc win_stats/1001/mean 0.0
mcc win_stats/2001/mean 0.0
mcc win_stats/3001/mean 0.0
mcc win_stats/4001/mean 0.0
mcc win_stats/5001/mean 0.0
mse win_stats/1001/mean 0.15370191633701324
mse win_stats/1001/var 0.004466880578547716
mse win_stats/2001/mean 0.1395581066608429
mse win_stats/2001/var 0.005196714773774147
mse win_stats/3001/mean 0.12969765067100525
mse win_stats/3001/var 0.005651499610394239
mse win_stats/4001/mean 0.12061500549316406
mse win_stats/4001/var 0.0063323709182441235
mse win_stats/5001/mean 0.11327465623617172
mse win_stats/5001/var 0.006365941371768713
n cpg_stats/diff 13.0
n win_stats/1001/mean 20000.0
n win_stats/1001/var 20000.0
n win_stats/2001/mean 20000.0
n win_stats/2001/var 20000.0
n win_stats/3001/mean 20000.0
n win_stats/3001/var 20000.0
n win_stats/4001/mean 20000.0
n win_stats/4001/var 20000.0
n win_stats/5001/mean 20000.0
n win_stats/5001/var 20000.0
tnr cpg_stats/diff 1.0
tnr win_stats/1001/mean 0.0
tnr win_stats/2001/mean 0.0
tnr win_stats/3001/mean 0.0
tnr win_stats/4001/mean 0.0
tnr win_stats/5001/mean 0.0
tpr cpg_stats/diff 0.0
tpr win_stats/1001/mean 1.0
tpr win_stats/2001/mean 1.0
tpr win_stats/3001/mean 1.0
tpr win_stats/4001/mean 1.0
tpr win_stats/5001/mean 1.0
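Because the report contains one metric per row, individual metrics can be extracted with standard command-line tools. For example, to rank all outputs by AUC (a minimal sketch, assuming the whitespace-delimited layout shown above):

# Select the AUC rows and sort outputs by decreasing AUC.
grep '^auc' $eval_dir/report.csv | sort -k3,3 -nr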