How to train your DragoNN tutorial

How to use this tutorial

This tutorial utilizes a Jupyter Notebook - an interactive computational environment that combines live code, visualizations, and explanatory text. The notebook is organized into a series of cells. You can run the next cell by clicking the play button, or run all cells in sequence by clicking "Run All" in the Cell drop-down menu. Roughly half of the cells in this tutorial contain code; the other half contain visualizations and explanatory text. Code, visualizations, and text in cells can all be modified, and you are encouraged to modify the code as you advance through the tutorial. You can also inspect the implementation of any function used in a cell.
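For example, one general way to inspect a function from a code cell (standard IPython/Python functionality, not something specific to dragonn) is IPython's ?? operator or the standard-library inspect module:

import inspect
from dragonn.tutorial_utils import print_available_simulations

# Option 1: append "??" to a name to view its docstring and source in the notebook pager.
# print_available_simulations??

# Option 2: print the source directly with the inspect module.
print(inspect.getsource(print_available_simulations))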

Tutorial Overview

In this tutorial, we will:

1) Simulate a regulatory DNA sequence classification task
2) Train DragoNN models of varying complexity to solve the simulation
3) Interpret trained DragoNN models
4) Show how to train your DragoNN on your own, non-simulated data and use it to interpret data

This tutorial is implemented in python (see this online python course for an introduction).

We start by loading dragonn's tutorial utilities. Let's review properties of regulatory sequences while the utilities load.


In [9]:
%reload_ext autoreload
%autoreload 2
from dragonn.tutorial_utils import *
%matplotlib inline

In this tutorial, we will simulate a heterodimer motif grammar detection task. Specifically, we will simulate a "positive" class of sequences containing a SIX5-ZNF143 grammar with relatively fixed spacing between the motifs, and a "negative" class of sequences containing both motifs positioned independently of each other. Here is an overview of the sequence simulation functions in the dragonn tutorial:

Let's run the print_available_simulations function and see it in action.


In [10]:
print_available_simulations()


simulate_differential_accessibility
simulate_heterodimer_grammar
simulate_motif_counting
simulate_motif_density_localization
simulate_multi_motif_embedding
simulate_single_motif_detection

Getting simulation data

To get simulation data we:

1) Define the simulation parameters
    - obtain description of simulation parameters using the print_simulation_info function
2) Call the get_simulation_data function, which takes as input the simulation name and the simulation
parameters, and outputs the simulation data.

We simulate the SIX5-ZNF143 heterodimer motif grammar using the "simulate_heterodimer_grammar" simulation function. To get a description of the simulation parameters we use the print_simulation_info function, which takes as input the simulation function name, and outputs documentation for the simulation including the simulation parameters:


In [11]:
print_simulation_info("simulate_heterodimer_grammar")


    Simulates two classes of sequences with motif1 and motif2:
        - Positive class sequences with motif1 and motif2 positioned
          min_spacing to max_spacing apart
        - Negative class sequences with independent motif1 and motif2 positioned
          anywhere in the sequence, not as a heterodimer grammar

    Parameters
    ----------
    seq_length : int, length of sequence
    GC_fraction : float, GC fraction in background sequence
    num_pos : int, number of positive class sequences
    num_neg : int, number of negative class sequences
    motif1 : str, ENCODE motif name
    motif2 : str, ENCODE motif name
    min_spacing : int, minimum inter motif spacing
    max_spacing : int, maximum inter motif spacing

    Returns
    -------
    sequence_arr : 1darray
        Array with sequence strings.
    y : 1darray
        Array with positive/negative class labels.
    

Next, we define parameters for a heterodimer grammar simulation of 500 bp long sequences, with 0.4 GC fraction, 10000 positive and 10000 negative sequences, and SIX5 and ZNF143 motifs spaced 2-10 bp apart in the positive sequences:


In [12]:
heterodimer_grammar_simulation_parameters = {
    "seq_length": 500,
    "GC_fraction": 0.4,
    "num_pos": 10000,
    "num_neg": 10000,
    "motif1": "SIX5_known5",
    "motif2": "ZNF143_known2",
    "min_spacing": 2,
    "max_spacing": 10}

We get the simulation data by calling the get_simulation_data function with the simulation name and the simulation parameters as inputs.


In [13]:
simulation_data = get_simulation_data("simulate_heterodimer_grammar", heterodimer_grammar_simulation_parameters)

simulation_data provides training, validation, and test sets of input sequences X and sequence labels y. The inputs X are matrices with a one-hot encoding of the sequences. Here are the first 10 bp of a sequence in our training data:


In [14]:
simulation_data.X_train[0, :, :, :10]


Out[14]:
array([[[0, 0, 1, 1, 0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
        [1, 1, 0, 0, 0, 1, 1, 1, 0, 0]]], dtype=int8)

This matrix represents the 10 bp sequence TTGGTAGATA.
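As a rough sketch of how such an encoding is constructed (assuming the rows correspond to the bases A, C, G, T in that order, which is the usual convention but worth checking against your dragonn version), a short sequence can be one-hot encoded with plain numpy:

import numpy as np

def one_hot_encode(sequence, bases="ACGT"):
    # 4 x L matrix with a single 1 per column, in the row of that position's base.
    encoding = np.zeros((len(bases), len(sequence)), dtype=np.int8)
    for position, base in enumerate(sequence):
        encoding[bases.index(base), position] = 1
    return encoding

print(one_hot_encode("ACGTAC"))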

Next, we will provide a brief overview of DragoNNs and proceed to train a DragoNN to classify the sequences we simulated:

DragoNN Models

A locally connected linear unit in a DragoNN model can represent a PSSM (part a). A sequence PSSM score is obtained by multiplying the PSSM across the sequence, thresholding the PSSM scores, and taking the max (part b). A PSSM score can also be computed by a DragoNN model with tiled locally connected linear units, amounting to a convolutional layer with a single convolutional filter representing the PSSM, followed by ReLU thresholding and maxpooling (part c). By utilizing multiple convolutional layers with multiple convolutional filters, DragoNN models can represent a wide range of sequence features in a compositional fashion:
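The following numpy sketch illustrates this scoring scheme on a toy example; the "PSSM" weights and the sequence here are random placeholders, not the tutorial's actual filters:

import numpy as np

rng = np.random.RandomState(0)
L, W = 50, 8
# Toy one-hot sequence (4 x L, one 1 per column) and a toy 4 x W filter standing in for a PSSM.
sequence = np.eye(4, dtype=int)[:, rng.randint(0, 4, size=L)]
pssm = rng.randn(4, W)

# Convolution with a single filter: one score per window of width W along the sequence.
window_scores = np.array([np.sum(sequence[:, i:i + W] * pssm) for i in range(L - W + 1)])

# ReLU thresholding of the window scores, followed by max pooling over the whole sequence.
sequence_score = np.maximum(window_scores, 0.0).max()
print(sequence_score)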

Getting a DragoNN model

The main DragoNN model class is SequenceDNN, which provides a simple interface to a range of models and methods to train, test, and interpret DragoNNs. SequenceDNN uses Keras, a deep learning library that runs on top of Theano and TensorFlow, two popular deep learning frameworks.
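For orientation, a comparable single-task architecture can be written directly in present-day Keras along the following lines. This is only a conceptual sketch of the kind of model SequenceDNN builds (using default-like settings of 15 filters of width 15 and pool width 35); the tutorial itself runs on an older Keras API, so this is not the exact SequenceDNN implementation:

from tensorflow.keras import layers, models

model = models.Sequential([
    # one convolutional layer over a one-hot sequence of shape (length, 4)
    layers.Conv1D(filters=15, kernel_size=15, activation="relu", input_shape=(500, 4)),
    # max pooling of width 35, then a single sigmoid output for binary classification
    layers.MaxPooling1D(pool_size=35),
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()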

To get a DragoNN model we:

1) Define the DragoNN architecture parameters
    - obtain description of architecture parameters using the inspect_SequenceDNN() function
2) Call the get_SequenceDNN function, which takes as input the DragoNN architecture parameters, and outputs a 
randomly initialized DragoNN model.

To get a description of the architecture parameters we use the inspect_SequenceDNN function, which outputs documentation for the model class including the architecture parameters:


In [15]:
inspect_SequenceDNN()


Sequence DNN models.

Parameters
----------
seq_length : int, optional
    length of input sequence.
keras_model : instance of keras.models.Sequential, optional
    seq_length or keras_model must be specified.
num_tasks : int, optional
    number of tasks. Default: 1.
num_filters : list[int] | tuple[int]
    number of convolutional filters in each layer. Default: (15,).
conv_width : list[int] | tuple[int]
    width of each layer's convolutional filters. Default: (15,).
pool_width : int
    width of max pooling after the last layer. Default: 35.
L1 : float
    strength of L1 penalty.
dropout : float
    dropout probability in every convolutional layer. Default: 0.
verbose: int
    Verbosity level during training. Valid values: 0, 1, 2.

Returns
-------
Compiled DNN model.

Available methods:

deeplift
get_sequence_filters
in_silico_mutagenesis
plot_architecture
plot_deeplift
plot_in_silico_mutagenesis
predict
save
score
test
train

"Available methods" display what can be done with a SequenceDNN model. These include common operations such as training and testing the model, and more complex operations such as extracting insight from trained models. We define a simple DragoNN model with one convolutional layer with one convolutional filter, followed by maxpooling of width 35.


In [16]:
one_filter_dragonn_parameters = {
    'seq_length': 500,
    'num_filters': [1],
    'conv_width': [45],
    'pool_width': 45}

We get a randomly initialized DragoNN model by calling the get_SequenceDNN function with one_filter_dragonn_parameters as the input.


In [17]:
one_filter_dragonn = get_SequenceDNN(one_filter_dragonn_parameters)

Training a DragoNN model

Next, we train the one_filter_dragonn by calling train_SequenceDNN with one_filter_dragonn and simulation_data as the inputs. In each epoch, the one_filter_dragonn performs a complete pass over the training data and updates its parameters to minimize the loss, which quantifies the error in the model's predictions. After each epoch, the code prints performance metrics for the one_filter_dragonn on the validation data. Training stops once the loss on the validation data stops improving for multiple consecutive epochs. The performance metrics include balanced accuracy, area under the receiver operating characteristic curve (auROC), area under the precision-recall curve (auPRC), area under the precision-recall-gain curve (auPRG), and recall at multiple false discovery rates (Recall at FDR).
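For reference, these metrics can be computed from a model's predicted probabilities with scikit-learn. The sketch below uses made-up labels and predictions rather than dragonn's internal metrics code, and treats "recall at X% FDR" as the highest recall achievable at precision of at least 1 - X, which is one common definition:

import numpy as np
from sklearn.metrics import (roc_auc_score, average_precision_score,
                             balanced_accuracy_score, precision_recall_curve)

y_true = np.array([0, 0, 1, 1, 1, 0])                # placeholder labels
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])   # placeholder predicted probabilities

auroc = roc_auc_score(y_true, y_prob)
auprc = average_precision_score(y_true, y_prob)      # a common estimate of auPRC
bal_acc = balanced_accuracy_score(y_true, (y_prob > 0.5).astype(int))

precision, recall, _ = precision_recall_curve(y_true, y_prob)
recall_at_10pct_fdr = recall[precision >= 0.9].max()

print(auroc, auprc, bal_acc, recall_at_10pct_fdr)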


In [18]:
train_SequenceDNN(one_filter_dragonn, simulation_data)


Training model (* indicates new best result)...
Epoch 1:
Train Loss: 0.6989	Balanced Accuracy: 50.58%	 auROC: 0.504	 auPRC: 0.501
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|0.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.7027	Balanced Accuracy: 48.84%	 auROC: 0.488	 auPRC: 0.488
	Recall at 5%|10%|20% FDR: 0.1%|0.1%|0.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 2:
Train Loss: 0.6938	Balanced Accuracy: 50.78%	 auROC: 0.511	 auPRC: 0.506
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|0.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6955	Balanced Accuracy: 49.16%	 auROC: 0.485	 auPRC: 0.494
	Recall at 5%|10%|20% FDR: 0.1%|0.1%|0.4%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 3:
Train Loss: 0.6929	Balanced Accuracy: 51.19%	 auROC: 0.517	 auPRC: 0.513
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|0.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6940	Balanced Accuracy: 48.94%	 auROC: 0.485	 auPRC: 0.493
	Recall at 5%|10%|20% FDR: 0.1%|0.1%|0.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 4:
Train Loss: 0.6922	Balanced Accuracy: 51.85%	 auROC: 0.524	 auPRC: 0.520
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|0.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6948	Balanced Accuracy: 48.94%	 auROC: 0.483	 auPRC: 0.490
	Recall at 5%|10%|20% FDR: 0.1%|0.1%|0.1%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 5:
Train Loss: 0.6915	Balanced Accuracy: 52.34%	 auROC: 0.532	 auPRC: 0.529
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|0.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6950	Balanced Accuracy: 49.19%	 auROC: 0.482	 auPRC: 0.489
	Recall at 5%|10%|20% FDR: 0.1%|0.1%|0.1%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 6:
Train Loss: 0.6910	Balanced Accuracy: 52.43%	 auROC: 0.536	 auPRC: 0.534
	Recall at 5%|10%|20% FDR: 0.1%|0.1%|0.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6952	Balanced Accuracy: 49.25%	 auROC: 0.485	 auPRC: 0.490
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|0.0%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 7:
Train Loss: 0.6904	Balanced Accuracy: 52.41%	 auROC: 0.540	 auPRC: 0.539
	Recall at 5%|10%|20% FDR: 0.1%|0.1%|0.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6955	Balanced Accuracy: 48.62%	 auROC: 0.485	 auPRC: 0.491
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|0.0%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 8:
Train Loss: 0.6903	Balanced Accuracy: 53.82%	 auROC: 0.548	 auPRC: 0.543
	Recall at 5%|10%|20% FDR: 0.1%|0.1%|0.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6949	Balanced Accuracy: 49.44%	 auROC: 0.487	 auPRC: 0.492
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|0.0%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 9:
Train Loss: 0.6896	Balanced Accuracy: 53.54%	 auROC: 0.549	 auPRC: 0.546
	Recall at 5%|10%|20% FDR: 0.1%|0.1%|0.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6954	Balanced Accuracy: 49.03%	 auROC: 0.492	 auPRC: 0.494
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|0.0%	 Num Positives: 1600	 Num Negatives: 1600
Finished training after 9 epochs.

We can see that the validation loss is not decreasing and the validation auROC is not improving, which indicates that this model is not learning. A simple plot of the learning curve, showing the loss on the training and validation data over the course of training, demonstrates this visually:
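The plot itself is just the per-epoch train and validation losses. For example, using the losses printed above for the one_filter_dragonn, you could reproduce the idea with matplotlib (SequenceDNN_learning_curve does this for you from the model's stored metrics):

import matplotlib.pyplot as plt

# Per-epoch losses copied from the training log above.
train_losses = [0.6989, 0.6938, 0.6929, 0.6922, 0.6915, 0.6910, 0.6904, 0.6903, 0.6896]
valid_losses = [0.7027, 0.6955, 0.6940, 0.6948, 0.6950, 0.6952, 0.6955, 0.6949, 0.6954]

plt.plot(range(1, len(train_losses) + 1), train_losses, label="train loss")
plt.plot(range(1, len(valid_losses) + 1), valid_losses, label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()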


In [19]:
SequenceDNN_learning_curve(one_filter_dragonn)


A multi-filter DragoNN model

Next, we modify the model to have 15 convolutional filters instead of just one filter. Will the model learn now?


In [20]:
multi_filter_dragonn_parameters = {
    'seq_length': 500,
    'num_filters': [15], ## notice the change from 1 filter to 15 filters
    'conv_width': [45],
    'pool_width': 45,
    'dropout': 0.1}
multi_filter_dragonn = get_SequenceDNN(multi_filter_dragonn_parameters)
train_SequenceDNN(multi_filter_dragonn, simulation_data)
SequenceDNN_learning_curve(multi_filter_dragonn)


Training model (* indicates new best result)...
Epoch 1:
Train Loss: 0.7016	Balanced Accuracy: 52.69%	 auROC: 0.546	 auPRC: 0.534
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|0.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.7117	Balanced Accuracy: 51.28%	 auROC: 0.508	 auPRC: 0.517
	Recall at 5%|10%|20% FDR: 0.1%|0.1%|0.2%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 2:
Train Loss: 0.6798	Balanced Accuracy: 56.49%	 auROC: 0.593	 auPRC: 0.576
	Recall at 5%|10%|20% FDR: 0.1%|0.1%|0.4%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6985	Balanced Accuracy: 52.56%	 auROC: 0.530	 auPRC: 0.534
	Recall at 5%|10%|20% FDR: 0.1%|0.1%|1.2%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 3:
Train Loss: 0.6678	Balanced Accuracy: 59.42%	 auROC: 0.633	 auPRC: 0.615
	Recall at 5%|10%|20% FDR: 0.2%|0.3%|1.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6926	Balanced Accuracy: 53.47%	 auROC: 0.552	 auPRC: 0.553
	Recall at 5%|10%|20% FDR: 0.1%|0.8%|0.8%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 4:
Train Loss: 0.6609	Balanced Accuracy: 60.28%	 auROC: 0.664	 auPRC: 0.645
	Recall at 5%|10%|20% FDR: 0.1%|0.5%|3.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6908	Balanced Accuracy: 54.34%	 auROC: 0.570	 auPRC: 0.566
	Recall at 5%|10%|20% FDR: 0.2%|0.2%|1.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 5:
Train Loss: 0.6501	Balanced Accuracy: 62.04%	 auROC: 0.688	 auPRC: 0.670
	Recall at 5%|10%|20% FDR: 0.4%|0.9%|7.4%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6873	Balanced Accuracy: 55.41%	 auROC: 0.586	 auPRC: 0.578
	Recall at 5%|10%|20% FDR: 0.2%|0.2%|1.2%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 6:
Train Loss: 0.6380	Balanced Accuracy: 64.95%	 auROC: 0.707	 auPRC: 0.690
	Recall at 5%|10%|20% FDR: 0.4%|1.2%|13.7%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6783	Balanced Accuracy: 57.59%	 auROC: 0.600	 auPRC: 0.590
	Recall at 5%|10%|20% FDR: 0.3%|0.6%|1.9%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 7:
Train Loss: 0.6286	Balanced Accuracy: 66.19%	 auROC: 0.721	 auPRC: 0.705
	Recall at 5%|10%|20% FDR: 0.0%|2.1%|18.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6740	Balanced Accuracy: 58.50%	 auROC: 0.611	 auPRC: 0.600
	Recall at 5%|10%|20% FDR: 0.2%|0.8%|1.7%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 8:
Train Loss: 0.6201	Balanced Accuracy: 67.32%	 auROC: 0.734	 auPRC: 0.718
	Recall at 5%|10%|20% FDR: 0.0%|5.6%|22.7%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6698	Balanced Accuracy: 58.81%	 auROC: 0.622	 auPRC: 0.607
	Recall at 5%|10%|20% FDR: 0.2%|0.7%|1.6%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 9:
Train Loss: 0.6120	Balanced Accuracy: 68.22%	 auROC: 0.745	 auPRC: 0.730
	Recall at 5%|10%|20% FDR: 0.0%|6.9%|27.3%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6658	Balanced Accuracy: 59.72%	 auROC: 0.632	 auPRC: 0.615
	Recall at 5%|10%|20% FDR: 0.3%|0.9%|1.4%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 10:
Train Loss: 0.6042	Balanced Accuracy: 68.67%	 auROC: 0.755	 auPRC: 0.741
	Recall at 5%|10%|20% FDR: 0.0%|8.3%|32.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6613	Balanced Accuracy: 60.44%	 auROC: 0.642	 auPRC: 0.622
	Recall at 5%|10%|20% FDR: 0.3%|0.9%|1.3%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 11:
Train Loss: 0.6002	Balanced Accuracy: 68.04%	 auROC: 0.766	 auPRC: 0.751
	Recall at 5%|10%|20% FDR: 3.2%|9.2%|35.4%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6624	Balanced Accuracy: 59.66%	 auROC: 0.651	 auPRC: 0.631
	Recall at 5%|10%|20% FDR: 0.4%|1.1%|2.1%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 12:
Train Loss: 0.5866	Balanced Accuracy: 70.72%	 auROC: 0.777	 auPRC: 0.763
	Recall at 5%|10%|20% FDR: 0.0%|12.7%|39.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6518	Balanced Accuracy: 61.03%	 auROC: 0.660	 auPRC: 0.638
	Recall at 5%|10%|20% FDR: 0.5%|0.9%|1.8%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 13:
Train Loss: 0.5765	Balanced Accuracy: 71.68%	 auROC: 0.789	 auPRC: 0.775
	Recall at 5%|10%|20% FDR: 3.3%|14.0%|44.5%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6452	Balanced Accuracy: 61.78%	 auROC: 0.673	 auPRC: 0.648
	Recall at 5%|10%|20% FDR: 0.5%|0.8%|2.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 14:
Train Loss: 0.5671	Balanced Accuracy: 72.75%	 auROC: 0.804	 auPRC: 0.789
	Recall at 5%|10%|20% FDR: 3.9%|18.7%|50.3%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6371	Balanced Accuracy: 63.72%	 auROC: 0.688	 auPRC: 0.661
	Recall at 5%|10%|20% FDR: 0.7%|0.8%|2.6%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 15:
Train Loss: 0.5538	Balanced Accuracy: 73.43%	 auROC: 0.820	 auPRC: 0.804
	Recall at 5%|10%|20% FDR: 5.9%|22.2%|56.5%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6276	Balanced Accuracy: 63.62%	 auROC: 0.705	 auPRC: 0.675
	Recall at 5%|10%|20% FDR: 0.6%|0.9%|2.6%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 16:
Train Loss: 0.5396	Balanced Accuracy: 74.25%	 auROC: 0.837	 auPRC: 0.821
	Recall at 5%|10%|20% FDR: 7.9%|26.5%|62.5%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6172	Balanced Accuracy: 64.53%	 auROC: 0.724	 auPRC: 0.691
	Recall at 5%|10%|20% FDR: 0.6%|0.9%|3.3%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 17:
Train Loss: 0.5182	Balanced Accuracy: 76.96%	 auROC: 0.853	 auPRC: 0.837
	Recall at 5%|10%|20% FDR: 13.0%|34.1%|67.8%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.5974	Balanced Accuracy: 67.56%	 auROC: 0.744	 auPRC: 0.710
	Recall at 5%|10%|20% FDR: 0.6%|0.8%|6.0%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 18:
Train Loss: 0.4996	Balanced Accuracy: 78.18%	 auROC: 0.868	 auPRC: 0.852
	Recall at 5%|10%|20% FDR: 13.0%|39.7%|72.7%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.5815	Balanced Accuracy: 68.88%	 auROC: 0.764	 auPRC: 0.729
	Recall at 5%|10%|20% FDR: 0.8%|0.9%|16.6%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 19:
Train Loss: 0.4799	Balanced Accuracy: 79.74%	 auROC: 0.882	 auPRC: 0.866
	Recall at 5%|10%|20% FDR: 15.3%|45.5%|78.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.5640	Balanced Accuracy: 70.59%	 auROC: 0.782	 auPRC: 0.748
	Recall at 5%|10%|20% FDR: 0.6%|1.2%|21.9%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 20:
Train Loss: 0.4609	Balanced Accuracy: 81.47%	 auROC: 0.896	 auPRC: 0.881
	Recall at 5%|10%|20% FDR: 21.6%|52.8%|83.5%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.5465	Balanced Accuracy: 72.69%	 auROC: 0.801	 auPRC: 0.767
	Recall at 5%|10%|20% FDR: 0.8%|1.2%|31.0%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 21:
Train Loss: 0.4499	Balanced Accuracy: 82.16%	 auROC: 0.908	 auPRC: 0.894
	Recall at 5%|10%|20% FDR: 28.2%|58.9%|87.5%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.5357	Balanced Accuracy: 74.50%	 auROC: 0.819	 auPRC: 0.786
	Recall at 5%|10%|20% FDR: 0.9%|6.2%|49.9%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 22:
Train Loss: 0.4330	Balanced Accuracy: 83.07%	 auROC: 0.916	 auPRC: 0.903
	Recall at 5%|10%|20% FDR: 32.7%|62.4%|89.7%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.5200	Balanced Accuracy: 76.12%	 auROC: 0.833	 auPRC: 0.801
	Recall at 5%|10%|20% FDR: 0.9%|7.9%|55.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 23:
Train Loss: 0.4097	Balanced Accuracy: 84.59%	 auROC: 0.924	 auPRC: 0.912
	Recall at 5%|10%|20% FDR: 39.5%|66.9%|91.8%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4978	Balanced Accuracy: 77.28%	 auROC: 0.846	 auPRC: 0.815
	Recall at 5%|10%|20% FDR: 0.9%|5.0%|64.2%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 24:
Train Loss: 0.3954	Balanced Accuracy: 85.31%	 auROC: 0.929	 auPRC: 0.919
	Recall at 5%|10%|20% FDR: 42.7%|71.7%|92.9%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4849	Balanced Accuracy: 78.19%	 auROC: 0.856	 auPRC: 0.826
	Recall at 5%|10%|20% FDR: 3.7%|9.8%|69.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 25:
Train Loss: 0.3856	Balanced Accuracy: 85.63%	 auROC: 0.935	 auPRC: 0.925
	Recall at 5%|10%|20% FDR: 46.8%|75.1%|94.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4764	Balanced Accuracy: 79.41%	 auROC: 0.864	 auPRC: 0.836
	Recall at 5%|10%|20% FDR: 3.2%|23.1%|74.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 26:
Train Loss: 0.3650	Balanced Accuracy: 86.87%	 auROC: 0.939	 auPRC: 0.929
	Recall at 5%|10%|20% FDR: 48.5%|78.4%|94.7%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4569	Balanced Accuracy: 80.06%	 auROC: 0.872	 auPRC: 0.845
	Recall at 5%|10%|20% FDR: 3.4%|30.8%|77.3%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 27:
Train Loss: 0.3578	Balanced Accuracy: 86.85%	 auROC: 0.942	 auPRC: 0.933
	Recall at 5%|10%|20% FDR: 50.7%|80.8%|95.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4510	Balanced Accuracy: 80.81%	 auROC: 0.877	 auPRC: 0.852
	Recall at 5%|10%|20% FDR: 5.1%|33.4%|80.7%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 28:
Train Loss: 0.3626	Balanced Accuracy: 86.30%	 auROC: 0.945	 auPRC: 0.937
	Recall at 5%|10%|20% FDR: 53.7%|82.2%|95.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4566	Balanced Accuracy: 80.31%	 auROC: 0.883	 auPRC: 0.859
	Recall at 5%|10%|20% FDR: 5.5%|34.4%|83.3%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 29:
Train Loss: 0.3407	Balanced Accuracy: 87.83%	 auROC: 0.948	 auPRC: 0.940
	Recall at 5%|10%|20% FDR: 55.9%|84.1%|96.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4343	Balanced Accuracy: 82.53%	 auROC: 0.888	 auPRC: 0.866
	Recall at 5%|10%|20% FDR: 13.8%|38.4%|85.9%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 30:
Train Loss: 0.3297	Balanced Accuracy: 88.27%	 auROC: 0.950	 auPRC: 0.942
	Recall at 5%|10%|20% FDR: 57.6%|85.1%|96.3%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4240	Balanced Accuracy: 82.47%	 auROC: 0.892	 auPRC: 0.871
	Recall at 5%|10%|20% FDR: 14.8%|39.1%|86.7%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 31:
Train Loss: 0.3256	Balanced Accuracy: 88.54%	 auROC: 0.952	 auPRC: 0.945
	Recall at 5%|10%|20% FDR: 60.8%|85.6%|96.4%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4209	Balanced Accuracy: 82.62%	 auROC: 0.895	 auPRC: 0.875
	Recall at 5%|10%|20% FDR: 18.5%|42.9%|87.0%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 32:
Train Loss: 0.3214	Balanced Accuracy: 88.67%	 auROC: 0.954	 auPRC: 0.947
	Recall at 5%|10%|20% FDR: 63.0%|86.7%|96.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4171	Balanced Accuracy: 82.75%	 auROC: 0.898	 auPRC: 0.879
	Recall at 5%|10%|20% FDR: 20.6%|45.0%|87.7%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 33:
Train Loss: 0.3090	Balanced Accuracy: 89.26%	 auROC: 0.955	 auPRC: 0.948
	Recall at 5%|10%|20% FDR: 63.9%|87.1%|96.8%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4056	Balanced Accuracy: 83.38%	 auROC: 0.901	 auPRC: 0.882
	Recall at 5%|10%|20% FDR: 20.4%|48.8%|88.8%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 34:
Train Loss: 0.3126	Balanced Accuracy: 88.84%	 auROC: 0.956	 auPRC: 0.949
	Recall at 5%|10%|20% FDR: 65.5%|87.6%|96.9%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4116	Balanced Accuracy: 83.28%	 auROC: 0.901	 auPRC: 0.883
	Recall at 5%|10%|20% FDR: 19.9%|47.1%|89.3%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 35:
Train Loss: 0.3024	Balanced Accuracy: 89.59%	 auROC: 0.957	 auPRC: 0.951
	Recall at 5%|10%|20% FDR: 67.2%|88.1%|97.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4033	Balanced Accuracy: 83.25%	 auROC: 0.902	 auPRC: 0.885
	Recall at 5%|10%|20% FDR: 18.9%|50.1%|89.3%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 36:
Train Loss: 0.3106	Balanced Accuracy: 88.62%	 auROC: 0.958	 auPRC: 0.952
	Recall at 5%|10%|20% FDR: 67.9%|88.8%|97.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4136	Balanced Accuracy: 82.84%	 auROC: 0.904	 auPRC: 0.886
	Recall at 5%|10%|20% FDR: 19.3%|50.9%|89.5%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 37:
Train Loss: 0.2975	Balanced Accuracy: 89.70%	 auROC: 0.959	 auPRC: 0.953
	Recall at 5%|10%|20% FDR: 69.3%|88.9%|97.3%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4005	Balanced Accuracy: 83.75%	 auROC: 0.906	 auPRC: 0.889
	Recall at 5%|10%|20% FDR: 16.8%|52.7%|89.3%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 38:
Train Loss: 0.3048	Balanced Accuracy: 88.84%	 auROC: 0.960	 auPRC: 0.954
	Recall at 5%|10%|20% FDR: 70.6%|89.1%|97.4%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4104	Balanced Accuracy: 83.25%	 auROC: 0.906	 auPRC: 0.889
	Recall at 5%|10%|20% FDR: 16.5%|54.2%|89.6%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 39:
Train Loss: 0.2836	Balanced Accuracy: 90.08%	 auROC: 0.961	 auPRC: 0.955
	Recall at 5%|10%|20% FDR: 72.4%|89.3%|97.4%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3890	Balanced Accuracy: 84.12%	 auROC: 0.908	 auPRC: 0.891
	Recall at 5%|10%|20% FDR: 15.9%|52.6%|90.2%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 40:
Train Loss: 0.2803	Balanced Accuracy: 90.09%	 auROC: 0.962	 auPRC: 0.956
	Recall at 5%|10%|20% FDR: 73.4%|89.9%|97.4%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3886	Balanced Accuracy: 83.19%	 auROC: 0.908	 auPRC: 0.892
	Recall at 5%|10%|20% FDR: 17.0%|55.0%|89.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 41:
Train Loss: 0.2882	Balanced Accuracy: 89.84%	 auROC: 0.963	 auPRC: 0.957
	Recall at 5%|10%|20% FDR: 73.6%|90.1%|97.5%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3972	Balanced Accuracy: 83.62%	 auROC: 0.909	 auPRC: 0.893
	Recall at 5%|10%|20% FDR: 20.6%|57.1%|89.6%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 42:
Train Loss: 0.2965	Balanced Accuracy: 89.02%	 auROC: 0.963	 auPRC: 0.957
	Recall at 5%|10%|20% FDR: 73.6%|90.2%|97.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4077	Balanced Accuracy: 83.41%	 auROC: 0.910	 auPRC: 0.894
	Recall at 5%|10%|20% FDR: 19.2%|59.6%|90.6%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 43:
Train Loss: 0.2715	Balanced Accuracy: 90.47%	 auROC: 0.964	 auPRC: 0.959
	Recall at 5%|10%|20% FDR: 74.5%|90.7%|97.7%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3837	Balanced Accuracy: 83.97%	 auROC: 0.911	 auPRC: 0.895
	Recall at 5%|10%|20% FDR: 16.7%|60.9%|89.8%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 44:
Train Loss: 0.2732	Balanced Accuracy: 90.43%	 auROC: 0.965	 auPRC: 0.960
	Recall at 5%|10%|20% FDR: 76.4%|91.0%|97.7%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3866	Balanced Accuracy: 83.88%	 auROC: 0.911	 auPRC: 0.896
	Recall at 5%|10%|20% FDR: 18.8%|60.4%|90.1%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 45:
Train Loss: 0.2708	Balanced Accuracy: 90.56%	 auROC: 0.965	 auPRC: 0.961
	Recall at 5%|10%|20% FDR: 77.1%|91.5%|97.7%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3870	Balanced Accuracy: 83.75%	 auROC: 0.911	 auPRC: 0.896
	Recall at 5%|10%|20% FDR: 17.1%|61.9%|90.1%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 46:
Train Loss: 0.2655	Balanced Accuracy: 90.69%	 auROC: 0.966	 auPRC: 0.961
	Recall at 5%|10%|20% FDR: 77.3%|91.8%|97.9%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3824	Balanced Accuracy: 84.16%	 auROC: 0.912	 auPRC: 0.898
	Recall at 5%|10%|20% FDR: 21.0%|59.9%|90.2%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 47:
Train Loss: 0.2586	Balanced Accuracy: 90.90%	 auROC: 0.966	 auPRC: 0.962
	Recall at 5%|10%|20% FDR: 77.9%|92.0%|97.9%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3776	Balanced Accuracy: 84.12%	 auROC: 0.912	 auPRC: 0.898
	Recall at 5%|10%|20% FDR: 27.5%|60.4%|90.2%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 48:
Train Loss: 0.2701	Balanced Accuracy: 90.33%	 auROC: 0.967	 auPRC: 0.962
	Recall at 5%|10%|20% FDR: 78.4%|92.3%|98.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3889	Balanced Accuracy: 83.75%	 auROC: 0.914	 auPRC: 0.899
	Recall at 5%|10%|20% FDR: 28.8%|62.3%|89.9%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 49:
Train Loss: 0.2668	Balanced Accuracy: 90.46%	 auROC: 0.968	 auPRC: 0.963
	Recall at 5%|10%|20% FDR: 78.6%|92.7%|98.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3879	Balanced Accuracy: 83.84%	 auROC: 0.914	 auPRC: 0.899
	Recall at 5%|10%|20% FDR: 26.4%|63.8%|90.1%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 50:
Train Loss: 0.2658	Balanced Accuracy: 90.31%	 auROC: 0.968	 auPRC: 0.964
	Recall at 5%|10%|20% FDR: 78.9%|92.6%|98.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3872	Balanced Accuracy: 83.94%	 auROC: 0.915	 auPRC: 0.901
	Recall at 5%|10%|20% FDR: 21.2%|63.9%|90.3%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 51:
Train Loss: 0.2857	Balanced Accuracy: 88.97%	 auROC: 0.969	 auPRC: 0.965
	Recall at 5%|10%|20% FDR: 79.2%|92.8%|98.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4100	Balanced Accuracy: 83.56%	 auROC: 0.916	 auPRC: 0.901
	Recall at 5%|10%|20% FDR: 21.7%|65.6%|90.5%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 52:
Train Loss: 0.2469	Balanced Accuracy: 91.33%	 auROC: 0.970	 auPRC: 0.966
	Recall at 5%|10%|20% FDR: 80.8%|93.0%|98.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3720	Balanced Accuracy: 84.56%	 auROC: 0.916	 auPRC: 0.902
	Recall at 5%|10%|20% FDR: 31.5%|65.4%|90.6%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 53:
Train Loss: 0.2540	Balanced Accuracy: 90.98%	 auROC: 0.970	 auPRC: 0.966
	Recall at 5%|10%|20% FDR: 80.9%|93.2%|98.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3786	Balanced Accuracy: 84.09%	 auROC: 0.917	 auPRC: 0.903
	Recall at 5%|10%|20% FDR: 31.2%|65.5%|90.7%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 54:
Train Loss: 0.2477	Balanced Accuracy: 91.25%	 auROC: 0.971	 auPRC: 0.967
	Recall at 5%|10%|20% FDR: 81.7%|93.4%|98.3%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3738	Balanced Accuracy: 84.28%	 auROC: 0.918	 auPRC: 0.904
	Recall at 5%|10%|20% FDR: 31.1%|67.2%|91.3%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 55:
Train Loss: 0.2389	Balanced Accuracy: 91.71%	 auROC: 0.972	 auPRC: 0.968
	Recall at 5%|10%|20% FDR: 82.2%|93.7%|98.5%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3645	Balanced Accuracy: 84.66%	 auROC: 0.919	 auPRC: 0.906
	Recall at 5%|10%|20% FDR: 33.1%|68.2%|92.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 56:
Train Loss: 0.2505	Balanced Accuracy: 91.19%	 auROC: 0.972	 auPRC: 0.969
	Recall at 5%|10%|20% FDR: 82.6%|94.0%|98.4%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3763	Balanced Accuracy: 84.44%	 auROC: 0.920	 auPRC: 0.907
	Recall at 5%|10%|20% FDR: 34.9%|68.0%|92.1%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 57:
Train Loss: 0.2412	Balanced Accuracy: 91.44%	 auROC: 0.973	 auPRC: 0.969
	Recall at 5%|10%|20% FDR: 83.6%|94.4%|98.5%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3666	Balanced Accuracy: 84.97%	 auROC: 0.922	 auPRC: 0.909
	Recall at 5%|10%|20% FDR: 35.4%|69.3%|92.4%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 58:
Train Loss: 0.2584	Balanced Accuracy: 90.66%	 auROC: 0.974	 auPRC: 0.970
	Recall at 5%|10%|20% FDR: 84.5%|94.4%|98.5%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3856	Balanced Accuracy: 84.78%	 auROC: 0.923	 auPRC: 0.910
	Recall at 5%|10%|20% FDR: 38.1%|69.2%|92.6%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 59:
Train Loss: 0.2377	Balanced Accuracy: 91.61%	 auROC: 0.974	 auPRC: 0.971
	Recall at 5%|10%|20% FDR: 84.7%|94.7%|98.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3629	Balanced Accuracy: 85.06%	 auROC: 0.925	 auPRC: 0.912
	Recall at 5%|10%|20% FDR: 39.9%|71.2%|93.5%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 60:
Train Loss: 0.2264	Balanced Accuracy: 92.12%	 auROC: 0.975	 auPRC: 0.972
	Recall at 5%|10%|20% FDR: 85.3%|94.8%|98.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3525	Balanced Accuracy: 85.34%	 auROC: 0.926	 auPRC: 0.914
	Recall at 5%|10%|20% FDR: 39.6%|71.6%|93.6%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 61:
Train Loss: 0.2233	Balanced Accuracy: 92.33%	 auROC: 0.976	 auPRC: 0.973
	Recall at 5%|10%|20% FDR: 86.0%|95.1%|98.7%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3474	Balanced Accuracy: 85.31%	 auROC: 0.928	 auPRC: 0.916
	Recall at 5%|10%|20% FDR: 39.9%|72.8%|93.7%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 62:
Train Loss: 0.2211	Balanced Accuracy: 92.34%	 auROC: 0.977	 auPRC: 0.974
	Recall at 5%|10%|20% FDR: 86.9%|95.3%|98.8%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3456	Balanced Accuracy: 85.47%	 auROC: 0.929	 auPRC: 0.918
	Recall at 5%|10%|20% FDR: 41.0%|73.1%|93.9%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 63:
Train Loss: 0.2176	Balanced Accuracy: 92.50%	 auROC: 0.977	 auPRC: 0.975
	Recall at 5%|10%|20% FDR: 87.3%|95.5%|98.9%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3420	Balanced Accuracy: 85.69%	 auROC: 0.931	 auPRC: 0.920
	Recall at 5%|10%|20% FDR: 48.9%|75.1%|94.2%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 64:
Train Loss: 0.2127	Balanced Accuracy: 92.83%	 auROC: 0.978	 auPRC: 0.976
	Recall at 5%|10%|20% FDR: 88.0%|95.8%|99.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3378	Balanced Accuracy: 85.78%	 auROC: 0.932	 auPRC: 0.921
	Recall at 5%|10%|20% FDR: 51.1%|75.3%|94.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 65:
Train Loss: 0.2044	Balanced Accuracy: 93.00%	 auROC: 0.979	 auPRC: 0.977
	Recall at 5%|10%|20% FDR: 88.9%|96.1%|99.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3308	Balanced Accuracy: 85.62%	 auROC: 0.933	 auPRC: 0.923
	Recall at 5%|10%|20% FDR: 53.3%|75.8%|94.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 66:
Train Loss: 0.2025	Balanced Accuracy: 93.23%	 auROC: 0.980	 auPRC: 0.978
	Recall at 5%|10%|20% FDR: 89.9%|96.1%|99.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3283	Balanced Accuracy: 86.16%	 auROC: 0.934	 auPRC: 0.924
	Recall at 5%|10%|20% FDR: 53.2%|77.0%|94.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 67:
Train Loss: 0.2087	Balanced Accuracy: 92.72%	 auROC: 0.980	 auPRC: 0.978
	Recall at 5%|10%|20% FDR: 90.2%|96.2%|99.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3346	Balanced Accuracy: 86.25%	 auROC: 0.936	 auPRC: 0.926
	Recall at 5%|10%|20% FDR: 54.9%|78.1%|94.8%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 68:
Train Loss: 0.2033	Balanced Accuracy: 93.01%	 auROC: 0.981	 auPRC: 0.979
	Recall at 5%|10%|20% FDR: 90.4%|96.4%|99.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3290	Balanced Accuracy: 86.59%	 auROC: 0.937	 auPRC: 0.927
	Recall at 5%|10%|20% FDR: 56.7%|78.6%|94.5%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 69:
Train Loss: 0.1974	Balanced Accuracy: 93.22%	 auROC: 0.982	 auPRC: 0.980
	Recall at 5%|10%|20% FDR: 91.0%|96.6%|99.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3231	Balanced Accuracy: 86.59%	 auROC: 0.939	 auPRC: 0.929
	Recall at 5%|10%|20% FDR: 58.8%|78.5%|94.9%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 70:
Train Loss: 0.1874	Balanced Accuracy: 93.71%	 auROC: 0.982	 auPRC: 0.981
	Recall at 5%|10%|20% FDR: 91.1%|96.7%|99.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3129	Balanced Accuracy: 86.44%	 auROC: 0.940	 auPRC: 0.931
	Recall at 5%|10%|20% FDR: 59.7%|79.9%|95.2%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 71:
Train Loss: 0.1862	Balanced Accuracy: 93.77%	 auROC: 0.983	 auPRC: 0.981
	Recall at 5%|10%|20% FDR: 91.6%|96.7%|99.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3109	Balanced Accuracy: 86.81%	 auROC: 0.941	 auPRC: 0.932
	Recall at 5%|10%|20% FDR: 59.2%|80.4%|95.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 72:
Train Loss: 0.1828	Balanced Accuracy: 93.95%	 auROC: 0.983	 auPRC: 0.982
	Recall at 5%|10%|20% FDR: 91.8%|97.0%|99.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3071	Balanced Accuracy: 87.03%	 auROC: 0.943	 auPRC: 0.934
	Recall at 5%|10%|20% FDR: 58.9%|81.1%|95.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 73:
Train Loss: 0.1838	Balanced Accuracy: 93.87%	 auROC: 0.984	 auPRC: 0.982
	Recall at 5%|10%|20% FDR: 92.3%|97.2%|99.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3077	Balanced Accuracy: 87.22%	 auROC: 0.944	 auPRC: 0.935
	Recall at 5%|10%|20% FDR: 59.7%|82.4%|95.6%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 74:
Train Loss: 0.1890	Balanced Accuracy: 93.54%	 auROC: 0.984	 auPRC: 0.983
	Recall at 5%|10%|20% FDR: 92.9%|97.3%|99.3%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3130	Balanced Accuracy: 87.59%	 auROC: 0.945	 auPRC: 0.937
	Recall at 5%|10%|20% FDR: 60.8%|82.2%|95.9%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 75:
Train Loss: 0.1758	Balanced Accuracy: 94.35%	 auROC: 0.985	 auPRC: 0.984
	Recall at 5%|10%|20% FDR: 93.1%|97.5%|99.3%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2992	Balanced Accuracy: 88.03%	 auROC: 0.946	 auPRC: 0.938
	Recall at 5%|10%|20% FDR: 63.1%|81.7%|96.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 76:
Train Loss: 0.1726	Balanced Accuracy: 94.43%	 auROC: 0.986	 auPRC: 0.985
	Recall at 5%|10%|20% FDR: 93.7%|97.7%|99.3%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2961	Balanced Accuracy: 87.97%	 auROC: 0.947	 auPRC: 0.939
	Recall at 5%|10%|20% FDR: 63.9%|82.1%|95.8%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 77:
Train Loss: 0.1722	Balanced Accuracy: 94.28%	 auROC: 0.986	 auPRC: 0.985
	Recall at 5%|10%|20% FDR: 93.6%|97.7%|99.4%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2943	Balanced Accuracy: 88.34%	 auROC: 0.949	 auPRC: 0.941
	Recall at 5%|10%|20% FDR: 64.2%|83.5%|96.4%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 78:
Train Loss: 0.1722	Balanced Accuracy: 94.25%	 auROC: 0.986	 auPRC: 0.985
	Recall at 5%|10%|20% FDR: 93.9%|97.9%|99.4%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2954	Balanced Accuracy: 88.06%	 auROC: 0.950	 auPRC: 0.942
	Recall at 5%|10%|20% FDR: 64.0%|83.7%|96.4%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 79:
Train Loss: 0.1649	Balanced Accuracy: 94.65%	 auROC: 0.987	 auPRC: 0.986
	Recall at 5%|10%|20% FDR: 94.2%|97.9%|99.4%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2883	Balanced Accuracy: 88.50%	 auROC: 0.950	 auPRC: 0.942
	Recall at 5%|10%|20% FDR: 63.7%|83.9%|96.3%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 80:
Train Loss: 0.1677	Balanced Accuracy: 94.45%	 auROC: 0.987	 auPRC: 0.986
	Recall at 5%|10%|20% FDR: 94.5%|98.1%|99.4%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2912	Balanced Accuracy: 88.56%	 auROC: 0.951	 auPRC: 0.943
	Recall at 5%|10%|20% FDR: 65.7%|84.7%|96.4%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 81:
Train Loss: 0.1596	Balanced Accuracy: 94.86%	 auROC: 0.988	 auPRC: 0.987
	Recall at 5%|10%|20% FDR: 94.8%|98.3%|99.5%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2820	Balanced Accuracy: 88.78%	 auROC: 0.952	 auPRC: 0.945
	Recall at 5%|10%|20% FDR: 66.9%|85.3%|96.6%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 82:
Train Loss: 0.1536	Balanced Accuracy: 95.07%	 auROC: 0.988	 auPRC: 0.987
	Recall at 5%|10%|20% FDR: 95.0%|98.2%|99.5%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2763	Balanced Accuracy: 88.53%	 auROC: 0.953	 auPRC: 0.946
	Recall at 5%|10%|20% FDR: 69.2%|86.1%|96.6%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 83:
Train Loss: 0.1592	Balanced Accuracy: 94.71%	 auROC: 0.988	 auPRC: 0.988
	Recall at 5%|10%|20% FDR: 95.1%|98.2%|99.5%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2810	Balanced Accuracy: 89.00%	 auROC: 0.955	 auPRC: 0.947
	Recall at 5%|10%|20% FDR: 69.4%|87.0%|96.7%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 84:
Train Loss: 0.1736	Balanced Accuracy: 93.97%	 auROC: 0.989	 auPRC: 0.988
	Recall at 5%|10%|20% FDR: 95.3%|98.3%|99.5%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2977	Balanced Accuracy: 87.94%	 auROC: 0.955	 auPRC: 0.947
	Recall at 5%|10%|20% FDR: 69.7%|86.4%|96.9%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 85:
Train Loss: 0.1505	Balanced Accuracy: 95.08%	 auROC: 0.989	 auPRC: 0.989
	Recall at 5%|10%|20% FDR: 95.6%|98.3%|99.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2730	Balanced Accuracy: 89.44%	 auROC: 0.956	 auPRC: 0.948
	Recall at 5%|10%|20% FDR: 69.3%|86.9%|96.9%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 86:
Train Loss: 0.1650	Balanced Accuracy: 94.32%	 auROC: 0.989	 auPRC: 0.989
	Recall at 5%|10%|20% FDR: 95.6%|98.3%|99.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2889	Balanced Accuracy: 88.50%	 auROC: 0.957	 auPRC: 0.949
	Recall at 5%|10%|20% FDR: 68.7%|87.5%|97.1%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 87:
Train Loss: 0.1441	Balanced Accuracy: 95.30%	 auROC: 0.990	 auPRC: 0.989
	Recall at 5%|10%|20% FDR: 95.8%|98.4%|99.5%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2662	Balanced Accuracy: 89.34%	 auROC: 0.957	 auPRC: 0.950
	Recall at 5%|10%|20% FDR: 70.0%|87.9%|96.9%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 88:
Train Loss: 0.1722	Balanced Accuracy: 93.87%	 auROC: 0.990	 auPRC: 0.989
	Recall at 5%|10%|20% FDR: 95.9%|98.4%|99.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2967	Balanced Accuracy: 88.44%	 auROC: 0.958	 auPRC: 0.951
	Recall at 5%|10%|20% FDR: 69.9%|88.6%|97.1%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 89:
Train Loss: 0.1469	Balanced Accuracy: 95.13%	 auROC: 0.990	 auPRC: 0.990
	Recall at 5%|10%|20% FDR: 95.9%|98.4%|99.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2689	Balanced Accuracy: 89.56%	 auROC: 0.959	 auPRC: 0.952
	Recall at 5%|10%|20% FDR: 69.4%|89.2%|97.3%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 90:
Train Loss: 0.1486	Balanced Accuracy: 95.11%	 auROC: 0.990	 auPRC: 0.990
	Recall at 5%|10%|20% FDR: 96.0%|98.5%|99.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2715	Balanced Accuracy: 89.41%	 auROC: 0.959	 auPRC: 0.952
	Recall at 5%|10%|20% FDR: 68.8%|89.1%|97.2%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 91:
Train Loss: 0.1353	Balanced Accuracy: 95.53%	 auROC: 0.991	 auPRC: 0.990
	Recall at 5%|10%|20% FDR: 96.1%|98.6%|99.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2565	Balanced Accuracy: 90.06%	 auROC: 0.960	 auPRC: 0.953
	Recall at 5%|10%|20% FDR: 70.0%|89.9%|97.4%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 92:
Train Loss: 0.1461	Balanced Accuracy: 95.21%	 auROC: 0.991	 auPRC: 0.991
	Recall at 5%|10%|20% FDR: 96.4%|98.7%|99.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2697	Balanced Accuracy: 89.25%	 auROC: 0.960	 auPRC: 0.953
	Recall at 5%|10%|20% FDR: 69.8%|89.6%|97.3%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 93:
Train Loss: 0.1505	Balanced Accuracy: 94.90%	 auROC: 0.991	 auPRC: 0.991
	Recall at 5%|10%|20% FDR: 96.5%|98.7%|99.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2749	Balanced Accuracy: 89.31%	 auROC: 0.961	 auPRC: 0.954
	Recall at 5%|10%|20% FDR: 73.0%|89.9%|97.5%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 94:
Train Loss: 0.1543	Balanced Accuracy: 94.60%	 auROC: 0.991	 auPRC: 0.991
	Recall at 5%|10%|20% FDR: 96.6%|98.7%|99.7%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2795	Balanced Accuracy: 89.06%	 auROC: 0.961	 auPRC: 0.954
	Recall at 5%|10%|20% FDR: 74.1%|89.6%|97.6%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 95:
Train Loss: 0.1410	Balanced Accuracy: 95.32%	 auROC: 0.991	 auPRC: 0.991
	Recall at 5%|10%|20% FDR: 96.6%|98.7%|99.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2654	Balanced Accuracy: 89.88%	 auROC: 0.961	 auPRC: 0.955
	Recall at 5%|10%|20% FDR: 71.1%|90.1%|97.6%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 96:
Train Loss: 0.1314	Balanced Accuracy: 95.73%	 auROC: 0.992	 auPRC: 0.991
	Recall at 5%|10%|20% FDR: 96.7%|98.8%|99.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2545	Balanced Accuracy: 90.38%	 auROC: 0.962	 auPRC: 0.955
	Recall at 5%|10%|20% FDR: 74.1%|90.1%|97.7%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 97:
Train Loss: 0.1339	Balanced Accuracy: 95.61%	 auROC: 0.992	 auPRC: 0.991
	Recall at 5%|10%|20% FDR: 96.9%|98.8%|99.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2579	Balanced Accuracy: 90.38%	 auROC: 0.962	 auPRC: 0.956
	Recall at 5%|10%|20% FDR: 72.1%|91.0%|97.8%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 98:
Train Loss: 0.1343	Balanced Accuracy: 95.60%	 auROC: 0.992	 auPRC: 0.992
	Recall at 5%|10%|20% FDR: 97.0%|98.8%|99.7%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2610	Balanced Accuracy: 90.34%	 auROC: 0.962	 auPRC: 0.955
	Recall at 5%|10%|20% FDR: 73.2%|90.4%|97.8%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 99:
Train Loss: 0.1351	Balanced Accuracy: 95.48%	 auROC: 0.992	 auPRC: 0.992
	Recall at 5%|10%|20% FDR: 97.0%|98.9%|99.7%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2624	Balanced Accuracy: 90.16%	 auROC: 0.962	 auPRC: 0.956
	Recall at 5%|10%|20% FDR: 74.8%|90.8%|97.8%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 100:
Train Loss: 0.1456	Balanced Accuracy: 94.85%	 auROC: 0.992	 auPRC: 0.992
	Recall at 5%|10%|20% FDR: 97.0%|98.9%|99.7%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2729	Balanced Accuracy: 89.62%	 auROC: 0.963	 auPRC: 0.957
	Recall at 5%|10%|20% FDR: 74.5%|90.9%|97.6%	 Num Positives: 1600	 Num Negatives: 1600
Finished training after 100 epochs.

Interpreting a DragoNN model using filter visualization

We can see from the learning curve that this model learns slowly: the validation loss decreases only gradually, and after five epochs the validation auROC is still just 0.586. Let's see what the sequence filters of this model look like.
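The next cell visualizes these filters. Under the hood, this kind of visualization just reads out the weights of the first convolutional layer and renders each filter as a motif-logo-like matrix. The sketch below shows the weight-extraction step on a toy Keras model (a stand-in, not the tutorial's SequenceDNN; how to reach the underlying Keras model depends on the dragonn version):

from tensorflow.keras import layers, models

# Toy stand-in model just to show where the filter weights live.
toy = models.Sequential([
    layers.Conv1D(filters=15, kernel_size=45, activation="relu", input_shape=(500, 4)),
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),
])
conv_weights = toy.layers[0].get_weights()[0]   # shape: (filter_width, 4, num_filters)
print(conv_weights.shape)
# Each conv_weights[:, :, k] is a filter_width x 4 matrix that can be plotted as a sequence logo.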


In [21]:
interpret_SequenceDNN_filters(multi_filter_dragonn, simulation_data)


Plotting simulation motifs...
Visualizing convolutional sequence filters in SequenceDNN...

The sequence filters don't reveal patterns that clearly resemble the simulated motifs. Next, we explore methods to interpret specific sequences with this DragoNN model.

Interpreting data with a DragoNN model

Using in-silico mutagenesis (ISM) and DeepLIFT, we can obtain scores for specific sequences indicating the importance of each position in the sequence. To assess these methods, we compare ISM and DeepLIFT scores to motif scores for each simulated motif at each position in the sequence. These motif scores represent the "ground truth" importance of each position because they are based on the motifs used to simulate the data. We provide comparisons for a positive class sequence on the left and a negative class sequence on the right.
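In-silico mutagenesis itself is straightforward to describe: substitute every possible base at every position, re-run the model, and record how much each substitution changes the prediction. The sketch below illustrates the idea with a stand-in scoring function in place of the trained DragoNN:

import numpy as np

def toy_predict(one_hot_batch):
    # Stand-in for a trained model's predict(): a fixed random linear score of the input.
    rng = np.random.RandomState(0)
    weights = rng.randn(*one_hot_batch.shape[1:])
    return (one_hot_batch * weights).sum(axis=(1, 2))

def in_silico_mutagenesis(one_hot_seq, predict_fn):
    # Score every (base, position) substitution by how much it changes the prediction.
    num_bases, seq_len = one_hot_seq.shape
    reference = predict_fn(one_hot_seq[np.newaxis])[0]
    scores = np.zeros((num_bases, seq_len))
    for position in range(seq_len):
        for base in range(num_bases):
            mutant = one_hot_seq.copy()
            mutant[:, position] = 0
            mutant[base, position] = 1
            scores[base, position] = predict_fn(mutant[np.newaxis])[0] - reference
    return scores

example_seq = np.eye(4, dtype=int)[:, np.random.RandomState(1).randint(0, 4, size=20)]
print(in_silico_mutagenesis(example_seq, toy_predict).shape)   # (4, 20)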


In [22]:
interpret_data_with_SequenceDNN(multi_filter_dragonn, simulation_data)


/users/jisraeli/local/anaconda/envs/dragonn2/lib/python2.7/site-packages/matplotlib/figure.py:1742: UserWarning: This figure includes Axes that are not compatible with tight_layout, so its results might be incorrect.
  warnings.warn("This figure includes Axes that are not "

We can see that neither DeepLIFT nor ISM highlights the locations of the simulated motifs (highlighted in grey). This is consistent with this model's limited performance on this simulation.

A multi-layer DragoNN model

Next, we modify multi_filter_dragonn to have 3 convolutional layers, with 15 convolutional filters of width 25 in each layer, so that it can learn the heterodimer grammar compositionally across multiple layers.


In [23]:
multi_layer_dragonn_parameters = {
    'seq_length': 500,
    'num_filters': [15, 15, 15], ## notice the change to multiple filter values, one for each layer
    'conv_width': [25, 25, 25], ## convolutional filter width has been modified to 25 from 45
    'pool_width': 45,
    'dropout': 0.1}
multi_layer_dragonn = get_SequenceDNN(multi_layer_dragonn_parameters)
train_SequenceDNN(multi_layer_dragonn, simulation_data)
SequenceDNN_learning_curve(multi_layer_dragonn)


Training model (* indicates new best result)...
Epoch 1:
Train Loss: 0.6995	Balanced Accuracy: 50.30%	 auROC: 0.530	 auPRC: 0.531
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|0.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6985	Balanced Accuracy: 50.97%	 auROC: 0.531	 auPRC: 0.524
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|0.0%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 2:
Train Loss: 0.6889	Balanced Accuracy: 54.20%	 auROC: 0.561	 auPRC: 0.555
	Recall at 5%|10%|20% FDR: 0.1%|0.1%|0.3%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6916	Balanced Accuracy: 52.53%	 auROC: 0.532	 auPRC: 0.524
	Recall at 5%|10%|20% FDR: 0.1%|0.1%|0.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 3:
Train Loss: 0.6793	Balanced Accuracy: 58.09%	 auROC: 0.620	 auPRC: 0.613
	Recall at 5%|10%|20% FDR: 0.1%|0.4%|1.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6854	Balanced Accuracy: 55.41%	 auROC: 0.577	 auPRC: 0.566
	Recall at 5%|10%|20% FDR: 0.2%|0.2%|0.2%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 4:
Train Loss: 0.6575	Balanced Accuracy: 61.91%	 auROC: 0.666	 auPRC: 0.648
	Recall at 5%|10%|20% FDR: 0.1%|0.4%|4.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6648	Balanced Accuracy: 60.62%	 auROC: 0.645	 auPRC: 0.626
	Recall at 5%|10%|20% FDR: 0.2%|0.2%|0.3%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 5:
Train Loss: 0.6448	Balanced Accuracy: 65.26%	 auROC: 0.714	 auPRC: 0.693
	Recall at 5%|10%|20% FDR: 0.4%|2.5%|13.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6583	Balanced Accuracy: 62.56%	 auROC: 0.671	 auPRC: 0.647
	Recall at 5%|10%|20% FDR: 0.6%|0.9%|1.9%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 6:
Train Loss: 0.6188	Balanced Accuracy: 68.34%	 auROC: 0.750	 auPRC: 0.726
	Recall at 5%|10%|20% FDR: 0.4%|2.2%|25.3%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.6379	Balanced Accuracy: 64.75%	 auROC: 0.703	 auPRC: 0.682
	Recall at 5%|10%|20% FDR: 0.5%|0.5%|12.4%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 7:
Train Loss: 0.5732	Balanced Accuracy: 71.23%	 auROC: 0.797	 auPRC: 0.773
	Recall at 5%|10%|20% FDR: 2.0%|8.4%|43.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.5985	Balanced Accuracy: 67.53%	 auROC: 0.756	 auPRC: 0.733
	Recall at 5%|10%|20% FDR: 1.6%|2.0%|25.4%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 8:
Train Loss: 0.5247	Balanced Accuracy: 76.10%	 auROC: 0.851	 auPRC: 0.830
	Recall at 5%|10%|20% FDR: 4.6%|25.9%|69.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.5498	Balanced Accuracy: 73.69%	 auROC: 0.822	 auPRC: 0.799
	Recall at 5%|10%|20% FDR: 1.4%|12.7%|51.7%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 9:
Train Loss: 0.4555	Balanced Accuracy: 79.51%	 auROC: 0.880	 auPRC: 0.864
	Recall at 5%|10%|20% FDR: 12.8%|46.9%|79.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4814	Balanced Accuracy: 77.94%	 auROC: 0.861	 auPRC: 0.843
	Recall at 5%|10%|20% FDR: 3.1%|31.0%|71.3%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 10:
Train Loss: 0.4019	Balanced Accuracy: 82.91%	 auROC: 0.910	 auPRC: 0.899
	Recall at 5%|10%|20% FDR: 35.5%|65.7%|86.5%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.4283	Balanced Accuracy: 81.53%	 auROC: 0.893	 auPRC: 0.881
	Recall at 5%|10%|20% FDR: 19.9%|56.2%|84.4%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 11:
Train Loss: 0.3665	Balanced Accuracy: 85.17%	 auROC: 0.928	 auPRC: 0.920
	Recall at 5%|10%|20% FDR: 48.8%|75.0%|90.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3958	Balanced Accuracy: 83.69%	 auROC: 0.911	 auPRC: 0.901
	Recall at 5%|10%|20% FDR: 33.9%|65.5%|88.4%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 12:
Train Loss: 0.3470	Balanced Accuracy: 86.70%	 auROC: 0.941	 auPRC: 0.936
	Recall at 5%|10%|20% FDR: 62.9%|80.9%|92.7%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3767	Balanced Accuracy: 84.81%	 auROC: 0.924	 auPRC: 0.918
	Recall at 5%|10%|20% FDR: 45.3%|74.6%|91.0%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 13:
Train Loss: 0.3114	Balanced Accuracy: 87.79%	 auROC: 0.949	 auPRC: 0.944
	Recall at 5%|10%|20% FDR: 68.3%|84.4%|94.1%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3448	Balanced Accuracy: 85.97%	 auROC: 0.932	 auPRC: 0.926
	Recall at 5%|10%|20% FDR: 58.5%|78.9%|91.9%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 14:
Train Loss: 0.2963	Balanced Accuracy: 88.33%	 auROC: 0.955	 auPRC: 0.952
	Recall at 5%|10%|20% FDR: 74.3%|86.4%|94.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3322	Balanced Accuracy: 86.22%	 auROC: 0.940	 auPRC: 0.935
	Recall at 5%|10%|20% FDR: 62.8%|82.7%|93.2%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 15:
Train Loss: 0.2642	Balanced Accuracy: 89.51%	 auROC: 0.962	 auPRC: 0.960
	Recall at 5%|10%|20% FDR: 80.0%|89.0%|95.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.3020	Balanced Accuracy: 88.19%	 auROC: 0.947	 auPRC: 0.943
	Recall at 5%|10%|20% FDR: 70.2%|86.1%|93.8%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 16:
Train Loss: 0.2428	Balanced Accuracy: 90.70%	 auROC: 0.970	 auPRC: 0.969
	Recall at 5%|10%|20% FDR: 84.6%|91.5%|97.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2818	Balanced Accuracy: 89.09%	 auROC: 0.955	 auPRC: 0.951
	Recall at 5%|10%|20% FDR: 77.6%|88.1%|94.7%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 17:
Train Loss: 0.2186	Balanced Accuracy: 91.47%	 auROC: 0.973	 auPRC: 0.973
	Recall at 5%|10%|20% FDR: 86.7%|92.2%|97.5%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2575	Balanced Accuracy: 90.03%	 auROC: 0.960	 auPRC: 0.957
	Recall at 5%|10%|20% FDR: 81.1%|90.1%|95.2%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 18:
Train Loss: 0.2103	Balanced Accuracy: 91.88%	 auROC: 0.981	 auPRC: 0.980
	Recall at 5%|10%|20% FDR: 90.4%|94.8%|98.4%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2506	Balanced Accuracy: 90.28%	 auROC: 0.967	 auPRC: 0.965
	Recall at 5%|10%|20% FDR: 85.0%|91.8%|96.8%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 19:
Train Loss: 0.1869	Balanced Accuracy: 93.22%	 auROC: 0.983	 auPRC: 0.983
	Recall at 5%|10%|20% FDR: 91.7%|95.4%|98.7%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2273	Balanced Accuracy: 91.94%	 auROC: 0.971	 auPRC: 0.968
	Recall at 5%|10%|20% FDR: 88.4%|92.6%|96.9%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 20:
Train Loss: 0.1742	Balanced Accuracy: 93.70%	 auROC: 0.987	 auPRC: 0.986
	Recall at 5%|10%|20% FDR: 93.2%|96.9%|99.2%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2156	Balanced Accuracy: 92.34%	 auROC: 0.975	 auPRC: 0.973
	Recall at 5%|10%|20% FDR: 89.4%|93.8%|97.5%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 21:
Train Loss: 0.1623	Balanced Accuracy: 93.69%	 auROC: 0.985	 auPRC: 0.985
	Recall at 5%|10%|20% FDR: 92.1%|95.9%|99.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.2069	Balanced Accuracy: 92.06%	 auROC: 0.974	 auPRC: 0.973
	Recall at 5%|10%|20% FDR: 89.4%|93.6%|97.3%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 22:
Train Loss: 0.1499	Balanced Accuracy: 94.44%	 auROC: 0.988	 auPRC: 0.988
	Recall at 5%|10%|20% FDR: 93.6%|97.3%|99.3%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1999	Balanced Accuracy: 92.28%	 auROC: 0.977	 auPRC: 0.975
	Recall at 5%|10%|20% FDR: 90.2%|94.4%|97.9%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 23:
Train Loss: 0.1419	Balanced Accuracy: 95.48%	 auROC: 0.992	 auPRC: 0.991
	Recall at 5%|10%|20% FDR: 96.0%|98.2%|99.7%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1942	Balanced Accuracy: 93.16%	 auROC: 0.980	 auPRC: 0.977
	Recall at 5%|10%|20% FDR: 90.7%|95.2%|97.8%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 24:
Train Loss: 0.1282	Balanced Accuracy: 94.81%	 auROC: 0.991	 auPRC: 0.990
	Recall at 5%|10%|20% FDR: 94.6%|97.7%|99.6%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1844	Balanced Accuracy: 93.03%	 auROC: 0.979	 auPRC: 0.978
	Recall at 5%|10%|20% FDR: 90.6%|94.2%|97.9%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 25:
Train Loss: 0.1202	Balanced Accuracy: 95.49%	 auROC: 0.994	 auPRC: 0.992
	Recall at 5%|10%|20% FDR: 97.4%|98.9%|99.9%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1710	Balanced Accuracy: 93.91%	 auROC: 0.982	 auPRC: 0.980
	Recall at 5%|10%|20% FDR: 92.9%|95.9%|98.6%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 26:
Train Loss: 0.1089	Balanced Accuracy: 95.67%	 auROC: 0.994	 auPRC: 0.994
	Recall at 5%|10%|20% FDR: 97.6%|99.0%|99.9%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1669	Balanced Accuracy: 93.78%	 auROC: 0.983	 auPRC: 0.981
	Recall at 5%|10%|20% FDR: 92.6%|95.8%|98.8%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 27:
Train Loss: 0.1053	Balanced Accuracy: 96.62%	 auROC: 0.995	 auPRC: 0.995
	Recall at 5%|10%|20% FDR: 98.1%|99.4%|99.9%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1632	Balanced Accuracy: 94.06%	 auROC: 0.984	 auPRC: 0.982
	Recall at 5%|10%|20% FDR: 93.1%|96.5%|98.9%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 28:
Train Loss: 0.1113	Balanced Accuracy: 96.87%	 auROC: 0.996	 auPRC: 0.995
	Recall at 5%|10%|20% FDR: 98.2%|99.5%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1745	Balanced Accuracy: 93.44%	 auROC: 0.985	 auPRC: 0.982
	Recall at 5%|10%|20% FDR: 93.0%|96.2%|99.1%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 29:
Train Loss: 0.0963	Balanced Accuracy: 97.28%	 auROC: 0.996	 auPRC: 0.996
	Recall at 5%|10%|20% FDR: 98.7%|99.7%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1597	Balanced Accuracy: 94.16%	 auROC: 0.985	 auPRC: 0.983
	Recall at 5%|10%|20% FDR: 93.4%|96.5%|99.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 30:
Train Loss: 0.0855	Balanced Accuracy: 96.95%	 auROC: 0.996	 auPRC: 0.996
	Recall at 5%|10%|20% FDR: 98.4%|99.6%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1568	Balanced Accuracy: 94.03%	 auROC: 0.985	 auPRC: 0.983
	Recall at 5%|10%|20% FDR: 93.0%|95.9%|99.1%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 31:
Train Loss: 0.0941	Balanced Accuracy: 97.49%	 auROC: 0.997	 auPRC: 0.996
	Recall at 5%|10%|20% FDR: 98.9%|99.8%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1597	Balanced Accuracy: 94.22%	 auROC: 0.987	 auPRC: 0.984
	Recall at 5%|10%|20% FDR: 94.1%|97.1%|99.2%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 32:
Train Loss: 0.0767	Balanced Accuracy: 97.44%	 auROC: 0.997	 auPRC: 0.997
	Recall at 5%|10%|20% FDR: 99.2%|99.8%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1439	Balanced Accuracy: 94.62%	 auROC: 0.987	 auPRC: 0.984
	Recall at 5%|10%|20% FDR: 94.0%|97.1%|99.2%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 33:
Train Loss: 0.0721	Balanced Accuracy: 97.39%	 auROC: 0.998	 auPRC: 0.997
	Recall at 5%|10%|20% FDR: 99.4%|99.8%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1424	Balanced Accuracy: 94.94%	 auROC: 0.987	 auPRC: 0.984
	Recall at 5%|10%|20% FDR: 94.2%|97.4%|99.2%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 34:
Train Loss: 0.0688	Balanced Accuracy: 98.04%	 auROC: 0.998	 auPRC: 0.998
	Recall at 5%|10%|20% FDR: 99.6%|99.9%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1436	Balanced Accuracy: 94.72%	 auROC: 0.987	 auPRC: 0.985
	Recall at 5%|10%|20% FDR: 94.2%|97.4%|99.2%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 35:
Train Loss: 0.0653	Balanced Accuracy: 97.37%	 auROC: 0.998	 auPRC: 0.998
	Recall at 5%|10%|20% FDR: 99.4%|99.9%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1474	Balanced Accuracy: 94.72%	 auROC: 0.987	 auPRC: 0.985
	Recall at 5%|10%|20% FDR: 93.8%|97.0%|99.3%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 36:
Train Loss: 0.0611	Balanced Accuracy: 98.34%	 auROC: 0.998	 auPRC: 0.998
	Recall at 5%|10%|20% FDR: 99.7%|99.9%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1401	Balanced Accuracy: 95.09%	 auROC: 0.987	 auPRC: 0.985
	Recall at 5%|10%|20% FDR: 94.7%|97.2%|99.2%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 37:
Train Loss: 0.0583	Balanced Accuracy: 97.79%	 auROC: 0.999	 auPRC: 0.999
	Recall at 5%|10%|20% FDR: 99.9%|100.0%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1465	Balanced Accuracy: 95.00%	 auROC: 0.988	 auPRC: 0.985
	Recall at 5%|10%|20% FDR: 95.1%|97.6%|99.0%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 38:
Train Loss: 0.0555	Balanced Accuracy: 98.22%	 auROC: 0.999	 auPRC: 0.998
	Recall at 5%|10%|20% FDR: 99.8%|100.0%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1443	Balanced Accuracy: 94.91%	 auROC: 0.988	 auPRC: 0.985
	Recall at 5%|10%|20% FDR: 94.6%|97.2%|99.4%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 39:
Train Loss: 0.0541	Balanced Accuracy: 97.82%	 auROC: 0.999	 auPRC: 0.999
	Recall at 5%|10%|20% FDR: 99.9%|100.0%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1442	Balanced Accuracy: 94.88%	 auROC: 0.988	 auPRC: 0.985
	Recall at 5%|10%|20% FDR: 95.3%|97.7%|99.2%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 40:
Train Loss: 0.0556	Balanced Accuracy: 97.56%	 auROC: 0.999	 auPRC: 0.999
	Recall at 5%|10%|20% FDR: 99.9%|100.0%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1540	Balanced Accuracy: 94.69%	 auROC: 0.988	 auPRC: 0.985
	Recall at 5%|10%|20% FDR: 95.1%|97.6%|99.3%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 41:
Train Loss: 0.0484	Balanced Accuracy: 99.05%	 auROC: 0.999	 auPRC: 0.999
	Recall at 5%|10%|20% FDR: 99.9%|100.0%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1354	Balanced Accuracy: 95.47%	 auROC: 0.988	 auPRC: 0.986
	Recall at 5%|10%|20% FDR: 95.6%|97.6%|99.2%	 Num Positives: 1600	 Num Negatives: 1600 *
Epoch 42:
Train Loss: 0.0449	Balanced Accuracy: 98.23%	 auROC: 0.999	 auPRC: 0.999
	Recall at 5%|10%|20% FDR: 99.9%|100.0%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1448	Balanced Accuracy: 95.06%	 auROC: 0.988	 auPRC: 0.986
	Recall at 5%|10%|20% FDR: 95.1%|97.4%|99.5%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 43:
Train Loss: 0.0403	Balanced Accuracy: 99.10%	 auROC: 0.999	 auPRC: 0.999
	Recall at 5%|10%|20% FDR: 100.0%|100.0%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1434	Balanced Accuracy: 95.31%	 auROC: 0.988	 auPRC: 0.985
	Recall at 5%|10%|20% FDR: 95.5%|97.4%|99.5%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 44:
Train Loss: 0.0405	Balanced Accuracy: 99.24%	 auROC: 0.999	 auPRC: 0.999
	Recall at 5%|10%|20% FDR: 99.9%|100.0%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1392	Balanced Accuracy: 95.28%	 auROC: 0.988	 auPRC: 0.986
	Recall at 5%|10%|20% FDR: 95.3%|97.6%|99.5%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 45:
Train Loss: 0.0395	Balanced Accuracy: 99.28%	 auROC: 0.999	 auPRC: 0.999
	Recall at 5%|10%|20% FDR: 99.9%|100.0%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1491	Balanced Accuracy: 95.41%	 auROC: 0.988	 auPRC: 0.986
	Recall at 5%|10%|20% FDR: 95.4%|97.5%|99.5%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 46:
Train Loss: 0.0354	Balanced Accuracy: 99.30%	 auROC: 1.000	 auPRC: 1.000
	Recall at 5%|10%|20% FDR: 100.0%|100.0%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1386	Balanced Accuracy: 95.16%	 auROC: 0.989	 auPRC: 0.986
	Recall at 5%|10%|20% FDR: 95.2%|97.9%|99.6%	 Num Positives: 1600	 Num Negatives: 1600
Epoch 47:
Train Loss: 0.0334	Balanced Accuracy: 99.33%	 auROC: 1.000	 auPRC: 1.000
	Recall at 5%|10%|20% FDR: 100.0%|100.0%|100.0%	 Num Positives: 6355	 Num Negatives: 6445
Valid Loss: 0.1412	Balanced Accuracy: 95.38%	 auROC: 0.988	 auPRC: 0.986
	Recall at 5%|10%|20% FDR: 95.6%|97.7%|99.3%	 Num Positives: 1600	 Num Negatives: 1600
Finished training after 47 epochs.

The multi-layered DragoNN model achieves a higher auROC and a lower training and validation loss than the multi-filter DragoNN model. Try the same model without dropout regularization: how important is dropout?
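
A hedged sketch of that exercise is below: copy the multi-layer parameter dictionary defined earlier, disable dropout, and retrain. The "dropout" key name is an assumption of this sketch; check the multi_layer_dragonn_parameters cell above for the exact key used there, and reuse the same training call that produced the results above.


In [ ]:
# Sketch only: retrain the multi-layer architecture without dropout and compare
# validation auROC/auPRC against the dropout-regularized model above.
no_dropout_parameters = dict(multi_layer_dragonn_parameters)
no_dropout_parameters["dropout"] = 0.0   # assumed key name; adjust if it differs
no_dropout_dragonn = get_SequenceDNN(no_dropout_parameters)
# train no_dropout_dragonn with the same training call used for multi_layer_dragonn above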

Let's see what the model learns in its sequence filters.


In [24]:
interpret_SequenceDNN_filters(multi_layer_dragonn, simulation_data)


Plotting simulation motifs...
Visualizing convolutional sequence filters in SequenceDNN...

The sequence filters here are not amenable to interpretation by visualization alone. In multi-layered models, sequence features are learned compositionally across the layers: first-layer filters capture simple features that higher layers combine into motif features, so visualizing the first-layer filters on their own is much less informative. Let's see where ISM and DeepLIFT get us with this model.


In [25]:
interpret_data_with_SequenceDNN(multi_layer_dragonn, simulation_data)


DeepLIFT and ISM scores for this model on representative positive (left) and negative (right) sequences expose what the model is doing. Both methods clearly highlight the SIX5-ZNF143 grammar in the positive class sequence. However, ISM assigns higher scores to false features around position 250, so ISM score magnitude alone would not distinguish false from true features in this example. DeepLIFT, on the other hand, assigns the highest scores to the true features, so in this example it could be used to detect the SIX5-ZNF143 grammar.
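
For concreteness, ISM scores a sequence by substituting each alternative base at every position and recording how much the model's prediction changes. The cell below is a minimal, generic sketch of that procedure rather than the dragonn implementation: predict_fn stands for any callable mapping a batch of one-hot sequences of shape (batch, 4, length) to positive-class probabilities, and that shape convention is an assumption of the sketch.


In [ ]:
import numpy as np

def ism_scores(predict_fn, onehot_seq):
    """Minimal ISM sketch. onehot_seq: (4, length) one-hot sequence.
    Returns a (4, length) matrix of prediction changes per substitution."""
    reference = float(predict_fn(onehot_seq[np.newaxis])[0])
    scores = np.zeros(onehot_seq.shape)
    for pos in range(onehot_seq.shape[1]):
        for base in range(4):
            mutated = onehot_seq.copy()
            mutated[:, pos] = 0      # erase the reference base at this position
            mutated[base, pos] = 1   # substitute the candidate base
            scores[base, pos] = float(predict_fn(mutated[np.newaxis])[0]) - reference
    return scores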

Using DragoNN on your own non-simulated data

The dragonn package provides a command-line interface to train and test DragoNN models, and to use them to predict and interpret new data. We start by training a dragonn model on positive and negative sequences:


In [26]:
!dragonn train --pos-sequences example_pos_sequences.fa --neg-sequences example_neg_sequences.fa --prefix training_example


Using Theano backend.
Using gpu device 1: TITAN X (Pascal) (CNMeM is enabled with initial size: 2500 MB, cuDNN 5105)
/users/jisraeli/local/anaconda/envs/dragonn2/lib/python2.7/site-packages/Theano-0.8.2-py2.7.egg/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
  warnings.warn(warn)
loading sequence data...
initializing model...
starting model training...
Training model (* indicates new best result)...
Epoch 1:
Train Loss: 0.6852	Balanced Accuracy: 54.68%	 auROC: 0.574	 auPRC: 0.563
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|0.0%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.6920	Balanced Accuracy: 53.27%	 auROC: 0.542	 auPRC: 0.548
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|0.0%	 Num Positives: 524	 Num Negatives: 476 *
Epoch 2:
Train Loss: 0.6710	Balanced Accuracy: 59.90%	 auROC: 0.646	 auPRC: 0.632
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|0.9%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.6874	Balanced Accuracy: 55.37%	 auROC: 0.563	 auPRC: 0.574
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|0.0%	 Num Positives: 524	 Num Negatives: 476 *
Epoch 3:
Train Loss: 0.6601	Balanced Accuracy: 60.43%	 auROC: 0.703	 auPRC: 0.696
	Recall at 5%|10%|20% FDR: 0.0%|0.7%|17.8%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.6825	Balanced Accuracy: 52.97%	 auROC: 0.584	 auPRC: 0.604
	Recall at 5%|10%|20% FDR: 0.2%|0.2%|0.2%	 Num Positives: 524	 Num Negatives: 476 *
Epoch 4:
Train Loss: 0.6608	Balanced Accuracy: 56.15%	 auROC: 0.745	 auPRC: 0.738
	Recall at 5%|10%|20% FDR: 0.1%|3.0%|42.4%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.6880	Balanced Accuracy: 52.40%	 auROC: 0.607	 auPRC: 0.624
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|12.6%	 Num Positives: 524	 Num Negatives: 476
Epoch 5:
Train Loss: 0.6194	Balanced Accuracy: 70.51%	 auROC: 0.781	 auPRC: 0.776
	Recall at 5%|10%|20% FDR: 0.0%|22.0%|51.8%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.6695	Balanced Accuracy: 58.85%	 auROC: 0.626	 auPRC: 0.636
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|0.0%	 Num Positives: 524	 Num Negatives: 476 *
Epoch 6:
Train Loss: 0.5957	Balanced Accuracy: 72.65%	 auROC: 0.806	 auPRC: 0.803
	Recall at 5%|10%|20% FDR: 6.3%|33.4%|57.6%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.6621	Balanced Accuracy: 60.79%	 auROC: 0.655	 auPRC: 0.661
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|7.8%	 Num Positives: 524	 Num Negatives: 476 *
Epoch 7:
Train Loss: 0.5673	Balanced Accuracy: 74.38%	 auROC: 0.825	 auPRC: 0.822
	Recall at 5%|10%|20% FDR: 8.5%|39.8%|63.1%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.6460	Balanced Accuracy: 62.52%	 auROC: 0.682	 auPRC: 0.686
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|13.7%	 Num Positives: 524	 Num Negatives: 476 *
Epoch 8:
Train Loss: 0.5367	Balanced Accuracy: 76.24%	 auROC: 0.849	 auPRC: 0.848
	Recall at 5%|10%|20% FDR: 25.7%|45.8%|70.1%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.6339	Balanced Accuracy: 64.44%	 auROC: 0.705	 auPRC: 0.714
	Recall at 5%|10%|20% FDR: 0.0%|0.0%|23.3%	 Num Positives: 524	 Num Negatives: 476 *
Epoch 9:
Train Loss: 0.5205	Balanced Accuracy: 75.62%	 auROC: 0.862	 auPRC: 0.863
	Recall at 5%|10%|20% FDR: 30.6%|52.5%|75.5%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.6397	Balanced Accuracy: 65.66%	 auROC: 0.721	 auPRC: 0.739
	Recall at 5%|10%|20% FDR: 0.0%|13.7%|36.3%	 Num Positives: 524	 Num Negatives: 476
Epoch 10:
Train Loss: 0.4794	Balanced Accuracy: 79.57%	 auROC: 0.879	 auPRC: 0.881
	Recall at 5%|10%|20% FDR: 34.6%|56.6%|79.0%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.6025	Balanced Accuracy: 67.97%	 auROC: 0.741	 auPRC: 0.757
	Recall at 5%|10%|20% FDR: 0.0%|16.0%|47.5%	 Num Positives: 524	 Num Negatives: 476 *
Epoch 11:
Train Loss: 0.4521	Balanced Accuracy: 81.43%	 auROC: 0.900	 auPRC: 0.902
	Recall at 5%|10%|20% FDR: 46.8%|65.2%|83.7%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.6026	Balanced Accuracy: 68.63%	 auROC: 0.752	 auPRC: 0.770
	Recall at 5%|10%|20% FDR: 11.3%|16.4%|51.9%	 Num Positives: 524	 Num Negatives: 476
Epoch 12:
Train Loss: 0.4315	Balanced Accuracy: 83.14%	 auROC: 0.915	 auPRC: 0.916
	Recall at 5%|10%|20% FDR: 49.2%|71.5%|88.0%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.5900	Balanced Accuracy: 67.71%	 auROC: 0.755	 auPRC: 0.776
	Recall at 5%|10%|20% FDR: 12.0%|16.4%|50.2%	 Num Positives: 524	 Num Negatives: 476 *
Epoch 13:
Train Loss: 0.4187	Balanced Accuracy: 82.04%	 auROC: 0.922	 auPRC: 0.923
	Recall at 5%|10%|20% FDR: 55.2%|74.3%|89.0%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.5922	Balanced Accuracy: 67.45%	 auROC: 0.765	 auPRC: 0.784
	Recall at 5%|10%|20% FDR: 0.2%|21.6%|55.0%	 Num Positives: 524	 Num Negatives: 476
Epoch 14:
Train Loss: 0.3775	Balanced Accuracy: 85.39%	 auROC: 0.934	 auPRC: 0.934
	Recall at 5%|10%|20% FDR: 62.2%|79.1%|90.8%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.5832	Balanced Accuracy: 69.08%	 auROC: 0.767	 auPRC: 0.787
	Recall at 5%|10%|20% FDR: 0.2%|19.5%|54.8%	 Num Positives: 524	 Num Negatives: 476 *
Epoch 15:
Train Loss: 0.3555	Balanced Accuracy: 87.35%	 auROC: 0.946	 auPRC: 0.946
	Recall at 5%|10%|20% FDR: 67.6%|82.7%|93.7%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.5820	Balanced Accuracy: 71.39%	 auROC: 0.772	 auPRC: 0.792
	Recall at 5%|10%|20% FDR: 0.4%|23.1%|57.3%	 Num Positives: 524	 Num Negatives: 476 *
Epoch 16:
Train Loss: 0.3350	Balanced Accuracy: 87.77%	 auROC: 0.950	 auPRC: 0.951
	Recall at 5%|10%|20% FDR: 69.2%|84.4%|93.9%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.5862	Balanced Accuracy: 70.59%	 auROC: 0.774	 auPRC: 0.793
	Recall at 5%|10%|20% FDR: 0.2%|24.8%|57.6%	 Num Positives: 524	 Num Negatives: 476
Epoch 17:
Train Loss: 0.3109	Balanced Accuracy: 89.23%	 auROC: 0.960	 auPRC: 0.960
	Recall at 5%|10%|20% FDR: 75.9%|88.3%|95.7%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.5865	Balanced Accuracy: 71.13%	 auROC: 0.776	 auPRC: 0.797
	Recall at 5%|10%|20% FDR: 0.8%|26.7%|57.8%	 Num Positives: 524	 Num Negatives: 476
Epoch 18:
Train Loss: 0.3063	Balanced Accuracy: 90.00%	 auROC: 0.967	 auPRC: 0.967
	Recall at 5%|10%|20% FDR: 80.6%|90.5%|96.9%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.5852	Balanced Accuracy: 68.92%	 auROC: 0.780	 auPRC: 0.802
	Recall at 5%|10%|20% FDR: 10.9%|33.6%|58.4%	 Num Positives: 524	 Num Negatives: 476
Epoch 19:
Train Loss: 0.2857	Balanced Accuracy: 90.62%	 auROC: 0.973	 auPRC: 0.972
	Recall at 5%|10%|20% FDR: 83.8%|92.5%|98.1%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.5925	Balanced Accuracy: 68.98%	 auROC: 0.779	 auPRC: 0.802
	Recall at 5%|10%|20% FDR: 11.3%|34.4%|58.0%	 Num Positives: 524	 Num Negatives: 476
Epoch 20:
Train Loss: 0.2664	Balanced Accuracy: 90.83%	 auROC: 0.978	 auPRC: 0.977
	Recall at 5%|10%|20% FDR: 88.0%|94.5%|98.0%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.6171	Balanced Accuracy: 71.56%	 auROC: 0.779	 auPRC: 0.803
	Recall at 5%|10%|20% FDR: 12.4%|34.9%|57.8%	 Num Positives: 524	 Num Negatives: 476
Epoch 21:
Train Loss: 0.2468	Balanced Accuracy: 92.71%	 auROC: 0.983	 auPRC: 0.983
	Recall at 5%|10%|20% FDR: 91.3%|96.0%|98.7%	 Num Positives: 1976	 Num Negatives: 2024
Valid Loss: 0.5994	Balanced Accuracy: 69.99%	 auROC: 0.783	 auPRC: 0.807
	Recall at 5%|10%|20% FDR: 1.0%|37.0%|57.1%	 Num Positives: 524	 Num Negatives: 476
Finished training after 21 epochs.
final validation metrics:
Loss: 0.5994	Balanced Accuracy: 69.99%	 auROC: 0.783	 auPRC: 0.807
	Recall at 5%|10%|20% FDR: 1.0%|37.0%|57.1%	 Num Positives: 524	 Num Negatives: 476
saving model files..
Done!

Based on the provided prefix, this command stores an architecture file, training_example.arch.json, with the model architecture and a weights file, training_example.weights.h5, with the parameters of the trained model (these are the files referenced by the --arch-file and --weights-file options below).
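
If the stored files are standard Keras architecture and weights files, as the .json and .h5 extensions suggest, they can likely also be loaded directly in Python for ad-hoc analysis outside the CLI. The cell below is a sketch under that assumption; it is not part of the dragonn command-line interface.


In [ ]:
# Sketch: load the saved architecture and weights with plain Keras, assuming the
# files written by `dragonn train` are standard Keras JSON/HDF5 artifacts.
from keras.models import model_from_json

with open("training_example.arch.json") as f:
    keras_model = model_from_json(f.read())
keras_model.load_weights("training_example.weights.h5")

We test the model by running: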


In [27]:
!dragonn test --pos-sequences example_pos_sequences.fa --neg-sequences example_neg_sequences.fa \
--arch-file training_example.arch.json --weights-file training_example.weights.h5


Using Theano backend.
Using gpu device 1: TITAN X (Pascal) (CNMeM is enabled with initial size: 2500 MB, cuDNN 5105)
/users/jisraeli/local/anaconda/envs/dragonn2/lib/python2.7/site-packages/Theano-0.8.2-py2.7.egg/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
  warnings.warn(warn)
loading sequence data...
loading model...
testing model...
Loss: 0.3173	Balanced Accuracy: 88.18%	 auROC: 0.949	 auPRC: 0.949
	Recall at 5%|10%|20% FDR: 74.6%|87.2%|94.8%	 Num Positives: 2500	 Num Negatives: 2500

This command prints the model's test performance metrics on the provided data. Model predictions on sequence data can be obtained by running:


In [28]:
!dragonn predict --sequences example_pos_sequences.fa --arch-file training_example.arch.json \
--weights-file training_example.weights.h5 --output-file example_predictions.txt


Using Theano backend.
Using gpu device 1: TITAN X (Pascal) (CNMeM is enabled with initial size: 2500 MB, cuDNN 5105)
/users/jisraeli/local/anaconda/envs/dragonn2/lib/python2.7/site-packages/Theano-0.8.2-py2.7.egg/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
  warnings.warn(warn)
loading sequence data...
loading model...
getting predictions...
saving predictions to output file...
Done!

This command stores the model predictions for sequences in example_pos_sequences.fa in the output file example_predictions.txt.
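
A quick way to sanity-check the output is to load it back into Python. The cell below is a hedged sketch that assumes example_predictions.txt contains one predicted positive-class probability per line; inspect the file first and adjust the parsing if its format differs.


In [ ]:
# Sketch only: load the saved predictions (assumed to be one probability per line)
# and count how many sequences are called positive at a 0.5 threshold.
import numpy as np

predictions = np.loadtxt("example_predictions.txt")
print("predicted positives at 0.5 threshold:", int((predictions > 0.5).sum()))

We can interpret sequence data with a dragonn model by running: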


In [29]:
!dragonn interpret --sequences example_pos_sequences.fa --arch-file training_example.arch.json \
--weights-file training_example.weights.h5 --prefix example_interpretation


Using Theano backend.
Using gpu device 1: TITAN X (Pascal) (CNMeM is enabled with initial size: 2500 MB, cuDNN 5105)
/users/jisraeli/local/anaconda/envs/dragonn2/lib/python2.7/site-packages/Theano-0.8.2-py2.7.egg/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
  warnings.warn(warn)
loading sequence data...
loading model...
getting predictions...
getting deeplift scores...
extracting important sequences and writing to file...
Done!

This command writes the most important subsequence in each input sequence, along with its location in that sequence, to the file example_interpretation.task_0.important_sequences.txt. Note: by default, only examples with predicted positive class probability >0.5 are interpreted; examples below this threshold yield a subsequence of Ns with location -1. Let's look at the first few lines of this file:


In [30]:
!head example_interpretation.task_0.important_sequences.txt


> sequence_0
418: GCCTGAGGAGGGCAGAAGGG
> sequence_1
373: GGCACCTGGTGGCCCCGAAG
> sequence_2
392: CCCTAGCTGCCAGCAGGCGG
> sequence_3
404: CTAACTCTGTGTCGGTTTTG
> sequence_4
430: GCCTTCCTCAGGCAGGAGGG
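
Each record in this file is a header line followed by a "position: subsequence" line, so it is easy to parse downstream, for example to scan the extracted subsequences against known motifs. The cell below is a small parser sketch based only on the format shown above; it skips the below-threshold entries reported as Ns at location -1.


In [ ]:
# Sketch: parse the interpretation output into a dict mapping sequence name ->
# (position, important subsequence), skipping entries reported at location -1.
def load_important_subsequences(path):
    results = {}
    with open(path) as f:
        lines = [line.strip() for line in f if line.strip()]
    for header, record in zip(lines[::2], lines[1::2]):
        name = header.lstrip(">").strip()
        position, subsequence = record.split(":")
        if int(position) != -1:
            results[name] = (int(position), subsequence.strip())
    return results

important = load_important_subsequences(
    "example_interpretation.task_0.important_sequences.txt")
print(list(important.items())[:3])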

Extras for HW

The tutorial example here touches on general principles of DragoNN model development and interpretation. To gain a deeper insight into the difference between DeepLIFT and ISM for model interpretation, consider the following exercise:

Train, test, and run sequence-centric interpretation for the one-layer CNN model used here on the following simulations:

1. single motif detection simulation of TAL1 in 1000bp sequences with 40% GC content
(run print_simulation_info("simulate_single_motif_detection") to see the exact simulation parameters)
2. motif density localization simulation of 2-4 TAL1 motif instances in the central 150bp of 1000bp sequences with 40% GC content
(run print_simulation_info("simulate_motif_density_localization") to see the exact simulation parameters)

Key questions:

1) What could explain the difference in ISM's sensitivity to the TAL1 motif sequence between the two simulations?
2) What does that tell us about the scope of ISM for feature discovery? Under what conditions is it likely to show sensitivity to sequence features?

Starter code is provided below to get the data for each simulation and a new DragoNN model; a hedged sketch of the full train-and-interpret workflow follows the starter code.


In [ ]:
# parameters for exercise 1: single TAL1 motif detection in 1000bp, 40% GC sequences
single_motif_detection_simulation_parameters = {
    "motif_name": "TAL1_known4",
    "seq_length": 1000,
    "num_pos": 10000,
    "num_neg": 10000,
    "GC_fraction": 0.4}

# parameters for exercise 2: 2-4 TAL1 motifs localized to the central 150bp
density_localization_simulation_parameters = {
    "motif_name": "TAL1_known4",
    "seq_length": 1000,
    "center_size": 150,
    "min_motif_counts": 2,
    "max_motif_counts": 4,
    "num_pos": 10000,
    "num_neg": 10000,
    "GC_fraction": 0.4}

# simulate the data for each exercise
single_motif_detection_simulation_data = get_simulation_data(
    "simulate_single_motif_detection", single_motif_detection_simulation_parameters)

density_localization_simulation_data = get_simulation_data(
    "simulate_motif_density_localization", density_localization_simulation_parameters)

In [ ]:
new_dragonn_model = get_SequenceDNN(multi_layer_dragonn_parameters)
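
To complete the exercise, train the new model on each simulation and then run the same interpretation helpers used earlier in this notebook. The cell below is a sketch for the first simulation; it assumes the training call is the train_SequenceDNN helper from dragonn.tutorial_utils used for the earlier models, so substitute the actual training call from the earlier cells if it differs.


In [ ]:
# Sketch for exercise 1; for exercise 2, build a fresh model the same way and
# train it on density_localization_simulation_data instead.
# train_SequenceDNN is assumed to be the tutorial training helper used earlier.
train_SequenceDNN(new_dragonn_model, single_motif_detection_simulation_data)

# visualize the learned filters, then compare ISM and DeepLIFT scores
interpret_SequenceDNN_filters(new_dragonn_model, single_motif_detection_simulation_data)
interpret_data_with_SequenceDNN(new_dragonn_model, single_motif_detection_simulation_data)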