Anna KaRNNa

In this notebook, we'll build a character-wise RNN trained on Anna Karenina, one of my all-time favorite books. It'll be able to generate new text based on the text from the book.

This network is based off of Andrej Karpathy's post on RNNs and his implementation in Torch. I also drew on information from r2rt and from Sherjil Ozair's implementation on GitHub. Below is the general architecture of the character-wise RNN.


In [1]:
import time
from collections import namedtuple

import numpy as np
import tensorflow as tf

First we'll load the text file and convert it into integers for our network to use. Here I'm creating a couple of dictionaries to convert the characters to and from integers. Encoding the characters as integers makes them easier to use as input to the network.


In [2]:
with open('anna.txt', 'r') as f:
    text=f.read()
vocab = set(text)
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))
encoded = np.array([vocab_to_int[c] for c in text], dtype=np.int32)

Let's check out the first 100 characters, make sure everything is peachy. According to the American Book Review, this is the 6th best first line of a book ever.


In [3]:
text[:100]


Out[3]:
'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'

And we can see the characters encoded as integers.


In [4]:
encoded[:100]


Out[4]:
array([54, 61, 18, 32, 28,  2, 82, 65, 22, 47, 47, 47,  7, 18, 32, 32,  1,
       65, 20, 18,  8, 75, 29, 75,  2, 42, 65, 18, 82,  2, 65, 18, 29, 29,
       65, 18, 29, 75, 36,  2, 69, 65,  2, 56,  2, 82,  1, 65, 34, 26, 61,
       18, 32, 32,  1, 65, 20, 18,  8, 75, 29,  1, 65, 75, 42, 65, 34, 26,
       61, 18, 32, 32,  1, 65, 75, 26, 65, 75, 28, 42, 65, 72, 66, 26, 47,
       66, 18,  1, 19, 47, 47, 74, 56,  2, 82,  1, 28, 61, 75, 26], dtype=int32)

Since the network is working with individual characters, it's similar to a classification problem in which we are trying to predict the next character from the previous text. Here's how many 'classes' our network has to pick from.


In [5]:
len(vocab)


Out[5]:
83

Making training mini-batches

Here is where we'll make our mini-batches for training. Remember that we want our batches to be multiple sequences of some desired number of sequence steps. Considering a simple example, our batches would look like this:


We have our text encoded as integers as one long array in encoded. Let's create a function that will give us an iterator for our batches. I like using generator functions to do this. Then we can pass encoded into this function and get our batch generator.

The first thing we need to do is discard some of the text so we only have completely full batches. Each batch contains $N \times M$ characters, where $N$ is the batch size (the number of sequences) and $M$ is the number of steps. Then, to get the number of batches we can make from some array arr, you divide the length of arr by the number of characters per batch. Once you know the number of batches and the number of characters per batch, you can get the total number of characters to keep.
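To sanity check this arithmetic, here's a small sketch (the 10 sequences and 50 steps are illustrative values; 1,985,223 is the length of encoded shown further down):

n_seqs, n_steps = 10, 50
chars_per_batch = n_seqs * n_steps             # N * M = 500 characters per batch
n_batches = 1985223 // chars_per_batch         # 3970 full batches
chars_to_keep = n_batches * chars_per_batch    # 1,985,000 characters kept

The 223 leftover characters at the end simply get dropped.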

After that, we need to split arr into $N$ sequences. You can do this using arr.reshape(size) where size is a tuple containing the dimension sizes of the reshaped array. We know we want $N$ sequences (n_seqs below), so let's make that the size of the first dimension. For the second dimension, you can use -1 as a placeholder; NumPy will infer the appropriate size for you. After this, you should have an array that is $N \times (M * K)$ where $K$ is the number of batches.

Now that we have this array, we can iterate through it to get our batches. The idea is that each batch is an $N \times M$ window on the array. For each subsequent batch, the window moves over by n_steps. We also want to create both the input and target arrays. Remember that the targets are the inputs shifted over by one character. You'll usually see the first input character used as the last target character, so something like this:

y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]

where x is the input batch and y is the target batch.
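Here's a minimal sketch of that shift on a toy 1 x 5 array (the values are made up):

import numpy as np
x = np.array([[3, 7, 1, 9, 4]])
y = np.zeros_like(x)
y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]
print(y)   # [[7 1 9 4 3]]

Each target row is the input row shifted left by one character, with the first input character wrapping around to the end.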

The way I like to do this windowing is to use range to take steps of size n_steps from $0$ to arr.shape[1], the total number of steps in each sequence. That way, the integers you get from range always point to the start of a batch, and each window is n_steps wide.
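As a quick sketch of that windowing with toy numbers (15 total steps per sequence, n_steps = 5):

for n in range(0, 15, 5):
    print(n, n + 5)   # windows: (0, 5), (5, 10), (10, 15)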


In [57]:
def get_batches(arr, n_seqs, n_steps):
    '''Create a generator that returns batches of size
       n_seqs x n_steps from arr.
       
       Arguments
       ---------
       arr: Array you want to make batches from
       n_seqs: Batch size, the number of sequences per batch
       n_steps: Number of sequence steps per batch
    '''
    # Get the batch size and number of batches we can make
    batch_size = n_seqs * n_steps
    n_batches = len(arr)//batch_size
    
    # Keep only enough characters to make full batches
    arr = arr[:n_batches * batch_size]
    
    # Reshape into n_seqs rows
    arr = arr.reshape((n_seqs, -1))
    
    for n in range(0, arr.shape[1], n_steps):
        # The features
        x = arr[:, n:n+n_steps]
        # The targets, shifted by one
        y = np.zeros_like(x)
        y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]
        yield x, y

Now I'll make my data sets and we can check out what's going on here. Here I'm going to use a batch size of 10 and 10 sequence steps.


In [17]:
batches = get_batches(encoded, 10, 10)
x, y = next(batches)

In [15]:
encoded.shape


Out[15]:
(1985223,)

In [18]:
x.shape


Out[18]:
(10, 10)

In [19]:
encoded


Out[19]:
array([54, 61, 18, ..., 42, 19, 47], dtype=int32)

In [20]:
print('x\n', x[:10, :])
print('\ny\n', y[:10, :])


x
 [[54 61 18 32 28  2 82 65 22 47]
 [ 1 80  6 65 18 26 42 66  2 82]
 [18 24 26 75 20 75 49  2 26 28]
 [65 18 65 32 82  2 56 75 72 34]
 [ 2 65 28 61 18 26 36  2 70 65]
 [65 42 18 66 65 61  2 82 65 28]
 [49 28  2 70 65 66 61 18 28 47]
 [18 42 65 42 34 20 20  2 82 75]
 [ 2 70 65 75 28 42 65 28 75  8]
 [49  2 65 72 20 65  8 75 26 70]]

y
 [[61 18 32 28  2 82 65 22 47 54]
 [80  6 65 18 26 42 66  2 82  1]
 [24 26 75 20 75 49  2 26 28 18]
 [18 65 32 82  2 56 75 72 34 65]
 [65 28 61 18 26 36  2 70 65  2]
 [42 18 66 65 61  2 82 65 28 65]
 [28  2 70 65 66 61 18 28 47 49]
 [42 65 42 34 20 20  2 82 75 18]
 [70 65 75 28 42 65 28 75  8  2]
 [ 2 65 72 20 65  8 75 26 70 49]]

If you implemented get_batches correctly, the above output should look something like

x
 [[55 63 69 22  6 76 45  5 16 35]
 [ 5 69  1  5 12 52  6  5 56 52]
 [48 29 12 61 35 35  8 64 76 78]
 [12  5 24 39 45 29 12 56  5 63]
 [ 5 29  6  5 29 78 28  5 78 29]
 [ 5 13  6  5 36 69 78 35 52 12]
 [63 76 12  5 18 52  1 76  5 58]
 [34  5 73 39  6  5 12 52 36  5]
 [ 6  5 29 78 12 79  6 61  5 59]
 [ 5 78 69 29 24  5  6 52  5 63]]

y
 [[63 69 22  6 76 45  5 16 35 35]
 [69  1  5 12 52  6  5 56 52 29]
 [29 12 61 35 35  8 64 76 78 28]
 [ 5 24 39 45 29 12 56  5 63 29]
 [29  6  5 29 78 28  5 78 29 45]
 [13  6  5 36 69 78 35 52 12 43]
 [76 12  5 18 52  1 76  5 58 52]
 [ 5 73 39  6  5 12 52 36  5 78]
 [ 5 29 78 12 79  6 61  5 59 63]
 [78 69 29 24  5  6 52  5 63 76]]

although the exact numbers will be different. Check to make sure the data is shifted over one step for y.

Building the model

Below is where you'll build the network. We'll break it up into parts so it's easier to reason about each bit. Then we can connect them up into the whole network.

Inputs

First off we'll create our input placeholders. As usual we need placeholders for the training data and the targets. We'll also create a placeholder for dropout layers called keep_prob. This will be a scalar, that is a 0-D tensor. To make a scalar, you create a placeholder without giving it a size.

Exercise: Create the input placeholders in the function below.


In [58]:
def build_inputs(batch_size, num_steps):
    ''' Define placeholders for inputs, targets, and dropout 
    
        Arguments
        ---------
        batch_size: Batch size, number of sequences per batch
        num_steps: Number of sequence steps in a batch
        
    '''
    # Declare placeholders we'll feed into the graph
    inputs = tf.placeholder(tf.int32, (batch_size, num_steps), name='inputs')
    targets = tf.placeholder(tf.int32, (batch_size, num_steps), name='targets')
    
    # Keep probability placeholder for drop out layers
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    
    return inputs, targets, keep_prob

LSTM Cell

Here we will create the LSTM cell we'll use in the hidden layer. We'll use this cell as a building block for the RNN. So we aren't actually defining the RNN here, just the type of cell we'll use in the hidden layer.

We first create a basic LSTM cell with

lstm = tf.contrib.rnn.BasicLSTMCell(num_units)

where num_units is the number of units in the hidden layers in the cell. Then we can add dropout by wrapping it with

tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

You pass in a cell and it will automatically add dropout to the inputs or outputs. Finally, we can stack up the LSTM cells into layers with tf.contrib.rnn.MultiRNNCell. With this, you pass in a list of cells and it will send the output of one cell into the next cell. In older versions of TensorFlow you would see this written as

tf.contrib.rnn.MultiRNNCell([cell]*num_layers)

but reusing the same cell object for every layer raises an error in newer TensorFlow releases, so instead we create a separate cell for each layer, as in tf.contrib.rnn.MultiRNNCell([lstm_cell(lstm_size, keep_prob) for _ in range(num_layers)]). Even though this is actually multiple LSTM cells stacked on each other, you can treat the multiple layers as one cell.

We also need to create an initial cell state of all zeros. This can be done like so

initial_state = cell.zero_state(batch_size, tf.float32)

Exercise: Below, implement the build_lstm function to create these LSTM cells and the initial state.


In [83]:
def lstm_cell(lstm_size, keep_prob):
    # Basic LSTM cell; pass along the current variable scope's reuse flag
    # (see the StackOverflow link in the cell below)
    cell = tf.contrib.rnn.BasicLSTMCell(lstm_size, reuse=tf.get_variable_scope().reuse)
    # Add dropout to the cell outputs
    return tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=keep_prob)

def build_lstm(lstm_size, num_layers, batch_size, keep_prob):
    ''' Build LSTM cell.
    
        Arguments
        ---------
        keep_prob: Scalar tensor (tf.placeholder) for the dropout keep probability
        lstm_size: Size of the hidden layers in the LSTM cells
        num_layers: Number of LSTM layers
        batch_size: Batch size

    '''
    ### Build the LSTM Cell
    # Stack up multiple LSTM layers, for deep learning. Each layer gets its own
    # cell object from lstm_cell() above, so every layer has its own weights.
    cell = tf.contrib.rnn.MultiRNNCell([lstm_cell(lstm_size, keep_prob) for _ in range(num_layers)], state_is_tuple=True)
    initial_state = cell.zero_state(batch_size, tf.float32)
    
    return cell, initial_state

In [75]:
# https://stackoverflow.com/questions/42669578/tensorflow-1-0-valueerror-attempt-to-reuse-rnncell-with-a-different-variable-s
# def lstm_cell():
#     cell = tf.contrib.rnn.NASCell(state_size, reuse=tf.get_variable_scope().reuse)
#     return tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=0.8)

# rnn_cells = tf.contrib.rnn.MultiRNNCell([lstm_cell() for _ in range(num_layers)], state_is_tuple = True)
# outputs, current_state = tf.nn.dynamic_rnn(rnn_cells, x, initial_state=rnn_tuple_state)

In [76]:
# MultiRNNCell([BasicLSTMCell(...) for _ in range(num_layers)])

RNN Output

Here we'll create the output layer. We need to connect the output of the RNN cells to a fully connected layer with a softmax output. The softmax output gives us a probability distribution we can use to predict the next character, so we want this layer to have size $C$, the number of classes/characters we have in our text.

If our input has batch size $N$, number of steps $M$, and the hidden layer has $L$ hidden units, then the output is a 3D tensor with size $N \times M \times L$. The output of each LSTM cell has size $L$, we have $M$ of them, one for each sequence step, and we have $N$ sequences. So the total size is $N \times M \times L$.

We are using the same fully connected layer, the same weights, for each of the outputs. Then, to make things easier, we should reshape the outputs into a 2D tensor with shape $(M * N) \times L$. That is, one row for each sequence and step, where the values of each row are the output from the LSTM cells. We get the LSTM output as a list, lstm_output. First we need to concatenate this whole list into one array with tf.concat. Then, reshape it (with tf.reshape) to size $(M * N) \times L$.

Once we have the outputs reshaped, we can do the matrix multiplication with the weights. We need to wrap the weight and bias variables in a variable scope with tf.variable_scope(scope_name) because there are weights being created in the LSTM cells. TensorFlow will throw an error if the weights created here have the same names as the weights created in the LSTM cells, which they will by default. To avoid this, we wrap the variables in a variable scope so we can give them unique names.

Exercise: Implement the output layer in the function below.


In [87]:
def build_output(lstm_output, in_size, out_size):
    ''' Build a softmax layer, return the softmax output and logits.
    
        Arguments
        ---------
        
        lstm_output: List of output tensors from the LSTM layer
        in_size: Size of the input tensor, for example, size of the LSTM cells
        out_size: Size of this softmax layer
    
    '''

    # Reshape output so it's a bunch of rows, one row for each step for each sequence.
    # Concatenate lstm_output over axis 1 (the columns)
    seq_output = tf.concat(lstm_output, axis=1)
    # Reshape seq_output to a 2D tensor with lstm_size columns
    x = tf.reshape(seq_output, [-1, in_size])
    
    # Connect the RNN outputs to a softmax layer
    with tf.variable_scope('softmax'):
        # Create the weight and bias variables here
        softmax_w = tf.Variable(tf.truncated_normal((in_size, out_size), stddev=0.1))
        softmax_b = tf.Variable(tf.zeros(out_size))
    
    # Since output is a bunch of rows of RNN cell outputs, logits will be a bunch
    # of rows of logit outputs, one for each step and sequence
    logits = tf.add(tf.matmul(x, softmax_w), softmax_b)
    
    # Use softmax to get the probabilities for predicted characters
    out = tf.nn.softmax(logits, name='prediction')
    
    return out, logits

Training loss

Next up is the training loss. We get the logits and targets and calculate the softmax cross-entropy loss. The targets come to us as encoded characters, so first we need to one-hot encode them. Then, reshape the one-hot targets so it's a 2D tensor with size $(M*N) \times C$ where $C$ is the number of classes/characters we have. Remember that we reshaped the LSTM outputs and ran them through a fully connected layer with $C$ units. So our logits will also have size $(M*N) \times C$.

Then we run the logits and targets through tf.nn.softmax_cross_entropy_with_logits and find the mean to get the loss.

Exercise: Implement the loss calculation in the function below.


In [93]:
def build_loss(logits, targets, lstm_size, num_classes):
    ''' Calculate the loss from the logits and the targets.
    
        Arguments
        ---------
        logits: Logits from final fully connected layer
        targets: Targets for supervised learning
        lstm_size: Number of LSTM hidden units
        num_classes: Number of classes in targets
        
    '''
    
    # One-hot encode targets and reshape to match logits, one row per sequence per step
    y_one_hot = tf.one_hot(targets, num_classes)
    y_reshaped = tf.reshape(y_one_hot, logits.get_shape())
    
    # Softmax cross entropy loss
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped)
    loss = tf.reduce_mean(loss)
    return loss

Optimizer

Here we build the optimizer. Normal RNNs have issues with exploding and vanishing gradients. LSTMs fix the vanishing gradient problem, but the gradients can still grow without bound. To fix this, we clip the gradients with tf.clip_by_global_norm: whenever the global norm of the gradients exceeds some threshold, they are all scaled down so the norm equals that threshold. This ensures the gradients never grow overly large. Then we use an AdamOptimizer for the learning step.


In [94]:
def build_optimizer(loss, learning_rate, grad_clip):
    ''' Build optimizer for training, using gradient clipping.
    
        Arguments:
        loss: Network loss
        learning_rate: Learning rate for optimizer
        grad_clip: Threshold for clipping the gradients' global norm
    
    '''
    
    # Optimizer for training, using gradient clipping to control exploding gradients
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), grad_clip)
    train_op = tf.train.AdamOptimizer(learning_rate)
    optimizer = train_op.apply_gradients(zip(grads, tvars))
    
    return optimizer

Build the network

Now we can put all the pieces together and build a class for the network. To actually run data through the LSTM cells, we will use tf.nn.dynamic_rnn. This function will pass the hidden and cell states across LSTM cells appropriately for us. It returns the outputs for each LSTM cell at each step for each sequence in the mini-batch. It also gives us the final LSTM state. We want to save this state as final_state so we can pass it to the first LSTM cell in the next mini-batch run. For tf.nn.dynamic_rnn, we pass in the cell and initial state we get from build_lstm, as well as our input sequences. Also, we need to one-hot encode the inputs before they go into the RNN.

Exercise: Use the functions you've implemented previously and tf.nn.dynamic_rnn to build the network.


In [95]:
class CharRNN:
    
    def __init__(self, num_classes, batch_size=64, num_steps=50, 
                       lstm_size=128, num_layers=2, learning_rate=0.001, 
                       grad_clip=5, sampling=False):
    
        # When we're using this network for sampling later, we'll be passing in
        # one character at a time, so providing an option for that
        if sampling:
            batch_size, num_steps = 1, 1

        tf.reset_default_graph()
        
        # Build the input placeholder tensors
        self.inputs, self.targets, self.keep_prob = build_inputs(batch_size, num_steps)

        # Build the LSTM cell
        cell, self.initial_state = build_lstm(lstm_size, num_layers, batch_size, self.keep_prob)

        ### Run the data through the RNN layers
        # First, one-hot encode the input tokens
        x_one_hot = tf.one_hot(self.inputs, num_classes)
        
        # Run each sequence step through the RNN with tf.nn.dynamic_rnn 
        outputs, state = tf.nn.dynamic_rnn(cell, x_one_hot, initial_state=self.initial_state, scope='layer')
        self.final_state = state
        
        # Get softmax predictions and logits
        self.prediction, self.logits = build_output(outputs, lstm_size, num_classes)
        
        # Loss and optimizer (with gradient clipping)
        self.loss =  build_loss(self.logits, self.targets, lstm_size, num_classes)
        self.optimizer = build_optimizer(self.loss, learning_rate, grad_clip)

Hyperparameters

Here are the hyperparameters for the network.

  • batch_size - Number of sequences running through the network in one pass.
  • num_steps - Number of characters in the sequence the network is trained on. Typically larger is better; the network will learn more long-range dependencies, but it takes longer to train. 100 is typically a good number here.
  • lstm_size - The number of units in the hidden layers.
  • num_layers - Number of hidden LSTM layers to use
  • learning_rate - Learning rate for training
  • keep_prob - The dropout keep probability when training. If your network is overfitting, try decreasing this.

Here's some good advice from Andrej Karpathy on training the network. I'm going to copy it in here for your benefit, but also link to where it originally came from.

Tips and Tricks

Monitoring Validation Loss vs. Training Loss

If you're somewhat new to Machine Learning or Neural Networks it can take a bit of expertise to get good models. The most important quantity to keep track of is the difference between your training loss (printed during training) and the validation loss (printed once in a while when the RNN is run on the validation data (by default every 1000 iterations)). In particular:

  • If your training loss is much lower than validation loss then this means the network might be overfitting. Solutions to this are to decrease your network size, or to increase dropout. For example you could try dropout of 0.5 and so on.
  • If your training/validation loss are about equal then your model is underfitting. Increase the size of your model (either number of layers or the raw number of neurons per layer)

Approximate number of parameters

The two most important parameters that control the model are lstm_size and num_layers. I would advise that you always use num_layers of either 2 or 3. The lstm_size can be adjusted based on how much data you have. The two important quantities to keep track of here are:

  • The number of parameters in your model. This is printed when you start training.
  • The size of your dataset. 1MB file is approximately 1 million characters.

These two should be about the same order of magnitude. It's a little tricky to tell. Here are some examples:

  • I have a 100MB dataset and I'm using the default parameter settings (which currently print 150K parameters). My data size is significantly larger (100 mil >> 0.15 mil), so I expect to heavily underfit. I am thinking I can comfortably afford to make lstm_size larger.
  • I have a 10MB dataset and running a 10 million parameter model. I'm slightly nervous and I'm carefully monitoring my validation loss. If it's larger than my training loss then I may want to try to increase dropout a bit and see if that helps the validation loss.
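This notebook doesn't print a parameter count automatically, but once the CharRNN model in the training cell below has been built, a minimal sketch for counting its trainable parameters (just a rough check, not part of the original code) looks like this:

n_params = sum(int(np.prod(v.get_shape().as_list())) for v in tf.trainable_variables())
print('Trainable parameters:', n_params)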

Best models strategy

The winning strategy to obtaining very good models (if you have the compute time) is to always err on making the network larger (as large as you're willing to wait for it to compute) and then try different dropout values (between 0 and 1). Whatever model has the best validation performance (the loss, written in the checkpoint filename, low is good) is the one you should use in the end.

It is very common in deep learning to run many different models with many different hyperparameter settings, and in the end take whatever checkpoint gave the best validation performance.

By the way, the sizes of your training and validation splits are also parameters. Make sure you have a decent amount of data in your validation set, or otherwise the validation performance will be noisy and not very informative.
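The notebook as written trains on the full text; here's a minimal sketch (with an arbitrary 90/10 split) of how you could hold out part of encoded for validation:

split_frac = 0.9
split_idx = int(len(encoded) * split_frac)
train_data, val_data = encoded[:split_idx], encoded[split_idx:]
# feed batches from train_data for training, and batches from val_data
# (with keep_prob of 1) to compute a validation loss
print(len(train_data), len(val_data))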


In [96]:
batch_size = 100         # Sequences per batch
num_steps = 100          # Number of sequence steps per batch
lstm_size = 128         # Size of hidden layers in LSTMs
num_layers = 2          # Number of LSTM layers
learning_rate = 0.001    # Learning rate
keep_prob = 0.5         # Dropout keep probability

Time for training

This is typical training code, passing inputs and targets into the network, then running the optimizer. Here we also get back the final LSTM state for the mini-batch. Then, we pass that state back into the network so the next batch can continue the state from the previous batch. And every so often (set by save_every_n) I save a checkpoint.

Here I'm saving checkpoints with the format

i{iteration number}_l{# hidden layer units}.ckpt
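For example, with save_every_n = 200 (set in the training cell below) and lstm_size = 128, the first checkpoint should come out as checkpoints/i200_l128.ckpt.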

Exercise: Set the hyperparameters above to train the network. Watch the training loss; it should be consistently dropping. Also, I highly advise running this on a GPU.


In [98]:
epochs = 20
# Save every N iterations
save_every_n = 200

model = CharRNN(len(vocab), batch_size=batch_size, num_steps=num_steps,
                lstm_size=lstm_size, num_layers=num_layers, 
                learning_rate=learning_rate)

saver = tf.train.Saver(max_to_keep=100)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    # Use the line below to load a checkpoint and resume training
    #saver.restore(sess, 'checkpoints/______.ckpt')
    counter = 0
    for e in range(epochs):
        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        for x, y in get_batches(encoded, batch_size, num_steps):
            counter += 1
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: keep_prob,
                    model.initial_state: new_state}
            batch_loss, new_state, _ = sess.run([model.loss, 
                                                 model.final_state, 
                                                 model.optimizer], 
                                                 feed_dict=feed)
            
            end = time.time()
            print('Epoch: {}/{}... '.format(e+1, epochs),
                  'Training Step: {}... '.format(counter),
                  'Training loss: {:.4f}... '.format(batch_loss),
                  '{:.4f} sec/batch'.format((end-start)))
        
            if (counter % save_every_n == 0):
                saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))
    
    saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))


Epoch: 1/20...  Training Step: 1...  Training loss: 4.4195...  0.6008 sec/batch
Epoch: 1/20...  Training Step: 2...  Training loss: 4.4045...  0.5864 sec/batch
Epoch: 1/20...  Training Step: 3...  Training loss: 4.3868...  0.5844 sec/batch
Epoch: 1/20...  Training Step: 4...  Training loss: 4.3560...  0.5775 sec/batch
Epoch: 1/20...  Training Step: 5...  Training loss: 4.3065...  0.5532 sec/batch
Epoch: 1/20...  Training Step: 6...  Training loss: 4.2077...  0.5814 sec/batch
Epoch: 1/20...  Training Step: 7...  Training loss: 4.0346...  0.6074 sec/batch
Epoch: 1/20...  Training Step: 8...  Training loss: 3.8619...  0.5819 sec/batch
Epoch: 1/20...  Training Step: 9...  Training loss: 3.7611...  0.5802 sec/batch
Epoch: 1/20...  Training Step: 10...  Training loss: 3.7036...  0.5836 sec/batch
Epoch: 1/20...  Training Step: 11...  Training loss: 3.6514...  0.5765 sec/batch
Epoch: 1/20...  Training Step: 12...  Training loss: 3.5878...  0.6904 sec/batch
Epoch: 1/20...  Training Step: 13...  Training loss: 3.5299...  0.6458 sec/batch
Epoch: 1/20...  Training Step: 14...  Training loss: 3.5106...  0.5735 sec/batch
Epoch: 1/20...  Training Step: 15...  Training loss: 3.4775...  0.5754 sec/batch
Epoch: 1/20...  Training Step: 16...  Training loss: 3.4632...  0.5774 sec/batch
Epoch: 1/20...  Training Step: 17...  Training loss: 3.4109...  0.5808 sec/batch
Epoch: 1/20...  Training Step: 18...  Training loss: 3.4235...  0.5616 sec/batch
Epoch: 1/20...  Training Step: 19...  Training loss: 3.4001...  0.5818 sec/batch
Epoch: 1/20...  Training Step: 20...  Training loss: 3.3505...  0.5720 sec/batch
Epoch: 1/20...  Training Step: 21...  Training loss: 3.3605...  0.5848 sec/batch
Epoch: 1/20...  Training Step: 22...  Training loss: 3.3379...  0.5793 sec/batch
Epoch: 1/20...  Training Step: 23...  Training loss: 3.3258...  0.5681 sec/batch
Epoch: 1/20...  Training Step: 24...  Training loss: 3.3275...  0.5876 sec/batch
Epoch: 1/20...  Training Step: 25...  Training loss: 3.2984...  0.5859 sec/batch
Epoch: 1/20...  Training Step: 26...  Training loss: 3.3109...  0.5826 sec/batch
Epoch: 1/20...  Training Step: 27...  Training loss: 3.3025...  0.5736 sec/batch
Epoch: 1/20...  Training Step: 28...  Training loss: 3.2818...  0.5815 sec/batch
Epoch: 1/20...  Training Step: 29...  Training loss: 3.2864...  0.5783 sec/batch
Epoch: 1/20...  Training Step: 30...  Training loss: 3.2854...  0.5722 sec/batch
Epoch: 1/20...  Training Step: 31...  Training loss: 3.2882...  0.5802 sec/batch
Epoch: 1/20...  Training Step: 32...  Training loss: 3.2640...  0.5768 sec/batch
Epoch: 1/20...  Training Step: 33...  Training loss: 3.2549...  0.5692 sec/batch
Epoch: 1/20...  Training Step: 34...  Training loss: 3.2661...  0.5794 sec/batch
Epoch: 1/20...  Training Step: 35...  Training loss: 3.2502...  0.5775 sec/batch
Epoch: 1/20...  Training Step: 36...  Training loss: 3.2626...  0.5647 sec/batch
Epoch: 1/20...  Training Step: 37...  Training loss: 3.2329...  0.5719 sec/batch
Epoch: 1/20...  Training Step: 38...  Training loss: 3.2374...  0.5862 sec/batch
Epoch: 1/20...  Training Step: 39...  Training loss: 3.2345...  0.5877 sec/batch
Epoch: 1/20...  Training Step: 40...  Training loss: 3.2300...  0.5758 sec/batch
Epoch: 1/20...  Training Step: 41...  Training loss: 3.2127...  0.5954 sec/batch
Epoch: 1/20...  Training Step: 42...  Training loss: 3.2222...  0.6005 sec/batch
Epoch: 1/20...  Training Step: 43...  Training loss: 3.2128...  0.5838 sec/batch
Epoch: 1/20...  Training Step: 44...  Training loss: 3.2165...  0.5803 sec/batch
Epoch: 1/20...  Training Step: 45...  Training loss: 3.2023...  0.5886 sec/batch
Epoch: 1/20...  Training Step: 46...  Training loss: 3.2241...  0.5837 sec/batch
Epoch: 1/20...  Training Step: 47...  Training loss: 3.2253...  0.5817 sec/batch
Epoch: 1/20...  Training Step: 48...  Training loss: 3.2243...  0.5801 sec/batch
Epoch: 1/20...  Training Step: 49...  Training loss: 3.2193...  0.5820 sec/batch
Epoch: 1/20...  Training Step: 50...  Training loss: 3.2222...  0.5803 sec/batch
Epoch: 1/20...  Training Step: 51...  Training loss: 3.2131...  0.5645 sec/batch
Epoch: 1/20...  Training Step: 52...  Training loss: 3.1938...  0.5761 sec/batch
Epoch: 1/20...  Training Step: 53...  Training loss: 3.2097...  0.6028 sec/batch
Epoch: 1/20...  Training Step: 54...  Training loss: 3.1942...  0.5898 sec/batch
Epoch: 1/20...  Training Step: 55...  Training loss: 3.2048...  0.5892 sec/batch
Epoch: 1/20...  Training Step: 56...  Training loss: 3.1830...  0.5756 sec/batch
Epoch: 1/20...  Training Step: 57...  Training loss: 3.1896...  0.5744 sec/batch
Epoch: 1/20...  Training Step: 58...  Training loss: 3.1983...  0.5762 sec/batch
Epoch: 1/20...  Training Step: 59...  Training loss: 3.1811...  0.5762 sec/batch
Epoch: 1/20...  Training Step: 60...  Training loss: 3.1985...  0.5794 sec/batch
Epoch: 1/20...  Training Step: 61...  Training loss: 3.1981...  0.5726 sec/batch
Epoch: 1/20...  Training Step: 62...  Training loss: 3.2053...  0.5859 sec/batch
Epoch: 1/20...  Training Step: 63...  Training loss: 3.2215...  0.5793 sec/batch
Epoch: 1/20...  Training Step: 64...  Training loss: 3.1725...  0.5717 sec/batch
Epoch: 1/20...  Training Step: 65...  Training loss: 3.1738...  0.5741 sec/batch
Epoch: 1/20...  Training Step: 66...  Training loss: 3.2098...  0.5755 sec/batch
Epoch: 1/20...  Training Step: 67...  Training loss: 3.1925...  0.5752 sec/batch
Epoch: 1/20...  Training Step: 68...  Training loss: 3.1462...  0.5794 sec/batch
Epoch: 1/20...  Training Step: 69...  Training loss: 3.1721...  0.5756 sec/batch
Epoch: 1/20...  Training Step: 70...  Training loss: 3.1886...  0.5782 sec/batch
Epoch: 1/20...  Training Step: 71...  Training loss: 3.1790...  0.5761 sec/batch
Epoch: 1/20...  Training Step: 72...  Training loss: 3.1974...  0.5518 sec/batch
Epoch: 1/20...  Training Step: 73...  Training loss: 3.1760...  0.5694 sec/batch
Epoch: 1/20...  Training Step: 74...  Training loss: 3.1850...  0.5902 sec/batch
Epoch: 1/20...  Training Step: 75...  Training loss: 3.1927...  0.5788 sec/batch
Epoch: 1/20...  Training Step: 76...  Training loss: 3.1998...  0.5799 sec/batch
Epoch: 1/20...  Training Step: 77...  Training loss: 3.1885...  0.5824 sec/batch
Epoch: 1/20...  Training Step: 78...  Training loss: 3.1784...  0.5631 sec/batch
Epoch: 1/20...  Training Step: 79...  Training loss: 3.1737...  0.5839 sec/batch
Epoch: 1/20...  Training Step: 80...  Training loss: 3.1576...  0.5737 sec/batch
Epoch: 1/20...  Training Step: 81...  Training loss: 3.1642...  0.5806 sec/batch
Epoch: 1/20...  Training Step: 82...  Training loss: 3.1853...  0.5623 sec/batch
Epoch: 1/20...  Training Step: 83...  Training loss: 3.1787...  0.6116 sec/batch
Epoch: 1/20...  Training Step: 84...  Training loss: 3.1731...  0.5918 sec/batch
Epoch: 1/20...  Training Step: 85...  Training loss: 3.1621...  0.5754 sec/batch
Epoch: 1/20...  Training Step: 86...  Training loss: 3.1653...  0.5758 sec/batch
Epoch: 1/20...  Training Step: 87...  Training loss: 3.1578...  0.5785 sec/batch
Epoch: 1/20...  Training Step: 88...  Training loss: 3.1631...  0.5910 sec/batch
Epoch: 1/20...  Training Step: 89...  Training loss: 3.1775...  0.5811 sec/batch
Epoch: 1/20...  Training Step: 90...  Training loss: 3.1702...  0.5846 sec/batch
Epoch: 1/20...  Training Step: 91...  Training loss: 3.1793...  0.5663 sec/batch
Epoch: 1/20...  Training Step: 92...  Training loss: 3.1623...  0.5601 sec/batch
Epoch: 1/20...  Training Step: 93...  Training loss: 3.1723...  0.6087 sec/batch
Epoch: 1/20...  Training Step: 94...  Training loss: 3.1708...  0.5757 sec/batch
Epoch: 1/20...  Training Step: 95...  Training loss: 3.1688...  0.5702 sec/batch
Epoch: 1/20...  Training Step: 96...  Training loss: 3.1628...  0.5741 sec/batch
Epoch: 1/20...  Training Step: 97...  Training loss: 3.1720...  0.5662 sec/batch
Epoch: 1/20...  Training Step: 98...  Training loss: 3.1612...  0.5868 sec/batch
Epoch: 1/20...  Training Step: 99...  Training loss: 3.1663...  0.5808 sec/batch
Epoch: 1/20...  Training Step: 100...  Training loss: 3.1587...  0.5773 sec/batch
Epoch: 1/20...  Training Step: 101...  Training loss: 3.1723...  0.5813 sec/batch
Epoch: 1/20...  Training Step: 102...  Training loss: 3.1658...  0.5647 sec/batch
Epoch: 1/20...  Training Step: 103...  Training loss: 3.1667...  0.5722 sec/batch
Epoch: 1/20...  Training Step: 104...  Training loss: 3.1660...  0.5836 sec/batch
Epoch: 1/20...  Training Step: 105...  Training loss: 3.1594...  0.5751 sec/batch
Epoch: 1/20...  Training Step: 106...  Training loss: 3.1637...  0.5628 sec/batch
Epoch: 1/20...  Training Step: 107...  Training loss: 3.1386...  0.6109 sec/batch
Epoch: 1/20...  Training Step: 108...  Training loss: 3.1427...  0.5879 sec/batch
Epoch: 1/20...  Training Step: 109...  Training loss: 3.1605...  0.5856 sec/batch
Epoch: 1/20...  Training Step: 110...  Training loss: 3.1305...  0.5775 sec/batch
Epoch: 1/20...  Training Step: 111...  Training loss: 3.1539...  0.5693 sec/batch
Epoch: 1/20...  Training Step: 112...  Training loss: 3.1559...  0.5756 sec/batch
Epoch: 1/20...  Training Step: 113...  Training loss: 3.1477...  0.5643 sec/batch
Epoch: 1/20...  Training Step: 114...  Training loss: 3.1327...  0.5902 sec/batch
Epoch: 1/20...  Training Step: 115...  Training loss: 3.1400...  0.5879 sec/batch
Epoch: 1/20...  Training Step: 116...  Training loss: 3.1330...  0.5690 sec/batch
Epoch: 1/20...  Training Step: 117...  Training loss: 3.1413...  0.5886 sec/batch
Epoch: 1/20...  Training Step: 118...  Training loss: 3.1614...  0.5559 sec/batch
Epoch: 1/20...  Training Step: 119...  Training loss: 3.1638...  0.5736 sec/batch
Epoch: 1/20...  Training Step: 120...  Training loss: 3.1393...  0.5847 sec/batch
Epoch: 1/20...  Training Step: 121...  Training loss: 3.1714...  0.5790 sec/batch
Epoch: 1/20...  Training Step: 122...  Training loss: 3.1480...  0.5787 sec/batch
Epoch: 1/20...  Training Step: 123...  Training loss: 3.1550...  0.5763 sec/batch
Epoch: 1/20...  Training Step: 124...  Training loss: 3.1539...  0.5858 sec/batch
Epoch: 1/20...  Training Step: 125...  Training loss: 3.1293...  0.5847 sec/batch
Epoch: 1/20...  Training Step: 126...  Training loss: 3.1223...  0.5757 sec/batch
Epoch: 1/20...  Training Step: 127...  Training loss: 3.1449...  0.5754 sec/batch
Epoch: 1/20...  Training Step: 128...  Training loss: 3.1503...  0.5797 sec/batch
Epoch: 1/20...  Training Step: 129...  Training loss: 3.1308...  0.5738 sec/batch
Epoch: 1/20...  Training Step: 130...  Training loss: 3.1408...  0.5633 sec/batch
Epoch: 1/20...  Training Step: 131...  Training loss: 3.1511...  0.5804 sec/batch
Epoch: 1/20...  Training Step: 132...  Training loss: 3.1360...  0.5756 sec/batch
Epoch: 1/20...  Training Step: 133...  Training loss: 3.1401...  0.5827 sec/batch
Epoch: 1/20...  Training Step: 134...  Training loss: 3.1280...  0.5655 sec/batch
Epoch: 1/20...  Training Step: 135...  Training loss: 3.0964...  0.5804 sec/batch
Epoch: 1/20...  Training Step: 136...  Training loss: 3.1116...  0.5860 sec/batch
Epoch: 1/20...  Training Step: 137...  Training loss: 3.1224...  0.5694 sec/batch
Epoch: 1/20...  Training Step: 138...  Training loss: 3.1159...  0.5845 sec/batch
Epoch: 1/20...  Training Step: 139...  Training loss: 3.1385...  0.5792 sec/batch
Epoch: 1/20...  Training Step: 140...  Training loss: 3.1330...  0.5821 sec/batch
Epoch: 1/20...  Training Step: 141...  Training loss: 3.1258...  0.5731 sec/batch
Epoch: 1/20...  Training Step: 142...  Training loss: 3.1057...  0.5785 sec/batch
Epoch: 1/20...  Training Step: 143...  Training loss: 3.1129...  0.5584 sec/batch
Epoch: 1/20...  Training Step: 144...  Training loss: 3.1113...  0.5738 sec/batch
Epoch: 1/20...  Training Step: 145...  Training loss: 3.1160...  0.6115 sec/batch
Epoch: 1/20...  Training Step: 146...  Training loss: 3.1238...  0.5869 sec/batch
Epoch: 1/20...  Training Step: 147...  Training loss: 3.1282...  0.5626 sec/batch
Epoch: 1/20...  Training Step: 148...  Training loss: 3.1486...  0.5759 sec/batch
Epoch: 1/20...  Training Step: 149...  Training loss: 3.1064...  0.5793 sec/batch
Epoch: 1/20...  Training Step: 150...  Training loss: 3.1159...  0.5862 sec/batch
Epoch: 1/20...  Training Step: 151...  Training loss: 3.1310...  0.5686 sec/batch
Epoch: 1/20...  Training Step: 152...  Training loss: 3.1412...  0.5789 sec/batch
Epoch: 1/20...  Training Step: 153...  Training loss: 3.1112...  0.5900 sec/batch
Epoch: 1/20...  Training Step: 154...  Training loss: 3.1193...  0.5762 sec/batch
Epoch: 1/20...  Training Step: 155...  Training loss: 3.1034...  0.5759 sec/batch
Epoch: 1/20...  Training Step: 156...  Training loss: 3.1016...  0.5524 sec/batch
Epoch: 1/20...  Training Step: 157...  Training loss: 3.0979...  0.5801 sec/batch
Epoch: 1/20...  Training Step: 158...  Training loss: 3.0977...  0.5717 sec/batch
Epoch: 1/20...  Training Step: 159...  Training loss: 3.0742...  0.5883 sec/batch
Epoch: 1/20...  Training Step: 160...  Training loss: 3.0837...  0.5826 sec/batch
Epoch: 1/20...  Training Step: 161...  Training loss: 3.0982...  0.5796 sec/batch
Epoch: 1/20...  Training Step: 162...  Training loss: 3.0652...  0.5774 sec/batch
Epoch: 1/20...  Training Step: 163...  Training loss: 3.0667...  0.5802 sec/batch
Epoch: 1/20...  Training Step: 164...  Training loss: 3.0879...  0.5801 sec/batch
Epoch: 1/20...  Training Step: 165...  Training loss: 3.0779...  0.5761 sec/batch
Epoch: 1/20...  Training Step: 166...  Training loss: 3.0767...  0.5873 sec/batch
Epoch: 1/20...  Training Step: 167...  Training loss: 3.0764...  0.5744 sec/batch
Epoch: 1/20...  Training Step: 168...  Training loss: 3.0837...  0.5872 sec/batch
Epoch: 1/20...  Training Step: 169...  Training loss: 3.0763...  0.5865 sec/batch
Epoch: 1/20...  Training Step: 170...  Training loss: 3.0472...  0.5787 sec/batch
Epoch: 1/20...  Training Step: 171...  Training loss: 3.0778...  0.5648 sec/batch
Epoch: 1/20...  Training Step: 172...  Training loss: 3.0933...  0.5670 sec/batch
Epoch: 1/20...  Training Step: 173...  Training loss: 3.1016...  0.5758 sec/batch
Epoch: 1/20...  Training Step: 174...  Training loss: 3.1049...  0.5852 sec/batch
Epoch: 1/20...  Training Step: 175...  Training loss: 3.0736...  0.5747 sec/batch
Epoch: 1/20...  Training Step: 176...  Training loss: 3.0668...  0.5672 sec/batch
Epoch: 1/20...  Training Step: 177...  Training loss: 3.0510...  0.5816 sec/batch
Epoch: 1/20...  Training Step: 178...  Training loss: 3.0222...  0.5806 sec/batch
Epoch: 1/20...  Training Step: 179...  Training loss: 3.0334...  0.5723 sec/batch
Epoch: 1/20...  Training Step: 180...  Training loss: 3.0214...  0.5803 sec/batch
Epoch: 1/20...  Training Step: 181...  Training loss: 3.0317...  0.5867 sec/batch
Epoch: 1/20...  Training Step: 182...  Training loss: 3.0334...  0.5645 sec/batch
Epoch: 1/20...  Training Step: 183...  Training loss: 3.0064...  0.5790 sec/batch
Epoch: 1/20...  Training Step: 184...  Training loss: 3.0325...  0.5624 sec/batch
Epoch: 1/20...  Training Step: 185...  Training loss: 3.0562...  0.5702 sec/batch
Epoch: 1/20...  Training Step: 186...  Training loss: 3.0092...  0.5768 sec/batch
Epoch: 1/20...  Training Step: 187...  Training loss: 3.0060...  0.5804 sec/batch
Epoch: 1/20...  Training Step: 188...  Training loss: 2.9767...  0.5800 sec/batch
Epoch: 1/20...  Training Step: 189...  Training loss: 2.9960...  0.5795 sec/batch
Epoch: 1/20...  Training Step: 190...  Training loss: 2.9909...  0.5872 sec/batch
Epoch: 1/20...  Training Step: 191...  Training loss: 2.9965...  0.5834 sec/batch
Epoch: 1/20...  Training Step: 192...  Training loss: 2.9549...  0.5795 sec/batch
Epoch: 1/20...  Training Step: 193...  Training loss: 2.9752...  0.5657 sec/batch
Epoch: 1/20...  Training Step: 194...  Training loss: 2.9648...  0.5693 sec/batch
Epoch: 1/20...  Training Step: 195...  Training loss: 2.9479...  0.5792 sec/batch
Epoch: 1/20...  Training Step: 196...  Training loss: 2.9508...  0.5787 sec/batch
Epoch: 1/20...  Training Step: 197...  Training loss: 2.9504...  0.6006 sec/batch
Epoch: 1/20...  Training Step: 198...  Training loss: 2.9397...  0.5810 sec/batch
Epoch: 2/20...  Training Step: 199...  Training loss: 3.0009...  0.5689 sec/batch
Epoch: 2/20...  Training Step: 200...  Training loss: 2.9127...  0.5778 sec/batch
Epoch: 2/20...  Training Step: 201...  Training loss: 2.9192...  0.5801 sec/batch
Epoch: 2/20...  Training Step: 202...  Training loss: 2.9245...  0.5843 sec/batch
Epoch: 2/20...  Training Step: 203...  Training loss: 2.9281...  0.5819 sec/batch
Epoch: 2/20...  Training Step: 204...  Training loss: 2.9350...  0.5711 sec/batch
Epoch: 2/20...  Training Step: 205...  Training loss: 2.9289...  0.5760 sec/batch
Epoch: 2/20...  Training Step: 206...  Training loss: 2.9202...  0.5812 sec/batch
Epoch: 2/20...  Training Step: 207...  Training loss: 2.9083...  0.5770 sec/batch
Epoch: 2/20...  Training Step: 208...  Training loss: 2.9016...  0.5875 sec/batch
Epoch: 2/20...  Training Step: 209...  Training loss: 2.8889...  0.5755 sec/batch
Epoch: 2/20...  Training Step: 210...  Training loss: 2.9034...  0.5902 sec/batch
Epoch: 2/20...  Training Step: 211...  Training loss: 2.8817...  0.5877 sec/batch
Epoch: 2/20...  Training Step: 212...  Training loss: 2.9137...  0.5786 sec/batch
Epoch: 2/20...  Training Step: 213...  Training loss: 2.8923...  0.5834 sec/batch
Epoch: 2/20...  Training Step: 214...  Training loss: 2.8817...  0.5724 sec/batch
Epoch: 2/20...  Training Step: 215...  Training loss: 2.8722...  0.5639 sec/batch
Epoch: 2/20...  Training Step: 216...  Training loss: 2.9077...  0.5799 sec/batch
Epoch: 2/20...  Training Step: 217...  Training loss: 2.8860...  0.5773 sec/batch
Epoch: 2/20...  Training Step: 218...  Training loss: 2.8268...  0.5780 sec/batch
Epoch: 2/20...  Training Step: 219...  Training loss: 2.8591...  0.5824 sec/batch
Epoch: 2/20...  Training Step: 220...  Training loss: 2.8620...  0.5766 sec/batch
Epoch: 2/20...  Training Step: 221...  Training loss: 2.8439...  0.5846 sec/batch
Epoch: 2/20...  Training Step: 222...  Training loss: 2.8431...  0.5768 sec/batch
Epoch: 2/20...  Training Step: 223...  Training loss: 2.8184...  0.6142 sec/batch
Epoch: 2/20...  Training Step: 224...  Training loss: 2.8586...  0.5873 sec/batch
Epoch: 2/20...  Training Step: 225...  Training loss: 2.8362...  0.5881 sec/batch
Epoch: 2/20...  Training Step: 226...  Training loss: 2.8090...  0.5805 sec/batch
Epoch: 2/20...  Training Step: 227...  Training loss: 2.8229...  0.5812 sec/batch
Epoch: 2/20...  Training Step: 228...  Training loss: 2.8135...  0.5749 sec/batch
Epoch: 2/20...  Training Step: 229...  Training loss: 2.8473...  0.5836 sec/batch
Epoch: 2/20...  Training Step: 230...  Training loss: 2.8033...  0.5749 sec/batch
Epoch: 2/20...  Training Step: 231...  Training loss: 2.7908...  0.5785 sec/batch
Epoch: 2/20...  Training Step: 232...  Training loss: 2.8008...  0.5805 sec/batch
Epoch: 2/20...  Training Step: 233...  Training loss: 2.7731...  0.5831 sec/batch
Epoch: 2/20...  Training Step: 234...  Training loss: 2.7999...  0.5801 sec/batch
Epoch: 2/20...  Training Step: 235...  Training loss: 2.7809...  0.5655 sec/batch
Epoch: 2/20...  Training Step: 236...  Training loss: 2.7567...  0.5652 sec/batch
Epoch: 2/20...  Training Step: 237...  Training loss: 2.7643...  0.5828 sec/batch
Epoch: 2/20...  Training Step: 238...  Training loss: 2.7601...  0.5728 sec/batch
Epoch: 2/20...  Training Step: 239...  Training loss: 2.7446...  0.5815 sec/batch
Epoch: 2/20...  Training Step: 240...  Training loss: 2.7435...  0.5773 sec/batch
Epoch: 2/20...  Training Step: 241...  Training loss: 2.7343...  0.5615 sec/batch
Epoch: 2/20...  Training Step: 242...  Training loss: 2.7367...  0.5813 sec/batch
Epoch: 2/20...  Training Step: 243...  Training loss: 2.7259...  0.5862 sec/batch
Epoch: 2/20...  Training Step: 244...  Training loss: 2.7338...  0.5880 sec/batch
Epoch: 2/20...  Training Step: 245...  Training loss: 2.7413...  0.5742 sec/batch
Epoch: 2/20...  Training Step: 246...  Training loss: 2.7419...  0.5810 sec/batch
Epoch: 2/20...  Training Step: 247...  Training loss: 2.7355...  0.5506 sec/batch
Epoch: 2/20...  Training Step: 248...  Training loss: 2.7438...  0.6212 sec/batch
Epoch: 2/20...  Training Step: 249...  Training loss: 2.7129...  0.5815 sec/batch
Epoch: 2/20...  Training Step: 250...  Training loss: 2.7176...  0.5775 sec/batch
Epoch: 2/20...  Training Step: 251...  Training loss: 2.7169...  0.5568 sec/batch
Epoch: 2/20...  Training Step: 252...  Training loss: 2.6994...  0.5802 sec/batch
Epoch: 2/20...  Training Step: 253...  Training loss: 2.6928...  0.5850 sec/batch
Epoch: 2/20...  Training Step: 254...  Training loss: 2.7022...  0.5776 sec/batch
Epoch: 2/20...  Training Step: 255...  Training loss: 2.6971...  0.5824 sec/batch
Epoch: 2/20...  Training Step: 256...  Training loss: 2.6820...  0.5796 sec/batch
Epoch: 2/20...  Training Step: 257...  Training loss: 2.6835...  0.5790 sec/batch
Epoch: 2/20...  Training Step: 258...  Training loss: 2.6909...  0.5612 sec/batch
Epoch: 2/20...  Training Step: 259...  Training loss: 2.6882...  0.5778 sec/batch
Epoch: 2/20...  Training Step: 260...  Training loss: 2.7015...  0.5799 sec/batch
Epoch: 2/20...  Training Step: 261...  Training loss: 2.7113...  0.5706 sec/batch
Epoch: 2/20...  Training Step: 262...  Training loss: 2.6667...  0.5923 sec/batch
Epoch: 2/20...  Training Step: 263...  Training loss: 2.6627...  0.5827 sec/batch
Epoch: 2/20...  Training Step: 264...  Training loss: 2.7040...  0.5849 sec/batch
Epoch: 2/20...  Training Step: 265...  Training loss: 2.6721...  0.5730 sec/batch
Epoch: 2/20...  Training Step: 266...  Training loss: 2.6285...  0.5812 sec/batch
Epoch: 2/20...  Training Step: 267...  Training loss: 2.6380...  0.5803 sec/batch
Epoch: 2/20...  Training Step: 268...  Training loss: 2.6667...  0.5637 sec/batch
Epoch: 2/20...  Training Step: 269...  Training loss: 2.6624...  0.5816 sec/batch
Epoch: 2/20...  Training Step: 270...  Training loss: 2.6788...  0.5836 sec/batch
Epoch: 2/20...  Training Step: 271...  Training loss: 2.6534...  0.5821 sec/batch
Epoch: 2/20...  Training Step: 272...  Training loss: 2.6477...  0.5807 sec/batch
Epoch: 2/20...  Training Step: 273...  Training loss: 2.6611...  0.5732 sec/batch
Epoch: 2/20...  Training Step: 274...  Training loss: 2.6866...  0.5603 sec/batch
Epoch: 2/20...  Training Step: 275...  Training loss: 2.6484...  0.5842 sec/batch
Epoch: 2/20...  Training Step: 276...  Training loss: 2.6492...  0.5800 sec/batch
Epoch: 2/20...  Training Step: 277...  Training loss: 2.6299...  0.5866 sec/batch
Epoch: 2/20...  Training Step: 278...  Training loss: 2.6267...  0.5788 sec/batch
Epoch: 2/20...  Training Step: 279...  Training loss: 2.6296...  0.5898 sec/batch
Epoch: 2/20...  Training Step: 280...  Training loss: 2.6474...  0.5690 sec/batch
Epoch: 2/20...  Training Step: 281...  Training loss: 2.6383...  0.5839 sec/batch
Epoch: 2/20...  Training Step: 282...  Training loss: 2.6123...  0.5892 sec/batch
Epoch: 2/20...  Training Step: 283...  Training loss: 2.6040...  0.5848 sec/batch
Epoch: 2/20...  Training Step: 284...  Training loss: 2.6201...  0.5872 sec/batch
Epoch: 2/20...  Training Step: 285...  Training loss: 2.6216...  0.5771 sec/batch
Epoch: 2/20...  Training Step: 286...  Training loss: 2.6107...  0.5723 sec/batch
Epoch: 2/20...  Training Step: 287...  Training loss: 2.6213...  0.5837 sec/batch
Epoch: 2/20...  Training Step: 288...  Training loss: 2.6309...  0.5713 sec/batch
Epoch: 2/20...  Training Step: 289...  Training loss: 2.6100...  0.5784 sec/batch
Epoch: 2/20...  Training Step: 290...  Training loss: 2.6227...  0.5709 sec/batch
Epoch: 2/20...  Training Step: 291...  Training loss: 2.6160...  0.5770 sec/batch
Epoch: 2/20...  Training Step: 292...  Training loss: 2.5978...  0.5608 sec/batch
Epoch: 2/20...  Training Step: 293...  Training loss: 2.5909...  0.5568 sec/batch
Epoch: 2/20...  Training Step: 294...  Training loss: 2.5889...  0.5761 sec/batch
Epoch: 2/20...  Training Step: 295...  Training loss: 2.6064...  0.5824 sec/batch
Epoch: 2/20...  Training Step: 296...  Training loss: 2.6073...  0.5831 sec/batch
Epoch: 2/20...  Training Step: 297...  Training loss: 2.5980...  0.5854 sec/batch
Epoch: 2/20...  Training Step: 298...  Training loss: 2.5895...  0.5845 sec/batch
Epoch: 2/20...  Training Step: 299...  Training loss: 2.6054...  0.5946 sec/batch
Epoch: 2/20...  Training Step: 300...  Training loss: 2.5906...  0.6039 sec/batch
Epoch: 2/20...  Training Step: 301...  Training loss: 2.5868...  0.5848 sec/batch
Epoch: 2/20...  Training Step: 302...  Training loss: 2.5847...  0.5801 sec/batch
Epoch: 2/20...  Training Step: 303...  Training loss: 2.5866...  0.5840 sec/batch
Epoch: 2/20...  Training Step: 304...  Training loss: 2.5847...  0.5768 sec/batch
Epoch: 2/20...  Training Step: 305...  Training loss: 2.5762...  0.5602 sec/batch
Epoch: 2/20...  Training Step: 306...  Training loss: 2.5967...  0.5837 sec/batch
Epoch: 2/20...  Training Step: 307...  Training loss: 2.5956...  0.5828 sec/batch
Epoch: 2/20...  Training Step: 308...  Training loss: 2.5571...  0.5845 sec/batch
Epoch: 2/20...  Training Step: 309...  Training loss: 2.5799...  0.5817 sec/batch
Epoch: 2/20...  Training Step: 310...  Training loss: 2.5899...  0.5781 sec/batch
Epoch: 2/20...  Training Step: 311...  Training loss: 2.5726...  0.5849 sec/batch
Epoch: 2/20...  Training Step: 312...  Training loss: 2.5504...  0.5739 sec/batch
Epoch: 2/20...  Training Step: 313...  Training loss: 2.5566...  0.5926 sec/batch
Epoch: 2/20...  Training Step: 314...  Training loss: 2.5389...  0.5809 sec/batch
Epoch: 2/20...  Training Step: 315...  Training loss: 2.5635...  0.5881 sec/batch
Epoch: 2/20...  Training Step: 316...  Training loss: 2.5667...  0.5728 sec/batch
Epoch: 2/20...  Training Step: 317...  Training loss: 2.5982...  0.5812 sec/batch
Epoch: 2/20...  Training Step: 318...  Training loss: 2.5660...  0.5788 sec/batch
Epoch: 2/20...  Training Step: 319...  Training loss: 2.6008...  0.5759 sec/batch
Epoch: 2/20...  Training Step: 320...  Training loss: 2.5637...  0.5727 sec/batch
Epoch: 2/20...  Training Step: 321...  Training loss: 2.5610...  0.5775 sec/batch
Epoch: 2/20...  Training Step: 322...  Training loss: 2.5752...  0.5896 sec/batch
Epoch: 2/20...  Training Step: 323...  Training loss: 2.5557...  0.5788 sec/batch
Epoch: 2/20...  Training Step: 324...  Training loss: 2.5420...  0.5783 sec/batch
Epoch: 2/20...  Training Step: 325...  Training loss: 2.5770...  0.5822 sec/batch
Epoch: 2/20...  Training Step: 326...  Training loss: 2.5886...  0.5851 sec/batch
Epoch: 2/20...  Training Step: 327...  Training loss: 2.5517...  0.5793 sec/batch
Epoch: 2/20...  Training Step: 328...  Training loss: 2.5577...  0.5816 sec/batch
Epoch: 2/20...  Training Step: 329...  Training loss: 2.5545...  0.5579 sec/batch
Epoch: 2/20...  Training Step: 330...  Training loss: 2.5342...  0.5937 sec/batch
Epoch: 2/20...  Training Step: 331...  Training loss: 2.5663...  0.5723 sec/batch
Epoch: 2/20...  Training Step: 332...  Training loss: 2.5627...  0.5911 sec/batch
Epoch: 2/20...  Training Step: 333...  Training loss: 2.5340...  0.5857 sec/batch
Epoch: 2/20...  Training Step: 334...  Training loss: 2.5274...  0.5820 sec/batch
Epoch: 2/20...  Training Step: 335...  Training loss: 2.5387...  0.5805 sec/batch
Epoch: 2/20...  Training Step: 336...  Training loss: 2.5498...  0.5849 sec/batch
Epoch: 2/20...  Training Step: 337...  Training loss: 2.5663...  0.5795 sec/batch
Epoch: 2/20...  Training Step: 338...  Training loss: 2.5435...  0.5836 sec/batch
Epoch: 2/20...  Training Step: 339...  Training loss: 2.5610...  0.5874 sec/batch
Epoch: 2/20...  Training Step: 340...  Training loss: 2.5325...  0.5800 sec/batch
...
Epoch: 2/20...  Training Step: 396...  Training loss: 2.4709...  0.5682 sec/batch
Epoch: 3/20...  Training Step: 397...  Training loss: 2.5470...  0.5829 sec/batch
...
Epoch: 3/20...  Training Step: 594...  Training loss: 2.3624...  0.5799 sec/batch
Epoch: 4/20...  Training Step: 595...  Training loss: 2.4167...  0.5776 sec/batch
...
Epoch: 4/20...  Training Step: 792...  Training loss: 2.2322...  0.5874 sec/batch
Epoch: 5/20...  Training Step: 793...  Training loss: 2.3210...  0.5860 sec/batch
...
Epoch: 5/20...  Training Step: 990...  Training loss: 2.1623...  0.5831 sec/batch
Epoch: 6/20...  Training Step: 991...  Training loss: 2.2449...  0.5932 sec/batch
...
Epoch: 6/20...  Training Step: 1119...  Training loss: 2.1393...  0.5759 sec/batch
Epoch: 6/20...  Training Step: 1120...  Training loss: 2.1318...  0.5991 sec/batch
Epoch: 6/20...  Training Step: 1121...  Training loss: 2.1211...  0.5934 sec/batch
Epoch: 6/20...  Training Step: 1122...  Training loss: 2.1245...  0.5904 sec/batch
Epoch: 6/20...  Training Step: 1123...  Training loss: 2.1416...  0.5937 sec/batch
Epoch: 6/20...  Training Step: 1124...  Training loss: 2.1472...  0.5777 sec/batch
Epoch: 6/20...  Training Step: 1125...  Training loss: 2.1292...  0.5845 sec/batch
Epoch: 6/20...  Training Step: 1126...  Training loss: 2.1374...  0.6006 sec/batch
Epoch: 6/20...  Training Step: 1127...  Training loss: 2.1275...  0.5882 sec/batch
Epoch: 6/20...  Training Step: 1128...  Training loss: 2.1359...  0.5873 sec/batch
Epoch: 6/20...  Training Step: 1129...  Training loss: 2.1665...  0.5903 sec/batch
Epoch: 6/20...  Training Step: 1130...  Training loss: 2.1135...  0.5864 sec/batch
Epoch: 6/20...  Training Step: 1131...  Training loss: 2.1601...  0.5922 sec/batch
Epoch: 6/20...  Training Step: 1132...  Training loss: 2.1286...  0.5766 sec/batch
Epoch: 6/20...  Training Step: 1133...  Training loss: 2.1276...  0.5879 sec/batch
Epoch: 6/20...  Training Step: 1134...  Training loss: 2.1222...  0.5946 sec/batch
Epoch: 6/20...  Training Step: 1135...  Training loss: 2.1123...  0.5838 sec/batch
Epoch: 6/20...  Training Step: 1136...  Training loss: 2.1477...  0.5850 sec/batch
Epoch: 6/20...  Training Step: 1137...  Training loss: 2.1375...  0.5669 sec/batch
Epoch: 6/20...  Training Step: 1138...  Training loss: 2.1515...  0.5867 sec/batch
Epoch: 6/20...  Training Step: 1139...  Training loss: 2.1349...  0.5927 sec/batch
Epoch: 6/20...  Training Step: 1140...  Training loss: 2.1185...  0.5875 sec/batch
Epoch: 6/20...  Training Step: 1141...  Training loss: 2.1201...  0.5689 sec/batch
Epoch: 6/20...  Training Step: 1142...  Training loss: 2.1642...  0.5924 sec/batch
Epoch: 6/20...  Training Step: 1143...  Training loss: 2.1315...  0.5883 sec/batch
Epoch: 6/20...  Training Step: 1144...  Training loss: 2.1440...  0.5886 sec/batch
Epoch: 6/20...  Training Step: 1145...  Training loss: 2.1242...  0.5928 sec/batch
Epoch: 6/20...  Training Step: 1146...  Training loss: 2.1181...  0.5935 sec/batch
Epoch: 6/20...  Training Step: 1147...  Training loss: 2.1324...  0.5690 sec/batch
Epoch: 6/20...  Training Step: 1148...  Training loss: 2.1256...  0.5828 sec/batch
Epoch: 6/20...  Training Step: 1149...  Training loss: 2.0974...  0.5700 sec/batch
Epoch: 6/20...  Training Step: 1150...  Training loss: 2.1493...  0.5937 sec/batch
Epoch: 6/20...  Training Step: 1151...  Training loss: 2.1480...  0.5849 sec/batch
Epoch: 6/20...  Training Step: 1152...  Training loss: 2.1157...  0.5913 sec/batch
Epoch: 6/20...  Training Step: 1153...  Training loss: 2.1301...  0.5945 sec/batch
Epoch: 6/20...  Training Step: 1154...  Training loss: 2.1415...  0.6043 sec/batch
Epoch: 6/20...  Training Step: 1155...  Training loss: 2.1312...  0.5883 sec/batch
Epoch: 6/20...  Training Step: 1156...  Training loss: 2.1207...  0.5695 sec/batch
Epoch: 6/20...  Training Step: 1157...  Training loss: 2.1292...  0.5970 sec/batch
Epoch: 6/20...  Training Step: 1158...  Training loss: 2.1610...  0.5885 sec/batch
Epoch: 6/20...  Training Step: 1159...  Training loss: 2.1245...  0.5905 sec/batch
Epoch: 6/20...  Training Step: 1160...  Training loss: 2.1110...  0.5940 sec/batch
Epoch: 6/20...  Training Step: 1161...  Training loss: 2.1251...  0.5844 sec/batch
Epoch: 6/20...  Training Step: 1162...  Training loss: 2.1193...  0.5899 sec/batch
Epoch: 6/20...  Training Step: 1163...  Training loss: 2.1470...  0.5901 sec/batch
Epoch: 6/20...  Training Step: 1164...  Training loss: 2.1552...  0.5768 sec/batch
Epoch: 6/20...  Training Step: 1165...  Training loss: 2.1408...  0.5791 sec/batch
Epoch: 6/20...  Training Step: 1166...  Training loss: 2.1565...  0.6179 sec/batch
Epoch: 6/20...  Training Step: 1167...  Training loss: 2.1366...  0.5863 sec/batch
Epoch: 6/20...  Training Step: 1168...  Training loss: 2.1487...  0.5919 sec/batch
Epoch: 6/20...  Training Step: 1169...  Training loss: 2.1305...  0.5864 sec/batch
Epoch: 6/20...  Training Step: 1170...  Training loss: 2.1036...  0.5904 sec/batch
Epoch: 6/20...  Training Step: 1171...  Training loss: 2.1368...  0.5976 sec/batch
Epoch: 6/20...  Training Step: 1172...  Training loss: 2.1478...  0.5883 sec/batch
Epoch: 6/20...  Training Step: 1173...  Training loss: 2.1391...  0.5896 sec/batch
Epoch: 6/20...  Training Step: 1174...  Training loss: 2.1596...  0.5923 sec/batch
Epoch: 6/20...  Training Step: 1175...  Training loss: 2.1401...  0.5909 sec/batch
Epoch: 6/20...  Training Step: 1176...  Training loss: 2.1298...  0.5930 sec/batch
Epoch: 6/20...  Training Step: 1177...  Training loss: 2.1346...  0.5906 sec/batch
Epoch: 6/20...  Training Step: 1178...  Training loss: 2.1203...  0.5805 sec/batch
Epoch: 6/20...  Training Step: 1179...  Training loss: 2.1273...  0.6030 sec/batch
Epoch: 6/20...  Training Step: 1180...  Training loss: 2.1411...  0.5904 sec/batch
Epoch: 6/20...  Training Step: 1181...  Training loss: 2.1443...  0.5906 sec/batch
Epoch: 6/20...  Training Step: 1182...  Training loss: 2.1081...  0.6045 sec/batch
Epoch: 6/20...  Training Step: 1183...  Training loss: 2.1353...  0.5740 sec/batch
Epoch: 6/20...  Training Step: 1184...  Training loss: 2.1278...  0.5931 sec/batch
Epoch: 6/20...  Training Step: 1185...  Training loss: 2.1096...  0.5751 sec/batch
Epoch: 6/20...  Training Step: 1186...  Training loss: 2.1302...  0.5994 sec/batch
Epoch: 6/20...  Training Step: 1187...  Training loss: 2.1297...  0.5995 sec/batch
Epoch: 6/20...  Training Step: 1188...  Training loss: 2.1111...  0.5998 sec/batch
Epoch: 7/20...  Training Step: 1189...  Training loss: 2.1932...  0.5959 sec/batch
Epoch: 7/20...  Training Step: 1190...  Training loss: 2.0998...  0.5975 sec/batch
Epoch: 7/20...  Training Step: 1191...  Training loss: 2.0900...  0.5792 sec/batch
Epoch: 7/20...  Training Step: 1192...  Training loss: 2.1060...  0.5916 sec/batch
Epoch: 7/20...  Training Step: 1193...  Training loss: 2.1108...  0.5937 sec/batch
Epoch: 7/20...  Training Step: 1194...  Training loss: 2.0828...  0.5886 sec/batch
Epoch: 7/20...  Training Step: 1195...  Training loss: 2.1180...  0.5970 sec/batch
Epoch: 7/20...  Training Step: 1196...  Training loss: 2.1092...  0.6013 sec/batch
Epoch: 7/20...  Training Step: 1197...  Training loss: 2.1335...  0.5729 sec/batch
Epoch: 7/20...  Training Step: 1198...  Training loss: 2.1127...  0.5956 sec/batch
Epoch: 7/20...  Training Step: 1199...  Training loss: 2.0924...  0.5913 sec/batch
Epoch: 7/20...  Training Step: 1200...  Training loss: 2.1045...  0.5849 sec/batch
Epoch: 7/20...  Training Step: 1201...  Training loss: 2.1161...  0.5777 sec/batch
Epoch: 7/20...  Training Step: 1202...  Training loss: 2.1479...  0.5777 sec/batch
Epoch: 7/20...  Training Step: 1203...  Training loss: 2.1092...  0.5897 sec/batch
Epoch: 7/20...  Training Step: 1204...  Training loss: 2.1054...  0.5962 sec/batch
Epoch: 7/20...  Training Step: 1205...  Training loss: 2.1127...  0.5958 sec/batch
Epoch: 7/20...  Training Step: 1206...  Training loss: 2.1518...  0.5908 sec/batch
Epoch: 7/20...  Training Step: 1207...  Training loss: 2.1136...  0.5948 sec/batch
Epoch: 7/20...  Training Step: 1208...  Training loss: 2.1121...  0.5963 sec/batch
Epoch: 7/20...  Training Step: 1209...  Training loss: 2.0897...  0.5802 sec/batch
Epoch: 7/20...  Training Step: 1210...  Training loss: 2.1477...  0.5868 sec/batch
Epoch: 7/20...  Training Step: 1211...  Training loss: 2.1039...  0.5962 sec/batch
Epoch: 7/20...  Training Step: 1212...  Training loss: 2.1090...  0.5955 sec/batch
Epoch: 7/20...  Training Step: 1213...  Training loss: 2.1048...  0.5839 sec/batch
Epoch: 7/20...  Training Step: 1214...  Training loss: 2.0980...  0.5824 sec/batch
Epoch: 7/20...  Training Step: 1215...  Training loss: 2.0973...  0.6140 sec/batch
Epoch: 7/20...  Training Step: 1216...  Training loss: 2.1088...  0.5922 sec/batch
Epoch: 7/20...  Training Step: 1217...  Training loss: 2.1485...  0.5974 sec/batch
Epoch: 7/20...  Training Step: 1218...  Training loss: 2.1246...  0.5895 sec/batch
Epoch: 7/20...  Training Step: 1219...  Training loss: 2.1119...  0.5794 sec/batch
Epoch: 7/20...  Training Step: 1220...  Training loss: 2.0966...  0.5685 sec/batch
Epoch: 7/20...  Training Step: 1221...  Training loss: 2.1143...  0.5879 sec/batch
Epoch: 7/20...  Training Step: 1222...  Training loss: 2.1315...  0.5879 sec/batch
Epoch: 7/20...  Training Step: 1223...  Training loss: 2.0939...  0.5953 sec/batch
Epoch: 7/20...  Training Step: 1224...  Training loss: 2.1135...  0.6013 sec/batch
Epoch: 7/20...  Training Step: 1225...  Training loss: 2.0972...  0.5927 sec/batch
Epoch: 7/20...  Training Step: 1226...  Training loss: 2.0670...  0.5933 sec/batch
Epoch: 7/20...  Training Step: 1227...  Training loss: 2.0796...  0.5875 sec/batch
Epoch: 7/20...  Training Step: 1228...  Training loss: 2.0793...  0.5999 sec/batch
Epoch: 7/20...  Training Step: 1229...  Training loss: 2.0884...  0.5905 sec/batch
Epoch: 7/20...  Training Step: 1230...  Training loss: 2.0983...  0.5853 sec/batch
Epoch: 7/20...  Training Step: 1231...  Training loss: 2.0823...  0.5810 sec/batch
Epoch: 7/20...  Training Step: 1232...  Training loss: 2.0753...  0.5775 sec/batch
Epoch: 7/20...  Training Step: 1233...  Training loss: 2.0987...  0.5780 sec/batch
Epoch: 7/20...  Training Step: 1234...  Training loss: 2.0481...  0.5842 sec/batch
Epoch: 7/20...  Training Step: 1235...  Training loss: 2.1070...  0.5947 sec/batch
Epoch: 7/20...  Training Step: 1236...  Training loss: 2.0908...  0.5944 sec/batch
Epoch: 7/20...  Training Step: 1237...  Training loss: 2.0928...  0.5721 sec/batch
Epoch: 7/20...  Training Step: 1238...  Training loss: 2.1423...  0.5892 sec/batch
Epoch: 7/20...  Training Step: 1239...  Training loss: 2.0818...  0.5973 sec/batch
Epoch: 7/20...  Training Step: 1240...  Training loss: 2.1390...  0.5823 sec/batch
Epoch: 7/20...  Training Step: 1241...  Training loss: 2.0847...  0.5749 sec/batch
Epoch: 7/20...  Training Step: 1242...  Training loss: 2.0983...  0.6191 sec/batch
Epoch: 7/20...  Training Step: 1243...  Training loss: 2.0845...  0.5885 sec/batch
Epoch: 7/20...  Training Step: 1244...  Training loss: 2.1146...  0.6119 sec/batch
Epoch: 7/20...  Training Step: 1245...  Training loss: 2.1104...  0.5752 sec/batch
Epoch: 7/20...  Training Step: 1246...  Training loss: 2.0836...  0.5982 sec/batch
Epoch: 7/20...  Training Step: 1247...  Training loss: 2.0806...  0.5976 sec/batch
Epoch: 7/20...  Training Step: 1248...  Training loss: 2.1083...  0.5900 sec/batch
Epoch: 7/20...  Training Step: 1249...  Training loss: 2.0915...  0.5787 sec/batch
Epoch: 7/20...  Training Step: 1250...  Training loss: 2.1423...  0.5964 sec/batch
Epoch: 7/20...  Training Step: 1251...  Training loss: 2.1373...  0.5924 sec/batch
Epoch: 7/20...  Training Step: 1252...  Training loss: 2.1005...  0.5925 sec/batch
Epoch: 7/20...  Training Step: 1253...  Training loss: 2.0973...  0.5891 sec/batch
Epoch: 7/20...  Training Step: 1254...  Training loss: 2.1228...  0.6101 sec/batch
Epoch: 7/20...  Training Step: 1255...  Training loss: 2.1167...  0.5674 sec/batch
Epoch: 7/20...  Training Step: 1256...  Training loss: 2.0691...  0.5902 sec/batch
Epoch: 7/20...  Training Step: 1257...  Training loss: 2.0849...  0.5855 sec/batch
Epoch: 7/20...  Training Step: 1258...  Training loss: 2.0976...  0.5867 sec/batch
Epoch: 7/20...  Training Step: 1259...  Training loss: 2.1281...  0.5846 sec/batch
Epoch: 7/20...  Training Step: 1260...  Training loss: 2.1098...  0.5804 sec/batch
Epoch: 7/20...  Training Step: 1261...  Training loss: 2.1167...  0.5957 sec/batch
Epoch: 7/20...  Training Step: 1262...  Training loss: 2.0866...  0.5846 sec/batch
Epoch: 7/20...  Training Step: 1263...  Training loss: 2.1018...  0.6057 sec/batch
Epoch: 7/20...  Training Step: 1264...  Training loss: 2.1258...  0.5776 sec/batch
Epoch: 7/20...  Training Step: 1265...  Training loss: 2.0906...  0.5803 sec/batch
Epoch: 7/20...  Training Step: 1266...  Training loss: 2.1044...  0.6287 sec/batch
Epoch: 7/20...  Training Step: 1267...  Training loss: 2.0731...  0.5900 sec/batch
Epoch: 7/20...  Training Step: 1268...  Training loss: 2.0824...  0.5894 sec/batch
Epoch: 7/20...  Training Step: 1269...  Training loss: 2.0773...  0.5884 sec/batch
Epoch: 7/20...  Training Step: 1270...  Training loss: 2.1096...  0.5936 sec/batch
Epoch: 7/20...  Training Step: 1271...  Training loss: 2.0591...  0.5946 sec/batch
Epoch: 7/20...  Training Step: 1272...  Training loss: 2.0757...  0.5804 sec/batch
Epoch: 7/20...  Training Step: 1273...  Training loss: 2.0580...  0.5966 sec/batch
Epoch: 7/20...  Training Step: 1274...  Training loss: 2.0790...  0.5880 sec/batch
Epoch: 7/20...  Training Step: 1275...  Training loss: 2.0849...  0.5605 sec/batch
Epoch: 7/20...  Training Step: 1276...  Training loss: 2.0732...  0.5817 sec/batch
Epoch: 7/20...  Training Step: 1277...  Training loss: 2.0651...  0.5911 sec/batch
Epoch: 7/20...  Training Step: 1278...  Training loss: 2.1033...  0.5896 sec/batch
Epoch: 7/20...  Training Step: 1279...  Training loss: 2.0843...  0.5904 sec/batch
Epoch: 7/20...  Training Step: 1280...  Training loss: 2.0939...  0.5933 sec/batch
Epoch: 7/20...  Training Step: 1281...  Training loss: 2.0708...  0.5873 sec/batch
Epoch: 7/20...  Training Step: 1282...  Training loss: 2.0686...  0.5865 sec/batch
Epoch: 7/20...  Training Step: 1283...  Training loss: 2.0667...  0.5965 sec/batch
Epoch: 7/20...  Training Step: 1284...  Training loss: 2.0837...  0.5940 sec/batch
Epoch: 7/20...  Training Step: 1285...  Training loss: 2.0810...  0.5861 sec/batch
Epoch: 7/20...  Training Step: 1286...  Training loss: 2.0760...  0.5934 sec/batch
Epoch: 7/20...  Training Step: 1287...  Training loss: 2.0684...  0.5976 sec/batch
Epoch: 7/20...  Training Step: 1288...  Training loss: 2.0631...  0.5811 sec/batch
Epoch: 7/20...  Training Step: 1289...  Training loss: 2.1017...  0.5839 sec/batch
Epoch: 7/20...  Training Step: 1290...  Training loss: 2.0917...  0.5935 sec/batch
Epoch: 7/20...  Training Step: 1291...  Training loss: 2.0686...  0.5938 sec/batch
Epoch: 7/20...  Training Step: 1292...  Training loss: 2.0777...  0.5906 sec/batch
Epoch: 7/20...  Training Step: 1293...  Training loss: 2.0655...  0.5943 sec/batch
Epoch: 7/20...  Training Step: 1294...  Training loss: 2.0869...  0.5936 sec/batch
Epoch: 7/20...  Training Step: 1295...  Training loss: 2.0840...  0.5953 sec/batch
Epoch: 7/20...  Training Step: 1296...  Training loss: 2.0892...  0.5836 sec/batch
Epoch: 7/20...  Training Step: 1297...  Training loss: 2.1110...  0.5940 sec/batch
Epoch: 7/20...  Training Step: 1298...  Training loss: 2.0841...  0.5919 sec/batch
Epoch: 7/20...  Training Step: 1299...  Training loss: 2.0886...  0.5948 sec/batch
Epoch: 7/20...  Training Step: 1300...  Training loss: 2.0892...  0.5904 sec/batch
Epoch: 7/20...  Training Step: 1301...  Training loss: 2.0883...  0.5911 sec/batch
Epoch: 7/20...  Training Step: 1302...  Training loss: 2.0780...  0.5873 sec/batch
Epoch: 7/20...  Training Step: 1303...  Training loss: 2.0679...  0.5749 sec/batch
Epoch: 7/20...  Training Step: 1304...  Training loss: 2.0424...  0.5886 sec/batch
Epoch: 7/20...  Training Step: 1305...  Training loss: 2.0848...  0.5989 sec/batch
Epoch: 7/20...  Training Step: 1306...  Training loss: 2.0843...  0.5854 sec/batch
Epoch: 7/20...  Training Step: 1307...  Training loss: 2.1108...  0.5857 sec/batch
Epoch: 7/20...  Training Step: 1308...  Training loss: 2.0866...  0.5964 sec/batch
Epoch: 7/20...  Training Step: 1309...  Training loss: 2.1066...  0.5921 sec/batch
Epoch: 7/20...  Training Step: 1310...  Training loss: 2.0796...  0.5945 sec/batch
Epoch: 7/20...  Training Step: 1311...  Training loss: 2.0740...  0.5938 sec/batch
Epoch: 7/20...  Training Step: 1312...  Training loss: 2.1032...  0.6016 sec/batch
Epoch: 7/20...  Training Step: 1313...  Training loss: 2.0880...  0.5905 sec/batch
Epoch: 7/20...  Training Step: 1314...  Training loss: 2.0385...  0.5964 sec/batch
Epoch: 7/20...  Training Step: 1315...  Training loss: 2.1011...  0.5858 sec/batch
Epoch: 7/20...  Training Step: 1316...  Training loss: 2.0937...  0.5846 sec/batch
Epoch: 7/20...  Training Step: 1317...  Training loss: 2.0901...  0.6189 sec/batch
Epoch: 7/20...  Training Step: 1318...  Training loss: 2.0836...  0.5867 sec/batch
Epoch: 7/20...  Training Step: 1319...  Training loss: 2.0711...  0.5931 sec/batch
Epoch: 7/20...  Training Step: 1320...  Training loss: 2.0516...  0.5900 sec/batch
Epoch: 7/20...  Training Step: 1321...  Training loss: 2.1013...  0.6031 sec/batch
Epoch: 7/20...  Training Step: 1322...  Training loss: 2.0983...  0.5957 sec/batch
Epoch: 7/20...  Training Step: 1323...  Training loss: 2.0829...  0.5866 sec/batch
Epoch: 7/20...  Training Step: 1324...  Training loss: 2.0832...  0.5930 sec/batch
Epoch: 7/20...  Training Step: 1325...  Training loss: 2.0931...  0.5900 sec/batch
Epoch: 7/20...  Training Step: 1326...  Training loss: 2.0937...  0.5947 sec/batch
Epoch: 7/20...  Training Step: 1327...  Training loss: 2.1254...  0.5973 sec/batch
Epoch: 7/20...  Training Step: 1328...  Training loss: 2.0649...  0.5916 sec/batch
Epoch: 7/20...  Training Step: 1329...  Training loss: 2.1129...  0.6058 sec/batch
Epoch: 7/20...  Training Step: 1330...  Training loss: 2.0774...  0.5961 sec/batch
Epoch: 7/20...  Training Step: 1331...  Training loss: 2.0864...  0.5948 sec/batch
Epoch: 7/20...  Training Step: 1332...  Training loss: 2.0801...  0.5833 sec/batch
Epoch: 7/20...  Training Step: 1333...  Training loss: 2.0693...  0.6019 sec/batch
Epoch: 7/20...  Training Step: 1334...  Training loss: 2.0946...  0.5846 sec/batch
Epoch: 7/20...  Training Step: 1335...  Training loss: 2.0979...  0.5899 sec/batch
Epoch: 7/20...  Training Step: 1336...  Training loss: 2.1103...  0.6002 sec/batch
Epoch: 7/20...  Training Step: 1337...  Training loss: 2.0867...  0.6008 sec/batch
Epoch: 7/20...  Training Step: 1338...  Training loss: 2.0657...  0.5951 sec/batch
Epoch: 7/20...  Training Step: 1339...  Training loss: 2.0653...  0.6051 sec/batch
Epoch: 7/20...  Training Step: 1340...  Training loss: 2.1164...  0.5889 sec/batch
Epoch: 7/20...  Training Step: 1341...  Training loss: 2.0915...  0.5841 sec/batch
Epoch: 7/20...  Training Step: 1342...  Training loss: 2.0828...  0.5941 sec/batch
Epoch: 7/20...  Training Step: 1343...  Training loss: 2.0806...  0.5873 sec/batch
Epoch: 7/20...  Training Step: 1344...  Training loss: 2.0655...  0.5968 sec/batch
Epoch: 7/20...  Training Step: 1345...  Training loss: 2.0877...  0.5932 sec/batch
Epoch: 7/20...  Training Step: 1346...  Training loss: 2.0641...  0.5785 sec/batch
Epoch: 7/20...  Training Step: 1347...  Training loss: 2.0514...  0.5956 sec/batch
Epoch: 7/20...  Training Step: 1348...  Training loss: 2.1084...  0.5980 sec/batch
Epoch: 7/20...  Training Step: 1349...  Training loss: 2.1069...  0.5905 sec/batch
Epoch: 7/20...  Training Step: 1350...  Training loss: 2.0721...  0.5887 sec/batch
Epoch: 7/20...  Training Step: 1351...  Training loss: 2.0920...  0.5856 sec/batch
Epoch: 7/20...  Training Step: 1352...  Training loss: 2.0800...  0.5988 sec/batch
Epoch: 7/20...  Training Step: 1353...  Training loss: 2.0903...  0.5954 sec/batch
Epoch: 7/20...  Training Step: 1354...  Training loss: 2.0768...  0.5927 sec/batch
Epoch: 7/20...  Training Step: 1355...  Training loss: 2.0822...  0.5957 sec/batch
Epoch: 7/20...  Training Step: 1356...  Training loss: 2.1181...  0.6252 sec/batch
Epoch: 7/20...  Training Step: 1357...  Training loss: 2.0828...  0.5947 sec/batch
Epoch: 7/20...  Training Step: 1358...  Training loss: 2.0614...  0.5800 sec/batch
Epoch: 7/20...  Training Step: 1359...  Training loss: 2.0717...  0.5698 sec/batch
Epoch: 7/20...  Training Step: 1360...  Training loss: 2.0590...  0.6001 sec/batch
Epoch: 7/20...  Training Step: 1361...  Training loss: 2.0968...  0.5898 sec/batch
Epoch: 7/20...  Training Step: 1362...  Training loss: 2.0949...  0.6019 sec/batch
Epoch: 7/20...  Training Step: 1363...  Training loss: 2.0865...  0.5867 sec/batch
Epoch: 7/20...  Training Step: 1364...  Training loss: 2.0759...  0.6005 sec/batch
Epoch: 7/20...  Training Step: 1365...  Training loss: 2.0625...  0.5893 sec/batch
Epoch: 7/20...  Training Step: 1366...  Training loss: 2.0802...  0.5693 sec/batch
Epoch: 7/20...  Training Step: 1367...  Training loss: 2.0633...  0.6287 sec/batch
Epoch: 7/20...  Training Step: 1368...  Training loss: 2.0463...  0.5916 sec/batch
Epoch: 7/20...  Training Step: 1369...  Training loss: 2.0567...  0.6181 sec/batch
Epoch: 7/20...  Training Step: 1370...  Training loss: 2.0782...  0.5930 sec/batch
Epoch: 7/20...  Training Step: 1371...  Training loss: 2.0710...  0.5891 sec/batch
Epoch: 7/20...  Training Step: 1372...  Training loss: 2.1000...  0.6025 sec/batch
Epoch: 7/20...  Training Step: 1373...  Training loss: 2.0768...  0.5916 sec/batch
Epoch: 7/20...  Training Step: 1374...  Training loss: 2.0857...  0.5937 sec/batch
Epoch: 7/20...  Training Step: 1375...  Training loss: 2.0802...  0.5916 sec/batch
Epoch: 7/20...  Training Step: 1376...  Training loss: 2.0631...  0.5892 sec/batch
Epoch: 7/20...  Training Step: 1377...  Training loss: 2.0702...  0.5770 sec/batch
Epoch: 7/20...  Training Step: 1378...  Training loss: 2.0817...  0.5813 sec/batch
Epoch: 7/20...  Training Step: 1379...  Training loss: 2.0899...  0.5968 sec/batch
Epoch: 7/20...  Training Step: 1380...  Training loss: 2.0566...  0.5989 sec/batch
Epoch: 7/20...  Training Step: 1381...  Training loss: 2.0850...  0.5967 sec/batch
Epoch: 7/20...  Training Step: 1382...  Training loss: 2.0745...  0.5843 sec/batch
Epoch: 7/20...  Training Step: 1383...  Training loss: 2.0599...  0.5915 sec/batch
Epoch: 7/20...  Training Step: 1384...  Training loss: 2.0885...  0.6012 sec/batch
Epoch: 7/20...  Training Step: 1385...  Training loss: 2.0782...  0.5883 sec/batch
Epoch: 7/20...  Training Step: 1386...  Training loss: 2.0603...  0.5977 sec/batch
Epoch: 8/20...  Training Step: 1387...  Training loss: 2.1507...  0.5860 sec/batch
Epoch: 8/20...  Training Step: 1388...  Training loss: 2.0639...  0.6086 sec/batch
Epoch: 8/20...  Training Step: 1389...  Training loss: 2.0642...  0.5946 sec/batch
Epoch: 8/20...  Training Step: 1390...  Training loss: 2.0629...  0.5859 sec/batch
Epoch: 8/20...  Training Step: 1391...  Training loss: 2.0672...  0.5874 sec/batch
Epoch: 8/20...  Training Step: 1392...  Training loss: 2.0434...  0.5852 sec/batch
Epoch: 8/20...  Training Step: 1393...  Training loss: 2.0803...  0.5928 sec/batch
Epoch: 8/20...  Training Step: 1394...  Training loss: 2.0631...  0.6070 sec/batch
Epoch: 8/20...  Training Step: 1395...  Training loss: 2.1007...  0.5967 sec/batch
Epoch: 8/20...  Training Step: 1396...  Training loss: 2.0636...  0.5867 sec/batch
Epoch: 8/20...  Training Step: 1397...  Training loss: 2.0503...  0.5924 sec/batch
Epoch: 8/20...  Training Step: 1398...  Training loss: 2.0748...  0.5968 sec/batch
Epoch: 8/20...  Training Step: 1399...  Training loss: 2.0796...  0.5869 sec/batch
Epoch: 8/20...  Training Step: 1400...  Training loss: 2.1161...  0.5847 sec/batch
Epoch: 8/20...  Training Step: 1401...  Training loss: 2.0816...  0.5737 sec/batch
Epoch: 8/20...  Training Step: 1402...  Training loss: 2.0679...  0.5991 sec/batch
Epoch: 8/20...  Training Step: 1403...  Training loss: 2.0759...  0.5785 sec/batch
Epoch: 8/20...  Training Step: 1404...  Training loss: 2.1175...  0.6007 sec/batch
Epoch: 8/20...  Training Step: 1405...  Training loss: 2.0806...  0.5829 sec/batch
Epoch: 8/20...  Training Step: 1406...  Training loss: 2.0772...  0.6108 sec/batch
Epoch: 8/20...  Training Step: 1407...  Training loss: 2.0621...  0.5973 sec/batch
Epoch: 8/20...  Training Step: 1408...  Training loss: 2.1164...  0.5937 sec/batch
Epoch: 8/20...  Training Step: 1409...  Training loss: 2.0913...  0.5842 sec/batch
Epoch: 8/20...  Training Step: 1410...  Training loss: 2.0863...  0.5937 sec/batch
Epoch: 8/20...  Training Step: 1411...  Training loss: 2.0947...  0.5877 sec/batch
Epoch: 8/20...  Training Step: 1412...  Training loss: 2.0786...  0.5761 sec/batch
Epoch: 8/20...  Training Step: 1413...  Training loss: 2.0827...  0.5893 sec/batch
Epoch: 8/20...  Training Step: 1414...  Training loss: 2.1088...  0.6019 sec/batch
Epoch: 8/20...  Training Step: 1415...  Training loss: 2.1222...  0.5879 sec/batch
Epoch: 8/20...  Training Step: 1416...  Training loss: 2.1066...  0.5959 sec/batch
Epoch: 8/20...  Training Step: 1417...  Training loss: 2.0941...  0.6198 sec/batch
Epoch: 8/20...  Training Step: 1418...  Training loss: 2.0656...  0.5948 sec/batch
Epoch: 8/20...  Training Step: 1419...  Training loss: 2.1018...  0.5869 sec/batch
Epoch: 8/20...  Training Step: 1420...  Training loss: 2.1113...  0.5812 sec/batch
Epoch: 8/20...  Training Step: 1421...  Training loss: 2.0780...  0.5893 sec/batch
Epoch: 8/20...  Training Step: 1422...  Training loss: 2.0906...  0.5970 sec/batch
Epoch: 8/20...  Training Step: 1423...  Training loss: 2.0741...  0.5854 sec/batch
Epoch: 8/20...  Training Step: 1424...  Training loss: 2.0440...  0.5923 sec/batch
Epoch: 8/20...  Training Step: 1425...  Training loss: 2.0634...  0.6062 sec/batch
Epoch: 8/20...  Training Step: 1426...  Training loss: 2.0658...  0.5910 sec/batch
Epoch: 8/20...  Training Step: 1427...  Training loss: 2.0642...  0.5997 sec/batch
Epoch: 8/20...  Training Step: 1428...  Training loss: 2.0807...  0.5981 sec/batch
Epoch: 8/20...  Training Step: 1429...  Training loss: 2.0545...  0.5919 sec/batch
Epoch: 8/20...  Training Step: 1430...  Training loss: 2.0586...  0.5902 sec/batch
Epoch: 8/20...  Training Step: 1431...  Training loss: 2.0730...  0.5961 sec/batch
Epoch: 8/20...  Training Step: 1432...  Training loss: 2.0371...  0.5858 sec/batch
Epoch: 8/20...  Training Step: 1433...  Training loss: 2.0902...  0.5918 sec/batch
Epoch: 8/20...  Training Step: 1434...  Training loss: 2.0715...  0.5961 sec/batch
Epoch: 8/20...  Training Step: 1435...  Training loss: 2.0721...  0.5867 sec/batch
Epoch: 8/20...  Training Step: 1436...  Training loss: 2.0965...  0.5924 sec/batch
Epoch: 8/20...  Training Step: 1437...  Training loss: 2.0205...  0.5898 sec/batch
Epoch: 8/20...  Training Step: 1438...  Training loss: 2.0928...  0.5920 sec/batch
Epoch: 8/20...  Training Step: 1439...  Training loss: 2.0566...  0.5848 sec/batch
Epoch: 8/20...  Training Step: 1440...  Training loss: 2.0538...  0.5873 sec/batch
Epoch: 8/20...  Training Step: 1441...  Training loss: 2.0375...  0.5804 sec/batch
Epoch: 8/20...  Training Step: 1442...  Training loss: 2.0651...  0.5787 sec/batch
Epoch: 8/20...  Training Step: 1443...  Training loss: 2.0740...  0.5834 sec/batch
Epoch: 8/20...  Training Step: 1444...  Training loss: 2.0532...  0.5949 sec/batch
Epoch: 8/20...  Training Step: 1445...  Training loss: 2.0389...  0.5910 sec/batch
Epoch: 8/20...  Training Step: 1446...  Training loss: 2.0866...  0.5877 sec/batch
Epoch: 8/20...  Training Step: 1447...  Training loss: 2.0559...  0.5891 sec/batch
Epoch: 8/20...  Training Step: 1448...  Training loss: 2.0896...  0.5992 sec/batch
Epoch: 8/20...  Training Step: 1449...  Training loss: 2.0892...  0.5926 sec/batch
Epoch: 8/20...  Training Step: 1450...  Training loss: 2.0609...  0.5836 sec/batch
Epoch: 8/20...  Training Step: 1451...  Training loss: 2.0499...  0.5998 sec/batch
Epoch: 8/20...  Training Step: 1452...  Training loss: 2.0817...  0.5935 sec/batch
Epoch: 8/20...  Training Step: 1453...  Training loss: 2.0721...  0.5880 sec/batch
Epoch: 8/20...  Training Step: 1454...  Training loss: 2.0335...  0.5862 sec/batch
Epoch: 8/20...  Training Step: 1455...  Training loss: 2.0405...  0.5815 sec/batch
Epoch: 8/20...  Training Step: 1456...  Training loss: 2.0550...  0.5935 sec/batch
Epoch: 8/20...  Training Step: 1457...  Training loss: 2.0888...  0.5923 sec/batch
Epoch: 8/20...  Training Step: 1458...  Training loss: 2.0709...  0.5923 sec/batch
Epoch: 8/20...  Training Step: 1459...  Training loss: 2.0718...  0.6039 sec/batch
Epoch: 8/20...  Training Step: 1460...  Training loss: 2.0470...  0.5895 sec/batch
Epoch: 8/20...  Training Step: 1461...  Training loss: 2.0602...  0.5882 sec/batch
Epoch: 8/20...  Training Step: 1462...  Training loss: 2.0787...  0.5892 sec/batch
Epoch: 8/20...  Training Step: 1463...  Training loss: 2.0564...  0.5953 sec/batch
Epoch: 8/20...  Training Step: 1464...  Training loss: 2.0608...  0.6068 sec/batch
Epoch: 8/20...  Training Step: 1465...  Training loss: 2.0359...  0.5898 sec/batch
Epoch: 8/20...  Training Step: 1466...  Training loss: 2.0440...  0.5824 sec/batch
Epoch: 8/20...  Training Step: 1467...  Training loss: 2.0202...  0.5946 sec/batch
Epoch: 8/20...  Training Step: 1468...  Training loss: 2.0795...  0.6249 sec/batch
Epoch: 8/20...  Training Step: 1469...  Training loss: 2.0240...  0.5904 sec/batch
Epoch: 8/20...  Training Step: 1470...  Training loss: 2.0458...  0.5930 sec/batch
Epoch: 8/20...  Training Step: 1471...  Training loss: 2.0172...  0.5848 sec/batch
Epoch: 8/20...  Training Step: 1472...  Training loss: 2.0477...  0.6006 sec/batch
Epoch: 8/20...  Training Step: 1473...  Training loss: 2.0478...  0.6067 sec/batch
Epoch: 8/20...  Training Step: 1474...  Training loss: 2.0253...  0.5957 sec/batch
Epoch: 8/20...  Training Step: 1475...  Training loss: 2.0249...  0.5836 sec/batch
Epoch: 8/20...  Training Step: 1476...  Training loss: 2.0566...  0.5990 sec/batch
Epoch: 8/20...  Training Step: 1477...  Training loss: 2.0295...  0.5947 sec/batch
Epoch: 8/20...  Training Step: 1478...  Training loss: 2.0440...  0.5945 sec/batch
Epoch: 8/20...  Training Step: 1479...  Training loss: 2.0336...  0.6006 sec/batch
Epoch: 8/20...  Training Step: 1480...  Training loss: 2.0264...  0.5875 sec/batch
Epoch: 8/20...  Training Step: 1481...  Training loss: 2.0126...  0.5925 sec/batch
Epoch: 8/20...  Training Step: 1482...  Training loss: 2.0523...  0.5847 sec/batch
Epoch: 8/20...  Training Step: 1483...  Training loss: 2.0458...  0.6046 sec/batch
Epoch: 8/20...  Training Step: 1484...  Training loss: 2.0404...  0.6000 sec/batch
Epoch: 8/20...  Training Step: 1485...  Training loss: 2.0294...  0.5867 sec/batch
Epoch: 8/20...  Training Step: 1486...  Training loss: 2.0099...  0.5959 sec/batch
Epoch: 8/20...  Training Step: 1487...  Training loss: 2.0592...  0.5830 sec/batch
Epoch: 8/20...  Training Step: 1488...  Training loss: 2.0541...  0.5930 sec/batch
Epoch: 8/20...  Training Step: 1489...  Training loss: 2.0321...  0.5632 sec/batch
Epoch: 8/20...  Training Step: 1490...  Training loss: 2.0388...  0.5927 sec/batch
Epoch: 8/20...  Training Step: 1491...  Training loss: 2.0470...  0.5983 sec/batch
Epoch: 8/20...  Training Step: 1492...  Training loss: 2.0499...  0.5920 sec/batch
Epoch: 8/20...  Training Step: 1493...  Training loss: 2.0447...  0.5847 sec/batch
Epoch: 8/20...  Training Step: 1494...  Training loss: 2.0576...  0.5763 sec/batch
Epoch: 8/20...  Training Step: 1495...  Training loss: 2.0656...  0.5829 sec/batch
Epoch: 8/20...  Training Step: 1496...  Training loss: 2.0461...  0.6012 sec/batch
Epoch: 8/20...  Training Step: 1497...  Training loss: 2.0486...  0.5968 sec/batch
Epoch: 8/20...  Training Step: 1498...  Training loss: 2.0392...  0.5895 sec/batch
Epoch: 8/20...  Training Step: 1499...  Training loss: 2.0468...  0.5798 sec/batch
Epoch: 8/20...  Training Step: 1500...  Training loss: 2.0350...  0.6042 sec/batch
Epoch: 8/20...  Training Step: 1501...  Training loss: 2.0270...  0.6066 sec/batch
Epoch: 8/20...  Training Step: 1502...  Training loss: 1.9928...  0.5948 sec/batch
Epoch: 8/20...  Training Step: 1503...  Training loss: 2.0478...  0.6001 sec/batch
Epoch: 8/20...  Training Step: 1504...  Training loss: 2.0429...  0.5907 sec/batch
Epoch: 8/20...  Training Step: 1505...  Training loss: 2.0453...  0.5927 sec/batch
Epoch: 8/20...  Training Step: 1506...  Training loss: 2.0484...  0.5964 sec/batch
Epoch: 8/20...  Training Step: 1507...  Training loss: 2.0495...  0.5961 sec/batch
Epoch: 8/20...  Training Step: 1508...  Training loss: 2.0272...  0.5850 sec/batch
Epoch: 8/20...  Training Step: 1509...  Training loss: 2.0377...  0.5941 sec/batch
Epoch: 8/20...  Training Step: 1510...  Training loss: 2.0614...  0.5963 sec/batch
Epoch: 8/20...  Training Step: 1511...  Training loss: 2.0441...  0.5926 sec/batch
Epoch: 8/20...  Training Step: 1512...  Training loss: 2.0037...  0.5965 sec/batch
Epoch: 8/20...  Training Step: 1513...  Training loss: 2.0496...  0.5911 sec/batch
Epoch: 8/20...  Training Step: 1514...  Training loss: 2.0567...  0.5989 sec/batch
Epoch: 8/20...  Training Step: 1515...  Training loss: 2.0457...  0.5927 sec/batch
Epoch: 8/20...  Training Step: 1516...  Training loss: 2.0380...  0.5967 sec/batch
Epoch: 8/20...  Training Step: 1517...  Training loss: 2.0302...  0.5851 sec/batch
Epoch: 8/20...  Training Step: 1518...  Training loss: 2.0229...  0.6169 sec/batch
Epoch: 8/20...  Training Step: 1519...  Training loss: 2.0552...  0.5991 sec/batch
Epoch: 8/20...  Training Step: 1520...  Training loss: 2.0477...  0.5972 sec/batch
Epoch: 8/20...  Training Step: 1521...  Training loss: 2.0359...  0.5874 sec/batch
Epoch: 8/20...  Training Step: 1522...  Training loss: 2.0540...  0.5995 sec/batch
Epoch: 8/20...  Training Step: 1523...  Training loss: 2.0591...  0.6069 sec/batch
Epoch: 8/20...  Training Step: 1524...  Training loss: 2.0594...  0.6026 sec/batch
Epoch: 8/20...  Training Step: 1525...  Training loss: 2.0678...  0.5916 sec/batch
Epoch: 8/20...  Training Step: 1526...  Training loss: 2.0293...  0.5960 sec/batch
Epoch: 8/20...  Training Step: 1527...  Training loss: 2.0807...  0.5956 sec/batch
Epoch: 8/20...  Training Step: 1528...  Training loss: 2.0495...  0.6003 sec/batch
Epoch: 8/20...  Training Step: 1529...  Training loss: 2.0471...  0.5992 sec/batch
Epoch: 8/20...  Training Step: 1530...  Training loss: 2.0341...  0.5947 sec/batch
Epoch: 8/20...  Training Step: 1531...  Training loss: 2.0395...  0.6065 sec/batch
Epoch: 8/20...  Training Step: 1532...  Training loss: 2.0508...  0.6064 sec/batch
Epoch: 8/20...  Training Step: 1533...  Training loss: 2.0505...  0.5910 sec/batch
Epoch: 8/20...  Training Step: 1534...  Training loss: 2.0690...  0.5996 sec/batch
Epoch: 8/20...  Training Step: 1535...  Training loss: 2.0385...  0.5824 sec/batch
Epoch: 8/20...  Training Step: 1536...  Training loss: 2.0352...  0.5845 sec/batch
Epoch: 8/20...  Training Step: 1537...  Training loss: 2.0367...  0.5967 sec/batch
Epoch: 8/20...  Training Step: 1538...  Training loss: 2.0759...  0.6544 sec/batch
Epoch: 8/20...  Training Step: 1539...  Training loss: 2.0439...  0.6584 sec/batch
Epoch: 8/20...  Training Step: 1540...  Training loss: 2.0447...  0.6446 sec/batch
Epoch: 8/20...  Training Step: 1541...  Training loss: 2.0331...  0.5963 sec/batch
Epoch: 8/20...  Training Step: 1542...  Training loss: 2.0396...  0.6088 sec/batch
Epoch: 8/20...  Training Step: 1543...  Training loss: 2.0422...  0.5841 sec/batch
Epoch: 8/20...  Training Step: 1544...  Training loss: 2.0409...  0.5896 sec/batch
Epoch: 8/20...  Training Step: 1545...  Training loss: 2.0059...  0.5927 sec/batch
Epoch: 8/20...  Training Step: 1546...  Training loss: 2.0690...  0.5957 sec/batch
Epoch: 8/20...  Training Step: 1547...  Training loss: 2.0609...  0.5879 sec/batch
Epoch: 8/20...  Training Step: 1548...  Training loss: 2.0252...  0.5899 sec/batch
Epoch: 8/20...  Training Step: 1549...  Training loss: 2.0493...  0.5659 sec/batch
Epoch: 8/20...  Training Step: 1550...  Training loss: 2.0352...  0.5998 sec/batch
Epoch: 8/20...  Training Step: 1551...  Training loss: 2.0366...  0.5955 sec/batch
Epoch: 8/20...  Training Step: 1552...  Training loss: 2.0305...  0.5636 sec/batch
Epoch: 8/20...  Training Step: 1553...  Training loss: 2.0423...  0.5895 sec/batch
Epoch: 8/20...  Training Step: 1554...  Training loss: 2.0737...  0.5894 sec/batch
Epoch: 8/20...  Training Step: 1555...  Training loss: 2.0408...  0.6042 sec/batch
Epoch: 8/20...  Training Step: 1556...  Training loss: 2.0287...  0.6065 sec/batch
Epoch: 8/20...  Training Step: 1557...  Training loss: 2.0350...  0.5900 sec/batch
Epoch: 8/20...  Training Step: 1558...  Training loss: 2.0245...  0.5788 sec/batch
Epoch: 8/20...  Training Step: 1559...  Training loss: 2.0711...  0.5945 sec/batch
Epoch: 8/20...  Training Step: 1560...  Training loss: 2.0581...  0.5889 sec/batch
Epoch: 8/20...  Training Step: 1561...  Training loss: 2.0644...  0.5972 sec/batch
Epoch: 8/20...  Training Step: 1562...  Training loss: 2.0359...  0.5961 sec/batch
Epoch: 8/20...  Training Step: 1563...  Training loss: 2.0230...  0.5933 sec/batch
Epoch: 8/20...  Training Step: 1564...  Training loss: 2.0401...  0.5785 sec/batch
Epoch: 8/20...  Training Step: 1565...  Training loss: 2.0291...  0.5748 sec/batch
Epoch: 8/20...  Training Step: 1566...  Training loss: 2.0150...  0.5967 sec/batch
Epoch: 8/20...  Training Step: 1567...  Training loss: 2.0215...  0.5889 sec/batch
Epoch: 8/20...  Training Step: 1568...  Training loss: 2.0403...  0.6497 sec/batch
Epoch: 8/20...  Training Step: 1569...  Training loss: 2.0345...  0.7303 sec/batch
Epoch: 8/20...  Training Step: 1570...  Training loss: 2.0691...  0.6031 sec/batch
Epoch: 8/20...  Training Step: 1571...  Training loss: 2.0465...  0.5985 sec/batch
Epoch: 8/20...  Training Step: 1572...  Training loss: 2.0308...  0.6148 sec/batch
Epoch: 8/20...  Training Step: 1573...  Training loss: 2.0261...  0.5960 sec/batch
Epoch: 8/20...  Training Step: 1574...  Training loss: 2.0138...  0.5940 sec/batch
Epoch: 8/20...  Training Step: 1575...  Training loss: 2.0215...  0.5913 sec/batch
Epoch: 8/20...  Training Step: 1576...  Training loss: 2.0271...  0.5944 sec/batch
Epoch: 8/20...  Training Step: 1577...  Training loss: 2.0481...  0.5946 sec/batch
Epoch: 8/20...  Training Step: 1578...  Training loss: 2.0148...  0.5953 sec/batch
Epoch: 8/20...  Training Step: 1579...  Training loss: 2.0434...  0.6063 sec/batch
Epoch: 8/20...  Training Step: 1580...  Training loss: 2.0285...  0.6123 sec/batch
Epoch: 8/20...  Training Step: 1581...  Training loss: 2.0136...  0.5922 sec/batch
Epoch: 8/20...  Training Step: 1582...  Training loss: 2.0478...  0.5939 sec/batch
Epoch: 8/20...  Training Step: 1583...  Training loss: 2.0304...  0.5999 sec/batch
Epoch: 8/20...  Training Step: 1584...  Training loss: 2.0140...  0.5870 sec/batch
Epoch: 9/20...  Training Step: 1585...  Training loss: 2.1035...  0.5960 sec/batch
Epoch: 9/20...  Training Step: 1586...  Training loss: 2.0134...  0.5959 sec/batch
Epoch: 9/20...  Training Step: 1587...  Training loss: 2.0076...  0.5726 sec/batch
Epoch: 9/20...  Training Step: 1588...  Training loss: 2.0214...  0.6018 sec/batch
Epoch: 9/20...  Training Step: 1589...  Training loss: 2.0280...  0.5872 sec/batch
Epoch: 9/20...  Training Step: 1590...  Training loss: 2.0090...  0.5953 sec/batch
Epoch: 9/20...  Training Step: 1591...  Training loss: 2.0381...  0.5898 sec/batch
Epoch: 9/20...  Training Step: 1592...  Training loss: 2.0325...  0.5952 sec/batch
Epoch: 9/20...  Training Step: 1593...  Training loss: 2.0700...  0.5995 sec/batch
Epoch: 9/20...  Training Step: 1594...  Training loss: 2.0197...  0.5894 sec/batch
Epoch: 9/20...  Training Step: 1595...  Training loss: 2.0090...  0.5821 sec/batch
Epoch: 9/20...  Training Step: 1596...  Training loss: 2.0097...  0.6003 sec/batch
Epoch: 9/20...  Training Step: 1597...  Training loss: 2.0290...  0.5967 sec/batch
Epoch: 9/20...  Training Step: 1598...  Training loss: 2.0570...  0.5973 sec/batch
Epoch: 9/20...  Training Step: 1599...  Training loss: 2.0243...  0.5972 sec/batch
Epoch: 9/20...  Training Step: 1600...  Training loss: 2.0125...  0.5875 sec/batch
Epoch: 9/20...  Training Step: 1601...  Training loss: 2.0266...  0.5658 sec/batch
Epoch: 9/20...  Training Step: 1602...  Training loss: 2.0731...  0.6017 sec/batch
Epoch: 9/20...  Training Step: 1603...  Training loss: 2.0170...  0.5896 sec/batch
Epoch: 9/20...  Training Step: 1604...  Training loss: 2.0186...  0.5691 sec/batch
Epoch: 9/20...  Training Step: 1605...  Training loss: 2.0143...  0.5977 sec/batch
Epoch: 9/20...  Training Step: 1606...  Training loss: 2.0636...  0.5945 sec/batch
Epoch: 9/20...  Training Step: 1607...  Training loss: 2.0343...  0.6047 sec/batch
Epoch: 9/20...  Training Step: 1608...  Training loss: 2.0203...  0.5980 sec/batch
Epoch: 9/20...  Training Step: 1609...  Training loss: 2.0253...  0.5963 sec/batch
Epoch: 9/20...  Training Step: 1610...  Training loss: 2.0156...  0.5798 sec/batch
Epoch: 9/20...  Training Step: 1611...  Training loss: 2.0218...  0.5947 sec/batch
Epoch: 9/20...  Training Step: 1612...  Training loss: 2.0379...  0.6047 sec/batch
Epoch: 9/20...  Training Step: 1613...  Training loss: 2.0647...  0.5898 sec/batch
Epoch: 9/20...  Training Step: 1614...  Training loss: 2.0447...  0.6073 sec/batch
Epoch: 9/20...  Training Step: 1615...  Training loss: 2.0312...  0.5883 sec/batch
Epoch: 9/20...  Training Step: 1616...  Training loss: 2.0029...  0.5922 sec/batch
Epoch: 9/20...  Training Step: 1617...  Training loss: 2.0344...  0.6021 sec/batch
Epoch: 9/20...  Training Step: 1618...  Training loss: 2.0575...  0.6233 sec/batch
Epoch: 9/20...  Training Step: 1619...  Training loss: 2.0116...  0.6051 sec/batch
Epoch: 9/20...  Training Step: 1620...  Training loss: 2.0345...  0.5793 sec/batch
Epoch: 9/20...  Training Step: 1621...  Training loss: 2.0265...  0.5890 sec/batch
Epoch: 9/20...  Training Step: 1622...  Training loss: 1.9909...  0.5939 sec/batch
Epoch: 9/20...  Training Step: 1623...  Training loss: 1.9918...  0.5995 sec/batch
Epoch: 9/20...  Training Step: 1624...  Training loss: 1.9966...  0.6030 sec/batch
Epoch: 9/20...  Training Step: 1625...  Training loss: 2.0084...  0.5837 sec/batch
Epoch: 9/20...  Training Step: 1626...  Training loss: 2.0210...  0.5955 sec/batch
Epoch: 9/20...  Training Step: 1627...  Training loss: 1.9957...  0.5959 sec/batch
Epoch: 9/20...  Training Step: 1628...  Training loss: 1.9933...  0.5959 sec/batch
Epoch: 9/20...  Training Step: 1629...  Training loss: 2.0272...  0.5997 sec/batch
Epoch: 9/20...  Training Step: 1630...  Training loss: 1.9654...  0.5755 sec/batch
Epoch: 9/20...  Training Step: 1631...  Training loss: 2.0237...  0.5958 sec/batch
Epoch: 9/20...  Training Step: 1632...  Training loss: 2.0055...  0.5975 sec/batch
Epoch: 9/20...  Training Step: 1633...  Training loss: 2.0061...  0.5910 sec/batch
Epoch: 9/20...  Training Step: 1634...  Training loss: 2.0437...  0.5825 sec/batch
Epoch: 9/20...  Training Step: 1635...  Training loss: 1.9997...  0.5908 sec/batch
Epoch: 9/20...  Training Step: 1636...  Training loss: 2.0633...  0.5946 sec/batch
Epoch: 9/20...  Training Step: 1637...  Training loss: 2.0141...  0.5911 sec/batch
Epoch: 9/20...  Training Step: 1638...  Training loss: 2.0036...  0.5989 sec/batch
Epoch: 9/20...  Training Step: 1639...  Training loss: 1.9975...  0.5881 sec/batch
Epoch: 9/20...  Training Step: 1640...  Training loss: 2.0328...  0.6747 sec/batch
Epoch: 9/20...  Training Step: 1641...  Training loss: 2.0305...  0.6131 sec/batch
Epoch: 9/20...  Training Step: 1642...  Training loss: 2.0212...  0.5868 sec/batch
Epoch: 9/20...  Training Step: 1643...  Training loss: 2.0087...  0.5868 sec/batch
Epoch: 9/20...  Training Step: 1644...  Training loss: 2.0462...  0.6056 sec/batch
Epoch: 9/20...  Training Step: 1645...  Training loss: 2.0221...  0.5985 sec/batch
Epoch: 9/20...  Training Step: 1646...  Training loss: 2.0602...  0.5977 sec/batch
Epoch: 9/20...  Training Step: 1647...  Training loss: 2.0581...  0.5954 sec/batch
Epoch: 9/20...  Training Step: 1648...  Training loss: 2.0308...  0.5947 sec/batch
Epoch: 9/20...  Training Step: 1649...  Training loss: 2.0117...  0.5988 sec/batch
Epoch: 9/20...  Training Step: 1650...  Training loss: 2.0440...  0.5886 sec/batch
Epoch: 9/20...  Training Step: 1651...  Training loss: 2.0313...  0.5892 sec/batch
Epoch: 9/20...  Training Step: 1652...  Training loss: 2.0015...  0.5930 sec/batch
Epoch: 9/20...  Training Step: 1653...  Training loss: 2.0093...  0.5895 sec/batch
Epoch: 9/20...  Training Step: 1654...  Training loss: 2.0091...  0.5902 sec/batch
Epoch: 9/20...  Training Step: 1655...  Training loss: 2.0421...  0.5881 sec/batch
Epoch: 9/20...  Training Step: 1656...  Training loss: 2.0447...  0.5960 sec/batch
Epoch: 9/20...  Training Step: 1657...  Training loss: 2.0426...  0.5767 sec/batch
Epoch: 9/20...  Training Step: 1658...  Training loss: 2.0058...  0.6028 sec/batch
Epoch: 9/20...  Training Step: 1659...  Training loss: 2.0184...  0.6096 sec/batch
Epoch: 9/20...  Training Step: 1660...  Training loss: 2.0459...  0.5835 sec/batch
Epoch: 9/20...  Training Step: 1661...  Training loss: 2.0168...  0.5883 sec/batch
Epoch: 9/20...  Training Step: 1662...  Training loss: 2.0184...  0.6011 sec/batch
Epoch: 9/20...  Training Step: 1663...  Training loss: 1.9873...  0.5915 sec/batch
Epoch: 9/20...  Training Step: 1664...  Training loss: 2.0083...  0.5923 sec/batch
Epoch: 9/20...  Training Step: 1665...  Training loss: 1.9898...  0.5923 sec/batch
Epoch: 9/20...  Training Step: 1666...  Training loss: 2.0285...  0.6015 sec/batch
Epoch: 9/20...  Training Step: 1667...  Training loss: 1.9857...  0.6199 sec/batch
Epoch: 9/20...  Training Step: 1668...  Training loss: 1.9997...  0.6057 sec/batch
Epoch: 9/20...  Training Step: 1669...  Training loss: 1.9721...  0.5816 sec/batch
Epoch: 9/20...  Training Step: 1670...  Training loss: 2.0098...  0.5958 sec/batch
Epoch: 9/20...  Training Step: 1671...  Training loss: 2.0077...  0.5908 sec/batch
Epoch: 9/20...  Training Step: 1672...  Training loss: 1.9951...  0.5969 sec/batch
Epoch: 9/20...  Training Step: 1673...  Training loss: 1.9795...  0.5983 sec/batch
Epoch: 9/20...  Training Step: 1674...  Training loss: 2.0296...  0.5996 sec/batch
Epoch: 9/20...  Training Step: 1675...  Training loss: 1.9865...  0.5932 sec/batch
Epoch: 9/20...  Training Step: 1676...  Training loss: 2.0049...  0.5966 sec/batch
Epoch: 9/20...  Training Step: 1677...  Training loss: 1.9828...  0.5817 sec/batch
Epoch: 9/20...  Training Step: 1678...  Training loss: 1.9767...  0.5999 sec/batch
Epoch: 9/20...  Training Step: 1679...  Training loss: 1.9877...  0.5926 sec/batch
Epoch: 9/20...  Training Step: 1680...  Training loss: 2.0134...  0.5855 sec/batch
Epoch: 9/20...  Training Step: 1681...  Training loss: 2.0101...  0.6033 sec/batch
Epoch: 9/20...  Training Step: 1682...  Training loss: 1.9865...  0.5971 sec/batch
Epoch: 9/20...  Training Step: 1683...  Training loss: 1.9891...  0.5917 sec/batch
Epoch: 9/20...  Training Step: 1684...  Training loss: 1.9827...  0.5880 sec/batch
Epoch: 9/20...  Training Step: 1685...  Training loss: 2.0226...  0.6009 sec/batch
Epoch: 9/20...  Training Step: 1686...  Training loss: 2.0059...  0.6009 sec/batch
Epoch: 9/20...  Training Step: 1687...  Training loss: 1.9917...  0.5921 sec/batch
Epoch: 9/20...  Training Step: 1688...  Training loss: 1.9908...  0.5962 sec/batch
Epoch: 9/20...  Training Step: 1689...  Training loss: 1.9947...  0.5792 sec/batch
Epoch: 9/20...  Training Step: 1690...  Training loss: 2.0027...  0.6023 sec/batch
Epoch: 9/20...  Training Step: 1691...  Training loss: 2.0178...  0.6061 sec/batch
Epoch: 9/20...  Training Step: 1692...  Training loss: 2.0241...  0.5972 sec/batch
Epoch: 9/20...  Training Step: 1693...  Training loss: 2.0230...  0.5983 sec/batch
Epoch: 9/20...  Training Step: 1694...  Training loss: 2.0078...  0.5887 sec/batch
Epoch: 9/20...  Training Step: 1695...  Training loss: 2.0168...  0.5992 sec/batch
Epoch: 9/20...  Training Step: 1696...  Training loss: 2.0210...  0.5945 sec/batch
Epoch: 9/20...  Training Step: 1697...  Training loss: 2.0138...  0.5929 sec/batch
Epoch: 9/20...  Training Step: 1698...  Training loss: 2.0080...  0.6057 sec/batch
Epoch: 9/20...  Training Step: 1699...  Training loss: 2.0022...  0.5982 sec/batch
Epoch: 9/20...  Training Step: 1700...  Training loss: 1.9762...  0.5968 sec/batch
Epoch: 9/20...  Training Step: 1701...  Training loss: 2.0101...  0.5970 sec/batch
Epoch: 9/20...  Training Step: 1702...  Training loss: 2.0076...  0.5898 sec/batch
Epoch: 9/20...  Training Step: 1703...  Training loss: 2.0276...  0.5970 sec/batch
Epoch: 9/20...  Training Step: 1704...  Training loss: 2.0091...  0.5887 sec/batch
Epoch: 9/20...  Training Step: 1705...  Training loss: 2.0308...  0.6467 sec/batch
Epoch: 9/20...  Training Step: 1706...  Training loss: 1.9883...  0.6032 sec/batch
Epoch: 9/20...  Training Step: 1707...  Training loss: 1.9918...  0.5907 sec/batch
Epoch: 9/20...  Training Step: 1708...  Training loss: 2.0386...  0.6000 sec/batch
Epoch: 9/20...  Training Step: 1709...  Training loss: 2.0118...  0.5852 sec/batch
Epoch: 9/20...  Training Step: 1710...  Training loss: 1.9694...  0.5862 sec/batch
Epoch: 9/20...  Training Step: 1711...  Training loss: 2.0222...  0.5863 sec/batch
Epoch: 9/20...  Training Step: 1712...  Training loss: 2.0248...  0.5885 sec/batch
Epoch: 9/20...  Training Step: 1713...  Training loss: 2.0064...  0.6095 sec/batch
Epoch: 9/20...  Training Step: 1714...  Training loss: 2.0044...  0.5982 sec/batch
Epoch: 9/20...  Training Step: 1715...  Training loss: 1.9981...  0.6078 sec/batch
Epoch: 9/20...  Training Step: 1716...  Training loss: 1.9809...  0.5835 sec/batch
Epoch: 9/20...  Training Step: 1717...  Training loss: 2.0091...  0.5870 sec/batch
Epoch: 9/20...  Training Step: 1718...  Training loss: 2.0173...  0.6139 sec/batch
Epoch: 9/20...  Training Step: 1719...  Training loss: 2.0084...  0.6015 sec/batch
Epoch: 9/20...  Training Step: 1720...  Training loss: 2.0089...  0.5936 sec/batch
Epoch: 9/20...  Training Step: 1721...  Training loss: 2.0254...  0.6005 sec/batch
Epoch: 9/20...  Training Step: 1722...  Training loss: 2.0086...  0.5940 sec/batch
Epoch: 9/20...  Training Step: 1723...  Training loss: 2.0392...  0.5869 sec/batch
Epoch: 9/20...  Training Step: 1724...  Training loss: 1.9961...  0.5925 sec/batch
Epoch: 9/20...  Training Step: 1725...  Training loss: 2.0413...  0.6069 sec/batch
Epoch: 9/20...  Training Step: 1726...  Training loss: 1.9968...  0.6045 sec/batch
Epoch: 9/20...  Training Step: 1727...  Training loss: 2.0189...  0.5917 sec/batch
Epoch: 9/20...  Training Step: 1728...  Training loss: 2.0031...  0.6016 sec/batch
Epoch: 9/20...  Training Step: 1729...  Training loss: 2.0000...  0.6022 sec/batch
Epoch: 9/20...  Training Step: 1730...  Training loss: 2.0147...  0.5994 sec/batch
Epoch: 9/20...  Training Step: 1731...  Training loss: 2.0259...  0.5996 sec/batch
Epoch: 9/20...  Training Step: 1732...  Training loss: 2.0318...  0.5926 sec/batch
Epoch: 9/20...  Training Step: 1733...  Training loss: 2.0159...  0.5911 sec/batch
Epoch: 9/20...  Training Step: 1734...  Training loss: 1.9988...  0.5882 sec/batch
Epoch: 9/20...  Training Step: 1735...  Training loss: 1.9870...  0.5921 sec/batch
Epoch: 9/20...  Training Step: 1736...  Training loss: 2.0393...  0.5916 sec/batch
Epoch: 9/20...  Training Step: 1737...  Training loss: 2.0104...  0.6006 sec/batch
Epoch: 9/20...  Training Step: 1738...  Training loss: 2.0204...  0.5919 sec/batch
Epoch: 9/20...  Training Step: 1739...  Training loss: 2.0034...  0.5981 sec/batch
Epoch: 9/20...  Training Step: 1740...  Training loss: 2.0044...  0.6060 sec/batch
Epoch: 9/20...  Training Step: 1741...  Training loss: 2.0100...  0.5945 sec/batch
Epoch: 9/20...  Training Step: 1742...  Training loss: 2.0000...  0.5992 sec/batch
Epoch: 9/20...  Training Step: 1743...  Training loss: 1.9667...  0.6015 sec/batch
Epoch: 9/20...  Training Step: 1744...  Training loss: 2.0324...  0.5897 sec/batch
Epoch: 9/20...  Training Step: 1745...  Training loss: 2.0322...  0.5961 sec/batch
Epoch: 9/20...  Training Step: 1746...  Training loss: 1.9967...  0.6143 sec/batch
Epoch: 9/20...  Training Step: 1747...  Training loss: 2.0241...  0.5934 sec/batch
Epoch: 9/20...  Training Step: 1748...  Training loss: 2.0118...  0.6053 sec/batch
Epoch: 9/20...  Training Step: 1749...  Training loss: 2.0159...  0.6169 sec/batch
Epoch: 9/20...  Training Step: 1750...  Training loss: 2.0032...  0.6004 sec/batch
Epoch: 9/20...  Training Step: 1751...  Training loss: 2.0081...  0.5923 sec/batch
Epoch: 9/20...  Training Step: 1752...  Training loss: 2.0434...  0.6042 sec/batch
Epoch: 9/20...  Training Step: 1753...  Training loss: 2.0068...  0.5986 sec/batch
Epoch: 9/20...  Training Step: 1754...  Training loss: 1.9915...  0.5903 sec/batch
Epoch: 9/20...  Training Step: 1755...  Training loss: 1.9877...  0.6020 sec/batch
Epoch: 9/20...  Training Step: 1756...  Training loss: 1.9898...  0.6007 sec/batch
Epoch: 9/20...  Training Step: 1757...  Training loss: 2.0201...  0.5915 sec/batch
Epoch: 9/20...  Training Step: 1758...  Training loss: 2.0181...  0.5818 sec/batch
Epoch: 9/20...  Training Step: 1759...  Training loss: 2.0256...  0.5964 sec/batch
Epoch: 9/20...  Training Step: 1760...  Training loss: 2.0028...  0.5908 sec/batch
Epoch: 9/20...  Training Step: 1761...  Training loss: 1.9914...  0.5962 sec/batch
Epoch: 9/20...  Training Step: 1762...  Training loss: 2.0173...  0.5857 sec/batch
Epoch: 9/20...  Training Step: 1763...  Training loss: 1.9907...  0.5982 sec/batch
Epoch: 9/20...  Training Step: 1764...  Training loss: 1.9665...  0.5995 sec/batch
Epoch: 9/20...  Training Step: 1765...  Training loss: 1.9863...  0.6033 sec/batch
Epoch: 9/20...  Training Step: 1766...  Training loss: 1.9961...  0.5968 sec/batch
Epoch: 9/20...  Training Step: 1767...  Training loss: 2.0019...  0.5888 sec/batch
Epoch: 9/20...  Training Step: 1768...  Training loss: 2.0270...  0.6167 sec/batch
Epoch: 9/20...  Training Step: 1769...  Training loss: 2.0048...  0.6055 sec/batch
Epoch: 9/20...  Training Step: 1770...  Training loss: 1.9903...  0.5945 sec/batch
Epoch: 9/20...  Training Step: 1771...  Training loss: 1.9984...  0.5911 sec/batch
Epoch: 9/20...  Training Step: 1772...  Training loss: 1.9813...  0.5898 sec/batch
Epoch: 9/20...  Training Step: 1773...  Training loss: 2.0033...  0.5955 sec/batch
Epoch: 9/20...  Training Step: 1774...  Training loss: 2.0043...  0.5976 sec/batch
Epoch: 9/20...  Training Step: 1775...  Training loss: 2.0073...  0.5950 sec/batch
Epoch: 9/20...  Training Step: 1776...  Training loss: 1.9840...  0.5982 sec/batch
Epoch: 9/20...  Training Step: 1777...  Training loss: 2.0060...  0.5961 sec/batch
Epoch: 9/20...  Training Step: 1778...  Training loss: 1.9856...  0.5979 sec/batch
Epoch: 9/20...  Training Step: 1779...  Training loss: 1.9748...  0.6057 sec/batch
Epoch: 9/20...  Training Step: 1780...  Training loss: 2.0045...  0.5965 sec/batch
Epoch: 9/20...  Training Step: 1781...  Training loss: 1.9965...  0.5957 sec/batch
Epoch: 9/20...  Training Step: 1782...  Training loss: 1.9879...  0.5699 sec/batch
Epoch: 10/20...  Training Step: 1783...  Training loss: 2.0687...  0.5997 sec/batch
Epoch: 10/20...  Training Step: 1784...  Training loss: 1.9816...  0.5883 sec/batch
Epoch: 10/20...  Training Step: 1785...  Training loss: 1.9946...  0.5999 sec/batch
Epoch: 10/20...  Training Step: 1786...  Training loss: 1.9968...  0.5913 sec/batch
Epoch: 10/20...  Training Step: 1787...  Training loss: 1.9917...  0.5953 sec/batch
Epoch: 10/20...  Training Step: 1788...  Training loss: 1.9644...  0.5964 sec/batch
Epoch: 10/20...  Training Step: 1789...  Training loss: 2.0053...  0.5920 sec/batch
Epoch: 10/20...  Training Step: 1790...  Training loss: 1.9912...  0.6033 sec/batch
Epoch: 10/20...  Training Step: 1791...  Training loss: 2.0330...  0.5877 sec/batch
Epoch: 10/20...  Training Step: 1792...  Training loss: 1.9875...  0.5964 sec/batch
Epoch: 10/20...  Training Step: 1793...  Training loss: 1.9805...  0.5897 sec/batch
Epoch: 10/20...  Training Step: 1794...  Training loss: 1.9688...  0.6006 sec/batch
Epoch: 10/20...  Training Step: 1795...  Training loss: 1.9977...  0.5918 sec/batch
Epoch: 10/20...  Training Step: 1796...  Training loss: 2.0308...  0.5956 sec/batch
Epoch: 10/20...  Training Step: 1797...  Training loss: 1.9885...  0.5945 sec/batch
Epoch: 10/20...  Training Step: 1798...  Training loss: 1.9801...  0.5991 sec/batch
Epoch: 10/20...  Training Step: 1799...  Training loss: 1.9971...  0.6020 sec/batch
Epoch: 10/20...  Training Step: 1800...  Training loss: 2.0391...  0.5941 sec/batch
Epoch: 10/20...  Training Step: 1801...  Training loss: 1.9939...  0.5735 sec/batch
Epoch: 10/20...  Training Step: 1802...  Training loss: 1.9975...  0.5776 sec/batch
Epoch: 10/20...  Training Step: 1803...  Training loss: 1.9792...  0.5940 sec/batch
Epoch: 10/20...  Training Step: 1804...  Training loss: 2.0247...  0.6001 sec/batch
Epoch: 10/20...  Training Step: 1805...  Training loss: 1.9914...  0.5841 sec/batch
Epoch: 10/20...  Training Step: 1806...  Training loss: 1.9822...  0.6055 sec/batch
Epoch: 10/20...  Training Step: 1807...  Training loss: 1.9916...  0.5850 sec/batch
Epoch: 10/20...  Training Step: 1808...  Training loss: 1.9810...  0.6055 sec/batch
Epoch: 10/20...  Training Step: 1809...  Training loss: 1.9761...  0.5997 sec/batch
Epoch: 10/20...  Training Step: 1810...  Training loss: 2.0080...  0.5921 sec/batch
Epoch: 10/20...  Training Step: 1811...  Training loss: 2.0291...  0.5954 sec/batch
Epoch: 10/20...  Training Step: 1812...  Training loss: 2.0154...  0.5887 sec/batch
Epoch: 10/20...  Training Step: 1813...  Training loss: 1.9829...  0.5943 sec/batch
Epoch: 10/20...  Training Step: 1814...  Training loss: 1.9785...  0.6064 sec/batch
Epoch: 10/20...  Training Step: 1815...  Training loss: 2.0112...  0.6002 sec/batch
Epoch: 10/20...  Training Step: 1816...  Training loss: 2.0127...  0.5974 sec/batch
Epoch: 10/20...  Training Step: 1817...  Training loss: 1.9805...  0.5976 sec/batch
Epoch: 10/20...  Training Step: 1818...  Training loss: 1.9960...  0.6285 sec/batch
Epoch: 10/20...  Training Step: 1819...  Training loss: 1.9838...  0.5966 sec/batch
Epoch: 10/20...  Training Step: 1820...  Training loss: 1.9489...  0.5912 sec/batch
Epoch: 10/20...  Training Step: 1821...  Training loss: 1.9620...  0.5962 sec/batch
Epoch: 10/20...  Training Step: 1822...  Training loss: 1.9601...  0.5912 sec/batch
Epoch: 10/20...  Training Step: 1823...  Training loss: 1.9741...  0.5955 sec/batch
Epoch: 10/20...  Training Step: 1824...  Training loss: 2.0050...  0.5974 sec/batch
Epoch: 10/20...  Training Step: 1825...  Training loss: 1.9853...  0.5947 sec/batch
Epoch: 10/20...  Training Step: 1826...  Training loss: 1.9614...  0.5924 sec/batch
Epoch: 10/20...  Training Step: 1827...  Training loss: 2.0007...  0.5952 sec/batch
Epoch: 10/20...  Training Step: 1828...  Training loss: 1.9353...  0.5939 sec/batch
Epoch: 10/20...  Training Step: 1829...  Training loss: 2.0045...  0.6103 sec/batch
Epoch: 10/20...  Training Step: 1830...  Training loss: 1.9670...  0.5884 sec/batch
Epoch: 10/20...  Training Step: 1831...  Training loss: 1.9694...  0.6233 sec/batch
Epoch: 10/20...  Training Step: 1832...  Training loss: 2.0248...  0.5662 sec/batch
Epoch: 10/20...  Training Step: 1833...  Training loss: 1.9632...  0.5976 sec/batch
Epoch: 10/20...  Training Step: 1834...  Training loss: 2.0336...  0.5943 sec/batch
Epoch: 10/20...  Training Step: 1835...  Training loss: 1.9866...  0.6140 sec/batch
Epoch: 10/20...  Training Step: 1836...  Training loss: 1.9753...  0.6108 sec/batch
Epoch: 10/20...  Training Step: 1837...  Training loss: 1.9749...  0.5996 sec/batch
Epoch: 10/20...  Training Step: 1838...  Training loss: 1.9972...  0.6041 sec/batch
Epoch: 10/20...  Training Step: 1839...  Training loss: 2.0017...  0.6080 sec/batch
Epoch: 10/20...  Training Step: 1840...  Training loss: 1.9776...  0.5980 sec/batch
Epoch: 10/20...  Training Step: 1841...  Training loss: 1.9753...  0.5931 sec/batch
Epoch: 10/20...  Training Step: 1842...  Training loss: 2.0111...  0.5990 sec/batch
Epoch: 10/20...  Training Step: 1843...  Training loss: 1.9836...  0.5967 sec/batch
Epoch: 10/20...  Training Step: 1844...  Training loss: 2.0393...  0.5989 sec/batch
Epoch: 10/20...  Training Step: 1845...  Training loss: 2.0230...  0.5941 sec/batch
Epoch: 10/20...  Training Step: 1846...  Training loss: 2.0004...  0.5956 sec/batch
Epoch: 10/20...  Training Step: 1847...  Training loss: 1.9828...  0.5986 sec/batch
Epoch: 10/20...  Training Step: 1848...  Training loss: 2.0051...  0.5905 sec/batch
Epoch: 10/20...  Training Step: 1849...  Training loss: 2.0089...  0.5864 sec/batch
Epoch: 10/20...  Training Step: 1850...  Training loss: 1.9615...  0.5993 sec/batch
Epoch: 10/20...  Training Step: 1851...  Training loss: 1.9786...  0.5941 sec/batch
Epoch: 10/20...  Training Step: 1852...  Training loss: 1.9734...  0.5937 sec/batch
Epoch: 10/20...  Training Step: 1853...  Training loss: 2.0207...  0.5889 sec/batch
Epoch: 10/20...  Training Step: 1854...  Training loss: 2.0136...  0.5880 sec/batch
Epoch: 10/20...  Training Step: 1855...  Training loss: 2.0060...  0.5947 sec/batch
Epoch: 10/20...  Training Step: 1856...  Training loss: 1.9793...  0.5975 sec/batch
Epoch: 10/20...  Training Step: 1857...  Training loss: 1.9919...  0.6012 sec/batch
Epoch: 10/20...  Training Step: 1858...  Training loss: 2.0070...  0.6063 sec/batch
Epoch: 10/20...  Training Step: 1859...  Training loss: 1.9899...  0.6036 sec/batch
Epoch: 10/20...  Training Step: 1860...  Training loss: 2.0006...  0.5897 sec/batch
Epoch: 10/20...  Training Step: 1861...  Training loss: 1.9629...  0.5925 sec/batch
Epoch: 10/20...  Training Step: 1862...  Training loss: 1.9723...  0.6067 sec/batch
Epoch: 10/20...  Training Step: 1863...  Training loss: 1.9600...  0.5912 sec/batch
Epoch: 10/20...  Training Step: 1864...  Training loss: 1.9989...  0.5851 sec/batch
Epoch: 10/20...  Training Step: 1865...  Training loss: 1.9605...  0.5987 sec/batch
Epoch: 10/20...  Training Step: 1866...  Training loss: 1.9733...  0.5997 sec/batch
Epoch: 10/20...  Training Step: 1867...  Training loss: 1.9455...  0.6003 sec/batch
Epoch: 10/20...  Training Step: 1868...  Training loss: 1.9703...  0.6089 sec/batch
Epoch: 10/20...  Training Step: 1869...  Training loss: 1.9821...  0.6011 sec/batch
Epoch: 10/20...  Training Step: 1870...  Training loss: 1.9697...  0.5898 sec/batch
Epoch: 10/20...  Training Step: 1871...  Training loss: 1.9511...  0.5859 sec/batch
Epoch: 10/20...  Training Step: 1872...  Training loss: 1.9905...  0.5921 sec/batch
Epoch: 10/20...  Training Step: 1873...  Training loss: 1.9735...  0.5995 sec/batch
Epoch: 10/20...  Training Step: 1874...  Training loss: 1.9741...  0.5959 sec/batch
Epoch: 10/20...  Training Step: 1875...  Training loss: 1.9490...  0.6014 sec/batch
Epoch: 10/20...  Training Step: 1876...  Training loss: 1.9600...  0.6013 sec/batch
Epoch: 10/20...  Training Step: 1877...  Training loss: 1.9598...  0.5979 sec/batch
Epoch: 10/20...  Training Step: 1878...  Training loss: 1.9920...  0.6031 sec/batch
Epoch: 10/20...  Training Step: 1879...  Training loss: 1.9756...  0.5970 sec/batch
Epoch: 10/20...  Training Step: 1880...  Training loss: 1.9573...  0.5865 sec/batch
Epoch: 10/20...  Training Step: 1881...  Training loss: 1.9554...  0.6050 sec/batch
Epoch: 10/20...  Training Step: 1882...  Training loss: 1.9507...  0.5967 sec/batch
Epoch: 10/20...  Training Step: 1883...  Training loss: 1.9818...  0.5738 sec/batch
Epoch: 10/20...  Training Step: 1884...  Training loss: 1.9809...  0.6025 sec/batch
Epoch: 10/20...  Training Step: 1885...  Training loss: 1.9495...  0.5803 sec/batch
Epoch: 10/20...  Training Step: 1886...  Training loss: 1.9641...  0.6005 sec/batch
Epoch: 10/20...  Training Step: 1887...  Training loss: 1.9657...  0.5917 sec/batch
Epoch: 10/20...  Training Step: 1888...  Training loss: 1.9743...  0.6013 sec/batch
Epoch: 10/20...  Training Step: 1889...  Training loss: 1.9775...  0.5769 sec/batch
Epoch: 10/20...  Training Step: 1890...  Training loss: 1.9852...  0.5860 sec/batch
Epoch: 10/20...  Training Step: 1891...  Training loss: 1.9847...  0.5833 sec/batch
Epoch: 10/20...  Training Step: 1892...  Training loss: 1.9807...  0.5932 sec/batch
Epoch: 10/20...  Training Step: 1893...  Training loss: 1.9971...  0.5969 sec/batch
Epoch: 10/20...  Training Step: 1894...  Training loss: 1.9650...  0.6017 sec/batch
Epoch: 10/20...  Training Step: 1895...  Training loss: 1.9802...  0.5928 sec/batch
Epoch: 10/20...  Training Step: 1896...  Training loss: 1.9739...  0.5894 sec/batch
Epoch: 10/20...  Training Step: 1897...  Training loss: 1.9546...  0.5894 sec/batch
Epoch: 10/20...  Training Step: 1898...  Training loss: 1.9355...  0.6075 sec/batch
Epoch: 10/20...  Training Step: 1899...  Training loss: 1.9810...  0.5943 sec/batch
Epoch: 10/20...  Training Step: 1900...  Training loss: 1.9699...  0.6065 sec/batch
Epoch: 10/20...  Training Step: 1901...  Training loss: 1.9801...  0.5897 sec/batch
Epoch: 10/20...  Training Step: 1902...  Training loss: 1.9680...  0.5967 sec/batch
Epoch: 10/20...  Training Step: 1903...  Training loss: 1.9919...  0.6054 sec/batch
Epoch: 10/20...  Training Step: 1904...  Training loss: 1.9579...  0.5948 sec/batch
Epoch: 10/20...  Training Step: 1905...  Training loss: 1.9521...  0.5918 sec/batch
Epoch: 10/20...  Training Step: 1906...  Training loss: 2.0083...  0.5948 sec/batch
Epoch: 10/20...  Training Step: 1907...  Training loss: 1.9723...  0.6044 sec/batch
Epoch: 10/20...  Training Step: 1908...  Training loss: 1.9390...  0.5978 sec/batch
Epoch: 10/20...  Training Step: 1909...  Training loss: 1.9877...  0.5695 sec/batch
Epoch: 10/20...  Training Step: 1910...  Training loss: 1.9929...  0.6004 sec/batch
Epoch: 10/20...  Training Step: 1911...  Training loss: 1.9865...  0.5932 sec/batch
Epoch: 10/20...  Training Step: 1912...  Training loss: 1.9885...  0.5999 sec/batch
Epoch: 10/20...  Training Step: 1913...  Training loss: 1.9583...  0.6001 sec/batch
Epoch: 10/20...  Training Step: 1914...  Training loss: 1.9458...  0.5921 sec/batch
Epoch: 10/20...  Training Step: 1915...  Training loss: 1.9872...  0.5875 sec/batch
Epoch: 10/20...  Training Step: 1916...  Training loss: 1.9832...  0.6012 sec/batch
Epoch: 10/20...  Training Step: 1917...  Training loss: 1.9783...  0.6004 sec/batch
Epoch: 10/20...  Training Step: 1918...  Training loss: 1.9798...  0.6262 sec/batch
Epoch: 10/20...  Training Step: 1919...  Training loss: 1.9845...  0.6051 sec/batch
Epoch: 10/20...  Training Step: 1920...  Training loss: 1.9858...  0.6026 sec/batch
Epoch: 10/20...  Training Step: 1921...  Training loss: 2.0045...  0.5961 sec/batch
Epoch: 10/20...  Training Step: 1922...  Training loss: 1.9588...  0.5982 sec/batch
Epoch: 10/20...  Training Step: 1923...  Training loss: 1.9940...  0.6046 sec/batch
Epoch: 10/20...  Training Step: 1924...  Training loss: 1.9733...  0.6015 sec/batch
Epoch: 10/20...  Training Step: 1925...  Training loss: 1.9757...  0.5902 sec/batch
Epoch: 10/20...  Training Step: 1926...  Training loss: 1.9733...  0.5928 sec/batch
Epoch: 10/20...  Training Step: 1927...  Training loss: 1.9643...  0.5943 sec/batch
Epoch: 10/20...  Training Step: 1928...  Training loss: 1.9883...  0.6076 sec/batch
Epoch: 10/20...  Training Step: 1929...  Training loss: 1.9917...  0.5879 sec/batch
Epoch: 10/20...  Training Step: 1930...  Training loss: 1.9910...  0.5976 sec/batch
Epoch: 10/20...  Training Step: 1931...  Training loss: 1.9820...  0.5981 sec/batch
Epoch: 10/20...  Training Step: 1932...  Training loss: 1.9649...  0.6027 sec/batch
Epoch: 10/20...  Training Step: 1933...  Training loss: 1.9624...  0.5963 sec/batch
Epoch: 10/20...  Training Step: 1934...  Training loss: 1.9990...  0.5988 sec/batch
Epoch: 10/20...  Training Step: 1935...  Training loss: 1.9840...  0.5935 sec/batch
Epoch: 10/20...  Training Step: 1936...  Training loss: 1.9916...  0.5821 sec/batch
Epoch: 10/20...  Training Step: 1937...  Training loss: 1.9676...  0.5974 sec/batch
Epoch: 10/20...  Training Step: 1938...  Training loss: 1.9734...  0.5948 sec/batch
Epoch: 10/20...  Training Step: 1939...  Training loss: 1.9759...  0.5992 sec/batch
Epoch: 10/20...  Training Step: 1940...  Training loss: 1.9672...  0.6007 sec/batch
Epoch: 10/20...  Training Step: 1941...  Training loss: 1.9503...  0.6022 sec/batch
Epoch: 10/20...  Training Step: 1942...  Training loss: 1.9954...  0.6045 sec/batch
Epoch: 10/20...  Training Step: 1943...  Training loss: 1.9964...  0.6102 sec/batch
Epoch: 10/20...  Training Step: 1944...  Training loss: 1.9651...  0.5989 sec/batch
Epoch: 10/20...  Training Step: 1945...  Training loss: 1.9930...  0.6033 sec/batch
Epoch: 10/20...  Training Step: 1946...  Training loss: 1.9773...  0.6028 sec/batch
Epoch: 10/20...  Training Step: 1947...  Training loss: 1.9838...  0.5974 sec/batch
Epoch: 10/20...  Training Step: 1948...  Training loss: 1.9734...  0.5918 sec/batch
Epoch: 10/20...  Training Step: 1949...  Training loss: 1.9733...  0.6036 sec/batch
Epoch: 10/20...  Training Step: 1950...  Training loss: 2.0235...  0.5956 sec/batch
Epoch: 10/20...  Training Step: 1951...  Training loss: 1.9730...  0.5955 sec/batch
Epoch: 10/20...  Training Step: 1952...  Training loss: 1.9626...  0.5939 sec/batch
Epoch: 10/20...  Training Step: 1953...  Training loss: 1.9579...  0.5995 sec/batch
Epoch: 10/20...  Training Step: 1954...  Training loss: 1.9647...  0.5945 sec/batch
Epoch: 10/20...  Training Step: 1955...  Training loss: 1.9965...  0.5987 sec/batch
Epoch: 10/20...  Training Step: 1956...  Training loss: 1.9832...  0.5989 sec/batch
Epoch: 10/20...  Training Step: 1957...  Training loss: 1.9851...  0.6078 sec/batch
Epoch: 10/20...  Training Step: 1958...  Training loss: 1.9631...  0.5926 sec/batch
Epoch: 10/20...  Training Step: 1959...  Training loss: 1.9675...  0.6029 sec/batch
Epoch: 10/20...  Training Step: 1960...  Training loss: 1.9794...  0.5937 sec/batch
Epoch: 10/20...  Training Step: 1961...  Training loss: 1.9541...  0.5986 sec/batch
Epoch: 10/20...  Training Step: 1962...  Training loss: 1.9442...  0.5895 sec/batch
Epoch: 10/20...  Training Step: 1963...  Training loss: 1.9492...  0.6007 sec/batch
Epoch: 10/20...  Training Step: 1964...  Training loss: 1.9817...  0.5718 sec/batch
Epoch: 10/20...  Training Step: 1965...  Training loss: 1.9591...  0.6069 sec/batch
Epoch: 10/20...  Training Step: 1966...  Training loss: 1.9934...  0.6113 sec/batch
Epoch: 10/20...  Training Step: 1967...  Training loss: 1.9723...  0.6001 sec/batch
Epoch: 10/20...  Training Step: 1968...  Training loss: 1.9573...  0.6266 sec/batch
Epoch: 10/20...  Training Step: 1969...  Training loss: 1.9818...  0.5905 sec/batch
Epoch: 10/20...  Training Step: 1970...  Training loss: 1.9545...  0.5862 sec/batch
Epoch: 10/20...  Training Step: 1971...  Training loss: 1.9662...  0.5738 sec/batch
Epoch: 10/20...  Training Step: 1972...  Training loss: 1.9689...  0.5964 sec/batch
Epoch: 10/20...  Training Step: 1973...  Training loss: 1.9811...  0.5997 sec/batch
Epoch: 10/20...  Training Step: 1974...  Training loss: 1.9423...  0.5915 sec/batch
Epoch: 10/20...  Training Step: 1975...  Training loss: 1.9648...  0.6074 sec/batch
Epoch: 10/20...  Training Step: 1976...  Training loss: 1.9625...  0.6023 sec/batch
Epoch: 10/20...  Training Step: 1977...  Training loss: 1.9394...  0.5823 sec/batch
Epoch: 10/20...  Training Step: 1978...  Training loss: 1.9725...  0.6045 sec/batch
Epoch: 10/20...  Training Step: 1979...  Training loss: 1.9588...  0.5947 sec/batch
Epoch: 10/20...  Training Step: 1980...  Training loss: 1.9575...  0.5980 sec/batch
Epoch: 11/20...  Training Step: 1981...  Training loss: 2.0349...  0.5956 sec/batch
Epoch: 11/20...  Training Step: 1982...  Training loss: 1.9501...  0.5989 sec/batch
Epoch: 11/20...  Training Step: 1983...  Training loss: 1.9501...  0.6016 sec/batch
Epoch: 11/20...  Training Step: 1984...  Training loss: 1.9622...  0.5930 sec/batch
Epoch: 11/20...  Training Step: 1985...  Training loss: 1.9613...  0.5989 sec/batch
Epoch: 11/20...  Training Step: 1986...  Training loss: 1.9324...  0.5955 sec/batch
Epoch: 11/20...  Training Step: 1987...  Training loss: 1.9674...  0.5915 sec/batch
Epoch: 11/20...  Training Step: 1988...  Training loss: 1.9587...  0.5940 sec/batch
Epoch: 11/20...  Training Step: 1989...  Training loss: 2.0015...  0.6034 sec/batch
Epoch: 11/20...  Training Step: 1990...  Training loss: 1.9600...  0.6007 sec/batch
Epoch: 11/20...  Training Step: 1991...  Training loss: 1.9444...  0.6083 sec/batch
Epoch: 11/20...  Training Step: 1992...  Training loss: 1.9468...  0.5989 sec/batch
Epoch: 11/20...  Training Step: 1993...  Training loss: 1.9648...  0.5913 sec/batch
Epoch: 11/20...  Training Step: 1994...  Training loss: 2.0036...  0.5988 sec/batch
Epoch: 11/20...  Training Step: 1995...  Training loss: 1.9665...  0.5901 sec/batch
Epoch: 11/20...  Training Step: 1996...  Training loss: 1.9442...  0.5916 sec/batch
Epoch: 11/20...  Training Step: 1997...  Training loss: 1.9666...  0.5985 sec/batch
Epoch: 11/20...  Training Step: 1998...  Training loss: 2.0023...  0.5873 sec/batch
Epoch: 11/20...  Training Step: 1999...  Training loss: 1.9758...  0.6118 sec/batch
Epoch: 11/20...  Training Step: 2000...  Training loss: 1.9648...  0.5922 sec/batch
Epoch: 11/20...  Training Step: 2001...  Training loss: 1.9592...  0.5672 sec/batch
Epoch: 11/20...  Training Step: 2002...  Training loss: 2.0006...  0.5960 sec/batch
Epoch: 11/20...  Training Step: 2003...  Training loss: 1.9707...  0.6005 sec/batch
Epoch: 11/20...  Training Step: 2004...  Training loss: 1.9662...  0.6002 sec/batch
Epoch: 11/20...  Training Step: 2005...  Training loss: 1.9653...  0.6035 sec/batch
Epoch: 11/20...  Training Step: 2006...  Training loss: 1.9395...  0.5960 sec/batch
Epoch: 11/20...  Training Step: 2007...  Training loss: 1.9458...  0.5979 sec/batch
Epoch: 11/20...  Training Step: 2008...  Training loss: 1.9749...  0.6038 sec/batch
Epoch: 11/20...  Training Step: 2009...  Training loss: 1.9980...  0.6009 sec/batch
Epoch: 11/20...  Training Step: 2010...  Training loss: 1.9856...  0.5935 sec/batch
Epoch: 11/20...  Training Step: 2011...  Training loss: 1.9666...  0.5991 sec/batch
Epoch: 11/20...  Training Step: 2012...  Training loss: 1.9539...  0.6037 sec/batch
Epoch: 11/20...  Training Step: 2013...  Training loss: 1.9808...  0.6058 sec/batch
Epoch: 11/20...  Training Step: 2014...  Training loss: 1.9871...  0.5929 sec/batch
Epoch: 11/20...  Training Step: 2015...  Training loss: 1.9553...  0.5833 sec/batch
Epoch: 11/20...  Training Step: 2016...  Training loss: 1.9706...  0.5839 sec/batch
Epoch: 11/20...  Training Step: 2017...  Training loss: 1.9502...  0.5971 sec/batch
Epoch: 11/20...  Training Step: 2018...  Training loss: 1.9247...  0.6347 sec/batch
Epoch: 11/20...  Training Step: 2019...  Training loss: 1.9146...  0.6034 sec/batch
Epoch: 11/20...  Training Step: 2020...  Training loss: 1.9376...  0.6031 sec/batch
Epoch: 11/20...  Training Step: 2021...  Training loss: 1.9427...  0.6012 sec/batch
Epoch: 11/20...  Training Step: 2022...  Training loss: 1.9751...  0.5933 sec/batch
Epoch: 11/20...  Training Step: 2023...  Training loss: 1.9451...  0.6074 sec/batch
Epoch: 11/20...  Training Step: 2024...  Training loss: 1.9322...  0.6011 sec/batch
Epoch: 11/20...  Training Step: 2025...  Training loss: 1.9705...  0.5798 sec/batch
Epoch: 11/20...  Training Step: 2026...  Training loss: 1.9120...  0.6036 sec/batch
Epoch: 11/20...  Training Step: 2027...  Training loss: 1.9585...  0.5906 sec/batch
Epoch: 11/20...  Training Step: 2028...  Training loss: 1.9435...  0.5960 sec/batch
Epoch: 11/20...  Training Step: 2029...  Training loss: 1.9505...  0.5975 sec/batch
Epoch: 11/20...  Training Step: 2030...  Training loss: 2.0125...  0.6028 sec/batch
Epoch: 11/20...  Training Step: 2031...  Training loss: 1.9275...  0.5996 sec/batch
Epoch: 11/20...  Training Step: 2032...  Training loss: 2.0041...  0.5921 sec/batch
Epoch: 11/20...  Training Step: 2033...  Training loss: 1.9556...  0.6047 sec/batch
Epoch: 11/20...  Training Step: 2034...  Training loss: 1.9543...  0.5964 sec/batch
Epoch: 11/20...  Training Step: 2035...  Training loss: 1.9549...  0.5997 sec/batch
Epoch: 11/20...  Training Step: 2036...  Training loss: 1.9665...  0.6015 sec/batch
Epoch: 11/20...  Training Step: 2037...  Training loss: 1.9765...  0.6000 sec/batch
Epoch: 11/20...  Training Step: 2038...  Training loss: 1.9546...  0.5909 sec/batch
Epoch: 11/20...  Training Step: 2039...  Training loss: 1.9462...  0.7376 sec/batch
Epoch: 11/20...  Training Step: 2040...  Training loss: 1.9781...  0.6419 sec/batch
Epoch: 11/20...  Training Step: 2041...  Training loss: 1.9552...  0.5971 sec/batch
Epoch: 11/20...  Training Step: 2042...  Training loss: 1.9931...  0.5961 sec/batch
Epoch: 11/20...  Training Step: 2043...  Training loss: 1.9843...  0.6010 sec/batch
Epoch: 11/20...  Training Step: 2044...  Training loss: 1.9755...  0.6009 sec/batch
Epoch: 11/20...  Training Step: 2045...  Training loss: 1.9382...  0.5936 sec/batch
Epoch: 11/20...  Training Step: 2046...  Training loss: 1.9827...  0.6057 sec/batch
Epoch: 11/20...  Training Step: 2047...  Training loss: 1.9716...  0.6070 sec/batch
Epoch: 11/20...  Training Step: 2048...  Training loss: 1.9411...  0.5986 sec/batch
Epoch: 11/20...  Training Step: 2049...  Training loss: 1.9411...  0.6031 sec/batch
Epoch: 11/20...  Training Step: 2050...  Training loss: 1.9482...  0.6160 sec/batch
Epoch: 11/20...  Training Step: 2051...  Training loss: 1.9907...  0.5989 sec/batch
Epoch: 11/20...  Training Step: 2052...  Training loss: 1.9632...  0.5947 sec/batch
Epoch: 11/20...  Training Step: 2053...  Training loss: 1.9785...  0.5891 sec/batch
Epoch: 11/20...  Training Step: 2054...  Training loss: 1.9509...  0.5930 sec/batch
Epoch: 11/20...  Training Step: 2055...  Training loss: 1.9556...  0.5960 sec/batch
Epoch: 11/20...  Training Step: 2056...  Training loss: 1.9766...  0.5741 sec/batch
Epoch: 11/20...  Training Step: 2057...  Training loss: 1.9569...  0.6011 sec/batch
Epoch: 11/20...  Training Step: 2058...  Training loss: 1.9673...  0.5930 sec/batch
Epoch: 11/20...  Training Step: 2059...  Training loss: 1.9280...  0.5980 sec/batch
Epoch: 11/20...  Training Step: 2060...  Training loss: 1.9494...  0.5683 sec/batch
Epoch: 11/20...  Training Step: 2061...  Training loss: 1.9223...  0.6031 sec/batch
Epoch: 11/20...  Training Step: 2062...  Training loss: 1.9664...  0.5977 sec/batch
Epoch: 11/20...  Training Step: 2063...  Training loss: 1.9225...  0.5958 sec/batch
Epoch: 11/20...  Training Step: 2064...  Training loss: 1.9512...  0.5996 sec/batch
Epoch: 11/20...  Training Step: 2065...  Training loss: 1.9234...  0.5995 sec/batch
Epoch: 11/20...  Training Step: 2066...  Training loss: 1.9421...  0.5979 sec/batch
Epoch: 11/20...  Training Step: 2067...  Training loss: 1.9500...  0.6076 sec/batch
Epoch: 11/20...  Training Step: 2068...  Training loss: 1.9326...  0.6175 sec/batch
Epoch: 11/20...  Training Step: 2069...  Training loss: 1.9287...  0.6108 sec/batch
Epoch: 11/20...  Training Step: 2070...  Training loss: 1.9579...  0.5942 sec/batch
Epoch: 11/20...  Training Step: 2071...  Training loss: 1.9368...  0.5980 sec/batch
Epoch: 11/20...  Training Step: 2072...  Training loss: 1.9521...  0.6047 sec/batch
Epoch: 11/20...  Training Step: 2073...  Training loss: 1.9178...  0.5911 sec/batch
Epoch: 11/20...  Training Step: 2074...  Training loss: 1.9347...  0.5975 sec/batch
Epoch: 11/20...  Training Step: 2075...  Training loss: 1.9178...  0.5936 sec/batch
Epoch: 11/20...  Training Step: 2076...  Training loss: 1.9510...  0.5922 sec/batch
Epoch: 11/20...  Training Step: 2077...  Training loss: 1.9430...  0.5990 sec/batch
Epoch: 11/20...  Training Step: 2078...  Training loss: 1.9279...  0.6191 sec/batch
Epoch: 11/20...  Training Step: 2079...  Training loss: 1.9348...  0.6153 sec/batch
Epoch: 11/20...  Training Step: 2080...  Training loss: 1.9121...  0.6000 sec/batch
Epoch: 11/20...  Training Step: 2081...  Training loss: 1.9619...  0.6032 sec/batch
Epoch: 11/20...  Training Step: 2082...  Training loss: 1.9463...  0.5945 sec/batch
Epoch: 11/20...  Training Step: 2083...  Training loss: 1.9329...  0.5910 sec/batch
Epoch: 11/20...  Training Step: 2084...  Training loss: 1.9327...  0.5832 sec/batch
Epoch: 11/20...  Training Step: 2085...  Training loss: 1.9423...  0.5920 sec/batch
Epoch: 11/20...  Training Step: 2086...  Training loss: 1.9540...  0.6015 sec/batch
Epoch: 11/20...  Training Step: 2087...  Training loss: 1.9478...  0.6012 sec/batch
Epoch: 11/20...  Training Step: 2088...  Training loss: 1.9504...  0.5889 sec/batch
Epoch: 11/20...  Training Step: 2089...  Training loss: 1.9548...  0.5923 sec/batch
Epoch: 11/20...  Training Step: 2090...  Training loss: 1.9504...  0.5843 sec/batch
Epoch: 11/20...  Training Step: 2091...  Training loss: 1.9481...  0.5867 sec/batch
Epoch: 11/20...  Training Step: 2092...  Training loss: 1.9433...  0.5978 sec/batch
Epoch: 11/20...  Training Step: 2093...  Training loss: 1.9539...  0.5893 sec/batch
Epoch: 11/20...  Training Step: 2094...  Training loss: 1.9317...  0.5953 sec/batch
Epoch: 11/20...  Training Step: 2095...  Training loss: 1.9337...  0.5963 sec/batch
Epoch: 11/20...  Training Step: 2096...  Training loss: 1.9103...  0.6048 sec/batch
Epoch: 11/20...  Training Step: 2097...  Training loss: 1.9598...  0.5762 sec/batch
Epoch: 11/20...  Training Step: 2098...  Training loss: 1.9314...  0.6185 sec/batch
Epoch: 11/20...  Training Step: 2099...  Training loss: 1.9687...  0.5972 sec/batch
Epoch: 11/20...  Training Step: 2100...  Training loss: 1.9474...  0.5880 sec/batch
Epoch: 11/20...  Training Step: 2101...  Training loss: 1.9698...  0.5937 sec/batch
Epoch: 11/20...  Training Step: 2102...  Training loss: 1.9164...  0.5979 sec/batch
Epoch: 11/20...  Training Step: 2103...  Training loss: 1.9308...  0.6002 sec/batch
Epoch: 11/20...  Training Step: 2104...  Training loss: 1.9747...  0.6050 sec/batch
Epoch: 11/20...  Training Step: 2105...  Training loss: 1.9477...  0.5923 sec/batch
Epoch: 11/20...  Training Step: 2106...  Training loss: 1.9180...  0.6067 sec/batch
Epoch: 11/20...  Training Step: 2107...  Training loss: 1.9716...  0.5955 sec/batch
Epoch: 11/20...  Training Step: 2108...  Training loss: 1.9527...  0.6039 sec/batch
Epoch: 11/20...  Training Step: 2109...  Training loss: 1.9474...  0.6000 sec/batch
Epoch: 11/20...  Training Step: 2110...  Training loss: 1.9521...  0.5900 sec/batch
Epoch: 11/20...  Training Step: 2111...  Training loss: 1.9323...  0.6053 sec/batch
Epoch: 11/20...  Training Step: 2112...  Training loss: 1.9208...  0.6027 sec/batch
Epoch: 11/20...  Training Step: 2113...  Training loss: 1.9574...  0.6053 sec/batch
Epoch: 11/20...  Training Step: 2114...  Training loss: 1.9535...  0.5905 sec/batch
Epoch: 11/20...  Training Step: 2115...  Training loss: 1.9423...  0.6121 sec/batch
Epoch: 11/20...  Training Step: 2116...  Training loss: 1.9461...  0.6040 sec/batch
Epoch: 11/20...  Training Step: 2117...  Training loss: 1.9596...  0.6018 sec/batch
Epoch: 11/20...  Training Step: 2118...  Training loss: 1.9551...  0.6402 sec/batch
Epoch: 11/20...  Training Step: 2119...  Training loss: 1.9778...  0.6005 sec/batch
Epoch: 11/20...  Training Step: 2120...  Training loss: 1.9435...  0.6041 sec/batch
Epoch: 11/20...  Training Step: 2121...  Training loss: 1.9808...  0.6088 sec/batch
Epoch: 11/20...  Training Step: 2122...  Training loss: 1.9398...  0.6033 sec/batch
Epoch: 11/20...  Training Step: 2123...  Training loss: 1.9478...  0.6016 sec/batch
Epoch: 11/20...  Training Step: 2124...  Training loss: 1.9472...  0.5969 sec/batch
Epoch: 11/20...  Training Step: 2125...  Training loss: 1.9346...  0.5960 sec/batch
Epoch: 11/20...  Training Step: 2126...  Training loss: 1.9514...  0.6064 sec/batch
Epoch: 11/20...  Training Step: 2127...  Training loss: 1.9699...  0.5999 sec/batch
Epoch: 11/20...  Training Step: 2128...  Training loss: 1.9721...  0.5976 sec/batch
Epoch: 11/20...  Training Step: 2129...  Training loss: 1.9524...  0.6040 sec/batch
Epoch: 11/20...  Training Step: 2130...  Training loss: 1.9426...  0.5971 sec/batch
Epoch: 11/20...  Training Step: 2131...  Training loss: 1.9334...  0.6030 sec/batch
Epoch: 11/20...  Training Step: 2132...  Training loss: 1.9717...  0.6025 sec/batch
Epoch: 11/20...  Training Step: 2133...  Training loss: 1.9398...  0.6013 sec/batch
Epoch: 11/20...  Training Step: 2134...  Training loss: 1.9567...  0.6000 sec/batch
Epoch: 11/20...  Training Step: 2135...  Training loss: 1.9402...  0.6033 sec/batch
Epoch: 11/20...  Training Step: 2136...  Training loss: 1.9466...  0.5991 sec/batch
Epoch: 11/20...  Training Step: 2137...  Training loss: 1.9477...  0.5825 sec/batch
Epoch: 11/20...  Training Step: 2138...  Training loss: 1.9497...  0.5903 sec/batch
Epoch: 11/20...  Training Step: 2139...  Training loss: 1.9287...  0.5865 sec/batch
Epoch: 11/20...  Training Step: 2140...  Training loss: 1.9740...  0.5766 sec/batch
Epoch: 11/20...  Training Step: 2141...  Training loss: 1.9730...  0.5995 sec/batch
Epoch: 11/20...  Training Step: 2142...  Training loss: 1.9358...  0.5984 sec/batch
Epoch: 11/20...  Training Step: 2143...  Training loss: 1.9653...  0.6064 sec/batch
Epoch: 11/20...  Training Step: 2144...  Training loss: 1.9606...  0.6058 sec/batch
Epoch: 11/20...  Training Step: 2145...  Training loss: 1.9523...  0.6043 sec/batch
Epoch: 11/20...  Training Step: 2146...  Training loss: 1.9434...  0.5941 sec/batch
Epoch: 11/20...  Training Step: 2147...  Training loss: 1.9536...  0.5934 sec/batch
Epoch: 11/20...  Training Step: 2148...  Training loss: 1.9876...  0.6100 sec/batch
Epoch: 11/20...  Training Step: 2149...  Training loss: 1.9396...  0.5975 sec/batch
Epoch: 11/20...  Training Step: 2150...  Training loss: 1.9466...  0.6080 sec/batch
Epoch: 11/20...  Training Step: 2151...  Training loss: 1.9411...  0.6002 sec/batch
Epoch: 11/20...  Training Step: 2152...  Training loss: 1.9364...  0.5985 sec/batch
Epoch: 11/20...  Training Step: 2153...  Training loss: 1.9648...  0.6024 sec/batch
Epoch: 11/20...  Training Step: 2154...  Training loss: 1.9480...  0.6078 sec/batch
Epoch: 11/20...  Training Step: 2155...  Training loss: 1.9508...  0.6050 sec/batch
Epoch: 11/20...  Training Step: 2156...  Training loss: 1.9406...  0.5956 sec/batch
Epoch: 11/20...  Training Step: 2157...  Training loss: 1.9261...  0.5827 sec/batch
Epoch: 11/20...  Training Step: 2158...  Training loss: 1.9600...  0.6027 sec/batch
Epoch: 11/20...  Training Step: 2159...  Training loss: 1.9362...  0.6005 sec/batch
Epoch: 11/20...  Training Step: 2160...  Training loss: 1.9010...  0.5981 sec/batch
Epoch: 11/20...  Training Step: 2161...  Training loss: 1.9165...  0.5788 sec/batch
Epoch: 11/20...  Training Step: 2162...  Training loss: 1.9423...  0.5913 sec/batch
Epoch: 11/20...  Training Step: 2163...  Training loss: 1.9400...  0.5990 sec/batch
Epoch: 11/20...  Training Step: 2164...  Training loss: 1.9658...  0.6139 sec/batch
Epoch: 11/20...  Training Step: 2165...  Training loss: 1.9375...  0.5913 sec/batch
Epoch: 11/20...  Training Step: 2166...  Training loss: 1.9377...  0.6038 sec/batch
Epoch: 11/20...  Training Step: 2167...  Training loss: 1.9509...  0.5848 sec/batch
Epoch: 11/20...  Training Step: 2168...  Training loss: 1.9266...  0.6578 sec/batch
Epoch: 11/20...  Training Step: 2169...  Training loss: 1.9292...  0.6042 sec/batch
Epoch: 11/20...  Training Step: 2170...  Training loss: 1.9470...  0.5912 sec/batch
Epoch: 11/20...  Training Step: 2171...  Training loss: 1.9441...  0.6019 sec/batch
Epoch: 11/20...  Training Step: 2172...  Training loss: 1.9148...  0.5937 sec/batch
Epoch: 11/20...  Training Step: 2173...  Training loss: 1.9510...  0.6027 sec/batch
Epoch: 11/20...  Training Step: 2174...  Training loss: 1.9338...  0.5985 sec/batch
Epoch: 11/20...  Training Step: 2175...  Training loss: 1.9091...  0.5980 sec/batch
Epoch: 11/20...  Training Step: 2176...  Training loss: 1.9425...  0.5960 sec/batch
Epoch: 11/20...  Training Step: 2177...  Training loss: 1.9288...  0.5916 sec/batch
Epoch: 11/20...  Training Step: 2178...  Training loss: 1.9307...  0.5951 sec/batch
Epoch: 12/20...  Training Step: 2179...  Training loss: 2.0063...  0.6023 sec/batch
Epoch: 12/20...  Training Step: 2180...  Training loss: 1.9285...  0.5941 sec/batch
Epoch: 12/20...  Training Step: 2181...  Training loss: 1.9272...  0.6053 sec/batch
Epoch: 12/20...  Training Step: 2182...  Training loss: 1.9347...  0.5997 sec/batch
Epoch: 12/20...  Training Step: 2183...  Training loss: 1.9287...  0.5949 sec/batch
Epoch: 12/20...  Training Step: 2184...  Training loss: 1.9041...  0.6045 sec/batch
Epoch: 12/20...  Training Step: 2185...  Training loss: 1.9364...  0.6037 sec/batch
Epoch: 12/20...  Training Step: 2186...  Training loss: 1.9367...  0.5993 sec/batch
Epoch: 12/20...  Training Step: 2187...  Training loss: 1.9652...  0.5809 sec/batch
Epoch: 12/20...  Training Step: 2188...  Training loss: 1.9278...  0.5981 sec/batch
Epoch: 12/20...  Training Step: 2189...  Training loss: 1.9092...  0.5866 sec/batch
Epoch: 12/20...  Training Step: 2190...  Training loss: 1.9205...  0.5996 sec/batch
Epoch: 12/20...  Training Step: 2191...  Training loss: 1.9403...  0.6001 sec/batch
Epoch: 12/20...  Training Step: 2192...  Training loss: 1.9719...  0.6016 sec/batch
Epoch: 12/20...  Training Step: 2193...  Training loss: 1.9302...  0.6032 sec/batch
Epoch: 12/20...  Training Step: 2194...  Training loss: 1.9202...  0.6000 sec/batch
Epoch: 12/20...  Training Step: 2195...  Training loss: 1.9339...  0.6053 sec/batch
Epoch: 12/20...  Training Step: 2196...  Training loss: 1.9668...  0.5991 sec/batch
Epoch: 12/20...  Training Step: 2197...  Training loss: 1.9408...  0.5933 sec/batch
Epoch: 12/20...  Training Step: 2198...  Training loss: 1.9440...  0.5984 sec/batch
Epoch: 12/20...  Training Step: 2199...  Training loss: 1.9217...  0.5906 sec/batch
Epoch: 12/20...  Training Step: 2200...  Training loss: 1.9703...  0.5974 sec/batch
Epoch: 12/20...  Training Step: 2201...  Training loss: 1.9373...  0.5795 sec/batch
Epoch: 12/20...  Training Step: 2202...  Training loss: 1.9244...  0.6041 sec/batch
Epoch: 12/20...  Training Step: 2203...  Training loss: 1.9329...  0.6159 sec/batch
Epoch: 12/20...  Training Step: 2204...  Training loss: 1.9063...  0.6037 sec/batch
Epoch: 12/20...  Training Step: 2205...  Training loss: 1.9206...  0.6029 sec/batch
Epoch: 12/20...  Training Step: 2206...  Training loss: 1.9458...  0.6029 sec/batch
Epoch: 12/20...  Training Step: 2207...  Training loss: 1.9657...  0.6006 sec/batch
Epoch: 12/20...  Training Step: 2208...  Training loss: 1.9540...  0.5945 sec/batch
Epoch: 12/20...  Training Step: 2209...  Training loss: 1.9394...  0.5983 sec/batch
Epoch: 12/20...  Training Step: 2210...  Training loss: 1.9222...  0.5958 sec/batch
Epoch: 12/20...  Training Step: 2211...  Training loss: 1.9541...  0.5962 sec/batch
Epoch: 12/20...  Training Step: 2212...  Training loss: 1.9545...  0.6023 sec/batch
Epoch: 12/20...  Training Step: 2213...  Training loss: 1.9164...  0.6049 sec/batch
Epoch: 12/20...  Training Step: 2214...  Training loss: 1.9415...  0.5821 sec/batch
Epoch: 12/20...  Training Step: 2215...  Training loss: 1.9283...  0.5926 sec/batch
Epoch: 12/20...  Training Step: 2216...  Training loss: 1.8879...  0.6015 sec/batch
Epoch: 12/20...  Training Step: 2217...  Training loss: 1.9058...  0.6378 sec/batch
Epoch: 12/20...  Training Step: 2218...  Training loss: 1.9191...  0.6263 sec/batch
Epoch: 12/20...  Training Step: 2219...  Training loss: 1.9225...  0.5980 sec/batch
Epoch: 12/20...  Training Step: 2220...  Training loss: 1.9411...  0.5995 sec/batch
Epoch: 12/20...  Training Step: 2221...  Training loss: 1.9287...  0.6009 sec/batch
Epoch: 12/20...  Training Step: 2222...  Training loss: 1.8988...  0.6078 sec/batch
Epoch: 12/20...  Training Step: 2223...  Training loss: 1.9404...  0.5957 sec/batch
Epoch: 12/20...  Training Step: 2224...  Training loss: 1.8736...  0.5923 sec/batch
Epoch: 12/20...  Training Step: 2225...  Training loss: 1.9336...  0.5923 sec/batch
Epoch: 12/20...  Training Step: 2226...  Training loss: 1.9183...  0.5822 sec/batch
Epoch: 12/20...  Training Step: 2227...  Training loss: 1.9169...  0.6018 sec/batch
Epoch: 12/20...  Training Step: 2228...  Training loss: 1.9721...  0.5969 sec/batch
Epoch: 12/20...  Training Step: 2229...  Training loss: 1.9156...  0.5977 sec/batch
Epoch: 12/20...  Training Step: 2230...  Training loss: 1.9770...  0.6469 sec/batch
Epoch: 12/20...  Training Step: 2231...  Training loss: 1.9259...  0.6201 sec/batch
Epoch: 12/20...  Training Step: 2232...  Training loss: 1.9255...  0.5897 sec/batch
Epoch: 12/20...  Training Step: 2233...  Training loss: 1.9123...  0.6189 sec/batch
Epoch: 12/20...  Training Step: 2234...  Training loss: 1.9390...  0.6056 sec/batch
Epoch: 12/20...  Training Step: 2235...  Training loss: 1.9515...  0.5875 sec/batch
Epoch: 12/20...  Training Step: 2236...  Training loss: 1.9152...  0.6065 sec/batch
Epoch: 12/20...  Training Step: 2237...  Training loss: 1.9208...  0.5910 sec/batch
Epoch: 12/20...  Training Step: 2238...  Training loss: 1.9438...  0.6155 sec/batch
Epoch: 12/20...  Training Step: 2239...  Training loss: 1.9366...  0.5999 sec/batch
Epoch: 12/20...  Training Step: 2240...  Training loss: 1.9802...  0.6011 sec/batch
Epoch: 12/20...  Training Step: 2241...  Training loss: 1.9628...  0.6056 sec/batch
Epoch: 12/20...  Training Step: 2242...  Training loss: 1.9394...  0.6005 sec/batch
Epoch: 12/20...  Training Step: 2243...  Training loss: 1.9267...  0.5983 sec/batch
Epoch: 12/20...  Training Step: 2244...  Training loss: 1.9489...  0.5997 sec/batch
Epoch: 12/20...  Training Step: 2245...  Training loss: 1.9388...  0.6013 sec/batch
Epoch: 12/20...  Training Step: 2246...  Training loss: 1.9073...  0.5984 sec/batch
Epoch: 12/20...  Training Step: 2247...  Training loss: 1.9130...  0.6003 sec/batch
Epoch: 12/20...  Training Step: 2248...  Training loss: 1.9141...  0.5942 sec/batch
Epoch: 12/20...  Training Step: 2249...  Training loss: 1.9677...  0.6014 sec/batch
Epoch: 12/20...  Training Step: 2250...  Training loss: 1.9503...  0.5872 sec/batch
Epoch: 12/20...  Training Step: 2251...  Training loss: 1.9489...  0.6011 sec/batch
Epoch: 12/20...  Training Step: 2252...  Training loss: 1.9190...  0.6013 sec/batch
Epoch: 12/20...  Training Step: 2253...  Training loss: 1.9255...  0.5962 sec/batch
Epoch: 12/20...  Training Step: 2254...  Training loss: 1.9529...  0.5915 sec/batch
Epoch: 12/20...  Training Step: 2255...  Training loss: 1.9295...  0.6321 sec/batch
Epoch: 12/20...  Training Step: 2256...  Training loss: 1.9372...  0.5989 sec/batch
Epoch: 12/20...  Training Step: 2257...  Training loss: 1.9111...  0.6054 sec/batch
Epoch: 12/20...  Training Step: 2258...  Training loss: 1.9199...  0.6044 sec/batch
Epoch: 12/20...  Training Step: 2259...  Training loss: 1.8849...  0.5916 sec/batch
Epoch: 12/20...  Training Step: 2260...  Training loss: 1.9362...  0.6071 sec/batch
Epoch: 12/20...  Training Step: 2261...  Training loss: 1.8916...  0.6035 sec/batch
Epoch: 12/20...  Training Step: 2262...  Training loss: 1.9240...  0.5849 sec/batch
Epoch: 12/20...  Training Step: 2263...  Training loss: 1.8937...  0.6177 sec/batch
Epoch: 12/20...  Training Step: 2264...  Training loss: 1.9224...  0.6013 sec/batch
Epoch: 12/20...  Training Step: 2265...  Training loss: 1.9204...  0.5944 sec/batch
Epoch: 12/20...  Training Step: 2266...  Training loss: 1.9153...  0.6242 sec/batch
Epoch: 12/20...  Training Step: 2267...  Training loss: 1.8933...  0.5938 sec/batch
Epoch: 12/20...  Training Step: 2268...  Training loss: 1.9427...  0.6021 sec/batch
Epoch: 12/20...  Training Step: 2269...  Training loss: 1.9145...  0.6118 sec/batch
Epoch: 12/20...  Training Step: 2270...  Training loss: 1.9274...  0.6150 sec/batch
Epoch: 12/20...  Training Step: 2271...  Training loss: 1.8968...  0.6079 sec/batch
Epoch: 12/20...  Training Step: 2272...  Training loss: 1.8987...  0.5945 sec/batch
Epoch: 12/20...  Training Step: 2273...  Training loss: 1.9154...  0.5996 sec/batch
Epoch: 12/20...  Training Step: 2274...  Training loss: 1.9369...  0.6118 sec/batch
Epoch: 12/20...  Training Step: 2275...  Training loss: 1.9233...  0.5954 sec/batch
Epoch: 12/20...  Training Step: 2276...  Training loss: 1.8956...  0.6045 sec/batch
Epoch: 12/20...  Training Step: 2277...  Training loss: 1.9047...  0.5796 sec/batch
Epoch: 12/20...  Training Step: 2278...  Training loss: 1.8812...  0.5996 sec/batch
Epoch: 12/20...  Training Step: 2279...  Training loss: 1.9277...  0.6061 sec/batch
Epoch: 12/20...  Training Step: 2280...  Training loss: 1.9164...  0.6228 sec/batch
Epoch: 12/20...  Training Step: 2281...  Training loss: 1.9041...  0.6110 sec/batch
Epoch: 12/20...  Training Step: 2282...  Training loss: 1.9031...  0.6019 sec/batch
Epoch: 12/20...  Training Step: 2283...  Training loss: 1.9115...  0.6025 sec/batch
Epoch: 12/20...  Training Step: 2284...  Training loss: 1.9147...  0.5967 sec/batch
Epoch: 12/20...  Training Step: 2285...  Training loss: 1.9229...  0.6006 sec/batch
Epoch: 12/20...  Training Step: 2286...  Training loss: 1.9329...  0.6007 sec/batch
Epoch: 12/20...  Training Step: 2287...  Training loss: 1.9348...  0.5960 sec/batch
Epoch: 12/20...  Training Step: 2288...  Training loss: 1.9257...  0.6061 sec/batch
Epoch: 12/20...  Training Step: 2289...  Training loss: 1.9099...  0.5958 sec/batch
Epoch: 12/20...  Training Step: 2290...  Training loss: 1.9198...  0.5999 sec/batch
Epoch: 12/20...  Training Step: 2291...  Training loss: 1.9270...  0.6046 sec/batch
Epoch: 12/20...  Training Step: 2292...  Training loss: 1.9178...  0.6035 sec/batch
Epoch: 12/20...  Training Step: 2293...  Training loss: 1.8988...  0.5929 sec/batch
Epoch: 12/20...  Training Step: 2294...  Training loss: 1.8805...  0.6019 sec/batch
Epoch: 12/20...  Training Step: 2295...  Training loss: 1.9353...  0.5998 sec/batch
Epoch: 12/20...  Training Step: 2296...  Training loss: 1.9263...  0.5976 sec/batch
Epoch: 12/20...  Training Step: 2297...  Training loss: 1.9292...  0.6011 sec/batch
Epoch: 12/20...  Training Step: 2298...  Training loss: 1.9202...  0.5914 sec/batch
Epoch: 12/20...  Training Step: 2299...  Training loss: 1.9322...  0.6045 sec/batch
Epoch: 12/20...  Training Step: 2300...  Training loss: 1.8930...  0.5988 sec/batch
Epoch: 12/20...  Training Step: 2301...  Training loss: 1.8957...  0.5975 sec/batch
Epoch: 12/20...  Training Step: 2302...  Training loss: 1.9489...  0.5913 sec/batch
Epoch: 12/20...  Training Step: 2303...  Training loss: 1.9173...  0.5903 sec/batch
Epoch: 12/20...  Training Step: 2304...  Training loss: 1.8774...  0.6035 sec/batch
Epoch: 12/20...  Training Step: 2305...  Training loss: 1.9304...  0.6065 sec/batch
Epoch: 12/20...  Training Step: 2306...  Training loss: 1.9324...  0.6068 sec/batch
Epoch: 12/20...  Training Step: 2307...  Training loss: 1.9168...  0.5923 sec/batch
Epoch: 12/20...  Training Step: 2308...  Training loss: 1.9357...  0.5996 sec/batch
Epoch: 12/20...  Training Step: 2309...  Training loss: 1.8956...  0.6038 sec/batch
Epoch: 12/20...  Training Step: 2310...  Training loss: 1.8930...  0.6148 sec/batch
Epoch: 12/20...  Training Step: 2311...  Training loss: 1.9321...  0.5928 sec/batch
Epoch: 12/20...  Training Step: 2312...  Training loss: 1.9322...  0.6032 sec/batch
Epoch: 12/20...  Training Step: 2313...  Training loss: 1.9376...  0.6145 sec/batch
Epoch: 12/20...  Training Step: 2314...  Training loss: 1.9264...  0.5975 sec/batch
Epoch: 12/20...  Training Step: 2315...  Training loss: 1.9389...  0.5900 sec/batch
Epoch: 12/20...  Training Step: 2316...  Training loss: 1.9341...  0.6292 sec/batch
Epoch: 12/20...  Training Step: 2317...  Training loss: 1.9558...  0.5921 sec/batch
Epoch: 12/20...  Training Step: 2318...  Training loss: 1.9098...  0.6013 sec/batch
Epoch: 12/20...  Training Step: 2319...  Training loss: 1.9655...  0.6034 sec/batch
Epoch: 12/20...  Training Step: 2320...  Training loss: 1.9123...  0.5989 sec/batch
Epoch: 12/20...  Training Step: 2321...  Training loss: 1.9342...  0.6094 sec/batch
Epoch: 12/20...  Training Step: 2322...  Training loss: 1.9228...  0.5974 sec/batch
Epoch: 12/20...  Training Step: 2323...  Training loss: 1.9063...  0.6033 sec/batch
Epoch: 12/20...  Training Step: 2324...  Training loss: 1.9286...  0.5877 sec/batch
Epoch: 12/20...  Training Step: 2325...  Training loss: 1.9398...  0.6059 sec/batch
Epoch: 12/20...  Training Step: 2326...  Training loss: 1.9555...  0.5929 sec/batch
Epoch: 12/20...  Training Step: 2327...  Training loss: 1.9249...  0.5889 sec/batch
Epoch: 12/20...  Training Step: 2328...  Training loss: 1.9061...  0.5924 sec/batch
Epoch: 12/20...  Training Step: 2329...  Training loss: 1.9016...  0.5895 sec/batch
Epoch: 12/20...  Training Step: 2330...  Training loss: 1.9424...  0.6123 sec/batch
Epoch: 12/20...  Training Step: 2331...  Training loss: 1.9406...  0.6030 sec/batch
Epoch: 12/20...  Training Step: 2332...  Training loss: 1.9369...  0.5949 sec/batch
Epoch: 12/20...  Training Step: 2333...  Training loss: 1.9227...  0.5954 sec/batch
Epoch: 12/20...  Training Step: 2334...  Training loss: 1.9176...  0.6003 sec/batch
Epoch: 12/20...  Training Step: 2335...  Training loss: 1.9155...  0.6014 sec/batch
Epoch: 12/20...  Training Step: 2336...  Training loss: 1.9128...  0.6168 sec/batch
Epoch: 12/20...  Training Step: 2337...  Training loss: 1.8806...  0.6023 sec/batch
Epoch: 12/20...  Training Step: 2338...  Training loss: 1.9468...  0.6069 sec/batch
Epoch: 12/20...  Training Step: 2339...  Training loss: 1.9462...  0.5950 sec/batch
Epoch: 12/20...  Training Step: 2340...  Training loss: 1.9187...  0.6018 sec/batch
Epoch: 12/20...  Training Step: 2341...  Training loss: 1.9448...  0.6042 sec/batch
Epoch: 12/20...  Training Step: 2342...  Training loss: 1.9213...  0.5969 sec/batch
Epoch: 12/20...  Training Step: 2343...  Training loss: 1.9343...  0.5952 sec/batch
Epoch: 12/20...  Training Step: 2344...  Training loss: 1.9194...  0.5838 sec/batch
Epoch: 12/20...  Training Step: 2345...  Training loss: 1.9273...  0.5986 sec/batch
Epoch: 12/20...  Training Step: 2346...  Training loss: 1.9711...  0.6125 sec/batch
Epoch: 12/20...  Training Step: 2347...  Training loss: 1.9204...  0.6036 sec/batch
Epoch: 12/20...  Training Step: 2348...  Training loss: 1.9193...  0.5974 sec/batch
Epoch: 12/20...  Training Step: 2349...  Training loss: 1.9092...  0.5990 sec/batch
Epoch: 12/20...  Training Step: 2350...  Training loss: 1.9046...  0.5990 sec/batch
Epoch: 12/20...  Training Step: 2351...  Training loss: 1.9517...  0.5973 sec/batch
Epoch: 12/20...  Training Step: 2352...  Training loss: 1.9331...  0.6004 sec/batch
Epoch: 12/20...  Training Step: 2353...  Training loss: 1.9311...  0.6078 sec/batch
Epoch: 12/20...  Training Step: 2354...  Training loss: 1.9174...  0.5992 sec/batch
Epoch: 12/20...  Training Step: 2355...  Training loss: 1.9014...  0.6041 sec/batch
Epoch: 12/20...  Training Step: 2356...  Training loss: 1.9272...  0.6110 sec/batch
Epoch: 12/20...  Training Step: 2357...  Training loss: 1.8997...  0.5933 sec/batch
Epoch: 12/20...  Training Step: 2358...  Training loss: 1.8991...  0.5863 sec/batch
Epoch: 12/20...  Training Step: 2359...  Training loss: 1.8931...  0.5879 sec/batch
Epoch: 12/20...  Training Step: 2360...  Training loss: 1.9245...  0.6072 sec/batch
Epoch: 12/20...  Training Step: 2361...  Training loss: 1.9000...  0.6065 sec/batch
Epoch: 12/20...  Training Step: 2362...  Training loss: 1.9397...  0.5945 sec/batch
Epoch: 12/20...  Training Step: 2363...  Training loss: 1.9186...  0.6065 sec/batch
Epoch: 12/20...  Training Step: 2364...  Training loss: 1.9059...  0.6035 sec/batch
Epoch: 12/20...  Training Step: 2365...  Training loss: 1.9153...  0.5953 sec/batch
Epoch: 12/20...  Training Step: 2366...  Training loss: 1.9040...  0.6055 sec/batch
Epoch: 12/20...  Training Step: 2367...  Training loss: 1.9265...  0.5965 sec/batch
Epoch: 12/20...  Training Step: 2368...  Training loss: 1.9122...  0.6114 sec/batch
Epoch: 12/20...  Training Step: 2369...  Training loss: 1.9289...  0.5852 sec/batch
Epoch: 12/20...  Training Step: 2370...  Training loss: 1.8916...  0.6001 sec/batch
Epoch: 12/20...  Training Step: 2371...  Training loss: 1.9199...  0.5964 sec/batch
Epoch: 12/20...  Training Step: 2372...  Training loss: 1.8966...  0.5970 sec/batch
Epoch: 12/20...  Training Step: 2373...  Training loss: 1.8936...  0.5982 sec/batch
Epoch: 12/20...  Training Step: 2374...  Training loss: 1.9192...  0.5928 sec/batch
Epoch: 12/20...  Training Step: 2375...  Training loss: 1.9067...  0.6004 sec/batch
Epoch: 12/20...  Training Step: 2376...  Training loss: 1.9031...  0.6111 sec/batch
Epoch: 13/20...  Training Step: 2377...  Training loss: 1.9910...  0.5727 sec/batch
Epoch: 13/20...  Training Step: 2378...  Training loss: 1.9125...  0.5967 sec/batch
Epoch: 13/20...  Training Step: 2379...  Training loss: 1.9085...  0.6042 sec/batch
Epoch: 13/20...  Training Step: 2380...  Training loss: 1.9152...  0.6040 sec/batch
Epoch: 13/20...  Training Step: 2381...  Training loss: 1.9010...  0.6046 sec/batch
Epoch: 13/20...  Training Step: 2382...  Training loss: 1.8746...  0.6065 sec/batch
Epoch: 13/20...  Training Step: 2383...  Training loss: 1.9221...  0.5954 sec/batch
Epoch: 13/20...  Training Step: 2384...  Training loss: 1.9168...  0.6001 sec/batch
Epoch: 13/20...  Training Step: 2385...  Training loss: 1.9502...  0.5969 sec/batch
Epoch: 13/20...  Training Step: 2386...  Training loss: 1.8988...  0.5899 sec/batch
Epoch: 13/20...  Training Step: 2387...  Training loss: 1.8896...  0.6040 sec/batch
Epoch: 13/20...  Training Step: 2388...  Training loss: 1.8884...  0.6102 sec/batch
Epoch: 13/20...  Training Step: 2389...  Training loss: 1.9198...  0.5986 sec/batch
Epoch: 13/20...  Training Step: 2390...  Training loss: 1.9400...  0.6046 sec/batch
Epoch: 13/20...  Training Step: 2391...  Training loss: 1.9116...  0.5953 sec/batch
Epoch: 13/20...  Training Step: 2392...  Training loss: 1.8901...  0.5983 sec/batch
Epoch: 13/20...  Training Step: 2393...  Training loss: 1.9167...  0.6028 sec/batch
Epoch: 13/20...  Training Step: 2394...  Training loss: 1.9512...  0.6102 sec/batch
Epoch: 13/20...  Training Step: 2395...  Training loss: 1.9114...  0.5961 sec/batch
Epoch: 13/20...  Training Step: 2396...  Training loss: 1.9174...  0.6016 sec/batch
Epoch: 13/20...  Training Step: 2397...  Training loss: 1.9009...  0.6045 sec/batch
Epoch: 13/20...  Training Step: 2398...  Training loss: 1.9376...  0.5984 sec/batch
Epoch: 13/20...  Training Step: 2399...  Training loss: 1.9185...  0.6036 sec/batch
Epoch: 13/20...  Training Step: 2400...  Training loss: 1.9135...  0.6013 sec/batch
Epoch: 13/20...  Training Step: 2401...  Training loss: 1.9088...  0.5730 sec/batch
Epoch: 13/20...  Training Step: 2402...  Training loss: 1.9000...  0.6020 sec/batch
Epoch: 13/20...  Training Step: 2403...  Training loss: 1.8952...  0.6095 sec/batch
Epoch: 13/20...  Training Step: 2404...  Training loss: 1.9223...  0.6047 sec/batch
Epoch: 13/20...  Training Step: 2405...  Training loss: 1.9435...  0.6068 sec/batch
Epoch: 13/20...  Training Step: 2406...  Training loss: 1.9319...  0.5999 sec/batch
Epoch: 13/20...  Training Step: 2407...  Training loss: 1.9166...  0.6052 sec/batch
Epoch: 13/20...  Training Step: 2408...  Training loss: 1.9006...  0.6046 sec/batch
Epoch: 13/20...  Training Step: 2409...  Training loss: 1.9238...  0.6050 sec/batch
Epoch: 13/20...  Training Step: 2410...  Training loss: 1.9438...  0.5875 sec/batch
Epoch: 13/20...  Training Step: 2411...  Training loss: 1.9005...  0.6048 sec/batch
Epoch: 13/20...  Training Step: 2412...  Training loss: 1.9246...  0.6057 sec/batch
Epoch: 13/20...  Training Step: 2413...  Training loss: 1.9003...  0.5989 sec/batch
Epoch: 13/20...  Training Step: 2414...  Training loss: 1.8697...  0.6025 sec/batch
Epoch: 13/20...  Training Step: 2415...  Training loss: 1.8670...  0.6222 sec/batch
Epoch: 13/20...  Training Step: 2416...  Training loss: 1.8834...  0.6010 sec/batch
Epoch: 13/20...  Training Step: 2417...  Training loss: 1.8868...  0.5984 sec/batch
Epoch: 13/20...  Training Step: 2418...  Training loss: 1.9356...  0.6142 sec/batch
Epoch: 13/20...  Training Step: 2419...  Training loss: 1.8924...  0.5971 sec/batch
Epoch: 13/20...  Training Step: 2420...  Training loss: 1.8754...  0.6048 sec/batch
Epoch: 13/20...  Training Step: 2421...  Training loss: 1.9146...  0.5937 sec/batch
Epoch: 13/20...  Training Step: 2422...  Training loss: 1.8596...  0.6099 sec/batch
Epoch: 13/20...  Training Step: 2423...  Training loss: 1.9067...  0.5980 sec/batch
Epoch: 13/20...  Training Step: 2424...  Training loss: 1.8998...  0.6006 sec/batch
Epoch: 13/20...  Training Step: 2425...  Training loss: 1.9052...  0.5982 sec/batch
Epoch: 13/20...  Training Step: 2426...  Training loss: 1.9365...  0.6105 sec/batch
Epoch: 13/20...  Training Step: 2427...  Training loss: 1.8815...  0.6043 sec/batch
Epoch: 13/20...  Training Step: 2428...  Training loss: 1.9543...  0.5934 sec/batch
Epoch: 13/20...  Training Step: 2429...  Training loss: 1.9001...  0.5939 sec/batch
Epoch: 13/20...  Training Step: 2430...  Training loss: 1.9078...  0.6030 sec/batch
Epoch: 13/20...  Training Step: 2431...  Training loss: 1.8970...  0.5963 sec/batch
Epoch: 13/20...  Training Step: 2432...  Training loss: 1.9151...  0.6050 sec/batch
Epoch: 13/20...  Training Step: 2433...  Training loss: 1.9200...  0.6005 sec/batch
Epoch: 13/20...  Training Step: 2434...  Training loss: 1.8894...  0.5966 sec/batch
Epoch: 13/20...  Training Step: 2435...  Training loss: 1.8846...  0.6115 sec/batch
Epoch: 13/20...  Training Step: 2436...  Training loss: 1.9277...  0.6068 sec/batch
Epoch: 13/20...  Training Step: 2437...  Training loss: 1.9168...  0.6115 sec/batch
Epoch: 13/20...  Training Step: 2438...  Training loss: 1.9508...  0.5963 sec/batch
Epoch: 13/20...  Training Step: 2439...  Training loss: 1.9494...  0.6089 sec/batch
Epoch: 13/20...  Training Step: 2440...  Training loss: 1.9174...  0.6007 sec/batch
Epoch: 13/20...  Training Step: 2441...  Training loss: 1.9081...  0.5895 sec/batch
Epoch: 13/20...  Training Step: 2442...  Training loss: 1.9339...  0.6089 sec/batch
Epoch: 13/20...  Training Step: 2443...  Training loss: 1.9211...  0.5951 sec/batch
Epoch: 13/20...  Training Step: 2444...  Training loss: 1.8849...  0.5981 sec/batch
Epoch: 13/20...  Training Step: 2445...  Training loss: 1.9007...  0.6055 sec/batch
Epoch: 13/20...  Training Step: 2446...  Training loss: 1.8997...  0.5890 sec/batch
Epoch: 13/20...  Training Step: 2447...  Training loss: 1.9468...  0.6141 sec/batch
Epoch: 13/20...  Training Step: 2448...  Training loss: 1.9232...  0.6079 sec/batch
Epoch: 13/20...  Training Step: 2449...  Training loss: 1.9286...  0.6006 sec/batch
Epoch: 13/20...  Training Step: 2450...  Training loss: 1.9055...  0.6091 sec/batch
Epoch: 13/20...  Training Step: 2451...  Training loss: 1.9073...  0.5971 sec/batch
Epoch: 13/20...  Training Step: 2452...  Training loss: 1.9286...  0.6024 sec/batch
Epoch: 13/20...  Training Step: 2453...  Training loss: 1.9176...  0.6032 sec/batch
Epoch: 13/20...  Training Step: 2454...  Training loss: 1.9175...  0.6148 sec/batch
Epoch: 13/20...  Training Step: 2455...  Training loss: 1.8818...  0.6047 sec/batch
Epoch: 13/20...  Training Step: 2456...  Training loss: 1.8942...  0.6055 sec/batch
Epoch: 13/20...  Training Step: 2457...  Training loss: 1.8632...  0.6128 sec/batch
Epoch: 13/20...  Training Step: 2458...  Training loss: 1.9183...  0.6041 sec/batch
Epoch: 13/20...  Training Step: 2459...  Training loss: 1.8719...  0.5950 sec/batch
Epoch: 13/20...  Training Step: 2460...  Training loss: 1.9012...  0.6085 sec/batch
Epoch: 13/20...  Training Step: 2461...  Training loss: 1.8716...  0.5971 sec/batch
Epoch: 13/20...  Training Step: 2462...  Training loss: 1.8848...  0.6033 sec/batch
Epoch: 13/20...  Training Step: 2463...  Training loss: 1.8946...  0.6026 sec/batch
Epoch: 13/20...  Training Step: 2464...  Training loss: 1.8760...  0.6017 sec/batch
Epoch: 13/20...  Training Step: 2465...  Training loss: 1.8559...  0.6322 sec/batch
Epoch: 13/20...  Training Step: 2466...  Training loss: 1.8986...  0.5846 sec/batch
Epoch: 13/20...  Training Step: 2467...  Training loss: 1.8712...  0.5803 sec/batch
Epoch: 13/20...  Training Step: 2468...  Training loss: 1.8929...  0.6072 sec/batch
Epoch: 13/20...  Training Step: 2469...  Training loss: 1.8763...  0.6040 sec/batch
Epoch: 13/20...  Training Step: 2470...  Training loss: 1.8773...  0.5986 sec/batch
Epoch: 13/20...  Training Step: 2471...  Training loss: 1.8677...  0.6008 sec/batch
Epoch: 13/20...  Training Step: 2472...  Training loss: 1.9053...  0.6088 sec/batch
Epoch: 13/20...  Training Step: 2473...  Training loss: 1.8944...  0.5997 sec/batch
Epoch: 13/20...  Training Step: 2474...  Training loss: 1.8608...  0.5743 sec/batch
Epoch: 13/20...  Training Step: 2475...  Training loss: 1.8841...  0.6010 sec/batch
Epoch: 13/20...  Training Step: 2476...  Training loss: 1.8727...  0.6100 sec/batch
Epoch: 13/20...  Training Step: 2477...  Training loss: 1.9045...  0.5913 sec/batch
Epoch: 13/20...  Training Step: 2478...  Training loss: 1.8868...  0.6090 sec/batch
Epoch: 13/20...  Training Step: 2479...  Training loss: 1.8865...  0.6016 sec/batch
Epoch: 13/20...  Training Step: 2480...  Training loss: 1.8956...  0.6012 sec/batch
Epoch: 13/20...  Training Step: 2481...  Training loss: 1.8934...  0.6008 sec/batch
Epoch: 13/20...  Training Step: 2482...  Training loss: 1.8939...  0.5977 sec/batch
Epoch: 13/20...  Training Step: 2483...  Training loss: 1.9034...  0.5885 sec/batch
Epoch: 13/20...  Training Step: 2484...  Training loss: 1.9055...  0.6018 sec/batch
Epoch: 13/20...  Training Step: 2485...  Training loss: 1.9117...  0.6108 sec/batch
Epoch: 13/20...  Training Step: 2486...  Training loss: 1.9101...  0.6040 sec/batch
Epoch: 13/20...  Training Step: 2487...  Training loss: 1.9062...  0.6222 sec/batch
Epoch: 13/20...  Training Step: 2488...  Training loss: 1.8915...  0.6040 sec/batch
Epoch: 13/20...  Training Step: 2489...  Training loss: 1.8941...  0.5975 sec/batch
Epoch: 13/20...  Training Step: 2490...  Training loss: 1.8994...  0.5918 sec/batch
Epoch: 13/20...  Training Step: 2491...  Training loss: 1.8919...  0.5951 sec/batch
Epoch: 13/20...  Training Step: 2492...  Training loss: 1.8688...  0.6077 sec/batch
Epoch: 13/20...  Training Step: 2493...  Training loss: 1.9064...  0.6070 sec/batch
Epoch: 13/20...  Training Step: 2494...  Training loss: 1.8980...  0.6098 sec/batch
Epoch: 13/20...  Training Step: 2495...  Training loss: 1.9128...  0.6057 sec/batch
Epoch: 13/20...  Training Step: 2496...  Training loss: 1.8894...  0.6075 sec/batch
Epoch: 13/20...  Training Step: 2497...  Training loss: 1.9094...  0.6070 sec/batch
Epoch: 13/20...  Training Step: 2498...  Training loss: 1.8657...  0.6076 sec/batch
Epoch: 13/20...  Training Step: 2499...  Training loss: 1.8779...  0.5986 sec/batch
Epoch: 13/20...  Training Step: 2500...  Training loss: 1.9198...  0.5973 sec/batch
Epoch: 13/20...  Training Step: 2501...  Training loss: 1.8997...  0.6013 sec/batch
Epoch: 13/20...  Training Step: 2502...  Training loss: 1.8472...  0.5978 sec/batch
Epoch: 13/20...  Training Step: 2503...  Training loss: 1.9129...  0.6082 sec/batch
Epoch: 13/20...  Training Step: 2504...  Training loss: 1.8996...  0.5956 sec/batch
Epoch: 13/20...  Training Step: 2505...  Training loss: 1.8981...  0.5974 sec/batch
Epoch: 13/20...  Training Step: 2506...  Training loss: 1.9056...  0.6021 sec/batch
Epoch: 13/20...  Training Step: 2507...  Training loss: 1.8796...  0.5987 sec/batch
Epoch: 13/20...  Training Step: 2508...  Training loss: 1.8778...  0.6121 sec/batch
Epoch: 13/20...  Training Step: 2509...  Training loss: 1.9088...  0.6089 sec/batch
Epoch: 13/20...  Training Step: 2510...  Training loss: 1.9026...  0.5983 sec/batch
Epoch: 13/20...  Training Step: 2511...  Training loss: 1.9133...  0.6016 sec/batch
Epoch: 13/20...  Training Step: 2512...  Training loss: 1.9016...  0.6113 sec/batch
Epoch: 13/20...  Training Step: 2513...  Training loss: 1.9156...  0.6022 sec/batch
Epoch: 13/20...  Training Step: 2514...  Training loss: 1.9143...  0.6066 sec/batch
Epoch: 13/20...  Training Step: 2515...  Training loss: 1.9373...  0.6271 sec/batch
Epoch: 13/20...  Training Step: 2516...  Training loss: 1.8898...  0.5977 sec/batch
Epoch: 13/20...  Training Step: 2517...  Training loss: 1.9342...  0.6009 sec/batch
Epoch: 13/20...  Training Step: 2518...  Training loss: 1.8889...  0.6047 sec/batch
Epoch: 13/20...  Training Step: 2519...  Training loss: 1.8982...  0.6093 sec/batch
Epoch: 13/20...  Training Step: 2520...  Training loss: 1.9112...  0.6065 sec/batch
Epoch: 13/20...  Training Step: 2521...  Training loss: 1.8969...  0.6081 sec/batch
Epoch: 13/20...  Training Step: 2522...  Training loss: 1.9117...  0.6016 sec/batch
Epoch: 13/20...  Training Step: 2523...  Training loss: 1.9080...  0.5973 sec/batch
Epoch: 13/20...  Training Step: 2524...  Training loss: 1.9372...  0.5938 sec/batch
Epoch: 13/20...  Training Step: 2525...  Training loss: 1.9068...  0.6014 sec/batch
Epoch: 13/20...  Training Step: 2526...  Training loss: 1.8906...  0.6092 sec/batch
Epoch: 13/20...  Training Step: 2527...  Training loss: 1.8836...  0.5957 sec/batch
Epoch: 13/20...  Training Step: 2528...  Training loss: 1.9201...  0.5990 sec/batch
Epoch: 13/20...  Training Step: 2529...  Training loss: 1.9095...  0.5973 sec/batch
Epoch: 13/20...  Training Step: 2530...  Training loss: 1.9187...  0.6056 sec/batch
Epoch: 13/20...  Training Step: 2531...  Training loss: 1.8930...  0.5967 sec/batch
Epoch: 13/20...  Training Step: 2532...  Training loss: 1.8949...  0.5973 sec/batch
Epoch: 13/20...  Training Step: 2533...  Training loss: 1.9072...  0.5964 sec/batch
Epoch: 13/20...  Training Step: 2534...  Training loss: 1.9101...  0.6029 sec/batch
Epoch: 13/20...  Training Step: 2535...  Training loss: 1.8525...  0.6125 sec/batch
Epoch: 13/20...  Training Step: 2536...  Training loss: 1.9157...  0.7515 sec/batch
Epoch: 13/20...  Training Step: 2537...  Training loss: 1.9207...  0.6017 sec/batch
Epoch: 13/20...  Training Step: 2538...  Training loss: 1.8875...  0.6073 sec/batch
Epoch: 13/20...  Training Step: 2539...  Training loss: 1.9074...  0.6120 sec/batch
Epoch: 13/20...  Training Step: 2540...  Training loss: 1.9063...  0.6017 sec/batch
Epoch: 13/20...  Training Step: 2541...  Training loss: 1.8994...  0.5969 sec/batch
Epoch: 13/20...  Training Step: 2542...  Training loss: 1.9009...  0.6008 sec/batch
Epoch: 13/20...  Training Step: 2543...  Training loss: 1.9113...  0.5972 sec/batch
Epoch: 13/20...  Training Step: 2544...  Training loss: 1.9396...  0.6066 sec/batch
Epoch: 13/20...  Training Step: 2545...  Training loss: 1.8942...  0.6060 sec/batch
Epoch: 13/20...  Training Step: 2546...  Training loss: 1.8871...  0.6046 sec/batch
Epoch: 13/20...  Training Step: 2547...  Training loss: 1.8863...  0.6077 sec/batch
Epoch: 13/20...  Training Step: 2548...  Training loss: 1.8867...  0.6087 sec/batch
Epoch: 13/20...  Training Step: 2549...  Training loss: 1.9256...  0.5972 sec/batch
Epoch: 13/20...  Training Step: 2550...  Training loss: 1.9103...  0.5989 sec/batch
Epoch: 13/20...  Training Step: 2551...  Training loss: 1.9147...  0.6089 sec/batch
Epoch: 13/20...  Training Step: 2552...  Training loss: 1.8905...  0.6098 sec/batch
Epoch: 13/20...  Training Step: 2553...  Training loss: 1.8838...  0.6100 sec/batch
Epoch: 13/20...  Training Step: 2554...  Training loss: 1.9075...  0.6089 sec/batch
Epoch: 13/20...  Training Step: 2555...  Training loss: 1.8773...  0.5994 sec/batch
Epoch: 13/20...  Training Step: 2556...  Training loss: 1.8591...  0.5913 sec/batch
Epoch: 13/20...  Training Step: 2557...  Training loss: 1.8723...  0.6016 sec/batch
Epoch: 13/20...  Training Step: 2558...  Training loss: 1.9039...  0.6021 sec/batch
Epoch: 13/20...  Training Step: 2559...  Training loss: 1.8828...  0.6010 sec/batch
Epoch: 13/20...  Training Step: 2560...  Training loss: 1.9148...  0.6029 sec/batch
Epoch: 13/20...  Training Step: 2561...  Training loss: 1.9020...  0.6179 sec/batch
Epoch: 13/20...  Training Step: 2562...  Training loss: 1.8853...  0.6110 sec/batch
Epoch: 13/20...  Training Step: 2563...  Training loss: 1.9047...  0.5975 sec/batch
Epoch: 13/20...  Training Step: 2564...  Training loss: 1.8765...  0.6105 sec/batch
Epoch: 13/20...  Training Step: 2565...  Training loss: 1.8960...  0.6062 sec/batch
Epoch: 13/20...  Training Step: 2566...  Training loss: 1.8921...  0.6046 sec/batch
Epoch: 13/20...  Training Step: 2567...  Training loss: 1.9044...  0.6086 sec/batch
Epoch: 13/20...  Training Step: 2568...  Training loss: 1.8663...  0.5972 sec/batch
Epoch: 13/20...  Training Step: 2569...  Training loss: 1.8843...  0.5928 sec/batch
Epoch: 13/20...  Training Step: 2570...  Training loss: 1.8751...  0.6083 sec/batch
Epoch: 13/20...  Training Step: 2571...  Training loss: 1.8594...  0.5804 sec/batch
Epoch: 13/20...  Training Step: 2572...  Training loss: 1.8911...  0.6059 sec/batch
Epoch: 13/20...  Training Step: 2573...  Training loss: 1.8875...  0.6005 sec/batch
Epoch: 13/20...  Training Step: 2574...  Training loss: 1.8768...  0.6014 sec/batch
Epoch: 14/20...  Training Step: 2575...  Training loss: 1.9658...  0.6072 sec/batch
Epoch: 14/20...  Training Step: 2576...  Training loss: 1.8790...  0.6277 sec/batch
Epoch: 14/20...  Training Step: 2577...  Training loss: 1.8655...  0.6032 sec/batch
Epoch: 14/20...  Training Step: 2578...  Training loss: 1.8861...  0.6002 sec/batch
Epoch: 14/20...  Training Step: 2579...  Training loss: 1.8794...  0.6004 sec/batch
Epoch: 14/20...  Training Step: 2580...  Training loss: 1.8632...  0.6074 sec/batch
Epoch: 14/20...  Training Step: 2581...  Training loss: 1.8904...  0.6008 sec/batch
Epoch: 14/20...  Training Step: 2582...  Training loss: 1.8870...  0.6064 sec/batch
Epoch: 14/20...  Training Step: 2583...  Training loss: 1.9301...  0.6016 sec/batch
Epoch: 14/20...  Training Step: 2584...  Training loss: 1.8774...  0.6229 sec/batch
Epoch: 14/20...  Training Step: 2585...  Training loss: 1.8584...  0.5971 sec/batch
Epoch: 14/20...  Training Step: 2586...  Training loss: 1.8683...  0.6080 sec/batch
Epoch: 14/20...  Training Step: 2587...  Training loss: 1.8917...  0.6034 sec/batch
Epoch: 14/20...  Training Step: 2588...  Training loss: 1.9311...  0.5897 sec/batch
Epoch: 14/20...  Training Step: 2589...  Training loss: 1.8846...  0.6032 sec/batch
Epoch: 14/20...  Training Step: 2590...  Training loss: 1.8658...  0.6034 sec/batch
Epoch: 14/20...  Training Step: 2591...  Training loss: 1.8963...  0.6072 sec/batch
Epoch: 14/20...  Training Step: 2592...  Training loss: 1.9283...  0.6016 sec/batch
Epoch: 14/20...  Training Step: 2593...  Training loss: 1.8839...  0.6000 sec/batch
Epoch: 14/20...  Training Step: 2594...  Training loss: 1.9058...  0.6186 sec/batch
Epoch: 14/20...  Training Step: 2595...  Training loss: 1.8788...  0.6157 sec/batch
Epoch: 14/20...  Training Step: 2596...  Training loss: 1.9217...  0.5905 sec/batch
Epoch: 14/20...  Training Step: 2597...  Training loss: 1.8918...  0.6141 sec/batch
Epoch: 14/20...  Training Step: 2598...  Training loss: 1.8889...  0.5978 sec/batch
Epoch: 14/20...  Training Step: 2599...  Training loss: 1.8884...  0.6030 sec/batch
Epoch: 14/20...  Training Step: 2600...  Training loss: 1.8625...  0.6079 sec/batch
Epoch: 14/20...  Training Step: 2601...  Training loss: 1.8751...  0.5649 sec/batch
Epoch: 14/20...  Training Step: 2602...  Training loss: 1.9087...  0.6037 sec/batch
Epoch: 14/20...  Training Step: 2603...  Training loss: 1.9241...  0.6069 sec/batch
Epoch: 14/20...  Training Step: 2604...  Training loss: 1.9048...  0.6054 sec/batch
Epoch: 14/20...  Training Step: 2605...  Training loss: 1.8952...  0.6022 sec/batch
Epoch: 14/20...  Training Step: 2606...  Training loss: 1.8708...  0.5989 sec/batch
Epoch: 14/20...  Training Step: 2607...  Training loss: 1.9067...  0.5995 sec/batch
Epoch: 14/20...  Training Step: 2608...  Training loss: 1.9056...  0.5846 sec/batch
Epoch: 14/20...  Training Step: 2609...  Training loss: 1.8625...  0.6053 sec/batch
Epoch: 14/20...  Training Step: 2610...  Training loss: 1.8990...  0.6118 sec/batch
Epoch: 14/20...  Training Step: 2611...  Training loss: 1.8726...  0.6055 sec/batch
Epoch: 14/20...  Training Step: 2612...  Training loss: 1.8501...  0.5960 sec/batch
Epoch: 14/20...  Training Step: 2613...  Training loss: 1.8503...  0.6381 sec/batch
Epoch: 14/20...  Training Step: 2614...  Training loss: 1.8541...  0.6041 sec/batch
Epoch: 14/20...  Training Step: 2615...  Training loss: 1.8821...  0.5908 sec/batch
Epoch: 14/20...  Training Step: 2616...  Training loss: 1.8994...  0.5866 sec/batch
Epoch: 14/20...  Training Step: 2617...  Training loss: 1.8714...  0.6011 sec/batch
Epoch: 14/20...  Training Step: 2618...  Training loss: 1.8464...  0.6062 sec/batch
Epoch: 14/20...  Training Step: 2619...  Training loss: 1.9008...  0.5958 sec/batch
Epoch: 14/20...  Training Step: 2620...  Training loss: 1.8306...  0.6043 sec/batch
Epoch: 14/20...  Training Step: 2621...  Training loss: 1.8895...  0.6072 sec/batch
Epoch: 14/20...  Training Step: 2622...  Training loss: 1.8698...  0.5835 sec/batch
Epoch: 14/20...  Training Step: 2623...  Training loss: 1.8744...  0.5937 sec/batch
Epoch: 14/20...  Training Step: 2624...  Training loss: 1.9319...  0.6079 sec/batch
Epoch: 14/20...  Training Step: 2625...  Training loss: 1.8644...  0.5992 sec/batch
Epoch: 14/20...  Training Step: 2626...  Training loss: 1.9335...  0.6111 sec/batch
Epoch: 14/20...  Training Step: 2627...  Training loss: 1.8852...  0.5987 sec/batch
Epoch: 14/20...  Training Step: 2628...  Training loss: 1.8735...  0.6067 sec/batch
Epoch: 14/20...  Training Step: 2629...  Training loss: 1.8700...  0.6071 sec/batch
Epoch: 14/20...  Training Step: 2630...  Training loss: 1.8956...  0.6098 sec/batch
Epoch: 14/20...  Training Step: 2631...  Training loss: 1.8973...  0.6111 sec/batch
Epoch: 14/20...  Training Step: 2632...  Training loss: 1.8786...  0.6063 sec/batch
Epoch: 14/20...  Training Step: 2633...  Training loss: 1.8802...  0.6012 sec/batch
Epoch: 14/20...  Training Step: 2634...  Training loss: 1.9188...  0.6059 sec/batch
Epoch: 14/20...  Training Step: 2635...  Training loss: 1.8908...  0.5987 sec/batch
Epoch: 14/20...  Training Step: 2636...  Training loss: 1.9352...  0.6052 sec/batch
Epoch: 14/20...  Training Step: 2637...  Training loss: 1.9132...  0.6039 sec/batch
Epoch: 14/20...  Training Step: 2638...  Training loss: 1.8921...  0.5997 sec/batch
Epoch: 14/20...  Training Step: 2639...  Training loss: 1.8764...  0.6179 sec/batch
Epoch: 14/20...  Training Step: 2640...  Training loss: 1.9090...  0.6075 sec/batch
Epoch: 14/20...  Training Step: 2641...  Training loss: 1.9012...  0.6025 sec/batch
Epoch: 14/20...  Training Step: 2642...  Training loss: 1.8628...  0.6070 sec/batch
Epoch: 14/20...  Training Step: 2643...  Training loss: 1.8881...  0.6138 sec/batch
Epoch: 14/20...  Training Step: 2644...  Training loss: 1.8859...  0.6059 sec/batch
Epoch: 14/20...  Training Step: 2645...  Training loss: 1.9163...  0.6065 sec/batch
Epoch: 14/20...  Training Step: 2646...  Training loss: 1.9015...  0.6023 sec/batch
Epoch: 14/20...  Training Step: 2647...  Training loss: 1.9128...  0.5978 sec/batch
Epoch: 14/20...  Training Step: 2648...  Training loss: 1.8750...  0.5908 sec/batch
Epoch: 14/20...  Training Step: 2649...  Training loss: 1.8857...  0.6079 sec/batch
Epoch: 14/20...  Training Step: 2650...  Training loss: 1.9094...  0.5999 sec/batch
Epoch: 14/20...  Training Step: 2651...  Training loss: 1.8884...  0.5826 sec/batch
Epoch: 14/20...  Training Step: 2652...  Training loss: 1.8923...  0.5834 sec/batch
Epoch: 14/20...  Training Step: 2653...  Training loss: 1.8472...  0.6033 sec/batch
Epoch: 14/20...  Training Step: 2654...  Training loss: 1.8776...  0.6027 sec/batch
Epoch: 14/20...  Training Step: 2655...  Training loss: 1.8412...  0.5880 sec/batch
Epoch: 14/20...  Training Step: 2656...  Training loss: 1.8944...  0.6045 sec/batch
Epoch: 14/20...  Training Step: 2657...  Training loss: 1.8566...  0.5999 sec/batch
Epoch: 14/20...  Training Step: 2658...  Training loss: 1.8780...  0.6112 sec/batch
Epoch: 14/20...  Training Step: 2659...  Training loss: 1.8500...  0.6134 sec/batch
Epoch: 14/20...  Training Step: 2660...  Training loss: 1.8573...  0.5993 sec/batch
Epoch: 14/20...  Training Step: 2661...  Training loss: 1.8665...  0.5988 sec/batch
Epoch: 14/20...  Training Step: 2662...  Training loss: 1.8590...  0.6068 sec/batch
Epoch: 14/20...  Training Step: 2663...  Training loss: 1.8398...  0.6368 sec/batch
Epoch: 14/20...  Training Step: 2664...  Training loss: 1.8830...  0.6035 sec/batch
Epoch: 14/20...  Training Step: 2665...  Training loss: 1.8547...  0.5944 sec/batch
Epoch: 14/20...  Training Step: 2666...  Training loss: 1.8741...  0.6099 sec/batch
Epoch: 14/20...  Training Step: 2667...  Training loss: 1.8565...  0.6020 sec/batch
Epoch: 14/20...  Training Step: 2668...  Training loss: 1.8639...  0.6142 sec/batch
Epoch: 14/20...  Training Step: 2669...  Training loss: 1.8580...  0.6024 sec/batch
Epoch: 14/20...  Training Step: 2670...  Training loss: 1.8767...  0.6100 sec/batch
Epoch: 14/20...  Training Step: 2671...  Training loss: 1.8835...  0.6012 sec/batch
Epoch: 14/20...  Training Step: 2672...  Training loss: 1.8442...  0.6104 sec/batch
Epoch: 14/20...  Training Step: 2673...  Training loss: 1.8562...  0.5837 sec/batch
Epoch: 14/20...  Training Step: 2674...  Training loss: 1.8480...  0.5956 sec/batch
Epoch: 14/20...  Training Step: 2675...  Training loss: 1.8744...  0.6094 sec/batch
Epoch: 14/20...  Training Step: 2676...  Training loss: 1.8792...  0.6273 sec/batch
Epoch: 14/20...  Training Step: 2677...  Training loss: 1.8666...  0.6107 sec/batch
Epoch: 14/20...  Training Step: 2678...  Training loss: 1.8739...  0.5996 sec/batch
Epoch: 14/20...  Training Step: 2679...  Training loss: 1.8744...  0.6109 sec/batch
Epoch: 14/20...  Training Step: 2680...  Training loss: 1.8744...  0.6008 sec/batch
Epoch: 14/20...  Training Step: 2681...  Training loss: 1.8781...  0.5978 sec/batch
Epoch: 14/20...  Training Step: 2682...  Training loss: 1.8866...  0.6060 sec/batch
Epoch: 14/20...  Training Step: 2683...  Training loss: 1.8864...  0.5962 sec/batch
Epoch: 14/20...  Training Step: 2684...  Training loss: 1.8798...  0.6128 sec/batch
Epoch: 14/20...  Training Step: 2685...  Training loss: 1.8826...  0.6052 sec/batch
Epoch: 14/20...  Training Step: 2686...  Training loss: 1.8751...  0.6081 sec/batch
Epoch: 14/20...  Training Step: 2687...  Training loss: 1.8795...  0.6099 sec/batch
Epoch: 14/20...  Training Step: 2688...  Training loss: 1.8696...  0.6012 sec/batch
Epoch: 14/20...  Training Step: 2689...  Training loss: 1.8611...  0.6100 sec/batch
Epoch: 14/20...  Training Step: 2690...  Training loss: 1.8291...  0.6355 sec/batch
Epoch: 14/20...  Training Step: 2691...  Training loss: 1.8758...  0.6217 sec/batch
Epoch: 14/20...  Training Step: 2692...  Training loss: 1.8761...  0.6095 sec/batch
Epoch: 14/20...  Training Step: 2693...  Training loss: 1.8866...  0.6072 sec/batch
Epoch: 14/20...  Training Step: 2694...  Training loss: 1.8678...  0.6012 sec/batch
Epoch: 14/20...  Training Step: 2695...  Training loss: 1.8909...  0.6071 sec/batch
Epoch: 14/20...  Training Step: 2696...  Training loss: 1.8521...  0.5980 sec/batch
Epoch: 14/20...  Training Step: 2697...  Training loss: 1.8503...  0.6027 sec/batch
Epoch: 14/20...  Training Step: 2698...  Training loss: 1.9167...  0.6072 sec/batch
Epoch: 14/20...  Training Step: 2699...  Training loss: 1.8670...  0.6066 sec/batch
Epoch: 14/20...  Training Step: 2700...  Training loss: 1.8419...  0.6104 sec/batch
Epoch: 14/20...  Training Step: 2701...  Training loss: 1.8896...  0.6040 sec/batch
Epoch: 14/20...  Training Step: 2702...  Training loss: 1.8880...  0.5963 sec/batch
Epoch: 14/20...  Training Step: 2703...  Training loss: 1.8722...  0.5991 sec/batch
Epoch: 14/20...  Training Step: 2704...  Training loss: 1.8789...  0.6071 sec/batch
Epoch: 14/20...  Training Step: 2705...  Training loss: 1.8550...  0.6000 sec/batch
Epoch: 14/20...  Training Step: 2706...  Training loss: 1.8594...  0.6010 sec/batch
Epoch: 14/20...  Training Step: 2707...  Training loss: 1.8961...  0.6020 sec/batch
Epoch: 14/20...  Training Step: 2708...  Training loss: 1.8872...  0.5948 sec/batch
Epoch: 14/20...  Training Step: 2709...  Training loss: 1.8847...  0.6145 sec/batch
Epoch: 14/20...  Training Step: 2710...  Training loss: 1.8744...  0.6095 sec/batch
Epoch: 14/20...  Training Step: 2711...  Training loss: 1.8921...  0.6027 sec/batch
Epoch: 14/20...  Training Step: 2712...  Training loss: 1.8881...  0.6188 sec/batch
Epoch: 14/20...  Training Step: 2713...  Training loss: 1.9147...  0.5957 sec/batch
Epoch: 14/20...  Training Step: 2714...  Training loss: 1.8587...  0.6055 sec/batch
Epoch: 14/20...  Training Step: 2715...  Training loss: 1.9282...  0.6082 sec/batch
Epoch: 14/20...  Training Step: 2716...  Training loss: 1.8677...  0.6062 sec/batch
Epoch: 14/20...  Training Step: 2717...  Training loss: 1.8814...  0.6097 sec/batch
Epoch: 14/20...  Training Step: 2718...  Training loss: 1.8897...  0.6041 sec/batch
Epoch: 14/20...  Training Step: 2719...  Training loss: 1.8706...  0.6022 sec/batch
Epoch: 14/20...  Training Step: 2720...  Training loss: 1.8911...  0.6080 sec/batch
Epoch: 14/20...  Training Step: 2721...  Training loss: 1.8984...  0.6044 sec/batch
Epoch: 14/20...  Training Step: 2722...  Training loss: 1.9079...  0.6054 sec/batch
Epoch: 14/20...  Training Step: 2723...  Training loss: 1.8825...  0.6026 sec/batch
Epoch: 14/20...  Training Step: 2724...  Training loss: 1.8778...  0.5931 sec/batch
Epoch: 14/20...  Training Step: 2725...  Training loss: 1.8529...  0.6090 sec/batch
Epoch: 14/20...  Training Step: 2726...  Training loss: 1.9036...  0.6085 sec/batch
Epoch: 14/20...  Training Step: 2727...  Training loss: 1.8866...  0.6083 sec/batch
Epoch: 14/20...  Training Step: 2728...  Training loss: 1.8922...  0.5842 sec/batch
Epoch: 14/20...  Training Step: 2729...  Training loss: 1.8788...  0.6062 sec/batch
Epoch: 14/20...  Training Step: 2730...  Training loss: 1.8779...  0.6013 sec/batch
Epoch: 14/20...  Training Step: 2731...  Training loss: 1.8926...  0.6026 sec/batch
Epoch: 14/20...  Training Step: 2732...  Training loss: 1.8668...  0.6044 sec/batch
Epoch: 14/20...  Training Step: 2733...  Training loss: 1.8446...  0.6010 sec/batch
Epoch: 14/20...  Training Step: 2734...  Training loss: 1.9030...  0.6156 sec/batch
Epoch: 14/20...  Training Step: 2735...  Training loss: 1.9083...  0.6033 sec/batch
Epoch: 14/20...  Training Step: 2736...  Training loss: 1.8685...  0.6025 sec/batch
Epoch: 14/20...  Training Step: 2737...  Training loss: 1.8958...  0.6043 sec/batch
Epoch: 14/20...  Training Step: 2738...  Training loss: 1.8817...  0.6093 sec/batch
Epoch: 14/20...  Training Step: 2739...  Training loss: 1.8782...  0.6042 sec/batch
Epoch: 14/20...  Training Step: 2740...  Training loss: 1.8765...  0.5962 sec/batch
Epoch: 14/20...  Training Step: 2741...  Training loss: 1.8875...  0.6073 sec/batch
Epoch: 14/20...  Training Step: 2742...  Training loss: 1.9353...  0.6123 sec/batch
Epoch: 14/20...  Training Step: 2743...  Training loss: 1.8775...  0.5946 sec/batch
Epoch: 14/20...  Training Step: 2744...  Training loss: 1.8711...  0.6035 sec/batch
Epoch: 14/20...  Training Step: 2745...  Training loss: 1.8599...  0.5889 sec/batch
Epoch: 14/20...  Training Step: 2746...  Training loss: 1.8567...  0.6045 sec/batch
Epoch: 14/20...  Training Step: 2747...  Training loss: 1.8974...  0.5995 sec/batch
Epoch: 14/20...  Training Step: 2748...  Training loss: 1.8806...  0.6025 sec/batch
Epoch: 14/20...  Training Step: 2749...  Training loss: 1.8761...  0.6027 sec/batch
Epoch: 14/20...  Training Step: 2750...  Training loss: 1.8706...  0.6118 sec/batch
Epoch: 14/20...  Training Step: 2751...  Training loss: 1.8597...  0.6048 sec/batch
Epoch: 14/20...  Training Step: 2752...  Training loss: 1.8845...  0.6022 sec/batch
Epoch: 14/20...  Training Step: 2753...  Training loss: 1.8528...  0.6099 sec/batch
Epoch: 14/20...  Training Step: 2754...  Training loss: 1.8402...  0.6087 sec/batch
Epoch: 14/20...  Training Step: 2755...  Training loss: 1.8542...  0.6019 sec/batch
Epoch: 14/20...  Training Step: 2756...  Training loss: 1.8818...  0.6099 sec/batch
Epoch: 14/20...  Training Step: 2757...  Training loss: 1.8757...  0.6077 sec/batch
Epoch: 14/20...  Training Step: 2758...  Training loss: 1.8918...  0.5999 sec/batch
Epoch: 14/20...  Training Step: 2759...  Training loss: 1.8841...  0.6039 sec/batch
Epoch: 14/20...  Training Step: 2760...  Training loss: 1.8727...  0.6096 sec/batch
Epoch: 14/20...  Training Step: 2761...  Training loss: 1.8856...  0.5990 sec/batch
Epoch: 14/20...  Training Step: 2762...  Training loss: 1.8556...  0.6336 sec/batch
Epoch: 14/20...  Training Step: 2763...  Training loss: 1.8744...  0.5959 sec/batch
Epoch: 14/20...  Training Step: 2764...  Training loss: 1.8700...  0.6070 sec/batch
Epoch: 14/20...  Training Step: 2765...  Training loss: 1.8801...  0.6052 sec/batch
Epoch: 14/20...  Training Step: 2766...  Training loss: 1.8533...  0.5945 sec/batch
Epoch: 14/20...  Training Step: 2767...  Training loss: 1.8775...  0.6092 sec/batch
Epoch: 14/20...  Training Step: 2768...  Training loss: 1.8501...  0.6018 sec/batch
Epoch: 14/20...  Training Step: 2769...  Training loss: 1.8457...  0.6028 sec/batch
Epoch: 14/20...  Training Step: 2770...  Training loss: 1.8708...  0.6030 sec/batch
Epoch: 14/20...  Training Step: 2771...  Training loss: 1.8590...  0.5987 sec/batch
Epoch: 14/20...  Training Step: 2772...  Training loss: 1.8585...  0.6032 sec/batch
Epoch: 15/20...  Training Step: 2773...  Training loss: 1.9336...  0.6145 sec/batch
Epoch: 15/20...  Training Step: 2774...  Training loss: 1.8472...  0.6101 sec/batch
Epoch: 15/20...  Training Step: 2775...  Training loss: 1.8677...  0.6070 sec/batch
Epoch: 15/20...  Training Step: 2776...  Training loss: 1.8721...  0.6097 sec/batch
Epoch: 15/20...  Training Step: 2777...  Training loss: 1.8594...  0.6060 sec/batch
Epoch: 15/20...  Training Step: 2778...  Training loss: 1.8302...  0.6021 sec/batch
Epoch: 15/20...  Training Step: 2779...  Training loss: 1.8697...  0.6020 sec/batch
Epoch: 15/20...  Training Step: 2780...  Training loss: 1.8657...  0.5955 sec/batch
Epoch: 15/20...  Training Step: 2781...  Training loss: 1.8992...  0.6059 sec/batch
Epoch: 15/20...  Training Step: 2782...  Training loss: 1.8693...  0.6025 sec/batch
Epoch: 15/20...  Training Step: 2783...  Training loss: 1.8394...  0.6118 sec/batch
Epoch: 15/20...  Training Step: 2784...  Training loss: 1.8371...  0.6048 sec/batch
Epoch: 15/20...  Training Step: 2785...  Training loss: 1.8623...  0.6042 sec/batch
Epoch: 15/20...  Training Step: 2786...  Training loss: 1.9040...  0.6033 sec/batch
Epoch: 15/20...  Training Step: 2787...  Training loss: 1.8590...  0.6107 sec/batch
Epoch: 15/20...  Training Step: 2788...  Training loss: 1.8498...  0.6098 sec/batch
Epoch: 15/20...  Training Step: 2789...  Training loss: 1.8738...  0.6096 sec/batch
Epoch: 15/20...  Training Step: 2790...  Training loss: 1.9009...  0.5937 sec/batch
Epoch: 15/20...  Training Step: 2791...  Training loss: 1.8642...  0.6082 sec/batch
Epoch: 15/20...  Training Step: 2792...  Training loss: 1.8854...  0.6008 sec/batch
Epoch: 15/20...  Training Step: 2793...  Training loss: 1.8612...  0.6058 sec/batch
Epoch: 15/20...  Training Step: 2794...  Training loss: 1.9005...  0.6052 sec/batch
Epoch: 15/20...  Training Step: 2795...  Training loss: 1.8699...  0.6044 sec/batch
Epoch: 15/20...  Training Step: 2796...  Training loss: 1.8682...  0.6095 sec/batch
Epoch: 15/20...  Training Step: 2797...  Training loss: 1.8736...  0.6054 sec/batch
Epoch: 15/20...  Training Step: 2798...  Training loss: 1.8492...  0.6040 sec/batch
Epoch: 15/20...  Training Step: 2799...  Training loss: 1.8398...  0.5977 sec/batch
Epoch: 15/20...  Training Step: 2800...  Training loss: 1.8846...  0.6125 sec/batch
Epoch: 15/20...  Training Step: 2801...  Training loss: 1.8934...  0.6023 sec/batch
Epoch: 15/20...  Training Step: 2802...  Training loss: 1.8893...  0.6045 sec/batch
Epoch: 15/20...  Training Step: 2803...  Training loss: 1.8726...  0.6122 sec/batch
Epoch: 15/20...  Training Step: 2804...  Training loss: 1.8488...  0.5864 sec/batch
Epoch: 15/20...  Training Step: 2805...  Training loss: 1.8915...  0.6093 sec/batch
Epoch: 15/20...  Training Step: 2806...  Training loss: 1.9010...  0.6101 sec/batch
Epoch: 15/20...  Training Step: 2807...  Training loss: 1.8485...  0.6075 sec/batch
Epoch: 15/20...  Training Step: 2808...  Training loss: 1.8608...  0.6073 sec/batch
Epoch: 15/20...  Training Step: 2809...  Training loss: 1.8448...  0.6095 sec/batch
Epoch: 15/20...  Training Step: 2810...  Training loss: 1.8344...  0.6255 sec/batch
Epoch: 15/20...  Training Step: 2811...  Training loss: 1.8357...  0.6046 sec/batch
Epoch: 15/20...  Training Step: 2812...  Training loss: 1.8542...  0.5978 sec/batch
Epoch: 15/20...  Training Step: 2813...  Training loss: 1.8590...  0.6013 sec/batch
Epoch: 15/20...  Training Step: 2814...  Training loss: 1.8729...  0.5922 sec/batch
Epoch: 15/20...  Training Step: 2815...  Training loss: 1.8617...  0.6016 sec/batch
Epoch: 15/20...  Training Step: 2816...  Training loss: 1.8338...  0.5991 sec/batch
Epoch: 15/20...  Training Step: 2817...  Training loss: 1.8839...  0.6000 sec/batch
Epoch: 15/20...  Training Step: 2818...  Training loss: 1.8174...  0.6068 sec/batch
Epoch: 15/20...  Training Step: 2819...  Training loss: 1.8658...  0.6015 sec/batch
Epoch: 15/20...  Training Step: 2820...  Training loss: 1.8553...  0.5976 sec/batch
Epoch: 15/20...  Training Step: 2821...  Training loss: 1.8484...  0.6055 sec/batch
Epoch: 15/20...  Training Step: 2822...  Training loss: 1.9060...  0.5980 sec/batch
Epoch: 15/20...  Training Step: 2823...  Training loss: 1.8390...  0.6031 sec/batch
Epoch: 15/20...  Training Step: 2824...  Training loss: 1.9033...  0.5841 sec/batch
Epoch: 15/20...  Training Step: 2825...  Training loss: 1.8638...  0.6112 sec/batch
Epoch: 15/20...  Training Step: 2826...  Training loss: 1.8568...  0.6065 sec/batch
Epoch: 15/20...  Training Step: 2827...  Training loss: 1.8455...  0.6134 sec/batch
Epoch: 15/20...  Training Step: 2828...  Training loss: 1.8671...  0.6038 sec/batch
Epoch: 15/20...  Training Step: 2829...  Training loss: 1.8941...  0.6007 sec/batch
Epoch: 15/20...  Training Step: 2830...  Training loss: 1.8507...  0.6059 sec/batch
Epoch: 15/20...  Training Step: 2831...  Training loss: 1.8459...  0.6010 sec/batch
Epoch: 15/20...  Training Step: 2832...  Training loss: 1.8849...  0.6091 sec/batch
Epoch: 15/20...  Training Step: 2833...  Training loss: 1.8717...  0.5995 sec/batch
Epoch: 15/20...  Training Step: 2834...  Training loss: 1.9247...  0.6096 sec/batch
Epoch: 15/20...  Training Step: 2835...  Training loss: 1.9020...  0.6050 sec/batch
Epoch: 15/20...  Training Step: 2836...  Training loss: 1.8824...  0.6071 sec/batch
Epoch: 15/20...  Training Step: 2837...  Training loss: 1.8596...  0.5959 sec/batch
Epoch: 15/20...  Training Step: 2838...  Training loss: 1.8892...  0.6094 sec/batch
Epoch: 15/20...  Training Step: 2839...  Training loss: 1.8822...  0.6053 sec/batch
Epoch: 15/20...  Training Step: 2840...  Training loss: 1.8480...  0.5981 sec/batch
Epoch: 15/20...  Training Step: 2841...  Training loss: 1.8509...  0.5839 sec/batch
Epoch: 15/20...  Training Step: 2842...  Training loss: 1.8504...  0.6044 sec/batch
Epoch: 15/20...  Training Step: 2843...  Training loss: 1.8950...  0.6092 sec/batch
Epoch: 15/20...  Training Step: 2844...  Training loss: 1.8830...  0.6081 sec/batch
Epoch: 15/20...  Training Step: 2845...  Training loss: 1.8887...  0.6076 sec/batch
Epoch: 15/20...  Training Step: 2846...  Training loss: 1.8598...  0.5890 sec/batch
Epoch: 15/20...  Training Step: 2847...  Training loss: 1.8616...  0.6106 sec/batch
Epoch: 15/20...  Training Step: 2848...  Training loss: 1.8894...  0.6171 sec/batch
Epoch: 15/20...  Training Step: 2849...  Training loss: 1.8686...  0.6038 sec/batch
Epoch: 15/20...  Training Step: 2850...  Training loss: 1.8713...  0.6098 sec/batch
Epoch: 15/20...  Training Step: 2851...  Training loss: 1.8319...  0.6022 sec/batch
Epoch: 15/20...  Training Step: 2852...  Training loss: 1.8473...  0.6060 sec/batch
Epoch: 15/20...  Training Step: 2853...  Training loss: 1.8249...  0.6168 sec/batch
Epoch: 15/20...  Training Step: 2854...  Training loss: 1.8829...  0.6082 sec/batch
Epoch: 15/20...  Training Step: 2855...  Training loss: 1.8302...  0.5841 sec/batch
Epoch: 15/20...  Training Step: 2856...  Training loss: 1.8603...  0.6096 sec/batch
Epoch: 15/20...  Training Step: 2857...  Training loss: 1.8419...  0.6099 sec/batch
Epoch: 15/20...  Training Step: 2858...  Training loss: 1.8560...  0.6001 sec/batch
Epoch: 15/20...  Training Step: 2859...  Training loss: 1.8510...  0.5988 sec/batch
Epoch: 15/20...  Training Step: 2860...  Training loss: 1.8397...  0.6361 sec/batch
Epoch: 15/20...  Training Step: 2861...  Training loss: 1.8251...  0.6130 sec/batch
Epoch: 15/20...  Training Step: 2862...  Training loss: 1.8787...  0.5987 sec/batch
Epoch: 15/20...  Training Step: 2863...  Training loss: 1.8376...  0.6104 sec/batch
Epoch: 15/20...  Training Step: 2864...  Training loss: 1.8560...  0.6035 sec/batch
Epoch: 15/20...  Training Step: 2865...  Training loss: 1.8380...  0.6090 sec/batch
Epoch: 15/20...  Training Step: 2866...  Training loss: 1.8375...  0.6093 sec/batch
Epoch: 15/20...  Training Step: 2867...  Training loss: 1.8361...  0.6062 sec/batch
Epoch: 15/20...  Training Step: 2868...  Training loss: 1.8611...  0.6093 sec/batch
Epoch: 15/20...  Training Step: 2869...  Training loss: 1.8623...  0.6045 sec/batch
Epoch: 15/20...  Training Step: 2870...  Training loss: 1.8306...  0.6033 sec/batch
Epoch: 15/20...  Training Step: 2871...  Training loss: 1.8406...  0.5976 sec/batch
Epoch: 15/20...  Training Step: 2872...  Training loss: 1.8293...  0.6041 sec/batch
Epoch: 15/20...  Training Step: 2873...  Training loss: 1.8712...  0.6185 sec/batch
Epoch: 15/20...  Training Step: 2874...  Training loss: 1.8528...  0.6042 sec/batch
Epoch: 15/20...  Training Step: 2875...  Training loss: 1.8512...  0.5961 sec/batch
Epoch: 15/20...  Training Step: 2876...  Training loss: 1.8417...  0.6109 sec/batch
Epoch: 15/20...  Training Step: 2877...  Training loss: 1.8577...  0.6080 sec/batch
Epoch: 15/20...  Training Step: 2878...  Training loss: 1.8556...  0.6026 sec/batch
Epoch: 15/20...  Training Step: 2879...  Training loss: 1.8582...  0.6054 sec/batch
Epoch: 15/20...  Training Step: 2880...  Training loss: 1.8595...  0.6167 sec/batch
Epoch: 15/20...  Training Step: 2881...  Training loss: 1.8724...  0.6205 sec/batch
Epoch: 15/20...  Training Step: 2882...  Training loss: 1.8560...  0.6057 sec/batch
Epoch: 15/20...  Training Step: 2883...  Training loss: 1.8571...  0.6116 sec/batch
Epoch: 15/20...  Training Step: 2884...  Training loss: 1.8510...  0.6062 sec/batch
Epoch: 15/20...  Training Step: 2885...  Training loss: 1.8571...  0.5972 sec/batch
Epoch: 15/20...  Training Step: 2886...  Training loss: 1.8351...  0.6097 sec/batch
Epoch: 15/20...  Training Step: 2887...  Training loss: 1.8371...  0.6045 sec/batch
Epoch: 15/20...  Training Step: 2888...  Training loss: 1.8179...  0.6031 sec/batch
Epoch: 15/20...  Training Step: 2889...  Training loss: 1.8678...  0.6003 sec/batch
Epoch: 15/20...  Training Step: 2890...  Training loss: 1.8460...  0.6059 sec/batch
Epoch: 15/20...  Training Step: 2891...  Training loss: 1.8644...  0.6110 sec/batch
Epoch: 15/20...  Training Step: 2892...  Training loss: 1.8561...  0.5986 sec/batch
Epoch: 15/20...  Training Step: 2893...  Training loss: 1.8725...  0.6110 sec/batch
Epoch: 15/20...  Training Step: 2894...  Training loss: 1.8250...  0.6028 sec/batch
Epoch: 15/20...  Training Step: 2895...  Training loss: 1.8214...  0.5807 sec/batch
Epoch: 15/20...  Training Step: 2896...  Training loss: 1.8859...  0.6096 sec/batch
Epoch: 15/20...  Training Step: 2897...  Training loss: 1.8388...  0.5977 sec/batch
Epoch: 15/20...  Training Step: 2898...  Training loss: 1.8102...  0.5992 sec/batch
Epoch: 15/20...  Training Step: 2899...  Training loss: 1.8638...  0.6304 sec/batch
Epoch: 15/20...  Training Step: 2900...  Training loss: 1.8733...  0.6020 sec/batch
Epoch: 15/20...  Training Step: 2901...  Training loss: 1.8588...  0.6114 sec/batch
Epoch: 15/20...  Training Step: 2902...  Training loss: 1.8603...  0.5820 sec/batch
Epoch: 15/20...  Training Step: 2903...  Training loss: 1.8376...  0.6079 sec/batch
Epoch: 15/20...  Training Step: 2904...  Training loss: 1.8224...  0.6082 sec/batch
Epoch: 15/20...  Training Step: 2905...  Training loss: 1.8686...  0.5957 sec/batch
Epoch: 15/20...  Training Step: 2906...  Training loss: 1.8730...  0.6028 sec/batch
Epoch: 15/20...  Training Step: 2907...  Training loss: 1.8576...  0.6047 sec/batch
Epoch: 15/20...  Training Step: 2908...  Training loss: 1.8667...  0.6071 sec/batch
Epoch: 15/20...  Training Step: 2909...  Training loss: 1.8766...  0.6383 sec/batch
Epoch: 15/20...  Training Step: 2910...  Training loss: 1.8727...  0.6082 sec/batch
Epoch: 15/20...  Training Step: 2911...  Training loss: 1.8872...  0.6047 sec/batch
Epoch: 15/20...  Training Step: 2912...  Training loss: 1.8431...  0.6030 sec/batch
Epoch: 15/20...  Training Step: 2913...  Training loss: 1.8989...  0.6014 sec/batch
Epoch: 15/20...  Training Step: 2914...  Training loss: 1.8454...  0.6211 sec/batch
Epoch: 15/20...  Training Step: 2915...  Training loss: 1.8654...  0.6000 sec/batch
Epoch: 15/20...  Training Step: 2916...  Training loss: 1.8638...  0.5987 sec/batch
Epoch: 15/20...  Training Step: 2917...  Training loss: 1.8434...  0.6009 sec/batch
Epoch: 15/20...  Training Step: 2918...  Training loss: 1.8589...  0.6067 sec/batch
Epoch: 15/20...  Training Step: 2919...  Training loss: 1.8814...  0.6043 sec/batch
Epoch: 15/20...  Training Step: 2920...  Training loss: 1.8969...  0.6099 sec/batch
Epoch: 15/20...  Training Step: 2921...  Training loss: 1.8705...  0.6081 sec/batch
Epoch: 15/20...  Training Step: 2922...  Training loss: 1.8488...  0.6083 sec/batch
Epoch: 15/20...  Training Step: 2923...  Training loss: 1.8353...  0.6039 sec/batch
Epoch: 15/20...  Training Step: 2924...  Training loss: 1.8820...  0.6062 sec/batch
Epoch: 15/20...  Training Step: 2925...  Training loss: 1.8626...  0.6038 sec/batch
Epoch: 15/20...  Training Step: 2926...  Training loss: 1.8734...  0.6038 sec/batch
Epoch: 15/20...  Training Step: 2927...  Training loss: 1.8527...  0.6012 sec/batch
Epoch: 15/20...  Training Step: 2928...  Training loss: 1.8510...  0.6103 sec/batch
Epoch: 15/20...  Training Step: 2929...  Training loss: 1.8715...  0.6127 sec/batch
Epoch: 15/20...  Training Step: 2930...  Training loss: 1.8641...  0.6009 sec/batch
Epoch: 15/20...  Training Step: 2931...  Training loss: 1.8218...  0.6076 sec/batch
Epoch: 15/20...  Training Step: 2932...  Training loss: 1.8925...  0.6046 sec/batch
Epoch: 15/20...  Training Step: 2933...  Training loss: 1.8908...  0.6120 sec/batch
Epoch: 15/20...  Training Step: 2934...  Training loss: 1.8462...  0.6074 sec/batch
Epoch: 15/20...  Training Step: 2935...  Training loss: 1.8752...  0.6019 sec/batch
Epoch: 15/20...  Training Step: 2936...  Training loss: 1.8581...  0.6110 sec/batch
Epoch: 15/20...  Training Step: 2937...  Training loss: 1.8627...  0.6148 sec/batch
Epoch: 15/20...  Training Step: 2938...  Training loss: 1.8579...  0.6088 sec/batch
Epoch: 15/20...  Training Step: 2939...  Training loss: 1.8709...  0.6159 sec/batch
Epoch: 15/20...  Training Step: 2940...  Training loss: 1.9285...  0.6083 sec/batch
Epoch: 15/20...  Training Step: 2941...  Training loss: 1.8461...  0.6104 sec/batch
Epoch: 15/20...  Training Step: 2942...  Training loss: 1.8413...  0.5942 sec/batch
Epoch: 15/20...  Training Step: 2943...  Training loss: 1.8516...  0.6059 sec/batch
Epoch: 15/20...  Training Step: 2944...  Training loss: 1.8421...  0.6017 sec/batch
Epoch: 15/20...  Training Step: 2945...  Training loss: 1.8670...  0.6083 sec/batch
Epoch: 15/20...  Training Step: 2946...  Training loss: 1.8586...  0.5989 sec/batch
Epoch: 15/20...  Training Step: 2947...  Training loss: 1.8690...  0.6056 sec/batch
Epoch: 15/20...  Training Step: 2948...  Training loss: 1.8444...  0.6094 sec/batch
Epoch: 15/20...  Training Step: 2949...  Training loss: 1.8350...  0.6034 sec/batch
Epoch: 15/20...  Training Step: 2950...  Training loss: 1.8684...  0.6090 sec/batch
Epoch: 15/20...  Training Step: 2951...  Training loss: 1.8309...  0.5880 sec/batch
Epoch: 15/20...  Training Step: 2952...  Training loss: 1.8218...  0.5981 sec/batch
Epoch: 15/20...  Training Step: 2953...  Training loss: 1.8398...  0.6103 sec/batch
Epoch: 15/20...  Training Step: 2954...  Training loss: 1.8640...  0.5936 sec/batch
Epoch: 15/20...  Training Step: 2955...  Training loss: 1.8586...  0.5851 sec/batch
Epoch: 15/20...  Training Step: 2956...  Training loss: 1.8708...  0.6067 sec/batch
Epoch: 15/20...  Training Step: 2957...  Training loss: 1.8657...  0.6082 sec/batch
Epoch: 15/20...  Training Step: 2958...  Training loss: 1.8426...  0.6058 sec/batch
Epoch: 15/20...  Training Step: 2959...  Training loss: 1.8706...  0.6386 sec/batch
Epoch: 15/20...  Training Step: 2960...  Training loss: 1.8397...  0.6031 sec/batch
Epoch: 15/20...  Training Step: 2961...  Training loss: 1.8480...  0.5997 sec/batch
Epoch: 15/20...  Training Step: 2962...  Training loss: 1.8527...  0.6024 sec/batch
Epoch: 15/20...  Training Step: 2963...  Training loss: 1.8620...  0.5994 sec/batch
Epoch: 15/20...  Training Step: 2964...  Training loss: 1.8448...  0.6222 sec/batch
Epoch: 15/20...  Training Step: 2965...  Training loss: 1.8540...  0.6035 sec/batch
Epoch: 15/20...  Training Step: 2966...  Training loss: 1.8375...  0.5993 sec/batch
Epoch: 15/20...  Training Step: 2967...  Training loss: 1.8285...  0.6128 sec/batch
Epoch: 15/20...  Training Step: 2968...  Training loss: 1.8654...  0.5994 sec/batch
Epoch: 15/20...  Training Step: 2969...  Training loss: 1.8429...  0.5986 sec/batch
Epoch: 15/20...  Training Step: 2970...  Training loss: 1.8268...  0.6044 sec/batch
Epoch: 16/20...  Training Step: 2971...  Training loss: 1.9272...  0.6022 sec/batch
Epoch: 16/20...  Training Step: 2972...  Training loss: 1.8298...  0.5944 sec/batch
Epoch: 16/20...  Training Step: 2973...  Training loss: 1.8402...  0.6072 sec/batch
Epoch: 16/20...  Training Step: 2974...  Training loss: 1.8546...  0.6114 sec/batch
Epoch: 16/20...  Training Step: 2975...  Training loss: 1.8483...  0.6063 sec/batch
Epoch: 16/20...  Training Step: 2976...  Training loss: 1.8236...  0.6046 sec/batch
Epoch: 16/20...  Training Step: 2977...  Training loss: 1.8566...  0.6105 sec/batch
Epoch: 16/20...  Training Step: 2978...  Training loss: 1.8362...  0.6130 sec/batch
Epoch: 16/20...  Training Step: 2979...  Training loss: 1.8815...  0.5957 sec/batch
Epoch: 16/20...  Training Step: 2980...  Training loss: 1.8482...  0.6103 sec/batch
Epoch: 16/20...  Training Step: 2981...  Training loss: 1.8211...  0.5957 sec/batch
Epoch: 16/20...  Training Step: 2982...  Training loss: 1.8350...  0.5979 sec/batch
Epoch: 16/20...  Training Step: 2983...  Training loss: 1.8642...  0.6123 sec/batch
Epoch: 16/20...  Training Step: 2984...  Training loss: 1.8888...  0.6034 sec/batch
Epoch: 16/20...  Training Step: 2985...  Training loss: 1.8464...  0.6044 sec/batch
Epoch: 16/20...  Training Step: 2986...  Training loss: 1.8343...  0.6040 sec/batch
Epoch: 16/20...  Training Step: 2987...  Training loss: 1.8431...  0.6151 sec/batch
Epoch: 16/20...  Training Step: 2988...  Training loss: 1.8917...  0.6098 sec/batch
Epoch: 16/20...  Training Step: 2989...  Training loss: 1.8511...  0.6143 sec/batch
Epoch: 16/20...  Training Step: 2990...  Training loss: 1.8633...  0.6089 sec/batch
Epoch: 16/20...  Training Step: 2991...  Training loss: 1.8308...  0.6013 sec/batch
Epoch: 16/20...  Training Step: 2992...  Training loss: 1.8821...  0.6005 sec/batch
Epoch: 16/20...  Training Step: 2993...  Training loss: 1.8595...  0.6025 sec/batch
Epoch: 16/20...  Training Step: 2994...  Training loss: 1.8394...  0.5998 sec/batch
Epoch: 16/20...  Training Step: 2995...  Training loss: 1.8463...  0.6083 sec/batch
Epoch: 16/20...  Training Step: 2996...  Training loss: 1.8251...  0.6067 sec/batch
Epoch: 16/20...  Training Step: 2997...  Training loss: 1.8293...  0.6326 sec/batch
Epoch: 16/20...  Training Step: 2998...  Training loss: 1.8736...  0.6118 sec/batch
Epoch: 16/20...  Training Step: 2999...  Training loss: 1.8881...  0.6080 sec/batch
Epoch: 16/20...  Training Step: 3000...  Training loss: 1.8624...  0.6090 sec/batch
Epoch: 16/20...  Training Step: 3001...  Training loss: 1.8570...  0.5734 sec/batch
Epoch: 16/20...  Training Step: 3002...  Training loss: 1.8271...  0.6011 sec/batch
Epoch: 16/20...  Training Step: 3003...  Training loss: 1.8638...  0.6081 sec/batch
Epoch: 16/20...  Training Step: 3004...  Training loss: 1.8690...  0.5968 sec/batch
Epoch: 16/20...  Training Step: 3005...  Training loss: 1.8411...  0.6049 sec/batch
Epoch: 16/20...  Training Step: 3006...  Training loss: 1.8642...  0.6024 sec/batch
Epoch: 16/20...  Training Step: 3007...  Training loss: 1.8245...  0.6225 sec/batch
Epoch: 16/20...  Training Step: 3008...  Training loss: 1.8174...  0.6053 sec/batch
Epoch: 16/20...  Training Step: 3009...  Training loss: 1.8180...  0.6079 sec/batch
Epoch: 16/20...  Training Step: 3010...  Training loss: 1.8274...  0.6107 sec/batch
Epoch: 16/20...  Training Step: 3011...  Training loss: 1.8372...  0.6061 sec/batch
Epoch: 16/20...  Training Step: 3012...  Training loss: 1.8760...  0.6194 sec/batch
Epoch: 16/20...  Training Step: 3013...  Training loss: 1.8277...  0.6077 sec/batch
Epoch: 16/20...  Training Step: 3014...  Training loss: 1.8125...  0.6107 sec/batch
Epoch: 16/20...  Training Step: 3015...  Training loss: 1.8499...  0.5935 sec/batch
Epoch: 16/20...  Training Step: 3016...  Training loss: 1.8049...  0.6131 sec/batch
Epoch: 16/20...  Training Step: 3017...  Training loss: 1.8390...  0.6059 sec/batch
Epoch: 16/20...  Training Step: 3018...  Training loss: 1.8335...  0.6099 sec/batch
Epoch: 16/20...  Training Step: 3019...  Training loss: 1.8339...  0.6097 sec/batch
Epoch: 16/20...  Training Step: 3020...  Training loss: 1.8858...  0.6237 sec/batch
Epoch: 16/20...  Training Step: 3021...  Training loss: 1.8193...  0.6021 sec/batch
Epoch: 16/20...  Training Step: 3022...  Training loss: 1.8969...  0.6095 sec/batch
Epoch: 16/20...  Training Step: 3023...  Training loss: 1.8516...  0.6105 sec/batch
Epoch: 16/20...  Training Step: 3024...  Training loss: 1.8448...  0.5922 sec/batch
Epoch: 16/20...  Training Step: 3025...  Training loss: 1.8261...  0.5767 sec/batch
Epoch: 16/20...  Training Step: 3026...  Training loss: 1.8625...  0.6093 sec/batch
Epoch: 16/20...  Training Step: 3027...  Training loss: 1.8732...  0.6166 sec/batch
Epoch: 16/20...  Training Step: 3028...  Training loss: 1.8373...  0.6774 sec/batch
Epoch: 16/20...  Training Step: 3029...  Training loss: 1.8247...  0.7644 sec/batch
Epoch: 16/20...  Training Step: 3030...  Training loss: 1.8637...  0.6292 sec/batch
Epoch: 16/20...  Training Step: 3031...  Training loss: 1.8539...  0.6094 sec/batch
Epoch: 16/20...  Training Step: 3032...  Training loss: 1.9059...  0.6145 sec/batch
Epoch: 16/20...  Training Step: 3033...  Training loss: 1.8793...  0.6056 sec/batch
Epoch: 16/20...  Training Step: 3034...  Training loss: 1.8566...  0.6123 sec/batch
Epoch: 16/20...  Training Step: 3035...  Training loss: 1.8383...  0.6083 sec/batch
Epoch: 16/20...  Training Step: 3036...  Training loss: 1.8671...  0.6121 sec/batch
Epoch: 16/20...  Training Step: 3037...  Training loss: 1.8615...  0.6110 sec/batch
Epoch: 16/20...  Training Step: 3038...  Training loss: 1.8165...  0.5886 sec/batch
Epoch: 16/20...  Training Step: 3039...  Training loss: 1.8331...  0.6086 sec/batch
Epoch: 16/20...  Training Step: 3040...  Training loss: 1.8288...  0.5916 sec/batch
Epoch: 16/20...  Training Step: 3041...  Training loss: 1.8761...  0.6019 sec/batch
Epoch: 16/20...  Training Step: 3042...  Training loss: 1.8626...  0.6059 sec/batch
Epoch: 16/20...  Training Step: 3043...  Training loss: 1.8853...  0.6102 sec/batch
Epoch: 16/20...  Training Step: 3044...  Training loss: 1.8422...  0.6176 sec/batch
Epoch: 16/20...  Training Step: 3045...  Training loss: 1.8422...  0.6210 sec/batch
Epoch: 16/20...  Training Step: 3046...  Training loss: 1.8729...  0.6060 sec/batch
Epoch: 16/20...  Training Step: 3047...  Training loss: 1.8563...  0.6052 sec/batch
Epoch: 16/20...  Training Step: 3048...  Training loss: 1.8565...  0.6068 sec/batch
Epoch: 16/20...  Training Step: 3049...  Training loss: 1.8232...  0.5982 sec/batch
Epoch: 16/20...  Training Step: 3050...  Training loss: 1.8342...  0.6186 sec/batch
Epoch: 16/20...  Training Step: 3051...  Training loss: 1.8032...  0.6107 sec/batch
Epoch: 16/20...  Training Step: 3052...  Training loss: 1.8654...  0.6099 sec/batch
Epoch: 16/20...  Training Step: 3053...  Training loss: 1.8163...  0.6174 sec/batch
Epoch: 16/20...  Training Step: 3054...  Training loss: 1.8357...  0.6221 sec/batch
Epoch: 16/20...  Training Step: 3055...  Training loss: 1.8085...  0.6100 sec/batch
Epoch: 16/20...  Training Step: 3056...  Training loss: 1.8334...  0.6688 sec/batch
Epoch: 16/20...  Training Step: 3057...  Training loss: 1.8301...  0.6011 sec/batch
Epoch: 16/20...  Training Step: 3058...  Training loss: 1.8187...  0.6074 sec/batch
Epoch: 16/20...  Training Step: 3059...  Training loss: 1.8071...  0.5957 sec/batch
Epoch: 16/20...  Training Step: 3060...  Training loss: 1.8515...  0.6383 sec/batch
Epoch: 16/20...  Training Step: 3061...  Training loss: 1.8177...  0.6305 sec/batch
Epoch: 16/20...  Training Step: 3062...  Training loss: 1.8353...  0.6012 sec/batch
Epoch: 16/20...  Training Step: 3063...  Training loss: 1.8187...  0.6218 sec/batch
Epoch: 16/20...  Training Step: 3064...  Training loss: 1.8179...  0.6087 sec/batch
Epoch: 16/20...  Training Step: 3065...  Training loss: 1.8218...  0.6044 sec/batch
Epoch: 16/20...  Training Step: 3066...  Training loss: 1.8589...  0.6115 sec/batch
Epoch: 16/20...  Training Step: 3067...  Training loss: 1.8382...  0.6243 sec/batch
Epoch: 16/20...  Training Step: 3068...  Training loss: 1.8080...  0.6227 sec/batch
Epoch: 16/20...  Training Step: 3069...  Training loss: 1.8234...  0.5929 sec/batch
Epoch: 16/20...  Training Step: 3070...  Training loss: 1.7972...  0.5809 sec/batch
Epoch: 16/20...  Training Step: 3071...  Training loss: 1.8425...  0.6064 sec/batch
Epoch: 16/20...  Training Step: 3072...  Training loss: 1.8317...  0.5970 sec/batch
Epoch: 16/20...  Training Step: 3073...  Training loss: 1.8277...  0.6094 sec/batch
Epoch: 16/20...  Training Step: 3074...  Training loss: 1.8255...  0.6065 sec/batch
Epoch: 16/20...  Training Step: 3075...  Training loss: 1.8337...  0.6087 sec/batch
Epoch: 16/20...  Training Step: 3076...  Training loss: 1.8433...  0.6050 sec/batch
Epoch: 16/20...  Training Step: 3077...  Training loss: 1.8467...  0.6161 sec/batch
Epoch: 16/20...  Training Step: 3078...  Training loss: 1.8449...  0.5975 sec/batch
Epoch: 16/20...  Training Step: 3079...  Training loss: 1.8429...  0.6116 sec/batch
Epoch: 16/20...  Training Step: 3080...  Training loss: 1.8447...  0.6099 sec/batch
Epoch: 16/20...  Training Step: 3081...  Training loss: 1.8401...  0.6113 sec/batch
Epoch: 16/20...  Training Step: 3082...  Training loss: 1.8412...  0.6111 sec/batch
Epoch: 16/20...  Training Step: 3083...  Training loss: 1.8370...  0.6067 sec/batch
Epoch: 16/20...  Training Step: 3084...  Training loss: 1.8304...  0.6105 sec/batch
Epoch: 16/20...  Training Step: 3085...  Training loss: 1.8216...  0.5891 sec/batch
Epoch: 16/20...  Training Step: 3086...  Training loss: 1.7977...  0.6143 sec/batch
Epoch: 16/20...  Training Step: 3087...  Training loss: 1.8493...  0.6061 sec/batch
Epoch: 16/20...  Training Step: 3088...  Training loss: 1.8377...  0.6124 sec/batch
Epoch: 16/20...  Training Step: 3089...  Training loss: 1.8347...  0.6159 sec/batch
Epoch: 16/20...  Training Step: 3090...  Training loss: 1.8370...  0.6102 sec/batch
Epoch: 16/20...  Training Step: 3091...  Training loss: 1.8566...  0.6047 sec/batch
Epoch: 16/20...  Training Step: 3092...  Training loss: 1.8179...  0.6053 sec/batch
Epoch: 16/20...  Training Step: 3093...  Training loss: 1.7995...  0.6191 sec/batch
Epoch: 16/20...  Training Step: 3094...  Training loss: 1.8609...  0.5953 sec/batch
Epoch: 16/20...  Training Step: 3095...  Training loss: 1.8272...  0.5941 sec/batch
Epoch: 16/20...  Training Step: 3096...  Training loss: 1.8014...  0.6051 sec/batch
Epoch: 16/20...  Training Step: 3097...  Training loss: 1.8550...  0.6082 sec/batch
Epoch: 16/20...  Training Step: 3098...  Training loss: 1.8528...  0.6103 sec/batch
Epoch: 16/20...  Training Step: 3099...  Training loss: 1.8360...  0.6031 sec/batch
Epoch: 16/20...  Training Step: 3100...  Training loss: 1.8232...  0.6088 sec/batch
Epoch: 16/20...  Training Step: 3101...  Training loss: 1.8159...  0.6088 sec/batch
Epoch: 16/20...  Training Step: 3102...  Training loss: 1.8096...  0.6337 sec/batch
Epoch: 16/20...  Training Step: 3103...  Training loss: 1.8488...  0.6140 sec/batch
Epoch: 16/20...  Training Step: 3104...  Training loss: 1.8476...  0.5960 sec/batch
Epoch: 16/20...  Training Step: 3105...  Training loss: 1.8442...  0.6276 sec/batch
Epoch: 16/20...  Training Step: 3106...  Training loss: 1.8497...  0.6045 sec/batch
Epoch: 16/20...  Training Step: 3107...  Training loss: 1.8556...  0.6144 sec/batch
Epoch: 16/20...  Training Step: 3108...  Training loss: 1.8470...  0.6123 sec/batch
Epoch: 16/20...  Training Step: 3109...  Training loss: 1.8641...  0.5981 sec/batch
Epoch: 16/20...  Training Step: 3110...  Training loss: 1.8256...  0.6145 sec/batch
Epoch: 16/20...  Training Step: 3111...  Training loss: 1.8862...  0.5998 sec/batch
Epoch: 16/20...  Training Step: 3112...  Training loss: 1.8382...  0.6149 sec/batch
Epoch: 16/20...  Training Step: 3113...  Training loss: 1.8341...  0.5991 sec/batch
Epoch: 16/20...  Training Step: 3114...  Training loss: 1.8417...  0.6111 sec/batch
Epoch: 16/20...  Training Step: 3115...  Training loss: 1.8400...  0.6054 sec/batch
Epoch: 16/20...  Training Step: 3116...  Training loss: 1.8499...  0.6065 sec/batch
Epoch: 16/20...  Training Step: 3117...  Training loss: 1.8579...  0.6151 sec/batch
Epoch: 16/20...  Training Step: 3118...  Training loss: 1.8672...  0.6065 sec/batch
Epoch: 16/20...  Training Step: 3119...  Training loss: 1.8401...  0.6127 sec/batch
Epoch: 16/20...  Training Step: 3120...  Training loss: 1.8310...  0.6071 sec/batch
Epoch: 16/20...  Training Step: 3121...  Training loss: 1.8198...  0.6027 sec/batch
Epoch: 16/20...  Training Step: 3122...  Training loss: 1.8559...  0.6032 sec/batch
Epoch: 16/20...  Training Step: 3123...  Training loss: 1.8429...  0.6201 sec/batch
Epoch: 16/20...  Training Step: 3124...  Training loss: 1.8497...  0.6138 sec/batch
Epoch: 16/20...  Training Step: 3125...  Training loss: 1.8453...  0.6093 sec/batch
Epoch: 16/20...  Training Step: 3126...  Training loss: 1.8338...  0.6115 sec/batch
Epoch: 16/20...  Training Step: 3127...  Training loss: 1.8564...  0.6182 sec/batch
Epoch: 16/20...  Training Step: 3128...  Training loss: 1.8340...  0.6161 sec/batch
Epoch: 16/20...  Training Step: 3129...  Training loss: 1.8113...  0.6054 sec/batch
Epoch: 16/20...  Training Step: 3130...  Training loss: 1.8689...  0.6153 sec/batch
Epoch: 16/20...  Training Step: 3131...  Training loss: 1.8715...  0.6047 sec/batch
Epoch: 16/20...  Training Step: 3132...  Training loss: 1.8304...  0.6150 sec/batch
Epoch: 16/20...  Training Step: 3133...  Training loss: 1.8557...  0.6026 sec/batch
Epoch: 16/20...  Training Step: 3134...  Training loss: 1.8481...  0.6182 sec/batch
Epoch: 16/20...  Training Step: 3135...  Training loss: 1.8423...  0.6105 sec/batch
Epoch: 16/20...  Training Step: 3136...  Training loss: 1.8398...  0.5984 sec/batch
Epoch: 16/20...  Training Step: 3137...  Training loss: 1.8518...  0.6113 sec/batch
Epoch: 16/20...  Training Step: 3138...  Training loss: 1.8927...  0.6073 sec/batch
Epoch: 16/20...  Training Step: 3139...  Training loss: 1.8392...  0.6040 sec/batch
Epoch: 16/20...  Training Step: 3140...  Training loss: 1.8295...  0.6042 sec/batch
Epoch: 16/20...  Training Step: 3141...  Training loss: 1.8208...  0.6078 sec/batch
Epoch: 16/20...  Training Step: 3142...  Training loss: 1.8226...  0.6121 sec/batch
Epoch: 16/20...  Training Step: 3143...  Training loss: 1.8662...  0.6161 sec/batch
Epoch: 16/20...  Training Step: 3144...  Training loss: 1.8553...  0.5923 sec/batch
Epoch: 16/20...  Training Step: 3145...  Training loss: 1.8427...  0.6024 sec/batch
Epoch: 16/20...  Training Step: 3146...  Training loss: 1.8262...  0.6174 sec/batch
Epoch: 16/20...  Training Step: 3147...  Training loss: 1.8246...  0.5972 sec/batch
Epoch: 16/20...  Training Step: 3148...  Training loss: 1.8540...  0.6146 sec/batch
Epoch: 16/20...  Training Step: 3149...  Training loss: 1.8138...  0.6053 sec/batch
Epoch: 16/20...  Training Step: 3150...  Training loss: 1.8110...  0.6133 sec/batch
Epoch: 16/20...  Training Step: 3151...  Training loss: 1.8093...  0.6132 sec/batch
Epoch: 16/20...  Training Step: 3152...  Training loss: 1.8502...  0.6089 sec/batch
Epoch: 16/20...  Training Step: 3153...  Training loss: 1.8334...  0.6061 sec/batch
Epoch: 16/20...  Training Step: 3154...  Training loss: 1.8642...  0.6087 sec/batch
Epoch: 16/20...  Training Step: 3155...  Training loss: 1.8463...  0.6523 sec/batch
Epoch: 16/20...  Training Step: 3156...  Training loss: 1.8193...  0.6084 sec/batch
Epoch: 16/20...  Training Step: 3157...  Training loss: 1.8500...  0.6084 sec/batch
Epoch: 16/20...  Training Step: 3158...  Training loss: 1.8297...  0.5952 sec/batch
Epoch: 16/20...  Training Step: 3159...  Training loss: 1.8405...  0.6164 sec/batch
Epoch: 16/20...  Training Step: 3160...  Training loss: 1.8435...  0.6131 sec/batch
Epoch: 16/20...  Training Step: 3161...  Training loss: 1.8506...  0.6122 sec/batch
Epoch: 16/20...  Training Step: 3162...  Training loss: 1.8207...  0.6102 sec/batch
Epoch: 16/20...  Training Step: 3163...  Training loss: 1.8408...  0.6027 sec/batch
Epoch: 16/20...  Training Step: 3164...  Training loss: 1.8199...  0.6117 sec/batch
Epoch: 16/20...  Training Step: 3165...  Training loss: 1.8061...  0.6009 sec/batch
Epoch: 16/20...  Training Step: 3166...  Training loss: 1.8528...  0.6126 sec/batch
Epoch: 16/20...  Training Step: 3167...  Training loss: 1.8318...  0.5960 sec/batch
Epoch: 16/20...  Training Step: 3168...  Training loss: 1.8244...  0.6124 sec/batch
Epoch: 17/20...  Training Step: 3169...  Training loss: 1.9116...  0.6102 sec/batch
Epoch: 17/20...  Training Step: 3170...  Training loss: 1.8098...  0.6118 sec/batch
Epoch: 17/20...  Training Step: 3171...  Training loss: 1.8294...  0.6079 sec/batch
Epoch: 17/20...  Training Step: 3172...  Training loss: 1.8399...  0.6077 sec/batch
Epoch: 17/20...  Training Step: 3173...  Training loss: 1.8289...  0.6142 sec/batch
Epoch: 17/20...  Training Step: 3174...  Training loss: 1.7873...  0.6126 sec/batch
Epoch: 17/20...  Training Step: 3175...  Training loss: 1.8409...  0.6128 sec/batch
Epoch: 17/20...  Training Step: 3176...  Training loss: 1.8206...  0.6045 sec/batch
Epoch: 17/20...  Training Step: 3177...  Training loss: 1.8723...  0.6044 sec/batch
Epoch: 17/20...  Training Step: 3178...  Training loss: 1.8284...  0.6146 sec/batch
Epoch: 17/20...  Training Step: 3179...  Training loss: 1.8075...  0.6040 sec/batch
Epoch: 17/20...  Training Step: 3180...  Training loss: 1.8130...  0.6026 sec/batch
Epoch: 17/20...  Training Step: 3181...  Training loss: 1.8394...  0.6151 sec/batch
Epoch: 17/20...  Training Step: 3182...  Training loss: 1.8661...  0.6140 sec/batch
Epoch: 17/20...  Training Step: 3183...  Training loss: 1.8392...  0.6065 sec/batch
Epoch: 17/20...  Training Step: 3184...  Training loss: 1.8032...  0.6354 sec/batch
Epoch: 17/20...  Training Step: 3185...  Training loss: 1.8425...  0.6131 sec/batch
Epoch: 17/20...  Training Step: 3186...  Training loss: 1.8693...  0.6109 sec/batch
Epoch: 17/20...  Training Step: 3187...  Training loss: 1.8340...  0.6175 sec/batch
Epoch: 17/20...  Training Step: 3188...  Training loss: 1.8456...  0.6001 sec/batch
Epoch: 17/20...  Training Step: 3189...  Training loss: 1.8179...  0.6046 sec/batch
Epoch: 17/20...  Training Step: 3190...  Training loss: 1.8565...  0.6034 sec/batch
Epoch: 17/20...  Training Step: 3191...  Training loss: 1.8361...  0.6076 sec/batch
Epoch: 17/20...  Training Step: 3192...  Training loss: 1.8395...  0.6143 sec/batch
Epoch: 17/20...  Training Step: 3193...  Training loss: 1.8300...  0.6117 sec/batch
Epoch: 17/20...  Training Step: 3194...  Training loss: 1.8113...  0.6134 sec/batch
Epoch: 17/20...  Training Step: 3195...  Training loss: 1.8266...  0.6031 sec/batch
Epoch: 17/20...  Training Step: 3196...  Training loss: 1.8420...  0.5990 sec/batch
Epoch: 17/20...  Training Step: 3197...  Training loss: 1.8686...  0.6097 sec/batch
Epoch: 17/20...  Training Step: 3198...  Training loss: 1.8572...  0.6122 sec/batch
Epoch: 17/20...  Training Step: 3199...  Training loss: 1.8264...  0.6090 sec/batch
Epoch: 17/20...  Training Step: 3200...  Training loss: 1.8101...  0.6111 sec/batch
Epoch: 17/20...  Training Step: 3201...  Training loss: 1.8579...  0.6013 sec/batch
Epoch: 17/20...  Training Step: 3202...  Training loss: 1.8652...  0.6132 sec/batch
Epoch: 17/20...  Training Step: 3203...  Training loss: 1.8229...  0.6388 sec/batch
Epoch: 17/20...  Training Step: 3204...  Training loss: 1.8323...  0.6022 sec/batch
Epoch: 17/20...  Training Step: 3205...  Training loss: 1.8057...  0.6055 sec/batch
Epoch: 17/20...  Training Step: 3206...  Training loss: 1.8015...  0.6217 sec/batch
Epoch: 17/20...  Training Step: 3207...  Training loss: 1.7862...  0.5987 sec/batch
Epoch: 17/20...  Training Step: 3208...  Training loss: 1.8078...  0.6109 sec/batch
Epoch: 17/20...  Training Step: 3209...  Training loss: 1.8171...  0.6062 sec/batch
Epoch: 17/20...  Training Step: 3210...  Training loss: 1.8498...  0.6132 sec/batch
Epoch: 17/20...  Training Step: 3211...  Training loss: 1.8180...  0.6034 sec/batch
Epoch: 17/20...  Training Step: 3212...  Training loss: 1.8020...  0.6147 sec/batch
Epoch: 17/20...  Training Step: 3213...  Training loss: 1.8408...  0.6036 sec/batch
Epoch: 17/20...  Training Step: 3214...  Training loss: 1.7844...  0.6151 sec/batch
Epoch: 17/20...  Training Step: 3215...  Training loss: 1.8239...  0.6040 sec/batch
Epoch: 17/20...  Training Step: 3216...  Training loss: 1.8145...  0.6154 sec/batch
Epoch: 17/20...  Training Step: 3217...  Training loss: 1.8177...  0.5991 sec/batch
Epoch: 17/20...  Training Step: 3218...  Training loss: 1.8741...  0.5850 sec/batch
Epoch: 17/20...  Training Step: 3219...  Training loss: 1.8101...  0.5888 sec/batch
Epoch: 17/20...  Training Step: 3220...  Training loss: 1.8820...  0.6091 sec/batch
Epoch: 17/20...  Training Step: 3221...  Training loss: 1.8222...  0.6127 sec/batch
Epoch: 17/20...  Training Step: 3222...  Training loss: 1.8316...  0.6059 sec/batch
Epoch: 17/20...  Training Step: 3223...  Training loss: 1.8100...  0.6164 sec/batch
Epoch: 17/20...  Training Step: 3224...  Training loss: 1.8372...  0.6120 sec/batch
Epoch: 17/20...  Training Step: 3225...  Training loss: 1.8507...  0.6083 sec/batch
Epoch: 17/20...  Training Step: 3226...  Training loss: 1.8202...  0.6030 sec/batch
Epoch: 17/20...  Training Step: 3227...  Training loss: 1.8135...  0.6072 sec/batch
Epoch: 17/20...  Training Step: 3228...  Training loss: 1.8400...  0.6245 sec/batch
Epoch: 17/20...  Training Step: 3229...  Training loss: 1.8327...  0.6277 sec/batch
Epoch: 17/20...  Training Step: 3230...  Training loss: 1.8811...  0.6111 sec/batch
Epoch: 17/20...  Training Step: 3231...  Training loss: 1.8615...  0.6147 sec/batch
Epoch: 17/20...  Training Step: 3232...  Training loss: 1.8345...  0.6133 sec/batch
Epoch: 17/20...  Training Step: 3233...  Training loss: 1.8206...  0.6098 sec/batch
Epoch: 17/20...  Training Step: 3234...  Training loss: 1.8523...  0.5933 sec/batch
Epoch: 17/20...  Training Step: 3235...  Training loss: 1.8449...  0.6047 sec/batch
Epoch: 17/20...  Training Step: 3236...  Training loss: 1.8065...  0.6053 sec/batch
Epoch: 17/20...  Training Step: 3237...  Training loss: 1.8249...  0.6092 sec/batch
Epoch: 17/20...  Training Step: 3238...  Training loss: 1.8163...  0.6022 sec/batch
Epoch: 17/20...  Training Step: 3239...  Training loss: 1.8614...  0.6211 sec/batch
Epoch: 17/20...  Training Step: 3240...  Training loss: 1.8516...  0.6092 sec/batch
Epoch: 17/20...  Training Step: 3241...  Training loss: 1.8694...  0.6136 sec/batch
Epoch: 17/20...  Training Step: 3242...  Training loss: 1.8275...  0.6068 sec/batch
Epoch: 17/20...  Training Step: 3243...  Training loss: 1.8280...  0.6047 sec/batch
Epoch: 17/20...  Training Step: 3244...  Training loss: 1.8590...  0.6164 sec/batch
Epoch: 17/20...  Training Step: 3245...  Training loss: 1.8325...  0.6080 sec/batch
Epoch: 17/20...  Training Step: 3246...  Training loss: 1.8363...  0.6031 sec/batch
Epoch: 17/20...  Training Step: 3247...  Training loss: 1.7984...  0.6141 sec/batch
Epoch: 17/20...  Training Step: 3248...  Training loss: 1.8184...  0.5982 sec/batch
Epoch: 17/20...  Training Step: 3249...  Training loss: 1.7847...  0.6112 sec/batch
Epoch: 17/20...  Training Step: 3250...  Training loss: 1.8452...  0.6180 sec/batch
Epoch: 17/20...  Training Step: 3251...  Training loss: 1.7870...  0.6194 sec/batch
Epoch: 17/20...  Training Step: 3252...  Training loss: 1.8198...  0.6402 sec/batch
Epoch: 17/20...  Training Step: 3253...  Training loss: 1.7963...  0.6120 sec/batch
Epoch: 17/20...  Training Step: 3254...  Training loss: 1.8137...  0.5924 sec/batch
Epoch: 17/20...  Training Step: 3255...  Training loss: 1.8160...  0.6141 sec/batch
Epoch: 17/20...  Training Step: 3256...  Training loss: 1.8110...  0.5857 sec/batch
Epoch: 17/20...  Training Step: 3257...  Training loss: 1.7834...  0.6135 sec/batch
Epoch: 17/20...  Training Step: 3258...  Training loss: 1.8399...  0.6059 sec/batch
Epoch: 17/20...  Training Step: 3259...  Training loss: 1.8003...  0.6137 sec/batch
Epoch: 17/20...  Training Step: 3260...  Training loss: 1.8215...  0.6105 sec/batch
Epoch: 17/20...  Training Step: 3261...  Training loss: 1.8009...  0.6093 sec/batch
Epoch: 17/20...  Training Step: 3262...  Training loss: 1.7998...  0.6186 sec/batch
Epoch: 17/20...  Training Step: 3263...  Training loss: 1.8046...  0.5933 sec/batch
Epoch: 17/20...  Training Step: 3264...  Training loss: 1.8400...  0.6128 sec/batch
Epoch: 17/20...  Training Step: 3265...  Training loss: 1.8310...  0.6227 sec/batch
Epoch: 17/20...  Training Step: 3266...  Training loss: 1.7991...  0.6067 sec/batch
Epoch: 17/20...  Training Step: 3267...  Training loss: 1.8080...  0.6149 sec/batch
Epoch: 17/20...  Training Step: 3268...  Training loss: 1.7834...  0.6081 sec/batch
Epoch: 17/20...  Training Step: 3269...  Training loss: 1.8371...  0.6155 sec/batch
Epoch: 17/20...  Training Step: 3270...  Training loss: 1.8204...  0.6000 sec/batch
Epoch: 17/20...  Training Step: 3271...  Training loss: 1.8155...  0.6098 sec/batch
Epoch: 17/20...  Training Step: 3272...  Training loss: 1.8058...  0.6117 sec/batch
Epoch: 17/20...  Training Step: 3273...  Training loss: 1.8143...  0.6202 sec/batch
Epoch: 17/20...  Training Step: 3274...  Training loss: 1.8186...  0.6068 sec/batch
Epoch: 17/20...  Training Step: 3275...  Training loss: 1.8315...  0.6117 sec/batch
Epoch: 17/20...  Training Step: 3276...  Training loss: 1.8260...  0.6073 sec/batch
Epoch: 17/20...  Training Step: 3277...  Training loss: 1.8374...  0.6008 sec/batch
Epoch: 17/20...  Training Step: 3278...  Training loss: 1.8395...  0.6088 sec/batch
Epoch: 17/20...  Training Step: 3279...  Training loss: 1.8068...  0.6044 sec/batch
Epoch: 17/20...  Training Step: 3280...  Training loss: 1.8180...  0.5915 sec/batch
Epoch: 17/20...  Training Step: 3281...  Training loss: 1.8253...  0.6096 sec/batch
Epoch: 17/20...  Training Step: 3282...  Training loss: 1.8142...  0.6240 sec/batch
Epoch: 17/20...  Training Step: 3283...  Training loss: 1.8034...  0.6071 sec/batch
Epoch: 17/20...  Training Step: 3284...  Training loss: 1.7892...  0.6087 sec/batch
Epoch: 17/20...  Training Step: 3285...  Training loss: 1.8313...  0.6121 sec/batch
Epoch: 17/20...  Training Step: 3286...  Training loss: 1.8204...  0.6029 sec/batch
Epoch: 17/20...  Training Step: 3287...  Training loss: 1.8277...  0.6107 sec/batch
Epoch: 17/20...  Training Step: 3288...  Training loss: 1.8203...  0.6132 sec/batch
Epoch: 17/20...  Training Step: 3289...  Training loss: 1.8354...  0.6093 sec/batch
Epoch: 17/20...  Training Step: 3290...  Training loss: 1.7987...  0.6182 sec/batch
Epoch: 17/20...  Training Step: 3291...  Training loss: 1.8057...  0.6032 sec/batch
Epoch: 17/20...  Training Step: 3292...  Training loss: 1.8426...  0.6076 sec/batch
Epoch: 17/20...  Training Step: 3293...  Training loss: 1.8117...  0.6107 sec/batch
Epoch: 17/20...  Training Step: 3294...  Training loss: 1.7864...  0.6092 sec/batch
Epoch: 17/20...  Training Step: 3295...  Training loss: 1.8428...  0.6145 sec/batch
Epoch: 17/20...  Training Step: 3296...  Training loss: 1.8427...  0.6131 sec/batch
Epoch: 17/20...  Training Step: 3297...  Training loss: 1.8211...  0.6051 sec/batch
Epoch: 17/20...  Training Step: 3298...  Training loss: 1.8114...  0.6283 sec/batch
Epoch: 17/20...  Training Step: 3299...  Training loss: 1.8036...  0.5996 sec/batch
Epoch: 17/20...  Training Step: 3300...  Training loss: 1.7998...  0.5915 sec/batch
Epoch: 17/20...  Training Step: 3301...  Training loss: 1.8378...  0.6368 sec/batch
Epoch: 17/20...  Training Step: 3302...  Training loss: 1.8298...  0.6134 sec/batch
Epoch: 17/20...  Training Step: 3303...  Training loss: 1.8257...  0.6088 sec/batch
Epoch: 17/20...  Training Step: 3304...  Training loss: 1.8395...  0.6127 sec/batch
Epoch: 17/20...  Training Step: 3305...  Training loss: 1.8407...  0.6154 sec/batch
Epoch: 17/20...  Training Step: 3306...  Training loss: 1.8248...  0.6114 sec/batch
Epoch: 17/20...  Training Step: 3307...  Training loss: 1.8504...  0.6064 sec/batch
Epoch: 17/20...  Training Step: 3308...  Training loss: 1.8211...  0.6138 sec/batch
Epoch: 17/20...  Training Step: 3309...  Training loss: 1.8698...  0.6188 sec/batch
Epoch: 17/20...  Training Step: 3310...  Training loss: 1.8201...  0.6087 sec/batch
Epoch: 17/20...  Training Step: 3311...  Training loss: 1.8196...  0.6104 sec/batch
Epoch: 17/20...  Training Step: 3312...  Training loss: 1.8274...  0.6030 sec/batch
Epoch: 17/20...  Training Step: 3313...  Training loss: 1.8145...  0.6112 sec/batch
Epoch: 17/20...  Training Step: 3314...  Training loss: 1.8341...  0.6141 sec/batch
Epoch: 17/20...  Training Step: 3315...  Training loss: 1.8491...  0.5971 sec/batch
Epoch: 17/20...  Training Step: 3316...  Training loss: 1.8683...  0.6133 sec/batch
Epoch: 17/20...  Training Step: 3317...  Training loss: 1.8276...  0.6133 sec/batch
Epoch: 17/20...  Training Step: 3318...  Training loss: 1.8310...  0.6186 sec/batch
Epoch: 17/20...  Training Step: 3319...  Training loss: 1.7948...  0.6071 sec/batch
Epoch: 17/20...  Training Step: 3320...  Training loss: 1.8495...  0.6146 sec/batch
Epoch: 17/20...  Training Step: 3321...  Training loss: 1.8299...  0.5994 sec/batch
Epoch: 17/20...  Training Step: 3322...  Training loss: 1.8447...  0.6028 sec/batch
Epoch: 17/20...  Training Step: 3323...  Training loss: 1.8200...  0.6214 sec/batch
Epoch: 17/20...  Training Step: 3324...  Training loss: 1.8240...  0.6160 sec/batch
Epoch: 17/20...  Training Step: 3325...  Training loss: 1.8371...  0.6091 sec/batch
Epoch: 17/20...  Training Step: 3326...  Training loss: 1.8312...  0.6142 sec/batch
Epoch: 17/20...  Training Step: 3327...  Training loss: 1.7926...  0.6171 sec/batch
Epoch: 17/20...  Training Step: 3328...  Training loss: 1.8509...  0.6168 sec/batch
Epoch: 17/20...  Training Step: 3329...  Training loss: 1.8586...  0.6064 sec/batch
Epoch: 17/20...  Training Step: 3330...  Training loss: 1.8222...  0.6049 sec/batch
Epoch: 17/20...  Training Step: 3331...  Training loss: 1.8478...  0.6230 sec/batch
Epoch: 17/20...  Training Step: 3332...  Training loss: 1.8324...  0.6154 sec/batch
Epoch: 17/20...  Training Step: 3333...  Training loss: 1.8273...  0.6037 sec/batch
Epoch: 17/20...  Training Step: 3334...  Training loss: 1.8229...  0.6072 sec/batch
Epoch: 17/20...  Training Step: 3335...  Training loss: 1.8264...  0.6169 sec/batch
Epoch: 17/20...  Training Step: 3336...  Training loss: 1.8841...  0.6113 sec/batch
Epoch: 17/20...  Training Step: 3337...  Training loss: 1.8210...  0.6158 sec/batch
Epoch: 17/20...  Training Step: 3338...  Training loss: 1.8115...  0.6089 sec/batch
Epoch: 17/20...  Training Step: 3339...  Training loss: 1.8118...  0.6143 sec/batch
Epoch: 17/20...  Training Step: 3340...  Training loss: 1.8173...  0.6227 sec/batch
Epoch: 17/20...  Training Step: 3341...  Training loss: 1.8533...  0.6215 sec/batch
Epoch: 17/20...  Training Step: 3342...  Training loss: 1.8321...  0.6316 sec/batch
Epoch: 17/20...  Training Step: 3343...  Training loss: 1.8264...  0.6242 sec/batch
Epoch: 17/20...  Training Step: 3344...  Training loss: 1.8194...  0.6221 sec/batch
Epoch: 17/20...  Training Step: 3345...  Training loss: 1.8104...  0.5957 sec/batch
Epoch: 17/20...  Training Step: 3346...  Training loss: 1.8326...  0.6147 sec/batch
Epoch: 17/20...  Training Step: 3347...  Training loss: 1.7918...  0.6104 sec/batch
Epoch: 17/20...  Training Step: 3348...  Training loss: 1.7921...  0.6191 sec/batch
Epoch: 17/20...  Training Step: 3349...  Training loss: 1.7958...  0.6035 sec/batch
Epoch: 17/20...  Training Step: 3350...  Training loss: 1.8293...  0.6419 sec/batch
Epoch: 17/20...  Training Step: 3351...  Training loss: 1.8213...  0.5990 sec/batch
Epoch: 17/20...  Training Step: 3352...  Training loss: 1.8500...  0.6105 sec/batch
Epoch: 17/20...  Training Step: 3353...  Training loss: 1.8241...  0.6149 sec/batch
Epoch: 17/20...  Training Step: 3354...  Training loss: 1.8142...  0.6040 sec/batch
Epoch: 17/20...  Training Step: 3355...  Training loss: 1.8237...  0.6145 sec/batch
Epoch: 17/20...  Training Step: 3356...  Training loss: 1.8160...  0.6127 sec/batch
Epoch: 17/20...  Training Step: 3357...  Training loss: 1.8197...  0.6282 sec/batch
Epoch: 17/20...  Training Step: 3358...  Training loss: 1.8170...  0.6118 sec/batch
Epoch: 17/20...  Training Step: 3359...  Training loss: 1.8209...  0.6044 sec/batch
Epoch: 17/20...  Training Step: 3360...  Training loss: 1.8059...  0.6118 sec/batch
Epoch: 17/20...  Training Step: 3361...  Training loss: 1.8256...  0.6057 sec/batch
Epoch: 17/20...  Training Step: 3362...  Training loss: 1.7922...  0.6087 sec/batch
Epoch: 17/20...  Training Step: 3363...  Training loss: 1.7938...  0.6148 sec/batch
Epoch: 17/20...  Training Step: 3364...  Training loss: 1.8336...  0.6071 sec/batch
Epoch: 17/20...  Training Step: 3365...  Training loss: 1.8140...  0.6134 sec/batch
Epoch: 17/20...  Training Step: 3366...  Training loss: 1.7981...  0.6051 sec/batch
Epoch: 18/20...  Training Step: 3367...  Training loss: 1.9018...  0.6124 sec/batch
Epoch: 18/20...  Training Step: 3368...  Training loss: 1.8164...  0.6155 sec/batch
Epoch: 18/20...  Training Step: 3369...  Training loss: 1.8106...  0.6100 sec/batch
Epoch: 18/20...  Training Step: 3370...  Training loss: 1.8263...  0.6075 sec/batch
Epoch: 18/20...  Training Step: 3371...  Training loss: 1.8109...  0.6188 sec/batch
Epoch: 18/20...  Training Step: 3372...  Training loss: 1.7901...  0.6084 sec/batch
Epoch: 18/20...  Training Step: 3373...  Training loss: 1.8175...  0.6120 sec/batch
Epoch: 18/20...  Training Step: 3374...  Training loss: 1.8141...  0.6073 sec/batch
Epoch: 18/20...  Training Step: 3375...  Training loss: 1.8484...  0.5988 sec/batch
Epoch: 18/20...  Training Step: 3376...  Training loss: 1.7994...  0.6074 sec/batch
Epoch: 18/20...  Training Step: 3377...  Training loss: 1.7941...  0.6166 sec/batch
Epoch: 18/20...  Training Step: 3378...  Training loss: 1.7944...  0.6130 sec/batch
Epoch: 18/20...  Training Step: 3379...  Training loss: 1.8180...  0.6107 sec/batch
Epoch: 18/20...  Training Step: 3380...  Training loss: 1.8614...  0.6224 sec/batch
Epoch: 18/20...  Training Step: 3381...  Training loss: 1.8151...  0.6083 sec/batch
Epoch: 18/20...  Training Step: 3382...  Training loss: 1.7898...  0.6122 sec/batch
Epoch: 18/20...  Training Step: 3383...  Training loss: 1.8208...  0.6113 sec/batch
Epoch: 18/20...  Training Step: 3384...  Training loss: 1.8508...  0.6155 sec/batch
Epoch: 18/20...  Training Step: 3385...  Training loss: 1.8172...  0.6073 sec/batch
Epoch: 18/20...  Training Step: 3386...  Training loss: 1.8297...  0.6101 sec/batch
Epoch: 18/20...  Training Step: 3387...  Training loss: 1.8043...  0.6099 sec/batch
Epoch: 18/20...  Training Step: 3388...  Training loss: 1.8491...  0.6232 sec/batch
Epoch: 18/20...  Training Step: 3389...  Training loss: 1.8121...  0.6106 sec/batch
Epoch: 18/20...  Training Step: 3390...  Training loss: 1.8191...  0.6048 sec/batch
Epoch: 18/20...  Training Step: 3391...  Training loss: 1.8110...  0.6060 sec/batch
Epoch: 18/20...  Training Step: 3392...  Training loss: 1.7864...  0.6153 sec/batch
Epoch: 18/20...  Training Step: 3393...  Training loss: 1.7895...  0.6115 sec/batch
Epoch: 18/20...  Training Step: 3394...  Training loss: 1.8343...  0.6149 sec/batch
Epoch: 18/20...  Training Step: 3395...  Training loss: 1.8483...  0.6017 sec/batch
Epoch: 18/20...  Training Step: 3396...  Training loss: 1.8384...  0.6066 sec/batch
Epoch: 18/20...  Training Step: 3397...  Training loss: 1.8200...  0.6070 sec/batch
Epoch: 18/20...  Training Step: 3398...  Training loss: 1.8047...  0.6159 sec/batch
Epoch: 18/20...  Training Step: 3399...  Training loss: 1.8342...  0.6361 sec/batch
Epoch: 18/20...  Training Step: 3400...  Training loss: 1.8321...  0.6095 sec/batch
Epoch: 18/20...  Training Step: 3401...  Training loss: 1.8096...  0.5722 sec/batch
Epoch: 18/20...  Training Step: 3402...  Training loss: 1.8167...  0.6126 sec/batch
Epoch: 18/20...  Training Step: 3403...  Training loss: 1.7882...  0.6188 sec/batch
Epoch: 18/20...  Training Step: 3404...  Training loss: 1.7930...  0.6100 sec/batch
Epoch: 18/20...  Training Step: 3405...  Training loss: 1.7803...  0.6186 sec/batch
Epoch: 18/20...  Training Step: 3406...  Training loss: 1.7902...  0.6330 sec/batch
Epoch: 18/20...  Training Step: 3407...  Training loss: 1.8014...  0.6109 sec/batch
Epoch: 18/20...  Training Step: 3408...  Training loss: 1.8321...  0.6158 sec/batch
Epoch: 18/20...  Training Step: 3409...  Training loss: 1.8025...  0.6087 sec/batch
Epoch: 18/20...  Training Step: 3410...  Training loss: 1.7802...  0.6052 sec/batch
Epoch: 18/20...  Training Step: 3411...  Training loss: 1.8192...  0.6176 sec/batch
Epoch: 18/20...  Training Step: 3412...  Training loss: 1.7715...  0.6025 sec/batch
Epoch: 18/20...  Training Step: 3413...  Training loss: 1.8138...  0.6149 sec/batch
Epoch: 18/20...  Training Step: 3414...  Training loss: 1.8089...  0.6032 sec/batch
Epoch: 18/20...  Training Step: 3415...  Training loss: 1.8033...  0.6165 sec/batch
Epoch: 18/20...  Training Step: 3416...  Training loss: 1.8533...  0.6102 sec/batch
Epoch: 18/20...  Training Step: 3417...  Training loss: 1.8015...  0.5940 sec/batch
Epoch: 18/20...  Training Step: 3418...  Training loss: 1.8660...  0.6042 sec/batch
Epoch: 18/20...  Training Step: 3419...  Training loss: 1.8192...  0.6084 sec/batch
Epoch: 18/20...  Training Step: 3420...  Training loss: 1.8170...  0.6106 sec/batch
Epoch: 18/20...  Training Step: 3421...  Training loss: 1.8018...  0.6097 sec/batch
Epoch: 18/20...  Training Step: 3422...  Training loss: 1.8300...  0.6095 sec/batch
Epoch: 18/20...  Training Step: 3423...  Training loss: 1.8333...  0.6171 sec/batch
Epoch: 18/20...  Training Step: 3424...  Training loss: 1.8052...  0.6068 sec/batch
Epoch: 18/20...  Training Step: 3425...  Training loss: 1.8026...  0.6158 sec/batch
Epoch: 18/20...  Training Step: 3426...  Training loss: 1.8401...  0.6171 sec/batch
Epoch: 18/20...  Training Step: 3427...  Training loss: 1.8249...  0.6498 sec/batch
Epoch: 18/20...  Training Step: 3428...  Training loss: 1.8725...  0.6182 sec/batch
Epoch: 18/20...  Training Step: 3429...  Training loss: 1.8417...  0.6160 sec/batch
Epoch: 18/20...  Training Step: 3430...  Training loss: 1.8254...  0.6071 sec/batch
Epoch: 18/20...  Training Step: 3431...  Training loss: 1.8131...  0.6142 sec/batch
Epoch: 18/20...  Training Step: 3432...  Training loss: 1.8430...  0.6128 sec/batch
Epoch: 18/20...  Training Step: 3433...  Training loss: 1.8345...  0.6105 sec/batch
Epoch: 18/20...  Training Step: 3434...  Training loss: 1.7900...  0.6128 sec/batch
Epoch: 18/20...  Training Step: 3435...  Training loss: 1.8121...  0.6171 sec/batch
Epoch: 18/20...  Training Step: 3436...  Training loss: 1.8070...  0.6225 sec/batch
Epoch: 18/20...  Training Step: 3437...  Training loss: 1.8514...  0.6131 sec/batch
Epoch: 18/20...  Training Step: 3438...  Training loss: 1.8321...  0.6190 sec/batch
Epoch: 18/20...  Training Step: 3439...  Training loss: 1.8431...  0.6021 sec/batch
Epoch: 18/20...  Training Step: 3440...  Training loss: 1.8107...  0.6202 sec/batch
Epoch: 18/20...  Training Step: 3441...  Training loss: 1.8147...  0.6096 sec/batch
Epoch: 18/20...  Training Step: 3442...  Training loss: 1.8357...  0.6168 sec/batch
Epoch: 18/20...  Training Step: 3443...  Training loss: 1.8124...  0.6034 sec/batch
Epoch: 18/20...  Training Step: 3444...  Training loss: 1.8248...  0.6233 sec/batch
Epoch: 18/20...  Training Step: 3445...  Training loss: 1.7796...  0.6186 sec/batch
Epoch: 18/20...  Training Step: 3446...  Training loss: 1.8010...  0.6134 sec/batch
Epoch: 18/20...  Training Step: 3447...  Training loss: 1.7753...  0.6332 sec/batch
Epoch: 18/20...  Training Step: 3448...  Training loss: 1.8292...  0.6056 sec/batch
Epoch: 18/20...  Training Step: 3449...  Training loss: 1.7777...  0.6162 sec/batch
Epoch: 18/20...  Training Step: 3450...  Training loss: 1.8162...  0.6190 sec/batch
Epoch: 18/20...  Training Step: 3451...  Training loss: 1.7821...  0.6106 sec/batch
Epoch: 18/20...  Training Step: 3452...  Training loss: 1.7964...  0.6276 sec/batch
Epoch: 18/20...  Training Step: 3453...  Training loss: 1.8011...  0.5888 sec/batch
Epoch: 18/20...  Training Step: 3454...  Training loss: 1.7940...  0.6084 sec/batch
Epoch: 18/20...  Training Step: 3455...  Training loss: 1.7702...  0.6143 sec/batch
Epoch: 18/20...  Training Step: 3456...  Training loss: 1.8279...  0.6066 sec/batch
Epoch: 18/20...  Training Step: 3457...  Training loss: 1.7930...  0.6189 sec/batch
Epoch: 18/20...  Training Step: 3458...  Training loss: 1.8003...  0.6126 sec/batch
Epoch: 18/20...  Training Step: 3459...  Training loss: 1.7879...  0.6088 sec/batch
Epoch: 18/20...  Training Step: 3460...  Training loss: 1.7896...  0.6168 sec/batch
Epoch: 18/20...  Training Step: 3461...  Training loss: 1.7833...  0.6131 sec/batch
Epoch: 18/20...  Training Step: 3462...  Training loss: 1.8110...  0.6208 sec/batch
Epoch: 18/20...  Training Step: 3463...  Training loss: 1.8152...  0.6172 sec/batch
Epoch: 18/20...  Training Step: 3464...  Training loss: 1.7727...  0.6020 sec/batch
Epoch: 18/20...  Training Step: 3465...  Training loss: 1.8057...  0.6060 sec/batch
Epoch: 18/20...  Training Step: 3466...  Training loss: 1.7758...  0.6156 sec/batch
Epoch: 18/20...  Training Step: 3467...  Training loss: 1.8317...  0.6144 sec/batch
Epoch: 18/20...  Training Step: 3468...  Training loss: 1.8019...  0.6193 sec/batch
Epoch: 18/20...  Training Step: 3469...  Training loss: 1.8008...  0.6060 sec/batch
Epoch: 18/20...  Training Step: 3470...  Training loss: 1.8020...  0.6078 sec/batch
Epoch: 18/20...  Training Step: 3471...  Training loss: 1.8018...  0.6098 sec/batch
Epoch: 18/20...  Training Step: 3472...  Training loss: 1.8052...  0.6089 sec/batch
Epoch: 18/20...  Training Step: 3473...  Training loss: 1.8110...  0.6122 sec/batch
Epoch: 18/20...  Training Step: 3474...  Training loss: 1.8078...  0.6071 sec/batch
Epoch: 18/20...  Training Step: 3475...  Training loss: 1.8177...  0.6008 sec/batch
Epoch: 18/20...  Training Step: 3476...  Training loss: 1.8149...  0.6140 sec/batch
Epoch: 18/20...  Training Step: 3477...  Training loss: 1.8050...  0.6208 sec/batch
Epoch: 18/20...  Training Step: 3478...  Training loss: 1.8096...  0.6101 sec/batch
Epoch: 18/20...  Training Step: 3479...  Training loss: 1.8151...  0.6066 sec/batch
Epoch: 18/20...  Training Step: 3480...  Training loss: 1.8049...  0.6096 sec/batch
Epoch: 18/20...  Training Step: 3481...  Training loss: 1.7885...  0.6032 sec/batch
Epoch: 18/20...  Training Step: 3482...  Training loss: 1.7711...  0.6139 sec/batch
Epoch: 18/20...  Training Step: 3483...  Training loss: 1.8223...  0.6132 sec/batch
Epoch: 18/20...  Training Step: 3484...  Training loss: 1.8019...  0.6132 sec/batch
Epoch: 18/20...  Training Step: 3485...  Training loss: 1.8133...  0.6214 sec/batch
Epoch: 18/20...  Training Step: 3486...  Training loss: 1.7975...  0.6099 sec/batch
Epoch: 18/20...  Training Step: 3487...  Training loss: 1.8211...  0.6046 sec/batch
Epoch: 18/20...  Training Step: 3488...  Training loss: 1.7823...  0.6059 sec/batch
Epoch: 18/20...  Training Step: 3489...  Training loss: 1.7828...  0.6149 sec/batch
Epoch: 18/20...  Training Step: 3490...  Training loss: 1.8334...  0.6141 sec/batch
Epoch: 18/20...  Training Step: 3491...  Training loss: 1.8024...  0.6234 sec/batch
Epoch: 18/20...  Training Step: 3492...  Training loss: 1.7644...  0.6139 sec/batch
Epoch: 18/20...  Training Step: 3493...  Training loss: 1.8237...  0.6135 sec/batch
Epoch: 18/20...  Training Step: 3494...  Training loss: 1.8195...  0.6317 sec/batch
Epoch: 18/20...  Training Step: 3495...  Training loss: 1.8124...  0.6152 sec/batch
Epoch: 18/20...  Training Step: 3496...  Training loss: 1.7926...  0.6415 sec/batch
Epoch: 18/20...  Training Step: 3497...  Training loss: 1.7945...  0.6064 sec/batch
Epoch: 18/20...  Training Step: 3498...  Training loss: 1.7695...  0.6108 sec/batch
Epoch: 18/20...  Training Step: 3499...  Training loss: 1.8153...  0.6166 sec/batch
Epoch: 18/20...  Training Step: 3500...  Training loss: 1.8177...  0.6116 sec/batch
Epoch: 18/20...  Training Step: 3501...  Training loss: 1.8091...  0.6225 sec/batch
Epoch: 18/20...  Training Step: 3502...  Training loss: 1.8138...  0.6041 sec/batch
Epoch: 18/20...  Training Step: 3503...  Training loss: 1.8271...  0.6100 sec/batch
Epoch: 18/20...  Training Step: 3504...  Training loss: 1.8188...  0.6119 sec/batch
Epoch: 18/20...  Training Step: 3505...  Training loss: 1.8404...  0.6089 sec/batch
Epoch: 18/20...  Training Step: 3506...  Training loss: 1.8066...  0.5986 sec/batch
Epoch: 18/20...  Training Step: 3507...  Training loss: 1.8660...  0.6062 sec/batch
Epoch: 18/20...  Training Step: 3508...  Training loss: 1.8010...  0.6087 sec/batch
Epoch: 18/20...  Training Step: 3509...  Training loss: 1.8071...  0.6236 sec/batch
Epoch: 18/20...  Training Step: 3510...  Training loss: 1.8138...  0.6141 sec/batch
Epoch: 18/20...  Training Step: 3511...  Training loss: 1.7971...  0.6111 sec/batch
Epoch: 18/20...  Training Step: 3512...  Training loss: 1.8188...  0.6120 sec/batch
Epoch: 18/20...  Training Step: 3513...  Training loss: 1.8368...  0.6236 sec/batch
Epoch: 18/20...  Training Step: 3514...  Training loss: 1.8465...  0.6150 sec/batch
Epoch: 18/20...  Training Step: 3515...  Training loss: 1.8169...  0.6117 sec/batch
Epoch: 18/20...  Training Step: 3516...  Training loss: 1.8144...  0.6202 sec/batch
Epoch: 18/20...  Training Step: 3517...  Training loss: 1.7849...  0.7749 sec/batch
Epoch: 18/20...  Training Step: 3518...  Training loss: 1.8263...  0.6204 sec/batch
Epoch: 18/20...  Training Step: 3519...  Training loss: 1.8232...  0.6162 sec/batch
Epoch: 18/20...  Training Step: 3520...  Training loss: 1.8224...  0.6102 sec/batch
Epoch: 18/20...  Training Step: 3521...  Training loss: 1.8068...  0.6076 sec/batch
Epoch: 18/20...  Training Step: 3522...  Training loss: 1.8060...  0.6107 sec/batch
Epoch: 18/20...  Training Step: 3523...  Training loss: 1.8279...  0.6198 sec/batch
Epoch: 18/20...  Training Step: 3524...  Training loss: 1.8020...  0.6019 sec/batch
Epoch: 18/20...  Training Step: 3525...  Training loss: 1.7642...  0.6338 sec/batch
Epoch: 18/20...  Training Step: 3526...  Training loss: 1.8299...  0.6063 sec/batch
Epoch: 18/20...  Training Step: 3527...  Training loss: 1.8376...  0.6035 sec/batch
Epoch: 18/20...  Training Step: 3528...  Training loss: 1.7970...  0.6144 sec/batch
Epoch: 18/20...  Training Step: 3529...  Training loss: 1.8270...  0.6181 sec/batch
Epoch: 18/20...  Training Step: 3530...  Training loss: 1.8246...  0.6183 sec/batch
Epoch: 18/20...  Training Step: 3531...  Training loss: 1.8070...  0.6097 sec/batch
Epoch: 18/20...  Training Step: 3532...  Training loss: 1.8085...  0.6154 sec/batch
Epoch: 18/20...  Training Step: 3533...  Training loss: 1.8149...  0.6262 sec/batch
Epoch: 18/20...  Training Step: 3534...  Training loss: 1.8762...  0.6092 sec/batch
Epoch: 18/20...  Training Step: 3535...  Training loss: 1.8200...  0.6136 sec/batch
Epoch: 18/20...  Training Step: 3536...  Training loss: 1.8014...  0.6185 sec/batch
Epoch: 18/20...  Training Step: 3537...  Training loss: 1.7990...  0.6142 sec/batch
Epoch: 18/20...  Training Step: 3538...  Training loss: 1.7889...  0.6062 sec/batch
Epoch: 18/20...  Training Step: 3539...  Training loss: 1.8412...  0.6080 sec/batch
Epoch: 18/20...  Training Step: 3540...  Training loss: 1.8220...  0.6211 sec/batch
Epoch: 18/20...  Training Step: 3541...  Training loss: 1.8114...  0.6180 sec/batch
Epoch: 18/20...  Training Step: 3542...  Training loss: 1.7988...  0.6230 sec/batch
Epoch: 18/20...  Training Step: 3543...  Training loss: 1.7890...  0.6182 sec/batch
Epoch: 18/20...  Training Step: 3544...  Training loss: 1.8062...  0.6238 sec/batch
Epoch: 18/20...  Training Step: 3545...  Training loss: 1.7972...  0.6373 sec/batch
Epoch: 18/20...  Training Step: 3546...  Training loss: 1.7768...  0.6094 sec/batch
Epoch: 18/20...  Training Step: 3547...  Training loss: 1.7756...  0.6080 sec/batch
Epoch: 18/20...  Training Step: 3548...  Training loss: 1.8074...  0.6111 sec/batch
Epoch: 18/20...  Training Step: 3549...  Training loss: 1.8049...  0.5964 sec/batch
Epoch: 18/20...  Training Step: 3550...  Training loss: 1.8274...  0.6178 sec/batch
Epoch: 18/20...  Training Step: 3551...  Training loss: 1.8079...  0.6081 sec/batch
Epoch: 18/20...  Training Step: 3552...  Training loss: 1.7945...  0.6102 sec/batch
Epoch: 18/20...  Training Step: 3553...  Training loss: 1.8114...  0.6122 sec/batch
Epoch: 18/20...  Training Step: 3554...  Training loss: 1.8009...  0.6058 sec/batch
Epoch: 18/20...  Training Step: 3555...  Training loss: 1.8129...  0.6406 sec/batch
Epoch: 18/20...  Training Step: 3556...  Training loss: 1.8108...  0.6357 sec/batch
Epoch: 18/20...  Training Step: 3557...  Training loss: 1.8163...  0.6127 sec/batch
Epoch: 18/20...  Training Step: 3558...  Training loss: 1.7883...  0.6196 sec/batch
Epoch: 18/20...  Training Step: 3559...  Training loss: 1.8035...  0.6129 sec/batch
Epoch: 18/20...  Training Step: 3560...  Training loss: 1.7854...  0.6207 sec/batch
Epoch: 18/20...  Training Step: 3561...  Training loss: 1.7865...  0.6079 sec/batch
Epoch: 18/20...  Training Step: 3562...  Training loss: 1.8226...  0.6199 sec/batch
Epoch: 18/20...  Training Step: 3563...  Training loss: 1.7951...  0.6096 sec/batch
Epoch: 18/20...  Training Step: 3564...  Training loss: 1.7886...  0.6059 sec/batch
Epoch: 19/20...  Training Step: 3565...  Training loss: 1.8814...  0.6127 sec/batch
Epoch: 19/20...  Training Step: 3566...  Training loss: 1.8008...  0.6188 sec/batch
Epoch: 19/20...  Training Step: 3567...  Training loss: 1.8072...  0.6171 sec/batch
Epoch: 19/20...  Training Step: 3568...  Training loss: 1.8054...  0.6264 sec/batch
Epoch: 19/20...  Training Step: 3569...  Training loss: 1.7916...  0.6157 sec/batch
Epoch: 19/20...  Training Step: 3570...  Training loss: 1.7743...  0.6123 sec/batch
Epoch: 19/20...  Training Step: 3571...  Training loss: 1.8034...  0.6196 sec/batch
Epoch: 19/20...  Training Step: 3572...  Training loss: 1.7906...  0.6154 sec/batch
Epoch: 19/20...  Training Step: 3573...  Training loss: 1.8358...  0.6140 sec/batch
Epoch: 19/20...  Training Step: 3574...  Training loss: 1.7922...  0.6176 sec/batch
Epoch: 19/20...  Training Step: 3575...  Training loss: 1.7725...  0.6137 sec/batch
Epoch: 19/20...  Training Step: 3576...  Training loss: 1.7827...  0.6099 sec/batch
Epoch: 19/20...  Training Step: 3577...  Training loss: 1.8071...  0.6092 sec/batch
Epoch: 19/20...  Training Step: 3578...  Training loss: 1.8494...  0.6144 sec/batch
Epoch: 19/20...  Training Step: 3579...  Training loss: 1.8049...  0.6131 sec/batch
Epoch: 19/20...  Training Step: 3580...  Training loss: 1.7913...  0.6088 sec/batch
Epoch: 19/20...  Training Step: 3581...  Training loss: 1.8114...  0.6199 sec/batch
Epoch: 19/20...  Training Step: 3582...  Training loss: 1.8248...  0.6227 sec/batch
Epoch: 19/20...  Training Step: 3583...  Training loss: 1.8046...  0.6081 sec/batch
Epoch: 19/20...  Training Step: 3584...  Training loss: 1.8131...  0.6168 sec/batch
Epoch: 19/20...  Training Step: 3585...  Training loss: 1.7871...  0.6088 sec/batch
Epoch: 19/20...  Training Step: 3586...  Training loss: 1.8307...  0.6158 sec/batch
Epoch: 19/20...  Training Step: 3587...  Training loss: 1.8047...  0.6169 sec/batch
Epoch: 19/20...  Training Step: 3588...  Training loss: 1.8045...  0.6249 sec/batch
Epoch: 19/20...  Training Step: 3589...  Training loss: 1.8001...  0.6171 sec/batch
Epoch: 19/20...  Training Step: 3590...  Training loss: 1.7787...  0.6153 sec/batch
Epoch: 19/20...  Training Step: 3591...  Training loss: 1.7805...  0.6191 sec/batch
Epoch: 19/20...  Training Step: 3592...  Training loss: 1.8209...  0.6144 sec/batch
Epoch: 19/20...  Training Step: 3593...  Training loss: 1.8321...  0.6521 sec/batch
Epoch: 19/20...  Training Step: 3594...  Training loss: 1.8186...  0.6012 sec/batch
Epoch: 19/20...  Training Step: 3595...  Training loss: 1.7908...  0.6204 sec/batch
Epoch: 19/20...  Training Step: 3596...  Training loss: 1.7865...  0.6191 sec/batch
Epoch: 19/20...  Training Step: 3597...  Training loss: 1.8084...  0.6131 sec/batch
Epoch: 19/20...  Training Step: 3598...  Training loss: 1.8232...  0.6195 sec/batch
Epoch: 19/20...  Training Step: 3599...  Training loss: 1.7919...  0.6050 sec/batch
Epoch: 19/20...  Training Step: 3600...  Training loss: 1.8057...  0.6161 sec/batch
Epoch: 19/20...  Training Step: 3601...  Training loss: 1.7894...  0.5645 sec/batch
Epoch: 19/20...  Training Step: 3602...  Training loss: 1.7668...  0.6076 sec/batch
Epoch: 19/20...  Training Step: 3603...  Training loss: 1.7670...  0.6228 sec/batch
Epoch: 19/20...  Training Step: 3604...  Training loss: 1.7752...  0.6202 sec/batch
Epoch: 19/20...  Training Step: 3605...  Training loss: 1.7893...  0.6087 sec/batch
Epoch: 19/20...  Training Step: 3606...  Training loss: 1.8215...  0.6033 sec/batch
Epoch: 19/20...  Training Step: 3607...  Training loss: 1.7893...  0.6193 sec/batch
Epoch: 19/20...  Training Step: 3608...  Training loss: 1.7671...  0.6067 sec/batch
Epoch: 19/20...  Training Step: 3609...  Training loss: 1.8112...  0.6137 sec/batch
Epoch: 19/20...  Training Step: 3610...  Training loss: 1.7615...  0.6164 sec/batch
Epoch: 19/20...  Training Step: 3611...  Training loss: 1.8058...  0.6176 sec/batch
Epoch: 19/20...  Training Step: 3612...  Training loss: 1.7878...  0.6186 sec/batch
Epoch: 19/20...  Training Step: 3613...  Training loss: 1.8040...  0.6052 sec/batch
Epoch: 19/20...  Training Step: 3614...  Training loss: 1.8484...  0.6248 sec/batch
Epoch: 19/20...  Training Step: 3615...  Training loss: 1.7805...  0.6204 sec/batch
Epoch: 19/20...  Training Step: 3616...  Training loss: 1.8439...  0.6118 sec/batch
Epoch: 19/20...  Training Step: 3617...  Training loss: 1.7964...  0.6115 sec/batch
Epoch: 19/20...  Training Step: 3618...  Training loss: 1.8020...  0.5919 sec/batch
Epoch: 19/20...  Training Step: 3619...  Training loss: 1.7893...  0.6021 sec/batch
Epoch: 19/20...  Training Step: 3620...  Training loss: 1.8079...  0.6209 sec/batch
Epoch: 19/20...  Training Step: 3621...  Training loss: 1.8240...  0.6121 sec/batch
Epoch: 19/20...  Training Step: 3622...  Training loss: 1.7903...  0.6062 sec/batch
Epoch: 19/20...  Training Step: 3623...  Training loss: 1.7889...  0.6122 sec/batch
Epoch: 19/20...  Training Step: 3624...  Training loss: 1.8305...  0.6151 sec/batch
Epoch: 19/20...  Training Step: 3625...  Training loss: 1.7963...  0.6134 sec/batch
Epoch: 19/20...  Training Step: 3626...  Training loss: 1.8579...  0.6173 sec/batch
Epoch: 19/20...  Training Step: 3627...  Training loss: 1.8220...  0.6168 sec/batch
Epoch: 19/20...  Training Step: 3628...  Training loss: 1.8143...  0.6199 sec/batch
Epoch: 19/20...  Training Step: 3629...  Training loss: 1.7886...  0.6115 sec/batch
Epoch: 19/20...  Training Step: 3630...  Training loss: 1.8228...  0.6312 sec/batch
Epoch: 19/20...  Training Step: 3631...  Training loss: 1.8179...  0.6157 sec/batch
Epoch: 19/20...  Training Step: 3632...  Training loss: 1.7795...  0.6164 sec/batch
Epoch: 19/20...  Training Step: 3633...  Training loss: 1.7843...  0.6203 sec/batch
Epoch: 19/20...  Training Step: 3634...  Training loss: 1.7908...  0.5958 sec/batch
Epoch: 19/20...  Training Step: 3635...  Training loss: 1.8318...  0.6169 sec/batch
Epoch: 19/20...  Training Step: 3636...  Training loss: 1.8078...  0.6079 sec/batch
Epoch: 19/20...  Training Step: 3637...  Training loss: 1.8259...  0.6134 sec/batch
Epoch: 19/20...  Training Step: 3638...  Training loss: 1.7885...  0.6144 sec/batch
Epoch: 19/20...  Training Step: 3639...  Training loss: 1.7933...  0.6184 sec/batch
Epoch: 19/20...  Training Step: 3640...  Training loss: 1.8196...  0.6087 sec/batch
Epoch: 19/20...  Training Step: 3641...  Training loss: 1.8172...  0.6495 sec/batch
Epoch: 19/20...  Training Step: 3642...  Training loss: 1.8015...  0.6115 sec/batch
Epoch: 19/20...  Training Step: 3643...  Training loss: 1.7673...  0.6064 sec/batch
Epoch: 19/20...  Training Step: 3644...  Training loss: 1.7917...  0.6109 sec/batch
Epoch: 19/20...  Training Step: 3645...  Training loss: 1.7685...  0.6151 sec/batch
Epoch: 19/20...  Training Step: 3646...  Training loss: 1.8020...  0.6222 sec/batch
Epoch: 19/20...  Training Step: 3647...  Training loss: 1.7542...  0.6086 sec/batch
Epoch: 19/20...  Training Step: 3648...  Training loss: 1.8057...  0.6124 sec/batch
Epoch: 19/20...  Training Step: 3649...  Training loss: 1.7631...  0.6085 sec/batch
Epoch: 19/20...  Training Step: 3650...  Training loss: 1.7904...  0.6131 sec/batch
Epoch: 19/20...  Training Step: 3651...  Training loss: 1.7853...  0.6125 sec/batch
Epoch: 19/20...  Training Step: 3652...  Training loss: 1.7701...  0.6070 sec/batch
Epoch: 19/20...  Training Step: 3653...  Training loss: 1.7675...  0.6131 sec/batch
Epoch: 19/20...  Training Step: 3654...  Training loss: 1.8172...  0.6196 sec/batch
Epoch: 19/20...  Training Step: 3655...  Training loss: 1.7786...  0.6118 sec/batch
Epoch: 19/20...  Training Step: 3656...  Training loss: 1.7895...  0.6537 sec/batch
Epoch: 19/20...  Training Step: 3657...  Training loss: 1.7722...  0.6271 sec/batch
Epoch: 19/20...  Training Step: 3658...  Training loss: 1.7794...  0.6143 sec/batch
Epoch: 19/20...  Training Step: 3659...  Training loss: 1.7806...  0.6158 sec/batch
Epoch: 19/20...  Training Step: 3660...  Training loss: 1.8006...  0.6072 sec/batch
Epoch: 19/20...  Training Step: 3661...  Training loss: 1.7892...  0.6160 sec/batch
Epoch: 19/20...  Training Step: 3662...  Training loss: 1.7693...  0.6151 sec/batch
Epoch: 19/20...  Training Step: 3663...  Training loss: 1.7933...  0.6107 sec/batch
Epoch: 19/20...  Training Step: 3664...  Training loss: 1.7597...  0.6127 sec/batch
Epoch: 19/20...  Training Step: 3665...  Training loss: 1.8046...  0.6055 sec/batch
Epoch: 19/20...  Training Step: 3666...  Training loss: 1.7875...  0.6154 sec/batch
Epoch: 19/20...  Training Step: 3667...  Training loss: 1.7827...  0.6146 sec/batch
Epoch: 19/20...  Training Step: 3668...  Training loss: 1.7830...  0.6204 sec/batch
Epoch: 19/20...  Training Step: 3669...  Training loss: 1.7882...  0.6174 sec/batch
Epoch: 19/20...  Training Step: 3670...  Training loss: 1.7976...  0.6112 sec/batch
Epoch: 19/20...  Training Step: 3671...  Training loss: 1.8083...  0.6224 sec/batch
Epoch: 19/20...  Training Step: 3672...  Training loss: 1.7989...  0.6129 sec/batch
Epoch: 19/20...  Training Step: 3673...  Training loss: 1.7996...  0.6107 sec/batch
Epoch: 19/20...  Training Step: 3674...  Training loss: 1.7990...  0.6151 sec/batch
Epoch: 19/20...  Training Step: 3675...  Training loss: 1.7958...  0.6158 sec/batch
Epoch: 19/20...  Training Step: 3676...  Training loss: 1.7941...  0.6134 sec/batch
Epoch: 19/20...  Training Step: 3677...  Training loss: 1.7943...  0.6092 sec/batch
Epoch: 19/20...  Training Step: 3678...  Training loss: 1.7713...  0.6036 sec/batch
Epoch: 19/20...  Training Step: 3679...  Training loss: 1.7844...  0.6175 sec/batch
Epoch: 19/20...  Training Step: 3680...  Training loss: 1.7626...  0.6166 sec/batch
Epoch: 19/20...  Training Step: 3681...  Training loss: 1.8161...  0.6153 sec/batch
Epoch: 19/20...  Training Step: 3682...  Training loss: 1.7932...  0.6183 sec/batch
Epoch: 19/20...  Training Step: 3683...  Training loss: 1.8140...  0.6049 sec/batch
Epoch: 19/20...  Training Step: 3684...  Training loss: 1.7878...  0.6126 sec/batch
Epoch: 19/20...  Training Step: 3685...  Training loss: 1.8125...  0.6235 sec/batch
Epoch: 19/20...  Training Step: 3686...  Training loss: 1.7650...  0.6222 sec/batch
Epoch: 19/20...  Training Step: 3687...  Training loss: 1.7694...  0.6150 sec/batch
Epoch: 19/20...  Training Step: 3688...  Training loss: 1.8116...  0.6115 sec/batch
Epoch: 19/20...  Training Step: 3689...  Training loss: 1.7970...  0.6043 sec/batch
Epoch: 19/20...  Training Step: 3690...  Training loss: 1.7489...  0.6373 sec/batch
Epoch: 19/20...  Training Step: 3691...  Training loss: 1.8180...  0.6069 sec/batch
Epoch: 19/20...  Training Step: 3692...  Training loss: 1.8222...  0.6225 sec/batch
Epoch: 19/20...  Training Step: 3693...  Training loss: 1.7850...  0.6157 sec/batch
Epoch: 19/20...  Training Step: 3694...  Training loss: 1.7742...  0.6161 sec/batch
Epoch: 19/20...  Training Step: 3695...  Training loss: 1.7730...  0.6220 sec/batch
Epoch: 19/20...  Training Step: 3696...  Training loss: 1.7694...  0.6167 sec/batch
Epoch: 19/20...  Training Step: 3697...  Training loss: 1.8149...  0.6175 sec/batch
Epoch: 19/20...  Training Step: 3698...  Training loss: 1.8095...  0.6090 sec/batch
Epoch: 19/20...  Training Step: 3699...  Training loss: 1.7974...  0.6207 sec/batch
Epoch: 19/20...  Training Step: 3700...  Training loss: 1.8018...  0.6129 sec/batch
Epoch: 19/20...  Training Step: 3701...  Training loss: 1.8083...  0.6258 sec/batch
Epoch: 19/20...  Training Step: 3702...  Training loss: 1.8067...  0.6212 sec/batch
Epoch: 19/20...  Training Step: 3703...  Training loss: 1.8245...  0.6385 sec/batch
Epoch: 19/20...  Training Step: 3704...  Training loss: 1.7846...  0.6210 sec/batch
Epoch: 19/20...  Training Step: 3705...  Training loss: 1.8367...  0.6035 sec/batch
Epoch: 19/20...  Training Step: 3706...  Training loss: 1.7935...  0.6166 sec/batch
Epoch: 19/20...  Training Step: 3707...  Training loss: 1.7945...  0.6117 sec/batch
Epoch: 19/20...  Training Step: 3708...  Training loss: 1.7995...  0.6197 sec/batch
Epoch: 19/20...  Training Step: 3709...  Training loss: 1.7892...  0.6184 sec/batch
Epoch: 19/20...  Training Step: 3710...  Training loss: 1.8109...  0.6055 sec/batch
Epoch: 19/20...  Training Step: 3711...  Training loss: 1.8166...  0.6073 sec/batch
Epoch: 19/20...  Training Step: 3712...  Training loss: 1.8373...  0.6151 sec/batch
Epoch: 19/20...  Training Step: 3713...  Training loss: 1.8174...  0.6034 sec/batch
Epoch: 19/20...  Training Step: 3714...  Training loss: 1.7931...  0.6177 sec/batch
Epoch: 19/20...  Training Step: 3715...  Training loss: 1.7655...  0.6154 sec/batch
Epoch: 19/20...  Training Step: 3716...  Training loss: 1.8099...  0.6044 sec/batch
Epoch: 19/20...  Training Step: 3717...  Training loss: 1.7976...  0.6412 sec/batch
Epoch: 19/20...  Training Step: 3718...  Training loss: 1.8226...  0.6277 sec/batch
Epoch: 19/20...  Training Step: 3719...  Training loss: 1.7930...  0.6199 sec/batch
Epoch: 19/20...  Training Step: 3720...  Training loss: 1.7853...  0.6143 sec/batch
Epoch: 19/20...  Training Step: 3721...  Training loss: 1.8012...  0.6255 sec/batch
Epoch: 19/20...  Training Step: 3722...  Training loss: 1.7908...  0.6280 sec/batch
Epoch: 19/20...  Training Step: 3723...  Training loss: 1.7596...  0.5965 sec/batch
Epoch: 19/20...  Training Step: 3724...  Training loss: 1.8200...  0.5889 sec/batch
Epoch: 19/20...  Training Step: 3725...  Training loss: 1.8288...  0.6166 sec/batch
Epoch: 19/20...  Training Step: 3726...  Training loss: 1.7905...  0.6204 sec/batch
Epoch: 19/20...  Training Step: 3727...  Training loss: 1.8115...  0.6247 sec/batch
Epoch: 19/20...  Training Step: 3728...  Training loss: 1.8013...  0.6142 sec/batch
Epoch: 19/20...  Training Step: 3729...  Training loss: 1.7908...  0.6133 sec/batch
Epoch: 19/20...  Training Step: 3730...  Training loss: 1.8017...  0.6033 sec/batch
Epoch: 19/20...  Training Step: 3731...  Training loss: 1.8079...  0.6108 sec/batch
Epoch: 19/20...  Training Step: 3732...  Training loss: 1.8626...  0.6112 sec/batch
Epoch: 19/20...  Training Step: 3733...  Training loss: 1.7921...  0.6243 sec/batch
Epoch: 19/20...  Training Step: 3734...  Training loss: 1.7975...  0.6100 sec/batch
Epoch: 19/20...  Training Step: 3735...  Training loss: 1.7755...  0.6078 sec/batch
Epoch: 19/20...  Training Step: 3736...  Training loss: 1.7748...  0.6122 sec/batch
Epoch: 19/20...  Training Step: 3737...  Training loss: 1.8249...  0.6124 sec/batch
Epoch: 19/20...  Training Step: 3738...  Training loss: 1.8145...  0.6139 sec/batch
Epoch: 19/20...  Training Step: 3739...  Training loss: 1.7860...  0.6475 sec/batch
Epoch: 19/20...  Training Step: 3740...  Training loss: 1.7825...  0.6310 sec/batch
Epoch: 19/20...  Training Step: 3741...  Training loss: 1.7825...  0.6267 sec/batch
Epoch: 19/20...  Training Step: 3742...  Training loss: 1.8109...  0.6318 sec/batch
Epoch: 19/20...  Training Step: 3743...  Training loss: 1.7722...  0.6198 sec/batch
Epoch: 19/20...  Training Step: 3744...  Training loss: 1.7752...  0.6136 sec/batch
Epoch: 19/20...  Training Step: 3745...  Training loss: 1.7719...  0.6161 sec/batch
Epoch: 19/20...  Training Step: 3746...  Training loss: 1.7954...  0.6185 sec/batch
Epoch: 19/20...  Training Step: 3747...  Training loss: 1.7985...  0.6164 sec/batch
Epoch: 19/20...  Training Step: 3748...  Training loss: 1.8007...  0.6099 sec/batch
Epoch: 19/20...  Training Step: 3749...  Training loss: 1.7909...  0.6193 sec/batch
Epoch: 19/20...  Training Step: 3750...  Training loss: 1.7837...  0.6275 sec/batch
Epoch: 19/20...  Training Step: 3751...  Training loss: 1.8037...  0.6116 sec/batch
Epoch: 19/20...  Training Step: 3752...  Training loss: 1.7873...  0.6198 sec/batch
Epoch: 19/20...  Training Step: 3753...  Training loss: 1.7898...  0.6146 sec/batch
Epoch: 19/20...  Training Step: 3754...  Training loss: 1.7967...  0.6130 sec/batch
Epoch: 19/20...  Training Step: 3755...  Training loss: 1.7855...  0.6196 sec/batch
Epoch: 19/20...  Training Step: 3756...  Training loss: 1.7739...  0.6104 sec/batch
Epoch: 19/20...  Training Step: 3757...  Training loss: 1.7845...  0.6124 sec/batch
Epoch: 19/20...  Training Step: 3758...  Training loss: 1.7689...  0.6186 sec/batch
Epoch: 19/20...  Training Step: 3759...  Training loss: 1.7591...  0.6152 sec/batch
Epoch: 19/20...  Training Step: 3760...  Training loss: 1.7985...  0.6205 sec/batch
Epoch: 19/20...  Training Step: 3761...  Training loss: 1.7910...  0.5952 sec/batch
Epoch: 19/20...  Training Step: 3762...  Training loss: 1.7824...  0.6126 sec/batch
Epoch: 20/20...  Training Step: 3763...  Training loss: 1.8515...  0.6112 sec/batch
Epoch: 20/20...  Training Step: 3764...  Training loss: 1.7801...  0.6097 sec/batch
Epoch: 20/20...  Training Step: 3765...  Training loss: 1.7823...  0.6113 sec/batch
Epoch: 20/20...  Training Step: 3766...  Training loss: 1.7958...  0.6252 sec/batch
Epoch: 20/20...  Training Step: 3767...  Training loss: 1.7774...  0.6103 sec/batch
Epoch: 20/20...  Training Step: 3768...  Training loss: 1.7569...  0.6292 sec/batch
Epoch: 20/20...  Training Step: 3769...  Training loss: 1.7905...  0.6081 sec/batch
Epoch: 20/20...  Training Step: 3770...  Training loss: 1.7835...  0.6148 sec/batch
Epoch: 20/20...  Training Step: 3771...  Training loss: 1.8144...  0.6140 sec/batch
Epoch: 20/20...  Training Step: 3772...  Training loss: 1.7852...  0.6147 sec/batch
Epoch: 20/20...  Training Step: 3773...  Training loss: 1.7612...  0.6146 sec/batch
Epoch: 20/20...  Training Step: 3774...  Training loss: 1.7691...  0.6053 sec/batch
Epoch: 20/20...  Training Step: 3775...  Training loss: 1.7900...  0.6238 sec/batch
Epoch: 20/20...  Training Step: 3776...  Training loss: 1.8289...  0.6239 sec/batch
Epoch: 20/20...  Training Step: 3777...  Training loss: 1.7800...  0.6138 sec/batch
Epoch: 20/20...  Training Step: 3778...  Training loss: 1.7645...  0.6219 sec/batch
Epoch: 20/20...  Training Step: 3779...  Training loss: 1.7914...  0.6149 sec/batch
Epoch: 20/20...  Training Step: 3780...  Training loss: 1.8225...  0.6150 sec/batch
Epoch: 20/20...  Training Step: 3781...  Training loss: 1.7841...  0.6176 sec/batch
Epoch: 20/20...  Training Step: 3782...  Training loss: 1.8049...  0.5983 sec/batch
Epoch: 20/20...  Training Step: 3783...  Training loss: 1.7673...  0.6133 sec/batch
Epoch: 20/20...  Training Step: 3784...  Training loss: 1.8114...  0.6197 sec/batch
Epoch: 20/20...  Training Step: 3785...  Training loss: 1.7926...  0.6073 sec/batch
Epoch: 20/20...  Training Step: 3786...  Training loss: 1.7876...  0.6265 sec/batch
Epoch: 20/20...  Training Step: 3787...  Training loss: 1.7862...  0.6439 sec/batch
Epoch: 20/20...  Training Step: 3788...  Training loss: 1.7629...  0.6154 sec/batch
Epoch: 20/20...  Training Step: 3789...  Training loss: 1.7749...  0.6189 sec/batch
Epoch: 20/20...  Training Step: 3790...  Training loss: 1.8011...  0.6200 sec/batch
Epoch: 20/20...  Training Step: 3791...  Training loss: 1.8197...  0.6203 sec/batch
Epoch: 20/20...  Training Step: 3792...  Training loss: 1.8244...  0.6266 sec/batch
Epoch: 20/20...  Training Step: 3793...  Training loss: 1.7904...  0.6137 sec/batch
Epoch: 20/20...  Training Step: 3794...  Training loss: 1.7748...  0.6088 sec/batch
Epoch: 20/20...  Training Step: 3795...  Training loss: 1.8053...  0.6095 sec/batch
Epoch: 20/20...  Training Step: 3796...  Training loss: 1.8164...  0.6103 sec/batch
Epoch: 20/20...  Training Step: 3797...  Training loss: 1.7789...  0.6136 sec/batch
Epoch: 20/20...  Training Step: 3798...  Training loss: 1.7896...  0.6150 sec/batch
Epoch: 20/20...  Training Step: 3799...  Training loss: 1.7671...  0.6021 sec/batch
Epoch: 20/20...  Training Step: 3800...  Training loss: 1.7643...  0.6272 sec/batch
Epoch: 20/20...  Training Step: 3801...  Training loss: 1.7423...  0.5479 sec/batch
Epoch: 20/20...  Training Step: 3802...  Training loss: 1.7689...  0.6143 sec/batch
Epoch: 20/20...  Training Step: 3803...  Training loss: 1.7712...  0.6276 sec/batch
Epoch: 20/20...  Training Step: 3804...  Training loss: 1.8088...  0.6179 sec/batch
Epoch: 20/20...  Training Step: 3805...  Training loss: 1.7678...  0.6183 sec/batch
Epoch: 20/20...  Training Step: 3806...  Training loss: 1.7515...  0.6132 sec/batch
Epoch: 20/20...  Training Step: 3807...  Training loss: 1.7912...  0.6114 sec/batch
Epoch: 20/20...  Training Step: 3808...  Training loss: 1.7504...  0.6161 sec/batch
Epoch: 20/20...  Training Step: 3809...  Training loss: 1.7900...  0.6214 sec/batch
Epoch: 20/20...  Training Step: 3810...  Training loss: 1.7692...  0.6218 sec/batch
Epoch: 20/20...  Training Step: 3811...  Training loss: 1.7765...  0.6212 sec/batch
Epoch: 20/20...  Training Step: 3812...  Training loss: 1.8242...  0.6303 sec/batch
Epoch: 20/20...  Training Step: 3813...  Training loss: 1.7582...  0.6286 sec/batch
Epoch: 20/20...  Training Step: 3814...  Training loss: 1.8332...  0.6130 sec/batch
Epoch: 20/20...  Training Step: 3815...  Training loss: 1.8021...  0.6159 sec/batch
Epoch: 20/20...  Training Step: 3816...  Training loss: 1.7929...  0.6086 sec/batch
Epoch: 20/20...  Training Step: 3817...  Training loss: 1.7758...  0.6142 sec/batch
Epoch: 20/20...  Training Step: 3818...  Training loss: 1.7973...  0.6192 sec/batch
Epoch: 20/20...  Training Step: 3819...  Training loss: 1.8116...  0.6127 sec/batch
Epoch: 20/20...  Training Step: 3820...  Training loss: 1.7752...  0.6058 sec/batch
Epoch: 20/20...  Training Step: 3821...  Training loss: 1.7790...  0.6200 sec/batch
Epoch: 20/20...  Training Step: 3822...  Training loss: 1.8185...  0.6095 sec/batch
Epoch: 20/20...  Training Step: 3823...  Training loss: 1.7805...  0.6150 sec/batch
Epoch: 20/20...  Training Step: 3824...  Training loss: 1.8501...  0.6179 sec/batch
Epoch: 20/20...  Training Step: 3825...  Training loss: 1.8153...  0.6123 sec/batch
Epoch: 20/20...  Training Step: 3826...  Training loss: 1.8064...  0.6145 sec/batch
Epoch: 20/20...  Training Step: 3827...  Training loss: 1.7878...  0.6103 sec/batch
Epoch: 20/20...  Training Step: 3828...  Training loss: 1.8092...  0.6053 sec/batch
Epoch: 20/20...  Training Step: 3829...  Training loss: 1.8100...  0.6184 sec/batch
Epoch: 20/20...  Training Step: 3830...  Training loss: 1.7704...  0.6032 sec/batch
Epoch: 20/20...  Training Step: 3831...  Training loss: 1.7816...  0.6160 sec/batch
Epoch: 20/20...  Training Step: 3832...  Training loss: 1.7739...  0.6168 sec/batch
Epoch: 20/20...  Training Step: 3833...  Training loss: 1.8304...  0.6110 sec/batch
Epoch: 20/20...  Training Step: 3834...  Training loss: 1.7997...  0.6210 sec/batch
Epoch: 20/20...  Training Step: 3835...  Training loss: 1.8234...  0.6433 sec/batch
Epoch: 20/20...  Training Step: 3836...  Training loss: 1.7823...  0.6116 sec/batch
Epoch: 20/20...  Training Step: 3837...  Training loss: 1.7813...  0.6163 sec/batch
Epoch: 20/20...  Training Step: 3838...  Training loss: 1.8018...  0.6149 sec/batch
Epoch: 20/20...  Training Step: 3839...  Training loss: 1.7912...  0.6077 sec/batch
Epoch: 20/20...  Training Step: 3840...  Training loss: 1.8004...  0.6309 sec/batch
Epoch: 20/20...  Training Step: 3841...  Training loss: 1.7494...  0.6223 sec/batch
Epoch: 20/20...  Training Step: 3842...  Training loss: 1.7739...  0.6158 sec/batch
Epoch: 20/20...  Training Step: 3843...  Training loss: 1.7541...  0.6258 sec/batch
Epoch: 20/20...  Training Step: 3844...  Training loss: 1.8007...  0.6152 sec/batch
Epoch: 20/20...  Training Step: 3845...  Training loss: 1.7567...  0.6145 sec/batch
Epoch: 20/20...  Training Step: 3846...  Training loss: 1.7832...  0.6201 sec/batch
Epoch: 20/20...  Training Step: 3847...  Training loss: 1.7506...  0.6221 sec/batch
Epoch: 20/20...  Training Step: 3848...  Training loss: 1.7829...  0.6155 sec/batch
Epoch: 20/20...  Training Step: 3849...  Training loss: 1.7718...  0.6135 sec/batch
Epoch: 20/20...  Training Step: 3850...  Training loss: 1.7642...  0.6166 sec/batch
Epoch: 20/20...  Training Step: 3851...  Training loss: 1.7419...  0.6070 sec/batch
Epoch: 20/20...  Training Step: 3852...  Training loss: 1.7947...  0.6206 sec/batch
Epoch: 20/20...  Training Step: 3853...  Training loss: 1.7608...  0.6121 sec/batch
Epoch: 20/20...  Training Step: 3854...  Training loss: 1.7592...  0.6115 sec/batch
Epoch: 20/20...  Training Step: 3855...  Training loss: 1.7629...  0.6106 sec/batch
Epoch: 20/20...  Training Step: 3856...  Training loss: 1.7584...  0.6220 sec/batch
Epoch: 20/20...  Training Step: 3857...  Training loss: 1.7616...  0.6196 sec/batch
Epoch: 20/20...  Training Step: 3858...  Training loss: 1.7979...  0.6115 sec/batch
Epoch: 20/20...  Training Step: 3859...  Training loss: 1.7800...  0.6092 sec/batch
Epoch: 20/20...  Training Step: 3860...  Training loss: 1.7506...  0.6143 sec/batch
Epoch: 20/20...  Training Step: 3861...  Training loss: 1.7680...  0.6163 sec/batch
Epoch: 20/20...  Training Step: 3862...  Training loss: 1.7495...  0.6171 sec/batch
Epoch: 20/20...  Training Step: 3863...  Training loss: 1.7970...  0.6316 sec/batch
Epoch: 20/20...  Training Step: 3864...  Training loss: 1.7689...  0.6197 sec/batch
Epoch: 20/20...  Training Step: 3865...  Training loss: 1.7685...  0.6113 sec/batch
Epoch: 20/20...  Training Step: 3866...  Training loss: 1.7703...  0.6143 sec/batch
Epoch: 20/20...  Training Step: 3867...  Training loss: 1.7816...  0.6180 sec/batch
Epoch: 20/20...  Training Step: 3868...  Training loss: 1.7845...  0.6130 sec/batch
Epoch: 20/20...  Training Step: 3869...  Training loss: 1.7886...  0.6183 sec/batch
Epoch: 20/20...  Training Step: 3870...  Training loss: 1.7886...  0.6194 sec/batch
Epoch: 20/20...  Training Step: 3871...  Training loss: 1.7888...  0.6075 sec/batch
Epoch: 20/20...  Training Step: 3872...  Training loss: 1.7943...  0.6199 sec/batch
Epoch: 20/20...  Training Step: 3873...  Training loss: 1.7765...  0.6005 sec/batch
Epoch: 20/20...  Training Step: 3874...  Training loss: 1.7781...  0.6162 sec/batch
Epoch: 20/20...  Training Step: 3875...  Training loss: 1.7798...  0.6057 sec/batch
Epoch: 20/20...  Training Step: 3876...  Training loss: 1.7716...  0.6215 sec/batch
Epoch: 20/20...  Training Step: 3877...  Training loss: 1.7563...  0.6221 sec/batch
Epoch: 20/20...  Training Step: 3878...  Training loss: 1.7370...  0.6181 sec/batch
Epoch: 20/20...  Training Step: 3879...  Training loss: 1.7891...  0.6237 sec/batch
Epoch: 20/20...  Training Step: 3880...  Training loss: 1.7723...  0.6192 sec/batch
Epoch: 20/20...  Training Step: 3881...  Training loss: 1.7888...  0.6069 sec/batch
Epoch: 20/20...  Training Step: 3882...  Training loss: 1.7799...  0.6128 sec/batch
Epoch: 20/20...  Training Step: 3883...  Training loss: 1.7816...  0.6102 sec/batch
Epoch: 20/20...  Training Step: 3884...  Training loss: 1.7621...  0.6658 sec/batch
Epoch: 20/20...  Training Step: 3885...  Training loss: 1.7474...  0.6098 sec/batch
Epoch: 20/20...  Training Step: 3886...  Training loss: 1.7901...  0.6086 sec/batch
Epoch: 20/20...  Training Step: 3887...  Training loss: 1.7829...  0.6022 sec/batch
Epoch: 20/20...  Training Step: 3888...  Training loss: 1.7251...  0.6193 sec/batch
Epoch: 20/20...  Training Step: 3889...  Training loss: 1.7961...  0.6248 sec/batch
Epoch: 20/20...  Training Step: 3890...  Training loss: 1.7974...  0.6038 sec/batch
Epoch: 20/20...  Training Step: 3891...  Training loss: 1.7854...  0.6107 sec/batch
Epoch: 20/20...  Training Step: 3892...  Training loss: 1.7670...  0.6200 sec/batch
Epoch: 20/20...  Training Step: 3893...  Training loss: 1.7637...  0.6131 sec/batch
Epoch: 20/20...  Training Step: 3894...  Training loss: 1.7594...  0.6153 sec/batch
Epoch: 20/20...  Training Step: 3895...  Training loss: 1.7972...  0.6167 sec/batch
Epoch: 20/20...  Training Step: 3896...  Training loss: 1.7873...  0.6223 sec/batch
Epoch: 20/20...  Training Step: 3897...  Training loss: 1.7943...  0.6195 sec/batch
Epoch: 20/20...  Training Step: 3898...  Training loss: 1.7811...  0.6181 sec/batch
Epoch: 20/20...  Training Step: 3899...  Training loss: 1.7986...  0.6143 sec/batch
Epoch: 20/20...  Training Step: 3900...  Training loss: 1.7901...  0.6104 sec/batch
Epoch: 20/20...  Training Step: 3901...  Training loss: 1.8132...  0.6168 sec/batch
Epoch: 20/20...  Training Step: 3902...  Training loss: 1.7733...  0.6127 sec/batch
Epoch: 20/20...  Training Step: 3903...  Training loss: 1.8358...  0.6196 sec/batch
Epoch: 20/20...  Training Step: 3904...  Training loss: 1.7679...  0.6257 sec/batch
Epoch: 20/20...  Training Step: 3905...  Training loss: 1.7887...  0.6204 sec/batch
Epoch: 20/20...  Training Step: 3906...  Training loss: 1.7967...  0.6191 sec/batch
Epoch: 20/20...  Training Step: 3907...  Training loss: 1.7771...  0.6117 sec/batch
Epoch: 20/20...  Training Step: 3908...  Training loss: 1.7908...  0.6113 sec/batch
Epoch: 20/20...  Training Step: 3909...  Training loss: 1.8077...  0.6222 sec/batch
Epoch: 20/20...  Training Step: 3910...  Training loss: 1.8180...  0.6175 sec/batch
Epoch: 20/20...  Training Step: 3911...  Training loss: 1.7897...  0.6205 sec/batch
Epoch: 20/20...  Training Step: 3912...  Training loss: 1.7872...  0.6212 sec/batch
Epoch: 20/20...  Training Step: 3913...  Training loss: 1.7518...  0.6114 sec/batch
Epoch: 20/20...  Training Step: 3914...  Training loss: 1.8085...  0.6131 sec/batch
Epoch: 20/20...  Training Step: 3915...  Training loss: 1.7891...  0.6129 sec/batch
Epoch: 20/20...  Training Step: 3916...  Training loss: 1.7968...  0.6164 sec/batch
Epoch: 20/20...  Training Step: 3917...  Training loss: 1.7775...  0.6200 sec/batch
Epoch: 20/20...  Training Step: 3918...  Training loss: 1.7705...  0.6161 sec/batch
Epoch: 20/20...  Training Step: 3919...  Training loss: 1.8075...  0.6225 sec/batch
Epoch: 20/20...  Training Step: 3920...  Training loss: 1.7836...  0.6119 sec/batch
Epoch: 20/20...  Training Step: 3921...  Training loss: 1.7485...  0.6226 sec/batch
Epoch: 20/20...  Training Step: 3922...  Training loss: 1.8083...  0.6184 sec/batch
Epoch: 20/20...  Training Step: 3923...  Training loss: 1.8151...  0.6155 sec/batch
Epoch: 20/20...  Training Step: 3924...  Training loss: 1.7819...  0.6200 sec/batch
Epoch: 20/20...  Training Step: 3925...  Training loss: 1.8003...  0.6291 sec/batch
Epoch: 20/20...  Training Step: 3926...  Training loss: 1.7910...  0.6196 sec/batch
Epoch: 20/20...  Training Step: 3927...  Training loss: 1.7808...  0.6166 sec/batch
Epoch: 20/20...  Training Step: 3928...  Training loss: 1.7889...  0.6204 sec/batch
Epoch: 20/20...  Training Step: 3929...  Training loss: 1.7878...  0.6523 sec/batch
Epoch: 20/20...  Training Step: 3930...  Training loss: 1.8385...  0.6200 sec/batch
Epoch: 20/20...  Training Step: 3931...  Training loss: 1.7828...  0.6152 sec/batch
Epoch: 20/20...  Training Step: 3932...  Training loss: 1.7751...  0.6420 sec/batch
Epoch: 20/20...  Training Step: 3933...  Training loss: 1.7720...  0.6140 sec/batch
Epoch: 20/20...  Training Step: 3934...  Training loss: 1.7651...  0.6121 sec/batch
Epoch: 20/20...  Training Step: 3935...  Training loss: 1.8025...  0.6138 sec/batch
Epoch: 20/20...  Training Step: 3936...  Training loss: 1.7928...  0.6169 sec/batch
Epoch: 20/20...  Training Step: 3937...  Training loss: 1.7982...  0.6122 sec/batch
Epoch: 20/20...  Training Step: 3938...  Training loss: 1.7693...  0.6182 sec/batch
Epoch: 20/20...  Training Step: 3939...  Training loss: 1.7800...  0.6216 sec/batch
Epoch: 20/20...  Training Step: 3940...  Training loss: 1.7850...  0.6196 sec/batch
Epoch: 20/20...  Training Step: 3941...  Training loss: 1.7645...  0.6136 sec/batch
Epoch: 20/20...  Training Step: 3942...  Training loss: 1.7580...  0.6141 sec/batch
Epoch: 20/20...  Training Step: 3943...  Training loss: 1.7586...  0.6149 sec/batch
Epoch: 20/20...  Training Step: 3944...  Training loss: 1.7766...  0.6056 sec/batch
Epoch: 20/20...  Training Step: 3945...  Training loss: 1.7733...  0.6194 sec/batch
Epoch: 20/20...  Training Step: 3946...  Training loss: 1.8002...  0.6203 sec/batch
Epoch: 20/20...  Training Step: 3947...  Training loss: 1.7798...  0.6171 sec/batch
Epoch: 20/20...  Training Step: 3948...  Training loss: 1.7667...  0.6239 sec/batch
Epoch: 20/20...  Training Step: 3949...  Training loss: 1.7897...  0.5979 sec/batch
Epoch: 20/20...  Training Step: 3950...  Training loss: 1.7720...  0.6195 sec/batch
Epoch: 20/20...  Training Step: 3951...  Training loss: 1.7908...  0.6164 sec/batch
Epoch: 20/20...  Training Step: 3952...  Training loss: 1.7860...  0.6203 sec/batch
Epoch: 20/20...  Training Step: 3953...  Training loss: 1.7760...  0.6180 sec/batch
Epoch: 20/20...  Training Step: 3954...  Training loss: 1.7569...  0.6121 sec/batch
Epoch: 20/20...  Training Step: 3955...  Training loss: 1.7865...  0.6110 sec/batch
Epoch: 20/20...  Training Step: 3956...  Training loss: 1.7645...  0.6235 sec/batch
Epoch: 20/20...  Training Step: 3957...  Training loss: 1.7662...  0.6162 sec/batch
Epoch: 20/20...  Training Step: 3958...  Training loss: 1.7780...  0.6079 sec/batch
Epoch: 20/20...  Training Step: 3959...  Training loss: 1.7740...  0.6134 sec/batch
Epoch: 20/20...  Training Step: 3960...  Training loss: 1.7588...  0.6176 sec/batch

Saved checkpoints

Read up on saving and loading checkpoints here: https://www.tensorflow.org/programmers_guide/variables
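
As a rough sketch (not the exact training loop, which appears earlier in the notebook), checkpoints like i3960_l128.ckpt are written and restored with tf.train.Saver; the counter and lstm_size names below are assumptions that match the checkpoint filenames, where the number after "i" is the training step and the number after "l" is the LSTM size.


In [ ]:
# Hedged sketch only: how checkpoints such as "checkpoints/i3960_l128.ckpt"
# could be saved during training and restored later (TensorFlow 1.x API).
saver = tf.train.Saver(max_to_keep=100)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... run training steps, incrementing `counter` ...
    saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))

# Restoring the most recent checkpoint into a new session:
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))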


In [99]:
tf.train.get_checkpoint_state('checkpoints')


Out[99]:
model_checkpoint_path: "checkpoints/i3960_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i200_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i400_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i600_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i800_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i1000_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i1200_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i1400_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i1600_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i1800_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i2000_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i2200_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i2400_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i2600_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i2800_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i3000_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i3200_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i3400_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i3600_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i3800_l128.ckpt"
all_model_checkpoint_paths: "checkpoints/i3960_l128.ckpt"

Sampling

Now that the network is trained, we can use it to generate new text. The idea is that we pass in a character and the network predicts the next one. We then feed that new character back in to predict the one after it, and keep going to generate as much text as we want. I also included some functionality to prime the network with a string, building up the hidden state from that text before sampling.

The network gives us a probability for every character in the vocabulary. To reduce noise and make the samples a little less random, I'm only going to choose the next character from the top N most likely characters.


In [100]:
def pick_top_n(preds, vocab_size, top_n=5):
    p = np.squeeze(preds)
    # Zero out everything except the top_n most likely characters
    p[np.argsort(p)[:-top_n]] = 0
    # Renormalize so the remaining probabilities sum to 1
    p = p / np.sum(p)
    # Sample a character index from the reduced distribution
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c
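
To see what pick_top_n does, here's a small illustrative example with a made-up distribution over a 5-character vocabulary (the numbers are purely hypothetical). With top_n=2, only the two most likely indices (2 and 3) can ever be drawn, renormalized to probabilities 0.625 and 0.375.


In [ ]:
# Illustrative only: a fake softmax output over a 5-character vocabulary.
fake_preds = np.array([[0.05, 0.10, 0.50, 0.30, 0.05]])
# Only indices 2 and 3 survive the top-2 filter; the result is one of them.
print(pick_top_n(fake_preds, vocab_size=5, top_n=2))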

In [101]:
def sample(checkpoint, n_samples, lstm_size, vocab_size, prime="The "):
    samples = [c for c in prime]
    # Build the network in sampling mode (batch size and step size of 1)
    model = CharRNN(vocab_size, lstm_size=lstm_size, sampling=True)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        # Restore the trained weights from the checkpoint
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)
        # Prime the network: feed in each character of the prime string
        # to build up the hidden state
        for c in prime:
            x = np.zeros((1, 1))
            x[0,0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

        # Pick the first generated character from the primed predictions
        c = pick_top_n(preds, vocab_size)
        samples.append(int_to_vocab[c])

        # Feed each new character back in to generate the next one
        for i in range(n_samples):
            x[0,0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

            c = pick_top_n(preds, vocab_size)
            samples.append(int_to_vocab[c])
        
    return ''.join(samples)

Here, we pass in the path to a checkpoint and sample from the network.


In [102]:
tf.train.latest_checkpoint('checkpoints')


Out[102]:
'checkpoints/i3960_l128.ckpt'

In [103]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Far")
print(samp)


INFO:tensorflow:Restoring parameters from checkpoints/i3960_l128.ckpt
Faritrro an ampet ous to her, wasked to a when had
to her..

"That'n shad beer
begit on his ansereed.
"And twe me an erest the say", and when his timpingly, and stiens of tole alrry of
tor, horghed
angerent it with her stond with that aroon of the streach it...

"Yeved to him. "ov the sain.

"Yin was, the
were ale a decinor,"
sauded in to she hed hand. He had now take to his whenhing the
mares and offeriont, harn and
shupranganed hardstions had
not begen his" all herself to
the wall of atery the souses of time aland the heart, becting and as thementen altitedelt hew oo an the sorn, and that he coudd op the ran on other that said to take the all. He wat a tallicied and would now a thens forwort on the can the right, so be
then tell of the rithe while
saed this, and
cat hurseer bros who saw he had dinglys the
cores other. Teed tovery the realinn anndered this had divereding a holfer, was and her sindene, and to her and had beeve hh siin. "I would to her ann wosen seasse op harsated of thene devessed nof tit say. I do won't the had imsatt of seeming to the
soineats itshelle dayite.

At't sipt warked with of stars, to hem to and the rege astelles to the sere was to she was cannt and the fartion hands that where he say she and him that said her stinger had a sente itting and would thome to arl of tor her beatione tome him
tike
her and starcading on the concor ald, wit one. That that
they all
to the carness soun and hoand on her thim, and stoulse of a sait, the for his sant, barly he knew the said to the compacte oft a dester to her tree apparate, at the cellects, he cowt his blasse a fles tames fros ouch him was bloulded to him she witist hurboss, her and colceanter the start of the pearent in a daresty, and thouding of him of his with him or the weet andwroved the heard teeght of with the himsesss that.

"When's his hear on her asd for heart than they to know himes of it!"
doses.
 "Thly imae
in when here and
met of the liet she ded her tree."
 "Yes, alree asay to the tonth 
