In this notebook, we'll build a character-wise RNN trained on Anna Karenina, one of my all-time favorite books. It'll be able to generate new text based on the text from the book.
This network is based on Andrej Karpathy's post on RNNs and his implementation in Torch, with some additional material from r2rt and from Sherjil Ozair's implementation on GitHub. Below is the general architecture of the character-wise RNN.
In [1]:
import time
from collections import namedtuple
import numpy as np
import tensorflow as tf
First we'll load the text file and convert it into integers for our network to use. Here I'm creating a couple dictionaries to convert the characters to and from integers. Encoding the characters as integers makes it easier to use as input in the network.
In [2]:
with open('anna.txt', 'r') as f:
    text = f.read()
vocab = sorted(set(text))
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))
encoded = np.array([vocab_to_int[c] for c in text], dtype=np.int32)
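As a quick sanity check that the two dictionaries invert each other, we can decode the first few integers back into text:

''.join(int_to_vocab[i] for i in encoded[:20])    # 'Chapter 1\n\n\nHappy fa'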
Let's check out the first 100 characters to make sure everything is peachy. According to the American Book Review, this is the 6th best first line of a book ever.
In [3]:
text[:100]
Out[3]:
'Chapter 1\n\n\nHappy families are all alike; every unhappy family is unhappy in its own\nway.\n\nEverythin'
And we can see the characters encoded as integers.
In [4]:
encoded[:100]
Out[4]:
array([28, 81, 3, 53, 56, 30, 61, 41, 7, 62, 62, 62, 31, 3, 53, 53, 12,
41, 19, 3, 26, 77, 65, 77, 30, 43, 41, 3, 61, 30, 41, 3, 65, 65,
41, 3, 65, 77, 48, 30, 52, 41, 30, 39, 30, 61, 12, 41, 68, 80, 81,
3, 53, 53, 12, 41, 19, 3, 26, 77, 65, 12, 41, 77, 43, 41, 68, 80,
81, 3, 53, 53, 12, 41, 77, 80, 41, 77, 56, 43, 41, 67, 21, 80, 62,
21, 3, 12, 18, 62, 62, 4, 39, 30, 61, 12, 56, 81, 77, 80], dtype=int32)
Since the network is working with individual characters, it's similar to a classification problem in which we are trying to predict the next character from the previous text. Here's how many 'classes' our network has to pick from.
In [5]:
len(vocab)
Out[5]:
83
Here is where we'll make our mini-batches for training. Remember that we want our batches to be multiple sequences of some desired number of sequence steps, so each batch is a 2-D window on the encoded text.
We have our text encoded as integers as one long array in encoded. Let's create a function that will give us an iterator for our batches. I like using generator functions to do this. Then we can pass encoded into this function and get our batch generator.
The first thing we need to do is discard some of the text so we only have completely full batches. Each batch contains $N \times M$ characters, where $N$ is the batch size (the number of sequences) and $M$ is the number of steps. Then, to get the number of batches we can make from some array arr, you divide the length of arr by the number of characters per batch ($N \times M$). Once you know the number of batches and the number of characters per batch, you can get the total number of characters to keep.
After that, we need to split arr into $N$ sequences. You can do this using arr.reshape(size) where size is a tuple containing the dimension sizes of the reshaped array. We know we want $N$ sequences (n_seqs below), so let's make that the size of the first dimension. For the second dimension, you can use -1 as a placeholder in the size; it'll fill up the array with the appropriate data for you. After this, you should have an array that is $N \times (M \cdot K)$ where $K$ is the number of batches.
Now that we have this array, we can iterate through it to get our batches. The idea is that each batch is an $N \times M$ window on the array. For each subsequent batch, the window moves over by n_steps. We also want to create both the input and target arrays. Remember that the targets are the inputs shifted over one character. You'll usually see the first input character used as the last target character, so something like this:
y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]
where x is the input batch and y is the target batch.
The way I like to do this window is to use range to take steps of size n_steps from $0$ to arr.shape[1], the total number of steps in each sequence. That way, the integers you get from range always point to the start of a batch, and each window is n_steps wide.
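For example, with made-up numbers (sequences 200 steps long, n_steps of 50), the window starts are:

list(range(0, 200, 50))    # [0, 50, 100, 150]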
Exercise: Write the code for creating batches in the function below. The exercises in this notebook will not be easy. I've provided a notebook with solutions alongside this notebook. If you get stuck, check out the solutions. The most important thing is that you don't copy and paste the code into here; type out the solution code yourself.
In [6]:
def get_batches(arr, n_seqs, n_steps):
    '''Create a generator that returns batches of size
       n_seqs x n_steps from arr.

       Arguments
       ---------
       arr: Array you want to make batches from
       n_seqs: Batch size, the number of sequences per batch
       n_steps: Number of sequence steps per batch
    '''
    # Get the number of characters per batch and number of batches we can make
    characters_per_batch = n_seqs * n_steps
    n_batches = len(arr) // characters_per_batch

    # Keep only enough characters to make full batches
    arr = arr[:characters_per_batch * n_batches]

    # Reshape into n_seqs rows
    arr = arr.reshape((n_seqs, -1))

    for n in range(0, arr.shape[1], n_steps):
        # The features
        x = arr[:, n:n+n_steps]
        # The targets, shifted by one
        y = np.zeros_like(x)
        y[:, :-1] = x[:, 1:]
        y[:, -1] = x[:, 0]
        yield x, y
Now I'll make my data sets and we can check out what's going on here. First I'll use a batch size of 10 and 50 sequence steps, then grab a smaller batch of 2 sequences of 32 steps so the full arrays are easy to print and inspect.
In [7]:
batches = get_batches(encoded, 10, 50)
x, y = next(batches)
In [8]:
b = get_batches(encoded, 2, 32)
x, y = next(b)
print("x: \n", x)
print("y: \n", y)
x:
[[28 81 3 53 56 30 61 41 7 62 62 62 31 3 53 53 12 41 19 3 26 77 65 77
30 43 41 3 61 30 41 3]
[41 56 30 3 61 1 43 56 3 77 80 30 57 60 41 53 77 56 77 19 68 65 60 41
43 21 30 30 56 41 19 3]]
y:
[[81 3 53 56 30 61 41 7 62 62 62 31 3 53 53 12 41 19 3 26 77 65 77 30
43 41 3 61 30 41 3 28]
[56 30 3 61 1 43 56 3 77 80 30 57 60 41 53 77 56 77 19 68 65 60 41 43
21 30 30 56 41 19 3 41]]
In [9]:
print("x's shape: \n", x.shape)
print("y's shape: \n", y.shape)
x's shape:
(2, 32)
y's shape:
(2, 32)
In [10]:
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])
x
[[28 81 3 53 56 30 61 41 7 62]
[41 56 30 3 61 1 43 56 3 77]]
y
[[81 3 53 56 30 61 41 7 62 62]
[56 30 3 61 1 43 56 3 77 80]]
If you implemented get_batches correctly, the above output should look something like
x
[[55 63 69 22 6 76 45 5 16 35]
[ 5 69 1 5 12 52 6 5 56 52]
[48 29 12 61 35 35 8 64 76 78]
[12 5 24 39 45 29 12 56 5 63]
[ 5 29 6 5 29 78 28 5 78 29]
[ 5 13 6 5 36 69 78 35 52 12]
[63 76 12 5 18 52 1 76 5 58]
[34 5 73 39 6 5 12 52 36 5]
[ 6 5 29 78 12 79 6 61 5 59]
[ 5 78 69 29 24 5 6 52 5 63]]
y
[[63 69 22 6 76 45 5 16 35 35]
[69 1 5 12 52 6 5 56 52 29]
[29 12 61 35 35 8 64 76 78 28]
[ 5 24 39 45 29 12 56 5 63 29]
[29 6 5 29 78 28 5 78 29 45]
[13 6 5 36 69 78 35 52 12 43]
[76 12 5 18 52 1 76 5 58 52]
[ 5 73 39 6 5 12 52 36 5 78]
[ 5 29 78 12 79 6 61 5 59 63]
[78 69 29 24 5 6 52 5 63 76]]
although the exact numbers will be different. Check to make sure the data is shifted over one step for y.
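One quick way to verify the shift programmatically (a minimal sketch using the x and y from the small batch above):

# Each target row should be its input row shifted left by one,
# with the first input wrapped around to the last target position.
assert (y[:, :-1] == x[:, 1:]).all()
assert (y[:, -1] == x[:, 0]).all()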
Below is where you'll build the network. We'll break it up into parts so it's easier to reason about each bit. Then we can connect them up into the whole network.
First off we'll create our input placeholders. As usual we need placeholders for the training data and the targets. We'll also create a placeholder for dropout layers called keep_prob. This will be a scalar, that is, a 0-D tensor. To make a scalar, you create a placeholder without giving it a shape.
Exercise: Create the input placeholders in the function below.
In [11]:
def build_inputs(batch_size, num_steps):
    ''' Define placeholders for inputs, targets, and dropout

        Arguments
        ---------
        batch_size: Batch size, number of sequences per batch
        num_steps: Number of sequence steps in a batch
    '''
    # Declare placeholders we'll feed into the graph
    inputs = tf.placeholder(tf.int32, shape=(batch_size, num_steps), name="inputs")
    targets = tf.placeholder(tf.int32, shape=(batch_size, num_steps), name="targets")

    # Keep probability placeholder for drop out layers
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')

    return inputs, targets, keep_prob
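As a quick usage sketch (assuming a fresh graph), we can build the placeholders and check their shapes:

tf.reset_default_graph()
inputs, targets, keep_prob = build_inputs(100, 100)
print(inputs.get_shape())     # (100, 100)
print(keep_prob.get_shape())  # <unknown> -- a scalar placeholder with no fixed shape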
Here we will create the LSTM cell we'll use in the hidden layer. We'll use this cell as a building block for the RNN. So we aren't actually defining the RNN here, just the type of cell we'll use in the hidden layer.
We first create a basic LSTM cell with
lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
where num_units is the number of units in the hidden layers in the cell. Then we can add dropout by wrapping it with
tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
You pass in a cell and it will automatically add dropout to the inputs or outputs. Finally, we can stack up the LSTM cells into layers with tf.contrib.rnn.MultiRNNCell. With this, you pass in a list of cells and it will send the output of one cell into the next cell. Previously with TensorFlow 1.0, you could do this
tf.contrib.rnn.MultiRNNCell([cell]*num_layers)
This might look a little weird if you know Python well, because it creates a list of references to the same cell object. However, TensorFlow 1.0 will create different weight matrices for all cell objects. But starting with TensorFlow 1.1, you actually need to create new cell objects in the list. To get it to work in TensorFlow 1.1, it should look like
def build_cell(num_units, keep_prob):
    lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
    drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
    return drop

tf.contrib.rnn.MultiRNNCell([build_cell(num_units, keep_prob) for _ in range(num_layers)])
Even though this is actually multiple LSTM cells stacked on each other, you can treat the multiple layers as one cell.
We also need to create an initial cell state of all zeros. This can be done like so
initial_state = cell.zero_state(batch_size, tf.float32)
Below, we implement the build_lstm function to create these LSTM cells and the initial state.
In [12]:
def build_lstm(lstm_size, num_layers, batch_size, keep_prob):
    ''' Build LSTM cell.

        Arguments
        ---------
        keep_prob: Scalar tensor (tf.placeholder) for the dropout keep probability
        lstm_size: Size of the hidden layers in the LSTM cells
        num_layers: Number of LSTM layers
        batch_size: Batch size
    '''
    ### Build the LSTM Cell
    # Use a basic LSTM cell
    lstm_cells = [tf.contrib.rnn.BasicLSTMCell(lstm_size) for _ in range(num_layers)]

    # Add dropout to the cell outputs
    lstm_cells = [tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
                  for lstm in lstm_cells]

    # Stack up multiple LSTM layers, for deep learning
    cell = tf.contrib.rnn.MultiRNNCell(lstm_cells)
    initial_state = cell.zero_state(batch_size, tf.float32)

    return cell, initial_state
Here we'll create the output layer. We need to connect the output of the RNN cells to a fully connected layer with a softmax output. The softmax output gives us a probability distribution we can use to predict the next character, so we want this layer to have size $C$, the number of classes/characters we have in our text.
If our input has batch size $N$, number of steps $M$, and the hidden layer has $L$ hidden units, then the output is a 3D tensor with size $N \times M \times L$. The output of each LSTM cell has size $L$; we have $M$ of them, one for each sequence step, and we have $N$ sequences. So the total size is $N \times M \times L$.
We are using the same fully connected layer, the same weights, for each of the outputs. Then, to make things easier, we should reshape the outputs into a 2D tensor with shape $(M \cdot N) \times L$. That is, one row for each sequence and step, where the values of each row are the output from the LSTM cells. We get the LSTM output as a list, lstm_output. First we need to concatenate this whole list into one array with tf.concat. Then, reshape it (with tf.reshape) to size $(M \cdot N) \times L$.
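To see the reshape concretely, here's a small numpy sketch with made-up sizes ($N=2$, $M=3$, $L=4$):

lstm_out = np.zeros((2, 3, 4))      # N x M x L, like the RNN output
rows = lstm_out.reshape((-1, 4))    # one row per step of each sequence
rows.shape                          # (6, 4), i.e. (N*M) x L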
Once we have the outputs reshaped, we can do the matrix multiplication with the weights. We need to wrap the weight and bias variables in a variable scope with tf.variable_scope(scope_name) because there are weights being created in the LSTM cells. TensorFlow will throw an error if the weights created here have the same names as the weights created in the LSTM cells, which they will by default. To avoid this, we wrap the variables in a variable scope so we can give them unique names.
Exercise: Implement the output layer in the function below.
In [13]:
def build_output(lstm_output, in_size, out_size):
    ''' Build a softmax layer, return the softmax output and logits.

        Arguments
        ---------
        lstm_output: List of output tensors from the LSTM layer
        in_size: Size of the input tensor, for example, size of the LSTM cells
        out_size: Size of this softmax layer
    '''
    # Reshape output so it's a bunch of rows, one row for each step for each sequence.
    # Concatenate lstm_output over axis 1 (the columns)
    seq_output = tf.concat(lstm_output, axis=1)
    # Reshape seq_output to a 2D tensor with lstm_size columns
    x = tf.reshape(seq_output, shape=(-1, in_size))

    # Connect the RNN outputs to a softmax layer
    with tf.variable_scope('softmax'):
        # Create the weight and bias variables here
        softmax_w = tf.Variable(tf.truncated_normal((in_size, out_size), stddev=0.1),
                                name="softmax_w")
        softmax_b = tf.Variable(tf.zeros(out_size), name="softmax_b")

    # Since output is a bunch of rows of RNN cell outputs, logits will be a bunch
    # of rows of logit outputs, one for each step and sequence
    logits = tf.matmul(x, softmax_w) + softmax_b

    # Use softmax to get the probabilities for predicted characters
    out = tf.nn.softmax(logits, name="predictions")

    return out, logits
Next up is the training loss. We get the logits and targets and calculate the softmax cross-entropy loss. First we need to one-hot encode the targets; we're getting them as encoded characters. Then, reshape the one-hot targets so it's a 2D tensor with size $(M \cdot N) \times C$ where $C$ is the number of classes/characters we have. Remember that we reshaped the LSTM outputs and ran them through a fully connected layer with $C$ units. So our logits will also have size $(M \cdot N) \times C$.
Then we run the logits and targets through tf.nn.softmax_cross_entropy_with_logits and find the mean to get the loss.
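For intuition, here's what the cross-entropy looks like for one row (a numpy sketch with made-up logits, where the true class is index 0):

logits_row = np.array([2.0, 1.0, 0.1])
y_row = np.array([1.0, 0.0, 0.0])                        # one-hot target
probs = np.exp(logits_row) / np.exp(logits_row).sum()    # softmax
loss_row = -(y_row * np.log(probs)).sum()                # ~0.417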
Exercise: Implement the loss calculation in the function below.
In [14]:
def build_loss(logits, targets, lstm_size, num_classes):
    ''' Calculate the loss from the logits and the targets.

        Arguments
        ---------
        logits: Logits from final fully connected layer
        targets: Targets for supervised learning
        lstm_size: Number of LSTM hidden units
        num_classes: Number of classes in targets
    '''
    # One-hot encode targets and reshape to match logits, one row per sequence per step
    y_one_hot = tf.one_hot(targets, depth=num_classes)
    y_reshaped = tf.reshape(y_one_hot, shape=logits.get_shape())

    # Softmax cross entropy loss
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped)
    loss = tf.reduce_mean(loss)

    return loss
Here we build the optimizer. Normal RNNs have issues with exploding and vanishing gradients. LSTMs fix the vanishing gradient problem, but the gradients can still grow without bound. To fix this, we clip the gradients with tf.clip_by_global_norm: if the global norm of all the gradients exceeds some threshold, every gradient is rescaled so the global norm equals that threshold. This ensures the gradients never grow overly large. Then we use an AdamOptimizer for the learning step.
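Here's what that rescaling does numerically (a numpy sketch with made-up values):

grads = [np.array([3.0, 4.0])]                             # global norm = 5
clip_norm = 2.5
global_norm = np.sqrt(sum((g ** 2).sum() for g in grads))
scale = min(1.0, clip_norm / global_norm)                  # 0.5
clipped = [g * scale for g in grads]                       # [1.5, 2.0], norm = 2.5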
In [15]:
def build_optimizer(loss, learning_rate, grad_clip):
    ''' Build optimizer for training, using gradient clipping.

        Arguments
        ---------
        loss: Network loss
        learning_rate: Learning rate for optimizer
        grad_clip: Threshold for clipping the global gradient norm
    '''
    # Optimizer for training, using gradient clipping to control exploding gradients
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), grad_clip)
    train_op = tf.train.AdamOptimizer(learning_rate)
    optimizer = train_op.apply_gradients(zip(grads, tvars))

    return optimizer
Now we can put all the pieces together and build a class for the network. To actually run data through the LSTM cells, we will use tf.nn.dynamic_rnn. This function will pass the hidden and cell states across LSTM cells appropriately for us. It returns the outputs for each LSTM cell at each step for each sequence in the mini-batch. It also gives us the final LSTM state. We want to save this state as final_state so we can pass it to the first LSTM cell in the next mini-batch run. For tf.nn.dynamic_rnn, we pass in the cell and initial state we get from build_lstm, as well as our input sequences. Also, we need to one-hot encode the inputs before going into the RNN.
Exercise: Use the functions you've implemented previously and tf.nn.dynamic_rnn to build the network.
In [16]:
class CharRNN:

    def __init__(self, num_classes, batch_size=64, num_steps=50,
                 lstm_size=128, num_layers=2, learning_rate=0.001,
                 grad_clip=5, sampling=False):

        # When we're using this network for sampling later, we'll be passing in
        # one character at a time, so providing an option for that
        if sampling:
            batch_size, num_steps = 1, 1

        tf.reset_default_graph()

        # Build the input placeholder tensors
        self.inputs, self.targets, self.keep_prob = build_inputs(batch_size, num_steps)

        # Build the LSTM cell
        cell, self.initial_state = build_lstm(lstm_size, num_layers, batch_size, self.keep_prob)

        ### Run the data through the RNN layers
        # First, one-hot encode the input tokens
        x_one_hot = tf.one_hot(self.inputs, depth=num_classes)

        # Run each sequence step through the RNN with tf.nn.dynamic_rnn
        lstm_outputs, state = tf.nn.dynamic_rnn(cell, x_one_hot, initial_state=self.initial_state)
        self.final_state = state

        # Get softmax predictions and logits
        self.prediction, self.logits = build_output(lstm_outputs, lstm_size, num_classes)

        # Loss and optimizer (with gradient clipping)
        self.loss = build_loss(self.logits, self.targets, lstm_size, num_classes)
        self.optimizer = build_optimizer(self.loss, learning_rate, grad_clip)
Here are the hyperparameters for the network.
- batch_size - Number of sequences running through the network in one pass.
- num_steps - Number of characters in the sequence the network is trained on. Larger is typically better; the network will learn more long-range dependencies, but it takes longer to train. 100 is usually a good number here.
- lstm_size - The number of units in the hidden layers.
- num_layers - Number of hidden LSTM layers to use.
- learning_rate - Learning rate for training.
- keep_prob - The dropout keep probability when training. If your network is overfitting, try decreasing this.

Here's some good advice from Andrej Karpathy on training the network. I'm going to copy it in here for your benefit, but also link to where it originally came from.
Tips and Tricks
Monitoring Validation Loss vs. Training Loss
If you're somewhat new to Machine Learning or Neural Networks it can take a bit of expertise to get good models. The most important quantity to keep track of is the difference between your training loss (printed during training) and the validation loss (printed once in a while when the RNN is run on the validation data (by default every 1000 iterations)). In particular:
- If your training loss is much lower than validation loss then this means the network might be overfitting. Solutions to this are to decrease your network size, or to increase dropout. For example you could try dropout of 0.5 and so on.
- If your training/validation loss are about equal then your model is underfitting. Increase the size of your model (either number of layers or the raw number of neurons per layer)
Approximate number of parameters
The two most important parameters that control the model are
lstm_size and num_layers. I would advise that you always use num_layers of either 2/3. The lstm_size can be adjusted based on how much data you have. The two important quantities to keep track of here are:
- The number of parameters in your model. This is printed when you start training.
- The size of your dataset. 1MB file is approximately 1 million characters.
These two should be about the same order of magnitude. It's a little tricky to tell. Here are some examples:
- I have a 100MB dataset and I'm using the default parameter settings (which currently print 150K parameters). My data size is significantly larger (100 mil >> 0.15 mil), so I expect to heavily underfit. I am thinking I can comfortably afford to make lstm_size larger.
- I have a 10MB dataset and running a 10 million parameter model. I'm slightly nervous and I'm carefully monitoring my validation loss. If it's larger than my training loss then I may want to try to increase dropout a bit and see if that helps the validation loss.
Best models strategy
The winning strategy to obtaining very good models (if you have the compute time) is to always err on the side of making the network larger (as large as you're willing to wait for it to compute) and then try different dropout values (between 0 and 1). Whatever model has the best validation performance (the loss, written in the checkpoint filename, low is good) is the one you should use in the end.
It is very common in deep learning to run many different models with many different hyperparameter settings, and in the end take whatever checkpoint gave the best validation performance.
By the way, the size of your training and validation splits are also parameters. Make sure you have a decent amount of data in your validation set or otherwise the validation performance will be noisy and not very informative.
In [22]:
batch_size = 100 # Sequences per batch
num_steps = 100 # Number of sequence steps per batch
lstm_size = 256 # Size of hidden layers in LSTMs
num_layers = 2 # Number of LSTM layers
learning_rate = 0.005 # Learning rate
keep_prob = 0.5 # Dropout keep probability
This is typical training code, passing inputs and targets into the network, then running the optimizer. Here we also get back the final LSTM state for the mini-batch. Then, we pass that state back into the network so the next batch can continue the state from the previous batch. And every so often (set by save_every_n) I save a checkpoint.
Here I'm saving checkpoints with the format
i{iteration number}_l{# hidden layer units}.ckpt
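With the hyperparameters above (lstm_size of 256) and save_every_n set to 200, for example, the first checkpoint would be named i200_l256.ckpt.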
Exercise: Set the hyperparameters above to train the network. Watch the training loss; it should be consistently dropping. Also, I highly advise running this on a GPU.
In [23]:
epochs = 20
# Save every N iterations
save_every_n = 200

model = CharRNN(len(vocab), batch_size=batch_size, num_steps=num_steps,
                lstm_size=lstm_size, num_layers=num_layers,
                learning_rate=learning_rate)

saver = tf.train.Saver(max_to_keep=100)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Use the line below to load a checkpoint and resume training
    #saver.restore(sess, 'checkpoints/______.ckpt')
    counter = 0
    for e in range(epochs):
        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        batches = get_batches(encoded, batch_size, num_steps)
        for x, y in batches:
            counter += 1
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: keep_prob,
                    model.initial_state: new_state}
            batch_loss, new_state, _ = sess.run([model.loss,
                                                 model.final_state,
                                                 model.optimizer],
                                                feed_dict=feed)
            end = time.time()
            print('Epoch: {}/{}... '.format(e+1, epochs),
                  'Training Step: {}... '.format(counter),
                  'Training loss: {:.4f}... '.format(batch_loss),
                  '{:.4f} sec/batch'.format(end-start))

            if (counter % save_every_n == 0):
                saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))

    saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))
Epoch: 1/20... Training Step: 1... Training loss: 4.4168... 2.7553 sec/batch
Epoch: 1/20... Training Step: 2... Training loss: 4.0440... 2.8731 sec/batch
Epoch: 1/20... Training Step: 3... Training loss: 3.8536... 2.1222 sec/batch
Epoch: 1/20... Training Step: 4... Training loss: 3.6617... 2.0287 sec/batch
Epoch: 1/20... Training Step: 5... Training loss: 3.5697... 2.0195 sec/batch
Epoch: 1/20... Training Step: 6... Training loss: 3.4796... 1.9183 sec/batch
Epoch: 1/20... Training Step: 7... Training loss: 3.3868... 1.9083 sec/batch
Epoch: 1/20... Training Step: 8... Training loss: 3.3118... 1.9457 sec/batch
Epoch: 1/20... Training Step: 9... Training loss: 3.2891... 2.0432 sec/batch
Epoch: 1/20... Training Step: 10... Training loss: 3.2658... 2.0174 sec/batch
Epoch: 1/20... Training Step: 11... Training loss: 3.2200... 2.5222 sec/batch
Epoch: 1/20... Training Step: 12... Training loss: 3.2174... 2.0447 sec/batch
Epoch: 1/20... Training Step: 13... Training loss: 3.2082... 2.0139 sec/batch
Epoch: 1/20... Training Step: 14... Training loss: 3.2263... 1.9188 sec/batch
Epoch: 1/20... Training Step: 15... Training loss: 3.2088... 1.9633 sec/batch
Epoch: 1/20... Training Step: 16... Training loss: 3.1969... 2.1001 sec/batch
Epoch: 1/20... Training Step: 17... Training loss: 3.1740... 1.8937 sec/batch
Epoch: 1/20... Training Step: 18... Training loss: 3.1994... 2.1194 sec/batch
Epoch: 1/20... Training Step: 19... Training loss: 3.1720... 2.1471 sec/batch
Epoch: 1/20... Training Step: 20... Training loss: 3.1390... 2.0628 sec/batch
Epoch: 1/20... Training Step: 21... Training loss: 3.1634... 2.0003 sec/batch
Epoch: 1/20... Training Step: 22... Training loss: 3.1569... 1.9300 sec/batch
Epoch: 1/20... Training Step: 23... Training loss: 3.1601... 1.9476 sec/batch
Epoch: 1/20... Training Step: 24... Training loss: 3.1534... 2.0000 sec/batch
Epoch: 1/20... Training Step: 25... Training loss: 3.1449... 1.9467 sec/batch
Epoch: 1/20... Training Step: 26... Training loss: 3.1569... 1.9103 sec/batch
Epoch: 1/20... Training Step: 27... Training loss: 3.1639... 2.1750 sec/batch
Epoch: 1/20... Training Step: 28... Training loss: 3.1320... 2.0915 sec/batch
Epoch: 1/20... Training Step: 29... Training loss: 3.1399... 1.9737 sec/batch
Epoch: 1/20... Training Step: 30... Training loss: 3.1498... 1.9983 sec/batch
Epoch: 1/20... Training Step: 31... Training loss: 3.1669... 1.9630 sec/batch
Epoch: 1/20... Training Step: 32... Training loss: 3.1432... 2.1393 sec/batch
Epoch: 1/20... Training Step: 33... Training loss: 3.1217... 2.2798 sec/batch
Epoch: 1/20... Training Step: 34... Training loss: 3.1444... 2.5280 sec/batch
Epoch: 1/20... Training Step: 35... Training loss: 3.1237... 2.2557 sec/batch
Epoch: 1/20... Training Step: 36... Training loss: 3.1411... 2.1953 sec/batch
Epoch: 1/20... Training Step: 37... Training loss: 3.1098... 1.9605 sec/batch
Epoch: 1/20... Training Step: 38... Training loss: 3.1187... 1.9796 sec/batch
Epoch: 1/20... Training Step: 39... Training loss: 3.1120... 2.0201 sec/batch
Epoch: 1/20... Training Step: 40... Training loss: 3.1185... 2.1992 sec/batch
Epoch: 1/20... Training Step: 41... Training loss: 3.1093... 2.1017 sec/batch
Epoch: 1/20... Training Step: 42... Training loss: 3.1148... 2.3602 sec/batch
Epoch: 1/20... Training Step: 43... Training loss: 3.1110... 2.1090 sec/batch
Epoch: 1/20... Training Step: 44... Training loss: 3.1103... 2.0887 sec/batch
Epoch: 1/20... Training Step: 45... Training loss: 3.1005... 2.5002 sec/batch
Epoch: 1/20... Training Step: 46... Training loss: 3.1188... 2.1311 sec/batch
Epoch: 1/20... Training Step: 47... Training loss: 3.1230... 1.9943 sec/batch
Epoch: 1/20... Training Step: 48... Training loss: 3.1254... 2.0552 sec/batch
Epoch: 1/20... Training Step: 49... Training loss: 3.1197... 2.4129 sec/batch
Epoch: 1/20... Training Step: 50... Training loss: 3.1217... 2.0459 sec/batch
Epoch: 1/20... Training Step: 51... Training loss: 3.1104... 2.0295 sec/batch
Epoch: 1/20... Training Step: 52... Training loss: 3.1009... 2.3155 sec/batch
Epoch: 1/20... Training Step: 53... Training loss: 3.1069... 2.0166 sec/batch
Epoch: 1/20... Training Step: 54... Training loss: 3.0948... 1.9723 sec/batch
Epoch: 1/20... Training Step: 55... Training loss: 3.1120... 2.1343 sec/batch
Epoch: 1/20... Training Step: 56... Training loss: 3.0813... 2.0456 sec/batch
Epoch: 1/20... Training Step: 57... Training loss: 3.0959... 2.0481 sec/batch
Epoch: 1/20... Training Step: 58... Training loss: 3.0994... 1.9233 sec/batch
Epoch: 1/20... Training Step: 59... Training loss: 3.0834... 2.1618 sec/batch
Epoch: 1/20... Training Step: 60... Training loss: 3.0993... 2.7316 sec/batch
Epoch: 1/20... Training Step: 61... Training loss: 3.0973... 2.3087 sec/batch
Epoch: 1/20... Training Step: 62... Training loss: 3.1109... 1.9595 sec/batch
Epoch: 1/20... Training Step: 63... Training loss: 3.1178... 2.0614 sec/batch
Epoch: 1/20... Training Step: 64... Training loss: 3.0664... 2.1200 sec/batch
Epoch: 1/20... Training Step: 65... Training loss: 3.0702... 2.3960 sec/batch
Epoch: 1/20... Training Step: 66... Training loss: 3.0993... 2.2587 sec/batch
Epoch: 1/20... Training Step: 67... Training loss: 3.0885... 1.9088 sec/batch
Epoch: 1/20... Training Step: 68... Training loss: 3.0380... 2.0045 sec/batch
Epoch: 1/20... Training Step: 69... Training loss: 3.0566... 1.9522 sec/batch
Epoch: 1/20... Training Step: 70... Training loss: 3.0771... 2.0493 sec/batch
Epoch: 1/20... Training Step: 71... Training loss: 3.0591... 1.9902 sec/batch
Epoch: 1/20... Training Step: 72... Training loss: 3.0792... 2.0944 sec/batch
Epoch: 1/20... Training Step: 73... Training loss: 3.0477... 2.2302 sec/batch
Epoch: 1/20... Training Step: 74... Training loss: 3.0558... 1.9817 sec/batch
Epoch: 1/20... Training Step: 75... Training loss: 3.0537... 2.0817 sec/batch
Epoch: 1/20... Training Step: 76... Training loss: 3.0554... 2.1730 sec/batch
Epoch: 1/20... Training Step: 77... Training loss: 3.0379... 2.0917 sec/batch
Epoch: 1/20... Training Step: 78... Training loss: 3.0278... 2.2478 sec/batch
Epoch: 1/20... Training Step: 79... Training loss: 3.0153... 2.0197 sec/batch
Epoch: 1/20... Training Step: 80... Training loss: 2.9997... 1.9851 sec/batch
Epoch: 1/20... Training Step: 81... Training loss: 2.9906... 2.0219 sec/batch
Epoch: 1/20... Training Step: 82... Training loss: 3.0084... 1.9235 sec/batch
Epoch: 1/20... Training Step: 83... Training loss: 2.9973... 2.1426 sec/batch
Epoch: 1/20... Training Step: 84... Training loss: 2.9643... 2.2811 sec/batch
Epoch: 1/20... Training Step: 85... Training loss: 2.9427... 1.9399 sec/batch
Epoch: 1/20... Training Step: 86... Training loss: 2.9429... 2.3182 sec/batch
Epoch: 1/20... Training Step: 87... Training loss: 2.9262... 2.2326 sec/batch
Epoch: 1/20... Training Step: 88... Training loss: 2.9084... 1.8849 sec/batch
Epoch: 1/20... Training Step: 89... Training loss: 2.9163... 2.1052 sec/batch
Epoch: 1/20... Training Step: 90... Training loss: 2.9043... 2.0087 sec/batch
Epoch: 1/20... Training Step: 91... Training loss: 2.8948... 2.0777 sec/batch
Epoch: 1/20... Training Step: 92... Training loss: 2.8743... 2.1016 sec/batch
Epoch: 1/20... Training Step: 93... Training loss: 2.8639... 2.1428 sec/batch
Epoch: 1/20... Training Step: 94... Training loss: 2.8342... 2.0582 sec/batch
Epoch: 1/20... Training Step: 95... Training loss: 2.8197... 2.5050 sec/batch
Epoch: 1/20... Training Step: 96... Training loss: 2.8650... 2.0856 sec/batch
Epoch: 1/20... Training Step: 97... Training loss: 2.8217... 1.9424 sec/batch
Epoch: 1/20... Training Step: 98... Training loss: 2.8408... 2.3388 sec/batch
Epoch: 1/20... Training Step: 99... Training loss: 2.8158... 2.2813 sec/batch
Epoch: 1/20... Training Step: 100... Training loss: 2.8064... 2.3606 sec/batch
Epoch: 1/20... Training Step: 101... Training loss: 2.8256... 2.5728 sec/batch
Epoch: 1/20... Training Step: 102... Training loss: 2.7760... 2.0857 sec/batch
Epoch: 1/20... Training Step: 103... Training loss: 2.7803... 2.1497 sec/batch
Epoch: 1/20... Training Step: 104... Training loss: 2.7521... 2.2895 sec/batch
Epoch: 1/20... Training Step: 105... Training loss: 2.7485... 2.0542 sec/batch
Epoch: 1/20... Training Step: 106... Training loss: 2.7517... 1.9620 sec/batch
Epoch: 1/20... Training Step: 107... Training loss: 2.7045... 1.9533 sec/batch
Epoch: 1/20... Training Step: 108... Training loss: 2.7260... 1.9333 sec/batch
Epoch: 1/20... Training Step: 109... Training loss: 2.7274... 2.0592 sec/batch
Epoch: 1/20... Training Step: 110... Training loss: 2.6743... 2.1531 sec/batch
Epoch: 1/20... Training Step: 111... Training loss: 2.6993... 1.9431 sec/batch
Epoch: 1/20... Training Step: 112... Training loss: 2.7151... 1.8982 sec/batch
Epoch: 1/20... Training Step: 113... Training loss: 2.6870... 1.9588 sec/batch
Epoch: 1/20... Training Step: 114... Training loss: 2.6701... 2.0813 sec/batch
Epoch: 1/20... Training Step: 115... Training loss: 2.6577... 1.9090 sec/batch
Epoch: 1/20... Training Step: 116... Training loss: 2.6446... 2.1170 sec/batch
Epoch: 1/20... Training Step: 117... Training loss: 2.6443... 2.1661 sec/batch
Epoch: 1/20... Training Step: 118... Training loss: 2.6491... 2.0343 sec/batch
Epoch: 1/20... Training Step: 119... Training loss: 2.6742... 2.1882 sec/batch
Epoch: 1/20... Training Step: 120... Training loss: 2.6276... 2.3986 sec/batch
Epoch: 1/20... Training Step: 121... Training loss: 2.6708... 2.3725 sec/batch
Epoch: 1/20... Training Step: 122... Training loss: 2.6377... 2.3958 sec/batch
Epoch: 1/20... Training Step: 123... Training loss: 2.6235... 2.2277 sec/batch
Epoch: 1/20... Training Step: 124... Training loss: 2.6267... 1.9575 sec/batch
Epoch: 1/20... Training Step: 125... Training loss: 2.6172... 1.9572 sec/batch
Epoch: 1/20... Training Step: 126... Training loss: 2.5950... 1.9319 sec/batch
Epoch: 1/20... Training Step: 127... Training loss: 2.6164... 1.9432 sec/batch
Epoch: 1/20... Training Step: 128... Training loss: 2.6023... 2.1055 sec/batch
Epoch: 1/20... Training Step: 129... Training loss: 2.5891... 2.3903 sec/batch
Epoch: 1/20... Training Step: 130... Training loss: 2.5887... 2.3856 sec/batch
Epoch: 1/20... Training Step: 131... Training loss: 2.5881... 2.4950 sec/batch
Epoch: 1/20... Training Step: 132... Training loss: 2.5694... 2.3428 sec/batch
Epoch: 1/20... Training Step: 133... Training loss: 2.5876... 2.3142 sec/batch
Epoch: 1/20... Training Step: 134... Training loss: 2.5761... 2.0185 sec/batch
Epoch: 1/20... Training Step: 135... Training loss: 2.5358... 2.1932 sec/batch
Epoch: 1/20... Training Step: 136... Training loss: 2.5489... 1.9337 sec/batch
Epoch: 1/20... Training Step: 137... Training loss: 2.5503... 2.0297 sec/batch
Epoch: 1/20... Training Step: 138... Training loss: 2.5480... 2.1179 sec/batch
Epoch: 1/20... Training Step: 139... Training loss: 2.5664... 2.2933 sec/batch
Epoch: 1/20... Training Step: 140... Training loss: 2.5387... 2.0309 sec/batch
Epoch: 1/20... Training Step: 141... Training loss: 2.5682... 2.1545 sec/batch
Epoch: 1/20... Training Step: 142... Training loss: 2.5186... 2.1773 sec/batch
Epoch: 1/20... Training Step: 143... Training loss: 2.5459... 2.2507 sec/batch
Epoch: 1/20... Training Step: 144... Training loss: 2.5086... 2.4008 sec/batch
Epoch: 1/20... Training Step: 145... Training loss: 2.5321... 2.1602 sec/batch
Epoch: 1/20... Training Step: 146... Training loss: 2.5492... 2.0114 sec/batch
Epoch: 1/20... Training Step: 147... Training loss: 2.5258... 2.1263 sec/batch
Epoch: 1/20... Training Step: 148... Training loss: 2.5453... 2.0643 sec/batch
Epoch: 1/20... Training Step: 149... Training loss: 2.5034... 1.9896 sec/batch
Epoch: 1/20... Training Step: 150... Training loss: 2.4969... 2.1947 sec/batch
Epoch: 1/20... Training Step: 151... Training loss: 2.5358... 2.0773 sec/batch
Epoch: 1/20... Training Step: 152... Training loss: 2.5420... 2.0621 sec/batch
Epoch: 1/20... Training Step: 153... Training loss: 2.5090... 2.3053 sec/batch
Epoch: 1/20... Training Step: 154... Training loss: 2.5209... 3.2227 sec/batch
Epoch: 1/20... Training Step: 155... Training loss: 2.4893... 3.0396 sec/batch
Epoch: 1/20... Training Step: 156... Training loss: 2.4850... 3.3374 sec/batch
Epoch: 1/20... Training Step: 157... Training loss: 2.4807... 2.2557 sec/batch
Epoch: 1/20... Training Step: 158... Training loss: 2.4686... 2.6283 sec/batch
Epoch: 1/20... Training Step: 159... Training loss: 2.4575... 2.1560 sec/batch
Epoch: 1/20... Training Step: 160... Training loss: 2.4826... 1.7158 sec/batch
Epoch: 1/20... Training Step: 161... Training loss: 2.4808... 1.6707 sec/batch
Epoch: 1/20... Training Step: 162... Training loss: 2.4285... 1.7296 sec/batch
Epoch: 1/20... Training Step: 163... Training loss: 2.4460... 1.8398 sec/batch
Epoch: 1/20... Training Step: 164... Training loss: 2.4597... 2.0886 sec/batch
Epoch: 1/20... Training Step: 165... Training loss: 2.4669... 2.6731 sec/batch
Epoch: 1/20... Training Step: 166... Training loss: 2.4588... 1.8357 sec/batch
Epoch: 1/20... Training Step: 167... Training loss: 2.4558... 2.0074 sec/batch
Epoch: 1/20... Training Step: 168... Training loss: 2.4513... 1.7596 sec/batch
Epoch: 1/20... Training Step: 169... Training loss: 2.4578... 1.6651 sec/batch
Epoch: 1/20... Training Step: 170... Training loss: 2.4322... 1.6511 sec/batch
Epoch: 1/20... Training Step: 171... Training loss: 2.4637... 1.7745 sec/batch
Epoch: 1/20... Training Step: 172... Training loss: 2.4777... 1.8598 sec/batch
Epoch: 1/20... Training Step: 173... Training loss: 2.4804... 2.0516 sec/batch
Epoch: 1/20... Training Step: 174... Training loss: 2.4885... 2.3678 sec/batch
Epoch: 1/20... Training Step: 175... Training loss: 2.4769... 1.8810 sec/batch
Epoch: 1/20... Training Step: 176... Training loss: 2.4362... 1.8902 sec/batch
Epoch: 1/20... Training Step: 177... Training loss: 2.4253... 1.8345 sec/batch
Epoch: 1/20... Training Step: 178... Training loss: 2.3924... 1.6792 sec/batch
Epoch: 1/20... Training Step: 179... Training loss: 2.4088... 1.6715 sec/batch
Epoch: 1/20... Training Step: 180... Training loss: 2.3992... 1.7460 sec/batch
Epoch: 1/20... Training Step: 181... Training loss: 2.4015... 1.8290 sec/batch
Epoch: 1/20... Training Step: 182... Training loss: 2.4117... 1.8898 sec/batch
Epoch: 1/20... Training Step: 183... Training loss: 2.3974... 1.8465 sec/batch
Epoch: 1/20... Training Step: 184... Training loss: 2.4330... 1.7292 sec/batch
Epoch: 1/20... Training Step: 185... Training loss: 2.4477... 2.7872 sec/batch
Epoch: 1/20... Training Step: 186... Training loss: 2.4079... 2.2951 sec/batch
Epoch: 1/20... Training Step: 187... Training loss: 2.3784... 1.8652 sec/batch
Epoch: 1/20... Training Step: 188... Training loss: 2.3627... 1.6959 sec/batch
Epoch: 1/20... Training Step: 189... Training loss: 2.3773... 1.6689 sec/batch
Epoch: 1/20... Training Step: 190... Training loss: 2.3840... 1.7404 sec/batch
Epoch: 1/20... Training Step: 191... Training loss: 2.3985... 1.6915 sec/batch
Epoch: 1/20... Training Step: 192... Training loss: 2.3489... 2.2362 sec/batch
Epoch: 1/20... Training Step: 193... Training loss: 2.3741... 2.7166 sec/batch
Epoch: 1/20... Training Step: 194... Training loss: 2.3783... 1.8118 sec/batch
Epoch: 1/20... Training Step: 195... Training loss: 2.3538... 1.7650 sec/batch
Epoch: 1/20... Training Step: 196... Training loss: 2.3616... 1.7153 sec/batch
Epoch: 1/20... Training Step: 197... Training loss: 2.3670... 1.8419 sec/batch
Epoch: 1/20... Training Step: 198... Training loss: 2.3530... 2.6139 sec/batch
Epoch: 2/20... Training Step: 199... Training loss: 2.5927... 2.2561 sec/batch
Epoch: 2/20... Training Step: 200... Training loss: 2.4487... 2.4998 sec/batch
Epoch: 2/20... Training Step: 201... Training loss: 2.4106... 1.8219 sec/batch
Epoch: 2/20... Training Step: 202... Training loss: 2.4025... 2.1761 sec/batch
Epoch: 2/20... Training Step: 203... Training loss: 2.4007... 2.0647 sec/batch
Epoch: 2/20... Training Step: 204... Training loss: 2.3863... 2.4917 sec/batch
Epoch: 2/20... Training Step: 205... Training loss: 2.3822... 2.0895 sec/batch
Epoch: 2/20... Training Step: 206... Training loss: 2.3856... 1.7564 sec/batch
Epoch: 2/20... Training Step: 207... Training loss: 2.3917... 1.7335 sec/batch
Epoch: 2/20... Training Step: 208... Training loss: 2.3619... 1.6807 sec/batch
Epoch: 2/20... Training Step: 209... Training loss: 2.3701... 1.7061 sec/batch
Epoch: 2/20... Training Step: 210... Training loss: 2.3538... 1.7318 sec/batch
Epoch: 2/20... Training Step: 211... Training loss: 2.3770... 1.7551 sec/batch
Epoch: 2/20... Training Step: 212... Training loss: 2.4030... 1.6551 sec/batch
Epoch: 2/20... Training Step: 213... Training loss: 2.3656... 1.8113 sec/batch
Epoch: 2/20... Training Step: 214... Training loss: 2.3595... 1.8593 sec/batch
Epoch: 2/20... Training Step: 215... Training loss: 2.3551... 1.9149 sec/batch
Epoch: 2/20... Training Step: 216... Training loss: 2.3996... 3.1842 sec/batch
Epoch: 2/20... Training Step: 217... Training loss: 2.3623... 2.7714 sec/batch
Epoch: 2/20... Training Step: 218... Training loss: 2.3301... 3.1459 sec/batch
Epoch: 2/20... Training Step: 219... Training loss: 2.3445... 1.8953 sec/batch
Epoch: 2/20... Training Step: 220... Training loss: 2.3722... 1.9278 sec/batch
Epoch: 2/20... Training Step: 221... Training loss: 2.3548... 2.2295 sec/batch
Epoch: 2/20... Training Step: 222... Training loss: 2.3357... 2.0142 sec/batch
Epoch: 2/20... Training Step: 223... Training loss: 2.3128... 2.1517 sec/batch
Epoch: 2/20... Training Step: 224... Training loss: 2.3429... 1.8334 sec/batch
Epoch: 2/20... Training Step: 225... Training loss: 2.3257... 1.7010 sec/batch
Epoch: 2/20... Training Step: 226... Training loss: 2.3304... 1.8607 sec/batch
Epoch: 2/20... Training Step: 227... Training loss: 2.3400... 1.7801 sec/batch
Epoch: 2/20... Training Step: 228... Training loss: 2.3297... 1.6890 sec/batch
Epoch: 2/20... Training Step: 229... Training loss: 2.3444... 1.7010 sec/batch
Epoch: 2/20... Training Step: 230... Training loss: 2.3022... 1.6874 sec/batch
Epoch: 2/20... Training Step: 231... Training loss: 2.3035... 1.9353 sec/batch
Epoch: 2/20... Training Step: 232... Training loss: 2.3345... 2.0761 sec/batch
Epoch: 2/20... Training Step: 233... Training loss: 2.2962... 1.7944 sec/batch
Epoch: 2/20... Training Step: 234... Training loss: 2.3155... 2.0981 sec/batch
Epoch: 2/20... Training Step: 235... Training loss: 2.3019... 2.2029 sec/batch
Epoch: 2/20... Training Step: 236... Training loss: 2.2773... 2.4492 sec/batch
Epoch: 2/20... Training Step: 237... Training loss: 2.2775... 1.8753 sec/batch
Epoch: 2/20... Training Step: 238... Training loss: 2.2659... 1.9057 sec/batch
Epoch: 2/20... Training Step: 239... Training loss: 2.2687... 1.8440 sec/batch
Epoch: 2/20... Training Step: 240... Training loss: 2.2727... 1.8626 sec/batch
Epoch: 2/20... Training Step: 241... Training loss: 2.2582... 1.8221 sec/batch
Epoch: 2/20... Training Step: 242... Training loss: 2.2736... 1.8735 sec/batch
Epoch: 2/20... Training Step: 243... Training loss: 2.2726... 1.8934 sec/batch
Epoch: 2/20... Training Step: 244... Training loss: 2.2475... 2.0639 sec/batch
Epoch: 2/20... Training Step: 245... Training loss: 2.2891... 1.8372 sec/batch
Epoch: 2/20... Training Step: 246... Training loss: 2.2590... 1.9968 sec/batch
Epoch: 2/20... Training Step: 247... Training loss: 2.2599... 2.0496 sec/batch
Epoch: 2/20... Training Step: 248... Training loss: 2.2926... 1.8878 sec/batch
Epoch: 2/20... Training Step: 249... Training loss: 2.2461... 2.0864 sec/batch
Epoch: 2/20... Training Step: 250... Training loss: 2.2839... 1.8605 sec/batch
Epoch: 2/20... Training Step: 251... Training loss: 2.2662... 2.0269 sec/batch
Epoch: 2/20... Training Step: 252... Training loss: 2.2452... 1.8573 sec/batch
Epoch: 2/20... Training Step: 253... Training loss: 2.2490... 1.9435 sec/batch
Epoch: 2/20... Training Step: 254... Training loss: 2.2547... 1.8413 sec/batch
Epoch: 2/20... Training Step: 255... Training loss: 2.2479... 1.8947 sec/batch
Epoch: 2/20... Training Step: 256... Training loss: 2.2468... 1.8555 sec/batch
Epoch: 2/20... Training Step: 257... Training loss: 2.2464... 1.9050 sec/batch
Epoch: 2/20... Training Step: 258... Training loss: 2.2565... 1.8105 sec/batch
Epoch: 2/20... Training Step: 259... Training loss: 2.2360... 2.0863 sec/batch
Epoch: 2/20... Training Step: 260... Training loss: 2.2620... 1.9118 sec/batch
Epoch: 2/20... Training Step: 261... Training loss: 2.2591... 1.8765 sec/batch
Epoch: 2/20... Training Step: 262... Training loss: 2.2315... 2.0133 sec/batch
Epoch: 2/20... Training Step: 263... Training loss: 2.2234... 1.9906 sec/batch
Epoch: 2/20... Training Step: 264... Training loss: 2.2674... 1.9539 sec/batch
Epoch: 2/20... Training Step: 265... Training loss: 2.2464... 2.6817 sec/batch
Epoch: 2/20... Training Step: 266... Training loss: 2.2220... 2.5362 sec/batch
Epoch: 2/20... Training Step: 267... Training loss: 2.2247... 2.2596 sec/batch
Epoch: 2/20... Training Step: 268... Training loss: 2.2246... 2.7595 sec/batch
Epoch: 2/20... Training Step: 269... Training loss: 2.2528... 2.0964 sec/batch
Epoch: 2/20... Training Step: 270... Training loss: 2.2349... 1.9393 sec/batch
Epoch: 2/20... Training Step: 271... Training loss: 2.2463... 1.9863 sec/batch
Epoch: 2/20... Training Step: 272... Training loss: 2.2266... 2.2577 sec/batch
Epoch: 2/20... Training Step: 273... Training loss: 2.2151... 2.7519 sec/batch
Epoch: 2/20... Training Step: 274... Training loss: 2.2534... 2.4682 sec/batch
Epoch: 2/20... Training Step: 275... Training loss: 2.2171... 2.4704 sec/batch
Epoch: 2/20... Training Step: 276... Training loss: 2.2158... 1.7883 sec/batch
Epoch: 2/20... Training Step: 277... Training loss: 2.1977... 2.1919 sec/batch
Epoch: 2/20... Training Step: 278... Training loss: 2.2052... 2.5182 sec/batch
Epoch: 2/20... Training Step: 279... Training loss: 2.1870... 2.1545 sec/batch
Epoch: 2/20... Training Step: 280... Training loss: 2.2255... 1.9464 sec/batch
Epoch: 2/20... Training Step: 281... Training loss: 2.1849... 1.9431 sec/batch
Epoch: 2/20... Training Step: 282... Training loss: 2.1818... 2.5070 sec/batch
Epoch: 2/20... Training Step: 283... Training loss: 2.1722... 2.2905 sec/batch
Epoch: 2/20... Training Step: 284... Training loss: 2.1950... 1.9480 sec/batch
Epoch: 2/20... Training Step: 285... Training loss: 2.1840... 2.0355 sec/batch
Epoch: 2/20... Training Step: 286... Training loss: 2.1852... 3.0299 sec/batch
Epoch: 2/20... Training Step: 287... Training loss: 2.1615... 3.8708 sec/batch
Epoch: 2/20... Training Step: 288... Training loss: 2.2113... 2.7966 sec/batch
Epoch: 2/20... Training Step: 289... Training loss: 2.1808... 3.2135 sec/batch
Epoch: 2/20... Training Step: 290... Training loss: 2.1854... 2.9938 sec/batch
Epoch: 2/20... Training Step: 291... Training loss: 2.1751... 3.7351 sec/batch
Epoch: 2/20... Training Step: 292... Training loss: 2.1708... 2.5165 sec/batch
Epoch: 2/20... Training Step: 293... Training loss: 2.1651... 2.9031 sec/batch
Epoch: 2/20... Training Step: 294... Training loss: 2.1849... 2.3584 sec/batch
Epoch: 2/20... Training Step: 295... Training loss: 2.1841... 2.6968 sec/batch
Epoch: 2/20... Training Step: 296... Training loss: 2.1658... 2.9703 sec/batch
Epoch: 2/20... Training Step: 297... Training loss: 2.1672... 3.0349 sec/batch
Epoch: 2/20... Training Step: 298... Training loss: 2.1489... 2.8462 sec/batch
Epoch: 2/20... Training Step: 299... Training loss: 2.1785... 2.8863 sec/batch
Epoch: 2/20... Training Step: 300... Training loss: 2.1696... 3.0764 sec/batch
Epoch: 2/20... Training Step: 301... Training loss: 2.1623... 2.6057 sec/batch
Epoch: 2/20... Training Step: 302... Training loss: 2.1586... 2.6025 sec/batch
Epoch: 2/20... Training Step: 303... Training loss: 2.1529... 2.7924 sec/batch
Epoch: 2/20... Training Step: 304... Training loss: 2.1750... 2.2616 sec/batch
Epoch: 2/20... Training Step: 305... Training loss: 2.1664... 2.6682 sec/batch
Epoch: 2/20... Training Step: 306... Training loss: 2.1716... 2.5663 sec/batch
Epoch: 2/20... Training Step: 307... Training loss: 2.1788... 2.5770 sec/batch
Epoch: 2/20... Training Step: 308... Training loss: 2.1468... 2.7801 sec/batch
Epoch: 2/20... Training Step: 309... Training loss: 2.1666... 2.5286 sec/batch
Epoch: 2/20... Training Step: 310... Training loss: 2.1620... 2.4954 sec/batch
Epoch: 2/20... Training Step: 311... Training loss: 2.1560... 3.1552 sec/batch
Epoch: 2/20... Training Step: 312... Training loss: 2.1563... 3.8565 sec/batch
Epoch: 2/20... Training Step: 313... Training loss: 2.1522... 2.6858 sec/batch
Epoch: 2/20... Training Step: 314... Training loss: 2.1078... 2.3150 sec/batch
Epoch: 2/20... Training Step: 315... Training loss: 2.1498... 2.2824 sec/batch
Epoch: 2/20... Training Step: 316... Training loss: 2.1450... 2.4283 sec/batch
Epoch: 2/20... Training Step: 317... Training loss: 2.1690... 2.3825 sec/batch
Epoch: 2/20... Training Step: 318... Training loss: 2.1428... 2.4701 sec/batch
Epoch: 2/20... Training Step: 319... Training loss: 2.1600... 1.9463 sec/batch
Epoch: 2/20... Training Step: 320... Training loss: 2.1443... 2.0320 sec/batch
Epoch: 2/20... Training Step: 321... Training loss: 2.1342... 3.0958 sec/batch
Epoch: 2/20... Training Step: 322... Training loss: 2.1562... 2.1615 sec/batch
Epoch: 2/20... Training Step: 323... Training loss: 2.1365... 2.5373 sec/batch
Epoch: 2/20... Training Step: 324... Training loss: 2.1087... 2.6556 sec/batch
Epoch: 2/20... Training Step: 325... Training loss: 2.1513... 2.0379 sec/batch
Epoch: 2/20... Training Step: 326... Training loss: 2.1549... 2.1781 sec/batch
Epoch: 2/20... Training Step: 327... Training loss: 2.1377... 2.2019 sec/batch
Epoch: 2/20... Training Step: 328... Training loss: 2.1463... 2.1255 sec/batch
Epoch: 2/20... Training Step: 329... Training loss: 2.1249... 2.7511 sec/batch
Epoch: 2/20... Training Step: 330... Training loss: 2.1283... 2.4645 sec/batch
Epoch: 2/20... Training Step: 331... Training loss: 2.1447... 2.3587 sec/batch
Epoch: 2/20... Training Step: 332... Training loss: 2.1534... 1.9504 sec/batch
Epoch: 2/20... Training Step: 333... Training loss: 2.1076... 2.0665 sec/batch
Epoch: 2/20... Training Step: 334... Training loss: 2.1263... 2.0145 sec/batch
Epoch: 2/20... Training Step: 335... Training loss: 2.1345... 2.7048 sec/batch
Epoch: 2/20... Training Step: 336... Training loss: 2.1258... 2.0041 sec/batch
Epoch: 2/20... Training Step: 337... Training loss: 2.1525... 2.1267 sec/batch
Epoch: 2/20... Training Step: 338... Training loss: 2.1158... 2.0479 sec/batch
Epoch: 2/20... Training Step: 339... Training loss: 2.1401... 2.2201 sec/batch
Epoch: 2/20... Training Step: 340... Training loss: 2.1107... 2.0363 sec/batch
Epoch: 2/20... Training Step: 341... Training loss: 2.1118... 2.1387 sec/batch
Epoch: 2/20... Training Step: 342... Training loss: 2.0971... 2.0979 sec/batch
Epoch: 2/20... Training Step: 343... Training loss: 2.0938... 2.0996 sec/batch
Epoch: 2/20... Training Step: 344... Training loss: 2.1330... 2.0835 sec/batch
Epoch: 2/20... Training Step: 345... Training loss: 2.1190... 2.0356 sec/batch
Epoch: 2/20... Training Step: 346... Training loss: 2.1315... 2.1023 sec/batch
Epoch: 2/20... Training Step: 347... Training loss: 2.1073... 2.0747 sec/batch
Epoch: 2/20... Training Step: 348... Training loss: 2.0940... 2.0592 sec/batch
Epoch: 2/20... Training Step: 349... Training loss: 2.1115... 2.0044 sec/batch
Epoch: 2/20... Training Step: 350... Training loss: 2.1474... 2.0188 sec/batch
Epoch: 2/20... Training Step: 351... Training loss: 2.1109... 2.1633 sec/batch
Epoch: 2/20... Training Step: 352... Training loss: 2.1232... 2.0466 sec/batch
Epoch: 2/20... Training Step: 353... Training loss: 2.0978... 2.0123 sec/batch
Epoch: 2/20... Training Step: 354... Training loss: 2.0893... 1.9962 sec/batch
Epoch: 2/20... Training Step: 355... Training loss: 2.0905... 2.0694 sec/batch
Epoch: 2/20... Training Step: 356... Training loss: 2.0951... 2.1111 sec/batch
Epoch: 2/20... Training Step: 357... Training loss: 2.0731... 2.0520 sec/batch
Epoch: 2/20... Training Step: 358... Training loss: 2.1174... 2.0318 sec/batch
Epoch: 2/20... Training Step: 359... Training loss: 2.1008... 1.9856 sec/batch
Epoch: 2/20... Training Step: 360... Training loss: 2.0665... 2.1054 sec/batch
Epoch: 2/20... Training Step: 361... Training loss: 2.0931... 2.0430 sec/batch
Epoch: 2/20... Training Step: 362... Training loss: 2.0890... 2.0907 sec/batch
Epoch: 2/20... Training Step: 363... Training loss: 2.0964... 1.9815 sec/batch
Epoch: 2/20... Training Step: 364... Training loss: 2.0770... 2.0830 sec/batch
Epoch: 2/20... Training Step: 365... Training loss: 2.0965... 2.1589 sec/batch
Epoch: 2/20... Training Step: 366... Training loss: 2.1155... 2.0517 sec/batch
Epoch: 2/20... Training Step: 367... Training loss: 2.0692... 2.1399 sec/batch
Epoch: 2/20... Training Step: 368... Training loss: 2.0724... 1.9234 sec/batch
Epoch: 2/20... Training Step: 369... Training loss: 2.0913... 2.1424 sec/batch
Epoch: 2/20... Training Step: 370... Training loss: 2.1100... 2.0324 sec/batch
Epoch: 2/20... Training Step: 371... Training loss: 2.1274... 2.0143 sec/batch
Epoch: 2/20... Training Step: 372... Training loss: 2.1252... 2.2016 sec/batch
Epoch: 2/20... Training Step: 373... Training loss: 2.1128... 2.1002 sec/batch
Epoch: 2/20... Training Step: 374... Training loss: 2.0874... 2.0253 sec/batch
Epoch: 2/20... Training Step: 375... Training loss: 2.0626... 1.9895 sec/batch
Epoch: 2/20... Training Step: 376... Training loss: 2.0594... 2.3482 sec/batch
Epoch: 2/20... Training Step: 377... Training loss: 2.0413... 2.2877 sec/batch
Epoch: 2/20... Training Step: 378... Training loss: 2.0471... 1.9323 sec/batch
Epoch: 2/20... Training Step: 379... Training loss: 2.0538... 1.9655 sec/batch
Epoch: 2/20... Training Step: 380... Training loss: 2.0763... 1.8898 sec/batch
Epoch: 2/20... Training Step: 381... Training loss: 2.0758... 1.9156 sec/batch
Epoch: 2/20... Training Step: 382... Training loss: 2.0868... 1.9598 sec/batch
Epoch: 2/20... Training Step: 383... Training loss: 2.0899... 2.0768 sec/batch
Epoch: 2/20... Training Step: 384... Training loss: 2.0610... 2.0035 sec/batch
Epoch: 2/20... Training Step: 385... Training loss: 2.0585... 1.9350 sec/batch
Epoch: 2/20... Training Step: 386... Training loss: 2.0534... 2.0063 sec/batch
Epoch: 2/20... Training Step: 387... Training loss: 2.0426... 1.9591 sec/batch
Epoch: 2/20... Training Step: 388... Training loss: 2.0547... 1.9975 sec/batch
Epoch: 2/20... Training Step: 389... Training loss: 2.0675... 2.2877 sec/batch
Epoch: 2/20... Training Step: 390... Training loss: 2.0309... 2.2942 sec/batch
Epoch: 2/20... Training Step: 391... Training loss: 2.0552... 2.1031 sec/batch
Epoch: 2/20... Training Step: 392... Training loss: 2.0479... 2.1331 sec/batch
Epoch: 2/20... Training Step: 393... Training loss: 2.0232... 2.2204 sec/batch
Epoch: 2/20... Training Step: 394... Training loss: 2.0471... 2.0505 sec/batch
Epoch: 2/20... Training Step: 395... Training loss: 2.0363... 2.2792 sec/batch
Epoch: 2/20... Training Step: 396... Training loss: 2.0352... 2.2425 sec/batch
Epoch: 3/20... Training Step: 397... Training loss: 2.1768... 2.1503 sec/batch
Epoch: 3/20... Training Step: 398... Training loss: 2.0597... 2.5327 sec/batch
Epoch: 3/20... Training Step: 399... Training loss: 2.0759... 2.1837 sec/batch
Epoch: 3/20... Training Step: 400... Training loss: 2.0766... 2.0770 sec/batch
Epoch: 3/20... Training Step: 401... Training loss: 2.0886... 2.8065 sec/batch
Epoch: 3/20... Training Step: 402... Training loss: 2.0847... 2.2050 sec/batch
Epoch: 3/20... Training Step: 403... Training loss: 2.0864... 2.1813 sec/batch
Epoch: 3/20... Training Step: 404... Training loss: 2.0906... 2.3273 sec/batch
Epoch: 3/20... Training Step: 405... Training loss: 2.1171... 2.0569 sec/batch
Epoch: 3/20... Training Step: 406... Training loss: 2.0795... 1.8914 sec/batch
Epoch: 3/20... Training Step: 407... Training loss: 2.0716... 2.1455 sec/batch
Epoch: 3/20... Training Step: 408... Training loss: 2.0663... 2.1870 sec/batch
Epoch: 3/20... Training Step: 409... Training loss: 2.0887... 2.4975 sec/batch
Epoch: 3/20... Training Step: 410... Training loss: 2.1107... 2.4742 sec/batch
Epoch: 3/20... Training Step: 411... Training loss: 2.0834... 1.8256 sec/batch
Epoch: 3/20... Training Step: 412... Training loss: 2.0560... 1.7357 sec/batch
Epoch: 3/20... Training Step: 413... Training loss: 2.0853... 1.8573 sec/batch
Epoch: 3/20... Training Step: 414... Training loss: 2.1116... 1.8864 sec/batch
Epoch: 3/20... Training Step: 415... Training loss: 2.0715... 2.6628 sec/batch
Epoch: 3/20... Training Step: 416... Training loss: 2.0700... 2.7341 sec/batch
Epoch: 3/20... Training Step: 417... Training loss: 2.0633... 1.7301 sec/batch
Epoch: 3/20... Training Step: 418... Training loss: 2.1155... 1.7584 sec/batch
Epoch: 3/20... Training Step: 419... Training loss: 2.0663... 2.2014 sec/batch
Epoch: 3/20... Training Step: 420... Training loss: 2.0617... 2.1724 sec/batch
Epoch: 3/20... Training Step: 421... Training loss: 2.0574... 3.6361 sec/batch
Epoch: 3/20... Training Step: 422... Training loss: 2.0538... 2.6632 sec/batch
Epoch: 3/20... Training Step: 423... Training loss: 2.0595... 2.3612 sec/batch
Epoch: 3/20... Training Step: 424... Training loss: 2.0705... 2.5692 sec/batch
Epoch: 3/20... Training Step: 425... Training loss: 2.0951... 2.1954 sec/batch
Epoch: 3/20... Training Step: 426... Training loss: 2.0749... 2.4146 sec/batch
Epoch: 3/20... Training Step: 427... Training loss: 2.0506... 2.4039 sec/batch
Epoch: 3/20... Training Step: 428... Training loss: 2.0406... 2.5509 sec/batch
Epoch: 3/20... Training Step: 429... Training loss: 2.0658... 2.5478 sec/batch
Epoch: 3/20... Training Step: 430... Training loss: 2.0838... 3.1571 sec/batch
Epoch: 3/20... Training Step: 431... Training loss: 2.0375... 3.5113 sec/batch
Epoch: 3/20... Training Step: 432... Training loss: 2.0662... 2.6376 sec/batch
Epoch: 3/20... Training Step: 433... Training loss: 2.0444... 2.8733 sec/batch
Epoch: 3/20... Training Step: 434... Training loss: 2.0091... 2.1142 sec/batch
Epoch: 3/20... Training Step: 435... Training loss: 2.0229... 2.4665 sec/batch
Epoch: 3/20... Training Step: 436... Training loss: 2.0219... 3.1557 sec/batch
Epoch: 3/20... Training Step: 437... Training loss: 2.0240... 3.7243 sec/batch
Epoch: 3/20... Training Step: 438... Training loss: 2.0410... 3.5335 sec/batch
Epoch: 3/20... Training Step: 439... Training loss: 2.0156... 3.4895 sec/batch
Epoch: 3/20... Training Step: 440... Training loss: 2.0275... 2.8941 sec/batch
Epoch: 3/20... Training Step: 441... Training loss: 2.0435... 2.9838 sec/batch
Epoch: 3/20... Training Step: 442... Training loss: 2.0019... 4.0644 sec/batch
Epoch: 3/20... Training Step: 443... Training loss: 2.0399... 3.3620 sec/batch
Epoch: 3/20... Training Step: 444... Training loss: 2.0229... 2.4381 sec/batch
Epoch: 3/20... Training Step: 445... Training loss: 2.0346... 2.3956 sec/batch
Epoch: 3/20... Training Step: 446... Training loss: 2.0656... 2.4376 sec/batch
Epoch: 3/20... Training Step: 447... Training loss: 2.0046... 2.6621 sec/batch
Epoch: 3/20... Training Step: 448... Training loss: 2.0809... 2.3592 sec/batch
Epoch: 3/20... Training Step: 449... Training loss: 2.0280... 2.4419 sec/batch
Epoch: 3/20... Training Step: 450... Training loss: 2.0135... 2.3130 sec/batch
Epoch: 3/20... Training Step: 451... Training loss: 2.0225... 2.5560 sec/batch
Epoch: 3/20... Training Step: 452... Training loss: 2.0373... 2.2842 sec/batch
Epoch: 3/20... Training Step: 453... Training loss: 2.0179... 2.4184 sec/batch
Epoch: 3/20... Training Step: 454... Training loss: 2.0160... 2.3650 sec/batch
Epoch: 3/20... Training Step: 455... Training loss: 2.0042... 2.2146 sec/batch
Epoch: 3/20... Training Step: 456... Training loss: 2.0480... 2.2237 sec/batch
Epoch: 3/20... Training Step: 457... Training loss: 2.0107... 2.4614 sec/batch
Epoch: 3/20... Training Step: 458... Training loss: 2.0617... 2.2171 sec/batch
Epoch: 3/20... Training Step: 459... Training loss: 2.0491... 2.3927 sec/batch
Epoch: 3/20... Training Step: 460... Training loss: 2.0366... 2.5030 sec/batch
Epoch: 3/20... Training Step: 461... Training loss: 2.0167... 2.3133 sec/batch
Epoch: 3/20... Training Step: 462... Training loss: 2.0469... 2.1509 sec/batch
Epoch: 3/20... Training Step: 463... Training loss: 2.0242... 2.1234 sec/batch
Epoch: 3/20... Training Step: 464... Training loss: 2.0025... 2.0640 sec/batch
Epoch: 3/20... Training Step: 465... Training loss: 2.0088... 2.4794 sec/batch
Epoch: 3/20... Training Step: 466... Training loss: 2.0009... 2.4477 sec/batch
Epoch: 3/20... Training Step: 467... Training loss: 2.0376... 2.1414 sec/batch
Epoch: 3/20... Training Step: 468... Training loss: 2.0280... 2.1765 sec/batch
Epoch: 3/20... Training Step: 469... Training loss: 2.0317... 2.1451 sec/batch
Epoch: 3/20... Training Step: 470... Training loss: 2.0048... 2.1270 sec/batch
Epoch: 3/20... Training Step: 471... Training loss: 2.0067... 2.2368 sec/batch
Epoch: 3/20... Training Step: 472... Training loss: 2.0381... 2.3191 sec/batch
Epoch: 3/20... Training Step: 473... Training loss: 2.0013... 2.3582 sec/batch
Epoch: 3/20... Training Step: 474... Training loss: 2.0118... 2.1915 sec/batch
Epoch: 3/20... Training Step: 475... Training loss: 1.9818... 2.5617 sec/batch
Epoch: 3/20... Training Step: 476... Training loss: 1.9937... 3.5426 sec/batch
Epoch: 3/20... Training Step: 477... Training loss: 1.9727... 3.3825 sec/batch
Epoch: 3/20... Training Step: 478... Training loss: 2.0268... 2.5173 sec/batch
Epoch: 3/20... Training Step: 479... Training loss: 1.9691... 4.0151 sec/batch
Epoch: 3/20... Training Step: 480... Training loss: 1.9896... 3.6029 sec/batch
Epoch: 3/20... Training Step: 481... Training loss: 1.9524... 2.4683 sec/batch
Epoch: 3/20... Training Step: 482... Training loss: 1.9846... 3.4380 sec/batch
Epoch: 3/20... Training Step: 483... Training loss: 1.9919... 2.5363 sec/batch
Epoch: 3/20... Training Step: 484... Training loss: 1.9747... 2.9440 sec/batch
Epoch: 3/20... Training Step: 485... Training loss: 1.9672... 2.3834 sec/batch
Epoch: 3/20... Training Step: 486... Training loss: 2.0073... 2.5218 sec/batch
Epoch: 3/20... Training Step: 487... Training loss: 1.9685... 2.0903 sec/batch
Epoch: 3/20... Training Step: 488... Training loss: 1.9839... 2.4436 sec/batch
Epoch: 3/20... Training Step: 489... Training loss: 1.9789... 2.9515 sec/batch
Epoch: 3/20... Training Step: 490... Training loss: 1.9773... 2.0478 sec/batch
Epoch: 3/20... Training Step: 491... Training loss: 1.9693... 2.7413 sec/batch
Epoch: 3/20... Training Step: 492... Training loss: 1.9961... 2.2533 sec/batch
Epoch: 3/20... Training Step: 493... Training loss: 1.9844... 2.4380 sec/batch
Epoch: 3/20... Training Step: 494... Training loss: 1.9712... 2.2152 sec/batch
Epoch: 3/20... Training Step: 495... Training loss: 1.9688... 2.6204 sec/batch
Epoch: 3/20... Training Step: 496... Training loss: 1.9496... 2.5417 sec/batch
Epoch: 3/20... Training Step: 497... Training loss: 1.9869... 2.0589 sec/batch
Epoch: 3/20... Training Step: 498... Training loss: 1.9784... 2.2155 sec/batch
Epoch: 3/20... Training Step: 499... Training loss: 1.9642... 2.3986 sec/batch
Epoch: 3/20... Training Step: 500... Training loss: 1.9601... 2.1879 sec/batch
Epoch: 3/20... Training Step: 501... Training loss: 1.9677... 2.9963 sec/batch
Epoch: 3/20... Training Step: 502... Training loss: 1.9926... 2.3876 sec/batch
Epoch: 3/20... Training Step: 503... Training loss: 1.9625... 2.3920 sec/batch
Epoch: 3/20... Training Step: 504... Training loss: 1.9796... 2.3234 sec/batch
Epoch: 3/20... Training Step: 505... Training loss: 1.9861... 2.7686 sec/batch
Epoch: 3/20... Training Step: 506... Training loss: 1.9692... 2.9429 sec/batch
Epoch: 3/20... Training Step: 507... Training loss: 1.9666... 2.9165 sec/batch
Epoch: 3/20... Training Step: 508... Training loss: 1.9702... 3.0782 sec/batch
Epoch: 3/20... Training Step: 509... Training loss: 1.9771... 2.1458 sec/batch
Epoch: 3/20... Training Step: 510... Training loss: 1.9603... 2.4537 sec/batch
Epoch: 3/20... Training Step: 511... Training loss: 1.9515... 2.1588 sec/batch
Epoch: 3/20... Training Step: 512... Training loss: 1.9285... 2.3095 sec/batch
Epoch: 3/20... Training Step: 513... Training loss: 1.9738... 2.3201 sec/batch
Epoch: 3/20... Training Step: 514... Training loss: 1.9676... 2.3646 sec/batch
Epoch: 3/20... Training Step: 515... Training loss: 1.9688... 2.1568 sec/batch
Epoch: 3/20... Training Step: 516... Training loss: 1.9528... 2.2987 sec/batch
Epoch: 3/20... Training Step: 517... Training loss: 1.9866... 2.2531 sec/batch
Epoch: 3/20... Training Step: 518... Training loss: 1.9523... 2.3908 sec/batch
Epoch: 3/20... Training Step: 519... Training loss: 1.9591... 2.2615 sec/batch
Epoch: 3/20... Training Step: 520... Training loss: 1.9816... 2.3790 sec/batch
Epoch: 3/20... Training Step: 521... Training loss: 1.9511... 2.1957 sec/batch
Epoch: 3/20... Training Step: 522... Training loss: 1.9312... 2.1446 sec/batch
Epoch: 3/20... Training Step: 523... Training loss: 1.9736... 2.1208 sec/batch
Epoch: 3/20... Training Step: 524... Training loss: 1.9632... 2.7073 sec/batch
Epoch: 3/20... Training Step: 525... Training loss: 1.9581... 2.0598 sec/batch
Epoch: 3/20... Training Step: 526... Training loss: 1.9602... 2.3730 sec/batch
Epoch: 3/20... Training Step: 527... Training loss: 1.9452... 2.2233 sec/batch
Epoch: 3/20... Training Step: 528... Training loss: 1.9349... 2.2481 sec/batch
Epoch: 3/20... Training Step: 529... Training loss: 1.9798... 2.5100 sec/batch
Epoch: 3/20... Training Step: 530... Training loss: 1.9705... 2.2841 sec/batch
Epoch: 3/20... Training Step: 531... Training loss: 1.9758... 2.4361 sec/batch
Epoch: 3/20... Training Step: 532... Training loss: 1.9748... 2.2959 sec/batch
Epoch: 3/20... Training Step: 533... Training loss: 1.9753... 2.4793 sec/batch
Epoch: 3/20... Training Step: 534... Training loss: 1.9655... 2.3799 sec/batch
Epoch: 3/20... Training Step: 535... Training loss: 1.9812... 2.4206 sec/batch
Epoch: 3/20... Training Step: 536... Training loss: 1.9438... 2.5469 sec/batch
Epoch: 3/20... Training Step: 537... Training loss: 1.9839... 2.3609 sec/batch
Epoch: 3/20... Training Step: 538... Training loss: 1.9555... 2.2969 sec/batch
Epoch: 3/20... Training Step: 539... Training loss: 1.9584... 2.5510 sec/batch
Epoch: 3/20... Training Step: 540... Training loss: 1.9487... 2.4892 sec/batch
Epoch: 3/20... Training Step: 541... Training loss: 1.9371... 2.5219 sec/batch
Epoch: 3/20... Training Step: 542... Training loss: 1.9711... 2.4154 sec/batch
Epoch: 3/20... Training Step: 543... Training loss: 1.9570... 2.5545 sec/batch
Epoch: 3/20... Training Step: 544... Training loss: 1.9718... 2.4949 sec/batch
Epoch: 3/20... Training Step: 545... Training loss: 1.9617... 2.4399 sec/batch
Epoch: 3/20... Training Step: 546... Training loss: 1.9359... 2.2902 sec/batch
Epoch: 3/20... Training Step: 547... Training loss: 1.9246... 2.5115 sec/batch
Epoch: 3/20... Training Step: 548... Training loss: 1.9898... 2.3534 sec/batch
Epoch: 3/20... Training Step: 549... Training loss: 1.9471... 2.8463 sec/batch
Epoch: 3/20... Training Step: 550... Training loss: 1.9595... 2.4626 sec/batch
Epoch: 3/20... Training Step: 551... Training loss: 1.9521... 2.2484 sec/batch
Epoch: 3/20... Training Step: 552... Training loss: 1.9375... 2.6110 sec/batch
Epoch: 3/20... Training Step: 553... Training loss: 1.9307... 2.5219 sec/batch
Epoch: 3/20... Training Step: 554... Training loss: 1.9248... 2.5959 sec/batch
Epoch: 3/20... Training Step: 555... Training loss: 1.9086... 2.1955 sec/batch
Epoch: 3/20... Training Step: 556... Training loss: 1.9639... 2.2747 sec/batch
Epoch: 3/20... Training Step: 557... Training loss: 1.9708... 2.6545 sec/batch
Epoch: 3/20... Training Step: 558... Training loss: 1.9317... 2.4453 sec/batch
Epoch: 3/20... Training Step: 559... Training loss: 1.9485... 2.2323 sec/batch
Epoch: 3/20... Training Step: 560... Training loss: 1.9337... 2.1520 sec/batch
Epoch: 3/20... Training Step: 561... Training loss: 1.9372... 2.1395 sec/batch
Epoch: 3/20... Training Step: 562... Training loss: 1.9322... 2.1819 sec/batch
Epoch: 3/20... Training Step: 563... Training loss: 1.9499... 2.0708 sec/batch
Epoch: 3/20... Training Step: 564... Training loss: 1.9771... 2.3898 sec/batch
Epoch: 3/20... Training Step: 565... Training loss: 1.9364... 2.2000 sec/batch
Epoch: 3/20... Training Step: 566... Training loss: 1.9270... 2.0751 sec/batch
Epoch: 3/20... Training Step: 567... Training loss: 1.9323... 2.3315 sec/batch
Epoch: 3/20... Training Step: 568... Training loss: 1.9344... 2.2792 sec/batch
Epoch: 3/20... Training Step: 569... Training loss: 1.9544... 2.2023 sec/batch
Epoch: 3/20... Training Step: 570... Training loss: 1.9478... 2.4560 sec/batch
Epoch: 3/20... Training Step: 571... Training loss: 1.9548... 2.4708 sec/batch
Epoch: 3/20... Training Step: 572... Training loss: 1.9360... 2.5540 sec/batch
Epoch: 3/20... Training Step: 573... Training loss: 1.9367... 2.3406 sec/batch
Epoch: 3/20... Training Step: 574... Training loss: 1.9384... 3.1864 sec/batch
Epoch: 3/20... Training Step: 575... Training loss: 1.9112... 2.3470 sec/batch
Epoch: 3/20... Training Step: 576... Training loss: 1.8966... 2.4821 sec/batch
Epoch: 3/20... Training Step: 577... Training loss: 1.9001... 2.6148 sec/batch
Epoch: 3/20... Training Step: 578... Training loss: 1.9322... 2.4583 sec/batch
Epoch: 3/20... Training Step: 579... Training loss: 1.9260... 2.2450 sec/batch
Epoch: 3/20... Training Step: 580... Training loss: 1.9500... 2.1460 sec/batch
Epoch: 3/20... Training Step: 581... Training loss: 1.9418... 2.5145 sec/batch
Epoch: 3/20... Training Step: 582... Training loss: 1.9144... 2.2057 sec/batch
Epoch: 3/20... Training Step: 583... Training loss: 1.9342... 2.4850 sec/batch
Epoch: 3/20... Training Step: 584... Training loss: 1.9211... 2.1541 sec/batch
Epoch: 3/20... Training Step: 585... Training loss: 1.9236... 2.7128 sec/batch
Epoch: 3/20... Training Step: 586... Training loss: 1.9317... 2.3073 sec/batch
Epoch: 3/20... Training Step: 587... Training loss: 1.9333... 2.5272 sec/batch
Epoch: 3/20... Training Step: 588... Training loss: 1.8961... 2.8610 sec/batch
Epoch: 3/20... Training Step: 589... Training loss: 1.9255... 2.8020 sec/batch
Epoch: 3/20... Training Step: 590... Training loss: 1.9037... 2.5561 sec/batch
Epoch: 3/20... Training Step: 591... Training loss: 1.8884... 2.2863 sec/batch
Epoch: 3/20... Training Step: 592... Training loss: 1.9246... 2.3632 sec/batch
Epoch: 3/20... Training Step: 593... Training loss: 1.8999... 2.4842 sec/batch
Epoch: 3/20... Training Step: 594... Training loss: 1.9203... 2.5376 sec/batch
Epoch: 4/20... Training Step: 595... Training loss: 1.9492... 2.7197 sec/batch
Epoch: 4/20... Training Step: 596... Training loss: 1.8826... 2.2711 sec/batch
Epoch: 4/20... Training Step: 597... Training loss: 1.8777... 2.6547 sec/batch
Epoch: 4/20... Training Step: 598... Training loss: 1.8927... 2.1838 sec/batch
Epoch: 4/20... Training Step: 599... Training loss: 1.8791... 2.3608 sec/batch
Epoch: 4/20... Training Step: 600... Training loss: 1.8552... 2.2168 sec/batch
Epoch: 4/20... Training Step: 601... Training loss: 1.8766... 2.1820 sec/batch
Epoch: 4/20... Training Step: 602... Training loss: 1.8914... 2.4608 sec/batch
Epoch: 4/20... Training Step: 603... Training loss: 1.9123... 2.1946 sec/batch
Epoch: 4/20... Training Step: 604... Training loss: 1.8810... 2.6991 sec/batch
Epoch: 4/20... Training Step: 605... Training loss: 1.8665... 2.1736 sec/batch
Epoch: 4/20... Training Step: 606... Training loss: 1.8525... 2.4712 sec/batch
Epoch: 4/20... Training Step: 607... Training loss: 1.8819... 2.3727 sec/batch
Epoch: 4/20... Training Step: 608... Training loss: 1.9047... 2.6809 sec/batch
Epoch: 4/20... Training Step: 609... Training loss: 1.8703... 2.3301 sec/batch
Epoch: 4/20... Training Step: 610... Training loss: 1.8502... 2.5480 sec/batch
Epoch: 4/20... Training Step: 611... Training loss: 1.8729... 2.3550 sec/batch
Epoch: 4/20... Training Step: 612... Training loss: 1.9036... 2.3773 sec/batch
Epoch: 4/20... Training Step: 613... Training loss: 1.8696... 2.4960 sec/batch
Epoch: 4/20... Training Step: 614... Training loss: 1.8725... 2.2792 sec/batch
Epoch: 4/20... Training Step: 615... Training loss: 1.8679... 2.2328 sec/batch
Epoch: 4/20... Training Step: 616... Training loss: 1.9104... 2.2310 sec/batch
Epoch: 4/20... Training Step: 617... Training loss: 1.8698... 2.3715 sec/batch
Epoch: 4/20... Training Step: 618... Training loss: 1.8560... 2.2770 sec/batch
Epoch: 4/20... Training Step: 619... Training loss: 1.8772... 2.5669 sec/batch
Epoch: 4/20... Training Step: 620... Training loss: 1.8644... 2.2947 sec/batch
Epoch: 4/20... Training Step: 621... Training loss: 1.8560... 2.4115 sec/batch
Epoch: 4/20... Training Step: 622... Training loss: 1.8781... 2.7200 sec/batch
Epoch: 4/20... Training Step: 623... Training loss: 1.9131... 2.9492 sec/batch
Epoch: 4/20... Training Step: 624... Training loss: 1.8992... 2.4207 sec/batch
Epoch: 4/20... Training Step: 625... Training loss: 1.8747... 2.3881 sec/batch
Epoch: 4/20... Training Step: 626... Training loss: 1.8476... 2.5404 sec/batch
Epoch: 4/20... Training Step: 627... Training loss: 1.8830... 2.1808 sec/batch
Epoch: 4/20... Training Step: 628... Training loss: 1.8987... 2.4131 sec/batch
Epoch: 4/20... Training Step: 629... Training loss: 1.8522... 3.0767 sec/batch
Epoch: 4/20... Training Step: 630... Training loss: 1.8724... 2.7377 sec/batch
Epoch: 4/20... Training Step: 631... Training loss: 1.8422... 2.3774 sec/batch
Epoch: 4/20... Training Step: 632... Training loss: 1.8230... 2.1569 sec/batch
Epoch: 4/20... Training Step: 633... Training loss: 1.8325... 2.3105 sec/batch
Epoch: 4/20... Training Step: 634... Training loss: 1.8338... 3.9858 sec/batch
Epoch: 4/20... Training Step: 635... Training loss: 1.8400... 3.1261 sec/batch
Epoch: 4/20... Training Step: 636... Training loss: 1.8755... 2.5155 sec/batch
Epoch: 4/20... Training Step: 637... Training loss: 1.8359... 2.1761 sec/batch
Epoch: 4/20... Training Step: 638... Training loss: 1.8361... 1.9148 sec/batch
Epoch: 4/20... Training Step: 639... Training loss: 1.8736... 1.8037 sec/batch
Epoch: 4/20... Training Step: 640... Training loss: 1.8213... 1.7118 sec/batch
Epoch: 4/20... Training Step: 641... Training loss: 1.8632... 1.6896 sec/batch
Epoch: 4/20... Training Step: 642... Training loss: 1.8479... 1.7223 sec/batch
Epoch: 4/20... Training Step: 643... Training loss: 1.8450... 2.5450 sec/batch
Epoch: 4/20... Training Step: 644... Training loss: 1.9069... 1.7767 sec/batch
Epoch: 4/20... Training Step: 645... Training loss: 1.8422... 1.8991 sec/batch
Epoch: 4/20... Training Step: 646... Training loss: 1.9173... 1.7784 sec/batch
Epoch: 4/20... Training Step: 647... Training loss: 1.8578... 1.8403 sec/batch
Epoch: 4/20... Training Step: 648... Training loss: 1.8533... 1.7845 sec/batch
Epoch: 4/20... Training Step: 649... Training loss: 1.8394... 1.7447 sec/batch
Epoch: 4/20... Training Step: 650... Training loss: 1.8533... 1.7945 sec/batch
Epoch: 4/20... Training Step: 651... Training loss: 1.8635... 2.0277 sec/batch
Epoch: 4/20... Training Step: 652... Training loss: 1.8335... 2.0753 sec/batch
Epoch: 4/20... Training Step: 653... Training loss: 1.8292... 1.8654 sec/batch
Epoch: 4/20... Training Step: 654... Training loss: 1.8779... 2.0531 sec/batch
Epoch: 4/20... Training Step: 655... Training loss: 1.8479... 1.9124 sec/batch
Epoch: 4/20... Training Step: 656... Training loss: 1.9015... 2.1048 sec/batch
Epoch: 4/20... Training Step: 657... Training loss: 1.8884... 2.1180 sec/batch
Epoch: 4/20... Training Step: 658... Training loss: 1.8647... 2.4419 sec/batch
Epoch: 4/20... Training Step: 659... Training loss: 1.8524... 2.4276 sec/batch
Epoch: 4/20... Training Step: 660... Training loss: 1.8871... 2.6080 sec/batch
Epoch: 4/20... Training Step: 661... Training loss: 1.8666... 2.4792 sec/batch
Epoch: 4/20... Training Step: 662... Training loss: 1.8295... 1.6730 sec/batch
Epoch: 4/20... Training Step: 663... Training loss: 1.8346... 1.6534 sec/batch
Epoch: 4/20... Training Step: 664... Training loss: 1.8495... 1.7747 sec/batch
Epoch: 4/20... Training Step: 665... Training loss: 1.8826... 1.7987 sec/batch
Epoch: 4/20... Training Step: 666... Training loss: 1.8733... 1.7270 sec/batch
Epoch: 4/20... Training Step: 667... Training loss: 1.8788... 1.7310 sec/batch
Epoch: 4/20... Training Step: 668... Training loss: 1.8553... 1.9741 sec/batch
Epoch: 4/20... Training Step: 669... Training loss: 1.8563... 2.0294 sec/batch
Epoch: 4/20... Training Step: 670... Training loss: 1.8717... 1.7427 sec/batch
Epoch: 4/20... Training Step: 671... Training loss: 1.8491... 1.7013 sec/batch
Epoch: 4/20... Training Step: 672... Training loss: 1.8603... 2.3847 sec/batch
Epoch: 4/20... Training Step: 673... Training loss: 1.8194... 2.6886 sec/batch
Epoch: 4/20... Training Step: 674... Training loss: 1.8297... 2.6755 sec/batch
Epoch: 4/20... Training Step: 675... Training loss: 1.8136... 2.8939 sec/batch
Epoch: 4/20... Training Step: 676... Training loss: 1.8648... 2.2125 sec/batch
Epoch: 4/20... Training Step: 677... Training loss: 1.8149... 2.1777 sec/batch
Epoch: 4/20... Training Step: 678... Training loss: 1.8494... 2.9936 sec/batch
Epoch: 4/20... Training Step: 679... Training loss: 1.7997... 2.3241 sec/batch
Epoch: 4/20... Training Step: 680... Training loss: 1.8129... 2.7047 sec/batch
Epoch: 4/20... Training Step: 681... Training loss: 1.8289... 1.9135 sec/batch
Epoch: 4/20... Training Step: 682... Training loss: 1.8176... 1.7511 sec/batch
Epoch: 4/20... Training Step: 683... Training loss: 1.7988... 1.6895 sec/batch
Epoch: 4/20... Training Step: 684... Training loss: 1.8505... 1.7942 sec/batch
Epoch: 4/20... Training Step: 685... Training loss: 1.8147... 1.6127 sec/batch
Epoch: 4/20... Training Step: 686... Training loss: 1.8309... 1.7914 sec/batch
Epoch: 4/20... Training Step: 687... Training loss: 1.8166... 1.7448 sec/batch
Epoch: 4/20... Training Step: 688... Training loss: 1.8087... 1.8180 sec/batch
Epoch: 4/20... Training Step: 689... Training loss: 1.8086... 1.8203 sec/batch
Epoch: 4/20... Training Step: 690... Training loss: 1.8371... 1.6332 sec/batch
Epoch: 4/20... Training Step: 691... Training loss: 1.8320... 1.7692 sec/batch
Epoch: 4/20... Training Step: 692... Training loss: 1.8090... 1.7422 sec/batch
Epoch: 4/20... Training Step: 693... Training loss: 1.8265... 1.8581 sec/batch
Epoch: 4/20... Training Step: 694... Training loss: 1.7958... 1.6204 sec/batch
Epoch: 4/20... Training Step: 695... Training loss: 1.8241... 1.5990 sec/batch
Epoch: 4/20... Training Step: 696... Training loss: 1.8148... 1.5971 sec/batch
Epoch: 4/20... Training Step: 697... Training loss: 1.8099... 1.9723 sec/batch
Epoch: 4/20... Training Step: 698... Training loss: 1.8072... 2.8427 sec/batch
Epoch: 4/20... Training Step: 699... Training loss: 1.8181... 1.9762 sec/batch
Epoch: 4/20... Training Step: 700... Training loss: 1.8298... 1.7520 sec/batch
Epoch: 4/20... Training Step: 701... Training loss: 1.8178... 1.7815 sec/batch
Epoch: 4/20... Training Step: 702... Training loss: 1.8235... 1.6866 sec/batch
Epoch: 4/20... Training Step: 703... Training loss: 1.8269... 2.1249 sec/batch
Epoch: 4/20... Training Step: 704... Training loss: 1.8292... 2.3222 sec/batch
Epoch: 4/20... Training Step: 705... Training loss: 1.8133... 2.3084 sec/batch
Epoch: 4/20... Training Step: 706... Training loss: 1.8111... 2.0347 sec/batch
Epoch: 4/20... Training Step: 707... Training loss: 1.8205... 2.0583 sec/batch
Epoch: 4/20... Training Step: 708... Training loss: 1.8104... 2.5126 sec/batch
Epoch: 4/20... Training Step: 709... Training loss: 1.8006... 2.5983 sec/batch
Epoch: 4/20... Training Step: 710... Training loss: 1.7915... 2.4854 sec/batch
Epoch: 4/20... Training Step: 711... Training loss: 1.8226... 2.7666 sec/batch
Epoch: 4/20... Training Step: 712... Training loss: 1.8102... 1.9521 sec/batch
Epoch: 4/20... Training Step: 713... Training loss: 1.8135... 2.0943 sec/batch
Epoch: 4/20... Training Step: 714... Training loss: 1.8136... 2.3485 sec/batch
Epoch: 4/20... Training Step: 715... Training loss: 1.8361... 2.2357 sec/batch
Epoch: 4/20... Training Step: 716... Training loss: 1.7917... 2.0458 sec/batch
Epoch: 4/20... Training Step: 717... Training loss: 1.8045... 2.1499 sec/batch
Epoch: 4/20... Training Step: 718... Training loss: 1.8369... 2.0593 sec/batch
Epoch: 4/20... Training Step: 719... Training loss: 1.8087... 2.1698 sec/batch
Epoch: 4/20... Training Step: 720... Training loss: 1.7692... 2.0276 sec/batch
Epoch: 4/20... Training Step: 721... Training loss: 1.8247... 2.1220 sec/batch
Epoch: 4/20... Training Step: 722... Training loss: 1.8312... 1.9966 sec/batch
Epoch: 4/20... Training Step: 723... Training loss: 1.8129... 2.3838 sec/batch
Epoch: 4/20... Training Step: 724... Training loss: 1.8033... 2.0821 sec/batch
Epoch: 4/20... Training Step: 725... Training loss: 1.7910... 2.1119 sec/batch
Epoch: 4/20... Training Step: 726... Training loss: 1.7888... 2.0558 sec/batch
Epoch: 4/20... Training Step: 727... Training loss: 1.8220... 2.0737 sec/batch
Epoch: 4/20... Training Step: 728... Training loss: 1.8315... 2.1293 sec/batch
Epoch: 4/20... Training Step: 729... Training loss: 1.8220... 2.2006 sec/batch
Epoch: 4/20... Training Step: 730... Training loss: 1.8167... 1.9947 sec/batch
Epoch: 4/20... Training Step: 731... Training loss: 1.8377... 2.0546 sec/batch
Epoch: 4/20... Training Step: 732... Training loss: 1.8061... 1.9783 sec/batch
Epoch: 4/20... Training Step: 733... Training loss: 1.8315... 1.9661 sec/batch
Epoch: 4/20... Training Step: 734... Training loss: 1.7952... 2.0029 sec/batch
Epoch: 4/20... Training Step: 735... Training loss: 1.8475... 1.9724 sec/batch
Epoch: 4/20... Training Step: 736... Training loss: 1.8079... 2.4360 sec/batch
Epoch: 4/20... Training Step: 737... Training loss: 1.8137... 1.9377 sec/batch
Epoch: 4/20... Training Step: 738... Training loss: 1.8091... 1.9428 sec/batch
Epoch: 4/20... Training Step: 739... Training loss: 1.8003... 2.1270 sec/batch
Epoch: 4/20... Training Step: 740... Training loss: 1.8297... 2.0771 sec/batch
Epoch: 4/20... Training Step: 741... Training loss: 1.8239... 2.0302 sec/batch
Epoch: 4/20... Training Step: 742... Training loss: 1.8318... 1.9213 sec/batch
Epoch: 4/20... Training Step: 743... Training loss: 1.8109... 2.3819 sec/batch
Epoch: 4/20... Training Step: 744... Training loss: 1.8117... 2.4812 sec/batch
Epoch: 4/20... Training Step: 745... Training loss: 1.7844... 2.2370 sec/batch
Epoch: 4/20... Training Step: 746... Training loss: 1.8323... 1.9730 sec/batch
Epoch: 4/20... Training Step: 747... Training loss: 1.8109... 1.9374 sec/batch
Epoch: 4/20... Training Step: 748... Training loss: 1.8188... 1.9590 sec/batch
Epoch: 4/20... Training Step: 749... Training loss: 1.8055... 1.9197 sec/batch
Epoch: 4/20... Training Step: 750... Training loss: 1.7999... 1.9198 sec/batch
Epoch: 4/20... Training Step: 751... Training loss: 1.8131... 1.9253 sec/batch
Epoch: 4/20... Training Step: 752... Training loss: 1.8042... 1.9488 sec/batch
Epoch: 4/20... Training Step: 753... Training loss: 1.7745... 2.0045 sec/batch
Epoch: 4/20... Training Step: 754... Training loss: 1.8228... 1.9745 sec/batch
Epoch: 4/20... Training Step: 755... Training loss: 1.8304... 1.9446 sec/batch
Epoch: 4/20... Training Step: 756... Training loss: 1.8014... 1.9313 sec/batch
Epoch: 4/20... Training Step: 757... Training loss: 1.8141... 1.9217 sec/batch
Epoch: 4/20... Training Step: 758... Training loss: 1.7995... 1.9226 sec/batch
Epoch: 4/20... Training Step: 759... Training loss: 1.8000... 1.9498 sec/batch
Epoch: 4/20... Training Step: 760... Training loss: 1.8054... 1.9246 sec/batch
Epoch: 4/20... Training Step: 761... Training loss: 1.8196... 1.9451 sec/batch
Epoch: 4/20... Training Step: 762... Training loss: 1.8447... 1.9174 sec/batch
Epoch: 4/20... Training Step: 763... Training loss: 1.8013... 1.9489 sec/batch
Epoch: 4/20... Training Step: 764... Training loss: 1.7911... 1.9457 sec/batch
Epoch: 4/20... Training Step: 765... Training loss: 1.7908... 1.9461 sec/batch
Epoch: 4/20... Training Step: 766... Training loss: 1.8141... 1.9422 sec/batch
Epoch: 4/20... Training Step: 767... Training loss: 1.8218... 1.9270 sec/batch
Epoch: 4/20... Training Step: 768... Training loss: 1.8246... 1.9206 sec/batch
Epoch: 4/20... Training Step: 769... Training loss: 1.8168... 1.9690 sec/batch
Epoch: 4/20... Training Step: 770... Training loss: 1.7873... 1.9132 sec/batch
Epoch: 4/20... Training Step: 771... Training loss: 1.7886... 1.9672 sec/batch
Epoch: 4/20... Training Step: 772... Training loss: 1.8115... 1.9253 sec/batch
Epoch: 4/20... Training Step: 773... Training loss: 1.7721... 2.0768 sec/batch
Epoch: 4/20... Training Step: 774... Training loss: 1.7645... 2.8624 sec/batch
Epoch: 4/20... Training Step: 775... Training loss: 1.7651... 1.9837 sec/batch
Epoch: 4/20... Training Step: 776... Training loss: 1.7899... 1.7820 sec/batch
Epoch: 4/20... Training Step: 777... Training loss: 1.7871... 1.6254 sec/batch
Epoch: 4/20... Training Step: 778... Training loss: 1.8021... 1.8746 sec/batch
Epoch: 4/20... Training Step: 779... Training loss: 1.8012... 2.5603 sec/batch
Epoch: 4/20... Training Step: 780... Training loss: 1.7819... 2.5835 sec/batch
Epoch: 4/20... Training Step: 781... Training loss: 1.8085... 2.0280 sec/batch
Epoch: 4/20... Training Step: 782... Training loss: 1.7835... 2.3202 sec/batch
Epoch: 4/20... Training Step: 783... Training loss: 1.8006... 2.2455 sec/batch
Epoch: 4/20... Training Step: 784... Training loss: 1.7996... 1.7993 sec/batch
Epoch: 4/20... Training Step: 785... Training loss: 1.7959... 2.1988 sec/batch
Epoch: 4/20... Training Step: 786... Training loss: 1.7740... 2.4777 sec/batch
Epoch: 4/20... Training Step: 787... Training loss: 1.7920... 1.6222 sec/batch
Epoch: 4/20... Training Step: 788... Training loss: 1.7627... 1.5804 sec/batch
Epoch: 4/20... Training Step: 789... Training loss: 1.7513... 1.6259 sec/batch
Epoch: 4/20... Training Step: 790... Training loss: 1.7834... 1.7051 sec/batch
Epoch: 4/20... Training Step: 791... Training loss: 1.7697... 1.7268 sec/batch
Epoch: 4/20... Training Step: 792... Training loss: 1.7748... 1.7122 sec/batch
Epoch: 5/20... Training Step: 793... Training loss: 1.8568... 1.6118 sec/batch
Epoch: 5/20... Training Step: 794... Training loss: 1.7815... 1.6109 sec/batch
Epoch: 5/20... Training Step: 795... Training loss: 1.7724... 1.6305 sec/batch
Epoch: 5/20... Training Step: 796... Training loss: 1.7964... 1.5976 sec/batch
Epoch: 5/20... Training Step: 797... Training loss: 1.7689... 1.5923 sec/batch
Epoch: 5/20... Training Step: 798... Training loss: 1.7492... 1.5791 sec/batch
Epoch: 5/20... Training Step: 799... Training loss: 1.7762... 1.5650 sec/batch
Epoch: 5/20... Training Step: 800... Training loss: 1.7736... 1.6022 sec/batch
Epoch: 5/20... Training Step: 801... Training loss: 1.8074... 1.6042 sec/batch
Epoch: 5/20... Training Step: 802... Training loss: 1.7765... 1.5671 sec/batch
Epoch: 5/20... Training Step: 803... Training loss: 1.7541... 1.6052 sec/batch
Epoch: 5/20... Training Step: 804... Training loss: 1.7619... 1.5866 sec/batch
Epoch: 5/20... Training Step: 805... Training loss: 1.7794... 1.5618 sec/batch
Epoch: 5/20... Training Step: 806... Training loss: 1.8158... 1.5824 sec/batch
Epoch: 5/20... Training Step: 807... Training loss: 1.7707... 1.5722 sec/batch
Epoch: 5/20... Training Step: 808... Training loss: 1.7504... 1.5844 sec/batch
Epoch: 5/20... Training Step: 809... Training loss: 1.7816... 1.5747 sec/batch
Epoch: 5/20... Training Step: 810... Training loss: 1.8057... 1.5780 sec/batch
Epoch: 5/20... Training Step: 811... Training loss: 1.7812... 1.5680 sec/batch
Epoch: 5/20... Training Step: 812... Training loss: 1.7818... 1.5675 sec/batch
Epoch: 5/20... Training Step: 813... Training loss: 1.7594... 1.5804 sec/batch
Epoch: 5/20... Training Step: 814... Training loss: 1.8193... 1.5659 sec/batch
Epoch: 5/20... Training Step: 815... Training loss: 1.7787... 1.5729 sec/batch
Epoch: 5/20... Training Step: 816... Training loss: 1.7711... 1.5661 sec/batch
Epoch: 5/20... Training Step: 817... Training loss: 1.7753... 1.5765 sec/batch
Epoch: 5/20... Training Step: 818... Training loss: 1.7472... 1.5769 sec/batch
Epoch: 5/20... Training Step: 819... Training loss: 1.7592... 1.6027 sec/batch
Epoch: 5/20... Training Step: 820... Training loss: 1.7952... 1.5935 sec/batch
Epoch: 5/20... Training Step: 821... Training loss: 1.8014... 1.6402 sec/batch
Epoch: 5/20... Training Step: 822... Training loss: 1.7947... 1.6592 sec/batch
Epoch: 5/20... Training Step: 823... Training loss: 1.7764... 1.6454 sec/batch
Epoch: 5/20... Training Step: 824... Training loss: 1.7568... 1.6444 sec/batch
Epoch: 5/20... Training Step: 825... Training loss: 1.7936... 1.7417 sec/batch
Epoch: 5/20... Training Step: 826... Training loss: 1.7975... 1.6135 sec/batch
Epoch: 5/20... Training Step: 827... Training loss: 1.7517... 1.7180 sec/batch
Epoch: 5/20... Training Step: 828... Training loss: 1.7805... 1.7050 sec/batch
Epoch: 5/20... Training Step: 829... Training loss: 1.7470... 1.6016 sec/batch
Epoch: 5/20... Training Step: 830... Training loss: 1.7420... 1.6069 sec/batch
Epoch: 5/20... Training Step: 831... Training loss: 1.7359... 1.5742 sec/batch
Epoch: 5/20... Training Step: 832... Training loss: 1.7546... 1.5787 sec/batch
Epoch: 5/20... Training Step: 833... Training loss: 1.7344... 1.5763 sec/batch
Epoch: 5/20... Training Step: 834... Training loss: 1.7914... 1.5919 sec/batch
Epoch: 5/20... Training Step: 835... Training loss: 1.7543... 1.5833 sec/batch
Epoch: 5/20... Training Step: 836... Training loss: 1.7314... 1.5868 sec/batch
Epoch: 5/20... Training Step: 837... Training loss: 1.7820... 1.6378 sec/batch
Epoch: 5/20... Training Step: 838... Training loss: 1.7264... 2.2606 sec/batch
Epoch: 5/20... Training Step: 839... Training loss: 1.7634... 2.1282 sec/batch
Epoch: 5/20... Training Step: 840... Training loss: 1.7503... 2.2883 sec/batch
Epoch: 5/20... Training Step: 841... Training loss: 1.7523... 1.6258 sec/batch
Epoch: 5/20... Training Step: 842... Training loss: 1.8082... 1.9996 sec/batch
Epoch: 5/20... Training Step: 843... Training loss: 1.7442... 1.8995 sec/batch
Epoch: 5/20... Training Step: 844... Training loss: 1.8199... 1.6702 sec/batch
Epoch: 5/20... Training Step: 845... Training loss: 1.7813... 1.6252 sec/batch
Epoch: 5/20... Training Step: 846... Training loss: 1.7640... 1.5958 sec/batch
Epoch: 5/20... Training Step: 847... Training loss: 1.7444... 1.5772 sec/batch
Epoch: 5/20... Training Step: 848... Training loss: 1.7598... 1.6556 sec/batch
Epoch: 5/20... Training Step: 849... Training loss: 1.7780... 1.6903 sec/batch
Epoch: 5/20... Training Step: 850... Training loss: 1.7535... 1.5837 sec/batch
Epoch: 5/20... Training Step: 851... Training loss: 1.7483... 1.5611 sec/batch
Epoch: 5/20... Training Step: 852... Training loss: 1.7937... 1.6747 sec/batch
Epoch: 5/20... Training Step: 853... Training loss: 1.7692... 1.9975 sec/batch
Epoch: 5/20... Training Step: 854... Training loss: 1.8048... 2.1526 sec/batch
Epoch: 5/20... Training Step: 855... Training loss: 1.7956... 1.9838 sec/batch
Epoch: 5/20... Training Step: 856... Training loss: 1.7808... 1.6770 sec/batch
Epoch: 5/20... Training Step: 857... Training loss: 1.7584... 1.6009 sec/batch
Epoch: 5/20... Training Step: 858... Training loss: 1.7966... 2.5707 sec/batch
Epoch: 5/20... Training Step: 859... Training loss: 1.7827... 2.5227 sec/batch
Epoch: 5/20... Training Step: 860... Training loss: 1.7339... 2.7457 sec/batch
Epoch: 5/20... Training Step: 861... Training loss: 1.7532... 2.8049 sec/batch
Epoch: 5/20... Training Step: 862... Training loss: 1.7449... 2.3021 sec/batch
Epoch: 5/20... Training Step: 863... Training loss: 1.7970... 2.1119 sec/batch
Epoch: 5/20... Training Step: 864... Training loss: 1.7811... 1.9675 sec/batch
Epoch: 5/20... Training Step: 865... Training loss: 1.7940... 1.8450 sec/batch
Epoch: 5/20... Training Step: 866... Training loss: 1.7579... 2.3406 sec/batch
Epoch: 5/20... Training Step: 867... Training loss: 1.7668... 1.6328 sec/batch
Epoch: 5/20... Training Step: 868... Training loss: 1.7793... 1.6742 sec/batch
Epoch: 5/20... Training Step: 869... Training loss: 1.7632... 1.5911 sec/batch
Epoch: 5/20... Training Step: 870... Training loss: 1.7608... 1.5919 sec/batch
Epoch: 5/20... Training Step: 871... Training loss: 1.7194... 1.5896 sec/batch
Epoch: 5/20... Training Step: 872... Training loss: 1.7482... 1.5928 sec/batch
Epoch: 5/20... Training Step: 873... Training loss: 1.7283... 1.5807 sec/batch
Epoch: 5/20... Training Step: 874... Training loss: 1.7628... 1.5992 sec/batch
Epoch: 5/20... Training Step: 875... Training loss: 1.7221... 1.6003 sec/batch
Epoch: 5/20... Training Step: 876... Training loss: 1.7459... 1.5952 sec/batch
Epoch: 5/20... Training Step: 877... Training loss: 1.7181... 1.5946 sec/batch
Epoch: 5/20... Training Step: 878... Training loss: 1.7287... 1.6083 sec/batch
Epoch: 5/20... Training Step: 879... Training loss: 1.7344... 1.6292 sec/batch
Epoch: 5/20... Training Step: 880... Training loss: 1.7350... 1.5731 sec/batch
Epoch: 5/20... Training Step: 881... Training loss: 1.7150... 1.5984 sec/batch
Epoch: 5/20... Training Step: 882... Training loss: 1.7719... 1.6450 sec/batch
Epoch: 5/20... Training Step: 883... Training loss: 1.7228... 1.6606 sec/batch
Epoch: 5/20... Training Step: 884... Training loss: 1.7338... 2.1287 sec/batch
Epoch: 5/20... Training Step: 885... Training loss: 1.7259... 2.0331 sec/batch
Epoch: 5/20... Training Step: 886... Training loss: 1.7188... 2.2518 sec/batch
Epoch: 5/20... Training Step: 887... Training loss: 1.7293... 1.6755 sec/batch
Epoch: 5/20... Training Step: 888... Training loss: 1.7596... 1.6383 sec/batch
Epoch: 5/20... Training Step: 889... Training loss: 1.7431... 1.6018 sec/batch
Epoch: 5/20... Training Step: 890... Training loss: 1.7105... 1.6023 sec/batch
Epoch: 5/20... Training Step: 891... Training loss: 1.7298... 1.6083 sec/batch
Epoch: 5/20... Training Step: 892... Training loss: 1.7098... 1.6214 sec/batch
Epoch: 5/20... Training Step: 893... Training loss: 1.7483... 1.6060 sec/batch
Epoch: 5/20... Training Step: 894... Training loss: 1.7267... 1.6226 sec/batch
Epoch: 5/20... Training Step: 895... Training loss: 1.7240... 1.6334 sec/batch
Epoch: 5/20... Training Step: 896... Training loss: 1.7267... 1.5746 sec/batch
Epoch: 5/20... Training Step: 897... Training loss: 1.7251... 1.5859 sec/batch
Epoch: 5/20... Training Step: 898... Training loss: 1.7370... 1.6063 sec/batch
Epoch: 5/20... Training Step: 899... Training loss: 1.7259... 1.6276 sec/batch
Epoch: 5/20... Training Step: 900... Training loss: 1.7415... 1.6281 sec/batch
Epoch: 5/20... Training Step: 901... Training loss: 1.7370... 1.6065 sec/batch
Epoch: 5/20... Training Step: 902... Training loss: 1.7555... 1.5985 sec/batch
Epoch: 5/20... Training Step: 903... Training loss: 1.7354... 1.5776 sec/batch
Epoch: 5/20... Training Step: 904... Training loss: 1.7205... 1.6444 sec/batch
Epoch: 5/20... Training Step: 905... Training loss: 1.7489... 1.5963 sec/batch
Epoch: 5/20... Training Step: 906... Training loss: 1.7168... 1.5757 sec/batch
Epoch: 5/20... Training Step: 907... Training loss: 1.7174... 1.5945 sec/batch
Epoch: 5/20... Training Step: 908... Training loss: 1.7120... 1.9104 sec/batch
Epoch: 5/20... Training Step: 909... Training loss: 1.7511... 1.8667 sec/batch
Epoch: 5/20... Training Step: 910... Training loss: 1.7389... 2.2085 sec/batch
Epoch: 5/20... Training Step: 911... Training loss: 1.7356... 2.5748 sec/batch
Epoch: 5/20... Training Step: 912... Training loss: 1.7282... 2.7134 sec/batch
Epoch: 5/20... Training Step: 913... Training loss: 1.7513... 1.8820 sec/batch
Epoch: 5/20... Training Step: 914... Training loss: 1.7081... 2.5158 sec/batch
Epoch: 5/20... Training Step: 915... Training loss: 1.7161... 1.8463 sec/batch
Epoch: 5/20... Training Step: 916... Training loss: 1.7562... 2.2526 sec/batch
Epoch: 5/20... Training Step: 917... Training loss: 1.7254... 2.0081 sec/batch
Epoch: 5/20... Training Step: 918... Training loss: 1.6929... 1.9963 sec/batch
Epoch: 5/20... Training Step: 919... Training loss: 1.7577... 1.9949 sec/batch
Epoch: 5/20... Training Step: 920... Training loss: 1.7421... 1.8850 sec/batch
Epoch: 5/20... Training Step: 921... Training loss: 1.7205... 1.8703 sec/batch
Epoch: 5/20... Training Step: 922... Training loss: 1.7174... 1.7569 sec/batch
Epoch: 5/20... Training Step: 923... Training loss: 1.7050... 1.8150 sec/batch
Epoch: 5/20... Training Step: 924... Training loss: 1.7111... 1.9369 sec/batch
Epoch: 5/20... Training Step: 925... Training loss: 1.7443... 1.8334 sec/batch
Epoch: 5/20... Training Step: 926... Training loss: 1.7437... 2.0825 sec/batch
Epoch: 5/20... Training Step: 927... Training loss: 1.7389... 1.9937 sec/batch
Epoch: 5/20... Training Step: 928... Training loss: 1.7393... 3.9284 sec/batch
Epoch: 5/20... Training Step: 929... Training loss: 1.7592... 3.4461 sec/batch
Epoch: 5/20... Training Step: 930... Training loss: 1.7321... 2.2478 sec/batch
Epoch: 5/20... Training Step: 931... Training loss: 1.7524... 2.8537 sec/batch
Epoch: 5/20... Training Step: 932... Training loss: 1.7222... 3.1908 sec/batch
Epoch: 5/20... Training Step: 933... Training loss: 1.7735... 2.6702 sec/batch
Epoch: 5/20... Training Step: 934... Training loss: 1.7345... 2.0984 sec/batch
Epoch: 5/20... Training Step: 935... Training loss: 1.7312... 1.6210 sec/batch
Epoch: 5/20... Training Step: 936... Training loss: 1.7376... 1.7400 sec/batch
Epoch: 5/20... Training Step: 937... Training loss: 1.7115... 1.9285 sec/batch
Epoch: 5/20... Training Step: 938... Training loss: 1.7532... 2.0917 sec/batch
Epoch: 5/20... Training Step: 939... Training loss: 1.7428... 1.8649 sec/batch
Epoch: 5/20... Training Step: 940... Training loss: 1.7669... 2.0821 sec/batch
Epoch: 5/20... Training Step: 941... Training loss: 1.7364... 1.8653 sec/batch
Epoch: 5/20... Training Step: 942... Training loss: 1.7222... 2.3830 sec/batch
Epoch: 5/20... Training Step: 943... Training loss: 1.7020... 1.9057 sec/batch
Epoch: 5/20... Training Step: 944... Training loss: 1.7434... 1.8464 sec/batch
Epoch: 5/20... Training Step: 945... Training loss: 1.7363... 1.6879 sec/batch
Epoch: 5/20... Training Step: 946... Training loss: 1.7374... 2.0600 sec/batch
Epoch: 5/20... Training Step: 947... Training loss: 1.7356... 1.6270 sec/batch
Epoch: 5/20... Training Step: 948... Training loss: 1.7384... 1.5824 sec/batch
Epoch: 5/20... Training Step: 949... Training loss: 1.7405... 1.9861 sec/batch
Epoch: 5/20... Training Step: 950... Training loss: 1.7294... 2.1644 sec/batch
Epoch: 5/20... Training Step: 951... Training loss: 1.6908... 1.9242 sec/batch
Epoch: 5/20... Training Step: 952... Training loss: 1.7503... 2.1178 sec/batch
Epoch: 5/20... Training Step: 953... Training loss: 1.7643... 1.9491 sec/batch
Epoch: 5/20... Training Step: 954... Training loss: 1.7178... 2.0956 sec/batch
Epoch: 5/20... Training Step: 955... Training loss: 1.7536... 1.6532 sec/batch
Epoch: 5/20... Training Step: 956... Training loss: 1.7276... 1.7369 sec/batch
Epoch: 5/20... Training Step: 957... Training loss: 1.7157... 1.6619 sec/batch
Epoch: 5/20... Training Step: 958... Training loss: 1.7250... 1.6067 sec/batch
Epoch: 5/20... Training Step: 959... Training loss: 1.7355... 2.1594 sec/batch
Epoch: 5/20... Training Step: 960... Training loss: 1.7940... 1.9281 sec/batch
Epoch: 5/20... Training Step: 961... Training loss: 1.7211... 1.9365 sec/batch
Epoch: 5/20... Training Step: 962... Training loss: 1.7308... 2.3711 sec/batch
Epoch: 5/20... Training Step: 963... Training loss: 1.7128... 2.1136 sec/batch
Epoch: 5/20... Training Step: 964... Training loss: 1.7153... 1.6096 sec/batch
Epoch: 5/20... Training Step: 965... Training loss: 1.7560... 1.6393 sec/batch
Epoch: 5/20... Training Step: 966... Training loss: 1.7453... 1.6837 sec/batch
Epoch: 5/20... Training Step: 967... Training loss: 1.7525... 1.8199 sec/batch
Epoch: 5/20... Training Step: 968... Training loss: 1.7050... 1.8878 sec/batch
Epoch: 5/20... Training Step: 969... Training loss: 1.7139... 1.7241 sec/batch
Epoch: 5/20... Training Step: 970... Training loss: 1.7486... 2.9181 sec/batch
Epoch: 5/20... Training Step: 971... Training loss: 1.7068... 1.8317 sec/batch
Epoch: 5/20... Training Step: 972... Training loss: 1.6858... 2.1576 sec/batch
Epoch: 5/20... Training Step: 973... Training loss: 1.7002... 1.7012 sec/batch
Epoch: 5/20... Training Step: 974... Training loss: 1.7180... 2.0090 sec/batch
Epoch: 5/20... Training Step: 975... Training loss: 1.7192... 1.6395 sec/batch
Epoch: 5/20... Training Step: 976... Training loss: 1.7326... 1.5983 sec/batch
Epoch: 5/20... Training Step: 977... Training loss: 1.7296... 1.6038 sec/batch
Epoch: 5/20... Training Step: 978... Training loss: 1.6947... 1.6112 sec/batch
Epoch: 5/20... Training Step: 979... Training loss: 1.7349... 1.6652 sec/batch
Epoch: 5/20... Training Step: 980... Training loss: 1.7156... 1.7277 sec/batch
Epoch: 5/20... Training Step: 981... Training loss: 1.7100... 1.6338 sec/batch
Epoch: 5/20... Training Step: 982... Training loss: 1.7267... 1.6251 sec/batch
Epoch: 5/20... Training Step: 983... Training loss: 1.7284... 1.6978 sec/batch
Epoch: 5/20... Training Step: 984... Training loss: 1.6884... 1.7878 sec/batch
Epoch: 5/20... Training Step: 985... Training loss: 1.7212... 1.7289 sec/batch
Epoch: 5/20... Training Step: 986... Training loss: 1.6890... 1.6339 sec/batch
Epoch: 5/20... Training Step: 987... Training loss: 1.7004... 1.6070 sec/batch
Epoch: 5/20... Training Step: 988... Training loss: 1.7185... 1.5984 sec/batch
Epoch: 5/20... Training Step: 989... Training loss: 1.7100... 1.5926 sec/batch
Epoch: 5/20... Training Step: 990... Training loss: 1.6931... 1.5768 sec/batch
Epoch: 6/20... Training Step: 991... Training loss: 1.7890... 1.5938 sec/batch
Epoch: 6/20... Training Step: 992... Training loss: 1.7115... 1.5944 sec/batch
Epoch: 6/20... Training Step: 993... Training loss: 1.6986... 1.6645 sec/batch
Epoch: 6/20... Training Step: 994... Training loss: 1.7193... 1.6163 sec/batch
Epoch: 6/20... Training Step: 995... Training loss: 1.6999... 1.6063 sec/batch
Epoch: 6/20... Training Step: 996... Training loss: 1.6795... 1.6162 sec/batch
Epoch: 6/20... Training Step: 997... Training loss: 1.7054... 1.7029 sec/batch
Epoch: 6/20... Training Step: 998... Training loss: 1.7052... 1.6073 sec/batch
Epoch: 6/20... Training Step: 999... Training loss: 1.7253... 1.6943 sec/batch
Epoch: 6/20... Training Step: 1000... Training loss: 1.7039... 1.6805 sec/batch
Epoch: 6/20... Training Step: 1001... Training loss: 1.6930... 1.6333 sec/batch
Epoch: 6/20... Training Step: 1002... Training loss: 1.6893... 1.6321 sec/batch
Epoch: 6/20... Training Step: 1003... Training loss: 1.6923... 1.6308 sec/batch
Epoch: 6/20... Training Step: 1004... Training loss: 1.7392... 1.6122 sec/batch
Epoch: 6/20... Training Step: 1005... Training loss: 1.6997... 1.6167 sec/batch
Epoch: 6/20... Training Step: 1006... Training loss: 1.6730... 1.6089 sec/batch
Epoch: 6/20... Training Step: 1007... Training loss: 1.7130... 1.6629 sec/batch
Epoch: 6/20... Training Step: 1008... Training loss: 1.7406... 1.5933 sec/batch
Epoch: 6/20... Training Step: 1009... Training loss: 1.7150... 1.5910 sec/batch
Epoch: 6/20... Training Step: 1010... Training loss: 1.7047... 1.5967 sec/batch
Epoch: 6/20... Training Step: 1011... Training loss: 1.6932... 1.5863 sec/batch
Epoch: 6/20... Training Step: 1012... Training loss: 1.7474... 1.5996 sec/batch
Epoch: 6/20... Training Step: 1013... Training loss: 1.7055... 1.5986 sec/batch
Epoch: 6/20... Training Step: 1014... Training loss: 1.7022... 1.6347 sec/batch
Epoch: 6/20... Training Step: 1015... Training loss: 1.7113... 1.8056 sec/batch
Epoch: 6/20... Training Step: 1016... Training loss: 1.6791... 1.8720 sec/batch
Epoch: 6/20... Training Step: 1017... Training loss: 1.6886... 1.6411 sec/batch
Epoch: 6/20... Training Step: 1018... Training loss: 1.7206... 1.7022 sec/batch
Epoch: 6/20... Training Step: 1019... Training loss: 1.7298... 1.7111 sec/batch
Epoch: 6/20... Training Step: 1020... Training loss: 1.7189... 1.6238 sec/batch
Epoch: 6/20... Training Step: 1021... Training loss: 1.7081... 1.6200 sec/batch
Epoch: 6/20... Training Step: 1022... Training loss: 1.6847... 1.6025 sec/batch
Epoch: 6/20... Training Step: 1023... Training loss: 1.7163... 1.6575 sec/batch
Epoch: 6/20... Training Step: 1024... Training loss: 1.7168... 1.6039 sec/batch
Epoch: 6/20... Training Step: 1025... Training loss: 1.6892... 1.5789 sec/batch
Epoch: 6/20... Training Step: 1026... Training loss: 1.7096... 1.6095 sec/batch
Epoch: 6/20... Training Step: 1027... Training loss: 1.6719... 1.6516 sec/batch
Epoch: 6/20... Training Step: 1028... Training loss: 1.6716... 1.5784 sec/batch
Epoch: 6/20... Training Step: 1029... Training loss: 1.6659... 1.6222 sec/batch
Epoch: 6/20... Training Step: 1030... Training loss: 1.6735... 1.6290 sec/batch
Epoch: 6/20... Training Step: 1031... Training loss: 1.6780... 1.5941 sec/batch
Epoch: 6/20... Training Step: 1032... Training loss: 1.7223... 1.5881 sec/batch
Epoch: 6/20... Training Step: 1033... Training loss: 1.6802... 1.6910 sec/batch
Epoch: 6/20... Training Step: 1034... Training loss: 1.6706... 1.6079 sec/batch
Epoch: 6/20... Training Step: 1035... Training loss: 1.7128... 1.6219 sec/batch
Epoch: 6/20... Training Step: 1036... Training loss: 1.6647... 1.6303 sec/batch
Epoch: 6/20... Training Step: 1037... Training loss: 1.7024... 1.6083 sec/batch
Epoch: 6/20... Training Step: 1038... Training loss: 1.6779... 1.6277 sec/batch
Epoch: 6/20... Training Step: 1039... Training loss: 1.6841... 1.6551 sec/batch
Epoch: 6/20... Training Step: 1040... Training loss: 1.7393... 1.6263 sec/batch
Epoch: 6/20... Training Step: 1041... Training loss: 1.6766... 1.6116 sec/batch
Epoch: 6/20... Training Step: 1042... Training loss: 1.7553... 1.6298 sec/batch
Epoch: 6/20... Training Step: 1043... Training loss: 1.7089... 1.5885 sec/batch
Epoch: 6/20... Training Step: 1044... Training loss: 1.7055... 1.5785 sec/batch
Epoch: 6/20... Training Step: 1045... Training loss: 1.6928... 1.5814 sec/batch
Epoch: 6/20... Training Step: 1046... Training loss: 1.7089... 1.8849 sec/batch
Epoch: 6/20... Training Step: 1047... Training loss: 1.7064... 1.7330 sec/batch
Epoch: 6/20... Training Step: 1048... Training loss: 1.6857... 1.6041 sec/batch
Epoch: 6/20... Training Step: 1049... Training loss: 1.6856... 1.7294 sec/batch
Epoch: 6/20... Training Step: 1050... Training loss: 1.7189... 1.6148 sec/batch
Epoch: 6/20... Training Step: 1051... Training loss: 1.6858... 1.6432 sec/batch
Epoch: 6/20... Training Step: 1052... Training loss: 1.7456... 1.7071 sec/batch
Epoch: 6/20... Training Step: 1053... Training loss: 1.7219... 1.7592 sec/batch
Epoch: 6/20... Training Step: 1054... Training loss: 1.7040... 1.6213 sec/batch
Epoch: 6/20... Training Step: 1055... Training loss: 1.7047... 1.8275 sec/batch
Epoch: 6/20... Training Step: 1056... Training loss: 1.7138... 1.8957 sec/batch
Epoch: 6/20... Training Step: 1057... Training loss: 1.7117... 1.7789 sec/batch
Epoch: 6/20... Training Step: 1058... Training loss: 1.6762... 1.9349 sec/batch
Epoch: 6/20... Training Step: 1059... Training loss: 1.6884... 1.7730 sec/batch
Epoch: 6/20... Training Step: 1060... Training loss: 1.6865... 1.8560 sec/batch
Epoch: 6/20... Training Step: 1061... Training loss: 1.7208... 1.7856 sec/batch
Epoch: 6/20... Training Step: 1062... Training loss: 1.7225... 1.7628 sec/batch
Epoch: 6/20... Training Step: 1063... Training loss: 1.7277... 1.8027 sec/batch
Epoch: 6/20... Training Step: 1064... Training loss: 1.7006... 1.7213 sec/batch
Epoch: 6/20... Training Step: 1065... Training loss: 1.6982... 1.6367 sec/batch
Epoch: 6/20... Training Step: 1066... Training loss: 1.7290... 1.5770 sec/batch
Epoch: 6/20... Training Step: 1067... Training loss: 1.6986... 1.6556 sec/batch
Epoch: 6/20... Training Step: 1068... Training loss: 1.6992... 1.7287 sec/batch
Epoch: 6/20... Training Step: 1069... Training loss: 1.6644... 1.8083 sec/batch
Epoch: 6/20... Training Step: 1070... Training loss: 1.6908... 1.6687 sec/batch
Epoch: 6/20... Training Step: 1071... Training loss: 1.6543... 1.6422 sec/batch
Epoch: 6/20... Training Step: 1072... Training loss: 1.7000... 1.6283 sec/batch
Epoch: 6/20... Training Step: 1073... Training loss: 1.6666... 1.7075 sec/batch
Epoch: 6/20... Training Step: 1074... Training loss: 1.6943... 1.6946 sec/batch
Epoch: 6/20... Training Step: 1075... Training loss: 1.6609... 1.8153 sec/batch
Epoch: 6/20... Training Step: 1076... Training loss: 1.6735... 1.6110 sec/batch
Epoch: 6/20... Training Step: 1077... Training loss: 1.6673... 1.5927 sec/batch
Epoch: 6/20... Training Step: 1078... Training loss: 1.6704... 1.5723 sec/batch
Epoch: 6/20... Training Step: 1079... Training loss: 1.6595... 1.5759 sec/batch
Epoch: 6/20... Training Step: 1080... Training loss: 1.7021... 1.5958 sec/batch
Epoch: 6/20... Training Step: 1081... Training loss: 1.6625... 1.5767 sec/batch
Epoch: 6/20... Training Step: 1082... Training loss: 1.6667... 1.6587 sec/batch
Epoch: 6/20... Training Step: 1083... Training loss: 1.6653... 1.6960 sec/batch
Epoch: 6/20... Training Step: 1084... Training loss: 1.6518... 1.7753 sec/batch
Epoch: 6/20... Training Step: 1085... Training loss: 1.6607... 1.7068 sec/batch
Epoch: 6/20... Training Step: 1086... Training loss: 1.6927... 1.6234 sec/batch
Epoch: 6/20... Training Step: 1087... Training loss: 1.6832... 1.6019 sec/batch
Epoch: 6/20... Training Step: 1088... Training loss: 1.6611... 1.5769 sec/batch
Epoch: 6/20... Training Step: 1089... Training loss: 1.6691... 1.5832 sec/batch
Epoch: 6/20... Training Step: 1090... Training loss: 1.6526... 1.5872 sec/batch
Epoch: 6/20... Training Step: 1091... Training loss: 1.6808... 1.6871 sec/batch
Epoch: 6/20... Training Step: 1092... Training loss: 1.6622... 1.7246 sec/batch
Epoch: 6/20... Training Step: 1093... Training loss: 1.6736... 1.7208 sec/batch
Epoch: 6/20... Training Step: 1094... Training loss: 1.6729... 1.7265 sec/batch
Epoch: 6/20... Training Step: 1095... Training loss: 1.6698... 1.5910 sec/batch
Epoch: 6/20... Training Step: 1096... Training loss: 1.6770... 1.5838 sec/batch
Epoch: 6/20... Training Step: 1097... Training loss: 1.6754... 1.5872 sec/batch
Epoch: 6/20... Training Step: 1098... Training loss: 1.6793... 1.5884 sec/batch
...
Epoch: 6/20... Training Step: 1188... Training loss: 1.6594... 1.5865 sec/batch
Epoch: 7/20... Training Step: 1189... Training loss: 1.7308... 1.5675 sec/batch
...
Epoch: 7/20... Training Step: 1386... Training loss: 1.5971... 1.6312 sec/batch
Epoch: 8/20... Training Step: 1387... Training loss: 1.6964... 1.5696 sec/batch
...
Epoch: 8/20... Training Step: 1584... Training loss: 1.5673... 1.6267 sec/batch
Epoch: 9/20... Training Step: 1585... Training loss: 1.6628... 1.6559 sec/batch
...
Epoch: 9/20... Training Step: 1782... Training loss: 1.5505... 1.6286 sec/batch
Epoch: 10/20... Training Step: 1783... Training loss: 1.6206... 1.8708 sec/batch
Epoch: 10/20... Training Step: 1784... Training loss: 1.5577... 1.6849 sec/batch
Epoch: 10/20... Training Step: 1785... Training loss: 1.5482... 1.6476 sec/batch
Epoch: 10/20... Training Step: 1786... Training loss: 1.5676... 1.6434 sec/batch
Epoch: 10/20... Training Step: 1787... Training loss: 1.5569... 1.6287 sec/batch
Epoch: 10/20... Training Step: 1788... Training loss: 1.5223... 1.6241 sec/batch
Epoch: 10/20... Training Step: 1789... Training loss: 1.5500... 1.6366 sec/batch
Epoch: 10/20... Training Step: 1790... Training loss: 1.5416... 1.6368 sec/batch
Epoch: 10/20... Training Step: 1791... Training loss: 1.5567... 1.6452 sec/batch
Epoch: 10/20... Training Step: 1792... Training loss: 1.5434... 1.6488 sec/batch
Epoch: 10/20... Training Step: 1793... Training loss: 1.5345... 1.6272 sec/batch
Epoch: 10/20... Training Step: 1794... Training loss: 1.5453... 1.6230 sec/batch
Epoch: 10/20... Training Step: 1795... Training loss: 1.5521... 1.6399 sec/batch
Epoch: 10/20... Training Step: 1796... Training loss: 1.5823... 1.6329 sec/batch
Epoch: 10/20... Training Step: 1797... Training loss: 1.5433... 1.6354 sec/batch
Epoch: 10/20... Training Step: 1798... Training loss: 1.5328... 1.6430 sec/batch
Epoch: 10/20... Training Step: 1799... Training loss: 1.5787... 1.6317 sec/batch
Epoch: 10/20... Training Step: 1800... Training loss: 1.5851... 1.6318 sec/batch
Epoch: 10/20... Training Step: 1801... Training loss: 1.5606... 1.6025 sec/batch
Epoch: 10/20... Training Step: 1802... Training loss: 1.5708... 1.6295 sec/batch
Epoch: 10/20... Training Step: 1803... Training loss: 1.5467... 1.6204 sec/batch
Epoch: 10/20... Training Step: 1804... Training loss: 1.5723... 1.6433 sec/batch
Epoch: 10/20... Training Step: 1805... Training loss: 1.5524... 1.6357 sec/batch
Epoch: 10/20... Training Step: 1806... Training loss: 1.5516... 1.7501 sec/batch
Epoch: 10/20... Training Step: 1807... Training loss: 1.5597... 1.7104 sec/batch
Epoch: 10/20... Training Step: 1808... Training loss: 1.5083... 1.7668 sec/batch
Epoch: 10/20... Training Step: 1809... Training loss: 1.5296... 1.7190 sec/batch
Epoch: 10/20... Training Step: 1810... Training loss: 1.5756... 1.6239 sec/batch
... [steps 1811–1979 of epoch 10: training loss fluctuates between ~1.49 and ~1.62, ~1.6 sec/batch] ...
Epoch: 10/20... Training Step: 1980... Training loss: 1.5280... 1.6209 sec/batch
Epoch: 11/20... Training Step: 1981... Training loss: 1.6038... 1.6314 sec/batch
... [steps 1982–2177 of epoch 11: training loss fluctuates between ~1.48 and ~1.59, 1.6–3.1 sec/batch] ...
Epoch: 11/20... Training Step: 2178... Training loss: 1.5098... 1.7424 sec/batch
Epoch: 12/20... Training Step: 2179... Training loss: 1.5807... 1.7740 sec/batch
... [steps 2180–2375 of epoch 12: training loss fluctuates between ~1.46 and ~1.58, 1.6–4.1 sec/batch] ...
Epoch: 12/20... Training Step: 2376... Training loss: 1.4952... 1.7104 sec/batch
Epoch: 13/20... Training Step: 2377... Training loss: 1.5688... 1.6993 sec/batch
... [steps 2378–2573 of epoch 13: training loss fluctuates between ~1.45 and ~1.55, ~1.7–2.3 sec/batch] ...
Epoch: 13/20... Training Step: 2574... Training loss: 1.4762... 1.8155 sec/batch
Epoch: 14/20... Training Step: 2575... Training loss: 1.5587... 1.8116 sec/batch
Epoch: 14/20... Training Step: 2576... Training loss: 1.4979... 1.8312 sec/batch
Epoch: 14/20... Training Step: 2577... Training loss: 1.4837... 1.9061 sec/batch
Epoch: 14/20... Training Step: 2578... Training loss: 1.4988... 1.8285 sec/batch
Epoch: 14/20... Training Step: 2579... Training loss: 1.4864... 1.8614 sec/batch
Epoch: 14/20... Training Step: 2580... Training loss: 1.4560... 1.8140 sec/batch
Epoch: 14/20... Training Step: 2581... Training loss: 1.4955... 1.8035 sec/batch
Epoch: 14/20... Training Step: 2582... Training loss: 1.4841... 1.8268 sec/batch
Epoch: 14/20... Training Step: 2583... Training loss: 1.5107... 1.7997 sec/batch
Epoch: 14/20... Training Step: 2584... Training loss: 1.4775... 1.8295 sec/batch
Epoch: 14/20... Training Step: 2585... Training loss: 1.4639... 1.8458 sec/batch
Epoch: 14/20... Training Step: 2586... Training loss: 1.4784... 1.8888 sec/batch
Epoch: 14/20... Training Step: 2587... Training loss: 1.4848... 2.0277 sec/batch
Epoch: 14/20... Training Step: 2588... Training loss: 1.5093... 1.9404 sec/batch
Epoch: 14/20... Training Step: 2589... Training loss: 1.4764... 1.8795 sec/batch
Epoch: 14/20... Training Step: 2590... Training loss: 1.4600... 1.8524 sec/batch
Epoch: 14/20... Training Step: 2591... Training loss: 1.5010... 1.8366 sec/batch
Epoch: 14/20... Training Step: 2592... Training loss: 1.5205... 1.8291 sec/batch
Epoch: 14/20... Training Step: 2593... Training loss: 1.4931... 1.8429 sec/batch
Epoch: 14/20... Training Step: 2594... Training loss: 1.5054... 1.8521 sec/batch
Epoch: 14/20... Training Step: 2595... Training loss: 1.4839... 1.8148 sec/batch
Epoch: 14/20... Training Step: 2596... Training loss: 1.5060... 1.8434 sec/batch
Epoch: 14/20... Training Step: 2597... Training loss: 1.4837... 1.7969 sec/batch
Epoch: 14/20... Training Step: 2598... Training loss: 1.4924... 1.8484 sec/batch
Epoch: 14/20... Training Step: 2599... Training loss: 1.5001... 1.8209 sec/batch
Epoch: 14/20... Training Step: 2600... Training loss: 1.4502... 1.8302 sec/batch
Epoch: 14/20... Training Step: 2601... Training loss: 1.4701... 1.9668 sec/batch
Epoch: 14/20... Training Step: 2602... Training loss: 1.5176... 1.9044 sec/batch
Epoch: 14/20... Training Step: 2603... Training loss: 1.5143... 1.8165 sec/batch
Epoch: 14/20... Training Step: 2604... Training loss: 1.5149... 1.8135 sec/batch
Epoch: 14/20... Training Step: 2605... Training loss: 1.4779... 1.7864 sec/batch
Epoch: 14/20... Training Step: 2606... Training loss: 1.4757... 1.8687 sec/batch
Epoch: 14/20... Training Step: 2607... Training loss: 1.4956... 1.8204 sec/batch
Epoch: 14/20... Training Step: 2608... Training loss: 1.5109... 1.8198 sec/batch
Epoch: 14/20... Training Step: 2609... Training loss: 1.4964... 1.8445 sec/batch
Epoch: 14/20... Training Step: 2610... Training loss: 1.4889... 1.8081 sec/batch
Epoch: 14/20... Training Step: 2611... Training loss: 1.4627... 1.8248 sec/batch
Epoch: 14/20... Training Step: 2612... Training loss: 1.4554... 1.8186 sec/batch
Epoch: 14/20... Training Step: 2613... Training loss: 1.4472... 1.8532 sec/batch
Epoch: 14/20... Training Step: 2614... Training loss: 1.4693... 1.8314 sec/batch
Epoch: 14/20... Training Step: 2615... Training loss: 1.4650... 1.8394 sec/batch
Epoch: 14/20... Training Step: 2616... Training loss: 1.5227... 1.8325 sec/batch
Epoch: 14/20... Training Step: 2617... Training loss: 1.4766... 1.9612 sec/batch
Epoch: 14/20... Training Step: 2618... Training loss: 1.4596... 1.9964 sec/batch
Epoch: 14/20... Training Step: 2619... Training loss: 1.4997... 1.8612 sec/batch
Epoch: 14/20... Training Step: 2620... Training loss: 1.4516... 2.0507 sec/batch
Epoch: 14/20... Training Step: 2621... Training loss: 1.4843... 1.8188 sec/batch
Epoch: 14/20... Training Step: 2622... Training loss: 1.4832... 1.9017 sec/batch
Epoch: 14/20... Training Step: 2623... Training loss: 1.4741... 2.1061 sec/batch
Epoch: 14/20... Training Step: 2624... Training loss: 1.5128... 1.9940 sec/batch
Epoch: 14/20... Training Step: 2625... Training loss: 1.4532... 2.0128 sec/batch
Epoch: 14/20... Training Step: 2626... Training loss: 1.5288... 1.8443 sec/batch
Epoch: 14/20... Training Step: 2627... Training loss: 1.4921... 1.9566 sec/batch
Epoch: 14/20... Training Step: 2628... Training loss: 1.5015... 1.9089 sec/batch
Epoch: 14/20... Training Step: 2629... Training loss: 1.4702... 1.8359 sec/batch
Epoch: 14/20... Training Step: 2630... Training loss: 1.4780... 1.8072 sec/batch
Epoch: 14/20... Training Step: 2631... Training loss: 1.5021... 1.9083 sec/batch
Epoch: 14/20... Training Step: 2632... Training loss: 1.4711... 1.9487 sec/batch
Epoch: 14/20... Training Step: 2633... Training loss: 1.4624... 1.9653 sec/batch
Epoch: 14/20... Training Step: 2634... Training loss: 1.5223... 1.8106 sec/batch
Epoch: 14/20... Training Step: 2635... Training loss: 1.4878... 1.8168 sec/batch
Epoch: 14/20... Training Step: 2636... Training loss: 1.5455... 1.8072 sec/batch
Epoch: 14/20... Training Step: 2637... Training loss: 1.5064... 1.8141 sec/batch
Epoch: 14/20... Training Step: 2638... Training loss: 1.4869... 1.8009 sec/batch
Epoch: 14/20... Training Step: 2639... Training loss: 1.4736... 1.7877 sec/batch
Epoch: 14/20... Training Step: 2640... Training loss: 1.5091... 1.8403 sec/batch
Epoch: 14/20... Training Step: 2641... Training loss: 1.5074... 1.8174 sec/batch
Epoch: 14/20... Training Step: 2642... Training loss: 1.4580... 1.8210 sec/batch
Epoch: 14/20... Training Step: 2643... Training loss: 1.4693... 1.7398 sec/batch
Epoch: 14/20... Training Step: 2644... Training loss: 1.4652... 1.7789 sec/batch
Epoch: 14/20... Training Step: 2645... Training loss: 1.5195... 1.7735 sec/batch
Epoch: 14/20... Training Step: 2646... Training loss: 1.5139... 1.7908 sec/batch
Epoch: 14/20... Training Step: 2647... Training loss: 1.5241... 1.8067 sec/batch
Epoch: 14/20... Training Step: 2648... Training loss: 1.4730... 1.8146 sec/batch
Epoch: 14/20... Training Step: 2649... Training loss: 1.4892... 1.8091 sec/batch
Epoch: 14/20... Training Step: 2650... Training loss: 1.5204... 1.8248 sec/batch
Epoch: 14/20... Training Step: 2651... Training loss: 1.4887... 1.8178 sec/batch
Epoch: 14/20... Training Step: 2652... Training loss: 1.4734... 1.8517 sec/batch
Epoch: 14/20... Training Step: 2653... Training loss: 1.4561... 1.7849 sec/batch
Epoch: 14/20... Training Step: 2654... Training loss: 1.4850... 1.7827 sec/batch
Epoch: 14/20... Training Step: 2655... Training loss: 1.4430... 1.7997 sec/batch
Epoch: 14/20... Training Step: 2656... Training loss: 1.5048... 1.7869 sec/batch
Epoch: 14/20... Training Step: 2657... Training loss: 1.4437... 1.8089 sec/batch
Epoch: 14/20... Training Step: 2658... Training loss: 1.4739... 1.8031 sec/batch
Epoch: 14/20... Training Step: 2659... Training loss: 1.4596... 1.7928 sec/batch
Epoch: 14/20... Training Step: 2660... Training loss: 1.4773... 1.7657 sec/batch
Epoch: 14/20... Training Step: 2661... Training loss: 1.4615... 1.7730 sec/batch
Epoch: 14/20... Training Step: 2662... Training loss: 1.4647... 1.8294 sec/batch
Epoch: 14/20... Training Step: 2663... Training loss: 1.4548... 1.8114 sec/batch
Epoch: 14/20... Training Step: 2664... Training loss: 1.4913... 1.7901 sec/batch
Epoch: 14/20... Training Step: 2665... Training loss: 1.4580... 1.8063 sec/batch
Epoch: 14/20... Training Step: 2666... Training loss: 1.4581... 1.8617 sec/batch
Epoch: 14/20... Training Step: 2667... Training loss: 1.4629... 1.9362 sec/batch
Epoch: 14/20... Training Step: 2668... Training loss: 1.4661... 1.7994 sec/batch
Epoch: 14/20... Training Step: 2669... Training loss: 1.4532... 1.8171 sec/batch
Epoch: 14/20... Training Step: 2670... Training loss: 1.4955... 1.8175 sec/batch
Epoch: 14/20... Training Step: 2671... Training loss: 1.4846... 1.7922 sec/batch
Epoch: 14/20... Training Step: 2672... Training loss: 1.4493... 1.8187 sec/batch
Epoch: 14/20... Training Step: 2673... Training loss: 1.4716... 2.1276 sec/batch
Epoch: 14/20... Training Step: 2674... Training loss: 1.4503... 1.9924 sec/batch
Epoch: 14/20... Training Step: 2675... Training loss: 1.4799... 1.8937 sec/batch
Epoch: 14/20... Training Step: 2676... Training loss: 1.4655... 1.8100 sec/batch
Epoch: 14/20... Training Step: 2677... Training loss: 1.4700... 1.7883 sec/batch
Epoch: 14/20... Training Step: 2678... Training loss: 1.4762... 1.7905 sec/batch
Epoch: 14/20... Training Step: 2679... Training loss: 1.4696... 1.8137 sec/batch
Epoch: 14/20... Training Step: 2680... Training loss: 1.4695... 2.0830 sec/batch
Epoch: 14/20... Training Step: 2681... Training loss: 1.4736... 1.9879 sec/batch
Epoch: 14/20... Training Step: 2682... Training loss: 1.4856... 2.1287 sec/batch
Epoch: 14/20... Training Step: 2683... Training loss: 1.4739... 2.1017 sec/batch
Epoch: 14/20... Training Step: 2684... Training loss: 1.4987... 2.2703 sec/batch
Epoch: 14/20... Training Step: 2685... Training loss: 1.4621... 2.0735 sec/batch
Epoch: 14/20... Training Step: 2686... Training loss: 1.4698... 2.1231 sec/batch
Epoch: 14/20... Training Step: 2687... Training loss: 1.4802... 2.0177 sec/batch
Epoch: 14/20... Training Step: 2688... Training loss: 1.4674... 1.8879 sec/batch
Epoch: 14/20... Training Step: 2689... Training loss: 1.4665... 1.8044 sec/batch
Epoch: 14/20... Training Step: 2690... Training loss: 1.4349... 1.8320 sec/batch
Epoch: 14/20... Training Step: 2691... Training loss: 1.4976... 1.8002 sec/batch
Epoch: 14/20... Training Step: 2692... Training loss: 1.4880... 2.1253 sec/batch
Epoch: 14/20... Training Step: 2693... Training loss: 1.4817... 2.0834 sec/batch
Epoch: 14/20... Training Step: 2694... Training loss: 1.4672... 2.0274 sec/batch
Epoch: 14/20... Training Step: 2695... Training loss: 1.4748... 1.9593 sec/batch
Epoch: 14/20... Training Step: 2696... Training loss: 1.4408... 1.8253 sec/batch
Epoch: 14/20... Training Step: 2697... Training loss: 1.4326... 1.7754 sec/batch
Epoch: 14/20... Training Step: 2698... Training loss: 1.4944... 1.8317 sec/batch
Epoch: 14/20... Training Step: 2699... Training loss: 1.4551... 1.7910 sec/batch
Epoch: 14/20... Training Step: 2700... Training loss: 1.4382... 1.8655 sec/batch
Epoch: 14/20... Training Step: 2701... Training loss: 1.4944... 1.7887 sec/batch
Epoch: 14/20... Training Step: 2702... Training loss: 1.4808... 1.7965 sec/batch
Epoch: 14/20... Training Step: 2703... Training loss: 1.4648... 1.7937 sec/batch
Epoch: 14/20... Training Step: 2704... Training loss: 1.4555... 1.7676 sec/batch
Epoch: 14/20... Training Step: 2705... Training loss: 1.4412... 1.7502 sec/batch
Epoch: 14/20... Training Step: 2706... Training loss: 1.4514... 1.7860 sec/batch
Epoch: 14/20... Training Step: 2707... Training loss: 1.4956... 1.8065 sec/batch
Epoch: 14/20... Training Step: 2708... Training loss: 1.4891... 1.7713 sec/batch
Epoch: 14/20... Training Step: 2709... Training loss: 1.4786... 1.7924 sec/batch
Epoch: 14/20... Training Step: 2710... Training loss: 1.4808... 1.7704 sec/batch
Epoch: 14/20... Training Step: 2711... Training loss: 1.5239... 1.8049 sec/batch
Epoch: 14/20... Training Step: 2712... Training loss: 1.4862... 1.8213 sec/batch
Epoch: 14/20... Training Step: 2713... Training loss: 1.4850... 1.8260 sec/batch
Epoch: 14/20... Training Step: 2714... Training loss: 1.4792... 1.9609 sec/batch
Epoch: 14/20... Training Step: 2715... Training loss: 1.5292... 1.8332 sec/batch
Epoch: 14/20... Training Step: 2716... Training loss: 1.4852... 1.8086 sec/batch
Epoch: 14/20... Training Step: 2717... Training loss: 1.4709... 1.7902 sec/batch
Epoch: 14/20... Training Step: 2718... Training loss: 1.5003... 1.7710 sec/batch
Epoch: 14/20... Training Step: 2719... Training loss: 1.4677... 1.8155 sec/batch
Epoch: 14/20... Training Step: 2720... Training loss: 1.5122... 1.8311 sec/batch
Epoch: 14/20... Training Step: 2721... Training loss: 1.4942... 1.7884 sec/batch
Epoch: 14/20... Training Step: 2722... Training loss: 1.5232... 1.8754 sec/batch
Epoch: 14/20... Training Step: 2723... Training loss: 1.4965... 1.8355 sec/batch
Epoch: 14/20... Training Step: 2724... Training loss: 1.4748... 1.8207 sec/batch
Epoch: 14/20... Training Step: 2725... Training loss: 1.4566... 1.8078 sec/batch
Epoch: 14/20... Training Step: 2726... Training loss: 1.4701... 1.7905 sec/batch
Epoch: 14/20... Training Step: 2727... Training loss: 1.5003... 1.8207 sec/batch
Epoch: 14/20... Training Step: 2728... Training loss: 1.4840... 1.8016 sec/batch
Epoch: 14/20... Training Step: 2729... Training loss: 1.4777... 1.8262 sec/batch
Epoch: 14/20... Training Step: 2730... Training loss: 1.4885... 1.8380 sec/batch
Epoch: 14/20... Training Step: 2731... Training loss: 1.4775... 1.9463 sec/batch
Epoch: 14/20... Training Step: 2732... Training loss: 1.4784... 1.6845 sec/batch
Epoch: 14/20... Training Step: 2733... Training loss: 1.4366... 1.6918 sec/batch
Epoch: 14/20... Training Step: 2734... Training loss: 1.4993... 1.6965 sec/batch
Epoch: 14/20... Training Step: 2735... Training loss: 1.5050... 1.7491 sec/batch
Epoch: 14/20... Training Step: 2736... Training loss: 1.4686... 1.7305 sec/batch
Epoch: 14/20... Training Step: 2737... Training loss: 1.4914... 1.6970 sec/batch
Epoch: 14/20... Training Step: 2738... Training loss: 1.4779... 1.7017 sec/batch
Epoch: 14/20... Training Step: 2739... Training loss: 1.4817... 1.7008 sec/batch
Epoch: 14/20... Training Step: 2740... Training loss: 1.4953... 1.6873 sec/batch
Epoch: 14/20... Training Step: 2741... Training loss: 1.5066... 1.7395 sec/batch
Epoch: 14/20... Training Step: 2742... Training loss: 1.5471... 1.7163 sec/batch
Epoch: 14/20... Training Step: 2743... Training loss: 1.4907... 1.7015 sec/batch
Epoch: 14/20... Training Step: 2744... Training loss: 1.4882... 1.7072 sec/batch
Epoch: 14/20... Training Step: 2745... Training loss: 1.4799... 1.7222 sec/batch
Epoch: 14/20... Training Step: 2746... Training loss: 1.4705... 1.7542 sec/batch
Epoch: 14/20... Training Step: 2747... Training loss: 1.5178... 1.6928 sec/batch
Epoch: 14/20... Training Step: 2748... Training loss: 1.4863... 1.7394 sec/batch
Epoch: 14/20... Training Step: 2749... Training loss: 1.4964... 1.7788 sec/batch
Epoch: 14/20... Training Step: 2750... Training loss: 1.4471... 1.8043 sec/batch
Epoch: 14/20... Training Step: 2751... Training loss: 1.4646... 1.7542 sec/batch
Epoch: 14/20... Training Step: 2752... Training loss: 1.5108... 1.8735 sec/batch
Epoch: 14/20... Training Step: 2753... Training loss: 1.4663... 1.8464 sec/batch
Epoch: 14/20... Training Step: 2754... Training loss: 1.4528... 1.7919 sec/batch
Epoch: 14/20... Training Step: 2755... Training loss: 1.4655... 1.6894 sec/batch
Epoch: 14/20... Training Step: 2756... Training loss: 1.4767... 1.7120 sec/batch
Epoch: 14/20... Training Step: 2757... Training loss: 1.4773... 1.7373 sec/batch
Epoch: 14/20... Training Step: 2758... Training loss: 1.4723... 1.6886 sec/batch
Epoch: 14/20... Training Step: 2759... Training loss: 1.4709... 1.7275 sec/batch
Epoch: 14/20... Training Step: 2760... Training loss: 1.4748... 1.6904 sec/batch
Epoch: 14/20... Training Step: 2761... Training loss: 1.4841... 1.6879 sec/batch
Epoch: 14/20... Training Step: 2762... Training loss: 1.4734... 1.7255 sec/batch
Epoch: 14/20... Training Step: 2763... Training loss: 1.4679... 1.6936 sec/batch
Epoch: 14/20... Training Step: 2764... Training loss: 1.4796... 1.6922 sec/batch
Epoch: 14/20... Training Step: 2765... Training loss: 1.4641... 1.7276 sec/batch
Epoch: 14/20... Training Step: 2766... Training loss: 1.4606... 1.7013 sec/batch
Epoch: 14/20... Training Step: 2767... Training loss: 1.4737... 1.7323 sec/batch
Epoch: 14/20... Training Step: 2768... Training loss: 1.4500... 1.7035 sec/batch
Epoch: 14/20... Training Step: 2769... Training loss: 1.4428... 1.6964 sec/batch
Epoch: 14/20... Training Step: 2770... Training loss: 1.4829... 1.7102 sec/batch
Epoch: 14/20... Training Step: 2771... Training loss: 1.4715... 1.6965 sec/batch
Epoch: 14/20... Training Step: 2772... Training loss: 1.4651... 1.6937 sec/batch
Epoch: 15/20... Training Step: 2773... Training loss: 1.5407... 1.7010 sec/batch
Epoch: 15/20... Training Step: 2774... Training loss: 1.4898... 1.7021 sec/batch
Epoch: 15/20... Training Step: 2775... Training loss: 1.4763... 1.7120 sec/batch
Epoch: 15/20... Training Step: 2776... Training loss: 1.4831... 1.7215 sec/batch
Epoch: 15/20... Training Step: 2777... Training loss: 1.4682... 1.6946 sec/batch
Epoch: 15/20... Training Step: 2778... Training loss: 1.4466... 1.6841 sec/batch
Epoch: 15/20... Training Step: 2779... Training loss: 1.4814... 1.7104 sec/batch
Epoch: 15/20... Training Step: 2780... Training loss: 1.4751... 1.6742 sec/batch
Epoch: 15/20... Training Step: 2781... Training loss: 1.4788... 1.7420 sec/batch
Epoch: 15/20... Training Step: 2782... Training loss: 1.4837... 1.8264 sec/batch
Epoch: 15/20... Training Step: 2783... Training loss: 1.4515... 1.7686 sec/batch
Epoch: 15/20... Training Step: 2784... Training loss: 1.4644... 1.8399 sec/batch
Epoch: 15/20... Training Step: 2785... Training loss: 1.4706... 1.7084 sec/batch
Epoch: 15/20... Training Step: 2786... Training loss: 1.5079... 1.6910 sec/batch
Epoch: 15/20... Training Step: 2787... Training loss: 1.4676... 1.7239 sec/batch
Epoch: 15/20... Training Step: 2788... Training loss: 1.4487... 1.7094 sec/batch
Epoch: 15/20... Training Step: 2789... Training loss: 1.4949... 1.6913 sec/batch
Epoch: 15/20... Training Step: 2790... Training loss: 1.5089... 1.6932 sec/batch
Epoch: 15/20... Training Step: 2791... Training loss: 1.4862... 1.6870 sec/batch
Epoch: 15/20... Training Step: 2792... Training loss: 1.4973... 1.7365 sec/batch
Epoch: 15/20... Training Step: 2793... Training loss: 1.4680... 1.6901 sec/batch
Epoch: 15/20... Training Step: 2794... Training loss: 1.4879... 1.7185 sec/batch
Epoch: 15/20... Training Step: 2795... Training loss: 1.4742... 1.6986 sec/batch
Epoch: 15/20... Training Step: 2796... Training loss: 1.4780... 1.6837 sec/batch
Epoch: 15/20... Training Step: 2797... Training loss: 1.4897... 1.6896 sec/batch
Epoch: 15/20... Training Step: 2798... Training loss: 1.4386... 1.6975 sec/batch
Epoch: 15/20... Training Step: 2799... Training loss: 1.4568... 1.7190 sec/batch
Epoch: 15/20... Training Step: 2800... Training loss: 1.4911... 1.8035 sec/batch
Epoch: 15/20... Training Step: 2801... Training loss: 1.4935... 1.6673 sec/batch
Epoch: 15/20... Training Step: 2802... Training loss: 1.4958... 1.7254 sec/batch
Epoch: 15/20... Training Step: 2803... Training loss: 1.4655... 1.7174 sec/batch
Epoch: 15/20... Training Step: 2804... Training loss: 1.4687... 1.7213 sec/batch
Epoch: 15/20... Training Step: 2805... Training loss: 1.4933... 1.6823 sec/batch
Epoch: 15/20... Training Step: 2806... Training loss: 1.4873... 1.6854 sec/batch
Epoch: 15/20... Training Step: 2807... Training loss: 1.4750... 1.7225 sec/batch
Epoch: 15/20... Training Step: 2808... Training loss: 1.4795... 1.6918 sec/batch
Epoch: 15/20... Training Step: 2809... Training loss: 1.4495... 1.6951 sec/batch
Epoch: 15/20... Training Step: 2810... Training loss: 1.4436... 1.7019 sec/batch
Epoch: 15/20... Training Step: 2811... Training loss: 1.4234... 1.7132 sec/batch
Epoch: 15/20... Training Step: 2812... Training loss: 1.4556... 1.7115 sec/batch
Epoch: 15/20... Training Step: 2813... Training loss: 1.4638... 1.6682 sec/batch
Epoch: 15/20... Training Step: 2814... Training loss: 1.5039... 1.6964 sec/batch
Epoch: 15/20... Training Step: 2815... Training loss: 1.4679... 1.6768 sec/batch
Epoch: 15/20... Training Step: 2816... Training loss: 1.4513... 1.7111 sec/batch
Epoch: 15/20... Training Step: 2817... Training loss: 1.4848... 1.7158 sec/batch
Epoch: 15/20... Training Step: 2818... Training loss: 1.4384... 1.6695 sec/batch
Epoch: 15/20... Training Step: 2819... Training loss: 1.4720... 1.7033 sec/batch
Epoch: 15/20... Training Step: 2820... Training loss: 1.4670... 1.7061 sec/batch
Epoch: 15/20... Training Step: 2821... Training loss: 1.4685... 1.7099 sec/batch
Epoch: 15/20... Training Step: 2822... Training loss: 1.4917... 1.7002 sec/batch
Epoch: 15/20... Training Step: 2823... Training loss: 1.4516... 1.7068 sec/batch
Epoch: 15/20... Training Step: 2824... Training loss: 1.5019... 1.6875 sec/batch
Epoch: 15/20... Training Step: 2825... Training loss: 1.4852... 1.6899 sec/batch
Epoch: 15/20... Training Step: 2826... Training loss: 1.4907... 1.7083 sec/batch
Epoch: 15/20... Training Step: 2827... Training loss: 1.4589... 1.6900 sec/batch
Epoch: 15/20... Training Step: 2828... Training loss: 1.4847... 1.6790 sec/batch
Epoch: 15/20... Training Step: 2829... Training loss: 1.4833... 1.6980 sec/batch
Epoch: 15/20... Training Step: 2830... Training loss: 1.4511... 1.6855 sec/batch
Epoch: 15/20... Training Step: 2831... Training loss: 1.4469... 1.7384 sec/batch
Epoch: 15/20... Training Step: 2832... Training loss: 1.5005... 1.7001 sec/batch
Epoch: 15/20... Training Step: 2833... Training loss: 1.4775... 1.7096 sec/batch
Epoch: 15/20... Training Step: 2834... Training loss: 1.5349... 1.7072 sec/batch
Epoch: 15/20... Training Step: 2835... Training loss: 1.4995... 1.7223 sec/batch
Epoch: 15/20... Training Step: 2836... Training loss: 1.4856... 1.6973 sec/batch
Epoch: 15/20... Training Step: 2837... Training loss: 1.4708... 1.6825 sec/batch
Epoch: 15/20... Training Step: 2838... Training loss: 1.4898... 1.7298 sec/batch
Epoch: 15/20... Training Step: 2839... Training loss: 1.4870... 1.6899 sec/batch
Epoch: 15/20... Training Step: 2840... Training loss: 1.4490... 1.6853 sec/batch
Epoch: 15/20... Training Step: 2841... Training loss: 1.4585... 1.6984 sec/batch
Epoch: 15/20... Training Step: 2842... Training loss: 1.4579... 1.7125 sec/batch
Epoch: 15/20... Training Step: 2843... Training loss: 1.5116... 1.7046 sec/batch
Epoch: 15/20... Training Step: 2844... Training loss: 1.4915... 1.7050 sec/batch
Epoch: 15/20... Training Step: 2845... Training loss: 1.5060... 1.7059 sec/batch
Epoch: 15/20... Training Step: 2846... Training loss: 1.4618... 1.6964 sec/batch
Epoch: 15/20... Training Step: 2847... Training loss: 1.4839... 1.7335 sec/batch
Epoch: 15/20... Training Step: 2848... Training loss: 1.4920... 1.6823 sec/batch
Epoch: 15/20... Training Step: 2849... Training loss: 1.4817... 1.6949 sec/batch
Epoch: 15/20... Training Step: 2850... Training loss: 1.4707... 1.6746 sec/batch
Epoch: 15/20... Training Step: 2851... Training loss: 1.4494... 1.6960 sec/batch
Epoch: 15/20... Training Step: 2852... Training loss: 1.4720... 1.7187 sec/batch
Epoch: 15/20... Training Step: 2853... Training loss: 1.4346... 1.7323 sec/batch
Epoch: 15/20... Training Step: 2854... Training loss: 1.4762... 1.7412 sec/batch
Epoch: 15/20... Training Step: 2855... Training loss: 1.4413... 1.7210 sec/batch
Epoch: 15/20... Training Step: 2856... Training loss: 1.4696... 1.7210 sec/batch
Epoch: 15/20... Training Step: 2857... Training loss: 1.4348... 1.7298 sec/batch
Epoch: 15/20... Training Step: 2858... Training loss: 1.4717... 1.6855 sec/batch
Epoch: 15/20... Training Step: 2859... Training loss: 1.4490... 1.6969 sec/batch
Epoch: 15/20... Training Step: 2860... Training loss: 1.4637... 1.7109 sec/batch
Epoch: 15/20... Training Step: 2861... Training loss: 1.4486... 1.7168 sec/batch
Epoch: 15/20... Training Step: 2862... Training loss: 1.4764... 1.6884 sec/batch
Epoch: 15/20... Training Step: 2863... Training loss: 1.4506... 1.6793 sec/batch
Epoch: 15/20... Training Step: 2864... Training loss: 1.4519... 1.6904 sec/batch
Epoch: 15/20... Training Step: 2865... Training loss: 1.4501... 1.7600 sec/batch
Epoch: 15/20... Training Step: 2866... Training loss: 1.4516... 1.6870 sec/batch
Epoch: 15/20... Training Step: 2867... Training loss: 1.4456... 1.7341 sec/batch
Epoch: 15/20... Training Step: 2868... Training loss: 1.4972... 1.6953 sec/batch
Epoch: 15/20... Training Step: 2869... Training loss: 1.4762... 1.7168 sec/batch
Epoch: 15/20... Training Step: 2870... Training loss: 1.4318... 1.8176 sec/batch
Epoch: 15/20... Training Step: 2871... Training loss: 1.4589... 1.7347 sec/batch
Epoch: 15/20... Training Step: 2872... Training loss: 1.4539... 1.6970 sec/batch
Epoch: 15/20... Training Step: 2873... Training loss: 1.4674... 1.7106 sec/batch
Epoch: 15/20... Training Step: 2874... Training loss: 1.4566... 1.7069 sec/batch
Epoch: 15/20... Training Step: 2875... Training loss: 1.4560... 1.6999 sec/batch
Epoch: 15/20... Training Step: 2876... Training loss: 1.4603... 1.6912 sec/batch
Epoch: 15/20... Training Step: 2877... Training loss: 1.4643... 1.7050 sec/batch
Epoch: 15/20... Training Step: 2878... Training loss: 1.4581... 1.7415 sec/batch
Epoch: 15/20... Training Step: 2879... Training loss: 1.4667... 1.6903 sec/batch
Epoch: 15/20... Training Step: 2880... Training loss: 1.4749... 1.6915 sec/batch
Epoch: 15/20... Training Step: 2881... Training loss: 1.4681... 1.7252 sec/batch
Epoch: 15/20... Training Step: 2882... Training loss: 1.4822... 1.7036 sec/batch
Epoch: 15/20... Training Step: 2883... Training loss: 1.4529... 1.6888 sec/batch
Epoch: 15/20... Training Step: 2884... Training loss: 1.4644... 1.6978 sec/batch
Epoch: 15/20... Training Step: 2885... Training loss: 1.4756... 1.6954 sec/batch
Epoch: 15/20... Training Step: 2886... Training loss: 1.4556... 1.7187 sec/batch
Epoch: 15/20... Training Step: 2887... Training loss: 1.4444... 1.7421 sec/batch
Epoch: 15/20... Training Step: 2888... Training loss: 1.4273... 1.7528 sec/batch
Epoch: 15/20... Training Step: 2889... Training loss: 1.4711... 1.7268 sec/batch
Epoch: 15/20... Training Step: 2890... Training loss: 1.4811... 1.6971 sec/batch
Epoch: 15/20... Training Step: 2891... Training loss: 1.4618... 1.6942 sec/batch
Epoch: 15/20... Training Step: 2892... Training loss: 1.4572... 1.6832 sec/batch
Epoch: 15/20... Training Step: 2893... Training loss: 1.4647... 1.7033 sec/batch
Epoch: 15/20... Training Step: 2894... Training loss: 1.4376... 1.7051 sec/batch
Epoch: 15/20... Training Step: 2895... Training loss: 1.4194... 1.7275 sec/batch
Epoch: 15/20... Training Step: 2896... Training loss: 1.4652... 1.6987 sec/batch
Epoch: 15/20... Training Step: 2897... Training loss: 1.4537... 1.6956 sec/batch
Epoch: 15/20... Training Step: 2898... Training loss: 1.4243... 1.6983 sec/batch
Epoch: 15/20... Training Step: 2899... Training loss: 1.4690... 1.7162 sec/batch
Epoch: 15/20... Training Step: 2900... Training loss: 1.4785... 1.6857 sec/batch
Epoch: 15/20... Training Step: 2901... Training loss: 1.4471... 1.6799 sec/batch
Epoch: 15/20... Training Step: 2902... Training loss: 1.4288... 1.6997 sec/batch
Epoch: 15/20... Training Step: 2903... Training loss: 1.4317... 1.6791 sec/batch
Epoch: 15/20... Training Step: 2904... Training loss: 1.4499... 1.7462 sec/batch
Epoch: 15/20... Training Step: 2905... Training loss: 1.4846... 1.7336 sec/batch
Epoch: 15/20... Training Step: 2906... Training loss: 1.4707... 1.6802 sec/batch
Epoch: 15/20... Training Step: 2907... Training loss: 1.4686... 1.7041 sec/batch
Epoch: 15/20... Training Step: 2908... Training loss: 1.4725... 1.6934 sec/batch
Epoch: 15/20... Training Step: 2909... Training loss: 1.4934... 1.6973 sec/batch
Epoch: 15/20... Training Step: 2910... Training loss: 1.4767... 1.6874 sec/batch
Epoch: 15/20... Training Step: 2911... Training loss: 1.4726... 1.7001 sec/batch
Epoch: 15/20... Training Step: 2912... Training loss: 1.4634... 1.7210 sec/batch
Epoch: 15/20... Training Step: 2913... Training loss: 1.5228... 1.7016 sec/batch
Epoch: 15/20... Training Step: 2914... Training loss: 1.4706... 1.7032 sec/batch
Epoch: 15/20... Training Step: 2915... Training loss: 1.4561... 1.7174 sec/batch
Epoch: 15/20... Training Step: 2916... Training loss: 1.4923... 1.6935 sec/batch
Epoch: 15/20... Training Step: 2917... Training loss: 1.4530... 1.6954 sec/batch
Epoch: 15/20... Training Step: 2918... Training loss: 1.4995... 1.6877 sec/batch
Epoch: 15/20... Training Step: 2919... Training loss: 1.4938... 1.6968 sec/batch
Epoch: 15/20... Training Step: 2920... Training loss: 1.5141... 1.7060 sec/batch
Epoch: 15/20... Training Step: 2921... Training loss: 1.4903... 1.7152 sec/batch
Epoch: 15/20... Training Step: 2922... Training loss: 1.4716... 1.7095 sec/batch
Epoch: 15/20... Training Step: 2923... Training loss: 1.4376... 1.7417 sec/batch
Epoch: 15/20... Training Step: 2924... Training loss: 1.4686... 1.7644 sec/batch
Epoch: 15/20... Training Step: 2925... Training loss: 1.4889... 1.8373 sec/batch
Epoch: 15/20... Training Step: 2926... Training loss: 1.4733... 1.7301 sec/batch
Epoch: 15/20... Training Step: 2927... Training loss: 1.4687... 1.6926 sec/batch
Epoch: 15/20... Training Step: 2928... Training loss: 1.4774... 1.6993 sec/batch
Epoch: 15/20... Training Step: 2929... Training loss: 1.4830... 1.6936 sec/batch
Epoch: 15/20... Training Step: 2930... Training loss: 1.4519... 1.7485 sec/batch
Epoch: 15/20... Training Step: 2931... Training loss: 1.4296... 1.7276 sec/batch
Epoch: 15/20... Training Step: 2932... Training loss: 1.4888... 1.6946 sec/batch
Epoch: 15/20... Training Step: 2933... Training loss: 1.4778... 1.7121 sec/batch
Epoch: 15/20... Training Step: 2934... Training loss: 1.4740... 1.7048 sec/batch
Epoch: 15/20... Training Step: 2935... Training loss: 1.4782... 1.6930 sec/batch
Epoch: 15/20... Training Step: 2936... Training loss: 1.4731... 1.7192 sec/batch
Epoch: 15/20... Training Step: 2937... Training loss: 1.4668... 1.7009 sec/batch
Epoch: 15/20... Training Step: 2938... Training loss: 1.4543... 1.7264 sec/batch
Epoch: 15/20... Training Step: 2939... Training loss: 1.5008... 1.6973 sec/batch
Epoch: 15/20... Training Step: 2940... Training loss: 1.5328... 1.7980 sec/batch
Epoch: 15/20... Training Step: 2941... Training loss: 1.4666... 1.7535 sec/batch
Epoch: 15/20... Training Step: 2942... Training loss: 1.4639... 1.7149 sec/batch
Epoch: 15/20... Training Step: 2943... Training loss: 1.4616... 1.7029 sec/batch
Epoch: 15/20... Training Step: 2944... Training loss: 1.4586... 1.6906 sec/batch
Epoch: 15/20... Training Step: 2945... Training loss: 1.5058... 1.6888 sec/batch
Epoch: 15/20... Training Step: 2946... Training loss: 1.4788... 1.7121 sec/batch
Epoch: 15/20... Training Step: 2947... Training loss: 1.4869... 1.7161 sec/batch
Epoch: 15/20... Training Step: 2948... Training loss: 1.4412... 1.6940 sec/batch
Epoch: 15/20... Training Step: 2949... Training loss: 1.4588... 1.7037 sec/batch
Epoch: 15/20... Training Step: 2950... Training loss: 1.4902... 1.7148 sec/batch
Epoch: 15/20... Training Step: 2951... Training loss: 1.4490... 1.7082 sec/batch
Epoch: 15/20... Training Step: 2952... Training loss: 1.4464... 1.6967 sec/batch
Epoch: 15/20... Training Step: 2953... Training loss: 1.4436... 1.6537 sec/batch
Epoch: 15/20... Training Step: 2954... Training loss: 1.4638... 1.7020 sec/batch
Epoch: 15/20... Training Step: 2955... Training loss: 1.4688... 1.7441 sec/batch
Epoch: 15/20... Training Step: 2956... Training loss: 1.4617... 1.7879 sec/batch
Epoch: 15/20... Training Step: 2957... Training loss: 1.4606... 1.7982 sec/batch
Epoch: 15/20... Training Step: 2958... Training loss: 1.4584... 1.8834 sec/batch
Epoch: 15/20... Training Step: 2959... Training loss: 1.4865... 1.6943 sec/batch
Epoch: 15/20... Training Step: 2960... Training loss: 1.4563... 1.6946 sec/batch
Epoch: 15/20... Training Step: 2961... Training loss: 1.4696... 1.6966 sec/batch
Epoch: 15/20... Training Step: 2962... Training loss: 1.4567... 1.6913 sec/batch
Epoch: 15/20... Training Step: 2963... Training loss: 1.4500... 1.7305 sec/batch
Epoch: 15/20... Training Step: 2964... Training loss: 1.4369... 1.6897 sec/batch
Epoch: 15/20... Training Step: 2965... Training loss: 1.4696... 1.6875 sec/batch
Epoch: 15/20... Training Step: 2966... Training loss: 1.4504... 1.7405 sec/batch
Epoch: 15/20... Training Step: 2967... Training loss: 1.4448... 1.6785 sec/batch
Epoch: 15/20... Training Step: 2968... Training loss: 1.4817... 1.7241 sec/batch
Epoch: 15/20... Training Step: 2969... Training loss: 1.4575... 1.6955 sec/batch
Epoch: 15/20... Training Step: 2970... Training loss: 1.4478... 1.7145 sec/batch
Epoch: 16/20... Training Step: 2971... Training loss: 1.5322... 1.7189 sec/batch
Epoch: 16/20... Training Step: 2972... Training loss: 1.4734... 1.7006 sec/batch
Epoch: 16/20... Training Step: 2973... Training loss: 1.4647... 1.6876 sec/batch
Epoch: 16/20... Training Step: 2974... Training loss: 1.4722... 1.7086 sec/batch
Epoch: 16/20... Training Step: 2975... Training loss: 1.4519... 1.7366 sec/batch
Epoch: 16/20... Training Step: 2976... Training loss: 1.4271... 1.6829 sec/batch
Epoch: 16/20... Training Step: 2977... Training loss: 1.4586... 1.7109 sec/batch
Epoch: 16/20... Training Step: 2978... Training loss: 1.4626... 1.6922 sec/batch
Epoch: 16/20... Training Step: 2979... Training loss: 1.4751... 1.6880 sec/batch
Epoch: 16/20... Training Step: 2980... Training loss: 1.4516... 1.6799 sec/batch
Epoch: 16/20... Training Step: 2981... Training loss: 1.4443... 1.7335 sec/batch
Epoch: 16/20... Training Step: 2982... Training loss: 1.4512... 1.7149 sec/batch
Epoch: 16/20... Training Step: 2983... Training loss: 1.4640... 1.7261 sec/batch
Epoch: 16/20... Training Step: 2984... Training loss: 1.4931... 1.6995 sec/batch
Epoch: 16/20... Training Step: 2985... Training loss: 1.4493... 1.6988 sec/batch
Epoch: 16/20... Training Step: 2986... Training loss: 1.4338... 1.7130 sec/batch
Epoch: 16/20... Training Step: 2987... Training loss: 1.4828... 1.6952 sec/batch
Epoch: 16/20... Training Step: 2988... Training loss: 1.4809... 1.6800 sec/batch
Epoch: 16/20... Training Step: 2989... Training loss: 1.4696... 1.6858 sec/batch
Epoch: 16/20... Training Step: 2990... Training loss: 1.4839... 1.6855 sec/batch
Epoch: 16/20... Training Step: 2991... Training loss: 1.4672... 1.7271 sec/batch
Epoch: 16/20... Training Step: 2992... Training loss: 1.4851... 1.8175 sec/batch
Epoch: 16/20... Training Step: 2993... Training loss: 1.4550... 1.9076 sec/batch
Epoch: 16/20... Training Step: 2994... Training loss: 1.4705... 1.8685 sec/batch
Epoch: 16/20... Training Step: 2995... Training loss: 1.4701... 1.6962 sec/batch
Epoch: 16/20... Training Step: 2996... Training loss: 1.4202... 1.6847 sec/batch
Epoch: 16/20... Training Step: 2997... Training loss: 1.4397... 1.7278 sec/batch
Epoch: 16/20... Training Step: 2998... Training loss: 1.4870... 1.7115 sec/batch
Epoch: 16/20... Training Step: 2999... Training loss: 1.4783... 1.6912 sec/batch
Epoch: 16/20... Training Step: 3000... Training loss: 1.4903... 1.6897 sec/batch
Epoch: 16/20... Training Step: 3001... Training loss: 1.4566... 1.6825 sec/batch
Epoch: 16/20... Training Step: 3002... Training loss: 1.4557... 1.6970 sec/batch
Epoch: 16/20... Training Step: 3003... Training loss: 1.4822... 1.7273 sec/batch
Epoch: 16/20... Training Step: 3004... Training loss: 1.4770... 1.6762 sec/batch
Epoch: 16/20... Training Step: 3005... Training loss: 1.4724... 1.6813 sec/batch
Epoch: 16/20... Training Step: 3006... Training loss: 1.4655... 1.7179 sec/batch
Epoch: 16/20... Training Step: 3007... Training loss: 1.4439... 1.7172 sec/batch
Epoch: 16/20... Training Step: 3008... Training loss: 1.4223... 1.9057 sec/batch
Epoch: 16/20... Training Step: 3009... Training loss: 1.4158... 1.9664 sec/batch
Epoch: 16/20... Training Step: 3010... Training loss: 1.4394... 2.0678 sec/batch
Epoch: 16/20... Training Step: 3011... Training loss: 1.4486... 1.8865 sec/batch
Epoch: 16/20... Training Step: 3012... Training loss: 1.5004... 1.6837 sec/batch
Epoch: 16/20... Training Step: 3013... Training loss: 1.4560... 1.6845 sec/batch
Epoch: 16/20... Training Step: 3014... Training loss: 1.4431... 1.7191 sec/batch
Epoch: 16/20... Training Step: 3015... Training loss: 1.4803... 1.7299 sec/batch
Epoch: 16/20... Training Step: 3016... Training loss: 1.4392... 1.6985 sec/batch
Epoch: 16/20... Training Step: 3017... Training loss: 1.4613... 1.7066 sec/batch
Epoch: 16/20... Training Step: 3018... Training loss: 1.4517... 1.6902 sec/batch
Epoch: 16/20... Training Step: 3019... Training loss: 1.4469... 1.6917 sec/batch
Epoch: 16/20... Training Step: 3020... Training loss: 1.4875... 1.6944 sec/batch
Epoch: 16/20... Training Step: 3021... Training loss: 1.4386... 1.7065 sec/batch
Epoch: 16/20... Training Step: 3022... Training loss: 1.5144... 1.7120 sec/batch
Epoch: 16/20... Training Step: 3023... Training loss: 1.4726... 1.7106 sec/batch
Epoch: 16/20... Training Step: 3024... Training loss: 1.4817... 1.6988 sec/batch
Epoch: 16/20... Training Step: 3025... Training loss: 1.4484... 1.7091 sec/batch
Epoch: 16/20... Training Step: 3026... Training loss: 1.4738... 1.9838 sec/batch
Epoch: 16/20... Training Step: 3027... Training loss: 1.4807... 1.8808 sec/batch
Epoch: 16/20... Training Step: 3028... Training loss: 1.4438... 1.7101 sec/batch
Epoch: 16/20... Training Step: 3029... Training loss: 1.4357... 1.6990 sec/batch
Epoch: 16/20... Training Step: 3030... Training loss: 1.5021... 1.6921 sec/batch
Epoch: 16/20... Training Step: 3031... Training loss: 1.4651... 1.7307 sec/batch
Epoch: 16/20... Training Step: 3032... Training loss: 1.5243... 1.7272 sec/batch
Epoch: 16/20... Training Step: 3033... Training loss: 1.4945... 1.6899 sec/batch
Epoch: 16/20... Training Step: 3034... Training loss: 1.4624... 1.6985 sec/batch
Epoch: 16/20... Training Step: 3035... Training loss: 1.4581... 1.6786 sec/batch
Epoch: 16/20... Training Step: 3036... Training loss: 1.4776... 1.6969 sec/batch
Epoch: 16/20... Training Step: 3037... Training loss: 1.4916... 1.7347 sec/batch
Epoch: 16/20... Training Step: 3038... Training loss: 1.4450... 1.6833 sec/batch
Epoch: 16/20... Training Step: 3039... Training loss: 1.4617... 1.6848 sec/batch
Epoch: 16/20... Training Step: 3040... Training loss: 1.4445... 1.6983 sec/batch
Epoch: 16/20... Training Step: 3041... Training loss: 1.5138... 1.6846 sec/batch
Epoch: 16/20... Training Step: 3042... Training loss: 1.4947... 1.6859 sec/batch
Epoch: 16/20... Training Step: 3043... Training loss: 1.4939... 1.6926 sec/batch
Epoch: 16/20... Training Step: 3044... Training loss: 1.4526... 1.7495 sec/batch
Epoch: 16/20... Training Step: 3045... Training loss: 1.4687... 1.7146 sec/batch
Epoch: 16/20... Training Step: 3046... Training loss: 1.4999... 1.7167 sec/batch
Epoch: 16/20... Training Step: 3047... Training loss: 1.4669... 1.7034 sec/batch
Epoch: 16/20... Training Step: 3048... Training loss: 1.4597... 1.6968 sec/batch
Epoch: 16/20... Training Step: 3049... Training loss: 1.4377... 1.7009 sec/batch
Epoch: 16/20... Training Step: 3050... Training loss: 1.4602... 1.7046 sec/batch
Epoch: 16/20... Training Step: 3051... Training loss: 1.4169... 1.6928 sec/batch
Epoch: 16/20... Training Step: 3052... Training loss: 1.4739... 1.6929 sec/batch
Epoch: 16/20... Training Step: 3053... Training loss: 1.4350... 1.6964 sec/batch
Epoch: 16/20... Training Step: 3054... Training loss: 1.4633... 1.6793 sec/batch
Epoch: 16/20... Training Step: 3055... Training loss: 1.4361... 1.7124 sec/batch
Epoch: 16/20... Training Step: 3056... Training loss: 1.4591... 1.7177 sec/batch
Epoch: 16/20... Training Step: 3057... Training loss: 1.4337... 1.6882 sec/batch
Epoch: 16/20... Training Step: 3058... Training loss: 1.4392... 1.7006 sec/batch
Epoch: 16/20... Training Step: 3059... Training loss: 1.4410... 1.6910 sec/batch
Epoch: 16/20... Training Step: 3060... Training loss: 1.4624... 1.7212 sec/batch
Epoch: 16/20... Training Step: 3061... Training loss: 1.4344... 1.6982 sec/batch
Epoch: 16/20... Training Step: 3062... Training loss: 1.4501... 1.7215 sec/batch
Epoch: 16/20... Training Step: 3063... Training loss: 1.4451... 1.7235 sec/batch
Epoch: 16/20... Training Step: 3064... Training loss: 1.4371... 1.7283 sec/batch
Epoch: 16/20... Training Step: 3065... Training loss: 1.4430... 1.7294 sec/batch
Epoch: 16/20... Training Step: 3066... Training loss: 1.4806... 1.6866 sec/batch
Epoch: 16/20... Training Step: 3067... Training loss: 1.4584... 1.6888 sec/batch
Epoch: 16/20... Training Step: 3068... Training loss: 1.4230... 1.6986 sec/batch
Epoch: 16/20... Training Step: 3069... Training loss: 1.4453... 1.7233 sec/batch
Epoch: 16/20... Training Step: 3070... Training loss: 1.4393... 1.6964 sec/batch
Epoch: 16/20... Training Step: 3071... Training loss: 1.4437... 1.6853 sec/batch
Epoch: 16/20... Training Step: 3072... Training loss: 1.4494... 1.6876 sec/batch
Epoch: 16/20... Training Step: 3073... Training loss: 1.4548... 1.7119 sec/batch
Epoch: 16/20... Training Step: 3074... Training loss: 1.4540... 1.7182 sec/batch
Epoch: 16/20... Training Step: 3075... Training loss: 1.4558... 1.6786 sec/batch
Epoch: 16/20... Training Step: 3076... Training loss: 1.4539... 1.6970 sec/batch
Epoch: 16/20... Training Step: 3077... Training loss: 1.4512... 1.7292 sec/batch
Epoch: 16/20... Training Step: 3078... Training loss: 1.4649... 1.6866 sec/batch
Epoch: 16/20... Training Step: 3079... Training loss: 1.4450... 1.7229 sec/batch
Epoch: 16/20... Training Step: 3080... Training loss: 1.4679... 1.7905 sec/batch
Epoch: 16/20... Training Step: 3081... Training loss: 1.4392... 1.6975 sec/batch
Epoch: 16/20... Training Step: 3082... Training loss: 1.4539... 1.7133 sec/batch
Epoch: 16/20... Training Step: 3083... Training loss: 1.4551... 1.7386 sec/batch
Epoch: 16/20... Training Step: 3084... Training loss: 1.4447... 1.6956 sec/batch
Epoch: 16/20... Training Step: 3085... Training loss: 1.4381... 1.7174 sec/batch
Epoch: 16/20... Training Step: 3086... Training loss: 1.4166... 1.7154 sec/batch
Epoch: 16/20... Training Step: 3087... Training loss: 1.4729... 1.7134 sec/batch
Epoch: 16/20... Training Step: 3088... Training loss: 1.4753... 1.6973 sec/batch
Epoch: 16/20... Training Step: 3089... Training loss: 1.4606... 1.7325 sec/batch
Epoch: 16/20... Training Step: 3090... Training loss: 1.4459... 1.6979 sec/batch
Epoch: 16/20... Training Step: 3091... Training loss: 1.4546... 1.7373 sec/batch
Epoch: 16/20... Training Step: 3092... Training loss: 1.4246... 1.6839 sec/batch
Epoch: 16/20... Training Step: 3093... Training loss: 1.4141... 1.6965 sec/batch
Epoch: 16/20... Training Step: 3094... Training loss: 1.4685... 1.7040 sec/batch
Epoch: 16/20... Training Step: 3095... Training loss: 1.4415... 1.6867 sec/batch
Epoch: 16/20... Training Step: 3096... Training loss: 1.4169... 1.7118 sec/batch
Epoch: 16/20... Training Step: 3097... Training loss: 1.4721... 1.7334 sec/batch
Epoch: 16/20... Training Step: 3098... Training loss: 1.4662... 1.7858 sec/batch
Epoch: 16/20... Training Step: 3099... Training loss: 1.4371... 1.8182 sec/batch
Epoch: 16/20... Training Step: 3100... Training loss: 1.4299... 1.7458 sec/batch
Epoch: 16/20... Training Step: 3101... Training loss: 1.4203... 1.7461 sec/batch
Epoch: 16/20... Training Step: 3102... Training loss: 1.4412... 1.6896 sec/batch
Epoch: 16/20... Training Step: 3103... Training loss: 1.4746... 1.7053 sec/batch
Epoch: 16/20... Training Step: 3104... Training loss: 1.4666... 1.7181 sec/batch
Epoch: 16/20... Training Step: 3105... Training loss: 1.4655... 1.6959 sec/batch
Epoch: 16/20... Training Step: 3106... Training loss: 1.4586... 1.7013 sec/batch
Epoch: 16/20... Training Step: 3107... Training loss: 1.4978... 1.6966 sec/batch
Epoch: 16/20... Training Step: 3108... Training loss: 1.4669... 1.6959 sec/batch
Epoch: 16/20... Training Step: 3109... Training loss: 1.4618... 1.7520 sec/batch
Epoch: 16/20... Training Step: 3110... Training loss: 1.4542... 1.7030 sec/batch
Epoch: 16/20... Training Step: 3111... Training loss: 1.4943... 1.7001 sec/batch
Epoch: 16/20... Training Step: 3112... Training loss: 1.4650... 1.6795 sec/batch
Epoch: 16/20... Training Step: 3113... Training loss: 1.4504... 1.7037 sec/batch
Epoch: 16/20... Training Step: 3114... Training loss: 1.4861... 1.7236 sec/batch
Epoch: 16/20... Training Step: 3115... Training loss: 1.4410... 1.6858 sec/batch
Epoch: 16/20... Training Step: 3116... Training loss: 1.4832... 1.6862 sec/batch
Epoch: 16/20... Training Step: 3117... Training loss: 1.4685... 1.7410 sec/batch
Epoch: 16/20... Training Step: 3118... Training loss: 1.4992... 1.6914 sec/batch
Epoch: 16/20... Training Step: 3119... Training loss: 1.4735... 1.6898 sec/batch
Epoch: 16/20... Training Step: 3120... Training loss: 1.4634... 1.7103 sec/batch
Epoch: 16/20... Training Step: 3121... Training loss: 1.4210... 1.6981 sec/batch
Epoch: 16/20... Training Step: 3122... Training loss: 1.4453... 1.7305 sec/batch
Epoch: 16/20... Training Step: 3123... Training loss: 1.4784... 1.7048 sec/batch
Epoch: 16/20... Training Step: 3124... Training loss: 1.4530... 1.6918 sec/batch
Epoch: 16/20... Training Step: 3125... Training loss: 1.4518... 1.6844 sec/batch
Epoch: 16/20... Training Step: 3126... Training loss: 1.4613... 1.7026 sec/batch
Epoch: 16/20... Training Step: 3127... Training loss: 1.4644... 1.6784 sec/batch
Epoch: 16/20... Training Step: 3128... Training loss: 1.4523... 1.6993 sec/batch
Epoch: 16/20... Training Step: 3129... Training loss: 1.4255... 1.7141 sec/batch
Epoch: 16/20... Training Step: 3130... Training loss: 1.4748... 1.7789 sec/batch
Epoch: 16/20... Training Step: 3131... Training loss: 1.4889... 1.7751 sec/batch
Epoch: 16/20... Training Step: 3132... Training loss: 1.4570... 1.8163 sec/batch
Epoch: 16/20... Training Step: 3133... Training loss: 1.4711... 1.8021 sec/batch
Epoch: 16/20... Training Step: 3134... Training loss: 1.4588... 1.7317 sec/batch
Epoch: 16/20... Training Step: 3135... Training loss: 1.4657... 1.6828 sec/batch
Epoch: 16/20... Training Step: 3136... Training loss: 1.4678... 1.6639 sec/batch
Epoch: 16/20... Training Step: 3137... Training loss: 1.4717... 1.7179 sec/batch
Epoch: 16/20... Training Step: 3138... Training loss: 1.5221... 1.7344 sec/batch
Epoch: 16/20... Training Step: 3139... Training loss: 1.4679... 1.6770 sec/batch
Epoch: 16/20... Training Step: 3140... Training loss: 1.4624... 1.6849 sec/batch
Epoch: 16/20... Training Step: 3141... Training loss: 1.4516... 1.6929 sec/batch
Epoch: 16/20... Training Step: 3142... Training loss: 1.4526... 1.6963 sec/batch
Epoch: 16/20... Training Step: 3143... Training loss: 1.4941... 1.7079 sec/batch
Epoch: 16/20... Training Step: 3144... Training loss: 1.4587... 1.7222 sec/batch
Epoch: 16/20... Training Step: 3145... Training loss: 1.4798... 1.6879 sec/batch
Epoch: 16/20... Training Step: 3146... Training loss: 1.4450... 1.7111 sec/batch
Epoch: 16/20... Training Step: 3147... Training loss: 1.4521... 1.6915 sec/batch
Epoch: 16/20... Training Step: 3148... Training loss: 1.4752... 1.7020 sec/batch
Epoch: 16/20... Training Step: 3149... Training loss: 1.4413... 1.7268 sec/batch
Epoch: 16/20... Training Step: 3150... Training loss: 1.4301... 1.8109 sec/batch
Epoch: 16/20... Training Step: 3151... Training loss: 1.4347... 1.7398 sec/batch
Epoch: 16/20... Training Step: 3152... Training loss: 1.4613... 1.6976 sec/batch
Epoch: 16/20... Training Step: 3153... Training loss: 1.4623... 1.6848 sec/batch
Epoch: 16/20... Training Step: 3154... Training loss: 1.4488... 1.7133 sec/batch
Epoch: 16/20... Training Step: 3155... Training loss: 1.4496... 1.6904 sec/batch
Epoch: 16/20... Training Step: 3156... Training loss: 1.4461... 1.6955 sec/batch
Epoch: 16/20... Training Step: 3157... Training loss: 1.4816... 1.7052 sec/batch
Epoch: 16/20... Training Step: 3158... Training loss: 1.4464... 1.7089 sec/batch
Epoch: 16/20... Training Step: 3159... Training loss: 1.4466... 1.7338 sec/batch
Epoch: 16/20... Training Step: 3160... Training loss: 1.4561... 1.7095 sec/batch
Epoch: 16/20... Training Step: 3161... Training loss: 1.4449... 1.7017 sec/batch
Epoch: 16/20... Training Step: 3162... Training loss: 1.4336... 1.6845 sec/batch
Epoch: 16/20... Training Step: 3163... Training loss: 1.4610... 1.7173 sec/batch
Epoch: 16/20... Training Step: 3164... Training loss: 1.4423... 1.7100 sec/batch
Epoch: 16/20... Training Step: 3165... Training loss: 1.4306... 1.6824 sec/batch
Epoch: 16/20... Training Step: 3166... Training loss: 1.4647... 1.6857 sec/batch
Epoch: 16/20... Training Step: 3167... Training loss: 1.4526... 1.7360 sec/batch
Epoch: 16/20... Training Step: 3168... Training loss: 1.4456... 1.7114 sec/batch
Epoch: 17/20... Training Step: 3169... Training loss: 1.5181... 1.7227 sec/batch
Epoch: 17/20... Training Step: 3170... Training loss: 1.4576... 1.6936 sec/batch
Epoch: 17/20... Training Step: 3171... Training loss: 1.4417... 1.6759 sec/batch
Epoch: 17/20... Training Step: 3172... Training loss: 1.4653... 1.6966 sec/batch
Epoch: 17/20... Training Step: 3173... Training loss: 1.4361... 1.7028 sec/batch
Epoch: 17/20... Training Step: 3174... Training loss: 1.4141... 1.6829 sec/batch
Epoch: 17/20... Training Step: 3175... Training loss: 1.4428... 1.6806 sec/batch
Epoch: 17/20... Training Step: 3176... Training loss: 1.4493... 1.6882 sec/batch
Epoch: 17/20... Training Step: 3177... Training loss: 1.4626... 1.7202 sec/batch
Epoch: 17/20... Training Step: 3178... Training loss: 1.4387... 1.6894 sec/batch
Epoch: 17/20... Training Step: 3179... Training loss: 1.4279... 1.6954 sec/batch
Epoch: 17/20... Training Step: 3180... Training loss: 1.4393... 1.6897 sec/batch
Epoch: 17/20... Training Step: 3181... Training loss: 1.4534... 1.6825 sec/batch
Epoch: 17/20... Training Step: 3182... Training loss: 1.4736... 1.6807 sec/batch
Epoch: 17/20... Training Step: 3183... Training loss: 1.4416... 1.6936 sec/batch
Epoch: 17/20... Training Step: 3184... Training loss: 1.4213... 1.7203 sec/batch
Epoch: 17/20... Training Step: 3185... Training loss: 1.4608... 1.7202 sec/batch
Epoch: 17/20... Training Step: 3186... Training loss: 1.4613... 1.6797 sec/batch
Epoch: 17/20... Training Step: 3187... Training loss: 1.4411... 1.6883 sec/batch
Epoch: 17/20... Training Step: 3188... Training loss: 1.4801... 1.6846 sec/batch
Epoch: 17/20... Training Step: 3189... Training loss: 1.4448... 1.7016 sec/batch
Epoch: 17/20... Training Step: 3190... Training loss: 1.4702... 1.6877 sec/batch
Epoch: 17/20... Training Step: 3191... Training loss: 1.4477... 1.6718 sec/batch
Epoch: 17/20... Training Step: 3192... Training loss: 1.4574... 1.7248 sec/batch
Epoch: 17/20... Training Step: 3193... Training loss: 1.4607... 1.6948 sec/batch
Epoch: 17/20... Training Step: 3194... Training loss: 1.4163... 1.6748 sec/batch
Epoch: 17/20... Training Step: 3195... Training loss: 1.4210... 1.6881 sec/batch
Epoch: 17/20... Training Step: 3196... Training loss: 1.4684... 1.6852 sec/batch
Epoch: 17/20... Training Step: 3197... Training loss: 1.4627... 1.6955 sec/batch
Epoch: 17/20... Training Step: 3198... Training loss: 1.4608... 1.7064 sec/batch
Epoch: 17/20... Training Step: 3199... Training loss: 1.4377... 1.6949 sec/batch
Epoch: 17/20... Training Step: 3200... Training loss: 1.4352... 1.7044 sec/batch
Epoch: 17/20... Training Step: 3201... Training loss: 1.4628... 1.6935 sec/batch
Epoch: 17/20... Training Step: 3202... Training loss: 1.4587... 1.7575 sec/batch
Epoch: 17/20... Training Step: 3203... Training loss: 1.4466... 1.7391 sec/batch
Epoch: 17/20... Training Step: 3204... Training loss: 1.4658... 1.7514 sec/batch
Epoch: 17/20... Training Step: 3205... Training loss: 1.4321... 1.7855 sec/batch
Epoch: 17/20... Training Step: 3206... Training loss: 1.4057... 1.7182 sec/batch
Epoch: 17/20... Training Step: 3207... Training loss: 1.4048... 1.7240 sec/batch
Epoch: 17/20... Training Step: 3208... Training loss: 1.4295... 1.7053 sec/batch
Epoch: 17/20... Training Step: 3209... Training loss: 1.4269... 1.6923 sec/batch
Epoch: 17/20... Training Step: 3210... Training loss: 1.4810... 1.6832 sec/batch
Epoch: 17/20... Training Step: 3211... Training loss: 1.4336... 1.6873 sec/batch
Epoch: 17/20... Training Step: 3212... Training loss: 1.4362... 1.6971 sec/batch
Epoch: 17/20... Training Step: 3213... Training loss: 1.4583... 1.7391 sec/batch
Epoch: 17/20... Training Step: 3214... Training loss: 1.4180... 1.7047 sec/batch
Epoch: 17/20... Training Step: 3215... Training loss: 1.4341... 1.6880 sec/batch
Epoch: 17/20... Training Step: 3216... Training loss: 1.4406... 1.7004 sec/batch
Epoch: 17/20... Training Step: 3217... Training loss: 1.4363... 1.7131 sec/batch
Epoch: 17/20... Training Step: 3218... Training loss: 1.4587... 1.7057 sec/batch
Epoch: 17/20... Training Step: 3219... Training loss: 1.4184... 1.7136 sec/batch
Epoch: 17/20... Training Step: 3220... Training loss: 1.4811... 1.8061 sec/batch
Epoch: 17/20... Training Step: 3221... Training loss: 1.4527... 1.7027 sec/batch
Epoch: 17/20... Training Step: 3222... Training loss: 1.4666... 1.7269 sec/batch
Epoch: 17/20... Training Step: 3223... Training loss: 1.4284... 1.6675 sec/batch
Epoch: 17/20... Training Step: 3224... Training loss: 1.4456... 1.7188 sec/batch
Epoch: 17/20... Training Step: 3225... Training loss: 1.4561... 1.7127 sec/batch
Epoch: 17/20... Training Step: 3226... Training loss: 1.4328... 1.6977 sec/batch
Epoch: 17/20... Training Step: 3227... Training loss: 1.4281... 1.7010 sec/batch
Epoch: 17/20... Training Step: 3228... Training loss: 1.4809... 1.6929 sec/batch
Epoch: 17/20... Training Step: 3229... Training loss: 1.4500... 1.6798 sec/batch
Epoch: 17/20... Training Step: 3230... Training loss: 1.4952... 1.7446 sec/batch
Epoch: 17/20... Training Step: 3231... Training loss: 1.4707... 1.6689 sec/batch
Epoch: 17/20... Training Step: 3232... Training loss: 1.4467... 1.6856 sec/batch
Epoch: 17/20... Training Step: 3233... Training loss: 1.4459... 1.6899 sec/batch
Epoch: 17/20... Training Step: 3234... Training loss: 1.4592... 1.7047 sec/batch
Epoch: 17/20... Training Step: 3235... Training loss: 1.4707... 1.6899 sec/batch
Epoch: 17/20... Training Step: 3236... Training loss: 1.4244... 1.7662 sec/batch
Epoch: 17/20... Training Step: 3237... Training loss: 1.4411... 1.7571 sec/batch
Epoch: 17/20... Training Step: 3238... Training loss: 1.4360... 1.7361 sec/batch
Epoch: 17/20... Training Step: 3239... Training loss: 1.4870... 1.7301 sec/batch
Epoch: 17/20... Training Step: 3240... Training loss: 1.4682... 1.6862 sec/batch
Epoch: 17/20... Training Step: 3241... Training loss: 1.4795... 1.7004 sec/batch
Epoch: 17/20... Training Step: 3242... Training loss: 1.4321... 1.7028 sec/batch
Epoch: 17/20... Training Step: 3243... Training loss: 1.4472... 1.7780 sec/batch
Epoch: 17/20... Training Step: 3244... Training loss: 1.4771... 1.7090 sec/batch
Epoch: 17/20... Training Step: 3245... Training loss: 1.4577... 1.6998 sec/batch
Epoch: 17/20... Training Step: 3246... Training loss: 1.4384... 1.7249 sec/batch
Epoch: 17/20... Training Step: 3247... Training loss: 1.4023... 1.7076 sec/batch
Epoch: 17/20... Training Step: 3248... Training loss: 1.4470... 1.6906 sec/batch
Epoch: 17/20... Training Step: 3249... Training loss: 1.4068... 1.6944 sec/batch
Epoch: 17/20... Training Step: 3250... Training loss: 1.4423... 1.6788 sec/batch
Epoch: 17/20... Training Step: 3251... Training loss: 1.4105... 1.6762 sec/batch
Epoch: 17/20... Training Step: 3252... Training loss: 1.4377... 1.6951 sec/batch
Epoch: 17/20... Training Step: 3253... Training loss: 1.4219... 1.7079 sec/batch
Epoch: 17/20... Training Step: 3254... Training loss: 1.4309... 1.7152 sec/batch
Epoch: 17/20... Training Step: 3255... Training loss: 1.4217... 1.6931 sec/batch
Epoch: 17/20... Training Step: 3256... Training loss: 1.4203... 1.7390 sec/batch
Epoch: 17/20... Training Step: 3257... Training loss: 1.4128... 1.6834 sec/batch
Epoch: 17/20... Training Step: 3258... Training loss: 1.4600... 1.6908 sec/batch
Epoch: 17/20... Training Step: 3259... Training loss: 1.4186... 1.6890 sec/batch
Epoch: 17/20... Training Step: 3260... Training loss: 1.4184... 1.7186 sec/batch
Epoch: 17/20... Training Step: 3261... Training loss: 1.4218... 1.6968 sec/batch
Epoch: 17/20... Training Step: 3262... Training loss: 1.4278... 1.6960 sec/batch
Epoch: 17/20... Training Step: 3263... Training loss: 1.4150... 1.6892 sec/batch
Epoch: 17/20... Training Step: 3264... Training loss: 1.4543... 1.7032 sec/batch
Epoch: 17/20... Training Step: 3265... Training loss: 1.4476... 1.7453 sec/batch
Epoch: 17/20... Training Step: 3266... Training loss: 1.4150... 1.6939 sec/batch
Epoch: 17/20... Training Step: 3267... Training loss: 1.4159... 1.6602 sec/batch
Epoch: 17/20... Training Step: 3268... Training loss: 1.4162... 1.6873 sec/batch
Epoch: 17/20... Training Step: 3269... Training loss: 1.4319... 1.7150 sec/batch
Epoch: 17/20... Training Step: 3270... Training loss: 1.4333... 1.6877 sec/batch
Epoch: 17/20... Training Step: 3271... Training loss: 1.4433... 1.7232 sec/batch
Epoch: 17/20... Training Step: 3272... Training loss: 1.4382... 1.7835 sec/batch
Epoch: 17/20... Training Step: 3273... Training loss: 1.4402... 1.8342 sec/batch
Epoch: 17/20... Training Step: 3274... Training loss: 1.4311... 1.8195 sec/batch
Epoch: 17/20... Training Step: 3275... Training loss: 1.4446... 1.7097 sec/batch
Epoch: 17/20... Training Step: 3276... Training loss: 1.4464... 1.7057 sec/batch
Epoch: 17/20... Training Step: 3277... Training loss: 1.4397... 1.6931 sec/batch
Epoch: 17/20... Training Step: 3278... Training loss: 1.4525... 1.7056 sec/batch
Epoch: 17/20... Training Step: 3279... Training loss: 1.4282... 1.6923 sec/batch
Epoch: 17/20... Training Step: 3280... Training loss: 1.4490... 1.6868 sec/batch
Epoch: 17/20... Training Step: 3281... Training loss: 1.4434... 1.7032 sec/batch
Epoch: 17/20... Training Step: 3282... Training loss: 1.4229... 1.7184 sec/batch
Epoch: 17/20... Training Step: 3283... Training loss: 1.4223... 1.7324 sec/batch
Epoch: 17/20... Training Step: 3284... Training loss: 1.4053... 1.7259 sec/batch
Epoch: 17/20... Training Step: 3285... Training loss: 1.4575... 1.7132 sec/batch
Epoch: 17/20... Training Step: 3286... Training loss: 1.4550... 1.6839 sec/batch
Epoch: 17/20... Training Step: 3287... Training loss: 1.4377... 1.7004 sec/batch
Epoch: 17/20... Training Step: 3288... Training loss: 1.4303... 1.7375 sec/batch
Epoch: 17/20... Training Step: 3289... Training loss: 1.4429... 1.7244 sec/batch
Epoch: 17/20... Training Step: 3290... Training loss: 1.4100... 1.7577 sec/batch
Epoch: 17/20... Training Step: 3291... Training loss: 1.3897... 1.6936 sec/batch
Epoch: 17/20... Training Step: 3292... Training loss: 1.4442... 1.6869 sec/batch
Epoch: 17/20... Training Step: 3293... Training loss: 1.4324... 1.6999 sec/batch
Epoch: 17/20... Training Step: 3294... Training loss: 1.3974... 1.7106 sec/batch
Epoch: 17/20... Training Step: 3295... Training loss: 1.4484... 1.6982 sec/batch
Epoch: 17/20... Training Step: 3296... Training loss: 1.4438... 1.6850 sec/batch
Epoch: 17/20... Training Step: 3297... Training loss: 1.4149... 1.7076 sec/batch
Epoch: 17/20... Training Step: 3298... Training loss: 1.4207... 1.7038 sec/batch
Epoch: 17/20... Training Step: 3299... Training loss: 1.3922... 1.7165 sec/batch
Epoch: 17/20... Training Step: 3300... Training loss: 1.4215... 1.6925 sec/batch
Epoch: 17/20... Training Step: 3301... Training loss: 1.4699... 1.7031 sec/batch
Epoch: 17/20... Training Step: 3302... Training loss: 1.4556... 1.6863 sec/batch
Epoch: 17/20... Training Step: 3303... Training loss: 1.4455... 1.6797 sec/batch
Epoch: 17/20... Training Step: 3304... Training loss: 1.4493... 1.7456 sec/batch
Epoch: 17/20... Training Step: 3305... Training loss: 1.4736... 1.7709 sec/batch
Epoch: 17/20... Training Step: 3306... Training loss: 1.4529... 1.8644 sec/batch
Epoch: 17/20... Training Step: 3307... Training loss: 1.4451... 1.8049 sec/batch
Epoch: 17/20... Training Step: 3308... Training loss: 1.4410... 1.6910 sec/batch
Epoch: 17/20... Training Step: 3309... Training loss: 1.4948... 1.6897 sec/batch
Epoch: 17/20... Training Step: 3310... Training loss: 1.4521... 1.7000 sec/batch
Epoch: 17/20... Training Step: 3311... Training loss: 1.4401... 1.6869 sec/batch
Epoch: 17/20... Training Step: 3312... Training loss: 1.4615... 1.7348 sec/batch
Epoch: 17/20... Training Step: 3313... Training loss: 1.4295... 1.6783 sec/batch
Epoch: 17/20... Training Step: 3314... Training loss: 1.4661... 1.6986 sec/batch
Epoch: 17/20... Training Step: 3315... Training loss: 1.4622... 1.6966 sec/batch
Epoch: 17/20... Training Step: 3316... Training loss: 1.4871... 1.6991 sec/batch
Epoch: 17/20... Training Step: 3317... Training loss: 1.4577... 1.7133 sec/batch
Epoch: 17/20... Training Step: 3318... Training loss: 1.4472... 1.6857 sec/batch
Epoch: 17/20... Training Step: 3319... Training loss: 1.4086... 1.7060 sec/batch
Epoch: 17/20... Training Step: 3320... Training loss: 1.4269... 1.7042 sec/batch
Epoch: 17/20... Training Step: 3321... Training loss: 1.4599... 1.6970 sec/batch
Epoch: 17/20... Training Step: 3322... Training loss: 1.4441... 1.6896 sec/batch
Epoch: 17/20... Training Step: 3323... Training loss: 1.4374... 1.7104 sec/batch
Epoch: 17/20... Training Step: 3324... Training loss: 1.4362... 1.7428 sec/batch
Epoch: 17/20... Training Step: 3325... Training loss: 1.4507... 1.6863 sec/batch
Epoch: 17/20... Training Step: 3326... Training loss: 1.4393... 1.6935 sec/batch
Epoch: 17/20... Training Step: 3327... Training loss: 1.3981... 1.7049 sec/batch
Epoch: 17/20... Training Step: 3328... Training loss: 1.4661... 1.6971 sec/batch
Epoch: 17/20... Training Step: 3329... Training loss: 1.4749... 1.7406 sec/batch
Epoch: 17/20... Training Step: 3330... Training loss: 1.4384... 1.7120 sec/batch
Epoch: 17/20... Training Step: 3331... Training loss: 1.4499... 1.6931 sec/batch
Epoch: 17/20... Training Step: 3332... Training loss: 1.4408... 1.6967 sec/batch
Epoch: 17/20... Training Step: 3333... Training loss: 1.4449... 1.7084 sec/batch
Epoch: 17/20... Training Step: 3334... Training loss: 1.4545... 1.7016 sec/batch
Epoch: 17/20... Training Step: 3335... Training loss: 1.4601... 1.6960 sec/batch
Epoch: 17/20... Training Step: 3336... Training loss: 1.5213... 1.6922 sec/batch
Epoch: 17/20... Training Step: 3337... Training loss: 1.4502... 1.7128 sec/batch
Epoch: 17/20... Training Step: 3338... Training loss: 1.4427... 1.6999 sec/batch
Epoch: 17/20... Training Step: 3339... Training loss: 1.4362... 1.6862 sec/batch
Epoch: 17/20... Training Step: 3340... Training loss: 1.4243... 1.7025 sec/batch
Epoch: 17/20... Training Step: 3341... Training loss: 1.4684... 1.7215 sec/batch
Epoch: 17/20... Training Step: 3342... Training loss: 1.4545... 1.7582 sec/batch
Epoch: 17/20... Training Step: 3343... Training loss: 1.4605... 1.7702 sec/batch
Epoch: 17/20... Training Step: 3344... Training loss: 1.4273... 1.6886 sec/batch
Epoch: 17/20... Training Step: 3345... Training loss: 1.4336... 1.6967 sec/batch
Epoch: 17/20... Training Step: 3346... Training loss: 1.4733... 1.7168 sec/batch
Epoch: 17/20... Training Step: 3347... Training loss: 1.4327... 1.6827 sec/batch
Epoch: 17/20... Training Step: 3348... Training loss: 1.4266... 1.6773 sec/batch
Epoch: 17/20... Training Step: 3349... Training loss: 1.4168... 1.6832 sec/batch
Epoch: 17/20... Training Step: 3350... Training loss: 1.4482... 1.6740 sec/batch
Epoch: 17/20... Training Step: 3351... Training loss: 1.4515... 1.7120 sec/batch
Epoch: 17/20... Training Step: 3352... Training loss: 1.4346... 1.6904 sec/batch
Epoch: 17/20... Training Step: 3353... Training loss: 1.4341... 1.6983 sec/batch
Epoch: 17/20... Training Step: 3354... Training loss: 1.4262... 1.7651 sec/batch
Epoch: 17/20... Training Step: 3355... Training loss: 1.4665... 1.7046 sec/batch
Epoch: 17/20... Training Step: 3356... Training loss: 1.4287... 1.7014 sec/batch
Epoch: 17/20... Training Step: 3357... Training loss: 1.4199... 1.7003 sec/batch
Epoch: 17/20... Training Step: 3358... Training loss: 1.4373... 1.7145 sec/batch
Epoch: 17/20... Training Step: 3359... Training loss: 1.4236... 1.7372 sec/batch
Epoch: 17/20... Training Step: 3360... Training loss: 1.4195... 1.7108 sec/batch
Epoch: 17/20... Training Step: 3361... Training loss: 1.4411... 1.7973 sec/batch
Epoch: 17/20... Training Step: 3362... Training loss: 1.4109... 1.6939 sec/batch
Epoch: 17/20... Training Step: 3363... Training loss: 1.4176... 1.6992 sec/batch
Epoch: 17/20... Training Step: 3364... Training loss: 1.4552... 1.6740 sec/batch
Epoch: 17/20... Training Step: 3365... Training loss: 1.4344... 1.6784 sec/batch
Epoch: 17/20... Training Step: 3366... Training loss: 1.4224... 1.6907 sec/batch
Epoch: 18/20... Training Step: 3367... Training loss: 1.4973... 1.7314 sec/batch
Epoch: 18/20... Training Step: 3368... Training loss: 1.4455... 1.7065 sec/batch
Epoch: 18/20... Training Step: 3369... Training loss: 1.4301... 1.6742 sec/batch
Epoch: 18/20... Training Step: 3370... Training loss: 1.4536... 1.7090 sec/batch
Epoch: 18/20... Training Step: 3371... Training loss: 1.4253... 1.7160 sec/batch
Epoch: 18/20... Training Step: 3372... Training loss: 1.4071... 1.6903 sec/batch
Epoch: 18/20... Training Step: 3373... Training loss: 1.4424... 1.6901 sec/batch
Epoch: 18/20... Training Step: 3374... Training loss: 1.4349... 1.7040 sec/batch
Epoch: 18/20... Training Step: 3375... Training loss: 1.4517... 1.7261 sec/batch
Epoch: 18/20... Training Step: 3376... Training loss: 1.4297... 1.7259 sec/batch
Epoch: 18/20... Training Step: 3377... Training loss: 1.4233... 1.7508 sec/batch
Epoch: 18/20... Training Step: 3378... Training loss: 1.4366... 1.6909 sec/batch
Epoch: 18/20... Training Step: 3379... Training loss: 1.4350... 1.7039 sec/batch
Epoch: 18/20... Training Step: 3380... Training loss: 1.4612... 1.7016 sec/batch
Epoch: 18/20... Training Step: 3381... Training loss: 1.4215... 1.6506 sec/batch
Epoch: 18/20... Training Step: 3382... Training loss: 1.4271... 1.7020 sec/batch
Epoch: 18/20... Training Step: 3383... Training loss: 1.4520... 1.6822 sec/batch
Epoch: 18/20... Training Step: 3384... Training loss: 1.4648... 1.7218 sec/batch
Epoch: 18/20... Training Step: 3385... Training loss: 1.4387... 1.6876 sec/batch
Epoch: 18/20... Training Step: 3386... Training loss: 1.4689... 1.7001 sec/batch
Epoch: 18/20... Training Step: 3387... Training loss: 1.4337... 1.6956 sec/batch
Epoch: 18/20... Training Step: 3388... Training loss: 1.4608... 1.6985 sec/batch
Epoch: 18/20... Training Step: 3389... Training loss: 1.4421... 1.6917 sec/batch
Epoch: 18/20... Training Step: 3390... Training loss: 1.4541... 1.7001 sec/batch
Epoch: 18/20... Training Step: 3391... Training loss: 1.4420... 1.6862 sec/batch
Epoch: 18/20... Training Step: 3392... Training loss: 1.4073... 1.6870 sec/batch
Epoch: 18/20... Training Step: 3393... Training loss: 1.4126... 1.6976 sec/batch
Epoch: 18/20... Training Step: 3394... Training loss: 1.4748... 1.7049 sec/batch
Epoch: 18/20... Training Step: 3395... Training loss: 1.4478... 1.7706 sec/batch
Epoch: 18/20... Training Step: 3396... Training loss: 1.4613... 1.6786 sec/batch
Epoch: 18/20... Training Step: 3397... Training loss: 1.4304... 1.6880 sec/batch
Epoch: 18/20... Training Step: 3398... Training loss: 1.4286... 1.6947 sec/batch
Epoch: 18/20... Training Step: 3399... Training loss: 1.4527... 1.6896 sec/batch
Epoch: 18/20... Training Step: 3400... Training loss: 1.4632... 1.6786 sec/batch
Epoch: 18/20... Training Step: 3401... Training loss: 1.4417... 1.6933 sec/batch
Epoch: 18/20... Training Step: 3402... Training loss: 1.4522... 1.6957 sec/batch
Epoch: 18/20... Training Step: 3403... Training loss: 1.4248... 1.6767 sec/batch
Epoch: 18/20... Training Step: 3404... Training loss: 1.3976... 1.6842 sec/batch
Epoch: 18/20... Training Step: 3405... Training loss: 1.3945... 1.6937 sec/batch
Epoch: 18/20... Training Step: 3406... Training loss: 1.4198... 1.7452 sec/batch
Epoch: 18/20... Training Step: 3407... Training loss: 1.4090... 1.6865 sec/batch
Epoch: 18/20... Training Step: 3408... Training loss: 1.4687... 1.6917 sec/batch
Epoch: 18/20... Training Step: 3409... Training loss: 1.4337... 1.6901 sec/batch
Epoch: 18/20... Training Step: 3410... Training loss: 1.4180... 1.7026 sec/batch
Epoch: 18/20... Training Step: 3411... Training loss: 1.4575... 1.7276 sec/batch
Epoch: 18/20... Training Step: 3412... Training loss: 1.4071... 1.7315 sec/batch
Epoch: 18/20... Training Step: 3413... Training loss: 1.4365... 1.7198 sec/batch
Epoch: 18/20... Training Step: 3414... Training loss: 1.4204... 1.7040 sec/batch
Epoch: 18/20... Training Step: 3415... Training loss: 1.4265... 1.7343 sec/batch
Epoch: 18/20... Training Step: 3416... Training loss: 1.4516... 1.6844 sec/batch
Epoch: 18/20... Training Step: 3417... Training loss: 1.4165... 1.6924 sec/batch
Epoch: 18/20... Training Step: 3418... Training loss: 1.4770... 1.7165 sec/batch
Epoch: 18/20... Training Step: 3419... Training loss: 1.4478... 1.6984 sec/batch
Epoch: 18/20... Training Step: 3420... Training loss: 1.4583... 1.6918 sec/batch
Epoch: 18/20... Training Step: 3421... Training loss: 1.4261... 1.6965 sec/batch
Epoch: 18/20... Training Step: 3422... Training loss: 1.4430... 1.6910 sec/batch
Epoch: 18/20... Training Step: 3423... Training loss: 1.4509... 1.7515 sec/batch
Epoch: 18/20... Training Step: 3424... Training loss: 1.4195... 1.6980 sec/batch
Epoch: 18/20... Training Step: 3425... Training loss: 1.4173... 1.6835 sec/batch
Epoch: 18/20... Training Step: 3426... Training loss: 1.4662... 1.6977 sec/batch
Epoch: 18/20... Training Step: 3427... Training loss: 1.4439... 1.6969 sec/batch
Epoch: 18/20... Training Step: 3428... Training loss: 1.4816... 1.7060 sec/batch
Epoch: 18/20... Training Step: 3429... Training loss: 1.4586... 1.7507 sec/batch
Epoch: 18/20... Training Step: 3430... Training loss: 1.4469... 1.6954 sec/batch
Epoch: 18/20... Training Step: 3431... Training loss: 1.4366... 1.8051 sec/batch
Epoch: 18/20... Training Step: 3432... Training loss: 1.4518... 1.7145 sec/batch
Epoch: 18/20... Training Step: 3433... Training loss: 1.4671... 1.7064 sec/batch
Epoch: 18/20... Training Step: 3434... Training loss: 1.4172... 1.7846 sec/batch
Epoch: 18/20... Training Step: 3435... Training loss: 1.4308... 1.6920 sec/batch
Epoch: 18/20... Training Step: 3436... Training loss: 1.4314... 1.7038 sec/batch
Epoch: 18/20... Training Step: 3437... Training loss: 1.4708... 1.7008 sec/batch
Epoch: 18/20... Training Step: 3438... Training loss: 1.4573... 1.7189 sec/batch
Epoch: 18/20... Training Step: 3439... Training loss: 1.4680... 1.6977 sec/batch
Epoch: 18/20... Training Step: 3440... Training loss: 1.4268... 1.6856 sec/batch
Epoch: 18/20... Training Step: 3441... Training loss: 1.4313... 1.7068 sec/batch
Epoch: 18/20... Training Step: 3442... Training loss: 1.4593... 1.6961 sec/batch
Epoch: 18/20... Training Step: 3443... Training loss: 1.4369... 1.7027 sec/batch
Epoch: 18/20... Training Step: 3444... Training loss: 1.4358... 1.7185 sec/batch
Epoch: 18/20... Training Step: 3445... Training loss: 1.4037... 1.6837 sec/batch
Epoch: 18/20... Training Step: 3446... Training loss: 1.4411... 1.7177 sec/batch
Epoch: 18/20... Training Step: 3447... Training loss: 1.4032... 1.8087 sec/batch
Epoch: 18/20... Training Step: 3448... Training loss: 1.4437... 1.7411 sec/batch
Epoch: 18/20... Training Step: 3449... Training loss: 1.4100... 1.8171 sec/batch
Epoch: 18/20... Training Step: 3450... Training loss: 1.4410... 1.7991 sec/batch
Epoch: 18/20... Training Step: 3451... Training loss: 1.4059... 1.7852 sec/batch
Epoch: 18/20... Training Step: 3452... Training loss: 1.4240... 1.7475 sec/batch
Epoch: 18/20... Training Step: 3453... Training loss: 1.4035... 1.7438 sec/batch
Epoch: 18/20... Training Step: 3454... Training loss: 1.4164... 1.6971 sec/batch
Epoch: 18/20... Training Step: 3455... Training loss: 1.4080... 1.7083 sec/batch
Epoch: 18/20... Training Step: 3456... Training loss: 1.4479... 1.7004 sec/batch
Epoch: 18/20... Training Step: 3457... Training loss: 1.4069... 1.6992 sec/batch
Epoch: 18/20... Training Step: 3458... Training loss: 1.4280... 1.7165 sec/batch
Epoch: 18/20... Training Step: 3459... Training loss: 1.4085... 1.6997 sec/batch
Epoch: 18/20... Training Step: 3460... Training loss: 1.4058... 1.6918 sec/batch
Epoch: 18/20... Training Step: 3461... Training loss: 1.4136... 1.6824 sec/batch
Epoch: 18/20... Training Step: 3462... Training loss: 1.4369... 1.7124 sec/batch
Epoch: 18/20... Training Step: 3463... Training loss: 1.4440... 1.7053 sec/batch
Epoch: 18/20... Training Step: 3464... Training loss: 1.4080... 1.7572 sec/batch
Epoch: 18/20... Training Step: 3465... Training loss: 1.4206... 1.7425 sec/batch
Epoch: 18/20... Training Step: 3466... Training loss: 1.4055... 1.7328 sec/batch
Epoch: 18/20... Training Step: 3467... Training loss: 1.4349... 1.6866 sec/batch
Epoch: 18/20... Training Step: 3468... Training loss: 1.4237... 1.6832 sec/batch
Epoch: 18/20... Training Step: 3469... Training loss: 1.4296... 1.6918 sec/batch
Epoch: 18/20... Training Step: 3470... Training loss: 1.4274... 1.6884 sec/batch
Epoch: 18/20... Training Step: 3471... Training loss: 1.4317... 1.7194 sec/batch
Epoch: 18/20... Training Step: 3472... Training loss: 1.4184... 1.6871 sec/batch
Epoch: 18/20... Training Step: 3473... Training loss: 1.4388... 1.6885 sec/batch
Epoch: 18/20... Training Step: 3474... Training loss: 1.4444... 1.6788 sec/batch
Epoch: 18/20... Training Step: 3475... Training loss: 1.4261... 1.6878 sec/batch
Epoch: 18/20... Training Step: 3476... Training loss: 1.4519... 1.7420 sec/batch
Epoch: 18/20... Training Step: 3477... Training loss: 1.4176... 1.6819 sec/batch
Epoch: 18/20... Training Step: 3478... Training loss: 1.4290... 1.6795 sec/batch
Epoch: 18/20... Training Step: 3479... Training loss: 1.4364... 1.6591 sec/batch
Epoch: 18/20... Training Step: 3480... Training loss: 1.4215... 1.8826 sec/batch
Epoch: 18/20... Training Step: 3481... Training loss: 1.4074... 1.7630 sec/batch
Epoch: 18/20... Training Step: 3482... Training loss: 1.3955... 1.8038 sec/batch
Epoch: 18/20... Training Step: 3483... Training loss: 1.4397... 1.7544 sec/batch
Epoch: 18/20... Training Step: 3484... Training loss: 1.4457... 1.7093 sec/batch
Epoch: 18/20... Training Step: 3485... Training loss: 1.4230... 1.7361 sec/batch
Epoch: 18/20... Training Step: 3486... Training loss: 1.4200... 1.6836 sec/batch
Epoch: 18/20... Training Step: 3487... Training loss: 1.4352... 1.6788 sec/batch
Epoch: 18/20... Training Step: 3488... Training loss: 1.3964... 1.7200 sec/batch
Epoch: 18/20... Training Step: 3489... Training loss: 1.3895... 1.6736 sec/batch
Epoch: 18/20... Training Step: 3490... Training loss: 1.4358... 1.7077 sec/batch
Epoch: 18/20... Training Step: 3491... Training loss: 1.4255... 1.6993 sec/batch
Epoch: 18/20... Training Step: 3492... Training loss: 1.3962... 1.6870 sec/batch
Epoch: 18/20... Training Step: 3493... Training loss: 1.4474... 1.7224 sec/batch
Epoch: 18/20... Training Step: 3494... Training loss: 1.4422... 1.7343 sec/batch
Epoch: 18/20... Training Step: 3495... Training loss: 1.4194... 1.7571 sec/batch
Epoch: 18/20... Training Step: 3496... Training loss: 1.4005... 1.7929 sec/batch
Epoch: 18/20... Training Step: 3497... Training loss: 1.3855... 1.7199 sec/batch
Epoch: 18/20... Training Step: 3498... Training loss: 1.4134... 1.7491 sec/batch
Epoch: 18/20... Training Step: 3499... Training loss: 1.4547... 1.7702 sec/batch
Epoch: 18/20... Training Step: 3500... Training loss: 1.4294... 1.6989 sec/batch
Epoch: 18/20... Training Step: 3501... Training loss: 1.4422... 1.8009 sec/batch
Epoch: 18/20... Training Step: 3502... Training loss: 1.4412... 1.6964 sec/batch
Epoch: 18/20... Training Step: 3503... Training loss: 1.4652... 1.6883 sec/batch
Epoch: 18/20... Training Step: 3504... Training loss: 1.4443... 1.7423 sec/batch
Epoch: 18/20... Training Step: 3505... Training loss: 1.4398... 1.6840 sec/batch
Epoch: 18/20... Training Step: 3506... Training loss: 1.4288... 1.6920 sec/batch
Epoch: 18/20... Training Step: 3507... Training loss: 1.4777... 1.7145 sec/batch
Epoch: 18/20... Training Step: 3508... Training loss: 1.4459... 1.6983 sec/batch
Epoch: 18/20... Training Step: 3509... Training loss: 1.4285... 1.7333 sec/batch
Epoch: 18/20... Training Step: 3510... Training loss: 1.4585... 1.7038 sec/batch
Epoch: 18/20... Training Step: 3511... Training loss: 1.4207... 1.7106 sec/batch
Epoch: 18/20... Training Step: 3512... Training loss: 1.4539... 1.6831 sec/batch
Epoch: 18/20... Training Step: 3513... Training loss: 1.4435... 1.6902 sec/batch
Epoch: 18/20... Training Step: 3514... Training loss: 1.4798... 1.7108 sec/batch
Epoch: 18/20... Training Step: 3515... Training loss: 1.4486... 1.7168 sec/batch
Epoch: 18/20... Training Step: 3516... Training loss: 1.4352... 1.7280 sec/batch
Epoch: 18/20... Training Step: 3517... Training loss: 1.4049... 1.7319 sec/batch
Epoch: 18/20... Training Step: 3518... Training loss: 1.4243... 1.6995 sec/batch
Epoch: 18/20... Training Step: 3519... Training loss: 1.4518... 1.7061 sec/batch
Epoch: 18/20... Training Step: 3520... Training loss: 1.4382... 1.7003 sec/batch
Epoch: 18/20... Training Step: 3521... Training loss: 1.4277... 1.6828 sec/batch
Epoch: 18/20... Training Step: 3522... Training loss: 1.4336... 1.6810 sec/batch
Epoch: 18/20... Training Step: 3523... Training loss: 1.4376... 1.6933 sec/batch
Epoch: 18/20... Training Step: 3524... Training loss: 1.4337... 1.6991 sec/batch
Epoch: 18/20... Training Step: 3525... Training loss: 1.3894... 1.7100 sec/batch
Epoch: 18/20... Training Step: 3526... Training loss: 1.4647... 1.6935 sec/batch
Epoch: 18/20... Training Step: 3527... Training loss: 1.4606... 1.6829 sec/batch
Epoch: 18/20... Training Step: 3528... Training loss: 1.4427... 1.7296 sec/batch
Epoch: 18/20... Training Step: 3529... Training loss: 1.4515... 1.7022 sec/batch
Epoch: 18/20... Training Step: 3530... Training loss: 1.4324... 2.0286 sec/batch
Epoch: 18/20... Training Step: 3531... Training loss: 1.4368... 2.0046 sec/batch
Epoch: 18/20... Training Step: 3532... Training loss: 1.4465... 1.8861 sec/batch
Epoch: 18/20... Training Step: 3533... Training loss: 1.4609... 1.7401 sec/batch
Epoch: 18/20... Training Step: 3534... Training loss: 1.5203... 1.7269 sec/batch
Epoch: 18/20... Training Step: 3535... Training loss: 1.4492... 1.6693 sec/batch
Epoch: 18/20... Training Step: 3536... Training loss: 1.4300... 1.7314 sec/batch
Epoch: 18/20... Training Step: 3537... Training loss: 1.4301... 1.7211 sec/batch
Epoch: 18/20... Training Step: 3538... Training loss: 1.4255... 1.6886 sec/batch
Epoch: 18/20... Training Step: 3539... Training loss: 1.4634... 1.6928 sec/batch
Epoch: 18/20... Training Step: 3540... Training loss: 1.4378... 1.6775 sec/batch
Epoch: 18/20... Training Step: 3541... Training loss: 1.4515... 1.6933 sec/batch
Epoch: 18/20... Training Step: 3542... Training loss: 1.4214... 1.7309 sec/batch
Epoch: 18/20... Training Step: 3543... Training loss: 1.4298... 1.7060 sec/batch
Epoch: 18/20... Training Step: 3544... Training loss: 1.4653... 1.6913 sec/batch
Epoch: 18/20... Training Step: 3545... Training loss: 1.4178... 1.6888 sec/batch
Epoch: 18/20... Training Step: 3546... Training loss: 1.4043... 1.6913 sec/batch
Epoch: 18/20... Training Step: 3547... Training loss: 1.4225... 1.6860 sec/batch
Epoch: 18/20... Training Step: 3548... Training loss: 1.4404... 1.6881 sec/batch
Epoch: 18/20... Training Step: 3549... Training loss: 1.4316... 1.7033 sec/batch
Epoch: 18/20... Training Step: 3550... Training loss: 1.4268... 1.7080 sec/batch
Epoch: 18/20... Training Step: 3551... Training loss: 1.4282... 1.7371 sec/batch
Epoch: 18/20... Training Step: 3552... Training loss: 1.4253... 1.7307 sec/batch
Epoch: 18/20... Training Step: 3553... Training loss: 1.4572... 1.7438 sec/batch
Epoch: 18/20... Training Step: 3554... Training loss: 1.4195... 1.7185 sec/batch
Epoch: 18/20... Training Step: 3555... Training loss: 1.4235... 1.6947 sec/batch
Epoch: 18/20... Training Step: 3556... Training loss: 1.4165... 1.6876 sec/batch
Epoch: 18/20... Training Step: 3557... Training loss: 1.4139... 1.7283 sec/batch
Epoch: 18/20... Training Step: 3558... Training loss: 1.4033... 1.6988 sec/batch
Epoch: 18/20... Training Step: 3559... Training loss: 1.4323... 1.6807 sec/batch
Epoch: 18/20... Training Step: 3560... Training loss: 1.4139... 1.7045 sec/batch
Epoch: 18/20... Training Step: 3561... Training loss: 1.4026... 1.6980 sec/batch
Epoch: 18/20... Training Step: 3562... Training loss: 1.4433... 1.7386 sec/batch
Epoch: 18/20... Training Step: 3563... Training loss: 1.4187... 1.6864 sec/batch
Epoch: 18/20... Training Step: 3564... Training loss: 1.4174... 1.6839 sec/batch
Epoch: 19/20... Training Step: 3565... Training loss: 1.4903... 1.6873 sec/batch
Epoch: 19/20... Training Step: 3566... Training loss: 1.4375... 1.6894 sec/batch
Epoch: 19/20... Training Step: 3567... Training loss: 1.4244... 1.7162 sec/batch
Epoch: 19/20... Training Step: 3568... Training loss: 1.4457... 1.7350 sec/batch
Epoch: 19/20... Training Step: 3569... Training loss: 1.4266... 2.1263 sec/batch
Epoch: 19/20... Training Step: 3570... Training loss: 1.3939... 2.0341 sec/batch
Epoch: 19/20... Training Step: 3571... Training loss: 1.4418... 1.9797 sec/batch
Epoch: 19/20... Training Step: 3572... Training loss: 1.4436... 1.8242 sec/batch
Epoch: 19/20... Training Step: 3573... Training loss: 1.4416... 1.8360 sec/batch
Epoch: 19/20... Training Step: 3574... Training loss: 1.4283... 1.7096 sec/batch
Epoch: 19/20... Training Step: 3575... Training loss: 1.4153... 1.7085 sec/batch
Epoch: 19/20... Training Step: 3576... Training loss: 1.4311... 1.6875 sec/batch
Epoch: 19/20... Training Step: 3577... Training loss: 1.4271... 1.7406 sec/batch
Epoch: 19/20... Training Step: 3578... Training loss: 1.4507... 1.6880 sec/batch
Epoch: 19/20... Training Step: 3579... Training loss: 1.4227... 1.6983 sec/batch
Epoch: 19/20... Training Step: 3580... Training loss: 1.4112... 1.6922 sec/batch
Epoch: 19/20... Training Step: 3581... Training loss: 1.4443... 1.7039 sec/batch
Epoch: 19/20... Training Step: 3582... Training loss: 1.4449... 1.7295 sec/batch
Epoch: 19/20... Training Step: 3583... Training loss: 1.4361... 1.7141 sec/batch
Epoch: 19/20... Training Step: 3584... Training loss: 1.4600... 1.6938 sec/batch
Epoch: 19/20... Training Step: 3585... Training loss: 1.4227... 1.6879 sec/batch
Epoch: 19/20... Training Step: 3586... Training loss: 1.4489... 1.7307 sec/batch
Epoch: 19/20... Training Step: 3587... Training loss: 1.4261... 1.6862 sec/batch
Epoch: 19/20... Training Step: 3588... Training loss: 1.4441... 1.6754 sec/batch
Epoch: 19/20... Training Step: 3589... Training loss: 1.4368... 1.6861 sec/batch
Epoch: 19/20... Training Step: 3590... Training loss: 1.3984... 1.7051 sec/batch
Epoch: 19/20... Training Step: 3591... Training loss: 1.4041... 1.6835 sec/batch
Epoch: 19/20... Training Step: 3592... Training loss: 1.4452... 1.6994 sec/batch
Epoch: 19/20... Training Step: 3593... Training loss: 1.4454... 1.6853 sec/batch
Epoch: 19/20... Training Step: 3594... Training loss: 1.4477... 1.7228 sec/batch
Epoch: 19/20... Training Step: 3595... Training loss: 1.4305... 1.7370 sec/batch
Epoch: 19/20... Training Step: 3596... Training loss: 1.4101... 1.6942 sec/batch
Epoch: 19/20... Training Step: 3597... Training loss: 1.4504... 1.7061 sec/batch
Epoch: 19/20... Training Step: 3598... Training loss: 1.4413... 1.7083 sec/batch
Epoch: 19/20... Training Step: 3599... Training loss: 1.4365... 1.6921 sec/batch
Epoch: 19/20... Training Step: 3600... Training loss: 1.4377... 1.7037 sec/batch
Epoch: 19/20... Training Step: 3601... Training loss: 1.4089... 1.6613 sec/batch
Epoch: 19/20... Training Step: 3602... Training loss: 1.4021... 1.7270 sec/batch
Epoch: 19/20... Training Step: 3603... Training loss: 1.3925... 1.7291 sec/batch
Epoch: 19/20... Training Step: 3604... Training loss: 1.4020... 1.7176 sec/batch
Epoch: 19/20... Training Step: 3605... Training loss: 1.4048... 1.7177 sec/batch
Epoch: 19/20... Training Step: 3606... Training loss: 1.4677... 1.6855 sec/batch
Epoch: 19/20... Training Step: 3607... Training loss: 1.4206... 1.6667 sec/batch
Epoch: 19/20... Training Step: 3608... Training loss: 1.4041... 1.6857 sec/batch
Epoch: 19/20... Training Step: 3609... Training loss: 1.4411... 1.6915 sec/batch
Epoch: 19/20... Training Step: 3610... Training loss: 1.3989... 1.6895 sec/batch
Epoch: 19/20... Training Step: 3611... Training loss: 1.4223... 1.6907 sec/batch
Epoch: 19/20... Training Step: 3612... Training loss: 1.4135... 1.7875 sec/batch
Epoch: 19/20... Training Step: 3613... Training loss: 1.4141... 1.8975 sec/batch
Epoch: 19/20... Training Step: 3614... Training loss: 1.4437... 1.8699 sec/batch
Epoch: 19/20... Training Step: 3615... Training loss: 1.3993... 1.8690 sec/batch
Epoch: 19/20... Training Step: 3616... Training loss: 1.4615... 1.8371 sec/batch
Epoch: 19/20... Training Step: 3617... Training loss: 1.4304... 1.7336 sec/batch
Epoch: 19/20... Training Step: 3618... Training loss: 1.4400... 1.7061 sec/batch
Epoch: 19/20... Training Step: 3619... Training loss: 1.4202... 1.6944 sec/batch
Epoch: 19/20... Training Step: 3620... Training loss: 1.4288... 1.7232 sec/batch
Epoch: 19/20... Training Step: 3621... Training loss: 1.4414... 1.8145 sec/batch
Epoch: 19/20... Training Step: 3622... Training loss: 1.4271... 1.8390 sec/batch
Epoch: 19/20... Training Step: 3623... Training loss: 1.4003... 1.6838 sec/batch
Epoch: 19/20... Training Step: 3624... Training loss: 1.4654... 2.3127 sec/batch
Epoch: 19/20... Training Step: 3625... Training loss: 1.4400... 2.4195 sec/batch
Epoch: 19/20... Training Step: 3626... Training loss: 1.4716... 2.2824 sec/batch
Epoch: 19/20... Training Step: 3627... Training loss: 1.4567... 1.8482 sec/batch
Epoch: 19/20... Training Step: 3628... Training loss: 1.4266... 1.9827 sec/batch
Epoch: 19/20... Training Step: 3629... Training loss: 1.4378... 2.1221 sec/batch
Epoch: 19/20... Training Step: 3630... Training loss: 1.4385... 2.2774 sec/batch
Epoch: 19/20... Training Step: 3631... Training loss: 1.4592... 2.4534 sec/batch
Epoch: 19/20... Training Step: 3632... Training loss: 1.4137... 2.1949 sec/batch
Epoch: 19/20... Training Step: 3633... Training loss: 1.4312... 2.0249 sec/batch
Epoch: 19/20... Training Step: 3634... Training loss: 1.4202... 2.2710 sec/batch
Epoch: 19/20... Training Step: 3635... Training loss: 1.4688... 1.7198 sec/batch
Epoch: 19/20... Training Step: 3636... Training loss: 1.4526... 1.8556 sec/batch
Epoch: 19/20... Training Step: 3637... Training loss: 1.4658... 1.5934 sec/batch
Epoch: 19/20... Training Step: 3638... Training loss: 1.4043... 1.5833 sec/batch
Epoch: 19/20... Training Step: 3639... Training loss: 1.4389... 1.5953 sec/batch
Epoch: 19/20... Training Step: 3640... Training loss: 1.4519... 1.6859 sec/batch
Epoch: 19/20... Training Step: 3641... Training loss: 1.4228... 1.8064 sec/batch
Epoch: 19/20... Training Step: 3642... Training loss: 1.4238... 1.8427 sec/batch
Epoch: 19/20... Training Step: 3643... Training loss: 1.3888... 2.6954 sec/batch
Epoch: 19/20... Training Step: 3644... Training loss: 1.4351... 2.5156 sec/batch
Epoch: 19/20... Training Step: 3645... Training loss: 1.3839... 2.7632 sec/batch
Epoch: 19/20... Training Step: 3646... Training loss: 1.4314... 2.2026 sec/batch
Epoch: 19/20... Training Step: 3647... Training loss: 1.4097... 2.3030 sec/batch
Epoch: 19/20... Training Step: 3648... Training loss: 1.4172... 2.3008 sec/batch
Epoch: 19/20... Training Step: 3649... Training loss: 1.3997... 2.2845 sec/batch
Epoch: 19/20... Training Step: 3650... Training loss: 1.4179... 2.8322 sec/batch
Epoch: 19/20... Training Step: 3651... Training loss: 1.3966... 3.5388 sec/batch
Epoch: 19/20... Training Step: 3652... Training loss: 1.4091... 3.5224 sec/batch
Epoch: 19/20... Training Step: 3653... Training loss: 1.3946... 3.3967 sec/batch
Epoch: 19/20... Training Step: 3654... Training loss: 1.4405... 3.3267 sec/batch
Epoch: 19/20... Training Step: 3655... Training loss: 1.3949... 3.2183 sec/batch
Epoch: 19/20... Training Step: 3656... Training loss: 1.4204... 2.7077 sec/batch
Epoch: 19/20... Training Step: 3657... Training loss: 1.4011... 2.8421 sec/batch
Epoch: 19/20... Training Step: 3658... Training loss: 1.4084... 2.5675 sec/batch
Epoch: 19/20... Training Step: 3659... Training loss: 1.4058... 2.4624 sec/batch
Epoch: 19/20... Training Step: 3660... Training loss: 1.4431... 2.7109 sec/batch
Epoch: 19/20... Training Step: 3661... Training loss: 1.4348... 1.9110 sec/batch
Epoch: 19/20... Training Step: 3662... Training loss: 1.3944... 2.3879 sec/batch
Epoch: 19/20... Training Step: 3663... Training loss: 1.4141... 2.5810 sec/batch
Epoch: 19/20... Training Step: 3664... Training loss: 1.3905... 2.6500 sec/batch
Epoch: 19/20... Training Step: 3665... Training loss: 1.4174... 2.6237 sec/batch
Epoch: 19/20... Training Step: 3666... Training loss: 1.4163... 2.2601 sec/batch
Epoch: 19/20... Training Step: 3667... Training loss: 1.4265... 2.1797 sec/batch
Epoch: 19/20... Training Step: 3668... Training loss: 1.4120... 1.8723 sec/batch
Epoch: 19/20... Training Step: 3669... Training loss: 1.4270... 1.7395 sec/batch
Epoch: 19/20... Training Step: 3670... Training loss: 1.4190... 1.7575 sec/batch
Epoch: 19/20... Training Step: 3671... Training loss: 1.4398... 1.7423 sec/batch
Epoch: 19/20... Training Step: 3672... Training loss: 1.4299... 1.7717 sec/batch
Epoch: 19/20... Training Step: 3673... Training loss: 1.4185... 1.8126 sec/batch
Epoch: 19/20... Training Step: 3674... Training loss: 1.4427... 1.7811 sec/batch
Epoch: 19/20... Training Step: 3675... Training loss: 1.4207... 1.7522 sec/batch
Epoch: 19/20... Training Step: 3676... Training loss: 1.4247... 1.7508 sec/batch
Epoch: 19/20... Training Step: 3677... Training loss: 1.4233... 1.7166 sec/batch
Epoch: 19/20... Training Step: 3678... Training loss: 1.4132... 1.7362 sec/batch
Epoch: 19/20... Training Step: 3679... Training loss: 1.4047... 1.7237 sec/batch
Epoch: 19/20... Training Step: 3680... Training loss: 1.3878... 1.7238 sec/batch
Epoch: 19/20... Training Step: 3681... Training loss: 1.4331... 1.7572 sec/batch
Epoch: 19/20... Training Step: 3682... Training loss: 1.4283... 1.7376 sec/batch
Epoch: 19/20... Training Step: 3683... Training loss: 1.4217... 1.7179 sec/batch
Epoch: 19/20... Training Step: 3684... Training loss: 1.4118... 1.6690 sec/batch
Epoch: 19/20... Training Step: 3685... Training loss: 1.4236... 1.7311 sec/batch
Epoch: 19/20... Training Step: 3686... Training loss: 1.3988... 2.0163 sec/batch
Epoch: 19/20... Training Step: 3687... Training loss: 1.3900... 2.2570 sec/batch
Epoch: 19/20... Training Step: 3688... Training loss: 1.4307... 2.2586 sec/batch
Epoch: 19/20... Training Step: 3689... Training loss: 1.4142... 2.2362 sec/batch
Epoch: 19/20... Training Step: 3690... Training loss: 1.3849... 2.6587 sec/batch
Epoch: 19/20... Training Step: 3691... Training loss: 1.4381... 2.1254 sec/batch
Epoch: 19/20... Training Step: 3692... Training loss: 1.4229... 1.9890 sec/batch
Epoch: 19/20... Training Step: 3693... Training loss: 1.4079... 2.0821 sec/batch
Epoch: 19/20... Training Step: 3694... Training loss: 1.3975... 3.0881 sec/batch
Epoch: 19/20... Training Step: 3695... Training loss: 1.3723... 2.2293 sec/batch
Epoch: 19/20... Training Step: 3696... Training loss: 1.4029... 1.8024 sec/batch
Epoch: 19/20... Training Step: 3697... Training loss: 1.4484... 1.7644 sec/batch
Epoch: 19/20... Training Step: 3698... Training loss: 1.4221... 1.7674 sec/batch
Epoch: 19/20... Training Step: 3699... Training loss: 1.4366... 1.7815 sec/batch
Epoch: 19/20... Training Step: 3700... Training loss: 1.4306... 1.7672 sec/batch
Epoch: 19/20... Training Step: 3701... Training loss: 1.4524... 2.9697 sec/batch
Epoch: 19/20... Training Step: 3702... Training loss: 1.4380... 2.2442 sec/batch
Epoch: 19/20... Training Step: 3703... Training loss: 1.4277... 2.1051 sec/batch
Epoch: 19/20... Training Step: 3704... Training loss: 1.4209... 2.0704 sec/batch
Epoch: 19/20... Training Step: 3705... Training loss: 1.4733... 2.0776 sec/batch
Epoch: 19/20... Training Step: 3706... Training loss: 1.4398... 2.5226 sec/batch
Epoch: 19/20... Training Step: 3707... Training loss: 1.4165... 3.5075 sec/batch
Epoch: 19/20... Training Step: 3708... Training loss: 1.4441... 2.2519 sec/batch
Epoch: 19/20... Training Step: 3709... Training loss: 1.4090... 2.9627 sec/batch
Epoch: 19/20... Training Step: 3710... Training loss: 1.4590... 2.7568 sec/batch
Epoch: 19/20... Training Step: 3711... Training loss: 1.4493... 2.7282 sec/batch
Epoch: 19/20... Training Step: 3712... Training loss: 1.4729... 3.0770 sec/batch
Epoch: 19/20... Training Step: 3713... Training loss: 1.4413... 3.1193 sec/batch
Epoch: 19/20... Training Step: 3714... Training loss: 1.4341... 2.8495 sec/batch
Epoch: 19/20... Training Step: 3715... Training loss: 1.3955... 2.1074 sec/batch
Epoch: 19/20... Training Step: 3716... Training loss: 1.4187... 1.8814 sec/batch
Epoch: 19/20... Training Step: 3717... Training loss: 1.4438... 1.8735 sec/batch
Epoch: 19/20... Training Step: 3718... Training loss: 1.4345... 1.9145 sec/batch
Epoch: 19/20... Training Step: 3719... Training loss: 1.4206... 1.9020 sec/batch
Epoch: 19/20... Training Step: 3720... Training loss: 1.4213... 2.1799 sec/batch
Epoch: 19/20... Training Step: 3721... Training loss: 1.4293... 3.7681 sec/batch
Epoch: 19/20... Training Step: 3722... Training loss: 1.4238... 4.4664 sec/batch
Epoch: 19/20... Training Step: 3723... Training loss: 1.3883... 4.6617 sec/batch
Epoch: 19/20... Training Step: 3724... Training loss: 1.4580... 3.9238 sec/batch
Epoch: 19/20... Training Step: 3725... Training loss: 1.4364... 2.3527 sec/batch
Epoch: 19/20... Training Step: 3726... Training loss: 1.4257... 2.4640 sec/batch
Epoch: 19/20... Training Step: 3727... Training loss: 1.4403... 2.3382 sec/batch
Epoch: 19/20... Training Step: 3728... Training loss: 1.4334... 3.3992 sec/batch
Epoch: 19/20... Training Step: 3729... Training loss: 1.4270... 2.8761 sec/batch
Epoch: 19/20... Training Step: 3730... Training loss: 1.4347... 2.5514 sec/batch
Epoch: 19/20... Training Step: 3731... Training loss: 1.4496... 2.5208 sec/batch
Epoch: 19/20... Training Step: 3732... Training loss: 1.4907... 2.6963 sec/batch
Epoch: 19/20... Training Step: 3733... Training loss: 1.4266... 2.6598 sec/batch
Epoch: 19/20... Training Step: 3734... Training loss: 1.4260... 3.3558 sec/batch
Epoch: 19/20... Training Step: 3735... Training loss: 1.4127... 4.5300 sec/batch
Epoch: 19/20... Training Step: 3736... Training loss: 1.4194... 2.9907 sec/batch
Epoch: 19/20... Training Step: 3737... Training loss: 1.4583... 4.0523 sec/batch
Epoch: 19/20... Training Step: 3738... Training loss: 1.4388... 2.9265 sec/batch
Epoch: 19/20... Training Step: 3739... Training loss: 1.4345... 2.8385 sec/batch
Epoch: 19/20... Training Step: 3740... Training loss: 1.3977... 2.0568 sec/batch
Epoch: 19/20... Training Step: 3741... Training loss: 1.4132... 2.3706 sec/batch
Epoch: 19/20... Training Step: 3742... Training loss: 1.4507... 3.1433 sec/batch
Epoch: 19/20... Training Step: 3743... Training loss: 1.4176... 4.0612 sec/batch
Epoch: 19/20... Training Step: 3744... Training loss: 1.4004... 2.4695 sec/batch
Epoch: 19/20... Training Step: 3745... Training loss: 1.4075... 1.9925 sec/batch
Epoch: 19/20... Training Step: 3746... Training loss: 1.4333... 2.1578 sec/batch
Epoch: 19/20... Training Step: 3747... Training loss: 1.4255... 1.8366 sec/batch
Epoch: 19/20... Training Step: 3748... Training loss: 1.4190... 1.7933 sec/batch
Epoch: 19/20... Training Step: 3749... Training loss: 1.4240... 1.7882 sec/batch
Epoch: 19/20... Training Step: 3750... Training loss: 1.4142... 1.8237 sec/batch
Epoch: 19/20... Training Step: 3751... Training loss: 1.4363... 1.7737 sec/batch
Epoch: 19/20... Training Step: 3752... Training loss: 1.4230... 1.8302 sec/batch
Epoch: 19/20... Training Step: 3753... Training loss: 1.4178... 1.7853 sec/batch
Epoch: 19/20... Training Step: 3754... Training loss: 1.4148... 1.7949 sec/batch
Epoch: 19/20... Training Step: 3755... Training loss: 1.4118... 2.6603 sec/batch
Epoch: 19/20... Training Step: 3756... Training loss: 1.4153... 2.9076 sec/batch
Epoch: 19/20... Training Step: 3757... Training loss: 1.4144... 2.9987 sec/batch
Epoch: 19/20... Training Step: 3758... Training loss: 1.4004... 2.5030 sec/batch
Epoch: 19/20... Training Step: 3759... Training loss: 1.3914... 2.4672 sec/batch
Epoch: 19/20... Training Step: 3760... Training loss: 1.4318... 3.2229 sec/batch
Epoch: 19/20... Training Step: 3761... Training loss: 1.4082... 3.2972 sec/batch
Epoch: 19/20... Training Step: 3762... Training loss: 1.4078... 3.1994 sec/batch
Epoch: 20/20... Training Step: 3763... Training loss: 1.4809... 2.1882 sec/batch
Epoch: 20/20... Training Step: 3764... Training loss: 1.4415... 2.9337 sec/batch
Epoch: 20/20... Training Step: 3765... Training loss: 1.4198... 2.5686 sec/batch
Epoch: 20/20... Training Step: 3766... Training loss: 1.4354... 2.2115 sec/batch
Epoch: 20/20... Training Step: 3767... Training loss: 1.4073... 1.9665 sec/batch
Epoch: 20/20... Training Step: 3768... Training loss: 1.3984... 1.9308 sec/batch
Epoch: 20/20... Training Step: 3769... Training loss: 1.4310... 1.9936 sec/batch
Epoch: 20/20... Training Step: 3770... Training loss: 1.4289... 2.2921 sec/batch
Epoch: 20/20... Training Step: 3771... Training loss: 1.4335... 2.0211 sec/batch
Epoch: 20/20... Training Step: 3772... Training loss: 1.4191... 2.0328 sec/batch
Epoch: 20/20... Training Step: 3773... Training loss: 1.4106... 2.0023 sec/batch
Epoch: 20/20... Training Step: 3774... Training loss: 1.4233... 1.9900 sec/batch
Epoch: 20/20... Training Step: 3775... Training loss: 1.4210... 1.9605 sec/batch
Epoch: 20/20... Training Step: 3776... Training loss: 1.4421... 1.9672 sec/batch
Epoch: 20/20... Training Step: 3777... Training loss: 1.4106... 1.8842 sec/batch
Epoch: 20/20... Training Step: 3778... Training loss: 1.4036... 1.9425 sec/batch
Epoch: 20/20... Training Step: 3779... Training loss: 1.4280... 1.9492 sec/batch
Epoch: 20/20... Training Step: 3780... Training loss: 1.4469... 1.9856 sec/batch
Epoch: 20/20... Training Step: 3781... Training loss: 1.4296... 1.9731 sec/batch
Epoch: 20/20... Training Step: 3782... Training loss: 1.4588... 1.9576 sec/batch
Epoch: 20/20... Training Step: 3783... Training loss: 1.4108... 1.9241 sec/batch
Epoch: 20/20... Training Step: 3784... Training loss: 1.4476... 1.8906 sec/batch
Epoch: 20/20... Training Step: 3785... Training loss: 1.4179... 1.9067 sec/batch
Epoch: 20/20... Training Step: 3786... Training loss: 1.4444... 1.8876 sec/batch
Epoch: 20/20... Training Step: 3787... Training loss: 1.4355... 1.9746 sec/batch
Epoch: 20/20... Training Step: 3788... Training loss: 1.3939... 2.0698 sec/batch
Epoch: 20/20... Training Step: 3789... Training loss: 1.4050... 1.8889 sec/batch
Epoch: 20/20... Training Step: 3790... Training loss: 1.4623... 1.9509 sec/batch
Epoch: 20/20... Training Step: 3791... Training loss: 1.4398... 1.8836 sec/batch
Epoch: 20/20... Training Step: 3792... Training loss: 1.4529... 1.7742 sec/batch
Epoch: 20/20... Training Step: 3793... Training loss: 1.4254... 1.7778 sec/batch
Epoch: 20/20... Training Step: 3794... Training loss: 1.4059... 1.7862 sec/batch
Epoch: 20/20... Training Step: 3795... Training loss: 1.4446... 1.7764 sec/batch
Epoch: 20/20... Training Step: 3796... Training loss: 1.4386... 1.7834 sec/batch
Epoch: 20/20... Training Step: 3797... Training loss: 1.4236... 1.7622 sec/batch
Epoch: 20/20... Training Step: 3798... Training loss: 1.4314... 1.7800 sec/batch
Epoch: 20/20... Training Step: 3799... Training loss: 1.3979... 1.7681 sec/batch
Epoch: 20/20... Training Step: 3800... Training loss: 1.3857... 1.7471 sec/batch
Epoch: 20/20... Training Step: 3801... Training loss: 1.3717... 1.7116 sec/batch
Epoch: 20/20... Training Step: 3802... Training loss: 1.4023... 1.7888 sec/batch
Epoch: 20/20... Training Step: 3803... Training loss: 1.4035... 1.8138 sec/batch
Epoch: 20/20... Training Step: 3804... Training loss: 1.4569... 1.7641 sec/batch
Epoch: 20/20... Training Step: 3805... Training loss: 1.4129... 1.7444 sec/batch
Epoch: 20/20... Training Step: 3806... Training loss: 1.3984... 1.8113 sec/batch
Epoch: 20/20... Training Step: 3807... Training loss: 1.4460... 1.7708 sec/batch
Epoch: 20/20... Training Step: 3808... Training loss: 1.3883... 1.7702 sec/batch
Epoch: 20/20... Training Step: 3809... Training loss: 1.4226... 1.7997 sec/batch
Epoch: 20/20... Training Step: 3810... Training loss: 1.4138... 1.8882 sec/batch
Epoch: 20/20... Training Step: 3811... Training loss: 1.4155... 2.0123 sec/batch
Epoch: 20/20... Training Step: 3812... Training loss: 1.4409... 1.8084 sec/batch
Epoch: 20/20... Training Step: 3813... Training loss: 1.3995... 1.7831 sec/batch
Epoch: 20/20... Training Step: 3814... Training loss: 1.4723... 1.7571 sec/batch
Epoch: 20/20... Training Step: 3815... Training loss: 1.4331... 1.7397 sec/batch
Epoch: 20/20... Training Step: 3816... Training loss: 1.4327... 1.9769 sec/batch
Epoch: 20/20... Training Step: 3817... Training loss: 1.4105... 1.9364 sec/batch
Epoch: 20/20... Training Step: 3818... Training loss: 1.4212... 1.7834 sec/batch
Epoch: 20/20... Training Step: 3819... Training loss: 1.4306... 1.8745 sec/batch
Epoch: 20/20... Training Step: 3820... Training loss: 1.4020... 1.7766 sec/batch
Epoch: 20/20... Training Step: 3821... Training loss: 1.3919... 1.7503 sec/batch
Epoch: 20/20... Training Step: 3822... Training loss: 1.4530... 1.7875 sec/batch
Epoch: 20/20... Training Step: 3823... Training loss: 1.4296... 1.7412 sec/batch
Epoch: 20/20... Training Step: 3824... Training loss: 1.4619... 1.7514 sec/batch
Epoch: 20/20... Training Step: 3825... Training loss: 1.4504... 1.7594 sec/batch
Epoch: 20/20... Training Step: 3826... Training loss: 1.4309... 1.7455 sec/batch
Epoch: 20/20... Training Step: 3827... Training loss: 1.4193... 1.7924 sec/batch
Epoch: 20/20... Training Step: 3828... Training loss: 1.4339... 1.7573 sec/batch
Epoch: 20/20... Training Step: 3829... Training loss: 1.4437... 1.7483 sec/batch
Epoch: 20/20... Training Step: 3830... Training loss: 1.4001... 1.7703 sec/batch
Epoch: 20/20... Training Step: 3831... Training loss: 1.4261... 1.7823 sec/batch
Epoch: 20/20... Training Step: 3832... Training loss: 1.4077... 1.7498 sec/batch
Epoch: 20/20... Training Step: 3833... Training loss: 1.4673... 1.7540 sec/batch
Epoch: 20/20... Training Step: 3834... Training loss: 1.4424... 1.7999 sec/batch
Epoch: 20/20... Training Step: 3835... Training loss: 1.4375... 1.7833 sec/batch
Epoch: 20/20... Training Step: 3836... Training loss: 1.4035... 1.7914 sec/batch
Epoch: 20/20... Training Step: 3837... Training loss: 1.4183... 1.8047 sec/batch
Epoch: 20/20... Training Step: 3838... Training loss: 1.4439... 1.8299 sec/batch
Epoch: 20/20... Training Step: 3839... Training loss: 1.4225... 1.7827 sec/batch
Epoch: 20/20... Training Step: 3840... Training loss: 1.4120... 1.7330 sec/batch
Epoch: 20/20... Training Step: 3841... Training loss: 1.3768... 1.7582 sec/batch
Epoch: 20/20... Training Step: 3842... Training loss: 1.4239... 1.8160 sec/batch
Epoch: 20/20... Training Step: 3843... Training loss: 1.3892... 1.7924 sec/batch
Epoch: 20/20... Training Step: 3844... Training loss: 1.4137... 1.7612 sec/batch
Epoch: 20/20... Training Step: 3845... Training loss: 1.3971... 1.7824 sec/batch
Epoch: 20/20... Training Step: 3846... Training loss: 1.4105... 1.7591 sec/batch
Epoch: 20/20... Training Step: 3847... Training loss: 1.4001... 1.7830 sec/batch
Epoch: 20/20... Training Step: 3848... Training loss: 1.4149... 1.7607 sec/batch
Epoch: 20/20... Training Step: 3849... Training loss: 1.3881... 1.7598 sec/batch
Epoch: 20/20... Training Step: 3850... Training loss: 1.3988... 1.7535 sec/batch
Epoch: 20/20... Training Step: 3851... Training loss: 1.3867... 1.7558 sec/batch
Epoch: 20/20... Training Step: 3852... Training loss: 1.4358... 1.8230 sec/batch
Epoch: 20/20... Training Step: 3853... Training loss: 1.3885... 1.8123 sec/batch
Epoch: 20/20... Training Step: 3854... Training loss: 1.4090... 1.7453 sec/batch
Epoch: 20/20... Training Step: 3855... Training loss: 1.3996... 1.8994 sec/batch
Epoch: 20/20... Training Step: 3856... Training loss: 1.3980... 1.7643 sec/batch
Epoch: 20/20... Training Step: 3857... Training loss: 1.3997... 1.7818 sec/batch
Epoch: 20/20... Training Step: 3858... Training loss: 1.4339... 1.7429 sec/batch
Epoch: 20/20... Training Step: 3859... Training loss: 1.4278... 1.7334 sec/batch
Epoch: 20/20... Training Step: 3860... Training loss: 1.3886... 1.8079 sec/batch
Epoch: 20/20... Training Step: 3861... Training loss: 1.3970... 1.7796 sec/batch
Epoch: 20/20... Training Step: 3862... Training loss: 1.3942... 1.7677 sec/batch
Epoch: 20/20... Training Step: 3863... Training loss: 1.4098... 1.7509 sec/batch
Epoch: 20/20... Training Step: 3864... Training loss: 1.4063... 1.7485 sec/batch
Epoch: 20/20... Training Step: 3865... Training loss: 1.4164... 1.7472 sec/batch
Epoch: 20/20... Training Step: 3866... Training loss: 1.4115... 1.7449 sec/batch
Epoch: 20/20... Training Step: 3867... Training loss: 1.4186... 1.8023 sec/batch
Epoch: 20/20... Training Step: 3868... Training loss: 1.4023... 1.7616 sec/batch
Epoch: 20/20... Training Step: 3869... Training loss: 1.4251... 1.7934 sec/batch
Epoch: 20/20... Training Step: 3870... Training loss: 1.4222... 1.8155 sec/batch
Epoch: 20/20... Training Step: 3871... Training loss: 1.4120... 1.7610 sec/batch
Epoch: 20/20... Training Step: 3872... Training loss: 1.4227... 1.7531 sec/batch
Epoch: 20/20... Training Step: 3873... Training loss: 1.4037... 1.7785 sec/batch
Epoch: 20/20... Training Step: 3874... Training loss: 1.4183... 1.7678 sec/batch
Epoch: 20/20... Training Step: 3875... Training loss: 1.4169... 1.7678 sec/batch
Epoch: 20/20... Training Step: 3876... Training loss: 1.4095... 1.7457 sec/batch
Epoch: 20/20... Training Step: 3877... Training loss: 1.3963... 1.7821 sec/batch
Epoch: 20/20... Training Step: 3878... Training loss: 1.3764... 1.7622 sec/batch
Epoch: 20/20... Training Step: 3879... Training loss: 1.4412... 1.7566 sec/batch
Epoch: 20/20... Training Step: 3880... Training loss: 1.4354... 1.7584 sec/batch
Epoch: 20/20... Training Step: 3881... Training loss: 1.4078... 1.7734 sec/batch
Epoch: 20/20... Training Step: 3882... Training loss: 1.4017... 1.7846 sec/batch
Epoch: 20/20... Training Step: 3883... Training loss: 1.4176... 1.7659 sec/batch
Epoch: 20/20... Training Step: 3884... Training loss: 1.3922... 1.7645 sec/batch
Epoch: 20/20... Training Step: 3885... Training loss: 1.3720... 1.7667 sec/batch
Epoch: 20/20... Training Step: 3886... Training loss: 1.4284... 1.7613 sec/batch
Epoch: 20/20... Training Step: 3887... Training loss: 1.4137... 1.8361 sec/batch
Epoch: 20/20... Training Step: 3888... Training loss: 1.3677... 1.7532 sec/batch
Epoch: 20/20... Training Step: 3889... Training loss: 1.4328... 1.7826 sec/batch
Epoch: 20/20... Training Step: 3890... Training loss: 1.4223... 1.7585 sec/batch
Epoch: 20/20... Training Step: 3891... Training loss: 1.4000... 1.7527 sec/batch
Epoch: 20/20... Training Step: 3892... Training loss: 1.3882... 1.7632 sec/batch
Epoch: 20/20... Training Step: 3893... Training loss: 1.3789... 1.7915 sec/batch
Epoch: 20/20... Training Step: 3894... Training loss: 1.3936... 1.7772 sec/batch
Epoch: 20/20... Training Step: 3895... Training loss: 1.4441... 1.7897 sec/batch
Epoch: 20/20... Training Step: 3896... Training loss: 1.4267... 1.7933 sec/batch
Epoch: 20/20... Training Step: 3897... Training loss: 1.4306... 1.7643 sec/batch
Epoch: 20/20... Training Step: 3898... Training loss: 1.4290... 1.7762 sec/batch
Epoch: 20/20... Training Step: 3899... Training loss: 1.4551... 1.7463 sec/batch
Epoch: 20/20... Training Step: 3900... Training loss: 1.4262... 1.7589 sec/batch
Epoch: 20/20... Training Step: 3901... Training loss: 1.4262... 1.7726 sec/batch
Epoch: 20/20... Training Step: 3902... Training loss: 1.4151... 1.7641 sec/batch
Epoch: 20/20... Training Step: 3903... Training loss: 1.4666... 1.7923 sec/batch
Epoch: 20/20... Training Step: 3904... Training loss: 1.4351... 1.8308 sec/batch
Epoch: 20/20... Training Step: 3905... Training loss: 1.4173... 1.9083 sec/batch
Epoch: 20/20... Training Step: 3906... Training loss: 1.4499... 1.8197 sec/batch
Epoch: 20/20... Training Step: 3907... Training loss: 1.4109... 1.9254 sec/batch
Epoch: 20/20... Training Step: 3908... Training loss: 1.4412... 1.7792 sec/batch
Epoch: 20/20... Training Step: 3909... Training loss: 1.4331... 1.7691 sec/batch
Epoch: 20/20... Training Step: 3910... Training loss: 1.4538... 1.7804 sec/batch
Epoch: 20/20... Training Step: 3911... Training loss: 1.4391... 1.7595 sec/batch
Epoch: 20/20... Training Step: 3912... Training loss: 1.4351... 1.7655 sec/batch
Epoch: 20/20... Training Step: 3913... Training loss: 1.3898... 1.7514 sec/batch
Epoch: 20/20... Training Step: 3914... Training loss: 1.4012... 1.7402 sec/batch
Epoch: 20/20... Training Step: 3915... Training loss: 1.4355... 1.8256 sec/batch
Epoch: 20/20... Training Step: 3916... Training loss: 1.4243... 1.7744 sec/batch
Epoch: 20/20... Training Step: 3917... Training loss: 1.4163... 1.7765 sec/batch
Epoch: 20/20... Training Step: 3918... Training loss: 1.4180... 1.7997 sec/batch
Epoch: 20/20... Training Step: 3919... Training loss: 1.4257... 1.7907 sec/batch
Epoch: 20/20... Training Step: 3920... Training loss: 1.4253... 1.8086 sec/batch
Epoch: 20/20... Training Step: 3921... Training loss: 1.3810... 1.7642 sec/batch
Epoch: 20/20... Training Step: 3922... Training loss: 1.4444... 1.7766 sec/batch
Epoch: 20/20... Training Step: 3923... Training loss: 1.4448... 1.8925 sec/batch
Epoch: 20/20... Training Step: 3924... Training loss: 1.4252... 1.7617 sec/batch
Epoch: 20/20... Training Step: 3925... Training loss: 1.4344... 1.7485 sec/batch
Epoch: 20/20... Training Step: 3926... Training loss: 1.4274... 1.7693 sec/batch
Epoch: 20/20... Training Step: 3927... Training loss: 1.4232... 1.7683 sec/batch
Epoch: 20/20... Training Step: 3928... Training loss: 1.4381... 1.7525 sec/batch
Epoch: 20/20... Training Step: 3929... Training loss: 1.4370... 1.7714 sec/batch
Epoch: 20/20... Training Step: 3930... Training loss: 1.4905... 1.7599 sec/batch
Epoch: 20/20... Training Step: 3931... Training loss: 1.4177... 1.8067 sec/batch
Epoch: 20/20... Training Step: 3932... Training loss: 1.4139... 1.7755 sec/batch
Epoch: 20/20... Training Step: 3933... Training loss: 1.4280... 1.7654 sec/batch
Epoch: 20/20... Training Step: 3934... Training loss: 1.4022... 1.7352 sec/batch
Epoch: 20/20... Training Step: 3935... Training loss: 1.4425... 1.8408 sec/batch
Epoch: 20/20... Training Step: 3936... Training loss: 1.4304... 1.8515 sec/batch
Epoch: 20/20... Training Step: 3937... Training loss: 1.4333... 1.9647 sec/batch
Epoch: 20/20... Training Step: 3938... Training loss: 1.4042... 1.7900 sec/batch
Epoch: 20/20... Training Step: 3939... Training loss: 1.4059... 1.7823 sec/batch
Epoch: 20/20... Training Step: 3940... Training loss: 1.4398... 1.8060 sec/batch
Epoch: 20/20... Training Step: 3941... Training loss: 1.4019... 1.7616 sec/batch
Epoch: 20/20... Training Step: 3942... Training loss: 1.3923... 1.7589 sec/batch
Epoch: 20/20... Training Step: 3943... Training loss: 1.3869... 1.7371 sec/batch
Epoch: 20/20... Training Step: 3944... Training loss: 1.4073... 1.7867 sec/batch
Epoch: 20/20... Training Step: 3945... Training loss: 1.4163... 1.7626 sec/batch
Epoch: 20/20... Training Step: 3946... Training loss: 1.4143... 1.7720 sec/batch
Epoch: 20/20... Training Step: 3947... Training loss: 1.4017... 1.7799 sec/batch
Epoch: 20/20... Training Step: 3948... Training loss: 1.4071... 1.7872 sec/batch
Epoch: 20/20... Training Step: 3949... Training loss: 1.4444... 1.8030 sec/batch
Epoch: 20/20... Training Step: 3950... Training loss: 1.4105... 1.7673 sec/batch
Epoch: 20/20... Training Step: 3951... Training loss: 1.4048... 1.7865 sec/batch
Epoch: 20/20... Training Step: 3952... Training loss: 1.4043... 1.7926 sec/batch
Epoch: 20/20... Training Step: 3953... Training loss: 1.3936... 1.7656 sec/batch
Epoch: 20/20... Training Step: 3954... Training loss: 1.3937... 1.8102 sec/batch
Epoch: 20/20... Training Step: 3955... Training loss: 1.4170... 1.7557 sec/batch
Epoch: 20/20... Training Step: 3956... Training loss: 1.3978... 1.7302 sec/batch
Epoch: 20/20... Training Step: 3957... Training loss: 1.3907... 1.8302 sec/batch
Epoch: 20/20... Training Step: 3958... Training loss: 1.4215... 1.7526 sec/batch
Epoch: 20/20... Training Step: 3959... Training loss: 1.4095... 1.7529 sec/batch
Epoch: 20/20... Training Step: 3960... Training loss: 1.4039... 1.7566 sec/batch
Read up on saving and loading checkpoints here: https://www.tensorflow.org/programmers_guide/variables
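As a quick refresher, the basic round trip looks like this (a minimal, self-contained sketch of the TF1 tf.train.Saver API; the variable and checkpoint path are illustrative, not part of this notebook's model):

# Minimal Saver round trip (illustrative variable and checkpoint path)
v = tf.Variable(42.0, name='v')
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    saver.save(sess, 'checkpoints/example.ckpt')     # write variables to disk
    saver.restore(sess, 'checkpoints/example.ckpt')  # load them back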
In [24]:
tf.train.get_checkpoint_state('checkpoints')
Out[24]:
model_checkpoint_path: "checkpoints/i3960_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i200_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i400_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i600_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i800_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i1000_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i1200_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i1400_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i1600_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i1800_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i2000_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i2200_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i2400_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i2600_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i2800_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i3000_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i3200_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i3400_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i3600_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i3800_l256.ckpt"
all_model_checkpoint_paths: "checkpoints/i3960_l256.ckpt"
Now that the network is trained, we can use it to generate new text. The idea is that we pass in a character and the network predicts the next one. We then feed that prediction back in as the next input, and keep going to generate brand new text. I also included some functionality to prime the network by passing in a string and building up a state from it first.
The network gives us predictions for each character. To reduce noise and make things a little less random, I'm going to choose the new character from only the top N most likely characters.
In [25]:
def pick_top_n(preds, vocab_size, top_n=5):
    p = np.squeeze(preds)
    # Zero out everything except the top_n most likely characters
    p[np.argsort(p)[:-top_n]] = 0
    # Renormalize so the remaining probabilities sum to 1
    p = p / np.sum(p)
    # Sample one character index from the reduced distribution
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c
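As a quick sanity check, here's a toy example (the probabilities are made up): with top_n=3, only the three most likely indices, 1, 2, and 4 here, should ever come back.
In [ ]:
fake_preds = np.array([[0.05, 0.4, 0.3, 0.05, 0.2]])
# Pass a copy each time: np.squeeze returns a view, and pick_top_n
# zeroes out entries of that view in place.
print([pick_top_n(fake_preds.copy(), 5, top_n=3) for _ in range(10)])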
In [26]:
def sample(checkpoint, n_samples, lstm_size, vocab_size, prime="The "):
    samples = [c for c in prime]
    # Build the network in sampling mode (batch size 1, one step at a time)
    model = CharRNN(vocab_size, lstm_size=lstm_size, sampling=True)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)
        # Run the prime text through the network to build up the hidden state
        for c in prime:
            x = np.zeros((1, 1))
            x[0, 0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state],
                                        feed_dict=feed)

        c = pick_top_n(preds, vocab_size)
        samples.append(int_to_vocab[c])

        # Feed each predicted character back in to get the one after it
        for i in range(n_samples):
            x[0, 0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state],
                                        feed_dict=feed)

            c = pick_top_n(preds, vocab_size)
            samples.append(int_to_vocab[c])

    return ''.join(samples)
Here, we pass in the path to a checkpoint and sample from the network.
In [27]:
tf.train.latest_checkpoint('checkpoints')
Out[27]:
'checkpoints/i3960_l256.ckpt'
In [34]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 2000, lstm_size, len(vocab), prime="Far")
print(samp)
INFO:tensorflow:Restoring parameters from checkpoints/i3960_l256.ckpt
Farshas to be high to time of his sister's one went into the particuled,
and hostelistly that the tervation of
and to could be as in a stout women, and the solution of which she was an only should she had seeing the day, but the clints, he was a mother and
had been to see, and to be hards in the day it was a contenticle, that it had
say his
wance. He was the deacte with
his feeling of surrings the sole clear, while he had been that he had always seemed of her face of his heart. There wanted into healding at the sight of a commost and hand of the times to say his wife at
her words, but that he would say, said as he could not spire that the place who, all at a sound
and he was to be stone of the carriage of them.
"You can't have an incloudly. I can't
see you would be as to till your face of my step, I'll can she was took in a minuses they think with his husband, but anythy terribly it," he said, and with an
impression out of the sting too. "I share to comprehend you. He's because they were son's?"
"You're a letter to the princess. I am transable in the distrible.
He see," said Anna, and he was not to be come at
the stedstaid.
"They see it to herself with my chief sham when the shame."
Af he was an expression of
her head with a
look, what had always sat some sones and to a drank of the same
and sorts, the distest works of her carriage," he added, and thought a came in the drinks was that was anyone and he said settled, and had been sone and
had taken herself to the considerate words that they had
been to could not see him, as he said: Stepan Arkadyevitch come to that the children, would be setting the same as they are an hart over that as it in her friend and her showers to tell the care to
have the
poorming of words, they always said he went at them. The deceiving she said, went up of the tree, she could not say, how had an already and sat down and a member and to his family, and was need at him. The children, his hand and a saying, and
and went
of and
heart he was th
To see how the samples improve with training, we can also sample from some earlier checkpoints. First, after only 200 training steps:
In [30]:
checkpoint = 'checkpoints/i200_l256.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)
INFO:tensorflow:Restoring parameters from checkpoints/i200_l256.ckpt
Farer on witt, the to wering the ser the the sis he hit has and so seer had sor the teedens hare alind he ald and wamith on the han so tee toull weunt whe that and to sat hat his the and wang to chite the
san hor tine she that ond hit
shes sith whot of of
and the ar the seed the terinsed, hat and, ang hor, af to te sos the she se at hans
sind o mere soudsering at the wan wan was alle as the athe thetered on hor ther ane wers ond the shr widit his he wersith an wer ter thot that ande sat hat and so wid sont ase soung, and wing what and his he torle sith wer and whes shis ans, wos har ale wam son the sortithin hot he wong wor hond wam has of the ang the was to wat he were as that thas the ald ans thitt and tot af atert the seante and that her
sous as had his thame whos
wis he ther sis to cortirit he
wat hane he wes tish an andens of on hos thin sase and
hers onde had an he te a thet tane to hat his of hared the wis her timile, seet he wer of thim,, wis ond, and has timis soud to sit he hin t
In [32]:
checkpoint = 'checkpoints/i600_l256.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)
INFO:tensorflow:Restoring parameters from checkpoints/i600_l256.ckpt
Farer wasted, and soor of stised, and wat tole a mating and her she
what the taint which answer and had she concencow was
thought thinked him,
ablooking the carting of her. She seeting of the stiress of it.
And
stor stomened," seid the hingry.
Chisters, she
said, and his sain to the hand of the carrion as tertal of the
couthel of the wand that the telled to and he was bround it.... I dour to the sore her sometions to the cantant the said all the peetings all the stord the come, the sers to his her her at the sole, work to that she had sears
wealds thit wis so the chanding the telling her head at she his was the
preasing and
havand of sond that when so alone.
"Yes, whether said of hard of it
stores the selled his
went
of the
wassing that
him there stent
on the sear had soonen hard the sather it what he say the carrity this were the seening of and at him a dacition the with the
hushing a thind. He sendicted of he had that the was the the stour, and and the pristant, and the pint op to h
In [35]:
checkpoint = 'checkpoints/i1200_l256.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Far")
print(samp)
INFO:tensorflow:Restoring parameters from checkpoints/i1200_l256.ckpt
Faring the seamed.
"I as he's
not it
of thome and why he could not have stell a way that him to have her at the
mompere in
the morried the showed in the thishiss. There to
the
place to her way that though he
wanted.
He did not said to her,
that he had break of she would not be a caming the thought, and who spoke a thout of so were all her
all how sone him out to at
the poseress,
and a starring on
her how the sole and tranded. She heard along in the warding the promartions was so this husperially his steping is, and that the seepes or the convirted as and, the
more....
The sands and his shele worse at the sant and the
mat the concead in a teater had been the candle talked of
her sead. Stepan Arkadyevitch should this she doed it was that to the prosain of the countell to the more of which
all the to sort that the carried, had all his has to the caming. Stepan Arkadyevitch, he was a man with her was seet, and the shoold,
and any seemated that the sere would be stard, and they had but
alway