In this notebook, I'll build a character-wise RNN trained on Anna Karenina, one of my all-time favorite books. It'll be able to generate new text based on the text from the book.
This network is based on Andrej Karpathy's post on RNNs and his implementation in Torch. I also used some information from r2rt and from Sherjil Ozair on GitHub. Below is the general architecture of the character-wise RNN.
In [1]:
import time
from collections import namedtuple
import numpy as np
import tensorflow as tf
First we'll load the text file and convert it into integers for our network to use. Here I'm creating a couple dictionaries to convert the characters to and from integers. Encoding the characters as integers makes it easier to use as input in the network.
In [2]:
with open('chalo.txt', 'r') as f:
    text = f.read()
vocab = set(text)
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))
chars = np.array([vocab_to_int[c] for c in text], dtype=np.int32)
Let's check out the first 100 characters and make sure everything is peachy. According to the American Book Review, this is the 6th best first line of a book ever.
In [3]:
text[:100]
Out[3]:
'Hoy ha muerto mamá. O quizá ayer. No lo sé. Recibí un telegrama del asilo: «Falleció su madre. Entie'
And we can see the characters encoded as integers.
In [4]:
chars[:100]
Out[4]:
array([ 1, 46, 61, 32, 74, 5, 32, 57, 23, 52, 48, 34, 46, 32, 57, 5, 57,
63, 40, 32, 60, 32, 75, 23, 9, 19, 63, 32, 5, 61, 52, 48, 40, 32,
76, 46, 32, 29, 46, 32, 62, 17, 40, 32, 51, 52, 67, 9, 49, 45, 32,
23, 69, 32, 34, 52, 29, 52, 43, 48, 5, 57, 5, 32, 41, 52, 29, 32,
5, 62, 9, 29, 46, 58, 32, 64, 7, 5, 29, 29, 52, 67, 9, 70, 32,
62, 23, 32, 57, 5, 41, 48, 52, 40, 32, 26, 69, 34, 9, 52], dtype=int32)
Now I need to split up the data into batches, and into training and validation sets. I should be making a test set here, but I'm not going to worry about that. My test will be if the network can generate new text.
Here I'll make both input and target arrays. The targets are the same as the inputs, except shifted one character over. I'll also drop the last bit of data so that I'll only have completely full batches.
The idea here is to make a 2D matrix where the number of rows is equal to the batch size. Each row will be one long concatenated string from the character data. We'll split this data into a training set and validation set using the split_frac keyword. This will keep 90% of the batches in the training set, the other 10% in the validation set.
In [5]:
def split_data(chars, batch_size, num_steps, split_frac=0.9):
    """
    Split character data into training and validation sets, inputs and targets for each set.

    Arguments
    ---------
    chars: character array
    batch_size: Number of sequences in each batch
    num_steps: Number of sequence steps to keep in the input and pass to the network
    split_frac: Fraction of batches to keep in the training set

    Returns train_x, train_y, val_x, val_y
    """
    slice_size = batch_size * num_steps
    n_batches = int(len(chars) / slice_size)

    # Drop the last few characters to make only full batches
    x = chars[: n_batches*slice_size]
    y = chars[1: n_batches*slice_size + 1]

    # Split the data into batch_size slices, then stack them into a 2D matrix
    x = np.stack(np.split(x, batch_size))
    y = np.stack(np.split(y, batch_size))

    # Now x and y are arrays with dimensions batch_size x n_batches*num_steps

    # Split into training and validation sets, keep the first split_frac batches for training
    split_idx = int(n_batches*split_frac)
    train_x, train_y = x[:, :split_idx*num_steps], y[:, :split_idx*num_steps]
    val_x, val_y = x[:, split_idx*num_steps:], y[:, split_idx*num_steps:]

    return train_x, train_y, val_x, val_y
Now I'll make my data sets and we can check out what's going on here. Here I'm going to use a batch size of 10 and 50 sequence steps.
In [6]:
train_x, train_y, val_x, val_y = split_data(chars, 10, 50)
In [7]:
train_x.shape
Out[7]:
(10, 14700)
Looking at the size of this array, we see that we have rows equal to the batch size. When we want to get a batch out of here, we can grab a subset of this array that contains all the rows but has a width equal to the number of steps in the sequence. The first batch looks like this:
In [8]:
train_x[:,:50]
Out[8]:
array([[ 1, 46, 61, 32, 74, 5, 32, 57, 23, 52, 48, 34, 46, 32, 57, 5, 57,
63, 40, 32, 60, 32, 75, 23, 9, 19, 63, 32, 5, 61, 52, 48, 40, 32,
76, 46, 32, 29, 46, 32, 62, 17, 40, 32, 51, 52, 67, 9, 49, 45],
[69, 48, 9, 62, 5, 32, 52, 29, 32, 48, 46, 62, 34, 48, 46, 32, 74,
23, 52, 62, 23, 41, 46, 32, 61, 32, 29, 5, 48, 43, 46, 40, 32, 0,
23, 52, 43, 46, 32, 69, 46, 62, 32, 5, 8, 5, 48, 34, 5, 57],
[48, 52, 46, 32, 75, 23, 52, 27, 62, 5, 48, 69, 5, 22, 32, 75, 23,
52, 32, 29, 52, 32, 74, 5, 67, 52, 32, 8, 52, 48, 41, 52, 48, 32,
67, 5, 62, 9, 32, 34, 46, 41, 46, 32, 52, 29, 32, 8, 52, 29],
[45, 32, 57, 9, 62, 57, 46, 22, 32, 34, 48, 5, 34, 5, 49, 5, 32,
41, 52, 32, 8, 52, 48, 66, 46, 48, 5, 48, 32, 29, 5, 32, 46, 62,
67, 23, 48, 9, 41, 5, 41, 32, 41, 52, 29, 32, 8, 5, 62, 9],
[62, 34, 5, 49, 5, 32, 66, 48, 45, 5, 32, 61, 32, 57, 52, 32, 43,
23, 62, 34, 5, 49, 5, 32, 69, 5, 41, 5, 48, 40, 32, 76, 46, 62,
32, 5, 29, 52, 11, 5, 57, 46, 62, 32, 67, 46, 69, 32, 21, 5],
[32, 75, 23, 9, 52, 69, 52, 62, 32, 5, 57, 5, 49, 5, 69, 40, 32,
53, 75, 23, 45, 32, 52, 29, 32, 5, 49, 46, 43, 5, 41, 46, 32, 57,
52, 32, 9, 69, 34, 52, 48, 48, 23, 57, 8, 9, 70, 32, 61, 32],
[ 9, 52, 69, 34, 52, 57, 52, 69, 34, 52, 32, 52, 29, 32, 62, 63, 49,
5, 41, 46, 32, 8, 5, 48, 5, 27, 52, 62, 34, 48, 52, 67, 74, 5,
48, 32, 52, 29, 32, 67, 23, 52, 48, 8, 46, 32, 41, 52, 32, 21],
[41, 52, 32, 69, 23, 52, 54, 46, 32, 61, 32, 57, 52, 32, 52, 69, 67,
46, 69, 34, 48, 17, 32, 52, 69, 32, 29, 5, 32, 57, 9, 62, 57, 5,
27, 62, 5, 29, 5, 22, 32, 41, 52, 29, 5, 69, 34, 52, 32, 41],
[29, 9, 43, 52, 69, 34, 52, 40, 27, 44, 62, 34, 52, 41, 52, 62, 32,
29, 52, 32, 74, 5, 69, 32, 46, 45, 41, 46, 22, 32, 55, 69, 46, 32,
52, 62, 32, 67, 9, 52, 48, 34, 46, 47, 32, 6, 5, 49, 52, 32],
[32, 5, 32, 23, 69, 32, 67, 5, 41, 5, 29, 62, 46, 22, 32, 34, 48,
52, 8, 5, 48, 32, 8, 46, 48, 32, 52, 62, 67, 5, 29, 46, 69, 52,
62, 40, 32, 68, 48, 52, 46, 32, 75, 23, 52, 32, 66, 23, 52, 32]], dtype=int32)
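As a quick sanity check, we can decode the first row of the inputs and targets back to characters with int_to_vocab. The targets should read as the same text, shifted one character to the left:

# Decode the first 50 integers of the first training sequence back to text
print(''.join(int_to_vocab[i] for i in train_x[0, :50]))
# The targets should be the same text shifted one character to the left
print(''.join(int_to_vocab[i] for i in train_y[0, :50]))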
I'll write another function to grab batches out of the arrays made by split_data. Here each batch will be a sliding window on these arrays with size batch_size X num_steps. For example, if we want our network to train on a sequence of 100 characters, num_steps = 100. For the next batch, we'll shift this window to the next sequence of num_steps characters. In this way we can feed batches to the network and the cell states will continue through on each batch.
In [9]:
def get_batch(arrs, num_steps):
    batch_size, slice_size = arrs[0].shape
    n_batches = int(slice_size/num_steps)
    for b in range(n_batches):
        yield [x[:, b*num_steps: (b+1)*num_steps] for x in arrs]
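To see the sliding window in action, we can pull the first two batches out of this generator and check that they are consecutive num_steps-wide slices of the arrays. This is just a quick check using the 10 x 14700 arrays created above:

# Grab the first two windows from the generator
batches = get_batch([train_x, train_y], 50)
x0, y0 = next(batches)
x1, y1 = next(batches)
print(x0.shape, y0.shape)                      # (10, 50) (10, 50)
print(np.array_equal(x0, train_x[:, :50]))     # first window of the inputs
print(np.array_equal(x1, train_x[:, 50:100]))  # the window slides over by num_steps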
In [10]:
def build_rnn(num_classes, batch_size=50, num_steps=50, lstm_size=128, num_layers=2,
              learning_rate=0.001, grad_clip=5, sampling=False):
    # When we're using this network for sampling later, we'll be passing in
    # one character at a time, so providing an option for that
    if sampling == True:
        batch_size, num_steps = 1, 1

    tf.reset_default_graph()

    # Declare placeholders we'll feed into the graph
    inputs = tf.placeholder(tf.int32, [batch_size, num_steps], name='inputs')
    targets = tf.placeholder(tf.int32, [batch_size, num_steps], name='targets')

    # Keep probability placeholder for drop out layers
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')

    # One-hot encoding the input and target characters
    x_one_hot = tf.one_hot(inputs, num_classes)
    y_one_hot = tf.one_hot(targets, num_classes)

    ### Build the RNN layers
    # Use a basic LSTM cell
    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)

    # Add dropout to the cell
    drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)

    # Stack up multiple LSTM layers, for deep learning
    cell = tf.contrib.rnn.MultiRNNCell([drop] * num_layers)
    initial_state = cell.zero_state(batch_size, tf.float32)

    ### Run the data through the RNN layers
    # This makes a list where each element is one step in the sequence
    rnn_inputs = [tf.squeeze(i, squeeze_dims=[1]) for i in tf.split(x_one_hot, num_steps, 1)]

    # Run each sequence step through the RNN and collect the outputs
    outputs, state = tf.contrib.rnn.static_rnn(cell, rnn_inputs, initial_state=initial_state)
    final_state = state

    # Reshape output so it's a bunch of rows, one output row for each step for each batch
    seq_output = tf.concat(outputs, axis=1)
    output = tf.reshape(seq_output, [-1, lstm_size])

    # Now connect the RNN outputs to a softmax layer
    with tf.variable_scope('softmax'):
        softmax_w = tf.Variable(tf.truncated_normal((lstm_size, num_classes), stddev=0.1))
        softmax_b = tf.Variable(tf.zeros(num_classes))

    # Since output is a bunch of rows of RNN cell outputs, logits will be a bunch
    # of rows of logit outputs, one for each step and batch
    logits = tf.matmul(output, softmax_w) + softmax_b

    # Use softmax to get the probabilities for predicted characters
    preds = tf.nn.softmax(logits, name='predictions')

    # Reshape the targets to match the logits
    y_reshaped = tf.reshape(y_one_hot, [-1, num_classes])
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped)
    cost = tf.reduce_mean(loss)

    # Optimizer for training, using gradient clipping to control exploding gradients
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), grad_clip)
    train_op = tf.train.AdamOptimizer(learning_rate)
    optimizer = train_op.apply_gradients(zip(grads, tvars))

    # Export the nodes
    # NOTE: I'm using a namedtuple here because I think they are cool
    export_nodes = ['inputs', 'targets', 'initial_state', 'final_state',
                    'keep_prob', 'cost', 'preds', 'optimizer']
    Graph = namedtuple('Graph', export_nodes)
    local_dict = locals()
    graph = Graph(*[local_dict[each] for each in export_nodes])

    return graph
Here I'm defining the hyperparameters for the network.
- batch_size - Number of sequences running through the network in one pass.
- num_steps - Number of characters in the sequence the network is trained on. Larger is typically better; the network will learn more long range dependencies, but it takes longer to train. 100 is typically a good number here.
- lstm_size - The number of units in the hidden layers.
- num_layers - Number of hidden LSTM layers to use.
- learning_rate - Learning rate for training.
- keep_prob - The dropout keep probability when training. If your network is overfitting, try decreasing this.

Here's some good advice from Andrej Karpathy on training the network. I'm going to write it in here for your benefit, but also link to where it originally came from.
Tips and Tricks
Monitoring Validation Loss vs. Training Loss
If you're somewhat new to Machine Learning or Neural Networks it can take a bit of expertise to get good models. The most important quantity to keep track of is the difference between your training loss (printed during training) and the validation loss (printed once in a while when the RNN is run on the validation data (by default every 1000 iterations)). In particular:
- If your training loss is much lower than validation loss then this means the network might be overfitting. Solutions to this are to decrease your network size, or to increase dropout. For example you could try dropout of 0.5 and so on.
- If your training/validation loss are about equal then your model is underfitting. Increase the size of your model (either number of layers or the raw number of neurons per layer)
Approximate number of parameters
The two most important parameters that control the model are lstm_size and num_layers. I would advise that you always use num_layers of either 2 or 3. The lstm_size can be adjusted based on how much data you have. The two important quantities to keep track of here are:
- The number of parameters in your model. This is printed when you start training.
- The size of your dataset. 1MB file is approximately 1 million characters.
These two should be about the same order of magnitude. It's a little tricky to tell. Here are some examples:
- I have a 100MB dataset and I'm using the default parameter settings (which currently print 150K parameters). My data size is significantly larger (100 mil >> 0.15 mil), so I expect to heavily underfit. I am thinking I can comfortably afford to make lstm_size larger.
- I have a 10MB dataset and I'm running a 10 million parameter model. I'm slightly nervous and I'm carefully monitoring my validation loss. If it's larger than my training loss then I may want to try to increase dropout a bit and see if that helps the validation loss.
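As a rough sanity check on that first quantity, you can estimate the parameter count by hand. This is just a back-of-the-envelope sketch assuming standard LSTM cells (four gates, each with a weight matrix of shape (input_dim + lstm_size, lstm_size) plus a bias) and the softmax layer on top; it won't match TensorFlow's exact count, but it gives the order of magnitude:

def approx_n_params(num_classes, lstm_size=512, num_layers=2):
    n = 0
    input_dim = num_classes                    # the first layer sees the one-hot input
    for _ in range(num_layers):
        # 4 gates, each with weights of shape (input_dim + lstm_size, lstm_size) and a bias
        n += 4 * (input_dim + lstm_size + 1) * lstm_size
        input_dim = lstm_size                  # deeper layers see the previous layer's output
    # softmax weights and biases
    n += lstm_size * num_classes + num_classes
    return n

print(approx_n_params(len(vocab)))             # a few million parameters for lstm_size=512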
Best models strategy
The winning strategy to obtaining very good models (if you have the compute time) is to always err on the side of making the network larger (as large as you're willing to wait for it to compute) and then try different dropout values (between 0 and 1). Whatever model has the best validation performance (the loss, written in the checkpoint filename, low is good) is the one you should use in the end.
It is very common in deep learning to run many different models with many different hyperparameter settings, and in the end take whatever checkpoint gave the best validation performance.
By the way, the size of your training and validation splits are also parameters. Make sure you have a decent amount of data in your validation set or otherwise the validation performance will be noisy and not very informative.
In [11]:
batch_size = 100
num_steps = 100
lstm_size = 512
num_layers = 2
learning_rate = 0.001
keep_prob = 0.3
Time for training, which is pretty straightforward. Here I pass in some data and get an LSTM state back. Then I pass that state back into the network so the next batch can continue the state from the previous batch. And every so often (set by save_every_n) I calculate the validation loss and save a checkpoint.
Here I'm saving checkpoints with the format
i{iteration number}_l{# hidden layer units}_v{validation loss}.ckpt
In [12]:
epochs = 300
# Save every N iterations
save_every_n = 100
train_x, train_y, val_x, val_y = split_data(chars, batch_size, num_steps)
model = build_rnn(len(vocab),
batch_size=batch_size,
num_steps=num_steps,
learning_rate=learning_rate,
lstm_size=lstm_size,
num_layers=num_layers)
saver = tf.train.Saver(max_to_keep=100)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
# Use the line below to load a checkpoint and resume training
#saver.restore(sess, 'checkpoints/______.ckpt')
n_batches = int(train_x.shape[1]/num_steps)
iterations = n_batches * epochs
for e in range(epochs):
# Train network
new_state = sess.run(model.initial_state)
loss = 0
for b, (x, y) in enumerate(get_batch([train_x, train_y], num_steps), 1):
iteration = e*n_batches + b
start = time.time()
feed = {model.inputs: x,
model.targets: y,
model.keep_prob: keep_prob,
model.initial_state: new_state}
batch_loss, new_state, _ = sess.run([model.cost, model.final_state, model.optimizer],
feed_dict=feed)
loss += batch_loss
end = time.time()
print('Epoch {}/{} '.format(e+1, epochs),
'Iteration {}/{}'.format(iteration, iterations),
'Training loss: {:.4f}'.format(loss/b),
'{:.4f} sec/batch'.format((end-start)))
if (iteration%save_every_n == 0) or (iteration == iterations):
# Check performance, notice dropout has been set to 1
val_loss = []
new_state = sess.run(model.initial_state)
for x, y in get_batch([val_x, val_y], num_steps):
feed = {model.inputs: x,
model.targets: y,
model.keep_prob: 1.,
model.initial_state: new_state}
batch_loss, new_state = sess.run([model.cost, model.final_state], feed_dict=feed)
val_loss.append(batch_loss)
print('Validation loss:', np.mean(val_loss),
'Saving checkpoint!')
saver.save(sess, "checkpoints/i{}_l{}_v{:.3f}.ckpt".format(iteration, lstm_size, np.mean(val_loss)))
Epoch 1/300 Iteration 1/4200 Training loss: 4.3580 2.8076 sec/batch
Epoch 1/300 Iteration 2/4200 Training loss: 4.3188 1.5070 sec/batch
Epoch 1/300 Iteration 3/4200 Training loss: 4.1688 1.5064 sec/batch
Epoch 1/300 Iteration 4/4200 Training loss: 4.3300 1.5074 sec/batch
Epoch 1/300 Iteration 5/4200 Training loss: 4.3594 1.5077 sec/batch
Epoch 1/300 Iteration 6/4200 Training loss: 4.3165 1.5088 sec/batch
Epoch 1/300 Iteration 7/4200 Training loss: 4.2508 1.5127 sec/batch
Epoch 1/300 Iteration 8/4200 Training loss: 4.1764 1.5142 sec/batch
Epoch 1/300 Iteration 9/4200 Training loss: 4.1068 1.5244 sec/batch
Epoch 1/300 Iteration 10/4200 Training loss: 4.0448 1.5276 sec/batch
Epoch 1/300 Iteration 11/4200 Training loss: 3.9894 1.5257 sec/batch
Epoch 1/300 Iteration 12/4200 Training loss: 3.9423 1.5294 sec/batch
Epoch 1/300 Iteration 13/4200 Training loss: 3.8991 1.5294 sec/batch
Epoch 1/300 Iteration 14/4200 Training loss: 3.8601 1.5252 sec/batch
Epoch 2/300 Iteration 15/4200 Training loss: 3.4589 1.5324 sec/batch
Epoch 2/300 Iteration 16/4200 Training loss: 3.3926 1.5360 sec/batch
Epoch 2/300 Iteration 17/4200 Training loss: 3.3657 1.5405 sec/batch
Epoch 2/300 Iteration 18/4200 Training loss: 3.3451 1.5844 sec/batch
Epoch 2/300 Iteration 19/4200 Training loss: 3.3327 1.5512 sec/batch
Epoch 2/300 Iteration 20/4200 Training loss: 3.3235 1.5681 sec/batch
Epoch 2/300 Iteration 21/4200 Training loss: 3.3156 1.6507 sec/batch
Epoch 2/300 Iteration 22/4200 Training loss: 3.3063 1.5789 sec/batch
Epoch 2/300 Iteration 23/4200 Training loss: 3.2969 1.5796 sec/batch
Epoch 2/300 Iteration 24/4200 Training loss: 3.2903 1.5555 sec/batch
Epoch 2/300 Iteration 25/4200 Training loss: 3.2824 1.5397 sec/batch
Epoch 2/300 Iteration 26/4200 Training loss: 3.2763 1.5411 sec/batch
Epoch 2/300 Iteration 27/4200 Training loss: 3.2720 1.5486 sec/batch
Epoch 2/300 Iteration 28/4200 Training loss: 3.2665 1.5414 sec/batch
Epoch 3/300 Iteration 29/4200 Training loss: 3.2789 1.5485 sec/batch
Epoch 3/300 Iteration 30/4200 Training loss: 3.2354 1.5479 sec/batch
Epoch 3/300 Iteration 31/4200 Training loss: 3.2275 1.5658 sec/batch
Epoch 3/300 Iteration 32/4200 Training loss: 3.2161 1.5469 sec/batch
Epoch 3/300 Iteration 33/4200 Training loss: 3.2102 1.5491 sec/batch
Epoch 3/300 Iteration 34/4200 Training loss: 3.2068 1.5627 sec/batch
Epoch 3/300 Iteration 35/4200 Training loss: 3.2041 1.5515 sec/batch
Epoch 3/300 Iteration 36/4200 Training loss: 3.1996 1.5595 sec/batch
Epoch 3/300 Iteration 37/4200 Training loss: 3.1957 1.5588 sec/batch
Epoch 3/300 Iteration 38/4200 Training loss: 3.1936 1.5551 sec/batch
Epoch 3/300 Iteration 39/4200 Training loss: 3.1894 1.5566 sec/batch
Epoch 3/300 Iteration 40/4200 Training loss: 3.1860 1.5584 sec/batch
Epoch 3/300 Iteration 41/4200 Training loss: 3.1844 1.5636 sec/batch
Epoch 3/300 Iteration 42/4200 Training loss: 3.1818 1.5663 sec/batch
Epoch 4/300 Iteration 43/4200 Training loss: 3.2253 1.5823 sec/batch
Epoch 4/300 Iteration 44/4200 Training loss: 3.1890 1.6262 sec/batch
Epoch 4/300 Iteration 45/4200 Training loss: 3.1820 1.6186 sec/batch
Epoch 4/300 Iteration 46/4200 Training loss: 3.1734 1.5781 sec/batch
Epoch 4/300 Iteration 47/4200 Training loss: 3.1673 1.5536 sec/batch
Epoch 4/300 Iteration 48/4200 Training loss: 3.1668 1.5886 sec/batch
Epoch 4/300 Iteration 49/4200 Training loss: 3.1664 1.6146 sec/batch
Epoch 4/300 Iteration 50/4200 Training loss: 3.1626 1.5692 sec/batch
Epoch 4/300 Iteration 51/4200 Training loss: 3.1599 1.5754 sec/batch
Epoch 4/300 Iteration 52/4200 Training loss: 3.1580 1.6019 sec/batch
Epoch 4/300 Iteration 53/4200 Training loss: 3.1555 1.5995 sec/batch
Epoch 4/300 Iteration 54/4200 Training loss: 3.1527 1.5932 sec/batch
Epoch 4/300 Iteration 55/4200 Training loss: 3.1512 1.5643 sec/batch
Epoch 4/300 Iteration 56/4200 Training loss: 3.1494 1.5815 sec/batch
Epoch 5/300 Iteration 57/4200 Training loss: 3.1918 1.5826 sec/batch
Epoch 5/300 Iteration 58/4200 Training loss: 3.1586 1.6145 sec/batch
Epoch 5/300 Iteration 59/4200 Training loss: 3.1503 1.6327 sec/batch
Epoch 5/300 Iteration 60/4200 Training loss: 3.1448 1.5775 sec/batch
Epoch 5/300 Iteration 61/4200 Training loss: 3.1413 1.5935 sec/batch
Epoch 5/300 Iteration 62/4200 Training loss: 3.1399 1.6811 sec/batch
Epoch 5/300 Iteration 63/4200 Training loss: 3.1388 1.6209 sec/batch
Epoch 5/300 Iteration 64/4200 Training loss: 3.1355 1.6102 sec/batch
Epoch 5/300 Iteration 65/4200 Training loss: 3.1321 1.6604 sec/batch
Epoch 5/300 Iteration 66/4200 Training loss: 3.1305 1.6119 sec/batch
Epoch 5/300 Iteration 67/4200 Training loss: 3.1277 1.6205 sec/batch
Epoch 5/300 Iteration 68/4200 Training loss: 3.1254 1.6245 sec/batch
Epoch 5/300 Iteration 69/4200 Training loss: 3.1242 1.6192 sec/batch
Epoch 5/300 Iteration 70/4200 Training loss: 3.1224 1.6184 sec/batch
Epoch 6/300 Iteration 71/4200 Training loss: 3.1680 1.6557 sec/batch
Epoch 6/300 Iteration 72/4200 Training loss: 3.1372 1.6291 sec/batch
Epoch 6/300 Iteration 73/4200 Training loss: 3.1282 1.6131 sec/batch
Epoch 6/300 Iteration 74/4200 Training loss: 3.1200 1.6303 sec/batch
Epoch 6/300 Iteration 75/4200 Training loss: 3.1164 1.6443 sec/batch
Epoch 6/300 Iteration 76/4200 Training loss: 3.1144 1.5969 sec/batch
Epoch 6/300 Iteration 77/4200 Training loss: 3.1135 1.6080 sec/batch
Epoch 6/300 Iteration 78/4200 Training loss: 3.1105 1.6287 sec/batch
Epoch 6/300 Iteration 79/4200 Training loss: 3.1075 1.6796 sec/batch
Epoch 6/300 Iteration 80/4200 Training loss: 3.1060 1.6526 sec/batch
Epoch 6/300 Iteration 81/4200 Training loss: 3.1027 1.6085 sec/batch
Epoch 6/300 Iteration 82/4200 Training loss: 3.1005 1.6070 sec/batch
Epoch 6/300 Iteration 83/4200 Training loss: 3.0994 1.6505 sec/batch
Epoch 6/300 Iteration 84/4200 Training loss: 3.0973 1.6317 sec/batch
Epoch 7/300 Iteration 85/4200 Training loss: 3.1299 1.6446 sec/batch
Epoch 7/300 Iteration 86/4200 Training loss: 3.1047 1.6360 sec/batch
Epoch 7/300 Iteration 87/4200 Training loss: 3.0978 1.6471 sec/batch
Epoch 7/300 Iteration 88/4200 Training loss: 3.0902 1.6530 sec/batch
Epoch 7/300 Iteration 89/4200 Training loss: 3.0862 1.6477 sec/batch
Epoch 7/300 Iteration 90/4200 Training loss: 3.0842 1.6622 sec/batch
Epoch 7/300 Iteration 91/4200 Training loss: 3.0817 1.6293 sec/batch
Epoch 7/300 Iteration 92/4200 Training loss: 3.0787 1.6587 sec/batch
Epoch 7/300 Iteration 93/4200 Training loss: 3.0751 1.6231 sec/batch
Epoch 7/300 Iteration 94/4200 Training loss: 3.0726 1.6414 sec/batch
Epoch 7/300 Iteration 95/4200 Training loss: 3.0692 1.7177 sec/batch
Epoch 7/300 Iteration 96/4200 Training loss: 3.0661 1.6585 sec/batch
Epoch 7/300 Iteration 97/4200 Training loss: 3.0650 1.6813 sec/batch
Epoch 7/300 Iteration 98/4200 Training loss: 3.0627 1.6770 sec/batch
Epoch 8/300 Iteration 99/4200 Training loss: 3.0900 1.6870 sec/batch
Epoch 8/300 Iteration 100/4200 Training loss: 3.0611 1.6773 sec/batch
Validation loss: 3.00469 Saving checkpoint!
Epoch 8/300 Iteration 101/4200 Training loss: 3.0534 3.1764 sec/batch
Epoch 8/300 Iteration 102/4200 Training loss: 3.0443 1.9290 sec/batch
Epoch 8/300 Iteration 103/4200 Training loss: 3.0407 1.6469 sec/batch
Epoch 8/300 Iteration 104/4200 Training loss: 3.0364 1.6375 sec/batch
Epoch 8/300 Iteration 105/4200 Training loss: 3.0350 1.6202 sec/batch
Epoch 8/300 Iteration 106/4200 Training loss: 3.0319 1.6619 sec/batch
Epoch 8/300 Iteration 107/4200 Training loss: 3.0290 1.6400 sec/batch
Epoch 8/300 Iteration 108/4200 Training loss: 3.0264 1.6474 sec/batch
Epoch 8/300 Iteration 109/4200 Training loss: 3.0232 1.6519 sec/batch
Epoch 8/300 Iteration 110/4200 Training loss: 3.0201 1.6477 sec/batch
Epoch 8/300 Iteration 111/4200 Training loss: 3.0187 1.6432 sec/batch
Epoch 8/300 Iteration 112/4200 Training loss: 3.0153 1.6458 sec/batch
Epoch 9/300 Iteration 113/4200 Training loss: 3.0260 1.6472 sec/batch
Epoch 9/300 Iteration 114/4200 Training loss: 3.0033 1.6500 sec/batch
Epoch 9/300 Iteration 115/4200 Training loss: 2.9909 1.6433 sec/batch
Epoch 9/300 Iteration 116/4200 Training loss: 2.9823 1.6464 sec/batch
Epoch 9/300 Iteration 117/4200 Training loss: 2.9775 1.6595 sec/batch
Epoch 9/300 Iteration 118/4200 Training loss: 2.9724 1.6349 sec/batch
Epoch 9/300 Iteration 119/4200 Training loss: 2.9680 1.6572 sec/batch
Epoch 9/300 Iteration 120/4200 Training loss: 2.9615 1.6252 sec/batch
Epoch 9/300 Iteration 121/4200 Training loss: 2.9565 1.6577 sec/batch
Epoch 9/300 Iteration 122/4200 Training loss: 2.9518 1.6859 sec/batch
Epoch 9/300 Iteration 123/4200 Training loss: 2.9469 1.6858 sec/batch
Epoch 9/300 Iteration 124/4200 Training loss: 2.9420 1.6410 sec/batch
Epoch 9/300 Iteration 125/4200 Training loss: 2.9378 1.6807 sec/batch
Epoch 9/300 Iteration 126/4200 Training loss: 2.9321 1.6763 sec/batch
Epoch 10/300 Iteration 127/4200 Training loss: 2.9148 1.6937 sec/batch
Epoch 10/300 Iteration 128/4200 Training loss: 2.8872 1.6636 sec/batch
Epoch 10/300 Iteration 129/4200 Training loss: 2.8744 1.6697 sec/batch
Epoch 10/300 Iteration 130/4200 Training loss: 2.8650 1.6607 sec/batch
Epoch 10/300 Iteration 131/4200 Training loss: 2.8600 1.6684 sec/batch
Epoch 10/300 Iteration 132/4200 Training loss: 2.8539 1.6654 sec/batch
Epoch 10/300 Iteration 133/4200 Training loss: 2.8486 1.6917 sec/batch
Epoch 10/300 Iteration 134/4200 Training loss: 2.8426 1.7217 sec/batch
Epoch 10/300 Iteration 135/4200 Training loss: 2.8367 1.6923 sec/batch
Epoch 10/300 Iteration 136/4200 Training loss: 2.8317 1.7349 sec/batch
Epoch 10/300 Iteration 137/4200 Training loss: 2.8274 1.6982 sec/batch
Epoch 10/300 Iteration 138/4200 Training loss: 2.8216 1.7001 sec/batch
Epoch 10/300 Iteration 139/4200 Training loss: 2.8178 1.6911 sec/batch
Epoch 10/300 Iteration 140/4200 Training loss: 2.8129 1.6870 sec/batch
Epoch 11/300 Iteration 141/4200 Training loss: 2.7940 1.6513 sec/batch
Epoch 11/300 Iteration 142/4200 Training loss: 2.7629 1.6921 sec/batch
Epoch 11/300 Iteration 143/4200 Training loss: 2.7513 1.6947 sec/batch
Epoch 11/300 Iteration 144/4200 Training loss: 2.7420 1.6932 sec/batch
Epoch 11/300 Iteration 145/4200 Training loss: 2.7386 1.7005 sec/batch
Epoch 11/300 Iteration 146/4200 Training loss: 2.7332 1.7196 sec/batch
Epoch 11/300 Iteration 147/4200 Training loss: 2.7279 1.7187 sec/batch
Epoch 11/300 Iteration 148/4200 Training loss: 2.7225 1.7273 sec/batch
Epoch 11/300 Iteration 149/4200 Training loss: 2.7175 1.6905 sec/batch
Epoch 11/300 Iteration 150/4200 Training loss: 2.7113 1.6881 sec/batch
Epoch 11/300 Iteration 151/4200 Training loss: 2.7056 1.7154 sec/batch
Epoch 11/300 Iteration 152/4200 Training loss: 2.6998 1.7007 sec/batch
Epoch 11/300 Iteration 153/4200 Training loss: 2.6956 1.6974 sec/batch
Epoch 11/300 Iteration 154/4200 Training loss: 2.6903 1.6558 sec/batch
Epoch 12/300 Iteration 155/4200 Training loss: 2.6798 1.6741 sec/batch
Epoch 12/300 Iteration 156/4200 Training loss: 2.6490 1.6497 sec/batch
Epoch 12/300 Iteration 157/4200 Training loss: 2.6400 1.7140 sec/batch
Epoch 12/300 Iteration 158/4200 Training loss: 2.6303 1.6926 sec/batch
Epoch 12/300 Iteration 159/4200 Training loss: 2.6272 1.7105 sec/batch
Epoch 12/300 Iteration 160/4200 Training loss: 2.6229 1.6924 sec/batch
Epoch 12/300 Iteration 161/4200 Training loss: 2.6156 1.6996 sec/batch
Epoch 12/300 Iteration 162/4200 Training loss: 2.6092 1.6977 sec/batch
Epoch 12/300 Iteration 163/4200 Training loss: 2.6037 1.6868 sec/batch
Epoch 12/300 Iteration 164/4200 Training loss: 2.5987 1.7007 sec/batch
Epoch 12/300 Iteration 165/4200 Training loss: 2.5941 1.6901 sec/batch
Epoch 12/300 Iteration 166/4200 Training loss: 2.5886 1.6590 sec/batch
Epoch 12/300 Iteration 167/4200 Training loss: 2.5842 1.6966 sec/batch
Epoch 12/300 Iteration 168/4200 Training loss: 2.5791 1.6490 sec/batch
Epoch 13/300 Iteration 169/4200 Training loss: 2.5611 1.7514 sec/batch
Epoch 13/300 Iteration 170/4200 Training loss: 2.5359 1.7446 sec/batch
Epoch 13/300 Iteration 171/4200 Training loss: 2.5263 1.7641 sec/batch
Epoch 13/300 Iteration 172/4200 Training loss: 2.5179 1.8205 sec/batch
Epoch 13/300 Iteration 173/4200 Training loss: 2.5148 1.9006 sec/batch
Epoch 13/300 Iteration 174/4200 Training loss: 2.5106 1.7837 sec/batch
Epoch 13/300 Iteration 175/4200 Training loss: 2.5069 1.7585 sec/batch
Epoch 13/300 Iteration 176/4200 Training loss: 2.5021 1.8237 sec/batch
Epoch 13/300 Iteration 177/4200 Training loss: 2.4981 1.7595 sec/batch
Epoch 13/300 Iteration 178/4200 Training loss: 2.4935 1.7623 sec/batch
Epoch 13/300 Iteration 179/4200 Training loss: 2.4897 1.7377 sec/batch
Epoch 13/300 Iteration 180/4200 Training loss: 2.4859 1.6993 sec/batch
Epoch 13/300 Iteration 181/4200 Training loss: 2.4830 1.6920 sec/batch
Epoch 13/300 Iteration 182/4200 Training loss: 2.4790 1.6639 sec/batch
Epoch 14/300 Iteration 183/4200 Training loss: 2.4706 1.6931 sec/batch
Epoch 14/300 Iteration 184/4200 Training loss: 2.4503 1.7377 sec/batch
Epoch 14/300 Iteration 185/4200 Training loss: 2.4395 1.8163 sec/batch
Epoch 14/300 Iteration 186/4200 Training loss: 2.4332 1.8312 sec/batch
Epoch 14/300 Iteration 187/4200 Training loss: 2.4325 1.7989 sec/batch
Epoch 14/300 Iteration 188/4200 Training loss: 2.4283 1.7936 sec/batch
Epoch 14/300 Iteration 189/4200 Training loss: 2.4230 1.7382 sec/batch
Epoch 14/300 Iteration 190/4200 Training loss: 2.4202 1.6988 sec/batch
Epoch 14/300 Iteration 191/4200 Training loss: 2.4174 1.6885 sec/batch
Epoch 14/300 Iteration 192/4200 Training loss: 2.4150 1.6863 sec/batch
Epoch 14/300 Iteration 193/4200 Training loss: 2.4140 1.6902 sec/batch
Epoch 14/300 Iteration 194/4200 Training loss: 2.4109 1.6916 sec/batch
Epoch 14/300 Iteration 195/4200 Training loss: 2.4093 1.6887 sec/batch
Epoch 14/300 Iteration 196/4200 Training loss: 2.4070 1.6901 sec/batch
Epoch 15/300 Iteration 197/4200 Training loss: 2.4281 1.6901 sec/batch
Epoch 15/300 Iteration 198/4200 Training loss: 2.4024 1.6909 sec/batch
Epoch 15/300 Iteration 199/4200 Training loss: 2.3933 1.6900 sec/batch
Epoch 15/300 Iteration 200/4200 Training loss: 2.3861 1.7322 sec/batch
Validation loss: 2.26998 Saving checkpoint!
Epoch 15/300 Iteration 201/4200 Training loss: 2.3840 3.2007 sec/batch
Epoch 15/300 Iteration 202/4200 Training loss: 2.3820 1.8330 sec/batch
Epoch 15/300 Iteration 203/4200 Training loss: 2.3786 1.6298 sec/batch
Epoch 15/300 Iteration 204/4200 Training loss: 2.3755 1.6457 sec/batch
Epoch 15/300 Iteration 205/4200 Training loss: 2.3717 1.6919 sec/batch
Epoch 15/300 Iteration 206/4200 Training loss: 2.3691 1.6881 sec/batch
Epoch 15/300 Iteration 207/4200 Training loss: 2.3678 1.6696 sec/batch
Epoch 15/300 Iteration 208/4200 Training loss: 2.3656 1.6858 sec/batch
Epoch 15/300 Iteration 209/4200 Training loss: 2.3640 1.6845 sec/batch
Epoch 15/300 Iteration 210/4200 Training loss: 2.3618 1.6863 sec/batch
Epoch 16/300 Iteration 211/4200 Training loss: 2.3858 1.6913 sec/batch
Epoch 16/300 Iteration 212/4200 Training loss: 2.3555 1.6871 sec/batch
Epoch 16/300 Iteration 213/4200 Training loss: 2.3447 1.6887 sec/batch
Epoch 16/300 Iteration 214/4200 Training loss: 2.3378 1.6889 sec/batch
Epoch 16/300 Iteration 215/4200 Training loss: 2.3384 1.6894 sec/batch
Epoch 16/300 Iteration 216/4200 Training loss: 2.3371 1.6900 sec/batch
Epoch 16/300 Iteration 217/4200 Training loss: 2.3331 1.6893 sec/batch
Epoch 16/300 Iteration 218/4200 Training loss: 2.3316 1.6890 sec/batch
Epoch 16/300 Iteration 219/4200 Training loss: 2.3292 1.6885 sec/batch
Epoch 16/300 Iteration 220/4200 Training loss: 2.3276 1.6921 sec/batch
Epoch 16/300 Iteration 221/4200 Training loss: 2.3271 1.6995 sec/batch
Epoch 16/300 Iteration 222/4200 Training loss: 2.3249 1.6878 sec/batch
Epoch 16/300 Iteration 223/4200 Training loss: 2.3236 1.6880 sec/batch
Epoch 16/300 Iteration 224/4200 Training loss: 2.3215 1.6882 sec/batch
Epoch 17/300 Iteration 225/4200 Training loss: 2.3500 1.6701 sec/batch
Epoch 17/300 Iteration 226/4200 Training loss: 2.3239 1.6874 sec/batch
Epoch 17/300 Iteration 227/4200 Training loss: 2.3140 1.6880 sec/batch
Epoch 17/300 Iteration 228/4200 Training loss: 2.3080 1.6859 sec/batch
Epoch 17/300 Iteration 229/4200 Training loss: 2.3075 1.6880 sec/batch
Epoch 17/300 Iteration 230/4200 Training loss: 2.3077 1.6896 sec/batch
Epoch 17/300 Iteration 231/4200 Training loss: 2.3040 1.6903 sec/batch
Epoch 17/300 Iteration 232/4200 Training loss: 2.3013 1.6883 sec/batch
Epoch 17/300 Iteration 233/4200 Training loss: 2.2981 1.6917 sec/batch
Epoch 17/300 Iteration 234/4200 Training loss: 2.2965 1.6909 sec/batch
Epoch 17/300 Iteration 235/4200 Training loss: 2.2951 1.6912 sec/batch
Epoch 17/300 Iteration 236/4200 Training loss: 2.2936 1.6874 sec/batch
Epoch 17/300 Iteration 237/4200 Training loss: 2.2925 1.6904 sec/batch
Epoch 17/300 Iteration 238/4200 Training loss: 2.2904 1.6861 sec/batch
Epoch 18/300 Iteration 239/4200 Training loss: 2.3262 1.6895 sec/batch
Epoch 18/300 Iteration 240/4200 Training loss: 2.2911 1.6870 sec/batch
Epoch 18/300 Iteration 241/4200 Training loss: 2.2797 1.6908 sec/batch
Epoch 18/300 Iteration 242/4200 Training loss: 2.2749 1.6893 sec/batch
Epoch 18/300 Iteration 243/4200 Training loss: 2.2725 1.6920 sec/batch
Epoch 18/300 Iteration 244/4200 Training loss: 2.2724 1.6872 sec/batch
Epoch 18/300 Iteration 245/4200 Training loss: 2.2685 1.6876 sec/batch
Epoch 18/300 Iteration 246/4200 Training loss: 2.2666 1.6868 sec/batch
Epoch 18/300 Iteration 247/4200 Training loss: 2.2646 1.6894 sec/batch
Epoch 18/300 Iteration 248/4200 Training loss: 2.2639 1.6922 sec/batch
Epoch 18/300 Iteration 249/4200 Training loss: 2.2633 1.6885 sec/batch
Epoch 18/300 Iteration 250/4200 Training loss: 2.2621 1.6879 sec/batch
Epoch 18/300 Iteration 251/4200 Training loss: 2.2613 1.6917 sec/batch
Epoch 18/300 Iteration 252/4200 Training loss: 2.2602 1.6919 sec/batch
Epoch 19/300 Iteration 253/4200 Training loss: 2.2948 1.6905 sec/batch
Epoch 19/300 Iteration 254/4200 Training loss: 2.2660 1.6913 sec/batch
Epoch 19/300 Iteration 255/4200 Training loss: 2.2551 1.6912 sec/batch
Epoch 19/300 Iteration 256/4200 Training loss: 2.2506 1.6706 sec/batch
Epoch 19/300 Iteration 257/4200 Training loss: 2.2489 1.6881 sec/batch
Epoch 19/300 Iteration 258/4200 Training loss: 2.2478 1.6858 sec/batch
Epoch 19/300 Iteration 259/4200 Training loss: 2.2453 1.6877 sec/batch
Epoch 19/300 Iteration 260/4200 Training loss: 2.2433 1.6903 sec/batch
Epoch 19/300 Iteration 261/4200 Training loss: 2.2405 1.6871 sec/batch
Epoch 19/300 Iteration 262/4200 Training loss: 2.2396 1.6938 sec/batch
Epoch 19/300 Iteration 263/4200 Training loss: 2.2389 1.6878 sec/batch
Epoch 19/300 Iteration 264/4200 Training loss: 2.2369 1.6873 sec/batch
Epoch 19/300 Iteration 265/4200 Training loss: 2.2359 1.6869 sec/batch
Epoch 19/300 Iteration 266/4200 Training loss: 2.2345 1.6912 sec/batch
Epoch 20/300 Iteration 267/4200 Training loss: 2.2721 1.6907 sec/batch
Epoch 20/300 Iteration 268/4200 Training loss: 2.2477 1.6877 sec/batch
Epoch 20/300 Iteration 269/4200 Training loss: 2.2311 1.6915 sec/batch
Epoch 20/300 Iteration 270/4200 Training loss: 2.2275 1.6900 sec/batch
Epoch 20/300 Iteration 271/4200 Training loss: 2.2270 1.6892 sec/batch
Epoch 20/300 Iteration 272/4200 Training loss: 2.2240 1.6874 sec/batch
Epoch 20/300 Iteration 273/4200 Training loss: 2.2212 1.6927 sec/batch
Epoch 20/300 Iteration 274/4200 Training loss: 2.2193 1.6866 sec/batch
Epoch 20/300 Iteration 275/4200 Training loss: 2.2175 1.7041 sec/batch
Epoch 20/300 Iteration 276/4200 Training loss: 2.2161 1.7388 sec/batch
Epoch 20/300 Iteration 277/4200 Training loss: 2.2153 1.7344 sec/batch
Epoch 20/300 Iteration 278/4200 Training loss: 2.2142 1.7375 sec/batch
Epoch 20/300 Iteration 279/4200 Training loss: 2.2129 1.6880 sec/batch
Epoch 20/300 Iteration 280/4200 Training loss: 2.2112 1.6901 sec/batch
Epoch 21/300 Iteration 281/4200 Training loss: 2.2438 1.6883 sec/batch
Epoch 21/300 Iteration 282/4200 Training loss: 2.2188 1.6910 sec/batch
Epoch 21/300 Iteration 283/4200 Training loss: 2.2095 1.6859 sec/batch
Epoch 21/300 Iteration 284/4200 Training loss: 2.2062 1.6859 sec/batch
Epoch 21/300 Iteration 285/4200 Training loss: 2.2047 1.7114 sec/batch
Epoch 21/300 Iteration 286/4200 Training loss: 2.2025 1.7368 sec/batch
Epoch 21/300 Iteration 287/4200 Training loss: 2.1992 1.6893 sec/batch
Epoch 21/300 Iteration 288/4200 Training loss: 2.1974 1.6889 sec/batch
Epoch 21/300 Iteration 289/4200 Training loss: 2.1950 1.6877 sec/batch
Epoch 21/300 Iteration 290/4200 Training loss: 2.1941 1.6883 sec/batch
Epoch 21/300 Iteration 291/4200 Training loss: 2.1941 1.7160 sec/batch
Epoch 21/300 Iteration 292/4200 Training loss: 2.1931 1.6982 sec/batch
Epoch 21/300 Iteration 293/4200 Training loss: 2.1924 1.6906 sec/batch
Epoch 21/300 Iteration 294/4200 Training loss: 2.1911 1.6929 sec/batch
Epoch 22/300 Iteration 295/4200 Training loss: 2.2334 1.6870 sec/batch
Epoch 22/300 Iteration 296/4200 Training loss: 2.2088 1.6874 sec/batch
Epoch 22/300 Iteration 297/4200 Training loss: 2.1916 1.6888 sec/batch
Epoch 22/300 Iteration 298/4200 Training loss: 2.1865 1.6886 sec/batch
Epoch 22/300 Iteration 299/4200 Training loss: 2.1852 1.6909 sec/batch
Epoch 22/300 Iteration 300/4200 Training loss: 2.1838 1.6872 sec/batch
Validation loss: 2.08517 Saving checkpoint!
Epoch 22/300 Iteration 301/4200 Training loss: 2.1826 3.1324 sec/batch
Epoch 22/300 Iteration 302/4200 Training loss: 2.1803 1.8813 sec/batch
Epoch 22/300 Iteration 303/4200 Training loss: 2.1775 1.6326 sec/batch
Epoch 22/300 Iteration 304/4200 Training loss: 2.1759 1.6762 sec/batch
Epoch 22/300 Iteration 305/4200 Training loss: 2.1765 1.6882 sec/batch
Epoch 22/300 Iteration 306/4200 Training loss: 2.1754 1.6892 sec/batch
Epoch 22/300 Iteration 307/4200 Training loss: 2.1746 1.6865 sec/batch
Epoch 22/300 Iteration 308/4200 Training loss: 2.1737 1.6873 sec/batch
Epoch 23/300 Iteration 309/4200 Training loss: 2.2038 1.6919 sec/batch
Epoch 23/300 Iteration 310/4200 Training loss: 2.1815 1.6872 sec/batch
Epoch 23/300 Iteration 311/4200 Training loss: 2.1687 1.6865 sec/batch
Epoch 23/300 Iteration 312/4200 Training loss: 2.1646 1.6885 sec/batch
Epoch 23/300 Iteration 313/4200 Training loss: 2.1659 1.6947 sec/batch
Epoch 23/300 Iteration 314/4200 Training loss: 2.1643 1.6881 sec/batch
Epoch 23/300 Iteration 315/4200 Training loss: 2.1604 1.6883 sec/batch
Epoch 23/300 Iteration 316/4200 Training loss: 2.1584 1.6882 sec/batch
Epoch 23/300 Iteration 317/4200 Training loss: 2.1565 1.6923 sec/batch
Epoch 23/300 Iteration 318/4200 Training loss: 2.1566 1.6873 sec/batch
Epoch 23/300 Iteration 319/4200 Training loss: 2.1561 1.6895 sec/batch
Epoch 23/300 Iteration 320/4200 Training loss: 2.1550 1.6882 sec/batch
Epoch 23/300 Iteration 321/4200 Training loss: 2.1555 1.6877 sec/batch
Epoch 23/300 Iteration 322/4200 Training loss: 2.1542 1.6867 sec/batch
Epoch 24/300 Iteration 323/4200 Training loss: 2.1818 1.6906 sec/batch
Epoch 24/300 Iteration 324/4200 Training loss: 2.1586 1.6836 sec/batch
Epoch 24/300 Iteration 325/4200 Training loss: 2.1489 1.6872 sec/batch
Epoch 24/300 Iteration 326/4200 Training loss: 2.1467 1.6898 sec/batch
Epoch 24/300 Iteration 327/4200 Training loss: 2.1469 1.6875 sec/batch
Epoch 24/300 Iteration 328/4200 Training loss: 2.1460 1.6865 sec/batch
Epoch 24/300 Iteration 329/4200 Training loss: 2.1442 1.6897 sec/batch
Epoch 24/300 Iteration 330/4200 Training loss: 2.1414 1.6869 sec/batch
Epoch 24/300 Iteration 331/4200 Training loss: 2.1390 1.6874 sec/batch
Epoch 24/300 Iteration 332/4200 Training loss: 2.1379 1.6910 sec/batch
Epoch 24/300 Iteration 333/4200 Training loss: 2.1381 1.6877 sec/batch
Epoch 24/300 Iteration 334/4200 Training loss: 2.1376 1.6912 sec/batch
Epoch 24/300 Iteration 335/4200 Training loss: 2.1375 1.6885 sec/batch
Epoch 24/300 Iteration 336/4200 Training loss: 2.1361 1.6886 sec/batch
Epoch 25/300 Iteration 337/4200 Training loss: 2.1713 1.6904 sec/batch
Epoch 25/300 Iteration 338/4200 Training loss: 2.1461 1.6911 sec/batch
Epoch 25/300 Iteration 339/4200 Training loss: 2.1339 1.6882 sec/batch
Epoch 25/300 Iteration 340/4200 Training loss: 2.1283 1.6850 sec/batch
Epoch 25/300 Iteration 341/4200 Training loss: 2.1286 1.6906 sec/batch
Epoch 25/300 Iteration 342/4200 Training loss: 2.1265 1.6885 sec/batch
Epoch 25/300 Iteration 343/4200 Training loss: 2.1238 1.6869 sec/batch
Epoch 25/300 Iteration 344/4200 Training loss: 2.1215 1.6865 sec/batch
Epoch 25/300 Iteration 345/4200 Training loss: 2.1189 1.6872 sec/batch
Epoch 25/300 Iteration 346/4200 Training loss: 2.1189 1.6901 sec/batch
Epoch 25/300 Iteration 347/4200 Training loss: 2.1191 1.6864 sec/batch
Epoch 25/300 Iteration 348/4200 Training loss: 2.1181 1.6911 sec/batch
Epoch 25/300 Iteration 349/4200 Training loss: 2.1176 1.6884 sec/batch
Epoch 25/300 Iteration 350/4200 Training loss: 2.1164 1.6910 sec/batch
Epoch 26/300 Iteration 351/4200 Training loss: 2.1517 1.7387 sec/batch
Epoch 26/300 Iteration 352/4200 Training loss: 2.1278 1.7423 sec/batch
Epoch 26/300 Iteration 353/4200 Training loss: 2.1154 1.7333 sec/batch
Epoch 26/300 Iteration 354/4200 Training loss: 2.1110 1.7359 sec/batch
Epoch 26/300 Iteration 355/4200 Training loss: 2.1108 1.7400 sec/batch
Epoch 26/300 Iteration 356/4200 Training loss: 2.1089 1.7362 sec/batch
Epoch 26/300 Iteration 357/4200 Training loss: 2.1061 1.7338 sec/batch
Epoch 26/300 Iteration 358/4200 Training loss: 2.1031 1.6927 sec/batch
Epoch 26/300 Iteration 359/4200 Training loss: 2.1017 1.9400 sec/batch
Epoch 26/300 Iteration 360/4200 Training loss: 2.1008 1.9734 sec/batch
Epoch 26/300 Iteration 361/4200 Training loss: 2.1015 1.9794 sec/batch
Epoch 26/300 Iteration 362/4200 Training loss: 2.1014 1.9656 sec/batch
Epoch 26/300 Iteration 363/4200 Training loss: 2.1017 1.9687 sec/batch
Epoch 26/300 Iteration 364/4200 Training loss: 2.1001 1.8724 sec/batch
Epoch 27/300 Iteration 365/4200 Training loss: 2.1365 1.6933 sec/batch
Epoch 27/300 Iteration 366/4200 Training loss: 2.1105 1.6854 sec/batch
Epoch 27/300 Iteration 367/4200 Training loss: 2.0952 1.6894 sec/batch
Epoch 27/300 Iteration 368/4200 Training loss: 2.0910 1.6911 sec/batch
Epoch 27/300 Iteration 369/4200 Training loss: 2.0932 1.6866 sec/batch
Epoch 27/300 Iteration 370/4200 Training loss: 2.0923 1.6871 sec/batch
Epoch 27/300 Iteration 371/4200 Training loss: 2.0892 1.6883 sec/batch
Epoch 27/300 Iteration 372/4200 Training loss: 2.0874 1.6923 sec/batch
Epoch 27/300 Iteration 373/4200 Training loss: 2.0856 1.6936 sec/batch
Epoch 27/300 Iteration 374/4200 Training loss: 2.0861 1.6881 sec/batch
Epoch 27/300 Iteration 375/4200 Training loss: 2.0874 1.6926 sec/batch
Epoch 27/300 Iteration 376/4200 Training loss: 2.0862 1.7213 sec/batch
Epoch 27/300 Iteration 377/4200 Training loss: 2.0870 1.7353 sec/batch
Epoch 27/300 Iteration 378/4200 Training loss: 2.0856 1.7340 sec/batch
Epoch 28/300 Iteration 379/4200 Training loss: 2.1139 1.7011 sec/batch
Epoch 28/300 Iteration 380/4200 Training loss: 2.0933 1.7361 sec/batch
Epoch 28/300 Iteration 381/4200 Training loss: 2.0794 1.7355 sec/batch
Epoch 28/300 Iteration 382/4200 Training loss: 2.0771 1.7372 sec/batch
Epoch 28/300 Iteration 383/4200 Training loss: 2.0776 1.7384 sec/batch
Epoch 28/300 Iteration 384/4200 Training loss: 2.0763 1.7385 sec/batch
Epoch 28/300 Iteration 385/4200 Training loss: 2.0732 1.7371 sec/batch
Epoch 28/300 Iteration 386/4200 Training loss: 2.0719 1.7068 sec/batch
Epoch 28/300 Iteration 387/4200 Training loss: 2.0703 1.7345 sec/batch
Epoch 28/300 Iteration 388/4200 Training loss: 2.0695 1.7379 sec/batch
Epoch 28/300 Iteration 389/4200 Training loss: 2.0695 1.7409 sec/batch
Epoch 28/300 Iteration 390/4200 Training loss: 2.0689 1.7370 sec/batch
Epoch 28/300 Iteration 391/4200 Training loss: 2.0690 1.6879 sec/batch
Epoch 28/300 Iteration 392/4200 Training loss: 2.0675 1.7238 sec/batch
Epoch 29/300 Iteration 393/4200 Training loss: 2.1043 1.7401 sec/batch
Epoch 29/300 Iteration 394/4200 Training loss: 2.0797 1.7364 sec/batch
Epoch 29/300 Iteration 395/4200 Training loss: 2.0623 1.7373 sec/batch
Epoch 29/300 Iteration 396/4200 Training loss: 2.0599 1.7401 sec/batch
Epoch 29/300 Iteration 397/4200 Training loss: 2.0604 1.7366 sec/batch
Epoch 29/300 Iteration 398/4200 Training loss: 2.0590 1.7359 sec/batch
Epoch 29/300 Iteration 399/4200 Training loss: 2.0554 1.7400 sec/batch
Epoch 29/300 Iteration 400/4200 Training loss: 2.0533 1.7413 sec/batch
Validation loss: 1.95575 Saving checkpoint!
Epoch 29/300 Iteration 401/4200 Training loss: 2.0538 3.2041 sec/batch
Epoch 29/300 Iteration 402/4200 Training loss: 2.0530 1.8211 sec/batch
Epoch 29/300 Iteration 403/4200 Training loss: 2.0530 1.6363 sec/batch
Epoch 29/300 Iteration 404/4200 Training loss: 2.0518 1.6527 sec/batch
Epoch 29/300 Iteration 405/4200 Training loss: 2.0517 1.6721 sec/batch
Epoch 29/300 Iteration 406/4200 Training loss: 2.0511 1.6876 sec/batch
Epoch 30/300 Iteration 407/4200 Training loss: 2.0796 1.6924 sec/batch
Epoch 30/300 Iteration 408/4200 Training loss: 2.0588 1.6905 sec/batch
Epoch 30/300 Iteration 409/4200 Training loss: 2.0453 1.6892 sec/batch
Epoch 30/300 Iteration 410/4200 Training loss: 2.0416 1.6910 sec/batch
Epoch 30/300 Iteration 411/4200 Training loss: 2.0422 1.6906 sec/batch
Epoch 30/300 Iteration 412/4200 Training loss: 2.0416 1.6871 sec/batch
Epoch 30/300 Iteration 413/4200 Training loss: 2.0389 1.6921 sec/batch
Epoch 30/300 Iteration 414/4200 Training loss: 2.0363 1.6915 sec/batch
Epoch 30/300 Iteration 415/4200 Training loss: 2.0340 1.6883 sec/batch
Epoch 30/300 Iteration 416/4200 Training loss: 2.0340 1.6875 sec/batch
Epoch 30/300 Iteration 417/4200 Training loss: 2.0345 1.6899 sec/batch
Epoch 30/300 Iteration 418/4200 Training loss: 2.0342 1.6897 sec/batch
Epoch 30/300 Iteration 419/4200 Training loss: 2.0343 1.6870 sec/batch
Epoch 30/300 Iteration 420/4200 Training loss: 2.0331 1.6990 sec/batch
Epoch 31/300 Iteration 421/4200 Training loss: 2.0617 1.7357 sec/batch
Epoch 31/300 Iteration 422/4200 Training loss: 2.0428 1.7372 sec/batch
Epoch 31/300 Iteration 423/4200 Training loss: 2.0293 1.7357 sec/batch
Epoch 31/300 Iteration 424/4200 Training loss: 2.0254 1.7350 sec/batch
Epoch 31/300 Iteration 425/4200 Training loss: 2.0255 1.7669 sec/batch
Epoch 31/300 Iteration 426/4200 Training loss: 2.0239 1.7521 sec/batch
Epoch 31/300 Iteration 427/4200 Training loss: 2.0215 1.7368 sec/batch
Epoch 31/300 Iteration 428/4200 Training loss: 2.0184 1.7354 sec/batch
Epoch 31/300 Iteration 429/4200 Training loss: 2.0166 1.7368 sec/batch
Epoch 31/300 Iteration 430/4200 Training loss: 2.0171 1.6931 sec/batch
Epoch 31/300 Iteration 431/4200 Training loss: 2.0182 1.6892 sec/batch
Epoch 31/300 Iteration 432/4200 Training loss: 2.0168 1.6892 sec/batch
Epoch 31/300 Iteration 433/4200 Training loss: 2.0177 1.6892 sec/batch
Epoch 31/300 Iteration 434/4200 Training loss: 2.0173 1.6881 sec/batch
Epoch 32/300 Iteration 435/4200 Training loss: 2.0543 1.6874 sec/batch
Epoch 32/300 Iteration 436/4200 Training loss: 2.0302 1.6910 sec/batch
Epoch 32/300 Iteration 437/4200 Training loss: 2.0154 1.6886 sec/batch
Epoch 32/300 Iteration 438/4200 Training loss: 2.0134 1.6892 sec/batch
Epoch 32/300 Iteration 439/4200 Training loss: 2.0141 1.6996 sec/batch
Epoch 32/300 Iteration 440/4200 Training loss: 2.0141 1.6884 sec/batch
Epoch 32/300 Iteration 441/4200 Training loss: 2.0114 1.6851 sec/batch
Epoch 32/300 Iteration 442/4200 Training loss: 2.0080 1.7287 sec/batch
Epoch 32/300 Iteration 443/4200 Training loss: 2.0050 1.7359 sec/batch
Epoch 32/300 Iteration 444/4200 Training loss: 2.0037 1.7349 sec/batch
Epoch 32/300 Iteration 445/4200 Training loss: 2.0046 1.7355 sec/batch
Epoch 32/300 Iteration 446/4200 Training loss: 2.0043 1.6891 sec/batch
Epoch 32/300 Iteration 447/4200 Training loss: 2.0049 1.7241 sec/batch
Epoch 32/300 Iteration 448/4200 Training loss: 2.0036 1.7372 sec/batch
Epoch 33/300 Iteration 449/4200 Training loss: 2.0351 1.7359 sec/batch
Epoch 33/300 Iteration 450/4200 Training loss: 2.0162 1.7364 sec/batch
Epoch 33/300 Iteration 451/4200 Training loss: 2.0011 1.7352 sec/batch
Epoch 33/300 Iteration 452/4200 Training loss: 2.0000 1.7375 sec/batch
Epoch 33/300 Iteration 453/4200 Training loss: 2.0012 1.7395 sec/batch
Epoch 33/300 Iteration 454/4200 Training loss: 2.0005 1.7358 sec/batch
Epoch 33/300 Iteration 455/4200 Training loss: 1.9981 1.6903 sec/batch
Epoch 33/300 Iteration 456/4200 Training loss: 1.9957 1.6943 sec/batch
Epoch 33/300 Iteration 457/4200 Training loss: 1.9926 1.6863 sec/batch
Epoch 33/300 Iteration 458/4200 Training loss: 1.9925 1.6872 sec/batch
Epoch 33/300 Iteration 459/4200 Training loss: 1.9927 1.6872 sec/batch
Epoch 33/300 Iteration 460/4200 Training loss: 1.9921 1.6886 sec/batch
Epoch 33/300 Iteration 461/4200 Training loss: 1.9928 1.7258 sec/batch
Epoch 33/300 Iteration 462/4200 Training loss: 1.9915 1.7354 sec/batch
Epoch 34/300 Iteration 463/4200 Training loss: 2.0245 1.7344 sec/batch
Epoch 34/300 Iteration 464/4200 Training loss: 2.0028 1.7390 sec/batch
Epoch 34/300 Iteration 465/4200 Training loss: 1.9867 1.7391 sec/batch
Epoch 34/300 Iteration 466/4200 Training loss: 1.9813 1.6880 sec/batch
Epoch 34/300 Iteration 467/4200 Training loss: 1.9833 1.6884 sec/batch
Epoch 34/300 Iteration 468/4200 Training loss: 1.9832 1.7202 sec/batch
Epoch 34/300 Iteration 469/4200 Training loss: 1.9812 1.7378 sec/batch
Epoch 34/300 Iteration 470/4200 Training loss: 1.9797 1.7419 sec/batch
Epoch 34/300 Iteration 471/4200 Training loss: 1.9781 1.6909 sec/batch
Epoch 34/300 Iteration 472/4200 Training loss: 1.9776 1.6893 sec/batch
Epoch 34/300 Iteration 473/4200 Training loss: 1.9778 1.6884 sec/batch
Epoch 34/300 Iteration 474/4200 Training loss: 1.9766 1.6935 sec/batch
Epoch 34/300 Iteration 475/4200 Training loss: 1.9767 1.6879 sec/batch
Epoch 34/300 Iteration 476/4200 Training loss: 1.9752 1.6890 sec/batch
Epoch 35/300 Iteration 477/4200 Training loss: 1.9997 1.6895 sec/batch
Epoch 35/300 Iteration 478/4200 Training loss: 1.9805 1.6932 sec/batch
Epoch 35/300 Iteration 479/4200 Training loss: 1.9668 1.7379 sec/batch
Epoch 35/300 Iteration 480/4200 Training loss: 1.9633 1.7360 sec/batch
Epoch 35/300 Iteration 481/4200 Training loss: 1.9659 1.6933 sec/batch
Epoch 35/300 Iteration 482/4200 Training loss: 1.9659 1.6883 sec/batch
Epoch 35/300 Iteration 483/4200 Training loss: 1.9620 1.7364 sec/batch
Epoch 35/300 Iteration 484/4200 Training loss: 1.9592 1.7353 sec/batch
Epoch 35/300 Iteration 485/4200 Training loss: 1.9582 1.7385 sec/batch
Epoch 35/300 Iteration 486/4200 Training loss: 1.9583 1.7360 sec/batch
Epoch 35/300 Iteration 487/4200 Training loss: 1.9593 1.7353 sec/batch
Epoch 35/300 Iteration 488/4200 Training loss: 1.9599 1.7399 sec/batch
Epoch 35/300 Iteration 489/4200 Training loss: 1.9603 1.7362 sec/batch
Epoch 35/300 Iteration 490/4200 Training loss: 1.9591 1.7346 sec/batch
Epoch 36/300 Iteration 491/4200 Training loss: 1.9952 1.7388 sec/batch
Epoch 36/300 Iteration 492/4200 Training loss: 1.9748 1.7347 sec/batch
Epoch 36/300 Iteration 493/4200 Training loss: 1.9586 1.7353 sec/batch
Epoch 36/300 Iteration 494/4200 Training loss: 1.9532 1.7356 sec/batch
Epoch 36/300 Iteration 495/4200 Training loss: 1.9544 1.7669 sec/batch
Epoch 36/300 Iteration 496/4200 Training loss: 1.9525 1.7530 sec/batch
Epoch 36/300 Iteration 497/4200 Training loss: 1.9489 1.7349 sec/batch
Epoch 36/300 Iteration 498/4200 Training loss: 1.9465 1.7367 sec/batch
Epoch 36/300 Iteration 499/4200 Training loss: 1.9433 1.7361 sec/batch
Epoch 36/300 Iteration 500/4200 Training loss: 1.9425 1.7360 sec/batch
Validation loss: 1.85878 Saving checkpoint!
Epoch 36/300 Iteration 501/4200 Training loss: 1.9471 1.6407 sec/batch
Epoch 36/300 Iteration 502/4200 Training loss: 1.9466 1.7366 sec/batch
Epoch 36/300 Iteration 503/4200 Training loss: 1.9469 1.7412 sec/batch
Epoch 36/300 Iteration 504/4200 Training loss: 1.9468 1.7351 sec/batch
Epoch 37/300 Iteration 505/4200 Training loss: 1.9895 1.7376 sec/batch
Epoch 37/300 Iteration 506/4200 Training loss: 1.9619 1.7395 sec/batch
Epoch 37/300 Iteration 507/4200 Training loss: 1.9434 1.7406 sec/batch
Epoch 37/300 Iteration 508/4200 Training loss: 1.9408 1.7399 sec/batch
Epoch 37/300 Iteration 509/4200 Training loss: 1.9417 1.7375 sec/batch
Epoch 37/300 Iteration 510/4200 Training loss: 1.9397 1.7385 sec/batch
Epoch 37/300 Iteration 511/4200 Training loss: 1.9356 1.7384 sec/batch
Epoch 37/300 Iteration 512/4200 Training loss: 1.9324 1.7375 sec/batch
Epoch 37/300 Iteration 513/4200 Training loss: 1.9301 1.7359 sec/batch
Epoch 37/300 Iteration 514/4200 Training loss: 1.9312 1.6951 sec/batch
Epoch 37/300 Iteration 515/4200 Training loss: 1.9332 1.7388 sec/batch
Epoch 37/300 Iteration 516/4200 Training loss: 1.9334 1.6876 sec/batch
Epoch 37/300 Iteration 517/4200 Training loss: 1.9340 1.7251 sec/batch
Epoch 37/300 Iteration 518/4200 Training loss: 1.9333 1.7405 sec/batch
Epoch 38/300 Iteration 519/4200 Training loss: 1.9700 1.7019 sec/batch
Epoch 38/300 Iteration 520/4200 Training loss: 1.9480 1.7385 sec/batch
Epoch 38/300 Iteration 521/4200 Training loss: 1.9324 1.7403 sec/batch
Epoch 38/300 Iteration 522/4200 Training loss: 1.9279 1.7389 sec/batch
Epoch 38/300 Iteration 523/4200 Training loss: 1.9287 1.7360 sec/batch
Epoch 38/300 Iteration 524/4200 Training loss: 1.9273 1.7432 sec/batch
Epoch 38/300 Iteration 525/4200 Training loss: 1.9250 1.7364 sec/batch
Epoch 38/300 Iteration 526/4200 Training loss: 1.9224 1.7371 sec/batch
Epoch 38/300 Iteration 527/4200 Training loss: 1.9203 1.7394 sec/batch
Epoch 38/300 Iteration 528/4200 Training loss: 1.9202 1.6903 sec/batch
Epoch 38/300 Iteration 529/4200 Training loss: 1.9215 1.6881 sec/batch
Epoch 38/300 Iteration 530/4200 Training loss: 1.9213 1.6888 sec/batch
Epoch 38/300 Iteration 531/4200 Training loss: 1.9218 1.6909 sec/batch
Epoch 38/300 Iteration 532/4200 Training loss: 1.9213 1.6878 sec/batch
Epoch 39/300 Iteration 533/4200 Training loss: 1.9532 1.7068 sec/batch
Epoch 39/300 Iteration 534/4200 Training loss: 1.9262 1.7353 sec/batch
Epoch 39/300 Iteration 535/4200 Training loss: 1.9106 1.7411 sec/batch
Epoch 39/300 Iteration 536/4200 Training loss: 1.9101 1.7390 sec/batch
Epoch 39/300 Iteration 537/4200 Training loss: 1.9113 1.7347 sec/batch
Epoch 39/300 Iteration 538/4200 Training loss: 1.9103 1.7425 sec/batch
Epoch 39/300 Iteration 539/4200 Training loss: 1.9072 1.7365 sec/batch
Epoch 39/300 Iteration 540/4200 Training loss: 1.9038 1.7374 sec/batch
Epoch 39/300 Iteration 541/4200 Training loss: 1.9026 1.7367 sec/batch
Epoch 39/300 Iteration 542/4200 Training loss: 1.9019 1.7445 sec/batch
Epoch 39/300 Iteration 543/4200 Training loss: 1.9035 1.7358 sec/batch
Epoch 39/300 Iteration 544/4200 Training loss: 1.9040 1.7381 sec/batch
Epoch 39/300 Iteration 545/4200 Training loss: 1.9053 1.7376 sec/batch
Epoch 39/300 Iteration 546/4200 Training loss: 1.9043 1.7356 sec/batch
Epoch 40/300 Iteration 547/4200 Training loss: 1.9445 1.7357 sec/batch
Epoch 40/300 Iteration 548/4200 Training loss: 1.9154 1.7356 sec/batch
Epoch 40/300 Iteration 549/4200 Training loss: 1.9028 1.7349 sec/batch
Epoch 40/300 Iteration 550/4200 Training loss: 1.8973 1.7355 sec/batch
Epoch 40/300 Iteration 551/4200 Training loss: 1.9008 1.7345 sec/batch
Epoch 40/300 Iteration 552/4200 Training loss: 1.8993 1.7396 sec/batch
Epoch 40/300 Iteration 553/4200 Training loss: 1.8950 1.7356 sec/batch
Epoch 40/300 Iteration 554/4200 Training loss: 1.8925 1.7356 sec/batch
Epoch 40/300 Iteration 555/4200 Training loss: 1.8906 1.7361 sec/batch
Epoch 40/300 Iteration 556/4200 Training loss: 1.8912 1.6929 sec/batch
Epoch 40/300 Iteration 557/4200 Training loss: 1.8924 1.6928 sec/batch
Epoch 40/300 Iteration 558/4200 Training loss: 1.8923 1.7284 sec/batch
Epoch 40/300 Iteration 559/4200 Training loss: 1.8938 1.7404 sec/batch
Epoch 40/300 Iteration 560/4200 Training loss: 1.8937 1.7393 sec/batch
Epoch 41/300 Iteration 561/4200 Training loss: 1.9296 1.6897 sec/batch
Epoch 41/300 Iteration 562/4200 Training loss: 1.9040 1.6871 sec/batch
Epoch 41/300 Iteration 563/4200 Training loss: 1.8884 1.7141 sec/batch
Epoch 41/300 Iteration 564/4200 Training loss: 1.8857 1.7386 sec/batch
Epoch 41/300 Iteration 565/4200 Training loss: 1.8892 1.7375 sec/batch
Epoch 41/300 Iteration 566/4200 Training loss: 1.8880 1.7386 sec/batch
Epoch 41/300 Iteration 567/4200 Training loss: 1.8838 1.7388 sec/batch
Epoch 41/300 Iteration 568/4200 Training loss: 1.8815 1.6928 sec/batch
Epoch 41/300 Iteration 569/4200 Training loss: 1.8788 1.6938 sec/batch
Epoch 41/300 Iteration 570/4200 Training loss: 1.8780 1.6880 sec/batch
Epoch 41/300 Iteration 571/4200 Training loss: 1.8789 1.6880 sec/batch
Epoch 41/300 Iteration 572/4200 Training loss: 1.8787 1.6891 sec/batch
Epoch 41/300 Iteration 573/4200 Training loss: 1.8803 1.6883 sec/batch
Epoch 41/300 Iteration 574/4200 Training loss: 1.8800 1.6889 sec/batch
Epoch 42/300 Iteration 575/4200 Training loss: 1.9200 1.7047 sec/batch
Epoch 42/300 Iteration 576/4200 Training loss: 1.8962 1.7387 sec/batch
Epoch 42/300 Iteration 577/4200 Training loss: 1.8813 1.7400 sec/batch
Epoch 42/300 Iteration 578/4200 Training loss: 1.8760 1.7369 sec/batch
Epoch 42/300 Iteration 579/4200 Training loss: 1.8761 1.6918 sec/batch
Epoch 42/300 Iteration 580/4200 Training loss: 1.8748 1.7383 sec/batch
Epoch 42/300 Iteration 581/4200 Training loss: 1.8719 1.7366 sec/batch
Epoch 42/300 Iteration 582/4200 Training loss: 1.8691 1.7361 sec/batch
Epoch 42/300 Iteration 583/4200 Training loss: 1.8672 1.7347 sec/batch
Epoch 42/300 Iteration 584/4200 Training loss: 1.8664 1.7356 sec/batch
Epoch 42/300 Iteration 585/4200 Training loss: 1.8672 1.7372 sec/batch
Epoch 42/300 Iteration 586/4200 Training loss: 1.8677 1.7382 sec/batch
Epoch 42/300 Iteration 587/4200 Training loss: 1.8675 1.6913 sec/batch
Epoch 42/300 Iteration 588/4200 Training loss: 1.8678 1.6920 sec/batch
Epoch 43/300 Iteration 589/4200 Training loss: 1.9017 1.6871 sec/batch
Epoch 43/300 Iteration 590/4200 Training loss: 1.8819 1.6871 sec/batch
Epoch 43/300 Iteration 591/4200 Training loss: 1.8675 1.6853 sec/batch
Epoch 43/300 Iteration 592/4200 Training loss: 1.8624 1.7231 sec/batch
Epoch 43/300 Iteration 593/4200 Training loss: 1.8651 1.7362 sec/batch
Epoch 43/300 Iteration 594/4200 Training loss: 1.8637 1.7372 sec/batch
Epoch 43/300 Iteration 595/4200 Training loss: 1.8600 1.7372 sec/batch
Epoch 43/300 Iteration 596/4200 Training loss: 1.8568 1.7380 sec/batch
Epoch 43/300 Iteration 597/4200 Training loss: 1.8533 1.6917 sec/batch
Epoch 43/300 Iteration 598/4200 Training loss: 1.8543 1.6907 sec/batch
Epoch 43/300 Iteration 599/4200 Training loss: 1.8559 1.6885 sec/batch
Epoch 43/300 Iteration 600/4200 Training loss: 1.8559 1.6888 sec/batch
Validation loss: 1.77177 Saving checkpoint!
Epoch 43/300 Iteration 601/4200 Training loss: 1.8591 1.6310 sec/batch
Epoch 43/300 Iteration 602/4200 Training loss: 1.8585 1.7362 sec/batch
Epoch 44/300 Iteration 603/4200 Training loss: 1.8874 1.7422 sec/batch
Epoch 44/300 Iteration 604/4200 Training loss: 1.8637 1.6920 sec/batch
Epoch 44/300 Iteration 605/4200 Training loss: 1.8497 1.6867 sec/batch
Epoch 44/300 Iteration 606/4200 Training loss: 1.8478 1.6999 sec/batch
Epoch 44/300 Iteration 607/4200 Training loss: 1.8513 1.7346 sec/batch
Epoch 44/300 Iteration 608/4200 Training loss: 1.8492 1.7370 sec/batch
Epoch 44/300 Iteration 609/4200 Training loss: 1.8458 1.7366 sec/batch
Epoch 44/300 Iteration 610/4200 Training loss: 1.8434 1.7358 sec/batch
Epoch 44/300 Iteration 611/4200 Training loss: 1.8410 1.7413 sec/batch
Epoch 44/300 Iteration 612/4200 Training loss: 1.8412 1.7387 sec/batch
Epoch 44/300 Iteration 613/4200 Training loss: 1.8423 1.7359 sec/batch
Epoch 44/300 Iteration 614/4200 Training loss: 1.8427 1.7361 sec/batch
Epoch 44/300 Iteration 615/4200 Training loss: 1.8438 1.7381 sec/batch
Epoch 44/300 Iteration 616/4200 Training loss: 1.8442 1.7361 sec/batch
Epoch 45/300 Iteration 617/4200 Training loss: 1.8837 1.6920 sec/batch
Epoch 45/300 Iteration 618/4200 Training loss: 1.8539 1.6884 sec/batch
Epoch 45/300 Iteration 619/4200 Training loss: 1.8402 1.6881 sec/batch
Epoch 45/300 Iteration 620/4200 Training loss: 1.8388 1.6881 sec/batch
Epoch 45/300 Iteration 621/4200 Training loss: 1.8395 1.6888 sec/batch
Epoch 45/300 Iteration 622/4200 Training loss: 1.8393 1.7371 sec/batch
Epoch 45/300 Iteration 623/4200 Training loss: 1.8349 1.7340 sec/batch
Epoch 45/300 Iteration 624/4200 Training loss: 1.8318 1.7415 sec/batch
Epoch 45/300 Iteration 625/4200 Training loss: 1.8294 1.7360 sec/batch
Epoch 45/300 Iteration 626/4200 Training loss: 1.8303 1.7361 sec/batch
Epoch 45/300 Iteration 627/4200 Training loss: 1.8319 1.7343 sec/batch
Epoch 45/300 Iteration 628/4200 Training loss: 1.8324 1.7366 sec/batch
Epoch 45/300 Iteration 629/4200 Training loss: 1.8341 1.7378 sec/batch
Epoch 45/300 Iteration 630/4200 Training loss: 1.8331 1.7378 sec/batch
Epoch 46/300 Iteration 631/4200 Training loss: 1.8734 1.7707 sec/batch
Epoch 46/300 Iteration 632/4200 Training loss: 1.8456 1.7401 sec/batch
Epoch 46/300 Iteration 633/4200 Training loss: 1.8317 1.7052 sec/batch
Epoch 46/300 Iteration 634/4200 Training loss: 1.8255 1.6927 sec/batch
Epoch 46/300 Iteration 635/4200 Training loss: 1.8278 1.7007 sec/batch
Epoch 46/300 Iteration 636/4200 Training loss: 1.8275 1.7361 sec/batch
Epoch 46/300 Iteration 637/4200 Training loss: 1.8226 1.7362 sec/batch
Epoch 46/300 Iteration 638/4200 Training loss: 1.8207 1.6911 sec/batch
Epoch 46/300 Iteration 639/4200 Training loss: 1.8182 1.7134 sec/batch
Epoch 46/300 Iteration 640/4200 Training loss: 1.8189 1.7370 sec/batch
Epoch 46/300 Iteration 641/4200 Training loss: 1.8205 1.7458 sec/batch
Epoch 46/300 Iteration 642/4200 Training loss: 1.8207 1.7387 sec/batch
Epoch 46/300 Iteration 643/4200 Training loss: 1.8223 1.7358 sec/batch
Epoch 46/300 Iteration 644/4200 Training loss: 1.8213 1.7385 sec/batch
Epoch 47/300 Iteration 645/4200 Training loss: 1.8579 1.7411 sec/batch
Epoch 47/300 Iteration 646/4200 Training loss: 1.8288 1.7407 sec/batch
Epoch 47/300 Iteration 647/4200 Training loss: 1.8143 1.7370 sec/batch
Epoch 47/300 Iteration 648/4200 Training loss: 1.8120 1.7363 sec/batch
Epoch 47/300 Iteration 649/4200 Training loss: 1.8143 1.7372 sec/batch
Epoch 47/300 Iteration 650/4200 Training loss: 1.8139 1.7363 sec/batch
Epoch 47/300 Iteration 651/4200 Training loss: 1.8084 1.7392 sec/batch
Epoch 47/300 Iteration 652/4200 Training loss: 1.8072 1.7346 sec/batch
Epoch 47/300 Iteration 653/4200 Training loss: 1.8047 1.7363 sec/batch
Epoch 47/300 Iteration 654/4200 Training loss: 1.8052 1.7368 sec/batch
Epoch 47/300 Iteration 655/4200 Training loss: 1.8073 1.7365 sec/batch
Epoch 47/300 Iteration 656/4200 Training loss: 1.8081 1.7356 sec/batch
Epoch 47/300 Iteration 657/4200 Training loss: 1.8090 1.7382 sec/batch
Epoch 47/300 Iteration 658/4200 Training loss: 1.8078 1.7393 sec/batch
Epoch 48/300 Iteration 659/4200 Training loss: 1.8393 1.7384 sec/batch
Epoch 48/300 Iteration 660/4200 Training loss: 1.8151 1.7370 sec/batch
Epoch 48/300 Iteration 661/4200 Training loss: 1.7998 1.7357 sec/batch
Epoch 48/300 Iteration 662/4200 Training loss: 1.7989 1.7409 sec/batch
Epoch 48/300 Iteration 663/4200 Training loss: 1.8009 1.7376 sec/batch
Epoch 48/300 Iteration 664/4200 Training loss: 1.8020 1.7368 sec/batch
Epoch 48/300 Iteration 665/4200 Training loss: 1.7983 1.7453 sec/batch
Epoch 48/300 Iteration 666/4200 Training loss: 1.7957 1.6901 sec/batch
Epoch 48/300 Iteration 667/4200 Training loss: 1.7914 1.6970 sec/batch
Epoch 48/300 Iteration 668/4200 Training loss: 1.7920 1.7352 sec/batch
Epoch 48/300 Iteration 669/4200 Training loss: 1.7935 1.7369 sec/batch
Epoch 48/300 Iteration 670/4200 Training loss: 1.7940 1.7341 sec/batch
Epoch 48/300 Iteration 671/4200 Training loss: 1.7955 1.7405 sec/batch
Epoch 48/300 Iteration 672/4200 Training loss: 1.7949 1.6925 sec/batch
Epoch 49/300 Iteration 673/4200 Training loss: 1.8338 1.6902 sec/batch
Epoch 49/300 Iteration 674/4200 Training loss: 1.8096 1.6893 sec/batch
Epoch 49/300 Iteration 675/4200 Training loss: 1.7939 1.6879 sec/batch
Epoch 49/300 Iteration 676/4200 Training loss: 1.7914 1.6895 sec/batch
Epoch 49/300 Iteration 677/4200 Training loss: 1.7927 1.6880 sec/batch
Epoch 49/300 Iteration 678/4200 Training loss: 1.7925 1.6870 sec/batch
Epoch 49/300 Iteration 679/4200 Training loss: 1.7879 1.6890 sec/batch
Epoch 49/300 Iteration 680/4200 Training loss: 1.7857 1.7144 sec/batch
Epoch 49/300 Iteration 681/4200 Training loss: 1.7836 1.7355 sec/batch
Epoch 49/300 Iteration 682/4200 Training loss: 1.7833 1.7366 sec/batch
Epoch 49/300 Iteration 683/4200 Training loss: 1.7847 1.7406 sec/batch
Epoch 49/300 Iteration 684/4200 Training loss: 1.7845 1.6884 sec/batch
Epoch 49/300 Iteration 685/4200 Training loss: 1.7855 1.7171 sec/batch
Epoch 49/300 Iteration 686/4200 Training loss: 1.7847 1.7359 sec/batch
Epoch 50/300 Iteration 687/4200 Training loss: 1.8201 1.7402 sec/batch
Epoch 50/300 Iteration 688/4200 Training loss: 1.7970 1.7398 sec/batch
Epoch 50/300 Iteration 689/4200 Training loss: 1.7801 1.7345 sec/batch
Epoch 50/300 Iteration 690/4200 Training loss: 1.7791 1.7368 sec/batch
Epoch 50/300 Iteration 691/4200 Training loss: 1.7800 1.7361 sec/batch
Epoch 50/300 Iteration 692/4200 Training loss: 1.7799 1.7363 sec/batch
Epoch 50/300 Iteration 693/4200 Training loss: 1.7756 1.7026 sec/batch
Epoch 50/300 Iteration 694/4200 Training loss: 1.7724 1.7354 sec/batch
Epoch 50/300 Iteration 695/4200 Training loss: 1.7711 1.7359 sec/batch
Epoch 50/300 Iteration 696/4200 Training loss: 1.7705 1.7371 sec/batch
Epoch 50/300 Iteration 697/4200 Training loss: 1.7716 1.7372 sec/batch
Epoch 50/300 Iteration 698/4200 Training loss: 1.7709 1.7354 sec/batch
Epoch 50/300 Iteration 699/4200 Training loss: 1.7723 1.7378 sec/batch
Epoch 50/300 Iteration 700/4200 Training loss: 1.7716 1.7379 sec/batch
Validation loss: 1.71034 Saving checkpoint!
Epoch 51/300 Iteration 701/4200 Training loss: 1.8054 3.1576 sec/batch
Epoch 51/300 Iteration 702/4200 Training loss: 1.7778 1.8606 sec/batch
Epoch 51/300 Iteration 703/4200 Training loss: 1.7608 1.6507 sec/batch
Epoch 51/300 Iteration 704/4200 Training loss: 1.7623 1.6881 sec/batch
Epoch 51/300 Iteration 705/4200 Training loss: 1.7624 1.6895 sec/batch
Epoch 51/300 Iteration 706/4200 Training loss: 1.7609 1.6894 sec/batch
Epoch 51/300 Iteration 707/4200 Training loss: 1.7578 1.6903 sec/batch
Epoch 51/300 Iteration 708/4200 Training loss: 1.7553 1.6939 sec/batch
Epoch 51/300 Iteration 709/4200 Training loss: 1.7520 1.6693 sec/batch
Epoch 51/300 Iteration 710/4200 Training loss: 1.7549 1.6900 sec/batch
Epoch 51/300 Iteration 711/4200 Training loss: 1.7568 1.6862 sec/batch
Epoch 51/300 Iteration 712/4200 Training loss: 1.7564 1.6955 sec/batch
Epoch 51/300 Iteration 713/4200 Training loss: 1.7577 1.6883 sec/batch
Epoch 51/300 Iteration 714/4200 Training loss: 1.7580 1.6908 sec/batch
Epoch 52/300 Iteration 715/4200 Training loss: 1.7879 1.6907 sec/batch
Epoch 52/300 Iteration 716/4200 Training loss: 1.7622 1.6910 sec/batch
Epoch 52/300 Iteration 717/4200 Training loss: 1.7495 1.6906 sec/batch
Epoch 52/300 Iteration 718/4200 Training loss: 1.7480 1.6868 sec/batch
Epoch 52/300 Iteration 719/4200 Training loss: 1.7512 1.6960 sec/batch
Epoch 52/300 Iteration 720/4200 Training loss: 1.7519 1.6874 sec/batch
Epoch 52/300 Iteration 721/4200 Training loss: 1.7466 1.6902 sec/batch
Epoch 52/300 Iteration 722/4200 Training loss: 1.7453 1.6972 sec/batch
Epoch 52/300 Iteration 723/4200 Training loss: 1.7426 1.7256 sec/batch
Epoch 52/300 Iteration 724/4200 Training loss: 1.7426 1.6928 sec/batch
Epoch 52/300 Iteration 725/4200 Training loss: 1.7447 1.6876 sec/batch
Epoch 52/300 Iteration 726/4200 Training loss: 1.7450 1.6916 sec/batch
Epoch 52/300 Iteration 727/4200 Training loss: 1.7470 1.6875 sec/batch
Epoch 52/300 Iteration 728/4200 Training loss: 1.7467 1.6890 sec/batch
Epoch 53/300 Iteration 729/4200 Training loss: 1.7819 1.6923 sec/batch
Epoch 53/300 Iteration 730/4200 Training loss: 1.7542 1.6877 sec/batch
Epoch 53/300 Iteration 731/4200 Training loss: 1.7404 1.6861 sec/batch
Epoch 53/300 Iteration 732/4200 Training loss: 1.7402 1.7227 sec/batch
Epoch 53/300 Iteration 733/4200 Training loss: 1.7425 1.7367 sec/batch
Epoch 53/300 Iteration 734/4200 Training loss: 1.7424 1.7375 sec/batch
Epoch 53/300 Iteration 735/4200 Training loss: 1.7366 1.7361 sec/batch
Epoch 53/300 Iteration 736/4200 Training loss: 1.7342 1.7389 sec/batch
Epoch 53/300 Iteration 737/4200 Training loss: 1.7311 1.7359 sec/batch
Epoch 53/300 Iteration 738/4200 Training loss: 1.7323 1.7409 sec/batch
Epoch 53/300 Iteration 739/4200 Training loss: 1.7348 1.7357 sec/batch
Epoch 53/300 Iteration 740/4200 Training loss: 1.7353 1.7344 sec/batch
Epoch 53/300 Iteration 741/4200 Training loss: 1.7366 1.7385 sec/batch
Epoch 53/300 Iteration 742/4200 Training loss: 1.7361 1.7359 sec/batch
Epoch 54/300 Iteration 743/4200 Training loss: 1.7681 1.7442 sec/batch
Epoch 54/300 Iteration 744/4200 Training loss: 1.7436 1.7357 sec/batch
Epoch 54/300 Iteration 745/4200 Training loss: 1.7295 1.7355 sec/batch
Epoch 54/300 Iteration 746/4200 Training loss: 1.7281 1.7370 sec/batch
Epoch 54/300 Iteration 747/4200 Training loss: 1.7295 1.7435 sec/batch
Epoch 54/300 Iteration 748/4200 Training loss: 1.7287 1.7363 sec/batch
Epoch 54/300 Iteration 749/4200 Training loss: 1.7240 1.7411 sec/batch
Epoch 54/300 Iteration 750/4200 Training loss: 1.7220 1.7372 sec/batch
Epoch 54/300 Iteration 751/4200 Training loss: 1.7200 1.7370 sec/batch
Epoch 54/300 Iteration 752/4200 Training loss: 1.7211 1.7364 sec/batch
Epoch 54/300 Iteration 753/4200 Training loss: 1.7227 1.7356 sec/batch
Epoch 54/300 Iteration 754/4200 Training loss: 1.7222 1.7310 sec/batch
Epoch 54/300 Iteration 755/4200 Training loss: 1.7244 1.7400 sec/batch
Epoch 54/300 Iteration 756/4200 Training loss: 1.7245 1.6887 sec/batch
Epoch 55/300 Iteration 757/4200 Training loss: 1.7533 1.7238 sec/batch
Epoch 55/300 Iteration 758/4200 Training loss: 1.7312 1.7368 sec/batch
Epoch 55/300 Iteration 759/4200 Training loss: 1.7138 1.7359 sec/batch
Epoch 55/300 Iteration 760/4200 Training loss: 1.7134 1.7371 sec/batch
Epoch 55/300 Iteration 761/4200 Training loss: 1.7158 1.7404 sec/batch
Epoch 55/300 Iteration 762/4200 Training loss: 1.7153 1.6947 sec/batch
Epoch 55/300 Iteration 763/4200 Training loss: 1.7098 1.6887 sec/batch
Epoch 55/300 Iteration 764/4200 Training loss: 1.7097 1.6886 sec/batch
Epoch 55/300 Iteration 765/4200 Training loss: 1.7080 1.6878 sec/batch
Epoch 55/300 Iteration 766/4200 Training loss: 1.7094 1.6902 sec/batch
Epoch 55/300 Iteration 767/4200 Training loss: 1.7112 1.7356 sec/batch
Epoch 55/300 Iteration 768/4200 Training loss: 1.7111 1.7598 sec/batch
Epoch 55/300 Iteration 769/4200 Training loss: 1.7131 1.7558 sec/batch
Epoch 55/300 Iteration 770/4200 Training loss: 1.7126 1.7377 sec/batch
Epoch 56/300 Iteration 771/4200 Training loss: 1.7407 1.7391 sec/batch
Epoch 56/300 Iteration 772/4200 Training loss: 1.7134 1.7333 sec/batch
Epoch 56/300 Iteration 773/4200 Training loss: 1.7027 1.7384 sec/batch
Epoch 56/300 Iteration 774/4200 Training loss: 1.7017 1.7371 sec/batch
Epoch 56/300 Iteration 775/4200 Training loss: 1.7032 1.6937 sec/batch
Epoch 56/300 Iteration 776/4200 Training loss: 1.7052 1.7219 sec/batch
Epoch 56/300 Iteration 777/4200 Training loss: 1.7006 1.7372 sec/batch
Epoch 56/300 Iteration 778/4200 Training loss: 1.6998 1.7352 sec/batch
Epoch 56/300 Iteration 779/4200 Training loss: 1.6969 1.7359 sec/batch
Epoch 56/300 Iteration 780/4200 Training loss: 1.6965 1.7372 sec/batch
Epoch 56/300 Iteration 781/4200 Training loss: 1.6989 1.7349 sec/batch
Epoch 56/300 Iteration 782/4200 Training loss: 1.6988 1.7415 sec/batch
Epoch 56/300 Iteration 783/4200 Training loss: 1.7006 1.7366 sec/batch
Epoch 56/300 Iteration 784/4200 Training loss: 1.7005 1.7367 sec/batch
Epoch 57/300 Iteration 785/4200 Training loss: 1.7231 1.7353 sec/batch
Epoch 57/300 Iteration 786/4200 Training loss: 1.6979 1.7338 sec/batch
Epoch 57/300 Iteration 787/4200 Training loss: 1.6871 1.7362 sec/batch
Epoch 57/300 Iteration 788/4200 Training loss: 1.6884 1.7364 sec/batch
Epoch 57/300 Iteration 789/4200 Training loss: 1.6899 1.7356 sec/batch
Epoch 57/300 Iteration 790/4200 Training loss: 1.6912 1.7354 sec/batch
Epoch 57/300 Iteration 791/4200 Training loss: 1.6869 1.7369 sec/batch
Epoch 57/300 Iteration 792/4200 Training loss: 1.6854 1.7363 sec/batch
Epoch 57/300 Iteration 793/4200 Training loss: 1.6841 1.6938 sec/batch
Epoch 57/300 Iteration 794/4200 Training loss: 1.6848 1.6887 sec/batch
Epoch 57/300 Iteration 795/4200 Training loss: 1.6873 1.6870 sec/batch
Epoch 57/300 Iteration 796/4200 Training loss: 1.6882 1.6890 sec/batch
Epoch 57/300 Iteration 797/4200 Training loss: 1.6904 1.6970 sec/batch
Epoch 57/300 Iteration 798/4200 Training loss: 1.6900 1.7393 sec/batch
Epoch 58/300 Iteration 799/4200 Training loss: 1.7321 1.7358 sec/batch
Epoch 58/300 Iteration 800/4200 Training loss: 1.7027 1.7403 sec/batch
Validation loss: 1.6555 Saving checkpoint!
Epoch 58/300 Iteration 801/4200 Training loss: 1.7033 1.6351 sec/batch
Epoch 58/300 Iteration 802/4200 Training loss: 1.6983 1.7346 sec/batch
Epoch 58/300 Iteration 803/4200 Training loss: 1.6968 1.7335 sec/batch
Epoch 58/300 Iteration 804/4200 Training loss: 1.6938 1.7384 sec/batch
Epoch 58/300 Iteration 805/4200 Training loss: 1.6883 1.7355 sec/batch
Epoch 58/300 Iteration 806/4200 Training loss: 1.6851 1.7343 sec/batch
Epoch 58/300 Iteration 807/4200 Training loss: 1.6821 1.7342 sec/batch
Epoch 58/300 Iteration 808/4200 Training loss: 1.6829 1.7361 sec/batch
Epoch 58/300 Iteration 809/4200 Training loss: 1.6848 1.7356 sec/batch
Epoch 58/300 Iteration 810/4200 Training loss: 1.6851 1.7366 sec/batch
Epoch 58/300 Iteration 811/4200 Training loss: 1.6872 1.7335 sec/batch
Epoch 58/300 Iteration 812/4200 Training loss: 1.6866 1.7350 sec/batch
Epoch 59/300 Iteration 813/4200 Training loss: 1.7273 1.7025 sec/batch
Epoch 59/300 Iteration 814/4200 Training loss: 1.6949 1.7372 sec/batch
Epoch 59/300 Iteration 815/4200 Training loss: 1.6792 1.7385 sec/batch
Epoch 59/300 Iteration 816/4200 Training loss: 1.6782 1.7351 sec/batch
Epoch 59/300 Iteration 817/4200 Training loss: 1.6788 1.7340 sec/batch
Epoch 59/300 Iteration 818/4200 Training loss: 1.6779 1.7347 sec/batch
Epoch 59/300 Iteration 819/4200 Training loss: 1.6735 1.7378 sec/batch
Epoch 59/300 Iteration 820/4200 Training loss: 1.6719 1.7347 sec/batch
Epoch 59/300 Iteration 821/4200 Training loss: 1.6693 1.7392 sec/batch
Epoch 59/300 Iteration 822/4200 Training loss: 1.6702 1.7376 sec/batch
Epoch 59/300 Iteration 823/4200 Training loss: 1.6715 1.7379 sec/batch
Epoch 59/300 Iteration 824/4200 Training loss: 1.6723 1.7386 sec/batch
Epoch 59/300 Iteration 825/4200 Training loss: 1.6740 1.7380 sec/batch
Epoch 59/300 Iteration 826/4200 Training loss: 1.6735 1.7373 sec/batch
Epoch 60/300 Iteration 827/4200 Training loss: 1.7106 1.7372 sec/batch
Epoch 60/300 Iteration 828/4200 Training loss: 1.6807 1.7374 sec/batch
Epoch 60/300 Iteration 829/4200 Training loss: 1.6641 1.7394 sec/batch
Epoch 60/300 Iteration 830/4200 Training loss: 1.6617 1.7398 sec/batch
Epoch 60/300 Iteration 831/4200 Training loss: 1.6639 1.7393 sec/batch
Epoch 60/300 Iteration 832/4200 Training loss: 1.6630 1.7383 sec/batch
Epoch 60/300 Iteration 833/4200 Training loss: 1.6575 1.7344 sec/batch
Epoch 60/300 Iteration 834/4200 Training loss: 1.6566 1.7350 sec/batch
Epoch 60/300 Iteration 835/4200 Training loss: 1.6548 1.7384 sec/batch
Epoch 60/300 Iteration 836/4200 Training loss: 1.6553 1.7566 sec/batch
Epoch 60/300 Iteration 837/4200 Training loss: 1.6567 1.7561 sec/batch
Epoch 60/300 Iteration 838/4200 Training loss: 1.6569 1.7357 sec/batch
Epoch 60/300 Iteration 839/4200 Training loss: 1.6582 1.7367 sec/batch
Epoch 60/300 Iteration 840/4200 Training loss: 1.6584 1.7348 sec/batch
Epoch 61/300 Iteration 841/4200 Training loss: 1.6972 1.7392 sec/batch
Epoch 61/300 Iteration 842/4200 Training loss: 1.6606 1.7342 sec/batch
Epoch 61/300 Iteration 843/4200 Training loss: 1.6459 1.7357 sec/batch
Epoch 61/300 Iteration 844/4200 Training loss: 1.6456 1.7353 sec/batch
Epoch 61/300 Iteration 845/4200 Training loss: 1.6479 1.7373 sec/batch
Epoch 61/300 Iteration 846/4200 Training loss: 1.6490 1.7338 sec/batch
Epoch 61/300 Iteration 847/4200 Training loss: 1.6460 1.7354 sec/batch
Epoch 61/300 Iteration 848/4200 Training loss: 1.6450 1.7378 sec/batch
Epoch 61/300 Iteration 849/4200 Training loss: 1.6442 1.7357 sec/batch
Epoch 61/300 Iteration 850/4200 Training loss: 1.6447 1.7431 sec/batch
Epoch 61/300 Iteration 851/4200 Training loss: 1.6468 1.8040 sec/batch
Epoch 61/300 Iteration 852/4200 Training loss: 1.6470 1.7360 sec/batch
Epoch 61/300 Iteration 853/4200 Training loss: 1.6486 1.7345 sec/batch
Epoch 61/300 Iteration 854/4200 Training loss: 1.6480 1.7469 sec/batch
Epoch 62/300 Iteration 855/4200 Training loss: 1.6897 1.7376 sec/batch
Epoch 62/300 Iteration 856/4200 Training loss: 1.6514 1.7338 sec/batch
Epoch 62/300 Iteration 857/4200 Training loss: 1.6384 1.7355 sec/batch
Epoch 62/300 Iteration 858/4200 Training loss: 1.6344 1.7356 sec/batch
Epoch 62/300 Iteration 859/4200 Training loss: 1.6386 1.7365 sec/batch
Epoch 62/300 Iteration 860/4200 Training loss: 1.6378 1.7401 sec/batch
Epoch 62/300 Iteration 861/4200 Training loss: 1.6324 1.7386 sec/batch
Epoch 62/300 Iteration 862/4200 Training loss: 1.6324 1.7364 sec/batch
Epoch 62/300 Iteration 863/4200 Training loss: 1.6307 1.7448 sec/batch
Epoch 62/300 Iteration 864/4200 Training loss: 1.6300 1.7404 sec/batch
Epoch 62/300 Iteration 865/4200 Training loss: 1.6333 1.7364 sec/batch
Epoch 62/300 Iteration 866/4200 Training loss: 1.6346 1.7418 sec/batch
Epoch 62/300 Iteration 867/4200 Training loss: 1.6372 1.7356 sec/batch
Epoch 62/300 Iteration 868/4200 Training loss: 1.6375 1.7357 sec/batch
Epoch 63/300 Iteration 869/4200 Training loss: 1.6784 1.7344 sec/batch
Epoch 63/300 Iteration 870/4200 Training loss: 1.6476 1.7355 sec/batch
Epoch 63/300 Iteration 871/4200 Training loss: 1.6371 1.7356 sec/batch
Epoch 63/300 Iteration 872/4200 Training loss: 1.6357 1.7364 sec/batch
Epoch 63/300 Iteration 873/4200 Training loss: 1.6366 1.7434 sec/batch
Epoch 63/300 Iteration 874/4200 Training loss: 1.6348 1.7355 sec/batch
Epoch 63/300 Iteration 875/4200 Training loss: 1.6304 1.7359 sec/batch
Epoch 63/300 Iteration 876/4200 Training loss: 1.6289 1.7354 sec/batch
Epoch 63/300 Iteration 877/4200 Training loss: 1.6261 1.7437 sec/batch
Epoch 63/300 Iteration 878/4200 Training loss: 1.6254 1.6897 sec/batch
Epoch 63/300 Iteration 879/4200 Training loss: 1.6278 1.6887 sec/batch
Epoch 63/300 Iteration 880/4200 Training loss: 1.6284 1.7397 sec/batch
Epoch 63/300 Iteration 881/4200 Training loss: 1.6291 1.7358 sec/batch
Epoch 63/300 Iteration 882/4200 Training loss: 1.6297 1.7392 sec/batch
Epoch 64/300 Iteration 883/4200 Training loss: 1.6784 1.6903 sec/batch
Epoch 64/300 Iteration 884/4200 Training loss: 1.6421 1.7144 sec/batch
Epoch 64/300 Iteration 885/4200 Training loss: 1.6273 1.7370 sec/batch
Epoch 64/300 Iteration 886/4200 Training loss: 1.6225 1.7362 sec/batch
Epoch 64/300 Iteration 887/4200 Training loss: 1.6241 1.7411 sec/batch
Epoch 64/300 Iteration 888/4200 Training loss: 1.6233 1.7354 sec/batch
Epoch 64/300 Iteration 889/4200 Training loss: 1.6172 1.7361 sec/batch
Epoch 64/300 Iteration 890/4200 Training loss: 1.6164 1.7356 sec/batch
Epoch 64/300 Iteration 891/4200 Training loss: 1.6159 1.7388 sec/batch
Epoch 64/300 Iteration 892/4200 Training loss: 1.6151 1.7396 sec/batch
Epoch 64/300 Iteration 893/4200 Training loss: 1.6176 1.7345 sec/batch
Epoch 64/300 Iteration 894/4200 Training loss: 1.6185 1.7399 sec/batch
Epoch 64/300 Iteration 895/4200 Training loss: 1.6200 1.7367 sec/batch
Epoch 64/300 Iteration 896/4200 Training loss: 1.6209 1.7346 sec/batch
Epoch 65/300 Iteration 897/4200 Training loss: 1.6510 1.7366 sec/batch
Epoch 65/300 Iteration 898/4200 Training loss: 1.6261 1.7401 sec/batch
Epoch 65/300 Iteration 899/4200 Training loss: 1.6085 1.7373 sec/batch
Epoch 65/300 Iteration 900/4200 Training loss: 1.6072 1.7384 sec/batch
Validation loss: 1.62048 Saving checkpoint!
Epoch 65/300 Iteration 901/4200 Training loss: 1.6227 3.2144 sec/batch
Epoch 65/300 Iteration 902/4200 Training loss: 1.6195 1.8242 sec/batch
Epoch 65/300 Iteration 903/4200 Training loss: 1.6135 1.6438 sec/batch
Epoch 65/300 Iteration 904/4200 Training loss: 1.6109 1.6517 sec/batch
Epoch 65/300 Iteration 905/4200 Training loss: 1.6077 1.6881 sec/batch
Epoch 65/300 Iteration 906/4200 Training loss: 1.6079 1.6681 sec/batch
Epoch 65/300 Iteration 907/4200 Training loss: 1.6096 1.6877 sec/batch
Epoch 65/300 Iteration 908/4200 Training loss: 1.6097 1.6903 sec/batch
Epoch 65/300 Iteration 909/4200 Training loss: 1.6108 1.6898 sec/batch
Epoch 65/300 Iteration 910/4200 Training loss: 1.6106 1.6886 sec/batch
Epoch 66/300 Iteration 911/4200 Training loss: 1.6388 1.6890 sec/batch
Epoch 66/300 Iteration 912/4200 Training loss: 1.6088 1.6898 sec/batch
Epoch 66/300 Iteration 913/4200 Training loss: 1.5967 1.6904 sec/batch
Epoch 66/300 Iteration 914/4200 Training loss: 1.5958 1.6878 sec/batch
Epoch 66/300 Iteration 915/4200 Training loss: 1.6001 1.6885 sec/batch
Epoch 66/300 Iteration 916/4200 Training loss: 1.5995 1.7311 sec/batch
Epoch 66/300 Iteration 917/4200 Training loss: 1.5961 1.7383 sec/batch
Epoch 66/300 Iteration 918/4200 Training loss: 1.5941 1.7361 sec/batch
Epoch 66/300 Iteration 919/4200 Training loss: 1.5928 1.7412 sec/batch
Epoch 66/300 Iteration 920/4200 Training loss: 1.5944 1.7361 sec/batch
Epoch 66/300 Iteration 921/4200 Training loss: 1.5958 1.7370 sec/batch
Epoch 66/300 Iteration 922/4200 Training loss: 1.5961 1.7366 sec/batch
Epoch 66/300 Iteration 923/4200 Training loss: 1.5981 1.7376 sec/batch
Epoch 66/300 Iteration 924/4200 Training loss: 1.5992 1.7353 sec/batch
Epoch 67/300 Iteration 925/4200 Training loss: 1.6394 1.7368 sec/batch
Epoch 67/300 Iteration 926/4200 Training loss: 1.6018 1.7369 sec/batch
Epoch 67/300 Iteration 927/4200 Training loss: 1.5853 1.6907 sec/batch
Epoch 67/300 Iteration 928/4200 Training loss: 1.5833 1.6900 sec/batch
Epoch 67/300 Iteration 929/4200 Training loss: 1.5859 1.6922 sec/batch
Epoch 67/300 Iteration 930/4200 Training loss: 1.5852 1.6890 sec/batch
Epoch 67/300 Iteration 931/4200 Training loss: 1.5821 1.6869 sec/batch
Epoch 67/300 Iteration 932/4200 Training loss: 1.5826 1.6903 sec/batch
Epoch 67/300 Iteration 933/4200 Training loss: 1.5815 1.6919 sec/batch
Epoch 67/300 Iteration 934/4200 Training loss: 1.5814 1.6875 sec/batch
Epoch 67/300 Iteration 935/4200 Training loss: 1.5836 1.6889 sec/batch
Epoch 67/300 Iteration 936/4200 Training loss: 1.5847 1.6922 sec/batch
Epoch 67/300 Iteration 937/4200 Training loss: 1.5865 1.7232 sec/batch
Epoch 67/300 Iteration 938/4200 Training loss: 1.5874 1.7352 sec/batch
Epoch 68/300 Iteration 939/4200 Training loss: 1.6341 1.7358 sec/batch
Epoch 68/300 Iteration 940/4200 Training loss: 1.5984 1.7385 sec/batch
Epoch 68/300 Iteration 941/4200 Training loss: 1.5814 1.7390 sec/batch
Epoch 68/300 Iteration 942/4200 Training loss: 1.5794 1.6905 sec/batch
Epoch 68/300 Iteration 943/4200 Training loss: 1.5805 1.6875 sec/batch
Epoch 68/300 Iteration 944/4200 Training loss: 1.5787 1.6983 sec/batch
Epoch 68/300 Iteration 945/4200 Training loss: 1.5745 1.7360 sec/batch
Epoch 68/300 Iteration 946/4200 Training loss: 1.5737 1.7366 sec/batch
Epoch 68/300 Iteration 947/4200 Training loss: 1.5709 1.7364 sec/batch
Epoch 68/300 Iteration 948/4200 Training loss: 1.5710 1.7394 sec/batch
Epoch 68/300 Iteration 949/4200 Training loss: 1.5737 1.7365 sec/batch
Epoch 68/300 Iteration 950/4200 Training loss: 1.5736 1.7366 sec/batch
Epoch 68/300 Iteration 951/4200 Training loss: 1.5758 1.7441 sec/batch
Epoch 68/300 Iteration 952/4200 Training loss: 1.5764 1.7384 sec/batch
Epoch 69/300 Iteration 953/4200 Training loss: 1.6091 1.6915 sec/batch
Epoch 69/300 Iteration 954/4200 Training loss: 1.5825 1.6893 sec/batch
Epoch 69/300 Iteration 955/4200 Training loss: 1.5671 1.6869 sec/batch
Epoch 69/300 Iteration 956/4200 Training loss: 1.5662 1.6890 sec/batch
Epoch 69/300 Iteration 957/4200 Training loss: 1.5687 1.6892 sec/batch
Epoch 69/300 Iteration 958/4200 Training loss: 1.5691 1.6905 sec/batch
Epoch 69/300 Iteration 959/4200 Training loss: 1.5637 1.6917 sec/batch
Epoch 69/300 Iteration 960/4200 Training loss: 1.5625 1.6907 sec/batch
Epoch 69/300 Iteration 961/4200 Training loss: 1.5610 1.6900 sec/batch
Epoch 69/300 Iteration 962/4200 Training loss: 1.5606 1.6905 sec/batch
Epoch 69/300 Iteration 963/4200 Training loss: 1.5634 1.6873 sec/batch
Epoch 69/300 Iteration 964/4200 Training loss: 1.5635 1.6913 sec/batch
Epoch 69/300 Iteration 965/4200 Training loss: 1.5655 1.6902 sec/batch
Epoch 69/300 Iteration 966/4200 Training loss: 1.5653 1.7562 sec/batch
Epoch 70/300 Iteration 967/4200 Training loss: 1.6072 1.6883 sec/batch
Epoch 70/300 Iteration 968/4200 Training loss: 1.5762 1.7076 sec/batch
Epoch 70/300 Iteration 969/4200 Training loss: 1.5562 1.7475 sec/batch
Epoch 70/300 Iteration 970/4200 Training loss: 1.5570 1.7354 sec/batch
Epoch 70/300 Iteration 971/4200 Training loss: 1.5590 1.6892 sec/batch
Epoch 70/300 Iteration 972/4200 Training loss: 1.5568 1.7469 sec/batch
Epoch 70/300 Iteration 973/4200 Training loss: 1.5517 1.7504 sec/batch
Epoch 70/300 Iteration 974/4200 Training loss: 1.5511 1.7772 sec/batch
Epoch 70/300 Iteration 975/4200 Training loss: 1.5489 1.7363 sec/batch
Epoch 70/300 Iteration 976/4200 Training loss: 1.5493 1.7356 sec/batch
Epoch 70/300 Iteration 977/4200 Training loss: 1.5518 1.7465 sec/batch
Epoch 70/300 Iteration 978/4200 Training loss: 1.5515 1.7409 sec/batch
Epoch 70/300 Iteration 979/4200 Training loss: 1.5536 1.8032 sec/batch
Epoch 70/300 Iteration 980/4200 Training loss: 1.5539 1.7402 sec/batch
Epoch 71/300 Iteration 981/4200 Training loss: 1.5998 1.7371 sec/batch
Epoch 71/300 Iteration 982/4200 Training loss: 1.5693 1.7951 sec/batch
Epoch 71/300 Iteration 983/4200 Training loss: 1.5503 1.9045 sec/batch
Epoch 71/300 Iteration 984/4200 Training loss: 1.5471 1.9655 sec/batch
Epoch 71/300 Iteration 985/4200 Training loss: 1.5489 1.8267 sec/batch
Epoch 71/300 Iteration 986/4200 Training loss: 1.5484 1.8190 sec/batch
Epoch 71/300 Iteration 987/4200 Training loss: 1.5428 1.8450 sec/batch
Epoch 71/300 Iteration 988/4200 Training loss: 1.5420 1.8450 sec/batch
Epoch 71/300 Iteration 989/4200 Training loss: 1.5401 1.8076 sec/batch
Epoch 71/300 Iteration 990/4200 Training loss: 1.5421 1.8205 sec/batch
Epoch 71/300 Iteration 991/4200 Training loss: 1.5448 1.8149 sec/batch
Epoch 71/300 Iteration 992/4200 Training loss: 1.5449 1.8713 sec/batch
Epoch 71/300 Iteration 993/4200 Training loss: 1.5466 1.8116 sec/batch
Epoch 71/300 Iteration 994/4200 Training loss: 1.5466 1.7976 sec/batch
Epoch 72/300 Iteration 995/4200 Training loss: 1.5734 1.7426 sec/batch
Epoch 72/300 Iteration 996/4200 Training loss: 1.5456 1.7367 sec/batch
Epoch 72/300 Iteration 997/4200 Training loss: 1.5307 1.7373 sec/batch
Epoch 72/300 Iteration 998/4200 Training loss: 1.5287 1.7422 sec/batch
Epoch 72/300 Iteration 999/4200 Training loss: 1.5312 1.7380 sec/batch
Epoch 72/300 Iteration 1000/4200 Training loss: 1.5320 1.7378 sec/batch
Validation loss: 1.59507 Saving checkpoint!
Epoch 72/300 Iteration 1001/4200 Training loss: 1.5395 1.6373 sec/batch
Epoch 72/300 Iteration 1002/4200 Training loss: 1.5381 1.7372 sec/batch
Epoch 72/300 Iteration 1003/4200 Training loss: 1.5355 1.7394 sec/batch
Epoch 72/300 Iteration 1004/4200 Training loss: 1.5362 1.6913 sec/batch
Epoch 72/300 Iteration 1005/4200 Training loss: 1.5385 1.7036 sec/batch
Epoch 72/300 Iteration 1006/4200 Training loss: 1.5387 1.7409 sec/batch
Epoch 72/300 Iteration 1007/4200 Training loss: 1.5396 1.7419 sec/batch
Epoch 72/300 Iteration 1008/4200 Training loss: 1.5392 1.7378 sec/batch
Epoch 73/300 Iteration 1009/4200 Training loss: 1.5657 1.7395 sec/batch
Epoch 73/300 Iteration 1010/4200 Training loss: 1.5392 1.7376 sec/batch
Epoch 73/300 Iteration 1011/4200 Training loss: 1.5260 1.7369 sec/batch
Epoch 73/300 Iteration 1012/4200 Training loss: 1.5266 1.7374 sec/batch
Epoch 73/300 Iteration 1013/4200 Training loss: 1.5281 1.7374 sec/batch
Epoch 73/300 Iteration 1014/4200 Training loss: 1.5255 1.7389 sec/batch
Epoch 73/300 Iteration 1015/4200 Training loss: 1.5221 1.7365 sec/batch
Epoch 73/300 Iteration 1016/4200 Training loss: 1.5213 1.7385 sec/batch
Epoch 73/300 Iteration 1017/4200 Training loss: 1.5201 1.7393 sec/batch
Epoch 73/300 Iteration 1018/4200 Training loss: 1.5208 1.7431 sec/batch
Epoch 73/300 Iteration 1019/4200 Training loss: 1.5243 1.7381 sec/batch
Epoch 73/300 Iteration 1020/4200 Training loss: 1.5239 1.6957 sec/batch
Epoch 73/300 Iteration 1021/4200 Training loss: 1.5257 1.7392 sec/batch
Epoch 73/300 Iteration 1022/4200 Training loss: 1.5268 1.6893 sec/batch
Epoch 74/300 Iteration 1023/4200 Training loss: 1.5597 1.7009 sec/batch
Epoch 74/300 Iteration 1024/4200 Training loss: 1.5353 1.7455 sec/batch
Epoch 74/300 Iteration 1025/4200 Training loss: 1.5177 1.7680 sec/batch
Epoch 74/300 Iteration 1026/4200 Training loss: 1.5160 1.7458 sec/batch
Epoch 74/300 Iteration 1027/4200 Training loss: 1.5192 1.7373 sec/batch
Epoch 74/300 Iteration 1028/4200 Training loss: 1.5181 1.7431 sec/batch
Epoch 74/300 Iteration 1029/4200 Training loss: 1.5131 1.8138 sec/batch
Epoch 74/300 Iteration 1030/4200 Training loss: 1.5129 1.7583 sec/batch
Epoch 74/300 Iteration 1031/4200 Training loss: 1.5120 1.7431 sec/batch
Epoch 74/300 Iteration 1032/4200 Training loss: 1.5124 1.7707 sec/batch
Epoch 74/300 Iteration 1033/4200 Training loss: 1.5144 1.7579 sec/batch
Epoch 74/300 Iteration 1034/4200 Training loss: 1.5143 1.7623 sec/batch
Epoch 74/300 Iteration 1035/4200 Training loss: 1.5169 1.7966 sec/batch
Epoch 74/300 Iteration 1036/4200 Training loss: 1.5170 1.7382 sec/batch
Epoch 75/300 Iteration 1037/4200 Training loss: 1.5514 1.7384 sec/batch
Epoch 75/300 Iteration 1038/4200 Training loss: 1.5217 1.6964 sec/batch
Epoch 75/300 Iteration 1039/4200 Training loss: 1.5084 1.7061 sec/batch
Epoch 75/300 Iteration 1040/4200 Training loss: 1.5074 1.7035 sec/batch
Epoch 75/300 Iteration 1041/4200 Training loss: 1.5084 1.6925 sec/batch
Epoch 75/300 Iteration 1042/4200 Training loss: 1.5087 1.6909 sec/batch
Epoch 75/300 Iteration 1043/4200 Training loss: 1.5043 1.6890 sec/batch
Epoch 75/300 Iteration 1044/4200 Training loss: 1.5036 1.6915 sec/batch
Epoch 75/300 Iteration 1045/4200 Training loss: 1.5012 1.6917 sec/batch
Epoch 75/300 Iteration 1046/4200 Training loss: 1.5019 1.7471 sec/batch
Epoch 75/300 Iteration 1047/4200 Training loss: 1.5044 1.7051 sec/batch
Epoch 75/300 Iteration 1048/4200 Training loss: 1.5041 1.7109 sec/batch
Epoch 75/300 Iteration 1049/4200 Training loss: 1.5060 1.7416 sec/batch
Epoch 75/300 Iteration 1050/4200 Training loss: 1.5059 1.6949 sec/batch
Epoch 76/300 Iteration 1051/4200 Training loss: 1.5540 1.6915 sec/batch
Epoch 76/300 Iteration 1052/4200 Training loss: 1.5188 1.6928 sec/batch
Epoch 76/300 Iteration 1053/4200 Training loss: 1.5003 1.6939 sec/batch
Epoch 76/300 Iteration 1054/4200 Training loss: 1.4971 1.6879 sec/batch
Epoch 76/300 Iteration 1055/4200 Training loss: 1.4985 1.6931 sec/batch
Epoch 76/300 Iteration 1056/4200 Training loss: 1.4959 1.7423 sec/batch
Epoch 76/300 Iteration 1057/4200 Training loss: 1.4925 1.7439 sec/batch
Epoch 76/300 Iteration 1058/4200 Training loss: 1.4907 1.7391 sec/batch
Epoch 76/300 Iteration 1059/4200 Training loss: 1.4901 1.7421 sec/batch
Epoch 76/300 Iteration 1060/4200 Training loss: 1.4918 1.7381 sec/batch
Epoch 76/300 Iteration 1061/4200 Training loss: 1.4950 1.6929 sec/batch
Epoch 76/300 Iteration 1062/4200 Training loss: 1.4951 1.6909 sec/batch
Epoch 76/300 Iteration 1063/4200 Training loss: 1.4975 1.6909 sec/batch
Epoch 76/300 Iteration 1064/4200 Training loss: 1.4972 1.6886 sec/batch
Epoch 77/300 Iteration 1065/4200 Training loss: 1.5346 1.6936 sec/batch
Epoch 77/300 Iteration 1066/4200 Training loss: 1.4987 1.6916 sec/batch
Epoch 77/300 Iteration 1067/4200 Training loss: 1.4826 1.6878 sec/batch
Epoch 77/300 Iteration 1068/4200 Training loss: 1.4835 1.6888 sec/batch
Epoch 77/300 Iteration 1069/4200 Training loss: 1.4859 1.7261 sec/batch
Epoch 77/300 Iteration 1070/4200 Training loss: 1.4869 1.7383 sec/batch
Epoch 77/300 Iteration 1071/4200 Training loss: 1.4817 1.7374 sec/batch
Epoch 77/300 Iteration 1072/4200 Training loss: 1.4809 1.7425 sec/batch
Epoch 77/300 Iteration 1073/4200 Training loss: 1.4799 1.7405 sec/batch
Epoch 77/300 Iteration 1074/4200 Training loss: 1.4810 1.7371 sec/batch
Epoch 77/300 Iteration 1075/4200 Training loss: 1.4838 1.7372 sec/batch
Epoch 77/300 Iteration 1076/4200 Training loss: 1.4835 1.7375 sec/batch
Epoch 77/300 Iteration 1077/4200 Training loss: 1.4863 1.7413 sec/batch
Epoch 77/300 Iteration 1078/4200 Training loss: 1.4875 1.7383 sec/batch
Epoch 78/300 Iteration 1079/4200 Training loss: 1.5264 1.7391 sec/batch
Epoch 78/300 Iteration 1080/4200 Training loss: 1.4967 1.7430 sec/batch
Epoch 78/300 Iteration 1081/4200 Training loss: 1.4810 1.7057 sec/batch
Epoch 78/300 Iteration 1082/4200 Training loss: 1.4775 1.7384 sec/batch
Epoch 78/300 Iteration 1083/4200 Training loss: 1.4782 1.7383 sec/batch
Epoch 78/300 Iteration 1084/4200 Training loss: 1.4769 1.7388 sec/batch
Epoch 78/300 Iteration 1085/4200 Training loss: 1.4715 1.7379 sec/batch
Epoch 78/300 Iteration 1086/4200 Training loss: 1.4708 1.7393 sec/batch
Epoch 78/300 Iteration 1087/4200 Training loss: 1.4696 1.6899 sec/batch
Epoch 78/300 Iteration 1088/4200 Training loss: 1.4720 1.7098 sec/batch
Epoch 78/300 Iteration 1089/4200 Training loss: 1.4754 1.7404 sec/batch
Epoch 78/300 Iteration 1090/4200 Training loss: 1.4742 1.7406 sec/batch
Epoch 78/300 Iteration 1091/4200 Training loss: 1.4761 1.7759 sec/batch
Epoch 78/300 Iteration 1092/4200 Training loss: 1.4768 1.7371 sec/batch
Epoch 79/300 Iteration 1093/4200 Training loss: 1.5119 1.7482 sec/batch
Epoch 79/300 Iteration 1094/4200 Training loss: 1.4781 1.7368 sec/batch
Epoch 79/300 Iteration 1095/4200 Training loss: 1.4624 1.7437 sec/batch
Epoch 79/300 Iteration 1096/4200 Training loss: 1.4603 1.6918 sec/batch
Epoch 79/300 Iteration 1097/4200 Training loss: 1.4633 1.6945 sec/batch
Epoch 79/300 Iteration 1098/4200 Training loss: 1.4616 1.6888 sec/batch
Epoch 79/300 Iteration 1099/4200 Training loss: 1.4576 1.6897 sec/batch
Epoch 79/300 Iteration 1100/4200 Training loss: 1.4563 1.6909 sec/batch
Validation loss: 1.58708 Saving checkpoint!
Epoch 79/300 Iteration 1101/4200 Training loss: 1.4660 1.6307 sec/batch
Epoch 79/300 Iteration 1102/4200 Training loss: 1.4668 1.7401 sec/batch
Epoch 79/300 Iteration 1103/4200 Training loss: 1.4704 1.7385 sec/batch
Epoch 79/300 Iteration 1104/4200 Training loss: 1.4686 1.7375 sec/batch
Epoch 79/300 Iteration 1105/4200 Training loss: 1.4707 1.7369 sec/batch
Epoch 79/300 Iteration 1106/4200 Training loss: 1.4718 1.7367 sec/batch
Epoch 80/300 Iteration 1107/4200 Training loss: 1.4996 1.7630 sec/batch
Epoch 80/300 Iteration 1108/4200 Training loss: 1.4713 1.7527 sec/batch
Epoch 80/300 Iteration 1109/4200 Training loss: 1.4572 1.7389 sec/batch
Epoch 80/300 Iteration 1110/4200 Training loss: 1.4572 1.7391 sec/batch
Epoch 80/300 Iteration 1111/4200 Training loss: 1.4589 1.7376 sec/batch
Epoch 80/300 Iteration 1112/4200 Training loss: 1.4567 1.6910 sec/batch
Epoch 80/300 Iteration 1113/4200 Training loss: 1.4513 1.6941 sec/batch
Epoch 80/300 Iteration 1114/4200 Training loss: 1.4506 1.6924 sec/batch
Epoch 80/300 Iteration 1115/4200 Training loss: 1.4507 1.6934 sec/batch
Epoch 80/300 Iteration 1116/4200 Training loss: 1.4514 1.6897 sec/batch
Epoch 80/300 Iteration 1117/4200 Training loss: 1.4542 1.6945 sec/batch
Epoch 80/300 Iteration 1118/4200 Training loss: 1.4542 1.6900 sec/batch
Epoch 80/300 Iteration 1119/4200 Training loss: 1.4562 1.6890 sec/batch
Epoch 80/300 Iteration 1120/4200 Training loss: 1.4573 1.6911 sec/batch
Epoch 81/300 Iteration 1121/4200 Training loss: 1.4996 1.6926 sec/batch
Epoch 81/300 Iteration 1122/4200 Training loss: 1.4685 1.6919 sec/batch
Epoch 81/300 Iteration 1123/4200 Training loss: 1.4509 1.6898 sec/batch
Epoch 81/300 Iteration 1124/4200 Training loss: 1.4493 1.7424 sec/batch
Epoch 81/300 Iteration 1125/4200 Training loss: 1.4490 1.7383 sec/batch
Epoch 81/300 Iteration 1126/4200 Training loss: 1.4488 1.7389 sec/batch
Epoch 81/300 Iteration 1127/4200 Training loss: 1.4442 1.7356 sec/batch
Epoch 81/300 Iteration 1128/4200 Training loss: 1.4421 1.7373 sec/batch
Epoch 81/300 Iteration 1129/4200 Training loss: 1.4406 1.7385 sec/batch
Epoch 81/300 Iteration 1130/4200 Training loss: 1.4428 1.7368 sec/batch
Epoch 81/300 Iteration 1131/4200 Training loss: 1.4467 1.7435 sec/batch
Epoch 81/300 Iteration 1132/4200 Training loss: 1.4467 1.7384 sec/batch
Epoch 81/300 Iteration 1133/4200 Training loss: 1.4492 1.7377 sec/batch
Epoch 81/300 Iteration 1134/4200 Training loss: 1.4506 1.7381 sec/batch
Epoch 82/300 Iteration 1135/4200 Training loss: 1.4853 1.7387 sec/batch
Epoch 82/300 Iteration 1136/4200 Training loss: 1.4521 1.7381 sec/batch
Epoch 82/300 Iteration 1137/4200 Training loss: 1.4391 1.7395 sec/batch
Epoch 82/300 Iteration 1138/4200 Training loss: 1.4374 1.7434 sec/batch
Epoch 82/300 Iteration 1139/4200 Training loss: 1.4372 1.7408 sec/batch
Epoch 82/300 Iteration 1140/4200 Training loss: 1.4363 1.7392 sec/batch
Epoch 82/300 Iteration 1141/4200 Training loss: 1.4319 1.7388 sec/batch
Epoch 82/300 Iteration 1142/4200 Training loss: 1.4315 1.7417 sec/batch
Epoch 82/300 Iteration 1143/4200 Training loss: 1.4322 1.7375 sec/batch
Epoch 82/300 Iteration 1144/4200 Training loss: 1.4316 1.7357 sec/batch
Epoch 82/300 Iteration 1145/4200 Training loss: 1.4346 1.7405 sec/batch
Epoch 82/300 Iteration 1146/4200 Training loss: 1.4344 1.7388 sec/batch
Epoch 82/300 Iteration 1147/4200 Training loss: 1.4361 1.7410 sec/batch
Epoch 82/300 Iteration 1148/4200 Training loss: 1.4368 1.7389 sec/batch
Epoch 83/300 Iteration 1149/4200 Training loss: 1.4767 1.7403 sec/batch
Epoch 83/300 Iteration 1150/4200 Training loss: 1.4467 1.7384 sec/batch
Epoch 83/300 Iteration 1151/4200 Training loss: 1.4308 1.7378 sec/batch
Epoch 83/300 Iteration 1152/4200 Training loss: 1.4297 1.7377 sec/batch
Epoch 83/300 Iteration 1153/4200 Training loss: 1.4289 1.7400 sec/batch
Epoch 83/300 Iteration 1154/4200 Training loss: 1.4272 1.7414 sec/batch
Epoch 83/300 Iteration 1155/4200 Training loss: 1.4243 1.7379 sec/batch
Epoch 83/300 Iteration 1156/4200 Training loss: 1.4229 1.7457 sec/batch
Epoch 83/300 Iteration 1157/4200 Training loss: 1.4213 1.6925 sec/batch
Epoch 83/300 Iteration 1158/4200 Training loss: 1.4221 1.7123 sec/batch
Epoch 83/300 Iteration 1159/4200 Training loss: 1.4253 1.7378 sec/batch
Epoch 83/300 Iteration 1160/4200 Training loss: 1.4247 1.7367 sec/batch
Epoch 83/300 Iteration 1161/4200 Training loss: 1.4271 1.7367 sec/batch
Epoch 83/300 Iteration 1162/4200 Training loss: 1.4279 1.7416 sec/batch
Epoch 84/300 Iteration 1163/4200 Training loss: 1.4615 1.7362 sec/batch
Epoch 84/300 Iteration 1164/4200 Training loss: 1.4311 1.7395 sec/batch
Epoch 84/300 Iteration 1165/4200 Training loss: 1.4137 1.7409 sec/batch
Epoch 84/300 Iteration 1166/4200 Training loss: 1.4132 1.7378 sec/batch
Epoch 84/300 Iteration 1167/4200 Training loss: 1.4175 1.7392 sec/batch
Epoch 84/300 Iteration 1168/4200 Training loss: 1.4175 1.7418 sec/batch
Epoch 84/300 Iteration 1169/4200 Training loss: 1.4148 1.7397 sec/batch
Epoch 84/300 Iteration 1170/4200 Training loss: 1.4138 1.7402 sec/batch
Epoch 84/300 Iteration 1171/4200 Training loss: 1.4136 1.7383 sec/batch
Epoch 84/300 Iteration 1172/4200 Training loss: 1.4150 1.6899 sec/batch
Epoch 84/300 Iteration 1173/4200 Training loss: 1.4177 1.7036 sec/batch
Epoch 84/300 Iteration 1174/4200 Training loss: 1.4172 1.7387 sec/batch
Epoch 84/300 Iteration 1175/4200 Training loss: 1.4193 1.7384 sec/batch
Epoch 84/300 Iteration 1176/4200 Training loss: 1.4200 1.7570 sec/batch
Epoch 85/300 Iteration 1177/4200 Training loss: 1.4642 1.7062 sec/batch
Epoch 85/300 Iteration 1178/4200 Training loss: 1.4255 1.7149 sec/batch
Epoch 85/300 Iteration 1179/4200 Training loss: 1.4099 1.7379 sec/batch
Epoch 85/300 Iteration 1180/4200 Training loss: 1.4095 1.7371 sec/batch
Epoch 85/300 Iteration 1181/4200 Training loss: 1.4123 1.6956 sec/batch
Epoch 85/300 Iteration 1182/4200 Training loss: 1.4110 1.7169 sec/batch
Epoch 85/300 Iteration 1183/4200 Training loss: 1.4071 1.7470 sec/batch
Epoch 85/300 Iteration 1184/4200 Training loss: 1.4073 1.7377 sec/batch
Epoch 85/300 Iteration 1185/4200 Training loss: 1.4064 1.6940 sec/batch
Epoch 85/300 Iteration 1186/4200 Training loss: 1.4071 1.6924 sec/batch
Epoch 85/300 Iteration 1187/4200 Training loss: 1.4082 1.6915 sec/batch
Epoch 85/300 Iteration 1188/4200 Training loss: 1.4084 1.6882 sec/batch
Epoch 85/300 Iteration 1189/4200 Training loss: 1.4106 1.7112 sec/batch
Epoch 85/300 Iteration 1190/4200 Training loss: 1.4109 1.7373 sec/batch
Epoch 86/300 Iteration 1191/4200 Training loss: 1.4505 1.7423 sec/batch
Epoch 86/300 Iteration 1192/4200 Training loss: 1.4145 1.7363 sec/batch
Epoch 86/300 Iteration 1193/4200 Training loss: 1.4022 1.7364 sec/batch
Epoch 86/300 Iteration 1194/4200 Training loss: 1.3996 1.7362 sec/batch
Epoch 86/300 Iteration 1195/4200 Training loss: 1.4011 1.7380 sec/batch
Epoch 86/300 Iteration 1196/4200 Training loss: 1.3998 1.7413 sec/batch
Epoch 86/300 Iteration 1197/4200 Training loss: 1.3963 1.7362 sec/batch
Epoch 86/300 Iteration 1198/4200 Training loss: 1.3957 1.7369 sec/batch
Epoch 86/300 Iteration 1199/4200 Training loss: 1.3949 1.7404 sec/batch
Epoch 86/300 Iteration 1200/4200 Training loss: 1.3961 1.7418 sec/batch
Validation loss: 1.60528 Saving checkpoint!
Epoch 86/300 Iteration 1201/4200 Training loss: 1.4084 3.1758 sec/batch
Epoch 86/300 Iteration 1202/4200 Training loss: 1.4064 1.8521 sec/batch
Epoch 86/300 Iteration 1203/4200 Training loss: 1.4074 1.6344 sec/batch
Epoch 86/300 Iteration 1204/4200 Training loss: 1.4073 1.7043 sec/batch
Epoch 87/300 Iteration 1205/4200 Training loss: 1.4348 1.6809 sec/batch
Epoch 87/300 Iteration 1206/4200 Training loss: 1.3971 1.6892 sec/batch
Epoch 87/300 Iteration 1207/4200 Training loss: 1.3840 1.7032 sec/batch
Epoch 87/300 Iteration 1208/4200 Training loss: 1.3835 1.6912 sec/batch
Epoch 87/300 Iteration 1209/4200 Training loss: 1.3852 1.6910 sec/batch
Epoch 87/300 Iteration 1210/4200 Training loss: 1.3855 1.6899 sec/batch
Epoch 87/300 Iteration 1211/4200 Training loss: 1.3811 1.6969 sec/batch
Epoch 87/300 Iteration 1212/4200 Training loss: 1.3802 1.7381 sec/batch
Epoch 87/300 Iteration 1213/4200 Training loss: 1.3798 1.7886 sec/batch
Epoch 87/300 Iteration 1214/4200 Training loss: 1.3812 1.7414 sec/batch
Epoch 87/300 Iteration 1215/4200 Training loss: 1.3842 1.7374 sec/batch
Epoch 87/300 Iteration 1216/4200 Training loss: 1.3846 1.7418 sec/batch
Epoch 87/300 Iteration 1217/4200 Training loss: 1.3863 2.0230 sec/batch
Epoch 87/300 Iteration 1218/4200 Training loss: 1.3882 2.0474 sec/batch
Epoch 88/300 Iteration 1219/4200 Training loss: 1.4299 1.9958 sec/batch
Epoch 88/300 Iteration 1220/4200 Training loss: 1.3959 2.0384 sec/batch
Epoch 88/300 Iteration 1221/4200 Training loss: 1.3802 2.0382 sec/batch
Epoch 88/300 Iteration 1222/4200 Training loss: 1.3795 1.8432 sec/batch
Epoch 88/300 Iteration 1223/4200 Training loss: 1.3817 1.7900 sec/batch
Epoch 88/300 Iteration 1224/4200 Training loss: 1.3808 1.7424 sec/batch
Epoch 88/300 Iteration 1225/4200 Training loss: 1.3754 1.7417 sec/batch
Epoch 88/300 Iteration 1226/4200 Training loss: 1.3743 1.7848 sec/batch
Epoch 88/300 Iteration 1227/4200 Training loss: 1.3746 1.7669 sec/batch
Epoch 88/300 Iteration 1228/4200 Training loss: 1.3749 1.7384 sec/batch
Epoch 88/300 Iteration 1229/4200 Training loss: 1.3759 1.7506 sec/batch
Epoch 88/300 Iteration 1230/4200 Training loss: 1.3763 1.7384 sec/batch
Epoch 88/300 Iteration 1231/4200 Training loss: 1.3786 1.7447 sec/batch
Epoch 88/300 Iteration 1232/4200 Training loss: 1.3792 1.7392 sec/batch
Epoch 89/300 Iteration 1233/4200 Training loss: 1.4355 1.7412 sec/batch
Epoch 89/300 Iteration 1234/4200 Training loss: 1.3967 1.7380 sec/batch
Epoch 89/300 Iteration 1235/4200 Training loss: 1.3791 1.7375 sec/batch
Epoch 89/300 Iteration 1236/4200 Training loss: 1.3770 1.7415 sec/batch
Epoch 89/300 Iteration 1237/4200 Training loss: 1.3760 1.7383 sec/batch
Epoch 89/300 Iteration 1238/4200 Training loss: 1.3729 1.7394 sec/batch
Epoch 89/300 Iteration 1239/4200 Training loss: 1.3686 1.7417 sec/batch
Epoch 89/300 Iteration 1240/4200 Training loss: 1.3673 1.7393 sec/batch
Epoch 89/300 Iteration 1241/4200 Training loss: 1.3658 1.7463 sec/batch
Epoch 89/300 Iteration 1242/4200 Training loss: 1.3667 1.7607 sec/batch
Epoch 89/300 Iteration 1243/4200 Training loss: 1.3676 1.7517 sec/batch
Epoch 89/300 Iteration 1244/4200 Training loss: 1.3675 1.7378 sec/batch
Epoch 89/300 Iteration 1245/4200 Training loss: 1.3681 1.7396 sec/batch
Epoch 89/300 Iteration 1246/4200 Training loss: 1.3683 1.7394 sec/batch
Epoch 90/300 Iteration 1247/4200 Training loss: 1.4032 1.6945 sec/batch
Epoch 90/300 Iteration 1248/4200 Training loss: 1.3710 1.7258 sec/batch
Epoch 90/300 Iteration 1249/4200 Training loss: 1.3558 1.6956 sec/batch
Epoch 90/300 Iteration 1250/4200 Training loss: 1.3557 1.6910 sec/batch
Epoch 90/300 Iteration 1251/4200 Training loss: 1.3572 1.6932 sec/batch
Epoch 90/300 Iteration 1252/4200 Training loss: 1.3546 1.6902 sec/batch
Epoch 90/300 Iteration 1253/4200 Training loss: 1.3500 1.7387 sec/batch
Epoch 90/300 Iteration 1254/4200 Training loss: 1.3506 1.7361 sec/batch
Epoch 90/300 Iteration 1255/4200 Training loss: 1.3502 1.7444 sec/batch
Epoch 90/300 Iteration 1256/4200 Training loss: 1.3511 1.7444 sec/batch
Epoch 90/300 Iteration 1257/4200 Training loss: 1.3535 1.7420 sec/batch
Epoch 90/300 Iteration 1258/4200 Training loss: 1.3536 1.7379 sec/batch
Epoch 90/300 Iteration 1259/4200 Training loss: 1.3572 1.7380 sec/batch
Epoch 90/300 Iteration 1260/4200 Training loss: 1.3581 1.7378 sec/batch
Epoch 91/300 Iteration 1261/4200 Training loss: 1.3972 1.7374 sec/batch
Epoch 91/300 Iteration 1262/4200 Training loss: 1.3586 1.7376 sec/batch
Epoch 91/300 Iteration 1263/4200 Training loss: 1.3470 1.7384 sec/batch
Epoch 91/300 Iteration 1264/4200 Training loss: 1.3479 1.7376 sec/batch
Epoch 91/300 Iteration 1265/4200 Training loss: 1.3498 1.7361 sec/batch
Epoch 91/300 Iteration 1266/4200 Training loss: 1.3503 1.7363 sec/batch
Epoch 91/300 Iteration 1267/4200 Training loss: 1.3473 1.8104 sec/batch
Epoch 91/300 Iteration 1268/4200 Training loss: 1.3462 1.7957 sec/batch
Epoch 91/300 Iteration 1269/4200 Training loss: 1.3438 1.7515 sec/batch
Epoch 91/300 Iteration 1270/4200 Training loss: 1.3452 1.7386 sec/batch
Epoch 91/300 Iteration 1271/4200 Training loss: 1.3489 1.7421 sec/batch
Epoch 91/300 Iteration 1272/4200 Training loss: 1.3495 1.7412 sec/batch
Epoch 91/300 Iteration 1273/4200 Training loss: 1.3522 1.7384 sec/batch
Epoch 91/300 Iteration 1274/4200 Training loss: 1.3540 1.7380 sec/batch
Epoch 92/300 Iteration 1275/4200 Training loss: 1.4000 1.6958 sec/batch
Epoch 92/300 Iteration 1276/4200 Training loss: 1.3585 1.6943 sec/batch
Epoch 92/300 Iteration 1277/4200 Training loss: 1.3473 1.6922 sec/batch
Epoch 92/300 Iteration 1278/4200 Training loss: 1.3490 1.6918 sec/batch
Epoch 92/300 Iteration 1279/4200 Training loss: 1.3491 1.6985 sec/batch
Epoch 92/300 Iteration 1280/4200 Training loss: 1.3457 1.6919 sec/batch
Epoch 92/300 Iteration 1281/4200 Training loss: 1.3418 1.6908 sec/batch
Epoch 92/300 Iteration 1282/4200 Training loss: 1.3408 1.6934 sec/batch
Epoch 92/300 Iteration 1283/4200 Training loss: 1.3411 1.6921 sec/batch
Epoch 92/300 Iteration 1284/4200 Training loss: 1.3421 1.7019 sec/batch
Epoch 92/300 Iteration 1285/4200 Training loss: 1.3445 1.7014 sec/batch
Epoch 92/300 Iteration 1286/4200 Training loss: 1.3436 1.6950 sec/batch
Epoch 92/300 Iteration 1287/4200 Training loss: 1.3454 1.6913 sec/batch
Epoch 92/300 Iteration 1288/4200 Training loss: 1.3461 1.6894 sec/batch
Epoch 93/300 Iteration 1289/4200 Training loss: 1.3895 1.6909 sec/batch
Epoch 93/300 Iteration 1290/4200 Training loss: 1.3545 1.6906 sec/batch
Epoch 93/300 Iteration 1291/4200 Training loss: 1.3386 1.6952 sec/batch
Epoch 93/300 Iteration 1292/4200 Training loss: 1.3376 1.6896 sec/batch
Epoch 93/300 Iteration 1293/4200 Training loss: 1.3385 1.6909 sec/batch
Epoch 93/300 Iteration 1294/4200 Training loss: 1.3359 1.6911 sec/batch
Epoch 93/300 Iteration 1295/4200 Training loss: 1.3297 1.6938 sec/batch
Epoch 93/300 Iteration 1296/4200 Training loss: 1.3281 1.6904 sec/batch
Epoch 93/300 Iteration 1297/4200 Training loss: 1.3282 1.6913 sec/batch
Epoch 93/300 Iteration 1298/4200 Training loss: 1.3301 1.6906 sec/batch
Epoch 93/300 Iteration 1299/4200 Training loss: 1.3327 1.6919 sec/batch
Epoch 93/300 Iteration 1300/4200 Training loss: 1.3324 1.6957 sec/batch
Validation loss: 1.62403 Saving checkpoint!
Epoch 93/300 Iteration 1301/4200 Training loss: 1.3434 3.1652 sec/batch
Epoch 93/300 Iteration 1302/4200 Training loss: 1.3443 1.8551 sec/batch
Epoch 94/300 Iteration 1303/4200 Training loss: 1.3797 1.6492 sec/batch
Epoch 94/300 Iteration 1304/4200 Training loss: 1.3425 1.6623 sec/batch
Epoch 94/300 Iteration 1305/4200 Training loss: 1.3284 1.6924 sec/batch
Epoch 94/300 Iteration 1306/4200 Training loss: 1.3284 1.6895 sec/batch
Epoch 94/300 Iteration 1307/4200 Training loss: 1.3297 1.6918 sec/batch
Epoch 94/300 Iteration 1308/4200 Training loss: 1.3287 1.6909 sec/batch
Epoch 94/300 Iteration 1309/4200 Training loss: 1.3218 1.7004 sec/batch
Epoch 94/300 Iteration 1310/4200 Training loss: 1.3203 1.7052 sec/batch
Epoch 94/300 Iteration 1311/4200 Training loss: 1.3184 1.6913 sec/batch
Epoch 94/300 Iteration 1312/4200 Training loss: 1.3207 1.6927 sec/batch
Epoch 94/300 Iteration 1313/4200 Training loss: 1.3249 1.6915 sec/batch
Epoch 94/300 Iteration 1314/4200 Training loss: 1.3241 1.7239 sec/batch
Epoch 94/300 Iteration 1315/4200 Training loss: 1.3259 1.6922 sec/batch
Epoch 94/300 Iteration 1316/4200 Training loss: 1.3262 1.7095 sec/batch
Epoch 95/300 Iteration 1317/4200 Training loss: 1.3434 1.8021 sec/batch
Epoch 95/300 Iteration 1318/4200 Training loss: 1.3194 1.7001 sec/batch
Epoch 95/300 Iteration 1319/4200 Training loss: 1.3060 1.7436 sec/batch
Epoch 95/300 Iteration 1320/4200 Training loss: 1.3091 1.7507 sec/batch
Epoch 95/300 Iteration 1321/4200 Training loss: 1.3090 1.7381 sec/batch
Epoch 95/300 Iteration 1322/4200 Training loss: 1.3101 1.8120 sec/batch
Epoch 95/300 Iteration 1323/4200 Training loss: 1.3074 1.7431 sec/batch
Epoch 95/300 Iteration 1324/4200 Training loss: 1.3055 1.7387 sec/batch
Epoch 95/300 Iteration 1325/4200 Training loss: 1.3041 1.7526 sec/batch
Epoch 95/300 Iteration 1326/4200 Training loss: 1.3048 1.7448 sec/batch
Epoch 95/300 Iteration 1327/4200 Training loss: 1.3074 1.8078 sec/batch
Epoch 95/300 Iteration 1328/4200 Training loss: 1.3071 1.6963 sec/batch
Epoch 95/300 Iteration 1329/4200 Training loss: 1.3090 1.6976 sec/batch
Epoch 95/300 Iteration 1330/4200 Training loss: 1.3096 1.7026 sec/batch
Epoch 96/300 Iteration 1331/4200 Training loss: 1.3500 1.7247 sec/batch
Epoch 96/300 Iteration 1332/4200 Training loss: 1.3118 1.7401 sec/batch
Epoch 96/300 Iteration 1333/4200 Training loss: 1.2988 1.7386 sec/batch
Epoch 96/300 Iteration 1334/4200 Training loss: 1.3014 1.7365 sec/batch
Epoch 96/300 Iteration 1335/4200 Training loss: 1.3021 1.7380 sec/batch
Epoch 96/300 Iteration 1336/4200 Training loss: 1.3013 1.7441 sec/batch
Epoch 96/300 Iteration 1337/4200 Training loss: 1.2994 1.7404 sec/batch
Epoch 96/300 Iteration 1338/4200 Training loss: 1.2969 1.7405 sec/batch
Epoch 96/300 Iteration 1339/4200 Training loss: 1.2945 1.7395 sec/batch
Epoch 96/300 Iteration 1340/4200 Training loss: 1.2961 1.7436 sec/batch
Epoch 96/300 Iteration 1341/4200 Training loss: 1.2991 1.7393 sec/batch
Epoch 96/300 Iteration 1342/4200 Training loss: 1.2986 1.6908 sec/batch
Epoch 96/300 Iteration 1343/4200 Training loss: 1.3007 1.6931 sec/batch
Epoch 96/300 Iteration 1344/4200 Training loss: 1.3027 1.6913 sec/batch
Epoch 97/300 Iteration 1345/4200 Training loss: 1.3341 1.7241 sec/batch
Epoch 97/300 Iteration 1346/4200 Training loss: 1.2983 1.7364 sec/batch
Epoch 97/300 Iteration 1347/4200 Training loss: 1.2917 1.7393 sec/batch
Epoch 97/300 Iteration 1348/4200 Training loss: 1.2904 1.7427 sec/batch
Epoch 97/300 Iteration 1349/4200 Training loss: 1.2907 1.6942 sec/batch
Epoch 97/300 Iteration 1350/4200 Training loss: 1.2896 1.6993 sec/batch
Epoch 97/300 Iteration 1351/4200 Training loss: 1.2846 1.6911 sec/batch
Epoch 97/300 Iteration 1352/4200 Training loss: 1.2846 1.6891 sec/batch
Epoch 97/300 Iteration 1353/4200 Training loss: 1.2837 1.6926 sec/batch
Epoch 97/300 Iteration 1354/4200 Training loss: 1.2847 1.6881 sec/batch
Epoch 97/300 Iteration 1355/4200 Training loss: 1.2869 1.6911 sec/batch
Epoch 97/300 Iteration 1356/4200 Training loss: 1.2870 1.7098 sec/batch
Epoch 97/300 Iteration 1357/4200 Training loss: 1.2897 1.7434 sec/batch
Epoch 97/300 Iteration 1358/4200 Training loss: 1.2913 1.7429 sec/batch
Epoch 98/300 Iteration 1359/4200 Training loss: 1.3208 1.7395 sec/batch
Epoch 98/300 Iteration 1360/4200 Training loss: 1.2887 1.7071 sec/batch
Epoch 98/300 Iteration 1361/4200 Training loss: 1.2802 1.7380 sec/batch
Epoch 98/300 Iteration 1362/4200 Training loss: 1.2845 1.7383 sec/batch
Epoch 98/300 Iteration 1363/4200 Training loss: 1.2854 1.6962 sec/batch
Epoch 98/300 Iteration 1364/4200 Training loss: 1.2868 1.7398 sec/batch
Epoch 98/300 Iteration 1365/4200 Training loss: 1.2829 1.6985 sec/batch
Epoch 98/300 Iteration 1366/4200 Training loss: 1.2802 1.7388 sec/batch
Epoch 98/300 Iteration 1367/4200 Training loss: 1.2789 1.7387 sec/batch
Epoch 98/300 Iteration 1368/4200 Training loss: 1.2802 1.7421 sec/batch
Epoch 98/300 Iteration 1369/4200 Training loss: 1.2830 1.7403 sec/batch
Epoch 98/300 Iteration 1370/4200 Training loss: 1.2837 1.7412 sec/batch
Epoch 98/300 Iteration 1371/4200 Training loss: 1.2862 1.7439 sec/batch
Epoch 98/300 Iteration 1372/4200 Training loss: 1.2882 1.7392 sec/batch
Epoch 99/300 Iteration 1373/4200 Training loss: 1.3244 1.7382 sec/batch
Epoch 99/300 Iteration 1374/4200 Training loss: 1.2937 1.7403 sec/batch
Epoch 99/300 Iteration 1375/4200 Training loss: 1.2758 1.6941 sec/batch
Epoch 99/300 Iteration 1376/4200 Training loss: 1.2743 1.6907 sec/batch
Epoch 99/300 Iteration 1377/4200 Training loss: 1.2764 1.6900 sec/batch
Epoch 99/300 Iteration 1378/4200 Training loss: 1.2757 1.7352 sec/batch
Epoch 99/300 Iteration 1379/4200 Training loss: 1.2713 1.7126 sec/batch
Epoch 99/300 Iteration 1380/4200 Training loss: 1.2696 1.7560 sec/batch
Epoch 99/300 Iteration 1381/4200 Training loss: 1.2695 1.7490 sec/batch
Epoch 99/300 Iteration 1382/4200 Training loss: 1.2695 1.6937 sec/batch
Epoch 99/300 Iteration 1383/4200 Training loss: 1.2722 1.6933 sec/batch
Epoch 99/300 Iteration 1384/4200 Training loss: 1.2719 1.6919 sec/batch
Epoch 99/300 Iteration 1385/4200 Training loss: 1.2736 1.7235 sec/batch
Epoch 99/300 Iteration 1386/4200 Training loss: 1.2745 1.7238 sec/batch
Epoch 100/300 Iteration 1387/4200 Training loss: 1.3146 1.6923 sec/batch
Epoch 100/300 Iteration 1388/4200 Training loss: 1.2847 1.7243 sec/batch
Epoch 100/300 Iteration 1389/4200 Training loss: 1.2667 1.7393 sec/batch
Epoch 100/300 Iteration 1390/4200 Training loss: 1.2640 1.7385 sec/batch
Epoch 100/300 Iteration 1391/4200 Training loss: 1.2654 1.7362 sec/batch
Epoch 100/300 Iteration 1392/4200 Training loss: 1.2631 1.7396 sec/batch
Epoch 100/300 Iteration 1393/4200 Training loss: 1.2585 1.7436 sec/batch
Epoch 100/300 Iteration 1394/4200 Training loss: 1.2589 1.7405 sec/batch
Epoch 100/300 Iteration 1395/4200 Training loss: 1.2591 1.7399 sec/batch
Epoch 100/300 Iteration 1396/4200 Training loss: 1.2596 1.7383 sec/batch
Epoch 100/300 Iteration 1397/4200 Training loss: 1.2625 1.7383 sec/batch
Epoch 100/300 Iteration 1398/4200 Training loss: 1.2628 1.7405 sec/batch
Epoch 100/300 Iteration 1399/4200 Training loss: 1.2656 1.7441 sec/batch
Epoch 100/300 Iteration 1400/4200 Training loss: 1.2661 1.7415 sec/batch
Validation loss: 1.65712 Saving checkpoint!
Epoch 101/300 Iteration 1401/4200 Training loss: 1.3192 3.2082 sec/batch
Epoch 101/300 Iteration 1402/4200 Training loss: 1.2810 1.8296 sec/batch
Epoch 101/300 Iteration 1403/4200 Training loss: 1.2639 1.6358 sec/batch
Epoch 101/300 Iteration 1404/4200 Training loss: 1.2655 1.6684 sec/batch
Epoch 101/300 Iteration 1405/4200 Training loss: 1.2678 1.6888 sec/batch
Epoch 101/300 Iteration 1406/4200 Training loss: 1.2654 1.6950 sec/batch
Epoch 101/300 Iteration 1407/4200 Training loss: 1.2595 1.6917 sec/batch
Epoch 101/300 Iteration 1408/4200 Training loss: 1.2577 1.6919 sec/batch
Epoch 101/300 Iteration 1409/4200 Training loss: 1.2567 1.6904 sec/batch
Epoch 101/300 Iteration 1410/4200 Training loss: 1.2591 1.6947 sec/batch
Epoch 101/300 Iteration 1411/4200 Training loss: 1.2617 1.6947 sec/batch
Epoch 101/300 Iteration 1412/4200 Training loss: 1.2622 1.6923 sec/batch
Epoch 101/300 Iteration 1413/4200 Training loss: 1.2632 1.6903 sec/batch
Epoch 101/300 Iteration 1414/4200 Training loss: 1.2629 1.6917 sec/batch
Epoch 102/300 Iteration 1415/4200 Training loss: 1.2973 1.6927 sec/batch
Epoch 102/300 Iteration 1416/4200 Training loss: 1.2645 1.6913 sec/batch
Epoch 102/300 Iteration 1417/4200 Training loss: 1.2501 1.6928 sec/batch
Epoch 102/300 Iteration 1418/4200 Training loss: 1.2501 1.6932 sec/batch
Epoch 102/300 Iteration 1419/4200 Training loss: 1.2504 1.6916 sec/batch
Epoch 102/300 Iteration 1420/4200 Training loss: 1.2487 1.6969 sec/batch
Epoch 102/300 Iteration 1421/4200 Training loss: 1.2434 1.6911 sec/batch
Epoch 102/300 Iteration 1422/4200 Training loss: 1.2399 1.6901 sec/batch
Epoch 102/300 Iteration 1423/4200 Training loss: 1.2388 1.7075 sec/batch
Epoch 102/300 Iteration 1424/4200 Training loss: 1.2392 1.7449 sec/batch
Epoch 102/300 Iteration 1425/4200 Training loss: 1.2414 1.7458 sec/batch
Epoch 102/300 Iteration 1426/4200 Training loss: 1.2421 1.7373 sec/batch
Epoch 102/300 Iteration 1427/4200 Training loss: 1.2451 1.7383 sec/batch
Epoch 102/300 Iteration 1428/4200 Training loss: 1.2469 1.7409 sec/batch
Epoch 103/300 Iteration 1429/4200 Training loss: 1.2886 1.7372 sec/batch
Epoch 103/300 Iteration 1430/4200 Training loss: 1.2489 1.7365 sec/batch
Epoch 103/300 Iteration 1431/4200 Training loss: 1.2392 1.7432 sec/batch
Epoch 103/300 Iteration 1432/4200 Training loss: 1.2386 1.7369 sec/batch
Epoch 103/300 Iteration 1433/4200 Training loss: 1.2431 1.7398 sec/batch
Epoch 103/300 Iteration 1434/4200 Training loss: 1.2424 1.7380 sec/batch
Epoch 103/300 Iteration 1435/4200 Training loss: 1.2385 1.7390 sec/batch
Epoch 103/300 Iteration 1436/4200 Training loss: 1.2373 1.7357 sec/batch
Epoch 103/300 Iteration 1437/4200 Training loss: 1.2346 1.7457 sec/batch
Epoch 103/300 Iteration 1438/4200 Training loss: 1.2344 1.7386 sec/batch
Epoch 103/300 Iteration 1439/4200 Training loss: 1.2376 1.7374 sec/batch
Epoch 103/300 Iteration 1440/4200 Training loss: 1.2377 1.7386 sec/batch
Epoch 103/300 Iteration 1441/4200 Training loss: 1.2405 1.7420 sec/batch
Epoch 103/300 Iteration 1442/4200 Training loss: 1.2408 1.7426 sec/batch
Epoch 104/300 Iteration 1443/4200 Training loss: 1.2810 1.7378 sec/batch
Epoch 104/300 Iteration 1444/4200 Training loss: 1.2414 1.7461 sec/batch
Epoch 104/300 Iteration 1445/4200 Training loss: 1.2318 1.7616 sec/batch
Epoch 104/300 Iteration 1446/4200 Training loss: 1.2354 1.7073 sec/batch
Epoch 104/300 Iteration 1447/4200 Training loss: 1.2353 1.6975 sec/batch
Epoch 104/300 Iteration 1448/4200 Training loss: 1.2314 1.7428 sec/batch
Epoch 104/300 Iteration 1449/4200 Training loss: 1.2259 1.7399 sec/batch
Epoch 104/300 Iteration 1450/4200 Training loss: 1.2245 1.7393 sec/batch
Epoch 104/300 Iteration 1451/4200 Training loss: 1.2230 1.7410 sec/batch
Epoch 104/300 Iteration 1452/4200 Training loss: 1.2234 1.7386 sec/batch
Epoch 104/300 Iteration 1453/4200 Training loss: 1.2259 1.7377 sec/batch
Epoch 104/300 Iteration 1454/4200 Training loss: 1.2260 1.7390 sec/batch
Epoch 104/300 Iteration 1455/4200 Training loss: 1.2279 1.7627 sec/batch
Epoch 104/300 Iteration 1456/4200 Training loss: 1.2285 1.7689 sec/batch
Epoch 105/300 Iteration 1457/4200 Training loss: 1.2699 1.8205 sec/batch
Epoch 105/300 Iteration 1458/4200 Training loss: 1.2343 1.7950 sec/batch
Epoch 105/300 Iteration 1459/4200 Training loss: 1.2183 1.7518 sec/batch
Epoch 105/300 Iteration 1460/4200 Training loss: 1.2201 1.7700 sec/batch
Epoch 105/300 Iteration 1461/4200 Training loss: 1.2220 1.8267 sec/batch
Epoch 105/300 Iteration 1462/4200 Training loss: 1.2215 1.7837 sec/batch
Epoch 105/300 Iteration 1463/4200 Training loss: 1.2155 1.7373 sec/batch
Epoch 105/300 Iteration 1464/4200 Training loss: 1.2147 1.7400 sec/batch
Epoch 105/300 Iteration 1465/4200 Training loss: 1.2139 1.7466 sec/batch
Epoch 105/300 Iteration 1466/4200 Training loss: 1.2153 1.7428 sec/batch
Epoch 105/300 Iteration 1467/4200 Training loss: 1.2183 1.7997 sec/batch
Epoch 105/300 Iteration 1468/4200 Training loss: 1.2192 1.7409 sec/batch
Epoch 105/300 Iteration 1469/4200 Training loss: 1.2219 1.7149 sec/batch
Epoch 105/300 Iteration 1470/4200 Training loss: 1.2223 1.7355 sec/batch
Epoch 106/300 Iteration 1471/4200 Training loss: 1.2680 1.7913 sec/batch
Epoch 106/300 Iteration 1472/4200 Training loss: 1.2277 1.8046 sec/batch
Epoch 106/300 Iteration 1473/4200 Training loss: 1.2128 1.7842 sec/batch
Epoch 106/300 Iteration 1474/4200 Training loss: 1.2145 1.7837 sec/batch
Epoch 106/300 Iteration 1475/4200 Training loss: 1.2155 1.7413 sec/batch
Epoch 106/300 Iteration 1476/4200 Training loss: 1.2154 1.7290 sec/batch
Epoch 106/300 Iteration 1477/4200 Training loss: 1.2101 1.6911 sec/batch
Epoch 106/300 Iteration 1478/4200 Training loss: 1.2075 1.6929 sec/batch
Epoch 106/300 Iteration 1479/4200 Training loss: 1.2067 1.7027 sec/batch
Epoch 106/300 Iteration 1480/4200 Training loss: 1.2086 1.7369 sec/batch
Epoch 106/300 Iteration 1481/4200 Training loss: 1.2114 1.7782 sec/batch
Epoch 106/300 Iteration 1482/4200 Training loss: 1.2117 1.7384 sec/batch
Epoch 106/300 Iteration 1483/4200 Training loss: 1.2145 1.7386 sec/batch
Epoch 106/300 Iteration 1484/4200 Training loss: 1.2166 1.7694 sec/batch
Epoch 107/300 Iteration 1485/4200 Training loss: 1.2579 1.7751 sec/batch
Epoch 107/300 Iteration 1486/4200 Training loss: 1.2247 1.7712 sec/batch
Epoch 107/300 Iteration 1487/4200 Training loss: 1.2119 1.7405 sec/batch
Epoch 107/300 Iteration 1488/4200 Training loss: 1.2111 1.7397 sec/batch
Epoch 107/300 Iteration 1489/4200 Training loss: 1.2126 1.7625 sec/batch
Epoch 107/300 Iteration 1490/4200 Training loss: 1.2124 1.7930 sec/batch
Epoch 107/300 Iteration 1491/4200 Training loss: 1.2073 1.7925 sec/batch
Epoch 107/300 Iteration 1492/4200 Training loss: 1.2064 1.7920 sec/batch
Epoch 107/300 Iteration 1493/4200 Training loss: 1.2049 1.7407 sec/batch
Epoch 107/300 Iteration 1494/4200 Training loss: 1.2062 1.7390 sec/batch
Epoch 107/300 Iteration 1495/4200 Training loss: 1.2080 1.7401 sec/batch
Epoch 107/300 Iteration 1496/4200 Training loss: 1.2078 1.7391 sec/batch
Epoch 107/300 Iteration 1497/4200 Training loss: 1.2111 1.7379 sec/batch
Epoch 107/300 Iteration 1498/4200 Training loss: 1.2118 1.7371 sec/batch
Epoch 108/300 Iteration 1499/4200 Training loss: 1.2425 1.7545 sec/batch
Epoch 108/300 Iteration 1500/4200 Training loss: 1.2059 1.7975 sec/batch
Validation loss: 1.68483 Saving checkpoint!
Epoch 108/300 Iteration 1501/4200 Training loss: 1.2572 3.2471 sec/batch
Epoch 108/300 Iteration 1502/4200 Training loss: 1.2443 1.8222 sec/batch
Epoch 108/300 Iteration 1503/4200 Training loss: 1.2349 1.7016 sec/batch
Epoch 108/300 Iteration 1504/4200 Training loss: 1.2297 1.6898 sec/batch
Epoch 108/300 Iteration 1505/4200 Training loss: 1.2208 1.6658 sec/batch
Epoch 108/300 Iteration 1506/4200 Training loss: 1.2167 1.6899 sec/batch
Epoch 108/300 Iteration 1507/4200 Training loss: 1.2144 1.6895 sec/batch
Epoch 108/300 Iteration 1508/4200 Training loss: 1.2142 1.6876 sec/batch
Epoch 108/300 Iteration 1509/4200 Training loss: 1.2141 1.6898 sec/batch
Epoch 108/300 Iteration 1510/4200 Training loss: 1.2131 1.7142 sec/batch
Epoch 108/300 Iteration 1511/4200 Training loss: 1.2143 1.6891 sec/batch
Epoch 108/300 Iteration 1512/4200 Training loss: 1.2145 1.7232 sec/batch
Epoch 109/300 Iteration 1513/4200 Training loss: 1.2360 1.6870 sec/batch
Epoch 109/300 Iteration 1514/4200 Training loss: 1.2024 1.6898 sec/batch
Epoch 109/300 Iteration 1515/4200 Training loss: 1.1921 1.6915 sec/batch
Epoch 109/300 Iteration 1516/4200 Training loss: 1.1890 1.6893 sec/batch
Epoch 109/300 Iteration 1517/4200 Training loss: 1.1918 1.7061 sec/batch
Epoch 109/300 Iteration 1518/4200 Training loss: 1.1924 1.7389 sec/batch
Epoch 109/300 Iteration 1519/4200 Training loss: 1.1887 1.7362 sec/batch
Epoch 109/300 Iteration 1520/4200 Training loss: 1.1870 1.7394 sec/batch
Epoch 109/300 Iteration 1521/4200 Training loss: 1.1855 1.7420 sec/batch
Epoch 109/300 Iteration 1522/4200 Training loss: 1.1856 1.6931 sec/batch
Epoch 109/300 Iteration 1523/4200 Training loss: 1.1883 1.6901 sec/batch
Epoch 109/300 Iteration 1524/4200 Training loss: 1.1874 1.6891 sec/batch
Epoch 109/300 Iteration 1525/4200 Training loss: 1.1895 1.6903 sec/batch
Epoch 109/300 Iteration 1526/4200 Training loss: 1.1907 1.6924 sec/batch
Epoch 110/300 Iteration 1527/4200 Training loss: 1.2398 1.6883 sec/batch
Epoch 110/300 Iteration 1528/4200 Training loss: 1.2009 1.6914 sec/batch
Epoch 110/300 Iteration 1529/4200 Training loss: 1.1847 1.7256 sec/batch
Epoch 110/300 Iteration 1530/4200 Training loss: 1.1832 1.7869 sec/batch
Epoch 110/300 Iteration 1531/4200 Training loss: 1.1843 1.7437 sec/batch
Epoch 110/300 Iteration 1532/4200 Training loss: 1.1818 1.7709 sec/batch
Epoch 110/300 Iteration 1533/4200 Training loss: 1.1766 1.8578 sec/batch
Epoch 110/300 Iteration 1534/4200 Training loss: 1.1754 1.7481 sec/batch
Epoch 110/300 Iteration 1535/4200 Training loss: 1.1732 1.8257 sec/batch
Epoch 110/300 Iteration 1536/4200 Training loss: 1.1738 1.8266 sec/batch
Epoch 110/300 Iteration 1537/4200 Training loss: 1.1760 1.7759 sec/batch
Epoch 110/300 Iteration 1538/4200 Training loss: 1.1768 1.7622 sec/batch
Epoch 110/300 Iteration 1539/4200 Training loss: 1.1794 1.7294 sec/batch
Epoch 110/300 Iteration 1540/4200 Training loss: 1.1805 1.7829 sec/batch
Epoch 111/300 Iteration 1541/4200 Training loss: 1.2233 1.7388 sec/batch
Epoch 111/300 Iteration 1542/4200 Training loss: 1.1896 1.8174 sec/batch
Epoch 111/300 Iteration 1543/4200 Training loss: 1.1751 1.7985 sec/batch
Epoch 111/300 Iteration 1544/4200 Training loss: 1.1738 1.8132 sec/batch
Epoch 111/300 Iteration 1545/4200 Training loss: 1.1759 1.7516 sec/batch
Epoch 111/300 Iteration 1546/4200 Training loss: 1.1757 1.7249 sec/batch
Epoch 111/300 Iteration 1547/4200 Training loss: 1.1710 1.7368 sec/batch
Epoch 111/300 Iteration 1548/4200 Training loss: 1.1700 1.7432 sec/batch
Epoch 111/300 Iteration 1549/4200 Training loss: 1.1688 1.7366 sec/batch
Epoch 111/300 Iteration 1550/4200 Training loss: 1.1689 1.8172 sec/batch
Epoch 111/300 Iteration 1551/4200 Training loss: 1.1713 1.7976 sec/batch
Epoch 111/300 Iteration 1552/4200 Training loss: 1.1726 1.7381 sec/batch
Epoch 111/300 Iteration 1553/4200 Training loss: 1.1738 1.6916 sec/batch
Epoch 111/300 Iteration 1554/4200 Training loss: 1.1750 1.7442 sec/batch
Epoch 112/300 Iteration 1555/4200 Training loss: 1.2122 1.7736 sec/batch
Epoch 112/300 Iteration 1556/4200 Training loss: 1.1766 1.8305 sec/batch
Epoch 112/300 Iteration 1557/4200 Training loss: 1.1668 1.8016 sec/batch
Epoch 112/300 Iteration 1558/4200 Training loss: 1.1659 1.7908 sec/batch
Epoch 112/300 Iteration 1559/4200 Training loss: 1.1667 1.7961 sec/batch
Epoch 112/300 Iteration 1560/4200 Training loss: 1.1663 1.7453 sec/batch
Epoch 112/300 Iteration 1561/4200 Training loss: 1.1627 1.7548 sec/batch
Epoch 112/300 Iteration 1562/4200 Training loss: 1.1626 1.7977 sec/batch
Epoch 112/300 Iteration 1563/4200 Training loss: 1.1605 1.7946 sec/batch
Epoch 112/300 Iteration 1564/4200 Training loss: 1.1619 1.7927 sec/batch
Epoch 112/300 Iteration 1565/4200 Training loss: 1.1647 1.7983 sec/batch
Epoch 112/300 Iteration 1566/4200 Training loss: 1.1632 1.7929 sec/batch
Epoch 112/300 Iteration 1567/4200 Training loss: 1.1665 1.7437 sec/batch
Epoch 112/300 Iteration 1568/4200 Training loss: 1.1683 1.7415 sec/batch
Epoch 113/300 Iteration 1569/4200 Training loss: 1.2259 1.6899 sec/batch
Epoch 113/300 Iteration 1570/4200 Training loss: 1.1876 1.6893 sec/batch
Epoch 113/300 Iteration 1571/4200 Training loss: 1.1722 1.6923 sec/batch
Epoch 113/300 Iteration 1572/4200 Training loss: 1.1696 1.7298 sec/batch
Epoch 113/300 Iteration 1573/4200 Training loss: 1.1675 1.7398 sec/batch
Epoch 113/300 Iteration 1574/4200 Training loss: 1.1635 1.7382 sec/batch
Epoch 113/300 Iteration 1575/4200 Training loss: 1.1600 1.7390 sec/batch
Epoch 113/300 Iteration 1576/4200 Training loss: 1.1583 1.7383 sec/batch
Epoch 113/300 Iteration 1577/4200 Training loss: 1.1586 1.7416 sec/batch
Epoch 113/300 Iteration 1578/4200 Training loss: 1.1601 1.7985 sec/batch
Epoch 113/300 Iteration 1579/4200 Training loss: 1.1617 1.7612 sec/batch
Epoch 113/300 Iteration 1580/4200 Training loss: 1.1607 1.7522 sec/batch
Epoch 113/300 Iteration 1581/4200 Training loss: 1.1623 1.7506 sec/batch
Epoch 113/300 Iteration 1582/4200 Training loss: 1.1626 1.7384 sec/batch
Epoch 114/300 Iteration 1583/4200 Training loss: 1.2110 1.7389 sec/batch
Epoch 114/300 Iteration 1584/4200 Training loss: 1.1750 1.7439 sec/batch
Epoch 114/300 Iteration 1585/4200 Training loss: 1.1605 1.7560 sec/batch
Epoch 114/300 Iteration 1586/4200 Training loss: 1.1599 1.7282 sec/batch
Epoch 114/300 Iteration 1587/4200 Training loss: 1.1565 1.6913 sec/batch
Epoch 114/300 Iteration 1588/4200 Training loss: 1.1548 1.6994 sec/batch
Epoch 114/300 Iteration 1589/4200 Training loss: 1.1509 1.7370 sec/batch
Epoch 114/300 Iteration 1590/4200 Training loss: 1.1508 1.6903 sec/batch
Epoch 114/300 Iteration 1591/4200 Training loss: 1.1485 1.7013 sec/batch
Epoch 114/300 Iteration 1592/4200 Training loss: 1.1503 1.7069 sec/batch
Epoch 114/300 Iteration 1593/4200 Training loss: 1.1521 1.8475 sec/batch
Epoch 114/300 Iteration 1594/4200 Training loss: 1.1513 1.7713 sec/batch
Epoch 114/300 Iteration 1595/4200 Training loss: 1.1539 1.7675 sec/batch
Epoch 114/300 Iteration 1596/4200 Training loss: 1.1534 1.7700 sec/batch
Epoch 115/300 Iteration 1597/4200 Training loss: 1.2079 1.7500 sec/batch
Epoch 115/300 Iteration 1598/4200 Training loss: 1.1708 1.8376 sec/batch
Epoch 115/300 Iteration 1599/4200 Training loss: 1.1560 1.7733 sec/batch
Epoch 115/300 Iteration 1600/4200 Training loss: 1.1537 1.7253 sec/batch
Validation loss: 1.72509 Saving checkpoint!
Epoch 115/300 Iteration 1601/4200 Training loss: 1.1947 3.2412 sec/batch
Epoch 115/300 Iteration 1602/4200 Training loss: 1.1842 1.8293 sec/batch
Epoch 115/300 Iteration 1603/4200 Training loss: 1.1738 1.6733 sec/batch
Epoch 115/300 Iteration 1604/4200 Training loss: 1.1698 1.6989 sec/batch
Epoch 115/300 Iteration 1605/4200 Training loss: 1.1654 1.6923 sec/batch
Epoch 115/300 Iteration 1606/4200 Training loss: 1.1651 1.6925 sec/batch
Epoch 115/300 Iteration 1607/4200 Training loss: 1.1658 1.6938 sec/batch
Epoch 115/300 Iteration 1608/4200 Training loss: 1.1635 1.6903 sec/batch
Epoch 115/300 Iteration 1609/4200 Training loss: 1.1639 1.6894 sec/batch
Epoch 115/300 Iteration 1610/4200 Training loss: 1.1625 1.6956 sec/batch
Epoch 116/300 Iteration 1611/4200 Training loss: 1.1829 1.6947 sec/batch
Epoch 116/300 Iteration 1612/4200 Training loss: 1.1526 1.8354 sec/batch
Epoch 116/300 Iteration 1613/4200 Training loss: 1.1430 1.7534 sec/batch
Epoch 116/300 Iteration 1614/4200 Training loss: 1.1459 1.7948 sec/batch
Epoch 116/300 Iteration 1615/4200 Training loss: 1.1470 1.8213 sec/batch
Epoch 116/300 Iteration 1616/4200 Training loss: 1.1438 1.7411 sec/batch
Epoch 116/300 Iteration 1617/4200 Training loss: 1.1379 1.7425 sec/batch
Epoch 116/300 Iteration 1618/4200 Training loss: 1.1351 1.7394 sec/batch
Epoch 116/300 Iteration 1619/4200 Training loss: 1.1321 1.7375 sec/batch
Epoch 116/300 Iteration 1620/4200 Training loss: 1.1335 1.7389 sec/batch
Epoch 116/300 Iteration 1621/4200 Training loss: 1.1371 1.7727 sec/batch
Epoch 116/300 Iteration 1622/4200 Training loss: 1.1368 1.8480 sec/batch
Epoch 116/300 Iteration 1623/4200 Training loss: 1.1387 1.7592 sec/batch
Epoch 116/300 Iteration 1624/4200 Training loss: 1.1386 1.8203 sec/batch
Epoch 117/300 Iteration 1625/4200 Training loss: 1.1824 1.7512 sec/batch
Epoch 117/300 Iteration 1626/4200 Training loss: 1.1434 1.8140 sec/batch
Epoch 117/300 Iteration 1627/4200 Training loss: 1.1274 1.8255 sec/batch
Epoch 117/300 Iteration 1628/4200 Training loss: 1.1299 1.8287 sec/batch
Epoch 117/300 Iteration 1629/4200 Training loss: 1.1295 1.8043 sec/batch
Epoch 117/300 Iteration 1630/4200 Training loss: 1.1270 1.8455 sec/batch
Epoch 117/300 Iteration 1631/4200 Training loss: 1.1220 1.6986 sec/batch
Epoch 117/300 Iteration 1632/4200 Training loss: 1.1211 1.6925 sec/batch
Epoch 117/300 Iteration 1633/4200 Training loss: 1.1196 1.7006 sec/batch
Epoch 117/300 Iteration 1634/4200 Training loss: 1.1202 1.6901 sec/batch
Epoch 117/300 Iteration 1635/4200 Training loss: 1.1248 1.6744 sec/batch
Epoch 117/300 Iteration 1636/4200 Training loss: 1.1248 1.6929 sec/batch
Epoch 117/300 Iteration 1637/4200 Training loss: 1.1276 1.6912 sec/batch
Epoch 117/300 Iteration 1638/4200 Training loss: 1.1288 1.6904 sec/batch
Epoch 118/300 Iteration 1639/4200 Training loss: 1.1673 1.6871 sec/batch
Epoch 118/300 Iteration 1640/4200 Training loss: 1.1303 1.6913 sec/batch
Epoch 118/300 Iteration 1641/4200 Training loss: 1.1219 1.6923 sec/batch
Epoch 118/300 Iteration 1642/4200 Training loss: 1.1244 1.6920 sec/batch
Epoch 118/300 Iteration 1643/4200 Training loss: 1.1283 1.6927 sec/batch
Epoch 118/300 Iteration 1644/4200 Training loss: 1.1278 1.6931 sec/batch
Epoch 118/300 Iteration 1645/4200 Training loss: 1.1219 1.7117 sec/batch
Epoch 118/300 Iteration 1646/4200 Training loss: 1.1183 1.7037 sec/batch
Epoch 118/300 Iteration 1647/4200 Training loss: 1.1169 1.6921 sec/batch
Epoch 118/300 Iteration 1648/4200 Training loss: 1.1197 1.6905 sec/batch
Epoch 118/300 Iteration 1649/4200 Training loss: 1.1224 1.6942 sec/batch
Epoch 118/300 Iteration 1650/4200 Training loss: 1.1229 1.6904 sec/batch
Epoch 118/300 Iteration 1651/4200 Training loss: 1.1246 1.6912 sec/batch
Epoch 118/300 Iteration 1652/4200 Training loss: 1.1257 1.6918 sec/batch
Epoch 119/300 Iteration 1653/4200 Training loss: 1.1592 1.6914 sec/batch
Epoch 119/300 Iteration 1654/4200 Training loss: 1.1328 1.6904 sec/batch
Epoch 119/300 Iteration 1655/4200 Training loss: 1.1170 1.6940 sec/batch
Epoch 119/300 Iteration 1656/4200 Training loss: 1.1156 1.6920 sec/batch
Epoch 119/300 Iteration 1657/4200 Training loss: 1.1154 1.7245 sec/batch
Epoch 119/300 Iteration 1658/4200 Training loss: 1.1146 1.6943 sec/batch
Epoch 119/300 Iteration 1659/4200 Training loss: 1.1110 1.6902 sec/batch
Epoch 119/300 Iteration 1660/4200 Training loss: 1.1080 1.7112 sec/batch
Epoch 119/300 Iteration 1661/4200 Training loss: 1.1068 1.7372 sec/batch
Epoch 119/300 Iteration 1662/4200 Training loss: 1.1087 1.7380 sec/batch
Epoch 119/300 Iteration 1663/4200 Training loss: 1.1117 1.7394 sec/batch
Epoch 119/300 Iteration 1664/4200 Training loss: 1.1117 1.7414 sec/batch
Epoch 119/300 Iteration 1665/4200 Training loss: 1.1144 1.7365 sec/batch
Epoch 119/300 Iteration 1666/4200 Training loss: 1.1146 1.7414 sec/batch
Epoch 120/300 Iteration 1667/4200 Training loss: 1.1596 1.7397 sec/batch
Epoch 120/300 Iteration 1668/4200 Training loss: 1.1272 1.7418 sec/batch
Epoch 120/300 Iteration 1669/4200 Training loss: 1.1129 1.7385 sec/batch
Epoch 120/300 Iteration 1670/4200 Training loss: 1.1091 1.7371 sec/batch
Epoch 120/300 Iteration 1671/4200 Training loss: 1.1090 1.7387 sec/batch
Epoch 120/300 Iteration 1672/4200 Training loss: 1.1088 1.7417 sec/batch
Epoch 120/300 Iteration 1673/4200 Training loss: 1.1064 1.7403 sec/batch
Epoch 120/300 Iteration 1674/4200 Training loss: 1.1033 1.7397 sec/batch
Epoch 120/300 Iteration 1675/4200 Training loss: 1.1018 1.7394 sec/batch
Epoch 120/300 Iteration 1676/4200 Training loss: 1.1026 1.7442 sec/batch
Epoch 120/300 Iteration 1677/4200 Training loss: 1.1047 1.7379 sec/batch
Epoch 120/300 Iteration 1678/4200 Training loss: 1.1046 1.7427 sec/batch
Epoch 120/300 Iteration 1679/4200 Training loss: 1.1081 1.7334 sec/batch
Epoch 120/300 Iteration 1680/4200 Training loss: 1.1086 1.6928 sec/batch
Epoch 121/300 Iteration 1681/4200 Training loss: 1.1548 1.7236 sec/batch
Epoch 121/300 Iteration 1682/4200 Training loss: 1.1214 1.7655 sec/batch
Epoch 121/300 Iteration 1683/4200 Training loss: 1.1093 1.8379 sec/batch
Epoch 121/300 Iteration 1684/4200 Training loss: 1.1091 1.8401 sec/batch
Epoch 121/300 Iteration 1685/4200 Training loss: 1.1100 1.7566 sec/batch
Epoch 121/300 Iteration 1686/4200 Training loss: 1.1106 1.7987 sec/batch
Epoch 121/300 Iteration 1687/4200 Training loss: 1.1074 1.8018 sec/batch
Epoch 121/300 Iteration 1688/4200 Training loss: 1.1058 1.7925 sec/batch
Epoch 121/300 Iteration 1689/4200 Training loss: 1.1025 1.7951 sec/batch
Epoch 121/300 Iteration 1690/4200 Training loss: 1.1026 1.7950 sec/batch
Epoch 121/300 Iteration 1691/4200 Training loss: 1.1043 1.7937 sec/batch
Epoch 121/300 Iteration 1692/4200 Training loss: 1.1035 1.7952 sec/batch
Epoch 121/300 Iteration 1693/4200 Training loss: 1.1049 1.7925 sec/batch
Epoch 121/300 Iteration 1694/4200 Training loss: 1.1055 1.7952 sec/batch
Epoch 122/300 Iteration 1695/4200 Training loss: 1.1441 1.7950 sec/batch
Epoch 122/300 Iteration 1696/4200 Training loss: 1.1017 1.7438 sec/batch
Epoch 122/300 Iteration 1697/4200 Training loss: 1.0910 1.7393 sec/batch
Epoch 122/300 Iteration 1698/4200 Training loss: 1.0923 1.7381 sec/batch
Epoch 122/300 Iteration 1699/4200 Training loss: 1.0924 1.7391 sec/batch
Epoch 122/300 Iteration 1700/4200 Training loss: 1.0931 1.7382 sec/batch
Validation loss: 1.77801 Saving checkpoint!
Epoch 122/300 Iteration 1701/4200 Training loss: 1.1244 3.1817 sec/batch
Epoch 122/300 Iteration 1702/4200 Training loss: 1.1194 1.8573 sec/batch
Epoch 122/300 Iteration 1703/4200 Training loss: 1.1143 1.6713 sec/batch
Epoch 122/300 Iteration 1704/4200 Training loss: 1.1115 1.6962 sec/batch
Epoch 122/300 Iteration 1705/4200 Training loss: 1.1107 1.6915 sec/batch
Epoch 122/300 Iteration 1706/4200 Training loss: 1.1087 1.7422 sec/batch
Epoch 122/300 Iteration 1707/4200 Training loss: 1.1095 1.7397 sec/batch
Epoch 122/300 Iteration 1708/4200 Training loss: 1.1093 1.7383 sec/batch
Epoch 123/300 Iteration 1709/4200 Training loss: 1.1464 1.7387 sec/batch
Epoch 123/300 Iteration 1710/4200 Training loss: 1.1019 1.7397 sec/batch
Epoch 123/300 Iteration 1711/4200 Training loss: 1.0845 1.7601 sec/batch
Epoch 123/300 Iteration 1712/4200 Training loss: 1.0833 1.7516 sec/batch
Epoch 123/300 Iteration 1713/4200 Training loss: 1.0858 1.7395 sec/batch
Epoch 123/300 Iteration 1714/4200 Training loss: 1.0829 1.7428 sec/batch
Epoch 123/300 Iteration 1715/4200 Training loss: 1.0795 1.7385 sec/batch
Epoch 123/300 Iteration 1716/4200 Training loss: 1.0780 1.7399 sec/batch
Epoch 123/300 Iteration 1717/4200 Training loss: 1.0773 1.7416 sec/batch
Epoch 123/300 Iteration 1718/4200 Training loss: 1.0786 1.7465 sec/batch
Epoch 123/300 Iteration 1719/4200 Training loss: 1.0794 1.7389 sec/batch
Epoch 123/300 Iteration 1720/4200 Training loss: 1.0778 1.7382 sec/batch
Epoch 123/300 Iteration 1721/4200 Training loss: 1.0798 1.7395 sec/batch
Epoch 123/300 Iteration 1722/4200 Training loss: 1.0799 1.7374 sec/batch
Epoch 124/300 Iteration 1723/4200 Training loss: 1.1213 1.7402 sec/batch
Epoch 124/300 Iteration 1724/4200 Training loss: 1.0864 1.7359 sec/batch
Epoch 124/300 Iteration 1725/4200 Training loss: 1.0703 1.7405 sec/batch
Epoch 124/300 Iteration 1726/4200 Training loss: 1.0706 1.7367 sec/batch
Epoch 124/300 Iteration 1727/4200 Training loss: 1.0740 1.7367 sec/batch
Epoch 124/300 Iteration 1728/4200 Training loss: 1.0727 1.7448 sec/batch
Epoch 124/300 Iteration 1729/4200 Training loss: 1.0677 1.7380 sec/batch
Epoch 124/300 Iteration 1730/4200 Training loss: 1.0656 1.7382 sec/batch
Epoch 124/300 Iteration 1731/4200 Training loss: 1.0637 1.7393 sec/batch
Epoch 124/300 Iteration 1732/4200 Training loss: 1.0653 1.7880 sec/batch
Epoch 124/300 Iteration 1733/4200 Training loss: 1.0663 1.7419 sec/batch
Epoch 124/300 Iteration 1734/4200 Training loss: 1.0655 1.7560 sec/batch
Epoch 124/300 Iteration 1735/4200 Training loss: 1.0671 1.8064 sec/batch
Epoch 124/300 Iteration 1736/4200 Training loss: 1.0689 1.7929 sec/batch
Epoch 125/300 Iteration 1737/4200 Training loss: 1.1214 1.7954 sec/batch
Epoch 125/300 Iteration 1738/4200 Training loss: 1.0790 1.7407 sec/batch
Epoch 125/300 Iteration 1739/4200 Training loss: 1.0638 1.7416 sec/batch
Epoch 125/300 Iteration 1740/4200 Training loss: 1.0656 1.7378 sec/batch
Epoch 125/300 Iteration 1741/4200 Training loss: 1.0642 1.7383 sec/batch
Epoch 125/300 Iteration 1742/4200 Training loss: 1.0620 1.7877 sec/batch
Epoch 125/300 Iteration 1743/4200 Training loss: 1.0590 1.7396 sec/batch
Epoch 125/300 Iteration 1744/4200 Training loss: 1.0574 1.7370 sec/batch
Epoch 125/300 Iteration 1745/4200 Training loss: 1.0552 1.7636 sec/batch
Epoch 125/300 Iteration 1746/4200 Training loss: 1.0560 1.7732 sec/batch
Epoch 125/300 Iteration 1747/4200 Training loss: 1.0583 1.7402 sec/batch
Epoch 125/300 Iteration 1748/4200 Training loss: 1.0582 1.7526 sec/batch
Epoch 125/300 Iteration 1749/4200 Training loss: 1.0603 1.7461 sec/batch
Epoch 125/300 Iteration 1750/4200 Training loss: 1.0615 1.8545 sec/batch
Epoch 126/300 Iteration 1751/4200 Training loss: 1.1032 1.7417 sec/batch
Epoch 126/300 Iteration 1752/4200 Training loss: 1.0680 1.7425 sec/batch
Epoch 126/300 Iteration 1753/4200 Training loss: 1.0553 1.7482 sec/batch
Epoch 126/300 Iteration 1754/4200 Training loss: 1.0572 1.7427 sec/batch
Epoch 126/300 Iteration 1755/4200 Training loss: 1.0551 1.7387 sec/batch
Epoch 126/300 Iteration 1756/4200 Training loss: 1.0555 1.7410 sec/batch
Epoch 126/300 Iteration 1757/4200 Training loss: 1.0539 1.7390 sec/batch
Epoch 126/300 Iteration 1758/4200 Training loss: 1.0510 1.7391 sec/batch
Epoch 126/300 Iteration 1759/4200 Training loss: 1.0502 1.7967 sec/batch
Epoch 126/300 Iteration 1760/4200 Training loss: 1.0503 1.7382 sec/batch
Epoch 126/300 Iteration 1761/4200 Training loss: 1.0515 1.7382 sec/batch
Epoch 126/300 Iteration 1762/4200 Training loss: 1.0514 1.7850 sec/batch
Epoch 126/300 Iteration 1763/4200 Training loss: 1.0529 1.7993 sec/batch
Epoch 126/300 Iteration 1764/4200 Training loss: 1.0539 1.7392 sec/batch
Epoch 127/300 Iteration 1765/4200 Training loss: 1.0983 1.7595 sec/batch
Epoch 127/300 Iteration 1766/4200 Training loss: 1.0655 1.7990 sec/batch
Epoch 127/300 Iteration 1767/4200 Training loss: 1.0478 1.7950 sec/batch
Epoch 127/300 Iteration 1768/4200 Training loss: 1.0479 1.7927 sec/batch
Epoch 127/300 Iteration 1769/4200 Training loss: 1.0486 1.7941 sec/batch
Epoch 127/300 Iteration 1770/4200 Training loss: 1.0477 1.7944 sec/batch
Epoch 127/300 Iteration 1771/4200 Training loss: 1.0425 1.7937 sec/batch
Epoch 127/300 Iteration 1772/4200 Training loss: 1.0415 1.7924 sec/batch
Epoch 127/300 Iteration 1773/4200 Training loss: 1.0407 1.7953 sec/batch
Epoch 127/300 Iteration 1774/4200 Training loss: 1.0433 1.7949 sec/batch
Epoch 127/300 Iteration 1775/4200 Training loss: 1.0458 1.7941 sec/batch
Epoch 127/300 Iteration 1776/4200 Training loss: 1.0466 1.7945 sec/batch
Epoch 127/300 Iteration 1777/4200 Training loss: 1.0485 1.7423 sec/batch
Epoch 127/300 Iteration 1778/4200 Training loss: 1.0497 1.7420 sec/batch
Epoch 128/300 Iteration 1779/4200 Training loss: 1.1072 1.7641 sec/batch
Epoch 128/300 Iteration 1780/4200 Training loss: 1.0659 1.7467 sec/batch
Epoch 128/300 Iteration 1781/4200 Training loss: 1.0497 1.7479 sec/batch
Epoch 128/300 Iteration 1782/4200 Training loss: 1.0492 1.7421 sec/batch
Epoch 128/300 Iteration 1783/4200 Training loss: 1.0502 1.7376 sec/batch
Epoch 128/300 Iteration 1784/4200 Training loss: 1.0457 1.7386 sec/batch
Epoch 128/300 Iteration 1785/4200 Training loss: 1.0412 1.7378 sec/batch
Epoch 128/300 Iteration 1786/4200 Training loss: 1.0389 1.7379 sec/batch
Epoch 128/300 Iteration 1787/4200 Training loss: 1.0391 1.7384 sec/batch
Epoch 128/300 Iteration 1788/4200 Training loss: 1.0392 1.7379 sec/batch
Epoch 128/300 Iteration 1789/4200 Training loss: 1.0401 1.7405 sec/batch
Epoch 128/300 Iteration 1790/4200 Training loss: 1.0411 1.7400 sec/batch
Epoch 128/300 Iteration 1791/4200 Training loss: 1.0425 1.7421 sec/batch
Epoch 128/300 Iteration 1792/4200 Training loss: 1.0432 1.7377 sec/batch
Epoch 129/300 Iteration 1793/4200 Training loss: 1.0781 1.7467 sec/batch
Epoch 129/300 Iteration 1794/4200 Training loss: 1.0506 1.7418 sec/batch
Epoch 129/300 Iteration 1795/4200 Training loss: 1.0340 1.7785 sec/batch
Epoch 129/300 Iteration 1796/4200 Training loss: 1.0302 1.7976 sec/batch
Epoch 129/300 Iteration 1797/4200 Training loss: 1.0311 1.7387 sec/batch
Epoch 129/300 Iteration 1798/4200 Training loss: 1.0305 1.7434 sec/batch
Epoch 129/300 Iteration 1799/4200 Training loss: 1.0292 1.7936 sec/batch
Epoch 129/300 Iteration 1800/4200 Training loss: 1.0277 1.7927 sec/batch
Validation loss: 1.83666 Saving checkpoint!
Epoch 129/300 Iteration 1801/4200 Training loss: 1.0560 3.2042 sec/batch
Epoch 129/300 Iteration 1802/4200 Training loss: 1.0545 1.8438 sec/batch
Epoch 129/300 Iteration 1803/4200 Training loss: 1.0548 1.6910 sec/batch
Epoch 129/300 Iteration 1804/4200 Training loss: 1.0530 1.7274 sec/batch
Epoch 129/300 Iteration 1805/4200 Training loss: 1.0549 1.7380 sec/batch
Epoch 129/300 Iteration 1806/4200 Training loss: 1.0547 1.7421 sec/batch
Epoch 130/300 Iteration 1807/4200 Training loss: 1.0919 1.7420 sec/batch
Epoch 130/300 Iteration 1808/4200 Training loss: 1.0548 1.7392 sec/batch
Epoch 130/300 Iteration 1809/4200 Training loss: 1.0402 1.7441 sec/batch
Epoch 130/300 Iteration 1810/4200 Training loss: 1.0412 1.7382 sec/batch
Epoch 130/300 Iteration 1811/4200 Training loss: 1.0398 1.7433 sec/batch
Epoch 130/300 Iteration 1812/4200 Training loss: 1.0367 1.7388 sec/batch
Epoch 130/300 Iteration 1813/4200 Training loss: 1.0341 1.7430 sec/batch
Epoch 130/300 Iteration 1814/4200 Training loss: 1.0337 1.7401 sec/batch
Epoch 130/300 Iteration 1815/4200 Training loss: 1.0313 1.7389 sec/batch
Epoch 130/300 Iteration 1816/4200 Training loss: 1.0297 1.7384 sec/batch
Epoch 130/300 Iteration 1817/4200 Training loss: 1.0320 1.7383 sec/batch
Epoch 130/300 Iteration 1818/4200 Training loss: 1.0306 1.7428 sec/batch
Epoch 130/300 Iteration 1819/4200 Training loss: 1.0325 1.7424 sec/batch
Epoch 130/300 Iteration 1820/4200 Training loss: 1.0337 1.7376 sec/batch
Epoch 131/300 Iteration 1821/4200 Training loss: 1.0692 1.7382 sec/batch
Epoch 131/300 Iteration 1822/4200 Training loss: 1.0362 1.7390 sec/batch
Epoch 131/300 Iteration 1823/4200 Training loss: 1.0221 1.7391 sec/batch
Epoch 131/300 Iteration 1824/4200 Training loss: 1.0227 1.7373 sec/batch
Epoch 131/300 Iteration 1825/4200 Training loss: 1.0225 1.7420 sec/batch
Epoch 131/300 Iteration 1826/4200 Training loss: 1.0200 1.7382 sec/batch
Epoch 131/300 Iteration 1827/4200 Training loss: 1.0163 1.7390 sec/batch
Epoch 131/300 Iteration 1828/4200 Training loss: 1.0159 1.7383 sec/batch
Epoch 131/300 Iteration 1829/4200 Training loss: 1.0132 1.7430 sec/batch
Epoch 131/300 Iteration 1830/4200 Training loss: 1.0130 1.7363 sec/batch
Epoch 131/300 Iteration 1831/4200 Training loss: 1.0172 1.7410 sec/batch
Epoch 131/300 Iteration 1832/4200 Training loss: 1.0170 1.7382 sec/batch
Epoch 131/300 Iteration 1833/4200 Training loss: 1.0181 1.7374 sec/batch
Epoch 131/300 Iteration 1834/4200 Training loss: 1.0189 1.7389 sec/batch
Epoch 132/300 Iteration 1835/4200 Training loss: 1.0722 1.7380 sec/batch
Epoch 132/300 Iteration 1836/4200 Training loss: 1.0381 1.7382 sec/batch
Epoch 132/300 Iteration 1837/4200 Training loss: 1.0260 1.7372 sec/batch
Epoch 132/300 Iteration 1838/4200 Training loss: 1.0242 1.7373 sec/batch
Epoch 132/300 Iteration 1839/4200 Training loss: 1.0227 1.7401 sec/batch
Epoch 132/300 Iteration 1840/4200 Training loss: 1.0197 1.7395 sec/batch
Epoch 132/300 Iteration 1841/4200 Training loss: 1.0160 1.7395 sec/batch
Epoch 132/300 Iteration 1842/4200 Training loss: 1.0146 1.7392 sec/batch
Epoch 132/300 Iteration 1843/4200 Training loss: 1.0129 1.7366 sec/batch
Epoch 132/300 Iteration 1844/4200 Training loss: 1.0126 1.7391 sec/batch
Epoch 132/300 Iteration 1845/4200 Training loss: 1.0132 1.7431 sec/batch
Epoch 132/300 Iteration 1846/4200 Training loss: 1.0132 1.7423 sec/batch
Epoch 132/300 Iteration 1847/4200 Training loss: 1.0145 1.7548 sec/batch
Epoch 132/300 Iteration 1848/4200 Training loss: 1.0154 1.7361 sec/batch
Epoch 133/300 Iteration 1849/4200 Training loss: 1.0670 1.7591 sec/batch
Epoch 133/300 Iteration 1850/4200 Training loss: 1.0370 2.0971 sec/batch
Epoch 133/300 Iteration 1851/4200 Training loss: 1.0226 2.0995 sec/batch
Epoch 133/300 Iteration 1852/4200 Training loss: 1.0197 2.0345 sec/batch
Epoch 133/300 Iteration 1853/4200 Training loss: 1.0176 2.0320 sec/batch
Epoch 133/300 Iteration 1854/4200 Training loss: 1.0136 2.0381 sec/batch
Epoch 133/300 Iteration 1855/4200 Training loss: 1.0082 1.7447 sec/batch
Epoch 133/300 Iteration 1856/4200 Training loss: 1.0057 1.7394 sec/batch
Epoch 133/300 Iteration 1857/4200 Training loss: 1.0036 1.7381 sec/batch
Epoch 133/300 Iteration 1858/4200 Training loss: 1.0055 1.7395 sec/batch
Epoch 133/300 Iteration 1859/4200 Training loss: 1.0076 1.7480 sec/batch
Epoch 133/300 Iteration 1860/4200 Training loss: 1.0071 1.7416 sec/batch
Epoch 133/300 Iteration 1861/4200 Training loss: 1.0087 1.7448 sec/batch
Epoch 133/300 Iteration 1862/4200 Training loss: 1.0095 1.7393 sec/batch
Epoch 134/300 Iteration 1863/4200 Training loss: 1.0501 1.7394 sec/batch
Epoch 134/300 Iteration 1864/4200 Training loss: 1.0159 1.7398 sec/batch
Epoch 134/300 Iteration 1865/4200 Training loss: 1.0102 1.7407 sec/batch
Epoch 134/300 Iteration 1866/4200 Training loss: 1.0095 1.7411 sec/batch
Epoch 134/300 Iteration 1867/4200 Training loss: 1.0097 1.7395 sec/batch
Epoch 134/300 Iteration 1868/4200 Training loss: 1.0066 1.7436 sec/batch
Epoch 134/300 Iteration 1869/4200 Training loss: 1.0023 1.7423 sec/batch
Epoch 134/300 Iteration 1870/4200 Training loss: 1.0013 1.7401 sec/batch
Epoch 134/300 Iteration 1871/4200 Training loss: 1.0012 1.7377 sec/batch
Epoch 134/300 Iteration 1872/4200 Training loss: 1.0023 1.7433 sec/batch
Epoch 134/300 Iteration 1873/4200 Training loss: 1.0059 1.7411 sec/batch
Epoch 134/300 Iteration 1874/4200 Training loss: 1.0058 1.7382 sec/batch
Epoch 134/300 Iteration 1875/4200 Training loss: 1.0067 1.7393 sec/batch
Epoch 134/300 Iteration 1876/4200 Training loss: 1.0069 1.8204 sec/batch
Epoch 135/300 Iteration 1877/4200 Training loss: 1.0439 1.7390 sec/batch
Epoch 135/300 Iteration 1878/4200 Training loss: 1.0125 1.7380 sec/batch
Epoch 135/300 Iteration 1879/4200 Training loss: 0.9994 1.7573 sec/batch
Epoch 135/300 Iteration 1880/4200 Training loss: 0.9996 1.7424 sec/batch
Epoch 135/300 Iteration 1881/4200 Training loss: 0.9991 1.7381 sec/batch
Epoch 135/300 Iteration 1882/4200 Training loss: 0.9996 1.8030 sec/batch
Epoch 135/300 Iteration 1883/4200 Training loss: 0.9983 1.7402 sec/batch
Epoch 135/300 Iteration 1884/4200 Training loss: 0.9967 1.7426 sec/batch
Epoch 135/300 Iteration 1885/4200 Training loss: 0.9945 1.7507 sec/batch
Epoch 135/300 Iteration 1886/4200 Training loss: 0.9953 1.7373 sec/batch
Epoch 135/300 Iteration 1887/4200 Training loss: 0.9980 1.7427 sec/batch
Epoch 135/300 Iteration 1888/4200 Training loss: 0.9976 1.7377 sec/batch
Epoch 135/300 Iteration 1889/4200 Training loss: 0.9992 1.7423 sec/batch
Epoch 135/300 Iteration 1890/4200 Training loss: 1.0014 1.7395 sec/batch
Epoch 136/300 Iteration 1891/4200 Training loss: 1.0510 1.7379 sec/batch
Epoch 136/300 Iteration 1892/4200 Training loss: 1.0134 1.8126 sec/batch
Epoch 136/300 Iteration 1893/4200 Training loss: 0.9995 1.7483 sec/batch
Epoch 136/300 Iteration 1894/4200 Training loss: 0.9998 1.7384 sec/batch
Epoch 136/300 Iteration 1895/4200 Training loss: 1.0013 1.7465 sec/batch
Epoch 136/300 Iteration 1896/4200 Training loss: 0.9980 1.7395 sec/batch
Epoch 136/300 Iteration 1897/4200 Training loss: 0.9936 1.7390 sec/batch
Epoch 136/300 Iteration 1898/4200 Training loss: 0.9921 1.7414 sec/batch
Epoch 136/300 Iteration 1899/4200 Training loss: 0.9897 1.7385 sec/batch
Epoch 136/300 Iteration 1900/4200 Training loss: 0.9897 1.7390 sec/batch
Validation loss: 1.87836 Saving checkpoint!
Epoch 136/300 Iteration 1901/4200 Training loss: 1.0179 3.2221 sec/batch
Epoch 136/300 Iteration 1902/4200 Training loss: 1.0151 1.8484 sec/batch
Epoch 136/300 Iteration 1903/4200 Training loss: 1.0143 1.6598 sec/batch
Epoch 136/300 Iteration 1904/4200 Training loss: 1.0127 1.6919 sec/batch
Epoch 137/300 Iteration 1905/4200 Training loss: 1.0306 1.6936 sec/batch
Epoch 137/300 Iteration 1906/4200 Training loss: 0.9946 1.6947 sec/batch
Epoch 137/300 Iteration 1907/4200 Training loss: 0.9887 1.7589 sec/batch
Epoch 137/300 Iteration 1908/4200 Training loss: 0.9852 1.7148 sec/batch
Epoch 137/300 Iteration 1909/4200 Training loss: 0.9885 1.7426 sec/batch
Epoch 137/300 Iteration 1910/4200 Training loss: 0.9874 1.7834 sec/batch
Epoch 137/300 Iteration 1911/4200 Training loss: 0.9829 1.7408 sec/batch
Epoch 137/300 Iteration 1912/4200 Training loss: 0.9816 1.8213 sec/batch
Epoch 137/300 Iteration 1913/4200 Training loss: 0.9812 1.7428 sec/batch
Epoch 137/300 Iteration 1914/4200 Training loss: 0.9819 1.7408 sec/batch
Epoch 137/300 Iteration 1915/4200 Training loss: 0.9826 1.7030 sec/batch
Epoch 137/300 Iteration 1916/4200 Training loss: 0.9825 1.7279 sec/batch
Epoch 137/300 Iteration 1917/4200 Training loss: 0.9839 1.7440 sec/batch
Epoch 137/300 Iteration 1918/4200 Training loss: 0.9854 1.7182 sec/batch
Epoch 138/300 Iteration 1919/4200 Training loss: 1.0335 1.7429 sec/batch
Epoch 138/300 Iteration 1920/4200 Training loss: 0.9985 1.7427 sec/batch
Epoch 138/300 Iteration 1921/4200 Training loss: 0.9888 1.7391 sec/batch
Epoch 138/300 Iteration 1922/4200 Training loss: 0.9899 1.7421 sec/batch
Epoch 138/300 Iteration 1923/4200 Training loss: 0.9892 1.7428 sec/batch
Epoch 138/300 Iteration 1924/4200 Training loss: 0.9856 1.8038 sec/batch
Epoch 138/300 Iteration 1925/4200 Training loss: 0.9823 1.7453 sec/batch
Epoch 138/300 Iteration 1926/4200 Training loss: 0.9818 1.7392 sec/batch
Epoch 138/300 Iteration 1927/4200 Training loss: 0.9794 1.7567 sec/batch
Epoch 138/300 Iteration 1928/4200 Training loss: 0.9799 1.7389 sec/batch
Epoch 138/300 Iteration 1929/4200 Training loss: 0.9816 1.7385 sec/batch
Epoch 138/300 Iteration 1930/4200 Training loss: 0.9818 1.7401 sec/batch
Epoch 138/300 Iteration 1931/4200 Training loss: 0.9830 1.7485 sec/batch
Epoch 138/300 Iteration 1932/4200 Training loss: 0.9830 1.7386 sec/batch
Epoch 139/300 Iteration 1933/4200 Training loss: 1.0284 1.8031 sec/batch
Epoch 139/300 Iteration 1934/4200 Training loss: 0.9937 1.7484 sec/batch
Epoch 139/300 Iteration 1935/4200 Training loss: 0.9850 1.7943 sec/batch
Epoch 139/300 Iteration 1936/4200 Training loss: 0.9846 1.7993 sec/batch
Epoch 139/300 Iteration 1937/4200 Training loss: 0.9821 1.7999 sec/batch
Epoch 139/300 Iteration 1938/4200 Training loss: 0.9775 1.7413 sec/batch
Epoch 139/300 Iteration 1939/4200 Training loss: 0.9743 1.7398 sec/batch
Epoch 139/300 Iteration 1940/4200 Training loss: 0.9731 1.7399 sec/batch
Epoch 139/300 Iteration 1941/4200 Training loss: 0.9709 1.7453 sec/batch
Epoch 139/300 Iteration 1942/4200 Training loss: 0.9709 1.7411 sec/batch
Epoch 139/300 Iteration 1943/4200 Training loss: 0.9737 1.7388 sec/batch
Epoch 139/300 Iteration 1944/4200 Training loss: 0.9743 1.7387 sec/batch
Epoch 139/300 Iteration 1945/4200 Training loss: 0.9760 1.7376 sec/batch
Epoch 139/300 Iteration 1946/4200 Training loss: 0.9759 1.7405 sec/batch
Epoch 140/300 Iteration 1947/4200 Training loss: 1.0285 1.7401 sec/batch
Epoch 140/300 Iteration 1948/4200 Training loss: 0.9932 1.7404 sec/batch
Epoch 140/300 Iteration 1949/4200 Training loss: 0.9831 1.7427 sec/batch
Epoch 140/300 Iteration 1950/4200 Training loss: 0.9801 1.7406 sec/batch
Epoch 140/300 Iteration 1951/4200 Training loss: 0.9768 1.7443 sec/batch
Epoch 140/300 Iteration 1952/4200 Training loss: 0.9740 1.7428 sec/batch
Epoch 140/300 Iteration 1953/4200 Training loss: 0.9701 1.7421 sec/batch
Epoch 140/300 Iteration 1954/4200 Training loss: 0.9677 1.7416 sec/batch
Epoch 140/300 Iteration 1955/4200 Training loss: 0.9640 1.7423 sec/batch
Epoch 140/300 Iteration 1956/4200 Training loss: 0.9637 1.7369 sec/batch
Epoch 140/300 Iteration 1957/4200 Training loss: 0.9660 1.7406 sec/batch
Epoch 140/300 Iteration 1958/4200 Training loss: 0.9655 1.7389 sec/batch
Epoch 140/300 Iteration 1959/4200 Training loss: 0.9663 1.7437 sec/batch
Epoch 140/300 Iteration 1960/4200 Training loss: 0.9657 1.7385 sec/batch
Epoch 141/300 Iteration 1961/4200 Training loss: 1.0030 1.7437 sec/batch
Epoch 141/300 Iteration 1962/4200 Training loss: 0.9676 1.7420 sec/batch
Epoch 141/300 Iteration 1963/4200 Training loss: 0.9592 1.7385 sec/batch
Epoch 141/300 Iteration 1964/4200 Training loss: 0.9601 1.7373 sec/batch
Epoch 141/300 Iteration 1965/4200 Training loss: 0.9591 1.7455 sec/batch
Epoch 141/300 Iteration 1966/4200 Training loss: 0.9606 1.7824 sec/batch
Epoch 141/300 Iteration 1967/4200 Training loss: 0.9571 1.7966 sec/batch
Epoch 141/300 Iteration 1968/4200 Training loss: 0.9553 1.7997 sec/batch
Epoch 141/300 Iteration 1969/4200 Training loss: 0.9533 1.7983 sec/batch
Epoch 141/300 Iteration 1970/4200 Training loss: 0.9534 1.7936 sec/batch
Epoch 141/300 Iteration 1971/4200 Training loss: 0.9558 1.7933 sec/batch
Epoch 141/300 Iteration 1972/4200 Training loss: 0.9554 1.8011 sec/batch
Epoch 141/300 Iteration 1973/4200 Training loss: 0.9577 1.7940 sec/batch
Epoch 141/300 Iteration 1974/4200 Training loss: 0.9592 1.7926 sec/batch
Epoch 142/300 Iteration 1975/4200 Training loss: 1.0059 1.8684 sec/batch
Epoch 142/300 Iteration 1976/4200 Training loss: 0.9685 1.7445 sec/batch
Epoch 142/300 Iteration 1977/4200 Training loss: 0.9554 1.7420 sec/batch
Epoch 142/300 Iteration 1978/4200 Training loss: 0.9535 1.7511 sec/batch
Epoch 142/300 Iteration 1979/4200 Training loss: 0.9522 1.7554 sec/batch
Epoch 142/300 Iteration 1980/4200 Training loss: 0.9525 1.7562 sec/batch
Epoch 142/300 Iteration 1981/4200 Training loss: 0.9507 1.7412 sec/batch
Epoch 142/300 Iteration 1982/4200 Training loss: 0.9492 1.7390 sec/batch
Epoch 142/300 Iteration 1983/4200 Training loss: 0.9474 1.7430 sec/batch
Epoch 142/300 Iteration 1984/4200 Training loss: 0.9482 1.7420 sec/batch
Epoch 142/300 Iteration 1985/4200 Training loss: 0.9503 1.7446 sec/batch
Epoch 142/300 Iteration 1986/4200 Training loss: 0.9508 1.7378 sec/batch
Epoch 142/300 Iteration 1987/4200 Training loss: 0.9524 1.7393 sec/batch
Epoch 142/300 Iteration 1988/4200 Training loss: 0.9535 1.7384 sec/batch
Epoch 143/300 Iteration 1989/4200 Training loss: 0.9968 1.7380 sec/batch
Epoch 143/300 Iteration 1990/4200 Training loss: 0.9645 1.7393 sec/batch
Epoch 143/300 Iteration 1991/4200 Training loss: 0.9545 1.7428 sec/batch
Epoch 143/300 Iteration 1992/4200 Training loss: 0.9554 1.7386 sec/batch
Epoch 143/300 Iteration 1993/4200 Training loss: 0.9562 1.7439 sec/batch
Epoch 143/300 Iteration 1994/4200 Training loss: 0.9556 1.7401 sec/batch
Epoch 143/300 Iteration 1995/4200 Training loss: 0.9522 1.7434 sec/batch
Epoch 143/300 Iteration 1996/4200 Training loss: 0.9498 1.7395 sec/batch
Epoch 143/300 Iteration 1997/4200 Training loss: 0.9474 1.7407 sec/batch
Epoch 143/300 Iteration 1998/4200 Training loss: 0.9483 1.7435 sec/batch
Epoch 143/300 Iteration 1999/4200 Training loss: 0.9492 1.7396 sec/batch
Epoch 143/300 Iteration 2000/4200 Training loss: 0.9487 1.7398 sec/batch
Validation loss: 1.9385 Saving checkpoint!
Epoch 143/300 Iteration 2001/4200 Training loss: 0.9729 3.1813 sec/batch
Epoch 143/300 Iteration 2002/4200 Training loss: 0.9723 1.8530 sec/batch
Epoch 144/300 Iteration 2003/4200 Training loss: 0.9980 1.6762 sec/batch
Epoch 144/300 Iteration 2004/4200 Training loss: 0.9634 1.6909 sec/batch
Epoch 144/300 Iteration 2005/4200 Training loss: 0.9515 1.7049 sec/batch
Epoch 144/300 Iteration 2006/4200 Training loss: 0.9515 1.7418 sec/batch
Epoch 144/300 Iteration 2007/4200 Training loss: 0.9527 1.7433 sec/batch
Epoch 144/300 Iteration 2008/4200 Training loss: 0.9516 1.7405 sec/batch
Epoch 144/300 Iteration 2009/4200 Training loss: 0.9478 1.7404 sec/batch
Epoch 144/300 Iteration 2010/4200 Training loss: 0.9440 1.7442 sec/batch
Epoch 144/300 Iteration 2011/4200 Training loss: 0.9425 1.7393 sec/batch
Epoch 144/300 Iteration 2012/4200 Training loss: 0.9424 1.7392 sec/batch
Epoch 144/300 Iteration 2013/4200 Training loss: 0.9434 1.6944 sec/batch
Epoch 144/300 Iteration 2014/4200 Training loss: 0.9430 1.7311 sec/batch
Epoch 144/300 Iteration 2015/4200 Training loss: 0.9443 1.7403 sec/batch
Epoch 144/300 Iteration 2016/4200 Training loss: 0.9450 1.7389 sec/batch
Epoch 145/300 Iteration 2017/4200 Training loss: 1.0065 1.7383 sec/batch
Epoch 145/300 Iteration 2018/4200 Training loss: 0.9592 1.7371 sec/batch
Epoch 145/300 Iteration 2019/4200 Training loss: 0.9444 1.7406 sec/batch
Epoch 145/300 Iteration 2020/4200 Training loss: 0.9432 1.7390 sec/batch
Epoch 145/300 Iteration 2021/4200 Training loss: 0.9440 1.7426 sec/batch
Epoch 145/300 Iteration 2022/4200 Training loss: 0.9412 1.7398 sec/batch
Epoch 145/300 Iteration 2023/4200 Training loss: 0.9371 1.7389 sec/batch
Epoch 145/300 Iteration 2024/4200 Training loss: 0.9342 1.7459 sec/batch
Epoch 145/300 Iteration 2025/4200 Training loss: 0.9330 1.7490 sec/batch
Epoch 145/300 Iteration 2026/4200 Training loss: 0.9335 1.6992 sec/batch
Epoch 145/300 Iteration 2027/4200 Training loss: 0.9342 1.7422 sec/batch
Epoch 145/300 Iteration 2028/4200 Training loss: 0.9321 1.7397 sec/batch
Epoch 145/300 Iteration 2029/4200 Training loss: 0.9343 1.7399 sec/batch
Epoch 145/300 Iteration 2030/4200 Training loss: 0.9344 1.7392 sec/batch
Epoch 146/300 Iteration 2031/4200 Training loss: 0.9778 1.7435 sec/batch
Epoch 146/300 Iteration 2032/4200 Training loss: 0.9384 1.6962 sec/batch
Epoch 146/300 Iteration 2033/4200 Training loss: 0.9288 1.7420 sec/batch
Epoch 146/300 Iteration 2034/4200 Training loss: 0.9275 1.7389 sec/batch
Epoch 146/300 Iteration 2035/4200 Training loss: 0.9264 1.7410 sec/batch
Epoch 146/300 Iteration 2036/4200 Training loss: 0.9267 1.7398 sec/batch
Epoch 146/300 Iteration 2037/4200 Training loss: 0.9235 1.7423 sec/batch
Epoch 146/300 Iteration 2038/4200 Training loss: 0.9202 1.6920 sec/batch
Epoch 146/300 Iteration 2039/4200 Training loss: 0.9191 1.7112 sec/batch
Epoch 146/300 Iteration 2040/4200 Training loss: 0.9194 1.7410 sec/batch
Epoch 146/300 Iteration 2041/4200 Training loss: 0.9225 1.7390 sec/batch
Epoch 146/300 Iteration 2042/4200 Training loss: 0.9217 1.7407 sec/batch
Epoch 146/300 Iteration 2043/4200 Training loss: 0.9237 1.7415 sec/batch
Epoch 146/300 Iteration 2044/4200 Training loss: 0.9242 1.7404 sec/batch
Epoch 147/300 Iteration 2045/4200 Training loss: 0.9800 1.7651 sec/batch
Epoch 147/300 Iteration 2046/4200 Training loss: 0.9328 1.7403 sec/batch
Epoch 147/300 Iteration 2047/4200 Training loss: 0.9196 1.7549 sec/batch
Epoch 147/300 Iteration 2048/4200 Training loss: 0.9176 1.7404 sec/batch
Epoch 147/300 Iteration 2049/4200 Training loss: 0.9179 1.7434 sec/batch
Epoch 147/300 Iteration 2050/4200 Training loss: 0.9159 1.7392 sec/batch
Epoch 147/300 Iteration 2051/4200 Training loss: 0.9150 1.7406 sec/batch
Epoch 147/300 Iteration 2052/4200 Training loss: 0.9152 1.7432 sec/batch
Epoch 147/300 Iteration 2053/4200 Training loss: 0.9130 1.7401 sec/batch
Epoch 147/300 Iteration 2054/4200 Training loss: 0.9136 1.7381 sec/batch
Epoch 147/300 Iteration 2055/4200 Training loss: 0.9151 1.7373 sec/batch
Epoch 147/300 Iteration 2056/4200 Training loss: 0.9141 1.7377 sec/batch
Epoch 147/300 Iteration 2057/4200 Training loss: 0.9164 1.7405 sec/batch
Epoch 147/300 Iteration 2058/4200 Training loss: 0.9159 1.7415 sec/batch
Epoch 148/300 Iteration 2059/4200 Training loss: 0.9475 1.7518 sec/batch
Epoch 148/300 Iteration 2060/4200 Training loss: 0.9253 1.7388 sec/batch
Epoch 148/300 Iteration 2061/4200 Training loss: 0.9127 1.7384 sec/batch
Epoch 148/300 Iteration 2062/4200 Training loss: 0.9093 1.7388 sec/batch
Epoch 148/300 Iteration 2063/4200 Training loss: 0.9089 1.7429 sec/batch
Epoch 148/300 Iteration 2064/4200 Training loss: 0.9046 1.7411 sec/batch
Epoch 148/300 Iteration 2065/4200 Training loss: 0.9027 1.7407 sec/batch
Epoch 148/300 Iteration 2066/4200 Training loss: 0.9020 1.7440 sec/batch
Epoch 148/300 Iteration 2067/4200 Training loss: 0.9023 1.7395 sec/batch
Epoch 148/300 Iteration 2068/4200 Training loss: 0.9035 1.7410 sec/batch
Epoch 148/300 Iteration 2069/4200 Training loss: 0.9048 1.7395 sec/batch
Epoch 148/300 Iteration 2070/4200 Training loss: 0.9036 1.7375 sec/batch
Epoch 148/300 Iteration 2071/4200 Training loss: 0.9053 1.7383 sec/batch
Epoch 148/300 Iteration 2072/4200 Training loss: 0.9058 1.7439 sec/batch
Epoch 149/300 Iteration 2073/4200 Training loss: 0.9666 1.7389 sec/batch
Epoch 149/300 Iteration 2074/4200 Training loss: 0.9238 1.7382 sec/batch
Epoch 149/300 Iteration 2075/4200 Training loss: 0.9117 1.7380 sec/batch
Epoch 149/300 Iteration 2076/4200 Training loss: 0.9105 1.7436 sec/batch
Epoch 149/300 Iteration 2077/4200 Training loss: 0.9113 1.7404 sec/batch
Epoch 149/300 Iteration 2078/4200 Training loss: 0.9107 1.7397 sec/batch
Epoch 149/300 Iteration 2079/4200 Training loss: 0.9074 1.7382 sec/batch
Epoch 149/300 Iteration 2080/4200 Training loss: 0.9042 1.7410 sec/batch
Epoch 149/300 Iteration 2081/4200 Training loss: 0.9018 1.7380 sec/batch
Epoch 149/300 Iteration 2082/4200 Training loss: 0.9022 1.7392 sec/batch
Epoch 149/300 Iteration 2083/4200 Training loss: 0.9052 1.7455 sec/batch
Epoch 149/300 Iteration 2084/4200 Training loss: 0.9041 1.7385 sec/batch
Epoch 149/300 Iteration 2085/4200 Training loss: 0.9056 1.7384 sec/batch
Epoch 149/300 Iteration 2086/4200 Training loss: 0.9059 1.7370 sec/batch
Epoch 150/300 Iteration 2087/4200 Training loss: 0.9426 1.7391 sec/batch
Epoch 150/300 Iteration 2088/4200 Training loss: 0.9070 1.7416 sec/batch
Epoch 150/300 Iteration 2089/4200 Training loss: 0.8962 1.7413 sec/batch
Epoch 150/300 Iteration 2090/4200 Training loss: 0.8959 1.7432 sec/batch
Epoch 150/300 Iteration 2091/4200 Training loss: 0.8949 1.7386 sec/batch
Epoch 150/300 Iteration 2092/4200 Training loss: 0.8942 1.7399 sec/batch
Epoch 150/300 Iteration 2093/4200 Training loss: 0.8894 1.7448 sec/batch
Epoch 150/300 Iteration 2094/4200 Training loss: 0.8895 1.7429 sec/batch
Epoch 150/300 Iteration 2095/4200 Training loss: 0.8871 1.7412 sec/batch
Epoch 150/300 Iteration 2096/4200 Training loss: 0.8874 1.7406 sec/batch
Epoch 150/300 Iteration 2097/4200 Training loss: 0.8880 1.7388 sec/batch
Epoch 150/300 Iteration 2098/4200 Training loss: 0.8871 1.7392 sec/batch
Epoch 150/300 Iteration 2099/4200 Training loss: 0.8885 1.7369 sec/batch
Epoch 150/300 Iteration 2100/4200 Training loss: 0.8895 1.7431 sec/batch
Validation loss: 2.00595 Saving checkpoint!
Epoch 151/300 Iteration 2101/4200 Training loss: 0.9523 3.1944 sec/batch
Epoch 151/300 Iteration 2102/4200 Training loss: 0.9073 1.8460 sec/batch
Epoch 151/300 Iteration 2103/4200 Training loss: 0.8934 1.6680 sec/batch
Epoch 151/300 Iteration 2104/4200 Training loss: 0.8932 1.6951 sec/batch
Epoch 151/300 Iteration 2105/4200 Training loss: 0.8945 1.6883 sec/batch
Epoch 151/300 Iteration 2106/4200 Training loss: 0.8922 1.7100 sec/batch
Epoch 151/300 Iteration 2107/4200 Training loss: 0.8898 1.7401 sec/batch
Epoch 151/300 Iteration 2108/4200 Training loss: 0.8868 1.7468 sec/batch
Epoch 151/300 Iteration 2109/4200 Training loss: 0.8845 1.7381 sec/batch
Epoch 151/300 Iteration 2110/4200 Training loss: 0.8840 1.7435 sec/batch
Epoch 151/300 Iteration 2111/4200 Training loss: 0.8855 1.7463 sec/batch
Epoch 151/300 Iteration 2112/4200 Training loss: 0.8841 1.7490 sec/batch
Epoch 151/300 Iteration 2113/4200 Training loss: 0.8862 1.7563 sec/batch
Epoch 151/300 Iteration 2114/4200 Training loss: 0.8875 1.7442 sec/batch
Epoch 152/300 Iteration 2115/4200 Training loss: 0.9163 1.7398 sec/batch
Epoch 152/300 Iteration 2116/4200 Training loss: 0.8864 1.7389 sec/batch
Epoch 152/300 Iteration 2117/4200 Training loss: 0.8812 1.7379 sec/batch
Epoch 152/300 Iteration 2118/4200 Training loss: 0.8820 1.7374 sec/batch
Epoch 152/300 Iteration 2119/4200 Training loss: 0.8846 1.7407 sec/batch
Epoch 152/300 Iteration 2120/4200 Training loss: 0.8828 1.7390 sec/batch
Epoch 152/300 Iteration 2121/4200 Training loss: 0.8802 1.7374 sec/batch
Epoch 152/300 Iteration 2122/4200 Training loss: 0.8808 1.7394 sec/batch
Epoch 152/300 Iteration 2123/4200 Training loss: 0.8791 1.7395 sec/batch
Epoch 152/300 Iteration 2124/4200 Training loss: 0.8795 1.7414 sec/batch
Epoch 152/300 Iteration 2125/4200 Training loss: 0.8811 1.7395 sec/batch
Epoch 152/300 Iteration 2126/4200 Training loss: 0.8813 1.7427 sec/batch
Epoch 152/300 Iteration 2127/4200 Training loss: 0.8826 1.7429 sec/batch
Epoch 152/300 Iteration 2128/4200 Training loss: 0.8832 1.7440 sec/batch
Epoch 153/300 Iteration 2129/4200 Training loss: 0.9235 1.7398 sec/batch
Epoch 153/300 Iteration 2130/4200 Training loss: 0.8847 1.8048 sec/batch
Epoch 153/300 Iteration 2131/4200 Training loss: 0.8735 1.8074 sec/batch
Epoch 153/300 Iteration 2132/4200 Training loss: 0.8745 1.7062 sec/batch
Epoch 153/300 Iteration 2133/4200 Training loss: 0.8758 1.7401 sec/batch
Epoch 153/300 Iteration 2134/4200 Training loss: 0.8753 1.7507 sec/batch
Epoch 153/300 Iteration 2135/4200 Training loss: 0.8726 1.7400 sec/batch
Epoch 153/300 Iteration 2136/4200 Training loss: 0.8725 1.7398 sec/batch
Epoch 153/300 Iteration 2137/4200 Training loss: 0.8716 1.8084 sec/batch
Epoch 153/300 Iteration 2138/4200 Training loss: 0.8714 1.7424 sec/batch
Epoch 153/300 Iteration 2139/4200 Training loss: 0.8725 1.7428 sec/batch
Epoch 153/300 Iteration 2140/4200 Training loss: 0.8731 1.7508 sec/batch
Epoch 153/300 Iteration 2141/4200 Training loss: 0.8753 1.6961 sec/batch
Epoch 153/300 Iteration 2142/4200 Training loss: 0.8754 1.7274 sec/batch
Epoch 154/300 Iteration 2143/4200 Training loss: 0.9196 1.8046 sec/batch
Epoch 154/300 Iteration 2144/4200 Training loss: 0.8870 1.7650 sec/batch
Epoch 154/300 Iteration 2145/4200 Training loss: 0.8775 1.7395 sec/batch
Epoch 154/300 Iteration 2146/4200 Training loss: 0.8779 1.7402 sec/batch
Epoch 154/300 Iteration 2147/4200 Training loss: 0.8769 1.7485 sec/batch
Epoch 154/300 Iteration 2148/4200 Training loss: 0.8755 1.7390 sec/batch
Epoch 154/300 Iteration 2149/4200 Training loss: 0.8722 1.7428 sec/batch
Epoch 154/300 Iteration 2150/4200 Training loss: 0.8706 1.7404 sec/batch
Epoch 154/300 Iteration 2151/4200 Training loss: 0.8690 1.7926 sec/batch
Epoch 154/300 Iteration 2152/4200 Training loss: 0.8701 1.7395 sec/batch
Epoch 154/300 Iteration 2153/4200 Training loss: 0.8730 1.7375 sec/batch
Epoch 154/300 Iteration 2154/4200 Training loss: 0.8723 1.7521 sec/batch
Epoch 154/300 Iteration 2155/4200 Training loss: 0.8737 1.7392 sec/batch
Epoch 154/300 Iteration 2156/4200 Training loss: 0.8752 1.7388 sec/batch
Epoch 155/300 Iteration 2157/4200 Training loss: 0.9260 1.7402 sec/batch
Epoch 155/300 Iteration 2158/4200 Training loss: 0.8829 1.7401 sec/batch
Epoch 155/300 Iteration 2159/4200 Training loss: 0.8698 1.7434 sec/batch
Epoch 155/300 Iteration 2160/4200 Training loss: 0.8705 1.7433 sec/batch
Epoch 155/300 Iteration 2161/4200 Training loss: 0.8729 1.7386 sec/batch
Epoch 155/300 Iteration 2162/4200 Training loss: 0.8719 1.7395 sec/batch
Epoch 155/300 Iteration 2163/4200 Training loss: 0.8693 1.7393 sec/batch
Epoch 155/300 Iteration 2164/4200 Training loss: 0.8687 1.7399 sec/batch
Epoch 155/300 Iteration 2165/4200 Training loss: 0.8689 1.7396 sec/batch
Epoch 155/300 Iteration 2166/4200 Training loss: 0.8693 1.7456 sec/batch
Epoch 155/300 Iteration 2167/4200 Training loss: 0.8706 1.7372 sec/batch
Epoch 155/300 Iteration 2168/4200 Training loss: 0.8696 1.7391 sec/batch
Epoch 155/300 Iteration 2169/4200 Training loss: 0.8711 1.7442 sec/batch
Epoch 155/300 Iteration 2170/4200 Training loss: 0.8714 1.7416 sec/batch
Epoch 156/300 Iteration 2171/4200 Training loss: 0.9158 1.7425 sec/batch
Epoch 156/300 Iteration 2172/4200 Training loss: 0.8769 1.7421 sec/batch
Epoch 156/300 Iteration 2173/4200 Training loss: 0.8688 1.7381 sec/batch
Epoch 156/300 Iteration 2174/4200 Training loss: 0.8668 1.7393 sec/batch
Epoch 156/300 Iteration 2175/4200 Training loss: 0.8648 1.7393 sec/batch
Epoch 156/300 Iteration 2176/4200 Training loss: 0.8635 1.7398 sec/batch
Epoch 156/300 Iteration 2177/4200 Training loss: 0.8609 1.7392 sec/batch
Epoch 156/300 Iteration 2178/4200 Training loss: 0.8590 1.7416 sec/batch
Epoch 156/300 Iteration 2179/4200 Training loss: 0.8579 1.7395 sec/batch
Epoch 156/300 Iteration 2180/4200 Training loss: 0.8584 1.7715 sec/batch
Epoch 156/300 Iteration 2181/4200 Training loss: 0.8585 1.7573 sec/batch
Epoch 156/300 Iteration 2182/4200 Training loss: 0.8579 1.7437 sec/batch
Epoch 156/300 Iteration 2183/4200 Training loss: 0.8609 1.7398 sec/batch
Epoch 156/300 Iteration 2184/4200 Training loss: 0.8623 1.7428 sec/batch
Epoch 157/300 Iteration 2185/4200 Training loss: 0.9139 1.7424 sec/batch
Epoch 157/300 Iteration 2186/4200 Training loss: 0.8750 1.7421 sec/batch
Epoch 157/300 Iteration 2187/4200 Training loss: 0.8643 1.7442 sec/batch
Epoch 157/300 Iteration 2188/4200 Training loss: 0.8616 1.7420 sec/batch
Epoch 157/300 Iteration 2189/4200 Training loss: 0.8624 1.7406 sec/batch
Epoch 157/300 Iteration 2190/4200 Training loss: 0.8617 1.7448 sec/batch
Epoch 157/300 Iteration 2191/4200 Training loss: 0.8587 1.7411 sec/batch
Epoch 157/300 Iteration 2192/4200 Training loss: 0.8559 1.7402 sec/batch
Epoch 157/300 Iteration 2193/4200 Training loss: 0.8546 1.7437 sec/batch
Epoch 157/300 Iteration 2194/4200 Training loss: 0.8562 1.7470 sec/batch
Epoch 157/300 Iteration 2195/4200 Training loss: 0.8578 1.7371 sec/batch
Epoch 157/300 Iteration 2196/4200 Training loss: 0.8573 1.7395 sec/batch
Epoch 157/300 Iteration 2197/4200 Training loss: 0.8599 1.7443 sec/batch
Epoch 157/300 Iteration 2198/4200 Training loss: 0.8610 1.7378 sec/batch
Epoch 158/300 Iteration 2199/4200 Training loss: 0.9063 1.7382 sec/batch
Epoch 158/300 Iteration 2200/4200 Training loss: 0.8735 1.7394 sec/batch
Validation loss: 2.04631 Saving checkpoint!
Epoch 158/300 Iteration 2201/4200 Training loss: 0.9865 3.2351 sec/batch
Epoch 158/300 Iteration 2202/4200 Training loss: 0.9539 1.8217 sec/batch
Epoch 158/300 Iteration 2203/4200 Training loss: 0.9345 1.6606 sec/batch
Epoch 158/300 Iteration 2204/4200 Training loss: 0.9203 1.6947 sec/batch
Epoch 158/300 Iteration 2205/4200 Training loss: 0.9092 1.6912 sec/batch
Epoch 158/300 Iteration 2206/4200 Training loss: 0.9038 1.6937 sec/batch
Epoch 158/300 Iteration 2207/4200 Training loss: 0.8987 1.6909 sec/batch
Epoch 158/300 Iteration 2208/4200 Training loss: 0.8958 1.6946 sec/batch
Epoch 158/300 Iteration 2209/4200 Training loss: 0.8946 1.7375 sec/batch
Epoch 158/300 Iteration 2210/4200 Training loss: 0.8912 1.7416 sec/batch
Epoch 158/300 Iteration 2211/4200 Training loss: 0.8891 1.7374 sec/batch
Epoch 158/300 Iteration 2212/4200 Training loss: 0.8880 1.7404 sec/batch
Epoch 159/300 Iteration 2213/4200 Training loss: 0.9053 1.7408 sec/batch
Epoch 159/300 Iteration 2214/4200 Training loss: 0.8693 1.7382 sec/batch
Epoch 159/300 Iteration 2215/4200 Training loss: 0.8558 1.7445 sec/batch
Epoch 159/300 Iteration 2216/4200 Training loss: 0.8548 1.7397 sec/batch
Epoch 159/300 Iteration 2217/4200 Training loss: 0.8570 1.7384 sec/batch
Epoch 159/300 Iteration 2218/4200 Training loss: 0.8560 1.7444 sec/batch
Epoch 159/300 Iteration 2219/4200 Training loss: 0.8548 1.7388 sec/batch
Epoch 159/300 Iteration 2220/4200 Training loss: 0.8550 1.7390 sec/batch
Epoch 159/300 Iteration 2221/4200 Training loss: 0.8542 1.7420 sec/batch
Epoch 159/300 Iteration 2222/4200 Training loss: 0.8550 1.7376 sec/batch
Epoch 159/300 Iteration 2223/4200 Training loss: 0.8567 1.7385 sec/batch
Epoch 159/300 Iteration 2224/4200 Training loss: 0.8558 1.7397 sec/batch
Epoch 159/300 Iteration 2225/4200 Training loss: 0.8576 1.7405 sec/batch
Epoch 159/300 Iteration 2226/4200 Training loss: 0.8588 1.7436 sec/batch
Epoch 160/300 Iteration 2227/4200 Training loss: 0.9126 1.7422 sec/batch
Epoch 160/300 Iteration 2228/4200 Training loss: 0.8666 1.7433 sec/batch
Epoch 160/300 Iteration 2229/4200 Training loss: 0.8572 1.6942 sec/batch
Epoch 160/300 Iteration 2230/4200 Training loss: 0.8547 1.7096 sec/batch
Epoch 160/300 Iteration 2231/4200 Training loss: 0.8545 1.7364 sec/batch
Epoch 160/300 Iteration 2232/4200 Training loss: 0.8554 1.7429 sec/batch
Epoch 160/300 Iteration 2233/4200 Training loss: 0.8553 1.7394 sec/batch
Epoch 160/300 Iteration 2234/4200 Training loss: 0.8536 1.7380 sec/batch
Epoch 160/300 Iteration 2235/4200 Training loss: 0.8505 1.7414 sec/batch
Epoch 160/300 Iteration 2236/4200 Training loss: 0.8518 1.7463 sec/batch
Epoch 160/300 Iteration 2237/4200 Training loss: 0.8526 1.7418 sec/batch
Epoch 160/300 Iteration 2238/4200 Training loss: 0.8511 1.7381 sec/batch
Epoch 160/300 Iteration 2239/4200 Training loss: 0.8527 1.7454 sec/batch
Epoch 160/300 Iteration 2240/4200 Training loss: 0.8540 1.8050 sec/batch
Epoch 161/300 Iteration 2241/4200 Training loss: 0.8961 1.7119 sec/batch
Epoch 161/300 Iteration 2242/4200 Training loss: 0.8570 1.7379 sec/batch
Epoch 161/300 Iteration 2243/4200 Training loss: 0.8495 1.7554 sec/batch
Epoch 161/300 Iteration 2244/4200 Training loss: 0.8475 1.7441 sec/batch
Epoch 161/300 Iteration 2245/4200 Training loss: 0.8491 1.7396 sec/batch
Epoch 161/300 Iteration 2246/4200 Training loss: 0.8478 1.7634 sec/batch
Epoch 161/300 Iteration 2247/4200 Training loss: 0.8463 1.7417 sec/batch
Epoch 161/300 Iteration 2248/4200 Training loss: 0.8455 1.7561 sec/batch
Epoch 161/300 Iteration 2249/4200 Training loss: 0.8446 1.7398 sec/batch
Epoch 161/300 Iteration 2250/4200 Training loss: 0.8461 1.7413 sec/batch
Epoch 161/300 Iteration 2251/4200 Training loss: 0.8475 1.7396 sec/batch
Epoch 161/300 Iteration 2252/4200 Training loss: 0.8477 1.7388 sec/batch
Epoch 161/300 Iteration 2253/4200 Training loss: 0.8503 1.7403 sec/batch
Epoch 161/300 Iteration 2254/4200 Training loss: 0.8503 1.7396 sec/batch
Epoch 162/300 Iteration 2255/4200 Training loss: 0.8772 1.7398 sec/batch
Epoch 162/300 Iteration 2256/4200 Training loss: 0.8529 1.7440 sec/batch
Epoch 162/300 Iteration 2257/4200 Training loss: 0.8435 1.7398 sec/batch
Epoch 162/300 Iteration 2258/4200 Training loss: 0.8442 1.7392 sec/batch
Epoch 162/300 Iteration 2259/4200 Training loss: 0.8435 1.7422 sec/batch
Epoch 162/300 Iteration 2260/4200 Training loss: 0.8442 1.6955 sec/batch
Epoch 162/300 Iteration 2261/4200 Training loss: 0.8424 1.7108 sec/batch
Epoch 162/300 Iteration 2262/4200 Training loss: 0.8419 1.7392 sec/batch
Epoch 162/300 Iteration 2263/4200 Training loss: 0.8388 1.7407 sec/batch
Epoch 162/300 Iteration 2264/4200 Training loss: 0.8404 1.7383 sec/batch
Epoch 162/300 Iteration 2265/4200 Training loss: 0.8420 1.7390 sec/batch
Epoch 162/300 Iteration 2266/4200 Training loss: 0.8410 1.7077 sec/batch
Epoch 162/300 Iteration 2267/4200 Training loss: 0.8421 1.7408 sec/batch
Epoch 162/300 Iteration 2268/4200 Training loss: 0.8426 1.7369 sec/batch
Epoch 163/300 Iteration 2269/4200 Training loss: 0.8937 1.7381 sec/batch
Epoch 163/300 Iteration 2270/4200 Training loss: 0.8578 1.7385 sec/batch
Epoch 163/300 Iteration 2271/4200 Training loss: 0.8519 1.7388 sec/batch
Epoch 163/300 Iteration 2272/4200 Training loss: 0.8487 1.7386 sec/batch
Epoch 163/300 Iteration 2273/4200 Training loss: 0.8449 1.7381 sec/batch
Epoch 163/300 Iteration 2274/4200 Training loss: 0.8439 1.7461 sec/batch
Epoch 163/300 Iteration 2275/4200 Training loss: 0.8411 1.7406 sec/batch
Epoch 163/300 Iteration 2276/4200 Training loss: 0.8405 1.7376 sec/batch
Epoch 163/300 Iteration 2277/4200 Training loss: 0.8379 1.7803 sec/batch
Epoch 163/300 Iteration 2278/4200 Training loss: 0.8400 1.7936 sec/batch
Epoch 163/300 Iteration 2279/4200 Training loss: 0.8409 1.7972 sec/batch
Epoch 163/300 Iteration 2280/4200 Training loss: 0.8395 1.7934 sec/batch
Epoch 163/300 Iteration 2281/4200 Training loss: 0.8403 1.7381 sec/batch
Epoch 163/300 Iteration 2282/4200 Training loss: 0.8403 1.7415 sec/batch
Epoch 164/300 Iteration 2283/4200 Training loss: 0.8938 1.7424 sec/batch
Epoch 164/300 Iteration 2284/4200 Training loss: 0.8612 1.7373 sec/batch
Epoch 164/300 Iteration 2285/4200 Training loss: 0.8469 1.7403 sec/batch
Epoch 164/300 Iteration 2286/4200 Training loss: 0.8456 1.7415 sec/batch
Epoch 164/300 Iteration 2287/4200 Training loss: 0.8443 1.7435 sec/batch
Epoch 164/300 Iteration 2288/4200 Training loss: 0.8416 1.7420 sec/batch
Epoch 164/300 Iteration 2289/4200 Training loss: 0.8377 1.7382 sec/batch
Epoch 164/300 Iteration 2290/4200 Training loss: 0.8344 1.7431 sec/batch
Epoch 164/300 Iteration 2291/4200 Training loss: 0.8340 1.7430 sec/batch
Epoch 164/300 Iteration 2292/4200 Training loss: 0.8334 1.7386 sec/batch
Epoch 164/300 Iteration 2293/4200 Training loss: 0.8344 1.7400 sec/batch
Epoch 164/300 Iteration 2294/4200 Training loss: 0.8348 1.7487 sec/batch
Epoch 164/300 Iteration 2295/4200 Training loss: 0.8368 1.7447 sec/batch
Epoch 164/300 Iteration 2296/4200 Training loss: 0.8371 1.7405 sec/batch
Epoch 165/300 Iteration 2297/4200 Training loss: 0.8949 1.7405 sec/batch
Epoch 165/300 Iteration 2298/4200 Training loss: 0.8643 1.7408 sec/batch
Epoch 165/300 Iteration 2299/4200 Training loss: 0.8506 1.7391 sec/batch
Epoch 165/300 Iteration 2300/4200 Training loss: 0.8511 1.7380 sec/batch
Validation loss: 2.08206 Saving checkpoint!
Epoch 165/300 Iteration 2301/4200 Training loss: 0.9200 1.6517 sec/batch
Epoch 165/300 Iteration 2302/4200 Training loss: 0.9046 1.7951 sec/batch
Epoch 165/300 Iteration 2303/4200 Training loss: 0.8927 1.7409 sec/batch
Epoch 165/300 Iteration 2304/4200 Training loss: 0.8848 1.7375 sec/batch
Epoch 165/300 Iteration 2305/4200 Training loss: 0.8777 1.7392 sec/batch
Epoch 165/300 Iteration 2306/4200 Training loss: 0.8747 1.7396 sec/batch
Epoch 165/300 Iteration 2307/4200 Training loss: 0.8728 1.7429 sec/batch
Epoch 165/300 Iteration 2308/4200 Training loss: 0.8679 1.7425 sec/batch
Epoch 165/300 Iteration 2309/4200 Training loss: 0.8672 1.7394 sec/batch
Epoch 165/300 Iteration 2310/4200 Training loss: 0.8665 1.7396 sec/batch
Epoch 166/300 Iteration 2311/4200 Training loss: 0.8578 1.7387 sec/batch
Epoch 166/300 Iteration 2312/4200 Training loss: 0.8369 1.7410 sec/batch
Epoch 166/300 Iteration 2313/4200 Training loss: 0.8349 1.7372 sec/batch
Epoch 166/300 Iteration 2314/4200 Training loss: 0.8395 1.7506 sec/batch
Epoch 166/300 Iteration 2315/4200 Training loss: 0.8375 1.7559 sec/batch
Epoch 166/300 Iteration 2316/4200 Training loss: 0.8374 1.7407 sec/batch
Epoch 166/300 Iteration 2317/4200 Training loss: 0.8341 1.7383 sec/batch
Epoch 166/300 Iteration 2318/4200 Training loss: 0.8329 1.8133 sec/batch
Epoch 166/300 Iteration 2319/4200 Training loss: 0.8316 1.7430 sec/batch
Epoch 166/300 Iteration 2320/4200 Training loss: 0.8320 1.7449 sec/batch
Epoch 166/300 Iteration 2321/4200 Training loss: 0.8342 1.7532 sec/batch
Epoch 166/300 Iteration 2322/4200 Training loss: 0.8337 1.7399 sec/batch
Epoch 166/300 Iteration 2323/4200 Training loss: 0.8350 1.7400 sec/batch
Epoch 166/300 Iteration 2324/4200 Training loss: 0.8352 1.7398 sec/batch
Epoch 167/300 Iteration 2325/4200 Training loss: 0.8702 1.7402 sec/batch
Epoch 167/300 Iteration 2326/4200 Training loss: 0.8476 1.7390 sec/batch
Epoch 167/300 Iteration 2327/4200 Training loss: 0.8436 1.7422 sec/batch
Epoch 167/300 Iteration 2328/4200 Training loss: 0.8473 1.7454 sec/batch
Epoch 167/300 Iteration 2329/4200 Training loss: 0.8466 1.7396 sec/batch
Epoch 167/300 Iteration 2330/4200 Training loss: 0.8444 1.7448 sec/batch
Epoch 167/300 Iteration 2331/4200 Training loss: 0.8398 1.7435 sec/batch
Epoch 167/300 Iteration 2332/4200 Training loss: 0.8382 1.7398 sec/batch
Epoch 167/300 Iteration 2333/4200 Training loss: 0.8353 1.7394 sec/batch
Epoch 167/300 Iteration 2334/4200 Training loss: 0.8344 1.7459 sec/batch
Epoch 167/300 Iteration 2335/4200 Training loss: 0.8362 1.7396 sec/batch
Epoch 167/300 Iteration 2336/4200 Training loss: 0.8355 1.7376 sec/batch
Epoch 167/300 Iteration 2337/4200 Training loss: 0.8366 1.7462 sec/batch
Epoch 167/300 Iteration 2338/4200 Training loss: 0.8371 1.7383 sec/batch
Epoch 168/300 Iteration 2339/4200 Training loss: 0.8554 1.7379 sec/batch
Epoch 168/300 Iteration 2340/4200 Training loss: 0.8367 1.7395 sec/batch
Epoch 168/300 Iteration 2341/4200 Training loss: 0.8282 1.7408 sec/batch
Epoch 168/300 Iteration 2342/4200 Training loss: 0.8314 1.7392 sec/batch
Epoch 168/300 Iteration 2343/4200 Training loss: 0.8363 1.7397 sec/batch
Epoch 168/300 Iteration 2344/4200 Training loss: 0.8386 1.7394 sec/batch
Epoch 168/300 Iteration 2345/4200 Training loss: 0.8362 1.7391 sec/batch
Epoch 168/300 Iteration 2346/4200 Training loss: 0.8325 1.7404 sec/batch
Epoch 168/300 Iteration 2347/4200 Training loss: 0.8300 1.7398 sec/batch
Epoch 168/300 Iteration 2348/4200 Training loss: 0.8301 1.7408 sec/batch
Epoch 168/300 Iteration 2349/4200 Training loss: 0.8325 1.7376 sec/batch
Epoch 168/300 Iteration 2350/4200 Training loss: 0.8320 1.7403 sec/batch
Epoch 168/300 Iteration 2351/4200 Training loss: 0.8343 1.7470 sec/batch
Epoch 168/300 Iteration 2352/4200 Training loss: 0.8352 1.7390 sec/batch
Epoch 169/300 Iteration 2353/4200 Training loss: 0.8688 1.7395 sec/batch
Epoch 169/300 Iteration 2354/4200 Training loss: 0.8366 1.7423 sec/batch
Epoch 169/300 Iteration 2355/4200 Training loss: 0.8290 1.7399 sec/batch
Epoch 169/300 Iteration 2356/4200 Training loss: 0.8292 1.7395 sec/batch
Epoch 169/300 Iteration 2357/4200 Training loss: 0.8322 1.7398 sec/batch
Epoch 169/300 Iteration 2358/4200 Training loss: 0.8325 1.7395 sec/batch
Epoch 169/300 Iteration 2359/4200 Training loss: 0.8297 1.7381 sec/batch
Epoch 169/300 Iteration 2360/4200 Training loss: 0.8283 1.7408 sec/batch
Epoch 169/300 Iteration 2361/4200 Training loss: 0.8257 1.7428 sec/batch
Epoch 169/300 Iteration 2362/4200 Training loss: 0.8258 1.7462 sec/batch
Epoch 169/300 Iteration 2363/4200 Training loss: 0.8275 1.7402 sec/batch
Epoch 169/300 Iteration 2364/4200 Training loss: 0.8272 1.7398 sec/batch
Epoch 169/300 Iteration 2365/4200 Training loss: 0.8284 1.7390 sec/batch
Epoch 169/300 Iteration 2366/4200 Training loss: 0.8288 1.7379 sec/batch
Epoch 170/300 Iteration 2367/4200 Training loss: 0.8773 1.7416 sec/batch
Epoch 170/300 Iteration 2368/4200 Training loss: 0.8334 1.7408 sec/batch
Epoch 170/300 Iteration 2369/4200 Training loss: 0.8223 1.7439 sec/batch
Epoch 170/300 Iteration 2370/4200 Training loss: 0.8201 1.7396 sec/batch
Epoch 170/300 Iteration 2371/4200 Training loss: 0.8248 1.7398 sec/batch
Epoch 170/300 Iteration 2372/4200 Training loss: 0.8251 1.7455 sec/batch
Epoch 170/300 Iteration 2373/4200 Training loss: 0.8180 1.7384 sec/batch
Epoch 170/300 Iteration 2374/4200 Training loss: 0.8176 1.7382 sec/batch
Epoch 170/300 Iteration 2375/4200 Training loss: 0.8161 1.7377 sec/batch
Epoch 170/300 Iteration 2376/4200 Training loss: 0.8160 1.7390 sec/batch
Epoch 170/300 Iteration 2377/4200 Training loss: 0.8170 1.7428 sec/batch
Epoch 170/300 Iteration 2378/4200 Training loss: 0.8156 1.7418 sec/batch
Epoch 170/300 Iteration 2379/4200 Training loss: 0.8173 1.7420 sec/batch
Epoch 170/300 Iteration 2380/4200 Training loss: 0.8173 1.7393 sec/batch
Epoch 171/300 Iteration 2381/4200 Training loss: 0.8765 1.7401 sec/batch
Epoch 171/300 Iteration 2382/4200 Training loss: 0.8277 1.7511 sec/batch
Epoch 171/300 Iteration 2383/4200 Training loss: 0.8134 1.7600 sec/batch
Epoch 171/300 Iteration 2384/4200 Training loss: 0.8116 1.7549 sec/batch
Epoch 171/300 Iteration 2385/4200 Training loss: 0.8128 1.7457 sec/batch
Epoch 171/300 Iteration 2386/4200 Training loss: 0.8115 1.7396 sec/batch
Epoch 171/300 Iteration 2387/4200 Training loss: 0.8077 1.7386 sec/batch
Epoch 171/300 Iteration 2388/4200 Training loss: 0.8040 1.7401 sec/batch
Epoch 171/300 Iteration 2389/4200 Training loss: 0.8011 1.7430 sec/batch
Epoch 171/300 Iteration 2390/4200 Training loss: 0.8018 1.7383 sec/batch
Epoch 171/300 Iteration 2391/4200 Training loss: 0.8031 1.7411 sec/batch
Epoch 171/300 Iteration 2392/4200 Training loss: 0.8021 1.7436 sec/batch
Epoch 171/300 Iteration 2393/4200 Training loss: 0.8031 1.7379 sec/batch
Epoch 171/300 Iteration 2394/4200 Training loss: 0.8035 1.7386 sec/batch
Epoch 172/300 Iteration 2395/4200 Training loss: 0.8387 1.7395 sec/batch
Epoch 172/300 Iteration 2396/4200 Training loss: 0.8079 1.7438 sec/batch
Epoch 172/300 Iteration 2397/4200 Training loss: 0.7999 1.7450 sec/batch
Epoch 172/300 Iteration 2398/4200 Training loss: 0.7985 1.7411 sec/batch
Epoch 172/300 Iteration 2399/4200 Training loss: 0.7997 1.8598 sec/batch
Epoch 172/300 Iteration 2400/4200 Training loss: 0.7987 1.7426 sec/batch
Validation loss: 2.1279 Saving checkpoint!
Epoch 172/300 Iteration 2401/4200 Training loss: 0.8498 1.6540 sec/batch
Epoch 172/300 Iteration 2402/4200 Training loss: 0.8405 1.7385 sec/batch
Epoch 172/300 Iteration 2403/4200 Training loss: 0.8338 1.7395 sec/batch
Epoch 172/300 Iteration 2404/4200 Training loss: 0.8306 1.7383 sec/batch
Epoch 172/300 Iteration 2405/4200 Training loss: 0.8295 1.7381 sec/batch
Epoch 172/300 Iteration 2406/4200 Training loss: 0.8263 1.7367 sec/batch
Epoch 172/300 Iteration 2407/4200 Training loss: 0.8254 1.7399 sec/batch
Epoch 172/300 Iteration 2408/4200 Training loss: 0.8239 1.7444 sec/batch
Epoch 173/300 Iteration 2409/4200 Training loss: 0.8343 1.7381 sec/batch
Epoch 173/300 Iteration 2410/4200 Training loss: 0.8009 1.7394 sec/batch
Epoch 173/300 Iteration 2411/4200 Training loss: 0.7916 1.7419 sec/batch
Epoch 173/300 Iteration 2412/4200 Training loss: 0.7878 1.7851 sec/batch
Epoch 173/300 Iteration 2413/4200 Training loss: 0.7873 1.7939 sec/batch
Epoch 173/300 Iteration 2414/4200 Training loss: 0.7887 1.7412 sec/batch
Epoch 173/300 Iteration 2415/4200 Training loss: 0.7862 1.7471 sec/batch
Epoch 173/300 Iteration 2416/4200 Training loss: 0.7852 1.7949 sec/batch
Epoch 173/300 Iteration 2417/4200 Training loss: 0.7829 1.7953 sec/batch
Epoch 173/300 Iteration 2418/4200 Training loss: 0.7828 1.7949 sec/batch
Epoch 173/300 Iteration 2419/4200 Training loss: 0.7835 1.7964 sec/batch
Epoch 173/300 Iteration 2420/4200 Training loss: 0.7821 1.7945 sec/batch
Epoch 173/300 Iteration 2421/4200 Training loss: 0.7839 1.7969 sec/batch
Epoch 173/300 Iteration 2422/4200 Training loss: 0.7843 1.8000 sec/batch
Epoch 174/300 Iteration 2423/4200 Training loss: 0.8274 1.7956 sec/batch
Epoch 174/300 Iteration 2424/4200 Training loss: 0.7911 1.7968 sec/batch
Epoch 174/300 Iteration 2425/4200 Training loss: 0.7809 1.7959 sec/batch
Epoch 174/300 Iteration 2426/4200 Training loss: 0.7826 1.7977 sec/batch
Epoch 174/300 Iteration 2427/4200 Training loss: 0.7825 1.7951 sec/batch
Epoch 174/300 Iteration 2428/4200 Training loss: 0.7822 1.7954 sec/batch
Epoch 174/300 Iteration 2429/4200 Training loss: 0.7804 1.8011 sec/batch
Epoch 174/300 Iteration 2430/4200 Training loss: 0.7771 1.7443 sec/batch
Epoch 174/300 Iteration 2431/4200 Training loss: 0.7750 1.7403 sec/batch
Epoch 174/300 Iteration 2432/4200 Training loss: 0.7758 1.7460 sec/batch
Epoch 174/300 Iteration 2433/4200 Training loss: 0.7770 1.7437 sec/batch
Epoch 174/300 Iteration 2434/4200 Training loss: 0.7765 1.7410 sec/batch
Epoch 174/300 Iteration 2435/4200 Training loss: 0.7779 1.7460 sec/batch
Epoch 174/300 Iteration 2436/4200 Training loss: 0.7788 1.7382 sec/batch
Epoch 175/300 Iteration 2437/4200 Training loss: 0.8101 1.7416 sec/batch
Epoch 175/300 Iteration 2438/4200 Training loss: 0.7801 1.7388 sec/batch
Epoch 175/300 Iteration 2439/4200 Training loss: 0.7756 1.7454 sec/batch
Epoch 175/300 Iteration 2440/4200 Training loss: 0.7749 1.7438 sec/batch
Epoch 175/300 Iteration 2441/4200 Training loss: 0.7759 1.7378 sec/batch
Epoch 175/300 Iteration 2442/4200 Training loss: 0.7748 1.7397 sec/batch
Epoch 175/300 Iteration 2443/4200 Training loss: 0.7704 1.7394 sec/batch
Epoch 175/300 Iteration 2444/4200 Training loss: 0.7693 1.7406 sec/batch
Epoch 175/300 Iteration 2445/4200 Training loss: 0.7663 1.7430 sec/batch
Epoch 175/300 Iteration 2446/4200 Training loss: 0.7660 1.7411 sec/batch
Epoch 175/300 Iteration 2447/4200 Training loss: 0.7669 1.7398 sec/batch
Epoch 175/300 Iteration 2448/4200 Training loss: 0.7659 1.7401 sec/batch
Epoch 175/300 Iteration 2449/4200 Training loss: 0.7663 1.7449 sec/batch
Epoch 175/300 Iteration 2450/4200 Training loss: 0.7666 1.7611 sec/batch
Epoch 176/300 Iteration 2451/4200 Training loss: 0.8112 1.7605 sec/batch
Epoch 176/300 Iteration 2452/4200 Training loss: 0.7784 1.7983 sec/batch
Epoch 176/300 Iteration 2453/4200 Training loss: 0.7695 1.7953 sec/batch
Epoch 176/300 Iteration 2454/4200 Training loss: 0.7667 1.7432 sec/batch
Epoch 176/300 Iteration 2455/4200 Training loss: 0.7679 1.7403 sec/batch
Epoch 176/300 Iteration 2456/4200 Training loss: 0.7670 1.7381 sec/batch
Epoch 176/300 Iteration 2457/4200 Training loss: 0.7637 1.7771 sec/batch
Epoch 176/300 Iteration 2458/4200 Training loss: 0.7614 1.7958 sec/batch
Epoch 176/300 Iteration 2459/4200 Training loss: 0.7594 1.7990 sec/batch
Epoch 176/300 Iteration 2460/4200 Training loss: 0.7596 1.7953 sec/batch
Epoch 176/300 Iteration 2461/4200 Training loss: 0.7611 1.7396 sec/batch
Epoch 176/300 Iteration 2462/4200 Training loss: 0.7605 1.7391 sec/batch
Epoch 176/300 Iteration 2463/4200 Training loss: 0.7616 1.7526 sec/batch
Epoch 176/300 Iteration 2464/4200 Training loss: 0.7629 1.7396 sec/batch
Epoch 177/300 Iteration 2465/4200 Training loss: 0.8158 1.7374 sec/batch
Epoch 177/300 Iteration 2466/4200 Training loss: 0.7828 1.7465 sec/batch
Epoch 177/300 Iteration 2467/4200 Training loss: 0.7740 1.7384 sec/batch
Epoch 177/300 Iteration 2468/4200 Training loss: 0.7690 1.7434 sec/batch
Epoch 177/300 Iteration 2469/4200 Training loss: 0.7657 1.7383 sec/batch
Epoch 177/300 Iteration 2470/4200 Training loss: 0.7653 1.7410 sec/batch
Epoch 177/300 Iteration 2471/4200 Training loss: 0.7612 1.7411 sec/batch
Epoch 177/300 Iteration 2472/4200 Training loss: 0.7588 1.7400 sec/batch
Epoch 177/300 Iteration 2473/4200 Training loss: 0.7580 1.7438 sec/batch
Epoch 177/300 Iteration 2474/4200 Training loss: 0.7603 1.7406 sec/batch
Epoch 177/300 Iteration 2475/4200 Training loss: 0.7599 1.7386 sec/batch
Epoch 177/300 Iteration 2476/4200 Training loss: 0.7595 1.7370 sec/batch
Epoch 177/300 Iteration 2477/4200 Training loss: 0.7614 1.7389 sec/batch
Epoch 177/300 Iteration 2478/4200 Training loss: 0.7613 1.7393 sec/batch
Epoch 178/300 Iteration 2479/4200 Training loss: 0.7873 1.7420 sec/batch
Epoch 178/300 Iteration 2480/4200 Training loss: 0.7626 1.7390 sec/batch
Epoch 178/300 Iteration 2481/4200 Training loss: 0.7546 1.7394 sec/batch
Epoch 178/300 Iteration 2482/4200 Training loss: 0.7556 1.7416 sec/batch
Epoch 178/300 Iteration 2483/4200 Training loss: 0.7594 1.7504 sec/batch
Epoch 178/300 Iteration 2484/4200 Training loss: 0.7590 1.7395 sec/batch
Epoch 178/300 Iteration 2485/4200 Training loss: 0.7529 1.7422 sec/batch
Epoch 178/300 Iteration 2486/4200 Training loss: 0.7512 1.7387 sec/batch
Epoch 178/300 Iteration 2487/4200 Training loss: 0.7483 1.7396 sec/batch
Epoch 178/300 Iteration 2488/4200 Training loss: 0.7490 1.7406 sec/batch
Epoch 178/300 Iteration 2489/4200 Training loss: 0.7497 1.7370 sec/batch
Epoch 178/300 Iteration 2490/4200 Training loss: 0.7495 1.7442 sec/batch
Epoch 178/300 Iteration 2491/4200 Training loss: 0.7512 1.7385 sec/batch
Epoch 178/300 Iteration 2492/4200 Training loss: 0.7515 1.7385 sec/batch
Epoch 179/300 Iteration 2493/4200 Training loss: 0.7830 1.7428 sec/batch
Epoch 179/300 Iteration 2494/4200 Training loss: 0.7540 1.7393 sec/batch
Epoch 179/300 Iteration 2495/4200 Training loss: 0.7477 1.7397 sec/batch
Epoch 179/300 Iteration 2496/4200 Training loss: 0.7487 1.7383 sec/batch
Epoch 179/300 Iteration 2497/4200 Training loss: 0.7482 1.7379 sec/batch
Epoch 179/300 Iteration 2498/4200 Training loss: 0.7461 1.7860 sec/batch
Epoch 179/300 Iteration 2499/4200 Training loss: 0.7434 1.7933 sec/batch
Epoch 179/300 Iteration 2500/4200 Training loss: 0.7418 1.7939 sec/batch
Validation loss: 2.2003 Saving checkpoint!
Epoch 179/300 Iteration 2501/4200 Training loss: 0.7866 3.1858 sec/batch
Epoch 179/300 Iteration 2502/4200 Training loss: 0.7837 1.8586 sec/batch
Epoch 179/300 Iteration 2503/4200 Training loss: 0.7830 1.6940 sec/batch
Epoch 179/300 Iteration 2504/4200 Training loss: 0.7800 1.7043 sec/batch
Epoch 179/300 Iteration 2505/4200 Training loss: 0.7791 1.7483 sec/batch
Epoch 179/300 Iteration 2506/4200 Training loss: 0.7773 1.7388 sec/batch
Epoch 180/300 Iteration 2507/4200 Training loss: 0.7935 1.7388 sec/batch
Epoch 180/300 Iteration 2508/4200 Training loss: 0.7619 1.7380 sec/batch
Epoch 180/300 Iteration 2509/4200 Training loss: 0.7507 1.7377 sec/batch
Epoch 180/300 Iteration 2510/4200 Training loss: 0.7493 1.7396 sec/batch
Epoch 180/300 Iteration 2511/4200 Training loss: 0.7482 1.7383 sec/batch
Epoch 180/300 Iteration 2512/4200 Training loss: 0.7482 1.7528 sec/batch
Epoch 180/300 Iteration 2513/4200 Training loss: 0.7462 1.7957 sec/batch
Epoch 180/300 Iteration 2514/4200 Training loss: 0.7420 1.7963 sec/batch
Epoch 180/300 Iteration 2515/4200 Training loss: 0.7411 1.8140 sec/batch
Epoch 180/300 Iteration 2516/4200 Training loss: 0.7418 1.7521 sec/batch
Epoch 180/300 Iteration 2517/4200 Training loss: 0.7425 1.7478 sec/batch
Epoch 180/300 Iteration 2518/4200 Training loss: 0.7415 1.7398 sec/batch
Epoch 180/300 Iteration 2519/4200 Training loss: 0.7425 1.7396 sec/batch
Epoch 180/300 Iteration 2520/4200 Training loss: 0.7426 1.7383 sec/batch
Epoch 181/300 Iteration 2521/4200 Training loss: 0.7668 1.7390 sec/batch
Epoch 181/300 Iteration 2522/4200 Training loss: 0.7329 1.7394 sec/batch
Epoch 181/300 Iteration 2523/4200 Training loss: 0.7291 1.7396 sec/batch
Epoch 181/300 Iteration 2524/4200 Training loss: 0.7340 1.7392 sec/batch
Epoch 181/300 Iteration 2525/4200 Training loss: 0.7352 1.7461 sec/batch
Epoch 181/300 Iteration 2526/4200 Training loss: 0.7351 1.7435 sec/batch
Epoch 181/300 Iteration 2527/4200 Training loss: 0.7318 1.7410 sec/batch
Epoch 181/300 Iteration 2528/4200 Training loss: 0.7311 1.7402 sec/batch
Epoch 181/300 Iteration 2529/4200 Training loss: 0.7295 1.7512 sec/batch
Epoch 181/300 Iteration 2530/4200 Training loss: 0.7307 1.7374 sec/batch
Epoch 181/300 Iteration 2531/4200 Training loss: 0.7322 1.7798 sec/batch
Epoch 181/300 Iteration 2532/4200 Training loss: 0.7318 1.7934 sec/batch
Epoch 181/300 Iteration 2533/4200 Training loss: 0.7330 1.7947 sec/batch
Epoch 181/300 Iteration 2534/4200 Training loss: 0.7324 1.7440 sec/batch
Epoch 182/300 Iteration 2535/4200 Training loss: 0.7782 1.7408 sec/batch
Epoch 182/300 Iteration 2536/4200 Training loss: 0.7376 1.7803 sec/batch
Epoch 182/300 Iteration 2537/4200 Training loss: 0.7298 1.7977 sec/batch
Epoch 182/300 Iteration 2538/4200 Training loss: 0.7294 1.7944 sec/batch
Epoch 182/300 Iteration 2539/4200 Training loss: 0.7320 1.8040 sec/batch
Epoch 182/300 Iteration 2540/4200 Training loss: 0.7307 1.7997 sec/batch
Epoch 182/300 Iteration 2541/4200 Training loss: 0.7286 1.7964 sec/batch
Epoch 182/300 Iteration 2542/4200 Training loss: 0.7275 1.7983 sec/batch
Epoch 182/300 Iteration 2543/4200 Training loss: 0.7262 1.7435 sec/batch
Epoch 182/300 Iteration 2544/4200 Training loss: 0.7281 1.7394 sec/batch
Epoch 182/300 Iteration 2545/4200 Training loss: 0.7288 1.7393 sec/batch
Epoch 182/300 Iteration 2546/4200 Training loss: 0.7283 1.7433 sec/batch
Epoch 182/300 Iteration 2547/4200 Training loss: 0.7315 1.7401 sec/batch
Epoch 182/300 Iteration 2548/4200 Training loss: 0.7316 1.7386 sec/batch
Epoch 183/300 Iteration 2549/4200 Training loss: 0.7711 1.7415 sec/batch
Epoch 183/300 Iteration 2550/4200 Training loss: 0.7329 1.7382 sec/batch
Epoch 183/300 Iteration 2551/4200 Training loss: 0.7282 1.7385 sec/batch
Epoch 183/300 Iteration 2552/4200 Training loss: 0.7244 1.7434 sec/batch
Epoch 183/300 Iteration 2553/4200 Training loss: 0.7237 1.7412 sec/batch
Epoch 183/300 Iteration 2554/4200 Training loss: 0.7237 1.7399 sec/batch
Epoch 183/300 Iteration 2555/4200 Training loss: 0.7232 1.7408 sec/batch
Epoch 183/300 Iteration 2556/4200 Training loss: 0.7237 1.7380 sec/batch
Epoch 183/300 Iteration 2557/4200 Training loss: 0.7208 1.7396 sec/batch
Epoch 183/300 Iteration 2558/4200 Training loss: 0.7219 1.7432 sec/batch
Epoch 183/300 Iteration 2559/4200 Training loss: 0.7239 1.7388 sec/batch
Epoch 183/300 Iteration 2560/4200 Training loss: 0.7234 1.7386 sec/batch
Epoch 183/300 Iteration 2561/4200 Training loss: 0.7252 1.7786 sec/batch
Epoch 183/300 Iteration 2562/4200 Training loss: 0.7263 1.7427 sec/batch
Epoch 184/300 Iteration 2563/4200 Training loss: 0.7710 1.7412 sec/batch
Epoch 184/300 Iteration 2564/4200 Training loss: 0.7322 1.7630 sec/batch
Epoch 184/300 Iteration 2565/4200 Training loss: 0.7256 1.7937 sec/batch
Epoch 184/300 Iteration 2566/4200 Training loss: 0.7267 1.7941 sec/batch
Epoch 184/300 Iteration 2567/4200 Training loss: 0.7266 1.7975 sec/batch
Epoch 184/300 Iteration 2568/4200 Training loss: 0.7288 1.8444 sec/batch
Epoch 184/300 Iteration 2569/4200 Training loss: 0.7258 1.7938 sec/batch
Epoch 184/300 Iteration 2570/4200 Training loss: 0.7257 1.8077 sec/batch
Epoch 184/300 Iteration 2571/4200 Training loss: 0.7250 1.7464 sec/batch
Epoch 184/300 Iteration 2572/4200 Training loss: 0.7268 1.7410 sec/batch
Epoch 184/300 Iteration 2573/4200 Training loss: 0.7281 1.7412 sec/batch
Epoch 184/300 Iteration 2574/4200 Training loss: 0.7281 1.7406 sec/batch
Epoch 184/300 Iteration 2575/4200 Training loss: 0.7276 1.7401 sec/batch
Epoch 184/300 Iteration 2576/4200 Training loss: 0.7282 1.7408 sec/batch
Epoch 185/300 Iteration 2577/4200 Training loss: 0.7661 1.7397 sec/batch
Epoch 185/300 Iteration 2578/4200 Training loss: 0.7304 1.7418 sec/batch
Epoch 185/300 Iteration 2579/4200 Training loss: 0.7234 1.7427 sec/batch
Epoch 185/300 Iteration 2580/4200 Training loss: 0.7227 1.7418 sec/batch
Epoch 185/300 Iteration 2581/4200 Training loss: 0.7244 1.7437 sec/batch
Epoch 185/300 Iteration 2582/4200 Training loss: 0.7240 1.7403 sec/batch
Epoch 185/300 Iteration 2583/4200 Training loss: 0.7211 1.7432 sec/batch
Epoch 185/300 Iteration 2584/4200 Training loss: 0.7207 1.7485 sec/batch
Epoch 185/300 Iteration 2585/4200 Training loss: 0.7194 1.7600 sec/batch
Epoch 185/300 Iteration 2586/4200 Training loss: 0.7196 1.7407 sec/batch
Epoch 185/300 Iteration 2587/4200 Training loss: 0.7206 1.6948 sec/batch
Epoch 185/300 Iteration 2588/4200 Training loss: 0.7200 1.7150 sec/batch
Epoch 185/300 Iteration 2589/4200 Training loss: 0.7214 1.7397 sec/batch
Epoch 185/300 Iteration 2590/4200 Training loss: 0.7212 1.7437 sec/batch
Epoch 186/300 Iteration 2591/4200 Training loss: 0.7564 1.7382 sec/batch
Epoch 186/300 Iteration 2592/4200 Training loss: 0.7226 1.7360 sec/batch
Epoch 186/300 Iteration 2593/4200 Training loss: 0.7146 1.7200 sec/batch
Epoch 186/300 Iteration 2594/4200 Training loss: 0.7141 1.7385 sec/batch
Epoch 186/300 Iteration 2595/4200 Training loss: 0.7175 1.7397 sec/batch
Epoch 186/300 Iteration 2596/4200 Training loss: 0.7173 1.7399 sec/batch
Epoch 186/300 Iteration 2597/4200 Training loss: 0.7164 1.7400 sec/batch
Epoch 186/300 Iteration 2598/4200 Training loss: 0.7164 1.7441 sec/batch
Epoch 186/300 Iteration 2599/4200 Training loss: 0.7149 1.7399 sec/batch
Epoch 186/300 Iteration 2600/4200 Training loss: 0.7149 1.7393 sec/batch
Validation loss: 2.28544 Saving checkpoint!
Epoch 186/300 Iteration 2601/4200 Training loss: 0.7554 3.1800 sec/batch
Epoch 186/300 Iteration 2602/4200 Training loss: 0.7524 1.9131 sec/batch
Epoch 186/300 Iteration 2603/4200 Training loss: 0.7498 1.6720 sec/batch
Epoch 186/300 Iteration 2604/4200 Training loss: 0.7489 1.7050 sec/batch
Epoch 187/300 Iteration 2605/4200 Training loss: 0.7592 1.6976 sec/batch
Epoch 187/300 Iteration 2606/4200 Training loss: 0.7176 1.7090 sec/batch
Epoch 187/300 Iteration 2607/4200 Training loss: 0.7116 1.7406 sec/batch
Epoch 187/300 Iteration 2608/4200 Training loss: 0.7091 1.7411 sec/batch
Epoch 187/300 Iteration 2609/4200 Training loss: 0.7106 1.7408 sec/batch
Epoch 187/300 Iteration 2610/4200 Training loss: 0.7115 1.7380 sec/batch
Epoch 187/300 Iteration 2611/4200 Training loss: 0.7099 1.7390 sec/batch
Epoch 187/300 Iteration 2612/4200 Training loss: 0.7094 1.7400 sec/batch
Epoch 187/300 Iteration 2613/4200 Training loss: 0.7080 1.7433 sec/batch
Epoch 187/300 Iteration 2614/4200 Training loss: 0.7080 1.7396 sec/batch
Epoch 187/300 Iteration 2615/4200 Training loss: 0.7090 1.7374 sec/batch
Epoch 187/300 Iteration 2616/4200 Training loss: 0.7087 1.7414 sec/batch
Epoch 187/300 Iteration 2617/4200 Training loss: 0.7108 1.7401 sec/batch
Epoch 187/300 Iteration 2618/4200 Training loss: 0.7105 1.7447 sec/batch
Epoch 188/300 Iteration 2619/4200 Training loss: 0.7559 1.7370 sec/batch
Epoch 188/300 Iteration 2620/4200 Training loss: 0.7237 1.7382 sec/batch
Epoch 188/300 Iteration 2621/4200 Training loss: 0.7170 1.7399 sec/batch
Epoch 188/300 Iteration 2622/4200 Training loss: 0.7116 1.7427 sec/batch
Epoch 188/300 Iteration 2623/4200 Training loss: 0.7114 1.7393 sec/batch
Epoch 188/300 Iteration 2624/4200 Training loss: 0.7096 1.7403 sec/batch
Epoch 188/300 Iteration 2625/4200 Training loss: 0.7057 1.7412 sec/batch
Epoch 188/300 Iteration 2626/4200 Training loss: 0.7035 1.7399 sec/batch
Epoch 188/300 Iteration 2627/4200 Training loss: 0.7013 1.7384 sec/batch
Epoch 188/300 Iteration 2628/4200 Training loss: 0.7013 1.7426 sec/batch
Epoch 188/300 Iteration 2629/4200 Training loss: 0.7019 1.7427 sec/batch
Epoch 188/300 Iteration 2630/4200 Training loss: 0.7014 1.7422 sec/batch
Epoch 188/300 Iteration 2631/4200 Training loss: 0.7028 1.7400 sec/batch
Epoch 188/300 Iteration 2632/4200 Training loss: 0.7032 1.7389 sec/batch
Epoch 189/300 Iteration 2633/4200 Training loss: 0.7439 1.7416 sec/batch
Epoch 189/300 Iteration 2634/4200 Training loss: 0.7193 1.7407 sec/batch
Epoch 189/300 Iteration 2635/4200 Training loss: 0.7077 1.7398 sec/batch
Epoch 189/300 Iteration 2636/4200 Training loss: 0.7059 1.7407 sec/batch
Epoch 189/300 Iteration 2637/4200 Training loss: 0.7072 1.7438 sec/batch
Epoch 189/300 Iteration 2638/4200 Training loss: 0.7052 1.7411 sec/batch
Epoch 189/300 Iteration 2639/4200 Training loss: 0.7020 1.7444 sec/batch
Epoch 189/300 Iteration 2640/4200 Training loss: 0.7011 1.7392 sec/batch
Epoch 189/300 Iteration 2641/4200 Training loss: 0.7001 1.7403 sec/batch
Epoch 189/300 Iteration 2642/4200 Training loss: 0.7023 1.7377 sec/batch
Epoch 189/300 Iteration 2643/4200 Training loss: 0.7022 1.7434 sec/batch
Epoch 189/300 Iteration 2644/4200 Training loss: 0.7020 1.7388 sec/batch
Epoch 189/300 Iteration 2645/4200 Training loss: 0.7033 1.7389 sec/batch
Epoch 189/300 Iteration 2646/4200 Training loss: 0.7032 1.7427 sec/batch
Epoch 190/300 Iteration 2647/4200 Training loss: 0.7449 1.7410 sec/batch
Epoch 190/300 Iteration 2648/4200 Training loss: 0.7080 1.7376 sec/batch
Epoch 190/300 Iteration 2649/4200 Training loss: 0.6994 1.7382 sec/batch
Epoch 190/300 Iteration 2650/4200 Training loss: 0.6992 1.7661 sec/batch
Epoch 190/300 Iteration 2651/4200 Training loss: 0.7005 1.7604 sec/batch
Epoch 190/300 Iteration 2652/4200 Training loss: 0.7010 1.7419 sec/batch
Epoch 190/300 Iteration 2653/4200 Training loss: 0.6992 1.7502 sec/batch
Epoch 190/300 Iteration 2654/4200 Training loss: 0.6994 1.7411 sec/batch
Epoch 190/300 Iteration 2655/4200 Training loss: 0.6979 1.7415 sec/batch
Epoch 190/300 Iteration 2656/4200 Training loss: 0.6977 1.7424 sec/batch
Epoch 190/300 Iteration 2657/4200 Training loss: 0.6978 1.7407 sec/batch
Epoch 190/300 Iteration 2658/4200 Training loss: 0.6976 1.6952 sec/batch
Epoch 190/300 Iteration 2659/4200 Training loss: 0.6989 1.6959 sec/batch
Epoch 190/300 Iteration 2660/4200 Training loss: 0.6985 1.6979 sec/batch
Epoch 191/300 Iteration 2661/4200 Training loss: 0.7456 1.6943 sec/batch
Epoch 191/300 Iteration 2662/4200 Training loss: 0.7084 1.6949 sec/batch
Epoch 191/300 Iteration 2663/4200 Training loss: 0.6979 1.7161 sec/batch
Epoch 191/300 Iteration 2664/4200 Training loss: 0.6980 1.7450 sec/batch
Epoch 191/300 Iteration 2665/4200 Training loss: 0.6966 1.7383 sec/batch
Epoch 191/300 Iteration 2666/4200 Training loss: 0.6960 1.7420 sec/batch
Epoch 191/300 Iteration 2667/4200 Training loss: 0.6969 1.6955 sec/batch
Epoch 191/300 Iteration 2668/4200 Training loss: 0.6938 1.7145 sec/batch
Epoch 191/300 Iteration 2669/4200 Training loss: 0.6923 1.7392 sec/batch
Epoch 191/300 Iteration 2670/4200 Training loss: 0.6933 1.7424 sec/batch
Epoch 191/300 Iteration 2671/4200 Training loss: 0.6959 1.7392 sec/batch
Epoch 191/300 Iteration 2672/4200 Training loss: 0.6942 1.7429 sec/batch
Epoch 191/300 Iteration 2673/4200 Training loss: 0.6960 1.7402 sec/batch
Epoch 191/300 Iteration 2674/4200 Training loss: 0.6957 1.7390 sec/batch
Epoch 192/300 Iteration 2675/4200 Training loss: 0.7479 1.7370 sec/batch
Epoch 192/300 Iteration 2676/4200 Training loss: 0.7152 1.7400 sec/batch
Epoch 192/300 Iteration 2677/4200 Training loss: 0.7036 1.7386 sec/batch
Epoch 192/300 Iteration 2678/4200 Training loss: 0.7001 1.7392 sec/batch
Epoch 192/300 Iteration 2679/4200 Training loss: 0.6974 1.7387 sec/batch
Epoch 192/300 Iteration 2680/4200 Training loss: 0.6946 1.7379 sec/batch
Epoch 192/300 Iteration 2681/4200 Training loss: 0.6931 1.7402 sec/batch
Epoch 192/300 Iteration 2682/4200 Training loss: 0.6928 1.7377 sec/batch
Epoch 192/300 Iteration 2683/4200 Training loss: 0.6932 1.7431 sec/batch
Epoch 192/300 Iteration 2684/4200 Training loss: 0.6937 1.7399 sec/batch
Epoch 192/300 Iteration 2685/4200 Training loss: 0.6948 1.7396 sec/batch
Epoch 192/300 Iteration 2686/4200 Training loss: 0.6936 1.7411 sec/batch
Epoch 192/300 Iteration 2687/4200 Training loss: 0.6943 1.7394 sec/batch
Epoch 192/300 Iteration 2688/4200 Training loss: 0.6946 1.7418 sec/batch
Epoch 193/300 Iteration 2689/4200 Training loss: 0.7332 1.7411 sec/batch
Epoch 193/300 Iteration 2690/4200 Training loss: 0.6973 1.7426 sec/batch
Epoch 193/300 Iteration 2691/4200 Training loss: 0.6924 1.7403 sec/batch
Epoch 193/300 Iteration 2692/4200 Training loss: 0.6949 1.7387 sec/batch
Epoch 193/300 Iteration 2693/4200 Training loss: 0.6915 1.7391 sec/batch
Epoch 193/300 Iteration 2694/4200 Training loss: 0.6910 1.7464 sec/batch
Epoch 193/300 Iteration 2695/4200 Training loss: 0.6882 1.7412 sec/batch
Epoch 193/300 Iteration 2696/4200 Training loss: 0.6872 1.7402 sec/batch
Epoch 193/300 Iteration 2697/4200 Training loss: 0.6865 1.7424 sec/batch
Epoch 193/300 Iteration 2698/4200 Training loss: 0.6885 1.7453 sec/batch
Epoch 193/300 Iteration 2699/4200 Training loss: 0.6898 1.7419 sec/batch
Epoch 193/300 Iteration 2700/4200 Training loss: 0.6890 1.7397 sec/batch
Validation loss: 2.28805 Saving checkpoint!
Epoch 193/300 Iteration 2701/4200 Training loss: 0.7225 3.2157 sec/batch
Epoch 193/300 Iteration 2702/4200 Training loss: 0.7197 1.8153 sec/batch
Epoch 194/300 Iteration 2703/4200 Training loss: 0.7356 1.6396 sec/batch
Epoch 194/300 Iteration 2704/4200 Training loss: 0.7078 1.6748 sec/batch
Epoch 194/300 Iteration 2705/4200 Training loss: 0.6956 1.6939 sec/batch
Epoch 194/300 Iteration 2706/4200 Training loss: 0.6956 1.6946 sec/batch
Epoch 194/300 Iteration 2707/4200 Training loss: 0.6949 1.6915 sec/batch
Epoch 194/300 Iteration 2708/4200 Training loss: 0.6938 1.6931 sec/batch
Epoch 194/300 Iteration 2709/4200 Training loss: 0.6911 1.6894 sec/batch
Epoch 194/300 Iteration 2710/4200 Training loss: 0.6893 1.6928 sec/batch
Epoch 194/300 Iteration 2711/4200 Training loss: 0.6865 1.6904 sec/batch
Epoch 194/300 Iteration 2712/4200 Training loss: 0.6865 1.7043 sec/batch
Epoch 194/300 Iteration 2713/4200 Training loss: 0.6880 1.7422 sec/batch
Epoch 194/300 Iteration 2714/4200 Training loss: 0.6873 1.7395 sec/batch
Epoch 194/300 Iteration 2715/4200 Training loss: 0.6874 1.6954 sec/batch
Epoch 194/300 Iteration 2716/4200 Training loss: 0.6865 1.7547 sec/batch
Epoch 195/300 Iteration 2717/4200 Training loss: 0.7281 1.7391 sec/batch
Epoch 195/300 Iteration 2718/4200 Training loss: 0.6884 1.7108 sec/batch
Epoch 195/300 Iteration 2719/4200 Training loss: 0.6805 1.6964 sec/batch
Epoch 195/300 Iteration 2720/4200 Training loss: 0.6777 1.6995 sec/batch
Epoch 195/300 Iteration 2721/4200 Training loss: 0.6775 1.7388 sec/batch
Epoch 195/300 Iteration 2722/4200 Training loss: 0.6779 1.7393 sec/batch
Epoch 195/300 Iteration 2723/4200 Training loss: 0.6753 1.7422 sec/batch
Epoch 195/300 Iteration 2724/4200 Training loss: 0.6731 1.7385 sec/batch
Epoch 195/300 Iteration 2725/4200 Training loss: 0.6729 1.7393 sec/batch
Epoch 195/300 Iteration 2726/4200 Training loss: 0.6729 1.7393 sec/batch
Epoch 195/300 Iteration 2727/4200 Training loss: 0.6738 1.7393 sec/batch
Epoch 195/300 Iteration 2728/4200 Training loss: 0.6739 1.7395 sec/batch
Epoch 195/300 Iteration 2729/4200 Training loss: 0.6750 1.7392 sec/batch
Epoch 195/300 Iteration 2730/4200 Training loss: 0.6752 1.7444 sec/batch
Epoch 196/300 Iteration 2731/4200 Training loss: 0.7096 1.7385 sec/batch
Epoch 196/300 Iteration 2732/4200 Training loss: 0.6840 1.7396 sec/batch
Epoch 196/300 Iteration 2733/4200 Training loss: 0.6773 1.7422 sec/batch
Epoch 196/300 Iteration 2734/4200 Training loss: 0.6760 1.7377 sec/batch
Epoch 196/300 Iteration 2735/4200 Training loss: 0.6752 1.7387 sec/batch
Epoch 196/300 Iteration 2736/4200 Training loss: 0.6756 1.7400 sec/batch
Epoch 196/300 Iteration 2737/4200 Training loss: 0.6735 1.7437 sec/batch
Epoch 196/300 Iteration 2738/4200 Training loss: 0.6719 1.7415 sec/batch
Epoch 196/300 Iteration 2739/4200 Training loss: 0.6703 1.7392 sec/batch
Epoch 196/300 Iteration 2740/4200 Training loss: 0.6719 1.7448 sec/batch
Epoch 196/300 Iteration 2741/4200 Training loss: 0.6748 1.7412 sec/batch
Epoch 196/300 Iteration 2742/4200 Training loss: 0.6736 1.7389 sec/batch
Epoch 196/300 Iteration 2743/4200 Training loss: 0.6757 1.7413 sec/batch
Epoch 196/300 Iteration 2744/4200 Training loss: 0.6747 1.7393 sec/batch
Epoch 197/300 Iteration 2745/4200 Training loss: 0.7191 1.7424 sec/batch
Epoch 197/300 Iteration 2746/4200 Training loss: 0.6866 1.7385 sec/batch
Epoch 197/300 Iteration 2747/4200 Training loss: 0.6779 1.7375 sec/batch
Epoch 197/300 Iteration 2748/4200 Training loss: 0.6768 1.7388 sec/batch
Epoch 197/300 Iteration 2749/4200 Training loss: 0.6765 1.7396 sec/batch
Epoch 197/300 Iteration 2750/4200 Training loss: 0.6767 1.7441 sec/batch
Epoch 197/300 Iteration 2751/4200 Training loss: 0.6740 1.7387 sec/batch
Epoch 197/300 Iteration 2752/4200 Training loss: 0.6708 1.7374 sec/batch
Epoch 197/300 Iteration 2753/4200 Training loss: 0.6706 1.7374 sec/batch
Epoch 197/300 Iteration 2754/4200 Training loss: 0.6709 1.7475 sec/batch
Epoch 197/300 Iteration 2755/4200 Training loss: 0.6708 1.7390 sec/batch
Epoch 197/300 Iteration 2756/4200 Training loss: 0.6705 1.7392 sec/batch
Epoch 197/300 Iteration 2757/4200 Training loss: 0.6711 1.7444 sec/batch
Epoch 197/300 Iteration 2758/4200 Training loss: 0.6709 1.7379 sec/batch
Epoch 198/300 Iteration 2759/4200 Training loss: 0.7063 1.7796 sec/batch
Epoch 198/300 Iteration 2760/4200 Training loss: 0.6703 1.7953 sec/batch
Epoch 198/300 Iteration 2761/4200 Training loss: 0.6652 1.7476 sec/batch
Epoch 198/300 Iteration 2762/4200 Training loss: 0.6685 1.7435 sec/batch
Epoch 198/300 Iteration 2763/4200 Training loss: 0.6662 1.7405 sec/batch
Epoch 198/300 Iteration 2764/4200 Training loss: 0.6667 1.7396 sec/batch
Epoch 198/300 Iteration 2765/4200 Training loss: 0.6631 1.7453 sec/batch
Epoch 198/300 Iteration 2766/4200 Training loss: 0.6618 1.7405 sec/batch
Epoch 198/300 Iteration 2767/4200 Training loss: 0.6607 1.7398 sec/batch
Epoch 198/300 Iteration 2768/4200 Training loss: 0.6622 1.7385 sec/batch
Epoch 198/300 Iteration 2769/4200 Training loss: 0.6648 1.7436 sec/batch
Epoch 198/300 Iteration 2770/4200 Training loss: 0.6641 1.7378 sec/batch
Epoch 198/300 Iteration 2771/4200 Training loss: 0.6657 1.7395 sec/batch
Epoch 198/300 Iteration 2772/4200 Training loss: 0.6655 1.7395 sec/batch
Epoch 199/300 Iteration 2773/4200 Training loss: 0.7077 1.7412 sec/batch
Epoch 199/300 Iteration 2774/4200 Training loss: 0.6741 1.7395 sec/batch
Epoch 199/300 Iteration 2775/4200 Training loss: 0.6599 1.7390 sec/batch
Epoch 199/300 Iteration 2776/4200 Training loss: 0.6591 1.7411 sec/batch
Epoch 199/300 Iteration 2777/4200 Training loss: 0.6595 1.7392 sec/batch
Epoch 199/300 Iteration 2778/4200 Training loss: 0.6599 1.7373 sec/batch
Epoch 199/300 Iteration 2779/4200 Training loss: 0.6574 1.7384 sec/batch
Epoch 199/300 Iteration 2780/4200 Training loss: 0.6552 1.7406 sec/batch
Epoch 199/300 Iteration 2781/4200 Training loss: 0.6549 1.7378 sec/batch
Epoch 199/300 Iteration 2782/4200 Training loss: 0.6571 1.7406 sec/batch
Epoch 199/300 Iteration 2783/4200 Training loss: 0.6582 1.7443 sec/batch
Epoch 199/300 Iteration 2784/4200 Training loss: 0.6581 1.7420 sec/batch
Epoch 199/300 Iteration 2785/4200 Training loss: 0.6597 1.7738 sec/batch
Epoch 199/300 Iteration 2786/4200 Training loss: 0.6603 1.7580 sec/batch
Epoch 200/300 Iteration 2787/4200 Training loss: 0.6919 1.7390 sec/batch
Epoch 200/300 Iteration 2788/4200 Training loss: 0.6662 1.7432 sec/batch
Epoch 200/300 Iteration 2789/4200 Training loss: 0.6581 1.7383 sec/batch
Epoch 200/300 Iteration 2790/4200 Training loss: 0.6607 1.7398 sec/batch
Epoch 200/300 Iteration 2791/4200 Training loss: 0.6588 1.7402 sec/batch
Epoch 200/300 Iteration 2792/4200 Training loss: 0.6576 1.7452 sec/batch
Epoch 200/300 Iteration 2793/4200 Training loss: 0.6556 1.7423 sec/batch
Epoch 200/300 Iteration 2794/4200 Training loss: 0.6547 1.7375 sec/batch
Epoch 200/300 Iteration 2795/4200 Training loss: 0.6543 1.7438 sec/batch
Epoch 200/300 Iteration 2796/4200 Training loss: 0.6561 1.7799 sec/batch
Epoch 200/300 Iteration 2797/4200 Training loss: 0.6573 1.7948 sec/batch
Epoch 200/300 Iteration 2798/4200 Training loss: 0.6557 1.7468 sec/batch
Epoch 200/300 Iteration 2799/4200 Training loss: 0.6562 1.7445 sec/batch
Epoch 200/300 Iteration 2800/4200 Training loss: 0.6554 1.7394 sec/batch
Validation loss: 2.3715 Saving checkpoint!
Epoch 201/300 Iteration 2801/4200 Training loss: 0.6877 3.2305 sec/batch
Epoch 201/300 Iteration 2802/4200 Training loss: 0.6599 1.8124 sec/batch
Epoch 201/300 Iteration 2803/4200 Training loss: 0.6563 1.6707 sec/batch
Epoch 201/300 Iteration 2804/4200 Training loss: 0.6523 1.6903 sec/batch
Epoch 201/300 Iteration 2805/4200 Training loss: 0.6534 1.6923 sec/batch
Epoch 201/300 Iteration 2806/4200 Training loss: 0.6522 1.6950 sec/batch
Epoch 201/300 Iteration 2807/4200 Training loss: 0.6508 1.6933 sec/batch
Epoch 201/300 Iteration 2808/4200 Training loss: 0.6503 1.6922 sec/batch
Epoch 201/300 Iteration 2809/4200 Training loss: 0.6485 1.7221 sec/batch
Epoch 201/300 Iteration 2810/4200 Training loss: 0.6490 1.7956 sec/batch
Epoch 201/300 Iteration 2811/4200 Training loss: 0.6502 1.7402 sec/batch
Epoch 201/300 Iteration 2812/4200 Training loss: 0.6486 1.7504 sec/batch
Epoch 201/300 Iteration 2813/4200 Training loss: 0.6506 1.7372 sec/batch
Epoch 201/300 Iteration 2814/4200 Training loss: 0.6507 1.7375 sec/batch
Epoch 202/300 Iteration 2815/4200 Training loss: 0.6886 1.7404 sec/batch
Epoch 202/300 Iteration 2816/4200 Training loss: 0.6603 1.7439 sec/batch
Epoch 202/300 Iteration 2817/4200 Training loss: 0.6509 1.7373 sec/batch
Epoch 202/300 Iteration 2818/4200 Training loss: 0.6506 1.7401 sec/batch
Epoch 202/300 Iteration 2819/4200 Training loss: 0.6478 1.7391 sec/batch
Epoch 202/300 Iteration 2820/4200 Training loss: 0.6484 1.7390 sec/batch
Epoch 202/300 Iteration 2821/4200 Training loss: 0.6463 1.7409 sec/batch
Epoch 202/300 Iteration 2822/4200 Training loss: 0.6445 1.7408 sec/batch
Epoch 202/300 Iteration 2823/4200 Training loss: 0.6431 1.7411 sec/batch
Epoch 202/300 Iteration 2824/4200 Training loss: 0.6439 1.7429 sec/batch
Epoch 202/300 Iteration 2825/4200 Training loss: 0.6454 1.7416 sec/batch
Epoch 202/300 Iteration 2826/4200 Training loss: 0.6447 1.7396 sec/batch
Epoch 202/300 Iteration 2827/4200 Training loss: 0.6460 1.7447 sec/batch
Epoch 202/300 Iteration 2828/4200 Training loss: 0.6453 1.7405 sec/batch
Epoch 203/300 Iteration 2829/4200 Training loss: 0.6904 1.7389 sec/batch
Epoch 203/300 Iteration 2830/4200 Training loss: 0.6556 1.7424 sec/batch
Epoch 203/300 Iteration 2831/4200 Training loss: 0.6483 1.7425 sec/batch
Epoch 203/300 Iteration 2832/4200 Training loss: 0.6463 1.7393 sec/batch
Epoch 203/300 Iteration 2833/4200 Training loss: 0.6449 1.7417 sec/batch
Epoch 203/300 Iteration 2834/4200 Training loss: 0.6436 1.7358 sec/batch
Epoch 203/300 Iteration 2835/4200 Training loss: 0.6404 1.7395 sec/batch
Epoch 203/300 Iteration 2836/4200 Training loss: 0.6381 1.7410 sec/batch
Epoch 203/300 Iteration 2837/4200 Training loss: 0.6360 1.7424 sec/batch
Epoch 203/300 Iteration 2838/4200 Training loss: 0.6361 1.7403 sec/batch
Epoch 203/300 Iteration 2839/4200 Training loss: 0.6364 1.7410 sec/batch
Epoch 203/300 Iteration 2840/4200 Training loss: 0.6363 1.7464 sec/batch
Epoch 203/300 Iteration 2841/4200 Training loss: 0.6381 1.7405 sec/batch
Epoch 203/300 Iteration 2842/4200 Training loss: 0.6391 1.7381 sec/batch
Epoch 204/300 Iteration 2843/4200 Training loss: 0.6802 1.7432 sec/batch
Epoch 204/300 Iteration 2844/4200 Training loss: 0.6428 1.7384 sec/batch
Epoch 204/300 Iteration 2845/4200 Training loss: 0.6339 1.7380 sec/batch
Epoch 204/300 Iteration 2846/4200 Training loss: 0.6346 1.7394 sec/batch
Epoch 204/300 Iteration 2847/4200 Training loss: 0.6341 1.7385 sec/batch
Epoch 204/300 Iteration 2848/4200 Training loss: 0.6363 1.7389 sec/batch
Epoch 204/300 Iteration 2849/4200 Training loss: 0.6327 1.7400 sec/batch
Epoch 204/300 Iteration 2850/4200 Training loss: 0.6299 1.7392 sec/batch
Epoch 204/300 Iteration 2851/4200 Training loss: 0.6277 1.7632 sec/batch
Epoch 204/300 Iteration 2852/4200 Training loss: 0.6297 1.7510 sec/batch
Epoch 204/300 Iteration 2853/4200 Training loss: 0.6311 1.7475 sec/batch
Epoch 204/300 Iteration 2854/4200 Training loss: 0.6320 1.7391 sec/batch
Epoch 204/300 Iteration 2855/4200 Training loss: 0.6330 1.7422 sec/batch
Epoch 204/300 Iteration 2856/4200 Training loss: 0.6336 1.7426 sec/batch
Epoch 205/300 Iteration 2857/4200 Training loss: 0.6877 1.7393 sec/batch
Epoch 205/300 Iteration 2858/4200 Training loss: 0.6491 1.7385 sec/batch
Epoch 205/300 Iteration 2859/4200 Training loss: 0.6382 1.7370 sec/batch
Epoch 205/300 Iteration 2860/4200 Training loss: 0.6356 1.7414 sec/batch
Epoch 205/300 Iteration 2861/4200 Training loss: 0.6348 1.7431 sec/batch
Epoch 205/300 Iteration 2862/4200 Training loss: 0.6367 1.7409 sec/batch
Epoch 205/300 Iteration 2863/4200 Training loss: 0.6348 1.7408 sec/batch
Epoch 205/300 Iteration 2864/4200 Training loss: 0.6326 1.7634 sec/batch
Epoch 205/300 Iteration 2865/4200 Training loss: 0.6298 1.8005 sec/batch
Epoch 205/300 Iteration 2866/4200 Training loss: 0.6305 1.7416 sec/batch
Epoch 205/300 Iteration 2867/4200 Training loss: 0.6305 1.7388 sec/batch
Epoch 205/300 Iteration 2868/4200 Training loss: 0.6295 1.7391 sec/batch
Epoch 205/300 Iteration 2869/4200 Training loss: 0.6307 1.7400 sec/batch
Epoch 205/300 Iteration 2870/4200 Training loss: 0.6321 1.7419 sec/batch
Epoch 206/300 Iteration 2871/4200 Training loss: 0.6694 1.7411 sec/batch
Epoch 206/300 Iteration 2872/4200 Training loss: 0.6318 1.7427 sec/batch
Epoch 206/300 Iteration 2873/4200 Training loss: 0.6311 1.7403 sec/batch
Epoch 206/300 Iteration 2874/4200 Training loss: 0.6289 1.7411 sec/batch
Epoch 206/300 Iteration 2875/4200 Training loss: 0.6284 1.7382 sec/batch
Epoch 206/300 Iteration 2876/4200 Training loss: 0.6293 1.7390 sec/batch
Epoch 206/300 Iteration 2877/4200 Training loss: 0.6270 1.7382 sec/batch
Epoch 206/300 Iteration 2878/4200 Training loss: 0.6272 1.7399 sec/batch
Epoch 206/300 Iteration 2879/4200 Training loss: 0.6256 1.7396 sec/batch
Epoch 206/300 Iteration 2880/4200 Training loss: 0.6263 1.7454 sec/batch
Epoch 206/300 Iteration 2881/4200 Training loss: 0.6281 1.7956 sec/batch
Epoch 206/300 Iteration 2882/4200 Training loss: 0.6287 1.7938 sec/batch
Epoch 206/300 Iteration 2883/4200 Training loss: 0.6309 1.7962 sec/batch
Epoch 206/300 Iteration 2884/4200 Training loss: 0.6318 1.7943 sec/batch
Epoch 207/300 Iteration 2885/4200 Training loss: 0.6585 1.8011 sec/batch
Epoch 207/300 Iteration 2886/4200 Training loss: 0.6311 1.7927 sec/batch
Epoch 207/300 Iteration 2887/4200 Training loss: 0.6268 1.7954 sec/batch
Epoch 207/300 Iteration 2888/4200 Training loss: 0.6285 1.8024 sec/batch
Epoch 207/300 Iteration 2889/4200 Training loss: 0.6319 1.7987 sec/batch
Epoch 207/300 Iteration 2890/4200 Training loss: 0.6313 1.7952 sec/batch
Epoch 207/300 Iteration 2891/4200 Training loss: 0.6279 1.7955 sec/batch
Epoch 207/300 Iteration 2892/4200 Training loss: 0.6277 1.8003 sec/batch
Epoch 207/300 Iteration 2893/4200 Training loss: 0.6273 1.7935 sec/batch
Epoch 207/300 Iteration 2894/4200 Training loss: 0.6267 1.7949 sec/batch
Epoch 207/300 Iteration 2895/4200 Training loss: 0.6274 1.8007 sec/batch
Epoch 207/300 Iteration 2896/4200 Training loss: 0.6272 1.7972 sec/batch
Epoch 207/300 Iteration 2897/4200 Training loss: 0.6281 1.7975 sec/batch
Epoch 207/300 Iteration 2898/4200 Training loss: 0.6275 1.8013 sec/batch
Epoch 208/300 Iteration 2899/4200 Training loss: 0.6759 1.8034 sec/batch
Epoch 208/300 Iteration 2900/4200 Training loss: 0.6512 1.7468 sec/batch
Validation loss: 2.44326 Saving checkpoint!
Epoch 208/300 Iteration 2901/4200 Training loss: 0.7974 3.2341 sec/batch
Epoch 208/300 Iteration 2902/4200 Training loss: 0.7533 1.8106 sec/batch
Epoch 208/300 Iteration 2903/4200 Training loss: 0.7319 1.6718 sec/batch
Epoch 208/300 Iteration 2904/4200 Training loss: 0.7136 1.6907 sec/batch
Epoch 208/300 Iteration 2905/4200 Training loss: 0.6989 1.6931 sec/batch
Epoch 208/300 Iteration 2906/4200 Training loss: 0.6888 1.6975 sec/batch
Epoch 208/300 Iteration 2907/4200 Training loss: 0.6810 1.7389 sec/batch
Epoch 208/300 Iteration 2908/4200 Training loss: 0.6755 1.7373 sec/batch
Epoch 208/300 Iteration 2909/4200 Training loss: 0.6727 1.7385 sec/batch
Epoch 208/300 Iteration 2910/4200 Training loss: 0.6684 1.7416 sec/batch
Epoch 208/300 Iteration 2911/4200 Training loss: 0.6674 1.7379 sec/batch
Epoch 208/300 Iteration 2912/4200 Training loss: 0.6657 1.7377 sec/batch
Epoch 209/300 Iteration 2913/4200 Training loss: 0.6703 1.7407 sec/batch
Epoch 209/300 Iteration 2914/4200 Training loss: 0.6446 1.7390 sec/batch
Epoch 209/300 Iteration 2915/4200 Training loss: 0.6369 1.7377 sec/batch
Epoch 209/300 Iteration 2916/4200 Training loss: 0.6337 1.7427 sec/batch
Epoch 209/300 Iteration 2917/4200 Training loss: 0.6290 1.7639 sec/batch
Epoch 209/300 Iteration 2918/4200 Training loss: 0.6271 1.7594 sec/batch
Epoch 209/300 Iteration 2919/4200 Training loss: 0.6251 1.7419 sec/batch
Epoch 209/300 Iteration 2920/4200 Training loss: 0.6214 1.7420 sec/batch
Epoch 209/300 Iteration 2921/4200 Training loss: 0.6193 1.7418 sec/batch
Epoch 209/300 Iteration 2922/4200 Training loss: 0.6202 1.7412 sec/batch
Epoch 209/300 Iteration 2923/4200 Training loss: 0.6220 1.7373 sec/batch
Epoch 209/300 Iteration 2924/4200 Training loss: 0.6219 1.7394 sec/batch
Epoch 209/300 Iteration 2925/4200 Training loss: 0.6226 1.7399 sec/batch
Epoch 209/300 Iteration 2926/4200 Training loss: 0.6229 1.7402 sec/batch
Epoch 210/300 Iteration 2927/4200 Training loss: 0.6540 1.7415 sec/batch
Epoch 210/300 Iteration 2928/4200 Training loss: 0.6344 1.7419 sec/batch
Epoch 210/300 Iteration 2929/4200 Training loss: 0.6297 1.7381 sec/batch
Epoch 210/300 Iteration 2930/4200 Training loss: 0.6275 1.7411 sec/batch
Epoch 210/300 Iteration 2931/4200 Training loss: 0.6259 1.7446 sec/batch
Epoch 210/300 Iteration 2932/4200 Training loss: 0.6234 1.7396 sec/batch
Epoch 210/300 Iteration 2933/4200 Training loss: 0.6199 1.7569 sec/batch
Epoch 210/300 Iteration 2934/4200 Training loss: 0.6189 1.7992 sec/batch
Epoch 210/300 Iteration 2935/4200 Training loss: 0.6175 1.7964 sec/batch
Epoch 210/300 Iteration 2936/4200 Training loss: 0.6190 1.7975 sec/batch
Epoch 210/300 Iteration 2937/4200 Training loss: 0.6196 1.7951 sec/batch
Epoch 210/300 Iteration 2938/4200 Training loss: 0.6192 1.7393 sec/batch
Epoch 210/300 Iteration 2939/4200 Training loss: 0.6204 1.7403 sec/batch
Epoch 210/300 Iteration 2940/4200 Training loss: 0.6210 1.7445 sec/batch
Epoch 211/300 Iteration 2941/4200 Training loss: 0.6543 1.7427 sec/batch
Epoch 211/300 Iteration 2942/4200 Training loss: 0.6175 1.7429 sec/batch
Epoch 211/300 Iteration 2943/4200 Training loss: 0.6205 1.7385 sec/batch
Epoch 211/300 Iteration 2944/4200 Training loss: 0.6233 1.7385 sec/batch
Epoch 211/300 Iteration 2945/4200 Training loss: 0.6214 1.7383 sec/batch
Epoch 211/300 Iteration 2946/4200 Training loss: 0.6194 1.7435 sec/batch
Epoch 211/300 Iteration 2947/4200 Training loss: 0.6165 1.7422 sec/batch
Epoch 211/300 Iteration 2948/4200 Training loss: 0.6169 1.7405 sec/batch
Epoch 211/300 Iteration 2949/4200 Training loss: 0.6144 1.7411 sec/batch
Epoch 211/300 Iteration 2950/4200 Training loss: 0.6151 1.7401 sec/batch
Epoch 211/300 Iteration 2951/4200 Training loss: 0.6154 1.7404 sec/batch
Epoch 211/300 Iteration 2952/4200 Training loss: 0.6152 1.7388 sec/batch
Epoch 211/300 Iteration 2953/4200 Training loss: 0.6163 1.7398 sec/batch
Epoch 211/300 Iteration 2954/4200 Training loss: 0.6171 1.7439 sec/batch
Epoch 212/300 Iteration 2955/4200 Training loss: 0.6586 1.7389 sec/batch
Epoch 212/300 Iteration 2956/4200 Training loss: 0.6334 1.7387 sec/batch
Epoch 212/300 Iteration 2957/4200 Training loss: 0.6267 1.7368 sec/batch
Epoch 212/300 Iteration 2958/4200 Training loss: 0.6234 1.7388 sec/batch
Epoch 212/300 Iteration 2959/4200 Training loss: 0.6210 1.7382 sec/batch
Epoch 212/300 Iteration 2960/4200 Training loss: 0.6201 1.7398 sec/batch
Epoch 212/300 Iteration 2961/4200 Training loss: 0.6170 1.7396 sec/batch
Epoch 212/300 Iteration 2962/4200 Training loss: 0.6163 1.7383 sec/batch
Epoch 212/300 Iteration 2963/4200 Training loss: 0.6139 1.7375 sec/batch
Epoch 212/300 Iteration 2964/4200 Training loss: 0.6145 1.7363 sec/batch
Epoch 212/300 Iteration 2965/4200 Training loss: 0.6142 1.7453 sec/batch
Epoch 212/300 Iteration 2966/4200 Training loss: 0.6136 1.7968 sec/batch
Epoch 212/300 Iteration 2967/4200 Training loss: 0.6145 1.7419 sec/batch
Epoch 212/300 Iteration 2968/4200 Training loss: 0.6147 1.7374 sec/batch
Epoch 213/300 Iteration 2969/4200 Training loss: 0.6361 1.7379 sec/batch
Epoch 213/300 Iteration 2970/4200 Training loss: 0.6079 1.7382 sec/batch
Epoch 213/300 Iteration 2971/4200 Training loss: 0.6033 1.7566 sec/batch
Epoch 213/300 Iteration 2972/4200 Training loss: 0.6055 1.7944 sec/batch
Epoch 213/300 Iteration 2973/4200 Training loss: 0.6045 1.7960 sec/batch
Epoch 213/300 Iteration 2974/4200 Training loss: 0.6046 1.7961 sec/batch
Epoch 213/300 Iteration 2975/4200 Training loss: 0.6041 1.7461 sec/batch
Epoch 213/300 Iteration 2976/4200 Training loss: 0.6028 1.7404 sec/batch
Epoch 213/300 Iteration 2977/4200 Training loss: 0.6010 1.7400 sec/batch
Epoch 213/300 Iteration 2978/4200 Training loss: 0.6039 1.7458 sec/batch
Epoch 213/300 Iteration 2979/4200 Training loss: 0.6060 1.7396 sec/batch
Epoch 213/300 Iteration 2980/4200 Training loss: 0.6056 1.7420 sec/batch
Epoch 213/300 Iteration 2981/4200 Training loss: 0.6062 1.7441 sec/batch
Epoch 213/300 Iteration 2982/4200 Training loss: 0.6061 1.7416 sec/batch
Epoch 214/300 Iteration 2983/4200 Training loss: 0.6422 1.7374 sec/batch
Epoch 214/300 Iteration 2984/4200 Training loss: 0.6211 1.7385 sec/batch
Epoch 214/300 Iteration 2985/4200 Training loss: 0.6074 1.7504 sec/batch
Epoch 214/300 Iteration 2986/4200 Training loss: 0.6075 1.7576 sec/batch
Epoch 214/300 Iteration 2987/4200 Training loss: 0.6070 1.7561 sec/batch
Epoch 214/300 Iteration 2988/4200 Training loss: 0.6060 1.7465 sec/batch
Epoch 214/300 Iteration 2989/4200 Training loss: 0.6036 1.7417 sec/batch
Epoch 214/300 Iteration 2990/4200 Training loss: 0.6023 1.7386 sec/batch
Epoch 214/300 Iteration 2991/4200 Training loss: 0.6027 1.7401 sec/batch
Epoch 214/300 Iteration 2992/4200 Training loss: 0.6044 1.7431 sec/batch
Epoch 214/300 Iteration 2993/4200 Training loss: 0.6040 1.7417 sec/batch
Epoch 214/300 Iteration 2994/4200 Training loss: 0.6034 1.7429 sec/batch
Epoch 214/300 Iteration 2995/4200 Training loss: 0.6045 1.7399 sec/batch
Epoch 214/300 Iteration 2996/4200 Training loss: 0.6054 1.7406 sec/batch
Epoch 215/300 Iteration 2997/4200 Training loss: 0.6439 1.7442 sec/batch
Epoch 215/300 Iteration 2998/4200 Training loss: 0.6130 1.7413 sec/batch
Epoch 215/300 Iteration 2999/4200 Training loss: 0.6018 1.7388 sec/batch
Epoch 215/300 Iteration 3000/4200 Training loss: 0.6007 1.7425 sec/batch
Validation loss: 2.49662 Saving checkpoint!
Epoch 215/300 Iteration 3001/4200 Training loss: 0.7072 1.6521 sec/batch
Epoch 215/300 Iteration 3002/4200 Training loss: 0.6910 1.7421 sec/batch
Epoch 215/300 Iteration 3003/4200 Training loss: 0.6771 1.7387 sec/batch
Epoch 215/300 Iteration 3004/4200 Training loss: 0.6674 1.7437 sec/batch
Epoch 215/300 Iteration 3005/4200 Training loss: 0.6589 1.7393 sec/batch
Epoch 215/300 Iteration 3006/4200 Training loss: 0.6544 1.7425 sec/batch
Epoch 215/300 Iteration 3007/4200 Training loss: 0.6510 1.7043 sec/batch
Epoch 215/300 Iteration 3008/4200 Training loss: 0.6461 1.7481 sec/batch
Epoch 215/300 Iteration 3009/4200 Training loss: 0.6429 1.7400 sec/batch
Epoch 215/300 Iteration 3010/4200 Training loss: 0.6401 1.7429 sec/batch
Epoch 216/300 Iteration 3011/4200 Training loss: 0.6365 1.7466 sec/batch
Epoch 216/300 Iteration 3012/4200 Training loss: 0.6075 1.7400 sec/batch
Epoch 216/300 Iteration 3013/4200 Training loss: 0.6031 1.7419 sec/batch
Epoch 216/300 Iteration 3014/4200 Training loss: 0.6020 1.7389 sec/batch
Epoch 216/300 Iteration 3015/4200 Training loss: 0.6019 1.7382 sec/batch
Epoch 216/300 Iteration 3016/4200 Training loss: 0.6024 1.7393 sec/batch
Epoch 216/300 Iteration 3017/4200 Training loss: 0.6010 1.7419 sec/batch
Epoch 216/300 Iteration 3018/4200 Training loss: 0.5998 1.7465 sec/batch
Epoch 216/300 Iteration 3019/4200 Training loss: 0.5984 1.7417 sec/batch
Epoch 216/300 Iteration 3020/4200 Training loss: 0.5973 1.7413 sec/batch
Epoch 216/300 Iteration 3021/4200 Training loss: 0.5974 1.7458 sec/batch
Epoch 216/300 Iteration 3022/4200 Training loss: 0.5970 1.7438 sec/batch
Epoch 216/300 Iteration 3023/4200 Training loss: 0.5987 1.7399 sec/batch
Epoch 216/300 Iteration 3024/4200 Training loss: 0.5988 1.7398 sec/batch
Epoch 217/300 Iteration 3025/4200 Training loss: 0.6348 1.7430 sec/batch
Epoch 217/300 Iteration 3026/4200 Training loss: 0.6048 1.7393 sec/batch
Epoch 217/300 Iteration 3027/4200 Training loss: 0.5967 1.7131 sec/batch
Epoch 217/300 Iteration 3028/4200 Training loss: 0.5983 1.7465 sec/batch
Epoch 217/300 Iteration 3029/4200 Training loss: 0.5979 1.7397 sec/batch
Epoch 217/300 Iteration 3030/4200 Training loss: 0.6004 1.7406 sec/batch
Epoch 217/300 Iteration 3031/4200 Training loss: 0.5952 1.7398 sec/batch
Epoch 217/300 Iteration 3032/4200 Training loss: 0.5928 1.7391 sec/batch
Epoch 217/300 Iteration 3033/4200 Training loss: 0.5908 1.7452 sec/batch
Epoch 217/300 Iteration 3034/4200 Training loss: 0.5917 1.7388 sec/batch
Epoch 217/300 Iteration 3035/4200 Training loss: 0.5933 1.7426 sec/batch
Epoch 217/300 Iteration 3036/4200 Training loss: 0.5919 1.7431 sec/batch
Epoch 217/300 Iteration 3037/4200 Training loss: 0.5931 1.7415 sec/batch
Epoch 217/300 Iteration 3038/4200 Training loss: 0.5938 1.7395 sec/batch
Epoch 218/300 Iteration 3039/4200 Training loss: 0.6389 1.7411 sec/batch
Epoch 218/300 Iteration 3040/4200 Training loss: 0.6086 1.7412 sec/batch
Epoch 218/300 Iteration 3041/4200 Training loss: 0.5979 1.7410 sec/batch
Epoch 218/300 Iteration 3042/4200 Training loss: 0.5981 1.7449 sec/batch
Epoch 218/300 Iteration 3043/4200 Training loss: 0.5983 1.7410 sec/batch
Epoch 218/300 Iteration 3044/4200 Training loss: 0.5965 1.7376 sec/batch
Epoch 218/300 Iteration 3045/4200 Training loss: 0.5939 1.7401 sec/batch
Epoch 218/300 Iteration 3046/4200 Training loss: 0.5919 1.7464 sec/batch
Epoch 218/300 Iteration 3047/4200 Training loss: 0.5901 1.7379 sec/batch
Epoch 218/300 Iteration 3048/4200 Training loss: 0.5894 1.7413 sec/batch
Epoch 218/300 Iteration 3049/4200 Training loss: 0.5894 1.7396 sec/batch
Epoch 218/300 Iteration 3050/4200 Training loss: 0.5876 1.7387 sec/batch
Epoch 218/300 Iteration 3051/4200 Training loss: 0.5875 1.7430 sec/batch
Epoch 218/300 Iteration 3052/4200 Training loss: 0.5882 1.7441 sec/batch
Epoch 219/300 Iteration 3053/4200 Training loss: 0.6318 1.7706 sec/batch
Epoch 219/300 Iteration 3054/4200 Training loss: 0.6063 1.7573 sec/batch
Epoch 219/300 Iteration 3055/4200 Training loss: 0.5988 1.7387 sec/batch
Epoch 219/300 Iteration 3056/4200 Training loss: 0.5946 1.7395 sec/batch
Epoch 219/300 Iteration 3057/4200 Training loss: 0.5990 1.7388 sec/batch
Epoch 219/300 Iteration 3058/4200 Training loss: 0.5965 1.7429 sec/batch
Epoch 219/300 Iteration 3059/4200 Training loss: 0.5933 1.7418 sec/batch
Epoch 219/300 Iteration 3060/4200 Training loss: 0.5904 1.7413 sec/batch
Epoch 219/300 Iteration 3061/4200 Training loss: 0.5883 1.7392 sec/batch
Epoch 219/300 Iteration 3062/4200 Training loss: 0.5894 1.7417 sec/batch
Epoch 219/300 Iteration 3063/4200 Training loss: 0.5904 1.7383 sec/batch
Epoch 219/300 Iteration 3064/4200 Training loss: 0.5902 1.7376 sec/batch
Epoch 219/300 Iteration 3065/4200 Training loss: 0.5907 1.7391 sec/batch
Epoch 219/300 Iteration 3066/4200 Training loss: 0.5909 1.7399 sec/batch
Epoch 220/300 Iteration 3067/4200 Training loss: 0.6202 1.7434 sec/batch
Epoch 220/300 Iteration 3068/4200 Training loss: 0.6002 1.7374 sec/batch
Epoch 220/300 Iteration 3069/4200 Training loss: 0.5909 1.7444 sec/batch
Epoch 220/300 Iteration 3070/4200 Training loss: 0.5875 1.7952 sec/batch
Epoch 220/300 Iteration 3071/4200 Training loss: 0.5895 1.7947 sec/batch
Epoch 220/300 Iteration 3072/4200 Training loss: 0.5892 1.7962 sec/batch
Epoch 220/300 Iteration 3073/4200 Training loss: 0.5868 1.7970 sec/batch
Epoch 220/300 Iteration 3074/4200 Training loss: 0.5862 1.7419 sec/batch
Epoch 220/300 Iteration 3075/4200 Training loss: 0.5858 1.7407 sec/batch
Epoch 220/300 Iteration 3076/4200 Training loss: 0.5863 1.7414 sec/batch
Epoch 220/300 Iteration 3077/4200 Training loss: 0.5866 1.7402 sec/batch
Epoch 220/300 Iteration 3078/4200 Training loss: 0.5849 1.7419 sec/batch
Epoch 220/300 Iteration 3079/4200 Training loss: 0.5860 1.7377 sec/batch
Epoch 220/300 Iteration 3080/4200 Training loss: 0.5858 1.7431 sec/batch
Epoch 221/300 Iteration 3081/4200 Training loss: 0.6210 1.7413 sec/batch
Epoch 221/300 Iteration 3082/4200 Training loss: 0.5992 1.7417 sec/batch
Epoch 221/300 Iteration 3083/4200 Training loss: 0.5933 1.7446 sec/batch
Epoch 221/300 Iteration 3084/4200 Training loss: 0.5929 1.7410 sec/batch
Epoch 221/300 Iteration 3085/4200 Training loss: 0.5919 1.7406 sec/batch
Epoch 221/300 Iteration 3086/4200 Training loss: 0.5915 1.7376 sec/batch
Epoch 221/300 Iteration 3087/4200 Training loss: 0.5902 1.7413 sec/batch
Epoch 221/300 Iteration 3088/4200 Training loss: 0.5876 1.7424 sec/batch
Epoch 221/300 Iteration 3089/4200 Training loss: 0.5865 1.7399 sec/batch
Epoch 221/300 Iteration 3090/4200 Training loss: 0.5859 1.7399 sec/batch
Epoch 221/300 Iteration 3091/4200 Training loss: 0.5860 1.7374 sec/batch
Epoch 221/300 Iteration 3092/4200 Training loss: 0.5856 1.7410 sec/batch
Epoch 221/300 Iteration 3093/4200 Training loss: 0.5855 1.7417 sec/batch
Epoch 221/300 Iteration 3094/4200 Training loss: 0.5850 1.7413 sec/batch
Epoch 222/300 Iteration 3095/4200 Training loss: 0.6218 1.7430 sec/batch
Epoch 222/300 Iteration 3096/4200 Training loss: 0.5980 1.7405 sec/batch
Epoch 222/300 Iteration 3097/4200 Training loss: 0.5902 1.7386 sec/batch
Epoch 222/300 Iteration 3098/4200 Training loss: 0.5878 1.7395 sec/batch
Epoch 222/300 Iteration 3099/4200 Training loss: 0.5875 1.7401 sec/batch
Epoch 222/300 Iteration 3100/4200 Training loss: 0.5872 1.7382 sec/batch
Validation loss: 2.54447 Saving checkpoint!
Epoch 222/300 Iteration 3101/4200 Training loss: 0.6584 3.2288 sec/batch
Epoch 222/300 Iteration 3102/4200 Training loss: 0.6489 1.8149 sec/batch
Epoch 222/300 Iteration 3103/4200 Training loss: 0.6401 1.6627 sec/batch
Epoch 222/300 Iteration 3104/4200 Training loss: 0.6341 1.6895 sec/batch
Epoch 222/300 Iteration 3105/4200 Training loss: 0.6299 1.7020 sec/batch
Epoch 222/300 Iteration 3106/4200 Training loss: 0.6256 1.7400 sec/batch
Epoch 222/300 Iteration 3107/4200 Training loss: 0.6250 1.7389 sec/batch
Epoch 222/300 Iteration 3108/4200 Training loss: 0.6225 1.7378 sec/batch
Epoch 223/300 Iteration 3109/4200 Training loss: 0.6364 1.7420 sec/batch
Epoch 223/300 Iteration 3110/4200 Training loss: 0.6116 1.7380 sec/batch
Epoch 223/300 Iteration 3111/4200 Training loss: 0.6040 1.7394 sec/batch
Epoch 223/300 Iteration 3112/4200 Training loss: 0.5970 1.7394 sec/batch
Epoch 223/300 Iteration 3113/4200 Training loss: 0.5923 1.7413 sec/batch
Epoch 223/300 Iteration 3114/4200 Training loss: 0.5920 1.7368 sec/batch
Epoch 223/300 Iteration 3115/4200 Training loss: 0.5903 1.7382 sec/batch
Epoch 223/300 Iteration 3116/4200 Training loss: 0.5868 1.7385 sec/batch
Epoch 223/300 Iteration 3117/4200 Training loss: 0.5851 1.7387 sec/batch
Epoch 223/300 Iteration 3118/4200 Training loss: 0.5855 1.7366 sec/batch
Epoch 223/300 Iteration 3119/4200 Training loss: 0.5862 1.7515 sec/batch
Epoch 223/300 Iteration 3120/4200 Training loss: 0.5850 1.7540 sec/batch
Epoch 223/300 Iteration 3121/4200 Training loss: 0.5866 1.7392 sec/batch
Epoch 223/300 Iteration 3122/4200 Training loss: 0.5856 1.7392 sec/batch
Epoch 224/300 Iteration 3123/4200 Training loss: 0.6118 1.7800 sec/batch
Epoch 224/300 Iteration 3124/4200 Training loss: 0.5948 1.7941 sec/batch
Epoch 224/300 Iteration 3125/4200 Training loss: 0.5899 1.7960 sec/batch
Epoch 224/300 Iteration 3126/4200 Training loss: 0.5885 1.7984 sec/batch
Epoch 224/300 Iteration 3127/4200 Training loss: 0.5885 1.7933 sec/batch
Epoch 224/300 Iteration 3128/4200 Training loss: 0.5866 1.7944 sec/batch
Epoch 224/300 Iteration 3129/4200 Training loss: 0.5857 1.7960 sec/batch
Epoch 224/300 Iteration 3130/4200 Training loss: 0.5825 1.7976 sec/batch
Epoch 224/300 Iteration 3131/4200 Training loss: 0.5809 1.7921 sec/batch
Epoch 224/300 Iteration 3132/4200 Training loss: 0.5818 1.7933 sec/batch
Epoch 224/300 Iteration 3133/4200 Training loss: 0.5816 1.7974 sec/batch
Epoch 224/300 Iteration 3134/4200 Training loss: 0.5805 1.7943 sec/batch
Epoch 224/300 Iteration 3135/4200 Training loss: 0.5814 1.7955 sec/batch
Epoch 224/300 Iteration 3136/4200 Training loss: 0.5816 1.7969 sec/batch
Epoch 225/300 Iteration 3137/4200 Training loss: 0.6202 1.7423 sec/batch
Epoch 225/300 Iteration 3138/4200 Training loss: 0.5950 1.7435 sec/batch
Epoch 225/300 Iteration 3139/4200 Training loss: 0.5817 1.7400 sec/batch
Epoch 225/300 Iteration 3140/4200 Training loss: 0.5827 1.7402 sec/batch
Epoch 225/300 Iteration 3141/4200 Training loss: 0.5832 1.7405 sec/batch
Epoch 225/300 Iteration 3142/4200 Training loss: 0.5831 1.7427 sec/batch
Epoch 225/300 Iteration 3143/4200 Training loss: 0.5802 1.7391 sec/batch
Epoch 225/300 Iteration 3144/4200 Training loss: 0.5778 1.7396 sec/batch
Epoch 225/300 Iteration 3145/4200 Training loss: 0.5751 1.7392 sec/batch
Epoch 225/300 Iteration 3146/4200 Training loss: 0.5741 1.7387 sec/batch
Epoch 225/300 Iteration 3147/4200 Training loss: 0.5751 1.7397 sec/batch
Epoch 225/300 Iteration 3148/4200 Training loss: 0.5748 1.7419 sec/batch
Epoch 225/300 Iteration 3149/4200 Training loss: 0.5765 1.7399 sec/batch
Epoch 225/300 Iteration 3150/4200 Training loss: 0.5760 1.7432 sec/batch
Epoch 226/300 Iteration 3151/4200 Training loss: 0.6128 1.7404 sec/batch
Epoch 226/300 Iteration 3152/4200 Training loss: 0.5858 1.7373 sec/batch
Epoch 226/300 Iteration 3153/4200 Training loss: 0.5756 1.7373 sec/batch
Epoch 226/300 Iteration 3154/4200 Training loss: 0.5765 1.7389 sec/batch
Epoch 226/300 Iteration 3155/4200 Training loss: 0.5770 1.7405 sec/batch
Epoch 226/300 Iteration 3156/4200 Training loss: 0.5782 1.7435 sec/batch
Epoch 226/300 Iteration 3157/4200 Training loss: 0.5743 1.7422 sec/batch
Epoch 226/300 Iteration 3158/4200 Training loss: 0.5718 1.7376 sec/batch
Epoch 226/300 Iteration 3159/4200 Training loss: 0.5709 1.7416 sec/batch
Epoch 226/300 Iteration 3160/4200 Training loss: 0.5731 1.7388 sec/batch
Epoch 226/300 Iteration 3161/4200 Training loss: 0.5731 1.7390 sec/batch
Epoch 226/300 Iteration 3162/4200 Training loss: 0.5722 1.7410 sec/batch
Epoch 226/300 Iteration 3163/4200 Training loss: 0.5738 1.7373 sec/batch
Epoch 226/300 Iteration 3164/4200 Training loss: 0.5741 1.7409 sec/batch
Epoch 227/300 Iteration 3165/4200 Training loss: 0.6065 1.7402 sec/batch
Epoch 227/300 Iteration 3166/4200 Training loss: 0.5769 1.7376 sec/batch
Epoch 227/300 Iteration 3167/4200 Training loss: 0.5708 1.7413 sec/batch
Epoch 227/300 Iteration 3168/4200 Training loss: 0.5703 1.7378 sec/batch
Epoch 227/300 Iteration 3169/4200 Training loss: 0.5718 1.7424 sec/batch
Epoch 227/300 Iteration 3170/4200 Training loss: 0.5706 1.7393 sec/batch
Epoch 227/300 Iteration 3171/4200 Training loss: 0.5686 1.7403 sec/batch
Epoch 227/300 Iteration 3172/4200 Training loss: 0.5679 1.7385 sec/batch
Epoch 227/300 Iteration 3173/4200 Training loss: 0.5669 1.7396 sec/batch
Epoch 227/300 Iteration 3174/4200 Training loss: 0.5686 1.7389 sec/batch
Epoch 227/300 Iteration 3175/4200 Training loss: 0.5690 1.7579 sec/batch
Epoch 227/300 Iteration 3176/4200 Training loss: 0.5689 1.8025 sec/batch
Epoch 227/300 Iteration 3177/4200 Training loss: 0.5702 1.7938 sec/batch
Epoch 227/300 Iteration 3178/4200 Training loss: 0.5722 1.7937 sec/batch
Epoch 228/300 Iteration 3179/4200 Training loss: 0.6111 1.7984 sec/batch
Epoch 228/300 Iteration 3180/4200 Training loss: 0.5820 1.7973 sec/batch
Epoch 228/300 Iteration 3181/4200 Training loss: 0.5736 1.7430 sec/batch
Epoch 228/300 Iteration 3182/4200 Training loss: 0.5748 1.7405 sec/batch
Epoch 228/300 Iteration 3183/4200 Training loss: 0.5712 1.7462 sec/batch
Epoch 228/300 Iteration 3184/4200 Training loss: 0.5738 1.7400 sec/batch
Epoch 228/300 Iteration 3185/4200 Training loss: 0.5709 1.7426 sec/batch
Epoch 228/300 Iteration 3186/4200 Training loss: 0.5697 1.7377 sec/batch
Epoch 228/300 Iteration 3187/4200 Training loss: 0.5686 1.7720 sec/batch
Epoch 228/300 Iteration 3188/4200 Training loss: 0.5696 1.7555 sec/batch
Epoch 228/300 Iteration 3189/4200 Training loss: 0.5711 1.7372 sec/batch
Epoch 228/300 Iteration 3190/4200 Training loss: 0.5713 1.7382 sec/batch
Epoch 228/300 Iteration 3191/4200 Training loss: 0.5720 1.7421 sec/batch
Epoch 228/300 Iteration 3192/4200 Training loss: 0.5726 1.7399 sec/batch
Epoch 229/300 Iteration 3193/4200 Training loss: 0.6079 1.7442 sec/batch
Epoch 229/300 Iteration 3194/4200 Training loss: 0.5800 1.7407 sec/batch
Epoch 229/300 Iteration 3195/4200 Training loss: 0.5697 1.7409 sec/batch
Epoch 229/300 Iteration 3196/4200 Training loss: 0.5712 1.7410 sec/batch
Epoch 229/300 Iteration 3197/4200 Training loss: 0.5722 1.7441 sec/batch
Epoch 229/300 Iteration 3198/4200 Training loss: 0.5723 1.7395 sec/batch
Epoch 229/300 Iteration 3199/4200 Training loss: 0.5698 1.7407 sec/batch
Epoch 229/300 Iteration 3200/4200 Training loss: 0.5683 1.7396 sec/batch
Validation loss: 2.54548 Saving checkpoint!
Epoch 229/300 Iteration 3201/4200 Training loss: 0.6260 3.2255 sec/batch
Epoch 229/300 Iteration 3202/4200 Training loss: 0.6209 1.8167 sec/batch
Epoch 229/300 Iteration 3203/4200 Training loss: 0.6176 1.6768 sec/batch
Epoch 229/300 Iteration 3204/4200 Training loss: 0.6132 1.6917 sec/batch
Epoch 229/300 Iteration 3205/4200 Training loss: 0.6110 1.7252 sec/batch
Epoch 229/300 Iteration 3206/4200 Training loss: 0.6092 1.7390 sec/batch
Epoch 230/300 Iteration 3207/4200 Training loss: 0.6015 1.7395 sec/batch
Epoch 230/300 Iteration 3208/4200 Training loss: 0.5742 1.7372 sec/batch
Epoch 230/300 Iteration 3209/4200 Training loss: 0.5681 1.7397 sec/batch
Epoch 230/300 Iteration 3210/4200 Training loss: 0.5716 1.7383 sec/batch
Epoch 230/300 Iteration 3211/4200 Training loss: 0.5731 1.7371 sec/batch
Epoch 230/300 Iteration 3212/4200 Training loss: 0.5724 1.7415 sec/batch
Epoch 230/300 Iteration 3213/4200 Training loss: 0.5707 1.7401 sec/batch
Epoch 230/300 Iteration 3214/4200 Training loss: 0.5676 1.7385 sec/batch
Epoch 230/300 Iteration 3215/4200 Training loss: 0.5669 1.7405 sec/batch
Epoch 230/300 Iteration 3216/4200 Training loss: 0.5664 1.7393 sec/batch
Epoch 230/300 Iteration 3217/4200 Training loss: 0.5664 1.7387 sec/batch
Epoch 230/300 Iteration 3218/4200 Training loss: 0.5661 1.7409 sec/batch
Epoch 230/300 Iteration 3219/4200 Training loss: 0.5670 1.7413 sec/batch
Epoch 230/300 Iteration 3220/4200 Training loss: 0.5674 1.7386 sec/batch
Epoch 231/300 Iteration 3221/4200 Training loss: 0.6070 1.7391 sec/batch
Epoch 231/300 Iteration 3222/4200 Training loss: 0.5666 1.7417 sec/batch
Epoch 231/300 Iteration 3223/4200 Training loss: 0.5577 1.7368 sec/batch
Epoch 231/300 Iteration 3224/4200 Training loss: 0.5585 1.7396 sec/batch
Epoch 231/300 Iteration 3225/4200 Training loss: 0.5591 1.7408 sec/batch
Epoch 231/300 Iteration 3226/4200 Training loss: 0.5607 1.7409 sec/batch
Epoch 231/300 Iteration 3227/4200 Training loss: 0.5598 1.7414 sec/batch
Epoch 231/300 Iteration 3228/4200 Training loss: 0.5591 1.7379 sec/batch
Epoch 231/300 Iteration 3229/4200 Training loss: 0.5587 1.7396 sec/batch
Epoch 231/300 Iteration 3230/4200 Training loss: 0.5587 1.7812 sec/batch
Epoch 231/300 Iteration 3231/4200 Training loss: 0.5605 1.7948 sec/batch
Epoch 231/300 Iteration 3232/4200 Training loss: 0.5619 1.7996 sec/batch
Epoch 231/300 Iteration 3233/4200 Training loss: 0.5633 1.7474 sec/batch
Epoch 231/300 Iteration 3234/4200 Training loss: 0.5629 1.7392 sec/batch
Epoch 232/300 Iteration 3235/4200 Training loss: 0.5946 1.7406 sec/batch
Epoch 232/300 Iteration 3236/4200 Training loss: 0.5659 1.7404 sec/batch
Epoch 232/300 Iteration 3237/4200 Training loss: 0.5554 1.7421 sec/batch
Epoch 232/300 Iteration 3238/4200 Training loss: 0.5547 1.7431 sec/batch
Epoch 232/300 Iteration 3239/4200 Training loss: 0.5551 1.7411 sec/batch
Epoch 232/300 Iteration 3240/4200 Training loss: 0.5559 1.7422 sec/batch
Epoch 232/300 Iteration 3241/4200 Training loss: 0.5555 1.7402 sec/batch
Epoch 232/300 Iteration 3242/4200 Training loss: 0.5530 1.7378 sec/batch
Epoch 232/300 Iteration 3243/4200 Training loss: 0.5511 1.7401 sec/batch
Epoch 232/300 Iteration 3244/4200 Training loss: 0.5503 1.7381 sec/batch
Epoch 232/300 Iteration 3245/4200 Training loss: 0.5499 1.7396 sec/batch
Epoch 232/300 Iteration 3246/4200 Training loss: 0.5491 1.7402 sec/batch
Epoch 232/300 Iteration 3247/4200 Training loss: 0.5505 1.7428 sec/batch
Epoch 232/300 Iteration 3248/4200 Training loss: 0.5513 1.7380 sec/batch
Epoch 233/300 Iteration 3249/4200 Training loss: 0.5976 1.7385 sec/batch
Epoch 233/300 Iteration 3250/4200 Training loss: 0.5646 1.7391 sec/batch
Epoch 233/300 Iteration 3251/4200 Training loss: 0.5578 1.7416 sec/batch
Epoch 233/300 Iteration 3252/4200 Training loss: 0.5520 1.7372 sec/batch
Epoch 233/300 Iteration 3253/4200 Training loss: 0.5482 1.7664 sec/batch
Epoch 233/300 Iteration 3254/4200 Training loss: 0.5479 1.7529 sec/batch
Epoch 233/300 Iteration 3255/4200 Training loss: 0.5452 1.8079 sec/batch
Epoch 233/300 Iteration 3256/4200 Training loss: 0.5433 1.8006 sec/batch
Epoch 233/300 Iteration 3257/4200 Training loss: 0.5449 1.7419 sec/batch
Epoch 233/300 Iteration 3258/4200 Training loss: 0.5460 1.7393 sec/batch
Epoch 233/300 Iteration 3259/4200 Training loss: 0.5463 1.7418 sec/batch
Epoch 233/300 Iteration 3260/4200 Training loss: 0.5463 1.7456 sec/batch
Epoch 233/300 Iteration 3261/4200 Training loss: 0.5479 1.7437 sec/batch
Epoch 233/300 Iteration 3262/4200 Training loss: 0.5479 1.7412 sec/batch
Epoch 234/300 Iteration 3263/4200 Training loss: 0.5879 1.7399 sec/batch
Epoch 234/300 Iteration 3264/4200 Training loss: 0.5619 1.7375 sec/batch
Epoch 234/300 Iteration 3265/4200 Training loss: 0.5562 1.7387 sec/batch
Epoch 234/300 Iteration 3266/4200 Training loss: 0.5570 1.7389 sec/batch
Epoch 234/300 Iteration 3267/4200 Training loss: 0.5547 1.7442 sec/batch
Epoch 234/300 Iteration 3268/4200 Training loss: 0.5555 1.7378 sec/batch
Epoch 234/300 Iteration 3269/4200 Training loss: 0.5520 1.7587 sec/batch
Epoch 234/300 Iteration 3270/4200 Training loss: 0.5514 1.7921 sec/batch
Epoch 234/300 Iteration 3271/4200 Training loss: 0.5496 1.8008 sec/batch
Epoch 234/300 Iteration 3272/4200 Training loss: 0.5496 1.7928 sec/batch
Epoch 234/300 Iteration 3273/4200 Training loss: 0.5494 1.7947 sec/batch
Epoch 234/300 Iteration 3274/4200 Training loss: 0.5487 1.7972 sec/batch
Epoch 234/300 Iteration 3275/4200 Training loss: 0.5504 1.7959 sec/batch
Epoch 234/300 Iteration 3276/4200 Training loss: 0.5508 1.7954 sec/batch
Epoch 235/300 Iteration 3277/4200 Training loss: 0.5931 1.7953 sec/batch
Epoch 235/300 Iteration 3278/4200 Training loss: 0.5604 1.7964 sec/batch
Epoch 235/300 Iteration 3279/4200 Training loss: 0.5508 1.7438 sec/batch
Epoch 235/300 Iteration 3280/4200 Training loss: 0.5479 1.7395 sec/batch
Epoch 235/300 Iteration 3281/4200 Training loss: 0.5483 1.7414 sec/batch
Epoch 235/300 Iteration 3282/4200 Training loss: 0.5487 1.7395 sec/batch
Epoch 235/300 Iteration 3283/4200 Training loss: 0.5473 1.7388 sec/batch
Epoch 235/300 Iteration 3284/4200 Training loss: 0.5483 1.7393 sec/batch
Epoch 235/300 Iteration 3285/4200 Training loss: 0.5470 1.7396 sec/batch
Epoch 235/300 Iteration 3286/4200 Training loss: 0.5466 1.7393 sec/batch
Epoch 235/300 Iteration 3287/4200 Training loss: 0.5466 1.7404 sec/batch
Epoch 235/300 Iteration 3288/4200 Training loss: 0.5459 1.7386 sec/batch
Epoch 235/300 Iteration 3289/4200 Training loss: 0.5472 1.7377 sec/batch
Epoch 235/300 Iteration 3290/4200 Training loss: 0.5476 1.7402 sec/batch
Epoch 236/300 Iteration 3291/4200 Training loss: 0.5881 1.7399 sec/batch
Epoch 236/300 Iteration 3292/4200 Training loss: 0.5645 1.7391 sec/batch
Epoch 236/300 Iteration 3293/4200 Training loss: 0.5558 1.7401 sec/batch
Epoch 236/300 Iteration 3294/4200 Training loss: 0.5576 1.7554 sec/batch
Epoch 236/300 Iteration 3295/4200 Training loss: 0.5530 1.7975 sec/batch
Epoch 236/300 Iteration 3296/4200 Training loss: 0.5500 1.7948 sec/batch
Epoch 236/300 Iteration 3297/4200 Training loss: 0.5487 1.8001 sec/batch
Epoch 236/300 Iteration 3298/4200 Training loss: 0.5489 1.7942 sec/batch
Epoch 236/300 Iteration 3299/4200 Training loss: 0.5490 1.7949 sec/batch
Epoch 236/300 Iteration 3300/4200 Training loss: 0.5478 1.7949 sec/batch
Validation loss: 2.61699 Saving checkpoint!
Epoch 236/300 Iteration 3301/4200 Training loss: 0.5971 3.1939 sec/batch
Epoch 236/300 Iteration 3302/4200 Training loss: 0.5924 1.8475 sec/batch
Epoch 236/300 Iteration 3303/4200 Training loss: 0.5899 1.6666 sec/batch
Epoch 236/300 Iteration 3304/4200 Training loss: 0.5873 1.6914 sec/batch
Epoch 237/300 Iteration 3305/4200 Training loss: 0.5923 1.6906 sec/batch
Epoch 237/300 Iteration 3306/4200 Training loss: 0.5694 1.6907 sec/batch
Epoch 237/300 Iteration 3307/4200 Training loss: 0.5579 1.7125 sec/batch
Epoch 237/300 Iteration 3308/4200 Training loss: 0.5563 1.7388 sec/batch
Epoch 237/300 Iteration 3309/4200 Training loss: 0.5565 1.7419 sec/batch
Epoch 237/300 Iteration 3310/4200 Training loss: 0.5552 1.7406 sec/batch
Epoch 237/300 Iteration 3311/4200 Training loss: 0.5523 1.7445 sec/batch
Epoch 237/300 Iteration 3312/4200 Training loss: 0.5516 1.7379 sec/batch
Epoch 237/300 Iteration 3313/4200 Training loss: 0.5509 1.7431 sec/batch
Epoch 237/300 Iteration 3314/4200 Training loss: 0.5511 1.7402 sec/batch
Epoch 237/300 Iteration 3315/4200 Training loss: 0.5520 1.7436 sec/batch
Epoch 237/300 Iteration 3316/4200 Training loss: 0.5516 1.7382 sec/batch
Epoch 237/300 Iteration 3317/4200 Training loss: 0.5524 1.7377 sec/batch
Epoch 237/300 Iteration 3318/4200 Training loss: 0.5526 1.7442 sec/batch
Epoch 238/300 Iteration 3319/4200 Training loss: 0.5992 1.7469 sec/batch
Epoch 238/300 Iteration 3320/4200 Training loss: 0.5715 1.7565 sec/batch
Epoch 238/300 Iteration 3321/4200 Training loss: 0.5634 1.7403 sec/batch
Epoch 238/300 Iteration 3322/4200 Training loss: 0.5570 1.7441 sec/batch
Epoch 238/300 Iteration 3323/4200 Training loss: 0.5540 1.7370 sec/batch
Epoch 238/300 Iteration 3324/4200 Training loss: 0.5507 1.7372 sec/batch
Epoch 238/300 Iteration 3325/4200 Training loss: 0.5469 1.7399 sec/batch
Epoch 238/300 Iteration 3326/4200 Training loss: 0.5471 1.7381 sec/batch
Epoch 238/300 Iteration 3327/4200 Training loss: 0.5450 1.7363 sec/batch
Epoch 238/300 Iteration 3328/4200 Training loss: 0.5474 1.7440 sec/batch
Epoch 238/300 Iteration 3329/4200 Training loss: 0.5476 1.7703 sec/batch
Epoch 238/300 Iteration 3330/4200 Training loss: 0.5460 1.7438 sec/batch
Epoch 238/300 Iteration 3331/4200 Training loss: 0.5467 1.7429 sec/batch
Epoch 238/300 Iteration 3332/4200 Training loss: 0.5455 1.7396 sec/batch
Epoch 239/300 Iteration 3333/4200 Training loss: 0.5892 1.7422 sec/batch
Epoch 239/300 Iteration 3334/4200 Training loss: 0.5700 1.7418 sec/batch
Epoch 239/300 Iteration 3335/4200 Training loss: 0.5617 1.7426 sec/batch
Epoch 239/300 Iteration 3336/4200 Training loss: 0.5566 1.7417 sec/batch
Epoch 239/300 Iteration 3337/4200 Training loss: 0.5542 1.7394 sec/batch
Epoch 239/300 Iteration 3338/4200 Training loss: 0.5491 1.7394 sec/batch
Epoch 239/300 Iteration 3339/4200 Training loss: 0.5455 1.7380 sec/batch
Epoch 239/300 Iteration 3340/4200 Training loss: 0.5426 1.7393 sec/batch
Epoch 239/300 Iteration 3341/4200 Training loss: 0.5427 1.7416 sec/batch
Epoch 239/300 Iteration 3342/4200 Training loss: 0.5430 1.7469 sec/batch
Epoch 239/300 Iteration 3343/4200 Training loss: 0.5431 1.7374 sec/batch
Epoch 239/300 Iteration 3344/4200 Training loss: 0.5430 1.7413 sec/batch
Epoch 239/300 Iteration 3345/4200 Training loss: 0.5437 1.7386 sec/batch
Epoch 239/300 Iteration 3346/4200 Training loss: 0.5434 1.7442 sec/batch
Epoch 240/300 Iteration 3347/4200 Training loss: 0.5680 1.7386 sec/batch
Epoch 240/300 Iteration 3348/4200 Training loss: 0.5548 1.7422 sec/batch
Epoch 240/300 Iteration 3349/4200 Training loss: 0.5513 1.7449 sec/batch
Epoch 240/300 Iteration 3350/4200 Training loss: 0.5511 1.7423 sec/batch
Epoch 240/300 Iteration 3351/4200 Training loss: 0.5517 1.7412 sec/batch
Epoch 240/300 Iteration 3352/4200 Training loss: 0.5491 1.7418 sec/batch
Epoch 240/300 Iteration 3353/4200 Training loss: 0.5463 1.7438 sec/batch
Epoch 240/300 Iteration 3354/4200 Training loss: 0.5465 1.7432 sec/batch
Epoch 240/300 Iteration 3355/4200 Training loss: 0.5438 1.7397 sec/batch
Epoch 240/300 Iteration 3356/4200 Training loss: 0.5446 1.7428 sec/batch
Epoch 240/300 Iteration 3357/4200 Training loss: 0.5437 1.7416 sec/batch
Epoch 240/300 Iteration 3358/4200 Training loss: 0.5413 1.7389 sec/batch
Epoch 240/300 Iteration 3359/4200 Training loss: 0.5432 1.7389 sec/batch
Epoch 240/300 Iteration 3360/4200 Training loss: 0.5429 1.7382 sec/batch
Epoch 241/300 Iteration 3361/4200 Training loss: 0.5805 1.7372 sec/batch
Epoch 241/300 Iteration 3362/4200 Training loss: 0.5450 1.7412 sec/batch
Epoch 241/300 Iteration 3363/4200 Training loss: 0.5381 1.7441 sec/batch
Epoch 241/300 Iteration 3364/4200 Training loss: 0.5381 1.7411 sec/batch
Epoch 241/300 Iteration 3365/4200 Training loss: 0.5373 1.7408 sec/batch
Epoch 241/300 Iteration 3366/4200 Training loss: 0.5383 1.7413 sec/batch
Epoch 241/300 Iteration 3367/4200 Training loss: 0.5347 1.7452 sec/batch
Epoch 241/300 Iteration 3368/4200 Training loss: 0.5317 1.7389 sec/batch
Epoch 241/300 Iteration 3369/4200 Training loss: 0.5304 1.7439 sec/batch
Epoch 241/300 Iteration 3370/4200 Training loss: 0.5298 1.7411 sec/batch
Epoch 241/300 Iteration 3371/4200 Training loss: 0.5311 1.7382 sec/batch
Epoch 241/300 Iteration 3372/4200 Training loss: 0.5319 1.7412 sec/batch
Epoch 241/300 Iteration 3373/4200 Training loss: 0.5337 1.7436 sec/batch
Epoch 241/300 Iteration 3374/4200 Training loss: 0.5336 1.7437 sec/batch
Epoch 242/300 Iteration 3375/4200 Training loss: 0.5721 1.7377 sec/batch
Epoch 242/300 Iteration 3376/4200 Training loss: 0.5523 1.7381 sec/batch
Epoch 242/300 Iteration 3377/4200 Training loss: 0.5412 1.7385 sec/batch
Epoch 242/300 Iteration 3378/4200 Training loss: 0.5398 1.7400 sec/batch
Epoch 242/300 Iteration 3379/4200 Training loss: 0.5386 1.7399 sec/batch
Epoch 242/300 Iteration 3380/4200 Training loss: 0.5383 1.7383 sec/batch
Epoch 242/300 Iteration 3381/4200 Training loss: 0.5363 1.7381 sec/batch
Epoch 242/300 Iteration 3382/4200 Training loss: 0.5341 1.7389 sec/batch
Epoch 242/300 Iteration 3383/4200 Training loss: 0.5331 1.7402 sec/batch
Epoch 242/300 Iteration 3384/4200 Training loss: 0.5338 1.7416 sec/batch
Epoch 242/300 Iteration 3385/4200 Training loss: 0.5349 1.7379 sec/batch
Epoch 242/300 Iteration 3386/4200 Training loss: 0.5332 1.7649 sec/batch
Epoch 242/300 Iteration 3387/4200 Training loss: 0.5345 1.8271 sec/batch
Epoch 242/300 Iteration 3388/4200 Training loss: 0.5346 1.7960 sec/batch
Epoch 243/300 Iteration 3389/4200 Training loss: 0.5759 1.8107 sec/batch
Epoch 243/300 Iteration 3390/4200 Training loss: 0.5517 1.7992 sec/batch
Epoch 243/300 Iteration 3391/4200 Training loss: 0.5381 1.7943 sec/batch
Epoch 243/300 Iteration 3392/4200 Training loss: 0.5348 1.7976 sec/batch
Epoch 243/300 Iteration 3393/4200 Training loss: 0.5340 1.7957 sec/batch
Epoch 243/300 Iteration 3394/4200 Training loss: 0.5365 1.8020 sec/batch
Epoch 243/300 Iteration 3395/4200 Training loss: 0.5339 1.7950 sec/batch
Epoch 243/300 Iteration 3396/4200 Training loss: 0.5340 1.7948 sec/batch
Epoch 243/300 Iteration 3397/4200 Training loss: 0.5332 1.7940 sec/batch
Epoch 243/300 Iteration 3398/4200 Training loss: 0.5326 1.7937 sec/batch
Epoch 243/300 Iteration 3399/4200 Training loss: 0.5322 1.7936 sec/batch
Epoch 243/300 Iteration 3400/4200 Training loss: 0.5300 1.7993 sec/batch
Validation loss: 2.63149 Saving checkpoint!
Epoch 243/300 Iteration 3401/4200 Training loss: 0.5724 3.1836 sec/batch
Epoch 243/300 Iteration 3402/4200 Training loss: 0.5690 1.8629 sec/batch
Epoch 244/300 Iteration 3403/4200 Training loss: 0.5691 1.6953 sec/batch
Epoch 244/300 Iteration 3404/4200 Training loss: 0.5460 1.6944 sec/batch
Epoch 244/300 Iteration 3405/4200 Training loss: 0.5394 1.7213 sec/batch
Epoch 244/300 Iteration 3406/4200 Training loss: 0.5377 1.7407 sec/batch
Epoch 244/300 Iteration 3407/4200 Training loss: 0.5342 1.7402 sec/batch
Epoch 244/300 Iteration 3408/4200 Training loss: 0.5317 1.7459 sec/batch
Epoch 244/300 Iteration 3409/4200 Training loss: 0.5311 1.7415 sec/batch
Epoch 244/300 Iteration 3410/4200 Training loss: 0.5304 1.7394 sec/batch
Epoch 244/300 Iteration 3411/4200 Training loss: 0.5301 1.7406 sec/batch
Epoch 244/300 Iteration 3412/4200 Training loss: 0.5318 1.7457 sec/batch
Epoch 244/300 Iteration 3413/4200 Training loss: 0.5332 1.7413 sec/batch
Epoch 244/300 Iteration 3414/4200 Training loss: 0.5308 1.7394 sec/batch
Epoch 244/300 Iteration 3415/4200 Training loss: 0.5301 1.7396 sec/batch
Epoch 244/300 Iteration 3416/4200 Training loss: 0.5298 1.7375 sec/batch
Epoch 245/300 Iteration 3417/4200 Training loss: 0.5648 1.7376 sec/batch
Epoch 245/300 Iteration 3418/4200 Training loss: 0.5437 1.7205 sec/batch
Epoch 245/300 Iteration 3419/4200 Training loss: 0.5383 1.7405 sec/batch
Epoch 245/300 Iteration 3420/4200 Training loss: 0.5378 1.7389 sec/batch
Epoch 245/300 Iteration 3421/4200 Training loss: 0.5360 1.7419 sec/batch
Epoch 245/300 Iteration 3422/4200 Training loss: 0.5351 1.7424 sec/batch
Epoch 245/300 Iteration 3423/4200 Training loss: 0.5313 1.7390 sec/batch
Epoch 245/300 Iteration 3424/4200 Training loss: 0.5301 1.7404 sec/batch
Epoch 245/300 Iteration 3425/4200 Training loss: 0.5265 1.7376 sec/batch
Epoch 245/300 Iteration 3426/4200 Training loss: 0.5263 1.7395 sec/batch
Epoch 245/300 Iteration 3427/4200 Training loss: 0.5276 1.7383 sec/batch
Epoch 245/300 Iteration 3428/4200 Training loss: 0.5283 1.7385 sec/batch
Epoch 245/300 Iteration 3429/4200 Training loss: 0.5282 1.7443 sec/batch
Epoch 245/300 Iteration 3430/4200 Training loss: 0.5283 1.7393 sec/batch
Epoch 246/300 Iteration 3431/4200 Training loss: 0.5644 1.7971 sec/batch
Epoch 246/300 Iteration 3432/4200 Training loss: 0.5415 1.7411 sec/batch
Epoch 246/300 Iteration 3433/4200 Training loss: 0.5296 1.7426 sec/batch
Epoch 246/300 Iteration 3434/4200 Training loss: 0.5295 1.7404 sec/batch
Epoch 246/300 Iteration 3435/4200 Training loss: 0.5279 1.7442 sec/batch
Epoch 246/300 Iteration 3436/4200 Training loss: 0.5263 1.7393 sec/batch
Epoch 246/300 Iteration 3437/4200 Training loss: 0.5226 1.7434 sec/batch
Epoch 246/300 Iteration 3438/4200 Training loss: 0.5204 1.7405 sec/batch
Epoch 246/300 Iteration 3439/4200 Training loss: 0.5184 1.7444 sec/batch
Epoch 246/300 Iteration 3440/4200 Training loss: 0.5182 1.7397 sec/batch
Epoch 246/300 Iteration 3441/4200 Training loss: 0.5200 1.7377 sec/batch
Epoch 246/300 Iteration 3442/4200 Training loss: 0.5195 1.7390 sec/batch
Epoch 246/300 Iteration 3443/4200 Training loss: 0.5209 1.7446 sec/batch
Epoch 246/300 Iteration 3444/4200 Training loss: 0.5207 1.7393 sec/batch
Epoch 247/300 Iteration 3445/4200 Training loss: 0.5509 1.7405 sec/batch
Epoch 247/300 Iteration 3446/4200 Training loss: 0.5260 1.7387 sec/batch
Epoch 247/300 Iteration 3447/4200 Training loss: 0.5219 1.7420 sec/batch
Epoch 247/300 Iteration 3448/4200 Training loss: 0.5216 1.7419 sec/batch
Epoch 247/300 Iteration 3449/4200 Training loss: 0.5223 1.7458 sec/batch
Epoch 247/300 Iteration 3450/4200 Training loss: 0.5243 1.7406 sec/batch
Epoch 247/300 Iteration 3451/4200 Training loss: 0.5215 1.7803 sec/batch
Epoch 247/300 Iteration 3452/4200 Training loss: 0.5186 1.7975 sec/batch
Epoch 247/300 Iteration 3453/4200 Training loss: 0.5163 1.8274 sec/batch
Epoch 247/300 Iteration 3454/4200 Training loss: 0.5165 1.7989 sec/batch
Epoch 247/300 Iteration 3455/4200 Training loss: 0.5177 1.8132 sec/batch
Epoch 247/300 Iteration 3456/4200 Training loss: 0.5182 1.7979 sec/batch
Epoch 247/300 Iteration 3457/4200 Training loss: 0.5198 1.7428 sec/batch
Epoch 247/300 Iteration 3458/4200 Training loss: 0.5197 1.7408 sec/batch
Epoch 248/300 Iteration 3459/4200 Training loss: 0.5549 1.7426 sec/batch
Epoch 248/300 Iteration 3460/4200 Training loss: 0.5328 1.7428 sec/batch
Epoch 248/300 Iteration 3461/4200 Training loss: 0.5236 1.7429 sec/batch
Epoch 248/300 Iteration 3462/4200 Training loss: 0.5217 1.7389 sec/batch
Epoch 248/300 Iteration 3463/4200 Training loss: 0.5230 1.7388 sec/batch
Epoch 248/300 Iteration 3464/4200 Training loss: 0.5246 1.7389 sec/batch
Epoch 248/300 Iteration 3465/4200 Training loss: 0.5221 1.7400 sec/batch
Epoch 248/300 Iteration 3466/4200 Training loss: 0.5209 1.7379 sec/batch
Epoch 248/300 Iteration 3467/4200 Training loss: 0.5191 1.7429 sec/batch
Epoch 248/300 Iteration 3468/4200 Training loss: 0.5171 1.7805 sec/batch
Epoch 248/300 Iteration 3469/4200 Training loss: 0.5169 1.7934 sec/batch
Epoch 248/300 Iteration 3470/4200 Training loss: 0.5151 1.7946 sec/batch
Epoch 248/300 Iteration 3471/4200 Training loss: 0.5160 1.7937 sec/batch
Epoch 248/300 Iteration 3472/4200 Training loss: 0.5165 1.7453 sec/batch
Epoch 249/300 Iteration 3473/4200 Training loss: 0.5491 1.7425 sec/batch
Epoch 249/300 Iteration 3474/4200 Training loss: 0.5275 1.7382 sec/batch
Epoch 249/300 Iteration 3475/4200 Training loss: 0.5186 1.7414 sec/batch
Epoch 249/300 Iteration 3476/4200 Training loss: 0.5130 1.7405 sec/batch
Epoch 249/300 Iteration 3477/4200 Training loss: 0.5149 1.7432 sec/batch
Epoch 249/300 Iteration 3478/4200 Training loss: 0.5153 1.7408 sec/batch
Epoch 249/300 Iteration 3479/4200 Training loss: 0.5129 1.7410 sec/batch
Epoch 249/300 Iteration 3480/4200 Training loss: 0.5118 1.7453 sec/batch
Epoch 249/300 Iteration 3481/4200 Training loss: 0.5102 1.7415 sec/batch
Epoch 249/300 Iteration 3482/4200 Training loss: 0.5110 1.7394 sec/batch
Epoch 249/300 Iteration 3483/4200 Training loss: 0.5108 1.7377 sec/batch
Epoch 249/300 Iteration 3484/4200 Training loss: 0.5108 1.7442 sec/batch
Epoch 249/300 Iteration 3485/4200 Training loss: 0.5122 1.7397 sec/batch
Epoch 249/300 Iteration 3486/4200 Training loss: 0.5124 1.7392 sec/batch
Epoch 250/300 Iteration 3487/4200 Training loss: 0.5396 1.7440 sec/batch
Epoch 250/300 Iteration 3488/4200 Training loss: 0.5166 1.7398 sec/batch
Epoch 250/300 Iteration 3489/4200 Training loss: 0.5135 1.7426 sec/batch
Epoch 250/300 Iteration 3490/4200 Training loss: 0.5098 1.7388 sec/batch
Epoch 250/300 Iteration 3491/4200 Training loss: 0.5073 1.7409 sec/batch
Epoch 250/300 Iteration 3492/4200 Training loss: 0.5077 1.7407 sec/batch
Epoch 250/300 Iteration 3493/4200 Training loss: 0.5063 1.7409 sec/batch
Epoch 250/300 Iteration 3494/4200 Training loss: 0.5047 1.7418 sec/batch
Epoch 250/300 Iteration 3495/4200 Training loss: 0.5036 1.7382 sec/batch
Epoch 250/300 Iteration 3496/4200 Training loss: 0.5025 1.7398 sec/batch
Epoch 250/300 Iteration 3497/4200 Training loss: 0.5025 1.7439 sec/batch
Epoch 250/300 Iteration 3498/4200 Training loss: 0.5010 1.8076 sec/batch
Epoch 250/300 Iteration 3499/4200 Training loss: 0.5018 1.7386 sec/batch
Epoch 250/300 Iteration 3500/4200 Training loss: 0.5022 1.7439 sec/batch
Validation loss: 2.68825 Saving checkpoint!
Epoch 251/300 Iteration 3501/4200 Training loss: 0.5431 3.2162 sec/batch
Epoch 251/300 Iteration 3502/4200 Training loss: 0.5146 1.8239 sec/batch
Epoch 251/300 Iteration 3503/4200 Training loss: 0.5090 1.6742 sec/batch
Epoch 251/300 Iteration 3504/4200 Training loss: 0.5055 1.6909 sec/batch
Epoch 251/300 Iteration 3505/4200 Training loss: 0.5044 1.6968 sec/batch
Epoch 251/300 Iteration 3506/4200 Training loss: 0.5046 1.6909 sec/batch
Epoch 251/300 Iteration 3507/4200 Training loss: 0.5040 1.6916 sec/batch
Epoch 251/300 Iteration 3508/4200 Training loss: 0.5035 1.7262 sec/batch
Epoch 251/300 Iteration 3509/4200 Training loss: 0.5016 1.7434 sec/batch
Epoch 251/300 Iteration 3510/4200 Training loss: 0.5005 1.7377 sec/batch
Epoch 251/300 Iteration 3511/4200 Training loss: 0.5017 1.7391 sec/batch
Epoch 251/300 Iteration 3512/4200 Training loss: 0.5009 1.7441 sec/batch
Epoch 251/300 Iteration 3513/4200 Training loss: 0.5015 1.7399 sec/batch
Epoch 251/300 Iteration 3514/4200 Training loss: 0.5003 1.7404 sec/batch
Epoch 252/300 Iteration 3515/4200 Training loss: 0.5439 1.7388 sec/batch
Epoch 252/300 Iteration 3516/4200 Training loss: 0.5205 1.7399 sec/batch
Epoch 252/300 Iteration 3517/4200 Training loss: 0.5103 1.7388 sec/batch
Epoch 252/300 Iteration 3518/4200 Training loss: 0.5079 1.7428 sec/batch
Epoch 252/300 Iteration 3519/4200 Training loss: 0.5022 1.7665 sec/batch
Epoch 252/300 Iteration 3520/4200 Training loss: 0.5025 1.7408 sec/batch
Epoch 252/300 Iteration 3521/4200 Training loss: 0.5018 1.7601 sec/batch
Epoch 252/300 Iteration 3522/4200 Training loss: 0.5011 1.7408 sec/batch
Epoch 252/300 Iteration 3523/4200 Training loss: 0.4993 1.7409 sec/batch
Epoch 252/300 Iteration 3524/4200 Training loss: 0.4987 1.7423 sec/batch
Epoch 252/300 Iteration 3525/4200 Training loss: 0.4991 1.7403 sec/batch
Epoch 252/300 Iteration 3526/4200 Training loss: 0.4983 1.7449 sec/batch
Epoch 252/300 Iteration 3527/4200 Training loss: 0.4995 1.7404 sec/batch
Epoch 252/300 Iteration 3528/4200 Training loss: 0.4986 1.7413 sec/batch
Epoch 253/300 Iteration 3529/4200 Training loss: 0.5341 1.7450 sec/batch
Epoch 253/300 Iteration 3530/4200 Training loss: 0.5151 1.7418 sec/batch
Epoch 253/300 Iteration 3531/4200 Training loss: 0.5035 1.7417 sec/batch
Epoch 253/300 Iteration 3532/4200 Training loss: 0.5021 1.7412 sec/batch
Epoch 253/300 Iteration 3533/4200 Training loss: 0.5025 1.7482 sec/batch
Epoch 253/300 Iteration 3534/4200 Training loss: 0.4993 1.7449 sec/batch
Epoch 253/300 Iteration 3535/4200 Training loss: 0.4976 1.6935 sec/batch
Epoch 253/300 Iteration 3536/4200 Training loss: 0.4969 1.7619 sec/batch
Epoch 253/300 Iteration 3537/4200 Training loss: 0.4942 1.7037 sec/batch
Epoch 253/300 Iteration 3538/4200 Training loss: 0.4946 1.7392 sec/batch
Epoch 253/300 Iteration 3539/4200 Training loss: 0.4950 1.7504 sec/batch
Epoch 253/300 Iteration 3540/4200 Training loss: 0.4937 1.7426 sec/batch
Epoch 253/300 Iteration 3541/4200 Training loss: 0.4950 1.7399 sec/batch
Epoch 253/300 Iteration 3542/4200 Training loss: 0.4949 1.7408 sec/batch
Epoch 254/300 Iteration 3543/4200 Training loss: 0.5384 1.7413 sec/batch
Epoch 254/300 Iteration 3544/4200 Training loss: 0.5066 1.7396 sec/batch
Epoch 254/300 Iteration 3545/4200 Training loss: 0.4988 1.7400 sec/batch
Epoch 254/300 Iteration 3546/4200 Training loss: 0.4975 1.7428 sec/batch
Epoch 254/300 Iteration 3547/4200 Training loss: 0.4974 1.7374 sec/batch
Epoch 254/300 Iteration 3548/4200 Training loss: 0.4991 1.7370 sec/batch
Epoch 254/300 Iteration 3549/4200 Training loss: 0.4954 1.7410 sec/batch
Epoch 254/300 Iteration 3550/4200 Training loss: 0.4937 1.7418 sec/batch
Epoch 254/300 Iteration 3551/4200 Training loss: 0.4928 1.7421 sec/batch
Epoch 254/300 Iteration 3552/4200 Training loss: 0.4939 1.7427 sec/batch
Epoch 254/300 Iteration 3553/4200 Training loss: 0.4952 1.7395 sec/batch
Epoch 254/300 Iteration 3554/4200 Training loss: 0.4938 1.7391 sec/batch
Epoch 254/300 Iteration 3555/4200 Training loss: 0.4949 1.7417 sec/batch
Epoch 254/300 Iteration 3556/4200 Training loss: 0.4935 1.7408 sec/batch
Epoch 255/300 Iteration 3557/4200 Training loss: 0.5256 1.7449 sec/batch
Epoch 255/300 Iteration 3558/4200 Training loss: 0.4936 1.7407 sec/batch
Epoch 255/300 Iteration 3559/4200 Training loss: 0.4941 1.7388 sec/batch
Epoch 255/300 Iteration 3560/4200 Training loss: 0.4981 1.7460 sec/batch
Epoch 255/300 Iteration 3561/4200 Training loss: 0.4970 1.7400 sec/batch
Epoch 255/300 Iteration 3562/4200 Training loss: 0.4936 1.7381 sec/batch
Epoch 255/300 Iteration 3563/4200 Training loss: 0.4916 1.7378 sec/batch
Epoch 255/300 Iteration 3564/4200 Training loss: 0.4894 1.7375 sec/batch
Epoch 255/300 Iteration 3565/4200 Training loss: 0.4878 1.7422 sec/batch
Epoch 255/300 Iteration 3566/4200 Training loss: 0.4889 1.7388 sec/batch
Epoch 255/300 Iteration 3567/4200 Training loss: 0.4897 1.7371 sec/batch
Epoch 255/300 Iteration 3568/4200 Training loss: 0.4900 1.7468 sec/batch
Epoch 255/300 Iteration 3569/4200 Training loss: 0.4915 1.7430 sec/batch
Epoch 255/300 Iteration 3570/4200 Training loss: 0.4930 1.7396 sec/batch
Epoch 256/300 Iteration 3571/4200 Training loss: 0.5096 1.7379 sec/batch
Epoch 256/300 Iteration 3572/4200 Training loss: 0.4856 1.7380 sec/batch
Epoch 256/300 Iteration 3573/4200 Training loss: 0.4819 1.7390 sec/batch
Epoch 256/300 Iteration 3574/4200 Training loss: 0.4836 1.7441 sec/batch
Epoch 256/300 Iteration 3575/4200 Training loss: 0.4881 1.7423 sec/batch
Epoch 256/300 Iteration 3576/4200 Training loss: 0.4884 1.7416 sec/batch
Epoch 256/300 Iteration 3577/4200 Training loss: 0.4864 1.8004 sec/batch
Epoch 256/300 Iteration 3578/4200 Training loss: 0.4857 1.7958 sec/batch
Epoch 256/300 Iteration 3579/4200 Training loss: 0.4849 1.7986 sec/batch
Epoch 256/300 Iteration 3580/4200 Training loss: 0.4859 1.7394 sec/batch
Epoch 256/300 Iteration 3581/4200 Training loss: 0.4886 1.7402 sec/batch
Epoch 256/300 Iteration 3582/4200 Training loss: 0.4888 1.7645 sec/batch
Epoch 256/300 Iteration 3583/4200 Training loss: 0.4907 1.7953 sec/batch
Epoch 256/300 Iteration 3584/4200 Training loss: 0.4898 1.7946 sec/batch
Epoch 257/300 Iteration 3585/4200 Training loss: 0.5187 1.7944 sec/batch
Epoch 257/300 Iteration 3586/4200 Training loss: 0.5008 1.7948 sec/batch
Epoch 257/300 Iteration 3587/4200 Training loss: 0.4955 1.7961 sec/batch
Epoch 257/300 Iteration 3588/4200 Training loss: 0.4956 1.8260 sec/batch
Epoch 257/300 Iteration 3589/4200 Training loss: 0.4962 1.8138 sec/batch
Epoch 257/300 Iteration 3590/4200 Training loss: 0.4972 1.7962 sec/batch
Epoch 257/300 Iteration 3591/4200 Training loss: 0.4948 1.7939 sec/batch
Epoch 257/300 Iteration 3592/4200 Training loss: 0.4929 1.7949 sec/batch
Epoch 257/300 Iteration 3593/4200 Training loss: 0.4894 1.7982 sec/batch
Epoch 257/300 Iteration 3594/4200 Training loss: 0.4884 1.8005 sec/batch
Epoch 257/300 Iteration 3595/4200 Training loss: 0.4902 1.7952 sec/batch
Epoch 257/300 Iteration 3596/4200 Training loss: 0.4903 1.7941 sec/batch
Epoch 257/300 Iteration 3597/4200 Training loss: 0.4920 1.7408 sec/batch
Epoch 257/300 Iteration 3598/4200 Training loss: 0.4925 1.7392 sec/batch
Epoch 258/300 Iteration 3599/4200 Training loss: 0.5145 1.7411 sec/batch
Epoch 258/300 Iteration 3600/4200 Training loss: 0.4988 1.7398 sec/batch
Validation loss: 2.7303 Saving checkpoint!
Epoch 258/300 Iteration 3601/4200 Training loss: 0.6826 1.6674 sec/batch
Epoch 258/300 Iteration 3602/4200 Training loss: 0.6292 1.7957 sec/batch
Epoch 258/300 Iteration 3603/4200 Training loss: 0.6007 1.7470 sec/batch
Epoch 258/300 Iteration 3604/4200 Training loss: 0.5837 1.7391 sec/batch
Epoch 258/300 Iteration 3605/4200 Training loss: 0.5706 1.7382 sec/batch
Epoch 258/300 Iteration 3606/4200 Training loss: 0.5590 1.7488 sec/batch
Epoch 258/300 Iteration 3607/4200 Training loss: 0.5487 1.7393 sec/batch
Epoch 258/300 Iteration 3608/4200 Training loss: 0.5416 1.7424 sec/batch
Epoch 258/300 Iteration 3609/4200 Training loss: 0.5375 1.7408 sec/batch
Epoch 258/300 Iteration 3610/4200 Training loss: 0.5328 1.7461 sec/batch
Epoch 258/300 Iteration 3611/4200 Training loss: 0.5302 1.7419 sec/batch
Epoch 258/300 Iteration 3612/4200 Training loss: 0.5277 1.7405 sec/batch
Epoch 259/300 Iteration 3613/4200 Training loss: 0.5372 1.7439 sec/batch
Epoch 259/300 Iteration 3614/4200 Training loss: 0.5041 1.7425 sec/batch
Epoch 259/300 Iteration 3615/4200 Training loss: 0.4931 1.7399 sec/batch
Epoch 259/300 Iteration 3616/4200 Training loss: 0.4881 1.7419 sec/batch
Epoch 259/300 Iteration 3617/4200 Training loss: 0.4876 1.7432 sec/batch
Epoch 259/300 Iteration 3618/4200 Training loss: 0.4877 1.7399 sec/batch
Epoch 259/300 Iteration 3619/4200 Training loss: 0.4860 1.7421 sec/batch
Epoch 259/300 Iteration 3620/4200 Training loss: 0.4854 1.7386 sec/batch
Epoch 259/300 Iteration 3621/4200 Training loss: 0.4845 1.7390 sec/batch
Epoch 259/300 Iteration 3622/4200 Training loss: 0.4844 1.7998 sec/batch
Epoch 259/300 Iteration 3623/4200 Training loss: 0.4845 1.7410 sec/batch
Epoch 259/300 Iteration 3624/4200 Training loss: 0.4830 1.7542 sec/batch
Epoch 259/300 Iteration 3625/4200 Training loss: 0.4828 1.7398 sec/batch
Epoch 259/300 Iteration 3626/4200 Training loss: 0.4827 1.7446 sec/batch
Epoch 260/300 Iteration 3627/4200 Training loss: 0.5277 1.7945 sec/batch
Epoch 260/300 Iteration 3628/4200 Training loss: 0.4960 1.7938 sec/batch
Epoch 260/300 Iteration 3629/4200 Training loss: 0.4903 1.7947 sec/batch
Epoch 260/300 Iteration 3630/4200 Training loss: 0.4868 1.7992 sec/batch
Epoch 260/300 Iteration 3631/4200 Training loss: 0.4828 1.7952 sec/batch
Epoch 260/300 Iteration 3632/4200 Training loss: 0.4805 1.7987 sec/batch
Epoch 260/300 Iteration 3633/4200 Training loss: 0.4782 1.7957 sec/batch
Epoch 260/300 Iteration 3634/4200 Training loss: 0.4768 1.8007 sec/batch
Epoch 260/300 Iteration 3635/4200 Training loss: 0.4751 1.7990 sec/batch
Epoch 260/300 Iteration 3636/4200 Training loss: 0.4749 1.7412 sec/batch
Epoch 260/300 Iteration 3637/4200 Training loss: 0.4752 1.7473 sec/batch
Epoch 260/300 Iteration 3638/4200 Training loss: 0.4755 1.7387 sec/batch
Epoch 260/300 Iteration 3639/4200 Training loss: 0.4762 1.7387 sec/batch
Epoch 260/300 Iteration 3640/4200 Training loss: 0.4760 1.8054 sec/batch
Epoch 261/300 Iteration 3641/4200 Training loss: 0.4948 1.7418 sec/batch
Epoch 261/300 Iteration 3642/4200 Training loss: 0.4848 1.7430 sec/batch
Epoch 261/300 Iteration 3643/4200 Training loss: 0.4758 1.7492 sec/batch
Epoch 261/300 Iteration 3644/4200 Training loss: 0.4745 1.7409 sec/batch
Epoch 261/300 Iteration 3645/4200 Training loss: 0.4719 1.7409 sec/batch
Epoch 261/300 Iteration 3646/4200 Training loss: 0.4725 1.7429 sec/batch
Epoch 261/300 Iteration 3647/4200 Training loss: 0.4720 1.7461 sec/batch
Epoch 261/300 Iteration 3648/4200 Training loss: 0.4719 1.7377 sec/batch
Epoch 261/300 Iteration 3649/4200 Training loss: 0.4709 1.7398 sec/batch
Epoch 261/300 Iteration 3650/4200 Training loss: 0.4717 1.7380 sec/batch
Epoch 261/300 Iteration 3651/4200 Training loss: 0.4722 1.7382 sec/batch
Epoch 261/300 Iteration 3652/4200 Training loss: 0.4707 1.7401 sec/batch
Epoch 261/300 Iteration 3653/4200 Training loss: 0.4714 1.7422 sec/batch
Epoch 261/300 Iteration 3654/4200 Training loss: 0.4721 1.7453 sec/batch
Epoch 262/300 Iteration 3655/4200 Training loss: 0.5128 1.7542 sec/batch
Epoch 262/300 Iteration 3656/4200 Training loss: 0.4903 1.7597 sec/batch
Epoch 262/300 Iteration 3657/4200 Training loss: 0.4828 1.7409 sec/batch
Epoch 262/300 Iteration 3658/4200 Training loss: 0.4794 1.7415 sec/batch
Epoch 262/300 Iteration 3659/4200 Training loss: 0.4804 1.7388 sec/batch
Epoch 262/300 Iteration 3660/4200 Training loss: 0.4804 1.7401 sec/batch
Epoch 262/300 Iteration 3661/4200 Training loss: 0.4800 1.7391 sec/batch
Epoch 262/300 Iteration 3662/4200 Training loss: 0.4782 1.7391 sec/batch
Epoch 262/300 Iteration 3663/4200 Training loss: 0.4765 1.7385 sec/batch
Epoch 262/300 Iteration 3664/4200 Training loss: 0.4759 1.7387 sec/batch
Epoch 262/300 Iteration 3665/4200 Training loss: 0.4764 1.7419 sec/batch
Epoch 262/300 Iteration 3666/4200 Training loss: 0.4753 1.7413 sec/batch
Epoch 262/300 Iteration 3667/4200 Training loss: 0.4761 1.7390 sec/batch
Epoch 262/300 Iteration 3668/4200 Training loss: 0.4764 1.8084 sec/batch
Epoch 263/300 Iteration 3669/4200 Training loss: 0.5186 1.7461 sec/batch
Epoch 263/300 Iteration 3670/4200 Training loss: 0.4921 1.7437 sec/batch
Epoch 263/300 Iteration 3671/4200 Training loss: 0.4759 1.7554 sec/batch
Epoch 263/300 Iteration 3672/4200 Training loss: 0.4734 1.7389 sec/batch
Epoch 263/300 Iteration 3673/4200 Training loss: 0.4725 1.7374 sec/batch
Epoch 263/300 Iteration 3674/4200 Training loss: 0.4716 1.7407 sec/batch
Epoch 263/300 Iteration 3675/4200 Training loss: 0.4719 1.7442 sec/batch
Epoch 263/300 Iteration 3676/4200 Training loss: 0.4706 1.7952 sec/batch
Epoch 263/300 Iteration 3677/4200 Training loss: 0.4691 1.7941 sec/batch
Epoch 263/300 Iteration 3678/4200 Training loss: 0.4710 1.7986 sec/batch
Epoch 263/300 Iteration 3679/4200 Training loss: 0.4720 1.7442 sec/batch
Epoch 263/300 Iteration 3680/4200 Training loss: 0.4711 1.7400 sec/batch
Epoch 263/300 Iteration 3681/4200 Training loss: 0.4716 1.7415 sec/batch
Epoch 263/300 Iteration 3682/4200 Training loss: 0.4719 1.7443 sec/batch
Epoch 264/300 Iteration 3683/4200 Training loss: 0.5122 1.7416 sec/batch
Epoch 264/300 Iteration 3684/4200 Training loss: 0.4821 1.7409 sec/batch
Epoch 264/300 Iteration 3685/4200 Training loss: 0.4778 1.7397 sec/batch
Epoch 264/300 Iteration 3686/4200 Training loss: 0.4769 1.7391 sec/batch
Epoch 264/300 Iteration 3687/4200 Training loss: 0.4743 1.7392 sec/batch
Epoch 264/300 Iteration 3688/4200 Training loss: 0.4750 1.7414 sec/batch
Epoch 264/300 Iteration 3689/4200 Training loss: 0.4735 1.7434 sec/batch
Epoch 264/300 Iteration 3690/4200 Training loss: 0.4732 1.7383 sec/batch
Epoch 264/300 Iteration 3691/4200 Training loss: 0.4711 1.7409 sec/batch
Epoch 264/300 Iteration 3692/4200 Training loss: 0.4723 1.7464 sec/batch
Epoch 264/300 Iteration 3693/4200 Training loss: 0.4723 1.7410 sec/batch
Epoch 264/300 Iteration 3694/4200 Training loss: 0.4710 1.7403 sec/batch
Epoch 264/300 Iteration 3695/4200 Training loss: 0.4711 1.7416 sec/batch
Epoch 264/300 Iteration 3696/4200 Training loss: 0.4716 1.7384 sec/batch
Epoch 265/300 Iteration 3697/4200 Training loss: 0.5146 1.7442 sec/batch
Epoch 265/300 Iteration 3698/4200 Training loss: 0.4905 1.7418 sec/batch
Epoch 265/300 Iteration 3699/4200 Training loss: 0.4824 1.7468 sec/batch
Epoch 265/300 Iteration 3700/4200 Training loss: 0.4810 1.7395 sec/batch
Validation loss: 2.77884 Saving checkpoint!
Epoch 265/300 Iteration 3701/4200 Training loss: 0.5973 1.6427 sec/batch
Epoch 265/300 Iteration 3702/4200 Training loss: 0.5764 1.7464 sec/batch
Epoch 265/300 Iteration 3703/4200 Training loss: 0.5626 1.7411 sec/batch
Epoch 265/300 Iteration 3704/4200 Training loss: 0.5507 1.7406 sec/batch
Epoch 265/300 Iteration 3705/4200 Training loss: 0.5404 1.7427 sec/batch
Epoch 265/300 Iteration 3706/4200 Training loss: 0.5335 1.7437 sec/batch
Epoch 265/300 Iteration 3707/4200 Training loss: 0.5272 1.7414 sec/batch
Epoch 265/300 Iteration 3708/4200 Training loss: 0.5219 1.7421 sec/batch
Epoch 265/300 Iteration 3709/4200 Training loss: 0.5183 1.7392 sec/batch
Epoch 265/300 Iteration 3710/4200 Training loss: 0.5151 1.7400 sec/batch
Epoch 266/300 Iteration 3711/4200 Training loss: 0.5026 1.7473 sec/batch
Epoch 266/300 Iteration 3712/4200 Training loss: 0.4786 1.7389 sec/batch
Epoch 266/300 Iteration 3713/4200 Training loss: 0.4739 1.7395 sec/batch
Epoch 266/300 Iteration 3714/4200 Training loss: 0.4776 1.7386 sec/batch
Epoch 266/300 Iteration 3715/4200 Training loss: 0.4784 1.7416 sec/batch
Epoch 266/300 Iteration 3716/4200 Training loss: 0.4783 1.7388 sec/batch
Epoch 266/300 Iteration 3717/4200 Training loss: 0.4733 1.7435 sec/batch
Epoch 266/300 Iteration 3718/4200 Training loss: 0.4722 1.7460 sec/batch
Epoch 266/300 Iteration 3719/4200 Training loss: 0.4708 1.7404 sec/batch
Epoch 266/300 Iteration 3720/4200 Training loss: 0.4699 1.7410 sec/batch
Epoch 266/300 Iteration 3721/4200 Training loss: 0.4704 1.7447 sec/batch
Epoch 266/300 Iteration 3722/4200 Training loss: 0.4694 1.7693 sec/batch
Epoch 266/300 Iteration 3723/4200 Training loss: 0.4699 1.7597 sec/batch
Epoch 266/300 Iteration 3724/4200 Training loss: 0.4694 1.7377 sec/batch
Epoch 267/300 Iteration 3725/4200 Training loss: 0.4999 1.7458 sec/batch
Epoch 267/300 Iteration 3726/4200 Training loss: 0.4787 1.7406 sec/batch
Epoch 267/300 Iteration 3727/4200 Training loss: 0.4711 1.7412 sec/batch
Epoch 267/300 Iteration 3728/4200 Training loss: 0.4680 1.7443 sec/batch
Epoch 267/300 Iteration 3729/4200 Training loss: 0.4662 1.7403 sec/batch
Epoch 267/300 Iteration 3730/4200 Training loss: 0.4679 1.7383 sec/batch
Epoch 267/300 Iteration 3731/4200 Training loss: 0.4643 1.7398 sec/batch
Epoch 267/300 Iteration 3732/4200 Training loss: 0.4636 1.7432 sec/batch
Epoch 267/300 Iteration 3733/4200 Training loss: 0.4615 1.7390 sec/batch
Epoch 267/300 Iteration 3734/4200 Training loss: 0.4627 1.7776 sec/batch
Epoch 267/300 Iteration 3735/4200 Training loss: 0.4636 1.7425 sec/batch
Epoch 267/300 Iteration 3736/4200 Training loss: 0.4622 1.7879 sec/batch
Epoch 267/300 Iteration 3737/4200 Training loss: 0.4625 1.7943 sec/batch
Epoch 267/300 Iteration 3738/4200 Training loss: 0.4616 1.7937 sec/batch
Epoch 268/300 Iteration 3739/4200 Training loss: 0.4845 1.7935 sec/batch
Epoch 268/300 Iteration 3740/4200 Training loss: 0.4637 1.7967 sec/batch
Epoch 268/300 Iteration 3741/4200 Training loss: 0.4631 1.7935 sec/batch
Epoch 268/300 Iteration 3742/4200 Training loss: 0.4614 1.7945 sec/batch
Epoch 268/300 Iteration 3743/4200 Training loss: 0.4644 1.7970 sec/batch
Epoch 268/300 Iteration 3744/4200 Training loss: 0.4636 1.7952 sec/batch
Epoch 268/300 Iteration 3745/4200 Training loss: 0.4607 1.7501 sec/batch
Epoch 268/300 Iteration 3746/4200 Training loss: 0.4578 1.7404 sec/batch
Epoch 268/300 Iteration 3747/4200 Training loss: 0.4571 1.7390 sec/batch
Epoch 268/300 Iteration 3748/4200 Training loss: 0.4578 1.7402 sec/batch
Epoch 268/300 Iteration 3749/4200 Training loss: 0.4597 1.7441 sec/batch
Epoch 268/300 Iteration 3750/4200 Training loss: 0.4580 1.7425 sec/batch
Epoch 268/300 Iteration 3751/4200 Training loss: 0.4597 1.7395 sec/batch
Epoch 268/300 Iteration 3752/4200 Training loss: 0.4585 1.7403 sec/batch
Epoch 269/300 Iteration 3753/4200 Training loss: 0.4937 1.7391 sec/batch
Epoch 269/300 Iteration 3754/4200 Training loss: 0.4730 1.7395 sec/batch
Epoch 269/300 Iteration 3755/4200 Training loss: 0.4607 1.7435 sec/batch
Epoch 269/300 Iteration 3756/4200 Training loss: 0.4601 1.7420 sec/batch
Epoch 269/300 Iteration 3757/4200 Training loss: 0.4591 1.7419 sec/batch
Epoch 269/300 Iteration 3758/4200 Training loss: 0.4606 1.7414 sec/batch
Epoch 269/300 Iteration 3759/4200 Training loss: 0.4579 1.7383 sec/batch
Epoch 269/300 Iteration 3760/4200 Training loss: 0.4582 1.7390 sec/batch
Epoch 269/300 Iteration 3761/4200 Training loss: 0.4559 1.7412 sec/batch
Epoch 269/300 Iteration 3762/4200 Training loss: 0.4558 1.7448 sec/batch
Epoch 269/300 Iteration 3763/4200 Training loss: 0.4562 1.7391 sec/batch
Epoch 269/300 Iteration 3764/4200 Training loss: 0.4550 1.7426 sec/batch
Epoch 269/300 Iteration 3765/4200 Training loss: 0.4563 1.7421 sec/batch
Epoch 269/300 Iteration 3766/4200 Training loss: 0.4570 1.7457 sec/batch
Epoch 270/300 Iteration 3767/4200 Training loss: 0.4941 1.7408 sec/batch
Epoch 270/300 Iteration 3768/4200 Training loss: 0.4740 1.7387 sec/batch
Epoch 270/300 Iteration 3769/4200 Training loss: 0.4696 1.7398 sec/batch
Epoch 270/300 Iteration 3770/4200 Training loss: 0.4682 1.7464 sec/batch
Epoch 270/300 Iteration 3771/4200 Training loss: 0.4639 1.7415 sec/batch
Epoch 270/300 Iteration 3772/4200 Training loss: 0.4640 1.7396 sec/batch
Epoch 270/300 Iteration 3773/4200 Training loss: 0.4608 1.7418 sec/batch
Epoch 270/300 Iteration 3774/4200 Training loss: 0.4579 1.7427 sec/batch
Epoch 270/300 Iteration 3775/4200 Training loss: 0.4561 1.7386 sec/batch
Epoch 270/300 Iteration 3776/4200 Training loss: 0.4552 1.7377 sec/batch
Epoch 270/300 Iteration 3777/4200 Training loss: 0.4562 1.7397 sec/batch
Epoch 270/300 Iteration 3778/4200 Training loss: 0.4555 1.7393 sec/batch
Epoch 270/300 Iteration 3779/4200 Training loss: 0.4559 1.7427 sec/batch
Epoch 270/300 Iteration 3780/4200 Training loss: 0.4559 1.7398 sec/batch
Epoch 271/300 Iteration 3781/4200 Training loss: 0.4938 1.7406 sec/batch
Epoch 271/300 Iteration 3782/4200 Training loss: 0.4663 1.7430 sec/batch
Epoch 271/300 Iteration 3783/4200 Training loss: 0.4566 1.7445 sec/batch
Epoch 271/300 Iteration 3784/4200 Training loss: 0.4575 1.7392 sec/batch
Epoch 271/300 Iteration 3785/4200 Training loss: 0.4556 1.7394 sec/batch
Epoch 271/300 Iteration 3786/4200 Training loss: 0.4567 1.7394 sec/batch
Epoch 271/300 Iteration 3787/4200 Training loss: 0.4567 1.7453 sec/batch
Epoch 271/300 Iteration 3788/4200 Training loss: 0.4560 1.7409 sec/batch
Epoch 271/300 Iteration 3789/4200 Training loss: 0.4554 1.7425 sec/batch
Epoch 271/300 Iteration 3790/4200 Training loss: 0.4553 1.7464 sec/batch
Epoch 271/300 Iteration 3791/4200 Training loss: 0.4550 1.7540 sec/batch
Epoch 271/300 Iteration 3792/4200 Training loss: 0.4541 1.7588 sec/batch
Epoch 271/300 Iteration 3793/4200 Training loss: 0.4546 1.7445 sec/batch
Epoch 271/300 Iteration 3794/4200 Training loss: 0.4553 1.7384 sec/batch
Epoch 272/300 Iteration 3795/4200 Training loss: 0.4923 1.7403 sec/batch
Epoch 272/300 Iteration 3796/4200 Training loss: 0.4640 1.7433 sec/batch
Epoch 272/300 Iteration 3797/4200 Training loss: 0.4531 1.7394 sec/batch
Epoch 272/300 Iteration 3798/4200 Training loss: 0.4537 1.7790 sec/batch
Epoch 272/300 Iteration 3799/4200 Training loss: 0.4559 1.7945 sec/batch
Epoch 272/300 Iteration 3800/4200 Training loss: 0.4544 1.7453 sec/batch
Validation loss: 2.81519 Saving checkpoint!
Epoch 272/300 Iteration 3801/4200 Training loss: 0.5382 1.6556 sec/batch
Epoch 272/300 Iteration 3802/4200 Training loss: 0.5240 1.7986 sec/batch
Epoch 272/300 Iteration 3803/4200 Training loss: 0.5144 1.7482 sec/batch
Epoch 272/300 Iteration 3804/4200 Training loss: 0.5077 1.7400 sec/batch
Epoch 272/300 Iteration 3805/4200 Training loss: 0.5024 1.7403 sec/batch
Epoch 272/300 Iteration 3806/4200 Training loss: 0.4968 1.7448 sec/batch
Epoch 272/300 Iteration 3807/4200 Training loss: 0.4935 1.7416 sec/batch
Epoch 272/300 Iteration 3808/4200 Training loss: 0.4912 1.7398 sec/batch
Epoch 273/300 Iteration 3809/4200 Training loss: 0.4808 1.7457 sec/batch
Epoch 273/300 Iteration 3810/4200 Training loss: 0.4683 1.7439 sec/batch
Epoch 273/300 Iteration 3811/4200 Training loss: 0.4600 1.7384 sec/batch
Epoch 273/300 Iteration 3812/4200 Training loss: 0.4612 1.7435 sec/batch
Epoch 273/300 Iteration 3813/4200 Training loss: 0.4544 1.7384 sec/batch
Epoch 273/300 Iteration 3814/4200 Training loss: 0.4523 1.7379 sec/batch
Epoch 273/300 Iteration 3815/4200 Training loss: 0.4505 1.7413 sec/batch
Epoch 273/300 Iteration 3816/4200 Training loss: 0.4482 1.7393 sec/batch
Epoch 273/300 Iteration 3817/4200 Training loss: 0.4463 1.7393 sec/batch
Epoch 273/300 Iteration 3818/4200 Training loss: 0.4456 1.7796 sec/batch
Epoch 273/300 Iteration 3819/4200 Training loss: 0.4453 1.7996 sec/batch
Epoch 273/300 Iteration 3820/4200 Training loss: 0.4453 1.7389 sec/batch
Epoch 273/300 Iteration 3821/4200 Training loss: 0.4452 1.7403 sec/batch
Epoch 273/300 Iteration 3822/4200 Training loss: 0.4449 1.7401 sec/batch
Epoch 274/300 Iteration 3823/4200 Training loss: 0.4754 1.7481 sec/batch
Epoch 274/300 Iteration 3824/4200 Training loss: 0.4546 1.7403 sec/batch
Epoch 274/300 Iteration 3825/4200 Training loss: 0.4474 1.7403 sec/batch
Epoch 274/300 Iteration 3826/4200 Training loss: 0.4460 1.7385 sec/batch
Epoch 274/300 Iteration 3827/4200 Training loss: 0.4474 1.7385 sec/batch
Epoch 274/300 Iteration 3828/4200 Training loss: 0.4475 1.7395 sec/batch
Epoch 274/300 Iteration 3829/4200 Training loss: 0.4463 1.7381 sec/batch
Epoch 274/300 Iteration 3830/4200 Training loss: 0.4457 1.7446 sec/batch
Epoch 274/300 Iteration 3831/4200 Training loss: 0.4434 1.7394 sec/batch
Epoch 274/300 Iteration 3832/4200 Training loss: 0.4429 1.7382 sec/batch
Epoch 274/300 Iteration 3833/4200 Training loss: 0.4443 1.7417 sec/batch
Epoch 274/300 Iteration 3834/4200 Training loss: 0.4429 1.7464 sec/batch
Epoch 274/300 Iteration 3835/4200 Training loss: 0.4429 1.7968 sec/batch
Epoch 274/300 Iteration 3836/4200 Training loss: 0.4419 1.7988 sec/batch
Epoch 275/300 Iteration 3837/4200 Training loss: 0.4745 1.7490 sec/batch
Epoch 275/300 Iteration 3838/4200 Training loss: 0.4516 1.7431 sec/batch
Epoch 275/300 Iteration 3839/4200 Training loss: 0.4472 1.7402 sec/batch
Epoch 275/300 Iteration 3840/4200 Training loss: 0.4444 1.7392 sec/batch
Epoch 275/300 Iteration 3841/4200 Training loss: 0.4455 1.7416 sec/batch
Epoch 275/300 Iteration 3842/4200 Training loss: 0.4488 1.7414 sec/batch
Epoch 275/300 Iteration 3843/4200 Training loss: 0.4455 1.7432 sec/batch
Epoch 275/300 Iteration 3844/4200 Training loss: 0.4444 1.7393 sec/batch
Epoch 275/300 Iteration 3845/4200 Training loss: 0.4426 1.7392 sec/batch
Epoch 275/300 Iteration 3846/4200 Training loss: 0.4425 1.7396 sec/batch
Epoch 275/300 Iteration 3847/4200 Training loss: 0.4429 1.7405 sec/batch
Epoch 275/300 Iteration 3848/4200 Training loss: 0.4431 1.7429 sec/batch
Epoch 275/300 Iteration 3849/4200 Training loss: 0.4438 1.7403 sec/batch
Epoch 275/300 Iteration 3850/4200 Training loss: 0.4444 1.7394 sec/batch
Epoch 276/300 Iteration 3851/4200 Training loss: 0.4621 1.7391 sec/batch
Epoch 276/300 Iteration 3852/4200 Training loss: 0.4388 1.7393 sec/batch
Epoch 276/300 Iteration 3853/4200 Training loss: 0.4335 1.7431 sec/batch
Epoch 276/300 Iteration 3854/4200 Training loss: 0.4365 1.7393 sec/batch
Epoch 276/300 Iteration 3855/4200 Training loss: 0.4358 1.7421 sec/batch
Epoch 276/300 Iteration 3856/4200 Training loss: 0.4367 1.7429 sec/batch
Epoch 276/300 Iteration 3857/4200 Training loss: 0.4348 1.7452 sec/batch
Epoch 276/300 Iteration 3858/4200 Training loss: 0.4333 1.7484 sec/batch
Epoch 276/300 Iteration 3859/4200 Training loss: 0.4327 1.7559 sec/batch
Epoch 276/300 Iteration 3860/4200 Training loss: 0.4336 1.7392 sec/batch
Epoch 276/300 Iteration 3861/4200 Training loss: 0.4334 1.7386 sec/batch
Epoch 276/300 Iteration 3862/4200 Training loss: 0.4333 1.7400 sec/batch
Epoch 276/300 Iteration 3863/4200 Training loss: 0.4339 1.7386 sec/batch
Epoch 276/300 Iteration 3864/4200 Training loss: 0.4344 1.7430 sec/batch
Epoch 277/300 Iteration 3865/4200 Training loss: 0.4714 1.7400 sec/batch
Epoch 277/300 Iteration 3866/4200 Training loss: 0.4531 1.7807 sec/batch
Epoch 277/300 Iteration 3867/4200 Training loss: 0.4412 1.7938 sec/batch
Epoch 277/300 Iteration 3868/4200 Training loss: 0.4370 1.7972 sec/batch
Epoch 277/300 Iteration 3869/4200 Training loss: 0.4373 1.7949 sec/batch
Epoch 277/300 Iteration 3870/4200 Training loss: 0.4378 1.7954 sec/batch
Epoch 277/300 Iteration 3871/4200 Training loss: 0.4358 1.8021 sec/batch
Epoch 277/300 Iteration 3872/4200 Training loss: 0.4341 1.8019 sec/batch
Epoch 277/300 Iteration 3873/4200 Training loss: 0.4335 1.7951 sec/batch
Epoch 277/300 Iteration 3874/4200 Training loss: 0.4341 1.7986 sec/batch
Epoch 277/300 Iteration 3875/4200 Training loss: 0.4347 1.7971 sec/batch
Epoch 277/300 Iteration 3876/4200 Training loss: 0.4333 1.7970 sec/batch
Epoch 277/300 Iteration 3877/4200 Training loss: 0.4346 1.8008 sec/batch
Epoch 277/300 Iteration 3878/4200 Training loss: 0.4345 1.7994 sec/batch
Epoch 278/300 Iteration 3879/4200 Training loss: 0.4619 1.7936 sec/batch
Epoch 278/300 Iteration 3880/4200 Training loss: 0.4447 1.7958 sec/batch
Epoch 278/300 Iteration 3881/4200 Training loss: 0.4445 1.7436 sec/batch
Epoch 278/300 Iteration 3882/4200 Training loss: 0.4411 1.7419 sec/batch
Epoch 278/300 Iteration 3883/4200 Training loss: 0.4391 1.7405 sec/batch
Epoch 278/300 Iteration 3884/4200 Training loss: 0.4397 1.7449 sec/batch
Epoch 278/300 Iteration 3885/4200 Training loss: 0.4390 1.7398 sec/batch
Epoch 278/300 Iteration 3886/4200 Training loss: 0.4380 1.7395 sec/batch
Epoch 278/300 Iteration 3887/4200 Training loss: 0.4357 1.7401 sec/batch
Epoch 278/300 Iteration 3888/4200 Training loss: 0.4348 1.7429 sec/batch
Epoch 278/300 Iteration 3889/4200 Training loss: 0.4350 1.7376 sec/batch
Epoch 278/300 Iteration 3890/4200 Training loss: 0.4333 1.7371 sec/batch
Epoch 278/300 Iteration 3891/4200 Training loss: 0.4343 1.7409 sec/batch
Epoch 278/300 Iteration 3892/4200 Training loss: 0.4343 1.7398 sec/batch
Epoch 279/300 Iteration 3893/4200 Training loss: 0.4734 1.7412 sec/batch
Epoch 279/300 Iteration 3894/4200 Training loss: 0.4475 1.7412 sec/batch
Epoch 279/300 Iteration 3895/4200 Training loss: 0.4361 1.7416 sec/batch
Epoch 279/300 Iteration 3896/4200 Training loss: 0.4362 1.7399 sec/batch
Epoch 279/300 Iteration 3897/4200 Training loss: 0.4358 1.7401 sec/batch
Epoch 279/300 Iteration 3898/4200 Training loss: 0.4361 1.7386 sec/batch
Epoch 279/300 Iteration 3899/4200 Training loss: 0.4351 1.7428 sec/batch
Epoch 279/300 Iteration 3900/4200 Training loss: 0.4327 1.7389 sec/batch
Validation loss: 2.88249 Saving checkpoint!
Epoch 279/300 Iteration 3901/4200 Training loss: 0.4979 1.6487 sec/batch
Epoch 279/300 Iteration 3902/4200 Training loss: 0.4902 1.7380 sec/batch
Epoch 279/300 Iteration 3903/4200 Training loss: 0.4849 1.7371 sec/batch
Epoch 279/300 Iteration 3904/4200 Training loss: 0.4787 1.7456 sec/batch
Epoch 279/300 Iteration 3905/4200 Training loss: 0.4760 1.7414 sec/batch
Epoch 279/300 Iteration 3906/4200 Training loss: 0.4738 1.7407 sec/batch
Epoch 280/300 Iteration 3907/4200 Training loss: 0.4730 1.7440 sec/batch
Epoch 280/300 Iteration 3908/4200 Training loss: 0.4476 1.7404 sec/batch
Epoch 280/300 Iteration 3909/4200 Training loss: 0.4382 1.7391 sec/batch
Epoch 280/300 Iteration 3910/4200 Training loss: 0.4371 1.7382 sec/batch
Epoch 280/300 Iteration 3911/4200 Training loss: 0.4353 1.7393 sec/batch
Epoch 280/300 Iteration 3912/4200 Training loss: 0.4363 1.7384 sec/batch
Epoch 280/300 Iteration 3913/4200 Training loss: 0.4375 1.7396 sec/batch
Epoch 280/300 Iteration 3914/4200 Training loss: 0.4365 1.7441 sec/batch
Epoch 280/300 Iteration 3915/4200 Training loss: 0.4357 1.7386 sec/batch
Epoch 280/300 Iteration 3916/4200 Training loss: 0.4353 1.7394 sec/batch
Epoch 280/300 Iteration 3917/4200 Training loss: 0.4347 1.7401 sec/batch
Epoch 280/300 Iteration 3918/4200 Training loss: 0.4332 1.7400 sec/batch
Epoch 280/300 Iteration 3919/4200 Training loss: 0.4341 1.7423 sec/batch
Epoch 280/300 Iteration 3920/4200 Training loss: 0.4342 1.7398 sec/batch
Epoch 281/300 Iteration 3921/4200 Training loss: 0.4822 1.7392 sec/batch
Epoch 281/300 Iteration 3922/4200 Training loss: 0.4566 1.7409 sec/batch
Epoch 281/300 Iteration 3923/4200 Training loss: 0.4461 1.7387 sec/batch
Epoch 281/300 Iteration 3924/4200 Training loss: 0.4417 1.7383 sec/batch
Epoch 281/300 Iteration 3925/4200 Training loss: 0.4396 1.7534 sec/batch
Epoch 281/300 Iteration 3926/4200 Training loss: 0.4410 1.7576 sec/batch
Epoch 281/300 Iteration 3927/4200 Training loss: 0.4390 1.7395 sec/batch
Epoch 281/300 Iteration 3928/4200 Training loss: 0.4381 1.7483 sec/batch
Epoch 281/300 Iteration 3929/4200 Training loss: 0.4358 1.7411 sec/batch
Epoch 281/300 Iteration 3930/4200 Training loss: 0.4361 1.7404 sec/batch
Epoch 281/300 Iteration 3931/4200 Training loss: 0.4351 1.7385 sec/batch
Epoch 281/300 Iteration 3932/4200 Training loss: 0.4344 1.7421 sec/batch
Epoch 281/300 Iteration 3933/4200 Training loss: 0.4349 1.7412 sec/batch
Epoch 281/300 Iteration 3934/4200 Training loss: 0.4348 1.7428 sec/batch
Epoch 282/300 Iteration 3935/4200 Training loss: 0.4655 1.7408 sec/batch
Epoch 282/300 Iteration 3936/4200 Training loss: 0.4388 1.7402 sec/batch
Epoch 282/300 Iteration 3937/4200 Training loss: 0.4261 1.7402 sec/batch
Epoch 282/300 Iteration 3938/4200 Training loss: 0.4278 1.7404 sec/batch
Epoch 282/300 Iteration 3939/4200 Training loss: 0.4292 1.7431 sec/batch
Epoch 282/300 Iteration 3940/4200 Training loss: 0.4302 1.7384 sec/batch
Epoch 282/300 Iteration 3941/4200 Training loss: 0.4307 1.7411 sec/batch
Epoch 282/300 Iteration 3942/4200 Training loss: 0.4305 1.7381 sec/batch
Epoch 282/300 Iteration 3943/4200 Training loss: 0.4282 1.7422 sec/batch
Epoch 282/300 Iteration 3944/4200 Training loss: 0.4278 1.7397 sec/batch
Epoch 282/300 Iteration 3945/4200 Training loss: 0.4272 1.7439 sec/batch
Epoch 282/300 Iteration 3946/4200 Training loss: 0.4262 1.7396 sec/batch
Epoch 282/300 Iteration 3947/4200 Training loss: 0.4263 1.7420 sec/batch
Epoch 282/300 Iteration 3948/4200 Training loss: 0.4261 1.7394 sec/batch
Epoch 283/300 Iteration 3949/4200 Training loss: 0.4572 1.7379 sec/batch
Epoch 283/300 Iteration 3950/4200 Training loss: 0.4393 1.7405 sec/batch
Epoch 283/300 Iteration 3951/4200 Training loss: 0.4323 1.7394 sec/batch
Epoch 283/300 Iteration 3952/4200 Training loss: 0.4316 1.7410 sec/batch
Epoch 283/300 Iteration 3953/4200 Training loss: 0.4328 1.7397 sec/batch
Epoch 283/300 Iteration 3954/4200 Training loss: 0.4337 1.7382 sec/batch
Epoch 283/300 Iteration 3955/4200 Training loss: 0.4309 1.7374 sec/batch
Epoch 283/300 Iteration 3956/4200 Training loss: 0.4293 1.7413 sec/batch
Epoch 283/300 Iteration 3957/4200 Training loss: 0.4282 1.7393 sec/batch
Epoch 283/300 Iteration 3958/4200 Training loss: 0.4277 1.7420 sec/batch
Epoch 283/300 Iteration 3959/4200 Training loss: 0.4286 1.7482 sec/batch
Epoch 283/300 Iteration 3960/4200 Training loss: 0.4280 1.7393 sec/batch
Epoch 283/300 Iteration 3961/4200 Training loss: 0.4284 1.7399 sec/batch
Epoch 283/300 Iteration 3962/4200 Training loss: 0.4277 1.7446 sec/batch
Epoch 284/300 Iteration 3963/4200 Training loss: 0.4545 1.7390 sec/batch
Epoch 284/300 Iteration 3964/4200 Training loss: 0.4339 1.7408 sec/batch
Epoch 284/300 Iteration 3965/4200 Training loss: 0.4254 1.7447 sec/batch
Epoch 284/300 Iteration 3966/4200 Training loss: 0.4278 1.7404 sec/batch
Epoch 284/300 Iteration 3967/4200 Training loss: 0.4247 1.7426 sec/batch
Epoch 284/300 Iteration 3968/4200 Training loss: 0.4291 1.7395 sec/batch
Epoch 284/300 Iteration 3969/4200 Training loss: 0.4283 1.7398 sec/batch
Epoch 284/300 Iteration 3970/4200 Training loss: 0.4278 1.7406 sec/batch
Epoch 284/300 Iteration 3971/4200 Training loss: 0.4279 1.7809 sec/batch
Epoch 284/300 Iteration 3972/4200 Training loss: 0.4279 1.7935 sec/batch
Epoch 284/300 Iteration 3973/4200 Training loss: 0.4280 1.7459 sec/batch
Epoch 284/300 Iteration 3974/4200 Training loss: 0.4259 1.7401 sec/batch
Epoch 284/300 Iteration 3975/4200 Training loss: 0.4264 1.7379 sec/batch
Epoch 284/300 Iteration 3976/4200 Training loss: 0.4271 1.7446 sec/batch
Epoch 285/300 Iteration 3977/4200 Training loss: 0.4782 1.7406 sec/batch
Epoch 285/300 Iteration 3978/4200 Training loss: 0.4447 1.7413 sec/batch
Epoch 285/300 Iteration 3979/4200 Training loss: 0.4329 1.7449 sec/batch
Epoch 285/300 Iteration 3980/4200 Training loss: 0.4304 1.7393 sec/batch
Epoch 285/300 Iteration 3981/4200 Training loss: 0.4279 1.7387 sec/batch
Epoch 285/300 Iteration 3982/4200 Training loss: 0.4270 1.7393 sec/batch
Epoch 285/300 Iteration 3983/4200 Training loss: 0.4246 1.7410 sec/batch
Epoch 285/300 Iteration 3984/4200 Training loss: 0.4229 1.8073 sec/batch
Epoch 285/300 Iteration 3985/4200 Training loss: 0.4223 1.7949 sec/batch
Epoch 285/300 Iteration 3986/4200 Training loss: 0.4238 1.7388 sec/batch
Epoch 285/300 Iteration 3987/4200 Training loss: 0.4249 1.7543 sec/batch
Epoch 285/300 Iteration 3988/4200 Training loss: 0.4249 1.7813 sec/batch
Epoch 285/300 Iteration 3989/4200 Training loss: 0.4245 1.7962 sec/batch
Epoch 285/300 Iteration 3990/4200 Training loss: 0.4238 1.7953 sec/batch
Epoch 286/300 Iteration 3991/4200 Training loss: 0.4408 1.7438 sec/batch
Epoch 286/300 Iteration 3992/4200 Training loss: 0.4243 1.7409 sec/batch
Epoch 286/300 Iteration 3993/4200 Training loss: 0.4189 1.7704 sec/batch
Epoch 286/300 Iteration 3994/4200 Training loss: 0.4207 1.7485 sec/batch
Epoch 286/300 Iteration 3995/4200 Training loss: 0.4206 1.7540 sec/batch
Epoch 286/300 Iteration 3996/4200 Training loss: 0.4205 1.7741 sec/batch
Epoch 286/300 Iteration 3997/4200 Training loss: 0.4182 1.9099 sec/batch
Epoch 286/300 Iteration 3998/4200 Training loss: 0.4163 2.0208 sec/batch
Epoch 286/300 Iteration 3999/4200 Training loss: 0.4142 1.8757 sec/batch
Epoch 286/300 Iteration 4000/4200 Training loss: 0.4156 1.8929 sec/batch
Validation loss: 2.88912 Saving checkpoint!
Epoch 286/300 Iteration 4001/4200 Training loss: 0.4701 3.2101 sec/batch
Epoch 286/300 Iteration 4002/4200 Training loss: 0.4647 1.8695 sec/batch
Epoch 286/300 Iteration 4003/4200 Training loss: 0.4607 1.7069 sec/batch
Epoch 286/300 Iteration 4004/4200 Training loss: 0.4564 1.7682 sec/batch
Epoch 287/300 Iteration 4005/4200 Training loss: 0.4531 1.7416 sec/batch
Epoch 287/300 Iteration 4006/4200 Training loss: 0.4297 1.7381 sec/batch
Epoch 287/300 Iteration 4007/4200 Training loss: 0.4278 1.7405 sec/batch
Epoch 287/300 Iteration 4008/4200 Training loss: 0.4252 1.7916 sec/batch
Epoch 287/300 Iteration 4009/4200 Training loss: 0.4230 1.7401 sec/batch
Epoch 287/300 Iteration 4010/4200 Training loss: 0.4222 1.7378 sec/batch
Epoch 287/300 Iteration 4011/4200 Training loss: 0.4192 1.7811 sec/batch
Epoch 287/300 Iteration 4012/4200 Training loss: 0.4143 1.7489 sec/batch
Epoch 287/300 Iteration 4013/4200 Training loss: 0.4146 1.7969 sec/batch
Epoch 287/300 Iteration 4014/4200 Training loss: 0.4169 1.8120 sec/batch
Epoch 287/300 Iteration 4015/4200 Training loss: 0.4181 1.7375 sec/batch
Epoch 287/300 Iteration 4016/4200 Training loss: 0.4173 1.7398 sec/batch
Epoch 287/300 Iteration 4017/4200 Training loss: 0.4177 1.7769 sec/batch
Epoch 287/300 Iteration 4018/4200 Training loss: 0.4169 1.7936 sec/batch
Epoch 288/300 Iteration 4019/4200 Training loss: 0.4492 1.7429 sec/batch
Epoch 288/300 Iteration 4020/4200 Training loss: 0.4355 1.7707 sec/batch
Epoch 288/300 Iteration 4021/4200 Training loss: 0.4238 1.7914 sec/batch
Epoch 288/300 Iteration 4022/4200 Training loss: 0.4183 1.7985 sec/batch
Epoch 288/300 Iteration 4023/4200 Training loss: 0.4167 1.7990 sec/batch
Epoch 288/300 Iteration 4024/4200 Training loss: 0.4145 1.8126 sec/batch
Epoch 288/300 Iteration 4025/4200 Training loss: 0.4118 1.7862 sec/batch
Epoch 288/300 Iteration 4026/4200 Training loss: 0.4085 1.8551 sec/batch
Epoch 288/300 Iteration 4027/4200 Training loss: 0.4056 1.7766 sec/batch
Epoch 288/300 Iteration 4028/4200 Training loss: 0.4061 1.7859 sec/batch
Epoch 288/300 Iteration 4029/4200 Training loss: 0.4063 1.7461 sec/batch
Epoch 288/300 Iteration 4030/4200 Training loss: 0.4067 1.8257 sec/batch
Epoch 288/300 Iteration 4031/4200 Training loss: 0.4071 1.7382 sec/batch
Epoch 288/300 Iteration 4032/4200 Training loss: 0.4069 1.8066 sec/batch
Epoch 289/300 Iteration 4033/4200 Training loss: 0.4363 1.7474 sec/batch
Epoch 289/300 Iteration 4034/4200 Training loss: 0.4091 1.7973 sec/batch
Epoch 289/300 Iteration 4035/4200 Training loss: 0.4036 1.7759 sec/batch
Epoch 289/300 Iteration 4036/4200 Training loss: 0.4032 1.7656 sec/batch
Epoch 289/300 Iteration 4037/4200 Training loss: 0.4027 1.7401 sec/batch
Epoch 289/300 Iteration 4038/4200 Training loss: 0.4025 1.7402 sec/batch
Epoch 289/300 Iteration 4039/4200 Training loss: 0.4028 1.7392 sec/batch
Epoch 289/300 Iteration 4040/4200 Training loss: 0.4022 1.7418 sec/batch
Epoch 289/300 Iteration 4041/4200 Training loss: 0.4013 1.7368 sec/batch
Epoch 289/300 Iteration 4042/4200 Training loss: 0.4016 1.7390 sec/batch
Epoch 289/300 Iteration 4043/4200 Training loss: 0.4030 1.7415 sec/batch
Epoch 289/300 Iteration 4044/4200 Training loss: 0.4028 1.7456 sec/batch
Epoch 289/300 Iteration 4045/4200 Training loss: 0.4034 1.7402 sec/batch
Epoch 289/300 Iteration 4046/4200 Training loss: 0.4035 1.7379 sec/batch
Epoch 290/300 Iteration 4047/4200 Training loss: 0.4387 1.7422 sec/batch
Epoch 290/300 Iteration 4048/4200 Training loss: 0.4117 1.7425 sec/batch
Epoch 290/300 Iteration 4049/4200 Training loss: 0.4079 1.7397 sec/batch
Epoch 290/300 Iteration 4050/4200 Training loss: 0.4058 1.7417 sec/batch
Epoch 290/300 Iteration 4051/4200 Training loss: 0.4043 1.7425 sec/batch
Epoch 290/300 Iteration 4052/4200 Training loss: 0.4067 1.7398 sec/batch
Epoch 290/300 Iteration 4053/4200 Training loss: 0.4057 1.7402 sec/batch
Epoch 290/300 Iteration 4054/4200 Training loss: 0.4052 1.7380 sec/batch
Epoch 290/300 Iteration 4055/4200 Training loss: 0.4050 1.7380 sec/batch
Epoch 290/300 Iteration 4056/4200 Training loss: 0.4043 1.7428 sec/batch
Epoch 290/300 Iteration 4057/4200 Training loss: 0.4042 1.7373 sec/batch
Epoch 290/300 Iteration 4058/4200 Training loss: 0.4034 1.7580 sec/batch
Epoch 290/300 Iteration 4059/4200 Training loss: 0.4048 1.7506 sec/batch
Epoch 290/300 Iteration 4060/4200 Training loss: 0.4039 1.7379 sec/batch
Epoch 291/300 Iteration 4061/4200 Training loss: 0.4309 1.7392 sec/batch
Epoch 291/300 Iteration 4062/4200 Training loss: 0.4014 1.7424 sec/batch
Epoch 291/300 Iteration 4063/4200 Training loss: 0.4021 1.7367 sec/batch
Epoch 291/300 Iteration 4064/4200 Training loss: 0.4013 1.7420 sec/batch
Epoch 291/300 Iteration 4065/4200 Training loss: 0.4012 1.7424 sec/batch
Epoch 291/300 Iteration 4066/4200 Training loss: 0.4012 1.7430 sec/batch
Epoch 291/300 Iteration 4067/4200 Training loss: 0.4000 1.7978 sec/batch
Epoch 291/300 Iteration 4068/4200 Training loss: 0.3990 1.7428 sec/batch
Epoch 291/300 Iteration 4069/4200 Training loss: 0.3979 1.7375 sec/batch
Epoch 291/300 Iteration 4070/4200 Training loss: 0.3984 1.7384 sec/batch
Epoch 291/300 Iteration 4071/4200 Training loss: 0.3981 1.7390 sec/batch
Epoch 291/300 Iteration 4072/4200 Training loss: 0.3972 1.7418 sec/batch
Epoch 291/300 Iteration 4073/4200 Training loss: 0.3988 1.7396 sec/batch
Epoch 291/300 Iteration 4074/4200 Training loss: 0.3993 1.7411 sec/batch
Epoch 292/300 Iteration 4075/4200 Training loss: 0.4321 1.7366 sec/batch
Epoch 292/300 Iteration 4076/4200 Training loss: 0.4065 1.7385 sec/batch
Epoch 292/300 Iteration 4077/4200 Training loss: 0.4056 1.7389 sec/batch
Epoch 292/300 Iteration 4078/4200 Training loss: 0.4017 1.7399 sec/batch
Epoch 292/300 Iteration 4079/4200 Training loss: 0.4010 1.7383 sec/batch
Epoch 292/300 Iteration 4080/4200 Training loss: 0.4014 1.7387 sec/batch
Epoch 292/300 Iteration 4081/4200 Training loss: 0.3999 1.7414 sec/batch
Epoch 292/300 Iteration 4082/4200 Training loss: 0.3977 1.7388 sec/batch
Epoch 292/300 Iteration 4083/4200 Training loss: 0.3963 1.7421 sec/batch
Epoch 292/300 Iteration 4084/4200 Training loss: 0.3964 1.7379 sec/batch
Epoch 292/300 Iteration 4085/4200 Training loss: 0.3961 1.7372 sec/batch
Epoch 292/300 Iteration 4086/4200 Training loss: 0.3955 1.7401 sec/batch
Epoch 292/300 Iteration 4087/4200 Training loss: 0.3959 1.7399 sec/batch
Epoch 292/300 Iteration 4088/4200 Training loss: 0.3957 1.7468 sec/batch
Epoch 293/300 Iteration 4089/4200 Training loss: 0.4484 1.7379 sec/batch
Epoch 293/300 Iteration 4090/4200 Training loss: 0.4161 1.7413 sec/batch
Epoch 293/300 Iteration 4091/4200 Training loss: 0.4078 1.7429 sec/batch
Epoch 293/300 Iteration 4092/4200 Training loss: 0.4051 1.7205 sec/batch
Epoch 293/300 Iteration 4093/4200 Training loss: 0.4046 1.7381 sec/batch
Epoch 293/300 Iteration 4094/4200 Training loss: 0.4038 1.7403 sec/batch
Epoch 293/300 Iteration 4095/4200 Training loss: 0.4042 1.7409 sec/batch
Epoch 293/300 Iteration 4096/4200 Training loss: 0.4024 1.7415 sec/batch
Epoch 293/300 Iteration 4097/4200 Training loss: 0.4011 1.7390 sec/batch
Epoch 293/300 Iteration 4098/4200 Training loss: 0.4000 1.7424 sec/batch
Epoch 293/300 Iteration 4099/4200 Training loss: 0.3995 1.7392 sec/batch
Epoch 293/300 Iteration 4100/4200 Training loss: 0.3970 1.7391 sec/batch
Validation loss: 2.93268 Saving checkpoint!
Epoch 293/300 Iteration 4101/4200 Training loss: 0.4442 3.1724 sec/batch
Epoch 293/300 Iteration 4102/4200 Training loss: 0.4407 1.8580 sec/batch
Epoch 294/300 Iteration 4103/4200 Training loss: 0.4354 1.6706 sec/batch
Epoch 294/300 Iteration 4104/4200 Training loss: 0.4159 1.6938 sec/batch
Epoch 294/300 Iteration 4105/4200 Training loss: 0.4098 1.6914 sec/batch
Epoch 294/300 Iteration 4106/4200 Training loss: 0.4075 1.7430 sec/batch
Epoch 294/300 Iteration 4107/4200 Training loss: 0.4045 1.7429 sec/batch
Epoch 294/300 Iteration 4108/4200 Training loss: 0.4044 1.7398 sec/batch
Epoch 294/300 Iteration 4109/4200 Training loss: 0.4051 1.7416 sec/batch
Epoch 294/300 Iteration 4110/4200 Training loss: 0.4035 1.7448 sec/batch
Epoch 294/300 Iteration 4111/4200 Training loss: 0.4025 1.7421 sec/batch
Epoch 294/300 Iteration 4112/4200 Training loss: 0.4021 1.7396 sec/batch
Epoch 294/300 Iteration 4113/4200 Training loss: 0.4015 1.7390 sec/batch
Epoch 294/300 Iteration 4114/4200 Training loss: 0.4001 1.7422 sec/batch
Epoch 294/300 Iteration 4115/4200 Training loss: 0.4010 1.7373 sec/batch
Epoch 294/300 Iteration 4116/4200 Training loss: 0.4011 1.7416 sec/batch
Epoch 295/300 Iteration 4117/4200 Training loss: 0.4368 1.7418 sec/batch
Epoch 295/300 Iteration 4118/4200 Training loss: 0.4054 1.7382 sec/batch
Epoch 295/300 Iteration 4119/4200 Training loss: 0.4001 1.7414 sec/batch
Epoch 295/300 Iteration 4120/4200 Training loss: 0.3983 1.7400 sec/batch
Epoch 295/300 Iteration 4121/4200 Training loss: 0.3986 1.7375 sec/batch
Epoch 295/300 Iteration 4122/4200 Training loss: 0.3968 1.7395 sec/batch
Epoch 295/300 Iteration 4123/4200 Training loss: 0.3947 1.7410 sec/batch
Epoch 295/300 Iteration 4124/4200 Training loss: 0.3945 1.7604 sec/batch
Epoch 295/300 Iteration 4125/4200 Training loss: 0.3947 1.7512 sec/batch
Epoch 295/300 Iteration 4126/4200 Training loss: 0.3951 1.7415 sec/batch
Epoch 295/300 Iteration 4127/4200 Training loss: 0.3944 1.7421 sec/batch
Epoch 295/300 Iteration 4128/4200 Training loss: 0.3931 1.7417 sec/batch
Epoch 295/300 Iteration 4129/4200 Training loss: 0.3936 1.7418 sec/batch
Epoch 295/300 Iteration 4130/4200 Training loss: 0.3933 1.7776 sec/batch
Epoch 296/300 Iteration 4131/4200 Training loss: 0.4405 1.7949 sec/batch
Epoch 296/300 Iteration 4132/4200 Training loss: 0.4124 1.7397 sec/batch
Epoch 296/300 Iteration 4133/4200 Training loss: 0.4041 1.7574 sec/batch
Epoch 296/300 Iteration 4134/4200 Training loss: 0.4024 1.7978 sec/batch
Epoch 296/300 Iteration 4135/4200 Training loss: 0.4020 1.7961 sec/batch
Epoch 296/300 Iteration 4136/4200 Training loss: 0.4016 1.7950 sec/batch
Epoch 296/300 Iteration 4137/4200 Training loss: 0.4015 1.7961 sec/batch
Epoch 296/300 Iteration 4138/4200 Training loss: 0.3995 1.7980 sec/batch
Epoch 296/300 Iteration 4139/4200 Training loss: 0.3983 1.7939 sec/batch
Epoch 296/300 Iteration 4140/4200 Training loss: 0.3997 1.7965 sec/batch
Epoch 296/300 Iteration 4141/4200 Training loss: 0.3991 1.8020 sec/batch
Epoch 296/300 Iteration 4142/4200 Training loss: 0.3986 1.7930 sec/batch
Epoch 296/300 Iteration 4143/4200 Training loss: 0.3983 1.7935 sec/batch
Epoch 296/300 Iteration 4144/4200 Training loss: 0.3975 1.7983 sec/batch
Epoch 297/300 Iteration 4145/4200 Training loss: 0.4355 1.7947 sec/batch
Epoch 297/300 Iteration 4146/4200 Training loss: 0.4111 1.7927 sec/batch
Epoch 297/300 Iteration 4147/4200 Training loss: 0.4027 1.7946 sec/batch
Epoch 297/300 Iteration 4148/4200 Training loss: 0.4014 1.7973 sec/batch
Epoch 297/300 Iteration 4149/4200 Training loss: 0.3992 1.7947 sec/batch
Epoch 297/300 Iteration 4150/4200 Training loss: 0.3973 1.7388 sec/batch
Epoch 297/300 Iteration 4151/4200 Training loss: 0.3947 1.7453 sec/batch
Epoch 297/300 Iteration 4152/4200 Training loss: 0.3942 1.7379 sec/batch
Epoch 297/300 Iteration 4153/4200 Training loss: 0.3942 1.7396 sec/batch
Epoch 297/300 Iteration 4154/4200 Training loss: 0.3935 1.7445 sec/batch
Epoch 297/300 Iteration 4155/4200 Training loss: 0.3939 1.7374 sec/batch
Epoch 297/300 Iteration 4156/4200 Training loss: 0.3928 1.7387 sec/batch
Epoch 297/300 Iteration 4157/4200 Training loss: 0.3929 1.7386 sec/batch
Epoch 297/300 Iteration 4158/4200 Training loss: 0.3918 1.7369 sec/batch
Epoch 298/300 Iteration 4159/4200 Training loss: 0.4390 1.7395 sec/batch
Epoch 298/300 Iteration 4160/4200 Training loss: 0.4113 1.7394 sec/batch
Epoch 298/300 Iteration 4161/4200 Training loss: 0.3996 1.7432 sec/batch
Epoch 298/300 Iteration 4162/4200 Training loss: 0.3971 1.7398 sec/batch
Epoch 298/300 Iteration 4163/4200 Training loss: 0.3937 1.7380 sec/batch
Epoch 298/300 Iteration 4164/4200 Training loss: 0.3910 1.7377 sec/batch
Epoch 298/300 Iteration 4165/4200 Training loss: 0.3900 1.7449 sec/batch
Epoch 298/300 Iteration 4166/4200 Training loss: 0.3889 1.7432 sec/batch
Epoch 298/300 Iteration 4167/4200 Training loss: 0.3899 1.7390 sec/batch
Epoch 298/300 Iteration 4168/4200 Training loss: 0.3897 1.7442 sec/batch
Epoch 298/300 Iteration 4169/4200 Training loss: 0.3891 1.7428 sec/batch
Epoch 298/300 Iteration 4170/4200 Training loss: 0.3885 1.7412 sec/batch
Epoch 298/300 Iteration 4171/4200 Training loss: 0.3894 1.7403 sec/batch
Epoch 298/300 Iteration 4172/4200 Training loss: 0.3880 1.7459 sec/batch
Epoch 299/300 Iteration 4173/4200 Training loss: 0.4025 1.7420 sec/batch
Epoch 299/300 Iteration 4174/4200 Training loss: 0.3868 1.7386 sec/batch
Epoch 299/300 Iteration 4175/4200 Training loss: 0.3799 1.7467 sec/batch
Epoch 299/300 Iteration 4176/4200 Training loss: 0.3825 1.7403 sec/batch
Epoch 299/300 Iteration 4177/4200 Training loss: 0.3821 1.7404 sec/batch
Epoch 299/300 Iteration 4178/4200 Training loss: 0.3835 1.7394 sec/batch
Epoch 299/300 Iteration 4179/4200 Training loss: 0.3842 1.7403 sec/batch
Epoch 299/300 Iteration 4180/4200 Training loss: 0.3837 1.7406 sec/batch
Epoch 299/300 Iteration 4181/4200 Training loss: 0.3815 1.7373 sec/batch
Epoch 299/300 Iteration 4182/4200 Training loss: 0.3819 1.7388 sec/batch
Epoch 299/300 Iteration 4183/4200 Training loss: 0.3822 1.7406 sec/batch
Epoch 299/300 Iteration 4184/4200 Training loss: 0.3816 1.7383 sec/batch
Epoch 299/300 Iteration 4185/4200 Training loss: 0.3823 1.7424 sec/batch
Epoch 299/300 Iteration 4186/4200 Training loss: 0.3821 1.7393 sec/batch
Epoch 300/300 Iteration 4187/4200 Training loss: 0.4200 1.7420 sec/batch
Epoch 300/300 Iteration 4188/4200 Training loss: 0.3984 1.7393 sec/batch
Epoch 300/300 Iteration 4189/4200 Training loss: 0.3886 1.7435 sec/batch
Epoch 300/300 Iteration 4190/4200 Training loss: 0.3896 1.7412 sec/batch
Epoch 300/300 Iteration 4191/4200 Training loss: 0.3871 1.7378 sec/batch
Epoch 300/300 Iteration 4192/4200 Training loss: 0.3890 1.7429 sec/batch
Epoch 300/300 Iteration 4193/4200 Training loss: 0.3883 1.7491 sec/batch
Epoch 300/300 Iteration 4194/4200 Training loss: 0.3872 1.7514 sec/batch
Epoch 300/300 Iteration 4195/4200 Training loss: 0.3827 1.7382 sec/batch
Epoch 300/300 Iteration 4196/4200 Training loss: 0.3825 1.7435 sec/batch
Epoch 300/300 Iteration 4197/4200 Training loss: 0.3840 1.7384 sec/batch
Epoch 300/300 Iteration 4198/4200 Training loss: 0.3837 1.7411 sec/batch
Epoch 300/300 Iteration 4199/4200 Training loss: 0.3839 1.7398 sec/batch
Epoch 300/300 Iteration 4200/4200 Training loss: 0.3832 1.7365 sec/batch
Validation loss: 2.9823 Saving checkpoint!
Read up on saving and loading checkpoints here: https://www.tensorflow.org/programmers_guide/variables
In [13]:
tf.train.get_checkpoint_state('checkpoints')
Out[13]:
model_checkpoint_path: "checkpoints/i4200_l512_v2.982.ckpt"
all_model_checkpoint_paths: "checkpoints/i100_l512_v3.005.ckpt"
all_model_checkpoint_paths: "checkpoints/i200_l512_v2.270.ckpt"
all_model_checkpoint_paths: "checkpoints/i300_l512_v2.085.ckpt"
all_model_checkpoint_paths: "checkpoints/i400_l512_v1.956.ckpt"
all_model_checkpoint_paths: "checkpoints/i500_l512_v1.859.ckpt"
all_model_checkpoint_paths: "checkpoints/i600_l512_v1.772.ckpt"
all_model_checkpoint_paths: "checkpoints/i700_l512_v1.710.ckpt"
all_model_checkpoint_paths: "checkpoints/i800_l512_v1.655.ckpt"
all_model_checkpoint_paths: "checkpoints/i900_l512_v1.620.ckpt"
all_model_checkpoint_paths: "checkpoints/i1000_l512_v1.595.ckpt"
all_model_checkpoint_paths: "checkpoints/i1100_l512_v1.587.ckpt"
all_model_checkpoint_paths: "checkpoints/i1200_l512_v1.605.ckpt"
all_model_checkpoint_paths: "checkpoints/i1300_l512_v1.624.ckpt"
all_model_checkpoint_paths: "checkpoints/i1400_l512_v1.657.ckpt"
all_model_checkpoint_paths: "checkpoints/i1500_l512_v1.685.ckpt"
all_model_checkpoint_paths: "checkpoints/i1600_l512_v1.725.ckpt"
all_model_checkpoint_paths: "checkpoints/i1700_l512_v1.778.ckpt"
all_model_checkpoint_paths: "checkpoints/i1800_l512_v1.837.ckpt"
all_model_checkpoint_paths: "checkpoints/i1900_l512_v1.878.ckpt"
all_model_checkpoint_paths: "checkpoints/i2000_l512_v1.938.ckpt"
all_model_checkpoint_paths: "checkpoints/i2100_l512_v2.006.ckpt"
all_model_checkpoint_paths: "checkpoints/i2200_l512_v2.046.ckpt"
all_model_checkpoint_paths: "checkpoints/i2300_l512_v2.082.ckpt"
all_model_checkpoint_paths: "checkpoints/i2400_l512_v2.128.ckpt"
all_model_checkpoint_paths: "checkpoints/i2500_l512_v2.200.ckpt"
all_model_checkpoint_paths: "checkpoints/i2600_l512_v2.285.ckpt"
all_model_checkpoint_paths: "checkpoints/i2700_l512_v2.288.ckpt"
all_model_checkpoint_paths: "checkpoints/i2800_l512_v2.371.ckpt"
all_model_checkpoint_paths: "checkpoints/i2900_l512_v2.443.ckpt"
all_model_checkpoint_paths: "checkpoints/i3000_l512_v2.497.ckpt"
all_model_checkpoint_paths: "checkpoints/i3100_l512_v2.544.ckpt"
all_model_checkpoint_paths: "checkpoints/i3200_l512_v2.545.ckpt"
all_model_checkpoint_paths: "checkpoints/i3300_l512_v2.617.ckpt"
all_model_checkpoint_paths: "checkpoints/i3400_l512_v2.631.ckpt"
all_model_checkpoint_paths: "checkpoints/i3500_l512_v2.688.ckpt"
all_model_checkpoint_paths: "checkpoints/i3600_l512_v2.730.ckpt"
all_model_checkpoint_paths: "checkpoints/i3700_l512_v2.779.ckpt"
all_model_checkpoint_paths: "checkpoints/i3800_l512_v2.815.ckpt"
all_model_checkpoint_paths: "checkpoints/i3900_l512_v2.882.ckpt"
all_model_checkpoint_paths: "checkpoints/i4000_l512_v2.889.ckpt"
all_model_checkpoint_paths: "checkpoints/i4100_l512_v2.933.ckpt"
all_model_checkpoint_paths: "checkpoints/i4200_l512_v2.982.ckpt"
Now that the network is trained, we can use it to generate new text. The idea is that we pass in a character and the network predicts the next one. We then feed that prediction back in to predict the character after it, and keep repeating this to generate arbitrarily long text. I also included some functionality to prime the network by passing in a string and building up a hidden state from it.
The network gives us predictions for every character in the vocabulary. To reduce noise and make the output a little less random, I'll only choose the new character from the top N most likely characters.
In [14]:
def pick_top_n(preds, vocab_size, top_n=5):
    """Pick a character index from the top_n most likely predictions."""
    p = np.squeeze(preds)
    # Zero out all but the top_n largest probabilities
    p[np.argsort(p)[:-top_n]] = 0
    # Renormalize so the remaining probabilities sum to 1
    p = p / np.sum(p)
    # Sample a character index from the reduced distribution
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c
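Just to illustrate how pick_top_n behaves (a made-up example, not one of the original cells): with a fake prediction array over a five-character vocabulary and top_n=2, only the indices of the two largest probabilities can ever come back.
# Hypothetical example -- fake_preds is invented for illustration
fake_preds = np.array([[0.05, 0.40, 0.10, 0.30, 0.15]])
pick_top_n(fake_preds, 5, top_n=2)  # returns 1 or 3, never any of the other indices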
In [15]:
def sample(checkpoint, n_samples, lstm_size, vocab_size, prime="The "):
    samples = [c for c in prime]
    # Build the network in sampling mode: batch size and sequence length of 1
    model = build_rnn(vocab_size, lstm_size=lstm_size, sampling=True)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        # Restore the trained weights from the checkpoint
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)

        # Prime the network by running the prime string through it character by character
        for c in prime:
            x = np.zeros((1, 1))
            x[0, 0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.preds, model.final_state],
                                        feed_dict=feed)

        # First generated character after the prime
        c = pick_top_n(preds, len(vocab))
        samples.append(int_to_vocab[c])

        # Keep feeding the last prediction back in to generate n_samples more characters
        for i in range(n_samples):
            x[0, 0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.preds, model.final_state],
                                        feed_dict=feed)
            c = pick_top_n(preds, len(vocab))
            samples.append(int_to_vocab[c])

    return ''.join(samples)
Here, we pass in the path to a checkpoint and sample from the network.
In [21]:
checkpoint = "checkpoints/i3000_l512_v2.497.ckpt"
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="Cuando en")
print(samp)
Cuando entró el diajo del estadio
del bastio, me dijo que no tenía imotinación porque el cierto so lo también la misma diarte: dijo: «Es cierto» y aque había pies, en el findo con la tona de descalor. Subre de decuris, as ecpertar. No toves nado de que podía decir aún aquí los acisados a
cansarme. Como se había puesto en
calla pero no hablarme de una paqeeta que contentaran un pin la
padez cuando ma cerraba que conclunía por los
colvellos de pare contreruo. Todos los casos de su cuerto. El compré del cocho de la carte. Los dejasos descabaría todo su vinteso. Pero porque el ciero tanío hablado, con mino desengrecionas que estaba
muy lentomonto lo ballaba, las alena. El portero dejó con una habitación de y. Pero me dijo que no tenía amás que resplander al mirmo tol me
los cuanto me sirpueran el mismo ardea de la miradada, dal ser dicido, paro los trojes de su antes de mi curato. Era el ciiró un poció en la corciancia. Como el careo de abamirados en el cielo y del colmento de no diseguia muy sino.