In this notebook, I'll build a character-wise RNN trained on Jin Ping Mei (金瓶梅), a classic Chinese novel. It'll be able to generate new text based on the text from the book.
This network is based on Andrej Karpathy's post on RNNs and his implementation in Torch. Some information also comes from r2rt and from Sherjil Ozair's implementation on GitHub. Below is the general architecture of the character-wise RNN.
In [1]:
import time
from collections import namedtuple
import numpy as np
import tensorflow as tf
First we'll load the text file and convert it into integers for our network to use. Here I'm creating a couple dictionaries to convert the characters to and from integers. Encoding the characters as integers makes it easier to use as input in the network.
In [2]:
with open('jinpingmei.txt', 'r') as f:
    text = f.read()
vocab = sorted(set(text))
vocab_to_int = {c: i for i, c in enumerate(vocab)}
int_to_vocab = dict(enumerate(vocab))
encoded = np.array([vocab_to_int[c] for c in text], dtype=np.int32)
Let's check out the first 100 characters to make sure everything is peachy.
In [3]:
text[:100]
Out[3]:
'第一回\u3000西门庆热结十弟兄\u3000武二郎冷遇亲哥嫂 \n \n\u3000\u3000诗曰: \n\n\u3000\u3000\u3000\u3000豪华去后行人绝,箫筝不响歌喉咽。雄剑无威光彩沉,宝琴零落金星灭。 \n\u3000\u3000\u3000\u3000玉阶寂寞坠秋露,月照当时歌舞处。当时歌舞人不回,化'
And we can see the characters encoded as integers.
In [4]:
encoded[:100]
Out[4]:
array([2831, 41, 773, 34, 3539, 4068, 1204, 2328, 2964, 484, 1249,
308, 34, 2037, 109, 3951, 361, 3921, 130, 671, 979, 1,
0, 1, 0, 34, 34, 3606, 1818, 4459, 1, 0, 0,
34, 34, 34, 34, 3668, 490, 533, 583, 3468, 132, 2971,
4456, 2861, 2844, 49, 662, 2032, 716, 654, 36, 4150, 424,
1756, 946, 312, 1267, 2108, 4456, 1024, 2477, 4162, 3339, 3993,
1780, 2291, 36, 1, 0, 34, 34, 34, 34, 2444, 4108,
1044, 1054, 813, 2756, 4177, 4456, 1826, 2342, 1261, 1767, 2032,
3221, 863, 36, 1261, 1767, 2032, 3221, 132, 49, 773, 4456,
470])
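As a quick sanity check, mapping those integers back through int_to_vocab should give us back the original characters; something like this should evaluate to True:
''.join(int_to_vocab[i] for i in encoded[:100]) == text[:100]  # should be True if the dictionaries are consistent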
Since the network is working with individual characters, it's similar to a classification problem in which we are trying to predict the next character from the previous text. Here's how many 'classes' our network has to pick from.
In [5]:
len(vocab)
Out[5]:
4464
Here is where we'll make our mini-batches for training. Remember that we want our batches to be multiple sequences of some desired number of sequence steps; for example, with a batch size of 10 and 50 steps, each batch is a 10 x 50 chunk of characters.
We have our text encoded as integers as one long array in encoded. Let's create a function that will give us an iterator for our batches. I like using generator functions to do this. Then we can pass encoded into this function and get our batch generator.
The first thing we need to do is discard some of the text so we only have completely full batches. Each batch contains $N \times M$ characters, where $N$ is the batch size (the number of sequences) and $M$ is the number of steps. Then, to get the number of batches we can make from some array arr, you divide the length of arr by the number of characters per batch ($N \times M$). Once you know the number of batches and the batch size, you can get the total number of characters to keep.
After that, we need to split arr into $N$ sequences. You can do this using arr.reshape(size) where size is a tuple containing the dimension sizes of the reshaped array. We know we want $N$ sequences (n_seqs below), so let's make that the size of the first dimension. For the second dimension, you can use -1 as a placeholder in the size; it'll fill up the array with the appropriate data for you. After this, you should have an array that is $N \times (M * K)$ where $K$ is the number of batches.
Now that we have this array, we can iterate through it to get our batches. The idea is that each batch is an $N \times M$ window on the array. For each subsequent batch, the window moves over by n_steps. We also want to create both the input and target arrays. Remember that the targets are the inputs shifted over by one character. You'll usually see the first input character used as the last target character, so something like this:
y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]
where x is the input batch and y is the target batch.
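To make that concrete, here's a toy example with a single five-character row:
x = np.array([[0, 1, 2, 3, 4]])
y = np.zeros_like(x)
y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]
# y is now [[1, 2, 3, 4, 0]]: each target is the next input character,
# and the last target wraps around to the row's first input character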
The way I like to do this window is to use range to take steps of size n_steps from $0$ to arr.shape[1], the total number of steps in each sequence. That way, the integers you get from range always point to the start of a batch, and each window is n_steps wide.
In [6]:
def get_batches(arr, n_seqs, n_steps):
    '''Create a generator that returns batches of size
       n_seqs x n_steps from arr.

       Arguments
       ---------
       arr: Array you want to make batches from
       n_seqs: Batch size, the number of sequences per batch
       n_steps: Number of sequence steps per batch
    '''
    # Get the number of characters per batch and number of batches we can make
    characters_per_batch = n_seqs * n_steps
    n_batches = len(arr)//characters_per_batch

    # Keep only enough characters to make full batches
    arr = arr[:n_batches * characters_per_batch]

    # Reshape into n_seqs rows
    arr = arr.reshape((n_seqs, -1))

    for n in range(0, arr.shape[1], n_steps):
        # The features
        x = arr[:, n:n+n_steps]
        # The targets, shifted by one
        y = np.zeros_like(x)
        y[:, :-1], y[:, -1] = x[:, 1:], x[:, 0]
        yield x, y
Now I'll make my data sets and we can check out what's going on here. Here I'm going to use a batch size of 10 and 50 sequence steps.
In [7]:
batches = get_batches(encoded, 10, 50)
x, y = next(batches)
In [8]:
print('x\n', x[:10, :10])
print('\ny\n', y[:10, :10])
x
[[2831 41 773 34 3539 4068 1204 2328 2964 484]
[3539 4068 1204 805 48 4456 4072 3925 4459 30]
[ 583 3856 1826 954 1451 3989 1464 105 506 304]
[ 533 3040 36 31 1 0 0 34 34 1194]
[ 149 868 2511 976 913 2057 875 932 4456 793]
[ 132 4456 1094 4050 3474 84 3094 4456 149 1038]
[1719 36 3539 4068 1204 773 3205 518 47 4456]
[4459 30 1438 1006 2510 347 100 50 2206 1016]
[1782 1943 1001 833 69 538 130 2632 2624 3543]
[2632 1775 4456 49 3570 2586 206 36 31 3870]]
y
[[ 41 773 34 3539 4068 1204 2328 2964 484 1249]
[4068 1204 805 48 4456 4072 3925 4459 30 1323]
[3856 1826 954 1451 3989 1464 105 506 304 4456]
[3040 36 31 1 0 0 34 34 1194 1016]
[ 868 2511 976 913 2057 875 932 4456 793 4068]
[4456 1094 4050 3474 84 3094 4456 149 1038 2753]
[ 36 3539 4068 1204 773 3205 518 47 4456 1070]
[ 30 1438 1006 2510 347 100 50 2206 1016 397]
[1943 1001 833 69 538 130 2632 2624 3543 1854]
[1775 4456 49 3570 2586 206 36 31 3870 1720]]
If you implemented get_batches correctly, the above output should look something like
x
[[55 63 69 22 6 76 45 5 16 35]
[ 5 69 1 5 12 52 6 5 56 52]
[48 29 12 61 35 35 8 64 76 78]
[12 5 24 39 45 29 12 56 5 63]
[ 5 29 6 5 29 78 28 5 78 29]
[ 5 13 6 5 36 69 78 35 52 12]
[63 76 12 5 18 52 1 76 5 58]
[34 5 73 39 6 5 12 52 36 5]
[ 6 5 29 78 12 79 6 61 5 59]
[ 5 78 69 29 24 5 6 52 5 63]]
y
[[63 69 22 6 76 45 5 16 35 35]
[69 1 5 12 52 6 5 56 52 29]
[29 12 61 35 35 8 64 76 78 28]
[ 5 24 39 45 29 12 56 5 63 29]
[29 6 5 29 78 28 5 78 29 45]
[13 6 5 36 69 78 35 52 12 43]
[76 12 5 18 52 1 76 5 58 52]
[ 5 73 39 6 5 12 52 36 5 78]
[ 5 29 78 12 79 6 61 5 59 63]
[78 69 29 24 5 6 52 5 63 76]]
although the exact numbers will be different. Check to make sure the data is shifted over one step for y.
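You can also check this programmatically; a small sketch using the x and y pulled above:
print(np.array_equal(x[:, 1:], y[:, :-1]))  # True if y is x shifted over by one step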
Below is where you'll build the network. We'll break it up into parts so it's easier to reason about each bit. Then we can connect them up into the whole network.
First off we'll create our input placeholders. As usual we need placeholders for the training data and the targets. We'll also create a placeholder for dropout layers called keep_prob.
In [9]:
def build_inputs(batch_size, num_steps):
    ''' Define placeholders for inputs, targets, and dropout

        Arguments
        ---------
        batch_size: Batch size, number of sequences per batch
        num_steps: Number of sequence steps in a batch
    '''
    # Declare placeholders we'll feed into the graph
    inputs = tf.placeholder(tf.int32, [batch_size, num_steps], name='inputs')
    targets = tf.placeholder(tf.int32, [batch_size, num_steps], name='targets')

    # Keep probability placeholder for drop out layers
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')

    return inputs, targets, keep_prob
Here we will create the LSTM cell we'll use in the hidden layer. We'll use this cell as a building block for the RNN. So we aren't actually defining the RNN here, just the type of cell we'll use in the hidden layer.
We first create a basic LSTM cell with
lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
where num_units is the number of units in the hidden layers in the cell. Then we can add dropout by wrapping it with
tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
You pass in a cell and it will automatically add dropout to the inputs or outputs. Finally, we can stack up the LSTM cells into layers with tf.contrib.rnn.MultiRNNCell. With this, you pass in a list of cells and it will send the output of one cell into the next cell. Previously with TensorFlow 1.0, you could do this
tf.contrib.rnn.MultiRNNCell([cell]*num_layers)
This might look a little weird if you know Python well because this will create a list of the same cell object. However, TensorFlow 1.0 will create different weight matrices for all cell objects. But, starting with TensorFlow 1.1 you actually need to create new cell objects in the list. To get it to work in TensorFlow 1.1, it should look like
def build_cell(num_units, keep_prob):
    lstm = tf.contrib.rnn.BasicLSTMCell(num_units)
    drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
    return drop

tf.contrib.rnn.MultiRNNCell([build_cell(num_units, keep_prob) for _ in range(num_layers)])
Even though this is actually multiple LSTM cells stacked on each other, you can treat the multiple layers as one cell.
We also need to create an initial cell state of all zeros. This can be done like so
initial_state = cell.zero_state(batch_size, tf.float32)
Below, we implement the build_lstm function to create these LSTM cells and the initial state.
In [10]:
def build_lstm(lstm_size, num_layers, batch_size, keep_prob):
    ''' Build LSTM cell.

        Arguments
        ---------
        keep_prob: Scalar tensor (tf.placeholder) for the dropout keep probability
        lstm_size: Size of the hidden layers in the LSTM cells
        num_layers: Number of LSTM layers
        batch_size: Batch size
    '''
    ### Build the LSTM Cell
    def build_cell(lstm_size, keep_prob):
        # Use a basic LSTM cell
        lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)

        # Add dropout to the cell
        drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
        return drop

    # Stack up multiple LSTM layers, for deep learning
    cell = tf.contrib.rnn.MultiRNNCell([build_cell(lstm_size, keep_prob) for _ in range(num_layers)])
    initial_state = cell.zero_state(batch_size, tf.float32)

    return cell, initial_state
Here we'll create the output layer. We need to connect the output of the RNN cells to a fully connected layer with a softmax output. The softmax output gives us a probability distribution we can use to predict the next character.
If our input has batch size $N$, number of steps $M$, and the hidden layer has $L$ hidden units, then the output is a 3D tensor with size $N \times M \times L$. The output of each LSTM cell has size $L$; we have $M$ of them, one for each sequence step, and we have $N$ sequences. So the total size is $N \times M \times L$.
We are using the same fully connected layer, the same weights, for each of the outputs. Then, to make things easier, we should reshape the outputs into a 2D tensor with shape $(M * N) \times L$. That is, one row for each sequence and step, where the values of each row are the output from the LSTM cells.
Once we have the outputs reshaped, we can do the matrix multiplication with the weights. We need to wrap the weight and bias variables in a variable scope with tf.variable_scope(scope_name) because there are weights being created in the LSTM cells. TensorFlow will throw an error if the weights created here have the same names as the weights created in the LSTM cells, which they will by default. To avoid this, we wrap the variables in a variable scope so we can give them unique names.
In [11]:
def build_output(lstm_output, in_size, out_size):
    ''' Build a softmax layer, return the softmax output and logits.

        Arguments
        ---------
        lstm_output: Input tensor, the output from the LSTM cells
        in_size: Size of the input tensor, for example, size of the LSTM cells
        out_size: Size of this softmax layer
    '''
    # Reshape output so it's a bunch of rows, one row for each step for each sequence.
    # That is, the shape should be batch_size*num_steps rows by lstm_size columns
    seq_output = tf.concat(lstm_output, axis=1)
    x = tf.reshape(seq_output, [-1, in_size])

    # Connect the RNN outputs to a softmax layer
    with tf.variable_scope('softmax'):
        softmax_w = tf.Variable(tf.truncated_normal((in_size, out_size), stddev=0.1))
        softmax_b = tf.Variable(tf.zeros(out_size))

    # Since output is a bunch of rows of RNN cell outputs, logits will be a bunch
    # of rows of logit outputs, one for each step and sequence
    logits = tf.matmul(x, softmax_w) + softmax_b

    # Use softmax to get the probabilities for predicted characters
    out = tf.nn.softmax(logits, name='predictions')

    return out, logits
Next up is the training loss. We get the logits and targets and calculate the softmax cross-entropy loss. First we need to one-hot encode the targets; we're getting them as encoded characters. Then, reshape the one-hot targets so they form a 2D tensor with size $(M*N) \times C$ where $C$ is the number of classes/characters we have. Remember that we reshaped the LSTM outputs and ran them through a fully connected layer with $C$ units. So our logits will also have size $(M*N) \times C$.
Then we run the logits and targets through tf.nn.softmax_cross_entropy_with_logits and find the mean to get the loss.
In [12]:
def build_loss(logits, targets, lstm_size, num_classes):
    ''' Calculate the loss from the logits and the targets.

        Arguments
        ---------
        logits: Logits from final fully connected layer
        targets: Targets for supervised learning
        lstm_size: Number of LSTM hidden units
        num_classes: Number of classes in targets
    '''
    # One-hot encode targets and reshape to match logits, one row per batch_size per step
    y_one_hot = tf.one_hot(targets, num_classes)
    y_reshaped = tf.reshape(y_one_hot, logits.get_shape())

    # Softmax cross entropy loss
    loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped)
    loss = tf.reduce_mean(loss)

    return loss
Here we build the optimizer. Normal RNNs have issues with exploding and vanishing gradients. LSTMs fix the vanishing gradient problem, but the gradients can still grow without bound. To fix this, we can clip the gradients above some threshold. That is, if a gradient is larger than that threshold, we set it to the threshold. This ensures the gradients never grow overly large. Then we use an AdamOptimizer for the learning step.
In [13]:
def build_optimizer(loss, learning_rate, grad_clip):
    ''' Build optimizer for training, using gradient clipping.

        Arguments
        ---------
        loss: Network loss
        learning_rate: Learning rate for optimizer
        grad_clip: Threshold at which to clip the global gradient norm
    '''
    # Optimizer for training, using gradient clipping to control exploding gradients
    tvars = tf.trainable_variables()
    grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), grad_clip)
    train_op = tf.train.AdamOptimizer(learning_rate)
    optimizer = train_op.apply_gradients(zip(grads, tvars))

    return optimizer
Now we can put all the pieces together and build a class for the network. To actually run data through the LSTM cells, we will use tf.nn.dynamic_rnn. This function will pass the hidden and cell states across LSTM cells appropriately for us. It returns the outputs for each LSTM cell at each step for each sequence in the mini-batch. It also gives us the final LSTM state. We want to save this state as final_state so we can pass it to the first LSTM cell in the next mini-batch run. For tf.nn.dynamic_rnn, we pass in the cell and initial state we get from build_lstm, as well as our input sequences. Also, we need to one-hot encode the inputs before going into the RNN.
In [14]:
class CharRNN:

    def __init__(self, num_classes, batch_size=64, num_steps=50,
                 lstm_size=128, num_layers=2, learning_rate=0.001,
                 grad_clip=5, sampling=False):

        # When we're using this network for sampling later, we'll be passing in
        # one character at a time, so providing an option for that
        if sampling == True:
            batch_size, num_steps = 1, 1
        else:
            batch_size, num_steps = batch_size, num_steps

        tf.reset_default_graph()

        # Build the input placeholder tensors
        self.inputs, self.targets, self.keep_prob = build_inputs(batch_size, num_steps)

        # Build the LSTM cell
        cell, self.initial_state = build_lstm(lstm_size, num_layers, batch_size, self.keep_prob)

        ### Run the data through the RNN layers
        # First, one-hot encode the input tokens
        x_one_hot = tf.one_hot(self.inputs, num_classes)

        # Run each sequence step through the RNN and collect the outputs
        outputs, state = tf.nn.dynamic_rnn(cell, x_one_hot, initial_state=self.initial_state)
        self.final_state = state

        # Get softmax predictions and logits
        self.prediction, self.logits = build_output(outputs, lstm_size, num_classes)

        # Loss and optimizer (with gradient clipping)
        self.loss = build_loss(self.logits, self.targets, lstm_size, num_classes)
        self.optimizer = build_optimizer(self.loss, learning_rate, grad_clip)
Here I'm defining the hyperparameters for the network.
- batch_size - Number of sequences running through the network in one pass.
- num_steps - Number of characters in the sequence the network is trained on. Larger is typically better; the network will learn more long-range dependencies, but it takes longer to train. 100 is typically a good number here.
- lstm_size - The number of units in the hidden layers.
- num_layers - Number of hidden LSTM layers to use.
- learning_rate - Learning rate for training.
- keep_prob - The dropout keep probability when training. If your network is overfitting, try decreasing this.

Here's some good advice from Andrej Karpathy on training the network. I'm going to copy it in here for your benefit, but also link to where it originally came from.
Tips and Tricks
Monitoring Validation Loss vs. Training Loss
If you're somewhat new to Machine Learning or Neural Networks it can take a bit of expertise to get good models. The most important quantity to keep track of is the difference between your training loss (printed during training) and the validation loss (printed once in a while when the RNN is run on the validation data (by default every 1000 iterations)). In particular:
- If your training loss is much lower than validation loss then this means the network might be overfitting. Solutions to this are to decrease your network size, or to increase dropout. For example you could try dropout of 0.5 and so on.
- If your training/validation loss are about equal then your model is underfitting. Increase the size of your model (either number of layers or the raw number of neurons per layer)
Approximate number of parameters
The two most important parameters that control the model are
lstm_size and num_layers. I would advise that you always use num_layers of either 2/3. The lstm_size can be adjusted based on how much data you have. The two important quantities to keep track of here are:
- The number of parameters in your model. This is printed when you start training.
- The size of your dataset. 1MB file is approximately 1 million characters.
These two should be about the same order of magnitude. It's a little tricky to tell. Here are some examples:
- I have a 100MB dataset and I'm using the default parameter settings (which currently print 150K parameters). My data size is significantly larger (100 mil >> 0.15 mil), so I expect to heavily underfit. I am thinking I can comfortably afford to make lstm_size larger.
- I have a 10MB dataset and running a 10 million parameter model. I'm slightly nervous and I'm carefully monitoring my validation loss. If it's larger than my training loss then I may want to try to increase dropout a bit and see if that helps the validation loss.
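For a rough sense of where the network below sits, here's a back-of-the-envelope sketch of its parameter count for the settings used later (lstm_size of 512, two layers, one-hot inputs of size len(vocab)). lstm_layer_params is just a throwaway helper, and the numbers are estimates based on the kernel shapes BasicLSTMCell creates, not an exact figure:
# Each BasicLSTMCell builds a kernel of shape [input_size + lstm_size, 4 * lstm_size] plus a bias of 4 * lstm_size
def lstm_layer_params(input_size, lstm_size):
    return (input_size + lstm_size) * 4 * lstm_size + 4 * lstm_size

layer_1 = lstm_layer_params(len(vocab), 512)    # one-hot input, roughly 10.2 million
layer_2 = lstm_layer_params(512, 512)           # roughly 2.1 million
softmax = 512 * len(vocab) + len(vocab)         # weights plus biases, roughly 2.3 million
print(layer_1 + layer_2 + softmax)              # roughly 14.6 million parameters total
Compare that number against the size of your dataset, as described above.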
Best models strategy
The winning strategy to obtaining very good models (if you have the compute time) is to always err on the side of making the network larger (as large as you're willing to wait for it to compute) and then try different dropout values (between 0 and 1). Whatever model has the best validation performance (the loss, written in the checkpoint filename, low is good) is the one you should use in the end.
It is very common in deep learning to run many different models with many different hyperparameter settings, and in the end take whatever checkpoint gave the best validation performance.
By the way, the sizes of your training and validation splits are also parameters. Make sure you have a decent amount of data in your validation set, otherwise the validation performance will be noisy and not very informative.
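The training code below runs on the full text; if you want a validation loss like Karpathy describes, a minimal sketch of a split (assuming a simple 90/10 cut of the encoded array) could look like this:
split_idx = int(len(encoded) * 0.9)
train_data, val_data = encoded[:split_idx], encoded[split_idx:]
# train on get_batches(train_data, ...) and periodically compute the loss
# on get_batches(val_data, ...) without running the optimizer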
In [17]:
batch_size = 128 # Sequences per batch
num_steps = 100 # Number of sequence steps per batch
lstm_size = 512 # Size of hidden layers in LSTMs
num_layers = 2 # Number of LSTM layers
learning_rate = 0.0003 # Learning rate
keep_prob = 0.5 # Dropout keep probability
This is typical training code, passing inputs and targets into the network, then running the optimizer. Here we also get back the final LSTM state for the mini-batch. Then, we pass that state back into the network so the next batch can continue the state from the previous batch. And every so often (set by save_every_n) I save a checkpoint.
Here I'm saving checkpoints with the format
i{iteration number}_l{# hidden layer units}.ckpt
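If you want to resume from the most recent checkpoint later, one way (assuming an open session sess and the same saver as in the training code) is to look it up with tf.train.latest_checkpoint:
# Returns the path of the newest checkpoint file in the directory
checkpoint = tf.train.latest_checkpoint('checkpoints')
saver.restore(sess, checkpoint)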
In [18]:
epochs = 50
# Save every N iterations
save_every_n = 200
model = CharRNN(len(vocab), batch_size=batch_size, num_steps=num_steps,
                lstm_size=lstm_size, num_layers=num_layers,
                learning_rate=learning_rate)

saver = tf.train.Saver(max_to_keep=100)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Use the line below to load a checkpoint and resume training
    #saver.restore(sess, 'checkpoints/______.ckpt')
    counter = 0
    for e in range(epochs):
        # Train network
        new_state = sess.run(model.initial_state)
        loss = 0
        for x, y in get_batches(encoded, batch_size, num_steps):
            counter += 1
            start = time.time()
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: keep_prob,
                    model.initial_state: new_state}
            batch_loss, new_state, _ = sess.run([model.loss,
                                                 model.final_state,
                                                 model.optimizer],
                                                feed_dict=feed)

            end = time.time()
            print('Epoch: {}/{}... '.format(e+1, epochs),
                  'Training Step: {}... '.format(counter),
                  'Training loss: {:.4f}... '.format(batch_loss),
                  '{:.4f} sec/batch'.format((end-start)))

            if (counter % save_every_n == 0):
                saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))

    saver.save(sess, "checkpoints/i{}_l{}.ckpt".format(counter, lstm_size))
Epoch: 1/50... Training Step: 1... Training loss: 8.4041... 1.4952 sec/batch
Epoch: 1/50... Training Step: 2... Training loss: 8.3998... 1.4524 sec/batch
Epoch: 1/50... Training Step: 3... Training loss: 8.3945... 1.3991 sec/batch
Epoch: 1/50... Training Step: 4... Training loss: 8.3873... 1.4213 sec/batch
Epoch: 1/50... Training Step: 5... Training loss: 8.3733... 1.4446 sec/batch
Epoch: 1/50... Training Step: 6... Training loss: 8.3482... 1.3928 sec/batch
Epoch: 1/50... Training Step: 7... Training loss: 8.2876... 1.4095 sec/batch
Epoch: 1/50... Training Step: 8... Training loss: 8.1361... 1.4096 sec/batch
Epoch: 1/50... Training Step: 9... Training loss: 7.8635... 1.3634 sec/batch
Epoch: 1/50... Training Step: 10... Training loss: 7.8812... 1.4309 sec/batch
Epoch: 1/50... Training Step: 11... Training loss: 7.7826... 1.4422 sec/batch
Epoch: 1/50... Training Step: 12... Training loss: 7.6051... 1.4290 sec/batch
Epoch: 1/50... Training Step: 13... Training loss: 7.5716... 1.4138 sec/batch
Epoch: 1/50... Training Step: 14... Training loss: 7.5146... 1.4320 sec/batch
Epoch: 1/50... Training Step: 15... Training loss: 7.4315... 1.4189 sec/batch
Epoch: 1/50... Training Step: 16... Training loss: 7.3902... 1.4191 sec/batch
Epoch: 1/50... Training Step: 17... Training loss: 7.3200... 1.4182 sec/batch
Epoch: 1/50... Training Step: 18... Training loss: 7.2669... 1.3586 sec/batch
Epoch: 1/50... Training Step: 19... Training loss: 7.1855... 1.4206 sec/batch
Epoch: 1/50... Training Step: 20... Training loss: 7.1204... 1.4203 sec/batch
Epoch: 1/50... Training Step: 21... Training loss: 7.0440... 1.4192 sec/batch
Epoch: 1/50... Training Step: 22... Training loss: 6.9846... 1.4362 sec/batch
Epoch: 1/50... Training Step: 23... Training loss: 6.9807... 1.4440 sec/batch
Epoch: 1/50... Training Step: 24... Training loss: 6.8857... 1.4185 sec/batch
Epoch: 1/50... Training Step: 25... Training loss: 6.8277... 1.4294 sec/batch
Epoch: 1/50... Training Step: 26... Training loss: 6.8140... 1.4275 sec/batch
Epoch: 1/50... Training Step: 27... Training loss: 6.7916... 1.3947 sec/batch
Epoch: 1/50... Training Step: 28... Training loss: 6.7677... 1.4322 sec/batch
Epoch: 1/50... Training Step: 29... Training loss: 6.7035... 1.4405 sec/batch
Epoch: 1/50... Training Step: 30... Training loss: 6.7070... 1.3937 sec/batch
Epoch: 1/50... Training Step: 31... Training loss: 6.6189... 1.4403 sec/batch
Epoch: 1/50... Training Step: 32... Training loss: 6.6099... 1.4299 sec/batch
Epoch: 1/50... Training Step: 33... Training loss: 6.5814... 1.3934 sec/batch
Epoch: 1/50... Training Step: 34... Training loss: 6.6172... 1.3255 sec/batch
Epoch: 1/50... Training Step: 35... Training loss: 6.5959... 1.2594 sec/batch
Epoch: 1/50... Training Step: 36... Training loss: 6.5343... 1.4601 sec/batch
Epoch: 1/50... Training Step: 37... Training loss: 6.5823... 1.2933 sec/batch
Epoch: 1/50... Training Step: 38... Training loss: 6.5858... 1.3944 sec/batch
Epoch: 1/50... Training Step: 39... Training loss: 6.6164... 1.4282 sec/batch
Epoch: 1/50... Training Step: 40... Training loss: 6.5207... 1.4261 sec/batch
Epoch: 1/50... Training Step: 41... Training loss: 6.4745... 1.4157 sec/batch
Epoch: 1/50... Training Step: 42... Training loss: 6.4701... 1.4269 sec/batch
Epoch: 1/50... Training Step: 43... Training loss: 6.4750... 1.4117 sec/batch
Epoch: 1/50... Training Step: 44... Training loss: 6.4179... 1.4170 sec/batch
Epoch: 1/50... Training Step: 45... Training loss: 6.4395... 1.3711 sec/batch
Epoch: 1/50... Training Step: 46... Training loss: 6.4293... 1.4336 sec/batch
Epoch: 1/50... Training Step: 47... Training loss: 6.4261... 1.3729 sec/batch
Epoch: 1/50... Training Step: 48... Training loss: 6.4262... 1.4187 sec/batch
Epoch: 1/50... Training Step: 49... Training loss: 6.4385... 1.3363 sec/batch
Epoch: 1/50... Training Step: 50... Training loss: 6.3844... 1.4323 sec/batch
Epoch: 1/50... Training Step: 51... Training loss: 6.3757... 1.4207 sec/batch
Epoch: 1/50... Training Step: 52... Training loss: 6.3973... 1.3322 sec/batch
Epoch: 1/50... Training Step: 53... Training loss: 6.3551... 1.3747 sec/batch
Epoch: 1/50... Training Step: 54... Training loss: 6.3960... 1.4434 sec/batch
Epoch: 1/50... Training Step: 55... Training loss: 6.3463... 1.4204 sec/batch
Epoch: 1/50... Training Step: 56... Training loss: 6.3021... 1.4234 sec/batch
Epoch: 1/50... Training Step: 57... Training loss: 6.3714... 1.4427 sec/batch
Epoch: 1/50... Training Step: 58... Training loss: 6.3331... 1.4178 sec/batch
Epoch: 1/50... Training Step: 59... Training loss: 6.3458... 1.4051 sec/batch
Epoch: 1/50... Training Step: 60... Training loss: 6.3477... 1.4162 sec/batch
Epoch: 1/50... Training Step: 61... Training loss: 6.3354... 1.4550 sec/batch
Epoch: 2/50... Training Step: 62... Training loss: 6.5991... 1.4494 sec/batch
Epoch: 2/50... Training Step: 63... Training loss: 6.3203... 1.4167 sec/batch
Epoch: 2/50... Training Step: 64... Training loss: 6.2970... 1.4287 sec/batch
Epoch: 2/50... Training Step: 65... Training loss: 6.3479... 1.4261 sec/batch
Epoch: 2/50... Training Step: 66... Training loss: 6.3361... 1.4250 sec/batch
Epoch: 2/50... Training Step: 67... Training loss: 6.3714... 1.4217 sec/batch
Epoch: 2/50... Training Step: 68... Training loss: 6.3474... 1.4571 sec/batch
Epoch: 2/50... Training Step: 69... Training loss: 6.4134... 1.3687 sec/batch
Epoch: 2/50... Training Step: 70... Training loss: 6.3248... 1.4562 sec/batch
Epoch: 2/50... Training Step: 71... Training loss: 6.3431... 1.4272 sec/batch
Epoch: 2/50... Training Step: 72... Training loss: 6.3954... 1.4183 sec/batch
Epoch: 2/50... Training Step: 73... Training loss: 6.3161... 1.4253 sec/batch
Epoch: 2/50... Training Step: 74... Training loss: 6.3417... 1.4191 sec/batch
Epoch: 2/50... Training Step: 75... Training loss: 6.3496... 1.3595 sec/batch
Epoch: 2/50... Training Step: 76... Training loss: 6.2712... 1.4654 sec/batch
Epoch: 2/50... Training Step: 77... Training loss: 6.3360... 1.3916 sec/batch
Epoch: 2/50... Training Step: 78... Training loss: 6.3354... 1.4348 sec/batch
Epoch: 2/50... Training Step: 79... Training loss: 6.3267... 1.4399 sec/batch
Epoch: 2/50... Training Step: 80... Training loss: 6.3146... 1.4097 sec/batch
Epoch: 2/50... Training Step: 81... Training loss: 6.3257... 1.4183 sec/batch
Epoch: 2/50... Training Step: 82... Training loss: 6.3248... 1.4329 sec/batch
Epoch: 2/50... Training Step: 83... Training loss: 6.2568... 1.4385 sec/batch
Epoch: 2/50... Training Step: 84... Training loss: 6.3413... 1.3784 sec/batch
Epoch: 2/50... Training Step: 85... Training loss: 6.2521... 1.4372 sec/batch
Epoch: 2/50... Training Step: 86... Training loss: 6.2511... 1.3993 sec/batch
Epoch: 2/50... Training Step: 87... Training loss: 6.2588... 1.4051 sec/batch
Epoch: 2/50... Training Step: 88... Training loss: 6.2791... 1.3755 sec/batch
Epoch: 2/50... Training Step: 89... Training loss: 6.2915... 1.3827 sec/batch
Epoch: 2/50... Training Step: 90... Training loss: 6.2193... 1.3653 sec/batch
Epoch: 2/50... Training Step: 91... Training loss: 6.2644... 1.3557 sec/batch
Epoch: 2/50... Training Step: 92... Training loss: 6.2011... 1.3238 sec/batch
Epoch: 2/50... Training Step: 93... Training loss: 6.2163... 1.4196 sec/batch
Epoch: 2/50... Training Step: 94... Training loss: 6.2372... 1.3627 sec/batch
Epoch: 2/50... Training Step: 95... Training loss: 6.2481... 1.4082 sec/batch
Epoch: 2/50... Training Step: 96... Training loss: 6.2912... 1.4589 sec/batch
Epoch: 2/50... Training Step: 97... Training loss: 6.2545... 1.3824 sec/batch
Epoch: 2/50... Training Step: 98... Training loss: 6.2977... 1.4196 sec/batch
Epoch: 2/50... Training Step: 99... Training loss: 6.2972... 1.3919 sec/batch
Epoch: 2/50... Training Step: 100... Training loss: 6.3490... 1.4375 sec/batch
Epoch: 2/50... Training Step: 101... Training loss: 6.2528... 1.4494 sec/batch
Epoch: 2/50... Training Step: 102... Training loss: 6.2209... 1.4313 sec/batch
Epoch: 2/50... Training Step: 103... Training loss: 6.2378... 1.4221 sec/batch
Epoch: 2/50... Training Step: 104... Training loss: 6.2645... 1.4190 sec/batch
Epoch: 2/50... Training Step: 105... Training loss: 6.1841... 1.2915 sec/batch
Epoch: 2/50... Training Step: 106... Training loss: 6.1960... 1.4220 sec/batch
Epoch: 2/50... Training Step: 107... Training loss: 6.2146... 1.4232 sec/batch
Epoch: 2/50... Training Step: 108... Training loss: 6.2174... 1.4370 sec/batch
Epoch: 2/50... Training Step: 109... Training loss: 6.2191... 1.4557 sec/batch
Epoch: 2/50... Training Step: 110... Training loss: 6.2459... 1.3730 sec/batch
Epoch: 2/50... Training Step: 111... Training loss: 6.1884... 1.4246 sec/batch
Epoch: 2/50... Training Step: 112... Training loss: 6.1914... 1.4233 sec/batch
Epoch: 2/50... Training Step: 113... Training loss: 6.2229... 1.4164 sec/batch
Epoch: 2/50... Training Step: 114... Training loss: 6.1822... 1.4285 sec/batch
Epoch: 2/50... Training Step: 115... Training loss: 6.2109... 1.4196 sec/batch
Epoch: 2/50... Training Step: 116... Training loss: 6.2331... 1.4112 sec/batch
Epoch: 2/50... Training Step: 117... Training loss: 6.2807... 1.4300 sec/batch
Epoch: 2/50... Training Step: 118... Training loss: 6.3104... 1.4095 sec/batch
Epoch: 2/50... Training Step: 119... Training loss: 6.2109... 1.4373 sec/batch
Epoch: 2/50... Training Step: 120... Training loss: 6.1758... 1.4290 sec/batch
Epoch: 2/50... Training Step: 121... Training loss: 6.1963... 1.4340 sec/batch
Epoch: 2/50... Training Step: 122... Training loss: 6.1852... 1.4213 sec/batch
Epoch: 3/50... Training Step: 123... Training loss: 6.4350... 1.4188 sec/batch
Epoch: 3/50... Training Step: 124... Training loss: 6.1730... 1.4271 sec/batch
Epoch: 3/50... Training Step: 125... Training loss: 6.1623... 1.4282 sec/batch
Epoch: 3/50... Training Step: 126... Training loss: 6.2086... 1.4068 sec/batch
Epoch: 3/50... Training Step: 127... Training loss: 6.1886... 1.3683 sec/batch
Epoch: 3/50... Training Step: 128... Training loss: 6.2302... 1.4346 sec/batch
Epoch: 3/50... Training Step: 129... Training loss: 6.2167... 1.4257 sec/batch
Epoch: 3/50... Training Step: 130... Training loss: 6.2759... 1.4212 sec/batch
Epoch: 3/50... Training Step: 131... Training loss: 6.1817... 1.4530 sec/batch
Epoch: 3/50... Training Step: 132... Training loss: 6.2123... 1.4327 sec/batch
Epoch: 3/50... Training Step: 133... Training loss: 6.2597... 1.4342 sec/batch
Epoch: 3/50... Training Step: 134... Training loss: 6.1830... 1.4364 sec/batch
Epoch: 3/50... Training Step: 135... Training loss: 6.1823... 1.4057 sec/batch
Epoch: 3/50... Training Step: 136... Training loss: 6.2123... 1.4453 sec/batch
Epoch: 3/50... Training Step: 137... Training loss: 6.1325... 1.4226 sec/batch
Epoch: 3/50... Training Step: 138... Training loss: 6.1901... 1.4356 sec/batch
Epoch: 3/50... Training Step: 139... Training loss: 6.2086... 1.4249 sec/batch
Epoch: 3/50... Training Step: 140... Training loss: 6.2066... 1.4237 sec/batch
Epoch: 3/50... Training Step: 141... Training loss: 6.1877... 1.4208 sec/batch
Epoch: 3/50... Training Step: 142... Training loss: 6.2027... 1.4492 sec/batch
Epoch: 3/50... Training Step: 143... Training loss: 6.2199... 1.4215 sec/batch
Epoch: 3/50... Training Step: 144... Training loss: 6.1362... 1.4434 sec/batch
Epoch: 3/50... Training Step: 145... Training loss: 6.2280... 1.4133 sec/batch
Epoch: 3/50... Training Step: 146... Training loss: 6.1415... 1.4267 sec/batch
Epoch: 3/50... Training Step: 147... Training loss: 6.1343... 1.3970 sec/batch
Epoch: 3/50... Training Step: 148... Training loss: 6.1383... 1.4222 sec/batch
Epoch: 3/50... Training Step: 149... Training loss: 6.1653... 1.4333 sec/batch
Epoch: 3/50... Training Step: 150... Training loss: 6.1680... 1.3970 sec/batch
Epoch: 3/50... Training Step: 151... Training loss: 6.1073... 1.4030 sec/batch
Epoch: 3/50... Training Step: 152... Training loss: 6.1401... 1.4356 sec/batch
Epoch: 3/50... Training Step: 153... Training loss: 6.0854... 1.4306 sec/batch
Epoch: 3/50... Training Step: 154... Training loss: 6.1001... 1.4009 sec/batch
Epoch: 3/50... Training Step: 155... Training loss: 6.1095... 1.3764 sec/batch
Epoch: 3/50... Training Step: 156... Training loss: 6.1353... 1.4234 sec/batch
Epoch: 3/50... Training Step: 157... Training loss: 6.1482... 1.4581 sec/batch
Epoch: 3/50... Training Step: 158... Training loss: 6.1026... 1.4183 sec/batch
Epoch: 3/50... Training Step: 159... Training loss: 6.1643... 1.4248 sec/batch
Epoch: 3/50... Training Step: 160... Training loss: 6.1760... 1.3960 sec/batch
Epoch: 3/50... Training Step: 161... Training loss: 6.2353... 1.4631 sec/batch
Epoch: 3/50... Training Step: 162... Training loss: 6.1444... 1.4334 sec/batch
Epoch: 3/50... Training Step: 163... Training loss: 6.1033... 1.5530 sec/batch
Epoch: 3/50... Training Step: 164... Training loss: 6.1173... 1.5240 sec/batch
Epoch: 3/50... Training Step: 165... Training loss: 6.1481... 1.4463 sec/batch
Epoch: 3/50... Training Step: 166... Training loss: 6.0556... 1.4064 sec/batch
Epoch: 3/50... Training Step: 167... Training loss: 6.0816... 1.3700 sec/batch
Epoch: 3/50... Training Step: 168... Training loss: 6.0844... 1.4450 sec/batch
Epoch: 3/50... Training Step: 169... Training loss: 6.0990... 1.4485 sec/batch
Epoch: 3/50... Training Step: 170... Training loss: 6.1134... 1.4616 sec/batch
Epoch: 3/50... Training Step: 171... Training loss: 6.1196... 1.3994 sec/batch
Epoch: 3/50... Training Step: 172... Training loss: 6.0766... 1.4285 sec/batch
Epoch: 3/50... Training Step: 173... Training loss: 6.0764... 1.4312 sec/batch
Epoch: 3/50... Training Step: 174... Training loss: 6.1199... 1.4223 sec/batch
Epoch: 3/50... Training Step: 175... Training loss: 6.0734... 1.3975 sec/batch
Epoch: 3/50... Training Step: 176... Training loss: 6.0984... 1.3581 sec/batch
Epoch: 3/50... Training Step: 177... Training loss: 6.0395... 1.4278 sec/batch
Epoch: 3/50... Training Step: 178... Training loss: 6.0318... 1.4343 sec/batch
Epoch: 3/50... Training Step: 179... Training loss: 6.0787... 1.4168 sec/batch
Epoch: 3/50... Training Step: 180... Training loss: 6.0314... 1.4109 sec/batch
Epoch: 3/50... Training Step: 181... Training loss: 6.0567... 1.4393 sec/batch
Epoch: 3/50... Training Step: 182... Training loss: 6.0721... 1.4199 sec/batch
Epoch: 3/50... Training Step: 183... Training loss: 6.0589... 1.4051 sec/batch
Epoch: 4/50... Training Step: 184... Training loss: 6.2891... 1.4200 sec/batch
Epoch: 4/50... Training Step: 185... Training loss: 6.0522... 1.4357 sec/batch
Epoch: 4/50... Training Step: 186... Training loss: 6.0471... 1.4232 sec/batch
Epoch: 4/50... Training Step: 187... Training loss: 6.0870... 1.4348 sec/batch
Epoch: 4/50... Training Step: 188... Training loss: 6.0610... 1.4678 sec/batch
Epoch: 4/50... Training Step: 189... Training loss: 6.1056... 1.3693 sec/batch
Epoch: 4/50... Training Step: 190... Training loss: 6.1005... 1.4218 sec/batch
Epoch: 4/50... Training Step: 191... Training loss: 6.1536... 1.4356 sec/batch
Epoch: 4/50... Training Step: 192... Training loss: 6.0460... 1.4040 sec/batch
Epoch: 4/50... Training Step: 193... Training loss: 6.0841... 1.4188 sec/batch
Epoch: 4/50... Training Step: 194... Training loss: 6.1238... 1.4466 sec/batch
Epoch: 4/50... Training Step: 195... Training loss: 6.0397... 1.4062 sec/batch
Epoch: 4/50... Training Step: 196... Training loss: 6.0426... 1.4222 sec/batch
Epoch: 4/50... Training Step: 197... Training loss: 6.0798... 1.4188 sec/batch
Epoch: 4/50... Training Step: 198... Training loss: 5.9902... 1.4280 sec/batch
Epoch: 4/50... Training Step: 199... Training loss: 6.0403... 1.4102 sec/batch
Epoch: 4/50... Training Step: 200... Training loss: 6.0708... 1.4037 sec/batch
Epoch: 4/50... Training Step: 201... Training loss: 6.0700... 1.4342 sec/batch
Epoch: 4/50... Training Step: 202... Training loss: 6.0551... 1.4516 sec/batch
Epoch: 4/50... Training Step: 203... Training loss: 6.0496... 1.4117 sec/batch
Epoch: 4/50... Training Step: 204... Training loss: 6.0770... 1.4487 sec/batch
Epoch: 4/50... Training Step: 205... Training loss: 5.9928... 1.3899 sec/batch
Epoch: 4/50... Training Step: 206... Training loss: 6.0753... 1.4089 sec/batch
Epoch: 4/50... Training Step: 207... Training loss: 5.9810... 1.5272 sec/batch
Epoch: 4/50... Training Step: 208... Training loss: 5.9849... 1.4355 sec/batch
Epoch: 4/50... Training Step: 209... Training loss: 5.9803... 1.4170 sec/batch
Epoch: 4/50... Training Step: 210... Training loss: 6.0324... 1.3683 sec/batch
Epoch: 4/50... Training Step: 211... Training loss: 6.0253... 1.4401 sec/batch
Epoch: 4/50... Training Step: 212... Training loss: 5.9572... 1.4363 sec/batch
Epoch: 4/50... Training Step: 213... Training loss: 5.9867... 1.4194 sec/batch
Epoch: 4/50... Training Step: 214... Training loss: 5.9449... 1.4452 sec/batch
Epoch: 4/50... Training Step: 215... Training loss: 5.9339... 1.4190 sec/batch
Epoch: 4/50... Training Step: 216... Training loss: 5.9702... 1.3975 sec/batch
Epoch: 4/50... Training Step: 217... Training loss: 5.9762... 1.4494 sec/batch
Epoch: 4/50... Training Step: 218... Training loss: 5.9874... 1.4022 sec/batch
Epoch: 4/50... Training Step: 219... Training loss: 5.9320... 1.3984 sec/batch
Epoch: 4/50... Training Step: 220... Training loss: 6.0113... 1.3959 sec/batch
Epoch: 4/50... Training Step: 221... Training loss: 5.9994... 1.3824 sec/batch
Epoch: 4/50... Training Step: 222... Training loss: 6.0655... 1.2947 sec/batch
Epoch: 4/50... Training Step: 223... Training loss: 5.9684... 1.3952 sec/batch
Epoch: 4/50... Training Step: 224... Training loss: 5.9437... 1.4048 sec/batch
Epoch: 4/50... Training Step: 225... Training loss: 5.9313... 1.4085 sec/batch
Epoch: 4/50... Training Step: 226... Training loss: 5.9933... 1.3817 sec/batch
Epoch: 4/50... Training Step: 227... Training loss: 5.8853... 1.3778 sec/batch
Epoch: 4/50... Training Step: 228... Training loss: 5.9071... 1.4091 sec/batch
Epoch: 4/50... Training Step: 229... Training loss: 5.9141... 1.4036 sec/batch
Epoch: 4/50... Training Step: 230... Training loss: 5.9261... 1.2933 sec/batch
Epoch: 4/50... Training Step: 231... Training loss: 5.9431... 1.3859 sec/batch
Epoch: 4/50... Training Step: 232... Training loss: 5.9539... 1.4150 sec/batch
Epoch: 4/50... Training Step: 233... Training loss: 5.8860... 1.3934 sec/batch
Epoch: 4/50... Training Step: 234... Training loss: 5.8954... 1.3938 sec/batch
Epoch: 4/50... Training Step: 235... Training loss: 5.9560... 1.3808 sec/batch
Epoch: 4/50... Training Step: 236... Training loss: 5.8904... 1.3685 sec/batch
Epoch: 4/50... Training Step: 237... Training loss: 5.9256... 1.3846 sec/batch
Epoch: 4/50... Training Step: 238... Training loss: 5.8489... 1.3786 sec/batch
Epoch: 4/50... Training Step: 239... Training loss: 5.8426... 1.4013 sec/batch
Epoch: 4/50... Training Step: 240... Training loss: 5.8872... 1.3394 sec/batch
Epoch: 4/50... Training Step: 241... Training loss: 5.8364... 1.3532 sec/batch
Epoch: 4/50... Training Step: 242... Training loss: 5.8773... 1.3930 sec/batch
Epoch: 4/50... Training Step: 243... Training loss: 5.8811... 1.4077 sec/batch
Epoch: 4/50... Training Step: 244... Training loss: 5.8751... 1.3809 sec/batch
Epoch: 5/50... Training Step: 245... Training loss: 6.0996... 1.3569 sec/batch
Epoch: 5/50... Training Step: 246... Training loss: 5.8614... 1.4008 sec/batch
Epoch: 5/50... Training Step: 247... Training loss: 5.8780... 1.2659 sec/batch
Epoch: 5/50... Training Step: 248... Training loss: 5.9136... 1.3866 sec/batch
Epoch: 5/50... Training Step: 249... Training loss: 5.9095... 1.3832 sec/batch
Epoch: 5/50... Training Step: 250... Training loss: 5.9249... 1.4293 sec/batch
Epoch: 5/50... Training Step: 251... Training loss: 5.9381... 1.3277 sec/batch
Epoch: 5/50... Training Step: 252... Training loss: 5.9692... 1.3764 sec/batch
Epoch: 5/50... Training Step: 253... Training loss: 5.8716... 1.3824 sec/batch
Epoch: 5/50... Training Step: 254... Training loss: 5.8937... 1.3991 sec/batch
Epoch: 5/50... Training Step: 255... Training loss: 5.9498... 1.3544 sec/batch
Epoch: 5/50... Training Step: 256... Training loss: 5.8689... 1.3530 sec/batch
Epoch: 5/50... Training Step: 257... Training loss: 5.8675... 1.3939 sec/batch
Epoch: 5/50... Training Step: 258... Training loss: 5.8881... 1.5749 sec/batch
Epoch: 5/50... Training Step: 259... Training loss: 5.7998... 1.3666 sec/batch
Epoch: 5/50... Training Step: 260... Training loss: 5.8493... 1.3929 sec/batch
Epoch: 5/50... Training Step: 261... Training loss: 5.8876... 1.4107 sec/batch
Epoch: 5/50... Training Step: 262... Training loss: 5.8717... 1.3935 sec/batch
Epoch: 5/50... Training Step: 263... Training loss: 5.8577... 1.3933 sec/batch
Epoch: 5/50... Training Step: 264... Training loss: 5.8689... 1.4066 sec/batch
Epoch: 5/50... Training Step: 265... Training loss: 5.9020... 1.3035 sec/batch
Epoch: 5/50... Training Step: 266... Training loss: 5.7957... 1.3753 sec/batch
Epoch: 5/50... Training Step: 267... Training loss: 5.8787... 1.3777 sec/batch
Epoch: 5/50... Training Step: 268... Training loss: 5.7957... 1.3657 sec/batch
Epoch: 5/50... Training Step: 269... Training loss: 5.8012... 1.3161 sec/batch
Epoch: 5/50... Training Step: 270... Training loss: 5.7878... 1.3452 sec/batch
Epoch: 5/50... Training Step: 271... Training loss: 5.8498... 1.3688 sec/batch
Epoch: 5/50... Training Step: 272... Training loss: 5.8528... 1.3612 sec/batch
Epoch: 5/50... Training Step: 273... Training loss: 5.7511... 1.3363 sec/batch
Epoch: 5/50... Training Step: 274... Training loss: 5.8112... 1.3787 sec/batch
Epoch: 5/50... Training Step: 275... Training loss: 5.7589... 1.3448 sec/batch
Epoch: 5/50... Training Step: 276... Training loss: 5.7463... 1.3799 sec/batch
Epoch: 5/50... Training Step: 277... Training loss: 5.7803... 1.4064 sec/batch
Epoch: 5/50... Training Step: 278... Training loss: 5.7916... 1.3090 sec/batch
Epoch: 5/50... Training Step: 279... Training loss: 5.8181... 1.3874 sec/batch
Epoch: 5/50... Training Step: 280... Training loss: 5.7378... 1.3417 sec/batch
Epoch: 5/50... Training Step: 281... Training loss: 5.8403... 1.3756 sec/batch
Epoch: 5/50... Training Step: 282... Training loss: 5.8011... 1.3998 sec/batch
Epoch: 5/50... Training Step: 283... Training loss: 5.8825... 1.3729 sec/batch
Epoch: 5/50... Training Step: 284... Training loss: 5.7677... 1.3916 sec/batch
Epoch: 5/50... Training Step: 285... Training loss: 5.7243... 1.3739 sec/batch
Epoch: 5/50... Training Step: 286... Training loss: 5.7587... 1.3902 sec/batch
Epoch: 5/50... Training Step: 287... Training loss: 5.8074... 1.3779 sec/batch
Epoch: 5/50... Training Step: 288... Training loss: 5.6992... 1.3183 sec/batch
Epoch: 5/50... Training Step: 289... Training loss: 5.7197... 1.4255 sec/batch
Epoch: 5/50... Training Step: 290... Training loss: 5.7371... 1.3419 sec/batch
Epoch: 5/50... Training Step: 291... Training loss: 5.7388... 1.3231 sec/batch
Epoch: 5/50... Training Step: 292... Training loss: 5.7795... 1.3176 sec/batch
Epoch: 5/50... Training Step: 293... Training loss: 5.7890... 1.3363 sec/batch
Epoch: 5/50... Training Step: 294... Training loss: 5.7132... 1.3937 sec/batch
Epoch: 5/50... Training Step: 295... Training loss: 5.7084... 1.3996 sec/batch
Epoch: 5/50... Training Step: 296... Training loss: 5.7836... 1.3680 sec/batch
Epoch: 5/50... Training Step: 297... Training loss: 5.6889... 1.3793 sec/batch
Epoch: 5/50... Training Step: 298... Training loss: 5.7517... 1.3737 sec/batch
Epoch: 5/50... Training Step: 299... Training loss: 5.6688... 1.3680 sec/batch
Epoch: 5/50... Training Step: 300... Training loss: 5.6418... 1.3805 sec/batch
Epoch: 5/50... Training Step: 301... Training loss: 5.7133... 1.3928 sec/batch
Epoch: 5/50... Training Step: 302... Training loss: 5.6564... 1.3899 sec/batch
Epoch: 5/50... Training Step: 303... Training loss: 5.6975... 1.3584 sec/batch
Epoch: 5/50... Training Step: 304... Training loss: 5.7037... 1.4091 sec/batch
Epoch: 5/50... Training Step: 305... Training loss: 5.6940... 1.3902 sec/batch
Epoch: 6/50... Training Step: 306... Training loss: 5.9163... 1.2871 sec/batch
Epoch: 6/50... Training Step: 307... Training loss: 5.6843... 1.3126 sec/batch
Epoch: 6/50... Training Step: 308... Training loss: 5.6635... 1.3782 sec/batch
Epoch: 6/50... Training Step: 309... Training loss: 5.7247... 1.4102 sec/batch
Epoch: 6/50... Training Step: 310... Training loss: 5.7016... 1.3979 sec/batch
Epoch: 6/50... Training Step: 311... Training loss: 5.7559... 1.3742 sec/batch
Epoch: 6/50... Training Step: 312... Training loss: 5.7495... 1.4309 sec/batch
Epoch: 6/50... Training Step: 313... Training loss: 5.7751... 1.3450 sec/batch
Epoch: 6/50... Training Step: 314... Training loss: 5.6766... 1.3754 sec/batch
Epoch: 6/50... Training Step: 315... Training loss: 5.7052... 1.3828 sec/batch
Epoch: 6/50... Training Step: 316... Training loss: 5.7722... 1.3340 sec/batch
Epoch: 6/50... Training Step: 317... Training loss: 5.6684... 1.3709 sec/batch
Epoch: 6/50... Training Step: 318... Training loss: 5.6945... 1.3871 sec/batch
Epoch: 6/50... Training Step: 319... Training loss: 5.7011... 1.3812 sec/batch
Epoch: 6/50... Training Step: 320... Training loss: 5.6153... 1.3818 sec/batch
Epoch: 6/50... Training Step: 321... Training loss: 5.6817... 1.3614 sec/batch
Epoch: 6/50... Training Step: 322... Training loss: 5.7076... 1.3945 sec/batch
Epoch: 6/50... Training Step: 323... Training loss: 5.6911... 1.3911 sec/batch
Epoch: 6/50... Training Step: 324... Training loss: 5.6928... 1.3249 sec/batch
Epoch: 6/50... Training Step: 325... Training loss: 5.6899... 1.3619 sec/batch
Epoch: 6/50... Training Step: 326... Training loss: 5.7323... 1.3682 sec/batch
Epoch: 6/50... Training Step: 327... Training loss: 5.6080... 1.3963 sec/batch
Epoch: 6/50... Training Step: 328... Training loss: 5.7156... 1.3559 sec/batch
Epoch: 6/50... Training Step: 329... Training loss: 5.5834... 1.3836 sec/batch
Epoch: 6/50... Training Step: 330... Training loss: 5.6224... 1.3224 sec/batch
Epoch: 6/50... Training Step: 331... Training loss: 5.6049... 1.3891 sec/batch
Epoch: 6/50... Training Step: 332... Training loss: 5.6647... 1.3286 sec/batch
Epoch: 6/50... Training Step: 333... Training loss: 5.6572... 1.4149 sec/batch
Epoch: 6/50... Training Step: 334... Training loss: 5.5928... 1.3771 sec/batch
Epoch: 6/50... Training Step: 335... Training loss: 5.6196... 1.3665 sec/batch
Epoch: 6/50... Training Step: 336... Training loss: 5.5717... 1.3752 sec/batch
Epoch: 6/50... Training Step: 337... Training loss: 5.5648... 1.3850 sec/batch
Epoch: 6/50... Training Step: 338... Training loss: 5.5948... 1.3807 sec/batch
Epoch: 6/50... Training Step: 339... Training loss: 5.6096... 1.3514 sec/batch
Epoch: 6/50... Training Step: 340... Training loss: 5.6294... 1.3818 sec/batch
Epoch: 6/50... Training Step: 341... Training loss: 5.5626... 1.3998 sec/batch
Epoch: 6/50... Training Step: 342... Training loss: 5.6588... 1.3796 sec/batch
Epoch: 6/50... Training Step: 343... Training loss: 5.6243... 1.3742 sec/batch
Epoch: 6/50... Training Step: 344... Training loss: 5.7182... 1.3880 sec/batch
Epoch: 6/50... Training Step: 345... Training loss: 5.5788... 1.3842 sec/batch
Epoch: 6/50... Training Step: 346... Training loss: 5.5456... 1.3467 sec/batch
Epoch: 6/50... Training Step: 347... Training loss: 5.5642... 1.3229 sec/batch
Epoch: 6/50... Training Step: 348... Training loss: 5.6196... 1.4028 sec/batch
Epoch: 6/50... Training Step: 349... Training loss: 5.5028... 1.3976 sec/batch
Epoch: 6/50... Training Step: 350... Training loss: 5.5400... 1.3353 sec/batch
Epoch: 6/50... Training Step: 351... Training loss: 5.5582... 1.3731 sec/batch
Epoch: 6/50... Training Step: 352... Training loss: 5.5779... 1.3898 sec/batch
Epoch: 6/50... Training Step: 353... Training loss: 5.5914... 1.3644 sec/batch
Epoch: 6/50... Training Step: 354... Training loss: 5.6216... 1.3942 sec/batch
Epoch: 6/50... Training Step: 355... Training loss: 5.5292... 1.3557 sec/batch
Epoch: 6/50... Training Step: 356... Training loss: 5.5275... 1.3988 sec/batch
Epoch: 6/50... Training Step: 357... Training loss: 5.6096... 1.3648 sec/batch
Epoch: 6/50... Training Step: 358... Training loss: 5.5215... 1.4237 sec/batch
Epoch: 6/50... Training Step: 359... Training loss: 5.5751... 1.3754 sec/batch
Epoch: 6/50... Training Step: 360... Training loss: 5.4883... 1.3801 sec/batch
Epoch: 6/50... Training Step: 361... Training loss: 5.4763... 1.3920 sec/batch
Epoch: 6/50... Training Step: 362... Training loss: 5.5330... 1.3662 sec/batch
Epoch: 6/50... Training Step: 363... Training loss: 5.4970... 1.4303 sec/batch
Epoch: 6/50... Training Step: 364... Training loss: 5.5334... 1.3749 sec/batch
Epoch: 6/50... Training Step: 365... Training loss: 5.5354... 1.3637 sec/batch
Epoch: 6/50... Training Step: 366... Training loss: 5.5214... 1.3222 sec/batch
Epoch: 7/50... Training Step: 367... Training loss: 5.7834... 1.4110 sec/batch
Epoch: 7/50... Training Step: 368... Training loss: 5.5228... 1.3681 sec/batch
Epoch: 7/50... Training Step: 369... Training loss: 5.5187... 1.3126 sec/batch
Epoch: 7/50... Training Step: 370... Training loss: 5.5643... 1.3789 sec/batch
Epoch: 7/50... Training Step: 371... Training loss: 5.5570... 1.3839 sec/batch
Epoch: 7/50... Training Step: 372... Training loss: 5.6065... 1.3913 sec/batch
Epoch: 7/50... Training Step: 373... Training loss: 5.6088... 1.3491 sec/batch
Epoch: 7/50... Training Step: 374... Training loss: 5.6194... 1.3703 sec/batch
Epoch: 7/50... Training Step: 375... Training loss: 5.5234... 1.3468 sec/batch
Epoch: 7/50... Training Step: 376... Training loss: 5.5631... 1.3876 sec/batch
Epoch: 7/50... Training Step: 377... Training loss: 5.6242... 1.3758 sec/batch
Epoch: 7/50... Training Step: 378... Training loss: 5.5268... 1.3476 sec/batch
Epoch: 7/50... Training Step: 379... Training loss: 5.5337... 1.3808 sec/batch
Epoch: 7/50... Training Step: 380... Training loss: 5.5654... 1.3872 sec/batch
Epoch: 7/50... Training Step: 381... Training loss: 5.4644... 1.3928 sec/batch
Epoch: 7/50... Training Step: 382... Training loss: 5.5256... 1.3718 sec/batch
Epoch: 7/50... Training Step: 383... Training loss: 5.5650... 1.3850 sec/batch
Epoch: 7/50... Training Step: 384... Training loss: 5.5359... 1.3741 sec/batch
Epoch: 7/50... Training Step: 385... Training loss: 5.5340... 1.3820 sec/batch
Epoch: 7/50... Training Step: 386... Training loss: 5.5388... 1.3756 sec/batch
Epoch: 7/50... Training Step: 387... Training loss: 5.5920... 1.3136 sec/batch
Epoch: 7/50... Training Step: 388... Training loss: 5.4500... 1.2668 sec/batch
Epoch: 7/50... Training Step: 389... Training loss: 5.5549... 1.3872 sec/batch
Epoch: 7/50... Training Step: 390... Training loss: 5.4203... 1.3726 sec/batch
Epoch: 7/50... Training Step: 391... Training loss: 5.4748... 1.3808 sec/batch
Epoch: 7/50... Training Step: 392... Training loss: 5.4561... 1.3866 sec/batch
Epoch: 7/50... Training Step: 393... Training loss: 5.5127... 1.3590 sec/batch
Epoch: 7/50... Training Step: 394... Training loss: 5.5194... 1.3745 sec/batch
Epoch: 7/50... Training Step: 395... Training loss: 5.4287... 1.3706 sec/batch
Epoch: 7/50... Training Step: 396... Training loss: 5.4754... 1.3782 sec/batch
Epoch: 7/50... Training Step: 397... Training loss: 5.4251... 1.3487 sec/batch
Epoch: 7/50... Training Step: 398... Training loss: 5.4151... 1.3359 sec/batch
Epoch: 7/50... Training Step: 399... Training loss: 5.4597... 1.4042 sec/batch
Epoch: 7/50... Training Step: 400... Training loss: 5.4566... 1.3847 sec/batch
Epoch: 7/50... Training Step: 401... Training loss: 5.5003... 1.2903 sec/batch
Epoch: 7/50... Training Step: 402... Training loss: 5.4168... 1.4183 sec/batch
Epoch: 7/50... Training Step: 403... Training loss: 5.5265... 1.3698 sec/batch
Epoch: 7/50... Training Step: 404... Training loss: 5.4725... 1.3786 sec/batch
Epoch: 7/50... Training Step: 405... Training loss: 5.5889... 1.3914 sec/batch
Epoch: 7/50... Training Step: 406... Training loss: 5.4271... 1.4041 sec/batch
Epoch: 7/50... Training Step: 407... Training loss: 5.3942... 1.3765 sec/batch
Epoch: 7/50... Training Step: 408... Training loss: 5.4274... 1.3705 sec/batch
Epoch: 7/50... Training Step: 409... Training loss: 5.4811... 1.3632 sec/batch
Epoch: 7/50... Training Step: 410... Training loss: 5.3667... 1.3882 sec/batch
Epoch: 7/50... Training Step: 411... Training loss: 5.4007... 1.4034 sec/batch
Epoch: 7/50... Training Step: 412... Training loss: 5.4078... 1.3811 sec/batch
Epoch: 7/50... Training Step: 413... Training loss: 5.4310... 1.2984 sec/batch
Epoch: 7/50... Training Step: 414... Training loss: 5.4550... 1.3896 sec/batch
Epoch: 7/50... Training Step: 415... Training loss: 5.4857... 1.3823 sec/batch
Epoch: 7/50... Training Step: 416... Training loss: 5.3836... 1.3365 sec/batch
Epoch: 7/50... Training Step: 417... Training loss: 5.3973... 1.3916 sec/batch
Epoch: 7/50... Training Step: 418... Training loss: 5.4755... 1.3630 sec/batch
Epoch: 7/50... Training Step: 419... Training loss: 5.3741... 1.3802 sec/batch
Epoch: 7/50... Training Step: 420... Training loss: 5.4398... 1.4162 sec/batch
Epoch: 7/50... Training Step: 421... Training loss: 5.3541... 1.3733 sec/batch
Epoch: 7/50... Training Step: 422... Training loss: 5.3305... 1.3697 sec/batch
Epoch: 7/50... Training Step: 423... Training loss: 5.4079... 1.3759 sec/batch
Epoch: 7/50... Training Step: 424... Training loss: 5.3534... 1.4072 sec/batch
Epoch: 7/50... Training Step: 425... Training loss: 5.3935... 1.3880 sec/batch
Epoch: 7/50... Training Step: 426... Training loss: 5.4087... 1.3671 sec/batch
Epoch: 7/50... Training Step: 427... Training loss: 5.3881... 1.3876 sec/batch
Epoch: 8/50... Training Step: 428... Training loss: 5.6280... 1.3700 sec/batch
Epoch: 8/50... Training Step: 429... Training loss: 5.3837... 1.3396 sec/batch
Epoch: 8/50... Training Step: 430... Training loss: 5.3877... 1.3585 sec/batch
Epoch: 8/50... Training Step: 431... Training loss: 5.4294... 1.3404 sec/batch
Epoch: 8/50... Training Step: 432... Training loss: 5.4140... 1.3464 sec/batch
Epoch: 8/50... Training Step: 433... Training loss: 5.4700... 1.3754 sec/batch
Epoch: 8/50... Training Step: 434... Training loss: 5.4609... 1.3469 sec/batch
Epoch: 8/50... Training Step: 435... Training loss: 5.4936... 1.3745 sec/batch
Epoch: 8/50... Training Step: 436... Training loss: 5.3870... 1.3782 sec/batch
Epoch: 8/50... Training Step: 437... Training loss: 5.4275... 1.3663 sec/batch
Epoch: 8/50... Training Step: 438... Training loss: 5.4959... 1.4071 sec/batch
Epoch: 8/50... Training Step: 439... Training loss: 5.3799... 1.3756 sec/batch
Epoch: 8/50... Training Step: 440... Training loss: 5.4056... 1.3251 sec/batch
Epoch: 8/50... Training Step: 441... Training loss: 5.4081... 1.3738 sec/batch
Epoch: 8/50... Training Step: 442... Training loss: 5.3230... 1.3844 sec/batch
Epoch: 8/50... Training Step: 443... Training loss: 5.3812... 1.3705 sec/batch
Epoch: 8/50... Training Step: 444... Training loss: 5.4251... 1.3513 sec/batch
Epoch: 8/50... Training Step: 445... Training loss: 5.4058... 1.3820 sec/batch
Epoch: 8/50... Training Step: 446... Training loss: 5.3935... 1.3787 sec/batch
Epoch: 8/50... Training Step: 447... Training loss: 5.4112... 1.3691 sec/batch
Epoch: 8/50... Training Step: 448... Training loss: 5.4547... 1.3681 sec/batch
Epoch: 8/50... Training Step: 449... Training loss: 5.3270... 1.3892 sec/batch
Epoch: 8/50... Training Step: 450... Training loss: 5.4101... 1.3760 sec/batch
Epoch: 8/50... Training Step: 451... Training loss: 5.2857... 1.3323 sec/batch
Epoch: 8/50... Training Step: 452... Training loss: 5.3250... 1.3710 sec/batch
Epoch: 8/50... Training Step: 453... Training loss: 5.3075... 1.3749 sec/batch
Epoch: 8/50... Training Step: 454... Training loss: 5.3948... 1.3747 sec/batch
Epoch: 8/50... Training Step: 455... Training loss: 5.3770... 1.3698 sec/batch
Epoch: 8/50... Training Step: 456... Training loss: 5.3089... 1.3343 sec/batch
Epoch: 8/50... Training Step: 457... Training loss: 5.3364... 1.3806 sec/batch
Epoch: 8/50... Training Step: 458... Training loss: 5.2791... 1.3652 sec/batch
Epoch: 8/50... Training Step: 459... Training loss: 5.2824... 1.3681 sec/batch
Epoch: 8/50... Training Step: 460... Training loss: 5.3205... 1.3911 sec/batch
Epoch: 8/50... Training Step: 461... Training loss: 5.3318... 1.3924 sec/batch
Epoch: 8/50... Training Step: 462... Training loss: 5.3487... 1.3224 sec/batch
Epoch: 8/50... Training Step: 463... Training loss: 5.2997... 1.3698 sec/batch
Epoch: 8/50... Training Step: 464... Training loss: 5.3861... 1.3662 sec/batch
Epoch: 8/50... Training Step: 465... Training loss: 5.3465... 1.4064 sec/batch
Epoch: 8/50... Training Step: 466... Training loss: 5.4676... 1.3488 sec/batch
Epoch: 8/50... Training Step: 467... Training loss: 5.2970... 1.2943 sec/batch
Epoch: 8/50... Training Step: 468... Training loss: 5.2485... 1.3609 sec/batch
Epoch: 8/50... Training Step: 469... Training loss: 5.2964... 1.3594 sec/batch
Epoch: 8/50... Training Step: 470... Training loss: 5.3489... 1.3719 sec/batch
Epoch: 8/50... Training Step: 471... Training loss: 5.2249... 1.3333 sec/batch
Epoch: 8/50... Training Step: 472... Training loss: 5.2815... 1.3749 sec/batch
Epoch: 8/50... Training Step: 473... Training loss: 5.2824... 1.3909 sec/batch
Epoch: 8/50... Training Step: 474... Training loss: 5.3081... 1.3325 sec/batch
Epoch: 8/50... Training Step: 475... Training loss: 5.3224... 1.3873 sec/batch
Epoch: 8/50... Training Step: 476... Training loss: 5.3452... 1.3918 sec/batch
Epoch: 8/50... Training Step: 477... Training loss: 5.2544... 1.3661 sec/batch
Epoch: 8/50... Training Step: 478... Training loss: 5.2524... 1.4057 sec/batch
Epoch: 8/50... Training Step: 479... Training loss: 5.3492... 1.3744 sec/batch
Epoch: 8/50... Training Step: 480... Training loss: 5.2423... 1.3730 sec/batch
Epoch: 8/50... Training Step: 481... Training loss: 5.3105... 1.3125 sec/batch
Epoch: 8/50... Training Step: 482... Training loss: 5.2169... 1.3725 sec/batch
Epoch: 8/50... Training Step: 483... Training loss: 5.2085... 1.4026 sec/batch
Epoch: 8/50... Training Step: 484... Training loss: 5.2752... 1.3714 sec/batch
Epoch: 8/50... Training Step: 485... Training loss: 5.2122... 1.3748 sec/batch
Epoch: 8/50... Training Step: 486... Training loss: 5.2682... 1.3789 sec/batch
Epoch: 8/50... Training Step: 487... Training loss: 5.2670... 1.3296 sec/batch
Epoch: 8/50... Training Step: 488... Training loss: 5.2623... 1.3736 sec/batch
Epoch: 9/50... Training Step: 489... Training loss: 5.4895... 1.3829 sec/batch
Epoch: 9/50... Training Step: 490... Training loss: 5.2572... 1.3509 sec/batch
Epoch: 9/50... Training Step: 491... Training loss: 5.2501... 1.3825 sec/batch
Epoch: 9/50... Training Step: 492... Training loss: 5.3018... 1.3981 sec/batch
Epoch: 9/50... Training Step: 493... Training loss: 5.2912... 1.3909 sec/batch
Epoch: 9/50... Training Step: 494... Training loss: 5.3431... 1.3467 sec/batch
Epoch: 9/50... Training Step: 495... Training loss: 5.3396... 1.3127 sec/batch
Epoch: 9/50... Training Step: 496... Training loss: 5.3569... 1.4036 sec/batch
Epoch: 9/50... Training Step: 497... Training loss: 5.2578... 1.3844 sec/batch
Epoch: 9/50... Training Step: 498... Training loss: 5.3052... 1.3729 sec/batch
Epoch: 9/50... Training Step: 499... Training loss: 5.3775... 1.3827 sec/batch
Epoch: 9/50... Training Step: 500... Training loss: 5.2464... 1.3809 sec/batch
Epoch: 9/50... Training Step: 501... Training loss: 5.2898... 1.4030 sec/batch
Epoch: 9/50... Training Step: 502... Training loss: 5.2912... 1.3835 sec/batch
Epoch: 9/50... Training Step: 503... Training loss: 5.1929... 1.2836 sec/batch
Epoch: 9/50... Training Step: 504... Training loss: 5.2472... 1.3892 sec/batch
Epoch: 9/50... Training Step: 505... Training loss: 5.2992... 1.4060 sec/batch
Epoch: 9/50... Training Step: 506... Training loss: 5.2951... 1.3472 sec/batch
Epoch: 9/50... Training Step: 507... Training loss: 5.2660... 1.3942 sec/batch
Epoch: 9/50... Training Step: 508... Training loss: 5.2849... 1.3806 sec/batch
Epoch: 9/50... Training Step: 509... Training loss: 5.3419... 1.3737 sec/batch
Epoch: 9/50... Training Step: 510... Training loss: 5.1918... 1.3559 sec/batch
Epoch: 9/50... Training Step: 511... Training loss: 5.2866... 1.3893 sec/batch
Epoch: 9/50... Training Step: 512... Training loss: 5.1769... 1.3454 sec/batch
Epoch: 9/50... Training Step: 513... Training loss: 5.1940... 1.3708 sec/batch
Epoch: 9/50... Training Step: 514... Training loss: 5.1828... 1.4051 sec/batch
Epoch: 9/50... Training Step: 515... Training loss: 5.2586... 1.3881 sec/batch
Epoch: 9/50... Training Step: 516... Training loss: 5.2753... 1.3737 sec/batch
Epoch: 9/50... Training Step: 517... Training loss: 5.1792... 1.3518 sec/batch
Epoch: 9/50... Training Step: 518... Training loss: 5.2081... 1.3812 sec/batch
Epoch: 9/50... Training Step: 519... Training loss: 5.1635... 1.4080 sec/batch
Epoch: 9/50... Training Step: 520... Training loss: 5.1614... 1.3677 sec/batch
Epoch: 9/50... Training Step: 521... Training loss: 5.2076... 1.3701 sec/batch
Epoch: 9/50... Training Step: 522... Training loss: 5.1984... 1.3649 sec/batch
Epoch: 9/50... Training Step: 523... Training loss: 5.2285... 1.4266 sec/batch
Epoch: 9/50... Training Step: 524... Training loss: 5.1709... 1.3765 sec/batch
Epoch: 9/50... Training Step: 525... Training loss: 5.2635... 1.3748 sec/batch
Epoch: 9/50... Training Step: 526... Training loss: 5.2163... 1.4052 sec/batch
Epoch: 9/50... Training Step: 527... Training loss: 5.3301... 1.3413 sec/batch
Epoch: 9/50... Training Step: 528... Training loss: 5.1608... 1.3980 sec/batch
Epoch: 9/50... Training Step: 529... Training loss: 5.1292... 1.4107 sec/batch
Epoch: 9/50... Training Step: 530... Training loss: 5.1563... 1.3229 sec/batch
Epoch: 9/50... Training Step: 531... Training loss: 5.2170... 1.3820 sec/batch
Epoch: 9/50... Training Step: 532... Training loss: 5.1113... 1.3825 sec/batch
Epoch: 9/50... Training Step: 533... Training loss: 5.1304... 1.3877 sec/batch
Epoch: 9/50... Training Step: 534... Training loss: 5.1544... 1.3729 sec/batch
Epoch: 9/50... Training Step: 535... Training loss: 5.1736... 1.3184 sec/batch
Epoch: 9/50... Training Step: 536... Training loss: 5.1920... 1.3737 sec/batch
Epoch: 9/50... Training Step: 537... Training loss: 5.2418... 1.4311 sec/batch
Epoch: 9/50... Training Step: 538... Training loss: 5.1347... 1.3470 sec/batch
Epoch: 9/50... Training Step: 539... Training loss: 5.1428... 1.3813 sec/batch
Epoch: 9/50... Training Step: 540... Training loss: 5.2304... 1.3601 sec/batch
Epoch: 9/50... Training Step: 541... Training loss: 5.1257... 1.3964 sec/batch
Epoch: 9/50... Training Step: 542... Training loss: 5.1910... 1.3764 sec/batch
Epoch: 9/50... Training Step: 543... Training loss: 5.0866... 1.3659 sec/batch
Epoch: 9/50... Training Step: 544... Training loss: 5.0774... 1.3890 sec/batch
Epoch: 9/50... Training Step: 545... Training loss: 5.1709... 1.3666 sec/batch
Epoch: 9/50... Training Step: 546... Training loss: 5.0996... 1.3944 sec/batch
Epoch: 9/50... Training Step: 547... Training loss: 5.1425... 1.3709 sec/batch
Epoch: 9/50... Training Step: 548... Training loss: 5.1503... 1.3946 sec/batch
Epoch: 9/50... Training Step: 549... Training loss: 5.1418... 1.3699 sec/batch
Epoch: 10/50... Training Step: 550... Training loss: 5.3582... 1.3583 sec/batch
Epoch: 10/50... Training Step: 551... Training loss: 5.1243... 1.3006 sec/batch
Epoch: 10/50... Training Step: 552... Training loss: 5.1251... 1.3929 sec/batch
Epoch: 10/50... Training Step: 553... Training loss: 5.1802... 1.3738 sec/batch
Epoch: 10/50... Training Step: 554... Training loss: 5.1610... 1.3722 sec/batch
Epoch: 10/50... Training Step: 555... Training loss: 5.2251... 1.3897 sec/batch
Epoch: 10/50... Training Step: 556... Training loss: 5.2126... 1.3760 sec/batch
Epoch: 10/50... Training Step: 557... Training loss: 5.2347... 1.3488 sec/batch
Epoch: 10/50... Training Step: 558... Training loss: 5.1432... 1.3469 sec/batch
Epoch: 10/50... Training Step: 559... Training loss: 5.1786... 1.3924 sec/batch
Epoch: 10/50... Training Step: 560... Training loss: 5.2731... 1.3513 sec/batch
Epoch: 10/50... Training Step: 561... Training loss: 5.1343... 1.3706 sec/batch
Epoch: 10/50... Training Step: 562... Training loss: 5.1640... 1.3926 sec/batch
Epoch: 10/50... Training Step: 563... Training loss: 5.1882... 1.3856 sec/batch
Epoch: 10/50... Training Step: 564... Training loss: 5.0814... 1.3945 sec/batch
Epoch: 10/50... Training Step: 565... Training loss: 5.1378... 1.3875 sec/batch
Epoch: 10/50... Training Step: 566... Training loss: 5.1725... 1.3665 sec/batch
Epoch: 10/50... Training Step: 567... Training loss: 5.1588... 1.3925 sec/batch
Epoch: 10/50... Training Step: 568... Training loss: 5.1549... 1.3814 sec/batch
Epoch: 10/50... Training Step: 569... Training loss: 5.1587... 1.3706 sec/batch
Epoch: 10/50... Training Step: 570... Training loss: 5.2009... 1.3931 sec/batch
Epoch: 10/50... Training Step: 571... Training loss: 5.0668... 1.3763 sec/batch
Epoch: 10/50... Training Step: 572... Training loss: 5.1437... 1.3798 sec/batch
Epoch: 10/50... Training Step: 573... Training loss: 5.0492... 1.3909 sec/batch
Epoch: 10/50... Training Step: 574... Training loss: 5.0729... 1.2909 sec/batch
Epoch: 10/50... Training Step: 575... Training loss: 5.0487... 1.3474 sec/batch
Epoch: 10/50... Training Step: 576... Training loss: 5.1305... 1.3692 sec/batch
Epoch: 10/50... Training Step: 577... Training loss: 5.1406... 1.3813 sec/batch
Epoch: 10/50... Training Step: 578... Training loss: 5.0371... 1.3774 sec/batch
Epoch: 10/50... Training Step: 579... Training loss: 5.0843... 1.3452 sec/batch
Epoch: 10/50... Training Step: 580... Training loss: 5.0352... 1.3653 sec/batch
Epoch: 10/50... Training Step: 581... Training loss: 5.0171... 1.3791 sec/batch
Epoch: 10/50... Training Step: 582... Training loss: 5.0720... 1.3469 sec/batch
Epoch: 10/50... Training Step: 583... Training loss: 5.0924... 1.3645 sec/batch
Epoch: 10/50... Training Step: 584... Training loss: 5.1076... 1.4338 sec/batch
Epoch: 10/50... Training Step: 585... Training loss: 5.0291... 1.3496 sec/batch
Epoch: 10/50... Training Step: 586... Training loss: 5.1388... 1.3779 sec/batch
Epoch: 10/50... Training Step: 587... Training loss: 5.0963... 1.3474 sec/batch
Epoch: 10/50... Training Step: 588... Training loss: 5.2178... 1.3613 sec/batch
Epoch: 10/50... Training Step: 589... Training loss: 5.0362... 1.3544 sec/batch
Epoch: 10/50... Training Step: 590... Training loss: 4.9964... 1.3840 sec/batch
Epoch: 10/50... Training Step: 591... Training loss: 5.0311... 1.3602 sec/batch
Epoch: 10/50... Training Step: 592... Training loss: 5.1033... 1.3615 sec/batch
Epoch: 10/50... Training Step: 593... Training loss: 4.9860... 1.3086 sec/batch
Epoch: 10/50... Training Step: 594... Training loss: 4.9975... 1.3685 sec/batch
Epoch: 10/50... Training Step: 595... Training loss: 5.0273... 1.3414 sec/batch
Epoch: 10/50... Training Step: 596... Training loss: 5.0470... 1.2802 sec/batch
Epoch: 10/50... Training Step: 597... Training loss: 5.0503... 1.3734 sec/batch
Epoch: 10/50... Training Step: 598... Training loss: 5.0917... 1.3695 sec/batch
Epoch: 10/50... Training Step: 599... Training loss: 5.0086... 1.3949 sec/batch
Epoch: 10/50... Training Step: 600... Training loss: 4.9833... 1.3878 sec/batch
Epoch: 10/50... Training Step: 601... Training loss: 5.0899... 1.4266 sec/batch
Epoch: 10/50... Training Step: 602... Training loss: 4.9754... 1.3827 sec/batch
Epoch: 10/50... Training Step: 603... Training loss: 5.0600... 1.3588 sec/batch
Epoch: 10/50... Training Step: 604... Training loss: 4.9564... 1.3465 sec/batch
Epoch: 10/50... Training Step: 605... Training loss: 4.9417... 1.3683 sec/batch
Epoch: 10/50... Training Step: 606... Training loss: 5.0294... 1.3697 sec/batch
Epoch: 10/50... Training Step: 607... Training loss: 4.9613... 1.3525 sec/batch
Epoch: 10/50... Training Step: 608... Training loss: 4.9986... 1.3899 sec/batch
Epoch: 10/50... Training Step: 609... Training loss: 5.0211... 1.3690 sec/batch
Epoch: 10/50... Training Step: 610... Training loss: 5.0079... 1.3795 sec/batch
Epoch: 11/50... Training Step: 611... Training loss: 5.2420... 1.3724 sec/batch
Epoch: 11/50... Training Step: 612... Training loss: 4.9978... 1.4015 sec/batch
Epoch: 11/50... Training Step: 613... Training loss: 4.9930... 1.3454 sec/batch
Epoch: 11/50... Training Step: 614... Training loss: 5.0430... 1.2454 sec/batch
Epoch: 11/50... Training Step: 615... Training loss: 5.0462... 1.3553 sec/batch
Epoch: 11/50... Training Step: 616... Training loss: 5.1099... 1.3884 sec/batch
Epoch: 11/50... Training Step: 617... Training loss: 5.0824... 1.3775 sec/batch
Epoch: 11/50... Training Step: 618... Training loss: 5.1130... 1.3719 sec/batch
Epoch: 11/50... Training Step: 619... Training loss: 5.0335... 1.3847 sec/batch
Epoch: 11/50... Training Step: 620... Training loss: 5.0552... 1.3901 sec/batch
Epoch: 11/50... Training Step: 621... Training loss: 5.1504... 1.3864 sec/batch
Epoch: 11/50... Training Step: 622... Training loss: 5.0053... 1.3799 sec/batch
Epoch: 11/50... Training Step: 623... Training loss: 5.0521... 1.3701 sec/batch
Epoch: 11/50... Training Step: 624... Training loss: 5.0723... 1.3701 sec/batch
Epoch: 11/50... Training Step: 625... Training loss: 4.9610... 1.3713 sec/batch
Epoch: 11/50... Training Step: 626... Training loss: 5.0101... 1.3609 sec/batch
Epoch: 11/50... Training Step: 627... Training loss: 5.0620... 1.3963 sec/batch
Epoch: 11/50... Training Step: 628... Training loss: 5.0437... 1.3697 sec/batch
Epoch: 11/50... Training Step: 629... Training loss: 5.0571... 1.4050 sec/batch
Epoch: 11/50... Training Step: 630... Training loss: 5.0501... 1.3880 sec/batch
Epoch: 11/50... Training Step: 631... Training loss: 5.0946... 1.3701 sec/batch
Epoch: 11/50... Training Step: 632... Training loss: 4.9563... 1.3115 sec/batch
Epoch: 11/50... Training Step: 633... Training loss: 5.0243... 1.3777 sec/batch
Epoch: 11/50... Training Step: 634... Training loss: 4.9376... 1.4120 sec/batch
Epoch: 11/50... Training Step: 635... Training loss: 4.9662... 1.3254 sec/batch
Epoch: 11/50... Training Step: 636... Training loss: 4.9290... 1.3907 sec/batch
Epoch: 11/50... Training Step: 637... Training loss: 5.0119... 1.3885 sec/batch
Epoch: 11/50... Training Step: 638... Training loss: 5.0251... 1.3944 sec/batch
Epoch: 11/50... Training Step: 639... Training loss: 4.9286... 1.3837 sec/batch
Epoch: 11/50... Training Step: 640... Training loss: 4.9835... 1.3649 sec/batch
Epoch: 11/50... Training Step: 641... Training loss: 4.9286... 1.3710 sec/batch
Epoch: 11/50... Training Step: 642... Training loss: 4.8999... 1.3864 sec/batch
Epoch: 11/50... Training Step: 643... Training loss: 4.9815... 1.3635 sec/batch
Epoch: 11/50... Training Step: 644... Training loss: 4.9666... 1.2863 sec/batch
Epoch: 11/50... Training Step: 645... Training loss: 5.0088... 1.3796 sec/batch
Epoch: 11/50... Training Step: 646... Training loss: 4.9316... 1.3825 sec/batch
Epoch: 11/50... Training Step: 647... Training loss: 5.0419... 1.3930 sec/batch
Epoch: 11/50... Training Step: 648... Training loss: 4.9863... 1.3820 sec/batch
Epoch: 11/50... Training Step: 649... Training loss: 5.1076... 1.4004 sec/batch
Epoch: 11/50... Training Step: 650... Training loss: 4.9401... 1.3658 sec/batch
Epoch: 11/50... Training Step: 651... Training loss: 4.9119... 1.3165 sec/batch
Epoch: 11/50... Training Step: 652... Training loss: 4.9365... 1.4020 sec/batch
Epoch: 11/50... Training Step: 653... Training loss: 4.9979... 1.3738 sec/batch
Epoch: 11/50... Training Step: 654... Training loss: 4.8721... 1.3688 sec/batch
Epoch: 11/50... Training Step: 655... Training loss: 4.9003... 1.3734 sec/batch
Epoch: 11/50... Training Step: 656... Training loss: 4.9279... 1.3993 sec/batch
Epoch: 11/50... Training Step: 657... Training loss: 4.9469... 1.3645 sec/batch
Epoch: 11/50... Training Step: 658... Training loss: 4.9614... 1.3723 sec/batch
Epoch: 11/50... Training Step: 659... Training loss: 5.0027... 1.3888 sec/batch
Epoch: 11/50... Training Step: 660... Training loss: 4.9028... 1.2821 sec/batch
Epoch: 11/50... Training Step: 661... Training loss: 4.8802... 1.3838 sec/batch
Epoch: 11/50... Training Step: 662... Training loss: 4.9922... 1.3132 sec/batch
Epoch: 11/50... Training Step: 663... Training loss: 4.8807... 1.3614 sec/batch
Epoch: 11/50... Training Step: 664... Training loss: 4.9455... 1.3630 sec/batch
Epoch: 11/50... Training Step: 665... Training loss: 4.8591... 1.3874 sec/batch
Epoch: 11/50... Training Step: 666... Training loss: 4.8420... 1.3773 sec/batch
Epoch: 11/50... Training Step: 667... Training loss: 4.9448... 1.3643 sec/batch
Epoch: 11/50... Training Step: 668... Training loss: 4.8629... 1.3705 sec/batch
Epoch: 11/50... Training Step: 669... Training loss: 4.8890... 1.3436 sec/batch
Epoch: 11/50... Training Step: 670... Training loss: 4.9186... 1.4140 sec/batch
Epoch: 11/50... Training Step: 671... Training loss: 4.9028... 1.3694 sec/batch
Epoch: 12/50... Training Step: 672... Training loss: 5.1165... 1.3481 sec/batch
Epoch: 12/50... Training Step: 673... Training loss: 4.9019... 1.3404 sec/batch
Epoch: 12/50... Training Step: 674... Training loss: 4.8904... 1.3987 sec/batch
Epoch: 12/50... Training Step: 675... Training loss: 4.9554... 1.3738 sec/batch
Epoch: 12/50... Training Step: 676... Training loss: 4.9488... 1.3223 sec/batch
Epoch: 12/50... Training Step: 677... Training loss: 5.0020... 1.3178 sec/batch
Epoch: 12/50... Training Step: 678... Training loss: 4.9830... 1.3372 sec/batch
Epoch: 12/50... Training Step: 679... Training loss: 5.0148... 1.3833 sec/batch
Epoch: 12/50... Training Step: 680... Training loss: 4.9083... 1.3707 sec/batch
Epoch: 12/50... Training Step: 681... Training loss: 4.9594... 1.3845 sec/batch
Epoch: 12/50... Training Step: 682... Training loss: 5.0440... 1.3763 sec/batch
Epoch: 12/50... Training Step: 683... Training loss: 4.9218... 1.3861 sec/batch
Epoch: 12/50... Training Step: 684... Training loss: 4.9499... 1.3623 sec/batch
Epoch: 12/50... Training Step: 685... Training loss: 4.9590... 1.3365 sec/batch
Epoch: 12/50... Training Step: 686... Training loss: 4.8634... 1.3530 sec/batch
Epoch: 12/50... Training Step: 687... Training loss: 4.9175... 1.3833 sec/batch
Epoch: 12/50... Training Step: 688... Training loss: 4.9609... 1.4024 sec/batch
Epoch: 12/50... Training Step: 689... Training loss: 4.9508... 1.3032 sec/batch
Epoch: 12/50... Training Step: 690... Training loss: 4.9500... 1.3682 sec/batch
Epoch: 12/50... Training Step: 691... Training loss: 4.9516... 1.3725 sec/batch
Epoch: 12/50... Training Step: 692... Training loss: 4.9895... 1.4189 sec/batch
Epoch: 12/50... Training Step: 693... Training loss: 4.8548... 1.3808 sec/batch
Epoch: 12/50... Training Step: 694... Training loss: 4.9222... 1.3666 sec/batch
Epoch: 12/50... Training Step: 695... Training loss: 4.8224... 1.3171 sec/batch
Epoch: 12/50... Training Step: 696... Training loss: 4.8747... 1.4080 sec/batch
Epoch: 12/50... Training Step: 697... Training loss: 4.8314... 1.2938 sec/batch
Epoch: 12/50... Training Step: 698... Training loss: 4.9193... 1.3569 sec/batch
Epoch: 12/50... Training Step: 699... Training loss: 4.9320... 1.3709 sec/batch
Epoch: 12/50... Training Step: 700... Training loss: 4.8328... 1.3840 sec/batch
Epoch: 12/50... Training Step: 701... Training loss: 4.9051... 1.3817 sec/batch
Epoch: 12/50... Training Step: 702... Training loss: 4.8167... 1.3690 sec/batch
Epoch: 12/50... Training Step: 703... Training loss: 4.8065... 1.3862 sec/batch
Epoch: 12/50... Training Step: 704... Training loss: 4.8865... 1.2759 sec/batch
Epoch: 12/50... Training Step: 705... Training loss: 4.8631... 1.3688 sec/batch
Epoch: 12/50... Training Step: 706... Training loss: 4.8912... 1.2862 sec/batch
Epoch: 12/50... Training Step: 707... Training loss: 4.8385... 1.4072 sec/batch
Epoch: 12/50... Training Step: 708... Training loss: 4.9256... 1.3891 sec/batch
Epoch: 12/50... Training Step: 709... Training loss: 4.8916... 1.3164 sec/batch
Epoch: 12/50... Training Step: 710... Training loss: 5.0139... 1.3889 sec/batch
Epoch: 12/50... Training Step: 711... Training loss: 4.8295... 1.3733 sec/batch
Epoch: 12/50... Training Step: 712... Training loss: 4.8168... 1.3479 sec/batch
Epoch: 12/50... Training Step: 713... Training loss: 4.8307... 1.3855 sec/batch
Epoch: 12/50... Training Step: 714... Training loss: 4.9037... 1.4109 sec/batch
Epoch: 12/50... Training Step: 715... Training loss: 4.7853... 1.4281 sec/batch
Epoch: 12/50... Training Step: 716... Training loss: 4.8056... 1.3747 sec/batch
Epoch: 12/50... Training Step: 717... Training loss: 4.8359... 1.3180 sec/batch
Epoch: 12/50... Training Step: 718... Training loss: 4.8446... 1.3965 sec/batch
Epoch: 12/50... Training Step: 719... Training loss: 4.8609... 1.3832 sec/batch
Epoch: 12/50... Training Step: 720... Training loss: 4.9011... 1.3875 sec/batch
Epoch: 12/50... Training Step: 721... Training loss: 4.8175... 1.3632 sec/batch
Epoch: 12/50... Training Step: 722... Training loss: 4.7859... 1.3673 sec/batch
Epoch: 12/50... Training Step: 723... Training loss: 4.8949... 1.3486 sec/batch
Epoch: 12/50... Training Step: 724... Training loss: 4.7825... 1.3863 sec/batch
Epoch: 12/50... Training Step: 725... Training loss: 4.8607... 1.3913 sec/batch
Epoch: 12/50... Training Step: 726... Training loss: 4.7794... 1.3434 sec/batch
Epoch: 12/50... Training Step: 727... Training loss: 4.7477... 1.3790 sec/batch
Epoch: 12/50... Training Step: 728... Training loss: 4.8544... 1.3822 sec/batch
Epoch: 12/50... Training Step: 729... Training loss: 4.7817... 1.3966 sec/batch
Epoch: 12/50... Training Step: 730... Training loss: 4.8015... 1.3703 sec/batch
Epoch: 12/50... Training Step: 731... Training loss: 4.8482... 1.3720 sec/batch
Epoch: 12/50... Training Step: 732... Training loss: 4.8296... 1.3703 sec/batch
Epoch: 13/50... Training Step: 733... Training loss: 5.0313... 1.4254 sec/batch
Epoch: 13/50... Training Step: 734... Training loss: 4.8210... 1.3274 sec/batch
Epoch: 13/50... Training Step: 735... Training loss: 4.8077... 1.3718 sec/batch
Epoch: 13/50... Training Step: 736... Training loss: 4.8763... 1.3873 sec/batch
Epoch: 13/50... Training Step: 737... Training loss: 4.8609... 1.3859 sec/batch
Epoch: 13/50... Training Step: 738... Training loss: 4.9263... 1.3714 sec/batch
Epoch: 13/50... Training Step: 739... Training loss: 4.8981... 1.3737 sec/batch
Epoch: 13/50... Training Step: 740... Training loss: 4.9416... 1.3307 sec/batch
Epoch: 13/50... Training Step: 741... Training loss: 4.8506... 1.3689 sec/batch
Epoch: 13/50... Training Step: 742... Training loss: 4.8765... 1.3878 sec/batch
Epoch: 13/50... Training Step: 743... Training loss: 4.9807... 1.3925 sec/batch
Epoch: 13/50... Training Step: 744... Training loss: 4.8325... 1.3843 sec/batch
Epoch: 13/50... Training Step: 745... Training loss: 4.8738... 1.3861 sec/batch
Epoch: 13/50... Training Step: 746... Training loss: 4.8848... 1.3876 sec/batch
Epoch: 13/50... Training Step: 747... Training loss: 4.7860... 1.3849 sec/batch
Epoch: 13/50... Training Step: 748... Training loss: 4.8161... 1.3873 sec/batch
Epoch: 13/50... Training Step: 749... Training loss: 4.8769... 1.3878 sec/batch
Epoch: 13/50... Training Step: 750... Training loss: 4.8827... 1.3684 sec/batch
Epoch: 13/50... Training Step: 751... Training loss: 4.8633... 1.4071 sec/batch
Epoch: 13/50... Training Step: 752... Training loss: 4.8728... 1.3606 sec/batch
Epoch: 13/50... Training Step: 753... Training loss: 4.9202... 1.3964 sec/batch
Epoch: 13/50... Training Step: 754... Training loss: 4.7814... 1.3384 sec/batch
Epoch: 13/50... Training Step: 755... Training loss: 4.8454... 1.3875 sec/batch
Epoch: 13/50... Training Step: 756... Training loss: 4.7469... 1.3782 sec/batch
Epoch: 13/50... Training Step: 757... Training loss: 4.7874... 1.3746 sec/batch
Epoch: 13/50... Training Step: 758... Training loss: 4.7577... 1.3985 sec/batch
Epoch: 13/50... Training Step: 759... Training loss: 4.8474... 1.3857 sec/batch
Epoch: 13/50... Training Step: 760... Training loss: 4.8538... 1.3845 sec/batch
Epoch: 13/50... Training Step: 761... Training loss: 4.7368... 1.3809 sec/batch
Epoch: 13/50... Training Step: 762... Training loss: 4.8270... 1.3700 sec/batch
Epoch: 13/50... Training Step: 763... Training loss: 4.7467... 1.3723 sec/batch
Epoch: 13/50... Training Step: 764... Training loss: 4.7129... 1.3760 sec/batch
Epoch: 13/50... Training Step: 765... Training loss: 4.8026... 1.3981 sec/batch
Epoch: 13/50... Training Step: 766... Training loss: 4.7893... 1.3433 sec/batch
Epoch: 13/50... Training Step: 767... Training loss: 4.8252... 1.3114 sec/batch
Epoch: 13/50... Training Step: 768... Training loss: 4.7573... 1.3733 sec/batch
Epoch: 13/50... Training Step: 769... Training loss: 4.8466... 1.4033 sec/batch
Epoch: 13/50... Training Step: 770... Training loss: 4.8259... 1.3491 sec/batch
Epoch: 13/50... Training Step: 771... Training loss: 4.9397... 1.3710 sec/batch
Epoch: 13/50... Training Step: 772... Training loss: 4.7714... 1.3856 sec/batch
Epoch: 13/50... Training Step: 773... Training loss: 4.7196... 1.3865 sec/batch
Epoch: 13/50... Training Step: 774... Training loss: 4.7575... 1.3553 sec/batch
Epoch: 13/50... Training Step: 775... Training loss: 4.8130... 1.3757 sec/batch
Epoch: 13/50... Training Step: 776... Training loss: 4.7168... 1.3869 sec/batch
Epoch: 13/50... Training Step: 777... Training loss: 4.7314... 1.3640 sec/batch
Epoch: 13/50... Training Step: 778... Training loss: 4.7527... 1.4135 sec/batch
Epoch: 13/50... Training Step: 779... Training loss: 4.7732... 1.3562 sec/batch
Epoch: 13/50... Training Step: 780... Training loss: 4.7877... 1.3803 sec/batch
Epoch: 13/50... Training Step: 781... Training loss: 4.8285... 1.3785 sec/batch
Epoch: 13/50... Training Step: 782... Training loss: 4.7503... 1.3232 sec/batch
Epoch: 13/50... Training Step: 783... Training loss: 4.7165... 1.3968 sec/batch
Epoch: 13/50... Training Step: 784... Training loss: 4.8337... 1.3886 sec/batch
Epoch: 13/50... Training Step: 785... Training loss: 4.7139... 1.3847 sec/batch
Epoch: 13/50... Training Step: 786... Training loss: 4.7783... 1.3199 sec/batch
Epoch: 13/50... Training Step: 787... Training loss: 4.6988... 1.4197 sec/batch
Epoch: 13/50... Training Step: 788... Training loss: 4.6736... 1.3527 sec/batch
Epoch: 13/50... Training Step: 789... Training loss: 4.7981... 1.3949 sec/batch
Epoch: 13/50... Training Step: 790... Training loss: 4.7070... 1.3224 sec/batch
Epoch: 13/50... Training Step: 791... Training loss: 4.7526... 1.4045 sec/batch
Epoch: 13/50... Training Step: 792... Training loss: 4.7641... 1.4107 sec/batch
Epoch: 13/50... Training Step: 793... Training loss: 4.7641... 1.3693 sec/batch
Epoch: 14/50... Training Step: 794... Training loss: 4.9625... 1.3866 sec/batch
Epoch: 14/50... Training Step: 795... Training loss: 4.7415... 1.3545 sec/batch
Epoch: 14/50... Training Step: 796... Training loss: 4.7458... 1.3906 sec/batch
Epoch: 14/50... Training Step: 797... Training loss: 4.8090... 1.3889 sec/batch
Epoch: 14/50... Training Step: 798... Training loss: 4.7874... 1.3328 sec/batch
Epoch: 14/50... Training Step: 799... Training loss: 4.8497... 1.3727 sec/batch
Epoch: 14/50... Training Step: 800... Training loss: 4.8432... 1.3844 sec/batch
Epoch: 14/50... Training Step: 801... Training loss: 4.8532... 1.3836 sec/batch
Epoch: 14/50... Training Step: 802... Training loss: 4.7797... 1.3714 sec/batch
Epoch: 14/50... Training Step: 803... Training loss: 4.8192... 1.3469 sec/batch
Epoch: 14/50... Training Step: 804... Training loss: 4.9121... 1.3842 sec/batch
Epoch: 14/50... Training Step: 805... Training loss: 4.7692... 1.3616 sec/batch
Epoch: 14/50... Training Step: 806... Training loss: 4.8116... 1.3564 sec/batch
Epoch: 14/50... Training Step: 807... Training loss: 4.8147... 1.3787 sec/batch
Epoch: 14/50... Training Step: 808... Training loss: 4.7222... 1.3835 sec/batch
Epoch: 14/50... Training Step: 809... Training loss: 4.7539... 1.3544 sec/batch
Epoch: 14/50... Training Step: 810... Training loss: 4.8084... 1.3785 sec/batch
Epoch: 14/50... Training Step: 811... Training loss: 4.8036... 1.3471 sec/batch
Epoch: 14/50... Training Step: 812... Training loss: 4.8051... 1.3724 sec/batch
Epoch: 14/50... Training Step: 813... Training loss: 4.8139... 1.3657 sec/batch
Epoch: 14/50... Training Step: 814... Training loss: 4.8538... 1.3790 sec/batch
Epoch: 14/50... Training Step: 815... Training loss: 4.6997... 1.3350 sec/batch
Epoch: 14/50... Training Step: 816... Training loss: 4.7644... 1.4056 sec/batch
Epoch: 14/50... Training Step: 817... Training loss: 4.6742... 1.3872 sec/batch
Epoch: 14/50... Training Step: 818... Training loss: 4.7296... 1.3994 sec/batch
Epoch: 14/50... Training Step: 819... Training loss: 4.6679... 1.3584 sec/batch
Epoch: 14/50... Training Step: 820... Training loss: 4.7753... 1.3560 sec/batch
Epoch: 14/50... Training Step: 821... Training loss: 4.7736... 1.3037 sec/batch
Epoch: 14/50... Training Step: 822... Training loss: 4.6635... 1.3890 sec/batch
Epoch: 14/50... Training Step: 823... Training loss: 4.7618... 1.2474 sec/batch
Epoch: 14/50... Training Step: 824... Training loss: 4.6742... 1.3810 sec/batch
Epoch: 14/50... Training Step: 825... Training loss: 4.6532... 1.3835 sec/batch
Epoch: 14/50... Training Step: 826... Training loss: 4.7323... 1.3847 sec/batch
Epoch: 14/50... Training Step: 827... Training loss: 4.7175... 1.3694 sec/batch
Epoch: 14/50... Training Step: 828... Training loss: 4.7541... 1.3720 sec/batch
Epoch: 14/50... Training Step: 829... Training loss: 4.6984... 1.3629 sec/batch
Epoch: 14/50... Training Step: 830... Training loss: 4.7791... 1.3770 sec/batch
Epoch: 14/50... Training Step: 831... Training loss: 4.7553... 1.3468 sec/batch
Epoch: 14/50... Training Step: 832... Training loss: 4.8569... 1.3162 sec/batch
Epoch: 14/50... Training Step: 833... Training loss: 4.6916... 1.3863 sec/batch
Epoch: 14/50... Training Step: 834... Training loss: 4.6626... 1.3775 sec/batch
Epoch: 14/50... Training Step: 835... Training loss: 4.6880... 1.3596 sec/batch
Epoch: 14/50... Training Step: 836... Training loss: 4.7521... 1.3699 sec/batch
Epoch: 14/50... Training Step: 837... Training loss: 4.6581... 1.3805 sec/batch
Epoch: 14/50... Training Step: 838... Training loss: 4.6631... 1.3537 sec/batch
Epoch: 14/50... Training Step: 839... Training loss: 4.6874... 1.3943 sec/batch
Epoch: 14/50... Training Step: 840... Training loss: 4.7051... 1.3230 sec/batch
Epoch: 14/50... Training Step: 841... Training loss: 4.7089... 1.3947 sec/batch
Epoch: 14/50... Training Step: 842... Training loss: 4.7503... 1.3697 sec/batch
Epoch: 14/50... Training Step: 843... Training loss: 4.6799... 1.4008 sec/batch
Epoch: 14/50... Training Step: 844... Training loss: 4.6480... 1.3883 sec/batch
Epoch: 14/50... Training Step: 845... Training loss: 4.7683... 1.3885 sec/batch
Epoch: 14/50... Training Step: 846... Training loss: 4.6419... 1.3691 sec/batch
Epoch: 14/50... Training Step: 847... Training loss: 4.7269... 1.3864 sec/batch
Epoch: 14/50... Training Step: 848... Training loss: 4.6176... 1.3927 sec/batch
Epoch: 14/50... Training Step: 849... Training loss: 4.6001... 1.3138 sec/batch
Epoch: 14/50... Training Step: 850... Training loss: 4.7262... 1.3711 sec/batch
Epoch: 14/50... Training Step: 851... Training loss: 4.6317... 1.3661 sec/batch
Epoch: 14/50... Training Step: 852... Training loss: 4.6667... 1.3661 sec/batch
Epoch: 14/50... Training Step: 853... Training loss: 4.6894... 1.3851 sec/batch
Epoch: 14/50... Training Step: 854... Training loss: 4.6754... 1.3526 sec/batch
Epoch: 15/50... Training Step: 855... Training loss: 4.8766... 1.3920 sec/batch
Epoch: 15/50... Training Step: 856... Training loss: 4.6683... 1.3689 sec/batch
Epoch: 15/50... Training Step: 857... Training loss: 4.6697... 1.3403 sec/batch
Epoch: 15/50... Training Step: 858... Training loss: 4.7304... 1.3822 sec/batch
Epoch: 15/50... Training Step: 859... Training loss: 4.7174... 1.3734 sec/batch
Epoch: 15/50... Training Step: 860... Training loss: 4.7859... 1.3729 sec/batch
Epoch: 15/50... Training Step: 861... Training loss: 4.7732... 1.3405 sec/batch
Epoch: 15/50... Training Step: 862... Training loss: 4.7870... 1.3879 sec/batch
Epoch: 15/50... Training Step: 863... Training loss: 4.7168... 1.3122 sec/batch
Epoch: 15/50... Training Step: 864... Training loss: 4.7413... 1.3437 sec/batch
Epoch: 15/50... Training Step: 865... Training loss: 4.8455... 1.3885 sec/batch
Epoch: 15/50... Training Step: 866... Training loss: 4.7040... 1.4070 sec/batch
Epoch: 15/50... Training Step: 867... Training loss: 4.7473... 1.3416 sec/batch
Epoch: 15/50... Training Step: 868... Training loss: 4.7535... 1.3626 sec/batch
Epoch: 15/50... Training Step: 869... Training loss: 4.6563... 1.3552 sec/batch
Epoch: 15/50... Training Step: 870... Training loss: 4.6941... 1.4030 sec/batch
Epoch: 15/50... Training Step: 871... Training loss: 4.7511... 1.3509 sec/batch
Epoch: 15/50... Training Step: 872... Training loss: 4.7450... 1.3131 sec/batch
Epoch: 15/50... Training Step: 873... Training loss: 4.7421... 1.3904 sec/batch
Epoch: 15/50... Training Step: 874... Training loss: 4.7563... 1.3595 sec/batch
Epoch: 15/50... Training Step: 875... Training loss: 4.7930... 1.3718 sec/batch
Epoch: 15/50... Training Step: 876... Training loss: 4.6482... 1.3849 sec/batch
Epoch: 15/50... Training Step: 877... Training loss: 4.7079... 1.3729 sec/batch
Epoch: 15/50... Training Step: 878... Training loss: 4.6214... 1.3493 sec/batch
Epoch: 15/50... Training Step: 879... Training loss: 4.6655... 1.3848 sec/batch
Epoch: 15/50... Training Step: 880... Training loss: 4.6161... 1.3695 sec/batch
Epoch: 15/50... Training Step: 881... Training loss: 4.7101... 1.3913 sec/batch
Epoch: 15/50... Training Step: 882... Training loss: 4.7107... 1.3878 sec/batch
Epoch: 15/50... Training Step: 883... Training loss: 4.6186... 1.3440 sec/batch
Epoch: 15/50... Training Step: 884... Training loss: 4.6961... 1.4071 sec/batch
Epoch: 15/50... Training Step: 885... Training loss: 4.6097... 1.3732 sec/batch
Epoch: 15/50... Training Step: 886... Training loss: 4.5902... 1.3855 sec/batch
Epoch: 15/50... Training Step: 887... Training loss: 4.6670... 1.3700 sec/batch
Epoch: 15/50... Training Step: 888... Training loss: 4.6718... 1.4147 sec/batch
Epoch: 15/50... Training Step: 889... Training loss: 4.7017... 1.3888 sec/batch
Epoch: 15/50... Training Step: 890... Training loss: 4.6324... 1.3284 sec/batch
Epoch: 15/50... Training Step: 891... Training loss: 4.7317... 1.3852 sec/batch
Epoch: 15/50... Training Step: 892... Training loss: 4.6985... 1.3242 sec/batch
Epoch: 15/50... Training Step: 893... Training loss: 4.8142... 1.4053 sec/batch
Epoch: 15/50... Training Step: 894... Training loss: 4.6282... 1.3741 sec/batch
Epoch: 15/50... Training Step: 895... Training loss: 4.6141... 1.3917 sec/batch
Epoch: 15/50... Training Step: 896... Training loss: 4.6355... 1.3613 sec/batch
Epoch: 15/50... Training Step: 897... Training loss: 4.6985... 1.4398 sec/batch
Epoch: 15/50... Training Step: 898... Training loss: 4.5868... 1.3744 sec/batch
Epoch: 15/50... Training Step: 899... Training loss: 4.6078... 1.3064 sec/batch
Epoch: 15/50... Training Step: 900... Training loss: 4.6298... 1.3701 sec/batch
Epoch: 15/50... Training Step: 901... Training loss: 4.6474... 1.3200 sec/batch
Epoch: 15/50... Training Step: 902... Training loss: 4.6567... 1.3862 sec/batch
Epoch: 15/50... Training Step: 903... Training loss: 4.7019... 1.4029 sec/batch
Epoch: 15/50... Training Step: 904... Training loss: 4.6358... 1.3669 sec/batch
Epoch: 15/50... Training Step: 905... Training loss: 4.5847... 1.3732 sec/batch
Epoch: 15/50... Training Step: 906... Training loss: 4.7047... 1.3951 sec/batch
Epoch: 15/50... Training Step: 907... Training loss: 4.5861... 1.3679 sec/batch
Epoch: 15/50... Training Step: 908... Training loss: 4.6676... 1.3727 sec/batch
Epoch: 15/50... Training Step: 909... Training loss: 4.5770... 1.2881 sec/batch
Epoch: 15/50... Training Step: 910... Training loss: 4.5505... 1.3908 sec/batch
Epoch: 15/50... Training Step: 911... Training loss: 4.6826... 1.3388 sec/batch
Epoch: 15/50... Training Step: 912... Training loss: 4.5849... 1.3998 sec/batch
Epoch: 15/50... Training Step: 913... Training loss: 4.6362... 1.3856 sec/batch
Epoch: 15/50... Training Step: 914... Training loss: 4.6458... 1.3716 sec/batch
Epoch: 15/50... Training Step: 915... Training loss: 4.6287... 1.3693 sec/batch
Epoch: 16/50... Training Step: 916... Training loss: 4.8308... 1.3910 sec/batch
Epoch: 16/50... Training Step: 917... Training loss: 4.6131... 1.3754 sec/batch
Epoch: 16/50... Training Step: 918... Training loss: 4.6222... 1.3668 sec/batch
Epoch: 16/50... Training Step: 919... Training loss: 4.6920... 1.3735 sec/batch
Epoch: 16/50... Training Step: 920... Training loss: 4.6612... 1.3987 sec/batch
Epoch: 16/50... Training Step: 921... Training loss: 4.7351... 1.3809 sec/batch
Epoch: 16/50... Training Step: 922... Training loss: 4.7261... 1.3816 sec/batch
Epoch: 16/50... Training Step: 923... Training loss: 4.7307... 1.3515 sec/batch
Epoch: 16/50... Training Step: 924... Training loss: 4.6578... 1.4080 sec/batch
Epoch: 16/50... Training Step: 925... Training loss: 4.6985... 1.3776 sec/batch
Epoch: 16/50... Training Step: 926... Training loss: 4.8031... 1.3393 sec/batch
Epoch: 16/50... Training Step: 927... Training loss: 4.6373... 1.3559 sec/batch
Epoch: 16/50... Training Step: 928... Training loss: 4.6903... 1.4106 sec/batch
Epoch: 16/50... Training Step: 929... Training loss: 4.7014... 1.3845 sec/batch
Epoch: 16/50... Training Step: 930... Training loss: 4.6008... 1.3929 sec/batch
Epoch: 16/50... Training Step: 931... Training loss: 4.6402... 1.3938 sec/batch
Epoch: 16/50... Training Step: 932... Training loss: 4.6994... 1.4035 sec/batch
Epoch: 16/50... Training Step: 933... Training loss: 4.6937... 1.3258 sec/batch
Epoch: 16/50... Training Step: 934... Training loss: 4.6761... 1.3466 sec/batch
Epoch: 16/50... Training Step: 935... Training loss: 4.6978... 1.3647 sec/batch
Epoch: 16/50... Training Step: 936... Training loss: 4.7337... 1.3848 sec/batch
Epoch: 16/50... Training Step: 937... Training loss: 4.5858... 1.3600 sec/batch
Epoch: 16/50... Training Step: 938... Training loss: 4.6439... 1.4067 sec/batch
Epoch: 16/50... Training Step: 939... Training loss: 4.5629... 1.4044 sec/batch
Epoch: 16/50... Training Step: 940... Training loss: 4.6052... 1.3874 sec/batch
Epoch: 16/50... Training Step: 941... Training loss: 4.5633... 1.3727 sec/batch
Epoch: 16/50... Training Step: 942... Training loss: 4.6581... 1.3985 sec/batch
Epoch: 16/50... Training Step: 943... Training loss: 4.6638... 1.3795 sec/batch
Epoch: 16/50... Training Step: 944... Training loss: 4.5572... 1.3753 sec/batch
Epoch: 16/50... Training Step: 945... Training loss: 4.6323... 1.3465 sec/batch
Epoch: 16/50... Training Step: 946... Training loss: 4.5498... 1.3629 sec/batch
Epoch: 16/50... Training Step: 947... Training loss: 4.5377... 1.3628 sec/batch
Epoch: 16/50... Training Step: 948... Training loss: 4.6171... 1.3670 sec/batch
Epoch: 16/50... Training Step: 949... Training loss: 4.5990... 1.3787 sec/batch
Epoch: 16/50... Training Step: 950... Training loss: 4.6449... 1.3817 sec/batch
Epoch: 16/50... Training Step: 951... Training loss: 4.5887... 1.3821 sec/batch
Epoch: 16/50... Training Step: 952... Training loss: 4.6670... 1.3806 sec/batch
Epoch: 16/50... Training Step: 953... Training loss: 4.6298... 1.3887 sec/batch
Epoch: 16/50... Training Step: 954... Training loss: 4.7431... 1.3779 sec/batch
Epoch: 16/50... Training Step: 955... Training loss: 4.5681... 1.3795 sec/batch
Epoch: 16/50... Training Step: 956... Training loss: 4.5577... 1.4165 sec/batch
Epoch: 16/50... Training Step: 957... Training loss: 4.5757... 1.4178 sec/batch
Epoch: 16/50... Training Step: 958... Training loss: 4.6326... 1.3521 sec/batch
Epoch: 16/50... Training Step: 959... Training loss: 4.5327... 1.3939 sec/batch
Epoch: 16/50... Training Step: 960... Training loss: 4.5480... 1.3317 sec/batch
Epoch: 16/50... Training Step: 961... Training loss: 4.5750... 1.3778 sec/batch
Epoch: 16/50... Training Step: 962... Training loss: 4.5908... 1.3459 sec/batch
Epoch: 16/50... Training Step: 963... Training loss: 4.6002... 1.3598 sec/batch
Epoch: 16/50... Training Step: 964... Training loss: 4.6480... 1.3848 sec/batch
Epoch: 16/50... Training Step: 965... Training loss: 4.5616... 1.3330 sec/batch
Epoch: 16/50... Training Step: 966... Training loss: 4.5300... 1.3586 sec/batch
Epoch: 16/50... Training Step: 967... Training loss: 4.6372... 1.3734 sec/batch
Epoch: 16/50... Training Step: 968... Training loss: 4.5211... 1.3913 sec/batch
Epoch: 16/50... Training Step: 969... Training loss: 4.5985... 1.3814 sec/batch
Epoch: 16/50... Training Step: 970... Training loss: 4.5317... 1.3473 sec/batch
Epoch: 16/50... Training Step: 971... Training loss: 4.4962... 1.3846 sec/batch
Epoch: 16/50... Training Step: 972... Training loss: 4.6235... 1.3942 sec/batch
Epoch: 16/50... Training Step: 973... Training loss: 4.5314... 1.3532 sec/batch
Epoch: 16/50... Training Step: 974... Training loss: 4.5727... 1.4049 sec/batch
Epoch: 16/50... Training Step: 975... Training loss: 4.5908... 1.3904 sec/batch
Epoch: 16/50... Training Step: 976... Training loss: 4.5711... 1.3539 sec/batch
Epoch: 17/50... Training Step: 977... Training loss: 4.7609... 1.3543 sec/batch
Epoch: 17/50... Training Step: 978... Training loss: 4.5768... 1.4088 sec/batch
Epoch: 17/50... Training Step: 979... Training loss: 4.5635... 1.3836 sec/batch
Epoch: 17/50... Training Step: 980... Training loss: 4.6361... 1.3724 sec/batch
Epoch: 17/50... Training Step: 981... Training loss: 4.6088... 1.3144 sec/batch
Epoch: 17/50... Training Step: 982... Training loss: 4.6852... 1.3537 sec/batch
Epoch: 17/50... Training Step: 983... Training loss: 4.6629... 1.3369 sec/batch
Epoch: 17/50... Training Step: 984... Training loss: 4.6736... 1.3397 sec/batch
Epoch: 17/50... Training Step: 985... Training loss: 4.6092... 1.3799 sec/batch
Epoch: 17/50... Training Step: 986... Training loss: 4.6392... 1.3987 sec/batch
Epoch: 17/50... Training Step: 987... Training loss: 4.7529... 1.3894 sec/batch
Epoch: 17/50... Training Step: 988... Training loss: 4.5920... 1.3666 sec/batch
Epoch: 17/50... Training Step: 989... Training loss: 4.6492... 1.3095 sec/batch
Epoch: 17/50... Training Step: 990... Training loss: 4.6488... 1.3499 sec/batch
Epoch: 17/50... Training Step: 991... Training loss: 4.5643... 1.3672 sec/batch
Epoch: 17/50... Training Step: 992... Training loss: 4.5911... 1.3664 sec/batch
Epoch: 17/50... Training Step: 993... Training loss: 4.6456... 1.3856 sec/batch
Epoch: 17/50... Training Step: 994... Training loss: 4.6406... 1.3697 sec/batch
Epoch: 17/50... Training Step: 995... Training loss: 4.6344... 1.3794 sec/batch
Epoch: 17/50... Training Step: 996... Training loss: 4.6599... 1.3895 sec/batch
Epoch: 17/50... Training Step: 997... Training loss: 4.6834... 1.3633 sec/batch
Epoch: 17/50... Training Step: 998... Training loss: 4.5414... 1.3711 sec/batch
Epoch: 17/50... Training Step: 999... Training loss: 4.5834... 1.3909 sec/batch
Epoch: 17/50... Training Step: 1000... Training loss: 4.5178... 1.2514 sec/batch
Epoch: 17/50... Training Step: 1001... Training loss: 4.5560... 1.3685 sec/batch
Epoch: 17/50... Training Step: 1002... Training loss: 4.5157... 1.3426 sec/batch
Epoch: 17/50... Training Step: 1003... Training loss: 4.5955... 1.3795 sec/batch
Epoch: 17/50... Training Step: 1004... Training loss: 4.6231... 1.3902 sec/batch
Epoch: 17/50... Training Step: 1005... Training loss: 4.5030... 1.3812 sec/batch
Epoch: 17/50... Training Step: 1006... Training loss: 4.5878... 1.3910 sec/batch
Epoch: 17/50... Training Step: 1007... Training loss: 4.4987... 1.3330 sec/batch
Epoch: 17/50... Training Step: 1008... Training loss: 4.4865... 1.3892 sec/batch
Epoch: 17/50... Training Step: 1009... Training loss: 4.5691... 1.3931 sec/batch
Epoch: 17/50... Training Step: 1010... Training loss: 4.5585... 1.3867 sec/batch
Epoch: 17/50... Training Step: 1011... Training loss: 4.5893... 1.3684 sec/batch
Epoch: 17/50... Training Step: 1012... Training loss: 4.5254... 1.3795 sec/batch
Epoch: 17/50... Training Step: 1013... Training loss: 4.6272... 1.3832 sec/batch
Epoch: 17/50... Training Step: 1014... Training loss: 4.5892... 1.3354 sec/batch
Epoch: 17/50... Training Step: 1015... Training loss: 4.7001... 1.3715 sec/batch
Epoch: 17/50... Training Step: 1016... Training loss: 4.5190... 1.3390 sec/batch
Epoch: 17/50... Training Step: 1017... Training loss: 4.5120... 1.3824 sec/batch
Epoch: 17/50... Training Step: 1018... Training loss: 4.5329... 1.4053 sec/batch
Epoch: 17/50... Training Step: 1019... Training loss: 4.5812... 1.3145 sec/batch
Epoch: 17/50... Training Step: 1020... Training loss: 4.4944... 1.3737 sec/batch
Epoch: 17/50... Training Step: 1021... Training loss: 4.5002... 1.3991 sec/batch
Epoch: 17/50... Training Step: 1022... Training loss: 4.5336... 1.3708 sec/batch
Epoch: 17/50... Training Step: 1023... Training loss: 4.5484... 1.3730 sec/batch
Epoch: 17/50... Training Step: 1024... Training loss: 4.5504... 1.2814 sec/batch
Epoch: 17/50... Training Step: 1025... Training loss: 4.5947... 1.3910 sec/batch
Epoch: 17/50... Training Step: 1026... Training loss: 4.5208... 1.3873 sec/batch
Epoch: 17/50... Training Step: 1027... Training loss: 4.4842... 1.3705 sec/batch
Epoch: 17/50... Training Step: 1028... Training loss: 4.5997... 1.3989 sec/batch
Epoch: 17/50... Training Step: 1029... Training loss: 4.4745... 1.3950 sec/batch
Epoch: 17/50... Training Step: 1030... Training loss: 4.5518... 1.4138 sec/batch
Epoch: 17/50... Training Step: 1031... Training loss: 4.4640... 1.3692 sec/batch
Epoch: 17/50... Training Step: 1032... Training loss: 4.4440... 1.4001 sec/batch
Epoch: 17/50... Training Step: 1033... Training loss: 4.5761... 1.3695 sec/batch
Epoch: 17/50... Training Step: 1034... Training loss: 4.4810... 1.3798 sec/batch
Epoch: 17/50... Training Step: 1035... Training loss: 4.5125... 1.4364 sec/batch
Epoch: 17/50... Training Step: 1036... Training loss: 4.5381... 1.3297 sec/batch
Epoch: 17/50... Training Step: 1037... Training loss: 4.5246... 1.3795 sec/batch
Epoch: 18/50... Training Step: 1038... Training loss: 4.7109... 1.3889 sec/batch
Epoch: 18/50... Training Step: 1039... Training loss: 4.5157... 1.3907 sec/batch
Epoch: 18/50... Training Step: 1040... Training loss: 4.5187... 1.3813 sec/batch
Epoch: 18/50... Training Step: 1041... Training loss: 4.5682... 1.3778 sec/batch
Epoch: 18/50... Training Step: 1042... Training loss: 4.5623... 1.3852 sec/batch
Epoch: 18/50... Training Step: 1043... Training loss: 4.6343... 1.3323 sec/batch
Epoch: 18/50... Training Step: 1044... Training loss: 4.6181... 1.3904 sec/batch
Epoch: 18/50... Training Step: 1045... Training loss: 4.6199... 1.3671 sec/batch
Epoch: 18/50... Training Step: 1046... Training loss: 4.5603... 1.3867 sec/batch
Epoch: 18/50... Training Step: 1047... Training loss: 4.5969... 1.3701 sec/batch
Epoch: 18/50... Training Step: 1048... Training loss: 4.7036... 1.3285 sec/batch
Epoch: 18/50... Training Step: 1049... Training loss: 4.5311... 1.3894 sec/batch
Epoch: 18/50... Training Step: 1050... Training loss: 4.6001... 1.3629 sec/batch
Epoch: 18/50... Training Step: 1051... Training loss: 4.6072... 1.3468 sec/batch
Epoch: 18/50... Training Step: 1052... Training loss: 4.5091... 1.3650 sec/batch
Epoch: 18/50... Training Step: 1053... Training loss: 4.5474... 1.3635 sec/batch
Epoch: 18/50... Training Step: 1054... Training loss: 4.5937... 1.3991 sec/batch
Epoch: 18/50... Training Step: 1055... Training loss: 4.5992... 1.3741 sec/batch
Epoch: 18/50... Training Step: 1056... Training loss: 4.5937... 1.3725 sec/batch
Epoch: 18/50... Training Step: 1057... Training loss: 4.5927... 1.3844 sec/batch
Epoch: 18/50... Training Step: 1058... Training loss: 4.6431... 1.3782 sec/batch
Epoch: 18/50... Training Step: 1059... Training loss: 4.4909... 1.3445 sec/batch
Epoch: 18/50... Training Step: 1060... Training loss: 4.5349... 1.3714 sec/batch
Epoch: 18/50... Training Step: 1061... Training loss: 4.4626... 1.3925 sec/batch
Epoch: 18/50... Training Step: 1062... Training loss: 4.5150... 1.3901 sec/batch
Epoch: 18/50... Training Step: 1063... Training loss: 4.4571... 1.3247 sec/batch
Epoch: 18/50... Training Step: 1064... Training loss: 4.5536... 1.3770 sec/batch
Epoch: 18/50... Training Step: 1065... Training loss: 4.5698... 1.3700 sec/batch
Epoch: 18/50... Training Step: 1066... Training loss: 4.4587... 1.2995 sec/batch
Epoch: 18/50... Training Step: 1067... Training loss: 4.5362... 1.3742 sec/batch
Epoch: 18/50... Training Step: 1068... Training loss: 4.4429... 1.3898 sec/batch
Epoch: 18/50... Training Step: 1069... Training loss: 4.4386... 1.3807 sec/batch
Epoch: 18/50... Training Step: 1070... Training loss: 4.5246... 1.3632 sec/batch
Epoch: 18/50... Training Step: 1071... Training loss: 4.5096... 1.3689 sec/batch
Epoch: 18/50... Training Step: 1072... Training loss: 4.5531... 1.3874 sec/batch
Epoch: 18/50... Training Step: 1073... Training loss: 4.4821... 1.3814 sec/batch
Epoch: 18/50... Training Step: 1074... Training loss: 4.5764... 1.3593 sec/batch
Epoch: 18/50... Training Step: 1075... Training loss: 4.5401... 1.3787 sec/batch
Epoch: 18/50... Training Step: 1076... Training loss: 4.6541... 1.3527 sec/batch
Epoch: 18/50... Training Step: 1077... Training loss: 4.4848... 1.3254 sec/batch
Epoch: 18/50... Training Step: 1078... Training loss: 4.4744... 1.3728 sec/batch
Epoch: 18/50... Training Step: 1079... Training loss: 4.4885... 1.3866 sec/batch
Epoch: 18/50... Training Step: 1080... Training loss: 4.5445... 1.3923 sec/batch
Epoch: 18/50... Training Step: 1081... Training loss: 4.4375... 1.3901 sec/batch
Epoch: 18/50... Training Step: 1082... Training loss: 4.4611... 1.3789 sec/batch
Epoch: 18/50... Training Step: 1083... Training loss: 4.4742... 1.3999 sec/batch
Epoch: 18/50... Training Step: 1084... Training loss: 4.5054... 1.3936 sec/batch
Epoch: 18/50... Training Step: 1085... Training loss: 4.4922... 1.3494 sec/batch
Epoch: 18/50... Training Step: 1086... Training loss: 4.5508... 1.3461 sec/batch
Epoch: 18/50... Training Step: 1087... Training loss: 4.4728... 1.3907 sec/batch
Epoch: 18/50... Training Step: 1088... Training loss: 4.4197... 1.3136 sec/batch
Epoch: 18/50... Training Step: 1089... Training loss: 4.5588... 1.3133 sec/batch
Epoch: 18/50... Training Step: 1090... Training loss: 4.4230... 1.3876 sec/batch
Epoch: 18/50... Training Step: 1091... Training loss: 4.4961... 1.3791 sec/batch
Epoch: 18/50... Training Step: 1092... Training loss: 4.4209... 1.3559 sec/batch
Epoch: 18/50... Training Step: 1093... Training loss: 4.3967... 1.3605 sec/batch
Epoch: 18/50... Training Step: 1094... Training loss: 4.5172... 1.3878 sec/batch
Epoch: 18/50... Training Step: 1095... Training loss: 4.4576... 1.3216 sec/batch
Epoch: 18/50... Training Step: 1096... Training loss: 4.4720... 1.3571 sec/batch
Epoch: 18/50... Training Step: 1097... Training loss: 4.4997... 1.2499 sec/batch
Epoch: 18/50... Training Step: 1098... Training loss: 4.4692... 1.3943 sec/batch
Epoch: 19/50... Training Step: 1099... Training loss: 4.6502... 1.3811 sec/batch
Epoch: 19/50... Training Step: 1100... Training loss: 4.4660... 1.3430 sec/batch
Epoch: 19/50... Training Step: 1101... Training loss: 4.4620... 1.3836 sec/batch
Epoch: 19/50... Training Step: 1102... Training loss: 4.5204... 1.3952 sec/batch
Epoch: 19/50... Training Step: 1103... Training loss: 4.5173... 1.3805 sec/batch
Epoch: 19/50... Training Step: 1104... Training loss: 4.5877... 1.3145 sec/batch
Epoch: 19/50... Training Step: 1105... Training loss: 4.5625... 1.3858 sec/batch
Epoch: 19/50... Training Step: 1106... Training loss: 4.5656... 1.3099 sec/batch
Epoch: 19/50... Training Step: 1107... Training loss: 4.5199... 1.3368 sec/batch
Epoch: 19/50... Training Step: 1108... Training loss: 4.5475... 1.3651 sec/batch
Epoch: 19/50... Training Step: 1109... Training loss: 4.6506... 1.4011 sec/batch
Epoch: 19/50... Training Step: 1110... Training loss: 4.4848... 1.3729 sec/batch
Epoch: 19/50... Training Step: 1111... Training loss: 4.5475... 1.3845 sec/batch
Epoch: 19/50... Training Step: 1112... Training loss: 4.5591... 1.3554 sec/batch
Epoch: 19/50... Training Step: 1113... Training loss: 4.4763... 1.3570 sec/batch
Epoch: 19/50... Training Step: 1114... Training loss: 4.5038... 1.3788 sec/batch
Epoch: 19/50... Training Step: 1115... Training loss: 4.5418... 1.3763 sec/batch
Epoch: 19/50... Training Step: 1116... Training loss: 4.5543... 1.4204 sec/batch
Epoch: 19/50... Training Step: 1117... Training loss: 4.5408... 1.3229 sec/batch
Epoch: 19/50... Training Step: 1118... Training loss: 4.5571... 1.3706 sec/batch
Epoch: 19/50... Training Step: 1119... Training loss: 4.5949... 1.3742 sec/batch
Epoch: 19/50... Training Step: 1120... Training loss: 4.4502... 1.4006 sec/batch
Epoch: 19/50... Training Step: 1121... Training loss: 4.4902... 1.3744 sec/batch
Epoch: 19/50... Training Step: 1122... Training loss: 4.4175... 1.3836 sec/batch
Epoch: 19/50... Training Step: 1123... Training loss: 4.4696... 1.3641 sec/batch
Epoch: 19/50... Training Step: 1124... Training loss: 4.4092... 1.3839 sec/batch
Epoch: 19/50... Training Step: 1125... Training loss: 4.5183... 1.4139 sec/batch
Epoch: 19/50... Training Step: 1126... Training loss: 4.5180... 1.3685 sec/batch
Epoch: 19/50... Training Step: 1127... Training loss: 4.4168... 1.3848 sec/batch
Epoch: 19/50... Training Step: 1128... Training loss: 4.5006... 1.3456 sec/batch
Epoch: 19/50... Training Step: 1129... Training loss: 4.4183... 1.3574 sec/batch
Epoch: 19/50... Training Step: 1130... Training loss: 4.3991... 1.3847 sec/batch
Epoch: 19/50... Training Step: 1131... Training loss: 4.4844... 1.2653 sec/batch
Epoch: 19/50... Training Step: 1132... Training loss: 4.4577... 1.3860 sec/batch
Epoch: 19/50... Training Step: 1133... Training loss: 4.5106... 1.3910 sec/batch
Epoch: 19/50... Training Step: 1134... Training loss: 4.4469... 1.4120 sec/batch
Epoch: 19/50... Training Step: 1135... Training loss: 4.5537... 1.3552 sec/batch
Epoch: 19/50... Training Step: 1136... Training loss: 4.5121... 1.3713 sec/batch
Epoch: 19/50... Training Step: 1137... Training loss: 4.6141... 1.3617 sec/batch
Epoch: 19/50... Training Step: 1138... Training loss: 4.4386... 1.4187 sec/batch
Epoch: 19/50... Training Step: 1139... Training loss: 4.4230... 1.3526 sec/batch
Epoch: 19/50... Training Step: 1140... Training loss: 4.4382... 1.3742 sec/batch
Epoch: 19/50... Training Step: 1141... Training loss: 4.4960... 1.3252 sec/batch
Epoch: 19/50... Training Step: 1142... Training loss: 4.3907... 1.2963 sec/batch
Epoch: 19/50... Training Step: 1143... Training loss: 4.4332... 1.4070 sec/batch
Epoch: 19/50... Training Step: 1144... Training loss: 4.4369... 1.2838 sec/batch
Epoch: 19/50... Training Step: 1145... Training loss: 4.4607... 1.3340 sec/batch
Epoch: 19/50... Training Step: 1146... Training loss: 4.4511... 1.3755 sec/batch
Epoch: 19/50... Training Step: 1147... Training loss: 4.5139... 1.3856 sec/batch
Epoch: 19/50... Training Step: 1148... Training loss: 4.4386... 1.3680 sec/batch
Epoch: 19/50... Training Step: 1149... Training loss: 4.3876... 1.4017 sec/batch
Epoch: 19/50... Training Step: 1150... Training loss: 4.5129... 1.3691 sec/batch
Epoch: 19/50... Training Step: 1151... Training loss: 4.3887... 1.3875 sec/batch
Epoch: 19/50... Training Step: 1152... Training loss: 4.4718... 1.3938 sec/batch
Epoch: 19/50... Training Step: 1153... Training loss: 4.3753... 1.3781 sec/batch
Epoch: 19/50... Training Step: 1154... Training loss: 4.3652... 1.3773 sec/batch
Epoch: 19/50... Training Step: 1155... Training loss: 4.5035... 1.3169 sec/batch
Epoch: 19/50... Training Step: 1156... Training loss: 4.4083... 1.4038 sec/batch
Epoch: 19/50... Training Step: 1157... Training loss: 4.4265... 1.3703 sec/batch
Epoch: 19/50... Training Step: 1158... Training loss: 4.4605... 1.4058 sec/batch
Epoch: 19/50... Training Step: 1159... Training loss: 4.4365... 1.3537 sec/batch
Epoch: 20/50... Training Step: 1160... Training loss: 4.6250... 1.3309 sec/batch
Epoch: 20/50... Training Step: 1161... Training loss: 4.4405... 1.3894 sec/batch
Epoch: 20/50... Training Step: 1162... Training loss: 4.4364... 1.3556 sec/batch
Epoch: 20/50... Training Step: 1163... Training loss: 4.4947... 1.3803 sec/batch
Epoch: 20/50... Training Step: 1164... Training loss: 4.4685... 1.3823 sec/batch
Epoch: 20/50... Training Step: 1165... Training loss: 4.5437... 1.4013 sec/batch
Epoch: 20/50... Training Step: 1166... Training loss: 4.5318... 1.3887 sec/batch
Epoch: 20/50... Training Step: 1167... Training loss: 4.5398... 1.4055 sec/batch
Epoch: 20/50... Training Step: 1168... Training loss: 4.4846... 1.3656 sec/batch
Epoch: 20/50... Training Step: 1169... Training loss: 4.5248... 1.3685 sec/batch
Epoch: 20/50... Training Step: 1170... Training loss: 4.6323... 1.4041 sec/batch
Epoch: 20/50... Training Step: 1171... Training loss: 4.4710... 1.3828 sec/batch
Epoch: 20/50... Training Step: 1172... Training loss: 4.5135... 1.3709 sec/batch
Epoch: 20/50... Training Step: 1173... Training loss: 4.5240... 1.3770 sec/batch
Epoch: 20/50... Training Step: 1174... Training loss: 4.4321... 1.3592 sec/batch
Epoch: 20/50... Training Step: 1175... Training loss: 4.4631... 1.3708 sec/batch
Epoch: 20/50... Training Step: 1176... Training loss: 4.5116... 1.3696 sec/batch
Epoch: 20/50... Training Step: 1177... Training loss: 4.5230... 1.3679 sec/batch
Epoch: 20/50... Training Step: 1178... Training loss: 4.5170... 1.3688 sec/batch
Epoch: 20/50... Training Step: 1179... Training loss: 4.5229... 1.3919 sec/batch
Epoch: 20/50... Training Step: 1180... Training loss: 4.5586... 1.3695 sec/batch
Epoch: 20/50... Training Step: 1181... Training loss: 4.4093... 1.4111 sec/batch
Epoch: 20/50... Training Step: 1182... Training loss: 4.4427... 1.3357 sec/batch
Epoch: 20/50... Training Step: 1183... Training loss: 4.3872... 1.3887 sec/batch
Epoch: 20/50... Training Step: 1184... Training loss: 4.4391... 1.3812 sec/batch
Epoch: 20/50... Training Step: 1185... Training loss: 4.3822... 1.3842 sec/batch
Epoch: 20/50... Training Step: 1186... Training loss: 4.4719... 1.3585 sec/batch
Epoch: 20/50... Training Step: 1187... Training loss: 4.5023... 1.3540 sec/batch
Epoch: 20/50... Training Step: 1188... Training loss: 4.3812... 1.3839 sec/batch
Epoch: 20/50... Training Step: 1189... Training loss: 4.4656... 1.4101 sec/batch
Epoch: 20/50... Training Step: 1190... Training loss: 4.3808... 1.3885 sec/batch
Epoch: 20/50... Training Step: 1191... Training loss: 4.3537... 1.3737 sec/batch
Epoch: 20/50... Training Step: 1192... Training loss: 4.4379... 1.3759 sec/batch
Epoch: 20/50... Training Step: 1193... Training loss: 4.4374... 1.3943 sec/batch
Epoch: 20/50... Training Step: 1194... Training loss: 4.4775... 1.3695 sec/batch
Epoch: 20/50... Training Step: 1195... Training loss: 4.4032... 1.3543 sec/batch
Epoch: 20/50... Training Step: 1196... Training loss: 4.5052... 1.3832 sec/batch
Epoch: 20/50... Training Step: 1197... Training loss: 4.4644... 1.4186 sec/batch
Epoch: 20/50... Training Step: 1198... Training loss: 4.5769... 1.3737 sec/batch
Epoch: 20/50... Training Step: 1199... Training loss: 4.4062... 1.3551 sec/batch
Epoch: 20/50... Training Step: 1200... Training loss: 4.3928... 1.3887 sec/batch
Epoch: 20/50... Training Step: 1201... Training loss: 4.4098... 1.3133 sec/batch
Epoch: 20/50... Training Step: 1202... Training loss: 4.4504... 1.3716 sec/batch
Epoch: 20/50... Training Step: 1203... Training loss: 4.3614... 1.3677 sec/batch
Epoch: 20/50... Training Step: 1204... Training loss: 4.3920... 1.2926 sec/batch
Epoch: 20/50... Training Step: 1205... Training loss: 4.3981... 1.3732 sec/batch
Epoch: 20/50... Training Step: 1206... Training loss: 4.4198... 1.3800 sec/batch
Epoch: 20/50... Training Step: 1207... Training loss: 4.4116... 1.3654 sec/batch
Epoch: 20/50... Training Step: 1208... Training loss: 4.4671... 1.3949 sec/batch
Epoch: 20/50... Training Step: 1209... Training loss: 4.3996... 1.3347 sec/batch
Epoch: 20/50... Training Step: 1210... Training loss: 4.3559... 1.3737 sec/batch
Epoch: 20/50... Training Step: 1211... Training loss: 4.4812... 1.3705 sec/batch
Epoch: 20/50... Training Step: 1212... Training loss: 4.3474... 1.3651 sec/batch
Epoch: 20/50... Training Step: 1213... Training loss: 4.4154... 1.4259 sec/batch
Epoch: 20/50... Training Step: 1214... Training loss: 4.3426... 1.2503 sec/batch
Epoch: 20/50... Training Step: 1215... Training loss: 4.3213... 1.3726 sec/batch
Epoch: 20/50... Training Step: 1216... Training loss: 4.4463... 1.3737 sec/batch
Epoch: 20/50... Training Step: 1217... Training loss: 4.3760... 1.3891 sec/batch
Epoch: 20/50... Training Step: 1218... Training loss: 4.3950... 1.3518 sec/batch
Epoch: 20/50... Training Step: 1219... Training loss: 4.4217... 1.3514 sec/batch
Epoch: 20/50... Training Step: 1220... Training loss: 4.4110... 1.3973 sec/batch
Epoch: 21/50... Training Step: 1221... Training loss: 4.5814... 1.3720 sec/batch
Epoch: 21/50... Training Step: 1222... Training loss: 4.3988... 1.4036 sec/batch
Epoch: 21/50... Training Step: 1223... Training loss: 4.3913... 1.3764 sec/batch
Epoch: 21/50... Training Step: 1224... Training loss: 4.4537... 1.2911 sec/batch
Epoch: 21/50... Training Step: 1225... Training loss: 4.4336... 1.3893 sec/batch
Epoch: 21/50... Training Step: 1226... Training loss: 4.5077... 1.3837 sec/batch
Epoch: 21/50... Training Step: 1227... Training loss: 4.4906... 1.3864 sec/batch
Epoch: 21/50... Training Step: 1228... Training loss: 4.4833... 1.3996 sec/batch
Epoch: 21/50... Training Step: 1229... Training loss: 4.4434... 1.3732 sec/batch
Epoch: 21/50... Training Step: 1230... Training loss: 4.4810... 1.3744 sec/batch
Epoch: 21/50... Training Step: 1231... Training loss: 4.5861... 1.3512 sec/batch
Epoch: 21/50... Training Step: 1232... Training loss: 4.4228... 1.4138 sec/batch
Epoch: 21/50... Training Step: 1233... Training loss: 4.4645... 1.3775 sec/batch
Epoch: 21/50... Training Step: 1234... Training loss: 4.4883... 1.3744 sec/batch
Epoch: 21/50... Training Step: 1235... Training loss: 4.4015... 1.4313 sec/batch
Epoch: 21/50... Training Step: 1236... Training loss: 4.4233... 1.3647 sec/batch
Epoch: 21/50... Training Step: 1237... Training loss: 4.4848... 1.3839 sec/batch
Epoch: 21/50... Training Step: 1238... Training loss: 4.4842... 1.3896 sec/batch
Epoch: 21/50... Training Step: 1239... Training loss: 4.4736... 1.3181 sec/batch
Epoch: 21/50... Training Step: 1240... Training loss: 4.4902... 1.3907 sec/batch
Epoch: 21/50... Training Step: 1241... Training loss: 4.5191... 1.3706 sec/batch
Epoch: 21/50... Training Step: 1242... Training loss: 4.3788... 1.3730 sec/batch
Epoch: 21/50... Training Step: 1243... Training loss: 4.4205... 1.3865 sec/batch
Epoch: 21/50... Training Step: 1244... Training loss: 4.3551... 1.3677 sec/batch
Epoch: 21/50... Training Step: 1245... Training loss: 4.4084... 1.3248 sec/batch
Epoch: 21/50... Training Step: 1246... Training loss: 4.3616... 1.4216 sec/batch
Epoch: 21/50... Training Step: 1247... Training loss: 4.4379... 1.3736 sec/batch
Epoch: 21/50... Training Step: 1248... Training loss: 4.4647... 1.3745 sec/batch
Epoch: 21/50... Training Step: 1249... Training loss: 4.3556... 1.4135 sec/batch
Epoch: 21/50... Training Step: 1250... Training loss: 4.4287... 1.3720 sec/batch
Epoch: 21/50... Training Step: 1251... Training loss: 4.3464... 1.3719 sec/batch
Epoch: 21/50... Training Step: 1252... Training loss: 4.3347... 1.3767 sec/batch
Epoch: 21/50... Training Step: 1253... Training loss: 4.4095... 1.4146 sec/batch
Epoch: 21/50... Training Step: 1254... Training loss: 4.4043... 1.3719 sec/batch
Epoch: 21/50... Training Step: 1255... Training loss: 4.4259... 1.3791 sec/batch
Epoch: 21/50... Training Step: 1256... Training loss: 4.3723... 1.4021 sec/batch
Epoch: 21/50... Training Step: 1257... Training loss: 4.4804... 1.3776 sec/batch
Epoch: 21/50... Training Step: 1258... Training loss: 4.4366... 1.4026 sec/batch
Epoch: 21/50... Training Step: 1259... Training loss: 4.5357... 1.3816 sec/batch
Epoch: 21/50... Training Step: 1260... Training loss: 4.3665... 1.3887 sec/batch
Epoch: 21/50... Training Step: 1261... Training loss: 4.3555... 1.3122 sec/batch
Epoch: 21/50... Training Step: 1262... Training loss: 4.3696... 1.3997 sec/batch
Epoch: 21/50... Training Step: 1263... Training loss: 4.4170... 1.3728 sec/batch
Epoch: 21/50... Training Step: 1264... Training loss: 4.3193... 1.3747 sec/batch
Epoch: 21/50... Training Step: 1265... Training loss: 4.3456... 1.3774 sec/batch
Epoch: 21/50... Training Step: 1266... Training loss: 4.3626... 1.3650 sec/batch
Epoch: 21/50... Training Step: 1267... Training loss: 4.3821... 1.4172 sec/batch
Epoch: 21/50... Training Step: 1268... Training loss: 4.3784... 1.3763 sec/batch
Epoch: 21/50... Training Step: 1269... Training loss: 4.4304... 1.3763 sec/batch
Epoch: 21/50... Training Step: 1270... Training loss: 4.3677... 1.2811 sec/batch
Epoch: 21/50... Training Step: 1271... Training loss: 4.3211... 1.3810 sec/batch
Epoch: 21/50... Training Step: 1272... Training loss: 4.4556... 1.3600 sec/batch
Epoch: 21/50... Training Step: 1273... Training loss: 4.3114... 1.3735 sec/batch
Epoch: 21/50... Training Step: 1274... Training loss: 4.3925... 1.3483 sec/batch
Epoch: 21/50... Training Step: 1275... Training loss: 4.3129... 1.3946 sec/batch
Epoch: 21/50... Training Step: 1276... Training loss: 4.2910... 1.4220 sec/batch
Epoch: 21/50... Training Step: 1277... Training loss: 4.4237... 1.3735 sec/batch
Epoch: 21/50... Training Step: 1278... Training loss: 4.3400... 1.3617 sec/batch
Epoch: 21/50... Training Step: 1279... Training loss: 4.3565... 1.3971 sec/batch
Epoch: 21/50... Training Step: 1280... Training loss: 4.3818... 1.3783 sec/batch
Epoch: 21/50... Training Step: 1281... Training loss: 4.3659... 1.3208 sec/batch
Epoch: 22/50... Training Step: 1282... Training loss: 4.5369... 1.3944 sec/batch
Epoch: 22/50... Training Step: 1283... Training loss: 4.3647... 1.3719 sec/batch
Epoch: 22/50... Training Step: 1284... Training loss: 4.3564... 1.3750 sec/batch
Epoch: 22/50... Training Step: 1285... Training loss: 4.4141... 1.4003 sec/batch
Epoch: 22/50... Training Step: 1286... Training loss: 4.4066... 1.3932 sec/batch
Epoch: 22/50... Training Step: 1287... Training loss: 4.4618... 1.3817 sec/batch
Epoch: 22/50... Training Step: 1288... Training loss: 4.4597... 1.3597 sec/batch
Epoch: 22/50... Training Step: 1289... Training loss: 4.4442... 1.3784 sec/batch
Epoch: 22/50... Training Step: 1290... Training loss: 4.4065... 1.3717 sec/batch
Epoch: 22/50... Training Step: 1291... Training loss: 4.4453... 1.3708 sec/batch
Epoch: 22/50... Training Step: 1292... Training loss: 4.5605... 1.3685 sec/batch
Epoch: 22/50... Training Step: 1293... Training loss: 4.3839... 1.3939 sec/batch
Epoch: 22/50... Training Step: 1294... Training loss: 4.4490... 1.2674 sec/batch
Epoch: 22/50... Training Step: 1295... Training loss: 4.4469... 1.3766 sec/batch
Epoch: 22/50... Training Step: 1296... Training loss: 4.3590... 1.3495 sec/batch
Epoch: 22/50... Training Step: 1297... Training loss: 4.3831... 1.3966 sec/batch
Epoch: 22/50... Training Step: 1298... Training loss: 4.4225... 1.3033 sec/batch
Epoch: 22/50... Training Step: 1299... Training loss: 4.4471... 1.3669 sec/batch
Epoch: 22/50... Training Step: 1300... Training loss: 4.4434... 1.3976 sec/batch
Epoch: 22/50... Training Step: 1301... Training loss: 4.4464... 1.3769 sec/batch
Epoch: 22/50... Training Step: 1302... Training loss: 4.4829... 1.3633 sec/batch
Epoch: 22/50... Training Step: 1303... Training loss: 4.3365... 1.3958 sec/batch
Epoch: 22/50... Training Step: 1304... Training loss: 4.3797... 1.3914 sec/batch
Epoch: 22/50... Training Step: 1305... Training loss: 4.3127... 1.3745 sec/batch
Epoch: 22/50... Training Step: 1306... Training loss: 4.3594... 1.3487 sec/batch
Epoch: 22/50... Training Step: 1307... Training loss: 4.3107... 1.3581 sec/batch
Epoch: 22/50... Training Step: 1308... Training loss: 4.3970... 1.3319 sec/batch
Epoch: 22/50... Training Step: 1309... Training loss: 4.4251... 1.3691 sec/batch
Epoch: 22/50... Training Step: 1310... Training loss: 4.3180... 1.3882 sec/batch
Epoch: 22/50... Training Step: 1311... Training loss: 4.3949... 1.3302 sec/batch
Epoch: 22/50... Training Step: 1312... Training loss: 4.3100... 1.4230 sec/batch
Epoch: 22/50... Training Step: 1313... Training loss: 4.2932... 1.3865 sec/batch
Epoch: 22/50... Training Step: 1314... Training loss: 4.3749... 1.3727 sec/batch
Epoch: 22/50... Training Step: 1315... Training loss: 4.3627... 1.3929 sec/batch
Epoch: 22/50... Training Step: 1316... Training loss: 4.3903... 1.3765 sec/batch
Epoch: 22/50... Training Step: 1317... Training loss: 4.3450... 1.3673 sec/batch
Epoch: 22/50... Training Step: 1318... Training loss: 4.4380... 1.3483 sec/batch
Epoch: 22/50... Training Step: 1319... Training loss: 4.4087... 1.3858 sec/batch
Epoch: 22/50... Training Step: 1320... Training loss: 4.5040... 1.3549 sec/batch
Epoch: 22/50... Training Step: 1321... Training loss: 4.3231... 1.3606 sec/batch
Epoch: 22/50... Training Step: 1322... Training loss: 4.3288... 1.3881 sec/batch
Epoch: 22/50... Training Step: 1323... Training loss: 4.3310... 1.3423 sec/batch
Epoch: 22/50... Training Step: 1324... Training loss: 4.3850... 1.3806 sec/batch
Epoch: 22/50... Training Step: 1325... Training loss: 4.2987... 1.3913 sec/batch
Epoch: 22/50... Training Step: 1326... Training loss: 4.3214... 1.3816 sec/batch
Epoch: 22/50... Training Step: 1327... Training loss: 4.3315... 1.3860 sec/batch
Epoch: 22/50... Training Step: 1328... Training loss: 4.3567... 1.3899 sec/batch
Epoch: 22/50... Training Step: 1329... Training loss: 4.3440... 1.3902 sec/batch
Epoch: 22/50... Training Step: 1330... Training loss: 4.3990... 1.3201 sec/batch
Epoch: 22/50... Training Step: 1331... Training loss: 4.3233... 1.3263 sec/batch
Epoch: 22/50... Training Step: 1332... Training loss: 4.2951... 1.3737 sec/batch
Epoch: 22/50... Training Step: 1333... Training loss: 4.4070... 1.3301 sec/batch
Epoch: 22/50... Training Step: 1334... Training loss: 4.2849... 1.3941 sec/batch
Epoch: 22/50... Training Step: 1335... Training loss: 4.3559... 1.3911 sec/batch
Epoch: 22/50... Training Step: 1336... Training loss: 4.2727... 1.3725 sec/batch
Epoch: 22/50... Training Step: 1337... Training loss: 4.2528... 1.4033 sec/batch
Epoch: 22/50... Training Step: 1338... Training loss: 4.3918... 1.3744 sec/batch
Epoch: 22/50... Training Step: 1339... Training loss: 4.3165... 1.4105 sec/batch
Epoch: 22/50... Training Step: 1340... Training loss: 4.3370... 1.3346 sec/batch
Epoch: 22/50... Training Step: 1341... Training loss: 4.3600... 1.3896 sec/batch
Epoch: 22/50... Training Step: 1342... Training loss: 4.3434... 1.3133 sec/batch
Epoch: 23/50... Training Step: 1343... Training loss: 4.5024... 1.3930 sec/batch
Epoch: 23/50... Training Step: 1344... Training loss: 4.3472... 1.3914 sec/batch
Epoch: 23/50... Training Step: 1345... Training loss: 4.3378... 1.3884 sec/batch
Epoch: 23/50... Training Step: 1346... Training loss: 4.3760... 1.4025 sec/batch
Epoch: 23/50... Training Step: 1347... Training loss: 4.3723... 1.3682 sec/batch
Epoch: 23/50... Training Step: 1348... Training loss: 4.4399... 1.4140 sec/batch
Epoch: 23/50... Training Step: 1349... Training loss: 4.4430... 1.3794 sec/batch
Epoch: 23/50... Training Step: 1350... Training loss: 4.4201... 1.3763 sec/batch
Epoch: 23/50... Training Step: 1351... Training loss: 4.3841... 1.3378 sec/batch
Epoch: 23/50... Training Step: 1352... Training loss: 4.4209... 1.3742 sec/batch
Epoch: 23/50... Training Step: 1353... Training loss: 4.5373... 1.3599 sec/batch
Epoch: 23/50... Training Step: 1354... Training loss: 4.3716... 1.3818 sec/batch
Epoch: 23/50... Training Step: 1355... Training loss: 4.4117... 1.4002 sec/batch
Epoch: 23/50... Training Step: 1356... Training loss: 4.4261... 1.3804 sec/batch
Epoch: 23/50... Training Step: 1357... Training loss: 4.3309... 1.3966 sec/batch
Epoch: 23/50... Training Step: 1358... Training loss: 4.3595... 1.3545 sec/batch
Epoch: 23/50... Training Step: 1359... Training loss: 4.4062... 1.3678 sec/batch
Epoch: 23/50... Training Step: 1360... Training loss: 4.4162... 1.3717 sec/batch
Epoch: 23/50... Training Step: 1361... Training loss: 4.4144... 1.3946 sec/batch
Epoch: 23/50... Training Step: 1362... Training loss: 4.4298... 1.3959 sec/batch
Epoch: 23/50... Training Step: 1363... Training loss: 4.4490... 1.3734 sec/batch
Epoch: 23/50... Training Step: 1364... Training loss: 4.3274... 1.3749 sec/batch
Epoch: 23/50... Training Step: 1365... Training loss: 4.3491... 1.3649 sec/batch
Epoch: 23/50... Training Step: 1366... Training loss: 4.2882... 1.3928 sec/batch
Epoch: 23/50... Training Step: 1367... Training loss: 4.3335... 1.2763 sec/batch
Epoch: 23/50... Training Step: 1368... Training loss: 4.2742... 1.3711 sec/batch
Epoch: 23/50... Training Step: 1369... Training loss: 4.3716... 1.3781 sec/batch
Epoch: 23/50... Training Step: 1370... Training loss: 4.3966... 1.4041 sec/batch
Epoch: 23/50... Training Step: 1371... Training loss: 4.2833... 1.3970 sec/batch
Epoch: 23/50... Training Step: 1372... Training loss: 4.3659... 1.3532 sec/batch
Epoch: 23/50... Training Step: 1373... Training loss: 4.2669... 1.3679 sec/batch
Epoch: 23/50... Training Step: 1374... Training loss: 4.2628... 1.3644 sec/batch
Epoch: 23/50... Training Step: 1375... Training loss: 4.3408... 1.3489 sec/batch
Epoch: 23/50... Training Step: 1376... Training loss: 4.3207... 1.3488 sec/batch
Epoch: 23/50... Training Step: 1377... Training loss: 4.3740... 1.3940 sec/batch
Epoch: 23/50... Training Step: 1378... Training loss: 4.3017... 1.3914 sec/batch
Epoch: 23/50... Training Step: 1379... Training loss: 4.3961... 1.3946 sec/batch
Epoch: 23/50... Training Step: 1380... Training loss: 4.3588... 1.3794 sec/batch
Epoch: 23/50... Training Step: 1381... Training loss: 4.4623... 1.3714 sec/batch
Epoch: 23/50... Training Step: 1382... Training loss: 4.3004... 1.3895 sec/batch
Epoch: 23/50... Training Step: 1383... Training loss: 4.2931... 1.3739 sec/batch
Epoch: 23/50... Training Step: 1384... Training loss: 4.3058... 1.3337 sec/batch
Epoch: 23/50... Training Step: 1385... Training loss: 4.3520... 1.3894 sec/batch
Epoch: 23/50... Training Step: 1386... Training loss: 4.2554... 1.3973 sec/batch
Epoch: 23/50... Training Step: 1387... Training loss: 4.2855... 1.3692 sec/batch
Epoch: 23/50... Training Step: 1388... Training loss: 4.3074... 1.4076 sec/batch
Epoch: 23/50... Training Step: 1389... Training loss: 4.3223... 1.3753 sec/batch
Epoch: 23/50... Training Step: 1390... Training loss: 4.3138... 1.3662 sec/batch
Epoch: 23/50... Training Step: 1391... Training loss: 4.3565... 1.4037 sec/batch
Epoch: 23/50... Training Step: 1392... Training loss: 4.2959... 1.4154 sec/batch
Epoch: 23/50... Training Step: 1393... Training loss: 4.2604... 1.3707 sec/batch
Epoch: 23/50... Training Step: 1394... Training loss: 4.3785... 1.3149 sec/batch
Epoch: 23/50... Training Step: 1395... Training loss: 4.2458... 1.3798 sec/batch
Epoch: 23/50... Training Step: 1396... Training loss: 4.3215... 1.3672 sec/batch
Epoch: 23/50... Training Step: 1397... Training loss: 4.2555... 1.3921 sec/batch
Epoch: 23/50... Training Step: 1398... Training loss: 4.2160... 1.3702 sec/batch
Epoch: 23/50... Training Step: 1399... Training loss: 4.3573... 1.3836 sec/batch
Epoch: 23/50... Training Step: 1400... Training loss: 4.2960... 1.3866 sec/batch
Epoch: 23/50... Training Step: 1401... Training loss: 4.3066... 1.3103 sec/batch
Epoch: 23/50... Training Step: 1402... Training loss: 4.3186... 1.3706 sec/batch
Epoch: 23/50... Training Step: 1403... Training loss: 4.3087... 1.3736 sec/batch
Epoch: 24/50... Training Step: 1404... Training loss: 4.4767... 1.3995 sec/batch
Epoch: 24/50... Training Step: 1405... Training loss: 4.3081... 1.3784 sec/batch
Epoch: 24/50... Training Step: 1406... Training loss: 4.3016... 1.3761 sec/batch
Epoch: 24/50... Training Step: 1407... Training loss: 4.3526... 1.3749 sec/batch
Epoch: 24/50... Training Step: 1408... Training loss: 4.3360... 1.3906 sec/batch
Epoch: 24/50... Training Step: 1409... Training loss: 4.4104... 1.3692 sec/batch
Epoch: 24/50... Training Step: 1410... Training loss: 4.3877... 1.3704 sec/batch
Epoch: 24/50... Training Step: 1411... Training loss: 4.3886... 1.3714 sec/batch
Epoch: 24/50... Training Step: 1412... Training loss: 4.3411... 1.3917 sec/batch
Epoch: 24/50... Training Step: 1413... Training loss: 4.3811... 1.3897 sec/batch
Epoch: 24/50... Training Step: 1414... Training loss: 4.4932... 1.3757 sec/batch
Epoch: 24/50... Training Step: 1415... Training loss: 4.3203... 1.3891 sec/batch
Epoch: 24/50... Training Step: 1416... Training loss: 4.3868... 1.3652 sec/batch
Epoch: 24/50... Training Step: 1417... Training loss: 4.3849... 1.4024 sec/batch
Epoch: 24/50... Training Step: 1418... Training loss: 4.3081... 1.3692 sec/batch
Epoch: 24/50... Training Step: 1419... Training loss: 4.3223... 1.4007 sec/batch
Epoch: 24/50... Training Step: 1420... Training loss: 4.3709... 1.3739 sec/batch
Epoch: 24/50... Training Step: 1421... Training loss: 4.3867... 1.3785 sec/batch
Epoch: 24/50... Training Step: 1422... Training loss: 4.3725... 1.4098 sec/batch
Epoch: 24/50... Training Step: 1423... Training loss: 4.3857... 1.3621 sec/batch
Epoch: 24/50... Training Step: 1424... Training loss: 4.4144... 1.3445 sec/batch
Epoch: 24/50... Training Step: 1425... Training loss: 4.2904... 1.3721 sec/batch
Epoch: 24/50... Training Step: 1426... Training loss: 4.3035... 1.3462 sec/batch
Epoch: 24/50... Training Step: 1427... Training loss: 4.2542... 1.3834 sec/batch
Epoch: 24/50... Training Step: 1428... Training loss: 4.3022... 1.3690 sec/batch
Epoch: 24/50... Training Step: 1429... Training loss: 4.2455... 1.3691 sec/batch
Epoch: 24/50... Training Step: 1430... Training loss: 4.3502... 1.3727 sec/batch
Epoch: 24/50... Training Step: 1431... Training loss: 4.3597... 1.3737 sec/batch
Epoch: 24/50... Training Step: 1432... Training loss: 4.2545... 1.3912 sec/batch
Epoch: 24/50... Training Step: 1433... Training loss: 4.3271... 1.3784 sec/batch
Epoch: 24/50... Training Step: 1434... Training loss: 4.2483... 1.3230 sec/batch
Epoch: 24/50... Training Step: 1435... Training loss: 4.2405... 1.3872 sec/batch
Epoch: 24/50... Training Step: 1436... Training loss: 4.3130... 1.3717 sec/batch
Epoch: 24/50... Training Step: 1437... Training loss: 4.2984... 1.3917 sec/batch
Epoch: 24/50... Training Step: 1438... Training loss: 4.3334... 1.3445 sec/batch
Epoch: 24/50... Training Step: 1439... Training loss: 4.3020... 1.3783 sec/batch
Epoch: 24/50... Training Step: 1440... Training loss: 4.3891... 1.4158 sec/batch
Epoch: 24/50... Training Step: 1441... Training loss: 4.3292... 1.3942 sec/batch
Epoch: 24/50... Training Step: 1442... Training loss: 4.4490... 1.3759 sec/batch
Epoch: 24/50... Training Step: 1443... Training loss: 4.2759... 1.3734 sec/batch
Epoch: 24/50... Training Step: 1444... Training loss: 4.2815... 1.3968 sec/batch
Epoch: 24/50... Training Step: 1445... Training loss: 4.2871... 1.3854 sec/batch
Epoch: 24/50... Training Step: 1446... Training loss: 4.3312... 1.3699 sec/batch
Epoch: 24/50... Training Step: 1447... Training loss: 4.2395... 1.3720 sec/batch
Epoch: 24/50... Training Step: 1448... Training loss: 4.2631... 1.3665 sec/batch
Epoch: 24/50... Training Step: 1449... Training loss: 4.2736... 1.3865 sec/batch
Epoch: 24/50... Training Step: 1450... Training loss: 4.2957... 1.3751 sec/batch
Epoch: 24/50... Training Step: 1451... Training loss: 4.2856... 1.3824 sec/batch
Epoch: 24/50... Training Step: 1452... Training loss: 4.3344... 1.3900 sec/batch
Epoch: 24/50... Training Step: 1453... Training loss: 4.2665... 1.3850 sec/batch
Epoch: 24/50... Training Step: 1454... Training loss: 4.2423... 1.3068 sec/batch
Epoch: 24/50... Training Step: 1455... Training loss: 4.3461... 1.3882 sec/batch
Epoch: 24/50... Training Step: 1456... Training loss: 4.2265... 1.3773 sec/batch
Epoch: 24/50... Training Step: 1457... Training loss: 4.2898... 1.3569 sec/batch
Epoch: 24/50... Training Step: 1458... Training loss: 4.2225... 1.3952 sec/batch
Epoch: 24/50... Training Step: 1459... Training loss: 4.2000... 1.3803 sec/batch
Epoch: 24/50... Training Step: 1460... Training loss: 4.3332... 1.3623 sec/batch
Epoch: 24/50... Training Step: 1461... Training loss: 4.2514... 1.3795 sec/batch
Epoch: 24/50... Training Step: 1462... Training loss: 4.2765... 1.2806 sec/batch
Epoch: 24/50... Training Step: 1463... Training loss: 4.2867... 1.3361 sec/batch
Epoch: 24/50... Training Step: 1464... Training loss: 4.2714... 1.3700 sec/batch
Epoch: 25/50... Training Step: 1465... Training loss: 4.4479... 1.3448 sec/batch
Epoch: 25/50... Training Step: 1466... Training loss: 4.2792... 1.4080 sec/batch
Epoch: 25/50... Training Step: 1467... Training loss: 4.2777... 1.4003 sec/batch
Epoch: 25/50... Training Step: 1468... Training loss: 4.3240... 1.3769 sec/batch
Epoch: 25/50... Training Step: 1469... Training loss: 4.3114... 1.3869 sec/batch
Epoch: 25/50... Training Step: 1470... Training loss: 4.3840... 1.2880 sec/batch
Epoch: 25/50... Training Step: 1471... Training loss: 4.3617... 1.3931 sec/batch
Epoch: 25/50... Training Step: 1472... Training loss: 4.3519... 1.3719 sec/batch
Epoch: 25/50... Training Step: 1473... Training loss: 4.3062... 1.3784 sec/batch
Epoch: 25/50... Training Step: 1474... Training loss: 4.3525... 1.3737 sec/batch
Epoch: 25/50... Training Step: 1475... Training loss: 4.4727... 1.3419 sec/batch
Epoch: 25/50... Training Step: 1476... Training loss: 4.2917... 1.3993 sec/batch
Epoch: 25/50... Training Step: 1477... Training loss: 4.3571... 1.4011 sec/batch
Epoch: 25/50... Training Step: 1478... Training loss: 4.3642... 1.3737 sec/batch
Epoch: 25/50... Training Step: 1479... Training loss: 4.2814... 1.3607 sec/batch
Epoch: 25/50... Training Step: 1480... Training loss: 4.2919... 1.3929 sec/batch
Epoch: 25/50... Training Step: 1481... Training loss: 4.3373... 1.3869 sec/batch
Epoch: 25/50... Training Step: 1482... Training loss: 4.3583... 1.3464 sec/batch
Epoch: 25/50... Training Step: 1483... Training loss: 4.3489... 1.3760 sec/batch
Epoch: 25/50... Training Step: 1484... Training loss: 4.3634... 1.3724 sec/batch
Epoch: 25/50... Training Step: 1485... Training loss: 4.3779... 1.2533 sec/batch
Epoch: 25/50... Training Step: 1486... Training loss: 4.2523... 1.3683 sec/batch
Epoch: 25/50... Training Step: 1487... Training loss: 4.2718... 1.3679 sec/batch
Epoch: 25/50... Training Step: 1488... Training loss: 4.2274... 1.3280 sec/batch
Epoch: 25/50... Training Step: 1489... Training loss: 4.2673... 1.3865 sec/batch
Epoch: 25/50... Training Step: 1490... Training loss: 4.2149... 1.3733 sec/batch
Epoch: 25/50... Training Step: 1491... Training loss: 4.3028... 1.3504 sec/batch
Epoch: 25/50... Training Step: 1492... Training loss: 4.3306... 1.3821 sec/batch
Epoch: 25/50... Training Step: 1493... Training loss: 4.2330... 1.3693 sec/batch
Epoch: 25/50... Training Step: 1494... Training loss: 4.2980... 1.3524 sec/batch
Epoch: 25/50... Training Step: 1495... Training loss: 4.2168... 1.4021 sec/batch
Epoch: 25/50... Training Step: 1496... Training loss: 4.1998... 1.3779 sec/batch
Epoch: 25/50... Training Step: 1497... Training loss: 4.2723... 1.3746 sec/batch
Epoch: 25/50... Training Step: 1498... Training loss: 4.2689... 1.3467 sec/batch
Epoch: 25/50... Training Step: 1499... Training loss: 4.3101... 1.4260 sec/batch
Epoch: 25/50... Training Step: 1500... Training loss: 4.2597... 1.3501 sec/batch
Epoch: 25/50... Training Step: 1501... Training loss: 4.3521... 1.3663 sec/batch
Epoch: 25/50... Training Step: 1502... Training loss: 4.3195... 1.3899 sec/batch
Epoch: 25/50... Training Step: 1503... Training loss: 4.4190... 1.3682 sec/batch
Epoch: 25/50... Training Step: 1504... Training loss: 4.2464... 1.3679 sec/batch
Epoch: 25/50... Training Step: 1505... Training loss: 4.2421... 1.3461 sec/batch
Epoch: 25/50... Training Step: 1506... Training loss: 4.2559... 1.3794 sec/batch
Epoch: 25/50... Training Step: 1507... Training loss: 4.2998... 1.3929 sec/batch
Epoch: 25/50... Training Step: 1508... Training loss: 4.2093... 1.3258 sec/batch
Epoch: 25/50... Training Step: 1509... Training loss: 4.2341... 1.2491 sec/batch
Epoch: 25/50... Training Step: 1510... Training loss: 4.2356... 1.3864 sec/batch
Epoch: 25/50... Training Step: 1511... Training loss: 4.2546... 1.3753 sec/batch
Epoch: 25/50... Training Step: 1512... Training loss: 4.2412... 1.3896 sec/batch
Epoch: 25/50... Training Step: 1513... Training loss: 4.3125... 1.3850 sec/batch
Epoch: 25/50... Training Step: 1514... Training loss: 4.2356... 1.3933 sec/batch
Epoch: 25/50... Training Step: 1515... Training loss: 4.2134... 1.3837 sec/batch
Epoch: 25/50... Training Step: 1516... Training loss: 4.3168... 1.3643 sec/batch
Epoch: 25/50... Training Step: 1517... Training loss: 4.2014... 1.3821 sec/batch
Epoch: 25/50... Training Step: 1518... Training loss: 4.2590... 1.3797 sec/batch
Epoch: 25/50... Training Step: 1519... Training loss: 4.1871... 1.3751 sec/batch
Epoch: 25/50... Training Step: 1520... Training loss: 4.1642... 1.3507 sec/batch
Epoch: 25/50... Training Step: 1521... Training loss: 4.2952... 1.4002 sec/batch
Epoch: 25/50... Training Step: 1522... Training loss: 4.2295... 1.3709 sec/batch
Epoch: 25/50... Training Step: 1523... Training loss: 4.2442... 1.3693 sec/batch
Epoch: 25/50... Training Step: 1524... Training loss: 4.2505... 1.3797 sec/batch
Epoch: 25/50... Training Step: 1525... Training loss: 4.2479... 1.3857 sec/batch
Epoch: 26/50... Training Step: 1526... Training loss: 4.4112... 1.3768 sec/batch
Epoch: 26/50... Training Step: 1527... Training loss: 4.2412... 1.3966 sec/batch
Epoch: 26/50... Training Step: 1528... Training loss: 4.2431... 1.4055 sec/batch
Epoch: 26/50... Training Step: 1529... Training loss: 4.2916... 1.3741 sec/batch
Epoch: 26/50... Training Step: 1530... Training loss: 4.2742... 1.3815 sec/batch
Epoch: 26/50... Training Step: 1531... Training loss: 4.3376... 1.3692 sec/batch
Epoch: 26/50... Training Step: 1532... Training loss: 4.3349... 1.3696 sec/batch
Epoch: 26/50... Training Step: 1533... Training loss: 4.3276... 1.3795 sec/batch
Epoch: 26/50... Training Step: 1534... Training loss: 4.2834... 1.4013 sec/batch
Epoch: 26/50... Training Step: 1535... Training loss: 4.3174... 1.4090 sec/batch
Epoch: 26/50... Training Step: 1536... Training loss: 4.4279... 1.3840 sec/batch
Epoch: 26/50... Training Step: 1537... Training loss: 4.2761... 1.3485 sec/batch
Epoch: 26/50... Training Step: 1538... Training loss: 4.3244... 1.3806 sec/batch
Epoch: 26/50... Training Step: 1539... Training loss: 4.3342... 1.3992 sec/batch
Epoch: 26/50... Training Step: 1540... Training loss: 4.2486... 1.3864 sec/batch
Epoch: 26/50... Training Step: 1541... Training loss: 4.2642... 1.3739 sec/batch
Epoch: 26/50... Training Step: 1542... Training loss: 4.3099... 1.3759 sec/batch
Epoch: 26/50... Training Step: 1543... Training loss: 4.3244... 1.3797 sec/batch
Epoch: 26/50... Training Step: 1544... Training loss: 4.3326... 1.3861 sec/batch
Epoch: 26/50... Training Step: 1545... Training loss: 4.3335... 1.3663 sec/batch
Epoch: 26/50... Training Step: 1546... Training loss: 4.3488... 1.3330 sec/batch
Epoch: 26/50... Training Step: 1547... Training loss: 4.2187... 1.3580 sec/batch
Epoch: 26/50... Training Step: 1548... Training loss: 4.2408... 1.3718 sec/batch
Epoch: 26/50... Training Step: 1549... Training loss: 4.1982... 1.3662 sec/batch
Epoch: 26/50... Training Step: 1550... Training loss: 4.2396... 1.3896 sec/batch
Epoch: 26/50... Training Step: 1551... Training loss: 4.1928... 1.3688 sec/batch
Epoch: 26/50... Training Step: 1552... Training loss: 4.2841... 1.3574 sec/batch
Epoch: 26/50... Training Step: 1553... Training loss: 4.3033... 1.3152 sec/batch
Epoch: 26/50... Training Step: 1554... Training loss: 4.1995... 1.3658 sec/batch
Epoch: 26/50... Training Step: 1555... Training loss: 4.2788... 1.3712 sec/batch
Epoch: 26/50... Training Step: 1556... Training loss: 4.1939... 1.3880 sec/batch
Epoch: 26/50... Training Step: 1557... Training loss: 4.1856... 1.4017 sec/batch
Epoch: 26/50... Training Step: 1558... Training loss: 4.2688... 1.3285 sec/batch
Epoch: 26/50... Training Step: 1559... Training loss: 4.2382... 1.3708 sec/batch
Epoch: 26/50... Training Step: 1560... Training loss: 4.2745... 1.3670 sec/batch
Epoch: 26/50... Training Step: 1561... Training loss: 4.2282... 1.4112 sec/batch
Epoch: 26/50... Training Step: 1562... Training loss: 4.3219... 1.3150 sec/batch
Epoch: 26/50... Training Step: 1563... Training loss: 4.2806... 1.4075 sec/batch
Epoch: 26/50... Training Step: 1564... Training loss: 4.3895... 1.3117 sec/batch
Epoch: 26/50... Training Step: 1565... Training loss: 4.2152... 1.3907 sec/batch
Epoch: 26/50... Training Step: 1566... Training loss: 4.2142... 1.3703 sec/batch
Epoch: 26/50... Training Step: 1567... Training loss: 4.2276... 1.3873 sec/batch
Epoch: 26/50... Training Step: 1568... Training loss: 4.2653... 1.3860 sec/batch
Epoch: 26/50... Training Step: 1569... Training loss: 4.1825... 1.3814 sec/batch
Epoch: 26/50... Training Step: 1570... Training loss: 4.2219... 1.3991 sec/batch
Epoch: 26/50... Training Step: 1571... Training loss: 4.2183... 1.3793 sec/batch
Epoch: 26/50... Training Step: 1572... Training loss: 4.2453... 1.3812 sec/batch
Epoch: 26/50... Training Step: 1573... Training loss: 4.2309... 1.3553 sec/batch
Epoch: 26/50... Training Step: 1574... Training loss: 4.2822... 1.3891 sec/batch
Epoch: 26/50... Training Step: 1575... Training loss: 4.2144... 1.3440 sec/batch
Epoch: 26/50... Training Step: 1576... Training loss: 4.1918... 1.3839 sec/batch
Epoch: 26/50... Training Step: 1577... Training loss: 4.2898... 1.3588 sec/batch
Epoch: 26/50... Training Step: 1578... Training loss: 4.1745... 1.3132 sec/batch
Epoch: 26/50... Training Step: 1579... Training loss: 4.2426... 1.4086 sec/batch
Epoch: 26/50... Training Step: 1580... Training loss: 4.1591... 1.3739 sec/batch
Epoch: 26/50... Training Step: 1581... Training loss: 4.1460... 1.3532 sec/batch
Epoch: 26/50... Training Step: 1582... Training loss: 4.2847... 1.3674 sec/batch
Epoch: 26/50... Training Step: 1583... Training loss: 4.1980... 1.3702 sec/batch
Epoch: 26/50... Training Step: 1584... Training loss: 4.2336... 1.3176 sec/batch
Epoch: 26/50... Training Step: 1585... Training loss: 4.2440... 1.3857 sec/batch
Epoch: 26/50... Training Step: 1586... Training loss: 4.2290... 1.3730 sec/batch
Epoch: 27/50... Training Step: 1587... Training loss: 4.3858... 1.3793 sec/batch
Epoch: 27/50... Training Step: 1588... Training loss: 4.2230... 1.3315 sec/batch
Epoch: 27/50... Training Step: 1589... Training loss: 4.2315... 1.3775 sec/batch
Epoch: 27/50... Training Step: 1590... Training loss: 4.2784... 1.3956 sec/batch
Epoch: 27/50... Training Step: 1591... Training loss: 4.2533... 1.2543 sec/batch
Epoch: 27/50... Training Step: 1592... Training loss: 4.3273... 1.2904 sec/batch
Epoch: 27/50... Training Step: 1593... Training loss: 4.3268... 1.3830 sec/batch
Epoch: 27/50... Training Step: 1594... Training loss: 4.2973... 1.2969 sec/batch
Epoch: 27/50... Training Step: 1595... Training loss: 4.2587... 1.3845 sec/batch
Epoch: 27/50... Training Step: 1596... Training loss: 4.3045... 1.3652 sec/batch
Epoch: 27/50... Training Step: 1597... Training loss: 4.4246... 1.3877 sec/batch
Epoch: 27/50... Training Step: 1598... Training loss: 4.2521... 1.3803 sec/batch
Epoch: 27/50... Training Step: 1599... Training loss: 4.3122... 1.3127 sec/batch
Epoch: 27/50... Training Step: 1600... Training loss: 4.3123... 1.3930 sec/batch
Epoch: 27/50... Training Step: 1601... Training loss: 4.2251... 1.3665 sec/batch
Epoch: 27/50... Training Step: 1602... Training loss: 4.2398... 1.3731 sec/batch
Epoch: 27/50... Training Step: 1603... Training loss: 4.2863... 1.3817 sec/batch
Epoch: 27/50... Training Step: 1604... Training loss: 4.3011... 1.3977 sec/batch
Epoch: 27/50... Training Step: 1605... Training loss: 4.3020... 1.3945 sec/batch
Epoch: 27/50... Training Step: 1606... Training loss: 4.3126... 1.2127 sec/batch
Epoch: 27/50... Training Step: 1607... Training loss: 4.3321... 1.3742 sec/batch
Epoch: 27/50... Training Step: 1608... Training loss: 4.2029... 1.4034 sec/batch
Epoch: 27/50... Training Step: 1609... Training loss: 4.2237... 1.3427 sec/batch
Epoch: 27/50... Training Step: 1610... Training loss: 4.1779... 1.3458 sec/batch
Epoch: 27/50... Training Step: 1611... Training loss: 4.2141... 1.3923 sec/batch
Epoch: 27/50... Training Step: 1612... Training loss: 4.1752... 1.3509 sec/batch
Epoch: 27/50... Training Step: 1613... Training loss: 4.2641... 1.3842 sec/batch
Epoch: 27/50... Training Step: 1614... Training loss: 4.2761... 1.3882 sec/batch
Epoch: 27/50... Training Step: 1615... Training loss: 4.1700... 1.3677 sec/batch
Epoch: 27/50... Training Step: 1616... Training loss: 4.2603... 1.3888 sec/batch
Epoch: 27/50... Training Step: 1617... Training loss: 4.1680... 1.3954 sec/batch
Epoch: 27/50... Training Step: 1618... Training loss: 4.1577... 1.3670 sec/batch
Epoch: 27/50... Training Step: 1619... Training loss: 4.2398... 1.3731 sec/batch
Epoch: 27/50... Training Step: 1620... Training loss: 4.2085... 1.3700 sec/batch
Epoch: 27/50... Training Step: 1621... Training loss: 4.2627... 1.3700 sec/batch
Epoch: 27/50... Training Step: 1622... Training loss: 4.1948... 1.4128 sec/batch
Epoch: 27/50... Training Step: 1623... Training loss: 4.2919... 1.3891 sec/batch
Epoch: 27/50... Training Step: 1624... Training loss: 4.2603... 1.3827 sec/batch
Epoch: 27/50... Training Step: 1625... Training loss: 4.3609... 1.3695 sec/batch
Epoch: 27/50... Training Step: 1626... Training loss: 4.1914... 1.3323 sec/batch
Epoch: 27/50... Training Step: 1627... Training loss: 4.1834... 1.3663 sec/batch
Epoch: 27/50... Training Step: 1628... Training loss: 4.1995... 1.3732 sec/batch
Epoch: 27/50... Training Step: 1629... Training loss: 4.2550... 1.3982 sec/batch
Epoch: 27/50... Training Step: 1630... Training loss: 4.1588... 1.3940 sec/batch
Epoch: 27/50... Training Step: 1631... Training loss: 4.2041... 1.3870 sec/batch
Epoch: 27/50... Training Step: 1632... Training loss: 4.1944... 1.3720 sec/batch
Epoch: 27/50... Training Step: 1633... Training loss: 4.2349... 1.3807 sec/batch
Epoch: 27/50... Training Step: 1634... Training loss: 4.2061... 1.3805 sec/batch
Epoch: 27/50... Training Step: 1635... Training loss: 4.2682... 1.3712 sec/batch
Epoch: 27/50... Training Step: 1636... Training loss: 4.1980... 1.4051 sec/batch
Epoch: 27/50... Training Step: 1637... Training loss: 4.1465... 1.3813 sec/batch
Epoch: 27/50... Training Step: 1638... Training loss: 4.2865... 1.3743 sec/batch
Epoch: 27/50... Training Step: 1639... Training loss: 4.1550... 1.3865 sec/batch
Epoch: 27/50... Training Step: 1640... Training loss: 4.2102... 1.4008 sec/batch
Epoch: 27/50... Training Step: 1641... Training loss: 4.1449... 1.3267 sec/batch
Epoch: 27/50... Training Step: 1642... Training loss: 4.1128... 1.3761 sec/batch
Epoch: 27/50... Training Step: 1643... Training loss: 4.2625... 1.3676 sec/batch
Epoch: 27/50... Training Step: 1644... Training loss: 4.1770... 1.3925 sec/batch
Epoch: 27/50... Training Step: 1645... Training loss: 4.2042... 1.4043 sec/batch
Epoch: 27/50... Training Step: 1646... Training loss: 4.2204... 1.3663 sec/batch
Epoch: 27/50... Training Step: 1647... Training loss: 4.1825... 1.3368 sec/batch
Epoch: 28/50... Training Step: 1648... Training loss: 4.3685... 1.3664 sec/batch
Epoch: 28/50... Training Step: 1649... Training loss: 4.1982... 1.3996 sec/batch
Epoch: 28/50... Training Step: 1650... Training loss: 4.2074... 1.3694 sec/batch
Epoch: 28/50... Training Step: 1651... Training loss: 4.2340... 1.4076 sec/batch
Epoch: 28/50... Training Step: 1652... Training loss: 4.2296... 1.4049 sec/batch
Epoch: 28/50... Training Step: 1653... Training loss: 4.2981... 1.3750 sec/batch
Epoch: 28/50... Training Step: 1654... Training loss: 4.2895... 1.3722 sec/batch
Epoch: 28/50... Training Step: 1655... Training loss: 4.2746... 1.3776 sec/batch
Epoch: 28/50... Training Step: 1656... Training loss: 4.2252... 1.3678 sec/batch
Epoch: 28/50... Training Step: 1657... Training loss: 4.2742... 1.3794 sec/batch
Epoch: 28/50... Training Step: 1658... Training loss: 4.3821... 1.4130 sec/batch
Epoch: 28/50... Training Step: 1659... Training loss: 4.2242... 1.3525 sec/batch
Epoch: 28/50... Training Step: 1660... Training loss: 4.2826... 1.3797 sec/batch
Epoch: 28/50... Training Step: 1661... Training loss: 4.2838... 1.3957 sec/batch
Epoch: 28/50... Training Step: 1662... Training loss: 4.2082... 1.3853 sec/batch
Epoch: 28/50... Training Step: 1663... Training loss: 4.2130... 1.3801 sec/batch
Epoch: 28/50... Training Step: 1664... Training loss: 4.2641... 1.3766 sec/batch
Epoch: 28/50... Training Step: 1665... Training loss: 4.2817... 1.3918 sec/batch
Epoch: 28/50... Training Step: 1666... Training loss: 4.2760... 1.3804 sec/batch
Epoch: 28/50... Training Step: 1667... Training loss: 4.2873... 1.3829 sec/batch
Epoch: 28/50... Training Step: 1668... Training loss: 4.3143... 1.3678 sec/batch
Epoch: 28/50... Training Step: 1669... Training loss: 4.1802... 1.3945 sec/batch
Epoch: 28/50... Training Step: 1670... Training loss: 4.1953... 1.3670 sec/batch
Epoch: 28/50... Training Step: 1671... Training loss: 4.1544... 1.3773 sec/batch
Epoch: 28/50... Training Step: 1672... Training loss: 4.1903... 1.3448 sec/batch
Epoch: 28/50... Training Step: 1673... Training loss: 4.1497... 1.3778 sec/batch
Epoch: 28/50... Training Step: 1674... Training loss: 4.2389... 1.3704 sec/batch
Epoch: 28/50... Training Step: 1675... Training loss: 4.2585... 1.3801 sec/batch
Epoch: 28/50... Training Step: 1676... Training loss: 4.1498... 1.4237 sec/batch
Epoch: 28/50... Training Step: 1677... Training loss: 4.2227... 1.3693 sec/batch
Epoch: 28/50... Training Step: 1678... Training loss: 4.1465... 1.3909 sec/batch
Epoch: 28/50... Training Step: 1679... Training loss: 4.1299... 1.2820 sec/batch
Epoch: 28/50... Training Step: 1680... Training loss: 4.2073... 1.4108 sec/batch
Epoch: 28/50... Training Step: 1681... Training loss: 4.1904... 1.3450 sec/batch
Epoch: 28/50... Training Step: 1682... Training loss: 4.2407... 1.3832 sec/batch
Epoch: 28/50... Training Step: 1683... Training loss: 4.1791... 1.3810 sec/batch
Epoch: 28/50... Training Step: 1684... Training loss: 4.2890... 1.3795 sec/batch
Epoch: 28/50... Training Step: 1685... Training loss: 4.2454... 1.3856 sec/batch
Epoch: 28/50... Training Step: 1686... Training loss: 4.3448... 1.3649 sec/batch
Epoch: 28/50... Training Step: 1687... Training loss: 4.1627... 1.3897 sec/batch
Epoch: 28/50... Training Step: 1688... Training loss: 4.1575... 1.3465 sec/batch
Epoch: 28/50... Training Step: 1689... Training loss: 4.1759... 1.3596 sec/batch
Epoch: 28/50... Training Step: 1690... Training loss: 4.2210... 1.3872 sec/batch
Epoch: 28/50... Training Step: 1691... Training loss: 4.1412... 1.3979 sec/batch
Epoch: 28/50... Training Step: 1692... Training loss: 4.1683... 1.3698 sec/batch
Epoch: 28/50... Training Step: 1693... Training loss: 4.1563... 1.3120 sec/batch
Epoch: 28/50... Training Step: 1694... Training loss: 4.1812... 1.3800 sec/batch
Epoch: 28/50... Training Step: 1695... Training loss: 4.1843... 1.2829 sec/batch
Epoch: 28/50... Training Step: 1696... Training loss: 4.2394... 1.3542 sec/batch
Epoch: 28/50... Training Step: 1697... Training loss: 4.1491... 1.3731 sec/batch
Epoch: 28/50... Training Step: 1698... Training loss: 4.1339... 1.3942 sec/batch
Epoch: 28/50... Training Step: 1699... Training loss: 4.2410... 1.3784 sec/batch
Epoch: 28/50... Training Step: 1700... Training loss: 4.1158... 1.3779 sec/batch
Epoch: 28/50... Training Step: 1701... Training loss: 4.1846... 1.3718 sec/batch
Epoch: 28/50... Training Step: 1702... Training loss: 4.1058... 1.3997 sec/batch
Epoch: 28/50... Training Step: 1703... Training loss: 4.0843... 1.3652 sec/batch
Epoch: 28/50... Training Step: 1704... Training loss: 4.2212... 1.3557 sec/batch
Epoch: 28/50... Training Step: 1705... Training loss: 4.1410... 1.3517 sec/batch
Epoch: 28/50... Training Step: 1706... Training loss: 4.1613... 1.3967 sec/batch
Epoch: 28/50... Training Step: 1707... Training loss: 4.1709... 1.3791 sec/batch
Epoch: 28/50... Training Step: 1708... Training loss: 4.1663... 1.4200 sec/batch
Epoch: 29/50... Training Step: 1709... Training loss: 4.3339... 1.3436 sec/batch
Epoch: 29/50... Training Step: 1710... Training loss: 4.1625... 1.3750 sec/batch
Epoch: 29/50... Training Step: 1711... Training loss: 4.1653... 1.3737 sec/batch
Epoch: 29/50... Training Step: 1712... Training loss: 4.2080... 1.3659 sec/batch
Epoch: 29/50... Training Step: 1713... Training loss: 4.1984... 1.3779 sec/batch
Epoch: 29/50... Training Step: 1714... Training loss: 4.2607... 1.3740 sec/batch
Epoch: 29/50... Training Step: 1715... Training loss: 4.2577... 1.3686 sec/batch
Epoch: 29/50... Training Step: 1716... Training loss: 4.2411... 1.3283 sec/batch
Epoch: 29/50... Training Step: 1717... Training loss: 4.2096... 1.3657 sec/batch
Epoch: 29/50... Training Step: 1718... Training loss: 4.2288... 1.3692 sec/batch
Epoch: 29/50... Training Step: 1719... Training loss: 4.3671... 1.3706 sec/batch
Epoch: 29/50... Training Step: 1720... Training loss: 4.1894... 1.3981 sec/batch
Epoch: 29/50... Training Step: 1721... Training loss: 4.2464... 1.4034 sec/batch
Epoch: 29/50... Training Step: 1722... Training loss: 4.2538... 1.3837 sec/batch
Epoch: 29/50... Training Step: 1723... Training loss: 4.1754... 1.3756 sec/batch
Epoch: 29/50... Training Step: 1724... Training loss: 4.1808... 1.3720 sec/batch
Epoch: 29/50... Training Step: 1725... Training loss: 4.2305... 1.3883 sec/batch
Epoch: 29/50... Training Step: 1726... Training loss: 4.2450... 1.4182 sec/batch
Epoch: 29/50... Training Step: 1727... Training loss: 4.2517... 1.3383 sec/batch
Epoch: 29/50... Training Step: 1728... Training loss: 4.2479... 1.3712 sec/batch
Epoch: 29/50... Training Step: 1729... Training loss: 4.2883... 1.3694 sec/batch
Epoch: 29/50... Training Step: 1730... Training loss: 4.1554... 1.3754 sec/batch
Epoch: 29/50... Training Step: 1731... Training loss: 4.1642... 1.3872 sec/batch
Epoch: 29/50... Training Step: 1732... Training loss: 4.1265... 1.3200 sec/batch
Epoch: 29/50... Training Step: 1733... Training loss: 4.1626... 1.3493 sec/batch
Epoch: 29/50... Training Step: 1734... Training loss: 4.1242... 1.3184 sec/batch
Epoch: 29/50... Training Step: 1735... Training loss: 4.2139... 1.3968 sec/batch
Epoch: 29/50... Training Step: 1736... Training loss: 4.2261... 1.3719 sec/batch
Epoch: 29/50... Training Step: 1737... Training loss: 4.1216... 1.3146 sec/batch
Epoch: 29/50... Training Step: 1738... Training loss: 4.1948... 1.3782 sec/batch
Epoch: 29/50... Training Step: 1739... Training loss: 4.1268... 1.3978 sec/batch
Epoch: 29/50... Training Step: 1740... Training loss: 4.1044... 1.3658 sec/batch
Epoch: 29/50... Training Step: 1741... Training loss: 4.1784... 1.3731 sec/batch
Epoch: 29/50... Training Step: 1742... Training loss: 4.1564... 1.3951 sec/batch
Epoch: 29/50... Training Step: 1743... Training loss: 4.2055... 1.3723 sec/batch
Epoch: 29/50... Training Step: 1744... Training loss: 4.1505... 1.3888 sec/batch
Epoch: 29/50... Training Step: 1745... Training loss: 4.2400... 1.3636 sec/batch
Epoch: 29/50... Training Step: 1746... Training loss: 4.2028... 1.3896 sec/batch
Epoch: 29/50... Training Step: 1747... Training loss: 4.3072... 1.3541 sec/batch
Epoch: 29/50... Training Step: 1748... Training loss: 4.1338... 1.3671 sec/batch
Epoch: 29/50... Training Step: 1749... Training loss: 4.1391... 1.3297 sec/batch
Epoch: 29/50... Training Step: 1750... Training loss: 4.1556... 1.3215 sec/batch
Epoch: 29/50... Training Step: 1751... Training loss: 4.1962... 1.3713 sec/batch
Epoch: 29/50... Training Step: 1752... Training loss: 4.1146... 1.3429 sec/batch
Epoch: 29/50... Training Step: 1753... Training loss: 4.1391... 1.3889 sec/batch
Epoch: 29/50... Training Step: 1754... Training loss: 4.1357... 1.3819 sec/batch
Epoch: 29/50... Training Step: 1755... Training loss: 4.1602... 1.4027 sec/batch
Epoch: 29/50... Training Step: 1756... Training loss: 4.1618... 1.3546 sec/batch
Epoch: 29/50... Training Step: 1757... Training loss: 4.1918... 1.4304 sec/batch
Epoch: 29/50... Training Step: 1758... Training loss: 4.1456... 1.3763 sec/batch
Epoch: 29/50... Training Step: 1759... Training loss: 4.1131... 1.3464 sec/batch
Epoch: 29/50... Training Step: 1760... Training loss: 4.2123... 1.3894 sec/batch
Epoch: 29/50... Training Step: 1761... Training loss: 4.0944... 1.3755 sec/batch
Epoch: 29/50... Training Step: 1762... Training loss: 4.1555... 1.3921 sec/batch
Epoch: 29/50... Training Step: 1763... Training loss: 4.0862... 1.3734 sec/batch
Epoch: 29/50... Training Step: 1764... Training loss: 4.0587... 1.3651 sec/batch
Epoch: 29/50... Training Step: 1765... Training loss: 4.2031... 1.3437 sec/batch
Epoch: 29/50... Training Step: 1766... Training loss: 4.1159... 1.3480 sec/batch
Epoch: 29/50... Training Step: 1767... Training loss: 4.1516... 1.3977 sec/batch
Epoch: 29/50... Training Step: 1768... Training loss: 4.1580... 1.3850 sec/batch
Epoch: 29/50... Training Step: 1769... Training loss: 4.1499... 1.3835 sec/batch
Epoch: 30/50... Training Step: 1770... Training loss: 4.3083... 1.3425 sec/batch
Epoch: 30/50... Training Step: 1771... Training loss: 4.1491... 1.4117 sec/batch
Epoch: 30/50... Training Step: 1772... Training loss: 4.1489... 1.3535 sec/batch
Epoch: 30/50... Training Step: 1773... Training loss: 4.1910... 1.2822 sec/batch
Epoch: 30/50... Training Step: 1774... Training loss: 4.1815... 1.3472 sec/batch
Epoch: 30/50... Training Step: 1775... Training loss: 4.2488... 1.4173 sec/batch
Epoch: 30/50... Training Step: 1776... Training loss: 4.2264... 1.3679 sec/batch
Epoch: 30/50... Training Step: 1777... Training loss: 4.2146... 1.3929 sec/batch
Epoch: 30/50... Training Step: 1778... Training loss: 4.1865... 1.3908 sec/batch
Epoch: 30/50... Training Step: 1779... Training loss: 4.2226... 1.3696 sec/batch
Epoch: 30/50... Training Step: 1780... Training loss: 4.3416... 1.4381 sec/batch
Epoch: 30/50... Training Step: 1781... Training loss: 4.1783... 1.3814 sec/batch
Epoch: 30/50... Training Step: 1782... Training loss: 4.2283... 1.3826 sec/batch
Epoch: 30/50... Training Step: 1783... Training loss: 4.2401... 1.3721 sec/batch
Epoch: 30/50... Training Step: 1784... Training loss: 4.1559... 1.3883 sec/batch
Epoch: 30/50... Training Step: 1785... Training loss: 4.1762... 1.3593 sec/batch
Epoch: 30/50... Training Step: 1786... Training loss: 4.2117... 1.3846 sec/batch
Epoch: 30/50... Training Step: 1787... Training loss: 4.2261... 1.3932 sec/batch
Epoch: 30/50... Training Step: 1788... Training loss: 4.2343... 1.3658 sec/batch
Epoch: 30/50... Training Step: 1789... Training loss: 4.2334... 1.3127 sec/batch
Epoch: 30/50... Training Step: 1790... Training loss: 4.2567... 1.3418 sec/batch
Epoch: 30/50... Training Step: 1791... Training loss: 4.1358... 1.3599 sec/batch
Epoch: 30/50... Training Step: 1792... Training loss: 4.1445... 1.3450 sec/batch
Epoch: 30/50... Training Step: 1793... Training loss: 4.1022... 1.3880 sec/batch
Epoch: 30/50... Training Step: 1794... Training loss: 4.1449... 1.3819 sec/batch
Epoch: 30/50... Training Step: 1795... Training loss: 4.1008... 1.3765 sec/batch
Epoch: 30/50... Training Step: 1796... Training loss: 4.1965... 1.3555 sec/batch
Epoch: 30/50... Training Step: 1797... Training loss: 4.2060... 1.3931 sec/batch
Epoch: 30/50... Training Step: 1798... Training loss: 4.1062... 1.4224 sec/batch
Epoch: 30/50... Training Step: 1799... Training loss: 4.1641... 1.3735 sec/batch
Epoch: 30/50... Training Step: 1800... Training loss: 4.1021... 1.3887 sec/batch
Epoch: 30/50... Training Step: 1801... Training loss: 4.0867... 1.3188 sec/batch
Epoch: 30/50... Training Step: 1802... Training loss: 4.1711... 1.4043 sec/batch
Epoch: 30/50... Training Step: 1803... Training loss: 4.1454... 1.3854 sec/batch
Epoch: 30/50... Training Step: 1804... Training loss: 4.1796... 1.3884 sec/batch
Epoch: 30/50... Training Step: 1805... Training loss: 4.1322... 1.3889 sec/batch
Epoch: 30/50... Training Step: 1806... Training loss: 4.2304... 1.3880 sec/batch
Epoch: 30/50... Training Step: 1807... Training loss: 4.1881... 1.3158 sec/batch
Epoch: 30/50... Training Step: 1808... Training loss: 4.2768... 1.3846 sec/batch
Epoch: 30/50... Training Step: 1809... Training loss: 4.1049... 1.3247 sec/batch
Epoch: 30/50... Training Step: 1810... Training loss: 4.1178... 1.3840 sec/batch
Epoch: 30/50... Training Step: 1811... Training loss: 4.1338... 1.3913 sec/batch
Epoch: 30/50... Training Step: 1812... Training loss: 4.1765... 1.3732 sec/batch
Epoch: 30/50... Training Step: 1813... Training loss: 4.0916... 1.3860 sec/batch
Epoch: 30/50... Training Step: 1814... Training loss: 4.1287... 1.3938 sec/batch
Epoch: 30/50... Training Step: 1815... Training loss: 4.1292... 1.3222 sec/batch
Epoch: 30/50... Training Step: 1816... Training loss: 4.1432... 1.3745 sec/batch
Epoch: 30/50... Training Step: 1817... Training loss: 4.1226... 1.3815 sec/batch
Epoch: 30/50... Training Step: 1818... Training loss: 4.1637... 1.3622 sec/batch
Epoch: 30/50... Training Step: 1819... Training loss: 4.1183... 1.3724 sec/batch
Epoch: 30/50... Training Step: 1820... Training loss: 4.0852... 1.3745 sec/batch
Epoch: 30/50... Training Step: 1821... Training loss: 4.1816... 1.3768 sec/batch
Epoch: 30/50... Training Step: 1822... Training loss: 4.0813... 1.3610 sec/batch
Epoch: 30/50... Training Step: 1823... Training loss: 4.1317... 1.4026 sec/batch
Epoch: 30/50... Training Step: 1824... Training loss: 4.0737... 1.3400 sec/batch
Epoch: 30/50... Training Step: 1825... Training loss: 4.0408... 1.3644 sec/batch
Epoch: 30/50... Training Step: 1826... Training loss: 4.1817... 1.3695 sec/batch
Epoch: 30/50... Training Step: 1827... Training loss: 4.1075... 1.3647 sec/batch
Epoch: 30/50... Training Step: 1828... Training loss: 4.1264... 1.3839 sec/batch
Epoch: 30/50... Training Step: 1829... Training loss: 4.1466... 1.2775 sec/batch
Epoch: 30/50... Training Step: 1830... Training loss: 4.1185... 1.3679 sec/batch
Epoch: 31/50... Training Step: 1831... Training loss: 4.2962... 1.3903 sec/batch
Epoch: 31/50... Training Step: 1832... Training loss: 4.1358... 1.3511 sec/batch
Epoch: 31/50... Training Step: 1833... Training loss: 4.1254... 1.3675 sec/batch
Epoch: 31/50... Training Step: 1834... Training loss: 4.1742... 1.3815 sec/batch
Epoch: 31/50... Training Step: 1835... Training loss: 4.1569... 1.3622 sec/batch
Epoch: 31/50... Training Step: 1836... Training loss: 4.2080... 1.3882 sec/batch
Epoch: 31/50... Training Step: 1837... Training loss: 4.2137... 1.3775 sec/batch
Epoch: 31/50... Training Step: 1838... Training loss: 4.1983... 1.3470 sec/batch
Epoch: 31/50... Training Step: 1839... Training loss: 4.1587... 1.3900 sec/batch
Epoch: 31/50... Training Step: 1840... Training loss: 4.2086... 1.4019 sec/batch
Epoch: 31/50... Training Step: 1841... Training loss: 4.3107... 1.3937 sec/batch
Epoch: 31/50... Training Step: 1842... Training loss: 4.1431... 1.3871 sec/batch
Epoch: 31/50... Training Step: 1843... Training loss: 4.2134... 1.3893 sec/batch
Epoch: 31/50... Training Step: 1844... Training loss: 4.2069... 1.3580 sec/batch
Epoch: 31/50... Training Step: 1845... Training loss: 4.1258... 1.3899 sec/batch
Epoch: 31/50... Training Step: 1846... Training loss: 4.1453... 1.3852 sec/batch
Epoch: 31/50... Training Step: 1847... Training loss: 4.1776... 1.3969 sec/batch
Epoch: 31/50... Training Step: 1848... Training loss: 4.2050... 1.3713 sec/batch
Epoch: 31/50... Training Step: 1849... Training loss: 4.1940... 1.3786 sec/batch
Epoch: 31/50... Training Step: 1850... Training loss: 4.2019... 1.4310 sec/batch
Epoch: 31/50... Training Step: 1851... Training loss: 4.2258... 1.3795 sec/batch
Epoch: 31/50... Training Step: 1852... Training loss: 4.1194... 1.3656 sec/batch
Epoch: 31/50... Training Step: 1853... Training loss: 4.1224... 1.3938 sec/batch
Epoch: 31/50... Training Step: 1854... Training loss: 4.0773... 1.3683 sec/batch
Epoch: 31/50... Training Step: 1855... Training loss: 4.1259... 1.3431 sec/batch
Epoch: 31/50... Training Step: 1856... Training loss: 4.0653... 1.3626 sec/batch
Epoch: 31/50... Training Step: 1857... Training loss: 4.1642... 1.3891 sec/batch
Epoch: 31/50... Training Step: 1858... Training loss: 4.1718... 1.3784 sec/batch
Epoch: 31/50... Training Step: 1859... Training loss: 4.0741... 1.3948 sec/batch
Epoch: 31/50... Training Step: 1860... Training loss: 4.1467... 1.3632 sec/batch
Epoch: 31/50... Training Step: 1861... Training loss: 4.0734... 1.3885 sec/batch
Epoch: 31/50... Training Step: 1862... Training loss: 4.0628... 1.3573 sec/batch
Epoch: 31/50... Training Step: 1863... Training loss: 4.1406... 1.3912 sec/batch
Epoch: 31/50... Training Step: 1864... Training loss: 4.1172... 1.3983 sec/batch
Epoch: 31/50... Training Step: 1865... Training loss: 4.1511... 1.3832 sec/batch
Epoch: 31/50... Training Step: 1866... Training loss: 4.1117... 1.3726 sec/batch
Epoch: 31/50... Training Step: 1867... Training loss: 4.2125... 1.3739 sec/batch
Epoch: 31/50... Training Step: 1868... Training loss: 4.1627... 1.4007 sec/batch
Epoch: 31/50... Training Step: 1869... Training loss: 4.2680... 1.3571 sec/batch
Epoch: 31/50... Training Step: 1870... Training loss: 4.0867... 1.3728 sec/batch
Epoch: 31/50... Training Step: 1871... Training loss: 4.0995... 1.4146 sec/batch
Epoch: 31/50... Training Step: 1872... Training loss: 4.1121... 1.3941 sec/batch
Epoch: 31/50... Training Step: 1873... Training loss: 4.1499... 1.3598 sec/batch
Epoch: 31/50... Training Step: 1874... Training loss: 4.0544... 1.3814 sec/batch
Epoch: 31/50... Training Step: 1875... Training loss: 4.0969... 1.3881 sec/batch
Epoch: 31/50... Training Step: 1876... Training loss: 4.0910... 1.2361 sec/batch
Epoch: 31/50... Training Step: 1877... Training loss: 4.1218... 1.3947 sec/batch
Epoch: 31/50... Training Step: 1878... Training loss: 4.1033... 1.3747 sec/batch
Epoch: 31/50... Training Step: 1879... Training loss: 4.1497... 1.3004 sec/batch
Epoch: 31/50... Training Step: 1880... Training loss: 4.0886... 1.3670 sec/batch
Epoch: 31/50... Training Step: 1881... Training loss: 4.0681... 1.3857 sec/batch
Epoch: 31/50... Training Step: 1882... Training loss: 4.1697... 1.3675 sec/batch
Epoch: 31/50... Training Step: 1883... Training loss: 4.0653... 1.3785 sec/batch
Epoch: 31/50... Training Step: 1884... Training loss: 4.1053... 1.3513 sec/batch
Epoch: 31/50... Training Step: 1885... Training loss: 4.0454... 1.3860 sec/batch
Epoch: 31/50... Training Step: 1886... Training loss: 4.0150... 1.4052 sec/batch
Epoch: 31/50... Training Step: 1887... Training loss: 4.1478... 1.3884 sec/batch
Epoch: 31/50... Training Step: 1888... Training loss: 4.0817... 1.2810 sec/batch
Epoch: 31/50... Training Step: 1889... Training loss: 4.1150... 1.3585 sec/batch
Epoch: 31/50... Training Step: 1890... Training loss: 4.1212... 1.4039 sec/batch
Epoch: 31/50... Training Step: 1891... Training loss: 4.0895... 1.3494 sec/batch
Epoch: 32/50... Training Step: 1892... Training loss: 4.2617... 1.3733 sec/batch
Epoch: 32/50... Training Step: 1893... Training loss: 4.1045... 1.4076 sec/batch
Epoch: 32/50... Training Step: 1894... Training loss: 4.1101... 1.3667 sec/batch
Epoch: 32/50... Training Step: 1895... Training loss: 4.1416... 1.3266 sec/batch
Epoch: 32/50... Training Step: 1896... Training loss: 4.1168... 1.3599 sec/batch
Epoch: 32/50... Training Step: 1897... Training loss: 4.1727... 1.3915 sec/batch
Epoch: 32/50... Training Step: 1898... Training loss: 4.1904... 1.3872 sec/batch
Epoch: 32/50... Training Step: 1899... Training loss: 4.1598... 1.4098 sec/batch
Epoch: 32/50... Training Step: 1900... Training loss: 4.1432... 1.4041 sec/batch
Epoch: 32/50... Training Step: 1901... Training loss: 4.1748... 1.3288 sec/batch
Epoch: 32/50... Training Step: 1902... Training loss: 4.2785... 1.3722 sec/batch
Epoch: 32/50... Training Step: 1903... Training loss: 4.1220... 1.3509 sec/batch
Epoch: 32/50... Training Step: 1904... Training loss: 4.1809... 1.4197 sec/batch
Epoch: 32/50... Training Step: 1905... Training loss: 4.1863... 1.3194 sec/batch
Epoch: 32/50... Training Step: 1906... Training loss: 4.1024... 1.3729 sec/batch
Epoch: 32/50... Training Step: 1907... Training loss: 4.1063... 1.3788 sec/batch
Epoch: 32/50... Training Step: 1908... Training loss: 4.1541... 1.3996 sec/batch
Epoch: 32/50... Training Step: 1909... Training loss: 4.1900... 1.3577 sec/batch
Epoch: 32/50... Training Step: 1910... Training loss: 4.1806... 1.3896 sec/batch
Epoch: 32/50... Training Step: 1911... Training loss: 4.1855... 1.3214 sec/batch
Epoch: 32/50... Training Step: 1912... Training loss: 4.1964... 1.4020 sec/batch
Epoch: 32/50... Training Step: 1913... Training loss: 4.0937... 1.4022 sec/batch
Epoch: 32/50... Training Step: 1914... Training loss: 4.0950... 1.3742 sec/batch
Epoch: 32/50... Training Step: 1915... Training loss: 4.0744... 1.3875 sec/batch
Epoch: 32/50... Training Step: 1916... Training loss: 4.1052... 1.3538 sec/batch
Epoch: 32/50... Training Step: 1917... Training loss: 4.0650... 1.4134 sec/batch
Epoch: 32/50... Training Step: 1918... Training loss: 4.1501... 1.3854 sec/batch
Epoch: 32/50... Training Step: 1919... Training loss: 4.1555... 1.4078 sec/batch
Epoch: 32/50... Training Step: 1920... Training loss: 4.0642... 1.3705 sec/batch
Epoch: 32/50... Training Step: 1921... Training loss: 4.1308... 1.3726 sec/batch
Epoch: 32/50... Training Step: 1922... Training loss: 4.0600... 1.4109 sec/batch
Epoch: 32/50... Training Step: 1923... Training loss: 4.0524... 1.3764 sec/batch
Epoch: 32/50... Training Step: 1924... Training loss: 4.1307... 1.3905 sec/batch
Epoch: 32/50... Training Step: 1925... Training loss: 4.0916... 1.3716 sec/batch
Epoch: 32/50... Training Step: 1926... Training loss: 4.1462... 1.3624 sec/batch
Epoch: 32/50... Training Step: 1927... Training loss: 4.0839... 1.3509 sec/batch
Epoch: 32/50... Training Step: 1928... Training loss: 4.1902... 1.3946 sec/batch
Epoch: 32/50... Training Step: 1929... Training loss: 4.1333... 1.3668 sec/batch
Epoch: 32/50... Training Step: 1930... Training loss: 4.2392... 1.3739 sec/batch
Epoch: 32/50... Training Step: 1931... Training loss: 4.0718... 1.4091 sec/batch
Epoch: 32/50... Training Step: 1932... Training loss: 4.0703... 1.3885 sec/batch
Epoch: 32/50... Training Step: 1933... Training loss: 4.0825... 1.4053 sec/batch
Epoch: 32/50... Training Step: 1934... Training loss: 4.1341... 1.3687 sec/batch
Epoch: 32/50... Training Step: 1935... Training loss: 4.0491... 1.3703 sec/batch
Epoch: 32/50... Training Step: 1936... Training loss: 4.0713... 1.2849 sec/batch
Epoch: 32/50... Training Step: 1937... Training loss: 4.0833... 1.3912 sec/batch
Epoch: 32/50... Training Step: 1938... Training loss: 4.0946... 1.8376 sec/batch
Epoch: 32/50... Training Step: 1939... Training loss: 4.0848... 1.4008 sec/batch
Epoch: 32/50... Training Step: 1940... Training loss: 4.1380... 1.3779 sec/batch
Epoch: 32/50... Training Step: 1941... Training loss: 4.0769... 1.3803 sec/batch
Epoch: 32/50... Training Step: 1942... Training loss: 4.0425... 1.3918 sec/batch
Epoch: 32/50... Training Step: 1943... Training loss: 4.1424... 1.3658 sec/batch
Epoch: 32/50... Training Step: 1944... Training loss: 4.0362... 1.4255 sec/batch
Epoch: 32/50... Training Step: 1945... Training loss: 4.0783... 1.4143 sec/batch
Epoch: 32/50... Training Step: 1946... Training loss: 4.0208... 1.4178 sec/batch
Epoch: 32/50... Training Step: 1947... Training loss: 3.9886... 1.3487 sec/batch
Epoch: 32/50... Training Step: 1948... Training loss: 4.1359... 1.4190 sec/batch
Epoch: 32/50... Training Step: 1949... Training loss: 4.0702... 1.2748 sec/batch
Epoch: 32/50... Training Step: 1950... Training loss: 4.0831... 1.4417 sec/batch
Epoch: 32/50... Training Step: 1951... Training loss: 4.0859... 1.4265 sec/batch
Epoch: 32/50... Training Step: 1952... Training loss: 4.0738... 1.4157 sec/batch
Epoch: 33/50... Training Step: 1953... Training loss: 4.2346... 1.3606 sec/batch
Epoch: 33/50... Training Step: 1954... Training loss: 4.0919... 1.3575 sec/batch
Epoch: 33/50... Training Step: 1955... Training loss: 4.0860... 1.4498 sec/batch
Epoch: 33/50... Training Step: 1956... Training loss: 4.1352... 1.3999 sec/batch
Epoch: 33/50... Training Step: 1957... Training loss: 4.1091... 1.3921 sec/batch
Epoch: 33/50... Training Step: 1958... Training loss: 4.1692... 1.4106 sec/batch
Epoch: 33/50... Training Step: 1959... Training loss: 4.1649... 1.3897 sec/batch
Epoch: 33/50... Training Step: 1960... Training loss: 4.1393... 1.3933 sec/batch
Epoch: 33/50... Training Step: 1961... Training loss: 4.1135... 1.3449 sec/batch
Epoch: 33/50... Training Step: 1962... Training loss: 4.1491... 1.3994 sec/batch
Epoch: 33/50... Training Step: 1963... Training loss: 4.2655... 1.4043 sec/batch
Epoch: 33/50... Training Step: 1964... Training loss: 4.1059... 1.3641 sec/batch
Epoch: 33/50... Training Step: 1965... Training loss: 4.1566... 1.3927 sec/batch
Epoch: 33/50... Training Step: 1966... Training loss: 4.1575... 1.3848 sec/batch
Epoch: 33/50... Training Step: 1967... Training loss: 4.0904... 1.4263 sec/batch
Epoch: 33/50... Training Step: 1968... Training loss: 4.0992... 1.3989 sec/batch
Epoch: 33/50... Training Step: 1969... Training loss: 4.1447... 1.4254 sec/batch
Epoch: 33/50... Training Step: 1970... Training loss: 4.1686... 1.3794 sec/batch
Epoch: 33/50... Training Step: 1971... Training loss: 4.1630... 1.3804 sec/batch
Epoch: 33/50... Training Step: 1972... Training loss: 4.1561... 1.3678 sec/batch
Epoch: 33/50... Training Step: 1973... Training loss: 4.1876... 1.3866 sec/batch
Epoch: 33/50... Training Step: 1974... Training loss: 4.0796... 1.3898 sec/batch
Epoch: 33/50... Training Step: 1975... Training loss: 4.0724... 1.3855 sec/batch
Epoch: 33/50... Training Step: 1976... Training loss: 4.0518... 1.3904 sec/batch
Epoch: 33/50... Training Step: 1977... Training loss: 4.0740... 1.3806 sec/batch
Epoch: 33/50... Training Step: 1978... Training loss: 4.0257... 1.3841 sec/batch
Epoch: 33/50... Training Step: 1979... Training loss: 4.1271... 1.4130 sec/batch
Epoch: 33/50... Training Step: 1980... Training loss: 4.1249... 1.3724 sec/batch
Epoch: 33/50... Training Step: 1981... Training loss: 4.0322... 1.3088 sec/batch
Epoch: 33/50... Training Step: 1982... Training loss: 4.1138... 1.4093 sec/batch
Epoch: 33/50... Training Step: 1983... Training loss: 4.0417... 1.3962 sec/batch
Epoch: 33/50... Training Step: 1984... Training loss: 4.0167... 1.3596 sec/batch
Epoch: 33/50... Training Step: 1985... Training loss: 4.0968... 1.3618 sec/batch
Epoch: 33/50... Training Step: 1986... Training loss: 4.0828... 1.3991 sec/batch
Epoch: 33/50... Training Step: 1987... Training loss: 4.1175... 1.4006 sec/batch
Epoch: 33/50... Training Step: 1988... Training loss: 4.0767... 1.3818 sec/batch
Epoch: 33/50... Training Step: 1989... Training loss: 4.1618... 1.3983 sec/batch
Epoch: 33/50... Training Step: 1990... Training loss: 4.1088... 1.3991 sec/batch
Epoch: 33/50... Training Step: 1991... Training loss: 4.2035... 1.3892 sec/batch
Epoch: 33/50... Training Step: 1992... Training loss: 4.0488... 1.4076 sec/batch
Epoch: 33/50... Training Step: 1993... Training loss: 4.0420... 1.3670 sec/batch
Epoch: 33/50... Training Step: 1994... Training loss: 4.0620... 1.4099 sec/batch
Epoch: 33/50... Training Step: 1995... Training loss: 4.0972... 1.3744 sec/batch
Epoch: 33/50... Training Step: 1996... Training loss: 4.0246... 1.4298 sec/batch
Epoch: 33/50... Training Step: 1997... Training loss: 4.0650... 1.3819 sec/batch
Epoch: 33/50... Training Step: 1998... Training loss: 4.0490... 1.3946 sec/batch
Epoch: 33/50... Training Step: 1999... Training loss: 4.0830... 1.3898 sec/batch
Epoch: 33/50... Training Step: 2000... Training loss: 4.0506... 1.4125 sec/batch
Epoch: 33/50... Training Step: 2001... Training loss: 4.1129... 1.3895 sec/batch
Epoch: 33/50... Training Step: 2002... Training loss: 4.0669... 1.3357 sec/batch
Epoch: 33/50... Training Step: 2003... Training loss: 4.0196... 1.3988 sec/batch
Epoch: 33/50... Training Step: 2004... Training loss: 4.1156... 1.3955 sec/batch
Epoch: 33/50... Training Step: 2005... Training loss: 4.0114... 1.3313 sec/batch
Epoch: 33/50... Training Step: 2006... Training loss: 4.0584... 1.3639 sec/batch
Epoch: 33/50... Training Step: 2007... Training loss: 4.0044... 1.3859 sec/batch
Epoch: 33/50... Training Step: 2008... Training loss: 3.9749... 1.3552 sec/batch
Epoch: 33/50... Training Step: 2009... Training loss: 4.1190... 1.4077 sec/batch
Epoch: 33/50... Training Step: 2010... Training loss: 4.0287... 1.3928 sec/batch
Epoch: 33/50... Training Step: 2011... Training loss: 4.0530... 1.3002 sec/batch
Epoch: 33/50... Training Step: 2012... Training loss: 4.0612... 1.4014 sec/batch
Epoch: 33/50... Training Step: 2013... Training loss: 4.0527... 1.3873 sec/batch
Epoch: 34/50... Training Step: 2014... Training loss: 4.2216... 1.4156 sec/batch
Epoch: 34/50... Training Step: 2015... Training loss: 4.0722... 1.3795 sec/batch
Epoch: 34/50... Training Step: 2016... Training loss: 4.0608... 1.3793 sec/batch
Epoch: 34/50... Training Step: 2017... Training loss: 4.1040... 1.3563 sec/batch
Epoch: 34/50... Training Step: 2018... Training loss: 4.0860... 1.4048 sec/batch
Epoch: 34/50... Training Step: 2019... Training loss: 4.1408... 1.3693 sec/batch
Epoch: 34/50... Training Step: 2020... Training loss: 4.1440... 1.3922 sec/batch
Epoch: 34/50... Training Step: 2021... Training loss: 4.1056... 1.3153 sec/batch
Epoch: 34/50... Training Step: 2022... Training loss: 4.0936... 1.4127 sec/batch
Epoch: 34/50... Training Step: 2023... Training loss: 4.1238... 1.4090 sec/batch
Epoch: 34/50... Training Step: 2024... Training loss: 4.2449... 1.4003 sec/batch
Epoch: 34/50... Training Step: 2025... Training loss: 4.0960... 1.4165 sec/batch
Epoch: 34/50... Training Step: 2026... Training loss: 4.1374... 1.3876 sec/batch
Epoch: 34/50... Training Step: 2027... Training loss: 4.1438... 1.4062 sec/batch
Epoch: 34/50... Training Step: 2028... Training loss: 4.0670... 1.3853 sec/batch
Epoch: 34/50... Training Step: 2029... Training loss: 4.0732... 1.3926 sec/batch
Epoch: 34/50... Training Step: 2030... Training loss: 4.1175... 1.4209 sec/batch
Epoch: 34/50... Training Step: 2031... Training loss: 4.1385... 1.4544 sec/batch
Epoch: 34/50... Training Step: 2032... Training loss: 4.1352... 1.3958 sec/batch
Epoch: 34/50... Training Step: 2033... Training loss: 4.1408... 1.3677 sec/batch
Epoch: 34/50... Training Step: 2034... Training loss: 4.1503... 1.3806 sec/batch
Epoch: 34/50... Training Step: 2035... Training loss: 4.0460... 1.3968 sec/batch
Epoch: 34/50... Training Step: 2036... Training loss: 4.0477... 1.4238 sec/batch
Epoch: 34/50... Training Step: 2037... Training loss: 4.0302... 1.3744 sec/batch
Epoch: 34/50... Training Step: 2038... Training loss: 4.0591... 1.3899 sec/batch
Epoch: 34/50... Training Step: 2039... Training loss: 4.0133... 1.4038 sec/batch
Epoch: 34/50... Training Step: 2040... Training loss: 4.0907... 1.4134 sec/batch
Epoch: 34/50... Training Step: 2041... Training loss: 4.1192... 1.4184 sec/batch
Epoch: 34/50... Training Step: 2042... Training loss: 4.0192... 1.3842 sec/batch
Epoch: 34/50... Training Step: 2043... Training loss: 4.0705... 1.4011 sec/batch
Epoch: 34/50... Training Step: 2044... Training loss: 4.0099... 1.3798 sec/batch
Epoch: 34/50... Training Step: 2045... Training loss: 4.0023... 1.4037 sec/batch
Epoch: 34/50... Training Step: 2046... Training loss: 4.0737... 1.4060 sec/batch
Epoch: 34/50... Training Step: 2047... Training loss: 4.0557... 1.4231 sec/batch
Epoch: 34/50... Training Step: 2048... Training loss: 4.0976... 1.3212 sec/batch
Epoch: 34/50... Training Step: 2049... Training loss: 4.0541... 1.3971 sec/batch
Epoch: 34/50... Training Step: 2050... Training loss: 4.1459... 1.4018 sec/batch
Epoch: 34/50... Training Step: 2051... Training loss: 4.0971... 1.3846 sec/batch
Epoch: 34/50... Training Step: 2052... Training loss: 4.1931... 1.3966 sec/batch
Epoch: 34/50... Training Step: 2053... Training loss: 4.0227... 1.3943 sec/batch
Epoch: 34/50... Training Step: 2054... Training loss: 4.0397... 1.4003 sec/batch
Epoch: 34/50... Training Step: 2055... Training loss: 4.0445... 1.3564 sec/batch
Epoch: 34/50... Training Step: 2056... Training loss: 4.0849... 1.3593 sec/batch
Epoch: 34/50... Training Step: 2057... Training loss: 4.0031... 1.3996 sec/batch
Epoch: 34/50... Training Step: 2058... Training loss: 4.0386... 1.4130 sec/batch
Epoch: 34/50... Training Step: 2059... Training loss: 4.0299... 1.4072 sec/batch
Epoch: 34/50... Training Step: 2060... Training loss: 4.0486... 1.4105 sec/batch
Epoch: 34/50... Training Step: 2061... Training loss: 4.0273... 1.3365 sec/batch
Epoch: 34/50... Training Step: 2062... Training loss: 4.0838... 1.3924 sec/batch
Epoch: 34/50... Training Step: 2063... Training loss: 4.0353... 1.4036 sec/batch
Epoch: 34/50... Training Step: 2064... Training loss: 4.0009... 1.3881 sec/batch
Epoch: 34/50... Training Step: 2065... Training loss: 4.1033... 1.3420 sec/batch
Epoch: 34/50... Training Step: 2066... Training loss: 3.9904... 1.3820 sec/batch
Epoch: 34/50... Training Step: 2067... Training loss: 4.0418... 1.4021 sec/batch
Epoch: 34/50... Training Step: 2068... Training loss: 3.9726... 1.4057 sec/batch
Epoch: 34/50... Training Step: 2069... Training loss: 3.9664... 1.3695 sec/batch
Epoch: 34/50... Training Step: 2070... Training loss: 4.0785... 1.3809 sec/batch
Epoch: 34/50... Training Step: 2071... Training loss: 4.0171... 1.3918 sec/batch
Epoch: 34/50... Training Step: 2072... Training loss: 4.0500... 1.3184 sec/batch
Epoch: 34/50... Training Step: 2073... Training loss: 4.0479... 1.3929 sec/batch
Epoch: 34/50... Training Step: 2074... Training loss: 4.0394... 1.3784 sec/batch
Epoch: 35/50... Training Step: 2075... Training loss: 4.1920... 1.4095 sec/batch
Epoch: 35/50... Training Step: 2076... Training loss: 4.0470... 1.4151 sec/batch
Epoch: 35/50... Training Step: 2077... Training loss: 4.0484... 1.4005 sec/batch
Epoch: 35/50... Training Step: 2078... Training loss: 4.0777... 1.4071 sec/batch
Epoch: 35/50... Training Step: 2079... Training loss: 4.0669... 1.4674 sec/batch
Epoch: 35/50... Training Step: 2080... Training loss: 4.1324... 1.3991 sec/batch
Epoch: 35/50... Training Step: 2081... Training loss: 4.1369... 1.3732 sec/batch
Epoch: 35/50... Training Step: 2082... Training loss: 4.1022... 1.4144 sec/batch
Epoch: 35/50... Training Step: 2083... Training loss: 4.0833... 1.3990 sec/batch
Epoch: 35/50... Training Step: 2084... Training loss: 4.1128... 1.4168 sec/batch
Epoch: 35/50... Training Step: 2085... Training loss: 4.2288... 1.3976 sec/batch
Epoch: 35/50... Training Step: 2086... Training loss: 4.0622... 1.4132 sec/batch
Epoch: 35/50... Training Step: 2087... Training loss: 4.1253... 1.3007 sec/batch
Epoch: 35/50... Training Step: 2088... Training loss: 4.1177... 1.4194 sec/batch
Epoch: 35/50... Training Step: 2089... Training loss: 4.0523... 1.4030 sec/batch
Epoch: 35/50... Training Step: 2090... Training loss: 4.0575... 1.4157 sec/batch
Epoch: 35/50... Training Step: 2091... Training loss: 4.0919... 1.4010 sec/batch
Epoch: 35/50... Training Step: 2092... Training loss: 4.1319... 1.3544 sec/batch
Epoch: 35/50... Training Step: 2093... Training loss: 4.1065... 1.3473 sec/batch
Epoch: 35/50... Training Step: 2094... Training loss: 4.1311... 1.4097 sec/batch
Epoch: 35/50... Training Step: 2095... Training loss: 4.1434... 1.3620 sec/batch
Epoch: 35/50... Training Step: 2096... Training loss: 4.0377... 1.3329 sec/batch
Epoch: 35/50... Training Step: 2097... Training loss: 4.0331... 1.3729 sec/batch
Epoch: 35/50... Training Step: 2098... Training loss: 3.9903... 1.3978 sec/batch
Epoch: 35/50... Training Step: 2099... Training loss: 4.0422... 1.3910 sec/batch
Epoch: 35/50... Training Step: 2100... Training loss: 3.9933... 1.2949 sec/batch
Epoch: 35/50... Training Step: 2101... Training loss: 4.0759... 1.3631 sec/batch
Epoch: 35/50... Training Step: 2102... Training loss: 4.0922... 1.3581 sec/batch
Epoch: 35/50... Training Step: 2103... Training loss: 3.9968... 1.3961 sec/batch
Epoch: 35/50... Training Step: 2104... Training loss: 4.0529... 1.4213 sec/batch
Epoch: 35/50... Training Step: 2105... Training loss: 3.9977... 1.3612 sec/batch
Epoch: 35/50... Training Step: 2106... Training loss: 3.9951... 1.4014 sec/batch
Epoch: 35/50... Training Step: 2107... Training loss: 4.0565... 1.4224 sec/batch
Epoch: 35/50... Training Step: 2108... Training loss: 4.0327... 1.4131 sec/batch
Epoch: 35/50... Training Step: 2109... Training loss: 4.0816... 1.4740 sec/batch
Epoch: 35/50... Training Step: 2110... Training loss: 4.0230... 1.3974 sec/batch
Epoch: 35/50... Training Step: 2111... Training loss: 4.1169... 1.3275 sec/batch
Epoch: 35/50... Training Step: 2112... Training loss: 4.0705... 1.4033 sec/batch
Epoch: 35/50... Training Step: 2113... Training loss: 4.1821... 1.3846 sec/batch
Epoch: 35/50... Training Step: 2114... Training loss: 4.0055... 1.3929 sec/batch
Epoch: 35/50... Training Step: 2115... Training loss: 4.0153... 1.4044 sec/batch
Epoch: 35/50... Training Step: 2116... Training loss: 4.0186... 1.4066 sec/batch
Epoch: 35/50... Training Step: 2117... Training loss: 4.0582... 1.3921 sec/batch
Epoch: 35/50... Training Step: 2118... Training loss: 3.9880... 1.3299 sec/batch
Epoch: 35/50... Training Step: 2119... Training loss: 4.0114... 1.3887 sec/batch
Epoch: 35/50... Training Step: 2120... Training loss: 4.0341... 1.3800 sec/batch
Epoch: 35/50... Training Step: 2121... Training loss: 4.0345... 1.4140 sec/batch
Epoch: 35/50... Training Step: 2122... Training loss: 4.0165... 1.4037 sec/batch
Epoch: 35/50... Training Step: 2123... Training loss: 4.0666... 1.3876 sec/batch
Epoch: 35/50... Training Step: 2124... Training loss: 4.0095... 1.3573 sec/batch
Epoch: 35/50... Training Step: 2125... Training loss: 3.9883... 1.3464 sec/batch
Epoch: 35/50... Training Step: 2126... Training loss: 4.0878... 1.3533 sec/batch
Epoch: 35/50... Training Step: 2127... Training loss: 3.9740... 1.2862 sec/batch
Epoch: 35/50... Training Step: 2128... Training loss: 4.0270... 1.3872 sec/batch
Epoch: 35/50... Training Step: 2129... Training loss: 3.9667... 1.3860 sec/batch
Epoch: 35/50... Training Step: 2130... Training loss: 3.9324... 1.3772 sec/batch
Epoch: 35/50... Training Step: 2131... Training loss: 4.0638... 1.3521 sec/batch
Epoch: 35/50... Training Step: 2132... Training loss: 4.0060... 1.3973 sec/batch
Epoch: 35/50... Training Step: 2133... Training loss: 4.0208... 1.3180 sec/batch
Epoch: 35/50... Training Step: 2134... Training loss: 4.0316... 1.4187 sec/batch
Epoch: 35/50... Training Step: 2135... Training loss: 4.0060... 1.3823 sec/batch
Epoch: 36/50... Training Step: 2136... Training loss: 4.1854... 1.3877 sec/batch
Epoch: 36/50... Training Step: 2137... Training loss: 4.0283... 1.3759 sec/batch
Epoch: 36/50... Training Step: 2138... Training loss: 4.0218... 1.4124 sec/batch
Epoch: 36/50... Training Step: 2139... Training loss: 4.0697... 1.3982 sec/batch
Epoch: 36/50... Training Step: 2140... Training loss: 4.0442... 1.3911 sec/batch
Epoch: 36/50... Training Step: 2141... Training loss: 4.1038... 1.3916 sec/batch
Epoch: 36/50... Training Step: 2142... Training loss: 4.1155... 1.3829 sec/batch
Epoch: 36/50... Training Step: 2143... Training loss: 4.0709... 1.3974 sec/batch
Epoch: 36/50... Training Step: 2144... Training loss: 4.0516... 1.4185 sec/batch
Epoch: 36/50... Training Step: 2145... Training loss: 4.0857... 1.4198 sec/batch
Epoch: 36/50... Training Step: 2146... Training loss: 4.2162... 1.3992 sec/batch
Epoch: 36/50... Training Step: 2147... Training loss: 4.0415... 1.4184 sec/batch
Epoch: 36/50... Training Step: 2148... Training loss: 4.1039... 1.4263 sec/batch
Epoch: 36/50... Training Step: 2149... Training loss: 4.1003... 1.4131 sec/batch
Epoch: 36/50... Training Step: 2150... Training loss: 4.0305... 1.3568 sec/batch
Epoch: 36/50... Training Step: 2151... Training loss: 4.0262... 1.4146 sec/batch
Epoch: 36/50... Training Step: 2152... Training loss: 4.0728... 1.3959 sec/batch
Epoch: 36/50... Training Step: 2153... Training loss: 4.1125... 1.3799 sec/batch
Epoch: 36/50... Training Step: 2154... Training loss: 4.1022... 1.4174 sec/batch
Epoch: 36/50... Training Step: 2155... Training loss: 4.1164... 1.4317 sec/batch
Epoch: 36/50... Training Step: 2156... Training loss: 4.1263... 1.4255 sec/batch
Epoch: 36/50... Training Step: 2157... Training loss: 4.0247... 1.4191 sec/batch
Epoch: 36/50... Training Step: 2158... Training loss: 4.0053... 1.3044 sec/batch
Epoch: 36/50... Training Step: 2159... Training loss: 3.9945... 1.3897 sec/batch
Epoch: 36/50... Training Step: 2160... Training loss: 4.0077... 1.3935 sec/batch
Epoch: 36/50... Training Step: 2161... Training loss: 3.9695... 1.3906 sec/batch
Epoch: 36/50... Training Step: 2162... Training loss: 4.0710... 1.3989 sec/batch
Epoch: 36/50... Training Step: 2163... Training loss: 4.0724... 1.3055 sec/batch
Epoch: 36/50... Training Step: 2164... Training loss: 3.9822... 1.3921 sec/batch
Epoch: 36/50... Training Step: 2165... Training loss: 4.0347... 1.3934 sec/batch
Epoch: 36/50... Training Step: 2166... Training loss: 3.9820... 1.4034 sec/batch
Epoch: 36/50... Training Step: 2167... Training loss: 3.9619... 1.3950 sec/batch
Epoch: 36/50... Training Step: 2168... Training loss: 4.0371... 1.3562 sec/batch
Epoch: 36/50... Training Step: 2169... Training loss: 4.0214... 1.4125 sec/batch
Epoch: 36/50... Training Step: 2170... Training loss: 4.0737... 1.4042 sec/batch
Epoch: 36/50... Training Step: 2171... Training loss: 4.0056... 1.3735 sec/batch
Epoch: 36/50... Training Step: 2172... Training loss: 4.1109... 1.3838 sec/batch
Epoch: 36/50... Training Step: 2173... Training loss: 4.0677... 1.3400 sec/batch
Epoch: 36/50... Training Step: 2174... Training loss: 4.1491... 1.3447 sec/batch
Epoch: 36/50... Training Step: 2175... Training loss: 3.9839... 1.3943 sec/batch
Epoch: 36/50... Training Step: 2176... Training loss: 4.0003... 1.4270 sec/batch
Epoch: 36/50... Training Step: 2177... Training loss: 4.0038... 1.3793 sec/batch
Epoch: 36/50... Training Step: 2178... Training loss: 4.0464... 1.3790 sec/batch
Epoch: 36/50... Training Step: 2179... Training loss: 3.9598... 1.3910 sec/batch
Epoch: 36/50... Training Step: 2180... Training loss: 4.0048... 1.3725 sec/batch
Epoch: 36/50... Training Step: 2181... Training loss: 3.9967... 1.4048 sec/batch
Epoch: 36/50... Training Step: 2182... Training loss: 4.0127... 1.3934 sec/batch
Epoch: 36/50... Training Step: 2183... Training loss: 3.9937... 1.3603 sec/batch
Epoch: 36/50... Training Step: 2184... Training loss: 4.0487... 1.4197 sec/batch
Epoch: 36/50... Training Step: 2185... Training loss: 3.9898... 1.3873 sec/batch
Epoch: 36/50... Training Step: 2186... Training loss: 3.9711... 1.3243 sec/batch
Epoch: 36/50... Training Step: 2187... Training loss: 4.0694... 1.4126 sec/batch
Epoch: 36/50... Training Step: 2188... Training loss: 3.9780... 1.4058 sec/batch
Epoch: 36/50... Training Step: 2189... Training loss: 4.0057... 1.3522 sec/batch
Epoch: 36/50... Training Step: 2190... Training loss: 3.9549... 1.4006 sec/batch
Epoch: 36/50... Training Step: 2191... Training loss: 3.9107... 1.4033 sec/batch
Epoch: 36/50... Training Step: 2192... Training loss: 4.0450... 1.3769 sec/batch
Epoch: 36/50... Training Step: 2193... Training loss: 3.9898... 1.3962 sec/batch
Epoch: 36/50... Training Step: 2194... Training loss: 4.0177... 1.3951 sec/batch
Epoch: 36/50... Training Step: 2195... Training loss: 4.0119... 1.3646 sec/batch
Epoch: 36/50... Training Step: 2196... Training loss: 4.0029... 1.3362 sec/batch
Epoch: 37/50... Training Step: 2197... Training loss: 4.1586... 1.3755 sec/batch
Epoch: 37/50... Training Step: 2198... Training loss: 4.0109... 1.4117 sec/batch
Epoch: 37/50... Training Step: 2199... Training loss: 4.0149... 1.3938 sec/batch
Epoch: 37/50... Training Step: 2200... Training loss: 4.0427... 1.3869 sec/batch
Epoch: 37/50... Training Step: 2201... Training loss: 4.0146... 1.4100 sec/batch
Epoch: 37/50... Training Step: 2202... Training loss: 4.0834... 1.3875 sec/batch
Epoch: 37/50... Training Step: 2203... Training loss: 4.0842... 1.2869 sec/batch
Epoch: 37/50... Training Step: 2204... Training loss: 4.0457... 1.3845 sec/batch
Epoch: 37/50... Training Step: 2205... Training loss: 4.0305... 1.3765 sec/batch
Epoch: 37/50... Training Step: 2206... Training loss: 4.0650... 1.3814 sec/batch
Epoch: 37/50... Training Step: 2207... Training loss: 4.1933... 1.3975 sec/batch
Epoch: 37/50... Training Step: 2208... Training loss: 4.0322... 1.2323 sec/batch
Epoch: 37/50... Training Step: 2209... Training loss: 4.0787... 1.4537 sec/batch
Epoch: 37/50... Training Step: 2210... Training loss: 4.0773... 1.3746 sec/batch
Epoch: 37/50... Training Step: 2211... Training loss: 4.0134... 1.3352 sec/batch
Epoch: 37/50... Training Step: 2212... Training loss: 4.0132... 1.3611 sec/batch
Epoch: 37/50... Training Step: 2213... Training loss: 4.0518... 1.3718 sec/batch
Epoch: 37/50... Training Step: 2214... Training loss: 4.0790... 1.3932 sec/batch
Epoch: 37/50... Training Step: 2215... Training loss: 4.0819... 1.3875 sec/batch
Epoch: 37/50... Training Step: 2216... Training loss: 4.0871... 1.4112 sec/batch
Epoch: 37/50... Training Step: 2217... Training loss: 4.1004... 1.3228 sec/batch
Epoch: 37/50... Training Step: 2218... Training loss: 4.0011... 1.4539 sec/batch
Epoch: 37/50... Training Step: 2219... Training loss: 3.9781... 1.3451 sec/batch
Epoch: 37/50... Training Step: 2220... Training loss: 3.9757... 1.4112 sec/batch
Epoch: 37/50... Training Step: 2221... Training loss: 3.9914... 1.3858 sec/batch
Epoch: 37/50... Training Step: 2222... Training loss: 3.9504... 1.4215 sec/batch
Epoch: 37/50... Training Step: 2223... Training loss: 4.0362... 1.3828 sec/batch
Epoch: 37/50... Training Step: 2224... Training loss: 4.0525... 1.3856 sec/batch
Epoch: 37/50... Training Step: 2225... Training loss: 3.9644... 1.3304 sec/batch
Epoch: 37/50... Training Step: 2226... Training loss: 4.0147... 1.3795 sec/batch
Epoch: 37/50... Training Step: 2227... Training loss: 3.9594... 1.4083 sec/batch
Epoch: 37/50... Training Step: 2228... Training loss: 3.9500... 1.3795 sec/batch
Epoch: 37/50... Training Step: 2229... Training loss: 4.0168... 1.3997 sec/batch
Epoch: 37/50... Training Step: 2230... Training loss: 3.9955... 1.4206 sec/batch
Epoch: 37/50... Training Step: 2231... Training loss: 4.0427... 1.4547 sec/batch
Epoch: 37/50... Training Step: 2232... Training loss: 3.9993... 1.4055 sec/batch
Epoch: 37/50... Training Step: 2233... Training loss: 4.0869... 1.3871 sec/batch
Epoch: 37/50... Training Step: 2234... Training loss: 4.0442... 1.4133 sec/batch
Epoch: 37/50... Training Step: 2235... Training loss: 4.1369... 1.3915 sec/batch
Epoch: 37/50... Training Step: 2236... Training loss: 3.9688... 1.3807 sec/batch
Epoch: 37/50... Training Step: 2237... Training loss: 3.9727... 1.3988 sec/batch
Epoch: 37/50... Training Step: 2238... Training loss: 3.9835... 1.3307 sec/batch
Epoch: 37/50... Training Step: 2239... Training loss: 4.0316... 1.3624 sec/batch
Epoch: 37/50... Training Step: 2240... Training loss: 3.9512... 1.3407 sec/batch
Epoch: 37/50... Training Step: 2241... Training loss: 3.9684... 1.3827 sec/batch
Epoch: 37/50... Training Step: 2242... Training loss: 3.9749... 1.3849 sec/batch
Epoch: 37/50... Training Step: 2243... Training loss: 3.9976... 1.4006 sec/batch
Epoch: 37/50... Training Step: 2244... Training loss: 3.9834... 1.4179 sec/batch
Epoch: 37/50... Training Step: 2245... Training loss: 4.0335... 1.3610 sec/batch
Epoch: 37/50... Training Step: 2246... Training loss: 3.9722... 1.3665 sec/batch
Epoch: 37/50... Training Step: 2247... Training loss: 3.9462... 1.3971 sec/batch
Epoch: 37/50... Training Step: 2248... Training loss: 4.0314... 1.3851 sec/batch
Epoch: 37/50... Training Step: 2249... Training loss: 3.9371... 1.4341 sec/batch
Epoch: 37/50... Training Step: 2250... Training loss: 3.9900... 1.4032 sec/batch
Epoch: 37/50... Training Step: 2251... Training loss: 3.9237... 1.4037 sec/batch
Epoch: 37/50... Training Step: 2252... Training loss: 3.9024... 1.3555 sec/batch
Epoch: 37/50... Training Step: 2253... Training loss: 4.0262... 1.3903 sec/batch
Epoch: 37/50... Training Step: 2254... Training loss: 3.9555... 1.3999 sec/batch
Epoch: 37/50... Training Step: 2255... Training loss: 3.9909... 1.3919 sec/batch
Epoch: 37/50... Training Step: 2256... Training loss: 3.9974... 1.3938 sec/batch
Epoch: 37/50... Training Step: 2257... Training loss: 3.9799... 1.3331 sec/batch
Epoch: 38/50... Training Step: 2258... Training loss: 4.1268... 1.4193 sec/batch
Epoch: 38/50... Training Step: 2259... Training loss: 3.9975... 1.3922 sec/batch
Epoch: 38/50... Training Step: 2260... Training loss: 3.9855... 1.3890 sec/batch
Epoch: 38/50... Training Step: 2261... Training loss: 4.0279... 1.3894 sec/batch
Epoch: 38/50... Training Step: 2262... Training loss: 4.0094... 1.4129 sec/batch
Epoch: 38/50... Training Step: 2263... Training loss: 4.0704... 1.3810 sec/batch
Epoch: 38/50... Training Step: 2264... Training loss: 4.0702... 1.3904 sec/batch
Epoch: 38/50... Training Step: 2265... Training loss: 4.0364... 1.4249 sec/batch
Epoch: 38/50... Training Step: 2266... Training loss: 4.0076... 1.3821 sec/batch
Epoch: 38/50... Training Step: 2267... Training loss: 4.0544... 1.4081 sec/batch
Epoch: 38/50... Training Step: 2268... Training loss: 4.1763... 1.3872 sec/batch
Epoch: 38/50... Training Step: 2269... Training loss: 4.0129... 1.3696 sec/batch
Epoch: 38/50... Training Step: 2270... Training loss: 4.0666... 1.3641 sec/batch
Epoch: 38/50... Training Step: 2271... Training loss: 4.0773... 1.4022 sec/batch
Epoch: 38/50... Training Step: 2272... Training loss: 3.9829... 1.4150 sec/batch
Epoch: 38/50... Training Step: 2273... Training loss: 3.9841... 1.4073 sec/batch
Epoch: 38/50... Training Step: 2274... Training loss: 4.0300... 1.3718 sec/batch
Epoch: 38/50... Training Step: 2275... Training loss: 4.0671... 1.3555 sec/batch
Epoch: 38/50... Training Step: 2276... Training loss: 4.0605... 1.4348 sec/batch
Epoch: 38/50... Training Step: 2277... Training loss: 4.0688... 1.3869 sec/batch
Epoch: 38/50... Training Step: 2278... Training loss: 4.0883... 1.3925 sec/batch
Epoch: 38/50... Training Step: 2279... Training loss: 3.9871... 1.4343 sec/batch
Epoch: 38/50... Training Step: 2280... Training loss: 3.9601... 1.4324 sec/batch
Epoch: 38/50... Training Step: 2281... Training loss: 3.9473... 1.3948 sec/batch
Epoch: 38/50... Training Step: 2282... Training loss: 3.9693... 1.4379 sec/batch
Epoch: 38/50... Training Step: 2283... Training loss: 3.9532... 1.4292 sec/batch
Epoch: 38/50... Training Step: 2284... Training loss: 4.0250... 1.4309 sec/batch
Epoch: 38/50... Training Step: 2285... Training loss: 4.0350... 1.4068 sec/batch
Epoch: 38/50... Training Step: 2286... Training loss: 3.9494... 1.3422 sec/batch
Epoch: 38/50... Training Step: 2287... Training loss: 4.0041... 1.3991 sec/batch
Epoch: 38/50... Training Step: 2288... Training loss: 3.9387... 1.3836 sec/batch
Epoch: 38/50... Training Step: 2289... Training loss: 3.9445... 1.4238 sec/batch
Epoch: 38/50... Training Step: 2290... Training loss: 4.0034... 1.4069 sec/batch
Epoch: 38/50... Training Step: 2291... Training loss: 3.9877... 1.3781 sec/batch
Epoch: 38/50... Training Step: 2292... Training loss: 4.0196... 1.3696 sec/batch
Epoch: 38/50... Training Step: 2293... Training loss: 3.9745... 1.3878 sec/batch
Epoch: 38/50... Training Step: 2294... Training loss: 4.0711... 1.3939 sec/batch
Epoch: 38/50... Training Step: 2295... Training loss: 4.0101... 1.3199 sec/batch
Epoch: 38/50... Training Step: 2296... Training loss: 4.1110... 1.3808 sec/batch
Epoch: 38/50... Training Step: 2297... Training loss: 3.9426... 1.3826 sec/batch
Epoch: 38/50... Training Step: 2298... Training loss: 3.9517... 1.3939 sec/batch
Epoch: 38/50... Training Step: 2299... Training loss: 3.9636... 1.3279 sec/batch
Epoch: 38/50... Training Step: 2300... Training loss: 4.0056... 1.3948 sec/batch
Epoch: 38/50... Training Step: 2301... Training loss: 3.9264... 1.3758 sec/batch
Epoch: 38/50... Training Step: 2302... Training loss: 3.9633... 1.3666 sec/batch
Epoch: 38/50... Training Step: 2303... Training loss: 3.9586... 1.3581 sec/batch
Epoch: 38/50... Training Step: 2304... Training loss: 3.9970... 1.3984 sec/batch
Epoch: 38/50... Training Step: 2305... Training loss: 3.9526... 1.4065 sec/batch
Epoch: 38/50... Training Step: 2306... Training loss: 4.0066... 1.3860 sec/batch
Epoch: 38/50... Training Step: 2307... Training loss: 3.9460... 1.3345 sec/batch
Epoch: 38/50... Training Step: 2308... Training loss: 3.9216... 1.3705 sec/batch
Epoch: 38/50... Training Step: 2309... Training loss: 4.0354... 1.3654 sec/batch
Epoch: 38/50... Training Step: 2310... Training loss: 3.9232... 1.3291 sec/batch
Epoch: 38/50... Training Step: 2311... Training loss: 3.9642... 1.3983 sec/batch
Epoch: 38/50... Training Step: 2312... Training loss: 3.8971... 1.3383 sec/batch
Epoch: 38/50... Training Step: 2313... Training loss: 3.8947... 1.3715 sec/batch
Epoch: 38/50... Training Step: 2314... Training loss: 4.0086... 1.3700 sec/batch
Epoch: 38/50... Training Step: 2315... Training loss: 3.9457... 1.3719 sec/batch
Epoch: 38/50... Training Step: 2316... Training loss: 3.9732... 1.3947 sec/batch
Epoch: 38/50... Training Step: 2317... Training loss: 3.9809... 1.3880 sec/batch
Epoch: 38/50... Training Step: 2318... Training loss: 3.9512... 1.3810 sec/batch
Epoch: 39/50... Training Step: 2319... Training loss: 4.1120... 1.3980 sec/batch
Epoch: 39/50... Training Step: 2320... Training loss: 3.9784... 1.3902 sec/batch
Epoch: 39/50... Training Step: 2321... Training loss: 3.9734... 1.3967 sec/batch
Epoch: 39/50... Training Step: 2322... Training loss: 4.0194... 1.3869 sec/batch
Epoch: 39/50... Training Step: 2323... Training loss: 3.9843... 1.3824 sec/batch
Epoch: 39/50... Training Step: 2324... Training loss: 4.0393... 1.3835 sec/batch
Epoch: 39/50... Training Step: 2325... Training loss: 4.0526... 1.4232 sec/batch
Epoch: 39/50... Training Step: 2326... Training loss: 4.0225... 1.4124 sec/batch
Epoch: 39/50... Training Step: 2327... Training loss: 3.9901... 1.4469 sec/batch
Epoch: 39/50... Training Step: 2328... Training loss: 4.0379... 1.3988 sec/batch
Epoch: 39/50... Training Step: 2329... Training loss: 4.1406... 1.3215 sec/batch
Epoch: 39/50... Training Step: 2330... Training loss: 3.9770... 1.4149 sec/batch
Epoch: 39/50... Training Step: 2331... Training loss: 4.0461... 1.3783 sec/batch
Epoch: 39/50... Training Step: 2332... Training loss: 4.0415... 1.3949 sec/batch
Epoch: 39/50... Training Step: 2333... Training loss: 3.9785... 1.4037 sec/batch
Epoch: 39/50... Training Step: 2334... Training loss: 3.9591... 1.3844 sec/batch
Epoch: 39/50... Training Step: 2335... Training loss: 4.0086... 1.3947 sec/batch
Epoch: 39/50... Training Step: 2336... Training loss: 4.0563... 1.3896 sec/batch
Epoch: 39/50... Training Step: 2337... Training loss: 4.0379... 1.3951 sec/batch
Epoch: 39/50... Training Step: 2338... Training loss: 4.0490... 1.4065 sec/batch
Epoch: 39/50... Training Step: 2339... Training loss: 4.0731... 1.4006 sec/batch
Epoch: 39/50... Training Step: 2340... Training loss: 3.9679... 1.3828 sec/batch
Epoch: 39/50... Training Step: 2341... Training loss: 3.9433... 1.3469 sec/batch
Epoch: 39/50... Training Step: 2342... Training loss: 3.9188... 1.3777 sec/batch
Epoch: 39/50... Training Step: 2343... Training loss: 3.9435... 1.3782 sec/batch
Epoch: 39/50... Training Step: 2344... Training loss: 3.9134... 1.3759 sec/batch
Epoch: 39/50... Training Step: 2345... Training loss: 4.0119... 1.4200 sec/batch
Epoch: 39/50... Training Step: 2346... Training loss: 4.0114... 1.3905 sec/batch
Epoch: 39/50... Training Step: 2347... Training loss: 3.9280... 1.3527 sec/batch
Epoch: 39/50... Training Step: 2348... Training loss: 3.9961... 1.3999 sec/batch
Epoch: 39/50... Training Step: 2349... Training loss: 3.9408... 1.3946 sec/batch
Epoch: 39/50... Training Step: 2350... Training loss: 3.9004... 1.3874 sec/batch
Epoch: 39/50... Training Step: 2351... Training loss: 3.9804... 1.3242 sec/batch
Epoch: 39/50... Training Step: 2352... Training loss: 3.9623... 1.4301 sec/batch
Epoch: 39/50... Training Step: 2353... Training loss: 3.9960... 1.4193 sec/batch
Epoch: 39/50... Training Step: 2354... Training loss: 3.9518... 1.3927 sec/batch
Epoch: 39/50... Training Step: 2355... Training loss: 4.0437... 1.3572 sec/batch
Epoch: 39/50... Training Step: 2356... Training loss: 4.0012... 1.3988 sec/batch
Epoch: 39/50... Training Step: 2357... Training loss: 4.0967... 1.4025 sec/batch
Epoch: 39/50... Training Step: 2358... Training loss: 3.9267... 1.4147 sec/batch
Epoch: 39/50... Training Step: 2359... Training loss: 3.9404... 1.4341 sec/batch
Epoch: 39/50... Training Step: 2360... Training loss: 3.9563... 1.4189 sec/batch
Epoch: 39/50... Training Step: 2361... Training loss: 3.9906... 1.3206 sec/batch
Epoch: 39/50... Training Step: 2362... Training loss: 3.9035... 1.4184 sec/batch
Epoch: 39/50... Training Step: 2363... Training loss: 3.9532... 1.3772 sec/batch
Epoch: 39/50... Training Step: 2364... Training loss: 3.9344... 1.4138 sec/batch
Epoch: 39/50... Training Step: 2365... Training loss: 3.9689... 1.3873 sec/batch
Epoch: 39/50... Training Step: 2366... Training loss: 3.9443... 1.4033 sec/batch
Epoch: 39/50... Training Step: 2367... Training loss: 3.9822... 1.4077 sec/batch
Epoch: 39/50... Training Step: 2368... Training loss: 3.9458... 1.3254 sec/batch
Epoch: 39/50... Training Step: 2369... Training loss: 3.9048... 1.3589 sec/batch
Epoch: 39/50... Training Step: 2370... Training loss: 4.0098... 1.3958 sec/batch
Epoch: 39/50... Training Step: 2371... Training loss: 3.9094... 1.3933 sec/batch
Epoch: 39/50... Training Step: 2372... Training loss: 3.9432... 1.3923 sec/batch
Epoch: 39/50... Training Step: 2373... Training loss: 3.8983... 1.4210 sec/batch
Epoch: 39/50... Training Step: 2374... Training loss: 3.8670... 1.3827 sec/batch
Epoch: 39/50... Training Step: 2375... Training loss: 4.0083... 1.4079 sec/batch
Epoch: 39/50... Training Step: 2376... Training loss: 3.9272... 1.3745 sec/batch
Epoch: 39/50... Training Step: 2377... Training loss: 3.9523... 1.3943 sec/batch
Epoch: 39/50... Training Step: 2378... Training loss: 3.9602... 1.4007 sec/batch
Epoch: 39/50... Training Step: 2379... Training loss: 3.9381... 1.3506 sec/batch
Epoch: 40/50... Training Step: 2380... Training loss: 4.0890... 1.3793 sec/batch
Epoch: 40/50... Training Step: 2381... Training loss: 3.9515... 1.3786 sec/batch
Epoch: 40/50... Training Step: 2382... Training loss: 3.9638... 1.3001 sec/batch
Epoch: 40/50... Training Step: 2383... Training loss: 3.9995... 1.3796 sec/batch
Epoch: 40/50... Training Step: 2384... Training loss: 3.9713... 1.4057 sec/batch
Epoch: 40/50... Training Step: 2385... Training loss: 4.0283... 1.3535 sec/batch
Epoch: 40/50... Training Step: 2386... Training loss: 4.0367... 1.3878 sec/batch
Epoch: 40/50... Training Step: 2387... Training loss: 3.9931... 1.3785 sec/batch
Epoch: 40/50... Training Step: 2388... Training loss: 3.9843... 1.3901 sec/batch
Epoch: 40/50... Training Step: 2389... Training loss: 4.0229... 1.3925 sec/batch
Epoch: 40/50... Training Step: 2390... Training loss: 4.1396... 1.3827 sec/batch
Epoch: 40/50... Training Step: 2391... Training loss: 3.9699... 1.3956 sec/batch
Epoch: 40/50... Training Step: 2392... Training loss: 4.0250... 1.3777 sec/batch
Epoch: 40/50... Training Step: 2393... Training loss: 4.0264... 1.3250 sec/batch
Epoch: 40/50... Training Step: 2394... Training loss: 3.9578... 1.3879 sec/batch
Epoch: 40/50... Training Step: 2395... Training loss: 3.9575... 1.3504 sec/batch
Epoch: 40/50... Training Step: 2396... Training loss: 3.9741... 1.4070 sec/batch
Epoch: 40/50... Training Step: 2397... Training loss: 4.0328... 1.3911 sec/batch
Epoch: 40/50... Training Step: 2398... Training loss: 4.0196... 1.3900 sec/batch
Epoch: 40/50... Training Step: 2399... Training loss: 4.0315... 1.3464 sec/batch
Epoch: 40/50... Training Step: 2400... Training loss: 4.0426... 1.3437 sec/batch
Epoch: 40/50... Training Step: 2401... Training loss: 3.9420... 1.4102 sec/batch
Epoch: 40/50... Training Step: 2402... Training loss: 3.9166... 1.2971 sec/batch
Epoch: 40/50... Training Step: 2403... Training loss: 3.9054... 1.2966 sec/batch
Epoch: 40/50... Training Step: 2404... Training loss: 3.9413... 1.3996 sec/batch
Epoch: 40/50... Training Step: 2405... Training loss: 3.9033... 1.3788 sec/batch
Epoch: 40/50... Training Step: 2406... Training loss: 3.9795... 1.3663 sec/batch
Epoch: 40/50... Training Step: 2407... Training loss: 4.0005... 1.3325 sec/batch
Epoch: 40/50... Training Step: 2408... Training loss: 3.9059... 1.4126 sec/batch
Epoch: 40/50... Training Step: 2409... Training loss: 3.9603... 1.3911 sec/batch
Epoch: 40/50... Training Step: 2410... Training loss: 3.9020... 1.3926 sec/batch
Epoch: 40/50... Training Step: 2411... Training loss: 3.9052... 1.3965 sec/batch
Epoch: 40/50... Training Step: 2412... Training loss: 3.9568... 1.4098 sec/batch
Epoch: 40/50... Training Step: 2413... Training loss: 3.9360... 1.4003 sec/batch
Epoch: 40/50... Training Step: 2414... Training loss: 3.9780... 1.3882 sec/batch
Epoch: 40/50... Training Step: 2415... Training loss: 3.9338... 1.4127 sec/batch
Epoch: 40/50... Training Step: 2416... Training loss: 4.0162... 1.3913 sec/batch
Epoch: 40/50... Training Step: 2417... Training loss: 3.9927... 1.3898 sec/batch
Epoch: 40/50... Training Step: 2418... Training loss: 4.0611... 1.3995 sec/batch
Epoch: 40/50... Training Step: 2419... Training loss: 3.9009... 1.3988 sec/batch
Epoch: 40/50... Training Step: 2420... Training loss: 3.9185... 1.3535 sec/batch
Epoch: 40/50... Training Step: 2421... Training loss: 3.9470... 1.4004 sec/batch
Epoch: 40/50... Training Step: 2422... Training loss: 3.9559... 1.3864 sec/batch
Epoch: 40/50... Training Step: 2423... Training loss: 3.8967... 1.3810 sec/batch
Epoch: 40/50... Training Step: 2424... Training loss: 3.9244... 1.4109 sec/batch
Epoch: 40/50... Training Step: 2425... Training loss: 3.9361... 1.3703 sec/batch
Epoch: 40/50... Training Step: 2426... Training loss: 3.9530... 1.4389 sec/batch
Epoch: 40/50... Training Step: 2427... Training loss: 3.9349... 1.3917 sec/batch
Epoch: 40/50... Training Step: 2428... Training loss: 3.9563... 1.3983 sec/batch
Epoch: 40/50... Training Step: 2429... Training loss: 3.9157... 1.3931 sec/batch
Epoch: 40/50... Training Step: 2430... Training loss: 3.9019... 1.3943 sec/batch
Epoch: 40/50... Training Step: 2431... Training loss: 3.9824... 1.3969 sec/batch
Epoch: 40/50... Training Step: 2432... Training loss: 3.8908... 1.3975 sec/batch
Epoch: 40/50... Training Step: 2433... Training loss: 3.9363... 1.3507 sec/batch
Epoch: 40/50... Training Step: 2434... Training loss: 3.8596... 1.3869 sec/batch
Epoch: 40/50... Training Step: 2435... Training loss: 3.8535... 1.3790 sec/batch
Epoch: 40/50... Training Step: 2436... Training loss: 3.9663... 1.3973 sec/batch
Epoch: 40/50... Training Step: 2437... Training loss: 3.9060... 1.3859 sec/batch
Epoch: 40/50... Training Step: 2438... Training loss: 3.9399... 1.3954 sec/batch
Epoch: 40/50... Training Step: 2439... Training loss: 3.9322... 1.3905 sec/batch
Epoch: 40/50... Training Step: 2440... Training loss: 3.9178... 1.3719 sec/batch
Epoch: 41/50... Training Step: 2441... Training loss: 4.0733... 1.3929 sec/batch
Epoch: 41/50... Training Step: 2442... Training loss: 3.9418... 1.3922 sec/batch
Epoch: 41/50... Training Step: 2443... Training loss: 3.9289... 1.2778 sec/batch
Epoch: 41/50... Training Step: 2444... Training loss: 3.9722... 1.4204 sec/batch
Epoch: 41/50... Training Step: 2445... Training loss: 3.9462... 1.3234 sec/batch
Epoch: 41/50... Training Step: 2446... Training loss: 4.0054... 1.2960 sec/batch
Epoch: 41/50... Training Step: 2447... Training loss: 4.0087... 1.4101 sec/batch
Epoch: 41/50... Training Step: 2448... Training loss: 3.9747... 1.3881 sec/batch
Epoch: 41/50... Training Step: 2449... Training loss: 3.9435... 1.3978 sec/batch
Epoch: 41/50... Training Step: 2450... Training loss: 3.9997... 1.3895 sec/batch
Epoch: 41/50... Training Step: 2451... Training loss: 4.1137... 1.4162 sec/batch
Epoch: 41/50... Training Step: 2452... Training loss: 3.9526... 1.3941 sec/batch
Epoch: 41/50... Training Step: 2453... Training loss: 4.0077... 1.3966 sec/batch
Epoch: 41/50... Training Step: 2454... Training loss: 4.0050... 1.4153 sec/batch
Epoch: 41/50... Training Step: 2455... Training loss: 3.9419... 1.4008 sec/batch
Epoch: 41/50... Training Step: 2456... Training loss: 3.9326... 1.4156 sec/batch
Epoch: 41/50... Training Step: 2457... Training loss: 3.9613... 1.3878 sec/batch
Epoch: 41/50... Training Step: 2458... Training loss: 4.0145... 1.4213 sec/batch
Epoch: 41/50... Training Step: 2459... Training loss: 4.0037... 1.3667 sec/batch
Epoch: 41/50... Training Step: 2460... Training loss: 4.0171... 1.3672 sec/batch
Epoch: 41/50... Training Step: 2461... Training loss: 4.0234... 1.3554 sec/batch
Epoch: 41/50... Training Step: 2462... Training loss: 3.9350... 1.4081 sec/batch
Epoch: 41/50... Training Step: 2463... Training loss: 3.9072... 1.4145 sec/batch
Epoch: 41/50... Training Step: 2464... Training loss: 3.8976... 1.3900 sec/batch
Epoch: 41/50... Training Step: 2465... Training loss: 3.9253... 1.4048 sec/batch
Epoch: 41/50... Training Step: 2466... Training loss: 3.8898... 1.4008 sec/batch
Epoch: 41/50... Training Step: 2467... Training loss: 3.9696... 1.3834 sec/batch
Epoch: 41/50... Training Step: 2468... Training loss: 3.9707... 1.3265 sec/batch
Epoch: 41/50... Training Step: 2469... Training loss: 3.8938... 1.4010 sec/batch
Epoch: 41/50... Training Step: 2470... Training loss: 3.9488... 1.4430 sec/batch
Epoch: 41/50... Training Step: 2471... Training loss: 3.8888... 1.3996 sec/batch
Epoch: 41/50... Training Step: 2472... Training loss: 3.8801... 1.4040 sec/batch
Epoch: 41/50... Training Step: 2473... Training loss: 3.9384... 1.3945 sec/batch
Epoch: 41/50... Training Step: 2474... Training loss: 3.9228... 1.3976 sec/batch
Epoch: 41/50... Training Step: 2475... Training loss: 3.9591... 1.3774 sec/batch
Epoch: 41/50... Training Step: 2476... Training loss: 3.9124... 1.4086 sec/batch
Epoch: 41/50... Training Step: 2477... Training loss: 4.0087... 1.3905 sec/batch
Epoch: 41/50... Training Step: 2478... Training loss: 3.9668... 1.3937 sec/batch
Epoch: 41/50... Training Step: 2479... Training loss: 4.0448... 1.3739 sec/batch
Epoch: 41/50... Training Step: 2480... Training loss: 3.8976... 1.3972 sec/batch
Epoch: 41/50... Training Step: 2481... Training loss: 3.9004... 1.3316 sec/batch
Epoch: 41/50... Training Step: 2482... Training loss: 3.9139... 1.4078 sec/batch
Epoch: 41/50... Training Step: 2483... Training loss: 3.9393... 1.4140 sec/batch
Epoch: 41/50... Training Step: 2484... Training loss: 3.8779... 1.3905 sec/batch
Epoch: 41/50... Training Step: 2485... Training loss: 3.9074... 1.3987 sec/batch
Epoch: 41/50... Training Step: 2486... Training loss: 3.9022... 1.4061 sec/batch
Epoch: 41/50... Training Step: 2487... Training loss: 3.9197... 1.4080 sec/batch
Epoch: 41/50... Training Step: 2488... Training loss: 3.9016... 1.4214 sec/batch
Epoch: 41/50... Training Step: 2489... Training loss: 3.9512... 1.4109 sec/batch
Epoch: 41/50... Training Step: 2490... Training loss: 3.8950... 1.3868 sec/batch
Epoch: 41/50... Training Step: 2491... Training loss: 3.8772... 1.4004 sec/batch
Epoch: 41/50... Training Step: 2492... Training loss: 3.9654... 1.4068 sec/batch
Epoch: 41/50... Training Step: 2493... Training loss: 3.8703... 1.2976 sec/batch
Epoch: 41/50... Training Step: 2494... Training loss: 3.9229... 1.4296 sec/batch
Epoch: 41/50... Training Step: 2495... Training loss: 3.8455... 1.4015 sec/batch
Epoch: 41/50... Training Step: 2496... Training loss: 3.8332... 1.4074 sec/batch
Epoch: 41/50... Training Step: 2497... Training loss: 3.9598... 1.3963 sec/batch
Epoch: 41/50... Training Step: 2498... Training loss: 3.9031... 1.4246 sec/batch
Epoch: 41/50... Training Step: 2499... Training loss: 3.9279... 1.3836 sec/batch
Epoch: 41/50... Training Step: 2500... Training loss: 3.9114... 1.4047 sec/batch
Epoch: 41/50... Training Step: 2501... Training loss: 3.9061... 1.3652 sec/batch
Epoch: 42/50... Training Step: 2502... Training loss: 4.0575... 1.3672 sec/batch
Epoch: 42/50... Training Step: 2503... Training loss: 3.9196... 1.3829 sec/batch
Epoch: 42/50... Training Step: 2504... Training loss: 3.9117... 1.4005 sec/batch
Epoch: 42/50... Training Step: 2505... Training loss: 3.9650... 1.3668 sec/batch
Epoch: 42/50... Training Step: 2506... Training loss: 3.9365... 1.3554 sec/batch
Epoch: 42/50... Training Step: 2507... Training loss: 3.9896... 1.3837 sec/batch
Epoch: 42/50... Training Step: 2508... Training loss: 3.9943... 1.3959 sec/batch
Epoch: 42/50... Training Step: 2509... Training loss: 3.9689... 1.3736 sec/batch
Epoch: 42/50... Training Step: 2510... Training loss: 3.9343... 1.3908 sec/batch
Epoch: 42/50... Training Step: 2511... Training loss: 3.9830... 1.3950 sec/batch
Epoch: 42/50... Training Step: 2512... Training loss: 4.0958... 1.4016 sec/batch
Epoch: 42/50... Training Step: 2513... Training loss: 3.9176... 1.4108 sec/batch
Epoch: 42/50... Training Step: 2514... Training loss: 3.9827... 1.3782 sec/batch
Epoch: 42/50... Training Step: 2515... Training loss: 3.9969... 1.3199 sec/batch
Epoch: 42/50... Training Step: 2516... Training loss: 3.9411... 1.4212 sec/batch
Epoch: 42/50... Training Step: 2517... Training loss: 3.9009... 1.4019 sec/batch
Epoch: 42/50... Training Step: 2518... Training loss: 3.9436... 1.4208 sec/batch
Epoch: 42/50... Training Step: 2519... Training loss: 4.0041... 1.4513 sec/batch
Epoch: 42/50... Training Step: 2520... Training loss: 3.9942... 1.4029 sec/batch
Epoch: 42/50... Training Step: 2521... Training loss: 3.9970... 1.3662 sec/batch
Epoch: 42/50... Training Step: 2522... Training loss: 4.0090... 1.4092 sec/batch
Epoch: 42/50... Training Step: 2523... Training loss: 3.9219... 1.4148 sec/batch
Epoch: 42/50... Training Step: 2524... Training loss: 3.8849... 1.4203 sec/batch
Epoch: 42/50... Training Step: 2525... Training loss: 3.8809... 1.4132 sec/batch
Epoch: 42/50... Training Step: 2526... Training loss: 3.8973... 1.4174 sec/batch
Epoch: 42/50... Training Step: 2527... Training loss: 3.8574... 1.4058 sec/batch
Epoch: 42/50... Training Step: 2528... Training loss: 3.9496... 1.4156 sec/batch
Epoch: 42/50... Training Step: 2529... Training loss: 3.9544... 1.3373 sec/batch
Epoch: 42/50... Training Step: 2530... Training loss: 3.8783... 1.4211 sec/batch
Epoch: 42/50... Training Step: 2531... Training loss: 3.9279... 1.3767 sec/batch
Epoch: 42/50... Training Step: 2532... Training loss: 3.8756... 1.3680 sec/batch
Epoch: 42/50... Training Step: 2533... Training loss: 3.8585... 1.3566 sec/batch
Epoch: 42/50... Training Step: 2534... Training loss: 3.9309... 1.3921 sec/batch
Epoch: 42/50... Training Step: 2535... Training loss: 3.9078... 1.3558 sec/batch
Epoch: 42/50... Training Step: 2536... Training loss: 3.9364... 1.3532 sec/batch
Epoch: 42/50... Training Step: 2537... Training loss: 3.9016... 1.4328 sec/batch
Epoch: 42/50... Training Step: 2538... Training loss: 3.9842... 1.4519 sec/batch
Epoch: 42/50... Training Step: 2539... Training loss: 3.9533... 1.4094 sec/batch
Epoch: 42/50... Training Step: 2540... Training loss: 4.0455... 1.3907 sec/batch
Epoch: 42/50... Training Step: 2541... Training loss: 3.8854... 1.4261 sec/batch
Epoch: 42/50... Training Step: 2542... Training loss: 3.8876... 1.3834 sec/batch
Epoch: 42/50... Training Step: 2543... Training loss: 3.8898... 1.3556 sec/batch
Epoch: 42/50... Training Step: 2544... Training loss: 3.9385... 1.4009 sec/batch
Epoch: 42/50... Training Step: 2545... Training loss: 3.8718... 1.3934 sec/batch
Epoch: 42/50... Training Step: 2546... Training loss: 3.8929... 1.3743 sec/batch
Epoch: 42/50... Training Step: 2547... Training loss: 3.8902... 1.3792 sec/batch
Epoch: 42/50... Training Step: 2548... Training loss: 3.9044... 1.3986 sec/batch
Epoch: 42/50... Training Step: 2549... Training loss: 3.8841... 1.3679 sec/batch
Epoch: 42/50... Training Step: 2550... Training loss: 3.9332... 1.3816 sec/batch
Epoch: 42/50... Training Step: 2551... Training loss: 3.8817... 1.3780 sec/batch
Epoch: 42/50... Training Step: 2552... Training loss: 3.8591... 1.3736 sec/batch
Epoch: 42/50... Training Step: 2553... Training loss: 3.9615... 1.3799 sec/batch
Epoch: 42/50... Training Step: 2554... Training loss: 3.8522... 1.4026 sec/batch
Epoch: 42/50... Training Step: 2555... Training loss: 3.8918... 1.4054 sec/batch
Epoch: 42/50... Training Step: 2556... Training loss: 3.8359... 1.3554 sec/batch
Epoch: 42/50... Training Step: 2557... Training loss: 3.8204... 1.3973 sec/batch
Epoch: 42/50... Training Step: 2558... Training loss: 3.9321... 1.3936 sec/batch
Epoch: 42/50... Training Step: 2559... Training loss: 3.8664... 1.4105 sec/batch
Epoch: 42/50... Training Step: 2560... Training loss: 3.9056... 1.3298 sec/batch
Epoch: 42/50... Training Step: 2561... Training loss: 3.9058... 1.3292 sec/batch
Epoch: 42/50... Training Step: 2562... Training loss: 3.8842... 1.3935 sec/batch
Epoch: 43/50... Training Step: 2563... Training loss: 4.0231... 1.4254 sec/batch
Epoch: 43/50... Training Step: 2564... Training loss: 3.9029... 1.3822 sec/batch
Epoch: 43/50... Training Step: 2565... Training loss: 3.9031... 1.3950 sec/batch
Epoch: 43/50... Training Step: 2566... Training loss: 3.9487... 1.4110 sec/batch
Epoch: 43/50... Training Step: 2567... Training loss: 3.9186... 1.3206 sec/batch
Epoch: 43/50... Training Step: 2568... Training loss: 3.9821... 1.3796 sec/batch
Epoch: 43/50... Training Step: 2569... Training loss: 3.9799... 1.3840 sec/batch
Epoch: 43/50... Training Step: 2570... Training loss: 3.9347... 1.3981 sec/batch
Epoch: 43/50... Training Step: 2571... Training loss: 3.9227... 1.3979 sec/batch
Epoch: 43/50... Training Step: 2572... Training loss: 3.9617... 1.4084 sec/batch
Epoch: 43/50... Training Step: 2573... Training loss: 4.0834... 1.4160 sec/batch
Epoch: 43/50... Training Step: 2574... Training loss: 3.9077... 1.4007 sec/batch
Epoch: 43/50... Training Step: 2575... Training loss: 3.9800... 1.3828 sec/batch
Epoch: 43/50... Training Step: 2576... Training loss: 3.9697... 1.3658 sec/batch
Epoch: 43/50... Training Step: 2577... Training loss: 3.9101... 1.3947 sec/batch
Epoch: 43/50... Training Step: 2578... Training loss: 3.9059... 1.3890 sec/batch
Epoch: 43/50... Training Step: 2579... Training loss: 3.9251... 1.4011 sec/batch
Epoch: 43/50... Training Step: 2580... Training loss: 3.9843... 1.4013 sec/batch
Epoch: 43/50... Training Step: 2581... Training loss: 3.9722... 1.4481 sec/batch
Epoch: 43/50... Training Step: 2582... Training loss: 3.9780... 1.3825 sec/batch
Epoch: 43/50... Training Step: 2583... Training loss: 3.9889... 1.3783 sec/batch
Epoch: 43/50... Training Step: 2584... Training loss: 3.9085... 1.4169 sec/batch
Epoch: 43/50... Training Step: 2585... Training loss: 3.8752... 1.4066 sec/batch
Epoch: 43/50... Training Step: 2586... Training loss: 3.8690... 1.4201 sec/batch
Epoch: 43/50... Training Step: 2587... Training loss: 3.8823... 1.3033 sec/batch
Epoch: 43/50... Training Step: 2588... Training loss: 3.8553... 1.3935 sec/batch
Epoch: 43/50... Training Step: 2589... Training loss: 3.9365... 1.3225 sec/batch
Epoch: 43/50... Training Step: 2590... Training loss: 3.9477... 1.4524 sec/batch
Epoch: 43/50... Training Step: 2591... Training loss: 3.8622... 1.4092 sec/batch
Epoch: 43/50... Training Step: 2592... Training loss: 3.9064... 1.4065 sec/batch
Epoch: 43/50... Training Step: 2593... Training loss: 3.8657... 1.3833 sec/batch
Epoch: 43/50... Training Step: 2594... Training loss: 3.8454... 1.3637 sec/batch
Epoch: 43/50... Training Step: 2595... Training loss: 3.9098... 1.4295 sec/batch
Epoch: 43/50... Training Step: 2596... Training loss: 3.8776... 1.3872 sec/batch
Epoch: 43/50... Training Step: 2597... Training loss: 3.9366... 1.3236 sec/batch
Epoch: 43/50... Training Step: 2598... Training loss: 3.8882... 1.3904 sec/batch
Epoch: 43/50... Training Step: 2599... Training loss: 3.9688... 1.3921 sec/batch
Epoch: 43/50... Training Step: 2600... Training loss: 3.9335... 1.3830 sec/batch
Epoch: 43/50... Training Step: 2601... Training loss: 4.0308... 1.4528 sec/batch
Epoch: 43/50... Training Step: 2602... Training loss: 3.8661... 1.3974 sec/batch
Epoch: 43/50... Training Step: 2603... Training loss: 3.8664... 1.4028 sec/batch
Epoch: 43/50... Training Step: 2604... Training loss: 3.8786... 1.4006 sec/batch
Epoch: 43/50... Training Step: 2605... Training loss: 3.9035... 1.3926 sec/batch
Epoch: 43/50... Training Step: 2606... Training loss: 3.8455... 1.3446 sec/batch
Epoch: 43/50... Training Step: 2607... Training loss: 3.8834... 1.4409 sec/batch
Epoch: 43/50... Training Step: 2608... Training loss: 3.8750... 1.4119 sec/batch
Epoch: 43/50... Training Step: 2609... Training loss: 3.8874... 1.3494 sec/batch
Epoch: 43/50... Training Step: 2610... Training loss: 3.8734... 1.3974 sec/batch
Epoch: 43/50... Training Step: 2611... Training loss: 3.9269... 1.4088 sec/batch
Epoch: 43/50... Training Step: 2612... Training loss: 3.8633... 1.3928 sec/batch
Epoch: 43/50... Training Step: 2613... Training loss: 3.8399... 1.4146 sec/batch
Epoch: 43/50... Training Step: 2614... Training loss: 3.9360... 1.4299 sec/batch
Epoch: 43/50... Training Step: 2615... Training loss: 3.8371... 1.4317 sec/batch
Epoch: 43/50... Training Step: 2616... Training loss: 3.8831... 1.4208 sec/batch
Epoch: 43/50... Training Step: 2617... Training loss: 3.8332... 1.3666 sec/batch
Epoch: 43/50... Training Step: 2618... Training loss: 3.7949... 1.4119 sec/batch
Epoch: 43/50... Training Step: 2619... Training loss: 3.9311... 1.3616 sec/batch
Epoch: 43/50... Training Step: 2620... Training loss: 3.8636... 1.3594 sec/batch
Epoch: 43/50... Training Step: 2621... Training loss: 3.8976... 1.3881 sec/batch
Epoch: 43/50... Training Step: 2622... Training loss: 3.8825... 1.3797 sec/batch
Epoch: 43/50... Training Step: 2623... Training loss: 3.8688... 1.3894 sec/batch
Epoch: 44/50... Training Step: 2624... Training loss: 4.0213... 1.3616 sec/batch
Epoch: 44/50... Training Step: 2625... Training loss: 3.8876... 1.4077 sec/batch
Epoch: 44/50... Training Step: 2626... Training loss: 3.8873... 1.3896 sec/batch
Epoch: 44/50... Training Step: 2627... Training loss: 3.9307... 1.3864 sec/batch
Epoch: 44/50... Training Step: 2628... Training loss: 3.9089... 1.3733 sec/batch
Epoch: 44/50... Training Step: 2629... Training loss: 3.9642... 1.4085 sec/batch
Epoch: 44/50... Training Step: 2630... Training loss: 3.9481... 1.3788 sec/batch
Epoch: 44/50... Training Step: 2631... Training loss: 3.9189... 1.3255 sec/batch
Epoch: 44/50... Training Step: 2632... Training loss: 3.9131... 1.4087 sec/batch
Epoch: 44/50... Training Step: 2633... Training loss: 3.9446... 1.4082 sec/batch
Epoch: 44/50... Training Step: 2634... Training loss: 4.0551... 1.3605 sec/batch
Epoch: 44/50... Training Step: 2635... Training loss: 3.9038... 1.3214 sec/batch
Epoch: 44/50... Training Step: 2636... Training loss: 3.9462... 1.4081 sec/batch
Epoch: 44/50... Training Step: 2637... Training loss: 3.9477... 1.3047 sec/batch
Epoch: 44/50... Training Step: 2638... Training loss: 3.8812... 1.4176 sec/batch
Epoch: 44/50... Training Step: 2639... Training loss: 3.8973... 1.3301 sec/batch
Epoch: 44/50... Training Step: 2640... Training loss: 3.9107... 1.3974 sec/batch
Epoch: 44/50... Training Step: 2641... Training loss: 3.9670... 1.3915 sec/batch
Epoch: 44/50... Training Step: 2642... Training loss: 3.9536... 1.3844 sec/batch
Epoch: 44/50... Training Step: 2643... Training loss: 3.9641... 1.3371 sec/batch
Epoch: 44/50... Training Step: 2644... Training loss: 3.9647... 1.4162 sec/batch
Epoch: 44/50... Training Step: 2645... Training loss: 3.8888... 1.4064 sec/batch
Epoch: 44/50... Training Step: 2646... Training loss: 3.8481... 1.3768 sec/batch
Epoch: 44/50... Training Step: 2647... Training loss: 3.8608... 1.3731 sec/batch
Epoch: 44/50... Training Step: 2648... Training loss: 3.8711... 1.3453 sec/batch
Epoch: 44/50... Training Step: 2649... Training loss: 3.8347... 1.3992 sec/batch
Epoch: 44/50... Training Step: 2650... Training loss: 3.9160... 1.4133 sec/batch
Epoch: 44/50... Training Step: 2651... Training loss: 3.9278... 1.3719 sec/batch
Epoch: 44/50... Training Step: 2652... Training loss: 3.8561... 1.3740 sec/batch
Epoch: 44/50... Training Step: 2653... Training loss: 3.8967... 1.3894 sec/batch
Epoch: 44/50... Training Step: 2654... Training loss: 3.8551... 1.3887 sec/batch
Epoch: 44/50... Training Step: 2655... Training loss: 3.8324... 1.3960 sec/batch
Epoch: 44/50... Training Step: 2656... Training loss: 3.8957... 1.3855 sec/batch
Epoch: 44/50... Training Step: 2657... Training loss: 3.8791... 1.3532 sec/batch
Epoch: 44/50... Training Step: 2658... Training loss: 3.9113... 1.3978 sec/batch
Epoch: 44/50... Training Step: 2659... Training loss: 3.8700... 1.3646 sec/batch
Epoch: 44/50... Training Step: 2660... Training loss: 3.9621... 1.4115 sec/batch
Epoch: 44/50... Training Step: 2661... Training loss: 3.9189... 1.3851 sec/batch
Epoch: 44/50... Training Step: 2662... Training loss: 4.0005... 1.4045 sec/batch
Epoch: 44/50... Training Step: 2663... Training loss: 3.8513... 1.4062 sec/batch
Epoch: 44/50... Training Step: 2664... Training loss: 3.8628... 1.3802 sec/batch
Epoch: 44/50... Training Step: 2665... Training loss: 3.8640... 1.3947 sec/batch
Epoch: 44/50... Training Step: 2666... Training loss: 3.8956... 1.4719 sec/batch
Epoch: 44/50... Training Step: 2667... Training loss: 3.8370... 1.3385 sec/batch
Epoch: 44/50... Training Step: 2668... Training loss: 3.8665... 1.4061 sec/batch
Epoch: 44/50... Training Step: 2669... Training loss: 3.8587... 1.3860 sec/batch
Epoch: 44/50... Training Step: 2670... Training loss: 3.8914... 1.3925 sec/batch
Epoch: 44/50... Training Step: 2671... Training loss: 3.8492... 1.3373 sec/batch
Epoch: 44/50... Training Step: 2672... Training loss: 3.9115... 1.4118 sec/batch
Epoch: 44/50... Training Step: 2673... Training loss: 3.8619... 1.3544 sec/batch
Epoch: 44/50... Training Step: 2674... Training loss: 3.8386... 1.4013 sec/batch
Epoch: 44/50... Training Step: 2675... Training loss: 3.9314... 1.4002 sec/batch
Epoch: 44/50... Training Step: 2676... Training loss: 3.8248... 1.4519 sec/batch
Epoch: 44/50... Training Step: 2677... Training loss: 3.8725... 1.3507 sec/batch
Epoch: 44/50... Training Step: 2678... Training loss: 3.8211... 1.3516 sec/batch
Epoch: 44/50... Training Step: 2679... Training loss: 3.7830... 1.3864 sec/batch
Epoch: 44/50... Training Step: 2680... Training loss: 3.9084... 1.4195 sec/batch
Epoch: 44/50... Training Step: 2681... Training loss: 3.8562... 1.3586 sec/batch
Epoch: 44/50... Training Step: 2682... Training loss: 3.8851... 1.3818 sec/batch
Epoch: 44/50... Training Step: 2683... Training loss: 3.8704... 1.3555 sec/batch
Epoch: 44/50... Training Step: 2684... Training loss: 3.8515... 1.3856 sec/batch
Epoch: 45/50... Training Step: 2685... Training loss: 4.0098... 1.3945 sec/batch
Epoch: 45/50... Training Step: 2686... Training loss: 3.8730... 1.3798 sec/batch
Epoch: 45/50... Training Step: 2687... Training loss: 3.8776... 1.4173 sec/batch
Epoch: 45/50... Training Step: 2688... Training loss: 3.9014... 1.3537 sec/batch
Epoch: 45/50... Training Step: 2689... Training loss: 3.8788... 1.3814 sec/batch
Epoch: 45/50... Training Step: 2690... Training loss: 3.9485... 1.4243 sec/batch
Epoch: 45/50... Training Step: 2691... Training loss: 3.9470... 1.3958 sec/batch
Epoch: 45/50... Training Step: 2692... Training loss: 3.9037... 1.3798 sec/batch
Epoch: 45/50... Training Step: 2693... Training loss: 3.8927... 1.2858 sec/batch
Epoch: 45/50... Training Step: 2694... Training loss: 3.9212... 1.4229 sec/batch
Epoch: 45/50... Training Step: 2695... Training loss: 4.0525... 1.3780 sec/batch
Epoch: 45/50... Training Step: 2696... Training loss: 3.8854... 1.3799 sec/batch
Epoch: 45/50... Training Step: 2697... Training loss: 3.9522... 1.3198 sec/batch
Epoch: 45/50... Training Step: 2698... Training loss: 3.9338... 1.3850 sec/batch
Epoch: 45/50... Training Step: 2699... Training loss: 3.8765... 1.4043 sec/batch
Epoch: 45/50... Training Step: 2700... Training loss: 3.8610... 1.2869 sec/batch
Epoch: 45/50... Training Step: 2701... Training loss: 3.9019... 1.3910 sec/batch
Epoch: 45/50... Training Step: 2702... Training loss: 3.9487... 1.3821 sec/batch
Epoch: 45/50... Training Step: 2703... Training loss: 3.9405... 1.3874 sec/batch
Epoch: 45/50... Training Step: 2704... Training loss: 3.9532... 1.4166 sec/batch
Epoch: 45/50... Training Step: 2705... Training loss: 3.9540... 1.3943 sec/batch
Epoch: 45/50... Training Step: 2706... Training loss: 3.8753... 1.4220 sec/batch
Epoch: 45/50... Training Step: 2707... Training loss: 3.8408... 1.3968 sec/batch
Epoch: 45/50... Training Step: 2708... Training loss: 3.8431... 1.3830 sec/batch
Epoch: 45/50... Training Step: 2709... Training loss: 3.8644... 1.4449 sec/batch
Epoch: 45/50... Training Step: 2710... Training loss: 3.8076... 1.3206 sec/batch
Epoch: 45/50... Training Step: 2711... Training loss: 3.9082... 1.3894 sec/batch
Epoch: 45/50... Training Step: 2712... Training loss: 3.9108... 1.4177 sec/batch
Epoch: 45/50... Training Step: 2713... Training loss: 3.8315... 1.4419 sec/batch
Epoch: 45/50... Training Step: 2714... Training loss: 3.8771... 1.3857 sec/batch
Epoch: 45/50... Training Step: 2715... Training loss: 3.8402... 1.3938 sec/batch
Epoch: 45/50... Training Step: 2716... Training loss: 3.8386... 1.4325 sec/batch
Epoch: 45/50... Training Step: 2717... Training loss: 3.8740... 1.3635 sec/batch
Epoch: 45/50... Training Step: 2718... Training loss: 3.8588... 1.4036 sec/batch
Epoch: 45/50... Training Step: 2719... Training loss: 3.9060... 1.3799 sec/batch
Epoch: 45/50... Training Step: 2720... Training loss: 3.8744... 1.3736 sec/batch
Epoch: 45/50... Training Step: 2721... Training loss: 3.9490... 1.4000 sec/batch
Epoch: 45/50... Training Step: 2722... Training loss: 3.8943... 1.4005 sec/batch
Epoch: 45/50... Training Step: 2723... Training loss: 3.9922... 1.4353 sec/batch
Epoch: 45/50... Training Step: 2724... Training loss: 3.8347... 1.4198 sec/batch
Epoch: 45/50... Training Step: 2725... Training loss: 3.8421... 1.4083 sec/batch
Epoch: 45/50... Training Step: 2726... Training loss: 3.8614... 1.4066 sec/batch
Epoch: 45/50... Training Step: 2727... Training loss: 3.8921... 1.3482 sec/batch
Epoch: 45/50... Training Step: 2728... Training loss: 3.8263... 1.3955 sec/batch
Epoch: 45/50... Training Step: 2729... Training loss: 3.8497... 1.4172 sec/batch
Epoch: 45/50... Training Step: 2730... Training loss: 3.8552... 1.3627 sec/batch
Epoch: 45/50... Training Step: 2731... Training loss: 3.8757... 1.4138 sec/batch
Epoch: 45/50... Training Step: 2732... Training loss: 3.8302... 1.3655 sec/batch
Epoch: 45/50... Training Step: 2733... Training loss: 3.8921... 1.3286 sec/batch
Epoch: 45/50... Training Step: 2734... Training loss: 3.8413... 1.4308 sec/batch
Epoch: 45/50... Training Step: 2735... Training loss: 3.8147... 1.3019 sec/batch
Epoch: 45/50... Training Step: 2736... Training loss: 3.9092... 1.3998 sec/batch
Epoch: 45/50... Training Step: 2737... Training loss: 3.8102... 1.3812 sec/batch
Epoch: 45/50... Training Step: 2738... Training loss: 3.8344... 1.4176 sec/batch
Epoch: 45/50... Training Step: 2739... Training loss: 3.7969... 1.3988 sec/batch
Epoch: 45/50... Training Step: 2740... Training loss: 3.7709... 1.3897 sec/batch
Epoch: 45/50... Training Step: 2741... Training loss: 3.8883... 1.3833 sec/batch
Epoch: 45/50... Training Step: 2742... Training loss: 3.8363... 1.3960 sec/batch
Epoch: 45/50... Training Step: 2743... Training loss: 3.8574... 1.4193 sec/batch
Epoch: 45/50... Training Step: 2744... Training loss: 3.8542... 1.4100 sec/batch
Epoch: 45/50... Training Step: 2745... Training loss: 3.8390... 1.4218 sec/batch
Epoch: 46/50... Training Step: 2746... Training loss: 3.9796... 1.3945 sec/batch
Epoch: 46/50... Training Step: 2747... Training loss: 3.8661... 1.3669 sec/batch
Epoch: 46/50... Training Step: 2748... Training loss: 3.8591... 1.3698 sec/batch
Epoch: 46/50... Training Step: 2749... Training loss: 3.8996... 1.3662 sec/batch
Epoch: 46/50... Training Step: 2750... Training loss: 3.8566... 1.3682 sec/batch
Epoch: 46/50... Training Step: 2751... Training loss: 3.9267... 1.3983 sec/batch
Epoch: 46/50... Training Step: 2752... Training loss: 3.9282... 1.3838 sec/batch
Epoch: 46/50... Training Step: 2753... Training loss: 3.8841... 1.4053 sec/batch
Epoch: 46/50... Training Step: 2754... Training loss: 3.8589... 1.3791 sec/batch
Epoch: 46/50... Training Step: 2755... Training loss: 3.9150... 1.3819 sec/batch
Epoch: 46/50... Training Step: 2756... Training loss: 4.0194... 1.4409 sec/batch
Epoch: 46/50... Training Step: 2757... Training loss: 3.8641... 1.3863 sec/batch
Epoch: 46/50... Training Step: 2758... Training loss: 3.9203... 1.3897 sec/batch
Epoch: 46/50... Training Step: 2759... Training loss: 3.9310... 1.4003 sec/batch
Epoch: 46/50... Training Step: 2760... Training loss: 3.8653... 1.4170 sec/batch
Epoch: 46/50... Training Step: 2761... Training loss: 3.8473... 1.4270 sec/batch
Epoch: 46/50... Training Step: 2762... Training loss: 3.8779... 1.3992 sec/batch
Epoch: 46/50... Training Step: 2763... Training loss: 3.9365... 1.4172 sec/batch
Epoch: 46/50... Training Step: 2764... Training loss: 3.9174... 1.3982 sec/batch
Epoch: 46/50... Training Step: 2765... Training loss: 3.9175... 1.3837 sec/batch
Epoch: 46/50... Training Step: 2766... Training loss: 3.9414... 1.4204 sec/batch
Epoch: 46/50... Training Step: 2767... Training loss: 3.8597... 1.3839 sec/batch
Epoch: 46/50... Training Step: 2768... Training loss: 3.8112... 1.3755 sec/batch
Epoch: 46/50... Training Step: 2769... Training loss: 3.8289... 1.4078 sec/batch
Epoch: 46/50... Training Step: 2770... Training loss: 3.8470... 1.3980 sec/batch
Epoch: 46/50... Training Step: 2771... Training loss: 3.7984... 1.3936 sec/batch
Epoch: 46/50... Training Step: 2772... Training loss: 3.8823... 1.4217 sec/batch
Epoch: 46/50... Training Step: 2773... Training loss: 3.8896... 1.4482 sec/batch
Epoch: 46/50... Training Step: 2774... Training loss: 3.8224... 1.4197 sec/batch
Epoch: 46/50... Training Step: 2775... Training loss: 3.8592... 1.3930 sec/batch
Epoch: 46/50... Training Step: 2776... Training loss: 3.8185... 1.3918 sec/batch
Epoch: 46/50... Training Step: 2777... Training loss: 3.7966... 1.3871 sec/batch
Epoch: 46/50... Training Step: 2778... Training loss: 3.8722... 1.4028 sec/batch
Epoch: 46/50... Training Step: 2779... Training loss: 3.8501... 1.2815 sec/batch
Epoch: 46/50... Training Step: 2780... Training loss: 3.8887... 1.3854 sec/batch
Epoch: 46/50... Training Step: 2781... Training loss: 3.8303... 1.4079 sec/batch
Epoch: 46/50... Training Step: 2782... Training loss: 3.9422... 1.3581 sec/batch
Epoch: 46/50... Training Step: 2783... Training loss: 3.8990... 1.3571 sec/batch
Epoch: 46/50... Training Step: 2784... Training loss: 3.9659... 1.4189 sec/batch
Epoch: 46/50... Training Step: 2785... Training loss: 3.8209... 1.2965 sec/batch
Epoch: 46/50... Training Step: 2786... Training loss: 3.8288... 1.4080 sec/batch
Epoch: 46/50... Training Step: 2787... Training loss: 3.8323... 1.3964 sec/batch
Epoch: 46/50... Training Step: 2788... Training loss: 3.8670... 1.4165 sec/batch
Epoch: 46/50... Training Step: 2789... Training loss: 3.8135... 1.3927 sec/batch
Epoch: 46/50... Training Step: 2790... Training loss: 3.8287... 1.3526 sec/batch
Epoch: 46/50... Training Step: 2791... Training loss: 3.8174... 1.3595 sec/batch
Epoch: 46/50... Training Step: 2792... Training loss: 3.8646... 1.4131 sec/batch
Epoch: 46/50... Training Step: 2793... Training loss: 3.8326... 1.3959 sec/batch
Epoch: 46/50... Training Step: 2794... Training loss: 3.8738... 1.3777 sec/batch
Epoch: 46/50... Training Step: 2795... Training loss: 3.8261... 1.4055 sec/batch
Epoch: 46/50... Training Step: 2796... Training loss: 3.7968... 1.3985 sec/batch
Epoch: 46/50... Training Step: 2797... Training loss: 3.9064... 1.4038 sec/batch
Epoch: 46/50... Training Step: 2798... Training loss: 3.7888... 1.3969 sec/batch
Epoch: 46/50... Training Step: 2799... Training loss: 3.8269... 1.4161 sec/batch
Epoch: 46/50... Training Step: 2800... Training loss: 3.7864... 1.3192 sec/batch
Epoch: 46/50... Training Step: 2801... Training loss: 3.7635... 1.4341 sec/batch
Epoch: 46/50... Training Step: 2802... Training loss: 3.8835... 1.3875 sec/batch
Epoch: 46/50... Training Step: 2803... Training loss: 3.8166... 1.3970 sec/batch
Epoch: 46/50... Training Step: 2804... Training loss: 3.8528... 1.3514 sec/batch
Epoch: 46/50... Training Step: 2805... Training loss: 3.8402... 1.3870 sec/batch
Epoch: 46/50... Training Step: 2806... Training loss: 3.8215... 1.3671 sec/batch
Epoch: 47/50... Training Step: 2807... Training loss: 3.9710... 1.3563 sec/batch
Epoch: 47/50... Training Step: 2808... Training loss: 3.8324... 1.4036 sec/batch
Epoch: 47/50... Training Step: 2809... Training loss: 3.8208... 1.3886 sec/batch
Epoch: 47/50... Training Step: 2810... Training loss: 3.8776... 1.3865 sec/batch
Epoch: 47/50... Training Step: 2811... Training loss: 3.8334... 1.4231 sec/batch
Epoch: 47/50... Training Step: 2812... Training loss: 3.8946... 1.4288 sec/batch
Epoch: 47/50... Training Step: 2813... Training loss: 3.9132... 1.3921 sec/batch
Epoch: 47/50... Training Step: 2814... Training loss: 3.8652... 1.3831 sec/batch
Epoch: 47/50... Training Step: 2815... Training loss: 3.8506... 1.4019 sec/batch
Epoch: 47/50... Training Step: 2816... Training loss: 3.8980... 1.4197 sec/batch
Epoch: 47/50... Training Step: 2817... Training loss: 4.0041... 1.4171 sec/batch
Epoch: 47/50... Training Step: 2818... Training loss: 3.8476... 1.3922 sec/batch
Epoch: 47/50... Training Step: 2819... Training loss: 3.9044... 1.4229 sec/batch
Epoch: 47/50... Training Step: 2820... Training loss: 3.8965... 1.3408 sec/batch
Epoch: 47/50... Training Step: 2821... Training loss: 3.8430... 1.4260 sec/batch
Epoch: 47/50... Training Step: 2822... Training loss: 3.8217... 1.3680 sec/batch
Epoch: 47/50... Training Step: 2823... Training loss: 3.8631... 1.3578 sec/batch
Epoch: 47/50... Training Step: 2824... Training loss: 3.9155... 1.3761 sec/batch
Epoch: 47/50... Training Step: 2825... Training loss: 3.8932... 1.3921 sec/batch
Epoch: 47/50... Training Step: 2826... Training loss: 3.9020... 1.4144 sec/batch
Epoch: 47/50... Training Step: 2827... Training loss: 3.9051... 1.3243 sec/batch
Epoch: 47/50... Training Step: 2828... Training loss: 3.8457... 1.3936 sec/batch
Epoch: 47/50... Training Step: 2829... Training loss: 3.7876... 1.3983 sec/batch
Epoch: 47/50... Training Step: 2830... Training loss: 3.8013... 1.3902 sec/batch
Epoch: 47/50... Training Step: 2831... Training loss: 3.8193... 1.3847 sec/batch
Epoch: 47/50... Training Step: 2832... Training loss: 3.7776... 1.3831 sec/batch
Epoch: 47/50... Training Step: 2833... Training loss: 3.8610... 1.4280 sec/batch
Epoch: 47/50... Training Step: 2834... Training loss: 3.8684... 1.3370 sec/batch
Epoch: 47/50... Training Step: 2835... Training loss: 3.8056... 1.3809 sec/batch
Epoch: 47/50... Training Step: 2836... Training loss: 3.8576... 1.3412 sec/batch
Epoch: 47/50... Training Step: 2837... Training loss: 3.8082... 1.4320 sec/batch
Epoch: 47/50... Training Step: 2838... Training loss: 3.8016... 1.3126 sec/batch
Epoch: 47/50... Training Step: 2839... Training loss: 3.8484... 1.3952 sec/batch
Epoch: 47/50... Training Step: 2840... Training loss: 3.8338... 1.4074 sec/batch
Epoch: 47/50... Training Step: 2841... Training loss: 3.8655... 1.3839 sec/batch
Epoch: 47/50... Training Step: 2842... Training loss: 3.8202... 1.3554 sec/batch
Epoch: 47/50... Training Step: 2843... Training loss: 3.9091... 1.3844 sec/batch
Epoch: 47/50... Training Step: 2844... Training loss: 3.8689... 1.4125 sec/batch
Epoch: 47/50... Training Step: 2845... Training loss: 3.9533... 1.4002 sec/batch
Epoch: 47/50... Training Step: 2846... Training loss: 3.8064... 1.4156 sec/batch
Epoch: 47/50... Training Step: 2847... Training loss: 3.8136... 1.3911 sec/batch
Epoch: 47/50... Training Step: 2848... Training loss: 3.8312... 1.4068 sec/batch
Epoch: 47/50... Training Step: 2849... Training loss: 3.8647... 1.3212 sec/batch
Epoch: 47/50... Training Step: 2850... Training loss: 3.7916... 1.3780 sec/batch
Epoch: 47/50... Training Step: 2851... Training loss: 3.8287... 1.3988 sec/batch
Epoch: 47/50... Training Step: 2852... Training loss: 3.8321... 1.4470 sec/batch
Epoch: 47/50... Training Step: 2853... Training loss: 3.8558... 1.3665 sec/batch
Epoch: 47/50... Training Step: 2854... Training loss: 3.8066... 1.3975 sec/batch
Epoch: 47/50... Training Step: 2855... Training loss: 3.8597... 1.3540 sec/batch
Epoch: 47/50... Training Step: 2856... Training loss: 3.8311... 1.3848 sec/batch
Epoch: 47/50... Training Step: 2857... Training loss: 3.7913... 1.3878 sec/batch
Epoch: 47/50... Training Step: 2858... Training loss: 3.8839... 1.3997 sec/batch
Epoch: 47/50... Training Step: 2859... Training loss: 3.7993... 1.3568 sec/batch
Epoch: 47/50... Training Step: 2860... Training loss: 3.8076... 1.3638 sec/batch
Epoch: 47/50... Training Step: 2861... Training loss: 3.7829... 1.3344 sec/batch
Epoch: 47/50... Training Step: 2862... Training loss: 3.7593... 1.4248 sec/batch
Epoch: 47/50... Training Step: 2863... Training loss: 3.8630... 1.3851 sec/batch
Epoch: 47/50... Training Step: 2864... Training loss: 3.8150... 1.3962 sec/batch
Epoch: 47/50... Training Step: 2865... Training loss: 3.8369... 1.4312 sec/batch
Epoch: 47/50... Training Step: 2866... Training loss: 3.8346... 1.3305 sec/batch
Epoch: 47/50... Training Step: 2867... Training loss: 3.8020... 1.4002 sec/batch
Epoch: 48/50... Training Step: 2868... Training loss: 3.9545... 1.3674 sec/batch
Epoch: 48/50... Training Step: 2869... Training loss: 3.8242... 1.3959 sec/batch
Epoch: 48/50... Training Step: 2870... Training loss: 3.8202... 1.4025 sec/batch
Epoch: 48/50... Training Step: 2871... Training loss: 3.8624... 1.3870 sec/batch
Epoch: 48/50... Training Step: 2872... Training loss: 3.8181... 1.3850 sec/batch
Epoch: 48/50... Training Step: 2873... Training loss: 3.8881... 1.3969 sec/batch
Epoch: 48/50... Training Step: 2874... Training loss: 3.9091... 1.3914 sec/batch
Epoch: 48/50... Training Step: 2875... Training loss: 3.8566... 1.3879 sec/batch
Epoch: 48/50... Training Step: 2876... Training loss: 3.8440... 1.3918 sec/batch
Epoch: 48/50... Training Step: 2877... Training loss: 3.8800... 1.3817 sec/batch
Epoch: 48/50... Training Step: 2878... Training loss: 3.9937... 1.4003 sec/batch
Epoch: 48/50... Training Step: 2879... Training loss: 3.8203... 1.3232 sec/batch
Epoch: 48/50... Training Step: 2880... Training loss: 3.8898... 1.4087 sec/batch
Epoch: 48/50... Training Step: 2881... Training loss: 3.9088... 1.3290 sec/batch
Epoch: 48/50... Training Step: 2882... Training loss: 3.8348... 1.4033 sec/batch
Epoch: 48/50... Training Step: 2883... Training loss: 3.8090... 1.4070 sec/batch
Epoch: 48/50... Training Step: 2884... Training loss: 3.8349... 1.4221 sec/batch
Epoch: 48/50... Training Step: 2885... Training loss: 3.8918... 1.3890 sec/batch
Epoch: 48/50... Training Step: 2886... Training loss: 3.8751... 1.3365 sec/batch
Epoch: 48/50... Training Step: 2887... Training loss: 3.9003... 1.4194 sec/batch
Epoch: 48/50... Training Step: 2888... Training loss: 3.9091... 1.3979 sec/batch
Epoch: 48/50... Training Step: 2889... Training loss: 3.8297... 1.3702 sec/batch
Epoch: 48/50... Training Step: 2890... Training loss: 3.7970... 1.3788 sec/batch
Epoch: 48/50... Training Step: 2891... Training loss: 3.7820... 1.4216 sec/batch
Epoch: 48/50... Training Step: 2892... Training loss: 3.8072... 1.4349 sec/batch
Epoch: 48/50... Training Step: 2893... Training loss: 3.7589... 1.4011 sec/batch
Epoch: 48/50... Training Step: 2894... Training loss: 3.8449... 1.4266 sec/batch
Epoch: 48/50... Training Step: 2895... Training loss: 3.8614... 1.3083 sec/batch
Epoch: 48/50... Training Step: 2896... Training loss: 3.7802... 1.4027 sec/batch
Epoch: 48/50... Training Step: 2897... Training loss: 3.8214... 1.3996 sec/batch
Epoch: 48/50... Training Step: 2898... Training loss: 3.7933... 1.4395 sec/batch
Epoch: 48/50... Training Step: 2899... Training loss: 3.7794... 1.4003 sec/batch
Epoch: 48/50... Training Step: 2900... Training loss: 3.8354... 1.4104 sec/batch
Epoch: 48/50... Training Step: 2901... Training loss: 3.8015... 1.3977 sec/batch
Epoch: 48/50... Training Step: 2902... Training loss: 3.8407... 1.4131 sec/batch
Epoch: 48/50... Training Step: 2903... Training loss: 3.8038... 1.4041 sec/batch
Epoch: 48/50... Training Step: 2904... Training loss: 3.8878... 1.4028 sec/batch
Epoch: 48/50... Training Step: 2905... Training loss: 3.8577... 1.3178 sec/batch
Epoch: 48/50... Training Step: 2906... Training loss: 3.9247... 1.3539 sec/batch
Epoch: 48/50... Training Step: 2907... Training loss: 3.7985... 1.3676 sec/batch
Epoch: 48/50... Training Step: 2908... Training loss: 3.8013... 1.3857 sec/batch
Epoch: 48/50... Training Step: 2909... Training loss: 3.7963... 1.4048 sec/batch
Epoch: 48/50... Training Step: 2910... Training loss: 3.8390... 1.3822 sec/batch
Epoch: 48/50... Training Step: 2911... Training loss: 3.7868... 1.3993 sec/batch
Epoch: 48/50... Training Step: 2912... Training loss: 3.8046... 1.4263 sec/batch
Epoch: 48/50... Training Step: 2913... Training loss: 3.8023... 1.3722 sec/batch
Epoch: 48/50... Training Step: 2914... Training loss: 3.8443... 1.3766 sec/batch
Epoch: 48/50... Training Step: 2915... Training loss: 3.7876... 1.3417 sec/batch
Epoch: 48/50... Training Step: 2916... Training loss: 3.8394... 1.4581 sec/batch
Epoch: 48/50... Training Step: 2917... Training loss: 3.7918... 1.4207 sec/batch
Epoch: 48/50... Training Step: 2918... Training loss: 3.7777... 1.4104 sec/batch
Epoch: 48/50... Training Step: 2919... Training loss: 3.8740... 1.3909 sec/batch
Epoch: 48/50... Training Step: 2920... Training loss: 3.7677... 1.3605 sec/batch
Epoch: 48/50... Training Step: 2921... Training loss: 3.8098... 1.3453 sec/batch
Epoch: 48/50... Training Step: 2922... Training loss: 3.7610... 1.3838 sec/batch
Epoch: 48/50... Training Step: 2923... Training loss: 3.7370... 1.4529 sec/batch
Epoch: 48/50... Training Step: 2924... Training loss: 3.8513... 1.3860 sec/batch
Epoch: 48/50... Training Step: 2925... Training loss: 3.7922... 1.3360 sec/batch
Epoch: 48/50... Training Step: 2926... Training loss: 3.8294... 1.4049 sec/batch
Epoch: 48/50... Training Step: 2927... Training loss: 3.8149... 1.3792 sec/batch
Epoch: 48/50... Training Step: 2928... Training loss: 3.7973... 1.3750 sec/batch
Epoch: 49/50... Training Step: 2929... Training loss: 3.9437... 1.3757 sec/batch
Epoch: 49/50... Training Step: 2930... Training loss: 3.8101... 1.3845 sec/batch
Epoch: 49/50... Training Step: 2931... Training loss: 3.8156... 1.3479 sec/batch
Epoch: 49/50... Training Step: 2932... Training loss: 3.8575... 1.4192 sec/batch
Epoch: 49/50... Training Step: 2933... Training loss: 3.8101... 1.4063 sec/batch
Epoch: 49/50... Training Step: 2934... Training loss: 3.8759... 1.4024 sec/batch
Epoch: 49/50... Training Step: 2935... Training loss: 3.8718... 1.3566 sec/batch
Epoch: 49/50... Training Step: 2936... Training loss: 3.8291... 1.3911 sec/batch
Epoch: 49/50... Training Step: 2937... Training loss: 3.8285... 1.4267 sec/batch
Epoch: 49/50... Training Step: 2938... Training loss: 3.8674... 1.3997 sec/batch
Epoch: 49/50... Training Step: 2939... Training loss: 3.9863... 1.4269 sec/batch
Epoch: 49/50... Training Step: 2940... Training loss: 3.8166... 1.3349 sec/batch
Epoch: 49/50... Training Step: 2941... Training loss: 3.8805... 1.3958 sec/batch
Epoch: 49/50... Training Step: 2942... Training loss: 3.8682... 1.4072 sec/batch
Epoch: 49/50... Training Step: 2943... Training loss: 3.8257... 1.3721 sec/batch
Epoch: 49/50... Training Step: 2944... Training loss: 3.7980... 1.4082 sec/batch
Epoch: 49/50... Training Step: 2945... Training loss: 3.8372... 1.4112 sec/batch
Epoch: 49/50... Training Step: 2946... Training loss: 3.8912... 1.3264 sec/batch
Epoch: 49/50... Training Step: 2947... Training loss: 3.8649... 1.3956 sec/batch
Epoch: 49/50... Training Step: 2948... Training loss: 3.8894... 1.4051 sec/batch
Epoch: 49/50... Training Step: 2949... Training loss: 3.8920... 1.4133 sec/batch
Epoch: 49/50... Training Step: 2950... Training loss: 3.8171... 1.4104 sec/batch
Epoch: 49/50... Training Step: 2951... Training loss: 3.7545... 1.4014 sec/batch
Epoch: 49/50... Training Step: 2952... Training loss: 3.7657... 1.4027 sec/batch
Epoch: 49/50... Training Step: 2953... Training loss: 3.7928... 1.3890 sec/batch
Epoch: 49/50... Training Step: 2954... Training loss: 3.7571... 1.3123 sec/batch
Epoch: 49/50... Training Step: 2955... Training loss: 3.8324... 1.2875 sec/batch
Epoch: 49/50... Training Step: 2956... Training loss: 3.8381... 1.3791 sec/batch
Epoch: 49/50... Training Step: 2957... Training loss: 3.7716... 1.3390 sec/batch
Epoch: 49/50... Training Step: 2958... Training loss: 3.8219... 1.4168 sec/batch
Epoch: 49/50... Training Step: 2959... Training loss: 3.7788... 1.3808 sec/batch
Epoch: 49/50... Training Step: 2960... Training loss: 3.7715... 1.3631 sec/batch
Epoch: 49/50... Training Step: 2961... Training loss: 3.8302... 1.3549 sec/batch
Epoch: 49/50... Training Step: 2962... Training loss: 3.8035... 1.4036 sec/batch
Epoch: 49/50... Training Step: 2963... Training loss: 3.8373... 1.4442 sec/batch
Epoch: 49/50... Training Step: 2964... Training loss: 3.8039... 1.3995 sec/batch
Epoch: 49/50... Training Step: 2965... Training loss: 3.8792... 1.3952 sec/batch
Epoch: 49/50... Training Step: 2966... Training loss: 3.8420... 1.3932 sec/batch
Epoch: 49/50... Training Step: 2967... Training loss: 3.9144... 1.4047 sec/batch
Epoch: 49/50... Training Step: 2968... Training loss: 3.7799... 1.3842 sec/batch
Epoch: 49/50... Training Step: 2969... Training loss: 3.7978... 1.3903 sec/batch
Epoch: 49/50... Training Step: 2970... Training loss: 3.7944... 1.3440 sec/batch
Epoch: 49/50... Training Step: 2971... Training loss: 3.8236... 1.3879 sec/batch
Epoch: 49/50... Training Step: 2972... Training loss: 3.7757... 1.3888 sec/batch
Epoch: 49/50... Training Step: 2973... Training loss: 3.7915... 1.3881 sec/batch
Epoch: 49/50... Training Step: 2974... Training loss: 3.8076... 1.4069 sec/batch
Epoch: 49/50... Training Step: 2975... Training loss: 3.8226... 1.3718 sec/batch
Epoch: 49/50... Training Step: 2976... Training loss: 3.7739... 1.3806 sec/batch
Epoch: 49/50... Training Step: 2977... Training loss: 3.8283... 1.3748 sec/batch
Epoch: 49/50... Training Step: 2978... Training loss: 3.7790... 1.3833 sec/batch
Epoch: 49/50... Training Step: 2979... Training loss: 3.7674... 1.3498 sec/batch
Epoch: 49/50... Training Step: 2980... Training loss: 3.8530... 1.4084 sec/batch
Epoch: 49/50... Training Step: 2981... Training loss: 3.7501... 1.3974 sec/batch
Epoch: 49/50... Training Step: 2982... Training loss: 3.7877... 1.3622 sec/batch
Epoch: 49/50... Training Step: 2983... Training loss: 3.7560... 1.4109 sec/batch
Epoch: 49/50... Training Step: 2984... Training loss: 3.7183... 1.3338 sec/batch
Epoch: 49/50... Training Step: 2985... Training loss: 3.8435... 1.4418 sec/batch
Epoch: 49/50... Training Step: 2986... Training loss: 3.7748... 1.3268 sec/batch
Epoch: 49/50... Training Step: 2987... Training loss: 3.8085... 1.3688 sec/batch
Epoch: 49/50... Training Step: 2988... Training loss: 3.7959... 1.3463 sec/batch
Epoch: 49/50... Training Step: 2989... Training loss: 3.7755... 1.4117 sec/batch
Epoch: 50/50... Training Step: 2990... Training loss: 3.9194... 1.4020 sec/batch
Epoch: 50/50... Training Step: 2991... Training loss: 3.8014... 1.3929 sec/batch
Epoch: 50/50... Training Step: 2992... Training loss: 3.8100... 1.4332 sec/batch
Epoch: 50/50... Training Step: 2993... Training loss: 3.8367... 1.3792 sec/batch
Epoch: 50/50... Training Step: 2994... Training loss: 3.8013... 1.3999 sec/batch
Epoch: 50/50... Training Step: 2995... Training loss: 3.8660... 1.4005 sec/batch
Epoch: 50/50... Training Step: 2996... Training loss: 3.8641... 1.3781 sec/batch
Epoch: 50/50... Training Step: 2997... Training loss: 3.8213... 1.3853 sec/batch
Epoch: 50/50... Training Step: 2998... Training loss: 3.8068... 1.3499 sec/batch
Epoch: 50/50... Training Step: 2999... Training loss: 3.8518... 1.3964 sec/batch
Epoch: 50/50... Training Step: 3000... Training loss: 3.9521... 1.3879 sec/batch
Epoch: 50/50... Training Step: 3001... Training loss: 3.7899... 1.4123 sec/batch
Epoch: 50/50... Training Step: 3002... Training loss: 3.8532... 1.4081 sec/batch
Epoch: 50/50... Training Step: 3003... Training loss: 3.8581... 1.4110 sec/batch
Epoch: 50/50... Training Step: 3004... Training loss: 3.8147... 1.3695 sec/batch
Epoch: 50/50... Training Step: 3005... Training loss: 3.7844... 1.4289 sec/batch
Epoch: 50/50... Training Step: 3006... Training loss: 3.8149... 1.4012 sec/batch
Epoch: 50/50... Training Step: 3007... Training loss: 3.8656... 1.3846 sec/batch
Epoch: 50/50... Training Step: 3008... Training loss: 3.8545... 1.4137 sec/batch
Epoch: 50/50... Training Step: 3009... Training loss: 3.8769... 1.3816 sec/batch
Epoch: 50/50... Training Step: 3010... Training loss: 3.8623... 1.3772 sec/batch
Epoch: 50/50... Training Step: 3011... Training loss: 3.8096... 1.4184 sec/batch
Epoch: 50/50... Training Step: 3012... Training loss: 3.7411... 1.4224 sec/batch
Epoch: 50/50... Training Step: 3013... Training loss: 3.7561... 1.3869 sec/batch
Epoch: 50/50... Training Step: 3014... Training loss: 3.7758... 1.4089 sec/batch
Epoch: 50/50... Training Step: 3015... Training loss: 3.7358... 1.4111 sec/batch
Epoch: 50/50... Training Step: 3016... Training loss: 3.8186... 1.3787 sec/batch
Epoch: 50/50... Training Step: 3017... Training loss: 3.8179... 1.4062 sec/batch
Epoch: 50/50... Training Step: 3018... Training loss: 3.7588... 1.4142 sec/batch
Epoch: 50/50... Training Step: 3019... Training loss: 3.8030... 1.4059 sec/batch
Epoch: 50/50... Training Step: 3020... Training loss: 3.7428... 1.3556 sec/batch
Epoch: 50/50... Training Step: 3021... Training loss: 3.7481... 1.3902 sec/batch
Epoch: 50/50... Training Step: 3022... Training loss: 3.8079... 1.3610 sec/batch
Epoch: 50/50... Training Step: 3023... Training loss: 3.7741... 1.3969 sec/batch
Epoch: 50/50... Training Step: 3024... Training loss: 3.8218... 1.3774 sec/batch
Epoch: 50/50... Training Step: 3025... Training loss: 3.7697... 1.3908 sec/batch
Epoch: 50/50... Training Step: 3026... Training loss: 3.8819... 1.3488 sec/batch
Epoch: 50/50... Training Step: 3027... Training loss: 3.8312... 1.3976 sec/batch
Epoch: 50/50... Training Step: 3028... Training loss: 3.8935... 1.3974 sec/batch
Epoch: 50/50... Training Step: 3029... Training loss: 3.7599... 1.4297 sec/batch
Epoch: 50/50... Training Step: 3030... Training loss: 3.7854... 1.3967 sec/batch
Epoch: 50/50... Training Step: 3031... Training loss: 3.7763... 1.4024 sec/batch
Epoch: 50/50... Training Step: 3032... Training loss: 3.8203... 1.3223 sec/batch
Epoch: 50/50... Training Step: 3033... Training loss: 3.7449... 1.3742 sec/batch
Epoch: 50/50... Training Step: 3034... Training loss: 3.7875... 1.3947 sec/batch
Epoch: 50/50... Training Step: 3035... Training loss: 3.7767... 1.3708 sec/batch
Epoch: 50/50... Training Step: 3036... Training loss: 3.8168... 1.3797 sec/batch
Epoch: 50/50... Training Step: 3037... Training loss: 3.7870... 1.4340 sec/batch
Epoch: 50/50... Training Step: 3038... Training loss: 3.8149... 1.4072 sec/batch
Epoch: 50/50... Training Step: 3039... Training loss: 3.7652... 1.4285 sec/batch
Epoch: 50/50... Training Step: 3040... Training loss: 3.7399... 1.4248 sec/batch
Epoch: 50/50... Training Step: 3041... Training loss: 3.8557... 1.3254 sec/batch
Epoch: 50/50... Training Step: 3042... Training loss: 3.7415... 1.3716 sec/batch
Epoch: 50/50... Training Step: 3043... Training loss: 3.7778... 1.3861 sec/batch
Epoch: 50/50... Training Step: 3044... Training loss: 3.7387... 1.3983 sec/batch
Epoch: 50/50... Training Step: 3045... Training loss: 3.6943... 1.4393 sec/batch
Epoch: 50/50... Training Step: 3046... Training loss: 3.8219... 1.3806 sec/batch
Epoch: 50/50... Training Step: 3047... Training loss: 3.7706... 1.2904 sec/batch
Epoch: 50/50... Training Step: 3048... Training loss: 3.7896... 1.4140 sec/batch
Epoch: 50/50... Training Step: 3049... Training loss: 3.7858... 1.3906 sec/batch
Epoch: 50/50... Training Step: 3050... Training loss: 3.7573... 1.3844 sec/batch
Read up on saving and loading checkpoints here: https://www.tensorflow.org/programmers_guide/variables
In [19]:
tf.train.get_checkpoint_state('checkpoints')
Out[19]:
model_checkpoint_path: "checkpoints\\i3050_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i1000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i1200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i1400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i1600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i1800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i2000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i2200_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i2400_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i2600_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i2800_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i3000_l512.ckpt"
all_model_checkpoint_paths: "checkpoints\\i3050_l512.ckpt"
Now that the network is trained, we can use it to generate new text. The idea is that we pass in a character and the network predicts the next one. We then feed that new character back in to predict the one after it, and keep repeating this to generate as much new text as we like. I also included some functionality to prime the network with a string, building up the hidden state from that text before generation starts.
The network gives us a probability for every character in the vocabulary. To reduce noise and make things a little less random, I'm only going to choose the next character from the top N most likely characters.
In [20]:
def pick_top_n(preds, vocab_size, top_n=5):
    # Zero out everything except the top_n most likely characters,
    # renormalize, and sample one character index from what's left.
    p = np.squeeze(preds)
    p[np.argsort(p)[:-top_n]] = 0
    p = p / np.sum(p)
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c
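To see what pick_top_n does on a concrete example: with made-up probabilities and the default top_n=5, the three smallest values get zeroed out and the pick comes from the renormalized top five.
# Toy illustration with made-up probabilities, not real model output.
fake_preds = np.array([[0.02, 0.40, 0.05, 0.25, 0.08, 0.10, 0.03, 0.07]])
print(pick_top_n(fake_preds, vocab_size=8))   # always one of indices 1, 3, 5, 4 or 7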
In [21]:
def sample(checkpoint, n_samples, lstm_size, vocab_size, prime="The "):
    samples = [c for c in prime]
    # Rebuild the graph in sampling mode (batch size and sequence length of 1)
    model = CharRNN(len(vocab), lstm_size=lstm_size, sampling=True)
    saver = tf.train.Saver()
    with tf.Session() as sess:
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)
        # Run the prime string through the network to build up the LSTM state
        for c in prime:
            x = np.zeros((1, 1))
            x[0, 0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state],
                                        feed_dict=feed)

        c = pick_top_n(preds, len(vocab))
        samples.append(int_to_vocab[c])

        # Feed each predicted character back in, carrying the LSTM state forward
        for i in range(n_samples):
            x[0, 0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state],
                                        feed_dict=feed)
            c = pick_top_n(preds, len(vocab))
            samples.append(int_to_vocab[c])

    return ''.join(samples)
Here, pass in the path to a checkpoint and sample from the network.
In [22]:
tf.train.latest_checkpoint('checkpoints')
Out[22]:
'checkpoints\\i3050_l512.ckpt'
In [26]:
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 7000, lstm_size, len(vocab), prime="浪")
print(samp)
INFO:tensorflow:Restoring parameters from checkpoints\i3050_l512.ckpt
浪。”
西门庆见了,只见西门庆进入门中,便道:“这里说了,只是一件儿。”西门庆道:“你说不知,我不敢说。”伯爵道:“这等你不知,我还不去。我说我不知你,你不知,你这个不好?”西门庆道:“我不得你。”西门庆道:“你休要说,你不要,我说你,你不要你,我也不要我去,你不肯你。他不肯来,你不知,我就不知我,不知你那里去了。”西门庆道:“你不知道,只是我不的。”西门庆道:“你的我这里,就是我的不好。你这一个小厮儿,不知道,你也要吃他。”西门庆道:“我的不说你,我说你不好,他不知道,不是这般说,我就不知道。”那妇人听了,只见他说话,一直往外边去。只见玳安进房里,一面走了一遍,金莲不见,说道:“他是不知,你这个好个儿!我不在这里,只是你不好。”西门庆道:“怪狗才,他这等你来。”西门庆道:“他不知,你这个不在我。”西门庆道:“我不要他,你不要,你还要我去,教我拿着他去了。”李瓶儿道:“你不好,只顾我来。”西门庆道:“我不知你,我不在家。那个不好不好?”妇人道:“我也不知,我也不是你这般,你不知道。”那妇人听着,笑道:“你的不是,说的是谁。”那妇人道:“我不知道,我不知道。”西门庆道:“他也不是,你说我,你也是个不知道!”于是打了一个,说道:“我不是你,我这里说的是谁的?”月娘道:“他这个不好,你就是我这个儿!不是你这个淫妇儿,我不知你,你怎不知道!”说道:“我的我不知,我怎的不知你?”妇人道:“你不知你,我不在他这里,你不知道。你不知怎的不得?我来家我也不要了,他也没了,我也不好。你说了一场,他也不知你的,我这个不是你的。”
不言语,只见了一回。
西门庆在房里睡,只见他两个儿,不觉一阵风儿,只见一个不知。那日不见,不在西门庆房里,只见李瓶儿来了,说:“姐姐,他不在,我也不知你,你这个是他的。不知我,你不知他。他不是你,你这里,我不是他,他就是你的。”那玉楼听见了,说:“他,你还不在这屋里?”西门庆道:“你不好,我也不要你,不想来我,不要你看,他也要不出来,我也要他,你说他,你还不知道了。”一面走了一遍,又不在家,又问他:“我在那屋里?”金莲道:“你没了,我就是不得他。”月娘道:“你说道:“我,你这个不好。他来,你还不知道。”西门庆笑:“你不知道,你这等我说。”于是向前取出两个来与西门庆,说道:“我不的,他还不知道,不知你这里有事。”月娘道:“我不好不得,你只怕他。”于是走出来来,说道:“我不知,不知我的,不知怎的。”西门庆道:“我说,你也不是我的。”西门庆道:“既是你,不好不好,就要去了,你就要不去。我不要他,我也不要。你这个不知怎的?”李瓶儿道:“你不是,只怕他,只怕你也不得,我就是你的。不想你不在家,不知你家去了。”西门庆道:“你不知,你不好,不是你这里来!你说我的,你也不要吃,不好,就是你的人儿不好?”西门庆道:“你不说,我不知道。我说你不好,你也不知你,我不去。”西门庆道:“你这个不是,你也不好!”那春梅就是了他一块,一口里放着。
那西门庆不在他房里,只见他不在话。西门庆不想他到后边,就在门前看看。
西门庆看了,就往西门庆家去。
正说:
花枝柳花花园,柳妆花草。
西门庆到后边,一个月娘,就是李瓶儿房中,一个金莲、金莲、金莲、李娇儿、孟玉楼、金莲、金莲、玉箫、兰香、玉箫、玉楼在李瓶儿房边,打着他睡着,不想他去,只怕不见他来。正是:
风不知是多少,少不好人?
话说月娘众人,见了一回,只见西门庆来家,说道:“你的小的不知道。”那李瓶儿不敢说。西门庆道:“你这等不起来,你不知道,你就是我不去。”那西门庆不言,说道:“你不知,我不知你不知,我也不是你。你说他家,你还不要他。我若要我,不是你家人,他不肯说,我也不知他这里,你不知我的他,不知怎么!”西门庆道:“不好,只怕我的不知你。”那妇人道:“他的好不知,我也不好了。你这两日,你还不好,不要你去。”月娘道:“他是你的好!”那妇人道:“我不是你。你说我,你还不在他这里,我不好。你若是我,他不好,你这等不打我,只要教你吃了。”于是走到前边,西门庆道:“我不好,你不知,不知你去了。我不知,我不知道。”西门庆笑道:“你这里,他也有个儿。”西门庆道:“你这里说,我就要去。我不知你那里去了。”那李瓶儿不知笑了,只见李瓶儿在前边坐了。
正说:
不觉心中,如意思不逢。
话说西门庆自从,不知道了。
且说西门庆在家,不觉心中,不知大姐。正是:
风不逢无,难为,人情难为。
话说西门庆到后边,只见西门庆一个小厮儿,不敢走来。正是:
不知多少不成,无人无情。
话说西门庆自从家,不知道:“这个是个人的,不得你不在他。”那妇人听着,把手子都是一条,说道:“你老人家不好,你不好!”西门庆道:“你这奴才,不是一般说的。”月娘道:“你的,你这里有个人儿,你不肯吃了?”西门庆道:“你说,我不要你这两银子。”那西门庆道:“你说不得,你这个不好。”西门庆道:“我不知道,你不知道,我就是你说的话儿。我说他不知道。”西门庆道:“我,你还不知我,你就来了。我就不去,我不知你这些事儿!你还不知道,我不是你这个儿。”西门庆道:“你不是这个人,就是了我。你若要不知,他是不好,你也不好,只顾说了。”于是走到前边,月娘说:“大娘,你不曾吃了,我也是不好?”玉楼道:“我的姐姐,只是我不得你,他这个不知你的。”月娘道:“你的,我不要你,你还要来,我也不去。”李瓶儿笑道:“我也不好,你说了一日,我还要来你看你。”西门庆道:“你这个说,我也没不出来。你若不在家,我不知怎的?”李瓶儿道:“你休说,我也不是我,你也有个他的。我不知道你,他也不是你这里,只是一个小的儿,不敢是他。”那西门庆道:“你不知,我这一日就是他。”西门庆道:“我不知怎么?我不知,你就往前边去了一夜,就是我一日。”西门庆道:“我,你不好。我不知道他,我怎的不知?”那玉箫道:“不知我的他。”西门庆道:“我这等不是。”那妇人道:“我不知,不知你这个是谁?你不知我,你还不知我,你这个不在家。”西门庆道:“我的,你也不知你的。”于是把他拿出来了,一直到到前边,说道:“你爹不知,你这个不知怎么!”说毕,只见他来了,只见西门庆来到,不知道了一日,说了他,说:“爹,你不知道了。”西门庆道:“我不好不去。”西门庆道:“我,你不知道。”西门庆道:“你不知道。”于是走到前边坐了。西门庆因问:“你那日,你怎的不知?”月娘道:“不的我,不想了他来。你爹来家,你也不知他,只顾我来了。”西门庆道:“你说,你不知,你不知你。”这李铭、吴银儿、韩道国、李娇儿、李娇儿、孟玉楼、潘姥姥,都是李瓶儿。李瓶儿在炕上坐,不在他,看了一个。西门庆见了,只顾说话,说:“你们不曾,他不来,我不知道。”西门庆道:“你还要说话儿。只是我说的,不知怎的不得。”那妇人便不肯,把手子拿出来,说道:“你不知,你这等我来。”西门庆便道:“你不知,你还不知你,你不知道我。”妇人笑道:“你不知,你怎得不的来?’他说:‘你不知,你也有不知道!你若是你不知,不是你的。”妇人道:“不敢,你不知道。”说毕,西门庆笑了一回,说道:“你不知道,我不知怎么,到明日,我不知道。”那妇人听了,只顾叫:“我的儿,你不知你,你这等我来,你不在你,我把奴才来了,你与他这个头缠!”西门庆听见,不觉他来了,就不觉心下不觉。
正说:
一日无事不知时,一个无人情。不想一日不知,谁知道:
正在房中,忽有一阵风风,走到雪娥房里。那妇人见了一回,说道:“我的你,你还没去了!”西门庆道:“我说不是他。”西门庆便道:“我不知,我这一件事不知你,我不在他家,你怎得这般!”西门庆道:“我不得我,我也不得。”那妇人笑道:“怪狗,我的不知,不是那边,我只怕你不好?我就不知,你就不去。”妇人道:“我不知道,你也不吃了,我不知我怎的,你就不去?”西门庆道:“我的不知,你怎的不知?”那西门庆笑道:“你不知道,不是你说的,不知道,只怕我不的,他也不知,我不知怎的。你如今日不在,只是他这里。”那玉楼笑着不得,只说:“大姐姐,你这个小厮儿,我不知你这里,你就没了!”那玉箫道:“他是不知,我的不知怎么,不敢去了。”月娘道:“不知道了。”西门庆道:“你这个不好!”西门庆听了,说道:“我不的,你这个不在那里,不想他来家去。”月娘道:“你不知怎么?”西门庆道:“他不知道,不想你家人家不的,只是他家,不敢说,我就是不得你!”那李铭道:“不知你,我在那里,就是你去!你不在我家,我也不知。”月娘笑道:“不是你,我不得我。我这个说话,我还有个头儿,我说你怎的?”李瓶儿道:“我,不要你看,我说你不知,你也不好了。”于是走到后边,只见月娘说,不在那边,只见小玉拿着两盒子,一面都打了,一个小厮,不在家。西门庆便问:“你家,我怎不得?”月娘道:“你这个不知道,不知道了。”西门庆道:“你说道:‘我不知怎的不知?我怎的不知道?我也不知,你不知我,不是你的人,你不知,我就把你那个儿打了一个,把他一个儿也不得了,我也不好。”月娘道:“你的,不是你的。”那李瓶儿连忙走来了。月娘道:“我不知道,我不知你,他不好。”西门庆道:“你这等不知,只是我的不是你。你说我在这里,我不在我家。”说着一声。西门庆道:“我这个说不是,只是你老爹,我也不敢说。他不知怎的的不得?”西门庆道:“你休说。”西门庆道:“既是这里,我不知道!”说毕,又是一个人家,一个小优儿,拿着琵琶,唱了一套《黄吕》儿,又把花子儿来了。
西门庆道:“你家,不是你家。你这里有甚么?”西门庆道:“我不要紧,你就去了。”西门庆道:“我也是我不的,不好不的。”月娘道:“不好,我不的。”那春梅道:“我的你不在那里,我和你说话,只怕他不知道!”那春梅笑道:“我说你好,你不要我,我不知你。”这西门庆笑道:“你不的你说了,他怎的好不的?”那玉楼道:“我这里没人。”月娘道:“你不的,你还没了哩!”那玉箫骂道:“怪囚根子,谁敢是你!那日没的,我来不知你。你来家不去,不想你去了。”妇人笑道:“我,你休怪我说了,你不知你,你不好,我不要他来!你说他怎的不好?我的我不要他!”金莲道:“我不好,我的你不知道!我不在这边,我不想我。”那妇人道:“我不知我,你休要说,我就不知你。我不知我,不知你,我不知你这奴才,你不好。”那西门庆听见,把手中放了一顿,说道:“你休怪淫妇,你不好说,你不是你家,我也不是你。”那妇人道:“他的,我也没了,你不在这里,你怎的不来?”西门庆道:“不知你这等我,你还不要来!”那妇人道:“不知,你休要说,我也不知你。你如今我这般,你也不是,你自从你去了!”西门庆道:“我不知你,不知我这些儿。你不好不去!”那玉箫便不言语,一面走到前边,月娘说:“你不知道,我不要我,你说道:‘他来不要你,我不知你,你休要他。我说我也不得,你只顾要我,我不知你,只要我这个不知道。”西门庆道:“我不要你,我不知你不知,怎的不来我,我和你说话儿。你说他,你不知我,你也不好。”西门庆道:“他的,你不好,我就要你这般去。”妇玉道:“你这里,你这等我不来。”于是把李瓶儿打发了,一回去了。月娘道:“你说,你不好不吃,你不好?”那妇人道:“你不知,我也不是。”那玉箫道:“你的好,你不在这里,我怎的不得?”妇人道:“你这等我说,我不是你。”西门庆道:他不是你,你也没的,不好不的。他不好,我也不是我。”那玉楼道:“你还没个,你说他这个好儿。”那玉楼道:“你这个不的,你不要他,只怕我不好!”月娘道:“我的你,只是你一个不好?我的你不好!你若不知,你不知道,我不好,我不好!”西门庆道:“不好,我也不好,我不好不知。我不想你那个不好!”西门庆道:“你不吃。你不好,你这般,你不在你屋里睡罢。”于是把银子打了一回来,把他打发,打发他去了。正是:
风无非,无非,难不可知。
话说一夜不题。
且说西门庆到家,早辰,早晨出来安儿,早晚来,只见他往后边去了。西门庆见来,只见他来,不想他说道:“你爹来,我这等我不知道了,你不知你,不知你这里有些。”那婆子听了,只顾说着,不知道:“你的我这般,说不是他的。”月娘道:“他不是你。我说我的他,不是你的他。他不在我,他就不在家,你不知道,不知你这个不是他。”西门庆笑笑道:“我的,我不是你,你说他的,不知你不知怎么,你不知怎么?”那妇人道:“我不得,你不知道。你是你家人,我不是这等我。你若不知,我不知你这奴才。”妇人道:“我也不知,他就要了。你这等不知,我不知怎的,只怕你不知道。我就要我说。他不好,他只怕你来。”西门庆问:“你不知,我就是他。”月娘道:“他的不好,我也不知道,你也不知你,不敢来,你就是了。我不知道,我也不得你,不知你不知,你不在那里,我不知你。我这里有些不好,我就来了我。你这等你来!”西门庆道:“你不知道,我这等你不去。我若是你这等我,你就不知道。我也没了。”西门庆道:“我也不吃你,就把你去罢。”西门庆便道:“我不知道,你也是我不得,只怕不好。”西门庆道:“你不的,你还不知,我不知怎么!”月娘道:“你不好,我不好,你也没了。我一个儿也有不的。”西门庆道:“你也有这等,我也不要吃了。”西门庆道:“我不知,我这等你。你若不知你,我把你我不知,我不知你这等你去。”那妇人道:“我不知你。你若不要我,你休要我,我不要吃你,我把你就在我上,你不知怎么?”李瓶儿道:“你不好,你我说我,我不知道了,你不知,你这里吃酒,我不知,我把他也没了。”妇人道:“你休要他,只顾你说,你也不要他。”那妇人笑嘻嘻道:“他的你说,只怕是他的,我不是你。”那西门庆不知道,只怕不的。
且说月娘,不见话下。
且说西门庆进来,只顾说着,不知道话。那日晚夕,月娘在炕上坐着,忽想出来,见西门庆在家,只是一日,他来到房中。西门庆便叫春梅:“我来,你不吃,我这里来?我和你说:‘你来,我还不知道。你那个淫妇!你也没有甚么?”那妇人道:“你说,我不知道。我是他这等他,我就不知他,你怎的不知他?”月娘道:“他不的,只怕你这里来。”月娘道:“我不好,不是他。你不知,你还不去!”西门庆道:“我说你,我就要了他,你这里有话。你这等说他来,就是他的,你不知道。我就是他的,我的不知,怎的不知道!”月娘道:“不敢说他。”月娘道:“我的不是,只要我一个不好。”西门庆道:“你不是,只怕这里,我就没了。”那春梅笑嘻嘻走到,只问:“你爹,他怎的不得?”西门庆道:“你不是我的,我不好了,我不去了。”那春梅道:“你还不去,只怕他去。”西门庆道:“我也不好,你就是我去了。你还不去罢,我就把你来家。”西门庆道:“我不知道,我不知你,我不在家,你就是我说。”于是把李瓶儿拿着,只是小厮,把他打开了。月娘道:“你的,不知道,怎的说了!”西门庆道:“我不知,你怎的不见?”西门庆道:“你这里说,我就是你,我说不知你这里来!你这等我不知你,不知你这里有个不的?”李瓶儿便道:“你不知道,我这里不知道,我不是那边去了。他不知你不的,不知我,只顾你这个不知。”西门庆道:“我的不是你,他不知道了。”那春梅道:“他是他的。我也不要他,他不知道,只是不知,你这个不知道!我也是不知的事,你也不好。”妇人道:“他不的,只要我说,你也不去。”西门庆道:“不是你,我不好。你这等你,你不知他罢,不知我的不是你。”西门庆道:“我不知,我也不是。你不知,只是我这一般,你不知道了。”于是把银簪儿,打发了一个。西门庆道:“我不要你,不是我。”西门庆道:“你不知道,我不敢说我?”于是拿过一钟酒来。伯爵道:“我的你不说。我一日儿没了,我也不吃。”西门道:“我不要,你这个说话。”那李瓶儿不肯,说道:“小的,你不好,只要吃了些酒。我就不在家,不是我说话。”于是打了他两个,说了一遍。西门庆因问:“你的甚么?”西门庆道:“我不知道。”说道:“我的你说话!”西门庆道:“我的不知,你还有个人儿。”西门大官人听了,又不肯出来。西门庆道:“我说你不是。”西门庆道:“我不知,我这等他去,你不知,我也不知。”那春梅便问:“我,你怎的不得你?”西门庆道:“他,我也不是你。”李瓶儿道:“我,我不要说。”那玉箫又道:“我的小淫妇儿,你怎的不去!”那玉楼笑嘻嘻笑道:“我的不是你,我的你也罢。”月娘道:“我不知你,不知你怎样的,只怕我不在这里,教我和你说,你这个不知怎的?”妇人道:“我说,我不知怎的?”那西门庆听了,便道:“你这里,我就没个人儿!”这妇人道:“你也不要,你说他。”那婆子笑道:“我不知,你也不是你。我不知道他,我也不知,你就不是。”西门庆笑道:“不的,你不在我,你说话儿。你说着你家去,我就是了。”西门庆道:“我不好,他不知道!”西门庆道:“你休不信他,只是我一般儿,他不是我的。”于是向袖中取出出来,递与月娘。妇人道:“我,不知你,你这等我来?”西门庆道:“我不知你,我也不是你,我也不好不得,你不好,不想我的我不知,你
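Since sample runs the whole prime string through the network before it starts generating, you can also prime with a longer snippet to steer the output. A quick sketch, reusing the text string loaded at the top of the notebook:
# Prime with the first 20 characters of the corpus, then generate 500 more.
checkpoint = tf.train.latest_checkpoint('checkpoints')
samp = sample(checkpoint, 500, lstm_size, len(vocab), prime=text[:20])
print(samp)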
In [ ]:
checkpoint = 'checkpoints/i200_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="浪")  # the template's prime="Far" would fail here; those characters aren't in this Chinese vocab
print(samp)
In [ ]:
checkpoint = 'checkpoints/i600_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="浪")
print(samp)
In [ ]:
checkpoint = 'checkpoints/i1200_l512.ckpt'
samp = sample(checkpoint, 1000, lstm_size, len(vocab), prime="浪")
print(samp)