Since my local machine does not have GPU support and thus cannot perform many model training and evaluation tasks in a reasonable amount of time, I have created a script, evaluation.lua,
in this repository which generates reports on a language model and serializes them as JSON. This notebook consumes those reports and explores them. It also includes some information about the models the reports were made for that is not captured in the serialized reports.
In [85]:
# load some requirements
import json
import matplotlib.pyplot as plt
with open('reports/unweightednoavg_one_layer_12.json', 'r') as f:
first_report = json.loads(f.read())
with open('reports/unweightednoavg_7.json', 'r') as f:
second_report = json.loads(f.read())
with open('reports/unweightednoavg_4.json', 'r') as f:
third_report = json.loads(f.read())
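The three loads above all follow the same pattern, so a small helper makes the assumed report fields explicit. This is a hypothetical convenience loader, not part of evaluation.lua; the key list is simply the set of fields accessed later in this notebook.
In [ ]:
import json

# hypothetical convenience loader: read a report and warn about missing fields
REPORT_KEYS = [
    'train_perplexity', 'valid_perplexity', 'test_perplexity',
    'train_batch_perplexities', 'valid_batch_perplexities', 'test_batch_perplexities',
    'train_samples', 'valid_samples', 'test_samples',
]

def load_report(path):
    with open(path, 'r') as f:
        report = json.loads(f.read())
    missing = [k for k in REPORT_KEYS if k not in report]
    if missing:
        print('warning, missing keys: ' + ', '.join(missing))
    return report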
I created a model with 1 LSTM layer, a dropout of 0.1, and a hidden size of 300. Here we can look at its structure:
nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> output]
(1): nn.LookupTable
(2): nn.LSTM(100 -> 512)
(3): nn.Dropout(0.10000)
(4): nn.DynamicView
(5): nn.Linear(300 -> 25000)
(6): nn.LogSoftMax
}
Notably, this one is a layer shallower and has a larger hidden size, with slightly reduced dropout. While it is not captured in the report, this model converged to its final loss more quickly than the previous model. The use of Adam also led to considerably lower loss.
This model experienced a reduced perplexity across each of the datasets:
In [86]:
# print out the perplexities from the report
print 'Training set perplexity:', first_report['train_perplexity']
print 'Validation set perplexity:', first_report['valid_perplexity']
print 'Test set perplexity:', first_report['test_perplexity']
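The report stores perplexities while the training logs plotted below track loss. Under the usual convention that perplexity is the exponential of the mean per-token negative log-likelihood (an assumption about how evaluation.lua computes these numbers), converting between the two is straightforward:
In [ ]:
import math

# assumed relationship: perplexity = exp(mean per-token NLL in nats), and back
def loss_to_perplexity(loss):
    return math.exp(loss)

def perplexity_to_loss(perplexity):
    return math.log(perplexity)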
In [87]:
with open('logs/log_series.json', 'r') as f:
logs = json.loads(f.read())
for k in logs.keys():
plt.plot(logs[k][0], logs[k][1], label=str(k))
plt.title('Loss v. Epoch')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
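The same series can be summarized numerically. Assuming each entry of log_series.json is an (epochs, losses) pair, as the plotting loop above reads it, the final logged loss of each run is:
In [ ]:
# final logged loss for each series, reading logs[k] as [epochs, losses]
for k in sorted(logs.keys()):
    epochs, losses = logs[k][0], logs[k][1]
    print(str(k) + ': loss ' + str(losses[-1]) + ' at epoch ' + str(epochs[-1]))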
In [88]:
# function for turning report data into scatter plot
def scatterize_batch_loss(report_batch_loss):
    x = []
    y = []
    for i, v in enumerate(report_batch_loss):
        seq_len = i + 1  # batches are grouped by sequence length; the report is 1-indexed
        if seq_len > 50:
            break  # only consider lengths of 50 and below to get a better view of the data in the chart
        if isinstance(v, list):
            x.extend([seq_len] * len(v))  # same sequence length for every loss in v
            y.extend(v)
        elif v is not None:
            x.append(seq_len)
            y.append(v)
    return x, y
In [89]:
%matplotlib inline
x, y = scatterize_batch_loss(first_report['train_batch_perplexities'])
plt.scatter(x, y)
plt.title('Training Perplexity v. Sequence Length')
plt.xlabel('Sequence Length')
plt.ylabel('Perplexity')
plt.show()
In [90]:
%matplotlib inline
x, y = scatterize_batch_loss(first_report['valid_batch_perplexities'])
plt.scatter(x, y)
plt.title('Validation Perplexity v. Sequence Length')
plt.xlabel('Sequence Length')
plt.ylabel('Perplexity')
plt.show()
In [91]:
%matplotlib inline
x, y = scatterize_batch_loss(first_report['test_batch_perplexities'])
plt.scatter(x, y)
plt.title('Test Perplexity v. Sequence Length')
plt.xlabel('Sequence Length')
plt.ylabel('Perplexity')
plt.show()
Notably, this model has a loss below 6 for sequences that are ~10 words or less.
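That observation can be checked against the plotted points directly rather than read off the chart. Reusing scatterize_batch_loss and taking "short" to mean 10 tokens or fewer:
In [ ]:
# numeric check of the short-sequence observation: range of plotted values
# for sequences of 10 tokens or fewer, per split
for split in ['train', 'valid', 'test']:
    x, y = scatterize_batch_loss(first_report[split + '_batch_perplexities'])
    short = [value for length, value in zip(x, y) if length <= 10]
    if short:
        print(split + ': ' + str(len(short)) + ' points, min ' + str(min(short))
              + ', max ' + str(max(short)))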
We can also look at examples of how it generates text. Below are side-by-side comparisons of the labels from the training/validation/test sets and the sentences the model generated. A special <G> token is placed in the generated sequence to show where the model's input ends and its generation begins. I chose to look at only short sequences, as the models each have lower loss for these and might stand a chance of answering correctly.
In [92]:
def print_sample(sample):
    # split the generated and gold sequences into tokens and insert a <G>
    # marker where the supplied prefix ends and generation begins
    seq = sample['generated'].split(' ')
    seq.insert(sample['supplied_length'] + 1, '<G>')
    gold = sample['gold'].split(' ')
    gold.insert(sample['supplied_length'], '<G>')
    print('Gend: ' + ' '.join(seq))
    # the gold line is prefixed with the second generated token so the two printed lines line up
    print('True: ' + seq[1] + ' ' + ' '.join(gold) + '\n')
In [93]:
for sample in first_report['train_samples'][0:5]:
print_sample(sample)
In [94]:
for sample in first_report['valid_samples'][0:5]:
print_sample(sample)
In [95]:
for sample in first_report['test_samples'][0:5]:
print_sample(sample)
This model has lower loss and doesn't seem to make quite as many gibberish mistakes in generation (double periods, long strings of <UNK>, etc.), though this is perhaps too small a sample to draw a real conclusion from. Like the previous model, it tends to favor abrupt endings, likely because it is penalized less for getting only a couple of tokens wrong than for a long sequence of wrong answers. It also tends to leave an idea hanging, ending sentences with "the", etc.
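The "gibberish mistakes" impression can be made a bit more concrete by tallying the obvious artifacts in the generated continuations. A rough count of just the two artifact types mentioned above, using the same prefix offset as print_sample's <G> marker:
In [ ]:
# rough tally of two artifact types in the generated continuations:
# <UNK> tokens and consecutive periods
def count_artifacts(samples):
    unk, double_period = 0, 0
    for sample in samples:
        tokens = sample['generated'].split(' ')
        continuation = tokens[sample['supplied_length'] + 1:]  # same offset as the <G> marker
        unk += continuation.count('<UNK>')
        double_period += sum(1 for a, b in zip(continuation, continuation[1:])
                             if a == '.' and b == '.')
    return unk, double_period

for split in ['train', 'valid', 'test']:
    unk, dots = count_artifacts(first_report[split + '_samples'])
    print(split + ' samples: ' + str(unk) + ' <UNK> tokens, ' + str(dots) + ' double periods')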
I created a model with 2 LSTM layers, a dropout of 0.1, and a hidden size of 300. Here we can look at its structure:
nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> output]
(1): nn.LookupTable
(2): nn.LSTM(100 -> 300)
(3): nn.Dropout(0.100000)
(4): nn.LSTM(300 -> 300)
(5): nn.Dropout(0.100000)
(6): nn.DynamicView
(7): nn.Linear(300 -> 25000)
(8): nn.LogSoftMax
}
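The dump above is enough for a rough parameter count. The figures below assume the standard LSTM parameterization (four gates, each with input weights, recurrent weights, and one bias vector) and a 25000 x 100 embedding table, which may differ slightly from this nn.LSTM implementation:
In [ ]:
# rough parameter count for the 2-layer model, assuming a standard LSTM
# parameterization: 4 gates x (input weights + recurrent weights + bias)
def lstm_params(input_size, hidden_size):
    return 4 * (input_size * hidden_size + hidden_size * hidden_size + hidden_size)

vocab, embed, hidden = 25000, 100, 300
layer_params = [
    ('LookupTable', vocab * embed),
    ('LSTM 100 -> 300', lstm_params(embed, hidden)),
    ('LSTM 300 -> 300', lstm_params(hidden, hidden)),
    ('Linear 300 -> 25000', hidden * vocab + vocab),
]
for name, count in layer_params:
    print(name + ': ' + str(count))
print('total: ' + str(sum(count for _, count in layer_params)))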
I have created 3 datasets, built from the Google Billion Words data set. I trained on a version of the train_small data set with a reduced vocabulary of 25000, in batches of size 50, with a sequence length cutoff of 30. I did not tune any hyperparameters with the validation set, but this could be future work. There is also a small test set.
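None of that preprocessing happens in this notebook (it lives on the Lua side of the repository), but the vocabulary reduction and length cutoff amount to something like the sketch below; the tokenized input and the <UNK> replacement token are assumptions about the actual pipeline:
In [ ]:
# hedged sketch of the preprocessing described above: keep the 25000 most frequent
# tokens, map everything else to <UNK>, and enforce the sequence length cutoff
from collections import Counter

VOCAB_SIZE = 25000
MAX_LEN = 30

def preprocess(tokenized_sentences):
    counts = Counter(token for sent in tokenized_sentences for token in sent)
    keep = set(token for token, _ in counts.most_common(VOCAB_SIZE))
    processed = []
    for sent in tokenized_sentences:
        sent = [token if token in keep else '<UNK>' for token in sent]
        processed.append(sent[:MAX_LEN])  # truncation; the real pipeline may drop long sentences instead
    return processed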
In [96]:
# print out the perplexities from the report
print 'Training set perplexity:', second_report['train_perplexity']
print 'Validation set perplexity:', second_report['valid_perplexity']
print 'Test set perplexity:', second_report['test_perplexity']
In [97]:
with open('logs/log_series_2_layer.json', 'r') as f:
logs = json.loads(f.read())
for k in logs.keys():
plt.plot(logs[k][0], logs[k][1], label=str(k))
plt.title('Loss v. Epoch')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
We can examine the relationship between perplexity and sequence length. We can expect higher perplexity with increasing sequence length, as more information must be remembered by the model as it generates, and the model is only trained on examples of sequence length 30 or less. We can generate a scatter plot of batch perplexity v. sequence length of the batch (all batches are the same size):
In [98]:
%matplotlib inline
x, y = scatterize_batch_loss(second_report['train_batch_perplexities'])
plt.scatter(x, y)
plt.title('Training Perplexity v. Sequence Length')
plt.xlabel('Sequence Length')
plt.ylabel('Perplexity')
plt.show()
In [99]:
%matplotlib inline
x, y = scatterize_batch_loss(second_report['valid_batch_perplexities'])
plt.scatter(x, y)
plt.title('Validation Perplexity v. Sequence Length')
plt.xlabel('Sequence Length')
plt.ylabel('Perplexity')
plt.show()
In [100]:
%matplotlib inline
x, y = scatterize_batch_loss(second_report['test_batch_perplexities'])
plt.scatter(x, y)
plt.title('Test Perplexity v. Sequence Length')
plt.xlabel('Sequence Length')
plt.ylabel('Perplexity')
plt.show()
We can also look at examples of how it generates text. Below are side-by-side comparisons of the labels from the training/validation/test sets and the sentences the model generated. A special <G> token is placed in the generated sequence to show where the model's input ends and its generation begins.
In [101]:
for sample in second_report['train_samples']:
print_sample(sample)
In [102]:
for sample in second_report['valid_samples'][0:5]:
print_sample(sample)
In [103]:
for sample in second_report['test_samples'][0:5]:
print_sample(sample)
I created a model with 2 LSTM layers, a dropout of 0.1, and a hidden size of 300. Here we can look at its structure:
nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> output]
(1): nn.LookupTable
(2): nn.LSTM(100 -> 300)
(3): nn.Dropout(0.100000)
(4): nn.LSTM(300 -> 300)
(5): nn.Dropout(0.100000)
(6): nn.DynamicView
(7): nn.Linear(300 -> 25000)
(8): nn.LogSoftMax
}
I have created 3 datasets, built from the Google Billion Words data set. I trained on a version of the train_small data set with a reduced vocabulary of 25000, in batches of size 50, with a sequence length cutoff of 30. I did not tune any hyperparameters with the validation set, but this could be future work. There is also a small test set.
In [104]:
# print out the perplexities from the report
print 'Training set perplexity:', third_report['train_perplexity']
print 'Validation set perplexity:', third_report['valid_perplexity']
print 'Test set perplexity:', third_report['test_perplexity']
In [105]:
with open('logs/log_series_2_layer.json', 'r') as f:
logs = json.loads(f.read())
for k in logs.keys():
plt.plot(logs[k][0], logs[k][1], label=str(k))
plt.title('Loss v. Epoch')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
We can examine the relationship between perplexity and sequence length. We can expect higher perplexity with increasing sequence length, as more information must be remembered by the model as it generates, and the model is only trained on examples of sequence length 30 or less. We can generate a scatter plot of batch perplexity v. sequence length of the batch (all batches are the same size):
In [106]:
%matplotlib inline
x, y = scatterize_batch_loss(third_report['train_batch_perplexities'])
plt.scatter(x, y)
plt.title('Training Perplexity v. Sequence Length')
plt.xlabel('Sequence Length')
plt.ylabel('Perplexity')
plt.show()
In [107]:
%matplotlib inline
x, y = scatterize_batch_loss(third_report['valid_batch_perplexities'])
plt.scatter(x, y)
plt.title('Validation Perplexity v. Sequence Length')
plt.xlabel('Sequence Length')
plt.ylabel('Perplexity')
plt.show()
In [108]:
%matplotlib inline
x, y = scatterize_batch_loss(third_report['test_batch_perplexities'])
plt.scatter(x, y)
plt.title('Test Perplexity v. Sequence Length')
plt.xlabel('Sequence Length')
plt.ylabel('Perplexity')
plt.show()
We can also look at examples of how it generates text. Below are side-by-side comparisons of the labels from the training/validation/test sets and the sentences the model generated. A special <G> token is placed in the generated sequence to show where the model's input ends and its generation begins.
In [109]:
for sample in third_report['train_samples']:
print_sample(sample)
In [110]:
for sample in third_report['valid_samples'][0:5]:
print_sample(sample)
In [111]:
for sample in third_report['test_samples'][0:5]:
print_sample(sample)
In [ ]: