本次实验设计主要修改自Pytorch官方提供的教程,在官方的教程上进行了整合。如对本次课件的内容有任何疑惑的同学可以直接微信我或者邮件到cuizhiying.csu@gmail.com
原作者: `Sean Robertson <https://github.com/spro/practical-
These are all good additions to the understanding of RNN.:
我们在没有CUDA的情况下跑整个实验,但是有CUDA的话自然会更好,首先,检查和配置CUDA环境。
In [1]:
import torch
is_cuda = True if torch.cuda.is_available() else False
print(is_cuda)
选着一张空闲的卡
In [2]:
id = 1
torch.cuda.set_device(id)
print( torch.cuda.current_device() )
We will be building and training a basic character-level RNN to classify words. A character-level RNN reads words as a series of characters - outputting a prediction and "hidden state" at each step, feeding its previous hidden state into each next step. We take the final prediction to be the output, i.e. which class the word belongs to.
Specifically, we'll train on a few thousand surnames from 18 languages of origin, and predict which language a name is from based on the spelling:
Download the data from here and extract it to the current directory.
Included in the data/names
directory are 18 text files named as
"[Language].txt". Each file contains a bunch of names, one name per
line, mostly romanized (but we still need to convert from Unicode to
ASCII).
We'll end up with a dictionary of lists of names per language,
{language: [names ...]}
. The generic variables "category" and "line"
(for language and name in our case) are used for later extensibility.
In [9]:
from __future__ import unicode_literals, print_function, division
from io import open
import glob
import os
def findFiles(path): return glob.glob(path)
print(findFiles('data/names/*.txt'))
import unicodedata
import string
all_letters = string.ascii_letters + " .,;'"
n_letters = len(all_letters)
# Turn a Unicode string to plain ASCII, thanks to https://stackoverflow.com/a/518232/2809427
def unicodeToAscii(s):
return ''.join(
c for c in unicodedata.normalize('NFD', s)
if unicodedata.category(c) != 'Mn'
and c in all_letters
)
print(unicodeToAscii('Ślusàrski'))
# Build the category_lines dictionary, a list of names per language
all_categories = []
category_lines = {}
# Split it into training set and validation set
training_lines = {}
validation_lines = {}
# Read a file and split into lines
def readLines(filename):
lines = open(filename, encoding='utf-8').read().strip().split('\n')
return [unicodeToAscii(line) for line in lines]
for filename in findFiles('data/names/*.txt'):
category = os.path.splitext(os.path.basename(filename))[0]
all_categories.append(category)
lines = readLines(filename)
category_lines[category] = lines
num_of_training_set = int(len(lines)*0.8)
training_lines[category] = lines[:num_of_training_set]
validation_lines[category] = lines[num_of_training_set:]
n_categories = len(all_categories)
print(n_categories)
Now we have category_lines
, a dictionary mapping each category
(language) to a list of lines (names). We also kept track of
all_categories
(just a list of languages) and n_categories
for
later reference.
In [3]:
print(category_lines['Italian'][:5])
Now that we have all the names organized, we need to turn them into Tensors to make any use of them.
To represent a single letter, we use a "one-hot vector" of size
<1 x n_letters>
. A one-hot vector is filled with 0s except for a 1
at index of the current letter, e.g. "b" = <0 1 0 0 0 ...>
.
To make a word we join a bunch of those into a 2D matrix
<line_length x 1 x n_letters>
.
That extra 1 dimension is because PyTorch assumes everything is in batches - we're just using a batch size of 1 here.
In [4]:
import torch
# Find letter index from all_letters, e.g. "a" = 0
def letterToIndex(letter):
return all_letters.find(letter)
# Just for demonstration, turn a letter into a <1 x n_letters> Tensor
def letterToTensor(letter):
tensor = torch.zeros(1, n_letters)
tensor[0][letterToIndex(letter)] = 1
if is_cuda:
tensor = tensor.cuda()
return tensor
# Turn a line into a <line_length x 1 x n_letters>,
# or an array of one-hot letter vectors
def lineToTensor(line):
tensor = torch.zeros(len(line), 1, n_letters)
for li, letter in enumerate(line):
tensor[li][0][letterToIndex(letter)] = 1
if is_cuda:
tensor = tensor.cuda()
return tensor
# Tensor here, someone else may call it vector.
print(letterToTensor('J'))
print(lineToTensor('Jones').size())
Before autograd, creating a recurrent neural network in Torch involved cloning the parameters of a layer over several timesteps. The layers held hidden state and gradients which are now entirely handled by the graph itself. This means you can implement a RNN in a very "pure" way, as regular feed-forward layers.
Inside the forward function, we need to Loop
In [26]:
import torch.nn as nn
class BaseRNN(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(BaseRNN, self).__init__()
self.hidden_size = hidden_size
# input to hidden
self.i2h = nn.Linear(input_size, hidden_size)
# hidden to hidden
self.h2h = nn.Linear(hidden_size, hidden_size)
#############################################
#
# Change your activation function here
#
#############################################
# self.activation = nn.Tanh()
self.activation = nn.ReLU()
# hidden to output
self.h2o = nn.Linear(hidden_size, output_size)
def step(self, letter, hidden):
i2h = self.i2h(letter)
h2h = self.h2h(hidden)
hidden = self.activation( h2h+i2h )
output = self.h2o(hidden)
return output, hidden
def forward(self, word):
hidden = self.initHidden(is_cuda)
for i in range(word.size()[0]):
# Only the last output will be used to predict
output, hidden = self.step(word[i], hidden)
return output
def initHidden(self, is_cuda=True):
if is_cuda:
return torch.zeros(1, self.hidden_size).cuda()
else:
return torch.zeros(1, self.hidden_size)
n_hidden = 128
rnn = BaseRNN(n_letters, n_hidden, n_categories)
if is_cuda:
rnn = rnn.cuda()
In [37]:
###############################################################################
#
# Finish a the following model, and train it
#
###############################################################################
class DeeperRNN(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(DeeperRNN, self).__init__()
self.hidden1_size = hidden_size
self.hidden2_size = hidden_size
self.layer1 = BaseRNN(input_size, hidden_size, hidden_size)
self.layer2 = BaseRNN(hidden_size, hidden_size, output_size)
def step(self, letter, hidden1, hidden2):
############################################################
# complete this function
############################################################
output1, hidden1 = self.layer1.step(letter, hidden1)
output2, hidden2 = self.layer2.step(output1, hidden2)
return output2, hidden1, hidden2
def forward(self, word):
############################################################
# complete this function
############################################################
hidden1, hidden2 = self.initHidden(is_cuda)
for i in range(word.size()[0]):
output, hidden1, hidden2 = self.step(word[i], hidden1, hidden2)
return output
def initHidden(self, is_cuda=True):
if is_cuda:
return torch.zeros(1, self.hidden1_size).cuda(), torch.zeros(1, self.hidden2_size).cuda()
else:
return torch.zeros(1, self.hidden1_size), torch.zeros(1, self.hidden2_size)
###############################################################################
# Remember to uncomment the following two line after you finish your model and
# start to retrain your model
###############################################################################
n_hidden = 128
rnn = DeeperRNN(n_letters, n_hidden, n_categories)
rnn = rnn.cuda() if is_cuda else rnn
To run a step of this network we need to pass an input (in our case, the Tensor for the current letter) and a previous hidden state (which we initialize as zeros at first). We'll get back the output (probability of each language) and a next hidden state (which we keep for the next step).
In [8]:
input = letterToTensor('A')
hidden =torch.zeros(1, n_hidden)
hidden = hidden.cuda() if is_cuda else hidden
output, next_hidden = rnn.step(input, hidden)
print(output.shape)
print(next_hidden.shape)
For the sake of efficiency we don't want to be creating a new Tensor for
every step, so we will use lineToTensor
instead of
letterToTensor
and use slices. This could be further optimized by
pre-computing batches of Tensors.
Each loop in side forward function will:
Read each letter in and
Compare final output to target
In [9]:
input = lineToTensor('Albert')
print(n_hidden)
output = rnn(input)
print(output)
print(output.shape)
As you can see the output is a <1 x n_categories>
Tensor, where
every item is the likelihood of that category (higher is more likely).
In [10]:
def categoryFromOutput(output):
top_n, top_i = output.topk(1)
category_i = top_i[0].item()
return all_categories[category_i], category_i
print(categoryFromOutput(output))
We will also want a quick way to get a training example (a name and its language):
In [11]:
import random
def randomChoice(l):
return l[random.randint(0, len(l) - 1)]
def randomTrainingExample():
category = randomChoice(all_categories)
# attention: split training set
line = randomChoice(training_lines[category])
category_tensor = torch.tensor([all_categories.index(category)], dtype=torch.long)
category_tensor = category_tensor.cuda() if is_cuda else category_tensor
line_tensor = lineToTensor(line)
return category, line, category_tensor, line_tensor
def randomValidationExample():
category = randomChoice(all_categories)
# attention: split validation set
line = randomChoice(validation_lines[category])
category_tensor = torch.tensor([all_categories.index(category)], dtype=torch.long)
category_tensor = category_tensor.cuda() if is_cuda else category_tensor
line_tensor = lineToTensor(line)
return category, line, category_tensor, line_tensor
for i in range(10):
category, line, category_tensor, line_tensor = randomTrainingExample()
print('category =', category, '/ line =', line)
In [38]:
criterion = nn.CrossEntropyLoss()
learning_rate = 0.005 # If you set this too high, it might explode. If too low, it might not learn
def train(category_tensor, line_tensor):
output = rnn(line_tensor)
rnn.zero_grad()
loss = criterion(output, category_tensor)
loss.backward()
# Add parameters' gradients to their values, multiplied by learning rate
for p in rnn.parameters():
if hasattr(p.grad, "data"):
p.data.add_(-learning_rate, p.grad.data)
return output, loss.item()
Now we just have to run that with a bunch of examples. Since the
train
function returns both the output and loss we can print its
guesses and also keep track of loss for plotting. Since there are 1000s
of examples we print only every print_every
examples, and take an
average of the loss.
In [39]:
import time
import math
n_iters = 100000
print_every = 5000
plot_every = 1000
# Keep track of losses for plotting
current_loss = 0
all_losses = []
def timeSince(since):
now = time.time()
s = now - since
m = math.floor(s / 60)
s -= m * 60
return '%dm %ds' % (m, s)
start = time.time()
for iter in range(1, n_iters + 1):
category, line, category_tensor, line_tensor = randomTrainingExample()
output, loss = train(category_tensor, line_tensor)
current_loss += loss
# Print iter number, loss, name and guess
if iter % print_every == 0:
guess, guess_i = categoryFromOutput(output)
correct = '✓' if guess == category else '✗ (%s)' % category
print('%d %d%% (%s) %.4f %s / %s %s' % (iter, iter / n_iters * 100, timeSince(start), loss, line, guess, correct))
# Add current loss avg to list of losses
if iter % plot_every == 0:
all_losses.append(current_loss / plot_every)
current_loss = 0
In [29]:
import matplotlib.pyplot as plt
#################################################
#
# Tips: your could use plt.plot
#
#################################################
plt.title('losses')
plt.plot(all_losses)
plt.show()
In [40]:
# DRNN
plt.title('losses')
plt.plot(all_losses)
plt.show()
To see how well the network performs on different categories, we will
create a confusion matrix, indicating for every actual language (rows)
which language the network guesses (columns). To calculate the confusion
matrix a bunch of samples are run through the network with
evaluate()
, which is the same as train()
minus the backprop.
In [41]:
# Keep track of correct guesses in a confusion matrix
confusion_training = torch.zeros(n_categories, n_categories)
confusion_validation = torch.zeros(n_categories, n_categories)
n_confusion = 5000
# Just return an output given a line
def evaluate(line_tensor):
rnn.eval()
output = rnn(line_tensor)
return output
# Go through a bunch of examples and record which are correctly guessed
for i in range(n_confusion):
category, line, category_tensor, line_tensor = randomTrainingExample()
output = evaluate(line_tensor)
guess, guess_i = categoryFromOutput(output)
category_i = all_categories.index(category)
confusion_training[category_i][guess_i] += 1
# Go through a bunch of examples and record which are correctly guessed
for i in range(n_confusion):
category, line, category_tensor, line_tensor = randomValidationExample()
output = evaluate(line_tensor)
guess, guess_i = categoryFromOutput(output)
category_i = all_categories.index(category)
confusion_validation[category_i][guess_i] += 1
# catcul acc
right_train = 0
right_valid = 0
for i in range(n_categories):
right_train += confusion_training[i][i]
right_valid += confusion_validation[i][i]
acc_train = right_train / n_confusion
acc_valid = right_valid / n_confusion
# Normalize by dividing every row by its sum and
for i in range(n_categories):
confusion_training[i] = confusion_training[i] / confusion_training[i].sum()
confusion_validation[i] = confusion_validation[i] / confusion_validation[i].sum()
# Set up plot
fig = plt.figure()
ax1 = fig.add_subplot(121)
cax1 = ax1.matshow(confusion_training.numpy())
ax2 = fig.add_subplot(122)
cax2 = ax2.matshow(confusion_validation.numpy())
# Set up axes
ax1.set_xticklabels([''] + all_categories, rotation=90)
ax1.set_yticklabels([''] + all_categories)
ax2.set_xticklabels([''] + all_categories, rotation=90)
# sphinx_gallery_thumbnail_number = 2
plt.show()
print("Traing set Acc is", acc_train.item())
print("validation set Acc is", acc_valid.item())
You can pick out bright spots off the main axis that show which languages it guesses incorrectly, e.g. Chinese for Korean, and Spanish for Italian. It seems to do very well with Greek, and very poorly with English (perhaps because of overlap with other languages).
In [20]:
def predict(input_line, n_predictions=3):
print('\n> %s' % input_line)
with torch.no_grad():
output = evaluate(lineToTensor(input_line))
output = torch.nn.functional.softmax(output, dim=1)
# Get top N categories
topv, topi = output.topk(n_predictions, 1, True)
predictions = []
for i in range(n_predictions):
value = topv[0][i].item()
category_index = topi[0][i].item()
print('Probability (%.2f) %s' % (value, all_categories[category_index]))
predictions.append([value, all_categories[category_index]])
predict('Dovesky')
predict('Jackson')
predict('Satoshi')
predict("Cui")
predict("Zhuang")
predict("Xue")
predict("Wang")
predict('Chen')
predict('Xiu')
predict('Jia')
==============================================================================================================
1、在第1.4.3
节中编写代码,打印出训练过程当中,loss值的变化,(loss已经在训练过程中记录,找出并画成曲线即可,你也可以选择自己修改训练过程中的代码,自己实现重新画成更加漂亮的曲线。)
2、在第1.6
节,将自己的姓名的每一个字的拼音输入进去,进行预测,看模型预测是否正确(保留上面的预测结果)。如名字为张三
,则输入predict('Zhang')
和predict('San')
.
3、编辑该cell,在下面一行回答问题。当前的BaseRNN
中使用的激活函数修改成relu
(在第1.3小节中),然后重新训练模型,看性能是否有变化。
当前的激活函数是(Tanh),训练集准确率为:(0.7675999999046326)验证集的准确率为:(0.39480000734329224)
激活函数为Relu
时的训练准确率为:(0.770799994468689)验证集的准确率为:(0.4124000072479248)
4、编辑该cell,在下面一行记录当前的模型的准确率,然后,回到1.3中,编辑DeeperRNN
类,使其比BaseRNN
类更深一层,并且重新训练这个更深的网络,看新能是否有提高(记得去掉DeeperRNN
那一个cell中的最后两行的注释)。对于RNN的隐藏层的增加可阅读下一个cell的补充理解
BaseRNN
的训练集准确率为:(0.770799994468689)验证集的准确率为:(0.4124000072479248)
DeeperRNN
的训练准确率为:(0.8001999855041504)验证集的准确率为:(0.4189999997615814)
补充材料:
RNNs are neural networks and everything works monotonically better (if done right) if you put on your deep learning hat and start stacking models up like pancakes. For instance, we can form a 2-layer recurrent network as follows:
y1 = rnn1.step(x) y = rnn2.step(y1)In other words we have two separate RNNs: One RNN is receiving the input vectors and the second RNN is receiving the output of the first RNN as its input. Except neither of these RNNs know or care - it’s all just vectors coming in and going out, and some gradients flowing through each module during backpropagation.
原文阅读点击这里
多层hidden网络结构图如下图所示,其中虚线输出的output表示改output值被丢弃,没有进一步使用:
We are still hand-crafting a small RNN with a few linear layers. The big difference is instead of predicting a category after reading in all the letters of a name, we input a category and output one letter at a time. Recurrently predicting characters to form language (this could also be done with words or other higher order constructs) is often referred to as a "language model".
This network extends the last tutorial's RNN with an extra argument for the category tensor, which is added with the others. The category tensor is a one-hot vector just like the letter input.
We will interpret the output as the probability of the next letter. When sampling, the most likely output letter is used as the next input letter.
I added a second linear layer h2o
(after combining hidden and
output) to give it more muscle to work with. There's also a dropout
layer, which randomly zeros parts of its
input with a given probability
(here 0.2) and is usually used to fuzz inputs to prevent overfitting.
Here we're using it towards the end of the network to purposely add some
chaos and increase sampling variety.
In [2]:
import torch
import torch.nn as nn
class GenerateRNN(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(GenerateRNN, self).__init__()
self.hidden_size = hidden_size
self.c2h = nn.Linear(n_categories, hidden_size)
self.i2h = nn.Linear(input_size, hidden_size)
self.h2h = nn.Linear(hidden_size, hidden_size)
self.activation = nn.Tanh()
self.h2o = nn.Linear(hidden_size, output_size)
self.dropout = nn.Dropout(0.2)
self.softmax = nn.LogSoftmax(dim=1)
def forward(self, category, input, hidden):
c2h = self.c2h(category)
i2h = self.i2h(input)
h2h = self.h2h(hidden)
hidden = self.activation( c2h+i2h+h2h )
dropout = self.dropout(self.h2o(hidden))
output = self.softmax(dropout)
return output, hidden
def initHidden(self, is_cuda=True):
if is_cuda:
return torch.zeros(1, self.hidden_size).cuda()
else:
return torch.zeros(1, self.hidden_size)
In [3]:
import random
# Random item from a list
def randomChoice(l):
return l[random.randint(0, len(l) - 1)]
# Get a random category and random line from that category
def randomTrainingPair():
category = randomChoice(all_categories)
line = randomChoice(category_lines[category])
return category, line
For each timestep (that is, for each letter in a training word) the
inputs of the network will be
(category, current letter, hidden state)
and the outputs will be
(next letter, next hidden state)
. So for each training set, we'll
need the category, a set of input letters, and a set of output/target
letters.
Since we are predicting the next letter from the current letter for each
timestep, the letter pairs are groups of consecutive letters from the
line - e.g. for "ABCD<EOS>"
we would create ("A", "B"), ("B", "C"),
("C", "D"), ("D", "EOS").
The category tensor is a one-hot
tensor of size
<1 x n_categories>
. When training we feed it to the network at every
timestep - this is a design choice, it could have been included as part
of initial hidden state or some other strategy.
Or you could look at here for another description. I think this image could
tell you everything.
In [4]:
# One-hot vector for category
def categoryTensor(category):
li = all_categories.index(category)
tensor = torch.zeros(1, n_categories)
tensor[0][li] = 1
if is_cuda:
tensor = tensor.cuda()
return tensor
# One-hot matrix of first to last letters (not including EOS) for input
def inputTensor(line):
tensor = torch.zeros(len(line), 1, n_letters)
for li in range(len(line)):
letter = line[li]
tensor[li][0][all_letters.find(letter)] = 1
if is_cuda:
tensor = tensor.cuda()
return tensor
# LongTensor of second letter to end (EOS) for target
def targetTensor(line):
letter_indexes = [all_letters.find(line[li]) for li in range(1, len(line))]
letter_indexes.append(n_letters - 1) # EOS
tensor = torch.LongTensor(letter_indexes)
if is_cuda:
tensor = tensor.cuda()
return tensor
For convenience during training we'll make a randomTrainingExample
function that fetches a random (category, line) pair and turns them into
the required (category, input, target) tensors.
In [5]:
# Make category, input, and target tensors from a random category, line pair
def randomTrainingExample():
category, line = randomTrainingPair()
category_tensor = categoryTensor(category)
input_line_tensor = inputTensor(line)
target_line_tensor = targetTensor(line)
return category_tensor, input_line_tensor, target_line_tensor
In contrast to classification, where only the last output is used, we are making a prediction at every step, so we are calculating loss at every step.
The magic of autograd allows you to simply sum these losses at each step and call backward at the end.
For the loss function nn.NLLLoss
is appropriate, since the last
layer of the RNN is nn.LogSoftmax
.
For the different between nn.NLLLoss
and the nn.LogSoftmax
, we could Look
at the source code, the most import line is quote as follow:
def cross_entropy(input, target):
return nll_loss(log_softmax(input, 1))
In a word, Cross entropy combines log_softmax
and nll_loss
in a single function.
In [6]:
criterion = nn.NLLLoss()
learning_rate = 0.0005
def train(category_tensor, input_line_tensor, target_line_tensor):
target_line_tensor.unsqueeze_(-1)
hidden = rnn.initHidden(is_cuda)
rnn.zero_grad()
loss = 0
# Take care of the loss function,
# it could be visualized as the following figure
for i in range(input_line_tensor.size(0)):
output, hidden = rnn(category_tensor, input_line_tensor[i], hidden)
l = criterion(output, target_line_tensor[i])
loss += l
loss.backward()
for p in rnn.parameters():
if hasattr(p.grad, "data"):
p.data.add_(-learning_rate, p.grad.data)
return output, loss.item() / input_line_tensor.size(0)
The loss function could be shown as the following picture. The network forward through entire sequence to compute loss, then backward through entire sequence to compute gradient
To keep track of how long training takes I am adding a timeSince(timestamp) function which returns a human readable string:
In [7]:
import time
import math
def timeSince(since):
now = time.time()
s = now - since
m = math.floor(s / 60)
s -= m * 60
return '%dm %ds' % (m, s)
Training is business as usual - call train a bunch of times and wait a few minutes, printing the current time and loss every print_every examples, and keeping store of an average loss per plot_every examples in all_losses for plotting later.
In [32]:
is_cuda = False
rnn = GenerateRNN(n_letters, 128, n_letters)
rnn = rnn.cuda() if is_cuda else rnn
n_iters = 100000
print_every = 5000
plot_every = 500
all_losses = []
total_loss = 0 # Reset every plot_every iters
start = time.time()
for iter in range(1, n_iters + 1):
output, loss = train(*randomTrainingExample())
total_loss += loss
if iter % print_every == 0:
print('%s (%d %d%%) %.4f' % (timeSince(start), iter, iter / n_iters * 100, total_loss/iter))
if iter % plot_every == 0:
all_losses.append(total_loss / iter)
In [34]:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
plt.figure()
plt.plot(all_losses)
plt.show()
To sample we give the network a letter and ask what the next one is, feed that in as the next letter, and repeat until the EOS token.
output_name
with the starting letterUp to a maximum output length,
output_name
and continueReturn the final name
Note: Rather than having to give it a starting letter, another strategy would have been to include a "start of string" token in training and have the network choose its own starting letter.
In [35]:
max_length = 20
# Sample from a category and starting letter
def sample(category, start_letter='A'):
with torch.no_grad():
# no need to track history in sampling
category_tensor = categoryTensor(category)
input = inputTensor(start_letter)
hidden = rnn.initHidden(is_cuda)
output_name = start_letter
for i in range(max_length):
output, hidden = rnn(category_tensor, input[0], hidden)
topv, topi = output.topk(1)
topi = topi[0][0]
if topi == n_letters - 1:
break
else:
letter = all_letters[topi]
output_name += letter
input = inputTensor(letter)
return output_name
# Get multiple samples from one category and multiple starting letters
def samples(category, start_letters='ABC'):
for start_letter in start_letters:
print(sample(category, start_letter))
print("\n")
samples('Russian', 'CCZZYY')
samples('German', 'CCZZYY')
samples('Spanish', 'CCZZYY')
samples('Chinese', 'CCZZYY')
In [36]:
samples('Chinese', 'CXJ')
samples('Chinese', 'CXJ')
samples('Chinese', 'CXJ')
2、请回答:为什么这个模型这个模型训练好了之后,输入同样的参数会产生不一样的结果?并在上面的程序中验证你自己的猜测
答:因为RNN中使用了Dropout来防止过拟合,因此每次forward的参数都可能不一样。
3、这是一个生成模型,该模型只输入了一个字母,就可以预测一个单词,请问改模型在开始预测什么时候终止预测新的字符,即这个生成模型是如何确定生成的单词的长度的?
答:在预估新的字符为'
时终止预测,即新的预测字符的下标为n_letters - 1
时。如果预测的单词长度超过max_length
,也终止预测。
4、根据下一个cell的提示,以impantance sampling的方式,实现程序在同样输入的情况下,可能生成不一样的名字。
Importance sampling指的是,在生成单词每一个字母的时候,一般的生成模型都有是选择输出结果中概率值最大的字母作为预测,而Importance sampling的方式则是按照各个字母的输出概率来选择输出哪个模型。举个栗子,当程序输入一个字母C
的时候,程序预测下一个字母为[a, o, e]
的概率分别为[0.7, 0.2, 0.1]
, 普通的做法是直接选择概率值最大的a
作为预测结果,而Importance Sampling
的方式则是以70%
的概率选择a
,以20%
的概率选择o
,以10%
的该路选择e
,最终选到哪个有一定的随机性。
请根据你的理解,补充完成下面的Importance sampling
的预测。
Tips: 可使用numpy.random.choice
函数,文档请点击这里
Tips: 这部分的代码与上面的代码的不同之处仅仅在于如何选取预测值,所以,你只需要在下面指定的区域编辑代码即可。
Tips: 直接使用模型输出的概率值会比较麻烦,因为模型的概率输出值会很小,因为这里有50多种字符,所以,一个更加明智的做法是只在概率值最高的前3个或者5个中选择,记得要将概率使用softmax等函数将其转换到和为1.
Tips: 更加简单的做法是,仅仅使用模型输出的概率值进行排序,然后使用指定的概率值进行选择,如,指定选择概率最大的那个字母的概率值为0.5,第二大的为0.3,第三大的为0.2,然后可以直接输入[0.5, 0.3, 0.2].
In [44]:
import numpy as np
max_length = 20
# Sample from a category and starting letter
def sample(category, start_letter='A'):
with torch.no_grad(): # no need to track history in sampling
# initial the model and input
category_tensor = categoryTensor(category)
input = inputTensor(start_letter)
hidden = rnn.initHidden(is_cuda)
output_name = start_letter
for i in range(max_length):
# get the output from model
output, hidden = rnn(category_tensor, input[0], hidden)
# change it into probability
output = torch.exp(output)
#################################################################################
# code here
topv, topi = output.topk(5)
topv = topv / topv.sum()
index = np.random.choice(5, 1, p=topv.cpu().numpy()[0])
################################################################################
topi = topi[0][index]
if topi == n_letters - 1:
break
else:
letter = all_letters[topi]
output_name += letter
input = inputTensor(letter)
return output_name
# Get multiple samples from one category and multiple starting letters
def samples(category, start_letters='ABC'):
for start_letter in start_letters:
print(sample(category, start_letter))
print("\n")
rnn.eval()
samples('Russian', 'CYY')
samples('German', 'CCY')
samples('Spanish', 'CCZZYY')
samples('Chinese', 'CCCZZZYYY')