Skip-gram word2vec

In this notebook, I'll lead you through using TensorFlow to implement the word2vec algorithm using the skip-gram architecture. By implementing this, you'll learn about embedding words for use in natural language processing. This will come in handy when dealing with things like translations.

Readings

Here are the resources I used to build this notebook. I suggest reading these either beforehand or while you're working on this material.

A really good conceptual overview of word2vec from Chris McCormick
First word2vec paper from Mikolov et al.
NIPS paper with improvements for word2vec also from Mikolov et al.
An implementation of word2vec from Thushan Ganegedara
TensorFlow word2vec tutorial

Word embeddings

When you're dealing with language and words, you end up with tens of thousands of classes to predict, one for each word. Trying to one-hot encode these words is massively inefficient, you'll have one element set to 1 and the other 50,000 set to 0. The word2vec algorithm finds much more efficient representations by finding vectors that represent the words. These vectors also contain semantic information about the words. Words that show up in similar contexts, such as "black", "white", and "red" will have vectors near each other. There are two architectures for implementing word2vec, CBOW (Continuous Bag-Of-Words) and Skip-gram.

In this implementation, we'll be using the skip-gram architecture because it performs better than CBOW. Here, we pass in a word and try to predict the words surrounding it in the text. In this way, we can train the network to learn representations for words that show up in similar contexts.

First up, importing packages.



In [1]:

    
import time

import numpy as np
import tensorflow as tf

import utils

Load the text8 dataset, a file of cleaned up Wikipedia articles from Matt Mahoney. The next cell will download the data set to the data folder. Then you can extract it and delete the archive file to save storage space.



In [2]:

    
from urllib.request import urlretrieve
from os.path import isfile, isdir
from tqdm import tqdm
import zipfile

dataset_folder_path = 'data'
dataset_filename = 'text8.zip'
dataset_name = 'Text8 Dataset'

class DLProgress(tqdm):
    last_block = 0

    def hook(self, block_num=1, block_size=1, total_size=None):
        self.total = total_size
        self.update((block_num - self.last_block) * block_size)
        self.last_block = block_num

if not isfile(dataset_filename):
    with DLProgress(unit='B', unit_scale=True, miniters=1, desc=dataset_name) as pbar:
        urlretrieve(
            'http://mattmahoney.net/dc/text8.zip',
            dataset_filename,
            pbar.hook)

if not isdir(dataset_folder_path):
    with zipfile.ZipFile(dataset_filename) as zip_ref:
        zip_ref.extractall(dataset_folder_path)
        
with open('data/text8') as f:
    text = f.read()









    



Text8 Dataset: 31.4MB [01:08, 458KB/s]

Preprocessing

Here I'm fixing up the text to make training easier. This comes from the utils module I wrote. The preprocess function coverts any punctuation into tokens, so a period is changed to <PERIOD>. In this data set, there aren't any periods, but it will help in other NLP problems. I'm also removing all words that show up five or fewer times in the dataset. This will greatly reduce issues due to noise in the data and improve the quality of the vector representations. If you want to write your own functions for this stuff, go for it.



In [3]:

    
words = utils.preprocess(text)
print(words[:30])









    



['anarchism', 'originated', 'as', 'a', 'term', 'of', 'abuse', 'first', 'used', 'against', 'early', 'working', 'class', 'radicals', 'including', 'the', 'diggers', 'of', 'the', 'english', 'revolution', 'and', 'the', 'sans', 'culottes', 'of', 'the', 'french', 'revolution', 'whilst']



In [4]:

    
print("Total words: {}".format(len(words)))
print("Unique words: {}".format(len(set(words))))









    



Total words: 16680599
Unique words: 63641

And here I'm creating dictionaries to covert words to integers and backwards, integers to words. The integers are assigned in descending frequency order, so the most frequent word ("the") is given the integer 0 and the next most frequent is 1 and so on. The words are converted to integers and stored in the list int_words.



In [5]:

    
vocab_to_int, int_to_vocab = utils.create_lookup_tables(words)
int_words = [vocab_to_int[word] for word in words]

Subsampling

Words that show up often such as "the", "of", and "for" don't provide much context to the nearby words. If we discard some of them, we can remove some of the noise from our data and in return get faster training and better representations. This process is called subsampling by Mikolov. For each word $w_i$ in the training set, we'll discard it with probability given by

$$ P(w_i) = 1 - \sqrt{\frac{t}{f(w_i)}} $$

where $t$ is a threshold parameter and $f(w_i)$ is the frequency of word $w_i$ in the total dataset.

I'm going to leave this up to you as an exercise. This is more of a programming challenge, than about deep learning specifically. But, being able to prepare your data for your network is an important skill to have. Check out my solution to see how I did it.

Exercise: Implement subsampling for the words in int_words. That is, go through int_words and discard each word given the probablility $P(w_i)$ shown above. Note that $P(w_i)$ is the probability that a word is discarded. Assign the subsampled data to train_words.



In [12]:

    
# create a tabel to store word frequency
from collections import Counter
word_num = len(int_words)
word_counter = Counter()
for int_word in int_words:
    word_counter[int_word] += 1
word_freq = {int_word: word_counter[int_word] / word_num for int_word in word_counter.keys()}



In [13]:

    
word_freq[0]









    Out[13]:





0.06363056866243233



In [55]:

    
## Your code here
t = 1e-5
train_words = [] # The final subsampled word list
for int_word in int_words:
    p_discard = 1 - np.sqrt(t / word_freq[int_word])
    if np.random.rand((1)) > p_discard:
        train_words.append(int_word)



In [56]:

    
p_discard









    Out[56]:





0.90241381216729244



In [57]:

    
word_freq[int_word]









    Out[57]:





0.0010500821942905048



In [58]:

    
len(int_words)









    Out[58]:





16680599



In [59]:

    
len(train_words)









    Out[59]:





4630499

Making batches

Now that our data is in good shape, we need to get it into the proper form to pass it into our network. With the skip-gram architecture, for each word in the text, we want to grab all the words in a window around that word, with size $C$.

From Mikolov et al.:

"Since the more distant words are usually less related to the current word than those close to it, we give less weight to the distant words by sampling less from those words in our training examples... If we choose $C = 5$, for each training word we will select randomly a number $R$ in range $< 1; C >$, and then use $R$ words from history and $R$ words from the future of the current word as correct labels."

Exercise: Implement a function get_target that receives a list of words, an index, and a window size, then returns a list of words in the window around the index. Make sure to use the algorithm described above, where you choose a random number of words from the window.



In [98]:

    
def get_target(words, idx, window_size=5, random=True):
    ''' Get a list of words in a window around an index. '''
    # Your code here
    if random:
        R = np.random.randint(1, window_size+1)
    else:
        R = window_size
        
    start = idx - R if idx - R >0 else 0
    stop = idx + R
    out = set(words[start:idx] + words[start+1:stop+1])
    return list(out)

Here's a function that returns batches for our network. The idea is that it grabs batch_size words from a words list. Then for each of those words, it gets the target words in the window. I haven't found a way to pass in a random number of target words and get it to work with the architecture, so I make one row per input-target pair. This is a generator function by the way, helps save memory.



In [63]:

    
def get_batches(words, batch_size, window_size=5):
    ''' Create a generator of word batches as a tuple (inputs, targets) '''
    
    n_batches = len(words)//batch_size
    
    # only full batches
    words = words[:n_batches*batch_size]
    
    for idx in range(0, len(words), batch_size):
        x, y = [], []
        batch = words[idx:idx+batch_size]
        for ii in range(len(batch)):
            batch_x = batch[ii]
            batch_y = get_target(batch, ii, window_size)
            y.extend(batch_y)
            x.extend([batch_x]*len(batch_y))
        yield x, y

Building the graph

From Chris McCormick's blog, we can see the general structure of our network.

The input words are passed in as one-hot encoded vectors. This will go into a hidden layer of linear units, then into a softmax layer. We'll use the softmax layer to make a prediction like normal.

The idea here is to train the hidden layer weight matrix to find efficient representations for our words. This weight matrix is usually called the embedding matrix or embedding look-up table. We can discard the softmax layer becuase we don't really care about making predictions with this network. We just want the embedding matrix so we can use it in other networks we build from the dataset.

I'm going to have you build the graph in stages now. First off, creating the inputs and labels placeholders like normal.

Exercise: Assign inputs and labels using tf.placeholder. We're going to be passing in integers, so set the data types to tf.int32. The batches we're passing in will have varying sizes, so set the batch sizes to [None]. To make things work later, you'll need to set the second dimension of labels to None or 1.



In [84]:

    
train_graph = tf.Graph()
with train_graph.as_default():
    inputs = tf.placeholder(tf.int32, (None,))
    labels = tf.placeholder(tf.int32, (None, None))

Embedding

The embedding matrix has a size of the number of words by the number of neurons in the hidden layer. So, if you have 10,000 words and 300 hidden units, the matrix will have size $10,000 \times 300$. Remember that we're using one-hot encoded vectors for our inputs. When you do the matrix multiplication of the one-hot vector with the embedding matrix, you end up selecting only one row out of the entire matrix:

You don't actually need to do the matrix multiplication, you just need to select the row in the embedding matrix that corresponds to the input word. Then, the embedding matrix becomes a lookup table, you're looking up a vector the size of the hidden layer that represents the input word.

Exercise: Tensorflow provides a convenient function tf.nn.embedding_lookup that does this lookup for us. You pass in the embedding matrix and a tensor of integers, then it returns rows in the matrix corresponding to those integers. Below, set the number of embedding features you'll use (200 is a good start), create the embedding matrix variable, and use tf.nn.embedding_lookup to get the embedding tensors. For the embedding matrix, I suggest you initialize it with a uniform random numbers between -1 and 1 using tf.random_uniform. This TensorFlow tutorial will help if you get stuck.



In [90]:

    
n_vocab = len(int_to_vocab)
n_embedding = 256 # Number of embedding features 
with train_graph.as_default():
    embedding = tf.Variable(tf.random_uniform([n_vocab, n_embedding], -1, -1)) # create embedding weight matrix here
    embed = tf.nn.embedding_lookup(embedding, inputs) # use tf.nn.embedding_lookup to get the hidden layer output

Negative sampling

For every example we give the network, we train it using the output from the softmax layer. That means for each input, we're making very small changes to millions of weights even though we only have one true example. This makes training the network very inefficient. We can approximate the loss from the softmax layer by only updating a small subset of all the weights at once. We'll update the weights for the correct label, but only a small number of incorrect labels. This is called "negative sampling". Tensorflow has a convenient function to do this, tf.nn.sampled_softmax_loss.

Exercise: Below, create weights and biases for the softmax layer. Then, use tf.nn.sampled_softmax_loss to calculate the loss. Be sure to read the documentation to figure out how it works.



In [86]:

    
inputs.get_shape().as_list()









    Out[86]:





[None]



In [87]:

    
embedding.get_shape().as_list()









    Out[87]:





[63641, 256]



In [88]:

    
embed.get_shape().as_list()









    Out[88]:





[None, 256]



In [93]:

    
# Number of negative labels to sample
n_sampled = 100
with train_graph.as_default():
    softmax_w = tf.Variable(tf.truncated_normal([n_vocab, n_embedding], stddev=0.1)) # create softmax weight matrix here
    softmax_b = tf.Variable(tf.zeros(n_vocab))# create softmax biases here
    
    # Calculate the loss using negative sampling
    loss = tf.nn.sampled_softmax_loss(softmax_w, softmax_b, labels, embed, n_sampled, n_vocab ) 
    
    cost = tf.reduce_mean(loss)
    optimizer = tf.train.AdamOptimizer().minimize(cost)

Validation

This code is from Thushan Ganegedara's implementation. Here we're going to choose a few common words and few uncommon words. Then, we'll print out the closest words to them. It's a nice way to check that our embedding table is grouping together words with similar semantic meanings.



In [95]:

    
import random
with train_graph.as_default():
    ## From Thushan Ganegedara's implementation
    valid_size = 16 # Random set of words to evaluate similarity on.
    valid_window = 100
    # pick 8 samples from (0,100) and (1000,1100) each ranges. lower id implies more frequent 
    valid_examples = np.array(random.sample(range(valid_window), valid_size//2))
    valid_examples = np.append(valid_examples, 
                               random.sample(range(1000,1000+valid_window), valid_size//2))

    valid_dataset = tf.constant(valid_examples, dtype=tf.int32)
    
    # We use the cosine distance:
    norm = tf.sqrt(tf.reduce_sum(tf.square(embedding), 1, keep_dims=True))
    normalized_embedding = embedding / norm
    valid_embedding = tf.nn.embedding_lookup(normalized_embedding, valid_dataset)
    similarity = tf.matmul(valid_embedding, tf.transpose(normalized_embedding))



In [96]:

    
# If the checkpoints directory doesn't exist:
!mkdir checkpoints

Training

Below is the code to train the network. Every 100 batches it reports the training loss. Every 1000 batches, it'll print out the validation words.



In [99]:

    
epochs = 10
batch_size = 1000
window_size = 10

with train_graph.as_default():
    saver = tf.train.Saver()

with tf.Session(graph=train_graph) as sess:
    iteration = 1
    loss = 0
    sess.run(tf.global_variables_initializer())

    for e in range(1, epochs+1):
        batches = get_batches(train_words, batch_size, window_size)
        start = time.time()
        for x, y in batches:
            
            feed = {inputs: x,
                    labels: np.array(y)[:, None]}
            train_loss, _ = sess.run([cost, optimizer], feed_dict=feed)
            
            loss += train_loss
            
            if iteration % 100 == 0: 
                end = time.time()
                print("Epoch {}/{}".format(e, epochs),
                      "Iteration: {}".format(iteration),
                      "Avg. Training loss: {:.4f}".format(loss/100),
                      "{:.4f} sec/batch".format((end-start)/100))
                loss = 0
                start = time.time()
            
            if iteration % 1000 == 0:
                ## From Thushan Ganegedara's implementation
                # note that this is expensive (~20% slowdown if computed every 500 steps)
                sim = similarity.eval()
                for i in range(valid_size):
                    valid_word = int_to_vocab[valid_examples[i]]
                    top_k = 8 # number of nearest neighbors
                    nearest = (-sim[i, :]).argsort()[1:top_k+1]
                    log = 'Nearest to %s:' % valid_word
                    for k in range(top_k):
                        close_word = int_to_vocab[nearest[k]]
                        log = '%s %s,' % (log, close_word)
                    print(log)
            
            iteration += 1
    save_path = saver.save(sess, "checkpoints/text8.ckpt")
    embed_mat = sess.run(normalized_embedding)









    



Epoch 1/10 Iteration: 100 Avg. Training loss: 5.9884 0.1222 sec/batch
Epoch 1/10 Iteration: 200 Avg. Training loss: 7.4473 0.1125 sec/batch
Epoch 1/10 Iteration: 300 Avg. Training loss: 9.0847 0.1132 sec/batch
Epoch 1/10 Iteration: 400 Avg. Training loss: 10.8268 0.1128 sec/batch
Epoch 1/10 Iteration: 500 Avg. Training loss: 12.1550 0.1184 sec/batch
Epoch 1/10 Iteration: 600 Avg. Training loss: 13.2720 0.1257 sec/batch
Epoch 1/10 Iteration: 700 Avg. Training loss: 14.4383 0.1196 sec/batch
Epoch 1/10 Iteration: 800 Avg. Training loss: 15.1760 0.1185 sec/batch
Epoch 1/10 Iteration: 900 Avg. Training loss: 15.7819 0.1170 sec/batch
Epoch 1/10 Iteration: 1000 Avg. Training loss: 16.9411 0.1179 sec/batch
Nearest to known: employer, electorates, wafer, asean, foes, pashtunistan, dodge, rote,
Nearest to after: conquers, joris, newark, fyodor, theorie, oneself, boundless, playmates,
Nearest to however: legalism, rayleigh, bestows, dangerously, tk, fabled, monitors, whiskers,
Nearest to these: bloating, simplistic, automatism, fremantle, hyperglycemia, starved, putatively, depictions,
Nearest to a: oklahoma, neighbourhood, kauffman, seeing, bernhardt, purveyor, snippets, pitch,
Nearest to there: dove, bedouins, overcoming, spp, claws, racquets, aglaulus, ectopic,
Nearest to system: alcoba, abreu, incurable, hiroshima, sympathizer, connote, berkman, rabba,
Nearest to most: avant, garde, fries, dinner, bakery, nursery, eschatology, dudley,
Nearest to versions: intervocalic, morphologically, saturnalia, tungusic, jutsu, undp, revolutionists, natufian,
Nearest to know: grin, keyhole, splashes, oes, rhus, emedicine, classificatory, emo,
Nearest to pressure: anasazi, solf, refrigerant, nolo, nitrates, overlying, weierstrass, gallaudet,
Nearest to cost: gynecology, pic, salk, kangra, remodelled, paedophilia, migraine, rescind,
Nearest to mean: variance, almond, jmu, diversifying, retour, tchad, affaire, lillian,
Nearest to magazine: swayed, iana, boh, kournikova, anselm, automorphism, utf, hingis,
Nearest to behind: keto, exhumed, quarterback, kick, glanville, creighton, fumble, pendergast,
Nearest to governor: teaming, vechten, forrester, heijenoort, zalta, penciller, plantes, theobromine,
Epoch 1/10 Iteration: 1100 Avg. Training loss: 16.6916 0.1203 sec/batch
Epoch 1/10 Iteration: 1200 Avg. Training loss: 17.7740 0.1204 sec/batch
Epoch 1/10 Iteration: 1300 Avg. Training loss: 18.1315 0.1242 sec/batch
Epoch 1/10 Iteration: 1400 Avg. Training loss: 18.5439 0.1254 sec/batch
Epoch 1/10 Iteration: 1500 Avg. Training loss: 18.5988 0.1256 sec/batch
Epoch 1/10 Iteration: 1600 Avg. Training loss: 18.3287 0.1391 sec/batch
Epoch 1/10 Iteration: 1700 Avg. Training loss: 18.5151 0.1389 sec/batch
Epoch 1/10 Iteration: 1800 Avg. Training loss: 18.8152 0.1408 sec/batch
Epoch 1/10 Iteration: 1900 Avg. Training loss: 18.9214 0.1428 sec/batch
Epoch 1/10 Iteration: 2000 Avg. Training loss: 18.4651 0.1528 sec/batch
Nearest to known: employer, aircraft, castro, impair, conqueror, nigeria, harbours, mike,
Nearest to after: screens, bob, host, joris, girls, revival, atomic, wages,
Nearest to however: allies, monitors, seeking, julian, succeeded, demo, toxic, calculations,
Nearest to these: comedian, surviving, linguists, mentioned, vernacular, coalitions, assertion, sydney,
Nearest to a: high, own, important, systems, those, number, than, left,
Nearest to there: stretch, reminiscent, recipients, suppression, canceled, winners, standing, scored,
Nearest to system: torch, aug, moderated, laozi, alcoba, receives, founders, gia,
Nearest to most: garde, output, avant, latin, fly, purchase, east, happened,
Nearest to versions: spalding, tile, accademia, bemani, morphologically, asiatic, tungusic, terrestris,
Nearest to know: dada, borgir, grin, quartered, transuranic, inaugurating, comptes, cycladic,
Nearest to pressure: vapor, boiling, chromaticity, xm, saturation, hubble, schumpeter, nolo,
Nearest to cost: twine, normalize, apocalypses, characterise, interleaving, kangra, frosty, helmsman,
Nearest to mean: variance, almond, almonds, jmu, diversifying, pers, coherency, contrivance,
Nearest to magazine: stradella, azteca, hyland, steadfastly, welk, oop, easters, anselm,
Nearest to behind: skinner, fullback, ishi, rien, tarkenton, stances, randle, childress,
Nearest to governor: vechten, coupler, ogura, teaming, abad, susima, dich, bindusara,
Epoch 1/10 Iteration: 2100 Avg. Training loss: 18.7629 0.1488 sec/batch
Epoch 1/10 Iteration: 2200 Avg. Training loss: 18.8296 0.1465 sec/batch
Epoch 1/10 Iteration: 2300 Avg. Training loss: 19.0572 0.1580 sec/batch
Epoch 1/10 Iteration: 2400 Avg. Training loss: 18.7344 0.1499 sec/batch
Epoch 1/10 Iteration: 2500 Avg. Training loss: 19.4661 0.1531 sec/batch
Epoch 1/10 Iteration: 2600 Avg. Training loss: 18.2653 0.1493 sec/batch
Epoch 1/10 Iteration: 2700 Avg. Training loss: 17.8207 0.1528 sec/batch
Epoch 1/10 Iteration: 2800 Avg. Training loss: 18.7666 0.1547 sec/batch
Epoch 1/10 Iteration: 2900 Avg. Training loss: 18.3354 0.1544 sec/batch
Epoch 1/10 Iteration: 3000 Avg. Training loss: 18.2032 0.1502 sec/batch
Nearest to known: introduced, instance, successful, historical, rise, initially, records, phenomenon,
Nearest to after: another, man, possibly, take, below, host, york, class,
Nearest to however: wide, we, you, free, individual, divided, head, allies,
Nearest to these: commonly, introduced, per, remains, mentioned, sources, along, lived,
Nearest to a: high, important, those, own, life, left, systems, could,
Nearest to there: entire, national, likely, king, gained, us, allowed, addition,
Nearest to system: prominent, sweden, queen, torch, founders, opposite, possibility, depicted,
Nearest to most: east, latin, government, already, rather, introduction, account, types,
Nearest to versions: nigger, accademia, quintessential, hyperion, scotty, maul, reconfiguration, gamefaqs,
Nearest to know: maze, thoughtful, diverges, pulsating, sears, omission, reorganizing, unexplained,
Nearest to pressure: boiling, vapor, saturation, hubble, schumpeter, tre, conformal, gravitationally,
Nearest to cost: exhumed, madame, guerilla, distract, ringed, ecological, ammonius, peary,
Nearest to mean: variance, almond, propagated, edible, oecs, butte, kernel, mired,
Nearest to magazine: anointed, caterpillar, oop, provably, tabernacle, distillation, grabbing, milman,
Nearest to behind: skinner, superstition, elmwood, randle, cain, ers, dignity, thursday,
Nearest to governor: miki, typifies, prophesies, hellenized, emden, reconciling, serendipity, unranked,
Epoch 1/10 Iteration: 3100 Avg. Training loss: 17.8752 0.1578 sec/batch
Epoch 1/10 Iteration: 3200 Avg. Training loss: 18.2069 0.1496 sec/batch
Epoch 1/10 Iteration: 3300 Avg. Training loss: 17.0041 0.1496 sec/batch
Epoch 1/10 Iteration: 3400 Avg. Training loss: 17.0014 0.1517 sec/batch
Epoch 1/10 Iteration: 3500 Avg. Training loss: 17.3774 0.1507 sec/batch
Epoch 1/10 Iteration: 3600 Avg. Training loss: 17.5689 0.1507 sec/batch
Epoch 1/10 Iteration: 3700 Avg. Training loss: 16.9709 0.1512 sec/batch
Epoch 1/10 Iteration: 3800 Avg. Training loss: 17.1936 0.1492 sec/batch
Epoch 1/10 Iteration: 3900 Avg. Training loss: 16.7252 0.1483 sec/batch
Epoch 1/10 Iteration: 4000 Avg. Training loss: 16.6275 0.1484 sec/batch
Nearest to known: instance, initially, whole, music, rise, notable, introduced, highly,
Nearest to after: another, man, below, possibly, office, leaving, economic, replaced,
Nearest to however: wide, you, individual, free, we, elected, active, divided,
Nearest to these: remains, efforts, laws, bring, mentioned, future, play, positions,
Nearest to a: high, important, adopted, as, those, own, or, highly,
Nearest to there: gained, entire, won, likely, w, ten, allowed, court,
Nearest to system: prominent, organization, remaining, miles, possibility, causes, queen, expected,
Nearest to most: latin, east, account, types, introduction, shows, contain, rock,
Nearest to versions: hook, tile, tiles, haarlem, nigger, predate, asiatic, pyramids,
Nearest to know: maybe, exceptional, mandatory, sums, tips, efficacy, inaugurated, sears,
Nearest to pressure: boiling, vapor, saturation, hubble, tre, bang, morgan, pi,
Nearest to cost: disciples, sworn, ecological, steer, madame, slot, scripts, marched,
Nearest to mean: annexation, kernel, almond, variance, almonds, propagated, devoid, ij,
Nearest to magazine: anointed, restrictive, mon, zeppelin, miners, distillation, guild, teamed,
Nearest to behind: skinner, founding, superstition, dignity, catch, honorary, behavior, gaming,
Nearest to governor: roaming, caput, hellenized, powerfully, shropshire, viability, ecowas, miki,
Epoch 1/10 Iteration: 4100 Avg. Training loss: 16.9160 0.1516 sec/batch
Epoch 1/10 Iteration: 4200 Avg. Training loss: 16.0120 0.1470 sec/batch
Epoch 1/10 Iteration: 4300 Avg. Training loss: 15.6134 0.1485 sec/batch
Epoch 1/10 Iteration: 4400 Avg. Training loss: 15.6046 0.1466 sec/batch
Epoch 1/10 Iteration: 4500 Avg. Training loss: 15.5675 0.1486 sec/batch
Epoch 1/10 Iteration: 4600 Avg. Training loss: 15.9050 0.1482 sec/batch
Epoch 2/10 Iteration: 4700 Avg. Training loss: 15.6969 0.1020 sec/batch
Epoch 2/10 Iteration: 4800 Avg. Training loss: 15.2623 0.1493 sec/batch
Epoch 2/10 Iteration: 4900 Avg. Training loss: 16.3304 0.1473 sec/batch
Epoch 2/10 Iteration: 5000 Avg. Training loss: 15.0107 0.1454 sec/batch
Nearest to known: instance, hours, differences, records, sites, phenomenon, rise, seeing,
Nearest to after: another, possibly, office, space, hands, below, man, section,
Nearest to however: succeeded, wide, hands, rome, active, to, equipment, divided,
Nearest to these: bring, impossible, efforts, argue, regular, developing, positions, laws,
Nearest to a: high, as, important, adopted, involving, or, schools, employed,
Nearest to there: made, gained, apparently, supposed, review, rapidly, picture, showed,
Nearest to system: prominent, miles, capable, paper, possibility, remaining, queen, causes,
Nearest to most: latin, account, east, contain, types, of, basic, be,
Nearest to versions: nova, receives, stream, explosion, economies, ibm, uniform, uniquely,
Nearest to know: bringing, piece, publications, maybe, joe, empty, exceptional, probable,
Nearest to pressure: boiling, gas, expansion, vapor, freely, morgan, pi, aged,
Nearest to cost: disciples, scattered, relates, exploit, bus, sworn, scripts, database,
Nearest to mean: participate, handle, inhabitants, statistics, descriptions, divide, unemployment, di,
Nearest to magazine: roads, van, ring, islamic, illustrated, stress, flourished, scheme,
Nearest to behind: founding, describing, behavior, skinner, anything, route, neutral, thanks,
Nearest to governor: warrior, simulation, shoot, math, phillip, appoint, verb, viability,
Epoch 2/10 Iteration: 5100 Avg. Training loss: 15.1408 0.1455 sec/batch
Epoch 2/10 Iteration: 5200 Avg. Training loss: 15.1743 0.1471 sec/batch
Epoch 2/10 Iteration: 5300 Avg. Training loss: 14.4042 0.1468 sec/batch
Epoch 2/10 Iteration: 5400 Avg. Training loss: 14.3229 0.1457 sec/batch
Epoch 2/10 Iteration: 5500 Avg. Training loss: 13.9615 0.1442 sec/batch
Epoch 2/10 Iteration: 5600 Avg. Training loss: 13.8479 0.1452 sec/batch
Epoch 2/10 Iteration: 5700 Avg. Training loss: 13.3202 0.1478 sec/batch
Epoch 2/10 Iteration: 5800 Avg. Training loss: 13.3227 0.1485 sec/batch
Epoch 2/10 Iteration: 5900 Avg. Training loss: 12.9006 0.1482 sec/batch
Epoch 2/10 Iteration: 6000 Avg. Training loss: 13.4509 0.1455 sec/batch
Nearest to known: instance, hours, phenomenon, initially, sites, less, rise, differences,
Nearest to after: another, interested, chemical, rapid, identity, hands, office, space,
Nearest to however: to, allies, wide, seeking, head, it, equipment, succeeded,
Nearest to these: also, bring, argue, efforts, interpreted, past, remains, representing,
Nearest to a: as, the, high, of, important, or, by, adopted,
Nearest to there: made, entire, of, national, supposed, apparently, no, gained,
Nearest to system: prominent, repeatedly, remaining, capable, organization, depicted, captain, taught,
Nearest to most: of, be, account, types, the, contain, east, reaching,
Nearest to versions: receives, uniform, plane, proof, stream, advance, nova, economies,
Nearest to know: empty, joe, maybe, exceptional, rolling, probable, stars, populated,
Nearest to pressure: gas, expansion, freely, choosing, boiling, fly, surrounded, revival,
Nearest to cost: scattered, reaction, identification, disciples, trip, christ, database, respective,
Nearest to mean: participate, divide, handle, descriptions, unemployment, reserved, directions, corruption,
Nearest to magazine: roads, variable, illustrated, van, ring, aims, organisation, pair,
Nearest to behind: bob, machines, presumably, texas, murder, landed, pushed, telling,
Nearest to governor: warrior, federation, block, mechanism, confusing, permits, symbolic, christians,
Epoch 2/10 Iteration: 6100 Avg. Training loss: 13.5078 0.1477 sec/batch
Epoch 2/10 Iteration: 6200 Avg. Training loss: 12.5025 0.1462 sec/batch
Epoch 2/10 Iteration: 6300 Avg. Training loss: 12.6681 0.1456 sec/batch
Epoch 2/10 Iteration: 6400 Avg. Training loss: 12.4807 0.1490 sec/batch
Epoch 2/10 Iteration: 6500 Avg. Training loss: 12.9303 0.1511 sec/batch
Epoch 2/10 Iteration: 6600 Avg. Training loss: 12.5207 0.1471 sec/batch
Epoch 2/10 Iteration: 6700 Avg. Training loss: 11.7224 0.1484 sec/batch
Epoch 2/10 Iteration: 6800 Avg. Training loss: 12.1221 0.1629 sec/batch
Epoch 2/10 Iteration: 6900 Avg. Training loss: 12.4104 0.1569 sec/batch
Epoch 2/10 Iteration: 7000 Avg. Training loss: 11.7765 0.1622 sec/batch
Nearest to known: the, instance, of, popularly, phenomenon, opinion, and, hours,
Nearest to after: another, and, the, to, were, was, interested, of,
Nearest to however: to, the, it, and, allies, that, by, can,
Nearest to these: also, argue, bring, efforts, but, interpreted, in, remains,
Nearest to a: the, of, as, in, and, or, by, to,
Nearest to there: the, of, made, a, apparently, national, by, to,
Nearest to system: the, first, prominent, that, remaining, of, with, captain,
Nearest to most: of, the, be, output, and, types, is, to,
Nearest to versions: receives, uniform, of, advance, explosion, portal, economies, zones,
Nearest to know: bringing, typically, for, joe, empty, contained, breaking, physics,
Nearest to pressure: expansion, gas, decided, tracks, proceed, ministers, significant, federal,
Nearest to cost: reaction, christ, scattered, respective, database, tested, cheap, relates,
Nearest to mean: be, divide, participate, unemployment, differences, occupy, dean, resolve,
Nearest to magazine: variable, aims, pull, determines, roads, alter, illustrated, gross,
Nearest to behind: soon, power, describing, murder, region, catch, stopped, pushed,
Nearest to governor: warrior, confusing, disagreement, permits, math, federation, simulation, foods,
Epoch 2/10 Iteration: 7100 Avg. Training loss: 12.3387 0.1585 sec/batch
Epoch 2/10 Iteration: 7200 Avg. Training loss: 11.5431 0.1595 sec/batch
Epoch 2/10 Iteration: 7300 Avg. Training loss: 11.0456 0.1545 sec/batch
Epoch 2/10 Iteration: 7400 Avg. Training loss: 11.4277 0.1556 sec/batch
Epoch 2/10 Iteration: 7500 Avg. Training loss: 11.4505 0.1544 sec/batch
Epoch 2/10 Iteration: 7600 Avg. Training loss: 10.9850 0.1559 sec/batch
Epoch 2/10 Iteration: 7700 Avg. Training loss: 10.8433 0.1563 sec/batch
Epoch 2/10 Iteration: 7800 Avg. Training loss: 11.1346 0.1587 sec/batch
Epoch 2/10 Iteration: 7900 Avg. Training loss: 10.2599 0.1588 sec/batch
Epoch 2/10 Iteration: 8000 Avg. Training loss: 10.3976 0.1592 sec/batch
Nearest to known: the, of, and, a, popularly, opinion, instance, to,
Nearest to after: and, another, the, of, to, was, one, were,
Nearest to however: to, the, and, it, of, that, in, by,
Nearest to these: also, in, bring, the, argue, of, but, efforts,
Nearest to a: the, of, in, and, as, to, for, is,
Nearest to there: of, the, a, to, and, by, as, in,
Nearest to system: the, of, first, in, with, that, and, a,
Nearest to most: the, of, be, and, to, in, is, by,
Nearest to versions: of, the, receives, queen, plane, legal, note, two,
Nearest to know: for, to, bringing, the, already, contained, typically, even,
Nearest to pressure: of, the, to, decided, with, it, for, and,
Nearest to cost: readily, of, respective, zero, cheap, scattered, reaction, expensive,
Nearest to mean: be, to, of, and, the, a, one, differences,
Nearest to magazine: such, freely, the, of, a, aims, also, ways,
Nearest to behind: are, soon, power, region, describing, to, nearly, average,
Nearest to governor: warrior, determine, reason, internationally, governing, countries, indicates, federation,
Epoch 2/10 Iteration: 8100 Avg. Training loss: 10.4807 0.1626 sec/batch
Epoch 2/10 Iteration: 8200 Avg. Training loss: 11.1478 0.1577 sec/batch
Epoch 2/10 Iteration: 8300 Avg. Training loss: 10.0715 0.1565 sec/batch
Epoch 2/10 Iteration: 8400 Avg. Training loss: 10.9091 0.1583 sec/batch
Epoch 2/10 Iteration: 8500 Avg. Training loss: 10.8919 0.1585 sec/batch
Epoch 2/10 Iteration: 8600 Avg. Training loss: 9.7092 0.1563 sec/batch
Epoch 2/10 Iteration: 8700 Avg. Training loss: 10.8900 0.1622 sec/batch
Epoch 2/10 Iteration: 8800 Avg. Training loss: 10.2301 0.1634 sec/batch
Epoch 2/10 Iteration: 8900 Avg. Training loss: 9.6703 0.1609 sec/batch
Epoch 2/10 Iteration: 9000 Avg. Training loss: 9.6427 0.1611 sec/batch
Nearest to known: the, popularly, of, a, opinion, and, as, to,
Nearest to after: and, was, another, of, the, to, one, elections,
Nearest to however: to, the, it, of, and, that, allies, by,
Nearest to these: also, in, the, argue, of, a, but, bring,
Nearest to a: of, the, in, to, as, and, by, is,
Nearest to there: of, the, a, to, in, as, by, have,
Nearest to system: the, of, expand, with, prominent, that, first, in,
Nearest to most: of, the, be, to, is, in, and, a,
Nearest to versions: of, the, a, legal, plane, receives, other, queen,
Nearest to know: for, to, already, bringing, even, are, is, the,
Nearest to pressure: of, the, to, it, with, significant, billion, for,
Nearest to cost: of, deemed, openly, readily, zero, in, racial, touch,
Nearest to mean: be, to, of, and, a, the, is, as,
Nearest to magazine: the, freely, such, of, a, gross, to, and,
Nearest to behind: to, are, soon, poverty, power, promising, in, describing,
Nearest to governor: eight, warrior, two, governing, determine, elections, publishing, countries,
Epoch 2/10 Iteration: 9100 Avg. Training loss: 9.7150 0.1602 sec/batch
Epoch 2/10 Iteration: 9200 Avg. Training loss: 9.8139 0.1590 sec/batch
Epoch 3/10 Iteration: 9300 Avg. Training loss: 9.7099 0.0715 sec/batch
Epoch 3/10 Iteration: 9400 Avg. Training loss: 9.4727 0.1644 sec/batch
Epoch 3/10 Iteration: 9500 Avg. Training loss: 9.8487 0.1584 sec/batch
Epoch 3/10 Iteration: 9600 Avg. Training loss: 9.7990 0.1620 sec/batch
Epoch 3/10 Iteration: 9700 Avg. Training loss: 9.3782 0.1547 sec/batch
Epoch 3/10 Iteration: 9800 Avg. Training loss: 9.8312 0.1626 sec/batch
Epoch 3/10 Iteration: 9900 Avg. Training loss: 9.0418 0.1549 sec/batch
Epoch 3/10 Iteration: 10000 Avg. Training loss: 8.8901 0.1588 sec/batch
Nearest to known: popularly, opinion, of, conclude, the, transmission, hours, a,
Nearest to after: was, another, elections, were, enhanced, one, to, somewhere,
Nearest to however: to, it, allies, seldom, that, the, of, and,
Nearest to these: also, tremendous, argue, bring, in, a, the, efforts,
Nearest to a: of, in, to, as, the, it, and, is,
Nearest to there: of, a, the, to, have, antiquity, apparently, is,
Nearest to system: expand, of, the, authorized, prominent, screen, with, location,
Nearest to most: the, evolve, of, be, continental, is, to, happened,
Nearest to versions: of, receives, the, legal, plane, a, zones, discontinued,
Nearest to know: to, for, not, we, bringing, even, is, already,
Nearest to pressure: of, the, with, it, to, significant, billion, revival,
Nearest to cost: racial, of, expensive, motivated, absolute, respective, deemed, grants,
Nearest to mean: be, myself, to, of, unemployment, is, corruption, as,
Nearest to magazine: gross, freely, expand, pull, such, the, organised, seek,
Nearest to behind: to, promising, are, soon, attractive, poverty, seldom, bottom,
Nearest to governor: elections, governing, eight, two, warrior, thirteen, countries, determine,
Epoch 3/10 Iteration: 10100 Avg. Training loss: 8.8866 0.1553 sec/batch
Epoch 3/10 Iteration: 10200 Avg. Training loss: 9.0175 0.1554 sec/batch
Epoch 3/10 Iteration: 10300 Avg. Training loss: 8.7071 0.1581 sec/batch
Epoch 3/10 Iteration: 10400 Avg. Training loss: 8.2922 0.1600 sec/batch
Epoch 3/10 Iteration: 10500 Avg. Training loss: 8.5094 0.1592 sec/batch
Epoch 3/10 Iteration: 10600 Avg. Training loss: 8.3139 0.1576 sec/batch
Epoch 3/10 Iteration: 10700 Avg. Training loss: 9.1795 0.1689 sec/batch
Epoch 3/10 Iteration: 10800 Avg. Training loss: 8.5956 0.1584 sec/batch
Epoch 3/10 Iteration: 10900 Avg. Training loss: 8.0448 0.1603 sec/batch
Epoch 3/10 Iteration: 11000 Avg. Training loss: 8.2951 0.1576 sec/batch
Nearest to known: popularly, conclude, opinion, misleading, transmission, everyone, viii, antiquity,
Nearest to after: another, enhanced, successes, seldom, elections, was, somewhere, firmly,
Nearest to however: to, prestige, seldom, allies, campus, modifications, it, struggled,
Nearest to these: tremendous, satisfaction, illustrate, entities, bring, argue, acceptable, farther,
Nearest to a: reasonably, accessible, subtle, inaccurate, sects, solved, measurements, transmitted,
Nearest to there: stretch, reminiscent, zones, antiquity, grants, neighboring, wouldn, progressed,
Nearest to system: expand, judiciary, authorized, managing, discuss, priority, screen, receives,
Nearest to most: evolve, govern, expulsion, continental, racial, universally, grants, orthodoxy,
Nearest to versions: receives, of, zones, interpretations, plane, uniform, discontinued, cultures,
Nearest to know: exceptional, maybe, empty, balanced, capabilities, distinguishes, labeled, for,
Nearest to pressure: proceed, with, billion, backing, significant, depended, of, tracks,
Nearest to cost: supremacy, racial, instances, expensive, informally, motivated, respective, deemed,
Nearest to mean: myself, be, divide, desirable, corruption, suspect, differing, unemployment,
Nearest to magazine: gross, talents, freely, aims, flourished, broadcasts, struggled, organised,
Nearest to behind: promising, attractive, minds, seldom, soon, missing, consume, classics,
Nearest to governor: governing, elections, legislative, appointed, warrior, thirteen, eight, disagreement,
Epoch 3/10 Iteration: 11100 Avg. Training loss: 8.5049 0.1653 sec/batch
Epoch 3/10 Iteration: 11200 Avg. Training loss: 8.7357 0.1633 sec/batch
Epoch 3/10 Iteration: 11300 Avg. Training loss: 7.8602 0.1665 sec/batch
Epoch 3/10 Iteration: 11400 Avg. Training loss: 7.9983 0.1543 sec/batch
Epoch 3/10 Iteration: 11500 Avg. Training loss: 8.1702 0.1568 sec/batch
Epoch 3/10 Iteration: 11600 Avg. Training loss: 7.9275 0.1587 sec/batch
Epoch 3/10 Iteration: 11700 Avg. Training loss: 7.9733 0.1560 sec/batch
Epoch 3/10 Iteration: 11800 Avg. Training loss: 8.2327 0.1590 sec/batch
Epoch 3/10 Iteration: 11900 Avg. Training loss: 7.4341 0.1579 sec/batch
Epoch 3/10 Iteration: 12000 Avg. Training loss: 7.8394 0.1576 sec/batch
Nearest to known: popularly, conclude, suggestion, commissioner, transmission, recognise, empirical, opinion,
Nearest to after: another, devastated, firmly, disrupted, oneself, session, dominate, controversies,
Nearest to however: prestige, detect, modifications, struggled, seldom, to, campus, jews,
Nearest to these: satisfaction, courses, illustrate, suspect, empirical, entities, shoot, verse,
Nearest to a: experimentation, sects, interact, reasonably, intuitive, platforms, inaccurate, commanding,
Nearest to there: reminiscent, antiquity, stretch, happiness, zones, individually, minds, payment,
Nearest to system: managing, coordinate, summarized, judiciary, elementary, expand, graphics, authorized,
Nearest to most: evolve, expulsion, flourished, foreigners, orthodoxy, conform, govern, universally,
Nearest to versions: vendors, inaccurate, recognizes, zones, receives, histories, cultures, discontinued,
Nearest to know: maybe, exceptional, empty, violated, distinguishes, we, recall, imagine,
Nearest to pressure: depended, proceed, sectors, negligible, tracks, regulate, detect, intent,
Nearest to cost: supremacy, instances, distribute, racial, informally, exploit, tribal, costs,
Nearest to mean: myself, be, suspect, divide, desirable, boost, corruption, locate,
Nearest to magazine: gross, flourished, talents, licensing, aims, broadcasts, freely, overlooked,
Nearest to behind: reconcile, attractive, promising, trips, dignity, deprived, seldom, minds,
Nearest to governor: elections, governing, appointed, legislative, suffrage, warrior, voted, assassination,
Epoch 3/10 Iteration: 12100 Avg. Training loss: 8.1086 0.1581 sec/batch
Epoch 3/10 Iteration: 12200 Avg. Training loss: 7.4738 0.1562 sec/batch
Epoch 3/10 Iteration: 12300 Avg. Training loss: 7.6671 0.1561 sec/batch
Epoch 3/10 Iteration: 12400 Avg. Training loss: 7.6531 0.1589 sec/batch
Epoch 3/10 Iteration: 12500 Avg. Training loss: 7.4769 0.1601 sec/batch
Epoch 3/10 Iteration: 12600 Avg. Training loss: 7.1155 0.1710 sec/batch
Epoch 3/10 Iteration: 12700 Avg. Training loss: 7.3641 0.1555 sec/batch
Epoch 3/10 Iteration: 12800 Avg. Training loss: 7.8831 0.1394 sec/batch
Epoch 3/10 Iteration: 12900 Avg. Training loss: 7.0139 0.1388 sec/batch
Epoch 3/10 Iteration: 13000 Avg. Training loss: 7.5400 0.1415 sec/batch
Nearest to known: popularly, conclude, commissioner, suggestion, transmission, recognise, flourishing, verse,
Nearest to after: aristocratic, another, devastated, statute, ceremony, bob, stuck, was,
Nearest to however: prestige, adjusted, jews, detect, implied, forcibly, visually, subordinate,
Nearest to these: satisfaction, courses, empirical, suspect, illustrate, sheer, verse, trusted,
Nearest to a: experimentation, intuitive, reasonably, hierarchy, interact, manipulation, platforms, attach,
Nearest to there: observable, happiness, believers, reminiscent, entails, jurisdiction, stretch, antiquity,
Nearest to system: systems, convey, coordinate, summarized, cellular, bus, overlapping, graphics,
Nearest to most: evolve, expulsion, negligible, foreigners, nomadic, clarify, orthodoxy, investors,
Nearest to versions: generates, vendors, inaccurate, interfaces, histories, recognizes, zones, corner,
Nearest to know: exceptional, distinguishes, maybe, we, recall, empty, violated, imagine,
Nearest to pressure: depended, regulate, attracts, intent, compute, negligible, proceed, significant,
Nearest to cost: supremacy, expensive, exploit, delays, informally, upgrade, instances, distribute,
Nearest to mean: be, myself, appreciated, locate, suspect, sexuality, contradiction, differences,
Nearest to magazine: gross, restrictive, talents, flourished, licensing, broadcasts, overlooked, freely,
Nearest to behind: dignity, reconcile, deprived, ears, complications, attractive, promising, trips,
Nearest to governor: appointed, assassination, legislative, elections, appoint, suffrage, emmanuel, unicameral,
Epoch 3/10 Iteration: 13100 Avg. Training loss: 8.2283 0.1543 sec/batch
Epoch 3/10 Iteration: 13200 Avg. Training loss: 6.9819 0.1514 sec/batch
Epoch 3/10 Iteration: 13300 Avg. Training loss: 7.7581 0.1472 sec/batch
Epoch 3/10 Iteration: 13400 Avg. Training loss: 7.4298 0.1522 sec/batch
Epoch 3/10 Iteration: 13500 Avg. Training loss: 6.9998 0.1477 sec/batch
Epoch 3/10 Iteration: 13600 Avg. Training loss: 7.1100 0.1504 sec/batch
Epoch 3/10 Iteration: 13700 Avg. Training loss: 7.0449 0.1487 sec/batch
Epoch 3/10 Iteration: 13800 Avg. Training loss: 7.1391 0.1488 sec/batch
Epoch 4/10 Iteration: 13900 Avg. Training loss: 7.2902 0.0161 sec/batch
Epoch 4/10 Iteration: 14000 Avg. Training loss: 7.2131 0.1522 sec/batch
Nearest to known: popularly, flourishing, strained, norse, suggestion, tragic, guides, conclude,
Nearest to after: aristocratic, devastated, disbanded, dismissal, dominate, ellen, was, factions,
Nearest to however: visually, prestige, metric, implied, outsiders, confuse, detect, forcibly,
Nearest to these: satisfaction, foundational, outcomes, empirical, trusted, courses, specifies, instrumentation,
Nearest to a: experimentation, intuitive, attach, transforming, tolerated, hierarchy, neighbourhood, manipulation,
Nearest to there: observable, entails, behaviors, jurisdiction, reminiscent, happiness, checks, no,
Nearest to system: systems, operating, cellular, dragon, employers, correlation, summarized, applies,
Nearest to most: evolve, negligible, employers, clarify, foreigners, midwest, investors, nomadic,
Nearest to versions: ibm, compatible, directories, utilities, interfaces, vendors, generates, unix,
Nearest to know: distinguishes, we, maybe, you, anybody, exceptional, empty, imagine,
Nearest to pressure: depended, attracts, adjust, compute, significant, predict, regulate, symmetry,
Nearest to cost: expensive, costs, upgrade, delays, melodies, exploit, standardization, slot,
Nearest to mean: geometric, be, contradiction, locate, x, multiplication, differences, profoundly,
Nearest to magazine: restrictive, gross, talents, licensing, stature, murderer, interviews, rid,
Nearest to behind: dignity, reconcile, complications, undermined, delegation, progression, amidst, notoriety,
Nearest to governor: appointed, legislative, unicameral, elections, bicameral, peacefully, elected, emmanuel,
Epoch 4/10 Iteration: 14100 Avg. Training loss: 7.0769 0.1511 sec/batch
Epoch 4/10 Iteration: 14200 Avg. Training loss: 7.3763 0.1486 sec/batch
Epoch 4/10 Iteration: 14300 Avg. Training loss: 6.8873 0.1494 sec/batch
Epoch 4/10 Iteration: 14400 Avg. Training loss: 7.4647 0.1499 sec/batch
Epoch 4/10 Iteration: 14500 Avg. Training loss: 7.0009 0.1531 sec/batch
Epoch 4/10 Iteration: 14600 Avg. Training loss: 6.8213 0.1556 sec/batch
Epoch 4/10 Iteration: 14700 Avg. Training loss: 6.8047 0.1481 sec/batch
Epoch 4/10 Iteration: 14800 Avg. Training loss: 6.9463 0.1379 sec/batch
Epoch 4/10 Iteration: 14900 Avg. Training loss: 6.8338 0.1434 sec/batch






    



---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-99-b1d3869cd0d6> in <module>()
     18             feed = {inputs: x,
     19                     labels: np.array(y)[:, None]}
---> 20             train_loss, _ = sess.run([cost, optimizer], feed_dict=feed)
     21 
     22             loss += train_loss

/home/luo/anaconda2/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    765     try:
    766       result = self._run(None, fetches, feed_dict, options_ptr,
--> 767                          run_metadata_ptr)
    768       if run_metadata:
    769         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/home/luo/anaconda2/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
    963     if final_fetches or final_targets:
    964       results = self._do_run(handle, final_targets, final_fetches,
--> 965                              feed_dict_string, options, run_metadata)
    966     else:
    967       results = []

/home/luo/anaconda2/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1013     if handle is None:
   1014       return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
-> 1015                            target_list, options, run_metadata)
   1016     else:
   1017       return self._do_call(_prun_fn, self._session, handle, feed_dict,

/home/luo/anaconda2/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1020   def _do_call(self, fn, *args):
   1021     try:
-> 1022       return fn(*args)
   1023     except errors.OpError as e:
   1024       message = compat.as_text(e.message)

/home/luo/anaconda2/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1002         return tf_session.TF_Run(session, options,
   1003                                  feed_dict, fetch_list, target_list,
-> 1004                                  status, run_metadata)
   1005 
   1006     def _prun_fn(session, handle, feed_dict, fetch_list):

KeyboardInterrupt:

Restore the trained network if you need to:



In [100]:

    
with train_graph.as_default():
    saver = tf.train.Saver()

with tf.Session(graph=train_graph) as sess:
    saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
    embed_mat = sess.run(embedding)









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: expected bytes, NoneType found

The above exception was the direct cause of the following exception:

SystemError                               Traceback (most recent call last)
<ipython-input-100-4eeca6a379f0> in <module>()
      3 
      4 with tf.Session(graph=train_graph) as sess:
----> 5     saver.restore(sess, tf.train.latest_checkpoint('checkpoints'))
      6     embed_mat = sess.run(embedding)

/home/luo/anaconda2/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/training/saver.py in restore(self, sess, save_path)
   1426       return
   1427     sess.run(self.saver_def.restore_op_name,
-> 1428              {self.saver_def.filename_tensor_name: save_path})
   1429 
   1430   @staticmethod

/home/luo/anaconda2/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    765     try:
    766       result = self._run(None, fetches, feed_dict, options_ptr,
--> 767                          run_metadata_ptr)
    768       if run_metadata:
    769         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/home/luo/anaconda2/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
    963     if final_fetches or final_targets:
    964       results = self._do_run(handle, final_targets, final_fetches,
--> 965                              feed_dict_string, options, run_metadata)
    966     else:
    967       results = []

/home/luo/anaconda2/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1013     if handle is None:
   1014       return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
-> 1015                            target_list, options, run_metadata)
   1016     else:
   1017       return self._do_call(_prun_fn, self._session, handle, feed_dict,

/home/luo/anaconda2/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1020   def _do_call(self, fn, *args):
   1021     try:
-> 1022       return fn(*args)
   1023     except errors.OpError as e:
   1024       message = compat.as_text(e.message)

/home/luo/anaconda2/envs/dlnd/lib/python3.6/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1002         return tf_session.TF_Run(session, options,
   1003                                  feed_dict, fetch_list, target_list,
-> 1004                                  status, run_metadata)
   1005 
   1006     def _prun_fn(session, handle, feed_dict, fetch_list):

SystemError: <built-in function TF_Run> returned a result with an error set

Visualizing the word vectors

Below we'll use T-SNE to visualize how our high-dimensional word vectors cluster together. T-SNE is used to project these vectors into two dimensions while preserving local stucture. Check out this post from Christopher Olah to learn more about T-SNE and other ways to visualize high-dimensional data.



In [ ]:

    
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE



In [ ]:

    
viz_words = 500
tsne = TSNE()
embed_tsne = tsne.fit_transform(embed_mat[:viz_words, :])



In [ ]:

    
fig, ax = plt.subplots(figsize=(14, 14))
for idx in range(viz_words):
    plt.scatter(*embed_tsne[idx, :], color='steelblue')
    plt.annotate(int_to_vocab[idx], (embed_tsne[idx, 0], embed_tsne[idx, 1]), alpha=0.7)