Deep Learning

Assignment 5

The goal of this assignment is to train a Word2Vec skip-gram model over Text8 data.



In [1]:

    
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
%matplotlib inline
from __future__ import print_function
import collections
import math
import numpy as np
import os
import random
import tensorflow as tf
import zipfile
from matplotlib import pylab
from six.moves import range
from six.moves.urllib.request import urlretrieve
from sklearn.manifold import TSNE

Download the data from the source website if necessary.



In [2]:

    
url = 'http://mattmahoney.net/dc/'

def maybe_download(filename, expected_bytes):
  """Download a file if not present, and make sure it's the right size."""
  if not os.path.exists(filename):
    filename, _ = urlretrieve(url + filename, filename)
  statinfo = os.stat(filename)
  if statinfo.st_size == expected_bytes:
    print('Found and verified %s' % filename)
  else:
    print(statinfo.st_size)
    raise Exception(
      'Failed to verify ' + filename + '. Can you get to it with a browser?')
  return filename

filename = maybe_download('text8.zip', 31344016)

Found and verified text8.zip

Read the data into a string.



In [3]:

    
def read_data(filename):
  """Extract the first file enclosed in a zip file as a list of words"""
  with zipfile.ZipFile(filename) as f:
    data = tf.compat.as_str(f.read(f.namelist()[0])).split()
  return data
  
words = read_data(filename)
print('Data size %d' % len(words))

Data size 17005207

Build the dictionary and replace rare words with UNK token.



In [4]:

    
words[0:20]

    Out[4]:

['anarchism',
 'originated',
 'as',
 'a',
 'term',
 'of',
 'abuse',
 'first',
 'used',
 'against',
 'early',
 'working',
 'class',
 'radicals',
 'including',
 'the',
 'diggers',
 'of',
 'the',
 'english']



In [5]:

    
vocabulary_size = 50000

def build_dataset(words):
  count = [['UNK', -1]]
  count.extend(collections.Counter(words).most_common(vocabulary_size - 1))
  dictionary = dict()
  for word, _ in count:
    dictionary[word] = len(dictionary)
  data = list()
  unk_count = 0
  for word in words:
    if word in dictionary:
      index = dictionary[word]
    else:
      index = 0  # dictionary['UNK']
      unk_count = unk_count + 1
    data.append(index)
  count[0][1] = unk_count
  reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys())) 
  return data, count, dictionary, reverse_dictionary

data, count, dictionary, reverse_dictionary = build_dataset(words)
print('Most common words (+UNK)', count[:5])
print('Sample data', data[:10])
del words  # Hint to reduce memory.

Most common words (+UNK) [['UNK', 418391], ('the', 1061396), ('of', 593677), ('and', 416629), ('one', 411764)]
Sample data [5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156]

Function to generate a training batch for the skip-gram model.



In [6]:

    
data_index = 0

def generate_batch(batch_size, num_skips, skip_window):
  global data_index
  assert batch_size % num_skips == 0
  assert num_skips <= 2 * skip_window
  batch = np.ndarray(shape=(batch_size), dtype=np.int32)
  labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
  span = 2 * skip_window + 1 # [ skip_window target skip_window ]
  # Buffer generated from double-ended queue with fast add/remove
  buffer = collections.deque(maxlen=span)
  for _ in range(span):
    buffer.append(data[data_index])
    data_index = (data_index + 1) % len(data)
  for i in range(batch_size // num_skips):
    target = skip_window  # target label at the center of the buffer
    targets_to_avoid = [ skip_window ]
    for j in range(num_skips):
      while target in targets_to_avoid:
        target = random.randint(0, span - 1)
      targets_to_avoid.append(target)
      batch[i * num_skips + j] = buffer[skip_window]
      labels[i * num_skips + j, 0] = buffer[target]
    buffer.append(data[data_index])
    data_index = (data_index + 1) % len(data)
  return batch, labels

print('data:', [reverse_dictionary[di] for di in data[:8]])

for num_skips, skip_window in [(2, 1), (4, 2)]:
    data_index = 0
    batch, labels = generate_batch(batch_size=8, num_skips=num_skips, skip_window=skip_window)
    print('\nwith num_skips = %d and skip_window = %d:' % (num_skips, skip_window))
    print('    batch:', [reverse_dictionary[bi] for bi in batch])
    print('    labels:', [reverse_dictionary[li] for li in labels.reshape(8)])


data: ['anarchism', 'originated', 'as', 'a', 'term', 'of', 'abuse', 'first']

with num_skips = 2 and skip_window = 1:
    batch: ['originated', 'originated', 'as', 'as', 'a', 'a', 'term', 'term']
    labels: ['as', 'anarchism', 'a', 'originated', 'term', 'as', 'a', 'of']

with num_skips = 4 and skip_window = 2:
    batch: ['as', 'as', 'as', 'as', 'a', 'a', 'a', 'a']
    labels: ['a', 'anarchism', 'term', 'originated', 'of', 'term', 'originated', 'as']
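
To see which (input, label) pairs the sampler above can draw from, it may help to list them exhaustively. This is a sketch under assumptions: `skipgram_pairs` is a hypothetical helper, and unlike `generate_batch` it enumerates every pair in the window (including words at the edges of the text) rather than sampling `num_skips` of them at random.

```python
def skipgram_pairs(tokens, skip_window):
    """Return every (center, context) pair within skip_window positions."""
    pairs = []
    for center in range(len(tokens)):
        lo = max(0, center - skip_window)
        hi = min(len(tokens), center + skip_window + 1)
        for ctx in range(lo, hi):
            if ctx != center:  # a word is never its own context
                pairs.append((tokens[center], tokens[ctx]))
    return pairs

tokens = ['anarchism', 'originated', 'as', 'a']
pairs = skipgram_pairs(tokens, skip_window=1)
for center, context in pairs:
    print(center, '->', context)
```

With skip_window = 1 each interior word yields two pairs, which matches num_skips = 2 in the first printout above.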

Train a skip-gram model.



In [7]:

    
# vocabulary is size 50000
batch_size = 128
embedding_size = 128 # Dimension of the embedding vector.
skip_window = 1 # How many words to consider left and right.
num_skips = 2 # How many times to reuse an input to generate a label.
# We pick a random validation set to sample nearest neighbors. Here we limit the
# validation samples to the words that have a low numeric ID, which by
# construction are also the most frequent.
valid_size = 16 # Random set of words to evaluate similarity on.
valid_window = 100 # Only pick dev samples in the head of the distribution.
valid_examples = np.array(random.sample(range(valid_window), valid_size))
num_sampled = 64 # Number of negative examples to sample.

graph = tf.Graph()

with graph.as_default(), tf.device('/cpu:0'):

  # Input data.
  train_dataset = tf.placeholder(tf.int32, shape=[batch_size])
  train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
  valid_dataset = tf.constant(valid_examples, dtype=tf.int32)
  
  # Variables.
  embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
  softmax_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                         stddev=1.0 / math.sqrt(embedding_size)))
  softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))
  
  # Model.
  # Look up embeddings for inputs.
  embed = tf.nn.embedding_lookup(embeddings, train_dataset)
  # Compute the softmax loss, using a sample of the negative labels each time.
  loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(weights=softmax_weights, biases=softmax_biases, inputs=embed,
                               labels=train_labels, num_sampled=num_sampled, num_classes=vocabulary_size))

  # Optimizer.
  # Note: The optimizer will optimize the softmax_weights AND the embeddings.
  # This is because the embeddings are defined as a variable quantity and the
  # optimizer's `minimize` method will by default modify all variable quantities 
  # that contribute to the tensor it is passed.
  # See docs on `tf.train.Optimizer.minimize()` for more details.
  optimizer = tf.train.AdagradOptimizer(1.0).minimize(loss)
  
  # Compute the similarity between the validation examples and all embeddings.
  # We use cosine similarity (the dot product of L2-normalized vectors):
  norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
  normalized_embeddings = embeddings / norm
  valid_embeddings = tf.nn.embedding_lookup(
    normalized_embeddings, valid_dataset)
  similarity = tf.matmul(valid_embeddings, tf.transpose(normalized_embeddings))
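
The similarity computation at the end of the graph can be reproduced in NumPy on a small random matrix. This is an illustrative sketch only: the matrix size, the seed, and the names `toy_embeddings` and `valid_ids` are assumptions, not values from the notebook.

```python
import numpy as np

rng = np.random.RandomState(0)
toy_embeddings = rng.uniform(-1.0, 1.0, size=(10, 4))  # 10 words, 4 dimensions

# Divide each row by its L2 norm so every embedding has unit length.
norm = np.sqrt(np.sum(np.square(toy_embeddings), axis=1, keepdims=True))
normalized = toy_embeddings / norm

# For unit vectors the dot product is the cosine similarity.
valid_ids = np.array([0, 3])
similarity = np.dot(normalized[valid_ids], normalized.T)  # shape (2, 10)
print(similarity.shape)
```

Each row of `similarity` scores one validation word against the whole vocabulary, and the entry for the word itself is 1 up to rounding.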



In [8]:

    
num_steps = 100001

with tf.Session(graph=graph) as session:
  tf.global_variables_initializer().run()
  print('Initialized')
  average_loss = 0
  for step in range(num_steps):
    batch_data, batch_labels = generate_batch(
      batch_size, num_skips, skip_window)
    feed_dict = {train_dataset : batch_data, train_labels : batch_labels}
    _, l = session.run([optimizer, loss], feed_dict=feed_dict)
    average_loss += l
    if step % 2000 == 0:
      if step > 0:
        average_loss = average_loss / 2000
      # The average loss is an estimate of the loss over the last 2000 batches.
      print('Average loss at step %d: %f' % (step, average_loss))
      average_loss = 0
    # note that this is expensive (~20% slowdown if computed every 500 steps)
    if step % 10000 == 0:
      sim = similarity.eval()
      for i in range(valid_size):
        valid_word = reverse_dictionary[valid_examples[i]]
        top_k = 8 # number of nearest neighbors
        nearest = (-sim[i, :]).argsort()[1:top_k+1]
        log = 'Nearest to %s:' % valid_word
        for k in range(top_k):
          close_word = reverse_dictionary[nearest[k]]
          log = '%s %s,' % (log, close_word)
        print(log)
  final_embeddings = normalized_embeddings.eval()


Initialized
Average loss at step 0: 8.005444
Nearest to states: hydrazine, glycolysis, preprocessing, loathed, webpages, chestnuts, leith, stirring,
Nearest to the: gunn, sunburn, robustness, virginal, retinue, keratin, notebooks, shaved,
Nearest to a: koreans, go, huffman, specifier, accompany, updating, interwoven, dulles,
Nearest to his: redefinition, windsor, crowded, proxima, enciclopedia, guitar, olin, mandy,
Nearest to no: advaita, trillion, deployment, substratum, fic, furthermore, doppelbock, happens,
Nearest to first: outgrowth, huis, translations, hawking, trna, alicante, minos, beltaine,
Nearest to world: domed, lanka, ives, threaten, refuses, knitted, angevin, putsch,
Nearest to years: nicolaus, franke, exhibition, salient, mcclintock, pernambuco, morin, warplanes,
Nearest to had: manifest, cavities, vulgar, recycle, sepsis, tropes, mapuche, supportive,
Nearest to three: adonis, test, legalism, prima, gutenberg, hydroxides, sad, kzinti,
Nearest to most: derivates, bounding, amplifying, cn, graduation, gomorrah, nippon, taliesin,
Nearest to two: microbial, freed, undefined, courage, merely, scharnhorst, skeleton, untimely,
Nearest to more: prairie, standalone, slapping, favorably, heijenoort, gaiman, weapon, daggers,
Nearest to all: sterility, watchdog, revered, ecology, ehud, wasted, novelty, dining,
Nearest to called: sailor, uucp, theologically, iho, kut, norwegians, rotunda, lances,
Nearest to would: programmed, zolt, arithmetic, curtis, avtovaz, edwards, kiwi, vigil,
Average loss at step 2000: 4.372715
Average loss at step 4000: 3.863384
Average loss at step 6000: 3.794621
Average loss at step 8000: 3.683262
Average loss at step 10000: 3.616962
Nearest to states: hydrazine, orson, glycolysis, cars, bull, salute, chestnuts, ichij,
Nearest to the: its, a, his, this, an, dehydrogenase, heliopause, each,
Nearest to a: the, progressions, eased, contiguous, this, his, equidistant, refutable,
Nearest to his: their, its, her, s, the, crowded, this, a,
Nearest to no: idi, geological, lineages, fic, happens, phobia, amhr, poorly,
Nearest to first: longshanks, solaris, hardwicke, conlang, pseudorandom, trna, unleashing, exploitation,
Nearest to world: ives, ranging, syllogisms, rewrite, imminent, csf, algae, threaten,
Nearest to years: chaplain, nicolaus, exhibition, kingman, favored, pernambuco, keys, salient,
Nearest to had: has, have, was, glockenspiel, ripe, vulgar, supportive, into,
Nearest to three: five, four, eight, seven, six, two, zero, nine,
Nearest to most: more, steric, derivates, initiative, uncles, ski, sizeable, intends,
Nearest to two: five, three, four, six, eight, nine, seven, one,
Nearest to more: most, whether, prairie, weakening, hams, incisors, weapon, favorably,
Nearest to all: countenance, usn, dining, mcelroy, ehud, proline, iron, joannes,
Nearest to called: sailor, uucp, kut, databases, syllabics, freie, theologically, iho,
Nearest to would: can, will, may, arithmetic, curtis, should, levee, swine,
Average loss at step 12000: 3.607985
Average loss at step 14000: 3.571040
Average loss at step 16000: 3.408321
Average loss at step 18000: 3.456718
Average loss at step 20000: 3.547160
Nearest to states: orson, glycolysis, hydrazine, salute, knesset, bering, motorsport, nations,
Nearest to the: its, their, any, this, an, vorder, humber, his,
Nearest to a: progressions, this, very, no, korah, carefree, outcrops, the,
Nearest to his: their, her, its, the, my, separations, decimal, supposition,
Nearest to no: any, launched, idi, a, syncopated, sketching, amhr, australasian,
Nearest to first: last, hardwicke, exploitation, rammstein, conlang, solaris, maximally, same,
Nearest to world: ives, ranging, rewrite, imminent, syllogisms, eugene, putsch, pluriform,
Nearest to years: apeiron, chaplain, favored, kingman, hewlett, keys, days, tortures,
Nearest to had: has, have, was, were, glockenspiel, henrietta, ripe, supportive,
Nearest to three: four, six, five, two, seven, eight, zero, nine,
Nearest to most: more, steric, use, nextstep, sizeable, some, catchy, derivates,
Nearest to two: three, six, four, five, seven, one, eight, zero,
Nearest to more: most, very, playback, less, hams, weapon, stationed, weakening,
Nearest to all: many, some, these, usn, mcelroy, droughts, glottalized, sake,
Nearest to called: sailor, kut, uucp, iho, transports, syllabics, rebecca, jagiello,
Nearest to would: can, will, could, should, may, must, to, angelina,
Average loss at step 22000: 3.501884
Average loss at step 24000: 3.489150
Average loss at step 26000: 3.481511
Average loss at step 28000: 3.481227
Average loss at step 30000: 3.504071
Nearest to states: nations, orson, state, knesset, glycolysis, bering, glorifying, salute,
Nearest to the: their, its, his, jannaeus, some, a, homeowners, any,
Nearest to a: any, the, korah, this, progressions, frankenstein, akita, zar,
Nearest to his: her, their, its, the, my, s, our, separations,
Nearest to no: any, there, another, scrutiny, kappa, recursive, syncopated, launched,
Nearest to first: last, second, next, exploitation, dictum, present, faq, rammstein,
Nearest to world: rewrite, ranging, ives, syllogisms, imminent, depressant, vaults, eugene,
Nearest to years: days, apeiron, times, favored, chaplain, keys, kingman, hewlett,
Nearest to had: has, have, was, were, glockenspiel, is, having, ripe,
Nearest to three: four, five, seven, six, two, eight, nine, zero,
Nearest to most: more, some, use, among, initiative, many, steric, legislators,
Nearest to two: three, four, one, seven, six, five, eight, zero,
Nearest to more: most, less, very, playback, growing, weapon, combined, rather,
Nearest to all: some, these, many, droughts, both, foo, any, ime,
Nearest to called: transports, jagiello, undertones, kut, syllabics, rebecca, sailor, uucp,
Nearest to would: can, will, could, may, should, must, might, to,
Average loss at step 32000: 3.497722
Average loss at step 34000: 3.494118
Average loss at step 36000: 3.455470
Average loss at step 38000: 3.301874
Average loss at step 40000: 3.433499
Nearest to states: nations, state, knesset, glorifying, glycolysis, kingdom, bering, orson,
Nearest to the: this, their, any, its, his, a, some, each,
Nearest to a: another, the, any, progressions, rounders, puppy, no, this,
Nearest to his: their, her, its, the, my, s, separations, our,
Nearest to no: any, another, there, syncopated, scrutiny, a, controllers, poorly,
Nearest to first: last, second, next, rammstein, exploitation, gemini, maximally, amniotic,
Nearest to world: rewrite, ives, denote, ministry, imminent, depressant, syllogisms, pluriform,
Nearest to years: days, times, apeiron, chaplain, bahia, consonances, favored, whom,
Nearest to had: has, have, was, were, avignon, later, glockenspiel, been,
Nearest to three: five, four, two, seven, six, eight, nine, one,
Nearest to most: more, some, among, many, use, legislators, less, bobo,
Nearest to two: three, four, five, seven, six, one, eight, zero,
Nearest to more: less, most, very, playback, higher, greater, rather, stationed,
Nearest to all: both, ime, these, sweetener, any, many, foo, mcelroy,
Nearest to called: undertones, kensington, homebrew, syllabics, sailor, describes, jagiello, named,
Nearest to would: will, can, could, may, should, might, must, did,
Average loss at step 42000: 3.433009
Average loss at step 44000: 3.454800
Average loss at step 46000: 3.451844
Average loss at step 48000: 3.353686
Average loss at step 50000: 3.385085
Nearest to states: nations, kingdom, knesset, glycolysis, state, glorifying, bering, orson,
Nearest to the: its, their, his, this, humber, a, directions, vorder,
Nearest to a: progressions, another, the, no, korah, trolling, any, hetfield,
Nearest to his: her, their, its, my, your, the, s, him,
Nearest to no: any, another, there, little, a, scrutiny, mystics, sparring,
Nearest to first: last, second, next, rammstein, pseudorandom, assignments, present, fourth,
Nearest to world: ministry, ives, denote, rewrite, mis, super, u, beyond,
Nearest to years: days, times, chaplain, months, bahia, apeiron, laptop, rvi,
Nearest to had: has, have, was, having, were, hypothetically, avignon, ever,
Nearest to three: four, six, seven, five, eight, two, nine, zero,
Nearest to most: more, among, some, many, particularly, less, legislators, use,
Nearest to two: three, four, six, one, seven, five, eight, zero,
Nearest to more: less, most, very, playback, weapon, bre, incised, stationed,
Nearest to all: both, ime, many, droughts, some, every, mcelroy, leadbelly,
Nearest to called: named, undertones, describes, kensington, sailor, encoded, syllabics, homebrew,
Nearest to would: will, could, can, may, should, might, must, cannot,
Average loss at step 52000: 3.440918
Average loss at step 54000: 3.429205
Average loss at step 56000: 3.438610
Average loss at step 58000: 3.395889
Average loss at step 60000: 3.395627
Nearest to states: nations, kingdom, bering, state, countries, glycolysis, glorifying, us,
Nearest to the: their, a, its, humber, any, this, our, each,
Nearest to a: any, the, another, progressions, korah, trolling, no, this,
Nearest to his: her, their, its, my, olin, our, your, the,
Nearest to no: any, little, scrutiny, syncopated, mystics, codebreakers, sparring, a,
Nearest to first: last, second, next, treble, rammstein, maximally, fourth, dictum,
Nearest to world: rewrite, ministry, muslim, homophony, cold, church, denote, ives,
Nearest to years: days, months, times, chaplain, bahia, year, apeiron, rvi,
Nearest to had: has, have, was, were, having, been, hypothetically, avignon,
Nearest to three: five, four, six, two, seven, eight, nine, one,
Nearest to most: more, some, use, among, many, particularly, nextstep, less,
Nearest to two: three, four, six, five, one, seven, eight, zero,
Nearest to more: less, very, most, rather, greater, playback, stationed, varied,
Nearest to all: both, many, those, every, any, these, some, various,
Nearest to called: named, used, homebrew, condenses, undertones, encoded, sailor, syllabics,
Nearest to would: will, could, can, may, should, might, must, cannot,
Average loss at step 62000: 3.242388
Average loss at step 64000: 3.255350
Average loss at step 66000: 3.405548
Average loss at step 68000: 3.396361
Average loss at step 70000: 3.359849
Nearest to states: nations, kingdom, countries, glorifying, state, bering, us, glycolysis,
Nearest to the: its, their, any, this, a, each, these, some,
Nearest to a: another, the, any, enough, korah, dispensed, rounders, hetfield,
Nearest to his: her, their, its, my, our, your, olin, crowded,
Nearest to no: little, syncopated, there, any, scrutiny, contagion, sparring, infects,
Nearest to first: last, second, next, same, fourth, exchanges, dictum, treble,
Nearest to world: cold, muslim, ministry, u, denote, mis, rewrite, plastics,
Nearest to years: days, months, bahia, chaplain, year, times, minutes, centuries,
Nearest to had: has, have, was, were, having, hypothetically, chose, avignon,
Nearest to three: four, six, five, two, seven, eight, nine, zero,
Nearest to most: more, some, less, many, among, particularly, use, legislators,
Nearest to two: three, six, four, one, five, seven, eight, zero,
Nearest to more: less, most, very, rather, highly, greater, increasingly, playback,
Nearest to all: many, both, some, various, any, those, every, several,
Nearest to called: named, encoded, homebrew, used, kensington, paternalistic, see, considered,
Nearest to would: will, could, can, may, should, might, must, cannot,
Average loss at step 72000: 3.372574
Average loss at step 74000: 3.352292
Average loss at step 76000: 3.320355
Average loss at step 78000: 3.356399
Average loss at step 80000: 3.383223
Nearest to states: nations, kingdom, us, countries, state, glorifying, bering, timeout,
Nearest to the: its, their, tippit, this, his, a, curbed, humber,
Nearest to a: another, progressions, the, dispensed, korah, emerson, frankenstein, spectrometers,
Nearest to his: her, their, its, my, your, our, the, crowded,
Nearest to no: little, syncopated, sarcasm, any, sparring, mystics, there, contagion,
Nearest to first: last, second, next, fourth, exchanges, same, third, treble,
Nearest to world: cold, u, denote, numerology, spot, plastics, ives, depressant,
Nearest to years: days, months, year, times, minutes, weeks, centuries, decades,
Nearest to had: has, have, was, were, hypothetically, began, avignon, been,
Nearest to three: four, six, five, two, seven, eight, nine, zero,
Nearest to most: more, some, many, less, particularly, among, especially, all,
Nearest to two: three, four, six, five, seven, one, eight, zero,
Nearest to more: less, most, very, rather, increasingly, quite, bre, highly,
Nearest to all: both, every, any, many, various, each, leadbelly, several,
Nearest to called: named, used, considered, encoded, homebrew, termed, kensington, referred,
Nearest to would: will, could, can, may, might, should, must, cannot,
Average loss at step 82000: 3.406522
Average loss at step 84000: 3.409680
Average loss at step 86000: 3.393597
Average loss at step 88000: 3.353419
Average loss at step 90000: 3.366480
Nearest to states: nations, kingdom, us, bering, glorifying, countries, state, firm,
Nearest to the: its, their, a, any, this, his, every, humber,
Nearest to a: another, the, any, every, dispensed, korah, refutable, deposited,
Nearest to his: her, their, its, my, our, your, the, s,
Nearest to no: little, any, syncopated, there, predominately, sarcasm, only, another,
Nearest to first: last, second, next, previous, treble, same, third, fourth,
Nearest to world: cold, rewrite, bodies, faith, ranging, denote, numerology, sorcery,
Nearest to years: days, months, decades, year, minutes, centuries, weeks, hours,
Nearest to had: has, have, were, was, having, since, hypothetically, avignon,
Nearest to three: two, five, four, seven, eight, six, nine, zero,
Nearest to most: more, some, particularly, less, many, among, especially, all,
Nearest to two: three, four, five, seven, six, one, eight, zero,
Nearest to more: less, most, very, increasingly, rather, greater, bre, playback,
Nearest to all: both, every, many, any, various, leadbelly, these, several,
Nearest to called: named, used, termed, considered, referred, homebrew, encoded, tangential,
Nearest to would: will, could, might, can, should, may, must, cannot,
Average loss at step 92000: 3.399624
Average loss at step 94000: 3.256396
Average loss at step 96000: 3.359089
Average loss at step 98000: 3.244152
Average loss at step 100000: 3.354399
Nearest to states: nations, kingdom, us, countries, glorifying, bering, state, semantic,
Nearest to the: its, their, humber, his, your, a, our, some,
Nearest to a: another, the, any, korah, refutable, progressions, rounders, dispensed,
Nearest to his: her, their, my, your, our, its, the, s,
Nearest to no: little, any, syncopated, there, predominately, another, scrutiny, implemented,
Nearest to first: last, second, next, fourth, third, treble, previous, original,
Nearest to world: cold, muslim, u, gulf, rewrite, ministry, mis, songwriter,
Nearest to years: days, months, minutes, year, weeks, hours, decades, centuries,
Nearest to had: has, have, was, having, were, hypothetically, since, could,
Nearest to three: four, two, five, six, seven, eight, nine, zero,
Nearest to most: more, less, particularly, many, especially, among, some, use,
Nearest to two: three, four, six, five, seven, eight, one, nine,
Nearest to more: less, most, very, greater, increasingly, particularly, rather, playback,
Nearest to all: every, various, both, these, many, several, any, those,
Nearest to called: named, termed, homebrew, referred, used, encoded, treasured, condenses,
Nearest to would: will, could, should, might, can, may, must, cannot,
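
The "Nearest to" lines come from sorting one row of the similarity matrix. The sketch below isolates that step; `nearest_neighbors` and the sample row are hypothetical, chosen so the result can be checked by hand.

```python
import numpy as np

def nearest_neighbors(sim_row, top_k):
    """Indices of the top_k most similar words, skipping the best match.

    Negating the row makes argsort return indices in descending order of
    similarity. Position 0 is the query word itself (similarity 1.0),
    so the slice starts at 1.
    """
    return (-sim_row).argsort()[1:top_k + 1]

# Similarity of word 1 to every word in a five-word vocabulary.
sim_row = np.array([0.1, 1.0, 0.7, -0.2, 0.9])
neighbors = nearest_neighbors(sim_row, top_k=2)
print(neighbors)  # [4 2]
```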



In [9]:

    
num_points = 400

tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
two_d_embeddings = tsne.fit_transform(final_embeddings[1:num_points+1, :])



In [10]:

    
def plot(embeddings, labels):
  assert embeddings.shape[0] >= len(labels), 'More labels than embeddings'
  pylab.figure(figsize=(15,15))  # in inches
  for i, label in enumerate(labels):
    x, y = embeddings[i,:]
    pylab.scatter(x, y)
    pylab.annotate(label, xy=(x, y), xytext=(5, 2), textcoords='offset points',
                   ha='right', va='bottom')
  pylab.show()

words = [reverse_dictionary[i] for i in range(1, num_points+1)]
plot(two_d_embeddings, words)

Problem

An alternative to skip-gram is another Word2Vec model called CBOW (Continuous Bag of Words). In the CBOW model, instead of predicting a context word from a word vector, you predict a word from the sum of all the word vectors in its context. Implement and evaluate a CBOW model trained on the text8 dataset.
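
A CBOW forward pass for a single example can be sketched in NumPy as follows. All sizes, the seed, and the variable names are assumptions for illustration. The sketch uses a full softmax over a six-word vocabulary and sums the context vectors as the problem statement describes; the TensorFlow implementation later in this notebook averages them and uses a sampled softmax instead.

```python
import numpy as np

rng = np.random.RandomState(0)
vocab, dim = 6, 3
embeddings = rng.uniform(-1.0, 1.0, size=(vocab, dim))  # input word vectors
out_weights = rng.normal(size=(vocab, dim))             # output layer weights
out_biases = np.zeros(vocab)

# IDs of the words around the target (two on each side).
context_ids = np.array([1, 2, 4, 5])

# CBOW: combine the context vectors into one hidden vector.
hidden = embeddings[context_ids].sum(axis=0)

# Score every vocabulary word and turn the scores into probabilities.
logits = out_weights.dot(hidden) + out_biases
probs = np.exp(logits - logits.max())  # subtract the max for numerical stability
probs /= probs.sum()
print(probs)
```

Training would push `probs` at the index of the true center word toward 1.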

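So the input is a set of context words and the output is a single predicted word. This is the reverse of skip-gram, which takes one word as input and predicts a word from its context.

Is it enough to swap batch and labels? Not quite: each CBOW input holds several context words, so the batch needs an extra dimension.

Is each batch entry still a [word, label] pair, or is it [[word1, word2, word4, word5], label], with the words taken from the context window? It is the latter: word3, the center word, becomes the label.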

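Training data for the skip-gram model is a batch of 128 input word IDs (batch_size = 128) plus a matching [128, 1] array of labels, where each label is one context word.
Training data for CBOW is a [batch_size, 2 * context_window] array holding the word IDs around each center word, plus a [batch_size, 1] array holding the center words as labels.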

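Function to generate a batch for the CBOW model.

Do we need some form of padding at the ends of the text, or can we just loop around the 'data' structure? The function below loops around: data_index wraps back to 0 (modulo len(data)) when it reaches the end.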
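
The wrap-around behaviour can be seen in isolation with a short list. This is a minimal sketch; `sliding_windows` is a hypothetical helper that keeps only the buffer handling and drops the batch and label arrays.

```python
import collections

def sliding_windows(data, span, count):
    """Yield `count` windows of length `span`, wrapping to the start of `data`."""
    index = 0
    # With maxlen set, append() adds on the right and drops the leftmost item.
    window = collections.deque(maxlen=span)
    for _ in range(span):
        window.append(data[index])
        index = (index + 1) % len(data)
    for _ in range(count):
        yield list(window)
        window.append(data[index])
        index = (index + 1) % len(data)

windows = list(sliding_windows([10, 11, 12, 13], span=3, count=4))
print(windows)
# [[10, 11, 12], [11, 12, 13], [12, 13, 10], [13, 10, 11]]
```

The last two windows mix the end of the list with its beginning. On text8 this affects only a handful of windows per pass over 17 million words, so no padding is used.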



In [22]:

    
data_index = 0

# Each batch row holds the context words around one target word; the word at the
# center of the window is excluded from the row and returned as the label.
def generate_batch(batch_size, context_window):
    global data_index
    context_size = 2 * context_window
    batch = np.ndarray(shape=(batch_size, context_size), dtype=np.int32)
    labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
    span = 2 * context_window + 1 # [ context_window target context_window ]
    # Buffer generated from double-ended queue with fast add/remove
    buffer = collections.deque(maxlen=span)
    # Fill the buffer with the first `span` words
    for _ in range(span):
        # Read the next word ID, wrapping around at the end of the data
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)
    for i in range(batch_size):
        target = context_window  # target label at the center of the buffer
        buffer_list = list(buffer)
        batch[i, :] = buffer_list[:target] + buffer_list[target+1:]
        labels[i] = buffer[target]
        # Appending to a full deque adds the new word on the right and drops the
        # oldest word from the left, so the window slides forward by one word
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)
    return batch, labels

print('data:', [reverse_dictionary[di] for di in data[:8]])

for context_window in [1, 2, 3, 4]:
    data_index = 0
    batch, labels = generate_batch(batch_size=8, context_window=context_window)
    print('\nwith context_window = %d:' % (context_window))
    print('\nbatch shape: {0}'.format(batch.shape))
    for be in batch:
        print('    batch_entry:', [reverse_dictionary[bindex] for bindex in be])
    print('    labels:', [reverse_dictionary[li] for li in labels.reshape(8)])


data: ['anarchism', 'originated', 'as', 'a', 'term', 'of', 'abuse', 'first']

with context_window = 1:

batch shape: (8, 2)
    batch_entry: ['anarchism', 'as']
    batch_entry: ['originated', 'a']
    batch_entry: ['as', 'term']
    batch_entry: ['a', 'of']
    batch_entry: ['term', 'abuse']
    batch_entry: ['of', 'first']
    batch_entry: ['abuse', 'used']
    batch_entry: ['first', 'against']
    labels: ['originated', 'as', 'a', 'term', 'of', 'abuse', 'first', 'used']

with context_window = 2:

batch shape: (8, 4)
    batch_entry: ['anarchism', 'originated', 'a', 'term']
    batch_entry: ['originated', 'as', 'term', 'of']
    batch_entry: ['as', 'a', 'of', 'abuse']
    batch_entry: ['a', 'term', 'abuse', 'first']
    batch_entry: ['term', 'of', 'first', 'used']
    batch_entry: ['of', 'abuse', 'used', 'against']
    batch_entry: ['abuse', 'first', 'against', 'early']
    batch_entry: ['first', 'used', 'early', 'working']
    labels: ['as', 'a', 'term', 'of', 'abuse', 'first', 'used', 'against']

with context_window = 3:

batch shape: (8, 6)
    batch_entry: ['anarchism', 'originated', 'as', 'term', 'of', 'abuse']
    batch_entry: ['originated', 'as', 'a', 'of', 'abuse', 'first']
    batch_entry: ['as', 'a', 'term', 'abuse', 'first', 'used']
    batch_entry: ['a', 'term', 'of', 'first', 'used', 'against']
    batch_entry: ['term', 'of', 'abuse', 'used', 'against', 'early']
    batch_entry: ['of', 'abuse', 'first', 'against', 'early', 'working']
    batch_entry: ['abuse', 'first', 'used', 'early', 'working', 'class']
    batch_entry: ['first', 'used', 'against', 'working', 'class', 'radicals']
    labels: ['a', 'term', 'of', 'abuse', 'first', 'used', 'against', 'early']

with context_window = 4:

batch shape: (8, 8)
    batch_entry: ['anarchism', 'originated', 'as', 'a', 'of', 'abuse', 'first', 'used']
    batch_entry: ['originated', 'as', 'a', 'term', 'abuse', 'first', 'used', 'against']
    batch_entry: ['as', 'a', 'term', 'of', 'first', 'used', 'against', 'early']
    batch_entry: ['a', 'term', 'of', 'abuse', 'used', 'against', 'early', 'working']
    batch_entry: ['term', 'of', 'abuse', 'first', 'against', 'early', 'working', 'class']
    batch_entry: ['of', 'abuse', 'first', 'used', 'early', 'working', 'class', 'radicals']
    batch_entry: ['abuse', 'first', 'used', 'against', 'working', 'class', 'radicals', 'including']
    batch_entry: ['first', 'used', 'against', 'early', 'class', 'radicals', 'including', 'the']
    labels: ['term', 'of', 'abuse', 'first', 'used', 'against', 'early', 'working']



In [24]:

    
# vocabulary is size 50000
batch_size = 128
embedding_size = 128 # Dimension of the embedding vector.
context_window = 3 # How many words to consider left and right.
# We pick a random validation set to sample nearest neighbors. Here we limit the
# validation samples to the words that have a low numeric ID, which by
# construction are also the most frequent.
valid_size = 16 # Random set of words to evaluate similarity on.
valid_window = 100 # Only pick dev samples in the head of the distribution.
valid_examples = np.array(random.sample(range(valid_window), valid_size))
num_sampled = 64 # Number of negative examples to sample.

graph = tf.Graph()

with graph.as_default(), tf.device('/cpu:0'):

    # Input data.
    train_dataset = tf.placeholder(tf.int32, shape=[batch_size, 2*context_window])
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
    valid_dataset = tf.constant(valid_examples, dtype=tf.int32)
  
    # Variables.
    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    softmax_weights = tf.Variable(
        tf.truncated_normal([vocabulary_size, embedding_size],
                         stddev=1.0 / math.sqrt(embedding_size)))
    softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))
  
    # Model (CBOW): look up the embedding of every context word, stack the
    # lookups along a third axis, and average over that axis.
    # Written with help from http://www.thushv.com/natural_language_processing/word2vec-part-2-nlp-with-deep-learning-with-tensorflow-cbow/
    embeds = None
    for i in range(2*context_window):
        # Embeddings of the i-th context word of every example: [batch_size, embedding_size].
        embedding_i = tf.nn.embedding_lookup(embeddings, train_dataset[:,i])
        print('embedding %d shape: %s'%(i,embedding_i.get_shape().as_list()))
        emb_x,emb_y = embedding_i.get_shape().as_list()
        if embeds is None:
            embeds = tf.reshape(embedding_i,[emb_x,emb_y,1])
        else:
            embeds = tf.concat([embeds,tf.reshape(embedding_i,[emb_x,emb_y,1])],2)
 
    # Sanity check: one slice per context word along the last axis.
    assert embeds.get_shape().as_list()[2]==2*context_window
    print("Concat embedding size: %s"%embeds.get_shape().as_list())
    # Average over the context axis, giving [batch_size, embedding_size].
    avg_embed = tf.reduce_mean(embeds,2,keep_dims=False)
    print("Avg embedding size: %s"%avg_embed.get_shape().as_list())
    
    # Compute the softmax loss, using a sample of the negative labels each time.
    loss = tf.reduce_mean(
        tf.nn.sampled_softmax_loss(weights=softmax_weights, biases=softmax_biases, inputs=avg_embed,
                               labels=train_labels, num_sampled=num_sampled, num_classes=vocabulary_size))

    # Optimizer.
    # Note: The optimizer will optimize the softmax_weights AND the embeddings.
    # This is because the embeddings are defined as a variable quantity and the
    # optimizer's `minimize` method will by default modify all variable quantities 
    # that contribute to the tensor it is passed.
    # See docs on `tf.train.Optimizer.minimize()` for more details.
    optimizer = tf.train.AdagradOptimizer(1.0).minimize(loss)
  
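    # Compute the similarity between the validation words and all embeddings.
    # We use cosine similarity: after normalizing every row to unit length,
    # the dot product of two rows is the cosine of the angle between them.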
    norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
    normalized_embeddings = embeddings / norm
    valid_embeddings = tf.nn.embedding_lookup(
        normalized_embeddings, valid_dataset)
    similarity = tf.matmul(valid_embeddings, tf.transpose(normalized_embeddings))









    



embedding 0 shape: [128, 128]
embedding 1 shape: [128, 128]
embedding 2 shape: [128, 128]
embedding 3 shape: [128, 128]
embedding 4 shape: [128, 128]
embedding 5 shape: [128, 128]
Concat embedding size: [128, 128, 6]
Avg embedding size: [128, 128]
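
The loop above builds the CBOW input by looking up each context word separately and averaging the results. As a rough illustration of what that computes, here is a minimal NumPy sketch. The toy sizes, the example word IDs, and the helper name `cbow_average` are assumptions made for this example and are not part of the assignment code.

```python
import numpy as np

# Toy sizes chosen for illustration only (the notebook uses 50000 x 128).
vocab_size, embed_dim, context_window = 10, 4, 3

rng = np.random.default_rng(0)
embeddings = rng.uniform(-1.0, 1.0, size=(vocab_size, embed_dim))

# Two training examples, each with 2 * context_window context word IDs.
context_ids = np.array([[1, 2, 3, 5, 6, 7],
                        [0, 0, 4, 9, 9, 9]])

def cbow_average(embeddings, context_ids):
    """Look up every context word and average along the context axis."""
    looked_up = embeddings[context_ids]  # shape: (batch, 2*window, embed_dim)
    return looked_up.mean(axis=1)        # shape: (batch, embed_dim)

avg_embed = cbow_average(embeddings, context_ids)
print(avg_embed.shape)  # (2, 4)
```

In TensorFlow 1.x the same averaging can probably be written without the explicit loop, since `tf.nn.embedding_lookup` accepts a 2-D tensor of IDs: `tf.reduce_mean(tf.nn.embedding_lookup(embeddings, train_dataset), 1)`. The loop version is kept in the cell above because its shape printouts are useful for checking the construction.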



In [27]:

    
num_steps = 100001

with tf.Session(graph=graph) as session:
    tf.global_variables_initializer().run()
    print('Initialized')
    average_loss = 0
    for step in range(num_steps):
        batch_data, batch_labels = generate_batch(
            batch_size, context_window)
        
        feed_dict = {train_dataset : batch_data, train_labels : batch_labels}
        
        _, l = session.run([optimizer, loss], feed_dict=feed_dict)
        average_loss += l
        
        if step % 2000 == 0:
            if step > 0:
                average_loss = average_loss / 2000
            # The average loss is an estimate of the loss over the last 2000 batches.
            print('Average loss at step %d: %f' % (step, average_loss))
            average_loss = 0
        # Note that the nearest-neighbor evaluation below is expensive
        # (~20% slowdown if computed every 500 steps).
        if step % 10000 == 0:
            sim = similarity.eval()
            for i in range(valid_size):
                valid_word = reverse_dictionary[valid_examples[i]]
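                top_k = 8 # number of nearest neighbors
                # Sort by descending similarity. Position 0 is the word itself
                # (similarity 1), so skip it and keep the next top_k entries.
                nearest = (-sim[i, :]).argsort()[1:top_k+1]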
                log = 'Nearest to %s:' % valid_word
                for k in range(top_k):
                    close_word = reverse_dictionary[nearest[k]]
                    log = '%s %s,' % (log, close_word)
                print(log)
    final_embeddings = normalized_embeddings.eval()









    



Initialized
Average loss at step 0: 7.585863
Nearest to history: sheik, currents, studded, jorge, cumulative, pharmacologic, magna, foreshadowing,
Nearest to years: maeterlinck, teams, emilia, mei, iceni, depository, coslet, bogs,
Nearest to two: monte, alexis, paulist, mrnas, herding, livy, bowstring, apoplexy,
Nearest to will: unikom, ladino, thaw, travolta, marcion, angina, unstated, confessor,
Nearest to an: holiness, rebuked, oder, drm, followings, honour, watering, abstain,
Nearest to b: traded, matres, sphinx, antarctic, perish, analogies, hooves, ind,
Nearest to however: hypothetically, sincerely, normalizing, chow, depicting, deriving, feminists, runnymede,
Nearest to not: antiquated, deflation, asl, xiang, undermine, gladly, mpg, least,
Nearest to during: makran, fad, odie, halos, catenary, toughest, gpo, countenance,
Nearest to known: coll, analyse, verhofstadt, answer, memorials, callers, depicted, dalek,
Nearest to would: victory, greatness, oldham, sixths, knocking, chipsets, prosaic, berzelius,
Nearest to UNK: gls, mains, regain, folktales, increase, xxix, cooperative, kaufmann,
Nearest to but: emanates, hainan, straddles, trotskyists, bastard, fossil, coronae, speak,
Nearest to such: bootp, convincingly, unconsciously, inches, ep, cfm, manz, wha,
Nearest to who: deterrence, treacherous, cosets, kantele, trams, descriptor, incredibly, reintroduce,
Nearest to time: guise, wah, luthier, dosing, annulled, ck, bates, eulemur,
Average loss at step 2000: 4.200094
Average loss at step 4000: 3.595824
Average loss at step 6000: 3.614160
Average loss at step 8000: 3.545771
Average loss at step 10000: 3.506427
Nearest to history: sheik, studded, naomi, foreshadowing, tons, salamanders, magna, pharmacologic,
Nearest to years: emilia, inflections, maeterlinck, contexts, watercraft, fevers, unpredictability, teams,
Nearest to two: four, alexis, nerd, three, rwandan, aspirated, six, paulist,
Nearest to will: could, may, would, can, theatres, lifespan, tells, octal,
Nearest to an: a, the, this, syllabaries, vampire, oder, sukarnoputri, demesne,
Nearest to b: d, traded, sailors, matres, antisymmetric, ind, sphinx, dramatic,
Nearest to however: undergraduate, luthier, chow, shoah, hypothetically, feminists, kerr, loris,
Nearest to not: directrix, euphemisms, bassist, dimensional, vigorously, undermine, infiltrated, cripple,
Nearest to during: toughest, cro, counterintelligence, ohm, tarzan, electrically, twa, familiar,
Nearest to known: well, depicted, hundredth, answer, coll, callers, such, analyse,
Nearest to would: may, must, could, will, berzelius, can, shawl, does,
Nearest to UNK: predominance, enlightened, venezuela, outbreak, javelin, et, reidel, desilu,
Nearest to but: dudley, jarmusch, temptation, mull, permanent, conformation, bastard, straddles,
Nearest to such: well, unconsciously, bamiyan, bootp, known, wha, attributes, convincingly,
Nearest to who: deterrence, trams, latent, cke, kantele, characterises, simulcast, generation,
Nearest to time: bates, eulemur, ck, luthier, guise, memorized, annulled, mindanao,
Average loss at step 12000: 3.510734
Average loss at step 14000: 3.400310
Average loss at step 16000: 3.434774
Average loss at step 18000: 3.411235
Average loss at step 20000: 3.371390
Nearest to history: pharmacologic, book, hatta, tons, salamanders, magna, bubbles, studded,
Nearest to years: emilia, inflections, contexts, haiku, watercraft, diversification, maeterlinck, unpredictability,
Nearest to two: three, four, one, six, karenga, dma, fide, eight,
Nearest to will: would, could, may, can, must, should, does, cannot,
Nearest to an: this, vma, a, broca, oder, holiness, another, reasoning,
Nearest to b: d, traded, chaplain, antisymmetric, sailors, monotremes, chimera, ind,
Nearest to however: undergraduate, among, although, chow, kerr, shoah, luthier, reforms,
Nearest to not: directrix, euphemisms, undermine, infiltrated, dimensional, juggling, vigorously, einer,
Nearest to during: against, cro, unjust, ohm, twa, before, configurations, toughest,
Nearest to known: depicted, well, gottlieb, hundredth, answer, verhofstadt, callers, coll,
Nearest to would: may, will, must, could, can, should, does, might,
Nearest to UNK: l, de, et, la, william, j, n, musicians,
Nearest to but: dudley, although, temptation, jarmusch, duddy, ignored, though, conformation,
Nearest to such: well, ifs, bamiyan, unconsciously, bootp, attributes, gillette, known,
Nearest to who: latent, deterrence, pythons, cke, bill, characterises, shewa, he,
Nearest to time: least, year, eulemur, ck, luthier, memorized, guise, bates,
Average loss at step 22000: 3.356230
Average loss at step 24000: 3.366677
Average loss at step 26000: 3.317554
Average loss at step 28000: 3.219768
Average loss at step 30000: 3.284547
Nearest to history: pharmacologic, hatta, faxes, magna, naomi, tons, currents, book,
Nearest to years: months, emilia, inflections, haiku, diversification, days, purposes, contexts,
Nearest to two: three, one, ers, karenga, four, sf, immature, paulist,
Nearest to will: would, could, may, can, must, should, cannot, might,
Nearest to an: vma, another, anesthetic, oder, broca, this, planned, holiness,
Nearest to b: d, traded, chaplain, gulag, sailors, antisymmetric, kursk, model,
Nearest to however: although, among, undergraduate, though, chow, shoah, kerr, delineation,
Nearest to not: directrix, undermine, euphemisms, vigorously, dimensional, decoys, infiltrated, juggling,
Nearest to during: against, before, configurations, twa, unjust, ohm, after, cro,
Nearest to known: depicted, well, seen, gottlieb, regarded, referred, coll, verhofstadt,
Nearest to would: will, may, could, must, should, can, might, cannot,
Nearest to UNK: n, de, et, com, l, hogan, j, la,
Nearest to but: although, dudley, though, jarmusch, duddy, yet, materialist, israelites,
Nearest to such: well, ifs, bamiyan, gillette, bootp, chanson, soundtrack, attributes,
Nearest to who: latent, deterrence, pythons, cke, bill, he, shewa, allergy,
Nearest to time: least, year, memorized, luthier, ck, eulemur, probabilistic, guise,
Average loss at step 32000: 3.286000
Average loss at step 34000: 3.236703
Average loss at step 36000: 3.291754
Average loss at step 38000: 3.282871
Average loss at step 40000: 3.249981
Nearest to history: currents, pharmacologic, magna, faxes, book, ndez, hatta, naomi,
Nearest to years: months, days, emilia, purposes, contexts, inflections, year, diversification,
Nearest to two: three, four, one, six, karenga, swastika, ers, meters,
Nearest to will: would, could, may, must, can, should, cannot, might,
Nearest to an: another, vma, holiness, oder, wallonia, abstain, broca, reasoning,
Nearest to b: d, traded, gulag, kursk, h, chaplain, chimera, model,
Nearest to however: although, though, among, but, while, undergraduate, kerr, chow,
Nearest to not: directrix, undermine, euphemisms, vigorously, infiltrated, juggling, autographs, bisexual,
Nearest to during: against, before, after, unjust, configurations, twa, until, throughout,
Nearest to known: depicted, seen, referred, gottlieb, regarded, well, dzhokhar, available,
Nearest to would: will, may, could, must, might, should, can, cannot,
Nearest to UNK: michael, et, j, fished, la, river, hogan, monster,
Nearest to but: although, though, however, dudley, while, yet, irian, ehret,
Nearest to such: well, ifs, bamiyan, described, gillette, bootp, regarded, chanson,
Nearest to who: latent, he, deterrence, pythons, bill, cke, allergy, translating,
Nearest to time: least, memorized, luthier, point, year, ck, syme, purpose,
Average loss at step 42000: 3.219168
Average loss at step 44000: 3.182288
Average loss at step 46000: 3.218473
Average loss at step 48000: 3.195252
Average loss at step 50000: 3.184970
Nearest to history: currents, pharmacologic, faxes, hatta, collection, life, ndez, book,
Nearest to years: months, days, emilia, year, decades, purposes, haiku, inflections,
Nearest to two: three, four, meters, one, top, six, karenga, females,
Nearest to will: would, could, may, must, can, should, might, cannot,
Nearest to an: another, vma, oder, anesthetic, mosquitia, holiness, planned, standoff,
Nearest to b: d, traded, silicone, chimera, lemay, antisymmetric, gulag, chaplain,
Nearest to however: although, though, but, among, while, undergraduate, because, until,
Nearest to not: directrix, euphemisms, vigorously, dorne, undermine, autographs, locate, infiltrated,
Nearest to during: against, before, after, configurations, unjust, throughout, twa, until,
Nearest to known: referred, depicted, seen, regarded, gottlieb, available, identified, dzhokhar,
Nearest to would: will, may, could, must, might, should, cannot, can,
Nearest to UNK: et, la, michael, holliday, png, com, al, mckenna,
Nearest to but: although, however, though, while, dudley, yet, irian, fossil,
Nearest to such: well, described, ifs, regarded, gillette, bamiyan, far, heralds,
Nearest to who: latent, he, deterrence, cke, allergy, bill, pythons, offence,
Nearest to time: least, point, memorized, year, ck, guise, syme, probabilistic,
Average loss at step 52000: 3.112957
Average loss at step 54000: 3.166306
Average loss at step 56000: 3.158962
Average loss at step 58000: 3.085047
Average loss at step 60000: 3.084010
Nearest to history: currents, pharmacologic, faxes, book, ndez, life, collection, religion,
Nearest to years: months, days, decades, year, emilia, purposes, inflections, contexts,
Nearest to two: three, four, six, five, kb, karenga, meters, eight,
Nearest to will: would, could, must, may, can, might, should, cannot,
Nearest to an: another, standoff, vma, wallonia, anesthetic, mosquitia, holiness, oder,
Nearest to b: d, traded, silicone, lemay, chimera, brookline, sphinx, antisymmetric,
Nearest to however: although, though, but, while, among, undergraduate, because, hinge,
Nearest to not: never, euphemisms, vigorously, directrix, maltose, undermine, dorne, locate,
Nearest to during: before, against, after, configurations, unjust, throughout, until, in,
Nearest to known: referred, regarded, depicted, seen, gottlieb, available, identified, dzhokhar,
Nearest to would: will, could, may, must, might, should, can, does,
Nearest to UNK: van, la, german, betty, carl, michael, der, maria,
Nearest to but: although, however, though, while, yet, dudley, ignored, than,
Nearest to such: well, described, regarded, gillette, far, chanson, seen, bamiyan,
Nearest to who: latent, cke, he, allergy, deterrence, waite, offence, bill,
Nearest to time: least, point, memorized, year, purpose, ck, ingham, affinity,
Average loss at step 62000: 3.116589
Average loss at step 64000: 3.075802
Average loss at step 66000: 3.123237
Average loss at step 68000: 3.017805
Average loss at step 70000: 3.122700
Nearest to history: currents, pharmacologic, religion, faxes, reproach, life, xx, ndez,
Nearest to years: months, decades, days, year, emilia, purposes, teams, inflections,
Nearest to two: three, four, six, meters, one, karenga, five, merriam,
Nearest to will: would, could, must, may, can, might, should, cannot,
Nearest to an: another, vma, wallonia, oder, mosquitia, standoff, anesthetic, planned,
Nearest to b: d, silicone, traded, chimera, j, lemay, f, antisymmetric,
Nearest to however: although, though, while, but, because, among, hinge, undergraduate,
Nearest to not: never, vigorously, maltose, directrix, euphemisms, locate, autographs, undermine,
Nearest to during: against, before, after, throughout, configurations, unjust, since, in,
Nearest to known: referred, regarded, depicted, seen, available, gottlieb, identified, dzhokhar,
Nearest to would: will, could, may, must, might, should, cannot, did,
Nearest to UNK: et, l, des, der, le, la, stratford, river,
Nearest to but: although, however, though, while, yet, dudley, browed, botched,
Nearest to such: well, described, regarded, far, gillette, result, seen, chanson,
Nearest to who: latent, he, cke, allergy, she, mitchell, waite, berthe,
Nearest to time: least, memorized, point, affinity, lingers, purpose, syme, ingham,
Average loss at step 72000: 3.147720
Average loss at step 74000: 3.033683
Average loss at step 76000: 3.110921
Average loss at step 78000: 3.053545
Average loss at step 80000: 3.085399
Nearest to history: geography, warthog, ndez, arne, currents, archbishop, pharmacologic, religion,
Nearest to years: months, decades, days, year, emilia, purposes, contexts, haiku,
Nearest to two: three, four, six, five, one, females, several, karenga,
Nearest to will: would, could, must, may, can, should, might, cannot,
Nearest to an: another, vma, oder, wallonia, standoff, mosquitia, sentinels, anesthetic,
Nearest to b: d, silicone, j, chimera, traded, f, h, lemay,
Nearest to however: although, but, though, while, because, that, among, hinge,
Nearest to not: never, maltose, euphemisms, directrix, locate, autographs, juggling, vigorously,
Nearest to during: before, against, after, until, configurations, since, throughout, in,
Nearest to known: referred, depicted, regarded, seen, available, gottlieb, described, identified,
Nearest to would: will, could, may, might, must, should, did, can,
Nearest to UNK: et, der, desilu, des, gesenius, la, van, artistic,
Nearest to but: although, however, though, while, because, dudley, yet, botched,
Nearest to such: well, described, regarded, seen, gillette, far, bamiyan, attributes,
Nearest to who: latent, cke, he, mitchell, snyder, allergy, berthe, waite,
Nearest to time: least, point, memorized, affinity, lingers, irony, syme, purpose,
Average loss at step 82000: 3.106333
Average loss at step 84000: 2.976522
Average loss at step 86000: 3.044904
Average loss at step 88000: 3.049955
Average loss at step 90000: 2.953195
Nearest to history: currents, religion, reproach, pharmacologic, tradition, journal, becket, geography,
Nearest to years: months, decades, days, year, emilia, contexts, purposes, centuries,
Nearest to two: three, four, six, karenga, five, one, oskar, eclecticism,
Nearest to will: would, could, must, may, might, can, should, cannot,
Nearest to an: another, vma, standoff, mosquitia, anesthetic, planned, batmobile, sheik,
Nearest to b: d, silicone, traded, chimera, antisymmetric, lemay, dramatic, oriente,
Nearest to however: although, but, though, because, while, that, hinge, among,
Nearest to not: never, maltose, directrix, euphemisms, dorne, vigorously, juggling, autographs,
Nearest to during: before, after, against, throughout, until, since, configurations, unjust,
Nearest to known: referred, regarded, depicted, seen, available, described, identified, defined,
Nearest to would: will, could, may, might, must, should, did, does,
Nearest to UNK: der, la, maria, van, peter, michael, et, mythology,
Nearest to but: although, however, though, while, because, dudley, israelites, yet,
Nearest to such: well, regarded, seen, described, far, result, bamiyan, gillette,
Nearest to who: latent, cke, he, allergy, mitchell, snyder, berthe, young,
Nearest to time: least, point, memorized, lingers, affinity, moment, purpose, period,
Average loss at step 92000: 2.663241
Average loss at step 94000: 2.852551
Average loss at step 96000: 3.027855
Average loss at step 98000: 3.003954
Average loss at step 100000: 3.052330
Nearest to history: tradition, currents, geography, pharmacologic, religion, becket, reproach, williamsburg,
Nearest to years: months, decades, days, year, centuries, purposes, emilia, contexts,
Nearest to two: three, four, five, six, meters, karenga, one, top,
Nearest to will: would, must, could, might, should, may, can, cannot,
Nearest to an: another, standoff, vma, oder, anesthetic, planned, ochre, teng,
Nearest to b: d, silicone, chimera, sphinx, antisymmetric, traded, nast, civilize,
Nearest to however: although, but, though, because, while, that, until, ensuing,
Nearest to not: never, maltose, directrix, euphemisms, dorne, juggling, mauritania, autographs,
Nearest to during: before, after, against, until, throughout, configurations, since, in,
Nearest to known: referred, regarded, depicted, seen, available, identified, described, dzhokhar,
Nearest to would: will, could, might, may, must, should, did, cannot,
Nearest to UNK: et, der, des, johnston, jos, town, garden, gesenius,
Nearest to but: although, however, though, while, because, yet, dudley, israelites,
Nearest to such: well, far, seen, regarded, described, result, bamiyan, gillette,
Nearest to who: snyder, latent, cke, mitchell, young, berthe, living, allergy,
Nearest to time: least, point, memorized, lingers, moment, probabilistic, syme, period,
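
The "Nearest to ..." lines above come from ranking all embeddings by cosine similarity to each validation word. The following NumPy sketch shows that ranking on a tiny hand-made embedding matrix. The matrix `toy` and the helper name `nearest_neighbors` are made up for illustration, and the sketch assumes no other row is exactly parallel to the query (otherwise a tie could displace the query from position 0).

```python
import numpy as np

def nearest_neighbors(embeddings, query_id, top_k):
    """Return the IDs of the top_k rows most cosine-similar to row query_id."""
    # Normalize every row to unit length so a dot product is a cosine.
    norms = np.sqrt(np.sum(np.square(embeddings), axis=1, keepdims=True))
    normalized = embeddings / norms
    sim = normalized @ normalized[query_id]
    # Sort by descending similarity; position 0 is the query itself
    # (similarity 1.0), so skip it.
    return (-sim).argsort()[1:top_k + 1]

toy = np.array([[1.0, 0.0],    # 0: query
                [2.0, 0.1],    # 1: nearly parallel to the query
                [0.0, 1.0],    # 2: orthogonal
                [-1.0, 0.0],   # 3: opposite direction
                [1.0, 1.0]])   # 4: 45 degrees away
neighbors = nearest_neighbors(toy, query_id=0, top_k=2)
print(neighbors)  # [1 4]
```

Row 1 ranks first even though it is longer than the query, because normalization makes the ranking depend only on direction, not on vector length.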



In [ ]: