Deep Learning

Assignment 5

The goal of this assignment is to train a skip-gram model over Text8 data.


In [1]:
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
import collections
import math
import numpy as np
import os
import random
import tensorflow as tf
import urllib
import zipfile
from matplotlib import pylab
from sklearn.manifold import TSNE


---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-6205c2571c61> in <module>()
     10 import zipfile
     11 from matplotlib import pylab
---> 12 from sklearn.manifold import TSNE

ImportError: cannot import name TSNE
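
This import failure usually means the installed scikit-learn predates sklearn.manifold.TSNE (added around scikit-learn 0.15), and it is also why the t-SNE cell near the end raises a NameError. A hedged fix, assuming packages can be upgraded in this environment, is to upgrade scikit-learn and retry the import:


In [ ]:
# sklearn.manifold.TSNE is available from scikit-learn 0.15 onwards; upgrading
# the package (and restarting the kernel) usually resolves this import error.
# !pip install --upgrade scikit-learn
from sklearn.manifold import TSNE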

Download the data from the source website if necessary.


In [2]:
url = 'http://mattmahoney.net/dc/'

def maybe_download(filename, expected_bytes):
  """Download a file if not present, and make sure it's the right size."""
  if not os.path.exists(filename):
    filename, _ = urllib.urlretrieve(url + filename, filename)
  statinfo = os.stat(filename)
  if statinfo.st_size == expected_bytes:
    print 'Found and verified', filename
  else:
    print statinfo.st_size
    raise Exception(
      'Failed to verify ' + filename + '. Can you get to it with a browser?')
  return filename

filename = maybe_download('text8.zip', 31344016)


Found and verified text8.zip

Read the data into a list of words.


In [3]:
def read_data(filename):
  """Extract the first file in the zip archive as a list of words."""
  with zipfile.ZipFile(filename) as f:
    return f.read(f.namelist()[0]).split()
  
words = read_data(filename)
print 'Data size', len(words)


Data size 17005207

Build the dictionary and replace rare words with the UNK token.


In [4]:
vocabulary_size = 50000

def build_dataset(words):
  count = [['UNK', -1]]
  count.extend(collections.Counter(words).most_common(vocabulary_size - 1))
  dictionary = dict()
  for word, _ in count:
    dictionary[word] = len(dictionary)
  data = list()
  unk_count = 0
  for word in words:
    if word in dictionary:
      index = dictionary[word]
    else:
      index = 0  # dictionary['UNK']
      unk_count = unk_count + 1
    data.append(index)
  count[0][1] = unk_count
  reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys())) 
  return data, count, dictionary, reverse_dictionary

data, count, dictionary, reverse_dictionary = build_dataset(words)
print 'Most common words (+UNK)', count[:5]
print 'Sample data', data[:10]
del words  # Hint to reduce memory.


Most common words (+UNK) [['UNK', 418391], ('the', 1061396), ('of', 593677), ('and', 416629), ('one', 411764)]
Sample data [5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156]
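
The integer codes in data map back to words through reverse_dictionary; a minimal sanity check along these lines decodes the sample above:


In [ ]:
# Decode the first ten codes back into words.
print 'Sample words', [reverse_dictionary[i] for i in data[:10]]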

Function to generate a training batch for the skip-gram model.


In [5]:
data_index = 0

def generate_batch(batch_size, num_skips, skip_window):
  global data_index
  assert batch_size % num_skips == 0
  assert num_skips <= 2 * skip_window
  batch = np.ndarray(shape=(batch_size), dtype=np.int32)
  labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
  span = 2 * skip_window + 1 # [ skip_window target skip_window ]
  buffer = collections.deque(maxlen=span)
  for _ in range(span):
    buffer.append(data[data_index])
    data_index = (data_index + 1) % len(data)
  for i in range(batch_size // num_skips):
    target = skip_window  # target label at the center of the buffer
    targets_to_avoid = [ skip_window ]
    for j in range(num_skips):
      while target in targets_to_avoid:
        target = random.randint(0, span - 1)
      targets_to_avoid.append(target)
      batch[i * num_skips + j] = buffer[skip_window]
      labels[i * num_skips + j, 0] = buffer[target]
    buffer.append(data[data_index])
    data_index = (data_index + 1) % len(data)
  return batch, labels

batch, labels = generate_batch(batch_size=8, num_skips=2, skip_window=1)
for i in range(8):
  print batch[i], '->', labels[i, 0]
  print reverse_dictionary[batch[i]], '->', reverse_dictionary[labels[i, 0]]


3084 -> 5239
originated -> anarchism
3084 -> 12
originated -> as
12 -> 6
as -> a
12 -> 3084
as -> originated
6 -> 195
a -> term
6 -> 12
a -> as
195 -> 2
term -> of
195 -> 6
term -> a
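
The same generator can be exercised with a wider window to see how num_skips and skip_window interact. A minimal, purely illustrative sketch follows; it saves and restores the global data_index so the training cell below starts from the same position as in the original run:


In [ ]:
# Each center word is reused num_skips times, paired with distinct words drawn
# from a window of skip_window words on either side of it.
data_index_backup = data_index
data_index = 0
batch2, labels2 = generate_batch(batch_size=8, num_skips=4, skip_window=2)
for i in range(8):
  print reverse_dictionary[batch2[i]], '->', reverse_dictionary[labels2[i, 0]]
data_index = data_index_backup  # restore so the training cell is unaffected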

Train a skip-gram model.


In [6]:
batch_size = 128
embedding_size = 128 # Dimension of the embedding vector.
skip_window = 1 # How many words to consider left and right.
num_skips = 2 # How many times to reuse an input to generate a label.
# We pick a random validation set to sample nearest neighbors. Here we limit the
# validation samples to the words that have a low numeric ID, which by
# construction are also the most frequent.
valid_size = 16 # Random set of words to evaluate similarity on.
valid_window = 100 # Only pick dev samples in the head of the distribution.
valid_examples = np.array(random.sample(xrange(valid_window), valid_size))
num_sampled = 64 # Number of negative examples to sample.

graph = tf.Graph()

with graph.as_default():

  # Input data.
  train_dataset = tf.placeholder(tf.int32, shape=[batch_size])
  train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
  valid_dataset = tf.constant(valid_examples, dtype=tf.int32)
  
  # Variables.
  embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
  softmax_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                         stddev=1.0 / math.sqrt(embedding_size)))
  softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))
  
  # Model.
  # Look up embeddings for inputs.
  embed = tf.nn.embedding_lookup(embeddings, train_dataset)
  # Compute the softmax loss, using a sample of the negative labels each time.
  loss = tf.reduce_mean(
    tf.nn.sampled_softmax_loss(softmax_weights, softmax_biases, embed,
                               train_labels, num_sampled, vocabulary_size))

  # Optimizer.
  optimizer = tf.train.AdagradOptimizer(1.0).minimize(loss)
  
  # Compute the similarity between minibatch examples and all embeddings.
  # We use the cosine distance:
  norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
  normalized_embeddings = embeddings / norm
  valid_embeddings = tf.nn.embedding_lookup(
    normalized_embeddings, valid_dataset)
  similarity = tf.matmul(valid_embeddings, tf.transpose(normalized_embeddings))
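
The similarity op above is plain cosine similarity between L2-normalized embedding rows. A minimal NumPy sketch of the same computation, using a hypothetical embedding matrix emb and query row indices query_ids (illustrative only):


In [ ]:
# Cosine similarity between selected rows and all rows of an embedding matrix.
# `emb` is any (vocabulary_size, embedding_size) array; `query_ids` are row indices.
def cosine_similarity(emb, query_ids):
  norms = np.sqrt(np.sum(np.square(emb), axis=1, keepdims=True))
  normalized = emb / norms                       # unit-length rows
  return np.dot(normalized[query_ids], normalized.T)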

In [7]:
num_steps = 100001

with tf.Session(graph=graph) as session:
  tf.initialize_all_variables().run()
  print "Initialized"
  average_loss = 0
  for step in xrange(num_steps):
    batch_data, batch_labels = generate_batch(
      batch_size, num_skips, skip_window)
    feed_dict = {train_dataset : batch_data, train_labels : batch_labels}
    _, l = session.run([optimizer, loss], feed_dict=feed_dict)
    average_loss += l
    if step % 2000 == 0:
      if step > 0:
        average_loss = average_loss / 2000
      # The average loss is an estimate of the loss over the last 2000 batches.
      print "Average loss at step", step, ":", average_loss
      average_loss = 0
    # note that this is expensive (~20% slowdown if computed every 500 steps)
    if step % 10000 == 0:
      sim = similarity.eval()
      for i in xrange(valid_size):
        valid_word = reverse_dictionary[valid_examples[i]]
        top_k = 8 # number of nearest neighbors
        nearest = (-sim[i, :]).argsort()[1:top_k+1]
        log = "Nearest to %s:" % valid_word
        for k in xrange(top_k):
          close_word = reverse_dictionary[nearest[k]]
          log = "%s %s," % (log, close_word)
        print log
  final_embeddings = normalized_embeddings.eval()


Initialized
Average loss at step 0 : 7.74733114243
Nearest to six: hazmi, numeral, second, scandals, yoshi, destroy, yeltsin, oxus,
Nearest to often: extra, inspiring, legalistic, follows, percussive, precedes, professionally, snapshots,
Nearest to history: defenses, ethel, ahijah, lodger, engraved, correlations, breathes, exercise,
Nearest to was: embarrassed, marx, cura, workhorse, viral, adventists, homelessness, races,
Nearest to from: dictate, anthemius, tended, faintly, spool, marries, npc, cobalt,
Nearest to i: knoll, foreknowledge, serious, indulging, janusz, hale, akm, deliberations,
Nearest to however: meaningful, integrated, lam, indus, renowned, shale, hangul, kronecker,
Nearest to may: contradict, loaned, camped, curium, quagga, gillette, croatians, pdas,
Nearest to as: formal, rg, mppc, pipelined, localization, plucked, utah, unsurprisingly,
Nearest to called: echolocation, hyphens, canaan, kom, contemporary, ornate, subcategories, administered,
Nearest to all: cocktails, keillor, medals, qarase, doherty, exclaims, barbados, fuss,
Nearest to but: polisario, dossier, acquisitions, majuro, illegal, interaction, dolphins, mordred,
Nearest to th: proprietor, leinster, mitigate, pupil, perdition, suharto, drenched, chianti,
Nearest to system: moonlight, whitehouse, speedway, crimes, petro, roanoke, gagarin, beaten,
Nearest to there: godf, isabella, conjugal, kernow, mental, ogle, gamemaster, concurrent,
Nearest to four: machiavelli, legate, styne, lib, scrupulous, edith, graduate, cloudbusting,
Average loss at step 2000 : 4.36050001049
Average loss at step 4000 : 3.86183701131
Average loss at step 6000 : 3.7887783376
Average loss at step 8000 : 3.68366609967
Average loss at step 10000 : 3.61751648521
Nearest to six: eight, seven, three, four, five, nine, zero, two,
Nearest to often: also, precedes, extra, widely, inspiring, he, it, not,
Nearest to history: defenses, exercise, basilides, feast, lignite, agora, seafloor, fenian,
Nearest to was: is, were, has, had, cond, by, been, be,
Nearest to from: at, in, into, on, sihanouk, cloak, drummers, vegetal,
Nearest to i: aargau, inca, foreknowledge, graphing, earnestly, indulging, hale, recognized,
Nearest to however: hangul, kronecker, during, meaningful, moraine, watching, indus, nearly,
Nearest to may: would, can, could, camped, must, arthritis, quagga, powder,
Nearest to as: drew, negatives, murals, pacification, by, rp, microlensing, shootout,
Nearest to called: lactose, kom, ambiguously, rounders, judaic, seasoning, mad, nasser,
Nearest to all: preventative, inapplicable, qarase, seraphim, varanus, clarendon, medals, solves,
Nearest to but: trick, nearly, loves, and, irenaeus, airport, wynette, however,
Nearest to th: four, fausto, fridays, leinster, sorted, macedon, perdition, anthology,
Nearest to system: moonlight, crimes, petro, speedway, whitehouse, ditto, dolls, roanoke,
Nearest to there: they, it, murr, not, she, isopropanol, often, claws,
Nearest to four: six, three, seven, eight, five, two, nine, zero,
Average loss at step 12000 : 3.60496291411
Average loss at step 14000 : 3.57449301976
Average loss at step 16000 : 3.41143671155
Average loss at step 18000 : 3.45902734238
Average loss at step 20000 : 3.53521625662
Nearest to six: eight, seven, nine, four, five, three, two, zero,
Nearest to often: also, widely, extra, who, there, precedes, he, sometimes,
Nearest to history: defenses, seafloor, joshua, agora, basilides, mjf, rationalize, glutinous,
Nearest to was: is, had, were, has, became, be, are, been,
Nearest to from: at, into, expressiveness, in, vitrification, muriel, between, through,
Nearest to i: ii, we, inca, aargau, teutoburg, katsura, iii, earnestly,
Nearest to however: but, hangul, during, kronecker, dymaxion, watching, internet, and,
Nearest to may: can, would, could, will, must, might, should, parallelism,
Nearest to as: cabral, oops, pacification, narrating, emmett, by, dlp, rp,
Nearest to called: lactose, mad, ambiguously, imaging, nasser, seasoning, championships, guerillas,
Nearest to all: many, some, several, inapplicable, solves, those, headaches, tegmark,
Nearest to but: however, and, is, which, that, or, would, are,
Nearest to th: fausto, acetylcholine, fridays, shipping, mm, multics, macedon, sorted,
Nearest to system: ditto, crimes, quo, petro, misidentification, moonlight, whitehouse, artificial,
Nearest to there: it, they, he, which, often, button, she, greaves,
Nearest to four: three, six, seven, eight, five, two, nine, one,
Average loss at step 22000 : 3.50573488361
Average loss at step 24000 : 3.49012978911
Average loss at step 26000 : 3.48640610874
Average loss at step 28000 : 3.47965462893
Average loss at step 30000 : 3.50235926777
Nearest to six: four, eight, seven, five, three, nine, two, zero,
Nearest to often: widely, sometimes, also, precedes, there, generally, still, now,
Nearest to history: seafloor, agora, defenses, bleeding, rationalize, glutinous, fenian, joshua,
Nearest to was: is, had, were, has, became, been, when, be,
Nearest to from: into, through, in, during, on, oswaldo, across, under,
Nearest to i: we, ii, aargau, they, iii, hadassah, teutoburg, inca,
Nearest to however: but, and, which, during, hangul, that, kronecker, when,
Nearest to may: can, would, could, will, must, might, should, although,
Nearest to as: pacification, emmett, celeste, by, elastic, reynard, emphasizes, exhibiting,
Nearest to called: lactose, imaging, mad, ambiguously, eurasian, seasoning, kalmar, murrow,
Nearest to all: some, these, many, several, clarendon, atlanteans, those, dich,
Nearest to but: however, and, though, when, that, which, or, while,
Nearest to th: bc, mm, fausto, multics, shipping, acetylcholine, fridays, eight,
Nearest to system: systems, archer, ditto, quo, whitehouse, separates, kenning, dictating,
Nearest to there: they, it, he, often, she, this, still, button,
Nearest to four: five, six, three, eight, seven, two, nine, one,
Average loss at step 32000 : 3.50074583721
Average loss at step 34000 : 3.49189666724
Average loss at step 36000 : 3.45647158682
Average loss at step 38000 : 3.29917961174
Average loss at step 40000 : 3.42598986757
Nearest to six: four, seven, eight, five, three, nine, two, one,
Nearest to often: widely, sometimes, also, generally, usually, frequently, commonly, still,
Nearest to history: agora, basilides, glutinous, seafloor, rix, papen, joshua, uncanny,
Nearest to was: is, had, became, were, has, been, did, being,
Nearest to from: through, into, of, in, during, gladio, sopranos, after,
Nearest to i: we, you, ii, they, t, aargau, inca, expunged,
Nearest to however: but, that, though, vicksburg, ak, although, it, statistic,
Nearest to may: can, would, could, will, must, might, should, although,
Nearest to as: by, better, hussars, pear, diplomatic, when, unresponsive, freely,
Nearest to called: lactose, imaging, considered, formosa, judaic, healthcare, tsim, mad,
Nearest to all: these, dich, causation, many, some, sustenance, atlanteans, each,
Nearest to but: however, and, although, it, are, he, see, that,
Nearest to th: bc, multics, fausto, cm, mm, acetylcholine, peak, beaten,
Nearest to system: systems, mannerheim, immunoglobulins, archer, surpassed, kenning, phobos, advent,
Nearest to there: they, it, he, often, now, she, also, who,
Nearest to four: six, five, eight, seven, three, two, nine, one,
Average loss at step 42000 : 3.43068783361
Average loss at step 44000 : 3.44958059001
Average loss at step 46000 : 3.45170159447
Average loss at step 48000 : 3.35055111641
Average loss at step 50000 : 3.3775735321
Nearest to six: eight, seven, four, three, five, nine, two, one,
Nearest to often: sometimes, widely, generally, also, frequently, usually, commonly, now,
Nearest to history: papen, rix, djinn, agora, uncanny, glutinous, mjf, breathes,
Nearest to was: is, has, were, became, had, seems, be, been,
Nearest to from: through, into, after, during, in, at, across, iib,
Nearest to i: we, you, ii, aargau, exclusivity, inca, iii, t,
Nearest to however: but, when, while, though, although, that, during, and,
Nearest to may: can, would, could, will, must, should, might, cannot,
Nearest to as: by, exhibiting, emmett, disobey, microlensing, zed, learners, became,
Nearest to called: lactose, mad, imaging, named, dalton, ore, healthcare, hague,
Nearest to all: each, some, many, dich, every, both, those, any,
Nearest to but: however, although, and, when, while, though, where, until,
Nearest to th: bc, multics, five, mm, mitigate, st, fausto, cm,
Nearest to system: systems, archer, kenning, immunoglobulins, zooming, pogrom, michelson, palliative,
Nearest to there: they, it, he, she, who, now, often, we,
Nearest to four: six, three, seven, eight, five, nine, two, zero,
Average loss at step 52000 : 3.44253816271
Average loss at step 54000 : 3.42357444042
Average loss at step 56000 : 3.43862233436
Average loss at step 58000 : 3.3963695901
Average loss at step 60000 : 3.3921213775
Nearest to six: four, eight, five, seven, nine, three, two, zero,
Nearest to often: sometimes, generally, widely, frequently, usually, also, commonly, now,
Nearest to history: journal, development, papen, milner, rix, djinn, list, fundamental,
Nearest to was: is, had, became, were, has, seems, been, be,
Nearest to from: into, through, during, in, sopranos, after, across, syncopated,
Nearest to i: we, you, ii, t, they, inca, aargau, g,
Nearest to however: but, although, though, that, when, which, while, during,
Nearest to may: can, would, could, will, should, must, might, cannot,
Nearest to as: exhibiting, cogent, dissolves, macros, zed, authentic, spaghetti, cabral,
Nearest to called: used, lactose, considered, mad, healthcare, dominant, minestrone, tsim,
Nearest to all: many, some, each, those, these, both, any, various,
Nearest to but: however, and, although, which, though, while, see, when,
Nearest to th: bc, multics, four, cm, five, ad, nd, rd,
Nearest to system: systems, network, group, zooming, immunoglobulins, mannerheim, astrological, gallant,
Nearest to there: they, it, now, he, she, this, we, still,
Nearest to four: six, five, seven, three, eight, nine, two, zero,
Average loss at step 62000 : 3.23762752998
Average loss at step 64000 : 3.25394991755
Average loss at step 66000 : 3.40291905105
Average loss at step 68000 : 3.39065775722
Average loss at step 70000 : 3.35721546137
Nearest to six: eight, seven, four, nine, five, three, two, zero,
Nearest to often: sometimes, frequently, usually, commonly, generally, now, widely, also,
Nearest to history: rix, milner, list, journal, connexion, papen, djinn, uncanny,
Nearest to was: is, has, became, had, were, when, been, be,
Nearest to from: through, into, across, between, during, in, within, iib,
Nearest to i: we, you, ii, g, exclusivity, carrey, they, hadassah,
Nearest to however: but, although, though, while, that, when, where, which,
Nearest to may: can, would, could, will, must, should, might, cannot,
Nearest to as: by, unresponsive, when, like, is, before, dlp, thorough,
Nearest to called: lactose, mad, considered, minestrone, used, ghulam, kept, hague,
Nearest to all: many, some, various, any, both, every, these, each,
Nearest to but: however, and, although, though, which, while, that, or,
Nearest to th: bc, multics, ad, rd, bce, cm, st, ce,
Nearest to system: systems, mannerheim, zooming, group, immunoglobulins, admits, kenning, network,
Nearest to there: they, it, now, he, she, we, still, this,
Nearest to four: six, seven, eight, five, three, nine, two, zero,
Average loss at step 72000 : 3.37139070928
Average loss at step 74000 : 3.35036020774
Average loss at step 76000 : 3.31517776144
Average loss at step 78000 : 3.35513249722
Average loss at step 80000 : 3.3776724962
Nearest to six: five, eight, seven, four, nine, three, two, zero,
Nearest to often: sometimes, frequently, usually, commonly, generally, widely, typically, now,
Nearest to history: rix, survey, connexion, milner, list, journal, regardless, djinn,
Nearest to was: is, were, became, had, has, been, be, being,
Nearest to from: through, into, across, around, during, iib, within, by,
Nearest to i: you, we, ii, g, iii, t, lee, iv,
Nearest to however: but, although, that, while, though, when, and, currently,
Nearest to may: can, could, would, will, must, should, might, cannot,
Nearest to as: when, unresponsive, interfering, cabral, warship, before, fourths, dirac,
Nearest to called: considered, lactose, used, mad, formosa, minestrone, known, hus,
Nearest to all: both, every, each, various, many, dich, some, any,
Nearest to but: however, although, though, and, while, they, which, or,
Nearest to th: bc, multics, rd, ad, st, bce, five, nd,
Nearest to system: systems, mannerheim, kampaku, admits, immunoglobulins, pogrom, epicenter, astrological,
Nearest to there: they, it, he, she, still, often, we, currently,
Nearest to four: six, five, eight, three, seven, nine, two, zero,
Average loss at step 82000 : 3.40933198977
Average loss at step 84000 : 3.41235718811
Average loss at step 86000 : 3.38808333994
Average loss at step 88000 : 3.35123853177
Average loss at step 90000 : 3.36252585894
Nearest to six: eight, five, seven, four, nine, three, two, zero,
Nearest to often: sometimes, usually, frequently, commonly, generally, typically, widely, now,
Nearest to history: rix, djinn, regardless, list, workgroups, tradition, consisting, journal,
Nearest to was: is, became, had, were, has, be, been, seems,
Nearest to from: through, across, into, during, iib, without, auras, under,
Nearest to i: we, you, g, ii, iv, t, iii, exclusivity,
Nearest to however: but, although, though, that, while, which, cases, pigmented,
Nearest to may: can, could, would, will, should, must, might, cannot,
Nearest to as: deanna, disobey, resold, myanmar, nieuwe, emmett, tert, pacification,
Nearest to called: considered, used, lactose, formosa, named, mad, known, kept,
Nearest to all: many, both, several, every, some, each, various, these,
Nearest to but: however, although, and, though, while, they, until, which,
Nearest to th: bc, multics, rd, ad, fifth, one, bce, nd,
Nearest to system: systems, program, model, mannerheim, tice, line, tiamat, conditions,
Nearest to there: it, they, he, still, we, she, now, currently,
Nearest to four: five, six, seven, eight, three, two, nine, zero,
Average loss at step 92000 : 3.39632817602
Average loss at step 94000 : 3.24911082464
Average loss at step 96000 : 3.35330440253
Average loss at step 98000 : 3.24368149039
Average loss at step 100000 : 3.36106535047
Nearest to six: four, seven, five, eight, nine, three, two, zero,
Nearest to often: sometimes, usually, frequently, commonly, generally, typically, widely, now,
Nearest to history: rix, philosophical, medieval, workgroups, consisting, survey, list, argyle,
Nearest to was: is, became, had, were, has, been, seems, appears,
Nearest to from: through, across, into, during, in, perez, iib, typhoon,
Nearest to i: we, you, ii, t, exclusivity, g, they, iii,
Nearest to however: but, although, that, though, especially, and, alderman, which,
Nearest to may: can, could, would, will, should, must, might, cannot,
Nearest to as: like, deanna, cabral, elastic, herold, hussars, perturbed, ibu,
Nearest to called: named, lactose, used, considered, ghulam, mad, formosa, known,
Nearest to all: every, various, these, several, both, many, some, any,
Nearest to but: however, although, and, though, while, noticed, did, that,
Nearest to th: bc, rd, nd, multics, fifth, st, bce, ad,
Nearest to system: systems, model, program, computing, samara, network, mannerheim, project,
Nearest to there: they, we, he, it, now, still, she, sometimes,
Nearest to four: six, seven, five, eight, three, nine, two, zero,

In [8]:
num_points = 400

tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
two_d_embeddings = tsne.fit_transform(final_embeddings[1:num_points+1, :])


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-8-f70c03698d67> in <module>()
      1 num_points = 400
      2 
----> 3 tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
      4 two_d_embeddings = tsne.fit_transform(final_embeddings[1:num_points+1, :])

NameError: name 'TSNE' is not defined

In [ ]:
def plot(embeddings, labels):
  assert embeddings.shape[0] >= len(labels), 'More labels than embeddings'
  pylab.figure(figsize=(15,15))  # in inches
  for i, label in enumerate(labels):
    x, y = embeddings[i,:]
    pylab.scatter(x, y)
    pylab.annotate(label, xy=(x, y), xytext=(5, 2), textcoords='offset points',
                   ha='right', va='bottom')
  pylab.show()

words = [reverse_dictionary[i] for i in xrange(1, num_points+1)]
plot(two_d_embeddings, words)

Problem

An alternative to skip-gram is another Word2Vec model called CBOW (Continuous Bag of Words). In the CBOW model, instead of predicting a context word from the word vector of the center word, you predict the center word from the sum of all the word vectors in its context. Implement and evaluate a CBOW model trained on the text8 dataset.
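
A minimal sketch of one possible starting point, assuming the data, dictionaries, and hyperparameters defined above (an outline, not a reference solution): generate all context words for each target, then sum their embeddings before the sampled-softmax loss. The function name generate_cbow_batch is illustrative.


In [ ]:
def generate_cbow_batch(batch_size, skip_window):
  """For each target word, return the full window of context word ids."""
  global data_index
  span = 2 * skip_window + 1  # [ skip_window target skip_window ]
  batch = np.ndarray(shape=(batch_size, span - 1), dtype=np.int32)
  labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
  buffer = collections.deque(maxlen=span)
  for _ in range(span):
    buffer.append(data[data_index])
    data_index = (data_index + 1) % len(data)
  for i in range(batch_size):
    # All words in the window except the center word form the context.
    batch[i, :] = [buffer[j] for j in range(span) if j != skip_window]
    labels[i, 0] = buffer[skip_window]
    buffer.append(data[data_index])
    data_index = (data_index + 1) % len(data)
  return batch, labels

# In the graph, the main change is how `embed` is formed: look up all context
# embeddings and sum (or average) them before the sampled softmax loss, e.g.
#   train_dataset = tf.placeholder(tf.int32, shape=[batch_size, 2 * skip_window])
#   embed = tf.reduce_sum(tf.nn.embedding_lookup(embeddings, train_dataset), 1)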