word2vec

Slides: bit.ly/word2vec_talk


Agenda

0) Welcome!
1) What is word2vec?
2) How does word2vec work?
3) What can you do with word2vec?
4) Demo(?)


hi, brian. @BrianSpiering

Data Science Faculty @GalvanizeU

(Slides adapted from my Natural Language Processing (NLP) course)


Pop Quiz

Do computers prefer numbers or words?

Numbers

word2vec is a family of algorithms that map words (strings) to numbers (lists of floats).


In [20]:
input_text = "The quick brown fox"

In [21]:
# Each word in the input maps to a short list of floats (2 dimensions here, for illustration).
output = {'fox': [-0.00449447, -0.00310097]}

print(output['fox'])


[-0.00449447, -0.00310097]

Why is word2vec so popular?

  1. Places words in a "cloud" (a vector space) organized by semantic meaning.

  2. Converts text into a numerical form that machine learning algorithms and Deep Learning Neural Nets can then use as input.


“You shall know a word by the company it keeps”

- J. R. Firth 1957

Distributional Hypothesis: Words that occur in the same contexts tend to have similar meanings

Example:

... government debt problems are turning into banking crises...

... European governments need unified banking regulation to replace the hodgepodge of debt regulations...

The words government, regulation, and debt probably represent some aspect of banking, since they frequently appear nearby.
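A minimal sketch of that intuition: count which words fall within a small window of "banking" (the toy sentences and window size are illustrative, not a real corpus):

from collections import Counter

# Toy corpus: two sentences, tokenized by whitespace.
sentences = [
    "government debt problems are turning into banking crises".split(),
    "european governments need unified banking regulation to replace debt regulations".split(),
]

window = 3  # look 3 words to the left and right of "banking"
context_counts = Counter()
for tokens in sentences:
    for i, token in enumerate(tokens):
        if token == "banking":
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            context_counts.update(context)

print(context_counts.most_common())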

How does word2vec model the Distributional Hypothesis?

word2vec is a very simple (shallow, two-layer) neural network:

[diagram: word2vec network architecture]

Skip-gram architecture

Given the current word, predict the context (surrounding words).
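In practice you rarely build the network by hand; a library such as gensim can train a skip-gram model directly. A minimal sketch (assumes gensim 4.x; the toy corpus and parameter values are illustrative):

from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens.
sentences = [
    "the quick brown fox jumps over the lazy dog".split(),
    "insurgents killed in ongoing fighting".split(),
]

# sg=1 selects the skip-gram architecture; window controls how many
# surrounding words count as context; vector_size is the embedding length.
model = Word2Vec(sentences, sg=1, vector_size=50, window=2, min_count=1)

print(model.wv["fox"])  # the learned vector (array of 50 floats) for "fox"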


Skip-gram example

“Insurgents killed in ongoing fighting”


In [22]:
bigrams = ["insurgents killed", "killed in", "in ongoing", "ongoing fighting"]

skip_2_bigrams = ["insurgents killed", "insurgents in", "insurgents ongoing", "killed in", 
                  "killed ongoing", "killed fighting", "in ongoing", "in fighting", "ongoing fighting"]

Now that we have word vectors, what can we do?

Math with words!

Types of Word Math

  1. Distance
  2. Arithmetic

1. Distance


Words that are related will be closer together than unrelated words, so relationships between words can be encoded as distances in the vector space.
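A minimal sketch of that idea using cosine similarity (the three-dimensional vectors are made up for illustration; real word2vec vectors have hundreds of dimensions):

import numpy as np

def cosine_similarity(a, b):
    # Close to 1.0 means the vectors point in the same direction (related words);
    # close to 0 means roughly unrelated.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy vectors, not from a trained model.
sweden = np.array([0.9, 0.1, 0.0])
norway = np.array([0.8, 0.2, 0.1])
banana = np.array([0.0, 0.9, 0.4])

print(cosine_similarity(sweden, norway))  # high: related
print(cosine_similarity(sweden, banana))  # low: unrelated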

Words closest to “Sweden”


2. Arithmetic: Word analogies

The "Hello, world!" of word2vec:

Man is to woman as king is to queen

$\text{queen} = \arg\max_{w}\left[\cos(w, \text{king}) - \cos(w, \text{man}) + \cos(w, \text{woman})\right]$
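With trained vectors, the analogy can be queried directly. A sketch using gensim's pretrained downloader (word2vec-google-news-300 is one of gensim's pretrained bundles and is a large download; any set of trained vectors, such as the model above, would do):

import gensim.downloader as api

# Load pretrained word2vec vectors trained on Google News (downloads on first use).
vectors = api.load("word2vec-google-news-300")

# king - man + woman should land near queen.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))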

Verb Tense

everything2vec

emoji2vec


Summary

  • word2vec creates a dense vector representation of each word that models semantic meaning based on context
  • Then you can do math with words
  • Sets you up for machine learning and Deep Learning

Where do I go from here?