0) Welcome!
1) What is word2vec?
2) How does word2vec work?
3) What can you do with word2vec?
4) Demo(?)
Slides: bit.ly/word2vec_talk
Hi, I'm Brian. @BrianSpiering
Data Science Faculty @GalvanizeU
(adapted from my Natural Language Processing (NLP) course)
Numbers
word2vec is a family of algorithms that map words (strings) to vectors of numbers (lists of floats).
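In [20]:
# Each word (key) maps to a list of floats (value)
output = {'fox': [-0.00449447, -0.00310097]}
In [21]:
input_text = "The quick brown fox"
# Look up the vector for one word from the input text
print(output['fox'])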
“You shall know a word by the company it keeps”
- J. R. Firth 1957
Distributional Hypothesis: Words that occur in the same contexts tend to have similar meanings
Example:
... government debt problems are turning into banking crises...
... European governments need unified banking regulation to replace the hodgepodge of debt regulations...
The words government, regulation, and debt probably represent some aspect of banking, since they frequently appear nearby.
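The idea of "the company a word keeps" can be made concrete by counting which words fall within a fixed window around a target word. The following is a minimal sketch, not part of word2vec itself: `context_counts` is a hypothetical helper, the two snippets are lowercased and lightly normalized copies of the example text, and the window size of 8 is an arbitrary choice that happens to cover both snippets.

```python
from collections import Counter

# Lowercased, lightly normalized copies of the two example snippets (an assumption of this sketch)
sentences = [
    "government debt problems are turning into banking crises",
    "european governments need unified banking regulation to replace the hodgepodge of debt regulations",
]


def context_counts(sentences, target, window):
    """Count the words that appear within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            # Words to the left and to the right of the target, excluding the target itself
            context = tokens[start:i] + tokens[i + 1 : i + 1 + window]
            counts.update(context)
    return counts


banking_context = context_counts(sentences, "banking", window=8)
print(banking_context.most_common(5))
```

Words that show up repeatedly in `banking_context`, such as "debt", are the kind of contextual evidence the distributional hypothesis relies on.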
In [22]:
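# Source phrase: "insurgents killed in ongoing fighting"
# Bigrams pair each word with its immediate successor.
bigrams = ["insurgents killed", "killed in", "in ongoing", "ongoing fighting"]
# Skip-2 bigrams also pair words separated by up to two intervening words.
skip_2_bigrams = ["insurgents killed", "insurgents in", "insurgents ongoing", "killed in",
                  "killed ongoing", "killed fighting", "in ongoing", "in fighting", "ongoing fighting"]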
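Rather than typing the pairs by hand, both lists can be generated from the phrase. This is a minimal sketch under the assumption that "skip-k" means at most k words may be skipped between the two members of a pair; `skip_bigrams` is a hypothetical helper name, not a library function.

```python
def skip_bigrams(tokens, k):
    """Return ordered word pairs with at most k words skipped between them."""
    pairs = []
    for i, left in enumerate(tokens):
        # The partner word may sit 1 to k + 1 positions to the right of `left`
        for right in tokens[i + 1 : i + k + 2]:
            pairs.append(f"{left} {right}")
    return pairs


tokens = "insurgents killed in ongoing fighting".split()
generated_bigrams = skip_bigrams(tokens, k=0)         # plain bigrams: nothing skipped
generated_skip_2_bigrams = skip_bigrams(tokens, k=2)  # up to two words skipped
print(generated_bigrams)
print(generated_skip_2_bigrams)
```

With `k=0` the helper reproduces the `bigrams` list, and with `k=2` it reproduces the nine entries of `skip_2_bigrams` in the same order.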
Math with words!