In [ ]:
# ----------------------------------- spaCy
# https://spacy.io/
# ----------------------------------- +
import spacy
from spacy import displacy

In [ ]:
nlp = spacy.load('en_core_web_sm')
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")


In [ ]:
# ------------------------------------------------------------------- + Natural Language Processing
# ----- Segmentation >> Tokenization
# ----- Tagging      >> Parsing       >> Named Entity Recognition (NER)
# ------------------------------------------------------------------- +
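These stages map onto spaCy's processing pipeline; a quick way to inspect them (component names vary by model and version, so the printed list is illustrative):

In [ ]:
# components applied, in order, by nlp()
print(nlp.pipe_names)

# sentence segmentation (provided by the dependency parser in this model)
for sent in doc.sents:
    print(sent.text)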


In [ ]:
for token in doc:
    # lefts/rights are generators, so materialize them for printing
    print(token.text, token.pos_, token.tag_, token.dep_, token.head.text,
          list(token.lefts), list(token.rights))

In [ ]:
# serve() starts a local web server and blocks the cell until stopped
displacy.serve(doc, style='dep')

In [ ]:
displacy.serve(doc, style='ent')   # same server, highlighting named entities

In [ ]:
# ------------------------------------- Inline Rendering (instead of serving)
displacy.render(doc, style='dep', jupyter=True, options={'distance': 100})
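render() works for entities too, and the recognized spans are available programmatically on doc.ents:

In [ ]:
displacy.render(doc, style='ent', jupyter=True)
for ent in doc.ents:
    print(ent.text, ent.label_, spacy.explain(ent.label_))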

1) spaCy architecture

2) spaCy models for Tagging >> Parsing (dependency graph) >> NER

3) Word vectors, a.k.a. embeddings

4) Use cases

5) BYOM (bring your own model)

CNNs, recurrent neural networks (LSTM, GRU, BiLSTM, ...), recursive neural networks, memory networks, attention models, ...
A simplified NLP technique using a neural network:
  • each vector corresponds to a word
  • input layer = word vector or an updated word vector
  • weights represent connections to be learnt or updated

  • e.g. classify a movie as good or bad, based on its review:
    • review 1 = the movie was excellent
    • review 2 = the movie was far from excellent
    • review 3 = the movie was far from excellent in its technical aspect but I really enjoyed it and I highly recommend it

  • Q: how do you train?
     * run an NN classifier: it takes into account the entire sentence - black magic at work :)
     * i.e. input layer, hidden layer, output layer, with forward and backward propagation
     * tools (sketches for the first and last follow this list)
          - word2vec via the Gensim Python library ::: key parameters: sentences, size of the embedding vector, window (i.e. neighbors), workers
          - skip-gram and CBOW models: using hierarchical softmax or negative sampling
          - Keras: build a CNN pipeline
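
A minimal gensim sketch of the word2vec parameters listed above (parameter names follow gensim 4.x; sg, hs, and negative switch between skip-gram/CBOW and hierarchical softmax/negative sampling):

In [ ]:
from gensim.models import Word2Vec

# toy corpus built from the reviews above; min_count=1 because it is tiny
sentences = [
    "the movie was excellent".split(),
    "the movie was far from excellent".split(),
]
w2v = Word2Vec(
    sentences,
    vector_size=100,    # size of the embedding vector
    window=5,           # neighbors considered as context
    workers=4,
    sg=1,               # 1 = skip-gram, 0 = CBOW
    hs=0, negative=5,   # negative sampling (hs=1 would use hierarchical softmax)
    min_count=1,
)
print(w2v.wv['movie'].shape)
print(w2v.wv.most_similar('excellent', topn=3))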
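
And a minimal Keras sketch of the CNN pipeline (shape only: vocab_size, maxlen, and embed_dim are placeholder hyperparameters, and real use needs tokenized, padded integer sequences as input):

In [ ]:
from tensorflow.keras import layers, models

vocab_size, maxlen, embed_dim = 5000, 50, 100   # placeholder hyperparameters

cnn = models.Sequential([
    layers.Input(shape=(maxlen,)),              # padded sequences of word ids
    layers.Embedding(vocab_size, embed_dim),    # input layer = word vectors
    layers.Conv1D(128, 5, activation='relu'),   # n-gram style feature detectors
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation='sigmoid'),      # good (1) vs bad (0) review
])
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
cnn.summary()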
    
Reusing word embeddings
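
One way to do this is gensim's downloader API; a sketch, assuming network access ('glove-wiki-gigaword-100' is one of the bundled pretrained sets). spaCy's en_core_web_md/lg models ship comparable pretrained vectors, exposed via token.vector.

In [ ]:
import gensim.downloader as api

# load pretrained GloVe vectors instead of training from scratch
glove = api.load('glove-wiki-gigaword-100')
print(glove['movie'].shape)
print(glove.most_similar('movie', topn=3))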

In [ ]:
# > You shall know a word by the company it keeps - J.R. Firth
