In [ ]:
# ----------------------------------- spaCy
# https://spacy.io/
# ----------------------------------- +
import spacy
from spacy import displacy

In [ ]:
nlp = spacy.load('en_core_web_sm')
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")


In [ ]:
# ------------------------------------------------------------------- + Natural Language Processing
# ----- Segmentation >> Tokenization
# ----- Tagging      >> Parsing       >> Named Entity Recognition (NER)
# ------------------------------------------------------------------- +
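These stages map onto spaCy's processing pipeline; a quick way to inspect them (component names vary by model and version, so the printed list is illustrative):

In [ ]:
# components applied, in order, by nlp()
print(nlp.pipe_names)

# sentence segmentation (provided by the dependency parser in this model)
for sent in doc.sents:
    print(sent.text)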


In [ ]:
for token in doc:
    # lefts/rights are generators, so materialize them for printing
    print(token.text, token.pos_, token.tag_, token.dep_, token.head.text,
          list(token.lefts), list(token.rights))

In [ ]:
# serve() starts a local web server and blocks the cell until stopped
displacy.serve(doc, style='dep')

In [ ]:
displacy.serve(doc, style='ent')   # same server, highlighting named entities

In [ ]:
# ------------------------------------- Inline Rendering (instead of serving)
displacy.render(doc, style='dep', jupyter=True, options={'distance': 100})
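render() works for entities too, and the recognized spans are available programmatically on doc.ents:

In [ ]:
displacy.render(doc, style='ent', jupyter=True)
for ent in doc.ents:
    print(ent.text, ent.label_, spacy.explain(ent.label_))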

1) spaCy architecture

2) spaCy models for Tagging >> Parsing (dependency graph) >> NER

3) Word vectors, a.k.a. embeddings

4) Use cases

5) BYOM (bring your own model)

CNNs, recurrent neural networks (LSTM, GRU, BiLSTM, ...), recursive neural networks, memory networks, attention models, ...
A simplified NLP technique using a neural network:
  • each vector corresponds to a word
  • input layer = word vector or an updated word vector
  • weights represent connections to be learnt or updated

  • e.g. classify a movie as good or bad, based on its review:
    • review 1 = the movie was excellent
    • review 2 = the movie was far from excellent
    • review 3 = the movie was far from excellent in its technical aspect but I really enjoyed it and I highly recommend it

  • Q: how do you train?
     * run an NN classifier: it takes into account the entire sentence - black magic at work :)
     * i.e. input layer, hidden layer, output layer, with forward and backward propagation
     * tools (sketches for the first and last follow this list)
          - word2vec via the Gensim Python library ::: key parameters: sentences, size of the embedding vector, window (i.e. neighbors), workers
          - skip-gram and CBOW models: using hierarchical softmax or negative sampling
          - Keras: build a CNN pipeline
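
A minimal gensim sketch of the word2vec parameters listed above (parameter names follow gensim 4.x; sg, hs, and negative switch between skip-gram/CBOW and hierarchical softmax/negative sampling):

In [ ]:
from gensim.models import Word2Vec

# toy corpus built from the reviews above; min_count=1 because it is tiny
sentences = [
    "the movie was excellent".split(),
    "the movie was far from excellent".split(),
]
w2v = Word2Vec(
    sentences,
    vector_size=100,    # size of the embedding vector
    window=5,           # neighbors considered as context
    workers=4,
    sg=1,               # 1 = skip-gram, 0 = CBOW
    hs=0, negative=5,   # negative sampling (hs=1 would use hierarchical softmax)
    min_count=1,
)
print(w2v.wv['movie'].shape)
print(w2v.wv.most_similar('excellent', topn=3))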
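
And a minimal Keras sketch of the CNN pipeline (shape only: vocab_size, maxlen, and embed_dim are placeholder hyperparameters, and real use needs tokenized, padded integer sequences as input):

In [ ]:
from tensorflow.keras import layers, models

vocab_size, maxlen, embed_dim = 5000, 50, 100   # placeholder hyperparameters

cnn = models.Sequential([
    layers.Input(shape=(maxlen,)),              # padded sequences of word ids
    layers.Embedding(vocab_size, embed_dim),    # input layer = word vectors
    layers.Conv1D(128, 5, activation='relu'),   # n-gram style feature detectors
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation='sigmoid'),      # good (1) vs bad (0) review
])
cnn.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
cnn.summary()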
    
Reusing word embeddings
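
One way to do this is gensim's downloader API; a sketch, assuming network access ('glove-wiki-gigaword-100' is one of the bundled pretrained sets). spaCy's en_core_web_md/lg models ship comparable pretrained vectors, exposed via token.vector.

In [ ]:
import gensim.downloader as api

# load pretrained GloVe vectors instead of training from scratch
glove = api.load('glove-wiki-gigaword-100')
print(glove['movie'].shape)
print(glove.most_similar('movie', topn=3))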

In [ ]:
# > You shall know a word by the company it keeps - J.R. Firth
