spaCy

spaCy is an open-source software library for advanced natural language processing (NLP), written in Python and Cython. It is somewhat more geared toward enterprise use cases, and its out-of-the-box part-of-speech (POS) tagger and named entity recognition (NER) components are very popular.


In [4]:
sentence = """European authorities fined Google a record $5.1 billion on Wednesday for 
abusing its power in the mobile phone market and ordered the company to alter its practices"""

In [7]:
# Install spaCy:
#   $ pip install spacy
# Download the en_core_web_sm model:
#   $ python -m spacy download en_core_web_sm

import spacy
from spacy import displacy
from collections import Counter
import en_core_web_sm

# Load the small English pipeline
nlp = en_core_web_sm.load()

# Run the pipeline and list each named entity with its label
doc = nlp(sentence)
[(X.text, X.label_) for X in doc.ents]


Out[7]:
[('European', 'NORP'),
 ('Google', 'ORG'),
 ('$5.1 billion', 'MONEY'),
 ('Wednesday', 'DATE')]
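
The cell above imports `Counter` but does not use it. One plausible use is tallying how many entities of each label were found. The sketch below is only an illustration: it works on the `(text, label)` pairs copied from Out[7] instead of calling spaCy, and the names `ents` and `label_counts` are assumptions, not part of the original notebook.

```python
from collections import Counter

# (text, label) pairs copied from Out[7]; with a loaded pipeline these
# would come from [(X.text, X.label_) for X in doc.ents]
ents = [
    ("European", "NORP"),
    ("Google", "ORG"),
    ("$5.1 billion", "MONEY"),
    ("Wednesday", "DATE"),
]

# Count how often each entity label occurs
label_counts = Counter(label for _, label in ents)
print(label_counts.most_common())
```

On a longer text, the same tally gives a quick view of which entity types dominate.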

Display the dependency graph


In [8]:
displacy.render(nlp(sentence), style='dep', jupyter=True, options={'distance': 120})


[displaCy output: dependency parse of the sentence. Each token is labeled with its part-of-speech tag (e.g. European ADJ, authorities NOUN, fined VERB, Google PROPN), and the arcs are labeled with dependency relations (e.g. amod, nsubj, dobj, prep, pobj).]

In [9]:
# Keep tokens that are neither stop words nor punctuation,
# as (text, POS tag, lemma) triples
[(x.orth_, x.pos_, x.lemma_)
     for x in nlp(sentence)
     if not x.is_stop and x.pos_ != 'PUNCT']


Out[9]:
[('European', 'ADJ', 'european'),
 ('authorities', 'NOUN', 'authority'),
 ('fined', 'VERB', 'fine'),
 ('Google', 'PROPN', 'Google'),
 ('record', 'NOUN', 'record'),
 ('$', 'SYM', '$'),
 ('5.1', 'NUM', '5.1'),
 ('billion', 'NUM', 'billion'),
 ('Wednesday', 'PROPN', 'Wednesday'),
 ('\n', 'SPACE', '\n'),
 ('abusing', 'VERB', 'abuse'),
 ('power', 'NOUN', 'power'),
 ('mobile', 'ADJ', 'mobile'),
 ('phone', 'NOUN', 'phone'),
 ('market', 'NOUN', 'market'),
 ('ordered', 'VERB', 'order'),
 ('company', 'NOUN', 'company'),
 ('alter', 'VERB', 'alter'),
 ('practices', 'NOUN', 'practice')]
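
The triples in Out[9] can be regrouped for later steps, for example by collecting lemmas under each POS tag. The following is a minimal sketch under stated assumptions: it uses a handful of triples copied from Out[9] in place of a live spaCy call, and the names `tokens` and `lemmas_by_pos` are hypothetical.

```python
from collections import defaultdict

# A few (text, POS tag, lemma) triples copied from Out[9]
tokens = [
    ("authorities", "NOUN", "authority"),
    ("fined", "VERB", "fine"),
    ("abusing", "VERB", "abuse"),
    ("market", "NOUN", "market"),
    ("practices", "NOUN", "practice"),
]

# Group lemmas by part-of-speech tag, preserving token order
lemmas_by_pos = defaultdict(list)
for _text, pos, lemma in tokens:
    lemmas_by_pos[pos].append(lemma)

print(dict(lemmas_by_pos))
```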

Now let's see how to create a text classifier using NLTK and scikit-learn.