In [1]:
!pip install spacy nltk
spaCy is an NLP and computational linguistics package built from the ground up. It is written in Cython, so it is fast.
Let's check it out. Here is some text from Alice in Wonderland, freely available on Project Gutenberg.
In [2]:
text = """'Please would you tell me,' said Alice, a little timidly, for she was not quite sure whether it was good manners for her to speak first, 'why your cat grins like that?'
'It's a Cheshire cat,' said the Duchess, 'and that's why. Pig!'
She said the last word with such sudden violence that Alice quite jumped; but she saw in another moment that it was addressed to the baby, and not to her, so she took courage, and went on again:—
'I didn't know that Cheshire cats always grinned; in fact, I didn't know that cats could grin.'
'They all can,' said the Duchess; 'and most of 'em do.'
'I don't know of any that do,' Alice said very politely, feeling quite pleased to have got into a conversation.
'You don't know much,' said the Duchess; 'and that's a fact.'"""
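Download and load the model. spaCy has an excellent English NLP processor. The cells below explore these features: sentence segmentation, part-of-speech tagging and lemmatization, named entity recognition, noun chunks, dependency parsing, and similarity scoring.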
In [3]:
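import spacy
import spacy.en.download
# Uncomment on the first run to download the English model data
# spacy.en.download.main()
# Build the English pipeline (this is the older spacy.en API)
processor = spacy.en.English()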
In [4]:
processed_text = processor(text)
processed_text
Out[4]:
It looks like the same text, but it is now a processed document. Let's dig a little deeper.
In [5]:
# Print each detected sentence with its index
n = 0
for sentence in processed_text.sents:
    print(n, sentence)
    n += 1
In [6]:
# Print every token with its sentence index, part-of-speech tag, and lemma
n = 0
for sentence in processed_text.sents:
    for token in sentence:
        print(n, token, token.pos_, token.lemma_)
    n += 1
In [7]:
# Print each named entity with its label
for entity in processed_text.ents:
    print(entity, entity.label_)
In [8]:
# Print each noun phrase found in the text
for noun_chunk in processed_text.noun_chunks:
    print(noun_chunk)
In [9]:
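def pr_tree(word, level):
    """Print the dependency subtree under `word`, indented by depth."""
    if word.is_punct:
        return
    # Left children first, then the word itself, then right children,
    # so the words come out in sentence order
    for child in word.lefts:
        pr_tree(child, level + 1)
    print('\t' * level + word.text + ' - ' + word.dep_)
    for child in word.rights:
        pr_tree(child, level + 1)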
In [10]:
# Print the dependency tree of each sentence, starting from its root
for sentence in processed_text.sents:
    pr_tree(sentence.root, 0)
    print('-------------------------------------------')
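To see what the traversal in `pr_tree` does without loading a model, here is a minimal sketch on a hand-built tree. The `Node` class, the `tree_lines` helper, and the toy sentence are assumptions made up for illustration. They only mimic the `text`, `dep_`, `lefts`, `rights`, and `is_punct` attributes used above and are not part of spaCy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a spaCy token (illustration only)."""
    text: str
    dep_: str
    lefts: List["Node"] = field(default_factory=list)
    rights: List["Node"] = field(default_factory=list)
    is_punct: bool = False


def tree_lines(word: Node, level: int = 0) -> List[str]:
    """Collect one line per word: left children, the word, then right children."""
    if word.is_punct:
        return []  # punctuation is skipped, as in pr_tree
    lines: List[str] = []
    for child in word.lefts:
        lines.extend(tree_lines(child, level + 1))
    lines.append("\t" * level + word.text + " - " + word.dep_)
    for child in word.rights:
        lines.extend(tree_lines(child, level + 1))
    return lines


# Hand-labelled toy parse of "Alice jumped."; the labels are assumed, not model output.
root = Node(
    "jumped", "ROOT",
    lefts=[Node("Alice", "nsubj")],
    rights=[Node(".", "punct", is_punct=True)],
)
toy_lines = tree_lines(root)
print("\n".join(toy_lines))
```

Returning the lines instead of printing them directly is just a design choice that makes the traversal easier to inspect. The visiting order is the same as in `pr_tree`.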
In [11]:
proc_fruits = processor('''I think green apples are delicious.
While pears have a strange texture to them.
The bowls they sit in are ugly.''')
apples, pears, bowls = proc_fruits.sents
fruit = processed_text.vocab['fruit']
print(apples.similarity(fruit))
print(pears.similarity(fruit))
print(bowls.similarity(fruit))
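According to the spaCy documentation, `similarity` by default estimates the cosine similarity between word vectors, using the average of the token vectors for a longer span. Below is a minimal sketch of that calculation in plain Python. The function name `cosine_similarity` and the three-dimensional vectors are made up for illustration and are not real spaCy vectors.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


# Toy vectors (assumed values, chosen so the result is easy to check by hand)
fruit_vec = [1.0, 0.0, 0.0]
apple_vec = [2.0, 0.0, 0.0]   # same direction as fruit_vec
bowl_vec = [0.0, 3.0, 0.0]    # orthogonal to fruit_vec

print(cosine_similarity(apple_vec, fruit_vec))
print(cosine_similarity(bowl_vec, fruit_vec))
```

Vectors pointing in the same direction score close to 1 regardless of length, while orthogonal vectors score 0. This is why a sentence about apples would be expected to score higher against "fruit" than one about bowls.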
Find your favorite news source and grab the article text.