Exercises from http://www.nltk.org/book_1ed/ch02.html

Author : Nirmal kumar Ravi

Create a variable phrase containing a list of words. Experiment with the operations described in this chapter, including addition, multiplication, indexing, slicing, and sorting.


In [10]:
wphrase = ['I','love','NLP']
print wphrase
wphrase = wphrase +  ['and','ML'] #addition
print wphrase
print wphrase*2 #multiply
print wphrase[2] #indexing
print wphrase[1:3] #slicing
print sorted(wphrase) #sorting


['I', 'love', 'NLP']
['I', 'love', 'NLP', 'and', 'ML']
['I', 'love', 'NLP', 'and', 'ML', 'I', 'love', 'NLP', 'and', 'ML']
NLP
['love', 'NLP']
['I', 'ML', 'NLP', 'and', 'love']
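
The sorted output puts 'I', 'ML' and 'NLP' before 'and' because Python compares strings by character code, and uppercase letters come before lowercase ones. A minimal sketch of a case-insensitive sort, assuming alphabetical order regardless of case is what is wanted (the key function is an addition, not part of the exercise):

```python
wphrase = ['I', 'love', 'NLP', 'and', 'ML']

# Default sort: uppercase letters have lower code points than lowercase ones.
default_order = sorted(wphrase)

# Case-insensitive sort: compare lowercased copies, keep the original strings.
alphabetical = sorted(wphrase, key=lambda w: w.lower())

print(default_order)
print(alphabetical)
```

The key function only affects the comparison, so the words keep their original capitalization in the result.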

Use the corpus module to explore austen-persuasion.txt. How many word tokens does this book have? How many word types?


In [14]:
from nltk.corpus import gutenberg
persuasion = gutenberg.words('austen-persuasion.txt')
print 'There are %d tokens and %d word types'%(len(persuasion),len(set(persuasion)))


There are 98171 tokens and 6132 word types
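
The counts above treat every item returned by the corpus reader as a token, so punctuation marks are included and 'The' and 'the' count as two different types. Whether that is the right definition depends on the question being asked. The sketch below, run on a made-up token list rather than the corpus, shows how the two numbers change when punctuation is dropped and case is folded; the function name token_type_counts and the normalize flag are hypothetical helpers, not part of NLTK.

```python
def token_type_counts(tokens, normalize=False):
    """Return (number of tokens, number of types) for a list of word strings.

    With normalize=True, tokens containing no letter or digit are dropped
    and the remaining tokens are lowercased before counting.
    """
    if normalize:
        tokens = [t.lower() for t in tokens if any(c.isalnum() for c in t)]
    return len(tokens), len(set(tokens))


# Made-up sample; not taken from the corpus.
sample = ['The', 'cat', 'saw', 'the', 'other', 'cat', '.']
raw_counts = token_type_counts(sample)
normalized_counts = token_type_counts(sample, normalize=True)
print(raw_counts)
print(normalized_counts)
```

The same function can be called on the corpus with token_type_counts(persuasion, normalize=True) to compare against the raw figures.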

Use the Brown corpus reader nltk.corpus.brown.words() or the Web text corpus reader nltk.corpus.webtext.words() to access some sample text in two different genres.


In [15]:
from nltk.corpus import brown
brown.categories()


Out[15]:
[u'adventure',
 u'belles_lettres',
 u'editorial',
 u'fiction',
 u'government',
 u'hobbies',
 u'humor',
 u'learned',
 u'lore',
 u'mystery',
 u'news',
 u'religion',
 u'reviews',
 u'romance',
 u'science_fiction']

In [17]:
print 'words in Romance and humor genres',
brown.words(categories=['romance','humor'])[:10]


words in Romance and humor genres
Out[17]:
[u'They',
 u'neither',
 u'liked',
 u'nor',
 u'disliked',
 u'the',
 u'Old',
 u'Man',
 u'.',
 u'To']

Read in the texts of the State of the Union addresses, using the state_union corpus reader. Count occurrences of men, women, and people in each document. What has happened to the usage of these words over time?


In [28]:
%matplotlib inline
import nltk
from nltk.corpus import state_union
cfd = nltk.ConditionalFreqDist(
    (target, fileid[:4])
    for fileid in state_union.fileids()
    for w in state_union.words(fileid)
    for target in ['men', 'women','people']
    if w.lower() == target)  # exact match: startswith() would also count words like 'mention' as 'men'


cfd.plot()
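
Whether a word is counted by prefix or by exact match changes the totals: a prefix test for 'men' also accepts words such as 'mention', while an exact test for 'people' misses 'peoples'. The sketch below compares the two tests on made-up (year, word) pairs standing in for the corpus, using collections.Counter instead of ConditionalFreqDist so that it runs without the NLTK data. The toy data and variable names are assumptions for illustration only.

```python
from collections import Counter

targets = ['men', 'women', 'people']

# Made-up (year, word) pairs standing in for the State of the Union corpus.
toy_words = [
    ('1945', 'Men'), ('1945', 'mention'), ('1945', 'people'),
    ('1990', 'women'), ('1990', 'peoples'), ('1990', 'men'),
]

prefix_counts = Counter()  # counts using startswith()
exact_counts = Counter()   # counts using ==
for year, word in toy_words:
    lowered = word.lower()
    for target in targets:
        if lowered.startswith(target):
            prefix_counts[(target, year)] += 1
        if lowered == target:
            exact_counts[(target, year)] += 1

print(prefix_counts[('men', '1945')], exact_counts[('men', '1945')])
```

Which test is appropriate depends on whether inflected forms should be included; matching against an explicit set of accepted forms is a middle ground.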


Investigate the holonym-meronym relations for some nouns. Remember that there are three kinds of holonym-meronym relation, so you need to use: member_meronyms(), part_meronyms(), substance_meronyms(), member_holonyms(), part_holonyms(), and substance_holonyms().


In [38]:
from nltk.corpus import wordnet as wn

print wn.synset('tree.n.01').member_meronyms()
print wn.synset('tree.n.01').part_meronyms()
print wn.synset('tree.n.01').substance_meronyms()
print wn.synset('tree.n.01').member_holonyms()
print wn.synset('tree.n.01').part_holonyms()
print wn.synset('tree.n.01').substance_holonyms()


[]
[Synset('burl.n.02'), Synset('crown.n.07'), Synset('limb.n.02'), Synset('stump.n.01'), Synset('trunk.n.01')]
[Synset('heartwood.n.01'), Synset('sapwood.n.01')]
[Synset('forest.n.01')]
[]
[]

In the discussion of comparative wordlists, we created an object called translate which you could look up using words in both German and Italian in order to get corresponding words in English. What problem might arise with this approach? Can you suggest a way to avoid this problem?

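  • If a word is not in the dictionary, the lookup raises a KeyError. Checking membership first, or using translate.get(word), avoids the error.
  • Because each word is translated in isolation (as a unigram), the translation can be wrong when the word's meaning depends on the sentence it is used in.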
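
A minimal sketch of a safer lookup, using a tiny hand-written word list in place of the Swadesh data. Keeping one dictionary per language is an assumption of this sketch, not something the chapter does; it is meant to keep a spelling shared by two languages from overwriting an earlier entry. The name lookup and the language codes 'de' and 'it' are hypothetical.

```python
# Hand-written toy entries; the real exercise builds these from the Swadesh lists.
translate = {
    'de': {'Hund': 'dog', 'Katze': 'cat'},
    'it': {'cane': 'dog', 'gatto': 'cat'},
}


def lookup(word, lang):
    """Return the English translation of word in lang, or None if unknown."""
    # dict.get returns None instead of raising KeyError for a missing key.
    return translate.get(lang, {}).get(word)


print(lookup('Hund', 'de'))
print(lookup('gatto', 'it'))
print(lookup('Haus', 'de'))
```

A caller can then test the result against None and decide how to handle an unknown word, instead of catching an exception.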

According to Strunk and White's Elements of Style, the word however, used at the start of a sentence, means "in whatever way" or "to whatever extent", and not "nevertheless". They give this example of correct usage: However you advise him, he will probably do as he thinks best. (http://www.bartleby.com/141/strunk3.html) Use the concordance tool to study actual usage of this word in the various texts we have been considering. See also the LanguageLog posting "Fossilized prejudices about 'however'" at http://itre.cis.upenn.edu/~myl/languagelog/archives/001913.html


In [40]:
nltk.Text(persuasion).concordance("however")


Displaying 25 of 89 matches:
onceited , silly father . She had , however , one very intimate friend , a sens
early custom . But these measures , however good in themselves , were insuffici
ellynch Hall was to be let . This , however , was a profound secret , not to be
t immediate neighbourhood , which , however , had not suited him ; that acciden
e dues of a tenant . It succeeded , however ; and though Sir Walter must ever l
h , the former curate of Monkford , however suspicious appearances may be , but
good character and appearance ; and however Lady Russell might have asked yet f
siness no evil . She was assisted , however , by that perfect indifference and 
h the others . Something occurred , however , to give her a different duty . Ma
 , but can never alter plain ones . However , at any rate , as I have a great d
l what is due to you as my sister . However , we may as well go and sit with th
o means of her going . She wished , however to see the Crofts , and was glad to
ithout any approach to coarseness , however , or any want of good humour . Anne
ll be in question . She could not , however , reach such a degree of certainty 
al to Anne ' s nerves . She found , however , that it was one to which she must
re gone , she hoped , to be happy , however oddly constructed such happiness mi
once more in the same room . Soon , however , she began to reason with herself 
 ! It would be but a new creation , however , and I never think much of your ne
rove of Uppercross ." Her husband , however , would not agree with her here ; f
re presently ." Captain Wentworth , however , came from his window , apparently
d Walter stir . In another moment , however , she found herself in the state of
 at once . After a short struggle , however , Charles Hayter seemed to quit the
rything being to be done together , however undesired and inconvenient . She tr
 , nobody answered her . Winthrop , however , or its environs -- for young men 
fore they were beyond her hearing , however , Louisa spoke again . " Mary is go
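
In the concordance, sentence-initial "However" followed by a comma reads as "nevertheless", while the Strunk and White example has no comma after it. The sketch below turns that observation into a rough heuristic over a token list: it finds each "However" at the start of a sentence and labels it by whether a comma follows. This is only an assumption-laden approximation (it ignores sentences opened by a quotation mark, for instance), and the function name classify_initial_however is hypothetical. The sample text combines the Strunk and White sentence with a shortened line modeled on the concordance output.

```python
def classify_initial_however(tokens):
    """Return a list of (index, label) pairs, one per sentence-initial 'However'."""
    sentence_enders = {'.', '!', '?'}
    results = []
    for i, tok in enumerate(tokens):
        if tok != 'However':
            continue
        # Sentence-initial: first token overall, or right after end punctuation.
        at_start = i == 0 or tokens[i - 1] in sentence_enders
        if not at_start:
            continue
        following = tokens[i + 1] if i + 1 < len(tokens) else ''
        # Heuristic only: a following comma suggests the 'nevertheless' sense.
        label = 'nevertheless' if following == ',' else 'in whatever way'
        results.append((i, label))
    return results


tokens = ('However you advise him , he will probably do as he thinks best . '
          'However , we may as well go .').split()
labels = classify_initial_however(tokens)
print(labels)
```

The same function could be applied to the persuasion token list to count how often each pattern occurs, with the results checked by hand against the concordance lines.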