Exercises from http://www.nltk.org/book_1ed/ch01.html

Author : Nirmal kumar Ravi

Try using the Python interpreter as a calculator, and typing expressions like 12 / (4 + 1).


In [1]:
12/(4+1)


Out[1]:
2
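Out[1] shows 2 because this notebook runs Python 2, where `/` between two integers performs floor division. The sketch below assumes Python 3 instead, where `/` always performs true division and `//` is the explicit floor-division operator. Under Python 3 the same expression gives 2.4:

```python
# Python 3 semantics: / is true division, // is floor division.
true_quotient = 12 / (4 + 1)    # 2.4
floor_quotient = 12 // (4 + 1)  # 2, the value Python 2's / returns for integers

print(true_quotient)
print(floor_quotient)
```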

Given an alphabet of 26 letters, there are 26 to the power 10, or 26 ** 10, ten-letter strings we can form. That works out to 141167095653376L (the L at the end just indicates that this is Python's long-number format). How many hundred-letter strings are possible?


In [2]:
26 ** 100


Out[2]:
3142930641582938830174357788501626427282669988762475256374173175398995908420104023465432599069702289330964075081611719197835869803511992549376L
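In general, an alphabet of k letters yields k ** n strings of length n. The following small sketch uses `count_strings`, a hypothetical helper. It reproduces the ten-letter figure quoted in the exercise and checks the size of the hundred-letter result, using the fact that a positive integer x has floor(log10(x)) + 1 decimal digits:

```python
import math


def count_strings(alphabet_size, length):
    """Number of distinct strings of `length` letters over an alphabet (hypothetical helper)."""
    return alphabet_size ** length


ten = count_strings(26, 10)        # 141167095653376, as quoted in the exercise
hundred = count_strings(26, 100)

digits = len(str(hundred))         # count the decimal digits directly
# Cross-check with logarithms: floor(100 * log10(26)) + 1
digits_from_log = int(math.floor(100 * math.log10(26))) + 1

print(digits)           # 142
print(digits_from_log)  # 142
```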

The Python multiplication operation can be applied to lists. What happens when you type ['Monty', 'Python'] * 20, or 3 * sent1? Answer: multiplying a list by an integer N returns a new list in which the original items are repeated N times, in order.


In [3]:
['Monty', 'Python'] * 20


Out[3]:
['Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python',
 'Monty',
 'Python']

In [4]:
sent1 = ['Call', 'me', 'Ishmael', '.']
3 * sent1


Out[4]:
['Call',
 'me',
 'Ishmael',
 '.',
 'Call',
 'me',
 'Ishmael',
 '.',
 'Call',
 'me',
 'Ishmael',
 '.']
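Multiplying a list by an integer N gives the same result as concatenating N copies of it, and the original list is left unchanged. The short sketch below checks those properties on the same two lists:

```python
sent1 = ['Call', 'me', 'Ishmael', '.']
monty = ['Monty', 'Python']

repeated = 3 * sent1
assert repeated == sent1 + sent1 + sent1  # same as concatenating three copies
assert sent1 * 3 == repeated              # operand order does not matter
assert len(sent1) == 4                    # the original list is not modified

print(len(repeated))    # 12: four tokens repeated three times
print(len(monty * 20))  # 40: two tokens repeated twenty times
print(0 * sent1)        # []: zero (or a negative N) gives an empty list
```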

Review 1.1 on computing with language. How many words are there in text2? How many distinct words are there?


In [3]:
import nltk
nltk.download()  # opens the interactive NLTK Downloader; the "book" collection holds the data used below


showing info http://www.nltk.org/nltk_data/
Out[3]:
True

In [5]:
from nltk.book import *


*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908

In [6]:
print 'There are %d distinct words in text2' % len(set(text2))


There are 6833 distinct words in text2
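The exercise also asks for the total number of words, which `len(text2)` answers. `len` counts every token, repeats included, while `len(set(text2))` counts each distinct token once. Running against text2 requires the NLTK book data, so this sketch uses a short hypothetical token list in its place:

```python
# Hypothetical token list standing in for text2.
tokens = "the cat sat on the mat . the dog sat too .".split()

n_tokens = len(tokens)      # total tokens, repeats included
n_types = len(set(tokens))  # distinct tokens, each counted once

print('%d tokens, %d distinct' % (n_tokens, n_types))  # 12 tokens, 8 distinct
```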

Find the collocations in text5.


In [7]:
text5.collocations()


wanna chat; PART JOIN; MODE #14-19teens; JOIN PART; PART PART;
cute.-ass MP3; MP3 player; JOIN JOIN; times .. .; ACTION watches; guys
wanna; song lasts; last night; ACTION sits; -...)...- S.M.R.; Lime
Player; Player 12%; dont know; lez gurls; long time
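A collocation is a sequence of words that occurs together unusually often, such as "MP3 player" above. `Text.collocations()` applies NLTK's own filtering and association measure, which this sketch does not reproduce. As a simplified illustration of the idea, the hypothetical function `pmi_bigrams` ranks adjacent word pairs by pointwise mutual information (PMI), log2(P(w1, w2) / (P(w1) * P(w2))). It ignores pairs seen fewer than `min_count` times:

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2, top_n=5):
    """Rank adjacent word pairs by PMI (hypothetical, simplified collocation finder)."""
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    n_unigrams = float(len(tokens))
    n_bigrams = float(len(tokens) - 1)

    scores = {}
    for (w1, w2), count in bigram_counts.items():
        if count < min_count:
            continue  # pairs seen only once give unreliable, inflated scores
        p_pair = count / n_bigrams
        p_w1 = unigram_counts[w1] / n_unigrams
        p_w2 = unigram_counts[w2] / n_unigrams
        scores[(w1, w2)] = math.log(p_pair / (p_w1 * p_w2), 2)

    # Highest PMI first.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


tokens = "i got a new mp3 player . my mp3 player is new . i like it".split()
print(pmi_bigrams(tokens))  # [('mp3', 'player')]
```

Frequency filtering matters here: with `min_count=1`, every pair in this tiny sample that occurs once would compete on equal footing with the genuinely repeated pair.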

Consider the following Python expression: len(set(text4)). State the purpose of this expression. Describe the two steps involved in performing this computation.

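It computes the number of distinct tokens (words and punctuation) in text4, that is, the size of its vocabulary. The first step, set(text4), collects the tokens into a set, which discards duplicates. The second step, len(), counts the items in that set.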


In [8]:
len(set(text4))


Out[8]:
9754
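The two steps can be made explicit on a small hypothetical token list. Because set() compares tokens exactly, the count includes punctuation and treats "People" and "people" as different words. The last two lines show one possible normalization, lowercasing and keeping only alphabetic tokens. This normalization is an assumption beyond what the exercise asks for:

```python
# Hypothetical tokens standing in for text4.
tokens = ['We', 'the', 'people', ',', 'the', 'People', '.']

# Step 1: set() keeps one copy of each distinct token.
vocabulary = set(tokens)
# Step 2: len() counts the members of that set.
size = len(vocabulary)
print(size)  # 6

# Optional normalization: lowercase and drop non-alphabetic tokens.
words_only = set(w.lower() for w in tokens if w.isalpha())
print(len(words_only))  # 3
```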
