Being Divisive

This is just a little "gotcha" to watch out for in Python 2.7. Can you predict what the following code will output before you run it?


In [1]:
from __future__ import division

In [2]:
nums = [2,4,6]
[2/n for n in nums]


Out[2]:
[1.0, 0.5, 0.3333333333333333]

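That is the output with the import in cell In [1]. Without it, Python 2.7 gives [1, 0, 0] instead. Why does that happen?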

??? (fill in the answer here)

There are ways to fix this:

  • 2.0/n
  • 2/float(n)
  • from __future__ import division  # use Python 3.x division semantics
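
To see both behaviors side by side, here is a small sketch. It uses the // operator to reproduce what Python 2.7's / does with two integers; that stand-in is an assumption for illustration, chosen so the example behaves the same whether or not the __future__ import is active. The first two fixes from the list follow it.

```python
nums = [2, 4, 6]

# Floor division: what Python 2.7's `/` does when both operands are ints.
floored = [2 // n for n in nums]

# Fix 1: make the numerator a float literal.
with_float_literal = [2.0 / n for n in nums]

# Fix 2: convert the denominator to a float.
with_float_cast = [2 / float(n) for n in nums]

print(floored)
print(with_float_literal)
print(with_float_cast)
```

Either fix works because one float operand is enough to make Python do true division.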

Let Me Enumerate The Ways

Say your ischool project partner gave you a list of stuff to do. It's in priority order already, but you want each item to be numbered in order: you know, first do ANLP reading, then do ANLP homework, then do ANLP corpus selection, and oh yeah, maybe then do something for 202 and TUI. So you start with a list like

todo = ['anlp_reading', 'anlp_homework', 'anlp_corpus', '202_reading', 'tui_homework', 'tui_project']

and you want to turn it into

[(0, 'anlp_reading'), (1, 'anlp_homework'), (2, 'anlp_corpus'), (3, '202_reading'), (4, 'tui_homework'), (5, 'tui_project')]

Below, write code for a standard way to do this, either with a for loop or a list comprehension.


In [4]:
todo = ['anlp_reading', 'anlp_homework', 'anlp_corpus', '202_reading', 'tui_homework', 
        'tui_project']

i = 0
todo2 = []
for v in todo:
    todo2.append((i, v))
    i+=1
    
todo2


Out[4]:
[(0, 'anlp_reading'),
 (1, 'anlp_homework'),
 (2, 'anlp_corpus'),
 (3, '202_reading'),
 (4, 'tui_homework'),
 (5, 'tui_project')]

Now here is the handy little way to do this with less code: enumerate! This produces an iterator object, so to see its output all at once, wrap a list() around it, e.g.,

list(enumerate(todo))


In [5]:
list(enumerate(todo))


Out[5]:
[(0, 'anlp_reading'),
 (1, 'anlp_homework'),
 (2, 'anlp_corpus'),
 (3, '202_reading'),
 (4, 'tui_homework'),
 (5, 'tui_project')]
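
In practice you will often use enumerate directly in a for loop rather than wrapping it in list(). The sketch below unpacks each (index, item) pair in the loop header and passes a start value of 1 so the numbering reads like a to-do list. The output format and the variable names position, task, and numbered are just illustrative choices.

```python
todo = ['anlp_reading', 'anlp_homework', 'anlp_corpus',
        '202_reading', 'tui_homework', 'tui_project']

# Unpack each (index, item) pair directly in the loop header.
# The second argument makes the numbering start at 1 instead of 0.
numbered = []
for position, task in enumerate(todo, 1):
    numbered.append("%d. %s" % (position, task))

print("\n".join(numbered))
```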

Word (parts) Are Cheap

In English we can determine a lot of information about word forms by looking at the endings of the words. Python makes this very easy to do. For example, words that end in "ing" are often gerunds or else present participles. (The gerund has the same function as a noun but looks like a verb. The present participle is used to form the progressive tenses, as in "is walking".) Below, the code loads the text files from NLTK as described in Chapter 1.

Choose one of the texts and write one line of code that pulls out all words that end in 'ing' from that text file. (Hint: there is a special string command that does just what you want.)


In [6]:
import nltk
from nltk.book import *


*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908

gerunds = [w for w in text3 if w.endswith("ing")]


In [13]:
print gerunds


['beginning', 'evening', 'morning', 'evening', 'morning', 'gathering', 'bring', 'yielding', 'yielding', 'yielding', 'yielding', 'evening', 'morning', 'evening', 'morning', 'bring', 'moving', 'living', 'saying', 'evening', 'morning', 'bring', 'living', 'creeping', 'thing', 'thing', 'creeping', 'thing', 'living', 'thing', 'bearing', 'yielding', 'thing', 'thing', 'evening', 'morning', 'living', 'saying', 'living', 'knowing', 'walking', 'bring', 'saying', 'bring', 'living', 'flaming', 'offering', 'offering', 'finding', 'wounding', 'saying', 'concerning', 'creeping', 'thing', 'bring', 'thing', 'living', 'thing', 'bring', 'creeping', 'thing', 'according', 'living', 'according', 'thing', 'creeping', 'thing', 'creeping', 'thing', 'living', 'creeping', 'living', 'thing', 'evening', 'covering', 'saying', 'Bring', 'living', 'thing', 'creeping', 'thing', 'creeping', 'thing', 'thing', 'living', 'moving', 'thing', 'bring', 'saying', 'living', 'living', 'bring', 'living', 'everlasting', 'living', 'beginning', 'dwelling', 'nothing', 'having', 'going', 'concerning', 'beginning', 'king', 'king', 'king', 'king', 'king', 'king', 'king', 'king', 'king', 'king', 'king', 'king', 'king', 'king', 'king', 'king', 'king', 'king', 'king', 'king', 'king', 'king', 'king', 'thing', 'saying', 'exceeding', 'seeing', 'saying', 'going', 'smoking', 'burning', 'saying', 'saying', 'exceeding', 'everlasting', 'everlasting', 'everlasting', 'everlasting', 'talking', 'according', 'saying', 'being', 'saying', 'thing', 'according', 'saying', 'bring', 'thing', 'Seeing', 'bring', 'according', 'communing', 'seeing', 'bring', 'bring', 'nothing', 'bring', 'morning', 'saying', 'being', 'saving', 'concerning', 'thing', 'anything', 'morning', 'king', 'sinning', 'morning', 'thing', 'covering', 'being', 'mocking', 'thing', 'morning', 'putting', 'saying', 'according', 'thing', 'everlasting', 'offering', 'morning', 'offering', 'offering', 'offering', 'thing', 'seeing', 'offering', 'thing', 'blessing', 'multiplying', 
'saying', 'saying', 'saying', 'saying', 'saying', 'saying', 'saying', 'willing', 'bring', 'bring', 'saying', 'willing', 'bring', 'concerning', 'evening', 'speaking', 'giving', 'drinking', 'wondering', 'drinking', 'earring', 'being', 'earring', 'saying', 'saying', 'speaking', 'earring', 'thing', 'bowing', 'morning', 'seeing', 'coming', 'according', 'according', 'cunning', 'dwelling', 'king', 'king', 'sporting', 'saying', 'springing', 'saying', 'seeing', 'nothing', 'morning', 'concerning', 'bring', 'bring', 'saying', 'saying', 'Bring', 'according', 'bring', 'bring', 'blessing', 'according', 'Bring', 'blessing', 'hunting', 'exceeding', 'blessing', 'blessing', 'blessing', 'blessing', 'dwelling', 'blessing', 'mourning', 'touching', 'purposing', 'blessing', 'saying', 'seeing', 'ascending', 'descending', 'bring', 'morning', 'saying', 'lying', 'evening', 'morning', 'bearing', 'bearing', 'evening', 'thing', 'removing', 'according', 'watering', 'saying', 'saying', 'getting', 'doing', 'saying', 'morning', 'saying', 'saying', 'saying', 'saying', 'saying', 'breaking', 'blessing', 'according', 'saying', 'lying', 'thing', 'saying', 'according', 'thing', 'thing', 'saying', 'being', 'offering', 'departing', 'being', 'king', 'according', 'according', 'being', 'feeding', 'binding', 'saying', 'bring', 'wandering', 'saying', 'bearing', 'going', 'mourning', 'thing', 'saying', 'saying', 'saying', 'Bring', 'saying', 'saying', 'blessing', 'thing', 'saying', 'saying', 'according', 'saying', 'saying', 'king', 'thing', 'king', 'king', 'according', 'king', 'morning', 'saying', 'bring', 'nothing', 'morning', 'saying', 'according', 'according', 'saying', 'beginning', 'thing', 'following', 'thing', 'bring', 'thing', 'according', 'ring', 'king', 'numbering', 'according', 'saying', 'bring', 'concerning', 'saying', 'saying', 'saying', 'saying', 'bring', 'saying', 'bring', 'bring', 'bring', 'saying', 'saying', 'saying', 'according', 'Bring', 'bring', 'Bring', 'according', 'according', 'saying', 
'according', 'morning', 'doing', 'according', 'according', 'saying', 'Bring', 'bring', 'seeing', 'bring', 'saying', 'bring', 'earing', 'bring', 'saying', 'bring', 'according', 'saying', 'bring', 'saying', 'according', 'everlasting', 'Bring', 'guiding', 'saying', 'saying', 'bring', 'beginning', 'gathering', 'Binding', 'couching', 'everlasting', 'morning', 'according', 'blessing', 'commanding', 'mourning', 'saying', 'saying', 'saying', 'according', 'mourning', 'mourning', 'mourning', 'according', 'saying', 'saying', 'bring', 'bring', 'saying', 'being']

Look at the results and try to see what you can tell, without context, about those words -- are they nouns, gerunds, present participles, something else?
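
A long flat list with many repeats is hard to eyeball, so one option is to count the matches. The sketch below is only an illustration: it uses a short hypothetical word list called sample in place of an NLTK text, lowercases before matching so 'Bring' and 'bring' are counted together, and also shows that endswith accepts a tuple when you want to test several endings at once.

```python
from collections import Counter

# Hypothetical sample; in the notebook you would use a text such as text3.
sample = ['beginning', 'God', 'Bring', 'king', 'thing', 'walked',
          'saying', 'bring', 'saying', 'morning']

# Count each distinct "ing" word, ignoring case.
ing_counts = Counter(w.lower() for w in sample if w.lower().endswith('ing'))

# endswith() also accepts a tuple of suffixes to check several endings.
ing_or_ed = [w for w in sample if w.endswith(('ing', 'ed'))]

print(ing_counts)
print(ing_or_ed)
```

Counting makes it easier to spot words like 'king' and 'thing', which match the suffix but are plain nouns.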

The Words Get In the Way

In the homework we looked a bit at how to get rid of some of the noise from a list of frequent words. Some standard approaches are to:

  • remove punctuation
  • lowercase words
  • stem words, or otherwise normalize them (e.g., convert to their infinitives or singular forms)
  • remove very common words (stopwords)

These are all fine things to do. An additional idea is to compare the common words from one collection to those of another and see how they differ. Those that differ but are still very common are probably quite interesting and signify something special about that collection, especially after some simple normalization steps.

To try this out, do the following steps:

  1. Create frequency distributions fd1 and fd2 from the contents of two different NLTK texts text1 and text2 (you can choose which texts you want to work with).
  2. Compare the keys in the top 50 (or 100 or 200) most frequent keys for these two collections. So compare the keys from fd1 to those from fd2 and vice versa. You should see different words pop out in each comparison.
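
If you want to see the shape of the comparison before trying it on real texts, here is a minimal sketch under some loud assumptions: it uses collections.Counter as a stand-in for an NLTK frequency distribution, two tiny made-up token lists (tokens_a and tokens_b) instead of NLTK texts, and a hypothetical helper named top_words. It lowercases tokens and drops punctuation, then uses set difference to find words that are frequent in one collection but not the other.

```python
from collections import Counter

def top_words(tokens, k):
    """Return the set of the k most frequent lowercased alphabetic tokens."""
    counts = Counter(t.lower() for t in tokens if t.isalpha())
    return set(word for word, count in counts.most_common(k))

# Made-up token lists standing in for two different texts.
tokens_a = "The whale , and the sea , and the whale ship .".split()
tokens_b = "The sister , and the house , and the sister letter .".split()

top_a = top_words(tokens_a, 3)
top_b = top_words(tokens_b, 3)

# Words frequent in one collection but not in the other.
only_in_a = top_a - top_b
only_in_b = top_b - top_a

print(sorted(only_in_a))
print(sorted(only_in_b))
```

Notice that very common words shared by both collections, like "the" and "and", cancel out in the set difference without needing a stopword list.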

In [ ]: