Remember: Pair-Programming → take care of your neighbor

Exercise 1 — Temperature Converter

  • Write a generic temperature converter function (Kelvin, Fahrenheit, Celsius → Kelvin, Fahrenheit, Celsius)
  • Check its results against Google's temperature converter

In [1]:
def K_to_C(t):
    """Convert temperature in Kelvin to Celsius"""
    return t - 273.15

def F_to_C(t):
    """Convert temperature in Fahrenheit to Celsius"""
    return 5*(t-32.)/9.

def C_to_K(t):
    """Convert temperature in Celsius to Kelvin"""
    return t + 273.15

def C_to_F(t):
    """Convert temperature in Celsius to Fahrenheit"""
    return 32+(t*9/5.)

def tconverter(temp, output_unit):
    """Temperature converter from Kelvin, Fahrenheit, Celsius to Kelvin, Fahrenheit, Celsius.
       
    Input temperature `temp` is a string with unit, e.g. '10C' or '-13F' or '235K'
    Output unit `output_unit` is one of 'C', 'F', 'K'
    """
    units = ('C', 'F', 'K')
    conversion = {'KC': K_to_C,
                  'FC': F_to_C,
                  'CK': C_to_K,
                  'CF': C_to_F}
    
    # check that the input temperature has the right format
    if not isinstance(temp, str) or temp[-1] not in units:
        raise TypeError('Input temperature '+str(temp)+' has the wrong format!')
    # check that requested output unit is known
    if output_unit not in units:
        raise ValueError('Output unit '+str(output_unit)+' not known!')
    # set input unit
    input_unit = temp[-1]
    # set input temperature
    temp = float(temp[:-1])
    # convert input temperature to Celsius
    if input_unit != 'C':
        temp = conversion[input_unit+'C'](temp)
    # convert temperature in Celsius to output unit
    if output_unit != 'C':
        temp = conversion['C'+output_unit](temp)
    return temp

In [2]:
tconverter('10C', 'K')


Out[2]:
283.15

In [3]:
tconverter('-123F', 'C')


Out[3]:
-86.11111111111111

In [4]:
tconverter('148F', 'K')


Out[4]:
337.59444444444443

In [5]:
tconverter(10, 'F')


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-bd6667d7779a> in <module>()
----> 1 tconverter(10, 'F')

<ipython-input-1-112f7ba951ae> in tconverter(temp, output_unit)
     29     # check that the input temperature has the right format
     30     if not isinstance(temp, str) or temp[-1] not in units:
---> 31         raise TypeError('Input temperature '+str(temp)+' has the wrong format!')
     32     # check that requested output unit is known
     33     if output_unit not in units:

TypeError: Input temperature 10 has the wrong format!

In [6]:
tconverter('123', 'C')


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-b906fec85b3b> in <module>()
----> 1 tconverter('123', 'C')

<ipython-input-1-112f7ba951ae> in tconverter(temp, output_unit)
     29     # check that the input temperature has the right format
     30     if not isinstance(temp, str) or temp[-1] not in units:
---> 31         raise TypeError('Input temperature '+str(temp)+' has the wrong format!')
     32     # check that requested output unit is known
     33     if output_unit not in units:

TypeError: Input temperature 123 has the wrong format!

In [11]:
tconverter('12C', 'L')


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-6083c892c24f> in <module>()
----> 1 tconverter('12C', 'L')

<ipython-input-1-112f7ba951ae> in tconverter(temp, output_unit)
     32     # check that requested output unit is known
     33     if output_unit not in units:
---> 34         raise ValueError('Output unit '+str(output_unit)+' not known!')
     35     # set input unit
     36     input_unit = temp[-1]

ValueError: Output unit L not known!
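
The comparison with an external converter can also be automated with a few well-known reference points (water freezes at 0 °C = 32 °F = 273.15 K, and -40 °C equals -40 °F). The following is a minimal sketch, not part of the original solution: it repeats the formulas used above in compact form so that it runs on its own, and the names `to_celsius`, `from_celsius`, `convert`, `REFERENCE_POINTS` and `check_reference_points` are made up for this illustration.

```python
def to_celsius(value, unit):
    """Convert a temperature given in 'C', 'F' or 'K' to Celsius."""
    if unit == 'C':
        return value
    if unit == 'F':
        return 5 * (value - 32.) / 9.
    if unit == 'K':
        return value - 273.15
    raise ValueError('Unknown unit ' + str(unit))

def from_celsius(value, unit):
    """Convert a temperature in Celsius to 'C', 'F' or 'K'."""
    if unit == 'C':
        return value
    if unit == 'F':
        return 32 + (value * 9 / 5.)
    if unit == 'K':
        return value + 273.15
    raise ValueError('Unknown unit ' + str(unit))

def convert(value, from_unit, to_unit):
    """Convert via Celsius, as tconverter does."""
    return from_celsius(to_celsius(value, from_unit), to_unit)

# hypothetical test table: (value, from_unit, to_unit, expected)
REFERENCE_POINTS = [
    (0., 'C', 'F', 32.),
    (100., 'C', 'F', 212.),
    (-40., 'C', 'F', -40.),
    (0., 'C', 'K', 273.15),
    (0., 'K', 'C', -273.15),
    (32., 'F', 'K', 273.15),
]

def check_reference_points(tolerance=1e-9):
    """Return the reference points whose conversion is off by more than tolerance."""
    failures = []
    for value, from_unit, to_unit, expected in REFERENCE_POINTS:
        result = convert(value, from_unit, to_unit)
        # compare floats with a tolerance, never with ==
        if abs(result - expected) > tolerance:
            failures.append((value, from_unit, to_unit, expected, result))
    return failures
```

An empty list returned by `check_reference_points()` means all reference points agree. The comparison uses a tolerance because the conversions are floating-point operations.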

Exercise 2 — A Deception Experiment

  • Look for a text online that would be funny to deceive (max 1000 words) [we use here the beginning of Alice in Wonderland as an example: http://sabian.org/alice_in_wonderland1.php ]
  • Copy it into a text file
  • Create a dictionary of all words with their occurrences [think about what to do with case, commas, dots, colons, etc.]
  • Create a deception dictionary with at least 50 words [a deception dictionary maps a word to another word, e.g. {'house': 'boat'}]
  • Write one or several functions in order to accomplish the following tasks:
    • read two text files, one with an arbitrary text to be deceived and one with a deception map, where every line contains two words ["original word" "mapped word"]
    • create and save to a third text file a deceived version of the input text

In [13]:
# open the input text for reading (the lines are read later, when we iterate over the file)
text = open("text.txt", "r")

In [14]:
# show the first 20 lines of the file (this is a shell command, not Python)
!head -n 20 text.txt


Alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, `and what is the use of a book,' thought Alice `without pictures or conversation?'

So she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy-chain would be worth the trouble of getting up and picking the daisies, when suddenly a White Rabbit with pink eyes ran close by her.

There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, `Oh dear! Oh dear! I shall be late!' (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge. 

In [15]:
def standardize_word(word):
    """Standardize word.

    A standard word is lower cased and only contains alphabetic
    characters. Return two strings: (std_word, illegal_chars)
    """
    # string containing the illegal characters found in the word
    illegal = ''
    # make word lower-case
    word = word.lower()
    if not word.isalpha():
        # there are non alphabetic chars in the word
        for char in word:
            # collect non alphabetic chars
            if not char.isalpha():
                illegal += char
        # remove non alphabetic chars from the word
        word = word.translate(None, illegal)
    return word, illegal
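
Note that `word.translate(None, illegal)` is the Python 2 form of the call; in Python 3, `str.translate` accepts only a translation table. The following is a sketch of a variant that avoids `translate` altogether and should therefore behave the same on both versions. The name `standardize_word_portable` is made up for this illustration.

```python
def standardize_word_portable(word):
    """Return (std_word, illegal_chars), like standardize_word.

    std_word is lower-cased and holds only the alphabetic characters,
    illegal_chars holds everything else, in order of appearance.
    """
    word = word.lower()
    # keep the alphabetic characters ...
    legal = [char for char in word if char.isalpha()]
    # ... and collect all the others
    illegal = [char for char in word if not char.isalpha()]
    return ''.join(legal), ''.join(illegal)
```

For example, `standardize_word_portable("Waistcoat-pocket,")` returns `('waistcoatpocket', '-,')`.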

In [16]:
# initialize the word frequency map
wfreq = {}
# set of illegal (i.e. non-alphabetic) characters found in the text
# start with an empty set
illegal = set()
# word counter
word_count = 0
# iterate over the lines in the input file
for line in text:
    # split the line into words 
    # at this point, words may still contain illegal characters and have mixed case
    words = line.split()
    # iterate over words in the current line
    for word in words:
        # standardize words and get back the illegal characters
        stdword, illegal_chars = standardize_word(word)
        # update set of illegal chars
        illegal.update(illegal_chars)
        # we have got a new word
        word_count += 1
        # if the word is already in the dictionary update the counter,
        # otherwise initialize the counter
        wfreq[stdword] = wfreq.setdefault(stdword, 0) + 1
        #if stdword in wfreq:
        #    wfreq[stdword] += 1
        #else:
        #    # the word was not in the dictionary already: initialize the counter
        #    wfreq[stdword] = 1
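
The standard library also offers `collections.Counter`, a dictionary subclass made for this kind of counting. The following is a sketch of the same loop on a small in-memory sample instead of the file; `count_words` and `sample` are names made up for this illustration.

```python
from collections import Counter

def count_words(lines):
    """Count standardized words in an iterable of text lines."""
    counts = Counter()
    for line in lines:
        for word in line.split():
            # lower-case the word and drop non-alphabetic characters
            stdword = ''.join(char for char in word.lower() if char.isalpha())
            # missing keys count as 0, so no setdefault is needed
            counts[stdword] += 1
    return counts

sample = ["Oh dear! Oh dear!", "I shall be late!"]
counts = count_words(sample)
# most_common(n) returns the n most frequent (word, count) pairs
top = counts.most_common(2)
```

Here `counts['oh']` and `counts['dear']` are both 2, and `top` holds those two pairs. `most_common` also covers the "sort by value" step without an explicit call to `sorted`.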

In [17]:
print "Total number of words in the text:", word_count


Total number of words in the text: 253

In [18]:
print "Number of different words:", len(wfreq)


Number of different words: 136

In [19]:
print "Illegal characters in text:", list(illegal)


Illegal characters in text: ['!', '`', "'", ')', '(', '-', ',', '.', ';', ':', '?']

In [20]:
# we now want to see which words occur most often:
# sort the dictionary items by value (this is a so-called Python idiom)
import operator
wfreq_sorted = sorted(wfreq.items(), key=operator.itemgetter(1), reverse=True)

In [21]:
# print the 50 most frequent words
print wfreq_sorted[:50]


[('the', 13), ('it', 11), ('to', 9), ('of', 8), ('her', 8), ('and', 8), ('a', 8), ('she', 7), ('was', 5), ('very', 4), ('alice', 4), ('or', 4), ('rabbit', 4), ('in', 4), ('had', 3), ('out', 3), ('that', 3), ('but', 3), ('with', 3), ('at', 3), ('when', 3), ('so', 3), ('watch', 2), ('waistcoatpocket', 2), ('dear', 2), ('for', 2), ('pictures', 2), ('across', 2), ('be', 2), ('by', 2), ('on', 2), ('sister', 2), ('oh', 2), ('ran', 2), ('mind', 2), ('as', 2), ('book', 2), ('nothing', 2), ('thought', 2), ('time', 2), ('all', 1), ('remarkable', 1), ('hedge', 1), ('over', 1), ('sleepy', 1), ('its', 1), ('seemed', 1), ('just', 1), ('actually', 1), ('late', 1)]

In [22]:
def load_deception_map(filename, on_error="fail"):
    """Load a deception map from file.
    
    Expected format is two words per line. If illegal characters
    are detected, the behaviour depends on on_error:
    if on_error is "fail" throw a ValueError, if on_error is "ignore"
    ignore illegal chars.
    """
    # initialize the deception map
    dmap = {}
    # open deception map file
    fh = open(filename, 'r')
    # iterate over lines in the file, numbering them from 1 for the error messages
    for count, line in enumerate(fh, 1):
        # get the words on the current line
        words = line.split()
        if len(words) == 0:
            # this is an empty line, we can skip to the next line
            continue
        elif len(words) != 2:
            # we are expecting exactly two words per line
            # this must be an invalid line: throw an error!
            raise ValueError('Too many/few words on line '+str(count))
        # Standardize words
        key, illegal1 = standardize_word(words[0])
        value, illegal2 = standardize_word(words[1])
        if on_error == 'fail' and len(illegal1+illegal2) != 0:
            # we are asked to fail if there are illegal characters in the current line
            raise ValueError('Illegal chars on line '+str(count)+':'+illegal1+illegal2)
        # add the word to the deception map
        dmap[key] = value
        # no checking is done to ensure that the word is not already in the map,
        # i.e. the map may be inconsistent or redundant or non-invertible
    return dmap

In [24]:
# load deception map from file
deception = load_deception_map('deception.dict')

In [25]:
deception


Out[25]:
{'alice': 'john',
 'be': 'have',
 'her': 'his',
 'i': 'you',
 'is': 'was',
 'rabbit': 'tiger',
 'say': 'read',
 'she': 'he',
 'think': 'wait',
 'this': 'that',
 'well': 'bad',
 'you': 'i'}
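
The contents of `deception.dict` are not shown in this notebook. Judging from the map above, the file holds one "original mapped" pair per line. The following sketch writes a hypothetical excerpt of such a file to a temporary location and parses it back; `SAMPLE` and `parse_pairs` are made up for this illustration, and `parse_pairs` is a simplified stand-in for `load_deception_map` that does not check for illegal characters.

```python
import os
import tempfile

# hypothetical file contents: two words per line, empty lines allowed
SAMPLE = """alice john
rabbit tiger

she he
her his
"""

def parse_pairs(lines):
    """Build a dict from lines that hold exactly two words each."""
    mapping = {}
    for number, line in enumerate(lines, 1):
        words = line.split()
        if not words:
            continue  # skip empty lines
        if len(words) != 2:
            raise ValueError('Expected two words on line ' + str(number))
        mapping[words[0].lower()] = words[1].lower()
    return mapping

# write the sample to a temporary file and read it back
handle, path = tempfile.mkstemp(suffix='.dict')
os.close(handle)
try:
    with open(path, 'w') as fh:
        fh.write(SAMPLE)
    with open(path, 'r') as fh:
        sample_map = parse_pairs(fh)
finally:
    # clean up the temporary file in any case
    os.remove(path)
```

After this runs, `sample_map` maps `'alice'` to `'john'`, `'rabbit'` to `'tiger'`, `'she'` to `'he'` and `'her'` to `'his'`.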

In [26]:
def deceive_text(filename, deception_map):
    """Return a version of text in filename deceived by deception_map"""
    text = open(filename, 'r')
    # initialize a list of output lines
    output = []
    # iterate over the lines of the input text
    for line in text:
        # get the words on the current line
        words = line.split()
        # initialize the output line
        newline = []
        # iterate over the words on the current line
        for word in words:
            # standardize word
            stdword, illegal_chars = standardize_word(word)
            if stdword in deception_map:
                # this word needs to be replaced, but we want to
                # maintain the illegal chars (case is lost in translation)
                # 1. make the word lower case
                newword = word.lower()
                # 2. replace within the original word+illegal_chars
                newword = newword.replace(stdword, deception_map[stdword])
                # 3. append the resulting word+illegal_chars in the current output line list of words
                newline.append(newword)
            else:
                # this word does not need to be replaced, just put the word as-is in the output line list
                newline.append(word)
        # the output line we want to have consists of all words joined with a white space
        newline = ' '.join(newline)
        # append the current output line to the list of output lines  
        output.append(newline)
    
    # the output text consists of all output lines joined with newline characters
    output = '\n'.join(output)
    return output
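
As noted in the comments, `deceive_text` loses the case of replaced words. Also, because the replacement searches for the standardized word inside the lower-cased original, a mapped word with punctuation inside it (e.g. a hyphenated one) would be left unchanged. One possible refinement, which is not part of the original solution, is to substitute with a regular expression and restore a leading capital. The sketch below assumes ASCII text, and `deceive_line` is a name made up for this illustration.

```python
import re

def deceive_line(line, deception_map):
    """Replace alphabetic words found in deception_map, keeping a leading capital."""
    def substitute(match):
        word = match.group(0)
        replacement = deception_map.get(word.lower())
        if replacement is None:
            # the word is not in the map: keep it as-is
            return word
        if word[0].isupper():
            # the original word started with a capital letter: keep that
            replacement = replacement.capitalize()
        return replacement
    # [A-Za-z]+ matches runs of ASCII letters; punctuation and
    # white space between the matches are left untouched
    return re.sub(r'[A-Za-z]+', substitute, line)
```

With the map `{'alice': 'john', 'rabbit': 'tiger'}`, the line `"Alice saw the Rabbit."` becomes `"John saw the Tiger."`. Unlike `standardize_word`, this variant treats the two halves of a hyphenated word as separate words, so `rabbit-hole` would become `tiger-hole`.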

In [27]:
# get the deceived version of our text
newalice = deceive_text('text.txt', deception)

In [28]:
# show the first 20 lines
print '\n'.join(newalice.split('\n')[:20])


john was beginning to get very tired of sitting by his sister on the bank, and of having nothing to do: once or twice he had peeped into the book his sister was reading, but it had no pictures or conversations in it, `and what was the use of a book,' thought john `without pictures or conversation?'

So he was considering in his own mind (as bad as he could, for the hot day made his feel very sleepy and stupid), whether the pleasure of making a daisy-chain would have worth the trouble of getting up and picking the daisies, when suddenly a White tiger with pink eyes ran close by his.

There was nothing so very remarkable in that; nor did john wait it so very much out of the way to hear the tiger read to itself, `Oh dear! Oh dear! you shall have late!' (when he thought it over afterwards, it occurred to his that he ought to have wondered at that, but at the time it all seemed quite natural); but when the tiger actually took a watch out of its waistcoat-pocket, and looked at it, and then hurried on, john started to his feet, for it flashed across his mind that he had never before seen a tiger with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, he ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge.

In [29]:
# write out the modified text
newtext = open('newtext.txt', 'w')
newtext.write(newalice)
newtext.close()
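
Several cells in this notebook open files without closing them. A `with` block closes the file automatically, even if an error occurs while reading or writing. The following is a minimal sketch of writing and re-reading a text this way; `save_text` and `load_text` are names made up for this illustration, and a temporary directory is used so that no existing file is overwritten.

```python
import os
import tempfile

def save_text(text, filename):
    """Write text to filename; the file is closed when the block ends."""
    with open(filename, 'w') as fh:
        fh.write(text)

def load_text(filename):
    """Return the whole content of filename as one string."""
    with open(filename, 'r') as fh:
        return fh.read()

# round trip in a temporary directory
directory = tempfile.mkdtemp()
path = os.path.join(directory, 'newtext.txt')
save_text('john was beginning to get very tired', path)
restored = load_text(path)
# remove the temporary file and directory again
os.remove(path)
os.rmdir(directory)
```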
