Wk1.0

Warm-up: I got 32767 problems and overflow is one of them.

1. Swap the values of two variables, a and b without using a temporary variable.

2. Suppose I had six different sodas. In how many different combinations could I drink the sodas? Write a program that calculates the number of unique combinations for 6 objects. Assume that I finish a whole sode before moving onto another one.


In [13]:
count = 1
for elem in range(1, 3 + 1):
    count *= elem
    
print(count)


6

3. Extend your program to n objects. How many different combinations do I have for 5 objects? How about 15? What is the max number of objects I could calculate for if I was storing the result in a 32 bit integer? What happens if the combinations exceed 32 bits?


In [14]:
from math import factorial as f

f(3)


Out[14]:
6

4. What will the following code yield? Was it what you expected? What's going on here?

.1 + .1 + .1 == .3

5. Try typing in the command below and read this page

format(.1, '.100g')

Data structure of the day: tuples

  • Switching variables: a second look
  • How do we make a single tuple?
  • slicing, indexing
  • mutability
  • tuple packing and unpacking
    • using tuples in loops
    • using tuples to unpack enumerate(lst)
  • tuples as return values
  • comparing tuples
    (0, 1, 2000000) < (0, 3, 4)
  • Design pattern: DSU
    • Decorate
    • Sort
    • Undecorate

Ex.

txt = 'but soft what light in yonder window breaks'
words = txt.split()
t = list()
for word in words:
   t.append((len(word), word))

t.sort(reverse=True)

res = list()
for length, word in t:
    res.append(word)

print(res)

Why would words.sort() not work?

  • We can use tuples as a way to store related data
    addr = 'monty@python.org'
    uname, domain = addr.split('@')
  • Advanced: tuples as argument parameters
    t = (a, b, c)
    func(*t)

Tuples: exercises

Exercise 1

Revise a previous program as follows: Read and parse the "From" lines and pull out the addresses from the line. Count the number of messages from each person using a dictionary.

After all the data has been read print the person with the most commits by creating a list of (count, email) tuples from the dictionary and then sorting the list in reverse order and print out the person who has the most commits.

Sample Line:
From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008

Enter a file name: mbox-short.txt
cwen@iupui.edu 5

Enter a file name: mbox.txt
zqian@umich.edu 195

Exercise 2

This program counts the distribution of the hour of the day for each of the messages. You can pull the hour from the "From" line by finding the time string and then splitting that string into parts using the colon character. Once you have accumulated the counts for each hour, print out the counts, one per line, sorted by hour as shown below.

Sample Execution:
python timeofday.py
Enter a file name: mbox-short.txt
04 3
06 1
07 1
09 2
10 3
11 6
14 1
15 2
16 4
17 2
18 1
19 1

Exercise 3

Write a program that reads a file and prints the letters in decreasing order of frequency. Your program should convert all the input to lower case and only count the letters a-z. Your program should not count spaces, digits, punctuation or anything other than the letters a-z. Find text samples from several different languages and see how letter frequency varies between languages. Compare your results with the tables at wikipedia.org/wiki/Letter_frequencies.

Afternoon warm-up

Write a function that takes three numbers $x_1, x_2, x_3$ from a user and returns the max value. Don't use the built in max function. Would your function work on more than three values?


In [40]:
def n_max():
    inpt = eval(input("Please enter some values: "))
    maximum = max_val(inpt)
    print("The largest value is", maximum)
    
def max_val(ints):
    """Input: collection of ints.
    Returns: maximum of the collection
    int - the max integer."""
    max = ints[0]
    for x in ints:
        if x > max:
            max = x
    return max
        
    

assert max_val([1, 2, 3]) == 3
assert max_val([1, 1, 1]) == 1
assert max_val([1, 2, 2]) == 2

In [41]:
n_max()


Please enter some values: 1,2,1313
The largest value is 1313

In [31]:
inpt =  eval(input("Please enter three values: "))


Please enter three values: 1,2,3,4,5,6,7

In [34]:
list(inpt)


Out[34]:
[1, 2, 3, 4, 5, 6, 7]

Strategy 1: Compare each to all (brute force)

Strategy 2: Decision Tree

Strategy 3: Sequential processing

Strategy 4: Use python

The development process

A Problem Solving Algorithm

1. Understand the problem

2. Brainstorm on paper

3. Plan out program

4. Refine design

5. Create function

6. Create function docstring

7. Create function tests

8. Check that tests fail

9. If function is trivial, then solve it (i.e. get function tests to pass). Else, create sub-function (aka divide and conquer) and repeat steps 5-8.

Example: Compress


In [ ]:
assert compress('AAAADDBBBBBCCEAA') == 'A4D2B5C2E1A2'

In [68]:
# %load ../scripts/compress/compressor.py
    
def groupby_char(lst):
    """Returns a list of strings containing identical characters.
    
    Takes a list of characters produced by running split on a string.
    Groups runs (in order sequences) of identical characters into string elements in the list.
    
    Parameters
    ---------
    Input:
    lst: list
    A list of single character strings.
    
    Output:
    grouped: list
    A list of strings containing grouped characters."""

    new_lst = []

    count = 1

    for i in range(len(lst) - 1): # we range to the second to last index since we're checking if lst[i] == lst[i + 1]. 
        if lst[i] == lst[i + 1]:
            count += 1        
        else:
            new_lst.append([lst[i],count]) # Create a lst of lists. Each list contains a character and the count of adjacent identical characters.
            count = 1
            
    new_lst.append((lst[-1],count)) # Return the last character (we didn't reach it with our for loop since indexing until second to last).


    grouped = [char*count for [char, count] in new_lst] 

    return grouped

def compress_group(string):
    """Returns a compressed two character string containing a character and a number.
    
    Takes in a string of identical characters and returns the compressed string 
    consisting of the character and the length of the original string.
    
    Example
    -------
    "AAA"-->"A3"

    Parameters:
    -----------
    Input:
    string: str
    A string of identical characters.
    
    Output:
    ------
    compressed_str: str
    A compressed string of length two containing a character and a number.
    """
    

    return str(string[0]) + str(len(string))


def compress(string):
    """Returns a compressed representation of a string.
    
    Compresses the string by mapping each run of identical characters to a 
    single character and a count. 
    
    Ex.
    --
    compress('AAABBCDDD')--> 'A3B2C1D3'.
    
    Only compresses string if the compression is shorter than the original string.

    Ex.
    --
    compress('A')--> 'A' # not 'A1'.

    Parameters
    ----------

    Input:
    string: str
    The string to compress

    Output:
    compressed: str
    The compressed representation of the string.
    """

    try:
        split_str = [char for char in string] # Create list of single characters.
        grouped = groupby_char(split_str) # Group characters if characters are identical.
        compressed = ''.join( # Compress each element of the grouped list and join to a string.
                [compress_group(elem) for elem in grouped])

        if len(compressed) < len(string): # Only return compressed if compressed is actually shorter.
            return compressed
        else:
            return string

    except IndexError: # If our input string is empty, return an empty string.
        return ""

    except TypeError: # If we get something that's not compressible (including NoneType) return None.
        return None

In [69]:
# %load ../scripts/compress/compress_tests.py
# This will fail to run because in wrong directory

from compress.compressor import *

def compress_test():
    assert compress('AAABBCDDD') == 'A3B2C1D3'
    assert compress('A') == 'A'
    assert compress('') == ''
    assert compress('AABBCC') == 'AABBCC' # compressing doesn't shorten string so just return string.
    assert compress(None) == None

def groupby_char_test():
    assert groupby_char(["A", "A", "A", "B", "B"]) == ["AAA", "BB"]

def compress_group_test():
    assert compress_group("AAA") == "A3"
    assert compress_group("A") == "A1"


---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-69-43020d6ff834> in <module>()
      2 # This will fail to run because in wrong directory
      3 
----> 4 from compress.compressor import *
      5 
      6 def compress_test():

ImportError: No module named 'compress'