Week 1 Review


In [4]:
from sklearn import datasets
boston = datasets.load_boston()

In [5]:
print boston.DESCR


Boston House Prices dataset

Notes
------
Data Set Characteristics:  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's

    :Missing Attribute Values: None

    :Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.
http://archive.ics.uci.edu/ml/datasets/Housing


This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
prices and the demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
...', Wiley, 1980.   N.B. Various transformations are used in the table on
pages 244-261 of the latter.

The Boston house-price data has been used in many machine learning papers that address regression
problems.   
     
**References**

   - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
   - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
   - many more! (see http://archive.ics.uci.edu/ml/datasets/Housing)

Let's group house prices into expensive (> \$20K) or affordable (<= \$20K). This is 1978.


In [13]:
# create dictionary with values that are empty lists
prices = {
    'expensive' : [],
    'affordable' : []
}

# the price of a house will be appended to the corresponding list (expensive or affordable)
for value in boston.target:
    if value > 20:
        prices['expensive'].append(value)
    else:
        prices['affordable'].append(value)

from pprint import pprint as pp
pp(prices)


{'affordable': [16.5,
                18.899999999999999,
                15.0,
                18.899999999999999,
                18.199999999999999,
                19.899999999999999,
                17.5,
                18.199999999999999,
                13.6,
                19.600000000000001,
                15.199999999999999,
                14.5,
                15.6,
                13.9,
                16.600000000000001,
                14.800000000000001,
                18.399999999999999,
                12.699999999999999,
                14.5,
                13.199999999999999,
                13.1,
                13.5,
                18.899999999999999,
                20.0,
                19.300000000000001,
                20.0,
                16.600000000000001,
                14.4,
                19.399999999999999,
                19.699999999999999,
                18.899999999999999,
                19.600000000000001,
                18.699999999999999,
                16.0,
                19.399999999999999,
                17.399999999999999,
                20.0,
                18.600000000000001,
                19.300000000000001,
                19.5,
                19.5,
                19.800000000000001,
                19.399999999999999,
                18.800000000000001,
                18.699999999999999,
                18.5,
                18.300000000000001,
                19.199999999999999,
                19.300000000000001,
                17.300000000000001,
                18.800000000000001,
                15.699999999999999,
                16.199999999999999,
                18.0,
                14.300000000000001,
                19.199999999999999,
                19.600000000000001,
                18.399999999999999,
                15.6,
                18.100000000000001,
                17.399999999999999,
                17.100000000000001,
                13.300000000000001,
                17.800000000000001,
                14.0,
                14.4,
                13.4,
                15.6,
                11.800000000000001,
                13.800000000000001,
                15.6,
                14.6,
                17.800000000000001,
                15.4,
                19.600000000000001,
                15.300000000000001,
                19.399999999999999,
                17.0,
                15.6,
                13.1,
                17.399999999999999,
                19.100000000000001,
                20.0,
                19.300000000000001,
                17.600000000000001,
                18.5,
                16.100000000000001,
                19.399999999999999,
                16.199999999999999,
                17.800000000000001,
                19.800000000000001,
                18.5,
                19.300000000000001,
                19.800000000000001,
                17.100000000000001,
                19.399999999999999,
                19.5,
                18.5,
                19.0,
                18.699999999999999,
                16.5,
                17.5,
                17.199999999999999,
                18.600000000000001,
                18.199999999999999,
                17.800000000000001,
                19.899999999999999,
                16.800000000000001,
                13.800000000000001,
                13.800000000000001,
                15.0,
                13.9,
                13.300000000000001,
                13.1,
                10.199999999999999,
                10.4,
                10.9,
                11.300000000000001,
                12.300000000000001,
                8.8000000000000007,
                7.2000000000000002,
                10.5,
                7.4000000000000004,
                10.199999999999999,
                11.5,
                15.1,
                9.6999999999999993,
                13.800000000000001,
                12.699999999999999,
                13.1,
                12.5,
                8.5,
                5.0,
                6.2999999999999998,
                5.5999999999999996,
                7.2000000000000002,
                12.1,
                8.3000000000000007,
                8.5,
                5.0,
                11.9,
                17.199999999999999,
                15.0,
                17.199999999999999,
                17.899999999999999,
                16.300000000000001,
                7.0,
                7.2000000000000002,
                7.5,
                10.4,
                8.8000000000000007,
                8.4000000000000004,
                16.699999999999999,
                14.199999999999999,
                13.4,
                11.699999999999999,
                8.3000000000000007,
                10.199999999999999,
                10.9,
                11.0,
                9.5,
                14.5,
                14.1,
                16.100000000000001,
                14.300000000000001,
                11.699999999999999,
                13.4,
                9.5999999999999996,
                8.6999999999999993,
                8.4000000000000004,
                12.800000000000001,
                10.5,
                17.100000000000001,
                18.399999999999999,
                15.4,
                10.800000000000001,
                11.800000000000001,
                14.9,
                12.6,
                14.1,
                13.0,
                13.4,
                15.199999999999999,
                16.100000000000001,
                17.800000000000001,
                14.9,
                14.1,
                12.699999999999999,
                13.5,
                14.9,
                20.0,
                16.399999999999999,
                17.699999999999999,
                19.5,
                19.899999999999999,
                19.0,
                19.100000000000001,
                19.100000000000001,
                19.899999999999999,
                19.600000000000001,
                13.800000000000001,
                13.300000000000001,
                16.699999999999999,
                12.0,
                14.6,
                19.100000000000001,
                15.199999999999999,
                7.0,
                8.0999999999999996,
                13.6,
                19.699999999999999,
                18.300000000000001,
                17.5,
                16.800000000000001,
                11.9],
 'expensive': [24.0,
               21.600000000000001,
               34.700000000000003,
               33.399999999999999,
               36.200000000000003,
               28.699999999999999,
               22.899999999999999,
               27.100000000000001,
               21.699999999999999,
               20.399999999999999,
               23.100000000000001,
               20.199999999999999,
               21.0,
               21.0,
               24.699999999999999,
               30.800000000000001,
               34.899999999999999,
               26.600000000000001,
               25.300000000000001,
               24.699999999999999,
               21.199999999999999,
               20.5,
               25.0,
               23.399999999999999,
               35.399999999999999,
               24.699999999999999,
               31.600000000000001,
               23.300000000000001,
               22.199999999999999,
               25.0,
               33.0,
               23.5,
               22.0,
               20.899999999999999,
               24.199999999999999,
               21.699999999999999,
               22.800000000000001,
               23.399999999999999,
               24.100000000000001,
               21.399999999999999,
               20.800000000000001,
               21.199999999999999,
               20.300000000000001,
               28.0,
               23.899999999999999,
               24.800000000000001,
               22.899999999999999,
               23.899999999999999,
               26.600000000000001,
               22.5,
               22.199999999999999,
               23.600000000000001,
               28.699999999999999,
               22.600000000000001,
               22.0,
               22.899999999999999,
               25.0,
               20.600000000000001,
               28.399999999999999,
               21.399999999999999,
               38.700000000000003,
               43.799999999999997,
               33.200000000000003,
               27.5,
               26.5,
               20.100000000000001,
               20.399999999999999,
               21.699999999999999,
               22.800000000000001,
               21.199999999999999,
               20.399999999999999,
               22.0,
               20.300000000000001,
               20.5,
               21.399999999999999,
               23.0,
               21.5,
               41.299999999999997,
               24.300000000000001,
               23.300000000000001,
               27.0,
               50.0,
               50.0,
               50.0,
               22.699999999999999,
               25.0,
               50.0,
               23.800000000000001,
               23.800000000000001,
               22.300000000000001,
               23.100000000000001,
               23.600000000000001,
               22.600000000000001,
               29.399999999999999,
               23.199999999999999,
               24.600000000000001,
               29.899999999999999,
               37.200000000000003,
               39.799999999999997,
               36.200000000000003,
               37.899999999999999,
               32.5,
               26.399999999999999,
               29.600000000000001,
               50.0,
               32.0,
               29.800000000000001,
               34.899999999999999,
               37.0,
               30.5,
               36.399999999999999,
               31.100000000000001,
               29.100000000000001,
               50.0,
               33.299999999999997,
               30.300000000000001,
               34.600000000000001,
               34.899999999999999,
               32.899999999999999,
               24.100000000000001,
               42.299999999999997,
               48.5,
               50.0,
               22.600000000000001,
               24.399999999999999,
               22.5,
               24.399999999999999,
               21.699999999999999,
               22.399999999999999,
               28.100000000000001,
               23.699999999999999,
               25.0,
               23.300000000000001,
               28.699999999999999,
               21.5,
               23.0,
               26.699999999999999,
               21.699999999999999,
               27.5,
               30.100000000000001,
               44.799999999999997,
               50.0,
               37.600000000000001,
               31.600000000000001,
               46.700000000000003,
               31.5,
               24.300000000000001,
               31.699999999999999,
               41.700000000000003,
               48.299999999999997,
               29.0,
               24.0,
               25.100000000000001,
               31.5,
               23.699999999999999,
               23.300000000000001,
               22.0,
               20.100000000000001,
               22.199999999999999,
               23.699999999999999,
               24.300000000000001,
               20.5,
               24.5,
               26.199999999999999,
               24.399999999999999,
               24.800000000000001,
               29.600000000000001,
               42.799999999999997,
               21.899999999999999,
               20.899999999999999,
               44.0,
               50.0,
               36.0,
               30.100000000000001,
               33.799999999999997,
               43.100000000000001,
               48.799999999999997,
               31.0,
               36.5,
               22.800000000000001,
               30.699999999999999,
               50.0,
               43.5,
               20.699999999999999,
               21.100000000000001,
               25.199999999999999,
               24.399999999999999,
               35.200000000000003,
               32.399999999999999,
               32.0,
               33.200000000000003,
               33.100000000000001,
               29.100000000000001,
               35.100000000000001,
               45.399999999999999,
               35.399999999999999,
               46.0,
               50.0,
               32.200000000000003,
               22.0,
               20.100000000000001,
               23.199999999999999,
               22.300000000000001,
               24.800000000000001,
               28.5,
               37.299999999999997,
               27.899999999999999,
               23.899999999999999,
               21.699999999999999,
               28.600000000000001,
               27.100000000000001,
               20.300000000000001,
               22.5,
               29.0,
               24.800000000000001,
               22.0,
               26.399999999999999,
               33.100000000000001,
               36.100000000000001,
               28.399999999999999,
               33.399999999999999,
               28.199999999999999,
               22.800000000000001,
               20.300000000000001,
               22.100000000000001,
               21.600000000000001,
               23.800000000000001,
               23.100000000000001,
               21.0,
               23.800000000000001,
               23.100000000000001,
               20.399999999999999,
               25.0,
               24.600000000000001,
               23.0,
               22.199999999999999,
               22.600000000000001,
               22.199999999999999,
               20.699999999999999,
               21.100000000000001,
               20.600000000000001,
               32.700000000000003,
               23.899999999999999,
               31.199999999999999,
               23.100000000000001,
               24.5,
               26.600000000000001,
               22.899999999999999,
               24.100000000000001,
               30.100000000000001,
               20.600000000000001,
               21.699999999999999,
               22.699999999999999,
               22.600000000000001,
               25.0,
               20.800000000000001,
               21.899999999999999,
               27.5,
               21.899999999999999,
               23.100000000000001,
               50.0,
               50.0,
               50.0,
               50.0,
               50.0,
               23.199999999999999,
               27.899999999999999,
               27.5,
               20.800000000000001,
               20.199999999999999,
               21.399999999999999,
               20.100000000000001,
               23.199999999999999,
               29.800000000000001,
               21.399999999999999,
               23.0,
               23.699999999999999,
               25.0,
               21.800000000000001,
               20.600000000000001,
               21.199999999999999,
               20.600000000000001,
               20.100000000000001,
               21.800000000000001,
               24.5,
               23.100000000000001,
               21.199999999999999,
               22.399999999999999,
               20.600000000000001,
               23.899999999999999,
               22.0]}

In [14]:
# how many affordable and expensive houses are there?
print "Expensive:", len(prices['expensive'])
print "Affordable:", len(prices['affordable'])


Expensive: 291
Affordable: 215

In [19]:
# What proportion of expensive houses are more than $40K?  List comprehension is good for operating on a list
from __future__ import division      # what's another way to do this without importing?

len([v for v in prices['expensive'] if v > 40])/len(prices['expensive'])


Out[19]:
0.10652920962199312

In [35]:
# whats the average price of an affordable house?
avg = sum(i for i in prices['affordable'])/len(prices['affordable'])
print avg
print "Average price: ${:,.0f}".format(avg*1000)    # make it pretty


15.1930232558
Average price: $15,193

Week 2 - Functions

First class functions

Functions behave like any other object, such as an int or a list

  • Use functions as arguments to other functions
  • Store functions as dictionary values
  • Return a function from another function
  • Common feature of functional programming languages like Lisp or Haskell or Scala
  • Leads to many powerful ways to use functions.

In [231]:
def square(x):
    """Square of x"""    # what are these """ """ called? What are they for?
    return x**2

def cube(x):
    """Cube of x."""
    return x**3

def root(x):
    """Square root of x."""
    return x**.5

In [233]:
# create a dictionary of functions
funcs = {
    'square': square,
    'cube': cube,
    'root': root,
}

In [234]:
x = 2

print square(x)
print cube(x)
print root(x)


4
8
1.41421356237

In [235]:
# print function name and output, sorted by function name
for key, val in sorted(funcs.items()):
    print key, val(x)


cube 8
root 1.41421356237
square 4

Functions can be passed in as arguments


In [245]:
def derivative(x, f, h = 0.01):       # using default value for h
    """ Calculate the derivative of any continuous, differentiable function """
    
    return (f(x+h) - f(x))/h
$$ f(x) = 3x^2 + 5x + 3$$


$$ f'(x) = 6x + 5$$
$$ f'(2) = 6\times2 + 5 = 17$$


In [138]:
def some_func(x):    
    return 3*x**2 + 5*x + 3

In [251]:
derivative(2, some_func, h = 0.01) # passing in function f.   What happens if I include a value for h?


Out[251]:
17.029999999999745

Functions can also be returned by functions


In [257]:
# this is how you might performing timing outside of the Jupyter notebook
# Lots of documentation online about time decorator

import time

def sum_squares(n):
    """ Sum of the squares from 0 to n-1 """
    
    return sum([x**2 for x in range(n)])

def timer(f,n):
    """ Time how long it takes to evaluate a function """
    
    start = time.clock()
    result = f(n)   
    elapsed = time.clock() - start
    return result, elapsed

In [258]:
n = 10000000
timer(sum_squares, n)


Out[258]:
(333333283333335000000L, 1.555731999999999)

In [259]:
%timeit sum_squares(n)


1 loop, best of 3: 1.53 s per loop

Higher order functions

  • A function that uses another function as an input argument or returns a function
  • The most familiar are `map`, `filter` and `reduce`. These are very fast in Python.
  • Custom functions are HOF

In [167]:
# The map function applies a function to each member of a collection
# map(some_function, some_sequence)

map(square, range(10))


Out[167]:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [176]:
# The filter function applies a predicate to each member of a collection,
# retaining only those members where the predicate is True

def is_even(x):
    return x % 2 == 0

filter(is_even, range(10))


Out[176]:
[0, 2, 4, 6, 8]

In [177]:
# It is common to combine map and filter

map(square, filter(is_even, range(10)))


Out[177]:
[0, 4, 16, 36, 64]

In [181]:
# The reduce function reduces a collection using a binary operator to combine sequential items, two at a time

def add_them(x, y):
    return x + y

# another implementation of the sum function - like a running total
reduce(add_them, [1,2,3,4,5])


Out[181]:
15

In [182]:
def custom_sum(xs, func):
    """Returns the sum of xs after applying func to xs."""
    
    return sum(map(func, xs))

xs = range(10)
print custom_sum(xs, square)
print custom_sum(xs, cube)
print custom_sum(xs, root)


285
2025
19.306000526

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

EXERCISE TIME!

1) Using map, write python program to calculate the length of each element in a list: ['Donald','Ted','Hilary','Joe','Bernie'].

2) Using reduce and map, write a python program to find the largest element in the list of integers, floats or strings (that are numbers).
For example: [2, '3', 4.0, 2, -1, '10', 9, -4.3, 8, 7, 11, 3]. Should return 11.

Hint:

  1. How can you tell Python to compare `'10'` to `8`? Should the types be the same?
  2. You can use `reduce` to find the maximum in a similar way as we used it to find the `sum` in the example above.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Anonymous functions

  • In functional programming, there is often a need to create small specific functions that perform a limited task as input to a HOF such as `map` or `filter`. Think `sum()`, `square()`, ...
  • In such cases, these functions are often written as anonymous or **lambda** functions

If you find it hard to understand what a lambda function is doing, it should probably be rewritten as a regular function.


In [185]:
# Using standard functions
n = 10

def square(x):
    return x*x

square(n)


Out[185]:
100

In [186]:
map(square, range(n))


Out[186]:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [187]:
# Using lambda function

sqr = lambda x: x*x

sqr(n)


Out[187]:
100

In [265]:
n=10
map(sqr, range(n))


Out[265]:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [266]:
# what does this function do?

s1 = reduce(lambda x, y: x+y, map(lambda x: x**2, range(1,10)))
print(s1)


285

In [267]:
# functional expressions and lambdas are cool
# but can be difficult to read when over-used
# Here is a more comprehensible version

s2 = sum(x**2 for x in range(1, 10))
print(s2)


285

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

EXERCISE TIME!

Rewrite the following as a list comprehension, i.e. one liner without using map or filter


In [21]:
ans = map(lambda x: x*x, filter(lambda x: x%2 == 0, range(10)))
print ans


[0, 4, 16, 36, 64]

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Recursion

  • A recursive function is one that calls itself
  • Extremely useful examples of the divide-and-conquer paradigm in algorithm development
  • However, they can be computationally inefficient and their use in Python is quite rare in practice

Recursive functions generally have:

  • a set of base cases
    • the answer is obvious
    • can be returned immediately
  • a set of recursive cases
    • which are split into smaller pieces
    • each of which is given to the same function called recursively

Examples

Factorial:

$$ n! = n\times(n-1)\times(n-2)\times...\times2\times1$$

For example, $$4! = 4\times3\times2\times1 = 24 $$


In [268]:
def fact(n):
    """Returns the factorial of n"""
    
    # base case
    if n==0:
        return 1
    
    # recursive case
    else:
        return n * fact(n-1)

In [269]:
[(n,fact(n)) for n in range(1,10)]


Out[269]:
[(1, 1),
 (2, 2),
 (3, 6),
 (4, 24),
 (5, 120),
 (6, 720),
 (7, 5040),
 (8, 40320),
 (9, 362880)]

Fibonacci sequence:

$$F_n = F_{n-1} + F_{n-2},\!\,$$

Output is:

$$1, 1, 2, 3, 5, 8, 13, 21, ...$$


In [193]:
def fib1(n):
    """Fibonacci with recursion"""

    # base case
    if n==0 or n==1:
        return 1
    # recursive caae
    else:
        return fib1(n-1) + fib1(n-2)

In [270]:
[(i,fib1(i)) for i in range(20)]


Out[270]:
[(0, 1),
 (1, 1),
 (2, 2),
 (3, 3),
 (4, 5),
 (5, 8),
 (6, 13),
 (7, 21),
 (8, 34),
 (9, 55),
 (10, 89),
 (11, 144),
 (12, 233),
 (13, 377),
 (14, 610),
 (15, 987),
 (16, 1597),
 (17, 2584),
 (18, 4181),
 (19, 6765)]

In [272]:
# In Python, a more efficient version that does not use recursion is

def fib2(n):
    """Fibonacci without recursion"""
    
    a, b = 0, 1
    for i in range(1, n+1):
        a, b = b, a + b  # notice the variable setting is on one line
    return b

In [273]:
[(i,fib2(i)) for i in range(20)]


Out[273]:
[(0, 1),
 (1, 1),
 (2, 2),
 (3, 3),
 (4, 5),
 (5, 8),
 (6, 13),
 (7, 21),
 (8, 34),
 (9, 55),
 (10, 89),
 (11, 144),
 (12, 233),
 (13, 377),
 (14, 610),
 (15, 987),
 (16, 1597),
 (17, 2584),
 (18, 4181),
 (19, 6765)]

In [274]:
# Note that the recursive version is much slower than the non-recursive version

%time fib1(30)
%time fib2(30)


CPU times: user 428 ms, sys: 23.5 ms, total: 451 ms
Wall time: 437 ms
CPU times: user 6 µs, sys: 0 ns, total: 6 µs
Wall time: 7.87 µs
Out[274]:
1346269

This is because it makes many duplicate function calls. For example:

We can get around that by caching results, but we won't cover that here.

Classic use of recursion - Quick Sort


In [275]:
def quick_sort(xs):
    """ Classic quick sort """

    # base case
    if xs == []:
        return xs
    # recursive case
    else:
        pivot = xs[0] # choose starting pivot to be on the left
        less_than = [x for x in xs[1:] if x <= pivot]
        more_than = [x for x in xs[1:] if x > pivot]
        
        return quick_sort(less_than) + [pivot] + quick_sort(more_than)

In [276]:
xs = [11,3,1,4,1,5,9,2,6,5,3,5,9,0,10,4,3,7,4,5,8,-1]
print quick_sort(xs)


[-1, 0, 1, 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 7, 8, 9, 9, 10, 11]

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

EXERCISE TIME!

Euclid's algorithm for finding the greatest common divisor of two numbers is

gcd(a, 0) = a
gcd(a, b) = gcd(b, a modulo b)
  1. What is the greatest common divisor of `17384` and `1928`? Write the `gcd(a,b)` function.
  2. Write a function to calculate the least common multiple, `lcm(a,b)`. Hint: Using `gcd` makes this very easy. Feel free to Google it.
  3. What is the least common multiple of `17384` and `1928`? Hint: Google it!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Iterators

  • Iterators represent streams of values.
  • Will produce the next value when you call `next()` on it
  • Because only one value is consumed at a time, they use very little memory.
  • Use of iterators is very helpful for working with data sets too large to fit into RAM.

In [215]:
# Iterators can be created from sequences with the built-in function iter()

xs = [1,2,3]
x_iter = iter(xs)

print x_iter.next() # python "remembers" where the pointer is
print x_iter.next()
print x_iter.next()
print x_iter.next()


1
2
3
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-215-35c80bc0fe4b> in <module>()
      7 print x_iter.next()
      8 print x_iter.next()
----> 9 print x_iter.next()

StopIteration: 

In [216]:
# Most commonly, iterators are used (automatically) within a for loop
# which terminates when it encouters a StopIteration exception

x_iter = iter(xs)
for x in x_iter:
    print x


1
2
3

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

EXERCISE TIME!

Starting with range(1, 20), make a list of the squares of each odd number in the following ways

  • Using map and filter.
    • This can be done in one line if you use lambda (twice!). But you don't have to!
    • Remember - to use map and/or filter, you need to apply a function to each element of a list.
  • Using a list comprehension
  • With a for loop

The answer should be [1, 9, 25, 49, 81, 121, 169, 225, 289, 361]

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Review Problems

Q1. Rewrite the factorial function so that it does not use recursion. Hint: consider using reduce since you're multiplying consecutive numbers.

Here is the original code:

def fact(n):
    """Returns the factorial of n."""
    # base case
    if n==0:
        return 1
    # recursive case
    else:
        return n * fact(n-1)

for i in range(1,11):
    print fact(i),

Q2. Write a function, normalize(x) that takes in a vector x and outputs a normalized vector x_norm in the following way:

$$ x^{normed}_{i} = \frac{x_{i} - \mu}{\sigma} $$

where,

$$ \mu = \frac{1}{n}\sum_{i=1}^{n}x_{i} $$$$ \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_{i} - \mu)^2 $$

Each $X_i$ is a single data point from the input list. For example, an input list x = [1,2,3,4] should output x_norm = [-1.3416407865,-0.4472135955,0.4472135955,1.3416407865].

Note that the sum of the new list should be 0 and the standard deviation should be 1.0 - this is why it's called normalizing. It's also called standardizing

Q3. Matrix Multiplication and list comprehension

Rewrite the matrix multiplication code to use list comprehension instead of nested for loops.

Q4.

Write a program to merge two dictionaries together. Assume the dictionaries have the same keys. For example:

a = {
    "key1" : 1,
    "key3" : "snafu",
    "key2" : 5,
    "key5" : 7,
    "key4" : 0,
    }

b = {
    "key2" : 6,
    "key1" : 8,
    "key4" : "bar",
    "key3" : 9,
    "key5" : "foo"
    }

becomes

c = {
     'key1': (1, 8),
     'key2': (5, 6),
     'key3': ('snafu', 9),
     'key4': (0, 'bar'),
     'key5': (7, 'foo')
}

Hint: See the collections module for a helper function.

Q5. Pascal's triangle

  1. Write a function `pascal(c,r)` which takes in a column `c` and row `r` (both start indexing at 0) and returns the value of Pascal's triangle.
  2. Print the first 10 iterations of the triangle. Here are the first 6:
     1
    1 1
   1 2 1
  1 3 3 1
 1 4 6 4 1 
1 5 10 10 5 1 
...

Do this recursively.

Bonus (Hard!) Questions

Q1. The four adjacent digits in the 1000-digit number that have the greatest product are 9 × 9 × 8 × 9 = 5832.

73167176531330624919225119674426574742355349194934
96983520312774506326239578318016984801869478851843
85861560789112949495459501737958331952853208805511
12540698747158523863050715693290963295227443043557
66896648950445244523161731856403098711121722383113
62229893423380308135336276614282806444486645238749
30358907296290491560440772390713810515859307960866
70172427121883998797908792274921901699720888093776
65727333001053367881220235421809751254540594752243
52584907711670556013604839586446706324415722155397
53697817977846174064955149290862569321978468622482
83972241375657056057490261407972968652414535100474
82166370484403199890008895243450658541227588666881
16427171479924442928230863465674813919123162824586
17866458359124566529476545682848912883142607690042
24219022671055626321111109370544217506941658960408
07198403850962455444362981230987879927244284909188
84580156166097919133875499200524063689912560717606
05886116467109405077541002256983155200055935729725
71636269561882670428252483600823257530420752963450

Write a program to find the thirteen adjacent digits in the 1000-digit number that have the greatest product. What is the value of this product? (Euler problem #8)

The answer shoud be 23514624000.

Q2.

A string of consecutive successes is known as a success run. Write a function that returns the counts for runs of length $k$ for each $k$ observed in a dictionary. Hint: check out the itertools library.

For example: if the trials were [0, 1, 0, 1, 1, 0, 0, 0, 0, 1], the function should return

{1: 2, 2: 1})