Table of content:

  • (V) iterators and list comprehensions
  • (V) itertools
  • (V) high-efficiency data containers
  • (V) map-reduce in python
  • (V) think functions
  • (V) python as type-based language. Type injections
  • (V) dunder utilities and how to use them
  • (V) wrappers: commonly used wrappers, user-defined wrappers
  • (V) Closures (and how to avoid creating it accidentally, aka: never use mutables as default values )
  • (V) code repetitivity index (http://clonedigger.sourceforge.net/)
  • (V) refactoring a bad piece of code
  • (V) python scopes: global, local, function layers and how to correct it (mutable wrapping)
  • (V) python lambdas (silent functions)
  • (V) basic python threading
  • (V) docstrings and Sphinx
  • (V) Gohlke Link
  • (V) Type function
  • (V) packaging of pure python modules, dependencies and installation of git repos via pip (aka how to share your code well)
  • (V) numpy and scipy: matrices and sparse matrices
  • (V) pyplot basics
  • (V) basics of git usage (within Pycharm)
  • (V) advanced indexing - lists and Numpy

In [1]:
%matplotlib inline

Ex0: More than just code


In [2]:
L = []

print dir(L)

for elt in dir(L):
    print elt, '\t', getattr(L, elt).__doc__


['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__delslice__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getslice__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__setslice__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
__add__ 	x.__add__(y) <==> x+y
__class__ 	list() -> new empty list
list(iterable) -> new list initialized from iterable's items
__contains__ 	x.__contains__(y) <==> y in x
__delattr__ 	x.__delattr__('name') <==> del x.name
__delitem__ 	x.__delitem__(y) <==> del x[y]
__delslice__ 	x.__delslice__(i, j) <==> del x[i:j]
           
           Use of negative indices is not supported.
__doc__ 	str(object='') -> string

Return a nice string representation of the object.
If the argument is a string, the return value is the same object.
__eq__ 	x.__eq__(y) <==> x==y
__format__ 	default object formatter
__ge__ 	x.__ge__(y) <==> x>=y
__getattribute__ 	x.__getattribute__('name') <==> x.name
__getitem__ 	x.__getitem__(y) <==> x[y]
__getslice__ 	x.__getslice__(i, j) <==> x[i:j]
           
           Use of negative indices is not supported.
__gt__ 	x.__gt__(y) <==> x>y
__hash__ 	None
__iadd__ 	x.__iadd__(y) <==> x+=y
__imul__ 	x.__imul__(y) <==> x*=y
__init__ 	x.__init__(...) initializes x; see help(type(x)) for signature
__iter__ 	x.__iter__() <==> iter(x)
__le__ 	x.__le__(y) <==> x<=y
__len__ 	x.__len__() <==> len(x)
__lt__ 	x.__lt__(y) <==> x<y
__mul__ 	x.__mul__(n) <==> x*n
__ne__ 	x.__ne__(y) <==> x!=y
__new__ 	T.__new__(S, ...) -> a new object with type S, a subtype of T
__reduce__ 	helper for pickle
__reduce_ex__ 	helper for pickle
__repr__ 	x.__repr__() <==> repr(x)
__reversed__ 	L.__reversed__() -- return a reverse iterator over the list
__rmul__ 	x.__rmul__(n) <==> n*x
__setattr__ 	x.__setattr__('name', value) <==> x.name = value
__setitem__ 	x.__setitem__(i, y) <==> x[i]=y
__setslice__ 	x.__setslice__(i, j, y) <==> x[i:j]=y
           
           Use  of negative indices is not supported.
__sizeof__ 	L.__sizeof__() -- size of L in memory, in bytes
__str__ 	x.__str__() <==> str(x)
__subclasshook__ 	Abstract classes can override this to customize issubclass().

This is invoked early on by abc.ABCMeta.__subclasscheck__().
It should return True, False or NotImplemented.  If it returns
NotImplemented, the normal algorithm is used.  Otherwise, it
overrides the normal algorithm (and the outcome is cached).

append 	L.append(object) -- append object to end
count 	L.count(value) -> integer -- return number of occurrences of value
extend 	L.extend(iterable) -- extend list by appending elements from the iterable
index 	L.index(value, [start, [stop]]) -> integer -- return first index of value.
Raises ValueError if the value is not present.
insert 	L.insert(index, object) -- insert object before index
pop 	L.pop([index]) -> item -- remove and return item at index (default last).
Raises IndexError if list is empty or index is out of range.
remove 	L.remove(value) -- remove first occurrence of value.
Raises ValueError if the value is not present.
reverse 	L.reverse() -- reverse *IN PLACE*
sort 	L.sort(cmp=None, key=None, reverse=False) -- stable sort *IN PLACE*;
cmp(x, y) -> -1, 0, 1

Ex1: nested dicts


In [3]:
from pprint import pprint

cl = 'module level'

def test_function():
    
    def inner_test():
        st2 = 'secondary_function'
        cl = 'inner test'
        print 'inner function locals: \n', pprint(locals())
    
    st = 'primary function'
    it = inner_test()
    cl = 'test function'
    print 'locals: \n', pprint(locals())
#     print 'globals:', pprint(globals())
    return 'test  function return'

test_function()


inner function locals: 
{'cl': 'inner test', 'st2': 'secondary_function'}
None
locals: 
{'cl': 'test function',
 'inner_test': <function inner_test at 0x000000000A216F98>,
 'it': None,
 'st': 'primary function'}
None
Out[3]:
'test  function return'

In [4]:
gl = globals()
print '__name__' in gl.keys()
print gl['__name__']
import numpy as np
print np.__name__


True
__main__
numpy

Ex2: First catch-22:

Tell what's wrong with the code below and correct it:


In [5]:
foo = 'bar'

def ret_foo():
    return foo

def nice_foo():
    foo = 'nice'

def foo_mod_function():
    foo += '-bar' 

if __name__ == '__main__':
    print foo
    print type(foo)
    print foo.__class__
    print ret_foo()
    nice_foo()
    print foo
    foo_mod_function()
    print foo


bar
<type 'str'>
<type 'str'>
bar
bar
---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-5-e13997e80cf3> in <module>()
     17     nice_foo()
     18     print foo
---> 19     foo_mod_function()
     20     print foo

<ipython-input-5-e13997e80cf3> in foo_mod_function()
      8 
      9 def foo_mod_function():
---> 10     foo += '-bar'
     11 
     12 if __name__ == '__main__':

UnboundLocalError: local variable 'foo' referenced before assignment

Ex3: Duck Typing Make all the modification necesseray for the code to run correctly


In [34]:
if __name__ == '__main__':
    print type('a')
    print type("a")
    print type('acknowledge')
    print type(u'acknowledge')
    print type(3)
    print type(3.1)
    print type(foo2)
    print '---------------'
    print isinstance('a', str)
    print isinstance(u'a', str)
    print isinstance('a', bool)
    print isinstance([], (tuple, list, set))
    print issubclass(str, bool)
    print '---------------'
    print bool('a')
    print bool('')
    print bool([])
    print '---------------'
    print 'a' + 'b'
    print 1 + 2
    print 1 + 2.1
    print 'a' + 1
    print '---------------'
    print 'Py' * 2
    print 'Py'-'thon'
    print [1,2,3]*2
    print 1 / 2
    print 1 % 2
    print 1 / 2.
    print 1 % 2.
    assert(1 / 2 == 1 / 2.)
    print 3.621 % 2.5
    print 8.621 % 7.5
    assert(3.621 % 2.5 == 8.621 % 7.5)


<type 'str'>
<type 'str'>
<type 'str'>
<type 'unicode'>
<type 'int'>
<type 'float'>
---------------
True
False
False
True
False
---------------
True
False
False
---------------
ab
3
3.1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-34-13b450b2a6e3> in <module>()
     21     print 1 + 2
     22     print 1 + 2.1
---> 23     print 'a' + 1  # for Dar: they are supposed to perform an explicite conversion here
     24     print '---------------'
     25     print 'Py' * 2

TypeError: cannot concatenate 'str' and 'int' objects

Ex4: Catching/Trhowing the exceptions

Make the function below work in accordance with specification


In [7]:
def simple_adder(var1, var2):  #tester function: no need to modify
    return var1 + var2

def adder(var1, var2):
    """
    Adds var1 and var2.
    In case their types are incompatible, both variables are converted to strings and their sum returned.
    The variables are always printed before being returned
    If unrelated error is raised from within variables, it will be passed along and no matter
    If a TypeError was raised, it will be printed before printing the variables
    """
    return var1 + var2

if __name__ == '__main__':
    print adder(1, 1)
    print adder(1, 3.1)
    print adder('1', 'a')
    print adder('a', 1)
    print adder(2, 'b')


2
4.1
1a
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-2a37fbc00fa0> in <module>()
     16     print adder(1, 3.1)
     17     print adder('1', 'a')
---> 18     print adder('a', 1)
     19     print adder(2, 'b')

<ipython-input-7-2a37fbc00fa0> in adder(var1, var2)
     10     If a TypeError was raised, it will be printed before printing the variables
     11     """
---> 12     return var1 + var2
     13 
     14 if __name__ == '__main__':

TypeError: cannot concatenate 'str' and 'int' objects

However, complexity comes at the expense of performance


In [8]:
%timeit 1 + 1
%timeit simple_adder(1, 1)
%timeit adder(1, 1)


100000000 loops, best of 3: 13 ns per loop
10000000 loops, best of 3: 84.9 ns per loop
10000000 loops, best of 3: 84.4 ns per loop

Ex5: Mutability and dict keys Correct all the errors


In [10]:
import hashlib
from hashlib import md5

dico = {'Name': 'Zara', 'Age': 27}
sup_str = 'super'


class crazy_string(object):
    
    def __init__(self, value):
        self.value = value
    
    def __hash__(self):
        m = hashlib.md5()
        m.update(self.value)
        return int('0x'+m.hexdigest(),0)
    
    def __eq__(self, other):
        if type(other) == type(self):
            return self.value == other.value
        else:
            return False
    

if __name__ == "__main__":
    print "Value : %s" %  dict.get('Age')
    print "Value : %s" %  dict.get('Sex', "Never")
    print sup_str, '\t', hex(id(sup_str))
    sup_str += '!'
    print sup_str, '\t', hex(id(sup_str))
    print '---------------------'
    cstr = crazy_string('super')
    dico[cstr] = 'sure?'
    print dico
    cstr.value = 'super!'
    print dico
    dico[cstr]


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-e18767a33d71> in <module>()
     24 
     25 if __name__ == "__main__":
---> 26     print "Value : %s" %  dict.get('Age')
     27     print "Value : %s" %  dict.get('Sex', "Never")
     28     print sup_str, '\t', hex(id(sup_str))

TypeError: descriptor 'get' requires a 'dict' object but received a 'str'

Ex6: Mutability: pointers v.s. copies

Explain what happens below and correct the erroneous lines


In [11]:
global foo
foo = ['bar']


def ret_foo():
    return foo


def nice_foo():
    foo = ['nice']
    return foo


def real_nice_foo():
    foo[0] = 'nice'
    
    
def foo_mod_function():
    global foo
    foo += ['bar'] 

    
def simple_function(value = 0):
    print value
    value += 1
    return value
    
    
def first_closure(value = [0]):
    print value[0]
    value[0] += 1
    return value[0]
    
    
if __name__ == '__main__':
    print foo 
    print ret_foo(), hex(id(foo))
    print '---------------'
    same_foo = nice_foo()
    print same_foo, hex(id(same_foo))
    print foo, hex(id(foo))
    print '---------------'
    real_nice_foo()
    print foo, hex(id(foo))
    print '---------------'
    foo_mod_function()
    print '---------------'
    simple_function()
    simple_function()
    simple_function(1)
    assert(simple_function() == simple_function())
    print '---------------'
    first_closure()
    first_closure()
    first_closure([1])
    first_closure([1])
    first_closure()
    print '---------------' 
    print dir(first_closure)
    print first_closure.func_closure
    print first_closure.func_defaults
    print first_closure.__closure__
    print '---------------'
    assert(first_closure() == first_closure())


['bar']
['bar'] 0xa3bc908L
---------------
['nice'] 0xa3bc8c8L
['bar'] 0xa3bc908L
---------------
['nice'] 0xa3bc908L
---------------
---------------
0
0
1
0
0
---------------
0
1
1
1
2
---------------
['__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__doc__', '__format__', '__get__', '__getattribute__', '__globals__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name']
None
([3],)
None
---------------
3
4
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-11-ea443d211f4a> in <module>()
     62     print first_closure.__closure__
     63     print '---------------'
---> 64     assert(first_closure() == first_closure())

AssertionError: 

Ex7: Closures in Closures enclose things well

Get rid of errors and ensure that everything works as the docs say


In [18]:
def closure(variable, memory = []):
    print memory
    memory += [variable]

    
def closure_generator(initial_memory):
    
    def inner_closure(variable, memory=[initial_memory]):
        """
        Adds variavble to the memory. If types mismatch, cast everything to string
        """
        print memory
        memory[0] += variable
    
    return inner_closure


if __name__ == "__main__":
    print dir(closure)
    print closure.func_defaults
    closure(3)
    closure(2)
    closure('pretty')
    closure('')
    _closure = closure_generator(3)
    _closure(1)
    _closure(3)
    _closure('pretty')


['__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__doc__', '__format__', '__get__', '__getattribute__', '__globals__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name']
([],)
[]
[3]
[3, 2]
[3, 2, 'pretty']
[3]
[4]
[7]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-03cf97d0591c> in <module>()
     26     _closure(1)
     27     _closure(3)
---> 28     _closure('pretty')

<ipython-input-18-03cf97d0591c> in inner_closure(variable, memory)
     11         """
     12         print memory
---> 13         memory[0] += variable
     14 
     15     return inner_closure

TypeError: unsupported operand type(s) for +=: 'int' and 'str'

Ex8: Wrappers wrap well

Modify the wrapper below so that it caches calls to a function

you might need the following knowledge:

- ```*args``` give you a list of unnamed arguments passed to the function
- `**kwargs` give you a dict of named arguments passed to the function
- `kwargs.update(dict(zip(func.func_code.co_varnames, args)))` allows you to transform unnamed arguments into named ones
- tuples are only equal if their contents and order in which contents appear are identical
- you can enumerate the contents of a dict with a `name_of_dict.iteritems()` command
- you can sort in a simple maner with a `sorted(whatever you need to sort)`
- dicts only take as keys hashable items

In [50]:
# courtesy http://www.brianholdefehr.com/decorators-and-functional-python
def logging_decorator(func):
    def wrapper():
        wrapper.count += 1
        print "The function I modify has been called {0} times(s).".format(
              wrapper.count)
        func()
    wrapper.count = 0
    return wrapper


def a_function():
    print "I'm a normal function."

def print_args(*args):
    print args
    
def print_kwargs(**kwargs):
    print kwargs
    
if __name__ == "__main__":
    modified_function = logging_decorator(a_function)

    modified_function()
    modified_function()
    
    @logging_decorator
    def a_function():
        print "I'm a normal function."
    a_function()
    lst = 1, 2, 3, 4
    print_args(lst)
    print_args(*lst)
    dct = {'a':1, 'b':2, 'c':3}
#     print_kwargs(dct)
    print_kwargs(**dct)


The function I modify has been called 1 times(s).
I'm a normal function.
The function I modify has been called 2 times(s).
I'm a normal function.
The function I modify has been called 1 times(s).
I'm a normal function.
((1, 2, 3, 4),)
(1, 2, 3, 4)
{'a': 1, 'c': 3, 'b': 2}

In [113]:
# Your solution here please

def caching_decorator(func):
    
    def wrapper(*args, **kwargs):
        pass 
    
    return wrapper


def a_function(arg1, arg2):
    print arg1, arg2
    return arg1 + arg2
    
if __name__ == "__main__":
    modified_function = caching_decorator(a_function)

    print modified_function(1, 1)
    print modified_function('a', 'b')
    print modified_function('a', 'b')


1 1
2
a b
ab
ab

Ex9: runtime class modification

Explain where the erorrs come from and correct them


In [19]:
class little_class_in_the_prairie(object):
    
    def __init__(self, payload):
        self.payload = payload
    
    @staticmethod
    def present_myself(asker='John'):
        return "I am a little house in the prairie and I know '%s'"% asker
    
    @classmethod
    def present_myself_1(cls, asker='John'):
        return "I am a %s and I know '%s'"% (cls, asker)
    
    def present_myself_2(self, asker='John'):
        return "I am a little house in the prairie, I know '%s' and my content is '%s'"% (asker, self.payload)
 
    
def outside_function(self):
    return self.payload


if __name__ == "__main__":
    lcitp = little_class_in_the_prairie('inner payload')
    print lcitp.payload
    print '----------------'
#     print little_class_in_the_prairie.payload
    print little_class_in_the_prairie.present_myself()
    print lcitp.present_myself()
    print little_class_in_the_prairie.present_myself_1()
    print lcitp.present_myself_1()
#     print little_class_in_the_prairie.present_myself_2()
    print lcitp.present_myself_2()
    print '----------------'
    print getattr(lcitp, 'payload')
    print getattr(lcitp, '__init__')
    print getattr(lcitp, 'present_myself_2')
    print getattr(lcitp, 'present_myself_2')()  # < Yep, this is totally a Pythonic currying notation
    print '----------------'
    print hasattr(lcitp, 'present_myself_2')
    print hasattr(lcitp, 'not_my_function')
    setattr(lcitp, 'not_my_function', outside_function)
    print hasattr(lcitp, 'not_my_function')
    print lcitp.not_my_function()


inner payload
----------------
I am a little house in the prairie and I know 'John'
I am a little house in the prairie and I know 'John'
I am a <class '__main__.little_class_in_the_prairie'> and I know 'John'
I am a <class '__main__.little_class_in_the_prairie'> and I know 'John'
I am a little house in the prairie, I know 'John' and my content is 'inner payload'
----------------
inner payload
<bound method little_class_in_the_prairie.__init__ of <__main__.little_class_in_the_prairie object at 0x000000000A3C27F0>>
<bound method little_class_in_the_prairie.present_myself_2 of <__main__.little_class_in_the_prairie object at 0x000000000A3C27F0>>
I am a little house in the prairie, I know 'John' and my content is 'inner payload'
----------------
True
False
True
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-19-61a861dc521d> in <module>()
     41     setattr(lcitp, 'not_my_function', outside_function)
     42     print hasattr(lcitp, 'not_my_function')
---> 43     print lcitp.not_my_function()
     44 

TypeError: outside_function() takes exactly 1 argument (0 given)

Ex10: Advanced wrappers

Modify the wrapper below so that it times the execution of the function it wraps. Make sure the wrapper preserves the name of the function and it's documentation.


In [20]:
def args_as_ints(f):
    def g(*args, **kwargs):
        args = [int(x) for x in args]
        kwargs = dict((k, int(v)) for k, v in kwargs.items())
        return f(*args, **kwargs)
    return g

@args_as_ints
def funny_function(x, y, z=3):
    """Computes x*y + 2*z"""
    return x*y + 2*z

Ex11: Pythonic map-reduce


In [22]:
def increment_list(lst):
    newlist = []
    for elt in lst:
        newlist.append(elt+1)
    return newlist

def plus_one(integer):
    return integer+1

def sum_integers(integer1, integer2):
    return integer1+integer2

if __name__ == '__main__':
    test_lst =  [1,1,1,1]

    print increment_list(test_lst)
    print map(plus_one, test_lst)
    print '-----------------'
    %timeit increment_list(test_lst)
    %timeit map(plus_one, test_lst)
    print '-----------------'
    print sum(test_lst)
    print reduce(sum_integers, test_lst)


[2, 2, 2, 2]
[2, 2, 2, 2]
-----------------
1000000 loops, best of 3: 509 ns per loop
1000000 loops, best of 3: 672 ns per loop
-----------------
4
4

Ex12: Iterators, itertools and list comprehensions

For the first two blocks of code, explain what each line does, correct the eventual errors. For the third block of code, read the documentation for all the imported module and illustrate it with an example


In [23]:
from time import sleep

def count_to_3():
    print "begin"
    for i in range(3):
        print "before yield", i
        yield i
        print "after yield", i
    print "end"
    
if __name__ == "__main__":
    counter = count_to_3()
    print '--------------------'
    counter.next()
    sleep(1)
    counter.next()
    sleep(1)
    counter.next()
    sleep(1)
    counter.next()
    print '--------------------'
    counter = count_to_3()
    print '--------------------'
    for i in counter:
        print i


--------------------
begin
before yield 0
after yield 0
before yield 1
after yield 1
before yield 2
after yield 2
end
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-23-ae26215dec34> in <module>()
     18     counter.next()
     19     sleep(5)
---> 20     counter.next()
     21     print '--------------------'
     22     counter = count_to_3()

StopIteration: 

In [26]:
def increment_list(lst):
    newlist = []
    for elt in lst:
        newlist.append(elt+1)
    return newlist

def plus_one(integer):
    return integer+1

def sum_integers(integer1, integer2):
    return integer1+integer2

if __name__ == '__main__':
    test_lst =  [1,1,1,1]

    print increment_list(test_lst)
    print [elt+1 for elt in test_lst]
    print map(plus_one, test_lst)
    print '-----------------'
    %timeit increment_list(test_lst)
    %timeit [elt+1 for elt in test_lst]
    %timeit map(plus_one, test_lst)
    print '-----------------'
    print sum(test_lst)
    print reduce(sum_integers, test_lst)
    print '-----------------'
    
    test_generator = (elt for elt in test_lst)
    print test_generator
    for i in test_generator:
        print i
    print '-----------------'
    dico = dict( (_i, elt) for _i, elt in enumerate(test_lst))
    print dico
    dico = dict( (_i, elt) for _i, elt in enumerate(test_generator))
    print dico


[2, 2, 2, 2]
[2, 2, 2, 2]
[2, 2, 2, 2]
-----------------
1000000 loops, best of 3: 495 ns per loop
1000000 loops, best of 3: 217 ns per loop
1000000 loops, best of 3: 633 ns per loop
-----------------
4
4
-----------------
<generator object <genexpr> at 0x000000000A3B0BD0>
1
1
1
1
-----------------
{0: 1, 1: 1, 2: 1, 3: 1}
{}

In [27]:
def layer_1(iterator):
    """ add one """
    for elt in iterator:
        yield elt + 1

def layer_2(iterator):
    """  convert ot string """ 
    for elt in iterator:
        yield str(elt)

def layer_3(iterator):
    """  add super-prefix """ 
    for elt in iterator:
        yield 'super' + elt

tst_lst = [1, 2, 3, 4, 5]

if __name__ == "__main__":
    for elt in layer_3(layer_2(layer_1(tst_lst))):
        print elt
    
    print '----------------------------------'
    
    for elt in reduce((lambda x, y: y(x)), [layer_1, layer_2, layer_3], tst_lst):  # in functional languages, we are doing a fold
        print elt


super2
super3
super4
super5
super6
----------------------------------
super2
super3
super4
super5
super6

In [28]:
from itertools import cycle, chain, compress, ifilter, imap, izip, izip_longest, product, permutations, combination


---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-28-4c0ed2fce9da> in <module>()
----> 1 from itertools import cycle, chain, compress, ifilter, imap, izip, izip_longest, product, permutation, combination

ImportError: cannot import name permutation

Ex13: Lambdas

Sort the following dicts using a lambda function. Here are some informations you might need:

- 'sorted()' function will perform the sorting of a list for you
- 'key' function allows you to select what element will be selected for the comparison if value is a lists of elements 
- 'reverse' allows to invert the sorting order
- '{}.iteritems()' will create an iterator of (key, value) pairs from a dictionary

In [24]:
dct = {'a':(1, 1), 'b':(1, 2), 'c':(2, 3)}

Ex14: Functional Python

use a lambda function to count the number of elements in the following lists and then to count the number of elements with length over 4 in the strings below:


In [29]:
# map-imap-ifilter-reduce
from itertools import ifilter, imap, count
from collections import defaultdict, Counter

a = [1, 2, 3]
b = [4, 5, 6, 7]
c = [8, 9, 1, 2, 3]

s1 = 'fs a prul'
s2 = 'prul a fs tke i dama'
s3 = 'dama ka a i prul'

# Solution:

Ex15: collections Implement each task in the simplest manner you could think of, and then using the collections. Compare the speed of implementations.


In [30]:
from collections import namedtuple, deque, Counter, OrderedDict, defaultdict

dct = {'a':1,'b':2,'c':3}
master_dct = {}
st = [1,2,3,4,5,6,1,3,7,4,1,6,7]
master_lst = []

def naive_sort_dict():
    pass

def naive_dict_stabilization():
    pass

def naive_counter():
    pass

def naive_dict_ordering():
    pass

if __name__ == "__main__":
    pass

Ex16: Basic threading


In [ ]:
from multiprocessing import Pool

# WARNING: threads and multiprocessing don't work with REPL but require parsed files.
# ipython is ok

# Map, imap, 
# partition
# reduce as chain of map responses to which a reduce function is applied

def f(x):
    return x*x

def f2(x1, x2):
    return "%s; %s", (x1, x2)

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
    for elt in (p.imap(f, ['a', 'b', 'c'], ['b','c','d'])):
        print elt
    p.terminate()

Ex 17: using RAM profiler

1) Download and install python memory profiler:

pip install -U memory_profiler

2) Execute a memory profiler on the following function (NB: you'll need to fill it in locall):

3) Wrap your function with a logging wrapper from a previous exercice are re-run the profiler.

What do you observe?


In [31]:
# you will need to copy this code you your local machine and run it there

from memory_profiler import profile

@profile
def my_func():
    a = [1] * (10 ** 6)
    b = [2] * (2 * 10 ** 7)
    del b
    return a

if __name__ == '__main__':
    my_func()


Filename: <ipython-input-31-776b2651beed>

ERROR: Could not find file <ipython-input-31-776b2651beed>
NOTE: %mprun can only be used on functions defined in physical files, and not in the IPython environment.

Ex 18: Basic numpy arrays and their variants

1) Python matrix and python array are not the same. For the sake of simplicity, everyone is usung np.array and not np.matrix/
2) array creation routines : random data, linspace, mesh, diag, zeros
3) .dtype and astype(np.float)
4) show np.save, np.load
5) .shape, not dim(); [:, 1]; newaxis; boolean indexing; stepping [::2], start/stop: [2:], [:-1]
6) Linear algebra basics: `*`, .T/.H, dot(a,b), .I (achtung: raises exception, generally better to use explicit solvers)
7) reshape, pad, newaxis, repeat, concatenate
8) copy v.s. deepcopy
9) function vectorization
10) boolean element logical tests
11) scipy.sparse.lilmatrix
12) show how embedded list manipulation become easier in numpy

In [2]:
import numpy as np

def f1(lst_arg):
    collection_list = []
    for sublist in lst_arg:
        collection_list.append([])
        for element in sublist:
            collection_list[-1].append(element**2)
            
    return collection_list

def f2(lst_args):
    return np.array(lst_args)**2
            
if __name__ == "__main__":
    lst = [[i]*10 for i in range (0, 10)]
#     print lst
#     print f1(lst)
#     print f2(lst)
    %timeit f1(lst)
    %timeit f2(lst)


100000 loops, best of 3: 12.9 µs per loop
100000 loops, best of 3: 6.38 µs per loop

In [37]:
from itertools import combinations_with_replacement

arr = np.zeros((4, 4, 2))
for i, j in combinations_with_replacement(range(0,4), 2):
    arr[i, j] = (i, j)
    arr[j, i] = (j, i)

print arr.shape

lines =arr[:, :, 0]
columns = arr[:, :, 1]
print lines
print columns

print lines[1, :]
print columns[:, 1]

lines[::2,:]


(4L, 4L, 2L)
[[ 0.  0.  0.  0.]
 [ 1.  1.  1.  1.]
 [ 2.  2.  2.  2.]
 [ 3.  3.  3.  3.]]
[[ 0.  1.  2.  3.]
 [ 0.  1.  2.  3.]
 [ 0.  1.  2.  3.]
 [ 0.  1.  2.  3.]]
[ 1.  1.  1.  1.]
[ 1.  1.  1.  1.]
[[ 1.  0.]
 [ 0.  0.]]
[['0.0' '0.0']
 ['0.0' '0.0']]

In [ ]:
print lines>1
print columns>1
print np.logical_or(lines>1, columns>1)
print np.logical_and(lines>1, columns>1)
print np.logical_xor(lines>1, columns>1)

print lines[lines>1]
new_arr = np.zeros((4,4))
new_arr[lines>1] = lines[lines>1]
print new_arr

In [40]:
A = np.zeros((2, 2))
B = A
B[0, 0] = 1
print A

print A.dtype

A = np.zeros((2, 2))
A = A.astype(np.str)
B = A.copy()
B[0, 0] = '1.0'
print A

print A.dtype


[[ 1.  0.]
 [ 0.  0.]]
float64
[['0.0' '0.0']
 ['0.0' '0.0']]
|S32

In [52]:
print np.pad(lines,((1,2),(3,4)), 'edge')
print np.reshape(arr, (2,4,4))
print lines[np.newaxis,:, :]
print np.repeat(lines, 2, axis=1)
print np.concatenate((lines, columns), axis=1)
print n_arr = np.rollaxis(arr, 2)
print n_arr.shape
print arr.shape


Out[52]:
(4L, 4L, 2L)

In [66]:
print lines*columns
print lines
lines *= 10
print lines
lines /= 10
print lines
hmat = np.dot(lines, columns)
print hmat
print np.linalg.eigh(hmat)


[[ 0.  0.  0.  0.]
 [ 0.  1.  2.  3.]
 [ 0.  2.  4.  6.]
 [ 0.  3.  6.  9.]]
[[ 0.  0.  0.  0.]
 [ 1.  1.  1.  1.]
 [ 2.  2.  2.  2.]
 [ 3.  3.  3.  3.]]
[[  0.   0.   0.   0.]
 [ 10.  10.  10.  10.]
 [ 20.  20.  20.  20.]
 [ 30.  30.  30.  30.]]
[[ 0.  0.  0.  0.]
 [ 1.  1.  1.  1.]
 [ 2.  2.  2.  2.]
 [ 3.  3.  3.  3.]]
[[  0.   0.   0.   0.]
 [  0.   4.   8.  12.]
 [  0.   8.  16.  24.]
 [  0.  12.  24.  36.]]
(array([ -2.40201475e-15,   0.00000000e+00,   1.50133186e-15,
         5.60000000e+01]), array([[ 0.        ,  1.        , -0.        ,  0.        ],
       [ 0.5976232 ,  0.        , -0.75592191,  0.26726124],
       [-0.74464991,  0.        , -0.39972769,  0.53452248],
       [ 0.29722554,  0.        ,  0.5184591 ,  0.80178373]]))

In [18]:
print np.random.rand(2,2)
print np.zeros((2,2))
print np.ones((2,2))
print np.linspace(0, 3, 4).reshape((2,2))
print np.diag(np.array([1,2]))


[[ 0.52291016  0.0861436 ]
 [ 0.55808157  0.54514512]]
[[ 0.  0.]
 [ 0.  0.]]
[[ 1.  1.]
 [ 1.  1.]]
[[ 0.  1.]
 [ 2.  3.]]
[[1 0]
 [0 2]]

Ex 19: Using numpy arrays indexing

1) In the following exercice, create a function without "for" or "while" loops that takes in a numpy matrix and index list and returns matrix sorted by the indexes provided, so that the matrix is first sorted on the column a), then column b), then column c), etc..., a-.. being indexes provided.

2) Write a unit-test that

  • tests if the sorting was performed correctly
  • measures the amoung of RAM you've used
  • measures the execution time of your function

3) Advanced, thus optional:

  • Create a numpy array representing a mapping from unique identifiers to 5 different "traits", that are a mix of string, float and integers.
  • Add a sixth trait that is a mapping to 10 different values, each of which is a 5-tuple of strings.
  • Fill the array with sample data (~100 unique identifiers; )
  • Create a routine that allows to:
    • retrieve the set of unique identifiers that posess a defined trait
    • retieve the set of unique identifiers for which one of the 5-tuple of strings contain a defined substring

In [13]:
import numpy as np

# select filtering indexes
filter_indexes = [1, 3]
# generate the test data
raw_data = np.random.randint(0, 4, size=(50,5))


# create a column that we would use for indexing
index_columns = raw_data[:, filter_indexes]

# sort the index columns by lexigraphic order over all the indexing columns
argsorts = np.lexsort(index_columns.T)

# sort both the index and the data column
sorted_index = index_columns[argsorts, :]
sorted_data = raw_data[argsorts, :]

# in each indexing column, find if number in row and row-1 are identical
# then group to check if all numbers in corresponding positions in row and row-1 are identical
autocorrelation = np.all(sorted_index[1:, :] == sorted_index[:-1, :], axis=1)

# find out the breakpoints: these are the positions where row and row-1 are not identical
breakpoints = np.nonzero(np.logical_not(autocorrelation))[0]+1

# finally find the desired subsets 
subsets = np.split(sorted_data, breakpoints)

Ex 20: fitting a function


In [19]:
from scipy import stats

X = stats.poisson(3.5)
Y = stats.norm()

t_statistic, p_value = stats.ttest_ind(X.rvs(size=1000), X.rvs(size=1000))

print "t-statistic =", t_statistic
print "p-value =", p_value


t-statistic = -0.907220652173
p-value = 0.364399525926

In [31]:
from scipy.optimize import curve_fit
from matplotlib import pyplot as plt

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

xdata = np.linspace(0, 4, 50)
y = func(xdata, 2.5, 1.3, 0.5)
ydata = y + 0.2 * np.random.normal(size=len(xdata))
popt, pcov = curve_fit(func, xdata, ydata)
print popt
print pcov

plt.plot(xdata, ydata)
plt.plot(xdata, func(xdata, *popt))
plt.show()


[ 2.58142367  1.50786117  0.50400054]
[[ 0.02050211  0.01022599 -0.00073763]
 [ 0.01022599  0.02869223  0.00608063]
 [-0.00073763  0.00608063  0.00283359]]

In [22]:
from scipy import odr

def lin(p, x):
    a, b, c, d = p
    return c*x

def pol2(p, x):
     a, b, c, d = p
     return b*x**2 + c*x + d


def pol3(p, x):
    a, b, c, d = p
    return a*x**3 + b**x**2 + c*x + d

def regress(x, y, x_sd, y_sd, function_to_fit=pol3, name_to_plot='', figure_no=1):

    def plot_result():
        x_fit = np.linspace(np.min(x)*0.95, np.max(x)*1.05, 1000)
        y_fit = function_to_fit(out.beta, x_fit)
        lin_fit = lin(lin_out.beta, x_fit)
        plt.subplot(2, 2, figure_no)
        plt.title(name_to_plot+': \n %.2fx^3 + %.2fx^2 + %.2fx + %.2f v.s. %.2fx. \n Res var gain: x %.2f' % tuple(out.beta.tolist()+[lin_out.beta.tolist()[2]]+[lin_out.res_var/out.res_var]))
        plt.errorbar(x, y, xerr=x_sd, yerr=y_sd, linestyle='None', marker='x')
        plt.plot(x_fit, y_fit, 'g')
        plt.plot(x_fit, lin_fit, 'r')
        plt.autoscale(tight=True)

    model = odr.Model(function_to_fit)
    data = odr.RealData(x, y, sx=x_sd, sy=y_sd)
    _odr = odr.ODR(data, model, beta0=[1., 1., 10., 0.01])
    out = _odr.run()

    lin_model = odr.Model(lin)
    lin_odr = odr.ODR(data, lin_model, beta0=[0., 0., 10., 0.01])
    lin_out = lin_odr.run()

    lin_out.pprint()

    plot_result()

    return out.beta

Ex 21: Matplotlib and pyplot:


In [53]:
from matplotlib import pyplot as plt

years = np.linspace(1800, 2010, 210)
temp_data = np.random.rand(210)

fig, ax = plt.subplots(figsize=(14,4))
ax.plot(years, temp_data)
ax.axis('tight')
ax.set_title('tempeatures in Stockholm')
ax.set_xlabel('year')
ax.set_ylabel('temperature (C)');
plt.show()

# axes sharing:
f, (ax1, ax2, ax3) = plt.subplots(3, sharex=True, sharey=True)
ax1.plot(years, temp_data)
ax1.set_title('Sharing both axes')
ax2.plot(years, temp_data, 'g.')
ax3.scatter(years, 2 * temp_data ** 2 - 1, color='r')
f.subplots_adjust(hspace=0)
plt.setp([a.get_xticklabels() for a in f.axes[:-1]], visible=False)
plt.show()

r_array = np.random.rand(100, 100)
plt.imshow(r_array, interpolation='nearest', cmap='gray')
plt.colorbar()
plt.show()


Additional:

Show that each module gives an access to a set of elements that are available as a dict mapping to different elements


In [5]:
import math

print(dir(math))


['__doc__', '__name__', '__package__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'hypot', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'modf', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'trunc']

Show that there are actually closures inside the functions we are working on


In [21]:
def f1(arg1=None):
    return None


def f_outer():
    
    def f_inner(params):
        return a+params
    
    a = 1
    
    return f_inner


if __name__ == "__main__":
    print dir(f1)
    print f1.__class__
    print f1.__defaults__
    
    print dir(f_outer)
    print f_outer()
    print dir(f_outer())
    print f_outer().__closure__


['__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__doc__', '__format__', '__get__', '__getattribute__', '__globals__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name']
<type 'function'>
(None,)
['__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__doc__', '__format__', '__get__', '__getattribute__', '__globals__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name']
<function f_inner at 0x00000000040AC278>
['__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__doc__', '__format__', '__get__', '__getattribute__', '__globals__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name']
(<cell at 0x0000000004157318: int object at 0x0000000001D28478>,)

/ operator means integer division on integers and float division on floats in python 2, but in python 3 the division is always float.

* operator will multiply any number-likes, but will repeat a string or a list


In [2]:
print 1/2

print 1./2


0
0.5

In [3]:
from __future__ import division
print 1/2


0.5

In [5]:
print 'abc'*3
print [1, 2, 3]*3


abcabcabc
[1, 2, 3, 1, 2, 3, 1, 2, 3]

"%s, %f" formatting allows to easily format strings you would like to pring


In [8]:
print "%s is ok, but: \t %s is better off as %.2f" %(3., 1/3, 1/3)


3.0 is ok, but: 	 0.333333333333 is better off as 0.33

Using docstrings properly is absolutely essential for writing good python code. Not only it will help the tomorrow-you to figure out what you've just did, combined with Sphinx it makes documentation creation for your code really easy.


In [11]:
def func1(s):
    """
    Print a string 's' and tell how many characters it has    
    """
    
    print(s + " has " + str(len(s)) + " characters")

if __name__ == "__main__":
    print func1.__doc__
    print help(func1)


    Print a string 's' and tell how many characters it has    
    
Help on function func1 in module __main__:

func1(s)
    Print a string 's' and tell how many characters it has

None