Functions

The basic unit of code organization in Python is the function. Don't write classes until you really have to (see this for more details). As you've already seen, functions are defined with the def keyword:


In [ ]:
def a_func(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

As you can see, multiple return statements are just fine. If the end of a function is reached, None is returned.
Each function can have positional and keyword arguments. Keyword arguments are mostly used to specify default values and optional values. They can be passed in arbitrary order, and they can also be passed positionally.


In [ ]:
a_func(3, 4)

In [ ]:
a_func(3, 4, 2)

In [ ]:
a_func(3, 4, z=12)
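
The calls above can be extended to cover the remaining rules: keyword arguments in arbitrary order, and None being returned when the end of a function is reached. This is only a minimal sketch; no_return is a hypothetical helper that is not part of the original notebook.

```python
def a_func(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

# Every parameter can be passed by keyword, in any order.
result_keywords = a_func(z=2, y=4, x=3)  # same as a_func(3, 4, 2)

# The default z=1.5 is used when z is omitted.
result_default = a_func(3, 4)

def no_return(x):
    # Hypothetical helper: there is no return statement, so None is returned.
    x + 1

print(result_keywords, result_default, no_return(1))
```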

Keyword arguments must follow positional arguments. The following call is a SyntaxError:


In [ ]:
a_func(z=12, 3, 4)

Namespaces

A function can see variables in two different scopes: global and local.
Variables that are assigned within a function are local variables. The local namespace is created when the function is called and destroyed when the function returns.


In [ ]:
def func():
    inner = []
    for i in range(5):
        inner.append(i)

In [ ]:
func()
inner

Here, inner is a local variable. That means inner is created when the function is entered, 5 elements are added, and it is destroyed when the function returns. It can't be accessed from the outside, so the cell above raises a NameError.


In [ ]:
a = []
def func():
    for i in range(5):
        a.append(i)

In [ ]:
func()
a

But what happens when we have conflicting variable names? What do you think the value of a will be after func is called?


In [ ]:
a = []
def func():
    a = [1, 2]
    for i in range(5):
        a.append(i)

In [ ]:
func()
a

The assignment inside func created a new local variable a, so the global a stayed empty. To make Python do what we intended, we need to use the global keyword.


In [ ]:
a = []
def func():
    global a
    a = [1, 2]
    for i in range(5):
        a.append(i)

In [ ]:
func()
a

In [ ]:
Image('images/mem3.jpg')

Beware: Don't use global variables!
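
One way to follow this advice is to pass data in and return results instead of mutating module-level state. The sketch below shows that pattern; the name first_n is made up for illustration.

```python
def first_n(n):
    # Build the list locally and hand it back to the caller.
    result = []
    for i in range(n):
        result.append(i)
    return result

# The caller decides which name the result is bound to.
a = first_n(5)
print(a)
```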

Functions can be defined anywhere, even inside other functions.


In [ ]:
def f(x):
    def g(x):
        return x + 1
    return g(x)**2
print(f(4))
g(3)

Return multiple values


In [ ]:
def f():
    return 1, 5, 3

a, b, c = f()
print(a, b, c)

But are these really three separate values? What type is a comma-separated list of values?
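
A quick way to check this yourself is to keep the result in a single name and inspect its type. A minimal sketch:

```python
def f():
    return 1, 5, 3

# Without unpacking, the whole result is bound to one name.
result = f()
print(type(result))

# Unpacking assigns each element of that object to its own name.
a, b, c = result
```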

Functions are first-class citizens


In [ ]:
def f(text, g):
    return g(text)

In [ ]:
f('foo', str.upper)
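
Because functions are ordinary objects, they can also be stored in data structures and looked up later. The sketch below assumes a made-up helper reverse and a made-up dict transformations; neither is part of the original notebook.

```python
def f(text, g):
    return g(text)

def reverse(s):
    # Hypothetical helper: returns the string backwards.
    return s[::-1]

# Functions stored as dict values, like any other object.
transformations = {
    'upper': str.upper,
    'title': str.title,
    'reversed': reverse,
}

results = {name: f('foo bar', func) for name, func in transformations.items()}
print(results)
```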

Anonymous functions (aka lambda)

Don't be scared by the name lambda. It refers to the lambda calculus by Alonzo Church. It really just means "function without a name". The syntax looks like this:


In [ ]:
lambda x: x ** 2

A lambda function can be assigned to a variable which makes this variable callable.


In [ ]:
f = lambda x: x ** 2
f(2)

These functions often come in handy in data mining, since many data transformation functions take functions as arguments. A typical example is the built-in sorted function.


In [ ]:
sorted??

Say we want to sort the following data by the second value of each tuple.


In [ ]:
data = [(4, 0), (1, 4), (3, 9), (2, 1)]

In [ ]:
sorted(data)

In [ ]:
sorted(data, key=lambda x: x[1])
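
The key argument accepts any callable, not only a lambda. As a sketch, the same sort can be written with operator.itemgetter from the standard library, and reverse=True flips the order:

```python
from operator import itemgetter

data = [(4, 0), (1, 4), (3, 9), (2, 1)]

# itemgetter(1) behaves like lambda x: x[1].
by_second = sorted(data, key=itemgetter(1))

# reverse=True sorts in descending order of the key.
by_second_desc = sorted(data, key=lambda x: x[1], reverse=True)

print(by_second)
print(by_second_desc)
```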

Closures

At first encounter, closures are black magic and take some getting used to. But once mastered, they are a powerful tool. They are mentioned here for completeness; you can write good code without closures.


In [ ]:
def make_closure(a):
    def closure():
        print('I know the secret: %d' % a)
    return closure

In [ ]:
closure = make_closure(5)
closure()

What has just happened?

  • A function (make_closure) containing an inner function was defined. The inner function closes over the variable a of the enclosing function (hence the name closure).
  • The make_closure function got called. During its execution the function closure got defined and finally returned to the caller (note: the function itself is returned, not its result!).
  • The returned function gets called.
  • During its execution the closure function accesses the variable a from the scope of make_closure, even though that call has already finished, and receives the value 5 from it.
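
The captured value can be made visible. In CPython, a function object keeps its captured variables in cells reachable through its __closure__ attribute. The sketch below returns the string instead of printing it so the result is easier to inspect; the names captured and other are illustrative.

```python
def make_closure(a):
    def closure():
        return 'I know the secret: %d' % a
    return closure

closure = make_closure(5)

# The captured variable lives on in a cell attached to the function object.
captured = closure.__closure__[0].cell_contents

# Each call to make_closure creates an independent closure.
other = make_closure(7)

print(closure(), '|', other(), '|', captured)
```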

Generators

Before talking about generators, we need to learn more about iterators.

One of the reasons for Python's popularity is its unified way to iterate over sequences. This works not just for tuples, dicts and other built-ins but also for custom types. Objects that implement the __iter__() method are automatically iterable and thereby compatible with the vast Python software stack. To give you an idea of the iterator protocol, see the following.


In [ ]:
a = [1, 2, 3]

In [ ]:
iter_a = iter(a)
iter_a

In [ ]:
next(iter_a)
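
To illustrate the protocol for a custom type, here is a minimal sketch of a class that implements __iter__ and __next__ itself. Countdown is a hypothetical example class, not part of the original notebook. An iterator signals that it is exhausted by raising StopIteration.

```python
class Countdown:
    """Hypothetical example class: counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        if self.current <= 0:
            # Tells for loops, list() and friends to stop.
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

values = list(Countdown(3))

# Built-in iterators follow the same rule once they run out of items.
iter_a = iter([1, 2])
next(iter_a)
next(iter_a)
try:
    next(iter_a)
    exhausted = False
except StopIteration:
    exhausted = True

print(values, exhausted)
```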

A generator is a way to construct an iterable object. Whereas normal functions execute and return a single value, generators return a sequence of values lazily, pausing after each one until the next one is requested. Generators are normal functions, except that they use the yield statement rather than the return statement.


In [ ]:
def squares(upper_bound=10):
    for i in range(upper_bound):
        yield i**2

In [ ]:
my_squares = squares()

In [ ]:
next(my_squares)

In [ ]:
list(my_squares)

In contrast to a normal function, the execution of a generator is suspended at the yield statement and the corresponding value is returned. Execution continues as soon as next is called on the generator. That means values are generated lazily, which defers computation and saves memory.
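
Laziness is easiest to see with a generator that never ends. The sketch below assumes a made-up generator all_squares and uses itertools.islice to request only the first five values; the infinite loop is harmless because nothing is computed until a value is asked for.

```python
from itertools import islice

def all_squares():
    # Hypothetical infinite generator: yields 0, 1, 4, 9, ... forever.
    i = 0
    while True:
        yield i ** 2
        i += 1

# Only five values are ever computed.
first_five = list(islice(all_squares(), 5))
print(first_five)
```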

But it wouldn't be Python if there weren't a shorthand notation for generators.
The syntax for this shorthand is borrowed from list comprehensions.


In [ ]:
generator = (i**2 for i in range(10))
generator

In [ ]:
next(generator)

In [ ]:
list(generator)
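
One consequence worth keeping in mind: a generator can be consumed only once. A short sketch, which also shows that the extra parentheses can be dropped when a generator expression is the only argument of a call:

```python
generator = (i**2 for i in range(10))

# Consuming the generator once ...
total = sum(generator)

# ... leaves nothing for a second pass.
leftover = list(generator)

# Generator expression as the sole argument: no extra parentheses needed.
total_again = sum(i**2 for i in range(10))

print(total, leftover, total_again)
```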

Classes

This part will be as brief as possible. There is a lot to classes in Python, but only the bare minimum is required for this lecture. And that is what I'll focus on here.
Declaring a class is done using the class statement.


In [ ]:
class Rectangle(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self._PI = 3.14159

    def area(self):
        return self.x * self.y

    def perimeter(self):
        return 2 * self.x + 2 * self.y
    
    def _bogus_area(self):
        return self.x * self.x

my_rec = Rectangle(2, 4)    
my_rec.area(), my_rec.perimeter()

Note that each method is passed a self object. This is similar to Java's this. The only difference is that it is mentioned explicitly.
The __init__ method behaves like a constructor in Java, but strictly speaking it isn't one. Why?
Methods and members whose names start with a _ are considered hidden from other objects (similar to the private modifier in Java). They don't show up when investigating the object with tab completion.


In [ ]:
my_rec.<TAB>

But objects can be investigated more deeply through their default __dir__ method and __dict__ attribute:


In [ ]:
my_rec.__dict__

In [ ]:
my_rec.__dir__()
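
The built-in functions vars and dir are the usual way to reach the same information. The sketch below uses a hypothetical class Point to keep the example self-contained: vars returns the instance's __dict__, while dir also lists methods and inherited names.

```python
class Point:
    """Hypothetical example class, not part of the original notebook."""

    def __init__(self, x, y):
        self.x = x
        self._hidden = y

    def norm1(self):
        return abs(self.x) + abs(self._hidden)

p = Point(1, -2)

# vars(p) returns the very same dict object as p.__dict__.
instance_attrs = vars(p)

# dir(p) also contains methods and names inherited from object.
all_names = dir(p)

print(instance_attrs)
```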

These hidden properties can be used!


In [ ]:
my_rec._PI

In [ ]:
my_rec._bogus_area()

The language itself does not enforce any access restrictions. When Guido van Rossum was asked whether this is a flaw in the language design he replied: "Python is a language for adults. Don't do stupid things, behave like a grownup".
Put differently: "Once you fiddle with those things, you are most likely doing it wrong."

Files, Paths, the OS interface and JSON

Note: This is just a small primer to get you up and running!
It's crucial to get data into the program and the results back to disk. That's what we are going to cover here.
In Python it's very easy to open and read a file:


In [ ]:
fh = open('material/foo.py')
print(fh.read())
fh.close()

But opening or reading a file can cause errors.


In [ ]:
fh = open('material/fo0.py')
print(fh.read())
fh.close()

This can leave us with an unclosed file.
Let's catch that error instead.


In [ ]:
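fh = None
try:
    fh = open('fo0.py')
    print(fh.read())
except FileNotFoundError as e:
    # Handle the error instead of letting it propagate.
    print('Could not open file:', e)
finally:
    # fh is still None if open() itself failed.
    if fh is not None:
        fh.close()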

But there is a more Pythonic way, a context manager:


In [ ]:
with open('fo0.py') as fh:
    print(fh.read())
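
The context manager closes the file even when the body raises, but it does not swallow the error; a missing file still needs a try/except around the with statement. The following sketch demonstrates both points. It writes to a temporary directory so that it is self-contained, and the file names are made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, 'foo.txt')
    with open(file_name, 'w') as fh:
        fh.write('hello')

    # An error inside the with block still closes the file.
    try:
        with open(file_name) as fh:
            content = fh.read()
            raise ValueError('simulated failure while processing')
    except ValueError:
        closed_after_error = fh.closed

    # A missing file raises before the with block is entered.
    try:
        with open(os.path.join(tmp_dir, 'fo0.txt')) as fh:
            pass
        missing_handled = False
    except FileNotFoundError:
        missing_handled = True

print(content, closed_after_error, missing_handled)
```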

The os.path module has a lot of useful tools to deal with files and directories:

  • exists
  • isdir
  • isfile
  • join
  • sep
  • others

In [ ]:
from os import path
path.<TAB>
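
A short sketch of the functions listed above in action. It works in a temporary directory so that it does not depend on the lecture's material folder; the directory and file names are made up.

```python
import os
import tempfile
from os import path

with tempfile.TemporaryDirectory() as tmp_dir:
    # join builds a path using the separator of the current platform.
    file_name = path.join(tmp_dir, 'sub', 'data.txt')
    os.makedirs(path.dirname(file_name))
    with open(file_name, 'w') as fh:
        fh.write('42')

    checks = (
        path.exists(file_name),
        path.isfile(file_name),
        path.isdir(path.dirname(file_name)),
        path.isdir(file_name),
    )

# join does not touch the file system, so the path need not exist.
joined = path.join('material', 'foo.py')
print(checks, joined)
```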

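os.walk uses the force to accomplish its goals! It walks a directory tree and yields a (dirpath, dirnames, filenames) tuple for every directory it visits.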


In [ ]:
import os
foo = os.walk('/')

In [ ]:
next(foo)
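
Walking '/' can take a long time, so the sketch below builds a small, made-up directory tree in a temporary directory and collects every file that os.walk finds in it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical tree: top.txt, a/mid.txt and a/b/deep.txt
    os.makedirs(os.path.join(root, 'a', 'b'))
    relative_files = [
        'top.txt',
        os.path.join('a', 'mid.txt'),
        os.path.join('a', 'b', 'deep.txt'),
    ]
    for rel in relative_files:
        with open(os.path.join(root, rel), 'w') as fh:
            fh.write('x')

    found = []
    # Each step yields one directory with its subdirectories and files.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, root))

found = sorted(found)
print(found)
```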

In [ ]:
from IPython.display import display
with open('images/the_force.gif','rb') as f:
    display(Image(f.read()), format='png')

JSON is your friend

  • It serializes and deserializes data structures
  • It just works
  • It doesn't have a version number. Why?
  • Multiplatform, multilanguage, multi-everything
  • Its entire spec fits on a business card

In [ ]:
Image('images/json.jpg')

There are only four functions you need to know about: dump, dumps, load and loads.


In [ ]:
import json
json.<TAB>

In [ ]:
data = {
    'numbers': [1,2,3,4,5],
    'tuple': (12, 14, 22),
    'letters': 'abcdef',
    'embedded dict': {
        'more': 'date'
    }
}
with open('material/data.json', 'w') as fh:
    json.dump(data, fh)

In [ ]:
!cat material/data.json

In [ ]:
old_data = None
with open('material/data.json') as fh:
    old_data = json.load(fh)
old_data
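
dump and load work on file handles; dumps and loads do the same with strings. The sketch below round-trips a smaller, made-up dict through a string and shows one thing to watch out for: JSON has no tuple type, so tuples come back as lists.

```python
import json

data = {
    'numbers': [1, 2, 3],
    'tuple': (12, 14, 22),
    'letters': 'abc',
}

# dumps serializes to a str, loads parses it back.
text = json.dumps(data)
restored = json.loads(text)

print(text)
print(restored['tuple'])
```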

In [ ]:
Image('images/mem2.jpg')