Object-oriented programming (OOP)

Python is not only a powerful scripting language: it also supports object-oriented programming. In fact, everything in Python is an object. Organizing code around functions is instead called procedural (or procedure-oriented) programming. Both styles (or philosophies) are acceptable and appropriate.

Object-oriented programming is well suited for creating modules and APIs.

Objects are defined and handled through classes, which are created with the class statement.

More reading on object-oriented programming:

A note about objects (reprise)

  • Everything in Python is an object; a variable is a name that refers to one
  • objects have attributes and methods
  • the built-in dir function allows the user to discover them

This is different from other languages like C or C++, where int and bool are primitive types.


In [ ]:
print(dir(bool))
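As a small illustration of the points above, even a plain integer or boolean carries methods that can be called with the dot notation. The sketch below only uses built-ins; the variable names n and public_methods are arbitrary.

```python
# integers are full objects: they have a type and methods
n = 10
print(type(n))
print(n.bit_length())     # number of bits needed to represent 10 (0b1010)

# bool is a subclass of int, so it inherits the methods of int
print(isinstance(True, int))
print(True.bit_length())

# list the methods and attributes whose name does not start with an underscore
public_methods = [name for name in dir(int) if not name.startswith('_')]
print(public_methods)
```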

A note about scopes and namespaces

Like in other programming languages, variables are only visible in certain parts of the code, formally termed "scopes". In practical terms this means that certain variables will only be visible inside a limited part of the code, and that variables in different scopes can have the same name, without generating any conflict.

Consider the following example:


In [ ]:
def f():
    x = 1
    print(x)

x = 2
f()
print(x)

List comprehensions (and all other comprehensions) have their own scope, so their loop variable does not overwrite an outer variable with the same name:


In [ ]:
x = 2
a = [x**2 for x in range(10)]
print(a)
print(x)
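For contrast, a plain for loop does not create a new scope, so its loop variable does overwrite an existing variable with the same name. A minimal sketch of the difference (the names k, squares and k_after_comprehension are arbitrary):

```python
k = 2

# comprehension: `k` inside the brackets lives in its own scope
squares = [k**2 for k in range(10)]
k_after_comprehension = k
print(k_after_comprehension)   # 2

# plain loop: no new scope, the outer `k` is rebound at every iteration
for k in range(10):
    pass
print(k)                       # 9
```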

Note: despite these notes about scopes, it is still a good idea to use descriptive variable names, and to avoid name conflicts as much as possible.

Note: despite scopes, it is still a good idea to avoid globally-defined variables as much as possible.

Namespaces define the "areas" of the code between which the same variable names can appear. Modules are a great example of a namespace:


In [ ]:
import math
import numpy
print(math.pi, numpy.pi)

# don't do this at home
math.pi = 2

print(math.pi, numpy.pi)

The attribute pi exists in both namespaces, and the two are independent: rebinding math.pi leaves numpy.pi untouched. This is how namespaces prevent conflicts between variables or functions that share the same name.
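The same holds for functions. As a sketch using only the standard library, math and cmath both define a function called sqrt, and the module prefix makes it explicit which one is being called:

```python
import math
import cmath

# same function name, two different namespaces, two different behaviours
print(math.sqrt(4))     # real-valued square root
print(cmath.sqrt(-4))   # complex-valued square root

# math.sqrt refuses negative numbers, cmath.sqrt does not
try:
    math.sqrt(-4)
except ValueError as error:
    print('ValueError:', error)
```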

Classes are also examples of namespaces.

Classes

Think of a class as a container of data and functionality at the same time. A class is essentially a new object type, which can create new instances, much like the int type is used to create different numbers (the instances).

Each class instance can have attributes of any type attached to it and methods that can act on those attributes or other variables.

Example: a cake recipe (class or type) and a baked cake (instance)

The syntax to create a class is as follows; notice how class names are by convention written in CapWords style (each word capitalized, no underscores).

class ClassName:
    statement_1
    .
    .
    .
    statement_N

When a class definition is entered, a new namespace is created, and used as the local scope.


In [ ]:
class MyClass:
    """A simple example class"""
    i = 12345

    def __init__(self):
        self.data = []
    
    def f(self):
        return 'hello world'

MyClass.i and MyClass.f are valid attribute references, returning an integer and a function object. Class attributes can also be assigned to, so you can change the value of MyClass.i by assignment. __doc__ is also a valid attribute, returning the docstring belonging to the class.

Class instantiation is the creation of a new instance of type MyClass, and uses the function notation.


In [ ]:
x = MyClass()
print(x.i)
print(x.f())
print(x.__doc__)

In [ ]:
print(x.i)
x.i = 2
print(x.i)
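Note that assigning to an attribute through an instance, as in the cell above, does not change the class attribute: it creates an instance attribute with the same name that shadows it. The sketch below uses a hypothetical Counter class (not part of the original example) to make the difference visible.

```python
class Counter:
    step = 1              # class attribute

a = Counter()
b = Counter()

a.step = 10               # creates an instance attribute on `a` only
print(a.step, b.step, Counter.step)   # 10 1 1

Counter.step = 5          # rebinding on the class is seen by instances without their own copy
print(a.step, b.step)                 # 10 5

# vars() shows the attributes stored on each instance
print(vars(a), vars(b))
```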

New class instances can be created with specific initial variables, either with default values or user-defined ones. The __init__ method is used for this task, usually as the first method in the class definition. If __init__ has any required arguments (parameters without a default value, other than self), an instance cannot be created without providing them.


In [ ]:
class Complex:
    def __init__(self, realpart, imagpart):
        self.r = realpart
        self.i = imagpart
    
    def generic_method(self, value):
        print(value)

In [ ]:
# raises a TypeError: realpart and imagpart are required
x = Complex()

In [ ]:
x = Complex(1.1, -2.3)
x.r, x.i
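Default values work in __init__ exactly as in any other function. The following sketch uses a hypothetical Point class (an illustration, not part of the original example) where one argument is required and one is optional:

```python
class Point:
    def __init__(self, x, y=0.0):
        # `x` is required, `y` falls back to 0.0 when omitted
        self.x = x
        self.y = y

p = Point(1.5)            # uses the default for y
q = Point(1.5, y=-2.0)    # overrides the default
print(p.x, p.y)
print(q.x, q.y)

# omitting the required argument raises a TypeError
try:
    Point()
except TypeError as error:
    print('TypeError:', error)
```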

What about the self variable?

self refers to the specific instance of the class any method acts upon. The two following cells are perfectly equivalent, even though the second notation is very rare.


In [ ]:
x.generic_method(100)

In [ ]:
Complex.generic_method(x, 100)

On top of the attributes (variables) and methods (functions) created when a class instance is instantiated, we can attach attributes to an already existing class instance:


In [ ]:
x.counter = 1
while x.counter < 10:
    x.counter = x.counter * 2
print(x.counter)
del x.counter

In [ ]:
# raises an AttributeError: the attribute was deleted in the previous cell
x.counter
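When it is not known in advance whether an attribute exists, the built-in functions hasattr and getattr can be used to check for it without triggering an error. A minimal sketch, using a hypothetical empty Record class:

```python
class Record:
    pass

r = Record()
r.counter = 1
print(hasattr(r, 'counter'))       # True

del r.counter
print(hasattr(r, 'counter'))       # False

# getattr accepts a fallback value, returned when the attribute is missing
print(getattr(r, 'counter', 0))    # 0
```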

Class and instance variables

Generally speaking, instance variables are for data unique to each instance and class variables are for attributes and methods shared by all instances of the class:


In [ ]:
class Dog:

    # class variable shared by all instances
    kind = 'canine'

    def __init__(self, name):
        # instance variable unique to each instance
        self.name = name

In [ ]:
d = Dog('Fido')
e = Dog('Buddy')

# shared by all dogs
print(d.kind)
print(e.kind)

# unique to each instance
print(d.name)
print(e.name)

Warning

When mutable objects (lists, and so on, see previous chapter) are used as class variables, any in-place change to that variable will be shared by all instances of that class.


In [ ]:
class Dog:

    # this is ok
    kind = 'canine'

    # mutable class variable
    tricks = []

    def __init__(self, name):
        self.name = name

    def add_trick(self, trick):
        self.tricks.append(trick)

d = Dog('Fido')
e = Dog('Buddy')

# operating on the `tricks` class variable in two separate instances
d.add_trick('roll over')
e.add_trick('play dead')

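# assigning `kind` on one instance creates an instance attribute on `e`;
# the `kind` class variable (and therefore `d.kind`) is unchanged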
e.kind = 'super-dog'

print(d.kind)
print(d.tricks)
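A common way to avoid this pitfall is to create the mutable object inside __init__, so that each instance gets its own copy. A minimal sketch of the same Dog class with tricks as an instance variable:

```python
class Dog:

    # immutable class variable, safely shared
    kind = 'canine'

    def __init__(self, name):
        self.name = name
        # a new list is created for every instance
        self.tricks = []

    def add_trick(self, trick):
        self.tricks.append(trick)

d = Dog('Fido')
e = Dog('Buddy')

d.add_trick('roll over')
e.add_trick('play dead')

print(d.tricks)   # ['roll over']
print(e.tricks)   # ['play dead']
```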

Inheritance

A powerful design principle in OOP is class inheritance: in a nutshell, it allows you to reuse and expand code written for a class (the parent) and create a new one that has all the characteristics of the parent class, plus additional attributes and methods.

From a type we can then create any number of subtypes. Usually the parent class is a generic object and the subsequent subtypes (children) are more specialized concepts.


In [ ]:
# base class
class Sequence:
    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

# inherits Sequence,
# has specific attributes and methods
class Dna(Sequence):
    def reverse_complement(self):
        translation_table = str.maketrans('ACGTacgt', 'TGCAtgca')
        revcomp_sequence = self.sequence.translate(translation_table)[::-1]
        return revcomp_sequence

# inherits Sequence,
# has specific attributes and methods
class Protein(Sequence):
    def get_exon_length(self):
        return len(self.sequence) * 3

In [ ]:
dna = Dna('gene1', 'ACTGCGACCAAGACATAG')
dna.reverse_complement()

In [ ]:
prot = Protein('protein1', 'MPNFFIDRPIFAWVIAIIIMLAGGLAILKLPVAQYPTIAP')
# raises an AttributeError: reverse_complement is defined in Dna, not in Protein
prot.reverse_complement()

In [ ]:
prot = Protein('protein1', 'MPNFFIDRPIFAWVIAIIIMLAGGLAILKLPVAQYPTIAP')
prot.get_exon_length()
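A subclass can also extend the __init__ method of its parent, using super() to call the parent implementation first. The sketch below repeats the Sequence base class so that the cell runs on its own, and adds a hypothetical Rna subclass with an extra coding attribute (both the subclass and the attribute are illustrative assumptions).

```python
# base class, repeated here so that the cell is self-contained
class Sequence:
    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

class Rna(Sequence):
    def __init__(self, name, sequence, coding=True):
        # let the parent class set name and sequence
        super().__init__(name, sequence)
        # attribute specific to this subclass
        self.coding = coding

    def back_transcribe(self):
        return self.sequence.replace('U', 'T')

rna = Rna('transcript1', 'AUGGCC')
print(rna.name, rna.back_transcribe(), rna.coding)

# an instance of the child is also an instance of the parent
print(isinstance(rna, Rna), isinstance(rna, Sequence))
print(issubclass(Rna, Sequence))
```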

An underappreciated advantage of inheritance is that it allows you to expand classes that belong to different namespaces. This means that even classes belonging to other modules (or even built-in types) can be expanded.


In [ ]:
class BetterInt(int):
    def is_odd(self):
        return bool(self % 2)

In [ ]:
x = BetterInt(2)
x.is_odd()
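One caveat when subclassing built-in types: the inherited operators still return the built-in type, so the result of an arithmetic operation on a BetterInt is a plain int and loses the extra method. A short sketch of this behaviour (BetterInt is repeated so that the cell runs on its own):

```python
class BetterInt(int):
    def is_odd(self):
        return bool(self % 2)

x = BetterInt(3)
y = x + 1        # int.__add__ returns a plain int

print(type(x).__name__, type(y).__name__)
print(x.is_odd())
print(hasattr(y, 'is_odd'))

# the result has to be converted back explicitly
z = BetterInt(x + 1)
print(z.is_odd())
```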

Public and private attributes/methods

Another common concept in OOP is the distinction between public, private and protected attributes and methods. Specifically:

  • public: completely visible and accessible
  • private: only visible from inside the class
  • protected: only visible from inside the class they belong to, and any subclass derived from it

In Python, all attributes and methods are public, but there are a few conventions to mark them as private. They are still publicly accessible, but the author of the class has "warned" the user not to tamper with them to avoid possible conflicts. The most common convention is a single leading underscore in the name.


In [ ]:
class Reverser():
    def __init__(self, name):
        self.public = name
        self._private = name[::-1]
    
    def get_reverse(self):
        return self._private
        
x = Reverser('hello world')
print(x.public)
print(x.get_reverse())
x._private = 'luddism'
print(x.get_reverse())

In the above example, the _private attribute is not meant to be accessed by the class user, but it can still be easily read and changed. In languages like C++, accessing or changing the value of a private attribute from outside the class would trigger an error. In Python it is possible, but it might interfere with the intended purpose of that attribute/method.

A way to obfuscate a private attribute/method a bit more is to use name mangling, that is, using a double underscore before the attribute name:


In [ ]:
class Reverser():
    def __init__(self, name):
        self.public = name
        self.__private = name[::-1]
    
    def get_reverse(self):
        return self.__private
    
x = Reverser('hello world')
print(x.public)
print(x.get_reverse())
x.__private = 'luddism'
print(x.get_reverse())

We have created a new attribute called __private on the instance, but the attribute used inside the class has not been changed. That is because name mangling has transformed the __private attribute written inside the class body to _Reverser__private internally.


In [ ]:
print(x.__private)
print(x._Reverser__private)
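Because the class name becomes part of the mangled attribute name, a parent class and a subclass can each use the same double-underscore name without overwriting each other. The sketch below illustrates this with two hypothetical classes, Base and Child:

```python
class Base:
    def __init__(self):
        self.__token = 'base'      # stored as _Base__token

    def base_token(self):
        return self.__token

class Child(Base):
    def __init__(self):
        super().__init__()
        self.__token = 'child'     # stored as _Child__token

    def child_token(self):
        return self.__token

c = Child()
print(c.base_token(), c.child_token())   # base child

# both attributes coexist on the same instance
print(sorted(vars(c)))
```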

Operators: handy methods

As stated at the beginning of this chapter, everything in Python is an object. As we have seen with objects of type int, we can apply some operators to them:


In [ ]:
x = 1
y = 2
x + y

The + operator is in fact implemented by a method of the int class, __add__. For two integers, the following expression gives the same result as x + y.


In [ ]:
x = 1
y = 2
x.__add__(y)
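The same mechanism is available to user-defined classes: defining __add__ makes the + operator work on their instances. The sketch below uses a hypothetical Vector2D class; returning NotImplemented for unsupported operands is the conventional way to let Python raise a TypeError.

```python
class Vector2D:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # only vectors can be added to vectors
        if not isinstance(other, Vector2D):
            return NotImplemented
        return Vector2D(self.x + other.x, self.y + other.y)

    def __repr__(self):
        return f'Vector2D({self.x}, {self.y})'

v = Vector2D(1, 2) + Vector2D(3, 4)
print(v)   # Vector2D(4, 6)

# adding an unsupported type raises a TypeError
try:
    Vector2D(1, 2) + 1
except TypeError as error:
    print('TypeError:', error)
```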

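A comprehensive list of operators that can be implemented for any given class can be found in the Python documentation (for instance, the "Data model" chapter of the language reference). It's worth noting that some of the corresponding methods, such as __eq__ and __str__, are already implemented for any class, because every class inherits them from object. Implementing an operator for a new class is termed operator overloading; re-implementing a method inherited from a parent class is termed overriding.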


In [ ]:
x = Protein('prot1', 'MPNFFIDRPIFAWVIAIIIMLAGGLAILKLPVAQYPTIAP')
dir(x)

For instance, the __eq__ method implements the == boolean operation. The default implementation checks whether the two operands are the very same object (identity), not whether their content is equal, a behaviour that is not always intuitive.


In [ ]:
p1 = Protein('prot1', 'MPNFFIDRPIFAWVIAIIIMLAGGLAILKLPVAQYPTIAP')
p2 = Protein('prot1', 'MPNFFIDRPIFAWVIAIIIMLAGGLAILKLPVAQYPTIAP')
p1 == p2

In [ ]:
# let's fix it
class Protein(Sequence):
    def get_exon_length(self):
        return len(self.sequence) * 3
    
    def __eq__(self, other_instance):
        return self.sequence == other_instance.sequence

In [ ]:
p1 = Protein('prot1', 'MPNFFIDRPIFAWVIAIIIMLAGGLAILKLPVAQYPTIAP')
p2 = Protein('prot1', 'MPNFFIDRPIFAWVIAIIIMLAGGLAILKLPVAQYPTIAP')
p1 == p2

Other interesting operators:

  • __lt__ (x<y), __le__ (x<=y)
  • __gt__ (x>y), __ge__ (x>=y)
  • __eq__ (x==y), __ne__ (x!=y)
  • __str__: the string used to represent the instance when the str or print functions are called on it
  • __bool__: how the instance is converted to bool, for instance based on one of its attributes

Many more are available, and they allow you to create new, interesting data types.
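As an illustration of a few of the methods listed above, the sketch below defines a hypothetical Peptide class (not part of the original examples) that is compared by sequence length, printed in a readable way, and considered false when its sequence is empty.

```python
class Peptide:
    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

    def __lt__(self, other):
        # "smaller" means "shorter sequence"
        return len(self.sequence) < len(other.sequence)

    def __str__(self):
        return f'{self.name} ({len(self.sequence)} aa)'

    def __bool__(self):
        # an empty sequence makes the instance falsy
        return len(self.sequence) > 0

short = Peptide('short', 'MPN')
long = Peptide('long', 'MPNFFIDR')
empty = Peptide('empty', '')

print(short < long)
# sorted() relies on __lt__, print() relies on __str__
shortest = sorted([long, short, empty])[0]
print(shortest)
print(bool(empty), bool(short))
```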

Duck typing

Unlike languages like C, where the types of function arguments have to be declared in advance, Python uses the "duck typing" paradigm.

"If it walks like a duck and it quacks like a duck, then it must be a duck."

In other words, we are not interested in checking and enforcing the type of an object used by a function or method, only that it provides the attributes and methods that are actually needed. More importantly, the check happens at runtime, when the attribute or method is looked up, and not in a separate compilation step as in statically typed languages. This allows greater flexibility in passing objects to functions.


In [ ]:
def sum_two_things(a, b):
    return a + b

In [ ]:
sum_two_things(1, 2)

In [ ]:
sum_two_things('a', 'b')

For the above examples we just need two objects that support the + operator (through the __add__ method), but we don't care about their actual type, as long as they say "quack!"
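The same function also works with user-defined types, and fails only at runtime when the objects cannot be added together. A minimal sketch, using a hypothetical Money class that implements __add__ (sum_two_things is repeated so that the cell runs on its own):

```python
def sum_two_things(a, b):
    return a + b

class Money:
    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):
        # any object with an `amount` attribute will do
        return Money(self.amount + other.amount)

total = sum_two_things(Money(5), Money(7))
print(total.amount)

# lists support + as well
print(sum_two_things([1, 2], [3]))

# the error only appears at runtime, when + is actually attempted
try:
    sum_two_things(1, 'a')
except TypeError as error:
    print('TypeError:', error)
```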