Python is not only a powerful scripting language: it also supports object-oriented programming. In fact, everything in Python is an object. Working with functions is instead called procedure-oriented programming. Both styles (or philosophies) are acceptable and appropriate.
Object-oriented programming is well suited for creating modules and APIs.
Objects are defined and handled through classes, which define new types.
More reading on object-oriented programming:
In [ ]:
print(dir(bool))
Like in other programming languages, variables are only visible in certain parts of the code, formally termed "scopes". In practical terms this means that certain variables will only be visible inside a limited part of the code, and that variables in different scopes can have the same name, without generating any conflict.
Consider the following example:
In [ ]:
def f():
    x = 1
    print(x)

x = 2
f()
print(x)
List comprehensions (and all other comprehensions) have their own scope:
In [ ]:
x = 2
a = [x**2 for x in range(10)]
print(a)
print(x)
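As a point of comparison, a regular `for` loop does not get a scope of its own, so its loop variable overwrites a variable with the same name in the enclosing scope. The short sketch below contrasts the two cases; the name `squares` is only used for illustration.

```python
x = 2
squares = [x**2 for x in range(10)]
# the comprehension has its own scope: x is untouched
print(x)   # 2

for x in range(10):
    pass
# the for loop shares the enclosing scope: x has been overwritten
print(x)   # 9
```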
Note: despite these notes about scopes, it is still a good idea to use descriptive variable names, and to avoid name conflicts as much as possible.
Note: despite scopes, it is still a good idea to avoid globally-defined variables as much as possible.
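A minimal sketch of why global variables are best avoided: a function that reads a global changes its behaviour whenever that global is reassigned, while a function that receives the value as an argument does not. The names `factor`, `scale_global` and `scale` are hypothetical and only serve this example.

```python
factor = 3  # globally-defined variable

def scale_global(value):
    # relies on whatever `factor` is in the global scope at call time
    return value * factor

def scale(value, factor):
    # `factor` is a local name here: the result depends only on the arguments
    return value * factor

print(scale_global(2))   # 6
print(scale(2, 3))       # 6

factor = 10  # reassigning the global silently changes scale_global
print(scale_global(2))   # 20
print(scale(2, 3))       # 6
```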
Namespaces define the "areas" of the code between which the same variable names can appear. Modules are a great example of a namespace:
In [ ]:
import math
import numpy
print(math.pi, numpy.pi)
# don't do this at home
math.pi = 2
print(math.pi, numpy.pi)
The attribute pi is present in both namespaces, so that there are no conflicts between variables or functions with the same name.
Classes are also examples of namespaces.
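The same idea can be sketched with two classes (the class syntax itself is described in the next paragraphs). The classes `Triangle` and `Square` and their attribute `sides` are hypothetical examples: the same attribute name lives in two separate namespaces without any conflict.

```python
class Triangle:
    sides = 3

class Square:
    sides = 4

# the same name is looked up in two different namespaces
print(Triangle.sides, Square.sides)

# each class keeps its own mapping of names to values
print('sides' in vars(Triangle), 'sides' in vars(Square))
```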
Think of a class as a container of data and functionality at the same time. A class is essentially a new object type, which can create new instances, much like the int type is used to create different numbers (the instances).
Each class instance can have attributes of any type attached to it and methods that can act on those attributes or other variables.
Example: a cake recipe (class or type) and a baked cake (instance).
The syntax to create a class is as follows; notice how class names are by convention written with the first letter uppercase.
class ClassName:
    statement_1
    .
    .
    .
    statement_N
When a class definition is entered, a new namespace is created, and used as the local scope.
In [ ]:
class MyClass:
    """A simple example class"""
    i = 12345

    def __init__(self):
        self.data = []

    def f(self):
        return 'hello world'
MyClass.i and MyClass.f are valid attribute references, returning an integer and a function object. Class attributes can also be assigned to, so you can change the value of MyClass.i by assignment. __doc__ is also a valid attribute, returning the docstring belonging to the class.
Class instantiation is the creation of a new instance of type MyClass, and uses the function notation.
In [ ]:
x = MyClass()
print(x.i)
print(x.f())
print(x.__doc__)
In [ ]:
print(x.i)
x.i = 2
print(x.i)
New class instances can be created with specific initial variables, either with default values or user-defined ones. The __init__ method is used for this task, usually as the first method in the class definition. If __init__ has any positional arguments without default values, an instance cannot be created without providing them.
In [ ]:
class Complex:
    def __init__(self, realpart, imagpart):
        self.r = realpart
        self.i = imagpart

    def generic_method(self, value):
        print(value)
In [ ]:
# fails with a TypeError: the two positional arguments are missing
x = Complex()
In [ ]:
x = Complex(1.1, -2.3)
x.r, x.i
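The text above also mentions default values. A minimal sketch of that case follows; it uses a separate, hypothetical class name (`ComplexWithDefaults`) so that the `Complex` class defined above is left untouched.

```python
class ComplexWithDefaults:
    def __init__(self, realpart=0.0, imagpart=0.0):
        # both arguments are optional and default to zero
        self.r = realpart
        self.i = imagpart

a = ComplexWithDefaults()               # both defaults are used
b = ComplexWithDefaults(1.1)            # only the real part is provided
c = ComplexWithDefaults(imagpart=-2.3)  # keyword argument, real part defaults

print(a.r, a.i)
print(b.r, b.i)
print(c.r, c.i)
```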
What about the self variable? self refers to the specific instance of the class that a method acts upon. The two following cells are perfectly equivalent, even though the second notation is very rare.
In [ ]:
x.generic_method(100)
In [ ]:
Complex.generic_method(x, 100)
On top of the attributes (variables) and methods (functions) created when a class instance is initialized, we can attach attributes to an already existing class instance:
In [ ]:
x.counter = 1
while x.counter < 10:
    x.counter = x.counter * 2
print(x.counter)
del x.counter
In [ ]:
# raises an AttributeError: the attribute has been deleted
x.counter
Class and instance variables
Generally speaking, instance variables are for data unique to each instance and class variables are for attributes and methods shared by all instances of the class:
In [ ]:
class Dog:
    # class variable shared by all instances
    kind = 'canine'

    def __init__(self, name):
        # instance variable unique to each instance
        self.name = name
In [ ]:
d = Dog('Fido')
e = Dog('Buddy')
# shared by all dogs
print(d.kind)
print(e.kind)
# unique to each instance
print(d.name)
print(e.name)
Warning
When mutable objects (lists and so on, see the previous chapter) are used as class variables, any in-place change to that variable is shared by all instances of that class.
In [ ]:
class Dog:
    # this is ok
    kind = 'canine'
    # mutable class variable
    tricks = []

    def __init__(self, name):
        self.name = name

    def add_trick(self, trick):
        self.tricks.append(trick)

d = Dog('Fido')
e = Dog('Buddy')
# operating on the `tricks` class variable in two separate instances
d.add_trick('roll over')
e.add_trick('play dead')
# assigning `kind` on one instance creates an instance attribute that
# shadows the class variable; the other instances are not affected
e.kind = 'super-dog'
print(d.kind)
print(d.tricks)
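A common way to avoid the shared list is to create it inside __init__, so that each instance gets its own copy. The sketch below redefines the `Dog` class from the cell above with that single change.

```python
class Dog:
    kind = 'canine'

    def __init__(self, name):
        self.name = name
        # a new empty list is created for every instance
        self.tricks = []

    def add_trick(self, trick):
        self.tricks.append(trick)

d = Dog('Fido')
e = Dog('Buddy')
d.add_trick('roll over')
e.add_trick('play dead')
print(d.tricks)   # ['roll over']
print(e.tricks)   # ['play dead']
```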
A powerful design principle in OOP is class inheritance: in a nutshell, it allows you to reuse and expand code written for a class (the parent) and create a new one that has all the characteristics of the parent class plus additional attributes and methods.
From a type we can then create any number of subtypes. Usually the parent class is a generic object and the subsequent subtypes (children) are more specialized concepts.
In [ ]:
# base class
class Sequence:
    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

# inherits Sequence,
# has specific attributes and methods
class Dna(Sequence):
    def reverse_complement(self):
        translation_table = str.maketrans('ACGTacgt', 'TGCAtgca')
        revcomp_sequence = self.sequence.translate(translation_table)[::-1]
        return revcomp_sequence

# inherits Sequence,
# has specific attributes and methods
class Protein(Sequence):
    def get_exon_length(self):
        return len(self.sequence) * 3
In [ ]:
dna = Dna('gene1', 'ACTGCGACCAAGACATAG')
dna.reverse_complement()
In [ ]:
prot = Protein('protein1', 'MPNFFIDRPIFAWVIAIIIMLAGGLAILKLPVAQYPTIAP')
# raises an AttributeError: Protein does not define reverse_complement
prot.reverse_complement()
In [ ]:
prot = Protein('protein1', 'MPNFFIDRPIFAWVIAIIIMLAGGLAILKLPVAQYPTIAP')
prot.get_exon_length()
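A subclass can also add its own attributes by defining its own __init__ and delegating the common part to the parent through super(). The sketch below is only an illustration: the `AnnotatedDna` class, its `organism` attribute and its `gc_content` method are hypothetical, and `Sequence` is repeated so that the block runs on its own.

```python
class Sequence:
    # same base class as above, repeated so this block is self-contained
    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

class AnnotatedDna(Sequence):
    def __init__(self, name, sequence, organism):
        # let the parent class set `name` and `sequence`
        super().__init__(name, sequence)
        # additional attribute specific to this subclass
        self.organism = organism

    def gc_content(self):
        # fraction of G and C bases in the sequence
        seq = self.sequence.upper()
        return (seq.count('G') + seq.count('C')) / len(seq)

annotated = AnnotatedDna('gene1', 'ACTGCGACCAAGACATAG', 'E. coli')
print(annotated.name, annotated.organism)
print(annotated.gc_content())            # 0.5
print(isinstance(annotated, Sequence))   # True
```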
An underappreciated advantage of inheritance is that it allows you to expand classes that belong to different namespaces. This means that even classes belonging to different modules (or even the base namespace) can be expanded.
In [ ]:
class BetterInt(int):
    def is_odd(self):
        return bool(self % 2)
In [ ]:
x = BetterInt(2)
x.is_odd()
Another paradigm of OOP is the distinction between public, private and protected attributes and methods.
Specifically:
public: completely visible and accessible
private: only visible from inside the class
protected: only visible from inside the class they belong to, and any subclass derived from it
In Python, all attributes and methods are public, but there are a few conventions to have them treated as private. They would still be publicly accessible, but the author of the class has "warned" the user not to tamper with them to avoid possible conflicts.
In [ ]:
class Reverser():
    def __init__(self, name):
        self.public = name
        self._private = name[::-1]

    def get_reverse(self):
        return self._private

x = Reverser('hello world')
print(x.public)
print(x.get_reverse())
x._private = 'luddism'
print(x.get_reverse())
In the above example, the _private attribute is not meant to be used directly by the class user, but it can still be easily accessed. In languages like C++, accessing or changing the value of a private attribute from outside the class would trigger an error. In Python it is possible, but it might interfere with the intended purpose of that attribute/method.
A way to obfuscate a private attribute/method a bit more is to use name mangling, that is, using a double underscore before the attribute name:
In [ ]:
class Reverser():
    def __init__(self, name):
        self.public = name
        self.__private = name[::-1]

    def get_reverse(self):
        return self.__private

x = Reverser('hello world')
print(x.public)
print(x.get_reverse())
x.__private = 'luddism'
print(x.get_reverse())
We have created a new attribute called __private, but the original attribute has not been changed. That is because name mangling has transformed the __private attribute set inside the class to _Reverser__private internally.
In [ ]:
print(x.__private)
print(x._Reverser__private)
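Because the class name becomes part of the mangled name, a parent and a child class can each keep an attribute with the same double-underscore name without overwriting one another. The classes `Base` and `Child` below are hypothetical and only illustrate this behaviour.

```python
class Base:
    def __init__(self):
        self.__token = 'base'     # stored as _Base__token

    def base_token(self):
        return self.__token

class Child(Base):
    def __init__(self):
        super().__init__()
        self.__token = 'child'    # stored as _Child__token, no clash

    def child_token(self):
        return self.__token

c = Child()
print(c.base_token())     # base
print(c.child_token())    # child
print(sorted(vars(c)))    # ['_Base__token', '_Child__token']
```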
In [ ]:
x = 1
y = 2
x + y
The sum operator is in fact a method of the int class. The following expression is exactly equivalent to calling x + y.
In [ ]:
x = 1
y = 2
x.__add__(y)
A comprehensive list of operators that can be implemented for any given class can be found in the "Data model" chapter of the Python language reference. It's worth noting that many of those operators are already implemented for any class. Re-implementing an existing operator is termed overloading (for methods inherited from a parent class, the term overriding is also used).
In [ ]:
x = Protein('prot1', 'MPNFFIDRPIFAWVIAIIIMLAGGLAILKLPVAQYPTIAP')
dir(x)
For instance, the __eq__ method implements the == boolean operation. The default implementation checks whether the two operands are the very same object, a behaviour that is not always intuitive.
In [ ]:
p1 = Protein('prot1', 'MPNFFIDRPIFAWVIAIIIMLAGGLAILKLPVAQYPTIAP')
p2 = Protein('prot1', 'MPNFFIDRPIFAWVIAIIIMLAGGLAILKLPVAQYPTIAP')
p1 == p2
In [ ]:
# let's fix it
class Protein(Sequence):
    def get_exon_length(self):
        return len(self.sequence) * 3

    def __eq__(self, other_instance):
        return self.sequence == other_instance.sequence
In [ ]:
p1 = Protein('prot1', 'MPNFFIDRPIFAWVIAIIIMLAGGLAILKLPVAQYPTIAP')
p2 = Protein('prot1', 'MPNFFIDRPIFAWVIAIIIMLAGGLAILKLPVAQYPTIAP')
p1 == p2
Other interesting operators:
__lt__ (x<y), __le__ (x<=y)
__gt__ (x>y), __ge__ (x>=y)
__eq__ (x==y), __ne__ (x!=y)
__str__: how the instance will be represented when calling the print or format functions on it
__bool__: cast the instance to bool, for instance based on one of its attributes
Many more are available, and allow you to create new interesting data types.
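A few of the operators listed above can be sketched on a small class. The `Peptide` class and the choices made here (ordering by sequence length, truthiness based on a non-empty sequence) are assumptions for illustration, not a recommended design.

```python
class Peptide:
    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

    def __eq__(self, other):
        # two peptides are equal when their sequences match
        if not isinstance(other, Peptide):
            return NotImplemented
        return self.sequence == other.sequence

    def __lt__(self, other):
        # order peptides by sequence length
        if not isinstance(other, Peptide):
            return NotImplemented
        return len(self.sequence) < len(other.sequence)

    def __str__(self):
        return f'{self.name} ({len(self.sequence)} aa)'

    def __bool__(self):
        # an empty sequence makes the instance falsy
        return len(self.sequence) > 0

shorter = Peptide('pep1', 'MPNF')
longer = Peptide('pep2', 'MPNFFIDRPI')
empty = Peptide('pep3', '')

print(shorter < longer)                    # True
print(shorter)                             # pep1 (4 aa)
print(bool(empty))                         # False
print(sorted([longer, shorter])[0].name)   # pep1
```

Note that sorted() only needs __lt__ to be defined in order to sort the instances.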
Unlike languages like C, where the types of function arguments have to be declared in advance, Python uses the "duck typing" paradigm.
"If it walks like a duck and it quacks like a duck, then it must be a duck."
In other words, we are not interested in checking and enforcing the type of an object used by a method, only that it provides certain attributes and methods. More importantly, the check is performed at runtime, and not at compilation time (a separate step that Python doesn't have anyway!). This allows greater flexibility in passing objects to functions.
In [ ]:
def sum_two_things(a, b):
return a + b
In [ ]:
sum_two_things(1, 2)
In [ ]:
sum_two_things('a', 'b')
For the above examples we just need two objects that support the __add__ operator, but we don't care about their actual type, as long as they say "quack!"
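The same function also works with user-defined types, as long as they implement __add__. The `Money` class below is a hypothetical example; the last lines show that an incompatible pair of objects is only detected when the function is actually called.

```python
class Money:
    # hypothetical class that supports `+` through __add__
    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):
        return Money(self.amount + other.amount)

def sum_two_things(a, b):
    return a + b

total = sum_two_things(Money(5), Money(7))
print(total.amount)   # 12

# objects without a compatible __add__ fail at runtime, not before
try:
    sum_two_things(1, 'a')
except TypeError as error:
    print('TypeError:', error)
```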