So far we've been working with functions and packages of functions, as well as defining our own functions. It turns out, though, that we've been working with objects all along; we just haven't recognized them as such. For example,
In [113]:
x = 'Hi'
x.lower()
Out[113]:
The string x
is an object that we can send messages to.
In [114]:
print( type(x) )
Even integers are objects:
In [115]:
print(dir(99))
A class is the blueprint for an object and is basically the name of the type, str
in this case. An object is called an instance of the class.
In x.lower()
we are sending the lower
message to the x
string object. Messages are really just functions associated with classes/objects.
In [116]:
x.lower
Out[116]:
In a language that does not support object-oriented programming, we would do something like:
lower(x)
Python has both functions and object-oriented programming, which is why there is both x.lower()
and:
In [117]:
len(x)
Out[117]:
The choice of function or "message" is up to the library designer, but lower
only makes sense for strings so it makes sense to group it with the definition of str
.
In terms of implementation, however, x.lower()
is actually implemented as str.lower(x)
where str
is the class definition for strings. Computer processors understand function calls; they do not understand objects, so the Python interpreter performs this translation itself.
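As a quick check of that translation, here is a minimal sketch that calls the same method both ways. The variable names via_method and via_function are made up for illustration.

```python
x = 'Hi'

# Method-call syntax: the object comes first
via_method = x.lower()

# Function-call syntax: the object is passed explicitly as the first argument
via_function = str.lower(x)

print(via_method, via_function)
print(via_method == via_function)  # both spellings run the same code
```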
In [118]:
import numpy as np
np.array([1,2,3])
Out[118]:
In [119]:
import math
math.log(3000)
Out[119]:
This is a common point of confusion when reading code. When we see a.f()
, we don't know whether that function f
is a member of the package identified by a
or an object referred to by a
.
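When reading unfamiliar code, one way to settle the question is to ask Python what the name to the left of the dot refers to. The sketch below uses the standard types.ModuleType for the check; the variable names are only illustrative.

```python
import math
import types

s = 'hi'

# math refers to a package (module), so math.log is a function inside it
math_is_module = isinstance(math, types.ModuleType)
print(type(math), math_is_module)

# s refers to a string object, so s.lower is a method of that object
s_is_module = isinstance(s, types.ModuleType)
print(type(s), s_is_module)
```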
In the wordsim project, you defined a file called wordsim.py
and then my test_wordsim.py
file did from wordsim import *
to pull in all your functions in wordsim.py
.
In the following, identify the identifiers (words) as package or function or field:
np.log(3)
np.linalg.norm(v)
from sklearn.ensemble import RandomForestRegressor
pd.read_csv("foo.csv")
pd.read_csv
'hi'.lower()
'hi'.lower
df_train.columns
np.pi
img = img.convert("L")
Now, identify the data types of subexpressions and identify the identifiers (words) as package or function or field:
df["saledate"].dt.year
df_train.isnull().any().head(60)
Objects have functions, which we call methods to distinguish them from functions not associated with objects. Objects also have variables, which we call fields or instance variables.
Fields are the state of the object. Methods are the behavior of the object.
We've also been using fields all along, such as df.columns
that gets the list of columns in a data frame.
In [120]:
import datetime
now = datetime.date.today()
print( type(now) )
print( now.year ) # access field year
print( now.month )
If you try to access an object's method without the parentheses, the expression evaluates to the method object itself instead of calling it:
In [121]:
s='hi'
s.title
Out[121]:
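To make the difference concrete, here is a small sketch that stores the method in a variable and calls it later. The names f and result are arbitrary.

```python
s = 'hi'

f = s.title          # no parentheses: f now refers to the method bound to s
print(callable(f))   # it is something we can call later

result = f()         # adding parentheses calls it, same as s.title()
print(result)
```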
A class is a blueprint for multiple objects, often called instances. The class encapsulates the state and behavior of an object.
Imagine an alien lands in your backyard and asks you to describe a car. You would probably describe its attributes, such as the number of wheels, and its functionality, such as can start and stop. These are the state and behavior. By defining them, we effectively define the object. The class name is just giving a name to the entity.
By convention, class names should be capitalized like Point
.
In [122]:
from lolviz import *
books = [
('Gridlinked', 'Neal Asher'),
('Startide Rising', 'David Brin')
]
objviz(books)
Out[122]:
In [123]:
for b in books:
    print(f"{b[1]}: {b[0]}")
In [124]:
# Or, more fancy
for title, author in books:
    print(f"{author}: {title}")
To access the elements of the tuple in both cases, we have to keep track of the order in our heads. In other words, we have to access the tuple elements by position, exactly as we would the elements of a list.
A better way is to formally declare that author and title data elements should be encapsulated into a single entity called a book. Python has what I consider an extremely quirky class specification, but it is extremely flexible. For example, we can define a class that has no methods and no fields and then add fields to its instances dynamically with assignment statements:
In [125]:
class Book:
    pass
b = Book()
print(b)
b.title = 'Gridlinked'
b.author = 'Neal Asher'
print(b.title, b.author)
objviz(b)
Out[125]:
But this doesn't let us define methods associated with that object (easily). Let's take a look at our first real class definition that contains a function called a constructor.
In [126]:
class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author
        self.chapters = []
The constructor typically sets initial and default field values based upon the arguments.
All methods, which are functions defined within a class, must have an explicit first parameter, by convention called self
. This is the object under consideration.
Then we can make a list of book objects or instances of class Book
using instance creation syntax Book(...,...)
:
In [127]:
books = [
Book('Gridlinked', 'Neal Asher'),
Book(title='Startide Rising', author='David Brin')
]
In [128]:
objviz(books)
Out[128]:
In [129]:
for b in books:
    print(f"{b.author}: {b.title}") # access fields
Notice that we do not pass the self
parameter to the constructor. It's implicit at the call site but explicit at the definition site!
In [130]:
class Foo:
    pass # just says "empty"
x = Foo()
x.foo = 3
That does not get an error even though the class itself does not define foo!
You can even add methods on the fly.
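A minimal sketch of adding a method on the fly, assuming a helper function named describe (a hypothetical name used only for this illustration):

```python
class Foo:
    pass

def describe(self):
    # self is whichever Foo instance the method is called on
    return f"Foo with foo={self.foo}"

Foo.describe = describe   # attach the function to the class after the fact

x = Foo()
x.foo = 3
message = x.describe()    # Python treats this as Foo.describe(x)
print(message)
```

Because the function is attached to the class rather than to one instance, every Foo object can use it.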
If you try to print out a book you will see just the type information and the physical memory address:
In [131]:
print(books[0])
In [175]:
class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author
    def __str__(self): # called when conversion to string needed like print
        return f"Book({self.title}, {self.author})"
    def __repr__(self): # called in interactive mode
        return self.__str__() # call the string
books = [
Book('Gridlinked', 'Neal Asher'),
Book('Startide Rising', 'David Brin')
]
In [176]:
print(books[0]) # calls __str__()
books[0] # calls __repr__()
Out[176]:
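To see why it is worth defining both methods, the following sketch deliberately leaves out __repr__ and compares the two conversions using the built-in str() and repr() functions. The variable names are illustrative.

```python
class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author
    def __str__(self):
        return f"Book({self.title}, {self.author})"
    # no __repr__ here on purpose

b = Book('Gridlinked', 'Neal Asher')
as_str = str(b)     # uses our __str__
as_repr = repr(b)   # falls back to the default: type name and memory address
print(as_str)
print(as_repr)
```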
Make sure that you use self.x
to refer to field x
, otherwise you are creating a local variable inside a method:
In [134]:
class Foo:
    def __init__(self):
        self.x = 0
    def foo(self):
        x = 3 # WARNING: does not alter the field! should be self.x
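The effect of this mistake is easy to observe. In the sketch below, the method names broken and fixed are hypothetical and exist only to contrast the two assignments.

```python
class Foo:
    def __init__(self):
        self.x = 0
    def broken(self):
        x = 3          # local variable; it vanishes when the method returns
    def fixed(self):
        self.x = 3     # assigns to the field of this object

f = Foo()
f.broken()
after_broken = f.x     # field is untouched
f.fixed()
after_fixed = f.x      # field has been updated
print(after_broken, after_fixed)
```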
Let's create another method that updates the count of books sold.
In [178]:
class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author
        self.sold = 0 # set default
    def sell(self, n):
        self.sold += n
    def __str__(self): # called when conversion to string needed like print
        return f"Book({self.title}, {self.author}, sold={self.sold})"
    def __repr__(self): # called in interactive mode
        return self.__str__() # call the string
In [179]:
b = Book('Gridlinked', 'Neal Asher')
print(b)
b.sell(100) # Book.sell(b, 100)
print(b)
Note that from within a method definition, we call other methods on the same object using self.foo(...)
for method foo
.
b.sell(100)
method call is translated and executed by the Python interpreter as function call Book.sell(b,100)
. b
becomes parameter self
and so the sell()
function is updating book b
.
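Both spellings can be tried side by side. This sketch repeats a trimmed-down Book class so that it runs on its own; the sale quantities are arbitrary.

```python
class Book:
    def __init__(self, title, author):
        self.title = title
        self.author = author
        self.sold = 0
    def sell(self, n):
        self.sold += n

b = Book('Gridlinked', 'Neal Asher')
b.sell(100)          # method-call syntax
Book.sell(b, 50)     # same method, with b passed explicitly as self
print(b.sold)        # both calls updated the same object
```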
Why we prefer b.sell(100)
over Book.sell(b,100)
: Instead of just functions, we send messages back and forth between objects. Instead of bark(dog) we say dog.bark() or instead of inflate(ball) we say ball.inflate().
Real-world objects contain ... and ...
A software object's state is stored in ...
A software object's behavior is exposed through ...
A blueprint for a software object is called a ...
Define a class called Point
that has a constructor taking x, y coordinates and make them fields of the class.
Define method distance(q)
that takes a Point
and returns the Euclidean distance from self
to q
.
Test with
p = Point(3,4)
q = Point(5,6)
print(p.distance(q))
Add method __str__
so that print(q)
prints something nice like (5,6)
.
In [137]:
import numpy as np
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def distance(self, other):
        return np.sqrt( (self.x - other.x)**2 + (self.y - other.y)**2 )
    def __str__(self):
        return f"({self.x},{self.y})"
In [138]:
p = Point(3,4)
q = Point(5,6)
print(p, q)
print(p.distance(q))
Defining something new as it relates to something we already understand is usually a lot easier. The same thing is true in programming. Let's start with an account object:
In [139]:
class Account:
    def __init__(self, starting):
        self.balance = starting
    def add(self, value):
        self.balance += value
    def total(self):
        return self.balance
In [140]:
a = Account(100.0)
a.add(15)
a.total()
Out[140]:
In [141]:
objviz(a)
Out[141]:
Inheritance behaves like an import
or include operation from another class into a new class. (Note that this is not really true, but we can think of it as an include for our purposes.)
If we do not specify a superclass, class object
is the implicit superclass. That class is called the root of the class hierarchy and defines a number of standard methods:
In [142]:
x = object() # yes, we can make a generic object
print(dir(x))
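We can confirm the implicit superclass for a class of our own. The sketch below uses a cut-down Account and the built-in issubclass and isinstance functions.

```python
class Account:                      # no superclass listed
    def __init__(self, starting):
        self.balance = starting

base = Account.__base__             # the implicit superclass
print(base)
print(issubclass(Account, object))  # every class derives from object

a = Account(100.0)
print(isinstance(a, object))        # so every instance is also an object
```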
We could define an interest-bearing account as it differs from a regular account:
In [180]:
class InterestingAccount(Account): # derive from super class to get subclass
    def __init__(self, starting, rate):
        self.balance = starting # super().__init__(starting)
        self.rate = rate
    def total(self): # OVERRIDE method
        return self.balance + self.balance * self.rate
b = InterestingAccount(100.0, 0.15)
b.add(15)
b.total()
In [145]:
objviz(b)
Out[145]:
The key is that we get to use add()
without having to redefine it in InterestingAccount
and InterestingAccount
also gets to override what the total()
of the account is. We have reused and refined previous functionality. You can think of the superclass as defining some initial functions that we can reuse or override in the subclass.
We can also extend the functionality by adding a method that is not in the superclass.
In [146]:
class InterestingAccount(Account): # derive from super class to get subclass
    def __init__(self, starting, rate):
        super().__init__(starting) # does self.balance = starting above
        self.rate = rate
    def total(self): # OVERRIDE method
        return self.balance + self.balance * self.rate
    def profit(self):
        return self.balance * self.rate
In [147]:
b = InterestingAccount(100.0, 0.15)
b.add(15)
b.profit()
Out[147]:
In [148]:
a = Account(100.0)
b = InterestingAccount(100.0, 0.15)
print(type(a))
print(type(b))
The class definitions are actually objects themselves that you can access with a secret field of any object:
In [149]:
print(b.__class__)
print(b.__class__.__base__)
Create an instance of class Foo
using a constructor that takes no arguments.
What does the __init__
method do?
Given classes Employee
and Manager
, which is the superclass and which is the subclass?
When you call b.add(15)
, Python looks up function add
within the object definition for b
(InterestingAccount
). Because we have inherited that method from the superclass, the subclass knows about it. When we call b.total()
, Python again looks up the method within InterestingAccount
and finds an overridden method. That is why b.total()
doesn't invoke the Account
version.
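The lookup order can be inspected directly. The sketch below repeats the two account classes so that it runs on its own, then uses the standard __mro__ attribute and vars() to show where each method is actually defined. The name search_order is illustrative.

```python
class Account:
    def __init__(self, starting):
        self.balance = starting
    def add(self, value):
        self.balance += value
    def total(self):
        return self.balance

class InterestingAccount(Account):
    def __init__(self, starting, rate):
        super().__init__(starting)
        self.rate = rate
    def total(self):
        return self.balance + self.balance * self.rate

# Classes searched, in order, when looking up a method on an InterestingAccount
search_order = [c.__name__ for c in InterestingAccount.__mro__]
print(search_order)

# add is defined only in Account, so the lookup falls through to the superclass
print('add' in vars(InterestingAccount), 'add' in vars(Account))

# total is defined in InterestingAccount, so the search stops there
print('total' in vars(InterestingAccount))
```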
This behavior is desirable but extremely confusing at first. Here is an example of it in action where I have added a __str__
method to the superclass:
In [150]:
class Account:
    def __init__(self, starting):
        self.balance = starting
    def add(self, value):
        self.balance += value
    def total(self):
        return self.balance
    def __str__(self):
        return f"Balance {self.total()}" # can call 2 different functions

class InterestingAccount(Account): # derive from super class to get subclass
    def __init__(self, starting, rate):
        self.balance = starting
        self.rate = rate
    def total(self): # OVERRIDE method
        return self.balance + self.balance * self.rate
    def profit(self):
        return self.balance * self.rate
The devious part is that __str__
in Account
calls Account.total()
or InterestingAccount.total()
, depending on the type of self
:
In [151]:
a = Account(100.0)
b = InterestingAccount(100.0, 0.15)
print(a) # calls Account.total()
print(b) # calls InterestingAccount.total()
Define a Point3D
that inherits from Point
.
Define constructor that takes x,y,z values and sets fields. Call super().__init__(x,y)
to call constructor of superclass.
Define / override distance(q)
so it works with 3D field values to return distance.
Test with
p = Point3D(3,4,9)
q = Point3D(5,6,10)
print(p.distance(q))
Add method __str__
so that print(q)
prints something nice like (5,6,10)
. Recall:
$dist(x,y) = \sqrt{(x_1-y_1)^2 + (x_2-y_2)^2 + (x_3-y_3)^2}$
In [152]:
import numpy as np
class Point3D(Point):
    def __init__(self, x, y, z):
        # reuse/refine super class constructor
        super().__init__(x,y)
        self.z = z
    def distance(self, other):
        return np.sqrt( (self.x - other.x)**2 +
                        (self.y - other.y)**2 +
                        (self.z - other.z)**2 )
    def __str__(self):
        return f"({self.x},{self.y},{self.z})"
In [153]:
p = Point3D(3,4,9)
q = Point3D(5,6,10)
print(p.distance(q))
Because the mind of a hunter-gatherer views the world as a collection of objects that interact by sending messages, an OO programming paradigm maps well to the real-world problems we try to simulate via computer. Further, we are at our best when programming the way our minds are hardwired to think.
In general when writing software, we try to map real-world entities onto programming constructs. If we take a word problem, the nouns typically become objects and the verbs typically become methods within these objects.
Because we can specify how differently-typed objects are similar, we can define new objects as they differ from existing objects. By correctly relating similar classes by their category/commonality/similarity, code reuse occurs as a side effect of inheritance.
Non-OO languages tend to be inflexible/brittle because the exact type of each variable must be specified. In OO languages, polymorphism is the ability to refer to groups of similar but different types using a single type reference.
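As a rough illustration of polymorphism with the account classes from earlier, the sketch below treats both kinds of account through the same loop variable. The class definitions are repeated so the example runs on its own; the names accounts and totals are made up for this example.

```python
class Account:
    def __init__(self, starting):
        self.balance = starting
    def total(self):
        return self.balance

class InterestingAccount(Account):
    def __init__(self, starting, rate):
        super().__init__(starting)
        self.rate = rate
    def total(self):  # overrides Account.total
        return self.balance + self.balance * self.rate

# One list holding similar but different types
accounts = [Account(100.0), InterestingAccount(100.0, 0.15)]

# The same call, acct.total(), runs a different method depending on the object
totals = [acct.total() for acct in accounts]
print(totals)
```

The loop never checks which kind of account it has; each object supplies its own version of total().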