TODO:Perhaps recognize to the students that I know they know the simple conditional and looping statements. The goal here is to get them retrained from birth so to speak to properly understand these operations in context. I'd like to show them how to think about these things. We never just decide to stick in another loop because we haven't done one in a while or randomly insert a conditional statement. Either here or somewhere else, we need to really stress the difference between atomic element assignment like numbers and assignment between references. Students had their minds blown in the linked list stuff later. Emphasize that functions can change lists that come in also.
Now that we've studied a problem-solving process and learned the common programming patterns using pseudocode, it's finally time to express ourselves using actual Python programming language syntax. Keep in mind that, to implement any program, we should follow the problem-solving process and write things out in pseudocode first. Then, coding is a simple matter of translating pseudocode to Python.
Let's review our computation model. Our basic raw ingredient is data (numbers or text strings) that lives on our disk typically (or SSDs nowadays). Note: we might have to go get that data with code; see MSAN692. The disk is very large but cannot serve up data fast enough for the processor, which is many orders of magnitude faster than the disk. Consequently, our first act in an analytics program is often to load some data from the disk into temporary memory. The memory is faster than the disk but smaller and disappears when the power goes off or your program terminates. The processor is still much faster than the memory but we have lots of tricks to increase the speed of communication between the processor and memory (e.g., caches).
The processor is where all of the action occurs because it is the entity that executes the statements in a program. The operations in a program boil down to one of these fundamental instructions within the processor:
In Model of Computation, we studied pseudocode that maps to one or more of these fundamental instructions. We saw how some of the higher-level programming patterns map down to pseudocode chosen from these fundamental instructions. We finished up by looking at some of the low-level programming patterns that combine fundamental instructions to do useful things like filtering and image processing.
The act of translating a pseudocode operation into Python code involves choosing the right Python construct, just like programming involves choosing the right pattern to solve a piece of a real-world problem. Then, it's a matter of shoehorning our free-form pseudocode into the straitjacket of programming language syntax. Before we get to those details, however, let's look at the big picture and a general strategy for writing programs.
Writing and executing a program are remarkably similar to writing and reading a paper or report. Just as with our program work plan, we begin writing a paper by clearly identifying a thesis or problem statement. Analogous to identifying input-output pairs, we might identify the target audience and what we hope readers will come away with after reading the paper. With this in mind, we should write an outline of the paper, which corresponds to identifying the processing steps in the program work plan. Sections and chapters in a paper might correspond to functions and packages in the programming world.
When reading a paper, we read the sections and paragraphs in order, like a processor executes a program. The paper text can ask the reader to jump temporarily to a figure or different section and return. This is analogous to a program calling a function and returning, possibly with some information. When reading a paper, we might also encounter conditional sections, such as "If you've studied quantum physics, you can skip this section." There can even be loops in a paper, such as "Now's a good time to reread the background section on linear algebra."
The point is that, if you've been taught how to properly write a paper, the process of writing code should feel very familiar. To simplify the process of learning to code in Python, we're going to restrict ourselves to a subset of the language and follow a few templates that will help us organize our programs.
While I was in graduate school, I worked in Paris for six months (building an industrial robot control language). A friend, who didn't speak French, came over to work as well and got a tutor. The tutor started him out with just the present tense, four verbs, and a few key nouns like café and croissant. Moreover, the tutor gave him simple sentence templates in the French equivalent of these English templates:
_______ go _______.
and
I am _________.
that he could fill in with subjects and objects (nouns, pronouns).
That's also an excellent approach for learning to code in a programming language. We're going to start out playing around in a small sandbox, picking a simple subset of python that lets us do some interesting things.
The "nouns" in this subset are numbers like 34 and 3.4, strings like parrt, and lists of nouns like [3,1.5,4]. We can name these values using variables just as we use names like Mary to refer to a specific human being. The "verbs", which act on nouns, are arithmetic operators like cost + tax, relational operators like quantity<10, and some built-in functions like len(names). We'll also use some sentence templates for conditional statements and loops. Finally, we'll also need the ability to pull in (import) code written by other programmers to help us out. It's like opening a cookbook that lets us refer to existing recipes.
The atomic elements in Python, so to speak, are numbers and strings of text. We distinguish between integer and real numbers, which we call floating-point numbers, because computers represent the two internally in a different way. Here's where the data type comes back into play. Numbers like 34, -9, and 0 are said to have type int whereas 3.14159 and 0.123 are type float. These values are called int or float literals. Strings of text are just a sequence of characters in single quotes (there are more quoting options but we can ignore that for now) and have type str (short for string). For example, 'parrt' and 'May 25, 1999'. Note that the string '207' is very different from the integer 207. The former is a sequence, which we can think of as a list, with three characters and the latter is a numeric value that we could, say, multiply by 10.
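To make the difference between '207' and 207 concrete, here is a minimal sketch. It assumes the built-in functions type and int, which the text has not introduced yet, and it writes print with parentheses so that it runs under both Python 2 and Python 3.

```python
# Ask Python for the type of a few literals.
int_type = type(34)          # int
float_type = type(3.14159)   # float
str_type = type('parrt')     # str

text = '207'      # a sequence of three characters
number = 207      # a numeric value

print(len(text))        # number of characters in the string
print(number * 10)      # arithmetic on the integer
print(text * 2)         # "multiplying" a string repeats it instead
print(int(text) * 10)   # convert the string to an integer first, then multiply
```

The last line is the usual way out when numeric data arrives as text: convert it before doing arithmetic.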
Let's look at our first python program!
In [35]:
print 'hi'
That code is a kind of statement that instructs the computer to print the specified value to the computer screen (the console).
Exercise: Try that statement out yourself. Using PyCharm, we see an editor window and the results of running the one-line program using the Run menu:
You can also try things out interactively using the interactive Python console (also called a Python shell) without actually creating a Python file containing the code. After typing the print statement and hitting the newline (return) key, the console looks like:
So here is our first statement template:
print ___________
We can fill that hole with any kind of expression; right now, we only know about values (the simplest expressions) so we can do things like:
In [36]:
print 34
print 3.14159
Notice the order of execution. The processor is executing one statement after the other.
Instead of printing values to the screen, let's store values into memory through variable assignment statements. The assignment statement template looks like:
variablename = __________
For example, we can store the value one into a variable called count and then reference that variable to load the data back from memory for use by a print statement:
In [37]:
count = 1
print count
Again, the sequence matters. Putting the print before the assignment will cause an error because count is not defined as a variable until after the assignment.
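Here is a small sketch of that ordering error. The variable name total is hypothetical, and the try/except wrapper is outside the subset of Python covered here; it is used only so the script can record the error and keep going. The parenthesized print runs under both Python 2 and Python 3.

```python
# Using a variable before assigning it raises a NameError.
try:
    print(total)          # 'total' has no value yet
    outcome = 'printed'
except NameError:
    outcome = 'NameError'

total = 1                 # now the name exists
print(total)
print(outcome)
```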
To see how things are stored in memory, let's look at three assignments.
In [38]:
count = 1
name = 'iPhone'
price = 699.99
We can visualize the state of computer memory after executing that program using pythontutor.com. It shows a snapshot of memory like this:
(fyi, the "Global frame" holds what we call global variables. For now, everything will be globally visible and so we can just call them variables.)
Exercise: Type in those assignments yourself and then print each variable.
Another important thing about variables in a program is that we can reassign variables to new values. For example, programs tend to count things and so you will often see assignments that look like count = count + 1. Here's a contrived example:
In [39]:
count = 0
count = count + 1
print count
From a mathematical point of view, it looks weird/nonsensical to say that a value is equal to itself plus something else. Programmers don't mean that the two are equal; we are assigning a value to a variable, which just happens to be the same variable referenced on the right-hand side. The act of assignment corresponds to the fundamental processor "store to memory" operation we discussed earlier. The Python language differentiates between assignment, =, and equality, ==, using two different operators.
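A short sketch of the two operators side by side may help. The names is_one and is_two are made up for illustration, and print is written with parentheses so the code runs under both Python 2 and Python 3.

```python
count = 0
count = count + 1        # assignment: compute the right side, then store it

is_one = (count == 1)    # equality: a question whose answer is True or False
is_two = (count == 2)

print(count)
print(is_one)
print(is_two)
```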
Just as we use columns of data in spreadsheets frequently, we also use lists of values a lot in Python coding. A list is just an ordered sequence of elements. The simplest list has numbers or strings as elements. For example, [2, 4, 8] is a list with three numbers in it. We call this a list literal just like 4 is an integer literal. Of course, we can also associate a name with a list:
In [40]:
values = [2, 4, 8]
print values
Python tutor shows the following snapshot of memory. Notice how the indexes of the list, the numbers in grey, start from zero. In other words, the element in the first position has index 0, the element in the second position has index 1, and so on. The last valid index in a list is the length of the list minus 1.
Here's an example list with string elements:
In [41]:
names = ['Xue', 'Mary', 'Bob']
print names
Exercise: Try printing the result of adding 2 list literals together using the + operator. E.g., [34,99] and [1,3,7].
The list elements don't all have to be of the same type. For example, we might group the name and enrollment of a course in the same list:
In [42]:
course = ['msan692', 51]
This list might be a single row in a table with lots of courses. Because a table is a list of rows, a table is a list of lists. For example, a table like this:
could be associated with variable courses using this Python list of row lists.
In [43]:
courses = [
['msan501', 51],
['msan502', 32],
['msan692', 101]
]
Python tutor gives a snapshot of memory that looks like this:
This example code also highlights an important Python characteristic. Assignment and print statements must be completed on the same line unless we break the line in a [...] construct (or (...) for future reference). For example, if we finish the line with the = symbol, we get a syntax error from Python:
badcourses =
[
['msan501', 51],
['msan502', 32],
['msan692', 101]
]
yields error:
File "<ipython-input-17-55e90f1fbebb>", line 1
badcourses =
^
SyntaxError: invalid syntax
Besides creating lists, we need to access the elements, which we do using square brackets and an index value. For example, to get the first course at index 0 from this list, we would use courses[0]:
In [44]:
print courses[0] # print the first one
print courses[1] # print the 2nd one
Because this is a list of lists, we use two-step array indexing like courses[i][j] to access row i and column j.
Exercise: Try printing out the msan502 and 32 values using array index notation.
We can also set list values by using the array indexing notation on the left-hand side of an assignment statement:
In [45]:
print courses[2][1]
courses[2][1] = 99
print courses[2]
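The same reading and writing steps can be sketched on a different table. The inventory table below is hypothetical and is used so that the exercise above stays unsolved. Parenthesized print keeps the code runnable under both Python 2 and Python 3.

```python
# 'inventory' is a made-up table: each row is [product name, quantity]
inventory = [
    ['apples', 12],
    ['pears', 7],
    ['plums', 30]
]

row = inventory[1]            # step 1: pick the row at index 1
quantity = inventory[1][1]    # step 2: pick column 1 within that row
print(row)
print(quantity)

inventory[1][1] = quantity + 5   # indexing on the left-hand side stores a value
print(inventory[1])

last_row = inventory[len(inventory) - 1]   # last valid index is length - 1
print(last_row)
```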
This indexing notation also works to access the elements of a string (but you cannot assign to individual characters in a string because strings are immutable):
In [46]:
name = 'parrt'
print name[0]
print name[1]
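To see what "immutable" means in practice, here is a minimal sketch. It assumes two things the text has not covered: try/except, used only to capture the error without stopping the script, and the slice name[1:], which means "everything from index 1 onward". The parenthesized print runs under both Python 2 and Python 3.

```python
name = 'parrt'
first = name[0]      # reading a character works just like list indexing

# Assigning to a single character fails because strings are immutable.
try:
    name[0] = 'P'
    result = 'changed'
except TypeError:
    result = 'TypeError'

# To get a modified string, build a new one instead.
capitalized = 'P' + name[1:]

print(first)
print(result)
print(capitalized)
```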
Looking at the Python tutor representation of our courses list, we can see that Python definitely represents that table as a list of lists in memory. Also notice that variable courses refers to the list, meaning that courses is a variable that points at some memory space organized into a list. For example, if we assign another variable to courses, then they both point at the same organized chunk of memory:
In [47]:
courses = [
['msan501', 51],
['msan502', 32],
['msan692', 101]
]
mycourses = courses
Python tutor illustrates this nicely:
While the Python tutor does not illustrate it this way, variables assigned to strings also refer to them with the same pointer concept. After executing the following two assignments, variables name and myname refer to the same sequence of characters in memory.
In [48]:
name = 'parrt'
myname = name
The general rule is that assignment only makes copies of numbers, not strings or lists. We'll learn more about this later.
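The practical consequence of two variables pointing at the same list shows up as soon as we change the list through one of them. The sketch below assumes the is operator, which asks whether two names refer to the same object in memory and has not been introduced in the text. The names mycount and same_list are made up for illustration, and print is parenthesized so the code runs under both Python 2 and Python 3.

```python
courses = [
    ['msan501', 51],
    ['msan502', 32],
    ['msan692', 101]
]
mycourses = courses            # no copy: both names refer to one list

mycourses[0][1] = 60           # change the table through one name...
print(courses[0])              # ...and the change is visible through the other

same_list = (mycourses is courses)   # same object in memory?
print(same_list)

# Numbers behave like independent values: reassigning one name
# leaves the other alone.
count = 1
mycount = count
mycount = mycount + 1
print(count)
print(mycount)
```

This is worth keeping in mind whenever a list is handed to other code: whoever holds a reference to the list can change the elements that everyone else sees.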
So far, we've seen the assignment and print statements, both of which have "holes" where we can stick in values. More generally, we can insert expressions. An expression is just a combination of values and operators, corresponding to nouns and verbs in natural language. We use arithmetic operators (+, -, *, /) and parentheses for computing values:
In [49]:
price = 50.00
cost = price * 1.10 + 4 # add 10% tax and $4 for shipping
print cost
The expression is price * 1.10 + 4 and it follows the normal operator precedence rules that multiplies are done before additions. For example, 4 + price * 1.10 gives the same result:
In [50]:
price = 50.00
cost = 4 + price * 1.10 # add 10% tax and $4 for shipping
print cost
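Parentheses are how we override those precedence rules. As a rough sketch, suppose (hypothetically) that the shipping fee were taxed too; the variable names below are made up for illustration. No expected output is shown because floating-point results can carry tiny rounding errors. The parenthesized print runs under both Python 2 and Python 3.

```python
price = 50.00

tax_then_shipping = price * 1.10 + 4     # multiply first, then add
same_value = 4 + price * 1.10            # operand order does not change the result
taxed_total = (price + 4) * 1.10         # parentheses force the addition first

print(tax_then_shipping)
print(same_value)
print(taxed_total)
```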
There is another kind of expression called a conditional expression or Boolean expression that is a combination of values and relational operators (<, >, <=, >=, == for equal, != for not equal). These are primarily used in conditional statements and loops, which we'll see next.
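A Boolean expression produces a value just like an arithmetic expression does, so we can store it in a variable and print it. In this sketch the quantity value and the variable names are hypothetical, and print is parenthesized so the code runs under both Python 2 and Python 3.

```python
quantity = 7     # made-up value for illustration

low_stock = quantity < 10       # relational operators yield True or False
exactly_ten = quantity == 10
not_ten = quantity != 10

print(low_stock)
print(exactly_ten)
print(not_ten)
```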
Ok, we've now got a basic understanding of how to compute and print values, and we have seen that the processor executes statements one after the other. Processors can also execute statements conditionally so let's see how to express that in Python. The basic template for a conditional statement looks like:
if _____: _______
if the conditional guards just one statement, or
if _____:
    _____
    _____
    ...
if the conditional guards more than one statement.
The statements guarded by the if are indented from the starting column of the if keyword. Indentation is how Python groups statements and associates statements with conditionals and loops. All statements starting at the same column number are grouped together. The exception is when we associate a single statement with a conditional or loop on the same line (the first if template).
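Here is a minimal sketch of how indentation decides which statements belong to the if. It records what ran in a hypothetical list called messages, using the list append method, which the text has not introduced. The parenthesized print runs under both Python 2 and Python 3.

```python
temp = 95
messages = []                        # made-up list used to record what ran

if temp > 90:
    messages.append('hot!')          # indented: runs only when the test is true
    messages.append('drink water')   # same column, so same group
messages.append('done')              # back at the if's column: always runs

print(messages)
```

Changing temp to 75 would leave only 'done' in the list, because both indented statements are skipped together.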
Here's a simple example that tests whether the temperature is greater than 90 degrees (Fahrenheit, let's say).
In [51]:
temp = 95
if temp>90: print 'hot!'
The processor executes the assignment first then tests the value of variable temp against value 90. The result of that conditional expression has type boolean (bool). If the result is true, the processor executes the print statement guarded by the conditional. If the result is false, the processor skips the print statement.
As always, the sequence of operations is critical to proper program execution. It's worth pointing out that this if statement is different than what we might find in a recipe meant for humans. For example, the if statement above evaluates temp>90 at a very specific point in time, directly after the previous assignment statement executes. In a recipe, however, we might see something like "if the cream starts to boil, turn down the heat." What this really means is that if the cream ever starts to boil, turn down the heat. In most programming languages, there is no direct way to express this real-world functionality. Just keep in mind that Python if statements execute only when the processor reaches them. The if statement is not somehow constantly and repeatedly evaluating temp>90.
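One way to approximate the recipe's "if it ever happens" is to repeat the test ourselves. The sketch below assumes a hypothetical list of temperature readings and uses a for-each loop, which this section introduces a little further on. The parenthesized print runs under both Python 2 and Python 3.

```python
# Made-up sequence of temperature readings taken over time
readings = [85, 88, 93, 89]

temp = readings[0]
if temp > 90:                 # tested once, against the first reading only
    print('hot!')

# To react whenever the condition becomes true, test it on every reading.
alerts = 0
for temp in readings:
    if temp > 90:
        print('hot!')
        alerts = alerts + 1

print(alerts)
```

The first if prints nothing because the first reading is 85; only the loop notices the later reading of 93.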
In Model of Computation, we also saw an if-else type conditional statement. We can also directly express this in Python. The template looks like:
if _____:
    _____
    _____
    ...
else:
    _____
    _____
    ...
Continuing with our previous example, we might use the else clause like this:
In [52]:
temp = 95
if temp>90:
    print 'hot!'
else:
    print 'nice'
In [53]:
temp = 75
if temp>90:
    print 'hot!'
else:
    print 'nice'
Our model of computation also allows us to repeat statements using a variety of loops. The most general loop tests a conditional expression and has a template that looks like this:
while _____:
    _____
    _____
    ...
where one of the statements within the while loop must change the conditions of the test to avoid an infinite loop.
Let's translate this simple pseudocode program:
init a counter to 1
while counter <= 5:
    print "hi"
    add 1 to counter
to Python:
In [54]:
count = 1
while count <= 5:
    print "hi"
    count = count + 1
Exercise: Using the same coding template, alter the loop so that it prints out the count variable each time through the loop instead of hi.
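The while template is not limited to counting up by one. As a sketch with made-up numbers, the loop below keeps halving a value until the test fails; the variable names are hypothetical. Notice that the first statement in the loop body is the one that changes the tested condition, which is what keeps the loop from running forever. The parenthesized print runs under both Python 2 and Python 3.

```python
# Count how many times 100 can be halved before it drops below 10.
value = 100.0
halvings = 0
while value >= 10:
    value = value / 2           # this statement changes the tested condition
    halvings = halvings + 1

print(halvings)
print(value)
```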
Another kind of loop we saw is the for-each loop, which has the template:
for x in _____:
    _____
    _____
    ...
where x can be any variable name we want. For example, we can print each name from a list on a line by itself:
In [59]:
names = ['Xue', 'Mary', 'Bob']
for name in names:
    print name
Similarly, we can print out the rows of our courses table like this:
In [55]:
for course in courses:
    print course
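A for-each loop is also the natural way to accumulate a result across rows. This sketch totals the enrollment column of the courses table; the variable total is a name made up for illustration. The parenthesized print runs under both Python 2 and Python 3.

```python
courses = [
    ['msan501', 51],
    ['msan502', 32],
    ['msan692', 101]
]

total = 0
for course in courses:
    total = total + course[1]   # column 1 of each row holds the enrollment

print(total)
```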
When we need to loop through multiple lists simultaneously, we use indexed loops following this template:
n = _____ # n is the length of the lists (should be same length)
for i in range(n):
    _____
    _____
    ...
The range(n) function returns a range from 0 to n-1 or $[0..n)$ in math notation.
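To check what range actually produces, we can look at its values directly. The sketch wraps range in list(...), an assumption made so that the values are visible under Python 3 as well as Python 2. The variable indexes is a made-up name.

```python
n = 5
indexes = list(range(n))   # the integers from 0 up to, but not including, n
print(indexes)

# The values produced by range(len(...)) are exactly the valid
# indexes of a list.
names = ['Xue', 'Mary', 'Bob']
for i in range(len(names)):
    print(i)
    print(names[i])
```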
Here is an indexed loop that is the equivalent of the for-each loop:
In [61]:
names = ['msan501', 'msan502', 'msan692']
enrollment = [51, 32, 101]
n = 3
for i in range(n):
    print names[i], enrollment[i] # print the ith element of names and enrollment lists
Usually we don't know the length of the list (3 in this case) and so we must ask Python to compute it using the commonly-used len(...) function. Rewriting the example to be more general, we'd see:
In [62]:
n = len(names)
for i in range(n):
    print names[i], enrollment[i]
As a shorthand, programmers often combine the range and len as follows:
In [58]:
for i in range(len(names)):
    print names[i], enrollment[i]
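Indexed loops earn their keep when one index has to pick matching elements out of several lists. As a sketch, the loop below pairs up the two parallel lists into the list-of-rows table we used earlier. It assumes the list append method, which the text has not introduced, and it uses parenthesized print so that it runs under both Python 2 and Python 3.

```python
names = ['msan501', 'msan502', 'msan692']
enrollment = [51, 32, 101]

courses = []                       # will become a list of [name, enrollment] rows
for i in range(len(names)):
    courses.append([names[i], enrollment[i]])   # same index i into both lists

for course in courses:
    print(course)
```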
We've seen the use of some predefined functions, such as range and len, but those are available without doing anything special in your Python program. Now let's take a look at importing a library of code and data. Because there are perhaps millions of libraries out there and Python can't automatically load them all into memory (slow and they wouldn't fit), we must explicitly import the libraries we want to use. This is like opening a specific cookbook.
For example, let's say we need the value of π, which I can only remember to five decimal places. If we type pi in the Python console, we get an error because that variable is not defined:
In [27]:
print pi
However, the Python math library has that variable and much more, so let's import it.
In [28]:
import math
That tells Python to bring in the math library and so now we can access the stuff inside. A crude way to ask Python for the list of stuff in a package is to use the dir function, similar to the Windows command-line dir command.
In [29]:
print dir(math)
It's better to use the online math documentation, but sometimes that command is helpful if you just can't remember the name of something.
Anyway, now we can finally get the value of pi:
In [30]:
print math.pi
We can access anything we want by prefixing it with the name of the library followed by the dot operator, which is kind of like an "access" operator. pi is a variable but there are also functions such as sqrt:
In [31]:
print math.sqrt(100)
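Library variables and functions drop into expressions just like our own variables do. As a sketch, this computes the area of a circle with a made-up radius and then the side of a square with the same area; the variable names are hypothetical. No expected output is shown because the results are floating-point values. The parenthesized print runs under both Python 2 and Python 3.

```python
import math

radius = 2.0                          # made-up circle radius
area = math.pi * radius * radius      # pi is a variable inside the math library
side = math.sqrt(area)                # sqrt is a function inside the math library

print(area)
print(side)
```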
Take a look back at the summary in Model of Computation. You'll notice that the high-level pseudocode operations look remarkably like the actual Python equivalent. Great news! Let's review all of our Python statement templates as they define the basic "sentence structure" we'll use going forward to write some real programs.
We import packages with import and can refer to the elements using the dot . operator.