In this class, you will do a lot of development in the Python language. Python is great for rapid prototyping, meaning you can get started quickly. Yet, it's much more than a toy language, and so very practical for industrial use. In fact, Python is one of the 3 main languages in use at Google's production code!
Q: Can you name the other two languages?
It also has extensive library support, meaning a lot of the code you will need day-to-day already exists -- you just need to find it, download it (if necessary), and start using it in your own programs.
Right now, you are looking at an IPython notebook. An IPython notebook is a document that mixes text, code, and code output. It's a little bit different than running Python from an interpreter command-line, which you may have done at the MS Analytics Python 101 Bootcamp. We will use them for in-class exercises because, from a teaching and learning point of view, it's a great way to create structured, self-documenting worksheets that you can play with and revise.
The goal of this notebook is to introduce you to how IPython notebooks work and practice writing some Python code.
Note: For a different tutorial, refer to the one included in the Python documentation or the MSA Python 101 Boot Camp (see above).
The basic building block of any Python program is an expression. A Python expression is just like a mathematical expression: it's a snippet of values and operations that, when executed, produce a value.
Examples:
3
3.0
"hello"
3+5
"hello" + ", world"
In an IPython notebook, you enter Python code in designated code blocks, denoted by the prefix, In [...]:
Q: Try typing any one of the above expressions, and then press the "Play" button. What happens if you enter more than one expression (each on its own line)?
In [ ]:
You can also name an expression, so that you can refer to it elsewhere in your program. The name is sometimes also called an identifier, and a named expression a variable.
A line of code that names an expression is an assignment statement.
For instance, consider the following sequence of assignment statements. Read it first. Then, play it to see if you get what you'd expect.
Q: Notice that the last line is in fact an expression, which is just one of the identifiers. What happens if you remove it? You can do that by commenting it out, namely, typing a
#
sign in front of it.
In [ ]:
x = 5
x_squared = x * x
x_to_the_4th = x_squared * x_squared
x_to_the_4th
A function is similar to the same concept from mathematics: it produces some output, given one or more inputs.
And just like a mathematical function, Python distinguishes between the definition of a function from its use. To understand how functions work, it's helpful to see definitions and uses together.
The following program defines a function named square
that, given an input value $k$, computes its square, $k^2$. It then uses that function to compute $5^4$, which is of course equal to $(5^2)^2 = 625$, or two sequential uses of square
.
In [ ]:
def square (k):
k_squared = k * k
return k_squared
square (square (5))
Some things to observe:
square
is indented. The Python language requires that you do this, to indicate both the beginning and the end of the function's definition. (The end occurs at the next line that is not indented.)return
statement within the function's definition tells you what that value is.An important note about functions is that any identifiers assigned within its body are local to the body. That is, when a function is called (executed), it cannot normally replace the value of an identifier that was assigned outside the function.
Q: Based on the preceding claim, what is the value of 'z' in this example?
In [ ]:
z = 7
def tryToOverwrite (k):
z = k
y = z + k
return y
tryToOverwrite (10)
z
Beyond scalar values, you can also operate on collections of values.
There are many kinds of collections in Python. The simplest is the basic list, which is simply a sequence of values.
You can define small lists directly using a special kind of expression, in which the values (themselves expressions) appear as a comma-delimited sequence within square brackets:
[3, 1, 4, 1, 5, 9]
["my", "momma", "told", "me", "you", "better", "shop", "around"]
[72, "hello", square, square (5)]
Q: Try evaluating the above expressions.
In [ ]:
Q: What value does the following expression produce?
In [ ]:
[72, "hello", square, square (5)][2](10)
You can of course name a list.
In [ ]:
E = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 5]
Given a list L
, you can extract particular list values by an integer offset, or index.
The first element starts at index 0, and subsequence elements at positive integers. But you can also use negative indices to give offsets relative to one element past the last. Lastly, you can extract arbitrary subsequences, which are themselves (sub)lists, by a variety of different slicing operations. The following examples summarize these cases:
L[0]
L[len (L)-1] == L[-1]
L[-2]
[a:b]
: All elements starting at a
and ending at b-1
[a:b:s]
: All elements starting at a
, ending at b-1
, and including every s
elements. That is, the indices a
, a+s
, a+2*s
, ..., a+k*s <= b-1 < a+(k+1)*s
for some largest integer k
Q: Use slicing to create an expression that is a sublist of
E
above containing only the values 8. (You may exploit the exact definition ofE
.)
In [ ]:
A handy function is range (a, b)
, which returns a list of integers from $a$ to $b-1$, inclusive.
In [ ]:
n = 100000000
X = range (1, n+1) # Note the '+1', to get the sequence {1, 2, ..., n}
In [ ]:
s_analytical = n * (n+1) / 2
s_computed = sum (X)
assert s_computed == s_analytical
print ("Sum of {1, 2, ..., %d} = %d" % (n, s_computed))
By the way, here's a neat trick to output typeset mathematics (via $\LaTeX$), taken from this StackOverflow post.
Try hitting "Play" to see what it outputs. Note: This trick only works inside an IPython notebook.
In [ ]:
from IPython.display import Math
Math("\\sum_{i=1}^{%d} i = %d" % (n, s_computed))
It's probably obvious that one of your main goals is to write correct programs. But another important goal, especially if you want to scale to large data sets, is efficiency.
Often, efficiency is possible through the use of the right patterns or idioms. Just like idioms in natural language, programming language idioms are shortcuts that have a particular meaning.
Consider two different ways to carry out the sum of n
elements: using the sum()
function, as above, compared to using an explicit loop. You should observe that one is much faster than the other. Why?
In [ ]:
import time
t_start = time.time () # Start timer, which returns a time stamp in seconds
s_sum = sum (X)
t_sum = time.time () - t_start
assert s_sum+1 == s_analytical
t_start = time.time ()
s_loop = 0
for x_i in X:
s_loop = s_loop + x_i
t_loop = time.time () - t_start
assert s_loop == s_analytical
print ("Time using sum: %g seconds" % t_sum)
print ("Time using loop: %g seconds" % t_loop)