CSE 6040, Fall 2015 [01]: Intro to Python

In this class, you will do a lot of development in the Python language. Python is great for rapid prototyping, meaning you can get started quickly. Yet, it's much more than a toy language, and so very practical for industrial use. In fact, Python is one of the 3 main languages in use at Google's production code!

Q: Can you name the other two languages?

It also has extensive library support, meaning a lot of the code you will need day-to-day already exists -- you just need to find it, download it (if necessary), and start using it in your own programs.

Right now, you are looking at an IPython notebook. An IPython notebook is a document that mixes text, code, and code output. It's a little bit different than running Python from an interpreter command-line, which you may have done at the MS Analytics Python 101 Bootcamp. We will use them for in-class exercises because, from a teaching and learning point of view, it's a great way to create structured, self-documenting worksheets that you can play with and revise.

The goal of this notebook is to introduce you to how IPython notebooks work and practice writing some Python code.

Note: For a different tutorial, refer to the one included in the Python documentation or the MSA Python 101 Boot Camp (see above).

Concept 0: Expressions, identifiers, and (assignment) statements

The basic building block of any Python program is an expression. A Python expression is just like a mathematical expression: it's a snippet of values and operations that, when executed, produce a value.

Examples:

  • An integer value: 3
  • A floating-point ("real") value: 3.0
  • A string: "hello"
  • The addition of two integers: 3+5
  • The addition of two strings: "hello" + ", world"

In an IPython notebook, you enter Python code in designated code blocks, denoted by the prefix, In [...]:

Q: Try typing any one of the above expressions, and then press the "Play" button. What happens if you enter more than one expression (each on its own line)?


In [ ]:

You can also name an expression, so that you can refer to it elsewhere in your program. The name is sometimes also called an identifier, and a named expression a variable.

A line of code that names an expression is an assignment statement.

For instance, consider the following sequence of assignment statements. Read it first. Then, play it to see if you get what you'd expect.

Q: Notice that the last line is in fact an expression, which is just one of the identifiers. What happens if you remove it? You can do that by commenting it out, namely, typing a # sign in front of it.


In [ ]:
x = 5
x_squared = x * x
x_to_the_4th = x_squared * x_squared
x_to_the_4th

Concept 1: Functions and identifier scope

A function is similar to the same concept from mathematics: it produces some output, given one or more inputs.

And just like a mathematical function, Python distinguishes between the definition of a function from its use. To understand how functions work, it's helpful to see definitions and uses together.

The following program defines a function named square that, given an input value $k$, computes its square, $k^2$. It then uses that function to compute $5^4$, which is of course equal to $(5^2)^2 = 625$, or two sequential uses of square.


In [ ]:
def square (k):
    k_squared = k * k
    return k_squared

square (square (5))

Some things to observe:

  • Each statement within the body of square is indented. The Python language requires that you do this, to indicate both the beginning and the end of the function's definition. (The end occurs at the next line that is not indented.)
  • Following the function definition, there is an assignment statement whose right-hand side consists of two uses of the function, or two function calls. The form suggests that a function call is in fact a certain kind of expression.
  • Procedurally, the function call transfers control of the program to the function definition, with the arguments at the definition replaced by the values given at the call site, which is the location of the call.
  • Since the function call is an expression, it must take some value when the call is complete. The return statement within the function's definition tells you what that value is.

An important note about functions is that any identifiers assigned within its body are local to the body. That is, when a function is called (executed), it cannot normally replace the value of an identifier that was assigned outside the function.

Q: Based on the preceding claim, what is the value of 'z' in this example?


In [ ]:
z = 7

def tryToOverwrite (k):
    z = k
    y = z + k
    return y

tryToOverwrite (10)
z

Concept 2: Lists

Beyond scalar values, you can also operate on collections of values.

There are many kinds of collections in Python. The simplest is the basic list, which is simply a sequence of values.

You can define small lists directly using a special kind of expression, in which the values (themselves expressions) appear as a comma-delimited sequence within square brackets:

  • A list of integers: [3, 1, 4, 1, 5, 9]
  • A list of strings: ["my", "momma", "told", "me", "you", "better", "shop", "around"]
  • A mixed list: [72, "hello", square, square (5)]

Q: Try evaluating the above expressions.


In [ ]:

Q: What value does the following expression produce?


In [ ]:
[72, "hello", square, square (5)][2](10)

You can of course name a list.


In [ ]:
E = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 5]

Given a list L, you can extract particular list values by an integer offset, or index.

The first element starts at index 0, and subsequence elements at positive integers. But you can also use negative indices to give offsets relative to one element past the last. Lastly, you can extract arbitrary subsequences, which are themselves (sub)lists, by a variety of different slicing operations. The following examples summarize these cases:

  • First element: L[0]
  • Last element: L[len (L)-1] == L[-1]
  • Second-to-last element: L[-2]
  • Slice, [a:b]: All elements starting at a and ending at b-1
  • Strided slice, [a:b:s]: All elements starting at a, ending at b-1, and including every s elements. That is, the indices a, a+s, a+2*s, ..., a+k*s <= b-1 < a+(k+1)*s for some largest integer k

Q: Use slicing to create an expression that is a sublist of E above containing only the values 8. (You may exploit the exact definition of E.)


In [ ]:

A handy function is range (a, b), which returns a list of integers from $a$ to $b-1$, inclusive.


In [ ]:
n = 100000000
X = range (1, n+1) # Note the '+1', to get the sequence {1, 2, ..., n}

Concept 3: Sum the sequence

Compute the sum of the sequence X created above.


In [ ]:
s_analytical = n * (n+1) / 2
s_computed = sum (X)
assert s_computed == s_analytical
print ("Sum of {1, 2, ..., %d} = %d" % (n, s_computed))

By the way, here's a neat trick to output typeset mathematics (via $\LaTeX$), taken from this StackOverflow post.

Try hitting "Play" to see what it outputs. Note: This trick only works inside an IPython notebook.


In [ ]:
from IPython.display import Math
Math("\\sum_{i=1}^{%d} i = %d" % (n, s_computed))

Concept 4: Correctness and efficiency

It's probably obvious that one of your main goals is to write correct programs. But another important goal, especially if you want to scale to large data sets, is efficiency.

Often, efficiency is possible through the use of the right patterns or idioms. Just like idioms in natural language, programming language idioms are shortcuts that have a particular meaning.

Consider two different ways to carry out the sum of n elements: using the sum() function, as above, compared to using an explicit loop. You should observe that one is much faster than the other. Why?


In [ ]:
import time

t_start = time.time () # Start timer, which returns a time stamp in seconds
s_sum = sum (X)
t_sum = time.time () - t_start
assert s_sum+1 == s_analytical

t_start = time.time ()
s_loop = 0
for x_i in X:
    s_loop = s_loop + x_i
t_loop = time.time () - t_start
assert s_loop == s_analytical

print ("Time using sum: %g seconds" % t_sum)
print ("Time using loop: %g seconds" % t_loop)