Introduction to Python

This tutorial was originally drawn from Scipy Lecture Notes by this list of contributors. I've continued to modify it as I use it.

This work is CC-BY. Author: Aaron L. Brenner

Python is a programming language, as are C, Fortran, BASIC, PHP, etc. Some specific features of Python are as follows:

  • an interpreted (as opposed to compiled) language. Contrary to e.g. C or Fortran, one does not compile Python code before executing it. In addition, Python can be used interactively: many Python interpreters are available, from which commands and scripts can be executed.
  • a free software released under an open-source license: Python can be used and distributed free of charge, even for building commercial software.
  • multi-platform: Python is available for all major operating systems, Windows, Linux/Unix, MacOS X, most likely your mobile phone OS, etc.
  • a very readable language with clear, non-verbose syntax.
  • a language for which a large variety of high-quality packages are available for various applications, from web frameworks to scientific computing.
  • a language very easy to interface with other languages, in particular C and C++.
  • Some other features of the language are illustrated just below. For example, Python is an object-oriented language, with dynamic typing (the same variable can contain objects of different types during the course of a program).

See https://www.python.org/about/ for more information about distinguishing features of Python.

Some Key Learning and Reference Resources

If you are interested in moving forward with learning Python, it is worth your time to get acquainted with all of these resources. The tutorial will step you through more Python, and you should be familiar with the basics of the Python language and its standard library.

Python documentation home https://docs.python.org/3/

Tutorial https://docs.python.org/3/tutorial/index.html

Python Language Reference https://docs.python.org/3/reference/index.html#reference-index

The Python Standard Library https://docs.python.org/3/library/index.html#library-index

Additional Learning Resources

The Python Cookbook, 3rd Edition - This is one of many 'cookbooks'. These can be very useful not only for seeing solutions to common problems, but also as a way to read brief examples of idiomatic code. Reading code snippets in this way can be a great complement to language reference documentation and traditional tutorials. http://chimera.labs.oreilly.com/books/1230000000393/

Also, don't be embarrassed to Google your questions! Try some variation of python [thing] example

Let's dive in* with an example that does something (kind of) useful:

* hat tip to Mark Pilgrim

This is a script that inspects a CSV data file and reports on some summary characteristics. Take a minute to read over the code before running it. Don't worry if you don't understand all of what's happening. We'll step through some of this code in more detail as we learn the basics of Python. For now, just try to get a feel for what a complete script looks like. After you've read it over, go ahead and execute it.


In [ ]:
# open the source CSV file
csv = open("cars.csv")

# create a list with the column names. we assume the first row contains them.
# we strip the carriage return (if there is one) from the line, then split values on the commas.
# Note: this uses a nifty python feature called 'list comprehension' to do it in one line
column_names = [i for i in csv.readline().strip().split(',')]

# read the rest of the file into a matrix (a list of lists). Use the same strip and split methods.
data = [line.strip().split(',') for line in csv.readlines()]

# now, try to infer the data types of each column from the values in the first row.
# the testing here shows some string methods, like isspace(), isalpha(), isdigit().
# we'll save these data type assumptions because we'll use them later in a report.
column_datatypes = []
for value in data[0]:
    if len(value) < 1 or value.isspace():
        column_datatypes.append('string')
    elif value.isalpha():
        column_datatypes.append('string')
    elif '.' in value or value.isdigit():
        column_datatypes.append('numeric')
    else:
        column_datatypes.append('string')
        
# now let's do some basic reporting on the csv
# overall stats of the file:
print("this csv file has " + str(len(column_names)) + " columns and " + str(len(data)) + " rows.")

# loop over each column name, do some different things depending on whether we've inferred 
# it contains string or numeric values. we declare certain variables with 'False' so even if
# we can't fill them we can test them without an error.
for i, value in enumerate(column_names):
    average_value = False
    highest_value = False
    lowest_value = False
    
    # if it's a numeric column, we'll get all the values for this column out of our data matrix,
    # convert them to float (remember they are all strings by default), and then get the average,
    # high, and low values. If there's an error doing this, just get the values as strings
    if column_datatypes[i] == 'numeric':
        try:
            column_values = [float(data[j][i]) for j in range(len(data))]
            average_value = sum(column_values)/len(column_values)
            highest_value = sorted(column_values)[-1]
            lowest_value = sorted(column_values)[0]
        except ValueError:
            column_values = [data[j][i] for j in range(len(data))]
    else:
        column_values = [data[j][i] for j in range(len(data))]
    
    # the set function removes duplicates from a list, so taking its length is equivalent
    # to the number of unique values
    unique_value_count = len(set(column_values))
    
    # now we start printing. First just the field name. The simple way of formatting a string
    # is with the + operator. Note: we add one to the index because we don't want our list
    # to start with zero.
    print(str(i+1) + ". \"" + value + "\"")
    
    # next the type we think it is, and the number of unique values
    # Note: using the + style of string formatting all non-string values have to be cast to strings
    print("\t{0} ({1} of {2} unique)".format(column_datatypes[i], unique_value_count, len(data)))
    
    # now different details if it's numeric and successfully converted to float, if it's
    # numeric and didn't, and otherwise we assume it's a string.
    # Note: also showing a different, more powerful string formatting method here
    if column_datatypes[i] == 'numeric':
        if average_value is not False:  # an average of exactly 0.0 is still a valid result
            print("\taverage value: {0:g}".format(average_value))
            print("\tlowest value: {0:g}".format(lowest_value))
            print("\thighest value: {0:g}".format(highest_value))
        else:
            print("\tNOTE: problems converting values to float!")
    else:
        print("\tfirst value: {0:s}".format(column_values[0]))
        print("\tlast value: {0:s}".format(column_values[-1]))

After you've run this as-is, change the name of the CSV file from cars.csv to cities.csv. Run it again.

First Steps

Follow along with the instructor in typing instructions:


In [ ]:


In [ ]:


In [ ]:


In [ ]:

Two variables a and b have been defined above. Note that one does not declare the type of a variable before assigning its value.

In addition, the type of a variable may change: at one point in time it can hold a value of one type, and at a later point a value of a different type. b was first equal to an integer, but it became equal to a string when it was assigned the value 'hello'. Type often matters, though, as when we try to print an integer in the midst of a string.
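
A minimal sketch of the behaviour described above. The values 3 and 'hello' are assumptions for illustration; the cells typed during the live session may have used others.

```python
# Assumed values for illustration only.
a = 3
b = 2 * a            # b is bound to an integer
first_type = type(b)

b = 'hello'          # the same name is now bound to a string
second_type = type(b)

# Joining a string and an integer with + raises a TypeError...
try:
    message = 'a is ' + a
except TypeError:
    # ...so the integer has to be converted explicitly.
    message = 'a is ' + str(a)

print(first_type, second_type, message)
```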

Basic Types

Numerical Types

Python supports the following numerical, scalar types.

Integer


In [ ]:
1 + 1

Remember how we saw integers in our CSV script example? The number of columns and the number of rows are integers.


In [ ]:
a = len(column_names)
type(a)

Floats

Note: most decimal fractions cannot be represented exactly as binary fractions, and certain operations using floats may lead to surprising results. For more details, start here.


In [ ]:
c = 2.1
type(c)

In [ ]:
type(average_value)

Booleans


In [ ]:
3 > 4

In [ ]:
test = (3 > 4)
test

In [ ]:
type(test)

A Python shell can therefore replace your pocket calculator, with the basic arithmetic operations +, -, *, /, % (modulo) natively implemented.
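
As a quick sketch of the operators listed above (the numbers are arbitrary). Floor division // and exponentiation ** are not in the list above and are included here only for comparison.

```python
total = 7 + 3            # addition
difference = 7 - 3       # subtraction
product = 7 * 3          # multiplication
quotient = 7 / 2         # in Python 3, / always returns a float
remainder = 7 % 3        # modulo: the remainder of the division
floor_quotient = 7 // 2  # floor division (extra, not listed above)
power = 2 ** 10          # exponentiation (extra, not listed above)

print(total, difference, product, quotient, remainder, floor_quotient, power)
```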

Try some things here or follow along with the instructor's examples:


In [ ]:


In [ ]:


In [ ]:

Type conversion (casting):


In [ ]:
float(1)
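
A few more conversions, as a small sketch. The values are arbitrary and chosen only for illustration.

```python
as_float = float(1)        # integer to float
as_int = int(3.9)          # int() truncates toward zero, it does not round
from_text = float('2.5')   # strings that hold a number can be converted too
as_text = str(42)          # number to string

print(as_float, as_int, from_text, as_text)
```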

Comments

Commenting code is good practice and makes it much easier for others to understand your code. And, often, for you to understand code that you've written earlier.

In Python, everything following the hash/pound sign # is a comment. Comments can either be on their own line(s), or in-line. Use in-line comments sparingly.


In [ ]:
# this is a comment. We might say, for example, that we're setting the value of Pi:
pi = 3.14

pie = 'pumpkin' # and this is an in-line comment. Setting the value of pie.

Exercise

Add three of the same single digit integers (e.g. 1 + 1 + 1). Is the result what you expected?

Next, add three of the same tenths digit, which are floats (e.g. .1 + .1 + .1). Is the result what you expected?


In [ ]:


In [ ]:

What happened with those floats? How could we avoid this? There is an explanation and some suggestions in this Python documentation on floating point arithmetic.
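
One possible set of answers, sketched with the standard library only. Which option fits depends on the task, so treat these as suggestions rather than the single correct approach.

```python
import math
from decimal import Decimal

total = 0.1 + 0.1 + 0.1

# The sum is very close to 0.3, but not exactly equal to it.
exactly_equal = (total == 0.3)

# Option 1: compare with a tolerance instead of using ==
close_enough = math.isclose(total, 0.3)

# Option 2: round before displaying or comparing
rounded = round(total, 2)

# Option 3: use Decimal, built from strings, for exact decimal arithmetic
decimal_total = Decimal('0.1') + Decimal('0.1') + Decimal('0.1')

print(total, exactly_equal, close_enough, rounded, decimal_total)
```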

Containers

Python provides many efficient types of containers, in which collections of data can be stored.

Lists

A list is an ordered collection of objects that may have different types. For example:


In [ ]:
l = ['red', 'blue', 'green', 'black', 'white']
type(l)

And remember in our CSV script example, we used lots of lists! There was a list to store the column names, which we assumed were in the first row of data:


In [ ]:
column_names

And then each row of the CSV was itself a list, and all the rows were another list. So we used a list of lists, or a matrix.


In [ ]:
data

Indexing: accessing individual objects contained in the list:


In [ ]:
column_names[0]

In [ ]:
column_names[-1]

Indexing starts at 0, not 1!

Slicing: obtaining sublists of regularly-spaced elements:


In [ ]:
column_names[1:3]

Note that l[start:stop] contains the elements with indices i such that start <= i < stop (i ranging from start to stop-1). Therefore, l[start:stop] has (stop - start) elements.

Slicing syntax: l[start:stop:stride]
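
A short sketch of the three slice parameters. The list name colors is introduced here only for illustration.

```python
colors = ['red', 'blue', 'green', 'black', 'white']

every_other = colors[::2]       # start and stop omitted, stride of 2
from_index_one = colors[1::2]   # start at index 1, then every second element
first_three = colors[:3]        # stop only; same as colors[0:3]
backwards = colors[::-1]        # a negative stride walks the list in reverse

print(every_other, from_index_one, first_three, backwards)
```

All three parameters are optional; an omitted start or stop means "from the beginning" or "to the end".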

Lists are mutable objects and can be modified:


In [ ]:
column_names[0] = 'LOOK AT ME!'
column_names

The elements of a list may have different types:


In [ ]:
l = [3, -200, 'hello']
l

Python offers a large set of functions to modify lists, or query them. Here are a few examples; for more details, see https://docs.python.org/tutorial/datastructures.html#more-on-lists

Add and remove elements:


In [ ]:
L = ['red', 'blue', 'green', 'black', 'white']
L.append('pink')
L

In [ ]:
L.pop() # removes and returns the last item

In [ ]:
L

Add a list to the end of a list with extend()


In [ ]:
L.extend(['pink', 'purple']) # extend L, in-place

In [ ]:
L

In [ ]:
L = L[:-2]
L

Two ways to reverse a list:


In [ ]:
r = L[::-1]
r

In [ ]:
r.reverse() # in-place
r

Concatenate lists:


In [ ]:
r + L

Sort:


In [ ]:
sorted(r) # new object

In [ ]:
r

In [ ]:
r.sort() #in-place
r

Exercise

We used sorted() in our CSV example a few times. That, along with list indexes, helped us get the lowest and highest value. Here's what it looked like:

highest_value = sorted(column_values)[-1]
lowest_value = sorted(column_values)[0]

Try creating your own list of unsorted items. Can you replicate the highest and lowest value expressions above? What happens if your list is not made up of numbers?


In [ ]:

Methods: The notation r.method() (e.g. r.append(3) and L.pop()) is our first example of object-oriented programming (OOP). Being a list, the object r has methods, which are functions attached to the object and called using the notation r.methodname(). We will talk about functions later in this tutorial.

When you're using Jupyter, to see all the different methods available to a variable, type a period after the variable name and hit the tab key.


In [ ]:

Strings

We've already seen strings a few times. Python supports many different string syntaxes (single, double or triple quotes):


In [ ]:
s = 'Hello, how are you?'
s = "Hi, what's up"
# tripling the quotes allows the string to span more than one line
s = '''Hello,
how are you'''
s = """Hi,
what's up?"""

Double quotes are useful when the string itself contains a single quote (apostrophe). With single quotes as delimiters, the following raises a SyntaxError:


In [ ]:
s = 'Hi, what's up?'

The newline character is \n, and the tab character is \t.

Strings are collections like lists. Hence they can be indexed and sliced, using the same syntax and rules.

Indexing strings:


In [ ]:
a = "hello"
a[0]

In [ ]:
a[1]

In [ ]:
a[-1]

Accents and special characters can also be handled in strings because, since Python 3, the string type stores Unicode text by default, and source code is read as UTF-8 by default. (For a lot more on Unicode, character encoding, and how it relates to Python, see https://docs.python.org/3/howto/unicode.html.)

A string is an immutable object and it is not possible to modify its contents. If you want to modify a string, you'll create a new string from the original one (or use a method that returns a new string).


In [ ]:
a = "hello, world!"
a[2] = 'z'

In [ ]:
a.replace('l', 'z', 1)

In [ ]:
a.replace('l', 'z')

Strings have many useful methods, such as a.replace as seen above. Remember the a. object-oriented notation and use tab completion or help(str) to search for new methods.

Exercise

We used a few string methods in the CSV example at the start. See them in this chunk of code?

if len(value) < 1 or value.isspace():
    column_datatypes.append('string')
elif value.isalpha():
    column_datatypes.append('string')
elif '.' in value or value.isdigit():
    column_datatypes.append('numeric')

Now you try. Create a new variable and assign a string to it. Then, try a few of Python's string methods to see how you can return different versions of your string, or test whether it has certain characteristics.


In [ ]:

String formatting:

We also saw string formatting in the CSV example, when we printed some of the reporting:

print("\t{0} ({1} of {2} unique)".format(column_datatypes[i], unique_value_count, len(data)))


In [ ]:
'An integer: {0} ; a float: {1} ; another string: {2} '.format(1, 0.1, 'string')

In [ ]:
i = 102
filename = 'processing_of_dataset_{0}.txt'.format(i)
filename
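
The braces can also carry a format specification after a colon, as the {0:g} fields in the CSV script did. The sketch below uses made-up values; the f-string form at the end requires Python 3.6 or later.

```python
average = 23.456789   # made-up values for illustration
count = 7

# Format specifications go after a colon inside the braces.
two_decimals = 'average: {0:.2f}'.format(average)
padded = 'count: {0:03d}'.format(count)

# Fields can also be named instead of numbered.
named = '{name} has {n} rows'.format(name='cars.csv', n=count)

# f-strings (Python 3.6 and later) evaluate expressions in place.
f_string = f'average: {average:.2f} over {count} rows'

print(two_decimals, padded, named, f_string)
```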

Dictionaries

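A dictionary is basically an efficient table that maps keys to values. Items are looked up by key rather than by position (since Python 3.7, a dictionary also remembers the order in which keys were inserted):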


In [ ]:
tel = {'emmanuelle': 5752, 'sebastian': 5578}
tel['francis'] = 5915
tel

In [ ]:
tel['sebastian']

In [ ]:
tel.keys()

In [ ]:
tel.values()

In [ ]:
'francis' in tel

It can be used to conveniently store and retrieve values associated with a name (a string for a date, a name, etc.). See https://docs.python.org/tutorial/datastructures.html#dictionaries for more information.

A dictionary can have keys (resp. values) with different types:


In [ ]:
d = {'a':1, 'b':2, 3:'hello'}
d
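
A few more everyday dictionary operations, sketched with the same phone-book data as above. The changed number is invented for the example.

```python
tel = {'emmanuelle': 5752, 'sebastian': 5578}

# Indexing a missing key raises KeyError; get() returns a default instead.
missing = tel.get('francis')           # None when no default is given
with_default = tel.get('francis', 0)

# Add, update and delete entries.
tel['francis'] = 5915
tel['sebastian'] = 5579                # invented new number
del tel['emmanuelle']

names = list(tel.keys())
print(missing, with_default, names, len(tel))
```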

The assignment operator

The Python language reference says:

Assignment statements are used to (re)bind names to values and to modify attributes or items of mutable objects.

In short, it works as follows (simple assignment):

  1. an expression on the right hand side is evaluated, the corresponding object is created/obtained
  2. a name on the left hand side is assigned, or bound, to the r.h.s. object

Things to note:

  • a single object can have several names bound to it:

In [ ]:
a = [1, 2, 3]
b = a
a

In [ ]:
b

In [ ]:
a is b

In [ ]:
b[1] = "hi!"
a
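
If the shared binding is not what you want, make a copy. The sketch below contrasts an alias with a shallow copy; the names alias and shallow_copy are chosen here for illustration.

```python
a = [1, 2, 3]

alias = a              # a second name for the same list object
shallow_copy = a[:]    # a full slice builds a new list; list(a) or a.copy() also work

alias[1] = 'hi!'       # visible through both a and alias
shallow_copy[0] = 'changed only in the copy'

print(a, alias, shallow_copy)
print(a is alias, a is shallow_copy)
```

Note that the copy is shallow: nested lists inside it would still be shared with the original.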

Control Flow

Controls the order in which the code is executed.

Conditional Expressions

if <THING>
    Evaluates to False:
        * any number equal to zero (0, 0.0)
        * an empty container (list, dictionary, string, ...)
        * False, None
    Evaluates to True:
        * everything else
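
The rules above can be checked directly. This is a small sketch with an arbitrary list of candidate values.

```python
candidates = [0, 0.0, '', [], {}, None, False, 1, -2.5, 'text', [0], {'a': 1}]

# Each value is tested exactly as an if statement would test it.
falsy = [c for c in candidates if not c]
truthy = [c for c in candidates if c]

print('Treated as False:', falsy)
print('Treated as True: ', truthy)
```

Note that [0] counts as True: the list is not empty, even though the value inside it is zero.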

a == b Tests equality:
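
A brief sketch of equality tests, contrasted with the identity test (is) seen in the assignment section. The variable names are illustrative.

```python
x = [1, 2, 3]
y = [1, 2, 3]

same_value = (x == y)           # the lists have equal contents
same_object = (x is y)          # but they are two distinct objects

int_equals_float = (1 == 1.0)   # numbers compare by value across types
not_equal = (x != [3, 2, 1])    # order matters for lists

print(same_value, same_object, int_equals_float, not_equal)
```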

if/elif/else


In [ ]:
if 2**2 == 4:
    print('Obviously')

a in b: for any collection b, checks whether b contains a:


In [ ]:
b = [1,2,3]
2 in b

In [ ]:
5 in b

Blocks are delimited by indentation

Type the following lines in your Python interpreter, and be careful to respect the indentation depth. The Jupyter Notebook automatically increases the indentation depth after a colon (:); to decrease the indentation depth, go four spaces to the left with the Backspace key. Press the Enter key twice to leave the logical block.


In [ ]:
a = 10

if a == 1:
    print(1)
elif a == 2:
    print(2)
else:
    print("A lot")

for/range

Iterating with an index:


In [ ]:
for i in range(4):
    print(i)

But most often, it is more readable to iterate over values:


In [ ]:
for word in ('cool', 'powerful', 'readable'):
    print('Python is %s ' % word)

Advanced iteration

Iterate over any sequence

You can iterate over any sequence (string, list, keys in a dictionary, lines in a file, ...):


In [ ]:
vowels = 'aeiouy'

for i in 'powerful':
    if i in vowels:
        print(i)

In [ ]:
message = "Hello how are you?"
message.split() # returns a list

In [ ]:
for word in message.split():
    print(word)

Few languages (in particular, languages for scientific computing) allow looping over anything other than integers/indices.

With Python it is possible to loop exactly over the objects of interest without bothering with indices you often don’t care about. This feature can often be used to make code more readable.

It is not safe to modify the sequence you are iterating over.
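
A sketch of what can go wrong, and two common ways around it. The list of numbers is arbitrary.

```python
numbers = [1, 2, 2, 3, 4, 4, 5]

# Risky: removing items from the list being looped over makes the loop skip elements.
risky = list(numbers)
for n in risky:
    if n % 2 == 0:
        risky.remove(n)

# Safer: build a new list ...
odds = [n for n in numbers if n % 2 != 0]

# ... or loop over a copy while changing the original.
cleaned = list(numbers)
for n in cleaned[:]:
    if n % 2 == 0:
        cleaned.remove(n)

print(risky, odds, cleaned)
```

In the risky version some even numbers survive, because each removal shifts the remaining items under the loop's position.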

Keeping track of enumeration number

A common task is to iterate over a sequence while keeping track of the item number.

  • You could use a while loop with a counter, or a for loop over a range of indices:

In [ ]:
words = ['cool', 'powerful', 'readable']
for i in range(0, len(words)):
    print(i, words[i])

  • But, Python provides a built-in function - enumerate - for this:

In [ ]:
for index, item in enumerate(words):
    print(index, item)

When looping over a dictionary use .items():


In [ ]:
d = {'a': 1, 'b':1.2, 'c':"hi"}

for key, val in sorted(d.items()):
    print('Key: %s has value: %s ' % (key, val))

Since Python 3.7 a dictionary keeps its keys in insertion order, which is not necessarily sorted order; here we use sorted(), which sorts the items by key.

Exercise

Countdown to blast off

Write code that uses a loop to print a count down from 10 to 1, followed by printing the string "blast off!". There's more than one way to do this, so figure out what works for you, drawing on things we've already learned.


In [ ]:

List Comprehensions


In [ ]:
[i**2 for i in range(4)]

Same as:


In [ ]:
l = []
for i in range(4):
    l.append(i**2)
l

Now that you've done the countdown exercise above, consider how you could have used a list comprehension in a solution:


In [ ]:
[10 - i for i in range(10)]
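
A comprehension can also filter with an if clause. The sketch below shows the loop and comprehension forms side by side, plus the related dictionary comprehension; the names are chosen for illustration.

```python
# Squares of the even numbers below 10, written as a loop ...
squares_loop = []
for i in range(10):
    if i % 2 == 0:
        squares_loop.append(i**2)

# ... and as a list comprehension with a condition.
squares_comp = [i**2 for i in range(10) if i % 2 == 0]

# The same syntax with braces and key: value pairs builds a dictionary.
square_of = {i: i**2 for i in range(4)}

print(squares_loop, squares_comp, square_of)
```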

Defining Functions

Function definition


In [ ]:
def test():
    print('in test function')
    
test()

Function blocks must be indented, like other control flow blocks.

Return statement

Functions can optionally return values:


In [ ]:
def disk_area(radius):
    return 3.14 * radius * radius

disk_area(1.5)

By default, functions return None.

Note the syntax to define a function:

  • the def keyword;
  • is followed by the function’s name, then
  • the arguments of the function are given between parentheses followed by a colon.
  • the function body;
  • and return object for optionally returning values.

Parameters

Mandatory parameters (positional arguments):


In [ ]:
def double_it(x):
    return x * 2

double_it(3)

In [ ]:
double_it()

Optional parameters (keyword or named arguments)


In [ ]:
def double_it(x=2):
    return x * 2

double_it()

In [ ]:
double_it(3)

Keyword arguments allow you to specify default values.

Default values are evaluated when the function is defined, not when it is called. This can be problematic when using mutable types (e.g. dictionary or list) and modifying them in the function body, since the modifications will be persistent across invocations of the function.
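
A minimal sketch of that pitfall and the usual workaround. The function names append_bad and append_good are hypothetical.

```python
def append_bad(item, bucket=[]):
    """The default list is created once and shared by every call."""
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    """Use None as the default and create a fresh list inside the function."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first_bad = list(append_bad(1))    # copy the result so later calls don't alter it
second_bad = list(append_bad(2))   # the 1 from the first call is still there
first_good = append_good(1)
second_good = append_good(2)

print(first_bad, second_bad)
print(first_good, second_good)
```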

Global variables

Variables declared outside the function can be referenced within the function:


In [ ]:
# We're defining a global variable for pi, and it's actually a special kind of global 
# because we intend it to be constant (i.e. its value doesn't change). There's a convention
# of using uppercase in naming constants. See https://www.python.org/dev/peps/pep-0008/#constants
PI = 3.14159

def disk_area(radius):
    return PI * radius * radius

disk_area(1.5)

Docstrings

Documentation about what the function does and its parameters. General convention:


In [ ]:
def funcname(params):
    """Concise one-line sentence describing the function.

    Extended summary which can contain multiple paragraphs.
    """
    # function body
    pass

There's a great help feature built into Jupyter: type a question mark after any object or function to get quick access to its docstring. Try it:


In [ ]:
funcname?

Docstring guidelines: For the sake of standardization, the Docstring Conventions webpage (PEP 257) documents the semantics and conventions associated with Python docstrings.

Also, the Numpy and Scipy modules have defined a precise standard for documenting scientific functions, that you may want to follow for your own functions, with a Parameters section, an Examples section, etc. See http://projects.scipy.org/numpy/wiki/CodingStyleGuidelines#docstring-standard and http://projects.scipy.org/numpy/browser/trunk/doc/example.py#L37
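
As a rough sketch of that sectioned style, here is the earlier disk_area function with a docstring laid out in Parameters, Returns and Examples sections. The wording of the docstring is an illustration, not taken from the NumPy documentation.

```python
def disk_area(radius):
    """Return the area of a disk.

    Parameters
    ----------
    radius : float
        Radius of the disk, in any unit of length.

    Returns
    -------
    float
        Area of the disk, in the square of that unit.

    Examples
    --------
    >>> disk_area(1.0)
    3.14159
    """
    pi = 3.14159  # same approximation as the PI constant used earlier
    return pi * radius * radius


print(disk_area.__doc__)
```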

Create your own function

Use any features of Python that we've already worked on, or play with something new from the documentation. Write a function, and include a docstring explaining the function's purpose. Test your function by executing it. If your function uses parameters, try calling the function a few times with different parameters.

Define the function here:


In [ ]:

Call the function here:


In [ ]:

Importing modules

Importing objects from modules

Modules let you use code that doesn't reside within the notebook and might be part of the standard library. (more on this later)


In [ ]:
import os

os

Methods

Methods are functions attached to objects. You’ve seen these in our examples on lists, dictionaries, strings, etc...


In [ ]:
os.listdir('.')

And also:


In [ ]:
from os import listdir

listdir('.')

Using alias:


In [ ]:
import pandas as pd

Modules are thus a good way to organize code in a hierarchical way. Actually, all the data science tools we are going to use are modules:


In [ ]:
import pandas as pd

pd.Series([0,1,2,3,4,5,6,7,8,9])

Good Practices

  • Use meaningful object names
  • Indentation: no choice!

Indenting is compulsory in Python! Every command block following a colon bears an additional indentation level with respect to the previous line with a colon. One must therefore indent after def f(): or while:. At the end of such logical blocks, one decreases the indentation depth (and re-increases it if a new block is entered, etc.)

Strict respect of indentation is the price to pay for getting rid of { or ; characters that delineate logical blocks in other languages. Improper indentation leads to errors such as:

------------------------------------------------------------
IndentationError: unexpected indent (test.py, line 2)

All this indentation business can be a bit confusing in the beginning. However, with the clear indentation, and in the absence of extra characters, the resulting code is very nice to read compared to other languages.

  • Indentation depth: Inside your text editor, you may choose to indent with any positive number of spaces (1, 2, 3, 4, ...). However, it is considered good practice to indent with 4 spaces. You may configure your editor to map the Tab key to a 4-space indentation. In Python(x,y), the editor is already configured this way.
  • Style guidelines. Long lines: you should not write very long lines that span more than (e.g.) 80 characters. Long lines can be broken with the \ character:

In [ ]:
long_line = "Here is a very very long line \
that we break in two parts."

Spaces: Write well-spaced code: put whitespace after commas, around arithmetic operators, etc.:


In [ ]:
a = 1 # yes
a=1 # too cramped

A certain number of rules for writing “beautiful” code (and more importantly using the same conventions as anybody else!) are given in the PEP-8: Style Guide for Python Code.

Input and Output

We write or read strings to/from files (other types must be converted to strings). To write in a file:


In [ ]:
f = open('workfile.txt', 'w') # opens the workfile file in writing mode
type(f)

In [ ]:
f.write('This is a test \nand another test')
f.close() # always use close() after opening a file! Very important!

To read from a file:


In [ ]:
f = open('workfile.txt', 'r')
s = f.read()
print(s)
f.close()

Iterating over a file


In [ ]:
f = open('workfile.txt', 'r')
for line in f:
    print(line)
f.close()
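
Forgetting close() is easy, so a common alternative is the with statement, which closes the file automatically when the block ends, even if an error occurs inside it. This sketch reuses the workfile.txt name from above.

```python
# Write two lines; the file is closed when the with block ends.
with open('workfile.txt', 'w') as f:
    f.write('This is a test \nand another test')

# Read them back, stripping the trailing newline from each line.
with open('workfile.txt', 'r') as f:
    lines = [line.rstrip('\n') for line in f]

print(lines)
print(f.closed)   # the file object still exists, but it is closed
```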

File modes:

  • r: Read-only
  • w: Write-only
    • Note: This will create a new file or overwrite an existing file
  • a: Append to a file
  • r+: Read and Write

For more information about file modes read the documentation for the open() function. https://docs.python.org/3.5/library/functions.html#open

The Standard Library

Reference documentation for this section:

os module: operating system functionality

"A portable way of using operating system dependent functionality."

Directory and file manipulation

Get the current directory:


In [ ]:
import os
os.getcwd()

List a directory:


In [ ]:
os.listdir(os.curdir)

Make a directory:


In [ ]:
os.mkdir('junkdir')

'junkdir' in os.listdir(os.curdir)

Rename the directory:


In [ ]:
os.rename('junkdir', 'foodir')

'junkdir' in os.listdir(os.curdir)

In [ ]:
'foodir' in os.listdir(os.curdir)

In [ ]:
os.rmdir('foodir') #remove directory
'foodir' in os.listdir(os.curdir)

Delete a file:


In [ ]:
fp = open('junk.txt', 'w')
fp.close()
'junk.txt' in os.listdir(os.curdir)

In [ ]:
os.remove('junk.txt')
'junk.txt' in os.listdir(os.curdir)

glob: Pattern matching on files

The glob module provides convenient file pattern matching.

Find all files ending in .txt:


In [ ]:
import glob

glob.glob('*.txt')

Exception handling in Python

It is likely that you have raised Exceptions if you have typed all the previous commands of the tutorial. For example, you may have raised an exception if you entered a command with a typo.

Exceptions are raised by different kinds of errors arising when executing Python code. In your own code, you may also catch errors, or define custom error types. You may want to look at the descriptions of the built-in Exceptions when looking for the right exception type.

Exceptions

Exceptions are raised by errors in Python:


In [ ]:
1/0

In [ ]:
d = {1:1, 2:2}

d[3]

In [ ]:
l = [1, 2, 3]

l[4]

In [ ]:
l.foobar

Catching exceptions

try/except


In [ ]:
while True:
    try:
        x = int(input('Please enter a number: '))
        break
    except ValueError:
        print('That was no valid number. Try again...')
x

try/finally


In [ ]:
try:
    x = int(input('Please enter a number: '))
finally:
    print('Thank you for your input.')
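
The clauses can be combined. The sketch below records which clauses run; it takes the text as a parameter instead of calling input() so it runs without a prompt, and the function name to_int is hypothetical.

```python
def to_int(text):
    """Convert text to an integer, recording which clauses ran."""
    steps = []
    try:
        number = int(text)
    except ValueError:
        steps.append('except')    # runs only if int() failed
        number = None
    else:
        steps.append('else')      # runs only if no exception was raised
    finally:
        steps.append('finally')   # always runs
    return number, steps


print(to_int('42'))
print(to_int('forty-two'))
```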

Raising exceptions

  • Capturing and reraising an exception:

In [ ]:
def filter_name(name):
    try:
        name = name.encode('ascii')
    except UnicodeError as e:
        if name == 'Gaël':
            print("OK, Gaël")
        else:
            raise e
    return name

filter_name("Gaël")

In [ ]:
filter_name('Stéfan')

  • Exceptions to pass messages between parts of the code:

In [ ]:
def achilles_arrow(x):
    if abs(x - 1) < 1e-3:
        raise StopIteration
    x = 1 - (1-x)/2.
    return x

x = 0

while True:
    try:
        x = achilles_arrow(x)
    except StopIteration:
        break

x

Use exceptions to signal that certain conditions are met (e.g. StopIteration) or not met (e.g. by raising a custom error).
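
A custom error type is just a class that inherits from Exception. In this sketch the class EmptyColumnError and the function column_average are hypothetical names, loosely modelled on the CSV script.

```python
class EmptyColumnError(Exception):
    """Hypothetical error type: raised when a column has no values."""


def column_average(values):
    """Return the mean of values, or raise EmptyColumnError if there are none."""
    if not values:
        raise EmptyColumnError('cannot average an empty column')
    return sum(values) / len(values)


try:
    result = column_average([])
except EmptyColumnError as err:
    result = None
    error_message = str(err)

print(column_average([1.0, 2.0, 3.0]), result, error_message)
```

Catching the specific type lets the caller handle this one condition without hiding unrelated errors.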


In [ ]: