PyShop Session 1


This session introduces Python as an open source, high level programming language, as well as a community. By the end of the session, you should be familiar with the following necessary (or at least useful) components for being a participating member of the Python community:

  • The Python interpreter.
  • Anaconda Python.
  • Text editors.
  • Stack Exchange.
  • GitHub.

Additionally, this session will introduce Python style and syntax, data types, modules and packages, the standard workflow, objects and object oriented programming, documentation for collaboration, as well as some basic examples. By the end of this session you should feel comfortable setting up and working in your new Python environment.

What is Python?

Python is a programming language. More specifically, Python is an interpreted, object oriented programming language. This means that Python is not compiled ahead of time like C, but fed into the Python interpreter from the command line (or from a script). This makes the Python workflow much smoother and easier to understand. Beyond technical issues, Python's syntax is very user friendly, which has made it a popular first choice for programming courses. In fact, many economics departments already offer Python courses and MIT recently switched all of their introductory computer science courses to Python.

How is Python different from the other tools we use?

The programs I'm referring to are R, Stata, and Matlab. These are not general purpose programming languages. Stata and Matlab are proprietary software that cost money, while R is an open source software package for statistics. R gets around the problem of having to pay for your calculations, but its syntax is not intuitive and it is geared solely towards statisticians. Matlab can compete with Python on parallelism, but it cannot compare on several other metrics: currentness, price, strings, portability, classes, and functions. Overall, the beauty of Python is that it was designed by computer programmers for their own needs and the code is constantly being updated and improved.

Who uses Python and what are its main features?

Who uses it:

  • Yahoo maps
  • Google!
  • Tons of computer games
  • Disney animators
  • NASA's Johnson Space Center and Los Alamos National Laboratory's Theoretical Physics Division (#PhysicsEnvy)
  • The CIA website

But what about in academia? Just googling "'school name' economics python" for the top ten programs according to U.S. News, the following programs either have high performance computing interests using Python or courses in Python:

  • UPenn
  • NYU Stern
  • Harvard Institute for Quantitative Social Science
  • Chicago
  • Kellogg
  • Columbia

On top of this, Python is fast becoming the introductory programming language for teaching. For instance, MIT recently switched all of their intro computer programming courses to Python.

What are its main features?

(this list is borrowed from 'A Byte of Python' by Swaroop C H)

  • Simplicity. Very streamlined syntax makes it easy to read.
  • Free!
  • High level. You don't need to manage memory usage; all of that is done under the hood.
  • Portable. Works on practically any operating system.
  • Interpreted. No compiling. When you run a script, the interpreter translates it into bytecode, which is then translated into your computer's native language and run. This actually makes Python slightly slower than, for example, C++, but much easier to use. However, you can use modules to compile your programs to C if you really care all that much!
  • Object oriented. Python uses method attributes (we'll go over that later) to make object oriented programming easy, but it also supports procedure oriented programming.
  • Extensible. Python plays nice with C.
  • Libraries!!!! So many libraries, you won't know where to start.
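
To make the 'Interpreted' point above a bit more concrete, here is a small sketch that uses the standard library's dis module to list the bytecode instructions the interpreter generates for a function. The function add_one is made up for this illustration, and the exact instruction names differ between Python versions, so treat the printed list as indicative only.

```python
import dis


def add_one(x):
    """Hypothetical example function, used only for illustration."""
    return x + 1


# Collect the names of the bytecode instructions generated for add_one.
opnames = [instruction.opname for instruction in dis.Bytecode(add_one)]
print(opnames)
```

You never have to look at bytecode to use Python; this is only meant to show that the translation step really does happen under the hood.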

How to get set up

The Python workflow that I use is a text editor and an interpreter, since I'm a Linux programmer. However, you may be interested in a Matlab-like IDE. These can be really messy and complex, so beware, but if you want to try one I've heard Sublime Text 2 is nice, or just IDLE (the IDE included with Python) will work. Since I can't speak to this, though, I will talk about IPython and text editors. For step by step set up, see Pre-Workshop Exercises.

The Python interpreter and Anaconda Python

The first thing you need is Python! Most store bought computers have it, but we need some additional features, so I suggest you download Anaconda Python. It comes with 330+ packages, it's free, it works on any operating system, it's light, and it includes the interactive IPython shell and the IPython Notebook. It is made by Continuum Analytics, who also run Wakari, a web based Python environment (but the Sciences Po firewall blocks it, so good luck using it here!).

Text editors

We are not going to use an Integrated Development Environment (IDE) because they are too complicated and have too many options. There are a ton of options for text editors, but I like GitHub's Atom text editor. It is easy to use, built on web technologies, and stylish! However, it can be a little slow...

GitHub

Knowing how to use GitHub is really a must for anyone hoping to collaborate on an open source project or just on a large programming project. It is a repository hosting service based on the Git distributed revision control system. Essentially, it is a free service that helps you to do version control and software updating without ever having to do any work! The GitHub workflow goes as follows:

  • Create a branch of a project. Others can do the same while you work on the code, but your branch will be a snapshot from the time you created it, along with any changes you made.
  • Add commits. You make changes to the branch, and each saved set of changes is, in Git jargon, a 'commit'.
  • Open a pull request. This starts a discussion with the community about your changes. Think of a 'pull request' as asking them to pull your branch back into the main flow.
  • Deploy. When things are looking good, you deploy the code to test for bugs.
  • Merge. When things are running smoothly, your changes will be merged back into the main project. GitHub takes care of all of the tedious parts; you just write the code and see if it works!

Stack Exchange

Ok, this is just in case you don't already know it, but Stack Exchange will be your best friend in the coming weeks. There is a massive community using this site to discuss Python, so create an account and start asking questions!

IPython and the IPython Notebook

There are several technical reasons why you would want to use the IPython shell, including tab completion, help 'magic', debugging, and optimization, but these are all beyond the scope of the course. However, you should look into these in the IPython documentation. The reason WE are going to use IPython is its ease of use and the IPython (or apparently now they call it Jupyter) Notebook.

To use the IPython Notebook simply type ipython notebook (or jupyter notebook on newer installations), at which point the computer will open up a notebook server where you can see and edit your own IPython notebooks. These are probably not the most efficient way to work in general, but they are a great teaching tool.

Ok, let's run a command! The prototypical first program is the Hello World! program. Here it is:


In [ ]:
print('Hello World!')

That's it! It is that easy. In fact, you can save this single line of code in a file ending in .py and then run it and you would get the same thing. Running a script can be done using the python command, but IPython is a better way to work.

You can open the IPython interpreter by typing ipython, then run your program by typing run my_program.py, as long as it is in your present working directory.

Ok, let's see a more complex example. First, we are going to set up inline plotting:


In [ ]:
%matplotlib inline

Now we are going to write a little program (I would save this as a file, hence the docstring, but I put it here for clarity) to plot a utility function:


In [ ]:
"""
Origin: Plotting a utility function.
Filename: example_utility.py
Author: Tyler Abbot
Last modified: 8 September, 2015
"""
import numpy as np
import matplotlib.pyplot as plt

# Define the input variable
c = np.linspace(0.01, 10.0, 100)

# Calculate utility over the given space
U = np.log(c)

# Plot the function
plt.plot(c, U)
plt.xlabel('Consumption')
plt.ylabel('Utility')
plt.title('An Example Utility Function')
plt.show()

This example illustrates some of the basic points of Python programming and syntax.

  • Docstring. A docstring is a 'string literal' that informs the reader (not the computer!) of the usage for a program, function, module, etc. It is good practice to include information about your program at the beginning so that others, and you for that matter, can figure out what it is for later. I got this habit from Tom Sargent's Quant Econ course and have used it religiously (although I don't always remember to change the modification date...).

  • Import statements. One of the great things about Python is that you can pick and choose what functionality you would like. This is where import statements come in. At the beginning of your program you tell Python exactly what modules and packages you would like. It is possible to import everything a module defines directly into your program (from module import *), but this is frowned upon. You should be specific in order to keep your program light and readable. We will come back to the syntax later, but here we are importing the numpy package and matplotlib's pyplot module and defining more compact names for them.

  • Comments. In Python you can comment text using the #, or a multiline comment can be surrounded by three quotation marks: """This is commented. """. There are certain comment style standards laid out in PEP 8:

    1. Comments should be complete sentences.
    2. Use two spaces after a period.
    3. Comments should be in English (sorry French folk).
    4. Use block comments instead of inline comments (or at least use inline comments sparingly).
    5. Write docstrings!

    In general, save the multiline comment for docstrings.

  • Variable definition. In Python you use the '=' sign for variable definitions. Variable type is assumed based on the definition, but you can define the variable more precisely if you'd like. In general, Python variables are local in scope, that is they are defined for use within a function or program. However, you should be careful to use different variable names in different functions, because if Python cannot find a variable in the local 'namespace', it will look for it in the global namespace.

  • Function call. When you call a function in Python, you simply pass it the required arguments. Things can get more complicated, but we will discuss this later. When you import a module, the module itself is an object. This is sort of a philosophical point, but pretty much everything in Python, functions, modules, variables, etc., is an object. Given this, you can reference methods of those objects. Here, when we do plt.plot(foo), we are actually telling the computer to go into matplotlib and find the plot function, then to run that function on the variable foo. This is the idea behind object oriented programming. I know, this was a very short explanation, but either I wave some hands or I write a book.

  • Methods. Throughout the example you'll notice the syntax of object.method. This is a very pythonic way of programming, as everything in Python is an object, be it a variable you have defined or a module you import. A "method" is a function defined within the class to which the object belongs. We're getting ahead of ourselves, but it suffices to know that this syntax refers to a method.
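
Here is a minimal sketch of the namespace and object.method points from the list above. The names rate, discount, and discount_local are hypothetical and chosen only for this illustration; the only thing imported is the standard math module.

```python
import math

rate = 0.05  # Defined in the global namespace.


def discount(value):
    # 'rate' is not defined locally, so Python falls back to the global one.
    return value / (1.0 + rate)


def discount_local(value):
    rate = 0.10  # This local name shadows the global 'rate' inside the function.
    return value / (1.0 + rate)


print(discount(105.0))
print(discount_local(110.0))
print(rate)  # The global value is unchanged by discount_local.

# Modules, functions, and values are all objects with attributes and methods.
print(type(math))
print(discount.__name__)
print('utility'.upper())
```

The first function silently picks up the global rate, which is exactly the kind of surprise that careful variable naming helps you avoid.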

Beyond these basic parts to a Python program, there is a lot of focus on syntax and style in the community. This is why after 25 years the language is still concise and clear. To get an idea of the philosophy behind Python style, run the following cell:


In [ ]:
import this

Modules and packages

A module is a way for Python to save definitions for later use. A module file simply contains the definitions of functions, classes, etc., and you can write your own if you want to. Modules can also contain executable statements that are run the first time the module is used, i.e., when you import it. It is important to note that when you import a module, the system will search in your PYTHONPATH, a system variable that you may have to define yourself (I think on the newest version of Anaconda this is not a problem, but I'm not sure...).

A package is a larger container of modules. For instance, Matplotlib is a package and pyplot is a module. This is simply a vocabulary issue. You'll just import the stuff you need!
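
As a rough sketch of the search path and of the module/package vocabulary, the following uses only the standard library, where json is a package and math is a single module. Checking for a __path__ attribute is one common way to tell the two apart, and sys.path is the list of directories Python actually searches (entries from PYTHONPATH, if you have set it, show up there).

```python
import sys
import json           # A package: a directory containing several modules.
import json.decoder   # One of the modules inside the json package.
import math           # A single module.

# The directories Python searches when you write an import statement.
search_path = sys.path
print(type(search_path), len(search_path))

# Packages expose a __path__ attribute; plain modules do not.
print(hasattr(json, '__path__'))
print(hasattr(math, '__path__'))

# Either way, you use what you imported through the dot syntax.
print(math.sqrt(16.0))
```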

Here is a list of some of the most useful packages for economists (or anybody, really):

  • Numpy. This is probably the most used package in Python, and if not it's definitely the most used by scientists. It includes mainly the numpy array object and related linear algebra operators. Along with that it has some random number and Fourier transform capabilities.

  • Scipy. The 'Scipy Stack' actually incorporates most of the packages in this list, as well as some others. If you are talking about the 'Scipy Library' then you are referring to the namesake package that includes numerical routines. The part you'll use most is probably the numerical integration and optimization, but it also has great interpolation, sparse matrices, statistics, and linear algebra capabilities.

  • Matplotlib. An object oriented plotting utility. This package can be very easy to use, but also offers amazing customizability which, when combined with third party packages, can match any graphics software on the market.

  • Pandas. A data analysis library. Contains the DataFrame object (which is what everyone in this course is probably interested in), as well as multidimensional panel objects for panel data (I'm just learning about these! so neat!), and series objects. It also contains some statistical functions, but the most useful things are IO and data munging. You can use Pandas to read in large amounts of data in almost any format and write to almost anything, including HDF5 which allows you to work with 'bigdata'.

  • Statsmodels. A module for statistics. It seems to me to be a lot of econometric methods, and given that the founder is an econometrician it is probably pretty focused on stuff you will be interested in! Alongside a lot of the functionality, you get some nice statistical plotting as well.

  • Requests. This is probably less useful, but if you ever want to automate data retrieval you'll need to use http. Requests makes this easy. Their slogan is 'HTTP for Humans', which is pretty self explanatory.

  • Beautiful Soup. Again this is not as useful. This is a simple and easy to use module for parsing, navigating, searching, and modifying a document. This is particularly useful if you need to do any webscraping.

  • Scrapy. This module helps you to create web crawlers that you can use to gather data on the web. It even generates the file tree for you. I actually find this kind of terrifying, but you might find it useful.

  • Sympy. This is a symbolic algebra package that can do simplification, polynomial expansions, symbolic calculus, equation solving, combinatorics, and tons of other neat stuff!

Data structures

Python's data structures are, like everything else, object oriented. In this sense, each one has special methods that can be used to manipulate it. Here's a list of some of the basic data structures:

  • List
  • String
  • Tuple
  • Set
  • Dictionary
  • Numpy Array
  • Pandas DataFrame

Most of these data types will come up in this course, but we can't cover everything. You should take some time to read the documentation on these so that you are at least familiar with them.

Indexing in Python begins at zero and you can reference an item in a list, string, or tuple by its index. You can also reference a numpy array using indices, and in higher dimensions these are much easier to deal with than list indices. Dictionaries and DataFrames use keys to keep track of their contents. We will discuss DataFrames in more depth when we talk about Pandas. For now, we will talk just about the native list, string, tuple, and dictionary types.
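
A quick sketch of indexing for the native types mentioned above. The variable names and the contents of the ages dictionary are invented for this example.

```python
letters = ['a', 'b', 'c', 'd']   # List
word = 'abcd'                    # String
pair = ('a', 'b')                # Tuple
ages = {'Joe': 23, 'Jane': 25}   # Dictionary (hypothetical data)

first_letter = letters[0]   # The first item has index 0.
last_letter = letters[-1]   # Negative indices count from the end.
first_two = word[:2]        # A slice excludes its end index.
second = pair[1]
joe_age = ages['Joe']       # Dictionaries are referenced by key, not position.

print(first_letter, last_letter, first_two, second, joe_age)
```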

Lists

A list has some useful methods that you should look up in the documentation, such as pop, append, and sort. Two of the most useful techniques with lists, though, are list comprehensions and using lists as iterators.
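
For reference, here is a small sketch of the three list methods named above, using made-up grades. Note that all three modify the list in place.

```python
grades = [12.0, 8.0, 15.0]   # Hypothetical data

grades.append(10.0)   # Add an item to the end of the list.
last = grades.pop()   # Remove and return the last item.
grades.sort()         # Sort in place; the method itself returns None.

print(last)
print(grades)
```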

List comprehensions

This is a concise and simple way to create a list. Instead of writing:


In [ ]:
x = []
for i in range(0, 50):
    x.append(i)
    
print(x)

we can directly fill the object x using what is called a list comprehension:


In [ ]:
x = [i for i in range(0, 50)]

print(x)

A list comprehension is a succinct way to write a for loop that creates a list. You essentially place all of the syntax within the list definition, between the []'s.
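
A comprehension can also transform and filter items as it goes. As a small sketch (the variable names are my own), here are the squares of the even numbers below 10, written both ways so you can compare:

```python
# First with an explicit loop...
squares_loop = []
for i in range(10):
    if i % 2 == 0:
        squares_loop.append(i ** 2)

# ...and then as a comprehension with the same expression and condition.
squares = [i ** 2 for i in range(10) if i % 2 == 0]

print(squares)
print(squares == squares_loop)
```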

List as iterator

If you have a list of items, say variable names, and you would like to iterate over these variables, you can use them as an iterator. For example:


In [ ]:
names = ["var_1", "var_2", "var_3"]
for variable in names:
    print(variable)

This is the simplest (for me) form of what's called an iterable, beyond simply a list of numbers. Many objects in Python are "iterable": lists, strings, arrays, even text files. If you are interested in how these objects work in the context of iterators and a more general object known as "generators", I encourage you to check out the Python practice book, although it isn't quite necessary for our work.
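
If you are curious what a for loop does behind the scenes, this sketch uses the built-in iter, next, and enumerate functions on the same list of names. It is only meant as an illustration of the idea, not something you will need to write often.

```python
names = ["var_1", "var_2", "var_3"]

# iter() returns an iterator; next() pulls one item at a time from it.
iterator = iter(names)
first = next(iterator)
second = next(iterator)
print(first, second)

# enumerate() pairs each item with its index.
indexed = list(enumerate(names))
print(indexed)

# Strings are iterable too: looping gives one character at a time.
characters = [character for character in "abc"]
print(characters)
```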

Loops

When programming we think of two different kinds of loops: indefinite and definite loops. A "definite loop" is one where the number of iterations is known in advance; the definition of the loop specifies the number of iterations. An "indefinite loop" is one where the number of iterations is unknown in advance; the definition of the loop specifies a condition.

Here are two examples:


In [ ]:
# Definite loop
for i in range(0, 10):
    print(i)

# Indefinite loop
import random
x = 0.0
while x < 1:
    x += random.random()
    print(x)

In Python, definite loops seem to be the norm, while in C indefinite loops are used more often. I encourage you to stick to definite loops, as indefinite ones can be unruly and a runaway loop can crash your computer quite easily.
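
If you do need an indefinite loop, one simple way to guard against a runaway is to cap the number of iterations. This is a sketch of that idea applied to the example above; max_iterations and its value are my own assumption, and the seed is fixed only so that the run is repeatable.

```python
import random

random.seed(0)          # Fixed seed so the run is repeatable.
max_iterations = 1000   # Hypothetical safety cap.

x = 0.0
iterations = 0
# Stop when the condition is met OR when the cap is reached.
while x < 1 and iterations < max_iterations:
    x += random.random()
    iterations += 1

print(x, iterations)
```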

Strings

A "string" is a sequence of letters and other characters. Strings in Python behave in a similar way to lists, with each character treated as an entry:


In [ ]:
# Define a string using quotes
x = 'Hello!  I am a string!'
print(x)

# The type of quote is irrelevant
x = "Hello!  I am a string!"
print(x)

# Reference strings in the same way as a list,
# but indices refer to position in the string
print(x[0])
print(x[:5])

# Strings support arithmetic operations similar to lists
print(x + x)

Strings are different from lists, however, in the set of methods that are associated with them:


In [ ]:
# Change the case
print(x.upper())
print(x.lower())

# Find the index of a substring
print(x.find('I am'))
print(x[x.find('I am'):x.find('I am') + 4])

Strings offer a ton of special methods, so if you are interested in them, check out the official documentation.
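
To give a taste of those methods, here is a short sketch of a few that come up often when cleaning data: strip, split, join, replace, and startswith. The example string is a padded variation on the one above.

```python
x = '  Hello!  I am a string!  '

stripped = x.strip()                  # Remove leading and trailing whitespace.
words = stripped.split()              # Split on runs of whitespace.
joined = '_'.join(words)              # Join a list of strings with a separator.
replaced = stripped.replace('string', 'str')
starts = stripped.startswith('Hello')

print(stripped)
print(words)
print(joined)
print(replaced)
print(starts)
```

None of these methods change x itself; each one returns a new string (or list), because strings are immutable.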

Tuple

A tuple is an immutable datatype. That is, once it is defined it cannot be changed. Tuples are often used for defining things like global parameters. In the case where you need to do a lot of analysis, defining a variable you do not want to change as a tuple will protect you from making a mistake.


In [ ]:
# Defining a tuple with or without parentheses
tup = 'a', 'b'
tup = ('a', 'b')

# Tuples can contain different data
tup = 'a', 2

# Trying to modify a tuple will cause an error (a TypeError)
tup[0] = 1
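
If you want the cell to keep running after the failed assignment, you can catch the error. The sketch below does that and also shows tuple unpacking; the names modified, letter, and number are my own.

```python
tup = ('a', 2)

# Item assignment on a tuple raises a TypeError, which we catch here.
try:
    tup[0] = 1
    modified = True
except TypeError as error:
    modified = False
    print('TypeError:', error)

# Tuples can be unpacked into separate names.
letter, number = tup
print(letter, number)
```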

Dictionaries

Dictionaries are, frankly, a mystery to me. There is a very nice discussion of their uses on the 'Think Like a Computer Scientist' open book project page. As an economist you probably won't need them very often, but you will have to interact with them. It's for this reason that I include them here, but if you want to learn more, check out the documentation.


In [ ]:
# Defining a dictionary
students_grades = {"Joe": [10., 15., 12.],
                   "Jane": [12., 16., 14.],
                   "Nick": [8., 6., 6.]}
print(students_grades)

# Retrieving information from the dictionary
print(students_grades['Joe'])

# Looping over the information
students_averages = {}
for student, grades in students_grades.items():
    students_averages[student] = sum(grades) / len(grades)

print(students_averages)
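
A few more dictionary operations you are likely to run into, sketched on the same data: membership tests, the get method, adding an entry, and a dictionary comprehension. The student "Anna" and her grades are invented for this example.

```python
students_grades = {"Joe": [10., 15., 12.],
                   "Jane": [12., 16., 14.],
                   "Nick": [8., 6., 6.]}

# Membership tests look at the keys.
has_joe = "Joe" in students_grades

# get() returns a default instead of raising a KeyError for a missing key.
missing = students_grades.get("Anna", [])

# Add a new entry by assigning to a new key.
students_grades["Anna"] = [14., 13., 15.]

# A dictionary comprehension builds the averages in one expression.
averages = {name: sum(grades) / len(grades)
            for name, grades in students_grades.items()}

print(has_joe, missing)
print(sorted(averages))
```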

Conclusion

Although this set of notes covered the data types only briefly, I hope it has given you at least a bit of understanding about the native species in the Python ecosystem. In the upcoming courses we will discuss other classes of objects with different features, but they all work around these basic datatypes.

For more in depth study, see the exercises and homework for outside readings.