PyShop: An Introduction to Python Workshop for Economists

Syllabus

Course Description

The main goal of this course is to introduce budding and established economists to the world of Python. It is truly that: a world unto itself. It would be impossible to cover all of the topics in this course effectively in 10 hours. However, computer programming is a hands-on endeavor, a craft that must be practiced. To that end, this course will try to introduce you to certain aspects of the Python ecosystem that might be useful for an economist, while providing resources for home study and direction towards other resources. At the same time, it will attempt to illuminate some of the more frustrating aspects and to speed up the incorporation of open source software into your workflow. In summary, this course will help you to set up your Python environment, introduce the basic tenets of Python programming style and syntax, and touch on some more advanced topics in the hopes of giving you a sense of what may be possible in Python.

Course Format

The course will be made up of lectures and homework. The lectures will cover basic theory and ideas, while the homework will put into practice the ideas from the course. All of the course materials are provided in the form of IPython notebooks (except for the slides, which are PDF) and are fully available on the web with solutions to exercises. You are strongly encouraged to do the homework in order to learn Python programming. Without practical application, you will not retain much from the course.

Course Schedule

The course will be organized into five two-hour sessions.

Introduction to Python and Open Source Software. This session introduces Python as an open source, high level programming language, as well as a community. By the end of the session, you should be familiar with the following necessary (or at least useful) components for being a participating member of the Python community:
- The Python interpreter.
- Anaconda Python.
- Text editors.
- Stack Exchange.
- GitHub.
- Running Python code.
Additionally, this session will introduce Python style and syntax, data types, modules and packages, the standard workflow, objects and object oriented programming, documentation for collaboration, as well as some basic examples. By the end of this session you should feel comfortable setting up and working in your new Python environment.
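
As a rough illustration of how objects, data types, and documentation fit together, the sketch below defines a small documented class. The class name `CobbDouglas`, its parameters, and the input values are hypothetical and chosen only for illustration; they are not taken from the course materials.

```python
class CobbDouglas:
    """Cobb-Douglas production function y = A * k**alpha * l**(1 - alpha).

    Parameters
    ----------
    alpha : float
        Capital share, between 0 and 1.
    tfp : float
        Total factor productivity (A).
    """

    def __init__(self, alpha=0.33, tfp=1.0):
        self.alpha = alpha
        self.tfp = tfp

    def output(self, capital, labor):
        """Return output for the given capital and labor inputs."""
        return self.tfp * capital ** self.alpha * labor ** (1 - self.alpha)


# Create an instance (an object) and call one of its methods
firm = CobbDouglas(alpha=0.5)
print(type(firm.alpha))       # attributes are ordinary Python objects
print(firm.output(4.0, 9.0))  # method call using the stored parameters
print(CobbDouglas.__doc__)    # the docstring is available at run time
```

The docstrings are what `help(CobbDouglas)` displays, which is one reason documentation matters for collaboration.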

Introduction to the Most Used Modules. This session focuses on some of the most useful modules, including NumPy, SciPy, Matplotlib, and Pandas. All of the features discussed will be introduced using basic examples. The main topics covered will be the following:
- NumPy arrays and their syntax.
- Linear algebra.
- Unconstrained optimization.
- Root finding.
- DataFrames.
- Simple plots.
This session is simply meant to introduce the main features of the big four modules and give students time to become more familiar with simple Python programs. By the end of this session you should be comfortable enough with the basic Python interface to know what modules you need to import to do basic calculations, data entry, and data visualization.
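
To give a flavor of three of the topics listed above (linear algebra, root finding, and unconstrained optimization), here is a minimal sketch using NumPy and SciPy. The matrix, the `excess_demand` function, and the `loss` function are made-up examples, not part of the course notebooks.

```python
import numpy as np
from scipy import optimize

# Linear algebra: solve the system A x = b
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Root finding: solve f(p) = 0 on an interval where f changes sign
def excess_demand(p):
    # Hypothetical excess demand function, equal to zero at p = 2
    return 4.0 / p - p

p_star = optimize.brentq(excess_demand, 0.5, 5.0)

# Unconstrained optimization: minimize a simple quadratic
def loss(z):
    # Hypothetical objective with its minimum at (1, -2)
    return (z[0] - 1.0) ** 2 + (z[1] + 2.0) ** 2

result = optimize.minimize(loss, x0=np.zeros(2))

print(x, p_star, result.x)
```

Note the pattern: each task is an import plus one function call, so knowing which module holds which tool is most of the work.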

Advanced Topics in Numerical Methods and Array Manipulation. This session delves more deeply into NumPy and SciPy. We will study the advantages of NumPy's built-in functions and vectorization operations through a practical example: finite elements/projection methods in a simple RBC model. By the end of the session you should be comfortable converting serial/loop style code to vectorized code.
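
As a minimal sketch of what converting loop-style code to vectorized code looks like, the example below evaluates a CRRA utility function over a grid of consumption values twice: once with an explicit Python loop and once as a single NumPy array expression. The utility function, the grid, and the parameter `gamma` are illustrative assumptions and are not taken from the course's RBC example.

```python
import numpy as np

def crra_loop(c, gamma=2.0):
    """CRRA utility computed one element at a time (serial/loop style)."""
    out = np.empty(len(c))
    for i in range(len(c)):
        out[i] = (c[i] ** (1.0 - gamma) - 1.0) / (1.0 - gamma)
    return out

def crra_vectorized(c, gamma=2.0):
    """Same calculation written as a single array expression."""
    return (c ** (1.0 - gamma) - 1.0) / (1.0 - gamma)

# Hypothetical consumption grid, chosen only for illustration
c_grid = np.linspace(0.5, 5.0, 1000)

u_loop = crra_loop(c_grid)
u_vec = crra_vectorized(c_grid)

# Check that both versions agree up to floating-point error
print(np.allclose(u_loop, u_vec))
```

The vectorized version pushes the loop into NumPy's compiled code, which is typically where the speed advantage comes from.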

Advanced Topics in Statistics and Data Visualization. This session looks at some of the finer points of data manipulation in Pandas, some econometric methods available from StatsModels, as well as how to fine-tune your Matplotlib output. We'll see how to automate data retrieval using Python to scrape a webpage for form data and post forms to a site. In Pandas we will focus on data IO and data types, merging, grouping, reshaping, and time series. In StatsModels we will look at some examples of OLS and plotting, but these will only be brief examples meant to introduce the library. Finally, in Matplotlib we will look at the matplotlibrc file and discuss axes objects and object-oriented plotting (including subplots). We will also look at Seaborn as an alternative plotting package. As with the previous session, these topics only scratch the surface, but should be enough to get you started. By the end you should at least know where to look to find statistical methods for most economics applications, know how to deal with data sets, and be familiar with the finer points of plotting in Python.
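
The Pandas operations named above (merge, grouping, reshaping) can be sketched as follows. The two small tables and all of their values are invented for illustration only.

```python
import pandas as pd

# Hypothetical data, invented for illustration
gdp = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2000, 2001, 2000, 2001],
    "gdp": [100.0, 110.0, 200.0, 190.0],
})
population = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "year": [2000, 2001, 2000, 2001],
    "pop": [10.0, 10.0, 20.0, 19.0],
})

# Merge on the shared keys, then build a per-capita series
panel = gdp.merge(population, on=["country", "year"], how="inner")
panel["gdp_pc"] = panel["gdp"] / panel["pop"]

# Grouping: average GDP per capita by country
mean_by_country = panel.groupby("country")["gdp_pc"].mean()

# Reshaping: one row per year, one column per country
wide = panel.pivot(index="year", columns="country", values="gdp_pc")

print(mean_by_country)
print(wide)
```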

CPU and GPU Parallelization: How to Supercompute. This session will introduce the principles of parallelization and compiler-based speedups. We will begin with a discussion of the basics of how a computer carries out computation, including a discussion of hard disks, RAM, CPUs, and cores. We'll then discuss why Python is so slow and look at some compilation-based solutions to slow code. Next, we'll see some simple parallelization algorithms and how to break down a problem to be parallelized. Then we'll look at multiprocessing for CPU-based parallelization. Finally, we'll cover how to use a graphics card to do GPU computation using PyCUDA, which can offer speedups of 300 to 3000 times, and some publicly available resources. By the end of this session you should understand the logic behind parallelization and have a general idea of what resources are available in Python. You should also have an understanding of how to use PyCUDA to run code on a GPU.
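
As a minimal sketch of CPU-based parallelization with the standard library's `multiprocessing` module, the example below splits a set of independent simulations across worker processes. The task `simulate_mean`, the number of draws, and the number of workers are assumptions made for illustration; the session's own examples may differ.

```python
import random
from multiprocessing import Pool

def simulate_mean(seed, n_draws=10_000):
    """One independent task: the average of n_draws standard normal draws."""
    rng = random.Random(seed)  # separate generator per task, so tasks share no state
    return sum(rng.gauss(0.0, 1.0) for _ in range(n_draws)) / n_draws

def run_serial(seeds):
    """Run every task one after another in the current process."""
    return [simulate_mean(s) for s in seeds]

def run_parallel(seeds, n_workers=4):
    """Run the same tasks across a pool of worker processes."""
    with Pool(processes=n_workers) as pool:
        # Pool.map distributes the seeds across workers and keeps the input order
        return pool.map(simulate_mean, seeds)

if __name__ == "__main__":
    seeds = list(range(8))
    # Same seeds in both runs, so the two result lists can be compared directly
    print(run_serial(seeds) == run_parallel(seeds))
```

The problem parallelizes cleanly because each task depends only on its own seed. The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the script.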