Python is a powerful and easy to use programming language. It has a large community of developers and given its open source nature, you can find many solutions, scripts, and help all over the web. It is easy to learn and code, and faster than other high-level programming languages...and did I mention it is free because it is open source
IPython is a very powerful extension to Python that provides: Powerful interactive shells (terminal and Qt-based).
You can download and install Python and its packages for free for your computer from Python.org. While this is the official site, which offers the basic installer and you can try do add any packages you require yourself, a much easier approach, which is almost foolproof is to use Enthought Canopy or Anaconda. Both of these distributions offer academic licenses (Canopy), which allow you to use a larger set of packages.
I use Canopy on OSX, so the rest of this tutorial will assume you are using that distribution. Everything that is done should work on any distribution that has the required packages, since the Python scripts should run (in rinciple) on any of these distributions.
We will use IPython as our computing environment.
Once you have your Python distribution installed you'll be ready to start working. You have two options:
ipython
ipython qtconsole
ipython notebook
While they are all using IPython, each has its advantages and disadvantages. You should to play with them to get a better feeling of which you want to use for which purpose. In my own research I usually use a text editor (TextMate) and the ipython qtconsole
. Although after seeing the power of Ipython's notebooks (see this excellent and in-depth presentation by its creators), I might consider shifting towards the notebook. As you will see, this might prove an excellent environment to do research, homework, replicate papers, etc.
Let's start by running some simple commands at the prompt to do some simple computations.
In [ ]:
1+1-2
In [ ]:
3*2
In [ ]:
3**2
In [ ]:
-1**2
In [ ]:
3*(3-2)
In [ ]:
3*3-2
Notice that Python obeys the usual orders for operators, so exponentiation before multiplication/division, etc.
In [ ]:
1/2
Notce that this answer is wrong if $1,2\in\mathbb{R}$, but Python thinks they are integers, so it forces and integer. In order to have a more natural behavior of division we need
In [ ]:
from __future__ import division
1/2
So what else can we do? Where do we start if we are new? You can use ?
or help()
to get help.
In [ ]:
?
In [ ]:
help()
If you want information about a command, say mycommand
you can use help(mycommand)
, mycommand?
or mycommand??
to get information about how it is used or even see its code.
In [ ]:
help(sum)
In [ ]:
sum?
In [ ]:
sum??
We can print information
In [ ]:
print 'Hello World!'
We can also create variables, which can be of various types
In [ ]:
a=1
b=2
a+b
a
and b
now hold numerical values we can use for computing
In [ ]:
c=[1,2]
d=[[1,2],[3,4]]
print 'c=%s' % c
print 'd=%s' % d
Notice that we have used %s
and %
to let Python know we are passing a string to the print function.
What kind of variables are c
and d
? They look like vectors and matrices, but...
In [ ]:
print 'a*c=%s' % (a*c)
print 'b*d=%s' % (b*d)
In [ ]:
c*d
Actually, Python does not have vectors or matrices directly available. Instead it has lists, sets, arrays, etc., each with its own set of operations. We defined c
and d
as list objects
In [ ]:
type(c)
In [ ]:
type(d)
In [ ]:
type(a)
Luckily Python has a powerful package for numerical computing called Numpy.
In order to use a package in Python or IPython, say mypackage
, you need to import it, by executing
import mypackage
After executing this command, you will have access to the functions and objects defined in mypackage
. For example, if mypackage
has a function squared
that takes a real number x
and computes its square, we can use this function by calling mypackage.squared(x)
. Since the name of some packages might be too long, your can give them a nickname by importing them instead as
import mypackage as myp
so now we could compute the square of x
by calling myp.squared(x)
.
We will see various packages that will be useful to do computations, statistics, plots, etc.
IPython has a command that imports Numpy and Matplotlib (Python's main plotting package). Numpy is imported as np
and Matplotlib as plt
. One could import these by hand by executing
import numpy as np
import matplotlib as plt
but the creators of IPython have optimized the interaction between these packages by running the following command:
%pylab
In [ ]:
%pylab?
I do recommend using the --no-import-all
option in order to ensure you do not contaminate the namespace. Instead it might be best to use
%pylab --no-import-all
%matplotlib
In [ ]:
%matplotlib?
In [ ]:
%pylab --no-import-all
%matplotlib inline
In [ ]:
np?
Let us now recreate c
and d
, but as Numpy arrays instead.
In [ ]:
ca=np.array(c)
da=np.array(d)
print 'ca=%s' % ca
print 'da=%s' % da
We could have created them as matrices intead. Again how you want to cerate them depends on what you will be doing with them. See here for an explanation of the differences between Numpy arrays and matrices.
In [ ]:
cm=np.matrix(c)
dm=np.matrix(d)
print 'cm=%s' % cm
print 'dm=%s' % dm
Let's see some information about these...(this is a good moment to show tab completion...a wonderful feature of IPython, which is not avalable if Python)
In [ ]:
cm.shape
In [ ]:
ca.shape
In [ ]:
dm.
da.
Let's try again some operations on our new arrays and matrices
In [ ]:
cm*dm
In [ ]:
ca*da
In [ ]:
ca.dot(da)
We can create special matrices using Numpy's functions and classes
In [ ]:
print np.ones((3,4))
print np.zeros((2,2))
print np.eye(2)
print np.ones_like(cm)
In [ ]:
np.random
In [ ]:
x0=0
x=[x0]
[x.append(x[-1]+np.random.normal()) for i in range(500)]
plt.plot(x)
plt.title('A simple random walk')
plt.xlabel('Period')
plt.ylabel('Log Income')
plt.show()
We have used some of the functions in Python, Numpy and Matplotlib. But what if we wanted to create our own functions? It is very easy to do so in Python. There are two ways to define functions. Let's use them to define the CRRA utility function $u(c)=\frac{c^{1-\sigma}-1}{1-\sigma}$ and the production function $f(k)=Ak^\alpha$.
The first method is as follows:
In [ ]:
def u(c,sigma):
'''This function returns the value of utility when the CRRA
coefficient is sigma. I.e.
u(c,sigma)=(c**(1-sigma)-1)/(1-sigma) if sigma!=1
and
u(c,sigma)=ln(c) if sigma==1
Usage: u(c,sigma)
'''
if sigma!=1:
u=(c**(1-sigma)-1)/(1-sigma)
else:
u=np.log(c)
return u
This defined the utility function. Let's plot it for $0< c\le5$ and $\sigma\in\{0.5,1,1.5\}$
In [ ]:
c=np.linspace(0.1,5,100)
u1=u(c,.5)
u2=u(c,1)
u3=u(c,1.5)
plt.plot(c,u1,label=r'$\sigma=.5$')
plt.plot(c,u2,label=r'$\sigma=1$')
plt.plot(c,u3,label=r'$\sigma=1.5$')
plt.xlabel(r'$c_t$')
plt.ylabel(r'$u(c_t)$')
plt.title('CRRA Utility function')
plt.legend(loc=4)
plt.show()
While this is nice, it requires us to always have to put a value for the CRRA coefficient. Furthermore, we need to remember if $c$ is the first or second argument. Since we tend to use log-utilities a lot, let us change the definition of the utility function so that it has a default value for $\sigma$ equal to 1
In [ ]:
def u(c,sigma=1):
'''This function returns the value of utility when the CRRA
coefficient is sigma. I.e.
u(c,sigma)=(c**(1-sigma)-1)/(1-sigma) if sigma!=1
and
u(c,sigma)=ln(c) if sigma==1
Usage: u(c,sigma=value), where sigma=1 is the default
'''
if sigma!=1:
u=(c**(1-sigma)-1)/(1-sigma)
else:
u=np.log(c)
return u
In [ ]:
sigma1=.25
sigma3=1.25
u1=u(c,sigma=sigma1)
u2=u(c)
u3=u(c,sigma=sigma3)
plt.plot(c,u1,label=r'$\sigma='+str(sigma1)+'$')
plt.plot(c,u2,label=r'$\sigma=1$')
plt.plot(c,u3,label=r'$\sigma='+str(sigma3)+'$')
plt.xlabel(r'$c_t$')
plt.ylabel(r'$u(c_t)$')
plt.title('CRRA Utility function')
plt.legend(loc=4)
plt.show()
The second method is to use the lambda
notation, which allows you to define functions in one line or without giving the function a name.
In [ ]:
squared= lambda x: x**2
squared(2)
Let's write a script that prints "Hello World!"
In [ ]:
%%file?
In [ ]:
%%file helloworld.py
#!/usr/bin/env python
# coding=utf-8
'''
My First script in Python
Author: Me
E-mail: me@me.com
Website: http://me.com
GitHub: https://github.com/me
Date: Today
This code computes Random Walks and graphs them
'''
'''
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
'''
print 'Hello World!'
Let's run that script
In [ ]:
%run helloworld.py
In [ ]:
import time
time.sleep(60*5)
print "It's time"
In [ ]:
import pandas as pd
You can use Pandas to download data from various sources, including FRED and the World Bank. See here. As an exercise, go to the World Bank WDI data page and download as an excel file the "Mobile cellular subscriptions (per 100 people)" and as a csv file the "GDP per capita, PPP (constant 2011 international $)" series, the name of the series are IT.CEL.SETS.P2 and NY.GDP.PCAP.PP.KD. Save the file as wdimobile.csv and wdigdppc.xlsx and import it into python by issuing the following command:
In [ ]:
dfincome=pd.read_csv('./WDI/wdigdppc.csv', skiprows=2)
dfmobile=pd.read_excel('./WDI/wdimobile.xls', skiprows=2)
Now you should have a data frame with the data you downloaded.
Let's see what they look like...
In [ ]:
dfincome
In [ ]:
dfmobile
Notice that these data frames look like spreadsheets or data tables. Columns have names that can be used to call the data, e.g.
In [ ]:
dfmobile.columns
In [ ]:
dfmobile['Country Code']
We can use columns to compute additional information. For example the growth rate of income per capite between 2000 and 2005 is
In [ ]:
growth=np.log(dfincome['2005'])-np.log(dfincome['2000'])
In [ ]:
growth
Notice we do not know which country the growth rate belongs to. We could create a column in dfincome that holds the growth value or we can change the index of the data frame so that it keeps the country code (this is useful!)
In [ ]:
dfincome['growth']=np.log(dfincome['2005'])-np.log(dfincome['2000'])
In [ ]:
dfincome[['Country Code','growth']]
In [ ]:
dfincome.set_index('Country Code', inplace=True)
In [ ]:
growth=np.log(dfincome['2005'])-np.log(dfincome['2000'])
growth.name='growth'
growth
Let's delete the growth column from dfincome
In [ ]:
dfincome.drop('growth',axis=1, inplace=True)
Let's compute the growth of cell phone subscription for the period 2000-2005
In [ ]:
dfmobile.set_index('Country Code',inplace=True)
growthmobile=np.log(dfmobile['2005'])-np.log(dfmobile['2000'])
In [ ]:
growthmobile.name='mobile'
In [ ]:
growthmobile
Let's see the descriptive stats for each growth process
In [ ]:
growth.describe()
In [ ]:
growthmobile.describe()
Notice that growthmobile has infinite mean, i.e. some country started with zero coverage and now has positive one. Let's see for which ccountries that is the case and change those observations to NaN.
In [ ]:
growthmobile.ix[growthmobile==np.inf]
In [ ]:
growthmobile.ix[growthmobile==np.inf]=np.nan
In [ ]:
growthmobile.describe()
Let's compute the correlation between between both growth rates
In [ ]:
growth.corr(growthmobile)
Let's run an OLS regression between both growth rates, but first let's merge the data together. First we merge the data using the pd.merge command, which allows us to merge two data frames.
In [ ]:
mydata=pd.merge(growth.reset_index(),growthmobile.reset_index())
mydata
Second, we use the pd.concat command that concatenates series or data frames into data frames.
In [ ]:
grates=pd.concat([growth,growthmobile],axis=1)
grates
Now let's import the statsmodels module to run the regression.
In [ ]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
from IPython.display import Latex
In [ ]:
mod = sm.OLS(mydata['growth'],sm.add_constant(mydata['mobile']), missing='drop').fit()
mod.summary2()
In [ ]:
mod = smf.ols(formula='growth ~ mobile', data=mydata[['growth','mobile']], missing='drop').fit()
mod.summary2()
In [ ]:
mod = smf.ols(formula='growth ~ mobile', data=grates, missing='drop').fit()
mod.summary2()
In [ ]:
mysummary=mod.summary2()
Latex(mysummary.as_latex())
Using Pandas and Statsmodels write a Python script that:
Some additional useful tools:
In [ ]:
%%latex
\begin{eqnarray}
\nabla \times \vec{\mathbf{B}} -\, \frac1c\, \frac{\partial\vec{\mathbf{E}}}{\partial t} & = \frac{4\pi}{c}\vec{\mathbf{j}} \\
\nabla \cdot \vec{\mathbf{E}} & = 4 \pi \rho \\
\nabla \times \vec{\mathbf{E}}\, +\, \frac1c\, \frac{\partial\vec{\mathbf{B}}}{\partial t} & = \vec{\mathbf{0}} \\
\nabla \cdot \vec{\mathbf{B}} & = 0
\end{eqnarray}
Some examples of plots...
In [ ]:
dfincome[[str(i) for i in range(1990,2013)]].loc['USA'].plot()
In [ ]:
plt.scatter(grates.growth,grates.mobile)
In [ ]:
from pandas.io import wb
wb.search('cell*')
Notebook written by Ömer Özak for his First Year Ph.D. students in Economics at Southern Methodist University. Feel free to distribute or enlarge or contribute.