Python for Scientific Computing in Economics

... background material available at https://github.com/softecon/talks

Why Python?

  • general-purpose
  • widely used
  • high-level
  • readability
  • extensibility
  • active community
  • numerous interfaces
    
    In [1]:
    import this
    
    
    
    
    The Zen of Python, by Tim Peters
    
    Beautiful is better than ugly.
    Explicit is better than implicit.
    Simple is better than complex.
    Complex is better than complicated.
    Flat is better than nested.
    Sparse is better than dense.
    Readability counts.
    Special cases aren't special enough to break the rules.
    Although practicality beats purity.
    Errors should never pass silently.
    Unless explicitly silenced.
    In the face of ambiguity, refuse the temptation to guess.
    There should be one-- and preferably only one --obvious way to do it.
    Although that way may not be obvious at first unless you're Dutch.
    Now is better than never.
    Although never is often better than *right* now.
    If the implementation is hard to explain, it's a bad idea.
    If the implementation is easy to explain, it may be a good idea.
    Namespaces are one honking great idea -- let's do more of those!
    

    Why Python for Scientific Computing?

    • Python is used by computer programmers and scientists alike. Thus, tools from software engineering are readily available.

    • Python is an open-source project. This ensures that all implementation details can be critically examined. There are no licence costs and low barriers to recomputability.

    • Python can be easily linked to high-performance languages such as C and Fortran. Python is ideal for prototyping with a focus on readability, design patterns, and ease of testing.

    • Python has numerous high-quality libraries for scientific computing under active development.
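
To illustrate the point about high-performance languages, the following sketch computes the same inner product twice: once with an explicit Python loop and once with `np.dot`, which hands the work to compiled code. The helper name `dot_pure_python`, the seed, and the sample size are illustrative choices, not part of the original material. Timings depend on the machine and are therefore not shown.

```python
import numpy as np


def dot_pure_python(a, b):
    """Inner product computed with an explicit Python loop."""
    total = 0.0
    for a_i, b_i in zip(a, b):
        total += a_i * b_i
    return total


# Illustrative inputs (seed and size are arbitrary choices)
rng = np.random.RandomState(123)
a = rng.rand(10000)
b = rng.rand(10000)

loop_result = dot_pure_python(a, b)
numpy_result = np.dot(a, b)

# Both approaches should agree up to floating-point rounding
print(np.isclose(loop_result, numpy_result))
```

A common workflow is to prototype with readable loops first and move only the bottlenecks to vectorized or compiled code.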

    What do you need to get started?

  • SciPy Stack
  • Basic Example
  • Integrated Development Environment
  • Additional Resources
    First things first, here is the "Hello, World!" program in Python.

    
    
    In [2]:
    print("Hello, World!")
    
    
    
    
    Hello, World!
    

    SciPy Stack

    Most of our required tools are part of the SciPy Stack, a collection of open source software for scientific computing in Python.

  • SciPy Library
  • NumPy
  • Matplotlib
  • pandas
  • SymPy
  • IPython
  • nose
    Depending on your particular specialization, additional packages such as statsmodels might be of interest to you.
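
Before working through the example, it can help to check which of these packages are already installed. The snippet below is a minimal sketch using only the standard library. The list of import names is an assumption based on the packages mentioned above, and a package reported as missing simply needs to be installed first.

```python
import importlib.util

# Import names of the packages listed above, plus statsmodels
packages = ["scipy", "numpy", "matplotlib", "pandas", "sympy",
            "IPython", "nose", "statsmodels"]

# find_spec returns None when a top-level package cannot be found
available = {name: importlib.util.find_spec(name) is not None
             for name in packages}

for name in packages:
    status = "found" if available[name] else "missing"
    print("{:<12} {}".format(name, status))
```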

    Basic Example

    To get a feel for the language, let us work with a basic example. We will set up a simple Ordinary Least Squares (OLS) model.

    $$Y = X\beta + \epsilon$$

    We start by simulating a synthetic dataset. Then we fit a basic OLS regression and assess the quality of its prediction.

    Alternatives

    Pseudorandom Number Generation

    
    
    In [3]:
    # Import relevant libraries from the SciPy Stack
    import numpy as np
    
    # Specify parametrization
    num_agents = 1000
    num_covars = 3
    
    betas_true = np.array([0.22, 0.30, -0.1]).T
    
    # Set a seed to ensure recomputability in light of randomness
    np.random.seed(4292367295)
    
    # Sample exogenous agent characteristics from a uniform distribution in 
    # a given shape
    X = np.random.rand(num_agents, num_covars)
    
    # Sample random disturbances from a standard normal distribution and rescale
    eps = np.random.normal(scale=0.1, size=num_agents)
    
    # Construct endogenous agent characteristic
    Y = np.dot(X, betas_true) + eps
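
The role of the seed can be checked directly. In the sketch below, the hypothetical helper `draw_sample` resets the global seed before drawing, so two calls with the same seed should return identical numbers. The seed values are arbitrary.

```python
import numpy as np


def draw_sample(seed, size=5):
    """Draw uniform random numbers after resetting the global seed."""
    np.random.seed(seed)
    return np.random.rand(size)


first = draw_sample(123)
second = draw_sample(123)
other = draw_sample(456)

# Same seed, same draws; a different seed gives a different sequence
print(np.array_equal(first, second))
print(np.array_equal(first, other))
```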
    

    Statistical Analysis

    
    
    In [4]:
    # Import relevant libraries from the SciPy Stack
    import statsmodels.api as sm
    
    # Specify and fit the model
    rslt = sm.OLS(Y, X).fit()
    
    
    
    In [5]:
    # Provide some summary information
    print(rslt.summary())
    
    
    
    
                                OLS Regression Results                            
    ==============================================================================
    Dep. Variable:                      y   R-squared:                       0.835
    Model:                            OLS   Adj. R-squared:                  0.834
    Method:                 Least Squares   F-statistic:                     1682.
    Date:                Wed, 03 Feb 2016   Prob (F-statistic):               0.00
    Time:                        17:49:15   Log-Likelihood:                 864.68
    No. Observations:                1000   AIC:                            -1723.
    Df Residuals:                     997   BIC:                            -1709.
    Df Model:                           3                                         
    Covariance Type:            nonrobust                                         
    ==============================================================================
                     coef    std err          t      P>|t|      [95.0% Conf. Int.]
    ------------------------------------------------------------------------------
    x1             0.2281      0.009     24.624      0.000         0.210     0.246
    x2             0.2841      0.009     30.353      0.000         0.266     0.302
    x3            -0.0898      0.009     -9.484      0.000        -0.108    -0.071
    ==============================================================================
    Omnibus:                        7.301   Durbin-Watson:                   1.964
    Prob(Omnibus):                  0.026   Jarque-Bera (JB):                7.417
    Skew:                           0.202   Prob(JB):                       0.0245
    Kurtosis:                       2.877   Cond. No.                         3.22
    ==============================================================================
    
    Warnings:
    [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
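
The coefficients in the table can be cross-checked without statsmodels. The following sketch recreates the sample with the settings used above and solves the least-squares problem with `np.linalg.lstsq`. It also computes an uncentered R-squared, on the assumption that this is the variant statsmodels reports when the regressors contain no constant, as is the case here. The names `betas_hat`, `resid`, and `r_squared` are illustrative.

```python
import numpy as np

# Recreate the synthetic sample with the settings used above
np.random.seed(4292367295)
num_agents, num_covars = 1000, 3
betas_true = np.array([0.22, 0.30, -0.1])

X = np.random.rand(num_agents, num_covars)
eps = np.random.normal(scale=0.1, size=num_agents)
Y = np.dot(X, betas_true) + eps

# Least-squares solution of Y = X beta + eps
betas_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

# Uncentered R-squared, since there is no constant among the regressors
resid = Y - np.dot(X, betas_hat)
r_squared = 1.0 - np.sum(resid ** 2) / np.sum(Y ** 2)

print(betas_hat)
print(r_squared)
```

The estimates should lie close to `betas_true`, within a few standard errors.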
    

    Data Visualization

    
    
    In [6]:
    # Import relevant libraries from the SciPy Stack
    import matplotlib.pyplot as plt
    
    # Initialize canvas
    ax = plt.figure(figsize=(12, 8)).add_subplot(111, facecolor='white')
    
    # Plot actual and fitted values
    ax.plot(np.dot(X, rslt.params), Y, 'o', label='True')
    ax.plot(np.dot(X, rslt.params), rslt.fittedvalues, 'r--.', label="Predicted")
    
    # Set axis labels and ranges
    ax.set_xlabel(r'$X\hat{\beta}$', fontsize=20)
    ax.set_ylabel(r'$Y$', fontsize=20)
    
    # Remove first element on y-axis
    ax.yaxis.get_major_ticks()[0].set_visible(False)
    
    # Add legend
    plt.legend(loc='upper center', bbox_to_anchor=(0.50, -0.10),
        fancybox=False, frameon=False, shadow=False, ncol=2, fontsize=20)
    
    # Add title
    plt.suptitle('Synthetic Sample', fontsize=20)
    
    # Save figure
    plt.savefig('images/scatterplot.png', bbox_inches='tight', format='png')
    
    
    
    In [7]:
    from IPython.display import Image
    Image(filename='images/scatterplot.png', width=700, height=700)
    
    
    
    
    Out[7]:

    Integrated Development Environment

    PyCharm

    PyCharm is developed by the Czech company JetBrains. It is free to use for educational purposes. It is nevertheless a commercial product and therefore very well documented. Numerous resources are available to get you started.

    If you would like to check out some alternatives: (1) Spyder, (2) PyDev.

    Potential Benefits

    • Unit Testing Integration

    • Graphical Debugger

    • Version Control Integration

    • Coding Assistance

      • Code Completion

      • Syntax and Error Highlighting

      • ...

    Let us check it all out for our Basic Example.
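
As a starting point for trying out the unit testing integration, here is a minimal sketch of a test module for the Basic Example. It uses the standard library's `unittest`. The helper `simulate_sample` and the class `TestSimulation` are hypothetical names that wrap the simulation step from above and are not part of the original material.

```python
import unittest

import numpy as np


def simulate_sample(num_agents, betas, seed):
    """Simulate covariates and outcomes of a linear model (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    X = rng.rand(num_agents, len(betas))
    eps = rng.normal(scale=0.1, size=num_agents)
    Y = np.dot(X, betas) + eps
    return X, Y


class TestSimulation(unittest.TestCase):

    def setUp(self):
        # Parametrization taken from the Basic Example
        self.betas = np.array([0.22, 0.30, -0.1])

    def test_shapes(self):
        X, Y = simulate_sample(100, self.betas, seed=123)
        self.assertEqual(X.shape, (100, 3))
        self.assertEqual(Y.shape, (100,))

    def test_recomputability(self):
        # The same seed must reproduce the same sample
        _, Y_first = simulate_sample(100, self.betas, seed=123)
        _, Y_second = simulate_sample(100, self.betas, seed=123)
        np.testing.assert_array_equal(Y_first, Y_second)
```

Saved in a file such as `test_simulation.py` (a hypothetical name), the tests can be run from the command line with `python -m unittest test_simulation` or from within the IDE.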

    Graphical User Interface

    Conclusion

    Next Steps

    • Set up your machine for scientific computing with Python

      • Visit Continuum Analytics and download Anaconda for your own computer. Anaconda is a free Python distribution with all the required packages to get you started.

      • Install PyCharm. Make sure to hook it up to your Anaconda distribution (instructions).

    • Check out the additional resources to dive more into the details.

    Additional Resources

    Numerous additional lecture notes, tutorials, online courses, and books are available online.

    Contact



    Philipp Eisenhauer



    Software Engineering for Economists Initiative

    
    
    In [8]:
    # In Python 3, urlopen lives in urllib.request
    from urllib.request import urlopen
    from IPython.display import HTML
    HTML(urlopen('http://bit.ly/1K5apRH').read().decode('utf-8'))
    
    
    
    
    Out[8]: