In [ ]:

    
import requests
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

Class 5: Pandas

Pandas is a Python package for data analysis. Documentation and examples: http://pandas.pydata.org/

Pandas basics

To learn how Pandas works, we'll use a dataset containing long-run averages of inflation, money growth, and real GDP growth. The dataset is available here: http://www.briancjenkins.com/data/quantitytheory/csv/qtyTheoryData.csv. Recall that the quantity theory of money implies the following linear relationship between the long-run rate of money growth, the long-run rate of inflation, and the long-run rate of real GDP growth in a country:

\begin{align} \text{inflation} & = \text{money growth} - \text{real GDP growth}. \end{align}

Generally, we treat real GDP growth and money supply growth as exogenous, so this is a theory about the determination of inflation.

Now, we could download the data manually, but we might as well use Python to do it. The requests module is good for this.



In [ ]:

    
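# Use the requests module to download money growth and inflation data
url = 'http://www.briancjenkins.com/data/quantitytheory/csv/qtyTheoryData.csv'
r = requests.get(url, verify=True)

# Stop here with an error if the download failed (e.g., a 404 response)
r.raise_for_status()

# Write the downloaded bytes to a local csv file
with open('qtyTheoryData.csv', 'wb') as newFile:
    newFile.write(r.content)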

Import Pandas



In [ ]:

    
import pandas as pd

Import data from a csv file

Pandas has a function called read_csv() for reading data from a csv file into a Pandas DataFrame object. Let's import the quantity theory data into a variable called df.
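
As a minimal sketch of how read_csv() behaves, the example below reads a small, made-up csv string instead of the downloaded file. The country names, the numbers, and the column name 'country' are assumptions chosen for illustration; they are not taken from the quantity theory dataset.

```python
import io
import pandas as pd

# Made-up stand-in for the contents of a csv file (illustrative values only)
csv_text = """country,inflation,money growth,gdp growth
Atlantis,0.05,0.08,0.03
Borduria,0.12,0.15,0.02
Carpania,0.02,0.06,0.04
"""

# index_col tells read_csv() which column to use as the row index
toy = pd.read_csv(io.StringIO(csv_text), index_col='country')

print(toy)
```

For a file on disk, the first argument would be the filename instead of the StringIO object, and index_col=0 would select the first column by position.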



In [ ]:

    
# Import quantity theory data into a Pandas DataFrame called df with country names as the index.



In [ ]:

    
# Print the first 5 rows



In [ ]:

    
# Print the last 5 rows



In [ ]:

    
# Print the type of df

Properties of `DataFrame` objects

Like entries in a spreadsheet file, elements in a DataFrame object have row and column coordinates. Column names are usually strings.
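
The following sketch shows column access on a small, made-up DataFrame. The values and country names are hypothetical, and the column names are assumed to match the ones mentioned in this notebook's comments.

```python
import pandas as pd

# Made-up data for illustration only
toy = pd.DataFrame(
    {'inflation': [0.05, 0.12, 0.02],
     'money growth': [0.08, 0.15, 0.06],
     'gdp growth': [0.03, 0.02, 0.04]},
    index=['Atlantis', 'Borduria', 'Carpania'])

# The column names are stored in the columns attribute
print(toy.columns)

# Selecting one column by name returns a Series
money_toy = toy['money growth']
print(type(money_toy))

# Selecting with a list of names returns a DataFrame
subset = toy[['inflation', 'money growth']]
print(subset.head())
```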



In [ ]:

    
# Print the columns of df



In [ ]:

    
# Create a new variable called money equal to the 'money growth' column and print



In [ ]:

    
# Print the type of the variable money



In [ ]:

    
# Print the first 5 rows of just the inflation, money growth, and gdp growth columns

The set of row coordinates is the index. Index values can be strings, numbers, or dates.
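
Here is a short sketch of row access through the index, again on a made-up DataFrame (hypothetical names and values). It uses .loc for label-based selection and .iloc for position-based selection.

```python
import pandas as pd

# Made-up data for illustration only
toy = pd.DataFrame(
    {'inflation': [0.05, 0.12, 0.02],
     'money growth': [0.08, 0.15, 0.06]},
    index=['Atlantis', 'Borduria', 'Carpania'])

# The row labels are stored in the index attribute
print(toy.index)

# Select a whole row by its label; the result is a Series
row = toy.loc['Borduria']
print(row)

# Two ways to reach a single element: row label and column name together,
# or the column first and then the row label
value_a = toy.loc['Borduria', 'inflation']
value_b = toy['inflation']['Borduria']
print(value_a, value_b)

# Select a row by integer position instead of by label
first_toy = toy.iloc[0]
print(first_toy)
```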



In [ ]:

    
# Print the index of df



In [ ]:

    
# Create a new variable called usa equal to the 'United States' row and print



In [ ]:

    
# Print the inflation rate of the United States



In [ ]:

    
# Print the inflation rate of the United States in a different way



In [ ]:

    
# Create a new variable called first equal to the first row in the DataFrame and print

Create new columns by name.
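
A minimal sketch on made-up data (hypothetical values and the hypothetical column name 'gap'): assigning to a column name that does not exist yet adds that column, and arithmetic between columns is applied row by row.

```python
import pandas as pd

# Made-up data for illustration only
toy = pd.DataFrame(
    {'inflation': [0.05, 0.12, 0.02],
     'money growth': [0.08, 0.15, 0.06]},
    index=['Atlantis', 'Borduria', 'Carpania'])

# Assigning to a new name creates the column; the subtraction is element-wise
toy['gap'] = toy['money growth'] - toy['inflation']

print(toy['gap'])
```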



In [ ]:

    
# Create a new column called 'difference' equal to the money growth column minus the inflation column and print the column

Methods

A Pandas DataFrame has many useful methods. For example, describe() returns summary statistics for the numeric columns.
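
As a sketch, calling describe() on a small made-up DataFrame (hypothetical values) returns another DataFrame whose rows are the statistics and whose columns match the numeric columns of the original.

```python
import pandas as pd

# Made-up data for illustration only
toy = pd.DataFrame(
    {'inflation': [0.05, 0.12, 0.02],
     'money growth': [0.08, 0.15, 0.06]},
    index=['Atlantis', 'Borduria', 'Carpania'])

# describe() reports count, mean, std, min, quartiles, and max for each column
summary = toy.describe()
print(summary)

# Individual statistics can be pulled out with .loc
print(summary.loc['mean', 'inflation'])
```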



In [ ]:

    
# Print the summary statistics for df

While the describe() method provides some good summary information, NumPy also has useful functions for computing statistics. For example, the NumPy function corrcoef() computes the correlation matrix for two series; the off-diagonal elements of that matrix are the coefficient of correlation between them.
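
A small sketch of corrcoef() on made-up data (hypothetical values). Note that the function returns a 2x2 array, so the correlation between the two series is read from the [0, 1] element.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only
toy = pd.DataFrame(
    {'inflation': [0.05, 0.12, 0.02],
     'money growth': [0.08, 0.15, 0.06]},
    index=['Atlantis', 'Borduria', 'Carpania'])

# corrcoef() returns a 2x2 correlation matrix with ones on the diagonal
corr_matrix = np.corrcoef(toy['inflation'], toy['money growth'])
print(corr_matrix)

# The off-diagonal element is the correlation between the two series
rho = corr_matrix[0, 1]
print(rho)
```

The Pandas Series method corr() gives the same number directly, for example toy['inflation'].corr(toy['money growth']).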



In [ ]:

    
# Print the correlation coefficient for inflation and money growth


# Print the correlation coefficient for inflation and real GDP growth


# Print the correlation coefficient for money growth and real GDP growth

sort_values() returns a copy of the original DataFrame sorted by the values in the given column. The optional argument ascending is set to True by default, which puts the lowest values first; set it to False if you want the highest values first.
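
The sketch below sorts a made-up DataFrame (hypothetical names and values) both ways and uses head() to keep only the top rows of each result. The original DataFrame is left unchanged.

```python
import pandas as pd

# Made-up data for illustration only
toy = pd.DataFrame(
    {'inflation': [0.05, 0.12, 0.02],
     'money growth': [0.08, 0.15, 0.06]},
    index=['Atlantis', 'Borduria', 'Carpania'])

# Default (ascending=True): lowest inflation first
lowest = toy.sort_values('inflation').head(2)
print(lowest)

# ascending=False: highest inflation first
highest = toy.sort_values('inflation', ascending=False).head(2)
print(highest)

# The original row order is unchanged because sort_values() returns a copy
print(toy)
```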



In [ ]:

    
# Print rows for the countries with the 10 lowest inflation rates


# Print rows for the countries with the 10 lowest money growth rates



In [ ]:

    
# Print rows for the countries with the 10 highest inflation rates


# Print rows for the countries with the 10 highest money growth rates

sort_index() returns a copy of the original DataFrame sorted by the index. The optional argument ascending is set to True by default, which sorts from lowest to highest (A to Z for strings); set it to False to reverse the order.
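
A brief sketch on a made-up DataFrame whose index labels (hypothetical names) start out of alphabetical order:

```python
import pandas as pd

# Made-up data for illustration only; the index is deliberately unsorted
toy = pd.DataFrame(
    {'inflation': [0.12, 0.02, 0.05]},
    index=['Borduria', 'Carpania', 'Atlantis'])

# Default (ascending=True): A to Z
a_to_z = toy.sort_index()
print(a_to_z)

# ascending=False: Z to A
z_to_a = toy.sort_index(ascending=False)
print(z_to_a)
```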



In [ ]:

    
# Print df with the index in descending alphabetical order

Quick plotting example

Construct a graph that visually confirms the quantity theory of money by making a scatter plot with average money growth on the horizontal axis and average inflation on the vertical axis. Add a 45 degree line and labels and a title.
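
One possible layout for such a figure, sketched with made-up data (hypothetical values, so the pattern shown says nothing about the real dataset). The styling choices, such as the dashed gray line and the transparency, are assumptions and not requirements.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration only
toy = pd.DataFrame(
    {'inflation': [0.05, 0.12, 0.02, 0.30],
     'money growth': [0.08, 0.15, 0.06, 0.33]},
    index=['Atlantis', 'Borduria', 'Carpania', 'Dalmasca'])

fig, ax = plt.subplots()

# Money growth on the horizontal axis, inflation on the vertical axis
ax.scatter(toy['money growth'], toy['inflation'], alpha=0.6)

# 45-degree line from the origin to the largest value in either column
upper = max(toy['money growth'].max(), toy['inflation'].max())
ax.plot([0, upper], [0, upper], linestyle='--', color='gray', label='45-degree line')

ax.set_xlabel('Money growth')
ax.set_ylabel('Inflation')
ax.set_title('Inflation and money growth (made-up data)')
ax.legend()
ax.grid(True)
```

Points lying close to the 45-degree line are those where inflation and money growth are nearly equal.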



In [ ]:

    
# Construct a well-labeled scatter plot of inflation against money growth