In [ ]:
import requests
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
Pandas is a Python package for data analysis. Documentation and examples: http://pandas.pydata.org/
To learn how Pandas works, we'll make use of a dataset containing long-run averages of inflation, money growth, and real GDP. The dataset is available here: http://www.briancjenkins.com/data/quantitytheory/csv/qtyTheoryData.csv. Recall that the quantity theory of money implies the following linear relationship between the long-run rate of money growth, the long-run rate of inflation, and the long-run rate of real GDP growth in a country:
\begin{align} \text{inflation} & = \text{money growth} - \text{real GDP growth}. \end{align}
Generally, we treat real GDP growth and money supply growth as exogenous, so this is a theory about the determination of inflation.
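As a quick arithmetic check of the relationship above, the sketch below plugs in hypothetical long-run averages. The numbers and variable names are made up for illustration and are not taken from the dataset.

```python
# Hypothetical long-run average growth rates (illustrative values only,
# not from the quantity theory dataset)
money_growth = 0.10      # 10 percent money growth
real_gdp_growth = 0.03   # 3 percent real GDP growth

# Quantity theory: inflation = money growth - real GDP growth
implied_inflation = money_growth - real_gdp_growth

print(round(implied_inflation, 4))
```

With these assumed inputs, the theory implies long-run inflation of about 7 percent.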
Now, we could download the data manually, but we might as well use Python to do it. The requests module is good for this.
In [ ]:
# Use the requests module to download money growth and inflation data
url = 'http://www.briancjenkins.com/data/quantitytheory/csv/qtyTheoryData.csv'
r = requests.get(url, verify=True)
with open('qtyTheoryData.csv', 'wb') as newFile:
    newFile.write(r.content)
In [ ]:
import pandas as pd
In [ ]:
# Import quantity theory data into a Pandas DataFrame called df with country names as the index.
In [ ]:
# Print the first 5 rows
In [ ]:
# Print the last 5 rows
In [ ]:
# Print the type of df
In [ ]:
# Print the columns of df
In [ ]:
# Create a new variable called money equal to the 'money growth' column and print
In [ ]:
# Print the type of the variable money
In [ ]:
# Print the first 5 rows of just the inflation, money growth, and gdp growth columns
The set of row coordinates is the index. Index values can be strings, numbers, or dates.
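To make the idea of an index concrete, here is a minimal sketch using a toy DataFrame. The values and the row labels 'A', 'B', and 'C' are placeholders invented for illustration, and the column names are assumed to match those in the quantity theory data.

```python
import pandas as pd

# Toy data with made-up values; 'A', 'B', 'C' are placeholder row labels
toy = pd.DataFrame(
    {'inflation': [0.02, 0.10, 0.05],
     'money growth': [0.05, 0.14, 0.08]},
    index=['A', 'B', 'C'],
)

# The index holds the row labels
print(toy.index)

# Select a whole row by label; the result is a Series
row_b = toy.loc['B']
print(row_b)

# Select a single value by row label and column name
value = toy.loc['B', 'inflation']

# Select a row by integer position instead of label
first = toy.iloc[0]
```

Here `.loc` selects by label and `.iloc` selects by integer position, so `toy.iloc[0]` and `toy.loc['A']` refer to the same row in this toy example.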
In [ ]:
# Print the index of df
In [ ]:
# Create a new variable called usa equal to the 'United States' row and print
In [ ]:
# Print the inflation rate of the United States
In [ ]:
# Print the inflation rate of the United States in a different way
In [ ]:
# Create a new variable called first equal to the first row in the DataFrame and print
Create new columns by name.
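A minimal sketch of creating a column by name, again on a toy DataFrame with made-up values and placeholder row labels (the column names are assumed to mirror the dataset's):

```python
import pandas as pd

# Toy data with made-up values; 'A', 'B', 'C' are placeholder row labels
toy = pd.DataFrame(
    {'inflation': [0.02, 0.10, 0.05],
     'money growth': [0.05, 0.14, 0.08]},
    index=['A', 'B', 'C'],
)

# Assigning to a column name that does not exist yet creates the column.
# The subtraction is applied row by row.
toy['difference'] = toy['money growth'] - toy['inflation']

print(toy['difference'])
```

The new column appears alongside the existing ones and shares the DataFrame's index.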
In [ ]:
# Create a new column called 'difference' equal to the money growth column minus the inflation column and print the column
In [ ]:
# Print the summary statistics for df
While Pandas' describe function provides some good summary information, NumPy also has some useful functions for computing statistics. For example, the NumPy function corrcoef() computes the coefficient of correlation for two series.
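One detail worth knowing is that `np.corrcoef()` returns a 2x2 correlation matrix rather than a single number, so the coefficient for the pair sits in the off-diagonal entries. The sketch below shows this on a toy DataFrame with made-up values; the column names are assumed to match the dataset's.

```python
import numpy as np
import pandas as pd

# Toy data with made-up values; 'A', 'B', 'C' are placeholder row labels
toy = pd.DataFrame(
    {'inflation': [0.02, 0.10, 0.05],
     'money growth': [0.05, 0.14, 0.08]},
    index=['A', 'B', 'C'],
)

# corrcoef returns a 2x2 matrix: ones on the diagonal, and the
# correlation between the two series in the off-diagonal entries
corr_matrix = np.corrcoef(toy['inflation'], toy['money growth'])
r = corr_matrix[0, 1]
print(r)

# Pandas offers an equivalent method on Series (Pearson by default)
r_pandas = toy['inflation'].corr(toy['money growth'])
```

Both approaches should give the same Pearson coefficient up to floating-point error.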
In [ ]:
# Print the correlation coefficient for inflation and money growth
# Print the correlation coefficient for inflation and real GDP growth
# Print the correlation coefficient for money growth and real GDP growth
sort_values() returns a copy of the original DataFrame sorted by the given column. The optional argument ascending is True by default, so the lowest values come first; set it to False to put the highest values first.
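The following sketch illustrates `sort_values()` in both directions on a toy DataFrame. The values and row labels are made up for illustration, and the column names are assumed to match the dataset's.

```python
import pandas as pd

# Toy data with made-up values; 'A', 'B', 'C' are placeholder row labels
toy = pd.DataFrame(
    {'inflation': [0.02, 0.10, 0.05],
     'money growth': [0.05, 0.14, 0.08]},
    index=['A', 'B', 'C'],
)

# Default ascending=True: lowest inflation first (row order A, C, B)
lowest_first = toy.sort_values('inflation')

# ascending=False: highest inflation first (row order B, C, A)
highest_first = toy.sort_values('inflation', ascending=False)

# Combine with head() to keep only the top rows of the sorted copy
two_lowest = lowest_first.head(2)
print(two_lowest)
```

Because a sorted copy is returned, `toy` itself keeps its original row order unless the result is assigned back to it.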
In [ ]:
# Print rows for the countries with the 10 lowest inflation rates
# Print rows for the countries with the 10 lowest money growth rates
In [ ]:
# Print rows for the countries with the 10 highest inflation rates
# Print rows for the countries with the 10 highest money growth rates
sort_index() returns a copy of the original DataFrame sorted by the index. The optional argument ascending is True by default, which gives alphabetical order for string labels; set it to False to reverse the order.
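A short sketch of `sort_index()` on a toy DataFrame whose rows start out of alphabetical order. The values and the labels 'A', 'B', and 'C' are placeholders invented for illustration.

```python
import pandas as pd

# Toy data with made-up values; rows deliberately out of alphabetical order
toy = pd.DataFrame(
    {'inflation': [0.10, 0.05, 0.02]},
    index=['B', 'C', 'A'],
)

# Default ascending=True: alphabetical order of the labels (A, B, C)
alphabetical = toy.sort_index()

# ascending=False: reverse alphabetical order (C, B, A)
reverse_alphabetical = toy.sort_index(ascending=False)

print(reverse_alphabetical)
```

Each row keeps its own values when the index is sorted; only the order of the rows changes.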
In [ ]:
# Print df with the index in descending alphabetical order
In [ ]:
# Construct a well-labeled scatter plot of inflation against money growth