In [ ]:
import requests
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline

Class 6: More Pandas

Objectives:

  1. Analyze some cross-country GDP per capita data
  2. Create a new DataFrame
  3. Export a DataFrame to a csv file

Exercise: Cross-country income per capita statistics

Download a file called crossCountryIncomePerCapita.csv by visiting http://www.briancjenkins.com/data/international/ and following the link for: "GDP per capita (constant US 2005 PPP $, levels)".
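Once the file is on disk, most of this exercise comes down to a handful of pandas calls. The following is a minimal sketch on made-up data, not a solution: the country names, the numbers, and the variable names `toy_df` and `toy_1960` are all invented for illustration. It assumes the first column holds the years; in the real file the index labels may be strings or dates rather than integers, so the label passed to `.loc` may need to change.

```python
import io
import pandas as pd

# Hypothetical stand-in for the downloaded file: years in the first column,
# one column per country. All numbers are invented.
csv_text = """year,Country A,Country B,Country C
1960,1000.0,2500.0,400.0
1961,1050.0,2550.0,410.0
1962,1100.0,2600.0,405.0
"""

# index_col=0 makes the first column (the years) the index
toy_df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(toy_df.head())          # first rows (at most five)
print(len(toy_df.columns))    # number of countries
print(len(toy_df.index))      # number of years

# Select one year's row as a Series indexed by country name
toy_1960 = toy_df.loc[1960]
print(toy_1960.mean(), toy_1960.std())

# Largest and smallest values; the country names are in the index
print(toy_1960.nlargest(2).index.tolist())
print(toy_1960.nsmallest(2).index.tolist())
```

Using `sort_values()` followed by `head()` or `tail()` would work equally well for the ranking steps; `nlargest` and `nsmallest` are just shorter.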


In [ ]:
# Use the requests module to download cross-country GDP per capita data
url = ''
filename = ''
r = requests.get(url, verify=True)

# Write the downloaded bytes to a local file
with open(filename, 'wb') as newFile:
    newFile.write(r.content)

In [ ]:
# Import the cross-country GDP data into a DataFrame called incomeDf with index_col=0


# Print the first five rows of incomeDf

In [ ]:
# Print the columns of incomeDf

In [ ]:
# Print the number of countries represented in incomeDf

In [ ]:
# Print the index of incomeDf

In [ ]:
# Print the number of years of data in incomeDf

In [ ]:
# Print the first five rows of the 'United States - USA' column of incomeDf

In [ ]:
# Print the last five rows of the 'United States - USA' column of incomeDf

In [ ]:
# Create a plot of income per capita from 1960 to 2011 for the US

In [ ]:
# Create a plot of income per capita from 1960 to 2011 for another country in the dataset

In [ ]:
# Create a new variable called income60 equal to the 1960 row from incomeDf


# Print the index of income60

In [ ]:
# Print the average world income per capita in 1960


# Print the standard deviation in world income per capita in 1960

In [ ]:
# Print the names of the five countries with the highest incomes per capita in 1960

In [ ]:
# Print the names of the five countries with the lowest incomes per capita in 1960

In [ ]:
# Create a new variable called income11 equal to the 2011 row from incomeDf


# Print the average world income per capita in 2011


# Print the standard deviation in world income per capita in 2011

In [ ]:
# Print the names of the five countries with the highest incomes per capita in 2011

In [ ]:
# Print the names of the five countries with the lowest incomes per capita in 2011

Creating a new DataFrame

Now we'll use our cross-country income per capita data to create a new DataFrame containing growth data.
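One common way to build a DataFrame from existing Series is to pass a dictionary whose keys become the column names. The sketch below uses invented values and hypothetical names (`start`, `end`, `toy_growth`, and the column labels) purely to show the pattern.

```python
import pandas as pd

# Invented values for illustration only
start = pd.Series({'Country A': 1000.0, 'Country B': 2500.0, 'Country C': 400.0})
end = pd.Series({'Country A': 4000.0, 'Country B': 5000.0, 'Country C': 500.0})

# A dict of Series becomes a DataFrame: keys are the column names and the
# Series are aligned on the index passed in
toy_growth = pd.DataFrame({'income start': start, 'income end': end},
                          index=start.index)

# New columns can be created by assignment from existing columns
toy_growth['difference'] = toy_growth['income end'] - toy_growth['income start']
print(toy_growth)
```

Because pandas aligns on index labels, the two Series do not need to be in the same order, but they should share the same country names.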


In [ ]:
# Create a DataFrame called growthDf with columns 'income 1960' and 'income 2011' equal to income per capita
# in 1960 and 2011 and an index equal to the index of income60

In [ ]:
# Create a new column equal to the difference between 'income 2011' and 'income 1960' for each country

Let $y_t$ denote income per capita for some country in year $t$ and let $g$ denote the average annual growth rate of income per capita between years 0 and $T$. Then $g$ is defined by:
\begin{align} y_T & = (1+g)^T y_0 \end{align}
which implies:
\begin{align} g & = \left(\frac{y_T}{y_0}\right)^{1/T} - 1 \end{align}
Since our data run from 1960 to 2011, $T = 51$, which is also equal to len(incomeDf.index)-1.
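As a quick sanity check of the formula, here is a small worked example with invented numbers: an economy whose income per capita doubles over 51 years. The values of `y_0` and `y_T` are assumptions chosen only to make the arithmetic easy to follow.

```python
# Invented numbers: income per capita doubles over T = 51 years
y_0 = 10000.0
y_T = 20000.0
T = 51

# Average annual growth rate implied by the formula above
g = (y_T / y_0) ** (1 / T) - 1
print(round(100 * g, 2))   # growth rate in percent

# Check: compounding y_0 at rate g for T years recovers y_T
recovered = (1 + g) ** T * y_0
print(abs(recovered - y_T) < 1e-6)
```

The same expression works element by element on pandas columns, so it can be applied to every country at once.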


In [ ]:
# Create a new column equal to the average annual growth rate for each country between 1960 and 2011

In [ ]:
# Print the first five rows of growthDf

In [ ]:
# Print the names of the five countries with the highest average annual growth rates

In [ ]:
# Print the names of the five countries with the lowest average annual growth rates

In [ ]:
# Print the average annual growth rate of income per capita from 1960 to 2011


# Print the standard deviation of the annual growth rate of income per capita from 1960 to 2011

In [ ]:
# Construct a scatter plot:
#    Use the plt.scatter function
#    income per capita in 1960 on the horizontal axis and average annual growth rate on the vertical axis
#    Set the opacity of the points to something like 0.25 - 0.35 
#    Label the plot clearly with axis labels and a title
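
For reference, a minimal sketch of a labeled scatter plot with semi-transparent points. The data here are randomly generated stand-ins, not the real income data, and the names `x`, `y`, and `points` are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data for illustration only; not the real income data
rng = np.random.default_rng(0)
x = rng.uniform(500, 15000, size=100)    # stand-in for initial income
y = rng.normal(0.02, 0.015, size=100)    # stand-in for growth rates

# alpha sets the opacity: 0 is fully transparent, 1 is fully opaque
points = plt.scatter(x, y, s=50, alpha=0.3)
plt.xlabel('Income per capita in 1960')
plt.ylabel('Average annual growth rate')
plt.title('Initial income and growth (synthetic data)')
plt.grid(True)
```

Lowering the opacity makes overlapping points visible as darker regions, which helps when many countries cluster at low income levels.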

Exporting a DataFrame to csv

Use the DataFrame method to_csv().
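A minimal sketch of the round trip, using a small invented DataFrame and a temporary directory so that nothing is left on disk. The names `toy_df`, `round_trip`, and `toy_data.csv` are hypothetical.

```python
import os
import tempfile
import pandas as pd

# Invented values for illustration only
toy_df = pd.DataFrame(
    {'income start': [1000.0, 2500.0], 'income end': [4000.0, 5000.0]},
    index=['Country A', 'Country B'],
)

# Write to a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_data.csv')
    toy_df.to_csv(path)                          # the index is written as the first column
    round_trip = pd.read_csv(path, index_col=0)  # read it back with the same index

print(round_trip)
```

By default to_csv() writes the index as the first column, which is why reading the file back with index_col=0 restores the country names as the index.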


In [ ]:
# Export the growthDf DataFrame to a csv file called 'growth_data.csv'