In [1]:
import requests
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

Class 5: Pandas

Pandas is a Python package for data analysis. Documentation and examples: http://pandas.pydata.org/

Pandas basics

To learn how Pandas works, we'll make use of a dataset containing long-run averages of inflation, money growth, and real GDP. The dataset is available here: http://www.briancjenkins.com/data/quantitytheory/csv/qtyTheoryData.csv. Recall that the quantity theory of money implies the following linear relationship between the long-run rate of money growth, the long-run rate of inflation, and the long-run rate of real GDP growth in a country:

\begin{align} \text{inflation} & = \text{money growth} - \text{real GDP growth}, \end{align}

Generally, we treat real GDP growth and money supply growth as exogenous so this is a theory about the determination of inflation.

Now, we could download the data manually, but we might as well use Python to do it. The requests module is good for this.


In [2]:
# Use the requests module to download money growth and inflation data
url = 'http://www.briancjenkins.com/data/quantitytheory/csv/qtyTheoryData.csv'
r = requests.get(url,verify=True)

with open('qtyTheoryData.csv','wb') as newFile:
    
    newFile.write(r.content)

Import Pandas


In [3]:
import pandas as pd

Import data from a csv file

Pandas has a function called read_csv() for reading data from a csv file into a Pandas DataFrame object. Let's import the quantity thery data into a variable called df.


In [4]:
# Import quantity theory data into a Pandas DataFrame called df with country names as the index.
df = pd.read_csv('qtyTheoryData.csv',index_col=0)

In [5]:
# Print the first 5 rows
print(df.head(5))


                    iso code  observations  inflation  money growth  \
country                                                               
Albania                  ALB            21    0.05010       0.12230   
Algeria                  DZA            51    0.10684       0.16024   
Angola                   AGO            20    0.85355       1.00940   
Antigua and Barbuda      ATG            38    0.04139       0.09630   
Argentina                ARG            54    0.72681       0.77245   

                     gdp growth  
country                          
Albania                 0.04788  
Algeria                 0.03819  
Angola                  0.07152  
Antigua and Barbuda     0.03683  
Argentina               0.02646  

In [6]:
# Print the last 5 rows
print(df.tail())


                   iso code  observations  inflation  money growth  gdp growth
country                                                                       
Venezuela, RB           VEN            53    0.20298       0.26996     0.02740
West Bank and Gaza      PSE            17    0.03632       0.11749     0.02821
Yemen, Rep.             YEM            24    0.13884       0.13238     0.03541
Zambia                  ZMB            27    0.21917       0.23465     0.01163
Zimbabwe                ZWE            32   -0.00124      -0.12919     0.00813

In [7]:
# Print the type of df
print(type(df))


<class 'pandas.core.frame.DataFrame'>

Properties of DataFrame objects

Like entries in a spreadsheet file, elements in a DataFrame object have row and column coordinates. Column names are always strings.


In [8]:
# Print the columns of df
print(df.columns)


Index(['iso code', 'observations', 'inflation', 'money growth', 'gdp growth'], dtype='object')

In [9]:
# Create a new variable called money equal to the 'money growth' column and print
money = df['money growth']
print(money)


country
Albania                           0.12230
Algeria                           0.16024
Angola                            1.00940
Antigua and Barbuda               0.09630
Argentina                         0.77245
Armenia                           0.46411
Aruba                             0.08128
Australia                         0.10181
Austria                           0.07491
Azerbaijan                        0.50149
Bahamas, The                      0.06965
Bahrain                           0.08263
Bangladesh                        0.13523
Barbados                          0.10833
Belarus                           0.61733
Belgium                           0.06306
Belize                            0.11148
Benin                             0.10734
Bhutan                            0.18748
Bolivia                           0.42180
Bosnia and Herzegovina            0.19611
Botswana                          0.15783
Brazil                            0.89717
Brunei Darussalam                 0.02883
Bulgaria                          0.34614
Burkina Faso                      0.10509
Burundi                           0.13430
Cabo Verde                        0.09847
Cambodia                          0.21306
Cameroon                          0.09689
                                   ...   
Sri Lanka                         0.12114
St. Kitts and Nevis               0.14659
St. Lucia                         0.08911
St. Vincent and the Grenadines    0.09639
Sudan                             0.29333
Suriname                          0.29243
Swaziland                         0.14436
Sweden                            0.10587
Switzerland                       0.06059
Syrian Arab Republic              0.15539
Tajikistan                        0.30609
Tanzania                          0.20918
Thailand                          0.09645
Timor-Leste                       0.18530
Togo                              0.09973
Tonga                             0.09616
Trinidad and Tobago               0.11841
Tunisia                           0.10958
Turkey                            0.37305
Uganda                            0.36831
Ukraine                           0.55392
United Arab Emirates              0.13659
United States                     0.05644
Uruguay                           0.40479
Vanuatu                           0.08102
Venezuela, RB                     0.26996
West Bank and Gaza                0.11749
Yemen, Rep.                       0.13238
Zambia                            0.23465
Zimbabwe                         -0.12919
Name: money growth, dtype: float64

In [10]:
# Print the type of the variable money
print(type(money))


<class 'pandas.core.series.Series'>

In [11]:
# Print the first 5 rows of just the inflation, money growth, and gdp growth columns
print(df[['inflation','money growth','gdp growth']].head())


                     inflation  money growth  gdp growth
country                                                 
Albania                0.05010       0.12230     0.04788
Algeria                0.10684       0.16024     0.03819
Angola                 0.85355       1.00940     0.07152
Antigua and Barbuda    0.04139       0.09630     0.03683
Argentina              0.72681       0.77245     0.02646

The set of row coordinates is the index. Index values can be strings, numbers, or dates.


In [12]:
# Print the index of df
print(df.index)


Index(['Albania', 'Algeria', 'Angola', 'Antigua and Barbuda', 'Argentina',
       'Armenia', 'Aruba', 'Australia', 'Austria', 'Azerbaijan',
       ...
       'Ukraine', 'United Arab Emirates', 'United States', 'Uruguay',
       'Vanuatu', 'Venezuela, RB', 'West Bank and Gaza', 'Yemen, Rep.',
       'Zambia', 'Zimbabwe'],
      dtype='object', name='country', length=176)

In [13]:
# Create a new variable called usa equal to the 'United States' row and print
usa = df.loc['United States']
print(usa)


iso code            USA
observations         54
inflation       0.03374
money growth    0.05644
gdp growth      0.03075
Name: United States, dtype: object

In [14]:
# Print the inflation rate of the United States
print(df.loc['United States']['inflation'])


0.03374

In [15]:
# Print the inflation rate of the United States in a different way
print(df['inflation'].loc['United States'])


0.03374

In [16]:
# Create a new variable called first equal to the first row in the DataFrame and print
first = df.iloc[0]
print(first)


iso code            ALB
observations         21
inflation        0.0501
money growth     0.1223
gdp growth      0.04788
Name: Albania, dtype: object

Create new columns by name.


In [17]:
# Create a new column called 'difference' equal to the money growth column minus the inflation column and print the column
df['difference'] = df['money growth'] - df['inflation']
print(df['difference'])


country
Albania                           0.07220
Algeria                           0.05340
Angola                            0.15585
Antigua and Barbuda               0.05491
Argentina                         0.04564
Armenia                           0.02129
Aruba                             0.04959
Australia                         0.05113
Austria                           0.05871
Azerbaijan                        0.07391
Bahamas, The                      0.02706
Bahrain                           0.05280
Bangladesh                        0.05657
Barbados                          0.03839
Belarus                           0.03194
Belgium                           0.04620
Belize                            0.08922
Benin                             0.05926
Bhutan                            0.12036
Bolivia                           0.08569
Bosnia and Herzegovina            0.15562
Botswana                          0.06465
Brazil                            0.03085
Brunei Darussalam                -0.02619
Bulgaria                          0.06444
Burkina Faso                      0.06629
Burundi                           0.04095
Cabo Verde                        0.06765
Cambodia                          0.17857
Cameroon                          0.04393
                                   ...   
Sri Lanka                         0.02782
St. Kitts and Nevis               0.10880
St. Lucia                         0.05599
St. Vincent and the Grenadines    0.04958
Sudan                             0.04823
Suriname                          0.03263
Swaziland                         0.04699
Sweden                            0.03408
Switzerland                       0.04378
Syrian Arab Republic              0.06709
Tajikistan                        0.13993
Tanzania                          0.06045
Thailand                          0.05519
Timor-Leste                       0.14941
Togo                              0.05269
Tonga                             0.04178
Trinidad and Tobago               0.04752
Tunisia                           0.05148
Turkey                            0.06091
Uganda                            0.10012
Ukraine                          -0.03054
United Arab Emirates              0.10121
United States                     0.02270
Uruguay                           0.03633
Vanuatu                           0.04592
Venezuela, RB                     0.06698
West Bank and Gaza                0.08117
Yemen, Rep.                      -0.00646
Zambia                            0.01548
Zimbabwe                         -0.12795
Name: difference, dtype: float64

Methods

A Pandas DataFrame has a bunch of useful methods defined for it. describe() returns some summary statistics.


In [18]:
# Print the summary statistics for df
print(df.describe())


       observations   inflation  money growth  gdp growth  difference
count    176.000000  176.000000    176.000000  176.000000  176.000000
mean      36.460227    0.117901      0.179125    0.036382    0.061224
std       14.654833    0.145099      0.146216    0.021459    0.037845
min       13.000000   -0.001240     -0.129190   -0.023680   -0.127950
25%       22.000000    0.041098      0.099415    0.023295    0.042710
50%       38.000000    0.064320      0.135195    0.036795    0.054685
75%       53.000000    0.124485      0.197527    0.047000    0.074672
max       54.000000    0.866320      1.009400    0.151800    0.178570

While Pandas' describe function provides some good summary information, NumPy also has some useful functions for computing statistics. For example, the NumPy function corrcoef() computes the coefficient of correlation for two series.


In [19]:
# Print the correlation coefficient for inflation and money growth
print('corr of inflation and money growth',np.corrcoef(df['inflation'],df['money growth'])[0][1])

# Print the correlation coefficient for inflation and real GDP growth
print('corr of inflation and money growth',np.corrcoef(df['inflation'],df['gdp growth'])[0][1])

# Print the correlation coefficient for money growth and real GDP growth
print('corr of inflation and money growth',np.corrcoef(df['money growth'],df['gdp growth'])[0][1])


corr of inflation and money growth 0.96627499535
corr of inflation and money growth 0.00915684541961
corr of inflation and money growth 0.155152280798

sort_values() returns a copy of the original DataFrame sorted along the given column. The optional argument ascending is set to True by default, but can be changed to False if you want to print the lowest first.


In [20]:
# Print rows for the countries with the 10 lowest inflation rates
print(df.sort_values('inflation').head(10))

# Print rows for the countries with the 10 lowest money growth rates
print(df.sort_values('money growth').head(10))


                     iso code  observations  inflation  money growth  \
country                                                                
Zimbabwe                  ZWE            32   -0.00124      -0.12919   
Germany                   DEU            16    0.01051       0.07344   
Hong Kong SAR, China      HKG            24    0.01395       0.09846   
France                    FRA            16    0.01421       0.06584   
Ireland                   IRL            16    0.01436       0.12348   
Kosovo                    KSV            14    0.01555       0.08659   
Greece                    GRC            14    0.01570       0.01423   
Austria                   AUT            16    0.01620       0.07491   
Finland                   FIN            16    0.01645       0.06528   
Switzerland               CHE            35    0.01681       0.06059   

                      gdp growth  difference  
country                                       
Zimbabwe                 0.00813    -0.12795  
Germany                  0.01117     0.06293  
Hong Kong SAR, China     0.03610     0.08451  
France                   0.01190     0.05163  
Ireland                  0.03079     0.10912  
Kosovo                   0.03341     0.07104  
Greece                  -0.00455    -0.00147  
Austria                  0.01414     0.05871  
Finland                  0.01383     0.04883  
Switzerland              0.01696     0.04378  
                      iso code  observations  inflation  money growth  \
country                                                                 
Zimbabwe                   ZWE            32   -0.00124      -0.12919   
Greece                     GRC            14    0.01570       0.01423   
Brunei Darussalam          BRN            16    0.05502       0.02883   
Portugal                   PRT            16    0.02036       0.03117   
Micronesia, Fed. Sts.      FSM            20    0.01993       0.04259   
Italy                      ITA            16    0.01863       0.04956   
United States              USA            54    0.03374       0.05644   
Switzerland                CHE            35    0.01681       0.06059   
Belgium                    BEL            16    0.01686       0.06306   
Finland                    FIN            16    0.01645       0.06528   

                       gdp growth  difference  
country                                        
Zimbabwe                  0.00813    -0.12795  
Greece                   -0.00455    -0.00147  
Brunei Darussalam         0.01039    -0.02619  
Portugal                  0.00307     0.01081  
Micronesia, Fed. Sts.    -0.00178     0.02266  
Italy                     0.00147     0.03093  
United States             0.03075     0.02270  
Switzerland               0.01696     0.04378  
Belgium                   0.01428     0.04620  
Finland                   0.01383     0.04883  

In [21]:
# Print rows for the countries with the 10 highest inflation rates
print(df.sort_values('inflation',ascending=False).head(10))

# Print rows for the countries with the 10 highest money growth rates
print(df.sort_values('money growth',ascending=False).head(10))


           iso code  observations  inflation  money growth  gdp growth  \
country                                                                  
Brazil          BRA            54    0.86632       0.89717     0.04124   
Angola          AGO            20    0.85355       1.00940     0.07152   
Argentina       ARG            54    0.72681       0.77245     0.02646   
Nicaragua       NIC            54    0.61343       0.66175     0.02434   
Belarus         BLR            21    0.58539       0.61733     0.04919   
Ukraine         UKR            23    0.58446       0.55392    -0.01037   
Peru            PER            54    0.47256       0.53655     0.03495   
Armenia         ARM            23    0.44282       0.46411     0.05466   
Azerbaijan      AZE            23    0.42758       0.50149     0.05468   
Moldova         MDA            24    0.42727       0.40570    -0.00689   

            difference  
country                 
Brazil         0.03085  
Angola         0.15585  
Argentina      0.04564  
Nicaragua      0.04832  
Belarus        0.03194  
Ukraine       -0.03054  
Peru           0.06399  
Armenia        0.02129  
Azerbaijan     0.07391  
Moldova       -0.02157  
           iso code  observations  inflation  money growth  gdp growth  \
country                                                                  
Angola          AGO            20    0.85355       1.00940     0.07152   
Brazil          BRA            54    0.86632       0.89717     0.04124   
Argentina       ARG            54    0.72681       0.77245     0.02646   
Nicaragua       NIC            54    0.61343       0.66175     0.02434   
Belarus         BLR            21    0.58539       0.61733     0.04919   
Ukraine         UKR            23    0.58446       0.55392    -0.01037   
Peru            PER            54    0.47256       0.53655     0.03495   
Azerbaijan      AZE            23    0.42758       0.50149     0.05468   
Armenia         ARM            23    0.44282       0.46411     0.05466   
Bolivia         BOL            54    0.33611       0.42180     0.03016   

            difference  
country                 
Angola         0.15585  
Brazil         0.03085  
Argentina      0.04564  
Nicaragua      0.04832  
Belarus        0.03194  
Ukraine       -0.03054  
Peru           0.06399  
Azerbaijan     0.07391  
Armenia        0.02129  
Bolivia        0.08569  

sort_index() returns a copy of the original DataFrame sorted along the index. The optional argument ascending is set to True by default, but can be changed to False if you want to print the lowest first.


In [22]:
# Print df with the index descending alphabetical order
print(df.sort_index(ascending=False))


                               iso code  observations  inflation  \
country                                                            
Zimbabwe                            ZWE            32   -0.00124   
Zambia                              ZMB            27    0.21917   
Yemen, Rep.                         YEM            24    0.13884   
West Bank and Gaza                  PSE            17    0.03632   
Venezuela, RB                       VEN            53    0.20298   
Vanuatu                             VUT            36    0.03510   
Uruguay                             URY            54    0.36846   
United States                       USA            54    0.03374   
United Arab Emirates                ARE            40    0.03538   
Ukraine                             UKR            23    0.58446   
Uganda                              UGA            33    0.26819   
Turkey                              TUR            54    0.31214   
Tunisia                             TUN            50    0.05810   
Trinidad and Tobago                 TTO            54    0.07089   
Tonga                               TON            26    0.05438   
Togo                                TGO            53    0.04704   
Timor-Leste                         TLS            13    0.03589   
Thailand                            THA            54    0.04126   
Tanzania                            TZA            27    0.14873   
Tajikistan                          TJK            17    0.16616   
Syrian Arab Republic                SYR            47    0.08830   
Switzerland                         CHE            35    0.01681   
Sweden                              SWE            29    0.07179   
Swaziland                           SWZ            41    0.09737   
Suriname                            SUR            40    0.25980   
Sudan                               SDN            54    0.24510   
St. Vincent and the Grenadines      VCT            40    0.04681   
St. Lucia                           LCA            35    0.03312   
St. Kitts and Nevis                 KNA            36    0.03779   
Sri Lanka                           LKA            54    0.09332   
...                                 ...           ...        ...   
Cameroon                            CMR            54    0.05296   
Cambodia                            KHM            22    0.03449   
Cabo Verde                          CPV            35    0.03082   
Burundi                             BDI            51    0.09335   
Burkina Faso                        BFA            53    0.03880   
Bulgaria                            BGR            24    0.28170   
Brunei Darussalam                   BRN            16    0.05502   
Brazil                              BRA            54    0.86632   
Botswana                            BWA            39    0.09318   
Bosnia and Herzegovina              BIH            18    0.04049   
Bolivia                             BOL            54    0.33611   
Bhutan                              BTN            32    0.06712   
Benin                               BEN            53    0.04808   
Belize                              BLZ            39    0.02226   
Belgium                             BEL            16    0.01686   
Belarus                             BLR            21    0.58539   
Barbados                            BRB            44    0.06994   
Bangladesh                          BGD            41    0.07866   
Bahrain                             BHR            35    0.02983   
Bahamas, The                        BHS            46    0.04259   
Azerbaijan                          AZE            23    0.42758   
Austria                             AUT            16    0.01620   
Australia                           AUS            54    0.05068   
Aruba                               ABW            16    0.03169   
Armenia                             ARM            23    0.44282   
Argentina                           ARG            54    0.72681   
Antigua and Barbuda                 ATG            38    0.04139   
Angola                              AGO            20    0.85355   
Algeria                             DZA            51    0.10684   
Albania                             ALB            21    0.05010   

                                money growth  gdp growth  difference  
country                                                               
Zimbabwe                            -0.12919     0.00813    -0.12795  
Zambia                               0.23465     0.01163     0.01548  
Yemen, Rep.                          0.13238     0.03541    -0.00646  
West Bank and Gaza                   0.11749     0.02821     0.08117  
Venezuela, RB                        0.26996     0.02740     0.06698  
Vanuatu                              0.08102     0.02888     0.04592  
Uruguay                              0.40479     0.02251     0.03633  
United States                        0.05644     0.03075     0.02270  
United Arab Emirates                 0.13659     0.04692     0.10121  
Ukraine                              0.55392    -0.01037    -0.03054  
Uganda                               0.36831     0.05680     0.10012  
Turkey                               0.37305     0.04415     0.06091  
Tunisia                              0.10958     0.04544     0.05148  
Trinidad and Tobago                  0.11841     0.02854     0.04752  
Tonga                                0.09616     0.01605     0.04178  
Togo                                 0.09973     0.03433     0.05269  
Timor-Leste                          0.18530     0.05283     0.14941  
Thailand                             0.09645     0.05992     0.05519  
Tanzania                             0.20918     0.04978     0.06045  
Tajikistan                           0.30609     0.07200     0.13993  
Syrian Arab Republic                 0.15539     0.05270     0.06709  
Switzerland                          0.06059     0.01696     0.04378  
Sweden                               0.10587     0.02789     0.03408  
Swaziland                            0.14436     0.04428     0.04699  
Suriname                             0.29243     0.01478     0.03263  
Sudan                                0.29333     0.03720     0.04823  
St. Vincent and the Grenadines       0.09639     0.03763     0.04958  
St. Lucia                            0.08911     0.03528     0.05599  
St. Kitts and Nevis                  0.14659     0.03854     0.10880  
Sri Lanka                            0.12114     0.04755     0.02782  
...                                      ...         ...         ...  
Cameroon                             0.09689     0.03405     0.04393  
Cambodia                             0.21306     0.07304     0.17857  
Cabo Verde                           0.09847     0.06613     0.06765  
Burundi                              0.13430     0.02690     0.04095  
Burkina Faso                         0.10509     0.04205     0.06629  
Bulgaria                             0.34614     0.01965     0.06444  
Brunei Darussalam                    0.02883     0.01039    -0.02619  
Brazil                               0.89717     0.04124     0.03085  
Botswana                             0.15783     0.06930     0.06465  
Bosnia and Herzegovina               0.19611     0.04205     0.15562  
Bolivia                              0.42180     0.03016     0.08569  
Bhutan                               0.18748     0.07085     0.12036  
Benin                                0.10734     0.03679     0.05926  
Belize                               0.11148     0.05013     0.08922  
Belgium                              0.06306     0.01428     0.04620  
Belarus                              0.61733     0.04919     0.03194  
Barbados                             0.10833     0.02442     0.03839  
Bangladesh                           0.13523     0.04486     0.05657  
Bahrain                              0.08263     0.03987     0.05280  
Bahamas, The                         0.06965     0.01846     0.02706  
Azerbaijan                           0.50149     0.05468     0.07391  
Austria                              0.07491     0.01414     0.05871  
Australia                            0.10181     0.03455     0.05113  
Aruba                                0.08128     0.00825     0.04959  
Armenia                              0.46411     0.05466     0.02129  
Argentina                            0.77245     0.02646     0.04564  
Antigua and Barbuda                  0.09630     0.03683     0.05491  
Angola                               1.00940     0.07152     0.15585  
Algeria                              0.16024     0.03819     0.05340  
Albania                              0.12230     0.04788     0.07220  

[176 rows x 6 columns]

Quick plotting example

Construct a graph that visually confirms the quantity theory of money by making a scatter plot with average money growth on the horizontal axis and average inflation on the vertical axis. Add a 45 degree line and labels and a title.


In [23]:
# Construct a well-labeled scatter plot of inflation against money growth

plt.scatter(df['money growth'],df['inflation'],s=50,alpha = 0.25)
plt.grid()
plt.xlim([-0.2,1.2])
plt.ylim([-0.2,1.2])
plt.xlabel('money growth')
plt.ylabel('inflation')
plt.title('Average inflation against average money growth \nfor '+str(len(df.index))+' countries.')


Out[23]:
<matplotlib.text.Text at 0x117c6c080>