Working with Economic data in Python

This notebook will introduce you to working with data in Python. You will use packages like NumPy to manipulate arrays and matrices and to do computations with them (see my Introduction to Python). But given the needs of economists (and other scientists), it will be advantageous for us to use pandas. pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for Python. pandas allows you to import and process data in many useful ways. It integrates well with other packages that complement it, making it a very powerful tool for data analysis.

With pandas you can

  1. Import many types of data, including
    • CSV files
    • Tab or other types of delimited files
    • Excel (xls, xlsx) files
    • Stata files
  2. Open files directly from a website
  3. Merge, select, join data
  4. Perform statistical analyses
  5. Create plots of your data

and much more. Let's start by importing pandas and using it to download some data and create some of the figures from the lecture notes. Note that when importing pandas it is customary to assign it the alias pd. I suggest you follow this convention, which will make using other people's code and snippets easier.


In [1]:
# Let's import pandas and some other basic packages we will use 
from __future__ import division
%pylab --no-import-all
%matplotlib inline
import pandas as pd
import numpy as np


Using matplotlib backend: MacOSX
Populating the interactive namespace from numpy and matplotlib

Working with Pandas

The basic structures in pandas are pd.Series and pd.DataFrame. You can think of a pd.Series as a labeled vector that contains data and has a large set of functions that can be easily performed on it. A pd.DataFrame is similar to a table/matrix of multidimensional data where each column is a pd.Series. I know...this may not explain much, so let's start with some actual examples. Let's create two series, one containing some country names and another containing some fictitious data.


In [2]:
countries = pd.Series(['Colombia', 'Turkey', 'USA', 'Germany', 'Chile'], name='country')
print(countries)
print('\n', 'There are ', countries.shape[0], 'countries in this series.')


0    Colombia
1      Turkey
2         USA
3     Germany
4       Chile
Name: country, dtype: object

 There are  5 countries in this series.

Notice that we have assigned a name to the series that is different from the name of the variable containing the series. Our print(countries) statement shows the series and its contents, its name, and the dtype of the data it contains. Here our series is composed only of strings, so pandas assigns it the object dtype (not important for now, but we will use this later to convert data between types, e.g., strings to integers or floats, or the other way around).
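
As a small preview of the type conversion mentioned above, here is a minimal sketch. The series raw and messy and their values are made up for illustration; astype and pd.to_numeric are the standard pandas tools for this kind of conversion.

```python
import pandas as pd

# Hypothetical series of numbers stored as strings
raw = pd.Series(['1', '2', '3'], name='raw')

# Convert the strings to integers and to floats
as_int = raw.astype(int)
as_float = raw.astype(float)

# pd.to_numeric with errors='coerce' turns entries it cannot parse into NaN
messy = pd.Series(['1.5', 'n.a.', '3'])
clean = pd.to_numeric(messy, errors='coerce')

# And the other way around: numbers to strings
back_to_str = as_int.astype(str)

print(as_int.dtype, as_float.dtype, clean.dtype)
```

The errors='coerce' option is handy for files that mark missing values with text such as n.a., but it silently hides typos too, so it is worth checking how many values became NaN.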

Let's create the data using some of the functions we already learned.


In [3]:
np.random.seed(123456)
data = pd.Series(np.random.normal(size=(countries.shape)), name='noise')
print(data)
print('\n', 'The average in this sample is ', data.mean())


0    0.469112
1   -0.282863
2   -1.509059
3   -1.135632
4    1.212112
Name: noise, dtype: float64

 The average in this sample is  -0.24926597871826645

Here we have used the mean() function of the series to compute its mean. There are many other properties/functions for these series including std(), shape, count(), max(), min(), etc. You can access these by writing series.name_of_function_or_property. To see what functions are available you can hit tab after typing the name of the series followed by a period (e.g., data.).
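
To make this concrete, here is a short sketch that rebuilds the same noise series and calls a few of these functions. The use of describe() is my addition; it simply collects the most common statistics in one place.

```python
import numpy as np
import pandas as pd

# Same construction as above: five standard normal draws with a fixed seed
np.random.seed(123456)
data = pd.Series(np.random.normal(size=5), name='noise')

# Individual summary statistics
print('std:  ', data.std())
print('min:  ', data.min())
print('max:  ', data.max())
print('count:', data.count())

# describe() returns a series with count, mean, std, min, quartiles and max
summary = data.describe()
print(summary)
```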

Let's create a pd.DataFrame using these two series.


In [4]:
df = pd.DataFrame([countries, data])
df


Out[4]:
0 1 2 3 4
country Colombia Turkey USA Germany Chile
noise 0.469112 -0.282863 -1.50906 -1.13563 1.21211

Not exactly what we'd like, but don't worry, we can just transpose it so it has each country with its data in a row.


In [5]:
df = df.T
df


Out[5]:
country noise
0 Colombia 0.469112
1 Turkey -0.282863
2 USA -1.50906
3 Germany -1.13563
4 Chile 1.21211
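
Transposing works, but it has a side effect: because each original row mixed strings and numbers, every column of the transposed dataframe ends up with the object dtype. The sketch below shows one alternative (my suggestion, not the only option), which places the series side by side with pd.concat so that each column keeps its own dtype.

```python
import numpy as np
import pandas as pd

countries = pd.Series(['Colombia', 'Turkey', 'USA', 'Germany', 'Chile'], name='country')
np.random.seed(123456)
data = pd.Series(np.random.normal(size=countries.shape), name='noise')

# Stack the series as rows and transpose, as in the cells above
df_t = pd.DataFrame([countries, data]).T

# Concatenate the series as columns: noise stays float64
df_c = pd.concat([countries, data], axis=1)

print(df_t.dtypes)
print(df_c.dtypes)
```

Keeping numeric columns numeric matters later on, since many statistical and plotting functions expect floats or integers rather than generic Python objects.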

Now let us add some more data to this dataframe. This is done easily by defining new columns. Let's create the square of noise, create the sum of noise and its square, and get the length of each country's name.


In [6]:
df['noise_sq'] = df.noise**2
df['noise and its square'] = df.noise + df.noise_sq
df['name length'] = df.country.apply(len)
df


Out[6]:
country noise noise_sq noise and its square name length
0 Colombia 0.469112 0.220066 0.689179 8
1 Turkey -0.282863 0.0800117 -0.202852 6
2 USA -1.50906 2.27726 0.768199 3
3 Germany -1.13563 1.28966 0.154029 7
4 Chile 1.21211 1.46922 2.68133 5

This shows some of the ways in which you can create new data. Especially useful is the apply method, which applies a function to the series. You can also apply a function to the whole dataframe, which is useful if you want to perform computations using various columns.
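
Here is a minimal sketch of applying a function to the whole dataframe. The dataframe toy and the column label are made up for illustration; the key ingredient is axis=1, which passes each row to the function.

```python
import pandas as pd

# Small made-up dataframe for illustration
toy = pd.DataFrame({'country': ['Colombia', 'Chile'],
                    'noise': [0.5, -1.0]})

# axis=1 hands each row to the function, so it can combine several columns
toy['label'] = toy.apply(lambda row: f"{row['country']}: {row['noise']:+.1f}", axis=1)

# For simple arithmetic, operating on whole columns is usually clearer and faster
toy['noise_sq'] = toy['noise'] ** 2

print(toy)
```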

Let's see some other ways in which we can interact with dataframes. First, let's select some observations, e.g., all countries in South America.


In [7]:
# Let's create a list of South American countries
south_america = ['Colombia', 'Chile']
# Select the rows for South American countries
df.loc[df.country.apply(lambda x: x in south_america)]


Out[7]:
country noise noise_sq noise and its square name length
0 Colombia 0.469112 0.220066 0.689179 8
4 Chile 1.21211 1.46922 2.68133 5
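
The same selection can also be written with the isin method, which avoids the lambda. This is a sketch on a small stand-in dataframe (toy is a made-up name); it is an alternative to the approach above, not a replacement for it.

```python
import pandas as pd

toy = pd.DataFrame({'country': ['Colombia', 'Turkey', 'USA', 'Germany', 'Chile']})
south_america = ['Colombia', 'Chile']

# isin returns a boolean series: True where the country is in the list
mask = toy['country'].isin(south_america)

# Passing the boolean series to loc keeps only the rows marked True
selected = toy.loc[mask]
print(selected)
```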

Now let's use this to create a dummy indicating whether a country belongs to South America. To understand what is going on let's show the result of the condition for selecting rows.


In [8]:
df.country.apply(lambda x: x in south_america)


Out[8]:
0     True
1    False
2    False
3    False
4     True
Name: country, dtype: bool

So in the previous selection of rows we told pandas which rows to include and which to leave out by passing a series of booleans (True, False). We can use this result to create the dummy; we only need to convert the output to int.


In [9]:
df['South America'] = df.country.apply(lambda x: x in south_america).astype(int)

Now, let's plot the various series in the dataframe.


In [10]:
df.plot()


Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x1275829a0>

Neither nice nor useful. Notice that it assigned the row number to the x-axis labels. Let's change the row labels, which are contained in the dataframe's index, by assigning the country names as the index.


In [11]:
df = df.set_index('country')
print(df)
df.plot()


             noise   noise_sq noise and its square  name length  South America
country                                                                       
Colombia  0.469112   0.220066             0.689179            8              1
Turkey   -0.282863  0.0800117            -0.202852            6              0
USA       -1.50906    2.27726             0.768199            3              0
Germany   -1.13563    1.28966             0.154029            7              0
Chile      1.21211    1.46922              2.68133            5              1
Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x12968dcd0>

Better, but still not very informative. Below we will improve on this when we work with some real data.

Notice that by using the set_index function we have assigned the country names as the index. This is useful for selecting data. E.g., if we want to see only the row for Colombia we can write:


In [12]:
df.loc['Colombia']


Out[12]:
noise                   0.469112
noise_sq                0.220066
noise and its square    0.689179
name length                    8
South America                  1
Name: Colombia, dtype: object

Getting data

One of the nice features of pandas and its ecosystem is that it makes obtaining data very easy. To illustrate this and also to revisit some of the basic facts of comparative development, let's download some data from various sources. This may require you to create accounts in order to access and download the data (sometimes the process is very simple and does not require an actual project...in other cases you need to propose a project and be approved...usually due to privacy concerns with micro-data). Don't be afraid, all these sources are free and are used a lot in research, so it is good that you learn to use them. Let's start with a list of useful sources.

Country-level economic data

Censuses, Surveys, and other micro-level data

  • IPUMS: provides census and survey data from around the world integrated across time and space.
  • General Social Survey provides survey data on what Americans think and feel about such issues as national spending priorities, crime and punishment, intergroup relations, and confidence in institutions.
  • European Social Survey provides survey measures on the attitudes, beliefs and behaviour patterns of diverse European populations in more than thirty nations.
  • UK Data Service is the UK’s largest collection of social, economic and population data resources.
  • SHRUG is the Socioeconomic High-resolution Rural-Urban Geographic Platform for India. It provides access to dozens of datasets covering India’s 500,000 villages and 8000 towns using a set of common geographic identifiers that span 25 years.

Divergence - Big time

To study the divergence across countries let's download and plot the historical GDP and population data. To keep the data and avoid downloading it from scratch every time, we'll create a folder ./data in the current directory and save each file there. Also, we'll make sure that if the data does not exist locally, we download it. We'll use the os package to create directories.

Setting up paths


In [13]:
import os

pathout = './data/'

if not os.path.exists(pathout):
    os.mkdir(pathout)
    
pathgraphs = './graphs/'
if not os.path.exists(pathgraphs):
    os.mkdir(pathgraphs)

Download New Maddison Project Data


In [14]:
try:
    # Read the local copies if they were saved in a previous run
    maddison_new = pd.read_stata(pathout + 'Maddison2018.dta')
    maddison_new_region = pd.read_stata(pathout + 'Maddison2018_region.dta')
    maddison_new_1990 = pd.read_stata(pathout + 'Maddison2018_1990.dta')
except FileNotFoundError:
    # Otherwise download each file and save a local copy
    maddison_new = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2018.dta')
    maddison_new.to_stata(pathout + 'Maddison2018.dta', write_index=False, version=117)
    maddison_new_region = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2018_region_data.dta')
    maddison_new_region.to_stata(pathout + 'Maddison2018_region.dta', write_index=False, version=117)
    maddison_new_1990 = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2018_1990bm.dta')
    maddison_new_1990.to_stata(pathout + 'Maddison2018_1990.dta', write_index=False, version=117)
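
The "read the local copy, otherwise download and save" pattern is repeated for each file, so it can be wrapped in a small helper. The sketch below is only an illustration: load_or_fetch and fake_download are hypothetical names, and the download is replaced by a stand-in so the example runs without network access.

```python
import os
import tempfile
import pandas as pd

def load_or_fetch(local_path, fetch, read=pd.read_csv, write=None):
    """Return the data stored at local_path, fetching and caching it if missing.

    fetch: zero-argument function that returns a DataFrame (e.g., a download)
    read:  function that reads the cached file
    write: function (df, path) that saves the data; defaults to CSV
    """
    if os.path.exists(local_path):
        return read(local_path)
    df = fetch()
    if write is None:
        df.to_csv(local_path, index=False)
    else:
        write(df, local_path)
    return df

# Stand-in for the download step; it records how often it is called
calls = []
def fake_download():
    calls.append(1)
    return pd.DataFrame({'year': [1820, 1870], 'pop': [3280.0, 4207.0]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'cache.csv')
    first = load_or_fetch(path, fake_download)   # fetches and saves
    second = load_or_fetch(path, fake_download)  # reads the cached copy

print(len(calls))
```

For the Stata files above, one could pass read=pd.read_stata, a fetch function that calls pd.read_stata on the URL, and a write function that calls to_stata with the same options used in the cell.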

In [15]:
maddison_new


Out[15]:
countrycode country year cgdppc rgdpnapc pop i_cig i_bm
0 AFG Afghanistan 1820.0 NaN NaN 3280.0 NaN NaN
1 AFG Afghanistan 1870.0 NaN NaN 4207.0 NaN NaN
2 AFG Afghanistan 1913.0 NaN NaN 5730.0 NaN NaN
3 AFG Afghanistan 1950.0 2392.0 2392.0 8150.0 Extrapolated NaN
4 AFG Afghanistan 1951.0 2422.0 2422.0 8284.0 Extrapolated NaN
... ... ... ... ... ... ... ... ...
19868 ZWE Zimbabwe 2012.0 1623.0 1604.0 12620.0 Extrapolated NaN
19869 ZWE Zimbabwe 2013.0 1801.0 1604.0 13183.0 Extrapolated NaN
19870 ZWE Zimbabwe 2014.0 1797.0 1594.0 13772.0 Extrapolated NaN
19871 ZWE Zimbabwe 2015.0 1759.0 1560.0 14230.0 Extrapolated NaN
19872 ZWE Zimbabwe 2016.0 1729.0 1534.0 14547.0 Extrapolated NaN

19873 rows × 8 columns

This dataset is in long format, i.e., each row is one country-year observation. Also, notice that the year is stored as a float rather than an integer. Let's correct this.


In [16]:
maddison_new['year'] = maddison_new.year.astype(int)
maddison_new


Out[16]:
countrycode country year cgdppc rgdpnapc pop i_cig i_bm
0 AFG Afghanistan 1820 NaN NaN 3280.0 NaN NaN
1 AFG Afghanistan 1870 NaN NaN 4207.0 NaN NaN
2 AFG Afghanistan 1913 NaN NaN 5730.0 NaN NaN
3 AFG Afghanistan 1950 2392.0 2392.0 8150.0 Extrapolated NaN
4 AFG Afghanistan 1951 2422.0 2422.0 8284.0 Extrapolated NaN
... ... ... ... ... ... ... ... ...
19868 ZWE Zimbabwe 2012 1623.0 1604.0 12620.0 Extrapolated NaN
19869 ZWE Zimbabwe 2013 1801.0 1604.0 13183.0 Extrapolated NaN
19870 ZWE Zimbabwe 2014 1797.0 1594.0 13772.0 Extrapolated NaN
19871 ZWE Zimbabwe 2015 1759.0 1560.0 14230.0 Extrapolated NaN
19872 ZWE Zimbabwe 2016 1729.0 1534.0 14547.0 Extrapolated NaN

19873 rows × 8 columns
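
The conversion with astype(int) works here because the year column has no missing values. As a hedged aside, the sketch below (with made-up series) shows what happens when a value is missing and how the nullable Int64 dtype can handle that case.

```python
import numpy as np
import pandas as pd

# Made-up years stored as floats, as in the raw file
years = pd.Series([1820.0, 1870.0, 1913.0])
years_int = years.astype(int)

# With a missing value, astype(int) raises an error...
years_missing = pd.Series([1820.0, np.nan, 1913.0])
try:
    years_missing.astype(int)
    failed = False
except (ValueError, TypeError):
    failed = True

# ...but the nullable 'Int64' dtype (capital I) keeps the gap as a missing value
years_nullable = years_missing.astype('Int64')

print(years_int.tolist(), failed)
print(years_nullable)
```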

Original Maddison Data

Now, let's download, save and read the original Maddison database. Since the original file is an Excel file with different data on each sheet, it will require us to use a different method to get all the data.


In [17]:
if not os.path.exists(pathout + 'Maddison_original.xls'):
    import urllib.request
    dataurl = "http://www.ggdc.net/maddison/Historical_Statistics/horizontal-file_02-2010.xls"
    urllib.request.urlretrieve(dataurl, pathout + 'Maddison_original.xls')

Some data munging

This dataset is not very nicely structured for importing, as you can see if you open it in Excel. I suggest you do so, so that you can better see what is going on. Notice that the first two rows really have no data. Also, every second column is empty. Moreover, there are a few empty rows. Let's import the data and clean it so we can plot and analyse it better.


In [18]:
maddison_old_pop = pd.read_excel(pathout + 'Maddison_original.xls', sheet_name="Population", skiprows=2)
maddison_old_pop


Out[18]:
Unnamed: 0 1 Unnamed: 2 1000 Unnamed: 4 1500 Unnamed: 6 1600 Unnamed: 8 1700 ... 2002 2003 2004 2005 2006 2007 2008 2009 Unnamed: 201 2030
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 500.0 NaN 700.0 NaN 2000.0 NaN 2500.0 NaN 2500.0 ... 8148.312 8162.656 8174.762 8184.691 8192.880 8199.783 8205.533 8210 NaN 8120.000
2 Belgium 300.0 NaN 400.0 NaN 1400.0 NaN 1600.0 NaN 2000.0 ... 10311.970 10330.824 10348.276 10364.388 10379.067 10392.226 10403.951 10414 NaN 10409.000
3 Denmark 180.0 NaN 360.0 NaN 600.0 NaN 650.0 NaN 700.0 ... 5374.693 5394.138 5413.392 5432.335 5450.661 5468.120 5484.723 5501 NaN 5730.488
4 Finland 20.0 NaN 40.0 NaN 300.0 NaN 400.0 NaN 400.0 ... 5193.039 5204.405 5214.512 5223.442 5231.372 5238.460 5244.749 5250 NaN 5201.445
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
273 Guadeloupe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 435.739 440.189 444.515 448.713 452.776 456.698 460.486 n.a. NaN 523.493
274 Guyana (Fr.) NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 182.333 186.917 191.309 195.506 199.509 203.321 206.941 n.a. NaN 272.781
275 Martinique NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 422.277 425.966 429.510 432.900 436.131 439.202 442.119 n.a. NaN 486.714
276 Reunion NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 743.981 755.171 766.153 776.948 787.584 798.094 808.506 n.a. NaN 1025.217
277 Total NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1784.330 1808.243 1831.487 1854.067 1876.000 1897.315 1918.052 n.a. NaN 2308.205

278 rows × 203 columns


In [19]:
maddison_old_gdppc = pd.read_excel(pathout + 'Maddison_original.xls', sheet_name="PerCapita GDP", skiprows=2)
maddison_old_gdppc


Out[19]:
Unnamed: 0 1 Unnamed: 2 1000 Unnamed: 4 1500 Unnamed: 6 1600 Unnamed: 8 1700 ... 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 425.000000 NaN 425.000000 NaN 707 NaN 837.200000 NaN 993.200000 ... 20065.093878 20691.415561 20812.893753 20955.874051 21165.047259 21626.929322 22140.725899 22892.682427 23674.041130 24130.547035
2 Belgium 450.000000 NaN 425.000000 NaN 875 NaN 975.625000 NaN 1144.000000 ... 19964.428266 20656.458570 20761.238278 21032.935511 21205.859281 21801.602508 22246.561977 22881.632810 23446.949672 23654.763464
3 Denmark 400.000000 NaN 400.000000 NaN 738.333 NaN 875.384615 NaN 1038.571429 ... 22254.890572 22975.162513 23059.374968 23082.620719 23088.582457 23492.664119 23972.564284 24680.492880 24995.245167 24620.568805
4 Finland 400.000000 NaN 400.000000 NaN 453.333 NaN 537.500000 NaN 637.500000 ... 18855.985066 19770.363126 20245.896529 20521.702225 20845.802738 21574.406196 22140.573208 23190.283543 24131.519569 24343.586318
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
190 Total Africa 472.352941 NaN 424.767802 NaN 413.71 NaN 422.071584 NaN 420.628684 ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474
191 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
192 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
193 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
194 World Average 466.752281 NaN 453.402162 NaN 566.389 NaN 595.783856 NaN 614.853602 ... 5833.255492 6037.675887 6131.705471 6261.734267 6469.119575 6738.281333 6960.031035 7238.383483 7467.648232 7613.922924

195 rows × 200 columns

Let's start by renaming the first column, which has the region/country names.


In [20]:
maddison_old_pop.rename(columns={'Unnamed: 0':'Country'}, inplace=True)
maddison_old_gdppc.rename(columns={'Unnamed: 0':'Country'}, inplace=True)

Now let's drop all the columns that do not have data.


In [21]:
maddison_old_pop = maddison_old_pop[[col for col in maddison_old_pop.columns if not str(col).startswith('Unnamed')]]
maddison_old_gdppc = maddison_old_gdppc[[col for col in maddison_old_gdppc.columns if not str(col).startswith('Unnamed')]]

Now, let's change the names of the columns so they reflect the underlying variable.


In [22]:
maddison_old_pop.columns = ['Country'] + ['pop_'+str(col) for col in maddison_old_pop.columns[1:]]
maddison_old_gdppc.columns = ['Country'] + ['gdppc_'+str(col) for col in maddison_old_gdppc.columns[1:]]
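
To see the three cleaning steps (rename, drop, prefix) in isolation, here is a sketch on a tiny made-up dataframe that mimics the layout of the Excel sheet. The names raw and clean and the values are for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up frame mimicking the layout read from the Excel sheet
raw = pd.DataFrame({'Unnamed: 0': ['Austria', 'Belgium'],
                    1: [500.0, 300.0],
                    'Unnamed: 2': [np.nan, np.nan],
                    1000: [700.0, 400.0]})

# 1. Rename the first column
clean = raw.rename(columns={'Unnamed: 0': 'Country'})
# 2. Keep only the columns whose name does not start with 'Unnamed'
clean = clean[[col for col in clean.columns if not str(col).startswith('Unnamed')]]
# 3. Prefix the year columns with the variable name
clean.columns = ['Country'] + ['pop_' + str(col) for col in clean.columns[1:]]

print(clean)
```

Note the str(col) in steps 2 and 3: the year columns are read as integers, so they have to be converted to strings before string methods or concatenation can be used on them.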

In [23]:
maddison_old_pop


Out[23]:
Country pop_1 pop_1000 pop_1500 pop_1600 pop_1700 pop_1820 pop_1821 pop_1822 pop_1823 ... pop_2001 pop_2002 pop_2003 pop_2004 pop_2005 pop_2006 pop_2007 pop_2008 pop_2009 pop_2030
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 500.0 700.0 2000.0 2500.0 2500.0 3369.0 3386.0 3402.0 3419.0 ... 8131.690 8148.312 8162.656 8174.762 8184.691 8192.880 8199.783 8205.533 8210 8120.000
2 Belgium 300.0 400.0 1400.0 1600.0 2000.0 3434.0 3464.0 3495.0 3526.0 ... 10291.679 10311.970 10330.824 10348.276 10364.388 10379.067 10392.226 10403.951 10414 10409.000
3 Denmark 180.0 360.0 600.0 650.0 700.0 1155.0 1167.0 1179.0 1196.0 ... 5355.826 5374.693 5394.138 5413.392 5432.335 5450.661 5468.120 5484.723 5501 5730.488
4 Finland 20.0 40.0 300.0 400.0 400.0 1169.0 1186.0 1202.0 1219.0 ... 5180.309 5193.039 5204.405 5214.512 5223.442 5231.372 5238.460 5244.749 5250 5201.445
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
273 Guadeloupe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 431.170 435.739 440.189 444.515 448.713 452.776 456.698 460.486 n.a. 523.493
274 Guyana (Fr.) NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 177.562 182.333 186.917 191.309 195.506 199.509 203.321 206.941 n.a. 272.781
275 Martinique NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 418.454 422.277 425.966 429.510 432.900 436.131 439.202 442.119 n.a. 486.714
276 Reunion NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 732.570 743.981 755.171 766.153 776.948 787.584 798.094 808.506 n.a. 1025.217
277 Total NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1759.756 1784.330 1808.243 1831.487 1854.067 1876.000 1897.315 1918.052 n.a. 2308.205

278 rows × 197 columns


In [24]:
maddison_old_gdppc


Out[24]:
Country gdppc_1 gdppc_1000 gdppc_1500 gdppc_1600 gdppc_1700 gdppc_1820 gdppc_1821 gdppc_1822 gdppc_1823 ... gdppc_1999 gdppc_2000 gdppc_2001 gdppc_2002 gdppc_2003 gdppc_2004 gdppc_2005 gdppc_2006 gdppc_2007 gdppc_2008
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 425.000000 425.000000 707 837.200000 993.200000 1218.165628 NaN NaN NaN ... 20065.093878 20691.415561 20812.893753 20955.874051 21165.047259 21626.929322 22140.725899 22892.682427 23674.041130 24130.547035
2 Belgium 450.000000 425.000000 875 975.625000 1144.000000 1318.870122 NaN NaN NaN ... 19964.428266 20656.458570 20761.238278 21032.935511 21205.859281 21801.602508 22246.561977 22881.632810 23446.949672 23654.763464
3 Denmark 400.000000 400.000000 738.333 875.384615 1038.571429 1273.593074 1320.479863 1326.547922 1307.692308 ... 22254.890572 22975.162513 23059.374968 23082.620719 23088.582457 23492.664119 23972.564284 24680.492880 24995.245167 24620.568805
4 Finland 400.000000 400.000000 453.333 537.500000 637.500000 781.009410 NaN NaN NaN ... 18855.985066 19770.363126 20245.896529 20521.702225 20845.802738 21574.406196 22140.573208 23190.283543 24131.519569 24343.586318
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
190 Total Africa 472.352941 424.767802 413.71 422.071584 420.628684 419.755914 NaN NaN NaN ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474
191 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
192 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
193 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
194 World Average 466.752281 453.402162 566.389 595.783856 614.853602 665.735330 NaN NaN NaN ... 5833.255492 6037.675887 6131.705471 6261.734267 6469.119575 6738.281333 6960.031035 7238.383483 7467.648232 7613.922924

195 rows × 195 columns

Let's choose the rows that hold the aggregates by region for the main regions of the world.


In [25]:
# Keep the rows whose label contains 'Total' (the regional aggregates)
gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.apply(lambda x: str(x).upper().find('TOTAL')!=-1)].reset_index(drop=True)
gdppc = gdppc.dropna(subset=['gdppc_1'])
gdppc = gdppc.loc[2:]
# Clean the region names; the digit pattern is a raw string and explicitly flagged as a regular expression
gdppc['Country'] = gdppc.Country.str.replace('Total', '').str.replace('Countries', '').str.replace(r'\d+', '', regex=True).str.replace('European', 'Europe').str.strip()
gdppc = gdppc.loc[gdppc.Country.apply(lambda x: x.find('USSR')==-1 and  x.find('West Asian')==-1)].reset_index(drop=True)
gdppc


Out[25]:
Country gdppc_1 gdppc_1000 gdppc_1500 gdppc_1600 gdppc_1700 gdppc_1820 gdppc_1821 gdppc_1822 gdppc_1823 ... gdppc_1999 gdppc_2000 gdppc_2001 gdppc_2002 gdppc_2003 gdppc_2004 gdppc_2005 gdppc_2006 gdppc_2007 gdppc_2008
0 Western Europe 576.167665 427.425665 771.094 887.906964 993.456911 1194.184683 NaN NaN NaN ... 18497.208533 19176.001655 19463.863297 19627.707522 19801.145425 20199.220700 20522.238008 21087.304789 21589.011346 21671.774225
1 Western Offshoots 400.000000 400.000000 400 400.000000 476.000000 1201.993477 NaN NaN NaN ... 26680.580823 27393.808035 27387.312035 27648.644070 28090.274362 28807.845958 29415.399334 29922.741918 30344.425293 30151.805880
2 East Europe 411.789474 400.000000 496 548.023599 606.010638 683.160984 NaN NaN NaN ... 5734.162109 5970.165085 6143.112873 6321.395376 6573.365882 6942.136596 7261.721015 7730.097570 8192.881904 8568.967581
3 Latin America 400.000000 400.000000 416.457 437.558140 526.639004 691.060678 NaN NaN NaN ... 5765.585093 5889.237351 5846.295193 5746.609672 5785.841237 6063.068969 6265.525702 6530.533583 6783.869986 6973.134656
4 Asia 455.671021 469.961665 568.418 573.550859 571.605276 580.626115 NaN NaN NaN ... 3623.902724 3797.608955 3927.186275 4121.275511 4388.982705 4661.517477 4900.563281 5187.253152 5408.383588 5611.198564
5 Africa 472.352941 424.767802 413.71 422.071584 420.628684 419.755914 NaN NaN NaN ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474

6 rows × 195 columns
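
The row selection and name cleaning can also be written with the vectorized string methods of pandas. The sketch below uses made-up labels in the style of the spreadsheet (they are not copied from the file) and is meant as an alternative to the apply/lambda version, assuming the same goal.

```python
import pandas as pd

# Made-up labels in the style of the spreadsheet's aggregate rows
names = pd.Series(['Austria', 'Total 30 Western Europe', None, 'Total Africa'])

# Case-insensitive match; na=False treats missing labels as non-matches
is_total = names.str.contains('total', case=False, na=False)
totals = names[is_total]

# Strip the word 'Total' and any digits, then trim the remaining whitespace
cleaned = (totals.str.replace('Total', '', regex=False)
                 .str.replace(r'\d+', '', regex=True)
                 .str.strip())

print(cleaned.tolist())
```

Stating regex=True or regex=False explicitly is worthwhile because the default of str.replace has changed across pandas versions.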

Let's drop the columns (years) in which any region has a missing value.


In [26]:
gdppc = gdppc.dropna(axis=1, how='any')
gdppc


Out[26]:
Country gdppc_1 gdppc_1000 gdppc_1500 gdppc_1600 gdppc_1700 gdppc_1820 gdppc_1870 gdppc_1900 gdppc_1913 ... gdppc_1999 gdppc_2000 gdppc_2001 gdppc_2002 gdppc_2003 gdppc_2004 gdppc_2005 gdppc_2006 gdppc_2007 gdppc_2008
0 Western Europe 576.167665 427.425665 771.094 887.906964 993.456911 1194.184683 1953.068150 2884.661525 3456.576178 ... 18497.208533 19176.001655 19463.863297 19627.707522 19801.145425 20199.220700 20522.238008 21087.304789 21589.011346 21671.774225
1 Western Offshoots 400.000000 400.000000 400 400.000000 476.000000 1201.993477 2419.152411 4014.870040 5232.816582 ... 26680.580823 27393.808035 27387.312035 27648.644070 28090.274362 28807.845958 29415.399334 29922.741918 30344.425293 30151.805880
2 East Europe 411.789474 400.000000 496 548.023599 606.010638 683.160984 936.628265 1437.944586 1694.879668 ... 5734.162109 5970.165085 6143.112873 6321.395376 6573.365882 6942.136596 7261.721015 7730.097570 8192.881904 8568.967581
3 Latin America 400.000000 400.000000 416.457 437.558140 526.639004 691.060678 676.005331 1113.071149 1494.431922 ... 5765.585093 5889.237351 5846.295193 5746.609672 5785.841237 6063.068969 6265.525702 6530.533583 6783.869986 6973.134656
4 Asia 455.671021 469.961665 568.418 573.550859 571.605276 580.626115 553.459947 637.615593 695.131881 ... 3623.902724 3797.608955 3927.186275 4121.275511 4388.982705 4661.517477 4900.563281 5187.253152 5408.383588 5611.198564
5 Africa 472.352941 424.767802 413.71 422.071584 420.628684 419.755914 500.011054 601.236364 637.433138 ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474

6 rows × 70 columns
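
A quick sketch of what the two arguments of dropna do, on a made-up dataframe with rounded values: axis=1 means we drop columns rather than rows, and how='any' means one missing entry is enough to drop the column.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'Country': ['Asia', 'Africa'],
                    'gdppc_1820': [580.6, 419.8],
                    'gdppc_1821': [np.nan, np.nan],
                    'gdppc_1870': [553.5, np.nan]})

# Drop a column if any of its entries is missing
balanced = toy.dropna(axis=1, how='any')

# Drop a column only if all of its entries are missing
partial = toy.dropna(axis=1, how='all')

print(balanced.columns.tolist())
print(partial.columns.tolist())
```

Using how='any' gives a balanced panel of years at the cost of discarding years for which only some regions have data.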

Let's convert from wide to long format.


In [27]:
gdppc = pd.wide_to_long(gdppc, ['gdppc_'], i='Country', j='year').reset_index()
gdppc


Out[27]:
Country year gdppc_
0 Western Europe 1 576.168
1 Western Offshoots 1 400
2 East Europe 1 411.789
3 Latin America 1 400
4 Asia 1 455.671
... ... ... ...
409 Western Offshoots 2008 30151.8
410 East Europe 2008 8568.97
411 Latin America 2008 6973.13
412 Asia 2008 5611.2
413 Africa 2008 1780.27

414 rows × 3 columns
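
To see what pd.wide_to_long does, here is a sketch on a tiny made-up dataframe. It splits each column name into a stub (gdppc_) and a numeric suffix (the year). For comparison, the same reshape is done with melt, which is my addition and requires parsing the year by hand.

```python
import pandas as pd

wide = pd.DataFrame({'Country': ['Asia', 'Africa'],
                     'gdppc_1820': [580.6, 419.8],
                     'gdppc_1870': [553.5, 500.0]})

# Columns named stub + number become rows; the number goes into the column 'year'
long = pd.wide_to_long(wide, ['gdppc_'], i='Country', j='year').reset_index()

# The same reshape with melt, extracting the year from the column name manually
melted = wide.melt(id_vars='Country', var_name='year', value_name='gdppc_')
melted['year'] = melted['year'].str.replace('gdppc_', '', regex=False).astype(int)

print(long)
print(melted)
```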

Plotting

We can now plot the data. Let's try two different ways. The first uses the plot function from pandas. The second uses the package seaborn, which improves on the capabilities of matplotlib. The main difference is how the data needs to be organized. Of course, these are not the only ways to plot and we can try others.


In [28]:
import matplotlib as mpl
import seaborn as sns
# Setup seaborn
sns.set()

Let's pivot the table so that each region is a column and each row is a year. This will allow us to plot using the plot function of the pandas DataFrame.


In [29]:
gdppc2 = gdppc.pivot_table(index='year',columns='Country',values='gdppc_',aggfunc='sum')
gdppc2


Out[29]:
Country Africa Asia East Europe Latin America Western Europe Western Offshoots
year
1 472.352941 455.671021 411.789474 400.000000 576.167665 400.000000
1000 424.767802 469.961665 400.000000 400.000000 427.425665 400.000000
1500 413.709504 568.417900 496.000000 416.457143 771.093805 400.000000
1600 422.071584 573.550859 548.023599 437.558140 887.906964 400.000000
1700 420.628684 571.605276 606.010638 526.639004 993.456911 476.000000
... ... ... ... ... ... ...
2004 1558.099461 4661.517477 6942.136596 6063.068969 20199.220700 28807.845958
2005 1603.686517 4900.563281 7261.721015 6265.525702 20522.238008 29415.399334
2006 1663.531318 5187.253152 7730.097570 6530.533583 21087.304789 29922.741918
2007 1724.226776 5408.383588 8192.881904 6783.869986 21589.011346 30344.425293
2008 1780.265474 5611.198564 8568.967581 6973.134656 21671.774225 30151.805880

69 rows × 6 columns
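
Here is the pivot step on a tiny made-up long dataframe, together with the kind of label-based slicing on the year index that the plotting code relies on. The names long, wide and recent are for illustration.

```python
import pandas as pd

long = pd.DataFrame({'Country': ['Asia', 'Africa', 'Asia', 'Africa'],
                     'year': [1820, 1820, 1870, 1870],
                     'gdppc_': [580.6, 419.8, 553.5, 500.0]})

# One row per year, one column per region
wide = long.pivot_table(index='year', columns='Country', values='gdppc_', aggfunc='sum')

# Label-based slicing on the sorted year index: all years from 1850 onwards
recent = wide.loc[1850:]

print(wide)
print(recent)
```

Since each region-year pair appears only once, the choice of aggfunc does not affect the values here; it only matters when there are duplicates to combine.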

Ok. Let's plot using the pandas plot function.


In [30]:
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())

# Set the size of the figure and get a figure and axis object
fig, ax = plt.subplots(figsize=(30,20))
# Plot using the axis ax and colormap my_cmap
gdppc2.loc[1800:].plot(ax=ax, linewidth=8, cmap=my_cmap)
# Change options of axes, legend
ax.tick_params(axis = 'both', which = 'major', labelsize=32)
ax.tick_params(axis = 'both', which = 'minor', labelsize=16)
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(prop={'size': 40}).set_title("Region", prop = {'size':40})
# Label axes
ax.set_xlabel('Year', fontsize=36)
ax.set_ylabel('GDP per capita (1990 Int\'l US$)', fontsize=36)


Out[30]:
Text(0, 0.5, "GDP per capita (1990 Int'l US$)")

In [31]:
fig


Out[31]:

Now, let's use seaborn


In [32]:
gdppc['Region'] = gdppc.Country.astype('category')
gdppc['gdppc_'] = gdppc.gdppc_.astype(float)
# Plot
fig, ax = plt.subplots(figsize=(30,20))
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[gdppc.year>=1800].reset_index(drop=True), alpha=1, lw=8, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=False)
ax.tick_params(axis = 'both', which = 'major', labelsize=32)
ax.tick_params(axis = 'both', which = 'minor', labelsize=16)
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year', fontsize=36)
ax.set_ylabel('GDP per capita (1990 Int\'l US$)', fontsize=36)


Out[32]:
Text(0, 0.5, "GDP per capita (1990 Int'l US$)")

In [33]:
fig


Out[33]:

Nice! Basically the same plot. But we can do better! Let's use seaborn again, but this time use different markers for each region, and let's use only a subset of the data so that it looks better. Also, let's export the figure so we can use it in our slides.


In [34]:
# Create category for hue
gdppc['Region'] = gdppc.Country.astype('category')
gdppc['gdppc_'] = gdppc.gdppc_.astype(float)

sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1800) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1820-2010.pdf', dpi=300, bbox_inches='tight')
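
The long list of years excluded in the plot above follows a simple rule: it contains every year from 1951 to 2007 that is not a multiple of 10. The sketch below builds the list from that rule and checks it against the explicit list; skip_years is a hypothetical name.

```python
# Years between 1951 and 2007 that are not multiples of 10
skip_years = [year for year in range(1951, 2008) if year % 10 != 0]

# The explicit list used in the plotting cell, for comparison
explicit = [
    1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
    1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
    1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
    1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
    1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
    2006, 2007]

print(skip_years == explicit)  # prints True
```

With this list, the row condition in the plot could be written as ~gdppc.year.isin(skip_years) instead of the apply/lambda version, which should select the same rows.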



In [35]:
fig


Out[35]:

Let's create the same plot using the updated data from the Maddison Project. Here we have fewer years, but the picture is similar.


In [36]:
maddison_new_region['Region'] = maddison_new_region.region_name

mycolors2 = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71", "orange", "b"]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='cgdppc', hue='Region', data=maddison_new_region.loc[(maddison_new_region.year.apply(lambda x: x in [1870, 1890, 1913, 1929,1950, 2016])) | ((maddison_new_region.year>1950) & (maddison_new_region.year.apply(lambda x: np.mod(x,10)==0)))], alpha=1, palette=sns.color_palette(mycolors2), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (2011 Int\'l US$)')
plt.savefig(pathgraphs + 'y1870-2016.pdf', dpi=300, bbox_inches='tight')



In [37]:
fig


Out[37]:

Let's show the evolution starting from other periods.


In [38]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1700) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'take-off-1700-2010.pdf', dpi=300, bbox_inches='tight')



In [39]:
fig


Out[39]:

In [40]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1500) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1500-2010.pdf', dpi=300, bbox_inches='tight')



In [41]:
fig


Out[41]:

In [42]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1000) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1000-2010.pdf', dpi=300, bbox_inches='tight')



In [43]:
fig


Out[43]:

In [44]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=0) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1-2010.pdf', dpi=300, bbox_inches='tight')



In [45]:
fig


Out[45]:

Let's plot the evolution of GDP per capita for the whole world.


In [46]:
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country=='World Average']
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = world_gdppc.Country.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)

sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=world_gdppc.loc[(world_gdppc.year>=0) & (world_gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'W-y1-2010.pdf', dpi=300, bbox_inches='tight')



In [47]:
fig


Out[47]:

Let's plot $\log(GDPpc)$ during the modern era, when we have sustained economic growth.
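
One reason to look at logs: if income grows at a constant rate $g$, so that $y_t = y_0(1+g)^t$, then $\log y_t = \log y_0 + t\log(1+g)$ is a straight line in $t$ with slope $\log(1+g) \approx g$. The short check below illustrates this with made-up numbers (`g` and `y0` are hypothetical values, not taken from the Maddison data).

```python
import numpy as np

g = 0.02                 # hypothetical constant annual growth rate
y0 = 1000.0              # hypothetical initial income level
t = np.arange(0, 51)     # years 0, 1, ..., 50
y = y0 * (1 + g) ** t    # income path under constant growth

log_y = np.log(y)
slopes = np.diff(log_y)  # year-to-year change in log income

# Under constant growth every slope equals log(1 + g)
print(np.allclose(slopes, np.log(1 + g)))
```

So on a log plot, steeper segments mean faster growth, and a straight line means a constant growth rate.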


In [48]:
gdppc['lgdppc'] = np.log(gdppc.gdppc_)

# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())

sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='lgdppc', hue='Region', data=gdppc.loc[(gdppc.year>=1950)].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(loc='upper left')
ax.set_xlabel('Year')
ax.set_ylabel('Log[GDP per capita (1990 Int\'l US$)]')
plt.savefig(pathgraphs + 'sg1950-2000.pdf', dpi=300, bbox_inches='tight')



In [49]:
fig


Out[49]:

In [50]:
mycolors2 = ["#34495e", "#2ecc71"]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='cgdppc', hue='Region', data=maddison_new_region.loc[(maddison_new_region.year>=1870) & (maddison_new_region.region.apply(lambda x: x in ['we', 'wo']))], alpha=1, palette=sns.color_palette(mycolors2), style='Region', dashes=False, markers=['D', '^'],)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.set_yscale('log')
ax.set_yticks([500, 5000, 50000])
# Show plain numbers instead of powers of ten on the log-scaled axis
ax.yaxis.set_major_formatter(mpl.ticker.ScalarFormatter())
ax.legend(loc='upper left')
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (2011 Int\'l US$, log-scale)')
plt.savefig(pathgraphs + 'sg1870-2000.pdf', dpi=300, bbox_inches='tight')


Growth Rates

Let's select a subsample of periods between 1 CE and 2008 and compute the average annual growth rate of world income per capita. We select the years we want with the loc operator and then use the shift operator to get the data from the previous observation.
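
In formula terms, the average annual growth rate between two observations $y_s$ and $y_t$ with $s < t$ is $g = (y_t / y_s)^{1/(t-s)} - 1$. Here is a minimal sketch of the `shift` mechanics, using only the 1820 and 2008 world values that appear in the output tables further down:

```python
import pandas as pd

# World GDP per capita in 1820 and 2008, as displayed in the notebook output
s = pd.Series([665.735330, 7613.922924], index=[1820, 2008])

prev = s.shift(1)                      # value of the previous observation
span = s.index.to_series().diff()      # years elapsed since the previous observation
growth = (s / prev) ** (1 / span) - 1  # average annual growth rate

print(growth)
```

The first entry is `NaN` because the first observation has no predecessor; the second is the annualized rate for 1820-2008.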


In [51]:
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 2008]).astype(int)
world_gdppc


Out[51]:
Country year gdppc_ Region mysample
0 World Average 1 466.752281 World Average 1
1 World Average 1000 453.402162 World Average 1
2 World Average 1500 566.389464 World Average 1
3 World Average 1600 595.783856 World Average 0
4 World Average 1700 614.853602 World Average 0
... ... ... ... ... ...
189 World Average 2004 6738.281333 World Average 0
190 World Average 2005 6960.031035 World Average 0
191 World Average 2006 7238.383483 World Average 0
192 World Average 2007 7467.648232 World Average 0
193 World Average 2008 7613.922924 World Average 1

69 rows × 5 columns


In [52]:
maddison_growth = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth['year_prev'] = maddison_growth['year'] - maddison_growth['year'].shift(1)
maddison_growth['growth'] = ((maddison_growth['gdppc_'] / maddison_growth['gdppc_'].shift(1)) ** (1/ maddison_growth.year_prev) -1)
maddison_growth['Period'] = maddison_growth['year'].astype(str).shift(1) + '-' + maddison_growth['year'].astype(str)
maddison_growth


Out[52]:
Country year gdppc_ Region mysample year_prev growth Period
0 World Average 1 466.752281 World Average 1 NaN NaN NaN
1 World Average 1000 453.402162 World Average 1 999.0 -0.000029 1-1000
2 World Average 1500 566.389464 World Average 1 500.0 0.000445 1000-1500
3 World Average 1820 665.735330 World Average 1 320.0 0.000505 1500-1820
4 World Average 2008 7613.922924 World Average 1 188.0 0.013046 1820-2008

In [53]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues", maddison_growth.shape[0]+4)[4:])
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
#handles, labels = ax.get_legend_handles_labels()
#ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate of Income per capita')
plt.savefig(pathgraphs + 'W-g1-2010.pdf', dpi=300, bbox_inches='tight')



In [54]:
fig


Out[54]:

Growth of population and income (by regions)


In [55]:
# Growth rates gdppc
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country=='World Average']
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = 'World'
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)
print(maddison_growth_gdppc)


         Country  year       gdppc_ Region  mysample  year_prev    growth     Period
0  World Average     1   466.752281  World         1        NaN       NaN        NaN
1  World Average  1000   453.402162  World         1      999.0 -0.000029     1-1000
2  World Average  1500   566.389464  World         1      500.0  0.000445  1000-1500
3  World Average  1820   665.735330  World         1      320.0  0.000505  1500-1820
4  World Average  1913  1524.430799  World         1       93.0  0.008948  1820-1913

In [56]:
# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country=='World Total']
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = 'World'
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)
print(maddison_growth_pop)


       Country  year          pop_ Region  mysample  year_prev    growth     Period
0  World Total     1  2.258200e+05  World         1        NaN       NaN        NaN
1  World Total  1000  2.673300e+05  World         1      999.0  0.000169     1-1000
2  World Total  1500  4.384280e+05  World         1      500.0  0.000990  1000-1500
3  World Total  1820  1.041708e+06  World         1      320.0  0.002708  1500-1820
4  World Total  1913  1.792925e+06  World         1       93.0  0.005856  1820-1913

In [57]:
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'GDPpc', 'growth_pop':'Population'})
maddison_growth


Out[57]:
Region Period GDPpc Population
1 World 1-1000 -0.000029 0.000169
2 World 1000-1500 0.000445 0.000990
3 World 1500-1820 0.000505 0.002708
4 World 1820-1913 0.008948 0.005856

In [58]:
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 
maddison_growth


Out[58]:
Region Period variable growth
0 World 1-1000 Income per capita -0.000029
1 World 1000-1500 Income per capita 0.000445
2 World 1500-1820 Income per capita 0.000505
3 World 1820-1913 Income per capita 0.008948
4 World 1-1000 Population 0.000169
5 World 1000-1500 Population 0.000990
6 World 1500-1820 Population 0.002708
7 World 1820-1913 Population 0.005856

In [59]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + 'W-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [60]:
fig


Out[60]:
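
The world-level steps above (select benchmark years, annualize growth, merge, melt) are repeated almost verbatim for each region in the cells that follow. As a sketch of how the repetition could be wrapped into one function, assuming a long-format frame with `Region`, `year` and a value column: `benchmark_growth` is a hypothetical helper, not part of the notebook, and the small frames are rebuilt from the values printed in the tables above.

```python
import pandas as pd

def benchmark_growth(df, value_col, years):
    """Annualized growth of value_col between consecutive benchmark years.

    Hypothetical helper: df must contain 'Region', 'year' and value_col.
    """
    out = df.loc[df['year'].isin(years)].sort_values('year').reset_index(drop=True)
    span = out['year'].diff()  # years between consecutive benchmarks
    out['growth'] = (out[value_col] / out[value_col].shift(1)) ** (1 / span) - 1
    out['Period'] = out['year'].astype(str).shift(1) + '-' + out['year'].astype(str)
    return out[['Region', 'Period', 'growth']].dropna()

# World values copied from the printed tables above (population as displayed)
years = [1, 1000, 1500, 1820, 1913]
gdppc_long = pd.DataFrame({'Region': 'World', 'year': years,
                           'gdppc_': [466.752281, 453.402162, 566.389464,
                                      665.735330, 1524.430799]})
pop_long = pd.DataFrame({'Region': 'World', 'year': years,
                         'pop_': [225820.0, 267330.0, 438428.0,
                                  1041708.0, 1792925.0]})

# Merge the two growth tables and reshape to long format for seaborn
growth = benchmark_growth(gdppc_long, 'gdppc_', years).merge(
    benchmark_growth(pop_long, 'pop_', years),
    on=['Region', 'Period'], suffixes=('_gdppc', '_pop'))
growth = growth.rename(columns={'growth_gdppc': 'Income per capita',
                                'growth_pop': 'Population'})
growth_long = growth.melt(id_vars=['Region', 'Period'],
                          var_name='variable', value_name='growth')
print(growth_long)
```

With a helper like this, each regional cell would reduce to selecting the region's rows, calling the function twice, and plotting.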

In [61]:
# Growth rates gdppc
myregion = 'Western Offshoots'
fname = 'WO'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

# Annualized population growth between benchmark years
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge income and population growth rates, then reshape to long format
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Figure style
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [62]:
fig


Out[62]:

In [63]:
# Growth rates gdppc
myregion = 'Western Europe'
fname = 'WE'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total 30  '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total 30  '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

# Annualized population growth between benchmark years
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge income and population growth rates, then reshape to long format
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Figure style
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [64]:
fig


Out[64]:

In [65]:
# Growth rates gdppc
myregion = 'Latin America'
fname = 'LA'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

# Annualized population growth between benchmark years
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge income and population growth rates, then reshape to long format
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Figure style
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [66]:
fig


Out[66]:

In [67]:
# Growth rates gdppc
myregion = 'Asia'
fname = 'AS'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

# Annualized population growth between benchmark years
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge income and population growth rates, then reshape to long format
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Figure style
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [68]:
fig


Out[68]:

In [69]:
# Growth rates gdppc
myregion = 'Africa'
fname = 'AF'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

# Population growth between benchmark years
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge income and population growth
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [70]:
fig


Out[70]:
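The growth rates in the bar charts above are average annual (compound) rates between benchmark years: for a variable y observed in years s < t, the rate is (y_t / y_s)^(1/(t-s)) - 1. Here is a minimal sketch of that calculation on a toy dataframe. The two Africa values are copied from the regional GDP per capita table shown later in this notebook, and the names `toy` and `span` are only illustrative.

```python
import pandas as pd

# Toy benchmark observations (Africa, GDP per capita in 1820 and 1913)
toy = pd.DataFrame({'year': [1820, 1913],
                    'gdppc_': [419.755914, 637.433138]})

# Number of years between consecutive benchmark observations
span = toy['year'] - toy['year'].shift(1)
# Average annual compound growth rate over each span
toy['growth'] = (toy['gdppc_'] / toy['gdppc_'].shift(1)) ** (1 / span) - 1
# Label each span, e.g. '1820-1913'
toy['Period'] = toy['year'].astype(str).shift(1) + '-' + toy['year'].astype(str)

print(toy[['Period', 'growth']])
```

The first row is missing because it has no earlier benchmark to compare with, which is why the cells above call `dropna()` after merging.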

Comparing richest to poorest region across time

Let's create a table that shows the GDP per capita levels for the 6 regions in the original data and compute the ratio of richest to poorest. Let's also plot it.


In [71]:
gdppc2['Richest-Poorest Ratio'] = gdppc2.max(axis=1) / gdppc2.min(axis=1)
# Keep the benchmark years and turn the year index into a column
gdp_ratio = gdppc2.loc[[1, 1000, 1500, 1700, 1820, 1870, 1913, 1940, 1960, 1980, 2000, 2008]].reset_index()
gdp_ratio['Region'] = 'Richest-Poorest'
gdp_ratio['Region'] = gdp_ratio.Region.astype('category')

In [72]:
gdp_ratio


Out[72]:
Country year Africa Asia East Europe Latin America Western Europe Western Offshoots Richest-Poorest Ratio Region
0 1 472.352941 455.671021 411.789474 400.000000 576.167665 400.000000 1.440419 Richest-Poorest
1 1000 424.767802 469.961665 400.000000 400.000000 427.425665 400.000000 1.174904 Richest-Poorest
2 1500 413.709504 568.417900 496.000000 416.457143 771.093805 400.000000 1.927735 Richest-Poorest
3 1700 420.628684 571.605276 606.010638 526.639004 993.456911 476.000000 2.361838 Richest-Poorest
4 1820 419.755914 580.626115 683.160984 691.060678 1194.184683 1201.993477 2.863553 Richest-Poorest
5 1870 500.011054 553.459947 936.628265 676.005331 1953.068150 2419.152411 4.838198 Richest-Poorest
6 1913 637.433138 695.131881 1694.879668 1494.431922 3456.576178 5232.816582 8.209201 Richest-Poorest
7 1940 813.374613 893.992784 1968.706774 1932.850716 4554.045082 6837.844866 8.406760 Richest-Poorest
8 1960 1055.114678 1025.743131 3069.750386 3135.517072 6879.294331 10961.082848 10.685992 Richest-Poorest
9 1980 1514.558119 2028.654705 5785.933433 5437.924365 13154.033928 18060.162963 11.924378 Richest-Poorest
10 2000 1447.071701 3797.608955 5970.165085 5889.237351 19176.001655 27393.808035 18.930512 Richest-Poorest
11 2008 1780.265474 5611.198564 8568.967581 6973.134656 21671.774225 30151.805880 16.936691 Richest-Poorest
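The ratio in the last numeric column is the row-wise maximum divided by the row-wise minimum. Here is a minimal sketch on three of the regions, with values copied from the table above. Restricting the calculation to an explicit list of region columns (the list `regions` is my own addition, not part of the original cell) avoids including the ratio column itself if the cell is run a second time.

```python
import pandas as pd

# Two benchmark years and three regions, values copied from the table above
levels = pd.DataFrame(
    {'Africa': [472.352941, 1780.265474],
     'Western Europe': [576.167665, 21671.774225],
     'Western Offshoots': [400.000000, 30151.805880]},
    index=pd.Index([1, 2008], name='year'))

# Use only the region columns so that an existing ratio column
# can never enter the max/min when the cell is re-run
regions = ['Africa', 'Western Europe', 'Western Offshoots']
levels['Richest-Poorest Ratio'] = levels[regions].max(axis=1) / levels[regions].min(axis=1)

print(levels['Richest-Poorest Ratio'].round(2))
```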

In [73]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Richest-Poorest Ratio', data=gdp_ratio, alpha=1, hue='Region', style='Region', dashes=False, markers=True, )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Richest-Poorest Ratio')
plt.savefig(pathgraphs + 'Richest-Poorest-Ratio.pdf', dpi=300, bbox_inches='tight')



In [74]:
fig


Out[74]:

Visualize as Table


In [75]:
gdp_ratio.style.format({
    1: '{:,.1f}'.format, 1000: '{:,.1f}'.format, 1500: '{:,.1f}'.format, 1700: '{:,.1f}'.format, 
    1820: '{:,.1f}'.format, 1870: '{:,.1f}'.format, 1913: '{:,.1f}'.format, 1940: '{:,.1f}'.format, 
    1960: '{:,.1f}'.format, 1980: '{:,.1f}'.format, 2000: '{:,.1f}'.format, 2008: '{:,.1f}'.format, 
})


Out[75]:
Country year Africa Asia East Europe Latin America Western Europe Western Offshoots Richest-Poorest Ratio Region
0 1 472.352941 455.671021 411.789474 400.000000 576.167665 400.000000 1.440419 Richest-Poorest
1 1000 424.767802 469.961665 400.000000 400.000000 427.425665 400.000000 1.174904 Richest-Poorest
2 1500 413.709504 568.417900 496.000000 416.457143 771.093805 400.000000 1.927735 Richest-Poorest
3 1700 420.628684 571.605276 606.010638 526.639004 993.456911 476.000000 2.361838 Richest-Poorest
4 1820 419.755914 580.626115 683.160984 691.060678 1194.184683 1201.993477 2.863553 Richest-Poorest
5 1870 500.011054 553.459947 936.628265 676.005331 1953.068150 2419.152411 4.838198 Richest-Poorest
6 1913 637.433138 695.131881 1694.879668 1494.431922 3456.576178 5232.816582 8.209201 Richest-Poorest
7 1940 813.374613 893.992784 1968.706774 1932.850716 4554.045082 6837.844866 8.406760 Richest-Poorest
8 1960 1055.114678 1025.743131 3069.750386 3135.517072 6879.294331 10961.082848 10.685992 Richest-Poorest
9 1980 1514.558119 2028.654705 5785.933433 5437.924365 13154.033928 18060.162963 11.924378 Richest-Poorest
10 2000 1447.071701 3797.608955 5970.165085 5889.237351 19176.001655 27393.808035 18.930512 Richest-Poorest
11 2008 1780.265474 5611.198564 8568.967581 6973.134656 21671.774225 30151.805880 16.936691 Richest-Poorest

Export table to LaTeX

Let's print the table as LaTeX code that can be copied and pasted in our slides or paper.


In [76]:
print(gdp_ratio.to_latex(formatters={
    1: '{:,.1f}'.format, 1000: '{:,.1f}'.format, 1500: '{:,.1f}'.format, 1700: '{:,.1f}'.format, 
    1820: '{:,.1f}'.format, 1870: '{:,.1f}'.format, 1913: '{:,.1f}'.format, 1940: '{:,.1f}'.format, 
    1960: '{:,.1f}'.format, 1980: '{:,.1f}'.format, 2000: '{:,.1f}'.format, 2008: '{:,.1f}'.format, 
}))


\begin{tabular}{lrrrrrrrrl}
\toprule
Country &  year &       Africa &         Asia &  East Europe &  Latin America &  Western Europe &  Western Offshoots &  Richest-Poorest Ratio &           Region \\
\midrule
0  &     1 &   472.352941 &   455.671021 &   411.789474 &     400.000000 &      576.167665 &         400.000000 &               1.440419 &  Richest-Poorest \\
1  &  1000 &   424.767802 &   469.961665 &   400.000000 &     400.000000 &      427.425665 &         400.000000 &               1.174904 &  Richest-Poorest \\
2  &  1500 &   413.709504 &   568.417900 &   496.000000 &     416.457143 &      771.093805 &         400.000000 &               1.927735 &  Richest-Poorest \\
3  &  1700 &   420.628684 &   571.605276 &   606.010638 &     526.639004 &      993.456911 &         476.000000 &               2.361838 &  Richest-Poorest \\
4  &  1820 &   419.755914 &   580.626115 &   683.160984 &     691.060678 &     1194.184683 &        1201.993477 &               2.863553 &  Richest-Poorest \\
5  &  1870 &   500.011054 &   553.459947 &   936.628265 &     676.005331 &     1953.068150 &        2419.152411 &               4.838198 &  Richest-Poorest \\
6  &  1913 &   637.433138 &   695.131881 &  1694.879668 &    1494.431922 &     3456.576178 &        5232.816582 &               8.209201 &  Richest-Poorest \\
7  &  1940 &   813.374613 &   893.992784 &  1968.706774 &    1932.850716 &     4554.045082 &        6837.844866 &               8.406760 &  Richest-Poorest \\
8  &  1960 &  1055.114678 &  1025.743131 &  3069.750386 &    3135.517072 &     6879.294331 &       10961.082848 &              10.685992 &  Richest-Poorest \\
9  &  1980 &  1514.558119 &  2028.654705 &  5785.933433 &    5437.924365 &    13154.033928 &       18060.162963 &              11.924378 &  Richest-Poorest \\
10 &  2000 &  1447.071701 &  3797.608955 &  5970.165085 &    5889.237351 &    19176.001655 &       27393.808035 &              18.930512 &  Richest-Poorest \\
11 &  2008 &  1780.265474 &  5611.198564 &  8568.967581 &    6973.134656 &    21671.774225 &       30151.805880 &              16.936691 &  Richest-Poorest \\
\bottomrule
\end{tabular}


In [77]:
%%latex
\begin{tabular}{lrrrrrrrrrrrr}
\toprule
year &  1    &  1000 &  1500 &  1700 &    1820 &    1870 &    1913 &    1940 &     1960 &     1980 &     2000 &     2008 \\
Country               &       &       &       &       &         &         &         &         &          &          &          &          \\
\midrule
Africa                & 472.4 & 424.8 & 413.7 & 420.6 &   419.8 &   500.0 &   637.4 &   813.4 &  1,055.1 &  1,514.6 &  1,447.1 &  1,780.3 \\
Asia                  & 455.7 & 470.0 & 568.4 & 571.6 &   580.6 &   553.5 &   695.1 &   894.0 &  1,025.7 &  2,028.7 &  3,797.6 &  5,611.2 \\
East Europe           & 411.8 & 400.0 & 496.0 & 606.0 &   683.2 &   936.6 & 1,694.9 & 1,968.7 &  3,069.8 &  5,785.9 &  5,970.2 &  8,569.0 \\
Latin America         & 400.0 & 400.0 & 416.5 & 526.6 &   691.1 &   676.0 & 1,494.4 & 1,932.9 &  3,135.5 &  5,437.9 &  5,889.2 &  6,973.1 \\
Western Europe        & 576.2 & 427.4 & 771.1 & 993.5 & 1,194.2 & 1,953.1 & 3,456.6 & 4,554.0 &  6,879.3 & 13,154.0 & 19,176.0 & 21,671.8 \\
Western Offshoots     & 400.0 & 400.0 & 400.0 & 476.0 & 1,202.0 & 2,419.2 & 5,232.8 & 6,837.8 & 10,961.1 & 18,060.2 & 27,393.8 & 30,151.8 \\
Richest-Poorest Ratio &   1.4 &   1.2 &   1.9 &   2.4 &     2.9 &     4.8 &     8.2 &     8.4 &     10.7 &     11.9 &     18.9 &     16.9 \\
\bottomrule
\end{tabular}



Export Table to HTML


In [78]:
from IPython.display import display, HTML
display(HTML(gdp_ratio.to_html(formatters={
    1: '{:,.1f}'.format, 1000: '{:,.1f}'.format, 1500: '{:,.1f}'.format, 1700: '{:,.1f}'.format, 
    1820: '{:,.1f}'.format, 1870: '{:,.1f}'.format, 1913: '{:,.1f}'.format, 1940: '{:,.1f}'.format, 
    1960: '{:,.1f}'.format, 1980: '{:,.1f}'.format, 2000: '{:,.1f}'.format, 2008: '{:,.1f}'.format, 
})))


Country year Africa Asia East Europe Latin America Western Europe Western Offshoots Richest-Poorest Ratio Region
0 1 472.352941 455.671021 411.789474 400.000000 576.167665 400.000000 1.440419 Richest-Poorest
1 1000 424.767802 469.961665 400.000000 400.000000 427.425665 400.000000 1.174904 Richest-Poorest
2 1500 413.709504 568.417900 496.000000 416.457143 771.093805 400.000000 1.927735 Richest-Poorest
3 1700 420.628684 571.605276 606.010638 526.639004 993.456911 476.000000 2.361838 Richest-Poorest
4 1820 419.755914 580.626115 683.160984 691.060678 1194.184683 1201.993477 2.863553 Richest-Poorest
5 1870 500.011054 553.459947 936.628265 676.005331 1953.068150 2419.152411 4.838198 Richest-Poorest
6 1913 637.433138 695.131881 1694.879668 1494.431922 3456.576178 5232.816582 8.209201 Richest-Poorest
7 1940 813.374613 893.992784 1968.706774 1932.850716 4554.045082 6837.844866 8.406760 Richest-Poorest
8 1960 1055.114678 1025.743131 3069.750386 3135.517072 6879.294331 10961.082848 10.685992 Richest-Poorest
9 1980 1514.558119 2028.654705 5785.933433 5437.924365 13154.033928 18060.162963 11.924378 Richest-Poorest
10 2000 1447.071701 3797.608955 5970.165085 5889.237351 19176.001655 27393.808035 18.930512 Richest-Poorest
11 2008 1780.265474 5611.198564 8568.967581 6973.134656 21671.774225 30151.805880 16.936691 Richest-Poorest
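Note that the printed tables still show six decimals. That is consistent with the formatter dictionaries being keyed by year while the columns of `gdp_ratio` are the regions: a dictionary of formatters is matched against column labels. Here is a minimal sketch, on a small table with the same layout (values copied from above), of building the dictionary from the column names instead. The names `tab` and `one_decimal` are only illustrative; as far as I know the same keying by column label applies to `to_latex` and `style.format`.

```python
import pandas as pd

# Small table with the same layout as gdp_ratio (values copied from above)
tab = pd.DataFrame({'year': [1, 2008],
                    'Africa': [472.352941, 1780.265474],
                    'Western Offshoots': [400.000000, 30151.805880],
                    'Richest-Poorest Ratio': [1.440419, 16.936691]})

# Formatter keys are column labels, so build one entry per numeric column
one_decimal = '{:,.1f}'.format
formatters = {col: one_decimal for col in tab.columns if col != 'year'}

html = tab.to_html(formatters=formatters, index=False)
print(html)
```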

Take-off, industrialization and reversals

Industrialization per capita

Let's create a full dataframe by entering the data by hand. The data are taken from Bairoch, P. (1982), "International industrialization levels from 1750 to 1980", Journal of European Economic History, 11(2), p. 269. For 1750-1913 the data come from Table 9.


In [79]:
industrialization = [['Developed Countries', 8, 8, 11, 16, 24, 35, 55],
                     ['Europe', 8, 8, 11, 17, 23, 33, 45],
                     ['Austria-Hungary', 7, 7, 8, 11, 15, 23, 32],
                     ['Belgium', 9, 10, 14, 28, 43, 56, 88],
                     ['France', 9, 9, 12, 20, 28, 39, 59],
                     ['Germany', 8, 8, 9, 15, 25, 52, 85],
                     ['Italy', 8, 8, 8, 10, 12, 17, 26],
                     ['Russia', 6, 6, 7, 8, 10, 15, 20],
                     ['Spain', 7, 7, 8, 11, 14, 19, 22],
                     ['Sweden', 7, 8, 9, 15, 24, 41, 67],
                     ['Switzerland', 7, 10, 16, 26, 39, 67, 87],
                     ['United Kingdom', 10, 16, 25, 64, 87, 100, 115],
                     ['Canada', np.nan, 5, 6, 7, 10, 24, 46],
                     ['United States', 4, 9, 14, 21, 38, 69, 126],
                     ['Japan', 7, 7, 7, 7, 9, 12, 20],
                     ['Third World', 7, 6, 6, 4, 3, 2, 2],
                     ['China', 8, 6, 6, 4, 4, 3, 3],
                     ['India', 7, 6, 6, 3, 2, 1, 2],
                     ['Brazil', np.nan, np.nan, np.nan, 4, 4, 5, 7],
                     ['Mexico', np.nan, np.nan, np.nan, 5, 4, 5, 7],
                     ['World', 7, 6, 7, 7, 9, 14, 21]]

years = [1750, 1800, 1830, 1860, 1880, 1900, 1913]
industrialization = pd.DataFrame(industrialization, columns=['Country'] + ['y'+str(y) for y in years])

For 1913-1980 the data come from Table 12.


In [80]:
industrialization2 = [['Developed Countries', 55, 71, 81, 135, 194, 315, 344],
                      ['Market Economies', np.nan, 96, 105, 167, 222, 362, 387],
                      ['Europe', 45, 76, 94, 107, 166, 260, 280],
                      ['Belgium', 88, 116, 89, 117, 183, 291, 316],
                      ['France', 59, 82, 73, 95, 167, 259, 277],
                      ['Germany', 85, 101, 128, 144, 244, 366, 395],
                      ['Italy', 26, 39, 44, 61, 121, 194, 231],
                      ['Spain', 22, 28, 23, 31, 56, 144, 159],
                      ['Sweden', 67, 84, 135, 163, 262, 405, 409],
                      ['Switzerland', 87, 90, 88, 167, 259, 366, 354],
                      ['United Kingdom', 115, 122, 157, 210, 253, 341, 325],
                      ['Canada', 46, 82, 84, 185, 237, 370, 379],
                      ['United States', 126, 182, 167, 354, 393, 604, 629],
                      ['Japan', 20, 30, 51, 40, 113, 310, 353],
                      ['U.S.S.R.', 20, 20, 38, 73, 139, 222, 252],
                      ['Third World', 2, 3, 4, 5, 8, 14, 17],
                      ['India', 2, 3, 4, 6, 8, 14, 16],
                      ['Brazil', 7, 10, 10, 13, 23, 42, 55],
                      ['Mexico', 7, 9, 8, 12, 22, 36, 41],
                      ['China', 3, 4, 4, 5, 10, 18, 24],
                      ['World', 21, 28, 31 ,48, 66, 100, 103]]
years = [1913, 1928, 1938, 1953, 1963, 1973, 1980]
industrialization2 = pd.DataFrame(industrialization2, columns=['Country'] + ['y'+str(y) for y in years])

Let's join both dataframes so we can plot the whole series.


In [81]:
industrialization = industrialization.merge(industrialization2)
industrialization


Out[81]:
Country y1750 y1800 y1830 y1860 y1880 y1900 y1913 y1928 y1938 y1953 y1963 y1973 y1980
0 Developed Countries 8.0 8.0 11.0 16 24 35 55 71 81 135 194 315 344
1 Europe 8.0 8.0 11.0 17 23 33 45 76 94 107 166 260 280
2 Belgium 9.0 10.0 14.0 28 43 56 88 116 89 117 183 291 316
3 France 9.0 9.0 12.0 20 28 39 59 82 73 95 167 259 277
4 Germany 8.0 8.0 9.0 15 25 52 85 101 128 144 244 366 395
5 Italy 8.0 8.0 8.0 10 12 17 26 39 44 61 121 194 231
6 Spain 7.0 7.0 8.0 11 14 19 22 28 23 31 56 144 159
7 Sweden 7.0 8.0 9.0 15 24 41 67 84 135 163 262 405 409
8 Switzerland 7.0 10.0 16.0 26 39 67 87 90 88 167 259 366 354
9 United Kingdom 10.0 16.0 25.0 64 87 100 115 122 157 210 253 341 325
10 Canada NaN 5.0 6.0 7 10 24 46 82 84 185 237 370 379
11 United States 4.0 9.0 14.0 21 38 69 126 182 167 354 393 604 629
12 Japan 7.0 7.0 7.0 7 9 12 20 30 51 40 113 310 353
13 Third World 7.0 6.0 6.0 4 3 2 2 3 4 5 8 14 17
14 China 8.0 6.0 6.0 4 4 3 3 4 4 5 10 18 24
15 India 7.0 6.0 6.0 3 2 1 2 3 4 6 8 14 16
16 Brazil NaN NaN NaN 4 4 5 7 10 10 13 23 42 55
17 Mexico NaN NaN NaN 5 4 5 7 9 8 12 22 36 41
18 World 7.0 6.0 7.0 7 9 14 21 28 31 48 66 100 103
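Notice that the merged table has fewer rows than either input: Austria-Hungary, Russia, Market Economies and U.S.S.R. are gone. Called without arguments, `merge` performs an inner join on every column the two dataframes share (here `Country` and `y1913`), so rows without a partner are dropped. Here is a minimal sketch with a few values from the two tables. The names `early` and `late` are only illustrative, and the outer join is shown as one possible way to inspect what was lost, not as what the notebook does.

```python
import pandas as pd

early = pd.DataFrame({'Country': ['Europe', 'Russia'],
                      'y1900': [33, 15],
                      'y1913': [45, 20]})
late = pd.DataFrame({'Country': ['Europe', 'U.S.S.R.'],
                     'y1913': [45, 20],
                     'y1928': [76, 20]})

# Default: inner join on every shared column ('Country' and 'y1913')
inner = early.merge(late)

# Outer join keeps unmatched rows; indicator=True records where each row came from
outer = early.merge(late, how='outer', indicator=True)

print(inner)
print(outer[['Country', '_merge']])
```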

Let's convert to long format and plot the evolution of industrialization across regions and groups of countries.


In [82]:
industrialization = pd.wide_to_long(industrialization, ['y'], i='Country', j='year').reset_index()
industrialization.rename(columns={'y':'Industrialization'}, inplace=True)
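To see what `pd.wide_to_long` does here, consider a small sketch with two countries and two years taken from the table above (the name `wide` is only illustrative). The stub `'y'` names the group of columns to stack, and whatever follows the stub in each column name becomes the value of the new `year` column.

```python
import pandas as pd

wide = pd.DataFrame({'Country': ['India', 'United Kingdom'],
                     'y1750': [7, 10],
                     'y1900': [1, 100]})

# 'y' is the stub; the numeric suffix of each column becomes 'year'
long = pd.wide_to_long(wide, ['y'], i='Country', j='year').reset_index()
# After stacking, the values sit in a column named after the stub
long = long.rename(columns={'y': 'Industrialization'})

print(long)
```

The result has one row per country and year, which is the layout seaborn expects for `x`, `y` and `hue`.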

In [83]:
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.Country.isin(['Developed Countries', 'Third World', 'World'])].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-Dev-NonDev.pdf', dpi=300, bbox_inches='tight')



In [84]:
fig


Out[84]:

In [85]:
# Map country name to development level
dev_level = {'Belgium':'Developed',
             'France':'Developed',
             'Germany':'Developed',
             'Italy':'Developed',
             'Spain':'Developed',
             'Sweden':'Developed',
             'Switzerland':'Developed',
             'United Kingdom':'Developed',
             'Canada':'Developed',
             'United States':'Developed',
             'Japan':'Developed',
             'China':'Developing',
             'India':'Developing',
             'Brazil':'Developing',
             'Mexico':'Developing'}

industrialization['dev_level'] = industrialization.Country.map(dev_level)

filled_markers = ('o', 's', 'v', '^', '<', '>', '8', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.dev_level=='Developed'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(11, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-Dev.pdf', dpi=300, bbox_inches='tight')



In [86]:
fig


Out[86]:
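The grouping above relies on `Series.map` with a dictionary: every country name that is a key gets its development level, and every other label (aggregates such as World or Europe) gets a missing value, so it falls out of both the 'Developed' and the 'Developing' selections. Here is a minimal sketch with a shortened dictionary (the series `countries` is only illustrative).

```python
import pandas as pd

dev_level = {'Belgium': 'Developed', 'India': 'Developing'}
countries = pd.Series(['Belgium', 'India', 'World', 'Europe'], name='Country')

# Labels that are not keys of the dictionary become NaN
levels = countries.map(dev_level)
print(levels)

# Comparisons with NaN are False, so aggregates belong to neither group
print(countries[levels == 'Developed'].tolist())
print(countries[levels.isna()].tolist())
```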

In [87]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.dev_level=='Developing'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(4, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-NonDev.pdf', dpi=300, bbox_inches='tight')



In [88]:
fig


Out[88]:

In [89]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[
                 (industrialization.Country.isin(['India', 'United Kingdom'])) & 
                 (industrialization.year<=1900)].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-UK-IND.pdf', dpi=300, bbox_inches='tight')



In [90]:
fig


Out[90]:

Manufacturing

Let's use data from the same source to explore what happened to the share of manufacturing across regions.


In [91]:
# 1750-1913
manufacturing = [['Developed Countries', 27.0, 32.3, 39.5, 63.4, 79.1, 89.0, 92.5],
                 ['Europe', 23.2, 28.1, 34.2, 53.2, 61.3, 62.0, 56.6],
                 ['Austria-Hungary', 2.9, 3.2, 3.2, 4.2, 4.4, 4.7, 4.4],
                 ['Belgium', 0.3, 0.5, 0.7, 1.4, 1.8, 1.7, 1.8],
                 ['France', 4.0, 4.2, 5.2, 7.9, 7.8, 6.8, 6.1],
                 ['Germany', 2.9, 3.5, 3.5, 4.9, 8.5, 13.2, 14.8],
                 ['Italy', 2.4, 2.5, 2.3, 2.5, 2.5, 2.5, 2.4],
                 ['Russia', 5.0, 5.6, 5.6, 7.0, 7.6, 8.8, 8.2],
                 ['Spain', 1.2, 1.5, 1.5, 1.8, 1.8, 1.6, 1.2],
                 ['Sweden', 0.3, 0.3, 0.4, 0.6, 0.8, 0.9, 1.0],
                 ['Switzerland', 0.1, 0.3, 0.4, 0.7, 0.8, 1.0, 0.9],
                 ['United Kingdom', 1.9, 4.3, 9.5, 19.9, 22.9, 18.5, 13.6],
                 ['Canada', np.nan, np.nan, 0.1, 0.3, 0.4, 0.6, 0.9],
                 ['United States', 0.1, 0.8, 2.4, 7.2, 14.7, 23.6, 32.0],
                 ['Japan', 3.8, 3.5, 2.8, 2.6, 2.4, 2.4, 2.7],
                 ['Third World', 73.0, 67.7, 60.5, 36.6, 20.9, 11.0, 7.5],
                 ['China', 32.8, 33.3, 29.8, 19.7, 12.5, 6.2, 3.6],
                 ['India', 24.5, 19.7, 17.6, 8.6, 2.8, 1.7, 1.4],
                 ['Brazil', np.nan, np.nan, np.nan, 0.4, 0.3, 0.4, 0.5],
                 ['Mexico', np.nan, np.nan, np.nan, 0.4, 0.3, 0.3, 0.3]]

years = [1750, 1800, 1830, 1860, 1880, 1900, 1913]
manufacturing = pd.DataFrame(manufacturing, columns=['Country'] + ['y'+str(y) for y in years])

# 1913-1980
manufacturing2 = [['Developed Countries', 92.5, 92.8, 92.8, 93.5, 91.5, 90.1, 88.0],
                  ['Market Economies', 76.7, 80.3, 76.5, 77.5, 70.5, 70.0, 66.9],
                  ['Europe', 40.8, 35.4, 37.3, 26.1, 26.5, 24.5, 22.9],
                  ['Belgium', 1.8, 1.7, 1.1, 0.8, 0.8, 0.7, 0.7],
                  ['France', 6.1, 6.0, 4.4, 3.2, 3.8, 3.5, 3.3],
                  ['Germany', 14.8, 11.6, 12.7, 5.9, 6.4, 5.9, 5.3],
                  ['Italy', 2.4, 2.7, 2.8, 2.3, 2.9, 2.9, 2.9],
                  ['Spain', 1.2, 1.1, 0.8, 0.7, 0.8, 1.3, 1.4],
                  ['Sweden', 1.0, 0.9, 1.2, 0.9, 0.9, 0.9, 0.8],
                  ['Switzerland', 0.9, 0.7, 0.5, 0.7, 0.7, 0.6, 0.5],
                  ['United Kingdom', 13.6, 9.9, 10.7, 8.4, 6.4, 4.9, 4.0],
                  ['Canada', 0.9, 1.5, 1.4, 2.2, 2.1, 2.1, 2.0],
                  ['United States', 32.0, 39.3, 31.4, 44.7, 35.1, 33.0, 31.5],
                  ['Japan', 2.7, 3.3, 5.2, 2.9, 5.1, 8.8, 9.1],
                  ['U.S.S.R.', 8.2, 5.3, 9.0, 10.7, 14.2, 14.4, 14.8],
                  ['Third World', 7.5, 7.2, 7.2, 6.5, 8.5, 9.9, 12.0],
                  ['India', 1.4, 1.9, 2.4, 1.7, 1.8, 2.1, 2.3],
                  ['Brazil', 0.5, 0.6, 0.6, 0.6, 0.8, 1.1, 1.4],
                  ['Mexico', 0.3, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6],
                  ['China', 3.6, 3.4, 3.1, 2.3, 3.5, 3.9, 5.0]]
years = [1913, 1928, 1938, 1953, 1963, 1973, 1980]
manufacturing2 = pd.DataFrame(manufacturing2, columns=['Country'] + ['y'+str(y) for y in years])

# Merge
manufacturing = manufacturing.merge(manufacturing2)
manufacturing = pd.wide_to_long(manufacturing, ['y'], i='Country', j='year').reset_index()
manufacturing.rename(columns={'y':'manufacturing'}, inplace=True)
manufacturing['manufacturing'] = manufacturing.manufacturing / 100
manufacturing


Out[91]:
Country year manufacturing
0 Developed Countries 1750 0.270
1 Belgium 1750 0.003
2 France 1750 0.040
3 Germany 1750 0.029
4 Italy 1750 0.024
... ... ... ...
216 Third World 1980 0.120
217 China 1980 0.050
218 India 1980 0.023
219 Brazil 1980 0.014
220 Mexico 1980 0.006

221 rows × 3 columns
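Europe does not appear in the merged output, even though it is present in both lists. This is consistent with how the default `merge` works: it matches on all shared columns, here `Country` and `y1913`, and Europe's 1913 value differs between the two lists (56.6 and 40.8). Here is a minimal sketch of the effect and of one possible alternative, merging on `Country` only. The names `m1`, `m2` and the suffixes are assumptions of mine, not part of the original code.

```python
import pandas as pd

# Europe and France as entered in each of the two lists above
m1 = pd.DataFrame({'Country': ['Europe', 'France'],
                   'y1900': [62.0, 6.8],
                   'y1913': [56.6, 6.1]})
m2 = pd.DataFrame({'Country': ['Europe', 'France'],
                   'y1913': [40.8, 6.1],
                   'y1928': [35.4, 6.0]})

# Default merge matches on 'Country' AND 'y1913', so Europe finds no partner
default = m1.merge(m2)

# Matching on 'Country' only keeps both 1913 values side by side
by_country = m1.merge(m2, on='Country', suffixes=('_early', '_late'))

print(default['Country'].tolist())
print(by_country[['Country', 'y1913_early', 'y1913_late']])
```

Which of the two 1913 values to keep is a judgment about the source data, so the sketch only makes the discrepancy visible.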


In [92]:
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[manufacturing.Country.isin(['Developed Countries', 'Third World', 'World'])].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'Manufacturing-Dev-NonDev.pdf', dpi=300, bbox_inches='tight')



In [93]:
fig


Out[93]:
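Because the manufacturing column holds shares between 0 and 1, the y-axis uses the percent format `'{x:,.0%}'` rather than `'{x:,.0f}'`. `StrMethodFormatter` fills a `str.format`-style string with the tick value as `x`, so the difference can be checked with plain Python. Here is a minimal sketch using two shares from the table above (the list `shares` is only illustrative).

```python
# Format strings of the kind passed to StrMethodFormatter, applied to two shares
shares = [0.27, 0.12]

# Percent format multiplies by 100 and appends '%'
as_percent = ['{x:,.0%}'.format(x=s) for s in shares]
# Fixed-point format with no decimals rounds a share to a whole number
as_number = ['{x:,.0f}'.format(x=s) for s in shares]

print(as_percent)  # ['27%', '12%']
print(as_number)   # ['0', '0']
```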

In [94]:
# Map country name to development level
dev_level = {'Belgium':'Developed',
             'France':'Developed',
             'Germany':'Developed',
             'Italy':'Developed',
             'Spain':'Developed',
             'Sweden':'Developed',
             'Switzerland':'Developed',
             'United Kingdom':'Developed',
             'Canada':'Developed',
             'United States':'Developed',
             'Japan':'Developed',
             'China':'Developing',
             'India':'Developing',
             'Brazil':'Developing',
             'Mexico':'Developing'}

manufacturing['dev_level'] = manufacturing.Country.map(dev_level)

filled_markers = ('o', 's', 'v', '^', '<', '>', '8', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[manufacturing.dev_level=='Developed'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(11, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'Manufacturing-Dev.pdf', dpi=300, bbox_inches='tight')



In [95]:
fig


Out[95]:

In [96]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[manufacturing.dev_level=='Developing'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(4, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'Manufacturing-NonDev.pdf', dpi=300, bbox_inches='tight')



In [97]:
fig


Out[97]:

In [98]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[
                 (manufacturing.Country.isin(['India', 'United Kingdom'])) & 
                 (manufacturing.year<=1900)].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'manufacturing-UK-IND.pdf', dpi=300, bbox_inches='tight')



In [99]:
fig


Out[99]:

Industrial Potential

We can also explore the industrial potential of these countries.


In [100]:
# 1750-1913
indpotential = [['Developed Countries', 34.4, 47.4, 72.9, 143.2, 253.1, 481.2, 863.0,],
                ['Europe', 29.6, 41.2, 63.0, 120.3, 196.2, 335.4, 527.8,],
                ['Austria-Hungary', 3.7, 4.8, 5.8, 9.5, 14.0, 25.6, 40.7,],
                ['Belgium', 0.4, 0.7, 1.3, 3.1, 5.7, 9.2, 16.3,],
                ['France', 5.0, 6.2, 9.5, 17.9, 25.1, 36.8, 57.3,],
                ['Germany', 3.7, 5.2, 6.5, 11.1, 27.4, 71.2, 137.7,],
                ['Italy', 3.1, 3.7, 4.2, 5.7, 8.1, 13.6, 22.5,],
                ['Russia', 6.4, 8.3, 10.3, 15.8, 24.5, 47.5, 76.6,],
                ['Spain', 1.6, 2.1, 2.7, 4.0, 5.8, 8.5, 11.0,],
                ['Sweden', 0.3, 0.5, 0.6, 1.4, 2.6, 5.0, 9.0,],
                ['Switzerland', 0.2, 0.4, 0.8, 1.6, 2.6, 5.4, 8.0,],
                ['United Kingdom', 2.4, 6.2, 17.5, 45.0, 73.3, 100.0, 127.2,],
                ['Canada', np.nan, np.nan, 0.1, 0.6, 1.4, 3.2, 8.7,],
                ['United States', 0.1, 1.1, 4.6, 16.2, 46.9, 127.8, 298.1,],
                ['Japan', 4.8, 5.1, 5.2, 5.8, 7.6, 13.0, 25.1,],
                ['Third World', 92.9, 99.4, 111.5, 82.7, 67.0, 59.6, 69.5,],
                ['China', 41.7, 48.8, 54.9, 44.1, 39.9, 33.5, 33.3,],
                ['India', 31.2, 29.0, 32.5, 19.4, 8.8, 9.3, 13.1,],
                ['Brazil', np.nan, np.nan, np.nan, 0.9, 0.9, 2.1, 4.3,],
                ['Mexico', np.nan, np.nan, np.nan, 0.9, 0.8, 1.7, 2.7,],
                ['World', 127.3, 146.9, 184.4, 225.9, 320.1, 540.8, 932.5,]]

years = [1750, 1800, 1830, 1860, 1880, 1900, 1913]
indpotential = pd.DataFrame(indpotential, columns=['Country'] + ['y'+str(y) for y in years])

# 1913-1980
indpotential2 = [['Developed Countries', 863, 1259, 1562, 2870, 4699, 8432, 9718],
                 ['Market Economies', 715, 1089, 1288, 2380, 3624, 6547, 7388],
                 ['Europe', 380, 480, 629, 801, 1361, 2290, 2529],
                 ['Belgium', 16, 22, 18, 25, 41, 69, 76],
                 ['France', 57, 82, 74, 98, 194, 328, 362],
                 ['Germany', 138, 158, 214, 180, 330, 550, 590],
                 ['Italy', 23, 37, 46, 71, 150, 258, 319],
                 ['Spain', 11, 16, 14, 22, 43, 122, 156],
                 ['Sweden', 9, 12, 21, 28, 48, 80, 83],
                 ['Switzerland', 8, 9, 9, 20, 37, 57, 54],
                 ['United Kingdom', 127, 135, 181, 258, 330, 462, 441],
                 ['Canada', 9, 20, 23, 66, 109, 199, 220],
                 ['United States', 298, 533, 528, 1373, 1804, 3089, 3475],
                 ['Japan', 25, 45, 88, 88, 264, 819, 1001],
                 ['U.S.S.R.', 77, 72, 152, 328, 760, 1345, 1630],
                 ['Third World', 70, 98, 122, 200, 439, 927, 1323],
                 ['India', 13, 26, 40, 52, 91, 194, 254],
                 ['Brazil', 4, 8, 10, 18, 42, 102, 159],
                 ['Mexico', 3, 3, 4, 9, 21, 47, 68],
                 ['China', 33, 46, 52, 71, 178, 369, 553],
                 ['World', 933, 1356, 1684, 3070, 5138, 9359, 11041]]

years = [1913, 1928, 1938, 1953, 1963, 1973, 1980]
indpotential2 = pd.DataFrame(indpotential2, columns=['Country'] + ['y'+str(y) for y in years])

# Merge
indpotential = indpotential.merge(indpotential2[indpotential2.columns.difference(['y1913'])])
indpotential = pd.wide_to_long(indpotential, ['y'], i='Country', j='year').reset_index()
indpotential.rename(columns={'y':'indpotential'}, inplace=True)
indpotential


Out[100]:
Country year indpotential
0 Developed Countries 1750 34.4
1 Europe 1750 29.6
2 Belgium 1750 0.4
3 France 1750 5.0
4 Germany 1750 3.7
... ... ... ...
242 China 1980 553.0
243 India 1980 254.0
244 Brazil 1980 159.0
245 Mexico 1980 68.0
246 World 1980 11041.0

247 rows × 3 columns
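The reshape in the cell above goes from one column per year to one row per country and year. Here is a minimal sketch of that step on a toy table, using two values copied from the 1750 and 1800 columns above. The names `wide` and `tidy` are just for illustration.

```python
import pandas as pd

# Toy wide table: one row per country, one 'y<year>' column per year
wide = pd.DataFrame({
    'Country': ['India', 'United Kingdom'],
    'y1750': [31.2, 2.4],
    'y1800': [29.0, 6.2],
})

# stubnames='y' strips the prefix; the digits that remain become the 'year' column
tidy = pd.wide_to_long(wide, stubnames='y', i='Country', j='year').reset_index()
tidy = tidy.rename(columns={'y': 'indpotential'})
print(tidy)
```

Each (Country, year) pair now has its own row, which is the layout that seaborn's lineplot expects.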


In [101]:
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[indpotential.Country.apply(lambda x: x in ['Developed Countries', 'Third World', 'World'])].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-Dev-NonDev.pdf', dpi=300, bbox_inches='tight')



In [102]:
fig


Out[102]:

In [103]:
# Map country name to development level
dev_level = {'Belgium':'Developed',
             'France':'Developed',
             'Germany':'Developed',
             'Italy':'Developed',
             'Spain':'Developed',
             'Sweden':'Developed',
             'Switzerland':'Developed',
             'United Kingdom':'Developed',
             'Canada':'Developed',
             'United States':'Developed',
             'Japan':'Developed',
             'China':'Developing',
             'India':'Developing',
             'Brazil':'Developing',
             'Mexico':'Developing'}

indpotential['dev_level'] = indpotential.Country.map(dev_level)

filled_markers = ('o', 's', 'v', '^', '<', '>', '8', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[indpotential.dev_level=='Developed'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(11, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-Dev.pdf', dpi=300, bbox_inches='tight')



In [104]:
fig


Out[104]:

In [105]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[indpotential.dev_level=='Developing'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(4, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-NonDev.pdf', dpi=300, bbox_inches='tight')



In [106]:
fig


Out[106]:

In [107]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[
                 (indpotential.Country.apply(lambda x: x in ['India', 'United Kingdom'])) & 
                 (indpotential.year<=1900)].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-UK-IND.pdf', dpi=300, bbox_inches='tight')



In [108]:
fig


Out[108]:

Persistence

Let's explore the persistence of economic development since 1950. To do so, let's get the Penn World Table and World Bank Data.

Penn World Table

Let's start by importing the data from the Penn World Table.


In [109]:
try:
    # Use the local copies if a previous run already saved them
    pwt_xls = pd.read_excel(pathout + 'pwt91.xlsx')
    pwt = pd.read_stata(pathout + 'pwt91.dta')
except FileNotFoundError:
    # Otherwise download the files and save local copies
    pwt_xls = pd.read_excel('https://www.rug.nl/ggdc/docs/pwt91.xlsx', sheet_name=1)
    pwt = pd.read_stata('https://www.rug.nl/ggdc/docs/pwt91.dta')
    pwt_xls.to_excel(pathout + 'pwt91.xlsx', index=False)
    pwt.to_stata(pathout + 'pwt91.dta', write_index=False, version=117)

# Get labels of variables
pwt_labels = pd.io.stata.StataReader(pathout + 'pwt91.dta').variable_labels()

The Excel file lets us know the definition of the variables, while the Stata file has the data (of course, the Excel file also has the data). For some reason the original Stata file does not seem to have labels!


In [110]:
pwt_labels


Out[110]:
{'countrycode': '',
 'country': '',
 'currency_unit': '',
 'year': '',
 'rgdpe': '',
 'rgdpo': '',
 'pop': '',
 'emp': '',
 'avh': '',
 'hc': '',
 'ccon': '',
 'cda': '',
 'cgdpe': '',
 'cgdpo': '',
 'cn': '',
 'ck': '',
 'ctfp': '',
 'cwtfp': '',
 'rgdpna': '',
 'rconna': '',
 'rdana': '',
 'rnna': '',
 'rkna': '',
 'rtfpna': '',
 'rwtfpna': '',
 'labsh': '',
 'irr': '',
 'delta': '',
 'xr': '',
 'pl_con': '',
 'pl_da': '',
 'pl_gdpo': '',
 'i_cig': '',
 'i_xm': '',
 'i_xr': '',
 'i_outlier': '',
 'i_irr': '',
 'cor_exp': '',
 'statcap': '',
 'csh_c': '',
 'csh_i': '',
 'csh_g': '',
 'csh_x': '',
 'csh_m': '',
 'csh_r': '',
 'pl_c': '',
 'pl_i': '',
 'pl_g': '',
 'pl_x': '',
 'pl_m': '',
 'pl_n': '',
 'pl_k': ''}
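Since every label above is empty, one option is to build the labels from the definitions sheet stored in pwt_xls. The following is only a sketch on a toy stand-in for that sheet. The frame `defs` and the dictionary `labels` are hypothetical names, and the sketch assumes that section headers are the rows with a missing definition.

```python
import pandas as pd

# Toy stand-in for pwt_xls: section headers have a missing definition
defs = pd.DataFrame({
    'Variable name': ['Identifier variables', 'countrycode', 'country', 'pop'],
    'Variable definition': [None, '3-letter ISO country code', 'Country name',
                            'Population (in millions)'],
})

# Drop the header rows, then map variable name -> definition
labels = (defs.dropna(subset=['Variable definition'])
              .set_index('Variable name')['Variable definition']
              .to_dict())
print(labels['pop'])
```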

In [111]:
pwt_xls


Out[111]:
Variable name Variable definition
0 Identifier variables NaN
1 countrycode 3-letter ISO country code
2 country Country name
3 currency_unit Currency unit
4 year Year
... ... ...
62 pl_g Price level of government consumption, price ...
63 pl_x Price level of exports, price level of USA GDP...
64 pl_m Price level of imports, price level of USA GDP...
65 pl_n Price level of the capital stock, price level ...
66 pl_k Price level of the capital services, price lev...

67 rows × 2 columns


In [112]:
pwt


Out[112]:
countrycode country currency_unit year rgdpe rgdpo pop emp avh hc ... csh_x csh_m csh_r pl_c pl_i pl_g pl_x pl_m pl_n pl_k
0 ABW Aruba Aruban Guilder 1950 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 ABW Aruba Aruban Guilder 1951 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 ABW Aruba Aruban Guilder 1952 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 ABW Aruba Aruban Guilder 1953 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 ABW Aruba Aruban Guilder 1954 NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
12371 ZWE Zimbabwe US Dollar 2013 28086.937500 28329.810547 15.054506 7.914061 NaN 2.504635 ... 0.169638 -0.426188 0.090225 0.577488 0.582022 0.448409 0.723247 0.632360 0.383488 0.704313
12372 ZWE Zimbabwe US Dollar 2014 29217.554688 29355.759766 15.411675 8.222112 NaN 2.550258 ... 0.141791 -0.340442 0.051500 0.600760 0.557172 0.392895 0.724510 0.628352 0.349735 0.704991
12373 ZWE Zimbabwe US Dollar 2015 30091.923828 29150.750000 15.777451 8.530669 NaN 2.584653 ... 0.137558 -0.354298 -0.023353 0.622927 0.580814 0.343926 0.654940 0.564430 0.348472 0.713156
12374 ZWE Zimbabwe US Dollar 2016 30974.292969 29420.449219 16.150362 8.839398 NaN 2.616257 ... 0.141248 -0.310446 0.003050 0.640176 0.599462 0.337853 0.657060 0.550084 0.346553 0.718671
12375 ZWE Zimbabwe US Dollar 2017 32693.474609 30940.816406 16.529903 9.181251 NaN 2.648248 ... 0.141799 -0.299539 0.019133 0.647136 0.726222 0.340680 0.645338 0.539529 0.412392 0.755215

12376 rows × 52 columns


In [113]:
# Describe the data
pwt.describe()


Out[113]:
year rgdpe rgdpo pop emp avh hc ccon cda cgdpe ... csh_x csh_m csh_r pl_c pl_i pl_g pl_x pl_m pl_n pl_k
count 12376.000000 9.985000e+03 9.985000e+03 9985.000000 8841.000000 3373.000000 8299.000000 9.985000e+03 9.985000e+03 9.985000e+03 ... 9985.000000 9985.000000 9985.000000 9985.000000 9985.000000 9985.000000 9985.000000 9985.000000 9959.000000 7047.000000
mean 1983.500000 2.720569e+05 2.691928e+05 30.736767 14.799485 1984.099854 2.064241 1.984998e+05 2.686580e+05 2.697088e+05 ... 0.229183 -0.307399 0.019670 0.391839 0.486303 0.368860 0.436420 0.431026 0.466652 1.403137
std 19.628579 1.078882e+06 1.070178e+06 114.569824 59.107712 272.879944 0.720774 7.772703e+05 1.079234e+06 1.070720e+06 ... 0.260547 0.681575 0.201448 0.280254 0.956450 0.347244 0.211918 0.220563 0.400624 2.628997
min 1950.000000 1.846645e+01 1.977999e+01 0.004376 0.001180 1353.886841 1.007038 1.443100e+01 1.986141e+01 1.848834e+01 ... -1.496417 -26.741989 -8.731015 0.017207 0.012448 0.010474 0.007868 0.022644 0.019666 0.060732
25% 1966.750000 6.178189e+03 6.380658e+03 1.634517 0.940000 1799.336060 1.431531 5.227761e+03 6.395296e+03 6.002223e+03 ... 0.068159 -0.381261 -0.022347 0.182697 0.198099 0.125520 0.243906 0.248910 0.219715 0.663940
50% 1983.500000 2.725946e+04 2.710632e+04 6.115370 3.021000 1972.072876 1.954407 2.153850e+04 2.763264e+04 2.677256e+04 ... 0.144143 -0.203762 0.000727 0.326817 0.396347 0.256664 0.473103 0.486665 0.364834 0.982678
75% 2000.250000 1.386558e+05 1.374726e+05 19.891548 8.583438 2149.860352 2.649120 1.005379e+05 1.357644e+05 1.362898e+05 ... 0.301996 -0.104336 0.044098 0.520135 0.594202 0.490205 0.596405 0.576243 0.569292 1.458653
max 2017.000000 1.839607e+07 1.838384e+07 1409.517456 792.575317 2910.734863 3.974208 1.483615e+07 1.846078e+07 1.792857e+07 ... 3.057809 23.158607 9.917986 3.986815 35.654171 2.367351 2.271417 5.465247 6.730951 60.361191

8 rows × 44 columns

Computing $\log$ GDP per capita

Now we can create new variables, and transform and plot the data.

To compute the $\log$ of income per capita (GDPpc), the first thing we need is the name of the column that contains the GDPpc data in the dataframe. To find it, let's look among the variables for those whose description contains the word capita.


In [114]:
pwt_xls.columns


Out[114]:
Index(['Variable name', 'Variable definition'], dtype='object')

To be able to read the definitions better, let's tell pandas to show us more content.


In [115]:
pd.set_option("display.max_columns", 20)
pd.set_option('display.max_rows', 50)
pd.set_option('display.width', 1000)
#pd.set_option('display.max_colwidth', -1)

In [116]:
pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).lower().find('capita')!=-1)]


Out[116]:
Variable name Variable definition
12 hc Human capital index, based on years of schooli...
19 cn Capital stock at current PPPs (in mil. 2011US$)
20 ck Capital services levels at current PPPs (USA=1)
28 rnna Capital stock at constant 2011 national prices...
29 rkna Capital services at constant 2011 national pri...
34 delta Average depreciation rate of the capital stock
47 i_irr 0/1/2/3: the observation for irr is not an out...
53 csh_i Share of gross capital formation at current PPPs
61 pl_i Price level of capital formation, price level...
65 pl_n Price level of the capital stock, price level ...
66 pl_k Price level of the capital services, price lev...

So, it seems the data does not contain that variable. But do not panic...we know how to compute it based on GDP and Population. Let's do it!
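Note that the search above looks for the substring capita, so every hit in the table comes from the word capital. If we wanted to match capita only as a whole word, a regular expression with word boundaries is one way to do it. This is a small sketch on a toy series: the first three definitions are copied from the outputs above, while the last entry is made up so that there is something to find.

```python
import pandas as pd

definitions = pd.Series([
    'Capital stock at current PPPs (in mil. 2011US$)',
    'Average depreciation rate of the capital stock',
    'Population (in millions)',
    'GDP per capita (hypothetical entry, not in PWT)',
])

# Substring search: also matches 'capital'
substring = definitions.str.contains('capita', case=False, na=False)
# Whole-word search: \b marks a word boundary, so 'capital' no longer matches
whole_word = definitions.str.contains(r'\bcapita\b', case=False, na=False)

print(definitions[whole_word])
```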

Identify the name of the variable for GDP


In [117]:
pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).upper().find('GDP')!=-1)]


Out[117]:
Variable name Variable definition
7 rgdpe Expenditure-side real GDP at chained PPPs (in ...
8 rgdpo Output-side real GDP at chained PPPs (in mil. ...
17 cgdpe Expenditure-side real GDP at current PPPs (in ...
18 cgdpo Output-side real GDP at current PPPs (in mil. ...
25 rgdpna Real GDP at constant 2011 national prices (in ...
32 labsh Share of labour compensation in GDP at current...
38 pl_con Price level of CCON (PPP/XR), price level of U...
39 pl_da Price level of CDA (PPP/XR), price level of US...
40 pl_gdpo Price level of CGDPo (PPP/XR), price level of...
46 i_outlier 0/1: the observation on pl_gdpe or pl_gdpo is ...
57 csh_r Share of residual trade and GDP statistical di...
60 pl_c Price level of household consumption, price l...
61 pl_i Price level of capital formation, price level...
62 pl_g Price level of government consumption, price ...
63 pl_x Price level of exports, price level of USA GDP...
64 pl_m Price level of imports, price level of USA GDP...

Identify the name of the variable for population


In [118]:
pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).lower().find('population')!=-1)]


Out[118]:
Variable name Variable definition
9 pop Population (in millions)

Create new variables/columns with real GDPpc for all the measures included in PWT


In [119]:
# Get columns with GDP measures
gdpcols = pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).upper().find('REAL GDP')!=-1), 'Variable name'].tolist()

# Generate GDPpc for each measure
for gdp in gdpcols:
    pwt[gdp + '_pc'] = pwt[gdp] / pwt['pop']

# GDPpc data
gdppccols = [col+'_pc' for col in gdpcols]
pwt[['countrycode', 'country', 'year'] + gdppccols]


Out[119]:
countrycode country year rgdpe_pc rgdpo_pc cgdpe_pc cgdpo_pc rgdpna_pc
0 ABW Aruba 1950 NaN NaN NaN NaN NaN
1 ABW Aruba 1951 NaN NaN NaN NaN NaN
2 ABW Aruba 1952 NaN NaN NaN NaN NaN
3 ABW Aruba 1953 NaN NaN NaN NaN NaN
4 ABW Aruba 1954 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ...
12371 ZWE Zimbabwe 2013 1865.683105 1881.816040 1874.657715 1898.868286 1952.479736
12372 ZWE Zimbabwe 2014 1895.806519 1904.774048 1918.362305 1935.120605 1947.798950
12373 ZWE Zimbabwe 2015 1907.274170 1847.621094 1924.819824 1902.378662 1934.789307
12374 ZWE Zimbabwe 2016 1917.869873 1821.658813 1932.771973 1889.612061 1901.752686
12375 ZWE Zimbabwe 2017 1977.838257 1871.808716 1998.100098 1940.005371 1913.949829

12376 rows × 8 columns
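The loop above divides one column at a time. As an alternative sketch, pandas can divide several columns by the same series in a single step with `div`. The numbers below are made up, and `toy` and `per_capita` are illustrative names.

```python
import pandas as pd

toy = pd.DataFrame({
    'rgdpe': [100.0, 300.0],   # hypothetical GDP, in millions
    'rgdpo': [110.0, 330.0],   # hypothetical GDP, in millions
    'pop':   [10.0, 20.0],     # hypothetical population, in millions
})
cols = ['rgdpe', 'rgdpo']

# axis=0 aligns the population series with the rows of the frame
per_capita = toy[cols].div(toy['pop'], axis=0).add_suffix('_pc')
toy = toy.join(per_capita)
print(toy)
```

Because both GDP and population are in millions, the ratio is directly in units per person.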

Now let's use the apply function to compute logs.


In [120]:
pwt[['l'+col for col in gdppccols]] = pwt[gdppccols].apply(np.log, axis=1)
pwt[['countrycode', 'country', 'year'] + ['l'+col for col in gdppccols]]


Out[120]:
countrycode country year lrgdpe_pc lrgdpo_pc lcgdpe_pc lcgdpo_pc lrgdpna_pc
0 ABW Aruba 1950 NaN NaN NaN NaN NaN
1 ABW Aruba 1951 NaN NaN NaN NaN NaN
2 ABW Aruba 1952 NaN NaN NaN NaN NaN
3 ABW Aruba 1953 NaN NaN NaN NaN NaN
4 ABW Aruba 1954 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ...
12371 ZWE Zimbabwe 2013 7.531383 7.539993 7.536181 7.549013 7.576856
12372 ZWE Zimbabwe 2014 7.547400 7.552119 7.559227 7.567925 7.574455
12373 ZWE Zimbabwe 2015 7.553431 7.521654 7.562588 7.550860 7.567754
12374 ZWE Zimbabwe 2016 7.558970 7.507503 7.566710 7.544127 7.550531
12375 ZWE Zimbabwe 2017 7.589760 7.534660 7.599952 7.570446 7.556924

12376 rows × 8 columns
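NumPy functions such as np.log also accept a whole dataframe, so the row-wise apply is not the only way to get these columns. The sketch below checks on made-up numbers that both routes give the same result. The names `toy`, `by_row`, `vectorized` and `logs` are illustrative.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'a_pc': [1.0, np.e, np.nan],
                    'b_pc': [10.0, 100.0, 1000.0]})

by_row = toy.apply(np.log, axis=1)   # the approach used above
vectorized = np.log(toy)             # element-wise on the whole frame

# Missing values stay missing; column names can be prefixed with 'l' as before
logs = vectorized.add_prefix('l')
print(logs)
```

On a table with thousands of rows the vectorized call is usually faster, since it avoids a Python-level loop over rows.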

How correlated are these measures of log GDP per capita?


In [121]:
pwt[['countrycode', 'country', 'year'] + ['l'+col for col in gdppccols]].groupby('year').corr()


Out[121]:
lrgdpe_pc lrgdpo_pc lcgdpe_pc lcgdpo_pc lrgdpna_pc
year
1950 lrgdpe_pc 1.000000 0.996004 0.999360 0.994707 0.939644
lrgdpo_pc 0.996004 1.000000 0.995951 0.998978 0.942147
lcgdpe_pc 0.999360 0.995951 1.000000 0.995946 0.939410
lcgdpo_pc 0.994707 0.998978 0.995946 1.000000 0.943629
lrgdpna_pc 0.939644 0.942147 0.939410 0.943629 1.000000
... ... ... ... ... ... ...
2017 lrgdpe_pc 1.000000 0.975165 0.999933 0.978183 0.990955
lrgdpo_pc 0.975165 1.000000 0.974924 0.999628 0.982313
lcgdpe_pc 0.999933 0.974924 1.000000 0.978034 0.990629
lcgdpo_pc 0.978183 0.999628 0.978034 1.000000 0.984534
lrgdpna_pc 0.990955 0.982313 0.990629 0.984534 1.000000

340 rows × 5 columns

While it seems they are highly correlated, it is hard to see here directly. Let's get the statistics for each measure's correlations across all years.


In [122]:
pwt[['countrycode', 'country', 'year'] + ['l'+col for col in gdppccols]].groupby('year').corr().describe()


Out[122]:
lrgdpe_pc lrgdpo_pc lcgdpe_pc lcgdpo_pc lrgdpna_pc
count 340.000000 340.000000 340.000000 340.000000 340.000000
mean 0.984764 0.976230 0.984787 0.983319 0.959519
std 0.021766 0.026951 0.021799 0.022352 0.031102
min 0.914426 0.903082 0.913860 0.899516 0.899516
25% 0.976850 0.958346 0.976905 0.973735 0.936402
50% 0.995482 0.988851 0.995799 0.995067 0.951741
75% 0.999805 0.998989 0.999805 0.998989 0.994875
max 1.000000 1.000000 1.000000 1.000000 1.000000

Ok. This gives us a better sense of how strongly correlated these measures of log GDP per capita are. In what follows we will use only one, namely Log[GDPpc] based on Expenditure-side real GDP at chained PPPs (in mil. 2011US$), i.e., lrgdpe_pc.
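One caveat: the summary above also counts the correlation of each measure with itself, which is always 1 and explains the maximum of 1. A possible refinement is to blank out those diagonal entries before calling describe. The sketch below does this on simulated data. The measures `m1`, `m2`, `m3` and all numbers are made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 50
base = rng.normal(size=2 * n)
toy = pd.DataFrame({
    'year': [1950] * n + [2017] * n,
    'm1': base,
    'm2': base + 0.1 * rng.normal(size=2 * n),
    'm3': base + 0.5 * rng.normal(size=2 * n),
})
measures = ['m1', 'm2', 'm3']

# Correlation matrix per year; the index is (year, measure)
corr = toy.groupby('year')[measures].corr()

# True where the row measure equals the column measure (the diagonal)
row_names = corr.index.get_level_values(1).to_numpy()[:, None]
col_names = corr.columns.to_numpy()[None, :]
offdiag = corr.mask(row_names == col_names)

summary = offdiag.describe()
print(summary)
```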

Convergence post-1960?

Let's start by looking at the distribution of Log[GDPpc] in 1960. For this we need to subset our dataframe and select only the rows for the year 1960. This is done with the loc property of the dataframe.


In [123]:
gdppc1960 = pwt.loc[pwt.year==1960, ['countrycode', 'country', 'year', 'lrgdpe_pc']]
gdppc1960


Out[123]:
countrycode country year lrgdpe_pc
10 ABW Aruba 1960 NaN
78 AGO Angola 1960 NaN
146 AIA Anguilla 1960 NaN
214 ALB Albania 1960 NaN
282 ARE United Arab Emirates 1960 NaN
... ... ... ... ...
12046 VNM Viet Nam 1960 NaN
12114 YEM Yemen 1960 NaN
12182 ZAF South Africa 1960 8.664412
12250 ZMB Zambia 1960 7.883263
12318 ZWE Zimbabwe 1960 7.646267

182 rows × 4 columns

gdppc1960 has the data for all countries in the year 1960. We can plot the histogram using the functions of the dataframe.


In [124]:
gdppc1960.lrgdpe_pc.hist()


Out[124]:
<matplotlib.axes._subplots.AxesSubplot at 0x1323d7280>

We can also plot it using the seaborn package. Let's plot the kernel density of the distribution


In [125]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.kdeplot(gdppc1960.lrgdpe_pc, ax=ax, shade=True, label='1960', linewidth=2)
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
plt.savefig(pathgraphs + 'y1960-density.pdf', dpi=300, bbox_inches='tight')



In [126]:
fig


Out[126]:
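To see what a kernel density is, here is a rough sketch that computes a Gaussian kernel density estimate by hand with NumPy. The data are made up, and the bandwidth `h` follows a common rule of thumb that I assume here for illustration. seaborn chooses its own bandwidth, so its curve need not match this one exactly.

```python
import numpy as np

# Hypothetical log incomes per capita
x = np.array([6.5, 7.0, 7.2, 8.1, 8.4, 9.0, 9.6])
n = x.size

# Rule-of-thumb bandwidth (an assumption, not seaborn's exact choice)
h = 1.06 * x.std(ddof=1) * n ** (-1 / 5)

# Evaluate the estimate on a grid that extends past the data
grid = np.linspace(x.min() - 4 * h, x.max() + 4 * h, 400)

# Average of Gaussian bumps, one centered on each observation
z = (grid[:, None] - x[None, :]) / h
density = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

# Trapezoid rule: a density should integrate to (about) one
area = float(np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(grid)))
print(round(area, 3))
```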

Let's now also include the distribution for other years


In [127]:
gdppc1980 = pwt.loc[pwt.year==1980, ['countrycode', 'country', 'year', 'lrgdpe_pc']]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.kdeplot(gdppc1960.lrgdpe_pc, ax=ax, shade=True, label='1960', linewidth=2)
sns.kdeplot(gdppc1980.lrgdpe_pc, ax=ax, shade=True, label='1980', linewidth=2)
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
plt.savefig(pathgraphs + 'y1960-1980-density.pdf', dpi=300, bbox_inches='tight')



In [128]:
fig


Out[128]:

In [129]:
gdppc2000 = pwt.loc[pwt.year==2000, ['countrycode', 'country', 'year', 'lrgdpe_pc']]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.kdeplot(gdppc1960.lrgdpe_pc, ax=ax, shade=True, label='1960', linewidth=2)
sns.kdeplot(gdppc1980.lrgdpe_pc, ax=ax, shade=True, label='1980', linewidth=2)
sns.kdeplot(gdppc2000.lrgdpe_pc, ax=ax, shade=True, label='2000', linewidth=2)
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
plt.savefig(pathgraphs + 'y1960-2000-density.pdf', dpi=300, bbox_inches='tight')



In [130]:
fig


Out[130]:

Let's show the evolution of the distribution by looking at it every 10 years from 1950 onwards, plus the last year in the data, 2017. Moreover, let's do everything in a single piece of code.


In [131]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
period = list(range(1950, 2020, 10)) + [2017]
#mycolors = sns.color_palette("GnBu", n_colors=len(period)+5)
mycolors = sns.cubehelix_palette(len(period), start=.5, rot=-.75)
# Plot
fig, ax = plt.subplots()
k = 0
for t in period:
    sns.kdeplot(pwt.loc[pwt.year==t].lrgdpe_pc, ax=ax, shade=True, label=str(t), linewidth=2, color=mycolors[k])
    k += 1
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
plt.savefig(pathgraphs + 'y1950-2010-density.pdf', dpi=300, bbox_inches='tight')



In [132]:
fig


Out[132]:

Persistence

The lack of convergence in the last 60 years suggests that there is some persistence in (recent) development. Let's explore this by plotting the association between GDP per capita in different periods. To make things more comparable, let's normalize by looking at income levels relative to the US. To do so, it is easier to use the year as the index of the dataframe.


In [133]:
pwt.set_index('year', inplace=True)
pwt['lrgdpe_pc_US'] = pwt.loc[pwt.countrycode=='USA', 'lrgdpe_pc']
pwt['lrgdpe_pc_rel'] = pwt.lrgdpe_pc / pwt.lrgdpe_pc_US
pwt.reset_index(inplace=True)
pwt[['countrycode', 'country', 'year', 'lrgdpe_pc_rel']]


Out[133]:
countrycode country year lrgdpe_pc_rel
0 ABW Aruba 1950 NaN
1 ABW Aruba 1951 NaN
2 ABW Aruba 1952 NaN
3 ABW Aruba 1953 NaN
4 ABW Aruba 1954 NaN
... ... ... ... ...
12371 ZWE Zimbabwe 2013 0.693611
12372 ZWE Zimbabwe 2014 0.693790
12373 ZWE Zimbabwe 2015 0.692485
12374 ZWE Zimbabwe 2016 0.692220
12375 ZWE Zimbabwe 2017 0.694026

12376 rows × 4 columns
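The cell above relies on index alignment: once year is the index, assigning the US series gives every row the US value of its own year. An equivalent sketch, which keeps the default index and uses `map` instead, is shown below on made-up numbers. The names `toy` and `us_by_year` are illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    'countrycode': ['USA', 'ZWE', 'USA', 'ZWE'],
    'year': [2016, 2016, 2017, 2017],
    'lrgdpe_pc': [10.0, 7.0, 12.0, 9.0],   # hypothetical log incomes
})

# US log income indexed by year
us_by_year = toy.loc[toy.countrycode == 'USA'].set_index('year')['lrgdpe_pc']

# Look up the US value for each row's year, then take the ratio
toy['lrgdpe_pc_US'] = toy['year'].map(us_by_year)
toy['lrgdpe_pc_rel'] = toy['lrgdpe_pc'] / toy['lrgdpe_pc_US']
print(toy)
```

By construction the US rows have a relative value of 1 in every year.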

Let's plot the relative income levels in 1960 against those in 1980, 2000 and 2017. First, let's create the wide version of this data.


In [134]:
relgdppc = pwt[['countrycode', 'year', 'lrgdpe_pc_rel']].pivot(index='countrycode', columns='year', values='lrgdpe_pc_rel')
relgdppc.columns = ['y' + str(col) for col in relgdppc.columns]
relgdppc.reset_index(inplace=True)
relgdppc


Out[134]:
countrycode y1950 y1951 y1952 y1953 y1954 y1955 y1956 y1957 y1958 ... y2008 y2009 y2010 y2011 y2012 y2013 y2014 y2015 y2016 y2017
0 ABW NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.989490 0.989011 0.977356 0.972664 0.969597 0.969952 0.968135 0.966769 0.963675 0.962826
1 AGO NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.786656 0.767352 0.794583 0.815654 0.815493 0.812440 0.805392 0.786142 0.782010 0.778917
2 AIA NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.975120 0.951357 0.943233 0.942697 0.935110 0.931704 0.934498 0.934647 0.928544 0.918428
3 ALB NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.826849 0.836372 0.844121 0.847105 0.848709 0.845853 0.848582 0.851238 0.850862 0.856015
4 ARE NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1.041859 1.020390 1.013018 1.022369 1.022926 1.024411 1.027438 1.016130 1.014197 1.014201
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
177 VNM NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.755160 0.761839 0.771784 0.778270 0.783986 0.786128 0.789185 0.791125 0.796555 0.801899
178 YEM NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 0.755317 0.757512 0.772611 0.761339 0.750812 0.750272 0.737346 0.711625 0.697790 0.684275
179 ZAF 0.893115 0.887755 0.87563 0.881956 0.889942 0.886195 0.889469 0.892274 0.891707 ... 0.862708 0.862971 0.864622 0.867436 0.866010 0.865265 0.863675 0.862188 0.860596 0.860410
180 ZMB NaN NaN NaN NaN NaN 0.813323 0.816613 0.796724 0.785549 ... 0.721515 0.732669 0.742396 0.750338 0.753036 0.753754 0.753928 0.751828 0.751907 0.757633
181 ZWE NaN NaN NaN NaN 0.769047 0.764846 0.770286 0.776759 0.775427 ... 0.618364 0.670103 0.674767 0.682105 0.690134 0.693611 0.693790 0.692485 0.692220 0.694026

182 rows × 69 columns
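The pivot above is the reverse of the long reshape: each year becomes its own column. Here is the same step as a sketch on a tiny made-up table. The frame names and the values are illustrative.

```python
import pandas as pd

long_df = pd.DataFrame({
    'countrycode': ['ZAF', 'ZAF', 'ZWE', 'ZWE'],
    'year': [1960, 2017, 1960, 2017],
    'rel': [0.9, 0.86, 0.8, 0.69],   # hypothetical relative incomes
})

# One row per country, one column per year
wide_df = long_df.pivot(index='countrycode', columns='year', values='rel')

# Prefix the year with 'y' so that columns can be accessed as attributes
wide_df.columns = ['y' + str(col) for col in wide_df.columns]
wide_df = wide_df.reset_index()
print(wide_df)
```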


In [135]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
k = 0
fig, ax = plt.subplots()
ax.plot([relgdppc.y1960.min()*.99, relgdppc.y1960.max()*1.01], [relgdppc.y1960.min()*.99, relgdppc.y1960.max()*1.01], c='r', label='45 degree')
sns.regplot(x='y1960', y='y2017', data=relgdppc, ax=ax, label='1960-2017')
movex = relgdppc.y1960.mean() * 0.006125
movey = relgdppc.y2017.mean() * 0.006125
for line in range(0,relgdppc.shape[0]):
    if (np.isnan(relgdppc.y1960[line])==False) & (np.isnan(relgdppc.y2017[line])==False):
        ax.text(relgdppc.y1960[line]+movex, relgdppc.y2017[line]+movey, relgdppc.countrycode[line], horizontalalignment='left', fontsize=12, color='black', weight='semibold')
ax.set_xlabel('Log[Income per capita 1960] relative to US')
ax.set_ylabel('Log[Income per capita in 2017] relative to US')
ax.legend()
plt.savefig(pathgraphs + '1960_versus_2017_drop.pdf', dpi=300, bbox_inches='tight')



In [136]:
fig


Out[136]:

Let's create a function that will simplify our plotting of this figure for various years


In [137]:
def PersistencePlot(dfin, var0='y1960', var1='y2010', labelvar='countrycode', 
                    dx=0.006125, dy=0.006125, 
                    xlabel='Log[Income per capita 1960] relative to US', 
                    ylabel='Log[Income per capita in 2010] relative to US',
                    linelabel='1960-2010',
                    filename='1960_versus_2010_drop.pdf'):
    '''
    Plot the association between var0 and var1 in dataframe dfin, using labelvar
    to label the points, and save the figure to filename.
    '''
    sns.set(rc={'figure.figsize':(11.7,8.27)})
    sns.set_context("talk")
    df = dfin.copy()
    df = df.dropna(subset=[var0, var1]).reset_index(drop=True)
    # Plot
    k = 0
    fig, ax = plt.subplots()
    ax.plot([df[var0].min()*.99, df[var0].max()*1.01], [df[var0].min()*.99, df[var0].max()*1.01], c='r', label='45 degree')
    sns.regplot(x=var0, y=var1, data=df, ax=ax, label=linelabel)
    movex = df[var0].mean() * dx
    movey = df[var1].mean() * dy
    for line in range(0,df.shape[0]):
        ax.text(df[var0][line]+movex, df[var1][line]+movey, df[labelvar][line], horizontalalignment='left', fontsize=12, color='black')
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.legend()
    plt.savefig(pathgraphs + filename, dpi=300, bbox_inches='tight')
    pass

In [138]:
PersistencePlot(relgdppc, var0='y1980', var1='y2010', xlabel='Log[Income per capita 1980] relative to US',
                ylabel='Log[Income per capita in 2010] relative to US', linelabel='1980-2010',
                filename='1980_versus_2010_drop.pdf')



In [139]:
PersistencePlot(relgdppc.loc[(relgdppc.countrycode!='BRN')& (relgdppc.countrycode!='ARE')], var0='y1980', var1='y2010', xlabel='Log[Income per capita 1980] relative to US',
                ylabel='Log[Income per capita in 2010] relative to US', linelabel='1980-2010',
                filename='1980_versus_2010_drop.pdf')



In [140]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
period = list(range(1980, 2020, 20)) + [2017]
#mycolors = sns.color_palette("GnBu", n_colors=len(period)+5)
mycolors = sns.cubehelix_palette(len(period), start=.5, rot=-.75)
# Plot
k = 0
fig, ax = plt.subplots()
for t in period:
    sns.regplot(x='y1960', y='y'+str(t), data=relgdppc, ax=ax, label='1960-'+str(t))
    k += 1
ax.set_xlabel('Log[Income per capita 1960] relative to US')
ax.set_ylabel('Log[Income per capita in other period] relative to US')
ax.legend()


Out[140]:
<matplotlib.legend.Legend at 0x13f016fd0>

In [141]:
fig


Out[141]:
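The regression lines in these figures can also be summarized with numbers. As a sketch, the slope of a linear fit and the correlation between the two periods can be computed with NumPy. The values below are made up for five imaginary countries. A slope and a correlation both close to one would mean that relative positions barely changed.

```python
import numpy as np

# Hypothetical relative log incomes for five countries
y1960 = np.array([0.70, 0.75, 0.80, 0.90, 1.00])
y2017 = np.array([0.68, 0.76, 0.79, 0.91, 1.00])

# Least-squares line y2017 = intercept + slope * y1960
slope, intercept = np.polyfit(y1960, y2017, deg=1)

# Correlation between the two periods
corr = np.corrcoef(y1960, y2017)[0, 1]

print(round(slope, 3), round(intercept, 3), round(corr, 3))
```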

Getting data from the World Bank

The World Bank (WB) is a major source of free data. The pandas ecosystem includes a package that allows you to download data from many sources, including the WB. The package we will use to access this API is pandas-datareader, which can download data from a host of sources, including the WB, OECD, and FRED (see here).


In [142]:
from pandas_datareader import data, wb

We can now use wb to get information and data from the WB. Let's start by downloading the set of basic information about the countries included in the API.


In [143]:
wbcountries = wb.get_countries()
wbcountries['name'] = wbcountries.name.str.strip()
wbcountries


Out[143]:
iso3c iso2c name region adminregion incomeLevel lendingType capitalCity longitude latitude
0 ABW AW Aruba Latin America & Caribbean High income Not classified Oranjestad -70.0167 12.51670
1 AFG AF Afghanistan South Asia South Asia Low income IDA Kabul 69.1761 34.52280
2 AFR A9 Africa Aggregates Aggregates Aggregates NaN NaN
3 AGO AO Angola Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Lower middle income IBRD Luanda 13.2420 -8.81155
4 ALB AL Albania Europe & Central Asia Europe & Central Asia (excluding high income) Upper middle income IBRD Tirane 19.8172 41.33170
... ... ... ... ... ... ... ... ... ... ...
299 XZN A5 Sub-Saharan Africa excluding South Africa and ... Aggregates Aggregates Aggregates NaN NaN
300 YEM YE Yemen, Rep. Middle East & North Africa Middle East & North Africa (excluding high inc... Low income IDA Sana'a 44.2075 15.35200
301 ZAF ZA South Africa Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Upper middle income IBRD Pretoria 28.1871 -25.74600
302 ZMB ZM Zambia Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Lower middle income IDA Lusaka 28.2937 -15.39820
303 ZWE ZW Zimbabwe Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Lower middle income Blend Harare 31.0672 -17.83120

304 rows × 10 columns

Let's use wb to find all the series whose name contains the word "population".


In [144]:
popvars = wb.search(string='population')
popvars


Out[144]:
id name unit source sourceNote sourceOrganization topics
24 1.1_ACCESS.ELECTRICITY.TOT Access to electricity (% of total population) Sustainable Energy for All Access to electricity is the percentage of pop... b'World Bank Global Electrification Database 2...
39 1.2_ACCESS.ELECTRICITY.RURAL Access to electricity (% of rural population) Sustainable Energy for All Access to electricity is the percentage of rur... b'World Bank Global Electrification Database 2...
40 1.3_ACCESS.ELECTRICITY.URBAN Access to electricity (% of urban population) Sustainable Energy for All Access to electricity is the percentage of tot... b'World Bank Global Electrification Database 2...
128 2.1_ACCESS.CFT.TOT Access to Clean Fuels and Technologies for coo... Sustainable Energy for All b''
159 3.11.01.01.popcen Population census Statistical Capacity Indicators Population censuses collect data on the size, ... b'World Bank Microdata library. Original sourc...
... ... ... ... ... ... ... ...
17439 per_sionl.overlap_pop_urb Population only receiving All Social Insurance... The Atlas of Social Protection: Indicators of ... NULL b'The Atlas of Social Protection: Indicators o... Social Protection & Labor
17440 per_sionl.overlap_q1_preT_tot Population in the 1st quintile (poorest) only ... The Atlas of Social Protection: Indicators of ... NULL b'The Atlas of Social Protection: Indicators o... Social Protection & Labor
17441 per_sionl.overlap_q1_rur Population in the 1st quintile (poorest) only ... The Atlas of Social Protection: Indicators of ... NULL b'The Atlas of Social Protection: Indicators o... Social Protection & Labor
17442 per_sionl.overlap_q1_tot Population in the 1st quintile (poorest) only ... The Atlas of Social Protection: Indicators of ... NULL b'The Atlas of Social Protection: Indicators o... Social Protection & Labor
17443 per_sionl.overlap_q1_urb Population in the 1st quintile (poorest) only ... The Atlas of Social Protection: Indicators of ... NULL b'The Atlas of Social Protection: Indicators o... Social Protection & Labor

1591 rows × 7 columns

Lots of variables are available, collected by the WB from multiple sources. On the WB website you can find more information on them, and also search for and identify the variables you may want to focus on. Here let's download the number of males and females in the population by age group, the total population, as well as the total urban population for the year 2017.


In [145]:
femalepop = popvars.loc[popvars.id.apply(lambda x: x.find('SP.POP.')!=-1 and x.endswith('FE'))]
malepop = popvars.loc[popvars.id.apply(lambda x: x.find('SP.POP.')!=-1 and x.endswith('MA'))]
popfields = ['SP.POP.0014.FE.IN', 'SP.POP.1564.FE.IN', 'SP.POP.65UP.FE.IN',
             'SP.POP.0014.MA.IN', 'SP.POP.1564.MA.IN', 'SP.POP.65UP.MA.IN',
             'SP.POP.TOTL.FE.IN', 'SP.POP.TOTL.MA.IN', 'SP.POP.TOTL',
             'EN.URB.MCTY', 'EN.URB.LCTY'] + malepop.id.tolist() + femalepop.id.tolist()
popfields


Out[145]:
['SP.POP.0014.FE.IN',
 'SP.POP.1564.FE.IN',
 'SP.POP.65UP.FE.IN',
 'SP.POP.0014.MA.IN',
 'SP.POP.1564.MA.IN',
 'SP.POP.65UP.MA.IN',
 'SP.POP.TOTL.FE.IN',
 'SP.POP.TOTL.MA.IN',
 'SP.POP.TOTL',
 'EN.URB.MCTY',
 'EN.URB.LCTY',
 'SP.POP.0004.MA',
 'SP.POP.0509.MA',
 'SP.POP.1014.MA',
 'SP.POP.1519.MA',
 'SP.POP.2024.MA',
 'SP.POP.2529.MA',
 'SP.POP.3034.MA',
 'SP.POP.3539.MA',
 'SP.POP.4044.MA',
 'SP.POP.4549.MA',
 'SP.POP.5054.MA',
 'SP.POP.5559.MA',
 'SP.POP.6064.MA',
 'SP.POP.6569.MA',
 'SP.POP.7074.MA',
 'SP.POP.7579.MA',
 'SP.POP.80UP.MA',
 'SP.POP.0004.FE',
 'SP.POP.0509.FE',
 'SP.POP.1014.FE',
 'SP.POP.1519.FE',
 'SP.POP.2024.FE',
 'SP.POP.2529.FE',
 'SP.POP.3034.FE',
 'SP.POP.3539.FE',
 'SP.POP.4044.FE',
 'SP.POP.4549.FE',
 'SP.POP.5054.FE',
 'SP.POP.5559.FE',
 'SP.POP.6064.FE',
 'SP.POP.6569.FE',
 'SP.POP.7074.FE',
 'SP.POP.7579.FE',
 'SP.POP.80UP.FE']
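
The same kind of selection can also be written with pandas' vectorized string methods instead of apply with a lambda. The following is only a minimal sketch on a made-up table: popvars_toy and its ids are illustrative stand-ins for the real popvars returned by wb.search.

```python
import pandas as pd

# Toy stand-in for `popvars`; the ids below are illustrative only.
popvars_toy = pd.DataFrame({'id': ['SP.POP.0004.FE', 'SP.POP.0004.MA', 'SP.POP.TOTL',
                                   'SP.URB.TOTL', 'XX.OTHER.FE']})

# regex=False treats the dots as literal characters rather than "any character".
is_pop = popvars_toy['id'].str.contains('SP.POP.', regex=False)

# Combine the boolean masks with & and keep the matching ids as plain lists.
female_ids = popvars_toy.loc[is_pop & popvars_toy['id'].str.endswith('FE'), 'id'].tolist()
male_ids = popvars_toy.loc[is_pop & popvars_toy['id'].str.endswith('MA'), 'id'].tolist()
print(female_ids, male_ids)
```

Both approaches select the same rows; the string methods simply avoid calling a Python function once per row.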

Let's also download GDP per capita in PPP at constant 2011 prices, which is the series NY.GDP.PCAP.PP.KD.


In [146]:
wdi = wb.download(indicator=popfields+['NY.GDP.PCAP.PP.KD'], country=wbcountries.iso2c.values, start=2017, end=2017)

wdi


/Users/ozak/anaconda3/envs/GeoPython38env/lib/python3.8/site-packages/pandas_datareader/wb.py:592: UserWarning: Non-standard ISO country codes: 1A, 1W, 4E, 6D, 6F, 6L, 6N, 6X, 7E, 8S, A4, A5, A9, B1, B2, B3, B4, B6, B7, B8, C4, C5, C6, C7, C8, C9, D2, D3, D4, D5, D6, D7, D8, D9, EU, F1, F6, JG, L4, L5, L6, L7, M1, M2, N6, O6, OE, R6, S1, S2, S3, S4, T2, T3, T4, T5, T6, T7, V1, V2, V3, V4, XC, XD, XE, XF, XG, XH, XI, XJ, XK, XL, XM, XN, XO, XP, XQ, XT, XU, XY, Z4, Z7, ZB, ZF, ZG, ZJ, ZQ, ZT
  warnings.warn(
Out[146]:
SP.POP.0014.FE.IN SP.POP.1564.FE.IN SP.POP.65UP.FE.IN SP.POP.0014.MA.IN SP.POP.1564.MA.IN SP.POP.65UP.MA.IN SP.POP.TOTL.FE.IN SP.POP.TOTL.MA.IN SP.POP.TOTL EN.URB.MCTY ... SP.POP.4044.FE SP.POP.4549.FE SP.POP.5054.FE SP.POP.5559.FE SP.POP.6064.FE SP.POP.6569.FE SP.POP.7074.FE SP.POP.7579.FE SP.POP.80UP.FE NY.GDP.PCAP.PP.KD
country year
Aruba 2017 9297.0 38127.0 7907.0 9646.0 34524.0 5864.0 55331.0 50035.0 105366.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 33966.483000
Afghanistan 2017 7732365.0 9413927.0 497974.0 8122796.0 10100250.0 429088.0 17644266.0 18652134.0 36296400.0 3913297.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 2202.570851
Angola 2017 6984651.0 7711810.0 370978.0 7015191.0 7437431.0 296687.0 15067439.0 14749309.0 29816748.0 7515345.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 7310.901738
Albania 2017 243452.0 967394.0 198481.0 274603.0 1005023.0 184503.0 1409327.0 1464130.0 2873457.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 13093.652313
Andorra 2017 NaN NaN NaN NaN NaN NaN NaN NaN 77001.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Kosovo 2017 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Yemen, Rep. 2017 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 559027.0 443721.0 385742.0 311511.0 235265.0 179953.0 122274.0 72535.0 56798.0 NaN
South Africa 2017 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1778722.0 1536691.0 1304635.0 1108203.0 920619.0 684906.0 493069.0 338152.0 275995.0 NaN
Zambia 2017 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 355596.0 259410.0 196406.0 152437.0 115017.0 85429.0 60372.0 38551.0 29540.0 NaN
Zimbabwe 2017 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 342473.0 243625.0 193105.0 159731.0 133656.0 95561.0 72600.0 53412.0 44607.0 NaN

523 rows × 46 columns

It looks like there are lots of missing values, but don't be fooled. This is a quirk of wb: since the original sources differ, it does not link the countries correctly, so the same country shows up in more than one row. Let's see this by sorting the index.


In [147]:
wdi.sort_index()


Out[147]:
SP.POP.0014.FE.IN SP.POP.1564.FE.IN SP.POP.65UP.FE.IN SP.POP.0014.MA.IN SP.POP.1564.MA.IN SP.POP.65UP.MA.IN SP.POP.TOTL.FE.IN SP.POP.TOTL.MA.IN SP.POP.TOTL EN.URB.MCTY ... SP.POP.4044.FE SP.POP.4549.FE SP.POP.5054.FE SP.POP.5559.FE SP.POP.6064.FE SP.POP.6569.FE SP.POP.7074.FE SP.POP.7579.FE SP.POP.80UP.FE NY.GDP.PCAP.PP.KD
country year
Afghanistan 2017 7732365.0 9413927.0 497974.0 8122796.0 10100250.0 429088.0 17644266.0 18652134.0 36296400.0 3913297.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 2202.570851
2017 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 700156.0 562807.0 451226.0 357049.0 275515.0 218541.0 145457.0 78439.0 55537.0 NaN
Albania 2017 243452.0 967394.0 198481.0 274603.0 1005023.0 184503.0 1409327.0 1464130.0 2873457.0 NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 13093.652313
2017 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 85864.0 94264.0 101671.0 101284.0 84148.0 63784.0 52773.0 40717.0 41206.0 NaN
Algeria 2017 6005664.0 13165826.0 1310941.0 6261929.0 13399086.0 1245754.0 20482430.0 20906768.0 41389198.0 2659373.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 11550.617638
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Yemen, Rep. 2017 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 559027.0 443721.0 385742.0 311511.0 235265.0 179953.0 122274.0 72535.0 56798.0 NaN
Zambia 2017 3794229.0 4502766.0 213893.0 3859276.0 4346054.0 137471.0 8510888.0 8342800.0 16853688.0 2406227.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 3485.002103
2017 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 355596.0 259410.0 196406.0 152437.0 115017.0 85429.0 60372.0 38551.0 29540.0 NaN
Zimbabwe 2017 3024124.0 4169317.0 266180.0 3040282.0 3590650.0 146192.0 7459621.0 6777124.0 14236745.0 1509901.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 3134.327494
2017 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 342473.0 243625.0 193105.0 159731.0 133656.0 95561.0 72600.0 53412.0 44607.0 NaN

523 rows × 46 columns

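Let's aggregate by country and year so that each country has a single row with all of its data. Taking the maximum within each group works here because max ignores missing values.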


In [148]:
wdi = wdi.groupby(['country', 'year']).max()
wdi.reset_index(inplace=True)
wdi


Out[148]:
country year SP.POP.0014.FE.IN SP.POP.1564.FE.IN SP.POP.65UP.FE.IN SP.POP.0014.MA.IN SP.POP.1564.MA.IN SP.POP.65UP.MA.IN SP.POP.TOTL.FE.IN SP.POP.TOTL.MA.IN ... SP.POP.4044.FE SP.POP.4549.FE SP.POP.5054.FE SP.POP.5559.FE SP.POP.6064.FE SP.POP.6569.FE SP.POP.7074.FE SP.POP.7579.FE SP.POP.80UP.FE NY.GDP.PCAP.PP.KD
0 Afghanistan 2017 7732365.0 9.413927e+06 497974.0 8.122796e+06 1.010025e+07 429088.0 1.764427e+07 1.865213e+07 ... 700156.0 562807.0 451226.0 357049.0 275515.0 218541.0 145457.0 78439.0 55537.0 2202.570851
1 Albania 2017 243452.0 9.673940e+05 198481.0 2.746030e+05 1.005023e+06 184503.0 1.409327e+06 1.464130e+06 ... 85864.0 94264.0 101671.0 101284.0 84148.0 63784.0 52773.0 40717.0 41206.0 13093.652313
2 Algeria 2017 6005664.0 1.316583e+07 1310941.0 6.261929e+06 1.339909e+07 1245754.0 2.048243e+07 2.090677e+07 ... 1334956.0 1112079.0 950716.0 767641.0 631310.0 458099.0 325968.0 256637.0 270236.0 11550.617638
3 American Samoa 2017 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 Andorra 2017 NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
259 West Bank and Gaza 2017 855495.0 1.267122e+06 72244.0 8.935800e+05 1.300776e+06 65588.0 2.194861e+06 2.259944e+06 ... 102015.0 84835.0 69853.0 50659.0 36755.0 28050.0 20237.0 13392.0 10565.0 1183.435345
260 World 2017 941262629.0 2.422523e+09 358609161.0 1.005788e+09 2.488493e+09 290411331.0 3.722395e+09 3.784692e+09 ... 240536539.0 231501913.0 209202400.0 179317688.0 155624953.0 123519929.0 87547943.0 64952254.0 82589032.0 16167.228725
261 Yemen, Rep. 2017 5456767.0 7.919190e+06 431560.0 5.677559e+06 7.987563e+06 362182.0 1.380752e+07 1.402730e+07 ... 559027.0 443721.0 385742.0 311511.0 235265.0 179953.0 122274.0 72535.0 56798.0 NaN
262 Zambia 2017 3794229.0 4.502766e+06 213893.0 3.859276e+06 4.346054e+06 137471.0 8.510888e+06 8.342800e+06 ... 355596.0 259410.0 196406.0 152437.0 115017.0 85429.0 60372.0 38551.0 29540.0 3485.002103
263 Zimbabwe 2017 3024124.0 4.169317e+06 266180.0 3.040282e+06 3.590650e+06 146192.0 7.459621e+06 6.777124e+06 ... 342473.0 243625.0 193105.0 159731.0 133656.0 95561.0 72600.0 53412.0 44607.0 3134.327494

264 rows × 48 columns
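
To see why the aggregation recovers the data, here is a minimal sketch on made-up numbers. The frame toy and its column names are hypothetical; it only mimics the pattern above, where each country appears twice and each row is missing the columns that the other row contains.

```python
import numpy as np
import pandas as pd

# Made-up data: two rows per country-year with complementary missing values.
idx = pd.MultiIndex.from_tuples(
    [('A', '2017'), ('B', '2017'), ('A', '2017'), ('B', '2017')],
    names=['country', 'year'])
toy = pd.DataFrame({'pop_total': [100.0, 200.0, np.nan, np.nan],
                    'pop_4044_fe': [np.nan, np.nan, 7.0, 9.0]}, index=idx)

# max() skips NaN within each group, so each country-year collapses to one full row.
collapsed = toy.groupby(['country', 'year']).max().reset_index()
print(collapsed)

# first() returns the first non-missing value per group; it gives the same
# result as max() whenever the duplicated rows never disagree.
alternative = toy.groupby(['country', 'year']).first().reset_index()
print(alternative)
```

If two rows for the same country reported different non-missing values for the same column, max and first could differ, so it is worth checking for such conflicts before relying on either.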

Let's merge this data with the original wbcountries dataframe, so that we can use it to plot.


In [149]:
wdi = wbcountries.merge(wdi, left_on='name', right_on='country')
wdi


Out[149]:
iso3c iso2c name region adminregion incomeLevel lendingType capitalCity longitude latitude ... SP.POP.4044.FE SP.POP.4549.FE SP.POP.5054.FE SP.POP.5559.FE SP.POP.6064.FE SP.POP.6569.FE SP.POP.7074.FE SP.POP.7579.FE SP.POP.80UP.FE NY.GDP.PCAP.PP.KD
0 ABW AW Aruba Latin America & Caribbean High income Not classified Oranjestad -70.0167 12.51670 ... 3920.0 4249.0 4847.0 4638.0 3810.0 2928.0 2021.0 1439.0 1519.0 33966.483000
1 AFG AF Afghanistan South Asia South Asia Low income IDA Kabul 69.1761 34.52280 ... 700156.0 562807.0 451226.0 357049.0 275515.0 218541.0 145457.0 78439.0 55537.0 2202.570851
2 AGO AO Angola Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Lower middle income IBRD Luanda 13.2420 -8.81155 ... 622220.0 475507.0 379684.0 305882.0 205034.0 147574.0 106010.0 67489.0 49905.0 7310.901738
3 ALB AL Albania Europe & Central Asia Europe & Central Asia (excluding high income) Upper middle income IBRD Tirane 19.8172 41.33170 ... 85864.0 94264.0 101671.0 101284.0 84148.0 63784.0 52773.0 40717.0 41206.0 13093.652313
4 AND AD Andorra Europe & Central Asia High income Not classified Andorra la Vella 1.5218 42.50750 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
259 XKX XK Kosovo Europe & Central Asia Europe & Central Asia (excluding high income) Upper middle income IDA Pristina 20.9260 42.56500 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN 10302.075476
260 YEM YE Yemen, Rep. Middle East & North Africa Middle East & North Africa (excluding high inc... Low income IDA Sana'a 44.2075 15.35200 ... 559027.0 443721.0 385742.0 311511.0 235265.0 179953.0 122274.0 72535.0 56798.0 NaN
261 ZAF ZA South Africa Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Upper middle income IBRD Pretoria 28.1871 -25.74600 ... 1778722.0 1536691.0 1304635.0 1108203.0 920619.0 684906.0 493069.0 338152.0 275995.0 12703.421242
262 ZMB ZM Zambia Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Lower middle income IDA Lusaka 28.2937 -15.39820 ... 355596.0 259410.0 196406.0 152437.0 115017.0 85429.0 60372.0 38551.0 29540.0 3485.002103
263 ZWE ZW Zimbabwe Sub-Saharan Africa Sub-Saharan Africa (excluding high income) Lower middle income Blend Harare 31.0672 -17.83120 ... 342473.0 243625.0 193105.0 159731.0 133656.0 95561.0 72600.0 53412.0 44607.0 3134.327494

264 rows × 58 columns
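
Merging on country names only keeps the rows whose names match exactly in both tables, since merge performs an inner join by default. A quick way to check what would be lost is an outer merge with indicator=True. The sketch below uses two tiny made-up tables (countries and values are hypothetical, not the WB data).

```python
import pandas as pd

# Made-up tables: 'Country C' has no data and 'Country D' has no metadata.
countries = pd.DataFrame({'iso3c': ['AAA', 'BBB', 'CCC'],
                          'name': ['Country A', 'Country B', 'Country C']})
values = pd.DataFrame({'country': ['Country A', 'Country B', 'Country D'],
                       'pop': [1.0, 2.0, 4.0]})

# how='outer' keeps every row from both sides; indicator=True adds a '_merge'
# column taking the values 'both', 'left_only' or 'right_only'.
check = countries.merge(values, left_on='name', right_on='country',
                        how='outer', indicator=True)

# Rows that did not find a partner in the other table.
unmatched = check.loc[check['_merge'] != 'both', ['name', 'country', '_merge']]
print(unmatched)
```

An empty unmatched table means the inner merge drops nothing.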

Plot Male vs Female population in each country in 2017


In [150]:
PersistencePlot(wdi, var0='SP.POP.TOTL.FE.IN', var1='SP.POP.TOTL.MA.IN', xlabel='Number of Females',
                ylabel='Number of Males', labelvar='iso3c', linelabel='Female-Male', 
                dx=0.1, dy=0.1, filename='Female-Male-2017.pdf')


Let's take logs so we can see this better.


In [151]:
wdi['lpop_fe'] = np.log(wdi['SP.POP.TOTL.FE.IN'])
wdi['lpop_ma'] = np.log(wdi['SP.POP.TOTL.MA.IN'])
PersistencePlot(wdi, var0='lpop_fe', var1='lpop_ma', xlabel='Log[Number of Females]',
                ylabel='Log[Number of Males]', labelvar='iso3c', linelabel='Female-Male', 
                dx=0.01, dy=0.01, filename='Female-Male-2017.pdf')


It seems that the gender ratio, i.e., the number of males per female, can be quite different from 1. Let's plot the histogram of the gender ratio across countries to see this better.


In [152]:
(np.exp(wdi['lpop_ma'] - wdi['lpop_fe'])).hist()


Out[152]:
<matplotlib.axes._subplots.AxesSubplot at 0x143263910>

In [153]:
wdi['gender_ratio'] = (wdi['SP.POP.TOTL.MA.IN'] / wdi['SP.POP.TOTL.FE.IN'])
wdi.gender_ratio.hist()


Out[153]:
<matplotlib.axes._subplots.AxesSubplot at 0x142a5d880>
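
The two histograms show the same variable because exponentiating a difference of logs returns the ratio: $\exp(\log m - \log f) = m/f$. A small numerical check on made-up numbers (the arrays males and females are hypothetical):

```python
import numpy as np

# Made-up population counts for three hypothetical countries.
males = np.array([50.0, 120.0, 3000.0])
females = np.array([55.0, 100.0, 1000.0])

# Gender ratio computed directly and recovered from the log difference.
ratio_direct = males / females
ratio_from_logs = np.exp(np.log(males) - np.log(females))

# The two agree up to floating point error.
print(np.allclose(ratio_direct, ratio_from_logs))
```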

In [154]:
print('Maximum gender ratio = ', wdi.gender_ratio.max())
wdi.loc[wdi.gender_ratio>=1.05][['iso3c', 'name', 'region', 'gender_ratio']].sort_values('gender_ratio')


Maximum gender ratio =  3.1097202844667002
Out[154]:
iso3c name region gender_ratio
38 CHN China East Asia & Pacific 1.054808
224 SYC Seychelles Sub-Saharan Africa 1.056584
1 AFG Afghanistan South Asia 1.057122
167 MYS Malaysia East Asia & Pacific 1.059319
182 PAK Pakistan South Asia 1.060265
202 SAS South Asia Aggregates 1.068105
238 TSA South Asia (IDA & IBRD) Aggregates 1.068105
258 WSM Samoa East Asia & Pacific 1.071733
151 MEA Middle East & North Africa Aggregates 1.073636
5 ARB Arab World Aggregates 1.074033
107 IND India South Asia 1.082583
29 BRN Brunei Darussalam East Asia & Pacific 1.083313
216 SST Small states Aggregates 1.096263
206 SGP Singapore East Asia & Pacific 1.098113
54 DJI Djibouti Middle East & North Africa 1.113483
30 BTN Bhutan South Asia 1.123726
181 OSS Other small states Aggregates 1.130181
86 GNQ Equatorial Guinea Sub-Saharan Africa 1.245989
203 SAU Saudi Arabia Middle East & North Africa 1.343270
125 KWT Kuwait Middle East & North Africa 1.492402
150 MDV Maldives South Asia 1.620545
20 BHR Bahrain Middle East & North Africa 1.695785
180 OMN Oman Middle East & North Africa 1.929817
6 ARE United Arab Emirates Middle East & North Africa 2.280813
198 QAT Qatar Middle East & North Africa 3.109720

In [155]:
print('Minimum gender ratio = ', wdi.gender_ratio.min())
wdi.loc[wdi.gender_ratio<=0.95][['iso3c', 'name', 'region', 'gender_ratio']].sort_values('gender_ratio')


Minimum gender ratio =  0.8326826068938992
Out[155]:
iso3c name region gender_ratio
176 NPL Nepal South Asia 0.832683
49 CUW Curacao Latin America & Caribbean 0.847313
143 LVA Latvia Europe & Central Asia 0.849527
141 LTU Lithuania Europe & Central Asia 0.857032
94 HKG Hong Kong SAR, China East Asia & Pacific 0.857683
246 UKR Ukraine Europe & Central Asia 0.862200
200 RUS Russian Federation Europe & Central Asia 0.863357
23 BLR Belarus Europe & Central Asia 0.870642
209 SLV El Salvador Latin America & Caribbean 0.884228
69 EST Estonia Europe & Central Asia 0.886953
8 ARM Armenia Europe & Central Asia 0.888378
192 PRT Portugal Europe & Central Asia 0.897158
0 ABW Aruba Latin America & Caribbean 0.904285
99 HUN Hungary Europe & Central Asia 0.906684
254 VIR Virgin Islands (U.S.) Latin America & Caribbean 0.908106
263 ZWE Zimbabwe Sub-Saharan Africa 0.908508
190 PRI Puerto Rico Latin America & Caribbean 0.910474
80 GEO Georgia Europe & Central Asia 0.913288
62 ECA Europe & Central Asia (excluding high income) Aggregates 0.917302
229 TEC Europe & Central Asia (IDA & IBRD countries) Aggregates 0.919337
144 MAC Macao SAR, China East Asia & Pacific 0.923031
148 MDA Moldova Europe & Central Asia 0.923076
83 GIN Guinea Sub-Saharan Africa 0.925869
136 LKA Sri Lanka South Asia 0.926015
97 HRV Croatia Europe & Central Asia 0.927410
31 BWA Botswana Sub-Saharan Africa 0.929864
10 ATG Antigua and Barbuda Latin America & Caribbean 0.930138
158 MMR Myanmar East Asia & Pacific 0.930308
248 URY Uruguay Latin America & Caribbean 0.932800
28 BRB Barbados Latin America & Caribbean 0.933902
34 CEB Central Europe and the Baltics Aggregates 0.937790
169 NAM Namibia Sub-Saharan Africa 0.939169
75 FRA France Europe & Central Asia 0.939170
63 ECS Europe & Central Asia Aggregates 0.939541
118 KAZ Kazakhstan Europe & Central Asia 0.940358
188 POL Poland Europe & Central Asia 0.940866
163 MOZ Mozambique Sub-Saharan Africa 0.941291
21 BHS Bahamas, The Latin America & Caribbean 0.944080
114 ITA Italy Europe & Central Asia 0.945175
19 BGR Bulgaria Europe & Central Asia 0.945606
219 SVK Slovak Republic Europe & Central Asia 0.946981
222 SWZ Eswatini Sub-Saharan Africa 0.947617
199 ROU Romania Europe & Central Asia 0.948377
205 SEN Senegal Sub-Saharan Africa 0.948715

Gender ratio and development


In [156]:
wdi['lgdppc'] = np.log(wdi['NY.GDP.PCAP.PP.KD'])
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.scatterplot(x='lgdppc', y='gender_ratio', hue='region',
                hue_order=['East Asia & Pacific', 'Europe & Central Asia',
                           'Latin America & Caribbean ', 'Middle East & North Africa',
                           'North America', 'South Asia', 'Sub-Saharan Africa '],
                data=wdi.loc[wdi.region!='Aggregates'], alpha=1, style='incomeLevel', 
                style_order=['High income', 'Upper middle income', 'Lower middle income', 'Low income'],
                )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Log[GDP per capita]')
ax.set_ylabel('Gender Ratio')
plt.savefig(pathgraphs + 'Gender-Ratio-GDPpc.pdf', dpi=300, bbox_inches='tight')



In [157]:
fig


Out[157]:

Use statistical and mathematical functions to analyze the data

Now let's import the statsmodels module to run regressions.


In [158]:
import statsmodels.api as sm
import statsmodels.formula.api as smf
from IPython.display import Latex

Let's estimate the elasticity of the number of men with respect to the number of women.


In [159]:
mod = sm.OLS(wdi['lpop_ma'],sm.add_constant(wdi['lpop_fe']), missing='drop').fit()
mod.summary2()


Out[159]:
Model: OLS Adj. R-squared: 0.998
Dependent Variable: lpop_ma AIC: -306.5235
Date: 2020-06-13 11:54 BIC: -299.5706
No. Observations: 239 Log-Likelihood: 155.26
Df Model: 1 F-statistic: 1.055e+05
Df Residuals: 237 Prob (F-statistic): 5.56e-316
R-squared: 0.998 Scale: 0.016103
Coef. Std.Err. t P>|t| [0.025 0.975]
const 0.0553 0.0496 1.1149 0.2660 -0.0424 0.1529
lpop_fe 0.9968 0.0031 324.8267 0.0000 0.9908 1.0029
Omnibus: 287.615 Durbin-Watson: 1.950
Prob(Omnibus): 0.000 Jarque-Bera (JB): 13838.973
Skew: 5.192 Prob(JB): 0.000
Kurtosis: 38.803 Condition No.: 98

In [160]:
print('The elasticity is %8.4f' % mod.params['lpop_fe'])
print(r'The $R^2$ is %8.3f' % mod.rsquared)


The elasticity is   0.9968
The $R^2$ is    0.998
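
The slope is an elasticity because both variables are in logs: if $\log m = \alpha + \beta \log f$, then $\beta = d\log m / d\log f$. The sketch below illustrates this with simulated data and plain NumPy least squares rather than statsmodels. The values of alpha_true and beta_true are assumptions chosen for the illustration, and the column of ones plays the role of sm.add_constant.

```python
import numpy as np

# Simulated log female population for 200 hypothetical countries.
rng = np.random.default_rng(0)
log_f = rng.uniform(10, 18, size=200)

# Assumed "true" parameters, used only to generate the data (no noise added).
alpha_true, beta_true = 0.5, 0.9
log_m = alpha_true + beta_true * log_f

# Design matrix with an explicit constant column, then ordinary least squares.
X = np.column_stack([np.ones_like(log_f), log_f])
coef, *_ = np.linalg.lstsq(X, log_m, rcond=None)

# coef[0] is the intercept, coef[1] the slope, i.e. the elasticity.
print('intercept = %.4f, elasticity = %.4f' % (coef[0], coef[1]))
```

With noise in the data the estimates would only approximate the assumed parameters, which is the situation in the regression above.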

Let's instead use the smf module, which allows us to run the regression by writing a formula instead of having to pass the data and add the constant as a new variable. Let's run a simple regression of $\log(GDPpc)$ on the gender ratio.


In [161]:
mod = smf.ols(formula='lgdppc ~ gender_ratio', data=wdi[['lpop_ma','lpop_fe', 'lgdppc', 'gender_ratio']], missing='drop').fit()
mod.summary2()


Out[161]:
Model: OLS Adj. R-squared: 0.022
Dependent Variable: lgdppc AIC: 693.7805
Date: 2020-06-13 11:54 BIC: 700.6216
No. Observations: 226 Log-Likelihood: -344.89
Df Model: 1 F-statistic: 6.003
Df Residuals: 224 Prob (F-statistic): 0.0151
R-squared: 0.026 Scale: 1.2500
Coef. Std.Err. t P>|t| [0.025 0.975]
Intercept 8.4168 0.3909 21.5327 0.0000 7.6465 9.1870
gender_ratio 0.9251 0.3776 2.4500 0.0151 0.1810 1.6693
Omnibus: 12.884 Durbin-Watson: 1.817
Prob(Omnibus): 0.002 Jarque-Bera (JB): 7.382
Skew: -0.270 Prob(JB): 0.025
Kurtosis: 2.298 Condition No.: 10

In [162]:
mysummary=mod.summary2()
Latex(mysummary.as_latex())


Out[162]:
\begin{table} \caption{Results: Ordinary least squares} \begin{center} \begin{tabular}{llll} \hline Model: & OLS & Adj. R-squared: & 0.022 \\ Dependent Variable: & lgdppc & AIC: & 693.7805 \\ Date: & 2020-06-13 11:54 & BIC: & 700.6216 \\ No. Observations: & 226 & Log-Likelihood: & -344.89 \\ Df Model: & 1 & F-statistic: & 6.003 \\ Df Residuals: & 224 & Prob (F-statistic): & 0.0151 \\ R-squared: & 0.026 & Scale: & 1.2500 \\ \hline \end{tabular} \end{center} \hline \begin{center} \begin{tabular}{lcccccc} \hline & Coef. & Std.Err. & t & P$> |$t$|$ & [0.025 & 0.975] \\ \hline \hline \end{tabular} \begin{tabular}{lrrrrrr} Intercept & 8.4168 & 0.3909 & 21.5327 & 0.0000 & 7.6465 & 9.1870 \\ gender\_ratio & 0.9251 & 0.3776 & 2.4500 & 0.0151 & 0.1810 & 1.6693 \\ \hline \end{tabular} \end{center} \hline \begin{center} \begin{tabular}{llll} \hline Omnibus: & 12.884 & Durbin-Watson: & 1.817 \\ Prob(Omnibus): & 0.002 & Jarque-Bera (JB): & 7.382 \\ Skew: & -0.270 & Prob(JB): & 0.025 \\ Kurtosis: & 2.298 & Condition No.: & 10 \\ \hline \end{tabular} \end{center} \end{table}

In [163]:
print('The semi-elasticity is %2.4f' % mod.params['gender_ratio'])
print(r'The $R^2$ is %1.3f' % mod.rsquared)


The semi-elasticity is 0.9251
The $R^2$ is 0.026

But of course we know correlation is not causation! Moreover, from our figure we know that the positive association is driven by the rich oil-producing countries of the Middle East & North Africa. To see this, let's replicate the analysis without those countries.


In [164]:
mod = smf.ols(formula='lgdppc ~ gender_ratio', data=wdi.loc[wdi.region!='Middle East & North Africa'][['lpop_ma','lpop_fe', 'lgdppc', 'gender_ratio']], missing='drop').fit()
mod.summary2()


Out[164]:
Model: OLS Adj. R-squared: 0.004
Dependent Variable: lgdppc AIC: 639.2589
Date: 2020-06-13 11:54 BIC: 645.9244
No. Observations: 207 Log-Likelihood: -317.63
Df Model: 1 F-statistic: 1.919
Df Residuals: 205 Prob (F-statistic): 0.167
R-squared: 0.009 Scale: 1.2722
Coef. Std.Err. t P>|t| [0.025 0.975]
Intercept 10.8771 1.1291 9.6336 0.0000 8.6510 13.1032
gender_ratio -1.5789 1.1397 -1.3853 0.1675 -3.8260 0.6682
Omnibus: 12.094 Durbin-Watson: 1.734
Prob(Omnibus): 0.002 Jarque-Bera (JB): 6.673
Skew: -0.255 Prob(JB): 0.036
Kurtosis: 2.283 Condition No.: 29

In [165]:
print('The semi-elasticity is %2.4f with a p-value of %1.4f' % (mod.params['gender_ratio'], mod.pvalues['gender_ratio']))
print(r'The $R^2$ is %1.3f' % mod.rsquared)
print("Luckily we had plotted the data, right?!")


The semi-elasticity is -1.5789 with a p-value of 0.1675
The $R^2$ is 0.009
Luckily we had plotted the data, right?!
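
A handful of observations far from the rest can determine the sign of an OLS slope. The following sketch reproduces that pattern with entirely made-up numbers (not the WB data), using np.polyfit as a stand-in for the regressions above.

```python
import numpy as np

# Core group: gender ratios near 1 with a mildly negative relation to log income.
x_core = np.linspace(0.9, 1.1, 20)
y_core = 10.0 - 1.5 * (x_core - 1.0)

# Three made-up outliers: very high gender ratio and high income.
x_out = np.array([2.0, 2.5, 3.0])
y_out = np.array([11.0, 11.3, 11.6])

x_all = np.concatenate([x_core, x_out])
y_all = np.concatenate([y_core, y_out])

# np.polyfit with degree 1 returns [slope, intercept].
slope_all = np.polyfit(x_all, y_all, 1)[0]
slope_core = np.polyfit(x_core, y_core, 1)[0]
print('slope with outliers: %.3f, without: %.3f' % (slope_all, slope_core))
```

Three points out of twenty-three are enough to turn a negative slope into a positive one, which is why plotting the data before running regressions pays off.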

Homework

Using pandas and statsmodels, write a Jupyter notebook that:

  1. Uses the data from the Maddison Project to plot the evolution of total population across the world.
  2. Plots the evolution of the share of the world population by countries and WB regions.
  3. Downloads fertility, mortality and life expectancy data from the WB and plots its evolution in the last 60 years.
  4. Downloads mortality and life expectancy data (across regions and cohorts) from the Human Mortality Database and plots its evolution.
  5. Uses this data to analyze the convergence of life expectancy, mortality and fertility.

Submit your notebook as a pull request to the course's GitHub repository.

Wages and Population In England 1200-1860

Let's get the population and wage series from Greg Clark's website for plotting.


In [166]:
uk1 = pd.read_excel('http://faculty.econ.ucdavis.edu/faculty/gclark/English%20Data/England%20NNI%20-%20Clark%20-%202015.xlsx', sheet_name='Decadal')
uk2 = pd.read_excel('http://faculty.econ.ucdavis.edu/faculty/gclark/English%20Data/Wages%202014.xlsx', sheet_name='Decadal')

In [167]:
uk1


Out[167]:
Decade Unnamed: 1 Pop England Share Males farm sector Male Farm Wage Male Non-Farm Wage Male average Wage Male Work Days per Year Total Wage Income Land rents ... All Capital Income Indirect Taxes Net National Income Unnamed: 15 Price Index - Domestic Expenditure Price Index - GDP Price Index - Cost of Living Unnamed: 19 Real Net National Income (DE) Real NNI/N
0 NaN NaN m. NaN d./day d./day d./day NaN (₤ m) (₤ m) ... (₤ m) (₤ m) (₤ m) NaN (1860s=100) (1860s=100) (1860s=100) NaN (1860s=100) (1860s=100)
1 1200.0 NaN 3.39595 0.555168 1.37365 2.28282 2.08878 300.0 3.07847 1.60604 ... 1.74125 0 6.42576 NaN 6.58634 7.12642 6.5442 NaN 14.8972 86.6214
2 1210.0 NaN 3.39595 0.575784 1.26945 1.84928 2.02114 300.0 3.20043 1.60604 ... 1.95638 0 6.76285 NaN 7.49473 8.1093 7.57584 NaN 14.0425 81.6513
3 1220.0 NaN 3.738 0.626021 1.25538 2.13595 1.94733 300.0 3.39416 1.62895 ... 1.97144 0 6.99455 NaN 8.33274 9.01602 8.53557 NaN 13.1437 69.432
4 1230.0 NaN 3.9039 0.652303 1.17893 NaN 1.84872 300.0 3.3653 1.33146 ... 2.04084 0 6.7376 NaN 8.2654 8.94316 8.40574 NaN 12.4624 63.035
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
63 1820.0 NaN 11.9821 0.345313 20.3334 34.5349 34.3278 300.0 191.868 38.1915 ... 78.7788 29.1646 338.003 NaN 108.479 112.086 110.194 NaN 48.1282 79.2907
64 1830.0 NaN 13.7732 0.308229 20.0429 35.3837 35.4298 300.0 227.646 36.5573 ... 93.748 25.8767 383.828 NaN 100.892 102.972 101.269 NaN 58.5932 84.0351
65 1840.0 NaN 15.6365 0.264763 21.0963 36.1676 37.0167 300.0 269.977 39.1656 ... 101.875 26.1843 437.202 NaN 96.8991 97.8146 98.7991 NaN 69.559 87.7247
66 1850.0 NaN 17.5896 0.246630 22.0997 37.8408 39.1299 300.0 321.387 39.4743 ... 124.452 28.3904 513.703 NaN 93.3178 93.1664 95.1283 NaN 84.549 94.9057
67 1860.0 NaN 19.7222 0.239390 23.6258 43.5979 44.6595 300.0 411.413 43.1763 ... 168.819 30.283 653.692 NaN 99.9493 99.9555 99.9962 NaN 100.343 100.349

68 rows × 22 columns


In [168]:
uk2


Out[168]:
Decade Farm Laborers, d/day Coal Miners, d./day Building Laborers, d/day Building Craftsmen, d/day Unnamed: 5 Cost of Living (1860s=100) Unnamed: 7 Real Farm Wage (1860s=100) Real Building Laborer Wage (1860s=100) Real Building Craftsman Wage (1860s=100)
0 1200 1.373647 NaN NaN 2.783922 NaN 6.544197 NaN 88.841573 NaN 80.673336
1 1210 1.262561 NaN NaN 2.078984 NaN 7.575843 NaN 72.045676 NaN 52.335306
2 1220 1.249455 NaN 1.625946 2.602945 NaN 8.535567 NaN 60.578574 51.791535 56.307104
3 1230 1.178929 NaN NaN NaN NaN 8.405740 NaN 59.258095 NaN NaN
4 1240 1.246828 NaN 1.878412 2.893921 NaN 8.871055 NaN 61.132054 58.464596 62.484216
... ... ... ... ... ... ... ... ... ... ... ...
62 1820 20.333416 32.226677 27.009300 42.060419 NaN 110.194354 NaN 78.081590 71.212912 72.500372
63 1830 20.042939 32.680000 28.021165 42.746221 NaN 101.268842 NaN 83.892814 80.390114 80.295861
64 1840 21.096252 30.920000 29.023687 43.311592 NaN 98.771980 NaN 90.604982 85.635493 83.439177
65 1850 22.099690 36.680000 30.103970 45.577598 NaN 95.128327 NaN 98.270928 92.231871 91.251668
66 1860 23.625775 41.760000 34.466257 52.729581 NaN 99.996226 NaN 100.013083 100.110361 100.049356

67 rows × 11 columns

Let's clean the data and merge it into a single dataframe.


In [169]:
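# Drop the units row (index 0) and the empty 'Unnamed' spacer columns
uk1 = uk1.loc[uk1.index.difference([0])].reset_index(drop=True)[[col for col in uk1.columns if col.find('Unnamed')==-1]]
uk2 = uk2[[col for col in uk2.columns if col.find('Unnamed')==-1]]
# Merge on the columns the two tables have in common
uk = uk1.merge(uk2)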
uk.Decade = uk.Decade.astype(int)
uk['Pop England'] = uk['Pop England'].astype(float)
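
The cleaning steps can be tried out on a small table first. The sketch below uses a made-up frame raw that mimics the layout of the Excel sheet (a units row at the top and an empty 'Unnamed' column); the numbers are invented, and the column filter uses str.startswith as one possible alternative to the list comprehension.

```python
import numpy as np
import pandas as pd

# Made-up table: row 0 holds units, 'Unnamed: 1' is an empty spacer column.
raw = pd.DataFrame({'Decade': [np.nan, 1200.0, 1210.0],
                    'Unnamed: 1': [np.nan, np.nan, np.nan],
                    'Pop England': ['m.', 3.4, 3.5]})

# Drop the units row and renumber the remaining rows from zero.
clean = raw.iloc[1:].reset_index(drop=True)

# Keep only the columns whose name does not start with 'Unnamed'.
clean = clean.loc[:, ~clean.columns.str.startswith('Unnamed')]

# Convert the remaining columns to numeric types.
clean = clean.astype({'Decade': int, 'Pop England': float})
print(clean.dtypes)
```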

In [170]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='Decade', y='Pop England', data=uk.loc[uk.Decade<1730], alpha=1, label='Population', color='r')
ax2 = ax.twinx()
sns.lineplot(x='Decade', y='Real Farm Wage (1860s=100)', data=uk.loc[uk.Decade<1730], alpha=1, label='Real Wages', color='b')
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
handles, labels = ax.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax.legend(handles=(handles+handles2), labels=(labels+labels2), loc='upper left')
ax2.legend(handles=(handles+handles2), labels=(labels+labels2), loc='upper left')
nticks = 7
ax.yaxis.set_major_locator(matplotlib.ticker.LinearLocator(nticks))
ax2.yaxis.set_major_locator(matplotlib.ticker.LinearLocator(nticks))
ax.set_xlabel('Year')
ax.set_ylabel('Population (millions)')
plt.savefig(pathgraphs + 'UK-pop-GDPpc-1200-1730.pdf', dpi=300, bbox_inches='tight')



In [171]:
fig


Out[171]:
