Working with Economic data in Python

This notebook will introduce you to working with data in Python. You can use packages like NumPy to manipulate arrays and matrices and to do computations with them (see my Introduction to Python). But given the needs of economists (and other scientists), it will be advantageous for us to use pandas. pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for Python. pandas allows you to import and process data in many useful ways. It works well with many packages that complement it, which makes it a very powerful tool for data analysis.

With pandas you can

  1. Import many types of data, including
    • CSV files
    • Tab or other types of delimited files
    • Excel (xls, xlsx) files
    • Stata files
  2. Open files directly from a website
  3. Merge, select, join data
  4. Perform statistical analyses
  5. Create plots of your data

and much more. Let's start by importing pandas and use it to download some data and create some of the figures from the lecture notes. Note that when importing pandas it is customary to assign it the alias pd. I suggest you follow this convention, which will make using other people's code and snippets easier.
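To make the first item of the list concrete, here is a minimal sketch that reads the same small, made-up table twice: once as comma-separated text and once as tab-separated text. `StringIO` only stands in for a file so the example runs on its own; with real data you would pass a path or a URL to `pd.read_csv` instead.

```python
from io import StringIO

import pandas as pd

# Made-up toy table, written once with commas and once with tabs.
csv_text = "country,year,value\nColombia,2016,1.0\nChile,2016,2.0\n"
tsv_text = "country\tyear\tvalue\nColombia\t2016\t1.0\nChile\t2016\t2.0\n"

# read_csv splits on commas by default...
df_csv = pd.read_csv(StringIO(csv_text))
# ...and on any other delimiter passed through sep (here a tab).
df_tsv = pd.read_csv(StringIO(tsv_text), sep="\t")

print(df_csv)
print(df_csv.equals(df_tsv))  # both parsers build the same table
```

Functions such as `pd.read_excel` and `pd.read_stata`, used later in this notebook, follow the same pattern.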


In [1]:
# Let's import pandas and some other basic packages we will use
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

Working with Pandas

The basic structures in pandas are pd.Series and pd.DataFrame. You can think of a pd.Series as a labeled vector that contains data and comes with a large set of functions that can easily be applied to it. A pd.DataFrame is similar to a table/matrix of multidimensional data where each column contains a pd.Series. I know...this may not explain much, so let's start with some actual examples. Let's create two series, one containing some country names and another containing some fictitious data.


In [2]:
countries = pd.Series(['Colombia', 'Turkey', 'USA', 'Germany', 'Chile'], name='country')
print(countries)
print('\n', 'There are ', countries.shape[0], 'countries in this series.')


0    Colombia
1      Turkey
2         USA
3     Germany
4       Chile
Name: country, dtype: object

 There are  5 countries in this series.

Notice that we have assigned a name to the series that is different from the name of the variable containing the series. Our print(countries) statement shows the series and its contents, its name, and the dtype of the data it contains. Here our series is only composed of strings, so pandas assigns it the object dtype (not important for now, but we will use this later to convert data between types, e.g., strings to integers or floats, or the other way around).
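A minimal sketch of such conversions, using toy series rather than the notebook's data: `astype` converts when every element can be converted, and `pd.to_numeric(..., errors='coerce')` turns entries it cannot parse (such as `'n.a.'`, which appears in the Maddison spreadsheet used below) into NaN instead of raising an error.

```python
import pandas as pd

# Toy series of numbers stored as text.
as_text = pd.Series(['1', '2', '3'], name='as_text')

# astype works when every element can be converted...
as_int = as_text.astype(int)

# ...while to_numeric with errors='coerce' turns unparseable entries into NaN.
mixed = pd.Series(['1.5', 'n.a.', '3'])
as_float = pd.to_numeric(mixed, errors='coerce')

# Converting back to strings uses astype as well.
back_to_text = as_int.astype(str)

print(as_int.dtype, as_float.tolist())
```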

Let's create the data using some of the functions we already learned.


In [3]:
np.random.seed(123456)
data = pd.Series(np.random.normal(size=(countries.shape)), name='noise')
print(data)
print('\n', 'The average in this sample is ', data.mean())


0    0.469112
1   -0.282863
2   -1.509059
3   -1.135632
4    1.212112
Name: noise, dtype: float64

 The average in this sample is  -0.24926597871826645

Here we have used the mean() method of the series to compute its mean. Series have many other methods and attributes, including std(), shape, count(), max(), and min(). You can access them by writing series.name_of_method_or_attribute. To see what is available, type the name of the series followed by a period and press Tab.
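For instance, here is a sketch that regenerates the same kind of noise series and calls a few of these methods; describe() is an extra one that bundles several summary statistics at once.

```python
import numpy as np
import pandas as pd

np.random.seed(123456)
noise = pd.Series(np.random.normal(size=5), name='noise')

print(noise.std())               # sample standard deviation (ddof=1 by default)
print(noise.count())             # number of non-missing observations
print(noise.max(), noise.min())
print(noise.shape)               # an attribute, so no parentheses

# describe() returns count, mean, std, min, quartiles and max in one series.
summary = noise.describe()
print(summary)
```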

Let's create a pd.DataFrame using these two series.


In [4]:
df = pd.DataFrame([countries, data])
df


Out[4]:
                0          1         2         3        4
country  Colombia     Turkey       USA   Germany    Chile
noise    0.469112  -0.282863  -1.50906  -1.13563  1.21211

Not exactly what we'd like, but don't worry, we can just transpose it so it has each country with its data in a row.


In [5]:
df = df.T
df


Out[5]:
    country      noise
0  Colombia   0.469112
1    Turkey  -0.282863
2       USA   -1.50906
3   Germany   -1.13563
4     Chile    1.21211
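A caveat worth knowing: because each row passed to `pd.DataFrame` mixed strings and floats, the transposed `df` stores every column, including `noise`, with the generic `object` dtype. Computations still work, but they may be slower than on a proper numeric column. One alternative, sketched below, is to place the two series side by side with `pd.concat(..., axis=1)`, which keeps `noise` as `float64`; calling `df.infer_objects()` after the transpose is another option.

```python
import numpy as np
import pandas as pd

countries = pd.Series(['Colombia', 'Turkey', 'USA', 'Germany', 'Chile'], name='country')
np.random.seed(123456)
data = pd.Series(np.random.normal(size=countries.shape), name='noise')

# Rows that mix strings and floats, then a transpose: every column ends up as object.
df_t = pd.DataFrame([countries, data]).T
print(df_t.dtypes)

# Concatenating the series as columns keeps each column's own dtype.
df_c = pd.concat([countries, data], axis=1)
print(df_c.dtypes)
```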

Now let us add some more data to this dataframe. This is done easily by defining new columns. Let's create the square of noise, the sum of noise and its square, and the length of each country's name.


In [6]:
df['noise_sq'] = df.noise**2
df['noise and its square'] = df.noise + df.noise_sq
df['name length'] = df.country.apply(len)
df


Out[6]:
    country      noise   noise_sq  noise and its square  name length
0  Colombia   0.469112   0.220066              0.689179            8
1    Turkey  -0.282863  0.0800117             -0.202852            6
2       USA   -1.50906    2.27726              0.768199            3
3   Germany   -1.13563    1.28966              0.154029            7
4     Chile    1.21211    1.46922               2.68133            5

This shows some of the ways in which you can create new data. Especially useful is the apply method, which applies a function to each element of a series. You can also apply a function to a whole dataframe, row by row or column by column, which is useful if you want to perform computations that use several columns.
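A sketch of the row-wise case on a small made-up frame: with `axis=1`, `apply` passes each row to the function as a series indexed by the column names. For plain arithmetic like this, the vectorized column expression gives the same result and is usually much faster, so `apply` is best kept for logic that cannot be vectorized.

```python
import pandas as pd

# Made-up toy frame; the column names echo the ones used above.
df = pd.DataFrame({'noise': [0.5, -1.5], 'noise_sq': [0.25, 2.25]},
                  index=['Colombia', 'USA'])

# axis=1: the lambda receives one row at a time.
df['ratio'] = df.apply(lambda row: row['noise_sq'] / abs(row['noise']), axis=1)

# The equivalent vectorized expression operates on whole columns at once.
df['ratio_vec'] = df['noise_sq'] / df['noise'].abs()
print(df)
```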

Let's see some other ways in which we can interact with dataframes. First, let's select some observations, e.g., all countries in South America.


In [7]:
# Let's create a list of South American countries
south_america = ['Colombia', 'Chile']
# Select the rows for South American countries
df.loc[df.country.apply(lambda x: x in south_america)]


Out[7]:
    country      noise   noise_sq  noise and its square  name length
0  Colombia   0.469112   0.220066              0.689179            8
4     Chile    1.21211    1.46922               2.68133            5

Now let's use this to create a dummy indicating whether a country belongs to South America. To understand what is going on let's show the result of the condition for selecting rows.


In [8]:
df.country.apply(lambda x: x in south_america)


Out[8]:
0     True
1    False
2    False
3    False
4     True
Name: country, dtype: bool

So in the previous selection we told pandas which rows to include by passing a series of booleans (True, False). We can use this result to create the dummy; we only need to convert the output to int.


In [9]:
df['South America'] = df.country.apply(lambda x: x in south_america).astype(int)
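pandas also has a built-in method for exactly this kind of membership test, `Series.isin`, which avoids the `lambda`. A sketch on a stripped-down version of the frame:

```python
import pandas as pd

df = pd.DataFrame({'country': ['Colombia', 'Turkey', 'USA', 'Germany', 'Chile']})
south_america = ['Colombia', 'Chile']

# isin returns a boolean series, like apply(lambda x: x in south_america).
mask = df['country'].isin(south_america)

selected = df.loc[mask]                  # row selection
df['South America'] = mask.astype(int)   # 0/1 dummy
print(df)
```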

Now, let's plot the various series in the dataframe.


In [10]:
df.plot()


Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x1275829a0>

Neither pretty nor useful. Notice that pandas used the row numbers as the x-axis labels. Let's change the row labels, which are stored in the dataframe's index, by making the country names the index.


In [11]:
df = df.set_index('country')
print(df)
df.plot()


             noise   noise_sq noise and its square  name length  South America
country                                                                       
Colombia  0.469112   0.220066             0.689179            8              1
Turkey   -0.282863  0.0800117            -0.202852            6              0
USA       -1.50906    2.27726             0.768199            3              0
Germany   -1.13563    1.28966             0.154029            7              0
Chile      1.21211    1.46922              2.68133            5              1
Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x12968dcd0>

Better, but still not very informative. Below we will improve on this when we work with some real data.

Notice that by using the set_index method we have set the index to the country names. This makes it easy to select data by label. E.g., if we want to see only the row for Colombia, we can use loc:


In [12]:
df.loc['Colombia']


Out[12]:
noise                   0.469112
noise_sq                0.220066
noise and its square    0.689179
name length                    8
South America                  1
Name: Colombia, dtype: object
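`.loc` also accepts lists of labels and a second argument that selects columns. A sketch of the common patterns on a small made-up frame (values rounded for illustration):

```python
import pandas as pd

df = pd.DataFrame({'noise': [0.47, -0.28, 1.21], 'name length': [8, 6, 5]},
                  index=pd.Index(['Colombia', 'Turkey', 'Chile'], name='country'))

one_row = df.loc['Colombia']                    # a single row, returned as a Series
two_rows = df.loc[['Colombia', 'Chile']]        # a list of labels returns a DataFrame
one_value = df.loc['Colombia', 'name length']   # row label and column label give a scalar
sub = df.loc[['Colombia', 'Chile'], ['noise']]  # rows and columns, both by label
print(sub)
```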

Getting data

One of the nice features of pandas and its ecosystem is that it makes obtaining data very easy. To illustrate this, and also to revisit some of the basic facts of comparative development, let's download some data from various sources. Some sources may require you to create an account in order to access and download the data (sometimes the process is very simple and does not require an actual project...in other cases you need to propose a project and be approved...usually due to privacy concerns with micro-data). Don't be afraid: all these sources are free and widely used in research, so it is good that you learn to use them. Let's start with a list of useful sources.

Country-level economic data

Censuses, Surveys, and other micro-level data

  • IPUMS: provides census and survey data from around the world integrated across time and space.
  • General Social Survey provides survey data on what Americans think and feel about such issues as national spending priorities, crime and punishment, intergroup relations, and confidence in institutions.
  • European Social Survey provides survey measures on the attitudes, beliefs and behaviour patterns of diverse European populations in more than thirty nations.
  • UK Data Service is the UK’s largest collection of social, economic and population data resources.
  • SHRUG (the Socioeconomic High-resolution Rural-Urban Geographic Platform for India) provides access to dozens of datasets covering India’s 500,000 villages and 8000 towns, using a set of common geographic identifiers that span 25 years.

Divergence - Big time

To study the divergence across countries, let's download and plot historical GDP and population data. To keep the data and avoid downloading it from scratch every time, we'll create a folder ./data in the current directory and save each file there (plus a folder ./graphs for the figures). We'll also make sure that each file is downloaded only if it does not already exist locally. We'll use the os package to create the directories.

Setting up paths


In [13]:
import os

pathout = './data/'

if not os.path.exists(pathout):
    os.mkdir(pathout)
    
pathgraphs = './graphs/'
if not os.path.exists(pathgraphs):
    os.mkdir(pathgraphs)
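The check-then-create pattern above can also be written as a single call: `os.makedirs` with `exist_ok=True` creates the folder (and any missing parent folders) only if it is missing. A minimal sketch that works inside a temporary directory so it does not touch ./data/ or ./graphs/:

```python
import os
import tempfile

# Temporary base directory so this sketch leaves the notebook's folders alone.
base = tempfile.mkdtemp()
pathout = os.path.join(base, 'data')

# Creates the folder if needed; does nothing if it already exists.
os.makedirs(pathout, exist_ok=True)
os.makedirs(pathout, exist_ok=True)  # a second call is harmless
print(os.path.isdir(pathout))
```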

Download New Maddison Project Data


In [14]:
try:
    # Read the local copies if they have already been downloaded
    maddison_new = pd.read_stata(pathout + 'Maddison2018.dta')
    maddison_new_region = pd.read_stata(pathout + 'Maddison2018_region.dta')
    maddison_new_1990 = pd.read_stata(pathout + 'Maddison2018_1990.dta')
except FileNotFoundError:
    # Otherwise download each file from the Maddison Project website and save it locally
    maddison_new = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2018.dta')
    maddison_new.to_stata(pathout + 'Maddison2018.dta', write_index=False, version=117)
    maddison_new_region = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2018_region_data.dta')
    maddison_new_region.to_stata(pathout + 'Maddison2018_region.dta', write_index=False, version=117)
    maddison_new_1990 = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2018_1990bm.dta')
    maddison_new_1990.to_stata(pathout + 'Maddison2018_1990.dta', write_index=False, version=117)

In [15]:
maddison_new


Out[15]:
countrycode country year cgdppc rgdpnapc pop i_cig i_bm
0 AFG Afghanistan 1820.0 NaN NaN 3280.0 NaN NaN
1 AFG Afghanistan 1870.0 NaN NaN 4207.0 NaN NaN
2 AFG Afghanistan 1913.0 NaN NaN 5730.0 NaN NaN
3 AFG Afghanistan 1950.0 2392.0 2392.0 8150.0 Extrapolated NaN
4 AFG Afghanistan 1951.0 2422.0 2422.0 8284.0 Extrapolated NaN
... ... ... ... ... ... ... ... ...
19868 ZWE Zimbabwe 2012.0 1623.0 1604.0 12620.0 Extrapolated NaN
19869 ZWE Zimbabwe 2013.0 1801.0 1604.0 13183.0 Extrapolated NaN
19870 ZWE Zimbabwe 2014.0 1797.0 1594.0 13772.0 Extrapolated NaN
19871 ZWE Zimbabwe 2015.0 1759.0 1560.0 14230.0 Extrapolated NaN
19872 ZWE Zimbabwe 2016.0 1729.0 1534.0 14547.0 Extrapolated NaN

19873 rows × 8 columns

This dataset is in long format: each row holds one country-year observation. Also, notice that the year is not an integer. Let's correct this:


In [16]:
maddison_new['year'] = maddison_new.year.astype(int)
maddison_new


Out[16]:
countrycode country year cgdppc rgdpnapc pop i_cig i_bm
0 AFG Afghanistan 1820 NaN NaN 3280.0 NaN NaN
1 AFG Afghanistan 1870 NaN NaN 4207.0 NaN NaN
2 AFG Afghanistan 1913 NaN NaN 5730.0 NaN NaN
3 AFG Afghanistan 1950 2392.0 2392.0 8150.0 Extrapolated NaN
4 AFG Afghanistan 1951 2422.0 2422.0 8284.0 Extrapolated NaN
... ... ... ... ... ... ... ... ...
19868 ZWE Zimbabwe 2012 1623.0 1604.0 12620.0 Extrapolated NaN
19869 ZWE Zimbabwe 2013 1801.0 1604.0 13183.0 Extrapolated NaN
19870 ZWE Zimbabwe 2014 1797.0 1594.0 13772.0 Extrapolated NaN
19871 ZWE Zimbabwe 2015 1759.0 1560.0 14230.0 Extrapolated NaN
19872 ZWE Zimbabwe 2016 1729.0 1534.0 14547.0 Extrapolated NaN

19873 rows × 8 columns
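The conversion works here because no year is missing. If some were, `astype(int)` would raise an error, since NumPy integer arrays cannot hold NaN. A sketch with toy data of a safer route, assuming a pandas version that has the nullable `'Int64'` dtype (0.24 or later):

```python
import numpy as np
import pandas as pd

# Toy long-format data: one row per country-year, years stored as floats,
# with one year deliberately left missing.
long_df = pd.DataFrame({'countrycode': ['AFG', 'AFG', 'ZWE'],
                        'year': [1820.0, 1870.0, np.nan]})

# Check for missing values before converting...
has_missing = long_df['year'].isna().any()

# ...because astype(int) would fail here, while the nullable 'Int64' dtype
# keeps the missing year as <NA>.
long_df['year'] = long_df['year'].astype('Int64')
print(long_df)
```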

Original Maddison Data

Now, let's download, save, and read the original Maddison database. Since the original file is an Excel file with different data on each sheet, we will use pd.read_excel and read each sheet we need separately.


In [17]:
if not os.path.exists(pathout + 'Maddison_original.xls'):
    # urllib.request must be imported explicitly; "import urllib" alone does not load it
    import urllib.request
    dataurl = "http://www.ggdc.net/maddison/Historical_Statistics/horizontal-file_02-2010.xls"
    urllib.request.urlretrieve(dataurl, pathout + 'Maddison_original.xls')

Some data munging

This dataset is not very nicely structured for importing, as you can see if you open it in Excel. I suggest you do so, to better see what is going on. Notice that the first two rows have no data, every second column is empty, and there are a few empty rows. Let's import the data and clean it so we can plot and analyse it better.


In [18]:
maddison_old_pop = pd.read_excel(pathout + 'Maddison_original.xls', sheet_name="Population", skiprows=2)
maddison_old_pop


Out[18]:
Unnamed: 0 1 Unnamed: 2 1000 Unnamed: 4 1500 Unnamed: 6 1600 Unnamed: 8 1700 ... 2002 2003 2004 2005 2006 2007 2008 2009 Unnamed: 201 2030
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 500.0 NaN 700.0 NaN 2000.0 NaN 2500.0 NaN 2500.0 ... 8148.312 8162.656 8174.762 8184.691 8192.880 8199.783 8205.533 8210 NaN 8120.000
2 Belgium 300.0 NaN 400.0 NaN 1400.0 NaN 1600.0 NaN 2000.0 ... 10311.970 10330.824 10348.276 10364.388 10379.067 10392.226 10403.951 10414 NaN 10409.000
3 Denmark 180.0 NaN 360.0 NaN 600.0 NaN 650.0 NaN 700.0 ... 5374.693 5394.138 5413.392 5432.335 5450.661 5468.120 5484.723 5501 NaN 5730.488
4 Finland 20.0 NaN 40.0 NaN 300.0 NaN 400.0 NaN 400.0 ... 5193.039 5204.405 5214.512 5223.442 5231.372 5238.460 5244.749 5250 NaN 5201.445
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
273 Guadeloupe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 435.739 440.189 444.515 448.713 452.776 456.698 460.486 n.a. NaN 523.493
274 Guyana (Fr.) NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 182.333 186.917 191.309 195.506 199.509 203.321 206.941 n.a. NaN 272.781
275 Martinique NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 422.277 425.966 429.510 432.900 436.131 439.202 442.119 n.a. NaN 486.714
276 Reunion NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 743.981 755.171 766.153 776.948 787.584 798.094 808.506 n.a. NaN 1025.217
277 Total NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1784.330 1808.243 1831.487 1854.067 1876.000 1897.315 1918.052 n.a. NaN 2308.205

278 rows × 203 columns


In [19]:
maddison_old_gdppc = pd.read_excel(pathout + 'Maddison_original.xls', sheet_name="PerCapita GDP", skiprows=2)
maddison_old_gdppc


Out[19]:
Unnamed: 0 1 Unnamed: 2 1000 Unnamed: 4 1500 Unnamed: 6 1600 Unnamed: 8 1700 ... 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 425.000000 NaN 425.000000 NaN 707 NaN 837.200000 NaN 993.200000 ... 20065.093878 20691.415561 20812.893753 20955.874051 21165.047259 21626.929322 22140.725899 22892.682427 23674.041130 24130.547035
2 Belgium 450.000000 NaN 425.000000 NaN 875 NaN 975.625000 NaN 1144.000000 ... 19964.428266 20656.458570 20761.238278 21032.935511 21205.859281 21801.602508 22246.561977 22881.632810 23446.949672 23654.763464
3 Denmark 400.000000 NaN 400.000000 NaN 738.333 NaN 875.384615 NaN 1038.571429 ... 22254.890572 22975.162513 23059.374968 23082.620719 23088.582457 23492.664119 23972.564284 24680.492880 24995.245167 24620.568805
4 Finland 400.000000 NaN 400.000000 NaN 453.333 NaN 537.500000 NaN 637.500000 ... 18855.985066 19770.363126 20245.896529 20521.702225 20845.802738 21574.406196 22140.573208 23190.283543 24131.519569 24343.586318
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
190 Total Africa 472.352941 NaN 424.767802 NaN 413.71 NaN 422.071584 NaN 420.628684 ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474
191 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
192 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
193 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
194 World Average 466.752281 NaN 453.402162 NaN 566.389 NaN 595.783856 NaN 614.853602 ... 5833.255492 6037.675887 6131.705471 6261.734267 6469.119575 6738.281333 6960.031035 7238.383483 7467.648232 7613.922924

195 rows × 200 columns

Let's start by renaming the first column, which has the region/country names.


In [20]:
maddison_old_pop.rename(columns={'Unnamed: 0':'Country'}, inplace=True)
maddison_old_gdppc.rename(columns={'Unnamed: 0':'Country'}, inplace=True)

Now let's drop all the columns that do not have data.


In [21]:
maddison_old_pop = maddison_old_pop[[col for col in maddison_old_pop.columns if not str(col).startswith('Unnamed')]]
maddison_old_gdppc = maddison_old_gdppc[[col for col in maddison_old_gdppc.columns if not str(col).startswith('Unnamed')]]
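The list comprehension can also be replaced by a boolean mask over the column labels. A sketch on a toy frame that mimics the spreadsheet layout (the values are made up):

```python
import pandas as pd

# Toy frame: data columns separated by empty 'Unnamed' ones, as in the spreadsheet.
raw = pd.DataFrame([['Austria', 500.0, None, 700.0]],
                   columns=['Country', 1, 'Unnamed: 2', 1000])

# Labels mix strings and integers, so cast them to str before matching.
keep = ~raw.columns.astype(str).str.startswith('Unnamed')
clean = raw.loc[:, keep]
print(clean)
```

The original labels, including the integer years, are preserved; only the matching is done on their string form.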

Now, let's rename the columns so they reflect the underlying variable.


In [22]:
maddison_old_pop.columns = ['Country'] + ['pop_'+str(col) for col in maddison_old_pop.columns[1:]]
maddison_old_gdppc.columns = ['Country'] + ['gdppc_'+str(col) for col in maddison_old_gdppc.columns[1:]]
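Another way to prefix every year column is `DataFrame.add_prefix`, sketched here on a toy frame; moving `Country` into the index first keeps it from being renamed.

```python
import pandas as pd

# Toy frame with integer year labels, like the imported sheets.
raw = pd.DataFrame({'Country': ['Austria', 'Belgium'], 1: [500.0, 300.0], 1000: [700.0, 400.0]})

# add_prefix turns every column label (integers included) into 'pop_<label>'.
renamed = raw.set_index('Country').add_prefix('pop_').reset_index()
print(renamed.columns.tolist())
```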

In [23]:
maddison_old_pop


Out[23]:
Country pop_1 pop_1000 pop_1500 pop_1600 pop_1700 pop_1820 pop_1821 pop_1822 pop_1823 ... pop_2001 pop_2002 pop_2003 pop_2004 pop_2005 pop_2006 pop_2007 pop_2008 pop_2009 pop_2030
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 500.0 700.0 2000.0 2500.0 2500.0 3369.0 3386.0 3402.0 3419.0 ... 8131.690 8148.312 8162.656 8174.762 8184.691 8192.880 8199.783 8205.533 8210 8120.000
2 Belgium 300.0 400.0 1400.0 1600.0 2000.0 3434.0 3464.0 3495.0 3526.0 ... 10291.679 10311.970 10330.824 10348.276 10364.388 10379.067 10392.226 10403.951 10414 10409.000
3 Denmark 180.0 360.0 600.0 650.0 700.0 1155.0 1167.0 1179.0 1196.0 ... 5355.826 5374.693 5394.138 5413.392 5432.335 5450.661 5468.120 5484.723 5501 5730.488
4 Finland 20.0 40.0 300.0 400.0 400.0 1169.0 1186.0 1202.0 1219.0 ... 5180.309 5193.039 5204.405 5214.512 5223.442 5231.372 5238.460 5244.749 5250 5201.445
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
273 Guadeloupe NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 431.170 435.739 440.189 444.515 448.713 452.776 456.698 460.486 n.a. 523.493
274 Guyana (Fr.) NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 177.562 182.333 186.917 191.309 195.506 199.509 203.321 206.941 n.a. 272.781
275 Martinique NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 418.454 422.277 425.966 429.510 432.900 436.131 439.202 442.119 n.a. 486.714
276 Reunion NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 732.570 743.981 755.171 766.153 776.948 787.584 798.094 808.506 n.a. 1025.217
277 Total NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 1759.756 1784.330 1808.243 1831.487 1854.067 1876.000 1897.315 1918.052 n.a. 2308.205

278 rows × 197 columns


In [24]:
maddison_old_gdppc


Out[24]:
Country gdppc_1 gdppc_1000 gdppc_1500 gdppc_1600 gdppc_1700 gdppc_1820 gdppc_1821 gdppc_1822 gdppc_1823 ... gdppc_1999 gdppc_2000 gdppc_2001 gdppc_2002 gdppc_2003 gdppc_2004 gdppc_2005 gdppc_2006 gdppc_2007 gdppc_2008
0 Western Europe NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Austria 425.000000 425.000000 707 837.200000 993.200000 1218.165628 NaN NaN NaN ... 20065.093878 20691.415561 20812.893753 20955.874051 21165.047259 21626.929322 22140.725899 22892.682427 23674.041130 24130.547035
2 Belgium 450.000000 425.000000 875 975.625000 1144.000000 1318.870122 NaN NaN NaN ... 19964.428266 20656.458570 20761.238278 21032.935511 21205.859281 21801.602508 22246.561977 22881.632810 23446.949672 23654.763464
3 Denmark 400.000000 400.000000 738.333 875.384615 1038.571429 1273.593074 1320.479863 1326.547922 1307.692308 ... 22254.890572 22975.162513 23059.374968 23082.620719 23088.582457 23492.664119 23972.564284 24680.492880 24995.245167 24620.568805
4 Finland 400.000000 400.000000 453.333 537.500000 637.500000 781.009410 NaN NaN NaN ... 18855.985066 19770.363126 20245.896529 20521.702225 20845.802738 21574.406196 22140.573208 23190.283543 24131.519569 24343.586318
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
190 Total Africa 472.352941 424.767802 413.71 422.071584 420.628684 419.755914 NaN NaN NaN ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474
191 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
192 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
193 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
194 World Average 466.752281 453.402162 566.389 595.783856 614.853602 665.735330 NaN NaN NaN ... 5833.255492 6037.675887 6131.705471 6261.734267 6469.119575 6738.281333 6960.031035 7238.383483 7467.648232 7613.922924

195 rows × 195 columns

Let's choose the rows that hold the aggregates by region for the main regions of the world.


In [25]:
gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.apply(lambda x: str(x).upper().find('TOTAL')!=-1)].reset_index(drop=True)
gdppc = gdppc.dropna(subset=['gdppc_1'])
gdppc = gdppc.loc[2:]
gdppc['Country'] = gdppc.Country.str.replace('Total', '').str.replace('Countries', '').str.replace(r'\d+', '', regex=True).str.replace('European', 'Europe').str.strip()
gdppc = gdppc.loc[gdppc.Country.apply(lambda x: x.find('USSR')==-1 and  x.find('West Asian')==-1)].reset_index(drop=True)
gdppc


Out[25]:
Country gdppc_1 gdppc_1000 gdppc_1500 gdppc_1600 gdppc_1700 gdppc_1820 gdppc_1821 gdppc_1822 gdppc_1823 ... gdppc_1999 gdppc_2000 gdppc_2001 gdppc_2002 gdppc_2003 gdppc_2004 gdppc_2005 gdppc_2006 gdppc_2007 gdppc_2008
0 Western Europe 576.167665 427.425665 771.094 887.906964 993.456911 1194.184683 NaN NaN NaN ... 18497.208533 19176.001655 19463.863297 19627.707522 19801.145425 20199.220700 20522.238008 21087.304789 21589.011346 21671.774225
1 Western Offshoots 400.000000 400.000000 400 400.000000 476.000000 1201.993477 NaN NaN NaN ... 26680.580823 27393.808035 27387.312035 27648.644070 28090.274362 28807.845958 29415.399334 29922.741918 30344.425293 30151.805880
2 East Europe 411.789474 400.000000 496 548.023599 606.010638 683.160984 NaN NaN NaN ... 5734.162109 5970.165085 6143.112873 6321.395376 6573.365882 6942.136596 7261.721015 7730.097570 8192.881904 8568.967581
3 Latin America 400.000000 400.000000 416.457 437.558140 526.639004 691.060678 NaN NaN NaN ... 5765.585093 5889.237351 5846.295193 5746.609672 5785.841237 6063.068969 6265.525702 6530.533583 6783.869986 6973.134656
4 Asia 455.671021 469.961665 568.418 573.550859 571.605276 580.626115 NaN NaN NaN ... 3623.902724 3797.608955 3927.186275 4121.275511 4388.982705 4661.517477 4900.563281 5187.253152 5408.383588 5611.198564
5 Africa 472.352941 424.767802 413.71 422.071584 420.628684 419.755914 NaN NaN NaN ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474

6 rows × 195 columns
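The row selection and name cleaning above can also be written with pandas' vectorized string methods. A sketch on made-up labels (not the exact spreadsheet entries): `str.contains` with `case=False` mirrors the `upper().find('TOTAL')` test, and `na=False` treats missing names as non-matches. Pattern-based replacement such as `\d+` needs `regex=True`, since recent pandas versions treat the pattern literally by default.

```python
import pandas as pd

# Hypothetical labels resembling the spreadsheet's first column.
names = pd.Series(['Western Europe', 'Total 12 Western Europe', None, 'Total Africa'])

# Case-insensitive substring match; missing names count as non-matches.
is_total = names.str.contains('total', case=False, na=False)

# Remove the word 'Total' and any digits, then trim the leftover spaces.
cleaned = names[is_total].str.replace(r'Total|\d+', '', regex=True).str.strip()
print(cleaned.tolist())
```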

Let's drop the years (columns) that have missing values for any region.


In [26]:
gdppc = gdppc.dropna(axis=1, how='any')
gdppc


Out[26]:
Country gdppc_1 gdppc_1000 gdppc_1500 gdppc_1600 gdppc_1700 gdppc_1820 gdppc_1870 gdppc_1900 gdppc_1913 ... gdppc_1999 gdppc_2000 gdppc_2001 gdppc_2002 gdppc_2003 gdppc_2004 gdppc_2005 gdppc_2006 gdppc_2007 gdppc_2008
0 Western Europe 576.167665 427.425665 771.094 887.906964 993.456911 1194.184683 1953.068150 2884.661525 3456.576178 ... 18497.208533 19176.001655 19463.863297 19627.707522 19801.145425 20199.220700 20522.238008 21087.304789 21589.011346 21671.774225
1 Western Offshoots 400.000000 400.000000 400 400.000000 476.000000 1201.993477 2419.152411 4014.870040 5232.816582 ... 26680.580823 27393.808035 27387.312035 27648.644070 28090.274362 28807.845958 29415.399334 29922.741918 30344.425293 30151.805880
2 East Europe 411.789474 400.000000 496 548.023599 606.010638 683.160984 936.628265 1437.944586 1694.879668 ... 5734.162109 5970.165085 6143.112873 6321.395376 6573.365882 6942.136596 7261.721015 7730.097570 8192.881904 8568.967581
3 Latin America 400.000000 400.000000 416.457 437.558140 526.639004 691.060678 676.005331 1113.071149 1494.431922 ... 5765.585093 5889.237351 5846.295193 5746.609672 5785.841237 6063.068969 6265.525702 6530.533583 6783.869986 6973.134656
4 Asia 455.671021 469.961665 568.418 573.550859 571.605276 580.626115 553.459947 637.615593 695.131881 ... 3623.902724 3797.608955 3927.186275 4121.275511 4388.982705 4661.517477 4900.563281 5187.253152 5408.383588 5611.198564
5 Africa 472.352941 424.767802 413.71 422.071584 420.628684 419.755914 500.011054 601.236364 637.433138 ... 1430.752576 1447.071701 1471.156532 1482.629352 1517.935644 1558.099461 1603.686517 1663.531318 1724.226776 1780.265474

6 rows × 70 columns
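The `how` argument decides which columns go: with `how='any'` a year is dropped if any region lacks data, while `how='all'` would drop it only if every region did. A toy sketch:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'gdppc_1820': [1.0, 2.0],
                    'gdppc_1821': [np.nan, 3.0],
                    'gdppc_1822': [np.nan, np.nan]})

# Drop a column if at least one value is missing...
any_dropped = toy.dropna(axis=1, how='any')
# ...or only if all of its values are missing.
all_dropped = toy.dropna(axis=1, how='all')
print(any_dropped.columns.tolist(), all_dropped.columns.tolist())
```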

Let's convert from wide to long format.


In [27]:
gdppc = pd.wide_to_long(gdppc, ['gdppc_'], i='Country', j='year').reset_index()
gdppc


Out[27]:
Country year gdppc_
0 Western Europe 1 576.168
1 Western Offshoots 1 400
2 East Europe 1 411.789
3 Latin America 1 400
4 Asia 1 455.671
... ... ... ...
409 Western Offshoots 2008 30151.8
410 East Europe 2008 8568.97
411 Latin America 2008 6973.13
412 Asia 2008 5611.2
413 Africa 2008 1780.27

414 rows × 3 columns
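To see what `wide_to_long` does, here is a toy sketch with made-up values: it splits each column name such as `gdppc_1820` into the stub `gdppc_` and the suffix `1820`, which becomes the new `year` column. `DataFrame.melt` produces the same rows, but the year then has to be parsed out of the column name by hand.

```python
import pandas as pd

wide = pd.DataFrame({'Country': ['Asia', 'Africa'],
                     'gdppc_1820': [1.0, 2.0],
                     'gdppc_1870': [3.0, 4.0]})

# Stub 'gdppc_' plus numeric suffix: the suffix becomes the 'year' column.
long1 = pd.wide_to_long(wide, ['gdppc_'], i='Country', j='year').reset_index()

# melt keeps the full column name, so strip the stub and convert to int.
long2 = wide.melt(id_vars='Country', var_name='year', value_name='gdppc_')
long2['year'] = long2['year'].str.replace('gdppc_', '', regex=False).astype(int)
print(long1)
```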

Plotting

We can now plot the data. Let's try two different ways. The first uses the plot method of pandas. The second uses the package seaborn, which builds on matplotlib and extends its capabilities. The main difference is how the data need to be organized. Of course, these are not the only ways to plot, and we can try others.


In [28]:
import matplotlib as mpl
import seaborn as sns
# Setup seaborn
sns.set()

Let's pivot the table so that each region is a column and each row is a year. This will allow us to plot using the plot function of the pandas DataFrame.


In [29]:
gdppc2 = gdppc.pivot_table(index='year',columns='Country',values='gdppc_',aggfunc='sum')
gdppc2


Out[29]:
Country       Africa         Asia  East Europe  Latin America  Western Europe  Western Offshoots
year
1         472.352941   455.671021   411.789474     400.000000      576.167665         400.000000
1000      424.767802   469.961665   400.000000     400.000000      427.425665         400.000000
1500      413.709504   568.417900   496.000000     416.457143      771.093805         400.000000
1600      422.071584   573.550859   548.023599     437.558140      887.906964         400.000000
1700      420.628684   571.605276   606.010638     526.639004      993.456911         476.000000
...              ...          ...          ...            ...             ...                ...
2004     1558.099461  4661.517477  6942.136596    6063.068969    20199.220700       28807.845958
2005     1603.686517  4900.563281  7261.721015    6265.525702    20522.238008       29415.399334
2006     1663.531318  5187.253152  7730.097570    6530.533583    21087.304789       29922.741918
2007     1724.226776  5408.383588  8192.881904    6783.869986    21589.011346       30344.425293
2008     1780.265474  5611.198564  8568.967581    6973.134656    21671.774225       30151.805880

69 rows × 6 columns
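Since each region-year pair appears only once in `gdppc`, `aggfunc='sum'` never actually adds anything up, and `pivot` would return the same table. `pivot_table` matters when pairs repeat, because it then aggregates them. A toy sketch comparing the two:

```python
import pandas as pd

long_df = pd.DataFrame({'Country': ['Asia', 'Africa', 'Asia', 'Africa'],
                        'year': [1820, 1820, 1870, 1870],
                        'gdppc_': [1.0, 2.0, 3.0, 4.0]})

# Unique (year, Country) pairs: pivot simply reshapes.
wide1 = long_df.pivot(index='year', columns='Country', values='gdppc_')
# pivot_table would aggregate duplicate pairs; here there are none.
wide2 = long_df.pivot_table(index='year', columns='Country', values='gdppc_', aggfunc='sum')
print(wide1)
```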

Ok. Let's plot using the pandas plot function.


In [30]:
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())

# Set the size of the figure and get a figure and axis object
fig, ax = plt.subplots(figsize=(30,20))
# Plot using the axis ax and colormap my_cmap
gdppc2.loc[1800:].plot(ax=ax, linewidth=8, cmap=my_cmap)
# Change options of axes, legend
ax.tick_params(axis = 'both', which = 'major', labelsize=32)
ax.tick_params(axis = 'both', which = 'minor', labelsize=16)
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(prop={'size': 40}).set_title("Region", prop = {'size':40})
# Label axes
ax.set_xlabel('Year', fontsize=36)
ax.set_ylabel('GDP per capita (1990 Int\'l US$)', fontsize=36)


Out[30]:
Text(0, 0.5, "GDP per capita (1990 Int'l US$)")

In [31]:
fig


Out[31]:

Now, let's use seaborn


In [32]:
gdppc['Region'] = gdppc.Country.astype('category')
gdppc['gdppc_'] = gdppc.gdppc_.astype(float)
# Plot
fig, ax = plt.subplots(figsize=(30,20))
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[gdppc.year>=1800].reset_index(drop=True), alpha=1, lw=8, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=False)
ax.tick_params(axis = 'both', which = 'major', labelsize=32)
ax.tick_params(axis = 'both', which = 'minor', labelsize=16)
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year', fontsize=36)
ax.set_ylabel('GDP per capita (1990 Int\'l US$)', fontsize=36)


Out[32]:
Text(0, 0.5, "GDP per capita (1990 Int'l US$)")

In [33]:
fig


Out[33]:

Nice! Basically the same plot. But we can do better! Let's use seaborn again, but this time use different markers for each region, and let's use only a subset of the data so that it looks better. Also, let's export the figure so we can use it in our slides.


In [34]:
# Create category for hue
gdppc['Region'] = gdppc.Country.astype('category')
gdppc['gdppc_'] = gdppc.gdppc_.astype(float)

sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1800) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1820-2010.pdf', dpi=300, bbox_inches='tight')



In [35]:
fig


Out[35]:
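The hard-coded list of excluded years can be replaced by a rule. A sketch with a hypothetical helper, `select_years`, that keeps every year up to 1950, each decade year after that, and the final year of the sample; over 1800 to 2008 it selects exactly the same years as the list in the cell above.

```python
import pandas as pd

def select_years(year, last_year):
    """Hypothetical helper: boolean mask keeping all years up to 1950,
    every decade year after 1950, and the final year of the sample."""
    return (year <= 1950) | (year % 10 == 0) | (year == last_year)

# Toy stand-in for gdppc.year: every year from 1800 to 2008.
years = pd.Series(range(1800, 2009))
selected = years[select_years(years, last_year=2008)]
print(selected.tail(8).tolist())
```

In the plotting cell, the data argument could then become something like `gdppc.loc[(gdppc.year >= 1800) & select_years(gdppc.year, gdppc.year.max())]`.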

Let's create the same plot using the updated data from the Maddison Project. Here we have fewer years, but the picture is similar.


In [36]:
maddison_new_region['Region'] = maddison_new_region.region_name

mycolors2 = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71", "orange", "b"]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='cgdppc', hue='Region', data=maddison_new_region.loc[(maddison_new_region.year.apply(lambda x: x in [1870, 1890, 1913, 1929,1950, 2016])) | ((maddison_new_region.year>1950) & (maddison_new_region.year.apply(lambda x: np.mod(x,10)==0)))], alpha=1, palette=sns.color_palette(mycolors2), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (2011 Int\'l US$)')
plt.savefig(pathgraphs + 'y1870-2016.pdf', dpi=300, bbox_inches='tight')



In [37]:
fig


Out[37]:

Let's show the evolution starting from earlier periods.


In [38]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region',
             # drop 1951-2007 except the decade years (1960, 1970, ..., 2000)
             data=gdppc.loc[(gdppc.year>=1700) & ~(gdppc.year.between(1951, 2007) & (gdppc.year % 10 != 0))].reset_index(drop=True),
             alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'take-off-1700-2010.pdf', dpi=300, bbox_inches='tight')



In [39]:
fig


Out[39]:

In [40]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region',
             # drop 1951-2007 except the decade years (1960, 1970, ..., 2000)
             data=gdppc.loc[(gdppc.year>=1500) & ~(gdppc.year.between(1951, 2007) & (gdppc.year % 10 != 0))].reset_index(drop=True),
             alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1500-2010.pdf', dpi=300, bbox_inches='tight')



In [41]:
fig


Out[41]:

In [42]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region',
             # drop 1951-2007 except the decade years (1960, 1970, ..., 2000)
             data=gdppc.loc[(gdppc.year>=1000) & ~(gdppc.year.between(1951, 2007) & (gdppc.year % 10 != 0))].reset_index(drop=True),
             alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1000-2010.pdf', dpi=300, bbox_inches='tight')



In [43]:
fig


Out[43]:

In [44]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region',
             # drop 1951-2007 except the decade years (1960, 1970, ..., 2000)
             data=gdppc.loc[(gdppc.year>=0) & ~(gdppc.year.between(1951, 2007) & (gdppc.year % 10 != 0))].reset_index(drop=True),
             alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1-2010.pdf', dpi=300, bbox_inches='tight')



In [45]:
fig


Out[45]:

Let's plot the evolution of GDP per capita for the world as a whole.
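The Maddison file stores each year in its own column (gdppc_1, gdppc_1000, ...), so the cell below reshapes it with pd.wide_to_long into one row per country and year. A minimal sketch on a made-up two-year frame; the column names mimic the notebook's, but the values are invented for illustration:

```python
import pandas as pd

# Invented wide-format data: one row per country, one column per year
wide = pd.DataFrame({'Country': ['World Average'],
                     'gdppc_1': [466.75],
                     'gdppc_1000': [453.40]})

# The stub 'gdppc_' is stripped from the column names and the numeric
# suffix becomes the new 'year' column
long = pd.wide_to_long(wide, ['gdppc_'], i='Country', j='year').reset_index()
print(long)
```

The result has the columns Country, year and gdppc_, with one row per year, which is the long layout seaborn expects.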


In [46]:
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country=='World Average']
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = world_gdppc.Country.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)

sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region',
             # drop 1951-2007 except the decade years (1960, 1970, ..., 2000)
             data=world_gdppc.loc[(world_gdppc.year>=0) & ~(world_gdppc.year.between(1951, 2007) & (world_gdppc.year % 10 != 0))].reset_index(drop=True),
             alpha=1, style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'W-y1-2010.pdf', dpi=300, bbox_inches='tight')



In [47]:
fig


Out[47]:

Let's plot $\log(GDPpc)$ for the modern era (1950 onward), when economic growth has been sustained.
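Why plot logs: if income grows at a constant rate g, then y_t = y_0 (1+g)^t, so log(y_t) = log(y_0) + t log(1+g) is a straight line with slope log(1+g), which is approximately g for small g. Changes in the slope of the log series therefore show changes in the growth rate. A short numerical check with an invented 2% growth path:

```python
import numpy as np

# Invented path: income starts at 1,000 and grows 2% per year for 50 years
t = np.arange(51)
y = 1000 * 1.02 ** t

# Least-squares line through (t, log y): slope should equal log(1.02)
slope, intercept = np.polyfit(t, np.log(y), 1)
print(slope, np.log(1.02))
```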


In [48]:
gdppc['lgdppc'] = np.log(gdppc.gdppc_)

# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())

sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='lgdppc', hue='Region', data=gdppc.loc[(gdppc.year>=1950)].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(loc='upper left')
ax.set_xlabel('Year')
ax.set_ylabel('Log[GDP per capita (1990 Int\'l US$)]')
plt.savefig(pathgraphs + 'sg1950-2000.pdf', dpi=300, bbox_inches='tight')



In [49]:
fig


Out[49]:

In [50]:
mycolors2 = ["#34495e", "#2ecc71"]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='cgdppc', hue='Region', data=maddison_new_region.loc[(maddison_new_region.year>=1870) & (maddison_new_region.region.apply(lambda x: x in ['we', 'wo']))], alpha=1, palette=sns.color_palette(mycolors2), style='Region', dashes=False, markers=['D', '^'],)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1f}'))
ax.set_yscale('log')
ax.set_yticks([500, 5000, 50000])
ax.get_yaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter())
ax.legend(loc='upper left')
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (2011 Int\'l US$, log-scale)')
plt.savefig(pathgraphs + 'sg1870-2000.pdf', dpi=300, bbox_inches='tight')


Growth Rates

Let's select a subsample of benchmark years between 1 CE and 2008 and compute the average annual growth rate of world income per capita between consecutive benchmarks. We flag the years we want, select them with loc, and then use the shift method to bring in the value from the previous observation.
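The growth rate computed here is the compound (geometric average) annual rate between consecutive benchmark years s < t, g = (y_t / y_s)^(1/(t - s)) - 1; shift(1) lines each row up with the previous benchmark. A minimal sketch using the 1820 and 2008 world GDP per capita values that appear in the growth table of this notebook:

```python
import pandas as pd

# World GDP per capita at two benchmark years (values from this notebook's Maddison output)
df = pd.DataFrame({'year': [1820, 2008], 'gdppc_': [665.735330, 7613.922924]})

# Years elapsed since the previous benchmark, then compound annual growth
df['span'] = df['year'] - df['year'].shift(1)
df['growth'] = (df['gdppc_'] / df['gdppc_'].shift(1)) ** (1 / df['span']) - 1
print(df)
```

The first row has no predecessor, so its span and growth are NaN; the second row reproduces the 1820-2008 rate of about 1.3% per year.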


In [51]:
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 2008]).astype(int)
world_gdppc


Out[51]:
           Country  year       gdppc_         Region  mysample
0    World Average     1   466.752281  World Average         1
1    World Average  1000   453.402162  World Average         1
2    World Average  1500   566.389464  World Average         1
3    World Average  1600   595.783856  World Average         0
4    World Average  1700   614.853602  World Average         0
...            ...   ...          ...            ...       ...
189  World Average  2004  6738.281333  World Average         0
190  World Average  2005  6960.031035  World Average         0
191  World Average  2006  7238.383483  World Average         0
192  World Average  2007  7467.648232  World Average         0
193  World Average  2008  7613.922924  World Average         1

69 rows × 5 columns


In [52]:
maddison_growth = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
# year_prev holds the number of years elapsed since the previous benchmark year
maddison_growth['year_prev'] = maddison_growth['year'] - maddison_growth['year'].shift(1)
maddison_growth['growth'] = ((maddison_growth['gdppc_'] / maddison_growth['gdppc_'].shift(1)) ** (1/ maddison_growth.year_prev) -1)
maddison_growth['Period'] = maddison_growth['year'].astype(str).shift(1) + '-' + maddison_growth['year'].astype(str)
maddison_growth


Out[52]:
         Country  year       gdppc_         Region  mysample  year_prev     growth     Period
0  World Average     1   466.752281  World Average         1        NaN        NaN        NaN
1  World Average  1000   453.402162  World Average         1      999.0  -0.000029     1-1000
2  World Average  1500   566.389464  World Average         1      500.0   0.000445  1000-1500
3  World Average  1820   665.735330  World Average         1      320.0   0.000505  1500-1820
4  World Average  2008  7613.922924  World Average         1      188.0   0.013046  1820-2008

In [53]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues", maddison_growth.shape[0]+4)[4:])
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
#handles, labels = ax.get_legend_handles_labels()
#ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate of Income per capita')
plt.savefig(pathgraphs + 'W-g1-2010.pdf', dpi=300, bbox_inches='tight')



In [54]:
fig


Out[54]:

Growth of population and income (by region)


In [55]:
# Growth rates gdppc
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country=='World Average']
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = 'World'
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)
print(maddison_growth_gdppc)


         Country  year       gdppc_ Region  mysample  year_prev    growth     Period
0  World Average     1   466.752281  World         1        NaN       NaN        NaN
1  World Average  1000   453.402162  World         1      999.0 -0.000029     1-1000
2  World Average  1500   566.389464  World         1      500.0  0.000445  1000-1500
3  World Average  1820   665.735330  World         1      320.0  0.000505  1500-1820
4  World Average  1913  1524.430799  World         1       93.0  0.008948  1820-1913

In [56]:
# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country=='World Total']
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = 'World'
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)
print(maddison_growth_pop)


       Country  year          pop_ Region  mysample  year_prev    growth     Period
0  World Total     1  2.258200e+05  World         1        NaN       NaN        NaN
1  World Total  1000  2.673300e+05  World         1      999.0  0.000169     1-1000
2  World Total  1500  4.384280e+05  World         1      500.0  0.000990  1000-1500
3  World Total  1820  1.041708e+06  World         1      320.0  0.002708  1500-1820
4  World Total  1913  1.792925e+06  World         1       93.0  0.005856  1820-1913

In [57]:
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'GDPpc', 'growth_pop':'Population'})
maddison_growth


Out[57]:
   Region     Period      GDPpc  Population
1   World     1-1000  -0.000029    0.000169
2   World  1000-1500   0.000445    0.000990
3   World  1500-1820   0.000505    0.002708
4   World  1820-1913   0.008948    0.005856

In [58]:
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 
maddison_growth


Out[58]:
   Region     Period           variable     growth
0   World     1-1000  Income per capita  -0.000029
1   World  1000-1500  Income per capita   0.000445
2   World  1500-1820  Income per capita   0.000505
3   World  1820-1913  Income per capita   0.008948
4   World     1-1000         Population   0.000169
5   World  1000-1500         Population   0.000990
6   World  1500-1820         Population   0.002708
7   World  1820-1913         Population   0.005856
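The two previous cells first merge the income and population growth tables on Region and Period (the suffixes argument disambiguates the two growth columns) and then pd.melt them into long format, with one row per period and variable. That long shape is what lets seaborn's barplot draw the two series side by side through hue. A minimal sketch of the same two steps on a made-up two-period example:

```python
import pandas as pd

# Made-up growth tables sharing the keys Region and Period
inc = pd.DataFrame({'Region': ['World', 'World'], 'Period': ['A', 'B'], 'growth': [0.001, 0.010]})
pop = pd.DataFrame({'Region': ['World', 'World'], 'Period': ['A', 'B'], 'growth': [0.002, 0.005]})

# Merge on the shared keys; suffixes rename the clashing 'growth' columns
wide = inc.merge(pop, on=['Region', 'Period'], suffixes=('_gdppc', '_pop'))
wide = wide.rename(columns={'growth_gdppc': 'Income per capita', 'growth_pop': 'Population'})

# Melt to long format: one row per (Region, Period, variable)
long = pd.melt(wide, id_vars=['Region', 'Period'], value_vars=['Income per capita', 'Population'],
               var_name='variable', value_name='growth')
print(long)
```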

In [59]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + 'W-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [60]:
fig


Out[60]:
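The regional cells that follow repeat the same steps for each region: select the region's total row, reshape it to long format, keep the benchmark years, and compute compound growth between them. A sketch of a helper that factors those steps out; period_growth is a hypothetical name, and the demo runs on invented numbers laid out like the Maddison sheets rather than on the actual files. The row label is passed in because it differs across regions (for example 'Total 30  Western Europe').

```python
import pandas as pd

def period_growth(wide, label, stub, years):
    """Hypothetical helper: average annual growth between benchmark years for the
    row whose stripped 'Country' label equals `label` (columns named stub + year)."""
    row = wide.loc[wide.Country.astype(str).str.strip() == label]
    long = pd.wide_to_long(row, [stub], i='Country', j='year').reset_index()
    long[stub] = long[stub].astype(float)
    long = long.dropna(subset=[stub])
    # Keep only the benchmark years, in chronological order
    long = long.loc[long.year.isin(years)].sort_values('year').reset_index(drop=True)
    span = long.year - long.year.shift(1)  # years since previous benchmark
    long['growth'] = (long[stub] / long[stub].shift(1)) ** (1 / span) - 1
    long['Period'] = long.year.astype(str).shift(1) + '-' + long.year.astype(str)
    return long[['Period', 'growth']].dropna()

# Invented wide-format data mimicking the notebook's layout
wide = pd.DataFrame({'Country': ['Total Asia '],
                     'gdppc_1': [400.0], 'gdppc_1000': [400.0], 'gdppc_1500': [500.0]})
result = period_growth(wide, 'Total Asia', 'gdppc_', [1, 1000, 1500])
print(result)
```

Calling it once with the gdppc_ stub and once with pop_ would give the two tables that each regional cell merges before plotting.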

In [61]:
# Growth rates gdppc
myregion = 'Western Offshoots'
fname = 'WO'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge income and population growth, then reshape to long format for plotting
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars=['Region', 'Period'], value_vars=['Income per capita', 'Population'],
                          var_name='variable', value_name='growth')

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [62]:
fig


Out[62]:

In [63]:
# Growth rates gdppc
myregion = 'Western Europe'
fname = 'WE'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total 30  '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total 30  '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge income and population growth, then reshape to long format for plotting
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars=['Region', 'Period'], value_vars=['Income per capita', 'Population'],
                          var_name='variable', value_name='growth')

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [64]:
fig


Out[64]:

In [65]:
# Growth rates gdppc
myregion = 'Latin America'
fname = 'LA'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge income and population growth, then reshape to long format for plotting
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars=['Region', 'Period'], value_vars=['Income per capita', 'Population'],
                          var_name='variable', value_name='growth')

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [66]:
fig


Out[66]:

In [67]:
# Growth rates gdppc
myregion = 'Asia'
fname = 'AS'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge income and population growth, then reshape to long format for plotting
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars=['Region', 'Period'], value_vars=['Income per capita', 'Population'],
                          var_name='variable', value_name='growth')

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [68]:
fig


Out[68]:

In [69]:
# Growth rates gdppc
myregion = 'Africa'
fname = 'AF'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.isin([1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.isin([1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)

# Merge both growth series by region and period, then reshape to long format for plotting
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars=['Region', 'Period'], value_vars=['Income per capita', 'Population'],
                          var_name='variable', value_name='growth')

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [70]:
fig


Out[70]:
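The growth rates in these figures are annualized (compound) rates between consecutive benchmark years: for a value x0 in year t0 and x1 in year t1, g = (x1 / x0)^(1 / (t1 - t0)) - 1. A minimal sketch of the same shift-based calculation on a made-up series (the numbers are illustrative, not Maddison data):

```python
import pandas as pd

# Illustrative series, not Maddison data: the value doubles over 100 years, then doubles again over 50
df = pd.DataFrame({'year': [1800, 1900, 1950], 'x': [100.0, 200.0, 400.0]})

# Years elapsed since the previous benchmark (NaN for the first row)
df['years_elapsed'] = df['year'] - df['year'].shift(1)
# Annualized (compound) growth rate between consecutive benchmarks
df['growth'] = (df['x'] / df['x'].shift(1)) ** (1 / df['years_elapsed']) - 1
# Label each period as "start-end", like the Period column above
df['Period'] = df['year'].astype(str).shift(1) + '-' + df['year'].astype(str)
print(df)
```

Because shift(1) pairs each row with the previous benchmark, the first row has no growth rate, which is why missing values are dropped after merging.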

Comparing richest to poorest region across time

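Let's create a table with GDP per capita for the 6 regions in the original data at selected benchmark years, and compute for each year the ratio of GDP per capita in the richest region to that in the poorest region. Let's also plot how this ratio evolves over time.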
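The ratio is computed row by row: with years in the index and regions in the columns, max(axis=1) and min(axis=1) return, for each year, the highest and lowest regional GDP per capita. A small sketch with three regions and two benchmark years from the data (rounded to two decimals):

```python
import pandas as pd

# Rows are years, columns are regions (GDP per capita, rounded to two decimals)
gdppc = pd.DataFrame({'Africa': [472.35, 1780.27],
                      'Western Offshoots': [400.00, 30151.81],
                      'Western Europe': [576.17, 21671.77]},
                     index=pd.Index([1, 2008], name='year'))

# For each year (row), divide the highest regional value by the lowest
gdppc['Richest-Poorest Ratio'] = gdppc.max(axis=1) / gdppc.min(axis=1)
print(gdppc.round(2))
```

Because the ratio is stored as a new column of the same frame, recomputing it without restricting max and min to the region columns would fold the ratio itself into min, so selecting the region columns explicitly is the safer choice.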


In [71]:
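# Ratio of richest to poorest region in each year, using only the region columns
# so that re-running the cell does not include the ratio itself in max/min
regions = gdppc2.columns.drop('Richest-Poorest Ratio', errors='ignore')
gdppc2['Richest-Poorest Ratio'] = gdppc2[regions].max(axis=1) / gdppc2[regions].min(axis=1)
# Keep the benchmark years and move 'year' from the index into a column for plotting
gdp_ratio = gdppc2.loc[[1, 1000, 1500, 1700, 1820, 1870, 1913, 1940, 1960, 1980, 2000, 2008]].reset_index()
gdp_ratio['Region'] = 'Richest-Poorest'
gdp_ratio['Region'] = gdp_ratio.Region.astype('category')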

In [72]:
gdp_ratio


Out[72]:
Country  year       Africa         Asia  East Europe  Latin America  Western Europe  Western Offshoots  Richest-Poorest Ratio           Region
0           1   472.352941   455.671021   411.789474     400.000000      576.167665         400.000000               1.440419  Richest-Poorest
1        1000   424.767802   469.961665   400.000000     400.000000      427.425665         400.000000               1.174904  Richest-Poorest
2        1500   413.709504   568.417900   496.000000     416.457143      771.093805         400.000000               1.927735  Richest-Poorest
3        1700   420.628684   571.605276   606.010638     526.639004      993.456911         476.000000               2.361838  Richest-Poorest
4        1820   419.755914   580.626115   683.160984     691.060678     1194.184683        1201.993477               2.863553  Richest-Poorest
5        1870   500.011054   553.459947   936.628265     676.005331     1953.068150        2419.152411               4.838198  Richest-Poorest
6        1913   637.433138   695.131881  1694.879668    1494.431922     3456.576178        5232.816582               8.209201  Richest-Poorest
7        1940   813.374613   893.992784  1968.706774    1932.850716     4554.045082        6837.844866               8.406760  Richest-Poorest
8        1960  1055.114678  1025.743131  3069.750386    3135.517072     6879.294331       10961.082848              10.685992  Richest-Poorest
9        1980  1514.558119  2028.654705  5785.933433    5437.924365    13154.033928       18060.162963              11.924378  Richest-Poorest
10       2000  1447.071701  3797.608955  5970.165085    5889.237351    19176.001655       27393.808035              18.930512  Richest-Poorest
11       2008  1780.265474  5611.198564  8568.967581    6973.134656    21671.774225       30151.805880              16.936691  Richest-Poorest

In [73]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Richest-Poorest Ratio', data=gdp_ratio, alpha=1, hue='Region', style='Region', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Richest-Poorest Ratio')
plt.savefig(pathgraphs + 'Richest-Poorest-Ratio.pdf', dpi=300, bbox_inches='tight')



In [74]:
fig


Out[74]:

Visualize as Table


In [75]:
# Regions (and the ratio) as rows, benchmark years as columns, with thousands separators and one decimal
gdp_table = gdp_ratio.drop(columns='Region').set_index('year').T
gdp_table.style.format('{:,.1f}')


Out[75]:
year                       1   1000   1500   1700     1820     1870     1913     1940      1960      1980      2000      2008
Country
Africa                 472.4  424.8  413.7  420.6    419.8    500.0    637.4    813.4   1,055.1   1,514.6   1,447.1   1,780.3
Asia                   455.7  470.0  568.4  571.6    580.6    553.5    695.1    894.0   1,025.7   2,028.7   3,797.6   5,611.2
East Europe            411.8  400.0  496.0  606.0    683.2    936.6  1,694.9  1,968.7   3,069.8   5,785.9   5,970.2   8,569.0
Latin America          400.0  400.0  416.5  526.6    691.1    676.0  1,494.4  1,932.9   3,135.5   5,437.9   5,889.2   6,973.1
Western Europe         576.2  427.4  771.1  993.5  1,194.2  1,953.1  3,456.6  4,554.0   6,879.3  13,154.0  19,176.0  21,671.8
Western Offshoots      400.0  400.0  400.0  476.0  1,202.0  2,419.2  5,232.8  6,837.8  10,961.1  18,060.2  27,393.8  30,151.8
Richest-Poorest Ratio    1.4    1.2    1.9    2.4      2.9      4.8      8.2      8.4      10.7      11.9      18.9      16.9

Export table to LaTeX

Let's print the table as LaTeX code that can be copied and pasted into our slides or paper.
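As far as I know, to_latex, to_html and to_string all accept a formatters dict whose keys are column labels; keys that match no column appear to be skipped rather than raising an error, so a dict keyed by years only has an effect when the years are the columns. A small sketch with to_string, using two cells from the regional GDP per capita data:

```python
import pandas as pd

# Regions as rows, benchmark years as columns (two cells from the regional data)
table = pd.DataFrame({1960: [1055.114678, 6879.294331],
                      2008: [1780.265474, 21671.774225]},
                     index=['Africa', 'Western Europe'])

# Formatters are looked up by column label; here every column label is a year
by_year = {year: '{:,.1f}'.format for year in table.columns}
formatted = table.to_string(formatters=by_year)

# Transposed, the columns are region names, so none of the year keys match and nothing is formatted
unformatted = table.T.to_string(formatters=by_year)

# When every float column should look the same, float_format is a simpler alternative
same_format = table.to_string(float_format='{:,.1f}'.format)

print(formatted)
print(unformatted)
```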


In [76]:
# Regions (and the ratio) as rows, benchmark years as columns, with thousands separators and one decimal
gdp_table = gdp_ratio.drop(columns='Region').set_index('year').T
print(gdp_table.to_latex(float_format='{:,.1f}'.format))


\begin{tabular}{lrrrrrrrrrrrr}
\toprule
year &  1    &  1000 &  1500 &  1700 &    1820 &    1870 &    1913 &    1940 &     1960 &     1980 &     2000 &     2008 \\
Country               &       &       &       &       &         &         &         &         &          &          &          &          \\
\midrule
Africa                & 472.4 & 424.8 & 413.7 & 420.6 &   419.8 &   500.0 &   637.4 &   813.4 &  1,055.1 &  1,514.6 &  1,447.1 &  1,780.3 \\
Asia                  & 455.7 & 470.0 & 568.4 & 571.6 &   580.6 &   553.5 &   695.1 &   894.0 &  1,025.7 &  2,028.7 &  3,797.6 &  5,611.2 \\
East Europe           & 411.8 & 400.0 & 496.0 & 606.0 &   683.2 &   936.6 & 1,694.9 & 1,968.7 &  3,069.8 &  5,785.9 &  5,970.2 &  8,569.0 \\
Latin America         & 400.0 & 400.0 & 416.5 & 526.6 &   691.1 &   676.0 & 1,494.4 & 1,932.9 &  3,135.5 &  5,437.9 &  5,889.2 &  6,973.1 \\
Western Europe        & 576.2 & 427.4 & 771.1 & 993.5 & 1,194.2 & 1,953.1 & 3,456.6 & 4,554.0 &  6,879.3 & 13,154.0 & 19,176.0 & 21,671.8 \\
Western Offshoots     & 400.0 & 400.0 & 400.0 & 476.0 & 1,202.0 & 2,419.2 & 5,232.8 & 6,837.8 & 10,961.1 & 18,060.2 & 27,393.8 & 30,151.8 \\
Richest-Poorest Ratio &   1.4 &   1.2 &   1.9 &   2.4 &     2.9 &     4.8 &     8.2 &     8.4 &     10.7 &     11.9 &     18.9 &     16.9 \\
\bottomrule
\end{tabular}


In [77]:
%%latex
\begin{tabular}{lrrrrrrrrrrrr}
\toprule
year &  1    &  1000 &  1500 &  1700 &    1820 &    1870 &    1913 &    1940 &     1960 &     1980 &     2000 &     2008 \\
Country               &       &       &       &       &         &         &         &         &          &          &          &          \\
\midrule
Africa                & 472.4 & 424.8 & 413.7 & 420.6 &   419.8 &   500.0 &   637.4 &   813.4 &  1,055.1 &  1,514.6 &  1,447.1 &  1,780.3 \\
Asia                  & 455.7 & 470.0 & 568.4 & 571.6 &   580.6 &   553.5 &   695.1 &   894.0 &  1,025.7 &  2,028.7 &  3,797.6 &  5,611.2 \\
East Europe           & 411.8 & 400.0 & 496.0 & 606.0 &   683.2 &   936.6 & 1,694.9 & 1,968.7 &  3,069.8 &  5,785.9 &  5,970.2 &  8,569.0 \\
Latin America         & 400.0 & 400.0 & 416.5 & 526.6 &   691.1 &   676.0 & 1,494.4 & 1,932.9 &  3,135.5 &  5,437.9 &  5,889.2 &  6,973.1 \\
Western Europe        & 576.2 & 427.4 & 771.1 & 993.5 & 1,194.2 & 1,953.1 & 3,456.6 & 4,554.0 &  6,879.3 & 13,154.0 & 19,176.0 & 21,671.8 \\
Western Offshoots     & 400.0 & 400.0 & 400.0 & 476.0 & 1,202.0 & 2,419.2 & 5,232.8 & 6,837.8 & 10,961.1 & 18,060.2 & 27,393.8 & 30,151.8 \\
Richest-Poorest Ratio &   1.4 &   1.2 &   1.9 &   2.4 &     2.9 &     4.8 &     8.2 &     8.4 &     10.7 &     11.9 &     18.9 &     16.9 \\
\bottomrule
\end{tabular}



Export Table to HTML


In [78]:
from IPython.display import display, HTML
# Regions (and the ratio) as rows, benchmark years as columns, with thousands separators and one decimal
gdp_table = gdp_ratio.drop(columns='Region').set_index('year').T
display(HTML(gdp_table.to_html(float_format='{:,.1f}'.format)))


year                       1   1000   1500   1700     1820     1870     1913     1940      1960      1980      2000      2008
Country
Africa                 472.4  424.8  413.7  420.6    419.8    500.0    637.4    813.4   1,055.1   1,514.6   1,447.1   1,780.3
Asia                   455.7  470.0  568.4  571.6    580.6    553.5    695.1    894.0   1,025.7   2,028.7   3,797.6   5,611.2
East Europe            411.8  400.0  496.0  606.0    683.2    936.6  1,694.9  1,968.7   3,069.8   5,785.9   5,970.2   8,569.0
Latin America          400.0  400.0  416.5  526.6    691.1    676.0  1,494.4  1,932.9   3,135.5   5,437.9   5,889.2   6,973.1
Western Europe         576.2  427.4  771.1  993.5  1,194.2  1,953.1  3,456.6  4,554.0   6,879.3  13,154.0  19,176.0  21,671.8
Western Offshoots      400.0  400.0  400.0  476.0  1,202.0  2,419.2  5,232.8  6,837.8  10,961.1  18,060.2  27,393.8  30,151.8
Richest-Poorest Ratio    1.4    1.2    1.9    2.4      2.9      4.8      8.2      8.4      10.7      11.9      18.9      16.9

Take-off, industrialization and reversals

Industrialization per capita

Let's build the dataframe by entering the data by hand. The data come from Bairoch, P. (1982), "International industrialization levels from 1750 to 1980," Journal of European Economic History, 11(2), p. 269. Values are indices of industrialization per capita with the United Kingdom in 1900 = 100. For 1750-1913 the data come from Table 9.


In [79]:
industrialization = [['Developed Countries', 8, 8, 11, 16, 24, 35, 55],
                     ['Europe', 8, 8, 11, 17, 23, 33, 45],
                     ['Austria-Hungary', 7, 7, 8, 11, 15, 23, 32],
                     ['Belgium', 9, 10, 14, 28, 43, 56, 88],
                     ['France', 9, 9, 12, 20, 28, 39, 59],
                     ['Germany', 8, 8, 9, 15, 25, 52, 85],
                     ['Italy', 8, 8, 8, 10, 12, 17, 26],
                     ['Russia', 6, 6, 7, 8, 10, 15, 20],
                     ['Spain', 7, 7, 8, 11, 14, 19, 22],
                     ['Sweden', 7, 8, 9, 15, 24, 41, 67],
                     ['Switzerland', 7, 10, 16, 26, 39, 67, 87],
                     ['United Kingdom', 10, 16, 25, 64, 87, 100, 115],
                     ['Canada', np.nan, 5, 6, 7, 10, 24, 46],
                     ['United States', 4, 9, 14, 21, 38, 69, 126],
                     ['Japan', 7, 7, 7, 7, 9, 12, 20],
                     ['Third World', 7, 6, 6, 4, 3, 2, 2],
                     ['China', 8, 6, 6, 4, 4, 3, 3],
                     ['India', 7, 6, 6, 3, 2, 1, 2],
                     ['Brazil', np.nan, np.nan, np.nan, 4, 4, 5, 7],
                     ['Mexico', np.nan, np.nan, np.nan, 5, 4, 5, 7],
                     ['World', 7, 6, 7, 7, 9, 14, 21]]

years = [1750, 1800, 1830, 1860, 1880, 1900, 1913]
industrialization = pd.DataFrame(industrialization, columns=['Country'] + ['y'+str(y) for y in years])

For 1913-1980 the data come from Table 12 of the same paper.


In [80]:
industrialization2 = [['Developed Countries', 55, 71, 81, 135, 194, 315, 344],
                      ['Market Economies', np.nan, 96, 105, 167, 222, 362, 387],
                      ['Europe', 45, 76, 94, 107, 166, 260, 280],
                      ['Belgium', 88, 116, 89, 117, 183, 291, 316],
                      ['France', 59, 82, 73, 95, 167, 259, 277],
                      ['Germany', 85, 101, 128, 144, 244, 366, 395],
                      ['Italy', 26, 39, 44, 61, 121, 194, 231],
                      ['Spain', 22, 28, 23, 31, 56, 144, 159],
                      ['Sweden', 67, 84, 135, 163, 262, 405, 409],
                      ['Switzerland', 87, 90, 88, 167, 259, 366, 354],
                      ['United Kingdom', 115, 122, 157, 210, 253, 341, 325],
                      ['Canada', 46, 82, 84, 185, 237, 370, 379],
                      ['United States', 126, 182, 167, 354, 393, 604, 629],
                      ['Japan', 20, 30, 51, 40, 113, 310, 353],
                      ['U.S.S.R.', 20, 20, 38, 73, 139, 222, 252],
                      ['Third World', 2, 3, 4, 5, 8, 14, 17],
                      ['India', 2, 3, 4, 6, 8, 14, 16],
                      ['Brazil', 7, 10, 10, 13, 23, 42, 55],
                      ['Mexico', 7, 9, 8, 12, 22, 36, 41],
                      ['China', 3, 4, 4, 5, 10, 18, 24],
                      ['World', 21, 28, 31 ,48, 66, 100, 103]]
years = [1913, 1928, 1938, 1953, 1963, 1973, 1980]
industrialization2 = pd.DataFrame(industrialization2, columns=['Country'] + ['y'+str(y) for y in years])

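Let's merge both dataframes so we can plot the whole 1750-1980 series. By default merge performs an inner join on all columns the two dataframes share (here Country and y1913), so only countries that appear in both tables are kept: Austria-Hungary and Russia (Table 9) and Market Economies and U.S.S.R. (Table 12) drop out.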
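A toy sketch of that default behaviour, with two rows per table taken from Tables 9 and 12. Rows present in only one table vanish from the inner join, and an outer join with indicator=True shows which side each row came from:

```python
import pandas as pd

# Two toy tables that share 'Country' and the overlapping benchmark year 'y1913'
early = pd.DataFrame({'Country': ['Belgium', 'Russia'], 'y1900': [56, 15], 'y1913': [88, 20]})
late = pd.DataFrame({'Country': ['Belgium', 'U.S.S.R.'], 'y1913': [88, 20], 'y1928': [116, 20]})

# No on= argument: merge joins on every shared column (Country and y1913) with an inner join
merged = early.merge(late)
print(merged)

# An outer join with an indicator column reveals the rows the inner join dropped
check = early.merge(late, on=['Country', 'y1913'], how='outer', indicator=True)
print(check[['Country', '_merge']])
```

Because y1913 is also a join key, a country would be dropped as well if its 1913 value differed between the two tables.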


In [81]:
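# Inner join on the shared columns (Country and y1913): only countries present in both tables are kept
industrialization = industrialization.merge(industrialization2, on=['Country', 'y1913'], how='inner')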
industrialization


Out[81]:
                Country  y1750  y1800  y1830  y1860  y1880  y1900  y1913  y1928  y1938  y1953  y1963  y1973  y1980
0   Developed Countries    8.0    8.0   11.0     16     24     35     55     71     81    135    194    315    344
1                Europe    8.0    8.0   11.0     17     23     33     45     76     94    107    166    260    280
2               Belgium    9.0   10.0   14.0     28     43     56     88    116     89    117    183    291    316
3                France    9.0    9.0   12.0     20     28     39     59     82     73     95    167    259    277
4               Germany    8.0    8.0    9.0     15     25     52     85    101    128    144    244    366    395
5                 Italy    8.0    8.0    8.0     10     12     17     26     39     44     61    121    194    231
6                 Spain    7.0    7.0    8.0     11     14     19     22     28     23     31     56    144    159
7                Sweden    7.0    8.0    9.0     15     24     41     67     84    135    163    262    405    409
8           Switzerland    7.0   10.0   16.0     26     39     67     87     90     88    167    259    366    354
9        United Kingdom   10.0   16.0   25.0     64     87    100    115    122    157    210    253    341    325
10               Canada    NaN    5.0    6.0      7     10     24     46     82     84    185    237    370    379
11        United States    4.0    9.0   14.0     21     38     69    126    182    167    354    393    604    629
12                Japan    7.0    7.0    7.0      7      9     12     20     30     51     40    113    310    353
13          Third World    7.0    6.0    6.0      4      3      2      2      3      4      5      8     14     17
14                China    8.0    6.0    6.0      4      4      3      3      4      4      5     10     18     24
15                India    7.0    6.0    6.0      3      2      1      2      3      4      6      8     14     16
16               Brazil    NaN    NaN    NaN      4      4      5      7     10     10     13     23     42     55
17               Mexico    NaN    NaN    NaN      5      4      5      7      9      8     12     22     36     41
18                World    7.0    6.0    7.0      7      9     14     21     28     31     48     66    100    103

Let's convert to long format and plot the evolution of industrialization across regions and groups of countries.
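pd.wide_to_long splits column names such as y1750 into a stub (y) and a numeric suffix that becomes the new year column, which is why the columns were named 'y' + year when the data were entered. A minimal sketch with two countries from the table above:

```python
import numpy as np
import pandas as pd

# Wide format: one row per country, one 'y<year>' column per benchmark year
wide = pd.DataFrame({'Country': ['Canada', 'India'],
                     'y1750': [np.nan, 7.0],
                     'y1913': [46.0, 2.0]})

# Long format: one row per (Country, year); the numeric suffix becomes an integer 'year'
long_df = (pd.wide_to_long(wide, stubnames=['y'], i='Country', j='year')
             .reset_index()
             .rename(columns={'y': 'Industrialization'}))
print(long_df)
```

Missing values are kept as NaN rows rather than dropped, so Canada still has a 1750 row in the long table.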


In [82]:
industrialization = pd.wide_to_long(industrialization, ['y'], i='Country', j='year').reset_index()
industrialization.rename(columns={'y':'Industrialization'}, inplace=True)

In [83]:
# Select some colors (not used by the plot below, which relies on seaborn's default palette)
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to set up a color map that matplotlib can use
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.Country.isin(['Developed Countries', 'Third World', 'World'])].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-Dev-NonDev.pdf', dpi=300, bbox_inches='tight')



In [84]:
fig


Out[84]:

In [85]:
# Map country name to development level
dev_level = {'Belgium':'Developed',
             'France':'Developed',
             'Germany':'Developed',
             'Italy':'Developed',
             'Spain':'Developed',
             'Sweden':'Developed',
             'Switzerland':'Developed',
             'United Kingdom':'Developed',
             'Canada':'Developed',
             'United States':'Developed',
             'Japan':'Developed',
             'China':'Developing',
             'India':'Developing',
             'Brazil':'Developing',
             'Mexico':'Developing'}

industrialization['dev_level'] = industrialization.Country.map(dev_level)

filled_markers = ('o', 's', 'v', '^', '<', '>', '8', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.dev_level=='Developed'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(11, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-Dev.pdf', dpi=300, bbox_inches='tight')



In [86]:
fig


Out[86]:

In [87]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.dev_level=='Developing'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(4, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-NonDev.pdf', dpi=300, bbox_inches='tight')



In [88]:
fig


Out[88]:

In [89]:
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[
                 (industrialization.Country.isin(['India', 'United Kingdom'])) & 
                 (industrialization.year<=1900)].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-UK-IND.pdf', dpi=300, bbox_inches='tight')



In [90]:
fig


Out[90]:

Manufacturing

Let's use data from the same source to explore what happened to the share of manufacturing across regions.