Working with Economic data in Python

This notebook will introduce you to working with data in Python. You will use packages like NumPy to manipulate and do computations with arrays, matrices, and the like (see my Introduction to Python). But given the needs of economists (and other scientists), it will be advantageous for us to use pandas. pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for Python. pandas allows you to import and process data in many useful ways. It works well with other packages that complement it, making it a very powerful tool for data analysis.

With pandas you can

- Import many types of data, including
    - CSV files
    - Tab or other types of delimited files
    - Excel (xls, xlsx) files
    - Stata files
- Open files directly from a website
- Merge, select, join data
- Perform statistical analyses
- Create plots of your data

and much more. Let's start by importing pandas and using it to download some data and create some of the figures from the lecture notes. Note that when importing pandas it is customary to assign it the alias pd. I suggest you follow this convention, which will make using other people's code and snippets easier.



In [1]:

    
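# Let's import pandas and some other basic packages we will use
# The __future__ import only matters on Python 2; it has no effect on Python 3
from __future__ import division
# IPython magics: load numpy and matplotlib for interactive use
# and show figures inside the notebook
%pylab --no-import-all
%matplotlib inline
import pandas as pd
import numpy as np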









    



Using matplotlib backend: MacOSX
Populating the interactive namespace from numpy and matplotlib

Working with Pandas

The basic structures in pandas are pd.Series and pd.DataFrame. You can think of a pd.Series as a labeled vector that contains data and has a large set of functions that can be easily performed on it. A pd.DataFrame is similar to a table/matrix of multidimensional data where each column is a pd.Series. I know...this may not explain much, so let's start with some actual examples. Let's create two series, one containing some country names and another containing some fictitious data.



In [2]:

    
countries = pd.Series(['Colombia', 'Turkey', 'USA', 'Germany', 'Chile'], name='country')
print(countries)
print('\n', 'There are ', countries.shape[0], 'countries in this series.')









    



0    Colombia
1      Turkey
2         USA
3     Germany
4       Chile
Name: country, dtype: object

 There are  5 countries in this series.

Notice that we have assigned a name to the series that is different from the name of the variable containing the series. Our print(countries) statement shows the series and its contents, its name, and the dtype of the data it contains. Here our series is composed only of strings, so pandas assigns it the object dtype (not important for now, but we will use this later to convert data between types, e.g. strings to integers or floats, or the other way around).
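
As a minimal sketch of the kind of type conversion mentioned above, assume a made-up series of years stored as text (the names years_as_text, years_as_int and years_back_to_text are only for illustration). The astype method returns a new series with the requested type:

```python
import pandas as pd

# Hypothetical example data: years stored as strings
years_as_text = pd.Series(['1990', '2000', '2010'], name='year')
print(years_as_text.dtype)

# Convert the strings to integers
years_as_int = years_as_text.astype(int)
print(years_as_int.dtype)

# Convert the integers back to strings
years_back_to_text = years_as_int.astype(str)
print(years_back_to_text.tolist())
```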

Let's create the data using some of the functions we already learned.



In [3]:

    
np.random.seed(123456)
data = pd.Series(np.random.normal(size=(countries.shape)), name='noise')
print(data)
print('\n', 'The average in this sample is ', data.mean())









    



0    0.469112
1   -0.282863
2   -1.509059
3   -1.135632
4    1.212112
Name: noise, dtype: float64

 The average in this sample is  -0.24926597871826645

Here we have used the mean() function of the series to compute its mean. There are many other properties/functions for these series including std(), shape, count(), max(), min(), etc. You can access these by writing series.name_of_function_or_property. To see what functions are available, type series. and then hit tab.
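
To get a feel for these functions, here is a small sketch on a made-up series (sample is a hypothetical name, not part of the data above). The describe() method collects several of these statistics in one call:

```python
import pandas as pd

# Hypothetical series used only to try out a few methods
sample = pd.Series([1.0, 2.0, 3.0, 4.0], name='sample')

print('count:', sample.count())
print('min:', sample.min(), 'max:', sample.max())
# std() computes the sample standard deviation (it divides by n - 1)
print('std:', sample.std())
# describe() reports count, mean, std, min, quartiles and max together
print(sample.describe())
```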

Let's create a pd.DataFrame using these two series.



In [4]:

    
df = pd.DataFrame([countries, data])
df









    Out[4]:







  
    
      
                0          1         2         3        4
country  Colombia     Turkey       USA   Germany    Chile
noise    0.469112  -0.282863  -1.50906  -1.13563  1.21211

Not exactly what we'd like, but don't worry, we can just transpose it so it has each country with its data in a row.



In [5]:

    
df = df.T
df
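
A possible alternative, sketched here under the assumption that you want each series to become a column directly, is pd.concat with axis=1. Building the frame from rows and then transposing tends to leave the columns with the generic object dtype, while this route keeps the float dtype of noise. The name df_alt is hypothetical and is not used in the rest of the notebook:

```python
import numpy as np
import pandas as pd

countries = pd.Series(['Colombia', 'Turkey', 'USA', 'Germany', 'Chile'], name='country')
np.random.seed(123456)
data = pd.Series(np.random.normal(size=countries.shape), name='noise')

# Place the two series side by side as columns; each keeps its own dtype
df_alt = pd.concat([countries, data], axis=1)
print(df_alt.dtypes)
```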

Now let us add some more data to this dataframe. This is done easily by defining new columns. Let's create the square of noise, create the sum of noise and its square, and get the length of the country's name.



In [6]:

    
df['noise_sq'] = df.noise**2
df['noise and its square'] = df.noise + df.noise_sq
df['name length'] = df.country.apply(len)
df









    Out[6]:







  
    
      
    country      noise   noise_sq  noise and its square  name length
0  Colombia   0.469112   0.220066              0.689179            8
1    Turkey  -0.282863  0.0800117             -0.202852            6
2       USA   -1.50906    2.27726              0.768199            3
3   Germany   -1.13563    1.28966              0.154029            7
4     Chile    1.21211    1.46922               2.68133            5

This shows some of the ways in which you can create new data. Especially useful is the apply method, which applies a function to each element of the series. You can also apply a function to the whole dataframe, which is useful if you want to perform computations using several columns.
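
Applying a function to the whole dataframe can be sketched as follows. With axis=1 pandas passes each row to the function as a series, so the function can combine several columns. The frame toy and the function describe_row are made up for this illustration:

```python
import pandas as pd

# Hypothetical toy data
toy = pd.DataFrame({'country': ['Colombia', 'Chile'],
                    'noise': [0.5, -1.0]})

def describe_row(row):
    # row is a Series holding one row; columns are accessed by name
    squared = round(float(row['noise']) ** 2, 2)
    return row['country'] + ': ' + str(squared)

# axis=1 means the function is called once per row
toy['summary'] = toy.apply(describe_row, axis=1)
print(toy)
```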

Let's see some other ways in which we can interact with dataframes. First, let's select some observations, e.g., all countries in South America.



In [7]:

    
# Let's create a list of South American countries
south_america = ['Colombia', 'Chile']
# Select the rows for South American countries
df.loc[df.country.apply(lambda x: x in south_america)]









    Out[7]:







  
    
      
    country     noise  noise_sq  noise and its square  name length
0  Colombia  0.469112  0.220066              0.689179            8
4     Chile   1.21211   1.46922               2.68133            5

Now let's use this to create a dummy indicating whether a country belongs to South America. To understand what is going on let's show the result of the condition for selecting rows.



In [8]:

    
df.country.apply(lambda x: x in south_america)









    Out[8]:





0     True
1    False
2    False
3    False
4     True
Name: country, dtype: bool

So in the previous selection of rows we told pandas which rows to include by passing a series of booleans (True, False). We can use this result to create the dummy; we only need to convert the output to int.



In [9]:

    
df['South America'] = df.country.apply(lambda x: x in south_america).astype(int)
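
The same boolean series can also be built with the isin method, which checks membership in a list without a lambda. This is only a sketch on a made-up frame (toy, is_south_american and selected are hypothetical names):

```python
import pandas as pd

toy = pd.DataFrame({'country': ['Colombia', 'Turkey', 'USA', 'Germany', 'Chile']})
south_america = ['Colombia', 'Chile']

# True where the country appears in the list, False otherwise
is_south_american = toy['country'].isin(south_america)

# The boolean series works both as a dummy (after astype) and as a row selector
toy['South America'] = is_south_american.astype(int)
selected = toy.loc[is_south_american]
print(selected)
```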

Now, let's plot the various series in the dataframe.



In [10]:

    
df.plot()









    Out[10]:





<matplotlib.axes._subplots.AxesSubplot at 0x1275829a0>

Neither very nice nor very useful. Notice that it assigned the row number to the x-axis labels. Let's change the row labels, which are contained in the dataframe's index, by assigning the country names as the index.



In [11]:

    
df = df.set_index('country')
print(df)
df.plot()









    



             noise   noise_sq noise and its square  name length  South America
country                                                                       
Colombia  0.469112   0.220066             0.689179            8              1
Turkey   -0.282863  0.0800117            -0.202852            6              0
USA       -1.50906    2.27726             0.768199            3              0
Germany   -1.13563    1.28966             0.154029            7              0
Chile      1.21211    1.46922              2.68133            5              1






    Out[11]:





<matplotlib.axes._subplots.AxesSubplot at 0x12968dcd0>

Better, but still not very informative. Below we will improve on this when we work with some real data.

Notice that by using the set_index function we have set the country names as the index. This is useful for selecting data. E.g., if we want to see only the row for Colombia we can use loc:



In [12]:

    
df.loc['Colombia']









    Out[12]:





noise                   0.469112
noise_sq                0.220066
noise and its square    0.689179
name length                    8
South America                  1
Name: Colombia, dtype: object
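
A few other ways of selecting with loc once the index holds labels, sketched on a made-up frame (toy and its values are hypothetical):

```python
import pandas as pd

toy = pd.DataFrame({'noise': [0.5, -0.3, 1.2], 'name length': [8, 6, 5]},
                   index=pd.Index(['Colombia', 'Turkey', 'Chile'], name='country'))

# One label returns that row as a series
one_row = toy.loc['Colombia']
# A list of labels returns a dataframe with those rows
two_rows = toy.loc[['Colombia', 'Chile']]
# A row label and a column label return a single value
one_cell = toy.loc['Chile', 'noise']
print(two_rows)
```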

Getting data

One of the nice features of pandas and its ecosystem is that it makes obtaining data very easy. To illustrate this, and also to revisit some of the basic facts of comparative development, let's download some data from various sources. This may require you to create accounts in order to access and download the data (sometimes the process is very simple and does not require an actual project...in other cases you need to propose a project and be approved...usually due to privacy concerns with micro-data). Don't be afraid, all these sources are free and are used a lot in research, so it is good that you learn to use them. Let's start with a list of useful sources.

Country-level economic data

World Bank provides all kinds of socio-economic data.
Penn World Tables is a database with information on relative levels of income, output, input and productivity, covering 182 countries between 1950 and 2017.
Maddison Historical Data provides the most used historical statistics on population and GDP.
The Maddison Project Database provides information on comparative economic growth and income levels over the very long run; it is the follow-up to Maddison.
Comparative Historical National Accounts provides information on Gross Domestic Product, including an industry breakdown, for the 19th and 20th centuries.
Human Mortality Database provides detailed mortality and population data for the world for the last two centuries.

Censuses, Surveys, and other micro-level data

IPUMS provides census and survey data from around the world integrated across time and space.
General Social Survey provides survey data on what Americans think and feel about such issues as national spending priorities, crime and punishment, intergroup relations, and confidence in institutions.
European Social Survey provides survey measures on the attitudes, beliefs and behaviour patterns of diverse European populations in more than thirty nations.
UK Data Service is the UK’s largest collection of social, economic and population data resources.
SHRUG is the Socioeconomic High-resolution Rural-Urban Geographic Platform for India. It provides access to dozens of datasets covering India’s 500,000 villages and 8000 towns using a set of common geographic identifiers that span 25 years.

Divergence - Big time

To study the divergence across countries let's download and plot the historical GDP and population data. To keep the data and avoid downloading it from scratch every time, we'll create a folder ./data in the current directory and save each file there. Also, we'll make sure that if the data does not exist, we download it. We'll use the os package to create directories.

Setting up paths



In [13]:

    
import os

pathout = './data/'

if not os.path.exists(pathout):
    os.mkdir(pathout)
    
pathgraphs = './graphs/'
if not os.path.exists(pathgraphs):
    os.mkdir(pathgraphs)
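
A slightly shorter variant, sketched here in a temporary directory so it does not touch your working folder, uses os.makedirs with exist_ok=True. It creates any missing parent folders and does nothing if the folder is already there. The folder names below are made up:

```python
import os
import tempfile

# Scratch location used only for this illustration
base = tempfile.mkdtemp()
nested = os.path.join(base, 'data', 'raw')

# Creates 'data' and 'data/raw' in one call
os.makedirs(nested, exist_ok=True)
# Calling it again is harmless because of exist_ok=True
os.makedirs(nested, exist_ok=True)
print(os.path.isdir(nested))
```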

Download New Maddison Project Data



In [14]:

    
try:
    # Read the local copies if they have been downloaded before
    maddison_new = pd.read_stata(pathout + 'Maddison2018.dta')
    maddison_new_region = pd.read_stata(pathout + 'Maddison2018_region.dta')
    maddison_new_1990 = pd.read_stata(pathout + 'Maddison2018_1990.dta')
except FileNotFoundError:
    # Otherwise download each file and save a local copy
    maddison_new = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2018.dta')
    maddison_new.to_stata(pathout + 'Maddison2018.dta', write_index=False, version=117)
    maddison_new_region = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2018_region_data.dta')
    maddison_new_region.to_stata(pathout + 'Maddison2018_region.dta', write_index=False, version=117)
    maddison_new_1990 = pd.read_stata('https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2018_1990bm.dta')
    maddison_new_1990.to_stata(pathout + 'Maddison2018_1990.dta', write_index=False, version=117)
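
The read-locally-or-download pattern above can be wrapped in a small helper. The sketch below is only an illustration: the function load_or_fetch is hypothetical, it uses CSV instead of Stata files, and it replaces the real download with a stand-in function so that it runs offline:

```python
import os
import tempfile
import pandas as pd

def load_or_fetch(local_path, fetch):
    """Read a cached CSV if it exists; otherwise call fetch() and cache the result."""
    if os.path.exists(local_path):
        return pd.read_csv(local_path)
    frame = fetch()
    frame.to_csv(local_path, index=False)
    return frame

def fake_download():
    # Stand-in for a real download, with made-up values
    return pd.DataFrame({'country': ['AFG', 'ZWE'], 'year': [1820, 2016]})

cache_file = os.path.join(tempfile.mkdtemp(), 'example.csv')
first = load_or_fetch(cache_file, fake_download)   # calls fake_download and writes the cache
second = load_or_fetch(cache_file, fake_download)  # reads the cached file instead
print(second)
```

Checking for the file explicitly, rather than catching an exception, keeps the two cases separate for each file.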



In [15]:

    
maddison_new









    Out[15]:







  
    
      
       countrycode      country    year  cgdppc  rgdpnapc      pop         i_cig  i_bm
    0          AFG  Afghanistan  1820.0     NaN       NaN   3280.0           NaN   NaN
    1          AFG  Afghanistan  1870.0     NaN       NaN   4207.0           NaN   NaN
    2          AFG  Afghanistan  1913.0     NaN       NaN   5730.0           NaN   NaN
    3          AFG  Afghanistan  1950.0  2392.0    2392.0   8150.0  Extrapolated   NaN
    4          AFG  Afghanistan  1951.0  2422.0    2422.0   8284.0  Extrapolated   NaN
  ...          ...          ...     ...     ...       ...      ...           ...   ...
19868          ZWE     Zimbabwe  2012.0  1623.0    1604.0  12620.0  Extrapolated   NaN
19869          ZWE     Zimbabwe  2013.0  1801.0    1604.0  13183.0  Extrapolated   NaN
19870          ZWE     Zimbabwe  2014.0  1797.0    1594.0  13772.0  Extrapolated   NaN
19871          ZWE     Zimbabwe  2015.0  1759.0    1560.0  14230.0  Extrapolated   NaN
19872          ZWE     Zimbabwe  2016.0  1729.0    1534.0  14547.0  Extrapolated   NaN
    
  

19873 rows × 8 columns

This dataset is in long format, i.e., each row is one country-year observation. Also, notice that the year is not an integer. Let's correct this.



In [16]:

    
maddison_new['year'] = maddison_new.year.astype(int)
maddison_new









    Out[16]:







  
    
      
       countrycode      country  year  cgdppc  rgdpnapc      pop         i_cig  i_bm
    0          AFG  Afghanistan  1820     NaN       NaN   3280.0           NaN   NaN
    1          AFG  Afghanistan  1870     NaN       NaN   4207.0           NaN   NaN
    2          AFG  Afghanistan  1913     NaN       NaN   5730.0           NaN   NaN
    3          AFG  Afghanistan  1950  2392.0    2392.0   8150.0  Extrapolated   NaN
    4          AFG  Afghanistan  1951  2422.0    2422.0   8284.0  Extrapolated   NaN
  ...          ...          ...   ...     ...       ...      ...           ...   ...
19868          ZWE     Zimbabwe  2012  1623.0    1604.0  12620.0  Extrapolated   NaN
19869          ZWE     Zimbabwe  2013  1801.0    1604.0  13183.0  Extrapolated   NaN
19870          ZWE     Zimbabwe  2014  1797.0    1594.0  13772.0  Extrapolated   NaN
19871          ZWE     Zimbabwe  2015  1759.0    1560.0  14230.0  Extrapolated   NaN
19872          ZWE     Zimbabwe  2016  1729.0    1534.0  14547.0  Extrapolated   NaN
    
  

19873 rows × 8 columns
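
To make the difference between long and wide layouts concrete, here is a small sketch with made-up numbers (long_df, wide_df and back_to_long are hypothetical names). pivot moves from long to wide, and melt goes back from wide to long:

```python
import pandas as pd

# Long format: one row per country-year pair (made-up values)
long_df = pd.DataFrame({
    'country': ['A', 'A', 'B', 'B'],
    'year': [1950, 1951, 1950, 1951],
    'pop': [10.0, 11.0, 20.0, 21.0],
})

# Long to wide: one row per country, one column per year
wide_df = long_df.pivot(index='country', columns='year', values='pop')
print(wide_df)

# Wide back to long: one row per country-year pair again
back_to_long = wide_df.reset_index().melt(id_vars='country', var_name='year', value_name='pop')
print(back_to_long)
```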

Original Maddison Data

Now, let's download, save and read the original Maddison database. Since the original file is an Excel file with different data on each sheet, it will require us to use a different method to get all the data.



In [17]:

    
if not os.path.exists(pathout + 'Maddison_original.xls'):
    # Import the submodule explicitly; "import urllib" alone does not reliably load urllib.request
    import urllib.request
    dataurl = "http://www.ggdc.net/maddison/Historical_Statistics/horizontal-file_02-2010.xls"
    urllib.request.urlretrieve(dataurl, pathout + 'Maddison_original.xls')

Some data munging

This dataset is not very nicely structured for importing, as you can see if you open it in Excel. I suggest you do so, so that you can better see what is going on. Notice that the first two rows really have no data. Also, every second column is empty. Moreover, there are a few empty rows. Let's import the data and clean it so we can plot and analyse it better.



In [18]:

    
maddison_old_pop = pd.read_excel(pathout + 'Maddison_original.xls', sheet_name="Population", skiprows=2)
maddison_old_pop









    Out[18]:







  
    
      
         Unnamed: 0      1  Unnamed: 2   1000  Unnamed: 4    1500  Unnamed: 6    1600  Unnamed: 8    1700  ...       2002       2003       2004       2005       2006       2007       2008   2009  Unnamed: 201       2030
  0  Western Europe    NaN         NaN    NaN         NaN     NaN         NaN     NaN         NaN     NaN  ...        NaN        NaN        NaN        NaN        NaN        NaN        NaN    NaN           NaN        NaN
  1         Austria  500.0         NaN  700.0         NaN  2000.0         NaN  2500.0         NaN  2500.0  ...   8148.312   8162.656   8174.762   8184.691   8192.880   8199.783   8205.533   8210           NaN   8120.000
  2         Belgium  300.0         NaN  400.0         NaN  1400.0         NaN  1600.0         NaN  2000.0  ...  10311.970  10330.824  10348.276  10364.388  10379.067  10392.226  10403.951  10414           NaN  10409.000
  3         Denmark  180.0         NaN  360.0         NaN   600.0         NaN   650.0         NaN   700.0  ...   5374.693   5394.138   5413.392   5432.335   5450.661   5468.120   5484.723   5501           NaN   5730.488
  4         Finland   20.0         NaN   40.0         NaN   300.0         NaN   400.0         NaN   400.0  ...   5193.039   5204.405   5214.512   5223.442   5231.372   5238.460   5244.749   5250           NaN   5201.445
...             ...    ...         ...    ...         ...     ...         ...     ...         ...     ...  ...        ...        ...        ...        ...        ...        ...        ...    ...           ...        ...
273      Guadeloupe    NaN         NaN    NaN         NaN     NaN         NaN     NaN         NaN     NaN  ...    435.739    440.189    444.515    448.713    452.776    456.698    460.486   n.a.           NaN    523.493
274    Guyana (Fr.)    NaN         NaN    NaN         NaN     NaN         NaN     NaN         NaN     NaN  ...    182.333    186.917    191.309    195.506    199.509    203.321    206.941   n.a.           NaN    272.781
275      Martinique    NaN         NaN    NaN         NaN     NaN         NaN     NaN         NaN     NaN  ...    422.277    425.966    429.510    432.900    436.131    439.202    442.119   n.a.           NaN    486.714
276         Reunion    NaN         NaN    NaN         NaN     NaN         NaN     NaN         NaN     NaN  ...    743.981    755.171    766.153    776.948    787.584    798.094    808.506   n.a.           NaN   1025.217
277           Total    NaN         NaN    NaN         NaN     NaN         NaN     NaN         NaN     NaN  ...   1784.330   1808.243   1831.487   1854.067   1876.000   1897.315   1918.052   n.a.           NaN   2308.205
    
  

278 rows × 203 columns



In [19]:

    
maddison_old_gdppc = pd.read_excel(pathout + 'Maddison_original.xls', sheet_name="PerCapita GDP", skiprows=2)
maddison_old_gdppc









    Out[19]:







  
    
      
         Unnamed: 0           1  Unnamed: 2        1000  Unnamed: 4     1500  Unnamed: 6        1600  Unnamed: 8         1700  ...          1999          2000          2001          2002          2003          2004          2005          2006          2007          2008
  0  Western Europe         NaN         NaN         NaN         NaN                  NaN         NaN         NaN          NaN  ...           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN
  1         Austria  425.000000         NaN  425.000000         NaN      707         NaN  837.200000         NaN   993.200000  ...  20065.093878  20691.415561  20812.893753  20955.874051  21165.047259  21626.929322  22140.725899  22892.682427  23674.041130  24130.547035
  2         Belgium  450.000000         NaN  425.000000         NaN      875         NaN  975.625000         NaN  1144.000000  ...  19964.428266  20656.458570  20761.238278  21032.935511  21205.859281  21801.602508  22246.561977  22881.632810  23446.949672  23654.763464
  3         Denmark  400.000000         NaN  400.000000         NaN  738.333         NaN  875.384615         NaN  1038.571429  ...  22254.890572  22975.162513  23059.374968  23082.620719  23088.582457  23492.664119  23972.564284  24680.492880  24995.245167  24620.568805
  4         Finland  400.000000         NaN  400.000000         NaN  453.333         NaN  537.500000         NaN   637.500000  ...  18855.985066  19770.363126  20245.896529  20521.702225  20845.802738  21574.406196  22140.573208  23190.283543  24131.519569  24343.586318
...             ...         ...         ...         ...         ...      ...         ...         ...         ...          ...  ...           ...           ...           ...           ...           ...           ...           ...           ...           ...           ...
190    Total Africa  472.352941         NaN  424.767802         NaN   413.71         NaN  422.071584         NaN   420.628684  ...   1430.752576   1447.071701   1471.156532   1482.629352   1517.935644   1558.099461   1603.686517   1663.531318   1724.226776   1780.265474
191             NaN         NaN         NaN         NaN         NaN      NaN         NaN         NaN         NaN          NaN  ...           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN
192             NaN         NaN         NaN         NaN         NaN      NaN         NaN         NaN         NaN          NaN  ...           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN
193             NaN         NaN         NaN         NaN         NaN      NaN         NaN         NaN         NaN          NaN  ...           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN
194   World Average  466.752281         NaN  453.402162         NaN  566.389         NaN  595.783856         NaN   614.853602  ...   5833.255492   6037.675887   6131.705471   6261.734267   6469.119575   6738.281333   6960.031035   7238.383483   7467.648232   7613.922924
    
  

195 rows × 200 columns

Let's start by renaming the first column, which has the region/country names:



In [20]:

    
maddison_old_pop.rename(columns={'Unnamed: 0':'Country'}, inplace=True)
maddison_old_gdppc.rename(columns={'Unnamed: 0':'Country'}, inplace=True)

Now let's drop all the columns that do not have data:



In [21]:

    
maddison_old_pop = maddison_old_pop[[col for col in maddison_old_pop.columns if not str(col).startswith('Unnamed')]]
maddison_old_gdppc = maddison_old_gdppc[[col for col in maddison_old_gdppc.columns if not str(col).startswith('Unnamed')]]
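
The same filtering can be sketched with the string methods of the column index instead of a list comprehension. The frame toy below is made up to mimic the mix of string and integer column labels in the Maddison sheets:

```python
import pandas as pd

# Hypothetical frame with mixed string and integer column labels
toy = pd.DataFrame({'Country': ['Austria'], 1: [500.0], 'Unnamed: 2': [None], 1000: [700.0]})

# Convert the labels to strings first, then flag the ones starting with 'Unnamed'
unnamed = toy.columns.astype(str).str.startswith('Unnamed')

# Keep all rows and only the columns that are not flagged
cleaned = toy.loc[:, ~unnamed]
print(cleaned)
```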

Now, let's change the names of the columns so they reflect the underlying variable:



In [22]:

    
maddison_old_pop.columns = ['Country'] + ['pop_'+str(col) for col in maddison_old_pop.columns[1:]]
maddison_old_gdppc.columns = ['Country'] + ['gdppc_'+str(col) for col in maddison_old_gdppc.columns[1:]]
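
Another way to do this kind of renaming, sketched on a made-up frame, is to pass a function to rename. It returns a new dataframe and leaves the original untouched (toy and renamed are hypothetical names):

```python
import pandas as pd

toy = pd.DataFrame({'Country': ['Austria'], 1: [500.0], 1000: [700.0]})

# Keep 'Country' as it is and prefix every other label with 'pop_'
renamed = toy.rename(columns=lambda col: col if col == 'Country' else 'pop_' + str(col))
print(renamed.columns.tolist())
```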



In [23]:

    
maddison_old_pop









    Out[23]:







  
    
      
            Country  pop_1  pop_1000  pop_1500  pop_1600  pop_1700  pop_1820  pop_1821  pop_1822  pop_1823  ...   pop_2001   pop_2002   pop_2003   pop_2004   pop_2005   pop_2006   pop_2007   pop_2008  pop_2009   pop_2030
  0  Western Europe    NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN       NaN        NaN
  1         Austria  500.0     700.0    2000.0    2500.0    2500.0    3369.0    3386.0    3402.0    3419.0  ...   8131.690   8148.312   8162.656   8174.762   8184.691   8192.880   8199.783   8205.533      8210   8120.000
  2         Belgium  300.0     400.0    1400.0    1600.0    2000.0    3434.0    3464.0    3495.0    3526.0  ...  10291.679  10311.970  10330.824  10348.276  10364.388  10379.067  10392.226  10403.951     10414  10409.000
  3         Denmark  180.0     360.0     600.0     650.0     700.0    1155.0    1167.0    1179.0    1196.0  ...   5355.826   5374.693   5394.138   5413.392   5432.335   5450.661   5468.120   5484.723      5501   5730.488
  4         Finland   20.0      40.0     300.0     400.0     400.0    1169.0    1186.0    1202.0    1219.0  ...   5180.309   5193.039   5204.405   5214.512   5223.442   5231.372   5238.460   5244.749      5250   5201.445
...             ...    ...       ...       ...       ...       ...       ...       ...       ...       ...  ...        ...        ...        ...        ...        ...        ...        ...        ...       ...        ...
273      Guadeloupe    NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...    431.170    435.739    440.189    444.515    448.713    452.776    456.698    460.486      n.a.    523.493
274    Guyana (Fr.)    NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...    177.562    182.333    186.917    191.309    195.506    199.509    203.321    206.941      n.a.    272.781
275      Martinique    NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...    418.454    422.277    425.966    429.510    432.900    436.131    439.202    442.119      n.a.    486.714
276         Reunion    NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...    732.570    743.981    755.171    766.153    776.948    787.584    798.094    808.506      n.a.   1025.217
277           Total    NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...   1759.756   1784.330   1808.243   1831.487   1854.067   1876.000   1897.315   1918.052      n.a.   2308.205
    
  

278 rows × 197 columns



In [24]:

    
maddison_old_gdppc









    Out[24]:







  
    
      
            Country     gdppc_1  gdppc_1000  gdppc_1500  gdppc_1600   gdppc_1700   gdppc_1820   gdppc_1821   gdppc_1822   gdppc_1823  ...    gdppc_1999    gdppc_2000    gdppc_2001    gdppc_2002    gdppc_2003    gdppc_2004    gdppc_2005    gdppc_2006    gdppc_2007    gdppc_2008
  0  Western Europe         NaN         NaN                     NaN          NaN          NaN          NaN          NaN          NaN  ...           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN
  1         Austria  425.000000  425.000000         707  837.200000   993.200000  1218.165628          NaN          NaN          NaN  ...  20065.093878  20691.415561  20812.893753  20955.874051  21165.047259  21626.929322  22140.725899  22892.682427  23674.041130  24130.547035
  2         Belgium  450.000000  425.000000         875  975.625000  1144.000000  1318.870122          NaN          NaN          NaN  ...  19964.428266  20656.458570  20761.238278  21032.935511  21205.859281  21801.602508  22246.561977  22881.632810  23446.949672  23654.763464
  3         Denmark  400.000000  400.000000     738.333  875.384615  1038.571429  1273.593074  1320.479863  1326.547922  1307.692308  ...  22254.890572  22975.162513  23059.374968  23082.620719  23088.582457  23492.664119  23972.564284  24680.492880  24995.245167  24620.568805
  4         Finland  400.000000  400.000000     453.333  537.500000   637.500000   781.009410          NaN          NaN          NaN  ...  18855.985066  19770.363126  20245.896529  20521.702225  20845.802738  21574.406196  22140.573208  23190.283543  24131.519569  24343.586318
...             ...         ...         ...         ...         ...          ...          ...          ...          ...          ...  ...           ...           ...           ...           ...           ...           ...           ...           ...           ...           ...
190    Total Africa  472.352941  424.767802      413.71  422.071584   420.628684   419.755914          NaN          NaN          NaN  ...   1430.752576   1447.071701   1471.156532   1482.629352   1517.935644   1558.099461   1603.686517   1663.531318   1724.226776   1780.265474
191             NaN         NaN         NaN         NaN         NaN          NaN          NaN          NaN          NaN          NaN  ...           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN
192             NaN         NaN         NaN         NaN         NaN          NaN          NaN          NaN          NaN          NaN  ...           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN
193             NaN         NaN         NaN         NaN         NaN          NaN          NaN          NaN          NaN          NaN  ...           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN           NaN
194   World Average  466.752281  453.402162     566.389  595.783856   614.853602   665.735330          NaN          NaN          NaN  ...   5833.255492   6037.675887   6131.705471   6261.734267   6469.119575   6738.281333   6960.031035   7238.383483   7467.648232   7613.922924
    
  

195 rows × 195 columns

Let's choose the rows that hold the aggregates by region for the main regions of the world.



In [25]:

    
# Keep the rows whose name contains 'Total' (the regional aggregates)
gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.apply(lambda x: str(x).upper().find('TOTAL')!=-1)].reset_index(drop=True)
gdppc = gdppc.dropna(subset=['gdppc_1'])
gdppc = gdppc.loc[2:]
# Clean the region names; regex=True is needed in recent pandas so that r'\d+' is treated as a pattern
gdppc['Country'] = gdppc.Country.str.replace('Total', '').str.replace('Countries', '').str.replace(r'\d+', '', regex=True).str.replace('European', 'Europe').str.strip()
gdppc = gdppc.loc[gdppc.Country.apply(lambda x: x.find('USSR')==-1 and  x.find('West Asian')==-1)].reset_index(drop=True)
gdppc









    Out[25]:







  
    
      
      Country
      gdppc_1
      gdppc_1000
      gdppc_1500
      gdppc_1600
      gdppc_1700
      gdppc_1820
      gdppc_1821
      gdppc_1822
      gdppc_1823
      ...
      gdppc_1999
      gdppc_2000
      gdppc_2001
      gdppc_2002
      gdppc_2003
      gdppc_2004
      gdppc_2005
      gdppc_2006
      gdppc_2007
      gdppc_2008
    
  
  
    
      0
      Western Europe
      576.167665
      427.425665
      771.094
      887.906964
      993.456911
      1194.184683
      NaN
      NaN
      NaN
      ...
      18497.208533
      19176.001655
      19463.863297
      19627.707522
      19801.145425
      20199.220700
      20522.238008
      21087.304789
      21589.011346
      21671.774225
    
    
      1
      Western Offshoots
      400.000000
      400.000000
      400
      400.000000
      476.000000
      1201.993477
      NaN
      NaN
      NaN
      ...
      26680.580823
      27393.808035
      27387.312035
      27648.644070
      28090.274362
      28807.845958
      29415.399334
      29922.741918
      30344.425293
      30151.805880
    
    
      2
      East Europe
      411.789474
      400.000000
      496
      548.023599
      606.010638
      683.160984
      NaN
      NaN
      NaN
      ...
      5734.162109
      5970.165085
      6143.112873
      6321.395376
      6573.365882
      6942.136596
      7261.721015
      7730.097570
      8192.881904
      8568.967581
    
    
      3
      Latin America
      400.000000
      400.000000
      416.457
      437.558140
      526.639004
      691.060678
      NaN
      NaN
      NaN
      ...
      5765.585093
      5889.237351
      5846.295193
      5746.609672
      5785.841237
      6063.068969
      6265.525702
      6530.533583
      6783.869986
      6973.134656
    
    
      4
      Asia
      455.671021
      469.961665
      568.418
      573.550859
      571.605276
      580.626115
      NaN
      NaN
      NaN
      ...
      3623.902724
      3797.608955
      3927.186275
      4121.275511
      4388.982705
      4661.517477
      4900.563281
      5187.253152
      5408.383588
      5611.198564
    
    
      5
      Africa
      472.352941
      424.767802
      413.71
      422.071584
      420.628684
      419.755914
      NaN
      NaN
      NaN
      ...
      1430.752576
      1447.071701
      1471.156532
      1482.629352
      1517.935644
      1558.099461
      1603.686517
      1663.531318
      1724.226776
      1780.265474
    
  

6 rows × 195 columns
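The row selection and name clean-up above can be sketched on a toy table. This is only an illustration under assumptions: the country names and values below are made up, and `str.contains` is swapped in for the `apply`/`find` combination used in the cell.

```python
import pandas as pd

# Toy stand-in for the Maddison table (hypothetical names and values)
toy = pd.DataFrame({
    'Country': ['Austria', 'Total 12 Western Europe', 'Total Africa', None],
    'gdppc_1': [425.0, 599.0, 472.0, None],
})

# na=False treats missing names as "no match" instead of propagating NaN
is_total = toy['Country'].str.contains('total', case=False, na=False)
regions = toy.loc[is_total].reset_index(drop=True)

# Strip the word 'Total' literally, then remove digits with a regular expression
regions['Country'] = (regions['Country']
                      .str.replace('Total', '', regex=False)
                      .str.replace(r'\d+', '', regex=True)
                      .str.strip())
print(regions)
```

Passing `regex` explicitly keeps the behavior the same across pandas versions, whatever the default happens to be.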

Let's drop the columns (years) in which any region has a missing value.



In [26]:

    
gdppc = gdppc.dropna(axis=1, how='any')
gdppc









    Out[26]:







  
    
      
      Country
      gdppc_1
      gdppc_1000
      gdppc_1500
      gdppc_1600
      gdppc_1700
      gdppc_1820
      gdppc_1870
      gdppc_1900
      gdppc_1913
      ...
      gdppc_1999
      gdppc_2000
      gdppc_2001
      gdppc_2002
      gdppc_2003
      gdppc_2004
      gdppc_2005
      gdppc_2006
      gdppc_2007
      gdppc_2008
    
  
  
    
      0
      Western Europe
      576.167665
      427.425665
      771.094
      887.906964
      993.456911
      1194.184683
      1953.068150
      2884.661525
      3456.576178
      ...
      18497.208533
      19176.001655
      19463.863297
      19627.707522
      19801.145425
      20199.220700
      20522.238008
      21087.304789
      21589.011346
      21671.774225
    
    
      1
      Western Offshoots
      400.000000
      400.000000
      400
      400.000000
      476.000000
      1201.993477
      2419.152411
      4014.870040
      5232.816582
      ...
      26680.580823
      27393.808035
      27387.312035
      27648.644070
      28090.274362
      28807.845958
      29415.399334
      29922.741918
      30344.425293
      30151.805880
    
    
      2
      East Europe
      411.789474
      400.000000
      496
      548.023599
      606.010638
      683.160984
      936.628265
      1437.944586
      1694.879668
      ...
      5734.162109
      5970.165085
      6143.112873
      6321.395376
      6573.365882
      6942.136596
      7261.721015
      7730.097570
      8192.881904
      8568.967581
    
    
      3
      Latin America
      400.000000
      400.000000
      416.457
      437.558140
      526.639004
      691.060678
      676.005331
      1113.071149
      1494.431922
      ...
      5765.585093
      5889.237351
      5846.295193
      5746.609672
      5785.841237
      6063.068969
      6265.525702
      6530.533583
      6783.869986
      6973.134656
    
    
      4
      Asia
      455.671021
      469.961665
      568.418
      573.550859
      571.605276
      580.626115
      553.459947
      637.615593
      695.131881
      ...
      3623.902724
      3797.608955
      3927.186275
      4121.275511
      4388.982705
      4661.517477
      4900.563281
      5187.253152
      5408.383588
      5611.198564
    
    
      5
      Africa
      472.352941
      424.767802
      413.71
      422.071584
      420.628684
      419.755914
      500.011054
      601.236364
      637.433138
      ...
      1430.752576
      1447.071701
      1471.156532
      1482.629352
      1517.935644
      1558.099461
      1603.686517
      1663.531318
      1724.226776
      1780.265474
    
  

6 rows × 70 columns
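To see what `dropna(axis=1, how='any')` does, here is a minimal sketch on a made-up table (the column names mimic the ones above, the values are invented). Any column with at least one missing value is removed, so only years observed for every row survive.

```python
import numpy as np
import pandas as pd

# Hypothetical two-region table with one incomplete year
toy = pd.DataFrame({
    'Country': ['A', 'B'],
    'gdppc_1820': [1.0, 2.0],
    'gdppc_1821': [np.nan, 2.1],   # missing for region A
    'gdppc_1870': [1.5, 2.5],
})

# axis=1 drops columns; how='any' drops a column if any entry is missing
balanced = toy.dropna(axis=1, how='any')
print(list(balanced.columns))
```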

Let's convert the data from wide format (one column per year) to long format (one row per region and year).



In [27]:

    
gdppc = pd.wide_to_long(gdppc, ['gdppc_'], i='Country', j='year').reset_index()
gdppc









    Out[27]:







  
    
      
               Country  year   gdppc_
0       Western Europe     1  576.168
1    Western Offshoots     1      400
2          East Europe     1  411.789
3        Latin America     1      400
4                 Asia     1  455.671
..                 ...   ...      ...
409  Western Offshoots  2008  30151.8
410        East Europe  2008  8568.97
411      Latin America  2008  6973.13
412               Asia  2008   5611.2
413             Africa  2008  1780.27
    
  

414 rows × 3 columns
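The reshaping step can be checked on a small made-up table. In this sketch the stub name `gdppc_` matches the column prefix, and whatever follows the stub (the year) becomes the new `year` column. The data are invented for illustration.

```python
import pandas as pd

# Hypothetical wide table: one row per region, one column per year
wide = pd.DataFrame({
    'Country': ['A', 'B'],
    'gdppc_1820': [1.0, 2.0],
    'gdppc_1870': [1.5, 2.5],
})

# Columns named <stub><number> are stacked; the number is stored in column 'year'
long_df = pd.wide_to_long(wide, stubnames=['gdppc_'], i='Country', j='year').reset_index()
print(long_df)
```

Each of the 2 regions now appears once per year, so the 2 × 2 wide table becomes 4 rows with the columns `Country`, `year` and `gdppc_`.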

Plotting

We can now plot the data. Let's try two different ways. The first uses the plot function from pandas. The second uses the package seaborn, which builds on matplotlib and offers a higher-level interface. The main difference is how the data needs to be organized: the pandas plot function expects wide data (one column per region), while seaborn works with the long data we just created. Of course, these are not the only ways to plot, and we can try others.



In [28]:

    
import matplotlib as mpl
import seaborn as sns
# Setup seaborn
sns.set()

Let's pivot the table so that each region is a column and each row is a year. This will allow us to plot using the plot function of the pandas DataFrame.



In [29]:

    
gdppc2 = gdppc.pivot_table(index='year',columns='Country',values='gdppc_',aggfunc='sum')
gdppc2









    Out[29]:







  
    
Country       Africa         Asia  East Europe  Latin America  Western Europe  Western Offshoots
year
1         472.352941   455.671021   411.789474     400.000000      576.167665         400.000000
1000      424.767802   469.961665   400.000000     400.000000      427.425665         400.000000
1500      413.709504   568.417900   496.000000     416.457143      771.093805         400.000000
1600      422.071584   573.550859   548.023599     437.558140      887.906964         400.000000
1700      420.628684   571.605276   606.010638     526.639004      993.456911         476.000000
...              ...          ...          ...            ...             ...                ...
2004     1558.099461  4661.517477  6942.136596    6063.068969    20199.220700       28807.845958
2005     1603.686517  4900.563281  7261.721015    6265.525702    20522.238008       29415.399334
2006     1663.531318  5187.253152  7730.097570    6530.533583    21087.304789       29922.741918
2007     1724.226776  5408.383588  8192.881904    6783.869986    21589.011346       30344.425293
2008     1780.265474  5611.198564  8568.967581    6973.134656    21671.774225       30151.805880
    
  

69 rows × 6 columns
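As a small check of the pivot step, the sketch below goes from long back to wide on invented data and then slices the year index by label, which is what `gdppc2.loc[1800:]` relies on in the plotting cell. The table and the cutoff year 1850 are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical long table: one row per region and year
long_df = pd.DataFrame({
    'Country': ['A', 'B', 'A', 'B'],
    'year': [1820, 1820, 1870, 1870],
    'gdppc_': [1.0, 2.0, 1.5, 2.5],
})

# Years become the index, regions become columns
wide = long_df.pivot_table(index='year', columns='Country', values='gdppc_', aggfunc='sum')

# Label-based slice on the sorted year index: keeps 1850 onwards,
# even though 1850 itself is not one of the labels
recent = wide.loc[1850:]
print(recent)
```

Since each region-year pair appears only once, `aggfunc='sum'` simply returns the single value in each cell.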

Ok. Let's plot using the pandas plot function.



In [30]:

    
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())

# Set the size of the figure and get a figure and axis object
fig, ax = plt.subplots(figsize=(30,20))
# Plot using the axis ax and colormap my_cmap
gdppc2.loc[1800:].plot(ax=ax, linewidth=8, cmap=my_cmap)
# Change options of axes, legend
ax.tick_params(axis = 'both', which = 'major', labelsize=32)
ax.tick_params(axis = 'both', which = 'minor', labelsize=16)
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(prop={'size': 40}).set_title("Region", prop = {'size':40})
# Label axes
ax.set_xlabel('Year', fontsize=36)
ax.set_ylabel('GDP per capita (1990 Int\'l US$)', fontsize=36)









    Out[30]:





Text(0, 0.5, "GDP per capita (1990 Int'l US$)")



In [31]:

    
fig









    Out[31]:

Now, let's use seaborn



In [32]:

    
gdppc['Region'] = gdppc.Country.astype('category')
gdppc['gdppc_'] = gdppc.gdppc_.astype(float)
# Plot
fig, ax = plt.subplots(figsize=(30,20))
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[gdppc.year>=1800].reset_index(drop=True), alpha=1, lw=8, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=False)
ax.tick_params(axis = 'both', which = 'major', labelsize=32)
ax.tick_params(axis = 'both', which = 'minor', labelsize=16)
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year', fontsize=36)
ax.set_ylabel('GDP per capita (1990 Int\'l US$)', fontsize=36)









    Out[32]:





Text(0, 0.5, "GDP per capita (1990 Int'l US$)")



In [33]:

    
fig









    Out[33]:

Nice! Basically the same plot. But we can do better! Let's use seaborn again, but this time use different markers for each region, and let's use only a subset of the data so that it looks better. Also, let's export the figure so we can use it in our slides.



In [34]:

    
# Create category for hue
gdppc['Region'] = gdppc.Country.astype('category')
gdppc['gdppc_'] = gdppc.gdppc_.astype(float)

sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1800) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1820-2010.pdf', dpi=300, bbox_inches='tight')



In [35]:

    
fig









    Out[35]:
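The long list of excluded years in the cell above follows a simple pattern: after 1950 only the decade years and the final year 2008 are kept. The sketch below is one possible way to express the same filter with `isin` or with a rule; the variable names are hypothetical, and the check at the end compares both masks over all years from 1 to 2008.

```python
import pandas as pd

# Every year from 1 to 2008, as a stand-in for the 'year' column
years = pd.Series(range(1, 2009), name='year')

# The explicit list from the cell: all non-decade years from 1951 to 2007
excluded = [y for y in range(1951, 2008) if y % 10 != 0]
mask_list = ~years.isin(excluded)

# Equivalent rule: keep everything up to 1950, decade years, and 2008
mask_rule = (years <= 1950) | (years % 10 == 0) | (years == 2008)

# Both masks select exactly the same years
print((mask_list == mask_rule).all())
```

Defining the mask once and reusing it would avoid repeating the list in every plotting cell.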

Let's create the same plot using the updated data from the Maddison Project. Here we have fewer years, but the picture is similar.



In [36]:

    
maddison_new_region['Region'] = maddison_new_region.region_name

mycolors2 = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71", "orange", "b"]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='cgdppc', hue='Region', data=maddison_new_region.loc[(maddison_new_region.year.apply(lambda x: x in [1870, 1890, 1913, 1929,1950, 2016])) | ((maddison_new_region.year>1950) & (maddison_new_region.year.apply(lambda x: np.mod(x,10)==0)))], alpha=1, palette=sns.color_palette(mycolors2), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (2011 Int\'l US$)')
plt.savefig(pathgraphs + 'y1870-2016.pdf', dpi=300, bbox_inches='tight')



In [37]:

    
fig









    Out[37]:

Let's show the evolution starting from other periods.



In [38]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1700) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'take-off-1700-2010.pdf', dpi=300, bbox_inches='tight')



In [39]:

    
fig









    Out[39]:



In [40]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1500) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1500-2010.pdf', dpi=300, bbox_inches='tight')



In [41]:

    
fig









    Out[41]:



In [42]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=1000) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1000-2010.pdf', dpi=300, bbox_inches='tight')



In [43]:

    
fig









    Out[43]:



In [44]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=gdppc.loc[(gdppc.year>=0) & (gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'y1-2010.pdf', dpi=300, bbox_inches='tight')



In [45]:

    
fig









    Out[45]:

Let's plot the evolution of GDP per capita for the whole world



In [46]:

    
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country=='World Average']
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc
world_gdppc['Region'] = world_gdppc.Country.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)

sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='gdppc_', hue='Region', data=world_gdppc.loc[(world_gdppc.year>=0) & (world_gdppc.year.apply(lambda x: x not in [
       1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1961,
       1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1971, 1972,
       1973, 1974, 1975, 1976, 1977, 1978, 1979, 1981, 1982, 1983,
       1984, 1985, 1986, 1987, 1988, 1989, 1991, 1992, 1993, 1994,
       1995, 1996, 1997, 1998, 1999, 2001, 2002, 2003, 2004, 2005,
       2006, 2007]))].reset_index(drop=True), alpha=1, style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (1990 Int\'l US$)')
plt.savefig(pathgraphs + 'W-y1-2010.pdf', dpi=300, bbox_inches='tight')



In [47]:

    
fig









    Out[47]:

Let's plot $\log(GDPpc)$ during the modern era, when we have sustained economic growth.



In [48]:

    
gdppc['lgdppc'] = np.log(gdppc.gdppc_)

# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())

sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='lgdppc', hue='Region', data=gdppc.loc[(gdppc.year>=1950)].reset_index(drop=True), alpha=1, palette=sns.color_palette(mycolors), style='Region', dashes=False, markers=True,)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
ax.legend(loc='upper left')
ax.set_xlabel('Year')
ax.set_ylabel('Log[GDP per capita (1990 Int\'l US$)]')
plt.savefig(pathgraphs + 'sg1950-2000.pdf', dpi=300, bbox_inches='tight')



In [49]:

    
fig









    Out[49]:
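One reason to plot logs is that a series growing at a constant rate becomes a straight line, with slope $\log(1+g)$, which is close to $g$ for small rates. A minimal sketch with an invented series (starting value 100 and a 2% growth rate are assumptions):

```python
import numpy as np
import pandas as pd

g = 0.02                          # assumed constant annual growth rate
years = np.arange(1950, 1961)
# Hypothetical income series growing at rate g from a starting value of 100
y = pd.Series(100.0 * (1 + g) ** (years - 1950), index=years)

# Year-to-year change in logs is the same in every period
dlog = np.log(y).diff().dropna()
print(dlog.round(6).unique())
print(np.log(1 + g))
```

In the figures above, steeper segments of a log series therefore correspond to faster growth.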



In [50]:

    
mycolors2 = ["#34495e", "#2ecc71"]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='cgdppc', hue='Region', data=maddison_new_region.loc[(maddison_new_region.year>=1870) & (maddison_new_region.region.apply(lambda x: x in ['we', 'wo']))], alpha=1, palette=sns.color_palette(mycolors2), style='Region', dashes=False, markers=['D', '^'],)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1f}'))
ax.set_yscale('log')
ax.set_yticks([500, 5000, 50000])
ax.get_yaxis().set_major_formatter(mpl.ticker.ScalarFormatter())
ax.legend(loc='upper left')
ax.set_xlabel('Year')
ax.set_ylabel('GDP per capita (2011 Int\'l US$, log-scale)')
plt.savefig(pathgraphs + 'sg1870-2000.pdf', dpi=300, bbox_inches='tight')

Growth Rates

Let's select a subsample of years between 1 CE and 2008 and compute the average annual growth rate of world income per capita between consecutive years in the sample. We will select the years we want using the loc indexer and then use the shift method to get the data from the previous observation.



In [51]:

    
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 2008]).astype(int)
world_gdppc









    Out[51]:







  
    
      
           Country  year       gdppc_         Region  mysample
0    World Average     1   466.752281  World Average         1
1    World Average  1000   453.402162  World Average         1
2    World Average  1500   566.389464  World Average         1
3    World Average  1600   595.783856  World Average         0
4    World Average  1700   614.853602  World Average         0
..             ...   ...          ...            ...       ...
189  World Average  2004  6738.281333  World Average         0
190  World Average  2005  6960.031035  World Average         0
191  World Average  2006  7238.383483  World Average         0
192  World Average  2007  7467.648232  World Average         0
193  World Average  2008  7613.922924  World Average         1
    
  

69 rows × 5 columns



In [52]:

    
maddison_growth = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth['year_prev'] = maddison_growth['year'] - maddison_growth['year'].shift(1)
maddison_growth['growth'] = ((maddison_growth['gdppc_'] / maddison_growth['gdppc_'].shift(1)) ** (1/ maddison_growth.year_prev) -1)
maddison_growth['Period'] = maddison_growth['year'].astype(str).shift(1) + '-' + maddison_growth['year'].astype(str)
maddison_growth









    Out[52]:







  
    
      
         Country  year       gdppc_         Region  mysample  year_prev    growth     Period
0  World Average     1   466.752281  World Average         1        NaN       NaN        NaN
1  World Average  1000   453.402162  World Average         1      999.0 -0.000029     1-1000
2  World Average  1500   566.389464  World Average         1      500.0  0.000445  1000-1500
3  World Average  1820   665.735330  World Average         1      320.0  0.000505  1500-1820
4  World Average  2008  7613.922924  World Average         1      188.0  0.013046  1820-2008
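The growth column is the compound annual growth rate between two observations $t_0 < t_1$, that is $g = (y_{t_1}/y_{t_0})^{1/(t_1-t_0)} - 1$. Note that the column called year_prev holds the number of years elapsed, not the previous year. As a sketch, the last row can be reproduced from the 1820 and 2008 world values alone (the name `span` is introduced here for illustration):

```python
import pandas as pd

# World Average values for 1820 and 2008, taken from the table above
obs = pd.DataFrame({'year': [1820, 2008],
                    'gdppc_': [665.735330, 7613.922924]})

# Years elapsed since the previous observation (NaN for the first row)
span = obs['year'] - obs['year'].shift(1)

# Compound annual growth rate between consecutive observations
obs['growth'] = (obs['gdppc_'] / obs['gdppc_'].shift(1)) ** (1 / span) - 1
print(obs)
```

The first row has no previous observation, so its growth rate is missing. The second comes out at about 0.013, that is roughly 1.3% per year, in line with the table.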



In [53]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues", maddison_growth.shape[0]+4)[4:])
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
#handles, labels = ax.get_legend_handles_labels()
#ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate of Income per capita')
plt.savefig(pathgraphs + 'W-g1-2010.pdf', dpi=300, bbox_inches='tight')



In [54]:

    
fig









    Out[54]:

Growth of population and income (by regions)



In [55]:

    
# Growth rates gdppc
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country=='World Average']
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = 'World'
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)
print(maddison_growth_gdppc)









    



         Country  year       gdppc_ Region  mysample  year_prev    growth     Period
0  World Average     1   466.752281  World         1        NaN       NaN        NaN
1  World Average  1000   453.402162  World         1      999.0 -0.000029     1-1000
2  World Average  1500   566.389464  World         1      500.0  0.000445  1000-1500
3  World Average  1820   665.735330  World         1      320.0  0.000505  1500-1820
4  World Average  1913  1524.430799  World         1       93.0  0.008948  1820-1913



In [56]:

    
# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country=='World Total']
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = 'World'
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)
print(maddison_growth_pop)









    



       Country  year          pop_ Region  mysample  year_prev    growth     Period
0  World Total     1  2.258200e+05  World         1        NaN       NaN        NaN
1  World Total  1000  2.673300e+05  World         1      999.0  0.000169     1-1000
2  World Total  1500  4.384280e+05  World         1      500.0  0.000990  1000-1500
3  World Total  1820  1.041708e+06  World         1      320.0  0.002708  1500-1820
4  World Total  1913  1.792925e+06  World         1       93.0  0.005856  1820-1913



In [57]:

    
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'GDPpc', 'growth_pop':'Population'})
maddison_growth



In [58]:

    
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 
maddison_growth









    Out[58]:







  
    
      
  Region     Period           variable    growth
0  World     1-1000  Income per capita -0.000029
1  World  1000-1500  Income per capita  0.000445
2  World  1500-1820  Income per capita  0.000505
3  World  1820-1913  Income per capita  0.008948
4  World     1-1000         Population  0.000169
5  World  1000-1500         Population  0.000990
6  World  1500-1820         Population  0.002708
7  World  1820-1913         Population  0.005856
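The merge-then-melt pattern used above can be sketched with two small tables. The growth values are copied from the last two periods of the output above, while the frame names `inc`, `pop` and `tidy` are hypothetical.

```python
import pandas as pd

# Growth of income per capita and of population for the same periods
inc = pd.DataFrame({'Region': ['World', 'World'],
                    'Period': ['1500-1820', '1820-1913'],
                    'growth': [0.000505, 0.008948]})
pop = pd.DataFrame({'Region': ['World', 'World'],
                    'Period': ['1500-1820', '1820-1913'],
                    'growth': [0.002708, 0.005856]})

# The shared column 'growth' gets a suffix from each side of the merge
merged = inc.merge(pop, on=['Region', 'Period'], suffixes=('_gdppc', '_pop'))
merged = merged.rename(columns={'growth_gdppc': 'Income per capita',
                                'growth_pop': 'Population'})

# Stack the two growth columns so that 'variable' can be used as hue in a bar plot
tidy = pd.melt(merged, id_vars=['Region', 'Period'],
               value_vars=['Income per capita', 'Population'],
               var_name='variable', value_name='growth')
print(tidy)
```

The long result has one row per period and variable, which is the layout `sns.barplot` needs when `hue='variable'`.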



In [59]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + 'W-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [60]:

    
fig









    Out[60]:



In [61]:

    
# Growth rates gdppc
myregion = 'Western Offshoots'
fname = 'WO'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

# Merge
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [62]:

    
fig









    Out[62]:



In [63]:

    
# Growth rates gdppc
myregion = 'Western Europe'
fname = 'WE'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total 30  '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total 30  '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

# Merge
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [64]:

    
fig









    Out[64]:



In [65]:

    
# Growth rates gdppc
myregion = 'Latin America'
fname = 'LA'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

# Merge
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [66]:

    
fig









    Out[66]:



In [67]:

    
# Growth rates gdppc
myregion = 'Asia'
fname = 'AS'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

# Merge
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [68]:

    
fig









    Out[68]:



In [69]:

    
# Growth rates gdppc
myregion = 'Africa'
fname = 'AF'
world_gdppc = maddison_old_gdppc.loc[maddison_old_gdppc.Country.astype(str).str.strip()=='Total '+ myregion]
world_gdppc = pd.wide_to_long(world_gdppc, ['gdppc_'], i='Country', j='year').reset_index()
world_gdppc['Region'] = myregion
world_gdppc['Region'] = world_gdppc.Region.astype('category')
world_gdppc['gdppc_'] = world_gdppc.gdppc_.astype(float)
world_gdppc = world_gdppc.dropna(subset=['gdppc_'])
world_gdppc['mysample'] = world_gdppc.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)
maddison_growth_gdppc = world_gdppc.loc[world_gdppc.mysample==1].reset_index(drop=True)
maddison_growth_gdppc['year_prev'] = maddison_growth_gdppc['year'] - maddison_growth_gdppc['year'].shift(1)
maddison_growth_gdppc['growth'] = ((maddison_growth_gdppc['gdppc_'] / maddison_growth_gdppc['gdppc_'].shift(1)) ** (1/ maddison_growth_gdppc.year_prev) -1)
maddison_growth_gdppc['Period'] = maddison_growth_gdppc['year'].astype(str).shift(1) + '-' + maddison_growth_gdppc['year'].astype(str)

# Growth rates population
world_pop = maddison_old_pop.loc[maddison_old_pop.Country.astype(str).str.strip()=='Total '+ myregion]
world_pop = pd.wide_to_long(world_pop, ['pop_'], i='Country', j='year').reset_index()
world_pop['Region'] = myregion
world_pop['Region'] = world_pop.Region.astype('category')
world_pop['pop_'] = world_pop.pop_.astype(float)
world_pop = world_pop.dropna(subset=['pop_'])
world_pop['mysample'] = world_pop.year.apply(lambda x: x in [1, 1000, 1500, 1820, 1913]).astype(int)

# Merge
maddison_growth_pop = world_pop.loc[world_pop.mysample==1].reset_index(drop=True)
maddison_growth_pop['year_prev'] = maddison_growth_pop['year'] - maddison_growth_pop['year'].shift(1)
maddison_growth_pop['growth'] = ((maddison_growth_pop['pop_'] / maddison_growth_pop['pop_'].shift(1)) ** (1/ maddison_growth_pop.year_prev) -1)
maddison_growth_pop['Period'] = maddison_growth_pop['year'].astype(str).shift(1) + '-' + maddison_growth_pop['year'].astype(str)
maddison_growth = maddison_growth_gdppc[['Region', 'Period', 'growth']].merge(maddison_growth_pop[['Region', 'Period', 'growth']], on=['Region', 'Period'],
                                                            suffixes=['_gdppc', '_pop'])
maddison_growth = maddison_growth.dropna()
maddison_growth = maddison_growth.rename(columns={'growth_gdppc':'Income per capita', 'growth_pop':'Population'})
maddison_growth = pd.melt(maddison_growth, id_vars =['Region', 'Period'], value_vars =['Income per capita', 'Population'],
        var_name='variable',value_name='growth') 

# Plot
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.barplot(x='Period', y='growth', hue='variable', data=maddison_growth, alpha=1, palette=sns.color_palette("Blues_r"))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.1%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[0:], labels=labels[0:])
ax.set_xlabel('Period')
ax.set_ylabel('Growth Rate')
plt.savefig(pathgraphs + fname + '-pm-gr-y-p.pdf', dpi=300, bbox_inches='tight')



In [70]:

    
fig









    Out[70]:
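
The region cells above all repeat the same steps: keep a few benchmark years and compute the average annual compound growth rate between consecutive benchmarks, i.e. (level_t / level_s) ** (1 / (t - s)) - 1. As a minimal sketch, the computation could be wrapped in a small helper. The function name `annualized_growth` and the `toy` numbers are made up for illustration; they are not part of the notebook or of the Maddison data.

```python
import pandas as pd

def annualized_growth(levels: pd.Series) -> pd.DataFrame:
    """Average annual growth between consecutive benchmark years.

    `levels` is indexed by year (sorted ascending) and holds a level
    variable such as GDP per capita or population.
    """
    df = levels.rename('level').rename_axis('year').reset_index()
    # Number of years between each benchmark and the previous one
    span = df['year'] - df['year'].shift(1)
    # Compound annual growth rate over each span
    df['growth'] = (df['level'] / df['level'].shift(1)) ** (1 / span) - 1
    # Label such as '1000-1500' for each pair of benchmark years
    df['Period'] = df['year'].astype(str).shift(1) + '-' + df['year'].astype(str)
    # The first benchmark has no previous year, so it is dropped
    return df.dropna(subset=['growth'])[['Period', 'growth']].reset_index(drop=True)

# Hypothetical levels, purely for illustration
toy = pd.Series({1000: 100.0, 1500: 200.0, 1820: 400.0})
growth = annualized_growth(toy)
print(growth)
```

Calling a helper like this once for GDP per capita and once for population, for each region, would give the two `growth` columns that the cells above merge and plot.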

Comparing richest to poorest region across time

Let's create a table that shows the GDP per capita levels for the 6 regions in the original data and compute the ratio of richest to poorest. Let's also plot it.



In [71]:

    
# Ratio of the richest to the poorest region in each year
gdppc2['Richest-Poorest Ratio'] = gdppc2.max(axis=1) / gdppc2.min(axis=1)
# Keep the benchmark years and turn the year index into a column
gdp_ratio = gdppc2.loc[[1, 1000, 1500, 1700, 1820, 1870, 1913, 1940, 1960, 1980, 2000, 2008]].reset_index()
gdp_ratio['Region'] = 'Richest-Poorest'
gdp_ratio['Region'] = gdp_ratio.Region.astype('category')
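
To see what the ratio does, here is a minimal sketch on made-up numbers (the frame `toy` and its region names are hypothetical). It takes the row-wise maximum and minimum across the region columns and divides them. Unlike the cell above, it lists the region columns explicitly, which is an assumption about how one might guard against the ratio column entering the max/min if the cell is run a second time.

```python
import pandas as pd

# Made-up GDP per capita levels indexed by year (illustration only)
toy = pd.DataFrame(
    {'Region A': [400.0, 500.0], 'Region B': [800.0, 2000.0]},
    index=pd.Index([1500, 1900], name='year'),
)

# Fix the list of region columns before adding the ratio column
regions = list(toy.columns)

# Row-wise richest / poorest across regions
toy['Richest-Poorest Ratio'] = toy[regions].max(axis=1) / toy[regions].min(axis=1)
print(toy)
```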



In [72]:

    
gdp_ratio









    Out[72]:







Country  year       Africa         Asia  East Europe  Latin America  Western Europe  Western Offshoots  Richest-Poorest Ratio           Region
0           1   472.352941   455.671021   411.789474     400.000000      576.167665         400.000000               1.440419  Richest-Poorest
1        1000   424.767802   469.961665   400.000000     400.000000      427.425665         400.000000               1.174904  Richest-Poorest
2        1500   413.709504   568.417900   496.000000     416.457143      771.093805         400.000000               1.927735  Richest-Poorest
3        1700   420.628684   571.605276   606.010638     526.639004      993.456911         476.000000               2.361838  Richest-Poorest
4        1820   419.755914   580.626115   683.160984     691.060678     1194.184683        1201.993477               2.863553  Richest-Poorest
5        1870   500.011054   553.459947   936.628265     676.005331     1953.068150        2419.152411               4.838198  Richest-Poorest
6        1913   637.433138   695.131881  1694.879668    1494.431922     3456.576178        5232.816582               8.209201  Richest-Poorest
7        1940   813.374613   893.992784  1968.706774    1932.850716     4554.045082        6837.844866               8.406760  Richest-Poorest
8        1960  1055.114678  1025.743131  3069.750386    3135.517072     6879.294331       10961.082848              10.685992  Richest-Poorest
9        1980  1514.558119  2028.654705  5785.933433    5437.924365    13154.033928       18060.162963              11.924378  Richest-Poorest
10       2000  1447.071701  3797.608955  5970.165085    5889.237351    19176.001655       27393.808035              18.930512  Richest-Poorest
11       2008  1780.265474  5611.198564  8568.967581    6973.134656    21671.774225       30151.805880              16.936691  Richest-Poorest



In [73]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Richest-Poorest Ratio', data=gdp_ratio, alpha=1, hue='Region', style='Region', dashes=False, markers=True, )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
#ax.legend(title='', prop={'size': 40})
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Richest-Poorest Ratio')
plt.savefig(pathgraphs + 'Richest-Poorest-Ratio.pdf', dpi=300, bbox_inches='tight')



In [74]:

    
fig









    Out[74]:

Visualize as Table



In [75]:

    
# Formatters are keyed by column label; the year keys below match only when the years are the columns
gdp_ratio.style.format({
    1: '{:,.1f}'.format, 1000: '{:,.1f}'.format, 1500: '{:,.1f}'.format, 1700: '{:,.1f}'.format, 
    1820: '{:,.1f}'.format, 1870: '{:,.1f}'.format, 1913: '{:,.1f}'.format, 1940: '{:,.1f}'.format, 
    1960: '{:,.1f}'.format, 1980: '{:,.1f}'.format, 2000: '{:,.1f}'.format, 2008: '{:,.1f}'.format, 
})









    Out[75]:




Country  year       Africa         Asia  East Europe  Latin America  Western Europe  Western Offshoots  Richest-Poorest Ratio           Region
0           1   472.352941   455.671021   411.789474     400.000000      576.167665         400.000000               1.440419  Richest-Poorest
1        1000   424.767802   469.961665   400.000000     400.000000      427.425665         400.000000               1.174904  Richest-Poorest
2        1500   413.709504   568.417900   496.000000     416.457143      771.093805         400.000000               1.927735  Richest-Poorest
3        1700   420.628684   571.605276   606.010638     526.639004      993.456911         476.000000               2.361838  Richest-Poorest
4        1820   419.755914   580.626115   683.160984     691.060678     1194.184683        1201.993477               2.863553  Richest-Poorest
5        1870   500.011054   553.459947   936.628265     676.005331     1953.068150        2419.152411               4.838198  Richest-Poorest
6        1913   637.433138   695.131881  1694.879668    1494.431922     3456.576178        5232.816582               8.209201  Richest-Poorest
7        1940   813.374613   893.992784  1968.706774    1932.850716     4554.045082        6837.844866               8.406760  Richest-Poorest
8        1960  1055.114678  1025.743131  3069.750386    3135.517072     6879.294331       10961.082848              10.685992  Richest-Poorest
9        1980  1514.558119  2028.654705  5785.933433    5437.924365    13154.033928       18060.162963              11.924378  Richest-Poorest
10       2000  1447.071701  3797.608955  5970.165085    5889.237351    19176.001655       27393.808035              18.930512  Richest-Poorest
11       2008  1780.265474  5611.198564  8568.967581    6973.134656    21671.774225       30151.805880              16.936691  Richest-Poorest

Export table to LaTeX

Let's print the table as LaTeX code that can be copied and pasted into our slides or paper.



In [76]:

    
print(gdp_ratio.to_latex(formatters={
    1: '{:,.1f}'.format, 1000: '{:,.1f}'.format, 1500: '{:,.1f}'.format, 1700: '{:,.1f}'.format, 
    1820: '{:,.1f}'.format, 1870: '{:,.1f}'.format, 1913: '{:,.1f}'.format, 1940: '{:,.1f}'.format, 
    1960: '{:,.1f}'.format, 1980: '{:,.1f}'.format, 2000: '{:,.1f}'.format, 2008: '{:,.1f}'.format, 
}))









    



\begin{tabular}{lrrrrrrrrl}
\toprule
Country &  year &       Africa &         Asia &  East Europe &  Latin America &  Western Europe &  Western Offshoots &  Richest-Poorest Ratio &           Region \\
\midrule
0  &     1 &   472.352941 &   455.671021 &   411.789474 &     400.000000 &      576.167665 &         400.000000 &               1.440419 &  Richest-Poorest \\
1  &  1000 &   424.767802 &   469.961665 &   400.000000 &     400.000000 &      427.425665 &         400.000000 &               1.174904 &  Richest-Poorest \\
2  &  1500 &   413.709504 &   568.417900 &   496.000000 &     416.457143 &      771.093805 &         400.000000 &               1.927735 &  Richest-Poorest \\
3  &  1700 &   420.628684 &   571.605276 &   606.010638 &     526.639004 &      993.456911 &         476.000000 &               2.361838 &  Richest-Poorest \\
4  &  1820 &   419.755914 &   580.626115 &   683.160984 &     691.060678 &     1194.184683 &        1201.993477 &               2.863553 &  Richest-Poorest \\
5  &  1870 &   500.011054 &   553.459947 &   936.628265 &     676.005331 &     1953.068150 &        2419.152411 &               4.838198 &  Richest-Poorest \\
6  &  1913 &   637.433138 &   695.131881 &  1694.879668 &    1494.431922 &     3456.576178 &        5232.816582 &               8.209201 &  Richest-Poorest \\
7  &  1940 &   813.374613 &   893.992784 &  1968.706774 &    1932.850716 &     4554.045082 &        6837.844866 &               8.406760 &  Richest-Poorest \\
8  &  1960 &  1055.114678 &  1025.743131 &  3069.750386 &    3135.517072 &     6879.294331 &       10961.082848 &              10.685992 &  Richest-Poorest \\
9  &  1980 &  1514.558119 &  2028.654705 &  5785.933433 &    5437.924365 &    13154.033928 &       18060.162963 &              11.924378 &  Richest-Poorest \\
10 &  2000 &  1447.071701 &  3797.608955 &  5970.165085 &    5889.237351 &    19176.001655 &       27393.808035 &              18.930512 &  Richest-Poorest \\
11 &  2008 &  1780.265474 &  5611.198564 &  8568.967581 &    6973.134656 &    21671.774225 &       30151.805880 &              16.936691 &  Richest-Poorest \\
\bottomrule
\end{tabular}
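
The table in the `%%latex` cell that follows has the years as columns and the regions as rows, whereas `gdp_ratio` has one row per year. As a minimal sketch of how one might get from one layout to the other, the frame can be indexed by year and transposed before exporting. The frame `toy` and its values are made up for illustration; only the column names mirror `gdp_ratio`.

```python
import pandas as pd

# Two made-up rows in the same layout as gdp_ratio (illustration only)
toy = pd.DataFrame({
    'year': [1500, 1900],
    'Africa': [400.0, 600.0],
    'Asia': [550.0, 700.0],
    'Region': ['Richest-Poorest', 'Richest-Poorest'],
})

# Drop the constant label column, index by year, and transpose
wide = toy.drop(columns='Region').set_index('year').T

# One decimal and a thousands separator for every number
print(wide.to_string(float_format='{:,.1f}'.format))
```

`DataFrame.to_latex` accepts the same `float_format` argument, so the transposed frame could be exported to LaTeX with a single format instead of one formatter per year.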



In [77]:

    
%%latex
\begin{tabular}{lrrrrrrrrrrrr}
\toprule
year &  1    &  1000 &  1500 &  1700 &    1820 &    1870 &    1913 &    1940 &     1960 &     1980 &     2000 &     2008 \\
Country               &       &       &       &       &         &         &         &         &          &          &          &          \\
\midrule
Africa                & 472.4 & 424.8 & 413.7 & 420.6 &   419.8 &   500.0 &   637.4 &   813.4 &  1,055.1 &  1,514.6 &  1,447.1 &  1,780.3 \\
Asia                  & 455.7 & 470.0 & 568.4 & 571.6 &   580.6 &   553.5 &   695.1 &   894.0 &  1,025.7 &  2,028.7 &  3,797.6 &  5,611.2 \\
East Europe           & 411.8 & 400.0 & 496.0 & 606.0 &   683.2 &   936.6 & 1,694.9 & 1,968.7 &  3,069.8 &  5,785.9 &  5,970.2 &  8,569.0 \\
Latin America         & 400.0 & 400.0 & 416.5 & 526.6 &   691.1 &   676.0 & 1,494.4 & 1,932.9 &  3,135.5 &  5,437.9 &  5,889.2 &  6,973.1 \\
Western Europe        & 576.2 & 427.4 & 771.1 & 993.5 & 1,194.2 & 1,953.1 & 3,456.6 & 4,554.0 &  6,879.3 & 13,154.0 & 19,176.0 & 21,671.8 \\
Western Offshoots     & 400.0 & 400.0 & 400.0 & 476.0 & 1,202.0 & 2,419.2 & 5,232.8 & 6,837.8 & 10,961.1 & 18,060.2 & 27,393.8 & 30,151.8 \\
Richest-Poorest Ratio &   1.4 &   1.2 &   1.9 &   2.4 &     2.9 &     4.8 &     8.2 &     8.4 &     10.7 &     11.9 &     18.9 &     16.9 \\
\bottomrule
\end{tabular}









    






Export Table to HTML



In [78]:

    
from IPython.display import display, HTML
display(HTML(gdp_ratio.to_html(formatters={
    1: '{:,.1f}'.format, 1000: '{:,.1f}'.format, 1500: '{:,.1f}'.format, 1700: '{:,.1f}'.format, 
    1820: '{:,.1f}'.format, 1870: '{:,.1f}'.format, 1913: '{:,.1f}'.format, 1940: '{:,.1f}'.format, 
    1960: '{:,.1f}'.format, 1980: '{:,.1f}'.format, 2000: '{:,.1f}'.format, 2008: '{:,.1f}'.format, 
})))









    





  
    
Country  year       Africa         Asia  East Europe  Latin America  Western Europe  Western Offshoots  Richest-Poorest Ratio           Region
      0     1   472.352941   455.671021   411.789474     400.000000      576.167665         400.000000               1.440419  Richest-Poorest
      1  1000   424.767802   469.961665   400.000000     400.000000      427.425665         400.000000               1.174904  Richest-Poorest
      2  1500   413.709504   568.417900   496.000000     416.457143      771.093805         400.000000               1.927735  Richest-Poorest
      3  1700   420.628684   571.605276   606.010638     526.639004      993.456911         476.000000               2.361838  Richest-Poorest
      4  1820   419.755914   580.626115   683.160984     691.060678     1194.184683        1201.993477               2.863553  Richest-Poorest
      5  1870   500.011054   553.459947   936.628265     676.005331     1953.068150        2419.152411               4.838198  Richest-Poorest
      6  1913   637.433138   695.131881  1694.879668    1494.431922     3456.576178        5232.816582               8.209201  Richest-Poorest
      7  1940   813.374613   893.992784  1968.706774    1932.850716     4554.045082        6837.844866               8.406760  Richest-Poorest
      8  1960  1055.114678  1025.743131  3069.750386    3135.517072     6879.294331       10961.082848              10.685992  Richest-Poorest
      9  1980  1514.558119  2028.654705  5785.933433    5437.924365    13154.033928       18060.162963              11.924378  Richest-Poorest
     10  2000  1447.071701  3797.608955  5970.165085    5889.237351    19176.001655       27393.808035              18.930512  Richest-Poorest
     11  2008  1780.265474  5611.198564  8568.967581    6973.134656    21671.774225       30151.805880              16.936691  Richest-Poorest
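
The `formatters` argument of `to_html` is keyed by column label, so a formatter only takes effect when its key matches a column exactly. The rendered table above still shows unrounded values, which would be consistent with the integer year keys not matching the region-named columns. A minimal sketch, using a small hypothetical frame (`demo`) rather than `gdp_ratio`, of applying one format to every float column with `float_format`, next to a per-column version keyed by the actual labels:

```python
import pandas as pd

# Hypothetical two-row frame standing in for gdp_ratio
demo = pd.DataFrame({'year': [1, 1000],
                     'Africa': [472.352941, 424.767802],
                     'Western Europe': [576.167665, 427.425665]})

# float_format is applied to every float column, so no per-column keys are needed
html = demo.to_html(float_format='{:,.1f}'.format, index=False)

# Per-column version: the keys must be the actual column labels
formatters = {col: '{:,.1f}'.format for col in demo.columns if col != 'year'}
html_by_column = demo.to_html(formatters=formatters, index=False)
```

Either string can then be passed to `display(HTML(...))` as in the cell above.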

Take-off, industrialization and reversals

Industrialization per capita

Let's create a full dataframe by entering the data by hand. The data is taken from Bairoch, P., 1982. "International industrialization levels from 1750 to 1980". Journal of European Economic History, 11(2), p.269. For 1750-1913, the data comes from Table 9.



In [79]:

    
industrialization = [['Developed Countries', 8, 8, 11, 16, 24, 35, 55],
                     ['Europe', 8, 8, 11, 17, 23, 33, 45],
                     ['Austria-Hungary', 7, 7, 8, 11, 15, 23, 32],
                     ['Belgium', 9, 10, 14, 28, 43, 56, 88],
                     ['France', 9, 9, 12, 20, 28, 39, 59],
                     ['Germany', 8, 8, 9, 15, 25, 52, 85],
                     ['Italy', 8, 8, 8, 10, 12, 17, 26],
                     ['Russia', 6, 6, 7, 8, 10, 15, 20],
                     ['Spain', 7, 7, 8, 11, 14, 19, 22],
                     ['Sweden', 7, 8, 9, 15, 24, 41, 67],
                     ['Switzerland', 7, 10, 16, 26, 39, 67, 87],
                     ['United Kingdom', 10, 16, 25, 64, 87, 100, 115],
                     ['Canada', np.nan, 5, 6, 7, 10, 24, 46],
                     ['United States', 4, 9, 14, 21, 38, 69, 126],
                     ['Japan', 7, 7, 7, 7, 9, 12, 20],
                     ['Third World', 7, 6, 6, 4, 3, 2, 2],
                     ['China', 8, 6, 6, 4, 4, 3, 3],
                     ['India', 7, 6, 6, 3, 2, 1, 2],
                     ['Brazil', np.nan, np.nan, np.nan, 4, 4, 5, 7],
                     ['Mexico', np.nan, np.nan, np.nan, 5, 4, 5, 7],
                     ['World', 7, 6, 7, 7, 9, 14, 21]]

years = [1750, 1800, 1830, 1860, 1880, 1900, 1913]
industrialization = pd.DataFrame(industrialization, columns=['Country'] + ['y'+str(y) for y in years])

For 1913-1980, the data comes from Table 12.



In [80]:

    
industrialization2 = [['Developed Countries', 55, 71, 81, 135, 194, 315, 344],
                      ['Market Economies', np.nan, 96, 105, 167, 222, 362, 387],
                      ['Europe', 45, 76, 94, 107, 166, 260, 280],
                      ['Belgium', 88, 116, 89, 117, 183, 291, 316],
                      ['France', 59, 82, 73, 95, 167, 259, 277],
                      ['Germany', 85, 101, 128, 144, 244, 366, 395],
                      ['Italy', 26, 39, 44, 61, 121, 194, 231],
                      ['Spain', 22, 28, 23, 31, 56, 144, 159],
                      ['Sweden', 67, 84, 135, 163, 262, 405, 409],
                      ['Switzerland', 87, 90, 88, 167, 259, 366, 354],
                      ['United Kingdom', 115, 122, 157, 210, 253, 341, 325],
                      ['Canada', 46, 82, 84, 185, 237, 370, 379],
                      ['United States', 126, 182, 167, 354, 393, 604, 629],
                      ['Japan', 20, 30, 51, 40, 113, 310, 353],
                      ['U.S.S.R.', 20, 20, 38, 73, 139, 222, 252],
                      ['Third World', 2, 3, 4, 5, 8, 14, 17],
                      ['India', 2, 3, 4, 6, 8, 14, 16],
                      ['Brazil', 7, 10, 10, 13, 23, 42, 55],
                      ['Mexico', 7, 9, 8, 12, 22, 36, 41],
                      ['China', 3, 4, 4, 5, 10, 18, 24],
                      ['World', 21, 28, 31, 48, 66, 100, 103]]
years = [1913, 1928, 1938, 1953, 1963, 1973, 1980]
industrialization2 = pd.DataFrame(industrialization2, columns=['Country'] + ['y'+str(y) for y in years])

Let's join both dataframes so we can plot the whole series.



In [81]:

    
industrialization = industrialization.merge(industrialization2)
industrialization









    Out[81]:







  
    
      
    Country              y1750  y1800  y1830  y1860  y1880  y1900  y1913  y1928  y1938  y1953  y1963  y1973  y1980
 0  Developed Countries    8.0    8.0   11.0     16     24     35     55     71     81    135    194    315    344
 1  Europe                 8.0    8.0   11.0     17     23     33     45     76     94    107    166    260    280
 2  Belgium                9.0   10.0   14.0     28     43     56     88    116     89    117    183    291    316
 3  France                 9.0    9.0   12.0     20     28     39     59     82     73     95    167    259    277
 4  Germany                8.0    8.0    9.0     15     25     52     85    101    128    144    244    366    395
 5  Italy                  8.0    8.0    8.0     10     12     17     26     39     44     61    121    194    231
 6  Spain                  7.0    7.0    8.0     11     14     19     22     28     23     31     56    144    159
 7  Sweden                 7.0    8.0    9.0     15     24     41     67     84    135    163    262    405    409
 8  Switzerland            7.0   10.0   16.0     26     39     67     87     90     88    167    259    366    354
 9  United Kingdom        10.0   16.0   25.0     64     87    100    115    122    157    210    253    341    325
10  Canada                 NaN    5.0    6.0      7     10     24     46     82     84    185    237    370    379
11  United States          4.0    9.0   14.0     21     38     69    126    182    167    354    393    604    629
12  Japan                  7.0    7.0    7.0      7      9     12     20     30     51     40    113    310    353
13  Third World            7.0    6.0    6.0      4      3      2      2      3      4      5      8     14     17
14  China                  8.0    6.0    6.0      4      4      3      3      4      4      5     10     18     24
15  India                  7.0    6.0    6.0      3      2      1      2      3      4      6      8     14     16
16  Brazil                 NaN    NaN    NaN      4      4      5      7     10     10     13     23     42     55
17  Mexico                 NaN    NaN    NaN      5      4      5      7      9      8     12     22     36     41
18  World                  7.0    6.0    7.0      7      9     14     21     28     31     48     66    100    103
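
When `merge` is called without `on`, pandas joins on every column the two frames share (here `Country` and `y1913`) and keeps only rows present in both (an inner join). That is why Austria-Hungary, Russia, Market Economies and U.S.S.R. are absent from the table above. A small sketch with hypothetical toy frames (`left`, `right`) shows the default behavior next to an explicit key:

```python
import pandas as pd

left = pd.DataFrame({'Country': ['A', 'B', 'C'],
                     'y1900': [1, 2, 3],
                     'y1913': [10, 20, 30]})
right = pd.DataFrame({'Country': ['A', 'B', 'D'],
                      'y1913': [10, 21, 40],
                      'y1928': [100, 200, 400]})

# Default: join on all shared columns ('Country' and 'y1913'), inner join.
# 'B' is dropped because its y1913 values differ; 'C' and 'D' have no partner.
default_merge = left.merge(right)

# Explicit key: join on 'Country' only; the clashing y1913 columns get suffixes.
by_country = left.merge(right, on='Country', suffixes=('_t9', '_t12'))
```

Joining on `Country` alone keeps both 1913 values side by side, which makes disagreements between the two source tables visible instead of dropping the row.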

Let's convert to long format and plot the evolution of industrialization across regions and groups of countries.



In [82]:

    
industrialization = pd.wide_to_long(industrialization, ['y'], i='Country', j='year').reset_index()
industrialization.rename(columns={'y':'Industrialization'}, inplace=True)
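
`pd.wide_to_long` looks for columns whose names start with the stub (`'y'`) followed by a numeric suffix, stacks them, and stores the suffix in the new `year` column. A minimal sketch on a hypothetical two-country frame (`wide`):

```python
import pandas as pd

wide = pd.DataFrame({'Country': ['UK', 'India'],
                     'y1750': [10, 7],
                     'y1800': [16, 6]})

# Stub 'y' + numeric suffix -> one row per (Country, year)
long = pd.wide_to_long(wide, ['y'], i='Country', j='year').reset_index()
long = long.rename(columns={'y': 'Industrialization'})
```

The result has one row per country and year, which is the layout `sns.lineplot` expects for its `x`, `y` and `hue` arguments.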



In [83]:

    
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.Country.apply(lambda x: x in ['Developed Countries', 'Third World', 'World'])].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-Dev-NonDev.pdf', dpi=300, bbox_inches='tight')
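
The row filter `Country.apply(lambda x: x in [...])` used in the plot call can also be written with `Series.isin`, which builds the same boolean mask. A sketch on a hypothetical frame (`df`):

```python
import pandas as pd

df = pd.DataFrame({'Country': ['World', 'France', 'Third World', 'Japan'],
                   'Industrialization': [7, 9, 7, 7]})
groups = ['Developed Countries', 'Third World', 'World']

# Element-wise membership test, as in the plotting cell
mask_apply = df.Country.apply(lambda x: x in groups)
# Vectorized equivalent
mask_isin = df.Country.isin(groups)

selected = df.loc[mask_isin].reset_index(drop=True)
```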



In [84]:

    
fig









    Out[84]:



In [85]:

    
# Map country name to development level
dev_level = {'Belgium':'Developed',
             'France':'Developed',
             'Germany':'Developed',
             'Italy':'Developed',
             'Spain':'Developed',
             'Sweden':'Developed',
             'Switzerland':'Developed',
             'United Kingdom':'Developed',
             'Canada':'Developed',
             'United States':'Developed',
             'Japan':'Developed',
             'China':'Developing',
             'India':'Developing',
             'Brazil':'Developing',
             'Mexico':'Developing'}

industrialization['dev_level'] = industrialization.Country.map(dev_level)

filled_markers = ('o', 's', 'v', '^', '<', '>', '8', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.dev_level=='Developed'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(11, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-Dev.pdf', dpi=300, bbox_inches='tight')
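
`Series.map` with a dictionary leaves any label missing from the dictionary as `NaN`, so aggregates such as 'World' or 'Europe' get no development level and fall out of the `dev_level == ...` filters. A sketch with a hypothetical shortened mapping (`levels`):

```python
import pandas as pd

countries = pd.Series(['France', 'India', 'World'], name='Country')
levels = {'France': 'Developed', 'India': 'Developing'}  # 'World' left out on purpose

dev = countries.map(levels)
# Labels that received no level
unmapped = countries[dev.isna()].tolist()
```

Listing the unmapped labels is a quick way to confirm that only the intended aggregates are excluded.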



In [86]:

    
fig









    Out[86]:



In [87]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[industrialization.dev_level=='Developing'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(4, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-NonDev.pdf', dpi=300, bbox_inches='tight')



In [88]:

    
fig









    Out[88]:



In [89]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='Industrialization', hue='Country',
             data=industrialization.loc[
                 (industrialization.Country.apply(lambda x: x in ['India', 'United Kingdom'])) & 
                 (industrialization.year<=1900)].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Industrialization per capita (UK in 1900=100)')
plt.savefig(pathgraphs + 'Industrialization-UK-IND.pdf', dpi=300, bbox_inches='tight')



In [90]:

    
fig









    Out[90]:

Manufacturing

Let's use data from the same source to explore what happened to the share of manufacturing across regions.



In [91]:

    
# 1750-1913
manufacturing = [['Developed Countries', 27.0, 32.3, 39.5, 63.4, 79.1, 89.0, 92.5],
                 ['Europe', 23.2, 28.1, 34.2, 53.2, 61.3, 62.0, 56.6],
                 ['Austria-Hungary', 2.9, 3.2, 3.2, 4.2, 4.4, 4.7, 4.4],
                 ['Belgium', 0.3, 0.5, 0.7, 1.4, 1.8, 1.7, 1.8],
                 ['France', 4.0, 4.2, 5.2, 7.9, 7.8, 6.8, 6.1],
                 ['Germany', 2.9, 3.5, 3.5, 4.9, 8.5, 13.2, 14.8],
                 ['Italy', 2.4, 2.5, 2.3, 2.5, 2.5, 2.5, 2.4],
                 ['Russia', 5.0, 5.6, 5.6, 7.0, 7.6, 8.8, 8.2],
                 ['Spain', 1.2, 1.5, 1.5, 1.8, 1.8, 1.6, 1.2],
                 ['Sweden', 0.3, 0.3, 0.4, 0.6, 0.8, 0.9, 1.0],
                 ['Switzerland', 0.1, 0.3, 0.4, 0.7, 0.8, 1.0, 0.9],
                 ['United Kingdom', 1.9, 4.3, 9.5, 19.9, 22.9, 18.5, 13.6],
                 ['Canada', np.nan, np.nan, 0.1, 0.3, 0.4, 0.6, 0.9],
                 ['United States', 0.1, 0.8, 2.4, 7.2, 14.7, 23.6, 32.0],
                 ['Japan', 3.8, 3.5, 2.8, 2.6, 2.4, 2.4, 2.7],
                 ['Third World', 73.0, 67.7, 60.5, 36.6, 20.9, 11.0, 7.5],
                 ['China', 32.8, 33.3, 29.8, 19.7, 12.5, 6.2, 3.6],
                 ['India', 24.5, 19.7, 17.6, 8.6, 2.8, 1.7, 1.4],
                 ['Brazil', np.nan, np.nan, np.nan, 0.4, 0.3, 0.4, 0.5],
                 ['Mexico', np.nan, np.nan, np.nan, 0.4, 0.3, 0.3, 0.3]]

years = [1750, 1800, 1830, 1860, 1880, 1900, 1913]
manufacturing = pd.DataFrame(manufacturing, columns=['Country'] + ['y'+str(y) for y in years])

# 1913-1980
manufacturing2 = [['Developed Countries', 92.5, 92.8, 92.8, 93.5, 91.5, 90.1, 88.0],
                  ['Market Economies', 76.7, 80.3, 76.5, 77.5, 70.5, 70.0, 66.9],
                  ['Europe', 40.8, 35.4, 37.3, 26.1, 26.5, 24.5, 22.9],
                  ['Belgium', 1.8, 1.7, 1.1, 0.8, 0.8, 0.7, 0.7],
                  ['France', 6.1, 6.0, 4.4, 3.2, 3.8, 3.5, 3.3],
                  ['Germany', 14.8, 11.6, 12.7, 5.9, 6.4, 5.9, 5.3],
                  ['Italy', 2.4, 2.7, 2.8, 2.3, 2.9, 2.9, 2.9],
                  ['Spain', 1.2, 1.1, 0.8, 0.7, 0.8, 1.3, 1.4],
                  ['Sweden', 1.0, 0.9, 1.2, 0.9, 0.9, 0.9, 0.8],
                  ['Switzerland', 0.9, 0.7, 0.5, 0.7, 0.7, 0.6, 0.5],
                  ['United Kingdom', 13.6, 9.9, 10.7, 8.4, 6.4, 4.9, 4.0],
                  ['Canada', 0.9, 1.5, 1.4, 2.2, 2.1, 2.1, 2.0],
                  ['United States', 32.0, 39.3, 31.4, 44.7, 35.1, 33.0, 31.5],
                  ['Japan', 2.7, 3.3, 5.2, 2.9, 5.1, 8.8, 9.1],
                  ['U.S.S.R.', 8.2, 5.3, 9.0, 10.7, 14.2, 14.4, 14.8],
                  ['Third World', 7.5, 7.2, 7.2, 6.5, 8.5, 9.9, 12.0],
                  ['India', 1.4, 1.9, 2.4, 1.7, 1.8, 2.1, 2.3],
                  ['Brazil', 0.5, 0.6, 0.6, 0.6, 0.8, 1.1, 1.4],
                  ['Mexico', 0.3, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6],
                  ['China', 3.6, 3.4, 3.1, 2.3, 3.5, 3.9, 5.0]]
years = [1913, 1928, 1938, 1953, 1963, 1973, 1980]
manufacturing2 = pd.DataFrame(manufacturing2, columns=['Country'] + ['y'+str(y) for y in years])

# Merge
manufacturing = manufacturing.merge(manufacturing2)
manufacturing = pd.wide_to_long(manufacturing, ['y'], i='Country', j='year').reset_index()
manufacturing.rename(columns={'y':'manufacturing'}, inplace=True)
manufacturing['manufacturing'] = manufacturing.manufacturing / 100
manufacturing









    Out[91]:







  
    
      
     Country              year  manufacturing
0    Developed Countries  1750          0.270
1    Belgium              1750          0.003
2    France               1750          0.040
3    Germany              1750          0.029
4    Italy                1750          0.024
...  ...                   ...            ...
216  Third World          1980          0.120
217  China                1980          0.050
218  India                1980          0.023
219  Brazil               1980          0.014
220  Mexico               1980          0.006
    
  

221 rows × 3 columns
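
The merged frame above has 221 rows, i.e. 17 countries over 13 years. Because the merge keys include `y1913`, any country whose 1913 value differs between the two tables is silently dropped. Europe (56.6 in the first table, 40.8 in the second) is one such case: row 1 of the output is Belgium, not Europe. One way to surface these rows, sketched here on hypothetical toy frames (`t9`, `t12`), is an outer merge with `indicator=True`:

```python
import pandas as pd

t9 = pd.DataFrame({'Country': ['Europe', 'France', 'Russia'],
                   'y1913': [56.6, 6.1, 8.2]})
t12 = pd.DataFrame({'Country': ['Europe', 'France', 'U.S.S.R.'],
                    'y1913': [40.8, 6.1, 8.2]})

# Outer merge keeps unmatched rows; '_merge' records where each row came from
check = t9.merge(t12, how='outer', indicator=True)
unmatched = check.loc[check['_merge'] != 'both', ['Country', 'y1913', '_merge']]
```

Rows marked `left_only` or `right_only` are the ones an inner merge would discard.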



In [92]:

    
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[manufacturing.Country.apply(lambda x: x in ['Developed Countries', 'Third World', 'World'])].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'Manufacturing-Dev-NonDev.pdf', dpi=300, bbox_inches='tight')
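
`StrMethodFormatter` applies a `str.format` template in which `x` is the tick value, so the template has to match the units of the data. Since `manufacturing` was divided by 100, a percent format suits this axis. A sketch using plain `str.format` to show how the two templates used in this notebook render a share:

```python
share = 0.27  # a 27% share stored as a fraction

as_percent = '{x:,.0%}'.format(x=share)  # percent template, used for shares
as_number = '{x:,.0f}'.format(x=share)   # plain-number template, used for index values
large = '{x:,.0f}'.format(x=11041)       # thousands separator on large values
```

With the plain-number template a fraction rounds to a whole number, so it is only informative for series measured in levels.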



In [93]:

    
fig









    Out[93]:



In [94]:

    
# Map country name to development level
dev_level = {'Belgium':'Developed',
             'France':'Developed',
             'Germany':'Developed',
             'Italy':'Developed',
             'Spain':'Developed',
             'Sweden':'Developed',
             'Switzerland':'Developed',
             'United Kingdom':'Developed',
             'Canada':'Developed',
             'United States':'Developed',
             'Japan':'Developed',
             'China':'Developing',
             'India':'Developing',
             'Brazil':'Developing',
             'Mexico':'Developing'}

manufacturing['dev_level'] = manufacturing.Country.map(dev_level)

filled_markers = ('o', 's', 'v', '^', '<', '>', '8', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[manufacturing.dev_level=='Developed'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(11, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'Manufacturing-Dev.pdf', dpi=300, bbox_inches='tight')



In [95]:

    
fig









    Out[95]:



In [96]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[manufacturing.dev_level=='Developing'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(4, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'Manufacturing-NonDev.pdf', dpi=300, bbox_inches='tight')



In [97]:

    
fig









    Out[97]:



In [98]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='manufacturing', hue='Country',
             data=manufacturing.loc[
                 (manufacturing.Country.apply(lambda x: x in ['India', 'United Kingdom'])) & 
                 (manufacturing.year<=1900)].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0%}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Share of World Manufacturing')
plt.savefig(pathgraphs + 'manufacturing-UK-IND.pdf', dpi=300, bbox_inches='tight')



In [99]:

    
fig









    Out[99]:

Industrial Potential

We can also explore the industrial potential of these countries.



In [100]:

    
# 1750-1913
indpotential = [['Developed Countries', 34.4, 47.4, 72.9, 143.2, 253.1, 481.2, 863.0,],
                ['Europe', 29.6, 41.2, 63.0, 120.3, 196.2, 335.4, 527.8,],
                ['Austria-Hungary', 3.7, 4.8, 5.8, 9.5, 14.0, 25.6, 40.7,],
                ['Belgium', 0.4, 0.7, 1.3, 3.1, 5.7, 9.2, 16.3,],
                ['France', 5.0, 6.2, 9.5, 17.9, 25.1, 36.8, 57.3,],
                ['Germany', 3.7, 5.2, 6.5, 11.1, 27.4, 71.2, 137.7,],
                ['Italy', 3.1, 3.7, 4.2, 5.7, 8.1, 13.6, 22.5,],
                ['Russia', 6.4, 8.3, 10.3, 15.8, 24.5, 47.5, 76.6,],
                ['Spain', 1.6, 2.1, 2.7, 4.0, 5.8, 8.5, 11.0,],
                ['Sweden', 0.3, 0.5, 0.6, 1.4, 2.6, 5.0, 9.0,],
                ['Switzerland', 0.2, 0.4, 0.8, 1.6, 2.6, 5.4, 8.0,],
                ['United Kingdom', 2.4, 6.2, 17.5, 45.0, 73.3, 100.0, 127.2,],
                ['Canada', np.nan, np.nan, 0.1, 0.6, 1.4, 3.2, 8.7,],
                ['United States', 0.1, 1.1, 4.6, 16.2, 46.9, 127.8, 298.1,],
                ['Japan', 4.8, 5.1, 5.2, 5.8, 7.6, 13.0, 25.1,],
                ['Third World', 92.9, 99.4, 111.5, 82.7, 67.0, 59.6, 69.5,],
                ['China', 41.7, 48.8, 54.9, 44.1, 39.9, 33.5, 33.3,],
                ['India', 31.2, 29.0, 32.5, 19.4, 8.8, 9.3, 13.1,],
                ['Brazil', np.nan, np.nan, np.nan, 0.9, 0.9, 2.1, 4.3,],
                ['Mexico', np.nan, np.nan, np.nan, 0.9, 0.8, 1.7, 2.7,],
                ['World', 127.3, 146.9, 184.4, 225.9, 320.1, 540.8, 932.5,]]

years = [1750, 1800, 1830, 1860, 1880, 1900, 1913]
indpotential = pd.DataFrame(indpotential, columns=['Country'] + ['y'+str(y) for y in years])

# 1913-1980
indpotential2 = [['Developed Countries', 863, 1259, 1562, 2870, 4699, 8432, 9718],
                 ['Market Economies', 715, 1089, 1288, 2380, 3624, 6547, 7388],
                 ['Europe', 380, 480, 629, 801, 1361, 2290, 2529],
                 ['Belgium', 16, 22, 18, 25, 41, 69, 76],
                 ['France', 57, 82, 74, 98, 194, 328, 362],
                 ['Germany', 138, 158, 214, 180, 330, 550, 590],
                 ['Italy', 23, 37, 46, 71, 150, 258, 319],
                 ['Spain', 11, 16, 14, 22, 43, 122, 156],
                 ['Sweden', 9, 12, 21, 28, 48, 80, 83],
                 ['Switzerland', 8, 9, 9, 20, 37, 57, 54],
                 ['United Kingdom', 127, 135, 181, 258, 330, 462, 441],
                 ['Canada', 9, 20, 23, 66, 109, 199, 220],
                 ['United States', 298, 533, 528, 1373, 1804, 3089, 3475],
                 ['Japan', 25, 45, 88, 88, 264, 819, 1001],
                 ['U.S.S.R.', 77, 72, 152, 328, 760, 1345, 1630],
                 ['Third World', 70, 98, 122, 200, 439, 927, 1323],
                 ['India', 13, 26, 40, 52, 91, 194, 254],
                 ['Brazil', 4, 8, 10, 18, 42, 102, 159],
                 ['Mexico', 3, 3, 4, 9, 21, 47, 68],
                 ['China', 33, 46, 52, 71, 178, 369, 553],
                 ['World', 933, 1356, 1684, 3070, 5138, 9359, 11041]]

years = [1913, 1928, 1938, 1953, 1963, 1973, 1980]
indpotential2 = pd.DataFrame(indpotential2, columns=['Country'] + ['y'+str(y) for y in years])

# Merge
indpotential = indpotential.merge(indpotential2[indpotential2.columns.difference(['y1913'])])
indpotential = pd.wide_to_long(indpotential, ['y'], i='Country', j='year').reset_index()
indpotential.rename(columns={'y':'indpotential'}, inplace=True)
indpotential









    Out[100]:







  
    
      
     Country              year  indpotential
0    Developed Countries  1750          34.4
1    Europe               1750          29.6
2    Belgium              1750           0.4
3    France               1750           5.0
4    Germany              1750           3.7
...  ...                   ...           ...
242  China                1980         553.0
243  India                1980         254.0
244  Brazil               1980         159.0
245  Mexico               1980          68.0
246  World                1980       11041.0
    
  

247 rows × 3 columns
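
Here `y1913` is removed from the second table before merging, so `Country` is the only shared column and rounded 1913 values (e.g. 16.3 vs. 16 for Belgium) cannot drop a row. Note that `Index.difference` returns the remaining labels sorted, which is harmless here because `Country` sorts before the `y...` columns. A sketch on a hypothetical one-row frame (`t12`), with `drop` as an alternative that keeps the original column order:

```python
import pandas as pd

t12 = pd.DataFrame({'Country': ['Belgium'], 'y1913': [16], 'y1928': [22]})

# Index.difference returns a sorted Index of the labels not listed
kept = t12.columns.difference(['y1913'])
subset = t12[kept]

# drop() removes the column and preserves the existing column order
subset_drop = t12.drop(columns=['y1913'])
```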



In [101]:

    
# Select some colors
mycolors = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
# Use seaborn to setup a color map to be used by matplotlib
my_cmap = mpl.colors.ListedColormap(sns.color_palette(mycolors).as_hex())
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[indpotential.Country.apply(lambda x: x in ['Developed Countries', 'Third World', 'World'])].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=True)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-Dev-NonDev.pdf', dpi=300, bbox_inches='tight')



In [102]:

    
fig









    Out[102]:



In [103]:

    
# Map country name to development level
dev_level = {'Belgium':'Developed',
             'France':'Developed',
             'Germany':'Developed',
             'Italy':'Developed',
             'Spain':'Developed',
             'Sweden':'Developed',
             'Switzerland':'Developed',
             'United Kingdom':'Developed',
             'Canada':'Developed',
             'United States':'Developed',
             'Japan':'Developed',
             'China':'Developing',
             'India':'Developing',
             'Brazil':'Developing',
             'Mexico':'Developing'}

indpotential['dev_level'] = indpotential.Country.map(dev_level)

filled_markers = ('o', 's', 'v', '^', '<', '>', '8', 'p', '*', 'h', 'H', 'D', 'd', 'P', 'X')

sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[indpotential.dev_level=='Developed'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(11, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-Dev.pdf', dpi=300, bbox_inches='tight')



In [104]:

    
fig









    Out[104]:



In [105]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[indpotential.dev_level=='Developing'].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             palette=sns.cubehelix_palette(4, start=.5, rot=-.75))
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-NonDev.pdf', dpi=300, bbox_inches='tight')



In [106]:

    
fig









    Out[106]:



In [107]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='year', y='indpotential', hue='Country',
             data=indpotential.loc[
                 (indpotential.Country.apply(lambda x: x in ['India', 'United Kingdom'])) & 
                 (indpotential.year<=1900)].reset_index(drop=True),
             alpha=1, style='Country', dashes=False, markers=filled_markers,
             )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,.0f}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Year')
ax.set_ylabel('Total Industrial Potential (UK in 1900 = 100)')
plt.savefig(pathgraphs + 'indpotential-UK-IND.pdf', dpi=300, bbox_inches='tight')



In [108]:

    
fig









    Out[108]:

Persistence

Let's explore the persistence of economic development since 1950. To do so, let's get the Penn World Table and World Bank Data.

Penn World Table

Let's start by importing the data from the Penn World Table.



In [109]:

    
try:
    # Use the local copies if they have already been downloaded
    pwt_xls = pd.read_excel(pathout + 'pwt91.xlsx')
    pwt = pd.read_stata(pathout + 'pwt91.dta')
except FileNotFoundError:
    # Otherwise download the files and save local copies for next time
    pwt_xls = pd.read_excel('https://www.rug.nl/ggdc/docs/pwt91.xlsx', sheet_name=1)
    pwt = pd.read_stata('https://www.rug.nl/ggdc/docs/pwt91.dta')
    pwt_xls.to_excel(pathout + 'pwt91.xlsx', index=False)
    pwt.to_stata(pathout + 'pwt91.dta', write_index=False, version=117)
    
# Get labels of variables
pwt_labels = pd.io.stata.StataReader(pathout + 'pwt91.dta').variable_labels()
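
The cell above follows a cache-or-download pattern: read a local copy if it exists, otherwise fetch the file and save it for next time. A generic sketch of that pattern follows. `load_or_fetch` and `fake_fetch` are hypothetical names introduced here, and the example uses CSV and a stand-in fetch function so that it runs without network access:

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_or_fetch(local_path, fetch):
    """Return the cached frame at local_path, or call fetch() and cache its result."""
    local_path = Path(local_path)
    if local_path.exists():
        return pd.read_csv(local_path)
    df = fetch()
    df.to_csv(local_path, index=False)
    return df


calls = []


def fake_fetch():
    # Stand-in for a real download such as pd.read_stata(<url>)
    calls.append(1)
    return pd.DataFrame({'countrycode': ['ABW'], 'year': [1950]})


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / 'demo.csv'
    first = load_or_fetch(cache, fake_fetch)   # cache miss: fetch and save
    second = load_or_fetch(cache, fake_fetch)  # cache hit: read the local copy
```

Checking for the file explicitly, rather than catching an exception, keeps unrelated read errors from triggering a fresh download.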

The Excel file lets us know the definition of the variables, while the Stata file has the data (of course, the Excel file also has the data). For some reason, the original Stata file does not seem to have labels!



In [110]:

    
pwt_labels









    Out[110]:





{'countrycode': '',
 'country': '',
 'currency_unit': '',
 'year': '',
 'rgdpe': '',
 'rgdpo': '',
 'pop': '',
 'emp': '',
 'avh': '',
 'hc': '',
 'ccon': '',
 'cda': '',
 'cgdpe': '',
 'cgdpo': '',
 'cn': '',
 'ck': '',
 'ctfp': '',
 'cwtfp': '',
 'rgdpna': '',
 'rconna': '',
 'rdana': '',
 'rnna': '',
 'rkna': '',
 'rtfpna': '',
 'rwtfpna': '',
 'labsh': '',
 'irr': '',
 'delta': '',
 'xr': '',
 'pl_con': '',
 'pl_da': '',
 'pl_gdpo': '',
 'i_cig': '',
 'i_xm': '',
 'i_xr': '',
 'i_outlier': '',
 'i_irr': '',
 'cor_exp': '',
 'statcap': '',
 'csh_c': '',
 'csh_i': '',
 'csh_g': '',
 'csh_x': '',
 'csh_m': '',
 'csh_r': '',
 'pl_c': '',
 'pl_i': '',
 'pl_g': '',
 'pl_x': '',
 'pl_m': '',
 'pl_n': '',
 'pl_k': ''}



In [111]:

    
pwt_xls









    Out[111]:







  
    
      
    Variable name         Variable definition
0   Identifier variables  NaN
1   countrycode           3-letter ISO country code
2   country               Country name
3   currency_unit         Currency unit
4   year                  Year
..  ...                   ...
62  pl_g                  Price level of government consumption,  price ...
63  pl_x                  Price level of exports, price level of USA GDP...
64  pl_m                  Price level of imports, price level of USA GDP...
65  pl_n                  Price level of the capital stock, price level ...
66  pl_k                  Price level of the capital services, price lev...
    
  

67 rows × 2 columns
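Since the Stata file carries no labels, one option is to build the label dictionary from the Excel sheet instead. The sketch below uses a small stand-in for pwt_xls with a few of the rows shown above. The name `toy_xls` is hypothetical, and rows without a definition (section headers such as "Identifier variables") are dropped.

```python
import numpy as np
import pandas as pd

# Stand-in for pwt_xls, using a few rows from the output above
toy_xls = pd.DataFrame({
    'Variable name': ['Identifier variables', 'countrycode', 'country', 'year'],
    'Variable definition': [np.nan, '3-letter ISO country code', 'Country name', 'Year'],
})

# Drop section headers (no definition), then map variable name -> definition
defs = toy_xls.dropna(subset=['Variable definition'])
labels = dict(zip(defs['Variable name'], defs['Variable definition']))
print(labels['countrycode'])
```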



In [112]:

    
pwt









    Out[112]:







  
    
      
       countrycode  country   currency_unit   year         rgdpe         rgdpo        pop       emp  avh        hc  ...     csh_x      csh_m      csh_r      pl_c      pl_i      pl_g      pl_x      pl_m      pl_n      pl_k
0      ABW          Aruba     Aruban Guilder  1950           NaN           NaN        NaN       NaN  NaN       NaN  ...       NaN        NaN        NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
1      ABW          Aruba     Aruban Guilder  1951           NaN           NaN        NaN       NaN  NaN       NaN  ...       NaN        NaN        NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
2      ABW          Aruba     Aruban Guilder  1952           NaN           NaN        NaN       NaN  NaN       NaN  ...       NaN        NaN        NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
3      ABW          Aruba     Aruban Guilder  1953           NaN           NaN        NaN       NaN  NaN       NaN  ...       NaN        NaN        NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
4      ABW          Aruba     Aruban Guilder  1954           NaN           NaN        NaN       NaN  NaN       NaN  ...       NaN        NaN        NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
...    ...          ...       ...              ...           ...           ...        ...       ...  ...       ...  ...       ...        ...        ...       ...       ...       ...       ...       ...       ...       ...
12371  ZWE          Zimbabwe  US Dollar       2013  28086.937500  28329.810547  15.054506  7.914061  NaN  2.504635  ...  0.169638  -0.426188   0.090225  0.577488  0.582022  0.448409  0.723247  0.632360  0.383488  0.704313
12372  ZWE          Zimbabwe  US Dollar       2014  29217.554688  29355.759766  15.411675  8.222112  NaN  2.550258  ...  0.141791  -0.340442   0.051500  0.600760  0.557172  0.392895  0.724510  0.628352  0.349735  0.704991
12373  ZWE          Zimbabwe  US Dollar       2015  30091.923828  29150.750000  15.777451  8.530669  NaN  2.584653  ...  0.137558  -0.354298  -0.023353  0.622927  0.580814  0.343926  0.654940  0.564430  0.348472  0.713156
12374  ZWE          Zimbabwe  US Dollar       2016  30974.292969  29420.449219  16.150362  8.839398  NaN  2.616257  ...  0.141248  -0.310446   0.003050  0.640176  0.599462  0.337853  0.657060  0.550084  0.346553  0.718671
12375  ZWE          Zimbabwe  US Dollar       2017  32693.474609  30940.816406  16.529903  9.181251  NaN  2.648248  ...  0.141799  -0.299539   0.019133  0.647136  0.726222  0.340680  0.645338  0.539529  0.412392  0.755215
    
  

12376 rows × 52 columns



In [113]:

    
# Describe the data
pwt.describe()









    Out[113]:







  
    
      
               year         rgdpe         rgdpo           pop           emp           avh            hc          ccon           cda         cgdpe  ...         csh_x         csh_m         csh_r          pl_c          pl_i          pl_g          pl_x          pl_m          pl_n          pl_k
count  12376.000000  9.985000e+03  9.985000e+03   9985.000000   8841.000000   3373.000000   8299.000000  9.985000e+03  9.985000e+03  9.985000e+03  ...   9985.000000   9985.000000   9985.000000   9985.000000   9985.000000   9985.000000   9985.000000   9985.000000   9959.000000   7047.000000
mean    1983.500000  2.720569e+05  2.691928e+05     30.736767     14.799485   1984.099854      2.064241  1.984998e+05  2.686580e+05  2.697088e+05  ...      0.229183     -0.307399      0.019670      0.391839      0.486303      0.368860      0.436420      0.431026      0.466652      1.403137
std       19.628579  1.078882e+06  1.070178e+06    114.569824     59.107712    272.879944      0.720774  7.772703e+05  1.079234e+06  1.070720e+06  ...      0.260547      0.681575      0.201448      0.280254      0.956450      0.347244      0.211918      0.220563      0.400624      2.628997
min     1950.000000  1.846645e+01  1.977999e+01      0.004376      0.001180   1353.886841      1.007038  1.443100e+01  1.986141e+01  1.848834e+01  ...     -1.496417    -26.741989     -8.731015      0.017207      0.012448      0.010474      0.007868      0.022644      0.019666      0.060732
25%     1966.750000  6.178189e+03  6.380658e+03      1.634517      0.940000   1799.336060      1.431531  5.227761e+03  6.395296e+03  6.002223e+03  ...      0.068159     -0.381261     -0.022347      0.182697      0.198099      0.125520      0.243906      0.248910      0.219715      0.663940
50%     1983.500000  2.725946e+04  2.710632e+04      6.115370      3.021000   1972.072876      1.954407  2.153850e+04  2.763264e+04  2.677256e+04  ...      0.144143     -0.203762      0.000727      0.326817      0.396347      0.256664      0.473103      0.486665      0.364834      0.982678
75%     2000.250000  1.386558e+05  1.374726e+05     19.891548      8.583438   2149.860352      2.649120  1.005379e+05  1.357644e+05  1.362898e+05  ...      0.301996     -0.104336      0.044098      0.520135      0.594202      0.490205      0.596405      0.576243      0.569292      1.458653
max     2017.000000  1.839607e+07  1.838384e+07   1409.517456    792.575317   2910.734863      3.974208  1.483615e+07  1.846078e+07  1.792857e+07  ...      3.057809     23.158607      9.917986      3.986815     35.654171      2.367351      2.271417      5.465247      6.730951     60.361191
    
  

8 rows × 44 columns

Computing $\log$ GDP per capita

Now, we can create new variables, transform and plot the data

To compute the $\log$ of income per capita (GDPpc), we first need to know the name of the column that contains the GDPpc data in the dataframe. To do this, let's find the variables whose description contains the word capita.



In [114]:

    
pwt_xls.columns









    Out[114]:





Index(['Variable name', 'Variable definition'], dtype='object')

To be able to read the definitions better, let's tell pandas to show us more content.



In [115]:

    
pd.set_option("display.max_columns", 20)
pd.set_option('display.max_rows', 50)
pd.set_option('display.width', 1000)
#pd.set_option('display.max_colwidth', None)
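Options changed with pd.set_option stay in effect for the rest of the session. As a sketch of an alternative, pd.option_context changes an option only inside a with block. Here it lifts the column-width limit (None means no truncation in recent pandas versions) so that a long definition prints in full. The dataframe `toy` is a hypothetical stand-in holding one definition from the PWT documentation.

```python
import pandas as pd

toy = pd.DataFrame({'Variable definition': [
    'Expenditure-side real GDP at chained PPPs (in mil. 2011US$)']})

before = pd.get_option('display.max_colwidth')
short_text = repr(toy)                     # default setting: long strings are cut
with pd.option_context('display.max_colwidth', None):
    inside = pd.get_option('display.max_colwidth')
    full_text = repr(toy)                  # no truncation inside the block
after = pd.get_option('display.max_colwidth')

print(short_text)
print(full_text)
```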



In [116]:

    
pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).lower().find('capita')!=-1)]









    Out[116]:







  
    
      
    Variable name  Variable definition
12  hc             Human capital index, based on years of schooli...
19  cn             Capital stock at current PPPs (in mil. 2011US$)
20  ck             Capital services levels at current PPPs (USA=1)
28  rnna           Capital stock at constant 2011 national prices...
29  rkna           Capital services at constant 2011 national pri...
34  delta          Average depreciation rate of the capital stock
47  i_irr          0/1/2/3: the observation for irr is not an out...
53  csh_i          Share of gross capital formation at current PPPs
61  pl_i           Price level of capital formation,  price level...
65  pl_n           Price level of the capital stock, price level ...
66  pl_k           Price level of the capital services, price lev...

So, it seems the data does not contain that variable: every match refers to "capital", not "per capita". But do not panic...we know how to compute it based on GDP and population. Let's do it!
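The search for 'capita' returns only definitions containing "capital" because find matches substrings. As a sketch of a stricter search, the pandas string method str.contains accepts a regular expression, so a word boundary (\b) can separate "capita" from "capital". The stand-in dataframe `toy_defs` is hypothetical, and its last row is an invented definition added only to show a match.

```python
import numpy as np
import pandas as pd

toy_defs = pd.DataFrame({
    'Variable name': ['Identifier variables', 'hc', 'cn', 'gdppc_example'],
    'Variable definition': [
        np.nan,
        'Human capital index',
        'Capital stock at current PPPs (in mil. 2011US$)',
        'GDP per capita (invented row for illustration)',
    ],
})

definition = toy_defs['Variable definition']
# Substring match: also picks up "capital"; na=False treats missing text as no match
substring = definition.str.contains('capita', case=False, na=False)
# Whole-word match: \b marks a word boundary, so "capital" is excluded
whole_word = definition.str.contains(r'\bcapita\b', case=False, na=False)

print(toy_defs.loc[substring, 'Variable name'].tolist())
print(toy_defs.loc[whole_word, 'Variable name'].tolist())
```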

Identify the name of the variable for GDP



In [117]:

    
pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).upper().find('GDP')!=-1)]









    Out[117]:







  
    
      
    Variable name  Variable definition
7   rgdpe          Expenditure-side real GDP at chained PPPs (in ...
8   rgdpo          Output-side real GDP at chained PPPs (in mil. ...
17  cgdpe          Expenditure-side real GDP at current PPPs (in ...
18  cgdpo          Output-side real GDP at current PPPs (in mil. ...
25  rgdpna         Real GDP at constant 2011 national prices (in ...
32  labsh          Share of labour compensation in GDP at current...
38  pl_con         Price level of CCON (PPP/XR), price level of U...
39  pl_da          Price level of CDA (PPP/XR), price level of US...
40  pl_gdpo        Price level of CGDPo (PPP/XR),  price level of...
46  i_outlier      0/1: the observation on pl_gdpe or pl_gdpo is ...
57  csh_r          Share of residual trade and GDP statistical di...
60  pl_c           Price level of household consumption,  price l...
61  pl_i           Price level of capital formation,  price level...
62  pl_g           Price level of government consumption,  price ...
63  pl_x           Price level of exports, price level of USA GDP...
64  pl_m           Price level of imports, price level of USA GDP...

Identify the name of the variable for population



In [118]:

    
pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).lower().find('population')!=-1)]









    Out[118]:







  
    
      
   Variable name  Variable definition
9  pop            Population (in millions)

Create new variables/columns with real GDPpc for all the measures included in PWT



In [119]:

    
# Get columns with GDP measures
gdpcols = pwt_xls.loc[pwt_xls['Variable definition'].apply(lambda x: str(x).upper().find('REAL GDP')!=-1), 'Variable name'].tolist()

# Generate GDPpc for each measure
for gdp in gdpcols:
    pwt[gdp + '_pc'] = pwt[gdp] / pwt['pop']

# GDPpc data
gdppccols = [col+'_pc' for col in gdpcols]
pwt[['countrycode', 'country', 'year'] + gdppccols]









    Out[119]:







  
    
      
       countrycode  country   year     rgdpe_pc     rgdpo_pc     cgdpe_pc     cgdpo_pc    rgdpna_pc
0      ABW          Aruba     1950          NaN          NaN          NaN          NaN          NaN
1      ABW          Aruba     1951          NaN          NaN          NaN          NaN          NaN
2      ABW          Aruba     1952          NaN          NaN          NaN          NaN          NaN
3      ABW          Aruba     1953          NaN          NaN          NaN          NaN          NaN
4      ABW          Aruba     1954          NaN          NaN          NaN          NaN          NaN
...    ...          ...        ...          ...          ...          ...          ...          ...
12371  ZWE          Zimbabwe  2013  1865.683105  1881.816040  1874.657715  1898.868286  1952.479736
12372  ZWE          Zimbabwe  2014  1895.806519  1904.774048  1918.362305  1935.120605  1947.798950
12373  ZWE          Zimbabwe  2015  1907.274170  1847.621094  1924.819824  1902.378662  1934.789307
12374  ZWE          Zimbabwe  2016  1917.869873  1821.658813  1932.771973  1889.612061  1901.752686
12375  ZWE          Zimbabwe  2017  1977.838257  1871.808716  1998.100098  1940.005371  1913.949829
    
  

12376 rows × 8 columns
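The loop above divides each GDP column by population one at a time. A sketch of a vectorized alternative uses DataFrame.div with axis=0, which divides every column by the same series row by row. Because PWT reports GDP in millions of 2011 US$ and population in millions, the ratio is in 2011 US$ per person. The numbers below are Zimbabwe's 2016 and 2017 values copied from the output above; the name `toy` is hypothetical.

```python
import numpy as np
import pandas as pd

# Zimbabwe, 2016-2017, values copied from the dataframe shown above
toy = pd.DataFrame({
    'year': [2016, 2017],
    'rgdpe': [30974.292969, 32693.474609],   # millions of 2011 US$
    'rgdpo': [29420.449219, 30940.816406],   # millions of 2011 US$
    'pop': [16.150362, 16.529903],           # millions of people
})

gdpcols = ['rgdpe', 'rgdpo']
# Divide all GDP columns by population at once (rows are matched by index)
percap = toy[gdpcols].div(toy['pop'], axis=0).add_suffix('_pc')
toy = toy.join(percap)
print(toy[['year', 'rgdpe_pc', 'rgdpo_pc']].round(2))
```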

Now let's use the apply function to compute logs.



In [120]:

    
pwt[['l'+col for col in gdppccols]] = pwt[gdppccols].apply(np.log, axis=1)
pwt[['countrycode', 'country', 'year'] + ['l'+col for col in gdppccols]]









    Out[120]:







  
    
      
       countrycode  country   year   lrgdpe_pc   lrgdpo_pc   lcgdpe_pc   lcgdpo_pc  lrgdpna_pc
0      ABW          Aruba     1950         NaN         NaN         NaN         NaN         NaN
1      ABW          Aruba     1951         NaN         NaN         NaN         NaN         NaN
2      ABW          Aruba     1952         NaN         NaN         NaN         NaN         NaN
3      ABW          Aruba     1953         NaN         NaN         NaN         NaN         NaN
4      ABW          Aruba     1954         NaN         NaN         NaN         NaN         NaN
...    ...          ...        ...         ...         ...         ...         ...         ...
12371  ZWE          Zimbabwe  2013    7.531383    7.539993    7.536181    7.549013    7.576856
12372  ZWE          Zimbabwe  2014    7.547400    7.552119    7.559227    7.567925    7.574455
12373  ZWE          Zimbabwe  2015    7.553431    7.521654    7.562588    7.550860    7.567754
12374  ZWE          Zimbabwe  2016    7.558970    7.507503    7.566710    7.544127    7.550531
12375  ZWE          Zimbabwe  2017    7.589760    7.534660    7.599952    7.570446    7.556924
    
  

12376 rows × 8 columns
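Since np.log is vectorized, applying it row by row with apply and calling it once on the whole dataframe should give the same result. The sketch below checks this on Zimbabwe's 2016 and 2017 per capita values from the earlier output; the name `toy` is hypothetical.

```python
import numpy as np
import pandas as pd

# Zimbabwe 2016-2017 GDP per capita, copied from the output above
toy = pd.DataFrame({
    'rgdpe_pc': [1917.869873, 1977.838257],
    'rgdpo_pc': [1821.658813, 1871.808716],
})

logs_apply = toy.apply(np.log, axis=1)       # row by row, as in the cell above
logs_direct = np.log(toy).add_prefix('l')    # one vectorized call, columns renamed

print(logs_direct.round(6))
```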

How correlated are these measures of log GDP per capita?



In [121]:

    
# Keep only year and the numeric log measures, so corr() does not see text columns
pwt[['year'] + ['l'+col for col in gdppccols]].groupby('year').corr()









    Out[121]:







  
    
      
      
                   lrgdpe_pc   lrgdpo_pc   lcgdpe_pc   lcgdpo_pc  lrgdpna_pc
year
1950  lrgdpe_pc     1.000000    0.996004    0.999360    0.994707    0.939644
      lrgdpo_pc     0.996004    1.000000    0.995951    0.998978    0.942147
      lcgdpe_pc     0.999360    0.995951    1.000000    0.995946    0.939410
      lcgdpo_pc     0.994707    0.998978    0.995946    1.000000    0.943629
      lrgdpna_pc    0.939644    0.942147    0.939410    0.943629    1.000000
...   ...                ...         ...         ...         ...         ...
2017  lrgdpe_pc     1.000000    0.975165    0.999933    0.978183    0.990955
      lrgdpo_pc     0.975165    1.000000    0.974924    0.999628    0.982313
      lcgdpe_pc     0.999933    0.974924    1.000000    0.978034    0.990629
      lcgdpo_pc     0.978183    0.999628    0.978034    1.000000    0.984534
      lrgdpna_pc    0.990955    0.982313    0.990629    0.984534    1.000000
    
  

340 rows × 5 columns

While they seem to be highly correlated, it is hard to see this directly here. Let's get summary statistics of each measure's correlations across all years.



In [122]:

    
pwt[['year'] + ['l'+col for col in gdppccols]].groupby('year').corr().describe()

Ok. This gives us a better sense of how strongly correlated these measures of log GDP per capita are. In what follows we will use only one, namely Log[GDPpc] based on Expenditure-side real GDP at chained PPPs (in mil. 2011US$), i.e., lrgdpe_pc.
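The describe call pools every entry of the yearly correlation matrices, including the diagonal of ones. As a sketch of a more targeted summary, one can keep only each year's correlations with a chosen measure by taking a cross-section of the second index level. The data below are simulated, and the column names a, b, and c are hypothetical stand-ins for the log GDP per capita measures.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_countries = 50
years = [1950, 1960, 1970]
base = rng.normal(size=len(years) * n_countries)
toy = pd.DataFrame({
    'year': np.repeat(years, n_countries),
    'a': base,
    'b': base + 0.1 * rng.normal(size=base.size),   # close to a
    'c': base + 0.5 * rng.normal(size=base.size),   # noisier version of a
})

# One correlation matrix per year, stacked with a (year, measure) MultiIndex
corrs = toy.groupby('year')[['a', 'b', 'c']].corr()
# Keep the row for measure 'a' in each year, then drop its self-correlation
with_a = corrs.xs('a', level=1).drop(columns='a')
print(with_a.describe().loc[['mean', 'min', 'max']])
```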

Convergence post-1960?

Let's start by looking at the distribution of Log[GDPpc] in 1960. For this we need to subset our dataframe and select only the rows for the year 1960. This is done with the loc property of the dataframe.



In [123]:

    
gdppc1960 = pwt.loc[pwt.year==1960, ['countrycode', 'country', 'year', 'lrgdpe_pc']]
gdppc1960









    Out[123]:







  
    
      
       countrycode  country               year  lrgdpe_pc
10     ABW          Aruba                 1960        NaN
78     AGO          Angola                1960        NaN
146    AIA          Anguilla              1960        NaN
214    ALB          Albania               1960        NaN
282    ARE          United Arab Emirates  1960        NaN
...    ...          ...                    ...        ...
12046  VNM          Viet Nam              1960        NaN
12114  YEM          Yemen                 1960        NaN
12182  ZAF          South Africa          1960   8.664412
12250  ZMB          Zambia                1960   7.883263
12318  ZWE          Zimbabwe              1960   7.646267
    
  

182 rows × 4 columns
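Many of the 182 rows for 1960 are missing. A sketch of the same kind of subsetting on a toy dataframe follows: it combines a boolean mask with a column list in loc and then counts the non-missing observations. The values are copied from the outputs above (the last row is Zimbabwe in 2017), and `toy` is a hypothetical name.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'countrycode': ['YEM', 'ZAF', 'ZMB', 'ZWE', 'ZWE'],
    'year': [1960, 1960, 1960, 1960, 2017],
    'lrgdpe_pc': [np.nan, 8.664412, 7.883263, 7.646267, 7.589760],
})

# Rows: year equals 1960; columns: only the ones listed
sub1960 = toy.loc[toy.year == 1960, ['countrycode', 'lrgdpe_pc']]
# Keep the countries that have data in 1960
available = sub1960.dropna(subset=['lrgdpe_pc'])
print(len(sub1960), len(available))
```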

gdppc1960 has the data for all countries in the year 1960. We can plot the histogram using the functions of the dataframe.



In [124]:

    
gdppc1960.lrgdpe_pc.hist()









    Out[124]:





<matplotlib.axes._subplots.AxesSubplot at 0x1323d7280>

We can also plot it using the seaborn package. Let's plot the kernel density of the distribution.



In [125]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.kdeplot(gdppc1960.lrgdpe_pc, ax=ax, shade=True, label='1960', linewidth=2)
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
plt.savefig(pathgraphs + 'y1960-density.pdf', dpi=300, bbox_inches='tight')



In [126]:

    
fig









    Out[126]:
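A kernel density estimate places a small bump (the kernel) on each observation and averages the bumps. The sketch below computes a Gaussian kernel density by hand with numpy on simulated data. The bandwidth rule used here, h = s * n^(-1/5) with s the sample standard deviation, is an assumption made for illustration; seaborn's defaults may differ across versions, so the curve need not match kdeplot exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=8.0, scale=1.0, size=100)   # simulated log incomes

n = x.size
h = x.std(ddof=1) * n ** (-1 / 5)              # rule-of-thumb bandwidth (assumed)

grid = np.linspace(x.min() - 4 * h, x.max() + 4 * h, 400)
# Standardized distance between every grid point and every observation
z = (grid[:, None] - x[None, :]) / h
# Average of Gaussian bumps centered on the observations
density = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

dx = grid[1] - grid[0]
area = density.sum() * dx                      # should be close to 1
print(round(area, 3))
```

A larger h gives a smoother curve and a smaller h a bumpier one, which is worth keeping in mind when comparing densities across years.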

Let's now also include the distribution for other years.



In [127]:

    
gdppc1980 = pwt.loc[pwt.year==1980, ['countrycode', 'country', 'year', 'lrgdpe_pc']]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.kdeplot(gdppc1960.lrgdpe_pc, ax=ax, shade=True, label='1960', linewidth=2)
sns.kdeplot(gdppc1980.lrgdpe_pc, ax=ax, shade=True, label='1980', linewidth=2)
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
plt.savefig(pathgraphs + 'y1960-1980-density.pdf', dpi=300, bbox_inches='tight')



In [128]:

    
fig









    Out[128]:



In [129]:

    
gdppc2000 = pwt.loc[pwt.year==2000, ['countrycode', 'country', 'year', 'lrgdpe_pc']]
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.kdeplot(gdppc1960.lrgdpe_pc, ax=ax, shade=True, label='1960', linewidth=2)
sns.kdeplot(gdppc1980.lrgdpe_pc, ax=ax, shade=True, label='1980', linewidth=2)
sns.kdeplot(gdppc2000.lrgdpe_pc, ax=ax, shade=True, label='2000', linewidth=2)
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
plt.savefig(pathgraphs + 'y1960-2000-density.pdf', dpi=300, bbox_inches='tight')



In [130]:

    
fig









    Out[130]:

Let's show the evolution of the distribution by looking at it every 10 years from 1950 onwards. Moreover, let's do everything in a single piece of code.



In [131]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
period = list(range(1950, 2020, 10)) + [2017]
#mycolors = sns.color_palette("GnBu", n_colors=len(period)+5)
mycolors = sns.cubehelix_palette(len(period), start=.5, rot=-.75)
# Plot
fig, ax = plt.subplots()
k = 0
for t in period:
    sns.kdeplot(pwt.loc[pwt.year==t].lrgdpe_pc, ax=ax, shade=True, label=str(t), linewidth=2, color=mycolors[k])
    k += 1
ax.set_xlabel('Log[Income per capita]')
ax.set_ylabel('Density of Countries')
plt.savefig(pathgraphs + 'y1950-2010-density.pdf', dpi=300, bbox_inches='tight')



In [132]:

    
fig









    Out[132]:
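The densities show how the whole distribution shifts and spreads over time. A numeric complement, sketched here on a toy panel, is to summarize each year's cross-section with groupby: if countries were converging, the standard deviation of log income per capita should fall over time. The numbers below are invented for illustration only.

```python
import pandas as pd

toy = pd.DataFrame({
    'year': [1960, 1960, 1960, 2000, 2000, 2000],
    'lrgdpe_pc': [7.0, 8.0, 9.0, 7.5, 9.0, 10.5],   # invented values
})

# Number of countries, mean, and dispersion of log income in each year
summary = toy.groupby('year')['lrgdpe_pc'].agg(['count', 'mean', 'std'])
print(summary)
```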

Persistence

The lack of convergence in the last 60 years suggests that there is some persistence in (recent) development. Let's explore this by plotting the association between GDP per capita levels in different periods. To make things more comparable, let's normalize income levels by expressing them relative to the US. To do so, it is convenient to use the year as the index of the dataframe.



In [133]:

    
pwt.set_index('year', inplace=True)
pwt['lrgdpe_pc_US'] = pwt.loc[pwt.countrycode=='USA', 'lrgdpe_pc']
pwt['lrgdpe_pc_rel'] = pwt.lrgdpe_pc / pwt.lrgdpe_pc_US
pwt.reset_index(inplace=True)
pwt[['countrycode', 'country', 'year', 'lrgdpe_pc_rel']]









    Out[133]:







  
    
      
       countrycode  country   year  lrgdpe_pc_rel
0      ABW          Aruba     1950            NaN
1      ABW          Aruba     1951            NaN
2      ABW          Aruba     1952            NaN
3      ABW          Aruba     1953            NaN
4      ABW          Aruba     1954            NaN
...    ...          ...        ...            ...
12371  ZWE          Zimbabwe  2013       0.693611
12372  ZWE          Zimbabwe  2014       0.693790
12373  ZWE          Zimbabwe  2015       0.692485
12374  ZWE          Zimbabwe  2016       0.692220
12375  ZWE          Zimbabwe  2017       0.694026
    
  

12376 rows × 4 columns
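The cell above relies on index alignment: with year as the index, assigning the US series copies each year's US value to every row of that year. A sketch of an equivalent approach that leaves the index untouched uses map on the year column. As in the cell above, the relative measure is a ratio of logs. The values below are invented, and `toy` is a hypothetical name.

```python
import pandas as pd

toy = pd.DataFrame({
    'countrycode': ['USA', 'ZWE', 'USA', 'ZWE'],
    'year': [2016, 2016, 2017, 2017],
    'lrgdpe_pc': [10.0, 7.0, 10.5, 7.35],   # invented values
})

# Series of US values indexed by year (one entry per year)
us = toy.loc[toy.countrycode == 'USA'].set_index('year')['lrgdpe_pc']
# Look up the US value for each row's year, then take the ratio of logs
toy['lrgdpe_pc_US'] = toy['year'].map(us)
toy['lrgdpe_pc_rel'] = toy['lrgdpe_pc'] / toy['lrgdpe_pc_US']
print(toy)
```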

Let's plot relative income levels in 1960 against those in 1980, 2000, and 2017. First, let's create the wide version of this data.



In [134]:

    
relgdppc = pwt[['countrycode', 'year', 'lrgdpe_pc_rel']].pivot(index='countrycode', columns='year', values='lrgdpe_pc_rel')
relgdppc.columns = ['y' + str(col) for col in relgdppc.columns]
relgdppc.reset_index(inplace=True)
relgdppc









    Out[134]:







  
    
      
     countrycode     y1950     y1951     y1952     y1953     y1954     y1955     y1956     y1957     y1958  ...     y2008     y2009     y2010     y2011     y2012     y2013     y2014     y2015     y2016     y2017
0    ABW               NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...  0.989490  0.989011  0.977356  0.972664  0.969597  0.969952  0.968135  0.966769  0.963675  0.962826
1    AGO               NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...  0.786656  0.767352  0.794583  0.815654  0.815493  0.812440  0.805392  0.786142  0.782010  0.778917
2    AIA               NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...  0.975120  0.951357  0.943233  0.942697  0.935110  0.931704  0.934498  0.934647  0.928544  0.918428
3    ALB               NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...  0.826849  0.836372  0.844121  0.847105  0.848709  0.845853  0.848582  0.851238  0.850862  0.856015
4    ARE               NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...  1.041859  1.020390  1.013018  1.022369  1.022926  1.024411  1.027438  1.016130  1.014197  1.014201
...  ...               ...       ...       ...       ...       ...       ...       ...       ...       ...  ...       ...       ...       ...       ...       ...       ...       ...       ...       ...       ...
177  VNM               NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...  0.755160  0.761839  0.771784  0.778270  0.783986  0.786128  0.789185  0.791125  0.796555  0.801899
178  YEM               NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...  0.755317  0.757512  0.772611  0.761339  0.750812  0.750272  0.737346  0.711625  0.697790  0.684275
179  ZAF          0.893115  0.887755   0.87563  0.881956  0.889942  0.886195  0.889469  0.892274  0.891707  ...  0.862708  0.862971  0.864622  0.867436  0.866010  0.865265  0.863675  0.862188  0.860596  0.860410
180  ZMB               NaN       NaN       NaN       NaN       NaN  0.813323  0.816613  0.796724  0.785549  ...  0.721515  0.732669  0.742396  0.750338  0.753036  0.753754  0.753928  0.751828  0.751907  0.757633
181  ZWE               NaN       NaN       NaN       NaN  0.769047  0.764846  0.770286  0.776759  0.775427  ...  0.618364  0.670103  0.674767  0.682105  0.690134  0.693611  0.693790  0.692485  0.692220  0.694026
    
  

182 rows × 69 columns
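To see what pivot does, here is a sketch on four rows of the long data (South Africa and Zimbabwe in 2016 and 2017, with values copied from the output above). The names `long_df` and `wide` are hypothetical.

```python
import pandas as pd

long_df = pd.DataFrame({
    'countrycode': ['ZAF', 'ZAF', 'ZWE', 'ZWE'],
    'year': [2016, 2017, 2016, 2017],
    'lrgdpe_pc_rel': [0.860596, 0.860410, 0.692220, 0.694026],  # from the output above
})

# One row per country, one column per year
wide = long_df.pivot(index='countrycode', columns='year', values='lrgdpe_pc_rel')
wide.columns = ['y' + str(col) for col in wide.columns]   # 2016 -> 'y2016'
wide = wide.reset_index()
print(wide)
```

Prefixing the year with a letter makes each column accessible as an attribute (for example relgdppc.y1960), which the plotting code relies on.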



In [135]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
k = 0
fig, ax = plt.subplots()
ax.plot([relgdppc.y1960.min()*.99, relgdppc.y1960.max()*1.01], [relgdppc.y1960.min()*.99, relgdppc.y1960.max()*1.01], c='r', label='45 degree')
sns.regplot(x='y1960', y='y2017', data=relgdppc, ax=ax, label='1960-2017')
movex = relgdppc.y1960.mean() * 0.006125
movey = relgdppc.y2017.mean() * 0.006125
for line in range(0,relgdppc.shape[0]):
    if (np.isnan(relgdppc.y1960[line])==False) & (np.isnan(relgdppc.y2017[line])==False):
        ax.text(relgdppc.y1960[line]+movex, relgdppc.y2017[line]+movey, relgdppc.countrycode[line], horizontalalignment='left', fontsize=12, color='black', weight='semibold')
ax.set_xlabel('Log[Income per capita 1960] relative to US')
ax.set_ylabel('Log[Income per capita in 2017] relative to US')
ax.legend()
plt.savefig(pathgraphs + '1960_versus_2017_drop.pdf', dpi=300, bbox_inches='tight')



In [136]:

    
fig









    Out[136]:

Let's create a function that will simplify plotting this figure for various years.



In [137]:

    
def PersistencePlot(dfin, var0='y1960', var1='y2010', labelvar='countrycode', 
                    dx=0.006125, dy=0.006125, 
                    xlabel='Log[Income per capita 1960] relative to US', 
                    ylabel='Log[Income per capita in 2010] relative to US',
                    linelabel='1960-2010',
                    filename='1960_versus_2010_drop.pdf'):
    '''
    Plot the association between var0 and var1 in dataframe dfin, using labelvar for labels,
    and save the figure as filename in pathgraphs.
    '''
    sns.set(rc={'figure.figsize':(11.7,8.27)})
    sns.set_context("talk")
    df = dfin.copy()
    df = df.dropna(subset=[var0, var1]).reset_index(drop=True)
    # Plot
    k = 0
    fig, ax = plt.subplots()
    ax.plot([df[var0].min()*.99, df[var0].max()*1.01], [df[var0].min()*.99, df[var0].max()*1.01], c='r', label='45 degree')
    sns.regplot(x=var0, y=var1, data=df, ax=ax, label=linelabel)
    movex = df[var0].mean() * dx
    movey = df[var1].mean() * dy
    for line in range(0,df.shape[0]):
        ax.text(df[var0][line]+movex, df[var1][line]+movey, df[labelvar][line], horizontalalignment='left', fontsize=12, color='black')
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.legend()
    plt.savefig(pathgraphs + filename, dpi=300, bbox_inches='tight')
    pass



In [138]:

    
PersistencePlot(relgdppc, var0='y1980', var1='y2010', xlabel='Log[Income per capita 1980] relative to US',
                ylabel='Log[Income per capita in 2010] relative to US', linelabel='1980-2010',
                filename='1980_versus_2010_drop.pdf')



In [139]:

    
PersistencePlot(relgdppc.loc[(relgdppc.countrycode!='BRN')& (relgdppc.countrycode!='ARE')], var0='y1980', var1='y2010', xlabel='Log[Income per capita 1980] relative to US',
                ylabel='Log[Income per capita in 2010] relative to US', linelabel='1980-2010',
                filename='1980_versus_2010_drop.pdf')



In [140]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
period = list(range(1980, 2020, 20)) + [2017]
#mycolors = sns.color_palette("GnBu", n_colors=len(period)+5)
mycolors = sns.cubehelix_palette(len(period), start=.5, rot=-.75)
# Plot
k = 0
fig, ax = plt.subplots()
for t in period:
    sns.regplot(x='y1960', y='y'+str(t), data=relgdppc, ax=ax, label='1960-'+str(t))
    k += 1
ax.set_xlabel('Log[Income per capita 1960] relative to US')
ax.set_ylabel('Log[Income per capita in other period] relative to US')
ax.legend()









    Out[140]:





<matplotlib.legend.Legend at 0x13f016fd0>



In [141]:

    
fig









    Out[141]:
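regplot draws a fitted line but does not report its coefficients. As a sketch, the slope of a simple least-squares line can be computed with np.polyfit after dropping countries with missing values; a slope near one would indicate strong persistence. The data below are invented, and the name `toy` is hypothetical.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    'countrycode': ['AAA', 'BBB', 'CCC', 'DDD', 'EEE'],   # invented countries
    'y1960': [0.70, 0.80, 0.90, 1.00, np.nan],
    'y2017': [0.72, 0.80, 0.88, 0.96, 0.85],
})

# Keep only countries observed in both years
both = toy.dropna(subset=['y1960', 'y2017'])
# Degree-1 polynomial fit: returns (slope, intercept)
slope, intercept = np.polyfit(both['y1960'], both['y2017'], deg=1)
print(round(slope, 3), round(intercept, 3))
```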

Getting data from the World Bank

The World Bank (WB) is a major source of free data. pandas has a companion package that allows you to download data from many sources, including the WB. The package we will use to access these APIs is pandas-datareader, which can download data from a host of sources including the WB, OECD, and FRED (see its documentation).



In [142]:

    
from pandas_datareader import data, wb

We can now use wb to get information and data from the WB. Let's start by downloading the set of basic information about the countries included in the API.



In [143]:

    
wbcountries = wb.get_countries()
wbcountries['name'] = wbcountries.name.str.strip()
wbcountries









    Out[143]:







  
    
      
     iso3c  iso2c  name                                               region                      adminregion                                        incomeLevel          lendingType     capitalCity  longitude   latitude
  0  ABW    AW     Aruba                                              Latin America & Caribbean                                                      High income          Not classified  Oranjestad    -70.0167   12.51670
  1  AFG    AF     Afghanistan                                        South Asia                  South Asia                                         Low income           IDA             Kabul          69.1761   34.52280
  2  AFR    A9     Africa                                             Aggregates                                                                     Aggregates           Aggregates                         NaN        NaN
  3  AGO    AO     Angola                                             Sub-Saharan Africa          Sub-Saharan Africa (excluding high income)         Lower middle income  IBRD            Luanda         13.2420   -8.81155
  4  ALB    AL     Albania                                            Europe & Central Asia       Europe & Central Asia (excluding high income)      Upper middle income  IBRD            Tirane         19.8172   41.33170
...  ...    ...    ...                                                ...                         ...                                                ...                  ...             ...                ...        ...
299  XZN    A5     Sub-Saharan Africa excluding South Africa and ...  Aggregates                                                                     Aggregates           Aggregates                         NaN        NaN
300  YEM    YE     Yemen, Rep.                                        Middle East & North Africa  Middle East & North Africa (excluding high inc...  Low income           IDA             Sana'a         44.2075   15.35200
301  ZAF    ZA     South Africa                                       Sub-Saharan Africa          Sub-Saharan Africa (excluding high income)         Upper middle income  IBRD            Pretoria       28.1871  -25.74600
302  ZMB    ZM     Zambia                                             Sub-Saharan Africa          Sub-Saharan Africa (excluding high income)         Lower middle income  IDA             Lusaka         28.2937  -15.39820
303  ZWE    ZW     Zimbabwe                                           Sub-Saharan Africa          Sub-Saharan Africa (excluding high income)         Lower middle income  Blend           Harare         31.0672  -17.83120
    
  

304 rows × 10 columns
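
The `str.strip()` call in the cell above removes leading and trailing whitespace from the country names, which matters later when the names are used as merge keys. A minimal sketch with made-up padded strings (the padding is an assumption for illustration, not a record of what the API actually returns; `raw_names` and `clean_names` are hypothetical names):

```python
import pandas as pd

# Hypothetical names with stray whitespace, as an API might return them
raw_names = pd.Series(['Aruba ', ' Afghanistan', 'Angola'])

# str.strip() removes whitespace at both ends of every string
clean_names = raw_names.str.strip()

# Exact matches (and therefore merges on this key) only work on the cleaned values
print((raw_names == 'Aruba').any(), (clean_names == 'Aruba').any())
```

Cleaning the key once, right after download, avoids silent mismatches in every later join.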

Let's use wb to find all the series whose name contains the word "population".



In [144]:

    
popvars = wb.search(string='population')
popvars









    Out[144]:







  
    
      
       id                             name                                               unit  source                                             sourceNote                                         sourceOrganization                                 topics
   24  1.1_ACCESS.ELECTRICITY.TOT     Access to electricity (% of total population)            Sustainable Energy for All                         Access to electricity is the percentage of pop...  b'World Bank Global Electrification Database 2...
   39  1.2_ACCESS.ELECTRICITY.RURAL   Access to electricity (% of rural population)            Sustainable Energy for All                         Access to electricity is the percentage of rur...  b'World Bank Global Electrification Database 2...
   40  1.3_ACCESS.ELECTRICITY.URBAN   Access to electricity (% of urban population)            Sustainable Energy for All                         Access to electricity is the percentage of tot...  b'World Bank Global Electrification Database 2...
  128  2.1_ACCESS.CFT.TOT             Access to Clean Fuels and Technologies for coo...        Sustainable Energy for All                                                                            b''
  159  3.11.01.01.popcen              Population census                                        Statistical Capacity Indicators                    Population censuses collect data on the size, ...  b'World Bank Microdata library. Original sourc...
  ...  ...                            ...                                                ...   ...                                                ...                                                ...                                                ...
17439  per_sionl.overlap_pop_urb      Population only receiving All Social Insurance...        The Atlas of Social Protection: Indicators of ...  NULL                                               b'The Atlas of Social Protection: Indicators o...  Social Protection & Labor
17440  per_sionl.overlap_q1_preT_tot  Population in the 1st quintile (poorest) only ...        The Atlas of Social Protection: Indicators of ...  NULL                                               b'The Atlas of Social Protection: Indicators o...  Social Protection & Labor
17441  per_sionl.overlap_q1_rur       Population in the 1st quintile (poorest) only ...        The Atlas of Social Protection: Indicators of ...  NULL                                               b'The Atlas of Social Protection: Indicators o...  Social Protection & Labor
17442  per_sionl.overlap_q1_tot       Population in the 1st quintile (poorest) only ...        The Atlas of Social Protection: Indicators of ...  NULL                                               b'The Atlas of Social Protection: Indicators o...  Social Protection & Labor
17443  per_sionl.overlap_q1_urb       Population in the 1st quintile (poorest) only ...        The Atlas of Social Protection: Indicators of ...  NULL                                               b'The Atlas of Social Protection: Indicators o...  Social Protection & Labor
    
  

1591 rows × 7 columns

Lots of variables are available, collected by the WB from multiple sources. Their website has more information on them and lets you search for the variables you may want to focus on. Here let's download the number of males and females in the population by age group, the total population, as well as two urban population series (EN.URB.MCTY and EN.URB.LCTY) for the year 2017.



In [145]:

    
femalepop = popvars.loc[popvars.id.apply(lambda x: x.find('SP.POP.')!=-1 and x.endswith('FE'))]
malepop = popvars.loc[popvars.id.apply(lambda x: x.find('SP.POP.')!=-1 and x.endswith('MA'))]
popfields = ['SP.POP.0014.FE.IN', 'SP.POP.1564.FE.IN', 'SP.POP.65UP.FE.IN',
             'SP.POP.0014.MA.IN', 'SP.POP.1564.MA.IN', 'SP.POP.65UP.MA.IN',
             'SP.POP.TOTL.FE.IN', 'SP.POP.TOTL.MA.IN', 'SP.POP.TOTL',
             'EN.URB.MCTY', 'EN.URB.LCTY'] + malepop.id.tolist() + femalepop.id.tolist()
popfields









    Out[145]:





['SP.POP.0014.FE.IN',
 'SP.POP.1564.FE.IN',
 'SP.POP.65UP.FE.IN',
 'SP.POP.0014.MA.IN',
 'SP.POP.1564.MA.IN',
 'SP.POP.65UP.MA.IN',
 'SP.POP.TOTL.FE.IN',
 'SP.POP.TOTL.MA.IN',
 'SP.POP.TOTL',
 'EN.URB.MCTY',
 'EN.URB.LCTY',
 'SP.POP.0004.MA',
 'SP.POP.0509.MA',
 'SP.POP.1014.MA',
 'SP.POP.1519.MA',
 'SP.POP.2024.MA',
 'SP.POP.2529.MA',
 'SP.POP.3034.MA',
 'SP.POP.3539.MA',
 'SP.POP.4044.MA',
 'SP.POP.4549.MA',
 'SP.POP.5054.MA',
 'SP.POP.5559.MA',
 'SP.POP.6064.MA',
 'SP.POP.6569.MA',
 'SP.POP.7074.MA',
 'SP.POP.7579.MA',
 'SP.POP.80UP.MA',
 'SP.POP.0004.FE',
 'SP.POP.0509.FE',
 'SP.POP.1014.FE',
 'SP.POP.1519.FE',
 'SP.POP.2024.FE',
 'SP.POP.2529.FE',
 'SP.POP.3034.FE',
 'SP.POP.3539.FE',
 'SP.POP.4044.FE',
 'SP.POP.4549.FE',
 'SP.POP.5054.FE',
 'SP.POP.5559.FE',
 'SP.POP.6064.FE',
 'SP.POP.6569.FE',
 'SP.POP.7074.FE',
 'SP.POP.7579.FE',
 'SP.POP.80UP.FE']
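
The `apply(lambda ...)` filters above can also be written with pandas' vectorized string methods. A minimal sketch on a hand-made stand-in for popvars (`toy_vars`, `is_female` and `toy_female` are hypothetical names; the ids are taken from the list above):

```python
import pandas as pd

# Hypothetical stand-in for popvars with a few ids from the list above
toy_vars = pd.DataFrame({'id': ['SP.POP.0004.MA', 'SP.POP.0004.FE',
                                'SP.POP.TOTL', 'EN.URB.MCTY',
                                'SP.POP.80UP.FE']})

# Same condition as the lambda: the id contains 'SP.POP.' and ends with 'FE'.
# regex=False treats the dots literally instead of as regex wildcards.
is_female = (toy_vars.id.str.contains('SP.POP.', regex=False)
             & toy_vars.id.str.endswith('FE'))
toy_female = toy_vars.loc[is_female]

print(toy_female.id.tolist())
```

Both forms select the same rows; the vectorized version is mostly a matter of readability.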

Let's also download GDP per capita in PPP at constant 2011 prices, which is the series NY.GDP.PCAP.PP.KD.



In [146]:

    
wdi = wb.download(indicator=popfields+['NY.GDP.PCAP.PP.KD'], country=wbcountries.iso2c.values, start=2017, end=2017)

wdi









    



/Users/ozak/anaconda3/envs/GeoPython38env/lib/python3.8/site-packages/pandas_datareader/wb.py:592: UserWarning: Non-standard ISO country codes: 1A, 1W, 4E, 6D, 6F, 6L, 6N, 6X, 7E, 8S, A4, A5, A9, B1, B2, B3, B4, B6, B7, B8, C4, C5, C6, C7, C8, C9, D2, D3, D4, D5, D6, D7, D8, D9, EU, F1, F6, JG, L4, L5, L6, L7, M1, M2, N6, O6, OE, R6, S1, S2, S3, S4, T2, T3, T4, T5, T6, T7, V1, V2, V3, V4, XC, XD, XE, XF, XG, XH, XI, XJ, XK, XL, XM, XN, XO, XP, XQ, XT, XU, XY, Z4, Z7, ZB, ZF, ZG, ZJ, ZQ, ZT
  warnings.warn(






    Out[146]:







  
    
      
      
                    SP.POP.0014.FE.IN  SP.POP.1564.FE.IN  SP.POP.65UP.FE.IN  SP.POP.0014.MA.IN  SP.POP.1564.MA.IN  SP.POP.65UP.MA.IN  SP.POP.TOTL.FE.IN  SP.POP.TOTL.MA.IN  SP.POP.TOTL  EN.URB.MCTY  ...  SP.POP.4044.FE  SP.POP.4549.FE  SP.POP.5054.FE  SP.POP.5559.FE  SP.POP.6064.FE  SP.POP.6569.FE  SP.POP.7074.FE  SP.POP.7579.FE  SP.POP.80UP.FE  NY.GDP.PCAP.PP.KD
country       year
Aruba         2017             9297.0            38127.0             7907.0             9646.0            34524.0             5864.0            55331.0            50035.0     105366.0          NaN  ...             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN       33966.483000
Afghanistan   2017          7732365.0          9413927.0           497974.0          8122796.0         10100250.0           429088.0         17644266.0         18652134.0   36296400.0    3913297.0  ...             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN        2202.570851
Angola        2017          6984651.0          7711810.0           370978.0          7015191.0          7437431.0           296687.0         15067439.0         14749309.0   29816748.0    7515345.0  ...             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN        7310.901738
Albania       2017           243452.0           967394.0           198481.0           274603.0          1005023.0           184503.0          1409327.0          1464130.0    2873457.0          NaN  ...             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN       13093.652313
Andorra       2017                NaN                NaN                NaN                NaN                NaN                NaN                NaN                NaN      77001.0          NaN  ...             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN                NaN
...            ...                ...                ...                ...                ...                ...                ...                ...                ...          ...          ...  ...             ...             ...             ...             ...             ...             ...             ...             ...             ...                ...
Kosovo        2017                NaN                NaN                NaN                NaN                NaN                NaN                NaN                NaN          NaN          NaN  ...             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN                NaN
Yemen, Rep.   2017                NaN                NaN                NaN                NaN                NaN                NaN                NaN                NaN          NaN          NaN  ...        559027.0        443721.0        385742.0        311511.0        235265.0        179953.0        122274.0         72535.0         56798.0                NaN
South Africa  2017                NaN                NaN                NaN                NaN                NaN                NaN                NaN                NaN          NaN          NaN  ...       1778722.0       1536691.0       1304635.0       1108203.0        920619.0        684906.0        493069.0        338152.0        275995.0                NaN
Zambia        2017                NaN                NaN                NaN                NaN                NaN                NaN                NaN                NaN          NaN          NaN  ...        355596.0        259410.0        196406.0        152437.0        115017.0         85429.0         60372.0         38551.0         29540.0                NaN
Zimbabwe      2017                NaN                NaN                NaN                NaN                NaN                NaN                NaN                NaN          NaN          NaN  ...        342473.0        243625.0        193105.0        159731.0        133656.0         95561.0         72600.0         53412.0         44607.0                NaN
    
  

523 rows × 46 columns

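It looks like there are lots of missing values, but don't be fooled. This is a quirk of how wb assembles the result: because the indicators come from different original sources, the rows for the same country are not linked correctly, so each country can appear more than once. Sorting the index makes this visible.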



In [147]:

    
wdi.sort_index()









    Out[147]:







  
    
      
      
                    SP.POP.0014.FE.IN  SP.POP.1564.FE.IN  SP.POP.65UP.FE.IN  SP.POP.0014.MA.IN  SP.POP.1564.MA.IN  SP.POP.65UP.MA.IN  SP.POP.TOTL.FE.IN  SP.POP.TOTL.MA.IN  SP.POP.TOTL  EN.URB.MCTY  ...  SP.POP.4044.FE  SP.POP.4549.FE  SP.POP.5054.FE  SP.POP.5559.FE  SP.POP.6064.FE  SP.POP.6569.FE  SP.POP.7074.FE  SP.POP.7579.FE  SP.POP.80UP.FE  NY.GDP.PCAP.PP.KD
country       year
Afghanistan   2017          7732365.0          9413927.0           497974.0          8122796.0         10100250.0           429088.0         17644266.0         18652134.0   36296400.0    3913297.0  ...             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN        2202.570851
              2017                NaN                NaN                NaN                NaN                NaN                NaN                NaN                NaN          NaN          NaN  ...        700156.0        562807.0        451226.0        357049.0        275515.0        218541.0        145457.0         78439.0         55537.0                NaN
Albania       2017           243452.0           967394.0           198481.0           274603.0          1005023.0           184503.0          1409327.0          1464130.0    2873457.0          NaN  ...             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN       13093.652313
              2017                NaN                NaN                NaN                NaN                NaN                NaN                NaN                NaN          NaN          NaN  ...         85864.0         94264.0        101671.0        101284.0         84148.0         63784.0         52773.0         40717.0         41206.0                NaN
Algeria       2017          6005664.0         13165826.0          1310941.0          6261929.0         13399086.0          1245754.0         20482430.0         20906768.0   41389198.0    2659373.0  ...             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN       11550.617638
...            ...                ...                ...                ...                ...                ...                ...                ...                ...          ...          ...  ...             ...             ...             ...             ...             ...             ...             ...             ...             ...                ...
Yemen, Rep.   2017                NaN                NaN                NaN                NaN                NaN                NaN                NaN                NaN          NaN          NaN  ...        559027.0        443721.0        385742.0        311511.0        235265.0        179953.0        122274.0         72535.0         56798.0                NaN
Zambia        2017          3794229.0          4502766.0           213893.0          3859276.0          4346054.0           137471.0          8510888.0          8342800.0   16853688.0    2406227.0  ...             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN        3485.002103
              2017                NaN                NaN                NaN                NaN                NaN                NaN                NaN                NaN          NaN          NaN  ...        355596.0        259410.0        196406.0        152437.0        115017.0         85429.0         60372.0         38551.0         29540.0                NaN
Zimbabwe      2017          3024124.0          4169317.0           266180.0          3040282.0          3590650.0           146192.0          7459621.0          6777124.0   14236745.0    1509901.0  ...             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN        3134.327494
              2017                NaN                NaN                NaN                NaN                NaN                NaN                NaN                NaN          NaN          NaN  ...        342473.0        243625.0        193105.0        159731.0        133656.0         95561.0         72600.0         53412.0         44607.0                NaN
    
  

523 rows × 46 columns
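
Repeated (country, year) labels like these can also be detected programmatically rather than by eye. A minimal sketch on a toy frame that mimics the problem (`toy_wdi` and `dup_rows` are hypothetical names; the four values are copied from the Zambia and Albania rows above):

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the problem: each (country, year) label appears twice,
# and every indicator is filled in only one of the two rows
idx = pd.MultiIndex.from_tuples(
    [('Zambia', '2017'), ('Albania', '2017'), ('Zambia', '2017'), ('Albania', '2017')],
    names=['country', 'year'])
toy_wdi = pd.DataFrame({'SP.POP.TOTL': [16853688.0, 2873457.0, np.nan, np.nan],
                        'SP.POP.80UP.FE': [np.nan, np.nan, 29540.0, 41206.0]},
                       index=idx)

# keep=False flags every row whose index label occurs more than once
dup_rows = toy_wdi.index.duplicated(keep=False)
print(dup_rows.sum(), 'of', len(toy_wdi), 'rows share their (country, year) label')
```

A quick `index.is_unique` check after any download is a cheap way to catch this kind of problem early.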

Let's aggregate by country and year so that each country has a single row with the correct data.



In [148]:

    
wdi = wdi.groupby(['country', 'year']).max()
wdi.reset_index(inplace=True)
wdi









    Out[148]:







  
    
      
     country             year  SP.POP.0014.FE.IN  SP.POP.1564.FE.IN  SP.POP.65UP.FE.IN  SP.POP.0014.MA.IN  SP.POP.1564.MA.IN  SP.POP.65UP.MA.IN  SP.POP.TOTL.FE.IN  SP.POP.TOTL.MA.IN  ...  SP.POP.4044.FE  SP.POP.4549.FE  SP.POP.5054.FE  SP.POP.5559.FE  SP.POP.6064.FE  SP.POP.6569.FE  SP.POP.7074.FE  SP.POP.7579.FE  SP.POP.80UP.FE  NY.GDP.PCAP.PP.KD
  0  Afghanistan         2017          7732365.0       9.413927e+06           497974.0       8.122796e+06       1.010025e+07           429088.0       1.764427e+07       1.865213e+07  ...        700156.0        562807.0        451226.0        357049.0        275515.0        218541.0        145457.0         78439.0         55537.0        2202.570851
  1  Albania             2017           243452.0       9.673940e+05           198481.0       2.746030e+05       1.005023e+06           184503.0       1.409327e+06       1.464130e+06  ...         85864.0         94264.0        101671.0        101284.0         84148.0         63784.0         52773.0         40717.0         41206.0       13093.652313
  2  Algeria             2017          6005664.0       1.316583e+07          1310941.0       6.261929e+06       1.339909e+07          1245754.0       2.048243e+07       2.090677e+07  ...       1334956.0       1112079.0        950716.0        767641.0        631310.0        458099.0        325968.0        256637.0        270236.0       11550.617638
  3  American Samoa      2017                NaN                NaN                NaN                NaN                NaN                NaN                NaN                NaN  ...             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN                NaN
  4  Andorra             2017                NaN                NaN                NaN                NaN                NaN                NaN                NaN                NaN  ...             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN                NaN
...  ...                  ...                ...                ...                ...                ...                ...                ...                ...                ...  ...             ...             ...             ...             ...             ...             ...             ...             ...             ...                ...
259  West Bank and Gaza  2017           855495.0       1.267122e+06            72244.0       8.935800e+05       1.300776e+06            65588.0       2.194861e+06       2.259944e+06  ...        102015.0         84835.0         69853.0         50659.0         36755.0         28050.0         20237.0         13392.0         10565.0        1183.435345
260  World               2017        941262629.0       2.422523e+09        358609161.0       1.005788e+09       2.488493e+09        290411331.0       3.722395e+09       3.784692e+09  ...     240536539.0     231501913.0     209202400.0     179317688.0     155624953.0     123519929.0      87547943.0      64952254.0      82589032.0       16167.228725
261  Yemen, Rep.         2017          5456767.0       7.919190e+06           431560.0       5.677559e+06       7.987563e+06           362182.0       1.380752e+07       1.402730e+07  ...        559027.0        443721.0        385742.0        311511.0        235265.0        179953.0        122274.0         72535.0         56798.0                NaN
262  Zambia              2017          3794229.0       4.502766e+06           213893.0       3.859276e+06       4.346054e+06           137471.0       8.510888e+06       8.342800e+06  ...        355596.0        259410.0        196406.0        152437.0        115017.0         85429.0         60372.0         38551.0         29540.0        3485.002103
263  Zimbabwe            2017          3024124.0       4.169317e+06           266180.0       3.040282e+06       3.590650e+06           146192.0       7.459621e+06       6.777124e+06  ...        342473.0        243625.0        193105.0        159731.0        133656.0         95561.0         72600.0         53412.0         44607.0        3134.327494
    
  

264 rows × 48 columns
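
Why does taking the maximum repair the data? Within each (country, year) group, `max()` skips missing values, so two half-empty rows collapse into one complete row. A minimal sketch on a toy frame (`toy_wdi` and `toy_clean` are hypothetical names; the values are copied from the Zambia and Albania rows shown earlier):

```python
import numpy as np
import pandas as pd

# Toy frame: each (country, year) label appears twice, with complementary NaNs
idx = pd.MultiIndex.from_tuples(
    [('Zambia', '2017'), ('Albania', '2017'), ('Zambia', '2017'), ('Albania', '2017')],
    names=['country', 'year'])
toy_wdi = pd.DataFrame({'SP.POP.TOTL': [16853688.0, 2873457.0, np.nan, np.nan],
                        'SP.POP.80UP.FE': [np.nan, np.nan, 29540.0, 41206.0]},
                       index=idx)

# max() ignores NaN within each group, so each group keeps its observed values
toy_clean = toy_wdi.groupby(['country', 'year']).max().reset_index()
print(toy_clean)
```

This relies on each indicator having at most one non-missing value per group, as in the data above. If a group could hold two different observed values, `max()` would silently pick the larger one, so it is worth confirming that assumption before aggregating.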

Let's merge this data with the original wbcountries dataframe, so that we can use it to plot.



In [149]:

    
wdi = wbcountries.merge(wdi, left_on='name', right_on='country')
wdi









    Out[149]:







  
    
      
     iso3c  iso2c  name                                               region                      adminregion                                        incomeLevel          lendingType     capitalCity       longitude   latitude  ...  SP.POP.4044.FE  SP.POP.4549.FE  SP.POP.5054.FE  SP.POP.5559.FE  SP.POP.6064.FE  SP.POP.6569.FE  SP.POP.7074.FE  SP.POP.7579.FE  SP.POP.80UP.FE  NY.GDP.PCAP.PP.KD
  0  ABW    AW     Aruba                                              Latin America & Caribbean                                                      High income          Not classified  Oranjestad         -70.0167   12.51670  ...          3920.0          4249.0          4847.0          4638.0          3810.0          2928.0          2021.0          1439.0          1519.0       33966.483000
  1  AFG    AF     Afghanistan                                        South Asia                  South Asia                                         Low income           IDA             Kabul               69.1761   34.52280  ...        700156.0        562807.0        451226.0        357049.0        275515.0        218541.0        145457.0         78439.0         55537.0        2202.570851
  2  AGO    AO     Angola                                             Sub-Saharan Africa          Sub-Saharan Africa (excluding high income)         Lower middle income  IBRD            Luanda              13.2420   -8.81155  ...        622220.0        475507.0        379684.0        305882.0        205034.0        147574.0        106010.0         67489.0         49905.0        7310.901738
  3  ALB    AL     Albania                                            Europe & Central Asia       Europe & Central Asia (excluding high income)      Upper middle income  IBRD            Tirane              19.8172   41.33170  ...         85864.0         94264.0        101671.0        101284.0         84148.0         63784.0         52773.0         40717.0         41206.0       13093.652313
  4  AND    AD     Andorra                                            Europe & Central Asia                                                          High income          Not classified  Andorra la Vella     1.5218   42.50750  ...             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN                NaN
...  ...    ...    ...                                                ...                         ...                                                ...                  ...             ...                     ...        ...  ...             ...             ...             ...             ...             ...             ...             ...             ...             ...                ...
259  XKX    XK     Kosovo                                             Europe & Central Asia       Europe & Central Asia (excluding high income)      Upper middle income  IDA             Pristina            20.9260   42.56500  ...             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN             NaN       10302.075476
260  YEM    YE     Yemen, Rep.                                        Middle East & North Africa  Middle East & North Africa (excluding high inc...  Low income           IDA             Sana'a              44.2075   15.35200  ...        559027.0        443721.0        385742.0        311511.0        235265.0        179953.0        122274.0         72535.0         56798.0                NaN
261  ZAF    ZA     South Africa                                       Sub-Saharan Africa          Sub-Saharan Africa (excluding high income)         Upper middle income  IBRD            Pretoria            28.1871  -25.74600  ...       1778722.0       1536691.0       1304635.0       1108203.0        920619.0        684906.0        493069.0        338152.0        275995.0       12703.421242
262  ZMB    ZM     Zambia                                             Sub-Saharan Africa          Sub-Saharan Africa (excluding high income)         Lower middle income  IDA             Lusaka              28.2937  -15.39820  ...        355596.0        259410.0        196406.0        152437.0        115017.0         85429.0         60372.0         38551.0         29540.0        3485.002103
263  ZWE    ZW     Zimbabwe                                           Sub-Saharan Africa          Sub-Saharan Africa (excluding high income)         Lower middle income  Blend           Harare              31.0672  -17.83120  ...        342473.0        243625.0        193105.0        159731.0        133656.0         95561.0         72600.0         53412.0         44607.0        3134.327494
    
  

264 rows × 58 columns
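
By default `merge` performs an inner join, so rows whose names do not match exactly in both tables are dropped without warning. A hedged sketch of how one might check for that, using two tiny made-up tables (all names and values here, including the deliberate 'Yemen' versus 'Yemen, Rep.' mismatch, are hypothetical and only for illustration):

```python
import pandas as pd

# Hypothetical stand-ins for wbcountries and the aggregated wdi
toy_countries = pd.DataFrame({'iso3c': ['ZMB', 'ZWE', 'YEM'],
                              'name': ['Zambia', 'Zimbabwe', 'Yemen, Rep.']})
toy_data = pd.DataFrame({'country': ['Zambia', 'Zimbabwe', 'Yemen'],
                         'value': [1.0, 2.0, 3.0]})  # placeholder values

# An outer join with indicator=True keeps every row and records its origin
check = toy_countries.merge(toy_data, left_on='name', right_on='country',
                            how='outer', indicator=True)

# Rows not marked 'both' failed to find a partner in the other table
unmatched = check.loc[check['_merge'] != 'both']
print(unmatched[['name', 'country', '_merge']])
```

In the merge above the row count is 264 both before and after, which suggests no rows were lost in this case.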

Plot Male vs Female population in each country in 2017



In [150]:

    
PersistencePlot(wdi, var0='SP.POP.TOTL.FE.IN', var1='SP.POP.TOTL.MA.IN', xlabel='Number of Females',
                ylabel='Number of Males', labelvar='iso3c', linelabel='Female-Male', 
                dx=0.1, dy=0.1, filename='Female-Male-2017.pdf')

Let's take logs so we can see this better.



In [151]:

    
wdi['lpop_fe'] = np.log(wdi['SP.POP.TOTL.FE.IN'])
wdi['lpop_ma'] = np.log(wdi['SP.POP.TOTL.MA.IN'])
PersistencePlot(wdi, var0='lpop_fe', var1='lpop_ma', xlabel='Log[Number of Females]',
                ylabel='Log[Number of Males]', labelvar='iso3c', linelabel='Female-Male', 
                dx=0.01, dy=0.01, filename='Female-Male-2017.pdf')

It seems that the gender ratio, i.e., the number of males per female, is quite different from 1 in some countries. Let's plot the histogram of the gender ratio across countries to see this better.



In [152]:

    
(np.exp(wdi['lpop_ma'] - wdi['lpop_fe'])).hist()









    Out[152]:





<matplotlib.axes._subplots.AxesSubplot at 0x143263910>



In [153]:

    
wdi['gender_ratio'] = (wdi['SP.POP.TOTL.MA.IN'] / wdi['SP.POP.TOTL.FE.IN'])
wdi.gender_ratio.hist()









    Out[153]:





<matplotlib.axes._subplots.AxesSubplot at 0x142a5d880>
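
The two histograms above show the same quantity, since exp(log(males) - log(females)) = males / females. A quick numerical check on three countries whose totals are copied from the tables above (`toy`, `ratio_direct` and `ratio_logs` are hypothetical names):

```python
import numpy as np
import pandas as pd

# Female and male totals for 2017, copied from the output shown earlier
toy = pd.DataFrame({'SP.POP.TOTL.FE.IN': [17644266.0, 1409327.0, 20482430.0],
                    'SP.POP.TOTL.MA.IN': [18652134.0, 1464130.0, 20906768.0]},
                   index=['Afghanistan', 'Albania', 'Algeria'])

# Direct ratio of males to females
ratio_direct = toy['SP.POP.TOTL.MA.IN'] / toy['SP.POP.TOTL.FE.IN']

# Same ratio recovered from the difference in logs
ratio_logs = np.exp(np.log(toy['SP.POP.TOTL.MA.IN']) - np.log(toy['SP.POP.TOTL.FE.IN']))

print(np.allclose(ratio_direct, ratio_logs))
```

The two versions agree up to floating-point rounding, so either can be used to build the histogram.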



In [154]:

    
print('Maximum gender ratio = ', wdi.gender_ratio.max())
wdi.loc[wdi.gender_ratio>=1.05][['iso3c', 'name', 'region', 'gender_ratio']].sort_values('gender_ratio')









    



Maximum gender ratio =  3.1097202844667002






    Out[154]:







  
    
      
     iso3c  name                        region                      gender_ratio
 38  CHN    China                       East Asia & Pacific             1.054808
224  SYC    Seychelles                  Sub-Saharan Africa              1.056584
  1  AFG    Afghanistan                 South Asia                      1.057122
167  MYS    Malaysia                    East Asia & Pacific             1.059319
182  PAK    Pakistan                    South Asia                      1.060265
202  SAS    South Asia                  Aggregates                      1.068105
238  TSA    South Asia (IDA & IBRD)     Aggregates                      1.068105
258  WSM    Samoa                       East Asia & Pacific             1.071733
151  MEA    Middle East & North Africa  Aggregates                      1.073636
  5  ARB    Arab World                  Aggregates                      1.074033
107  IND    India                       South Asia                      1.082583
 29  BRN    Brunei Darussalam           East Asia & Pacific             1.083313
216  SST    Small states                Aggregates                      1.096263
206  SGP    Singapore                   East Asia & Pacific             1.098113
 54  DJI    Djibouti                    Middle East & North Africa      1.113483
 30  BTN    Bhutan                      South Asia                      1.123726
181  OSS    Other small states          Aggregates                      1.130181
 86  GNQ    Equatorial Guinea           Sub-Saharan Africa              1.245989
203  SAU    Saudi Arabia                Middle East & North Africa      1.343270
125  KWT    Kuwait                      Middle East & North Africa      1.492402
150  MDV    Maldives                    South Asia                      1.620545
 20  BHR    Bahrain                     Middle East & North Africa      1.695785
180  OMN    Oman                        Middle East & North Africa      1.929817
  6  ARE    United Arab Emirates        Middle East & North Africa      2.280813
198  QAT    Qatar                       Middle East & North Africa      3.109720



In [155]:

    
print('Minimum gender ratio = ', wdi.gender_ratio.min())
wdi.loc[wdi.gender_ratio<=0.95][['iso3c', 'name', 'region', 'gender_ratio']].sort_values('gender_ratio')









    



Minimum gender ratio =  0.8326826068938992






    Out[155]:







  
    
      
     iso3c  name                                           region                     gender_ratio
176  NPL    Nepal                                          South Asia                     0.832683
 49  CUW    Curacao                                        Latin America & Caribbean      0.847313
143  LVA    Latvia                                         Europe & Central Asia          0.849527
141  LTU    Lithuania                                      Europe & Central Asia          0.857032
 94  HKG    Hong Kong SAR, China                           East Asia & Pacific            0.857683
246  UKR    Ukraine                                        Europe & Central Asia          0.862200
200  RUS    Russian Federation                             Europe & Central Asia          0.863357
 23  BLR    Belarus                                        Europe & Central Asia          0.870642
209  SLV    El Salvador                                    Latin America & Caribbean      0.884228
 69  EST    Estonia                                        Europe & Central Asia          0.886953
  8  ARM    Armenia                                        Europe & Central Asia          0.888378
192  PRT    Portugal                                       Europe & Central Asia          0.897158
  0  ABW    Aruba                                          Latin America & Caribbean      0.904285
 99  HUN    Hungary                                        Europe & Central Asia          0.906684
254  VIR    Virgin Islands (U.S.)                          Latin America & Caribbean      0.908106
263  ZWE    Zimbabwe                                       Sub-Saharan Africa             0.908508
190  PRI    Puerto Rico                                    Latin America & Caribbean      0.910474
 80  GEO    Georgia                                        Europe & Central Asia          0.913288
 62  ECA    Europe & Central Asia (excluding high income)  Aggregates                     0.917302
229  TEC    Europe & Central Asia (IDA & IBRD countries)   Aggregates                     0.919337
144  MAC    Macao SAR, China                               East Asia & Pacific            0.923031
148  MDA    Moldova                                        Europe & Central Asia          0.923076
 83  GIN    Guinea                                         Sub-Saharan Africa             0.925869
136  LKA    Sri Lanka                                      South Asia                     0.926015
 97  HRV    Croatia                                        Europe & Central Asia          0.927410
 31  BWA    Botswana                                       Sub-Saharan Africa             0.929864
 10  ATG    Antigua and Barbuda                            Latin America & Caribbean      0.930138
158  MMR    Myanmar                                        East Asia & Pacific            0.930308
248  URY    Uruguay                                        Latin America & Caribbean      0.932800
 28  BRB    Barbados                                       Latin America & Caribbean      0.933902
 34  CEB    Central Europe and the Baltics                 Aggregates                     0.937790
169  NAM    Namibia                                        Sub-Saharan Africa             0.939169
 75  FRA    France                                         Europe & Central Asia          0.939170
 63  ECS    Europe & Central Asia                          Aggregates                     0.939541
118  KAZ    Kazakhstan                                     Europe & Central Asia          0.940358
188  POL    Poland                                         Europe & Central Asia          0.940866
163  MOZ    Mozambique                                     Sub-Saharan Africa             0.941291
 21  BHS    Bahamas, The                                   Latin America & Caribbean      0.944080
114  ITA    Italy                                          Europe & Central Asia          0.945175
 19  BGR    Bulgaria                                       Europe & Central Asia          0.945606
219  SVK    Slovak Republic                                Europe & Central Asia          0.946981
222  SWZ    Eswatini                                       Sub-Saharan Africa             0.947617
199  ROU    Romania                                        Europe & Central Asia          0.948377
205  SEN    Senegal                                        Sub-Saharan Africa             0.948715

Gender ratio and development



In [156]:

    
wdi['lgdppc'] = np.log(wdi['NY.GDP.PCAP.PP.KD'])
sns.set(rc={'figure.figsize':(11.7,8.27)})
#sns.reset_orig()
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.scatterplot(x='lgdppc', y='gender_ratio', hue='region',
                hue_order=['East Asia & Pacific', 'Europe & Central Asia',
                           'Latin America & Caribbean ', 'Middle East & North Africa',
                           'North America', 'South Asia', 'Sub-Saharan Africa '],
                data=wdi.loc[wdi.region!='Aggregates'], alpha=1, style='incomeLevel', 
                style_order=['High income', 'Upper middle income', 'Lower middle income', 'Low income'],
                )
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
ax.yaxis.set_major_formatter(mpl.ticker.StrMethodFormatter('{x:,}'))
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles=handles[1:], labels=labels[1:])
ax.set_xlabel('Log[GDP per capita]')
ax.set_ylabel('Gender Ratio')
plt.savefig(pathgraphs + 'Gender-Ratio-GDPpc.pdf', dpi=300, bbox_inches='tight')



In [157]:

    
fig









    Out[157]:

Use statistical and mathematical functions to analyze the data

Now let's import the statsmodels module to run regressions.



In [158]:

    
import statsmodels.api as sm
import statsmodels.formula.api as smf
from IPython.display import Latex

Let's estimate the elasticity of the number of men with respect to the number of women.



In [159]:

    
mod = sm.OLS(wdi['lpop_ma'],sm.add_constant(wdi['lpop_fe']), missing='drop').fit()
mod.summary2()









    Out[159]:






        Model:                OLS          Adj. R-squared:      0.998  


  Dependent Variable:       lpop_ma             AIC:          -306.5235


         Date:         2020-06-13 11:54         BIC:          -299.5706


   No. Observations:          239          Log-Likelihood:     155.26  


       Df Model:               1            F-statistic:      1.055e+05


     Df Residuals:            237        Prob (F-statistic):  5.56e-316


      R-squared:             0.998             Scale:         0.016103 




           Coef.  Std.Err.      t      P>|t|  [0.025   0.975]


  const    0.0553   0.0496    1.1149   0.2660  -0.0424  0.1529


  lpop_fe  0.9968   0.0031   324.8267  0.0000  0.9908   1.0029




     Omnibus:     287.615   Durbin-Watson:      1.950  


  Prob(Omnibus):   0.000   Jarque-Bera (JB):  13838.973


       Skew:       5.192       Prob(JB):        0.000  


     Kurtosis:    38.803    Condition No.:       98



In [160]:

    
print('The elasticity is %8.4f' % mod.params.iloc[1])
print(r'The $R^2$ is %8.3f' % mod.rsquared)









    



The elasticity is   0.9968
The $R^2$ is    0.998
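
To see why the slope of a log-log regression can be read as an elasticity, here is a minimal sketch on synthetic data. The numbers and variable names below are made up for illustration and are not the WDI series, and the fit uses plain NumPy least squares rather than statsmodels. Since the data are generated without noise, the estimated slope should simply recover the exponent we build in.

```python
import numpy as np

# Synthetic data (an assumption for illustration): men = A * women**beta
rng = np.random.default_rng(0)
women = rng.uniform(1e4, 1e8, size=200)
true_beta = 0.9968                      # elasticity we build into the data
men = 1.05 * women**true_beta

# Design matrix with a column of ones, which is what sm.add_constant adds for us
X = np.column_stack([np.ones_like(women), np.log(women)])
y = np.log(men)

# Ordinary least squares: the slope on log(women) is the elasticity
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, elasticity = coef
print('The elasticity is %8.4f' % elasticity)
```

Because both variables are in logs, a 1% increase in `women` is associated with an `elasticity`% increase in `men`.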

Let's instead use the smf module, which lets us specify the regression by writing a formula, so we do not have to pass the variables separately and add the constant as a new variable. Let's run a simple regression of $\log(GDPpc)$ on the gender ratio.



In [161]:

    
mod = smf.ols(formula='lgdppc ~ gender_ratio', data=wdi[['lpop_ma','lpop_fe', 'lgdppc', 'gender_ratio']], missing='drop').fit()
mod.summary2()









    Out[161]:






        Model:                OLS          Adj. R-squared:      0.022 


  Dependent Variable:       lgdppc              AIC:          693.7805


         Date:         2020-06-13 11:54         BIC:          700.6216


   No. Observations:          226          Log-Likelihood:     -344.89


       Df Model:               1            F-statistic:        6.003 


     Df Residuals:            224        Prob (F-statistic):   0.0151 


      R-squared:             0.026             Scale:          1.2500 




                Coef.  Std.Err.     t      P>|t|  [0.025  0.975]


  Intercept     8.4168   0.3909   21.5327  0.0000  7.6465  9.1870


  gender_ratio  0.9251   0.3776   2.4500   0.0151  0.1810  1.6693




     Omnibus:     12.884   Durbin-Watson:    1.817


  Prob(Omnibus):   0.002  Jarque-Bera (JB):  7.382


       Skew:      -0.270      Prob(JB):      0.025


     Kurtosis:     2.298   Condition No.:     10



In [162]:

    
mysummary=mod.summary2()
Latex(mysummary.as_latex())









    Out[162]:





\begin{table}
\caption{Results: Ordinary least squares}
\begin{center}
\begin{tabular}{llll}
\hline
Model:              & OLS              & Adj. R-squared:     & 0.022     \\
Dependent Variable: & lgdppc           & AIC:                & 693.7805  \\
Date:               & 2020-06-13 11:54 & BIC:                & 700.6216  \\
No. Observations:   & 226              & Log-Likelihood:     & -344.89   \\
Df Model:           & 1                & F-statistic:        & 6.003     \\
Df Residuals:       & 224              & Prob (F-statistic): & 0.0151    \\
R-squared:          & 0.026            & Scale:              & 1.2500    \\
\hline
\end{tabular}
\end{center}
\hline
\begin{center}
\begin{tabular}{lcccccc}
\hline
              & Coef.  & Std.Err. &    t    & P$> |$t$|$ & [0.025 & 0.975]  \\
\hline
\hline
\end{tabular}
\begin{tabular}{lrrrrrr}
Intercept     & 8.4168 &   0.3909 & 21.5327 &      0.0000 & 7.6465 & 9.1870  \\
gender\_ratio & 0.9251 &   0.3776 &  2.4500 &      0.0151 & 0.1810 & 1.6693  \\
\hline
\end{tabular}
\end{center}
\hline
\begin{center}
\begin{tabular}{llll}
\hline
Omnibus:       & 12.884 & Durbin-Watson:    & 1.817  \\
Prob(Omnibus): & 0.002  & Jarque-Bera (JB): & 7.382  \\
Skew:          & -0.270 & Prob(JB):         & 0.025  \\
Kurtosis:      & 2.298  & Condition No.:    & 10     \\
\hline
\end{tabular}
\end{center}
\end{table}



In [163]:

    
print('The semi-elasticity is %2.4f' % mod.params.iloc[1])
print(r'The $R^2$ is %1.3f' % mod.rsquared)









    



The semi-elasticity is 0.9251
The $R^2$ is 0.026
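
Since the dependent variable is in logs and the regressor is in levels, the coefficient is a semi-elasticity: a change of $\Delta x$ in the gender ratio is associated with a change of roughly $100 \cdot \beta \cdot \Delta x$ percent in GDP per capita. The short calculation below is only a sketch of that arithmetic. The coefficient is the one reported above, while the size of the change (0.01) is a hypothetical value chosen for illustration.

```python
import math

beta = 0.9251        # coefficient on gender_ratio reported above
delta = 0.01         # hypothetical change in the gender ratio

approx_pct = 100 * beta * delta                 # linear approximation
exact_pct = 100 * (math.exp(beta * delta) - 1)  # exact change implied by the log model
print('Approximate change: %.3f%%' % approx_pct)
print('Exact change:       %.3f%%' % exact_pct)
```

For small changes the two numbers are very close, which is why the coefficient is usually read directly as a percentage effect.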

But of course we know that correlation is not causation! Moreover, our figure shows that the positive association is driven by the rich oil-producing countries of the Middle East & North Africa. To see this, let's replicate the analysis without those countries.



In [164]:

    
mod = smf.ols(formula='lgdppc ~ gender_ratio', data=wdi.loc[wdi.region!='Middle East & North Africa'][['lpop_ma','lpop_fe', 'lgdppc', 'gender_ratio']], missing='drop').fit()
mod.summary2()









    Out[164]:






        Model:                OLS          Adj. R-squared:      0.004 


  Dependent Variable:       lgdppc              AIC:          639.2589


         Date:         2020-06-13 11:54         BIC:          645.9244


   No. Observations:          207          Log-Likelihood:     -317.63


       Df Model:               1            F-statistic:        1.919 


     Df Residuals:            205        Prob (F-statistic):    0.167 


      R-squared:             0.009             Scale:          1.2722 




                Coef.   Std.Err.     t      P>|t|  [0.025   0.975] 


  Intercept     10.8771   1.1291   9.6336   0.0000  8.6510   13.1032


  gender_ratio  -1.5789   1.1397   -1.3853  0.1675  -3.8260  0.6682 




     Omnibus:     12.094   Durbin-Watson:    1.734


  Prob(Omnibus):   0.002  Jarque-Bera (JB):  6.673


       Skew:      -0.255      Prob(JB):      0.036


     Kurtosis:     2.283   Condition No.:     29



In [165]:

    
print('The semi-elasticity is %2.4f with a p-value of %1.4f' % (mod.params.iloc[1], mod.pvalues.iloc[1]))
print(r'The $R^2$ is %1.3f' % mod.rsquared)
print("Luckily we had plotted the data, right?!")









    



The semi-elasticity is -1.5789 with a p-value of 0.1675
The $R^2$ is 0.009
Luckily we had plotted the data, right?!
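
The same point can be illustrated with made-up data: a small group of observations far from the rest can flip the sign of a pooled slope. The sketch below is not the WDI data. The two regions 'A' and 'B', the sample sizes and all parameters are assumptions chosen for illustration, and the slope is computed with `np.polyfit` instead of statsmodels.

```python
import numpy as np
import pandas as pd

# Made-up data: group 'B' sits at high x and high y and drives the pooled slope
rng = np.random.default_rng(1)
n_a, n_b = 100, 15
x_a = rng.normal(1.0, 0.05, n_a)
y_a = 9.0 - 1.5 * (x_a - 1.0) + rng.normal(0, 0.1, n_a)  # negative slope within A
x_b = rng.normal(1.8, 0.3, n_b)
y_b = rng.normal(10.5, 0.2, n_b)
df = pd.DataFrame({
    'region': ['A'] * n_a + ['B'] * n_b,
    'x': np.concatenate([x_a, x_b]),
    'y': np.concatenate([y_a, y_b]),
})

def ols_slope(data):
    """Slope of a simple regression of y on x (with a constant)."""
    slope, intercept = np.polyfit(data['x'], data['y'], deg=1)
    return slope

slope_all = ols_slope(df)
slope_without_b = ols_slope(df.loc[df.region != 'B'])
print('Slope, all regions: %.4f' % slope_all)
print('Slope, excluding B: %.4f' % slope_without_b)
```

The row filter `df.loc[df.region != 'B']` is the same pattern used above to exclude the Middle East & North Africa.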

Homework

Using Pandas and Statsmodels, write a Jupyter Notebook that:

Uses the data from the Maddison Project to plot the evolution of total population across the world.
Plots the evolution of the share of the world population by countries and WB regions.
Downloads fertility, mortality and life expectancy data from the WB and plots their evolution over the last 60 years.
Downloads mortality and life expectancy data (across regions and cohorts) from the Human Mortality Database and plots their evolution.
Uses these data to analyze the convergence of life expectancy, mortality and fertility.

Submit your notebook as a pull request to the course's GitHub repository.

Wages and Population In England 1200-1860

Let's get the population and wage series from Greg Clark's website for plotting.



In [166]:

    
uk1 = pd.read_excel('http://faculty.econ.ucdavis.edu/faculty/gclark/English%20Data/England%20NNI%20-%20Clark%20-%202015.xlsx', sheet_name='Decadal')
uk2 = pd.read_excel('http://faculty.econ.ucdavis.edu/faculty/gclark/English%20Data/Wages%202014.xlsx', sheet_name='Decadal')



In [167]:

    
uk1









    Out[167]:







  
    
      
      Decade
      Unnamed: 1
      Pop England
      Share Males farm sector
      Male Farm Wage
      Male Non-Farm Wage
      Male average Wage
      Male Work Days per Year
      Total Wage Income
      Land rents
      ...
      All Capital Income
      Indirect Taxes
      Net National Income
      Unnamed: 15
      Price Index - Domestic Expenditure
      Price Index - GDP
      Price Index - Cost of Living
      Unnamed: 19
      Real Net National Income (DE)
      Real NNI/N
    
  
  
    
      0
      NaN
      NaN
      m.
      NaN
      d./day
      d./day
      d./day
      NaN
      (₤ m)
      (₤ m)
      ...
      (₤ m)
      (₤ m)
      (₤ m)
      NaN
      (1860s=100)
      (1860s=100)
      (1860s=100)
      NaN
      (1860s=100)
      (1860s=100)
    
    
      1
      1200.0
      NaN
      3.39595
      0.555168
      1.37365
      2.28282
      2.08878
      300.0
      3.07847
      1.60604
      ...
      1.74125
      0
      6.42576
      NaN
      6.58634
      7.12642
      6.5442
      NaN
      14.8972
      86.6214
    
    
      2
      1210.0
      NaN
      3.39595
      0.575784
      1.26945
      1.84928
      2.02114
      300.0
      3.20043
      1.60604
      ...
      1.95638
      0
      6.76285
      NaN
      7.49473
      8.1093
      7.57584
      NaN
      14.0425
      81.6513
    
    
      3
      1220.0
      NaN
      3.738
      0.626021
      1.25538
      2.13595
      1.94733
      300.0
      3.39416
      1.62895
      ...
      1.97144
      0
      6.99455
      NaN
      8.33274
      9.01602
      8.53557
      NaN
      13.1437
      69.432
    
    
      4
      1230.0
      NaN
      3.9039
      0.652303
      1.17893
      NaN
      1.84872
      300.0
      3.3653
      1.33146
      ...
      2.04084
      0
      6.7376
      NaN
      8.2654
      8.94316
      8.40574
      NaN
      12.4624
      63.035
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      63
      1820.0
      NaN
      11.9821
      0.345313
      20.3334
      34.5349
      34.3278
      300.0
      191.868
      38.1915
      ...
      78.7788
      29.1646
      338.003
      NaN
      108.479
      112.086
      110.194
      NaN
      48.1282
      79.2907
    
    
      64
      1830.0
      NaN
      13.7732
      0.308229
      20.0429
      35.3837
      35.4298
      300.0
      227.646
      36.5573
      ...
      93.748
      25.8767
      383.828
      NaN
      100.892
      102.972
      101.269
      NaN
      58.5932
      84.0351
    
    
      65
      1840.0
      NaN
      15.6365
      0.264763
      21.0963
      36.1676
      37.0167
      300.0
      269.977
      39.1656
      ...
      101.875
      26.1843
      437.202
      NaN
      96.8991
      97.8146
      98.7991
      NaN
      69.559
      87.7247
    
    
      66
      1850.0
      NaN
      17.5896
      0.246630
      22.0997
      37.8408
      39.1299
      300.0
      321.387
      39.4743
      ...
      124.452
      28.3904
      513.703
      NaN
      93.3178
      93.1664
      95.1283
      NaN
      84.549
      94.9057
    
    
      67
      1860.0
      NaN
      19.7222
      0.239390
      23.6258
      43.5979
      44.6595
      300.0
      411.413
      43.1763
      ...
      168.819
      30.283
      653.692
      NaN
      99.9493
      99.9555
      99.9962
      NaN
      100.343
      100.349
    
  

68 rows × 22 columns



In [168]:

    
uk2









    Out[168]:







  
    
      
      Decade
      Farm Laborers, d/day
      Coal Miners, d./day
      Building Laborers, d/day
      Building Craftsmen, d/day
      Unnamed: 5
      Cost of Living (1860s=100)
      Unnamed: 7
      Real Farm Wage (1860s=100)
      Real Building Laborer Wage (1860s=100)
      Real Building Craftsman Wage (1860s=100)
    
  
  
    
      0
      1200
      1.373647
      NaN
      NaN
      2.783922
      NaN
      6.544197
      NaN
      88.841573
      NaN
      80.673336
    
    
      1
      1210
      1.262561
      NaN
      NaN
      2.078984
      NaN
      7.575843
      NaN
      72.045676
      NaN
      52.335306
    
    
      2
      1220
      1.249455
      NaN
      1.625946
      2.602945
      NaN
      8.535567
      NaN
      60.578574
      51.791535
      56.307104
    
    
      3
      1230
      1.178929
      NaN
      NaN
      NaN
      NaN
      8.405740
      NaN
      59.258095
      NaN
      NaN
    
    
      4
      1240
      1.246828
      NaN
      1.878412
      2.893921
      NaN
      8.871055
      NaN
      61.132054
      58.464596
      62.484216
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      62
      1820
      20.333416
      32.226677
      27.009300
      42.060419
      NaN
      110.194354
      NaN
      78.081590
      71.212912
      72.500372
    
    
      63
      1830
      20.042939
      32.680000
      28.021165
      42.746221
      NaN
      101.268842
      NaN
      83.892814
      80.390114
      80.295861
    
    
      64
      1840
      21.096252
      30.920000
      29.023687
      43.311592
      NaN
      98.771980
      NaN
      90.604982
      85.635493
      83.439177
    
    
      65
      1850
      22.099690
      36.680000
      30.103970
      45.577598
      NaN
      95.128327
      NaN
      98.270928
      92.231871
      91.251668
    
    
      66
      1860
      23.625775
      41.760000
      34.466257
      52.729581
      NaN
      99.996226
      NaN
      100.013083
      100.110361
      100.049356
    
  

67 rows × 11 columns

Let's clean the data and merge it into a single dataframe.



In [169]:

    
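# Drop the first row of uk1 (it holds units, not data) and every 'Unnamed' spacer column
uk1 = uk1.loc[uk1.index.difference([0])].reset_index(drop=True)[[col for col in uk1.columns if col.find('Unnamed')==-1]]
uk2 = uk2[[col for col in uk2.columns if col.find('Unnamed')==-1]]
# By default, merge joins on the columns that both dataframes have in common
uk = uk1.merge(uk2)
# Convert types: Decade to integer years, population to float
uk.Decade = uk.Decade.astype(int)
uk['Pop England'] = uk['Pop England'].astype(float)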
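
The cleaning steps above can be sketched on two toy dataframes. The column names `Pop` and `Wage` and all values below are made up for illustration, and the merge key is passed explicitly with `on='Decade'`, which is an assumption about which column the two tables share.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the two spreadsheets (values are made up)
left = pd.DataFrame({
    'Decade': [np.nan, 1200.0, 1210.0],
    'Unnamed: 1': [np.nan, np.nan, np.nan],
    'Pop': ['m.', 3.4, 3.5],          # first row holds units, not data
})
right = pd.DataFrame({
    'Decade': [1200, 1210, 1220],
    'Unnamed: 5': [np.nan, np.nan, np.nan],
    'Wage': [1.37, 1.26, 1.25],
})

# Drop the units row and every column whose name contains 'Unnamed'
left = left.drop(index=0).reset_index(drop=True)
left = left[[c for c in left.columns if 'Unnamed' not in c]]
right = right[[c for c in right.columns if 'Unnamed' not in c]]

# Inner merge on the shared key: only decades present in both tables survive
both = left.merge(right, on='Decade', how='inner')
both['Decade'] = both['Decade'].astype(int)
both['Pop'] = both['Pop'].astype(float)
print(both)
```

Note that the decade 1220 appears only in `right`, so it is dropped by the inner merge.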



In [170]:

    
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context("talk")
# Plot
fig, ax = plt.subplots()
sns.lineplot(x='Decade', y='Pop England', data=uk.loc[uk.Decade<1730], alpha=1, label='Population', color='r', ax=ax)
ax2 = ax.twinx()
sns.lineplot(x='Decade', y='Real Farm Wage (1860s=100)', data=uk.loc[uk.Decade<1730], alpha=1, label='Real Wages', color='b', ax=ax2)
ax.tick_params(axis = 'both', which = 'major')
ax.tick_params(axis = 'both', which = 'minor')
handles, labels = ax.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax.legend(handles=(handles+handles2), labels=(labels+labels2), loc='upper left')
ax2.legend(handles=(handles+handles2), labels=(labels+labels2), loc='upper left')
nticks = 7
ax.yaxis.set_major_locator(matplotlib.ticker.LinearLocator(nticks))
ax2.yaxis.set_major_locator(matplotlib.ticker.LinearLocator(nticks))
ax.set_xlabel('Year')
ax.set_ylabel('Population (millions)')
plt.savefig(pathgraphs + 'UK-pop-GDPpc-1200-1730.pdf', dpi=300, bbox_inches='tight')
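
The cell above draws two series with different units on one figure by giving each its own y-axis and then combining both sets of legend entries. A minimal sketch of the same pattern in plain matplotlib (no seaborn) might look like this. The short lists hold a few values rounded from the tables above and are only for illustration, and the non-interactive 'Agg' backend is an assumption so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

decades = [1200, 1210, 1220, 1230]
population = [3.40, 3.40, 3.74, 3.90]   # millions
real_wage = [88.8, 72.0, 60.6, 59.3]    # index (1860s=100)

fig, ax = plt.subplots()
ax.plot(decades, population, color='r', label='Population')
ax2 = ax.twinx()                         # second y-axis sharing the x-axis
ax2.plot(decades, real_wage, color='b', label='Real Wages')

# Collect the legend entries of both axes into a single legend
handles, labels = ax.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax.legend(handles + handles2, labels + labels2, loc='upper left')
ax.set_xlabel('Year')
ax.set_ylabel('Population (millions)')
ax2.set_ylabel('Real farm wage (1860s=100)')
```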



In [171]:

    
fig









    Out[171]:



In [ ]:

	lrgdpe_pc	lrgdpo_pc	lcgdpe_pc	lcgdpo_pc	lrgdpna_pc
count	340.000000	340.000000	340.000000	340.000000	340.000000
mean	0.984764	0.976230	0.984787	0.983319	0.959519
std	0.021766	0.026951	0.021799	0.022352	0.031102
min	0.914426	0.903082	0.913860	0.899516	0.899516
25%	0.976850	0.958346	0.976905	0.973735	0.936402
50%	0.995482	0.988851	0.995799	0.995067	0.951741
75%	0.999805	0.998989	0.999805	0.998989	0.994875
max	1.000000	1.000000	1.000000	1.000000	1.000000

	0	1	2	3	4
country	Colombia	Turkey	USA	Germany	Chile
noise	0.469112	-0.282863	-1.50906	-1.13563	1.21211

	country	noise	noise_sq	noise and its square	name length
0	Colombia	0.469112	0.220066	0.689179	8
1	Turkey	-0.282863	0.0800117	-0.202852	6
2	USA	-1.50906	2.27726	0.768199	3
3	Germany	-1.13563	1.28966	0.154029	7
4	Chile	1.21211	1.46922	2.68133	5

	countrycode	country	year	cgdppc	rgdpnapc	pop	i_cig	i_bm
0	AFG	Afghanistan	1820.0	NaN	NaN	3280.0	NaN	NaN
1	AFG	Afghanistan	1870.0	NaN	NaN	4207.0	NaN	NaN
2	AFG	Afghanistan	1913.0	NaN	NaN	5730.0	NaN	NaN
3	AFG	Afghanistan	1950.0	2392.0	2392.0	8150.0	Extrapolated	NaN
4	AFG	Afghanistan	1951.0	2422.0	2422.0	8284.0	Extrapolated	NaN
...	...	...	...	...	...	...	...	...
19868	ZWE	Zimbabwe	2012.0	1623.0	1604.0	12620.0	Extrapolated	NaN
19869	ZWE	Zimbabwe	2013.0	1801.0	1604.0	13183.0	Extrapolated	NaN
19870	ZWE	Zimbabwe	2014.0	1797.0	1594.0	13772.0	Extrapolated	NaN
19871	ZWE	Zimbabwe	2015.0	1759.0	1560.0	14230.0	Extrapolated	NaN
19872	ZWE	Zimbabwe	2016.0	1729.0	1534.0	14547.0	Extrapolated	NaN

	countrycode	country	year	cgdppc	rgdpnapc	pop	i_cig	i_bm
0	AFG	Afghanistan	1820	NaN	NaN	3280.0	NaN	NaN
1	AFG	Afghanistan	1870	NaN	NaN	4207.0	NaN	NaN
2	AFG	Afghanistan	1913	NaN	NaN	5730.0	NaN	NaN
3	AFG	Afghanistan	1950	2392.0	2392.0	8150.0	Extrapolated	NaN
4	AFG	Afghanistan	1951	2422.0	2422.0	8284.0	Extrapolated	NaN
...	...	...	...	...	...	...	...	...
19868	ZWE	Zimbabwe	2012	1623.0	1604.0	12620.0	Extrapolated	NaN
19869	ZWE	Zimbabwe	2013	1801.0	1604.0	13183.0	Extrapolated	NaN
19870	ZWE	Zimbabwe	2014	1797.0	1594.0	13772.0	Extrapolated	NaN
19871	ZWE	Zimbabwe	2015	1759.0	1560.0	14230.0	Extrapolated	NaN
19872	ZWE	Zimbabwe	2016	1729.0	1534.0	14547.0	Extrapolated	NaN

	Unnamed: 0	1	Unnamed: 2	1000	Unnamed: 4	1500	Unnamed: 6	1600	Unnamed: 8	1700	...	2002	2003	2004	2005	2006	2007	2008	2009	Unnamed: 201	2030
0	Western Europe	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	Austria	500.0	NaN	700.0	NaN	2000.0	NaN	2500.0	NaN	2500.0	...	8148.312	8162.656	8174.762	8184.691	8192.880	8199.783	8205.533	8210	NaN	8120.000
2	Belgium	300.0	NaN	400.0	NaN	1400.0	NaN	1600.0	NaN	2000.0	...	10311.970	10330.824	10348.276	10364.388	10379.067	10392.226	10403.951	10414	NaN	10409.000
3	Denmark	180.0	NaN	360.0	NaN	600.0	NaN	650.0	NaN	700.0	...	5374.693	5394.138	5413.392	5432.335	5450.661	5468.120	5484.723	5501	NaN	5730.488
4	Finland	20.0	NaN	40.0	NaN	300.0	NaN	400.0	NaN	400.0	...	5193.039	5204.405	5214.512	5223.442	5231.372	5238.460	5244.749	5250	NaN	5201.445
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
273	Guadeloupe	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	435.739	440.189	444.515	448.713	452.776	456.698	460.486	n.a.	NaN	523.493
274	Guyana (Fr.)	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	182.333	186.917	191.309	195.506	199.509	203.321	206.941	n.a.	NaN	272.781
275	Martinique	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	422.277	425.966	429.510	432.900	436.131	439.202	442.119	n.a.	NaN	486.714
276	Reunion	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	743.981	755.171	766.153	776.948	787.584	798.094	808.506	n.a.	NaN	1025.217
277	Total	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	1784.330	1808.243	1831.487	1854.067	1876.000	1897.315	1918.052	n.a.	NaN	2308.205

	Unnamed: 0	1	Unnamed: 2	1000	Unnamed: 4	1500	Unnamed: 6	1600	Unnamed: 8	1700	...	1999	2000	2001	2002	2003	2004	2005	2006	2007	2008
0	Western Europe	NaN	NaN	NaN	NaN		NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	Austria	425.000000	NaN	425.000000	NaN	707	NaN	837.200000	NaN	993.200000	...	20065.093878	20691.415561	20812.893753	20955.874051	21165.047259	21626.929322	22140.725899	22892.682427	23674.041130	24130.547035
2	Belgium	450.000000	NaN	425.000000	NaN	875	NaN	975.625000	NaN	1144.000000	...	19964.428266	20656.458570	20761.238278	21032.935511	21205.859281	21801.602508	22246.561977	22881.632810	23446.949672	23654.763464
3	Denmark	400.000000	NaN	400.000000	NaN	738.333	NaN	875.384615	NaN	1038.571429	...	22254.890572	22975.162513	23059.374968	23082.620719	23088.582457	23492.664119	23972.564284	24680.492880	24995.245167	24620.568805
4	Finland	400.000000	NaN	400.000000	NaN	453.333	NaN	537.500000	NaN	637.500000	...	18855.985066	19770.363126	20245.896529	20521.702225	20845.802738	21574.406196	22140.573208	23190.283543	24131.519569	24343.586318
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
190	Total Africa	472.352941	NaN	424.767802	NaN	413.71	NaN	422.071584	NaN	420.628684	...	1430.752576	1447.071701	1471.156532	1482.629352	1517.935644	1558.099461	1603.686517	1663.531318	1724.226776	1780.265474
191	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
192	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
193	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
194	World Average	466.752281	NaN	453.402162	NaN	566.389	NaN	595.783856	NaN	614.853602	...	5833.255492	6037.675887	6131.705471	6261.734267	6469.119575	6738.281333	6960.031035	7238.383483	7467.648232	7613.922924

	Country	gdppc_1	gdppc_1000	gdppc_1500	gdppc_1600	gdppc_1700	gdppc_1820	gdppc_1821	gdppc_1822	gdppc_1823	...	gdppc_1999	gdppc_2000	gdppc_2001	gdppc_2002	gdppc_2003	gdppc_2004	gdppc_2005	gdppc_2006	gdppc_2007	gdppc_2008
0	Western Europe	576.167665	427.425665	771.094	887.906964	993.456911	1194.184683	NaN	NaN	NaN	...	18497.208533	19176.001655	19463.863297	19627.707522	19801.145425	20199.220700	20522.238008	21087.304789	21589.011346	21671.774225
1	Western Offshoots	400.000000	400.000000	400	400.000000	476.000000	1201.993477	NaN	NaN	NaN	...	26680.580823	27393.808035	27387.312035	27648.644070	28090.274362	28807.845958	29415.399334	29922.741918	30344.425293	30151.805880
2	East Europe	411.789474	400.000000	496	548.023599	606.010638	683.160984	NaN	NaN	NaN	...	5734.162109	5970.165085	6143.112873	6321.395376	6573.365882	6942.136596	7261.721015	7730.097570	8192.881904	8568.967581
3	Latin America	400.000000	400.000000	416.457	437.558140	526.639004	691.060678	NaN	NaN	NaN	...	5765.585093	5889.237351	5846.295193	5746.609672	5785.841237	6063.068969	6265.525702	6530.533583	6783.869986	6973.134656
4	Asia	455.671021	469.961665	568.418	573.550859	571.605276	580.626115	NaN	NaN	NaN	...	3623.902724	3797.608955	3927.186275	4121.275511	4388.982705	4661.517477	4900.563281	5187.253152	5408.383588	5611.198564
5	Africa	472.352941	424.767802	413.71	422.071584	420.628684	419.755914	NaN	NaN	NaN	...	1430.752576	1447.071701	1471.156532	1482.629352	1517.935644	1558.099461	1603.686517	1663.531318	1724.226776	1780.265474

	Country	year	gdppc_
0	Western Europe	1	576.168
1	Western Offshoots	1	400
2	East Europe	1	411.789
3	Latin America	1	400
4	Asia	1	455.671
...	...	...	...
409	Western Offshoots	2008	30151.8
410	East Europe	2008	8568.97
411	Latin America	2008	6973.13
412	Asia	2008	5611.2
413	Africa	2008	1780.27

Country	Africa	Asia	East Europe	Latin America	Western Europe	Western Offshoots
year
1	472.352941	455.671021	411.789474	400.000000	576.167665	400.000000
1000	424.767802	469.961665	400.000000	400.000000	427.425665	400.000000
1500	413.709504	568.417900	496.000000	416.457143	771.093805	400.000000
1600	422.071584	573.550859	548.023599	437.558140	887.906964	400.000000
1700	420.628684	571.605276	606.010638	526.639004	993.456911	476.000000
...	...	...	...	...	...	...
2004	1558.099461	4661.517477	6942.136596	6063.068969	20199.220700	28807.845958
2005	1603.686517	4900.563281	7261.721015	6265.525702	20522.238008	29415.399334
2006	1663.531318	5187.253152	7730.097570	6530.533583	21087.304789	29922.741918
2007	1724.226776	5408.383588	8192.881904	6783.869986	21589.011346	30344.425293
2008	1780.265474	5611.198564	8568.967581	6973.134656	21671.774225	30151.805880

	Region	Period	GDPpc	Population
1	World	1-1000	-0.000029	0.000169
2	World	1000-1500	0.000445	0.000990
3	World	1500-1820	0.000505	0.002708
4	World	1820-1913	0.008948	0.005856

	Region	Period	variable	growth
0	World	1-1000	Income per capita	-0.000029
1	World	1000-1500	Income per capita	0.000445
2	World	1500-1820	Income per capita	0.000505
3	World	1820-1913	Income per capita	0.008948
4	World	1-1000	Population	0.000169
5	World	1000-1500	Population	0.000990
6	World	1500-1820	Population	0.002708
7	World	1820-1913	Population	0.005856

	Country	y1750	y1800	y1830	y1860	y1880	y1900	y1913	y1928	y1938	y1953	y1963	y1973	y1980
0	Developed Countries	8.0	8.0	11.0	16	24	35	55	71	81	135	194	315	344
1	Europe	8.0	8.0	11.0	17	23	33	45	76	94	107	166	260	280
2	Belgium	9.0	10.0	14.0	28	43	56	88	116	89	117	183	291	316
3	France	9.0	9.0	12.0	20	28	39	59	82	73	95	167	259	277
4	Germany	8.0	8.0	9.0	15	25	52	85	101	128	144	244	366	395
5	Italy	8.0	8.0	8.0	10	12	17	26	39	44	61	121	194	231
6	Spain	7.0	7.0	8.0	11	14	19	22	28	23	31	56	144	159
7	Sweden	7.0	8.0	9.0	15	24	41	67	84	135	163	262	405	409
8	Switzerland	7.0	10.0	16.0	26	39	67	87	90	88	167	259	366	354
9	United Kingdom	10.0	16.0	25.0	64	87	100	115	122	157	210	253	341	325
10	Canada	NaN	5.0	6.0	7	10	24	46	82	84	185	237	370	379
11	United States	4.0	9.0	14.0	21	38	69	126	182	167	354	393	604	629
12	Japan	7.0	7.0	7.0	7	9	12	20	30	51	40	113	310	353
13	Third World	7.0	6.0	6.0	4	3	2	2	3	4	5	8	14	17
14	China	8.0	6.0	6.0	4	4	3	3	4	4	5	10	18	24
15	India	7.0	6.0	6.0	3	2	1	2	3	4	6	8	14	16
16	Brazil	NaN	NaN	NaN	4	4	5	7	10	10	13	23	42	55
17	Mexico	NaN	NaN	NaN	5	4	5	7	9	8	12	22	36	41
18	World	7.0	6.0	7.0	7	9	14	21	28	31	48	66	100	103

	Country	year	manufacturing
0	Developed Countries	1750	0.270
1	Belgium	1750	0.003
2	France	1750	0.040
3	Germany	1750	0.029
4	Italy	1750	0.024
...	...	...	...
216	Third World	1980	0.120
217	China	1980	0.050
218	India	1980	0.023
219	Brazil	1980	0.014
220	Mexico	1980	0.006

	Country	year	indpotential
0	Developed Countries	1750	34.4
1	Europe	1750	29.6
2	Belgium	1750	0.4
3	France	1750	5.0
4	Germany	1750	3.7
...	...	...	...
242	China	1980	553.0
243	India	1980	254.0
244	Brazil	1980	159.0
245	Mexico	1980	68.0
246	World	1980	11041.0

	Variable name	Variable definition
0	Identifier variables	NaN
1	countrycode	3-letter ISO country code
2	country	Country name
3	currency_unit	Currency unit
4	year	Year
...	...	...
62	pl_g	Price level of government consumption, price ...
63	pl_x	Price level of exports, price level of USA GDP...
64	pl_m	Price level of imports, price level of USA GDP...
65	pl_n	Price level of the capital stock, price level ...
66	pl_k	Price level of the capital services, price lev...

	countrycode	country	currency_unit	year	rgdpe	rgdpo	pop	emp	avh	hc	...	csh_x	csh_m	csh_r	pl_c	pl_i	pl_g	pl_x	pl_m	pl_n	pl_k
0	ABW	Aruba	Aruban Guilder	1950	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	ABW	Aruba	Aruban Guilder	1951	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	ABW	Aruba	Aruban Guilder	1952	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	ABW	Aruba	Aruban Guilder	1953	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	ABW	Aruba	Aruban Guilder	1954	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
12371	ZWE	Zimbabwe	US Dollar	2013	28086.937500	28329.810547	15.054506	7.914061	NaN	2.504635	...	0.169638	-0.426188	0.090225	0.577488	0.582022	0.448409	0.723247	0.632360	0.383488	0.704313
12372	ZWE	Zimbabwe	US Dollar	2014	29217.554688	29355.759766	15.411675	8.222112	NaN	2.550258	...	0.141791	-0.340442	0.051500	0.600760	0.557172	0.392895	0.724510	0.628352	0.349735	0.704991
12373	ZWE	Zimbabwe	US Dollar	2015	30091.923828	29150.750000	15.777451	8.530669	NaN	2.584653	...	0.137558	-0.354298	-0.023353	0.622927	0.580814	0.343926	0.654940	0.564430	0.348472	0.713156
12374	ZWE	Zimbabwe	US Dollar	2016	30974.292969	29420.449219	16.150362	8.839398	NaN	2.616257	...	0.141248	-0.310446	0.003050	0.640176	0.599462	0.337853	0.657060	0.550084	0.346553	0.718671
12375	ZWE	Zimbabwe	US Dollar	2017	32693.474609	30940.816406	16.529903	9.181251	NaN	2.648248	...	0.141799	-0.299539	0.019133	0.647136	0.726222	0.340680	0.645338	0.539529	0.412392	0.755215

	year	rgdpe	rgdpo	pop	emp	avh	hc	ccon	cda	cgdpe	...	csh_x	csh_m	csh_r	pl_c	pl_i	pl_g	pl_x	pl_m	pl_n	pl_k
count	12376.000000	9.985000e+03	9.985000e+03	9985.000000	8841.000000	3373.000000	8299.000000	9.985000e+03	9.985000e+03	9.985000e+03	...	9985.000000	9985.000000	9985.000000	9985.000000	9985.000000	9985.000000	9985.000000	9985.000000	9959.000000	7047.000000
mean	1983.500000	2.720569e+05	2.691928e+05	30.736767	14.799485	1984.099854	2.064241	1.984998e+05	2.686580e+05	2.697088e+05	...	0.229183	-0.307399	0.019670	0.391839	0.486303	0.368860	0.436420	0.431026	0.466652	1.403137
std	19.628579	1.078882e+06	1.070178e+06	114.569824	59.107712	272.879944	0.720774	7.772703e+05	1.079234e+06	1.070720e+06	...	0.260547	0.681575	0.201448	0.280254	0.956450	0.347244	0.211918	0.220563	0.400624	2.628997
min	1950.000000	1.846645e+01	1.977999e+01	0.004376	0.001180	1353.886841	1.007038	1.443100e+01	1.986141e+01	1.848834e+01	...	-1.496417	-26.741989	-8.731015	0.017207	0.012448	0.010474	0.007868	0.022644	0.019666	0.060732
25%	1966.750000	6.178189e+03	6.380658e+03	1.634517	0.940000	1799.336060	1.431531	5.227761e+03	6.395296e+03	6.002223e+03	...	0.068159	-0.381261	-0.022347	0.182697	0.198099	0.125520	0.243906	0.248910	0.219715	0.663940
50%	1983.500000	2.725946e+04	2.710632e+04	6.115370	3.021000	1972.072876	1.954407	2.153850e+04	2.763264e+04	2.677256e+04	...	0.144143	-0.203762	0.000727	0.326817	0.396347	0.256664	0.473103	0.486665	0.364834	0.982678
75%	2000.250000	1.386558e+05	1.374726e+05	19.891548	8.583438	2149.860352	2.649120	1.005379e+05	1.357644e+05	1.362898e+05	...	0.301996	-0.104336	0.044098	0.520135	0.594202	0.490205	0.596405	0.576243	0.569292	1.458653
max	2017.000000	1.839607e+07	1.838384e+07	1409.517456	792.575317	2910.734863	3.974208	1.483615e+07	1.846078e+07	1.792857e+07	...	3.057809	23.158607	9.917986	3.986815	35.654171	2.367351	2.271417	5.465247	6.730951	60.361191

	Variable name	Variable definition
12	hc	Human capital index, based on years of schooli...
19	cn	Capital stock at current PPPs (in mil. 2011US$)
20	ck	Capital services levels at current PPPs (USA=1)
28	rnna	Capital stock at constant 2011 national prices...
29	rkna	Capital services at constant 2011 national pri...
34	delta	Average depreciation rate of the capital stock
47	i_irr	0/1/2/3: the observation for irr is not an out...
53	csh_i	Share of gross capital formation at current PPPs
61	pl_i	Price level of capital formation, price level...
65	pl_n	Price level of the capital stock, price level ...
66	pl_k	Price level of the capital services, price lev...

	Variable name	Variable definition
7	rgdpe	Expenditure-side real GDP at chained PPPs (in ...
8	rgdpo	Output-side real GDP at chained PPPs (in mil. ...
17	cgdpe	Expenditure-side real GDP at current PPPs (in ...
18	cgdpo	Output-side real GDP at current PPPs (in mil. ...
25	rgdpna	Real GDP at constant 2011 national prices (in ...
32	labsh	Share of labour compensation in GDP at current...
38	pl_con	Price level of CCON (PPP/XR), price level of U...
39	pl_da	Price level of CDA (PPP/XR), price level of US...
40	pl_gdpo	Price level of CGDPo (PPP/XR), price level of...
46	i_outlier	0/1: the observation on pl_gdpe or pl_gdpo is ...
57	csh_r	Share of residual trade and GDP statistical di...
60	pl_c	Price level of household consumption, price l...
61	pl_i	Price level of capital formation, price level...
62	pl_g	Price level of government consumption, price ...
63	pl_x	Price level of exports, price level of USA GDP...
64	pl_m	Price level of imports, price level of USA GDP...
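
The two listings above look like keyword searches over the variable definitions (the first for "capital", the second apparently for "GDP"). A minimal sketch of such a search, assuming the definitions sit in a DataFrame with the two columns shown; the name `labels` and the abbreviated definitions are illustrative only:

```python
import pandas as pd

# Hypothetical label table with a few abbreviated definitions from the listings above.
labels = pd.DataFrame({
    "Variable name": ["hc", "cn", "delta", "rgdpe"],
    "Variable definition": [
        "Human capital index",
        "Capital stock at current PPPs (in mil. 2011US$)",
        "Average depreciation rate of the capital stock",
        "Expenditure-side real GDP at chained PPPs",
    ],
})

# Case-insensitive substring match on the definition text.
mask = labels["Variable definition"].str.contains("capital", case=False)
matches = labels[mask]
print(matches)
```

Filtering keeps the original row labels, which would explain why the index values in the listings are not consecutive.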

	countrycode	country	year	lrgdpe_pc
10	ABW	Aruba	1960	NaN
78	AGO	Angola	1960	NaN
146	AIA	Anguilla	1960	NaN
214	ALB	Albania	1960	NaN
282	ARE	United Arab Emirates	1960	NaN
...	...	...	...	...
12046	VNM	Viet Nam	1960	NaN
12114	YEM	Yemen	1960	NaN
12182	ZAF	South Africa	1960	8.664412
12250	ZMB	Zambia	1960	7.883263
12318	ZWE	Zimbabwe	1960	7.646267

	countrycode	y1950	y1951	y1952	y1953	y1954	y1955	y1956	y1957	y1958	...	y2008	y2009	y2010	y2011	y2012	y2013	y2014	y2015	y2016	y2017
0	ABW	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	0.989490	0.989011	0.977356	0.972664	0.969597	0.969952	0.968135	0.966769	0.963675	0.962826
1	AGO	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	0.786656	0.767352	0.794583	0.815654	0.815493	0.812440	0.805392	0.786142	0.782010	0.778917
2	AIA	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	0.975120	0.951357	0.943233	0.942697	0.935110	0.931704	0.934498	0.934647	0.928544	0.918428
3	ALB	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	0.826849	0.836372	0.844121	0.847105	0.848709	0.845853	0.848582	0.851238	0.850862	0.856015
4	ARE	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	1.041859	1.020390	1.013018	1.022369	1.022926	1.024411	1.027438	1.016130	1.014197	1.014201
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
177	VNM	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	0.755160	0.761839	0.771784	0.778270	0.783986	0.786128	0.789185	0.791125	0.796555	0.801899
178	YEM	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	0.755317	0.757512	0.772611	0.761339	0.750812	0.750272	0.737346	0.711625	0.697790	0.684275
179	ZAF	0.893115	0.887755	0.87563	0.881956	0.889942	0.886195	0.889469	0.892274	0.891707	...	0.862708	0.862971	0.864622	0.867436	0.866010	0.865265	0.863675	0.862188	0.860596	0.860410
180	ZMB	NaN	NaN	NaN	NaN	NaN	0.813323	0.816613	0.796724	0.785549	...	0.721515	0.732669	0.742396	0.750338	0.753036	0.753754	0.753928	0.751828	0.751907	0.757633
181	ZWE	NaN	NaN	NaN	NaN	0.769047	0.764846	0.770286	0.776759	0.775427	...	0.618364	0.670103	0.674767	0.682105	0.690134	0.693611	0.693790	0.692485	0.692220	0.694026
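
The table above has one row per country and one column per year (`y1950` to `y2017`). One way to get this layout from long-format data is a pivot. The sketch below is an assumption about how such a table could be built, not necessarily how it was built here; the column name `value` is a placeholder and the four numbers are copied from the ZAF and ZMB rows:

```python
import pandas as pd

# Hypothetical long-format data: one row per country-year.
long_df = pd.DataFrame({
    "countrycode": ["ZAF", "ZAF", "ZMB", "ZMB"],
    "year": [2016, 2017, 2016, 2017],
    "value": [0.860596, 0.860410, 0.751907, 0.757633],
})

# Pivot so that each year becomes its own column.
wide_df = long_df.pivot(index="countrycode", columns="year", values="value")

# Prefix the year columns with "y" and turn the index back into a column.
wide_df.columns = ["y" + str(year) for year in wide_df.columns]
wide_df = wide_df.reset_index()
print(wide_df)
```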

	iso3c	iso2c	name	region	adminregion	incomeLevel	lendingType	capitalCity	longitude	latitude
0	ABW	AW	Aruba	Latin America & Caribbean		High income	Not classified	Oranjestad	-70.0167	12.51670
1	AFG	AF	Afghanistan	South Asia	South Asia	Low income	IDA	Kabul	69.1761	34.52280
2	AFR	A9	Africa	Aggregates		Aggregates	Aggregates		NaN	NaN
3	AGO	AO	Angola	Sub-Saharan Africa	Sub-Saharan Africa (excluding high income)	Lower middle income	IBRD	Luanda	13.2420	-8.81155
4	ALB	AL	Albania	Europe & Central Asia	Europe & Central Asia (excluding high income)	Upper middle income	IBRD	Tirane	19.8172	41.33170
...	...	...	...	...	...	...	...	...	...	...
299	XZN	A5	Sub-Saharan Africa excluding South Africa and ...	Aggregates		Aggregates	Aggregates		NaN	NaN
300	YEM	YE	Yemen, Rep.	Middle East & North Africa	Middle East & North Africa (excluding high inc...	Low income	IDA	Sana'a	44.2075	15.35200
301	ZAF	ZA	South Africa	Sub-Saharan Africa	Sub-Saharan Africa (excluding high income)	Upper middle income	IBRD	Pretoria	28.1871	-25.74600
302	ZMB	ZM	Zambia	Sub-Saharan Africa	Sub-Saharan Africa (excluding high income)	Lower middle income	IDA	Lusaka	28.2937	-15.39820
303	ZWE	ZW	Zimbabwe	Sub-Saharan Africa	Sub-Saharan Africa (excluding high income)	Lower middle income	Blend	Harare	31.0672	-17.83120

	id	name	unit	source	sourceNote	sourceOrganization	topics
24	1.1_ACCESS.ELECTRICITY.TOT	Access to electricity (% of total population)		Sustainable Energy for All	Access to electricity is the percentage of pop...	b'World Bank Global Electrification Database 2...
39	1.2_ACCESS.ELECTRICITY.RURAL	Access to electricity (% of rural population)		Sustainable Energy for All	Access to electricity is the percentage of rur...	b'World Bank Global Electrification Database 2...
40	1.3_ACCESS.ELECTRICITY.URBAN	Access to electricity (% of urban population)		Sustainable Energy for All	Access to electricity is the percentage of tot...	b'World Bank Global Electrification Database 2...
128	2.1_ACCESS.CFT.TOT	Access to Clean Fuels and Technologies for coo...		Sustainable Energy for All		b''
159	3.11.01.01.popcen	Population census		Statistical Capacity Indicators	Population censuses collect data on the size, ...	b'World Bank Microdata library. Original sourc...
...	...	...	...	...	...	...	...
17439	per_sionl.overlap_pop_urb	Population only receiving All Social Insurance...		The Atlas of Social Protection: Indicators of ...	NULL	b'The Atlas of Social Protection: Indicators o...	Social Protection & Labor
17440	per_sionl.overlap_q1_preT_tot	Population in the 1st quintile (poorest) only ...		The Atlas of Social Protection: Indicators of ...	NULL	b'The Atlas of Social Protection: Indicators o...	Social Protection & Labor
17441	per_sionl.overlap_q1_rur	Population in the 1st quintile (poorest) only ...		The Atlas of Social Protection: Indicators of ...	NULL	b'The Atlas of Social Protection: Indicators o...	Social Protection & Labor
17442	per_sionl.overlap_q1_tot	Population in the 1st quintile (poorest) only ...		The Atlas of Social Protection: Indicators of ...	NULL	b'The Atlas of Social Protection: Indicators o...	Social Protection & Labor
17443	per_sionl.overlap_q1_urb	Population in the 1st quintile (poorest) only ...		The Atlas of Social Protection: Indicators of ...	NULL	b'The Atlas of Social Protection: Indicators o...	Social Protection & Labor

	country	year	SP.POP.0014.FE.IN	SP.POP.1564.FE.IN	SP.POP.65UP.FE.IN	SP.POP.0014.MA.IN	SP.POP.1564.MA.IN	SP.POP.65UP.MA.IN	SP.POP.TOTL.FE.IN	SP.POP.TOTL.MA.IN	...	SP.POP.4044.FE	SP.POP.4549.FE	SP.POP.5054.FE	SP.POP.5559.FE	SP.POP.6064.FE	SP.POP.6569.FE	SP.POP.7074.FE	SP.POP.7579.FE	SP.POP.80UP.FE	NY.GDP.PCAP.PP.KD
0	Afghanistan	2017	7732365.0	9.413927e+06	497974.0	8.122796e+06	1.010025e+07	429088.0	1.764427e+07	1.865213e+07	...	700156.0	562807.0	451226.0	357049.0	275515.0	218541.0	145457.0	78439.0	55537.0	2202.570851
1	Albania	2017	243452.0	9.673940e+05	198481.0	2.746030e+05	1.005023e+06	184503.0	1.409327e+06	1.464130e+06	...	85864.0	94264.0	101671.0	101284.0	84148.0	63784.0	52773.0	40717.0	41206.0	13093.652313
2	Algeria	2017	6005664.0	1.316583e+07	1310941.0	6.261929e+06	1.339909e+07	1245754.0	2.048243e+07	2.090677e+07	...	1334956.0	1112079.0	950716.0	767641.0	631310.0	458099.0	325968.0	256637.0	270236.0	11550.617638
3	American Samoa	2017	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	Andorra	2017	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
259	West Bank and Gaza	2017	855495.0	1.267122e+06	72244.0	8.935800e+05	1.300776e+06	65588.0	2.194861e+06	2.259944e+06	...	102015.0	84835.0	69853.0	50659.0	36755.0	28050.0	20237.0	13392.0	10565.0	1183.435345
260	World	2017	941262629.0	2.422523e+09	358609161.0	1.005788e+09	2.488493e+09	290411331.0	3.722395e+09	3.784692e+09	...	240536539.0	231501913.0	209202400.0	179317688.0	155624953.0	123519929.0	87547943.0	64952254.0	82589032.0	16167.228725
261	Yemen, Rep.	2017	5456767.0	7.919190e+06	431560.0	5.677559e+06	7.987563e+06	362182.0	1.380752e+07	1.402730e+07	...	559027.0	443721.0	385742.0	311511.0	235265.0	179953.0	122274.0	72535.0	56798.0	NaN
262	Zambia	2017	3794229.0	4.502766e+06	213893.0	3.859276e+06	4.346054e+06	137471.0	8.510888e+06	8.342800e+06	...	355596.0	259410.0	196406.0	152437.0	115017.0	85429.0	60372.0	38551.0	29540.0	3485.002103
263	Zimbabwe	2017	3024124.0	4.169317e+06	266180.0	3.040282e+06	3.590650e+06	146192.0	7.459621e+06	6.777124e+06	...	342473.0	243625.0	193105.0	159731.0	133656.0	95561.0	72600.0	53412.0	44607.0	3134.327494

	iso3c	name	region	gender_ratio
38	CHN	China	East Asia & Pacific	1.054808
224	SYC	Seychelles	Sub-Saharan Africa	1.056584
1	AFG	Afghanistan	South Asia	1.057122
167	MYS	Malaysia	East Asia & Pacific	1.059319
182	PAK	Pakistan	South Asia	1.060265
202	SAS	South Asia	Aggregates	1.068105
238	TSA	South Asia (IDA & IBRD)	Aggregates	1.068105
258	WSM	Samoa	East Asia & Pacific	1.071733
151	MEA	Middle East & North Africa	Aggregates	1.073636
5	ARB	Arab World	Aggregates	1.074033
107	IND	India	South Asia	1.082583
29	BRN	Brunei Darussalam	East Asia & Pacific	1.083313
216	SST	Small states	Aggregates	1.096263
206	SGP	Singapore	East Asia & Pacific	1.098113
54	DJI	Djibouti	Middle East & North Africa	1.113483
30	BTN	Bhutan	South Asia	1.123726
181	OSS	Other small states	Aggregates	1.130181
86	GNQ	Equatorial Guinea	Sub-Saharan Africa	1.245989
203	SAU	Saudi Arabia	Middle East & North Africa	1.343270
125	KWT	Kuwait	Middle East & North Africa	1.492402
150	MDV	Maldives	South Asia	1.620545
20	BHR	Bahrain	Middle East & North Africa	1.695785
180	OMN	Oman	Middle East & North Africa	1.929817
6	ARE	United Arab Emirates	Middle East & North Africa	2.280813
198	QAT	Qatar	Middle East & North Africa	3.109720

	iso3c	name	region	gender_ratio
176	NPL	Nepal	South Asia	0.832683
49	CUW	Curacao	Latin America & Caribbean	0.847313
143	LVA	Latvia	Europe & Central Asia	0.849527
141	LTU	Lithuania	Europe & Central Asia	0.857032
94	HKG	Hong Kong SAR, China	East Asia & Pacific	0.857683
246	UKR	Ukraine	Europe & Central Asia	0.862200
200	RUS	Russian Federation	Europe & Central Asia	0.863357
23	BLR	Belarus	Europe & Central Asia	0.870642
209	SLV	El Salvador	Latin America & Caribbean	0.884228
69	EST	Estonia	Europe & Central Asia	0.886953
8	ARM	Armenia	Europe & Central Asia	0.888378
192	PRT	Portugal	Europe & Central Asia	0.897158
0	ABW	Aruba	Latin America & Caribbean	0.904285
99	HUN	Hungary	Europe & Central Asia	0.906684
254	VIR	Virgin Islands (U.S.)	Latin America & Caribbean	0.908106
263	ZWE	Zimbabwe	Sub-Saharan Africa	0.908508
190	PRI	Puerto Rico	Latin America & Caribbean	0.910474
80	GEO	Georgia	Europe & Central Asia	0.913288
62	ECA	Europe & Central Asia (excluding high income)	Aggregates	0.917302
229	TEC	Europe & Central Asia (IDA & IBRD countries)	Aggregates	0.919337
144	MAC	Macao SAR, China	East Asia & Pacific	0.923031
148	MDA	Moldova	Europe & Central Asia	0.923076
83	GIN	Guinea	Sub-Saharan Africa	0.925869
136	LKA	Sri Lanka	South Asia	0.926015
97	HRV	Croatia	Europe & Central Asia	0.927410
31	BWA	Botswana	Sub-Saharan Africa	0.929864
10	ATG	Antigua and Barbuda	Latin America & Caribbean	0.930138
158	MMR	Myanmar	East Asia & Pacific	0.930308
248	URY	Uruguay	Latin America & Caribbean	0.932800
28	BRB	Barbados	Latin America & Caribbean	0.933902
34	CEB	Central Europe and the Baltics	Aggregates	0.937790
169	NAM	Namibia	Sub-Saharan Africa	0.939169
75	FRA	France	Europe & Central Asia	0.939170
63	ECS	Europe & Central Asia	Aggregates	0.939541
118	KAZ	Kazakhstan	Europe & Central Asia	0.940358
188	POL	Poland	Europe & Central Asia	0.940866
163	MOZ	Mozambique	Sub-Saharan Africa	0.941291
21	BHS	Bahamas, The	Latin America & Caribbean	0.944080
114	ITA	Italy	Europe & Central Asia	0.945175
19	BGR	Bulgaria	Europe & Central Asia	0.945606
219	SVK	Slovak Republic	Europe & Central Asia	0.946981
222	SWZ	Eswatini	Sub-Saharan Africa	0.947617
199	ROU	Romania	Europe & Central Asia	0.948377
205	SEN	Senegal	Sub-Saharan Africa	0.948715
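
The `gender_ratio` values in the two tables above are consistent with the number of males per female, i.e. `SP.POP.TOTL.MA.IN` divided by `SP.POP.TOTL.FE.IN`. A minimal sketch of that computation, assuming this definition and using three rows copied (as displayed) from the 2017 population table:

```python
import pandas as pd

# Three rows from the 2017 population table, with values as displayed there.
wdi = pd.DataFrame({
    "country": ["Afghanistan", "Zambia", "Zimbabwe"],
    "SP.POP.TOTL.FE.IN": [1.764427e7, 8.510888e6, 7.459621e6],
    "SP.POP.TOTL.MA.IN": [1.865213e7, 8.342800e6, 6.777124e6],
})

# Assumed definition: males per female.
wdi["gender_ratio"] = wdi["SP.POP.TOTL.MA.IN"] / wdi["SP.POP.TOTL.FE.IN"]

# Sort ascending, as in the tables above.
ranked = wdi.sort_values("gender_ratio")[["country", "gender_ratio"]]
print(ranked)
```

Note that the tables also include aggregates (for example "South Asia" or "Arab World"); filtering on `region != "Aggregates"` would restrict the ranking to countries.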

Model:	OLS	Adj. R-squared:	0.998
Dependent Variable:	lpop_ma	AIC:	-306.5235
Date:	2020-06-13 11:54	BIC:	-299.5706
No. Observations:	239	Log-Likelihood:	155.26
Df Model:	1	F-statistic:	1.055e+05
Df Residuals:	237	Prob (F-statistic):	5.56e-316
R-squared:	0.998	Scale:	0.016103

	Coef.	Std.Err.	t	P>|t|	[0.025	0.975]
const	0.0553	0.0496	1.1149	0.2660	-0.0424	0.1529
lpop_fe	0.9968	0.0031	324.8267	0.0000	0.9908	1.0029

Omnibus:	287.615	Durbin-Watson:	1.950
Prob(Omnibus):	0.000	Jarque-Bera (JB):	13838.973
Skew:	5.192	Prob(JB):	0.000
Kurtosis:	38.803	Condition No.:	98
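
The summary above reports a regression of `lpop_ma` on a constant and `lpop_fe`, with a slope close to 1. The sketch below shows the same kind of log-log OLS fit using only NumPy. It runs on synthetic data (not the data behind the table), so its numbers will not match the output above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for log female and log male population (not the real data).
lpop_fe = rng.uniform(10, 20, size=200)
lpop_ma = 0.05 + 1.0 * lpop_fe + rng.normal(0, 0.1, size=200)

# Design matrix with a constant column, matching the `const` row in the table.
X = np.column_stack([np.ones_like(lpop_fe), lpop_fe])

# Least-squares estimates of the intercept and slope.
coef, *_ = np.linalg.lstsq(X, lpop_ma, rcond=None)
const, slope = coef

# R-squared from the residual and total sums of squares.
resid = lpop_ma - X @ coef
r2 = 1 - (resid @ resid) / np.sum((lpop_ma - lpop_ma.mean()) ** 2)
print(f"const={const:.4f}, slope={slope:.4f}, R2={r2:.3f}")
```

In a log-log specification the slope is an elasticity: a value near 1 means male population rises roughly one-for-one, in percentage terms, with female population.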

Model:	OLS	Adj. R-squared:	0.022
Dependent Variable:	lgdppc	AIC:	693.7805
Date:	2020-06-13 11:54	BIC:	700.6216
No. Observations:	226	Log-Likelihood:	-344.89
Df Model:	1	F-statistic:	6.003
Df Residuals:	224	Prob (F-statistic):	0.0151
R-squared:	0.026	Scale:	1.2500