In [1]:
# Configure Jupyter so figures appear in the notebook
%matplotlib inline
# Configure Jupyter to display the assigned value after an assignment
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'
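# import functions from the modsim.py module
from modsim import *
# Pandas and Matplotlib are also used directly in later cells
import pandas as pd
import matplotlib.pyplot as plt
from pandas import read_html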
In [2]:
filename = 'data/World_population_estimates.html'
tables = read_html(filename, header=0, index_col=0, decimal='M')
table2 = tables[2]
table2.columns = ['census', 'prb', 'un', 'maddison',
                  'hyde', 'tanton', 'biraben', 'mj',
                  'thomlinson', 'durand', 'clark']
table2.shape
Out[2]:
In [3]:
census = table2.census / 1e9
census.shape
Out[3]:
In [4]:
un = table2.un / 1e9
un.shape
Out[4]:
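A DataFrame has an attribute called index, which labels the rows. Here it is an Int64Index, which is similar to a NumPy array. (Pandas 2.0 removed Int64Index; in newer versions this is a plain Index with dtype int64.)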
In [5]:
table2.index
Out[5]:
It also has an attribute called columns, which labels the columns.
In [6]:
table2.columns
Out[6]:
And it has values, which is an array containing the data.
In [7]:
table2.values
Out[7]:
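To see index, columns, and values side by side without loading the HTML file, here is a minimal sketch on a tiny DataFrame. The numbers are placeholders, not real estimates; only the structure mirrors table2.

```python
import numpy as np
import pandas as pd

# A tiny DataFrame with placeholder numbers (not real estimates),
# using years as row labels like table2
df = pd.DataFrame({'census': [1.0, 2.0], 'un': [3.0, 4.0]},
                  index=[1950, 1960])

print(df.index)    # row labels (the years)
print(df.columns)  # column labels
print(df.values)   # the data as a 2-D NumPy array, one row per year
```

Recent Pandas documentation recommends df.to_numpy() over df.values; for a table of floats like this one, both return the same array.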
A Series does not have columns, but it does have an attribute called name.
In [8]:
census.name
Out[8]:
It has values, which is an array.
In [9]:
census.values
Out[9]:
And it has an index:
In [10]:
census.index
Out[10]:
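The relationship between a DataFrame column and the resulting Series can be checked on a small placeholder table. This sketch assumes, as in the cells above, that the column is selected by attribute and divided by a scalar; the name carries over from the column label.

```python
import pandas as pd

# Placeholder numbers, not real estimates
df = pd.DataFrame({'census': [1.0, 2.0], 'un': [3.0, 4.0]},
                  index=[1950, 1960])

# Selecting a column gives a Series; dividing by a scalar keeps its name
series = df.census / 1000
print(series.name)    # the column label it came from
print(series.values)  # a 1-D NumPy array
print(series.index)   # the same row labels as the DataFrame

# A Series is one-dimensional, so it has no columns attribute
print(hasattr(series, 'columns'))
```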
If you ever wonder what kind of object a variable refers to, you can use the type function. The result indicates what type the object is and the module where that type is defined.
DataFrame, Int64Index, Index, and Series are defined by Pandas. ndarray is defined by NumPy.
In [11]:
type(table2)
Out[11]:
In [12]:
type(table2.index)
Out[12]:
In [13]:
type(table2.columns)
Out[13]:
In [14]:
type(table2.values)
Out[14]:
In [15]:
type(census)
Out[15]:
In [16]:
type(census.index)
Out[16]:
In [17]:
type(census.values)
Out[17]:
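The module part of each result comes from the class's __module__ attribute, so it can also be printed directly. This sketch uses a placeholder DataFrame; the exact index class and module path it prints depend on the installed Pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'census': [1.0, 2.0]}, index=[1950, 1960])

# type() returns the class; __module__ records where it is defined
for obj in [df, df.index, df.columns, df.values, df['census']]:
    cls = type(obj)
    print(cls.__name__, 'from', cls.__module__)
```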
The following exercise provides a chance to practice what you have learned so far, and maybe develop a different growth model. If you feel comfortable with what we have done so far, you might want to give it a try.
Optional Exercise: On the Wikipedia page about world population estimates, the first table contains estimates for prehistoric populations. The following cells process this table and plot some of the results.
In [18]:
filename = 'data/World_population_estimates.html'
tables = read_html(filename, header=0, index_col=0, decimal='M')
len(tables)
Out[18]:
Select tables[1], which is the second table on the page.
In [19]:
table1 = tables[1]
table1.head()
Out[19]:
Not all agencies and researchers provided estimates for the same dates. Again, NaN is the special value that indicates missing data.
In [20]:
table1.tail()
Out[20]:
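A small placeholder table with gaps shows how Pandas treats NaN: it can be detected with isna, and summary methods such as count and mean skip it by default. The column names A and B are hypothetical.

```python
import numpy as np
import pandas as pd

# Placeholder table: two sources that report different years
df = pd.DataFrame({'A': [1.0, np.nan, 3.0],
                   'B': [np.nan, 2.0, 4.0]},
                  index=[0, 500, 1000])

print(df.isna())   # True where a value is missing
print(df.count())  # non-missing values per column
print(df.mean())   # missing values are skipped by default
```

Because NaN is not equal to itself, comparisons like x == np.nan are always False; isna is the reliable test.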
Again, we'll replace the long column names with more convenient abbreviations.
In [21]:
table1.columns = ['PRB', 'UN', 'Maddison', 'HYDE', 'Tanton',
                  'Biraben', 'McEvedy & Jones', 'Thomlinson', 'Durand', 'Clark']
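Assigning a list to columns matches the new names to the existing columns by position, so the list needs exactly one name per column. This sketch uses hypothetical long names in place of the Wikipedia headers, and also shows rename with a dict, which matches by label instead of position.

```python
import pandas as pd

# Hypothetical long column names standing in for the Wikipedia headers
df = pd.DataFrame({'Population Reference Bureau estimate': [1.0],
                   'United Nations estimate': [2.0]})

# Option 1: assign a new list; names are matched to columns by position
df.columns = ['PRB', 'UN']

# The list must have exactly one name per column, otherwise ValueError
mismatch_error = None
try:
    df.columns = ['PRB']
except ValueError as err:
    mismatch_error = err
print(mismatch_error)

# Option 2: rename by label with a dict, which does not depend on order
df2 = pd.DataFrame({'Population Reference Bureau estimate': [1.0],
                    'United Nations estimate': [2.0]})
df2 = df2.rename(columns={'United Nations estimate': 'UN'})
print(list(df2.columns))
```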
Some of the estimates are in a form Pandas doesn't recognize as numbers, but we can coerce them to be numeric; entries that still can't be parsed become NaN.
In [22]:
for col in table1.columns:
    # Unparseable entries become NaN instead of raising an error
    table1[col] = pd.to_numeric(table1[col], errors='coerce')
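The effect of errors='coerce' is easiest to see on a few placeholder strings of the kind scraped tables can contain (these are made up, not taken from the page). The sketch also shows the per-column loop written as a single DataFrame.apply call, which forwards the errors keyword to pd.to_numeric.

```python
import pandas as pd

# Placeholder strings of the kind scraped tables can contain
raw = pd.Series(['1.5', '200', '5-10', 'n/a', None])

# Default errors='raise': an unparseable entry raises ValueError
raise_error = None
try:
    pd.to_numeric(raw)
except ValueError as err:
    raise_error = err

# errors='coerce': unparseable entries become NaN instead
nums = pd.to_numeric(raw, errors='coerce')
print(nums)

# The same conversion applied to every column of a DataFrame at once
df = pd.DataFrame({'x': ['1', 'a'], 'y': ['2.5', '3']})
converted = df.apply(pd.to_numeric, errors='coerce')
print(converted)
```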
Here are the results. Notice that we are working in millions now, not billions.
In [23]:
table1.plot()
decorate(xlim=[-10000, 2000], xlabel='Year',
         ylabel='World population (millions)',
         title='Prehistoric population estimates')
plt.legend(fontsize='small');
We can use xlim to zoom in on everything after Year 0.
In [24]:
table1.plot()
decorate(xlim=[0, 2000], xlabel='Year',
         ylabel='World population (millions)',
         title='CE population estimates')
plt.legend(fontsize='small');
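Zooming with xlim changes only the visible range of the axes; the underlying data are untouched. This sketch sets the limit directly with Matplotlib's Axes.set_xlim rather than modsim's decorate, on a placeholder Series indexed by year, and contrasts it with selecting the rows by label.

```python
import matplotlib
matplotlib.use('Agg')  # draw off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder series indexed by year (not real estimates)
s = pd.Series([1.0, 2.0, 4.0, 8.0], index=[-5000, 0, 1000, 1900])

ax = s.plot()
ax.set_xlim(0, 2000)  # zoom the view; the Series still has all 4 points
print(ax.get_xlim())

# Alternative: select the rows from Year 0 on instead of zooming the axes
after_year_0 = s.loc[0:]
print(after_year_0)

plt.close(ax.figure)
```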
See if you can find a model that fits these data well from Year 0 to 1950.
How well does your best model predict actual population growth from 1950 to the present?
In [25]:
# Solution
# The function I found that best matches the data has the form
# a + b / (c - x)
# This function is hard to explain physically; that is, it doesn't
# correspond to a growth model that makes sense in terms of human behavior.
# And it implies that the population goes to infinity in 2040.
xs = linspace(100, 1950)
ys = 110 + 200000 / (2040 - xs)
table1.plot()
plot(xs, ys, color='gray', label='model')
decorate(xlim=[0, 2000], xlabel='Year',
         ylabel='World population (millions)',
         title='CE population estimates')
plt.legend(fontsize='small');
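The model in the solution above, a + b / (c - x), can be wrapped in a function to see why it implies the population goes to infinity in 2040: the denominator c - x shrinks toward zero as x approaches c. The function name hyperbolic_model is hypothetical; the default parameters are the values used in the solution cell, in millions.

```python
import numpy as np

def hyperbolic_model(x, a=110.0, b=200000.0, c=2040.0):
    """Population in millions for year x, using a + b / (c - x).

    The defaults are the parameter values from the solution cell.
    At x == c the denominator is zero and the model is undefined.
    """
    return a + b / (c - np.asarray(x, dtype=float))

# Predictions grow faster and faster as the year approaches c = 2040
years = np.array([0, 1000, 1900, 1950, 2030, 2039])
print(hyperbolic_model(years))
```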
In [26]:
# Solution
# And it doesn't do a particularly good job of predicting
# actual growth from 1950 to the present.
plot(census, ':', label='US Census')
plot(un, '--', label='UN DESA')
xs = linspace(1940, 2020)
ys = 110 + 200000 / (2040 - xs)
# The model is in millions; divide by 1000 to compare with estimates in billions
plot(xs, ys/1000, color='gray', label='model')
decorate(xlim=[1950, 2016], xlabel='Year',
         ylabel='World population (billions)',
         title='Model prediction and modern estimates')
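To quantify the mismatch rather than judge it by eye, one option is to evaluate the model on integer years as a Series and subtract it from the estimates. This is a sketch under the assumption that census and un are indexed by integer years; subtraction between Series aligns on index labels, and years present in only one Series give NaN. The observed values below are placeholders, not real estimates.

```python
import numpy as np
import pandas as pd

def model_millions(year, a=110.0, b=200000.0, c=2040.0):
    # Same a + b / (c - x) form and parameters as the solution cells
    return a + b / (c - year)

# Model prediction in billions, indexed by the years shown in the plot
years = np.arange(1950, 2017)
prediction = pd.Series(model_millions(years.astype(float)) / 1000,
                       index=years, name='model (billions)')
print(prediction.loc[[1950, 1980, 2016]])

# Placeholder observations, not real estimates; with real data this
# would be, for example, census - prediction
observed = pd.Series([1.0, 2.0, 3.0], index=[1950, 1980, 2020])
errors = observed - prediction  # aligned by year; 2020 has no prediction
print(errors.dropna())
```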