```
In [1]:
```# Configure Jupyter so figures appear in the notebook
%matplotlib inline
# Configure Jupyter to display the assigned value after an assignment
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'
# import functions from the modsim.py module
from modsim import *

```
In [2]:
```from pandas import read_html

The data directory contains a downloaded copy of https://en.wikipedia.org/wiki/World_population_estimates

The arguments of `read_html`

specify the file to read and how to interpret the tables in the file. The result, `tables`

, is a sequence of `DataFrame`

objects; `len(tables)`

reports the length of the sequence.

```
In [3]:
```filename = 'data/World_population_estimates.html'
tables = read_html(filename, header=0, index_col=0, decimal='M')
len(tables)

We can select the `DataFrame`

we want using the bracket operator. The tables are numbered from 0, so `tables[2]`

is actually the third table on the page.

`head`

selects the header and the first five rows.

```
In [4]:
```table2 = tables[2]
table2.head()

`tail`

selects the last five rows.

```
In [5]:
```table2.tail()

Long column names are awkard to work with, but we can replace them with abbreviated names.

```
In [6]:
```table2.columns = ['census', 'prb', 'un', 'maddison',
'hyde', 'tanton', 'biraben', 'mj',
'thomlinson', 'durand', 'clark']

Here's what the DataFrame looks like now.

```
In [7]:
```table2.head()

The first column, which is labeled `Year`

, is special. It is the **index** for this `DataFrame`

, which means it contains the labels for the rows.

Some of the values use scientific notation; for example, `2.544000e+09`

is shorthand for $2.544 \cdot 10^9$ or 2.544 billion.

`NaN`

is a special value that indicates missing data.

```
In [8]:
```census = table2.census
census.head()

```
In [9]:
```census.tail()

Like a `DataFrame`

, a `Series`

contains an index, which labels the rows.

`1e9`

is scientific notation for $1 \cdot 10^9$ or 1 billion.

From here on, we will work in units of billions.

```
In [10]:
```un = table2.un / 1e9
un.head()

```
In [11]:
```census = table2.census / 1e9
census.head()

Here's what these estimates look like.

```
In [12]:
```plot(census, ':', label='US Census')
plot(un, '--', label='UN DESA')
decorate(xlabel='Year',
ylabel='World population (billion)')
savefig('figs/chap05-fig01.pdf')

The following expression computes the elementwise differences between the two series, then divides through by the UN value to produce relative errors, then finds the largest element.

So the largest relative error between the estimates is about 1.3%.

```
In [13]:
```max(abs(census - un) / un) * 100

**Exercise:** Break down that expression into smaller steps and display the intermediate results, to make sure you understand how it works.

- Compute the elementwise differences,
`census - un`

- Compute the absolute differences,
`abs(census - un)`

- Compute the relative differences,
`abs(census - un) / un`

- Compute the percent differences,
`abs(census - un) / un * 100`

```
In [14]:
``````
# Solution goes here
```

```
In [15]:
``````
# Solution goes here
```

```
In [16]:
``````
# Solution goes here
```

```
In [17]:
``````
# Solution goes here
```

`max`

and `abs`

are built-in functions provided by Python, but NumPy also provides version that are a little more general. When you import `modsim`

, you get the NumPy versions of these functions.

We can select a value from a `Series`

using bracket notation. Here's the first element:

```
In [18]:
```census[1950]

And the last value.

```
In [19]:
```census[2016]

But rather than "hard code" those dates, we can get the first and last labels from the `Series`

:

```
In [20]:
```t_0 = get_first_label(census)

```
In [21]:
```t_end = get_last_label(census)

```
In [22]:
```elapsed_time = t_end - t_0

And we can get the first and last values:

```
In [23]:
```p_0 = get_first_value(census)

```
In [24]:
```p_end = get_last_value(census)

Then we can compute the average annual growth in billions of people per year.

```
In [25]:
```total_growth = p_end - p_0

```
In [26]:
```annual_growth = total_growth / elapsed_time

Now let's create a `TimeSeries`

to contain values generated by a linear growth model.

```
In [27]:
```results = TimeSeries()

`TimeSeries`

is empty, but we can initialize it so the starting value, in 1950, is the 1950 population estimated by the US Census.

```
In [28]:
```results[t_0] = census[t_0]
results

After that, the population in the model grows by a constant amount each year.

```
In [29]:
```for t in linrange(t_0, t_end):
results[t+1] = results[t] + annual_growth

Here's what the results looks like, compared to the actual data.

```
In [30]:
```plot(census, ':', label='US Census')
plot(un, '--', label='UN DESA')
plot(results, color='gray', label='model')
decorate(xlabel='Year',
ylabel='World population (billion)',
title='Constant growth')
savefig('figs/chap05-fig02.pdf')

The model fits the data pretty well after 1990, but not so well before.

**Optional Exercise:** Try fitting the model using data from 1970 to the present, and see if that does a better job.

Hint:

Copy the code from above and make a few changes. Test your code after each small change.

Make sure your

`TimeSeries`

starts in 1950, even though the estimated annual growth is based on later data.You might want to add a constant to the starting value to match the data better.

```
In [31]:
``````
# Solution goes here
```

```
In [32]:
```census.loc[1960:1970]

```
In [ ]:
```