```python
# Configure Jupyter so figures appear in the notebook
%matplotlib inline

# Configure Jupyter to display the assigned value after an assignment
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'

# import functions from the modsim.py module
from modsim import *
```

Each year the U.S. Atlantic Salmon Assessment Committee reports estimates of salmon populations in oceans and rivers in the northeastern United States. The reports are useful for monitoring changes in these populations, but they generally do not include predictions.

The goal of this case study is to model year-to-year changes in population, evaluate how predictable these changes are, and estimate the probability that a particular population will increase or decrease in the next 10 years.

As an example, I'll use data from page 18 of the 2017 report, which provides population estimates for the Narraguagus and Sheepscot Rivers in Maine.

At the end of this notebook, I make some suggestions for extracting data from a PDF document automatically, but for this example I will keep it simple and type it in.

Here are the population estimates for the Narraguagus River:

```python
pops = [2749, 2845, 4247, 1843, 2562, 1774, 1201, 1284, 1287, 2339, 1177, 962, 1176, 2149, 1404, 969, 1237, 1615, 1201];
```

To get this data into a Pandas Series, I'll also make a range of years to use as an index.

```python
years = range(1997, 2016)
```

And here's the series.

```python
pop_series = TimeSeries(pops, index=years, dtype=np.float64)
```
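`TimeSeries` comes from `modsim`; in this notebook it behaves like a pandas `Series` indexed by year. For readers without `modsim`, here is a minimal sketch of the same structure using plain pandas. The use of `pd.Series` is a stand-in and an assumption about how `TimeSeries` behaves, not a description of how `modsim` implements it.

```python
import pandas as pd

# Population estimates for the Narraguagus River, 1997-2015 (from the list above)
pops = [2749, 2845, 4247, 1843, 2562, 1774, 1201, 1284, 1287, 2339,
        1177, 962, 1176, 2149, 1404, 969, 1237, 1615, 1201]

# A plain pandas Series indexed by year, standing in for modsim's TimeSeries
pop_series = pd.Series(pops, index=range(1997, 2016), dtype=float)

# With an integer index, square brackets look up values by label (the year)
print(pop_series[2015])
print(pop_series.loc[1999])
```

Because the index holds years, `pop_series[2015]` means "the estimate for 2015", not "the element at position 2015".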

Here's what it looks like:

```python
def plot_population(series):
    """Plot the estimated population over time."""
    plot(series, label='Estimated population')
    decorate(xlabel='Year',
             ylabel='Population estimate',
             title='Narraguagus River',
             ylim=[0, 5000])

plot_population(pop_series)
```

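To compute the year-to-year changes, we can use `np.ediff1d`, which computes the differences between consecutive elements. The argument `to_end=0` appends a 0, so the result has the same length as the original series.

```python
abs_diffs = np.ediff1d(pop_series, to_end=0)
```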
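A small example shows what `np.ediff1d` returns with and without `to_end`, using a short made-up array:

```python
import numpy as np

values = np.array([10.0, 12.0, 9.0, 15.0])

# Differences between consecutive elements: 12-10, 9-12, 15-9
diffs = np.ediff1d(values)             # values: 2, -3, 6

# to_end=0 appends a 0, so the output has the same length as the input
padded = np.ediff1d(values, to_end=0)  # values: 2, -3, 6, 0
print(diffs, padded)
```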

We can compute relative differences by dividing by the original series elementwise.

```python
rel_diffs = abs_diffs / pop_series
```
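To make the arithmetic concrete, here is a plain-Python sketch (no `modsim` or NumPy) that computes the relative difference (p[t+1] - p[t]) / p[t] for the first four population estimates. Each value is the change from one year to the next, expressed as a fraction of the earlier year's population.

```python
# First four population estimates (1997-2000) from the data above
pops = [2749, 2845, 4247, 1843]

# Relative change from each year to the next: (next - current) / current
rel_diffs = [(nxt - cur) / cur for cur, nxt in zip(pops, pops[1:])]

for year, rate in zip(range(1997, 2000), rel_diffs):
    print(year, round(rate, 4))
```

Unlike `np.ediff1d` with `to_end=0`, this list has one fewer element than the data, so there is no trailing 0 to drop.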

Or we can use the `modsim` function `compute_rel_diff`:

```python
rel_diffs = compute_rel_diff(pop_series)
```

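These relative differences are the observed annual net growth rates. The last element, for 2015, is `0` because there is no following year to compare with, so it is not an observed rate. Let's drop it and save the rest.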

```python
rates = rel_diffs.drop(2015)
```
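pandas has a related method, `Series.pct_change`, which computes (p[t] - p[t-1]) / p[t-1] and labels each value with the *later* year. Shifting the result back one step with `shift(-1)` gives the forward-looking alignment used above, where each rate is labeled with the year it starts from. A minimal sketch using the first five estimates:

```python
import pandas as pd

# First five population estimates, indexed by year
pop = pd.Series([2749, 2845, 4247, 1843, 2562],
                index=range(1997, 2002), dtype=float)

# Forward-looking rates, labeled by the starting year: (p[t+1] - p[t]) / p[t]
forward = (pop.shift(-1) - pop) / pop

# pct_change labels each rate with the ending year; shift(-1) re-aligns it
via_pct_change = pop.pct_change().shift(-1)

# Both are NaN for the last year, which has no following observation
rates = forward.dropna()
print(rates)
```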

We can use `np.random.choice` to make a random choice from a series.

```python
np.random.choice(rates)
```
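Because `np.random.choice` draws at random, each run of the cell can return a different rate. If you want reproducible draws while developing, one option is a seeded generator created with `np.random.default_rng` and its `choice` method. This is an optional variation that the rest of the notebook does not rely on; the seed value 17 is arbitrary.

```python
import numpy as np

# Three observed growth rates (1997-1999), computed from the data above
rates = np.array([96 / 2749, 1402 / 2845, -2404 / 4247])

# Two generators created with the same seed produce the same sequence of draws
rng1 = np.random.default_rng(17)
rng2 = np.random.default_rng(17)

draws1 = rng1.choice(rates, size=5)
draws2 = rng2.choice(rates, size=5)

print(draws1)
print((draws1 == draws2).all())   # True
```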

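The simulation starts where the data leave off: `t_0` is the last year of the data, and `p_0` is the population estimate for that year.

```python
t_0 = 2015
p_0 = pop_series[t_0]
```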

Create a `System` object with variables `t_0`, `p_0`, `rates`, and `duration=10` years.

The series of observed rates is one big parameter of the model.

```python
system = System(t_0=t_0,
                p_0=p_0,
                duration=10,
                rates=rates)
```

Write an update function that takes as parameters `pop`, `t`, and `system`. It should choose a random growth rate, compute the change in population, and return the new population.

```python
# Solution

def update_func1(pop, t, system):
    """Simulate one time step.

    pop: population
    t: time step
    system: System object

    returns: population at the next time step
    """
    # Choose one of the observed growth rates at random
    rate = np.random.choice(system.rates)
    pop += rate * pop
    return pop
```

Test your update function and run it a few times.

```python
update_func1(p_0, t_0, system)
```

Here's a version of `run_simulation` that stores the results in a `TimeSeries` and returns it.

```python
def run_simulation(system, update_func):
    """Simulate a population model.

    system: System object
    update_func: function object

    returns: TimeSeries of populations
    """
    t_0 = system.t_0
    t_end = t_0 + system.duration

    results = TimeSeries()
    results[t_0] = system.p_0

    for t in linrange(t_0, t_end):
        results[t+1] = update_func(results[t], t, system)

    return results
```
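Each step multiplies the population by (1 + rate), so after `duration` steps the final population is the initial population times the product of the (1 + rate) factors. The sketch below checks this with a plain-Python loop and a fixed, hypothetical sequence of rates instead of random draws; `simulate` is a hypothetical helper, not part of `modsim`.

```python
import math

def simulate(p_0, rates):
    """Apply each growth rate in turn, using the same rule as update_func1."""
    pop = p_0
    history = [pop]
    for rate in rates:
        pop += rate * pop        # new population = old population * (1 + rate)
        history.append(pop)
    return history

# Hypothetical sequence of rates, one per year
rates = [0.10, -0.20, 0.05]
history = simulate(1201, rates)

# Closed form: p_0 * (1 + r_1) * (1 + r_2) * ...
closed_form = 1201 * math.prod(1 + r for r in rates)
print(history[-1], closed_form)
```

One consequence is that the final population depends only on which rates were drawn, not on their order; the order does change the path in between.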

Use `run_simulation` to generate a prediction for the next 10 years.

Then plot your prediction along with the original data. Your prediction should pick up where the data leave off.

```python
# Solution

results = run_simulation(system, update_func1)
plot(results, label='Simulation')
plot_population(pop_series)
```

To see how much the results vary, we can run the simulation several times and plot all of the results on the same axes.

```python
def plot_many_simulations(system, update_func, iters):
    """Runs simulations and plots the results.

    system: System object
    update_func: function object
    iters: number of simulations to run
    """
    for i in range(iters):
        results = run_simulation(system, update_func)
        plot(results, color='gray', linewidth=5, alpha=0.1)
```

The plot option `alpha=0.1` makes the lines semi-transparent, so they are darker where they overlap.

Run `plot_many_simulations` with your update function and `iters=30`. Also plot the original data.

```python
# Solution

plot_many_simulations(system, update_func1, 30)
plot_population(pop_series)
```

The results are highly variable: according to this model, the population might continue to decline over the next 10 years, or it might recover and grow rapidly!

It's hard to say how seriously we should take this model. There are many factors that influence salmon populations that are not included in the model. For example, if the population starts to grow quickly, it might be limited by available resources, predators, or fishing. If the population starts to fall, humans might restrict fishing and stock the river with farmed fish.

So these results should probably not be taken as useful predictions. However, we can still do something useful: estimate the probability that the population will increase or decrease in the next 10 years.

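Write a function called `run_many_simulations` that runs the simulation `iters` times and returns a series containing the final population from each run.

```python
def run_many_simulations(system, update_func, iters):
    """Runs simulations and reports final populations.

    system: System object
    update_func: function object
    iters: number of simulations to run

    returns: series of final populations
    """
    # FILL THIS IN
```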

```python
# Solution

def run_many_simulations(system, update_func, iters):
    """Runs simulations and reports final populations.

    system: System object
    update_func: function object
    iters: number of simulations to run

    returns: series of final populations
    """
    last_pops = ModSimSeries()

    for i in range(iters):
        results = run_simulation(system, update_func)
        last_pops[i] = get_last_value(results)

    return last_pops
```
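The same idea, running many simulations and counting how many end below the starting population, can be sketched in plain Python with the standard `random` module. This is a simplified stand-in for `run_many_simulations`; the function names and the seed are illustrative assumptions, and the rates are computed from the Narraguagus data above.

```python
import random

pops = [2749, 2845, 4247, 1843, 2562, 1774, 1201, 1284, 1287, 2339,
        1177, 962, 1176, 2149, 1404, 969, 1237, 1615, 1201]

# Observed annual growth rates: (p[t+1] - p[t]) / p[t]
rates = [(nxt - cur) / cur for cur, nxt in zip(pops, pops[1:])]

def simulate_final_pop(p_0, rates, duration, rng):
    """Run one simulation and return only the final population."""
    pop = p_0
    for _ in range(duration):
        pop += rng.choice(rates) * pop
    return pop

def estimate_prob_decline(p_0, rates, duration, iters, seed=None):
    """Fraction of simulations that end below the starting population."""
    rng = random.Random(seed)
    finals = [simulate_final_pop(p_0, rates, duration, rng) for _ in range(iters)]
    return sum(final < p_0 for final in finals) / iters

print(estimate_prob_decline(pops[-1], rates, duration=10, iters=1000, seed=1))
```

The estimate is itself random; running it with a different seed or more iterations gives a somewhat different fraction.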

Test your function by running it with `iters=5`.

```python
run_many_simulations(system, update_func1, 5)
```

Now we can run 1000 simulations and describe the distribution of the results.

```python
last_pops = run_many_simulations(system, update_func1, 1000)
last_pops.describe()
```

If we subtract the initial population, we get the distribution of net changes.

```python
net_changes = last_pops - p_0
net_changes.describe()
```

The median is negative, which indicates that the population decreases more often than it increases.

We can be more specific by counting the number of runs where `net_changes` is positive.

```python
np.sum(net_changes > 0)
```

Or we can use `mean` to compute the fraction of runs where `net_changes` is positive.

```python
np.mean(net_changes > 0)
```
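This works because comparing an array with 0 produces an array of booleans, and NumPy treats `True` as 1 and `False` as 0 when summing or averaging. A small example with hypothetical net changes:

```python
import numpy as np

# Hypothetical net changes from five simulations
net_changes = np.array([-150.0, 320.0, -40.0, 0.0, 75.0])

is_positive = net_changes > 0        # array of booleans
print(np.sum(is_positive))           # 2   (number of True values)
print(np.mean(is_positive))          # 0.4 (fraction of True values)
```

A run with no change at all counts as neither positive nor negative, so the two fractions need not add up to exactly 1.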

And here's the fraction where it's negative.

```python
np.mean(net_changes < 0)
```

There are a few ways we could improve the model.

It looks like there might be cyclic behavior in the past data, with a period of 4-5 years. We could extend the model to include this effect.

Older data might not be as relevant for prediction as newer data, so we could give more weight to newer data.

The second option is easier to implement, so let's try it.

I'll use `linspace` to create an array of "weights" for the observed rates. The probability that I choose each rate will be proportional to these weights.

The weights have to add up to 1, so I divide through by the total.

```python
weights = linspace(0, 1, len(rates))
weights /= sum(weights)
plot(weights)
decorate(xlabel='Index into the rates array',
         ylabel='Weight')
```
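To see what these weights look like, here is a small case with five rates instead of 18. The sketch uses NumPy's `np.linspace`, which `modsim`'s `linspace` is assumed to mirror here. Note that `linspace(0, 1, n)` starts at 0, so the oldest rate gets weight 0 and is never chosen.

```python
import numpy as np

# Linearly increasing weights for 5 observed rates, oldest first
weights = np.linspace(0, 1, 5)     # 0, 0.25, 0.5, 0.75, 1
total = weights.sum()              # 2.5

# Normalize so the weights form a probability distribution
weights = weights / total          # 0, 0.1, 0.2, 0.3, 0.4
print(weights, weights.sum())
```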

I'll add the weights to the `System` object, since they are parameters of the model.

```python
system.weights = weights
```

We can pass these weights to `np.random.choice` using the keyword argument `p` (see the NumPy documentation):

```python
np.random.choice(system.rates, p=system.weights)
```
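With `p=weights`, each rate is drawn with probability equal to its weight. One way to check this is to draw many samples and compare the observed frequencies with the weights. A sketch, assuming NumPy 1.17 or later for `default_rng`; the rates, weights, and seed are illustrative.

```python
import numpy as np

rates = np.array([-0.3, -0.1, 0.1, 0.3])
weights = np.array([0.1, 0.2, 0.3, 0.4])   # must sum to 1

rng = np.random.default_rng(0)
draws = rng.choice(rates, size=100_000, p=weights)

# Fraction of draws equal to each rate; should be close to the weights
freqs = np.array([np.mean(draws == r) for r in rates])
print(freqs)
```

Python's standard library has a similar function, `random.choices`, whose `weights` argument does not need to be normalized.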

Write an update function that takes the weights into account.

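```python
# Solution

def update_func2(pop, t, system):
    """Simulate one time step, favoring more recent growth rates.

    pop: population
    t: time step
    system: System object

    returns: population at the next time step
    """
    # Choose an observed rate with probability given by system.weights
    rate = np.random.choice(system.rates, p=system.weights)
    pop += rate * pop
    return pop
```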

Use `plot_many_simulations` to plot the results.

```python
# Solution

plot_many_simulations(system, update_func2, 30)
plot_population(pop_series)
```

Use `run_many_simulations` to collect the results and `describe` to summarize the distribution of net changes.

```python
# Solution

last_pops = run_many_simulations(system, update_func2, 1000)
net_changes = last_pops - p_0
net_changes.describe()
```

Does the refined model have much effect on the probability of population decline?

```python
# Solution

np.mean(net_changes < 0)
```

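As mentioned at the beginning, it is possible to extract data from a PDF document automatically instead of typing it in. One option is the `tabula-py` package, which provides a `read_pdf` function for reading tables from a PDF. Depending on the version, `read_pdf` may return a single DataFrame or a list of DataFrames.

```python
from tabula import read_pdf

df = read_pdf('data/USASAC2018-Report-30-2017-Activities-Page11.pdf')
```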