Monte Carlo methods and simulations are philosophically related to and often incorporated into agent-based models. In the words of Wiki:
Sawilowsky[9] distinguishes between a simulation, a Monte Carlo method, and a Monte Carlo simulation: a simulation is a fictitious representation of reality, a Monte Carlo method is a technique that can be used to solve a mathematical or statistical problem, and a Monte Carlo simulation uses repeated sampling to determine the properties of some phenomenon (or behavior). Examples:
* Simulation: Drawing one pseudo-random uniform variable from the interval [0,1] can be used to simulate the tossing of a coin: If the value is less than or equal to 0.50 designate the outcome as heads, but if the value is greater than 0.50 designate the outcome as tails. This is a simulation, but not a Monte Carlo simulation. * Monte Carlo method: Pouring out a box of coins on a table, and then computing the ratio of coins that land heads versus tails is a Monte Carlo method of determining the behavior of repeated coin tosses, but it is not a simulation. * Monte Carlo simulation: Drawing a large number of pseudo-random uniform variables from the interval [0,1], and assigning values less than or equal to 0.50 as heads and greater than 0.50 as tails, is a Monte Carlo simulation of the behavior of repeatedly tossing a coin.
One classic problem in ecology and finance is hitting time, stopping time, or passage time problem. In the first hitting time problem, one models the state of some variable over time until it hits a certain condition, at which point the process stops.
For example, one might model the level of some renewable resource like a forest or fishery. So long as the resource level is greater than zero, the resource can continue to regenerate; however, if at any point the "last" unit of the resource is extracted, then the resource dies. Thus, we might simulate the survival of our resource under certain dynamics of growth or extraction. Let's walk few examples below.
In our first example, let's start by simulating a single forest under the following assumptions:
initial_resource
units of forest.[min_consumption, max_consumption)
for extraction. We decrease the forest by subtracting this amount.[min_growth, max_growth)
for growth. We increase the forest by adding this amount.
In [232]:
# Imports
import numpy
import matplotlib.pyplot as plt
import pandas
import seaborn; seaborn.set()
# Import widget methods
from IPython.html.widgets import *
In [233]:
# Set model parameters
initial_resource = 2
min_consumption = 2
max_consumption = 3
min_growth = 2
max_growth = 3
max_steps = 100000
# Set model state variables
current_resource = initial_resource
resource_history = [current_resource]
growth_history = []
consumption_history = []
# Continue running until the resource hits at or below zero
while current_resource > 0:
# Sample growth
growth = numpy.random.uniform(min_growth, max_growth)
# Sample consumption
consumption = numpy.random.uniform(min_consumption, max_consumption)
# Change forest resource level
current_resource += growth - consumption
# Keep track of history
resource_history.append(current_resource)
growth_history.append(growth)
consumption_history.append(consumption)
# Break at max steps
if len(resource_history) > max_steps:
break
In [234]:
%matplotlib inline
# Plot the time series for each component
f = plt.figure()
plt.plot(resource_history)
plt.xlabel('Time')
plt.ylabel('Resource level')
plt.title('Resource level during a single simulation')
Out[234]:
In [235]:
def run_simulation(initial_resource = 2, min_consumption = 2, max_consumption = 3,
min_growth = 2, max_growth = 3, max_steps=100000):
"""
Simulate a resource under uniform arithmetic growth and consumption.
"""
# Set model state variables
current_resource = initial_resource
resource_history = [current_resource]
growth_history = []
consumption_history = []
# Continue running until the resource hits at or below zero
while current_resource > 0:
# Sample growth
growth = numpy.random.uniform(min_growth, max_growth)
# Sample consumption
consumption = numpy.random.uniform(min_consumption, max_consumption)
# Change forest resource level
current_resource += growth - consumption
# Keep track of history
resource_history.append(current_resource)
growth_history.append(growth)
consumption_history.append(consumption)
# Break at max steps
if len(resource_history) > max_steps:
break
# Plot the time series for each component
f = plt.figure()
plt.plot(resource_history)
plt.xlabel('Time')
plt.ylabel('Resource level')
plt.title('Resource level during a single simulation')
# Call the ipython interact() method to allow us to explore the parameters and sampling
interact(run_simulation, initial_resource=(1, 1000),
min_consumption = (0, 20),
max_consumption = (0, 20),
min_growth = (0, 20),
max_growth = (0, 20))
Out[235]:
As you adjust the sliders in the applet above, you will likely see very different simulation outcomes. In some cases, the forest may die off quickly; in others, it may survive so long as to cause your computer to halt the simulation!
The important idea to understand here is that each simulation is just that - a single simulation. Regardless of what might happen in a single simulation, we can not make general claims about our system from a single simulation. Our goal is to understand the behavior of our system through the distribution of many simulations. So instead of looking at a single simulation and basing our hitting time on this simulation, we should instead generate many simulations and look at their distribution over time.
Let's do this below by improving our single simulation method to return the resource time series. Then, we can wrap this method in another method that will generate many forest simulations for analysis.
In [236]:
def run_simulation(initial_resource=1, min_consumption=0.1, max_consumption=0.2,
min_growth=0.1, max_growth=0.2, max_steps=100000):
"""
Simulate a resource under uniform arithmetic growth and consumption.
"""
# Set model state variables
current_resource = initial_resource
resource_history = [current_resource]
growth_history = []
consumption_history = []
# Continue running until the resource hits at or below zero
while current_resource > 0:
# Sample growth
growth = numpy.random.uniform(min_growth, max_growth)
# Sample consumption
consumption = numpy.random.uniform(min_consumption, max_consumption)
# Change forest resource level
current_resource += growth - consumption
# Keep track of history
resource_history.append(current_resource)
growth_history.append(growth)
consumption_history.append(consumption)
# Break at max steps
if len(resource_history) > max_steps:
break
# Plot the time series for each component
return resource_history
def run_monte_carlo(initial_resource=1, min_consumption=0.1, max_consumption=0.2,
min_growth=0.1, max_growth=0.2, max_steps=100000, num_samples = 100):
"""
Run many individual simulations and output a summary of the Monte Carlo experiment.
"""
# Sample data
sample_results = []
# Run simulations
for sample_id in range(num_samples):
sample_results.append(run_simulation(initial_resource, min_consumption, max_consumption,
min_growth, max_growth, max_steps))
return pandas.DataFrame(sample_results).T
def plot_monte_carlo_runs(initial_resource=1, min_consumption=0.1, max_consumption=0.2,
min_growth=0.1, max_growth=0.2, max_steps=100000, num_samples = 100):
# Get data and plot
simulation_data = run_monte_carlo(initial_resource, min_consumption, max_consumption,
min_growth, max_growth, max_steps, num_samples)
f = plt.figure()
simulation_data.plot(legend=False, alpha=0.5, linewidth=0.5, linestyle='-')
return simulation_data
In [237]:
%%time
# Plot one set of 1000 simulations
simulation_data = plot_monte_carlo_runs(initial_resource=1, min_consumption=0.1, max_consumption=0.2,
min_growth=0.1, max_growth=0.2, num_samples=500)
Instead of trying to make sense of all 1000 samples by plotting them simultaneously, we might instead plot the average resource level at a fixed time $t$. Additionally, we might augment this average with a view of the standard deviation or fixed percentile. In the examples below, we'll show both options.
In [238]:
# Plot the average line
f = plt.figure()
average_ts = simulation_data.fillna(0).mean(axis=1)
average_ts.plot()
Out[238]:
In [239]:
# Plot the average line
f = plt.figure()
average_ts = simulation_data.fillna(0).mean(axis=1)
average_ts.plot()
# Overplot the +/- standard deviation bounds
# N.B.: We can vary the treatment of "dead" sims using
# the .std(skipna=True/False) argument.
standard_deviation_ts = simulation_data.fillna(0).std(axis=1)
lower_bound_ts = average_ts - standard_deviation_ts
upper_bound_ts = average_ts + standard_deviation_ts
plt.fill_between(average_ts.index, lower_bound_ts, upper_bound_ts,
alpha=0.25)
Out[239]:
In [240]:
# Get the 10th, median, and 90th percentiles
bound_ts = simulation_data.fillna(0).quantile([0.25, 0.5, 0.75], axis=1).T
bound_ts.columns = ['lower', 'median', 'upper']
plt.plot(bound_ts.index, bound_ts['median'], alpha=0.5)
plt.fill_between(bound_ts.index, bound_ts['lower'], bound_ts['upper'],
alpha=0.25)
Out[240]:
In [241]:
# Get the 10th, median, and 90th percentiles
bound_ts = bound_ts.loc[0:bound_ts.sum(axis=1).idxmin(), :]
plt.plot(bound_ts.index, bound_ts['median'], alpha=0.5)
plt.fill_between(bound_ts.index, bound_ts['lower'], bound_ts['upper'],
alpha=0.25)
Out[241]:
Next, let's examine the distribution of resource levels at a fixed point in time. In the example below, we are taking a cross-section at time $t=100$ to generate our histogram.
Note that we have NA
values in our time series for steps "past" the end of a simulation. In order to include simulations that have already died out prior to $t=100$, we need to fill these NA
values with zero. We can do this with the pandas.DataFrame.fillna
method.
In [242]:
# Plot the distribution of resource values at time t=10
simulation_data.loc[100, :].fillna(0).hist()
Out[242]:
In [243]:
# Now, let's plot the distribution of first hitting times.
first_hitting_times = simulation_data.isnull().idxmax()
first_hitting_times.hist(bins=100)
Out[243]:
In [244]:
def run_simulation_fast(initial_resource=1, min_consumption=0.1, max_consumption=0.2,
min_growth=0.1, max_growth=0.2, max_steps=100000, step_size=10000):
"""
Simulate a resource under uniform arithmetic growth and consumption.
"""
# Set model state variables
current_resource = initial_resource
resource_history = [current_resource]
growth_history = []
consumption_history = []
# Continue running until the resource hits at or below zero
while resource_history[-1] > 0:
# Sample growth
growth = numpy.random.uniform(min_growth, max_growth, step_size)
# Sample consumption
consumption = numpy.random.uniform(min_consumption, max_consumption, step_size)
# Keep track of history
resource_history.extend(resource_history[-1] + numpy.cumsum(growth) - numpy.cumsum(consumption))
growth_history.extend(growth)
consumption_history.extend(consumption)
# Break at max steps or negative value
if len(resource_history) > max_steps:
break
# Truncate "extra" negative samples and return
if resource_history[-1] <= 0:
# Get first negative index
first_negative_index = numpy.min(numpy.where(numpy.sign(resource_history) < 1))
return resource_history[0:first_negative_index]
else:
# If last value was > 0, then return full history.
return resource_history
In [245]:
%%timeit -n 20
resource_history = run_simulation(initial_resource=1)
In [246]:
%%timeit -n 20
resource_history = run_simulation_fast(initial_resource=1)
In [247]:
%%timeit -n 20
resource_history = run_simulation(initial_resource=10)
In [248]:
%%timeit -n 20
resource_history = run_simulation_fast(initial_resource=10)
In the first head-to-head comparison, our initial conditions resulted in forests that had comparatively lower expected lifetimes. As a result, many of the simulations ended quickly, and the "batch" simulation of many steps resulted in "wasted" samples corresponding to dead forests that had to be thrown away.
In the second head-to-head comparison, our initial conditions resulted in forests that had much longer expected lifetimes; many of the simulations will make it to the max_steps
lifetime, at which time the simulation will halt. As a result, we were able to amortize the batch random number sampling over much larger simulation periods, and the fast method's performance was even better.
In [249]:
def run_simulation_fastest(initial_resource=1, min_consumption=0.1, max_consumption=0.2,
min_growth=0.1, max_growth=0.2, max_steps=100000):
"""
Simulate a resource under uniform arithmetic growth and consumption.
"""
# Set model state variables
resource_history = initial_resource + \
numpy.cumsum(numpy.random.uniform(min_growth, max_growth, max_steps)) - \
numpy.cumsum(numpy.random.uniform(min_consumption, max_consumption, max_steps))
# Truncate "extra" negative samples and return
if resource_history[-1] <= 0:
# Get first negative index
first_negative_index = numpy.min(numpy.where(numpy.sign(resource_history) < 1))
return resource_history[0:first_negative_index]
else:
# If last value was > 0, then return full history.
return resource_history
In [250]:
%%timeit -n 20
resource_history = run_simulation(initial_resource=1)
In [251]:
%%timeit -n 20
resource_history = run_simulation_fast(initial_resource=1)
In [252]:
%%timeit -n 20
resource_history = run_simulation_fastest(initial_resource=1)
In [253]:
%%timeit -n 20
resource_history = run_simulation(initial_resource=10)
In [254]:
%%timeit -n 20
resource_history = run_simulation_fast(initial_resource=10)
In [255]:
%%timeit -n 20
resource_history = run_simulation_fastest(initial_resource=10)
If you're recall the section on algorithmic complexity from Chapter 3 of our Thinking Complexity book, you might notice that our "fastest" method appears to have a run time that doesn't vary with initial_resource
, whereas the "normal" and "fast" methods do. In this case, we have an algorithm that is linear with max_steps
but constant with respect to all other parameters.
In general, we cannot always implement simulations in this fashion. However, here are a few hints or restatements that you might be able to improve your simulation: