This tutorial uses the same example as the example introduction. We therefore start with the same imports:
In [1]:
import numpy
from matplotlib import pyplot
import chaospy
from chaospy.example import (
    coordinates, exponential_model, distribution)
Monte Carlo is the simplest of all collocation methods. It consists of the following steps:

1. Generate samples from the input distribution.
2. Evaluate the model for each sample.
3. Estimate the statistics of interest from the model evaluations using their empirical counterparts.
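In compact form, the whole procedure looks roughly like the following sketch (using the names imported above; each step is carried out properly in the cells below):

# Rough sketch of the three steps; details follow in the rest of the tutorial.
samples = distribution.sample(1000)                   # 1. draw samples
evaluations = numpy.array([exponential_model(sample)  # 2. evaluate model
                           for sample in samples.T])
mean = numpy.mean(evaluations, axis=0)                # 3. empirical statistics
variance = numpy.var(evaluations, axis=0)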
As an alternative to (pseudo-)random samples, it is also possible to use low-discrepancy sequences. These are structured sequences designed to behave like random samples, while also minimizing the gaps in the sampling, which leads to better performance.
Below we do the analysis both for classical (pseudo-)random samples and for the Halton sequence. Other low-discrepancy sequences exist as well.
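For instance, assuming they are available in your version of chaospy, other rules such as Sobol plug into the same sampling call:

# Hypothetical example: a Sobol sequence through the same interface
# (rule availability depends on the installed chaospy version).
sobol_samples = distribution.sample(100, rule="sobol")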
In [2]:
# Sampling schemes we want to include
rules = ["random", "halton"]
# Fix random seed to enforce determinism
numpy.random.seed(123)
# Generate samples for both schemes
number_of_samples = 10000
samples = {rule: distribution.sample(
    number_of_samples, rule=rule) for rule in rules}
samples["random"].shape, samples["halton"].shape
Out[2]:
The two schemes represent two different approaches: while random is designed to pass as random as well as possible, halton is designed to avoid clustering.
This is quite visible when visualizing the samples.
In [3]:
# increase figure width to better fit two plots
pyplot.rc("figure", figsize=[9, 4])
for idx, rule in enumerate(rules):
    pyplot.subplot(1, 2, idx+1)
    # Show samples as 2-dimensional scatter plot
    pyplot.scatter(*samples[rule][:, :100], marker="x", color="k")
    # Make scatter figures pretty
    pyplot.xlabel("init (normal)")
    pyplot.ylabel("rate (uniform)") if not idx else None
    pyplot.axis([1, 2, 0.1, 0.2])
    pyplot.title(rule)
pyplot.show()
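Beyond the visual impression, one crude way to quantify clustering is the smallest pairwise distance among the plotted points. Below is a rough sketch (not part of the original analysis); a smaller value indicates more clustering:

# Smallest pairwise distance among the first 100 samples of each scheme;
# random samples typically cluster more, giving a smaller minimum distance.
for rule in rules:
    points = samples[rule][:, :100].T
    diffs = points[:, None, :] - points[None, :, :]
    dists = numpy.sqrt((diffs**2).sum(axis=-1))
    dists[numpy.arange(100), numpy.arange(100)] = numpy.inf  # ignore self-distances
    print(rule, dists.min())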
In [4]:
# Evaluate model predictor for every sample
evaluations = {
    rule: numpy.array([exponential_model(sample)
                       for sample in samples[rule].T])
    for rule in rules
}
evaluations["random"].shape, evaluations["halton"].shape
Out[4]:
These model evaluations can be visualized; however, it is hard to observe the difference between the two schemes from these plots.
In [5]:
for idx, rule in enumerate(rules):
    pyplot.subplot(1, 2, idx+1)
    # Make a plot line for the first 50 evaluations
    for evals in evaluations[rule][:50]:
        pyplot.plot(coordinates, evals, "k-", alpha=0.3)
    # Make plot pretty
    pyplot.xlabel("coordinates $t$")
    pyplot.ylabel("model evaluations $u$") if not idx else None
    pyplot.title(rule)
    pyplot.axis([0, 10, 0, 2])
pyplot.show()
While the true statistics are calculated from their mathematical definitions, in Monte Carlo integration the statistics are estimated using the empirical counterpart of each metric. For mean and variance, these counterparts are the sample average and the sample variance:
$$ \mbox E(U) \approx \tfrac 1N \sum_{n=1}^N U_n \qquad \mbox{Var}(U) \approx \left(\tfrac 1N\sum_{n=1}^N U_n^2\right) - \left(\tfrac 1N \sum_{n=1}^N U_n\right)^2 $$

Numerically, this becomes:
In [6]:
for idx, rule in enumerate(rules):
    pyplot.subplot(1, 2, idx+1)
    # Estimate mean and variance
    expected = numpy.mean(evaluations[rule], axis=0)
    variance = numpy.var(evaluations[rule], axis=0)
    # Create band one standard deviation away from mean
    pyplot.fill_between(coordinates, expected-variance**0.5,
                        expected+variance**0.5, alpha=0.3,
                        color="k", label="standard deviation")
    pyplot.plot(coordinates, expected, "k-",
                label="Estimated expected value")
    # Make plot pretty
    pyplot.xlabel("coordinates $t$")
    pyplot.axis([0, 10, 0, 2])
    pyplot.title(rule)
    if idx:
        pyplot.legend()
    else:
        pyplot.ylabel("Model evaluations $u$")
pyplot.show()
It is hard to assess how well the two sampling schemes are doing from the final estimates alone; they look about the same. To compare the results, we instead do an error analysis, using the reference analytical solution and the error functions defined in the example introduction.
In [7]:
from chaospy.example import error_mean, error_variance
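As a mental model only, such an error function compares an estimate against the known analytical reference, roughly along these lines (a hypothetical sketch, not the actual definitions in chaospy.example):

# Hypothetical sketch of an error metric of this kind; the real
# error_mean/error_variance are provided by chaospy.example.
def sketch_error(estimate, reference):
    return numpy.mean(numpy.abs(estimate - reference))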
To illustrate the difference in a common format, we plot the error as a function of the number of samples used, on a log-log scale.
In [8]:
# Estimate mean and variance for varying numbers of samples
sizes = (10**numpy.linspace(1, 4, 20)).astype(int)
estimated_mean = {
    rule: [numpy.mean(evaluations[rule][:size], axis=0)
           for size in sizes] for rule in rules
}
estimated_variance = {
    rule: [numpy.var(evaluations[rule][:size], axis=0)
           for size in sizes] for rule in rules
}
for idx, rule in enumerate(rules):
    pyplot.subplot(1, 2, idx+1)
    # error plot for mean
    errors = [error_mean(mean)
              for mean in estimated_mean[rule]]
    pyplot.loglog(sizes, errors, "k-", label="mean")
    # error plot for variance
    errors = [error_variance(variance)
              for variance in estimated_variance[rule]]
    pyplot.loglog(sizes, errors, "k--", label="variance")
    # make plot pretty
    pyplot.axis([10, 10000, 1e-5, 1e-1])
    pyplot.title(rule)
    pyplot.xlabel("Number of samples")
    pyplot.legend() if idx else pyplot.ylabel("Error")
pyplot.show()
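To put a number on the convergence seen in these plots, one rough approach is to fit a straight line to the log-log error curve and read off its slope:

# Rough sketch: estimated convergence rate (slope in log-log space)
# of the mean error for each sampling scheme.
for rule in rules:
    errors = [error_mean(mean) for mean in estimated_mean[rule]]
    slope = numpy.polyfit(numpy.log(sizes), numpy.log(errors), 1)[0]
    print(rule, slope)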