From page 5 of Think Complexity:
Deterministic → stochastic: Classical models are usually deterministic, which may reflect underlying philosophical determinism, discussed in Chapter 6; complex models often feature randomness.
In order to incorporate randomness into our models, we need to understand basic distributions and learn how to work with them in Python. The notebook below covers the basic shape, parameters, and sampling of the continuous uniform, discrete uniform, normal, and Poisson distributions.
In [1]:
# Imports
import numpy
import scipy.stats
import matplotlib.pyplot as plt
# Setup seaborn for plotting
import seaborn; seaborn.set()
# Import widget methods
from ipywidgets import interact
The continuous uniform distribution is one of the most commonly utilized distributions. As its name implies, it is characterized by a uniform or equal probability of any point being drawn from the distribution. This is clear from the probability density function (PDF) below:
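In standard notation, with lower bound $a$ and upper bound $b$, the continuous uniform PDF is:

$$f(x) = \begin{cases} \dfrac{1}{b - a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$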
We can sample a continuous uniform distribution using the numpy.random.uniform
method below.
In [8]:
numpy.random.uniform(-1, 1, size=3)
Out[8]:
In the example below, we will visualize the distribution of size=100
continuous uniform samples. This particular type of visualization is called a histogram.
In [11]:
%matplotlib inline
# Sample random data
r = numpy.random.uniform(0, 1, size=100)
p = plt.hist(r)
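As a quick sanity check, we can compare the sample mean and variance of a large draw against the theoretical values for $U(a, b)$, which are $(a+b)/2$ and $(b-a)^2/12$; a minimal sketch:

```python
import numpy

# Draw a large continuous uniform sample on [0, 1]
r = numpy.random.uniform(0, 1, size=100000)

# Theoretical mean is (0 + 1) / 2 = 0.5;
# theoretical variance is (1 - 0)**2 / 12, about 0.0833
print(r.mean())  # close to 0.5
print(r.var())   # close to 1/12
```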
In the interactive tool below, we will explore how a random sample drawn from the continuous uniform distribution varies with the minimum and maximum of the range and the number of samples drawn.
Try varying the number of samples in the single digits, then slowly increase the number to 1000. How does the "smoothness" of the average sample vary? Compare to the probability density function figure above.
In [15]:
def plot_continuous_uniform(range_min=0, range_max=1, samples=100):
    """
    A continuous uniform plotter that takes min/max range and sample count.
    """
    # Check assertions
    assert range_min < range_max
    assert samples > 1
    # Sample random data
    r = numpy.random.uniform(range_min, range_max, samples)
    p = plt.hist(r)

# Call the ipython interact() method to allow us to explore the parameters and sampling
interact(plot_continuous_uniform,
         range_min=(0, 10),
         range_max=(1, 20),
         samples=(2, 1000))
Out[15]:
The discrete uniform distribution is another commonly utilized distribution. As its name implies, it is characterized by a uniform or equal probability of any point being drawn from the distribution. This is clear from the probability mass function (PMF) below:
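In standard notation, a discrete uniform distribution on the integers $a, a+1, \ldots, b$ assigns each value the same probability:

$$P(X = k) = \frac{1}{b - a + 1}, \qquad k = a, a+1, \ldots, b$$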
We can sample a discrete uniform distribution using the numpy.random.randint
method below. Note that the upper bound is exclusive.
In [17]:
numpy.random.randint(0, 10, size=3)
Out[17]:
In [18]:
# Sample random data
r = numpy.random.randint(0, 10, size=100)
p = plt.hist(r)
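Because every value in the support is equally likely, each of the ten values should occur roughly 10% of the time in a large sample; a minimal check using numpy.bincount:

```python
import numpy

# Draw many discrete uniform samples from {0, ..., 9}
# (numpy.random.randint excludes the upper bound)
r = numpy.random.randint(0, 10, size=100000)

# Count occurrences of each value; each fraction should be near 0.10
fractions = numpy.bincount(r, minlength=10) / len(r)
print(fractions)
```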
In the interactive tool below, we will explore how a random sample drawn from the discrete uniform distribution varies with the minimum and maximum of the range and the number of samples drawn.
Try varying the number of samples in the single digits, then slowly increase the number to 1000. How does the "smoothness" of the average sample vary? Compare to the probability density function figure above.
In [25]:
def plot_discrete_uniform(range_min=0, range_max=10, samples=100):
    """
    A discrete uniform plotter that takes min/max range and sample count.
    """
    # Check assertions
    assert range_min < range_max
    assert samples > 1
    # Sample random data
    r = numpy.random.randint(range_min, range_max, samples)
    p = plt.hist(r)

# Call the ipython interact() method to allow us to explore the parameters and sampling
interact(plot_discrete_uniform,
         range_min=(-10, 10),
         range_max=(-9, 20),
         samples=(2, 1000))
Out[25]:
The normal distribution, commonly referred to as the "bell curve", is one of the most commonly occurring continuous distributions in nature. It is characterized by its symmetry and its dispersion parameter, the standard deviation. About 68% of the distribution's probability mass falls within +/-1 standard deviation of the mean, and about 95% falls within +/-2 standard deviations.
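We can verify the 68/95 rule empirically on a large sample; a sketch using the same mean and standard deviation as the cells below:

```python
import numpy

# Draw a large normal sample with mean 10 and standard deviation 3
mean, sd = 10.0, 3.0
r = numpy.random.normal(mean, sd, size=100000)

# Fraction of samples within 1 and 2 standard deviations of the mean
within_1sd = numpy.mean(numpy.abs(r - mean) < 1 * sd)
within_2sd = numpy.mean(numpy.abs(r - mean) < 2 * sd)
print(within_1sd)  # close to 0.68
print(within_2sd)  # close to 0.95
```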
The normal distribution's probability density function (PDF) is below:
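In standard notation, with mean $\mu$ and standard deviation $\sigma$:

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$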
We can sample a normal distribution using the numpy.random.normal
method below.
In [28]:
numpy.random.normal(10, 3, size=3)
Out[28]:
In [34]:
# Sample random data
r = numpy.random.normal(10, 3, size=100)
p = plt.hist(r)
In the interactive tool below, we will explore how a random sample drawn from the normal distribution varies with:
In addition to a histogram, this tool also shows a kernel density estimate (KDE). We can use KDEs to provide us with estimates of probability density functions, either for analysis and comparison or to use in further generative contexts to sample new values.
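As a small standalone sketch of this idea, we can fit scipy.stats.gaussian_kde to a normal sample, evaluate the estimated density, and draw new values from the estimate (the sample size and parameters here are illustrative):

```python
import numpy
import scipy.stats

# Fit a KDE to a standard normal sample
r = numpy.random.normal(0, 1, size=5000)
kernel = scipy.stats.gaussian_kde(r)

# Evaluate the estimated density at the mean; the true standard
# normal density at 0 is about 0.399
density_at_zero = kernel(0.0)[0]
print(density_at_zero)

# Use the KDE generatively: draw new samples from the estimate
new_samples = kernel.resample(1000)
print(new_samples.shape)  # (1, 1000)
```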
In [38]:
def plot_normal(mean=0, standard_deviation=10, samples=100, window_range=100):
    # Check assertions
    assert standard_deviation > 0
    assert samples > 1
    # Sample random data and visualize
    r = numpy.random.normal(mean, standard_deviation, size=samples)
    p = plt.hist(r, density=True)
    # Calculate the kernel density estimate and overplot it on the histogram
    kernel = scipy.stats.gaussian_kde(r)
    r_range = numpy.linspace(min(r), max(r))
    plt.plot(r_range, kernel(r_range))
    # Set the x limits
    plt.xlim(min(-window_range, min(r)), max(window_range, max(r)))

# Create the widget
interact(plot_normal,
         mean=(-25, 25),
         standard_deviation=(1, 100),
         samples=(2, 1000),
         window_range=(1, 100))
Out[38]:
The Poisson distribution is, in Wikipedia's words:
a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.
The Poisson distribution's probability mass function (PMF) is below:
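In standard notation, with rate parameter $\lambda$, the probability of observing $k$ events is:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots$$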
We can sample a Poisson distribution using the numpy.random.poisson
method below.
In [40]:
numpy.random.poisson(5, size=3)
Out[40]:
In [41]:
# Sample random data
r = numpy.random.poisson(5, size=100)
p = plt.hist(r)
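A defining property of the Poisson distribution is that its mean and variance both equal the rate parameter; a quick empirical check with rate 5:

```python
import numpy

# For a Poisson distribution, the mean and variance both equal the rate
rate = 5.0
r = numpy.random.poisson(rate, size=100000)
print(r.mean())  # close to 5
print(r.var())   # close to 5
```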
In the interactive tool below, we will explore how a random sample drawn from the Poisson distribution varies with:
In addition to a histogram, this tool again shows a kernel density estimate (KDE). Compare the KDE to the probability density function above.
In [44]:
def plot_poisson(rate=5, samples=100, window_range=20):
    # Check assertions
    assert rate > 0
    assert samples > 1
    # Sample random data
    r = numpy.random.poisson(rate, size=samples)
    f = plt.figure()
    p = plt.hist(r, density=True)
    # Calculate the KDE and overplot
    kernel = scipy.stats.gaussian_kde(r)
    r_range = numpy.linspace(min(r), max(r))
    plt.plot(r_range, kernel(r_range))
    # Set the x limits
    plt.xlim(-1, max(max(r), window_range))

# Create the ipython widget
interact(plot_poisson,
         rate=(1, 100),
         samples=(2, 10000),
         window_range=(1, 100))