This lesson was developed using materials from the Computational Physics book by Mark Newman at University of Michigan and materials prepared by Matt Moelter and Jodi Christiansen for PHYS 202 at Cal Poly.
Instructions: Create a new directory called MonteCarlo with a notebook called MonteCarloTour. Give it a heading 1 cell title Monte Carlo Sampling and Simulation. Read this page, typing in the code in the code cells and executing them as you go.
Do not copy/paste.
Type the commands yourself to get the practice doing it. This will also slow you down so you can think about the commands and what they are doing as you type them.
Save your notebook when you are done, then try the accompanying exercises.
In [ ]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
We've seen how to produce statistically random numbers distributed uniformly over some interval, but what if we wanted to increase the number of samples in a particular region of $x$? What if we wanted the probability to be distributed according to some function, $f(x)$? The probability of getting a particular value $x_i$ should depend on the value of $f(x_i)$. We have two primary ways to go about this:
Rejection Sampling is similar to what we did when we learned about "Hit or Miss" Monte Carlo integration. As a reminder, the crux of it for one-dimensional functions is that we (1) generated a pair of random numbers $(x_i,y_i)$ within a box around our function, $f(x)$, then (2) checked to see whether the value of $y_i$ was greater than or less than the value $f(x_i)$. If $y_i$ was less than $f(x_i)$ we kept the values $(x_i,y_i)$ and if $y_i$ was greater than $f(x_i)$ we threw them out. Over many iterations, $N$, we "filled in" the area defined by the function, which allowed us to compute the integral numerically by multiplying the area of the box times the ratio of the number of hits to (hits+misses).
Rejection sampling works the same way, only instead of computing the integral, we use the $x_i$ samples as a discrete approximation of the original function. Here is a histogram of the number of times a particular value of $x_i$ was selected ("hits") using rejection sampling for the function in the previous figure, normalized and plotted with the original function.
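As a concrete illustration (a minimal sketch, not part of the original lesson, using the $f(x) = \sin(x)/2$ distribution introduced below; the variable names are illustrative), rejection sampling looks like this in code:
In [ ]:
#rejection sampling sketch for f(x) = sin(x)/2 on [0, pi]
#the bounding box is 0 <= x <= pi, 0 <= y <= 0.5 (the maximum of f)
f = lambda x: np.sin(x)/2.
N = 10000
xr = np.random.uniform(0., np.pi, N)   #random x values inside the box
yr = np.random.uniform(0., 0.5, N)     #random y values inside the box
keep = yr < f(xr)                      #accept the points that fall under the curve
print("acceptance fraction =", keep.mean())
xx = np.linspace(0., np.pi, 200)
plt.hist(xr[keep], bins=50, density=True, label="Accepted samples")
plt.plot(xx, f(xx), lw=3, color='red', label="True distribution")
plt.legend()
plt.show()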
We could apply this method with any function to be able to sample randomly from it, but there are some drawbacks. This is a somewhat inefficient method, since we must generate two random numbers for every possible sample point and we reject many of the samples. In this example, a large fraction of the area of the box lies above the function boundary. There are ways to redefine the shape of the box to better match the shape of the function, but these methods can get complicated.
John von Neumann (1903-1957), the eminent mathematician, devised an efficient and elegant method to circumvent the rejections. This consists of formulating the cumulative distribution function (CDF) $F(x)$ for our function $f(x)$, inverting the CDF to get $F^{-1} = x(F)$, sampling numbers $z$ from the uniform distribution U[0,1] and using the inverse to find $x=F^{-1}(z)$. That's a mouthful. Let's break it down.
CDF sounds horrible, but it is really pretty straightforward. It is just the fraction of the probability distribution for a function $f(x)$ that lies below a particular value $x_i$ - the running integrated probability. $F(x)$ ranges from 0 at the lower bound of the function's range to 1 at the upper bound. It can be computed analytically if a function can be integrated analytically, or numerically otherwise. Here is an analytic example:
Consider the function
$$f(x) = \begin{cases} \sin(x)/2 & 0 \le x \le \pi \\\\ 0 & \text{otherwise} \end{cases}$$The cumulative distribution function F(x) in this case is just the integral of $f(x)$ below the bound:
$$F(x) = \int_{-\infty}^x f(t)dt = \begin{cases} 0 & x \lt 0 \\\\ (\cos(0)-\cos(x))/2 & 0 \le x \le \pi \\\\ 1 & x \gt \pi \end{cases}$$Let's plot them together.
In [ ]:
x = np.arange(0.,np.pi+0.001,0.01)
f = lambda x: np.sin(x)/2.
F = lambda x: (np.cos(0)-np.cos(x))/2.
plt.plot(x,f(x),label='True distribution',lw=3)
plt.plot(x,F(x),label='Cumulative function',lw=3)
plt.xlim(0,3.5)
plt.ylim(0,1.1)
plt.legend(loc=0)
plt.show()
The CDF represents the probability for a value $x_i$ to lie below a given point $x_0$ within the bounds of the function. In this case, where the distribution is symmetric about $\pi/2$, we would expect half of the values we generate to lie below $x_0 = \pi/2$. The value of the CDF at $\pi/2$ should be 0.5, as the figure confirms. The full distribution lies below $x_0 = \pi$, so the probability of obtaining a value for $x_i$ that is less than $\pi$ should be 100%, again as the plot of the CDF shows. To see what this means, run the interact object below and observe how the plot changes as we increase the $x_0$ value.
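As a quick numerical check (a small snippet, not part of the original lesson, reusing the F defined in the cell above), the CDF values at $\pi/2$ and $\pi$ come out as expected:
In [ ]:
#quick check of the CDF values discussed above (uses F from the previous cell)
print("F(pi/2) =", F(np.pi/2))   #should be 0.5
print("F(pi)   =", F(np.pi))     #should be 1.0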
In [ ]:
from ipywidgets import interact, interactive
from scipy.integrate import trapz,cumtrapz
In [ ]:
def cdf_viewer(xmax=0.95):
    #Plot the full function first
    x = np.arange(0., np.pi + 0.001, 0.01)
    f = lambda x: np.sin(x)/2.
    plt.plot(x, f(x), label='True distribution', lw=3)
    #Now show the CDF up to the xmax limit
    xcdf = np.arange(0., xmax + 0.001, 0.01)
    F = lambda x: (np.cos(0) - np.cos(x))/2.
    plt.plot(xcdf, F(xcdf), label='Cumulative function', lw=3)
    plt.plot((xmax, xmax), (0., f(xmax)), "b--")
    plt.fill_between(xcdf, 0., f(xcdf), alpha=0.1)
    #Use trapz with the true distribution as a comparison, but only integrate
    #up to xmax
    total = trapz(f(x), x)
    area = trapz(f(xcdf), xcdf)
    frac = area/total
    plt.text(2.0, 0.6, "CDF value = %.3f" % F(xmax), fontsize=12)
    plt.text(2.0, 0.5, "Shaded area = %.3f" % frac, fontsize=12)
    plt.xlim(0, 3.5)
    plt.ylim(0, 1.1)
    plt.legend(loc="upper left")
    plt.show()
v = interact(cdf_viewer,xmax=(0.,np.pi,0.01))
The endpoint of the CDF represents the fraction of the total area under the curve up to that point - the shaded region. The farther in $x$ we go, the more area accumulated, until we reach the total of 1 at the end of the distribution.
To produce a set of samples that has the same probability distribution as f(x), generate $N$ uniform samples, $z_i$, on U[0,1]. For each sample, $z_i$, compute $x_i = F^{-1}(z_i)$.
Inverting the CDF to solve for $F^{-1}(z)$ gives
$$x_i = F^{-1}(z_i) = \cos^{-1}(\cos(0) - 2z_i)$$Now let's implement that and plot the result:
In [ ]:
#number of samples
N = 10000
#N random numbers between 0 and 1
z = np.random.random_sample(N)
#inverse CDF
Finv = lambda z: np.arccos(np.cos(0) - 2*z)
#sampled x values
xsamples = Finv(z)
#how do they compare to the original?
plt.hist(xsamples, 50, density=True, label="Sampled distribution")
plt.plot(x, f(x), label='True distribution', lw=3, color='red')
plt.xlim(0, np.pi)
plt.ylim(0, 0.71)
plt.legend(loc=0)
plt.show()
The samples have the same shape as the original function!
The power of this method is that we only had to generate one random number $z_i$ for every sample $x_i$, and there are no rejections.
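Beyond the visual comparison, one optional check (a sketch, assuming SciPy is available; not part of the original lesson) is a Kolmogorov-Smirnov test of the samples against the analytic CDF:
In [ ]:
#Kolmogorov-Smirnov test of the samples against the analytic CDF F(x) = (1 - cos(x))/2
from scipy.stats import kstest
D, p = kstest(xsamples, lambda t: (1. - np.cos(t))/2.)
print("KS statistic = %.4f, p-value = %.3f" % (D, p))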
Here is another analytic example:
Consider the density function
$$f(x) = \begin{cases} \lambda e^{-\lambda x} & 0 \le x \le \infty \\\\ 0 & \text{otherwise}\end{cases}$$The CDF $F(x)$ for this function is
$$ F(x) = 1-e^{-\lambda x} $$and its inverse is given by
$$ F^{-1}(z) = -\ln(1-z)/\lambda $$We can compute the function and CDF over the range $0 \le x \le 5$, with $\lambda$ = 1 and then generate 10,000 samples according to the distribution of $f(x)$.
In [ ]:
#Set up the functions and plot
x = np.arange(0.,10.01,0.01)
lam = 1
f = lambda x: lam*np.exp(-lam*x)
F = lambda x: 1 - np.exp(-lam*x)
Finv = lambda z: -np.log(1 - z)/lam
plt.plot(x, f(x),label=r'$f(x)=\lambda e^{-\lambda x}$')
plt.plot(x, F(x), linewidth=5, label=r'$F(x)=1-e^{-\lambda x}$')
plt.ylim(0,1.5)
plt.xlim(0,5.)
plt.xlabel(r"$x$",fontsize = 20)
plt.ylabel(r'$f(x),F(x)$',fontsize=20)
plt.legend(loc="upper left",fontsize=15)
plt.show()
In [ ]:
#Generate the sampled distribution
N = 10000
z = np.random.random_sample(N)
xsamples = Finv(z)
plt.figure(1,(10,10))
plt.subplot(2,1,1)
plt.hist(xsamples, 50, density=True, label="Sampled Distribution")
plt.plot(x, f(x), label="True Distribution", lw=3, color='red')
plt.xlabel(r"$x$",fontsize = 20)
plt.ylabel(r'$f(x) = \lambda e^{-\lambda x}$',fontsize=20)
plt.legend(loc='best')
#Zoom in on the tail
plt.subplot(2,1,2)
plt.semilogy(x, f(x), label="True Distribution", lw=3, color='red')
n, b, p = plt.hist(xsamples, 50, density=True, label="Sampled Distribution")
plt.xlabel(r"$x$",fontsize = 20)
plt.ylabel(r'$f(x) = \lambda e^{-\lambda x}$',fontsize=20)
plt.legend(loc='best')
plt.show()
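As a quick sanity check (a small snippet, not part of the original lesson): the mean of an exponential distribution with rate $\lambda$ is $1/\lambda$, so the sample mean should be close to 1 here.
In [ ]:
#the mean of an exponential distribution with rate lambda is 1/lambda
print("sample mean = %.3f (expected %.3f)" % (xsamples.mean(), 1./lam))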
It turns out that the function we wish to sample from need not be analytically solvable for $F(x)$ and $F^{-1}$. We can also determine the CDF and its inverse numerically and then obtain sampled values from that numerical inverse. In some ways it is actually conceptually simpler. The steps of the algorithm are as follows:
1. Define the numerical array that describes the distribution.
2. Compute the CDF as an array numerically, using the cumulative sum of the distribution array and normalizing each element by the integral (sum) of the distribution.
3. Generate an array of uniform random values $z$ on the interval U[0,1].
4. Find the index of the value in the CDF array that corresponds to a particular $z_i$.
5. Use the array index of the CDF value from the previous step to determine the sampled $x_i$ value for that $z_i$ value.
Let's try this out for the exponential function, $f(x) = \lambda e^{-\lambda x}$, with $\lambda$ = 1. We can use NumPy's cumsum (or scipy.integrate.cumtrapz) and searchsorted to implement these steps.
In [ ]:
#create the arrays (not functions)
x = np.arange(0., 5., 0.002)
f = np.exp(-x)
#This is an array between 0 and 1 (because of the normalization f.sum())
#with the values determined by the original function
CDF = np.cumsum(f)/f.sum()
#Get a random number r between 0 and 1
r = np.random.uniform()
print("Random number =", r)
#Find where in the CDF array this r falls and return the x value at that location in the x array
xloc = x[CDF.searchsorted(r)]
print("x location =", xloc)
#Now do it 10000 times
N = 10000
xsamples = np.zeros(N, float)
for i in range(N):
    rr = np.random.uniform()
    xsamples[i] = x[CDF.searchsorted(rr)]
#Make a histogram of the xsamples and compare to the original function
plt.plot(x, f, label='True distribution', lw=3, color='red')
plt.plot(x, CDF, label="Cumulative distribution", lw=3, color='green')
plt.hist(xsamples, bins=100, density=True, label="Sampled distribution", alpha=0.5)
#Plot the example point
plt.plot(xloc, r, 'g*', markersize=10)
plt.plot((xloc, xloc), (0., r), "k--", (0., xloc), (r, r), "k--")
plt.ylim(0, 1.5)
plt.xlim(0, 5.)
plt.legend()
plt.show()
Run it a few times until you can see the example point clearly. The important conclusion is that we can now create a set of sampled x values whose frequency matches the shape of any arbitrary function.
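To emphasize the "arbitrary function" point, here is a short sketch (using a made-up bimodal shape that is not part of the lesson) applying the same cumsum/searchsorted recipe:
In [ ]:
#a made-up bimodal distribution: two Gaussian bumps of different heights
x2 = np.arange(-6., 6., 0.002)
f2 = np.exp(-(x2 - 2.)**2) + 0.5*np.exp(-(x2 + 2.)**2)
CDF2 = np.cumsum(f2)/f2.sum()
z2 = np.random.random_sample(10000)
samples2 = x2[CDF2.searchsorted(z2)]
plt.hist(samples2, bins=100, density=True, alpha=0.5, label="Sampled distribution")
plt.plot(x2, f2/np.trapz(f2, x2), lw=3, color='red', label="True distribution (normalized)")
plt.legend()
plt.show()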
If you look at the help for the numpy.random module, you will see that there is a set of standard built-in distributions that you can sample from. These use methods such as the inverse CDF transform with exact solutions where possible, and other more advanced techniques when necessary, to produce statistically sound samples. Whenever you can, use the built-in functions. When a particular function is not available, you can use this technique. The numerical method can also be adapted to more than one dimension, or rejection sampling can be used in those cases.
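For example (a minimal sketch, not from the original lesson), the exponential distribution we sampled by hand above is available directly, along with many others:
In [ ]:
#built-in samplers for some standard distributions
exp_samples = np.random.exponential(scale=1.0, size=10000)   #scale = 1/lambda
norm_samples = np.random.normal(loc=0.0, scale=1.0, size=10000)
plt.hist(exp_samples, bins=50, density=True, alpha=0.5, label="np.random.exponential")
plt.hist(norm_samples, bins=50, density=True, alpha=0.5, label="np.random.normal")
plt.legend()
plt.show()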
There are many areas of physics where it is useful to simulate a phenomenon before we conduct an experiment, to help guide the experimental design and procedure. There are also cases where we use simulations of an experiment after the fact to correct for any instrumental biases.
For example, in high energy particle and nuclear collisions, we need to know how well our detectors "see" the particles produced in the collisions so we can optimize their design to catch as many as possible. We can use Monte Carlo sampling to simulate the stochastic nature of the interaction of particles with matter to model the "response" of our detector to particles of a known type and energy. The input to the simulation represents the "true" distribution of particles, while the output corresponds to an estimate of the "observed" distribution of particles, assuming that we have accurately characterized the physics of the interactions in our simulation. The ratio of input to output can be used as a correction factor to estimate the "true" distribution of particles from the "observed" distribution in a real experiment. Obviously, reliable simulations are essential to producing good results.
The GEANT4 package, written in C++, and its Fortran 77 precursor GEANT3, are toolkits for simulating the passage of particles through matter. The simulations use Monte Carlo sampling to approximate the inherently stochastic processes that govern particle interactions. All of the LHC experiments and many other high energy particle and nuclear physics experiments rely on it. The physics in the package has been tested repeatedly against benchmark experimental data to validate its output. It is a complex program that can be daunting when you first start using it, so many concrete examples are provided with the software to show how it can be used in different contexts. Assessing radiation damage and the interaction of particle beams with human tissue for medical applications are two notable examples.
Don't worry, we won't be learning GEANT4 in this class, but I present it as a classic example of a Monte Carlo sampling simulator that is in common use. For our purposes, a simple example from your introductory Modern Physics course will be more instructive.
Let’s consider the double-slit experiment as an example of the Monte Carlo simulation technique. In the lab, we relate the intensity of the observed beam (either photons or electrons) to the sum of the two waves, one from each slit. Each slit gives us a diffraction pattern,
$$ I_{SS_{diffraction}} = \text{sinc}^2(a x / \lambda L) $$where $\text{sinc}(x) = \sin(\pi x)/(\pi x)$ is the normalized sinc function.
The double slit, however, is not just the sum of the two single slits, but rather includes an interference term,
$$ I_{DS_{interference}} = \cos^2(2\pi d x/\lambda L) $$due to the wave-nature of the photons or electrons.
The observed intensity combines both effects: it is the product of the single-slit diffraction term and the double-slit interference term.
$$ I_{DS_{total}} = I_{SS_{diffraction}} \times I_{DS_{interference}} $$
Here is a diagram to illustrate the concept and define the variables.
The intensity on the screen will look something like this:
Now, let’s do the quantum mechanics problem. What if we let just one photon or electron pass through the slit at a time? What would we see on the screen?
Instead of seeing the addition of waves, we’d see the location of the individual photon or electron. Because $E = h\nu$, the intensity plotted above is also the un-normalized probability distribution of finding a photon or electron at any single location.
To simulate this experiment, we’ll define the experimental parameters, create the probability distribution, and then throw random numbers to distribute photons based on their probability. To make it awesome, we'll set the parameters up as an interactive widget so we can explore the system in detail.
Let the initial distance between the slits be $d$ = 15 $\mu$m and the slit width $a$ = 10 $\mu$m. The screen is 1 m from the plate with the slits. We will use a helium-neon (He-Ne) laser with wavelength $\lambda$ = 632.8 nm. The screen size is (0.2 $\times$ 0.2) m.
We can set up the probability distribution for this situation and use the inverse CDF transform technique to generate $N$ photons, with $N$ adjustable between 0 and 10,000.
Before we create the interact, let's repackage our numerical inverse CDF transform as a reusable function called distribute1D:
In [ ]:
def distribute1D(x, prob, N):
    """takes any distribution which is directly proportional
    to the number of particles, and returns data that is
    statistically the same as the input data."""
    CDF = cumtrapz(prob)/np.sum(prob)
    xsamples = np.zeros(N, float)
    for i in range(N):
        r = np.random.ranf()
        xsamples[i] = x[CDF.searchsorted(r)]
    return xsamples
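Before applying it to the double-slit intensity, a quick sanity check (a short sketch reusing the $\sin(x)/2$ distribution from earlier, not part of the original lesson) confirms that distribute1D reproduces a known shape:
In [ ]:
#sanity check: distribute1D should reproduce the sin(x)/2 distribution from earlier
xtest = np.linspace(0., np.pi, 1000)
ftest = np.sin(xtest)/2.
samples = distribute1D(xtest, ftest, 10000)
plt.hist(samples, bins=50, density=True, alpha=0.5, label="distribute1D samples")
plt.plot(xtest, ftest, lw=3, color='red', label="True distribution")
plt.legend()
plt.show()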
Now define the double_slit function and make it interactive:
In [ ]:
#Quantum double-slit
#define the experimental parameters
#d = 15. # (micron) dist. between slits
#a = 10. # (micron) slit width.
#L = 1. # (m) dist. from slit to screen
#lam = 632.8 # (nm) He-Neon laser
def double_slit(d=15., a=10., L=3., lam=632.8, N=0):
    #convert d and a in microns to meters
    dm = d*1.e-6
    am = a*1.e-6
    #convert wavelength from nm to m
    wave = lam*1.e-9
    #create the probability distribution
    x = np.linspace(-0.2, 0.2, 10000)
    #Isingle = np.sin(np.pi*am*x/wave/L)**2./(np.pi*am*x/wave/L)**2
    Isingle = np.sinc(am*x/wave/L)**2.
    Idouble = np.cos(2*np.pi*dm*x/wave/L)**2
    Itot = Isingle*Idouble
    #generate the random photon locations on the screen
    #x according to the intensity distribution
    xsamples = distribute1D(x, Itot, N)
    #y randomly over the full screen height
    ysamples = -0.2 + 0.4*np.random.ranf(N)
    #Make subplots of the intensity and the screen distribution
    fig = plt.figure(1, (10, 6))
    plt.subplot(2, 1, 1)
    plt.plot(x, Itot)
    plt.xlim(-0.2, 0.2)
    plt.ylim(0., 1.2)
    plt.ylabel("Intensity", fontsize=20)
    plt.subplot(2, 1, 2)
    plt.xlim(-0.2, 0.2)
    plt.ylim(-0.2, 0.2)
    plt.scatter(xsamples, ysamples)
    plt.xlabel("x (m)", fontsize=20)
    plt.ylabel("y (m)", fontsize=20)

v5 = interact(double_slit, d=(1., 20., 1.), a=(5., 50., 1.), L=(1.0, 3.0),
              lam=(435., 700.), N=(0, 10000))
Observe how the interference pattern builds up as more photons are added. Does it evolve like you'd expect? What effect does varying each of the parameters have on the shape of the distributions?