A Monte Carlo simulation is any simulation that makes use of random numbers. You've already written the code for two simple Monte Carlo simulations.
A field of physics which makes significant use of Monte Carlo simulations is statistical mechanics (they are also frequently used in astrophysics; I have most often encountered them in the generation of mock catalogues used to simulate astronomical surveys).
Using statistical mechanics as an example application, a common goal is to compute the average of some quantity in a system (e.g., the pressure in a gas) in thermal equilibrium at temperature $T$ (in kelvins). Statistical mechanics theory tells us that while we cannot say specifically what state the system would be in (what all the particle speeds would be in a gas, for instance), the probability of its being in a state with energy $E_i$ is related to the "Boltzmann factor," $e^{-\beta E_i}$, via
\begin{equation} P(E_i) = \dfrac{e^{-\beta E_i}}{Z} , (1) \end{equation}where $\beta \equiv 1/k_B T$ ($k_B$ is "Boltzmann's constant") and $Z \equiv \sum_i e^{-\beta E_i}$ is the "partition function," involving a sum over all "microstates" of the system. The average value of some quantity $X$, whose value in the $i^{\mathrm{th}}$ state is $X_i$, then is
\begin{equation} \langle X \rangle = \sum_i X_i \, P(E_i) . (2) \end{equation}Usually, this sum cannot be determined analytically, so it must be computed numerically. However, it would be prohibitively time-consuming, if not impossible, to do it directly, as the number of states involved typically is "astronomically" large. MC techniques can come to the rescue here: the idea is to replace the complete sum by one over a smaller number of terms chosen randomly. If we choose $N$ of the states, and designate them by the subscript $j$, then we can write
\begin{equation} \langle X \rangle \simeq \dfrac{\sum_{j=1}^N X_j P(E_j)}{\sum_{j=1}^N P(E_j)} , (3) \end{equation}where we have to include the denominator for normalization. It was not needed in the previous expression because the probability is normalized for the full sum; i.e., $\sum_i P(E_i) = 1$ when all states $i$ are included.
It turns out that in this instance the simple approach of using a uniform sampling of states will give poor results: most of the states are associated with Boltzmann factors that are so small (since $E_j \gg k_B T$ for them) that they contribute a negligible amount to the sum. In fact, most of the sum comes from a very small subset of the terms, most or all of which are likely to be missed in a uniform random sample. Here's a situation where we should use importance sampling!
The average above is expressible, in terms of weights $w_i$ that we choose, as
\begin{equation} \langle X \rangle \simeq \dfrac{1}{N} \sum_{j=1}^N \dfrac{X_j P(E_j)}{w_j} \sum_{i} w_i , (4) \end{equation}where the last sum is over all of the states, and normally is computed analytically.
The aim now is to choose the weights $w_i$ to pick out the large $P(E_i)$ terms in the sum, and such that the sum of the $w_i$ can be determined analytically. But this is easy: we simply take $w_i = P(E_i)$ which, by Eq. (1), are normalized, $\sum_i w_i = \sum_i P(E_i) = 1$. Then, the previous expression becomes
\begin{equation} \langle X \rangle \simeq \dfrac{1}{N} \sum_{j=1}^N X_j . (5) \end{equation}This is telling us that to average $X$, we just choose $N$ states in proportion to their Boltzmann probabilities; i.e., their likelihood of occurring. (Of course, the more states we use, the more accurate the approximation.) This averaging process is a simulation of the actual physical process in that the system will find itself most often in the states having the highest probabilities, so an average of a quantity $X$ in the system will be dominated by the values in the most probable states, just as in the average we've constructed.
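To make Eqs. (1)-(5) concrete, here is a toy example (of my own devising): a two-level system with energies $E_0 = 0$ and $E_1 = \epsilon$. For such a tiny system we can compute $Z$ and the exact average energy directly, and compare with the estimate obtained by sampling states in proportion to their Boltzmann probabilities:

```python
import numpy as np

beta, eps = 1.0, 2.0            # illustrative values of 1/(k_B T) and the level spacing
E = np.array([0.0, eps])        # the two microstate energies
Z = np.sum(np.exp(-beta * E))   # partition function
P = np.exp(-beta * E) / Z       # Boltzmann probabilities, Eq. (1)

print("Exact <E> =", np.sum(E * P))  # Eq. (2)

# Eq. (5): sample states in proportion to P and simply average
samples = np.random.choice(E, size=100_000, p=P)
print("Sampled <E> =", samples.mean())
```

Of course, the whole point of what follows is that for a realistic system we cannot enumerate the states and compute $Z$ like this.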
There's a complication here: the probabilities $P(E_i)$ depend on the partition function $Z$, which we usually don't know a priori and which often is difficult to calculate numerically due to the large number of states involved, as mentioned earlier. It is, however, possible to choose the probabilities without knowing $Z$, using a powerful model for random processes known as a Markov chain, named after Russian mathematician Andrei Markov.
(ASIDE: The Wikipedia page for Markov chain lists numerous applications, including the modeling of chemical reactions, speech recognition systems, macroeconomic and population genetics models, and even Google's PageRank algorithm. Markov chains are also frequently used in astronomical applications, such as the statistics of large-scale structure, fitting cosmological parameters to data, and many others.)
A Markov chain represents a random process in which a system transitions through a sequence of connected states (analogous to links in a chain), the sequence being determined by transition probabilities $T_{ij}$ from state $i$ to state $j$ which are independent of the previous states in the chain. That is, in a Markov chain $T_{ij}$ depends only on the current state $i$ and the next state $j$.
In the present application, the states in the Markov chain will correspond to actual states of our physical system. For a long enough chain, the probability of being in a particular state will be the same as the probability of transitioning to that state. The idea, then, is to choose the $T_{ij}$ such that the probability of transitioning into any particular state $i$ on any step in the chain is the Boltzmann probability for that state, $P(E_i)$.
In order for the $T_{ij}$ to satisfy this condition, we first note that the $T_{ij}$ will be normalized according to $\sum_j T_{ij} = 1$, since the system must transition to some state from the $i^{\mathrm{th}}$ one. We then choose the $T_{ij}$ so that they satisfy
\begin{equation} \dfrac{T_{ij}}{T_{ji}} = \dfrac{P(E_j)}{P(E_i)} = \dfrac{e^{-\beta E_j} / Z}{e^{-\beta E_i} / Z} = e^{-\beta (E_j - E_i)}. \hspace{50pt} (6) \end{equation}Note that the partition function $Z$ has been eliminated from this result, so it will not need to be calculated.
It is not hard to prove (although the proof is long, so see Appendix D of Newman's Computational Physics) that using this technique, a system started in any random state will evolve to the Boltzmann distribution (if you wait long enough).
We still need to work out what $T_{ij}$ should be (the transition probabilities we will use). One common method to do this is known as the Metropolis algorithm (also known as the Metropolis-Hastings algorithm), named after Nicholas Metropolis who was involved in its development.
Suppose that we start with the system in state $i$, and consider making a small change that would put the system in state $j$; e.g., by changing the energy of one gas particle. (Note that the new state could be the same as the previous state; i.e., $T_{ii} \ne 0$, in general. This would happen, for instance, if two gas particles collided elastically and interchanged energies. The symbols $i$ and $j$ here designate steps in the Markov chain rather than particular states of the system.) The Metropolis algorithm instructs us to accept such a change with an acceptance probability
\begin{equation} P_a = \begin{cases} 1 & \mathrm{if} \hspace{1mm} E_j \le E_i , \\ e^{-\beta (E_j - E_i)} & \mathrm{if} \hspace{1mm} E_j > E_i . \end{cases} \hspace{50pt} (7) \end{equation}The idea here is that the system always will "choose" to go to a state with equal or lower energy than the current state, but it also may move to a state of higher energy, with some probability dependent on the temperature (through $\beta = 1/k_B T$). (If this didn't happen, numerical models of thermodynamic systems would always evolve directly to their ground state, which, in fact, doesn't happen due to thermal excitations.) In the second case above, the probability of transitioning to a state of higher energy decreases as the change in energy grows (the exponential factor becomes smaller as $E_j - E_i$ grows), and it also decreases as $T$ decreases (smaller $T$ leads to a larger negative exponent and therefore a smaller exponential factor). This simulated behavior is designed to match what we expect to happen physically: it's more likely for our system to jump to a higher energy if it's in a higher-temperature environment than a lower-temperature one, but the jump is less likely for a larger energy difference.
To convert the acceptance probability into a transition probability, suppose that the number of possible changes to the system is $M$. Then the probability of transitioning from state $i$ to state $j$ is the probability of choosing a particular change (which would be just $1/M$), times the probability that we accept that change. This lets us write the transition probabilities as
\begin{equation} \begin{cases} T_{ij} = \dfrac{1}{M} \times e^{-\beta (E_j - E_i)} \hspace{3mm} \mathrm{and} \hspace{3mm} T_{ji} = \dfrac{1}{M} \times 1 , & \hspace{5mm} \mathrm{if} \hspace{1mm} E_j > E_i ; \\[2ex] T_{ij} = \dfrac{1}{M} \times 1 \hspace{3mm} \mathrm{and} \hspace{3mm} T_{ji} = \dfrac{1}{M} \times e^{-\beta (E_i - E_j)} , & \hspace{5mm} \mathrm{if} \hspace{1mm} E_j \le E_i . \end{cases} \hspace{50pt} (8) \end{equation}Then, taking the first case (the second works out the same way), it is true that
\begin{equation} \frac{T_{ij}}{T_{ji}} = \frac{e^{-\beta (E_j - E_i)}/M}{1/M} = e^{-\beta (E_j - E_i)} , \hspace{50pt} (9) \end{equation}which is what we need.
Now that we have identified all of the pieces, we can spell out the Markov Chain Monte Carlo (MCMC) simulation method (AKA fancy terms for something quite simple which will make you sound very clever):
1. Choose a starting state (it must be physically allowed, but otherwise this is up to you).
2. Select a change to the system at random from among the set of possible changes, and compute the change in energy from the previous state.
3. Compute the corresponding acceptance probability $P_a$ from Equation (7) above using the energy change. Accept the change with probability $P_a$---i.e., with that probability change to the new state; otherwise, reject the change.
4. Compute and store the value of the quantity (or quantities) of interest in the current state.
5. Repeat from step 2.
Step 3 of the sequence can be implemented by using np.random.random() to obtain a random number $z$ and comparing it with $e^{-\beta (E_j - E_i)}$: if $z < e^{-\beta (E_j - E_i)}$, then the change to the new state is made; if not, the change is rejected and the system remains in the same state. (The larger $e^{-\beta (E_j - E_i)}$ is, the more likely $z$ is to be less than it, and the more likely the change is to be accepted.)
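As a minimal sketch of this acceptance test (dE and beta here are placeholder names for the energy change and $1/k_B T$ in your own code):

```python
import numpy as np

def metropolis_accept(dE, beta):
    """Return True if a proposed change with energy change dE should be accepted."""
    if dE <= 0:
        return True  # always accept a change to equal or lower energy
    # otherwise accept with probability exp(-beta*dE), per Eq. (7)
    return np.random.random() < np.exp(-beta * dE)
```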
It's important to emphasize a few points about the MCMC approach:
- Even if a change of state is rejected and the system remains in its current state, that still counts as a "step" in the Markov chain, and therefore the value of any quantity of interest that's being recorded should be recorded again (and/or again added to any sum of its values).
- In order for Eq. (9) to hold, the number of changes to the system that are available in going from state $i$ to state $j$ must be the same number that would be available in making the reverse change. This might require careful selection of the set of possible state changes.
- It must be possible eventually to get to any state; this property of the system, known as ergodicity, is a requirement for the Metropolis algorithm to work.
- While the Markov chain process always will converge to the correct Boltzmann probability distribution, there's no way to predict how many steps that will take; i.e., how long it will take for the system to equilibrate. Basically, one determines when equilibration has been achieved by looking for minimal or no changes in the output values (see the sketch below).
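One simple, informal way to check for equilibration is to compare averages of the recorded quantity over successive blocks of steps; once the block means stop drifting, the chain has probably equilibrated. A minimal sketch, assuming the recorded values are stored in a list called values (a name chosen here purely for illustration):

```python
import numpy as np

def block_means(values, nblocks=10):
    """Split the recorded values into equal blocks and return each block's mean."""
    return [b.mean() for b in np.array_split(np.asarray(values), nblocks)]
```

If the later block means agree to within their scatter, the earlier blocks can be discarded as "burn-in" before computing averages.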
In an ideal gas of $N$ atoms, the total energy of the system is the sum of the energies of the particles,
$$ E = \sum^N_{i=1} E(n_x^{(i)},n_y^{(i)},n_z^{(i)}),$$where, in a cubic box of side length $L$, an atom of mass $m$ has energy given by
$$ E(n_x,n_y,n_z) = \frac{\pi^2 \hbar^2}{2mL^2} (n_x^2 + n_y^2 +n_z^2) $$For our MCMC we will choose to: pick one of the $N$ particles at random, pick one of its three quantum numbers ($n_x$, $n_y$, or $n_z$) at random, and propose changing it by $\pm 1$ with equal probability (any move that would take a quantum number below 1 is disallowed).
When this happens, the change in energy is (for example, for $n_x$ changing to $n_x \pm 1$):
$$ \Delta E = \frac{\pi^2 \hbar^2}{2mL^2} ((n_x \pm 1)^2 + n_y^2 +n_z^2) - \frac{\pi^2 \hbar^2}{2mL^2} (n_x^2 + n_y^2 +n_z^2) $$which simplifies to:
$$ \Delta E = \frac{\pi^2 \hbar^2}{2mL^2} (\pm 2n_x + 1). $$(and the equivalent for changes in $n_y$ or $n_z$).
We will simulate $N = 1000$ particles at temperature $T = 10$ and work in theorist's units ($k_B = m = \hbar = 1$, and $L = 1$).
We will start in the "ground state," $n_x = n_y = n_z = 1$ for every particle.
In [9]:
import numpy as np
import numpy.random as rnd
import matplotlib.pyplot as plt
# This is for notebook display
%matplotlib inline

T = 10.0        # temperature (k_B = 1)
N = 1000        # number of particles
steps = 250000  # number of Markov chain steps

# Array of integers to store the quantum numbers (n_x, n_y, n_z) of each particle
n = np.ones([N, 3], int)

# Main loop
eplot = []
E = 3 * N * np.pi * np.pi / 2  # ground-state energy: every particle has n_x = n_y = n_z = 1
for k in range(steps):
    # Choose the particle and the quantum number to change
    i = rnd.randint(N)  # random integer from 0 to N-1
    j = rnd.randint(3)  # random integer from 0 to 2
    # Choose whether to go +1 or -1 (50-50)
    if rnd.random() < 0.5:
        dn = 1
        dE = (2 * n[i, j] + 1) * np.pi * np.pi / 2
    else:
        dn = -1
        dE = (-2 * n[i, j] + 1) * np.pi * np.pi / 2
    # Decide whether the move should be accepted.
    # A move is only legal if it doesn't take a quantum number below the
    # ground state (n = 1); an increase (dn = 1) is always legal.
    if n[i, j] > 1 or dn == 1:
        # Accept with the Metropolis probability; exp(-dE/T) > 1 when dE < 0,
        # so moves that lower the energy are always accepted
        if rnd.random() < np.exp(-dE / T):
            n[i, j] += dn
            E += dE
    # Save the energy (every step counts, even when the move is rejected)
    eplot.append(E)

# Make a graph of energy against step number
plt.plot(eplot)
plt.ylabel("Energy")
plt.xlabel("Step number")
plt.show()
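Once the energy trace has flattened out, the chain has (roughly) equilibrated, and Eq. (5) says we can estimate the average energy by simply averaging the recorded values. A minimal sketch, assuming (somewhat arbitrarily; check the plot!) that the first half of the chain is discarded as burn-in:

```python
burn_in = steps // 2  # assumed burn-in length; adjust after inspecting the plot
print("Estimated <E> =", np.mean(eplot[burn_in:]))
```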
Comments:
Definitions:
To summarize:
Annealing is the process of cooling materials slowly enough that their atoms/molecules are able to order themselves before the material cools enough to freeze. This process has been used for centuries by glass and metal workers to create stronger materials.
Simulated annealing refers to doing this in your code. It provides a way to identify global minima in systems with complex likelihood surfaces.
The process is as follows: treat the quantity you want to minimize as an "energy," run a Metropolis MCMC starting at a temperature high enough for the system to move freely between states, and then lower the temperature slowly toward zero, so that the system settles into (ideally) the global minimum.
Many of the applications of this process are not in physics but in real-world simulations with many parameters, like optimizing car fuel consumption, traveling salesman problems, etc.
So how slow is slow enough? Well, there is no correct answer. Often the system is set up with a temperature high enough that it reaches equilibrium very quickly, and then the temperature is dropped as
$$ T = T_0 e^{-t/\tau}$$where $\tau$ is chosen by trial and error. This is another place in computational physics where you have to make a trade-off between accuracy (the best answer) and computing power: if $\tau$ is too large the code may take too long to run, but if it's too small you may not get to the global minimum.
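A sketch of how such an exponential cooling schedule typically appears in the main loop (the values of Tmax, Tmin, and tau are illustrative):

```python
import numpy as np

Tmax, Tmin, tau = 10.0, 1e-3, 2e4  # illustrative values; tune tau by trial and error

t, T = 0, Tmax
while T > Tmin:
    t += 1
    T = Tmax * np.exp(-t / tau)  # cooling schedule T = T0 exp(-t/tau)
    # ... one Metropolis step at temperature T goes here ...
```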
Another effective optimization method inspired by the real world goes by the name of genetic programming (it is based on genetic algorithms).
This approach adopts the biological concepts of mutation, inheritance, and natural selection, and applies them to possible solutions to an optimization problem.
The idea is to start with a group of candidate solutions, evaluate their "fitness" as solutions to the problem, and then "mutate" them to try to improve their fitness.
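As a toy sketch of the idea (not any particular library's API), here is a minimal genetic algorithm that evolves a population of real numbers toward the maximum of an assumed fitness function:

```python
import numpy as np

def fitness(x):
    """Assumed toy fitness function, peaked at x = 0."""
    return -x**2

rng = np.random.default_rng()
population = rng.uniform(-10, 10, size=50)  # initial candidate solutions

for generation in range(100):
    # Natural selection: keep the fittest half of the population
    survivors = population[np.argsort(fitness(population))][-25:]
    # Inheritance + mutation: children are survivors plus small random changes
    children = rng.choice(survivors, size=25) + rng.normal(0, 0.5, size=25)
    population = np.concatenate([survivors, children])

print("Best solution found:", population[np.argmax(fitness(population))])
```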
Find the shortest route which visits a set of $N$ locations.
Example applications:
This is a classic problem for computer scientists and computational physicists alike.
Set up the problem as follows: place $N$ locations at random in the unit square, storing their coordinates in an $(N+1) \times 2$ array whose last row repeats the first (so the tour returns to its starting point), and take the "energy" of a route to be its total length $D$.
Run the simulation: now use an MCMC with Simulated Annealing to find the shortest route.
In (hopefully) plain English:
If the swap shortens the tour, keep it; if not, keep it with the Metropolis probability for a "temperature" $T$ and an "energy" $E = D$ (i.e., keep it if a uniform random number $z$ satisfies
$$ z \lt e^{-\Delta D/T}$$
for some value of $T$ you have chosen).
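As a sketch of this accept/reject step (assuming the locations are stored in an array r as in the tips below, with the first and last rows fixed, and that distance is a helper you have written to return the total tour length $D$):

```python
import numpy as np

def metropolis_swap(r, T, distance):
    """Propose swapping two cities; keep or undo the swap per the Metropolis rule."""
    N = len(r) - 1
    i, j = np.random.randint(1, N, size=2)  # leave the fixed endpoints alone
    D_old = distance(r)
    r[[i, j]] = r[[j, i]]                   # swap rows i and j
    dD = distance(r) - D_old
    # always keep a shorter tour; keep a longer one with probability exp(-dD/T)
    if dD > 0 and np.random.random() >= np.exp(-dD / T):
        r[[i, j]] = r[[j, i]]               # reject: swap back
```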
Tips/good starting points (but experiment! See the sketch after this list):
- r = np.empty([N+1,2],float) will define an $(N+1) \times 2$ array, and plt.plot(r[:,0],r[:,1]) will plot the route.
- if t%5000==0: will do something every 5000th step.
- Good starting values for the cooling schedule: Tmax = 10.0, Tmin = 1e-3, tau = 2e4.
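Putting the tips together, a sketch of the setup (random locations in the unit square, with the last row repeating the first so that the tour is closed; the helper name distance is my own choice):

```python
import numpy as np
import matplotlib.pyplot as plt

N = 25                            # number of locations
r = np.empty([N + 1, 2], float)
r[:N] = np.random.random([N, 2])  # random locations in the unit square
r[N] = r[0]                       # close the tour

def distance(r):
    """Total length of the closed tour visiting the rows of r in order."""
    return np.sum(np.sqrt(np.sum(np.diff(r, axis=0)**2, axis=1)))

plt.plot(r[:, 0], r[:, 1], marker="o")
plt.show()
```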
Once you are sure your code runs, try timing it for different numbers of locations (assuming your simulation is wrapped in a function called main()). Note that time.clock() was removed in Python 3.8; time.perf_counter() is its modern replacement:

import time
start_time = time.perf_counter()
main()
print(time.perf_counter() - start_time, "seconds")