Variational Methods

The variational method is an approximate method used in quantum mechanics. Compared to perturbation theory, the variational method can be more robust in situations where it is hard to determine a good unperturbed Hamiltonian (i.e., one which makes the perturbation small but is still solvable). On the other hand, in cases where there is a good unperturbed Hamiltonian, perturbation theory can be more efficient than the variational method.

The basic idea of the variational method is to guess a "trial" wavefunction for the problem, which consists of some adjustable parameters called "variational parameters". These parameters are adjusted until the energy of the trial wavefunction is minimized. The resulting trial wavefunction and its corresponding energy are variational method approximations to the exact wavefunction and energy.

Why would it make sense that the best approximate trial wavefunction is the one with the lowest energy? This results from the Variational Theorem, which states that the energy of any trial wavefunction $E$ is always an upper bound to the exact ground state energy $E_0$. This can be proven easily. Let the trial wavefunction be denoted $\phi$. Any trial function can formally be expanded as a linear combination of the exact eigenfunctions $\psi_i$. Of course, in practice, we do not know the $\psi_i$, since we are assuming that we are applying the variational method to a problem we can not solve analytically. Nevertheless, that does not prevent us from using the exact eigenfunctions in our proof, since they certainly exist and form a complete set, even if we do not happen to know them: \begin{equation} \sum_{i,j}\langle\psi_{i}|\psi_{j}\rangle=\delta_{i,j} \end{equation} where $\delta_{i,j}$ is the Kronecker delta. \begin{equation} H \left|\psi_i\right\rangle = E_i\left|\psi_i \right\rangle . \end{equation}

We are asuming that the physical states are normalized, i.e. their norm is equal to unity (we can always make it to do so). Let us assume that we have a candidate wavefunction to describe the ground-state, that we call $|\phi\rangle$, and that this function deppends on a set of parameters ${c_i}$, that are complex numbers. Ignoring complications involved with a continuous spectrum of H, suppose that the spectrum is bounded from below and that its greatest lower bound is $E_0$. So, the approximate energy corresponding to this wavefunction is the expectation value of $H$: \begin{eqnarray} \left\langle\phi|H|\phi\right\rangle & = & \sum_{i,j} \left\langle\phi|\psi_{i}\right\rangle \left\langle\psi_{i}|H|\psi_{j}\right\rangle \left\langle\psi_{j}|\phi\right\rangle \\ & = & \sum_{i} E_i \left|\left\langle\psi_i|\phi\right\rangle\right|^2\ge\sum_{i}E_0 \left|\left\langle\psi_i|\phi\right\rangle\right|^2=E_0 \end{eqnarray}

In other words, the energy of any approximate wavefunction is always greater than or equal to the exact ground state energy ${E}_0$. This explains the strategy of the variational method: since the energy of any approximate trial function is always above the true energy, then any variations in the trial function which lower its energy are necessarily making the approximate energy closer to the exact answer. (The trial wavefunction is also a better approximation to the true ground state wavefunction as the energy is lowered, although not necessarily in every possible sense unless the limit $\phi = \psi_0$ is reached).

Frequently, the trial function is written as a linear combination of basis functions, such as

$$ |\phi\rangle = \sum_i c_i |\phi_i\rangle. $$

This leads to the linear variation method, and the variational parameters are the expansion coefficients $c_i$. We shall assume that the possible solutions are restricted to a subspace of the Hilbert space, and we shall seek the best possible solution in this subspace.

The energy for this approximate wavefunction is just \begin{equation} E[\phi] = \frac{\sum_{ij} c_i^* c_j \langle\phi_i |H| \phi_j\rangle}{ \sum_{ij} c_i^* c_j \langle\phi_i|\phi_j\rangle}, \label{evar} \end{equation} which can be simplified using the notation $H_{ij} = \langle \psi_i|H|\psi_j\rangle = \int \phi_i^* {H} \phi_j,$ $S_{ij} = \langle \psi_i|H|\psi_j\rangle = \int \phi_i^* \phi_j,$ to yield $$ E[\phi] = \frac{\sum_{ij} c_i^* c_j H_{ij}}{\sum_{ij} c_i^* c_j S_{ij}}. $$

In order to find the optimial solution, we need to minimize this energy functional with respect to the variational parameters $c_i$, or calculate the variation such that: \begin{equation} \delta E({c_i})=0. \end{equation} We will calculate only the variation with respect to $c_i^*$, since the variation with respect to $c_j$ will yield the same result: \begin{eqnarray} \delta E & = & \sum_i \frac{\left(\sum_j c_j H_{ij} \sum_{i',j'} c_{i'}^*c_{j'} S_{i'j'} - \sum_j c_j S_{ij} \sum_{i',j'} c_{i'}^*c_{j'} H_{i'j'}\right)}{\left(\sum_{i',j'} c_{i'}^*c_{j'} S_{i'j'} \right)^2} \delta c_i^*. \end{eqnarray} Reordering some terms we can rewrite it as: \begin{equation} \sum_i \left(\frac{\sum_j c_j H_{ij} - E \sum_j c_j S_{ij}}{\sum_{i'j'} c_{i'}^*c_{j'} S_{i'j'}}\right) \delta c_i^*, \end{equation} where $E$ is given by Eq.(\ref{evar}). This should be satisfied for all $c_i$'s, and we find that the coefficients should satisfy the following conditions: \begin{equation} \sum_{j=1}^N{(H_{ij}-ES_{ij})c_j} = 0 \,\,\,\,\, i=1,2,...,N. \end{equation} This is a generalized eigenvalue problem, that can be rewritten: \begin{equation} {\bf Hc}=E{\bf Sc}, \end{equation} where ${\bf H}$ is the Hamiltonian matrix, and ${\bf S}$ is the so-called "overlap matrix". The main difference with an ordinary eigenvalue problem is the marix ${\bf S}$ on the right hand side. We'll see how to solve this problem in the exercises. The solution to thsi problem is obtained by solving the following "secular determinant" \begin{equation} \left \vert \begin{array}{cccc} H_{11} - E S_{11} & H_{12} - E S_{12} & \cdots & H_{1N}-ES_{1N} \\ H_{21} - E S_{21} & H_{22} - E S_{22} & \cdots & H_{2N}-ES_{2N} \\ \vdots & \vdots & \vdots & \vdots \\ H_{N1} - E S_{N1} & H_{2N} - E S_{2N} & \cdots & H_{NN} - E S_{NN} \end{array}\right\vert = 0. \end{equation}

If an orthonormal basis is used, the secular equation is greatly simplified because $S_{ij} = \delta_{ij}$ (1 for $i=j$ and 0 for $i \neq j$), and we obtain: \begin{equation} {\bf Hc}=E{\bf c}, \end{equation} which is nothing else but the stationary Schr\"odinger's equation, formulated in this basis. In this case, the secular determinant is \begin{equation} \left \vert \begin{array}{cccc} H_{11} - E & H_{12} & \cdots & H_{1N} \\ H_{21} & H_{22} - E & \cdots & H_{2N} \\ \vdots & \vdots & \vdots & \vdots \\ H_{N1} & H_{2N} & \cdots & H_{NN} - E \end{array}\right\vert = 0, \end{equation}

In either case, the secular determinant for $N$ basis functions gives an $N$-th order polynomial in $E$ which is solved for $N$ different roots, each of which approximates a different eigenvalue. These equations can be easily solved using readily available library routines, such as those in Numerical Recipes, or Lapack.

These equations can be easily solved using readily available library routines, such as those in Numerical Recipes, or Lapack. At this point one may wonder where the approximation is: aren't we solving the problem exactly? If we take into account a complete basis set, the answer is "yes, we are solving the problem exactly". But as we said before, teh Hilbert space is very large, and we therefore have to limit the basis size to a number that is easily tractable with a computer. Therefore, we have to work in a constrained Hilbert space with a relatively small number of basis states kept, which makes the result variational. Because of the computer time needed for numerical diagonalizations scales as the third power of the linear matrix size, we would want to keep the basis size as small as possible. Therefore, the basis wavefunctions must be choses carefully: it should be possible to approximate the exact solution to the full problem with a small number of basis states. Inorder to do that, we need some good intuition about the underlying physics of the problem.

We have used a linear parametrization of the wave function. This greatly simplifies the calculations. However, nonlinear parametrizations are also possible, such as in the case of hartree-Fock theory.

The variational method lies behind hartree-Fock theory and the configuration interaction method for the electronic structure of atoms and molecules, as we will see in the following chapter.

Examples of linear variational calculations

The infinite potential well

The potential well with inifinite barriers is defined: \begin{equation} V(x) = \left\{ \begin{array}{ccc} \infty & {\mathrm {for}} & |x| > |a| \\ 0 & {\mathrm {for}} & |x| \leq |a| \end{array} \right. \end{equation} and it forces the wave function to vanish at the boundaries of the well at $x=\pm a$. The exact solutioon for this problems is known and treated in introductory quantum mechanics courses. Here we discuss a linear variational approach to be compared with the exact solution. We take $a=1$ and use natural units such that $\hbar^2/2m=1$.

As basis functions we take simple polynomials that vanish on the boundaries of the well: \begin{equation} \psi_n(x)=x^n(x-1)(x+1), n=0,1,2,3... \end{equation} The reason for choosing this particular form of basis functions is that the relevant matrix elements can easily be calculated analytically. We start we the overlap matrix: \begin{equation} S_{mn}=\langle \psi_n|\psi_m \rangle = \int_{-1}^1 \psi_n(x) \psi_m(x) dx. \end{equation} Working out the integrals, one obtains \begin{equation} S_{mn}=\frac{2}{n+m+5} - \frac{4}{n+m+3} + \frac{2}{n+m+1} \end{equation} for $n+m$ even, and zero otherwise.

We can also calculate the Hamiltonian matrix elements: \begin{eqnarray} H_{mn}=\langle \psi_n | p^2 | \psi_m \rangle = \int_{-1}^1 \psi_n(x) \left(-\frac{d^2}{dx^2} \right) \psi_m(x) dx \\ = -8 \left[ \frac{1-m-n-2mn}{(m+n+3)(m+n+1)(m+n-1)} \right] \end{eqnarray} for $m+n$ even, and zero otherwise.

Exercise 16.1: Infinite potential well

$\bullet$ Write a computer program in which you fill the overlap and Hamiltonian matrices for this problem. Use standard software to solve the generalized eigenvalue problem. Notice that the matrices are Hermitian, so only the upper, or lower triangular parts have to be calculated (including the diagonal).

$\bullet$ Compare results with exact analytic solutions. These are given by \begin{equation} \psi_n(x) = \left\{ \begin{array}{ccc} \cos{(k_nx)} & n & {\mathrm {odd}} \\ \sin{(k_nx)} & n & {\mathrm{even \,\, and \,\, positive}} \end{array}\right. \end{equation} with $k_n=n\pi/2, n=1,2...$, and the corresponding energies are given by \begin{equation} E_n = k_n^2=\frac{n^2\pi^2}{4} \end{equation} For each eigenvector ${\bf c}$, the function $\sum_{p=1}^{N} = c_p\phi_p(x)$ should approximate an exact eigenfunction. They can be compared by displaying both graphically. Carry out the comparison for various numbers of basis states kept.


In [39]:
import numpy as np
from scipy import linalg

N=10
h=np.zeros((N,N), dtype=float)
s=np.zeros((N,N), dtype=float)

for m in range(N):
    for n in range(m,N):
        if((m+n)%2 == 0):
            h[n,m]=-8*(1-m-n-2*m*n)/((m+n+3)*(m+n+1)*(m+n-1))
            s[n,m] = 2./(m+n+5)-4./(m+n+3)+2./(n+m+1)
# No need for these lines below: we only have to define the lower triangle
#            h[m,n] = h[n,m]
#            s[m,n] = s[n,m]
        
w, v = linalg.eigh(h,s)
for n in range(N):
    print(n,np.pi**2*(n+1)**2/4,w[n])


0 2.4674011002723395 2.467401100272339
1 9.869604401089358 9.869604401093914
2 22.206609902451056 22.2066123442276
3 39.47841760435743 39.47850458931857
4 61.68502750680849 61.76067928958051
5 88.82643960980423 89.16437778460741
6 120.90265391334464 132.91796389990037
7 157.91367041742973 181.68712260391723
8 199.8594891220595 463.1473433601288
9 246.74011002723395 642.3003906576474

Hydrogen atom

One example of the variational method would be using the Gaussian function $\chi(r) = e^{- \alpha r^2}$ as a trial function for the hydrogen atom ground state. This problem could be solved by the variational method by obtaining the energy of $\chi(r)$ as a function of the variational parameter $\alpha$, and then minimizing $E(\alpha)$ to find the optimum value $\alpha_{min}$. The variational theorem's approximate wavefunction and energy for the hydrogen atom would then be $\chi(r) = e^{- \alpha_{min} r^2}$ and $E(\alpha_{min})$.

This is a one electron problem, so we do not have to worry about electron-electron interactions, or antisymmetrization of the wave function. The Schr\"odinger's equation reads: \begin{equation} \left[ -\frac{\hbar^2}{2m}\nabla^2 - \frac{e^2}{4\pi\epsilon_0}\frac{1}{r} \right] \psi(x) = E \psi(x) \end{equation} where the second term is the Coulomb interaction with the positive nucleus (remember, this is a charged particle in a central potential). The mass $m$ is the reduced mass of the proton-electron system, which is approximately equal to the electron mass. The ground state has energy \begin{equation} E=-\frac{N}{\hbar^2} \left(\frac {{\mathrm e}^2}{4\pi\epsilon_0} \right)^2 \approx -13.6058 {\mathrm eV} \end{equation} and the wave function is given by \begin{equation} \psi({\bf x}) = \frac{2}{a_0^{3/2}} \exp{(-x/a_0)} \label{H_exact} \end{equation} where $a_0$ is Bohr's radius \begin{equation} a_0=\frac{4\pi\epsilon_0\hbar^2}{m{\mathrm e}^2}. \end{equation}

It is convenient to use units such that equations take on a simpler form. These are the so-called standard units in electronic structure: the unit of distance is Bohr's radius, masses are expressed in units of the electon mass $m_{\mathrm e}$, and charge in units of the electron charge e. The energy is finally given in "hartrees", equal to $E_H=m_{\mathrm e}c^2\alpha^2$ (where $\alpha$ is the fine structure constant). In these units the Schr\"odinger equation for the hydrogen atom assumes the following simpler form: \begin{equation} \left[ -\frac{1}{2}\nabla^2 - \frac{1}{r} \right] \psi(x) = E \psi(x). \end{equation}

To approximate the ground state energy and wave function of the hydrogen atom in a linear variational procedure, we will use Gaussian basis functions. For the ground state, we only need angular momentum $l=0$ wave functions ($s$-orbitals), which have the form: \begin{equation} \chi_p(r)=\exp{(-\alpha_p r^2)} \end{equation} centered on the nucleus (whis is thus placed at the origin). We have to specify the values of the exponents $\alpha_p$, which are our variational parameters. Optimal values of these exponents have been previously found by other means, and in our case, we will keep these values fixed: \begin{eqnarray} \alpha_1 &=& 13.00773 \\ \alpha_2 &=& 1.962079 \\ \alpha_3 &=& 0.444529 \\ \alpha_4 &=& 0.1219492. \end{eqnarray} If the program works correctly, it should shield a value of the energy close to the exact results $E_0=-1/2 E_H$.

It remains to determine the coefficients of the linear expansion, by solving the generalized eigenvalue problem, as we did in the previous example. The matrix elements of the overlap matrix ${\bf S}$, the kinetic energy matrix ${\bf T}$, and the Coulomb interaction ${\bf V}$ are given by: \begin{eqnarray} S_{pq} &=& \int d^3r e^{-\alpha_pr^2}e^{-\alpha_qr^2} = \left(\frac{\pi}{\alpha_p+\alpha_q}\right)^{3/2} \\ T_{pq} &=& \int d^3r e^{-\alpha_pr^2}\nabla^2 e^{-\alpha_qr^2} = 3 \frac{\alpha_p\alpha_q\pi^{3/2}}{(\alpha_p+\alpha_q)^{5/2}} \\ V_{pq} &=& \int d^3r e^{-\alpha_pr^2} \frac{1}{r} e^{-\alpha_qr^2} = - \frac{2\pi}{(\alpha_p+\alpha_q)}. \end{eqnarray} Using these expressions, one can fill the overlap and Hamiltonian matrices and solve the problem numerically.

Exercise 16.2: Hydrogen atom

$\bullet$ Solve the problem stated in the previous section. If your program has no errors, you should obtain $E=-0.499278 E_H$, which is remarkable considering that only four basis states have been taken into account.

$\bullet$ Compare graphically the variational ground-state to the exact one, given by \begin{equation} \psi({\bf x}) = \frac{2}{a_0^{3/2}} \exp{(-x/a_0)} \end{equation}


In [41]:
import numpy as np
from scipy import linalg

N=4
h=np.zeros((N,N), dtype=float)
s=np.zeros((N,N), dtype=float)

𝛼 = np.array([13.00773,1.962079,0.444529,0.1219492])
for m in range(N):
    for n in range(m,N):
        t=3*𝛼[n]*𝛼[m]*np.pi**1.5/(𝛼[n]+𝛼[m])**2.5
        v=-2*np.pi/(𝛼[n]+𝛼[m])          
        h[n,m]=t+v
        s[n,m]=(np.pi/(𝛼[n]+𝛼[m]))**1.5

w, v = linalg.eigh(h,s)
for n in range(N):
    print(n,w[n])


0 -0.49927840566748505
1 0.1132139204579877
2 2.5922995719598165
3 21.144365190122503

In [ ]: