In [1]:
from scipy import *
from matplotlib.pyplot import *
from scipy.linalg import *
from numpy.linalg import *
import FNC
In [2]:
# This (optional) block is for improving the display of plots.
from IPython.display import set_matplotlib_formats
set_matplotlib_formats("svg","pdf")
rcParams["figure.figsize"] = [7,4]
rcParams["lines.linewidth"] = 2
rcParams["lines.markersize"] = 4
rcParams['animation.html'] = "jshtml" # or try "html5"
Here are 5-year averages of the worldwide temperature anomaly as compared to the 1951-1980 average (source: NASA).
In [3]:
year = arange(1955,2005,5)
y = array([ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ])
fig,ax = subplots()
ax.scatter(year,y,color="k",label="data")
xlabel("year")
ylabel("anomaly (degrees C)")
title("World temperature anomaly");
A polynomial interpolant can be used to fit the data. Here we build one using a Vandermonde matrix. First, though, we express time as decades since 1950, as it improves the condition number of the matrix.
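Before building it, a quick side check (not part of the original demo, and relying on the imports at the top of the notebook) illustrates why the rescaling helps: the Vandermonde matrix built from the raw years is far worse conditioned than the one built from decades since 1950.

# Side check: conditioning of the Vandermonde matrix before and after rescaling time
print("cond, raw years:", cond(vander(year*1.0)))
print("cond, decades since 1950:", cond(vander((year-1950)/10)))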
In [4]:
t = (year-1950)/10
V = vander(t)
c = solve(V,y)
p = poly1d(c)
tt = linspace(1955,2000,500)
ax.plot(tt,p((tt-1950)/10),label="interpolant")
ax.legend();
fig
Out[4]:
As you can see, the interpolant does represent the data, in a sense. However, it's a crazy-looking curve for this application. Trying too hard to reproduce all the data exactly is known as overfitting.
Here are the 5-year temperature averages again.
In [5]:
year = arange(1955,2005,5)
y = array([ -0.0480, -0.0180, -0.0360, -0.0120, -0.0040,
0.1180, 0.2100, 0.3320, 0.3340, 0.4560 ])
t = (year-1950)/10
The standard best-fit line results from using a linear polynomial that meets the least squares criterion.
In [6]:
V = [ [t[i],1] for i in range(t.size) ] # Vandermonde-ish matrix
c,res,rank,sv = lstsq(V,y,rcond=None)
fig,ax = subplots()
ax.scatter(year,y,color="k",label="data")
p = poly1d(c)
tt = linspace(1955,2000,500)
ax.plot(tt,p((tt-1950)/10),label="linear fit")
xlabel("year")
ylabel("anomaly (degrees C)")
title("World temperature anomaly");
ax.legend();
If we use a global cubic polynomial, the points are fit more closely.
In [7]:
V = [ [t[i]**3,t[i]**2,t[i],1] for i in range(t.size) ] # Vandermonde-ish matrix
c,res,rank,sv = lstsq(V,y,rcond=None)
p = poly1d(c)
ax.plot(tt,p((tt-1950)/10),label="cubic fit")
fig
Out[7]:
If we were to continue increasing the degree of the polynomial, the residual at the data points would get smaller, but overfitting would increase.
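A short experiment along these lines (a sketch, not part of the original demo) prints the residual norm of the least-squares fit as the degree grows; the residual shrinks steadily, even though the interpolant above shows how badly a high-degree fit can behave between the data points.

# Sketch: residual norm of the least-squares polynomial fit for increasing degree
for deg in range(1,8):
    V = vander(t,deg+1)                 # columns t^deg, ..., t, 1
    c = lstsq(V,y,rcond=None)[0]
    print("degree",deg,": residual norm =",norm(y - V@c))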
Finding numerical approximations to $\pi$ has fascinated people for millennia. One famous formula is

$$ \frac{\pi^2}{6} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots. $$

Say $s_k$ is the sum of the first $k$ terms of the series above, and $p_k = \sqrt{6s_k}$. Here is a compact way to compute these sequences.
In [8]:
a = array([1/(k+1)**2 for k in range(100)])
s = cumsum(a) # cumulative summation
p = sqrt(6*s)
plot(range(1,101),p,"o")
xlabel("$k$")
ylabel("$p_k$")
title("Sequence convergence");
This graph suggests that $p_k\to \pi$ but doesn't give much information about the rate of convergence. Let $\epsilon_k=|\pi-p_k|$ be the sequence of errors. By plotting the error sequence on a log-log scale, we can see a nearly linear relationship.
In [9]:
ep = abs(pi-p) # error sequence
loglog(range(1,101),ep,"o")
xlabel("$k$")
ylabel("error")
title("Sequence convergence");
This suggests a power-law relationship where $\epsilon_k\approx a k^b$, or $\log \epsilon_k \approx b (\log k) + \log a$.
In [10]:
V = [ [1,log(k+1)] for k in range(100) ] # fitting matrix
c = lstsq(V,log(ep),rcond=None)[0] # coefficients of linear fit
print(c)
In terms of the parameters $a$ and $b$ used above, we have
In [11]:
a,b = exp(c[0]),c[1]
print("b:",b)
It's tempting to conjecture that $b\to -1$ asymptotically. Here is how the numerical fit compares to the original convergence curve.
In [12]:
loglog(range(1,101),ep,"o",label="sequence")
k = arange(1,101)
plot(k,a*k**b,"--",label="power fit")
xlabel("$k$"); ylabel("error");
legend(); title("Sequence convergence");
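One way to probe that conjecture (an extra check using only quantities computed above) is to redo the fit over just the tail of the sequence; if the conjecture holds, the fitted exponent should move closer to $-1$ as the early terms are dropped.

# Extra check: refit the power law over k = k0,...,100 for increasing k0
for k0 in [1,20,50,80]:
    kk = arange(k0,101)
    V = [ [1,log(k)] for k in kk ]
    c_tail = lstsq(V,log(ep[k0-1:100]),rcond=None)[0]
    print("fit over k >=",k0,": b =",c_tail[1])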
Because the functions $\sin^2(t)$, $\cos^2(t)$, and $1$ are linearly dependent, we should find that the following matrix is somewhat ill-conditioned.
In [13]:
t = linspace(0,3,400)
A = array([ [ sin(s)**2, cos((1+1e-7)*s)**2, 1 ] for s in t ])
kappa = cond(A)
print(kappa)
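For comparison (a side check, not in the original demo), the same matrix without the $10^{-7}$ perturbation in the cosine is numerically rank-deficient, since then its columns satisfy $\sin^2 t + \cos^2 t - 1 = 0$ identically.

# Side check: without the perturbation, the columns are (mathematically) dependent
A0 = array([ [ sin(s)**2, cos(s)**2, 1 ] for s in t ])
print("numerical rank:",matrix_rank(A0),"out of",A0.shape[1],"columns")
print("condition number:",cond(A0))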
Now we set up an artificial linear least squares problem with a known exact solution that actually makes the residual zero.
In [14]:
x = array([1.,2,1])
b = A@x
Using lstsq to find the solution, we get a relative error that is about $\kappa$ times machine epsilon.
In [15]:
x_BS = lstsq(A,b,rcond=None)[0]
print("observed error:",norm(x_BS-x)/norm(x))
print("max error:",kappa/2**52)
If we formulate and solve via the normal equations, we get a much larger relative error. With $\kappa^2\approx 10^{14}$, we may not be left with more than about 2 accurate digits.
In [16]:
N = A.T@A
x_NE = solve(N,A.T@b)
print("observed error:",norm(x_NE-x)/norm(x))
print("digits:",-log10(norm(x_NE-x)/norm(x)))
We can access both the thin and full forms of the QR factorization. Thin is the default (we use the numpy version here, not the scipy version).
In [17]:
A = 1.0 + floor(9*rand(6,4))
A.shape
Out[17]:
Here is a standard call:
In [18]:
Q,R = qr(A)
print("Q:",Q)
print("R:",R)
We can test that $\mathbf{Q}$ has orthonormal columns.
In [19]:
norm(Q.T@Q - eye(4))
Out[19]:
Here's the full or "complete" factorization.
In [20]:
Q,R = qr(A,"complete")
print(Q.shape)
In [21]:
norm(Q.T@Q - eye(6))
Out[21]:
We will use Householder reflections to produce a QR factorization of the matrix
In [22]:
A = 1.0 + floor(9*rand(6,4))
m,n = A.shape
Our first step is to introduce zeros below the diagonal in column 1. Define the vector
In [23]:
z = A[:,0]
Applying the Householder definitions gives us
In [24]:
v = z - norm(z)*hstack([1,zeros(m-1)])
P = eye(m) - (2/dot(v,v))*outer(v,v) # reflector
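Before applying it, we can sanity-check (an extra step, not in the original demo) that $\mathbf{P}$ is symmetric and orthogonal, as any Householder reflector must be.

# Sanity check: a Householder reflector is symmetric and orthogonal
print("symmetry:",norm(P - P.T))
print("orthogonality:",norm(P.T@P - eye(m)))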
By design we can use the reflector to get the zero structure we seek:
In [25]:
P@z
Out[25]:
Now we let
In [26]:
A = P@A
print(A)
We are set to put zeros into column 2. We must not use row 1 in any way, lest we destroy the zeros we just introduced. To put it another way, we can repeat the process we just did on the smaller submatrix
In [27]:
A[1:,1:]
Out[27]:
In [28]:
z = A[1:,1]
v = z - norm(z)*hstack([1,zeros(m-2)])
P = eye(m-1) - (2/dot(v,v))*outer(v,v)
We now apply the reflector to the submatrix.
In [29]:
A[1:,1:] = P@A[1:,1:]
print(A)
We need two more iterations of this process.
In [30]:
for j in [2,3]:
    z = A[j:,j]
    v = z - norm(z)*hstack([1,zeros(m-j-1)])
    P = eye(m-j) - (2/dot(v,v))*outer(v,v)
    A[j:,j:] = P@A[j:,j:]
We have now reduced the original to an upper triangular matrix using four orthogonal Householder reflections:
In [31]:
R = A
print(R)
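As a closing check, here is a sketch (not part of the original demo) that repeats the whole process on a fresh random matrix while accumulating the product of the reflectors, so we can confirm that the resulting $\mathbf{Q}$ is orthogonal and that $\mathbf{Q}\mathbf{R}$ reproduces the matrix we started with.

# Sketch: redo the Householder reduction, accumulating Q, and verify Q@R matches A
A = 1.0 + floor(9*rand(6,4))
m,n = A.shape
Aorig = A.copy()
R = A.copy()
Q = eye(m)
for j in range(n):
    z = R[j:,j]
    v = z - norm(z)*hstack([1,zeros(m-j-1)])
    P = eye(m-j) - (2/dot(v,v))*outer(v,v)
    R[j:,:] = P@R[j:,:]
    Pfull = eye(m)
    Pfull[j:,j:] = P
    Q = Q@Pfull            # Q = P_1 P_2 ... P_n, since each reflector is its own inverse
print("reconstruction error:",norm(Q@R - Aorig))
print("orthogonality of Q:",norm(Q.T@Q - eye(m)))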