In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set_context('notebook', font_scale=1.5)
The first exercise is about using Newton's method to find the cube roots of unity - find $z$ such that $z^3 = 1$. From the fundamental theorem of algebra, we know there must be exactly 3 complex roots since this is a degree 3 polynomial.
We start with Euler's equation $$ e^{ix} = \cos x + i \sin x $$
Raising $e^{ix}$ to the $n$th power, where $n$ is an integer, and applying Euler's formula with $nx$ in place of $x$, we get $$ (e^{ix})^n = e^{i(nx)} = \cos nx + i \sin nx $$
Whenever $nx$ is an integer multiple of $2\pi$, we have $$ \cos nx + i \sin nx = 1 $$
So $$ e^{2\pi i \frac{k}{n}} $$ is an $n$th root of 1 for $k = 0, 1, \ldots, n-1$.
So the cube roots of unity are $1, e^{2\pi i/3}, e^{4\pi i/3}$.
While we can do this analytically, the idea is to use Newton's method to find these roots, and in the process, discover some rather perplexing behavior of Newton's method.
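Before setting up the general routine, here is a minimal sketch of the Newton update $z \leftarrow z - f(z)/f'(z)$ applied to $f(z) = z^3 - 1$ (the starting point is an arbitrary choice):
z = 0.5 + 0.5j
for k in range(20):
    z = z - (z**3 - 1)/(3*z**2)
print(z)  # lands on one of the three cube roots of unity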
In [2]:
from sympy import Symbol, exp, I, pi, N, expand
from sympy import init_printing
init_printing()
In [3]:
expand(exp(2*pi*I/3), complex=True)
Out[3]:
In [4]:
expand(exp(4*pi*I/3), complex=True)
Out[4]:
In [5]:
plt.figure(figsize=(4,4))
roots = np.array([[1,0], [-0.5, np.sqrt(3)/2], [-0.5, -np.sqrt(3)/2]])
plt.scatter(roots[:,0], roots[:,1], s=50, c='red')
xp = np.linspace(0, 2*np.pi, 100)
plt.plot(np.cos(xp), np.sin(xp), c='blue');
1. Newton's method for functions of complex variables - stability and basins of attraction. (30 points)
Write a function with the following function signature
newton(z, f, fprime, max_iter=100, tol=1e-6)
where
- z is a starting value (a complex number, e.g. 3 + 4j)
- f is a function of z
- fprime is the derivative of f
The function will run until either max_iter is reached or the absolute value of the Newton step is less than tol. In either case, the function should return the number of iterations taken and the final value of z as a tuple (i, z).
Define the function f and fprime that will result in Newton's method finding the cube roots of 1. Find 3 starting points that will give different roots, and print both the start and end points.
Write the following two plotting functions to see some (pretty) aspects of Newton's algorithm in the complex plane.
The first function plot_newton_iters(f, fprime, n=200, extent=[-1,1,-1,1], cmap='hsv') calculates and stores the number of iterations taken for convergence (or max_iter) for each point in a 2D array. The 2D array limits are given by extent - for example, when extent = [-1,1,-1,1] the corners of the plot are (-1, -i), (1, -i), (1, i), (-1, i). There are n grid points in both the real and imaginary axes. The argument cmap specifies the color map to use - the suggested defaults are fine. Finally plot the image using plt.imshow - make sure the axis ticks are correctly scaled. Make a plot for the cube roots of 1.
The second function plot_newton_basins(f, fprime, n=200, extent=[-1,1,-1,1], cmap='jet') has the same arguments, but this time the grid stores the identity of the root that the starting point converged to. Make a plot for the cube roots of 1 - since there are 3 roots, there should be only 3 colors in the plot.
In [6]:
def newton(z, f, fprime, max_iter=100, tol=1e-6):
"""The Newton-Raphson method."""
for i in range(max_iter):
step = f(z)/fprime(z)
if abs(step) < tol:
return i, z
z -= step
return i, z
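As a quick usage example, here are three starting points (arbitrary choices near the known roots) together with the root each one converges to:
f = lambda z: z**3 - 1
fprime = lambda z: 3*z**2
for z0 in (1 + 0.1j, -0.5 + 1j, -0.5 - 1j):
    i, z = newton(z0, f, fprime)
    print(z0, '->', z)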
In [7]:
def plot_newton_iters(p, pprime, n=200, extent=[-1,1,-1,1], cmap='hsv'):
"""Shows how long it takes to converge to a root using the Newton-Rahphson method."""
m = np.zeros((n,n))
xmin, xmax, ymin, ymax = extent
for r, x in enumerate(np.linspace(xmin, xmax, n)):
for s, y in enumerate(np.linspace(ymin, ymax, n)):
z = x + y*1j
m[s, r] = newton(z, p, pprime)[0]
    plt.imshow(m, cmap=cmap, extent=extent, origin='lower')
In [8]:
def plot_newton_basins(p, pprime, n=200, extent=[-1,1,-1,1], cmap='jet'):
"""Shows basin of attraction for convergence to each root using the Newton-Raphson method."""
root_count = 0
roots = {}
m = np.zeros((n,n))
xmin, xmax, ymin, ymax = extent
for r, x in enumerate(np.linspace(xmin, xmax, n)):
for s, y in enumerate(np.linspace(ymin, ymax, n)):
z = x + y*1j
root = np.round(newton(z, p, pprime)[1], 1)
if not root in roots:
roots[root] = root_count
root_count += 1
m[s, r] = roots[root]
    plt.imshow(m, cmap=cmap, extent=extent, origin='lower')
In [9]:
plt.grid(False)
plot_newton_iters(lambda x: x**3 - 1, lambda x: 3*x**2)
In [10]:
plt.grid(False)
plot_newton_basins(lambda x: x**3 - 1, lambda x: 3*x**2)
2. Ill-conditioned linear problems. (20 points)
You are given an $n \times p$ design matrix $X$ and an $n$-vector of observations $y$ and asked to find the coefficients $\beta$ that solve the linear equations $X \beta = y$.
X = np.load('x.npy')
y = np.load('y.npy')
The solution $\beta$ can also be loaded as
beta = np.load('b.npy')
- The least squares solution is given by the normal equations $$\hat{\beta} = (X^TX)^{-1}X^Ty$$ Write a function f1 that takes arguments $X$ and $y$ and returns $\beta$ using this formula.
- How can you use np.linalg.solve so that the normal equations are solved without explicitly inverting a matrix? Write a function f2 that takes arguments $X$ and $y$ and returns $\beta$ using this.
- The solution can also be found with np.linalg.lstsq(x, y)[0]. What happens when you use f1 or f2? Remove the column of $X$ that is making the matrix singular and find the $p-1$ vector $b$ using f2.
- Why is the answer from f2 still not as accurate as the one from np.linalg.lstsq? This arises because the relevant condition number for f2 is actually for the matrix $X^TX$ while the condition number of lstsq is for the matrix $X$. Why is the condition number so high even after removing the column that makes the matrix singular?
In [11]:
X = np.load('x.npy')
y = np.load('y.npy')
beta = np.load('b.npy')
def f1(X, y):
"""Direct translation of normal equations to code."""
return np.dot(np.linalg.inv(np.dot(X.T, X)), np.dot(X.T, y))
def f2(X, y):
"""Solving normal equations wihtout matrix inversion."""
return np.linalg.solve(np.dot(X.T, X), np.dot(X.T, y))
%precision 3
print("X = ")
print(X)
# counting from 0 (so column 5 is the last column)
# we can see that column 5 is a multiple of column 3
# so one approach is to simply remove this (dependent) column
print("True solution\t\t", beta)
print("Library function\t", np.linalg.lstsq(X, y)[0])
print("Using f1\t\t", f1(X[:, :5], y))
print("Using f2\t\t", f2(X[:, :5], y))
In [12]:
np.linalg.svd(X)[1]
Out[12]:
In [13]:
np.linalg.svd(X[:, :-1])[1]
Out[13]:
In [14]:
np.linalg.cond(X[:, :-1])
Out[14]:
One way to think about the condition number is as the ratio of the largest singular value to the smallest one - a measure of the disproportionate stretching effect of the linear transform in one direction versus another. When this ratio is very big, it means that errors in one or more directions will be amplified greatly. This often occurs because one or more columns are "almost" dependent - i.e. they can be approximated by a linear combination of the other columns.
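The squaring effect behind the normal equations can be checked directly: in the 2-norm, the condition number of $X^TX$ is the square of the condition number of $X$, which is why f2 is so much more fragile than lstsq. A quick check on the reduced matrix from above:
XR = X[:, :-1]
print(np.linalg.cond(XR))                # condition number of X (still large)
print(np.linalg.cond(np.dot(XR.T, XR)))  # approximately the square of the above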
3. Consider the following function on $\mathbb{R}^2$:
$$f(x_1,x_2) = -x_1x_2e^{-\frac{(x_1^2+x_2^2)}{2}}$$
- Find the gradient and Hessian of $f$, locate its critical points, and classify each one as a minimum, maximum or saddle point.
- Minimize $f$ subject to the constraints $2x_1 + 3x_2 = 5$ and $x_1^2 + x_2^2 \leq 10$ using scipy.optimize.minimize.
- Plot the contours of $f$ together with the constraints and the location of the constrained minimum using matplotlib. (20 points)
In [15]:
import sympy as sym
from sympy import Matrix
from numpy import linalg as la
x1, x2 = sym.symbols('x1 x2')
def f(x1, x2):
return sym.Matrix([-x1*x2*sym.exp(-(x1**2 + x2**2)/2)])
def h(x1,x2):
return x1**2+x2**2
def g(x1,x2):
return 2*x1+3*x2
def characterize_cp(H):
l,v = la.eig(H)
if(np.all(np.greater(l,np.zeros(2)))):
return("minimum")
elif(np.all(np.less(l,np.zeros(2)))):
return("maximum")
else:
return("saddle")
In [16]:
sym.init_printing()
In [17]:
fun = f(x1,x2)
X = sym.Matrix([x1,x2])
gradf = fun.jacobian(X)
sym.simplify(gradf)
Out[17]:
In [18]:
hessianf = gradf.jacobian(X)
sym.simplify(hessianf)
Out[18]:
In [19]:
fcritical = sym.solve(gradf,X)
for i in range(len(fcritical)):
H = np.array(hessianf.subs([(x1,fcritical[i][0]),(x2,fcritical[i][1])])).astype(float)
print(fcritical[i], characterize_cp(H))
In [20]:
import scipy.optimize as opt
In [21]:
def f(x):
return -x[0] * x[1] * np.exp(-(x[0]**2+x[1]**2)/2)
cons = ({'type': 'eq',
'fun' : lambda x: np.array([2.0*x[0] + 3.0*x[1] - 5.0]),
'jac' : lambda x: np.array([2.0,3.0])},
{'type': 'ineq',
'fun' : lambda x: np.array([-x[0]**2.0 - x[1]**2.0 + 10.0])})
x0 = [1.5,1.5]
cx = opt.minimize(f, x0, constraints=cons)
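The location and value of the constrained minimum can be read off the returned OptimizeResult:
print(cx.x)    # location of the constrained minimum
print(cx.fun)  # value of f at that point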
In [22]:
x = np.linspace(-5, 5, 200)
y = np.linspace(-5, 5, 200)
X, Y = np.meshgrid(x, y)
Z = f(np.vstack([X.ravel(), Y.ravel()])).reshape((200,200))
plt.contour(X, Y, Z)
plt.plot(x, (5-2*x)/3, 'k:', linewidth=1)
plt.plot(x, (10.0-x**2)**0.5, 'k:', linewidth=1)
plt.plot(x, -(10.0-x**2)**0.5, 'k:', linewidth=1)
plt.fill_between(x,(10-x**2)**0.5,-(10-x**2)**0.5,alpha=0.15)
plt.text(cx['x'][0], cx['x'][1], 'x', va='center', ha='center', size=20, color='red')
plt.axis([-5,5,-5,5])
plt.title('Contour plot of f(x) subject to constraints g(x) and h(x)')
plt.xlabel('x1')
plt.ylabel('x2')
pass
4. One of the goals of the course is that you will be able to implement novel algorithms from the literature. (30 points)
Implement the mean shift algorithm with the following function signature
def mean_shift(xs, x, kernel, max_iters=100, tol=1e-6):
Use it to find the modes of the 1D data in x1d.npy and of the 2D data in x2d.npy. Trace the path of the algorithm from the starting point
x0 = (-4, 5)
until it converges onto a mode in the 2D data for each kernel. Superimpose the path on top of a contour plot of the data density.
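For reference, the mean shift update implemented below replaces the current estimate $x$ by the kernel-weighted mean of the data, $$ x \leftarrow \frac{\sum_i K(x_i - x)\, x_i}{\sum_i K(x_i - x)}, $$ and the iteration stops once the squared length of the shift falls below tol.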
In [23]:
def gaussian_kernel(xs, x, h=1.0):
"""Gaussian kernel for a shifting window centerd at x."""
X = xs - x
try:
d = xs.shape[1]
except:
d = 1
k = np.array([(2*np.pi*h**d)**-0.5*np.exp(-(np.dot(_.T, _)/h)**2) for _ in X])
if d != 1:
k = k[:, np.newaxis]
return k
def flat_kernel(xs, x, h=1.0):
"""Flat kenrel for a shifting window centerd at x."""
X = xs - x
try:
d = xs.shape[1]
except:
d = 1
k = np.array([1 if np.dot(_.T, _) < h else 0 for _ in X])
if d != 1:
k = k[:, np.newaxis]
return k
def mean_shift(xs, x, kernel, max_iters=100, tol=1e-6, trace=False):
    """Finds the local mode using the mean shift algorithm."""
    x = np.array(x, dtype=float)  # work on a float copy so the in-place update below is well defined
    record = []
    for i in range(max_iters):
        if trace:
            record.append(x.copy())  # store a copy so later updates do not overwrite the recorded path
        w = kernel(xs, x)
        m = (w*xs).sum(axis=0)/w.sum(axis=0) - x
        if np.sum(m**2) < tol:
            break
        x += m
    return i, x, np.array(record)
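A quick check of the two kernels on a small made-up 1D array, just to show the weighting behaviour:
xs_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
print(flat_kernel(xs_demo, 2.0))      # weight 1 only where the squared distance to x is below h
print(gaussian_kernel(xs_demo, 2.0))  # weights that fall off with distance from x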
In [24]:
x1 = np.load('x1d.npy')
# choose kernel to evaluate
kernel = flat_kernel
# kernel = gaussian_kernel
i1, m1, path = mean_shift(x1, 1, kernel)
print(i1, m1)
i2, m2, path = mean_shift(x1, -7, kernel)
print(i2, m2)
i3, m3, path = mean_shift(x1, 7 ,kernel)
print(i3, m3)
xp = np.linspace(0, 1.0, 100)
plt.hist(x1, 50, histtype='step', density=True);
plt.axvline(m1, c='blue')
plt.axvline(m2, c='blue')
plt.axvline(m3, c='blue');
In [25]:
x2 = np.load('x2d.npy')
# choose kernel to evaluate
# kernel = flat_kernel (also OK if they use the Epanechnikov kernel since the flat is a shadow of that)
kernel = gaussian_kernel
i1, m1, path1 = mean_shift(x2, [0,0], kernel, trace=True)
print(i1, m1)
i2, m2, path2 = mean_shift(x2, [-4,5], kernel, trace=True)
print(i2, m2)
i3, m3, path3 = mean_shift(x2, [10,10] ,kernel, trace=True)
print(i3, m3)
In [26]:
import scipy.stats as stats
# size of marker at starting position
base = 40
# set up for estimating density using gaussian_kde
xmin, xmax = -6, 12
ymin,ymax = -5, 15
X, Y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
positions = np.vstack([X.ravel(), Y.ravel()])
kde = stats.gaussian_kde(x2.T)
Z = np.reshape(kde(positions).T, X.shape)
plt.contour(X, Y, Z)
# plot data in background
plt.scatter(x2[:, 0], x2[:, 1], c='grey', alpha=0.2, edgecolors='none')
# path from [0,0]
plt.scatter(path1[:, 0], path1[:, 1], s=np.arange(base, base+len(path1)),
c='red', edgecolors='red', marker='x', linewidth=1.5)
# path from [-4,5]
plt.scatter(path2[:, 0], path2[:, 1], s=np.arange(base, base+len(path2)),
c='blue', edgecolors='blue', marker='x', linewidth=1.5)
# path from [10,10]
plt.scatter(path3[:, 0], path3[:, 1], s=np.arange(base, base+len(path3)),
c='green', edgecolors='green',marker='x', linewidth=1.5)
plt.axis([xmin, xmax, ymin, ymax]);