In [1]:

    
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set_context('notebook', font_scale=1.5)

The first exercise is about using Newton's method to find the cube roots of unity - find $z$ such that $z^3 = 1$. From the fundamental theorem of algebra, we know there must be exactly 3 complex roots since this is a degree 3 polynomial.

We start with Euler's equation $$ e^{ix} = \cos x + i \sin x $$

Raising $e^{ix}$ to the $n$th power where $n$ is an integer, we get from Euler's formula with $nx$ substituting for $x$ $$ (e^{ix})^n = e^{i(nx)} = \cos nx + i \sin nx $$

Whenever $nx$ is an integer multiple of $2\pi$, we have $$ \cos nx + i \sin nx = 1 $$

So $$ e^{2\pi i \frac{k}{n}} $$ is a root of 1 whenever $k/n = 0, 1, 2, \ldots$.

So the cube roots of unity are $1, e^{2\pi i/3}, e^{4\pi i/3}$.

While we can do this analytically, the idea is to use Newton's method to find these roots, and in the process, discover some rather perplexing behavior of Newton's method.



In [2]:

    
from sympy import Symbol, exp, I, pi, N, expand
from sympy import init_printing 
init_printing()



In [3]:

    
expand(exp(2*pi*I/3), complex=True)









    Out[3]:





$$- \frac{1}{2} + \frac{\sqrt{3} i}{2}$$



In [4]:

    
expand(exp(4*pi*I/3), complex=True)









    Out[4]:





$$- \frac{1}{2} - \frac{\sqrt{3} i}{2}$$



In [5]:

    
plt.figure(figsize=(4,4))
roots = np.array([[1,0], [-0.5, np.sqrt(3)/2], [-0.5, -np.sqrt(3)/2]])
plt.scatter(roots[:,0], roots[:,1], s=50, c='red')
xp = np.linspace(0, 2*np.pi, 100)
plt.plot(np.cos(xp), np.sin(xp), c='blue');

1. Newton's method for functions of complex variables - stability and basins of attraction. (30 points)

Write a function with the following function signature newton(z, f, fprime, max_iter=100, tol=1e-6) where
- z is a starting value (a complex number e.g. 3 + 4j)
- f is a function of z
- fprime is the derivative of f The function will run until either max_iter is reached or the absolute value of the Newton step is less than tol. In either case, the function should return the number of iterations taken and the final value of z as a tuple (i, z).
Define the function f and fprime that will result in Newton's method finding the cube roots of 1. Find 3 starting points that will give different roots, and print both the start and end points.

Write the following two plotting functions to see some (pretty) aspects of Newton's algorithm in the complex plane.

The first function plot_newton_iters(f, fprime, n=200, extent=[-1,1,-1,1], cmap='hsv') calculates and stores the number of iterations taken for convergence (or max_iter) for each point in a 2D array. The 2D array limits are given by extent - for example, when extent = [-1,1,-1,1] the corners of the plot are (-i, -i), (1, -i), (1, i), (-1, i). There are n grid points in both the real and imaginary axes. The argument cmap specifies the color map to use - the suggested defaults are fine. Finally plot the image using plt.imshow - make sure the axis ticks are correctly scaled. Make a plot for the cube roots of 1.
The second function plot_newton_basins(f, fprime, n=200, extent=[-1,1,-1,1], cmap='jet') has the same arguments, but this time the grid stores the identity of the root that the starting point converged to. Make a plot for the cube roots of 1 - since there are 3 roots, there should be only 3 colors in the plot.



In [6]:

    
def newton(z, f, fprime, max_iter=100, tol=1e-6):
    """The Newton-Raphson method."""
    for i in range(max_iter):
        step = f(z)/fprime(z)
        if abs(step) < tol:
            return i, z
        z -= step
    return i, z



In [7]:

    
def plot_newton_iters(p, pprime, n=200, extent=[-1,1,-1,1], cmap='hsv'):
    """Shows how long it takes to converge to a root using the Newton-Rahphson method."""
    m = np.zeros((n,n))
    xmin, xmax, ymin, ymax = extent
    for r, x in enumerate(np.linspace(xmin, xmax, n)):
        for s, y in enumerate(np.linspace(ymin, ymax, n)):
            z = x + y*1j
            m[s, r] = newton(z, p, pprime)[0]
    plt.imshow(m, cmap=cmap, extent=extent)



In [8]:

    
def plot_newton_basins(p, pprime, n=200, extent=[-1,1,-1,1], cmap='jet'):
    """Shows basin of attraction for convergence to each root using the Newton-Raphson method."""
    root_count = 0
    roots = {}

    m = np.zeros((n,n))
    xmin, xmax, ymin, ymax = extent
    for r, x in enumerate(np.linspace(xmin, xmax, n)):
        for s, y in enumerate(np.linspace(ymin, ymax, n)):
            z = x + y*1j
            root = np.round(newton(z, p, pprime)[1], 1)
            if not root in roots:
                roots[root] = root_count
                root_count += 1
            m[s, r] = roots[root]
    plt.imshow(m, cmap=cmap, extent=extent)



In [9]:

    
plt.grid('off')
plot_newton_iters(lambda x: x**3 - 1, lambda x: 3*x**2)



In [10]:

    
plt.grid('off')
m = plot_newton_basins(lambda x: x**3 - 1, lambda x: 3*x**2)

2. Ill-conditioned linear problems. (20 points)

You are given a $n \times p$ design matrix $X$ and a $p$-vector of observations $y$ and asked to find the coefficients $\beta$ that solve the linear equations $X \beta = y$.

X = np.load('x.npy')
y = np.load('y.npy')

The solution $\beta$ can also be loaded as

beta = np.load('b.npy')

Write a formula that could solve the system of linear equations in terms of $X$ and $y$ Write a function f1 that takes arguments $X$ and $y$ and returns $\beta$ using this formula.
How could you code this formula using np.linalg.solve that does not require inverting a matrix? Write a function f2 that takes arguments $X$ and $y$ and returns $\beta$ using this.
Note that carefully designed algorithms can solve this ill-conditioned problem, which is why you should always use library functions for linear algebra rather than write your own.
```
np.linalg.lstsq(x, y)[0]
```
What happens if you try to solve for $\beta$ using f1 or f2? Remove the column of $X$ that is making the matrix singular and find the $p-1$ vector $b$ using f2.
Note that the solution differs from that given by np.linalg.lstsq? This arises because the relevant condition number for f2 is actually for the matrix $X^TX$ while the condition number of lstsq is for the matrix $X$. Why is the condition so high even after removing the column that makes the matrix singular?



In [11]:

    
X = np.load('x.npy')
y = np.load('y.npy')
beta = np.load('b.npy')

def f1(X, y):
    """Direct translation of normal equations to code."""
    return np.dot(np.linalg.inv(np.dot(X.T, X)), np.dot(X.T, y))

def f2(X, y):
    """Solving normal equations wihtout matrix inversion."""
    return np.linalg.solve(np.dot(X.T, X), np.dot(X.T, y))

%precision 3
print("X = ")
print(X)
# counting from 0 (so column 5 is the last column)
# we can see that column 5 is a multiple of column 3
# so one approach is to simply remove this (dependent) column

print("True solution\t\t", beta)
print("Library function\t", np.linalg.lstsq(X, y)[0])
print("Using f1\t\t", f1(X[:, :5], y))
print("Using f2\t\t", f2(X[:, :5], y))









    



X = 
[[  5.000e+00   4.816e+14   9.000e+00   5.000e+00   0.000e+00   5.000e+01]
 [  1.000e+00   4.214e+14   6.000e+00   9.000e+00   2.000e+00   9.000e+01]
 [  5.000e+00   1.204e+14   4.000e+00   2.000e+00   4.000e+00   2.000e+01]
 [  7.000e+00   5.418e+14   1.000e+00   7.000e+00   0.000e+00   7.000e+01]
 [  9.000e+00   5.418e+14   7.000e+00   6.000e+00   9.000e+00   6.000e+01]
 [  0.000e+00   6.020e+13   8.000e+00   8.000e+00   3.000e+00   8.000e+01]
 [  8.000e+00   4.214e+14   3.000e+00   6.000e+00   5.000e+00   6.000e+01]
 [  9.000e+00   1.806e+14   4.000e+00   8.000e+00   1.000e+00   8.000e+01]
 [  0.000e+00   1.806e+14   9.000e+00   2.000e+00   0.000e+00   2.000e+01]
 [  9.000e+00   1.204e+14   7.000e+00   7.000e+00   9.000e+00   7.000e+01]]
True solution		 [ 0.469  0.096  0.903  0.119  0.525  0.084]
Library function	 [ 0.469  0.096  0.903  0.009  0.526  0.095]
Using f1		 [ 0.465  0.096  0.908  0.951  0.523]
Using f2		 [ 0.471  0.096  0.903  0.953  0.526]

Condition numbers are ratio of largest to smallest singular values



In [12]:

    
np.linalg.svd(X)[1]









    Out[12]:





array([  1.128e+15,   1.124e+02,   1.364e+01,   1.151e+01,   6.344e+00,
         9.940e-16])



In [13]:

    
np.linalg.svd(X[:, :-1])[1]









    Out[13]:





array([  1.128e+15,   1.863e+01,   1.224e+01,   8.025e+00,   6.086e+00])



In [14]:

    
np.linalg.cond(X[:, :-1])









    Out[14]:





$$1.85324085869e+14$$

One way to think about the condition number is in terms of the ratio of the largest singular value to the smallest one - so a measure of the disproportionate stretching effect of the linear transform in one direction versus another. When this is very big, it means that errors in one or more direction will be amplified greatly. This often occurs because one or more columns is "almost" dependent - i.e. it can be approximated by a linear combination of the other columns.

3. Consider the following function on $\mathbb{R}^2$:

$$f(x_1,x_2) = -x_1x_2e^{-\frac{(x_1^2+x_2^2)}{2}}$$

Write down its gradient.
write down the Hessian matrix.
Find the critical points of $f$.
Characterize the critical points as max/min or neither. Find the minimum under the constraint $$g(x) = x_1^2+x_2^2 \leq 10$$ and $$h(x) = 2x_1 + 3x_2 = 5$$ using scipy.optimize.minimize.
Plot the function contours using matplotlib. (20 points)



In [15]:

    
import sympy as sym
from sympy import Matrix
from numpy import linalg as la

x1, x2 = sym.symbols('x1 x2')
def f(x1, x2):
    return sym.Matrix([-x1*x2*sym.exp(-(x1**2 + x2**2)/2)])

def h(x1,x2):
    return x1**2+x2**2

def g(x1,x2):
    return 2*x1+3*x2

def characterize_cp(H):
    l,v = la.eig(H)
    if(np.all(np.greater(l,np.zeros(2)))):
       return("minimum")
    elif(np.all(np.less(l,np.zeros(2)))):
       return("maximum")
    else:
       return("saddle")



In [16]:

    
sym.init_printing()



In [17]:

    
fun = f(x1,x2)
X = sym.Matrix([x1,x2])

gradf = fun.jacobian(X)
sym.simplify(gradf)









    Out[17]:





$$\left[\begin{matrix}x_{2} \left(x_{1}^{2} - 1\right) e^{- \frac{x_{1}^{2}}{2} - \frac{x_{2}^{2}}{2}} & x_{1} \left(x_{2}^{2} - 1\right) e^{- \frac{x_{1}^{2}}{2} - \frac{x_{2}^{2}}{2}}\end{matrix}\right]$$



In [18]:

    
hessianf = gradf.jacobian(X)
sym.simplify(hessianf)









    Out[18]:





$$\left[\begin{matrix}x_{1} x_{2} \left(- x_{1}^{2} + 3\right) e^{- \frac{x_{1}^{2}}{2} - \frac{x_{2}^{2}}{2}} & \left(- x_{1}^{2} x_{2}^{2} + x_{1}^{2} + x_{2}^{2} - 1\right) e^{- \frac{x_{1}^{2}}{2} - \frac{x_{2}^{2}}{2}}\\\left(- x_{1}^{2} x_{2}^{2} + x_{1}^{2} + x_{2}^{2} - 1\right) e^{- \frac{x_{1}^{2}}{2} - \frac{x_{2}^{2}}{2}} & x_{1} x_{2} \left(- x_{2}^{2} + 3\right) e^{- \frac{x_{1}^{2}}{2} - \frac{x_{2}^{2}}{2}}\end{matrix}\right]$$



In [19]:

    
fcritical = sym.solve(gradf,X)

for i in range(4):
    H = np.array(hessianf.subs([(x1,fcritical[i][0]),(x2,fcritical[i][1])])).astype(float) 
    print(fcritical[i], characterize_cp(H))









    



(-1, -1) minimum
(-1, 1) maximum
(0, 0) saddle
(1, -1) maximum



In [20]:

    
import scipy.optimize as opt



In [21]:

    
def f(x):
    return -x[0] * x[1] * np.exp(-(x[0]**2+x[1]**2)/2)

cons = ({'type': 'eq',
         'fun' : lambda x: np.array([2.0*x[0] + 3.0*x[1] - 5.0]),
         'jac' : lambda x: np.array([2.0,3.0])},
        {'type': 'ineq',
         'fun' : lambda x: np.array([-x[0]**2.0 - x[1]**2.0 + 10.0])})

x0 = [1.5,1.5]
cx = opt.minimize(f, x0, constraints=cons)



In [22]:

    
x = np.linspace(-5, 5, 200)
y = np.linspace(-5, 5, 200)
X, Y = np.meshgrid(x, y)
Z = f(np.vstack([X.ravel(), Y.ravel()])).reshape((200,200))

plt.contour(X, Y, Z)

plt.plot(x, (5-2*x)/3, 'k:', linewidth=1)
plt.plot(x, (10.0-x**2)**0.5, 'k:', linewidth=1)
plt.plot(x, -(10.0-x**2)**0.5, 'k:', linewidth=1)
plt.fill_between(x,(10-x**2)**0.5,-(10-x**2)**0.5,alpha=0.15)
plt.text(cx['x'][0], cx['x'][1], 'x', va='center', ha='center', size=20, color='red')
plt.axis([-5,5,-5,5])
plt.title('Contour plot of f(x) subject to constraints g(x) and h(x)')
plt.xlabel('x1')
plt.ylabel('x2')
pass









    



/Users/cliburn/anaconda2/envs/p3/lib/python3.5/site-packages/ipykernel/__main__.py:9: RuntimeWarning: invalid value encountered in sqrt
/Users/cliburn/anaconda2/envs/p3/lib/python3.5/site-packages/ipykernel/__main__.py:10: RuntimeWarning: invalid value encountered in sqrt
/Users/cliburn/anaconda2/envs/p3/lib/python3.5/site-packages/ipykernel/__main__.py:11: RuntimeWarning: invalid value encountered in sqrt

4. One of the goals of the course it that you will be able to implement novel algorithms from the literature. (30 points)

Implement the mean-shift algorithm in 1D as described here.
- Use the following function signature
```
def mean_shift(xs, x, kernel, max_iters=100, tol=1e-6):
```
- xs is the data set, x is the starting location, and kernel is a kernel function
- tol is the difference in $||x||$ across iterations
Use the following kernels with bandwidth $h$ (a default value of 1.0 will work fine)
- Flat - return 1 if $||x|| < h$ and 0 otherwise
- Gaussian $$\frac{1}{\sqrt{2 \pi h}}e^{\frac{-||x||^2}{h^2}}$$
- Note that $||x||$ is the norm of the data point being evaluated minus the current value of $x$
Use both kernels to find all 3 modes of the data set in x1d.npy
Modify the algorithm and/or kernels so that it now works in an arbitrary number of dimensions.
Use both kernels to find all 3 modes of the data set in x2d.npy
Plot the path of successive intermediate solutions of the mean-shift algorithm starting from x0 = (-4, 5) until it converges onto a mode in the 2D data for each kernel. Superimpose the path on top of a contour plot of the data density.



In [23]:

    
def gaussian_kernel(xs, x, h=1.0):
    """Gaussian kernel for a shifting window centerd at x."""

    X = xs - x
    try:
        d = xs.shape[1]
    except:
        d = 1        
    k = np.array([(2*np.pi*h**d)**-0.5*np.exp(-(np.dot(_.T, _)/h)**2) for _ in X])
    if d != 1:
        k = k[:, np.newaxis]
    return k

def flat_kernel(xs, x, h=1.0):
    """Flat kenrel for a shifting window centerd at x."""

    X = xs - x
    try:
        d = xs.shape[1]
    except:
        d = 1    
    k = np.array([1 if np.dot(_.T, _) < h else 0 for _ in X])
    if d != 1:
        k = k[:, np.newaxis]
    return k

def mean_shift(xs, x, kernel, max_iters=100, tol=1e-6, trace=False):
    """Finds the local mode using mean shift algorithm."""

    record = []
    
    for i in range(max_iters):
        if trace:
            record.append(x)
        m = (kernel(xs, x)*xs).sum(axis=0)/kernel(xs, x).sum(axis=0) - x
        if np.sum(m**2) < tol: 
            break
        x += m
    return i, x, np.array(record)



In [24]:

    
x1 = np.load('x1d.npy')

# choose kernel to evaluate
kernel = flat_kernel
# kernel = gaussian_kernel

i1, m1, path = mean_shift(x1, 1, kernel)
print(i1, m1)

i2, m2, path = mean_shift(x1, -7, kernel)
print(i2, m2)

i3, m3, path = mean_shift(x1, 7 ,kernel)
print(i3, m3)

xp = np.linspace(0, 1.0, 100)
plt.hist(x1, 50, histtype='step', normed=True);
plt.axvline(m1, c='blue')
plt.axvline(m2, c='blue')
plt.axvline(m3, c='blue');









    



15 0.575467435455
16 -4.99502467965
16 3.97936417657



In [25]:

    
x2 = np.load('x2d.npy')

# choose kernel to evaluate
# kernel = flat_kernel (also OK if they use the Epanachnikov kernel since the flat is a shadow of that)
kernel = gaussian_kernel

i1, m1, path1 = mean_shift(x2, [0,0], kernel, trace=True)
print(i1, m1)

i2, m2, path2 = mean_shift(x2, [-4,5], kernel, trace=True)
print(i2, m2)

i3, m3, path3 = mean_shift(x2, [10,10] ,kernel, trace=True)
print(i3, m3)









    



59 [ 2.318  2.826]
12 [-3.07   3.057]
42 [ 6.023  8.951]



In [26]:

    
import scipy.stats as stats

# size of marekr at starting position
base = 40

# set up for estimating density using gaussian_kde
xmin, xmax = -6, 12
ymin,ymax = -5, 15
X, Y = np.mgrid[xmin:xmax:100j, ymin:ymax:100j]
positions = np.vstack([X.ravel(), Y.ravel()])
kde = stats.gaussian_kde(x2.T)
Z = np.reshape(kde(positions).T, X.shape)

plt.contour(X, Y, Z)
# plot data in background
plt.scatter(x2[:, 0], x2[:, 1], c='grey', alpha=0.2, edgecolors='none')

# path from [0,0]
plt.scatter(path1[:, 0], path1[:, 1], s=np.arange(base, base+len(path1)), 
            c='red', edgecolors='red', marker='x', linewidth=1.5)

# path from [-4,5]
plt.scatter(path2[:, 0], path2[:, 1], s=np.arange(base, base+len(path2)), 
            c='blue', edgecolors='blue', marker='x', linewidth=1.5)

# path from [10,10]
plt.scatter(path3[:, 0], path3[:, 1], s=np.arange(base, base+len(path3)), 
            c='green', edgecolors='green',marker='x', linewidth=1.5)
plt.axis([xmin, xmax, ymin, ymax]);