| Text provided under a Creative Commons Attribution license, CC-BY. All code is made available under the FSF-approved MIT license. (c) Marc Spiegelman |
|
In [ ]:
%matplotlib inline
import numpy as np
import scipy.linalg as la
import matplotlib.pyplot as plt
GOAL: Demonstrate the use of the SVD to calculate principal components or "Empirical Orthogonal Functions" in a geophysical data set. This example is modified from a paper by Chris Small (LDEO)
Small, C., 1994. A global analysis of mid-ocean ridge axial topography. Geophys J Int 116, 64–84. doi:10.1111/j.1365-246X.1994.tb02128.x
In [ ]:
# read the data from the csv file
data = np.genfromtxt('m80.csv', delimiter=',')
data_mean = np.mean(data,0)
# and plot out a few profiles and the mean depth.
plt.figure()
rows = [ 9,59,99]
labels = [ 'slow','medium','fast']
for i, row in enumerate(rows):
    plt.plot(data[row, :], label=labels[i])
plt.plot(data_mean, 'k--', label='mean')
plt.xlabel('Distance across axis (km)')
plt.ylabel('Relative Elevation (m)')
plt.legend(loc='best')
plt.title('Example cross-axis topography of mid-ocean ridges')
plt.show()
In [ ]:
plt.figure()
X = data - data_mean
plt.imshow(X)
plt.xlabel('Distance across axis (km)')
plt.ylabel('Relative Spreading Rate')
plt.colorbar()
plt.show()
In [ ]:
# now calculate the SVD of the de-meaned data matrix
U,S,Vt = la.svd(X,full_matrices=False)
We begin by looking at the spectrum of singular values $\Sigma$. Defining the variance contributed by component $i$ as $\sigma_i^2$, we can calculate the cumulative fraction of the total variance explained by the first $k$ components as

$$ g_k = \frac{\sum_{i=0}^k \sigma_i^2}{\sum_{i=0}^n \sigma_i^2} $$

Plotting both $\Sigma$ and $g$ shows that $\sim$80% of the total variance can be explained by the first 4-5 components.
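As a quick sanity check of this identity on a small synthetic matrix (hypothetical data, not the ridge profiles), the squared singular values sum to the total variance $\|X\|_F^2$, so $g$ is a monotone curve ending at 1:

```python
import numpy as np

# Synthetic de-meaned data matrix standing in for the ridge profiles
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))
X -= X.mean(axis=0)

U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Total variance equals the squared Frobenius norm of X
assert np.isclose(np.sum(S**2), np.sum(X**2))

# Cumulative fraction of variance explained by the first k components
g = np.cumsum(S**2) / np.sum(S**2)
assert np.isclose(g[-1], 1.0)
```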
In [ ]:
# plot the singular values
plt.figure()
plt.semilogy(S,'bo')
plt.grid()
plt.title('Singular Values')
plt.show()
# and cumulative percent of variance
g = np.cumsum(S*S)/np.sum(S*S)
plt.figure()
plt.plot(g,'bx-')
plt.title('Cumulative percent of variance explained')
plt.grid()
plt.show()
Plotting the first 3 singular vectors (rows of $V^T$) shows that they reflect some commonly occurring patterns in the data.
In [ ]:
plt.figure()
num_EOFs=3
for row in range(num_EOFs):
    plt.plot(Vt[row, :], label='EOF{}'.format(row + 1))
plt.grid()
plt.xlabel('Distance (km)')
plt.title('First {} EOFs '.format(num_EOFs))
plt.legend(loc='best')
plt.show()
For example, the first EOF is primarily a symmetric pattern with an axial high flanked by two off-axis troughs (or an axial low with two flanking highs: the EOFs are just unit basis vectors for the row space and can be scaled by coefficients of either sign). The second EOF is broader and of one sign, while the third EOF encodes asymmetry.
Using the SVD we can also decompose each profile into a weighted linear combination of EOFs, i.e.

$$ X = U\Sigma V^T = C V^T $$

where $C = U\Sigma$ is a matrix of coefficients that describes how each data row is decomposed into the relevant basis vectors. We can then produce a rank-$k$ truncated representation of the data by

$$ X_k = C_k V_k^T $$

where $C_k$ is the first $k$ columns of $C$ and $V_k$ contains the first $k$ EOFs.

Here we show the original data and the reconstructed data using the first 5 EOFs.
In [ ]:
# reconstruct the data using the first 5 EOFs
k=5
Ck = np.dot(U[:,:k],np.diag(S[:k]))
Vtk = Vt[:k,:]
data_k = data_mean + np.dot(Ck,Vtk)
plt.figure()
plt.imshow(data_k)
plt.colorbar()
plt.title('reconstructed data')
plt.show()
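As a side note on the truncation: by the Eckart-Young theorem, the Frobenius-norm error of the rank-$k$ reconstruction equals the root-sum-square of the discarded singular values. A minimal sketch on a synthetic matrix (hypothetical data, not the ridge profiles):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 60))

U, S, Vt = np.linalg.svd(X, full_matrices=False)

k = 5
Ck = U[:, :k] * S[:k]      # same as np.dot(U[:, :k], np.diag(S[:k]))
Xk = Ck @ Vt[:k, :]        # rank-k truncated reconstruction

# The truncation error is exactly the energy in the discarded components
err = np.linalg.norm(X - Xk, 'fro')
assert np.isclose(err, np.sqrt(np.sum(S[k:]**2)))
```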
And we can consider a few reconstructed profiles compared with the original data
In [ ]:
# show the original 3 profiles and their reconstructed values using the first k EOFs
for i, row in enumerate(rows):
    plt.figure()
    plt.plot(data_k[row, :], label='k={}'.format(k))
    plt.plot(data[row, :], label='original data')
    Cstring = ['{:3.0f}, '.format(Ck[row, j]) for j in range(k)]
    plt.title('Reconstruction profile {}:\n C_{}='.format(row, k) + ''.join(Cstring))
    plt.legend(loc='best')
    plt.show()
We can also use the principal components to look at the projection of the data onto a lower-dimensional space, as the coefficients $C$ are simply the coordinates of our data along each principal component. For example, we can view the data in the 2-dimensional space defined by the first two EOFs by simply plotting $C_1$ against $C_2$.
In [ ]:
# plot the data in the plane defined by the first two principal components
plt.figure()
plt.scatter(Ck[:,0],Ck[:,1])
plt.xlabel('$C_1$')
plt.ylabel('$C_2$')
plt.grid()
plt.title('Projection onto the first two principal components')
plt.show()
# Or consider the degree of asymmetry (EOF 3) as a function of spreading rate
plt.figure()
plt.plot(Ck[:,2],'bo')
plt.xlabel('Spreading rate')
plt.ylabel('$C_3$')
plt.grid()
plt.title('Degree of asymmetry')
plt.show()