Matrix factorization is a very interesting area of machine learning research. Formulating a problem as a 2D matrix $X$ to be decomposed into multiple matrices, which combine to return an approximation of $X$, can lead to state of the art results for many interesting problems. This core concept is the focus of compressive sensing, matrix completion, sparse coding, robust PCA, dictionary learning, and many other algorithms. One major website which shows many different types of matrix decomposition algorithms is the Matrix Factorization Jungle, run by Igor Carron. There has been a heavy focus on random projections in recent algorithms, which can often lead to increased stability and computationally efficient solutions.
Below is a link to the GoDec algorithm output, as applied to the "Hall" video (shown below) found in this zip file, which is a surveillance tape taken from a mall. Using the GoDec algorithm, the background is almost completely subtracted from the noisy elements of people walking, while still capturing periodic background elements as part of the background. I have written code for both the GoDec and Robust PCA algorithms in numpy based on their Matlab equivalents. There are many datasets which can be found here, and we will set up a simple download function for ease-of-access. Special thanks to @kuantkid for the PyRPCA repo, which was the inspiration to start and extend this work, and especially the idea of creating a demo video from PNGs which is PRETTY. DANG. AWESOME.
In [1]:
from IPython.display import YouTubeVideo
YouTubeVideo('JgfK46RA8XY')
Out[1]:
First we want to download a video, so that we can compare the algorithmic result against the original video. The file is downloaded, if it does not already exist in the working directory. Next, it will create a directory of the same name, and unzip the file contents (Campus.zip to Campus/filename).
In [2]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [3]:
try:
from urllib2 import urlopen
except ImportError:
from urllib.request import urlopen
from scipy.io import loadmat, savemat
import os
ext = {"water":'WaterSurface.zip',
"fountain":'Fountain.zip',
"campus":'Campus.zip',
"escalator": 'Escalator.zip',
"curtain": 'Curtain.zip',
"lobby": 'Lobby.zip',
"mall": 'ShoppingMall.zip',
"hall": 'hall.zip',
"bootstrap": 'Bootstrap.zip'}
example = "mall"
def progress_bar_downloader(url, fname, progress_update_every=5):
#from http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python/22776#22776
u = urlopen(url)
f = open(fname, 'wb')
meta = u.info()
file_size = int(meta.get("Content-Length"))
print("Downloading: %s Bytes: %s" % (fname, file_size))
file_size_dl = 0
block_sz = 8192
p = 0
while True:
buffer = u.read(block_sz)
if not buffer:
break
file_size_dl += len(buffer)
f.write(buffer)
if (file_size_dl * 100. / file_size) > p:
status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
print(status)
p += progress_update_every
f.close()
def get_video_clip(d):
#Download files from http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html
if os.path.exists('./' + d):
print('Video file %s already downloaded, continuing' % d)
return
else:
print('Video file %s not found, downloading' % d)
progress_bar_downloader(r'http://perception.i2r.a-star.edu.sg/BK_Model_TestData/' + d, d)
def bname(x): return x.split('.')[0]
get_video_clip(ext[example])
if not os.path.exists('./' + bname(ext[example])):
os.makedirs(bname(ext[example]))
os.system('unzip ' + ext[example] + ' -d ' + bname(ext[example]))
The code below will read in all the .bmp images downloaded and unzipped from the website, as well as converting to grayscale, scaling the result between 0 and 1. Eventually, I plan to do a "full-color" version of this testing, but for now the greyscale will have to suffice.
In [4]:
from scipy import misc
import numpy as np
from glob import glob
def rgb2gray(rgb):
r, g, b = rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]
gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
return gray / 255.
fdir = bname(ext[example])
names = sorted(glob(fdir + "/*.bmp"))
d1, d2, channels = misc.imread(names[0]).shape
d1 = 128
d2 = 160
num = len(names)
X = np.zeros((d1, d2, num))
for n, i in enumerate(names):
X[:, :, n] = misc.imresize(rgb2gray(misc.imread(i).astype(np.double)) / 255., (d1, d2))
X = X.reshape(d1 * d2, num)
clip = 100
print(X.shape)
print(d1)
print(d2)
Robust Principal Component Analysis (PCA) is an extension of PCA. Rather than attempting to solve $X = L$, where $L$ is typically a low-rank approximation ($N \times M$, vs. $N \times P$, $M < P$), Robust PCA solves the factorization problem $X = L + S$, where $L$ is a low-rank approximation, and $S$ is a sparse component. By separating the factorization into two separate matrix components, Robust PCA makes a much better low-rank estimate $L$ on many problems.
There are a variety of algorithms to solve this optimization problem. The code below is an implementation of the Inexact Augmented Lagrangian Multiplier algorithm for Robust PCA which is identical to the equivalent MATLAB code (download), or as near as I could make it. The functionality seems equivalent, and for relevant details please see the paper. This algorithm was chosen because according to the timing results at the bottom of this page, it was both the fastest and most accurate of the formulas listed. Though it appears to be fairly slow in our testing, it is fully believable that this is an implementation issue, since this code has not been specifically optimized for numpy. Due to this limitation, we clip the algorithm to the first few frames to save time.
In [5]:
import numpy as np
from numpy.linalg import norm, svd
def inexact_augmented_lagrange_multiplier(X, lmbda=.01, tol=1e-3,
maxiter=100, verbose=True):
"""
Inexact Augmented Lagrange Multiplier
"""
Y = X
norm_two = norm(Y.ravel(), 2)
norm_inf = norm(Y.ravel(), np.inf) / lmbda
dual_norm = np.max([norm_two, norm_inf])
Y = Y / dual_norm
A = np.zeros(Y.shape)
E = np.zeros(Y.shape)
dnorm = norm(X, 'fro')
mu = 1.25 / norm_two
rho = 1.5
sv = 10.
n = Y.shape[0]
itr = 0
while True:
Eraw = X - A + (1 / mu) * Y
Eupdate = np.maximum(Eraw - lmbda / mu, 0) + np.minimum(Eraw + lmbda / mu, 0)
U, S, V = svd(X - Eupdate + (1 / mu) * Y, full_matrices=False)
svp = (S > 1 / mu).shape[0]
if svp < sv:
sv = np.min([svp + 1, n])
else:
sv = np.min([svp + round(.05 * n), n])
Aupdate = np.dot(np.dot(U[:, :svp], np.diag(S[:svp] - 1 / mu)), V[:svp, :])
A = Aupdate
E = Eupdate
Z = X - A - E
Y = Y + mu * Z
mu = np.min([mu * rho, mu * 1e7])
itr += 1
if ((norm(Z, 'fro') / dnorm) < tol) or (itr >= maxiter):
break
if verbose:
print("Finished at iteration %d" % (itr))
return A, E
In [6]:
sz = clip
A, E = inexact_augmented_lagrange_multiplier(X[:, :sz])
A = A.reshape(d1, d2, sz) * 255.
E = E.reshape(d1, d2, sz) * 255.
#Refer to them by position desired for video demo later
savemat("./IALM_background_subtraction.mat", {"1": A, "2": E})
print("RPCA complete")
The code below contains an implementation of the GoDec algorithm, which attempts to solve the problem $X = L + S + G$, with $L$ low-rank, $S$ sparse, and $G$ as a component of Gaussian noise. By allowing the decomposition to expand to 3 matrix components, the algorithm is able to more effectively differentiate the sparse component from the low-rank.
In [7]:
import numpy as np
from numpy.linalg import norm
from scipy.linalg import qr
def wthresh(a, thresh):
#Soft wavelet threshold
res = np.abs(a) - thresh
return np.sign(a) * ((res > 0) * res)
#Default threshold of .03 is assumed to be for input in the range 0-1...
#original matlab had 8 out of 255, which is about .03 scaled to 0-1 range
def go_dec(X, thresh=.03, rank=2, power=0, tol=1e-3,
max_iter=100, random_seed=0, verbose=True):
m, n = X.shape
if m < n:
X = X.T
m, n = X.shape
L = X
S = np.zeros(L.shape)
itr = 0
random_state = np.random.RandomState(random_seed)
while True:
Y2 = random_state.randn(n, rank)
for i in range(power + 1):
Y1 = np.dot(L, Y2)
Y2 = np.dot(L.T, Y1);
Q, R = qr(Y2, mode='economic')
L_new = np.dot(np.dot(L, Q), Q.T)
T = L - L_new + S
L = L_new
S = wthresh(T, thresh)
T -= S
err = norm(T.ravel(), 2)
if (err < tol) or (itr >= max_iter):
break
L += T
itr += 1
#Is this even useful in soft GoDec? May be a display issue...
G = X - L - S
if m < n:
L = L.T
S = S.T
G = G.T
if verbose:
print("Finished at iteration %d" % (itr))
return L, S, G
In [8]:
sz = clip
L, S, G = go_dec(X[:, :sz])
L = L.reshape(d1, d2, sz) * 255.
S = S.reshape(d1, d2, sz) * 255.
G = G.reshape(d1, d2, sz) * 255.
savemat("./GoDec_background_subtraction.mat", {"1": L, "2": S, "3": G, })
print("GoDec complete")
In [9]:
import os
import sys
import matplotlib.pyplot as plt
from scipy.io import loadmat
import numpy as np
from matplotlib import cm
import matplotlib
#demo inspired by / stolen from @kuantkid on Github - nice work!
def mlabdefaults():
matplotlib.rcParams['lines.linewidth'] = 1.5
matplotlib.rcParams['savefig.dpi'] = 300
matplotlib.rcParams['font.size'] = 22
matplotlib.rcParams['font.family'] = "Times New Roman"
matplotlib.rcParams['legend.fontsize'] = "small"
matplotlib.rcParams['legend.fancybox'] = True
matplotlib.rcParams['lines.markersize'] = 10
matplotlib.rcParams['figure.figsize'] = 8, 5.6
matplotlib.rcParams['legend.labelspacing'] = 0.1
matplotlib.rcParams['legend.borderpad'] = 0.1
matplotlib.rcParams['legend.borderaxespad'] = 0.2
matplotlib.rcParams['font.monospace'] = "Courier New"
matplotlib.rcParams['savefig.dpi'] = 200
def make_video(alg, cache_path='/tmp/matrix_dec_tmp'):
name = alg
if not os.path.exists(cache_path):
os.mkdir(cache_path)
#If you generate a big
if not os.path.exists('%s/%s_tmp'%(cache_path, name)):
os.mkdir("%s/%s_tmp"%(cache_path, name))
mat = loadmat('./%s_background_subtraction.mat'%(name))
org = X.reshape(d1, d2, X.shape[1]) * 255.
fig = plt.figure()
ax = fig.add_subplot(111)
usable = [x for x in sorted(mat.keys()) if "_" not in x][0]
sz = min(org.shape[2], mat[usable].shape[2])
for i in range(sz):
ax.cla()
ax.axis("off")
ax.imshow(np.hstack([mat[x][:, :, i] for x in sorted(mat.keys()) if "_" not in x] + \
[org[:, :, i]]), cm.gray)
fname_ = '%s/%s_tmp/_tmp%03d.png'%(cache_path, name, i)
if (i % 25) == 0:
print('Completed frame', i, 'of', sz, 'for method', name)
fig.tight_layout()
fig.savefig(fname_, bbox_inches="tight")
#Write out an mp4 and webm video from the png files. -r 5 means 5 frames a second
#libx264 is h.264 encoding, -s 160x130 is the image size
#You may need to sudo apt-get install libavcodec
plt.close()
num_arrays = na = len([x for x in mat.keys() if "_" not in x])
cdims = (na * d1, d2)
cmd_h264 = "ffmpeg -y -r 10 -i '%s/%s_tmp/_tmp%%03d.png' -c:v libx264 " % (cache_path, name) + \
"-s %dx%d -preset ultrafast -pix_fmt yuv420p %s_animation.mp4" % (cdims[0], cdims[1], name)
cmd_vp8 = "ffmpeg -y -r 10 -i '%s/%s_tmp/_tmp%%03d.png' -c:v libvpx " % (cache_path, name) + \
"-s %dx%d -preset ultrafast -pix_fmt yuv420p %s_animation.webm" % (cdims[0], cdims[1], name)
os.system(cmd_h264)
os.system(cmd_vp8)
if __name__ == "__main__":
mlabdefaults()
all_methods = ['IALM', 'GoDec']
for name in all_methods:
make_video(name);
In [10]:
print("Background is generated from this file:", example)
The code below will display HTML5 video for each of the videos generated in the previos step, and embed it in the IPython notebook. There are "echoes" of people, which are much more pronounced in the Robust PCA video than the GoDec version, likely due to the increased flexibility of an independent Gaussian term. Overall, the effect is pretty cool though not mathematically as good as the GoDec result.
In [11]:
from IPython.display import HTML
from base64 import b64encode
def html5_video(alg, frames):
#This *should* support all browsers...
framesz = 250
info = {"mp4": {"ext":"mp4", "encoded": '', "size":(frames * framesz, framesz)}}
html_output = []
for k in info.keys():
f = open("%s_animation.%s" % (alg, info[k]["ext"]), "rb").read()
encoded = b64encode(f).decode('ascii')
video_tag = '<video width="500" height="250" autoplay="autoplay" ' + \
'loop src="data:video/%s;base64,%s">' % (k, encoded)
html_output.append(video_tag)
return HTML(data=''.join(html_output))
If these videos freeze for some reason, just hit refresh and they should start playing.
In [12]:
html5_video("IALM", 3)
Out[12]:
In [13]:
html5_video("GoDec", 4)
Out[13]:
Though the results are definitely not identical to the video results, they are similar enough to the benchmark that I am satisfied for now. Future work in this area will involve more decomposition algorithms, dictionary learning, and matrix completion. Eventually, I would like to get this into scikit-learn format, and post as a gist or contribute to the codebase.
kk