Matrix factorization is a very interesting area of machine learning research. Formulating a problem as a 2D matrix $X$ to be decomposed into multiple matrices, which combine to return an approximation of $X$, can lead to state of the art results for many interesting problems. This core concept is the focus of compressive sensing, matrix completion, sparse coding, robust PCA, dictionary learning, and many other algorithms. One major website which shows many different types of matrix decomposition algorithms is the Matrix Factorization Jungle, run by Igor Carron. There has been a heavy focus on random projections in recent algorithms, which can often lead to increased stability and computationally efficient solutions.

Below is a link to the GoDec algorithm output, as applied to the "Hall" video (shown below) found in this zip file, which is a surveillance tape taken from a mall. Using the GoDec algorithm, the background is almost completely subtracted from the noisy elements of people walking, while still capturing periodic background elements as part of the background. I have written code for both the GoDec and Robust PCA algorithms in numpy based on their Matlab equivalents. There are many datasets which can be found here, and we will set up a simple download function for ease-of-access. Special thanks to @kuantkid for the PyRPCA repo, which was the inspiration to start and extend this work, and especially the idea of creating a demo video from PNGs which is PRETTY. DANG. AWESOME.

Interstellar Overdrive



In [1]:

    
from IPython.display import YouTubeVideo
YouTubeVideo('JgfK46RA8XY')









    Out[1]:

First we want to download a video, so that we can compare the algorithmic result against the original video. The file is downloaded, if it does not already exist in the working directory. Next, it will create a directory of the same name, and unzip the file contents (Campus.zip to Campus/filename).



In [2]:

    
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline



In [3]:

    
try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen
from scipy.io import loadmat, savemat
import os

ext = {"water":'WaterSurface.zip',
       "fountain":'Fountain.zip',
       "campus":'Campus.zip',
       "escalator": 'Escalator.zip',
       "curtain": 'Curtain.zip',
       "lobby": 'Lobby.zip',
       "mall": 'ShoppingMall.zip',
       "hall": 'hall.zip',
       "bootstrap": 'Bootstrap.zip'}

example = "mall"


def progress_bar_downloader(url, fname, progress_update_every=5):
    #from http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python/22776#22776
    u = urlopen(url)
    f = open(fname, 'wb')
    meta = u.info()
    file_size = int(meta.get("Content-Length"))
    print("Downloading: %s Bytes: %s" % (fname, file_size))
    file_size_dl = 0
    block_sz = 8192
    p = 0
    while True:
        buffer = u.read(block_sz)
        if not buffer:
            break
        file_size_dl += len(buffer)
        f.write(buffer)
        if (file_size_dl * 100. / file_size) > p:
            status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
            print(status)
            p += progress_update_every
    f.close()


def get_video_clip(d):
    #Download files from http://perception.i2r.a-star.edu.sg/bk_model/bk_index.html
    if os.path.exists('./' + d):
        print('Video file %s already downloaded, continuing' % d)
        return
    else:
        print('Video file %s not found, downloading' % d)
    progress_bar_downloader(r'http://perception.i2r.a-star.edu.sg/BK_Model_TestData/' + d, d)

    
def bname(x): return x.split('.')[0]
    
get_video_clip(ext[example])

if not os.path.exists('./' + bname(ext[example])):
    os.makedirs(bname(ext[example]))
    os.system('unzip ' + ext[example] + ' -d ' + bname(ext[example]))









    



Video file ShoppingMall.zip already downloaded, continuing

The code below will read in all the .bmp images downloaded and unzipped from the website, as well as converting to grayscale, scaling the result between 0 and 1. Eventually, I plan to do a "full-color" version of this testing, but for now the greyscale will have to suffice.



In [4]:

    
from scipy import misc
import numpy as np
from glob import glob

def rgb2gray(rgb):
    r, g, b = rgb[:, :, 0], rgb[:, :, 1], rgb[:, :, 2]
    gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
    return gray / 255.

fdir = bname(ext[example])
names = sorted(glob(fdir + "/*.bmp"))
d1, d2, channels = misc.imread(names[0]).shape
d1 = 128
d2 = 160
num = len(names)
X = np.zeros((d1, d2, num))
for n, i in enumerate(names):
    X[:, :, n] = misc.imresize(rgb2gray(misc.imread(i).astype(np.double)) / 255., (d1, d2))
    
X = X.reshape(d1 * d2, num)
clip = 100
print(X.shape)
print(d1)
print(d2)









    



(20480, 1286)
128
160

Robust PCA

Robust Principal Component Analysis (PCA) is an extension of PCA. Rather than attempting to solve $X = L$, where $L$ is typically a low-rank approximation ($N \times M$, vs. $N \times P$, $M < P$), Robust PCA solves the factorization problem $X = L + S$, where $L$ is a low-rank approximation, and $S$ is a sparse component. By separating the factorization into two separate matrix components, Robust PCA makes a much better low-rank estimate $L$ on many problems.

There are a variety of algorithms to solve this optimization problem. The code below is an implementation of the Inexact Augmented Lagrangian Multiplier algorithm for Robust PCA which is identical to the equivalent MATLAB code (download), or as near as I could make it. The functionality seems equivalent, and for relevant details please see the paper. This algorithm was chosen because according to the timing results at the bottom of this page, it was both the fastest and most accurate of the formulas listed. Though it appears to be fairly slow in our testing, it is fully believable that this is an implementation issue, since this code has not been specifically optimized for numpy. Due to this limitation, we clip the algorithm to the first few frames to save time.



In [5]:

    
import numpy as np
from numpy.linalg import norm, svd

def inexact_augmented_lagrange_multiplier(X, lmbda=.01, tol=1e-3,
                                          maxiter=100, verbose=True):
    """
    Inexact Augmented Lagrange Multiplier
    """
    Y = X
    norm_two = norm(Y.ravel(), 2)
    norm_inf = norm(Y.ravel(), np.inf) / lmbda
    dual_norm = np.max([norm_two, norm_inf])
    Y = Y / dual_norm
    A = np.zeros(Y.shape)
    E = np.zeros(Y.shape)
    dnorm = norm(X, 'fro')
    mu = 1.25 / norm_two
    rho = 1.5
    sv = 10.
    n = Y.shape[0]
    itr = 0
    while True:
        Eraw = X - A + (1 / mu) * Y
        Eupdate = np.maximum(Eraw - lmbda / mu, 0) + np.minimum(Eraw + lmbda / mu, 0)
        U, S, V = svd(X - Eupdate + (1 / mu) * Y, full_matrices=False)
        svp = (S > 1 / mu).shape[0]
        if svp < sv:
            sv = np.min([svp + 1, n])
        else:
            sv = np.min([svp + round(.05 * n), n])
        Aupdate = np.dot(np.dot(U[:, :svp], np.diag(S[:svp] - 1 / mu)), V[:svp, :])
        A = Aupdate
        E = Eupdate
        Z = X - A - E
        Y = Y + mu * Z
        mu = np.min([mu * rho, mu * 1e7])
        itr += 1
        if ((norm(Z, 'fro') / dnorm) < tol) or (itr >= maxiter):
            break
    if verbose:
        print("Finished at iteration %d" % (itr))  
    return A, E



In [6]:

    
sz = clip
A, E = inexact_augmented_lagrange_multiplier(X[:, :sz])
A = A.reshape(d1, d2, sz) * 255.
E = E.reshape(d1, d2, sz) * 255.
#Refer to them by position desired for video demo later 
savemat("./IALM_background_subtraction.mat", {"1": A, "2": E})
print("RPCA complete")









    



Finished at iteration 18
RPCA complete

GoDec

The code below contains an implementation of the GoDec algorithm, which attempts to solve the problem $X = L + S + G$, with $L$ low-rank, $S$ sparse, and $G$ as a component of Gaussian noise. By allowing the decomposition to expand to 3 matrix components, the algorithm is able to more effectively differentiate the sparse component from the low-rank.



In [7]:

    
import numpy as np
from numpy.linalg import norm
from scipy.linalg import qr

def wthresh(a, thresh):
    #Soft wavelet threshold
    res = np.abs(a) - thresh
    return np.sign(a) * ((res > 0) * res)

#Default threshold of .03 is assumed to be for input in the range 0-1...
#original matlab had 8 out of 255, which is about .03 scaled to 0-1 range
def go_dec(X, thresh=.03, rank=2, power=0, tol=1e-3,
           max_iter=100, random_seed=0, verbose=True):
    m, n = X.shape
    if m < n:
        X = X.T
    m, n = X.shape
    L = X
    S = np.zeros(L.shape)
    itr = 0
    random_state = np.random.RandomState(random_seed)    
    while True:
        Y2 = random_state.randn(n, rank)
        for i in range(power + 1):
            Y1 = np.dot(L, Y2)
            Y2 = np.dot(L.T, Y1);
        Q, R = qr(Y2, mode='economic')
        L_new = np.dot(np.dot(L, Q), Q.T)
        T = L - L_new + S
        L = L_new
        S = wthresh(T, thresh)
        T -= S
        err = norm(T.ravel(), 2)
        if (err < tol) or (itr >= max_iter):
            break 
        L += T
        itr += 1
    #Is this even useful in soft GoDec? May be a display issue...
    G = X - L - S
    if m < n:
        L = L.T
        S = S.T
        G = G.T
    if verbose:
        print("Finished at iteration %d" % (itr))
    return L, S, G



In [8]:

    
sz = clip
L, S, G = go_dec(X[:, :sz])
L = L.reshape(d1, d2, sz) * 255.
S = S.reshape(d1, d2, sz) * 255.
G = G.reshape(d1, d2, sz) * 255.
savemat("./GoDec_background_subtraction.mat", {"1": L, "2": S, "3": G, })
print("GoDec complete")









    



Finished at iteration 100
GoDec complete

A Momentary Lapse of Reason

Now it is time to do something a little unreasonable - we can actually take all of this data, reshape it into a series of images, and plot it as a video inside the IPython notebook! The first step is to generate the frames for the video as .png files, as shown below.



In [9]:

    
import os
import sys
import matplotlib.pyplot as plt
from scipy.io import loadmat
import numpy as np
from matplotlib import cm
import matplotlib

#demo inspired by / stolen from @kuantkid on Github - nice work!
def mlabdefaults():
    matplotlib.rcParams['lines.linewidth'] = 1.5
    matplotlib.rcParams['savefig.dpi'] = 300 
    matplotlib.rcParams['font.size'] = 22
    matplotlib.rcParams['font.family'] = "Times New Roman"
    matplotlib.rcParams['legend.fontsize'] = "small"
    matplotlib.rcParams['legend.fancybox'] = True
    matplotlib.rcParams['lines.markersize'] = 10
    matplotlib.rcParams['figure.figsize'] = 8, 5.6
    matplotlib.rcParams['legend.labelspacing'] = 0.1
    matplotlib.rcParams['legend.borderpad'] = 0.1
    matplotlib.rcParams['legend.borderaxespad'] = 0.2
    matplotlib.rcParams['font.monospace'] = "Courier New"
    matplotlib.rcParams['savefig.dpi'] = 200
    
def make_video(alg, cache_path='/tmp/matrix_dec_tmp'):
    name = alg
    if not os.path.exists(cache_path):
        os.mkdir(cache_path)
    #If you generate a big 
    if not os.path.exists('%s/%s_tmp'%(cache_path, name)):
        os.mkdir("%s/%s_tmp"%(cache_path, name))
    mat = loadmat('./%s_background_subtraction.mat'%(name))
    org = X.reshape(d1, d2, X.shape[1]) * 255.
    fig = plt.figure()
    ax = fig.add_subplot(111)
    usable = [x for x in sorted(mat.keys()) if "_" not in x][0]
    sz = min(org.shape[2], mat[usable].shape[2])
    for i in range(sz):
        ax.cla()
        ax.axis("off")
        ax.imshow(np.hstack([mat[x][:, :, i] for x in sorted(mat.keys()) if "_" not in x] + \
                            [org[:, :, i]]), cm.gray)
        fname_ = '%s/%s_tmp/_tmp%03d.png'%(cache_path, name, i)
        if (i % 25) == 0:
            print('Completed frame', i, 'of', sz, 'for method', name)
        fig.tight_layout()
        fig.savefig(fname_, bbox_inches="tight")
        #Write out an mp4 and webm video from the png files. -r 5 means 5 frames a second
        #libx264 is h.264 encoding, -s 160x130 is the image size
        #You may need to sudo apt-get install libavcodec
    plt.close()

    num_arrays = na = len([x for x  in mat.keys() if "_" not in x])
    cdims = (na * d1, d2)
    cmd_h264 = "ffmpeg -y -r 10 -i '%s/%s_tmp/_tmp%%03d.png' -c:v libx264 " % (cache_path, name) + \
               "-s %dx%d -preset ultrafast -pix_fmt yuv420p %s_animation.mp4" % (cdims[0], cdims[1], name)
    cmd_vp8 = "ffmpeg -y -r 10 -i '%s/%s_tmp/_tmp%%03d.png' -c:v libvpx " % (cache_path, name) + \
              "-s %dx%d -preset ultrafast -pix_fmt yuv420p %s_animation.webm" % (cdims[0], cdims[1], name)
    os.system(cmd_h264)
    os.system(cmd_vp8)
        
if __name__ == "__main__":
    mlabdefaults()
    all_methods = ['IALM', 'GoDec']
    for name in all_methods:
        make_video(name);









    



Completed frame 0 of 100 for method IALM
Completed frame 25 of 100 for method IALM
Completed frame 50 of 100 for method IALM
Completed frame 75 of 100 for method IALM
Completed frame 0 of 100 for method GoDec
Completed frame 25 of 100 for method GoDec
Completed frame 50 of 100 for method GoDec
Completed frame 75 of 100 for method GoDec



In [10]:

    
print("Background is generated from this file:", example)









    



Background is generated from this file: mall

Echoes

The code below will display HTML5 video for each of the videos generated in the previos step, and embed it in the IPython notebook. There are "echoes" of people, which are much more pronounced in the Robust PCA video than the GoDec version, likely due to the increased flexibility of an independent Gaussian term. Overall, the effect is pretty cool though not mathematically as good as the GoDec result.



In [11]:

    
from IPython.display import HTML
from base64 import b64encode

def html5_video(alg, frames):
    #This *should* support all browsers...
    framesz = 250
    info = {"mp4": {"ext":"mp4", "encoded": '', "size":(frames * framesz, framesz)}}
    html_output = []
    for k in info.keys():
        f = open("%s_animation.%s" % (alg, info[k]["ext"]), "rb").read()
        encoded = b64encode(f).decode('ascii')
        video_tag = '<video width="500" height="250" autoplay="autoplay" ' + \
                    'loop src="data:video/%s;base64,%s">' % (k, encoded)
        html_output.append(video_tag)
    return HTML(data=''.join(html_output))

If these videos freeze for some reason, just hit refresh and they should start playing.



In [12]:

    
html5_video("IALM", 3)









    Out[12]:



In [13]:

    
html5_video("GoDec", 4)









    Out[13]:

The Final Cut

Though the results are definitely not identical to the video results, they are similar enough to the benchmark that I am satisfied for now. Future work in this area will involve more decomposition algorithms, dictionary learning, and matrix completion. Eventually, I would like to get this into scikit-learn format, and post as a gist or contribute to the codebase.