Video Sliding Windows

So far we restricted ourselves to 1D time series, but the idea of recovering periodic dynamics with geometry can just as easily apply to multivariate signals. In this module, we will examine sliding windows of videos as an exmaple. Many natural videos also have periodicity, such as this video of a woman doing jumping jacks


In [ ]:
import io
import base64
from IPython.display import HTML

video = io.open('jumpingjacks.ogg', 'r+b').read()
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii')))

Video can be decomposed into a 3D array, which has dimensions width x height x time. To tease out periodicity in geometric form, we will do the exact same thing as with sliding window 1D signal embeddings, but instead of just one sample per time shift, we need to take every pixel in every frame in the time window. The figure below depicts this



To see this visually in the video next to PCA of the embedding, look at the following video


In [ ]:
video = io.open('jumpingjackssliding.ogg', 'r+b').read()
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii')))

PCA Preprocessing for Efficiency


One issue we have swept under the rug so far is memory consumption and computational efficiency. Doing a raw sliding window of every pixel of every frame in the video would blow up in memory. However, even though there are WH pixels in each frame, there are only N frames in the video. This means that each frame in the video can be represented in an (N-1) dimensional subspace of the pixel space, and the coordinates of this subspace can be used in lieu of the pixels in the sliding window embedding. This can be done efficiently with a PCA step before the sliding window embedding. Run the cell below to load code that does PCA efficiently


In [ ]:
#Do all of the imports and setup inline plotting
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from mpl_toolkits.mplot3d import Axes3D
import scipy.interpolate

from ripser import ripser
from persim import plot_diagrams
from VideoTools import *

##Here is the actual PCA code
def getPCAVideo(I):
    ICov = I.dot(I.T)
    [lam, V] = linalg.eigh(ICov)
    V = V*np.sqrt(lam[None, :])
    return V

Jumping Jacks Example Live Demo


Let's now load in code that does sliding window embeddings of videos. The code is very similar to the 1D case, and it has the exact same parameters. The only difference is that each sliding window lives in a Euclidean space of dimension the number of pixels times dim. We're also using linear interpolation instead of spline interpolation to keep things fast


In [ ]:
def getSlidingWindowVideo(I, dim, Tau, dT):
    N = I.shape[0] #Number of frames
    P = I.shape[1] #Number of pixels (possibly after PCA)
    pix = np.arange(P)
    NWindows = int(np.floor((N-dim*Tau)/dT))
    X = np.zeros((NWindows, dim*P))
    idx = np.arange(N)
    for i in range(NWindows):
        idxx = dT*i + Tau*np.arange(dim)
        start = int(np.floor(idxx[0]))
        end = int(np.ceil(idxx[-1]))+2
        if end >= I.shape[0]:
            X = X[0:i, :]
            break
        f = scipy.interpolate.interp2d(pix, idx[start:end+1], I[idx[start:end+1], :], kind='linear')
        X[i, :] = f(pix, idxx).flatten()
    return X

Finally, let's load in the jumping jacks video and perform PCA to reduce the number of effective pixels.
Note that loading the video may take a few seconds on the virtual image


In [ ]:
#Load in video and do PCA to compress dimension
(X, FrameDims) = loadImageIOVideo("jumpingjacks.ogg")
X = getPCAVideo(X)

Now let's do a sliding window embedding and examine the sliding window embedding using TDA. As before, you should tweak the parameters of the sliding window embedding and study the effect on the geometry.


In [ ]:
#Given that the period is 30 frames per cycle, choose a dimension and a Tau that capture 
#this motion in the roundest possible way
#Plot persistence diagram and PCA
dim = 30
Tau = 1
dT = 1

#Get sliding window video
XS = getSlidingWindowVideo(X, dim, Tau, dT)

#Mean-center and normalize sliding window
XS = XS - np.mean(XS, 1)[:, None]
XS = XS/np.sqrt(np.sum(XS**2, 1))[:, None]

#Get persistence diagrams
dgms = ripser(XS)['dgms']

#Do PCA for visualization
pca = PCA(n_components = 3)
Y = pca.fit_transform(XS)


fig = plt.figure(figsize=(12, 6))
plt.subplot(121)
plot_diagrams(dgms)
plt.title("1D Persistence Diagram")

c = plt.get_cmap('nipy_spectral')
C = c(np.array(np.round(np.linspace(0, 255, Y.shape[0])), dtype=np.int32))
C = C[:, 0:3]
ax2 = fig.add_subplot(122, projection = '3d')
ax2.set_title("PCA of Sliding Window Embedding")
ax2.scatter(Y[:, 0], Y[:, 1], Y[:, 2], c=C)
ax2.set_aspect('equal', 'datalim')
plt.show()

Periodicities in The KTH Dataset


We will now examine videos from the KTH dataset, which is a repository of black and white videos of human activities. It consists of 25 subjects performing 6 different actions in each of 4 scenarios. We will use the algorithms developed in this section to measure and rank the periodicity of the different video clips.

Varying Window Length


For our first experiment, we will be showing some precomputed results of varying the sliding window length, while choosing Tau and dT appropriately to keep the dimension and the number of points, respectively, the same in the sliding window embedding. As an example, we will apply it to one of the videos of a subject waving his hands back and forth, as shown below


In [ ]:
video = io.open('KTH/handwaving/person01_handwaving_d1_uncomp.ogg', 'r+b').read()
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii')))

We have done some additional preprocessing, including applying a bandpass filter to each PCA pixel to cut down on drift in the video. Below we show a video varying the window size of the embedding and plotting the persistence diagram, "self-similarity matrix" (distance matrix), and PCA of the embedding, as well as an evolving plot of the maximum persistence versus window size:


In [ ]:
video = io.open('Handwaving_Deriv10_Block160_PCA10.ogg', 'r+b').read()
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii')))

As you can see, the maximum persistence peaks at around 40 frames, which is the period of each hand wave. This is what the theory we developed for 1D time series would have predicted as the roundest window.

Quasiperiodicity Quantification in Video


We now examine how this pipeline can be used to detect quasiperiodicity in videos. As an example, we examine videos from high-speed glottography, or high speed videos (4000 fps) of the left and right vocal folds in the human vocal tract. When a person has a normal voice, the vocal folds oscillate in a periodic fashion. On the other hand, if they have certain types of paralysis or near chaotic dynamics, they can exhibit biphonation just as the horse whinnies did. More info can be found in this paper.

Healthy Subject

Let's begin by analyzing a video of a healthy person. In this example and in the following example, we will be computing both persistent H1 and persistent H2, so the code may take a bit longer to run.

Questions

  • What can we say about the vocal folds of a healthy subject based on the persistence diagram?

In [ ]:
video = io.open('NormalPeriodicCrop.ogg', 'r+b').read()
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii')))

In [ ]:
(X, FrameDims) = loadVideo("NormalPeriodicCrop.ogg")
X = getPCAVideo(X)
dim = 70
Tau = 0.5
dT = 1
derivWin = 10

#Take a bandpass filter in time at each pixel to smooth out noise
[X, validIdx] = getTimeDerivative(X, derivWin)

#Do the sliding window
XS = getSlidingWindowVideo(X, dim, Tau, dT)

#Mean-center and normalize sliding window
XS = XS - np.mean(XS, 1)[:, None]
XS = XS/np.sqrt(np.sum(XS**2, 1))[:, None]

#Compute and plot persistence diagrams
print("Computing persistence diagrams...")
dgms = ripser(XS, maxdim=2)['dgms']
print("Finished computing persistence diagrams")

plt.figure()
plot_diagrams(dgms)
plt.title("Persistence Diagrams$")
plt.show()

Subject with Biphonation

Let's now examine a video of someone with a vocal pathology. This video may still appear periodic, but if you look closely there's a subtle shift going on over time


In [ ]:
video = io.open('ClinicalAsymmetry.mp4', 'r+b').read()
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii')))

In [ ]:
(X, FrameDims) = loadVideo("ClinicalAsymmetry.mp4")
X = getPCAVideo(X)
X = X[0:200, :]
#'dim':32, 'Tau':0.25, 'dT':0.25, 'derivWin':2
dim = 100
Tau = 0.25
dT = 0.5
derivWin = 5

#Take a bandpass filter in time at each pixel to smooth out noise
[X, validIdx] = getTimeDerivative(X, derivWin)

#Do the sliding window
XS = getSlidingWindowVideo(X, dim, Tau, dT)
print("XS.shape = ", XS.shape)

#Mean-center and normalize sliding window
XS = XS - np.mean(XS, 1)[:, None]
XS = XS/np.sqrt(np.sum(XS**2, 1))[:, None]

#Compute and plot persistence diagrams
print("Computing persistence diagrams...")
dgms = ripser(XS, maxdim=2)['dgms']
print("Finished computing persistence diagrams")

plt.figure()
plt.title("Persistence Diagrams$")
plot_diagrams(dgms)
plt.show()

Question:

  • What shape is this? What does this say about the underlying frequencies involved?

Another Subject with Biphonation

Let's now examine another person with a vocal pathology, this time due to mucus that is pushed out of the vocal folds every other oscillation. This time, we will look at both $\mathbb{Z} / 2\mathbb{Z}$ coefficients and $\mathbb{Z} / 3 \mathbb{Z}$ coefficients.

Questions

  • Can you see any changes between $\mathbb{Z} / 2\mathbb{Z}$ coefficients and $\mathbb{Z} / 3 \mathbb{Z}$ coefficients? the What shape is this? Can you relate this to something we've seen before?

In [ ]:
video = io.open('LTR_ED_MucusBiphonCrop.ogg', 'r+b').read()
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii')))

In [ ]:
(X, FrameDims) = loadVideo("LTR_ED_MucusBiphonCrop.ogg")
X = getPCAVideo(X)
X = X[0:200, :]
#'dim':32, 'Tau':0.25, 'dT':0.25, 'derivWin':2
dim = 100
Tau = 1
dT = 0.25
derivWin = 5

#Take a bandpass filter in time at each pixel to smooth out noise
[X, validIdx] = getTimeDerivative(X, derivWin)

#Do the sliding window
XS = getSlidingWindowVideo(X, dim, Tau, dT)
print("XS.shape = ", XS.shape)

#Mean-center and normalize sliding window
XS = XS - np.mean(XS, 1)[:, None]
XS = XS/np.sqrt(np.sum(XS**2, 1))[:, None]

#Compute and plot persistence diagrams
print("Computing persistence diagrams...")
dgms2 = ripser(XS, maxdim=2, coeff=2)['dgms']
dgms3 = ripser(XS, maxdim=2, coeff=3)['dgms']
print("Finished computing persistence diagrams")

plt.figure(figsize=(8, 4))
plt.subplot(121)
plot_diagrams(dgms2)
plt.title("Persistence Diagrams $\mathbb{Z}2$")
plt.subplot(122)
plot_diagrams(dgms3)
plt.title("Persistence Diagrams $\mathbb{Z}3$")
plt.show()

Summary

  • Periodicity can be studied on general time series data, including multivariate time series such as video
  • Computational tricks, such as PCA, can be employed to make sliding window videos computationally tractable
  • It is even possible to pick up on quasiperiodicity/biphonation in videos without doing any tracking.

In [ ]: