Unsupervised Clustering Experiment

This is a simple experiment in which I read a fixed number of megabytes from a video and attempt unsupervised clustering on its frames. It is by no means a novel experiment; its purpose is to become familiar with the OpenCV Python bindings, scikit-learn, and NumPy.


In [ ]:
from read_video import * 
import numpy as np
import matplotlib.pyplot as plt
import cv2

Time Critical Path

Reading in the video is an expensive operation. The following cell takes several seconds to execute because the program reads up to max_buf_size_mb megabytes into memory. A conservative number is chosen here because the scikit-learn calculations will need about the same amount of memory to calculate the clusters.


In [ ]:
video_to_read = "/Users/cody/test.mov"
max_buf_size_mb = 500
%time frame_buffer = ReadVideo(video_to_read, max_buf_size_mb)

In [ ]:
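# A bare expression is only displayed when it is the last line of a cell, so print it.
print("Buffer size: {} bytes".format(frame_buffer.nbytes))
print("Matrix shape: {}".format(frame_buffer.shape))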
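As a rough sanity check on the numbers printed above, the frame count that a given memory budget allows can be estimated from the frame dimensions. This is only a sketch: `frames_that_fit` is a hypothetical helper, not part of `read_video`, and the 1280x720 resolution, 8-bit RGB layout, and 1 MB = 2**20 bytes convention are all assumptions.

```python
import numpy as np

def frames_that_fit(max_buf_size_mb, height, width, channels=3, dtype=np.uint8):
    """Return how many whole frames fit in a buffer of max_buf_size_mb megabytes."""
    bytes_per_frame = height * width * channels * np.dtype(dtype).itemsize
    budget_bytes = max_buf_size_mb * 1024 * 1024  # assumes 1 MB = 2**20 bytes
    return budget_bytes // bytes_per_frame

# Hypothetical 1280x720 RGB video with 8-bit channels.
n_frames = frames_that_fit(500, 720, 1280)
print("Frames in budget:", n_frames)
```

At that resolution, a 500 MB budget holds only a few hundred frames, which is worth keeping in mind when choosing max_buf_size_mb.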

Plot First and Last Frames


In [ ]:
%matplotlib inline
# imshow expects uint8 values (0-255) or floats in [0, 1]; double-precision
# frames holding 0-255 values will not render correctly.
plt.imshow(frame_buffer[0, :, :, :])  # Plot first frame
plt.show()
plt.imshow(frame_buffer[-1, :, :, :])  # Plot last frame
plt.show()
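If the frames ever end up as doubles (for example after averaging or scaling), one way to make them displayable again is to clip and cast back to uint8. This is a minimal sketch; `to_displayable` is a hypothetical helper, and it assumes the values are meant to lie in the 0-255 range.

```python
import numpy as np

def to_displayable(frame):
    """Clip a frame to 0..255 and cast it to uint8 so imshow renders it as expected."""
    return np.clip(frame, 0, 255).astype(np.uint8)

# Stand-in for a double-precision frame holding 0-255 values.
float_frame = np.full((2, 2, 3), 128.0)
display_frame = to_displayable(float_frame)
print(display_frame.dtype)
```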

In [ ]:
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn import cluster
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

Reshape the Data for Clustering

Like R, scikit-learn expects data in the form n_samples x n_features. Here I reshape the data without copying any of it. I want to segment based on the contents of the frames, so n_samples is the number of frames and n_features is the number of pixel values per frame. It is very important that the data not be copied here, since copying video data is an expensive operation. I've timed the reshape to confirm that it only creates a new view onto the same data (i.e. pointers) and copies nothing.

NOTE: I have included the color channels here as well.


In [ ]:
buf_s = frame_buffer.shape
K = buf_s[0] # Number of frames
M = buf_s[1] # Frame height (rows)
N = buf_s[2] # Frame width (columns)
chan = buf_s[3] # Number of color channels
%time scikit_buffer = frame_buffer.reshape([K, M*N*chan])
scikit_buffer.shape
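Timing is indirect evidence that nothing was copied. A more direct check, sketched here on a small stand-in array rather than the real frame_buffer, is to ask NumPy whether the reshaped array shares memory with the original. The names `demo_buffer` and `demo_flat` are illustrative only.

```python
import numpy as np

# Small stand-in for frame_buffer: 4 frames of 6x8 RGB pixels.
demo_buffer = np.zeros((4, 6, 8, 3), dtype=np.uint8)
demo_flat = demo_buffer.reshape(demo_buffer.shape[0], -1)

# True when both arrays are backed by the same block of memory.
is_view = np.shares_memory(demo_buffer, demo_flat)
print("Shares memory:", is_view)

# Writing through the reshaped view also changes the original array.
demo_flat[0, 0] = 255
print("Original sees the write:", demo_buffer[0, 0, 0, 0] == 255)
```

The same check can be run on the real arrays with np.shares_memory(frame_buffer, scikit_buffer). Note that reshape only returns a view when the memory layout allows it; for a non-contiguous array it silently falls back to a copy.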

Begin Heavy Lifting

Up to this point, everything I have set up has been to get the video data properly formatted and ready to ship to the clustering algorithm. Since I recorded the short video used in the example, I know exactly how many clusters I think there should be:

  1. Blank screen
  2. e
  3. c
  4. e
  5. 6
  6. 3
  7. 3

So exactly 7 clusters can be inferred from the video data. The code below is also a time-critical path and takes quite a long time to compute. I've timed the call to see just how long it takes.


In [ ]:
k_means = cluster.KMeans(n_clusters=7, n_init=1, copy_x=False)
%time k_means.fit(scikit_buffer)

Analysis

As I had hoped, contiguous frames were mostly clustered together, though not without a few anomalies. Just from looking at the labels, we can see that some frames may have been misclassified, for example those labeled 1 and 5 towards the end of the array.


In [ ]:
labels = k_means.labels_
values = k_means.cluster_centers_.squeeze()
labels
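A long label array is hard to judge by eye. One way to make contiguity and anomalies easier to spot is to collapse the labels into runs of identical values. This is a sketch on made-up labels: `label_runs` and `demo_labels` are illustrative names, and the values are not the output of the clustering above.

```python
import numpy as np

def label_runs(labels):
    """Return (start_index, length, label) for each run of equal consecutive labels."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return []
    # Indices at which the label differs from the one before it.
    change_points = np.flatnonzero(labels[1:] != labels[:-1]) + 1
    starts = np.concatenate(([0], change_points))
    ends = np.concatenate((change_points, [labels.size]))
    return [(int(s), int(e - s), int(labels[s])) for s, e in zip(starts, ends)]

# Made-up labels: the single 5 in the middle of the 0s is the kind of anomaly to look for.
demo_labels = np.array([2, 2, 2, 0, 0, 5, 0, 0, 1, 1])
for start, length, label in label_runs(demo_labels):
    print(f"frames {start}..{start + length - 1}: cluster {label}")
```

Very short runs surrounded by a single other label are likely candidates for misclassified frames.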

Visualization of the Data

The most direct way to visualize this data is to look at the transitions from one classification to the next, i.e. where current_classification != previous_classification. If the clustering works, we should see a distinct difference between the frame on the left and the frame on the right.


In [ ]:
prev = labels[0]
plt_count = 0
for i in range(1, labels.size):
    if plt_count == 5:  # Show at most five transitions
        break
    if prev != labels[i]:
        plt.subplot(1, 2, 1)
        plt.title(i)  # Left: first frame with the new label
        plt.imshow(frame_buffer[i, :, :, :])
        plt.subplot(1, 2, 2)
        plt.title(i - 1)  # Right: last frame with the previous label
        plt.imshow(frame_buffer[i-1, :, :, :])
        plt.show()
        plt_count = plt_count + 1
    prev = labels[i]

As can be seen above, the results were not what I hypothesized. The clustering appears to be picking up on differences that I am not seeing, which is not good. One possible reason for the failure is that I am giving it too much data, since every pixel value of every frame is a feature. One solution would be to use histograms to classify each frame instead of the entire RGB frame.
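The histogram idea could look roughly like the following. This is an untested suggestion rather than something run on the video: `frame_histograms` is a hypothetical helper, the choice of 16 bins per channel is arbitrary, and random frames stand in for frame_buffer.

```python
import numpy as np

def frame_histograms(frames, bins=16):
    """Map (n_frames, height, width, channels) uint8 frames to an
    (n_frames, channels * bins) array of normalized per-channel histograms."""
    n_frames, _, _, channels = frames.shape
    features = np.empty((n_frames, channels * bins), dtype=np.float64)
    for i in range(n_frames):
        for c in range(channels):
            counts, _ = np.histogram(frames[i, :, :, c], bins=bins, range=(0, 256))
            # Normalize so each channel's histogram sums to 1.
            features[i, c * bins:(c + 1) * bins] = counts / counts.sum()
    return features

# Random stand-in frames; the real input would be frame_buffer.
rng = np.random.default_rng(0)
demo_frames = rng.integers(0, 256, size=(5, 12, 16, 3), dtype=np.uint8)
hist_features = frame_histograms(demo_frames)
print(hist_features.shape)
```

The resulting array has the n_samples x n_features layout and could be passed to k_means.fit in place of scikit_buffer, with far fewer features per frame. One caveat is that histograms discard spatial layout, so frames that differ only in where the pixels are, not in how many pixels have each intensity, would look identical.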