This is a simple experiment in which I read a certain number of megabytes from a video and attempt unsupervised clustering. It is by no means a novel experiment; its purpose is to become familiar with the OpenCV Python bindings, scikit-learn, and NumPy.
from read_video import *
import numpy as np
import matplotlib.pyplot as plt
import cv2
Reading in the video is an expensive operation. The following cell takes several seconds to execute because the program reads `max_buf_size_mb` megabytes into memory. A conservative number is chosen here because the scikit-learn calculations will need about the same amount of memory to calculate the clusters.
video_to_read = "/Users/cody/"
max_buf_size_mb = 500
%time frame_buffer = ReadVideo(video_to_read, max_buf_size_mb)
print("Matrix shape: {}".format(frame_buffer.shape))
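The memory budget translates directly into a frame count. As a rough sketch, the number of uncompressed frames that fit in a given budget can be estimated as below. The helper name `max_frames_for_budget` and the 1280x720 frame size are assumptions for illustration; they are not part of `read_video`.

```python
def max_frames_for_budget(height, width, channels, max_buf_size_mb, bytes_per_value=1):
    """Estimate how many uncompressed frames fit in max_buf_size_mb megabytes.

    Assumes each frame stores height x width x channels values, one byte
    each by default (uint8, the type OpenCV normally returns).
    """
    bytes_per_frame = height * width * channels * bytes_per_value
    budget_bytes = max_buf_size_mb * 1024 * 1024
    return budget_bytes // bytes_per_frame


# Hypothetical 1280x720 color video with the 500 MB budget used above.
frames_uint8 = max_frames_for_budget(720, 1280, 3, 500)

# The same frames stored as 8-byte doubles fit far fewer in the same budget.
frames_float64 = max_frames_for_budget(720, 1280, 3, 500, bytes_per_value=8)

print(frames_uint8, frames_float64)
```

Keeping the buffer as `uint8` rather than converting to doubles is what makes a budget of this size go a reasonable distance.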
%matplotlib inline
# imshow expects uint8 values (0-255) or floats in [0, 1]; doubles in the 0-255 range will look messed up.
plt.imshow(frame_buffer[0, :, :, :]); # Plot first frame
plt.imshow(frame_buffer[-1, :, :, :]); # Plot last frame
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn import cluster
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale
Like R, scikit-learn expects data arranged as an `n_samples x n_features` matrix. Here I reshape the data without copying any of it. I want to segment based on the contents of the frames, so `n_samples = num_frames` and `n_features = pixel_values`. It is very important that the data not be copied here, since copying video data is a very expensive operation. I've timed the reshape to confirm that I am only creating a new view onto the data (i.e. a new shape over the same memory) and not copying anything.
NOTE: I have included the color channels here as well.
buf_s = frame_buffer.shape
K = buf_s[0] # Number of frames
M = buf_s[1] # Frame height (rows)
N = buf_s[2] # Frame width (columns)
chan = buf_s[3] # Number of color channels
%time scikit_buffer = frame_buffer.reshape([K, M*N*chan])
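Timing is indirect evidence that no copy took place. A more direct check, sketched here on a small synthetic array standing in for `frame_buffer`, is to ask NumPy whether the two arrays share memory. The array sizes are arbitrary assumptions chosen to keep the example small.

```python
import numpy as np

# Small synthetic stand-in for frame_buffer: 4 frames of 6x8 RGB, uint8.
frames = np.zeros((4, 6, 8, 3), dtype=np.uint8)
K, M, N, chan = frames.shape

flat = frames.reshape(K, M * N * chan)

# A view shares its memory with the original array.
is_view = np.shares_memory(frames, flat)

# Writing through the view is visible in the original.
flat[0, 0] = 255
wrote_through = frames[0, 0, 0, 0] == 255

# A non-contiguous array (here, channels reversed) cannot be flattened
# without a copy, so reshape silently allocates new memory.
reversed_channels = frames[..., ::-1]
flat_copy = reversed_channels.reshape(K, M * N * chan)
is_copy = not np.shares_memory(reversed_channels, flat_copy)

print(is_view, wrote_through, is_copy)
```

The last case is the one to watch for: `reshape` never raises when it has to copy, so any slicing or channel reordering done before the reshape can quietly double the memory footprint.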
Up until this point, everything I have set up has been to get the video data properly formatted and ready to ship to the clustering algorithm. Since I recorded the short video used in the example, I know exactly how many clusters I think there should be:
So exactly 7 clusters can be inferred from the video data. The code below is also on the time-critical path and takes quite a long time to compute. I've timed the fit to confirm that it does indeed take quite some time.
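k_means = cluster.KMeans(n_clusters=7, n_init=1, copy_x=False)
# Fit the model; labels_ and cluster_centers_ do not exist until fit() has run.
%time k_means.fit(scikit_buffer)
labels = k_means.labels_
values = k_means.cluster_centers_.squeeze()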
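To see what the clustering step should produce when it works, here is a minimal sketch on synthetic data rather than the real video. The three "scenes", their brightness levels, and the tiny 4x4 frame size are all assumptions made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Three synthetic "scenes" of 10 frames each (4x4 RGB) with distinct brightness.
scene_levels = [20.0, 120.0, 220.0]
frames = np.concatenate([
    level + rng.normal(0.0, 2.0, size=(10, 4, 4, 3)) for level in scene_levels
])

# One row per frame, one column per pixel value.
X = frames.reshape(frames.shape[0], -1)

model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(X)  # labels_ and cluster_centers_ only exist after fit()

labels = model.labels_
print(labels)
```

With scenes this well separated, every frame in a scene should receive the same label. Real video frames are far less cleanly separated in raw pixel space, which is worth keeping in mind when reading the results.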
The only good way to visualize this data is to look at the transitions from one classification to the next, i.e. where `current_classification != previous_classification`. If the clustering works, we should see a distinct difference between the frame on the left and the frame on the right.
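prev = labels[0]
plt_count = 0
for i in range(1, labels.size):
    if plt_count == 5:  # Stop after plotting five transitions
        break
    if prev != labels[i]:
        # Frame before the transition on the left, frame after it on the right
        plt.figure()
        plt.subplot(1, 2, 1)
        plt.imshow(frame_buffer[i-1, :, :, :])
        plt.subplot(1, 2, 2)
        plt.imshow(frame_buffer[i, :, :, :])
        plt_count = plt_count + 1
    prev = labels[i]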
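The transition search can also be written without an explicit loop. The sketch below uses a made-up label sequence, and the helper name `label_transitions` is an assumption, not something defined earlier in the notebook.

```python
import numpy as np


def label_transitions(labels):
    """Return every index i where labels[i] differs from labels[i - 1]."""
    labels = np.asarray(labels)
    return np.flatnonzero(labels[1:] != labels[:-1]) + 1


# Made-up labels: the cluster changes at frames 2, 5 and 6.
example_labels = np.array([0, 0, 1, 1, 1, 2, 0])
transitions = label_transitions(example_labels)

# Keep only the first five transitions, mirroring the plot limit above.
first_five = transitions[:5]
print(first_five)
```

Having all the transition indices up front also makes it easy to count them: a clustering that flips label on nearly every frame is a quick sign that something is wrong.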
As can be seen above, the results were not what I hypothesized. The clustering appears to be picking up on things that I am not seeing, which is not good. One possible reason for the failure is that I am giving it too much data. One solution would be to use histograms to classify each frame instead of the entire RGB frame.
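A minimal sketch of the histogram idea follows. It was not run on the video in this experiment, and the helper name `frame_histogram`, the choice of 16 bins per channel, and the random test frames are all assumptions for illustration.

```python
import numpy as np


def frame_histogram(frame, bins=16):
    """Concatenate normalized per-channel histograms of one uint8 frame."""
    channels = []
    for c in range(frame.shape[-1]):
        counts, _ = np.histogram(frame[..., c], bins=bins, range=(0, 256))
        channels.append(counts / counts.sum())
    return np.concatenate(channels)


# Random frames standing in for real video data: 5 frames of 48x64 RGB.
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(5, 48, 64, 3), dtype=np.uint8)

# One short feature vector per frame instead of one value per pixel.
features = np.stack([frame_histogram(f) for f in frames])
print(features.shape)
```

Each frame is reduced from `M*N*chan` values to `bins * chan` values, so the resulting matrix could be passed to `KMeans` in place of the reshaped pixel buffer. The trade-off is that a histogram discards all spatial layout, so two frames with similar colors but different content would look alike.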