In [46]:
from __future__ import print_function
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
In the table below, I compare four different methods for implementing cross correlation:
- NoGpuSupport: OpenCV's standard (CPU-only) implementation of cross correlation using the matchTemplate function.
- GpuSupport: OpenCV's GPU implementation of the matchTemplate function.
- Decompose: a dummy test that demonstrates the maximum speed overlap and add could achieve. It does not include any additions or memory copies; it only measures how long it takes to compute the cross correlation once per block when the image is broken up into L x L blocks. For example, if L is set to 512, a 1024x1024 image requires a total of 4 cross correlation operations.
- OverlapAdd: my implementation of overlap and add for a 2D signal.

In the table, Problem Space refers to the image size, i.e. 512 means that the image is 512x512.
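To make the block decomposition concrete, here is a minimal CPU sketch of 2D overlap and add for cross correlation. It is not the OverlapAdd implementation benchmarked here: it swaps in NumPy's FFT (`np.fft.rfft2`/`irfft2`) in place of OpenCV's GPU routines, and the helper names `xcorr2_fft` and `xcorr2_overlap_add` are hypothetical. Each L x L block is cross correlated with the template on its own, and because the result of each block is larger than the block by (template size - 1) in each direction, neighbouring results overlap and are summed into the output. The full result contains the "valid" region that matchTemplate returns with `TM_CCORR` as a slice.

```python
import numpy as np

def xcorr2_fft(a, t):
    """Full 2D cross correlation of a with t via zero-padded FFTs (no wraparound)."""
    out_shape = (a.shape[0] + t.shape[0] - 1, a.shape[1] + t.shape[1] - 1)
    # Cross correlation equals convolution with the template flipped in both axes.
    t_flip = t[::-1, ::-1]
    A = np.fft.rfft2(a, s=out_shape)
    T = np.fft.rfft2(t_flip, s=out_shape)
    return np.fft.irfft2(A * T, s=out_shape)

def xcorr2_overlap_add(image, template, L):
    """Full 2D cross correlation computed one L x L block at a time."""
    n0, n1 = image.shape
    m0, m1 = template.shape
    out = np.zeros((n0 + m0 - 1, n1 + m1 - 1))
    for i in range(0, n0, L):
        for j in range(0, n1, L):
            block = image[i:i + L, j:j + L]
            partial = xcorr2_fft(block, template)
            # Partial results of neighbouring blocks overlap by (m - 1) rows/cols,
            # so they are added into the output rather than copied.
            out[i:i + partial.shape[0], j:j + partial.shape[1]] += partial
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))
template = rng.standard_normal((8, 8))

# With L = 32, the 64x64 image is split into (64 / 32) ** 2 = 4 blocks.
full = xcorr2_overlap_add(image, template, L=32)
direct = xcorr2_fft(image, template)
print(np.allclose(full, direct))  # True

# Valid region: same size as matchTemplate's TM_CCORR output, (N - M + 1) x (N - M + 1).
valid = full[7:64, 7:64]
```

Only one block and the template need to be transformed at a time, so the working memory of each step depends on L and the template size rather than on the full image size.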
In [53]:
data = pd.read_csv("./cross_correlation_results.csv")
data[['Experiment', 'Problem Space', 'Baseline', 'Iterations/sec', 'Min (us)', 'Mean (us)',
'Max (us)', 'Standard Deviation']]
Out[53]:
In [52]:
import matplotlib.cm as cm

# One group of bars per problem space, one bar per experiment within each group.
prob_space = data.groupby('Experiment')
sizes = sorted(data['Problem Space'].unique())
ind = np.arange(len(sizes))
# One color per experiment (not per problem space).
colors = cm.rainbow(np.linspace(0, 1, len(prob_space)))
fig, ax = plt.subplots()
width = .15
offset = 0
rects = []
names = []
for (name, group), c in zip(prob_space, colors):
    names.append(name)
    # Sort so each bar lines up with the sorted tick labels.
    group = group.sort_values('Problem Space')
    rects.append(ax.bar(ind + offset, np.log10(group['Iterations/sec']), width, color=c))
    offset = offset + width
ax.set_ylabel('log10(Frames per second)')
ax.set_xlabel('Image size in pixels')
ax.set_title('Comparison of Xcorr Methods')
# Center each tick under its group of (center-aligned) bars.
ax.set_xticks(ind + width * (len(prob_space) - 1) / 2)
ax.set_xticklabels(sizes)
ax.legend(rects, names, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
Out[52]:
From the plot above, we can see that the best implementation, even for large images (8k x 8k), is the GPU-supported matchTemplate function available in OpenCV. As expected, the Decompose experiment, which skips the additions and memory copies, is always faster than the OverlapAdd experiment, but it never beats the GpuSupport experiment. Nevertheless, all of the GPU implementations significantly outperform the CPU-only implementation. I also attempted experiments with 16k x 16k images, but the GPU did not have sufficient memory to handle images of this size. This is where overlap and add has an advantage: because it processes the image one L x L block at a time, it can easily handle images of arbitrary size.
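A rough back-of-the-envelope comparison helps show why block size, not image size, bounds the per-step memory of overlap and add. The sketch below assumes 4-byte (float32) pixels and 8-byte (complex64) spectrum values; the source does not state the data type, and the actual allocations made by OpenCV's GPU matchTemplate are not known here, so these figures are lower bounds for single buffers only. The helper `buffer_mib` is hypothetical.

```python
def buffer_mib(rows, cols, bytes_per_value=4):
    """Size in MiB of one dense rows x cols buffer (4 bytes per value assumes float32)."""
    return rows * cols * bytes_per_value / 2**20

# Whole-image buffers grow with the square of the image side.
for n in (8192, 16384):
    print(n, buffer_mib(n, n))
# 8192 256.0
# 16384 1024.0

# A complex64 spectrum of a 16k x 16k image doubles that.
print(buffer_mib(16384, 16384, bytes_per_value=8))  # 2048.0

# A single 512 x 512 block stays small no matter how large the image is.
print(buffer_mib(512, 512))  # 1.0
```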