In [1]:
%load_ext watermark
In [2]:
%watermark -d -v -u -t -z -p numpy
[More information](http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/ipython_magic/watermark.ipynb) about the `watermark` magic command extension.
There are at least two ways to stack NumPy arrays vertically (row-wise): either via `numpy.concatenate(tup, axis=0)` or via the more specific `numpy.vstack(tup)` function. Although the NumPy documentation claims that they are equivalent, there are rumors that `numpy.concatenate` is the faster approach. The same applies to `numpy.hstack` vs. `np.concatenate(tup, axis=1)` for stacking arrays column-wise (horizontally); note that this equivalence only holds for arrays with at least two dimensions, since `hstack` concatenates 1-D arrays along axis 0. And there is a third way, `numpy.append(a, b)`.
Let's see if those rumors are true...
Note that `numpy.append(a, b)` returns a new, flattened copy of the `ndarray` object when no axis is given (in contrast to Python's `append` method on `list` objects, which modifies the list in place).
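For example, a minimal sketch of this difference in behavior (the variable names here are just for illustration):
In [ ]:
import numpy as np

# list.append modifies the list in place and returns None
lst = [1, 2, 3]
lst.append(4)
print(lst)       # [1, 2, 3, 4]

# np.append leaves the input untouched and returns a new, flattened copy
arr = np.array([[1, 2, 3]])
new_arr = np.append(arr, [[4, 5, 6]])
print(arr)       # [[1 2 3]]
print(new_arr)   # [1 2 3 4 5 6]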
Before we do the actual benchmarks, let us quickly check that those methods indeed produce the same results:
In [3]:
import numpy as np
In [4]:
# Vertical (row-wise) stacking
a = np.array([[1,2,3],[4,5,6]])
b = np.array([[9,8,7],[7,8,9]])
print(np.concatenate((a,b)), end='\n\n')
print(np.vstack((a,b)))
In [5]:
# Horizontal (column-wise) stacking
a = np.array([1,2,3])
b = np.array([4,5,6])
print(np.concatenate((a,b)), end='\n\n')  # for 1-D arrays, concatenation (like hstack) is along axis 0
print(np.hstack((a,b)), end='\n\n')
print(np.append(a,b))
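Since the printed outputs match, we can also assert the equivalence programmatically; a minimal sketch (not part of the original benchmark), reusing the 1-D arrays `a` and `b` from above:
In [ ]:
# For 1-D inputs, all three column-wise approaches produce the same result
assert np.array_equal(np.concatenate((a, b)), np.hstack((a, b)))
assert np.array_equal(np.hstack((a, b)), np.append(a, b))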
In [22]:
import timeit
from numpy import append as np_append
from numpy import concatenate as np_concatenate
from numpy import hstack as np_hstack
from numpy import vstack as np_vstack
funcs = ('np_append', 'np_concatenate', 'np_hstack', 'np_vstack')
t_append, t_hconc, t_vconc, t_hstack, t_vstack = [], [], [], [], []
orders_5 = [10**i for i in range(1, 5)]
for n in orders_5:
    nxn_dim = np.random.randn(n, n)
    t_vconc.append(min(timeit.Timer('np_concatenate((nxn_dim, nxn_dim))',
            'from __main__ import nxn_dim, np_concatenate').repeat(repeat=5, number=1)))
    t_vstack.append(min(timeit.Timer('np_vstack((nxn_dim, nxn_dim))',
            'from __main__ import nxn_dim, np_vstack').repeat(repeat=5, number=1)))

orders_6 = [10**i for i in range(1, 6)]

for n in orders_6:
    nx1_dim = np.random.randn(n, 1)
    t_append.append(min(timeit.Timer('np_append(nx1_dim, nx1_dim)',
            'from __main__ import nx1_dim, np_append').repeat(repeat=5, number=1)))
    t_hconc.append(min(timeit.Timer('np_concatenate((nx1_dim, nx1_dim), axis=1)',
            'from __main__ import nx1_dim, np_concatenate').repeat(repeat=5, number=1)))
    t_hstack.append(min(timeit.Timer('np_hstack((nx1_dim, nx1_dim))',
            'from __main__ import nx1_dim, np_hstack').repeat(repeat=5, number=1)))
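As a side note, the same kind of measurement can also be spot-checked interactively with IPython's `%timeit` magic; a minimal sketch for one arbitrary array size (the `1000` below is not taken from the benchmark above):
In [ ]:
# Spot-check a single size with IPython's %timeit magic
nxn_check = np.random.randn(1000, 1000)
%timeit np.concatenate((nxn_check, nxn_check))
%timeit np.vstack((nxn_check, nxn_check))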
In [23]:
%matplotlib inline
In [24]:
from matplotlib import pyplot as plt
def plot():

    def settings():
        plt.xlim([min(orders_6) / 10, max(orders_6) * 10])
        plt.legend(loc=2, fontsize=14)
        plt.grid()
        plt.xticks(fontsize=16)
        plt.yticks(fontsize=16)
        plt.xscale('log')
        plt.yscale('log')

    fig = plt.figure(figsize=(15,8))

    plt.subplot(1,2,1)
    plt.plot(orders_5, t_vconc, alpha=0.7, label='np.concatenate((a,b))')
    plt.plot(orders_5, t_vstack, alpha=0.7, label='np.vstack((a,b))')
    plt.xlabel(r'sample size $n$ ($n \times n$ NumPy array)', fontsize=14)
    plt.ylabel('time per computation in seconds', fontsize=14)
    plt.title('Vertical stacking of NumPy arrays (row-wise)', fontsize=14)
    settings()

    plt.subplot(1,2,2)
    plt.plot(orders_6, t_hconc, alpha=0.7, label='np.concatenate((a,b), axis=1)')
    plt.plot(orders_6, t_hstack, alpha=0.7, label='np.hstack((a,b))')
    plt.plot(orders_6, t_append, alpha=0.7, label='np.append(a,b)')
    plt.xlabel(r'sample size $n$ ($n \times 1$ NumPy array)', fontsize=14)
    plt.ylabel('time per computation in seconds', fontsize=14)
    plt.title('Horizontal stacking of NumPy arrays (column-wise)', fontsize=14)
    settings()

    plt.tight_layout()
    plt.show()
In [25]:
plot()
In [26]:
%watermark
The plots above indicate that the `concatenate` function is indeed faster to call for small sample sizes. However, large arrays are where performance really matters, and we can see that the other functions catch up performance-wise as the array sizes increase.