In [1]:
%%HTML
<style>
.container { width:100% }
</style>


Vectorization is Fast!

This short notebook demonstrates that working with NumPy arrays is much faster than working with Python lists.


In [2]:
import numpy as np

We begin by defining two NumPy arrays a and b that are each filled with a million random numbers.


In [3]:
a = np.random.rand(1000000)
b = np.random.rand(1000000)

Next, we compute the dot product of a and b. Mathematically, this is defined as follows: $$ \textbf{a} \cdot \textbf{b} = \sum\limits_{i=1}^n \textbf{a}[i] \cdot \textbf{b}[i], $$ where $n$ is the dimension of a and b. In Python we can use the @ operator to compute the dot product.


In [4]:
%%time 
a @ b


CPU times: user 2.55 ms, sys: 1.4 ms, total: 3.95 ms
Wall time: 3.35 ms
Out[4]:
249968.5113950584
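On a small example we can check by hand that @ computes exactly this sum of elementwise products; np.dot is an equivalent spelling. A minimal sketch:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# 1*4 + 2*5 + 3*6 = 32
print(x @ y)         # 32.0
print(np.dot(x, y))  # 32.0
```

Note also that the result above is close to 250,000, as expected: each product of two independent uniform random numbers has mean 1/4, and there are a million of them.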

To compare this time with the time needed when a and b are stored as lists instead, we convert a and b to ordinary Python lists.


In [5]:
la = list(a)
lb = list(b)

Next, we compute the dot product of a and b using these lists.


In [6]:
%%time
total = 0  # avoid shadowing the built-in sum
for i in range(len(la)):
    total += la[i] * lb[i]


CPU times: user 309 ms, sys: 3.24 ms, total: 313 ms
Wall time: 315 ms
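The index-based loop can also be written more compactly with zip and a generator expression; a sketch (the function name dot_product is my own choice, and this version is still interpreted Python, so it remains far slower than the vectorized @):

```python
def dot_product(xs, ys):
    """Dot product of two equal-length sequences, in pure Python."""
    return sum(x * y for x, y in zip(xs, ys))

print(dot_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```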

We notice that the NumPy-based computation is much faster than the list-based computation (here roughly a factor of 100, judging by the wall times). Similar observations can be made when a function is applied to all elements of an array. For big arrays, using the vectorized functions offered by NumPy is usually much faster than applying the function to each element of a list.


In [7]:
import math

In [8]:
%%time
for i, x in enumerate(la):
    lb[i] = math.sin(x)


CPU times: user 257 ms, sys: 2.77 ms, total: 260 ms
Wall time: 263 ms

In [9]:
%%time
b = np.sin(a)


CPU times: user 9.95 ms, sys: 2.99 ms, total: 12.9 ms
Wall time: 12.9 ms
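As a sanity check, the looped and vectorized versions agree numerically. A minimal sketch on a smaller array (the array size is arbitrary; math.sin and np.sin may differ in the last bit, so we compare with np.allclose rather than exact equality):

```python
import math
import numpy as np

a_small = np.random.rand(1000)

loop_result = [math.sin(x) for x in a_small]  # element by element
vec_result = np.sin(a_small)                  # vectorized

# Both give the same values, up to floating-point rounding.
print(np.allclose(loop_result, vec_result))  # True
```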
