Understanding NumPy Internals

NumPy can deliver significant speedups over native Python code, particularly when our computations follow the Single Instruction, Multiple Data (SIMD) paradigm: the same operation applied to many data elements at once.
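As a quick illustration (a sketch added here, not part of the original notebook), the snippet below times the same elementwise operation done by NumPy in compiled code versus a native Python loop:

```python
import timeit
import numpy as np

x = np.arange(1_000_000, dtype=np.float64)

# Vectorized: a single NumPy call; the loop over elements runs in compiled code
t_numpy = timeit.timeit(lambda: x * 2.0, number=10)

# Native Python: the loop runs in the interpreter, one element at a time
t_python = timeit.timeit(lambda: [v * 2.0 for v in x], number=10)

print(f"NumPy:  {t_numpy:.4f} s")
print(f"Python: {t_python:.4f} s")
```

On a typical machine the vectorized version is one to two orders of magnitude faster; the exact ratio depends on the hardware.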


In [1]:
import numpy as np

Copy and "deep copy"

To achieve high performance, assignments in Python usually do not copy the underlying objects.

This is important, for example, when objects are passed between functions: it avoids excessive memory copying when none is necessary (technical term: pass by reference).

First, we need a way to check whether two arrays share the same underlying data buffer in memory.

Let's define a function aid() that returns the memory location of the underlying data buffer:


In [3]:
def aid(x):
    # This function returns the memory
    # block address of an array.
    return x.__array_interface__['data'][0]

Two arrays with the same data location (as returned by aid()) share the same underlying data buffer.

However, the converse holds only for arrays with the same offset into that buffer (meaning that they have the same first element): two arrays can share a buffer yet report different data locations.
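To see this caveat concretely, here is a sketch (not from the original notebook) where a slice shares its parent's buffer but starts at a different offset, so aid() disagrees; np.shares_memory is the robust check:

```python
import numpy as np

def aid(x):
    # Memory address of the array's data buffer (its first element)
    return x.__array_interface__['data'][0]

A = np.arange(10)
B = A[2:]  # a view into A's buffer, starting 2 elements later

# No data was copied, but B's first element sits at a different address,
# so comparing aid() values gives a false negative
print(aid(A) == aid(B))        # False
print(np.shares_memory(A, B))  # True
```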


In [4]:
A = np.array([[1, 2], [3, 4]])

A


Out[4]:
array([[1, 2],
       [3, 4]])

In [5]:
# now B is referring to the same array data as A 
B = A

In [6]:
aid(A) == aid(B)


Out[6]:
True

In [7]:
# changing B affects A
B[0,0] = 10

B


Out[7]:
array([[10,  2],
       [ 3,  4]])

In [8]:
A


Out[8]:
array([[10,  2],
       [ 3,  4]])
  • If we want to avoid this behavior and instead get a completely independent object B copied from A, we need to perform a so-called deep copy using the function np.copy:

In [9]:
B = np.copy(A)

In [10]:
# now, if we modify B, A is not affected
B[0,0] = -5

B


Out[10]:
array([[-5,  2],
       [ 3,  4]])

In [11]:
A


Out[11]:
array([[10,  2],
       [ 3,  4]])

In [13]:
aid(A) == aid(B)


Out[13]:
False
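Note that plain assignment is not the only way to end up sharing data: slicing an array also returns a view into the same buffer. A short sketch (added here for illustration) contrasting a slice view with a copy:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

row = A[0]       # slicing returns a view, not a copy
row[0] = 99      # writing through the view modifies A's buffer
print(A[0, 0])   # 99

C = A[0].copy()  # ndarray.copy(), like np.copy, makes an independent copy
C[0] = -1
print(A[0, 0])   # still 99: A is unaffected
```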


In [ ]: