The ndarray object from NumPy

The (nd)array object:

  • Collection of elements of the same type
  • Implemented in memory as a true table, optimized for performance
  • Handled in a similar way to any other Python object

Multidimensional, any type of data

  • Dimensions can be modified, flexible indexing
  • Internal optimization for 1D, 2D and 3D

It can be interfaced with other languages...

Note: the array module of the Python standard library also exists, but it is limited to 1D and should not be confused with NumPy's ndarray.
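To make the difference with the standard-library array module concrete, here is a minimal sketch. The variable names (std, nd, ...) are only illustrative.

```python
import array
import numpy

# Standard-library array: typed elements, but strictly one-dimensional.
std = array.array("i", [1, 2, 3, 4, 5, 6])

# NumPy ndarray: homogeneous storage as well, but with a shape.
nd = numpy.array([[1, 2, 3], [4, 5, 6]])

print(type(std).__name__, len(std))
print(nd.ndim, nd.shape, nd.dtype)

# '*' repeats an array.array (as it does for a list),
# whereas it multiplies a NumPy array element by element.
doubled_std = std * 2
doubled_nd = nd * 2
print(len(doubled_std))
print(doubled_nd)
```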


In [1]:
import numpy

Array creation, given its contents

From a list of values:


In [2]:
numpy.array([1, 2, 3, 5, 7, 11, 13, 17])


Out[2]:
array([ 1,  2,  3,  5,  7, 11, 13, 17])

From a list of lists for 2D arrays:


In [3]:
numpy.array([[1, 2, 3], [4, 5, 6]])


Out[3]:
array([[1, 2, 3],
       [4, 5, 6]])

Also specifying the type of element:


In [4]:
numpy.array([[1, 2, 3], [4, 5, 6]], dtype=numpy.float64)


Out[4]:
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

Array creation using numpy methods


In [5]:
shape = (2,3)
numpy.zeros(shape,dtype="int32")


Out[5]:
array([[0, 0, 0],
       [0, 0, 0]], dtype=int32)

In [6]:
numpy.ones(shape,dtype="int32")


Out[6]:
array([[1, 1, 1],
       [1, 1, 1]], dtype=int32)

In [7]:
# this is an IPython trick to request help ...
numpy.arange?

In [8]:
numpy.arange(0, 10, 2)


Out[8]:
array([0, 2, 4, 6, 8])

In [9]:
numpy.identity(3, dtype=int)  # numpy.int was removed in NumPy 1.24; the builtin int is equivalent


Out[9]:
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

Arrays can also be generated from index positions using numpy.fromfunction. This can be combined with a lambda function:


In [10]:
lambda_function = lambda i,j: numpy.sin(i) * numpy.cos(j)
numpy.fromfunction(lambda_function, (3,3))


Out[10]:
array([[ 0.        ,  0.        , -0.        ],
       [ 0.84147098,  0.45464871, -0.35017549],
       [ 0.90929743,  0.4912955 , -0.37840125]])
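A point that is easy to miss: numpy.fromfunction calls the function once, with whole arrays of indices, rather than once per element. The sketch below (sin_cos is an illustrative name) compares it with an explicit construction based on numpy.indices.

```python
import numpy

def sin_cos(i, j):
    # i and j arrive as full index arrays of shape (3, 3), not as scalars
    return numpy.sin(i) * numpy.cos(j)

from_func = numpy.fromfunction(sin_cos, (3, 3))

# Equivalent construction: build the index arrays explicitly
i, j = numpy.indices((3, 3))
explicit = numpy.sin(i) * numpy.cos(j)

print(numpy.allclose(from_func, explicit))
```

Because the function receives arrays, it has to be written with NumPy functions (numpy.sin) and not with their scalar counterparts from the math module.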

NumPy arrays can be dumped directly to disk and restored from it:


In [11]:
a = numpy.ones((3, 5), dtype="complex")
numpy.save('data.npy', a)
b = numpy.load('data.npy')
print(b)


[[ 1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j]
 [ 1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j]
 [ 1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j]]
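The same round trip can be checked programmatically. This sketch writes to a temporary directory (an assumption made here to avoid leaving data.npy behind) and verifies that values, dtype and shape all survive.

```python
import os
import tempfile
import numpy

a = numpy.ones((3, 5), dtype="complex")

# Write into a temporary directory that is removed automatically
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.npy")
    numpy.save(path, a)
    b = numpy.load(path)

same_values = numpy.array_equal(a, b)
print(b.dtype, b.shape, same_values)
```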

Exercise 2

Use Python as a simple calculator and compare the behaviour of a NumPy array with that of a list, as in Exercise 1

a = [1, 2, 3]

b = numpy.array(a)

  • What is the result of 2 * a[2] ?
  • What is the result of 2 * a ?
  • What is the result of 2 * b[2] ?
  • What is the result of 2 * b ?

In [12]:
a = [1, 2, 3]
import numpy
b = numpy.array(a)
print(2*a[2])
print(2*a)
print(2*b[2])
print(2*b)


6
[1, 2, 3, 1, 2, 3]
6
[2 4 6]

Type of elements

  • Integers, signed or not, of width 8, 16, 32 and 64 bits
  • Floats in half (16 bits), single (32 bits) or double (64 bits) precision
  • Complex numbers in single (complex64: 2x32 bits) and double (complex128: 2x64 bits) precision for real and imaginary parts
  • Fixed-length strings
  • Any Python object: you lose any kind of optimization

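Note:

  • numpy.float was an alias of the Python float, i.e. a 64-bit double precision float.
  • numpy.int was an alias of the Python int; the default NumPy integer has historically been 32 bits on Windows and 64 bits on 64-bit Unix systems.
  • Both aliases were deprecated in NumPy 1.20 and removed in NumPy 1.24: use float and int, or a sized type such as numpy.float64.

It is advised to specify the data type explicitly, for example numpy.uint16.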
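Why the explicit type matters can be seen in a short sketch: the element size follows from the dtype, and fixed-width integers wrap around silently in array arithmetic. The values are chosen only for illustration.

```python
import numpy

# Size in bytes of a few element types
for name in ("uint8", "int32", "float16", "float64", "complex128"):
    print(name, numpy.dtype(name).itemsize, "bytes")

# Array arithmetic on uint8 is computed modulo 256, without any warning
small = numpy.array([250, 251, 252], dtype=numpy.uint8)
wrapped = small + small

# Converting to a wider type first avoids the overflow
widened = small.astype(numpy.uint16) * 2

print(wrapped)
print(widened)
```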


In [13]:
import numpy
a = {'dict':'a'}
b = ['list','b']
c = ('tuple','c')
# recent NumPy versions require dtype=object for such mixed content
v = numpy.array([a, b, c, "bla"], dtype=object)
print(v)


[{'dict': 'a'} ['list', 'b'] ('tuple', 'c') 'bla']

Record Array

Similar to an array of C structs, where each field has a name.

Example: an image in which each pixel is stored as three uint8 values for the red, green and blue channels:


In [14]:
img = numpy.zeros((2,2), {'names': ('r','g','b'), 'formats': (numpy.uint8, numpy.uint8, numpy.uint8)}) 
img["r"] = 10
print(img)


[[(10, 0, 0) (10, 0, 0)]
 [(10, 0, 0) (10, 0, 0)]]
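As a complement, this sketch builds the same kind of structured dtype from a list of (name, type) pairs and shows that selecting a field gives a view on the image. The name rgb is illustrative.

```python
import numpy

# Equivalent way to declare the fields: a list of (name, type) pairs
rgb = numpy.dtype([("r", numpy.uint8), ("g", numpy.uint8), ("b", numpy.uint8)])
img = numpy.zeros((2, 2), dtype=rgb)
img["r"] = 10

print(img.dtype.names)   # field names
print(img.itemsize)      # bytes per pixel: three uint8 values

# Selecting a field returns a view: writing to it modifies the image
red = img["r"]
red[0, 0] = 255
print(img[0, 0])
```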

View on an array

An ndarray is composed of a buffer containing the data (or a pointer to the data) and of associated metadata. Several arrays can point to the same buffer: they are called "views" on the array. Modifying the data through one view affects all the others.
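A minimal sketch of what sharing a buffer means in practice, using numpy.shares_memory and the base attribute to tell views from copies:

```python
import numpy

a = numpy.arange(6)
v = a[::2]            # slicing creates a view
r = a.reshape(2, 3)   # reshaping a contiguous array creates a view as well
c = a.copy()          # a copy owns an independent buffer

v[0] = 100            # write through one view ...
print(a[0], r[0, 0], c[0])   # ... visible in a and r, not in c

print(numpy.shares_memory(a, v), numpy.shares_memory(a, c))
print(v.base is a)    # a view keeps a reference to the array owning the data
```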

Array Attributes

Each array has multiple attributes:

  • dtype: data type of the elements
  • itemsize: size of each element
  • ndim: number of dimensions
  • shape: tuple containing the array dimensions. It is a Read and Write attribute.
  • size: number of elements
  • nbytes: total number of bytes occupied by the elements (size x itemsize)
  • strides: tuple of bytes to step in each dimension when traversing an array
  • data: the actual buffer where the data are
  • flags: information about data contiguity in memory
  • flat: flat (1d) view of the array
  • T: transposed view of the array, obtained by reversing shape and strides without copying the data

In [15]:
a = numpy.arange(6)
a.shape = (2,3)
print(a)
print("dtype: "+str(a.dtype))
print("shape: "+str(a.shape))
print("strides: "+str(a.strides))
print("buffer address: %s"%a.ctypes.data)


[[0 1 2]
 [3 4 5]]
dtype: int64
shape: (2, 3)
strides: (24, 8)
buffer address: 52598944

In [16]:
b = a.T
print(b)
print("dtype: "+str(b.dtype))
print("shape: "+str(b.shape))
print("strides: "+str(b.strides))
print("buffer address: %s"%b.ctypes.data)


[[0 3]
 [1 4]
 [2 5]]
dtype: int64
shape: (3, 2)
strides: (8, 24)
buffer address: 52598944

In [17]:
b[0,0] = 10
print(a)


[[10  1  2]
 [ 3  4  5]]

Shape, strides, views and buffers are fairly complex mechanisms ...
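To demystify strides a little: element [i, j] sits at byte offset i*strides[0] + j*strides[1] from the start of the buffer. The sketch below checks this for an array and for its transpose, reading from a copy of the raw bytes. The helper element_from_strides is hypothetical, written only for this illustration.

```python
import numpy

a = numpy.arange(6, dtype=numpy.int64).reshape(2, 3)
b = a.T                 # same buffer, shape and strides reversed
raw = a.tobytes()       # copy of the 48 bytes of the underlying buffer

def element_from_strides(view, i, j):
    """Read element [i, j] of `view` straight from the raw bytes."""
    offset = i * view.strides[0] + j * view.strides[1]
    return numpy.frombuffer(raw, dtype=view.dtype, count=1, offset=offset)[0]

print(a.strides, b.strides)
print(element_from_strides(a, 1, 2), a[1, 2])
print(element_from_strides(b, 2, 1), b[2, 1])
```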

Indexing

One can select elements as with any other Python sequence.

  • Indexing starts at 0 for each array dimension
  • Indexes can be negative: x[-1] is the same as x[len(x) - 1]
  • Indexes can be a list or an array of integer indices
  • Indexes can be an array of booleans with the same shape as the array

Slicing returns a view that refers to the original array and is usually not contiguous in memory. Indexing with a list or an array of integers or booleans returns a copy.
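The distinction between views and copies depends on the kind of index used. A minimal sketch, with illustrative variable names:

```python
import numpy

a = numpy.arange(12).reshape(3, 4)

sliced = a[:, 1:3]       # slicing: view on a
picked = a[[0, 2]]       # list of integer indices: copy
masked = a[a % 2 == 0]   # boolean mask: 1D copy of the selected elements

sliced[0, 0] = -1        # modifies a[0, 1]
picked[0, 0] = -2        # leaves a untouched

print(a)
print(numpy.shares_memory(a, sliced), numpy.shares_memory(a, picked))
print(sliced.flags["C_CONTIGUOUS"])
```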


In [18]:
a = numpy.arange(24).reshape((6, 4))
print(a)
a[3, 2] #extraction of a single element


[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
Out[18]:
14

In [19]:
a[3:4, 2] #slice on the rows: element of row 3, column 2, returned as a 1-element array


Out[19]:
array([14])

In [20]:
a[3,:]     #whole row of index 3; same as a[3]


Out[20]:
array([12, 13, 14, 15])

In [21]:
a[0, -1] #last element of the first row


Out[21]:
3

In [22]:
a[0:2, 0:4:2] #slicing allowed


Out[22]:
array([[0, 2],
       [4, 6]])

In [23]:
a[0:2, :] = 5  #assignment is also possible

In [24]:
a>10


Out[24]:
array([[False, False, False, False],
       [False, False, False, False],
       [False, False, False,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]], dtype=bool)

In [25]:
a[a>10]


Out[25]:
array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23])

In [26]:
a


Out[26]:
array([[ 5,  5,  5,  5],
       [ 5,  5,  5,  5],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

Exercise 3

Provide an expression to calculate the difference X[i+1]-X[i] for all the elements of the 1D array X.

Subquestion: how to get all the elements except the first one? All except the last one?


In [27]:
X = numpy.array(range(100))
# Nop there is no answer yet :þ
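One possible approach, given as a sketch and not as the official solution: two shifted slices of the same array can be subtracted element by element. The squares are used here, as an assumption, only to get non-trivial differences; numpy.diff is the built-in equivalent.

```python
import numpy

X = numpy.arange(100) ** 2

without_first = X[1:]    # all elements except the first one
without_last = X[:-1]    # all elements except the last one

# X[i+1] - X[i] for every valid i
diff = without_first - without_last

print(diff[:5])
print(numpy.array_equal(diff, numpy.diff(X)))
```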

NumPy ndarray methods

NumPy arrays have a large number of methods; the most important ones are (assuming a is an array):

  • a.min(): returns the minimum of the array
  • a.max(): returns the maximum of the array
  • a.sort(): sorts the array in place and returns None (numpy.sort(a) returns a sorted copy)
  • a.argsort(): returns an array with the indices that would sort the array
  • a.sum(): returns the sum of the elements of the array
  • a.copy(): returns an actual copy of the array (not a view)

Note that most methods take optional axis, dtype and out arguments:

a.sum(axis=None, dtype=None, out=None): performs the sum along the specified axis

  • axis: along which axis the operation should be performed
  • dtype: request the data type in which the operation should be performed (useful when integer overflow is expected)
  • out: provide an already allocated array to store the result (prevents a malloc/free cycle)
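The effect of these arguments, and the difference between sorting in place and getting a sorted copy, can be sketched as follows. The uint8 values are chosen, for illustration, so that row sums exceed 255.

```python
import numpy

a = numpy.array([[200, 100, 50],
                 [10, 250, 30]], dtype=numpy.uint8)

per_column = a.sum(axis=0)                   # one sum per column
per_row = a.sum(axis=1)                      # one sum per row
wrapped = a.sum(axis=1, dtype=numpy.uint8)   # accumulated in uint8: wraps modulo 256

# out: store the result in an array allocated beforehand
out = numpy.empty(2, dtype=numpy.int64)
a.sum(axis=1, dtype=numpy.int64, out=out)

row = a[0].copy()
order = row.argsort()           # indices that would sort row
sorted_copy = numpy.sort(row)   # new sorted array, row is left untouched
returned = row.sort()           # sorts row in place, returns None

print(per_column, per_row, wrapped, out)
print(order, sorted_copy, returned, row)
```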

In [28]:
dir(a) #to convince yourself


Out[28]:
['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_finalize__',
 '__array_interface__',
 '__array_prepare__',
 '__array_priority__',
 '__array_struct__',
 '__array_wrap__',
 '__class__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__delslice__',
 '__div__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getslice__',
 '__gt__',
 '__hash__',
 '__hex__',
 '__iadd__',
 '__iand__',
 '__idiv__',
 '__ifloordiv__',
 '__ilshift__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__long__',
 '__lshift__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__oct__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdiv__',
 '__rdivmod__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rlshift__',
 '__rmod__',
 '__rmul__',
 '__ror__',
 '__rpow__',
 '__rrshift__',
 '__rshift__',
 '__rsub__',
 '__rtruediv__',
 '__rxor__',
 '__setattr__',
 '__setitem__',
 '__setslice__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__xor__',
 'all',
 'any',
 'argmax',
 'argmin',
 'argpartition',
 'argsort',
 'astype',
 'base',
 'byteswap',
 'choose',
 'clip',
 'compress',
 'conj',
 'conjugate',
 'copy',
 'ctypes',
 'cumprod',
 'cumsum',
 'data',
 'diagonal',
 'dot',
 'dtype',
 'dump',
 'dumps',
 'fill',
 'flags',
 'flat',
 'flatten',
 'getfield',
 'imag',
 'item',
 'itemset',
 'itemsize',
 'max',
 'mean',
 'min',
 'nbytes',
 'ndim',
 'newbyteorder',
 'nonzero',
 'partition',
 'prod',
 'ptp',
 'put',
 'ravel',
 'real',
 'repeat',
 'reshape',
 'resize',
 'round',
 'searchsorted',
 'setfield',
 'setflags',
 'shape',
 'size',
 'sort',
 'squeeze',
 'std',
 'strides',
 'sum',
 'swapaxes',
 'take',
 'tofile',
 'tolist',
 'tostring',
 'trace',
 'transpose',
 'var',
 'view']

Numpy methods for I/O

To ASCII files:

  • numpy.loadtxt(filename)
  • numpy.savetxt(filename, array)

To binary files (numpy.savez_compressed writes a compressed archive of arrays):

  • numpy.load(filename)
  • numpy.save(filename, array)
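A short round-trip sketch for both families of functions. The file names and the use of a temporary directory are assumptions made for the example; numpy.savez_compressed is used for the compressed binary case.

```python
import os
import tempfile
import numpy

data = numpy.arange(12, dtype=numpy.float64).reshape(3, 4)

with tempfile.TemporaryDirectory() as tmpdir:
    # ASCII: human readable, limited to 1D and 2D arrays
    txt_path = os.path.join(tmpdir, "data.txt")
    numpy.savetxt(txt_path, data)
    from_txt = numpy.loadtxt(txt_path)

    # Compressed binary archive: arrays are stored and retrieved by name
    npz_path = os.path.join(tmpdir, "data.npz")
    numpy.savez_compressed(npz_path, data=data)
    with numpy.load(npz_path) as archive:
        from_npz = archive["data"]

print(numpy.array_equal(from_txt, data), numpy.array_equal(from_npz, data))
```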

Exercise 4

  • Generate a 100x100 array with integers in increasing order
  • Perform a 2x2 binning on the array

Definition from Wikipedia:

In the context of image processing, binning is the procedure of combining a cluster of pixels into a single pixel. As such, in 2x2 binning, an array of 4 pixels becomes a single larger pixel, reducing the overall number of pixels. This aggregation, reducing the number of data (with a loss of information), facilitates the analysis. For instance, binning the data may also reduce the impact of read noise on the processed image (at the cost of a lower resolution).


In [29]:
img = numpy.arange(6*6)
img.shape = 6,-1
img


Out[29]:
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

In [30]:
img[::2,::2]+img[1::2,::2]+img[::2,1::2]+img[1::2,1::2]


Out[30]:
array([[ 14,  22,  30],
       [ 62,  70,  78],
       [110, 118, 126]])

In [31]:
c = img.copy()
c.shape = 3,2,3,2
c.sum(axis=-1).sum(axis=1)


Out[31]:
array([[ 14,  22,  30],
       [ 62,  70,  78],
       [110, 118, 126]])
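The reshape approach generalizes to any bin size and scales to the 100x100 array requested by the exercise. A possible sketch follows; the function name bin2d and its arguments are hypothetical.

```python
import numpy

def bin2d(image, by, bx):
    """Sum blocks of by x bx pixels of a 2D array."""
    ny, nx = image.shape
    if ny % by or nx % bx:
        raise ValueError("image shape must be a multiple of the bin size")
    # Split each axis into (number of bins, bin size) ...
    blocks = image.reshape(ny // by, by, nx // bx, bx)
    # ... and sum over the two "bin size" axes at once
    return blocks.sum(axis=(1, 3))

img = numpy.arange(100 * 100).reshape(100, 100)
binned = bin2d(img, 2, 2)
print(binned.shape)
print(binned[0, 0])   # 0 + 1 + 100 + 101
```

The total intensity is conserved: binned.sum() equals img.sum().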

Array operation

All standard operations, when applied to arrays, operate element by element: +, -, *, /, //, % ...

Other common operations are:

  • numpy.dot(a, b) # Standard linear algebra matrix multiplication
  • numpy.inner(a, b) # Inner product
  • numpy.outer(a, b) # Outer product
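A small sketch contrasting the element-wise product with these linear algebra operations. The matrices are arbitrary examples.

```python
import numpy

a = numpy.array([[1, 2], [3, 4]])
b = numpy.array([[10, 20], [30, 40]])

elementwise = a * b                # product element by element
matrix = numpy.dot(a, b)           # matrix product, also written a @ b
inner = numpy.inner(a[0], b[0])    # 1*10 + 2*20
outer = numpy.outer(a[0], b[0])    # all pairwise products, shape (2, 2)

print(elementwise)
print(matrix)
print(inner)
print(outer)
```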

Exercise 5

Write a function fillArray(n, m) that generates an array of shape (n, m) in which X[i, j] = cos(i) * sin(j)

Time it for n=1000, m = 1000


In [34]:
from math import cos, sin
def no_numpy(n,m):
    x = []
    sinj = [sin(j) for j in range(m)]
    for i in range(n):
        cosi = cos(i)
        l=[0]*m
        for j,sj in enumerate(sinj):
            l[j] = sj*cosi
        x.append(l)
    return numpy.array(x)

%timeit no_numpy(1000,1000)


1 loops, best of 3: 171 ms per loop

In [35]:
def matlab_like(n,m):
    y,x = numpy.ogrid[:n,:m]
    return numpy.cos(y)*numpy.sin(x)
%timeit matlab_like(1000,1000)


100 loops, best of 3: 2.94 ms per loop

In [36]:
def optimized(n,m):
    msin = numpy.sin(numpy.arange(m))
    ncos = numpy.cos(numpy.arange(n))
    return numpy.outer(ncos,msin)
%timeit optimized(1000,1000)


100 loops, best of 3: 2.87 ms per loop
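Before comparing timings it is worth checking that the three approaches compute the same thing. This sketch does so on a small case and also shows the shapes produced by numpy.ogrid, which is what makes the "matlab_like" version work through broadcasting. The sizes n=4, m=5 are arbitrary.

```python
import math
import numpy

n, m = 4, 5

# Plain Python loops
loops = numpy.array([[math.cos(i) * math.sin(j) for j in range(m)]
                     for i in range(n)])

# ogrid returns a column (n, 1) and a row (1, m); their product
# is broadcast to the full (n, m) shape
y, x = numpy.ogrid[:n, :m]
broadcast = numpy.cos(y) * numpy.sin(x)

# Outer product of two 1D arrays
outer = numpy.outer(numpy.cos(numpy.arange(n)), numpy.sin(numpy.arange(m)))

print(y.shape, x.shape, broadcast.shape)
print(numpy.allclose(loops, broadcast), numpy.allclose(loops, outer))
```

In both NumPy versions only n + m trigonometric evaluations are needed, against the n * m multiplications that all three versions share.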

Conclusion

Speed is more a question of algorithm than of programming language!

Next: NumPy's C-optimized libraries

