The ndarray object from NumPy

The (nd)array object:

Collection of elements of the same type
Implemented in memory as a true table optimized for performance
Handled in a similar way to any other Python object

Multidimensional, any type of data

Dimensions can be modified, flexible indexing
Internal optimization for 1D, 2D and 3D

It can be interfaced with other languages...

Note: the array module exists in the Python standard library but it is limited to 1D; it is not to be confused with NumPy's ndarray.
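
As a minimal sketch of the "same type" rule above (the variable names are illustrative and not part of the original notebook), mixing integers and a float in one list makes NumPy store everything with a single common type, assumed here to be float64:

```python
import numpy

# Integers and a float in the same list: NumPy stores them with one common dtype.
mixed = numpy.array([1, 2.5, 3])
print(mixed.dtype)
print(mixed)

# A nested list gives a 2D array; ndim and shape describe its dimensions.
table = numpy.array([[1, 2, 3], [4, 5, 6]])
print(table.ndim, table.shape)
```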



In [1]:

    
import numpy

Array creation, given its contents

From a list of values:



In [2]:

    
numpy.array([1, 2, 3, 5, 7, 11, 13, 17])









    Out[2]:





array([ 1,  2,  3,  5,  7, 11, 13, 17])

From a list of lists for 2D arrays:



In [3]:

    
numpy.array([[1, 2, 3], [4, 5, 6]])









    Out[3]:





array([[1, 2, 3],
       [4, 5, 6]])

Also specifying the type of element:



In [4]:

    
numpy.array([[1, 2, 3], [4, 5, 6]], dtype=numpy.float64)









    Out[4]:





array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

Array creation using NumPy functions



In [5]:

    
shape = (2,3)
numpy.zeros(shape,dtype="int32")









    Out[5]:





array([[0, 0, 0],
       [0, 0, 0]], dtype=int32)



In [6]:

    
numpy.ones(shape,dtype="int32")









    Out[6]:





array([[1, 1, 1],
       [1, 1, 1]], dtype=int32)



In [7]:

    
#this is an IPython trick to request help ...
numpy.arange?



In [8]:

    
numpy.arange(0, 10, 2)









    Out[8]:





array([0, 2, 4, 6, 8])



In [9]:

    
numpy.identity(3, dtype=int)  # the numpy.int alias was removed in NumPy 1.24









    Out[9]:





array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

Arrays can also be generated from index positions using numpy.fromfunction. This can be combined with a lambda function:



In [10]:

    
lambda_function = lambda i,j: numpy.sin(i) * numpy.cos(j)
numpy.fromfunction(lambda_function, (3,3))









    Out[10]:





array([[ 0.        ,  0.        , -0.        ],
       [ 0.84147098,  0.45464871, -0.35017549],
       [ 0.90929743,  0.4912955 , -0.37840125]])
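
A point worth knowing, sketched here under the assumption that the function only uses NumPy operations: numpy.fromfunction calls the function once with whole arrays of indices rather than once per element. The same result can be built by hand with numpy.indices (the name index_product is illustrative):

```python
import numpy

def index_product(i, j):
    # i and j are arrays holding the row and column index of every element
    return numpy.sin(i) * numpy.cos(j)

from_function = numpy.fromfunction(index_product, (3, 3))

# Equivalent construction from explicit index arrays
i, j = numpy.indices((3, 3))
by_hand = numpy.sin(i) * numpy.cos(j)

print(numpy.allclose(from_function, by_hand))
```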

NumPy arrays can be directly dumped to and restored from disk:



In [11]:

    
a = numpy.ones((3, 5), dtype="complex")
numpy.save('data.npy', a)
b = numpy.load('data.npy')
print(b)









    



[[ 1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j]
 [ 1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j]
 [ 1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j]]
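
The round trip can be checked explicitly. The sketch below is one possible way to do it; the temporary directory is an assumption made here so that no file is left in the working directory:

```python
import os
import tempfile

import numpy

original = numpy.ones((3, 5), dtype="complex")

# Write into a temporary directory so that no file is left behind
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data.npy")
    numpy.save(path, original)
    restored = numpy.load(path)

# Shape, dtype and values survive the round trip
print(restored.shape, restored.dtype)
print(numpy.array_equal(original, restored))
```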

Exercise 2

Use Python as a simple calculator and compare the behaviour of NumPy arrays with that of lists, as in Exercise 1

a = [1, 2, 3]

b = numpy.array(a)

What is the result of 2 * a[2] ?
What is the result of 2 * a ?
What is the result of 2 * b[2] ?
What is the result of 2 * b ?



In [12]:

    
a = [1, 2, 3]
import numpy
b = numpy.array(a)
print(2*a[2])
print(2*a)
print(2*b[2])
print(2*b)









    



6
[1, 2, 3, 1, 2, 3]
6
[2 4 6]

Type of elements

Integers, signed or not, of width 8, 16, 32 and 64 bits
Floats in half (16 bits), single (32 bits) or double (64 bits) precision
Complex numbers in single (complex64: 2x32 bits) and double (complex128: 2x64 bits) precision for real and imaginary parts
Fixed length strings
Any Python object: you lose any kind of optimization

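Note:

numpy.float was an alias of the "python float", i.e. a 64-bit double precision float.
numpy.int was an alias of the "python int"; used as a dtype it mapped to a C long, which is 32 bits on Windows and 64 bits on most 64-bit Unix systems.
Both aliases were deprecated in NumPy 1.20 and removed in NumPy 1.24: use the builtin float and int, or an explicit NumPy type, instead.

It is advised to explicitly specify the data type, for example numpy.uint16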
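
A short sketch of how the explicit types listed above can be inspected (the variable names are illustrative); numpy.dtype and numpy.iinfo report the element size and the representable range:

```python
import numpy

# Size in bytes of one element for a few explicit types
for name in ("uint8", "int16", "float32", "float64", "complex128"):
    print(name, numpy.dtype(name).itemsize)

# Representable range of an integer type
limits = numpy.iinfo(numpy.uint16)
print(limits.min, limits.max)

# Requesting the type at creation time
counts = numpy.zeros(4, dtype=numpy.uint16)
print(counts.dtype, counts.nbytes)
```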



In [13]:

    
import numpy
a = {'dict':'a'}
b = ['list','b']
c = ('tuple','c')
v = numpy.array([a, b, c, "bla"], dtype=object)  # recent NumPy versions require dtype=object for such mixed content
print(v)









    



[{'dict': 'a'} ['list', 'b'] ('tuple', 'c') 'bla']

Record Array

Like an array of structs where each field has a name.

Example: an image in which each pixel is stored as 3 uint8 values for the Red, Green and Blue channels:



In [14]:

    
img = numpy.zeros((2,2), {'names': ('r','g','b'), 'formats': (numpy.uint8, numpy.uint8, numpy.uint8)}) 
img["r"] = 10
print(img)









    



[[(10, 0, 0) (10, 0, 0)]
 [(10, 0, 0) (10, 0, 0)]]
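
The fields can then be handled individually. The following is a sketch built on the same pixel layout (the names rgb and red are illustrative); selecting a single field gives a plain array that is a view on the records:

```python
import numpy

rgb = numpy.dtype({"names": ("r", "g", "b"),
                   "formats": (numpy.uint8, numpy.uint8, numpy.uint8)})
img = numpy.zeros((2, 2), dtype=rgb)
img["r"] = 10

print(img.dtype.names)      # field names
print(img.itemsize)         # bytes per pixel
red = img["r"]              # plain uint8 array, one value per pixel
print(red.shape, red.dtype)

# The field is a view: writing through it changes the record array
red[0, 0] = 255
print(img[0, 0])
```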

View on an array

An ndarray is composed of a buffer containing the data (or a pointer to the data) and the associated metadata. Multiple arrays can point to the same buffer; they are called "views" on the array. Modifying the data through one view impacts all the others.
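
A minimal sketch of this behaviour, using a slice as the view and an explicit copy for contrast (variable names are illustrative):

```python
import numpy

data = numpy.arange(6)
view = data[2:5]          # basic slicing gives a view on the same buffer
copy = data[2:5].copy()   # explicit copy with its own buffer

view[0] = 100             # visible through data
copy[1] = -1              # not visible through data

print(data)
print(numpy.shares_memory(data, view), numpy.shares_memory(data, copy))
print(view.base is data)
```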

Array Attributes

Each array has multiple attributes:

dtype: data type of the elements
itemsize: size of each element, in bytes
ndim: number of dimensions
shape: tuple containing the array dimensions. It is a read and write attribute.
size: number of elements
nbytes: total size of the elements, in bytes (size * itemsize)
strides: tuple of the number of bytes to step in each dimension when traversing the array
data: the actual buffer holding the data
flags: information about the memory layout (contiguity, ownership, writability)
flat: flat (1D) iterator over the array
T: transposed view of the array, obtained by swapping shape and strides without copying the data
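
To make the role of strides concrete, here is a sketch that locates one element by hand. The int64 type is forced so that the element size is 8 bytes on every platform, and the array is assumed to be C-contiguous:

```python
import numpy

a = numpy.arange(6, dtype=numpy.int64).reshape(2, 3)
print(a.ndim, a.shape, a.size, a.itemsize, a.nbytes)
print(a.strides)

# Byte offset of a[1, 2] computed from the strides
row, col = 1, 2
offset = row * a.strides[0] + col * a.strides[1]

# a is C-contiguous here, so the flat position is offset // itemsize
print(a.ravel()[offset // a.itemsize], a[row, col])

# The transpose swaps shape and strides and keeps the same buffer
print(a.T.shape, a.T.strides)
```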



In [15]:

    
a = numpy.arange(6)
a.shape = (2,3)
print(a)
print("dtype: "+str(a.dtype))
print("shape: "+str(a.shape))
print("strides: "+str(a.strides))
print("buffer address: %s"%a.ctypes.data)









    



[[0 1 2]
 [3 4 5]]
dtype: int64
shape: (2, 3)
strides: (24, 8)
buffer address: 52598944



In [16]:

    
b = a.T
print(b)
print("dtype: "+str(b.dtype))
print("shape: "+str(b.shape))
print("strides: "+str(b.strides))
print("buffer address: %s"%b.ctypes.data)









    



[[0 3]
 [1 4]
 [2 5]]
dtype: int64
shape: (3, 2)
strides: (8, 24)
buffer address: 52598944



In [17]:

    
b[0,0] = 10
print(a)









    



[[10  1  2]
 [ 3  4  5]]

Shape/strides and views/buffer are quite complex mechanisms ...

Indexing

One can select elements as with any other Python sequence.

Indexing starts at 0 for each array dimension
Indexes can be negative: x[-1] is the same as x[len(x) - 1]
Indexes can be a list or an array of integer indices
Indexes can be an array of booleans of the same shape

With slices, the output is a view that refers to the original array and is usually not contiguous in memory. With a list or an array of indices or booleans, the output is a copy.
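
A short sketch of the indexing forms listed above, with a check of which results share memory with the source array (names are illustrative):

```python
import numpy

a = numpy.arange(10)

by_slice = a[2:5]         # slice
by_list = a[[2, 3, 4]]    # list of integer indices
by_mask = a[a > 6]        # boolean array of the same size

print(by_slice, by_list, by_mask)
print(a[-1] == a[len(a) - 1])

# Only the slice shares memory with a; the other two are copies
print(numpy.shares_memory(a, by_slice))
print(numpy.shares_memory(a, by_list))
print(numpy.shares_memory(a, by_mask))
```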



In [18]:

    
a = numpy.arange(24).reshape((6, 4))
print(a)
a[3, 2] #extraction of a single element









    



[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]






    Out[18]:





14



In [19]:

    
a[3:4, 2] #extraction, as a 1-element array, of the third element of the fourth line









    Out[19]:





array([14])



In [20]:

    
a[3,:]     #whole fourth line, assuming a has at least two dimensions









    Out[20]:





array([12, 13, 14, 15])



In [21]:

    
a[0, -1] #last element of the first row









    Out[21]:





3



In [22]:

    
a[0:2, 0:4:2] #slicing allowed









    Out[22]:





array([[0, 2],
       [4, 6]])



In [23]:

    
a[0:2, :] = 5  #assignment is also possible



In [24]:

    
a>10









    Out[24]:





array([[False, False, False, False],
       [False, False, False, False],
       [False, False, False,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]], dtype=bool)



In [25]:

    
a[a>10]









    Out[25]:





array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23])



In [26]:

    
a









    Out[26]:





array([[ 5,  5,  5,  5],
       [ 5,  5,  5,  5],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

Exercise 3

Provide an expression to calculate the difference X[i+1]-X[i] for all the elements of the 1D array X.

Subquestion: how to get all elements excluding the first? Excluding the last?



In [27]:

    
X = numpy.array(range(100))
# Nope, there is no answer yet :þ
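
One possible answer, given as a sketch rather than as the official solution: slicing gives the two shifted versions of X, and their subtraction gives the differences. numpy.diff is used only as a cross-check:

```python
import numpy

X = numpy.array(range(100))

all_but_first = X[1:]     # every element except the first
all_but_last = X[:-1]     # every element except the last

# X[i+1] - X[i] for every valid i
difference = all_but_first - all_but_last

# numpy.diff computes the same quantity
print(numpy.array_equal(difference, numpy.diff(X)))
print(difference.shape)
```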

NumPy ndarray methods

NumPy arrays have a lot of methods; the most important ones are (assuming a is an array):

a.min(): returns the minimum of the array
a.max(): returns the maximum of the array
a.sort(): sorts the array in place and returns None (numpy.sort(a) returns a sorted copy)
a.argsort(): returns an array with the indices that would sort the array
a.sum(): returns the sum of the elements of the array
a.copy(): returns an actual copy of the array (not a view)
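
A small sketch of these methods on a three-element array (names are illustrative). It also shows the difference between the a.sort() method, which works in place, and the numpy.sort function, which returns a new array:

```python
import numpy

a = numpy.array([3, 1, 2])

order = a.argsort()          # indices that would sort a
sorted_copy = numpy.sort(a)  # new sorted array, a is unchanged
print(order, sorted_copy, a)

independent = a.copy()
result = a.sort()            # sorts a in place
print(result, a)

print(a.min(), a.max(), a.sum())
print(independent)           # the copy kept the original order
```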

Note that most methods take optional axis, dtype and out arguments:

a.sum(axis=None, dtype=None, out=None) performs the sum along the specified axis

axis: axis along which the operation should be performed
dtype: data type in which the operation should be performed (useful when integer overflow is expected)
out: pre-allocated array in which to store the result (prevents a malloc/free cycle)
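
The three arguments can be sketched on a small example (the arrays and their names are illustrative; the types are forced so that the behaviour does not depend on the platform):

```python
import numpy

a = numpy.arange(6, dtype=numpy.int64).reshape(2, 3)
print(a.sum())          # all elements
print(a.sum(axis=0))    # sum over rows, one value per column
print(a.sum(axis=1))    # sum over columns, one value per row

# dtype selects the type used for the accumulation
small = numpy.array([200, 100], dtype=numpy.uint8)
print(small.sum(dtype=numpy.uint8))   # wraps around in 8 bits
print(small.sum(dtype=numpy.int64))   # wide enough for the true sum

# out receives the result in an array allocated beforehand
result = numpy.empty(3, dtype=numpy.int64)
a.sum(axis=0, out=result)
print(result)
```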



In [28]:

    
dir(a) #to convince yourself









    Out[28]:





['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_finalize__',
 '__array_interface__',
 '__array_prepare__',
 '__array_priority__',
 '__array_struct__',
 '__array_wrap__',
 '__class__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__delslice__',
 '__div__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getslice__',
 '__gt__',
 '__hash__',
 '__hex__',
 '__iadd__',
 '__iand__',
 '__idiv__',
 '__ifloordiv__',
 '__ilshift__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__long__',
 '__lshift__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__oct__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdiv__',
 '__rdivmod__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rlshift__',
 '__rmod__',
 '__rmul__',
 '__ror__',
 '__rpow__',
 '__rrshift__',
 '__rshift__',
 '__rsub__',
 '__rtruediv__',
 '__rxor__',
 '__setattr__',
 '__setitem__',
 '__setslice__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__xor__',
 'all',
 'any',
 'argmax',
 'argmin',
 'argpartition',
 'argsort',
 'astype',
 'base',
 'byteswap',
 'choose',
 'clip',
 'compress',
 'conj',
 'conjugate',
 'copy',
 'ctypes',
 'cumprod',
 'cumsum',
 'data',
 'diagonal',
 'dot',
 'dtype',
 'dump',
 'dumps',
 'fill',
 'flags',
 'flat',
 'flatten',
 'getfield',
 'imag',
 'item',
 'itemset',
 'itemsize',
 'max',
 'mean',
 'min',
 'nbytes',
 'ndim',
 'newbyteorder',
 'nonzero',
 'partition',
 'prod',
 'ptp',
 'put',
 'ravel',
 'real',
 'repeat',
 'reshape',
 'resize',
 'round',
 'searchsorted',
 'setfield',
 'setflags',
 'shape',
 'size',
 'sort',
 'squeeze',
 'std',
 'strides',
 'sum',
 'swapaxes',
 'take',
 'tofile',
 'tolist',
 'tostring',
 'trace',
 'transpose',
 'var',
 'view']

NumPy functions for I/O

To ASCII files:

numpy.loadtxt(filename)
numpy.savetxt(filename, array)

To binary files (numpy.save writes the uncompressed .npy format; numpy.savez_compressed writes a compressed archive):

numpy.load(filename)
numpy.save(filename, array)
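
A sketch of both kinds of files. The file names and the temporary directory are assumptions made for the example, and numpy.savez_compressed is used to illustrate the compressed variant:

```python
import os
import tempfile

import numpy

table = numpy.arange(6, dtype=numpy.float64).reshape(2, 3)

with tempfile.TemporaryDirectory() as folder:
    # Text file: one row of the array per line
    text_path = os.path.join(folder, "table.txt")
    numpy.savetxt(text_path, table)
    from_text = numpy.loadtxt(text_path)

    # Compressed binary archive holding one named array
    archive_path = os.path.join(folder, "table.npz")
    numpy.savez_compressed(archive_path, table=table)
    with numpy.load(archive_path) as archive:
        from_archive = archive["table"]

print(numpy.array_equal(table, from_text))
print(numpy.array_equal(table, from_archive))
```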

Exercise 4

Generate a 100x100 array with integers in increasing order
Perform a 2x2 binning on the array

Definition from Wikipedia:

In the context of image processing, binning is the procedure of combining a cluster of pixels into a single pixel. As such, in 2x2 binning, an array of 4 pixels becomes a single larger pixel, reducing the overall number of pixels. This aggregation, reducing the number of data (with a loss of information), facilitates the analysis. For instance, binning the data may also reduce the impact of read noise on the processed image (at the cost of a lower resolution).



In [29]:

    
img = numpy.arange(6*6)
img.shape = 6,-1
img









    Out[29]:





array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])



In [30]:

    
img[::2,::2]+img[1::2,::2]+img[::2,1::2]+img[1::2,1::2]









    Out[30]:





array([[ 14,  22,  30],
       [ 62,  70,  78],
       [110, 118, 126]])



In [31]:

    
c = img.copy()
c.shape = 3,2,3,2
c.sum(axis=-1).sum(axis=1)









    Out[31]:





array([[ 14,  22,  30],
       [ 62,  70,  78],
       [110, 118, 126]])
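
The reshape approach can be generalised to other bin sizes. The function below is a sketch, not part of the original notebook: the name bin2d and its factor parameter are hypothetical, and it assumes that both dimensions are multiples of factor:

```python
import numpy

def bin2d(image, factor=2):
    """Sum blocks of factor x factor pixels (both dimensions must be multiples of factor)."""
    rows, cols = image.shape
    blocks = image.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.sum(axis=(1, 3))

img = numpy.arange(6 * 6).reshape(6, 6)
binned = bin2d(img)
print(binned)

# Same result as the sum of the four shifted slices
by_slices = img[::2, ::2] + img[1::2, ::2] + img[::2, 1::2] + img[1::2, 1::2]
print(numpy.array_equal(binned, by_slices))
```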

Array operation

All standard operations, when applied to arrays, operate element by element: +, -, *, /, //, % ...

Other common operations are:

numpy.dot(a, b) # Standard linear algebra matrix multiplication
numpy.inner(a, b) # Inner product
numpy.outer(a, b) # Outer product
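
A sketch contrasting the element-wise operators with these three functions on small arrays (the values are chosen only for illustration):

```python
import numpy

a = numpy.array([1, 2, 3])
b = numpy.array([4, 5, 6])

print(a * b)               # element by element
print(numpy.dot(a, b))     # scalar for two 1D arrays
print(numpy.inner(a, b))   # same as dot for 1D arrays
print(numpy.outer(a, b))   # 3x3 array with outer[i, j] = a[i] * b[j]

# For 2D arrays, dot is the matrix product, not the element-wise product
m = numpy.array([[1, 2], [3, 4]])
print(numpy.dot(m, m))
print(m * m)
```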

Exercise 5

Write a function fillArray(n, m) to generate an array of dimension (n, m) in which X[i, j] = cos(i) * sin(j)

Time it for n=1000, m = 1000




In [34]:

    
from math import cos, sin
def no_numpy(n,m):
    x = []
    sinj = [sin(j) for j in range(m)]
    for i in range(n):
        cosi = cos(i)
        l=[0]*m
        for j,sj in enumerate(sinj):
            l[j] = sj*cosi
        x.append(l)
    return numpy.array(x)

%timeit no_numpy(1000,1000)









    



1 loops, best of 3: 171 ms per loop



In [35]:

    
def matlab_like(n,m):
    y,x = numpy.ogrid[:n,:m]
    return numpy.cos(y)*numpy.sin(x)
%timeit matlab_like(1000,1000)









    



100 loops, best of 3: 2.94 ms per loop



In [36]:

    
def optimized(n,m):
    msin = numpy.sin(numpy.arange(m))
    ncos = numpy.cos(numpy.arange(n))
    return numpy.outer(ncos,msin)
%timeit optimized(1000,1000)









    



100 loops, best of 3: 2.87 ms per loop
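
Another way to write the same computation relies on broadcasting. This is a sketch: the names broadcast_fill and outer_fill are hypothetical, no timing is claimed for it, and the result is only checked against the numpy.outer approach on a small array:

```python
import numpy

def broadcast_fill(n, m):
    # Column of cos(i) values times row of sin(j) values, expanded by broadcasting
    cos_i = numpy.cos(numpy.arange(n))[:, numpy.newaxis]
    sin_j = numpy.sin(numpy.arange(m))[numpy.newaxis, :]
    return cos_i * sin_j

def outer_fill(n, m):
    # Outer product of the cos(i) and sin(j) vectors
    return numpy.outer(numpy.cos(numpy.arange(n)), numpy.sin(numpy.arange(m)))

x = broadcast_fill(4, 5)
print(x.shape)
print(numpy.allclose(x, outer_fill(4, 5)))
print(numpy.isclose(x[2, 3], numpy.cos(2) * numpy.sin(3)))
```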

Conclusion

Speed is more a question of algorithm than of programming language!

Next: Numpy's C optimized libraries


