NumPy: creating and manipulating numerical data

(See Ch. 3, "Python Scientific lecture notes", Release 2012.3 (EuroScipy 2012)

by the EuroScipy tutorial team

Editors: Valentin Haenel, Emmanuelle Gouillart, Gaël Varoquaux

http://scipy-lectures.github.com)

The numpy array object

Python has built-in:

containers: lists (costless insertion and append), dictionaries (fast lookup)
high-level number objects: integers, floating point

Numpy is:

extension package to Python for multi-dimensional arrays
closer to hardware (efficiency)
designed for scientific computation (convenience)



In [5]:

    
import numpy as np
a = np.array([0, 1, 2, 3, 4])
a
a = np.arange(5)
a









    Out[5]:





array([0, 1, 2, 3, 4])

For example...

...an array containing:

values of an experiment/simulation at discrete time steps
signal recorded by a measurement device, e.g. sound wave
pixels of an image, grey-level or colour
3-D data measured at different X-Y-Z positions, e.g. MRI scan
...

Why is it useful?

efficient numerical operations



In [6]:

    
l = range(1000)
l**2
%timeit [i**2 for i in l]









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-7aed15c9ab3b> in <module>()
      1 l = range(1000)
----> 2 l**2
      3 get_ipython().magic(u'timeit [i**2 for i in l]')

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'



In [8]:

    
a = np.arange(10)
%timeit a**2
a**2









    



1000000 loops, best of 3: 472 ns per loop






    Out[8]:





array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81])

Reference documentation:

On the web: http://docs.scipy.org/
Interactive help:



In [9]:

    
help(np.array)









    



Help on built-in function array in module numpy.core.multiarray:

array(...)
    array(object, dtype=None, copy=True, order=None, subok=False, ndmin=0)
    
    Create an array.
    
    Parameters
    ----------
    object : array_like
        An array, any object exposing the array interface, an
        object whose __array__ method returns an array, or any
        (nested) sequence.
    dtype : data-type, optional
        The desired data-type for the array.  If not given, then
        the type will be determined as the minimum type required
        to hold the objects in the sequence.  This argument can only
        be used to 'upcast' the array.  For downcasting, use the
        .astype(t) method.
    copy : bool, optional
        If true (default), then the object is copied.  Otherwise, a copy
        will only be made if __array__ returns a copy, if obj is a
        nested sequence, or if a copy is needed to satisfy any of the other
        requirements (`dtype`, `order`, etc.).
    order : {'C', 'F', 'A'}, optional
        Specify the order of the array.  If order is 'C' (default), then the
        array will be in C-contiguous order (last-index varies the
        fastest).  If order is 'F', then the returned array
        will be in Fortran-contiguous order (first-index varies the
        fastest).  If order is 'A', then the returned array may
        be in any order (either C-, Fortran-contiguous, or even
        discontiguous).
    subok : bool, optional
        If True, then sub-classes will be passed-through, otherwise
        the returned array will be forced to be a base-class array (default).
    ndmin : int, optional
        Specifies the minimum number of dimensions that the resulting
        array should have.  Ones will be pre-pended to the shape as
        needed to meet this requirement.
    
    Returns
    -------
    out : ndarray
        An array object satisfying the specified requirements.
    
    See Also
    --------
    empty, empty_like, zeros, zeros_like, ones, ones_like, fill
    
    Examples
    --------
    >>> np.array([1, 2, 3])
    array([1, 2, 3])
    
    Upcasting:
    
    >>> np.array([1, 2, 3.0])
    array([ 1.,  2.,  3.])
    
    More than one dimension:
    
    >>> np.array([[1, 2], [3, 4]])
    array([[1, 2],
           [3, 4]])
    
    Minimum dimensions 2:
    
    >>> np.array([1, 2, 3], ndmin=2)
    array([[1, 2, 3]])
    
    Type provided:
    
    >>> np.array([1, 2, 3], dtype=complex)
    array([ 1.+0.j,  2.+0.j,  3.+0.j])
    
    Data-type consisting of more than one element:
    
    >>> x = np.array([(1,2),(3,4)],dtype=[('a','<i4'),('b','<i4')])
    >>> x['a']
    array([1, 3])
    
    Creating an array from sub-classes:
    
    >>> np.array(np.mat('1 2; 3 4'))
    array([[1, 2],
           [3, 4]])
    
    >>> np.array(np.mat('1 2; 3 4'), subok=True)
    matrix([[1, 2],
            [3, 4]])



In [10]:

    
np.array?

Type: builtin_function_or_method String Form: Docstring: array(object, dtype=None, copy=True, order=None, subok=False, ndmin=0) Create an array. Parameters ---------- object : array_like An array, any object exposing the array interface, an ... dtype : data-type, optional The desired data-type for the array. If not given, then ... copy : bool, optional If true (default), then the object is copied. Otherwise, a copy ...

Looking for something:



In [11]:

    
np.lookfor('create array')









    



Search results for 'create array'
---------------------------------
numpy.array
    Create an array.
numpy.memmap
    Create a memory-map to an array stored in a *binary* file on disk.
numpy.diagflat
    Create a two-dimensional array with the flattened input as a diagonal.
numpy.fromiter
    Create a new 1-dimensional array from an iterable object.
numpy.partition
    Return a partitioned copy of an array.
numpy.ma.diagflat
    Create a two-dimensional array with the flattened input as a diagonal.
numpy.ctypeslib.as_array
    Create a numpy array from a ctypes array or a ctypes POINTER.
numpy.ma.make_mask
    Create a boolean mask from an array.
numpy.ctypeslib.as_ctypes
    Create and return a ctypes object from a numpy array.  Actually
numpy.ma.mrecords.fromarrays
    Creates a mrecarray from a (flat) list of masked arrays.
numpy.lib.format.open_memmap
    Open a .npy file as a memory-mapped array.
numpy.ma.MaskedArray.__new__
    Create a new masked array from scratch.
numpy.lib.arrayterator.Arrayterator
    Buffered iterator for big arrays.
numpy.ma.mrecords.fromtextfile
    Creates a mrecarray from data stored in the file `filename`.
numpy.asarray
    Convert the input to an array.
numpy.ndarray
    ndarray(shape, dtype=float, buffer=None, offset=0,
numpy.recarray
    Construct an ndarray that allows field access using attributes.
numpy.chararray
    chararray(shape, itemsize=1, unicode=False, buffer=None, offset=0,
numpy.pad
    Pads an array.
numpy.sum
    Sum of array elements over a given axis.
numpy.asanyarray
    Convert the input to an ndarray, but pass ndarray subclasses through.
numpy.copy
    Return an array copy of the given object.
numpy.diag
    Extract a diagonal or construct a diagonal array.
numpy.load
    Load arrays or pickled objects from ``.npy``, ``.npz`` or pickled files.
numpy.sort
    Return a sorted copy of an array.
numpy.array_equiv
    Returns True if input arrays are shape consistent and all elements equal.
numpy.dtype
    Create a data type object.
numpy.choose
    Construct an array from an index array and a set of arrays to choose from.
numpy.nditer
    Efficient multi-dimensional iterator object to iterate over arrays.
numpy.swapaxes
    Interchange two axes of an array.
numpy.full_like
    Return a full array with the same shape and type as a given array.
numpy.ones_like
    Return an array of ones with the same shape and type as a given array.
numpy.empty_like
    Return a new array with the same shape and type as a given array.
numpy.zeros_like
    Return an array of zeros with the same shape and type as a given array.
numpy.asarray_chkfinite
    Convert the input to an array, checking for NaNs or Infs.
numpy.diag_indices
    Return the indices to access the main diagonal of an array.
numpy.ma.choose
    Use an index array to construct a new array from a set of choices.
numpy.chararray.tolist
    a.tolist()
numpy.matlib.rand
    Return a matrix of random values with given shape.
numpy.savez_compressed
    Save several arrays into a single file in compressed ``.npz`` format.
numpy.ma.empty_like
    Return a new array with the same shape and type as a given array.
numpy.ma.make_mask_none
    Return a boolean mask of the given shape, filled with False.
numpy.ma.mrecords.fromrecords
    Creates a MaskedRecords from a list of records.
numpy.around
    Evenly round to the given number of decimals.
numpy.source
    Print or write to a file the source code for a Numpy object.
numpy.diagonal
    Return specified diagonals.
numpy.histogram2d
    Compute the bi-dimensional histogram of two data samples.
numpy.fft.ifft
    Compute the one-dimensional inverse discrete Fourier Transform.
numpy.fft.ifftn
    Compute the N-dimensional inverse discrete Fourier Transform.
numpy.busdaycalendar
    A business day calendar object that efficiently stores information



In [12]:

    
np.con*?

np.concatenate np.conj np.conjugate np.convolve

Creating arrays



In [13]:

    
a = np.array([0, 1, 2, 3])
a









    Out[13]:





array([0, 1, 2, 3])



In [15]:

    
a.ndim









    Out[15]:





1



In [16]:

    
a.shape









    Out[16]:





(4,)



In [ ]:

    
len(a)



In [23]:

    
b = np.array([(0, 1, 2), [3, 4, 5]])
b









    Out[23]:





array([[0, 1, 2],
       [3, 4, 5]])



In [25]:

    
b.ndim









    Out[25]:





2



In [19]:

    
len(b) # returns the size of the first dimension









    Out[19]:





2



In [20]:

    
b.shape









    Out[20]:





(2, 3)



In [26]:

    
c = np.array([[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]])
c









    Out[26]:





array([[[ 1,  2],
        [ 3,  4],
        [ 5,  6]],

       [[ 7,  8],
        [ 9, 10],
        [11, 12]]])



In [27]:

    
c.shape









    Out[27]:





(2, 3, 2)

In practice, we rarely enter items one by one...

Evenly spaced:



In [28]:

    
a = np.arange(10) # 0 .. n-1 (!)
a









    Out[28]:





array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])



In [31]:

    
b = np.arange(1, 9, 1) # start, end (exlusive), step
b









    Out[31]:





array([1, 2, 3, 4, 5, 6, 7, 8])

or by number of points:



In [33]:

    
c = np.linspace(0, 1, 6, endpoint=False) # start, end, num-points
c









    Out[33]:





array([ 0.        ,  0.16666667,  0.33333333,  0.5       ,  0.66666667,
        0.83333333])



In [ ]:

    
d = np.linspace(0, 1, 5, endpoint=False)
d

Common arrays:



In [37]:

    
a = np.ones((3, 3)) # reminder: (3, 3) is a tuple
a









    Out[37]:





array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])



In [38]:

    
b = np.zeros((2, 2))
b









    Out[38]:





array([[ 0.,  0.],
       [ 0.,  0.]])



In [39]:

    
c = np.eye(3)
c









    Out[39]:





array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])



In [40]:

    
d = np.diag(np.array([1, 2, 3, 4]))
d









    Out[40]:





array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])

np.random: random numbers (Mersenne Twister PRNG):



In [41]:

    
a = np.random.rand(4) # uniform in [0, 1]
a









    Out[41]:





array([ 0.09739491,  0.13817063,  0.67998996,  0.83138239])



In [55]:

    
b = np.random.randn(4) # Gaussian
b









    Out[55]:





array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 ])



In [54]:

    
np.random.seed(1234) # Setting the random seed

Basic data types

You may have noticed that, in some instances, array elements are displayed with a trailing dot (e.g. 2. vs 2).

This is due to a difference in the data-type used:



In [62]:

    
a = np.array([1, 2, 3]).astype(float)
a[0] = 2121.2
a









    Out[62]:





array([  2.12120000e+03,   2.00000000e+00,   3.00000000e+00])



In [57]:

    
b = np.array([1., 2., 3.])
b.dtype









    Out[57]:





dtype('float64')

Different data-types allow us to store data more compactly in memory, but most of the time we simply work with floating point numbers.

Note that, in the example above, NumPy auto-detects the data-type from the input.

You can explicitly specify which data-type you want:



In [ ]:

    
c = np.array([1, 2, 3], dtype=int)
c.dtype

The default data type is floating point:



In [ ]:

    
a = np.ones((3, 3))
a.dtype

There are also other types:

Complex



In [ ]:

    
d = np.array([1+2j, 3+4j, 5+6*1j])
d.dtype

Bool



In [ ]:

    
e = np.array([True, False, False, True])
e.dtype

Strings



In [ ]:

    
f = np.array(['a','happy','world'])
f.dtype # <--- strings containing max. 5 letters



In [ ]:

    
f[:] = 'abcdefg'
f

Much more, ... int32/int64 ...

Indexing and slicing

The items of an array can be accessed and assigned to the same way as other Python sequences (e.g. lists)



In [63]:

    
a = np.arange(10)
a









    Out[63]:





array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])



In [66]:

    
a[-4]









    Out[66]:





6

Warning: Indices begin at 0, like other Python sequences (and C/C++).

In contrast, in Fortran or Matlab, indices begin at 1.

For multidimensional arrays, indices are tuples of integers:



In [67]:

    
a = np.diag(np.arange(3))
a









    Out[67]:





array([[0, 0, 0],
       [0, 1, 0],
       [0, 0, 2]])



In [68]:

    
a[1, 1]









    Out[68]:





1



In [69]:

    
a[2, 1] = 10 # third line, second column
a









    Out[69]:





array([[ 0,  0,  0],
       [ 0,  1,  0],
       [ 0, 10,  2]])



In [70]:

    
a[1]









    Out[70]:





array([0, 1, 0])

Note that:

In 2D, the first dimension corresponds to rows, the second to columns.
for multidimensional a, a[0] is interpreted by taking all elements in the unspecified dimensions.

Slicing

Arrays, like other Python sequences can also be sliced:



In [71]:

    
a = np.arange(10)
a









    Out[71]:





array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])



In [72]:

    
a[2:9:3] # [start:end:step]









    Out[72]:





array([2, 5, 8])

Note that:

last index is not included!



In [73]:

    
a[:4]









    Out[73]:





array([0, 1, 2, 3])

All three slice components are optional
by default, start is 0, end is the last and step is 1:



In [ ]:

    
a[1:3]



In [74]:

    
a[::2]









    Out[74]:





array([0, 2, 4, 6, 8])



In [ ]:

    
a[3:]

Copies and views

A slicing operation creates a view on the original array.
This is just a way of accessing array data. Thus the original array is not copied in memory.
When modifying the view, the original array is modified as well:



In [75]:

    
a = np.arange(10)
a









    Out[75]:





array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])



In [76]:

    
b = a[::2]
b









    Out[76]:





array([0, 2, 4, 6, 8])



In [77]:

    
b[0] = 12
b









    Out[77]:





array([12,  2,  4,  6,  8])



In [83]:

    
x = [1,2,3,4]
b = x[0:2]
b[0]=100000
x









    Out[83]:





[1, 2, 3, 4]



In [79]:

    
a = np.arange(10)
b = a[::2].copy() # force a copy
b[0] = 12
a









    Out[79]:





array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

This behavior can be surprising at first sight...
but it allows to save both memory and time.

Warning: The transpose is a view

As a result, a matrix cannot be made symmetric in-place:



In [86]:

    
a = np.ones((100, 100))
a += a.T
a









    Out[86]:





array([[ 2.,  2.,  2., ...,  2.,  2.,  2.],
       [ 2.,  2.,  2., ...,  2.,  2.,  2.],
       [ 2.,  2.,  2., ...,  2.,  2.,  2.],
       ..., 
       [ 3.,  3.,  3., ...,  2.,  2.,  2.],
       [ 3.,  3.,  3., ...,  2.,  2.,  2.],
       [ 3.,  3.,  3., ...,  2.,  2.,  2.]])



In [87]:

    
a = np.ones((100,100))
a += a.copy().T
a









    Out[87]:





array([[ 2.,  2.,  2., ...,  2.,  2.,  2.],
       [ 2.,  2.,  2., ...,  2.,  2.,  2.],
       [ 2.,  2.,  2., ...,  2.,  2.,  2.],
       ..., 
       [ 2.,  2.,  2., ...,  2.,  2.,  2.],
       [ 2.,  2.,  2., ...,  2.,  2.,  2.],
       [ 2.,  2.,  2., ...,  2.,  2.,  2.]])

A worked Exercise: Prime number sieve

Compute prime numbers in 0–99, with a "sieve"

Construct a shape (100,) boolean array is_prime, filled with True in the beginning
Cross out 0 and 1 which are not primes
For each integer j starting from 2, cross out its higher multiples

Construct a shape (100,) boolean array is_prime, filled with True in the beginning:



In [ ]:

Cross out 0 and 1 which are not primes:



In [ ]:

For each integer j starting from 2, cross out its higher multiples:



In [ ]:

Skim through help(np.nonzero), and print the prime numbers
Follow-up:
- Move the above code into a script file named prime_sieve.py
- Run it to check it works



In [ ]:

    
help(np.nonzero)



In [ ]:

Convert the simple sieve to the sieve of Eratosthenes:
1. Skip j which are already known to not be primes
2. The first number to cross out is j^2



In [ ]:

Adding Axes

Indexing with the np.newaxis object allows us to add an axis to an array:



In [88]:

    
z = np.array([1, 2, 3])
z









    Out[88]:





array([1, 2, 3])



In [89]:

    
z[:, np.newaxis]









    Out[89]:





array([[1],
       [2],
       [3]])



In [90]:

    
z[np.newaxis, :]









    Out[90]:





array([[1, 2, 3]])

Fancy indexing

Numpy arrays can be indexed with slices
but also with boolean or integer arrays (masks)
This method is called fancy indexing.
It creates copies not views.

Using boolean masks



In [96]:

    
np.random.seed(3)
a = np.random.random_integers(0, 20, 15)
a









    Out[96]:





array([10,  3,  8,  0, 19, 10, 11,  9, 10,  6,  0, 20, 12,  7, 14])



In [92]:

    
(a % 3 == 0)









    Out[92]:





array([False,  True, False,  True, False, False, False,  True, False,
        True,  True, False,  True, False, False], dtype=bool)



In [97]:

    
mask = (a % 3 == 0)
extract_from_a = a[mask] # or, a[a%3==0]
extract_from_a # extract a sub-array with the mask
extract_from_a[0] = 9999999
a









    Out[97]:





array([10,  3,  8,  0, 19, 10, 11,  9, 10,  6,  0, 20, 12,  7, 14])

Indexing with a mask can be very useful to assign a new value to a sub-array:



In [98]:

    
a[a % 3 == 0] = -1
a









    Out[98]:





array([10, -1,  8, -1, 19, 10, 11, -1, 10, -1, -1, 20, -1,  7, 14])

Indexing with an array of integers



In [102]:

    
a = np.arange(2,12)
a









    Out[102]:





array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

Indexing can be done with an array of integers, where the same index is repeated several time:



In [103]:

    
a[[2, 3, 2, 4, 2]] # note: [2, 3, 2, 4, 2] is a Python list









    Out[103]:





array([4, 5, 4, 6, 4])

New values can be assigned with this kind of indexing:



In [104]:

    
a[[9, 7]] = -10
a









    Out[104]:





array([  2,   3,   4,   5,   6,   7,   8, -10,  10, -10])

When a new array is created by indexing with an array of integers, the new array has the same shape as the array of integers:



In [ ]:

    
a = np.arange(10)
idx = np.array([[3, 4], [9, 7]])
a[idx]

Consider the following examples of fancy indexing:



In [ ]:

    
a = np.arange(36).reshape(6,6)
a



In [ ]:

    
a[(0,1,2,3,4),(1,2,3,4,5)]



In [ ]:

    
a[:3,[0,2,5]]



In [ ]:

    
mask = np.array([1,0,1,0,0,1],dtype=bool)
a[mask,2]

We can even use fancy indexing and broadcasting (later) at the same time:



In [ ]:

    
a = np.arange(12).reshape(3,4)
a



In [ ]:

    
i = np.array([[0, 1], [1, 2]])
a[i, 2] # same as a[i, 2*np.ones((2, 2), dtype=int)]

Numerical operations on arrays

Most of numpy and scipy's mathematical operations are vectorized. Applying these operations to arrays means they are applied elementwise. However, similar to MATLAB, internally the elementwise operations are implemented in C and C++. Thus vectorized computations are much faster than if you would simply loop through each element yourself and apply your operation manually.

So alway try think vectorized instead of manually looping through arrays.

Elementwise operations

With scalars:



In [105]:

    
a = np.array([1, 2, 3, 4])
a + 1









    Out[105]:





array([2, 3, 4, 5])



In [106]:

    
2**a









    Out[106]:





array([ 2,  4,  8, 16])



In [107]:

    
"""Speed up of vectorization"""
def twice_each_element(a):
    b=np.zeros(a.shape)
    
    for idx, elem in enumerate(a):
        b[idx] = 2*elem
    
    return b

a = np.arange(100)

%timeit twice_each_element(a)
%timeit 2*a









    



10000 loops, best of 3: 35.3 µs per loop
1000000 loops, best of 3: 944 ns per loop

All arithmetic operates elementwise:



In [109]:

    
a = np.array([1, 2, 3, 4])
b = np.ones(4) + 1
print a
print b
a - b









    



[1 2 3 4]
[ 2.  2.  2.  2.]






    Out[109]:





array([-1.,  0.,  1.,  2.])



In [112]:

    
a*b









    Out[112]:





array([ 2.,  4.,  6.,  8.])



In [113]:

    
j = np.arange(5)
2**(j + 1) - j









    Out[113]:





array([ 2,  3,  6, 13, 28])

Warning:

Array multiplication is not matrix multiplication:



In [114]:

    
c = np.ones((3, 3))
c * c # NOT matrix multiplication!









    Out[114]:





array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

Note: Matrix multiplication:



In [115]:

    
c.dot(c)









    Out[115]:





array([[ 3.,  3.,  3.],
       [ 3.,  3.,  3.],
       [ 3.,  3.,  3.]])

Comparisons:



In [116]:

    
a = np.array([1, 2, 3, 4])
b = np.array([4, 2, 2, 4])
a == b









    Out[116]:





array([False,  True, False,  True], dtype=bool)



In [117]:

    
a > b









    Out[117]:





array([False, False,  True, False], dtype=bool)

Logical operations:



In [118]:

    
a = np.array([1, 1, 0, 0], dtype=bool)
b = np.array([1, 0, 1, 0], dtype=bool)
np.logical_or(a, b)









    Out[118]:





array([ True,  True,  True, False], dtype=bool)



In [ ]:

    
np.logical_and(a, b)

Shape mismatches:



In [119]:

    
a = np.arange(4)
a









    Out[119]:





array([0, 1, 2, 3])



In [120]:

    
a + np.array([1, 2])









    



---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-120-033edaa49cc8> in <module>()
----> 1 a + np.array([1, 2])

ValueError: operands could not be broadcast together with shapes (4,) (2,)

‘Broadcast’? We’ll return to that later.

Transposition:



In [121]:

    
a = np.triu(np.ones((3, 3)), 1) # see help(np.triu)
a









    Out[121]:





array([[ 0.,  1.,  1.],
       [ 0.,  0.,  1.],
       [ 0.,  0.,  0.]])



In [122]:

    
a.T









    Out[122]:





array([[ 0.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  1.,  0.]])

Note: Linear algebra

The sub-module np.linalg implements basic linear algebra
- solving linear systems
- singular value decomposition
- etc
However,
- it is not guaranteed to be compiled using efficient routines
- thus the use of scipy.linalg is recommended...

Basic reductions

Computing sums:



In [123]:

    
x = np.array([1, 2, 3, 4])
np.sum(x)









    Out[123]:





10



In [ ]:

    
x.sum()

Sum by rows and by columns:



In [124]:

    
x = np.array([[1, 1], [2, 2]])
x









    Out[124]:





array([[1, 1],
       [2, 2]])



In [125]:

    
x.sum(axis=0) # columns (first dimension)









    Out[125]:





array([3, 3])



In [126]:

    
x[:, 0].sum(), x[:, 1].sum()









    Out[126]:





(3, 3)



In [ ]:

    
x.sum(axis=1) # rows (second dimension)



In [ ]:

    
x[0, :].sum(), x[1, :].sum()

Same idea in higher dimensions:



In [ ]:

    
x = np.random.rand(2, 2, 2)
x.sum(axis=2)[0, 1]



In [ ]:

    
x[0, 1, :].sum()

Other reductions— works the same way (and take axis=)

Statistics:



In [ ]:

    
x = np.array([1, 2, 3, 1])
y = np.array([[1, 2, 3], [5, 6, 1]])
x.mean()



In [ ]:

    
np.median(x)



In [ ]:

    
np.median(y, axis=-1) # last axis



In [ ]:

    
x.std() # full population standard dev.

Extrema:



In [ ]:

    
x = np.array([1, 3, 2])
x.min()



In [ ]:

    
x.max()



In [ ]:

    
x.argmin() # index of minimum



In [ ]:

    
x.argmax() # index of maximum

Logical operations:



In [ ]:

    
np.all([True, True, False])



In [ ]:

    
np.any([True, True, False])

Note: Can be used for array comparisons:



In [ ]:

    
a = np.zeros((100, 100))
np.any(a != 0)



In [ ]:

    
np.all(a == a)



In [ ]:

    
a = np.array([1, 2, 3, 2])
b = np.array([2, 2, 3, 2])
c = np.array([6, 4, 4, 5])
((a <= b) & (b <= c)).all()

... and many more (best to learn as you go).

Broadcasting

Basic operations on numpy arrays (addition, etc.) are elementwise
This works on arrays of the same size.

Nevertheless

It’s also possible to do operations on arrays of different sizes
if Numpy can transform these arrays so that they all have the same size
this conversion is called broadcasting.

Consider the following example of broadcasting:



In [134]:

    
a = np.tile(np.arange(0, 40, 10), (3, 1)).T
a









    Out[134]:





array([[ 0,  0,  0],
       [10, 10, 10],
       [20, 20, 20],
       [30, 30, 30]])



In [131]:

    
b = np.array([0, 1, 2])
b









    Out[131]:





array([0, 1, 2])



In [135]:

    
a+b









    Out[135]:





array([[ 0,  1,  2],
       [10, 11, 12],
       [20, 21, 22],
       [30, 31, 32]])

A useful trick:



In [136]:

    
a = np.arange(0, 40, 10)
a.shape









    Out[136]:





(4,)



In [137]:

    
a = a[:, np.newaxis] # adds a new axis -> 2D array
a.shape









    Out[137]:





(4, 1)



In [140]:

    
b









    Out[140]:





array([0, 1, 2])



In [141]:

    
a









    Out[141]:





array([[ 0],
       [10],
       [20],
       [30]])



In [142]:

    
a + b









    Out[142]:





array([[ 0,  1,  2],
       [10, 11, 12],
       [20, 21, 22],
       [30, 31, 32]])

We have already used broadcasting without knowing it!:



In [ ]:

    
a = np.ones((4, 5))
a[0] = 2 # we assign an array of dimension 0 to an array of dimension 1
a

Broadcasting seems a bit magical
but it is actually quite natural to use it
e.g., when we want to solve a problem whose output data is an array with more dimensions than input data.

Array shape manipulation

Flattening



In [ ]:

    
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
a.ravel()



In [ ]:

    
a.T.ravel()

Higher dimensions: last dimensions ravel out “first”.

Reshaping

The inverse operation to flattening:



In [ ]:

    
a.shape



In [ ]:

    
b = a.ravel()
b.reshape((2, 3))

Creating an array with a different shape, from another array:



In [ ]:

    
a = np.arange(36)
b = a.reshape((6, 6))
b



In [ ]:

    
b = a.reshape((6, -1)) # unspecified (-1) value is inferred
b

Views and copies

ndarray.reshape may return a view (cf help(np.reshape))), not a copy:



In [ ]:

    
b[0, 0] = 99
a

Beware: reshape may also return a copy!



In [ ]:

    
a = np.zeros((3, 2))
b = a.T.reshape(3*2)
b[0] = 9
a

To understand, one would need to take a deeper look into the memory layout of an array.

Dimension shuffling



In [ ]:

    
a = np.arange(4*3*2).reshape(4, 3, 2)
a.shape



In [ ]:

    
a[0, 2, 1]



In [ ]:

    
b = a.transpose(1, 2, 0)
b.shape



In [ ]:

    
b[2, 1, 0]

Also creates a view:



In [ ]:

    
b[2, 1, 0] = -1
a[0, 2, 1]

Resizing

Size of an array can be changed with ndarray.resize:



In [ ]:

    
a = np.arange(4)
a.resize((8,))
a

However, it must not be referred to somewhere else:



In [ ]:

    
b = a
a.resize((4,))

Sorting data

Sorting along an axis:



In [ ]:

    
a = np.array([[4, 3, 5], [1, 2, 1]])
b = np.sort(a, axis=1)
b

Note: Sorts each row separately!
In-place sort:



In [ ]:

    
a.sort(axis=1)
a

Sorting with fancy indexing:



In [ ]:

    
a = np.array([4, 3, 1, 2])
j = np.argsort(a)
j



In [ ]:

    
a[j]

Finding minima and maxima:



In [ ]:

    
a = np.array([4, 3, 1, 2])
j_max = np.argmax(a)
j_min = np.argmin(a)
j_max, j_min

Basic Visualization

(See Ch. 3, "Python Scientific lecture notes", Release 2012.3 (EuroScipy 2012)

by the EuroScipy tutorial team

Editors: Valentin Haenel, Emmanuelle Gouillart, Gaël Varoquaux

http://scipy-lectures.github.com)

You may also need to visualize your data arrays.

Matplotlib is a Python plotting library. Its capabilities and interface are quite similar to the ones offered by Matlab. It can be used to display the results of your calculations, create publication quality gures and can be embedded in graphical user interfaces.

Matplotlib is a 2D plotting package.
We can import its functions as below:



In [ ]:

    
import matplotlib.pyplot as plt # the tidy way
import numpy as np

1D plotting



In [ ]:

    
x = np.linspace(0, 3, 20)
y = np.linspace(0, 9, 20)
plt.figure()
plt.plot(x, y) # line plot
plt.plot(x, y,'o') # dot plot
plt.show() # <-- shows the plot (not needed with Ipython)

Plotting into the notebook



In [ ]:

    
# Add the following line to make plots *appear* within the iypthon notebook
%matplotlib inline

Labels



In [ ]:

    
x = np.linspace(0, 3, 20)
y = np.linspace(0, 9, 20)

plt.figure()
plt.plot(x, y) # line plot
plt.plot(x, y,'o') # dot plot

plt.xlabel('input')
plt.ylabel('output')
plt.title('Input-Output-Relationship')

2D plotting

(such as images)



In [ ]:

    
image = np.random.rand(30, 30)
plt.imshow(image, cmap=plt.cm.gray)
plt.colorbar()

Subplots



In [ ]:

    
x = np.linspace(0, 3, 20)
y = np.linspace(0, 9, 20)
image = np.random.rand(30, 30)

plt.figure()

plt.subplot(1,2,1)
plt.plot(x, y) # line plot
plt.plot(x, y,'o') # dot plot

plt.subplot(1,2,2)
plt.imshow(image, cmap=plt.cm.gray)
plt.colorbar()

An example of broadcasting in grid-based computations

A lot of grid-based or network-based problems can also use broadcasting.
For instance, if we want to compute the distance from the origin of points on a 5 x 5 grid, we can do:



In [ ]:

    
x, y = np.arange(5), np.arange(5)
distance = np.sqrt(x ** 2 + y[:, np.newaxis] ** 2)
distance



In [ ]:

    
plt.pcolor(distance)
plt.colorbar() 
plt.axis('equal')

or for a set of more general distances...



In [ ]:

    
Mileposts = np.random.random_integers(1,50,10)
Mileposts[0] = 0
Mileposts



In [ ]:

    
DistAlongRoad = np.cumsum(Mileposts)
DistAlongRoad



In [ ]:

    
PtPtDist = np.abs(DistAlongRoad - DistAlongRoad[:, np.newaxis])
PtPtDist



In [ ]:

    
plt.pcolor(PtPtDist) 
plt.colorbar() 
plt.axis('equal')



In [ ]:



In [ ]: