In [1]:

    
name = '2016-02-19-numpy-arrays-basics'
title = 'Back to basics: NumPy arrays'
tags = 'numpy'
author = 'Denis Sergeev'



In [2]:

    
from nb_tools import connect_notebook_to_post
from IPython.core.display import HTML

html = connect_notebook_to_post(name, title, tags, author)

Today we refreshed our knowledge of python numerical arrays and their basic features.

Basic data containers in Python include

high-level number objects: integer, floating point, complex, etc.
containers: lists (costless insertion and append), dictionaries (fast lookup)

Why not to use lists?



In [3]:

    
from IPython.display import Image
Image('https://mlpforums.com/uploads/monthly_01_2014/post-6104-0-95171600-1390631637.jpg', width=200, height=200)









    Out[3]:

Lists in Python are quite general, can have arbitrary objects as elements.
Addition and scalar multiplication are defined for lists, but not what we want for numerical computation, e.g.

Addition results in concatenation



In [4]:

    
x = [1, 2, 3]
y = [10, 20, 30]
x + y









    Out[4]:





[1, 2, 3, 10, 20, 30]

And multiplication results in repeating:



In [5]:

    
x = [2, 3]
x * 3









    Out[5]:





[2, 3, 2, 3, 2, 3]

NumPy, show me what you got!

Aside: import conventions



In [6]:

    
import numpy as np

NumPy arrays

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of positive integers. In Numpy dimensions are called axes. The number of axes is rank.

Why it is useful: Memory-efficient container that provides fast numerical operations.

Let's compare it to list operations



In [7]:

    
l = list(range(1000))



In [8]:

    
%timeit [i**2 for i in l]









    



1000 loops, best of 3: 284 µs per loop



In [9]:

    
a = np.arange(1000)



In [10]:

    
%timeit a**2









    



The slowest run took 194.61 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.67 µs per loop

Creating arrays

Manually



In [11]:

    
a = np.array([3, 4, 5, 6])



In [12]:

    
a









    Out[12]:





array([3, 4, 5, 6])

The class is called



In [13]:

    
type(a)









    Out[13]:





numpy.ndarray

Scalar array



In [14]:

    
a0 = np.array(7)



In [15]:

    
a0.ndim









    Out[15]:





0



In [16]:

    
b = np.array([[10, 20, 30], [9, 8, 7]])



In [17]:

    
b









    Out[17]:





array([[10, 20, 30],
       [ 9,  8,  7]])



In [18]:

    
c = np.array([[[1], [2]], [[3], [4]]])



In [19]:

    
c.shape









    Out[19]:





(2, 2, 1)

Common mistakes



In [20]:

    
try:
    a = np.array(1,2,3,4) # WRONG, throws ValueError
except ValueError as e:
    print(e)









    



only 2 non-keyword arguments accepted



In [21]:

    
a = np.array([1,2,3,4]) # RIGHT

Do not use np.ndarray function to create an array



In [22]:

    
np.ndarray([1,2,3,4])









    Out[22]:





array([[[[  6.90888959e-310,   6.90888959e-310,   6.90889049e-310,
            6.90888014e-310],
         [  6.90888923e-310,   6.90886687e-310,   6.90889049e-310,
            6.90886687e-310],
         [  6.90888923e-310,   6.90886687e-310,   6.90889049e-310,
            6.90886687e-310]],

        [[  6.90888923e-310,   6.90886687e-310,   6.90889049e-310,
            6.90886687e-310],
         [  6.90888923e-310,   6.90886687e-310,   6.90889049e-310,
            6.90886687e-310],
         [  6.90888923e-310,   6.90886687e-310,   6.90889049e-310,
            6.90886687e-310]]]])

Functions for creating arrays

evenly spaced



In [23]:

    
np.arange(1, 9, 2) # start, end (exclusive), step









    Out[23]:





array([1, 3, 5, 7])

by a number of points



In [24]:

    
np.linspace(0, 1, 6)   # start, end, num-points









    Out[24]:





array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ])



In [25]:

    
np.logspace(-3,2,7)









    Out[25]:





array([  1.00000000e-03,   6.81292069e-03,   4.64158883e-02,
         3.16227766e-01,   2.15443469e+00,   1.46779927e+01,
         1.00000000e+02])

filled with specific number

Zeros



In [26]:

    
np.zeros((2, 3))









    Out[26]:





array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

Ones



In [27]:

    
np.ones((3, 2))









    Out[27]:





array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

Empty



In [28]:

    
np.empty([2,3])









    Out[28]:





array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

The function empty creates an array whose initial content is random and depends on the state of the memory. By default, the dtype of the created array is float64.

Random numbers



In [29]:

    
np.random.seed(1234)



In [30]:

    
np.random.rand(4)       # uniform in [0, 1]









    Out[30]:





array([ 0.19151945,  0.62210877,  0.43772774,  0.78535858])



In [31]:

    
np.random.randn(4)      # Gaussian









    Out[31]:





array([-0.72058873,  0.88716294,  0.85958841, -0.6365235 ])

Special cases



In [32]:

    
np.eye(3)









    Out[32]:





array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])



In [33]:

    
np.diag(np.array([1, 2, 3, 4]))









    Out[33]:





array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])

Array attributes



In [34]:

    
b









    Out[34]:





array([[10, 20, 30],
       [ 9,  8,  7]])

ndarray.ndim

the number of axes (dimensions) of the array. In the Python world, the number of dimensions is referred to as rank.



In [35]:

    
b.ndim









    Out[35]:





2

ndarray.shape

the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the rank, or number of dimensions, ndim.



In [36]:

    
b.shape









    Out[36]:





(2, 3)

ndarray.size

the total number of elements of the array. This is equal to the product of the elements of shape.



In [37]:

    
b.size









    Out[37]:





6

Note that size is not equal to len(). The latter returns the length of the first dimension.



In [38]:

    
len(b)









    Out[38]:





2

ndarray.dtype

an object describing the type of the elements in the array. One can create or specify dtype’s using standard Python types. Additionally NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are some examples.



In [39]:

    
b.dtype









    Out[39]:





dtype('int64')

ndarray.itemsize

the size in bytes of each element of the array.



In [40]:

    
a.itemsize









    Out[40]:





8

ndarray.data

the buffer containing the actual elements of the array. Normally, we won’t need to use this attribute because we will access the elements in an array using indexing facilities.



In [41]:

    
b.data









    Out[41]:





<memory at 0x7f2e580c7708>

Copies and mutability

A common source for confusion and mistakes is that numpy arrays are mutable, meaning they can be changed after creation. Other common mutable objects are lists and dictionaries, while tuples are inmutable.

It also implies that when operating and manipulating arrays, their data is sometimes copied into a new array and sometimes not.

Example

Create an array



In [42]:

    
a = np.arange(12)
a









    Out[42]:





array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

Change its 6th element



In [43]:

    
a[5] = 1000



In [44]:

    
a









    Out[44]:





array([   0,    1,    2,    3,    4, 1000,    6,    7,    8,    9,   10,
         11])

Simple assignment == no copy at all



In [45]:

    
a = np.arange(6)



In [46]:

    
b = a

a and b are two names for the same ndarray object



In [47]:

    
b is a









    Out[47]:





True

Change the shape and the 3rd element of b



In [48]:

    
b[2] = 999
b.shape = 3, 2

b









    Out[48]:





array([[  0,   1],
       [999,   3],
       [  4,   5]])

But a is also changed!



In [49]:

    
a









    Out[49]:





array([[  0,   1],
       [999,   3],
       [  4,   5]])

Python passes mutable objects as references, so function calls make no copy neither.

View == shallow copy



In [50]:

    
a = np.arange(6)



In [51]:

    
c = a.view()



In [52]:

    
c is a









    Out[52]:





False

c is a view of the data owned by a



In [53]:

    
c.base is a









    Out[53]:





True



In [54]:

    
c[2] = 999
a # a's data changes









    Out[54]:





array([  0,   1, 999,   3,   4,   5])



In [55]:

    
c.shape = 2,3
a.shape # a's shape doesn't change









    Out[55]:





(6,)

Slicing an array returns a view of it:



In [56]:

    
s = a[2:5]
s









    Out[56]:





array([999,   3,   4])



In [57]:

    
s[:] = 10 # s[:] is a view of s. Note the difference between s=10 and s[:]=10



In [58]:

    
a









    Out[58]:





array([ 0,  1, 10, 10, 10,  5])

How to avoid confusement? Use deep copy.



In [59]:

    
a = np.arange(6)

Create a copy using copy() attribute



In [60]:

    
d = a.copy()

The copy doesn't share anything with the original array:



In [61]:

    
d is a









    Out[61]:





False



In [62]:

    
d.base is a









    Out[62]:





False



In [63]:

    
d[-1] = 1000



In [64]:

    
d









    Out[64]:





array([   0,    1,    2,    3,    4, 1000])



In [65]:

    
a









    Out[65]:





array([0, 1, 2, 3, 4, 5])

Looking for help

Interactive help



In [66]:

    
np.rollaxis??



In [67]:

    
np.*space*?

with NumPy: a built-in search engine



In [68]:

    
np.lookfor('create array')









    



Search results for 'create array'
---------------------------------
numpy.array
    Create an array.
numpy.memmap
    Create a memory-map to an array stored in a *binary* file on disk.
numpy.diagflat
    Create a two-dimensional array with the flattened input as a diagonal.
numpy.fromiter
    Create a new 1-dimensional array from an iterable object.
numpy.partition
    Return a partitioned copy of an array.
numpy.ma.diagflat
    Create a two-dimensional array with the flattened input as a diagonal.
numpy.ctypeslib.as_array
    Create a numpy array from a ctypes array or a ctypes POINTER.
numpy.ma.make_mask
    Create a boolean mask from an array.
numpy.ctypeslib.as_ctypes
    Create and return a ctypes object from a numpy array.  Actually
numpy.ma.mrecords.fromarrays
    Creates a mrecarray from a (flat) list of masked arrays.
numpy.ma.mvoid.__new__
    Create a new masked array from scratch.
numpy.lib.format.open_memmap
    Open a .npy file as a memory-mapped array.
numpy.ma.MaskedArray.__new__
    Create a new masked array from scratch.
numpy.lib.arrayterator.Arrayterator
    Buffered iterator for big arrays.
numpy.ma.mrecords.fromtextfile
    Creates a mrecarray from data stored in the file `filename`.
numpy.asarray
    Convert the input to an array.
numpy.ndarray
    ndarray(shape, dtype=float, buffer=None, offset=0,
numpy.recarray
    Construct an ndarray that allows field access using attributes.
numpy.chararray
    chararray(shape, itemsize=1, unicode=False, buffer=None, offset=0,
numpy.pad
    Pads an array.
numpy.sum
    Sum of array elements over a given axis.
numpy.asanyarray
    Convert the input to an ndarray, but pass ndarray subclasses through.
numpy.copy
    Return an array copy of the given object.
numpy.diag
    Extract a diagonal or construct a diagonal array.
numpy.load
    Load arrays or pickled objects from ``.npy``, ``.npz`` or pickled files.
numpy.sort
    Return a sorted copy of an array.
numpy.array_equiv
    Returns True if input arrays are shape consistent and all elements equal.
numpy.dtype
    Create a data type object.
numpy.choose
    Construct an array from an index array and a set of arrays to choose from.
numpy.nditer
    Efficient multi-dimensional iterator object to iterate over arrays.
numpy.swapaxes
    Interchange two axes of an array.
numpy.ma.mrecords.MaskedRecords.__new__
    Create a new masked array from scratch.
numpy.full_like
    Return a full array with the same shape and type as a given array.
numpy.ones_like
    Return an array of ones with the same shape and type as a given array.
numpy.empty_like
    Return a new array with the same shape and type as a given array.
numpy.zeros_like
    Return an array of zeros with the same shape and type as a given array.
numpy.asarray_chkfinite
    Convert the input to an array, checking for NaNs or Infs.
numpy.diag_indices
    Return the indices to access the main diagonal of an array.
numpy.ma.choose
    Use an index array to construct a new array from a set of choices.
numpy.chararray.tolist
    a.tolist()
numpy.matlib.rand
    Return a matrix of random values with given shape.
numpy.savez_compressed
    Save several arrays into a single file in compressed ``.npz`` format.
numpy.ma.empty_like
    Return a new array with the same shape and type as a given array.
numpy.ma.make_mask_none
    Return a boolean mask of the given shape, filled with False.
numpy.ma.mrecords.fromrecords
    Creates a MaskedRecords from a list of records.
numpy.around
    Evenly round to the given number of decimals.
numpy.source
    Print or write to a file the source code for a Numpy object.
numpy.diagonal
    Return specified diagonals.
numpy.histogram2d
    Compute the bi-dimensional histogram of two data samples.
numpy.fft.ifft
    Compute the one-dimensional inverse discrete Fourier Transform.
numpy.fft.ifftn
    Compute the N-dimensional inverse discrete Fourier Transform.
numpy.busdaycalendar
    A business day calendar object that efficiently stores information

References



In [69]:

    
HTML(html)









    Out[69]:





    
     This post was written as an IPython (Jupyter) notebook. You can view or download it using
    nbviewer.