In [1]:
name = '2017-09-25-numpy-intro'
title = 'Intro to NumPy'
tags = 'numpy, basics'
author = 'Denis Sergeev'

In [2]:
from nb_tools import connect_notebook_to_post
from IPython.core.display import HTML

html = connect_notebook_to_post(name, title, tags, author)

Python's basic built-in data types include

  • high-level number objects: integer, floating point, complex, etc.
  • containers: lists (cheap appends), dictionaries (fast lookup)

Lists

  • One-dimensional
  • Can contain items of different types
  • Mutable, i.e. items can be added, dropped, or replaced
  • Similar to MATLAB's cell arrays

In [3]:
my_collection = [1, 4, 6, 10]

In [4]:
my_collection.append(100000)

In [5]:
my_collection.remove(1)

In [6]:
my_collection[1] = 'abcdef'

In [7]:
my_collection


Out[7]:
[4, 'abcdef', 10, 100000]

Zero-based indexing


In [8]:
a = [10, 20, 30, 40, 50, 60, 70]

In [9]:
low, high = 2, 4

In [10]:
a[:low]


Out[10]:
[10, 20]

In [11]:
a[low:high]


Out[11]:
[30, 40]

In [12]:
a[high:]


Out[12]:
[50, 60, 70]

Slicing works with any sequence that supports indexing, including strings:


In [13]:
s = 'qwerty'

In [14]:
s[1:-1]


Out[14]:
'wert'

Another example of why zero-based indexing is convenient

Given a 2D image, img, stored in row-major order, we want to find the linear position in the array of the element at position (x, y). Using zero-based indexing, that linear position is simply img[y * width + x], whereas with one-based indexing it is img((y - 1) * width + x). Now there is a -1 in there!
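The formula above can be checked with a small sketch. The 4-by-3 image and the helper name linear_index are made up for illustration; the image is stored as a flat list whose values equal their own positions, so the lookup is easy to verify.

```python
width, height = 4, 3

# Flat, row-major storage: each row of `width` pixels follows the previous one
img = list(range(width * height))


def linear_index(x, y, width):
    """Zero-based position of pixel (x, y) in row-major storage."""
    return y * width + x


x, y = 2, 1
idx = linear_index(x, y, width)
print(idx)       # 6
print(img[idx])  # 6
```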

Why not use lists?

  • Lists in Python are quite general and can hold arbitrary objects as elements.

  • Addition and scalar multiplication are defined for lists, but they do not do what we want for numerical computation.

Addition results in concatenation:


In [15]:
x = [1, 2, 3]
y = [10, 20, 30]
x + y


Out[15]:
[1, 2, 3, 10, 20, 30]

And multiplication results in repeating:


In [16]:
x = [2, 3]
x * 3


Out[16]:
[2, 3, 2, 3, 2, 3]
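For contrast, here is a minimal sketch of the same two operations on NumPy arrays, where they act element by element. The variable names x_list and y_list are only for illustration.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

print(x + y)   # [11 22 33]
print(x * 3)   # [3 6 9]

# With plain lists, element-wise addition needs an explicit loop or comprehension
x_list, y_list = [1, 2, 3], [10, 20, 30]
print([p + q for p, q in zip(x_list, y_list)])  # [11, 22, 33]
```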

Enter NumPy arrays

Aside: import conventions


In [17]:
import numpy as np

NumPy arrays

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes. The number of axes is the rank, available as the ndim attribute.
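A small sketch of these terms, using a made-up 2-by-3 array m. The exact integer dtype depends on the platform, so it is not shown as expected output.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.ndim)    # 2  (number of axes)
print(m.shape)   # (2, 3)  (length of each axis)
print(m.dtype)   # an integer type; the exact width is platform-dependent
print(m[1, 2])   # 6  (indexed by a tuple: row 1, column 2)

# Homogeneous: mixing an int and a float yields a single float dtype
mixed = np.array([1, 2.5])
print(mixed.dtype)  # float64
```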

Why it is useful: Memory-efficient container that provides fast numerical operations.

Let's compare its speed with the equivalent list operation:


In [18]:
l = list(range(1000))

In [19]:
%timeit [i**2 for i in l]


266 µs ± 1.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [20]:
a = np.arange(1000)

In [21]:
%timeit a**2


1.15 µs ± 26.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
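The %timeit magic only works inside IPython or Jupyter. Outside a notebook, a roughly equivalent comparison can be sketched with the standard-library timeit module. The repeat count n is an arbitrary choice, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

l = list(range(1000))
a = np.arange(1000)

n = 200  # number of repetitions (arbitrary)
t_list = timeit.timeit(lambda: [i**2 for i in l], number=n)
t_array = timeit.timeit(lambda: a**2, number=n)

print(f"list:  {t_list / n * 1e6:.2f} µs per loop")
print(f"array: {t_array / n * 1e6:.2f} µs per loop")

# Both approaches compute the same values
same = [i**2 for i in l] == (a**2).tolist()
print(same)  # True
```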

Creating arrays

Manually


In [22]:
a = np.array([3, 4, 5, 6])

In [23]:
a


Out[23]:
array([3, 4, 5, 6])

The array class is called ndarray:


In [24]:
type(a)


Out[24]:
numpy.ndarray

In [25]:
a.ndim


Out[25]:
1

Scalar array


In [26]:
a0 = np.array(7)

In [27]:
a0.ndim


Out[27]:
0

In [28]:
b = np.array([[10, 20, 30], [9, 8, 7]])

In [29]:
b


Out[29]:
array([[10, 20, 30],
       [ 9,  8,  7]])

In [30]:
c = np.array([[[1], [2]], [[3], [4]]])

In [31]:
c.shape


Out[31]:
(2, 2, 1)

In [32]:
c.max()


Out[32]:
4

Note that c.shape is the equivalent of size(c) in MATLAB.

Common mistakes


In [33]:
try:
    a = np.array(1, 2, 3, 4)  # WRONG: raises ValueError (TypeError in newer NumPy versions)
except (ValueError, TypeError) as e:
    print(e)


only 2 non-keyword arguments accepted

In [34]:
a = [1,2,3,4]

In [35]:
a = np.array(a) # RIGHT

In [36]:
b = a.copy()

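Do not call the np.ndarray class directly to create an array from data: its first argument is the shape, not the values.

np.ndarray([1,2,3,4])  # uninitialized array of shape (1, 2, 3, 4), not the values 1 to 4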
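A short sketch of the difference between the two calls. Only shapes and sizes are printed, because the contents of the array returned by np.ndarray are uninitialized.

```python
import numpy as np

a = np.array([1, 2, 3, 4])     # the values 1..4, shape (4,)
b = np.ndarray([1, 2, 3, 4])   # the list is read as a shape; contents are uninitialized

print(a.shape)  # (4,)
print(b.shape)  # (1, 2, 3, 4)
print(b.size)   # 24
```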

Functions for creating arrays

Evenly spaced


In [37]:
np.arange(1, 9, 2) # start, end (exclusive), step


Out[37]:
array([1, 3, 5, 7])

By number of points


In [38]:
np.linspace(0, 1, 6)   # start, end, num-points


Out[38]:
array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ])

In [39]:
np.logspace(-3, 2, 7)  # start exponent, end exponent, num-points (base 10)


Out[39]:
array([  1.00000000e-03,   6.81292069e-03,   4.64158883e-02,
         3.16227766e-01,   2.15443469e+00,   1.46779927e+01,
         1.00000000e+02])
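As a sanity check on how the two functions relate, the sketch below compares logspace with 10 raised to the linspace exponents. It also shows that linspace includes its end point while arange excludes it.

```python
import numpy as np

exponents = np.linspace(-3, 2, 7)
values = np.logspace(-3, 2, 7)

# logspace spaces the exponents evenly, then raises 10 to them
print(np.allclose(values, 10.0 ** exponents))  # True

# linspace includes the end point, arange does not
print(np.linspace(0, 1, 6)[-1] == 1.0)  # True
print(np.arange(1, 9, 2)[-1])           # 7
```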

Filled with a specific number

  • Zeros

In [40]:
np.zeros((2, 3))


Out[40]:
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

  • Ones

In [41]:
np.ones((3, 2))


Out[41]:
array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

  • Empty

In [42]:
np.empty([2,3])


Out[42]:
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

The function empty creates an array without initializing it, so its initial content is arbitrary and depends on whatever happened to be in that memory (in the output above it happens to be ones). By default, the dtype of the created array is float64.
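Because the contents of an empty array cannot be relied on, a typical pattern is to allocate it and then overwrite every element before reading. A minimal sketch, with np.full shown as an alternative when a single fill value is wanted:

```python
import numpy as np

buf = np.empty((2, 3))   # contents are arbitrary until assigned
print(buf.dtype)         # float64

buf[:] = 0.0             # overwrite every element before reading it
print(buf)

filled = np.full((2, 3), 7.5)   # allocate and fill in one step
print(filled)
```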

  • Random numbers

In [43]:
np.random.seed(1234)

In [44]:
np.random.rand(4)       # uniform in [0, 1)


Out[44]:
array([ 0.19151945,  0.62210877,  0.43772774,  0.78535858])

In [45]:
np.random.randn(4)      # standard Gaussian (mean 0, variance 1)


Out[45]:
array([-0.72058873,  0.88716294,  0.85958841, -0.6365235 ])
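Setting the seed makes the sequence reproducible: reseeding with the same value gives the same numbers again. The sketch below demonstrates that, and also shows the Generator interface available in NumPy 1.17 and later, which is not used elsewhere in this post.

```python
import numpy as np

np.random.seed(1234)
first = np.random.rand(4)

np.random.seed(1234)
second = np.random.rand(4)

print(np.array_equal(first, second))  # True

# Alternative in NumPy >= 1.17: a self-contained Generator object
rng = np.random.default_rng(1234)
u = rng.random(4)            # uniform in [0, 1)
g = rng.standard_normal(4)   # standard normal
```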

Special cases


In [46]:
np.eye(3)


Out[46]:
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

In [47]:
np.diag(np.array([1, 2, 3, 4]))


Out[47]:
array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])
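np.diag works in both directions, which can be surprising: a 1-D input builds a diagonal matrix, while a 2-D input returns its diagonal. A brief sketch:

```python
import numpy as np

d = np.diag([1, 2, 3])   # 1-D input -> 2-D array with the values on the diagonal
print(d.shape)           # (3, 3)
print(np.diag(d))        # [1 2 3]  (2-D input -> its diagonal)

# The identity matrix is a diagonal matrix of ones
print(np.array_equal(np.eye(3), np.diag(np.ones(3))))  # True
```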

Missing data


In [48]:
a = np.array([1, 2, 3, np.nan])

In [49]:
b = list(a)
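A brief sketch of how NaN behaves in an array like the one above: it forces a floating-point dtype, propagates through ordinary reductions, and can be detected with np.isnan or skipped with the nan-aware functions.

```python
import numpy as np

a = np.array([1, 2, 3, np.nan])

print(a.dtype)         # float64
print(np.isnan(a))     # [False False False  True]
print(a.mean())        # nan
print(np.nanmean(a))   # 2.0
```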

In [50]:
a = np.array([1, 2, 3])

In [51]:
b = np.ma.masked_less(a, 2)

In [52]:
b


Out[52]:
masked_array(data = [-- 2 3],
             mask = [ True False False],
       fill_value = 999999)

In [53]:
b.data


Out[53]:
array([1, 2, 3])

In [54]:
b.mask


Out[54]:
array([ True, False, False], dtype=bool)
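To see what the mask is for, here is a small sketch of how masked elements are skipped by reductions. The fill value -1 is an arbitrary choice for illustration.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.ma.masked_less(a, 2)   # mask every element smaller than 2

print(b.mean())      # 2.5  (only 2 and 3 contribute)
print(b.count())     # 2    (number of unmasked elements)
print(b.filled(-1))  # [-1  2  3]
```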

We will have a separate session on Masked Arrays in NumPy.

Looking for help

  • Interactive help

In [55]:
np.rollaxis??

In [56]:
np.*space*?

  • With NumPy: a built-in search engine

In [57]:
np.lookfor('create array')


Search results for 'create array'
---------------------------------
numpy.array
    Create an array.
numpy.memmap
    Create a memory-map to an array stored in a *binary* file on disk.
numpy.diagflat
    Create a two-dimensional array with the flattened input as a diagonal.
numpy.fromiter
    Create a new 1-dimensional array from an iterable object.
numpy.partition
    Return a partitioned copy of an array.
numpy.ctypeslib.as_array
    Create a numpy array from a ctypes array or a ctypes POINTER.
numpy.ma.diagflat
    Create a two-dimensional array with the flattened input as a diagonal.
numpy.ma.make_mask
    Create a boolean mask from an array.
numpy.ctypeslib.as_ctypes
    Create and return a ctypes object from a numpy array.  Actually
numpy.ma.mrecords.fromarrays
    Creates a mrecarray from a (flat) list of masked arrays.
numpy.ma.mvoid.__new__
    Create a new masked array from scratch.
numpy.lib.format.open_memmap
    Open a .npy file as a memory-mapped array.
numpy.ma.MaskedArray.__new__
    Create a new masked array from scratch.
numpy.lib.arrayterator.Arrayterator
    Buffered iterator for big arrays.
numpy.ma.mrecords.fromtextfile
    Creates a mrecarray from data stored in the file `filename`.
numpy.asarray
    Convert the input to an array.
numpy.ndarray
    ndarray(shape, dtype=float, buffer=None, offset=0,
numpy.recarray
    Construct an ndarray that allows field access using attributes.
numpy.chararray
    chararray(shape, itemsize=1, unicode=False, buffer=None, offset=0,
numpy.pad
    Pads an array.
numpy.asanyarray
    Convert the input to an ndarray, but pass ndarray subclasses through.
numpy.copy
    Return an array copy of the given object.
numpy.diag
    Extract a diagonal or construct a diagonal array.
numpy.load
    Load arrays or pickled objects from ``.npy``, ``.npz`` or pickled files.
numpy.sort
    Return a sorted copy of an array.
numpy.array_equiv
    Returns True if input arrays are shape consistent and all elements equal.
numpy.dtype
    Create a data type object.
numpy.choose
    Construct an array from an index array and a set of arrays to choose from.
numpy.nditer
    Efficient multi-dimensional iterator object to iterate over arrays.
numpy.swapaxes
    Interchange two axes of an array.
numpy.full_like
    Return a full array with the same shape and type as a given array.
numpy.ones_like
    Return an array of ones with the same shape and type as a given array.
numpy.empty_like
    Return a new array with the same shape and type as a given array.
numpy.ma.mrecords.MaskedRecords.__new__
    Create a new masked array from scratch.
numpy.nan_to_num
    Replace nan with zero and inf with finite numbers.
numpy.zeros_like
    Return an array of zeros with the same shape and type as a given array.
numpy.asarray_chkfinite
    Convert the input to an array, checking for NaNs or Infs.
numpy.diag_indices
    Return the indices to access the main diagonal of an array.
numpy.chararray.tolist
    a.tolist()
numpy.ma.choose
    Use an index array to construct a new array from a set of choices.
numpy.savez_compressed
    Save several arrays into a single file in compressed ``.npz`` format.
numpy.matlib.rand
    Return a matrix of random values with given shape.
numpy.ma.empty_like
    Return a new array with the same shape and type as a given array.
numpy.ma.make_mask_none
    Return a boolean mask of the given shape, filled with False.
numpy.ma.mrecords.fromrecords
    Creates a MaskedRecords from a list of records.
numpy.around
    Evenly round to the given number of decimals.
numpy.source
    Print or write to a file the source code for a NumPy object.
numpy.diagonal
    Return specified diagonals.
numpy.einsum_path
    Evaluates the lowest cost contraction order for an einsum expression by
numpy.histogram2d
    Compute the bi-dimensional histogram of two data samples.
numpy.fft.ifft
    Compute the one-dimensional inverse discrete Fourier Transform.
numpy.fft.ifftn
    Compute the N-dimensional inverse discrete Fourier Transform.
numpy.busdaycalendar
    A business day calendar object that efficiently stores information

In [58]:
HTML(html)


Out[58]:

This post was written as an IPython (Jupyter) notebook. You can view or download it using nbviewer.