In [1]:

    
name = '2017-09-25-numpy-intro'
title = 'Intro to NumPy'
tags = 'numpy, basics'
author = 'Denis Sergeev'



In [2]:

    
from nb_tools import connect_notebook_to_post
from IPython.core.display import HTML

html = connect_notebook_to_post(name, title, tags, author)

Python's built-in data types include

high-level number objects: integers, floating-point numbers, complex numbers, etc.
containers: lists (cheap append and in-place modification), dictionaries (fast lookup by key)

Lists

One-dimensional
Can contain items of different types
Mutable, i.e. items can be added, dropped, or replaced
Similar to MATLAB's cell arrays



In [3]:

    
my_collection = [1, 4, 6, 10]



In [4]:

    
my_collection.append(100000)



In [5]:

    
my_collection.remove(1)



In [6]:

    
my_collection[1] = 'abcdef'



In [7]:

    
my_collection









    Out[7]:





[4, 'abcdef', 10, 100000]

Zero-based indexing



In [8]:

    
a = [10, 20, 30, 40, 50, 60, 70]



In [9]:

    
low, high = 2, 4



In [10]:

    
a[:low]









    Out[10]:





[10, 20]



In [11]:

    
a[low:high]









    Out[11]:





[30, 40]



In [12]:

    
a[high:]









    Out[12]:





[50, 60, 70]

Works with any index-supporting objects, including strings:



In [13]:

    
s = 'qwerty'



In [14]:

    
s[1:-1]









    Out[14]:





'wert'
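The slices above are half-open: the start index is included and the stop index is excluded. A minimal sketch of two consequences, reusing the same `a`, `low`, `high` and `s` as in the cells above (the names `middle` and `rejoined` are introduced here only for illustration):

```python
a = [10, 20, 30, 40, 50, 60, 70]
low, high = 2, 4

# The stop index is excluded, so the slice has exactly high - low elements
middle = a[low:high]
print(len(middle) == high - low)  # True

# Adjacent slices share a boundary index and rejoin without gaps or overlap
rejoined = a[:low] + a[low:high] + a[high:]
print(rejoined == a)  # True

# The same rule applies to strings
s = 'qwerty'
print(s[:2] + s[2:])  # qwerty
```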

Another example:

Given a 2D image, img, stored in row-major order, we want to find the linear position in the array of the element at position (x, y). Using zero-based indexing, that linear position is simply img[y * width + x], whereas with one-based indexing it is img((y - 1) * width + x). Now there is a -1 in there!
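The index arithmetic can be checked with a small sketch. The 4 x 3 "image" here is just the numbers 0 to 11, and the names `flat`, `img` and `linear` are chosen for illustration:

```python
import numpy as np

width, height = 4, 3

# Flat, row-major storage of a height x width image (values are simply 0..11)
flat = list(range(width * height))

# The same data viewed as a 2D array, indexed as img[y, x]
img = np.array(flat).reshape(height, width)

x, y = 2, 1
linear = y * width + x   # zero-based: no "- 1" correction needed
print(linear)                     # 6
print(flat[linear] == img[y, x])  # True
```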

Why not use lists?

Lists in Python are quite general and can hold arbitrary objects as elements.
Addition and scalar multiplication are defined for lists, but they do not do what we want for numerical computation, e.g.

Addition results in concatenation:



In [15]:

    
x = [1, 2, 3]
y = [10, 20, 30]
x + y









    Out[15]:





[1, 2, 3, 10, 20, 30]

And multiplication results in repeating:



In [16]:

    
x = [2, 3]
x * 3









    Out[16]:





[2, 3, 2, 3, 2, 3]

Enter NumPy arrays

Aside: import conventions



In [17]:

    
import numpy as np

NumPy arrays

NumPy's main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes, and the number of axes is the rank (available as the ndim attribute).

Why it is useful: it is a memory-efficient container that provides fast numerical operations.

Let's compare it to list operations:



In [18]:

    
l = list(range(1000))



In [19]:

    
%timeit [i**2 for i in l]









    



266 µs ± 1.13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)



In [20]:

    
a = np.arange(1000)



In [21]:

    
%timeit a**2









    



1.15 µs ± 26.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
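Besides being faster, arrays also give the arithmetic that lists lack. As a small sketch, here are the same `x` and `y` values used in the list examples earlier, now wrapped in arrays so that `+` and `*` act element by element:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

# Addition is element-wise, not concatenation
print(x + y)   # [11 22 33]

# Multiplication by a scalar scales every element, it does not repeat the array
print(x * 3)   # [3 6 9]
```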

Creating arrays

Manually



In [22]:

    
a = np.array([3, 4, 5, 6])



In [23]:

    
a









    Out[23]:





array([3, 4, 5, 6])

The class is called ndarray:



In [24]:

    
type(a)









    Out[24]:





numpy.ndarray



In [25]:

    
a.ndim









    Out[25]:





1

Scalar (zero-dimensional) array



In [26]:

    
a0 = np.array(7)



In [27]:

    
a0.ndim









    Out[27]:





0



In [28]:

    
b = np.array([[10, 20, 30], [9, 8, 7]])



In [29]:

    
b









    Out[29]:





array([[10, 20, 30],
       [ 9,  8,  7]])



In [30]:

    
c = np.array([[[1], [2]], [[3], [4]]])



In [31]:

    
c.shape









    Out[31]:





(2, 2, 1)



In [32]:

    
c.max()









    Out[32]:





4

Note that c.shape is the equivalent of size(c) in MATLAB.
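For readers coming from MATLAB, a short sketch of the related attributes, using the same array `c` as above. The MATLAB names in the comments are the usual counterparts:

```python
import numpy as np

c = np.array([[[1], [2]], [[3], [4]]])

print(c.shape)  # (2, 2, 1)  -> size(c) in MATLAB
print(c.ndim)   # 3          -> ndims(c)
print(c.size)   # 4          -> numel(c), the total number of elements
print(len(c))   # 2          -> length of the first axis only
```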

Common mistakes



In [33]:

    
try:
    a = np.array(1,2,3,4) # WRONG: the elements must be passed as one sequence
except (ValueError, TypeError) as e: # ValueError in older NumPy, TypeError in recent versions
    print(e)









    



only 2 non-keyword arguments accepted



In [34]:

    
a = [1,2,3,4]



In [35]:

    
a = np.array(a) # RIGHT



In [36]:

    
b = a.copy()

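Do not use the np.ndarray constructor to create an array from data: it interprets its first argument as a shape, so the call below returns an uninitialised array of shape (1, 2, 3, 4) rather than an array holding the four numbers.

np.ndarray([1,2,3,4])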
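A quick way to see the difference is to compare shapes. In this sketch the names `wrong` and `right` are only illustrative:

```python
import numpy as np

# The list is taken as the *shape*, and the contents are uninitialised
wrong = np.ndarray([1, 2, 3, 4])
print(wrong.shape)  # (1, 2, 3, 4)

# np.array takes the list as the *data*
right = np.array([1, 2, 3, 4])
print(right.shape)  # (4,)
```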

Functions for creating arrays

Evenly spaced



In [37]:

    
np.arange(1, 9, 2) # start, end (exclusive), step









    Out[37]:





array([1, 3, 5, 7])

By number of points



In [38]:

    
np.linspace(0, 1, 6)   # start, end (inclusive), number of points









    Out[38]:





array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ])



In [39]:

    
np.logspace(-3,2,7)









    Out[39]:





array([  1.00000000e-03,   6.81292069e-03,   4.64158883e-02,
         3.16227766e-01,   2.15443469e+00,   1.46779927e+01,
         1.00000000e+02])
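The arguments of logspace are exponents, not the end values themselves: with the default base of 10, the call above spans 10**-3 to 10**2. A small sketch checking this against linspace (the name `exponents` is introduced here for illustration):

```python
import numpy as np

# Seven evenly spaced exponents between -3 and 2 (both inclusive)
exponents = np.linspace(-3, 2, 7)

# logspace raises the base (10 by default) to those exponents
log_points = np.logspace(-3, 2, 7)
print(np.allclose(log_points, 10.0 ** exponents))  # True
```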

Filled with a specific number

Zeros



In [40]:

    
np.zeros((2, 3))









    Out[40]:





array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

Ones



In [41]:

    
np.ones((3, 2))









    Out[41]:





array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

Empty



In [42]:

    
np.empty([2,3])









    Out[42]:





array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

The function empty creates an array without initialising its entries, so the initial content is arbitrary and depends on the state of the memory (in the output above it happens to contain ones). By default, the dtype of the created array is float64.
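The creation functions in this section accept a dtype argument, and np.full covers fill values other than zero or one. A minimal sketch, with the variable names `e`, `z` and `f` chosen only for illustration:

```python
import numpy as np

# Default dtype is float64; the contents of an empty array are arbitrary
e = np.empty((2, 3))
print(e.dtype)        # float64

# Ask for integers explicitly
z = np.zeros((2, 3), dtype=int)
print(z.dtype.kind)   # i

# Fill with any constant value
f = np.full((2, 3), 7.5)
print(f.tolist())     # [[7.5, 7.5, 7.5], [7.5, 7.5, 7.5]]
```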

Random numbers



In [43]:

    
np.random.seed(1234)



In [44]:

    
np.random.rand(4)       # uniform in [0, 1)









    Out[44]:





array([ 0.19151945,  0.62210877,  0.43772774,  0.78535858])



In [45]:

    
np.random.randn(4)      # standard normal (Gaussian with mean 0 and variance 1)









    Out[45]:





array([-0.72058873,  0.88716294,  0.85958841, -0.6365235 ])
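Setting the seed makes the sequence reproducible: re-seeding with the same value gives the same numbers. The sketch below checks this. It also shows the Generator interface (np.random.default_rng), which is available in NumPy 1.17 and later and is not used in the original notebook. It draws from a different stream, so its values will not match the ones shown above.

```python
import numpy as np

# Same seed, same numbers
np.random.seed(1234)
first = np.random.rand(4)
np.random.seed(1234)
second = np.random.rand(4)
print(np.array_equal(first, second))  # True

# Generator API (NumPy >= 1.17): a separate, explicitly seeded generator object
rng = np.random.default_rng(1234)
u = rng.random(4)            # uniform in [0, 1)
g = rng.standard_normal(4)   # standard normal
```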

Special cases



In [46]:

    
np.eye(3)









    Out[46]:





array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])



In [47]:

    
np.diag(np.array([1, 2, 3, 4]))









    Out[47]:





array([[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]])
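np.diag works in both directions: given a 1D array it builds a diagonal matrix, and given a 2D array it extracts the diagonal. A short sketch (the names `d` and `back` are illustrative):

```python
import numpy as np

# 1D input: construct a diagonal matrix
d = np.diag(np.array([1, 2, 3, 4]))
print(d.shape)   # (4, 4)

# 2D input: extract the main diagonal
back = np.diag(d)
print(back)      # [1 2 3 4]
```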

Missing data



In [48]:

    
a = np.array([1, 2, 3, np.nan])



In [49]:

    
b = list(a)
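One thing to keep in mind with np.nan as a missing-value marker: it is a floating-point value, so the array becomes float64, and it propagates through ordinary reductions. A minimal sketch using the same array `a`, with the NaN-aware np.nanmean shown as one possible workaround:

```python
import numpy as np

a = np.array([1, 2, 3, np.nan])

print(a.dtype)             # float64
print(np.isnan(a).sum())   # 1 (number of missing values)
print(a.mean())            # nan, because NaN propagates
print(np.nanmean(a))       # 2.0, NaN entries are skipped
```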



In [50]:

    
a = np.array([1, 2, 3])



In [51]:

    
b = np.ma.masked_less(a, 2)



In [52]:

    
b









    Out[52]:





masked_array(data = [-- 2 3],
             mask = [ True False False],
       fill_value = 999999)



In [53]:

    
b.data









    Out[53]:





array([1, 2, 3])



In [54]:

    
b.mask









    Out[54]:





array([ True, False, False], dtype=bool)
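As a brief sketch of what the mask is for, reductions on a masked array skip the masked entries, and the hidden values can be replaced or dropped explicitly. The array `b` is rebuilt here so that the example is self-contained:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.ma.masked_less(a, 2)   # mask every element smaller than 2

print(b.mean())        # 2.5, the masked 1 is ignored
print(b.filled(0))     # [0 2 3], masked entries replaced by 0
print(b.compressed())  # [2 3], only the unmasked entries
```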

We will have a separate session on Masked Arrays in NumPy.

Looking for help

Interactive help



In [55]:

    
np.rollaxis??



In [56]:

    
np.*space*?

With NumPy: a built-in search engine, np.lookfor (note that this function was removed in NumPy 2.0)



In [57]:

    
np.lookfor('create array')









    



Search results for 'create array'
---------------------------------
numpy.array
    Create an array.
numpy.memmap
    Create a memory-map to an array stored in a *binary* file on disk.
numpy.diagflat
    Create a two-dimensional array with the flattened input as a diagonal.
numpy.fromiter
    Create a new 1-dimensional array from an iterable object.
numpy.partition
    Return a partitioned copy of an array.
numpy.ctypeslib.as_array
    Create a numpy array from a ctypes array or a ctypes POINTER.
numpy.ma.diagflat
    Create a two-dimensional array with the flattened input as a diagonal.
numpy.ma.make_mask
    Create a boolean mask from an array.
numpy.ctypeslib.as_ctypes
    Create and return a ctypes object from a numpy array.  Actually
numpy.ma.mrecords.fromarrays
    Creates a mrecarray from a (flat) list of masked arrays.
numpy.ma.mvoid.__new__
    Create a new masked array from scratch.
numpy.lib.format.open_memmap
    Open a .npy file as a memory-mapped array.
numpy.ma.MaskedArray.__new__
    Create a new masked array from scratch.
numpy.lib.arrayterator.Arrayterator
    Buffered iterator for big arrays.
numpy.ma.mrecords.fromtextfile
    Creates a mrecarray from data stored in the file `filename`.
numpy.asarray
    Convert the input to an array.
numpy.ndarray
    ndarray(shape, dtype=float, buffer=None, offset=0,
numpy.recarray
    Construct an ndarray that allows field access using attributes.
numpy.chararray
    chararray(shape, itemsize=1, unicode=False, buffer=None, offset=0,
numpy.pad
    Pads an array.
numpy.asanyarray
    Convert the input to an ndarray, but pass ndarray subclasses through.
numpy.copy
    Return an array copy of the given object.
numpy.diag
    Extract a diagonal or construct a diagonal array.
numpy.load
    Load arrays or pickled objects from ``.npy``, ``.npz`` or pickled files.
numpy.sort
    Return a sorted copy of an array.
numpy.array_equiv
    Returns True if input arrays are shape consistent and all elements equal.
numpy.dtype
    Create a data type object.
numpy.choose
    Construct an array from an index array and a set of arrays to choose from.
numpy.nditer
    Efficient multi-dimensional iterator object to iterate over arrays.
numpy.swapaxes
    Interchange two axes of an array.
numpy.full_like
    Return a full array with the same shape and type as a given array.
numpy.ones_like
    Return an array of ones with the same shape and type as a given array.
numpy.empty_like
    Return a new array with the same shape and type as a given array.
numpy.ma.mrecords.MaskedRecords.__new__
    Create a new masked array from scratch.
numpy.nan_to_num
    Replace nan with zero and inf with finite numbers.
numpy.zeros_like
    Return an array of zeros with the same shape and type as a given array.
numpy.asarray_chkfinite
    Convert the input to an array, checking for NaNs or Infs.
numpy.diag_indices
    Return the indices to access the main diagonal of an array.
numpy.chararray.tolist
    a.tolist()
numpy.ma.choose
    Use an index array to construct a new array from a set of choices.
numpy.savez_compressed
    Save several arrays into a single file in compressed ``.npz`` format.
numpy.matlib.rand
    Return a matrix of random values with given shape.
numpy.ma.empty_like
    Return a new array with the same shape and type as a given array.
numpy.ma.make_mask_none
    Return a boolean mask of the given shape, filled with False.
numpy.ma.mrecords.fromrecords
    Creates a MaskedRecords from a list of records.
numpy.around
    Evenly round to the given number of decimals.
numpy.source
    Print or write to a file the source code for a NumPy object.
numpy.diagonal
    Return specified diagonals.
numpy.einsum_path
    Evaluates the lowest cost contraction order for an einsum expression by
numpy.histogram2d
    Compute the bi-dimensional histogram of two data samples.
numpy.fft.ifft
    Compute the one-dimensional inverse discrete Fourier Transform.
numpy.fft.ifftn
    Compute the N-dimensional inverse discrete Fourier Transform.
numpy.busdaycalendar
    A business day calendar object that efficiently stores information

References



In [58]:

    
HTML(html)









    Out[58]:





    
     This post was written as an IPython (Jupyter) notebook. You can view or download it using
    nbviewer.