Tutorial Brief

NumPy is a powerful set of tools for performing mathematical operations on arrays of numbers. It is faster than the equivalent operations on ordinary Python lists, and it can also manipulate high-dimensional arrays.

Finding Help:

SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering.

http://www.scipy.org/

NumPy is part of this larger ecosystem of libraries, which build on the optimized performance of the NumPy ndarray.

The ecosystem contains these core packages:

NumPy: Base N-dimensional array package
SciPy: Fundamental library for scientific computing
Matplotlib: Comprehensive 2D plotting
IPython: Enhanced interactive console
SymPy: Symbolic mathematics
Pandas: Data structures & analysis

Importing the library

Import the numpy library as np.

The short alias makes code quicker to write, and it is close to a standard convention in scientific work.


In [1]:
import numpy as np

Working with ndarray

We will generate an ndarray with the np.arange function.

np.arange([start,] stop[, step,], dtype=None)


In [2]:
np.arange(10)


Out[2]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [3]:
np.arange(1,10)


Out[3]:
array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [4]:
np.arange(1,10, 0.5)


Out[4]:
array([ 1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5,  5. ,  5.5,  6. ,
        6.5,  7. ,  7.5,  8. ,  8.5,  9. ,  9.5])

In [5]:
np.arange(1,10, 3)


Out[5]:
array([1, 4, 7])

In [6]:
np.arange(1,10, 2, dtype=np.float64)


Out[6]:
array([ 1.,  3.,  5.,  7.,  9.])
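The length of an arange result follows from its arguments: the NumPy documentation gives it as ceil((stop - start) / step), with stop itself excluded. The sketch below checks that rule against two of the calls above. The helper name expected_length is made up for illustration and is not part of NumPy.

```python
import math
import numpy as np

def expected_length(start, stop, step):
    # Hypothetical helper: number of elements np.arange(start, stop, step) should yield
    return int(math.ceil((stop - start) / float(step)))

half_steps = np.arange(1, 10, 0.5)
thirds = np.arange(1, 10, 3)

print(len(half_steps))                # 18 elements, 9.5 is the last one
print(expected_length(1, 10, 0.5))
print(len(thirds))                    # 3 elements: 1, 4, 7
print(expected_length(1, 10, 3))
```

With non-integer steps, floating-point rounding can occasionally change the length by one, so np.linspace is often the safer choice when an exact number of samples matters.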

Examining an ndarray


In [7]:
ds = np.arange(1,10,2)
ds.ndim


Out[7]:
1

In [8]:
ds.shape


Out[8]:
(5,)

In [9]:
ds.size


Out[9]:
5

In [10]:
ds.dtype


Out[10]:
dtype('int64')

In [11]:
ds.itemsize


Out[11]:
8

In [12]:
x=ds.data
list(x)


Out[12]:
['\x01',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x03',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x05',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x07',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\t',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x00',
 '\x00']

In [13]:
ds


Out[13]:
array([1, 3, 5, 7, 9])

In [14]:
# Memory Usage
ds.size * ds.itemsize


Out[14]:
40
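The same figure is also available directly through the nbytes attribute. A short sketch, assuming the same values as ds above but with the dtype forced to int64 so the numbers do not depend on the platform default:

```python
import numpy as np

ds = np.arange(1, 10, 2, dtype=np.int64)

manual_bytes = ds.size * ds.itemsize  # 5 elements * 8 bytes each
print(manual_bytes)
print(ds.nbytes)                      # the same total, reported by NumPy
```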

Why use NumPy?

We will compare the time it takes to create two sequences, and to do some basic operations on them, in plain Python and in NumPy.

Generate a list


In [15]:
%%capture timeit_results
# Regular Python
%timeit python_list_1 = range(1,1000)
python_list_1 = range(1,1000)
python_list_2 = range(1,1000)

#Numpy
%timeit numpy_list_1 = np.arange(1,1000)
numpy_list_1 = np.arange(1,1000)
numpy_list_2 = np.arange(1,1000)

In [16]:
print timeit_results


100000 loops, best of 3: 13.9 us per loop
100000 loops, best of 3: 2.14 us per loop


In [17]:
# Convert one line of %timeit output to a time in seconds
def return_time(timeit_result):
    # Expected format: "100000 loops, best of 3: 13.9 us per loop"
    fields = timeit_result.split(" ")
    temp_time = float(fields[5])  # numeric value, e.g. 13.9
    temp_unit = fields[6]         # unit: s, ms, us or ns
    if temp_unit == "ms":
        temp_time = temp_time * 1e-3
    elif temp_unit == "us":
        temp_time = temp_time * 1e-6
    elif temp_unit == "ns":
        temp_time = temp_time * 1e-9
    return temp_time

In [18]:
python_time = return_time(timeit_results.stdout.split("\n")[0])
numpy_time = return_time(timeit_results.stdout.split("\n")[1])

print "Python/NumPy: %.1f" % (python_time/numpy_time)


Python/NumPy: 6.5

Basic Operations


In [19]:
%%capture timeit_python
%%timeit
# Regular Python
[(x + y) for x, y in zip(python_list_1, python_list_2)]
[(x - y) for x, y in zip(python_list_1, python_list_2)]
[(x * y) for x, y in zip(python_list_1, python_list_2)]
[(x / y) for x, y in zip(python_list_1, python_list_2)];

In [20]:
print timeit_python


1000 loops, best of 3: 626 us per loop


In [21]:
%%capture timeit_numpy
%%timeit
#Numpy
numpy_list_1 + numpy_list_2
numpy_list_1 - numpy_list_2
numpy_list_1 * numpy_list_2
numpy_list_1 / numpy_list_2;

In [22]:
print timeit_numpy


10000 loops, best of 3: 34.6 us per loop


In [23]:
python_time = return_time(timeit_python.stdout)
numpy_time = return_time(timeit_numpy.stdout)

print "Python/NumPy: %.1f" % (python_time/numpy_time)


Python/NumPy: 18.1
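The cells above rely on IPython magics and Python 2 print statements. As a rough alternative, the same kind of comparison can be sketched with the standard timeit module under Python 3. The function names and the repeat count of 1000 are arbitrary choices, and the printed ratio will vary with the machine and the NumPy version.

```python
import timeit
import numpy as np

python_list_1 = list(range(1, 1000))
python_list_2 = list(range(1, 1000))
numpy_list_1 = np.arange(1, 1000)
numpy_list_2 = np.arange(1, 1000)

def python_ops():
    # Element-wise arithmetic written as list comprehensions
    added = [x + y for x, y in zip(python_list_1, python_list_2)]
    multiplied = [x * y for x, y in zip(python_list_1, python_list_2)]
    return added, multiplied

def numpy_ops():
    # The same arithmetic written as vectorized array expressions
    return numpy_list_1 + numpy_list_2, numpy_list_1 * numpy_list_2

python_time = timeit.timeit(python_ops, number=1000)
numpy_time = timeit.timeit(numpy_ops, number=1000)
print("Python/NumPy: %.1f" % (python_time / numpy_time))
```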

Most Common Functions

Array Creation

array(object, dtype=None, copy=True, order=None, subok=False, ndmin=0)

Parameters
----------
object : array_like
    An array, any object exposing the array interface, an
    object whose __array__ method returns an array, or any
    (nested) sequence.
dtype : data-type, optional
    The desired data-type for the array.  If not given, then
    the type will be determined as the minimum type required
    to hold the objects in the sequence.  This argument can only
    be used to 'upcast' the array.  For downcasting, use the
    .astype(t) method.
copy : bool, optional
    If true (default), then the object is copied.  Otherwise, a copy
    will only be made if __array__ returns a copy, if obj is a
    nested sequence, or if a copy is needed to satisfy any of the other
    requirements (`dtype`, `order`, etc.).
order : {'C', 'F', 'A'}, optional
    Specify the order of the array.  If order is 'C' (default), then the
    array will be in C-contiguous order (last-index varies the
    fastest).  If order is 'F', then the returned array
    will be in Fortran-contiguous order (first-index varies the
    fastest).  If order is 'A', then the returned array may
    be in any order (either C-, Fortran-contiguous, or even
    discontiguous).
subok : bool, optional
    If True, then sub-classes will be passed-through, otherwise
    the returned array will be forced to be a base-class array (default).
ndmin : int, optional
    Specifies the minimum number of dimensions that the resulting
    array should have.  Ones will be pre-pended to the shape as
    needed to meet this requirement.

In [24]:
np.array([1,2,3,4,5])


Out[24]:
array([1, 2, 3, 4, 5])

Multidimensional Array


In [25]:
np.array([[1,2],[3,4],[5,6]])


Out[25]:
array([[1, 2],
       [3, 4],
       [5, 6]])
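The dtype and ndmin parameters described above are not exercised by the two calls shown, so here is a small sketch of both, together with the .astype method that the docstring recommends for downcasting. The variable names are illustrative only.

```python
import numpy as np

as_float = np.array([1, 2, 3], dtype=np.float64)  # upcast integers to floats
print(as_float.dtype)

as_small_int = as_float.astype(np.int8)           # explicit downcast with astype
print(as_small_int.dtype)

at_least_2d = np.array([1, 2, 3], ndmin=2)        # a leading axis of length 1 is prepended
print(at_least_2d.shape)
```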

zeros(shape, dtype=float, order='C')

Parameters
----------
shape : int or sequence of ints
    Shape of the new array, e.g., ``(2, 3)`` or ``2``.
dtype : data-type, optional
    The desired data-type for the array, e.g., `numpy.int8`.  Default is
    `numpy.float64`.
order : {'C', 'F'}, optional
    Whether to store multidimensional data in C- or Fortran-contiguous
    (row- or column-wise) order in memory.

In [26]:
np.zeros((3,4))


Out[26]:
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

In [27]:
np.zeros((3,4), dtype=np.int64)


Out[27]:
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
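The order parameter does not change the values or the shape, only how the elements are laid out in memory. One way to see the difference is to inspect the flags and strides attributes, as in this sketch (float64 is assumed, so each element occupies 8 bytes):

```python
import numpy as np

c_order = np.zeros((3, 4))             # row-major, the default
f_order = np.zeros((3, 4), order='F')  # column-major

print(c_order.flags['C_CONTIGUOUS'])
print(f_order.flags['F_CONTIGUOUS'])

# Strides are the byte steps needed to move one position along each axis
print(c_order.strides)  # moving along a row is the small step
print(f_order.strides)  # moving down a column is the small step
```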

np.linspace(start, stop, num=50, endpoint=True, retstep=False)

Parameters
----------
start : scalar
    The starting value of the sequence.
stop : scalar
    The end value of the sequence, unless `endpoint` is set to False.
    In that case, the sequence consists of all but the last of ``num + 1``
    evenly spaced samples, so that `stop` is excluded.  Note that the step
    size changes when `endpoint` is False.
num : int, optional
    Number of samples to generate. Default is 50.
endpoint : bool, optional
    If True, `stop` is the last sample. Otherwise, it is not included.
    Default is True.
retstep : bool, optional
    If True, return (`samples`, `step`), where `step` is the spacing
    between samples.

In [28]:
np.linspace(1,5)


Out[28]:
array([ 1.        ,  1.08163265,  1.16326531,  1.24489796,  1.32653061,
        1.40816327,  1.48979592,  1.57142857,  1.65306122,  1.73469388,
        1.81632653,  1.89795918,  1.97959184,  2.06122449,  2.14285714,
        2.2244898 ,  2.30612245,  2.3877551 ,  2.46938776,  2.55102041,
        2.63265306,  2.71428571,  2.79591837,  2.87755102,  2.95918367,
        3.04081633,  3.12244898,  3.20408163,  3.28571429,  3.36734694,
        3.44897959,  3.53061224,  3.6122449 ,  3.69387755,  3.7755102 ,
        3.85714286,  3.93877551,  4.02040816,  4.10204082,  4.18367347,
        4.26530612,  4.34693878,  4.42857143,  4.51020408,  4.59183673,
        4.67346939,  4.75510204,  4.83673469,  4.91836735,  5.        ])

In [29]:
np.linspace(0,2,num=4)


Out[29]:
array([ 0.        ,  0.66666667,  1.33333333,  2.        ])

In [30]:
np.linspace(0,2,num=4,endpoint=False)


Out[30]:
array([ 0. ,  0.5,  1. ,  1.5])
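The note that the step size changes when endpoint is False can be checked with retstep=True, which returns the spacing alongside the samples. A minimal sketch using the same arguments as the two calls above:

```python
import numpy as np

samples, step = np.linspace(0, 2, num=4, retstep=True)
open_samples, open_step = np.linspace(0, 2, num=4, endpoint=False, retstep=True)

print(step)       # stop included: the interval is split into num - 1 = 3 parts
print(open_step)  # stop excluded: the interval is split into num = 4 parts
```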

random_sample(size=None)

Parameters
----------
size : int or tuple of ints, optional
    Defines the shape of the returned array of random floats. If None
    (the default), returns a single float.

In [31]:
np.random.random((2,3))


Out[31]:
array([[ 0.60383905,  0.84632409,  0.18122863],
       [ 0.73495109,  0.36127266,  0.27401845]])

In [32]:
np.random.random_sample((2,3))


Out[32]:
array([[ 0.57143353,  0.23111517,  0.90913961],
       [ 0.03532667,  0.44399022,  0.15470195]])
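In this legacy interface, np.random.random is an alias for np.random.random_sample, which is why both calls above accept the same argument. The values differ on every run unless the generator is seeded first. The sketch below shows seeding with np.random.seed; the seed value 42 is an arbitrary choice.

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, chosen only for illustration
first = np.random.random_sample((2, 3))

np.random.seed(42)  # reseeding restarts the same sequence
second = np.random.random_sample((2, 3))

print(np.array_equal(first, second))
print(first.min() >= 0.0 and first.max() < 1.0)  # samples lie in [0.0, 1.0)
```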

Statistical Analysis


In [33]:
data_set = np.random.random((2,3))
data_set


Out[33]:
array([[ 0.99101817,  0.91362334,  0.37546237],
       [ 0.66962595,  0.27485988,  0.45081456]])

np.max(a, axis=None, out=None, keepdims=False)

Parameters
----------
a : array_like
    Input data.
axis : int, optional
    Axis along which to operate.  By default, flattened input is used.
out : ndarray, optional
    Alternative output array in which to place the result.  Must
    be of the same shape and buffer length as the expected output.
    See `doc.ufuncs` (Section "Output arguments") for more details.
keepdims : bool, optional
    If this is set to True, the axes which are reduced are left
    in the result as dimensions with size one. With this option,
    the result will broadcast correctly against the original `arr`.

In [34]:
np.max(data_set)


Out[34]:
0.99101817417900118

In [35]:
np.max(data_set, axis=0)


Out[35]:
array([ 0.99101817,  0.91362334,  0.45081456])

In [36]:
np.max(data_set, axis=1)


Out[36]:
array([ 0.99101817,  0.66962595])
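The keepdims parameter matters mostly when the reduced result is combined with the original array again. The following sketch uses a small fixed array (not the random data_set above) to scale each row by its own maximum:

```python
import numpy as np

data = np.array([[1.0, 4.0, 2.0],
                 [8.0, 3.0, 6.0]])

row_max = np.max(data, axis=1)                       # shape (2,)
row_max_kept = np.max(data, axis=1, keepdims=True)   # shape (2, 1)

# The (2, 1) result broadcasts against the (2, 3) input,
# so each row is divided by its own maximum.
scaled = data / row_max_kept

print(row_max.shape)
print(row_max_kept.shape)
print(scaled)
```

Dividing by row_max directly would fail here, because a shape of (2,) cannot be broadcast against (2, 3).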

np.min(a, axis=None, out=None, keepdims=False)


In [37]:
np.min(data_set)


Out[37]:
0.2748598782158802

np.mean(a, axis=None, dtype=None, out=None, keepdims=False)


In [38]:
np.mean(data_set)


Out[38]:
0.6125673792901426

np.median(a, axis=None, out=None, overwrite_input=False)


In [39]:
np.median(data_set)


Out[39]:
0.56022025506863438

np.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False)


In [40]:
np.std(data_set)


Out[40]:
0.2688073942829961
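The ddof argument in the signature above sets the "delta degrees of freedom": the divisor used is N - ddof. The default of 0 gives the population standard deviation, and ddof=1 gives the sample standard deviation. A small worked example on fixed values:

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])  # mean 2.5, squared deviations sum to 5.0

population_std = np.std(values)          # sqrt(5.0 / 4)
sample_std = np.std(values, ddof=1)      # sqrt(5.0 / 3)

print(population_std)
print(sample_std)
```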

np.sum(a, axis=None, dtype=None, out=None, keepdims=False)


In [41]:
np.sum(data_set)


Out[41]:
3.6754042757408558

Reshaping

np.reshape(a, newshape, order='C')


In [42]:
np.reshape(data_set, (3,2))


Out[42]:
array([[ 0.99101817,  0.91362334],
       [ 0.37546237,  0.66962595],
       [ 0.27485988,  0.45081456]])

In [43]:
np.reshape(data_set, (6,1))


Out[43]:
array([[ 0.99101817],
       [ 0.91362334],
       [ 0.37546237],
       [ 0.66962595],
       [ 0.27485988],
       [ 0.45081456]])

In [44]:
np.reshape(data_set, (6))


Out[44]:
array([ 0.99101817,  0.91362334,  0.37546237,  0.66962595,  0.27485988,
        0.45081456])

np.ravel(a, order='C')


In [45]:
np.ravel(data_set)


Out[45]:
array([ 0.99101817,  0.91362334,  0.37546237,  0.66962595,  0.27485988,
        0.45081456])
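Two details of reshaping are worth a quick sketch: one dimension of the new shape may be given as -1 and is then inferred from the array size, and np.ravel returns a view of the original data when the memory layout allows it, whereas the flatten method always copies. The example uses a small integer array for readability.

```python
import numpy as np

data = np.arange(6).reshape(2, 3)

inferred = np.reshape(data, (3, -1))  # -1 lets NumPy infer the second dimension
print(inferred.shape)

flat_view = np.ravel(data)            # a view here, since data is contiguous
flat_copy = data.flatten()            # always an independent copy

print(np.shares_memory(data, flat_view))
print(np.shares_memory(data, flat_copy))
```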

Slicing


In [46]:
data_set = np.random.random((5,10))
data_set


Out[46]:
array([[ 0.2187482 ,  0.87071416,  0.73663416,  0.27910705,  0.78239476,
         0.53835918,  0.51234398,  0.72563682,  0.7497531 ,  0.61090375],
       [ 0.46166143,  0.84292073,  0.19234863,  0.31204936,  0.64249925,
         0.23149184,  0.45047676,  0.79576087,  0.84369549,  0.09006852],
       [ 0.74299397,  0.91711184,  0.76535827,  0.16743916,  0.33435712,
         0.50974527,  0.82367946,  0.03806086,  0.70315627,  0.58959405],
       [ 0.74813493,  0.5738713 ,  0.40863753,  0.44157988,  0.32909602,
         0.51802248,  0.33975736,  0.36404317,  0.70869127,  0.50686958],
       [ 0.48861471,  0.16930154,  0.03239842,  0.0835669 ,  0.44708358,
         0.8001063 ,  0.39644714,  0.83747988,  0.71102625,  0.44535013]])

In [47]:
data_set[1]


Out[47]:
array([ 0.46166143,  0.84292073,  0.19234863,  0.31204936,  0.64249925,
        0.23149184,  0.45047676,  0.79576087,  0.84369549,  0.09006852])

In [48]:
data_set[1][0]


Out[48]:
0.46166143118294922

In [49]:
data_set[1,0]


Out[49]:
0.46166143118294922

Slicing a range


In [50]:
data_set[2:4]


Out[50]:
array([[ 0.74299397,  0.91711184,  0.76535827,  0.16743916,  0.33435712,
         0.50974527,  0.82367946,  0.03806086,  0.70315627,  0.58959405],
       [ 0.74813493,  0.5738713 ,  0.40863753,  0.44157988,  0.32909602,
         0.51802248,  0.33975736,  0.36404317,  0.70869127,  0.50686958]])

In [51]:
data_set[2:4,0]


Out[51]:
array([ 0.74299397,  0.74813493])

In [52]:
data_set[2:4,0:2]


Out[52]:
array([[ 0.74299397,  0.91711184],
       [ 0.74813493,  0.5738713 ]])

In [53]:
data_set[:,0]


Out[53]:
array([ 0.2187482 ,  0.46166143,  0.74299397,  0.74813493,  0.48861471])

Stepping


In [54]:
data_set[2:4:1]


Out[54]:
array([[ 0.74299397,  0.91711184,  0.76535827,  0.16743916,  0.33435712,
         0.50974527,  0.82367946,  0.03806086,  0.70315627,  0.58959405],
       [ 0.74813493,  0.5738713 ,  0.40863753,  0.44157988,  0.32909602,
         0.51802248,  0.33975736,  0.36404317,  0.70869127,  0.50686958]])

In [55]:
data_set[::]


Out[55]:
array([[ 0.2187482 ,  0.87071416,  0.73663416,  0.27910705,  0.78239476,
         0.53835918,  0.51234398,  0.72563682,  0.7497531 ,  0.61090375],
       [ 0.46166143,  0.84292073,  0.19234863,  0.31204936,  0.64249925,
         0.23149184,  0.45047676,  0.79576087,  0.84369549,  0.09006852],
       [ 0.74299397,  0.91711184,  0.76535827,  0.16743916,  0.33435712,
         0.50974527,  0.82367946,  0.03806086,  0.70315627,  0.58959405],
       [ 0.74813493,  0.5738713 ,  0.40863753,  0.44157988,  0.32909602,
         0.51802248,  0.33975736,  0.36404317,  0.70869127,  0.50686958],
       [ 0.48861471,  0.16930154,  0.03239842,  0.0835669 ,  0.44708358,
         0.8001063 ,  0.39644714,  0.83747988,  0.71102625,  0.44535013]])

In [56]:
data_set[::2]


Out[56]:
array([[ 0.2187482 ,  0.87071416,  0.73663416,  0.27910705,  0.78239476,
         0.53835918,  0.51234398,  0.72563682,  0.7497531 ,  0.61090375],
       [ 0.74299397,  0.91711184,  0.76535827,  0.16743916,  0.33435712,
         0.50974527,  0.82367946,  0.03806086,  0.70315627,  0.58959405],
       [ 0.48861471,  0.16930154,  0.03239842,  0.0835669 ,  0.44708358,
         0.8001063 ,  0.39644714,  0.83747988,  0.71102625,  0.44535013]])

In [57]:
data_set[2:4]


Out[57]:
array([[ 0.74299397,  0.91711184,  0.76535827,  0.16743916,  0.33435712,
         0.50974527,  0.82367946,  0.03806086,  0.70315627,  0.58959405],
       [ 0.74813493,  0.5738713 ,  0.40863753,  0.44157988,  0.32909602,
         0.51802248,  0.33975736,  0.36404317,  0.70869127,  0.50686958]])

In [58]:
data_set[2:4,::2]


Out[58]:
array([[ 0.74299397,  0.76535827,  0.33435712,  0.82367946,  0.70315627],
       [ 0.74813493,  0.40863753,  0.32909602,  0.33975736,  0.70869127]])
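Basic slices like the ones above return views rather than copies, so writing to a slice also changes the original array. The sketch below illustrates this on a small integer array, and also shows that a negative step reverses an axis.

```python
import numpy as np

grid = np.arange(20).reshape(4, 5)

block = grid[1:3, ::2]               # rows 1 and 2, every second column
block[0, 0] = -1                     # writes through to grid, because block is a view
print(grid[1, 0])

independent = grid[1:3, ::2].copy()  # copy() detaches the data from grid
independent[0, 0] = 99
print(grid[1, 0])                    # unchanged by the write to the copy

print(grid[::-1, 0])                 # a negative step walks the rows in reverse
```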