Numpy and Scipy Tutorial

Numpy and Scipy are the most common Python package for mathematical and numerical routines in precompiled, fast functions. The NumPy package provides basic routines for manipulating large arrays and matrices of numeric data. The SciPy package extends the functionality of NumPy with a substantial collection of useful algorithms, like minimization, Fourier transformation, regression, and other applied mathematical techniques.

References

  • Scipy tutorial
  • Python for Data Analysis Data Wrangling with Pandas, NumPy, and IPython

First we use the following convention when importing numpy and scipy


In [1]:
import numpy as np
import scipy as cp

ndarray: N-dimensional array

We can create a ndarray from a Python list like this


In [2]:
a = np.array([1,2,3,4])
b = np.array([5,6])
print(a)
print(b)


[1 2 3 4]
[5 6]

Each array has its own type, data type and shape:


In [3]:
print(type(a))
print(a.dtype)
a.shape


<class 'numpy.ndarray'>
int64
Out[3]:
(4,)

In this case, our array is a 1-dimensional array, thus the shape (4,). Surely we can have multi-dimensional array like this:


In [4]:
a2 = np.array ([[1,2,3], [4,5,6]])
print(a2.shape)
print(len(a2))
print(a2.ndim)


(2, 3)
2
2

Note that len function returns the first dimension. ndim property returns the number of dimensions. If you want to get the number of elements in an array, you can use the size property.


In [5]:
a2.size


Out[5]:
6

You can specify the type of the array elements in the constructor like this:


In [6]:
a3 = np.array([4.5, 7.1, 6.2], float)
a3.dtype


Out[6]:
dtype('float64')

You can convert array type from one type to another:


In [7]:
print(a3.astype(str))


['4.5' '7.1' '6.2']

You can reshape array using tuples that define new dimension


In [8]:
a = np.array(range(10), float)
a = a.reshape((5, 2))
a


Out[8]:
array([[ 0.,  1.],
       [ 2.,  3.],
       [ 4.,  5.],
       [ 6.,  7.],
       [ 8.,  9.]])

Convert Array to other data structure

Convert to list:


In [9]:
a.tolist()


Out[9]:
[[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]]

Convert to binary string


In [10]:
binary_string = a.tostring()
print(binary_string)
a = np.fromstring(binary_string)


b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x08@\x00\x00\x00\x00\x00\x00\x10@\x00\x00\x00\x00\x00\x00\x14@\x00\x00\x00\x00\x00\x00\x18@\x00\x00\x00\x00\x00\x00\x1c@\x00\x00\x00\x00\x00\x00 @\x00\x00\x00\x00\x00\x00"@'

Other functions to create array

The arange function is similar to the range function but returns an array:


In [11]:
np.arange(5, dtype=float)


Out[11]:
array([ 0.,  1.,  2.,  3.,  4.])

Array can be copied from existing ones


In [12]:
b = a.copy ()
b


Out[12]:
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

In [13]:
print(np.zeros(10))
print(np.ones(20))


[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.]

In [14]:
print(np.zeros((3,5)))
print(np.ones((5,7)))


[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]
[[ 1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.]]

In [15]:
np.empty((2,3,2))


Out[15]:
array([[[  1.83697243e-316,   0.00000000e+000],
        [  0.00000000e+000,   0.00000000e+000],
        [  0.00000000e+000,   0.00000000e+000]],

       [[  0.00000000e+000,   0.00000000e+000],
        [  0.00000000e+000,   0.00000000e+000],
        [  0.00000000e+000,   0.00000000e+000]]])

The zeros_like and ones_like functions create a new array with the same dimensions and type of an existing one:


In [16]:
a = np.array([[1, 2, 3], [4, 5, 6]], float)
np.zeros_like(a)


Out[16]:
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

To create an identity matrix of a given size:


In [17]:
np.identity(4, dtype=float)


Out[17]:
array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.]])

The eye function returns matrices with ones along the kth diagonal:


In [18]:
np.eye(4, k=1, dtype=float)


Out[18]:
array([[ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.]])

Array Manipulation

Fill array with single value


In [19]:
a.fill(0)
a


Out[19]:
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

Transpose array:


In [20]:
a = np.array([[1, 2, 3], [4, 5, 6]], float)
print(a.transpose())
print(a.T) # shorthand
print(a.T.shape)


[[ 1.  4.]
 [ 2.  5.]
 [ 3.  6.]]
[[ 1.  4.]
 [ 2.  5.]
 [ 3.  6.]]
(3, 2)

Flatten a multidimensional array to a 1 dimensional


In [21]:
a.flatten()


Out[21]:
array([ 1.,  2.,  3.,  4.,  5.,  6.])

Concatenate 1-dimensional arrays


In [22]:
a = np.array([1,2], float)
b = np.array([3,4,5,6], float)
c = np.array([7,8,9], float)
np.concatenate((a, b, c))


Out[22]:
array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

If an array has more than one dimension, it is possible to specify the axis along which multiple arrays are concatenated. By default (without specifying the axis), NumPy concatenates along the first dimension:


In [23]:
a = np.array([[1, 2], [3, 4]], float)
b = np.array([[5, 6], [7,8]], float)
print(np.concatenate((a,b)))
print(np.concatenate((a,b), axis=0))
print(np.concatenate((a,b), axis=1))


[[ 1.  2.]
 [ 3.  4.]
 [ 5.  6.]
 [ 7.  8.]]
[[ 1.  2.]
 [ 3.  4.]
 [ 5.  6.]
 [ 7.  8.]]
[[ 1.  2.  5.  6.]
 [ 3.  4.  7.  8.]]

Finally, the dimensionality of an array can be increased using the newaxis constant in bracket notation


In [24]:
a = np.array([1, 2, 3], float)
print(a)
print (a[:, np.newaxis])
print (a[:,np.newaxis].shape)
print(a[np.newaxis,:])
print(a[np.newaxis,:].shape)


[ 1.  2.  3.]
[[ 1.]
 [ 2.]
 [ 3.]]
(3, 1)
[[ 1.  2.  3.]]
(1, 3)

The in statement can be used to test if values are present in an array:


In [25]:
print(7.1 in a3)
print(2 in a3)


True
False

Array Mathematics


In [26]:
a = np.array([1,2,3], float)
b = np.array([5,2,6], float)
print (a + b)
print (a * b)
print (b / a)
print (a % b)
print (b**a)


[ 6.  4.  9.]
[  5.   4.  18.]
[ 5.  1.  2.]
[ 1.  0.  3.]
[   5.    4.  216.]

For two-dimensional arrays, multiplication remains elementwise and does not correspond to matrix multiplication.


In [27]:
a = np.array([[1,2], [3,4]], float)
b = np.array([[2,0], [1,3]], float)
a * b


Out[27]:
array([[  2.,   0.],
       [  3.,  12.]])

Errors are thrown if arrays do not match in size


In [29]:
a = np.array([1,2,3], float)
b = np.array([4,5], float)
# a + b should throw error

NumPy offers a large library of common mathematical functions that can be applied elementwise to arrays. abs, sign, sqrt, log, log10, exp, sin, cos, tan, arcsin, arccos, arctan, sinh, cosh, tanh, arcsinh, arccosh, arctanh, floor, ceil, rint


In [30]:
a = np.array( [0, 1.5 ,2.4,3.7] )

print (np.sqrt(a))
print (np.floor(a))
print (np.sin(a))
print (np.cos(a))


[ 0.          1.22474487  1.54919334  1.92353841]
[ 0.  1.  2.  3.]
[ 0.          0.99749499  0.67546318 -0.52983614]
[ 1.          0.0707372  -0.73739372 -0.84810003]

Also included in the NumPy module are two important mathematical constants:


In [31]:
print (np.pi)
print (np.e)


3.141592653589793
2.718281828459045

Other array manipulation functions:


In [32]:
# iterate array like a list
a = np.array([1, 4, 5], int)
for x in a:
    print(x)

# loop through multi dimensional array
a = np.array([[1, 2], [3, 4], [5, 6]], float)
for x in a:
    print(x)

for x,y in a:
    print (x * y)
    
# basic array operation
print (a.sum())
print (np.sum(a))
print (a.prod())
print (np.prod(a))
print (a.min())
print (a.max())

# print the indice of the min and max value
print (a.argmin())
print (a.argmax())

# multidimensional array can specify the axis
a = np.array([[0, 2], [3, -1], [3, 5]], float)
print (a.min(axis=1))
print (a.max(axis=0))

# sort, clip
a = np.array([6, 2, 5, -1, 0, 3, 7], float)
print (sorted(a))
print (a.clip(0, 5))

# uniq
print (np.unique(a))

# diagonal
a = np.array([[0, 2], [3, -1], [3, 5]], float)
print (a.diagonal())


1
4
5
[ 1.  2.]
[ 3.  4.]
[ 5.  6.]
2.0
12.0
30.0
21.0
21.0
720.0
720.0
1.0
6.0
0
5
[ 0. -1.  3.]
[ 3.  5.]
[-1.0, 0.0, 2.0, 3.0, 5.0, 6.0, 7.0]
[ 5.  2.  5.  0.  0.  3.  5.]
[-1.  0.  2.  3.  5.  6.  7.]
[ 0. -1.]

Comparison operator and value testing


In [33]:
a = np.array([1, 3, 0], float)
b = np.array([0, 3, 2], float)
print (a > b)
print (a < b)
print (a == b)

print (np.any(a > b))
print (np.all(a > b))

a = np.array([1, 3, 0], float)
print (np.logical_and(a > 0, a < 3))
print (np.logical_not(b))
c = np.array([False, True, False], bool)
print (np.logical_or(b, c))


[ True False False]
[False False  True]
[False  True False]
True
False
[ True False False]
[ True False False]
[False  True  True]

Some testing functions


In [34]:
a = np.array([1, 3, 0], float)
print (np.where(a != 0, 1 / a, a))
a = np.array([[0, 1], [3, 0]], float)
print (a.nonzero())

a = np.array([1, np.NaN, np.Inf], float)
print (np.isnan(a))
print (np.isfinite(a))


[ 1.          0.33333333  0.        ]
(array([0, 1]), array([1, 0]))
[False  True False]
[ True False False]
/opt/conda/lib/python3.5/site-packages/ipykernel/__main__.py:2: RuntimeWarning: divide by zero encountered in true_divide
  from ipykernel import kernelapp as app

Array item selection:


In [35]:
a = np.array([[6, 4], [5, 9]], float)
print (a[a >= 6])


[ 6.  9.]

Array selection using integer array


In [36]:
a = np.array([2, 4, 6, 8], float)
b = np.array([0, 0, 1, 3, 2, 1], int)
print (a[b])


[ 2.  2.  4.  8.  6.  4.]

Take and put can be used as well


In [37]:
a = np.array([2, 4, 6, 8], float)
b = np.array([0, 0, 1, 3, 2, 1], int)
print (a.take(b))

a = np.array([0, 1, 2, 3, 4, 5], float)
b = np.array([9, 8, 7], float)
a.put([0, 3], b)
print (a)


[ 2.  2.  4.  8.  6.  4.]
[ 9.  1.  2.  8.  4.  5.]

Scipy

scipy can create polynomial


In [38]:
p = cp.poly1d([3,4,5])
print(p)


   2
3 x + 4 x + 5

In [39]:
print (p(1))
print (p([2,3]))


12
[25 44]

Do integral and derivative


In [40]:
print(p.integ())
print(p.deriv())


   3     2
1 x + 2 x + 5 x
 
6 x + 4

Statistics

numpy provide basic statistics function: mean, var, std, cov


In [41]:
a = np.random.randn(100)

print(np.mean(a))
print(np.var(a))
print(np.std(a))
print(np.cov(a))


0.0422058513763
0.994352119313
0.997172061037
1.0043960801145746

The median can be found:


In [42]:
np.median(a)


Out[42]:
0.051614233806788805

The covariance for data can be found for multiple variables:


In [43]:
a = np.array([[1,2,3,4,5], [6,5,2,2,2]], float)
print(np.cov(a))


[[ 2.5  -2.75]
 [-2.75  3.8 ]]

scipy provide more advanced functions


In [44]:
# Create a normal distribution with mean 0.5, variance 1
import scipy.stats
rv = cp.stats.norm()
print (rv.stats())
print (rv.mean(), rv.std(), rv.var())

# get cdf, pdf of a particular point
print(rv.pdf(0))
print(rv.cdf(0))


(array(0.0), array(1.0))
0.0 1.0 1.0
0.398942280401
0.5

Random Numbers

Set random seed


In [45]:
np.random.seed(213412)

An array of random numbers in the half-open interval [0.0, 1.0) can be generated:


In [46]:
np.random.randn ()
print (np.random.random())
print( np.random.randint(5, 10))
print ( np.random.rand(5))
print (np.random.rand (2,4))


0.33649210807218777
5
[ 0.35934662  0.06707364  0.10821685  0.71789385  0.70999135]
[[ 0.79202933  0.77513257  0.90432356  0.84152773]
 [ 0.01240178  0.22017021  0.14227512  0.52863095]]

NumPy also includes generators for many other distributions: Beta, binomial, chi-square, Dirichlet, exponential, F, Gamma, geometric, Gumbel, hypergeometric, Laplace, logistic, lognormal, logarithmic, multinomial, multivariate, negative binomial, noncentral chi-square, noncentral F, normal, Pareto, Poisson, power, Rayleigh, Cauchy, student's t, triangular, von Mises, Wald, Weibull, and Zipf distributions


In [47]:
print(np.random.poisson(6.0))
print (np.random.normal())
print ( np.random.normal(1.5, 4.0))
print ( np.random.normal(size=5))


10
0.9783124325897614
6.218375298348087
[ 1.09509914 -1.31193193 -0.26995144 -1.58766924  0.21740938]

The random module can also be used to randomly shuffle the order of items in a list


In [48]:
a = list(range(10))
np.random.shuffle(a)
a


Out[48]:
[9, 6, 3, 0, 4, 7, 8, 2, 5, 1]