Numpy and Scipy Tutorial

Numpy and Scipy are the most common Python package for mathematical and numerical routines in precompiled, fast functions. The NumPy package provides basic routines for manipulating large arrays and matrices of numeric data. The SciPy package extends the functionality of NumPy with a substantial collection of useful algorithms, like minimization, Fourier transformation, regression, and other applied mathematical techniques.

References

Scipy tutorial
Python for Data Analysis Data Wrangling with Pandas, NumPy, and IPython

First we use the following convention when importing numpy and scipy



In [1]:

    
import numpy as np
import scipy as cp

ndarray: N-dimensional array

We can create a ndarray from a Python list like this



In [2]:

    
a = np.array([1,2,3,4])
b = np.array([5,6])
print(a)
print(b)

Each array has its own type, data type and shape:



In [3]:

    
print(type(a))
print(a.dtype)
a.shape









    



<class 'numpy.ndarray'>
int64






    Out[3]:





(4,)

In this case, our array is a 1-dimensional array, thus the shape (4,). Surely we can have multi-dimensional array like this:



In [4]:

    
a2 = np.array ([[1,2,3], [4,5,6]])
print(a2.shape)
print(len(a2))
print(a2.ndim)

Note that len function returns the first dimension. ndim property returns the number of dimensions. If you want to get the number of elements in an array, you can use the size property.



In [5]:

    
a2.size









    Out[5]:





6

You can specify the type of the array elements in the constructor like this:



In [6]:

    
a3 = np.array([4.5, 7.1, 6.2], float)
a3.dtype









    Out[6]:





dtype('float64')

You can convert array type from one type to another:



In [7]:

    
print(a3.astype(str))









    



['4.5' '7.1' '6.2']

You can reshape array using tuples that define new dimension



In [8]:

    
a = np.array(range(10), float)
a = a.reshape((5, 2))
a









    Out[8]:





array([[ 0.,  1.],
       [ 2.,  3.],
       [ 4.,  5.],
       [ 6.,  7.],
       [ 8.,  9.]])

Convert Array to other data structure

Convert to list:



In [9]:

    
a.tolist()









    Out[9]:





[[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]]

Convert to binary string



In [10]:

    
binary_string = a.tostring()
print(binary_string)
a = np.fromstring(binary_string)









    



b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\x00@\x00\x00\x00\x00\x00\x00\x08@\x00\x00\x00\x00\x00\x00\x10@\x00\x00\x00\x00\x00\x00\x14@\x00\x00\x00\x00\x00\x00\x18@\x00\x00\x00\x00\x00\x00\x1c@\x00\x00\x00\x00\x00\x00 @\x00\x00\x00\x00\x00\x00"@'

Other functions to create array

The arange function is similar to the range function but returns an array:



In [11]:

    
np.arange(5, dtype=float)









    Out[11]:





array([ 0.,  1.,  2.,  3.,  4.])

Array can be copied from existing ones



In [12]:

    
b = a.copy ()
b









    Out[12]:





array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])



In [13]:

    
print(np.zeros(10))
print(np.ones(20))









    



[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.
  1.  1.]



In [14]:

    
print(np.zeros((3,5)))
print(np.ones((5,7)))









    



[[ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.]]
[[ 1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.]]



In [15]:

    
np.empty((2,3,2))









    Out[15]:





array([[[  1.83697243e-316,   0.00000000e+000],
        [  0.00000000e+000,   0.00000000e+000],
        [  0.00000000e+000,   0.00000000e+000]],

       [[  0.00000000e+000,   0.00000000e+000],
        [  0.00000000e+000,   0.00000000e+000],
        [  0.00000000e+000,   0.00000000e+000]]])

The zeros_like and ones_like functions create a new array with the same dimensions and type of an existing one:



In [16]:

    
a = np.array([[1, 2, 3], [4, 5, 6]], float)
np.zeros_like(a)









    Out[16]:





array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

To create an identity matrix of a given size:



In [17]:

    
np.identity(4, dtype=float)









    Out[17]:





array([[ 1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.]])

The eye function returns matrices with ones along the kth diagonal:



In [18]:

    
np.eye(4, k=1, dtype=float)









    Out[18]:





array([[ 0.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.]])

Array Manipulation

Fill array with single value



In [19]:

    
a.fill(0)
a









    Out[19]:





array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

Transpose array:



In [20]:

    
a = np.array([[1, 2, 3], [4, 5, 6]], float)
print(a.transpose())
print(a.T) # shorthand
print(a.T.shape)









    



[[ 1.  4.]
 [ 2.  5.]
 [ 3.  6.]]
[[ 1.  4.]
 [ 2.  5.]
 [ 3.  6.]]
(3, 2)

Flatten a multidimensional array to a 1 dimensional



In [21]:

    
a.flatten()









    Out[21]:





array([ 1.,  2.,  3.,  4.,  5.,  6.])

Concatenate 1-dimensional arrays



In [22]:

    
a = np.array([1,2], float)
b = np.array([3,4,5,6], float)
c = np.array([7,8,9], float)
np.concatenate((a, b, c))









    Out[22]:





array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

If an array has more than one dimension, it is possible to specify the axis along which multiple arrays are concatenated. By default (without specifying the axis), NumPy concatenates along the first dimension:



In [23]:

    
a = np.array([[1, 2], [3, 4]], float)
b = np.array([[5, 6], [7,8]], float)
print(np.concatenate((a,b)))
print(np.concatenate((a,b), axis=0))
print(np.concatenate((a,b), axis=1))









    



[[ 1.  2.]
 [ 3.  4.]
 [ 5.  6.]
 [ 7.  8.]]
[[ 1.  2.]
 [ 3.  4.]
 [ 5.  6.]
 [ 7.  8.]]
[[ 1.  2.  5.  6.]
 [ 3.  4.  7.  8.]]

Finally, the dimensionality of an array can be increased using the newaxis constant in bracket notation



In [24]:

    
a = np.array([1, 2, 3], float)
print(a)
print (a[:, np.newaxis])
print (a[:,np.newaxis].shape)
print(a[np.newaxis,:])
print(a[np.newaxis,:].shape)









    



[ 1.  2.  3.]
[[ 1.]
 [ 2.]
 [ 3.]]
(3, 1)
[[ 1.  2.  3.]]
(1, 3)

The in statement can be used to test if values are present in an array:



In [25]:

    
print(7.1 in a3)
print(2 in a3)









    



True
False

Array Mathematics



In [26]:

    
a = np.array([1,2,3], float)
b = np.array([5,2,6], float)
print (a + b)
print (a * b)
print (b / a)
print (a % b)
print (b**a)









    



[ 6.  4.  9.]
[  5.   4.  18.]
[ 5.  1.  2.]
[ 1.  0.  3.]
[   5.    4.  216.]

For two-dimensional arrays, multiplication remains elementwise and does not correspond to matrix multiplication.



In [27]:

    
a = np.array([[1,2], [3,4]], float)
b = np.array([[2,0], [1,3]], float)
a * b









    Out[27]:





array([[  2.,   0.],
       [  3.,  12.]])

Errors are thrown if arrays do not match in size



In [29]:

    
a = np.array([1,2,3], float)
b = np.array([4,5], float)
# a + b should throw error

NumPy offers a large library of common mathematical functions that can be applied elementwise to arrays. abs, sign, sqrt, log, log10, exp, sin, cos, tan, arcsin, arccos, arctan, sinh, cosh, tanh, arcsinh, arccosh, arctanh, floor, ceil, rint



In [30]:

    
a = np.array( [0, 1.5 ,2.4,3.7] )

print (np.sqrt(a))
print (np.floor(a))
print (np.sin(a))
print (np.cos(a))









    



[ 0.          1.22474487  1.54919334  1.92353841]
[ 0.  1.  2.  3.]
[ 0.          0.99749499  0.67546318 -0.52983614]
[ 1.          0.0707372  -0.73739372 -0.84810003]

Also included in the NumPy module are two important mathematical constants:



In [31]:

    
print (np.pi)
print (np.e)









    



3.141592653589793
2.718281828459045

Other array manipulation functions:



In [32]:

    
# iterate array like a list
a = np.array([1, 4, 5], int)
for x in a:
    print(x)

# loop through multi dimensional array
a = np.array([[1, 2], [3, 4], [5, 6]], float)
for x in a:
    print(x)

for x,y in a:
    print (x * y)
    
# basic array operation
print (a.sum())
print (np.sum(a))
print (a.prod())
print (np.prod(a))
print (a.min())
print (a.max())

# print the indice of the min and max value
print (a.argmin())
print (a.argmax())

# multidimensional array can specify the axis
a = np.array([[0, 2], [3, -1], [3, 5]], float)
print (a.min(axis=1))
print (a.max(axis=0))

# sort, clip
a = np.array([6, 2, 5, -1, 0, 3, 7], float)
print (sorted(a))
print (a.clip(0, 5))

# uniq
print (np.unique(a))

# diagonal
a = np.array([[0, 2], [3, -1], [3, 5]], float)
print (a.diagonal())









    



1
4
5
[ 1.  2.]
[ 3.  4.]
[ 5.  6.]
2.0
12.0
30.0
21.0
21.0
720.0
720.0
1.0
6.0
0
5
[ 0. -1.  3.]
[ 3.  5.]
[-1.0, 0.0, 2.0, 3.0, 5.0, 6.0, 7.0]
[ 5.  2.  5.  0.  0.  3.  5.]
[-1.  0.  2.  3.  5.  6.  7.]
[ 0. -1.]

Comparison operator and value testing



In [33]:

    
a = np.array([1, 3, 0], float)
b = np.array([0, 3, 2], float)
print (a > b)
print (a < b)
print (a == b)

print (np.any(a > b))
print (np.all(a > b))

a = np.array([1, 3, 0], float)
print (np.logical_and(a > 0, a < 3))
print (np.logical_not(b))
c = np.array([False, True, False], bool)
print (np.logical_or(b, c))









    



[ True False False]
[False False  True]
[False  True False]
True
False
[ True False False]
[ True False False]
[False  True  True]

Some testing functions



In [34]:

    
a = np.array([1, 3, 0], float)
print (np.where(a != 0, 1 / a, a))
a = np.array([[0, 1], [3, 0]], float)
print (a.nonzero())

a = np.array([1, np.NaN, np.Inf], float)
print (np.isnan(a))
print (np.isfinite(a))









    



[ 1.          0.33333333  0.        ]
(array([0, 1]), array([1, 0]))
[False  True False]
[ True False False]






    



/opt/conda/lib/python3.5/site-packages/ipykernel/__main__.py:2: RuntimeWarning: divide by zero encountered in true_divide
  from ipykernel import kernelapp as app

Array item selection:



In [35]:

    
a = np.array([[6, 4], [5, 9]], float)
print (a[a >= 6])









    



[ 6.  9.]

Array selection using integer array



In [36]:

    
a = np.array([2, 4, 6, 8], float)
b = np.array([0, 0, 1, 3, 2, 1], int)
print (a[b])









    



[ 2.  2.  4.  8.  6.  4.]

Take and put can be used as well



In [37]:

    
a = np.array([2, 4, 6, 8], float)
b = np.array([0, 0, 1, 3, 2, 1], int)
print (a.take(b))

a = np.array([0, 1, 2, 3, 4, 5], float)
b = np.array([9, 8, 7], float)
a.put([0, 3], b)
print (a)









    



[ 2.  2.  4.  8.  6.  4.]
[ 9.  1.  2.  8.  4.  5.]

Scipy

scipy can create polynomial



In [38]:

    
p = cp.poly1d([3,4,5])
print(p)









    



   2
3 x + 4 x + 5



In [39]:

    
print (p(1))
print (p([2,3]))

Do integral and derivative



In [40]:

    
print(p.integ())
print(p.deriv())









    



   3     2
1 x + 2 x + 5 x
 
6 x + 4

Statistics

numpy provide basic statistics function: mean, var, std, cov



In [41]:

    
a = np.random.randn(100)

print(np.mean(a))
print(np.var(a))
print(np.std(a))
print(np.cov(a))









    



0.0422058513763
0.994352119313
0.997172061037
1.0043960801145746

The median can be found:



In [42]:

    
np.median(a)









    Out[42]:





0.051614233806788805

The covariance for data can be found for multiple variables:



In [43]:

    
a = np.array([[1,2,3,4,5], [6,5,2,2,2]], float)
print(np.cov(a))









    



[[ 2.5  -2.75]
 [-2.75  3.8 ]]

scipy provide more advanced functions



In [44]:

    
# Create a normal distribution with mean 0.5, variance 1
import scipy.stats
rv = cp.stats.norm()
print (rv.stats())
print (rv.mean(), rv.std(), rv.var())

# get cdf, pdf of a particular point
print(rv.pdf(0))
print(rv.cdf(0))









    



(array(0.0), array(1.0))
0.0 1.0 1.0
0.398942280401
0.5

Random Numbers

Set random seed



In [45]:

    
np.random.seed(213412)

An array of random numbers in the half-open interval [0.0, 1.0) can be generated:



In [46]:

    
np.random.randn ()
print (np.random.random())
print( np.random.randint(5, 10))
print ( np.random.rand(5))
print (np.random.rand (2,4))









    



0.33649210807218777
5
[ 0.35934662  0.06707364  0.10821685  0.71789385  0.70999135]
[[ 0.79202933  0.77513257  0.90432356  0.84152773]
 [ 0.01240178  0.22017021  0.14227512  0.52863095]]

NumPy also includes generators for many other distributions: Beta, binomial, chi-square, Dirichlet, exponential, F, Gamma, geometric, Gumbel, hypergeometric, Laplace, logistic, lognormal, logarithmic, multinomial, multivariate, negative binomial, noncentral chi-square, noncentral F, normal, Pareto, Poisson, power, Rayleigh, Cauchy, student's t, triangular, von Mises, Wald, Weibull, and Zipf distributions



In [47]:

    
print(np.random.poisson(6.0))
print (np.random.normal())
print ( np.random.normal(1.5, 4.0))
print ( np.random.normal(size=5))









    



10
0.9783124325897614
6.218375298348087
[ 1.09509914 -1.31193193 -0.26995144 -1.58766924  0.21740938]

The random module can also be used to randomly shuffle the order of items in a list



In [48]:

    
a = list(range(10))
np.random.shuffle(a)
a









    Out[48]:





[9, 6, 3, 0, 4, 7, 8, 2, 5, 1]