Need for a faster `array`

We know how lists work in Python, and that a list can hold items of different data types. As a result, the storage needed for each element can vary in size, which makes element access slow and operations over the whole list time-consuming. numpy provides an elegant solution in the form of `ndarray`, an $n$-dimensional collection of elements that all share the same data type. numpy also provides easier ways to manipulate arrays, which makes high-performance numerical calculation possible.
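
A minimal sketch of the difference, assuming numpy is installed and imported under its conventional alias `np`; the variable names are illustrative only. It shows that an `ndarray` stores every element with one fixed-size data type, and that an operation can be applied to the whole array without an explicit Python loop.

```python
import numpy as np

values = [1, 2, 3, 4]            # plain Python list
arr = np.array(values)           # ndarray with a single dtype for all elements

# Every element occupies the same number of bytes.
total_bytes = arr.size * arr.itemsize
print(arr.dtype, arr.itemsize, arr.nbytes == total_bytes)

# Element-wise arithmetic: list comprehension vs. one array expression.
doubled_list = [2 * v for v in values]
doubled_arr = 2 * arr
print(doubled_list, doubled_arr)
```

The exact integer dtype printed (and therefore `itemsize`) depends on the platform and numpy version, so no specific value is assumed here.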

Importing `numpy`

The following is the standard statement to import numpy. In future examples and library usages, we assume that you have imported the library in this way.



In [1]:

    
import numpy as np

Creating `ndarray` from Lists

numpy allows us to create an array from an existing Python list. If the input list contains elements of multiple datatypes, they are converted to a single common datatype. Datatypes are always promoted to the more general type. Let's look at some examples.



In [2]:

    
a = np.array([1,2,3,4])
a









    Out[2]:





array([1, 2, 3, 4])



In [3]:

    
b = np.array([[1],[2],[3]])
b









    Out[3]:





array([[1],
       [2],
       [3]])



In [4]:

    
c = np.array([1,2,'x'])
c









    Out[4]:





array(['1', '2', 'x'],
      dtype='<U21')

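Note how int is converted into str. The dtype `<U21` denotes a little-endian Unicode string of at most 21 characters, not a number of bits. numpy picks a width large enough to hold the text form of any integer of the default integer type, so the exact number can differ between platforms.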



In [5]:

    
d = np.array([1.3,1,3])
d









    Out[5]:





array([ 1.3,  1. ,  3. ])

Note how int is converted to float.
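
The promotion rules above can be checked by inspecting the `dtype` attribute of each array. The sketch below is illustrative only (the variable names are not from the original notebook) and also shows the `dtype` argument of `np.array`, which requests a datatype explicitly instead of relying on promotion.

```python
import numpy as np

mixed_text = np.array([1, 2, 'x'])          # int + str   -> Unicode string dtype
mixed_real = np.array([1.3, 1, 3])          # float + int -> float dtype
forced = np.array([1, 2, 3], dtype=float)   # explicit dtype requested

print(mixed_text.dtype.kind)   # 'U' stands for a Unicode string dtype
print(mixed_real.dtype)        # float64
print(forced)                  # the ints are stored as floats
```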

Accessing array elements and random shuffling

Array elements can be accessed using indices, slices, and boolean masks.

Let's create an array and then illustrate the methods of accessing its elements.



In [6]:

    
x = np.arange(20)  # Like range(), but returns an ndarray instead
x









    Out[6]:





array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])



In [7]:

    
x.shape # tuple with the size of each dimension; (rows, cols) for a 2-D array









    Out[7]:





(20,)



In [8]:

    
x.shape = (4,5) # 4 rows 5 cols
x









    Out[8]:





array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])



In [9]:

    
x.size  # Total number of elements









    Out[9]:





20
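
As a small illustration of how `shape` and `size` relate, the sketch below uses the `reshape` method, which returns a reshaped array instead of modifying `shape` in place. The names `v`, `m` and `auto` are illustrative and not part of the notebook.

```python
import numpy as np

v = np.arange(20)          # 1-D array: shape is the one-element tuple (20,)
m = v.reshape(4, 5)        # 2-D arrangement of the same 20 elements
auto = v.reshape(4, -1)    # -1 lets numpy infer the missing dimension

print(v.ndim, v.shape)
print(m.ndim, m.shape, m.size)

try:
    v.reshape(3, 7)        # 3 * 7 != 20, so this cannot work
except ValueError as err:
    print("reshape failed:", err)
```

The product of the entries of `shape` always equals `size`, which is why the last call raises `ValueError`.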



In [10]:

    
np.random.shuffle(x) # shuffles ndarray in-place
x









    Out[10]:





array([[ 5,  6,  7,  8,  9],
       [ 0,  1,  2,  3,  4],
       [15, 16, 17, 18, 19],
       [10, 11, 12, 13, 14]])

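This is how we can shuffle an array. The `np.random.shuffle()` function takes an ndarray as an argument and shuffles it in place. For a multi-dimensional array, only the order along the first axis (here, the rows) changes. The function returns `None`, so NEVER treat its return value as the result!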
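
A short sketch of this behaviour, with illustrative variable names. It also uses `np.random.permutation`, which returns a shuffled copy and may be more convenient when the original order must be kept.

```python
import numpy as np

x = np.arange(20).reshape(4, 5)

result = np.random.shuffle(x)      # reorders the rows of x in place
print(result)                      # None: there is nothing useful to assign

# Rows are moved as whole units; the elements inside each row keep their order.
rows = sorted(row.tolist() for row in x)

# np.random.permutation returns a shuffled copy and leaves its argument alone.
original = np.arange(5)
shuffled_copy = np.random.permutation(original)
print(original, shuffled_copy)
```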



In [11]:

    
x[0]









    Out[11]:





array([5, 6, 7, 8, 9])



In [12]:

    
x[0][2]  # Ok, inefficient









    Out[12]:





7

The above method is inefficient because it first fetches `x[0]` and then accesses the element at index 2 of that result. The next method computes the address directly from the two coordinates and fetches the element in a single access.



In [13]:

    
x[0,2]   # Efficient









    Out[13]:





7
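
To make the two access paths explicit, the sketch below splits the chained form into its two steps. It uses a fresh, unshuffled array, so the value differs from the one in the cells above; the names are illustrative.

```python
import numpy as np

x = np.arange(20).reshape(4, 5)

row = x[0]                    # step 1 of x[0][2]: an intermediate 1-D array
chained = row[2]              # step 2: index into that intermediate array
direct = x[0, 2]              # one indexing operation with both coordinates

print(chained == direct)
print(np.shares_memory(row, x))   # the intermediate row is a view on x
```

Both forms return the same element; the chained form simply creates an extra array object along the way.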



In [14]:

    
x[0,1:4]









    Out[14]:





array([6, 7, 8])

The above example selects the elements at indices (0,1), (0,2) and (0,3). Note that slices can also be used to select elements from a multi-dimensional array.



In [15]:

    
x[1:4,0]









    Out[15]:





array([ 0, 15, 10])
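
One point worth knowing when slicing: a slice of an `ndarray` is a view on the same data, not a copy. The sketch below, on a fresh unshuffled array with illustrative names, shows a write through a slice and the use of `copy()` to avoid it.

```python
import numpy as np

x = np.arange(20).reshape(4, 5)

col = x[1:4, 0]          # rows 1..3 of column 0, as a view
col[0] = -1              # writes through to x[1, 0]

safe = x[0, 1:4].copy()  # independent copy of three elements
safe[0] = 99             # x is not affected

print(x[1, 0], x[0, 1])
```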



In [16]:

    
x > 15









    Out[16]:





array([[False, False, False, False, False],
       [False, False, False, False, False],
       [False,  True,  True,  True,  True],
       [False, False, False, False, False]], dtype=bool)

Note that the comparison returned a boolean array of the same shape as `x`. Such an array is called a boolean mask.



In [17]:

    
x [ x > 15 ]









    Out[17]:





array([16, 17, 18, 19])

This method of accessing array elements is called boolean (mask) indexing.
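
Conditions can be combined element-wise with `&` (and) and `|` (or), and a mask can also appear on the left-hand side of an assignment. A minimal sketch on a fresh unshuffled array, with illustrative names:

```python
import numpy as np

x = np.arange(20).reshape(4, 5)

mask = (x > 5) & (x % 2 == 0)   # element-wise AND; the parentheses are required
selected = x[mask]              # 1-D array of the matching elements

y = x.copy()
y[y > 15] = 0                   # a mask can also be used for assignment

print(selected)
print(y)
```

The result of selecting with a mask is always one-dimensional, because the matching elements need not form a rectangle.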

Functions that operate on `ndarray`s

numpy provides many mathematical functions that operate not only on individual numbers but also on entire arrays. Let's illustrate them.



In [18]:

    
np.sin(x)









    Out[18]:





array([[-0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849],
       [ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ],
       [ 0.65028784, -0.28790332, -0.96139749, -0.75098725,  0.14987721],
       [-0.54402111, -0.99999021, -0.53657292,  0.42016704,  0.99060736]])



In [19]:

    
x[np.sin(x) > 0] # elements whose sine is positive









    Out[19]:





array([ 7,  8,  9,  1,  2,  3, 15, 19, 13, 14])

Many trigonometric functions such as $\sin$ and $\cos$, as well as calculus-related functions such as `np.gradient`, are also available.
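
As a brief sketch of such functions applied element-wise, the example below checks the identity $\sin^2 t + \cos^2 t = 1$ on a whole array and estimates a derivative with `np.gradient`. The sample data is illustrative only.

```python
import numpy as np

t = np.arange(5)                              # [0, 1, 2, 3, 4]
identity = np.sin(t) ** 2 + np.cos(t) ** 2    # should be 1 everywhere

squares = t.astype(float) ** 2                # [0, 1, 4, 9, 16]
slope = np.gradient(squares)                  # finite-difference estimate of d/dt
print(np.allclose(identity, 1.0))
print(slope)
```

`np.gradient` uses central differences in the interior and one-sided differences at the two ends, so the end values are less accurate than the interior ones.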

`concatenate` the arrays

concatenate((a1, a2, ...), axis=0): Join a sequence of arrays along an existing axis.

Parameters:

a1, a2, ... : sequence of array_like
The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).
axis : int, optional
The axis along which the arrays will be joined. Default is 0.
Returns:
res : ndarray
The concatenated array.

hstack((a1, a2, ...)) combines a1, a2, ... horizontally
vstack((a1, a2, ...)) combines a1, a2, ... vertically
dstack((a1, a2, ...)) combines a1, a2, ... depthwise



In [20]:

    
a = np.array([1,2,3,4])
b = np.array([9,8,7,6])



In [21]:

    
a









    Out[21]:





array([1, 2, 3, 4])



In [22]:

    
b









    Out[22]:





array([9, 8, 7, 6])



In [23]:

    
np.concatenate((a,b),axis=0) # (a,b) is a tuple of arrays









    Out[23]:





array([1, 2, 3, 4, 9, 8, 7, 6])



In [24]:

    
np.dstack((a,b))









    Out[24]:





array([[[1, 9],
        [2, 8],
        [3, 7],
        [4, 6]]])



In [25]:

    
np.vstack((a,b))









    Out[25]:





array([[1, 2, 3, 4],
       [9, 8, 7, 6]])



In [26]:

    
np.hstack((a,b))









    Out[26]:





array([1, 2, 3, 4, 9, 8, 7, 6])

We will use these functions frequently in upcoming chapters.
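
The effect of the `axis` parameter is easier to see with 2-D inputs. The following sketch uses two small illustrative arrays and compares `concatenate` with the stacking helpers.

```python
import numpy as np

p = np.array([[1, 2], [3, 4]])
q = np.array([[5, 6], [7, 8]])

rows_joined = np.concatenate((p, q), axis=0)   # shape (4, 2)
cols_joined = np.concatenate((p, q), axis=1)   # shape (2, 4)
depth_joined = np.dstack((p, q))               # shape (2, 2, 2)

print(rows_joined.shape, cols_joined.shape, depth_joined.shape)
```

For 2-D arrays, `vstack` gives the same result as `concatenate` with `axis=0`, and `hstack` the same as `axis=1`.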

Aggregate Functions

Aggregate functions are those which operate on an entire array to provide a summary of its elements.

Functions like sum and average fall in this category.

We will use the array below to illustrate the usage of aggregate functions.



In [27]:

    
s = np.sin(x)
s









    Out[27]:





array([[-0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849],
       [ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ],
       [ 0.65028784, -0.28790332, -0.96139749, -0.75098725,  0.14987721],
       [-0.54402111, -0.99999021, -0.53657292,  0.42016704,  0.99060736]])



In [28]:

    
np.sum(s) # sum of all elements of s









    Out[28]:





0.085276633692154657



In [29]:

    
np.average(s)









    Out[29]:





0.0042638316846077325



In [30]:

    
np.min(x)









    Out[30]:





0



In [31]:

    
np.max(s)









    Out[31]:





0.99060735569487035
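
Aggregate functions also accept an `axis` argument to summarise along one dimension instead of over the whole array. A minimal sketch on a fresh unshuffled array, with illustrative names:

```python
import numpy as np

x = np.arange(20).reshape(4, 5)

total = np.sum(x)                    # all 20 elements
col_sums = np.sum(x, axis=0)         # collapse the rows: one sum per column
row_sums = np.sum(x, axis=1)         # collapse the columns: one sum per row
row_means = np.average(x, axis=1)    # one average per row

print(total, col_sums, row_sums, row_means)
```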

We will stop at this point. This basic understanding of numpy is enough to follow the concepts of Algorithm Analysis in the upcoming part.

Interested readers can refer to the official NumPy tutorial at SciPy.
