What is Numpy

NumPy is the fundamental package for scientific computing with Python. It provides high-performance vector, matrix and higher-dimensional data structures for Python. It is implemented in C and Fortran, so when calculations are vectorized, performance is very good.

So, in a nutshell:

a powerful Python extension for N-dimensional array
a tool for integrating C/C++ and Fortran code
designed for scientific computation: linear algebra and Signal Analysis

If you are a MATLAB® user, I recommend reading Numpy for MATLAB Users.

I'm a supporter of the Open Science Movement, so I humbly suggest that you take a look at the Science Code Manifesto.

Getting Started with Numpy Arrays

NumPy's main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type.

In Numpy dimensions are called axes.

The number of axes is called rank.

The most important attributes of an ndarray object are:

ndarray.ndim - the number of axes (dimensions) of the array.
ndarray.shape - the dimensions of the array. For a matrix with n rows and m columns, shape will be (n,m).
ndarray.size - the total number of elements of the array.
ndarray.dtype - an object describing the type of the elements in the array; numpy.int32, numpy.int16, and numpy.float64 are some examples.
ndarray.itemsize - the size in bytes of each element of the array. For example, elements of type float64 have itemsize 8 (=64/8).
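
As a quick illustration of the attributes listed above, the following sketch builds a small 2x3 array and prints each attribute. The name `demo` and its values are only illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

# a small 2x3 array of 64-bit floats (values chosen only for illustration)
demo = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]], dtype=np.float64)

print(demo.ndim)      # number of axes: 2
print(demo.shape)     # (rows, columns): (2, 3)
print(demo.size)      # total number of elements: 6
print(demo.dtype)     # float64
print(demo.itemsize)  # bytes per element: 8
```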

To use NumPy, we first need to import the module, for example:



In [2]:

    
import numpy as np  # naming import convention

Terminology Assumption

In the numpy package the terminology used for vectors, matrices and higher-dimensional data sets is array.

Reference Documentation

On the web: http://docs.scipy.org/
Interactive help:



In [ ]:

    
np.array?

If you're looking for something

Creating `numpy` arrays

Get acquainted with NumPy

Let's start by creating some numpy.array objects to get hands-on with the details of NumPy's basic data structure.

NumPy is a very flexible library, and provides many ways to create (and initialize) new numpy arrays.

One way is to use specific functions dedicated to generating numpy arrays (usually, arrays of numbers)[+]

[+] More on data types, later on !-)

First `numpy array` example: array of numbers

NumPy provides many functions to generate arrays with specific properties (e.g. size or shape).

Later, we will see examples in which we generate an ndarray from explicit Python lists.

However, for larger arrays, using Python lists is simply impractical.

`np.arange`

In standard Python, we use the range function to generate an iterable object of integers within a specific range (at a specified step, default: 1)



In [3]:

    
r = range(10)
print(list(r))

print(type(r))  # NOTE: if this prints <type 'list'>, you are running Python 2.7









    



[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
<class 'range'>

Similarly, in numpy there is the arange function which instead generates a numpy.ndarray



In [4]:

    
ra = np.arange(10) 
print(ra)

print(type(ra))









    



[0 1 2 3 4 5 6 7 8 9]
<class 'numpy.ndarray'>

However, we are working with the Numerical Python library, so we should expect more when it comes to numbers.

In fact, we can create an array within a floating point step-wise range:



In [5]:

    
# floating point step-wise range generation
raf = np.arange(-1, 1, 0.1)  
print(raf)









    



[-1.00000000e+00 -9.00000000e-01 -8.00000000e-01 -7.00000000e-01
 -6.00000000e-01 -5.00000000e-01 -4.00000000e-01 -3.00000000e-01
 -2.00000000e-01 -1.00000000e-01 -2.22044605e-16  1.00000000e-01
  2.00000000e-01  3.00000000e-01  4.00000000e-01  5.00000000e-01
  6.00000000e-01  7.00000000e-01  8.00000000e-01  9.00000000e-01]
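
The value -2.22044605e-16 in the output above, where one might expect an exact 0, is the result of accumulated floating-point rounding. The sketch below shows one possible way to handle this, assuming the goal is a grid with a known number of points: np.linspace takes the number of samples rather than a step, and np.allclose / np.isclose compare values up to a tolerance. The names `grid_arange` and `grid_linspace` are hypothetical.

```python
import numpy as np

grid_arange = np.arange(-1, 1, 0.1)                      # step-based, endpoint excluded
grid_linspace = np.linspace(-1, 1, 20, endpoint=False)   # count-based, same grid

print(len(grid_arange), len(grid_linspace))

# The two grids agree up to rounding error, so compare with a tolerance
print(np.allclose(grid_arange, grid_linspace))

# The entry near zero is tiny, but not necessarily exactly 0.0
print(np.isclose(grid_arange[10], 0.0, atol=1e-12))
```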

Properties of `numpy array`

Apart from the actual content, which is of course different because specified ranges are different, the ra and raf arrays differ by their dtype:



In [6]:

    
print(f"dtype of 'ra': {ra.dtype}, dtype of 'raf': {raf.dtype}")









    



dtype of 'ra': int64, dtype of 'raf': float64

More properties of the `numpy array`



In [7]:

    
ra.itemsize # bytes per element









    Out[7]:





8



In [8]:

    
ra.nbytes # number of bytes









    Out[8]:





80



In [9]:

    
ra.ndim # number of dimensions









    Out[9]:





1



In [10]:

    
ra.shape # shape, i.e. number of elements per-dimension/axis









    Out[10]:





(10,)



In [ ]:

    
## please replicate the same set of operations here for `raf`



In [ ]:

    
# your code here

Q: Do you notice any relevant difference?

`np.linspace` and `np.logspace`

Besides np.arange, numpy provides two other "similar" functions:

np.linspace
np.logspace

Looking at the examples below, can you spot the difference?



In [11]:

    
np.linspace(0, 10, 20)









    Out[11]:





array([ 0.        ,  0.52631579,  1.05263158,  1.57894737,  2.10526316,
        2.63157895,  3.15789474,  3.68421053,  4.21052632,  4.73684211,
        5.26315789,  5.78947368,  6.31578947,  6.84210526,  7.36842105,
        7.89473684,  8.42105263,  8.94736842,  9.47368421, 10.        ])



In [12]:

    
np.logspace(0, np.e**2, 10, base=np.e)









    Out[12]:





array([1.00000000e+00, 2.27278564e+00, 5.16555456e+00, 1.17401982e+01,
       2.66829540e+01, 6.06446346e+01, 1.37832255e+02, 3.13263169e+02,
       7.11980032e+02, 1.61817799e+03])
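
To make the relationship between the two functions explicit, here is a minimal sketch with small, easy-to-read numbers (the names `exponents` and `powers` are illustrative): np.linspace spaces the values evenly, while np.logspace spaces the exponents evenly and then raises the base to them.

```python
import numpy as np

exponents = np.linspace(0, 3, 4)   # evenly spaced values: 0, 1, 2, 3
powers = np.logspace(0, 3, 4)      # default base=10: 10**0 ... 10**3

print(exponents)
print(powers)

# logspace is equivalent to raising the base to linspace-generated exponents
print(np.allclose(powers, 10.0 ** exponents))
```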

Random Number Generation

`np.random.rand` & `np.random.randn`



In [13]:

    
# uniform random numbers in [0, 1)
ru = np.random.rand(10)



In [14]:

    
ru









    Out[14]:





array([0.06629061, 0.56102955, 0.81081042, 0.80936217, 0.19182628,
       0.78609316, 0.88379009, 0.45329187, 0.84304588, 0.56232631])

Note: numbers and the content of the array may vary



In [15]:

    
# standard normal distributed random numbers
rs = np.random.randn(10)



In [16]:

    
rs









    Out[16]:





array([ 0.45052791, -0.80566857, -0.10401981,  0.91948746, -0.0329787 ,
       -0.71872119,  1.42738938, -0.63292836,  0.5397375 ,  0.89186053])

Note: numbers and the content of the array may vary

Q: What if I ask you to generate random numbers in a way that we both obtain the very same numbers? (Provided we share the same CPU architecture)

Zeros and Ones (or Empty)

`np.zeros`, `np.ones`, `np.empty`

Sometimes we need to initialise an array of a specific shape filled with zeros, with ones, or with whatever happens to be in memory (i.e. empty):



In [17]:

    
Z = np.zeros((3,3))

print(Z)









    



[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]



In [18]:

    
O = np.ones((3, 3))
print(O)









    



[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]



In [19]:

    
E = np.empty(10)

print(E)









    



[0.45052791 0.80566857 0.10401981 0.91948746 0.0329787  0.71872119
 1.42738938 0.63292836 0.5397375  0.89186053]



In [ ]:

    
# TRY THIS!

np.empty(9)

Other specialised Functions

Diagonal Matrices

1. `np.diag`



In [20]:

    
# a diagonal matrix
np.diag([1,2,3])









    Out[20]:





array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])



In [21]:

    
# diagonal with offset from the main diagonal
np.diag([1,2,3], k=1)









    Out[21]:





array([[0, 1, 0, 0],
       [0, 0, 2, 0],
       [0, 0, 0, 3],
       [0, 0, 0, 0]])

Identity Matrix $\mathrm{I} \mapsto$ `np.eye`



In [22]:

    
# a diagonal matrix with ones on the main diagonal
np.eye(3, dtype='int')  # 3 is the number of rows (and, by default, columns)









    Out[22]:





array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

Create `numpy.ndarray` from `list`

To create new vector or matrix arrays from Python lists we can use the numpy.array constructor function:



In [23]:

    
v = np.array([1,2,3,4])
v









    Out[23]:





array([1, 2, 3, 4])



In [24]:

    
print(type(v))









    



<class 'numpy.ndarray'>

Alternatively, there is also the np.asarray function, which converts a Python list into a numpy array:



In [25]:

    
v = np.asarray([1, 2, 3, 4])
v









    Out[25]:





array([1, 2, 3, 4])



In [26]:

    
print(type(v))









    



<class 'numpy.ndarray'>
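
For a Python list the two functions behave the same way. They differ when the input is already an ndarray, as this short sketch illustrates (the names `original`, `copied` and `same` are only for illustration):

```python
import numpy as np

original = np.array([1, 2, 3, 4])

copied = np.array(original)    # np.array copies its input by default
same = np.asarray(original)    # np.asarray returns the input itself when no conversion is needed

print(copied is original)   # False
print(same is original)     # True

original[0] = 99
print(copied[0], same[0])   # the copy still holds 1, the alias sees 99
```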

We can use the very same strategy for higher-dimensional arrays.

E.g. Let's create a matrix from a list of lists:



In [27]:

    
M = np.array([[1, 2], [3, 4]])
M









    Out[27]:





array([[1, 2],
       [3, 4]])



In [28]:

    
v.shape, M.shape









    Out[28]:





((4,), (2, 2))

So, why is it useful then?

So far the numpy.ndarray looks an awful lot like a Python list (or nested list).

Why not simply use Python lists for computations instead of creating a new array type?

There are several reasons:

Python lists are very general.
- They can contain any kind of object.
- They are dynamically typed.
- They do not support mathematical functions such as matrix and dot multiplications, etc.
- Implementing such functions for Python lists would not be very efficient because of the dynamic typing.

Numpy arrays are statically typed and homogeneous.
- The type of the elements is determined when the array is created.

Numpy arrays are memory efficient.
- Because of the static typing, mathematical functions such as multiplication and addition of numpy arrays can be implemented efficiently in a compiled language (C and Fortran are used).
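
A small sketch of the points above, using illustrative names (`values`, `arr`): the same `*` operator repeats a list but multiplies an array element-wise, and an array keeps the element type fixed at creation.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

print(values * 2)   # list repetition: [1, 2, 3, 1, 2, 3]
print(arr * 2)      # element-wise multiplication: [2 4 6]

# A list can mix types; an integer array cannot
values[0] = "one"   # fine for a list
arr[0] = 3.7        # silently truncated to 3 to fit the integer dtype
print(values, arr)
```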



In [29]:

    
L = range(100000)



In [30]:

    
%timeit [i**2 for i in L]









    



41.7 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)



In [31]:

    
a = np.arange(100000)



In [32]:

    
%timeit a**2  # This operation is called Broadcasting - more on this later!









    



92.9 µs ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)



In [33]:

    
%timeit [element**2 for element in a]









    



48.4 ms ± 18.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Exercises: DIY

Simple arrays

Create simple one and two dimensional arrays. First, redo the examples from above. And then create your own.
Use the functions len, shape and ndim on some of those arrays and observe their output.



In [ ]:

Creating arrays using functions

Experiment with arange, linspace, ones, zeros, eye and diag.
Create different kinds of arrays with random numbers.
Try setting the seed before creating an array with random values
- hint: use np.random.seed



In [ ]:

Numpy Array Object

NumPy has a multidimensional array object called ndarray. It consists of two parts as follows:

The actual data
Some metadata describing the data

The majority of array operations leave the raw data untouched. The only aspect that changes is the metadata.

Data vs Metadata (Attributes)

This internal separation between the actual data (i.e. the content of the array --> the memory) and the metadata (i.e. properties and attributes of the data) allows, for example, efficient memory management.

For example, the shape of a NumPy array can be modified without copying and/or affecting the actual data, which makes it a fast operation even for large arrays.
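
One way to check this claim is np.shares_memory, which reports whether two arrays refer to overlapping memory. The sketch below uses the hypothetical names `flat` and `grid`.

```python
import numpy as np

flat = np.arange(6)
grid = flat.reshape(2, 3)   # new shape, same underlying buffer

print(np.shares_memory(flat, grid))   # True: no data was copied

grid[0, 0] = 100            # writing through the reshaped view...
print(flat[0])              # ...is visible in the original array
```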



In [34]:

    
a = np.arange(45)

a









    Out[34]:





array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44])



In [35]:

    
a.shape









    Out[35]:





(45,)



In [36]:

    
A = a.reshape(9, 5)

A









    Out[36]:





array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44]])



In [37]:

    
n, m = A.shape



In [38]:

    
B = A.reshape((1,n*m))
B









    Out[38]:





array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
        32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]])

Q: What is the difference (in terms of shape) between B and the original a?

Flattening

Another (quite common) reshaping operation you will end up performing on n-dimensional arrays is flattening.

Flattening means collapsing all the axes into a single one.

`np.ravel`

numpy.ndarray objects have a ravel method that returns the array as a 1D vector.

Again, the original memory is unaffected: whenever possible, ravel returns a view on the same data with different metadata, and it makes a copy only when a view cannot be constructed.



In [39]:

    
A = np.array([[1, 2, 3], [4, 5, 6]])
A.ravel()









    Out[39]:





array([1, 2, 3, 4, 5, 6])

By default, np.ravel performs the operation row-wise, à la C. NumPy also supports a Fortran-style order of indices (i.e. column-major indexing):



In [40]:

    
A.ravel('F')  # order F (Fortran) is column-major, C (default) row-major









    Out[40]:





array([1, 4, 2, 5, 3, 6])

Alternatively, we can use the flatten method of the array to turn a higher-dimensional array into a vector. Unlike ravel, this method always creates a copy of the data.
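
The difference between the two can be observed with np.shares_memory. This is a minimal sketch with illustrative names (`mat`, `raveled`, `flattened`):

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6]])

raveled = mat.ravel()      # a view here, because `mat` is C-contiguous
flattened = mat.flatten()  # always a copy

print(np.shares_memory(mat, raveled))
print(np.shares_memory(mat, flattened))

# ravel falls back to a copy when a view is not possible,
# e.g. when flattening the transpose in C order
print(np.shares_memory(mat, mat.T.ravel()))
```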

Transpose

Similarly, we can transpose a matrix



In [41]:

    
A.T









    Out[41]:





array([[1, 4],
       [2, 5],
       [3, 6]])



In [42]:

    
A.T.ravel()









    Out[42]:





array([1, 4, 2, 5, 3, 6])

Introducing `np.newaxis`

In addition to shape, we can also manipulate the axes of an array.

(1) We can always add as many axes as we want:



In [43]:

    
A = np.arange(20).reshape(10, 2)
A = A[np.newaxis, ...]  # this is called ellipsis

print(A.shape)









    



(1, 10, 2)

(2) We can also permute axes:



In [44]:

    
A = A.swapaxes(0, 2)  # swap axis 0 with axis 2 --> new shape: (2, 10, 1)

print(A.shape)









    



(2, 10, 1)

Again, changing and manipulating the axes will not touch the memory; it will just change the parameters (i.e. strides and offset) used to navigate the data.
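
The strides of an array can be inspected directly. In this sketch (illustrative names; the dtype is fixed to int64 so that the byte counts are the same on every platform), the transpose reuses the same buffer and only swaps the strides.

```python
import numpy as np

mat = np.arange(6, dtype=np.int64).reshape(2, 3)

print(mat.strides)     # (24, 8): 3 elements * 8 bytes to the next row, 8 bytes to the next column
print(mat.T.strides)   # (8, 24): the same buffer, walked in the other order
print(np.shares_memory(mat, mat.T))

expanded = mat[np.newaxis, ...]   # adding an axis changes the shape, not the data
print(expanded.shape)
```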

Numerical Types and Precision

In NumPy, talking about int or float does not make "real sense". This is mainly for two reasons:

(a) int or float are assumed at the maximum precision available on your machine (presumably int64 and float64, respectively).

(b) Different precisions imply different numerical ranges, and therefore different memory sizes (i.e. the number of bytes required to represent all the numbers in the corresponding numerical range).

NumPy supports the following numerical types:

bool                  | This stores a boolean (True or False) as a byte

int0                  | This is a platform integer (normally either int32 or int64)
int8                  | This is an integer ranging from -128 to 127
int16                 | This is an integer ranging from -32768 to 32767
int32                 | This is an integer ranging from -2 ** 31 to 2 ** 31 - 1
int64                 | This is an integer ranging from -2 ** 63 to 2 ** 63 - 1

uint8                 | This is an unsigned integer ranging from 0 to 255
uint16                | This is an unsigned integer ranging from 0 to 65535
uint32                | This is an unsigned integer ranging from 0 to 2 ** 32 - 1
uint64                | This is an unsigned integer ranging from 0 to 2 ** 64 - 1

float16               | This is a half precision float with sign bit, 5 bits exponent, and 10 bits mantissa
float32               | This is a single precision float with sign bit, 8 bits exponent, and 23 bits mantissa
float64 or float      | This is a double precision float with sign bit, 11 bits exponent, and 52 bits mantissa
complex64             | This is a complex number represented by two 32-bit floats (real and imaginary components)
complex128 or complex | This is a complex number represented by two 64-bit floats (real and imaginary components)

Numerical Types and Representation

The numerical dtype of an array should be selected very carefully, as it directly affects the numerical representation of elements, that is:

the number of bytes used;
the numerical range

We can always specify the dtype of an array when we create one. If we do not, the dtype is inferred from the data: typically the default integer type (np.int_) or the default float type (np.float64, formerly also available as np.float_), depending on the case.



In [45]:

    
a = np.arange(10)
print(a)

print(a.dtype)









    



[0 1 2 3 4 5 6 7 8 9]
int64



In [46]:

    
au = np.arange(10, dtype=np.uint8)
print(au)

print(au.dtype)









    



[0 1 2 3 4 5 6 7 8 9]
uint8

So, what happens if I try to represent a number that is out of range?

Let's have a go with integers, i.e., int8 and uint8



In [47]:

    
x = np.zeros(4, 'int8')  # Integer ranging from -128 to 127
x









    Out[47]:





array([0, 0, 0, 0], dtype=int8)

Spoiler Alert: very simple example of indexing in NumPy

Well...it works as expected, doesn't it?



In [48]:

    
x[0] = 127
x









    Out[48]:





array([127,   0,   0,   0], dtype=int8)



In [49]:

    
x[0] = 128
x









    Out[49]:





array([-128,    0,    0,    0], dtype=int8)



In [50]:

    
x[1] = 129
x









    Out[50]:





array([-128, -127,    0,    0], dtype=int8)



In [51]:

    
x[2] = 257  # i.e. (128 x 2) + 1
x









    Out[51]:





array([-128, -127,    1,    0], dtype=int8)



In [52]:

    
ux = np.zeros(4, 'uint8')  # Integer ranging from 0 to 255, dtype also as string!
ux









    Out[52]:





array([0, 0, 0, 0], dtype=uint8)



In [53]:

    
ux[0] = 255
ux[1] = 256
ux[2] = 257
ux[3] = 513  # (256 x 2) + 1
ux









    Out[53]:





array([255,   0,   1,   1], dtype=uint8)
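
The wrap-around seen above follows modular arithmetic on the 8-bit representation. The sketch below reproduces it with astype, which wraps out-of-range values when casting between integer types. It is an illustration rather than the method used in the cells above: depending on the NumPy version, assigning an out-of-range Python integer directly (as in x[0] = 128) may emit a warning or raise an error instead of wrapping. The names `big`, `as_int8` and `as_uint8` are hypothetical.

```python
import numpy as np

big = np.array([127, 128, 129, 257])

# values outside [-128, 127] wrap around modulo 256
as_int8 = big.astype(np.int8)

# values outside [0, 255] wrap around modulo 256
as_uint8 = np.array([255, 256, 257, 513]).astype(np.uint8)

print(as_int8)
print(as_uint8)
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)
```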

Machine Info and Supported Numerical Representation

Numpy provides two functions to inspect the information of supported integer and floating-point types, namely np.iinfo and np.finfo:



In [54]:

    
np.iinfo(np.int32)









    Out[54]:





iinfo(min=-2147483648, max=2147483647, dtype=int32)



In [55]:

    
np.finfo(np.float16)









    Out[55]:





finfo(resolution=0.001, min=-6.55040e+04, max=6.55040e+04, dtype=float16)

In addition, the MachAr class provides information on the current machine (np.MachAr has been deprecated since NumPy 1.22, and the same values can be read from np.finfo):



In [56]:

    
machine_info = np.MachAr()



In [57]:

    
machine_info.epsilon









    Out[57]:





2.220446049250313e-16



In [58]:

    
machine_info.huge









    Out[58]:





1.7976931348623157e+308



In [59]:

    
np.finfo(np.float64).max == machine_info.huge









    Out[59]:





True
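
The same quantities are available as attributes of np.finfo. The following sketch (with the illustrative name `info`) also shows what the machine epsilon means in practice: it is the gap between 1.0 and the next representable float64.

```python
import numpy as np

info = np.finfo(np.float64)

print(info.eps)   # machine epsilon
print(info.max)   # largest finite float64

print(1.0 + info.eps > 1.0)        # True: eps is large enough to change 1.0
print(1.0 + info.eps / 2 == 1.0)   # True: half of eps is rounded away
```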



In [ ]:

    
# TRY THIS!

help(machine_info)

Data Type Object

Data type objects are instances of the numpy.dtype class.

Once again, arrays have a data type.
To be precise, every element in a NumPy array has the same data type.

The data type object can tell you the size of the data in bytes.
(Recall: The size in bytes is given by the itemsize attribute of the dtype class)



In [60]:

    
a = np.arange(7, dtype=np.uint16)
print('a itemsize: ', a.itemsize)
print('a.dtype.itemsize: ', a.dtype.itemsize)









    



a itemsize:  2
a.dtype.itemsize:  2

Character Codes

Character codes are included for backward compatibility with Numeric.
Numeric is the predecessor of NumPy. Their use is not recommended, but these codes pop up in several places.

By the way, you should use the dtype objects instead.

integer                     i
Unsigned integer            u
Single precision float      f
Double precision float      d
bool                        b
complex                     D
string                      S
unicode                     U

`dtype` constructors



In [61]:

    
np.dtype(float)









    Out[61]:





dtype('float64')



In [62]:

    
np.dtype('f')









    Out[62]:





dtype('float32')



In [63]:

    
np.dtype('d')









    Out[63]:





dtype('float64')



In [64]:

    
np.dtype('f8')









    Out[64]:





dtype('float64')



In [65]:

    
np.dtype('U10')  # Unicode string of up to 10 chars









    Out[65]:





dtype('<U10')

Note: A listing of all data type names can be found by calling np.sctypeDict.keys()

Custom `dtype`

We can use the np.dtype constructor to create a custom record type.



In [66]:

    
rt = np.dtype([('name', np.str_, 40), ('numitems', np.int32), ('price', np.float32)])



In [67]:

    
rt['name']  # see the difference with Python 2









    Out[67]:





dtype('<U40')



In [68]:

    
rt['numitems']









    Out[68]:





dtype('int32')



In [69]:

    
rt['price']









    Out[69]:





dtype('float32')

Instantiate an array of dtype equal to rt (record type)



In [70]:

    
record_items = np.array([('Meaning of life DVD', 42, 3.14), ('Butter', 13, 2.72)], 
                        dtype=rt)



In [71]:

    
print(record_items)









    



[('Meaning of life DVD', 42, 3.14) ('Butter', 13, 2.72)]
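
Once an array has a record dtype, each field can be accessed by name, which returns a regular array of that field's type. A minimal sketch, repeating the definitions of rt and record_items from the cells above so that it runs on its own:

```python
import numpy as np

rt = np.dtype([('name', np.str_, 40), ('numitems', np.int32), ('price', np.float32)])
record_items = np.array([('Meaning of life DVD', 42, 3.14), ('Butter', 13, 2.72)],
                        dtype=rt)

print(record_items['name'])            # all names, as an array of strings
print(record_items['numitems'].sum())  # 42 + 13
print(record_items[1]['price'])        # price of the second record
```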

Exercises - Basic Numpy

Ex 1.1

Create an array containing integers from $2$ to $2^6$



In [ ]:

Ex 1.2

Print ndarray attributes and properties (e.g. type, dtype, shape...) using the previous one



In [ ]:

Ex 1.3

Create a 3x3 Matrix array and fill it with random integer numbers

hint: Take a look at np.random.randint



In [ ]:

Ex 1.4

Create a list containing $5$ other lists of integers, all of the same size. Convert this list of lists into a matrix (i.e. numpy.ndarray)



In [ ]:

Ex 1.5

What happens if we generate an array converting a list of lists of different lengths?



In [ ]: