What is NumPy

NumPy is the fundamental package for scientific computing with Python. It provides high-performance vector, matrix and higher-dimensional data structures for Python. It is implemented in C and Fortran, so when calculations are vectorized, performance is very good.

So, in a nutshell:

  • a powerful Python extension for N-dimensional arrays
  • a tool for integrating C/C++ and Fortran code
  • designed for scientific computation: linear algebra and signal analysis

If you are a MATLAB® user, I recommend reading Numpy for MATLAB Users.

I'm a supporter of the Open Science Movement, so I humbly suggest taking a look at the Science Code Manifesto.

Getting Started with Numpy Arrays

NumPy's main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type.

In Numpy dimensions are called axes.

The number of axes is called rank.

The most important attributes of an ndarray object are:

  • ndarray.ndim - the number of axes (dimensions) of the array.
  • ndarray.shape - the dimensions of the array. For a matrix with n rows and m columns, shape will be (n,m).
  • ndarray.size - the total number of elements of the array.
  • ndarray.dtype - an object describing the type of the elements in the array; numpy.int32, numpy.int16, and numpy.float64 are some examples.
  • ndarray.itemsize - the size in bytes of each element of the array. For example, elements of type float64 have itemsize 8 (=64/8).
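
As a small illustration of these attributes, the sketch below builds a 2x3 array of float64 and inspects it. The name demo and the values are made up for this example only.

```python
import numpy as np

# Hypothetical 2x3 array used only to illustrate the attributes above
demo = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]], dtype=np.float64)

print(demo.ndim)      # number of axes
print(demo.shape)     # (rows, columns)
print(demo.size)      # total number of elements
print(demo.dtype)     # element type
print(demo.itemsize)  # bytes per element
```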

To use numpy, you first need to import the module, for example:


In [2]:
import numpy as np  # naming import convention

Terminology Assumption

In the numpy package the terminology used for vectors, matrices and higher-dimensional data sets is array.

Reference Documentation


In [ ]:
np.array?

If you're looking for something

Creating numpy arrays

Get acquainted with NumPy

Let's start by creating some numpy.array objects in order to get our hands into the very details of numpy basic data structure.

NumPy is a very flexible library, and provides many ways to create (and initialize) new numpy arrays.

One way is to use specific functions dedicated to generating numpy arrays (usually arrays of numbers)[+]

[+] More on data types, later on !-)

First numpy array example: array of numbers

NumPy provides many functions to generate arrays with specific properties (e.g. size or shape).

We will see later examples in which we will generate ndarray using explicit Python lists.

However, for larger arrays, using Python lists is simply impractical.

np.arange

In standard Python, we use the range function to generate an iterable object of integers within a specific range (at a specified step, default: 1)


In [3]:
r = range(10)
print(list(r))

print(type(r))  # NOTE: if this prints <type 'list'>, you are using Python 2.7


[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
<class 'range'>

Similarly, in numpy there is the arange function which instead generates a numpy.ndarray


In [4]:
ra = np.arange(10) 
print(ra)

print(type(ra))


[0 1 2 3 4 5 6 7 8 9]
<class 'numpy.ndarray'>

However, we are working with the Numerical Python library, so we should expect more when it comes to numbers.

In fact, we can create an array within a floating point step-wise range:


In [5]:
# floating point step-wise range generation
raf = np.arange(-1, 1, 0.1)  
print(raf)


[-1.00000000e+00 -9.00000000e-01 -8.00000000e-01 -7.00000000e-01
 -6.00000000e-01 -5.00000000e-01 -4.00000000e-01 -3.00000000e-01
 -2.00000000e-01 -1.00000000e-01 -2.22044605e-16  1.00000000e-01
  2.00000000e-01  3.00000000e-01  4.00000000e-01  5.00000000e-01
  6.00000000e-01  7.00000000e-01  8.00000000e-01  9.00000000e-01]

Properties of numpy array

Apart from the actual content, which is of course different because specified ranges are different, the ra and raf arrays differ by their dtype:


In [6]:
print(f"dtype of 'ra': {ra.dtype}, dtype of 'raf': {raf.dtype}")


dtype of 'ra': int64, dtype of 'raf': float64

More properties of the numpy array


In [7]:
ra.itemsize # bytes per element


Out[7]:
8

In [8]:
ra.nbytes # number of bytes


Out[8]:
80

In [9]:
ra.ndim # number of dimensions


Out[9]:
1

In [10]:
ra.shape # shape, i.e. number of elements per-dimension/axis


Out[10]:
(10,)

In [ ]:
## please replicate the same set of operations here for `raf`

In [ ]:
# your code here

Q: Do you notice any relevant difference?

np.linspace and np.logspace

Like np.arange, numpy offers two other "similar" functions:

  • np.linspace
  • np.logspace

Looking at the examples below, can you spot the difference?


In [11]:
np.linspace(0, 10, 20)


Out[11]:
array([ 0.        ,  0.52631579,  1.05263158,  1.57894737,  2.10526316,
        2.63157895,  3.15789474,  3.68421053,  4.21052632,  4.73684211,
        5.26315789,  5.78947368,  6.31578947,  6.84210526,  7.36842105,
        7.89473684,  8.42105263,  8.94736842,  9.47368421, 10.        ])

In [12]:
np.logspace(0, np.e**2, 10, base=np.e)


Out[12]:
array([1.00000000e+00, 2.27278564e+00, 5.16555456e+00, 1.17401982e+01,
       2.66829540e+01, 6.06446346e+01, 1.37832255e+02, 3.13263169e+02,
       7.11980032e+02, 1.61817799e+03])
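
One way to read the two outputs above: np.linspace returns evenly spaced values between start and stop (endpoint included by default), while np.logspace returns the base raised to those evenly spaced exponents. A minimal sketch, with made-up variable names and the default base of 10:

```python
import numpy as np

exponents = np.linspace(0, 2, 5)   # 5 evenly spaced values between 0 and 2
powers = np.logspace(0, 2, 5)      # 10**0 ... 10**2 (default base is 10)

# logspace is equivalent to raising the base to the linspace values
print(np.allclose(powers, 10.0 ** exponents))
```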

Random Number Generation

np.random.rand & np.random.randn


In [13]:
# uniform random numbers in [0, 1)
ru = np.random.rand(10)

In [14]:
ru


Out[14]:
array([0.06629061, 0.56102955, 0.81081042, 0.80936217, 0.19182628,
       0.78609316, 0.88379009, 0.45329187, 0.84304588, 0.56232631])

Note: numbers and the content of the array may vary


In [15]:
# standard normal distributed random numbers
rs = np.random.randn(10)

In [16]:
rs


Out[16]:
array([ 0.45052791, -0.80566857, -0.10401981,  0.91948746, -0.0329787 ,
       -0.71872119,  1.42738938, -0.63292836,  0.5397375 ,  0.89186053])

Note: numbers and the content of the array may vary

Q: What if I ask you to generate random numbers in a way that we both obtain the very same numbers? (Provided we share the same CPU architecture)
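
One possible answer, sketched here with NumPy's Generator interface (np.random.default_rng) rather than the legacy np.random.rand functions used above. The seed value 42 is an arbitrary choice for illustration.

```python
import numpy as np

# Two generators created with the same (arbitrary) seed
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.random(5)   # uniform in [0, 1)
sample_b = rng_b.random(5)

print(np.array_equal(sample_a, sample_b))  # same seed, same numbers
```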

Zeros and Ones (or Empty)

np.zeros, np.ones, np.empty

Sometimes we need to initialise an array of a specific shape filled with zeros, with ones, or with whatever rubbish happens to be in memory (i.e. an empty array):


In [17]:
Z = np.zeros((3,3))

print(Z)


[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]

In [18]:
O = np.ones((3, 3))
print(O)


[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]

In [19]:
E = np.empty(10)

print(E)


[0.45052791 0.80566857 0.10401981 0.91948746 0.0329787  0.71872119
 1.42738938 0.63292836 0.5397375  0.89186053]

In [ ]:
# TRY THIS!

np.empty(9)

Other specialised Functions

Diagonal Matrices

1. np.diag


In [20]:
# a diagonal matrix
np.diag([1,2,3])


Out[20]:
array([[1, 0, 0],
       [0, 2, 0],
       [0, 0, 3]])

In [21]:
# diagonal with offset from the main diagonal
np.diag([1,2,3], k=1)


Out[21]:
array([[0, 1, 0, 0],
       [0, 0, 2, 0],
       [0, 0, 0, 3],
       [0, 0, 0, 0]])
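
np.diag works in both directions: given a 1-D sequence it builds a matrix, and given a 2-D array it extracts a diagonal. A short sketch (variable names are illustrative) showing a negative offset and the extraction:

```python
import numpy as np

below = np.diag([1, 2, 3], k=-1)    # values placed one step below the main diagonal
print(below)

main = np.diag(np.diag([1, 2, 3]))  # passing a 2-D array extracts its diagonal
print(main)
```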

Identity Matrix $\mathrm{I} \mapsto$ np.eye


In [22]:
# a diagonal matrix with ones on the main diagonal
np.eye(3, dtype='int')  # 3 is the number of rows (and columns)


Out[22]:
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

Create numpy.ndarray from list

To create new vector or matrix arrays from Python lists we can use the numpy.array constructor function:


In [23]:
v = np.array([1,2,3,4])
v


Out[23]:
array([1, 2, 3, 4])

In [24]:
print(type(v))


<class 'numpy.ndarray'>

Alternatively, there is also the np.asarray function, which converts a Python list into a numpy array:


In [25]:
v = np.asarray([1, 2, 3, 4])
v


Out[25]:
array([1, 2, 3, 4])

In [26]:
print(type(v))


<class 'numpy.ndarray'>
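
The two functions differ mainly when the input is already an ndarray: np.array copies by default, whereas np.asarray returns the input unchanged when no conversion is needed. A minimal sketch with illustrative names:

```python
import numpy as np

original = np.array([1, 2, 3, 4])

copied = np.array(original)    # new array with its own data
same = np.asarray(original)    # no conversion needed: the same object comes back

print(copied is original)
print(same is original)
```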

We can use the very same strategy for higher-dimensional arrays.

E.g. Let's create a matrix from a list of lists:


In [27]:
M = np.array([[1, 2], [3, 4]])
M


Out[27]:
array([[1, 2],
       [3, 4]])

In [28]:
v.shape, M.shape


Out[28]:
((4,), (2, 2))

So, why is it useful then?

So far the numpy.ndarray looks awfully like a Python list (or nested list).

Why not simply use Python lists for computations instead of creating a new array type?

There are several reasons:

  • Python lists are very general.
    • They can contain any kind of object.
    • They are dynamically typed.
    • They do not support mathematical functions such as matrix and dot multiplications, etc.
    • Implementing such functions for Python lists would not be very efficient because of the dynamic typing.
  • Numpy arrays are statically typed and homogeneous.
    • The type of the elements is determined when the array is created.
  • Numpy arrays are memory efficient.
    • Because of the static typing, fast implementations of mathematical functions such as multiplication and addition of numpy arrays can be written in a compiled language (C and Fortran are used).
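
A quick way to see the difference in behaviour: arithmetic operators on a list act on the container, while on an array they act on each element. The values below are arbitrary.

```python
import numpy as np

numbers = [1, 2, 3]
arr = np.array(numbers)

doubled_list = numbers * 2   # list repetition, not multiplication
doubled_arr = arr * 2        # element-wise multiplication

print(doubled_list)
print(doubled_arr)
```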

In [29]:
L = range(100000)

In [30]:
%timeit [i**2 for i in L]


41.7 ms ± 14.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [31]:
a = np.arange(100000)

In [32]:
%timeit a**2  # This operation is called Broadcasting - more on this later!


92.9 µs ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [33]:
%timeit [element**2 for element in a]


48.4 ms ± 18.7 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Exercises: DIY

Simple arrays

  • Create simple one and two dimensional arrays. First, redo the examples from above. And then create your own.

  • Use the functions len, shape and ndim on some of those arrays and observe their output.


In [ ]:

Creating arrays using functions

  • Experiment with arange, linspace, ones, zeros, eye and diag.

  • Create different kinds of arrays with random numbers.

  • Try setting the seed before creating an array with random values

    • hint: use np.random.seed

In [ ]:


Numpy Array Object

NumPy has a multidimensional array object called ndarray. It consists of two parts as follows:

  • The actual data
  • Some metadata describing the data

The majority of array operations leave the raw data untouched. The only aspect that changes is the metadata.

Data vs Metadata (Attributes)

This internal separation between the actual data (i.e. the content of the array, the memory) and the metadata (i.e. properties and attributes of the data) allows, for example, efficient memory management.

For example, the shape of a NumPy array can be modified without copying or affecting the actual data, which makes it a fast operation even for large arrays.
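
A minimal sketch of what "without copying" means in practice, using np.shares_memory to check that the reshaped array still points at the same buffer. The names base and grid are illustrative only.

```python
import numpy as np

base = np.arange(6)
grid = base.reshape(2, 3)            # new shape, same underlying buffer

print(np.shares_memory(base, grid))  # both arrays point to the same data

grid[0, 0] = 99                      # writing through the reshaped array...
print(base[0])                       # ...is visible in the original
```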


In [34]:
a = np.arange(45)

a


Out[34]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44])

In [35]:
a.shape


Out[35]:
(45,)

In [36]:
A = a.reshape(9, 5)

A


Out[36]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34],
       [35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44]])

In [37]:
n, m = A.shape

In [38]:
B = A.reshape((1,n*m))
B


Out[38]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
        32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]])

Q: What is the difference (in terms of shape) between B and the original a?

Flattening

Another (quite common) reshaping operation you will end up performing on n-dimensional arrays is flattening.

Flattening means collapsing all the axes into a single one.

np.ravel

numpy.ndarray objects have a ravel method that returns the array as a 1D vector.

This time too, the original memory is left untouched whenever possible: ravel returns a view with different metadata, and copies the data only when a view cannot be produced.


In [39]:
A = np.array([[1, 2, 3], [4, 5, 6]])
A.ravel()


Out[39]:
array([1, 2, 3, 4, 5, 6])

By default, ravel performs the operation row-wise, à la C. Numpy also supports a Fortran-style order of indices (i.e. column-major indexing).


In [40]:
A.ravel('F')  # order F (Fortran) is column-major, C (default) row-major


Out[40]:
array([1, 4, 2, 5, 3, 6])

Alternatively, we can use the flatten method (ndarray.flatten) to turn a higher-dimensional array into a vector. Unlike ravel, this method always creates a copy of the data.
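
The view-versus-copy distinction can be checked directly. A short sketch, assuming a contiguous input array (the name matrix is illustrative):

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])

raveled = matrix.ravel()      # a view here, since `matrix` is contiguous
flattened = matrix.flatten()  # always a copy

print(np.shares_memory(matrix, raveled))
print(np.shares_memory(matrix, flattened))
```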

Transpose

Similarly, we can transpose a matrix


In [41]:
A.T


Out[41]:
array([[1, 4],
       [2, 5],
       [3, 6]])

In [42]:
A.T.ravel()


Out[42]:
array([1, 4, 2, 5, 3, 6])

Introducing np.newaxis

In addition to the shape, we can also manipulate the axes of an array.

(1) We can always add as many axes as we want:


In [43]:
A = np.arange(20).reshape(10, 2)
A = A[np.newaxis, ...]  # this is called ellipsis

print(A.shape)


(1, 10, 2)

(2) We can also permute axes:


In [44]:
A = A.swapaxes(0, 2)  # swap axis 0 with axis 2 --> new shape: (2, 10, 1)

print(A.shape)


(2, 10, 1)

Again, changing and manipulating the axes will not touch the memory; it will just change the parameters (i.e. strides and offset) used to navigate the data.
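
The strides can be inspected through the strides attribute. The sketch below fixes the dtype to int64 (8 bytes per element) so the numbers are predictable; the array and its name are made up for illustration.

```python
import numpy as np

grid = np.arange(6, dtype=np.int64).reshape(2, 3)
swapped = grid.T        # axes permuted, data untouched

print(grid.strides)     # bytes to step along each axis
print(swapped.strides)  # same numbers, reversed order
print(np.shares_memory(grid, swapped))
```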


Numerical Types and Precision

In NumPy, talking about int or float does not make "real sense". This is mainly for two reasons:

(a) int and float are assumed to be at the maximum precision available on your machine (presumably int64 and float64, respectively).

(b) Different precisions imply different numerical ranges, and therefore different memory sizes (i.e. the number of bytes required to represent all the numbers in the corresponding numerical range).

Numpy supports the following numerical types:

bool             | This stores a boolean (True or False) as a byte

intp             | This is a platform integer (normally either int32 or int64)
int8             | This is an integer ranging from -128 to 127
int16            | This is an integer ranging from -32768 to 32767
int32            | This is an integer ranging from -2 ** 31 to 2 ** 31 -1
int64            | This is an integer ranging from -2 ** 63 to 2 ** 63 -1

uint8            | This is an unsigned integer ranging from 0 to 255
uint16           | This is an unsigned integer ranging from 0 to 65535
uint32           | This is an unsigned integer ranging from 0 to 2 ** 32 - 1
uint64           | This is an unsigned integer ranging from 0 to 2 ** 64 - 1

float16          | This is a half precision float with sign bit, 5 bits exponent, and 10 bits mantissa
float32          | This is a single precision float with sign bit, 8 bits exponent, and 23 bits mantissa
float64 or float | This is a double precision float with sign bit, 11 bits exponent, and 52 bits mantissa
complex64        | This is a complex number represented by two 32-bit floats (real and imaginary components)
complex128 or complex | This is a complex number represented by two 64-bit floats (real and imaginary components)

Numerical Types and Representation

The numerical dtype of an array should be selected very carefully, as it directly affects the numerical representation of elements, that is:

  • the number of bytes used;
  • the numerical range

We can always specify the dtype of an array when we create one. If we do not, the dtype of the array is inferred from the data, typically np.int_ or np.float64 depending on the case.


In [45]:
a = np.arange(10)
print(a)

print(a.dtype)


[0 1 2 3 4 5 6 7 8 9]
int64

In [46]:
au = np.arange(10, dtype=np.uint8)
print(au)

print(au.dtype)


[0 1 2 3 4 5 6 7 8 9]
uint8

So, what happens if I try to represent a number that is out of range?

Let's have a go with integers, i.e., int8 and uint8


In [47]:
x = np.zeros(4, 'int8')  # Integer ranging from -128 to 127
x


Out[47]:
array([0, 0, 0, 0], dtype=int8)

Spoiler Alert: very simple example of indexing in NumPy

Well...it works as expected, doesn't it?


In [48]:
x[0] = 127
x


Out[48]:
array([127,   0,   0,   0], dtype=int8)

In [49]:
x[0] = 128
x


Out[49]:
array([-128,    0,    0,    0], dtype=int8)

In [50]:
x[1] = 129
x


Out[50]:
array([-128, -127,    0,    0], dtype=int8)

In [51]:
x[2] = 257  # i.e. (128 x 2) + 1
x


Out[51]:
array([-128, -127,    1,    0], dtype=int8)

In [52]:
ux = np.zeros(4, 'uint8')  # Integer ranging from 0 to 255, dtype also as string!
ux


Out[52]:
array([0, 0, 0, 0], dtype=uint8)

In [53]:
ux[0] = 255
ux[1] = 256
ux[2] = 257
ux[3] = 513  # (256 x 2) + 1
ux


Out[53]:
array([255,   0,   1,   1], dtype=uint8)
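
The wrap-around shown above can also be reproduced with an explicit cast. This sketch uses astype, which by default converts without range checking; direct assignment of an out-of-range Python integer, as in the cells above, may instead raise an OverflowError on recent NumPy releases. Variable names are illustrative.

```python
import numpy as np

values = np.array([127, 128, 129, 257])

# Casting keeps only the lowest 8 bits of each value
as_int8 = values.astype(np.int8)                             # signed interpretation
as_uint8 = np.array([255, 256, 257, 513]).astype(np.uint8)   # unsigned interpretation

print(as_int8)
print(as_uint8)
```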

Machine Info and Supported Numerical Representation

Numpy provides two functions to inspect the information of supported integer and floating-point types, namely np.iinfo and np.finfo:


In [54]:
np.iinfo(np.int32)


Out[54]:
iinfo(min=-2147483648, max=2147483647, dtype=int32)

In [55]:
np.finfo(np.float16)


Out[55]:
finfo(resolution=0.001, min=-6.55040e+04, max=6.55040e+04, dtype=float16)

In addition, the MachAr class provides information on the current machine:


In [56]:
machine_info = np.MachAr()

In [57]:
machine_info.epsilon


Out[57]:
2.220446049250313e-16

In [58]:
machine_info.huge


Out[58]:
1.7976931348623157e+308

In [59]:
np.finfo(np.float64).max == machine_info.huge


Out[59]:
True
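
The same two quantities are also available through np.finfo, which can be handy where np.MachAr is unavailable (it has been deprecated in newer NumPy releases). A minimal sketch:

```python
import numpy as np

info = np.finfo(np.float64)

print(info.eps)   # gap between 1.0 and the next representable float64
print(info.max)   # largest finite float64
```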

In [ ]:
# TRY THIS!

help(machine_info)

Data Type Object

Data type objects are instances of the numpy.dtype class.

Once again, arrays have a data type.
To be precise, every element in a NumPy array has the same data type.

The data type object can tell you the size of the data in bytes.
(Recall: The size in bytes is given by the itemsize attribute of the dtype class)


In [60]:
a = np.arange(7, dtype=np.uint16)
print('a itemsize: ', a.itemsize)
print('a.dtype.itemsize: ', a.dtype.itemsize)


a itemsize:  2
a.dtype.itemsize:  2

Character Codes

Character codes are included for backward compatibility with Numeric.
Numeric is the predecessor of NumPy. Their use is not recommended, but these codes pop up in several places.

By the way, you should use the dtype objects instead.

integer                     i
Unsigned integer            u
Single precision float      f
Double precision float      d
bool                        ? (b on its own means a signed byte, i.e. int8)
complex                     D
string                      S
unicode                     U

dtype constructors


In [61]:
np.dtype(float)


Out[61]:
dtype('float64')

In [62]:
np.dtype('f')


Out[62]:
dtype('float32')

In [63]:
np.dtype('d')


Out[63]:
dtype('float64')

In [64]:
np.dtype('f8')


Out[64]:
dtype('float64')

In [65]:
np.dtype('U10')  # Unicode string of up to 10 chars


Out[65]:
dtype('<U10')

Note: A listing of all data type names can be found by calling np.sctypeDict.keys()

Custom dtype

We can use the np.dtype constructor to create a custom record type.


In [66]:
rt = np.dtype([('name', np.str_, 40), ('numitems', np.int32), ('price', np.float32)])

In [67]:
rt['name']  # see the difference with Python 2


Out[67]:
dtype('<U40')

In [68]:
rt['numitems']


Out[68]:
dtype('int32')

In [69]:
rt['price']


Out[69]:
dtype('float32')

  • Instantiate an array whose dtype is rt (the record type)

In [70]:
record_items = np.array([('Meaning of life DVD', 42, 3.14), ('Butter', 13, 2.72)], 
                        dtype=rt)

In [71]:
print(record_items)


[('Meaning of life DVD', 42, 3.14) ('Butter', 13, 2.72)]
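
Once a record array exists, each field can be accessed by name and behaves like an ordinary array. A brief sketch; the record type is redefined so the snippet runs on its own, and the name items is illustrative.

```python
import numpy as np

# Same record layout as above, redefined so the snippet is self-contained
rt = np.dtype([('name', np.str_, 40), ('numitems', np.int32), ('price', np.float32)])
items = np.array([('Meaning of life DVD', 42, 3.14), ('Butter', 13, 2.72)], dtype=rt)

print(items['name'])            # one field across all records
print(items['numitems'].sum())  # fields behave like ordinary arrays
print(items[1]['price'])        # a single field of a single record
```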

Exercises - Basic Numpy

Ex 1.1

Create an array containing integers from $2$ to $2^6$


In [ ]:

Ex 1.2

Print ndarray attributes and properties (e.g. type, dtype, shape...) using the previous one


In [ ]:

Ex 1.3

Create a 3x3 Matrix array and fill it with random integer numbers

  • hint: Take a look at np.random.randint

In [ ]:

Ex 1.4

Create a list containing $5$ other lists of integers, all of the same size. Convert this list of lists into a matrix (i.e. numpy.ndarray)


In [ ]:

Ex 1.5

What happens if we generate an array converting a list of lists of different lengths?


In [ ]: