NumPy is the fundamental package for scientific computing with Python. It provides high-performance vector, matrix and higher-dimensional data structures for Python. It is implemented in C and Fortran, so when calculations are vectorized, performance is very good.
So, in a nutshell:
If you are a MATLAB® user, I recommend reading Numpy for MATLAB Users.
I'm a supporter of the Open Science Movement, so I humbly suggest you take a look at the Science Code Manifesto.
NumPy's main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type.
In NumPy, dimensions are called axes.
The number of axes is called rank.
The most important attributes of an ndarray object are:
To use numpy we need to import the module, for example:
In [2]:
import numpy as np # naming import convention
In the numpy
package the terminology used for vectors, matrices and higher-dimensional data sets is array.
On the web: http://docs.scipy.org/
Interactive help:
In [ ]:
np.array?
If you're looking for something
Let's start by creating some numpy.array
objects in order to get our hands into the very details of numpy basic data structure.
NumPy is a very flexible library, and provides many ways to create (and initialize) new numpy arrays.
One way is to use specific functions dedicated to generating numpy arrays (usually, arrays of numbers)[+]
[+] More on data types, later on !-)
NumPy provides many functions to generate arrays with specific properties (e.g. size or shape).
We will see later examples in which we will generate ndarray
using explicit Python lists.
However, for larger arrays, using Python lists is simply impractical.
In standard Python, we use the range
function to generate an iterable object of integers within a specific range (at a specified step
, default: 1
)
In [3]:
r = range(10)
print(list(r))
print(type(r)) # NOTE: if this print will return a <type 'list'> it means you're using Py2.7
Similarly, in numpy there is the arange
function which instead generates a numpy.ndarray
In [4]:
ra = np.arange(10)
print(ra)
print(type(ra))
However, we are working with the Numerical Python library, so we should expect more when it comes to numbers.
In fact, we can create an array within a floating point step-wise range:
In [5]:
# floating point step-wise range generation
raf = np.arange(-1, 1, 0.1)
print(raf)
Apart from the actual content, which is of course different because specified ranges are different, the ra
and raf
arrays differ by their dtype
:
In [6]:
print(f"dtype of 'ra': {ra.dtype}, dtype of 'raf': {raf.dtype}")
In [7]:
ra.itemsize # bytes per element
Out[7]:
In [8]:
ra.nbytes # number of bytes
Out[8]:
In [9]:
ra.ndim # number of dimensions
Out[9]:
In [10]:
ra.shape # shape, i.e. number of elements per-dimension/axis
Out[10]:
In [ ]:
## please replicate the same set of operations here for `raf`
In [ ]:
# your code here
Q: Do you notice any relevant difference?
Like np.arange, numpy provides two other "similar" functions:
Looking at the examples below, can you spot the difference?
In [11]:
np.linspace(0, 10, 20)
Out[11]:
In [12]:
np.logspace(0, np.e**2, 10, base=np.e)
Out[12]:
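As a hedged illustration of how these generators differ, the sketch below compares them on small ranges. The specific ranges and variable names are chosen here for illustration only and are not taken from the cells above.

```python
import numpy as np

# arange: fixed step, the stop value is excluded
stepped = np.arange(0, 1, 0.25)
print(stepped)   # [0.   0.25 0.5  0.75]

# linspace: fixed number of points, the stop value is included by default
spaced = np.linspace(0, 1, 5)
print(spaced)    # [0.   0.25 0.5  0.75 1.  ]

# logspace: points evenly spaced on a log scale, here 10**0 ... 10**2
powers = np.logspace(0, 2, 3)
print(powers)    # [  1.  10. 100.]
```

In short, arange is driven by the step, while linspace and logspace are driven by the number of points.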
In [13]:
# uniform random numbers in [0,1]
ru = np.random.rand(10)
In [14]:
ru
Out[14]:
Note: numbers and the content of the array may vary
In [15]:
# standard normal distributed random numbers
rs = np.random.randn(10)
In [16]:
rs
Out[16]:
Note: numbers and the content of the array may vary
Q: What if I ask you to generate random numbers in a way that we both obtain the very same numbers? (Provided we share the same CPU architecture)
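One possible answer, sketched under the assumption that both of us use the same NumPy version: fix the seed of the random number generator with np.random.seed before drawing. The seed value 1234 is arbitrary.

```python
import numpy as np

np.random.seed(1234)        # arbitrary seed, chosen only for illustration
first = np.random.rand(5)

np.random.seed(1234)        # resetting the same seed restarts the sequence
second = np.random.rand(5)

print(np.array_equal(first, second))  # True
```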
In [17]:
Z = np.zeros((3,3))
print(Z)
In [18]:
O = np.ones((3, 3))
print(O)
In [19]:
E = np.empty(10)
print(E)
In [ ]:
# TRY THIS!
np.empty(9)
In [20]:
# a diagonal matrix
np.diag([1,2,3])
Out[20]:
In [21]:
# diagonal with offset from the main diagonal
np.diag([1,2,3], k=1)
Out[21]:
In [22]:
# a diagonal matrix with ones on the main diagonal
np.eye(3, dtype='int') # 3 is the number of rows (and, by default, columns)
Out[22]:
To create new vector or matrix arrays from Python lists we can use the
numpy.array
constructor function:
In [23]:
v = np.array([1,2,3,4])
v
Out[23]:
In [24]:
print(type(v))
Alternatively, there is also the np.asarray function, which converts a Python list into a numpy array:
In [25]:
v = np.asarray([1, 2, 3, 4])
v
Out[25]:
In [26]:
print(type(v))
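The two functions behave the same on a Python list, but they differ when the input is already an array. A minimal sketch (the variable names copied and same are illustrative):

```python
import numpy as np

v = np.array([1, 2, 3, 4])

copied = np.array(v)     # np.array copies its input by default
same = np.asarray(v)     # np.asarray returns the input if it is already an ndarray

print(copied is v)  # False
print(same is v)    # True
```

This makes np.asarray a convenient choice inside functions that should accept both lists and arrays without paying for an extra copy.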
We can use the very same strategy for higher-dimensional arrays.
E.g. Let's create a matrix from a list of lists:
In [27]:
M = np.array([[1, 2], [3, 4]])
M
Out[27]:
In [28]:
v.shape, M.shape
Out[28]:
So far the numpy.ndarray looks awfully much like a Python list (or nested list).
Why not simply use Python lists for computations instead of creating a new array type?
There are several reasons:
numpy arrays can be implemented in a compiled language (C and Fortran are used).
In [29]:
L = range(100000)
In [30]:
%timeit [i**2 for i in L]
In [31]:
a = np.arange(100000)
In [32]:
%timeit a**2 # This operation is called Broadcasting - more on this later!
In [33]:
%timeit [element**2 for element in a]
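The %timeit magic only works inside IPython or Jupyter. Outside a notebook, a comparable measurement can be sketched with the standard timeit module. The array size and the number of repetitions below are arbitrary choices, and the absolute timings depend on the machine.

```python
import timeit
import numpy as np

n = 10_000
values = list(range(n))
arr = np.arange(n)

# time 20 repetitions of each approach
list_time = timeit.timeit(lambda: [i**2 for i in values], number=20)
numpy_time = timeit.timeit(lambda: arr**2, number=20)

print(f"list comprehension: {list_time:.4f} s")
print(f"numpy vectorized:   {numpy_time:.4f} s")

# both approaches compute the same values
same_result = [i**2 for i in values] == (arr**2).tolist()
print(same_result)
```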
Create simple one and two dimensional arrays. First, redo the examples from above. And then create your own.
Use the functions len
, shape
and ndim
on some of those arrays and
observe their output.
In [ ]:
Experiment with arange
, linspace
, ones
, zeros
, eye
and diag
.
Create different kinds of arrays with random numbers.
Try setting the seed before creating an array with random values
np.random.seed
In [ ]:
NumPy
has a multidimensional array object called ndarray. It consists of two parts as follows:
The majority of array operations leave the raw data untouched. The only aspect that changes is the metadata.
This internal separation between actual data (i.e. the content of the array --> the memory
) and metadata (i.e. properties and attributes of the data), allows for example for an efficient memory management.
For example, the shape of a NumPy array can be modified without copying and/or affecting the actual data, which makes it a fast operation even for large arrays.
In [34]:
a = np.arange(45)
a
Out[34]:
In [35]:
a.shape
Out[35]:
In [36]:
A = a.reshape(9, 5)
A
Out[36]:
In [37]:
n, m = A.shape
In [38]:
B = A.reshape((1,n*m))
B
Out[38]:
Q: What is the difference (in terms of shape) between B
and the original a
?
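To check that reshaping only changes the metadata, the following sketch verifies that the reshaped array shares its memory with the original. np.shares_memory is used here as one way to test this.

```python
import numpy as np

a = np.arange(45)
A = a.reshape(9, 5)

# the reshaped array is a view on the same data buffer
print(np.shares_memory(a, A))  # True

# writing through the view is visible in the original array
A[0, 0] = 100
print(a[0])  # 100
```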
In [39]:
A = np.array([[1, 2, 3], [4, 5, 6]])
A.ravel()
Out[39]:
By default, np.ravel performs the operation row-wise, à la C. NumPy also supports a Fortran-style order of indices (i.e. column-major indexing).
In [40]:
A.ravel('F') # order F (Fortran) is column-major, C (default) row-major
Out[40]:
Alternatively, we can also use the ndarray.flatten method to turn a higher-dimensional array into a vector. However, this method always creates a copy of the data.
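The difference between the two can be sketched as follows. Note that ravel returns a view only when the memory layout allows it, as is the case for the contiguous array in this example.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])

raveled = A.ravel()      # a view, when possible
flattened = A.flatten()  # always a copy

print(np.shares_memory(A, raveled))    # True
print(np.shares_memory(A, flattened))  # False
```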
In [41]:
A.T
Out[41]:
In [42]:
A.T.ravel()
Out[42]:
(1) We can always add as many axes as we want:
In [43]:
A = np.arange(20).reshape(10, 2)
A = A[np.newaxis, ...] # this is called ellipsis
print(A.shape)
(2) We can also permute axes:
In [44]:
A = A.swapaxes(0, 2) # swap axis 0 with axis 2 --> new shape: (2, 10, 1)
print(A.shape)
Again, changing and manipulating the axes will not touch the memory; it will just change the parameters (i.e. strides and offset) used to navigate the data.
In NumPy, talking about int
or float
does not make "real sense". This is mainly for two reasons:
(a) int
or float
are assumed at the maximum precision available on your machine (presumably int64 and float64, respectively).
(b) Different precisions imply different numerical ranges, and hence different memory sizes (i.e. the number of bytes required to represent all the numbers in the corresponding numerical range).
NumPy supports the following numerical types:
bool | This stores a boolean (True or False) as a byte
intp | This is a platform integer (normally either int32 or int64); older NumPy versions also expose it as int0
int8 | This is an integer ranging from -128 to 127
int16 | This is an integer ranging from -32768 to 32767
int32 | This is an integer ranging from -2 ** 31 to 2 ** 31 -1
int64 | This is an integer ranging from -2 ** 63 to 2 ** 63 -1
uint8 | This is an unsigned integer ranging from 0 to 255
uint16 | This is an unsigned integer ranging from 0 to 65535
uint32 | This is an unsigned integer ranging from 0 to 2 ** 32 - 1
uint64 | This is an unsigned integer ranging from 0 to 2 ** 64 - 1
float16 | This is a half precision float with sign bit, 5 bits exponent, and 10 bits mantissa
float32 | This is a single precision float with sign bit, 8 bits exponent, and 23 bits mantissa
float64 or float | This is a double precision float with sign bit, 11 bits exponent, and 52 bits mantissa
complex64 | This is a complex number represented by two 32-bit floats (real and imaginary components)
complex128 or complex | This is a complex number represented by two 64-bit floats (real and imaginary components)
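The ranges and sizes listed above can be checked directly. The sketch below uses np.iinfo and the itemsize and nbytes attributes; the array length of 1000 is an arbitrary choice.

```python
import numpy as np

# range and size in bytes of the signed integer types
for dt in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(dt)
    size = np.dtype(dt).itemsize
    print(f"{np.dtype(dt).name:>6}: {size} byte(s), range [{info.min}, {info.max}]")

# the same number of elements takes 8x more memory in int64 than in int8
small = np.zeros(1000, dtype=np.int8)
large = np.zeros(1000, dtype=np.int64)
print(small.nbytes, large.nbytes)  # 1000 8000
```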
The numerical dtype of an array should be selected very carefully, as it directly affects the numerical representation of elements, that is:
We can always specify the dtype
of an array when we create one. If we do not, the dtype
of the array will be inferred, namely np.int_ or np.float64 (spelled np.float_ before NumPy 2.0), depending on the case.
In [45]:
a = np.arange(10)
print(a)
print(a.dtype)
In [46]:
au = np.arange(10, dtype=np.uint8)
print(au)
print(au.dtype)
So, then: What happens if I try to represent a number that is Out of range?
Let's have a go with integers, i.e., int8
and uint8
In [47]:
x = np.zeros(4, 'int8') # Integer ranging from -128 to 127
x
Out[47]:
Spoiler Alert: very simple example of indexing in NumPy
Well...it works as expected, doesn't it?
In [48]:
x[0] = 127
x
Out[48]:
In [49]:
x[0] = 128
x
Out[49]:
In [50]:
x[1] = 129
x
Out[50]:
In [51]:
x[2] = 257 # i.e. (128 x 2) + 1
x
Out[51]:
In [52]:
ux = np.zeros(4, 'uint8') # Integer ranging from 0 to 255, dtype also as string!
ux
Out[52]:
In [53]:
ux[0] = 255
ux[1] = 256
ux[2] = 257
ux[3] = 513 # (256 x 2) + 1
ux
Out[53]:
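Depending on the NumPy version, assigning an out-of-range Python integer directly to an element may emit a warning or raise an error instead of silently wrapping around. The sketch below shows the wrap-around in a way that should not depend on the version, by casting a wider array with astype. The variable names are illustrative.

```python
import numpy as np

# signed 8-bit: values outside [-128, 127] wrap around modulo 256
wide = np.array([127, 128, 129, 257], dtype=np.int64)
as_int8 = wide.astype(np.int8)
print(as_int8)    # [ 127 -128 -127    1]

# unsigned 8-bit: values outside [0, 255] wrap around modulo 256
uwide = np.array([255, 256, 257, 513], dtype=np.int64)
as_uint8 = uwide.astype(np.uint8)
print(as_uint8)   # [255   0   1   1]
```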
Numpy provides two functions to inspect the information of supported integer and floating-point types, namely np.iinfo
and np.finfo
:
In [54]:
np.iinfo(np.int32)
Out[54]:
In [55]:
np.finfo(np.float16)
Out[55]:
In addition, the MachAr class provides information on the current machine (note: np.MachAr is deprecated since NumPy 1.22, and np.finfo exposes the same values):
In [56]:
machine_info = np.MachAr()
In [57]:
machine_info.epsilon
Out[57]:
In [58]:
machine_info.huge
Out[58]:
In [59]:
np.finfo(np.float64).max == machine_info.huge
Out[59]:
In [ ]:
# TRY THIS!
help(machine_info)
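Where np.MachAr is not available, the same quantities can be read from np.finfo. A minimal sketch for float64, showing what the machine epsilon means in practice:

```python
import numpy as np

info = np.finfo(np.float64)
print(info.eps)   # machine epsilon: gap between 1.0 and the next float
print(info.max)   # largest representable float64

# adding eps to 1.0 gives a different number, adding half of it does not
print(1.0 + info.eps > 1.0)        # True
print(1.0 + info.eps / 2 == 1.0)   # True
```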
Data type objects are instances of the numpy.dtype
class.
Once again, arrays have a data type.
To be precise, every element in a NumPy array has the same data type.
The data type object can tell you the size
of the data in bytes.
(Recall: The size in bytes is given by the itemsize
attribute of the dtype class)
In [60]:
a = np.arange(7, dtype=np.uint16)
print('a itemsize: ', a.itemsize)
print('a.dtype.itemsize: ', a.dtype.itemsize)
Character codes are included for backward compatibility with Numeric, the predecessor of NumPy.
Their use is not recommended, but these codes pop up in several places, so it is useful to recognize them.
You should use the dtype objects instead.
Integer | i
Unsigned integer | u
Single precision float | f
Double precision float | d
Bool | b
Complex | D
String | S
Unicode | U
In [61]:
np.dtype(float)
Out[61]:
In [62]:
np.dtype('f')
Out[62]:
In [63]:
np.dtype('d')
Out[63]:
In [64]:
np.dtype('f8')
Out[64]:
In [65]:
np.dtype('U10') # Unicode string of up to 10 chars
Out[65]:
Note: A listing of all data type names can be found by calling np.sctypeDict.keys()
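As a quick check of how character codes map onto dtype objects, the sketch below compares a few of them with their explicit counterparts:

```python
import numpy as np

print(np.dtype('f') == np.float32)    # True: single precision
print(np.dtype('d') == np.float64)    # True: double precision
print(np.dtype('f8') == np.float64)   # True: float with 8 bytes per element

# a unicode string of up to 10 characters uses 4 bytes per character
print(np.dtype('U10').itemsize)       # 40
```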
We can use the np.dtype
constructor to create a custom record type.
In [66]:
rt = np.dtype([('name', np.str_, 40), ('numitems', np.int32), ('price', np.float32)])
In [67]:
rt['name'] # see the difference with Python 2
Out[67]:
In [68]:
rt['numitems']
Out[68]:
In [69]:
rt['price']
Out[69]:
We can now create an array with dtype equal to rt (record type):
In [70]:
record_items = np.array([('Meaning of life DVD', 42, 3.14), ('Butter', 13, 2.72)],
dtype=rt)
In [71]:
print(record_items)
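Once the record array exists, each field can be accessed by name, much like a column in a table. A short sketch reusing the rt record type defined above (the names, total_items and first_price variables are illustrative):

```python
import numpy as np

rt = np.dtype([('name', np.str_, 40), ('numitems', np.int32), ('price', np.float32)])
record_items = np.array([('Meaning of life DVD', 42, 3.14), ('Butter', 13, 2.72)],
                        dtype=rt)

# select a whole field (column) by name
names = record_items['name']
print(names)

# fields are ordinary arrays, so the usual operations apply
total_items = record_items['numitems'].sum()
print(total_items)  # 55

# select a single record first, then one of its fields
first_price = record_items[0]['price']
print(first_price)
```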