In this notebook we focus on NumPy's built-in support for serialisation and I/O. In other words, we will learn how to save and load NumPy ndarray
objects in the native (binary) format for easy sharing. We will also see how NumPy can load data from external files.
In [1]:
import numpy as np
A very common format for data files is comma-separated values (CSV), or a related format such as TSV (tab-separated values).
To read data from such files into NumPy arrays we can use the numpy.genfromtxt function.
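As a minimal sketch of how numpy.genfromtxt handles a delimited file, the example below parses a small CSV held in memory. The column names and values are made up for illustration and are not taken from stockholm_td_adj.dat.

```python
import io
import numpy as np

# Hypothetical CSV content, held in memory so the example needs no file on disk.
csv_text = io.StringIO(
    "year,month,temp\n"
    "1984,1,-2.5\n"
    "1984,2,\n"
    "1985,1,0.5\n"
)

# delimiter="," splits each line on commas; skip_header=1 drops the column-name row.
data = np.genfromtxt(csv_text, delimiter=",", skip_header=1)

print(data.shape)            # (3, 3)
print(data.dtype)            # float64
print(np.isnan(data[1, 2]))  # True: the empty field was read as NaN
```

Unlike a stricter parser, genfromtxt tolerates missing fields, which is why it is a convenient default for real-world data files.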
In [2]:
# In Jupyter, lines starting with ! are run as shell commands
!head stockholm_td_adj.dat
In [3]:
np.genfromtxt?
In [4]:
st_temperatures = np.genfromtxt('stockholm_td_adj.dat', skip_header=1)
In [5]:
st_temperatures.shape
Out[5]:
In [6]:
st_temperatures[:10, ]
Out[6]:
In [7]:
st_temperatures.dtype
Out[7]:
In [8]:
## Find which years, and how many, appear in our data
years = np.unique(st_temperatures[:, 0]).astype(int)
years, len(years)
Out[8]:
In [10]:
years.min(), years.max()
Out[10]:
In [11]:
!head stockholm_td_adj.dat
In [12]:
mask_year = st_temperatures[:, 0] == 1984
In [24]:
mask_feb = st_temperatures[:, 1] == 2
In [25]:
mask_feb.shape
Out[25]:
In [26]:
mask_year.dtype
Out[26]:
In [27]:
type(mask_year)
Out[27]:
In [28]:
## Calculate the mean mid-day temperature for February 1984
feb_noon_temps = st_temperatures[(mask_year & mask_feb), 4]
In [29]:
type(feb_noon_temps)
Out[29]:
In [30]:
feb_noon_temps.dtype
Out[30]:
In [31]:
feb_noon_temps.mean()
Out[31]:
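The cells above combine two boolean masks to select rows. The same pattern can be sketched on a tiny table where the result is easy to check by hand; the column layout (year, month, temperature) is an assumption made for this example only.

```python
import numpy as np

# Hypothetical table: columns are (year, month, temperature).
table = np.array([
    [1984, 1, -3.0],
    [1984, 2, -1.0],
    [1984, 2,  1.0],
    [1985, 2,  4.0],
])

mask_year = table[:, 0] == 1984   # boolean array, one entry per row
mask_feb = table[:, 1] == 2

# & is the element-wise AND: only rows where both masks are True are kept.
selected = table[mask_year & mask_feb, 2]

print(selected)         # [-1.  1.]
print(selected.mean())  # 0.0
```

Note that the parentheses around each comparison matter when the masks are written inline, because & binds more tightly than ==.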
In [21]:
## ....
The native binary format (.npy) is useful when storing and reading back NumPy array data.
Use the functions np.save and np.load:
In [22]:
np.save("st_temperatures.npy", st_temperatures)
See also:
np.savez: save several NumPy arrays into one single file
np.savez_compressed
np.savetxt
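To illustrate np.savez, here is a short sketch that stores two arrays in one archive and reads them back. The file name and array names are placeholders chosen for the example, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile
import numpy as np

x = np.arange(5)
y = np.linspace(0.0, 1.0, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "arrays.npz")  # hypothetical file name

    # Keyword names become the keys used to retrieve each array later.
    np.savez(path, x=x, y=y)

    # np.load on an .npz file returns a dict-like NpzFile object.
    with np.load(path) as archive:
        names = sorted(archive.files)
        x_back = archive["x"]
        y_back = archive["y"]

print(names)                      # ['x', 'y']
print(np.array_equal(x, x_back))  # True
```

np.savez_compressed takes the same arguments and should be a drop-in replacement when file size matters more than write speed.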
In [23]:
T = np.load("st_temperatures.npy")
print(T.shape, T.dtype)
If you are a MATLAB® user, I recommend reading Numpy for MATLAB Users.
In addition to the numpy.ndarray type, NumPy also supports a specialised type called matrix.
This special type of object was introduced to allow for API and programming compatibility with MATLAB®.
Note: The most relevant feature of this type is that the standard arithmetic operators +, -, * use matrix algebra, so they work as they would in MATLAB.
In [2]:
from numpy import matrix
In [3]:
a = np.arange(0, 5)
A = np.array([[n+m*10 for n in range(5)] for m in range(5)])
In [4]:
a
Out[4]:
In [5]:
A
Out[5]:
In [6]:
M = matrix(A)
v = matrix(a).T # make it a column vector
In [7]:
a
Out[7]:
In [8]:
M * M
Out[8]:
In [9]:
A @ A # @ operator equivalent to np.dot(A, A)
Out[9]:
In [10]:
# Element wise multiplication in NumPy
A * A
Out[10]:
In [11]:
M * v
Out[11]:
In [12]:
A * a
Out[12]:
In [13]:
# inner product
v.T * v
Out[13]:
In [14]:
# with matrix objects, standard matrix algebra applies
v + M*v
Out[14]:
If we try to add, subtract or multiply objects with incompatible shapes we get an error:
In [15]:
v_incompat = matrix(list(range(1, 7))).T
In [16]:
M.shape, v_incompat.shape
Out[16]:
In [17]:
M * v_incompat
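The failure above can also be reproduced and handled with plain ndarrays and the @ operator, which is the route the NumPy documentation now recommends over the matrix class. This is only a sketch: the array contents are arbitrary and only the shapes matter.

```python
import numpy as np

M = np.arange(25).reshape(5, 5)              # 5 x 5
v_incompat = np.arange(1, 7).reshape(6, 1)   # 6 x 1: inner dimensions 5 and 6 differ

try:
    M @ v_incompat
    failed = False
except ValueError as err:
    # NumPy reports a shape mismatch as a ValueError.
    failed = True
    print("shape mismatch:", err)
```

Catching ValueError this way lets a script report the offending shapes instead of stopping with a traceback.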
See also the related functions: inner, outer, cross, kron, tensordot.
Try for example help(np.inner).
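A quick tour of the functions listed above, on inputs small enough to verify by hand. The vectors and the 2 x 2 array are arbitrary choices for illustration.

```python
import numpy as np

u = np.array([1, 2, 3])
w = np.array([4, 5, 6])

print(np.inner(u, w))                  # 32  (1*4 + 2*5 + 3*6)
print(np.outer(u, w).shape)            # (3, 3)
print(np.cross([1, 0, 0], [0, 1, 0]))  # [0 0 1]

B = np.array([[1, 2], [3, 4]])
# Kronecker product with the identity places B on the block diagonal.
print(np.kron(np.eye(2), B).shape)     # (4, 4)

# For 2-D inputs, tensordot with axes=1 contracts the last axis of the first
# argument with the first axis of the second, which matches B @ B.
print(np.array_equal(np.tensordot(B, B, axes=1), B @ B))  # True
```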
Let's create a numpy.ndarray object:
In [21]:
A = np.random.rand(10000, 300, 50) # note: about 1.2 GB of float64 values, so this may take a while
In [22]:
A
Out[22]:
In [20]:
from scipy import io as spio
In [23]:
spio.savemat('numpy_to.mat', {'A': A}, oned_as='row') # savemat expects a dictionary
MATLAB $\mapsto$ NumPy: scipy.io.loadmat
In [24]:
data_dictionary = spio.loadmat('numpy_to.mat')
In [25]:
list(data_dictionary.keys())
Out[25]:
In [26]:
data_dictionary['A']
Out[26]:
In [27]:
A_load = data_dictionary['A']
In [28]:
np.all(A == A_load)
Out[28]:
In [30]:
type(A_load)
Out[30]:
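The same round trip can be sketched on a much smaller array, which makes two details of scipy.io.loadmat easier to see: the returned dictionary carries extra metadata keys, and 1-D arrays come back as 2-D. The file name and variable name here are placeholders for the example.

```python
import os
import tempfile
import numpy as np
from scipy import io as spio

x = np.array([1.0, 2.0, 3.0])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "small.mat")  # hypothetical file name
    spio.savemat(path, {"x": x}, oned_as="row")
    loaded = spio.loadmat(path)

# Besides "x", the dictionary holds metadata keys such as "__header__".
user_keys = [k for k in loaded if not k.startswith("__")]
print(user_keys)          # ['x']
print(loaded["x"].shape)  # (1, 3): x was stored as a row, so it comes back 2-D
```

If the original 1-D shape is needed, calling .ravel() on the loaded array (or passing squeeze_me=True to loadmat) should restore it.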