Learning Objectives: Learn how to create, transform and visualize multidimensional data of a single type using NumPy.
NumPy is the foundation for scientific computing and data science in Python.
NumPy arrays are the foundational data type that the entire Python numerical computing stack is built upon.
While this notebook doesn't focus on plotting, matplotlib will be used to make a few basic plots.
In [1]:
%matplotlib inline
from matplotlib import pyplot as plt
plt.style.use('ggplot')
In [2]:
import numpy as np
import vizarray as vz
In [3]:
data = [0,2,4,6]
a = np.array(data)
In [4]:
type(a)
Out[4]:
In [5]:
a
Out[5]:
In [6]:
vz.vizarray(a)
Out[6]:
In [7]:
a.shape
Out[7]:
In [8]:
a.ndim
Out[8]:
In [9]:
a.size
Out[9]:
In [10]:
a.nbytes
Out[10]:
In [11]:
a.dtype
Out[11]:
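The attributes above are related to one another. Here is a minimal sketch that checks those relationships, assuming only NumPy; it also uses the itemsize attribute (bytes per element), which the notebook does not otherwise show. The default integer dtype is platform dependent, so itemsize is read from the array rather than assumed.

```python
import numpy as np

a = np.array([0, 2, 4, 6])

# size is the total number of elements: the product of the shape's entries
assert a.size == int(np.prod(a.shape))

# ndim is simply the length of the shape tuple
assert a.ndim == len(a.shape)

# nbytes counts only the element data: size * bytes per element
assert a.nbytes == a.size * a.itemsize

# itemsize depends on the dtype, which for integers varies by platform
print(a.shape, a.ndim, a.size, a.itemsize, a.nbytes, a.dtype)
```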
In [12]:
data = [[0.0,2.0,4.0,6.0],[1.0,3.0,5.0,7.0]]
b = np.array(data)
In [13]:
b
Out[13]:
In [14]:
vz.vizarray(b)
Out[14]:
In [15]:
b.shape, b.ndim, b.size, b.nbytes
Out[15]:
In [16]:
c = np.arange(0.0, 10.0, 1.0) # Step size of 1.0
c
Out[16]:
In [17]:
e = np.linspace(0.0, 5.0, 11) # 11 points
e
Out[17]:
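The two functions differ in how they treat the end of the range: arange excludes the stop value, while linspace includes it by default. A short sketch of the difference, using the retstep and endpoint parameters of np.linspace, which the cells above do not use:

```python
import numpy as np

# arange(start, stop, step) excludes the stop value
c = np.arange(0.0, 10.0, 1.0)
assert len(c) == 10 and c[-1] == 9.0

# linspace(start, stop, num) includes both endpoints by default, so the
# spacing is (stop - start) / (num - 1) = 5.0 / 10 = 0.5
e, step = np.linspace(0.0, 5.0, 11, retstep=True)
assert step == 0.5 and e[-1] == 5.0

# endpoint=False drops the stop value, matching arange with the same spacing
f = np.linspace(0.0, 5.0, 10, endpoint=False)
assert np.allclose(f, np.arange(0.0, 5.0, 0.5))
```

For non-integer steps, linspace is usually the safer choice, because floating-point rounding can make the number of elements produced by arange hard to predict.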
In [18]:
np.empty((4,4))
Out[18]:
In [19]:
np.zeros((3,3))
Out[19]:
In [20]:
np.ones((3,3))
Out[20]:
See also:
empty_like, ones_like, zeros_like
eye, identity
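A brief sketch of these related functions: the *_like variants take their shape and dtype from an existing array, while eye and identity build arrays with ones on a diagonal. The k argument of eye, which shifts the diagonal, is shown here as an extra illustration.

```python
import numpy as np

b = np.array([[0.0, 2.0, 4.0, 6.0], [1.0, 3.0, 5.0, 7.0]])

# The *_like functions copy the shape and dtype of an existing array
z = np.zeros_like(b)
o = np.ones_like(b)
assert z.shape == b.shape and z.dtype == b.dtype
assert o.sum() == b.size

# eye(N, M) puts ones on a diagonal; identity(n) is the square, main-diagonal case
assert np.array_equal(np.eye(3), np.identity(3))

# k shifts the diagonal: here the ones sit just above the main diagonal
print(np.eye(3, 4, k=1))
```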
Arrays have a dtype attribute that encapsulates the data type of each element. It can be set implicitly, from the type of the elements, or explicitly, by passing the dtype argument to an array creation function.
In [21]:
a = np.array([0,1,2,3])
In [22]:
a, a.dtype
Out[22]:
All array creation functions accept an optional dtype argument:
In [23]:
b = np.zeros((2,2), dtype=np.complex64)
b
Out[23]:
In [24]:
c = np.arange(0, 10, 2, dtype=float)  # np.float was removed in NumPy 1.24; use float or np.float64
c
Out[24]:
You can use the astype method to create a copy of the array with a given dtype:
In [25]:
d = c.astype(dtype=int)  # np.int was removed in NumPy 1.24; use int or np.int64
d
Out[25]:
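Two details about astype are worth checking directly: it returns a new array (so changing the result does not affect the original), and converting floats to integers truncates toward zero rather than rounding. A small sketch, using np.round for the rounding case:

```python
import numpy as np

c = np.arange(0, 10, 2, dtype=float)   # array([0., 2., 4., 6., 8.])

# astype returns a new array by default, so changing it leaves c untouched
d = c.astype(int)
d[0] = 100
assert c[0] == 0.0

# Float-to-int conversion truncates toward zero rather than rounding
x = np.array([1.7, -1.7, 2.5])
assert x.astype(int).tolist() == [1, -1, 2]

# Round explicitly first if rounding is what you want
# (np.round rounds exact halves to the nearest even value)
assert np.round(x).astype(int).tolist() == [2, -2, 2]
```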
IPython's tab completion is useful for exploring the various available dtypes:
In [26]:
np.float*?
The NumPy documentation on dtypes describes the many other ways of specifying dtypes.
Basic mathematical operations are elementwise, both between a scalar and an array and between two arrays of the same shape:
In [27]:
a = np.empty((3,3))
a.fill(0.1)
a
Out[27]:
In [28]:
b = np.ones((3,3))
b
Out[28]:
In [29]:
a+b
Out[29]:
In [30]:
b/a
Out[30]:
In [31]:
a**2
Out[31]:
In [32]:
np.pi*b
Out[32]:
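To make "elementwise" concrete, the sketch below compares a + b with an explicit double loop, and shows that arrays whose shapes cannot be matched raise a ValueError. It uses np.full as a one-line equivalent of the empty-then-fill pattern above; the loop is purely illustrative and much slower than the vectorized form.

```python
import numpy as np

a = np.full((3, 3), 0.1)   # same values as np.empty((3, 3)) followed by a.fill(0.1)
b = np.ones((3, 3))

# a + b adds matching elements; an explicit double loop gives the same result
loop_sum = np.empty((3, 3))
for i in range(3):
    for j in range(3):
        loop_sum[i, j] = a[i, j] + b[i, j]
assert np.array_equal(a + b, loop_sum)

# A scalar is combined with every element
assert np.allclose(np.pi * b, np.full((3, 3), np.pi))

# Arrays whose shapes cannot be broadcast together raise an error
try:
    a + np.ones((2, 2))
except ValueError as err:
    print("shape mismatch:", err)
```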
Indexing and slicing provide an efficient way of getting the values in an array and modifying them.
In [33]:
a = np.random.rand(10,10)
The enable function is part of vizarray and enables a nice display of arrays:
In [34]:
vz.enable()
In [35]:
a
Out[35]:
In [36]:
a[0,0]
Out[36]:
In [37]:
a[-1,-1] == a[9,9]
Out[37]:
Extract the 0th column:
In [38]:
a[:,0]
Out[38]:
The last row:
In [39]:
a[-1,:]
Out[39]:
You can also slice ranges:
In [40]:
a[0:2,0:2]
Out[40]:
Assignment also works with slices:
In [41]:
a[0:5,0:5] = 1.0
In [42]:
a
Out[42]:
In [43]:
vz.disable()
Note that even though we assigned the value to the slice, the original array was changed. This shows that slices are views of the same data, not copies.
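One way to confirm the view behavior, and to get an independent array when you need one, is sketched below. It uses np.shares_memory and the copy method, neither of which appears in the cells above.

```python
import numpy as np

a = np.random.rand(10, 10)

# Basic slicing returns a view: it shares memory with a
view = a[0:5, 0:5]
assert np.shares_memory(a, view)
view[:] = 1.0                       # writes through to a
assert (a[0:5, 0:5] == 1.0).all()

# .copy() makes an independent array
block = a[5:, 5:].copy()
assert not np.shares_memory(a, block)
block[:] = -1.0
assert (a[5:, 5:] != -1.0).all()   # a is unchanged
```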
In [44]:
ages = np.array([23,56,67,89,23,56,27,12,8,72])
genders = np.array(['m','m','f','f','m','f','m','m','m','f'])
In [45]:
ages > 30
Out[45]:
In [46]:
genders == 'm'
Out[46]:
In [47]:
(ages > 10) & (ages < 50)
Out[47]:
You can use a boolean array to index into the original or another array:
In [48]:
mask = (genders == 'f')
ages[mask]
Out[48]:
In [49]:
ages[ages>30]
Out[49]:
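A few more patterns with boolean masks, sketched with the same ages and genders data: combining conditions with | and ~, the fact that boolean indexing returns a copy (unlike slicing), and assignment through a mask.

```python
import numpy as np

ages = np.array([23, 56, 67, 89, 23, 56, 27, 12, 8, 72])
genders = np.array(['m', 'm', 'f', 'f', 'm', 'f', 'm', 'm', 'm', 'f'])

# Combine conditions with & (and), | (or) and ~ (not); the parentheses are
# required because these operators bind more tightly than comparisons
young_or_old = (ages < 18) | (ages > 65)
assert ages[young_or_old].tolist() == [67, 89, 12, 8, 72]
assert ages[~(genders == 'f')].tolist() == [23, 56, 23, 27, 12, 8]

# Unlike slicing, boolean indexing returns a copy
older = ages[ages > 30]
older[:] = 0
assert ages.max() == 89

# Assigning through a mask modifies the indexed array in place
capped = ages.copy()
capped[capped > 60] = 60
assert capped.max() == 60
```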
In [50]:
vz.enable()
In [51]:
a = np.random.rand(3,4)
In [52]:
a
Out[52]:
In [53]:
a.T
Out[53]:
In [54]:
a.reshape(2,6)
Out[54]:
In [55]:
a.reshape(6,2)
Out[55]:
In [56]:
a.ravel()
Out[56]:
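Because random values make the output hard to follow, the sketch below uses np.arange(12).reshape(3, 4) as a deterministic stand-in. It also shows passing -1 to reshape so that NumPy infers one dimension, and the error raised when the requested shape does not hold the same number of elements.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # deterministic stand-in for np.random.rand(3, 4)

# The new shape must hold the same number of elements; -1 lets NumPy infer one axis
assert a.reshape(2, -1).shape == (2, 6)
assert a.reshape(-1, 2).shape == (6, 2)

# Transposing reverses the axes
assert a.T.shape == (4, 3) and a.T[1, 2] == a[2, 1]

# ravel flattens in row-major (C) order: the first row, then the second, ...
assert a.ravel().tolist() == list(range(12))

# Asking for an incompatible shape raises an error
try:
    a.reshape(5, 2)
except ValueError as err:
    print(err)
```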
In [57]:
vz.disable()
Universal functions, or "ufuncs," are functions that take and return arrays or scalars, applying their operation to each element:
In [58]:
vz.set_block_size(5)
vz.enable()
In [59]:
t = np.linspace(0.0, 4*np.pi, 100)
t
Out[59]:
In [60]:
np.sin(t)
Out[60]:
In [61]:
np.exp(t)
Out[61]:
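The sketch below checks that np.sin gives the same values as applying math.sin to each element in a Python loop. It also shows the reduce and accumulate methods that binary ufuncs such as np.add provide; these behave like sum and cumsum, which appear later in this notebook.

```python
import math
import numpy as np

t = np.linspace(0.0, 4 * np.pi, 100)

# np.sin applies the sine to every element, without a Python-level loop
assert np.allclose(np.sin(t), [math.sin(x) for x in t])

# Ufuncs also accept plain scalars
assert np.sin(0.0) == 0.0

# Binary ufuncs such as np.add have a reduce method (like sum) and an
# accumulate method (like cumsum)
x = np.array([1, 2, 3, 4])
assert np.add.reduce(x) == x.sum()
assert np.add.accumulate(x).tolist() == [1, 3, 6, 10]
```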
In [62]:
vz.disable()
vz.set_block_size(30)
In [63]:
plt.plot(t, np.exp(-0.1*t)*np.sin(t))
Out[63]:
In [64]:
ages = np.array([23,56,67,89,23,56,27,12,8,72])
genders = np.array(['m','m','f','f','m','f','m','m','m','f'])
NumPy has a basic set of methods and functions for computing basic quantities about data.
In [65]:
ages.min(), ages.max()
Out[65]:
In [66]:
ages.mean()
Out[66]:
In [67]:
ages.var(), ages.std()
Out[67]:
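By default var computes the population variance (dividing by N), and std is its square root. A short sketch that checks this against the formula, plus the ddof argument, which switches to the sample variance (dividing by N - 1):

```python
import numpy as np

ages = np.array([23, 56, 67, 89, 23, 56, 27, 12, 8, 72])

# var is the mean squared deviation from the mean (it divides by N by default)
manual_var = ((ages - ages.mean()) ** 2).mean()
assert np.isclose(ages.var(), manual_var)

# std is the square root of var
assert np.isclose(ages.std(), np.sqrt(ages.var()))

# ddof=1 divides by N - 1 instead, giving the sample variance
n = ages.size
assert np.isclose(ages.var(ddof=1), manual_var * n / (n - 1))
```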
In [68]:
np.bincount(ages)
Out[68]:
The cumsum and cumprod methods compute cumulative sums and products:
In [69]:
ages.cumsum()
Out[69]:
In [70]:
ages.cumprod()
Out[70]:
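The sketch below shows what the cumulative methods compute: element i of the result combines elements 0 through i. It also uses np.diff, which is not used elsewhere in this notebook, to recover the original values. Keep in mind that integer products grow quickly, and NumPy's fixed-width integers can overflow silently.

```python
import numpy as np

ages = np.array([23, 56, 67, 89, 23, 56, 27, 12, 8, 72])

running = ages.cumsum()
# Element i of cumsum is the sum of ages[0] through ages[i] ...
assert running[2] == 23 + 56 + 67
# ... so the last element equals the total
assert running[-1] == ages.sum()

# np.diff undoes cumsum, apart from the first element
assert np.array_equal(np.diff(running), ages[1:])

# cumprod works the same way with multiplication
assert np.array([1, 2, 3, 4]).cumprod().tolist() == [1, 2, 6, 24]
```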
Most of the functions and methods above take an axis argument that will apply the action along a particular axis:
In [71]:
a = np.random.randint(0,10,(3,4))
a
Out[71]:
With axis=0 the action is applied down the rows, collapsing axis 0 and producing one result per column:
In [72]:
a.sum(axis=0)
Out[72]:
With axis=1 the action is applied across the columns, collapsing axis 1 and producing one result per row:
In [73]:
a.sum(axis=1)
Out[73]:
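Because the random integers above change on every run, here is the same idea with a fixed array so the results can be checked by hand. The shape of each result makes it clear which axis was collapsed.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])   # shape (3, 4)

# axis=0 collapses the 3 rows: one result per column, shape (4,)
col_sums = a.sum(axis=0)
assert col_sums.tolist() == [15, 18, 21, 24]

# axis=1 collapses the 4 columns: one result per row, shape (3,)
row_sums = a.sum(axis=1)
assert row_sums.tolist() == [10, 26, 42]

# The same convention holds for other reductions, e.g. max and mean
assert a.max(axis=0).tolist() == [9, 10, 11, 12]
assert a.mean(axis=1).tolist() == [2.5, 6.5, 10.5]
```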
The unique function is extremely useful in working with categorical data:
In [74]:
np.unique(genders)
Out[74]:
In [75]:
np.unique(genders, return_counts=True)
Out[75]:
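The output of return_counts pairs naturally into a dictionary. The sketch below also uses return_inverse, another real np.unique option not shown above, which maps each element to the index of its value in the sorted unique array. This is one simple way to encode categories as integers.

```python
import numpy as np

genders = np.array(['m', 'm', 'f', 'f', 'm', 'f', 'm', 'm', 'm', 'f'])

# unique returns the sorted distinct values; return_counts adds how often each occurs
values, counts = np.unique(genders, return_counts=True)
assert values.tolist() == ['f', 'm'] and counts.tolist() == [4, 6]
print(dict(zip(values.tolist(), counts.tolist())))

# return_inverse gives, for each element, the index of its value in `values`
values, codes = np.unique(genders, return_inverse=True)
assert codes.tolist() == [1, 1, 0, 0, 1, 0, 1, 1, 1, 0]

# Indexing the unique values with the codes rebuilds the original array
assert np.array_equal(values[codes], genders)
```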
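The where function allows you to apply conditional logic to arrays: np.where(condition, x, y) takes elements from x where condition is True and from y where it is False. Here is a rough sketch of how it works for equal-length one-dimensional inputs:
def where(condition, if_true, if_false):
    return [t if c else f for c, t, f in zip(condition, if_true, if_false)]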
In [76]:
np.where(ages>30, 0, 1)
Out[76]:
The if_true and if_false values can be arrays themselves:
In [77]:
np.where(ages<30, 0, ages)
Out[77]:
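Two related uses of np.where are sketched below. The three-argument form gives the same result as copying an array and assigning through a boolean mask. Called with only a condition, np.where instead returns the indices where the condition is True, as a tuple with one index array per dimension.

```python
import numpy as np

ages = np.array([23, 56, 67, 89, 23, 56, 27, 12, 8, 72])

# np.where(condition, x, y) picks from x where condition is True, else from y
flags = np.where(ages > 30, 0, 1)
assert flags.tolist() == [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]

# The same result as copying and assigning through a boolean mask
masked = ages.copy()
masked[ages < 30] = 0
assert np.array_equal(np.where(ages < 30, 0, ages), masked)

# With only a condition, np.where returns the indices where it is True
(idx,) = np.where(ages > 60)
assert idx.tolist() == [2, 3, 9]
```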
NumPy has a number of different functions for reading and writing arrays to and from disk.
In [78]:
a = np.random.rand(10)
a
Out[78]:
In [79]:
np.save('array1', a)
In [80]:
ls
Using %pycat to look at the file shows that it is binary:
In [81]:
%pycat array1.npy
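The binary .npy file can be read back with np.load. The round trip is sketched below inside a temporary directory, so that no files are left behind; the tempfile setup is an addition for this example and not part of the workflow above. For comparison, it also writes the array as human-readable text with np.savetxt and reads it back with np.loadtxt.

```python
import os
import tempfile
import numpy as np

a = np.random.rand(10)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'array1')
    # np.save appends the .npy extension when it is missing
    np.save(path, a)
    saved = path + '.npy'
    assert os.path.exists(saved)

    # np.load reads the binary file back into an identical array
    b = np.load(saved)
    assert np.array_equal(a, b) and b.dtype == a.dtype

    # np.savetxt/np.loadtxt write and read a plain-text file instead
    txt = os.path.join(tmp, 'array1.txt')
    np.savetxt(txt, a)
    assert np.allclose(np.loadtxt(txt), a)
```

The binary format stores the dtype and shape exactly. The text format is easier to inspect, but for floating-point data it depends on the number of digits written.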