Definition: A multidimensional array $A$ of dimension $d$ is a collection of individual data points indexed by $d$ integers; i.e., an individual data point is denoted by

$$A[i_1, i_2, \dots, i_d]$$

where $i_l = 1,\dots, n_l$ is the index along the $l$-th axis, and $n_l$ is the number of data points stored along that axis.

The shape of an array is the tuple

$$(n_1,\dots,n_d).$$

Remark 1: The dimension above differs from the notion of dimension in linear algebra, where the dimension of the space of such arrays is the total number of entries (i.e. $n_1 n_2 \cdots n_d$).

Remark 2: The dimension in the definition above emphasizes the fact that a multidimensional array of dimension $d$ can be regarded geometrically as a $d$-dimensional cube of numbers sitting in $\mathbb R^d$.
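
For concreteness, here is a minimal sketch that reads off the shape, the dimension $d$, and the number of entries of an array stored as nested Python lists. The helper shape_of is hypothetical (it is not a built-in) and assumes a regular, non-empty nesting. Note also that Python numbers indices from 0, so in code $i_l$ runs from $0$ to $n_l - 1$.

```python
# A 2 x 3 x 4 array (d = 3) written as nested lists; the values are arbitrary.
T = [[[100 * i + 10 * j + k for k in range(4)]
      for j in range(3)]
     for i in range(2)]


def shape_of(nested):
    """Hypothetical helper: shape of a regular, non-empty nested list."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))   # n_l for the current axis
        nested = nested[0]          # descend one level
    return tuple(shape)


shape = shape_of(T)   # the tuple (n_1, n_2, n_3)
d = len(shape)        # dimension in the sense of the definition

size = 1
for n in shape:
    size *= n         # total number of entries n_1 * n_2 * n_3 (Remark 1)

print(shape, d, size)
```

Here the dimension is 3 while the number of entries is 24, which is the distinction made in Remark 1.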

Nested list representation

1D array of numerical observations

Construct a list by enumerating its elements


In [5]:
a = [1, 3, 4, 5, 6]

Construct a list by comprehension


In [6]:
b = [x**2 for x in range(100)]

2D array (or matrix)


In [7]:
A = [ [1, 2, 3], [4, 5, 6] ]
A


Out[7]:
[[1, 2, 3], [4, 5, 6]]

In [8]:
A[0]


Out[8]:
[1, 2, 3]

In [9]:
A[0][1]


Out[9]:
2

3D arrays

Nest one more level of lists
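
For instance, two 2 x 3 matrices can be stacked into an array of shape (2, 2, 3). The values and variable names below are arbitrary and only illustrate the indexing.

```python
# Two 2 x 3 matrices stacked along a new first axis: shape (2, 2, 3).
T = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]

second_matrix = T[1]     # a 2 x 3 nested list
row = T[1][0]            # first row of the second matrix
entry = T[1][0][2]       # one pair of brackets per axis

print(second_matrix)
print(row)
print(entry)
```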

Shortcomings:

  • indexing is cumbersome: one pair of brackets per level, as in A[0][1]
  • methods act on the first level only
  • slow for large data sets
  • basic arithmetic and statistical operations are missing
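
The short sketch below illustrates these points on a small matrix, using only built-in Python. The variable names are chosen for illustration.

```python
A = [[1, 2, 3], [4, 5, 6]]

# Methods act on the first level only: len counts rows, not entries.
n_rows = len(A)

# There is no direct way to extract a column; a comprehension is needed.
second_column = [row[1] for row in A]

# "+" concatenates the outer lists instead of adding term by term.
concatenated = A + A

# Arithmetic and statistics must be written by hand.
total = sum(sum(row) for row in A)
mean = total / float(len(A) * len(A[0]))

print(n_rows, second_column, len(concatenated), total, mean)
```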

Numpy multidimensional arrays

Constructor


In [10]:
from numpy import array

To instantiate a NumPy array, you may pass to its constructor a representation of your multidimensional array in terms of nested lists.


In [11]:
Arr = array(A)
print(Arr)
print('\n')
print(A)


[[1 2 3]
 [4 5 6]]


[[1, 2, 3], [4, 5, 6]]

NumPy constructs 2D arrays from nested lists $[a_1,\dots, a_m]$, where $a_i=[a_{i1},\dots, a_{in}]$, by interpreting the list elements $a_i$ as the rows of the resulting 2D array: this is the row-major convention. (R, by contrast, follows the column-major convention.)
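
A minimal sketch of this convention: each inner list becomes a row, and flattening the array row by row gives back the original order of the numbers. The order keyword of flatten and the T attribute are standard NumPy features that these notes do not otherwise use; they appear here only to contrast the two conventions.

```python
from numpy import array

rows = [[1, 2, 3], [4, 5, 6]]
M = array(rows)

first_row = M[0]          # the first inner list: 1 2 3
first_col = M[:, 0]       # first entry of each inner list: 1 4

flat_c = M.flatten(order="C")   # row by row (row-major)
flat_f = M.flatten(order="F")   # column by column (column-major)

# To treat the inner lists as columns instead, transpose the result.
as_columns = M.T          # shape (3, 2)

print(flat_c)
print(flat_f)
```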

The array constructor has several optional arguments that customize the behaviour of the array object it creates. The most important for us is the dtype argument; among its many possible values, we will mostly use the following two:

  • float64 for real numbers
  • int64 for integers

In [23]:
a = [[1,2,3], [4,5,6]]
A = array(a, dtype = "float64")
A


Out[23]:
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])
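
As a further sketch of the dtype argument, the lines below compare an explicit dtype with the one NumPy infers when none is given. The variable names are illustrative only.

```python
from numpy import array

# Integers stored as real numbers.
as_float = array([[1, 2, 3], [4, 5, 6]], dtype="float64")

# Real numbers forced into integers: the fractional part is dropped.
as_int = array([1.9, 2.5, -0.5], dtype="int64")

# Without dtype, a list mixing floats and integers is stored as float64.
inferred = array([1.9, 2, 4])

print(as_float.dtype, as_int, inferred.dtype)
```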

Remark: If one tries to assign (using the bracket operator) a new value to an array component

A[i,j] = val

Python will try to convert val to an object of the dtype class, whose name is stored as an attribute of the array object:

A.dtype
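
A small sketch of this conversion, with illustrative values: assigning a float into an integer array silently drops the fractional part, while assigning an integer into a float array stores it as a float.

```python
from numpy import array

A = array([[1, 2, 3], [4, 5, 6]], dtype="int64")
A[0, 0] = 7.9       # converted to int64: stored as 7

F = array([[1, 2, 3], [4, 5, 6]], dtype="float64")
F[0, 0] = 7         # converted to float64: stored as 7.0

print(A.dtype, A[0, 0])
print(F.dtype, F[0, 0])
```

The dtype of an array does not change on assignment, so it is worth checking A.dtype before storing non-integer values.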

Operations and methods


In [28]:
b = [1.9, 2, 4, 9, 0, 4, 2, 9]

B = array(b)

B.shape


Out[28]:
(8,)

Caution: The reshape method returns a new array object with the desired shape, but does not modify the shape of the original array!


In [35]:
C = B.reshape((2, 4))
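
The caution can be checked directly, as in the sketch below. Two points go beyond the text and rely on standard NumPy behaviour: a dimension given as -1 is inferred from the total size, and in this example the reshaped array shares its data with the original, so writing into one is visible in the other.

```python
from numpy import array

B = array([1.9, 2, 4, 9, 0, 4, 2, 9])
C = B.reshape((2, 4))

shape_of_B = B.shape        # still (8,): B itself is not reshaped
shape_of_C = C.shape        # (2, 4)

# A dimension given as -1 is inferred from the total size.
D = B.reshape((4, -1))      # shape (4, 2)

# Here C shares its data with B, so writing into C also changes B.
C[0, 0] = 100.0
first_of_B = B[0]

# To keep the new shape under the old name, rebind the name.
B = B.reshape((2, 4))

print(shape_of_B, shape_of_C, D.shape, first_of_B, B.shape)
```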

In [28]:
Barr = Arr.reshape((1,6))
Barr


Out[28]:
array([[1, 2, 3, 4, 5, 6]])

In [25]:
Arr.reshape((3,2))


Out[25]:
array([[1, 2],
       [3, 4],
       [5, 6]])

In [34]:
C = array([x**2 for x in range(25)]).reshape((5,5))

In [33]:
C.shape


Out[33]:
(5, 5)

Subsetting and indexing

Scalar indexing (or subsetting)


In [46]:
from numpy import linspace

In [51]:
A = linspace(0, 100, 100)
A.size

A.reshape((10,10))


Out[51]:
array([[   0.        ,    1.01010101,    2.02020202,    3.03030303,
           4.04040404,    5.05050505,    6.06060606,    7.07070707,
           8.08080808,    9.09090909],
       [  10.1010101 ,   11.11111111,   12.12121212,   13.13131313,
          14.14141414,   15.15151515,   16.16161616,   17.17171717,
          18.18181818,   19.19191919],
       [  20.2020202 ,   21.21212121,   22.22222222,   23.23232323,
          24.24242424,   25.25252525,   26.26262626,   27.27272727,
          28.28282828,   29.29292929],
       [  30.3030303 ,   31.31313131,   32.32323232,   33.33333333,
          34.34343434,   35.35353535,   36.36363636,   37.37373737,
          38.38383838,   39.39393939],
       [  40.4040404 ,   41.41414141,   42.42424242,   43.43434343,
          44.44444444,   45.45454545,   46.46464646,   47.47474747,
          48.48484848,   49.49494949],
       [  50.50505051,   51.51515152,   52.52525253,   53.53535354,
          54.54545455,   55.55555556,   56.56565657,   57.57575758,
          58.58585859,   59.5959596 ],
       [  60.60606061,   61.61616162,   62.62626263,   63.63636364,
          64.64646465,   65.65656566,   66.66666667,   67.67676768,
          68.68686869,   69.6969697 ],
       [  70.70707071,   71.71717172,   72.72727273,   73.73737374,
          74.74747475,   75.75757576,   76.76767677,   77.77777778,
          78.78787879,   79.7979798 ],
       [  80.80808081,   81.81818182,   82.82828283,   83.83838384,
          84.84848485,   85.85858586,   86.86868687,   87.87878788,
          88.88888889,   89.8989899 ],
       [  90.90909091,   91.91919192,   92.92929293,   93.93939394,
          94.94949495,   95.95959596,   96.96969697,   97.97979798,
          98.98989899,  100.        ]])
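
Individual entries of such an array are read with one index per axis inside a single pair of brackets. The sketch below assumes the reshaped array is stored under the name A (the cell above displays it without storing it); indices start at 0, and negative indices count from the end.

```python
from numpy import linspace

A = linspace(0, 100, 100).reshape((10, 10))

first = A[0, 0]        # top-left entry
last = A[9, 9]         # bottom-right entry
also_last = A[-1, -1]  # negative indices count from the end
nested = A[9][9]       # nested-list style also works, but is slower

row = A[3]             # a single index selects a whole row

print(first, last, row.shape)
```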

Logical indexing


In [45]:
A = array(range(25)).reshape((5, 5))
A


Out[45]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

In [48]:
Ind = A < 10

In [49]:
A[Ind]


Out[49]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
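
A few more uses of a boolean array, as a sketch: counting the selected entries, combining conditions with & (each condition in parentheses), and assigning to the selected entries. The variable names are illustrative.

```python
from numpy import array

A = array(range(25)).reshape((5, 5))

Ind = A < 10                 # boolean array with the same shape as A
n_selected = int(Ind.sum())  # True counts as 1

# Conditions are combined element-wise with & and |, not "and" / "or".
even_and_large = A[(A >= 10) & (A % 2 == 0)]

# A boolean index can also appear on the left of an assignment.
A[Ind] = 0

print(n_selected)
print(even_and_large)
print(A)
```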

Slices


In [50]:
A[2,3]


Out[50]:
13

In [51]:
A


Out[51]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

In [52]:
A[0:2, 1:4]


Out[52]:
array([[1, 2, 3],
       [6, 7, 8]])

In [53]:
A[0:5:2, 0:5:2]


Out[53]:
array([[ 0,  2,  4],
       [10, 12, 14],
       [20, 22, 24]])
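
Each axis accepts a slice of the form start:stop:step, where stop is excluded and omitted parts default to the whole axis. The sketch below also shows a standard NumPy behaviour not mentioned above: a slice shares its data with the original array, so copy is needed to obtain an independent array.

```python
from numpy import array

A = array(range(25)).reshape((5, 5))

third_row = A[2, :]           # 10 11 12 13 14
third_col = A[:, 2]           # 2 7 12 17 22
last_two_rows = A[-2:, :]     # rows 3 and 4
reversed_last_row = A[4, ::-1]

# A slice shares its data with A: writing into it changes A.
block = A[0:2, 1:4]
block[0, 0] = -1              # this is A[0, 1]

# An explicit copy is independent of A.
safe = A[0:2, 1:4].copy()
safe[0, 0] = 99               # A[0, 1] is still -1

print(A)
```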

Vectorized computation


In [54]:
A = array(range(25)).reshape((5,5))
B = array([x**2 for x in range(25)]).reshape((5,5))

Term by term sum


In [55]:
A + B


Out[55]:
array([[  0,   2,   6,  12,  20],
       [ 30,  42,  56,  72,  90],
       [110, 132, 156, 182, 210],
       [240, 272, 306, 342, 380],
       [420, 462, 506, 552, 600]])

Term by term multiplication


In [56]:
A * B


Out[56]:
array([[    0,     1,     8,    27,    64],
       [  125,   216,   343,   512,   729],
       [ 1000,  1331,  1728,  2197,  2744],
       [ 3375,  4096,  4913,  5832,  6859],
       [ 8000,  9261, 10648, 12167, 13824]])
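
Note that * on arrays is the term-by-term product, not the matrix product of linear algebra. As a sketch on two small matrices with arbitrary values, the NumPy function dot computes the matrix product instead.

```python
from numpy import array, dot

A = array([[1, 2], [3, 4]])
B = array([[0, 1], [1, 0]])

elementwise = A * B           # [[0, 2], [3, 0]]
matrix_product = dot(A, B)    # [[2, 1], [4, 3]]

print(elementwise)
print(matrix_product)
```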

Apply math functions to the matrix as a whole


In [58]:
from numpy import cos, exp

In [62]:
cos(A) * exp(B) - A / 3.0


Out[62]:
array([[  1.00000000e+000,   1.13536061e+000,  -2.33875141e+001,
         -8.02299229e+003,  -5.80835079e+006],
       [  2.04250671e+010,   4.13951643e+015,   1.43795288e+021,
         -9.07214402e+026,  -1.37225084e+035],
       [ -2.25552256e+043,   1.56896799e+050,   2.91522907e+062,
          2.25729649e+073,   1.80969420e+084],
       [ -3.95269810e+097,  -1.44743303e+111,  -8.92680071e+124,
          3.39753879e+140,   5.96176056e+156],
       [  2.13078812e+173,  -1.82992151e+191,  -1.57947308e+210,
         -2.94016740e+229,   6.04186125e+249]])
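
To see what the vectorized expression replaces, the sketch below recomputes it entry by entry with explicit loops and the standard math module, then checks that both results agree. It also shows two of the statistical methods that nested lists lack. The variable names are illustrative.

```python
import math

from numpy import allclose, array, cos, exp

A = array(range(25)).reshape((5, 5))
B = array([x**2 for x in range(25)]).reshape((5, 5))

# One expression acting on all 25 entries at once.
vectorized = cos(A) * exp(B) - A / 3.0

# The same computation written with explicit loops.
looped = [[math.cos(A[i, j]) * math.exp(B[i, j]) - A[i, j] / 3.0
           for j in range(5)]
          for i in range(5)]

agree = allclose(vectorized, array(looped))

# Basic statistics are available as array methods.
total = A.sum()
average = A.mean()

print(agree, total, average)
```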