NumPy Array Basics - Multi-dimensional Arrays


In [1]:
import sys
print(sys.version)
import numpy as np
print(np.__version__)


3.3.2 (v3.3.2:d047928ae3f6, May 13 2013, 13:52:24) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
1.9.2

In [2]:
npa = np.arange(25)

In [3]:
npa


Out[3]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

In the last video we learned how to generate arrays. Now let’s generate multidimensional arrays, which are, as you might guess, arrays with more than one dimension.

We can create these by reshaping existing arrays. One of the simplest ways is to call the reshape method with the shape we want. Passing (5,5) turns our 25 elements into a 5 by 5 array.


In [4]:
npa.reshape((5,5))


Out[4]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
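As a minimal sketch of how reshape behaves (the names `flat` and `grid` are illustrative and not part of the notebook): the requested shape has to hold exactly as many elements as the original array, and the original array keeps its own shape.

```python
import numpy as np

flat = np.arange(6)          # six elements: 0..5
grid = flat.reshape((2, 3))  # 2 rows x 3 columns, same six elements

print(grid.shape)  # (2, 3)
print(flat.shape)  # (6,) - reshape did not modify the original

# A shape that does not hold exactly six elements raises ValueError.
try:
    flat.reshape((4, 2))
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```

This is why 25 elements fit a (5,5) shape but would not fit, say, (4,6).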

We can also use the zeros function, which takes the shape we want and fills it with zeros.


In [5]:
npa2 = np.zeros((5,5))

In [6]:
npa2


Out[6]:
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

To get the total number of elements in the array we can use the size attribute.


In [7]:
npa2.size


Out[7]:
25

To get the shape of the array we can use the shape attribute.


In [8]:
npa2.shape


Out[8]:
(5, 5)

To get the number of dimensions we use the ndim attribute.


In [9]:
npa2.ndim


Out[9]:
2
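The three values above are related, which a short sketch can make concrete (the variable `a` is illustrative). Note that size, shape, and ndim are attributes, so they are read without parentheses.

```python
import numpy as np

a = np.zeros((5, 5))

# shape is a tuple with one entry per dimension
print(a.shape)                            # (5, 5)
# ndim is the length of that tuple
print(a.ndim == len(a.shape))             # True
# size is the product of the entries in shape
print(a.size == a.shape[0] * a.shape[1])  # True
```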

We can create as many dimensions as we need. Here are 3 dimensions.


In [10]:
np.arange(8).reshape(2,2,2)


Out[10]:
array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])

Here are 4 dimensions.


In [11]:
np.zeros((4,4,4,4))


Out[11]:
array([[[[ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.]],

        [[ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.]],

        [[ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.]],

        [[ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.]]],


       [[[ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.]],

        [[ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.]],

        [[ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.]],

        [[ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.]]],


       [[[ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.]],

        [[ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.]],

        [[ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.]],

        [[ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.]]],


       [[[ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.]],

        [[ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.]],

        [[ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.]],

        [[ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.],
         [ 0.,  0.,  0.,  0.]]]])

In [12]:
np.arange(16).reshape(2,2,2,2)


Out[12]:
array([[[[ 0,  1],
         [ 2,  3]],

        [[ 4,  5],
         [ 6,  7]]],


       [[[ 8,  9],
         [10, 11]],

        [[12, 13],
         [14, 15]]]])

For the most part we’ll be working with 2 dimensions.


In [13]:
npa2


Out[13]:
array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

In [14]:
npa


Out[14]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

Now we can really see the power of vectorization. Let’s create two random 2-dimensional arrays.

First I’m going to set the random seed. This makes the random number generation reproducible, so you get the same numbers each time you run it.


In [15]:
np.random.seed(10)
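A minimal sketch of what reproducible means here: resetting the same seed replays the same sequence of numbers. The names `first` and `second` are illustrative, and `np.random.randint` (whose upper bound is exclusive) is used as a stand-in for the notebook's `random_integers`, which later NumPy releases deprecate.

```python
import numpy as np

np.random.seed(10)
first = np.random.randint(1, 11, 5)   # five integers from 1 to 10 inclusive

np.random.seed(10)                    # reset to the same seed
second = np.random.randint(1, 11, 5)  # replays the same five draws

print((first == second).all())  # True
```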

Let’s try some random number generation, and then we can perform some matrix comparisons.


In [16]:
npa2 = np.random.random_integers(1,10,25).reshape(5,5)
npa2


Out[16]:
array([[10,  5,  1,  2, 10],
       [ 1,  2,  9, 10,  1],
       [ 9,  7,  5,  4,  1],
       [ 5,  7,  9,  2,  9],
       [ 5,  2,  4,  7,  6]])

In [17]:
npa3 = np.random.random_integers(1,10,25).reshape(5,5)
npa3


Out[17]:
array([[ 4, 10,  7, 10,  2],
       [10,  5,  3,  7,  8],
       [ 9,  9, 10,  3,  1],
       [ 7,  8,  9,  2,  8],
       [ 2,  5,  1,  9,  6]])

We can compare the two arrays element by element, here with greater than. The result is a boolean array of the same shape.


In [18]:
npa2 > npa3


Out[18]:
array([[ True, False, False, False,  True],
       [False, False,  True,  True, False],
       [False, False, False,  True, False],
       [False, False, False, False,  True],
       [ True, False,  True, False, False]], dtype=bool)
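To make the difference between `>` and `>=` concrete, here is a small sketch on two hand-written arrays (`a` and `b` are illustrative, not the notebook's data). The only positions that differ between the two results are the ones where the elements are equal.

```python
import numpy as np

a = np.array([[1, 5], [7, 2]])
b = np.array([[3, 5], [4, 9]])

greater = a > b             # strict: equal elements give False
greater_or_equal = a >= b   # equal elements give True

print(greater.tolist())           # [[False, False], [True, False]]
print(greater_or_equal.tolist())  # [[False, True], [True, False]]
```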

We can also count the positions where they are equal. Summing a boolean array counts its True values.


In [19]:
(npa2 == npa3).sum()


Out[19]:
5

Or we can count, column by column, where one is greater than or equal to the other.

Python's built-in sum adds the rows together, which gives one count per column. We can then get the overall total by summing that array.


In [20]:
sum(npa2 >= npa3)


Out[20]:
array([3, 0, 3, 3, 4])

In [21]:
sum(npa2 >= npa3).sum()


Out[21]:
13
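The built-in `sum` and the array's own `sum` method can be compared in a short sketch (the `mask` array is illustrative). The `axis` argument is one way to state the direction explicitly instead of relying on how the built-in iterates over rows.

```python
import numpy as np

mask = np.array([[True, False, True],
                 [True, True, False]])

col_counts = sum(mask)           # built-in sum adds the rows together
same_counts = mask.sum(axis=0)   # NumPy equivalent: count per column
row_counts = mask.sum(axis=1)    # count per row
total = mask.sum()               # count over the whole array

print(col_counts.tolist())   # [2, 1, 1]
print(same_counts.tolist())  # [2, 1, 1]
print(row_counts.tolist())   # [2, 2]
print(total)                 # 4
```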

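We can also get the minimums and maximums, either over the whole array as we did with single-dimensional arrays, or along a specific axis. With axis=1 we get one value per row, and with axis=0 one value per column.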


In [22]:
npa2.min()


Out[22]:
1

In [23]:
npa2.min(axis=1)


Out[23]:
array([1, 1, 1, 2, 2])

In [24]:
npa2.max(axis=0)


Out[24]:
array([10,  7,  9, 10, 10])
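A small hand-checkable sketch of the axis argument (the array `a` is illustrative): axis=1 reduces across each row, and axis=0 reduces down each column.

```python
import numpy as np

a = np.array([[4, 9, 2],
              [7, 1, 8]])

print(a.min())                 # 1, the smallest value anywhere
print(a.min(axis=1).tolist())  # [2, 1], the minimum of each row
print(a.max(axis=0).tolist())  # [7, 9, 8], the maximum of each column
```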

There are plenty of other functions that numpy has. We can transpose with the .T property or the transpose method.


In [25]:
npa2.T


Out[25]:
array([[10,  1,  9,  5,  5],
       [ 5,  2,  7,  7,  2],
       [ 1,  9,  5,  9,  4],
       [ 2, 10,  4,  2,  7],
       [10,  1,  1,  9,  6]])

In [26]:
npa2.transpose()


Out[26]:
array([[10,  1,  9,  5,  5],
       [ 5,  2,  7,  7,  2],
       [ 1,  9,  5,  9,  4],
       [ 2, 10,  4,  2,  7],
       [10,  1,  1,  9,  6]])

In [27]:
npa2.T == npa2.transpose()


Out[27]:
array([[ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]], dtype=bool)
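The comparison above returns one boolean per element. If a single yes-or-no answer is wanted, one option is to collapse it with `all`, or to use `np.array_equal`. A minimal sketch on an illustrative array `a`:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

elementwise = a.T == a.transpose()         # boolean array, one entry per element
print(elementwise.all())                   # True
print(np.array_equal(a.T, a.transpose()))  # True
```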

We can also multiply the transpose by the original array, for example. This is an element-by-element multiplication, not a matrix product.


In [28]:
npa2.T * npa2


Out[28]:
array([[100,   5,   9,  10,  50],
       [  5,   4,  63,  70,   2],
       [  9,  63,  25,  36,   4],
       [ 10,  70,  36,   4,  63],
       [ 50,   2,   4,  63,  36]])
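To show how element-by-element multiplication differs from a matrix product, here is a sketch on a 2 by 2 array (`a` is illustrative). `np.dot` is used for the matrix product as a point of contrast; it is not something the notebook itself calls.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

elementwise = a.T * a     # multiplies matching positions
matrix = np.dot(a.T, a)   # rows-by-columns matrix product

print(elementwise.tolist())  # [[1, 6], [6, 16]]
print(matrix.tolist())       # [[10, 14], [14, 20]]
```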

We can flatten these arrays in a couple of different ways.

We can use flatten, which returns a new one-dimensional copy. Changing the copy does not affect the original array.


In [29]:
np2 = npa2.flatten()
np2


Out[29]:
array([10,  5,  1,  2, 10,  1,  2,  9, 10,  1,  9,  7,  5,  4,  1,  5,  7,
        9,  2,  9,  5,  2,  4,  7,  6])

Or we can use ravel, which returns a flattened view of the original array whenever possible, rather than a copy.


In [30]:
r = npa2.ravel()
r


Out[30]:
array([10,  5,  1,  2, 10,  1,  2,  9, 10,  1,  9,  7,  5,  4,  1,  5,  7,
        9,  2,  9,  5,  2,  4,  7,  6])

In [31]:
np2[0] = 25

In [32]:
npa2


Out[32]:
array([[10,  5,  1,  2, 10],
       [ 1,  2,  9, 10,  1],
       [ 9,  7,  5,  4,  1],
       [ 5,  7,  9,  2,  9],
       [ 5,  2,  4,  7,  6]])

Changing the flattened copy left npa2 untouched. With ravel, if we change a value in the raveled array, that changes it in the original n-dimensional array as well.


In [33]:
r[0] = 25

In [34]:
npa2


Out[34]:
array([[25,  5,  1,  2, 10],
       [ 1,  2,  9, 10,  1],
       [ 9,  7,  5,  4,  1],
       [ 5,  7,  9,  2,  9],
       [ 5,  2,  4,  7,  6]])
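The same copy-versus-view behavior can be reproduced in a few lines (the names `a`, `copy`, and `view` are illustrative). One caveat worth knowing: ravel only returns a view when the memory layout allows it, and otherwise falls back to a copy, so this sketch uses a freshly created contiguous array.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

copy = a.flatten()   # always a new array
view = a.ravel()     # a view here, because `a` is contiguous

copy[0] = 99
print(a[0, 0])       # 1 - the copy is independent

view[0] = 99
print(a[0, 0])       # 99 - the view writes through to `a`
```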

Now we can use some other helpful functions like cumsum and cumprod to get the cumulative sums and products. These work on arrays of any number of dimensions. Called without an axis, they run over the flattened array.


In [35]:
npa2.cumsum()


Out[35]:
array([ 25,  30,  31,  33,  43,  44,  46,  55,  65,  66,  75,  82,  87,
        91,  92,  97, 104, 113, 115, 124, 129, 131, 135, 142, 148])

In [36]:
npa2.cumprod()


Out[36]:
array([              25,              125,              125,
                    250,             2500,             2500,
                   5000,            45000,           450000,
                 450000,          4050000,         28350000,
              141750000,        567000000,        567000000,
             2835000000,      19845000000,     178605000000,
           357210000000,    3214890000000,   16074450000000,
         32148900000000,  128595600000000,  900169200000000,
       5401015200000000])
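Both functions also accept an axis argument, in the same way min and max do. A minimal sketch on a small illustrative array `a`:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.cumsum().tolist())         # [1, 3, 6, 10, 15, 21], over the flattened array
print(a.cumsum(axis=0).tolist())   # [[1, 2, 3], [5, 7, 9]], running total down each column
print(a.cumsum(axis=1).tolist())   # [[1, 3, 6], [4, 9, 15]], running total across each row
print(a.cumprod(axis=1).tolist())  # [[1, 2, 6], [4, 20, 120]], running product across each row
```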

That really covers a lot of the basic functions you’re going to use or need when working with pandas, but it is worth being aware that numpy is a very deep library that does a lot more than I've covered here. I wanted to cover these basics because they're going to come up when we're working with pandas. I'm sure this has felt fairly academic so far, but I can promise you that it provides a valuable foundation for pandas.

If there’s anything you have questions about, feel free to ask along the side and I can create some appendix videos to help you along.