NumPy Array Basics - Querying, Slicing, Combining, and Splitting Arrays


In [1]:
import sys
print(sys.version)
import numpy as np
print(np.__version__)


3.3.2 (v3.3.2:d047928ae3f6, May 13 2013, 13:52:24) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
1.9.2

In this video we’ll cover querying, slicing, combining, and splitting arrays. These operations come up constantly when we’re working in pandas, so they’re worth knowing well. Overall the ideas are simple and the syntax is fairly declarative.


In [2]:
np.random.seed(10)

First let’s generate some arrays to work with: one-dimensional, two-dimensional, and three-dimensional. Some are sequential (built with arange) and some are random.


In [3]:
ar = np.arange(12)
ar


Out[3]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [4]:
ar2 = np.random.randint(1, 13, size=12)  # random ints from 1 to 12 (upper bound is exclusive)
ar2


Out[4]:
array([10,  5,  1,  2, 12, 10,  1,  2, 11,  9, 10,  1])

In [5]:
ndim_ar = np.arange(12).reshape(3,4)
ndim_ar


Out[5]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [6]:
ndim_ar2 = np.random.randint(1, 13, size=(3,4))  # random ints from 1 to 12
ndim_ar2


Out[6]:
array([[11,  9,  7,  5],
       [ 4,  1,  5, 12],
       [ 7,  9, 12, 11]])

In [7]:
ndim_ar3d = np.arange(8).reshape(2,2,2)
ndim_ar3d


Out[7]:
array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])
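Before slicing, it can help to check how many axes each array has and how long each axis is. A small sketch using the `ndim` and `shape` attributes on the non-random arrays from above:

```python
import numpy as np

# Rebuild the deterministic (non-random) arrays from the cells above
ar = np.arange(12)
ndim_ar = np.arange(12).reshape(3, 4)
ndim_ar3d = np.arange(8).reshape(2, 2, 2)

# ndim is the number of axes; shape is the length along each axis
for name, arr in [("ar", ar), ("ndim_ar", ndim_ar), ("ndim_ar3d", ndim_ar3d)]:
    print(name, arr.ndim, arr.shape)
# ar 1 (12,)
# ndim_ar 2 (3, 4)
# ndim_ar3d 3 (2, 2, 2)
```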

Querying one-dimensional arrays is easy: we index and slice them just like a regular Python list.


In [8]:
ar[5]


Out[8]:
5

In [9]:
ar[5:]


Out[9]:
array([ 5,  6,  7,  8,  9, 10, 11])

In [10]:
ar[1:6:2]


Out[10]:
array([1, 3, 5])

In [11]:
ar[-1:-6:-2]


Out[11]:
array([11,  9,  7])
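The slices above follow the `start:stop:step` rule. Here is a short sketch of that rule, plus one difference from Python lists worth knowing: a basic NumPy slice is a view onto the same data, not a copy. The array `tmp` is just a throwaway name for the demonstration.

```python
import numpy as np

ar = np.arange(12)

# start:stop:step; the stop index is excluded, as with Python lists
every_other = ar[1:6:2]      # elements at indices 1, 3, 5
# A negative step walks backwards from start toward (but not including) stop
backwards = ar[-1:-6:-2]     # elements at indices 11, 9, 7
# Leaving out start and stop with a step of -1 reverses the whole array
reversed_ar = ar[::-1]
print(every_other, backwards, reversed_ar)

# Unlike slicing a list, a basic NumPy slice is a view onto the same data
tmp = np.arange(12)
view = tmp[5:]
view[0] = 100
print(tmp[5])                # 100: writing through the view changed tmp
```

If you need an independent copy, call `.copy()` on the slice.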

In [12]:
ndim_ar


Out[12]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

Querying two-dimensional arrays is a bit more interesting: we use commas to separate the index for each axis.


In [13]:
ndim_ar[:,1:3]


Out[13]:
array([[ 1,  2],
       [ 5,  6],
       [ 9, 10]])

In [14]:
ndim_ar[1:3,:]


Out[14]:
array([[ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [15]:
ndim_ar[1:3,1:3]


Out[15]:
array([[ 5,  6],
       [ 9, 10]])
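A sketch of how the comma-separated form relates to single-element lookups, and how an integer index differs from a slice on the same axis:

```python
import numpy as np

ndim_ar = np.arange(12).reshape(3, 4)

# One index per axis, separated by a comma: row 1, column 2
print(ndim_ar[1, 2])      # 6
# Chained lookups reach the same element, but build an intermediate row first
print(ndim_ar[1][2])      # 6

# An integer index removes that axis; a slice keeps it
col = ndim_ar[:, 1]       # shape (3,)
col_2d = ndim_ar[:, 1:2]  # shape (3, 1)
print(col.shape, col_2d.shape)
```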

In [16]:
ndim_ar3d


Out[16]:
array([[[0, 1],
        [2, 3]],

       [[4, 5],
        [6, 7]]])

Things get even more interesting with three or more dimensions. It’s a lot to keep track of in your head, but we just go axis by axis.

Here we take the first element along the first axis, all the items along the second axis, and everything after the first item along the third axis.


In [17]:
ndim_ar3d[0,:,1:]


Out[17]:
array([[1],
       [3]])
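A sketch of the same lookup broken down, showing why the result keeps a trailing axis of length 1, and how `...` (Ellipsis) can stand in for "all remaining axes":

```python
import numpy as np

ndim_ar3d = np.arange(8).reshape(2, 2, 2)

# First block along axis 0, every row along axis 1, items from 1 on along axis 2
sub = ndim_ar3d[0, :, 1:]
print(sub.shape)   # (2, 1): the slice 1: keeps the last axis

# Using the integer 1 instead of the slice 1: drops the last axis
flat = ndim_ar3d[0, :, 1]
print(flat.shape)  # (2,)
print(flat)        # [1 3]

# Ellipsis (...) fills in ':' for every axis not written out
print(np.array_equal(ndim_ar3d[..., 1], ndim_ar3d[:, :, 1]))  # True
```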

Now that we’ve got some experience querying, let’s go over combining different arrays.

First we’ll stack the first two arrays vertically with vstack. The same works for multidimensional arrays.


In [18]:
np.vstack((ar,ar2))


Out[18]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [10,  5,  1,  2, 12, 10,  1,  2, 11,  9, 10,  1]])

In [19]:
np.vstack((ndim_ar, ndim_ar2))


Out[19]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [11,  9,  7,  5],
       [ 4,  1,  5, 12],
       [ 7,  9, 12, 11]])
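A small sketch of what vstack expects from its inputs: each 1-D array becomes one row, and every input needs the same number of columns. The arrays `a` and `b` are illustrative names, not from the cells above.

```python
import numpy as np

a = np.arange(4)          # shape (4,)
b = np.arange(4, 8)       # shape (4,)

# vstack treats each 1-D array as a single row
stacked = np.vstack((a, b))
print(stacked.shape)      # (2, 4)

# Every input must have the same number of columns
mismatch_raised = False
try:
    np.vstack((a, np.arange(3)))
except ValueError as err:
    mismatch_raised = True
    print("ValueError:", err)
```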

Now you can probably guess what horizontal stacking is. That would be hstack.


In [20]:
np.hstack((ndim_ar, ndim_ar2))


Out[20]:
array([[ 0,  1,  2,  3, 11,  9,  7,  5],
       [ 4,  5,  6,  7,  4,  1,  5, 12],
       [ 8,  9, 10, 11,  7,  9, 12, 11]])

In [21]:
ar3 = np.hstack((ar,ar2))
ar3


Out[21]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 10,  5,  1,  2, 12,
       10,  1,  2, 11,  9, 10,  1])

In [22]:
ar


Out[22]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
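As the cells above show, hstack behaves slightly differently depending on the number of dimensions, and it never modifies its inputs. A sketch:

```python
import numpy as np

ar = np.arange(12)
ndim_ar = np.arange(12).reshape(3, 4)

# 1-D inputs: hstack simply appends one array after the other
joined_1d = np.hstack((ar, ar))
print(joined_1d.shape)    # (24,)

# 2-D inputs: hstack joins along axis 1, so the row counts must match
joined_2d = np.hstack((ndim_ar, ndim_ar))
print(joined_2d.shape)    # (3, 8)

# hstack returns a new array; the inputs are left unchanged
print(ar.shape)           # (12,)
```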

Of course, we aren’t limited to two arrays; we can stack as many as we like.

We can also join arrays with concatenate, which lets us specify the axis to join along (axis 0 by default).


In [23]:
np.concatenate((ar,ar2))


Out[23]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 10,  5,  1,  2, 12,
       10,  1,  2, 11,  9, 10,  1])
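A sketch showing more than two arrays in one call, and how vstack and hstack line up with concatenate for 2-D inputs. The small arrays `x`, `y`, `z`, and `m` are illustrative.

```python
import numpy as np

x = np.arange(3)
y = np.arange(3, 6)
z = np.arange(6, 9)

# Any number of arrays can go in the tuple
joined = np.hstack((x, y, z))
print(joined)                     # [0 1 2 3 4 5 6 7 8]
joined2 = np.concatenate((x, y, z))
print(joined2)                    # same result; axis defaults to 0

m = np.arange(6).reshape(2, 3)
# For 2-D arrays, vstack matches axis=0 and hstack matches axis=1
print(np.array_equal(np.vstack((m, m)), np.concatenate((m, m), axis=0)))  # True
print(np.array_equal(np.hstack((m, m)), np.concatenate((m, m), axis=1)))  # True
```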

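Passing axis=0 to concatenate joins along the rows (like vstack), while axis=1 joins along the columns (like hstack). We can also stack arrays depth-wise with dstack, which gives us a three-dimensional array built from these two two-dimensional arrays.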


In [24]:
ndim_ar3 = np.concatenate((ndim_ar,ndim_ar2), axis=0)
ndim_ar3


Out[24]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [11,  9,  7,  5],
       [ 4,  1,  5, 12],
       [ 7,  9, 12, 11]])

In [25]:
np.concatenate((ndim_ar,ndim_ar2), axis=1)


Out[25]:
array([[ 0,  1,  2,  3, 11,  9,  7,  5],
       [ 4,  5,  6,  7,  4,  1,  5, 12],
       [ 8,  9, 10, 11,  7,  9, 12, 11]])

In [26]:
ndim_ar3d = np.dstack((ndim_ar,ndim_ar2))
ndim_ar3d


Out[26]:
array([[[ 0, 11],
        [ 1,  9],
        [ 2,  7],
        [ 3,  5]],

       [[ 4,  4],
        [ 5,  1],
        [ 6,  5],
        [ 7, 12]],

       [[ 8,  7],
        [ 9,  9],
        [10, 12],
        [11, 11]]])
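A sketch of how the dstack result is laid out: each input becomes one "layer" along a new last axis. The values of `ndim_ar2` are copied from the output above so the example is reproducible; the `np.stack` comparison assumes NumPy 1.10 or later, since `np.stack` is newer than the 1.9.2 used in this notebook.

```python
import numpy as np

ndim_ar = np.arange(12).reshape(3, 4)
ndim_ar2 = np.array([[11, 9, 7, 5],
                     [4, 1, 5, 12],
                     [7, 9, 12, 11]])  # values copied from the output above

stacked = np.dstack((ndim_ar, ndim_ar2))
print(stacked.shape)  # (3, 4, 2): the two arrays now sit along a new third axis

# Each original array is one layer along the last axis
print(np.array_equal(stacked[:, :, 0], ndim_ar))   # True
print(np.array_equal(stacked[:, :, 1], ndim_ar2))  # True

# For 2-D inputs, np.stack along the last axis gives the same result
print(np.array_equal(stacked, np.stack((ndim_ar, ndim_ar2), axis=-1)))  # True
```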

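We can split them back apart with dsplit, which splits along the third axis. Each piece keeps that axis (shape (3, 4, 1)), so below we flatten the pieces to compare their values with our original arrays.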


In [27]:
ndim_ar3d_split = np.dsplit(ndim_ar3d, 2)
ndim_ar3d_split


Out[27]:
[array([[[ 0],
         [ 1],
         [ 2],
         [ 3]],
 
        [[ 4],
         [ 5],
         [ 6],
         [ 7]],
 
        [[ 8],
         [ 9],
         [10],
         [11]]]), array([[[11],
         [ 9],
         [ 7],
         [ 5]],
 
        [[ 4],
         [ 1],
         [ 5],
         [12]],
 
        [[ 7],
         [ 9],
         [12],
         [11]]])]

In [28]:
ndim_ar3d_split[0].flatten()


Out[28]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [29]:
ndim_ar3d_split[1].flatten()


Out[29]:
array([11,  9,  7,  5,  4,  1,  5, 12,  7,  9, 12, 11])

In [30]:
ar


Out[30]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
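flatten() gives a 1-D copy, so the row/column layout is lost. A sketch of one way to get back the exact 2-D arrays: drop the length-1 third axis with `squeeze` (or index it with `[:, :, 0]`). Again, `ndim_ar2` uses the values printed above.

```python
import numpy as np

ndim_ar = np.arange(12).reshape(3, 4)
ndim_ar2 = np.array([[11, 9, 7, 5],
                     [4, 1, 5, 12],
                     [7, 9, 12, 11]])

first, second = np.dsplit(np.dstack((ndim_ar, ndim_ar2)), 2)
print(first.shape)                    # (3, 4, 1): the third axis is kept

# flatten() returns a 1-D copy, which loses the row/column layout
print(first.flatten().shape)          # (12,)

# Dropping the length-1 axis restores the original 2-D arrays
restored = first.squeeze(axis=2)
print(np.array_equal(restored, ndim_ar))          # True
print(np.array_equal(second[:, :, 0], ndim_ar2))  # True
```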

We can do the same with hsplit and vsplit.


In [31]:
np.hsplit(ar3,2)


Out[31]:
[array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11]),
 array([10,  5,  1,  2, 12, 10,  1,  2, 11,  9, 10,  1])]

In [32]:
ndim_ar3


Out[32]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [11,  9,  7,  5],
       [ 4,  1,  5, 12],
       [ 7,  9, 12, 11]])

In [33]:
np.vsplit(ndim_ar3,2)


Out[33]:
[array([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]]), array([[11,  9,  7,  5],
        [ 4,  1,  5, 12],
        [ 7,  9, 12, 11]])]
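The splits above use an integer, which asks for that many equal pieces. A sketch of the other options: a list of split points, the error raised when an integer doesn’t divide the axis evenly, and `np.array_split`, which allows uneven pieces. The arrays here are stand-ins built with arange, not the ones from the cells above.

```python
import numpy as np

data = np.arange(24)

# An integer N asks for N equal pieces
halves = np.hsplit(data, 2)
print([piece.shape for piece in halves])   # [(12,), (12,)]

# A list of indices gives the split points instead
parts = np.hsplit(np.arange(6), [2, 5])
print([p.tolist() for p in parts])         # [[0, 1], [2, 3, 4], [5]]

grid = np.arange(12).reshape(6, 2)
# An integer that does not divide the axis evenly raises an error...
uneven_raised = False
try:
    np.vsplit(grid, 4)
except ValueError as err:
    uneven_raised = True
    print("ValueError:", err)

# ...while array_split allows pieces of unequal size
uneven = np.array_split(grid, 4)
print([p.shape for p in uneven])           # [(2, 2), (2, 2), (1, 2), (1, 2)]
```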

That’s all that I wanted to cover.

At this point we’ve covered a lot of NumPy. You might be thinking this isn’t very applied yet, but a lot of this functionality is built into pandas, so it’s worth reviewing. These videos cover much of what you’ll find yourself using in NumPy, but don’t be afraid to dive into the documentation yourself. There are some things I haven’t covered, like NumPy’s dedicated matrix type and linear algebra, but these are outside the scope of this introduction. If you’ve got any questions, please feel free to ask in the discussion.