2D Numpy Arrays


Earlier this week, we created only 1-dimensional Numpy arrays. But Python, through Numpy, can do a lot more: it can handle multi-dimensional arrays as well.

If we check the type of the Numpy arrays we created earlier, we see that they are of type:

`numpy.ndarray`

  • numpy. tells us that the type is defined in the numpy package.

  • ndarray stands for n-dimensional array.
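
As a quick check of the two points above, here is a minimal sketch that inspects the type of a small 1D array. The name np_1d and its values are made up for illustration.

```python
import numpy as np

# Hypothetical 1D array, used only to inspect its type
np_1d = np.array([1, 2, 3])

arr_type = type(np_1d)
print(arr_type)             # the full type, as shown above
print(arr_type.__module__)  # the package that defines the type
print(arr_type.__name__)    # the name of the type itself
print(np_1d.ndim)           # number of dimensions of this array
```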

Creating an n-dimensional array


To create an n-dimensional Numpy array, say a two-dimensional one, we pass a list of lists in which every sublist has the same length:

variable = np.array( [ [elem1, elem2, ... , elemN], [elem1, elem2, ... , elemN] ] )

In [19]:
import numpy as np

# e.g. create a 2D array
np_2d = np.array([ [1, 2, 3, 4, 5, 6, 7 ], 
                  [ 8, 9,10, 11, 12, 13, 14] ])

# print np_2d
print( np_2d )
print( type( np_2d ) )

""" 

+ The output is a rectangular data structure.

+ Each sublist ~ "row" in the 2D array.

"""


[[ 1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14]]
<class 'numpy.ndarray'>
Out[19]:
' \n\n+ The output is a rectangular data structure.\n\n+ Each sublist ~ "row" in the 2D array.\n\n'

Shape of an n-dimensional array


We can also inspect the structure of our n-dimensional array by reading the shape attribute of the array itself.

<array_name>.shape

  • shape is an attribute that gives the size of the array along each dimension: (rows, columns) for a 2D array.

Note:

  • All rules of 1D Numpy arrays also apply to multi-dimensional arrays.
  • i.e. an array can only contain a single type.
  • If we build an array from mixed types, say floats and a string, all array elements will be coerced to strings.

    • i.e. we always end up with a homogeneous array!
  • We can think of them as an improved list of lists.
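
The coercion rule from the note can be sketched as follows. The array mixed and its values are made up for illustration.

```python
import numpy as np

# Hypothetical mixed-type input: three floats and one string
mixed = np.array([[1.5, 2.0],
                  ["three", 4.0]])

print(mixed)
print(mixed.dtype)  # a Unicode string dtype, not a float dtype

# Every element is now a string, including the former floats
all_strings = all(isinstance(v, str) for v in mixed.flatten())
print(all_strings)
```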

In [10]:
# check the shape of the previously created numpy array
np_2d.shape


Out[10]:
(2, 7)
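
Since shape is an ordinary tuple, we can unpack it into separate variables. A minimal sketch, reusing the same values as np_2d; the names n_rows and n_cols are made up for illustration.

```python
import numpy as np

np_2d = np.array([[1, 2, 3, 4, 5, 6, 7],
                  [8, 9, 10, 11, 12, 13, 14]])

# shape is a tuple: (number of rows, number of columns)
n_rows, n_cols = np_2d.shape
print(n_rows, n_cols)

print(np_2d.ndim)  # number of dimensions
print(np_2d.size)  # total number of elements (rows * columns)
```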

Subsetting n-dimensional arrays


As discussed earlier, Numpy n-dimensional arrays follow the same rules as 1-dimensional Numpy arrays, plus a few of their own.

We can subset them as well, but this time it's a little bit different. Here's how:

<array_name>[ < row index of desired elem > ][ < column index of desired elem > ]

Alternatively,

<array_name>[ < row index > , < col index > ]
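
A minimal sketch comparing the two forms, reusing the same values as np_2d. For a single element both give the same result, but once a slice is involved only the comma form selects a column; the variable names are made up for illustration.

```python
import numpy as np

np_2d = np.array([[1, 2, 3, 4, 5, 6, 7],
                  [8, 9, 10, 11, 12, 13, 14]])

# Single element: row 1, column 2, written both ways
chained = np_2d[1][2]
combined = np_2d[1, 2]
print(chained, combined)

# With a slice the two forms differ:
col_1 = np_2d[:, 1]   # all rows, column 1
row_1 = np_2d[:][1]   # np_2d[:] is the whole array, so [1] picks row 1
print(col_1)
print(row_1)
```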


In [18]:
# subset rows and columns

"""
+ The colon before the comma selects both rows.
+ After the comma comes the usual slice syntax, as for any list.
+ Since we only want the 2nd and 3rd columns, we use
  the slice 1:3.
  
  - Remember, the stop index (3) is excluded.
  
+ The "intersection" gives us two rows and two columns.

"""

print("Two rows and two columns: \n" + str(np_2d[:, 1:3]))

# Select a row completely
"""
Here is our n-dimensional array:

col:   0  1  2  3  4  5  6
       ^  ^  ^  ^  ^  ^  ^       
   [ [ 1  2  3  4  5  6  7 ]    ----- row 0
     [ 8  9 10 11 12 13 14 ] ]  ----- row 1

i.e. we want the whole 1st row (index 0)
"""
print("\nThe 1st row at index 0 is: " + str( np_2d[ 0, :]) )  # first row, all columns


Two rows and two columns: 
[[ 2  3]
 [ 9 10]]

The 1st row at index 0 is: [1 2 3 4 5 6 7]

Exercise:


RQ1: What characterizes multi-dimensional Numpy arrays?

Ans: You can create a 2D Numpy array from a regular list of lists.


RQ2: You created the following 2D Numpy array, x:

    import numpy as np
    x = np.array([["a", "b", "c", "d"],
                  ["e", "f", "g", "h"]])

Which Python command do you use to select the string "g" from x?

Ans: x[1, 2]
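
The answer can be checked with a short sketch; "g" sits in the second row (index 1) and the third column (index 2). The variable selected is made up for illustration.

```python
import numpy as np

x = np.array([["a", "b", "c", "d"],
              ["e", "f", "g", "h"]])

# Row index 1, column index 2
selected = x[1, 2]
print(selected)
```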


RQ3: What does the resulting array z contain after executing the following lines of Python code?

    import numpy as np
    x = np.array([[1, 2, 3],
                  [1, 2, 3]])
    y = np.array([[1, 1, 1],
                  [1, 2, 3]])
    z = x - y

Ans: Arithmetic on Numpy arrays is performed element-wise, and this holds for n-dimensional arrays just as for 1D arrays. So z holds the element-wise difference of x and y:

    z = x - y
    # array([[0, 1, 2],
    #        [0, 0, 0]])

Lab: 2D Numpy Array


Objective:

  • Creating 2D arrays.

  • Performing some analyses on them.


Lab Exercises:

  • Your First 2D Numpy Arrays.
  • Baseball data in 2D form.
  • Subsetting 2D Numpy Arrays.
  • 2D Arithmetic

1. Your First 2D Numpy Arrays


Preface: Before working on the actual MLB data, let's try to create a 2D Numpy array from a small list of lists.

In this exercise, baseball is a list of lists. The main list contains 4 elements. Each of these elements is a list containing the height and the weight of 4 baseball players, in this order.

Instructions:

  • Use np.array() to create a 2D Numpy array from baseball. Name it np_baseball.

  • Print out the type of np_baseball.

  • Print out the shape attribute of np_baseball.

    • Use np_baseball.shape.


In [21]:
# Create baseball, a list of lists
baseball = [[180, 78.4],
            [215, 102.7],
            [210, 98.5],
            [188, 75.2]]

# Import numpy
import numpy as np

# Create a 2D Numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print out the type of np_baseball
print( type( np_baseball ) )

# Print out the shape of np_baseball
print( np_baseball.shape )


<class 'numpy.ndarray'>
(4, 2)

2. Baseball data in 2D form

Preface:

You have another look at the MLB data and realize that it makes more sense to restructure all this information in a 2D Numpy array. This array should have 1015 rows, corresponding to the 1015 baseball players you have information on, and 2 columns (for height and weight).

The MLB was, again, very helpful and passed you the data in a different structure, a Python list of lists. In this list of lists, each sublist represents the height and weight of a single baseball player. The name of this embedded list is baseball.

Store the data as a 2D array to unlock Numpy's extra functionality.

Instructions:

  • Use np.array() to create a 2D Numpy array from baseball. Name it np_baseball.
  • Print out the shape attribute of np_baseball.


In [23]:
# baseball is available as a regular list of lists

# Import numpy package
import numpy as np

# Create a 2D Numpy array from baseball: np_baseball
np_baseball = np.array( baseball )

# Print out the shape of np_baseball
print( np_baseball.shape )


[[ 180.    78.4]
 [ 215.   102.7]
 [ 210.    98.5]
 [ 188.    75.2]]
(4, 2)

3. Subsetting 2D Numpy Arrays:

Preface:

If your 2D Numpy array has a regular structure, i.e. each row and column has a fixed number of values, complicated ways of subsetting become very easy. Have a look at the code below where the elements "a" and "c" are extracted from a list of lists.

    # regular list of lists
    x = [["a", "b"], ["c", "d"]]
    [x[0][0], x[1][0]]

    # numpy
    import numpy as np
    np_x = np.array(x)
    np_x[:,0]

Thus for 2D Numpy arrays:

  • Indexes before the comma refer to rows.
  • Indexes after the comma refer to columns.
  • The : is for slicing; on its own it selects everything along that dimension.
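
These three rules can be sketched on the same small array. The variable names first_row, first_col and single are made up for illustration.

```python
import numpy as np

x = [["a", "b"], ["c", "d"]]
np_x = np.array(x)

first_row = np_x[0, :]   # row 0, all columns
first_col = np_x[:, 0]   # all rows, column 0
single = np_x[1, 1]      # row 1, column 1

print(first_row)
print(first_col)
print(single)
```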

Instructions:

  • Print out the 50th row of np_baseball.
  • Make a new variable, np_weight, containing the entire second column of np_baseball.
  • Select the height (first column) of the 124th baseball player in np_baseball and print it out.


In [ ]:
# baseball is available as a regular list of lists

# Import numpy package
import numpy as np

# Create np_baseball (2 cols)
np_baseball = np.array(baseball)

# Print out the 50th row of np_baseball
#Ans: print( np_baseball[49,:])

# Select the entire second column of np_baseball: np_weight
#Ans: np_weight = np_baseball[:,1]

# Print out height of 124th player
#Ans: print( np_baseball[123, 0] )
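
The difference between selecting a row and selecting a column is easy to mix up, so here is a minimal sketch on the four-player list from the first lab exercise (the full 1015-row data is not available here). The names np_small, third_row, weights and height_of_third are made up for illustration.

```python
import numpy as np

# Four players from the first lab exercise: [height, weight]
np_small = np.array([[180, 78.4],
                     [215, 102.7],
                     [210, 98.5],
                     [188, 75.2]])

third_row = np_small[2, :]        # 3rd player: index before the comma
weights = np_small[:, 1]          # entire 2nd column: index after the comma
height_of_third = np_small[2, 0]  # 3rd player, 1st column (height)

print(third_row)
print(weights)
print(height_of_third)
```

Note that np_small[1, :] would return the second row (one player), not the second column.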

4. 2D Arithmetic

Instructions:

  • You managed to get hold of the changes in weight, height and age of all baseball players. They are available as a 2D Numpy array, update.

    • Add np_baseball and update and print out the result.
  • You want to convert the units of height and weight.

    • As a first step, create a Numpy array with three values: 0.0254, 0.453592 and 1. Name this array conversion.
  • Multiply np_baseball with conversion and print out the result.


In [25]:
# baseball is available as a regular list of lists
# update is available as 2D Numpy array

# Import numpy package
import numpy as np

# Create np_baseball (3 cols)
np_baseball = np.array(baseball)

# Print out addition of np_baseball and update
# Ans : print( np_baseball + update)

# Create Numpy array: conversion
# Ans: conversion = np.array([0.0254, 0.453592, 1])

# Print out product of np_baseball and conversion
# Ans: print( np_baseball * conversion )
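
Since the lab data is not available here, the following is a minimal sketch of both operations on made-up values. It assumes three columns (height in inches, weight in pounds, age in years); np_players and the numbers in it and in update are hypothetical.

```python
import numpy as np

# Made-up data: [height (in), weight (lb), age] for two players
np_players = np.array([[74.0, 180.0, 23.0],
                       [72.0, 215.0, 35.0]])

# Made-up changes, same shape as np_players
update = np.array([[1.0, -2.0, 1.0],
                   [0.0, 3.0, 1.0]])

# Same-shape arrays are added element by element
updated = np_players + update
print(updated)

# A 1D array with one value per column is applied to every row
conversion = np.array([0.0254, 0.453592, 1])
converted = np_players * conversion
print(converted)
```

The multiplication works because conversion has as many elements as np_players has columns, so each row is scaled column by column: inches to metres, pounds to kilograms, and age left unchanged.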