List Recap


  • Powerful

  • Collection of values

  • Hold different types

  • Change, add, remove


The Problem:

But one feature is missing. When analyzing data, Data Science needs to:

  • Perform mathematical operations over entire collections of values.
  • Do so quickly.

Unfortunately, lists support neither of these, and here's why:

e.g:


In [2]:
# some random heights of the family
height = [1.75, 1.65, 1.71, 1.89, 1.79]

# some random weights of the family
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# Now if we go to calculate BMI
weight / height ** 2


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-ee8cd1551509> in <module>()
      6 
      7 # Now if we go to calculate BMI
----> 8 weight / height ** 2

TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'
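
With plain lists, the only way to get the BMIs is to walk through the values yourself. A minimal sketch, reusing the height and weight lists from the cell above (the name bmi_list is only for illustration):

```python
# The same family data as in the cell above
height = [1.75, 1.65, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# Lists have no element-wise operators, so pair the values up with zip()
# and compute each BMI one at a time
bmi_list = [w / h ** 2 for w, h in zip(weight, height)]

print(bmi_list)
```

This works, but every new formula needs its own loop, and the loop runs element by element in Python.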

Solution : Numpy


  • Numeric Python, or simply "numpy".
  • An alternative to the Python list: the Numpy array.
  • Calculations are performed over entire arrays (element-wise).
  • Easy and fast.
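
To get a rough feel for the "fast" part, the sketch below times the same doubling operation on a list and on a Numpy array. The array size and repeat count are arbitrary choices, and the actual timings depend on the machine, so treat the numbers as an indication only.

```python
import timeit
import numpy as np

n = 100_000                      # arbitrary size, chosen for illustration
values = list(range(n))          # a regular Python list
np_values = np.arange(n)         # the same numbers as a Numpy array

# Double every element: a Python-level loop versus one array operation
list_time = timeit.timeit(lambda: [v * 2 for v in values], number=20)
array_time = timeit.timeit(lambda: np_values * 2, number=20)

print("list comprehension:", list_time, "seconds")
print("numpy array:       ", array_time, "seconds")
```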

Importing Numpy

Syntax: import numpy


In [5]:
import numpy as np  # import numpy under the alias "np"

In [16]:
# Convert the following lists to numpy arrays
height = [1.75, 1.65, 1.71, 1.89, 1.79]

weight = [65.4, 59.2, 63.6, 88.4, 68.7]

np_height = np.array( height )
np_weight = np.array( weight )

# Let's confirm these are numpy arrays
type(np_height)
type(np_weight)


Out[16]:
numpy.ndarray

In [19]:
bmi = np_weight / np_height ** 2
bmi


Out[19]:
array([ 21.35510204,  21.74471993,  21.75028214,  24.7473475 ,  21.44127836])

Note:


  • Numpy assumes that your array contains elements of the same type.
  • If the array contains elements of different types, Numpy coerces them to one common type; as soon as a string is present, every element becomes a string.
  • A Numpy array shouldn't be mistaken for a plain Python list. Technically it is a new data type, just like int, string, float or boolean, and:

    • It comes packaged with its own methods.

    • It can behave differently than you'd expect.



In [20]:
# A numpy array with different types
np.array( [1, 2.5, "are different", True ] )


Out[20]:
array(['1', '2.5', 'are different', 'True'], 
      dtype='<U32')
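
Strings are only the most extreme case of this coercion. As a small sketch (the variable names are made up for illustration), you can inspect the dtype kind that Numpy picks for a few mixed inputs:

```python
import numpy as np

# bool mixed with int: everything becomes an integer
kind_bool_int = np.array([True, 2]).dtype.kind

# int mixed with float: everything becomes a float
kind_int_float = np.array([1, 2.5]).dtype.kind

# anything mixed with a string: everything becomes a (unicode) string
kind_with_str = np.array([1, 2.5, "text"]).dtype.kind

print(kind_bool_int, kind_int_float, kind_with_str)
```

Here 'i' stands for integer, 'f' for float and 'U' for unicode string.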

Numpy : remarks



In [23]:
# a simple python list
py_list = [ 1, 2, 3 ]

# a numpy array
numpy_array = np.array([1, 2, 3])

""" 
remarks:

+ If we add py_list to itself, the two lists are concatenated,
  giving a new, longer list.
  
+ Whereas, if we add numpy_array to itself, it performs
  "element-wise addition".
  
Warning: 

Again, be careful when mixing different Python types in a numpy array.
  
"""
py_list + py_list


Out[23]:
[1, 2, 3, 1, 2, 3]

In [24]:
numpy_array + numpy_array


Out[24]:
array([2, 4, 6])
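
The same difference shows up with the * operator. A short sketch, reusing py_list and numpy_array from above (repeated and doubled are illustrative names):

```python
import numpy as np

py_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])

# On a list, * repeats the list
repeated = py_list * 2

# On a Numpy array, * multiplies every element
doubled = numpy_array * 2

print(repeated)
print(doubled)
```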

Numpy Subsetting


All subsetting operations on a list also work on Numpy arrays, with a few minor differences that we look at now.


In [41]:
bmi

# get the fourth element from the numpy array "bmi"
print("The bmi of the fourth element is: " + str( bmi[3] ) )

# slice and dice
print("\nThe bmi's of the 3rd and 4th elements are: " + str( bmi[2 : 4] ) )

""" 

    Specifically for Numpy, there's another way to do
    subsetting via "booleans", here's how.

"""

print("\nWhich bmi's are larger than 23: " + str( bmi > 23 ) )

# Next, use this boolean array to do subsetting

print("\nThe bmi's larger than 23 are: " + str(bmi[ bmi > 23 ]) )


The bmi of the fourth element is: 24.7473474987

The bmi's of the 3rd and 4th elements are: [ 21.75028214  24.7473475 ]

Which bmi's are larger than 23: [False False False  True False]

The bmi's larger than 23 are: [ 24.7473475]
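
A boolean array selects every value that meets the condition, which is not necessarily the single largest one. The sketch below rebuilds bmi from the family data and shows how to count the matches and how to find the actual maximum (mask, count_above, largest_index and largest_bmi are illustrative names):

```python
import numpy as np

np_height = np.array([1.75, 1.65, 1.71, 1.89, 1.79])
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
bmi = np_weight / np_height ** 2

# Boolean mask: True where the BMI is above 23
mask = bmi > 23

# True counts as 1, so summing the mask counts the matches
count_above = mask.sum()

# Position and value of the largest BMI
largest_index = bmi.argmax()
largest_bmi = bmi.max()

print(count_above, largest_index, largest_bmi)
```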

Exercise :


RQ1: Which Numpy function do you use to create an array?

Ans: array()


RQ2: Which two statements describe the advantage of Numpy Package over regular Python Lists?

Ans:

  • The Numpy Package provides the array, a data type that can be used to do element-wise calculations.
  • Because Numpy arrays can only hold elements of a single type,

    • calculations on Numpy arrays can be carried out much faster than on regular Python lists.

RQ3: What is the resulting Numpy array z after executing the following lines of code?

   import numpy as np
   x = np.array([1, 2, 3])
   y = np.array([3, 2, 1])
   z = x + y

Ans: array( [4, 4, 4] )


RQ4: What happens when you put an integer, a Boolean, and a string in the same Numpy array using the array() function?

Ans: All array elements are converted to strings.

Lab : Numpy


Objective:

  • Practice with Numpy.

  • Perform calculations with it.

  • Understand the subtle differences between Numpy arrays and Python lists.


List of lab exercises:


  • Your first Numpy Array -- 100xp, status : earned
  • Baseball players' height -- 100xp, status : earned
  • Lightweight baseball players -- 100xp, status : earned
  • Numpy Side Effects -- 50xp, status : earned
  • Subsetting Numpy Arrays -- 100xp, status : earned

1. Your First Numpy array



In [43]:
"""
Instructions: 

    + Import the "numpy" package as "np", so that you can refer to "numpy" with "np".
    
    + Use "np.array()" to create a Numpy array from "baseball". Name this array "np_baseball".
    
    + Print out the "type of np_baseball" to check that you got it right.

"""
# Create list baseball 
baseball = [180, 215, 210, 210, 188, 176, 209, 200]

# Import the numpy package as np
import numpy as np

# Create a Numpy array from baseball: np_baseball
np_baseball = np.array(baseball)
print(np_baseball)

# Print out type of np_baseball
print(type( np_baseball) )


[180 215 210 210 188 176 209 200]
<class 'numpy.ndarray'>

2. Baseball players' height


Preface:

You are a huge baseball fan. You decide to call the MLB (Major League Baseball) and ask around for some more statistics on the height of the main players. They pass along data on more than a thousand players, which is stored as a regular Python list: height. The height is expressed in inches. Can you make a Numpy array out of it and convert the units to meters?



In [46]:
"""
Instructions:

    + Create a Numpy array from height. Name this new array np_height.

    + Print "np_height".
    
    + Multiply "np_height" by 0.0254 to convert all height measurements from inches to meters. 
    
        - Store the new values in a new array, "np_height_m".
        
    + Print out np_height_m and check if the output makes sense.

"""

# height is available as a regular list
# http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_MLB_HeightsWeights#References

# Import numpy
import numpy as np

# Create a Numpy array from height: np_height
np_height = np.array( height )

# Print out np_height
print("The heights of the baseball players are: " + str( np_height ) )

# Convert np_height to m: np_height_m
np_height_m = np_height * 0.0254   # an inch is 0.0254 meters

# Print np_height_m
print("\nThe heights of the baseball players in meters are: " + str( np_height_m ) )


The heights of the baseball players are: [ 1.75  1.65  1.71  1.89  1.79]

The heights of the baseball players in meters are: [ 0.04445   0.04191   0.043434  0.048006  0.045466]
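
The converted values above are tiny because the height list in this notebook still holds the five family heights, which are already in meters. With heights that really are in inches, the same conversion gives sensible results. A small sketch, assuming a made-up sample list height_in (not the actual MLB data):

```python
import numpy as np

# Hypothetical sample of player heights in inches (not the real MLB list)
height_in = [74, 74, 72, 72, 73]

# Create a Numpy array and convert inches to meters (1 inch = 0.0254 m)
np_height_in = np.array(height_in)
np_height_m = np_height_in * 0.0254

print(np_height_m)
```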

3. Baseball players' BMI:


Preface:

The MLB also offers to let you analyze their weight data. Again, both are available as regular Python lists: height and weight. height is in inches and weight is in pounds.

It's now possible to calculate the BMI of each baseball player. Python code to convert height to a Numpy array with the correct units is already available in the workspace. Follow the instructions step by step and finish the game!



In [49]:
"""
Instructions:

    + Create a Numpy array from the weight list with the correct units.
    
        -  Multiply by 0.453592 to go from pounds to kilograms. 
        
        - Store the resulting Numpy array as np_weight_kg.
        
    + Use np_height_m and np_weight_kg to calculate the BMI of each player. 
    
        - Use the following equation: 
        
          BMI = weight( kg ) / height( m ) ** 2
          
        - Save the resulting numpy array as "bmi".
        
    + Print out "bmi".
    
"""
# height and weight are available as regular lists

# Import numpy
import numpy as np

# Create array from height with correct units: np_height_m
np_height_m = np.array(height) * 0.0254

# Create array from weight with correct units: np_weight_kg 
np_weight_kg = np.array( weight ) * 0.453592

# Calculate the BMI: bmi
bmi = np_weight_kg / np_height_m ** 2

# Print out bmi
print("\nThe BMIs of all the baseball players are: " + str( bmi ) )


The BMIs of all the baseball players are: [ 15014.11036781  15288.0386275   15291.94924605  17399.09301044
  15074.69826837]
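
BMI values around 15000 are a sign that the units are off: the lists in this notebook are the family data in meters and kilograms, so converting them again distorts the result. The sketch below runs the same steps on made-up sample lists height_in and weight_lb (hypothetical values in inches and pounds, not the real MLB data):

```python
import numpy as np

# Hypothetical sample data: inches and pounds
height_in = [74, 74, 72, 72, 73]
weight_lb = [180, 215, 210, 210, 188]

# Convert to metric units
np_height_m = np.array(height_in) * 0.0254
np_weight_kg = np.array(weight_lb) * 0.453592

# BMI = weight (kg) / height (m) squared, computed element-wise
bmi = np_weight_kg / np_height_m ** 2

print(bmi)
```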

4. Lightweight baseball players:


To subset both regular Python lists and Numpy arrays, you can use square brackets:

    x = [4 , 9 , 6, 3, 1]
    x[1]
    import numpy as np
    y = np.array(x)
    y[1]

For Numpy specifically, you can also use boolean Numpy arrays:

    high = y > 5
    y[high]


In [71]:
""" 
Instructions:

    + Create a boolean Numpy array:
    
        - An element of the array should be "True"
          if the corresponding baseball player's BMI is below 21.
        
        - You can use the "<" operator for this.
        
        - Name the array "light" and print it.
        
        
    + Print out a Numpy array with the BMIs of all baseball players whose BMI is below 21. 
    
        - Use "light" inside square brackets to do a selection on the bmi array.
"""
# height and weight are available as regular lists

# Import numpy
import numpy as np

# Calculate the BMI: bmi
np_height_m = np.array(height) * 0.0254
np_weight_kg = np.array(weight) * 0.453592
bmi = np_weight_kg / (np_height_m ** 2)

# Create the light array
light = bmi < 21

# Print out light
print("\nLightweight baseball players: " + str( light ) )

# Print out BMIs of all baseball players whose BMI is below 21
print(bmi[ light ])


Lightweight baseball players: [False False False False False]
[]
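
One thing to watch in this exercise: light is already a boolean array, so it goes inside the square brackets as is. Comparing it with 21 a second time, as in bmi[ light < 21 ], compares True and False (that is, 1 and 0) with 21, which is always True, so every element gets selected. A small sketch with made-up BMI values:

```python
import numpy as np

# Hypothetical BMI values, for illustration only
bmi = np.array([20.5, 27.1, 19.8, 24.7])

# Boolean mask: True where the BMI is below 21
light = bmi < 21

# Comparing the mask again: True/False act as 1/0, so this is all True
wrong_mask = light < 21
selected_wrong = bmi[wrong_mask]

# Using the mask directly gives the intended selection
selected_right = bmi[light]

print(selected_wrong)
print(selected_right)
```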

5. Numpy Side Effect:


Preface:

  • Numpy arrays cannot contain elements with different types.
  • If you try to build such an array, some of the elements' types are changed to end up with a homogeneous array.
    • This is known as type coercion.
  • Second, the typical arithmetic operators, such as +, -, * and /, have a different meaning for regular Python lists and Numpy arrays.


Have a look at this line:

    In [1]: np.array([True, 1, 2]) + np.array([3, 4, False])
    Out[1]: array([4, 5, 2])

Here, the + operator sums the Numpy arrays element-wise. The booleans are first coerced to integers (True becomes 1, False becomes 0), so True + 3 gives 4, 1 + 4 gives 5 and 2 + False gives 2. The result is an array of integers.

Which code chunk builds the exact same Python data structure?

Ans: np.array([4, 3, 0]) + np.array([0, 2, 2]).
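
The answer can be checked directly. The sketch below compares both expressions with np.array_equal and, for contrast, shows what + does to the same values as plain lists (the variable names are illustrative):

```python
import numpy as np

# The expression from the question: booleans are coerced to 1 and 0
original = np.array([True, 1, 2]) + np.array([3, 4, False])

# The proposed answer
candidate = np.array([4, 3, 0]) + np.array([0, 2, 2])

# Same values and same element type?
same_values = np.array_equal(original, candidate)
same_dtype = original.dtype == candidate.dtype

# With plain lists, + concatenates instead of summing
list_result = [True, 1, 2] + [3, 4, False]

print(original, candidate, same_values, same_dtype)
print(list_result)
```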


6. Subsetting Numpy Arrays:


Luckily, Python lists and Numpy arrays behave similarly when subsetting, wohoooo!


In [72]:
"""
Instructions:

    + Subset np_weight: print out the element at index 50.
    
    + Print out a sub-array of np_height: It contains the elements at index 100 up to and including index 110
"""

# height and weight are available as regular lists

# Import numpy
import numpy as np

# Store weight and height lists as numpy arrays
np_weight = np.array(weight)
np_height = np.array(height)

# Print out the weight at index 50
# Ans: print(np_weight[50])

# Print out sub-array of np_height: index 100 up to and including index 110
# Ans: print(np_height[100 : 111])


---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-72-2baeb4a975bb> in <module>()
     17 
     18 # Print out the weight at index 50
---> 19 print(np_weight[50])
     20 
     21 # Print out sub-array of np_height: index 100 up to and including index 110

IndexError: index 50 is out of bounds for axis 0 with size 5
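
The IndexError comes from the data, not from the code: the arrays in this notebook hold only five values, so index 50 does not exist. The sketch below, using a five-element array like the one here, shows that a single out-of-range index raises an error while an out-of-range slice quietly returns an empty array (index_failed and out_of_range_slice are illustrative names):

```python
import numpy as np

# A five-element array, like np_weight in this notebook
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
print(np_weight.size)

# A single index past the end raises IndexError
try:
    np_weight[50]
    index_failed = False
except IndexError:
    index_failed = True

# A slice past the end does not raise; it just comes back empty
out_of_range_slice = np_weight[100:111]

print(index_failed, out_of_range_slice.size)
```

Checking the size of the array first is a quick way to tell whether the full MLB data is actually loaded.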