Intro to NumPy

This notebook demonstrates the limitations of Python's built-in data types for some scientific computations, and how NumPy addresses them.

Source: https://campus.datacamp.com/courses/intro-to-python-for-data-science


First, let's create a dummy dataset of heights and weights for 5 imaginary people.


In [ ]:
# Create lists of heights and weights
height = [1.73, 1.68, 1.17, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]
print(height)
print(weight)

If body mass index (BMI) is defined as weight / height ** 2, how would we compute the BMI of every person in our data?
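As a worked example, here is the formula applied by hand to the first person in the lists (assuming, as the values suggest, weight in kilograms and height in meters):

$$
\mathrm{BMI} = \frac{\text{weight}}{\text{height}^{2}},
\qquad
\mathrm{BMI}_{1} = \frac{65.4}{1.73^{2}} = \frac{65.4}{2.9929} \approx 21.85
$$

Doing this for all five people means repeating the same division once per pair of values, which is exactly what the code below has to express.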


In [ ]:
# [Attempt to] compute BMI directly from the lists (this raises a TypeError)
bmi = weight / height ** 2
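To see why this fails, it helps to look at what the arithmetic operators already mean for Python lists. The sketch below (reusing the same data; the names combined, doubled and error_message are only illustrative) shows that + concatenates, * with an integer repeats, and ** is not defined at all, so the error can be caught and inspected instead of stopping the notebook.

```python
height = [1.73, 1.68, 1.17, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# '+' on two lists concatenates them instead of adding element-wise
combined = height + weight
print(len(combined))  # 10

# '*' with an integer repeats the list instead of scaling each element
doubled = height * 2
print(len(doubled))  # 10

# '**' binds tighter than '/', so height ** 2 is evaluated first,
# and exponentiation is not defined for lists
try:
    bmi = weight / height ** 2
except TypeError as err:
    error_message = str(err)
    print(error_message)
```

In other words, lists are general-purpose containers, and their operators are about sequence manipulation, not arithmetic on the contents.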

The attempt above raises a TypeError: Python lists do not support element-wise arithmetic, so height ** 2 is undefined.
Without NumPy, we have to iterate through the lists and compute each person's BMI individually.


In [ ]:
# Compute BMI from the lists, one person at a time
bmi = []
for idx in range(len(height)):
    bmi.append(weight[idx] / height[idx] ** 2)
print(bmi)
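The index-based loop works, but a more idiomatic pure-Python version pairs the two lists with zip() inside a list comprehension. This is a sketch of the same computation, not a different technique: it is still an explicit Python loop, just without the index bookkeeping.

```python
height = [1.73, 1.68, 1.17, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]

# zip() yields (weight, height) pairs in order, so no indexing is needed
bmi = [w / h ** 2 for w, h in zip(weight, height)]

# Round for display only; bmi itself keeps full precision
print([round(value, 2) for value in bmi])
```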

With NumPy, however, we get an additional data type, the array, which applies arithmetic element by element in compiled code, so the same computation needs no explicit Python loop.
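One rough way to check the speed difference is to time both approaches with the standard timeit module. The sketch below assumes hypothetical synthetic data (100,000 random heights and weights generated with NumPy's default_rng), since five values are too few to time meaningfully; the actual timings depend on the machine, so no expected numbers are given.

```python
import timeit

import numpy as np

# Hypothetical synthetic data: 100,000 random heights and weights
rng = np.random.default_rng(seed=0)
arr_height = rng.uniform(1.5, 2.0, size=100_000)
arr_weight = rng.uniform(50.0, 100.0, size=100_000)

# The same values as plain Python lists
height = arr_height.tolist()
weight = arr_weight.tolist()


def bmi_loop():
    # Pure-Python version: one division per element inside a Python loop
    return [w / h ** 2 for w, h in zip(weight, height)]


def bmi_numpy():
    # Vectorized version: the per-element loop runs inside NumPy
    return arr_weight / arr_height ** 2


# Run each version 10 times and report the total elapsed seconds
loop_time = timeit.timeit(bmi_loop, number=10)
numpy_time = timeit.timeit(bmi_numpy, number=10)
print(f"list loop: {loop_time:.4f} s, NumPy: {numpy_time:.4f} s")
```

Both functions compute the same values; only where the loop runs (Python bytecode versus NumPy's compiled code) differs.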


In [ ]:
# Import NumPy, conventionally under the alias 'np'
import numpy as np

In [ ]:
# Convert the height and weight lists to NumPy arrays
arrHeight = np.array(height)
arrWeight = np.array(weight)

print(arrHeight)
print(arrWeight)
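Unlike a list, a NumPy array holds elements of a single data type and has a fixed shape. A short sketch inspecting the converted heights shows both attributes:

```python
import numpy as np

height = [1.73, 1.68, 1.17, 1.89, 1.79]
arrHeight = np.array(height)

print(type(arrHeight))   # <class 'numpy.ndarray'>
print(arrHeight.dtype)   # float64: every element shares one type
print(arrHeight.shape)   # (5,): a one-dimensional array of 5 elements
```

The shared type is what lets NumPy run one operation over the whole array without checking each element's type individually.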

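NumPy arrays let us perform computations on entire collections at once: each operation is applied element by element, so the BMI of all five people comes from a single expression.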


In [ ]:
# Element-wise: square each height, then divide each weight by it
arrBMI = arrWeight / arrHeight ** 2
print(arrBMI)
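To confirm that the one-line NumPy expression matches the list-based loop, the sketch below splits it into its two element-wise steps and compares the result with np.allclose (the intermediate name squared is only illustrative):

```python
import numpy as np

height = [1.73, 1.68, 1.17, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]
arrHeight = np.array(height)
arrWeight = np.array(weight)

# Step 1: '** 2' squares every element of the array
squared = arrHeight ** 2
# Step 2: '/' divides element by element (both arrays have shape (5,))
arrBMI = arrWeight / squared

# Reference result from the pure-Python approach
bmi = [w / h ** 2 for w, h in zip(weight, height)]

print(np.allclose(arrBMI, bmi))  # True
```

np.allclose compares within a small floating-point tolerance, which is the usual way to check numerical results rather than testing exact equality.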