In mathematics you don't understand things. You just get used to them.


-John von Neumann


What is Numpy?

Numpy (pronounced "Num-Pie") is a C-based extension to the Python language that provides large, efficient numerical arrays, vectorized operations, and linear algebra routines.

Because Python is an interpreted language, it is relatively slow in comparison to C++, C#, and Java. Numpy was created to speed up calculations on large datasets by running the element-by-element work in compiled C code. It has since become an integral part of Python data analysis tools like matplotlib, Scipy, and Pandas.
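
As a rough illustration of that difference, the sketch below computes the same squares twice: once with a Python-level loop and once with a single vectorized expression. The array size and variable names are made up for the example, and no timings are shown because they depend on the machine.

```python
import numpy as np

# Hypothetical data: one hundred thousand floats
values = np.arange(100_000, dtype=np.float64)

# Pure-Python approach: the interpreter handles each element in turn
squares_loop = [v * v for v in values.tolist()]

# Vectorized approach: one expression, with the loop running in compiled code
squares_vec = values * values

# Both approaches produce the same numbers
print(np.allclose(squares_loop, squares_vec))
```

Wrapping each approach in `timeit` is an easy way to see the gap on your own machine.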


What Will We Use Numpy For?

Odds are you won't be using Numpy directly for anything at first. You will go about your life blissfully unaware of the fact that Numpy is doing all the grunt work on your behalf. You will on occasion be forced to use Numpy datatypes and Numpy ndarrays for machine learning and analysis, and you may choose to use some of the more convenient Numpy functions below.

Numpy Basics


In [ ]:
# Import numpy
import numpy as np

# Create numpy array
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

# print datatype (add end arg because we don't want newline)
print('Datatype: ', end='')
print(arr.dtype)

# print shape: a 1-dimensional array with 10 elements, i.e. (10,)
print('Initial shape: ', end='')
print(arr.shape)

# reshape numpy array to 2 rows x 5 columns
arr = arr.reshape(2, 5)
print('New shape: ', end='')
print(arr.shape)

arr

In [ ]:
# Can turn array to list
as_list = arr.tolist()

# Run max convenience function
print('Max: ', end='')
print(arr.max())

# Run mean
print('Mean: ', end='')
print(arr.mean())

# Run standard deviation
print('Stdev: ', end='')
print(arr.std())

# Run cumulative sum
print('Cumulative Sum: ', end='')
print(arr.cumsum())

# Transpose
arr.T
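
The convenience methods above reduce over the whole array by default. As a small sketch (not part of the original cells), most of them also accept an `axis` argument, shown here on the same 2 x 5 array:

```python
import numpy as np

# Same values as the array used above, already reshaped to 2 rows x 5 columns
arr = np.arange(10).reshape(2, 5)

# axis=0 collapses the rows, giving one result per column
col_max = arr.max(axis=0)
print(col_max)        # [5 6 7 8 9]

# axis=1 collapses the columns, giving one result per row
row_mean = arr.mean(axis=1)
print(row_mean)       # [2. 7.]

# Transposing swaps the two axes
print(arr.T.shape)    # (5, 2)
```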

Useful Functions and Methods

np.linspace()

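Linspace returns a given number of evenly spaced samples between a start and a stop value. The third argument is the number of samples, not the number of intervals, and the stop value is included by default.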

>>> np.linspace(0, 5, 9)

array([ 0.   ,  0.625,  1.25 ,  1.875,  2.5  ,  3.125,  3.75 ,  4.375,  5.   ])
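
To make the sample-versus-interval distinction concrete, here is a minimal sketch using two optional `np.linspace` arguments, `retstep` and `endpoint`:

```python
import numpy as np

# 9 samples from 0 to 5 inclusive leave 8 gaps of 5 / 8 = 0.625 each
points, step = np.linspace(0, 5, 9, retstep=True)
print(len(points), step)   # 9 0.625

# endpoint=False leaves out the stop value, so 10 samples sit 5 / 10 = 0.5 apart
half_open = np.linspace(0, 5, 10, endpoint=False)
print(half_open[-1])       # 4.5
```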

np.arange()

Returns evenly spaced values from a start value up to, but not including, a stop value, using a given step size.

>>> np.arange(0, 5, .5)

array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5])
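
A few more calls, sketched here for illustration, show how the start, stop, and step arguments interact:

```python
import numpy as np

# With a single argument, arange counts from 0 up to (not including) the stop
first_five = np.arange(5)
print(first_five)          # [0 1 2 3 4]

# The stop value 5 is excluded, so a step of 0.5 yields 10 values ending at 4.5
halves = np.arange(0, 5, 0.5)
print(halves.size, halves[-1])   # 10 4.5

# A negative step counts down; the stop value 0 is still excluded
countdown = np.arange(10, 0, -2)
print(countdown)           # [10  8  6  4  2]
```

When you need a specific number of points rather than a specific step, `np.linspace` is usually the safer choice, since the length of a floating-point `arange` can be affected by rounding.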

np.polyfit()

Provides a linear or polynomial least squares fit. Returns the polynomial coefficients ordered from the highest power to the lowest. Residuals and other diagnostics are returned only if you pass full=True.

>>> x = np.array([0.0, 1.0, 2.0, 3.0,  4.0,  5.0])
>>> y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])
>>> z = np.polyfit(x, y, 3)
>>> z

array([ 0.08703704, -0.81349206,  1.69312169, -0.03968254])
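
The coefficients are not much use on their own, so the sketch below evaluates the fitted polynomial with `np.polyval` and asks for the diagnostics with `full=True`. It reuses the same `x` and `y` as above; the other variable names are chosen for the example.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])

# Fit a cubic, then evaluate it at the original x values
coeffs = np.polyfit(x, y, 3)
fitted = np.polyval(coeffs, x)

# full=True also returns the residuals (sum of squared errors), the rank,
# the singular values, and the rcond cutoff used in the fit
coeffs_full, residuals, rank, singular_values, rcond = np.polyfit(x, y, 3, full=True)

# The reported residual matches the squared error computed by hand
print(np.allclose(residuals[0], np.sum((y - fitted) ** 2)))
```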


np.vectorize()

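Wrap a Python function so that it accepts arrays and broadcasts over them the way a numpy ufunc does. This is a convenience, not a speed-up: the function is still called once per element in a Python-level loop.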

>>> def myfunc(a, b):
...    "Return a-b if a>b, otherwise return a+b"
...    if a > b:
...        return a - b
...    else:
...        return a + b
...
>>> vfunc = np.vectorize(myfunc)
>>> vfunc([1, 2, 3, 4], 2)

array([3, 4, 1, 2])
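
For simple branching logic like this, a fully vectorized expression is often an option. The sketch below compares `np.vectorize` with `np.where` for the same rule; `np.where` is a suggested alternative here, not something the example above depends on.

```python
import numpy as np

def myfunc(a, b):
    """Return a-b if a>b, otherwise return a+b"""
    return a - b if a > b else a + b

a = np.array([1, 2, 3, 4])
b = 2

# np.vectorize calls myfunc once per element
via_vectorize = np.vectorize(myfunc)(a, b)

# np.where evaluates both branches as whole arrays and picks per element
via_where = np.where(a > b, a - b, a + b)

print(via_vectorize)   # [3 4 1 2]
print(via_where)       # [3 4 1 2]
```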

np.concatenate()

Join a sequence of arrays along an existing axis.

>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> np.concatenate((a, b), axis=0)

array([[1, 2],
       [3, 4],
       [5, 6]])
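
Joining along the other axis works the same way, as long as the arrays agree on every dimension except the one being joined. A minimal sketch using the same `a` and `b`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6]])           # shape (1, 2)

# b has one row, so it must be transposed to shape (2, 1) before it can be
# attached as an extra column
side_by_side = np.concatenate((a, b.T), axis=1)
print(side_by_side)
# [[1 2 5]
#  [3 4 6]]
```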

np.argsort()

Returns the indices that would sort an array, rather than the sorted values themselves. In the example below, the first element of the sorted order is 'a' (index 0), the second is 'c' (index 2), the third is 'y' (index 3), and the fourth is 'z' (index 1).

>>> np.argsort(['a', 'z', 'c', 'y'])

array([0, 2, 3, 1])
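
The index array becomes useful once you apply it. In the sketch below it reorders the original array and a second, parallel array; `scores` is a made-up array for the illustration.

```python
import numpy as np

letters = np.array(['a', 'z', 'c', 'y'])
scores = np.array([10, 40, 20, 30])   # hypothetical values paired with letters

order = np.argsort(letters)

# Indexing with the sort order gives the sorted array
print(letters[order])   # ['a' 'c' 'y' 'z']

# The same order keeps a parallel array aligned with the sorted letters
print(scores[order])    # [10 20 30 40]
```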

np.random.normal()

Create a series of normally distributed random points based on a) the distribution center (mean), b) the standard deviation, and c) the number of points required. This is useful for simulating data for models. Because the values are random, your output will differ from the one shown here.

>>> np.random.normal(75, 10, 5)

array([ 87.65815149,  80.77955407,  75.26875019,  72.49581822,  63.34919635])
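
If you need the same random numbers on every run, one option is a seeded generator from `np.random.default_rng`, which is the interface newer Numpy versions recommend. The seed value and sample size below are arbitrary choices for this sketch.

```python
import numpy as np

# A seeded generator produces the same sequence every time it is created
rng = np.random.default_rng(seed=42)
sample = rng.normal(loc=75, scale=10, size=10_000)

# With this many points, the sample statistics land close to the requested ones
print(round(sample.mean()), round(sample.std()))

# Re-creating the generator with the same seed reproduces the sample exactly
repeat = np.random.default_rng(seed=42).normal(loc=75, scale=10, size=10_000)
print(np.array_equal(sample, repeat))
```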

Additional Learning Resources


Next Up: Scipy