In [1]:
import numpy as np

Numpy

Creation

From lists


In [2]:
ary = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
ary


Out[2]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

With linspace, always useful for plotting data


In [3]:
np.linspace(0, 1, 101)


Out[3]:
array([ 0.  ,  0.01,  0.02,  0.03,  0.04,  0.05,  0.06,  0.07,  0.08,
        0.09,  0.1 ,  0.11,  0.12,  0.13,  0.14,  0.15,  0.16,  0.17,
        0.18,  0.19,  0.2 ,  0.21,  0.22,  0.23,  0.24,  0.25,  0.26,
        0.27,  0.28,  0.29,  0.3 ,  0.31,  0.32,  0.33,  0.34,  0.35,
        0.36,  0.37,  0.38,  0.39,  0.4 ,  0.41,  0.42,  0.43,  0.44,
        0.45,  0.46,  0.47,  0.48,  0.49,  0.5 ,  0.51,  0.52,  0.53,
        0.54,  0.55,  0.56,  0.57,  0.58,  0.59,  0.6 ,  0.61,  0.62,
        0.63,  0.64,  0.65,  0.66,  0.67,  0.68,  0.69,  0.7 ,  0.71,
        0.72,  0.73,  0.74,  0.75,  0.76,  0.77,  0.78,  0.79,  0.8 ,
        0.81,  0.82,  0.83,  0.84,  0.85,  0.86,  0.87,  0.88,  0.89,
        0.9 ,  0.91,  0.92,  0.93,  0.94,  0.95,  0.96,  0.97,  0.98,
        0.99,  1.  ])

If you prefer the stepsize


In [4]:
np.arange(0, 1, 0.01)


Out[4]:
array([ 0.  ,  0.01,  0.02,  0.03,  0.04,  0.05,  0.06,  0.07,  0.08,
        0.09,  0.1 ,  0.11,  0.12,  0.13,  0.14,  0.15,  0.16,  0.17,
        0.18,  0.19,  0.2 ,  0.21,  0.22,  0.23,  0.24,  0.25,  0.26,
        0.27,  0.28,  0.29,  0.3 ,  0.31,  0.32,  0.33,  0.34,  0.35,
        0.36,  0.37,  0.38,  0.39,  0.4 ,  0.41,  0.42,  0.43,  0.44,
        0.45,  0.46,  0.47,  0.48,  0.49,  0.5 ,  0.51,  0.52,  0.53,
        0.54,  0.55,  0.56,  0.57,  0.58,  0.59,  0.6 ,  0.61,  0.62,
        0.63,  0.64,  0.65,  0.66,  0.67,  0.68,  0.69,  0.7 ,  0.71,
        0.72,  0.73,  0.74,  0.75,  0.76,  0.77,  0.78,  0.79,  0.8 ,
        0.81,  0.82,  0.83,  0.84,  0.85,  0.86,  0.87,  0.88,  0.89,
        0.9 ,  0.91,  0.92,  0.93,  0.94,  0.95,  0.96,  0.97,  0.98,  0.99])

Many more ways:


In [ ]:
np.ones(10)
np.zeros(10)
np.loadtxt('filename.txt', ...)
np.genfromtxt('filename.txt', ...)

...

Shape


In [6]:
ary.shape


Out[6]:
(10,)

In [7]:
ary = np.array([[0, 1., 2., 3., 4., 5], [5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16]], dtype='float')
ary


Out[7]:
array([[  0.,   1.,   2.,   3.,   4.,   5.],
       [  5.,   6.,   7.,   8.,   9.,  10.],
       [ 11.,  12.,  13.,  14.,  15.,  16.]])

In [8]:
ary.shape


Out[8]:
(3, 6)

No problem to change the shape


In [9]:
ary.shape = (6, 3)
ary


Out[9]:
array([[  0.,   1.,   2.],
       [  3.,   4.,   5.],
       [  5.,   6.,   7.],
       [  8.,   9.,  10.],
       [ 11.,  12.,  13.],
       [ 14.,  15.,  16.]])

Number fo all elements in array


In [10]:
ary.size


Out[10]:
18

Standart Indexing and Slicing

You can do all the nice indexing and slicing.


In [11]:
ary = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [12]:
ary[2]


Out[12]:
2

In [13]:
ary[3:-3]


Out[13]:
array([3, 4, 5, 6])

In [14]:
ary[:-1:2]


Out[14]:
array([0, 2, 4, 6, 8])

More fun on higher dimension


In [15]:
ary = np.array([[0, 1., 2., 3., 4.], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19], [20, 21, 22, 23, 24]])
ary


Out[15]:
array([[  0.,   1.,   2.,   3.,   4.],
       [  5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.],
       [ 15.,  16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.,  24.]])

In [16]:
ary[2, 2]


Out[16]:
12.0

In [17]:
ary[-1, [0, 2, 4]]


Out[17]:
array([ 20.,  22.,  24.])

Slicing the center


In [18]:
ary[1:-1, 1:-1]


Out[18]:
array([[  6.,   7.,   8.],
       [ 11.,  12.,  13.],
       [ 16.,  17.,  18.]])

Slicing more advanced


In [19]:
ary[::2, ::2]


Out[19]:
array([[  0.,   2.,   4.],
       [ 10.,  12.,  14.],
       [ 20.,  22.,  24.]])

In [20]:
ary[2]


Out[20]:
array([ 10.,  11.,  12.,  13.,  14.])

Dtype


In [21]:
ary = np.array([1., 2., 3., 4., 5, 6, 7, 8, 9])
ary.dtype


Out[21]:
dtype('float64')

In [22]:
ary = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
ary.dtype


Out[22]:
dtype('int64')

In [23]:
ary = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='float128')
ary.dtype


Out[23]:
dtype('float128')

Custom dtype, more on that later


In [24]:
dt = ([('b-field', 'int'), ('temp', 'float'), ('spectrum', 'object')])
ary = np.array([(1, 4.2, (333.,400.,600, 234. )), (2, 1.5, (12, 22., 23., 221)), (4, 0.3, (212., 21., 21., 21.))], dtype=dt)

In [25]:
ary


Out[25]:
array([(1, 4.2, (333.0, 400.0, 600, 234.0)),
       (2, 1.5, (12, 22.0, 23.0, 221)), (4, 0.3, (212.0, 21.0, 21.0, 21.0))], 
      dtype=[('b-field', '<i8'), ('temp', '<f8'), ('spectrum', 'O')])

In [26]:
ary[0][1]


Out[26]:
4.2000000000000002

In [27]:
ary[2][1]


Out[27]:
0.29999999999999999

Memory

Number of bytes of one element


In [28]:
ary = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
ary.itemsize


Out[28]:
8

In [29]:
ary = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='float128')
ary.itemsize


Out[29]:
16

Bytes of all elements


In [30]:
ary.nbytes


Out[30]:
144

is equal to


In [31]:
ary.itemsize * ary.size


Out[31]:
144

Keep in mind that the total array object is ofcourse a little bit bigger than that


In [32]:
from sys import getsizeof
getsizeof(ary)


Out[32]:
240

Elementwise Operation

Operatiors act elementwise on numpy arrays


In [33]:
ary = np.array([0, 1, 2, 3, 4, 5])

In [34]:
ary + 10


Out[34]:
array([10, 11, 12, 13, 14, 15])

In [35]:
ary - 33


Out[35]:
array([-33, -32, -31, -30, -29, -28])

In [36]:
ary * 3


Out[36]:
array([ 0,  3,  6,  9, 12, 15])

In [37]:
ary / 2


Out[37]:
array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5])

In [38]:
ary ** 2


Out[38]:
array([ 0,  1,  4,  9, 16, 25])

Ufunc - Universal Functions

Numpy includes a couple of mathematical functions, so called universal functions.

  • np.sin(), np.cos(), np.tan(), ...
  • np.sqrt(), np.exp(), np.log(), np.log10(), ...
  • np.mean(), np.var(),
  • np.abs(), ...
  • np.max(), np.min(), ...
  • np.convolve, ...
  • np.random.random(), np.random.poission(), ...
  • ... and many many more

Ufunc act elementwise on arrays


In [39]:
x = np.linspace(0, 2 * np.pi, 10)
x


Out[39]:
array([ 0.        ,  0.6981317 ,  1.3962634 ,  2.0943951 ,  2.7925268 ,
        3.4906585 ,  4.1887902 ,  4.88692191,  5.58505361,  6.28318531])

In [40]:
np.sin(x)


Out[40]:
array([  0.00000000e+00,   6.42787610e-01,   9.84807753e-01,
         8.66025404e-01,   3.42020143e-01,  -3.42020143e-01,
        -8.66025404e-01,  -9.84807753e-01,  -6.42787610e-01,
        -2.44929360e-16])

In [41]:
import matplotlib.pyplot as plt
%matplotlib inline

In [42]:
x = np.linspace(0, 2 * np.pi, 100)
plt.plot(x, np.sin(x))


Out[42]:
[<matplotlib.lines.Line2D at 0x7f84e34baef0>]

In [43]:
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

dx = (x[:-1] - x[1:])
dy = (y[:-1] - y[1:])

In [44]:
plt.plot(x, y, x[1:], dy / dx)
plt.plot(x, np.cos(x), ls='', marker='o', markevery=3)


Out[44]:
[<matplotlib.lines.Line2D at 0x7f84e345e0f0>]

To hard to remember: Use the np.diff() insted, into also works for higher


In [45]:
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

plt.plot(x, y)
plt.plot(x[1:], np.diff(y) / np.diff(x))


Out[45]:
[<matplotlib.lines.Line2D at 0x7f84e3426550>]

This does not only work for the numpy universal functions.


In [2]:
def parabola(x, a = 1, b=0, c=0):
    return a * (x - b)**2 - c

In [47]:
x = np.linspace(-6, 6, 1001)
plt.plot(x, parabola(x, a=2, b=2, c=3))
plt.plot(x, parabola(x, a=2, b=-2, c=3))
plt.grid()
plt.ylim(-3, 25)


Out[47]:
(-3, 25)

It still handles normal python types


In [48]:
parabola(4)


Out[48]:
16

In [49]:
parabola(3.14)


Out[49]:
9.8596

But you can not put a list, tuple, ... directly into it


In [50]:
parabola([1,2,3,4])


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-50-9a977739e9ae> in <module>()
----> 1 parabola([1,2,3,4])

<ipython-input-46-1d09628ca04e> in parabola(x, a, b, c)
      1 def parabola(x, a = 1, b=0, c=0):
----> 2     return a * (x - b)**2 - c

TypeError: unsupported operand type(s) for -: 'list' and 'int'

But you can always transform it into a numpy array


In [51]:
np.exp([1,2,34])


Out[51]:
array([  2.71828183e+00,   7.38905610e+00,   5.83461743e+14])

You can extent your function behaviour by adding a simpel line


In [7]:
def parabola(x, a = 1, b=0, c=0):
    
    x = np.array(x, copy=False)
    
    return a * (x - b)**2 - c

Fancy boolean or “mask” indexing


In [26]:
data = np.array([1, 2, 3, 4, 5])

Define a boolean mask


In [27]:
mask = np.array([True, True, False, True, False])

Return True items of the mask


In [29]:
data[mask]


Out[29]:
array([1, 2, 4])

Why do i want to do that?


In [44]:
import matplotlib.pyplot as plt
%matplotlib inline

Filter parts of sin function


In [165]:
data = np.random.random(100) - 0.5

x = np.linspace(-2 * np.pi, 2 * np.pi)
y = np.sin(x)

plt.figure(figsize=(12, 3))
plt.plot(x, y, lw=2)

mask = y > 0
plt.plot(x[mask], y[mask], ls='', marker='o', color='red', ms=8)
plt.xlim(-2 * np.pi, 2 * np.pi)
plt.ylim(-1.1, 1.1)

mask = (y < 0) & (x > 0) 
plt.plot(x[mask], y[mask], ls='', marker='D', color='green', ms=8)
plt.xlim(-2 * np.pi, 2 * np.pi)
plt.ylim(-1.1, 1.1)


Out[165]:
(-1.1, 1.1)

Find all the spikes in the measurment


In [164]:
# Create some random spikes
y = []
for i in range(1000):
    r = np.random.random()
    if  r > 0.95:
        y.append(np.random.randint(-10, 10))
    else:
        y.append(r - 0.5)

y = np.array(y)
x = np.arange(y.size)

# Plot the raw data
plt.figure(figsize=(12,3))
plt.plot(x, y)

# Find all positiv spikes
m_pos = y > np.var(y)
plt.plot(x[m_pos], y[m_pos], ls='', marker='o')

# Find all negatives spikes
m_neg = y < -np.var(y)
plt.plot(x[m_neg], y[m_neg], ls='', marker='o')

plt.ylim(-12, 12)


Out[164]:
(-12, 12)

Speed

A few words about Speed.

  • Python is fast (development)
  • Python is slow (runtime)

You don't have to understand the following timing in detail. The message is:

  • Use Python
  • Use Numpy for calculations: fancy indexing, universal functions instead of for loops and if statements
  • Use Cython if can not dot it

In [53]:
def py_pow(data):
    for point in data:
        point ** 2
        
def np_pow(data):
    data ** 2

Time the functions using IPython %timeit magic


In [54]:
data = np.random.random(100000)
%timeit py_pow(data)
%timeit np_pow(data)


10 loops, best of 3: 37.2 ms per loop
The slowest run took 4.67 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 108 µs per loop

Lets have a more detailed view with the line_profiler. You have to install it!


In [55]:
%load_ext line_profiler

In [56]:
%lprun -f py_pow py_pow(data)

In [57]:
%load_ext Cython

In [58]:
%%cython

def cy_pow(double[:] data):
    
    cdef long i   
    for i in range(len(data)):
        data[i] ** 2

In [59]:
data = np.random.random(100000)
%timeit cy_pow(data)


1000 loops, best of 3: 1.03 ms per loop