In [1]:

    
import numpy as np

Numpy

Creation

From lists



In [2]:

    
ary = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
ary









    Out[2]:





array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

With linspace, always useful for plotting data



In [3]:

    
np.linspace(0, 1, 101)









    Out[3]:





array([ 0.  ,  0.01,  0.02,  0.03,  0.04,  0.05,  0.06,  0.07,  0.08,
        0.09,  0.1 ,  0.11,  0.12,  0.13,  0.14,  0.15,  0.16,  0.17,
        0.18,  0.19,  0.2 ,  0.21,  0.22,  0.23,  0.24,  0.25,  0.26,
        0.27,  0.28,  0.29,  0.3 ,  0.31,  0.32,  0.33,  0.34,  0.35,
        0.36,  0.37,  0.38,  0.39,  0.4 ,  0.41,  0.42,  0.43,  0.44,
        0.45,  0.46,  0.47,  0.48,  0.49,  0.5 ,  0.51,  0.52,  0.53,
        0.54,  0.55,  0.56,  0.57,  0.58,  0.59,  0.6 ,  0.61,  0.62,
        0.63,  0.64,  0.65,  0.66,  0.67,  0.68,  0.69,  0.7 ,  0.71,
        0.72,  0.73,  0.74,  0.75,  0.76,  0.77,  0.78,  0.79,  0.8 ,
        0.81,  0.82,  0.83,  0.84,  0.85,  0.86,  0.87,  0.88,  0.89,
        0.9 ,  0.91,  0.92,  0.93,  0.94,  0.95,  0.96,  0.97,  0.98,
        0.99,  1.  ])

If you prefer the stepsize



In [4]:

    
np.arange(0, 1, 0.01)









    Out[4]:





array([ 0.  ,  0.01,  0.02,  0.03,  0.04,  0.05,  0.06,  0.07,  0.08,
        0.09,  0.1 ,  0.11,  0.12,  0.13,  0.14,  0.15,  0.16,  0.17,
        0.18,  0.19,  0.2 ,  0.21,  0.22,  0.23,  0.24,  0.25,  0.26,
        0.27,  0.28,  0.29,  0.3 ,  0.31,  0.32,  0.33,  0.34,  0.35,
        0.36,  0.37,  0.38,  0.39,  0.4 ,  0.41,  0.42,  0.43,  0.44,
        0.45,  0.46,  0.47,  0.48,  0.49,  0.5 ,  0.51,  0.52,  0.53,
        0.54,  0.55,  0.56,  0.57,  0.58,  0.59,  0.6 ,  0.61,  0.62,
        0.63,  0.64,  0.65,  0.66,  0.67,  0.68,  0.69,  0.7 ,  0.71,
        0.72,  0.73,  0.74,  0.75,  0.76,  0.77,  0.78,  0.79,  0.8 ,
        0.81,  0.82,  0.83,  0.84,  0.85,  0.86,  0.87,  0.88,  0.89,
        0.9 ,  0.91,  0.92,  0.93,  0.94,  0.95,  0.96,  0.97,  0.98,  0.99])

Many more ways:



In [ ]:

    
np.ones(10)
np.zeros(10)
np.loadtxt('filename.txt', ...)
np.genfromtxt('filename.txt', ...)

...

Shape



In [6]:

    
ary.shape









    Out[6]:





(10,)



In [7]:

    
ary = np.array([[0, 1., 2., 3., 4., 5], [5, 6, 7, 8, 9, 10], [11, 12, 13, 14, 15, 16]], dtype='float')
ary









    Out[7]:





array([[  0.,   1.,   2.,   3.,   4.,   5.],
       [  5.,   6.,   7.,   8.,   9.,  10.],
       [ 11.,  12.,  13.,  14.,  15.,  16.]])



In [8]:

    
ary.shape









    Out[8]:





(3, 6)

No problem to change the shape



In [9]:

    
ary.shape = (6, 3)
ary









    Out[9]:





array([[  0.,   1.,   2.],
       [  3.,   4.,   5.],
       [  5.,   6.,   7.],
       [  8.,   9.,  10.],
       [ 11.,  12.,  13.],
       [ 14.,  15.,  16.]])

Number fo all elements in array



In [10]:

    
ary.size









    Out[10]:





18

Standart Indexing and Slicing

You can do all the nice indexing and slicing.



In [11]:

    
ary = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])



In [12]:

    
ary[2]









    Out[12]:





2



In [13]:

    
ary[3:-3]









    Out[13]:





array([3, 4, 5, 6])



In [14]:

    
ary[:-1:2]









    Out[14]:





array([0, 2, 4, 6, 8])

More fun on higher dimension



In [15]:

    
ary = np.array([[0, 1., 2., 3., 4.], [5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19], [20, 21, 22, 23, 24]])
ary









    Out[15]:





array([[  0.,   1.,   2.,   3.,   4.],
       [  5.,   6.,   7.,   8.,   9.],
       [ 10.,  11.,  12.,  13.,  14.],
       [ 15.,  16.,  17.,  18.,  19.],
       [ 20.,  21.,  22.,  23.,  24.]])



In [16]:

    
ary[2, 2]









    Out[16]:





12.0



In [17]:

    
ary[-1, [0, 2, 4]]









    Out[17]:





array([ 20.,  22.,  24.])

Slicing the center



In [18]:

    
ary[1:-1, 1:-1]









    Out[18]:





array([[  6.,   7.,   8.],
       [ 11.,  12.,  13.],
       [ 16.,  17.,  18.]])

Slicing more advanced



In [19]:

    
ary[::2, ::2]









    Out[19]:





array([[  0.,   2.,   4.],
       [ 10.,  12.,  14.],
       [ 20.,  22.,  24.]])



In [20]:

    
ary[2]









    Out[20]:





array([ 10.,  11.,  12.,  13.,  14.])

Dtype



In [21]:

    
ary = np.array([1., 2., 3., 4., 5, 6, 7, 8, 9])
ary.dtype









    Out[21]:





dtype('float64')



In [22]:

    
ary = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
ary.dtype









    Out[22]:





dtype('int64')



In [23]:

    
ary = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='float128')
ary.dtype









    Out[23]:





dtype('float128')

Custom dtype, more on that later



In [24]:

    
dt = ([('b-field', 'int'), ('temp', 'float'), ('spectrum', 'object')])
ary = np.array([(1, 4.2, (333.,400.,600, 234. )), (2, 1.5, (12, 22., 23., 221)), (4, 0.3, (212., 21., 21., 21.))], dtype=dt)



In [25]:

    
ary









    Out[25]:





array([(1, 4.2, (333.0, 400.0, 600, 234.0)),
       (2, 1.5, (12, 22.0, 23.0, 221)), (4, 0.3, (212.0, 21.0, 21.0, 21.0))], 
      dtype=[('b-field', '<i8'), ('temp', '<f8'), ('spectrum', 'O')])



In [26]:

    
ary[0][1]









    Out[26]:





4.2000000000000002



In [27]:

    
ary[2][1]









    Out[27]:





0.29999999999999999

Memory

Number of bytes of one element



In [28]:

    
ary = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
ary.itemsize









    Out[28]:





8



In [29]:

    
ary = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='float128')
ary.itemsize









    Out[29]:





16

Bytes of all elements



In [30]:

    
ary.nbytes









    Out[30]:





144

is equal to



In [31]:

    
ary.itemsize * ary.size









    Out[31]:





144

Keep in mind that the total array object is ofcourse a little bit bigger than that



In [32]:

    
from sys import getsizeof
getsizeof(ary)









    Out[32]:





240

Elementwise Operation

Operatiors act elementwise on numpy arrays



In [33]:

    
ary = np.array([0, 1, 2, 3, 4, 5])



In [34]:

    
ary + 10









    Out[34]:





array([10, 11, 12, 13, 14, 15])



In [35]:

    
ary - 33









    Out[35]:





array([-33, -32, -31, -30, -29, -28])



In [36]:

    
ary * 3









    Out[36]:





array([ 0,  3,  6,  9, 12, 15])



In [37]:

    
ary / 2









    Out[37]:





array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5])



In [38]:

    
ary ** 2









    Out[38]:





array([ 0,  1,  4,  9, 16, 25])

Ufunc - Universal Functions

Numpy includes a couple of mathematical functions, so called universal functions.

np.sin(), np.cos(), np.tan(), ...
np.sqrt(), np.exp(), np.log(), np.log10(), ...
np.mean(), np.var(),
np.abs(), ...
np.max(), np.min(), ...
np.convolve, ...
np.random.random(), np.random.poission(), ...
... and many many more

Ufunc act elementwise on arrays



In [39]:

    
x = np.linspace(0, 2 * np.pi, 10)
x









    Out[39]:





array([ 0.        ,  0.6981317 ,  1.3962634 ,  2.0943951 ,  2.7925268 ,
        3.4906585 ,  4.1887902 ,  4.88692191,  5.58505361,  6.28318531])



In [40]:

    
np.sin(x)









    Out[40]:





array([  0.00000000e+00,   6.42787610e-01,   9.84807753e-01,
         8.66025404e-01,   3.42020143e-01,  -3.42020143e-01,
        -8.66025404e-01,  -9.84807753e-01,  -6.42787610e-01,
        -2.44929360e-16])



In [41]:

    
import matplotlib.pyplot as plt
%matplotlib inline



In [42]:

    
x = np.linspace(0, 2 * np.pi, 100)
plt.plot(x, np.sin(x))









    Out[42]:





[<matplotlib.lines.Line2D at 0x7f84e34baef0>]



In [43]:

    
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

dx = (x[:-1] - x[1:])
dy = (y[:-1] - y[1:])



In [44]:

    
plt.plot(x, y, x[1:], dy / dx)
plt.plot(x, np.cos(x), ls='', marker='o', markevery=3)









    Out[44]:





[<matplotlib.lines.Line2D at 0x7f84e345e0f0>]

To hard to remember: Use the np.diff() insted, into also works for higher



In [45]:

    
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

plt.plot(x, y)
plt.plot(x[1:], np.diff(y) / np.diff(x))









    Out[45]:





[<matplotlib.lines.Line2D at 0x7f84e3426550>]

This does not only work for the numpy universal functions.



In [2]:

    
def parabola(x, a = 1, b=0, c=0):
    return a * (x - b)**2 - c



In [47]:

    
x = np.linspace(-6, 6, 1001)
plt.plot(x, parabola(x, a=2, b=2, c=3))
plt.plot(x, parabola(x, a=2, b=-2, c=3))
plt.grid()
plt.ylim(-3, 25)









    Out[47]:





(-3, 25)

It still handles normal python types



In [48]:

    
parabola(4)









    Out[48]:





16



In [49]:

    
parabola(3.14)









    Out[49]:





9.8596

But you can not put a list, tuple, ... directly into it



In [50]:

    
parabola([1,2,3,4])









    



---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-50-9a977739e9ae> in <module>()
----> 1 parabola([1,2,3,4])

<ipython-input-46-1d09628ca04e> in parabola(x, a, b, c)
      1 def parabola(x, a = 1, b=0, c=0):
----> 2     return a * (x - b)**2 - c

TypeError: unsupported operand type(s) for -: 'list' and 'int'

But you can always transform it into a numpy array



In [51]:

    
np.exp([1,2,34])









    Out[51]:





array([  2.71828183e+00,   7.38905610e+00,   5.83461743e+14])

You can extent your function behaviour by adding a simpel line



In [7]:

    
def parabola(x, a = 1, b=0, c=0):
    
    x = np.array(x, copy=False)
    
    return a * (x - b)**2 - c

Fancy boolean or “mask” indexing



In [26]:

    
data = np.array([1, 2, 3, 4, 5])

Define a boolean mask



In [27]:

    
mask = np.array([True, True, False, True, False])

Return True items of the mask



In [29]:

    
data[mask]









    Out[29]:





array([1, 2, 4])

Why do i want to do that?



In [44]:

    
import matplotlib.pyplot as plt
%matplotlib inline

Filter parts of sin function



In [165]:

    
data = np.random.random(100) - 0.5

x = np.linspace(-2 * np.pi, 2 * np.pi)
y = np.sin(x)

plt.figure(figsize=(12, 3))
plt.plot(x, y, lw=2)

mask = y > 0
plt.plot(x[mask], y[mask], ls='', marker='o', color='red', ms=8)
plt.xlim(-2 * np.pi, 2 * np.pi)
plt.ylim(-1.1, 1.1)

mask = (y < 0) & (x > 0) 
plt.plot(x[mask], y[mask], ls='', marker='D', color='green', ms=8)
plt.xlim(-2 * np.pi, 2 * np.pi)
plt.ylim(-1.1, 1.1)









    Out[165]:





(-1.1, 1.1)

Find all the spikes in the measurment



In [164]:

    
# Create some random spikes
y = []
for i in range(1000):
    r = np.random.random()
    if  r > 0.95:
        y.append(np.random.randint(-10, 10))
    else:
        y.append(r - 0.5)

y = np.array(y)
x = np.arange(y.size)

# Plot the raw data
plt.figure(figsize=(12,3))
plt.plot(x, y)

# Find all positiv spikes
m_pos = y > np.var(y)
plt.plot(x[m_pos], y[m_pos], ls='', marker='o')

# Find all negatives spikes
m_neg = y < -np.var(y)
plt.plot(x[m_neg], y[m_neg], ls='', marker='o')

plt.ylim(-12, 12)









    Out[164]:





(-12, 12)

Speed

A few words about Speed.

Python is fast (development)
Python is slow (runtime)

You don't have to understand the following timing in detail. The message is:

Use Python
Use Numpy for calculations: fancy indexing, universal functions instead of for loops and if statements
Use Cython if can not dot it



In [53]:

    
def py_pow(data):
    for point in data:
        point ** 2
        
def np_pow(data):
    data ** 2

Time the functions using IPython %timeit magic



In [54]:

    
data = np.random.random(100000)
%timeit py_pow(data)
%timeit np_pow(data)









    



10 loops, best of 3: 37.2 ms per loop
The slowest run took 4.67 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 108 µs per loop

Lets have a more detailed view with the line_profiler. You have to install it!



In [55]:

    
%load_ext line_profiler



In [56]:

    
%lprun -f py_pow py_pow(data)



In [57]:

    
%load_ext Cython



In [58]:

    
%%cython

def cy_pow(double[:] data):
    
    cdef long i   
    for i in range(len(data)):
        data[i] ** 2



In [59]:

    
data = np.random.random(100000)
%timeit cy_pow(data)









    



1000 loops, best of 3: 1.03 ms per loop