Numpy: advanced array operations
Scipy introduction: from linear algebra to image analysis
Sympy: symbolic math
Further reading:
Most of the time in scientific computing, speed is critical. There are several reasons why Numpy is much faster than standard Python. The most important is that Numpy enforces strong typing, while Python is a dynamically typed language; this translates into Numpy using less heap space to represent data. Because array operations are the core of scientific computing, we will look at this library in greater depth.
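A minimal sketch of the heap-space difference (the exact byte counts depend on the Python build and platform, so treat them as illustrative): a list stores full-blown Python int objects, while a Numpy array stores packed fixed-size machine integers.
In [ ]:
import sys
import numpy as np
L = list(range(1000))
a = np.arange(1000)
# size of the list's pointer table plus the 1000 int objects it references
print(sys.getsizeof(L) + sum(sys.getsizeof(i) for i in L))
# size of the packed array buffer: 1000 * 8 bytes for int64
print(a.nbytes)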
In [1]:
import numpy as np
L = range(1000)
%timeit [i**2 for i in L]
a = np.arange(1000)
%timeit a**2
print(type(L[1]))
print(a.dtype)
In [3]:
import numpy as np
##Get help!
#np.lookfor('create array')
#np.array?
#np.arr*?
In [5]:
a = np.array([0, 1, 2, 3])
b = np.array([[0, 1, 2], [3, 4, 5]])
c = np.array([[[1], [2]], [[3], [4]]])
print(a)
print(b)
print(b.ndim, b.shape, len(b))
In [10]:
##Array/Matrix creation
%matplotlib inline
import matplotlib.pyplot as plt
b = np.arange(1, 9, 2)    # start, stop (exclusive), step
c = np.linspace(0, 1, 6)  # start, stop (inclusive), number of points
print(b, c)
a = np.ones((3, 3))
b = np.zeros((2, 2))
c = np.eye(3)
print(a, b, c)
d = np.diag(np.array([1, 2, 3, 4]))
a = np.random.rand(4)  # uniform in [0, 1]
b = np.random.randn(4) # Gaussian
print(a, b, d)
#np.random.seed(1234)
image = np.random.rand(30, 30)
plt.imshow(image, cmap=plt.cm.hot)
plt.colorbar()
Out[10]:
In [14]:
# Indexing, Slicing and Selection
a = np.arange(10)
print(a[0], a[2], a[-1])
print(a[2:5], a[2:], a[:-2], a[::2], a[2::2])
a = np.diag(np.arange(3))
print(a)
print(a[1, 1])
a[2, 1] = 10  # third row, second column
print(a[1])
print(a[:, 1])
print(a[1:, 2:])
m = np.arange(6) + np.arange(0, 51, 10)[:, np.newaxis]
print(np.arange(0, 51, 10)[:, np.newaxis])
print(m)
print(m[2::2, ::2])
In [15]:
a = np.arange(10)
b = a[::2]
b[0] = 12
print(b, a)
print(np.may_share_memory(a, b))
a = np.arange(10)
c = a[::2].copy() # force a copy
c[0] = 12
print(a, c)
print(np.may_share_memory(a, c))
In [20]:
a = np.random.randint(0, 21, 15)  # random_integers is deprecated; randint's upper bound is exclusive
print(a)
print(a % 3 == 0)
print(a[a % 3 == 0])
a[a % 3 == 0] = -1
print(a)
a = np.arange(0, 100, 10)
print(a[[2, 3, 2, 4, 2]])
a[[9, 7]] = -100
print(a)
a = np.arange(0,100,10)
idx = np.array([[3,4],[9,7]])
print(a[idx])
In [21]:
m = np.arange(6) + np.arange(0, 51, 10)[:, np.newaxis]
print(m)
print(m[(0, 1, 2, 3, 4), (1, 2, 3, 4, 5)])
print(m[[2, 5], [1, 3]])
print(m[3:, [1, 3]])
In [6]:
def get_primes():
    # sieve of Eratosthenes on a boolean mask
    primes = np.ones((100,), dtype=bool)
    primes[:2] = 0
    N_max = int(np.sqrt(len(primes)))
    for j in range(2, N_max):
        primes[2*j::j] = 0
    return primes
print(get_primes())
In [24]:
a = np.triu(np.ones((3, 3)), 1)
print(a)
print(a.T)
In [40]:
#Array operations
a = np.array([1, 2, 3, 4])
print(a + 1, 2**a)
b = np.ones(4) + 1
print(a - b, a * b)
j = np.arange(5)
print(2**(j + 1) - j)
c = np.ones((3, 3))
print(2*c + 1, c * c, c + c)  # elementwise operations
print(c.dot(c))               # matrix multiplication!
a = np.arange(5)
print(np.sin(a), np.log(a), np.exp(a))  # log(0) gives -inf with a warning
a = np.triu(np.ones((3, 3)), 1)
print(a, a.T)
In [ ]:
##Reductions
x = np.array([1, 2, 3, 4])
print(np.sum(x), x.sum(), x.sum(axis=0))
x = np.array([[1, 1], [2, 2]])
print(x.sum(axis=0), x[:, 0].sum())  # columns (first dimension)
print(x.sum(axis=1), x[0, :].sum())  # rows (second dimension)
x = np.random.rand(2, 2, 2)
print(x, x.sum(axis=2)[0, 1], x[0, 1, :].sum())
x = np.array([1, 3, 2])
print(x.min(), x.max(), x.argmin(), x.argmax())
print(np.all([True, True, False]), np.any([True, True, False]))
a = np.zeros((100, 100))
print(np.any(a != 0), np.all(a == a))
a = np.array([1, 2, 3, 2])
b = np.array([2, 2, 3, 2])
c = np.array([6, 4, 4, 5])
print(((a <= b) & (b <= c)).all())
x = np.array([1, 2, 3, 1])
y = np.array([[1, 2, 3], [5, 6, 1]])
print(x.mean(), x.std(), np.median(x), np.median(y, axis=-1))
In [43]:
##Broadcasting
a = np.tile(np.arange(0, 40, 10), (3, 1)).T
b = np.array([0, 1, 2])
print(a + b)
a = np.arange(0, 40, 10)
print(a.shape)
a = a[:, np.newaxis]  # adds a new axis -> 2D array
print(a.shape)
print(a, b, a + b)  # (4, 1) + (3,) broadcasts to (4, 3)
a = np.ones((4, 5))
a[0] = 2  # the scalar 2 is broadcast over the whole first row
print(a)
In [27]:
x, y = np.ogrid[0:5, 0:5]
print(x)  # ogrid uses slicing syntax and returns "open" (broadcastable) grids
print(y)
x, y = np.mgrid[0:5, 0:5]
print(x)  # mgrid returns dense grids
print(y)
In [29]:
##Shape manipulations
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.ravel(), a.T.ravel())
b = a.ravel()
b = b.reshape((2, 3))  # reshape may return a view or a copy!
print(b)
z = np.array([1, 2, 3])
print(z, z[:, np.newaxis], z[np.newaxis, :])
a = np.arange(4*3*2).reshape(4, 3, 2)
print(a.shape, a[0, 2, 1])
b = a.transpose(1, 2, 0)
print(b.shape, b[2, 1, 0])
a = np.arange(4)
a.resize((8,))  # resizes in place (and returns None), padding with zeros
print(a)
In [30]:
##Sorting
a = np.array([[4, 3, 5], [1, 2, 1]])
b = np.sort(a, axis=1)
print(b)
a.sort(axis=1)  # sorts in place and returns None
print(a)
a = np.array([4, 3, 1, 2])
j = np.argsort(a)
print(j, a[j])
a = np.array([4, 3, 1, 2])
print(np.argmax(a), np.argmin(a))
Scipy is Python's favorite library for scientific computing. Some of its functionality overlaps with Numpy and Scikit-learn, but in general Scipy modules are viewed as the equivalent of Matlab's standard toolboxes. It is meant to be used in conjunction with the rest of the so-called "Scipy stack", so, for example, it is best to handle array functionality with Numpy and machine learning with Scikit-learn. We will quickly go through several of the more common uses of Scipy, while keeping in mind that we are barely scratching the surface of actual scientific computing. http://www.c3se.chalmers.se/common/python_course_2012/Lecture5_SciPy_2012.pdf
There are three main uses of this module: solving linear equations, solving eigenvalue problems, and computing matrix factorizations.
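The first two uses take one call each; below is a minimal sketch on a small hypothetical system (the matrices are made up for illustration) before we turn to factorizations.
In [ ]:
import numpy as np
from scipy import linalg
A = np.array([[3., 2.], [1., 4.]])
b = np.array([7., 10.])
x = linalg.solve(A, b)   # solves the linear system A x = b
print(x, np.allclose(A.dot(x), b))
w, v = linalg.eig(A)     # eigenvalues w, eigenvectors as columns of v
print(w)
print(np.allclose(A.dot(v[:, 0]), w[0] * v[:, 0]))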
Two of the most common techniques for data summarization are the Singular Value Decomposition (SVD) and Principal Component Analysis (PCA). We will solve an SVD task here, and a PCA task later in the machine learning section. Keep in mind that many machine learning algorithms in scikit-learn are wrappers around the basic functions in this scipy module. From the linear algebra perspective, PCA is a matrix decomposition method; several other such methods are available in the linalg module.
If X is a matrix with each variable in a column and each observation in a row, then the SVD is $$X = U S V^T$$ where the columns of U are orthogonal (left singular vectors), the columns of V are orthogonal (right singular vectors), and S is a diagonal matrix whose main diagonal holds the singular values s. Note that scipy.linalg.svd returns $V^T$ directly.
In the example below, most of the variation in the dataset is explained by the first two singular values, corresponding to the first two singular vectors.
Obs:
scipy.linalg.orth(A) - uses SVD to find an orthonormal basis for the range of A.
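A quick sketch of that observation on a hypothetical rank-deficient matrix:
In [ ]:
import numpy as np
from scipy import linalg
A = np.array([[1., 2., 3.], [2., 4., 6.], [1., 0., 1.]])  # rank 2: row 2 is twice row 1
Q = linalg.orth(A)
print(Q.shape)                              # (3, 2): one column per dimension of the range
print(np.allclose(Q.T.dot(Q), np.eye(2)))   # the columns are orthonormal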
In [33]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy import linalg
from sklearn.datasets import load_iris
iris = load_iris()
print(iris.feature_names, iris.target_names)
print(iris.DESCR)
#print(iris.data)
A = iris.data
U, s, V = linalg.svd(A)  # note: V is returned already transposed (rows are right singular vectors)
print(U.shape, V.shape, s.shape)
print(s)
S = linalg.diagsvd(s, 150, 4)  # rebuild the 150 x 4 diagonal matrix from s
At = np.dot(U, np.dot(S, V))
print(np.allclose(A, At))      # the factorization reconstructs A
fig, axes = plt.subplots(1, 2, figsize=(15, 5))
ax1, ax2 = axes
colors = np.array(['blue', 'red', 'black'])
labels = np.array(['setosa', 'versicolor', 'virginica'])
for i in range(3):
    ax1.scatter(U[iris.target == i, 0], U[iris.target == i, 1],
                color=colors[i], label=labels[i])
ax1.set_xlabel("First singular vector")
ax1.set_ylabel("Second singular vector")
ax1.legend()
for i in range(3):
    ax2.scatter(A[iris.target == i, 0], A[iris.target == i, 1],
                color=colors[i], label=labels[i])
ax2.set_xlabel("First feature vector")
ax2.set_ylabel("Second feature vector")
ax2.legend()
In [15]:
#np.allclose?
Signal processing is useful for interpreting the data of many measuring instruments. We will work through a simple example; for those who want to learn more applications of Python for signal processing, I recommend a couple of online IPython courses:
http://nbviewer.ipython.org/github/calebmadrigal/FourierTalkOSCON/tree/master/: Basic entry level course focused only on FFT
http://nbviewer.ipython.org/github/unpingco/Python-for-Signal-Processing/tree/master/: A more complex and complete course of signal processing with Python.
A small example would be a noisy signal whose frequency is unknown to the observer, who only knows the sampling time step. The signal is supposed to come from a real function so the Fourier transform will be symmetric. The scipy.fftpack.fftfreq() function will generate the sampling frequencies and scipy.fftpack.fft() will compute the fast Fourier transform:
In [17]:
%matplotlib inline
import numpy as np
from scipy import fftpack
import matplotlib.pyplot as plt
time_step = 0.22
period = 5.
time_vec = np.arange(0, 20, time_step)
sig = np.sin(2 * np.pi / period * time_vec) + 0.5 * np.random.randn(time_vec.size)
#print(sig.size)
sample_freq = fftpack.fftfreq(sig.size, d=time_step)
sig_fft = fftpack.fft(sig)
pidxs = np.where(sample_freq > 0)
freqs, power = sample_freq[pidxs], np.abs(sig_fft)[pidxs]
freq = freqs[power.argmax()]
print("Determined frequency:",freq)
sig_fft[np.abs(sample_freq) > freq] = 0
main_sig = fftpack.ifft(sig_fft)#Discrete inverse Fourier transform
fig = plt.figure()
ax1 = fig.add_subplot(311)
ax1.plot(time_vec,sig)
ax1.set_title('Signal')
ax2 = fig.add_subplot(312)
ax2.plot(freqs, power)
ax2.set_xlabel('Frequency [Hz]')
ax2.set_ylabel('power')
ax2.set_title('Peak frequency')
ax3 = fig.add_subplot(313)
ax3.plot(time_vec,main_sig)
ax3.set_title('Cleaned signal')
Out[17]:
In the statistics chapter we already used this package for line fitting: we estimated the parameters of a function by performing an error minimization. The complexity of an optimization problem depends on several factors, such as whether the function is convex, whether its derivatives are available, and whether the variables are constrained.
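As a reminder of that error-minimization view, a minimal least-squares sketch on synthetic data (the line and noise level here are made up for illustration):
In [ ]:
import numpy as np
from scipy import optimize
def line(x, a, b):  # hypothetical model function
    return a * x + b
x = np.linspace(0, 10, 50)
y = 3 * x + 1 + np.random.randn(50)           # noisy samples of a known line
params, cov = optimize.curve_fit(line, x, y)  # least-squares parameter estimates
print(params)                                 # should be close to (3, 1)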
To exemplify, we use Newton's method (the Newton conjugate gradient optimizer, scipy.optimize.fmin_ncg) to find the minimum of a nonlinear function.
In [35]:
import numpy as np
import scipy.optimize as optimize
def f(x):  # a variant of the Rosenbrock function
    return .5*(1 - x[0])**2 + (x[1] - x[0]**2)**2
def fprime(x):  # analytical gradient
    return np.array((-2*.5*(1 - x[0]) - 4*x[0]*(x[1] - x[0]**2),
                     2*(x[1] - x[0]**2)))
print(optimize.fmin_ncg(f, [2, 2], fprime=fprime))
def hessian(x):  # computed with sympy (see the symbolic math section below)
    return np.array(((1 - 4*x[1] + 12*x[0]**2, -4*x[0]), (-4*x[0], 2)))
print(optimize.fmin_ncg(f, [2, 2], fprime=fprime, fhess=hessian))
%matplotlib inline
from matplotlib import cm
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
X = np.arange(-5, 5, 0.25)
Y = np.arange(-5, 5, 0.25)
X, Y = np.meshgrid(X, Y)
Z = .5*(1 - X)**2 + (Y - X**2)**2
surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.coolwarm,
linewidth=0, antialiased=False)
ax.set_zlim(-1000.01, 1000.01)
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.show()
Interpolation is useful when we have sampled a function but want to approximate its values at different points. A well-known class of interpolation functions are the splines: most commonly, piecewise cubic polynomials are joined smoothly at the data points, producing a smooth curved line between them.
In [13]:
%matplotlib inline
import numpy as np
from scipy.interpolate import interp1d
import pylab as pl
measured_time = np.linspace(0, 1, 10)
noise = 0.1 * np.random.randn(10)
measures = np.sin(2 * np.pi * measured_time) + noise
linear_interp = interp1d(measured_time, measures)
computed_time = np.linspace(0, 1, 50)
linear_results = linear_interp(computed_time)
cubic_interp = interp1d(measured_time, measures, kind='cubic')
cubic_results = cubic_interp(computed_time)
pl.plot(measured_time, measures, 'o', ms=6, label='measures')
pl.plot(computed_time, linear_results, label='linear interp')
pl.plot(computed_time, cubic_results, label='cubic interp')
pl.legend()
Out[13]:
This submodule is useful for summing up function values over intervals (integration) and for solving ordinary differential equations. Partial differential equations are not covered and require other Python packages. As a quick example of the latter, we solve a case of Michaelis-Menten enzyme kinetics below.
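Numerical quadrature itself is a one-liner; a minimal sketch before the kinetics example:
In [ ]:
import numpy as np
from scipy import integrate
# definite integral of sin(x) over [0, pi]; quad returns (value, error estimate)
value, error = integrate.quad(np.sin, 0, np.pi)
print(value, error)  # value should be close to 2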
In [36]:
%matplotlib inline
import numpy as np
import scipy.integrate as integrate
'''
Slightly modified from sample code generated by kinpy, a program
that formulates solvers for different cases of enzyme reactions:
http://code.google.com/p/kinpy/

## Reaction ##
#Michaelis-Menten enzyme kinetics.
E + S <-> ES
ES <-> E + P

## Mapping ##
E  0  -1*v_0(y[1], y[0], y[2]) +1*v_1(y[3], y[0], y[2])
S  1  -1*v_0(y[1], y[0], y[2])
ES 2  +1*v_0(y[1], y[0], y[2]) -1*v_1(y[3], y[0], y[2])
P  3  +1*v_1(y[3], y[0], y[2])
'''
# Right-hand side of the ODE system: d[E, S, ES, P]/dt
dy = lambda y, t: np.array([
    -1*v_0(y[1], y[0], y[2]) + 1*v_1(y[3], y[0], y[2]),
    -1*v_0(y[1], y[0], y[2]),
    +1*v_0(y[1], y[0], y[2]) - 1*v_1(y[3], y[0], y[2]),
    +1*v_1(y[3], y[0], y[2]),
])
# Initial concentrations of E, S, ES, P:
y0 = np.array([0.6, 1.2, 3.0, 0.2])
# E + S <-> ES
v_0 = lambda S, E, ES: k0 * E * S - k0r * ES
k0 = 1.2
k0r = 1.5
# ES <-> E + P
v_1 = lambda P, E, ES: k1 * ES - k1r * E * P
k1 = 0.9
k1r = 1.9
t = np.arange(0, 10, 0.01)
Y = integrate.odeint(dy, y0, t)  # integrate the system over t
import pylab as pl
pl.plot(t, Y)
pl.legend(['E', 'S', 'ES', 'P'])
Out[36]:
This module contains functions for multidimensional image manipulation, but more image processing functionality is developed in scikit-image, and there are also independent packages for image manipulation. My favourite is PIL, but it falls outside the scope of this course.
In [18]:
%matplotlib inline
import numpy as np
from scipy import ndimage
from scipy import misc
import matplotlib.pyplot as plt
from matplotlib import cm
import pylab as pl
lena = misc.lena()
shifted_lena = ndimage.shift(lena, (50, 50))
shifted_lena2 = ndimage.shift(lena, (50, 50), mode='nearest')
rotated_lena = ndimage.rotate(lena, 30)
cropped_lena = lena[50:-50, 50:-50]
zoomed_lena = ndimage.zoom(lena, 2)
print(zoomed_lena.shape)
noisy_lena = np.copy(lena).astype(float)
noisy_lena += lena.std()*0.5*np.random.standard_normal(lena.shape)
blurred_lena = ndimage.gaussian_filter(noisy_lena, sigma=3)
median_lena = ndimage.median_filter(blurred_lena, size=5)
from scipy import signal
wiener_lena = signal.wiener(blurred_lena, (5,5))
fig = plt.figure()
ax1 = fig.add_subplot(111)
pl.imshow(blurred_lena, cmap=cm.gray)
Symbolic math is sometimes important, especially if we are weak at calculus or if we need to perform automated calculus on long formulas. We will briefly go through a few test cases, to get a feel for it. Symbolic math is most developed in dedicated systems such as Mathematica, or Sage, an open-source equivalent.
In [2]:
import sympy
print(sympy.sqrt(8))  # symbolic result: 2*sqrt(2)
import math
print(math.sqrt(8))   # floating-point result
In [11]:
from sympy import symbols
x, y, z, t = symbols('x y z t')
expr = x + 2*y
print(expr)
print(x * expr)
from sympy import expand, factor, simplify
expanded_expr = expand(x*expr)
print(expanded_expr)
print(factor(expanded_expr))
subs_expr = expanded_expr.subs(x, z**t)  # substitute x by z**t
print(subs_expr)
print(simplify(subs_expr))
In the scipy.optimize paragraph we needed the Hessian matrix for a function f. Here is how you can obtain it in sympy:
In [21]:
import sympy
x, y = sympy.symbols('x y')
f = .5*(1 - x)**2 + (y - x**2)**2
h = sympy.hessian(f, [x,y])
print(h)
from IPython.display import Latex
Latex('$$%s$$' % sympy.latex(h))
Out[21]:
In [22]:
from IPython.display import HTML
HTML('<iframe src=http://en.wikipedia.org/wiki/Hessian_matrix width=700 height=350></iframe>')
Out[22]:
In [ ]: