Sometimes computing the likelihood is not as fast as we would like. Theano provides handy profiling tools; pymc3 wraps them in model.profile, which returns a ProfileStats object holding the timing information. Here we'll profile the likelihood and gradient for the stochastic volatility example.
First we build the model.
In [1]:
import numpy as np
from pymc3 import *
from pymc3.math import exp
from pymc3.distributions.timeseries import *
n = 400
# Use the last 400 S&P 500 returns as the observed data.
returns = np.genfromtxt(get_data_file('pymc3.examples', "data/SP500.csv"))[-n:]

with Model() as model:
    # Stochastic volatility model: the latent log-volatility s follows a
    # Gaussian random walk, and returns are Student-T with precision exp(-2 * s).
    sigma = Exponential('sigma', 1. / .02, testval=.1)
    nu = Exponential('nu', 1. / 10)
    s = GaussianRandomWalk('s', sigma ** -2, shape=n)
    r = StudentT('r', nu, lam=exp(-2 * s), observed=returns)
WARNING (theano.configdefaults): Only clang++ is supported. With g++, we end up with strange g++/OSX bugs.
Applied log-transform to sigma and added transformed sigma_log_ to model.
Applied log-transform to nu and added transformed nu_log_ to model.
Then we call profile on the model's joint log-probability and summarize the result.
In [2]:
model.profile(model.logpt).summary()
Function profiling
==================
Message: /Users/colin/projects/pymc3/pymc3/model.py:349
Time in 1000 calls to Function.__call__: 4.690456e-02s
Time in Function.fn.__call__: 2.515221e-02s (53.624%)
Time in thunks: 1.793075e-02s (38.228%)
Total compile time: 5.913320e-01s
Number of Apply nodes: 26
Theano Optimizer time: 4.063430e-01s
Theano validate time: 1.725912e-03s
Theano Linker time (includes C, CUDA code generation/compiling): 1.008379e-01s
Import time 7.905388e-02s
Node make_thunk time 9.964800e-02s
Time in all call to theano.grad() 3.328490e-02s
Time since theano import 41.720s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
82.9% 82.9% 0.015s 1.24e-06s C 12000 12 theano.tensor.elemwise.Elemwise
5.6% 88.5% 0.001s 5.03e-07s C 2000 2 theano.tensor.elemwise.Sum
3.8% 92.3% 0.001s 1.72e-07s C 4000 4 theano.tensor.basic.Reshape
3.3% 95.6% 0.001s 2.94e-07s C 2000 2 theano.tensor.subtensor.Subtensor
2.6% 98.2% 0.000s 2.29e-07s C 2000 2 theano.tensor.elemwise.DimShuffle
1.1% 99.2% 0.000s 9.44e-08s C 2000 2 theano.compile.ops.ViewOp
0.8% 100.0% 0.000s 7.09e-08s C 2000 2 theano.compile.ops.Rebroadcast
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
65.1% 65.1% 0.012s 1.17e-05s C 1000 1 Elemwise{Composite{Switch(Identity((GT(Composite{exp((i0 * i1))}(i0, i1), i2) * i3 * GT(inv(sqrt(Composite{exp((i0 * i1))}(i0, i1))), i2))), (((i4 + (i5 * log(((i6 * Composite{exp((i0 * i1))}(i0, i1)) / i7)))) - i8) - (i5 * i9 * log1p(((Composite{exp((i0 * i1))}(i0, i1) * i10) / i7)))), i11)}}
5.6% 70.7% 0.001s 5.03e-07s C 2000 2 Sum{acc_dtype=float64}
4.8% 75.5% 0.001s 8.57e-07s C 1000 1 Elemwise{Composite{Switch(i0, (i1 * ((-(i2 * sqr((i3 - i4)))) + i5)), i6)}}
2.7% 78.2% 0.000s 2.43e-07s C 2000 2 Elemwise{exp,no_inplace}
2.6% 80.8% 0.000s 2.29e-07s C 2000 2 InplaceDimShuffle{x}
2.5% 83.3% 0.000s 2.24e-07s C 2000 2 Elemwise{Composite{scalar_gammaln((i0 * i1))}}
2.1% 85.3% 0.000s 1.85e-07s C 2000 2 Reshape{1}
1.9% 87.3% 0.000s 3.45e-07s C 1000 1 Elemwise{Composite{(Switch(Identity(GT(i0, i1)), (i2 - (i3 * i0)), i4) + log(Abs(exp(i5))) + Switch(Identity(GT(i6, i1)), (i7 - (i8 * i6)), i4) + log(Abs(exp(i9))) + i10 + i11)}}[(0, 0)]
1.8% 89.0% 0.000s 1.59e-07s C 2000 2 Reshape{0}
1.7% 90.7% 0.000s 3.03e-07s C 1000 1 Subtensor{int64::}
1.6% 92.3% 0.000s 2.85e-07s C 1000 1 Subtensor{:int64:}
1.3% 93.6% 0.000s 2.40e-07s C 1000 1 Elemwise{Composite{log((i0 * i1))}}
1.3% 94.9% 0.000s 2.25e-07s C 1000 1 Elemwise{Composite{Identity((GT(i0, i1) * GT(inv(sqrt(i0)), i1)))}}
1.2% 96.1% 0.000s 2.17e-07s C 1000 1 Elemwise{add,no_inplace}
1.1% 97.2% 0.000s 1.90e-07s C 1000 1 Elemwise{gt,no_inplace}
1.1% 98.2% 0.000s 9.44e-08s C 2000 2 ViewOp
1.0% 99.2% 0.000s 1.77e-07s C 1000 1 Elemwise{Composite{inv(sqr(i0))}}
0.8% 100.0% 0.000s 7.09e-08s C 2000 2 Rebroadcast{1}
... (remaining 0 Ops account for 0.00%(0.00s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
65.1% 65.1% 0.012s 1.17e-05s 1000 22 Elemwise{Composite{Switch(Identity((GT(Composite{exp((i0 * i1))}(i0, i1), i2) * i3 * GT(inv(sqrt(Composite{exp((i0 * i1))}(i0, i1))), i2))), (((i4 + (i5 * log(((i6 * Composite{exp((i0 * i1))}(i0, i1)) / i7)))) - i8) - (i5 * i9 * log1p(((Composite{exp((i0 * i1))}(i0, i1) * i10) / i7)))), i11)}}(TensorConstant{(1,) of -2.0}, s, TensorConstant{(1,) of 0}, Elemwise{gt,no_inplace}.0, Elemwise{Composite{scalar_gammaln((i0 * i1))}}.0, TensorConstant{(1,) o
4.8% 69.9% 0.001s 8.57e-07s 1000 21 Elemwise{Composite{Switch(i0, (i1 * ((-(i2 * sqr((i3 - i4)))) + i5)), i6)}}(Elemwise{Composite{Identity((GT(i0, i1) * GT(inv(sqrt(i0)), i1)))}}.0, TensorConstant{(1,) of 0.5}, Elemwise{Composite{inv(sqr(i0))}}.0, Subtensor{int64::}.0, Subtensor{:int64:}.0, Elemwise{Composite{log((i0 * i1))}}.0, TensorConstant{(1,) of -inf})
3.0% 72.9% 0.001s 5.36e-07s 1000 23 Sum{acc_dtype=float64}(Elemwise{Composite{Switch(i0, (i1 * ((-(i2 * sqr((i3 - i4)))) + i5)), i6)}}.0)
2.6% 75.5% 0.000s 4.71e-07s 1000 24 Sum{acc_dtype=float64}(Elemwise{Composite{Switch(Identity((GT(Composite{exp((i0 * i1))}(i0, i1), i2) * i3 * GT(inv(sqrt(Composite{exp((i0 * i1))}(i0, i1))), i2))), (((i4 + (i5 * log(((i6 * Composite{exp((i0 * i1))}(i0, i1)) / i7)))) - i8) - (i5 * i9 * log1p(((Composite{exp((i0 * i1))}(i0, i1) * i10) / i7)))), i11)}}.0)
1.9% 77.4% 0.000s 3.45e-07s 1000 25 Elemwise{Composite{(Switch(Identity(GT(i0, i1)), (i2 - (i3 * i0)), i4) + log(Abs(exp(i5))) + Switch(Identity(GT(i6, i1)), (i7 - (i8 * i6)), i4) + log(Abs(exp(i9))) + i10 + i11)}}[(0, 0)](Elemwise{exp,no_inplace}.0, TensorConstant{0}, TensorConstant{3.9120230674743652}, TensorConstant{50.0}, TensorConstant{-inf}, Reshape{0}.0, Elemwise{exp,no_inplace}.0, TensorConstant{-2.3025850929940455}, TensorConstant{0.1}, Reshape{0}.0, Sum{acc_dtype=float64}.0,
1.7% 79.1% 0.000s 3.03e-07s 1000 3 Subtensor{int64::}(s, Constant{1})
1.6% 80.8% 0.000s 2.93e-07s 1000 20 Elemwise{Composite{scalar_gammaln((i0 * i1))}}(TensorConstant{(1,) of 0.5}, Elemwise{add,no_inplace}.0)
1.6% 82.4% 0.000s 2.85e-07s 1000 2 Subtensor{:int64:}(s, Constant{-1})
1.5% 83.8% 0.000s 2.61e-07s 1000 0 Elemwise{exp,no_inplace}(sigma_log_)
1.4% 85.2% 0.000s 2.50e-07s 1000 10 InplaceDimShuffle{x}(sigma)
1.3% 86.5% 0.000s 2.40e-07s 1000 18 Elemwise{Composite{log((i0 * i1))}}(TensorConstant{(1,) of 0...9154943092}, Elemwise{Composite{inv(sqr(i0))}}.0)
1.3% 87.9% 0.000s 2.37e-07s 1000 5 Reshape{1}(sigma_log_, TensorConstant{(1,) of -1})
1.3% 89.1% 0.000s 2.25e-07s 1000 19 Elemwise{Composite{Identity((GT(i0, i1) * GT(inv(sqrt(i0)), i1)))}}(Elemwise{Composite{inv(sqr(i0))}}.0, TensorConstant{(1,) of 0})
1.2% 90.4% 0.000s 2.24e-07s 1000 1 Elemwise{exp,no_inplace}(nu_log_)
1.2% 91.6% 0.000s 2.17e-07s 1000 15 Elemwise{add,no_inplace}(TensorConstant{(1,) of 1.0}, InplaceDimShuffle{x}.0)
1.2% 92.7% 0.000s 2.10e-07s 1000 13 Reshape{0}(Rebroadcast{1}.0, TensorConstant{[]})
1.2% 93.9% 0.000s 2.08e-07s 1000 11 InplaceDimShuffle{x}(nu)
1.1% 95.0% 0.000s 1.90e-07s 1000 17 Elemwise{gt,no_inplace}(InplaceDimShuffle{x}.0, TensorConstant{(1,) of 0})
1.0% 96.0% 0.000s 1.77e-07s 1000 14 Elemwise{Composite{inv(sqr(i0))}}(InplaceDimShuffle{x}.0)
0.9% 96.8% 0.000s 1.54e-07s 1000 16 Elemwise{Composite{scalar_gammaln((i0 * i1))}}(TensorConstant{(1,) of 0.5}, InplaceDimShuffle{x}.0)
... (remaining 6 Apply instances account for 3.19%(0.00s) of the runtime)
Here are tips to potentially make your code run faster
(if you think of new ones, suggest them on the mailing list).
Test them first, as they are not guaranteed to always provide a speedup.
- Try the Theano flag floatX=float32
We don't know if amdlibm will accelerate this scalar op. scalar_gammaln
We don't know if amdlibm will accelerate this scalar op. scalar_gammaln
- Try installing amdlibm and set the Theano flag lib.amdlibm=True. This speeds up only some Elemwise operation.
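The report shows that a single fused Elemwise node, the one implementing the Student-T log-density, accounts for about 65% of the time spent in thunks. Gradient-based samplers such as NUTS and HMC evaluate the gradient of the log-probability at every step, so it is worth profiling it as well.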
In [3]:
model.profile(gradient(model.logpt, model.vars)).summary()
Function profiling
==================
Message: /Users/colin/projects/pymc3/pymc3/model.py:349
Time in 1000 calls to Function.__call__: 8.100653e-02s
Time in Function.fn.__call__: 5.562019e-02s (68.661%)
Time in thunks: 3.688788e-02s (45.537%)
Total compile time: 9.140079e-01s
Number of Apply nodes: 63
Theano Optimizer time: 5.787621e-01s
Theano validate time: 8.579969e-03s
Theano Linker time (includes C, CUDA code generation/compiling): 2.355778e-01s
Import time 1.562703e-01s
Node make_thunk time 2.326310e-01s
Time in all call to theano.grad() 1.042541e+00s
Time since theano import 43.896s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
59.3% 59.3% 0.022s 8.41e-07s C 26000 26 theano.tensor.elemwise.Elemwise
9.8% 69.1% 0.004s 5.19e-07s C 7000 7 theano.tensor.elemwise.Sum
8.8% 77.9% 0.003s 1.62e-06s C 2000 2 theano.tensor.subtensor.IncSubtensor
5.9% 83.8% 0.002s 2.17e-07s C 10000 10 theano.tensor.basic.Reshape
5.5% 89.3% 0.002s 2.04e-06s C 1000 1 theano.tensor.basic.Join
3.2% 92.5% 0.001s 1.17e-06s C 1000 1 theano.tensor.basic.Alloc
2.4% 94.8% 0.001s 1.77e-07s C 5000 5 theano.tensor.elemwise.DimShuffle
2.0% 96.9% 0.001s 3.70e-07s C 2000 2 theano.tensor.subtensor.Subtensor
2.0% 98.8% 0.001s 1.21e-07s C 6000 6 theano.compile.ops.Rebroadcast
0.6% 99.4% 0.000s 2.25e-07s C 1000 1 theano.compile.ops.Shape_i
0.6% 100.0% 0.000s 1.05e-07s C 2000 2 theano.compile.ops.ViewOp
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
16.9% 16.9% 0.006s 6.25e-06s C 1000 1 Elemwise{Composite{Switch(i0, (-log1p((i1 / i2))), i3)}}
9.8% 26.8% 0.004s 5.19e-07s C 7000 7 Sum{acc_dtype=float64}
6.1% 32.9% 0.002s 2.26e-06s C 1000 1 Elemwise{Composite{Identity((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}
5.9% 38.8% 0.002s 2.19e-06s C 1000 1 IncSubtensor{InplaceInc;int64::}
5.6% 44.4% 0.002s 2.05e-06s C 1000 1 Elemwise{Composite{Switch(i0, ((i1 * i2 * i3 * i4) / i5), i6)}}
5.5% 49.9% 0.002s 2.04e-06s C 1000 1 Join
5.3% 55.2% 0.002s 1.95e-06s C 1000 1 Elemwise{Composite{Switch(i0, (i1 * (i2 + ((i3 * i4 * i5 * i6) / i7))), i8)}}[(0, 6)]
5.0% 60.2% 0.002s 4.60e-07s C 4000 4 Elemwise{Switch}
4.3% 64.6% 0.002s 1.60e-06s C 1000 1 Elemwise{Composite{exp((i0 * i1))}}
3.2% 67.7% 0.001s 1.17e-06s C 1000 1 Alloc
2.9% 70.7% 0.001s 2.72e-07s C 4000 4 Reshape{1}
2.9% 73.6% 0.001s 1.81e-07s C 6000 6 Reshape{0}
2.8% 76.4% 0.001s 1.04e-06s C 1000 1 IncSubtensor{InplaceInc;:int64:}
2.4% 78.8% 0.001s 1.77e-07s C 5000 5 InplaceDimShuffle{x}
1.8% 80.6% 0.001s 6.52e-07s C 1000 1 Elemwise{Composite{Switch(i0, (i1 * sqr(i2)), i3)}}
1.6% 82.2% 0.001s 5.92e-07s C 1000 1 Elemwise{Composite{Switch(i0, (i1 * i2 * i3), i4)}}
1.4% 83.6% 0.001s 5.26e-07s C 1000 1 Elemwise{Composite{(Switch(Identity(GT(i0, i1)), (i2 * i0), i1) + (i3 * i0) + (((i4 * i5 * scalar_psi((i4 * i6))) + (i7 * (i8 / i9)) + (i4 * i10 * scalar_psi((i4 * i9))) + (i4 * i11) + (i12 / i9)) * i0))}}[(0, 3)]
1.3% 84.9% 0.000s 1.18e-07s C 4000 4 Rebroadcast{1}
1.3% 86.2% 0.000s 4.67e-07s C 1000 1 Elemwise{mul,no_inplace}
1.3% 87.4% 0.000s 2.34e-07s C 2000 2 Elemwise{Composite{(sgn(exp(i0)) / Abs(exp(i0)))}}
... (remaining 15 Ops account for 12.56%(0.00s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
16.9% 16.9% 0.006s 6.25e-06s 1000 32 Elemwise{Composite{Switch(i0, (-log1p((i1 / i2))), i3)}}(Elemwise{Composite{Identity((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, Elemwise{mul,no_inplace}.0, InplaceDimShuffle{x}.0, TensorConstant{(1,) of 0})
6.1% 23.1% 0.002s 2.26e-06s 1000 26 Elemwise{Composite{Identity((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}(Elemwise{Composite{exp((i0 * i1))}}.0, TensorConstant{(1,) of 0}, Elemwise{gt,no_inplace}.0)
5.9% 29.0% 0.002s 2.19e-06s 1000 54 IncSubtensor{InplaceInc;int64::}(Elemwise{Composite{Switch(i0, (i1 * (i2 + ((i3 * i4 * i5 * i6) / i7))), i8)}}[(0, 6)].0, Elemwise{Composite{Switch(i0, (i1 * i2 * i3), i4)}}.0, Constant{1})
5.6% 34.6% 0.002s 2.05e-06s 1000 48 Elemwise{Composite{Switch(i0, ((i1 * i2 * i3 * i4) / i5), i6)}}(Elemwise{Composite{Identity((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, TensorConstant{(1,) of 0.5}, InplaceDimShuffle{x}.0, Elemwise{Composite{exp((i0 * i1))}}.0, TensorConstant{[ 4.05769..48400e-06]}, Elemwise{Add}[(0, 1)].0, TensorConstant{(1,) of 0})
5.5% 40.1% 0.002s 2.04e-06s 1000 62 Join(TensorConstant{0}, Rebroadcast{1}.0, Rebroadcast{1}.0, IncSubtensor{InplaceInc;:int64:}.0)
5.3% 45.4% 0.002s 1.95e-06s 1000 50 Elemwise{Composite{Switch(i0, (i1 * (i2 + ((i3 * i4 * i5 * i6) / i7))), i8)}}[(0, 6)](Elemwise{Composite{Identity((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, TensorConstant{(1,) of -2.0}, TensorConstant{(1,) of 0.5}, TensorConstant{(1,) of -0.5}, InplaceDimShuffle{x}.0, TensorConstant{[ 4.05769..48400e-06]}, Elemwise{Composite{exp((i0 * i1))}}.0, Elemwise{Add}[(0, 1)].0, TensorConstant{(1,) of 0})
4.3% 49.7% 0.002s 1.60e-06s 1000 5 Elemwise{Composite{exp((i0 * i1))}}(TensorConstant{(1,) of -2.0}, s)
3.2% 52.9% 0.001s 1.17e-06s 1000 41 Alloc(Elemwise{switch,no_inplace}.0, Elemwise{Composite{(i0 - Switch(LT(i1, i0), i2, i0))}}[(0, 0)].0)
2.8% 55.7% 0.001s 1.04e-06s 1000 57 IncSubtensor{InplaceInc;:int64:}(IncSubtensor{InplaceInc;int64::}.0, Elemwise{Composite{Switch(i0, (i1 * i2), i3)}}[(0, 2)].0, Constant{-1})
1.8% 57.5% 0.001s 6.52e-07s 1000 30 Elemwise{Composite{Switch(i0, (i1 * sqr(i2)), i3)}}(Elemwise{Composite{Identity((GT(i0, i1) * GT(inv(sqrt(i0)), i1)))}}.0, TensorConstant{(1,) of 0.5}, Elemwise{sub,no_inplace}.0, TensorConstant{(1,) of 0})
1.6% 59.1% 0.001s 5.99e-07s 1000 35 Elemwise{Switch}(Elemwise{Composite{Identity((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, TensorConstant{(1,) of 1.0}, TensorConstant{(1,) of 0.0})
1.6% 60.7% 0.001s 5.92e-07s 1000 29 Elemwise{Composite{Switch(i0, (i1 * i2 * i3), i4)}}(Elemwise{Composite{Identity((GT(i0, i1) * GT(inv(sqrt(i0)), i1)))}}.0, TensorConstant{(1,) of -1.0}, InplaceDimShuffle{x}.0, Elemwise{sub,no_inplace}.0, TensorConstant{(1,) of 0})
1.5% 62.2% 0.001s 5.62e-07s 1000 49 Sum{acc_dtype=float64}(Alloc.0)
1.5% 63.7% 0.001s 5.50e-07s 1000 34 Elemwise{switch,no_inplace}(Elemwise{Composite{Identity((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, TensorConstant{(1,) of -0..9154943092}, TensorConstant{(1,) of 0})
1.5% 65.2% 0.001s 5.38e-07s 1000 43 Sum{acc_dtype=float64}(Elemwise{switch,no_inplace}.0)
1.4% 66.6% 0.001s 5.26e-07s 1000 56 Elemwise{Composite{(Switch(Identity(GT(i0, i1)), (i2 * i0), i1) + (i3 * i0) + (((i4 * i5 * scalar_psi((i4 * i6))) + (i7 * (i8 / i9)) + (i4 * i10 * scalar_psi((i4 * i9))) + (i4 * i11) + (i12 / i9)) * i0))}}[(0, 3)](Elemwise{exp,no_inplace}.0, TensorConstant{0}, TensorConstant{-0.1}, Reshape{0}.0, TensorConstant{0.5}, Sum{acc_dtype=float64}.0, Elemwise{add,no_inplace}.0, TensorConstant{3.141592653589793}, Sum{acc_dtype=float64}.0, nu, Sum{acc_dtype=fl
1.4% 68.0% 0.001s 5.20e-07s 1000 42 Sum{acc_dtype=float64}(Elemwise{Composite{Switch(i0, (-log1p((i1 / i2))), i3)}}.0)
1.4% 69.4% 0.001s 5.08e-07s 1000 40 Sum{acc_dtype=float64}(Elemwise{Composite{Switch(i0, (i1 * sqr(i2)), i3)}}.0)
1.4% 70.8% 0.001s 5.07e-07s 1000 53 Sum{acc_dtype=float64}(Elemwise{Composite{Switch(i0, ((i1 * i2 * i3 * i4) / i5), i6)}}.0)
1.4% 72.1% 0.001s 5.01e-07s 1000 44 Sum{acc_dtype=float64}(Elemwise{switch,no_inplace}.0)
... (remaining 43 Apply instances account for 27.87%(0.01s) of the runtime)
Here are tips to potentially make your code run faster
(if you think of new ones, suggest them on the mailing list).
Test them first, as they are not guaranteed to always provide a speedup.
- Try the Theano flag floatX=float32
We don't know if amdlibm will accelerate this scalar op. scalar_psi
We don't know if amdlibm will accelerate this scalar op. scalar_psi
- Try installing amdlibm and set the Theano flag lib.amdlibm=True. This speeds up only some Elemwise operation.
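By default model.profile times 1000 calls. If the numbers look noisy, or you want to keep the statistics around rather than printing them immediately, the profile can be re-run with more evaluations. A minimal sketch, assuming model.profile accepts an n keyword for the number of calls (as in pymc3's Model.profile signature) and that the returned ProfileStats objects are stored for later inspection:

# Re-profile with more calls for steadier timings; keep the ProfileStats
# objects so they can be inspected later. The n keyword is an assumption
# about model.profile's signature -- check your pymc3 version.
logp_stats = model.profile(model.logpt, n=10000)
grad_stats = model.profile(gradient(model.logpt, model.vars), n=10000)
logp_stats.summary()
grad_stats.summary()

Keeping the two ProfileStats objects makes it easy to compare the likelihood and gradient functions side by side, for example before and after trying the floatX=float32 suggestion above.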