Profiling

Sometimes computing the likelihood is not as fast as we would like. Theano provides handy profiling tools, which pymc3 provides a wrapper model.profile which returns a ProfileStats object. Here we'll profile the likelihood and gradient for the stochastic volatility example.

First we build the model.


In [1]:
import numpy as np
from pymc3 import *
from pymc3.math import exp
from pymc3.distributions.timeseries import *

n = 400
returns = np.genfromtxt(get_data_file('pymc3.examples', "data/SP500.csv"))[-n:]
with Model() as model:
    sigma = Exponential('sigma', 1. / .02, testval=.1)
    nu = Exponential('nu', 1. / 10)
    s = GaussianRandomWalk('s', sigma ** -2, shape=n)
    r = StudentT('r', nu, lam=exp(-2 * s), observed=returns)


WARNING (theano.configdefaults): Only clang++ is supported. With g++, we end up with strange g++/OSX bugs.
Applied log-transform to sigma and added transformed sigma_log_ to model.
Applied log-transform to nu and added transformed nu_log_ to model.

Then call profile and summarize it.


In [2]:
model.profile(model.logpt).summary()


Function profiling
==================
  Message: /Users/colin/projects/pymc3/pymc3/model.py:349
  Time in 1000 calls to Function.__call__: 4.690456e-02s
  Time in Function.fn.__call__: 2.515221e-02s (53.624%)
  Time in thunks: 1.793075e-02s (38.228%)
  Total compile time: 5.913320e-01s
    Number of Apply nodes: 26
    Theano Optimizer time: 4.063430e-01s
       Theano validate time: 1.725912e-03s
    Theano Linker time (includes C, CUDA code generation/compiling): 1.008379e-01s
       Import time 7.905388e-02s
       Node make_thunk time 9.964800e-02s

Time in all call to theano.grad() 3.328490e-02s
Time since theano import 41.720s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  82.9%    82.9%       0.015s       1.24e-06s     C    12000      12   theano.tensor.elemwise.Elemwise
   5.6%    88.5%       0.001s       5.03e-07s     C     2000       2   theano.tensor.elemwise.Sum
   3.8%    92.3%       0.001s       1.72e-07s     C     4000       4   theano.tensor.basic.Reshape
   3.3%    95.6%       0.001s       2.94e-07s     C     2000       2   theano.tensor.subtensor.Subtensor
   2.6%    98.2%       0.000s       2.29e-07s     C     2000       2   theano.tensor.elemwise.DimShuffle
   1.1%    99.2%       0.000s       9.44e-08s     C     2000       2   theano.compile.ops.ViewOp
   0.8%   100.0%       0.000s       7.09e-08s     C     2000       2   theano.compile.ops.Rebroadcast
   ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)

Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
  65.1%    65.1%       0.012s       1.17e-05s     C     1000        1   Elemwise{Composite{Switch(Identity((GT(Composite{exp((i0 * i1))}(i0, i1), i2) * i3 * GT(inv(sqrt(Composite{exp((i0 * i1))}(i0, i1))), i2))), (((i4 + (i5 * log(((i6 * Composite{exp((i0 * i1))}(i0, i1)) / i7)))) - i8) - (i5 * i9 * log1p(((Composite{exp((i0 * i1))}(i0, i1) * i10) / i7)))), i11)}}
   5.6%    70.7%       0.001s       5.03e-07s     C     2000        2   Sum{acc_dtype=float64}
   4.8%    75.5%       0.001s       8.57e-07s     C     1000        1   Elemwise{Composite{Switch(i0, (i1 * ((-(i2 * sqr((i3 - i4)))) + i5)), i6)}}
   2.7%    78.2%       0.000s       2.43e-07s     C     2000        2   Elemwise{exp,no_inplace}
   2.6%    80.8%       0.000s       2.29e-07s     C     2000        2   InplaceDimShuffle{x}
   2.5%    83.3%       0.000s       2.24e-07s     C     2000        2   Elemwise{Composite{scalar_gammaln((i0 * i1))}}
   2.1%    85.3%       0.000s       1.85e-07s     C     2000        2   Reshape{1}
   1.9%    87.3%       0.000s       3.45e-07s     C     1000        1   Elemwise{Composite{(Switch(Identity(GT(i0, i1)), (i2 - (i3 * i0)), i4) + log(Abs(exp(i5))) + Switch(Identity(GT(i6, i1)), (i7 - (i8 * i6)), i4) + log(Abs(exp(i9))) + i10 + i11)}}[(0, 0)]
   1.8%    89.0%       0.000s       1.59e-07s     C     2000        2   Reshape{0}
   1.7%    90.7%       0.000s       3.03e-07s     C     1000        1   Subtensor{int64::}
   1.6%    92.3%       0.000s       2.85e-07s     C     1000        1   Subtensor{:int64:}
   1.3%    93.6%       0.000s       2.40e-07s     C     1000        1   Elemwise{Composite{log((i0 * i1))}}
   1.3%    94.9%       0.000s       2.25e-07s     C     1000        1   Elemwise{Composite{Identity((GT(i0, i1) * GT(inv(sqrt(i0)), i1)))}}
   1.2%    96.1%       0.000s       2.17e-07s     C     1000        1   Elemwise{add,no_inplace}
   1.1%    97.2%       0.000s       1.90e-07s     C     1000        1   Elemwise{gt,no_inplace}
   1.1%    98.2%       0.000s       9.44e-08s     C     2000        2   ViewOp
   1.0%    99.2%       0.000s       1.77e-07s     C     1000        1   Elemwise{Composite{inv(sqr(i0))}}
   0.8%   100.0%       0.000s       7.09e-08s     C     2000        2   Rebroadcast{1}
   ... (remaining 0 Ops account for   0.00%(0.00s) of the runtime)

Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
  65.1%    65.1%       0.012s       1.17e-05s   1000    22   Elemwise{Composite{Switch(Identity((GT(Composite{exp((i0 * i1))}(i0, i1), i2) * i3 * GT(inv(sqrt(Composite{exp((i0 * i1))}(i0, i1))), i2))), (((i4 + (i5 * log(((i6 * Composite{exp((i0 * i1))}(i0, i1)) / i7)))) - i8) - (i5 * i9 * log1p(((Composite{exp((i0 * i1))}(i0, i1) * i10) / i7)))), i11)}}(TensorConstant{(1,) of -2.0}, s, TensorConstant{(1,) of 0}, Elemwise{gt,no_inplace}.0, Elemwise{Composite{scalar_gammaln((i0 * i1))}}.0, TensorConstant{(1,) o
   4.8%    69.9%       0.001s       8.57e-07s   1000    21   Elemwise{Composite{Switch(i0, (i1 * ((-(i2 * sqr((i3 - i4)))) + i5)), i6)}}(Elemwise{Composite{Identity((GT(i0, i1) * GT(inv(sqrt(i0)), i1)))}}.0, TensorConstant{(1,) of 0.5}, Elemwise{Composite{inv(sqr(i0))}}.0, Subtensor{int64::}.0, Subtensor{:int64:}.0, Elemwise{Composite{log((i0 * i1))}}.0, TensorConstant{(1,) of -inf})
   3.0%    72.9%       0.001s       5.36e-07s   1000    23   Sum{acc_dtype=float64}(Elemwise{Composite{Switch(i0, (i1 * ((-(i2 * sqr((i3 - i4)))) + i5)), i6)}}.0)
   2.6%    75.5%       0.000s       4.71e-07s   1000    24   Sum{acc_dtype=float64}(Elemwise{Composite{Switch(Identity((GT(Composite{exp((i0 * i1))}(i0, i1), i2) * i3 * GT(inv(sqrt(Composite{exp((i0 * i1))}(i0, i1))), i2))), (((i4 + (i5 * log(((i6 * Composite{exp((i0 * i1))}(i0, i1)) / i7)))) - i8) - (i5 * i9 * log1p(((Composite{exp((i0 * i1))}(i0, i1) * i10) / i7)))), i11)}}.0)
   1.9%    77.4%       0.000s       3.45e-07s   1000    25   Elemwise{Composite{(Switch(Identity(GT(i0, i1)), (i2 - (i3 * i0)), i4) + log(Abs(exp(i5))) + Switch(Identity(GT(i6, i1)), (i7 - (i8 * i6)), i4) + log(Abs(exp(i9))) + i10 + i11)}}[(0, 0)](Elemwise{exp,no_inplace}.0, TensorConstant{0}, TensorConstant{3.9120230674743652}, TensorConstant{50.0}, TensorConstant{-inf}, Reshape{0}.0, Elemwise{exp,no_inplace}.0, TensorConstant{-2.3025850929940455}, TensorConstant{0.1}, Reshape{0}.0, Sum{acc_dtype=float64}.0,
   1.7%    79.1%       0.000s       3.03e-07s   1000     3   Subtensor{int64::}(s, Constant{1})
   1.6%    80.8%       0.000s       2.93e-07s   1000    20   Elemwise{Composite{scalar_gammaln((i0 * i1))}}(TensorConstant{(1,) of 0.5}, Elemwise{add,no_inplace}.0)
   1.6%    82.4%       0.000s       2.85e-07s   1000     2   Subtensor{:int64:}(s, Constant{-1})
   1.5%    83.8%       0.000s       2.61e-07s   1000     0   Elemwise{exp,no_inplace}(sigma_log_)
   1.4%    85.2%       0.000s       2.50e-07s   1000    10   InplaceDimShuffle{x}(sigma)
   1.3%    86.5%       0.000s       2.40e-07s   1000    18   Elemwise{Composite{log((i0 * i1))}}(TensorConstant{(1,) of 0...9154943092}, Elemwise{Composite{inv(sqr(i0))}}.0)
   1.3%    87.9%       0.000s       2.37e-07s   1000     5   Reshape{1}(sigma_log_, TensorConstant{(1,) of -1})
   1.3%    89.1%       0.000s       2.25e-07s   1000    19   Elemwise{Composite{Identity((GT(i0, i1) * GT(inv(sqrt(i0)), i1)))}}(Elemwise{Composite{inv(sqr(i0))}}.0, TensorConstant{(1,) of 0})
   1.2%    90.4%       0.000s       2.24e-07s   1000     1   Elemwise{exp,no_inplace}(nu_log_)
   1.2%    91.6%       0.000s       2.17e-07s   1000    15   Elemwise{add,no_inplace}(TensorConstant{(1,) of 1.0}, InplaceDimShuffle{x}.0)
   1.2%    92.7%       0.000s       2.10e-07s   1000    13   Reshape{0}(Rebroadcast{1}.0, TensorConstant{[]})
   1.2%    93.9%       0.000s       2.08e-07s   1000    11   InplaceDimShuffle{x}(nu)
   1.1%    95.0%       0.000s       1.90e-07s   1000    17   Elemwise{gt,no_inplace}(InplaceDimShuffle{x}.0, TensorConstant{(1,) of 0})
   1.0%    96.0%       0.000s       1.77e-07s   1000    14   Elemwise{Composite{inv(sqr(i0))}}(InplaceDimShuffle{x}.0)
   0.9%    96.8%       0.000s       1.54e-07s   1000    16   Elemwise{Composite{scalar_gammaln((i0 * i1))}}(TensorConstant{(1,) of 0.5}, InplaceDimShuffle{x}.0)
   ... (remaining 6 Apply instances account for 3.19%(0.00s) of the runtime)

Here are tips to potentially make your code run faster
                 (if you think of new ones, suggest them on the mailing list).
                 Test them first, as they are not guaranteed to always provide a speedup.
  - Try the Theano flag floatX=float32
We don't know if amdlibm will accelerate this scalar op. scalar_gammaln
We don't know if amdlibm will accelerate this scalar op. scalar_gammaln
  - Try installing amdlibm and set the Theano flag lib.amdlibm=True. This speeds up only some Elemwise operation.

In [3]:
model.profile(gradient(model.logpt, model.vars)).summary()


Function profiling
==================
  Message: /Users/colin/projects/pymc3/pymc3/model.py:349
  Time in 1000 calls to Function.__call__: 8.100653e-02s
  Time in Function.fn.__call__: 5.562019e-02s (68.661%)
  Time in thunks: 3.688788e-02s (45.537%)
  Total compile time: 9.140079e-01s
    Number of Apply nodes: 63
    Theano Optimizer time: 5.787621e-01s
       Theano validate time: 8.579969e-03s
    Theano Linker time (includes C, CUDA code generation/compiling): 2.355778e-01s
       Import time 1.562703e-01s
       Node make_thunk time 2.326310e-01s

Time in all call to theano.grad() 1.042541e+00s
Time since theano import 43.896s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  59.3%    59.3%       0.022s       8.41e-07s     C    26000      26   theano.tensor.elemwise.Elemwise
   9.8%    69.1%       0.004s       5.19e-07s     C     7000       7   theano.tensor.elemwise.Sum
   8.8%    77.9%       0.003s       1.62e-06s     C     2000       2   theano.tensor.subtensor.IncSubtensor
   5.9%    83.8%       0.002s       2.17e-07s     C    10000      10   theano.tensor.basic.Reshape
   5.5%    89.3%       0.002s       2.04e-06s     C     1000       1   theano.tensor.basic.Join
   3.2%    92.5%       0.001s       1.17e-06s     C     1000       1   theano.tensor.basic.Alloc
   2.4%    94.8%       0.001s       1.77e-07s     C     5000       5   theano.tensor.elemwise.DimShuffle
   2.0%    96.9%       0.001s       3.70e-07s     C     2000       2   theano.tensor.subtensor.Subtensor
   2.0%    98.8%       0.001s       1.21e-07s     C     6000       6   theano.compile.ops.Rebroadcast
   0.6%    99.4%       0.000s       2.25e-07s     C     1000       1   theano.compile.ops.Shape_i
   0.6%   100.0%       0.000s       1.05e-07s     C     2000       2   theano.compile.ops.ViewOp
   ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)

Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
  16.9%    16.9%       0.006s       6.25e-06s     C     1000        1   Elemwise{Composite{Switch(i0, (-log1p((i1 / i2))), i3)}}
   9.8%    26.8%       0.004s       5.19e-07s     C     7000        7   Sum{acc_dtype=float64}
   6.1%    32.9%       0.002s       2.26e-06s     C     1000        1   Elemwise{Composite{Identity((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}
   5.9%    38.8%       0.002s       2.19e-06s     C     1000        1   IncSubtensor{InplaceInc;int64::}
   5.6%    44.4%       0.002s       2.05e-06s     C     1000        1   Elemwise{Composite{Switch(i0, ((i1 * i2 * i3 * i4) / i5), i6)}}
   5.5%    49.9%       0.002s       2.04e-06s     C     1000        1   Join
   5.3%    55.2%       0.002s       1.95e-06s     C     1000        1   Elemwise{Composite{Switch(i0, (i1 * (i2 + ((i3 * i4 * i5 * i6) / i7))), i8)}}[(0, 6)]
   5.0%    60.2%       0.002s       4.60e-07s     C     4000        4   Elemwise{Switch}
   4.3%    64.6%       0.002s       1.60e-06s     C     1000        1   Elemwise{Composite{exp((i0 * i1))}}
   3.2%    67.7%       0.001s       1.17e-06s     C     1000        1   Alloc
   2.9%    70.7%       0.001s       2.72e-07s     C     4000        4   Reshape{1}
   2.9%    73.6%       0.001s       1.81e-07s     C     6000        6   Reshape{0}
   2.8%    76.4%       0.001s       1.04e-06s     C     1000        1   IncSubtensor{InplaceInc;:int64:}
   2.4%    78.8%       0.001s       1.77e-07s     C     5000        5   InplaceDimShuffle{x}
   1.8%    80.6%       0.001s       6.52e-07s     C     1000        1   Elemwise{Composite{Switch(i0, (i1 * sqr(i2)), i3)}}
   1.6%    82.2%       0.001s       5.92e-07s     C     1000        1   Elemwise{Composite{Switch(i0, (i1 * i2 * i3), i4)}}
   1.4%    83.6%       0.001s       5.26e-07s     C     1000        1   Elemwise{Composite{(Switch(Identity(GT(i0, i1)), (i2 * i0), i1) + (i3 * i0) + (((i4 * i5 * scalar_psi((i4 * i6))) + (i7 * (i8 / i9)) + (i4 * i10 * scalar_psi((i4 * i9))) + (i4 * i11) + (i12 / i9)) * i0))}}[(0, 3)]
   1.3%    84.9%       0.000s       1.18e-07s     C     4000        4   Rebroadcast{1}
   1.3%    86.2%       0.000s       4.67e-07s     C     1000        1   Elemwise{mul,no_inplace}
   1.3%    87.4%       0.000s       2.34e-07s     C     2000        2   Elemwise{Composite{(sgn(exp(i0)) / Abs(exp(i0)))}}
   ... (remaining 15 Ops account for  12.56%(0.00s) of the runtime)

Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
  16.9%    16.9%       0.006s       6.25e-06s   1000    32   Elemwise{Composite{Switch(i0, (-log1p((i1 / i2))), i3)}}(Elemwise{Composite{Identity((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, Elemwise{mul,no_inplace}.0, InplaceDimShuffle{x}.0, TensorConstant{(1,) of 0})
   6.1%    23.1%       0.002s       2.26e-06s   1000    26   Elemwise{Composite{Identity((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}(Elemwise{Composite{exp((i0 * i1))}}.0, TensorConstant{(1,) of 0}, Elemwise{gt,no_inplace}.0)
   5.9%    29.0%       0.002s       2.19e-06s   1000    54   IncSubtensor{InplaceInc;int64::}(Elemwise{Composite{Switch(i0, (i1 * (i2 + ((i3 * i4 * i5 * i6) / i7))), i8)}}[(0, 6)].0, Elemwise{Composite{Switch(i0, (i1 * i2 * i3), i4)}}.0, Constant{1})
   5.6%    34.6%       0.002s       2.05e-06s   1000    48   Elemwise{Composite{Switch(i0, ((i1 * i2 * i3 * i4) / i5), i6)}}(Elemwise{Composite{Identity((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, TensorConstant{(1,) of 0.5}, InplaceDimShuffle{x}.0, Elemwise{Composite{exp((i0 * i1))}}.0, TensorConstant{[  4.05769..48400e-06]}, Elemwise{Add}[(0, 1)].0, TensorConstant{(1,) of 0})
   5.5%    40.1%       0.002s       2.04e-06s   1000    62   Join(TensorConstant{0}, Rebroadcast{1}.0, Rebroadcast{1}.0, IncSubtensor{InplaceInc;:int64:}.0)
   5.3%    45.4%       0.002s       1.95e-06s   1000    50   Elemwise{Composite{Switch(i0, (i1 * (i2 + ((i3 * i4 * i5 * i6) / i7))), i8)}}[(0, 6)](Elemwise{Composite{Identity((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, TensorConstant{(1,) of -2.0}, TensorConstant{(1,) of 0.5}, TensorConstant{(1,) of -0.5}, InplaceDimShuffle{x}.0, TensorConstant{[  4.05769..48400e-06]}, Elemwise{Composite{exp((i0 * i1))}}.0, Elemwise{Add}[(0, 1)].0, TensorConstant{(1,) of 0})
   4.3%    49.7%       0.002s       1.60e-06s   1000     5   Elemwise{Composite{exp((i0 * i1))}}(TensorConstant{(1,) of -2.0}, s)
   3.2%    52.9%       0.001s       1.17e-06s   1000    41   Alloc(Elemwise{switch,no_inplace}.0, Elemwise{Composite{(i0 - Switch(LT(i1, i0), i2, i0))}}[(0, 0)].0)
   2.8%    55.7%       0.001s       1.04e-06s   1000    57   IncSubtensor{InplaceInc;:int64:}(IncSubtensor{InplaceInc;int64::}.0, Elemwise{Composite{Switch(i0, (i1 * i2), i3)}}[(0, 2)].0, Constant{-1})
   1.8%    57.5%       0.001s       6.52e-07s   1000    30   Elemwise{Composite{Switch(i0, (i1 * sqr(i2)), i3)}}(Elemwise{Composite{Identity((GT(i0, i1) * GT(inv(sqrt(i0)), i1)))}}.0, TensorConstant{(1,) of 0.5}, Elemwise{sub,no_inplace}.0, TensorConstant{(1,) of 0})
   1.6%    59.1%       0.001s       5.99e-07s   1000    35   Elemwise{Switch}(Elemwise{Composite{Identity((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, TensorConstant{(1,) of 1.0}, TensorConstant{(1,) of 0.0})
   1.6%    60.7%       0.001s       5.92e-07s   1000    29   Elemwise{Composite{Switch(i0, (i1 * i2 * i3), i4)}}(Elemwise{Composite{Identity((GT(i0, i1) * GT(inv(sqrt(i0)), i1)))}}.0, TensorConstant{(1,) of -1.0}, InplaceDimShuffle{x}.0, Elemwise{sub,no_inplace}.0, TensorConstant{(1,) of 0})
   1.5%    62.2%       0.001s       5.62e-07s   1000    49   Sum{acc_dtype=float64}(Alloc.0)
   1.5%    63.7%       0.001s       5.50e-07s   1000    34   Elemwise{switch,no_inplace}(Elemwise{Composite{Identity((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, TensorConstant{(1,) of -0..9154943092}, TensorConstant{(1,) of 0})
   1.5%    65.2%       0.001s       5.38e-07s   1000    43   Sum{acc_dtype=float64}(Elemwise{switch,no_inplace}.0)
   1.4%    66.6%       0.001s       5.26e-07s   1000    56   Elemwise{Composite{(Switch(Identity(GT(i0, i1)), (i2 * i0), i1) + (i3 * i0) + (((i4 * i5 * scalar_psi((i4 * i6))) + (i7 * (i8 / i9)) + (i4 * i10 * scalar_psi((i4 * i9))) + (i4 * i11) + (i12 / i9)) * i0))}}[(0, 3)](Elemwise{exp,no_inplace}.0, TensorConstant{0}, TensorConstant{-0.1}, Reshape{0}.0, TensorConstant{0.5}, Sum{acc_dtype=float64}.0, Elemwise{add,no_inplace}.0, TensorConstant{3.141592653589793}, Sum{acc_dtype=float64}.0, nu, Sum{acc_dtype=fl
   1.4%    68.0%       0.001s       5.20e-07s   1000    42   Sum{acc_dtype=float64}(Elemwise{Composite{Switch(i0, (-log1p((i1 / i2))), i3)}}.0)
   1.4%    69.4%       0.001s       5.08e-07s   1000    40   Sum{acc_dtype=float64}(Elemwise{Composite{Switch(i0, (i1 * sqr(i2)), i3)}}.0)
   1.4%    70.8%       0.001s       5.07e-07s   1000    53   Sum{acc_dtype=float64}(Elemwise{Composite{Switch(i0, ((i1 * i2 * i3 * i4) / i5), i6)}}.0)
   1.4%    72.1%       0.001s       5.01e-07s   1000    44   Sum{acc_dtype=float64}(Elemwise{switch,no_inplace}.0)
   ... (remaining 43 Apply instances account for 27.87%(0.01s) of the runtime)

Here are tips to potentially make your code run faster
                 (if you think of new ones, suggest them on the mailing list).
                 Test them first, as they are not guaranteed to always provide a speedup.
  - Try the Theano flag floatX=float32
We don't know if amdlibm will accelerate this scalar op. scalar_psi
We don't know if amdlibm will accelerate this scalar op. scalar_psi
  - Try installing amdlibm and set the Theano flag lib.amdlibm=True. This speeds up only some Elemwise operation.