Function profiling
==================
Message: /home/miguel/PycharmProjects/GeMpy/GeMpy/GeoMig.py:477
Time in 412 calls to Function.__call__: 2.742617e+00s
Time in Function.fn.__call__: 2.696156e+00s (98.306%)
Time in thunks: 2.541325e+00s (92.661%)
Total compile time: 1.843985e+01s
Number of Apply nodes: 254
Theano Optimizer time: 2.322626e+00s
Theano validate time: 1.088021e-01s
Theano Linker time (includes C, CUDA code generation/compiling): 1.604251e+01s
Import time 1.168010e-01s
Time in all call to theano.grad() 0.000000e+00s
Time since theano import 61.358s
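
The summary above comes from Theano's built-in profiler. As a hedged sketch (the toy graph below is a placeholder, not the GeoMig.py interpolator), a per-function profile like this is obtained by compiling with profile=True, calling the compiled function repeatedly, and printing the accumulated statistics:

    import numpy as np
    import theano
    import theano.tensor as T

    # Placeholder graph; the real GeoMig.py function is much larger (254 Apply nodes).
    x = T.dmatrix('x')
    y = T.sum(T.sqr(x))

    # profile=True attaches a ProfileStats object to the compiled function.
    f = theano.function([x], y, profile=True)

    for _ in range(412):                # the profile above covers 412 calls
        f(np.random.rand(100, 100))

    f.profile.summary()                 # prints the Class / Ops / Apply breakdowns
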
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
68.9% 68.9% 1.751s 6.25e-05s C 28016 68 theano.tensor.elemwise.Elemwise
11.1% 80.0% 0.282s 6.85e-05s C 4120 10 theano.tensor.blas.Dot22Scalar
9.7% 89.7% 0.246s 5.43e-05s C 4532 11 theano.tensor.basic.Alloc
4.9% 94.6% 0.125s 4.34e-05s C 2884 7 theano.tensor.elemwise.Sum
1.6% 96.2% 0.040s 9.76e-05s Py 412 1 theano.tensor.nlinalg.MatrixInverse
1.0% 97.2% 0.026s 6.34e-05s C 412 1 theano.tensor.blas_c.CGemv
0.7% 98.0% 0.018s 7.33e-06s C 2472 6 theano.tensor.basic.Join
0.6% 98.6% 0.016s 3.83e-05s Py 412 1 theano.tensor.extra_ops.FillDiagonal
0.5% 99.1% 0.012s 1.18e-06s C 10300 25 theano.tensor.subtensor.IncSubtensor
0.3% 99.4% 0.009s 4.13e-07s C 21424 52 theano.tensor.elemwise.DimShuffle
0.2% 99.6% 0.005s 4.87e-07s C 11124 27 theano.tensor.basic.Reshape
0.2% 99.8% 0.005s 7.00e-07s C 7004 17 theano.tensor.subtensor.Subtensor
0.1% 99.9% 0.002s 4.21e-07s C 5356 13 theano.tensor.opt.MakeVector
0.0% 99.9% 0.001s 4.01e-07s C 2884 7 theano.tensor.basic.ScalarFromTensor
0.0% 100.0% 0.001s 4.06e-07s C 2472 6 theano.compile.ops.Shape_i
0.0% 100.0% 0.000s 5.78e-07s C 412 1 theano.tensor.basic.AllocEmpty
0.0% 100.0% 0.000s 5.57e-07s C 412 1 theano.compile.ops.Rebroadcast
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
30.4% 30.4% 0.771s 1.87e-03s C 412 1 Elemwise{Composite{(i0 * i1 * LT(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i2, i3, i4), i5) * (((i6 + ((i7 * Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i2, i3, i4)) / i8)) - ((i9 * Composite{(sqr(i0) * i0)}(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i2, i3, i4))) / i10)) + ((i11 * Composite{(sqr(sqr(i0)) * i0)}(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i2, i3, i4))) / i12)))}}[(0, 4)]
29.3% 59.6% 0.745s 1.81e-03s C 412 1 Elemwise{Composite{(i0 * ((LT(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i1, i2, i3), i4) * ((i5 + (i6 * Composite{(sqr(i0) * i0)}((Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i1, i2, i3) / i4))) + (i7 * Composite{((sqr(sqr(i0)) * sqr(i0)) * i0)}((Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i1, i2, i3) / i4)))) - ((i8 * sqr((Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i1, i2, i3) / i4))) + (i9 * Composite{(sqr(sqr(i0)
11.1% 70.8% 0.282s 6.85e-05s C 4120 10 Dot22Scalar
9.7% 80.4% 0.246s 5.43e-05s C 4532 11 Alloc
4.1% 84.5% 0.104s 1.27e-04s C 824 2 Elemwise{Mul}[(0, 1)]
2.3% 86.8% 0.058s 7.03e-05s C 824 2 Sum{axis=[0], acc_dtype=float64}
2.0% 88.8% 0.051s 1.24e-04s C 412 1 Sum{axis=[0], acc_dtype=float64}
1.8% 90.7% 0.047s 2.84e-05s C 1648 4 Elemwise{Cast{float64}}
1.6% 92.3% 0.040s 9.76e-05s Py 412 1 MatrixInverse
1.4% 93.7% 0.035s 7.17e-06s C 4944 12 Elemwise{sub,no_inplace}
1.0% 94.7% 0.026s 6.34e-05s C 412 1 CGemv{inplace}
0.7% 95.4% 0.018s 7.33e-06s C 2472 6 Join
0.6% 96.0% 0.016s 9.86e-06s C 1648 4 Sum{axis=[1], acc_dtype=float64}
0.6% 96.7% 0.016s 3.83e-05s Py 412 1 FillDiagonal
0.6% 97.3% 0.015s 9.33e-06s C 1648 4 Elemwise{Sqr}[(0, 0)]
0.4% 97.6% 0.010s 2.37e-05s C 412 1 Elemwise{Composite{(i0 + (i1 * i2 * i3) + (i4 * i5 * i6 * i7))}}[(0, 3)]
0.2% 97.8% 0.005s 4.64e-07s C 10300 25 Reshape{2}
0.2% 98.0% 0.004s 1.49e-06s C 2884 7 IncSubtensor{InplaceSet;int64:int64:, int64:int64:}
0.1% 98.1% 0.003s 5.95e-07s C 4944 12 Subtensor{::, int64}
0.1% 98.2% 0.003s 6.53e-06s C 412 1 Elemwise{Composite{Switch(EQ(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i0, i1, i2), i3), i3, ((((i4 * i5) / sqr(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i0, i1, i2))) * ((i6 * LT(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i0, i1, i2), i7) * i8 * Composite{(((i0 + ((i1 * i2) / i3)) - ((i4 * i5) / i6)) + ((i7 * i8) / i9))}(i9, i10, Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i0, i1, i2), i11, i12, Composite{(sqr(i0
... (remaining 53 Ops account for 1.78%(0.05s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
30.4% 30.4% 0.771s 1.87e-03s 412 248 Elemwise{Composite{(i0 * i1 * LT(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i2, i3, i4), i5) * (((i6 + ((i7 * Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i2, i3, i4)) / i8)) - ((i9 * Composite{(sqr(i0) * i0)}(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i2, i3, i4))) / i10)) + ((i11 * Composite{(sqr(sqr(i0)) * i0)}(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i2, i3, i4))) / i12)))}}[(0, 4)](Subtensor{:int64:}.0, Join.0, Reshape{2
29.3% 59.6% 0.745s 1.81e-03s 412 247 Elemwise{Composite{(i0 * ((LT(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i1, i2, i3), i4) * ((i5 + (i6 * Composite{(sqr(i0) * i0)}((Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i1, i2, i3) / i4))) + (i7 * Composite{((sqr(sqr(i0)) * sqr(i0)) * i0)}((Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i1, i2, i3) / i4)))) - ((i8 * sqr((Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i1, i2, i3) / i4))) + (i9 * Composite{(sqr(sqr(i0)) * i0)}((C
6.3% 65.9% 0.160s 3.87e-04s 412 240 Alloc(CGemv{inplace}.0, Shape_i{0}.0, TensorConstant{1}, TensorConstant{1}, Elemwise{Add}[(0, 1)].0)
4.9% 70.8% 0.125s 3.03e-04s 412 166 Dot22Scalar(Elemwise{Cast{float64}}.0, InplaceDimShuffle{1,0}.0, TensorConstant{2.0})
4.1% 74.9% 0.104s 2.53e-04s 412 249 Elemwise{Mul}[(0, 1)](Subtensor{:int64:}.0, InplaceDimShuffle{1,0}.0, Subtensor{int64::}.0, InplaceDimShuffle{x,x}.0)
3.7% 78.7% 0.095s 2.30e-04s 412 112 Dot22Scalar(Elemwise{Cast{float64}}.0, InplaceDimShuffle{1,0}.0, TensorConstant{2.0})
3.2% 81.8% 0.081s 1.96e-04s 412 195 Alloc(Subtensor{:int64:}.0, Shape_i{0}.0, TensorConstant{1}, TensorConstant{1}, Elemwise{Composite{Switch(LT(i0, i1), Switch(LT((i2 + i0), i1), i1, (i2 + i0)), Switch(LT(i0, i2), i0, i2))}}.0)
2.3% 84.1% 0.058s 1.42e-04s 412 111 Dot22Scalar(Elemwise{Cast{float64}}.0, InplaceDimShuffle{1,0}.0, TensorConstant{2.0})
2.0% 86.2% 0.051s 1.24e-04s 412 252 Sum{axis=[0], acc_dtype=float64}(Elemwise{Mul}[(0, 1)].0)
1.8% 88.0% 0.046s 1.11e-04s 412 26 Elemwise{Cast{float64}}(Positions of the points to interpolate)
1.6% 89.5% 0.040s 9.76e-05s 412 238 MatrixInverse(IncSubtensor{InplaceSet;int64::, int64:int64:}.0)
1.2% 90.7% 0.030s 7.26e-05s 412 251 Sum{axis=[0], acc_dtype=float64}(Elemwise{Composite{(i0 * i1 * LT(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i2, i3, i4), i5) * (((i6 + ((i7 * Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i2, i3, i4)) / i8)) - ((i9 * Composite{(sqr(i0) * i0)}(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i2, i3, i4))) / i10)) + ((i11 * Composite{(sqr(sqr(i0)) * i0)}(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i2, i3, i4))) / i12)))}}[(0, 4)].0)
1.1% 91.8% 0.028s 6.79e-05s 412 250 Sum{axis=[0], acc_dtype=float64}(Elemwise{Composite{(i0 * ((LT(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i1, i2, i3), i4) * ((i5 + (i6 * Composite{(sqr(i0) * i0)}((Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i1, i2, i3) / i4))) + (i7 * Composite{((sqr(sqr(i0)) * sqr(i0)) * i0)}((Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i1, i2, i3) / i4)))) - ((i8 * sqr((Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i1, i2, i3) / i4))) + (i9 *
1.0% 92.8% 0.026s 6.34e-05s 412 239 CGemv{inplace}(AllocEmpty{dtype='float32'}.0, TensorConstant{1.0}, MatrixInverse.0, IncSubtensor{InplaceSet;int64:int64:}.0, TensorConstant{0.0})
0.6% 93.5% 0.016s 3.83e-05s 412 220 FillDiagonal(Elemwise{Composite{Switch(EQ(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i0, i1, i2), i3), i3, ((((i4 * i5) / sqr(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i0, i1, i2))) * ((i6 * LT(Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i0, i1, i2), i7) * i8 * Composite{(((i0 + ((i1 * i2) / i3)) - ((i4 * i5) / i6)) + ((i7 * i8) / i9))}(i9, i10, Composite{Cast{float32}(sqrt(((i0 + i1) - i2)))}(i0, i1, i2), i11, i12, Composite{(sqr(
0.6% 94.1% 0.015s 3.71e-05s 412 191 Sum{axis=[1], acc_dtype=float64}(Elemwise{Sqr}[(0, 0)].0)
0.6% 94.6% 0.015s 3.60e-05s 412 175 Elemwise{Sqr}[(0, 0)](Elemwise{Cast{float64}}.0)
0.5% 95.1% 0.012s 2.89e-05s 412 173 Join(TensorConstant{0}, Elemwise{sub,no_inplace}.0, Elemwise{sub,no_inplace}.0, Elemwise{sub,no_inplace}.0)
0.4% 95.6% 0.011s 2.74e-05s 412 158 Elemwise{sub,no_inplace}(InplaceDimShuffle{0,x}.0, InplaceDimShuffle{1,0}.0)
0.4% 96.0% 0.011s 2.70e-05s 412 159 Elemwise{sub,no_inplace}(InplaceDimShuffle{0,x}.0, InplaceDimShuffle{1,0}.0)
... (remaining 234 Apply instances account for 4.00%(0.10s) of the runtime)
Here are tips to potentially make your code run faster
(if you think of new ones, suggest them on the mailing list).
Test them first, as they are not guaranteed to always provide a speedup.
We don't know if amdlibm will accelerate this scalar op. deg2rad
We don't know if amdlibm will accelerate this scalar op. deg2rad
- Try installing amdlibm and set the Theano flag lib.amdlibm=True. This speeds up only some Elemwise operation.
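
The last tip refers to Theano's optional use of AMD's libm for elementwise math. A hedged sketch of enabling it, assuming the amdlibm library is actually installed: the lib.amdlibm flag must be set before Theano is imported, for example through the THEANO_FLAGS environment variable (it can equivalently be placed in ~/.theanorc).

    import os

    # Assumption: amdlibm is installed and visible to the C compiler/linker.
    # The flag has to be set before the first `import theano`.
    os.environ['THEANO_FLAGS'] = 'lib.amdlibm=True'

    import theano
    print(theano.config.lib.amdlibm)    # confirm the flag was picked up

Note that the profiler itself says it does not know whether amdlibm accelerates the deg2rad scalar op, so any speedup should be measured rather than assumed.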