Function profiling
==================
Message: /home/miguel/PycharmProjects/GeMpy/GeoMig.py:562
Time in 5 calls to Function.__call__: 2.937774e+01s
Time in Function.fn.__call__: 2.937736e+01s (99.999%)
Time in thunks: 2.934835e+01s (99.900%)
Total compile time: 1.559712e+00s
Number of Apply nodes: 171
Theano Optimizer time: 1.410723e+00s
Theano validate time: 4.508591e-02s
Theano Linker time (includes C, CUDA code generation/compiling): 9.198022e-02s
Import time 0.000000e+00s
Time in all call to theano.grad() 0.000000e+00s
Time since theano import 132.105s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
   83.6%   83.6%      24.529s       1.29e-01s      C     190       38 theano.tensor.elemwise.Elemwise
    6.4%   90.0%       1.889s       5.40e-02s      C      35        7 theano.tensor.elemwise.Sum
    5.1%   95.1%       1.496s       2.99e-02s      C      50       10 theano.tensor.blas.Dot22Scalar
    2.5%   97.6%       0.743s       1.86e-02s      C      40        8 theano.tensor.basic.Alloc
    2.3%  100.0%       0.682s       2.27e-02s      C      30        6 theano.tensor.basic.Join
    0.0%  100.0%       0.008s       1.55e-03s     Py       5        1 theano.tensor.nlinalg.MatrixInverse
    0.0%  100.0%       0.000s       2.76e-06s      C      95       19 theano.tensor.basic.Reshape
    0.0%  100.0%       0.000s       2.37e-06s      C      65       13 theano.tensor.subtensor.IncSubtensor
    0.0%  100.0%       0.000s       9.48e-07s      C     155       31 theano.tensor.elemwise.DimShuffle
    0.0%  100.0%       0.000s       2.77e-05s     Py       5        1 theano.tensor.extra_ops.FillDiagonal
    0.0%  100.0%       0.000s       2.34e-06s      C      55       11 theano.tensor.subtensor.Subtensor
    0.0%  100.0%       0.000s       1.37e-06s      C      60       12 theano.tensor.opt.MakeVector
    0.0%  100.0%       0.000s       1.14e-06s      C      30        6 theano.compile.ops.Shape_i
    0.0%  100.0%       0.000s       5.77e-06s      C       5        1 theano.tensor.blas_c.CGemv
    0.0%  100.0%       0.000s       7.71e-07s      C      30        6 theano.tensor.basic.ScalarFromTensor
    0.0%  100.0%       0.000s       1.19e-06s      C       5        1 theano.tensor.basic.AllocEmpty
... (remaining 0 Classes account for 0.00%(0.00s) of the runtime)
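
Note on reading the Class table: the <time per call> column appears to be <apply time> divided by <#call>. The short Python sketch below recomputes it for the first three rows as a sanity check. The figures are copied from the table; the helper name `time_per_call` is hypothetical and is not part of Theano.

```python
# Sanity-check sketch: recompute "time per call" for a few Class rows.
# Numbers are copied from the Class table; time_per_call is a hypothetical helper.

def time_per_call(apply_time_s, n_calls):
    """Return the apply time (seconds) divided by the number of calls."""
    return apply_time_s / n_calls

# class name -> (<apply time> in seconds, <#call>)
rows = {
    "theano.tensor.elemwise.Elemwise": (24.529, 190),
    "theano.tensor.elemwise.Sum": (1.889, 35),
    "theano.tensor.blas.Dot22Scalar": (1.496, 50),
}

per_call = {name: time_per_call(t, n) for name, (t, n) in rows.items()}

for name, value in per_call.items():
    # Same scientific notation as the profiler output
    print(f"{name}: {value:.2e}s")
```

The recomputed values can be compared against the <time per call> column; in each row, <#call> is also 5 times <#apply>, matching the 5 calls to Function.__call__ reported in the summary.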
Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
47.3% 47.3% 13.894s 2.78e+00s C 5 1 Elemwise{Composite{(i0 * i1 * LT(Composite{sqrt(((i0 + i1) - i2))}(i2, i3, i4), i5) * Composite{(sqr(i0) * i0)}((i5 - Composite{sqrt(((i0 + i1) - i2))}(i2, i3, i4))) * ((i6 * sqr(Composite{sqrt(((i0 + i1) - i2))}(i2, i3, i4))) + i7 + (i8 * i5 * Composite{sqrt(((i0 + i1) - i2))}(i2, i3, i4))))}}[(0, 1)]
28.1% 75.5% 8.253s 1.65e+00s C 5 1 Elemwise{Composite{(i0 * ((LT(Composite{sqrt(((i0 + i1) - i2))}(i1, i2, i3), i4) * ((i5 + (i6 * Composite{(sqr(i0) * i0)}((Composite{sqrt(((i0 + i1) - i2))}(i1, i2, i3) / i4))) + (i7 * Composite{((sqr(sqr(i0)) * sqr(i0)) * i0)}((Composite{sqrt(((i0 + i1) - i2))}(i1, i2, i3) / i4)))) - ((i8 * sqr((Composite{sqrt(((i0 + i1) - i2))}(i1, i2, i3) / i4))) + (i9 * Composite{(sqr(sqr(i0)) * i0)}((Composite{sqrt(((i0 + i1) - i2))}(i1, i2, i3) / i4
6.1% 81.6% 1.799s 1.20e-01s C 15 3 Sum{axis=[0], acc_dtype=float64}
5.1% 86.7% 1.496s 2.99e-02s C 50 10 Dot22Scalar
3.6% 90.3% 1.054s 2.11e-01s C 5 1 Elemwise{Composite{sqrt(((i0 + i1) - i2))}}[(0, 2)]
2.8% 93.0% 0.810s 1.62e-02s C 50 10 Elemwise{sub,no_inplace}
2.5% 95.6% 0.743s 1.86e-02s C 40 8 Alloc
2.3% 97.9% 0.682s 2.27e-02s C 30 6 Join
1.5% 99.4% 0.431s 2.87e-02s C 15 3 Elemwise{mul,no_inplace}
0.3% 99.7% 0.090s 4.51e-03s C 20 4 Sum{axis=[1], acc_dtype=float64}
0.2% 99.9% 0.054s 1.09e-02s C 5 1 Elemwise{Composite{(i0 + ((i1 * i2) / i3) + i4)}}[(0, 0)]
0.1% 100.0% 0.034s 1.70e-03s C 20 4 Elemwise{sqr,no_inplace}
0.0% 100.0% 0.008s 1.55e-03s Py 5 1 MatrixInverse
0.0% 100.0% 0.000s 2.76e-06s C 95 19 Reshape{2}
0.0% 100.0% 0.000s 2.77e-05s Py 5 1 FillDiagonal
0.0% 100.0% 0.000s 2.18e-05s C 5 1 Elemwise{Composite{Switch(EQ(Composite{sqrt(((i0 + i1) - i2))}(i0, i1, i2), i3), i3, ((i4 * (((i5 * i6 * i7 * LT(Composite{sqrt(((i0 + i1) - i2))}(i0, i1, i2), i8) * Composite{(sqr((i0 - i1)) * (i0 - i1))}(i8, Composite{sqrt(((i0 + i1) - i2))}(i0, i1, i2)) * Composite{((i0 * i1) + i2 + (i3 * i4 * i5))}(i9, sqr(Composite{sqrt(((i0 + i1) - i2))}(i0, i1, i2)), i10, i11, i8, Composite{sqrt(((i0 + i1) - i2))}(i0, i1, i2))) / (sqr(Composite{sqr
0.0% 100.0% 0.000s 1.37e-06s C 60 12 MakeVector{dtype='int64'}
0.0% 100.0% 0.000s 1.54e-05s C 5 1 Elemwise{Composite{((((LT(Composite{sqrt(((i0 + i1) - i2))}(i0, i1, i2), i3) * ((i4 + (i5 * Composite{(sqr(i0) * i0)}((Composite{sqrt(((i0 + i1) - i2))}(i0, i1, i2) / i3))) + (i6 * Composite{((sqr(sqr(i0)) * sqr(i0)) * i0)}((Composite{sqrt(((i0 + i1) - i2))}(i0, i1, i2) / i3)))) - ((i7 * sqr((Composite{sqrt(((i0 + i1) - i2))}(i0, i1, i2) / i3))) + (i8 * Composite{(sqr(sqr(i0)) * i0)}((Composite{sqrt(((i0 + i1) - i2))}(i0, i1, i2) / i3))))
0.0% 100.0% 0.000s 1.73e-06s C 40 8 Subtensor{::, int64}
0.0% 100.0% 0.000s 2.33e-06s C 25 5 IncSubtensor{InplaceSet;int64:int64:, int64:int64:}
... (remaining 37 Ops account for 0.00%(0.00s) of the runtime)
Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
47.3% 47.3% 13.894s 2.78e+00s 5 165 Elemwise{Composite{(i0 * i1 * LT(Composite{sqrt(((i0 + i1) - i2))}(i2, i3, i4), i5) * Composite{(sqr(i0) * i0)}((i5 - Composite{sqrt(((i0 + i1) - i2))}(i2, i3, i4))) * ((i6 * sqr(Composite{sqrt(((i0 + i1) - i2))}(i2, i3, i4))) + i7 + (i8 * i5 * Composite{sqrt(((i0 + i1) - i2))}(i2, i3, i4))))}}[(0, 1)](Subtensor{:int64:}.0, Join.0, Reshape{2}.0, Reshape{2}.0, Dot22Scalar.0, InplaceDimShuffle{x,x}.0, TensorConstant{(1, 1) of 3.0}, Elemwise{mul,no_inp
28.1% 75.5% 8.253s 1.65e+00s 5 164 Elemwise{Composite{(i0 * ((LT(Composite{sqrt(((i0 + i1) - i2))}(i1, i2, i3), i4) * ((i5 + (i6 * Composite{(sqr(i0) * i0)}((Composite{sqrt(((i0 + i1) - i2))}(i1, i2, i3) / i4))) + (i7 * Composite{((sqr(sqr(i0)) * sqr(i0)) * i0)}((Composite{sqrt(((i0 + i1) - i2))}(i1, i2, i3) / i4)))) - ((i8 * sqr((Composite{sqrt(((i0 + i1) - i2))}(i1, i2, i3) / i4))) + (i9 * Composite{(sqr(sqr(i0)) * i0)}((Composite{sqrt(((i0 + i1) - i2))}(i1, i2, i3) / i4)))))) - (L
4.6% 80.0% 1.346s 2.69e-01s 5 168 Sum{axis=[0], acc_dtype=float64}(Elemwise{Composite{(i0 * i1 * LT(Composite{sqrt(((i0 + i1) - i2))}(i2, i3, i4), i5) * Composite{(sqr(i0) * i0)}((i5 - Composite{sqrt(((i0 + i1) - i2))}(i2, i3, i4))) * ((i6 * sqr(Composite{sqrt(((i0 + i1) - i2))}(i2, i3, i4))) + i7 + (i8 * i5 * Composite{sqrt(((i0 + i1) - i2))}(i2, i3, i4))))}}[(0, 1)].0)
3.6% 83.6% 1.054s 2.11e-01s 5 98 Elemwise{Composite{sqrt(((i0 + i1) - i2))}}[(0, 2)](Reshape{2}.0, Reshape{2}.0, Dot22Scalar.0)
2.7% 86.3% 0.780s 1.56e-01s 5 105 Dot22Scalar(Reshape{2}.0, Positions of the points to interpolate.T, TensorConstant{2.0})
2.5% 88.8% 0.742s 1.48e-01s 5 157 Alloc(CGemv{inplace}.0, Shape_i{0}.0, TensorConstant{1}, TensorConstant{1}, Elemwise{Add}[(0, 1)].0)
2.3% 91.1% 0.682s 1.36e-01s 5 121 Join(TensorConstant{0}, Elemwise{sub,no_inplace}.0, Elemwise{sub,no_inplace}.0)
1.5% 92.6% 0.430s 8.61e-02s 5 166 Elemwise{mul,no_inplace}(Subtensor{int64::}.0, Positions of the points to interpolate.T)
1.4% 94.0% 0.410s 8.19e-02s 5 109 Elemwise{sub,no_inplace}(InplaceDimShuffle{0,x}.0, InplaceDimShuffle{1,0}.0)
1.4% 95.4% 0.400s 8.00e-02s 5 108 Elemwise{sub,no_inplace}(InplaceDimShuffle{0,x}.0, InplaceDimShuffle{1,0}.0)
1.2% 96.6% 0.361s 7.22e-02s 5 43 Dot22Scalar(Reference points for every layer, Positions of the points to interpolate.T, TensorConstant{2.0})
1.2% 97.8% 0.360s 7.20e-02s 5 167 Sum{axis=[0], acc_dtype=float64}(Elemwise{Composite{(i0 * ((LT(Composite{sqrt(((i0 + i1) - i2))}(i1, i2, i3), i4) * ((i5 + (i6 * Composite{(sqr(i0) * i0)}((Composite{sqrt(((i0 + i1) - i2))}(i1, i2, i3) / i4))) + (i7 * Composite{((sqr(sqr(i0)) * sqr(i0)) * i0)}((Composite{sqrt(((i0 + i1) - i2))}(i1, i2, i3) / i4)))) - ((i8 * sqr((Composite{sqrt(((i0 + i1) - i2))}(i1, i2, i3) / i4))) + (i9 * Composite{(sqr(sqr(i0)) * i0)}((Composite{sqrt(((i0 + i1) -
1.2% 99.0% 0.355s 7.10e-02s 5 44 Dot22Scalar(Rest of the points of the layers, Positions of the points to interpolate.T, TensorConstant{2.0})
0.3% 99.4% 0.094s 1.87e-02s 5 169 Sum{axis=[0], acc_dtype=float64}(Elemwise{mul,no_inplace}.0)
0.3% 99.7% 0.090s 1.80e-02s 5 45 Sum{axis=[1], acc_dtype=float64}(Elemwise{sqr,no_inplace}.0)
0.2% 99.8% 0.054s 1.09e-02s 5 170 Elemwise{Composite{(i0 + ((i1 * i2) / i3) + i4)}}[(0, 0)](Sum{axis=[0], acc_dtype=float64}.0, TensorConstant{(1,) of -1.75}, Sum{axis=[0], acc_dtype=float64}.0, InplaceDimShuffle{x}.0, Sum{axis=[0], acc_dtype=float64}.0)
0.1% 100.0% 0.034s 6.79e-03s 5 11 Elemwise{sqr,no_inplace}(Positions of the points to interpolate)
0.0% 100.0% 0.008s 1.55e-03s 5 155 MatrixInverse(IncSubtensor{InplaceSet;int64::, int64:int64:}.0)
0.0% 100.0% 0.000s 9.94e-05s 5 134 Sum{axis=[1], acc_dtype=float64}(Elemwise{Sqr}[(0, 0)].0)
0.0% 100.0% 0.000s 9.12e-05s 5 100 Alloc(Elemwise{sub,no_inplace}.0, TensorConstant{1}, TensorConstant{2}, Elemwise{Composite{Switch(EQ(i0, i1), (i0 // (-i0)), i0)}}.0, Shape_i{0}.0)
... (remaining 151 Apply instances account for 0.01%(0.00s) of the runtime)
Here are tips to potentially make your code run faster
(if you think of new ones, suggest them on the mailing list).
Test them first, as they are not guaranteed to always provide a speedup.
We don't know if amdlibm will accelerate this scalar op. deg2rad
- Try installing amdlibm and set the Theano flag lib.amdlibm=True. This speeds up only some Elemwise operation.
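
One possible way to act on the amdlibm tip is to append the flag to the THEANO_FLAGS environment variable before Theano is imported. The sketch below is a minimal illustration only: `add_theano_flag` is a hypothetical helper, the existing value `profile=True` is an assumed starting point, and the flag has no useful effect unless amdlibm is actually installed.

```python
import os

def add_theano_flag(flag, env=os.environ):
    """Append `flag` to the comma-separated THEANO_FLAGS value unless already present.

    Hypothetical helper, not part of Theano. `env` is any dict-like mapping.
    """
    current = [f for f in env.get("THEANO_FLAGS", "").split(",") if f]
    if flag not in current:
        current.append(flag)
    env["THEANO_FLAGS"] = ",".join(current)
    return env["THEANO_FLAGS"]

# Demonstrated on a plain dict so that the real environment is left untouched.
# In practice this would run on os.environ before `import theano`,
# because the environment variable is read when Theano is imported.
flags = add_theano_flag("lib.amdlibm=True", env={"THEANO_FLAGS": "profile=True"})
print(flags)
```

As the tip itself says, any speedup should be measured by profiling again, since it is not guaranteed.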