A profile is a set of statistics that describes how often and for how long various parts of the program executed.
This notebook shows how to profile various parts of BatchFlow: namely, pipelines and models.
In [1]:
import sys
sys.path.append("../../..")
from batchflow import B, V, W
from batchflow.opensets import MNIST
from batchflow.models.torch import ResNet18
In [2]:
dataset = MNIST()
To collect information about model training times (both on CPU and GPU), set the `profile` option in the model configuration to `True`:
In [3]:
model_config = {
'inputs/labels/classes': 10,
'loss': 'ce',
'profile': True,
}
In [4]:
pipeline = (dataset.train.p
.init_variable('loss_history', [])
.to_array(channels='first', dtype='float32')
.multiply(multiplier=1/255., preserve_type=False)
.init_model('dynamic', ResNet18,
'resnet', config=model_config)
.train_model('resnet',
B.images, B.labels,
fetches='loss',
save_to=V('loss_history', mode='a'))
)
To gather statistics about how long each action takes, set `profile` to `True` in the `run` call:
In [5]:
BATCH_SIZE = 64
N_ITERS = 50
pipeline.run(BATCH_SIZE, n_iters=N_ITERS, bar=True, profile=True,
bar_desc=W(V('loss_history')[-1].format('Loss is {:7.7}')))
Loss is 0.1426592: 100%|██████████| 50/50 [01:22<00:00, 1.29s/it]
Out[5]:
<batchflow.batchflow.pipeline.Pipeline at 0x7f52f4521048>
First of all, every instance of Pipeline has an `elapsed_time` attribute: it stores the total time spent running the pipeline (accumulated over multiple runs):
In [6]:
pipeline.elapsed_time
Out[6]:
82.73438310623169
Note that the `elapsed_time` attribute is created whether or not we set `profile` to `True`.
After running with `profile=True`, the pipeline has a `profile_info` attribute: a DataFrame that holds the collected information:
In [7]:
pipeline.profile_info.head()
Out[7]:
iter
total_time
pipeline_time
ncalls
tottime
cumtime
batch_id
start_time
action
id
to_array #0
<built-in method _abc._abc_instancecheck>::/home/tsimfer/anaconda3/lib/python3.7/abc.py::137::__instancecheck__
1
0.081367
0.0711
64
0.000166
0.001459
139994263066552
1.582270e+09
<built-in method _abc._abc_subclasscheck>::/home/tsimfer/anaconda3/lib/python3.7/abc.py::141::__subclasscheck__
1
0.081367
0.0711
57
0.001096
0.001288
139994263066552
1.582270e+09
<built-in method _operator.index>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/numpy/core/numeric.py::1557::normalize_axis_tuple
1
0.081367
0.0711
128
0.000037
0.000037
139994263066552
1.582270e+09
<built-in method posix.sched_getaffinity>::../../../batchflow/batchflow/decorators.py::18::_workers_count
1
0.081367
0.0711
1
0.000048
0.000048
139994263066552
1.582270e+09
<built-in method builtins.any>::../../../batchflow/batchflow/decorators.py::86::any_action_failed
1
0.081367
0.0711
1
0.000003
0.000026
139994263066552
1.582270e+09
Note that there is detailed information about the exact methods called inside each action. That is a lot of data, which can give us a precise understanding of which parts of the code are our bottlenecks.
Columns of profile_info:
- action, iter, batch_id and start_time are pretty self-explanatory
- id identifies the exact method in great detail: it is a concatenation of method_name, file_name, line_number and callee
- total_time is the time taken by an action
- pipeline_time is total_time plus the time spent processing the profiling table at each iteration
- tottime is the time taken by a method inside an action
- cumtime is the time taken by a method and all of the methods called inside it

More often than not, though, we don't need such granularity. The Pipeline method show_profile_info makes some handy aggregations:
Note: by default, results are sorted by total_time or tottime, depending on the level of detail.
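Under the hood this is a plain groupby-style aggregation over the per-iteration timings. A minimal pure-Python sketch of the same sum/mean/max summary (the numbers are made up for illustration; this is not BatchFlow code):

```python
from collections import defaultdict

# Mock per-iteration timings: (action, total_time) pairs.
# The numbers are illustrative, not taken from the run above.
records = [
    ('train_model', 3.21), ('train_model', 1.65), ('train_model', 1.30),
    ('to_array', 0.08), ('to_array', 0.04), ('to_array', 0.05),
    ('multiply', 0.02), ('multiply', 0.03), ('multiply', 0.03),
]

# Group timings by action name.
grouped = defaultdict(list)
for action, t in records:
    grouped[action].append(t)

# The same sum/mean/max aggregation that show_profile_info(per_iter=False,
# detailed=False) reports, sorted by total time in descending order.
stats = {
    action: {'sum': sum(ts), 'mean': sum(ts) / len(ts), 'max': max(ts)}
    for action, ts in grouped.items()
}
for action in sorted(stats, key=lambda a: stats[a]['sum'], reverse=True):
    print(action, stats[action])
```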
In [8]:
# timings for each action
pipeline.show_profile_info(per_iter=False, detailed=False)
Out[8]:
               total_time                      pipeline_time
                      sum      mean       max           sum      mean       max
action
train_model #2  75.821759  1.516435  3.217391     75.503571  1.510071  3.205411
to_array #0      2.835696  0.056714  0.106486      2.546398  0.050928  0.097141
multiply #1      1.692740  0.033855  0.069450      1.509747  0.030195  0.060036
In [9]:
# for each action, show the 2 methods with the largest maximum `ncalls`
pipeline.show_profile_info(per_iter=False, detailed=True, sortby=('ncalls', 'max'), limit=2)
Out[9]:
ncalls
tottime
cumtime
sum
mean
max
sum
mean
max
sum
mean
max
action
id
multiply #1
<built-in method builtins.isinstance>::../../../batchflow/batchflow/components.py::105::find_in_index
409600
8192.00
8192
0.050672
0.001013
0.001615
0.050672
0.001013
0.001615
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index
204800
4096.00
4096
0.111507
0.002230
0.003821
0.111507
0.002230
0.003821
to_array #0
<built-in method builtins.isinstance>::../../../batchflow/batchflow/components.py::105::find_in_index
409600
8192.00
8192
0.069341
0.001387
0.002048
0.069341
0.001387
0.002048
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index
204800
4096.00
4096
0.167002
0.003340
0.009804
0.167002
0.003340
0.009804
train_model #2
<method 'append' of 'list' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::641::parse_cpu_trace
1082936
21658.72
21686
0.151943
0.003039
0.005326
0.151943
0.003039
0.005326
<lambda>::~::0::<method 'sort' of 'list' objects>
541468
10829.36
10843
0.091438
0.001829
0.004184
0.091438
0.001829
0.004184
In [10]:
# timings for each action for each iter
pipeline.show_profile_info(per_iter=True, detailed=False)
Out[10]:
total_time
pipeline_time
batch_id
iter
action
1
train_model #2
3.217391
3.205411
139994263066555
to_array #0
0.081367
0.071100
139994263066552
multiply #1
0.024790
0.021719
139994263066552
2
train_model #2
1.652845
1.647868
139994266422688
to_array #0
0.043760
0.039731
139994266422688
multiply #1
0.029897
0.024791
139994266422688
3
train_model #2
1.298541
1.293026
139994121185768
to_array #0
0.045227
0.040213
139994121185768
multiply #1
0.033502
0.030644
139994121185768
4
train_model #2
1.454606
1.450196
139994266422688
to_array #0
0.046735
0.041521
139994266422688
multiply #1
0.035175
0.032045
139994266422688
5
train_model #2
1.379844
1.375117
139994121545432
to_array #0
0.042979
0.038731
139994121545432
multiply #1
0.039776
0.036961
139994121545432
6
train_model #2
1.470328
1.465981
139994147965584
to_array #0
0.059414
0.056368
139994147965584
multiply #1
0.023794
0.020913
139994147965584
7
train_model #2
1.403927
1.399618
139994114391120
to_array #0
0.040669
0.037654
139994114391120
multiply #1
0.025973
0.023206
139994114391120
8
train_model #2
1.290563
1.286370
139994266422688
to_array #0
0.047976
0.043082
139994266422688
multiply #1
0.028793
0.026014
139994266422688
9
train_model #2
1.557899
1.545445
139994121545432
to_array #0
0.053481
0.045220
139994121545432
multiply #1
0.029404
0.026678
139994121545432
10
train_model #2
1.516459
1.511827
139994120640160
to_array #0
0.100030
0.080075
139994120640160
multiply #1
0.043438
0.039798
139994120640160
...
...
...
...
...
41
train_model #2
1.372318
1.367815
139994085289152
to_array #0
0.053491
0.050423
139994085289152
multiply #1
0.040002
0.037197
139994085289152
42
train_model #2
1.813250
1.801895
139993976315800
to_array #0
0.043776
0.038399
139993976315800
multiply #1
0.021445
0.018358
139993976315800
43
train_model #2
1.713757
1.709046
139994085289152
to_array #0
0.085090
0.081801
139994085289152
multiply #1
0.054966
0.050499
139994085289152
44
train_model #2
1.894864
1.881982
139993976315800
to_array #0
0.091153
0.087466
139993976315800
multiply #1
0.042776
0.038372
139993976315800
45
train_model #2
0.981556
0.977580
139994065303872
to_array #0
0.055165
0.046776
139994065303872
multiply #1
0.031591
0.027191
139994065303872
46
train_model #2
0.995855
0.991259
139993976315800
to_array #0
0.038230
0.031463
139993976315800
multiply #1
0.021815
0.018316
139993976315800
47
train_model #2
1.048226
1.044194
139994085289152
to_array #0
0.045290
0.040356
139994085289152
multiply #1
0.020976
0.018173
139994085289152
48
train_model #2
1.156613
1.152424
139993958521208
to_array #0
0.043842
0.040790
139993958521208
multiply #1
0.021237
0.018435
139993958521208
49
train_model #2
1.118411
1.109987
139994065302360
to_array #0
0.040939
0.036036
139994065302360
multiply #1
0.028280
0.025523
139994065302360
50
train_model #2
1.065028
1.060632
139993976315800
to_array #0
0.062030
0.054732
139993976315800
multiply #1
0.037935
0.032852
139993976315800
150 rows × 3 columns
In [11]:
# for each iteration and each action, show the 3 slowest methods, based on `tottime`
pipeline.show_profile_info(per_iter=True, detailed=True, sortby='tottime', limit=3)
Out[11]:
ncalls
tottime
cumtime
iter
action
id
1
multiply #1
find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp>
4096
0.010762
0.013369
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos
64
0.002169
0.015538
_get::../../../batchflow/batchflow/components.py::145::get
64
0.002132
0.017950
to_array #0
find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp>
4096
0.031077
0.037401
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos
64
0.006795
0.044196
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index
4096
0.004464
0.004464
train_model #2
<method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward
1
1.213085
1.213085
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward
143
1.080935
1.080935
<built-in method batch_norm>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py::1643::batch_norm
143
0.241610
0.241610
2
multiply #1
find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp>
4096
0.011132
0.013721
_get::../../../batchflow/batchflow/components.py::145::get
64
0.003379
0.019765
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos
64
0.002331
0.016052
to_array #0
find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp>
4096
0.015315
0.018649
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos
64
0.003696
0.022345
__array_interface__::~::0::<built-in method numpy.array>
64
0.003527
0.005620
train_model #2
<method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward
1
1.004392
1.004392
__init__::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::641::parse_cpu_trace
10843
0.152933
0.157272
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward
20
0.110850
0.110850
3
multiply #1
find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp>
4096
0.015414
0.018821
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos
64
0.003056
0.021877
_get::../../../batchflow/batchflow/components.py::145::get
64
0.003026
0.025336
to_array #0
find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp>
4096
0.017931
0.021669
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos
64
0.003794
0.025463
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index
4096
0.002563
0.002563
train_model #2
<method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward
1
0.755082
0.755082
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward
20
0.143743
0.143743
parse_cpu_trace::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::259::__exit__
1
0.093924
0.117686
4
multiply #1
find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp>
4096
0.012787
0.015775
_get::../../../batchflow/batchflow/components.py::145::get
64
0.005221
0.024614
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos
64
0.003126
0.018901
...
...
...
...
...
...
47
train_model #2
<method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward
1
0.592086
0.592086
parse_cpu_trace::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::259::__exit__
1
0.111435
0.138595
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward
20
0.105768
0.105768
48
multiply #1
find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp>
4096
0.009514
0.011902
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos
64
0.001712
0.013614
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index
4096
0.001590
0.001590
to_array #0
find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp>
4096
0.017634
0.021265
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos
64
0.004749
0.026014
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index
4096
0.002465
0.002465
train_model #2
<method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward
1
0.734896
0.734896
parse_cpu_trace::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::259::__exit__
1
0.101226
0.124736
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward
20
0.082176
0.082176
49
multiply #1
find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp>
4096
0.012500
0.015302
_get::../../../batchflow/batchflow/components.py::145::get
64
0.002776
0.020858
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos
64
0.002441
0.017743
to_array #0
find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp>
4096
0.015514
0.018998
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos
64
0.003202
0.022200
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index
4096
0.002402
0.002402
train_model #2
<method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward
1
0.573810
0.573810
parse_cpu_trace::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::259::__exit__
1
0.123973
0.154925
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward
20
0.116472
0.116472
50
multiply #1
find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp>
4096
0.015247
0.018710
_get::../../../batchflow/batchflow/components.py::145::get
64
0.003869
0.026167
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos
64
0.003115
0.021825
to_array #0
find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp>
4096
0.024549
0.029653
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos
64
0.005232
0.034885
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index
4096
0.003526
0.003526
train_model #2
<method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward
1
0.641996
0.641996
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward
20
0.101700
0.101700
parse_cpu_trace::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::259::__exit__
1
0.091317
0.115946
450 rows × 3 columns
In [12]:
model = pipeline.m('resnet')
The model has an info property that, unsurprisingly, shows a lot of interesting details about the model itself and the training process:
In [13]:
model.info
##### Config:
{'benchmark': True,
'body': {'encoder': {'blocks': {'base': <class 'batchflow.models.torch.blocks.ResBlock'>,
'bottleneck': False,
'downsample': [False, True, True, True],
'filters': [64, 128, 256, 512],
'layout': 'cnacn',
'n_reps': [2, 2, 2, 2],
'se': False},
'downsample': {'layout': 'p',
'pool_size': 2,
'pool_strides': 2},
'num_stages': 4,
'order': ['skip', 'block']}},
'common': {'data_format': 'channels_first'},
'decay': None,
'device': None,
'head': {'classes': 10,
'dropout_rate': 0.4,
'filters': 10,
'layout': 'Vdf',
'target_shape': (64,),
'units': 10},
'initial_block': {'filters': 64,
'kernel_size': 7,
'layout': 'cnap',
'pool_size': 3,
'pool_strides': 2,
'strides': 2},
'inputs': {'labels': {'classes': 10}, 'targets': {'classes': 10}},
'loss': 'ce',
'microbatch': None,
'optimizer': 'Adam',
'order': ['initial_block', 'body', 'head'],
'output': None,
'placeholder_batch_size': 2,
'predictions': None,
'profile': True,
'sync_frequency': 1,
'train_steps': {'': {'decay': None,
'loss': 'ce',
'n_iters': None,
'optimizer': 'Adam'}}}
##### Devices:
Leading device is cpu
##### Train steps:
{'': {'decay': None,
'initialized': True,
'iter': 50,
'loss': [CrossEntropyLoss()],
'n_iters': None,
'optimizer': Adam (
Parameter Group 0
amsgrad: False
betas: (0.9, 0.999)
eps: 1e-08
lr: 0.001
weight_decay: 0
)}}
##### Additional info:
Input 0 has shape (64, 1, 28, 28)
Target has shape (64,)
Total number of parameters in model: 11175370
Total number of passed training iterations: 50
##### Last iteration params:
{'actual_model_inputs_shape': [(64, 1, 28, 28)],
'actual_model_outputs_shape': (64,),
'microbatch': None,
'steps': 1,
'sync_frequency': 1,
'train_mode': ['']}
As with the pipeline, the model has a profile_info attribute, as well as a show_profile_info method. Depending on the device used (CPU or GPU), the timing columns refer to CPU and/or GPU operations.
In [14]:
# one row for every operation inside the model; limit to 5 rows
model.show_profile_info(per_iter=False, limit=5)
Out[14]:
                             ncalls                  CPU_tottime                      CPU_cumtime                      CPU_tottime_avg
                               sum    mean  max            sum      mean       max           sum      mean       max            sum      mean       max
name
mkldnn_convolution_backward   1000   20.00   20     38.598603  0.771972  1.102136     38.598603  0.771972  1.102136       1.929930  0.038599  0.055107
mkldnn_convolution            1000   20.00   20      6.414861  0.128297  0.219037      6.414861  0.128297  0.219037       0.320743  0.006415  0.010952
sqrt                          3038   62.00   62      3.609079  0.073655  0.117845      3.609079  0.073655  0.117845       0.058211  0.001188  0.001901
native_batch_norm_backward    1000   20.00   20      2.508889  0.050178  0.078185      2.508889  0.050178  0.078185       0.125444  0.002509  0.003909
add_                         10114  202.28  206      1.842055  0.036841  0.056120      1.842055  0.036841  0.056120       0.008953  0.000179  0.000272
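The aggregated table above makes the bottleneck obvious: backward convolutions dominate. As a quick sanity check, we can compute each operation's share of the profiled CPU time and its per-call cost with plain Python (this is not a BatchFlow API; the CPU_tottime sums are rounded from the table above):

```python
# CPU_tottime sums and call counts for the top operations, rounded from the table above
ops = {
    'mkldnn_convolution_backward': {'ncalls': 1000, 'CPU_tottime': 38.60},
    'mkldnn_convolution': {'ncalls': 1000, 'CPU_tottime': 6.41},
    'sqrt': {'ncalls': 3038, 'CPU_tottime': 3.61},
}

total = sum(v['CPU_tottime'] for v in ops.values())
for name, v in sorted(ops.items(), key=lambda kv: kv[1]['CPU_tottime'], reverse=True):
    share = 100 * v['CPU_tottime'] / total
    per_call = v['CPU_tottime'] / v['ncalls']  # corresponds to CPU_tottime_avg
    print(f'{name}: {share:.1f}% of profiled CPU time, {per_call:.4f}s per call')
```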
In [15]:
# for each iteration show 3 of the slowest operations
model.show_profile_info(per_iter=True, limit=3)
Out[15]:
ncalls
CPU_tottime
CPU_cumtime
CPU_tottime_avg
iter
name
0
mkldnn_convolution_backward
20
1.102136
1.102136
0.055107
mkldnn_convolution
20
0.116687
0.116687
0.005834
native_batch_norm_backward
20
0.073042
0.073042
0.003652
1
mkldnn_convolution_backward
20
0.910873
0.910873
0.045544
mkldnn_convolution
20
0.109726
0.109726
0.005486
sqrt
62
0.061738
0.061738
0.000996
2
mkldnn_convolution_backward
20
0.674379
0.674379
0.033719
mkldnn_convolution
20
0.142355
0.142355
0.007118
sqrt
62
0.070446
0.070446
0.001136
3
mkldnn_convolution_backward
20
0.741461
0.741461
0.037073
mkldnn_convolution
20
0.103317
0.103317
0.005166
sqrt
62
0.058764
0.058764
0.000948
4
mkldnn_convolution_backward
20
0.768156
0.768156
0.038408
mkldnn_convolution
20
0.113721
0.113721
0.005686
sqrt
62
0.069744
0.069744
0.001125
5
mkldnn_convolution_backward
20
0.745379
0.745379
0.037269
mkldnn_convolution
20
0.113744
0.113744
0.005687
sqrt
62
0.074473
0.074473
0.001201
6
mkldnn_convolution_backward
20
0.743273
0.743273
0.037164
mkldnn_convolution
20
0.136254
0.136254
0.006813
sqrt
62
0.106651
0.106651
0.001720
7
mkldnn_convolution_backward
20
0.721489
0.721489
0.036074
mkldnn_convolution
20
0.117488
0.117488
0.005874
sqrt
62
0.056416
0.056416
0.000910
8
mkldnn_convolution_backward
20
0.816239
0.816239
0.040812
mkldnn_convolution
20
0.104785
0.104785
0.005239
sqrt
62
0.088693
0.088693
0.001431
9
mkldnn_convolution_backward
20
0.762722
0.762722
0.038136
mkldnn_convolution
20
0.106693
0.106693
0.005335
sqrt
62
0.083834
0.083834
0.001352
...
...
...
...
...
...
40
mkldnn_convolution_backward
20
0.503122
0.503122
0.025156
mkldnn_convolution
20
0.089508
0.089508
0.004475
sqrt
62
0.066431
0.066431
0.001071
41
mkldnn_convolution_backward
20
1.013086
1.013086
0.050654
mkldnn_convolution
20
0.131901
0.131901
0.006595
sqrt
62
0.090812
0.090812
0.001465
42
mkldnn_convolution_backward
20
0.860204
0.860204
0.043010
mkldnn_convolution
20
0.124513
0.124513
0.006226
sqrt
62
0.117845
0.117845
0.001901
43
mkldnn_convolution_backward
20
1.018114
1.018114
0.050906
mkldnn_convolution
20
0.190651
0.190651
0.009533
sqrt
62
0.090844
0.090844
0.001465
44
mkldnn_convolution_backward
20
0.480101
0.480101
0.024005
mkldnn_convolution
20
0.080808
0.080808
0.004040
sqrt
62
0.072511
0.072511
0.001170
45
mkldnn_convolution_backward
20
0.504529
0.504529
0.025226
mkldnn_convolution
20
0.079474
0.079474
0.003974
sqrt
62
0.066213
0.066213
0.001068
46
mkldnn_convolution_backward
20
0.526187
0.526187
0.026309
mkldnn_convolution
20
0.104905
0.104905
0.005245
sqrt
62
0.061077
0.061077
0.000985
47
mkldnn_convolution_backward
20
0.654809
0.654809
0.032740
mkldnn_convolution
20
0.081344
0.081344
0.004067
sqrt
62
0.063444
0.063444
0.001023
48
mkldnn_convolution_backward
20
0.511648
0.511648
0.025582
mkldnn_convolution
20
0.115203
0.115203
0.005760
sqrt
62
0.070366
0.070366
0.001135
49
mkldnn_convolution_backward
20
0.559349
0.559349
0.027967
mkldnn_convolution
20
0.100382
0.100382
0.005019
native_batch_norm_backward
20
0.047145
0.047145
0.002357
150 rows × 4 columns
Content source: analysiscenter/dataset