Profiling BatchFlow code

A profile is a set of statistics that describes how often and for how long various parts of the program executed.

This notebook shows how to profile various parts of BatchFlow, namely pipelines and models.


In [1]:
import sys

sys.path.append("../../..")
from batchflow import B, V, W
from batchflow.opensets import MNIST
from batchflow.models.torch import ResNet18

In [2]:
dataset = MNIST()

To collect information about model training times (both on CPU and GPU), one must set the profile option in the model configuration to True:


In [3]:
model_config = {
    'inputs/labels/classes': 10,
    'loss': 'ce',
    'profile': True,
}

In [4]:
pipeline = (dataset.train.p
                .init_variable('loss_history', [])
                .to_array(channels='first', dtype='float32')
                .multiply(multiplier=1/255., preserve_type=False)
                .init_model('dynamic', ResNet18,
                            'resnet', config=model_config)
                .train_model('resnet',
                             B.images, B.labels,
                             fetches='loss',
                             save_to=V('loss_history', mode='a'))
                )

To gather statistics about how long each action takes, we must also set profile to True inside the run call:


In [5]:
BATCH_SIZE = 64
N_ITERS = 50

pipeline.run(BATCH_SIZE, n_iters=N_ITERS, bar=True, profile=True,
             bar_desc=W(V('loss_history')[-1].format('Loss is {:7.7}')))


Loss is 0.1426592: 100%|██████████| 50/50 [01:22<00:00,  1.29s/it] 
Out[5]:
<batchflow.batchflow.pipeline.Pipeline at 0x7f52f4521048>

Pipeline profiling

First of all, there is an elapsed_time attribute inside every instance of Pipeline: it stores the total time of running the pipeline (accumulated even if the pipeline was run multiple times):


In [6]:
pipeline.elapsed_time


Out[6]:
82.73438310623169

Note that the elapsed_time attribute is created whether or not we set profile to True.

After running with profile=True, the pipeline has a profile_info attribute: this DataFrame holds the collected information:


In [7]:
pipeline.profile_info.head()


Out[7]:
iter total_time pipeline_time ncalls tottime cumtime batch_id start_time
action id
to_array #0 <built-in method _abc._abc_instancecheck>::/home/tsimfer/anaconda3/lib/python3.7/abc.py::137::__instancecheck__ 1 0.081367 0.0711 64 0.000166 0.001459 139994263066552 1.582270e+09
<built-in method _abc._abc_subclasscheck>::/home/tsimfer/anaconda3/lib/python3.7/abc.py::141::__subclasscheck__ 1 0.081367 0.0711 57 0.001096 0.001288 139994263066552 1.582270e+09
<built-in method _operator.index>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/numpy/core/numeric.py::1557::normalize_axis_tuple 1 0.081367 0.0711 128 0.000037 0.000037 139994263066552 1.582270e+09
<built-in method posix.sched_getaffinity>::../../../batchflow/batchflow/decorators.py::18::_workers_count 1 0.081367 0.0711 1 0.000048 0.000048 139994263066552 1.582270e+09
<built-in method builtins.any>::../../../batchflow/batchflow/decorators.py::86::any_action_failed 1 0.081367 0.0711 1 0.000003 0.000026 139994263066552 1.582270e+09

Note that there is detailed information about the exact methods called inside each of the actions. That is a lot of data, and it can give us a precise understanding of which parts of the code are our bottlenecks.

Columns of profile_info:

  • action, iter, batch_id and start_time are pretty self-explanatory
  • id identifies the exact method in great detail: it is a concatenation of method_name, file_name, line_number and callee
  • total_time is the time taken by an action
  • pipeline_time is total_time plus the time spent processing the profiling table at each iteration
  • tottime is the time taken by a method inside the action
  • cumtime is the time taken by a method and all of the methods called inside it

More often than not, though, we don't need such granularity. The Pipeline method show_profile_info provides some handy aggregations:

Note: by default, results are sorted by total_time or tottime, depending on the level of detail.
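
Since profile_info is an ordinary DataFrame, a similar summary can also be built by hand. A minimal sketch (a hypothetical cell, not one of the original ones) that roughly reproduces a per-action summary of total_time:

# total_time is repeated for every profiled method within one action call,
# so keep a single value per (action, iter) pair before aggregating
df = pipeline.profile_info.reset_index()
per_call = df.groupby(['action', 'iter'])['total_time'].first()
per_action = (per_call.groupby(level='action')
                      .agg(['sum', 'mean', 'max'])
                      .sort_values('sum', ascending=False))
print(per_action)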


In [8]:
# timings for each action
pipeline.show_profile_info(per_iter=False, detailed=False)


Out[8]:
total_time pipeline_time
sum mean max sum mean max
action
train_model #2 75.821759 1.516435 3.217391 75.503571 1.510071 3.205411
to_array #0 2.835696 0.056714 0.106486 2.546398 0.050928 0.097141
multiply #1 1.692740 0.033855 0.069450 1.509747 0.030195 0.060036

In [9]:
# for each action, show the 2 most frequently called methods, sorted by maximum `ncalls`
pipeline.show_profile_info(per_iter=False, detailed=True, sortby=('ncalls', 'max'), limit=2)


Out[9]:
ncalls tottime cumtime
sum mean max sum mean max sum mean max
action id
multiply #1 <built-in method builtins.isinstance>::../../../batchflow/batchflow/components.py::105::find_in_index 409600 8192.00 8192 0.050672 0.001013 0.001615 0.050672 0.001013 0.001615
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index 204800 4096.00 4096 0.111507 0.002230 0.003821 0.111507 0.002230 0.003821
to_array #0 <built-in method builtins.isinstance>::../../../batchflow/batchflow/components.py::105::find_in_index 409600 8192.00 8192 0.069341 0.001387 0.002048 0.069341 0.001387 0.002048
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index 204800 4096.00 4096 0.167002 0.003340 0.009804 0.167002 0.003340 0.009804
train_model #2 <method 'append' of 'list' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::641::parse_cpu_trace 1082936 21658.72 21686 0.151943 0.003039 0.005326 0.151943 0.003039 0.005326
<lambda>::~::0::<method 'sort' of 'list' objects> 541468 10829.36 10843 0.091438 0.001829 0.004184 0.091438 0.001829 0.004184

In [10]:
# timings for each action at each iteration
pipeline.show_profile_info(per_iter=True, detailed=False)


Out[10]:
total_time pipeline_time batch_id
iter action
1 train_model #2 3.217391 3.205411 139994263066555
to_array #0 0.081367 0.071100 139994263066552
multiply #1 0.024790 0.021719 139994263066552
2 train_model #2 1.652845 1.647868 139994266422688
to_array #0 0.043760 0.039731 139994266422688
multiply #1 0.029897 0.024791 139994266422688
3 train_model #2 1.298541 1.293026 139994121185768
to_array #0 0.045227 0.040213 139994121185768
multiply #1 0.033502 0.030644 139994121185768
4 train_model #2 1.454606 1.450196 139994266422688
to_array #0 0.046735 0.041521 139994266422688
multiply #1 0.035175 0.032045 139994266422688
5 train_model #2 1.379844 1.375117 139994121545432
to_array #0 0.042979 0.038731 139994121545432
multiply #1 0.039776 0.036961 139994121545432
6 train_model #2 1.470328 1.465981 139994147965584
to_array #0 0.059414 0.056368 139994147965584
multiply #1 0.023794 0.020913 139994147965584
7 train_model #2 1.403927 1.399618 139994114391120
to_array #0 0.040669 0.037654 139994114391120
multiply #1 0.025973 0.023206 139994114391120
8 train_model #2 1.290563 1.286370 139994266422688
to_array #0 0.047976 0.043082 139994266422688
multiply #1 0.028793 0.026014 139994266422688
9 train_model #2 1.557899 1.545445 139994121545432
to_array #0 0.053481 0.045220 139994121545432
multiply #1 0.029404 0.026678 139994121545432
10 train_model #2 1.516459 1.511827 139994120640160
to_array #0 0.100030 0.080075 139994120640160
multiply #1 0.043438 0.039798 139994120640160
... ... ... ... ...
41 train_model #2 1.372318 1.367815 139994085289152
to_array #0 0.053491 0.050423 139994085289152
multiply #1 0.040002 0.037197 139994085289152
42 train_model #2 1.813250 1.801895 139993976315800
to_array #0 0.043776 0.038399 139993976315800
multiply #1 0.021445 0.018358 139993976315800
43 train_model #2 1.713757 1.709046 139994085289152
to_array #0 0.085090 0.081801 139994085289152
multiply #1 0.054966 0.050499 139994085289152
44 train_model #2 1.894864 1.881982 139993976315800
to_array #0 0.091153 0.087466 139993976315800
multiply #1 0.042776 0.038372 139993976315800
45 train_model #2 0.981556 0.977580 139994065303872
to_array #0 0.055165 0.046776 139994065303872
multiply #1 0.031591 0.027191 139994065303872
46 train_model #2 0.995855 0.991259 139993976315800
to_array #0 0.038230 0.031463 139993976315800
multiply #1 0.021815 0.018316 139993976315800
47 train_model #2 1.048226 1.044194 139994085289152
to_array #0 0.045290 0.040356 139994085289152
multiply #1 0.020976 0.018173 139994085289152
48 train_model #2 1.156613 1.152424 139993958521208
to_array #0 0.043842 0.040790 139993958521208
multiply #1 0.021237 0.018435 139993958521208
49 train_model #2 1.118411 1.109987 139994065302360
to_array #0 0.040939 0.036036 139994065302360
multiply #1 0.028280 0.025523 139994065302360
50 train_model #2 1.065028 1.060632 139993976315800
to_array #0 0.062030 0.054732 139993976315800
multiply #1 0.037935 0.032852 139993976315800

150 rows × 3 columns


In [11]:
# for each iteration and each action, show the 3 slowest methods, sorted by `tottime`
pipeline.show_profile_info(per_iter=True, detailed=True, sortby='tottime', limit=3)


Out[11]:
ncalls tottime cumtime
iter action id
1 multiply #1 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.010762 0.013369
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.002169 0.015538
_get::../../../batchflow/batchflow/components.py::145::get 64 0.002132 0.017950
to_array #0 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.031077 0.037401
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.006795 0.044196
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index 4096 0.004464 0.004464
train_model #2 <method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward 1 1.213085 1.213085
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward 143 1.080935 1.080935
<built-in method batch_norm>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py::1643::batch_norm 143 0.241610 0.241610
2 multiply #1 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.011132 0.013721
_get::../../../batchflow/batchflow/components.py::145::get 64 0.003379 0.019765
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.002331 0.016052
to_array #0 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.015315 0.018649
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.003696 0.022345
__array_interface__::~::0::<built-in method numpy.array> 64 0.003527 0.005620
train_model #2 <method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward 1 1.004392 1.004392
__init__::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::641::parse_cpu_trace 10843 0.152933 0.157272
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward 20 0.110850 0.110850
3 multiply #1 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.015414 0.018821
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.003056 0.021877
_get::../../../batchflow/batchflow/components.py::145::get 64 0.003026 0.025336
to_array #0 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.017931 0.021669
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.003794 0.025463
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index 4096 0.002563 0.002563
train_model #2 <method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward 1 0.755082 0.755082
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward 20 0.143743 0.143743
parse_cpu_trace::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::259::__exit__ 1 0.093924 0.117686
4 multiply #1 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.012787 0.015775
_get::../../../batchflow/batchflow/components.py::145::get 64 0.005221 0.024614
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.003126 0.018901
... ... ... ... ... ...
47 train_model #2 <method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward 1 0.592086 0.592086
parse_cpu_trace::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::259::__exit__ 1 0.111435 0.138595
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward 20 0.105768 0.105768
48 multiply #1 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.009514 0.011902
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.001712 0.013614
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index 4096 0.001590 0.001590
to_array #0 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.017634 0.021265
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.004749 0.026014
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index 4096 0.002465 0.002465
train_model #2 <method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward 1 0.734896 0.734896
parse_cpu_trace::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::259::__exit__ 1 0.101226 0.124736
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward 20 0.082176 0.082176
49 multiply #1 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.012500 0.015302
_get::../../../batchflow/batchflow/components.py::145::get 64 0.002776 0.020858
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.002441 0.017743
to_array #0 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.015514 0.018998
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.003202 0.022200
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index 4096 0.002402 0.002402
train_model #2 <method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward 1 0.573810 0.573810
parse_cpu_trace::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::259::__exit__ 1 0.123973 0.154925
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward 20 0.116472 0.116472
50 multiply #1 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.015247 0.018710
_get::../../../batchflow/batchflow/components.py::145::get 64 0.003869 0.026167
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.003115 0.021825
to_array #0 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.024549 0.029653
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.005232 0.034885
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index 4096 0.003526 0.003526
train_model #2 <method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward 1 0.641996 0.641996
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward 20 0.101700 0.101700
parse_cpu_trace::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::259::__exit__ 1 0.091317 0.115946

450 rows × 3 columns

Model profiling


In [12]:
model = pipeline.m('resnet')

There is an info property that, unsurprisingly, shows a lot of interesting details regarding the model itself and the training process:


In [13]:
model.info


##### Config:
{'benchmark': True,
 'body': {'encoder': {'blocks': {'base': <class 'batchflow.models.torch.blocks.ResBlock'>,
                                 'bottleneck': False,
                                 'downsample': [False, True, True, True],
                                 'filters': [64, 128, 256, 512],
                                 'layout': 'cnacn',
                                 'n_reps': [2, 2, 2, 2],
                                 'se': False},
                      'downsample': {'layout': 'p',
                                     'pool_size': 2,
                                     'pool_strides': 2},
                      'num_stages': 4,
                      'order': ['skip', 'block']}},
 'common': {'data_format': 'channels_first'},
 'decay': None,
 'device': None,
 'head': {'classes': 10,
          'dropout_rate': 0.4,
          'filters': 10,
          'layout': 'Vdf',
          'target_shape': (64,),
          'units': 10},
 'initial_block': {'filters': 64,
                   'kernel_size': 7,
                   'layout': 'cnap',
                   'pool_size': 3,
                   'pool_strides': 2,
                   'strides': 2},
 'inputs': {'labels': {'classes': 10}, 'targets': {'classes': 10}},
 'loss': 'ce',
 'microbatch': None,
 'optimizer': 'Adam',
 'order': ['initial_block', 'body', 'head'],
 'output': None,
 'placeholder_batch_size': 2,
 'predictions': None,
 'profile': True,
 'sync_frequency': 1,
 'train_steps': {'': {'decay': None,
                      'loss': 'ce',
                      'n_iters': None,
                      'optimizer': 'Adam'}}}

##### Devices:
Leading device is cpu

##### Train steps:
{'': {'decay': None,
      'initialized': True,
      'iter': 50,
      'loss': [CrossEntropyLoss()],
      'n_iters': None,
      'optimizer': Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.001
    weight_decay: 0
)}}

##### Additional info:
Input 0 has shape (64, 1, 28, 28)
Target has shape (64,)

Total number of parameters in model: 11175370

Total number of passed training iterations: 50

##### Last iteration params:
{'actual_model_inputs_shape': [(64, 1, 28, 28)],
 'actual_model_outputs_shape': (64,),
 'microbatch': None,
 'steps': 1,
 'sync_frequency': 1,
 'train_mode': ['']}

As with the pipeline, the model has a profile_info attribute, as well as a show_profile_info method. Depending on the type of device used (CPU or GPU), the collected table contains the corresponding timing columns.
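
As a quick check of which timing columns were collected in this particular run (a hypothetical cell, not one of the original ones):

print(model.profile_info.columns)  # CPU_* columns are expected here, since the leading device is cpu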


In [14]:
# one row for every operation inside the model; limit output to 5 rows
model.show_profile_info(per_iter=False, limit=5)


Out[14]:
ncalls CPU_tottime CPU_cumtime CPU_tottime_avg
sum mean max sum mean max sum mean max sum mean max
name
mkldnn_convolution_backward 1000 20.00 20 38.598603 0.771972 1.102136 38.598603 0.771972 1.102136 1.929930 0.038599 0.055107
mkldnn_convolution 1000 20.00 20 6.414861 0.128297 0.219037 6.414861 0.128297 0.219037 0.320743 0.006415 0.010952
sqrt 3038 62.00 62 3.609079 0.073655 0.117845 3.609079 0.073655 0.117845 0.058211 0.001188 0.001901
native_batch_norm_backward 1000 20.00 20 2.508889 0.050178 0.078185 2.508889 0.050178 0.078185 0.125444 0.002509 0.003909
add_ 10114 202.28 206 1.842055 0.036841 0.056120 1.842055 0.036841 0.056120 0.008953 0.000179 0.000272

In [15]:
# for each iteration show 3 of the slowest operations
model.show_profile_info(per_iter=True, limit=3)


Out[15]:
ncalls CPU_tottime CPU_cumtime CPU_tottime_avg
iter name
0 mkldnn_convolution_backward 20 1.102136 1.102136 0.055107
mkldnn_convolution 20 0.116687 0.116687 0.005834
native_batch_norm_backward 20 0.073042 0.073042 0.003652
1 mkldnn_convolution_backward 20 0.910873 0.910873 0.045544
mkldnn_convolution 20 0.109726 0.109726 0.005486
sqrt 62 0.061738 0.061738 0.000996
2 mkldnn_convolution_backward 20 0.674379 0.674379 0.033719
mkldnn_convolution 20 0.142355 0.142355 0.007118
sqrt 62 0.070446 0.070446 0.001136
3 mkldnn_convolution_backward 20 0.741461 0.741461 0.037073
mkldnn_convolution 20 0.103317 0.103317 0.005166
sqrt 62 0.058764 0.058764 0.000948
4 mkldnn_convolution_backward 20 0.768156 0.768156 0.038408
mkldnn_convolution 20 0.113721 0.113721 0.005686
sqrt 62 0.069744 0.069744 0.001125
5 mkldnn_convolution_backward 20 0.745379 0.745379 0.037269
mkldnn_convolution 20 0.113744 0.113744 0.005687
sqrt 62 0.074473 0.074473 0.001201
6 mkldnn_convolution_backward 20 0.743273 0.743273 0.037164
mkldnn_convolution 20 0.136254 0.136254 0.006813
sqrt 62 0.106651 0.106651 0.001720
7 mkldnn_convolution_backward 20 0.721489 0.721489 0.036074
mkldnn_convolution 20 0.117488 0.117488 0.005874
sqrt 62 0.056416 0.056416 0.000910
8 mkldnn_convolution_backward 20 0.816239 0.816239 0.040812
mkldnn_convolution 20 0.104785 0.104785 0.005239
sqrt 62 0.088693 0.088693 0.001431
9 mkldnn_convolution_backward 20 0.762722 0.762722 0.038136
mkldnn_convolution 20 0.106693 0.106693 0.005335
sqrt 62 0.083834 0.083834 0.001352
... ... ... ... ... ...
40 mkldnn_convolution_backward 20 0.503122 0.503122 0.025156
mkldnn_convolution 20 0.089508 0.089508 0.004475
sqrt 62 0.066431 0.066431 0.001071
41 mkldnn_convolution_backward 20 1.013086 1.013086 0.050654
mkldnn_convolution 20 0.131901 0.131901 0.006595
sqrt 62 0.090812 0.090812 0.001465
42 mkldnn_convolution_backward 20 0.860204 0.860204 0.043010
mkldnn_convolution 20 0.124513 0.124513 0.006226
sqrt 62 0.117845 0.117845 0.001901
43 mkldnn_convolution_backward 20 1.018114 1.018114 0.050906
mkldnn_convolution 20 0.190651 0.190651 0.009533
sqrt 62 0.090844 0.090844 0.001465
44 mkldnn_convolution_backward 20 0.480101 0.480101 0.024005
mkldnn_convolution 20 0.080808 0.080808 0.004040
sqrt 62 0.072511 0.072511 0.001170
45 mkldnn_convolution_backward 20 0.504529 0.504529 0.025226
mkldnn_convolution 20 0.079474 0.079474 0.003974
sqrt 62 0.066213 0.066213 0.001068
46 mkldnn_convolution_backward 20 0.526187 0.526187 0.026309
mkldnn_convolution 20 0.104905 0.104905 0.005245
sqrt 62 0.061077 0.061077 0.000985
47 mkldnn_convolution_backward 20 0.654809 0.654809 0.032740
mkldnn_convolution 20 0.081344 0.081344 0.004067
sqrt 62 0.063444 0.063444 0.001023
48 mkldnn_convolution_backward 20 0.511648 0.511648 0.025582
mkldnn_convolution 20 0.115203 0.115203 0.005760
sqrt 62 0.070366 0.070366 0.001135
49 mkldnn_convolution_backward 20 0.559349 0.559349 0.027967
mkldnn_convolution 20 0.100382 0.100382 0.005019
native_batch_norm_backward 20 0.047145 0.047145 0.002357

150 rows × 4 columns
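
Since both profiling tables are ordinary DataFrames, they are easy to analyze or plot further. For example, a minimal sketch (assuming matplotlib is installed; not part of the original notebook) that plots per-iteration action timings from the pipeline profile:

import matplotlib.pyplot as plt

# total_time is repeated for every profiled method within one action call,
# so keep a single value per (action, iter) pair
df = pipeline.profile_info.reset_index()
timings = df.groupby(['action', 'iter'])['total_time'].first().unstack('action')

timings.plot(figsize=(10, 4), title='Time per action, by iteration')
plt.xlabel('iteration')
plt.ylabel('seconds')
plt.show()

Such a plot makes it easy to spot iterations where a particular action (here, mostly train_model) slows down.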