Profiling BatchFlow code

A profile is a set of statistics that describes how often and for how long various parts of the program executed.

This notebook shows how to profile various parts of BatchFlow, namely pipelines and models.


In [1]:
import sys

sys.path.append("../../..")
from batchflow import B, V, W
from batchflow.opensets import MNIST
from batchflow.models.torch import ResNet18

In [2]:
dataset = MNIST()

To collect information about model training times (both on CPU and GPU), one must set the profile option in the model configuration to True:


In [3]:
model_config = {
    'inputs/labels/classes': 10,
    'loss': 'ce',
    'profile': True,
}

In [4]:
pipeline = (dataset.train.p
                .init_variable('loss_history', [])
                .to_array(channels='first', dtype='float32')
                .multiply(multiplier=1/255., preserve_type=False)
                .init_model('dynamic', ResNet18,
                            'resnet', config=model_config)
                .train_model('resnet',
                             B.images, B.labels,
                             fetches='loss',
                             save_to=V('loss_history', mode='a'))
                )

To gather statistics about how long each action takes, we must also set profile to True inside the run call:


In [5]:
BATCH_SIZE = 64
N_ITERS = 50

pipeline.run(BATCH_SIZE, n_iters=N_ITERS, bar=True, profile=True,
             bar_desc=W(V('loss_history')[-1].format('Loss is {:7.7}')))


Loss is 0.1426592: 100%|██████████| 50/50 [01:22<00:00,  1.29s/it] 
Out[5]:
<batchflow.batchflow.pipeline.Pipeline at 0x7f52f4521048>

Pipeline profiling

First of all, there is an elapsed_time attribute inside every instance of Pipeline: it stores the total time of running the pipeline (accumulated even if the pipeline was run multiple times):


In [6]:
pipeline.elapsed_time


Out[6]:
82.73438310623169

Note that the elapsed_time attribute is created whether or not we set profile to True.

After running with profile=True, the pipeline has a profile_info attribute: this DataFrame holds the collected information:


In [7]:
pipeline.profile_info.head()


Out[7]:
iter total_time pipeline_time ncalls tottime cumtime batch_id start_time
action id
to_array #0 <built-in method _abc._abc_instancecheck>::/home/tsimfer/anaconda3/lib/python3.7/abc.py::137::__instancecheck__ 1 0.081367 0.0711 64 0.000166 0.001459 139994263066552 1.582270e+09
<built-in method _abc._abc_subclasscheck>::/home/tsimfer/anaconda3/lib/python3.7/abc.py::141::__subclasscheck__ 1 0.081367 0.0711 57 0.001096 0.001288 139994263066552 1.582270e+09
<built-in method _operator.index>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/numpy/core/numeric.py::1557::normalize_axis_tuple 1 0.081367 0.0711 128 0.000037 0.000037 139994263066552 1.582270e+09
<built-in method posix.sched_getaffinity>::../../../batchflow/batchflow/decorators.py::18::_workers_count 1 0.081367 0.0711 1 0.000048 0.000048 139994263066552 1.582270e+09
<built-in method builtins.any>::../../../batchflow/batchflow/decorators.py::86::any_action_failed 1 0.081367 0.0711 1 0.000003 0.000026 139994263066552 1.582270e+09

Note that there is detailed information about the exact methods called inside each of the actions. That is a lot of data, and it can give us a precise understanding of which parts of the code are our bottlenecks.

Columns of profile_info:

  • action, iter, batch_id and start_time are pretty self-explanatory
  • id identifies the exact method in great detail: it is a concatenation of method_name, file_name, line_number and callee
  • total_time is the time taken by an action
  • pipeline_time is total_time plus the time spent processing the profiling table at each iteration
  • tottime is the time taken by a method inside the action
  • cumtime is the time taken by a method and all of the methods called inside it

More often than not, though, we don't need such granularity. The Pipeline method show_profile_info provides some handy aggregations:

Note: by default, results are sorted by total_time or tottime, depending on the level of detail.
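
Since profile_info is an ordinary DataFrame, a similar summary can also be built by hand. A minimal sketch (a hypothetical cell, not one of the original ones) that roughly reproduces a per-action summary of total_time:

# total_time is repeated for every profiled method within one action call,
# so keep a single value per (action, iter) pair before aggregating
df = pipeline.profile_info.reset_index()
per_call = df.groupby(['action', 'iter'])['total_time'].first()
per_action = (per_call.groupby(level='action')
                      .agg(['sum', 'mean', 'max'])
                      .sort_values('sum', ascending=False))
print(per_action)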


In [8]:
# timings for each action
pipeline.show_profile_info(per_iter=False, detailed=False)


Out[8]:
total_time pipeline_time
sum mean max sum mean max
action
train_model #2 75.821759 1.516435 3.217391 75.503571 1.510071 3.205411
to_array #0 2.835696 0.056714 0.106486 2.546398 0.050928 0.097141
multiply #1 1.692740 0.033855 0.069450 1.509747 0.030195 0.060036

In [9]:
# for each action, show the 2 most frequently called methods, sorted by maximum `ncalls`
pipeline.show_profile_info(per_iter=False, detailed=True, sortby=('ncalls', 'max'), limit=2)


Out[9]:
ncalls tottime cumtime
sum mean max sum mean max sum mean max
action id
multiply #1 <built-in method builtins.isinstance>::../../../batchflow/batchflow/components.py::105::find_in_index 409600 8192.00 8192 0.050672 0.001013 0.001615 0.050672 0.001013 0.001615
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index 204800 4096.00 4096 0.111507 0.002230 0.003821 0.111507 0.002230 0.003821
to_array #0 <built-in method builtins.isinstance>::../../../batchflow/batchflow/components.py::105::find_in_index 409600 8192.00 8192 0.069341 0.001387 0.002048 0.069341 0.001387 0.002048
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index 204800 4096.00 4096 0.167002 0.003340 0.009804 0.167002 0.003340 0.009804
train_model #2 <method 'append' of 'list' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::641::parse_cpu_trace 1082936 21658.72 21686 0.151943 0.003039 0.005326 0.151943 0.003039 0.005326
<lambda>::~::0::<method 'sort' of 'list' objects> 541468 10829.36 10843 0.091438 0.001829 0.004184 0.091438 0.001829 0.004184

In [10]:
# timings for each action at each iteration
pipeline.show_profile_info(per_iter=True, detailed=False)


Out[10]:
total_time pipeline_time batch_id
iter action
1 train_model #2 3.217391 3.205411 139994263066555
to_array #0 0.081367 0.071100 139994263066552
multiply #1 0.024790 0.021719 139994263066552
2 train_model #2 1.652845 1.647868 139994266422688
to_array #0 0.043760 0.039731 139994266422688
multiply #1 0.029897 0.024791 139994266422688
3 train_model #2 1.298541 1.293026 139994121185768
to_array #0 0.045227 0.040213 139994121185768
multiply #1 0.033502 0.030644 139994121185768
4 train_model #2 1.454606 1.450196 139994266422688
to_array #0 0.046735 0.041521 139994266422688
multiply #1 0.035175 0.032045 139994266422688
5 train_model #2 1.379844 1.375117 139994121545432
to_array #0 0.042979 0.038731 139994121545432
multiply #1 0.039776 0.036961 139994121545432
6 train_model #2 1.470328 1.465981 139994147965584
to_array #0 0.059414 0.056368 139994147965584
multiply #1 0.023794 0.020913 139994147965584
7 train_model #2 1.403927 1.399618 139994114391120
to_array #0 0.040669 0.037654 139994114391120
multiply #1 0.025973 0.023206 139994114391120
8 train_model #2 1.290563 1.286370 139994266422688
to_array #0 0.047976 0.043082 139994266422688
multiply #1 0.028793 0.026014 139994266422688
9 train_model #2 1.557899 1.545445 139994121545432
to_array #0 0.053481 0.045220 139994121545432
multiply #1 0.029404 0.026678 139994121545432
10 train_model #2 1.516459 1.511827 139994120640160
to_array #0 0.100030 0.080075 139994120640160
multiply #1 0.043438 0.039798 139994120640160
... ... ... ... ...
41 train_model #2 1.372318 1.367815 139994085289152
to_array #0 0.053491 0.050423 139994085289152
multiply #1 0.040002 0.037197 139994085289152
42 train_model #2 1.813250 1.801895 139993976315800
to_array #0 0.043776 0.038399 139993976315800
multiply #1 0.021445 0.018358 139993976315800
43 train_model #2 1.713757 1.709046 139994085289152
to_array #0 0.085090 0.081801 139994085289152
multiply #1 0.054966 0.050499 139994085289152
44 train_model #2 1.894864 1.881982 139993976315800
to_array #0 0.091153 0.087466 139993976315800
multiply #1 0.042776 0.038372 139993976315800
45 train_model #2 0.981556 0.977580 139994065303872
to_array #0 0.055165 0.046776 139994065303872
multiply #1 0.031591 0.027191 139994065303872
46 train_model #2 0.995855 0.991259 139993976315800
to_array #0 0.038230 0.031463 139993976315800
multiply #1 0.021815 0.018316 139993976315800
47 train_model #2 1.048226 1.044194 139994085289152
to_array #0 0.045290 0.040356 139994085289152
multiply #1 0.020976 0.018173 139994085289152
48 train_model #2 1.156613 1.152424 139993958521208
to_array #0 0.043842 0.040790 139993958521208
multiply #1 0.021237 0.018435 139993958521208
49 train_model #2 1.118411 1.109987 139994065302360
to_array #0 0.040939 0.036036 139994065302360
multiply #1 0.028280 0.025523 139994065302360
50 train_model #2 1.065028 1.060632 139993976315800
to_array #0 0.062030 0.054732 139993976315800
multiply #1 0.037935 0.032852 139993976315800

150 rows × 3 columns


In [11]:
# for each iteration and each action, show the 3 slowest methods, sorted by `tottime`
pipeline.show_profile_info(per_iter=True, detailed=True, sortby='tottime', limit=3)


Out[11]:
ncalls tottime cumtime
iter action id
1 multiply #1 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.010762 0.013369
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.002169 0.015538
_get::../../../batchflow/batchflow/components.py::145::get 64 0.002132 0.017950
to_array #0 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.031077 0.037401
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.006795 0.044196
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index 4096 0.004464 0.004464
train_model #2 <method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward 1 1.213085 1.213085
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward 143 1.080935 1.080935
<built-in method batch_norm>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/functional.py::1643::batch_norm 143 0.241610 0.241610
2 multiply #1 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.011132 0.013721
_get::../../../batchflow/batchflow/components.py::145::get 64 0.003379 0.019765
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.002331 0.016052
to_array #0 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.015315 0.018649
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.003696 0.022345
__array_interface__::~::0::<built-in method numpy.array> 64 0.003527 0.005620
train_model #2 <method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward 1 1.004392 1.004392
__init__::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::641::parse_cpu_trace 10843 0.152933 0.157272
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward 20 0.110850 0.110850
3 multiply #1 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.015414 0.018821
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.003056 0.021877
_get::../../../batchflow/batchflow/components.py::145::get 64 0.003026 0.025336
to_array #0 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.017931 0.021669
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.003794 0.025463
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index 4096 0.002563 0.002563
train_model #2 <method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward 1 0.755082 0.755082
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward 20 0.143743 0.143743
parse_cpu_trace::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::259::__exit__ 1 0.093924 0.117686
4 multiply #1 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.012787 0.015775
_get::../../../batchflow/batchflow/components.py::145::get 64 0.005221 0.024614
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.003126 0.018901
... ... ... ... ... ...
47 train_model #2 <method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward 1 0.592086 0.592086
parse_cpu_trace::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::259::__exit__ 1 0.111435 0.138595
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward 20 0.105768 0.105768
48 multiply #1 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.009514 0.011902
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.001712 0.013614
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index 4096 0.001590 0.001590
to_array #0 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.017634 0.021265
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.004749 0.026014
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index 4096 0.002465 0.002465
train_model #2 <method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward 1 0.734896 0.734896
parse_cpu_trace::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::259::__exit__ 1 0.101226 0.124736
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward 20 0.082176 0.082176
49 multiply #1 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.012500 0.015302
_get::../../../batchflow/batchflow/components.py::145::get 64 0.002776 0.020858
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.002441 0.017743
to_array #0 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.015514 0.018998
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.003202 0.022200
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index 4096 0.002402 0.002402
train_model #2 <method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward 1 0.573810 0.573810
parse_cpu_trace::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::259::__exit__ 1 0.123973 0.154925
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward 20 0.116472 0.116472
50 multiply #1 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.015247 0.018710
_get::../../../batchflow/batchflow/components.py::145::get 64 0.003869 0.026167
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.003115 0.021825
to_array #0 find_in_index::../../../batchflow/batchflow/components.py::120::<listcomp> 4096 0.024549 0.029653
<listcomp>::../../../batchflow/batchflow/components.py::113::get_pos 64 0.005232 0.034885
<built-in method numpy.where>::../../../batchflow/batchflow/components.py::105::find_in_index 4096 0.003526 0.003526
train_model #2 <method 'run_backward' of 'torch._C._EngineBase' objects>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py::44::backward 1 0.641996 0.641996
<built-in method conv2d>::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/nn/modules/conv.py::334::conv2d_forward 20 0.101700 0.101700
parse_cpu_trace::/home/tsimfer/anaconda3/lib/python3.7/site-packages/torch/autograd/profiler.py::259::__exit__ 1 0.091317 0.115946

450 rows × 3 columns

Model profiling


In [12]:
model = pipeline.m('resnet')

There is an info property that, unsurprisingly, shows a lot of interesting details regarding the model itself and the training process:


In [13]:
model.info


##### Config:
{'benchmark': True,
 'body': {'encoder': {'blocks': {'base': <class 'batchflow.models.torch.blocks.ResBlock'>,
                                 'bottleneck': False,
                                 'downsample': [False, True, True, True],
                                 'filters': [64, 128, 256, 512],
                                 'layout': 'cnacn',
                                 'n_reps': [2, 2, 2, 2],
                                 'se': False},
                      'downsample': {'layout': 'p',
                                     'pool_size': 2,
                                     'pool_strides': 2},
                      'num_stages': 4,
                      'order': ['skip', 'block']}},
 'common': {'data_format': 'channels_first'},
 'decay': None,
 'device': None,
 'head': {'classes': 10,
          'dropout_rate': 0.4,
          'filters': 10,
          'layout': 'Vdf',
          'target_shape': (64,),
          'units': 10},
 'initial_block': {'filters': 64,
                   'kernel_size': 7,
                   'layout': 'cnap',
                   'pool_size': 3,
                   'pool_strides': 2,
                   'strides': 2},
 'inputs': {'labels': {'classes': 10}, 'targets': {'classes': 10}},
 'loss': 'ce',
 'microbatch': None,
 'optimizer': 'Adam',
 'order': ['initial_block', 'body', 'head'],
 'output': None,
 'placeholder_batch_size': 2,
 'predictions': None,
 'profile': True,
 'sync_frequency': 1,
 'train_steps': {'': {'decay': None,
                      'loss': 'ce',
                      'n_iters': None,
                      'optimizer': 'Adam'}}}

##### Devices:
Leading device is cpu

##### Train steps:
{'': {'decay': None,
      'initialized': True,
      'iter': 50,
      'loss': [CrossEntropyLoss()],
      'n_iters': None,
      'optimizer': Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.001
    weight_decay: 0
)}}

##### Additional info:
Input 0 has shape (64, 1, 28, 28)
Target has shape (64,)

Total number of parameters in model: 11175370

Total number of passed training iterations: 50

##### Last iteration params:
{'actual_model_inputs_shape': [(64, 1, 28, 28)],
 'actual_model_outputs_shape': (64,),
 'microbatch': None,
 'steps': 1,
 'sync_frequency': 1,
 'train_mode': ['']}

As with the pipeline, the model has a profile_info attribute, as well as a show_profile_info method. Depending on the type of device used (CPU or GPU), the collected table contains the corresponding timing columns.
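
As a quick check of which timing columns were collected in this particular run (a hypothetical cell, not one of the original ones):

print(model.profile_info.columns)  # CPU_* columns are expected here, since the leading device is cpu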


In [14]:
# one row for every operation inside the model; limit output to 5 rows
model.show_profile_info(per_iter=False, limit=5)


Out[14]:
ncalls CPU_tottime CPU_cumtime CPU_tottime_avg
sum mean max sum mean max sum mean max sum mean max
name
mkldnn_convolution_backward 1000 20.00 20 38.598603 0.771972 1.102136 38.598603 0.771972 1.102136 1.929930 0.038599 0.055107
mkldnn_convolution 1000 20.00 20 6.414861 0.128297 0.219037 6.414861 0.128297 0.219037 0.320743 0.006415 0.010952
sqrt 3038 62.00 62 3.609079 0.073655 0.117845 3.609079 0.073655 0.117845 0.058211 0.001188 0.001901
native_batch_norm_backward 1000 20.00 20 2.508889 0.050178 0.078185 2.508889 0.050178 0.078185 0.125444 0.002509 0.003909
add_ 10114 202.28 206 1.842055 0.036841 0.056120 1.842055 0.036841 0.056120 0.008953 0.000179 0.000272

In [15]:
# for each iteration show 3 of the slowest operations
model.show_profile_info(per_iter=True, limit=3)


Out[15]:
ncalls CPU_tottime CPU_cumtime CPU_tottime_avg
iter name
0 mkldnn_convolution_backward 20 1.102136 1.102136 0.055107
mkldnn_convolution 20 0.116687 0.116687 0.005834
native_batch_norm_backward 20 0.073042 0.073042 0.003652
1 mkldnn_convolution_backward 20 0.910873 0.910873 0.045544
mkldnn_convolution 20 0.109726 0.109726 0.005486
sqrt 62 0.061738 0.061738 0.000996
2 mkldnn_convolution_backward 20 0.674379 0.674379 0.033719
mkldnn_convolution 20 0.142355 0.142355 0.007118
sqrt 62 0.070446 0.070446 0.001136
3 mkldnn_convolution_backward 20 0.741461 0.741461 0.037073
mkldnn_convolution 20 0.103317 0.103317 0.005166
sqrt 62 0.058764 0.058764 0.000948
4 mkldnn_convolution_backward 20 0.768156 0.768156 0.038408
mkldnn_convolution 20 0.113721 0.113721 0.005686
sqrt 62 0.069744 0.069744 0.001125
5 mkldnn_convolution_backward 20 0.745379 0.745379 0.037269
mkldnn_convolution 20 0.113744 0.113744 0.005687
sqrt 62 0.074473 0.074473 0.001201
6 mkldnn_convolution_backward 20 0.743273 0.743273 0.037164
mkldnn_convolution 20 0.136254 0.136254 0.006813
sqrt 62 0.106651 0.106651 0.001720
7 mkldnn_convolution_backward 20 0.721489 0.721489 0.036074
mkldnn_convolution 20 0.117488 0.117488 0.005874
sqrt 62 0.056416 0.056416 0.000910
8 mkldnn_convolution_backward 20 0.816239 0.816239 0.040812
mkldnn_convolution 20 0.104785 0.104785 0.005239
sqrt 62 0.088693 0.088693 0.001431
9 mkldnn_convolution_backward 20 0.762722 0.762722 0.038136
mkldnn_convolution 20 0.106693 0.106693 0.005335
sqrt 62 0.083834 0.083834 0.001352
... ... ... ... ... ...
40 mkldnn_convolution_backward 20 0.503122 0.503122 0.025156
mkldnn_convolution 20 0.089508 0.089508 0.004475
sqrt 62 0.066431 0.066431 0.001071
41 mkldnn_convolution_backward 20 1.013086 1.013086 0.050654
mkldnn_convolution 20 0.131901 0.131901 0.006595
sqrt 62 0.090812 0.090812 0.001465
42 mkldnn_convolution_backward 20 0.860204 0.860204 0.043010
mkldnn_convolution 20 0.124513 0.124513 0.006226
sqrt 62 0.117845 0.117845 0.001901
43 mkldnn_convolution_backward 20 1.018114 1.018114 0.050906
mkldnn_convolution 20 0.190651 0.190651 0.009533
sqrt 62 0.090844 0.090844 0.001465
44 mkldnn_convolution_backward 20 0.480101 0.480101 0.024005
mkldnn_convolution 20 0.080808 0.080808 0.004040
sqrt 62 0.072511 0.072511 0.001170
45 mkldnn_convolution_backward 20 0.504529 0.504529 0.025226
mkldnn_convolution 20 0.079474 0.079474 0.003974
sqrt 62 0.066213 0.066213 0.001068
46 mkldnn_convolution_backward 20 0.526187 0.526187 0.026309
mkldnn_convolution 20 0.104905 0.104905 0.005245
sqrt 62 0.061077 0.061077 0.000985
47 mkldnn_convolution_backward 20 0.654809 0.654809 0.032740
mkldnn_convolution 20 0.081344 0.081344 0.004067
sqrt 62 0.063444 0.063444 0.001023
48 mkldnn_convolution_backward 20 0.511648 0.511648 0.025582
mkldnn_convolution 20 0.115203 0.115203 0.005760
sqrt 62 0.070366 0.070366 0.001135
49 mkldnn_convolution_backward 20 0.559349 0.559349 0.027967
mkldnn_convolution 20 0.100382 0.100382 0.005019
native_batch_norm_backward 20 0.047145 0.047145 0.002357

150 rows × 4 columns
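
Since both profiling tables are ordinary DataFrames, they are easy to analyze or plot further. For example, a minimal sketch (assuming matplotlib is installed; not part of the original notebook) that plots per-iteration action timings from the pipeline profile:

import matplotlib.pyplot as plt

# total_time is repeated for every profiled method within one action call,
# so keep a single value per (action, iter) pair
df = pipeline.profile_info.reset_index()
timings = df.groupby(['action', 'iter'])['total_time'].first().unstack('action')

timings.plot(figsize=(10, 4), title='Time per action, by iteration')
plt.xlabel('iteration')
plt.ylabel('seconds')
plt.show()

Such a plot makes it easy to spot iterations where a particular action (here, mostly train_model) slows down.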