Performance Comparison

So, now that we've explored a number of optimization schemes, how do they compare?

We start by loading some notebook-wide features. memory_profiler and SnakeViz are tools we will use to profile our code; Bokeh and HoloViews are plotting libraries. The debugger was used in the development of this notebook; hopefully we won't need it here.


In [1]:
import holoviews as hv
hv.extension('bokeh','matplotlib')
from IPython.core import debugger
ist = debugger.set_trace


First we need to gather all of our profiling data.

Disclaimer: I am quite new to pandas and am still learning its idioms, so my usage below may not be ideal. Please let me know (verbally or via PR) if you have a cleaner way to gather these data into a DataFrame. Thanks!
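A quick note on what the `.prof` files below contain: each is a cProfile dump, and `pstats.Stats` exposes the total recorded runtime as its `total_tt` attribute, which is what the gathering loop reads. A minimal, self-contained sketch (with a hypothetical toy workload, not this notebook's simulation) of producing and reading back one such file:

```python
import cProfile
import os
import pstats
import tempfile

# Hypothetical workload, only here to generate some profile data.
def work():
    return sum(i * i for i in range(100_000))

prof_path = os.path.join(tempfile.mkdtemp(), "demo.prof")

# Profile the call and dump the stats to disk, like the .prof files below.
pr = cProfile.Profile()
pr.enable()
work()
pr.disable()
pr.dump_stats(prof_path)

# total_tt is the total time recorded across all profiled calls --
# the same attribute the gathering loop pulls from each .prof file.
total = pstats.Stats(prof_path).total_tt
print(total)  # a small non-negative float, in seconds
```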


In [2]:
import os
import pandas as pd
import pstats


energies = []
names = []
files = [f for f in os.listdir('./energy/') if f[0] != '.']
for file in files:
    names.append(os.path.splitext(os.path.basename(file))[0])
    with open('./energy/' + file,'r') as f:
        energies.append(float(f.readline()))
es = pd.Series(energies,index=names,name='Energy')   


times = []
time_names = []
mem_names = []
mems = []
files = [f for f in os.listdir('./prof/') if f[0] != '.']
for file in files:
    fname = './prof/' + file
    name,ext = os.path.splitext(os.path.basename(file))
    if ext == '.prof':
        time_names.append(name)
        times.append(pstats.Stats(fname).total_tt)
    elif ext == '.memprof':
        mem_names.append(name)
        with open(fname,'r') as f:
            mems.append([float(i) for i in f.readlines()])
ts = pd.Series(times,index=time_names,name='RunTime') 
ms1 = pd.Series([i[0] for i in mems],index=mem_names,name='MaxMem')   
ms2 = pd.Series([i[1] for i in mems],index=mem_names,name='IncMem')   

df = pd.concat([es,ts,ms1,ms2],axis=1).sort_values(by='RunTime')

# Add a relative-runtime column, normalized against the pure-Python run
df.insert(2,column='RelTime',value=df['RunTime']/df.loc['python','RunTime'])
df


Out[2]:
Energy RunTime RelTime MaxMem IncMem
cython3 779.641239 0.091468 0.002722 137.207031 0.000000
cython2 779.641239 0.108092 0.003217 137.125000 0.000000
cython1 779.641239 0.850514 0.025312 136.757812 0.007812
numba2 779.641239 0.875366 0.026051 181.804688 0.000000
numpy2 779.641239 3.293480 0.098015 1546.695312 1051.960938
numpy1 779.641239 4.133790 0.123023 1935.484375 1701.859375
numba3 779.641239 5.057656 0.150517 1487.417969 1018.257812
python 779.641239 33.601806 1.000000 138.398438 0.003906
numba1 779.641239 38.230602 1.137754 179.875000 -0.246094

Finally, we can throw the DataFrame at HoloViews and get nicely formatted bar charts back.


In [3]:
%%opts Bars [xrotation=40,height=400,width=600,show_grid=True,tools=['hover']]
%%opts Bars [fontsize={'ticks':14,'labels':16}] (alpha=0.6)

hvb1 = (hv.Bars(df.reset_index(),kdims=['index'],vdims=['RelTime'])
         .redim.label(RelTime = 'Relative Time',index='Approach'))

hvb2 = (hv.Bars(df.reset_index(),kdims=['index'],vdims=['RunTime'])
        .redim.label(RunTime = 'Run Time',index='Approach')
        .redim.unit(RunTime = 's'))

hvb3 = (hv.Bars(df.reset_index(),kdims=['index'],vdims=['MaxMem'])
        .redim.label(MaxMem = 'Maximum Memory',index='Approach')
        .redim.unit(MaxMem = 'MB'))

(hvb1 + hvb2 + hvb3).cols(1)


Out[3]:
[Three stacked bar charts by Approach: Relative Time, Run Time (s), and Maximum Memory (MB)]
Takeaways

  • Pure Python is very memory efficient, but poor in execution speed.
  • Even a naive Numpy implementation is ~10x faster than pure Python.
  • Numpy also improves code readability and simplicity.
  • Unfortunately, our Numba implementations were always slower than the functions they wrapped.
  • Cython is the clear winner, but demands the most added effort.
  • Top speed is only achieved through proper tuning and profiling.
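As a toy illustration of the pure-Python-vs-Numpy gap (a hypothetical sum-of-squares reduction, not this notebook's energy kernel): both versions compute the same answer, but the vectorized one pushes the loop into compiled code, which is where the order-of-magnitude speedup comes from.

```python
import numpy as np

def total_sq_python(xs):
    # Explicit Python loop: interpreter overhead on every iteration.
    total = 0.0
    for x in xs:
        total += x * x
    return total

def total_sq_numpy(xs):
    # Same reduction, executed as a single compiled NumPy operation.
    return float(np.dot(xs, xs))

xs = np.linspace(0.0, 1.0, 10_000)
# Identical results; timing them (e.g. with %timeit) shows the loop-free
# version is typically 1-2 orders of magnitude faster on arrays this size.
print(total_sq_python(xs), total_sq_numpy(xs))
```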
