Performance Comparison

So, now that we've explored a number of optimization schemes, how do they compare?

We start by loading some notebook-wide features. memory_profiler and SnakeViz are tools we will use to profile our code; Bokeh and HoloViews are plotting libraries. The debugger was used in the development of this notebook; hopefully we won't need it here.


In [1]:
import holoviews as hv
hv.extension('bokeh','matplotlib')
from IPython.core import debugger
ist = debugger.set_trace


First we need to gather all of our profiling data.

Disclaimer: I am quite new to pandas and am still learning its idioms, so my usage below may not be ideal. Please let me know (verbally or via PR) if you have a cleaner way to gather these data into a DataFrame. Thanks!
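A quick note on what the `.prof` files below contain: each is a cProfile dump, and `pstats.Stats` exposes the total recorded runtime as its `total_tt` attribute, which is what the gathering loop reads. A minimal, self-contained sketch (with a hypothetical toy workload, not this notebook's simulation) of producing and reading back one such file:

```python
import cProfile
import os
import pstats
import tempfile

# Hypothetical workload, only here to generate some profile data.
def work():
    return sum(i * i for i in range(100_000))

prof_path = os.path.join(tempfile.mkdtemp(), "demo.prof")

# Profile the call and dump the stats to disk, like the .prof files below.
pr = cProfile.Profile()
pr.enable()
work()
pr.disable()
pr.dump_stats(prof_path)

# total_tt is the total time recorded across all profiled calls --
# the same attribute the gathering loop pulls from each .prof file.
total = pstats.Stats(prof_path).total_tt
print(total)  # a small non-negative float, in seconds
```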


In [2]:
import os
import pandas as pd
import pstats


energies = []
names = []
files = [f for f in os.listdir('./energy/') if f[0] != '.']
for file in files:
    names.append(os.path.splitext(os.path.basename(file))[0])
    with open('./energy/' + file,'r') as f:
        energies.append(float(f.readline()))
es = pd.Series(energies,index=names,name='Energy')   


times = []
time_names = []
mem_names = []
mems = []
files = [f for f in os.listdir('./prof/') if f[0] != '.']
for file in files:
    fname = './prof/' + file
    name,ext = os.path.splitext(os.path.basename(file))
    if ext == '.prof':
        time_names.append(name)
        times.append(pstats.Stats(fname).total_tt)
    elif ext == '.memprof':
        mem_names.append(name)
        with open(fname,'r') as f:
            mems.append([float(i) for i in f.readlines()])
ts = pd.Series(times,index=time_names,name='RunTime') 
ms1 = pd.Series([i[0] for i in mems],index=mem_names,name='MaxMem')   
ms2 = pd.Series([i[1] for i in mems],index=mem_names,name='IncMem')   

df = pd.concat([es,ts,ms1,ms2],axis=1).sort_values(by='RunTime')

# Add a relative-runtime column, normalized against the pure-Python run
df.insert(2,column='RelTime',value=df['RunTime']/df.loc['python','RunTime'])
df


Out[2]:
Energy RunTime RelTime MaxMem IncMem
cython3 779.641239 0.091468 0.002722 137.207031 0.000000
cython2 779.641239 0.108092 0.003217 137.125000 0.000000
cython1 779.641239 0.850514 0.025312 136.757812 0.007812
numba2 779.641239 0.875366 0.026051 181.804688 0.000000
numpy2 779.641239 3.293480 0.098015 1546.695312 1051.960938
numpy1 779.641239 4.133790 0.123023 1935.484375 1701.859375
numba3 779.641239 5.057656 0.150517 1487.417969 1018.257812
python 779.641239 33.601806 1.000000 138.398438 0.003906
numba1 779.641239 38.230602 1.137754 179.875000 -0.246094

Finally, we can throw the DataFrame at HoloViews and get nicely formatted bar charts back.


In [3]:
%%opts Bars [xrotation=40,height=400,width=600,show_grid=True,tools=['hover']]
%%opts Bars [fontsize={'ticks':14,'labels':16}] (alpha=0.6)

hvb1 = (hv.Bars(df.reset_index(),kdims=['index'],vdims=['RelTime'])
         .redim.label(RelTime = 'Relative Time',index='Approach'))

hvb2 = (hv.Bars(df.reset_index(),kdims=['index'],vdims=['RunTime'])
        .redim.label(RunTime = 'Run Time',index='Approach')
        .redim.unit(RunTime = 's'))

hvb3 = (hv.Bars(df.reset_index(),kdims=['index'],vdims=['MaxMem'])
        .redim.label(MaxMem = 'Maximum Memory',index='Approach')
        .redim.unit(MaxMem = 'MB'))

(hvb1 + hvb2 + hvb3).cols(1)


Out[3]:
[Three stacked bar charts by Approach: Relative Time, Run Time (s), and Maximum Memory (MB)]
Takeaways

  • Pure Python is very memory efficient, but poor in execution speed.
  • Even a naive Numpy implementation is ~10x faster than pure Python.
  • Numpy also improves code readability and simplicity.
  • Unfortunately, our Numba implementations were always slower than the functions they wrapped.
  • Cython is the clear winner, but demands the most added effort.
  • Top speed is only achieved through proper tuning and profiling.
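As a toy illustration of the pure-Python-vs-Numpy gap (a hypothetical sum-of-squares reduction, not this notebook's energy kernel): both versions compute the same answer, but the vectorized one pushes the loop into compiled code, which is where the order-of-magnitude speedup comes from.

```python
import numpy as np

def total_sq_python(xs):
    # Explicit Python loop: interpreter overhead on every iteration.
    total = 0.0
    for x in xs:
        total += x * x
    return total

def total_sq_numpy(xs):
    # Same reduction, executed as a single compiled NumPy operation.
    return float(np.dot(xs, xs))

xs = np.linspace(0.0, 1.0, 10_000)
# Identical results; timing them (e.g. with %timeit) shows the loop-free
# version is typically 1-2 orders of magnitude faster on arrays this size.
print(total_sq_python(xs), total_sq_numpy(xs))
```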
