Using timeit

How to compare timed runs to determine which implementation is faster.



In [ ]:

    
import pandas as pd
import numpy as np
import timeit

Create a pandas DataFrame with 1000 random integers in the range 1 <= x < 10. np.random.randint draws them from a discrete uniform distribution.



In [ ]:

    
help('numpy.random.randint')

np.random.normal could be used instead for some statistical fun.



In [ ]:

    
data = pd.DataFrame(data=np.random.randint(1,10,1000),columns=['value'])



In [ ]:

    
data.describe()



In [ ]:

    
np.median(a=data['value'])

Define a setup string. timeit runs it as part of the test case, but before timing begins.



In [ ]:

    
setup = '''
import pandas as pd
import numpy as np
data = pd.DataFrame(data=np.random.randint(1,10,1000),columns=['value'])'''
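To make the role of the setup concrete, here is a minimal sketch showing that setup runs once per timeit() call while the statement runs `number` times. It relies on the fact that timeit.Timer also accepts zero-argument callables for `stmt` and `setup`; the `counter` dictionary and the two helper functions are hypothetical names used only for this illustration.

```python
import timeit

# Hypothetical counters, used only to show how often each part runs.
counter = {"setup": 0, "stmt": 0}

def setup_fn():
    # Called once per timeit() call, before the timed loop starts.
    counter["setup"] += 1

def stmt_fn():
    # Called `number` times inside the timed loop.
    counter["stmt"] += 1

timeit.Timer(stmt=stmt_fn, setup=setup_fn).timeit(number=5)
print(counter)  # {'setup': 1, 'stmt': 5}
```

So the cost of building the data in the setup string is not included in the reported time.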

The statement to time: calculate the median of the column 'value'.



In [ ]:

    
median_statement ='''np.median(a=data['value'])'''

Run timeit with a Timer built from the setup and the median statement. The median statement is executed 100,000 times, and the result is the total elapsed time in seconds.



In [ ]:

    
help('timeit')



In [ ]:

    
timeit.Timer(setup=setup, stmt=median_statement).timeit(number=100000)
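A single total can be noisy, so one common refinement is to repeat the measurement several times and keep the minimum. The sketch below assumes a plain NumPy array instead of the DataFrame to stay self-contained, and picks a smaller loop count (1000) and 5 repeats purely for illustration; `totals` and `best_per_call` are names introduced here.

```python
import timeit

# Same idea as the setup above, but with a plain NumPy array (an assumption
# made to keep this sketch short and self-contained).
setup = '''
import numpy as np
data = np.random.randint(1, 10, 1000)
'''

number = 1000  # loops per measurement, chosen only to keep the sketch quick
timer = timeit.Timer(setup=setup, stmt="np.median(a=data)")

# repeat() returns one total time (in seconds) per repetition.
totals = timer.repeat(repeat=5, number=number)

# The minimum is usually the least disturbed by other activity on the machine.
best_per_call = min(totals) / number
print("best of 5: {:.2f} microseconds per call".format(best_per_call * 1e6))
```

Dividing by `number` turns the total into a per-call time, which is easier to compare across runs that use different loop counts.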

A better test, with less overhead.

The question of interest is how much work the median itself takes, so drop the pandas DataFrame and work directly with a NumPy array. That isolates the difference in time between computing the median and computing the mean.



In [ ]:

    
setup = '''
import numpy as np
data = np.random.randint(1,10,1000)
'''



In [ ]:

    
median_statement = '''np.median(a=data)'''
mean_statement = '''np.mean(a=data)'''

Build a little dictionary holding one Timer per method, each built from the shared setup and its own statement.



In [ ]:

    
timeTestSetup = {'median': timeit.Timer(setup=setup, stmt=median_statement),
                 'mean': timeit.Timer(setup=setup, stmt=mean_statement)}

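Run them both and report the results. Called with no argument, timeit() executes each statement 1,000,000 times, so this cell can take a while.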



In [ ]:

    
for (k,v) in timeTestSetup.items():
    print("Method:\t{}: Time:\t{}".format(k,v.timeit()))
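If the default of one million loops is too slow, one option is to let timeit choose the loop count. The sketch below uses Timer.autorange(), which increases the number of loops until the total run time reaches at least 0.2 seconds, and then reports a per-call time for each method. The `statements` and `per_call` dictionaries are names introduced for this illustration, and the timings themselves will vary by machine.

```python
import timeit

setup = '''
import numpy as np
data = np.random.randint(1, 10, 1000)
'''

# Hypothetical mapping from a label to the statement being timed.
statements = {"median": "np.median(a=data)",
              "mean": "np.mean(a=data)"}

per_call = {}
for name, stmt in statements.items():
    timer = timeit.Timer(setup=setup, stmt=stmt)
    # autorange() returns (number of loops, total time in seconds).
    loops, total = timer.autorange()
    per_call[name] = total / loops
    print("Method:\t{}\tLoops:\t{}\tPer call:\t{:.2f} us".format(
        name, loops, per_call[name] * 1e6))

# Ratio of the two per-call times, for a quick relative comparison.
print("median/mean ratio: {:.2f}".format(per_call["median"] / per_call["mean"]))
```

Because each method may end up with a different loop count, the comparison is made on per-call times rather than on totals.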


