Using timeit

How to time competing implementations and compare the runs to determine which is faster.


In [ ]:
import pandas as pd
import numpy as np
import timeit

Create a pandas DataFrame with 1000 random integers in the range 1 <= x < 10. np.random.randint draws them from a discrete uniform distribution.


In [ ]:
help('numpy.random.randint')

Could also use np.random.normal for some statistical fun.


In [ ]:
data = pd.DataFrame(data=np.random.randint(1,10,1000),columns=['value'])
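
As a sketch of the np.random.normal alternative mentioned above: the same one-column DataFrame can be filled with normally distributed floats instead of uniform integers. The name normal_data and the loc and scale values are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd

# 1000 draws from a normal distribution.
# loc (mean) and scale (standard deviation) are arbitrary illustrative values.
normal_data = pd.DataFrame(
    data=np.random.normal(loc=5.0, scale=2.0, size=1000),
    columns=['value'],
)

# For a symmetric distribution the mean and median should land close together.
print(normal_data['value'].mean(), normal_data['value'].median())
```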

In [ ]:
data.describe()

In [ ]:
np.median(a=data['value'])

Define a setup string that will be run in the test case before timing begins, so its cost is not included in the measurement.


In [ ]:
setup = '''
import pandas as pd
import numpy as np
data = pd.DataFrame(data=np.random.randint(1,10,1000),columns=['value'])'''

Calculate the median of the column 'value' and time how long it takes.


In [ ]:
median_statement ='''np.median(a=data['value'])'''
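
Setup and statement strings are not the only option. A minimal sketch of two alternatives that timeit also accepts: a zero-argument callable, and a statement string combined with the globals argument so that no setup string is needed. The function name median_of_value and the reduced number of 1000 are assumptions chosen to keep the example short.

```python
import timeit

import numpy as np
import pandas as pd

data = pd.DataFrame(data=np.random.randint(1, 10, 1000), columns=['value'])


def median_of_value():
    # Hypothetical helper wrapping the statement under test.
    return np.median(a=data['value'])


# Option 1: pass a callable instead of a statement string.
elapsed_callable = timeit.timeit(median_of_value, number=1000)

# Option 2: keep the statement string, but supply the names it needs
# through the globals argument instead of a setup string.
elapsed_globals = timeit.timeit(
    "np.median(a=data['value'])",
    globals={'np': np, 'data': data},
    number=1000,
)

print(elapsed_callable, elapsed_globals)
```

The callable form adds a function call to every iteration, so its totals are not directly comparable with the string form.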

Run timeit with a Timer built from the setup and the median statement. The median statement will be executed 100000 times, and the result is the total elapsed time in seconds.


In [ ]:
help('timeit')

In [ ]:
timeit.Timer(setup=setup, stmt=median_statement).timeit(number=100000)
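
A single run can be skewed by whatever else the machine is doing. A sketch of a more robust measurement using Timer.repeat, taking the minimum of several runs and dividing by the loop count to get a per-call time. The repeat count of 5 and the reduced number of 1000 are assumptions chosen to keep the runtime short.

```python
import timeit

setup = '''
import pandas as pd
import numpy as np
data = pd.DataFrame(data=np.random.randint(1,10,1000),columns=['value'])'''

median_statement = '''np.median(a=data['value'])'''

number = 1000  # loops per run; far fewer than 100000 to keep this quick
timer = timeit.Timer(setup=setup, stmt=median_statement)

# Five independent runs; each entry is the total seconds for `number` loops.
totals = timer.repeat(repeat=5, number=number)

# The minimum is the run least disturbed by other activity on the machine.
best_per_call = min(totals) / number
print("best of 5: {:.2f} microseconds per call".format(best_per_call * 1e6))
```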

Maybe a better test with less overhead.

The question of interest is how much work the median itself takes, so drop the pandas DataFrame and work directly with a NumPy array. Then look only at the difference in time between computing the median and computing the mean.


In [ ]:
setup = '''
import numpy as np
data = np.random.randint(1,10,1000)
'''

In [ ]:
median_statement = '''np.median(a=data)'''
mean_statement   = '''np.mean  (a=data)'''
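
To see how much overhead the pandas layer actually adds, a sketch that times the same median call on a pandas Series and on the underlying NumPy array. The names setup_both, series and array, and the number of 1000, are illustrative assumptions.

```python
import timeit

# One setup string that builds both containers from the same random values.
setup_both = '''
import numpy as np
import pandas as pd
array = np.random.randint(1, 10, 1000)
series = pd.Series(array, name='value')
'''

number = 1000
series_time = timeit.timeit(stmt="np.median(a=series)", setup=setup_both, number=number)
array_time = timeit.timeit(stmt="np.median(a=array)", setup=setup_both, number=number)

print("Series:\t{:.6f} s\nArray:\t{:.6f} s".format(series_time, array_time))
```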

Build a little dictionary of Timer objects, one for the median statement and one for the mean statement, both using the same setup.


In [ ]:
timeTestSetup = {'median': timeit.Timer(setup=setup, stmt=median_statement),
                 'mean': timeit.Timer(setup=setup, stmt=mean_statement)}

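Run them both and report the results. Called with no arguments, timeit() executes each statement 1000000 times, so the numbers are total seconds for a million calls.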


In [ ]:
for (k,v) in timeTestSetup.items():
    print("Method:\t{}: Time:\t{}".format(k,v.timeit()))
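
Totals for a million calls are hard to read at a glance. A sketch that lets Timer.autorange pick a suitable loop count and then reports a per-call time and the median-to-mean ratio. The names timers and results are assumptions for illustration.

```python
import timeit

setup = '''
import numpy as np
data = np.random.randint(1,10,1000)
'''

timers = {'median': timeit.Timer(setup=setup, stmt='np.median(a=data)'),
          'mean': timeit.Timer(setup=setup, stmt='np.mean(a=data)')}

results = {}
for name, timer in timers.items():
    # autorange() increases the loop count until a run takes at least 0.2 s,
    # then returns (loops, total_seconds).
    loops, total = timer.autorange()
    results[name] = total / loops
    print("Method:\t{}\tPer call:\t{:.2f} us".format(name, results[name] * 1e6))

print("median/mean ratio:\t{:.2f}".format(results['median'] / results['mean']))
```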
