In [ ]:
import pandas as pd
import numpy as np
import timeit
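Create a pandas DataFrame with 1000 random integer values, 1 <= x < 10. np.random.randint draws from a discrete uniform distribution over the half-open interval [low, high), so 10 itself is never returned.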
In [ ]:
help('numpy.random.randint')
Could also use np.random.normal to draw from a normal distribution instead, for some statistical fun.
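As a rough sketch of that alternative, the cell below builds the same kind of DataFrame from normally distributed values. The mean of 5 and standard deviation of 2 are assumptions picked only for illustration, and the name normal_data is hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed parameters: loc (mean) and scale (standard deviation) are illustrative only.
normal_values = np.random.normal(loc=5.0, scale=2.0, size=1000)
normal_data = pd.DataFrame(data=normal_values, columns=['value'])

# Unlike randint, these values are floats and are not bounded to [1, 10).
print(normal_data.describe())
```

With a symmetric distribution like this, the mean and median reported by describe() should land close to each other.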
In [ ]:
# 1000 integers drawn uniformly from 1..9 in a single column named 'value'
data = pd.DataFrame(data=np.random.randint(1, 10, 1000), columns=['value'])
In [ ]:
data.describe()
In [ ]:
np.median(a=data['value'])
Define a setup string. timeit runs it once before timing begins, so its cost is not included in the measurement.
In [ ]:
setup = '''
import pandas as pd
import numpy as np
data = pd.DataFrame(data=np.random.randint(1,10,1000),columns=['value'])'''
Define the statement to time: calculate the median of the column 'value'.
In [ ]:
median_statement ='''np.median(a=data['value'])'''
Run timeit with a Timer built from the setup and the median statement. The median statement is executed 100000 times, and timeit returns the total elapsed time in seconds.
In [ ]:
help('timeit')
In [ ]:
timeit.Timer(setup=setup, stmt=median_statement).timeit(number=100000)
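Because the value returned is a total for all executions, it can help to convert it to a per-call figure. The following is a minimal sketch of that conversion; the loop count of 1000 is an assumption chosen to keep the cell quick, and the variable names are hypothetical.

```python
import timeit

setup = '''
import numpy as np
import pandas as pd
data = pd.DataFrame(data=np.random.randint(1, 10, 1000), columns=['value'])
'''
median_statement = "np.median(a=data['value'])"

number = 1000  # assumed loop count, smaller than the 100000 used above
total_seconds = timeit.Timer(setup=setup, stmt=median_statement).timeit(number=number)

# Divide the total by the loop count to get the average cost of one call.
per_call_us = total_seconds / number * 1e6
print("median: {:.2f} microseconds per call".format(per_call_us))
```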
Maybe a better test with less overhead.
The question of interest is how much effort the median itself takes, so drop the pandas DataFrame and work directly with a NumPy array. That isolates the difference in time between computing the median and the mean.
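To make the difference in effort concrete, here is a hand-rolled sketch of both calculations. It is one possible way to compute them, not a claim about how NumPy implements np.mean or np.median internally; the names manual_mean and manual_median are hypothetical.

```python
import numpy as np

data = np.random.randint(1, 10, 1000)

# Mean: a single pass to sum the values, then one division.
manual_mean = data.sum() / data.size

# Median: the values must be (at least partially) ordered first.
# np.partition puts the requested order statistics in their sorted positions
# without fully sorting the array.
mid = data.size // 2
parts = np.partition(data, [mid - 1, mid])

# The array length is even, so average the two middle values.
manual_median = (parts[mid - 1] + parts[mid]) / 2

print("mean:", manual_mean, "median:", manual_median)
```

The mean needs no ordering at all, which is why the median would be expected to cost more per call.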
In [ ]:
setup = '''
import numpy as np
data = np.random.randint(1,10,1000)
'''
In [ ]:
median_statement = '''np.median(a=data)'''
mean_statement = '''np.mean(a=data)'''
Build a little dictionary with the median and mean setup and statement.
In [ ]:
timeTestSetup = {'median': timeit.Timer(setup=setup, stmt=median_statement),
                 'mean': timeit.Timer(setup=setup, stmt=mean_statement)}
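Run them both and report the results. With no argument, timeit() executes each statement 1,000,000 times and returns the total time in seconds.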
In [ ]:
for (k, v) in timeTestSetup.items():
    # v.timeit() uses the default of 1,000,000 executions
    print("Method:\t{}: Time:\t{}".format(k, v.timeit()))
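A single timing run can be skewed by whatever else the machine is doing. One common approach, sketched here, is to repeat the measurement with timeit.repeat and keep the minimum. The repeat count of 3 and loop count of 10000 are assumptions chosen to keep the cell quick, and the names statements and results are hypothetical.

```python
import timeit

setup = '''
import numpy as np
data = np.random.randint(1, 10, 1000)
'''
statements = {'median': 'np.median(a=data)',
              'mean': 'np.mean(a=data)'}

number = 10000  # assumed loop count per run
results = {}
for name, stmt in statements.items():
    # Each entry in runs is the total time for `number` executions.
    runs = timeit.repeat(stmt=stmt, setup=setup, repeat=3, number=number)
    # The fastest run is the one least disturbed by other activity.
    results[name] = min(runs) / number
    print("Method:\t{}: Time per call:\t{:.3e} s".format(name, results[name]))
```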