Okay, the experiment using readability measures as features has finally finished, so let's analyze the results in this session. First, load our toolkits and tell matplotlib to plot inline.
In [1]:
import numpy as np
import pandas as pd
%matplotlib inline
Load the experiment results and take a look.
In [2]:
df = pd.read_hdf('../reports/large-exp-readability-feats.h5', 'df')
df
Out[2]:
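The rendered frame is omitted in this export, so here is a quick, optional sanity check of its structure if you are following along; it only assumes the index levels used later in the analysis (method, feature, metric, result, num_norm, num_oot, num_top, and some per-thread level that gets averaged out below) are present.
print(df.index.names)  # which index levels are available
print(df.shape)        # number of rows and columns
df.head()              # peek at the first few rows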
Group them all by method, feature, and distance metric (averaging over all threads).
In [3]:
df2 = df.groupby(level=['method', 'feature', 'metric']).mean()
df2
Out[3]:
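If grouping by index levels is new to you: passing level=... to groupby and then taking mean() collapses all the other levels, which is what averages over the threads here. A minimal sketch with made-up numbers:
idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1), ('b', 2)],
                                names=['method', 'thread'])
toy = pd.DataFrame({'score': [0.2, 0.4, 0.6, 0.8]}, index=idx)
toy.groupby(level='method').mean()  # score: a -> 0.3, b -> 0.7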
To make it easier to scroll through, look at its transpose.
In [4]:
df2.T
Out[4]:
Put the baseline and performance distributions side by side.
In [5]:
df3 = df2.T.unstack(level='result')
df3
Out[5]:
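The unstack(level='result') call moves the 'result' index level into the columns, which is what puts 'base' and 'perf' side by side. A toy illustration with made-up values:
idx = pd.MultiIndex.from_tuples([(0, 'base'), (0, 'perf'), (1, 'base'), (1, 'perf')],
                                names=['x', 'result'])
toy = pd.Series([0.1, 0.2, 0.3, 0.4], index=idx)
toy.unstack(level='result')  # columns: base, perf; index: x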
Let's group this result by experiment setting (num_norm, num_oot, num_top) and plot the corresponding baseline and performance distributions.
In [6]:
grouped = df3.groupby(level=['num_norm', 'num_oot', 'num_top'])
In [7]:
for name, group in grouped:
    group.plot(kind='bar', legend=False, use_index=False,
               title='num_norm={}, num_oot={}, num_top={}'.format(*name))
We see from the plots that the performance doesn't differ that much from the baseline. In some cases, it is even worse than the baseline. Let's take a look at their expected values.
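For a discrete distribution stored as a pmf over its support, the expected value is the sum of each support value times its probability mass, E[X] = sum of x * p(x). A minimal sketch on a toy Series whose index is the support (made-up probabilities); the loop in the next cell does the same thing, once per (base, perf) column for every setting:
pmf = pd.Series([0.2, 0.5, 0.3], index=[0, 1, 2])  # P(X=0), P(X=1), P(X=2)
expval = (pmf * np.array(pmf.index)).sum()         # 0*0.2 + 1*0.5 + 2*0.3 = 1.1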
In [8]:
ngroup = len(grouped)
data = np.empty((ngroup, 2))
index = []
for i, (name, _) in enumerate(grouped):
    tmp = df3.loc[name]
    prod = tmp.T * np.array(tmp.index)  # multiply pmf and support
    prod = prod.unstack(level='result')
    expval = prod.sum(axis=1, level='result').values.ravel()  # one E[X] per result (base, perf)
    data[i, :] = expval
    index.append(name)
In [9]:
data
Out[9]:
In [10]:
index = pd.MultiIndex.from_tuples(index, names=['num_norm', 'num_oot', 'num_top'])
columns = pd.MultiIndex.from_tuples([('E[X]', 'base'), ('E[X]', 'perf')])
In [11]:
result = pd.DataFrame(data, index=index, columns=columns)
In [12]:
result
Out[12]:
Now we can see the comparison between the two more clearly.
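To make that comparison explicit rather than eyeballing two columns, one option (a sketch that adds a hypothetical 'diff' column, positive meaning the method beats the baseline) is:
result[('E[X]', 'diff')] = result[('E[X]', 'perf')] - result[('E[X]', 'base')]
result.sort_values(('E[X]', 'diff'), ascending=False)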