Now, we're going to analyze the results from our large experiment. In this experiment, we used 30 iterations for each experiment configuration and unigram frequency as our feature. Unfortunately, the experiment script using readability measures as features is still running, so those results are not available yet.
Let's import our tools, NumPy and pandas, and make matplotlib plot inline.
In [1]:
import pandas as pd
import numpy as np
%matplotlib inline
Read the experiment result and display it.
In [2]:
df = pd.read_hdf('../reports/large-exp-unigram-feats.h5', 'df')
In [3]:
df
Out[3]:
Let's compute the average performance over all threads.
In [4]:
df2 = df.groupby(level=['method', 'feature', 'metric']).mean()
In [5]:
df2
Out[5]:
It is easier to scroll vertically, so let's display its transpose instead.
In [6]:
df2.T
Out[6]:
Let's put the baseline and performance side by side so we can compare them more easily.
In [7]:
df3 = df2.T.unstack(level='result')
In [8]:
df3
Out[8]:
A little explanation: k here denotes the number of OOT posts found in the top list.
The table looks very neat now. However, it is still difficult to compare the baseline and performance distributions this way. So, let's just plot them so we can see the shape of each distribution more clearly.
First, group them by the number of normal posts, OOT posts, and posts in the top list. Each (num_norm, num_oot, num_top) configuration represents a different random event, so we have to plot each of them separately.
In [9]:
grouped = df3.groupby(level=['num_norm', 'num_oot', 'num_top'])
Now, simply plot each group. In the plots, blue and green denote the baseline and actual performance respectively.
In [10]:
for name, group in grouped:
    group.plot(kind='bar', legend=False, use_index=False,
               title='num_norm={}, num_oot={}, num_top={}'.format(*name))
Great! Now we can see that our method, in this case txt_comp_dist, performs better than the baseline most of the time, since its distribution puts higher probability on large values of k than the baseline does.
Now that we have plotted the distributions and are able to draw some conclusions, wouldn't it be nice if we could represent each distribution with a single numerical value and simply see whose value is larger to determine the superior one? That's what we're going to do now. Let's represent each distribution with its expected value: it summarizes where the probability mass is concentrated, so a higher expected value means more OOT posts are found in the top list on average.
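As a quick refresher, for a discrete random variable X with pmf P, the expected value is E[X] = Σ_k k · P(X = k). Here X is the number of OOT posts found in the top list, so computing it amounts to multiplying each probability by its value of k and summing, which is what the loop below does for both the baseline and the actual performance.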
In [11]:
ngroup = len(grouped)
data = np.empty((ngroup, 2))
index = []
for i, (name, _) in enumerate(grouped):
    tmp = df3.loc[name]                 # distributions for this (num_norm, num_oot, num_top) configuration
    prod = tmp.T * np.array(tmp.index)  # multiply pmf and support
    prod = prod.unstack(level='result')
    expval = prod.sum(axis=1, level='result').values.ravel()  # E[X] for baseline and performance
    data[i, :] = expval
    index.append(name)
In [12]:
data
Out[12]:
Doesn't look very nice, eh? Let's create a DataFrame so it can be displayed nicely.
In [13]:
index = pd.MultiIndex.from_tuples(index, names=['num_norm', 'num_oot', 'num_top'])
columns = pd.MultiIndex.from_tuples([('E[X]', 'base'), ('E[X]', 'perf')])
In [14]:
result = pd.DataFrame(data, index=index, columns=columns)
In [15]:
result
Out[15]:
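Since both expected values now live in the same table, we can also check at a glance whether our method beats the baseline in every configuration. A minimal sketch, using the ('E[X]', 'base') and ('E[X]', 'perf') columns defined above:
better = result[('E[X]', 'perf')] > result[('E[X]', 'base')]  # True where the method's expected value exceeds the baseline's
better.all()  # does the method win in every (num_norm, num_oot, num_top) configuration?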
We're done! Remember that in this experiment, we used the txt_comp_dist anomalous text detection method with unigram frequency as features and the Euclidean distance metric.