Now, we're going to analyze the results of our large experiment. In this experiment, we used 30 iterations for each experiment configuration and unigram frequency as our feature. Unfortunately, the experiment script using readability measures as features is still running, so those results are not available yet.

Let's import our tools, NumPy and Pandas, and make matplotlib plot inline.


In [1]:
import pandas as pd
import numpy as np
%matplotlib inline

Read the experiment result and display it.


In [2]:
df = pd.read_hdf('../reports/large-exp-unigram-feats.h5', 'df')

In [3]:
df


Out[3]:
num_norm 10 ... 80
num_oot 1 ... 8
num_top 1 3 5 ... 5
result base perf base perf base ... base perf
k 0 1 0 1 0 1 0 1 0 1 ... 2 3 4 5 0 1 2 3 4 5
method feature metric norm_dir oot_dir
txt_comp_dist unigram euclidean bbs152930 bbs57549 0.909091 0.090909 0.266667 0.733333 0.727273 0.272727 0.066667 0.933333 0.545455 0.454545 ... 0.058722 0.004517 0.000143 0.000001 0.000000 0.100000 0.300000 0.400000 0.200000 0.000000
mus10142 0.909091 0.090909 0.000000 1.000000 0.727273 0.272727 0.066667 0.933333 0.545455 0.454545 ... 0.058722 0.004517 0.000143 0.000001 0.000000 0.000000 0.000000 0.100000 0.900000 0.000000
phy40008 0.909091 0.090909 0.200000 0.800000 0.727273 0.272727 0.100000 0.900000 0.545455 0.454545 ... 0.058722 0.004517 0.000143 0.000001 0.000000 0.266667 0.100000 0.433333 0.200000 0.000000
phy17301 bbs57549 0.909091 0.090909 0.066667 0.933333 0.727273 0.272727 0.000000 1.000000 0.545455 0.454545 ... 0.058722 0.004517 0.000143 0.000001 0.000000 0.166667 0.266667 0.500000 0.066667 0.000000
mus10142 0.909091 0.090909 0.000000 1.000000 0.727273 0.272727 0.000000 1.000000 0.545455 0.454545 ... 0.058722 0.004517 0.000143 0.000001 0.000000 0.000000 0.066667 0.400000 0.466667 0.066667
phy40008 0.909091 0.090909 0.366667 0.633333 0.727273 0.272727 0.066667 0.933333 0.545455 0.454545 ... 0.058722 0.004517 0.000143 0.000001 0.133333 0.166667 0.500000 0.166667 0.033333 0.000000
mus1139 bbs57549 0.909091 0.090909 0.066667 0.933333 0.727273 0.272727 0.166667 0.833333 0.545455 0.454545 ... 0.058722 0.004517 0.000143 0.000001 0.233333 0.333333 0.366667 0.066667 0.000000 0.000000
mus10142 0.909091 0.090909 0.466667 0.533333 0.727273 0.272727 0.300000 0.700000 0.545455 0.454545 ... 0.058722 0.004517 0.000143 0.000001 0.400000 0.166667 0.300000 0.133333 0.000000 0.000000
phy40008 0.909091 0.090909 0.300000 0.700000 0.727273 0.272727 0.133333 0.866667 0.545455 0.454545 ... 0.058722 0.004517 0.000143 0.000001 0.366667 0.300000 0.333333 0.000000 0.000000 0.000000

9 rows × 116 columns

Let's compute the average performance over all threads.


In [4]:
df2 = df.groupby(level=['method', 'feature', 'metric']).mean()
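To see what this groupby does, here is a minimal sketch on a toy two-level frame (the names `t1`, `t2`, and the scores are made up): grouping by the outer level and taking the mean averages away the inner level, just as In [4] averages over the `norm_dir` and `oot_dir` thread levels.

```python
import pandas as pd

# toy frame with a two-level index: (metric, thread)
df_toy = pd.DataFrame(
    {"score": [0.2, 0.4, 0.9]},
    index=pd.MultiIndex.from_tuples(
        [("euclidean", "t1"), ("euclidean", "t2"), ("cosine", "t1")],
        names=["metric", "thread"],
    ),
)
# averaging over threads keeps only the 'metric' level
avg = df_toy.groupby(level="metric").mean()
print(avg)  # euclidean -> 0.3, cosine -> 0.9
```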

In [5]:
df2


Out[5]:
num_norm 10 ... 80
num_oot 1 ... 8
num_top 1 3 5 ... 5
result base perf base perf base ... base perf
k 0 1 0 1 0 1 0 1 0 1 ... 2 3 4 5 0 1 2 3 4 5
method feature metric
txt_comp_dist unigram euclidean 0.909091 0.090909 0.192593 0.807407 0.727273 0.272727 0.1 0.9 0.545455 0.454545 ... 0.058722 0.004517 0.000143 0.000001 0.125926 0.166667 0.248148 0.244444 0.207407 0.007407

1 rows × 116 columns

It is easier to scroll vertically, so let's display its transpose instead.


In [6]:
df2.T


Out[6]:
method txt_comp_dist
feature unigram
metric euclidean
num_norm num_oot num_top result k
10 1 1 base 0 0.909091
1 0.090909
perf 0 0.192593
1 0.807407
3 base 0 0.727273
1 0.272727
perf 0 0.100000
1 0.900000
5 base 0 0.545455
1 0.454545
perf 0 0.051852
1 0.948148
4 1 base 0 0.714286
1 0.285714
perf 0 0.133333
1 0.866667
3 base 0 0.329670
1 0.494505
2 0.164835
3 0.010989
perf 0 0.092593
1 0.229630
2 0.255556
3 0.422222
5 base 0 0.125874
1 0.419580
2 0.359640
3 0.089910
4 0.004995
perf 0 0.011111
... ... ... ... ... ...
80 4 5 base 4 0.000003
perf 0 0.196296
1 0.229630
2 0.285185
3 0.214815
4 0.074074
8 1 base 0 0.909091
1 0.090909
perf 0 0.537037
1 0.462963
3 base 0 0.748706
1 0.230371
2 0.020413
3 0.000510
perf 0 0.133333
1 0.270370
2 0.507407
3 0.088889
5 base 0 0.613645
1 0.322971
2 0.058722
3 0.004517
4 0.000143
5 0.000001
perf 0 0.125926
1 0.166667
2 0.248148
3 0.244444
4 0.207407
5 0.007407

116 rows × 1 columns

Let's put the baseline and performance side by side so we can compare them more easily.
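`unstack` pivots one index level into the columns; a minimal sketch on a toy Series (the labels `a` and `b` are made up) showing how the `result` level becomes `base` and `perf` columns:

```python
import pandas as pd

# toy Series indexed by (config, result)
s = pd.Series(
    [0.9, 0.1, 0.7, 0.3],
    index=pd.MultiIndex.from_tuples(
        [("a", "base"), ("a", "perf"), ("b", "base"), ("b", "perf")],
        names=["config", "result"],
    ),
)
# pivot the 'result' level into columns: one 'base' and one 'perf' column
side_by_side = s.unstack(level="result")
print(side_by_side)
```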


In [7]:
df3 = df2.T.unstack(level='result')

In [8]:
df3


Out[8]:
method txt_comp_dist
feature unigram
metric euclidean
result base perf
num_norm num_oot num_top k
10 1 1 0 0.909091 0.192593
1 0.090909 0.807407
3 0 0.727273 0.100000
1 0.272727 0.900000
5 0 0.545455 0.051852
1 0.454545 0.948148
4 1 0 0.714286 0.133333
1 0.285714 0.866667
3 0 0.329670 0.092593
1 0.494505 0.229630
2 0.164835 0.255556
3 0.010989 0.422222
5 0 0.125874 0.011111
1 0.419580 0.137037
2 0.359640 0.225926
3 0.089910 0.300000
4 0.004995 0.325926
8 1 0 0.555556 0.100000
1 0.444444 0.900000
3 0 0.147059 0.033333
1 0.441176 0.200000
2 0.343137 0.333333
3 0.068627 0.433333
5 0 0.029412 0.011111
1 0.196078 0.088889
2 0.392157 0.162963
3 0.294118 0.333333
4 0.081699 0.259259
5 0.006536 0.144444
80 1 1 0 0.987654 0.844444
1 0.012346 0.155556
3 0 0.962963 0.507407
1 0.037037 0.492593
5 0 0.938272 0.470370
1 0.061728 0.529630
4 1 0 0.952381 0.637037
1 0.047619 0.362963
3 0 0.862264 0.240741
1 0.132656 0.411111
2 0.005038 0.325926
3 0.000042 0.022222
5 0 0.778699 0.196296
1 0.204921 0.229630
2 0.015968 0.285185
3 0.000409 0.214815
4 0.000003 0.074074
8 1 0 0.909091 0.537037
1 0.090909 0.462963
3 0 0.748706 0.133333
1 0.230371 0.270370
2 0.020413 0.507407
3 0.000510 0.088889
5 0 0.613645 0.125926
1 0.322971 0.166667
2 0.058722 0.248148
3 0.004517 0.244444
4 0.000143 0.207407
5 0.000001 0.007407

A little explanation: k here denotes the number of OOT posts found in the top list.
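The `base` column appears to be the chance-level (hypergeometric) distribution of picking `num_top` posts uniformly at random from the `num_norm + num_oot` posts: P(k) = C(num_oot, k) C(num_norm, num_top - k) / C(num_norm + num_oot, num_top). A quick check with a helper of my own (not from the experiment code) against the values in the table:

```python
from math import comb

def baseline_pmf(k, num_norm, num_oot, num_top):
    # probability of exactly k OOT posts among num_top posts drawn at random
    return comb(num_oot, k) * comb(num_norm, num_top - k) / comb(num_norm + num_oot, num_top)

print(baseline_pmf(0, 10, 1, 1))  # ≈ 0.909091, matching base at num_norm=10, num_oot=1, num_top=1
print(baseline_pmf(0, 10, 4, 3))  # ≈ 0.329670, matching base at num_norm=10, num_oot=4, num_top=3
```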

The table looks very neat now. However, it is still difficult to compare the baseline and performance distributions this way. So let's plot them to see the shape of each distribution more clearly.

First, group them by the number of normal posts, OOT posts, and posts in the top list. Each (num_norm, num_oot, num_top) configuration represents a different random event, so we have to plot each of them separately.


In [9]:
grouped = df3.groupby(level=['num_norm', 'num_oot', 'num_top'])

Now, simply plot each group. In the plots, blue and green denote the baseline and the actual performance respectively.


In [10]:
for name, group in grouped:
    group.plot(kind='bar', legend=False, use_index=False, title='num_norm={}, num_oot={}, num_top={}'.format(*name))


Great! Now we can see that our method, in this case txt_comp_dist, performs better than the baseline most of the time, since its distribution puts higher probability on large values of k than the baseline does.

We have now plotted the distributions and can draw some conclusions, but wouldn't it be nice if we could represent each distribution with a single numerical value and compare those values to determine the superior one? That's what we're going to do now. Let's represent each distribution with its expected value. (why?)
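As a sanity check on what we are about to compute, the expected value of a pmf p over support k is E[X] = Σ k·p(k). A minimal sketch using the (num_norm=10, num_oot=1, num_top=1) values from the table above:

```python
# pmf values copied from the num_norm=10, num_oot=1, num_top=1 rows above
pmf_base = {0: 0.909091, 1: 0.090909}
pmf_perf = {0: 0.192593, 1: 0.807407}

def expected_value(pmf):
    # E[X] = sum over k of k * p(k)
    return sum(k * p for k, p in pmf.items())

print(expected_value(pmf_base))  # ≈ 0.090909
print(expected_value(pmf_perf))  # ≈ 0.807407
```

These match the first row of the array computed in the next cell.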


In [11]:
ngroup = len(grouped)
data = np.empty((ngroup, 2))   # one row per configuration: E[X] for base and perf
index = []
for i, (name, _) in enumerate(grouped):
    tmp = df3.loc[name]
    prod = tmp.T * np.array(tmp.index)   # multiply pmf by support: k * p(k)
    prod = prod.unstack(level='result')
    expval = prod.sum(axis=1, level='result').values.ravel()   # sum over k for each result
    data[i, :] = expval
    index.append(name)

In [12]:
data


Out[12]:
array([[ 0.09090909,  0.80740741],
       [ 0.27272727,  0.9       ],
       [ 0.45454545,  0.94814815],
       [ 0.28571429,  0.86666667],
       [ 0.85714286,  2.00740741],
       [ 1.42857143,  2.79259259],
       [ 0.44444444,  0.9       ],
       [ 1.33333333,  2.16666667],
       [ 2.22222222,  3.17407407],
       [ 0.01234568,  0.15555556],
       [ 0.03703704,  0.49259259],
       [ 0.0617284 ,  0.52962963],
       [ 0.04761905,  0.36296296],
       [ 0.14285714,  1.12962963],
       [ 0.23809524,  1.74074074],
       [ 0.09090909,  0.46296296],
       [ 0.27272727,  1.55185185],
       [ 0.45454545,  2.26296296]])

Doesn't look very nice, eh? Let's create a DataFrame so it can be displayed nicely.


In [13]:
index = pd.MultiIndex.from_tuples(index, names=['num_norm', 'num_oot', 'num_top'])
columns = pd.MultiIndex.from_tuples([('E[X]', 'base'), ('E[X]', 'perf')])

In [14]:
result = pd.DataFrame(data, index=index, columns=columns)

In [15]:
result


Out[15]:
E[X]
base perf
num_norm num_oot num_top
10 1 1 0.090909 0.807407
3 0.272727 0.900000
5 0.454545 0.948148
4 1 0.285714 0.866667
3 0.857143 2.007407
5 1.428571 2.792593
8 1 0.444444 0.900000
3 1.333333 2.166667
5 2.222222 3.174074
80 1 1 0.012346 0.155556
3 0.037037 0.492593
5 0.061728 0.529630
4 1 0.047619 0.362963
3 0.142857 1.129630
5 0.238095 1.740741
8 1 0.090909 0.462963
3 0.272727 1.551852
5 0.454545 2.262963

We're done! Remember that in this experiment, we used the txt_comp_dist anomalous text detection method with unigram frequency as features and the euclidean distance metric.