Comparing on multiple data sets

Let us go back to our problem

  • compute the mean difference of accuracy (nbc-aode) for each dataset

In [1]:
using Distributions, Gadfly, Compose, DataFrames
include("/home/benavoli/Data/Work_Julia/Tutorial/Julia/Plots/plot_simplex.jl")
include("/home/benavoli/Data/Work_Julia/Julia/Plots/plot_data1.jl")
include("/home/benavoli/Data/Work_Julia/Tutorial/Julia/Tests/Bsigntest.jl")
ClassID = readdlm("/home/benavoli/Data/Work_Julia/Tutorial/Julia/Data/ClassifierID.dat", ',')
ClassNames = readdlm("/home/benavoli/Data/Work_Julia/Tutorial/Julia/Data/ClassifierNames.dat", ',')
DatasetID = readdlm("/home/benavoli/Data/Work_Julia/Tutorial/Julia/Data/DatasetID.dat", ',');
DatasetNames = readdlm("/home/benavoli/Data/Work_Julia/Tutorial/Julia/Data/DatasetNames.dat", ',');
Percent_correct = readdlm("/home/benavoli/Data/Work_Julia/Tutorial/Julia/Data/Percent_correct.dat", ',');

cl1a=1 #nbc
cl2a=2 #aode
println("Comparison of ", ClassNames[cl1a,1], " vs. ", ClassNames[cl2a,1])
println()

#Compute mean accuracy
indi=find(x->x==cl1a,ClassID)
indj=find(x->x==cl2a,ClassID)
accNbc=Float64[]
accAode=Float64[]
for d=1:Int32(maximum(DatasetID))
    indd=find(x->x==d,DatasetID)
    indid=intersect(indi,indd)
    indjd=intersect(indj,indd)
    push!(accNbc,mean(Percent_correct[indid])/100)
    push!(accAode,mean(Percent_correct[indjd])/100)
end


Comparison of nbc vs. aode

NBC - AODE


In [14]:
using HypothesisTests
p=plot_data1(cl1a,cl2a,"all datasets",accNbc-accAode,-0.12,0.08)
display(p) 
p1=pvalue(SignedRankTest(accNbc,accAode))
p2=pvalue(SignTest(accNbc,accAode))
@printf "p-value SignedRank=%0.8f    " p1 
@printf "p-value Sign=%0.8f\n" p2


[Figure: histogram of DeltaAcc (nbc - aode) across the datasets]
p-value SignedRank=0.00000163    p-value Sign=0.00000040

Bayesian Sign Test

The Bayesian sign test reduces to computing

$$ \mathcal{P}\Big(P(0,\infty)>P(-\infty,0]\Big) =\mathcal{P}\Big(P(0,\infty)>0.5\Big) $$

where $\mathcal{P}$ is computed w.r.t. $(w_0,w_1,\dots,w_n)\sim Dir(s,1,\dots,1)$ and $P_0 \sim Dp(s,\alpha)$
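
To make this query concrete, here is a minimal Monte Carlo sketch, assuming prior strength $s=1$ and a prior measure concentrated on a pseudo-observation at zero; the actual Bsigntest.jl implementation may make different choices:

using Distributions
# Minimal sketch of the Bayesian sign test (no rope): estimate 𝒫(P(0,∞)>0.5)
function bayes_sign_sketch(x, y; s=1.0, nsamples=50000)
    z = x - y                                 # per-dataset differences
    n = length(z)
    hits = 0
    for k in 1:nsamples
        w = rand(Dirichlet([s; ones(n)]))     # (w0,w1,...,wn) ~ Dir(s,1,...,1)
        # P(0,∞) = w0*P0(0,∞) + Σ wi I(zi>0); with P0 = δ_0 the prior term is zero
        hits += sum(w[2:end][z .> 0]) > 0.5
    end
    return hits/nsamples
end
# e.g. bayes_sign_sketch(accNbc, accAode)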

Statistically significantly better

When the p-value is greater than 0.05, the NHST cannot conclude anything.

What does a $p$-value of $0.99$ mean?

(two-sided p-value)


In [3]:
#Bayesian Sign Test 
datacla=Bsigntest(accNbc,accAode,0)
#Some plot code
df1 = DataFrame(x=datacla[2,:][:]); p1=plot(df1, x=:x, Geom.histogram,Coord.Cartesian(xmin=0,xmax=1),Theme(major_label_font_size=13pt,minor_label_font_size=12pt,key_label_font_size=11pt),Guide.xlabel("P(nbc-aode)"))
set_default_plot_size(15cm, 8cm); display(p1)
prob=length(find(z->z> 0.5,datacla[2,:][:]))/length(datacla[2,:][:])
println("Probability that P(nbc-aode)>0.5 is $prob")


[Figure: histogram of the posterior samples of P(nbc-aode)]
Probability that P(nbc-aode)>0.5 is 0.0
This is a very rough query to the posterior!

The cassock does not necessarily make the priest!

Thinking in colour

Comparing nbc and aode through Bayesian analysis:

ROPE

Can we say anything about the probability that nbc is practically equivalent to aode?

  • we first define the meaning of “practically equivalent”.
  • two classifiers whose mean difference of accuracy is less than $1\%$ are practically equivalent;
  • the interval [−0.01, 0.01] thus defines a region of practical equivalence (rope).

Bayesian Sign Test with Rope

Let us ask the posterior

$$ P(-\infty,-r), ~~~ P[-r,r],~~~ P(r,\infty) $$

This is equal to

$$ \begin{array}{rclcl} P(-\infty,-r)&=&w_0 P_0(-\infty,-r)&+& \sum_{i=1}^n w_i I_{(-\infty,-r)}(x_i)\\ P[-r,r]&=&w_0 P_0[-r,r]&+& \sum_{i=1}^n w_i I_{[-r,r]}(x_i)\\ P(r,\infty)&=&w_0 P_0(r,\infty)&+& \sum_{i=1}^n w_i I_{(r,\infty)}(x_i) \end{array} $$
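
Continuing the sketch above (same assumptions: $s=1$ and a prior pseudo-observation at zero, which therefore falls inside the rope), one posterior draw of the three probabilities can be computed as follows; Bsigntest.jl may organise this differently:

using Distributions
function sign_rope_sample(z, r; s=1.0)
    n = length(z)
    w = rand(Dirichlet([s; ones(n)]))             # (w0,w1,...,wn) ~ Dir(s,1,...,1)
    left  = sum(w[2:end][z .< -r])                # P(-∞,-r)
    rope  = w[1] + sum(w[2:end][abs(z) .<= r])    # P[-r,r]  (w0 sits at zero)
    right = sum(w[2:end][z .> r])                 # P(r,∞)
    return (left, rope, right)
end
# e.g. one draw for nbc vs. aode: sign_rope_sample(accNbc-accAode, 0.01)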

In [4]:
#Bayesian Sign Test with rope
rope=0.01
datacla=Bsigntest(accNbc,accAode,rope)
#Plot
dfa = DataFrame(x=datacla[1,:][:])
pl0=plot(dfa, x=:x, Geom.histogram,Theme(major_label_font_size=13pt,minor_label_font_size=12pt,key_label_font_size=11pt),Guide.xlabel("P(nbc \U300A aode)"))
dfb = DataFrame(x=datacla[2,:][:])
pc0=plot(dfb, x=:x, Geom.histogram,Theme(major_label_font_size=13pt,minor_label_font_size=12pt,key_label_font_size=11pt),Guide.xlabel("P(nbc=aode)"))
dfc = DataFrame(x=datacla[3,:][:])
pr0=plot(dfc, x=:x, Geom.histogram,Theme(major_label_font_size=13pt,minor_label_font_size=12pt,key_label_font_size=11pt),Guide.xlabel("P(nbc \U300B aode)"))
set_default_plot_size(40cm, 12cm)
display(hstack(pl0,pc0,pr0))


[Figure: posterior histograms of P(nbc 《 aode), P(nbc = aode) and P(nbc 》 aode)]

In [5]:
ClassID = readdlm("/home/benavoli/Data/Work_Julia/Tutorial/Julia/Data/ClassifierID.dat", ',')
ClassNames = readdlm("/home/benavoli/Data/Work_Julia/Tutorial/Julia/Data/ClassifierNames.dat", ',')
DatasetID = readdlm("/home/benavoli/Data/Work_Julia/Tutorial/Julia/Data/DatasetID.dat", ',');
DatasetNames = readdlm("/home/benavoli/Data/Work_Julia/Tutorial/Julia/Data/DatasetNames.dat", ',');
Percent_correct = readdlm("/home/benavoli/Data/Work_Julia/Tutorial/Julia/Data/Percent_correct.dat", ',');

cl1b=4 #j48
cl2b=5 #j48gr
println("Comparison of ", ClassNames[cl1b,1], " vs. ", ClassNames[cl2b,1])
println()

#Compute mean accuracy
indi=find(x->x==cl1b,ClassID)
indj=find(x->x==cl2b,ClassID)
accJ48=Float64[]
accJ48gr=Float64[]
for d=1:Int32(maximum(DatasetID))
    indd=find(x->x==d,DatasetID)
    indid=intersect(indi,indd)
    indjd=intersect(indj,indd)
    push!(accJ48,mean(Percent_correct[indid])/100)
    push!(accJ48gr,mean(Percent_correct[indjd])/100)
end


Comparison of j48 vs. j48gr

Simplex

How can we represent a trinomial probability?

$$ \theta_1=P(-\infty,-r), ~~~ \theta_2=P(r,\infty),~~~ \theta_3=P[-r,r] $$
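
A triplet $(\theta_1,\theta_2,\theta_3)$ that sums to one is a point in the probability simplex, i.e. a triangle, so every posterior sample can be drawn as a point inside it. A generic sketch of the mapping (plot_simplex.jl may use a different vertex layout):

# Map a point of the 3-simplex to 2-D coordinates in an equilateral triangle
function barycentric(theta)
    vx = [0.0, 1.0, 0.5]            # x-coordinates of the three vertices
    vy = [0.0, 0.0, sqrt(3)/2]      # y-coordinates of the three vertices
    x = theta[1]*vx[1] + theta[2]*vx[2] + theta[3]*vx[3]
    y = theta[1]*vy[1] + theta[2]*vy[2] + theta[3]*vy[3]
    return (x, y)
end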

In [6]:
ptrianglea=plot_simplex(datacla, ClassNames[cl1a],ClassNames[cl2a])
set_default_plot_size(12.5cm, 10cm)
display(ptrianglea)


[Figure: posterior samples on the probability simplex with vertices nbc, aode and rope]

Query: we can, for example, ask for the probability that nbc is practically worse than aode, i.e. that the region to the left of the rope dominates the other two: $$ \mathcal{P}\big(P(-\infty,-r)>P(r,\infty)+P[-r,r]\big) $$
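
A sketch of how this query can be read off the posterior samples computed above, assuming (as in the histograms) that the rows of datacla are ordered left / rope / right:

probLeft = mean(datacla[1,:][:] .> datacla[2,:][:] + datacla[3,:][:])
println("Probability that nbc is practically worse than aode: $probLeft")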

J48 - J48gr (only 30 datasets)


In [7]:
using HypothesisTests
p1=pvalue(SignedRankTest(accJ48[20:50],accJ48gr[20:50]))
p2=pvalue(SignTest(accJ48[20:50],accJ48gr[20:50]))
p=plot_data1(cl1b,cl2b,"datasets 20:50",accJ48[20:50]-accJ48gr[20:50],-0.013,0.011)
display(p)  
@printf "p-value SignedRank=%0.4f  " p1 
@printf "p-value Sign=%0.4f\n" p2


[Figure: histogram of DeltaAcc (J48 - J48gr) for the selected datasets]
p-value SignedRank=0.0822  p-value Sign=0.6636

In [8]:
#Bayesian Sign Test with rope
rope=0.01
dataclb=Bsigntest(accJ48[20:50],accJ48gr[20:50],rope)
#Plot
dfa = DataFrame(x=dataclb[1,:][:])
pl=plot(dfa, x=:x, Geom.histogram,Theme(major_label_font_size=13pt,minor_label_font_size=12pt,key_label_font_size=11pt),Guide.xlabel("P(J48 \U300A J48gr)"))
dfb = DataFrame(x=dataclb[2,:][:])
pc=plot(dfb, x=:x, Geom.histogram,Theme(major_label_font_size=13pt,minor_label_font_size=12pt,key_label_font_size=11pt),Guide.xlabel("P(J48=J48gr)"))
dfc = DataFrame(x=dataclb[3,:][:])
pr=plot(dfc, x=:x, Geom.histogram,Theme(major_label_font_size=13pt,minor_label_font_size=12pt,key_label_font_size=11pt),Guide.xlabel("P(J48 \U300B J48gr)"))
set_default_plot_size(40cm, 12cm)
display(hstack(pl,pc,pr))


[Figure: posterior histograms of P(J48 《 J48gr), P(J48 = J48gr) and P(J48 》 J48gr)]

In [9]:
ptriangleb=plot_simplex(dataclb, ClassNames[cl1b],ClassNames[cl2b])
set_default_plot_size(12.5cm, 10cm)
display(ptriangleb)


[Figure: posterior samples on the probability simplex with vertices j48, j48gr and rope]

J48 - J48gr all datasets (cheating?)


In [10]:
using HypothesisTests
p1=pvalue(SignedRankTest(accJ48,accJ48gr))
p2=pvalue(SignTest(accJ48,accJ48gr))
p=plot_data1(cl1b,cl2b,"all datasets",accJ48-accJ48gr,-0.013,0.011)
display(p)  
@printf "p-value SignedRank=%0.4f  " p1 
@printf "p-value Sign=%0.4f\n" p2


[Figure: histogram of DeltaAcc (J48 - J48gr) across all datasets]
p-value SignedRank=0.0006  p-value Sign=0.0095

J48 - J48gr all datasets (we cannot cheat)


In [11]:
#Bayesian Sign Test with rope
rope=0.01
dataclball=Bsigntest(accJ48,accJ48gr,rope)
#Plot
ptriangleball=plot_simplex(dataclball, ClassNames[cl1b],ClassNames[cl2b])
set_default_plot_size(12.5cm, 10cm)
display(ptriangleball)


[Figure: posterior samples on the probability simplex with vertices j48, j48gr and rope (all datasets)]

What about other nonparametric tests?

  • Wilcoxon rank-sum
  • Wilcoxon signed-rank
  • Friedman test
  • Kendall's tau test for independence
  • Fisher test for independence
  • ...

Wilcoxon signed-rank Test

Similar to the sign test, but it also uses the magnitude of the differences. Let $Z,Z' \sim P$:

  • Known $P$: just check whether $P(Z>-Z')>0.5$

  • Unknown $P$: use the observations to estimate $P$ and query the posterior distribution of $P(Z>-Z')$

Unlike in the sign test, our estimate of $P$ is not of the form $$ P=w_0 P_0+ \sum_{i=1}^n w_i \delta_{x_i}: $$ the prior measure is concentrated on a pseudo-observation $z_0$, so the posterior is the purely discrete mixture $P=\sum_{i=0}^n w_i \delta_{z_i}$, which is why the $\theta$ formulas below run over $i,j=0,\dots,n$.

Alessio Benavoli, Giorgio Corani, Francesca Mangili, and Marco Zaffalon. A Bayesian nonparametric procedure for comparing algorithms. In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), pages 1–9, 2015.

Different questions to the posterior

Bayesian Wilcoxon signed-rank:

$$ \begin{array}{rcl} \theta_{l}&=&\sum_{i=0}^n \sum_{j=0}^n w_iw_j I_{(-\infty,-2r)}(z_j+z_i),\\ \theta_{e}&=&\sum_{i=0}^n \sum_{j=0}^n w_iw_j I_{[-2r,2r]}(z_j+z_i),\\ \theta_{r}&=&\sum_{i=0}^n \sum_{j=0}^n w_iw_j I_{(2r,\infty)}(z_j+z_i), \end{array} $$

No need for the symmetry assumption!

Alessio Benavoli, Giorgio Corani, Francesca Mangili, and Marco Zaffalon. A Bayesian nonparametric procedure for comparing algorithms. In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), pages 1–9, 2015.
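
As a concrete illustration, a minimal sketch of one posterior draw of $(\theta_l,\theta_e,\theta_r)$ under the assumptions above ($s=1$, pseudo-observation $z_0=0$); the Bsignranktestv1.jl implementation may differ in the details:

using Distributions
function signrank_rope_sample(z, r; s=1.0)
    zz = [0.0; z]                                # prepend the pseudo-observation z0 = 0
    w = rand(Dirichlet([s; ones(length(z))]))    # (w0,w1,...,wn) ~ Dir(s,1,...,1)
    tl = te = tr = 0.0
    for i in 1:length(zz), j in 1:length(zz)
        sij = zz[i] + zz[j]
        if sij < -2r
            tl += w[i]*w[j]
        elseif sij > 2r
            tr += w[i]*w[j]
        else
            te += w[i]*w[j]
        end
    end
    return (tl, te, tr)
end
# e.g. one draw for nbc vs. aode: signrank_rope_sample(accNbc-accAode, 0.01)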


In [12]:
include("/home/benavoli/Data/Work_Julia/Tutorial/Julia/Tests/heaviside.jl")
include("/home/benavoli/Data/Work_Julia/Tutorial/Julia/Tests/Bsignranktestv1.jl")
#Bayesian Sign-Rank Test with rope
rope=0.01
datasra=Bsignranktest(accNbc,accAode,rope)
datasrb=Bsignranktest(accJ48,accJ48gr,rope)
ptriangleSa=plot_simplex(datasra, ClassNames[cl1a],ClassNames[cl2a])
ptriangleSb=plot_simplex(datasrb, ClassNames[cl1b],ClassNames[cl2b])
set_default_plot_size(25cm, 10cm)
display(hstack(ptriangleSa,ptriangleSb))


[Figure: signed-rank posterior samples on the simplex for nbc vs. aode and for j48 vs. j48gr]

Summing up the posterior

Sensitivity to the prior (IDP)

Different Methods

Two methods for comparing two classifiers on multiple datasets:

**Bayesian signed-rank test** (presented above) and **Bayesian hierarchical model**

The hierarchical model is powerful:

  1. it takes as input all the $m$ runs of the $k$-fold cross-validation results;
  2. it accounts for the correlation $\rho$ due to the overlapping training sets.

Issue:

  • It is a bit slow (2-3 mins) because it processes 5400 observations

Why is it important to be fast?

In machine learning, we often need to run statistical tests hundreds, thousands or millions of times:

  • for feature selection
  • for algorithm racing
  • for independence tests in Bayesian networks
  • ...

In these cases, it is more convenient to use a lighter (faster) test that is still powerful.

Summing up the Tutorial

Bayesian analysis: thinking in colour

Software and available Bayesian tests

All the tests we have presented are available in R, Julia and Python at https://github.com/BayesianTestsML/tutorial/

A. Benavoli, G. Corani, J. Demsar, and M. Zaffalon. Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis. arXiv preprint arXiv:1606.04316, pages 1–40, 2016.

Sensitivity to the prior (Imprecise Dirichlet Process)

  • Prior strength $s=1$, prior measure $\alpha = \delta_{-1.5}$
  • Observations $\{-1,-0.5, 0.25, 0.75, 1.25, 1.75\}$
  • posterior strength: $s_n=s+n=1+6=7$
  • posterior probability measure: $\alpha_n=\frac{s}{s+n} \alpha+ \frac{1}{s+n} \sum\limits_{i=1}^n \delta_{x_i}$

$$ (P(B_1),P(B_2))\sim Beta(3,4) \qquad\qquad\qquad (P(B_1),P(B_2))\sim Beta(2,5) $$
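
To see where these two distributions come from (assuming $B_1=(-\infty,0)$ and $B_2=[0,\infty)$, and that the two panels use the prior pseudo-observation at $-1.5$ and at $2$ respectively, the latter being the blue tick drawn in the cell below): the Dirichlet process gives $P(B_1)\sim Beta\big(s_n\,\alpha_n(B_1),\, s_n\,\alpha_n(B_2)\big)$ with $s_n=7$, so

$$ \begin{array}{lll} \alpha=\delta_{-1.5}: & s_n\alpha_n(B_1)=1+2=3,\quad s_n\alpha_n(B_2)=0+4=4 & \Rightarrow\; P(B_1)\sim Beta(3,4),\\ \alpha=\delta_{2}: & s_n\alpha_n(B_1)=0+2=2,\quad s_n\alpha_n(B_2)=1+4=5 & \Rightarrow\; P(B_1)\sim Beta(2,5). \end{array} $$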

In [13]:
using Gadfly
using Compose
# Blue tick: the prior pseudo-observation drawn at 2; orange ticks: the six observations (each tick has height 1/7)
myplot = plot(x=rand(1)*0.0, y=rand(1)*0.0,
    Guide.annotation(compose(context(),
        line([(2, 0), (2, 1/7)]),
        stroke(colorant"blue"), linewidth(0.4))),
    Guide.annotation(compose(context(),
        line([(-1, 0), (-1, 1/7)]), line([(-0.5, 0), (-0.5, 1/7)]),
        line([(0.25, 0), (0.25, 1/7)]), line([(0.75, 0), (0.75, 1/7)]),
        line([(1.25, 0), (1.25, 1/7)]), line([(1.75, 0), (1.75, 1/7)]),
        stroke(colorant"orange"), linewidth(0.4))),
    Stat.xticks(ticks=[-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2]),
    Stat.yticks(ticks=[0.0, 0.5, 1.0]),
    xintercept=[0.0], Geom.line, Guide.ylabel("Probability"),
    Theme(major_label_font_size=15pt, minor_label_font_size=13pt))
draw(PNG("plots/realpost1b.png", 6inch, 3inch), myplot)

