The man behind

What is nonparametric statistic?

Frequentist definition

A nonparametric procedure is a statistical procedure that has a certain desirable properties that hold under relatively mild assumptions regarding the underlying populations from which the data are obtained.

_{_{Hollander, Myles, Douglas A. Wolfe, and Eric Chicken. Nonparametric statistical methods. John Wiley and Sons, 2013.}</sub>}

![alt text][logo]

Frequentist definition

The term nonparametric is imprecise. The related term distribution-free has a more precise meaning...The distribution-free property enables one to obtain the distribution of the statistic under the null hypothesis without specifying the underlying distribution of the data

_{_{Hollander, Myles, Douglas A. Wolfe, and Eric Chicken. Nonparametric statistical methods. John Wiley and Sons, 2013.}</sub>}

BNP definition

Inferring the distribution of the data when is completely unknown.

Bayesian parametric

Bayesian nonparametric

Distribution on distributions?

$$ x_1,\dots,x_n\sim P ~~~\textit{ and }~~~ P \sim \mu $$

Nonparametric Prior

Probability Space: Let $\mathbb{X}$ be a standard Borel space with Borel $\sigma$-field $\mathcal{B}_\mathbb{X}$ and $\mathbb{P}$ be the space of probability measures on $(\mathbb{X},\mathcal{B}_{\mathbb{X}})$.

Nonparametric Prior: Let $\mathbb{M}$ be the class of all probability measures on $(\mathbb{P},\mathcal{B}_\mathbb{P})$. We call the elements $\mu \in \mathbb{M}$ nonparametric priors.

Dirichlet Process

developed by Ferguson

$$ x_1,\dots,x_n\sim P ~~~\textit{ and }~~~ P \sim \mu $$

We are interested on a particular $\mu$:

$$ x_1,\dots,x_n\sim P ~~~\textit{ and }~~~ P \sim Dp(s,\alpha) $$

Dirichlet Process

developed by Ferguson

An element $\mu \in \mathbb{M}$ is called a Dirichlet process $\mathcal{D}p(s,\alpha)$ on $\mathbb{X}$, with

prior strength $s\in \mathbb{R}^+$
prior probability measure $\alpha$ on $\mathbb{X}$

if for every finite measurable partition $B_1,\dots,B_m$ of $\mathbb{X}$,

the vector $(P(B_1),\dots,P(B_m))$

has a Dirichlet distribution with parameters $(s\alpha(B_1),\dots,s\alpha(B_m))$

Prior Base measure

It means that $\alpha(\mathbb{X})=s\in \mathbb{R}$ is finite and, thus, we can normalize $\alpha(\cdot)$!

Normalization: $$ \alpha(\cdot)=s ~\alpha(\cdot) ~~~ \text{ with $~~~\alpha(\mathbb{X})=1$} $$
Note that $\alpha(\cdot)$ is now a probability measure.

We equivalently say :$~~~P\sim \mathcal{D}(\alpha)~~~$ or $~~~P \sim \mathcal{Dp}(s,\alpha)$

Example 1/3

$\mathbb{X}=\mathbb{R}$
prior strength measure: $s=4$
prior probability measure: $\alpha=0.5\delta_{\{-1\}}+0.5\delta_{\{1\}}$



In [8]:

    
using Gadfly
using Compose
myplot=plot(x=rand(1)*0.0,y=rand(1)*0.0,Guide.annotation(compose(context(),line([(-1, 0), (-1, 0.5)]),line([(1, 0), (1, 0.5)]),stroke(colorant"blue"),linewidth(0.4))),Stat.xticks(ticks=[-2.0,-1.5,-1.0,-0.5,0.0, 0.5, 1.0,1.5,2]),Stat.yticks(ticks=[0.0, 0.5, 1.0]),xintercept=[0.0], Geom.line,Guide.ylabel("Probability"),Theme(major_label_font_size=15pt,minor_label_font_size=13pt))
draw(PNG("plots/real.png", 6inch, 3inch), myplot)









    



WARNING: The following aesthetics are mapped, but not used by any geometry:
    xintercept

Dirichlet Process

We say that $P\sim \mathcal{D}p(s,\alpha)$ if

for every finite measurable partition $B_1,\dots,B_m$ of $\mathbb{X}$,

the vector $(P(B_1),\dots,P(B_m))$ has a Dirichlet distribution with parameters $(s\alpha(B_1),\dots,s\alpha(B_m))$

Example 2/3

$$ P \sim Dp(s,\alpha) $$

Partition: $B_1=(-\infty,0]$, $B_2=(0,\infty)$
then

$$ \begin{array}{rcl} (P(B_1),P(B_2))&\sim& Dir(s\alpha(B_1),s\alpha(B_2))\\ &=&Beta(s\alpha(B_1),s\alpha(B_2)) \end{array} $$

Example 3/3

Since $s=4$ and $\alpha=0.5\delta_{\{-1\}}+0.5\delta_{\{1\}}$ then

$$ \begin{array}{rcl} (P(B_1),P(B_2))&\sim& Dir(s\alpha(B_1),s\alpha(B_2))\\ &=& Dir(2,2)=Beta(2,2) \end{array} $$



In [2]:

    
xv=(0:0.01:1)
yv=6*xv.^(1).*(1-xv).^(1)

densplot=plot(x=xv,y=yv, Geom.line,Guide.xlabel("P(B<sub>1</sub>)"),Guide.ylabel("Density"),Theme(panel_fill=colorant"black", default_color=colorant"orange",major_label_font_size=15pt,minor_label_font_size=13pt))
draw(PNG("plots/densplot.png", 6inch, 3inch), densplot)

Posterior Dirichlet Process

One of the most remarkable properties of the DP priors is that the posterior of $P$ is again a DP.

let $x_1,\dots,x_n \sim P$ i.i.d.
assume that $P \sim Dp(s,\alpha)$

Then the posterior distribution of $P$ is
$$ P|x_1,\dots,x_n \sim Dp\left(s+n, \frac{s}{s+n} \alpha+ \frac{1}{s+n} \sum\limits_{i=1}^n \delta_{x_i}\right), $$

where $\delta_{x_i}$ is an atomic probability measure centered at $x_i$.

Example cont.

$\mathbb{X}=\mathbb{R}$
prior strength: $\alpha(\mathbb{R})=s=4$
prior probability measure: $\alpha=0.5\delta_{\{-1\}}+0.5\delta_{\{1\}}$

Example cont.

Observations: $\{-1.5,-0.5, 0.25, 0.75, 1.25, 1.75\}$
posterior strength: $s_n=s+n=4+6$
posterior probability measure: $\alpha_n=\frac{s}{s+n} \alpha+ \frac{1}{s+n} \sum\limits_{i=1}^n \delta_{x_i}$



In [3]:

    
using Gadfly
using Compose
myplot=plot(x=rand(1)*0.0,y=rand(1)*0.0,Guide.annotation(compose(context(),line([(-1, 0), (-1, 0.4)]),line([(1, 0), (1, 0.4)]),stroke(colorant"blue"),linewidth(0.4))),Guide.annotation(compose(context(),line([(-1.5, 0), (-1.5, 0.1)]),line([(-0.5, 0), (-0.5, 0.1)]),line([(0.25, 0), (0.25, 0.1)]),line([(0.75, 0), (0.75, 0.1)]),line([(1.25, 0), (1.25, 0.1)]),line([(1.75, 0), (1.75, 0.1)]),stroke(colorant"orange"),linewidth(0.4))),Stat.xticks(ticks=[-2.0,-1.5,-1.0,-0.5,0.0, 0.5, 1.0,1.5,2]),Stat.yticks(ticks=[0.0, 0.5, 1.0]),xintercept=[0.0], Geom.line,Guide.ylabel("Probability"),Theme(major_label_font_size=15pt,minor_label_font_size=13pt))
draw(PNG("plots/realpost.png", 6inch, 3inch), myplot)









    



WARNING: The following aesthetics are mapped, but not used by any geometry:
    xintercept

Partition: $B_1=(-\infty,0]$, $B_2=(0,\infty)$

$$ \begin{array}{rcl} (P(B_1),P(B_2))&\sim& Dir(s_n\alpha_n(B_1),s_n\alpha_n(B_2))\\ &=&Dir(4,6)=Beta(4,6) \end{array} $$



In [4]:

    
xv=(0:0.01:1)
yv=504*xv.^(3).*(1-xv).^(5)

densplot=plot(x=xv,y=yv, Geom.line,Guide.xlabel("P(B<sub>1</sub>)"),Guide.ylabel("Density"),Theme(panel_fill=colorant"black", default_color=colorant"orange",major_label_font_size=15pt,minor_label_font_size=13pt))
draw(PNG("plots/densplotpost.png", 6inch, 3inch), densplot)

Another example

Prior measure $s=1$, $\alpha = \delta_{-1.5}$
Observations $\{-1,-0.5, 0.25, 0.75, 1.25, 1.75\}$
posterior strength: $s_n=s+n=1+6=7$
posterior probability measure: $\alpha_n=\frac{s}{s+n} \alpha+ \frac{1}{s+n} \sum\limits_{i=1}^n \delta_{x_i}$

$$ \begin{array}{rcl} (P(B_1),P(B_2))&\sim&Beta(3,4) \end{array} $$



In [5]:

    
using Gadfly
using Compose
myplot=plot(x=rand(1)*0.0,y=rand(1)*0.0,Guide.annotation(compose(context(),line([(-1.5, 0), (-1.5, 1/7)]),stroke(colorant"blue"),linewidth(0.4))),Guide.annotation(compose(context(),line([(-1, 0), (-1, 1/7)]),line([(-0.5, 0), (-0.5, 1/7)]),line([(0.25, 0), (0.25, 1/7)]),line([(0.75, 0), (0.75, 1/7)]),line([(1.25, 0), (1.25, 1/7)]),line([(1.75, 0), (1.75, 1/7)]),stroke(colorant"orange"),linewidth(0.4))),Stat.xticks(ticks=[-2.0,-1.5,-1.0,-0.5,0.0, 0.5, 1.0,1.5,2]),Stat.yticks(ticks=[0.0, 0.5, 1.0]),xintercept=[0.0], Geom.line,Guide.ylabel("Probability"),Theme(major_label_font_size=15pt,minor_label_font_size=13pt))
draw(PNG("plots/realpost1.png", 6inch, 3inch), myplot)









    



WARNING: The following aesthetics are mapped, but not used by any geometry:
    xintercept

Partition: $B_1=(-\infty,-0.75]$, $B_2=(-0.75,0.5]$ $B_3=[0.5,\infty)$

$$ \begin{array}{rcl} (P(B_1),P(B_2),P(B_3))&\sim& Dir(s_n\alpha_n(B_1),s_n\alpha_n(B_2),s_n\alpha_n(B_3))\\ &=&Dir(2,2,3) \end{array} $$

Finest Partition

$\alpha_n=\frac{s}{s+n} \alpha+ \frac{1}{s+n} \sum\limits_{i=1}^n \delta_{x_i}$, $~~~~s_n=s+n$
$B_1=(-\infty,-1.5]$, $~B_2=(-1.5,-1]$, $~B_3=(-1,-0.5]$,.....

Finest Partition

We can actually only consider

$B_1=(.,-1.5]$, $~B_2=(.,-1]$, $~B_3=(.,-0.5]$,...

$$ \begin{array}{rcl} (P(B_1),\dots,P(B_{8}))&\sim& Dir(s_n\alpha_n(B_1),\dots,s_n\alpha_n(B_8))\\ &=& Dir(s,1,1,1,1,1,1) \end{array} $$
Note that the support of $P$ is atomic, hence

$$ P=w_0 \delta_{\{-1.5\}}+ w_1 \delta_{-1}+ w_2 \delta_{-0.5}+\dots+ w_7 \delta_{1.75}, $$
with $(w_0,w_1,\dots,w_n)\sim Dir(s,1,\dots,1)$

So we have a Hierarchical model:

General property

$$ P=w_0 P_0+ \sum_{i=1}^n w_i \delta_{X_i}, $$

where

$(w_0,w_1,\dots,w_n)\sim Dir(s,1,\dots,1)$
$\sum_{i=0}^n w_i = 1$
$P_0 \sim Dp(s,\alpha)$

Bayesian NonParametric Statistics

Our view

It reduces to ask the right question to the poster

Bayesian NonParametric Statistics

It reduces to ask the right question to the posterior
$$ P=w_0 P_0+ \sum_{i=1}^n w_i \delta_{x_i}, $$
with

$(w_0,w_1,\dots,w_n)\sim Dir(s,1,\dots,1)$

$\sum_{i=0}^n w_i = 1$,

$P_0 \sim Dp(s,\alpha)$.

Sign Test

Goal: is the median of some distribution $P$ greater than zero?

Known $P$: just check whether $P(0,\infty)>P(-\infty,0]~~$ equiv. $~~P(0,\infty)>0.5$
Unknown $P$: use observations from $P$ to estimate $P$ and query the posterior (Bayesian way)

Our estimate of $P$ is $$ P=w_0 P_0+ \sum_{i=1}^n w_i \delta_{x_i}, $$ with

$(w_0,w_1,\dots,w_n)\sim Dir(s,1,\dots,1)$
$P_0 \sim Dp(s,\alpha)$.

Sign Test

Let us ask the posterior

$$ P(\infty,0] ~~~ P(0,\infty) $$

This is equal to

$$ \begin{array}{rclcl} P(-\infty,0]&=&w_0 P_0(-\infty,0]&+& \sum_{i=1}^n w_i I_{(-\infty,0]}(x_i)\\ P(0,\infty)&=&w_0 P_0(0,\infty)&+& \sum_{i=1}^n w_i I_{(0,\infty)}(x_i) \end{array} $$

Bayesian Sign Test

The Bayesian sign test reduces to compute

$$ \mathcal{P}\Big(P(0,\infty)>P(-\infty,0]\Big) =\mathcal{P}\Big(P(0,\infty)>0.5\Big) $$

where $\mathcal{P}$ is computed w.r.t. $(w_0,w_1,\dots,w_n)\sim Dir(s,1,\dots,1)$ and $P_0 \sim Dp(s,\alpha)$

Example cont.

$$ \begin{array}{rcl} P(-\infty,0]&=&w_0 P_0(-\infty,0]+ \sum_{i=1}^n w_i I_{(-\infty,0]}(x_i)\\ &=&w_0 + w_1 +w_2\\ P(0,\infty)&=& w_0 P_0(0,\infty)+ \sum_{i=1}^n w_i I_{(0,\infty)}(x_i)\\ &=& w_3 +w_4+w_5+w_6 \end{array} $$

The goal is to compute

$$ \mathcal{P}(P(0,\infty)>0.5) =\mathcal{P}(w_3 +w_4+w_5+w_6 >0.5) $$

where $\mathcal{P}$ is computed w.r.t. $(w_0,w_1,\dots,w_n)\sim Dir(s,1,\dots,1)$



In [6]:

    
using Distributions, Gadfly, Compose, DataFrames
include("plots/Bsigntest1.jl")

#Bayesian Sign Test 
observ=[-1;-0.5;0.25;0.75;1.25;1.75];
data=Bsigntest(observ,zeros(length(observ),1),0,[1,0]);
#Plot
df0 = DataFrame(x=data[2,:][:])
p0=plot(df0, x=:x, Geom.histogram,Theme(major_label_font_size=13pt,minor_label_font_size=12pt,key_label_font_size=11pt),Guide.xlabel("P(0,&#8734;)"))
set_default_plot_size(15cm, 8cm)
display(p0)
prob=length(find(z->z> 0.5,data[2,:][:]))/length(data[2,:][:])
println("Probability that P(0,\u221e)>0.5 is $prob")









    













    



Probability that P(0,∞)>0.5 is 0.65675

Bayesian Answer

The median is greater than zero with posterior probability $0.66$.

No Null hypothesis
No $p$-value

Sensitivity to the prior (IDP)

$$ \begin{array}{rcl} P(-\infty,0]&=&w_0 P_0(-\infty,0]+ \sum_{i=1}^n w_i I_{(-\infty,0]}(x_i)\\ &=& w_1 +w_2\\ P(0,\infty)&=& w_0 P_0(0,\infty)+ \sum_{i=1}^n w_i I_{(0,\infty)}(x_i)\\ &=& w_0 +w_3 +w_4+w_5+w_6 \end{array} $$

The goal is to compute

$$ \mathcal{P}(P(0,\infty)>0.5) =\mathcal{P}(w_0 +w_3 +w_4+w_5+w_6 >0.5) $$

where $\mathcal{P}$ is computed w.r.t. $(w_0,w_1,\dots,w_n)\sim Dir(s,1,\dots,1)$



In [7]:

    
using Distributions, Gadfly, Compose, DataFrames
include("plots/Bsigntest1.jl")

#Bayesian Sign Test 
observ=[-1;-0.5;0.25;0.75;1.25;1.75];
data=Bsigntest(observ,zeros(length(observ),1),0,[0,1]);
#Plot
df0 = DataFrame(x=data[2,:][:])
p0=plot(df0, x=:x, Geom.histogram,Theme(major_label_font_size=13pt,minor_label_font_size=12pt,key_label_font_size=11pt),Guide.xlabel("P(0,&#8734;)"))
set_default_plot_size(15cm, 8cm)
display(p0)
prob=length(find(z->z> 0.5,data[2,:][:]))/length(data[2,:][:])
println("Probability that P(0,\u221e)>0.5 is $prob")









    













    



Probability that P(0,∞)>0.5 is 0.88956

The man behind

What is nonparametric statistic?

Frequentist definition

Frequentist definition

BNP definition

Distribution on distributions?

Nonparametric Prior

Dirichlet Process

developed by Ferguson

Dirichlet Process

developed by Ferguson

Prior Base measure

Example 1/3

Dirichlet Process

Example 2/3

Example 3/3

Posterior Dirichlet Process

Example cont.

Example cont.

Another example

Finest Partition

Finest Partition

General property

Bayesian NonParametric Statistics

Our view

It reduces to ask the right question to the poster

Bayesian NonParametric Statistics

It reduces to ask the right question to the posterior

Sign Test

Sign Test

Bayesian Sign Test

Example cont.

Bayesian Answer

Sensitivity to the prior (IDP)