This notebook is part of the supplementary material for:
Genewein T., Leibfried F., Grau-Moya J., Braun D.A. (2015) Bounded rationality, abstraction and hierarchical decision-making: an information-theoretic optimality principle, Frontiers in Robotics and AI.
More information on how to run the notebook can be found in the accompanying github repository, where you can also find updated versions of the code and notebooks.
This notebook is mentioned in Section 2.3. Due to time and space limitations, the results of this notebook are not included in the paper.
This notebook provides a proof-of-concept implementation of a naive sample-based Blahut-Arimoto iteration scheme. Neither the code nor the notebook has been particularly polished or tested. There is a short theory part at the beginning of the notebook, but most of the explanations are brief and mixed into the code as comments.
The solution to a free energy variational problem (Section 2.1 in the paper) has the form of a Boltzmann distribution $$p(y) = \frac{1}{Z}p_0(y)e^{\beta U(y)},$$ where $Z=\sum_y p_0(y)e^{\beta U(y)}$ denotes the partition sum, $p_0(y)$ is a prior distribution and $U(y)$ is the utility function. The inverse temperature $\beta$ can be interpreted as a resource parameter and it governs how far the posterior $p(y)$ can deviate from the prior (measured as a KL-divergence) - see the paper Section 2.1 for details.
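As a quick illustration (a toy sketch that is not part of the original code - the prior, the utilities and the value of $\beta$ below are made up), the Boltzmann posterior can be computed directly for a discrete $y$:
In [ ]:
#toy illustration (not part of the original code): Boltzmann posterior for a discrete y
p0_toy = [0.25, 0.25, 0.25, 0.25] #prior p0(y)
U_toy = [1.0, 2.0, 0.5, 0.0] #utility U(y) (made-up values)
β_toy = 2.0 #inverse temperature (resource parameter)
unnorm = [p0_toy[i]*exp(β_toy*U_toy[i]) for i in 1:length(p0_toy)] #p0(y)exp(βU(y))
p_toy = unnorm / sum(unnorm) #normalize by the partition sum Z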
For a decision-maker, it suffices to obtain a sample from $p(y)$ and act according to that sample, rather than computing the full distribution $p(y)$. A simple scheme to sample from $p(y)$ is given by rejection sampling.
Rejection sampling
Goal: get a sample $y$ from the distribution $f(y)$. Draw from a uniform distribution $u\sim \mathcal{U}(0,1)$ and from a proposal distribution $y\sim g(y)$. If $u < \frac{f(y)}{M g(y)}$, accept the sample as a sample from $f(y)$, otherwise reject the sample and repeat. $M$ is a constant that ensures that $M g(y) \geq f(y)~\forall y$. Note that rejection sampling also works for sampling from an unnormalized distribution as long as $M$ is chosen accordingly.
For the free-energy problem, we want a sample from $p(y)\propto f(y) = p_0(y)e^{\beta U(y)}$. We choose $g(y)=p_0(y)$ and set $M=e^{\beta U_{max}}$, where $U_{max}=\max_y U(y)$, so that $\frac{f(y)}{M g(y)} = e^{\beta (U(y)-U_{max})} \leq 1$.
Finally, we get the following rejection sampling scheme: draw $y\sim p_0(y)$ and $u\sim \mathcal{U}(0,1)$, and accept $y$ as a sample from $p(y)$ if $u < e^{\beta (U(y)-U_{max})}$; otherwise reject the sample and repeat.
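A minimal sketch of this scheme for a discrete $y$ (illustrative only - the helper function sample_free_energy is made up here and re-uses the toy quantities from the cell above; the function rej_samp_const defined further below implements the same idea for the Blahut-Arimoto setting):
In [ ]:
#illustrative rejection sampler for the free-energy posterior p(y) ∝ p0(y)exp(βU(y))
using Distributions
function sample_free_energy(p0::Vector, U::Vector, β::Number)
    prop = Categorical(p0) #proposal g(y) = p0(y)
    Umax = maximum(U)
    while true
        y = rand(prop) #y ~ p0(y)
        u = rand() #u ~ U(0,1)
        if u < exp(β*(U[y] - Umax)) #accept with probability f(y)/(M g(y))
            return y #accepted sample from p(y)
        end
    end
end
sample_free_energy(p0_toy, U_toy, β_toy)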
The solution to the rate distortion problem looks very similar to the Boltzmann distribution in the free-energy case. However, there is one crucial difference: in the free-energy case, the prior is an arbitrary distribution - in the rate distortion case, the prior is replaced by the marginal distribution, which leads to a set of self-consistent equations $$\begin{align} p^*(a|w)&=\frac{1}{Z(w)}p(a)e^{\beta U(a,w)} \\ p(a)&=\sum_w p(w)p(a|w) \end{align}$$ After convergence of the Blahut-Arimoto iterations, the marginal $p(a)$ can simply be treated like a prior and the rejection sampling scheme described above can straightforwardly be used. However, when initializing with an arbitrary marginal distribution $\hat{p}(a)$, the iterations must be performed in a sample-based manner until convergence.
Here, we do this in a naive and very straightforward way: we represent $\hat{p}(a)$ simply through counters (a categorical distribution). Then we iterate the following two steps:
1. Draw a batch of samples from the conditional: for each sample, draw $w\sim p(w)$ and then draw $a$ from $\hat{p}^*(a|w)\propto \hat{p}(a)e^{\beta U(a,w)}$ with the rejection sampling scheme above, using the current estimate $\hat{p}(a)$ as proposal distribution.
2. Update the counters of $\hat{p}(a)$ with the batch of action samples from step 1 and renormalize to obtain the new marginal estimate.
Additionally, this notebook allows for some burn-in time: after a certain number of iterations of steps 1. and 2. (i.e. after the "burn-in"), the sample counters are reset, so that samples drawn during the burn-in phase do not contribute to the final estimates. This naive scheme seems to work, but it is unclear how to choose the batch-size (the number of samples from $\hat{p}^*(a|w)$ to draw before performing an update step on $\hat{p}(a)$), how to set the burn-in phase, etc.
In the notebook below, you can try different batch-sizes and different burn-in times and you can compare full-forgetting against no forgetting (i.e. no resetting of the counters).
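For reference, the exact (tabular) Blahut-Arimoto iteration defined by the self-consistent equations above can be sketched as follows (illustrative only - the function ba_iterations_sketch is made up here; the actual experiments below use the package function BAiterations, which additionally takes a convergence criterion ε and a maximum number of iterations):
In [ ]:
#illustrative exact Blahut-Arimoto iteration (toy sketch, not the package implementation)
#U is a (card_a x card_w) utility matrix, pw the world distribution, pa the initial marginal
function ba_iterations_sketch(pa::Vector, β::Number, U::Matrix, pw::Vector, niter::Integer)
    card_a, card_w = size(U)
    pagw = zeros(card_a, card_w)
    for it in 1:niter
        #update conditionals: p(a|w) ∝ p(a)exp(βU(a,w))
        for w in 1:card_w
            for a in 1:card_a
                pagw[a,w] = pa[a]*exp(β*U[a,w])
            end
            pagw[:,w] = pagw[:,w] / sum(pagw[:,w]) #normalize by Z(w)
        end
        #update marginal: p(a) = Σ_w p(w)p(a|w)
        pa = pagw * pw
    end
    return pagw, pa
end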
In [ ]:
#only run this once
include("RateDistortionDecisionMaking.jl")
In [ ]:
#load taxonomy example
using RateDistortionDecisionMaking, DataFrames, Gadfly, Distributions
#set up taxonomy example
include("TaxonomyExample.jl")
w_vec, w_strings, a_vec, a_strings, p_w, U = setuptaxonomy()
#pre-compute utilities, find maxima
U_pre, Umax = setuputilityarrays(a_vec,w_vec,U)
#initialize p(a) uniformly
num_acts = length(a_vec)
pa_init = ones(num_acts)/num_acts;
In [ ]:
#Performs rejection sampling with a constant (scaled uniform) envelope
#using a softmax acceptance-rejection criterion.
#prop_dist .. proposal distribution (must be an instance of type ::Distribution)
#nsamps ..... desired number of samples (scalar)
#maxsteps ... maximum number of acceptance-rejection steps (scalar, must be ≧ nsamps)
#β .......... softmax parameter
#lh ......... likelihood value (vector of length N)
#maxlh ...... maximum value that the likelihood can take (scalar)
function rej_samp_const(prop_dist::Distribution, nsamps::Integer, maxsteps::Integer, β::Number, lh::Vector, maxlh::Number)
#initialize
samps = zeros(Int, nsamps) #sampled indices (integers)
acc_cnt = 0 #acceptance-counter
if(maxsteps < nsamps)
maxsteps = nsamps
end
k = 0 #step counter; assigned inside the loop so that it remains available after the loop
for step in 1:maxsteps
k = step
u = rand() #sample from uniform between (0,1)
index = rand(prop_dist) #sample from proposal
ratio = exp(β*(lh[index]-maxlh)) #acceptance ratio f(y)/(M g(y)), computed in a numerically stable way
if u < ratio
#if we enter here, accept the sample
acc_cnt = acc_cnt + 1
samps[acc_cnt] = index
if(acc_cnt == nsamps)
#we have enough samples, exit loop
break
end
end
end
if(k==maxsteps)
warn("[RejSampConst] Maximum number of steps reached - number of samples is potentially lower than nsamps!\n")
end
#store all accepted samples (this can be less than nsamps if maxsteps is too low or acceptance-rate is low)
samples = samps[1:acc_cnt]
#compute acceptance ratio
acc_ratio = acc_cnt/k
return samples, acc_ratio
end
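As an illustrative usage example (assuming the setup cell above has been run - the world index, the value of β and the number of steps below are arbitrary choices), a single action sample can be drawn like this:
In [ ]:
#illustrative call (not part of the original code): draw one sample from p(a|w) ∝ p(a)exp(βU(a,w))
#for world state w=1, using the uniform initialization pa_init as proposal
w_test = 1
samp_test, accratio_test = rej_samp_const(Categorical(pa_init), 1, 100, 1.0, U_pre[:,w_test], Umax[w_test])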
In [ ]:
#marginal is simply represented by counters (i.e. by frequencies)
function init_marginal_representation_ctrs(pa_init::Vector)
return pa_init
end
#this updates the marginal over actions p(a) using a counter-representation
#this function counts the number of times each action-index occurs in sampled_indices
#these counts are then added to the current marginal_ctrs. Optionally, the counters are reset
#before adding the new samples (=hard forgetting).
function update_marginal_ctrs(sampled_indices::Vector, marginal_ctrs::Vector; reset_ctrs::Bool=false)
#TODO: perhaps replace hard-resetting with an exponential decay?
p_ctrs = marginal_ctrs
card_p = length(p_ctrs)
#reset counters for marginal? (make sure every entry is non-zero!)
if reset_ctrs
p_ctrs = ones(card_p)/card_p
end
#update marginal counters using a histogram to do the counting (bin-borders have to be set manually!)
e,p_counts = hist(sampled_indices,0.5:1:(card_p+0.5))
p_ctrs = p_ctrs + p_counts
#normalize to get the updated marginal
p_sampled = p_ctrs / sum(p_ctrs)
return p_sampled, p_ctrs #return the probability-vector, but also the representation of the marginal (as counts)
end
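A quick toy call (illustrative only, not part of the original code) to see the counter update in action - here action 2 is "sampled" three times and action 1 once, starting from a uniform counter representation over three actions:
In [ ]:
#toy check of the counter-based marginal update (illustrative only)
pa_toy2, ctrs_toy2 = update_marginal_ctrs([2.0, 2.0, 2.0, 1.0], ones(3)/3)
pa_toy2 #most of the probability mass should now be on action 2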
#function for BA sampling
#burnin_ratio specifies the fraction of outer iterations that is treated as burn-in: once it is exceeded,
#the conditional sample-counters are reset, so that burn-in samples do not count towards the final conditional estimate
#reset_ctrs (forwarded to the marginal-update function via update_func_args) specifies whether the marginal
#is computed with the samples of the last iteration only (=hard forgetting by resetting counters)
#or with all samples of all iterations (=no forgetting)
function BAsampling(pa_init::Vector, β::Number, U_pre::Matrix, Umax::Vector, pw::Vector,
nsteps_marginalupdate::Integer, nsteps_conditionalupdate::Integer;
burnin_ratio::Real=0.7, max_rejsamp_steps::Integer=200,
compute_performance::Bool=false, performance_as_dataframe::Bool=false,
performance_per_iteration::Bool=false,
init_marg_func::Function=init_marginal_representation_ctrs,
update_marg_func::Function=update_marginal_ctrs, update_func_args...)
#compute cardinality, check size of U_pre
card_a = length(pa_init)
card_w = length(pw)
if size(U_pre) != (card_a, card_w)
error("Size mismatch of U_pre and pa_init or pw!")
end
#check that burnin_ratio is really a ratio
if (burnin_ratio < 0) || (burnin_ratio > 1)
error("burnin_ratio must be a number between 0 and 1.")
end
#if performance measures don't need to be returned, don't compute them per iteration
if compute_performance==false
performance_per_iteration = false
end
#preallocate if necessary
if performance_per_iteration
I_i = zeros(nsteps_marginalupdate)
Ha_i = zeros(nsteps_marginalupdate)
Hagw_i = zeros(nsteps_marginalupdate)
EU_i = zeros(nsteps_marginalupdate)
RDobj_i = zeros(nsteps_marginalupdate)
end
#initialize sampling distributions
pw_dist = Categorical(pw) #proposal distribution
pagw_ctrs = ones(card_a, card_w) #counters for conditional distribution
pa_sampled = pa_init #marginal distribution
#initialize the marginal representation
pa_ctrs = init_marg_func(pa_init)
burnin_triggered=false
#outer loop - in each iteration the marginal is updated
iter = 0 #iteration counter; assigned inside the loop so that it remains available after the loop
for outer_iter in 1:nsteps_marginalupdate
iter = outer_iter
a_samples = zeros(nsteps_conditionalupdate) #this will hold the samples from p(a|w) during inner loop
#inner loop - in each step a sample is drawn from the conditional and stored
#for the batch-update of the marginal
for j in 1:nsteps_conditionalupdate
#draw a w sample
w_samp = rand(pw_dist)
#draw a sample from p(a|w) via rejection sampling, using the current estimate of p(a) as proposal distribution
agw_samp, acc_ratio = rej_samp_const(Categorical(pa_sampled), 1, max_rejsamp_steps, β, U_pre[:,w_samp], Umax[w_samp])
a_samples[j] = agw_samp[1]
#update conditional counters
pagw_ctrs[agw_samp, w_samp] += 1
end
#very simple burn-in: simply reset counters
if (iter >(nsteps_marginalupdate)*burnin_ratio) && (!burnin_triggered)
burnin_triggered = true
pagw_ctrs = ones(card_a, card_w)
end
#update marginal with samples drawn in inner loop
pa_sampled, pa_ctrs = update_marg_func(a_samples, pa_ctrs; update_func_args...)
#compute entropic quantities (if requested with additional parameter)
if performance_per_iteration
#compute sample-based conditional p(a|w)
pagw_sampled = zeros(card_a, card_w)
for i in 1:card_w
pagw_sampled[:,i] = pagw_ctrs[:,i] / sum(pagw_ctrs[:,i])
end
I_i[iter], Ha_i[iter], Hagw_i[iter], EU_i[iter], RDobj_i[iter] = analyzeBAsolution(pw, pa_sampled, pagw_sampled, U_pre, β)
end
end
#compute conditionals using the sample-counts of the previous inner loops
#the burn-in parameter specifies how many of the inner loops are discarded
pagw_sampled = zeros(card_a, card_w)
for i in 1:card_w
pagw_sampled[:,i] = pagw_ctrs[:,i] / sum(pagw_ctrs[:,i])
end
#return results
if compute_performance == false
return pagw_sampled, pa_sampled
else
if performance_per_iteration == false
#compute performance measures for final solution
I, Ha, Hagw, EU, RDobj = analyzeBAsolution(pw, pa_sampled, pagw_sampled, U_pre, β)
else
#"cut" valid results from preallocated vector
I = I_i[1:iter]
Ha = Ha_i[1:iter]
Hagw = Hagw_i[1:iter]
EU = EU_i[1:iter]
RDobj = RDobj_i[1:iter]
end
#if needed, transform to data frame
if performance_as_dataframe == false
return pagw_sampled, pa_sampled, I, Ha, Hagw, EU, RDobj
else
performance_df = performancemeasures2DataFrame(I, Ha, Hagw, EU, RDobj)
return pagw_sampled, pa_sampled, performance_df
end
end
end
In [ ]:
#example call; also plot the evolution of the performance measures
maxiter = 10000
β = 1.2
nsteps_marg = 500
nsteps_cond = 750
pagw_s,pa_s,perf = BAsampling(pa_init, β, U_pre, Umax, p_w, nsteps_marg, nsteps_cond,
burnin_ratio=0.7, max_rejsamp_steps=500, reset_ctrs=false,
compute_performance=true, performance_as_dataframe=true, performance_per_iteration=true)
plt_cond = visualizeBAconditional(pagw_s,a_vec,w_vec,a_strings,w_strings)
#instead of using a range of β-values (as for the standard-performance plot),
#use a vector indicating the iteration
niter = size(perf,1)
plt_perf_entropy, plt_perf_utility, plt_rateutility = plotperformancemeasures(perf,collect(1:niter),
suppress_vis=true, xlabel_perf="Iteration")
#TODO: somehow the "Iteration" label above doesn't seem to work!
display(vstack(plt_perf_entropy, plt_perf_utility))
display(plt_cond)
In [ ]:
#compute theoretical result for rate-utility curve
ε = 0.0001 #convergence criterion for BAiterations
maxiter = 10000
β_sweep = collect(0.01:0.05:3)
nβ = length(β_sweep)
#preallocate
I = zeros(nβ)
Ha = zeros(nβ)
Hagw = zeros(nβ)
EU = zeros(nβ)
RDobj = zeros(nβ)
#sweep through β values and perform Blahut-Arimoto iterations for each value
for i=1:nβ
pagw, pa, I[i], Ha[i], Hagw[i], EU[i], RDobj[i] = BAiterations(pa_init, β_sweep[i], U_pre, p_w, ε, maxiter,compute_performance=true)
end
#show rate-utility curve (shaded region is theoretically infeasible)
perf_res_analytical = performancemeasures2DataFrame(I, Ha, Hagw, EU, RDobj);
plot_perf_entropy, plot_perf_util, plot_rateutility = plotperformancemeasures(perf_res_analytical, β_sweep, suppress_vis=true);
display(plot_rateutility)
In [ ]:
#run the sampling for different temperatures and repeat each run n-times
#then plot these results against the rate-utility curve (based on the closed-form solutions)
βrange_samp = [0.1, 0.25, 0.5, 0.8, 1.2, 1.4, 1.6, 2]
#βrange_samp = [1.2, 2]
nruns = 10; #number of runs per β point
nsteps_marg = 500
nsteps_cond = 750
burnin_ratio = 0.8
max_rejsamp_steps=500 #maximum number of steps for sampling from the conditional
nconditions = size(βrange_samp,1)*nruns
I_sampled = zeros(2*nconditions)
Ha_sampled = zeros(2*nconditions)
EU_sampled = zeros(2*nconditions)
βval = zeros(2*nconditions)
ResetCtrs = falses(2*nconditions)
#first run with reset_ctrs to false
reset_ctrs = false #if true, the ctrs for the marginal are reset in each iteration (="hard" forgetting)
for b in 1:nconditions
println("BA Sampling, run $b of $(2*nconditions)")
β = βrange_samp[round(Int,ceil(b/nruns))]
pagw_s, pa_s, I, Ha, Hagw, EU, RDobj = BAsampling(pa_init, β, U_pre, Umax, p_w, nsteps_marg, nsteps_cond,
reset_ctrs=reset_ctrs, burnin_ratio=burnin_ratio,
max_rejsamp_steps=max_rejsamp_steps, compute_performance=true)
I_sampled[b] = I
Ha_sampled[b] = Ha
EU_sampled[b] = EU
βval[b] = β
ResetCtrs[b] = reset_ctrs
end
#second run with reset_ctrs to true
reset_ctrs = true #if true, the ctrs for the marginal are reset in each iteration (="hard" forgetting)
for b in 1:nconditions
println("BA Sampling, run $(nconditions+b) of $(2*nconditions)")
β = βrange_samp[round(Int,ceil(b/nruns))]
pagw_s, pa_s, I, Ha, Hagw, EU, RDobj = BAsampling(pa_init, β, U_pre, Umax, p_w, nsteps_marg, nsteps_cond,
reset_ctrs=reset_ctrs, burnin_ratio=burnin_ratio,
max_rejsamp_steps=max_rejsamp_steps, compute_performance=true)
I_sampled[nconditions+b] = I
Ha_sampled[nconditions+b] = Ha
EU_sampled[nconditions+b] = EU
βval[nconditions+b] = β
ResetCtrs[nconditions+b] = reset_ctrs
end
#wrap data in DataFrame for convenient plotting
res_sampled = DataFrame(β=βval, I_aw=I_sampled, H_a=Ha_sampled, E_U=EU_sampled, Forgetting=ResetCtrs)
In [ ]:
#compute theoretical result for same set of temperatures
ε = 0.0001 #convergence criterion for BAiterations
maxiter = 10000
nβ = length(βrange_samp)
#preallocate
I = zeros(nβ)
Ha = zeros(nβ)
Hagw = zeros(nβ)
EU = zeros(nβ)
RDobj = zeros(nβ)
#sweep through β values and perform Blahut-Arimoto iterations for each value
for i=1:nβ
pagw, pa, I[i], Ha[i], Hagw[i], EU[i], RDobj[i] = BAiterations(pa_init, βrange_samp[i], U_pre, p_w, ε, maxiter,compute_performance=true)
end
#show rate-utility curve (shaded region is theoretically infeasible)
perf_res_analytical_samp = performancemeasures2DataFrame(I, Ha, Hagw, EU, RDobj)
perf_res_analytical_samp[:β] = βrange_samp;
In [ ]:
#plot the solutions from the sampling runs into the (analytical) rate-utility plot
plot(Guide.ylabel("E[U]"), Guide.xlabel("I(A;W) [bits]"),
Guide.title("Rate-Utility curve (dots show sampling solutions)"), BAtheme(), BAdiscretecolorscale(2),
layer(res_sampled,y="E_U",x="I_aw",Geom.point,color="Forgetting"),
layer(perf_res_analytical_samp,y="E_U",x="I_aw",Geom.point),
layer(perf_res_analytical,y="E_U",x="I_aw",Geom.line),
layer(perf_res_analytical,y="E_U",x="I_aw",ymin="E_U",ymax=ones(size(perf_res_analytical,1))*maximum(perf_res_analytical[:E_U]),
Geom.ribbon,BAtheme(default_color=colorant"green"))
)
In [ ]:
#compute the mean for each group of points (that were produced with the same beta and same forgetting setting)
res_samp_aggregated = aggregate(res_sampled, [:β,:Forgetting], mean)
#plot the mean-solutions from the sampling runs into the (analytical) rate-utility plot
plt_samp = plot(Guide.ylabel("E[U]"), Guide.xlabel("I(A;W) [bits]"),
Guide.title("Rate-Utility curve (dots show mean sampling solutions)"), BAtheme(), BAdiscretecolorscale(2),
layer(res_samp_aggregated,y="E_U_mean",x="I_aw_mean",Geom.point,Geom.line(preserve_order=true),color="Forgetting"),
layer(perf_res_analytical_samp,y="E_U",x="I_aw",Geom.point),
layer(perf_res_analytical,y="E_U",x="I_aw",Geom.line),
layer(perf_res_analytical,y="E_U",x="I_aw",ymin="E_U",ymax=ones(size(perf_res_analytical,1))*maximum(perf_res_analytical[:E_U]),
Geom.ribbon,BAtheme(default_color=colorant"green"))
)
In [ ]:
#store the plots
#draw(SVG("Figures/RateUtilityCurve.svg", 8.5cm, 7cm), plot_rateutility)
#draw(SVG("Figures/SamplingCond.svg", 8.5cm, 9cm), plt_cond)
#draw(SVG("Figures/BASampling.svg", 13cm, 11cm), plt_samp)
#plot_samp = vstack(plt_samp, plt_cond)
#draw(SVG("Figures/SamplingResults.svg", 18cm,16cm),plot_samp)
In [ ]:
#try changing the number of inner-/outer-loop iterations
#try changing the burn-in ratio
#try soft-forgetting (with an exponential-decay window) - see the sketch below
#forgetting seems to do better than no forgetting (in terms of being closer to the rate-utility curve),
#but it also seems that the points with forgetting tend to have a lower I(A;W) (and also a lower E[U]) - even though
#the temperatures are the same.
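One possible way to implement the soft-forgetting suggestion above (a sketch only - the function below is not part of the original code and has not been used in the experiments above): decay the existing counters by a factor γ before adding the new counts. It has the same signature as update_marginal_ctrs and could therefore be passed to BAsampling via the update_marg_func argument.
In [ ]:
#sketch of a soft-forgetting marginal update (illustrative only, not used in the experiments above)
#decay the existing counters by a factor γ before adding the new counts
function update_marginal_ctrs_decay(sampled_indices::Vector, marginal_ctrs::Vector; γ::Real=0.9)
    card_p = length(marginal_ctrs)
    #count how often each action-index occurs in the new batch (same binning as in update_marginal_ctrs)
    e, p_counts = hist(sampled_indices, 0.5:1:(card_p+0.5))
    p_ctrs = γ*marginal_ctrs + p_counts #γ close to 1 ≈ no forgetting; smaller γ forgets faster (keep γ>0 so entries stay non-zero)
    p_sampled = p_ctrs / sum(p_ctrs)
    return p_sampled, p_ctrs
end
#example (hypothetical): BAsampling(..., update_marg_func=update_marginal_ctrs_decay, γ=0.95, ...)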