Using the notation from the k-DPP paper: http://www.alexkulesza.com/pubs/kdpps_icml11.pdf
Combining equations (13) and (18), we have
$$\det(L+I) \sum_{\lvert Y \rvert = k} \mathcal{P}_L(Y) = e_k^N$$

Rearranging:

$$\frac{e_k^N}{\det(L+I)} = \sum_{\lvert Y \rvert = k} \mathcal{P}_L(Y)$$

Then it follows:

$$\begin{aligned} \Pr(k' \mid k' > k) &= \frac{\sum_{\lvert Y \rvert = k'} \mathcal{P}_L(Y)}{\sum_{j=k+1}^{N} \sum_{\lvert Y \rvert = j} \mathcal{P}_L(Y)}\\ &= \frac{e_{k'}^N / \det(L+I)}{\sum_{j=k+1}^{N} e_{j}^N / \det(L+I)} \\ &= \frac{e_{k'}^N }{\sum_{j=k+1}^{N} e_{j}^N } \end{aligned}$$
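The $e_k^N$ terms are the elementary symmetric polynomials of the eigenvalues of $L$, which can be computed with the standard recurrence $e_l^n = e_l^{n-1} + \lambda_n\, e_{l-1}^{n-1}$. Below is a minimal NumPy sketch of that recurrence; the `elem_sympoly` imported from `dpp` further down is assumed to return the same table, with `E[l, n]` holding $e_l$ of the first $n$ eigenvalues.

import numpy as np

def elem_sympoly_sketch(lmbda, k):
    """Sketch of the elementary symmetric polynomial recurrence (assumed to
    match the table layout returned by elem_sympoly in the dpp module)."""
    N = len(lmbda)
    E = np.zeros((k + 1, N + 1))
    E[0, :] = 1.0                      # e_0 of any prefix is 1
    for l in range(1, k + 1):
        for n in range(1, N + 1):
            # e_l over the first n eigenvalues: either skip lambda_n or include it
            E[l, n] = E[l, n - 1] + lmbda[n - 1] * E[l - 1, n - 1]
    return E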
In [1]:
from dpp import *
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
In [2]:
def sample_k(L, k):
    """
    Given a decomposed kernel L (eigenvalues under L['D']) and the number of
    elements already chosen, k, sample a set size k' from {k+1, k+2, ..., N}
    with probability proportional to e_{k'}^N.
    """
    N = L['D'].shape[0]
    # E[j, -1] holds the elementary symmetric polynomial e_j^N of all N eigenvalues
    E = elem_sympoly(L['D'], N)
    el_list = list(range(k + 1, N + 1))
    # Normalize over k' = k+1, ..., N to get the conditional distribution derived above
    E_ls = np.array([E[x, -1] for x in el_list])
    E_ls = E_ls / np.sum(E_ls)
    return np.random.choice(el_list, p=E_ls)
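As a quick sanity check on sample_k (a sketch using the same dpp helpers as above, assuming decompose_kernel exposes the kernel eigenvalues under 'D'), the empirical frequencies of the sampled sizes should match the closed form $e_{k'}^N / \sum_{j=k+1}^{N} e_j^N$ on a small kernel:

# Small example: a 5 x 5 kernel over the columns of Z, so k' ranges over {3, 4, 5} when k = 2
Z = np.random.normal(size=(50, 5))
L_small = decompose_kernel(rbf_kernel(Z.T))
k_cond, N_small = 2, 5

# Closed-form conditional distribution over k'
E_small = elem_sympoly(L_small['D'], N_small)
closed_form = np.array([E_small[j, -1] for j in range(k_cond + 1, N_small + 1)])
closed_form = closed_form / closed_form.sum()

# Empirical frequencies from repeated draws of sample_k
draws = np.array([sample_k(L_small, k_cond) for _ in range(20000)])
empirical = np.array([(draws == j).mean() for j in range(k_cond + 1, N_small + 1)])
print(closed_form)
print(empirical)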
In [3]:
X = np.random.normal(size=(100, 10))
In [4]:
feat_dist = rbf_kernel(X.T)      # 10 x 10 RBF similarity between the columns (features) of X
L = decompose_kernel(feat_dist)
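decompose_kernel comes from the dpp module; sample_k only relies on the eigenvalues being stored under 'D'. A minimal sketch of an eigendecomposition with that layout (the 'V' field for eigenvectors is an assumption, since this notebook never touches it):

def decompose_kernel_sketch(M):
    # Symmetric eigendecomposition of the L-ensemble kernel; 'D' matches the
    # field sample_k reads, 'V' (eigenvectors) is assumed for completeness.
    vals, vecs = np.linalg.eigh(M)
    return {'M': M, 'D': vals, 'V': vecs}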
In [5]:
sample_k(L, 7)
Out[5]:
In [6]:
sample_dpp(L)
Out[6]:
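Putting the pieces together: draw a size k' > k from the conditional distribution, then draw a set of exactly that size. In Kulesza's reference MATLAB code, sample_dpp takes an optional second argument giving the set size; whether the dpp module used here mirrors that signature is an assumption, so treat the call below as a sketch.

# Assumption: sample_dpp accepts an optional set size as its second argument,
# mirroring the reference MATLAB implementation. Adapt the call if it differs.
k = 7
k_prime = sample_k(L, k)       # a size in {8, 9, 10}, with P(k') proportional to e_{k'}^N
Y = sample_dpp(L, k_prime)     # a sample of exactly k_prime items (under the assumed signature)
print(k_prime, Y)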