Using the notation from the k-DPP paper: http://www.alexkulesza.com/pubs/kdpps_icml11.pdf
Combining equations (13) and (18), we have
$$\det(L+I) \sum_{\lvert Y \rvert = k} \mathcal{P}_L(Y) = e_k^N$$

Rearranging:

$$\frac{e_k^N}{\det(L+I)} = \sum_{\lvert Y \rvert = k} \mathcal{P}_L(Y)$$

Then it follows:

$$\begin{aligned} \Pr(k' \mid k' > k) &= \frac{\sum_{\lvert Y \rvert = k'} \mathcal{P}_L(Y)}{\sum_{j=k+1}^{N} \sum_{\lvert Y \rvert = j} \mathcal{P}_L(Y)}\\ &= \frac{e_{k'}^N / \det(L+I)}{\sum_{j=k+1}^{N} e_{j}^N / \det(L+I)} \\ &= \frac{e_{k'}^N }{\sum_{j=k+1}^{N} e_{j}^N } \end{aligned}$$
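The $e_k^N$ terms are the elementary symmetric polynomials of the eigenvalues of $L$, which can be computed with the standard recurrence $e_l^n = e_l^{n-1} + \lambda_n\, e_{l-1}^{n-1}$. Below is a minimal NumPy sketch of that recurrence; the `elem_sympoly` imported from `dpp` further down is assumed to return the same table, with `E[l, n]` holding $e_l$ of the first $n$ eigenvalues.

import numpy as np

def elem_sympoly_sketch(lmbda, k):
    """Sketch of the elementary symmetric polynomial recurrence (assumed to
    match the table layout returned by elem_sympoly in the dpp module)."""
    N = len(lmbda)
    E = np.zeros((k + 1, N + 1))
    E[0, :] = 1.0                      # e_0 of any prefix is 1
    for l in range(1, k + 1):
        for n in range(1, N + 1):
            # e_l over the first n eigenvalues: either skip lambda_n or include it
            E[l, n] = E[l, n - 1] + lmbda[n - 1] * E[l - 1, n - 1]
    return E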
In [1]:
from dpp import *
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
In [2]:
def sample_k(L, k):
    """
    Given a decomposed kernel L (eigenvalues under L['D']) and the number of
    elements already chosen, k, sample a set size k' from {k+1, k+2, ..., N}
    with probability proportional to e_{k'}^N.
    """
    N = L['D'].shape[0]
    # E[j, -1] holds the elementary symmetric polynomial e_j^N of all N eigenvalues
    E = elem_sympoly(L['D'], N)
    el_list = list(range(k + 1, N + 1))
    # Normalize over k' = k+1, ..., N to get the conditional distribution derived above
    E_ls = np.array([E[x, -1] for x in el_list])
    E_ls = E_ls / np.sum(E_ls)
    return np.random.choice(el_list, p=E_ls)
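As a quick sanity check on sample_k (a sketch using the same dpp helpers as above, assuming decompose_kernel exposes the kernel eigenvalues under 'D'), the empirical frequencies of the sampled sizes should match the closed form $e_{k'}^N / \sum_{j=k+1}^{N} e_j^N$ on a small kernel:

# Small example: a 5 x 5 kernel over the columns of Z, so k' ranges over {3, 4, 5} when k = 2
Z = np.random.normal(size=(50, 5))
L_small = decompose_kernel(rbf_kernel(Z.T))
k_cond, N_small = 2, 5

# Closed-form conditional distribution over k'
E_small = elem_sympoly(L_small['D'], N_small)
closed_form = np.array([E_small[j, -1] for j in range(k_cond + 1, N_small + 1)])
closed_form = closed_form / closed_form.sum()

# Empirical frequencies from repeated draws of sample_k
draws = np.array([sample_k(L_small, k_cond) for _ in range(20000)])
empirical = np.array([(draws == j).mean() for j in range(k_cond + 1, N_small + 1)])
print(closed_form)
print(empirical)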
In [3]:
X = np.random.normal(size=(100, 10))
In [4]:
feat_dist = rbf_kernel(X.T)      # 10 x 10 RBF similarity between the columns (features) of X
L = decompose_kernel(feat_dist)
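decompose_kernel comes from the dpp module; sample_k only relies on the eigenvalues being stored under 'D'. A minimal sketch of an eigendecomposition with that layout (the 'V' field for eigenvectors is an assumption, since this notebook never touches it):

def decompose_kernel_sketch(M):
    # Symmetric eigendecomposition of the L-ensemble kernel; 'D' matches the
    # field sample_k reads, 'V' (eigenvectors) is assumed for completeness.
    vals, vecs = np.linalg.eigh(M)
    return {'M': M, 'D': vals, 'V': vecs}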
In [5]:
sample_k(L, 7)
Out[5]:
In [6]:
sample_dpp(L)
Out[6]:
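Putting the pieces together: draw a size k' > k from the conditional distribution, then draw a set of exactly that size. In Kulesza's reference MATLAB code, sample_dpp takes an optional second argument giving the set size; whether the dpp module used here mirrors that signature is an assumption, so treat the call below as a sketch.

# Assumption: sample_dpp accepts an optional set size as its second argument,
# mirroring the reference MATLAB implementation. Adapt the call if it differs.
k = 7
k_prime = sample_k(L, k)       # a size in {8, 9, 10}, with P(k') proportional to e_{k'}^N
Y = sample_dpp(L, k_prime)     # a sample of exactly k_prime items (under the assumed signature)
print(k_prime, Y)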