In [7]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics.pairwise import euclidean_distances
In [4]:
X = np.random.normal(size=(10, 10))
Entropy
Based on the Mitra paper. The distance is defined as:
$$D_{pq} = \left( \sum_{j=1}^M \left(\frac{x_{p,j} - x_{q,j}}{\text{max}_j - \text{min}_j}\right)^2 \right)^{1/2}$$
This is simply MinMaxScaler followed by Euclidean distance.
Then we further define
$$\text{sim}(p, q) = e^{-\alpha D_{pq}}$$
where $\alpha = \frac{-\log 0.5}{\bar{D}}$ and $\bar{D}$ is the average pairwise distance over the whole dataset.
Then, using this, we can calculate the entropy:
$$-\sum_{p=1}^l \sum_{q=1}^l \left(\text{sim}(p, q) \times \log \text{sim}(p, q) + (1-\text{sim}(p, q))\times \log(1-\text{sim}(p, q))\right)$$
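As a quick, illustrative check of the definitions above (using the X drawn earlier; the intermediate names such as X_scaled, D_check, and ranges are just for this sketch), the scaled Euclidean distance can be compared against the formula directly, and alpha and the similarity matrix computed from the mean pairwise distance:

# D_pq: scale each feature to [0, 1], then take Euclidean distances
X_scaled = MinMaxScaler().fit_transform(X)
D_check = euclidean_distances(X_scaled)

# the same distance written out from the formula, dividing each feature by its range
ranges = X.max(axis=0) - X.min(axis=0)
diffs = (X[:, None, :] - X[None, :, :]) / ranges
assert np.allclose(D_check, np.sqrt((diffs ** 2).sum(axis=-1)))

# alpha is chosen so that sim(p, q) = 0.5 when D_pq equals the mean pairwise distance
D_bar = D_check[np.triu_indices_from(D_check, k=1)].mean()
alpha = -np.log(0.5) / D_bar
sim = np.exp(-alpha * D_check)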
In [6]:
mm = MinMaxScaler()
X_mm = mm.fit_transform(X)
In [25]:
np.exp(np.array([1,1,1,1,1,1]))
Out[25]:
In [36]:
def entropy(X):
    # scale each feature to [0, 1] so the distances match D_pq above
    mm = MinMaxScaler()
    X_mm = mm.fit_transform(X)
    Dpq = euclidean_distances(X_mm)
    # take each unordered pair once via the strict upper triangle
    iu = np.triu_indices_from(Dpq, k=1)
    D_bar = np.mean(Dpq[iu])
    alpha = -np.log(0.5) / D_bar  # sim(p, q) = 0.5 at the mean distance
    sim_pq = np.exp(-alpha * Dpq[iu])
    log_sim_pq = np.log(sim_pq)
    # each pair appears once above, so double the sum (sim is symmetric)
    entropy = -2 * np.sum(sim_pq * log_sim_pq + (1 - sim_pq) * np.log(1 - sim_pq))
    return entropy
In [41]:
entropy(np.random.normal(size=(10, 2)))
Out[41]:
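If this entropy behaves as the Mitra paper intends, data with clear cluster structure should score lower than a uniform scatter of the same size. A rough, illustrative comparison (the cluster centers, spreads, and sample sizes below are arbitrary, not from the original notebook):

rng = np.random.RandomState(0)
# two tight, well-separated clusters vs. a uniform scatter of the same size
clustered = np.vstack([rng.normal(0, 0.05, size=(50, 2)),
                       rng.normal(5, 0.05, size=(50, 2))])
scattered = rng.uniform(0, 5, size=(100, 2))
print(entropy(clustered), entropy(scattered))  # clustered should typically be lower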
In [42]:
from sklearn.mixture import BayesianGaussianMixture
In [61]:
bgm = BayesianGaussianMixture(n_components=10)
In [62]:
X = np.random.normal(size=(1000,)).reshape(-1, 1)
In [63]:
bgm.fit(X)
Out[63]:
In [64]:
bgm.predict(X)
Out[64]:
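With n_components=10 acting as an upper bound, the variational fit on this single-Gaussian sample usually concentrates almost all of the mixture weight on one or two components. One way to inspect that (illustrative; the exact weights depend on the random draw):

# mixture weights: most of the 10 components typically end up near zero
print(np.round(bgm.weights_, 3))
# labels that actually get assigned
print(np.unique(bgm.predict(X)))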