In the feature selection context, the Mutual Information between a label Y and a predictor X is the amount of information shared between their distributions P(X) and P(Y): it measures how much knowing the value of X reduces our uncertainty (entropy) about Y.
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Entropy-mutual-information-relative-entropy-relation-diagram.svg/744px-Entropy-mutual-information-relative-entropy-relation-diagram.svg.png" width="400" height="400"/>
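As a minimal sketch of the definition, mutual information can be computed directly from a joint distribution. The joint table below is purely illustrative and is not taken from the lenses data:

```python
import numpy as np

# Hypothetical joint distribution of two binary variables
# (rows: values of Y, columns: values of X); numbers are illustrative.
P_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

P_x = P_xy.sum(axis=0)   # marginal P(X)
P_y = P_xy.sum(axis=1)   # marginal P(Y)

# I(Y, X) = sum_{x,y} p(x, y) * log( p(x, y) / (p(x) * p(y)) )
mi = np.sum(P_xy * np.log(P_xy / np.outer(P_y, P_x)))
```

Because the off-diagonal mass is small, X and Y are strongly dependent here and `mi` comes out well above zero; for independent variables the ratio inside the log is 1 everywhere and the mutual information is 0.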
In [1]:
import pandas as pd
import numpy as np
lenses_data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/lenses/lenses.data', sep='\s+', header=None)
Attribute Information:
Class (lens type):
1 : the patient should be fitted with hard contact lenses,
2 : the patient should be fitted with soft contact lenses,
3 : the patient should not be fitted with contact lenses.
Attributes:
1. age of the patient: (1) young, (2) pre-presbyopic, (3) presbyopic
2. spectacle prescription: (1) myope, (2) hypermetrope
3. astigmatic: (1) no, (2) yes
4. tear production rate: (1) reduced, (2) normal
In [2]:
lenses_data.columns= ['index', 'age', 'spec_type', 'astigmatic', 'tear_prod_rate', 'lens_type']
lenses_data = lenses_data.set_index('index')
In [3]:
lenses_data.head()
Out[3]:
In [4]:
lens_type_names = {1: 'hard', 2: 'soft', 3: 'no_lens'}
lenses_data = lenses_data.assign(lens_type=lenses_data.lens_type.map(lambda n: lens_type_names[n]))
In [5]:
type_names = {1: 'no', 2: 'yes'}
lenses_data = lenses_data.assign(astigmatic=lenses_data.astigmatic.map(lambda n: type_names[n]))
In [6]:
lenses_data.head()
Out[6]:
In [7]:
from sklearn.metrics import mutual_info_score
mutual_info_score(lenses_data['lens_type'], lenses_data['astigmatic'])
Out[7]:
Mutual Information is defined by
$I(Y, X) = H(Y) - H(Y \mid X)$
The first step is to calculate $H(Y) = - \sum_{y \in Y} P(y) \log P(y)$
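The entropy $H(Y)$ can be sketched with a tiny helper function (natural log, so the result is in nats, matching `np.log` used below):

```python
import numpy as np

# Entropy of a discrete distribution in nats (natural log).
def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                 # convention: 0 * log(0) is taken as 0
    return -np.sum(p * np.log(p))

# A fair coin attains the maximum entropy for two outcomes: log(2) nats.
fair = entropy([0.5, 0.5])
# A certain outcome carries no uncertainty at all.
certain = entropy([1.0])
```

For two outcomes, `entropy` ranges from 0 (one outcome is certain) up to `np.log(2)` (both outcomes equally likely).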
In [8]:
y_counts = lenses_data['astigmatic'].value_counts()
In [9]:
y_counts
Out[9]:
In [10]:
P_y = lenses_data['astigmatic'].value_counts(normalize=True)
In [11]:
P_y
Out[11]:
In [12]:
H_y = - P_y.dot(np.log(P_y))
In [13]:
H_y
Out[13]:
The next step is to calculate $H(Y \mid X)$:
$H(Y \mid X) = \sum_{x \in X} p(x)\, H(Y \mid X=x) = - \sum_{x \in X} p(x) \sum_{y \in Y} p(Y=y \mid X=x) \log p(Y=y \mid X=x)$
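This formula can be sketched on a small hypothetical conditional table (the numbers are illustrative, not the lenses data); note the same zero-handling trick used in the cells below, where zeros are replaced by 1 inside the log so that $0 \log 0$ contributes 0:

```python
import numpy as np

# Hypothetical conditional table: rows are values of Y, columns values of X;
# each column is already normalised to P(Y | X = x).
P_y_given_x = np.array([[1.0, 0.5],
                        [0.0, 0.5]])
P_x = np.array([0.5, 0.5])

# Replace zeros by 1 inside the log only, so 0 * log(0) contributes 0.
safe = np.where(P_y_given_x > 0, P_y_given_x, 1.0)
H_per_x = -np.sum(P_y_given_x * np.log(safe), axis=0)  # H(Y | X = x)
H_y_given_x = P_x @ H_per_x                            # weight by p(x)
```

Here Y is fully determined when X takes its first value (that column contributes 0 entropy) and maximally uncertain for the second, so the weighted average lands at half of `np.log(2)`.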
In [14]:
cont_table = pd.crosstab(lenses_data['astigmatic'], lenses_data['lens_type'])
cont_table
Out[14]:
In [15]:
n_elems = cont_table.sum(axis=0)
n_elems
Out[15]:
In [16]:
P_cond = cont_table / n_elems
P_cond
Out[16]:
In [17]:
P_x = lenses_data['lens_type'].value_counts(normalize=True)
P_x
Out[17]:
In [18]:
P_cond_aug = P_cond.where(P_cond != 0, other=1.0)
P_cond_aug
Out[18]:
In [19]:
H_temp = P_cond.mul(np.log(P_cond_aug)).sum(axis=0)
H_temp
Out[19]:
In [20]:
H_Y_given_X = - H_temp.dot(P_x)
H_Y_given_X
Out[20]:
In [21]:
mutual_info = H_y - H_Y_given_X
In [22]:
mutual_info
Out[22]:
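As a final numpy-only sanity check (again on a hypothetical joint table, not the lenses data), the decomposition $H(Y) - H(Y \mid X)$ computed above should agree with the direct definition $\sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$:

```python
import numpy as np

# Hypothetical joint table: rows are values of Y, columns values of X.
P_xy = np.array([[0.30, 0.05, 0.10],
                 [0.05, 0.25, 0.25]])
P_x = P_xy.sum(axis=0)
P_y = P_xy.sum(axis=1)

# Decomposition route: H(Y) - H(Y | X), as in the cells above.
H_y = -np.sum(P_y * np.log(P_y))
P_cond = P_xy / P_x                      # P(Y | X = x), column-wise
H_y_given_x = -np.sum(P_x * np.sum(P_cond * np.log(P_cond), axis=0))
mi_decomposed = H_y - H_y_given_x

# Direct route: sum over the joint table.
mi_direct = np.sum(P_xy * np.log(P_xy / np.outer(P_y, P_x)))
```

The two routes agree to floating-point precision, which is a useful check on the sign conventions and the axis along which the conditional table is normalised.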