Mutual Information

In the feature selection context, the mutual information between a label Y and a predictor X is the amount of entropy shared between the true distributions P(X) and P(Y); equivalently, it is the reduction in uncertainty about Y gained by observing X.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Entropy-mutual-information-relative-entropy-relation-diagram.svg/744px-Entropy-mutual-information-relative-entropy-relation-diagram.svg.png" width="400" height="400"/>
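Before turning to the dataset, the two extremes are worth seeing on toy labels (a hedged sketch; the arrays below are made up for illustration): when X fully determines Y, $I(Y, X) = H(Y)$, and when X is independent of Y, the mutual information is zero. `sklearn.metrics.mutual_info_score`, used later in this notebook, reports values in nats:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

# Toy labels, purely illustrative (not from the lenses dataset below)
y = [0, 0, 1, 1]

# X identical to Y: observing X removes all uncertainty about Y,
# so I(Y, X) = H(Y) = log(2) nats for a balanced binary label
mi_dependent = mutual_info_score(y, [0, 0, 1, 1])

# X carries no information about Y: the joint distribution is uniform,
# so the mutual information is zero
mi_independent = mutual_info_score(y, [0, 1, 0, 1])

print(mi_dependent, mi_independent)  # approximately log(2) and 0
```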


In [1]:
import pandas as pd
import numpy as np
# The file is whitespace-separated and ships without a header row.
lenses_data = pd.read_csv(
    'https://archive.ics.uci.edu/ml/machine-learning-databases/lenses/lenses.data',
    sep='\s+', header=None)

Attribute Information:

Class label (lens_type):

 1: the patient should be fitted with hard contact lenses,
 2: the patient should be fitted with soft contact lenses,
 3: the patient should not be fitted with contact lenses.

1. age of the patient: (1) young, (2) pre-presbyopic, (3) presbyopic
2. spectacle prescription:  (1) myope, (2) hypermetrope
3. astigmatic:     (1) no, (2) yes
4. tear production rate:  (1) reduced, (2) normal

In [2]:
# Name the columns and index the rows by the record number
lenses_data.columns = ['index', 'age', 'spec_type', 'astigmatic', 'tear_prod_rate', 'lens_type']

lenses_data = lenses_data.set_index('index')

In [3]:
lenses_data.head()


Out[3]:
age spec_type astigmatic tear_prod_rate lens_type
index
1 1 1 1 1 3
2 1 1 1 2 2
3 1 1 2 1 3
4 1 1 2 2 1
5 1 2 1 1 3

In [4]:
# Replace the numeric class codes with readable lens-type names
lens_type_names = {1: 'hard', 2: 'soft', 3: 'no_lense'}
lenses_data = lenses_data.assign(lens_type=lenses_data.lens_type.map(lens_type_names))

In [5]:
# Replace the numeric astigmatic codes with 'no'/'yes'
type_names = {1: 'no', 2: 'yes'}
lenses_data = lenses_data.assign(astigmatic=lenses_data.astigmatic.map(type_names))

In [6]:
lenses_data.head()


Out[6]:
age spec_type astigmatic tear_prod_rate lens_type
index
1 1 1 no 1 no_lense
2 1 1 no 2 soft
3 1 1 yes 1 no_lense
4 1 1 yes 2 hard
5 1 2 no 1 no_lense

Calculating the Mutual Information Score Directly


In [7]:
from sklearn.metrics import mutual_info_score
mutual_info_score(lenses_data['lens_type'], lenses_data['astigmatic'])


Out[7]:
0.26132011223880902
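A note on units: `mutual_info_score` uses the natural logarithm, so this score is in nats, consistent with the `np.log`-based derivation below. If bits are preferred, divide by log 2 (a small sketch, reusing the value above):

```python
import numpy as np

mi_nats = 0.26132011223880902  # the mutual_info_score value from above
mi_bits = mi_nats / np.log(2)  # convert nats to bits
print(mi_bits)
```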

Mutual Information is defined by

$I(Y, X) = H(Y) - H(Y \mid X)$

The first step is to calculate $H(Y) = - \sum_{y \in Y} P(y) \log P(y)$


In [8]:
y_counts = lenses_data['astigmatic'].value_counts()

In [9]:
y_counts


Out[9]:
no     12
yes    12
Name: astigmatic, dtype: int64

In [10]:
P_y = lenses_data['astigmatic'].value_counts(normalize=True)

In [11]:
P_y


Out[11]:
no     0.5
yes    0.5
Name: astigmatic, dtype: float64

In [12]:
# H(Y) = - sum_y P(y) * log P(y), using the natural log (nats)
H_y = - P_y.dot(np.log(P_y))

In [13]:
H_y


Out[13]:
0.69314718055994529

The next step is to calculate $H(Y \mid X)$.

$H(Y \mid X) = \sum_{x \in X} p(x) H(Y \mid X=x) = - \sum_{x \in X} p(x) \sum_{y \in Y} p(Y=y \mid X=x) \log p(Y=y \mid X=x)$
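The cells below build this double sum up with a contingency table; the same formula can also be written as a direct groupby, which may make its structure easier to see. A minimal sketch (the function name `conditional_entropy` is ours, not a library API):

```python
import numpy as np
import pandas as pd

def conditional_entropy(y, x):
    """H(Y | X) in nats: sum over x of p(x) * H(Y | X=x)."""
    y, x = pd.Series(y), pd.Series(x)
    h = 0.0
    for _, y_group in y.groupby(x):
        p_x = len(y_group) / len(y)                   # p(X = x)
        p_y_x = y_group.value_counts(normalize=True)  # p(Y = y | X = x)
        h -= p_x * (p_y_x * np.log(p_y_x)).sum()      # inner sum, negated
    return h
```

Called as `conditional_entropy(lenses_data['astigmatic'], lenses_data['lens_type'])`, this should reproduce the value derived step by step below.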


In [14]:
# Contingency table: counts of each (astigmatic, lens_type) pair
cont_table = pd.crosstab(lenses_data['astigmatic'], lenses_data['lens_type'])
cont_table


Out[14]:
lens_type hard no_lense soft
astigmatic
no 0 7 5
yes 4 8 0

In [15]:
# Column totals: number of observations for each lens_type value
n_elems = cont_table.sum(axis=0)
n_elems


Out[15]:
lens_type
hard         4
no_lense    15
soft         5
dtype: int64

In [16]:
# p(Y = y | X = x): normalise each lens_type column to sum to 1
P_cond = cont_table / n_elems
P_cond


Out[16]:
lens_type hard no_lense soft
astigmatic
no 0.0 0.466667 1.0
yes 1.0 0.533333 0.0

In [17]:
# Marginal distribution p(x) of lens_type
P_x = lenses_data['lens_type'].value_counts(normalize=True)
P_x


Out[17]:
no_lense    0.625000
soft        0.208333
hard        0.166667
Name: lens_type, dtype: float64

In [18]:
# Replace zeros with 1 so that log(1) = 0 implements the 0 * log(0) = 0 convention
P_cond_aug = P_cond.where(P_cond != 0, other=1.0)
P_cond_aug


Out[18]:
lens_type hard no_lense soft
astigmatic
no 1.0 0.466667 1.0
yes 1.0 0.533333 1.0

In [19]:
# Inner sum for each x (without the minus sign): sum_y p(y|x) * log p(y|x)
H_temp = P_cond.mul(np.log(P_cond_aug)).sum(axis=0)
H_temp


Out[19]:
lens_type
hard        0.000000
no_lense   -0.690923
soft        0.000000
dtype: float64

In [20]:
# Weight each inner sum by p(x) and negate; Series.dot aligns on the index
H_Y_given_X = - H_temp.dot(P_x)
H_Y_given_X


Out[20]:
0.43182706832113626

In [21]:
mutual_info = H_y - H_Y_given_X  # I(Y, X) = H(Y) - H(Y | X)

In [22]:
mutual_info


Out[22]:
0.26132011223880902
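The hand-computed value matches `mutual_info_score` exactly. As a further cross-check, the same number falls out of the equivalent joint-distribution form $I(Y, X) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}$; a sketch (the helper name `mutual_info_joint` is ours):

```python
import numpy as np
import pandas as pd

def mutual_info_joint(a, b):
    """I(A, B) in nats, computed from the joint distribution p(a, b)."""
    joint = pd.crosstab(pd.Series(a), pd.Series(b), normalize=True)  # p(a, b)
    p_a = joint.sum(axis=1)  # marginal p(a)
    p_b = joint.sum(axis=0)  # marginal p(b)
    mi = 0.0
    for i in joint.index:
        for j in joint.columns:
            p_ab = joint.loc[i, j]
            if p_ab > 0:  # skip zero cells: 0 * log(0) -> 0 by convention
                mi += p_ab * np.log(p_ab / (p_a[i] * p_b[j]))
    return mi
```

`mutual_info_joint(lenses_data['astigmatic'], lenses_data['lens_type'])` should again give roughly 0.2613.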