In the feature selection context, the Mutual Information between a label Y and a predictor X is the amount of information shared between their distributions P(X) and P(Y): it measures how much knowing the value of X reduces our uncertainty (entropy) about Y.
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/d4/Entropy-mutual-information-relative-entropy-relation-diagram.svg/744px-Entropy-mutual-information-relative-entropy-relation-diagram.svg.png" width="400" height="400"/>
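As a minimal sketch of the definition, mutual information can be computed directly from a joint distribution. The joint table below is purely illustrative and is not taken from the lenses data:

```python
import numpy as np

# Hypothetical joint distribution of two binary variables
# (rows: values of Y, columns: values of X); numbers are illustrative.
P_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

P_x = P_xy.sum(axis=0)   # marginal P(X)
P_y = P_xy.sum(axis=1)   # marginal P(Y)

# I(Y, X) = sum_{x,y} p(x, y) * log( p(x, y) / (p(x) * p(y)) )
mi = np.sum(P_xy * np.log(P_xy / np.outer(P_y, P_x)))
```

Because the off-diagonal mass is small, X and Y are strongly dependent here and `mi` comes out well above zero; for independent variables the ratio inside the log is 1 everywhere and the mutual information is 0.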
In [1]:
import pandas as pd
import numpy as np
lenses_data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/lenses/lenses.data', sep='\s+', header=None)
Attribute Information:
Class (lens type):
1 : the patient should be fitted with hard contact lenses,
2 : the patient should be fitted with soft contact lenses,
3 : the patient should not be fitted with contact lenses.
Attributes:
1. age of the patient: (1) young, (2) pre-presbyopic, (3) presbyopic
2. spectacle prescription: (1) myope, (2) hypermetrope
3. astigmatic: (1) no, (2) yes
4. tear production rate: (1) reduced, (2) normal
In [2]:
lenses_data.columns= ['index', 'age', 'spec_type', 'astigmatic', 'tear_prod_rate', 'lens_type']
lenses_data = lenses_data.set_index('index')
In [3]:
lenses_data.head()
Out[3]:
In [4]:
lens_type_names = {1: 'hard', 2: 'soft', 3: 'no_lens'}
lenses_data = lenses_data.assign(lens_type=lenses_data.lens_type.map(lambda n: lens_type_names[n]))
In [5]:
type_names = {1: 'no', 2: 'yes'}
lenses_data = lenses_data.assign(astigmatic=lenses_data.astigmatic.map(lambda n: type_names[n]))
In [6]:
lenses_data.head()
Out[6]:
In [7]:
from sklearn.metrics import mutual_info_score
mutual_info_score(lenses_data['lens_type'], lenses_data['astigmatic'])
Out[7]:
Mutual Information is defined by
$I(Y, X) = H(Y) - H(Y \mid X)$
The first step is to calculate $H(Y) = - \sum_{y \in Y} P(y) \log P(y)$
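The entropy $H(Y)$ can be sketched with a tiny helper function (natural log, so the result is in nats, matching `np.log` used below):

```python
import numpy as np

# Entropy of a discrete distribution in nats (natural log).
def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                 # convention: 0 * log(0) is taken as 0
    return -np.sum(p * np.log(p))

# A fair coin attains the maximum entropy for two outcomes: log(2) nats.
fair = entropy([0.5, 0.5])
# A certain outcome carries no uncertainty at all.
certain = entropy([1.0])
```

For two outcomes, `entropy` ranges from 0 (one outcome is certain) up to `np.log(2)` (both outcomes equally likely).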
In [8]:
y_counts = lenses_data['astigmatic'].value_counts()
In [9]:
y_counts
Out[9]:
In [10]:
P_y = lenses_data['astigmatic'].value_counts(normalize=True)
In [11]:
P_y
Out[11]:
In [12]:
H_y = - P_y.dot(np.log(P_y))
In [13]:
H_y
Out[13]:
The next step is to calculate $H(Y \mid X)$:
$H(Y \mid X) = \sum_{x \in X} p(x)\, H(Y \mid X=x) = - \sum_{x \in X} p(x) \sum_{y \in Y} p(Y=y \mid X=x) \log p(Y=y \mid X=x)$
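This formula can be sketched on a small hypothetical conditional table (the numbers are illustrative, not the lenses data); note the same zero-handling trick used in the cells below, where zeros are replaced by 1 inside the log so that $0 \log 0$ contributes 0:

```python
import numpy as np

# Hypothetical conditional table: rows are values of Y, columns values of X;
# each column is already normalised to P(Y | X = x).
P_y_given_x = np.array([[1.0, 0.5],
                        [0.0, 0.5]])
P_x = np.array([0.5, 0.5])

# Replace zeros by 1 inside the log only, so 0 * log(0) contributes 0.
safe = np.where(P_y_given_x > 0, P_y_given_x, 1.0)
H_per_x = -np.sum(P_y_given_x * np.log(safe), axis=0)  # H(Y | X = x)
H_y_given_x = P_x @ H_per_x                            # weight by p(x)
```

Here Y is fully determined when X takes its first value (that column contributes 0 entropy) and maximally uncertain for the second, so the weighted average lands at half of `np.log(2)`.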
In [14]:
cont_table = pd.crosstab(lenses_data['astigmatic'], lenses_data['lens_type'])
cont_table
Out[14]:
In [15]:
n_elems = cont_table.sum(axis=0)
n_elems
Out[15]:
In [16]:
P_cond = cont_table / n_elems
P_cond
Out[16]:
In [17]:
P_x = lenses_data['lens_type'].value_counts(normalize=True)
P_x
Out[17]:
In [18]:
P_cond_aug = P_cond.where(P_cond != 0, other=1.0)
P_cond_aug
Out[18]:
In [19]:
H_temp = P_cond.mul(np.log(P_cond_aug)).sum(axis=0)
H_temp
Out[19]:
In [20]:
H_Y_given_X = - H_temp.dot(P_x)
H_Y_given_X
Out[20]:
In [21]:
mutual_info = H_y - H_Y_given_X
In [22]:
mutual_info
Out[22]:
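As a final numpy-only sanity check (again on a hypothetical joint table, not the lenses data), the decomposition $H(Y) - H(Y \mid X)$ computed above should agree with the direct definition $\sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$:

```python
import numpy as np

# Hypothetical joint table: rows are values of Y, columns values of X.
P_xy = np.array([[0.30, 0.05, 0.10],
                 [0.05, 0.25, 0.25]])
P_x = P_xy.sum(axis=0)
P_y = P_xy.sum(axis=1)

# Decomposition route: H(Y) - H(Y | X), as in the cells above.
H_y = -np.sum(P_y * np.log(P_y))
P_cond = P_xy / P_x                      # P(Y | X = x), column-wise
H_y_given_x = -np.sum(P_x * np.sum(P_cond * np.log(P_cond), axis=0))
mi_decomposed = H_y - H_y_given_x

# Direct route: sum over the joint table.
mi_direct = np.sum(P_xy * np.log(P_xy / np.outer(P_y, P_x)))
```

The two routes agree to floating-point precision, which is a useful check on the sign conventions and the axis along which the conditional table is normalised.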