One student emailed with the following question:

Right now I'm trying to edit the example entropy function to the one you wrote on the board in class.

My question is the example code only has one P_s, right? Our goal is to add four different values but I don't understand how to code P_w, P_h and so on. Would you give me more details and advice on this?

As a hint, I will rewrite the entropy formula I wrote to explicitly look up the various categories.

def entropy(series):
    """Normalized Shannon Index"""
    # a series in which all the entries are equal should result in normalized entropy of 1.0
    # eliminate 0s
    series1 = series[series!=0]

    # if fewer than 2 nonzero entries remain (i.e., 0 or 1), return 0
    if len(series1) > 1:
        # calculate the maximum possible entropy for given length of input series
        max_s = -np.log(1.0/len(series))
        total = float(sum(series1))
        p = series1.astype('float')/float(total)
        return sum(-p*np.log(p))/max_s
    else:
        return 0.0
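
For reference, the quantity the entropy function computes can be written out as a formula. The symbols here are my own shorthand and may not match the notation from the board: p_i is the share of category i in the total, and N is the number of categories in the input series (len(series), counting any zero entries). Terms with p_i = 0 are dropped, following the usual convention that 0 log 0 = 0.

```latex
H_{\text{norm}} = \frac{-\sum_{i=1}^{N} p_i \ln p_i}{\ln N},
\qquad p_i = \frac{n_i}{\sum_{j=1}^{N} n_j}
```

When every category has the same share, p_i = 1/N, the numerator equals ln N and the normalized entropy is 1.0.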

# supporting imports

import numpy as np
from pandas import Series

def entropy_term(p):
    """Individual Shannon entropy term -- handles the case in which p is 0"""
    if p == 0:
        return 0
    else:
        return -p*np.log(p)

def entropy5_explicit_labels(series):
    """entropy5 calculation for an input Series with 5 categories"""
    # calculate the normalizing term -- what's the maximum entropy
    # there are five categories here
    max_s = -np.log(1.0/5)
    total = float(series['White']+series['Black']+series['Asian']+ \
                  series['Hispanic']+series['Other'])
    s = entropy_term(series['White']/total) + \
        entropy_term(series['Black']/total) + \
        entropy_term(series['Asian']/total) + \
        entropy_term(series['Hispanic']/total) + \
        entropy_term(series['Other']/total)
    s = s/max_s
    return s

def entropy4_explicit_labels(series):
    """entropy4 calculation for an input Series with 4 categories"""
    # calculate the normalizing term -- what's the maximum entropy
    # there are four categories here
    max_s = -np.log(1.0/4)
    # don't include Other in the total
    total = float(series['White']+series['Black']+series['Asian']+ \
                  series['Hispanic'])
    s = entropy_term(series['White']/total) + \
        entropy_term(series['Black']/total) + \
        entropy_term(series['Asian']/total) + \
        entropy_term(series['Hispanic']/total)
    s = s/max_s
    return s

# Using the population figures for the Houston Metro Area
# Make a pandas Series out of the dict

houston = Series({'Asian': 384596,
 'Black': 998883,
 'Hispanic': 2099412,
 'Other': 103437,
 'White': 2360472})

Note how the general entropy function can do both the entropy5 and the entropy4 calculation: the only change is which subset of the houston Series is passed in.

# comparing two ways of doing the entropy5 calculation
(entropy(houston[['White', 'Black', 'Asian', 'Hispanic', 'Other']]),
 entropy5_explicit_labels(houston))

(0.79628076626851163, 0.79628076626851163)

# comparing two ways of doing the entropy4 calculation 
# don't include Other

(entropy(houston[['White', 'Black', 'Asian', 'Hispanic']]),
 entropy4_explicit_labels(houston))

(0.87642479416885899, 0.87642479416885899)
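
The subsetting idea can also be packaged so that the list of labels is an explicit argument. The following is only a sketch: entropy_for_labels is a hypothetical helper name that does not appear in the course code, and it assumes the input Series contains every label it is asked for.

```python
import numpy as np
from pandas import Series

def entropy_for_labels(series, labels):
    """Normalized Shannon entropy over the given labels (hypothetical helper)."""
    # keep only the requested categories, as floats
    counts = series[labels].astype(float)
    total = counts.sum()
    # with fewer than 2 categories (or no population) there is no diversity to measure
    if len(labels) < 2 or total == 0:
        return 0.0
    p = counts / total
    # drop zero shares, since 0*log(0) is treated as 0
    p = p[p > 0]
    # normalize by the maximum entropy for this many categories
    return float(-(p * np.log(p)).sum() / np.log(len(labels)))

houston = Series({'Asian': 384596,
                  'Black': 998883,
                  'Hispanic': 2099412,
                  'Other': 103437,
                  'White': 2360472})

e5 = entropy_for_labels(houston, ['White', 'Black', 'Asian', 'Hispanic', 'Other'])
e4 = entropy_for_labels(houston, ['White', 'Black', 'Asian', 'Hispanic'])
print(e5, e4)
```

Under these assumptions, e5 and e4 should agree with the entropy5 and entropy4 values shown above, up to floating-point rounding.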

Writing an entropy_rice function is left to the reader.

