Population of California in 2010, split into five categories (source row in the CSV): https://github.com/rdhyee/diversity-census-calc/blob/0.0.1/census_2010_sf1/state_five_categories.csv#L6



In [1]:

    
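# 2010 population counts for each of the five categories in California
pops = [14956253, 2163804, 4775070, 14013719, 1345110]
# total 2010 population of California
CA_pop = 37253956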



In [2]:

    
# sanity check: the five categories should add up to the state total
sum(pops) == CA_pop









    Out[2]:





True

Shannon index

https://en.wikipedia.org/wiki/Diversity_index#Shannon_index

$H' = -\sum_{i=1}^R p_i \ln p_i$
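
The cell below divides the result by $-\ln 0.2 = \ln 5$. As a short justification, assuming the standard result that $H'$ is largest when all $R$ categories are equally common ($p_i = 1/R$):

$H'_{\max} = -\sum_{i=1}^R \frac{1}{R} \ln \frac{1}{R} = \ln R$

so the normalized index is

$H'_{\text{norm}} = \frac{H'}{\ln R}, \qquad 0 \le H'_{\text{norm}} \le 1$

With $R = 5$ categories, $\ln R = \ln 5 \approx 1.609$.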



In [3]:

    
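import math

s = 0.0
pop_total = sum(pops)

# accumulate -p * ln(p) over the five categories
for pop in pops:
    p = pop / pop_total
    s += -p*math.log(p)

# normalize s so that it ranges from 0 to 1:
# -ln(0.2) = ln(5) is the maximum entropy for five equally sized categories
s /= -math.log(0.2)

print(s)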









    



0.7969941601550823

Compare this value with the entropy_5 column for California.
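
The loop above could be wrapped in a reusable function so the same calculation can be applied to other rows. This is only a sketch: the name `normalized_shannon` is hypothetical (it is not taken from the repository), and treating empty categories as contributing zero is an assumption based on the limit $p \ln p \to 0$ as $p \to 0$.

```python
import math


def normalized_shannon(counts):
    """Shannon index of `counts`, divided by ln(R) so it lies in [0, 1].

    Hypothetical helper for illustration. R is the number of categories,
    including any that are empty.
    """
    if len(counts) < 2:
        # With a single category ln(R) is 0, so report zero diversity.
        return 0.0
    total = sum(counts)
    # Skip empty categories: p * ln(p) tends to 0 as p tends to 0.
    props = [c / total for c in counts if c > 0]
    h = sum(-p * math.log(p) for p in props)
    return h / math.log(len(counts))


pops = [14956253, 2163804, 4775070, 14013719, 1345110]
print(normalized_shannon(pops))  # about 0.797, in line with the cell above
```

Dividing by `math.log(len(counts))` rather than a hard-coded `-math.log(0.2)` keeps the normalization correct if the number of categories changes.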

Gini-Simpson index

https://en.wikipedia.org/wiki/Diversity_index#Gini.E2.80.93Simpson_index

$1 - \lambda = 1 - \sum_{i=1}^R p_i^2 = 1 - 1/{}^2D$



In [4]:

    
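pop_total = sum(pops)
p2_total = 0.0

# accumulate the squared proportions (this sum is lambda in the formula above)
for pop in pops:
    p = pop / pop_total
    p2_total += p*p

# Gini-Simpson index: 1 - lambda
gs = 1-p2_total
print(gs)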









    



0.6762156265155197
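
The formula above also relates the Gini-Simpson index to ${}^2D$, the inverse Simpson index (the effective number of equally common categories). A minimal sketch of both quantities follows; the function names `gini_simpson` and `inverse_simpson` are hypothetical and not part of the repository.

```python
def simpson_lambda(counts):
    """Sum of squared proportions (lambda in the formula above)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def gini_simpson(counts):
    """Gini-Simpson index: 1 - lambda."""
    return 1.0 - simpson_lambda(counts)


def inverse_simpson(counts):
    """Inverse Simpson index 2D = 1 / lambda."""
    return 1.0 / simpson_lambda(counts)


pops = [14956253, 2163804, 4775070, 14013719, 1345110]
gs = gini_simpson(pops)
d2 = inverse_simpson(pops)

print(gs)  # about 0.676, in line with the cell above
print(d2)  # about 3.09 effective categories out of 5
```

The two results are consistent with the identity $1 - \lambda = 1 - 1/{}^2D$: subtracting `1 / d2` from 1 recovers `gs` up to floating-point rounding.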