Population of California in 2010, by five categories: https://github.com/rdhyee/diversity-census-calc/blob/0.0.1/census_2010_sf1/state_five_categories.csv#L6
In [1]:
pops = [14956253, 2163804, 4775070, 14013719, 1345110]
CA_pop = 37253956
In [2]:
sum(pops) == CA_pop
Out[2]:
True
https://en.wikipedia.org/wiki/Diversity_index#Shannon_index
$H' = -\sum_{i=1}^R p_i \ln p_i$
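The cell that follows divides the index by $\ln 5$. As background (a standard property of the Shannon index, not something stated in the original notebook), the index is largest when all $R$ categories are equally common, $p_i = 1/R$:

$H'_{\max} = -\sum_{i=1}^R \frac{1}{R} \ln \frac{1}{R} = \ln R$

so $H' / \ln R$ lies between 0 and 1. With $R = 5$ categories, the divisor is $\ln 5 = -\ln 0.2$.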
In [3]:
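import math

s = 0.0
pop_total = sum(pops)
for pop in pops:
    p = pop / pop_total  # share of the total population in this category
    s += -p * math.log(p)
# normalize s so that it goes from 0 to 1:
# -ln(0.2) = ln(5) is the largest possible value for five categories
s /= -math.log(0.2)
print(s)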
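The calculation above can be wrapped in a reusable function. This is a minimal sketch rather than part of the original notebook: the name `normalized_shannon` is hypothetical, and it assumes that a category with a zero count contributes nothing to the sum (the usual convention that $p \ln p$ tends to 0 as $p$ tends to 0) and that at least two categories are supplied.

```python
import math


def normalized_shannon(counts):
    """Shannon index of counts, divided by ln(R) so that it lies in [0, 1].

    R is the number of categories, len(counts). Hypothetical helper.
    """
    total = sum(counts)
    if len(counts) < 2 or total <= 0:
        raise ValueError("need at least two categories and a positive total")
    h = 0.0
    for count in counts:
        if count > 0:  # assumption: empty categories contribute 0
            p = count / total
            h -= p * math.log(p)
    # ln(R) is the maximum of the index, reached when all shares are equal
    return h / math.log(len(counts))


pops = [14956253, 2163804, 4775070, 14013719, 1345110]
print(normalized_shannon(pops))
```

Five equal categories give a value of 1 (up to floating-point rounding), and a population concentrated in a single category gives 0.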
Compare with the entropy_5 column for California.
https://en.wikipedia.org/wiki/Diversity_index#Gini.E2.80.93Simpson_index
$1 - \lambda = 1 - \sum_{i=1}^R p_i^2 = 1 - 1/{}^2D$
In [4]:
pop_total = sum(pops)
p2_total = 0.0
for pop in pops:
    p = pop / pop_total
    p2_total += p * p
gs = 1 - p2_total
print(gs)
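The formula for this index also expresses it through ${}^2D$, the inverse Simpson index. As a rough sketch, the helper below returns both quantities so that the identity $1 - \lambda = 1 - 1/{}^2D$ can be checked numerically. The name `simpson_indices` is hypothetical and does not come from the original notebook.

```python
def simpson_indices(counts):
    """Return (Gini-Simpson index, inverse Simpson index) for counts.

    Hypothetical helper: lam is the sum of squared shares, so the
    Gini-Simpson index is 1 - lam and the inverse Simpson index is 1 / lam.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive total")
    lam = sum((count / total) ** 2 for count in counts)
    return 1.0 - lam, 1.0 / lam


pops = [14956253, 2163804, 4775070, 14013719, 1345110]
gs, d2 = simpson_indices(pops)
print(gs, d2)
# The two values should satisfy gs == 1 - 1/d2 up to rounding
print(abs(gs - (1.0 - 1.0 / d2)) < 1e-12)
```

For $R$ equally common categories the inverse Simpson index equals $R$, which makes it readable as an effective number of categories.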