Exercise

In this exercise, reproduce some of the findings from the Smithsonian Magazine travel article "What Makes Houston the Next Great American City?", specifically the calculation represented in the article's diversity bar chart, whose caption reads:

To assess the parity of the four major U.S. ethnic and racial groups, Rice University researchers used a scale called the Entropy Index. It ranges from 0 (a population has just one group) to 1 (all groups are equivalent). Edging New York for the most balanced diversity, Houston had an Entropy Index of 0.874 (orange bar).

The research report referenced by Smithsonian Magazine is Houston Region Grows More Racially/Ethnically Diverse, With Small Declines in Segregation: A Joint Report Analyzing Census Data from 1990, 2000, and 2010, by the Kinder Institute for Urban Research & the Hobby Center for the Study of Texas.

In the report, you'll find the following quotes:

How does Houston’s racial/ethnic diversity compare to the racial/ethnic diversity of other large metropolitan areas? The Houston metropolitan area is the most racially/ethnically diverse.

....

Houston is one of the most racially/ethnically diverse metropolitan areas in the nation as well. *It is the most diverse of the 10 largest U.S. metropolitan areas.* [emphasis mine] Unlike the other large metropolitan areas, all four major racial/ethnic groups have substantial representation in Houston with Latinos and Anglos occupying roughly equal shares of the population.

....

Houston has the highest entropy score of the 10 largest metropolitan areas, 0.874. New York is a close second with a score of 0.872.

....

Your task is:

  1. Tabulate all the metropolitan/micropolitan statistical areas. Remember that you have to group various entities that show up separately in the Census API but that belong to the same area. You should find 942 metropolitan/micropolitan statistical areas in the 2010 Census.

  2. Calculate the normalized Shannon index (entropy5) using the categories of White, Black, Hispanic, Asian, and Other, as outlined in the Day_07_G_Calculating_Diversity notebook.

  3. Calculate the normalized Shannon index (entropy4) by excluding the Other category. In other words, assume that the total population is the sum of White, Black, Hispanic, and Asian.

  4. Figure out how exactly the entropy score was calculated in the report from Rice University. Since you'll find that the reported entropy score matches neither entropy5 nor entropy4, you'll need to play around with the entropy calculation to figure out how to use 4 categories to get the score for Houston to come out to "0.874" and that for NYC to be "0.872". [I think I've done so and get 0.873618 and 0.872729, respectively.]

  5. Add a calculation of the Gini-Simpson diversity index using the five categories of White, Black, Hispanic, Asian, and Other.

  6. Note where the Bay Area stands in terms of the diversity index.
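As a rough sketch of the measures in steps 2, 3, and 5 (assuming natural logarithms and normalization by the log of the category count; step 4's Rice variant is deliberately left for you to reverse-engineer):

```python
import numpy as np

def entropy_normalized(counts):
    """Normalized Shannon index: entropy divided by ln(k), where k is
    the number of categories (zero counts still contribute to k)."""
    counts = np.asarray(counts, dtype=float)
    k = len(counts)
    p = counts / counts.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return -np.sum(p * np.log(p)) / np.log(k)

def gini_simpson(counts):
    """Gini-Simpson index, 1 - sum(p_i^2): the probability that two
    random draws come from different categories."""
    counts = np.asarray(counts, dtype=float)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# An even split over 5 groups maximizes both measures:
print(round(entropy_normalized([20, 20, 20, 20, 20]), 6))  # -> 1.0
print(round(gini_simpson([20, 20, 20, 20, 20]), 6))        # -> 0.8
```

Note the normalization: dividing by ln(5) (or ln(4) for entropy4) is what maps the Shannon index onto the 0-to-1 scale described in the caption quoted above.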

For bonus points:

  • make a bar chart in the style used in the Smithsonian Magazine

Deliverable:

  1. Upload your notebook as a gist, render it in nbviewer, and then enter the nbviewer URL (e.g., http://nbviewer.ipython.org/gist/rdhyee/60b6c0b0aad7fd531938)
  2. On bCourses, upload the CSV version of your msas_df.
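For the CSV deliverable, pandas' to_csv is enough. The sketch below uses a one-row stand-in frame (only Houston's total, 5946800, is real) and an assumed filename, msas_df.csv, just so it runs standalone:

```python
import pandas as pd

# Stand-in for the real 942-row msas_df; the filename "msas_df.csv"
# is an assumption, not specified by the assignment.
msas_df = pd.DataFrame({'Total': [5946800]},
                       index=pd.Index(['26420'], name='FIPS'))
msas_df.to_csv('msas_df.csv')
```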

HAVE FUN: ASK QUESTIONS AND WORK TOGETHER

Constraints

Below is testing code to help make sure you are on the right track. A key assumption made here is that you will end up with a Pandas DataFrame called msas_df, indexed by the FIPS code of a metropolitan/micropolitan area (e.g., Houston's code is 26420) and with the following columns:

  • Total
  • White
  • Black
  • Hispanic
  • Asian
  • Other
  • p_White
  • p_Black
  • p_Hispanic
  • p_Asian
  • p_Other
  • entropy4
  • entropy5
  • entropy_rice
  • gini_simpson

You should have 942 rows, one for each MSA. You can compare your results for entropy5 and entropy_rice with mine.
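The checks below illustrate the kind of testing meant, using a tiny hand-built stand-in for msas_df so the snippet runs on its own (Houston's total of 5946800 is real; the group counts are invented for illustration):

```python
import pandas as pd

# Two-row stand-in for msas_df, indexed by MSA FIPS code.
msas_df = pd.DataFrame(
    {'Total':    [5946800, 100],
     'White':    [2000000, 40],
     'Black':    [1000000, 30],
     'Hispanic': [2500000, 20],
     'Asian':    [350000, 5],
     'Other':    [96800, 5]},
    index=['26420', '99999'])

for grp in ['White', 'Black', 'Hispanic', 'Asian', 'Other']:
    msas_df['p_' + grp] = msas_df[grp] / msas_df['Total']

p_cols = ['p_White', 'p_Black', 'p_Hispanic', 'p_Asian', 'p_Other']

# the five group counts should partition the total
assert (abs(msas_df[p_cols].sum(axis=1) - 1.0) < 1e-9).all()
# Houston's FIPS code should appear in the index
assert '26420' in msas_df.index
```

Against the real frame you would also assert len(msas_df) == 942.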


In [43]:
# FILL IN WITH YOUR CODE

import census
import settings
import us


import time
from pandas import DataFrame, Series, Index
import pandas as pd
import numpy as np
from itertools import islice

c = census.Census(key=settings.CENSUS_KEY)

In [44]:
# def msas(variables="NAME"):
#     # tabulate a set of fips codes for the states
#     states_fips = set([s.fips for s in us.states.STATES])
#     geo = {'for':'metropolitan statistical area/micropolitan statistical area:*', 
#                'in':'state:*'}
               
#     for msa in c.sf1.get(variables, geo=geo):
#         if msa['state'] in states_fips:
#             yield msa


def msas(variables="NAME"):
    for state in us.STATES:
        geo = {'for': 'metropolitan statistical area/micropolitan statistical area:*',
               'in': 'state:{state_fips}'.format(state_fips=state.fips)}

        for msa in c.sf1.get(variables, geo=geo):
            yield msa


msa_list = list(islice(msas("NAME,P0010001"), None))
len(msa_list)


Out[44]:
1013

In [45]:
msa_list[:5]


Out[45]:
[{u'NAME': u'Albertville, AL Micro Area',
  u'P0010001': u'93019',
  u'metropolitan statistical area/micropolitan statistical area': u'10700',
  u'state': u'01'},
 {u'NAME': u'Alexander City, AL Micro Area',
  u'P0010001': u'53155',
  u'metropolitan statistical area/micropolitan statistical area': u'10760',
  u'state': u'01'},
 {u'NAME': u'Anniston-Oxford, AL Metro Area',
  u'P0010001': u'118572',
  u'metropolitan statistical area/micropolitan statistical area': u'11500',
  u'state': u'01'},
 {u'NAME': u'Auburn-Opelika, AL Metro Area',
  u'P0010001': u'140247',
  u'metropolitan statistical area/micropolitan statistical area': u'12220',
  u'state': u'01'},
 {u'NAME': u'Birmingham-Hoover, AL Metro Area',
  u'P0010001': u'1128047',
  u'metropolitan statistical area/micropolitan statistical area': u'13820',
  u'state': u'01'}]

In [46]:
df = DataFrame(msa_list)

In [82]:
df.groupby('metropolitan statistical area/micropolitan statistical area').count()
type(df)


Out[82]:
pandas.core.frame.DataFrame

In [61]:
df[['P0010001', 'state']] = \
    df[['P0010001', 'state']].astype('int')

In [92]:
# Initially tried:
#   grouped_msa = df.groupby('metropolitan statistical area/micropolitan statistical area').sum()
# which groups by MSA (with name, population, and state summed as columns),
# but that did not work for recovering the list of states per MSA,
# so keep the GroupBy object instead:

grouped_msa = df.groupby('metropolitan statistical area/micropolitan statistical area')

In [94]:
s = grouped_msa['P0010001'].sum()
s


Out[94]:
metropolitan statistical area/micropolitan statistical area
10020                                                           57999
10100                                                           40602
10140                                                           72797
10180                                                          165252
10220                                                           37492
10300                                                           99892
10420                                                          703200
10460                                                           63797
10500                                                          157308
10540                                                          116672
10580                                                          870716
10620                                                           60585
10660                                                           31255
10700                                                           93019
10740                                                          887077
...
49060                                                           36311
49100                                                           51461
49180                                                          477717
49260                                                           20081
49300                                                          114520
49340                                                          798552
49380                                                           21378
49420                                                          243231
49460                                                           22438
49540                                                           28065
49620                                                          434972
49660                                                          565773
49700                                                          166892
49740                                                          195751
49780                                                           86074
Name: P0010001, Length: 942, dtype: int32

In [201]:
type(s)


Out[201]:
pandas.core.series.Series

In [ ]:
# to get the msa_ids
msa_ids = s.index
msa_ids

In [119]:
msa_ids


Out[119]:
Index([u'10020', u'10100', u'10140', u'10180', u'10220', u'10300', u'10420', u'10460', u'10500', u'10540', u'10580', u'10620', u'10660', u'10700', u'10740', u'10760', u'10780', u'10820', u'10860', u'10880', u'10900', u'10940', u'10980', u'11020', u'11060', u'11100', u'11140', u'11180', u'11220', u'11260', u'11300', u'11340', u'11380', u'11420', u'11460', u'11500', u'11540', u'11580', u'11620', u'11660', u'11700', u'11740', u'11780', u'11820', u'11860', u'11900', u'11940', u'11980', u'12020', u'12060', u'12100', u'12140', u'12180', u'12220', u'12260', u'12300', u'12380', u'12420', u'12460', u'12540', u'12580', u'12620', u'12660', u'12700', u'12740', u'12780', u'12820', u'12860', u'12900', u'12940', u'12980', u'13020', u'13060', u'13100', u'13140', u'13180', u'13220', u'13260', u'13300', u'13340', u'13380', u'13420', u'13460', u'13500', u'13540', u'13620', u'13660', u'13700', u'13740', u'13780', u'13820', u'13860', u'13900', u'13940', u'13980', u'14020', u'14060', u'14100', u'14140', u'14180', ...], dtype='object')

In [111]:
# to list the states making up each MSA
for msa in s.index:
    print msa, s[msa], list(grouped_msa.get_group(msa).state)


10020 57999 [22]
10100 40602 [46]
10140 72797 [53]
10180 165252 [48]
10220 37492 [40]
10300 99892 [26]
10420 703200 [39]
10460 63797 [35]
10500 157308 [13]
10540 116672 [41]
10580 870716 [36]
10620 60585 [37]
10660 31255 [27]
10700 93019 [1]
10740 887077 [35]
10760 53155 [1]
10780 153922 [22]
10820 36009 [27]
10860 40838 [48]
10880 111408 [26]
10900 821173 [34, 42]
10940 42476 [26]
10980 29598 [26]
11020 127089 [42]
11060 26446 [40]
11100 249881 [48]
11140 37829 [13]
11180 89542 [19]
11220 50219 [36]
11260 380821 [2]
11300 131636 [18]
11340 187126 [45]
11380 14786 [48]
11420 34185 [18]
11460 344791 [26]
11500 118572 [1]
11540 225666 [55]
11580 34862 [12]
11620 56980 [40]
11660 22995 [5]
11700 424858 [37]
11740 53139 [39]
11780 101497 [39]
11820 37039 [41]
11860 16924 [20]
11900 64757 [39]
11940 52266 [47]
11980 78532 [48]
12020 192541 [13]
12060 5268860 [13]
12100 274549 [34]
12140 42223 [18]
12180 80026 [36]
12220 140247 [1]
12260 556877 [13, 45]
12300 122151 [23]
12380 39163 [27]
12420 1716289 [48]
12460 27842 [13]
12540 839631 [6]
12580 2710489 [24]
12620 153923 [23]
12660 61976 [55]
12700 215888 [25]
12740 59534 [50]
12780 50976 [40]
12820 27979 [22]
12860 60079 [36]
12900 36647 [5]
12940 802484 [22]
12980 136146 [26]
13020 107771 [26]
13060 36702 [48]
13100 22311 [31]
13140 388745 [48]
13180 88759 [55]
13220 78859 [54]
13260 46134 [18]
13300 31861 [48]
13340 45858 [39]
13380 201140 [53]
13420 44442 [27]
13460 157733 [41]
13500 28933 [45]
13540 37125 [50]
13620 39361 [33, 50]
13660 42798 [26]
13700 35012 [48]
13740 158050 [30]
13780 251725 [36]
13820 1128047 [1]
13860 18546 [6]
13900 108779 [38]
13940 45607 [16]
13980 162958 [51]
14020 192714 [18]
14060 169572 [17]
14100 85562 [42]
14140 107342 [51, 54]
14180 46480 [5]
14220 47168 [22]
14260 616561 [16]
14300 33915 [48]
14340 26306 [19]
14380 51079 [37]
14420 22150 [48]
14460 4552402 [25, 33]
14500 294567 [8]
14540 125953 [21]
14580 89513 [30]
14620 43450 [42]
14660 91067 [27]
14700 83877 [29]
14740 251133 [53]
14780 33718 [48]
14820 33090 [37]
14860 916829 [9]
14940 49975 [49]
15020 34869 [28]
15060 22364 [41]
15100 31965 [46]
15140 18787 [47]
15180 406220 [48]
15220 38106 [48]
15260 112370 [13]
15340 43784 [39]
15380 1135509 [36]
15420 43021 [16]
15460 47656 [17, 19]
15500 151131 [37]
15540 211261 [50]
15580 34200 [30]
15620 47584 [26]
15660 55186 [13]
15700 32618 [24]
15740 40087 [39]
15780 31488 [5]
15820 24512 [21]
15860 46824 [8]
15900 37069 [17]
15940 404422 [39]
15980 618754 [12]
16020 96275 [17, 29]
16060 60218 [17]
16100 53829 [35]
16180 55274 [32]
16220 75450 [56]
16260 46163 [49]
16300 257940 [19]
16340 41475 [13]
16380 40814 [39]
16420 31499 [21]
16460 39437 [17]
16500 75455 [53]
16540 149618 [42]
16580 231891 [17]
16620 304284 [54]
16660 64921 [17]
16700 664607 [45]
16740 1758038 [37, 45]
16820 201559 [51]
16860 528143 [13, 47]
16900 33140 [45]
16940 91738 [56]
16980 9461105 [17, 18, 55]
17020 220000 [6]
17060 78064 [39]
17140 2130151 [18, 21, 39]
17200 43742 [33]
17220 94196 [54]
17260 26151 [28]
17300 273949 [21, 47]
17340 64665 [6]
17380 34145 [28]
17420 115788 [47]
17460 2077240 [39]
17500 39140 [12]
17540 49116 [19]
17580 48376 [35]
17660 138494 [16]
17700 35471 [20]
17740 45248 [26]
17780 228660 [48]
17820 645613 [8]
17860 172786 [29]
17900 767598 [45]
17940 80956 [47]
17980 294865 [1, 13]
18020 76794 [18]
18060 59779 [28]
18100 32237 [31]
18140 1836536 [39]
18180 146445 [33]
18220 24277 [18]
18260 106042 [47]
18300 63043 [41]
18340 35637 [21]
18380 23439 [13]
18420 37057 [28]
18460 43041 [13]
18500 98990 [36]
18580 428185 [48]
18620 47735 [48]
18660 49336 [36]
18700 85579 [41]
18740 36901 [39]
18820 38124 [18]
18860 28610 [6]
18880 180822 [12]
18900 56053 [47]
18940 61773 [22]
18980 80406 [1]
19020 46689 [51]
19060 103299 [24, 54]
19100 6371773 [48]
19140 142227 [13]
19180 81625 [17]
19220 53174 [21]
19260 106561 [51]
19300 182265 [1]
19340 379690 [17, 19]
19380 841502 [39]
19460 153829 [1]
19500 110768 [17]
19540 34387 [18]
19580 39037 [39]
19620 48879 [48]
19660 494593 [12]
19700 25095 [35]
19740 2543482 [8]
19760 35654 [22]
19780 569633 [19]
19820 4296250 [26]
19860 24982 [38]
19900 32062 [45]
19940 36031 [17]
19980 33848 [20]
20020 145639 [1]
20060 50731 [13]
20100 162310 [10]
20140 58414 [13]
20180 81642 [42]
20220 93653 [19]
20260 279771 [27, 55]
20300 21904 [48]
20340 45048 [40]
20380 114678 [37]
20420 51334 [8]
20460 42416 [40]
20500 504357 [37]
20540 38335 [47]
20580 54258 [48]
20620 107841 [39]
20660 37782 [24]
20700 169842 [42]
20740 161151 [55]
20780 59507 [8]
20820 34242 [17]
20900 41280 [48]
20940 174528 [6]
20980 41639 [5]
21020 64094 [37]
21060 119736 [21]
21120 22119 [40]
21140 197559 [18]
21220 50805 [32]
21260 40915 [53]
21300 88830 [36]
21340 800647 [48]
21380 36480 [20]
21420 60580 [40]
21460 100199 [1]
21500 280566 [42]
21540 37069 [26]
21580 40246 [35]
21640 29970 [1, 13]
21660 351715 [41]
21700 134623 [6]
21740 21118 [56]
21780 358676 [18, 21]
21820 97581 [2]
21860 20840 [27]
21900 56418 [54]
21980 24877 [32]
22020 208777 [27, 38]
22060 64142 [27]
22100 65359 [29]
22140 130044 [35]
22180 366383 [37]
22220 463204 [5, 29]
22260 57303 [27]
22280 51980 [32]
22300 74782 [39]
22340 27172 [13]
22380 134421 [4]
22420 425790 [26]
22500 205566 [45]
22520 147137 [1]
22540 101633 [55]
22580 67810 [37]
22620 28258 [5]
22660 299630 [8]
22700 38013 [19]
22780 52274 [29]
22800 43001 [19, 29]
22820 28159 [8]
22840 71109 [1]
22860 52334 [22]
22900 298592 [5, 40]
22980 27695 [13]
23060 416257 [18]
23140 33224 [18]
23180 70706 [21]
23240 24837 [48]
23300 47711 [17]
23340 36691 [31]
23380 60944 [39]
23420 930450 [6]
23460 104430 [1]
23500 55342 [45]
23540 264275 [12]
23580 179684 [13]
23620 38437 [48]
23660 70626 [17]
23700 71492 [35]
23780 36776 [20]
23820 46997 [32]
23860 60158 [45]
23900 101407 [42]
23940 46133 [56]
23980 52272 [21]
24020 128923 [36]
24100 55531 [36]
24140 122623 [37]
24180 59672 [48]
24220 98461 [27, 38]
24260 72726 [31]
24300 146723 [8]
24340 774160 [26]
24380 27213 [35]
24420 82713 [41]
24460 27674 [20]
24500 81327 [30]
24540 252825 [8]
24580 306241 [55]
24620 68831 [47]
24660 723801 [37]
24700 25740 [18]
24740 51137 [28]
24780 189510 [37]
24820 52959 [39]
24860 636986 [45]
24900 42914 [28]
24940 69661 [45]
24980 21906 [28]
25060 248820 [28]
25100 20640 [40]
25180 269140 [24, 54]
25220 121097 [22]
25260 152982 [6]
25300 38948 [29]
25340 54181 [47]
25380 24913 [17]
25420 549475 [42]
25460 45233 [5]
25500 125228 [51]
25540 1212381 [9]
25580 37906 [31]
25620 142842 [28]
25660 16096 [30]
25700 28452 [20]
25720 23530 [49]
25740 74801 [30]
25760 21757 [5]
25780 45422 [37]
25820 19372 [48]
25860 365497 [37]
25900 185079 [15]
25940 187010 [45]
25980 77917 [13]
26020 64727 [35]
26100 263801 [26]
26140 141236 [12]
26180 953207 [15]
26220 22346 [41]
26260 31606 [5]
26300 96024 [5]
26340 38784 [26]
26380 208178 [22]
26420 5946800 [48]
26460 63096 [36]
26480 49683 [47]
26500 45913 [42]
26540 37124 [18]
26580 287702 [21, 39, 54]
26620 417593 [1]
26660 67861 [48]
26700 17398 [46]
26740 64511 [20]
26780 36651 [27]
26820 130374 [16]
26860 88880 [42]
26900 1756241 [18]
26940 29450 [28]
26980 152586 [19]
27020 30591 [26, 55]
27060 101564 [36]
27100 160248 [26]
27140 539057 [28]
27180 115425 [47]
27220 31464 [16, 56]
27260 1345596 [12]
27300 40902 [17]
27340 177772 [37]
27380 50845 [48]
27420 21100 [38]
27460 134905 [36]
27500 160331 [55]
27540 54734 [18]
27620 149807 [29]
27660 31594 [22]
27700 30099 [13]
27740 198716 [47]
27780 143679 [42]
27860 121026 [5]
27900 175518 [29]
27940 31275 [2]
27980 154834 [15]
28020 326589 [26]
28060 90928 [30]
28100 113449 [17]
28140 2035334 [20, 29]
28180 67091 [15]
28260 52591 [31]
28300 77117 [33]
28340 47536 [18]
28380 31953 [29]
28420 253340 [53]
28500 49625 [48]
28540 13477 [2]
28580 73090 [12]
28620 33920 [37]
28660 405300 [48]
28700 309544 [47, 51]
28740 182493 [36]
28780 32477 [48]
28820 59495 [37]
28860 30038 [29]
28900 66380 [41]
28940 698030 [47]
28980 13592 [2]
29020 98688 [18]
29060 60088 [33]
29100 133665 [27, 55]
29140 201789 [18]
29180 273738 [22]
29220 40716 [47]
29260 25748 [41]
29300 67044 [13]
29340 199607 [22]
29380 67531 [12]
29420 200186 [4]
29460 602095 [12]
29500 13833 [48]
29540 519445 [42]
29580 76652 [45]
29620 464036 [26]
29660 36299 [56]
29700 250304 [48]
29740 209233 [35]
29780 29393 [35]
29820 1951269 [32]
29860 84823 [28]
29900 36157 [37]
29940 110826 [20]
29980 41869 [47]
30020 124098 [40]
30060 35571 [29]
30100 174724 [33, 50]
30140 133568 [42]
30220 22935 [48]
30260 44947 [42]
30280 30617 [47]
30300 60888 [16, 53]
30340 107702 [23]
30380 46682 [42]
30420 26370 [31]
30460 472099 [21]
30500 105151 [24]
30580 22952 [20]
30620 106331 [39]
30660 30305 [17]
30700 302157 [31]
30740 78265 [37]
30780 699757 [5]
30820 39238 [42]
30860 125442 [16, 49]
30900 38966 [18]
30940 58849 [21]
30980 214369 [48]
31020 102410 [53]
31060 17950 [35]
31100 12828837 [6]
31140 1283566 [18, 21]
31180 284890 [48]
31260 86771 [48]
31300 134168 [37]
31340 252634 [51]
31380 32612 [17]
31420 232293 [13]
31460 150865 [6]
31500 32428 [18]
31540 568593 [55]
31580 46920 [21]
31620 24552 [5]
31660 51599 [36]
31700 400721 [33]
31740 127081 [20]
31820 81442 [55]
31860 96740 [27]
31900 124475 [39]
31920 42750 [48]
31940 65778 [26, 55]
31980 70061 [18]
32020 66501 [39]
32060 66357 [17]
32100 67077 [26]
32140 25857 [27]
32180 23370 [29]
32220 65631 [48]
32260 40648 [19]
32270 74749 [55]
32280 35021 [47]
32300 67972 [51]
32340 23370 [29]
32380 51749 [19]
32460 37121 [21]
32500 31360 [21]
32540 45837 [40]
32580 774769 [48]
32620 53535 [28]
32660 39839 [47]
32700 29180 [20]
32740 88765 [42]
32780 203206 [41]
32820 1316100 [5, 28, 47]
32860 43857 [55]
32900 255793 [6]
32940 107449 [28]
32980 28743 [55]
33020 25529 [29]
33060 31848 [40]
33100 5564635 [12]
33140 111467 [18]
33180 28691 [21]
33220 83629 [26]
33260 136872 [48]
33300 55149 [13]
33340 1555908 [55]
33380 41207 [22]
33420 28111 [48]
33460 3279833 [27, 55]
33500 69540 [38]
33540 109299 [30]
33580 22835 [46]
33620 25414 [29]
33660 412992 [1]
33700 514453 [6]
33740 176441 [22]
33780 152021 [26]
33820 36842 [55]
33860 374536 [1]
33940 41276 [8]
33980 66469 [37]
34020 54650 [22]
34060 129709 [54]
34100 136608 [47]
34140 37244 [16]
34180 89120 [53]
34220 45498 [13]
34260 41513 [5]
34300 27038 [16]
34340 73673 [37]
34380 70311 [26]
34420 32334 [48]
34460 44396 [21]
34500 47284 [17]
34540 60921 [39]
34580 116901 [53]
34620 117671 [18]
34660 37191 [21]
34700 54132 [19]
34740 172188 [26]
34780 70990 [40]
34820 269291 [45]
34860 64524 [48]
34900 136484 [6]
34940 321520 [12]
34980 1589934 [47]
35020 53119 [22, 28]
35060 39566 [22]
35100 126802 [37]
35140 37508 [45]
35220 49462 [18]
35260 91108 [42]
35300 862477 [9]
35340 73240 [22]
35380 1167764 [22]
35420 92582 [39]
35460 35662 [47]
35500 36842 [19]
35580 25893 [27]
35620 18897109 [34, 36, 42]
35660 156813 [26]
35700 47420 [4]
35740 48271 [31]
35820 37590 [31]
35840 702281 [12]
35860 28525 [18]
35900 69340 [37]
35940 59626 [39]
35980 274055 [9]
36020 78506 [53]
36060 46039 [54]
36100 331298 [12]
36140 97265 [34]
36180 51454 [24]
36220 137130 [48]
36260 547184 [49]
36300 111944 [36]
36340 54984 [42]
36380 39996 [12]
36420 1252987 [40]
36460 80317 [36]
36500 252264 [53]
36540 865350 [19, 31]
36580 62259 [36]
36620 53936 [16, 41]
36660 83384 [22]
36700 92501 [45]
36740 2134411 [12]
36780 166994 [55]
36820 22381 [19]
36860 154908 [17]
36900 35625 [19]
36940 36576 [27]
36980 114752 [21]
37020 70648 [26]
37060 47351 [28]
37100 823318 [6]
37140 98762 [17, 21]
37220 43946 [32]
37260 74364 [12]
37300 58458 [48]
37340 543376 [12]
37380 95696 [12]
37420 23464 [48]
37460 168852 [12]
37500 42090 [5]
37540 32330 [47]
37580 49793 [48]
37620 162056 [39, 54]
37660 21607 [20]
37700 162246 [28]
37740 53597 [4]
37780 13783 [48]
37800 33309 [19]
37820 87062 [41]
37860 448991 [12]
37900 379186 [17]
37940 36903 [18]
37980 5965343 [10, 24, 34, 42]
38020 55365 [6]
38060 4192887 [4]
38100 55834 [28]
38180 19988 [46]
38200 23421 [22]
38220 100258 [5]
38260 39134 [20]
38300 2356285 [42]
38340 131219 [25]
38380 36273 [48]
38420 51208 [55]
38460 82128 [36]
38500 47051 [18]
38540 90656 [16]
38580 58258 [39, 54]
38620 46562 [40]
38700 38950 [17]
38740 42794 [29]
38780 19846 [35]
38820 71404 [53]
38860 514098 [23]
38900 2226009 [41, 53]
38940 424107 [12]
39020 79499 [39]
39060 148289 [42]
39100 670301 [36]
39140 211033 [4]
39220 21403 [49]
39260 20978 [41]
39300 1600852 [25, 44]
39340 526810 [49]
39380 159063 [8]
39420 44776 [53]
39460 159978 [12]
39500 77314 [17, 29]
39540 195408 [55]
39580 1130490 [37]
39660 126382 [46]
39700 22134 [48]
39740 411442 [42]
39780 63463 [6]
39820 177223 [6]
39860 46183 [27]
39900 425417 [32]
39940 50778 [16]
39980 68917 [18]
40060 1258251 [51]
40080 99972 [21]
40100 60968 [48]
40140 4224851 [6]
40180 40123 [56]
40220 308707 [51]
40260 76790 [37]
40300 53497 [17]
40340 186011 [27]
40380 1054323 [36]
40420 349431 [17]
40460 46639 [37]
40500 39736 [23]
40540 43806 [56]
40580 152392 [37]
40620 45156 [29]
40660 96317 [13]
40700 107667 [41]
40740 65645 [35]
40760 20497 [35]
40780 83939 [5]
40820 63009 [22]
40860 61642 [50]
40900 2149127 [6]
40940 45657 [4]
40980 200169 [26]
41060 189093 [27]
41100 138115 [49]
41140 127329 [20, 29]
41180 2812896 [17, 29]
41220 50513 [13]
41260 31946 [42]
41420 390738 [41]
41460 61697 [20]
41500 415057 [6]
41540 125203 [24]
41580 138428 [37]
41620 1124197 [49]
41660 111823 [48]
41700 2142508 [48]
41740 3095313 [6]
41780 77079 [39]
41820 57866 [37]
41860 4335391 [6]
41940 1836911 [6]
42020 269637 [6]
42060 423895 [6]
42100 262382 [6]
42140 144170 [35]
42220 483878 [6]
42300 38520 [26]
42340 347611 [13]
42380 62622 [42]
42420 37660 [31]
42460 53227 [1]
42500 24181 [18]
42540 563631 [42]
42580 197145 [10]
42620 77076 [5]
42660 3439809 [53]
42680 138028 [12]
42700 98786 [12]
42740 42201 [29]
42780 39702 [42]
42820 43820 [1]
42860 74273 [45]
42900 35251 [36]
42940 89889 [47]
42980 42376 [18]
43060 69442 [40]
43100 115507 [55]
43140 98078 [37]
43180 45058 [47]
43220 60699 [53]
43260 29116 [56]
43300 120877 [48]
43320 107449 [4]
43340 398604 [22]
43380 49423 [39]
43420 131346 [4]
43460 39191 [29]
43500 29514 [35]
43540 27994 [8]
43580 143577 [19, 31, 46]
43620 228261 [46]
43660 16921 [48]
43700 63063 [21]
43740 77742 [42]
43780 319224 [18, 26]
43860 88247 [37]
43900 284307 [45]
43940 24097 [46]
43980 16667 [19]
44020 16667 [19]
44060 471221 [53]
44100 210170 [17]
44140 692942 [25]
44180 436712 [29]
44220 138333 [39]
44260 47671 [28]
44300 153990 [42]
44340 70217 [13]
44380 159437 [37]
44420 118502 [51]
44500 37890 [48]
44540 22709 [8]
44580 58498 [17]
44600 124454 [39, 54]
44620 70019 [55]
44660 77350 [40]
44700 685306 [6]
44740 20260 [19]
44780 61295 [26]
44860 35161 [48]
44900 26015 [13]
44940 107456 [45]
44980 94528 [42]
45000 34895 [6]
45020 15216 [48]
45060 662577 [36]
45140 46987 [40]
45180 82291 [1]
45220 367413 [12]
45260 12093 [22]
45300 2783243 [12]
45340 32937 [35]
45380 34800 [17]
45460 172425 [18]
45500 136027 [5, 48]
45520 25213 [41]
45540 93420 [12]
45580 27153 [13]
45620 44720 [13]
45640 162878 [37]
45660 56745 [39]
45700 40118 [13]
45740 26175 [13]
45780 651429 [39]
45820 233870 [20]
45860 189927 [9]
45900 143372 [26]
45940 366513 [34]
45980 32899 [1]
46020 98764 [6]
46060 980263 [4]
46100 100210 [47]
46140 937478 [40]
46180 136268 [28]
46220 219461 [1]
46260 21452 [1]
46300 99604 [16]
46340 209714 [48]
46380 87841 [6]
46420 28961 [45]
46460 38620 [21, 47]
46500 40097 [39]
46540 299397 [36]
46620 26405 [48]
46660 139588 [13]
46700 413344 [6]
46740 34215 [1]
46780 28744 [39]
46820 13864 [46]
46860 32588 [49]
46900 13535 [48]
46980 48773 [28]
47020 115384 [48]
47080 36346 [13]
47180 38440 [18]
47220 156898 [34]
47260 1671683 [37, 51]
47300 442179 [6]
47340 32888 [18]
47380 234906 [48]
47420 22897 [27, 38]
47460 58781 [53]
47500 38892 [45]
47540 45949 [39]
47580 139900 [13]
47620 41815 [42]
47660 52595 [29]
47700 77358 [18]
47780 31648 [18]
47820 47759 [37]
47900 5582170 [11, 24, 51, 54]
47920 29030 [39]
47940 167819 [19]
47980 33130 [46]
48020 83686 [55]
48060 116229 [36]
48100 27731 [12]
48140 134063 [55]
48180 55070 [13]
48220 27469 [40]
48300 110884 [53]
48460 40400 [29]
48500 20634 [28]
48540 147950 [39, 54]
48580 102228 [55]
48620 623061 [20]
48660 151306 [48]
48700 116111 [42]
48740 118428 [9]
48780 22398 [38]
48820 42239 [27]
48900 362315 [37]
48940 42040 [39]
48980 81234 [37]
49020 128472 [51, 54]
49060 36311 [20]
49100 51461 [27]
49180 477717 [37]
49260 20081 [40]
49300 114520 [39]
49340 798552 [25]
49380 21378 [27]
49420 243231 [53]
49460 22438 [46]
49540 28065 [28]
49620 434972 [42]
49660 565773 [39, 42]
49700 166892 [6]
49740 195751 [4]
49780 86074 [39]

In [73]:
def P005_range(n0,n1): 
    return tuple(('P005'+ "{i:04d}".format(i=i) for i in xrange(n0,n1)))

P005_vars = P005_range(1,18)
P005_vars_str = ",".join(P005_vars)
P005_vars_with_name = ['NAME'] + list(P005_vars)


# http://manishamde.github.io/blog/2013/03/07/pandas-and-python-top-10/#create
def convert_to_rdotmap(row):
    """Map the P005 variables of a row to a Series with
    Name, Total, White, Black, Hispanic, Asian, and Other."""
    return pd.Series({'Total':row['P0050001'],
                      'White':row['P0050003'],
                      'Black':row['P0050004'],
                      'Asian':row['P0050006'],
                      'Hispanic':row['P0050010'],
                      'Other': row['P0050005'] + row['P0050007'] + row['P0050008'] + row['P0050009'],
                      'Name': row['NAME']
                      }, index=['Name', 'Total', 'White', 'Black', 'Hispanic', 'Asian', 'Other'])


def normalize(s):
    """take a Series and divide each item by the sum so that the new series adds up to 1.0"""
    total = np.sum(s)
    return s.astype('float') / total


def entropy(series):
    """Normalized Shannon index: entropy divided by the maximum possible
    entropy for the number of categories, so an even split over all
    categories returns 1.0."""
    # drop zero-count categories (0 * log 0 is taken as 0), but keep the
    # original length for the normalization term
    series1 = series[series != 0]

    if len(series) > 1:
        # maximum possible entropy for the number of input categories
        max_s = -np.log(1.0 / len(series))

        total = float(sum(series1))
        p = series1.astype('float') / total
        return sum(-p * np.log(p)) / max_s
    else:
        # 0 or 1 categories: no diversity
        return 0.0

    
def convert_P005_to_int(df):
    # do conversion in place
    df[list(P005_vars)] = df[list(P005_vars)].astype('int')
    return df
    

def diversity(r):
    """Return a DataFrame with Name, Total, the five group counts
    (White, Black, Hispanic, Asian, Other), and the entropy5/entropy4
    diversity columns."""
    df = DataFrame(r)
    df = convert_P005_to_int(df)
    df1 = df.apply(convert_to_rdotmap, axis=1)

    df1['entropy5'] = df1[['Asian','Black','Hispanic','White','Other']].apply(entropy,axis=1)
    df1['entropy4'] = df1[['Asian','Black','Hispanic','White']].apply(entropy,axis=1)
    return df1
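The cell above stops at entropy5/entropy4. Here is a sketch of the two remaining columns: gini_simpson as asked in task 5, and one *candidate* for entropy_rice (an assumption for you to verify against the 0.874/0.872 targets, not the published formula): take proportions against the five-group Total but sum over only the four named groups, still normalizing by ln 4.

```python
import numpy as np
import pandas as pd

def gini_simpson(series):
    """Gini-Simpson index over the given category counts: 1 - sum(p_i^2)."""
    p = series.astype('float') / series.sum()
    return 1.0 - np.sum(p ** 2)

def entropy_rice(row):
    """Candidate Rice entropy score (assumed, not confirmed): proportions
    relative to the five-group Total, summed over four groups, over ln(4)."""
    p = row[['Asian', 'Black', 'Hispanic', 'White']].astype('float') / row['Total']
    p = p[p > 0]
    return -np.sum(p * np.log(p)) / np.log(4)

# Applied to the df1 returned by diversity():
#   df1['gini_simpson'] = df1[['Asian', 'Black', 'Hispanic', 'White', 'Other']].apply(gini_simpson, axis=1)
#   df1['entropy_rice'] = df1.apply(entropy_rice, axis=1)
```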

In [225]:
#http://api.census.gov/data/2010/sf1?get=P0010001&for=metropolitan+statistical+area/micropolitan+statistical+area:11260&in=state:02
def get_all_msas(variables='NAME'):
    count = 0
    for msa in s.index:
        for state in list(grouped_msa.get_group(msa).state):
            geo = {'for': 'metropolitan statistical area/micropolitan statistical area:{msa_fips}'.format(msa_fips=msa),
                   'in': 'state:{state_fips}'.format(state_fips=state)}

            for each_msa in c.sf1.get(variables, geo=geo):
                count += 1
                # debugging output for the New York MSA (35620)
                if msa == '35620':
                    print state
                    print each_msa
                yield each_msa

In [226]:
r = list(get_all_msas(P005_vars_with_name))


34
{u'P0050015': u'738', u'P0050014': u'4916', u'P0050011': u'693977', u'NAME': u'New York-Northern New Jersey-Long Island, NY-NJ-PA Metro Area (part)', u'metropolitan statistical area/micropolitan statistical area': u'35620', u'P0050016': u'419929', u'P0050017': u'84217', u'P0050012': u'59911', u'state': u'34', u'P0050009': u'93679', u'P0050008': u'23517', u'P0050013': u'13553', u'P0050005': u'7789', u'P0050004': u'773074', u'P0050007': u'1304', u'P0050006': u'608231', u'P0050001': u'6471215', u'P0050010': u'1277241', u'P0050003': u'3686380', u'P0050002': u'5193974'}
36
{u'P0050015': u'2988', u'P0050014': u'12491', u'P0050011': u'1246568', u'NAME': u'New York-Northern New Jersey-Long Island, NY-NJ-PA Metro Area (part)', u'metropolitan statistical area/micropolitan statistical area': u'35620', u'P0050016': u'1249846', u'P0050017': u'227263', u'P0050012': u'258337', u'state': u'36', u'P0050009': u'206336', u'P0050008': u'70159', u'P0050013': u'47653', u'P0050005': u'23461', u'P0050004': u'2267972', u'P0050007': u'3542', u'P0050006': u'1252026', u'P0050001': u'12368525', u'P0050010': u'3045146', u'P0050003': u'5499883', u'P0050002': u'9323379'}
42
{u'P0050015': u'3', u'P0050014': u'14', u'P0050011': u'3307', u'NAME': u'New York-Northern New Jersey-Long Island, NY-NJ-PA Metro Area (part)', u'metropolitan statistical area/micropolitan statistical area': u'35620', u'P0050016': u'1116', u'P0050017': u'412', u'P0050012': u'272', u'state': u'42', u'P0050009': u'797', u'P0050008': u'77', u'P0050013': u'49', u'P0050005': u'127', u'P0050004': u'3050', u'P0050007': u'13', u'P0050006': u'583', u'P0050001': u'57369', u'P0050010': u'5173', u'P0050003': u'47549', u'P0050002': u'52196'}

In [227]:
len(r)


Out[227]:
1013

In [297]:
test1 = DataFrame(r)
len(test1)
# test.P0050001.astype('int').sum()


Out[297]:
1013

In [298]:
import numpy as np
df_diversity = diversity(r)
df_diversity


Out[298]:
Name Total White Black Hispanic Asian Other entropy5 entropy4
0 Abbeville, LA Micro Area 57999 46305 8246 1381 1148 919 0.428364 0.445662
1 Aberdeen, SD Micro Area 40602 37774 192 554 358 1724 0.203138 0.113942
2 Aberdeen, WA Micro Area 72797 59282 762 6272 995 5486 0.422324 0.321742
3 Abilene, TX Metro Area 165252 112735 11549 35108 2110 3750 0.570100 0.597267
4 Ada, OK Micro Area 37492 25973 879 1523 244 8873 0.525798 0.282587
5 Adrian, MI Micro Area 99892 87483 2397 7614 492 1906 0.312895 0.300877
6 Akron, OH Metro Area 703200 579151 83984 10733 14124 15208 0.396953 0.394132
7 Alamogordo, NM Micro Area 63797 33716 2048 22026 716 5291 0.665741 0.617931
8 Albany, GA Metro Area 157308 68820 81671 3270 1455 2092 0.548816 0.594085
9 Albany-Lebanon, OR Micro Area 116672 101579 456 9127 1078 4432 0.316341 0.260671
10 Albany-Schenectady-Troy, NY Metro Area 870716 721505 62963 36027 26983 23238 0.423665 0.414162
11 Albemarle, NC Micro Area 60585 49877 6590 2166 1108 844 0.405878 0.424194
12 Albert Lea, MN Micro Area 31255 27701 200 2750 232 372 0.274817 0.275758
13 Albertville, AL Micro Area 93019 78060 1389 11238 475 1857 0.354368 0.347732
14 Albuquerque, NM Metro Area 887077 374214 19766 414222 16769 62106 0.662122 0.629810
15 Alexander City, AL Micro Area 53155 36442 14606 1272 211 624 0.482905 0.520690
16 Alexandria, LA Metro Area 153922 98984 45378 4349 1598 3613 0.546936 0.568047
17 Alexandria, MN Micro Area 36009 34974 146 341 163 385 0.104223 0.079173
18 Alice, TX Micro Area 40838 8062 156 32254 134 232 0.357935 0.392505
19 Allegan, MI Micro Area 111408 99945 1264 7454 620 2125 0.269400 0.249417
20 Allentown-Bethlehem-Easton, PA-NJ Metro Area (... 108692 93165 3592 7659 2642 1634 0.363594 0.371422
21 Allentown-Bethlehem-Easton, PA-NJ Metro Area (... 712481 552576 31325 98939 17525 12116 0.477843 0.501141
22 Alma, MI Micro Area 42476 37144 2341 2301 145 545 0.317042 0.322703
23 Alpena, MI Micro Area 29598 28623 78 304 148 445 0.114741 0.078099
24 Altoona, PA Metro Area 127089 121495 2051 1230 689 1624 0.148198 0.124293
25 Altus, OK Micro Area 26446 17403 1918 5538 303 1284 0.615835 0.604190
26 Amarillo, TX Metro Area 249881 160881 14570 63116 6319 4995 0.601431 0.640350
27 Americus, GA Micro Area 37829 17025 18061 1878 454 411 0.598709 0.659004
28 Ames, IA Metro Area 89542 77812 2125 2695 5369 1541 0.344768 0.343437
29 Amsterdam, NY Micro Area 50219 42732 712 5654 356 765 0.337035 0.339575
30 Anchorage, AK Metro Area 380821 256490 16125 25362 24283 58561 0.648631 0.524013
31 Anderson, IN Metro Area 131636 113577 10887 4189 547 2436 0.335387 0.328977
32 Anderson, SC Metro Area 187126 147362 29810 5447 1384 3123 0.427670 0.442685
33 Andrews, TX Micro Area 14786 7083 199 7195 85 224 0.530728 0.568132
34 Angola, IN Micro Area 34185 32513 155 982 162 373 0.154586 0.137582
35 Ann Arbor, MI Metro Area 344791 248675 43152 13860 27005 12099 0.585306 0.590587
36 Anniston-Oxford, AL Metro Area 118572 87285 24177 3893 830 2387 0.481695 0.498171
37 Appleton, WI Metro Area 225666 203691 1918 8049 6235 5773 0.276392 0.241187
38 Arcadia, FL Micro Area 34862 19549 4338 10425 169 381 0.633690 0.699896
39 Ardmore, OK Micro Area 56980 41684 3297 3641 547 7811 0.550703 0.406843
40 Arkadelphia, AR Micro Area 22995 16177 5366 926 120 406 0.506412 0.533308
41 Asheville, NC Metro Area 424858 366448 18976 27100 3664 8670 0.349423 0.340762
42 Ashland, OH Micro Area 53139 51352 395 501 275 616 0.119532 0.094279
43 Ashtabula, OH Micro Area 101497 92126 3467 3441 372 2091 0.260059 0.234344
44 Astoria, OR Micro Area 37039 32295 163 2838 445 1298 0.317363 0.268305
45 Atchison, KS Micro Area 16924 15152 844 383 70 475 0.284117 0.244408
46 Athens, OH Micro Area 64757 58779 1741 1002 1741 1494 0.269547 0.239279
47 Athens, TN Micro Area 52266 47273 2021 1482 385 1105 0.270483 0.245305
48 Athens, TX Micro Area 78532 63494 4813 8490 318 1417 0.421412 0.431882
49 Athens-Clarke County, GA Metro Area 192541 130515 37182 15313 6065 3466 0.598788 0.641689
50 Atlanta-Sandy Springs-Marietta, GA Metro Area 5268860 2671757 1679979 547400 252510 117214 0.729649 0.787682
51 Atlantic City-Hammonton, NJ Metro Area 274549 160871 40882 46241 20419 6136 0.730078 0.787988
52 Auburn, IN Micro Area 42223 40406 141 1031 186 459 0.139699 0.120237
53 Auburn, NY Micro Area 80026 73098 3009 1896 387 1636 0.248565 0.221134
54 Auburn-Opelika, AL Metro Area 140247 97900 31674 4571 3615 2487 0.537048 0.569332
55 Augusta-Richmond County, GA-SC Metro Area (part) 369793 190907 145579 15474 8146 9687 0.634125 0.666177
56 Augusta-Richmond County, GA-SC Metro Area (part) 187084 123762 48994 9238 1412 3678 0.551061 0.581431
57 Augusta-Waterville, ME Micro Area 122151 116549 642 1504 886 2570 0.151290 0.104215
58 Austin, MN Micro Area 39163 32975 791 4138 636 623 0.369001 0.375469
59 Austin-Round Rock-San Marcos, TX Metro Area 1716289 938474 120510 538313 80980 38012 0.688892 0.739489
... ... ... ... ... ... ... ... ...

1013 rows × 9 columns


In [299]:
temp_diversity = df_diversity


---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-299-a1d55402ef66> in <module>()
      1 temp_diversity = df_diversity
----> 2 df_diversity = df_diversity.groupby('metropolitan statistical area/micropolitan statistical area').sum()

/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/pandas/core/generic.pyc in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze)
   2432         axis = self._get_axis_number(axis)
   2433         return groupby(self, by, axis=axis, level=level, as_index=as_index,
-> 2434                        sort=sort, group_keys=group_keys, squeeze=squeeze)
   2435 
   2436     def asfreq(self, freq, method=None, how=None, normalize=False):

/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/pandas/core/groupby.pyc in groupby(obj, by, **kwds)
    787         raise TypeError('invalid type: %s' % type(obj))
    788 
--> 789     return klass(obj, by, **kwds)
    790 
    791 

/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/pandas/core/groupby.pyc in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze)
    236         if grouper is None:
    237             grouper, exclusions = _get_grouper(obj, keys, axis=axis,
--> 238                                                level=level, sort=sort)
    239 
    240         self.grouper = grouper

/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/pandas/core/groupby.pyc in _get_grouper(obj, key, axis, level, sort)
   1610             exclusions.append(gpr)
   1611             name = gpr
-> 1612             gpr = obj[gpr]
   1613 
   1614         if isinstance(gpr, Categorical) and len(gpr) != len(obj):

/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
   1633             return self._getitem_multilevel(key)
   1634         else:
-> 1635             return self._getitem_column(key)
   1636 
   1637     def _getitem_column(self, key):

/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
   1640         # get column
   1641         if self.columns.is_unique:
-> 1642             return self._get_item_cache(key)
   1643 
   1644         # duplicate columns & possible reduce dimensionaility

/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
    981         res = cache.get(item)
    982         if res is None:
--> 983             values = self._data.get(item)
    984             res = self._box_item_values(item, values)
    985             cache[item] = res

/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item)
   2752                 return self.get_for_nan_indexer(indexer)
   2753 
-> 2754             _, block = self._find_block(item)
   2755             return block.get(item)
   2756         else:

/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/pandas/core/internals.pyc in _find_block(self, item)
   3063 
   3064     def _find_block(self, item):
-> 3065         self._check_have(item)
   3066         for i, block in enumerate(self.blocks):
   3067             if item in block:

/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/pandas/core/internals.pyc in _check_have(self, item)
   3070     def _check_have(self, item):
   3071         if item not in self.items:
-> 3072             raise KeyError('no item named %s' % com.pprint_thing(item))
   3073 
   3074     def reindex_axis(self, new_axis, indexer=None, method=None, axis=0,

KeyError: u'no item named metropolitan statistical area/micropolitan statistical area'

In [263]:
type(df_diversity)


Out[263]:
pandas.core.frame.DataFrame

In [264]:
#'Asian','Black','Hispanic', 'Other', 'Total', 'White',
#  'entropy4', 'entropy5', 'entropy_rice', 'gini_simpson',
#'p_Asian', 'p_Black', 'p_Hispanic', 'p_Other','p_White'
df_diversity['p_Asian'] = df_diversity['Asian']/df_diversity['Total']
df_diversity['p_Black'] = df_diversity['Black']/df_diversity['Total']
df_diversity['p_Hispanic'] = df_diversity['Hispanic']/df_diversity['Total']
df_diversity['p_Other'] = df_diversity['Other']/df_diversity['Total']
df_diversity['p_White'] = df_diversity['White']/df_diversity['Total']

In [265]:
def entropyRice(series):
    """Rice-style entropy: normalized 5-group Shannon entropy minus
    the contribution of the 'Other' category"""
    
    # eliminate 0s
    series1 = series[series!=0]
    
    if len(series) > 1:
        # maximum possible entropy for the full number of categories (ln 5 here)
        max_s = -np.log(1.0/len(series))
    
        total = float(sum(series1))
        p = series1.astype('float')/total
        p_other = series['Other'].astype('float')/total
        E_other = -(p_other*np.log(p_other))/max_s
        return (sum(-p*np.log(p))/max_s) - E_other
    else: 
        return 0.0

def giniSimpson(series):
    """Gini-Simpson index: sum(p*(1-p)), i.e. 1 - sum(p**2)"""
    
    # eliminate 0s
    series1 = series[series!=0]
    
    if len(series) > 1:
        total = float(sum(series1))
        p = series1.astype('float')/total
        return sum(p*(1-p))
    else: 
        return 0.0 
    
    
df_diversity['gini_simpson'] = df_diversity[['Asian','Black','Hispanic','White','Other']].apply(giniSimpson,axis=1)

df_diversity['entropy_rice'] = df_diversity[['Asian','Black','Hispanic','White','Other']].apply(entropyRice,axis=1)

In [266]:
df_diversity.head()


Out[266]:
Total White Black Hispanic Asian Other entropy5 entropy4 p_Asian p_Black p_Hispanic p_Other p_White gini_simpson entropy_rice
Name
Abbeville, LA Micro Area 57999 46305 8246 1381 1148 919 0.428364 0.445662 0.019793 0.142175 0.023811 0.015845 0.798376 0.341173 0.387557
Aberdeen, SD Micro Area 40602 37774 192 554 358 1724 0.203138 0.113942 0.008817 0.004729 0.013645 0.042461 0.930348 0.132363 0.119791
Aberdeen, WA Micro Area 72797 59282 762 6272 995 5486 0.422324 0.321742 0.013668 0.010467 0.086157 0.075360 0.814347 0.323441 0.301262
Abilene, TX Metro Area 165252 112735 11549 35108 2110 3750 0.570100 0.597267 0.012768 0.069887 0.212451 0.022693 0.682201 0.483905 0.516722
Ada, OK Micro Area 37492 25973 879 1523 244 8873 0.525798 0.282587 0.006508 0.023445 0.040622 0.236664 0.692761 0.461830 0.313886

5 rows × 15 columns


In [267]:
msas_df = df_diversity
#df_diversity.sort_index(by="entropy5", ascending=False).head()

In [285]:
assert len(msas_df) == 942  
type(msas_df)


Out[285]:
pandas.core.frame.DataFrame

In [295]:
top_10_metros = msas_df.sort_index(by='Total', ascending=False)[:10]
msa_codes_in_top_10_pop_sorted_by_entropy_rice = list(top_10_metros.sort_index(by='entropy_rice', 
                                                ascending=False).index) 
type(top_10_metros)
top_10_metros
# msa_codes_in_top_10_pop_sorted_by_entropy_rice
to_unicode(msa_codes_in_top_10_pop_sorted_by_entropy_rice)
# [u'Houston-Sugar Land-Baytown, TX Metro Area',
#  u'New York-Northern New Jersey-Long Island, NY-NJ-PA Metro Area (part)',
#  u'Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area (part)',
#  u'Los Angeles-Long Beach-Santa Ana, CA Metro Area',
#  u'Dallas-Fort Worth-Arlington, TX Metro Area',
#  u'Miami-Fort Lauderdale-Pompano Beach, FL Metro Area',
#  u'Chicago-Joliet-Naperville, IL-IN-WI Metro Area (part)',
#  u'Atlanta-Sandy Springs-Marietta, GA Metro Area',
#  u'Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Metro Area (part)',
#  u'Boston-Cambridge-Quincy, MA-NH Metro Area (part)']


Out[295]:
[u'Houston-Sugar Land-Baytown, TX Metro Area',
 u'New York-Northern New Jersey-Long Island, NY-NJ-PA Metro Area (part)',
 u'Washington-Arlington-Alexandria, DC-VA-MD-WV Metro Area (part)',
 u'Los Angeles-Long Beach-Santa Ana, CA Metro Area',
 u'Dallas-Fort Worth-Arlington, TX Metro Area',
 u'Miami-Fort Lauderdale-Pompano Beach, FL Metro Area',
 u'Chicago-Joliet-Naperville, IL-IN-WI Metro Area (part)',
 u'Atlanta-Sandy Springs-Marietta, GA Metro Area',
 u'Philadelphia-Camden-Wilmington, PA-NJ-DE-MD Metro Area (part)',
 u'Boston-Cambridge-Quincy, MA-NH Metro Area (part)']

In [155]:
# Testing code

def to_unicode(vals):
    return [unicode(v) for v in vals]

def test_msas_df(msas_df):

    min_set_of_columns =  set(['Asian','Black','Hispanic', 'Other', 'Total', 'White',
     'entropy4', 'entropy5', 'entropy_rice', 'gini_simpson','p_Asian', 'p_Black',
     'p_Hispanic', 'p_Other','p_White'])  
    
    # the intersection equals min_set_of_columns only if every required column is present
    assert min_set_of_columns & set(msas_df.columns) == min_set_of_columns
    
    # https://www.census.gov/geo/maps-data/data/tallies/national_geo_tallies.html
    # 366 metro areas
    # 576 micropolitan areas
    
    assert len(msas_df) == 942  
    
    # total number of people in metro/micro areas
    
    assert msas_df.Total.sum() == 289261315
    assert msas_df['White'].sum() == 180912856
    assert msas_df['Other'].sum() == 8540181
    
    # list of MSAs in descending order by entropy_rice 
    # calculate the top 10 metros by population
    top_10_metros = msas_df.sort_index(by='Total', ascending=False)[:10]
    
    msa_codes_in_top_10_pop_sorted_by_entropy_rice = list(top_10_metros.sort_index(by='entropy_rice', 
                                                ascending=False).index) 
    
    assert to_unicode(msa_codes_in_top_10_pop_sorted_by_entropy_rice)== [u'26420', u'35620', u'47900', u'31100', u'19100', 
        u'33100', u'16980', u'12060', u'37980', u'14460']


    top_10_metro = msas_df.sort_index(by='Total', ascending=False)[:10]
    
    list(top_10_metro.sort_index(by='entropy_rice', ascending=False)['entropy5'])
    
    np.testing.assert_allclose(top_10_metro.sort_index(by='entropy_rice', ascending=False)['entropy5'], 
    [0.79628076626851163, 0.80528601550164602, 0.80809418318973791, 0.7980698349711991,
     0.75945930510650161, 0.74913610558765376, 0.73683277781032397, 0.72964862063970914,
     0.64082509648457675, 0.55697288400004963])
    
    np.testing.assert_allclose(top_10_metro.sort_index(by='entropy_rice', ascending=False)['entropy_rice'],
    [0.87361766576115552,
     0.87272877244078051,
     0.85931803868749834,
     0.85508015237749468,
     0.82169723530719896,
     0.81953527301129059,
     0.80589423784325431,
     0.78602596561378812,
     0.68611350427640316,
     0.56978827050565117])

In [270]:
# you are on the right track if test_msas_df doesn't complain
test_msas_df(msas_df)


---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-270-824fb53aa28c> in <module>()
      1 # you are on the right track if test_msas_df doesn't complain
----> 2 test_msas_df(msas_df)

<ipython-input-155-1e1577c3e222> in test_msas_df(msas_df)
     33 
     34     assert to_unicode(msa_codes_in_top_10_pop_sorted_by_entropy_rice)== [u'26420', u'35620', u'47900', u'31100', u'19100', 
---> 35         u'33100', u'16980', u'12060', u'37980', u'14460']
     36 
     37 

AssertionError: 

In [ ]:
# code to save your dataframe to a CSV
# upload the CSV to bCourses
# uncomment to run
# msas_df.to_csv("msas_2010.csv", encoding="UTF-8")

In [ ]:
# load back the CSV and test again
# df = DataFrame.from_csv("msas_2010.csv", encoding="UTF-8")
# test_msas_df(df)

In [ ]: