Read in JSON and DataFrame Basics



In [2]:

    
# read population in
import json
import requests
from pandas import DataFrame

# pop_json_url holds the URL of a JSON file listing countries by population
pop_json_url = "https://gist.github.com/rdhyee/8511607/raw/f16257434352916574473e63612fcea55a0c1b1c/population_of_countries.json"
pop_list = requests.get(pop_json_url).json()

df = DataFrame(pop_list)
df[:5]









    Out[2]:






  
    
      
   0              1           2
0  1          China  1385566537
1  2          India  1252139596
2  3  United States   320050716
3  4      Indonesia   249865631
4  5         Brazil   200361925

5 rows × 3 columns



In [11]:

    
df.dtypes









    Out[11]:





0    float64
1     object
2      int64
dtype: object

Q: Based on the dtypes shown above, which of these rows would you expect to see in pop_list?

['1', 'United States', '320050716']
[1, 'United States', 320050716]
['United States', 320050716]
[1, 'United States', '320050716']
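
As a small illustration of how pandas infers a column's dtype from the Python values in each row, the sketch below builds two DataFrames from made-up rows. The names `rows` and `rows_str` and their contents are hypothetical and are not the contents of pop_list.

```python
from pandas import DataFrame

# Hypothetical rows of the form [rank, name, count]; the counts are ints
rows = [[1, 'Alpha', 100], [2, 'Beta', 250]]
# The same rows, but with each count stored as a string
rows_str = [[1, 'Alpha', '100'], [2, 'Beta', '250']]

print(DataFrame(rows).dtypes)      # the count column gets an integer dtype
print(DataFrame(rows_str).dtypes)  # the count column is not numeric
```

Comparing the two printed dtype listings shows that quoting a number in the source data changes the dtype of the whole column.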

Q: What is the relationship between s and the population of China?

s = sum(df[df[1].str.startswith('C')][2])

s is greater than the population of China
s is the same as the population of China
s is less than the population of China
s is not a number.

Q: What does this statement do?

df.columns = ['Number','Country','Population']

Nothing
df gets a new attribute called columns
df's columns are renamed based on the list
Throws an exception

Q: How would you rewrite this statement to get the same result

s = sum(df[df[1].str.startswith('C')][2])

after running:

df.columns = ['Number','Country','Population']
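
The filter-then-sum pattern used in these questions can be tried on a small made-up frame. This is only a sketch: the frame `toy` and its values are hypothetical, and it assumes the columns already carry the names assigned above.

```python
from pandas import DataFrame

# Hypothetical data, labelled with the same column names as above
toy = DataFrame([[1, 'Chile', 10], [2, 'Peru', 20], [3, 'Cuba', 30]],
                columns=['Number', 'Country', 'Population'])

# Boolean mask: True for rows whose Country starts with 'C'
mask = toy['Country'].str.startswith('C')

# Keep the matching rows, then add up one column selected by name
total = toy[mask]['Population'].sum()
print(total)  # 40
```

Once columns have names, the positional labels 0, 1 and 2 no longer exist, so selections have to use the new names.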

Series Examples



In [54]:

    
from pandas import DataFrame, Series
import numpy as np

s1 = Series(np.arange(1,4))
s1









    Out[54]:





0    1
1    2
2    3
dtype: int64

Q: What is

s1 + 1

Q: What is

s1.apply(lambda k: 2*k).sum()

Q: What is

s1.cumsum()[1]

Q: What is

s1.cumsum() + s1.cumsum()

Q: Describe what is happening in these statements:

s1 + 1

and

s1.cumsum() + s1.cumsum()

Q: What is

np.any(s1 > 2)
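
The two kinds of addition asked about above can be explored on a separate series. This is a minimal sketch; the series `t` is hypothetical and is deliberately different from s1.

```python
import numpy as np
from pandas import Series

# Hypothetical series, separate from s1 above
t = Series([10, 20, 30])

shifted = t + 5   # the scalar is applied to every element
doubled = t + t   # elements are paired up by index label, then added

print(shifted.tolist())      # [15, 25, 35]
print(doubled.tolist())      # [20, 40, 60]
print(bool(np.any(t > 25)))  # True: at least one element exceeds 25
```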

Census API Examples



In [62]:

    
from census import Census
from us import states

import settings

c = Census(settings.CENSUS_KEY)
c.sf1.get(('NAME', 'P0010001'), {'for': 'state:%s' % states.CA.fips})









    Out[62]:





[{u'NAME': u'California', u'P0010001': u'37253956', u'state': u'06'}]

Q: What is the purpose of settings.CENSUS_KEY?

It is the password for the Census Python package
It is an API Access key for authentication with the Census API
It is an API Access key for authentication with Github
It is a key shared by all users of the Census API

Q: What is the difference between r1 and r2?

r1 = c.sf1.get(('NAME', 'P0010001'), {'for': 'county:*', 'in': 'state:%s' % states.CA.fips})
r2 = c.sf1.get(('NAME', 'P0010001'), {'for': 'county:*', 'in': 'state:*' })

Q: Which is the correct geographic hierarchy?

Nation > States = Nation is subdivided into States

Counties > States
Counties > Census Blocks > Census Tracts
Places > Counties
Census Tracts > Block Groups > Census Blocks



In [72]:

    
from pandas import DataFrame

r = c.sf1.get(('NAME', 'P0010001'), {'for': 'state:*'})
df = DataFrame(r)

df.head()









    Out[72]:






  
    
      
         NAME  P0010001  state
0     Alabama   4779736     01
1      Alaska    710231     02
2     Arizona   6392017     04
3    Arkansas   2915918     05
4  California  37253956     06

5 rows × 3 columns

Q: Why does df have 52 rows? Please explain.



In [75]:

    
len(df)









    Out[75]:





52

Q: Why are the results below different? Please explain



In [84]:

    
print df.P0010001.sum()
print
print df.P0010001.astype(int).sum()









    



477973671023163920172915918372539565029196357409789793460172318801310968765313603011567582128306326483802304635528531184339367453337213283615773552654762998836405303925296729759889279894151826341270055113164708791894205917919378102953548367259111536504375135138310741270237910525674625364814180634610525145561276388562574180010246724540185299456869865636263725789

312471327
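
The effect of the element type on a total can be seen on a tiny example. This is a sketch on made-up values: the series `counts` is hypothetical and only resembles the API results in that its numbers are stored as strings.

```python
from pandas import Series

# Hypothetical column of numeric strings
counts = Series(['12', '34', '5'])

# Converting to int first gives ordinary arithmetic
print(counts.astype(int).sum())  # 51

# Joining the strings end to end gives a very different result
print(''.join(counts))           # 12345
```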

Q: Describe the output of the following:

df.P0010001 = df.P0010001.astype(int)
df[['NAME','P0010001']].sort('P0010001', ascending=False).head()

Q: After running:

df.set_index('NAME', inplace=True)

how would you access the Series for the state of Alaska?

df['Alaska']
df[1]
df.ix['Alaska']
df[df['NAME'] == 'Alaska']
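
What set_index changes can be checked on a small made-up frame. This is a sketch only: `toy` and its values are hypothetical. It uses `.loc` for the label lookup because `.ix` was deprecated in pandas 0.20 and removed in pandas 1.0.

```python
from pandas import DataFrame

# Hypothetical frame with a NAME column, standing in for the census results
toy = DataFrame({'NAME': ['Alpha', 'Beta'], 'P0010001': [10, 20]})
toy.set_index('NAME', inplace=True)

# After set_index, NAME is the row index rather than a column
print('NAME' in toy.columns)  # False

# Label-based row lookup returns that row as a Series
row = toy.loc['Beta']
print(row['P0010001'])        # 20
```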



In [90]:

    
np.in1d([ s.fips for s in states.STATES], df.state)









    Out[90]:





array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True], dtype=bool)



In [91]:

    
df[np.in1d(df.state, [ s.fips for s in states.STATES])]









    Out[91]:






  
    
      
                    NAME  P0010001  state
0                Alabama   4779736     01
1                 Alaska    710231     02
2                Arizona   6392017     04
3               Arkansas   2915918     05
4             California  37253956     06
5               Colorado   5029196     08
6            Connecticut   3574097     09
7               Delaware    897934     10
8   District of Columbia    601723     11
9                Florida  18801310     12
10               Georgia   9687653     13
11                Hawaii   1360301     15
12                 Idaho   1567582     16
13              Illinois  12830632     17
14               Indiana   6483802     18
15                  Iowa   3046355     19
16                Kansas   2853118     20
17              Kentucky   4339367     21
18             Louisiana   4533372     22
19                 Maine   1328361     23
20              Maryland   5773552     24
21         Massachusetts   6547629     25
22              Michigan   9883640     26
23             Minnesota   5303925     27
24           Mississippi   2967297     28
25              Missouri   5988927     29
26               Montana    989415     30
27              Nebraska   1826341     31
28                Nevada   2700551     32
29         New Hampshire   1316470     33
30            New Jersey   8791894     34
31            New Mexico   2059179     35
32              New York  19378102     36
33        North Carolina   9535483     37
34          North Dakota    672591     38
35                  Ohio  11536504     39
36              Oklahoma   3751351     40
37                Oregon   3831074     41
38          Pennsylvania  12702379     42
39          Rhode Island   1052567     44
40        South Carolina   4625364     45
41          South Dakota    814180     46
42             Tennessee   6346105     47
43                 Texas  25145561     48
44                  Utah   2763885     49
45               Vermont    625741     50
46              Virginia   8001024     51
47            Washington   6724540     53
48         West Virginia   1852994     54
49             Wisconsin   5686986     55
50               Wyoming    563626     56

51 rows × 3 columns
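
The membership test used in the last two cells can be sketched on made-up data. The frame `toy` and the list `known` are hypothetical. The sketch uses `np.isin`, which the NumPy documentation recommends over `np.in1d` for new code; both take the values to test first and the reference values second.

```python
import numpy as np
from pandas import DataFrame

# Hypothetical codes; 'zz' plays the part of a row with no match in `known`
toy = DataFrame({'code': ['01', '02', 'zz'], 'value': [5, 6, 7]})
known = ['01', '02', '03']

# One boolean per element of the first argument: is it present in the second?
mask = np.isin(toy['code'], known)
print(mask.tolist())   # [True, True, False]

# Boolean-mask indexing keeps only the rows where the mask is True
print(len(toy[mask]))  # 2
```

Swapping the two arguments answers the opposite question, namely whether every reference value appears in the frame.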
