Goal

Let's get set up to use the US Census API:

http://www.census.gov/developers/

Things we'd like to be able to do:

  • calculate the population of California.
  • then calculate the population of every geographic entity going down to census block if possible.
  • for a given geographic unit, can we get the racial/ethnic breakdown?

It will also be useful to tie these results back to the county-type calculations we do with the downloaded census files.

Installing census, a useful Python module

Dependency: to start with, let's use the Python module census: https://pypi.python.org/pypi/census/

pip install -U census us

Getting and activating an API key

To use the Census API, you will need an API key. After you request one, you should see a confirmation message like this:

"Your request for a new API key has been successfully submitted. Please check your email. In a few minutes you should receive a message with instructions on how to activate your new key."

Then create a settings.py in the same directory as this notebook (or somewhere else in your Python path) to hold settings.CENSUS_KEY. (I prefer this approach over directly exposing your API key in the notebook code.)
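
A minimal sketch of what such a settings.py could contain. The environment-variable fallback and the placeholder string are assumptions made for illustration, not part of the original setup; anything that leaves CENSUS_KEY bound to a string should satisfy the check in the cell further down.

```python
# settings.py -- hypothetical example; replace the placeholder with your own key.
import os

# Assumption: the key may also be supplied through an environment variable
# named CENSUS_KEY, so it never has to be written into a shared file.
CENSUS_KEY = os.environ.get("CENSUS_KEY", "REPLACE_WITH_YOUR_CENSUS_KEY")
```

Keeping the key in its own module means the notebook can be shared without the key going along with it.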


In [2]:
print "hello"


hello

In [3]:
# This cell should run successfully if you have a string set up to represent your census key

try:
    import settings
    # basestring covers both str and unicode in Python 2
    assert isinstance(settings.CENSUS_KEY, basestring)
except Exception as e:
    print "error in importing settings to get at settings.CENSUS_KEY", e

us.states module


In [4]:
# let's figure out a bit about the us module, in particular, us.states
# https://github.com/unitedstates/python-us

from us import states
assert states.CA.fips == u'06'

In [5]:
# set up your census object
# example from https://github.com/sunlightlabs/census

from census import Census
from us import states

c = Census(settings.CENSUS_KEY)

In [6]:
for (i, state) in enumerate(states.STATES):
    print i, state.name, state.fips


0 Alabama 01
1 Alaska 02
2 Arizona 04
3 Arkansas 05
4 California 06
5 Colorado 08
6 Connecticut 09
7 Delaware 10
8 District of Columbia 11
9 Florida 12
10 Georgia 13
11 Hawaii 15
12 Idaho 16
13 Illinois 17
14 Indiana 18
15 Iowa 19
16 Kansas 20
17 Kentucky 21
18 Louisiana 22
19 Maine 23
20 Maryland 24
21 Massachusetts 25
22 Michigan 26
23 Minnesota 27
24 Mississippi 28
25 Missouri 29
26 Montana 30
27 Nebraska 31
28 Nevada 32
29 New Hampshire 33
30 New Jersey 34
31 New Mexico 35
32 New York 36
33 North Carolina 37
34 North Dakota 38
35 Ohio 39
36 Oklahoma 40
37 Oregon 41
38 Pennsylvania 42
39 Rhode Island 44
40 South Carolina 45
41 South Dakota 46
42 Tennessee 47
43 Texas 48
44 Utah 49
45 Vermont 50
46 Virginia 51
47 Washington 53
48 West Virginia 54
49 Wisconsin 55
50 Wyoming 56

Formulating URL requests to the API explicitly


In [7]:
import requests

In [8]:
# get the total population of all states
url = "http://api.census.gov/data/2010/sf1?key={key}&get=P0010001,NAME&for=state:*".format(key=settings.CENSUS_KEY)
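
The URL above has three moving parts: the key, the list of variables to get, and the geography to ask for. As a rough sketch, the same string can be assembled by a small helper. The name build_sf1_url and its parameters are made up for this example and are not part of any library.

```python
SF1_2010 = "http://api.census.gov/data/2010/sf1"

def build_sf1_url(key, variables, for_clause, in_clause=None):
    """Assemble a 2010 SF1 query URL (hypothetical helper, for illustration).

    Values are inserted verbatim with no percent-encoding, which is fine for
    simple clauses such as 'state:*' but not for values containing spaces.
    """
    params = [("key", key), ("get", ",".join(variables)), ("for", for_clause)]
    if in_clause is not None:
        # 'in' restricts the query to a parent geography, e.g. 'state:06'
        params.append(("in", in_clause))
    return SF1_2010 + "?" + "&".join("{0}={1}".format(k, v) for k, v in params)

# Same query as the hand-written URL, with a placeholder key
print(build_sf1_url("YOUR_KEY", ["P0010001", "NAME"], "state:*"))
```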

In [9]:
r = requests.get(url)

EXERCISE

Show how to calculate the total population of the USA, both including and excluding Puerto Rico. (I don't know why Puerto Rico is included but not the other unincorporated territories.)


In [10]:
# Including Puerto Rico (r.json()[0] is the header row, so skip it)
__builtin__.sum([int(lst[0]) for lst in r.json()[1:]])


Out[10]:
312471327

In [11]:
# Excluding Puerto Rico
__builtin__.sum([int(lst[0]) for lst in r.json()[1:] if lst[1] != 'Puerto Rico'])


Out[11]:
308745538
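
The two sums above rely on the response being a list of lists whose first row is a header. One way to make that explicit, sketched here without touching the network, is to zip each row with the header. The helper names are hypothetical, and the sample is only a two-row excerpt: the California figure appears later in this notebook, and the Puerto Rico figure is the difference between the two totals above.

```python
def rows_to_records(table):
    """Turn a list-of-lists response (header row first) into a list of dicts."""
    header, rows = table[0], table[1:]
    return [dict(zip(header, row)) for row in rows]

def total_population(table, exclude=()):
    """Sum P0010001 over all rows whose NAME is not in `exclude`."""
    return sum(int(rec["P0010001"])
               for rec in rows_to_records(table)
               if rec["NAME"] not in exclude)

# Two-row excerpt in the shape of r.json(); not the full response
sample = [
    ["P0010001", "NAME", "state"],
    ["37253956", "California", "06"],
    ["3725789", "Puerto Rico", "72"],
]

print(total_population(sample))                            # both rows
print(total_population(sample, exclude=("Puerto Rico",)))  # California only
```

With the real response, total_population(r.json()) and total_population(r.json(), exclude=("Puerto Rico",)) should correspond to the two cells above.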

Next Steps: Focusing on sf1 + census

How do we map out the geographical hierarchy and pull out total population figures?

  1. Nation
  2. Regions
  3. Divisions
  4. State
  5. County
  6. Census Tract
  7. Block Group
  8. Census Block
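
Moving down this hierarchy in the API mostly means pairing a "for" clause (the level you want) with an "in" clause (the parents that contain it). Here is a sketch of a helper that builds such a dictionary. The name geo_clause is invented for this example, and only the state, county, and tract levels are shown.

```python
def geo_clause(level, code="*", within=()):
    """Build a {'for': ..., 'in': ...} geography dict (hypothetical helper).

    `within` is an ordered sequence of (level, code) pairs for the parent
    geographies, listed from largest to smallest.
    """
    geo = {"for": "{0}:{1}".format(level, code)}
    if within:
        geo["in"] = " ".join("{0}:{1}".format(name, value)
                             for name, value in within)
    return geo

# All counties in California (state FIPS 06)
print(geo_clause("county", within=[("state", "06")]))

# All tracts in Alameda County (county FIPS 001) in California
print(geo_clause("tract", within=[("state", "06"), ("county", "001")]))
```

A dictionary of this shape could then be passed as the second argument to c.sf1.get, in the same position as the {'for': ...} dictionaries used elsewhere in this notebook.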

Questions

  • What identifiers are used for these various geographic entities?
  • Can we get an enumeration of each of these entities?
  • How do we figure out which census tract, block group, and census block a given location is in?
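
As a partial answer to the first question: the lower levels are identified by FIPS-style codes that nest inside one another. The sketch below assumes the usual 2010 layout of a 15-digit block identifier (2 digits for state, 3 for county, 6 for tract, 4 for block, with the block group being the first digit of the block). The function name and the sample identifier are illustrative only.

```python
def split_block_geoid(geoid):
    """Split a 15-digit census block identifier into its nested parts."""
    if len(geoid) != 15 or not geoid.isdigit():
        raise ValueError("expected a 15-digit string, got %r" % (geoid,))
    return {
        "state": geoid[0:2],
        "county": geoid[2:5],
        "tract": geoid[5:11],
        "block group": geoid[11],   # first digit of the block code
        "block": geoid[11:15],
    }

# Illustrative identifier: state 06, county 001, tract 400100, block 1000
print(split_block_geoid("060014001001000"))
```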

Total Population of California

2010 Census Summary File 1

P0010001 ("total population") is listed in the 2010 SF1 API Variables [XML].


In [12]:
c.sf1.get(('NAME', 'P0010001'), {'for': 'state:%s' % states.CA.fips})


Out[12]:
[{u'NAME': u'California', u'P0010001': u'37253956', u'state': u'06'}]

In [16]:
c.sf1.get(('NAME', 'P0010001'), {'for': 'state:%s' % states.CA.fips})[0]['P0010001']


Out[16]:
u'37253956'

In [14]:
"population of California: {0}".format(
        int(c.sf1.get(('NAME', 'P0010001'), {'for': 'state:%s' % states.CA.fips})[0]['P0010001']))


Out[14]:
'population of California: 37253956'
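
The cells above repeat the same API call three times. A lighter pattern is to fetch once, keep the rows, and format from there. The sketch below uses the row shown in Out[12] as a stand-in for the live result; describe_population is a made-up helper name.

```python
def describe_population(rows):
    """Format the first row of an sf1 result as a readable sentence."""
    row = rows[0]
    # Values come back as strings, so convert before formatting
    return "population of {0}: {1:,}".format(row["NAME"], int(row["P0010001"]))

# Stand-in for the list returned by c.sf1.get(...), copied from Out[12]
rows = [{"NAME": "California", "P0010001": "37253956", "state": "06"}]
print(describe_population(rows))
```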