Working definition of open data
From http://en.wikipedia.org/w/index.php?title=Special:Cite&page=Open_data&id=532390265:
Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control.
A piece of content or data is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and/or share-alike.
PfDA, Chap 3: IPython: An Interactive Computing and Development Environment
PfDA, Appendix: Python Language Essentials -- to help remind yourself of key elements of standard Python
PfDA, Chap 2: Introductory Examples
Day_01_B_World_Population.ipynb
The Racial Dot Map: One Dot Per Person | Weldon Cooper Center for Public Service
pip and how to use it?
In [2]:
# set up your census object
# example from https://github.com/sunlightlabs/census
from census import Census
from us import states
import settings
c = Census(settings.CENSUS_KEY)
# list every state with its index and FIPS code
for (i, state) in enumerate(states.STATES):
    print("{0} {1} {2}".format(i, state.name, state.fips))
In [3]:
import requests
# get the total population of all states
url = "http://api.census.gov/data/2010/sf1?key={key}&get=P0010001,NAME&for=state:*".format(key=settings.CENSUS_KEY)
r = requests.get(url)
r.json()[:5]
In [4]:
c.sf1.get(('NAME', 'P0010001'), {'for': 'state:%s' % states.CA.fips})
Day 3: Key Concept for Today: Execution Environment of Python
PfDA, Appendix: Python Language Essentials -- to help remind yourself of key elements of standard Python
PfDA, Chap 3: IPython: An Interactive Computing and Development Environment
PfDA, Chap 2: Introductory Examples
Day 4: Work through Day_04_B_numpy_and_pandas_series.ipynb (everything before Advanced: Operator Overloading)
You should be able to calculate the total population of the US in the fill-in section of Day_04_C_Census.ipynb.
You should be able to calculate the population of California by totaling the county populations.
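A minimal sketch of the totaling step, assuming rows shaped like the dicts that c.sf1.get returns, with the population in the string field P0010001. The helper name total_population and the sample rows are hypothetical placeholders, not real Census figures.

```python
def total_population(rows, field="P0010001"):
    """Sum a population field over API-style rows (values arrive as strings)."""
    return sum(int(row[field]) for row in rows)

# Placeholder rows for illustration only; these are not real Census counts.
sample_counties = [
    {"NAME": "County A", "P0010001": "100", "state": "06", "county": "001"},
    {"NAME": "County B", "P0010001": "250", "state": "06", "county": "003"},
]

sample_total = total_population(sample_counties)
```

With a live API key, the rows would presumably come from a county-level query for California (for example a geography of 'for': 'county:*' within 'state:%s' % states.CA.fips); that exact call is an assumption to check against Day_04_C_Census.ipynb.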
For Day 5: Geographical Hierarchies in the Census, study:
Day_06_C_Calculating_Diversity_Preview.ipynb and Day_06_D_Assignment
generators: Day 6: Generators for Geographic Entities
Day_06_D_Assignment.ipynb: exercise to write a generator for Census Places (answer: Day_06_E_Assignment_Answers.ipynb)
In [9]:
# You should understand how this works.
import pandas as pd
from pandas import DataFrame
import census
import settings
import us
from itertools import islice
c = census.Census(settings.CENSUS_KEY)
# generator: yields one dict per Census place, querying state by state
def places(variables="NAME"):
    for state in us.states.STATES:
        print(state)
        geo = {'for': 'place:*', 'in': 'state:{s_fips}'.format(s_fips=state.fips)}
        for place in c.sf1.get(variables, geo=geo):
            yield place
r = list(islice(places("NAME,P0010001"), None))
places_df = DataFrame(r)
places_df.P0010001 = places_df.P0010001.astype('int')
places_df['FIPS'] = places_df.apply(lambda s: s['state']+s['place'], axis=1)
# print "number of places", len(places_df)
# print "total pop", places_df.P0010001.sum()
# places_df.head()
assert places_df.P0010001.sum() == 228457238
# number of places in 2010 Census
assert len(places_df) == 29261
# places_df
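The cell above combines a generator with islice. As a rough, self-contained sketch of the same pattern without any API calls: the function entities and the groups data below are hypothetical stand-ins for "places within states". Passing a number as the stop argument to islice pulls only that many items, which is handy for testing; passing None consumes the whole generator, as the cell above does.

```python
from itertools import islice

def entities(groups):
    """Lazily yield (group_name, item) pairs, one at a time."""
    for group_name, items in groups:
        for item in items:
            yield (group_name, item)

# Hypothetical stand-in for "states containing places".
groups = [("S1", ["a", "b"]), ("S2", ["c"]), ("S3", ["d", "e"])]

# Stop after three items; the rest of the generator is never evaluated.
first_three = list(islice(entities(groups), 3))

# stop=None places no limit, so every item is produced.
everything = list(islice(entities(groups), None))
```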
apply + lambda functions: Day_06_A_Apply_Lambda.ipynb
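A small sketch of the apply + lambda pattern, mirroring how a FIPS column can be built from state and place codes. The two rows are made-up placeholder codes, and the vectorized column is shown only as a comparison, not as what the notebook does.

```python
import pandas as pd

# Placeholder codes for illustration; the API returns such codes as strings.
df = pd.DataFrame({"state": ["06", "06"], "place": ["00001", "00002"]})

# axis=1 passes each row to the lambda as a Series.
df["FIPS"] = df.apply(lambda row: row["state"] + row["place"], axis=1)

# String columns can also be concatenated directly, column by column.
df["FIPS_vectorized"] = df["state"] + df["place"]
```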
http://www.census.gov/developers/data/sf1.xml
compare to http://www.census.gov/prod/cen2010/briefs/c2010br-02.pdf
I think the P0050001 might be the key category
P0050002: Not Hispanic or Latino (total)
Not Hispanic Other (should also be P0050002 - (P0050003 + P0050004 + P0050006))
P0050010: Hispanic or Latino
P0050010 = P0050011 + ... + P0050017
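The arithmetic in these notes can be sketched as follows. The function names and the sample row are hypothetical; the row uses placeholder numbers chosen only so the identities are easy to check, not real Census values.

```python
def not_hispanic_other(row):
    """P0050002 minus the three not-Hispanic categories subtracted in the notes."""
    subtracted = ["P0050003", "P0050004", "P0050006"]
    return int(row["P0050002"]) - sum(int(row[v]) for v in subtracted)

def hispanic_sum_of_parts(row):
    """Sum P0050011 through P0050017, which should equal P0050010."""
    parts = ["P00500%02d" % i for i in range(11, 18)]
    return sum(int(row[v]) for v in parts)

# Placeholder row; the API returns values as strings.
row = {"P0050002": "100", "P0050003": "60", "P0050004": "20", "P0050006": "10",
       "P0050010": "28"}
row.update({"P00500%02d" % i: str(i - 10) for i in range(11, 18)})

other = not_hispanic_other(row)
hispanic_check = hispanic_sum_of_parts(row) == int(row["P0050010"])
```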
"Whites are coded as blue; African-Americans, green; Asians, red; Hispanics, orange; and all other racial categories are coded as brown."
Day 7: Preview of Plotting Graphs and Maps
Do the following notebooks work for you to show basic graphics?
Day_07_E_Census_fields.ipynb is an exploration of the concepts and variables in the 2010 Census.
Day_07_F_Groupby.ipynb: gives you background on how to understand and use groupby in Pandas. Don't miss AJ's Day_10_Groupby_Examples.ipynb, which should be helpful, especially if you found Day_07_F_Groupby.ipynb obscure.
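As a quick reminder of the split-apply-combine idea behind groupby, here is a minimal sketch on a made-up table; the column names and numbers are placeholders and are not taken from either notebook.

```python
import pandas as pd

# Placeholder data: three places in two hypothetical states.
df = pd.DataFrame({
    "state": ["S1", "S1", "S2"],
    "place": ["a", "b", "c"],
    "population": [10, 20, 5],
})

# Split rows by state, then sum the population within each group.
pop_by_state = df.groupby("state")["population"].sum()
```

The result is a Series indexed by the grouping key, so pop_by_state["S1"] looks up one group's total.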
Day_07_G_Calculating_Diversity.ipynb: a prelude to the big diversity-calculation assignment Day_08_A_Metro_Diversity.ipynb
I will assume that you've read Chapter 8 of PfDA and can run Day_11_B_Setting_Up_for_PfDA.ipynb.
Study the overview slide: Day 12: Overview of Plotting Options.
Note some fundamental conceptual aspects of matplotlib (as I outline in Day_12_A_Matplotlib_Intro.ipynb) and try to make basic plots on your own (line plots, scatter plots, bar plots).
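One possible starting point for the three basic plot types, using explicit Figure and Axes objects. The data are arbitrary, and whether this matches the emphasis of Day_12_A_Matplotlib_Intro.ipynb is an assumption. The Agg backend is selected only so the sketch runs without a display; inside a notebook you would normally rely on inline plotting instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Arbitrary sample data.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# One Figure holding three Axes, one per plot type.
fig, (ax_line, ax_scatter, ax_bar) = plt.subplots(1, 3, figsize=(9, 3))

ax_line.plot(x, y)
ax_line.set_title("line")

ax_scatter.scatter(x, y)
ax_scatter.set_title("scatter")

ax_bar.bar(x, y)
ax_bar.set_title("bar")

fig.tight_layout()
```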
Day_12_B_Baby_Names_Starter.ipynb#Names-that-are-both-M-and-F
Before you use Day_13_C_Baby_Names_MF_Completed.ipynb, try the approach in Day_13_B_Baby_Names_MF_Starter.ipynb
Assignment in nbviewer.ipython.org/github/rdhyee/working-open-data-2014/blob/master/notebooks/Day_13_B_Baby_Names_MF_Starter.ipynb:
Submit a notebook that describes what you've learned about the nature of ambigendered names in the baby names database. (Due: Wed, March 12 at 11:59pm, moved from Monday, March 10; submit via the bCourses assignment.) I'm interested in seeing what you do with the data set in this regard. At a minimum, show that you are able to run Day_13_C_Baby_Names_MF_Completed. Be creative and have fun.