Read in JSON and DataFrame Basics
In [2]:
# read population in
import json
import requests
from pandas import DataFrame
# pop_json_url holds a URL pointing to a JSON file of country populations
pop_json_url = "https://gist.github.com/rdhyee/8511607/raw/f16257434352916574473e63612fcea55a0c1b1c/population_of_countries.json"
pop_list = requests.get(pop_json_url).json()
df = DataFrame(pop_list)
df[:5]
Out[2]:
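As a minimal sketch of what DataFrame(pop_list) does, assuming pop_list is a list of rows: each inner list becomes one row, and because no column names are given, the columns get default integer labels. The rows below are made-up placeholders, not the real population data.

```python
from pandas import DataFrame
from pandas.api.types import is_integer_dtype, is_numeric_dtype

# Hypothetical stand-in for pop_list: made-up rows, not real figures
toy_rows = [
    [1, "Country A", 300],
    [2, "Country B", 200],
    [3, "Country C", 100],
]
toy_df = DataFrame(toy_rows)

# No column names were given, so pandas labels the columns 0, 1, 2
print(list(toy_df.columns))

# Slicing with [:2] selects the first two rows
print(toy_df[:2])

# dtypes are inferred per column from the Python values in each row
print(toy_df.dtypes)
```

This is why later cells refer to columns as df[1] and df[2] until the columns are renamed.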
In [11]:
df.dtypes
Out[11]:
Q: Based on the above statement, which of these would you expect to see in pop_list?
['1', 'United States', '320050716']
[1, 'United States', 320050716]
['United States', 320050716]
[1, 'United States', '320050716']
Q: What is the relationship between s and the population of China?
s = sum(df[df[1].str.startswith('C')][2])
s is greater than the population of China
s is the same as the population of China
s is less than the population of China
s is not a number.
Q: What does the following statement do?
df.columns = ['Number','Country','Population']
columns
Q: How would you rewrite this statement to get the same result
s = sum(df[df[1].str.startswith('C')][2])
after running:
df.columns = ['Number','Country','Population']
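The pattern in these statements is boolean-mask filtering followed by a column selection. The sketch below shows the same pattern on a made-up table (the names and numbers are placeholders), first with the default integer column labels and then with named columns.

```python
from pandas import DataFrame

# Made-up rows for illustration only
toy_df = DataFrame([
    [1, "Alpha", 10],
    [2, "Beta", 20],
    [3, "Bravo", 30],
])

# Default labels: column 1 holds names, column 2 holds counts
mask = toy_df[1].str.startswith("B")   # boolean Series, one entry per row
total_by_position = toy_df[mask][2].sum()

# After renaming, the same selection uses the new labels
toy_df.columns = ["Number", "Country", "Population"]
mask = toy_df["Country"].str.startswith("B")
total_by_name = toy_df[mask]["Population"].sum()

print(total_by_position, total_by_name)
```

Both totals cover the same rows; only the labels used to reach the columns change.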
Series Examples
In [54]:
from pandas import DataFrame, Series
import numpy as np
s1 = Series(np.arange(1,4))
s1
Out[54]:
Q: What is
s1 + 1
Q: What is
s1.apply(lambda k: 2*k).sum()
Q: What is
s1.cumsum()[1]
Q: What is
s1.cumsum() + s1.cumsum()
Q: Describe what is happening in these statements:
s1 + 1
and
s1.cumsum() + s1.cumsum()
Q: What is
np.any(s1 > 2)
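The operations in this section can be tried on a small Series with different values from s1. This is a sketch for experimenting, not a worked answer; the variable names are made up.

```python
import numpy as np
from pandas import Series

toy = Series([10, 20, 30])   # deliberately different values from s1

plus_one = toy + 1           # the scalar is applied to every element: 11, 21, 31
running = toy.cumsum()       # running total: 10, 30, 60
doubled = running + running  # elementwise, matched on index labels: 20, 60, 120

print(running[1])            # value at index label 1, which is 30
print(np.any(toy > 25))      # True, because 30 > 25
```

Adding a scalar and adding two Series both produce a new Series; in the second case pandas pairs up values by index label before adding.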
Census API Examples
In [62]:
from census import Census
from us import states
import settings
c = Census(settings.CENSUS_KEY)
c.sf1.get(('NAME', 'P0010001'), {'for': 'state:%s' % states.CA.fips})
Out[62]:
Q: What is the purpose of settings.CENSUS_KEY?
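The settings module itself is not shown in this notebook. One common arrangement, sketched here purely as an assumption, is a small settings.py that reads the API key from an environment variable so the key never appears in the notebook. The environment variable name is hypothetical.

```python
# settings.py (hypothetical layout; the notebook does not show this file)
import os

# Read the Census API key from an environment variable named CENSUS_KEY
# (the variable name is an assumption); fall back to an empty string.
CENSUS_KEY = os.environ.get("CENSUS_KEY", "")
```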
Q: What is the difference between r1 and r2?
r1 = c.sf1.get(('NAME', 'P0010001'), {'for': 'county:*', 'in': 'state:%s' % states.CA.fips})
r2 = c.sf1.get(('NAME', 'P0010001'), {'for': 'county:*', 'in': 'state:*' })
Q: Which is the correct geographic hierarchy?
Nation > States = Nation is subdivided into States
In [72]:
from pandas import DataFrame
r = c.sf1.get(('NAME', 'P0010001'), {'for': 'state:*'})
df = DataFrame(r)
df.head()
Out[72]:
Q: Why does df have 52 items? Please explain
In [75]:
len(df)
Out[75]:
Q: Why are the results below different? Please explain
In [84]:
print(df.P0010001.sum())
print()
print(df.P0010001.astype(int).sum())
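To experiment with the two calls above, here is a small sketch on made-up values. It assumes the column holds numbers stored as Python strings; whether that matches df.P0010001 can be checked with df.dtypes.

```python
from pandas import Series

# Made-up numeric strings, stored as Python str objects
counts = Series(["1", "2", "3"], dtype=object)

as_text = counts.sum()                # str + str concatenates: '123'
as_number = counts.astype(int).sum()  # integer addition: 6

print(as_text)
print(as_number)
```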
Q: Describe the output of the following:
df.P0010001 = df.P0010001.astype(int)
df[['NAME','P0010001']].sort_values('P0010001', ascending=False).head()
Q: After running:
df.set_index('NAME', inplace=True)
how would you access the Series for the state of Alaska?
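As a sketch of label-based access after set_index, the example below uses a made-up table. The column names mirror the notebook's, but the row labels and values are placeholders.

```python
from pandas import DataFrame

# Made-up rows; the column names mirror the notebook's, the values do not
toy_df = DataFrame({
    "NAME": ["Region A", "Region B"],
    "P0010001": [100, 200],
})
toy_df.set_index("NAME", inplace=True)

# .loc looks a row up by its index label and returns it as a Series
row = toy_df.loc["Region B"]
print(row["P0010001"])
```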
In [90]:
np.in1d([ s.fips for s in states.STATES], df.state)
Out[90]:
In [91]:
df[np.in1d(df.state, [ s.fips for s in states.STATES])]
Out[91]:
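The two cells above test membership in opposite directions, so their results have different lengths. The sketch below shows this on made-up codes, with wanted_codes standing in for [s.fips for s in states.STATES]. It uses np.isin, which NumPy's documentation recommends over np.in1d for new code.

```python
import numpy as np
from pandas import DataFrame

# Made-up table; the codes are placeholders, not real FIPS codes
toy_df = DataFrame({"NAME": ["A", "B", "C"], "state": ["aa", "bb", "zz"]})
wanted_codes = ["aa", "bb", "cc"]  # hypothetical stand-in for the states list

# For each row: is its code in the wanted list? (one entry per table row)
row_mask = np.isin(toy_df["state"], wanted_codes)

# For each wanted code: does it appear in the table? (one entry per code)
code_mask = np.isin(wanted_codes, toy_df["state"])

# A row mask can index the DataFrame to keep only the matching rows
kept = toy_df[row_mask]
print(kept)
```

Only the row mask has one entry per DataFrame row, so only it can be used to filter the table.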