In [2]:

    
library(dplyr)
library(ggplot2)
library(ggmap)
library(data.table)
library(maptools)
library(rgdal)
options(jupyter.plot_mimetypes = 'image/png')

Census tract data

To map data at the census tract level, we need the California census tract shapefile from census.gov. I chose the cartographic boundary shapefiles, which are intended for small-scale mapping projects, at the 1:500,000 (500k) scale. The function readOGR() from the rgdal package reads a shapefile into a SpatialPolygonsDataFrame. The GEOID field holds the unique ID for each census tract and will be used to join the ACS census data.
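For census tracts, the GEOID is an 11-character string: a 2-digit state FIPS code, a 3-digit county FIPS code, and a 6-digit tract code. A minimal sketch of pulling those pieces apart with base R, using the first tract ID that appears later in this notebook (the variable names are only for illustration):

```r
# Split an 11-character census tract GEOID into its components.
geoid <- "06001400100"

state_fips  <- substr(geoid, 1, 2)   # "06"     = California
county_fips <- substr(geoid, 3, 5)   # "001"    = Alameda County
tract_code  <- substr(geoid, 6, 11)  # "400100" = Census Tract 4001

c(state = state_fips, county = county_fips, tract = tract_code)
```

Because the code starts with a zero, the ID has to stay a character string; converting it to a number would drop the leading "0" and break the join.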



In [3]:

    
ca_shp = readOGR(dsn="./cb_2014_06_tract_500k", layer ="cb_2014_06_tract_500k")









    



OGR data source with driver: ESRI Shapefile 
Source: "./cb_2014_06_tract_500k", layer: "cb_2014_06_tract_500k"
with 8043 features
It has 9 fields






    



Warning message:
In readOGR(dsn = "./cb_2014_06_tract_500k", layer = "cb_2014_06_tract_500k"): Z-dimension discarded

Income data from ACS (American Community Survey)

Income data comes from table 'ACS_14_5YR_B19001', the 5-year estimates for 2010 to 2014, covering all census tracts in California, in inflation-adjusted dollars. The column 'HD01_VD01' is the total number of households in each census tract. The other estimate columns give the number of households at each income level. This notebook focuses on the last estimate column, 'HD01_VD17', the number of households with income of $200,000 or more.
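The downloaded file carries a second, descriptive header row under the column codes, so every column is read in as text. A small sketch of one way to handle that, assuming a cut-down inline copy of the file (the three-column `toy_csv` below is made up for illustration, with values taken from the first two tracts): read everything as character, drop the descriptive row, then convert only the count columns.

```r
# Toy stand-in for the ACS file: column codes, a descriptive row, then data.
toy_csv <- c(
  'GEO.id2,HD01_VD01,HD01_VD17',
  'Id2,"Estimate; Total:","Estimate; Total: - $200,000 or more"',
  '06001400100,1300,530',
  '06001400200,815,251'
)

# Read every column as character so the tract ID keeps its leading zero.
toy <- read.csv(text = paste(toy_csv, collapse = "\n"),
                colClasses = "character")

toy <- toy[-1, ]                            # drop the descriptive header row
toy$HD01_VD01 <- as.numeric(toy$HD01_VD01)  # total households
toy$HD01_VD17 <- as.numeric(toy$HD01_VD17)  # households at $200,000 or more
str(toy)
```

The same `colClasses = "character"` argument should work on the real CSV path, though I have only sketched it here on the inline toy data.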



In [4]:

    
ca_data = read.csv('ACS_14_5YR_B19001/ACS_14_5YR_B19001_with_ann.csv')



In [5]:

    
head(ca_data)









    Out[5]:





GEO.id GEO.id2 GEO.display.label HD01_VD01 HD02_VD01 HD01_VD02 HD02_VD02 HD01_VD03 HD02_VD03 HD01_VD04 ellip.h HD01_VD13 HD02_VD13 HD01_VD14 HD02_VD14 HD01_VD15 HD02_VD15 HD01_VD16 HD02_VD16 HD01_VD17 HD02_VD17

	1 Id Id2 Geography Estimate; Total: Margin of Error; Total: Estimate; Total: - Less than $10,000 Margin of Error; Total: - Less than $10,000 Estimate; Total: - $10,000 to $14,999 Margin of Error; Total: - $10,000 to $14,999 Estimate; Total: - $15,000 to $19,999 ⋯ Estimate; Total: - $75,000 to $99,999 Margin of Error; Total: - $75,000 to $99,999 Estimate; Total: - $100,000 to $124,999 Margin of Error; Total: - $100,000 to $124,999 Estimate; Total: - $125,000 to $149,999 Margin of Error; Total: - $125,000 to $149,999 Estimate; Total: - $150,000 to $199,999 Margin of Error; Total: - $150,000 to $199,999 Estimate; Total: - $200,000 or more Margin of Error; Total: - $200,000 or more
	2 1400000US06001400100 06001400100 Census Tract 4001, Alameda County, California 1300 66 32 27 11 17 39 ⋯ 147 56 147 64 72 40 167 67 530 100
	3 1400000US06001400200 06001400200 Census Tract 4002, Alameda County, California 815 48 15 13 0 12 10 ⋯ 58 30 110 40 96 42 107 52 251 48
	4 1400000US06001400300 06001400300 Census Tract 4003, Alameda County, California 2510 95 67 51 308 140 66 ⋯ 206 88 175 91 179 90 224 97 465 132
	5 1400000US06001400400 06001400400 Census Tract 4004, Alameda County, California 1812 81 71 45 58 55 60 ⋯ 273 86 245 114 185 66 234 80 226 78
	6 1400000US06001400500 06001400500 Census Tract 4005, Alameda County, California 1590 78 38 32 138 72 0 ⋯ 308 91 231 84 62 39 106 49 68 32



In [6]:

    
ca_data = select(ca_data, GEO.id2, GEO.display.label, HD01_VD01,HD01_VD17) %>% slice(-1) 
head(ca_data)









    Out[6]:





GEO.id2 GEO.display.label HD01_VD01 HD01_VD17

	1 06001400100 Census Tract 4001, Alameda County, California 1300 530
	2 06001400200 Census Tract 4002, Alameda County, California 815 251
	3 06001400300 Census Tract 4003, Alameda County, California 2510 465
	4 06001400400 Census Tract 4004, Alameda County, California 1812 226
	5 06001400500 Census Tract 4005, Alameda County, California 1590 68
	6 06001400600 Census Tract 4006, Alameda County, California 726 20



In [8]:

    
#convert data type and calculate percentage
ca_data$GEO.id2 = as.character(ca_data$GEO.id2)
ca_data$GEO.display.label = as.character(ca_data$GEO.display.label)
ca_data$HD01_VD01 = as.numeric(as.character(ca_data$HD01_VD01))
ca_data$HD01_VD17 = as.numeric(as.character(ca_data$HD01_VD17))
ca_data$percent = (ca_data$HD01_VD17/ca_data$HD01_VD01)*100
head(ca_data)









    Out[8]:





GEO.id2 GEO.display.label HD01_VD01 HD01_VD17 percent

	1 06001400100 Census Tract 4001, Alameda County, California 1300 530 40.76923
	2 06001400200 Census Tract 4002, Alameda County, California 815 251 30.79755
	3 06001400300 Census Tract 4003, Alameda County, California 2510 465 18.5259
	4 06001400400 Census Tract 4004, Alameda County, California 1812 226 12.47241
	5 06001400500 Census Tract 4005, Alameda County, California 1590 68 4.27673
	6 06001400600 Census Tract 4006, Alameda County, California 726 20 2.754821



In [9]:

    
summary(ca_data$percent)









    Out[9]:





   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   1.040   3.512   7.174   9.639 100.000      73
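As a quick check on the percentage column, the first two rows can be reproduced by hand. The third row below is a made-up tract with zero households; it shows one way the NA's in the summary can arise, since 0/0 is NaN in R and summary() counts NaN under NA's. I have not checked which tracts are actually affected.

```r
# First two rows from the table above, plus a hypothetical empty tract.
total <- c(1300, 815, 0)
high  <- c(530, 251, 0)

percent <- high / total * 100
round(percent, 5)      # the last value is NaN because 0/0 is undefined
sum(is.na(percent))    # NaN is counted by is.na()
```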

Merge census tract data with ACS income data

Join the census tract data with the income data on the tract ID (GEO.id2 in the income data, id in the fortified census tract data). Then select three counties in the Bay Area: San Francisco, Alameda, and San Mateo.
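The join needs to keep every row of the tract table in its original order, because geom_polygon() draws vertices in row order. A minimal base R sketch of that kind of left join using match(), on made-up coordinates (the `tract_pts` and `income` data frames are hypothetical stand-ins, not the notebook's objects), together with a check for tract IDs that have no income record:

```r
# Toy polygon vertices: several rows per tract ID, as fortify() produces.
tract_pts <- data.frame(
  id   = c("06001400100", "06001400100", "06001400200", "06075010100"),
  long = c(-122.25, -122.24, -122.26, -122.41),
  lat  = c(37.87, 37.88, 37.85, 37.80),
  stringsAsFactors = FALSE
)

# Toy income table: one row per tract ID.
income <- data.frame(
  id      = c("06001400100", "06001400200"),
  percent = c(40.76923, 30.79755),
  stringsAsFactors = FALSE
)

# Tract IDs in the shapefile with no matching income row.
unmatched <- setdiff(unique(tract_pts$id), income$id)

# Left join by lookup: row count and row order of tract_pts are unchanged.
tract_pts$percent <- income$percent[match(tract_pts$id, income$id)]
tract_pts
```

Unmatched tracts end up with NA in `percent`, which ggplot2 draws in its NA fill colour rather than dropping the polygon.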



In [10]:

    
ca_tract<-fortify(ca_shp,region = "GEOID") 
str(ca_tract)









    



'data.frame':	330321 obs. of  7 variables:
 $ long : num  -122 -122 -122 -122 -122 ...
 $ lat  : num  37.9 37.9 37.9 37.9 37.9 ...
 $ order: int  1 2 3 4 5 6 7 8 9 10 ...
 $ hole : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ piece: Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ id   : chr  "06001400100" "06001400100" "06001400100" "06001400100" ...
 $ group: Factor w/ 8112 levels "06001400100.1",..: 1 1 1 1 1 1 1 1 1 1 ...



In [11]:

    
ca_data$id = ca_data$GEO.id2
ca_tract2 = left_join(ca_tract,ca_data, by=c('id'))
dim(ca_tract2)









    Out[11]:





	330321
	12



In [12]:

    
ca_tract_sf = ca_tract2[grep('San Francisco',ca_tract2$GEO.display.label),]
dim(ca_tract_sf)









    Out[12]:





	2980
	12
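Matching on the label text works here because each label spells out the county name. An alternative sketch, not what this notebook does, is to select on the first five characters of the tract ID (state plus county FIPS code): 06001 for Alameda, 06075 for San Francisco, and 06081 for San Mateo. The `tracts` data frame below is a made-up stand-in with one ID per county.

```r
# Hypothetical tract table with one tract ID per county.
tracts <- data.frame(
  id = c("06001400100", "06075010100", "06081600100"),
  stringsAsFactors = FALSE
)

# State + county FIPS prefixes for the three counties.
county_fips <- c("Alameda" = "06001",
                 "San Francisco" = "06075",
                 "San Mateo" = "06081")

# Keep only the rows whose ID starts with the San Francisco prefix.
sf_rows <- tracts[startsWith(tracts$id, county_fips[["San Francisco"]]), ,
                  drop = FALSE]
sf_rows
```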



In [13]:

    
basemap <-get_map('San Francisco', zoom=12) 
ggmap(basemap) +
geom_polygon(data = ca_tract_sf , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('San Francisco High Income Rate\n ACS 2010-2014 data')









    



Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=San+Francisco&zoom=12&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=San%20Francisco&sensor=false



In [14]:

    
ca_tract_alameda = ca_tract2[grep('Alameda',ca_tract2$GEO.display.label),]
dim(ca_tract_alameda)









    Out[14]:





	8398
	12



In [15]:

    
basemap <-get_map('Hayward', zoom=10) 
ggmap(basemap) +
geom_polygon(data = ca_tract_alameda , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('Alameda High Income Rate\n ACS 2010-2014 data')









    



Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Hayward&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Hayward&sensor=false



In [26]:

    
#remove the tract with 100% high-income to highlight the lower range
ca_data2 = filter(ca_data,percent<100)
dim(ca_data2)









    Out[26]:





	7982
	6
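One detail of the filter step: filter() keeps only rows where the condition is TRUE, so tracts whose percent is NA are dropped along with the 100% tract. A small sketch of the same behaviour using base R's subset(), which treats NA the same way, on made-up values:

```r
# Toy data: one 100% tract, one NA tract, two ordinary tracts.
toy <- data.frame(
  id      = c("a", "b", "c", "d"),
  percent = c(40.8, 100, NA, 2.75),
  stringsAsFactors = FALSE
)

# Rows where the condition is FALSE or NA are both removed.
kept <- subset(toy, percent < 100)
kept$id
```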



In [27]:

    
ca_tract3 = left_join(ca_tract,ca_data2, by=c('id'))
dim(ca_tract3)









    Out[27]:





	330321
	12



In [29]:

    
ca_tract_alameda = ca_tract3[grep('Alameda',ca_tract3$GEO.display.label),]
dim(ca_tract_alameda)









    Out[29]:





	8361
	12



In [30]:

    
basemap <-get_map('Hayward', zoom=10) 
ggmap(basemap) +
geom_polygon(data = ca_tract_alameda , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('Alameda High Income Rate\n ACS 2010-2014 data')









    



Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Hayward&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Hayward&sensor=false




In [16]:

    
ca_tract_sanmateo = ca_tract2[grep('San Mateo',ca_tract2$GEO.display.label),]
dim(ca_tract_sanmateo)









    Out[16]:





	5235
	12



In [17]:

    
basemap <-get_map('San Mateo', zoom=10) 
ggmap(basemap) +
geom_polygon(data = ca_tract_sanmateo , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('San Mateo High Income Rate\n ACS 2010-2014 data')









    



Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=San+Mateo&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=San%20Mateo&sensor=false




In [19]:

    
#zoom in on Oakland and San Leandro
basemap <-get_map('San Leandro', zoom=11) 
ggmap(basemap)+
geom_polygon(data = ca_tract_alameda , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('Alameda High Income Rate\n ACS 2010-2014 data')









    



Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=San+Leandro&zoom=11&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=San%20Leandro&sensor=false


