In [2]:
library(dplyr)
library(ggplot2)
library(ggmap)
library(data.table)
library(maptools)
library(rgdal)
options(jupyter.plot_mimetypes = 'image/png')

Census tract data

To map data onto map at the census tract level, need to get California shapefile from census.gov. I chose cartographic boundary shapefiles for small scale mapping projects, at the 500k scale. The function readOGR() from the package rgdal can read shapefile into SpatialPolygonsDataFrame. GEOID has the unique ID for each county and will be used to join the ACS census data.


In [3]:
ca_shp = readOGR(dsn="./cb_2014_06_tract_500k", layer ="cb_2014_06_tract_500k")


OGR data source with driver: ESRI Shapefile 
Source: "./cb_2014_06_tract_500k", layer: "cb_2014_06_tract_500k"
with 8043 features
It has 9 fields
Warning message:
In readOGR(dsn = "./cb_2014_06_tract_500k", layer = "cb_2014_06_tract_500k"): Z-dimension discarded

Income data from ACS (American Community Survey)

Income data is from 'ACS_14_5YR_B19001', 5-year estimate from 2010 to 2014, for all census tracts in California, in inflation-adjusted dollars. The column 'HD)1_VD01' is the total number of people in that census tract. Other columns show estimates of number of people at a specified income level. I will focus on the last column 'HD01_VD17' for number of people with income $200,000 or more for this notebook.


In [4]:
ca_data = read.csv('ACS_14_5YR_B19001/ACS_14_5YR_B19001_with_ann.csv')

In [5]:
head(ca_data)


Out[5]:
GEO.idGEO.id2GEO.display.labelHD01_VD01HD02_VD01HD01_VD02HD02_VD02HD01_VD03HD02_VD03HD01_VD04ellip.hHD01_VD13HD02_VD13HD01_VD14HD02_VD14HD01_VD15HD02_VD15HD01_VD16HD02_VD16HD01_VD17HD02_VD17
1IdId2GeographyEstimate; Total:Margin of Error; Total:Estimate; Total: - Less than $10,000Margin of Error; Total: - Less than $10,000Estimate; Total: - $10,000 to $14,999Margin of Error; Total: - $10,000 to $14,999Estimate; Total: - $15,000 to $19,999Estimate; Total: - $75,000 to $99,999Margin of Error; Total: - $75,000 to $99,999Estimate; Total: - $100,000 to $124,999Margin of Error; Total: - $100,000 to $124,999Estimate; Total: - $125,000 to $149,999Margin of Error; Total: - $125,000 to $149,999Estimate; Total: - $150,000 to $199,999Margin of Error; Total: - $150,000 to $199,999Estimate; Total: - $200,000 or moreMargin of Error; Total: - $200,000 or more
21400000US0600140010006001400100Census Tract 4001, Alameda County, California13006632271117391475614764724016767530100
31400000US0600140020006001400200Census Tract 4002, Alameda County, California8154815130121058301104096421075225148
41400000US0600140030006001400300Census Tract 4003, Alameda County, California25109567513081406620688175911799022497465132
51400000US0600140040006001400400Census Tract 4004, Alameda County, California181281714558556027386245114185662348022678
61400000US0600140050006001400500Census Tract 4005, Alameda County, California159078383213872030891231846239106496832

In [6]:
ca_data = select(ca_data, GEO.id2, GEO.display.label, HD01_VD01,HD01_VD17) %>% slice(-1) 
head(ca_data)


Out[6]:
GEO.id2GEO.display.labelHD01_VD01HD01_VD17
106001400100Census Tract 4001, Alameda County, California1300530
206001400200Census Tract 4002, Alameda County, California815251
306001400300Census Tract 4003, Alameda County, California2510465
406001400400Census Tract 4004, Alameda County, California1812226
506001400500Census Tract 4005, Alameda County, California159068
606001400600Census Tract 4006, Alameda County, California72620

In [8]:
#convert data type and calculate percentage
ca_data$GEO.id2 = as.character(ca_data$GEO.id2)
ca_data$GEO.display.label = as.character(ca_data$GEO.display.label)
ca_data$HD01_VD01 = as.numeric(as.character(ca_data$HD01_VD01))
ca_data$HD01_VD17 = as.numeric(as.character(ca_data$HD01_VD17))
ca_data$percent = (ca_data$HD01_VD17/ca_data$HD01_VD01)*100
head(ca_data)


Out[8]:
GEO.id2GEO.display.labelHD01_VD01HD01_VD17percent
106001400100Census Tract 4001, Alameda County, California130053040.76923
206001400200Census Tract 4002, Alameda County, California81525130.79755
306001400300Census Tract 4003, Alameda County, California251046518.5259
406001400400Census Tract 4004, Alameda County, California181222612.47241
506001400500Census Tract 4005, Alameda County, California1590684.27673
606001400600Census Tract 4006, Alameda County, California726202.754821

In [9]:
summary(ca_data$percent)


Out[9]:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   1.040   3.512   7.174   9.639 100.000      73 

Merge census tract data with ACS income data

Join census tract data with income data based on id (GEO.ID2 from income data, and id from cencus tract). Select three counties in the bay area, San Francisco, Alameda, and San Mateo.


In [10]:
ca_tract<-fortify(ca_shp,region = "GEOID") 
str(ca_tract)


'data.frame':	330321 obs. of  7 variables:
 $ long : num  -122 -122 -122 -122 -122 ...
 $ lat  : num  37.9 37.9 37.9 37.9 37.9 ...
 $ order: int  1 2 3 4 5 6 7 8 9 10 ...
 $ hole : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ piece: Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ id   : chr  "06001400100" "06001400100" "06001400100" "06001400100" ...
 $ group: Factor w/ 8112 levels "06001400100.1",..: 1 1 1 1 1 1 1 1 1 1 ...

In [11]:
ca_data$id = ca_data$GEO.id2
ca_tract2 = left_join(ca_tract,ca_data, by=c('id'))
dim(ca_tract2)


Out[11]:
  1. 330321
  2. 12

In [12]:
ca_tract_sf = ca_tract2[grep('San Francisco',ca_tract2$GEO.display.label),]
dim(ca_tract_sf)


Out[12]:
  1. 2980
  2. 12

In [13]:
basemap <-get_map('San Francisco', zoom=12) 
ggmap(basemap) +
geom_polygon(data = ca_tract_sf , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('San Francisco High Income Rate\n ACS 2010-2014 data')


Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=San+Francisco&zoom=12&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=San%20Francisco&sensor=false

In [14]:
ca_tract_alameda = ca_tract2[grep('Alameda',ca_tract2$GEO.display.label),]
dim(ca_tract_alameda)


Out[14]:
  1. 8398
  2. 12

In [15]:
basemap <-get_map('Hayward', zoom=10) 
ggmap(basemap) +
geom_polygon(data = ca_tract_alameda , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('Alameda High Income Rate\n ACS 2010-2014 data')


Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Hayward&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Hayward&sensor=false

In [26]:
#remove the tract with 100% high-income to highlight the lower range
ca_data2 = filter(ca_data,percent<100)
dim(ca_data2)


Out[26]:
  1. 7982
  2. 6

In [27]:
ca_tract3 = left_join(ca_tract,ca_data2, by=c('id'))
dim(ca_tract2)


Out[27]:
  1. 330321
  2. 12

In [29]:
ca_tract_alameda = ca_tract3[grep('Alameda',ca_tract3$GEO.display.label),]
dim(ca_tract_alameda)


Out[29]:
  1. 8361
  2. 12

In [30]:
basemap <-get_map('Hayward', zoom=10) 
ggmap(basemap) +
geom_polygon(data = ca_tract_alameda , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('Alameda High Income Rate\n ACS 2010-2014 data')


Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Hayward&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Hayward&sensor=false

In [ ]:


In [16]:
ca_tract_sanmateo = ca_tract2[grep('San Mateo',ca_tract2$GEO.display.label),]
dim(ca_tract_sanmateo)


Out[16]:
  1. 5235
  2. 12

In [17]:
basemap <-get_map('San Mateo', zoom=10) 
ggmap(basemap) +
geom_polygon(data = ca_tract_sanmateo , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('San Mateo High Income Rate\n ACS 2010-2014 data')


Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=San+Mateo&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=San%20Mateo&sensor=false

In [ ]:


In [19]:
#zoom in on oakland and san leandro
basemap <-get_map('San Leandro', zoom=11) 
ggmap(basemap)+
geom_polygon(data = ca_tract_alameda , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('Alameda Poverty Rate\n ACS 2010-2014 data')


Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=San+Leandro&zoom=11&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=San%20Leandro&sensor=false

In [ ]: