In [2]:
library(dplyr)
library(ggplot2)
library(ggmap)
library(data.table)
library(maptools)
library(rgdal)
options(jupyter.plot_mimetypes = 'image/png')

Census tract data

To map data onto map at the census tract level, need to get California shapefile from census.gov. I chose cartographic boundary shapefiles for small scale mapping projects, at the 500k scale. The function readOGR() from the package rgdal can read shapefile into SpatialPolygonsDataFrame. GEOID has the unique ID for each county and will be used to join the ACS census data.


In [3]:
ca_shp = readOGR(dsn="./cb_2014_06_tract_500k", layer ="cb_2014_06_tract_500k")


OGR data source with driver: ESRI Shapefile 
Source: "./cb_2014_06_tract_500k", layer: "cb_2014_06_tract_500k"
with 8043 features
It has 9 fields
Warning message:
In readOGR(dsn = "./cb_2014_06_tract_500k", layer = "cb_2014_06_tract_500k"): Z-dimension discarded

Poverty data from ACS(American community survey)

Poverty data is from ACS 'ACS_14_5YR_B17001', 5-year estimate 2010-2014, for all census tracts in California. The first column HD01_VD01 is the total number of people in the census tract. The column HD02_VD01 is the number of people whose income in the past 12 months were below poverty level. GEO.id2 is the unique ID for each census tract and will be used to join census tract data.


In [4]:
ca_data = read.csv('ACS_14_5YR_B17001/ACS_14_5YR_B17001_with_ann.csv')

In [5]:
head(ca_data)


Out[5]:
GEO.idGEO.id2GEO.display.labelHD01_VD01HD02_VD01HD01_VD02HD02_VD02HD01_VD03HD02_VD03HD01_VD04ellip.hHD01_VD55HD02_VD55HD01_VD56HD02_VD56HD01_VD57HD02_VD57HD01_VD58HD02_VD58HD01_VD59HD02_VD59
1IdId2GeographyEstimate; Total:Margin of Error; Total:Estimate; Income in the past 12 months below poverty level:Margin of Error; Income in the past 12 months below poverty level:Estimate; Income in the past 12 months below poverty level: - Male:Margin of Error; Income in the past 12 months below poverty level: - Male:Estimate; Income in the past 12 months below poverty level: - Male: - Under 5 yearsEstimate; Income in the past 12 months at or above poverty level: - Female: - 35 to 44 yearsMargin of Error; Income in the past 12 months at or above poverty level: - Female: - 35 to 44 yearsEstimate; Income in the past 12 months at or above poverty level: - Female: - 45 to 54 yearsMargin of Error; Income in the past 12 months at or above poverty level: - Female: - 45 to 54 yearsEstimate; Income in the past 12 months at or above poverty level: - Female: - 55 to 64 yearsMargin of Error; Income in the past 12 months at or above poverty level: - Female: - 55 to 64 yearsEstimate; Income in the past 12 months at or above poverty level: - Female: - 65 to 74 yearsMargin of Error; Income in the past 12 months at or above poverty level: - Female: - 65 to 74 yearsEstimate; Income in the past 12 months at or above poverty level: - Female: - 75 years and overMargin of Error; Income in the past 12 months at or above poverty level: - Female: - 75 years and over
21400000US0600140010006001400100Census Tract 4001, Alameda County, California27521741378983510153592216733880198568641
31400000US0600140020006001400200Census Tract 4002, Alameda County, California193914778351925016847139479730160365526
41400000US0600140030006001400300Census Tract 4003, Alameda County, California53654624601541698405751403031113801232599910774
51400000US0600140040006001400400Census Tract 4004, Alameda County, California42743263071091146193328928472198621405811746
61400000US0600140050006001400500Census Tract 4005, Alameda County, California3516282530182320133027470187642107279516760

In [6]:
ca_data = select(ca_data, GEO.id2, GEO.display.label, HD01_VD01,HD01_VD02) %>% slice(-1) 
head(ca_data)


Out[6]:
GEO.id2GEO.display.labelHD01_VD01HD01_VD02
106001400100Census Tract 4001, Alameda County, California2752137
206001400200Census Tract 4002, Alameda County, California193978
306001400300Census Tract 4003, Alameda County, California5365460
406001400400Census Tract 4004, Alameda County, California4274307
506001400500Census Tract 4005, Alameda County, California3516530
606001400600Census Tract 4006, Alameda County, California1733143

In [7]:
#convert data type and calculate percentage
ca_data$GEO.id2 = as.character(ca_data$GEO.id2)
ca_data$GEO.display.label = as.character(ca_data$GEO.display.label)
ca_data$HD01_VD01 = as.numeric(as.character(ca_data$HD01_VD01))
ca_data$HD01_VD02 = as.numeric(as.character(ca_data$HD01_VD02))
ca_data$percent = (ca_data$HD01_VD02/ca_data$HD01_VD01)*100
head(ca_data)


Out[7]:
GEO.id2GEO.display.labelHD01_VD01HD01_VD02percent
106001400100Census Tract 4001, Alameda County, California27521374.978198
206001400200Census Tract 4002, Alameda County, California1939784.022692
306001400300Census Tract 4003, Alameda County, California53654608.574091
406001400400Census Tract 4004, Alameda County, California42743077.182967
506001400500Census Tract 4005, Alameda County, California351653015.07395
606001400600Census Tract 4006, Alameda County, California17331438.251587

In [8]:
summary(ca_data$percent)


Out[8]:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  0.000   7.296  13.470  16.640  23.410 100.000      67 

Merge census tract data with ACS poverty data

To use census tract data, need to convert the SpatialPolygonsDataFrame back into data frame with fortiy(). Join census tract data with insurace coverage data based on id(unique id for each census tract). Select three counties in the bay area to map, San Francisco, Alameda, and San Mateo.


In [9]:
ca_tract<-fortify(ca_shp,region = "GEOID") 
str(ca_tract)


'data.frame':	330321 obs. of  7 variables:
 $ long : num  -122 -122 -122 -122 -122 ...
 $ lat  : num  37.9 37.9 37.9 37.9 37.9 ...
 $ order: int  1 2 3 4 5 6 7 8 9 10 ...
 $ hole : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ piece: Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ id   : chr  "06001400100" "06001400100" "06001400100" "06001400100" ...
 $ group: Factor w/ 8112 levels "06001400100.1",..: 1 1 1 1 1 1 1 1 1 1 ...

In [10]:
ca_data$id = ca_data$GEO.id2
ca_tract2 = left_join(ca_tract,ca_data, by=c('id'))
dim(ca_tract2)


Out[10]:
  1. 330321
  2. 12

In [11]:
ca_tract_sf = ca_tract2[grep('San Francisco',ca_tract2$GEO.display.label),]
dim(ca_tract_sf)


Out[11]:
  1. 2980
  2. 12

In [12]:
basemap <-get_map('San Francisco', zoom=12) 
ggmap(basemap) +
geom_polygon(data = ca_tract_sf , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('San Francisco Poverty Rate\n ACS 2010-2014 data')


Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=San+Francisco&zoom=12&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=San%20Francisco&sensor=false

In [14]:
ca_tract_alameda = ca_tract2[grep('Alameda',ca_tract2$GEO.display.label),]
dim(ca_tract_alameda)


Out[14]:
  1. 8398
  2. 12

In [15]:
basemap <-get_map('Hayward', zoom=10) 
ggmap(basemap) +
geom_polygon(data = ca_tract_alameda , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('Alameda Poverty Rate\n ACS 2010-2014 data')


Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Hayward&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Hayward&sensor=false

In [16]:
ca_tract_sanmateo = ca_tract2[grep('San Mateo',ca_tract2$GEO.display.label),]
dim(ca_tract_sanmateo)


Out[16]:
  1. 5235
  2. 12

In [18]:
basemap <-get_map('San Mateo', zoom=10) 
ggmap(basemap) +
geom_polygon(data = ca_tract_sanmateo , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('San Mateo Poverty Rate\n ACS 2010-2014 data')


Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=San+Mateo&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=San%20Mateo&sensor=false

In [19]:
#zoom in on oakland and san leandro
basemap <-get_map('San Leandro', zoom=11) 
ggmap(basemap)+
geom_polygon(data = ca_tract_alameda , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('Alameda Poverty Rate\n ACS 2010-2014 data')


Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=San+Leandro&zoom=11&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=San%20Leandro&sensor=false

In [ ]: