In [2]:
library(dplyr)
library(ggplot2)
library(ggmap)
library(data.table)
library(maptools)
library(rgdal)
options(jupyter.plot_mimetypes = 'image/png')
To map data onto map at the census tract level, need to get California shapefile from census.gov. I chose cartographic boundary shapefiles for small scale mapping projects, at the 500k scale. The function readOGR() from the package rgdal can read shapefile into SpatialPolygonsDataFrame. GEOID has the unique ID for each county and will be used to join the ACS census data.
In [3]:
ca_shp = readOGR(dsn="./cb_2014_06_tract_500k", layer ="cb_2014_06_tract_500k")
Poverty data is from ACS 'ACS_14_5YR_B17001', 5-year estimate 2010-2014, for all census tracts in California. The first column HD01_VD01 is the total number of people in the census tract. The column HD02_VD01 is the number of people whose income in the past 12 months were below poverty level. GEO.id2 is the unique ID for each census tract and will be used to join census tract data.
In [4]:
ca_data = read.csv('ACS_14_5YR_B17001/ACS_14_5YR_B17001_with_ann.csv')
In [5]:
head(ca_data)
Out[5]:
In [6]:
ca_data = select(ca_data, GEO.id2, GEO.display.label, HD01_VD01,HD01_VD02) %>% slice(-1)
head(ca_data)
Out[6]:
In [7]:
#convert data type and calculate percentage
ca_data$GEO.id2 = as.character(ca_data$GEO.id2)
ca_data$GEO.display.label = as.character(ca_data$GEO.display.label)
ca_data$HD01_VD01 = as.numeric(as.character(ca_data$HD01_VD01))
ca_data$HD01_VD02 = as.numeric(as.character(ca_data$HD01_VD02))
ca_data$percent = (ca_data$HD01_VD02/ca_data$HD01_VD01)*100
head(ca_data)
Out[7]:
In [8]:
summary(ca_data$percent)
Out[8]:
To use census tract data, need to convert the SpatialPolygonsDataFrame back into data frame with fortiy(). Join census tract data with insurace coverage data based on id(unique id for each census tract). Select three counties in the bay area to map, San Francisco, Alameda, and San Mateo.
In [9]:
ca_tract<-fortify(ca_shp,region = "GEOID")
str(ca_tract)
In [10]:
ca_data$id = ca_data$GEO.id2
ca_tract2 = left_join(ca_tract,ca_data, by=c('id'))
dim(ca_tract2)
Out[10]:
In [11]:
ca_tract_sf = ca_tract2[grep('San Francisco',ca_tract2$GEO.display.label),]
dim(ca_tract_sf)
Out[11]:
In [12]:
basemap <-get_map('San Francisco', zoom=12)
ggmap(basemap) +
geom_polygon(data = ca_tract_sf , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('San Francisco Poverty Rate\n ACS 2010-2014 data')
In [14]:
ca_tract_alameda = ca_tract2[grep('Alameda',ca_tract2$GEO.display.label),]
dim(ca_tract_alameda)
Out[14]:
In [15]:
basemap <-get_map('Hayward', zoom=10)
ggmap(basemap) +
geom_polygon(data = ca_tract_alameda , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('Alameda Poverty Rate\n ACS 2010-2014 data')
In [16]:
ca_tract_sanmateo = ca_tract2[grep('San Mateo',ca_tract2$GEO.display.label),]
dim(ca_tract_sanmateo)
Out[16]:
In [18]:
basemap <-get_map('San Mateo', zoom=10)
ggmap(basemap) +
geom_polygon(data = ca_tract_sanmateo , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('San Mateo Poverty Rate\n ACS 2010-2014 data')
In [19]:
#zoom in on oakland and san leandro
basemap <-get_map('San Leandro', zoom=11)
ggmap(basemap)+
geom_polygon(data = ca_tract_alameda , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('Alameda Poverty Rate\n ACS 2010-2014 data')
In [ ]: