In [2]:
library(dplyr)
library(ggplot2)
library(ggmap)
library(data.table)
library(maptools)
library(rgdal)
options(jupyter.plot_mimetypes = 'image/png')
To map data onto map at the census tract level, need to get California shapefile from census.gov. I chose cartographic boundary shapefiles for small scale mapping projects, at the 500k scale. The function readOGR() from the package rgdal can read shapefile into SpatialPolygonsDataFrame. GEOID has the unique ID for each county and will be used to join the ACS census data.
In [3]:
ca_shp = readOGR(dsn="./cb_2014_06_tract_500k", layer ="cb_2014_06_tract_500k")
Health insurance coverage data is from ACS 'ACS_14_5yr_S2701', 5-year estimate 2010-2014 for all census tracts in California. The column 'HC03_EST_VC01' is percent uninsured. GEO.id2 is the unique ID for each census tract and will be used to join census tract data.
In [4]:
ca_data = read.csv('ACS_14_5YR_S2701/ACS_14_5YR_S2701_with_ann.csv')
In [5]:
head(ca_data)
Out[5]:
In [6]:
ca_data = select(ca_data, GEO.id2, GEO.display.label, HC03_EST_VC01) %>% slice(-1)
head(ca_data)
Out[6]:
In [7]:
str(ca_data)
In [8]:
#convert data types
ca_data$GEO.id2 = as.character(ca_data$GEO.id2)
ca_data$GEO.display.label = as.character(ca_data$GEO.display.label)
ca_data$percent = as.numeric(as.character(ca_data$HC03_EST_VC01))
str(ca_data)
In [9]:
summary(ca_data$percent)
Out[9]:
To use census tract data, need to convert the SpatialPolygonsDataFrame back into data frame with fortify(). Join census tract data with insurace coverage data based on id(unique id for each census tract). Select three counties in the bay area to map, San Francisco, Alameda, and San Mateo.
In [10]:
ca_tract<-fortify(ca_shp,region = "GEOID")
str(ca_tract)
In [11]:
ca_data$id = ca_data$GEO.id2
ca_tract2 = left_join(ca_tract,ca_data, by=c('id'))
dim(ca_tract2)
Out[11]:
In [12]:
ca_tract_sf = ca_tract2[grep('San Francisco',ca_tract2$GEO.display.label),]
dim(ca_tract_sf)
Out[12]:
In [13]:
basemap <-get_map('San Francisco', zoom=12)
ggmap(basemap) +
geom_polygon(data = ca_tract_sf , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('San Francisco Percent Uninsured\n ACS 2010-2014 data')
In [15]:
ca_tract_alameda = ca_tract2[grep('Alameda',ca_tract2$GEO.display.label),]
dim(ca_tract_alameda)
Out[15]:
In [16]:
basemap <-get_map('Hayward', zoom=10)
ggmap(basemap) +
geom_polygon(data = ca_tract_alameda , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('Alameda Percent Uninsured\n ACS 2010-2014 data')
In [17]:
ca_tract_sanmateo = ca_tract2[grep('San Mateo',ca_tract2$GEO.display.label),]
dim(ca_tract_sanmateo)
Out[17]:
In [18]:
basemap <-get_map('San Mateo', zoom=10)
ggmap(basemap) +
geom_polygon(data = ca_tract_sanmateo , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('San Mateo Percent Uninsured\n ACS 2010-2014 data')
In [ ]: