In [2]:
library(dplyr)
library(ggplot2)
library(ggmap)
library(data.table)
library(maptools)
library(rgdal)
options(jupyter.plot_mimetypes = 'image/png')

Census tract data

To map data onto map at the census tract level, need to get California shapefile from census.gov. I chose cartographic boundary shapefiles for small scale mapping projects, at the 500k scale. The function readOGR() from the package rgdal can read shapefile into SpatialPolygonsDataFrame. GEOID has the unique ID for each county and will be used to join the ACS census data.


In [3]:
ca_shp = readOGR(dsn="./cb_2014_06_tract_500k", layer ="cb_2014_06_tract_500k")


OGR data source with driver: ESRI Shapefile 
Source: "./cb_2014_06_tract_500k", layer: "cb_2014_06_tract_500k"
with 8043 features
It has 9 fields
Warning message:
In readOGR(dsn = "./cb_2014_06_tract_500k", layer = "cb_2014_06_tract_500k"): Z-dimension discarded

Insurace coverage data from ACS(American community survey)

Health insurance coverage data is from ACS 'ACS_14_5yr_S2701', 5-year estimate 2010-2014 for all census tracts in California. The column 'HC03_EST_VC01' is percent uninsured. GEO.id2 is the unique ID for each census tract and will be used to join census tract data.


In [4]:
ca_data = read.csv('ACS_14_5YR_S2701/ACS_14_5YR_S2701_with_ann.csv')

In [5]:
head(ca_data)


Out[5]:
GEO.idGEO.id2GEO.display.labelHC01_EST_VC01HC01_MOE_VC01HC02_EST_VC01HC02_MOE_VC01HC03_EST_VC01HC03_MOE_VC01HC04_EST_VC01ellip.hHC01_EST_VC95HC01_MOE_VC95HC02_EST_VC95HC02_MOE_VC95HC03_EST_VC95HC03_MOE_VC95HC04_EST_VC95HC04_MOE_VC95HC05_EST_VC95HC05_MOE_VC95
1IdId2GeographyTotal; Estimate; Total civilian noninstitutionalized populationTotal; Margin of Error; Total civilian noninstitutionalized populationNumber Uninsured; Estimate; Total civilian noninstitutionalized populationNumber Uninsured; Margin of Error; Total civilian noninstitutionalized populationPercent Uninsured; Estimate; Total civilian noninstitutionalized populationPercent Uninsured; Margin of Error; Total civilian noninstitutionalized populationNumber Insured by Coverage Type; Estimate; Total civilian noninstitutionalized populationTotal; Estimate; PERCENT IMPUTED - Health insurance coverage - Public coverage - VA Health CareTotal; Margin of Error; PERCENT IMPUTED - Health insurance coverage - Public coverage - VA Health CareNumber Uninsured; Estimate; PERCENT IMPUTED - Health insurance coverage - Public coverage - VA Health CareNumber Uninsured; Margin of Error; PERCENT IMPUTED - Health insurance coverage - Public coverage - VA Health CarePercent Uninsured; Estimate; PERCENT IMPUTED - Health insurance coverage - Public coverage - VA Health CarePercent Uninsured; Margin of Error; PERCENT IMPUTED - Health insurance coverage - Public coverage - VA Health CareNumber Insured by Coverage Type; Estimate; PERCENT IMPUTED - Health insurance coverage - Public coverage - VA Health CareNumber Insured by Coverage Type; Margin of Error; PERCENT IMPUTED - Health insurance coverage - Public coverage - VA Health CarePercent Insured by Coverage Type; Estimate; PERCENT IMPUTED - Health insurance coverage - Public coverage - VA Health CarePercent Insured by Coverage Type; Margin of Error; PERCENT IMPUTED - Health insurance coverage - Public coverage - VA Health Care
21400000US0600140010006001400100Census Tract 4001, Alameda County, California3385545114633.41.9327114.4(X)(X)(X)(X)(X)(X)(X)(X)(X)
31400000US0600140020006001400200Census Tract 4002, Alameda County, California1939147105455.42.218348.0(X)(X)(X)(X)(X)(X)(X)(X)(X)
41400000US0600140030006001400300Census Tract 4003, Alameda County, California54284664042037.43.850248.5(X)(X)(X)(X)(X)(X)(X)(X)(X)
51400000US0600140040006001400400Census Tract 4004, Alameda County, California42743262921006.82.339828.5(X)(X)(X)(X)(X)(X)(X)(X)(X)
61400000US0600140050006001400500Census Tract 4005, Alameda County, California351628255525115.86.629619.2(X)(X)(X)(X)(X)(X)(X)(X)(X)

In [6]:
ca_data = select(ca_data, GEO.id2, GEO.display.label, HC03_EST_VC01) %>% slice(-1) 
head(ca_data)


Out[6]:
GEO.id2GEO.display.labelHC03_EST_VC01
106001400100Census Tract 4001, Alameda County, California3.4
206001400200Census Tract 4002, Alameda County, California5.4
306001400300Census Tract 4003, Alameda County, California7.4
406001400400Census Tract 4004, Alameda County, California6.8
506001400500Census Tract 4005, Alameda County, California15.8
606001400600Census Tract 4006, Alameda County, California22.1

In [7]:
str(ca_data)


'data.frame':	8057 obs. of  3 variables:
 $ GEO.id2          : Factor w/ 8058 levels "06001400100",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ GEO.display.label: Factor w/ 8058 levels "Census Tract 1, Colusa County, California",..: 3838 3839 3844 3847 3851 3853 3857 3858 3860 3867 ...
 $ HC03_EST_VC01    : Factor w/ 477 levels "-","0.0","0.1",..: 234 411 451 441 78 151 110 103 414 26 ...

In [8]:
#convert data types
ca_data$GEO.id2 = as.character(ca_data$GEO.id2)
ca_data$GEO.display.label = as.character(ca_data$GEO.display.label)
ca_data$percent = as.numeric(as.character(ca_data$HC03_EST_VC01))
str(ca_data)


Warning message:
In eval(expr, envir, enclos): NAs introduced by coercion
'data.frame':	8057 obs. of  4 variables:
 $ GEO.id2          : chr  "06001400100" "06001400200" "06001400300" "06001400400" ...
 $ GEO.display.label: chr  "Census Tract 4001, Alameda County, California" "Census Tract 4002, Alameda County, California" "Census Tract 4003, Alameda County, California" "Census Tract 4004, Alameda County, California" ...
 $ HC03_EST_VC01    : Factor w/ 477 levels "-","0.0","0.1",..: 234 411 451 441 78 151 110 103 414 26 ...
 $ percent          : num  3.4 5.4 7.4 6.8 15.8 22.1 19 18.3 5.7 10.6 ...

In [9]:
summary(ca_data$percent)


Out[9]:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   0.00    9.10   15.20   16.68   22.85   65.50      62 

Merge census tract data with ACS insurance coverage data

To use census tract data, need to convert the SpatialPolygonsDataFrame back into data frame with fortify(). Join census tract data with insurace coverage data based on id(unique id for each census tract). Select three counties in the bay area to map, San Francisco, Alameda, and San Mateo.


In [10]:
ca_tract<-fortify(ca_shp,region = "GEOID") 
str(ca_tract)


'data.frame':	330321 obs. of  7 variables:
 $ long : num  -122 -122 -122 -122 -122 ...
 $ lat  : num  37.9 37.9 37.9 37.9 37.9 ...
 $ order: int  1 2 3 4 5 6 7 8 9 10 ...
 $ hole : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ piece: Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ id   : chr  "06001400100" "06001400100" "06001400100" "06001400100" ...
 $ group: Factor w/ 8112 levels "06001400100.1",..: 1 1 1 1 1 1 1 1 1 1 ...

In [11]:
ca_data$id = ca_data$GEO.id2
ca_tract2 = left_join(ca_tract,ca_data, by=c('id'))
dim(ca_tract2)


Out[11]:
  1. 330321
  2. 11

In [12]:
ca_tract_sf = ca_tract2[grep('San Francisco',ca_tract2$GEO.display.label),]
dim(ca_tract_sf)


Out[12]:
  1. 2980
  2. 11

In [13]:
basemap <-get_map('San Francisco', zoom=12) 
ggmap(basemap) +
geom_polygon(data = ca_tract_sf , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('San Francisco Percent Uninsured\n ACS 2010-2014 data')


Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=San+Francisco&zoom=12&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=San%20Francisco&sensor=false

In [15]:
ca_tract_alameda = ca_tract2[grep('Alameda',ca_tract2$GEO.display.label),]
dim(ca_tract_alameda)


Out[15]:
  1. 8398
  2. 11

In [16]:
basemap <-get_map('Hayward', zoom=10) 
ggmap(basemap) +
geom_polygon(data = ca_tract_alameda , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('Alameda Percent Uninsured\n ACS 2010-2014 data')


Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=Hayward&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=Hayward&sensor=false

In [17]:
ca_tract_sanmateo = ca_tract2[grep('San Mateo',ca_tract2$GEO.display.label),]
dim(ca_tract_sanmateo)


Out[17]:
  1. 5235
  2. 11

In [18]:
basemap <-get_map('San Mateo', zoom=10) 
ggmap(basemap) +
geom_polygon(data = ca_tract_sanmateo , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('San Mateo Percent Uninsured\n ACS 2010-2014 data')


Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=San+Mateo&zoom=10&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false
Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=San%20Mateo&sensor=false

In [ ]: