In [2]:
library(dplyr)
library(ggplot2)
library(ggmap)
library(data.table)
library(maptools)
library(rgdal)
options(jupyter.plot_mimetypes = 'image/png')
To map data onto map at the census tract level, need to get California shapefile from census.gov. I chose cartographic boundary shapefiles for small scale mapping projects, at the 500k scale. The function readOGR() from the package rgdal can read shapefile into SpatialPolygonsDataFrame. GEOID has the unique ID for each county and will be used to join the ACS census data.
In [3]:
ca_shp = readOGR(dsn="./cb_2014_06_tract_500k", layer ="cb_2014_06_tract_500k")
Income data is from 'ACS_14_5YR_B19001', 5-year estimate from 2010 to 2014, for all census tracts in California, in inflation-adjusted dollars. The column 'HD)1_VD01' is the total number of people in that census tract. Other columns show estimates of number of people at a specified income level. I will focus on the last column 'HD01_VD17' for number of people with income $200,000 or more for this notebook.
In [4]:
ca_data = read.csv('ACS_14_5YR_B19001/ACS_14_5YR_B19001_with_ann.csv')
In [5]:
head(ca_data)
Out[5]:
In [6]:
ca_data = select(ca_data, GEO.id2, GEO.display.label, HD01_VD01,HD01_VD17) %>% slice(-1)
head(ca_data)
Out[6]:
In [8]:
#convert data type and calculate percentage
ca_data$GEO.id2 = as.character(ca_data$GEO.id2)
ca_data$GEO.display.label = as.character(ca_data$GEO.display.label)
ca_data$HD01_VD01 = as.numeric(as.character(ca_data$HD01_VD01))
ca_data$HD01_VD17 = as.numeric(as.character(ca_data$HD01_VD17))
ca_data$percent = (ca_data$HD01_VD17/ca_data$HD01_VD01)*100
head(ca_data)
Out[8]:
In [9]:
summary(ca_data$percent)
Out[9]:
In [10]:
ca_tract<-fortify(ca_shp,region = "GEOID")
str(ca_tract)
In [11]:
ca_data$id = ca_data$GEO.id2
ca_tract2 = left_join(ca_tract,ca_data, by=c('id'))
dim(ca_tract2)
Out[11]:
In [12]:
ca_tract_sf = ca_tract2[grep('San Francisco',ca_tract2$GEO.display.label),]
dim(ca_tract_sf)
Out[12]:
In [13]:
basemap <-get_map('San Francisco', zoom=12)
ggmap(basemap) +
geom_polygon(data = ca_tract_sf , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('San Francisco High Income Rate\n ACS 2010-2014 data')
In [14]:
ca_tract_alameda = ca_tract2[grep('Alameda',ca_tract2$GEO.display.label),]
dim(ca_tract_alameda)
Out[14]:
In [15]:
basemap <-get_map('Hayward', zoom=10)
ggmap(basemap) +
geom_polygon(data = ca_tract_alameda , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('Alameda High Income Rate\n ACS 2010-2014 data')
In [26]:
#remove the tract with 100% high-income to highlight the lower range
ca_data2 = filter(ca_data,percent<100)
dim(ca_data2)
Out[26]:
In [27]:
ca_tract3 = left_join(ca_tract,ca_data2, by=c('id'))
dim(ca_tract2)
Out[27]:
In [29]:
ca_tract_alameda = ca_tract3[grep('Alameda',ca_tract3$GEO.display.label),]
dim(ca_tract_alameda)
Out[29]:
In [30]:
basemap <-get_map('Hayward', zoom=10)
ggmap(basemap) +
geom_polygon(data = ca_tract_alameda , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('Alameda High Income Rate\n ACS 2010-2014 data')
In [ ]:
In [16]:
ca_tract_sanmateo = ca_tract2[grep('San Mateo',ca_tract2$GEO.display.label),]
dim(ca_tract_sanmateo)
Out[16]:
In [17]:
basemap <-get_map('San Mateo', zoom=10)
ggmap(basemap) +
geom_polygon(data = ca_tract_sanmateo , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('San Mateo High Income Rate\n ACS 2010-2014 data')
In [ ]:
In [19]:
#zoom in on oakland and san leandro
basemap <-get_map('San Leandro', zoom=11)
ggmap(basemap)+
geom_polygon(data = ca_tract_alameda , aes(x=long, y=lat, group = group, fill=percent)) +
scale_fill_gradient(low = "#ffffcc", high = "#ff4444") +
ggtitle('Alameda Poverty Rate\n ACS 2010-2014 data')
In [ ]: