Live fire: Census estimates release day

Every year, the US Census Bureau releases new estimates of the population of every metropolitan area, county, city and town in the US. They are estimates because the bureau does a full headcount census only once every 10 years. In the years between, it uses data and modeling to estimate the population. After each headcount, it recalibrates its models based on how close the estimates came to the actual count.

Today, we're going to simulate being in a newsroom on the day these new data are released. We'll look at how a local news organization handled it, and we'll show how a little R and ggplot know-how can make the coverage better, easier and push-button quick next year.

First, let's talk about how a local newspaper covered it. What did they choose to focus on? What numerical measures did they use? Were they the right ones? Were they useful? Did they use any visuals? What could they have done differently?

Now let's take our own crack at this. You are now on deadline. You have until the end of class to create a visual story from these data, looking at the state of Nebraska. You will need to:

  • Create some tables of data to show trends.
  • Create at least two visualizations of the data.

Some suggestions: Fastest growing? Fastest shrinking? Gainers to losers? One-year change vs since 2010? Every county in a lattice chart? Urban vs rural? Counties that have lost population every year this decade? Gained?

Pair up, plan what you are going to do, and get started. To help you, here's some boilerplate code to get you going. Note the read.csv bits: they pull the data straight from the URL.


In [1]:
library(dplyr)
library(ggplot2)


Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


In [2]:
counties <- read.csv(url("https://www2.census.gov/programs-surveys/popest/datasets/2010-2017/counties/totals/co-est2017-alldata.csv"))
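On deadline you will probably rerun this cell many times. One option, sketched here as an assumption rather than as part of the original boilerplate, is to download the file once and read the local copy afterward. The helper name `read_cached` and the local file name are hypothetical.

```r
# Hypothetical helper (not part of the original notebook): download the CSV
# only if there is no local copy yet, then read the local copy.
read_cached <- function(data_url, local_path) {
  if (!file.exists(local_path)) {
    download.file(data_url, destfile = local_path, mode = "wb")
  }
  read.csv(local_path)
}

# Usage, commented out so the sketch does not hit the network when sourced:
# counties <- read_cached(
#   "https://www2.census.gov/programs-surveys/popest/datasets/2010-2017/counties/totals/co-est2017-alldata.csv",
#   "co-est2017-alldata.csv"
# )
```

Delete the local file to force a fresh download, for example when next year's release comes out under a new URL.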

In [3]:
head(counties)


SUMLEV REGION DIVISION STATE COUNTY STNAME CTYNAME CENSUS2010POP ESTIMATESBASE2010 POPESTIMATE2010 ... RDOMESTICMIG2015 RDOMESTICMIG2016 RDOMESTICMIG2017 RNETMIG2011 RNETMIG2012 RNETMIG2013 RNETMIG2014 RNETMIG2015 RNETMIG2016 RNETMIG2017
40 3 6 1 0 Alabama Alabama 4779736 4780135 4785579 ... -0.3172050 -0.404473 0.7888823 0.4507405 0.9393925 1.3642955 0.6942708 0.6785751 0.5589306 1.708218
50 3 6 1 1 Alabama Autauga County 54571 54571 54750 ... -1.9507393 4.831269 1.0471015 5.9118318 -6.1021012 -4.0502819 2.0993255 -1.6590399 5.1037088 1.317904
50 3 6 1 3 Alabama Baldwin County 182265 182265 183110 ... 17.0478719 20.493601 22.3831750 16.2859400 17.1967858 22.6152855 20.3809040 17.9037487 21.3172439 23.163873
50 3 6 1 5 Alabama Barbour County 27457 27457 27332 ... -16.2224360 -18.755525 -19.0423948 0.2560211 -6.8224333 -8.0189202 -5.5497616 -16.4110690 -18.9476921 -19.159940
50 3 6 1 7 Alabama Bibb County 22915 22919 22872 ... 0.9313878 -1.416117 -0.8829827 -5.0419800 -4.0966456 -5.8900379 1.2434497 1.8184237 -0.5310439 0.000000
50 3 6 1 9 Alabama Blount County 57322 57324 57381 ... -1.5633685 -1.736835 6.2124162 0.2435990 -1.3546723 -0.4860352 -1.7713100 -0.5384936 -0.6599972 7.285313

In [4]:
colnames(counties)


  1. 'SUMLEV'
  2. 'REGION'
  3. 'DIVISION'
  4. 'STATE'
  5. 'COUNTY'
  6. 'STNAME'
  7. 'CTYNAME'
  8. 'CENSUS2010POP'
  9. 'ESTIMATESBASE2010'
  10. 'POPESTIMATE2010'
  11. 'POPESTIMATE2011'
  12. 'POPESTIMATE2012'
  13. 'POPESTIMATE2013'
  14. 'POPESTIMATE2014'
  15. 'POPESTIMATE2015'
  16. 'POPESTIMATE2016'
  17. 'POPESTIMATE2017'
  18. 'NPOPCHG_2010'
  19. 'NPOPCHG_2011'
  20. 'NPOPCHG_2012'
  21. 'NPOPCHG_2013'
  22. 'NPOPCHG_2014'
  23. 'NPOPCHG_2015'
  24. 'NPOPCHG_2016'
  25. 'NPOPCHG_2017'
  26. 'BIRTHS2010'
  27. 'BIRTHS2011'
  28. 'BIRTHS2012'
  29. 'BIRTHS2013'
  30. 'BIRTHS2014'
  31. 'BIRTHS2015'
  32. 'BIRTHS2016'
  33. 'BIRTHS2017'
  34. 'DEATHS2010'
  35. 'DEATHS2011'
  36. 'DEATHS2012'
  37. 'DEATHS2013'
  38. 'DEATHS2014'
  39. 'DEATHS2015'
  40. 'DEATHS2016'
  41. 'DEATHS2017'
  42. 'NATURALINC2010'
  43. 'NATURALINC2011'
  44. 'NATURALINC2012'
  45. 'NATURALINC2013'
  46. 'NATURALINC2014'
  47. 'NATURALINC2015'
  48. 'NATURALINC2016'
  49. 'NATURALINC2017'
  50. 'INTERNATIONALMIG2010'
  51. 'INTERNATIONALMIG2011'
  52. 'INTERNATIONALMIG2012'
  53. 'INTERNATIONALMIG2013'
  54. 'INTERNATIONALMIG2014'
  55. 'INTERNATIONALMIG2015'
  56. 'INTERNATIONALMIG2016'
  57. 'INTERNATIONALMIG2017'
  58. 'DOMESTICMIG2010'
  59. 'DOMESTICMIG2011'
  60. 'DOMESTICMIG2012'
  61. 'DOMESTICMIG2013'
  62. 'DOMESTICMIG2014'
  63. 'DOMESTICMIG2015'
  64. 'DOMESTICMIG2016'
  65. 'DOMESTICMIG2017'
  66. 'NETMIG2010'
  67. 'NETMIG2011'
  68. 'NETMIG2012'
  69. 'NETMIG2013'
  70. 'NETMIG2014'
  71. 'NETMIG2015'
  72. 'NETMIG2016'
  73. 'NETMIG2017'
  74. 'RESIDUAL2010'
  75. 'RESIDUAL2011'
  76. 'RESIDUAL2012'
  77. 'RESIDUAL2013'
  78. 'RESIDUAL2014'
  79. 'RESIDUAL2015'
  80. 'RESIDUAL2016'
  81. 'RESIDUAL2017'
  82. 'GQESTIMATESBASE2010'
  83. 'GQESTIMATES2010'
  84. 'GQESTIMATES2011'
  85. 'GQESTIMATES2012'
  86. 'GQESTIMATES2013'
  87. 'GQESTIMATES2014'
  88. 'GQESTIMATES2015'
  89. 'GQESTIMATES2016'
  90. 'GQESTIMATES2017'
  91. 'RBIRTH2011'
  92. 'RBIRTH2012'
  93. 'RBIRTH2013'
  94. 'RBIRTH2014'
  95. 'RBIRTH2015'
  96. 'RBIRTH2016'
  97. 'RBIRTH2017'
  98. 'RDEATH2011'
  99. 'RDEATH2012'
  100. 'RDEATH2013'
  101. 'RDEATH2014'
  102. 'RDEATH2015'
  103. 'RDEATH2016'
  104. 'RDEATH2017'
  105. 'RNATURALINC2011'
  106. 'RNATURALINC2012'
  107. 'RNATURALINC2013'
  108. 'RNATURALINC2014'
  109. 'RNATURALINC2015'
  110. 'RNATURALINC2016'
  111. 'RNATURALINC2017'
  112. 'RINTERNATIONALMIG2011'
  113. 'RINTERNATIONALMIG2012'
  114. 'RINTERNATIONALMIG2013'
  115. 'RINTERNATIONALMIG2014'
  116. 'RINTERNATIONALMIG2015'
  117. 'RINTERNATIONALMIG2016'
  118. 'RINTERNATIONALMIG2017'
  119. 'RDOMESTICMIG2011'
  120. 'RDOMESTICMIG2012'
  121. 'RDOMESTICMIG2013'
  122. 'RDOMESTICMIG2014'
  123. 'RDOMESTICMIG2015'
  124. 'RDOMESTICMIG2016'
  125. 'RDOMESTICMIG2017'
  126. 'RNETMIG2011'
  127. 'RNETMIG2012'
  128. 'RNETMIG2013'
  129. 'RNETMIG2014'
  130. 'RNETMIG2015'
  131. 'RNETMIG2016'
  132. 'RNETMIG2017'
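With 132 columns, it can help to pull out only the ones you need. The sketch below uses base R's `grep` on the column names. The `toy` data frame is a made-up stand-in with a few of the column names listed above, and its values are placeholders, not Census figures.

```r
# Toy stand-in with a handful of the column names listed above.
# The values are made-up placeholders, not Census figures.
toy <- data.frame(
  STNAME = "Example State",
  CTYNAME = "Example County",
  POPESTIMATE2016 = 100,
  POPESTIMATE2017 = 110,
  BIRTHS2017 = 3,
  RNETMIG2017 = 1.5
)

# Names of every column that starts with "POPESTIMATE".
pop_cols <- grep("^POPESTIMATE", colnames(toy), value = TRUE)

# Keep the identifying columns plus the population estimates.
slim <- toy[, c("STNAME", "CTYNAME", pop_cols)]
print(colnames(slim))
```

Swapping `toy` for `counties` should apply the same pattern to the real file, assuming it loaded with the column names shown above.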

Here's some code to keep only the Nebraska counties, drop the statewide total row and calculate the one-year percent change from 2016 to 2017 in a field called change.


In [5]:
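# Keep Nebraska's county rows (SUMLEV 50 excludes the statewide total row),
# then compute the one-year percent change from 2016 to 2017.
nebraska <- counties %>%
  filter(STNAME == "Nebraska") %>%
  filter(SUMLEV == 50) %>%
  mutate(change = ((POPESTIMATE2017 - POPESTIMATE2016) / POPESTIMATE2016) * 100)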
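From here, one possible next step is a ranked table and a simple bar chart. The sketch below is only one way to do it. It runs on a made-up `toy` data frame so it stands alone: the county names and numbers are invented placeholders. It also assumes that each NPOPCHG column holds the numeric change for the year ending in that year, and it checks only two years for losses to stay short.

```r
library(dplyr)
library(ggplot2)

# Made-up placeholder rows standing in for the real `nebraska` data frame.
# County names and numbers are invented for illustration only.
toy <- data.frame(
  CTYNAME = c("County A", "County B", "County C", "County D"),
  POPESTIMATE2010 = c(1000, 5000, 800, 2000),
  POPESTIMATE2016 = c(1100, 4900, 790, 2000),
  POPESTIMATE2017 = c(1122, 4851, 780, 2010),
  NPOPCHG_2016 = c(20, -30, -5, 4),
  NPOPCHG_2017 = c(22, -49, -10, 10)
)

ranked <- toy %>%
  mutate(
    # One-year percent change, same formula as the boilerplate above.
    change = (POPESTIMATE2017 - POPESTIMATE2016) / POPESTIMATE2016 * 100,
    # Percent change since the 2010 estimate.
    change_since_2010 = (POPESTIMATE2017 - POPESTIMATE2010) / POPESTIMATE2010 * 100,
    # TRUE when the county lost people in both years checked here.
    lost_both_years = NPOPCHG_2016 < 0 & NPOPCHG_2017 < 0
  ) %>%
  arrange(desc(change))  # fastest one-year growth first

# Table: the top of the ranking; use tail() for the fastest shrinking.
ranked %>%
  select(CTYNAME, change, change_since_2010, lost_both_years) %>%
  head(3)

# Chart: horizontal bars ordered by one-year percent change.
growth_chart <- ggplot(ranked, aes(x = reorder(CTYNAME, change), y = change)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Percent change, 2016 to 2017")
print(growth_chart)
```

To run this on the real data, replace `toy` with `nebraska`. For the "lost population every year" question, extend the logical test to cover each year's NPOPCHG column.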

Homework:

Read Tufte 2, 3 and 5 and be prepared for a discussion of lying with charts. Also, prepare a pitch for your next visual story, which is due Thursday of Dead Week.

