Making new data

One of the most common data analysis techniques is to look at change over time. The most common way of comparing change over time is through percent change. The math behind calculating percent change is very simple, and you should know it off the top of your head. The easy way to remember it is:

(new - old) / old

Or new minus old divided by old: subtract the old number from the new number, then divide the result by the old number. To do that in R, we can use dplyr's mutate to calculate a new metric in a new field from existing fields of data.
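Before we touch any real data, here is a minimal sketch of the formula in plain R. The helper name percent_change and the numbers are made up for illustration; they are not part of dplyr or the dataset.

```r
# Hypothetical helper: change as a proportion, (new - old) / old.
percent_change <- function(old, new) {
  (new - old) / old
}

# Made-up numbers: a town that grew from 200 to 250 people.
percent_change(old = 200, new = 250)
# 0.25, which is 25 percent once multiplied by 100

# R does arithmetic on whole vectors at once, so the same helper
# handles many old/new pairs in one call.
old <- c(200, 80)
new <- c(250, 60)
percent_change(old, new)
# 0.25 and -0.25: the second town shrank
```

A negative result means the new number is smaller than the old one.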

So first we'll load dplyr.


In [1]:
library(dplyr)


Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Now we'll import a common and simple dataset of population estimates for every county in the US. The estimates data has data from 2010 to 2016.


In [2]:
population <- read.csv("../../Data/population.csv")

In [3]:
head(population)


STNAME CTYNAME POPESTIMATE2010 POPESTIMATE2011 POPESTIMATE2012 POPESTIMATE2013 POPESTIMATE2014 POPESTIMATE2015 POPESTIMATE2016
Alabama Autauga County 54742 55255 55027 54792 54977 55035 55416
Alabama Baldwin County 183199 186653 190403 195147 199745 203690 208563
Alabama Barbour County 27348 27326 27132 26938 26763 26270 25965
Alabama Bibb County 22861 22736 22645 22501 22511 22561 22643
Alabama Blount County 57376 57707 57772 57746 57621 57676 57704
Alabama Bullock County 10892 10722 10654 10576 10712 10455 10362
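It's worth checking that an import worked before calculating anything. The sketch below does not use the real population.csv. Instead it writes two rows from the head output to a temporary file, an assumption made only so the example runs on its own, and then checks the column names, the row count and whether a population column came in as a number.

```r
# Write two rows from the head() output to a temporary file so the
# example does not depend on the real population.csv.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "STNAME,CTYNAME,POPESTIMATE2015,POPESTIMATE2016",
  "Alabama,Autauga County,55035,55416",
  "Alabama,Baldwin County,203690,208563"
), csv_path)

demo <- read.csv(csv_path)

names(demo)                       # column names, to spot typos early
nrow(demo)                        # number of rows read
is.numeric(demo$POPESTIMATE2016)  # TRUE when the column can be used in arithmetic
```

If a population column came in as text instead of a number, the subtraction and division in the next step would fail.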

The code to calculate percent change is pretty simple. Remember, with summarize, we used n() to count things. With mutate, we use very similar syntax to calculate a new value from other values in our dataset. So in this case, we're trying to do (new-old)/old, but we're doing it with fields. If you look at what we got when we did head, you'll see there's POPESTIMATE2016 as the new data, and we'll use POPESTIMATE2015 as the old data. So we're looking at one year of change. Then, to help us, we'll use arrange again to sort it in descending order, so the county with the fastest growing population over one year comes first.


In [4]:
population %>% mutate(
  change = (POPESTIMATE2016 - POPESTIMATE2015)/POPESTIMATE2015
) %>% arrange(desc(change))


STNAME CTYNAME POPESTIMATE2010 POPESTIMATE2011 POPESTIMATE2012 POPESTIMATE2013 POPESTIMATE2014 POPESTIMATE2015 POPESTIMATE2016 change
Texas Hudspeth County 3467 3417 3351 3331 3243 3425 4053 0.18335766
Utah San Juan County 14797 14787 14900 14988 15208 15707 16895 0.07563507
Texas Kendall County 33651 34525 35766 37461 38830 40452 42540 0.05161673
Texas Hays County 158241 163209 168408 176029 184951 194574 204470 0.05085983
Utah Wasatch County 23629 24403 25385 26609 27789 29165 30528 0.04673410
Iowa Dallas County 66699 69759 72271 75010 77798 80777 84516 0.04628793
Colorado Costilla County 3526 3646 3603 3521 3540 3563 3721 0.04434465
Texas Comal County 109294 112047 115005 118776 123487 129113 134788 0.04395375
Nebraska Thomas County 650 688 692 700 688 686 716 0.04373178
Florida Sumter County 94280 98584 102790 108263 114012 118882 123996 0.04301745
Oregon Crook County 20899 20675 20642 20776 21036 21647 22570 0.04263870
Colorado Ouray County 4466 4454 4546 4579 4615 4660 4857 0.04227468
Utah Juab County 10222 10302 10282 10255 10421 10566 11010 0.04202158
Washington Kittitas County 41010 41577 41630 41853 42518 43057 44866 0.04201407
Georgia Forsyth County 176770 182351 187613 194781 203679 212125 221009 0.04188097
South Carolina Lancaster County 76956 77734 79184 80486 83232 86026 89594 0.04147583
Idaho Valley County 9779 9610 9515 9585 9808 10083 10496 0.04096003
Texas Williamson County 426415 442278 456352 470822 489143 508059 528718 0.04066260
South Carolina Horry County 270519 275426 281772 289310 298795 309871 322342 0.04024578
Florida Osceola County 269846 278906 289501 299822 311445 323028 336015 0.04020395
South Carolina Berkeley County 178941 183701 189560 193925 198248 202776 210898 0.04005405
Colorado Custer County 4275 4227 4241 4264 4344 4425 4602 0.04000000
Montana Granite County 3071 3141 3108 3134 3213 3239 3368 0.03982711
Texas Fort Bend County 590433 606962 625796 653252 684646 713849 741237 0.03836666
Florida Walton County 55244 55673 57330 59411 61550 63459 65889 0.03829244
Colorado Archuleta County 12055 12011 12113 12194 12229 12384 12854 0.03795220
Colorado Dolores County 2065 2033 1998 2030 1975 1981 2056 0.03785967
Montana Gallatin County 89631 91353 92595 94637 97322 100736 104502 0.03738485
Florida St. Johns County 191266 196050 202308 209607 218151 226658 235087 0.03718819
Idaho Teton County 10153 10174 10083 10276 10300 10568 10960 0.03709311
California Alpine County 1160 1109 1123 1141 1104 1106 1071 -0.03164557
Kansas Decatur County 2946 2918 2877 2905 2890 2926 2832 -0.03212577
South Dakota Hyde County 1417 1397 1430 1382 1401 1397 1352 -0.03221188
Virginia Emporia city 5933 5807 5791 5661 5585 5482 5305 -0.03228749
Nebraska Hayes County 954 978 942 966 928 927 897 -0.03236246
Texas Reagan County 3348 3385 3464 3598 3726 3731 3608 -0.03296703
Texas Hemphill County 3796 3951 4070 4125 4161 4270 4129 -0.03302108
Arkansas Monroe County 8123 8070 7843 7670 7612 7415 7169 -0.03317599
New York Hamilton County 4834 4834 4797 4760 4689 4698 4542 -0.03320562
Nebraska Hooker County 735 742 712 730 727 733 708 -0.03410641
Arkansas Lee County 10371 10277 10162 9991 9798 9645 9310 -0.03473302
Oklahoma Ellis County 4149 4042 4069 4129 4112 4227 4080 -0.03477644
North Dakota McIntosh County 2796 2765 2756 2752 2764 2757 2656 -0.03663402
Texas Ochiltree County 10173 10424 10559 10651 10673 10698 10306 -0.03664236
Utah Uintah County 32446 33275 34682 35737 36958 37789 36373 -0.03747122
Kansas Geary County 35284 35301 37947 36893 36682 36981 35586 -0.03772207
Montana Richland County 9749 10146 10801 11179 11560 11938 11482 -0.03819735
Georgia Charlton County 12838 13433 13332 13083 12921 13009 12497 -0.03935737
South Carolina Allendale County 10344 10247 9988 9805 9689 9420 9045 -0.03980892
Texas Crane County 4381 4365 4562 4756 4928 5039 4830 -0.04147648
Texas Schleicher County 3499 3304 3255 3193 3156 3196 3056 -0.04380476
Illinois Alexander County 8214 8018 7712 7215 7074 6776 6478 -0.04397875
North Dakota Dunn County 3542 3751 3967 4144 4369 4574 4366 -0.04547442
Oklahoma Beckham County 22053 22323 23123 23521 23661 23628 22519 -0.04693584
Georgia Chattahoochee County 11178 11317 12632 12436 11956 11462 10922 -0.04711220
Kansas Morton County 3237 3171 3133 3133 3058 2992 2848 -0.04812834
New Mexico Harding County 687 711 698 688 680 699 665 -0.04864092
North Dakota Burke County 1959 2049 2162 2286 2237 2312 2198 -0.04930796
Texas Terrell County 1009 952 917 888 905 855 812 -0.05029240
Nevada Eureka County 1993 1982 1996 2056 1988 2027 1917 -0.05426739
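To check the arithmetic by hand, it can help to run the same mutate and arrange on a handful of rows. This is a minimal sketch that assumes dplyr is installed; the three counties and their numbers are copied from the head output earlier, and the names sample_counties and one_year are made up for the example.

```r
library(dplyr)

# Three rows copied from the head() output, trimmed to the two
# columns the calculation needs.
sample_counties <- data.frame(
  CTYNAME = c("Autauga County", "Baldwin County", "Barbour County"),
  POPESTIMATE2015 = c(55035, 203690, 26270),
  POPESTIMATE2016 = c(55416, 208563, 25965),
  stringsAsFactors = FALSE
)

# Same pattern as the full dataset: (new - old) / old, then sort so
# the fastest-growing county comes first.
one_year <- sample_counties %>%
  mutate(change = (POPESTIMATE2016 - POPESTIMATE2015) / POPESTIMATE2015) %>%
  arrange(desc(change))

one_year
```

Baldwin County should come out on top and Barbour County, which lost people, at the bottom with a negative change.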

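But if we look at change, we'll see that it's a decimal, not a percent. For percent change to be a percent, you must multiply it by 100. The code below does that, and it also adds a second field, longchange, that measures the change from 2010 to 2016. Sorting by longchange in ascending order puts the counties that lost the largest share of their population at the top: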


In [6]:
population %>% mutate(
  change = ((POPESTIMATE2016 - POPESTIMATE2015)/POPESTIMATE2015)*100,
  longchange = ((POPESTIMATE2016 - POPESTIMATE2010)/POPESTIMATE2010)*100
) %>% arrange(longchange)


STNAME CTYNAME POPESTIMATE2010 POPESTIMATE2011 POPESTIMATE2012 POPESTIMATE2013 POPESTIMATE2014 POPESTIMATE2015 POPESTIMATE2016 change longchange
Illinois Alexander County 8214 8018 7712 7215 7074 6776 6478 -4.3978749 -21.134648
Texas Terrell County 1009 952 917 888 905 855 812 -5.0292398 -19.524281
Kentucky Lee County 7707 7691 7551 6838 6774 6737 6580 -2.3304141 -14.623070
Idaho Butte County 2907 2805 2722 2626 2609 2501 2501 0.0000000 -13.966288
West Virginia McDowell County 22076 21708 21335 20901 20291 19698 19141 -2.8276982 -13.294981
Texas Schleicher County 3499 3304 3255 3193 3156 3196 3056 -4.3804756 -12.660760
South Carolina Allendale County 10344 10247 9988 9805 9689 9420 9045 -3.9808917 -12.558005
Arkansas Phillips County 21670 21413 20762 20437 19938 19534 18975 -2.8616771 -12.436548
Michigan Ontonagon County 6750 6626 6417 6314 6173 6008 5911 -1.6145140 -12.429630
Idaho Clark County 979 951 874 862 874 872 860 -1.3761468 -12.155260
Kansas Morton County 3237 3171 3133 3133 3058 2992 2848 -4.8128342 -12.017300
Oklahoma Cimarron County 2457 2484 2392 2320 2271 2200 2162 -1.7272727 -12.006512
Louisiana Tensas Parish 5224 5084 4965 4881 4797 4726 4597 -2.7295810 -12.002297
Alabama Macon County 21499 21317 20632 20016 19594 19215 18963 -1.3114754 -11.795897
Arkansas Monroe County 8123 8070 7843 7670 7612 7415 7169 -3.3175995 -11.744429
Texas Presidio County 7876 7747 7557 7282 7040 6881 6958 1.1190234 -11.655663
Texas Foard County 1338 1360 1310 1277 1268 1215 1183 -2.6337449 -11.584454
New Mexico San Juan County 130135 127991 128331 126518 124055 118701 115079 -3.0513644 -11.569524
New Mexico Hidalgo County 4855 4840 4785 4616 4539 4414 4302 -2.5373811 -11.390319
California Lassen County 34838 34288 33662 32144 31714 31333 30870 -1.4776753 -11.389862
Kansas Elk County 2873 2799 2678 2647 2699 2602 2547 -2.1137586 -11.347024
New Mexico De Baca County 2019 1972 1941 1893 1823 1831 1793 -2.0753687 -11.193660
New Mexico Colfax County 13736 13614 13226 13039 12674 12387 12253 -1.0817793 -10.796447
Virginia Emporia city 5933 5807 5791 5661 5585 5482 5305 -3.2287486 -10.584864
Texas Dickens County 2441 2395 2315 2283 2205 2197 2184 -0.5917160 -10.528472
Alaska Bristol Bay Borough 1003 1037 983 954 942 892 898 0.6726457 -10.468594
Arkansas Lafayette County 7636 7540 7441 7245 7119 6981 6847 -1.9194958 -10.332635
Arkansas Lee County 10371 10277 10162 9991 9798 9645 9310 -3.4733022 -10.230450
Alabama Coosa County 11758 11369 11201 11058 10794 10687 10581 -0.9918593 -10.010206
Mississippi Quitman County 8165 8029 7802 7826 7678 7514 7349 -2.1959010 -9.993876
South Carolina Horry County 270519 275426 281772 289310 298795 309871 322342 4.0245780 19.15688
Georgia Bryan County 30382 31242 32272 33120 33847 35052 36230 3.3607212 19.24824
Florida Walton County 55244 55673 57330 59411 61550 63459 65889 3.8292441 19.26906
Texas Andrews County 14835 15390 16107 16776 17405 18026 17760 -1.4756463 19.71689
Utah Morgan County 9527 9660 9822 10240 10636 11091 11437 3.1196466 20.04828
Texas Sterling County 1138 1170 1191 1237 1358 1359 1367 0.5886681 20.12302
South Dakota Lincoln County 45181 46775 48329 49849 51507 52826 54469 3.1102109 20.55731
Texas Denton County 666736 685376 707475 728282 752820 778491 806180 3.5567527 20.91442
Texas Montgomery County 459319 471591 484674 498951 517985 536434 556203 3.6852623 21.09297
North Dakota Billings County 771 838 914 889 898 920 934 1.5217391 21.14137
Virginia Loudoun County 315585 326921 338196 350678 362798 374559 385945 3.0398415 22.29510
Florida St. Johns County 191266 196050 202308 209607 218151 226658 235087 3.7188187 22.91102
North Dakota Dunn County 3542 3751 3967 4144 4369 4574 4366 -4.5474421 23.26369
Texas Comal County 109294 112047 115005 118776 123487 129113 134788 4.3953746 23.32607
Texas Williamson County 426415 442278 456352 470822 489143 508059 528718 4.0662600 23.99142
Louisiana St. Bernard Parish 36813 39504 41519 43409 44455 45384 45688 0.6698396 24.10833
Florida Osceola County 269846 278906 289501 299822 311445 323028 336015 4.0203945 24.52102
Georgia Forsyth County 176770 182351 187613 194781 203679 212125 221009 4.1880966 25.02631
Texas Fort Bend County 590433 606962 625796 653252 684646 713849 741237 3.8366657 25.54126
Georgia Long County 14681 15244 16164 16641 17172 17799 18437 3.5844710 25.58409
Texas Kendall County 33651 34525 35766 37461 38830 40452 42540 5.1616731 26.41526
Iowa Dallas County 66699 69759 72271 75010 77798 80777 84516 4.6287928 26.71254
North Dakota Stark County 24352 25171 26919 28406 30524 32139 31199 -2.9247954 28.11679
Utah Wasatch County 23629 24403 25385 26609 27789 29165 30528 4.6734099 29.19717
Texas Hays County 158241 163209 168408 176029 184951 194574 204470 5.0859827 29.21430
Florida Sumter County 94280 98584 102790 108263 114012 118882 123996 4.3017446 31.51888
North Dakota Mountrail County 7720 8107 8741 9336 9751 10307 10242 -0.6306394 32.66839
Texas Loving County 83 95 81 103 87 115 113 -1.7391304 36.14458
North Dakota Williams County 22586 24407 26741 29608 32143 35387 34337 -2.9671913 52.02780
North Dakota McKenzie County 6398 7005 7963 9255 10960 12792 12621 -1.3367730 97.26477
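Here is a small sketch of calculating two changes in a single mutate, again on three rows copied from the head output so the numbers are easy to verify. It assumes dplyr is installed. The extra longchange_rounded field uses base R's round and is an optional addition for readability, not something the lesson requires.

```r
library(dplyr)

# Three rows copied from the head() output, keeping only the years
# the two calculations need.
sample_counties <- data.frame(
  CTYNAME = c("Autauga County", "Baldwin County", "Barbour County"),
  POPESTIMATE2010 = c(54742, 183199, 27348),
  POPESTIMATE2015 = c(55035, 203690, 26270),
  POPESTIMATE2016 = c(55416, 208563, 25965),
  stringsAsFactors = FALSE
)

both_changes <- sample_counties %>%
  mutate(
    # One-year change, multiplied by 100 to make it a percent.
    change = (POPESTIMATE2016 - POPESTIMATE2015) / POPESTIMATE2015 * 100,
    # Six-year change, same formula with 2010 as the old value.
    longchange = (POPESTIMATE2016 - POPESTIMATE2010) / POPESTIMATE2010 * 100,
    # Optional: a field created earlier in the same mutate can be reused.
    longchange_rounded = round(longchange, 1)
  ) %>%
  arrange(longchange)  # ascending: biggest losses first

both_changes
```

Because every new field is created in one mutate call, the whole calculation happens in one step.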

Assignment

How has Nebraska's electorate changed from 2010 to the election of Donald Trump in 2016? Specifically, how has the total number of voters, Republicans, Democrats and Non-Partisans changed in that time in each county in Nebraska? Which counties have changed the most in terms of the number of registered voters? Here is your dataset.

Rubric

  1. Did you import the data correctly?
  2. Did you mutate the data correctly? Did you do it in one step?
  3. Did you sort the data correctly?
  4. Did you explain each step using Markdown?

In [ ]: