In [1]:
library(dplyr)
library(ggplot2)
enrollment <- read.csv("../../Data/collegeenrollment.csv")
In [2]:
jour <- filter(enrollment, MajorName == "Journalism")
jdf <- jour %>%
  group_by(Race) %>%
  summarize(total = sum(Count)) %>%
  select(Race, total) %>%
  filter(total != 0)
ggplot(jdf, aes(x = "", y = total, fill = Race)) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y", start = 0)
You can see it's overwhelmingly white. But what about beyond that? How carefully can you evaluate angles and area?
Not well.
So let's introduce a better way: the waffle chart. Some call it a square pie chart. I personally hate that. Waffles it is. Here's the library's GitHub page with some instructions.
In [3]:
# install.packages('waffle')
library(waffle)
The downside of the waffle library: it doesn't play well with arbitrary dataframes. The good news? You don't want too many categories in a waffle chart anyway, so it's easy to create vectors to handle this.
In [4]:
jour %>%
  group_by(Race) %>%
  summarize(total = sum(Count)) %>%
  select(Race, total) %>%
  filter(total != 0)
So that's the data we need to create a vector for. Here's what that looks like:
In [5]:
j <- c('Asian'=3, 'Black'=9, 'Hispanic'=10, 'NonResidentAlien'=4, 'Two or more races'=9, "Unknown"=1, "White"=194)
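If typing the numbers by hand feels error-prone, the same named vector can be built from the summarized table. This is only a sketch: the small data frame below is a stand-in that copies the totals shown above, and `j_auto` is a name made up for this example. In the notebook itself you would use `jdf` from the earlier cell instead.

```r
# Stand-in for the summarized table (totals copied from the vector above)
jdf_example <- data.frame(
  Race = c("Asian", "Black", "Hispanic", "NonResidentAlien",
           "Two or more races", "Unknown", "White"),
  total = c(3, 9, 10, 4, 9, 1, 194),
  stringsAsFactors = FALSE
)

# setNames() attaches the category labels to the counts, giving the same
# kind of named vector that was typed by hand.
# as.character() guards against Race having been read in as a factor.
j_auto <- setNames(jdf_example$total, as.character(jdf_example$Race))
j_auto
```

The result should be usable anywhere `j` is used, for example as the first argument to `waffle()`.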
Now we can put it into the waffle library. The vector j is our data, you can change the number of rows if you want, and title and xlab are straight out of ggplot.
In [6]:
waffle(j, rows = 10, title="Journalism majors by race", xlab="1 square = 1 student")
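To get a feel for what changing `rows` does, it helps to do the arithmetic: with one square per student, the chart needs enough columns to hold every square. The helper below is hypothetical (it is not part of the waffle library) and assumes the grid is filled one column at a time, so treat it as a rough sketch of the layout math rather than a description of the library's internals.

```r
# Counts copied from the vector j above
j <- c("Asian" = 3, "Black" = 9, "Hispanic" = 10, "NonResidentAlien" = 4,
       "Two or more races" = 9, "Unknown" = 1, "White" = 194)

# Hypothetical helper: how many columns a grid with `rows` rows needs
# to fit one square per unit in `parts`
waffle_columns <- function(parts, rows) {
  ceiling(sum(parts) / rows)
}

cols_10 <- waffle_columns(j, rows = 10)  # wide, short chart
cols_5 <- waffle_columns(j, rows = 5)    # twice as wide, half as tall
cols_10
cols_5
```

Fewer rows means a longer, flatter chart, so pick a row count that suits the space you have.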
In [7]:
jour %>%
  group_by(Gender) %>%
  summarize(total = sum(Count)) %>%
  select(Gender, total)
In [8]:
g <- c('Female'=141, 'Male'=89)
waffle(g, rows = 10, title="Journalism majors by gender", xlab="1 square = 1 student", colors=c("#ff69b4", "#0000ff"))
To compare different datasets in sequential waffle charts, you create an iron (get it?) to put them together. Here, we'll compare the percentage of business administration majors who are male to the percentage of advertising majors who are male. First, let's get the stats.
In [9]:
ba <- filter(enrollment, MajorName == "Business Administration")
ba %>%
  group_by(Gender) %>%
  summarize(total = sum(Count)) %>%
  select(Gender, total)
ad <- filter(enrollment, MajorName == "Advertising & Public Relations")
ad %>%
  group_by(Gender) %>%
  summarize(total = sum(Count)) %>%
  select(Gender, total)
Going old school, I calculated the percentages by hand, then put them in the vector each waffle chart expects. I then added a title to each chart to make it clearer to readers.
In [11]:
iron(
  waffle(c(Women = 32, Men = 68), rows = 10, title = "Business administration majors"),
  waffle(c(Women = 66, Men = 44), rows = 10, title = "Advertising majors")
)
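Hand-calculated percentages are easy to get slightly wrong. Each pair should sum to 100, which 66 and 44 do not. One way to avoid that is to let R do the math. The sketch below uses the journalism gender counts from earlier as the example, since they appear in this notebook; `to_percent` is a helper name invented here, not something from the waffle library.

```r
# Journalism gender counts, copied from the vector g above
g <- c(Female = 141, Male = 89)

# Hypothetical helper: convert counts to whole-number percentages.
# Names on the vector are preserved.
to_percent <- function(counts) {
  round(100 * counts / sum(counts))
}

pct <- to_percent(g)
pct

# Rounding can occasionally leave a total of 99 or 101,
# so check before charting.
stopifnot(sum(pct) == 100)
```

The resulting vector can be passed to `waffle()` in place of typed-in percentages, and the same approach should work for the business administration and advertising counts.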