A bubble chart is a scatterplot where the size of each bubble is scaled by some value. Bubble charts were made famous by the late Hans Rosling in one of the most watched TED talks ever.
Your challenge in class today: Make a bubble chart out of the dataset of majors by race and sex. I'm interested in gender differences by major.
I'll help you set up the data.
In [1]:
library(dplyr)
library(ggplot2)
library(reshape2)
In [2]:
enrollment <- read.csv("../../Data/collegeenrollment.csv")
In [3]:
head(enrollment)
Note that our data is long: there is one row for each combination of race and gender. We just want it by gender, so we need to do some grouping and, later, make the data wider. First, we group by College, Major and Gender and add up the counts.
In [4]:
# Collapse the race rows: one total per college, major and gender
majors <- enrollment %>%
  group_by(College, MajorName, Gender) %>%
  summarize(
    Total = sum(Count)
  )
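To see what that grouping does, here is a minimal sketch on a tiny made-up data frame. The `toy` data and its values are hypothetical, not the real enrollment file; only the column names mirror the ones used above.

```r
library(dplyr)

# Hypothetical toy data (NOT the real enrollment file):
# one row per race and gender for a single major
toy <- data.frame(
  College   = "Arts",
  MajorName = "History",
  Gender    = c("Male", "Male", "Female", "Female"),
  Race      = c("A", "B", "A", "B"),
  Count     = c(10, 5, 20, 15)
)

# Grouping without Race and summing Count collapses the race rows
toy_majors <- toy %>%
  group_by(College, MajorName, Gender) %>%
  summarize(Total = sum(Count))

toy_majors
```

Four rows go in and two come out: one total for Female and one for Male.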
Now we need to make that long data wide, so Male and Female are on the same line.
In [5]:
# value.var names the column that fills the new Male and Female columns
majors_bubble <- dcast(majors, College + MajorName ~ Gender, value.var = "Total")
In [6]:
head(majors_bubble)
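The long-to-wide step can be sketched on a small made-up table. The `toy_long` data is hypothetical; in the formula, the columns to the left of `~` stay as row identifiers, and each distinct value of the column on the right becomes its own column.

```r
library(reshape2)

# Hypothetical long data: one row per major and gender
toy_long <- data.frame(
  College   = c("Arts", "Arts", "Science", "Science"),
  MajorName = c("History", "History", "Physics", "Physics"),
  Gender    = c("Female", "Male", "Female", "Male"),
  Total     = c(35, 15, 12, 40)
)

# One row per major, with Female and Male as separate columns.
# Naming value.var explicitly keeps dcast from guessing the value column.
toy_wide <- dcast(toy_long, College + MajorName ~ Gender, value.var = "Total")

toy_wide
```

Each major now sits on a single line, which is the shape a scatterplot of Male against Female needs.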
We now have enough to do a scatterplot, but what are we lacking for a bubble chart? Some kind of weighting. So let's create a couple.
In [7]:
bubble <- majors_bubble %>%
  mutate(
    Total = Male + Female,
    Difference = abs(Male - Female)
  )
The abs() bit there means give me the absolute value -- so every difference is zero or greater, regardless of which number is larger.
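As a quick check of what the two new columns hold, here is a small sketch on made-up counts. The `toy_counts` data is hypothetical.

```r
library(dplyr)

# Hypothetical counts: one male-heavy, one female-heavy and one even major
toy_counts <- data.frame(
  MajorName = c("Physics", "History", "Biology"),
  Male      = c(40, 15, 30),
  Female    = c(12, 35, 30)
)

toy_bubble <- toy_counts %>%
  mutate(
    Total = Male + Female,            # overall size of the major
    Difference = abs(Male - Female)   # size of the gap, ignoring its direction
  )

toy_bubble
```

Physics and History both get a positive Difference even though the gap runs in opposite directions, and the evenly split Biology gets zero.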
So let's try a plot:
In [9]:
ggplot(bubble, aes(x = Male, y = Female, size = Difference)) +
  geom_point(alpha = 0.4, color = 'red') +
  scale_size_continuous(range = c(.1, 20)) +
  # guide = FALSE is deprecated in recent versions of ggplot2; use "none"
  scale_colour_continuous(guide = "none") +
  geom_text(data = bubble, aes(label = MajorName, size = 10), check_overlap = TRUE)
Your challenge: Make this better. Here's some guidance on things you can change. Here's more. We'll talk at the end of class.
In [ ]: