Treemaps

Treemaps are somewhat controversial in data visualization circles. According to Tufte's commandments, data graphics must, among other things:

  • Show the data
  • Induce the viewer to think about substance rather than the methodology
  • Avoid distortion
  • Make large datasets coherent
  • Encourage the eye to compare different pieces of data
  • Reveal the data at several levels of detail

The question many people arrive at with treemaps is: do they?

Let's explore.

If you haven't already, activate your dataviz environment and open Jupyter Notebooks. You'll need to run this command at the top:

install.packages('treemap', repos='http://cran.us.r-project.org')

If you get an error message, close Jupyter, shut down the server in your terminal, and run this command: conda install r-essentials. That updates all the libraries. I needed to do this on a Mac; on a PC the install worked without it.

For our example, we'll make a Treemap of our college enrollment data. The goal is to show which colleges are bigger than others, as well as the majors within them.


In [1]:
library(treemap)

In [2]:
enrollment <- read.csv("../../Data/collegeenrollment.csv")

In [3]:
head(enrollment)


College                                 Degree  MajorCode  MajorName               RaceGender            Race              Gender  Count  Total
College of Agri Sci and Natl Resources  B1BC    BIOC       Biochemistry            NonResidentAlienMale  NonResidentAlien  Male    3      97
College of Agri Sci and Natl Resources  B1AS    ASCI       Animal Science          NonResidentAlienMale  NonResidentAlien  Male    0      338
College of Agri Sci and Natl Resources  B1FW    FWL        Fisheries and Wildlife  NonResidentAlienMale  NonResidentAlien  Male    0      191
College of Agri Sci and Natl Resources  B1AP    APSC       Applied Science         NonResidentAlienMale  NonResidentAlien  Male    1      71
College of Agri Sci and Natl Resources  B1HO    HORT       Horticulture            NonResidentAlienMale  NonResidentAlien  Male    1      52
College of Agri Sci and Natl Resources  B1ED    AEDU       Agricultural Education  NonResidentAlienMale  NonResidentAlien  Male    0      103

In [4]:
treemap(enrollment,
    index=c("College","MajorName"),  # A vector of grouping variables: ORDER MATTERS (outer level first).
    vSize = "Total",  # This determines the size, so it must be a numeric column.
    title="Majors at UNL, 2017", # Customize the title
    fontsize.title = 24, # Change the font size of the title
    fontsize.labels=c(15,7), # Size of labels, one value per index level
    fontcolor.labels=c("white","black"), # Color of labels, one value per index level
    fontface.labels=c(2,1), # Font of labels: 1,2,3,4 for normal, bold, italic, bold-italic
    bg.labels=c("transparent"), # Background color of labels
    align.labels=list(
        c("left", "top"),
        c("right", "bottom")
        ), # Where to place labels in the rectangle, one pair per index level
    overlap.labels=0.5, # Number between 0 and 1 that sets how much label overlap is tolerated
    inflate.labels=FALSE # If TRUE, labels are bigger when the rectangle is bigger
)
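
To make "order matters" concrete, here is a rough base R sketch of the sums that sit behind the rectangles. It assumes treemap adds up the vSize column within each index group, which is my understanding of the package rather than something shown above. The toy data frame and its numbers are invented for illustration and do not come from collegeenrollment.csv.

```r
# Made-up numbers for illustration only; not taken from collegeenrollment.csv.
toy <- data.frame(
  College = c("Arts", "Arts", "Engineering", "Engineering", "Engineering"),
  MajorName = c("History", "English", "Civil", "Mechanical", "Electrical"),
  Total = c(120, 180, 200, 300, 100),
  stringsAsFactors = FALSE
)

# Inner rectangles: one per College/MajorName pair, sized by the summed Total.
major_sizes <- aggregate(Total ~ College + MajorName, data = toy, FUN = sum)

# Outer rectangles: one per College, sized by the sum of its majors.
college_sizes <- aggregate(Total ~ College, data = toy, FUN = sum)

# Share of the whole plot each college would occupy under that assumption.
college_sizes$Share <- college_sizes$Total / sum(college_sizes$Total)

print(major_sizes)
print(college_sizes)
```

If that assumption holds, a data frame with several rows per major would have its Total values added together before drawing, so it is worth checking how many rows each major has before trusting the areas.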


Discussion: Does this accomplish Tufte's commandments?

  • Show the data
  • Induce the viewer to think about substance rather than the methodology
  • Avoid distortion
  • Make large datasets coherent
  • Encourage the eye to compare different pieces of data
  • Reveal the data at several levels of detail

In-class challenge: Make a treemap of UNL crime

  1. Use dplyr to group by building and then by crime and count them up.
  2. Make a treemap out of that dataframe.
  3. Answer the question: Does it illuminate or distract?
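
As a starting point for step 1, here is a minimal sketch of the group-and-count pattern in dplyr. The crimes data frame, its Building and Crime columns, and the Incidents name are all hypothetical stand-ins; the real crime file may use different names and will need to be read in first.

```r
library(dplyr)

# Hypothetical stand-in for the crime data; the real file and its
# column names may differ.
crimes <- data.frame(
  Building = c("Library", "Library", "Union", "Union", "Union", "Stadium"),
  Crime = c("Theft", "Theft", "Theft", "Vandalism", "Vandalism", "Trespassing"),
  stringsAsFactors = FALSE
)

# Group by building, then by crime, and count the rows in each group.
crime_counts <- crimes %>%
  group_by(Building, Crime) %>%
  summarise(Incidents = n()) %>%
  ungroup()

print(crime_counts)
```

A data frame shaped like crime_counts has what treemap needs: two grouping columns for index and one numeric column for vSize.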

In [ ]: