A categorical variable is related to what other languages sometimes call an "enum". R calls these "factors" and stores them efficiently as integer codes that index a set of labels called levels. Functions that handle categorical variables specially, such as summary() and xtabs(), need to know which columns are categorical, so we must tell R that a column of a data frame should be treated as a "factor."
Here is a column of strings:
In [60]:
stringcolors = c("red","red","blue","green","red")
In [61]:
stringcolors
Out[61]:
Let's convert it to a factor.
In [62]:
colors = factor(stringcolors)
In [63]:
colors
Out[63]:
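To see the integer representation described at the top of this section, a small sketch (assuming a standard R session; it rebuilds `colors` so it runs on its own) inspects the levels and the stored codes:

```r
# Rebuild the same factor so this snippet is self-contained
colors = factor(c("red", "red", "blue", "green", "red"))

# The distinct values, sorted alphabetically by default
levels(colors)

# The integer codes R actually stores; each one indexes into levels(colors)
as.integer(colors)

# Indexing the levels by the codes recovers the original strings
levels(colors)[as.integer(colors)]
```

Here the levels are "blue", "green", "red", so every "red" is stored as the code 3.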
In [64]:
is.factor(colors)
Out[64]:
In [65]:
sizes = factor(c("S","M","M","L","S"))
In [66]:
prices = factor(c(19.99, 12.99, 9.99,12.99,19.99))
In [67]:
prices
Out[67]:
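Making `prices` a factor treats each distinct price as a label, so the numbers are no longer directly usable as numbers. The sketch below shows a common pitfall when converting back: `as.numeric()` on a factor returns the integer codes, not the original values. The conversion through the level labels follows the idiom in the R FAQ.

```r
prices = factor(c(19.99, 12.99, 9.99, 12.99, 19.99))

# Levels are the sorted distinct values, stored as character strings
levels(prices)

# Pitfall: this returns the integer codes (3 2 1 2 3), not the prices
as.numeric(prices)

# To recover the numbers, convert the level labels instead
as.numeric(as.character(prices))

# Equivalent, converting each distinct level only once
as.numeric(levels(prices))[prices]
```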
In [68]:
clothes = data.frame(colors,sizes,prices)
In [69]:
clothes
Out[69]:
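Because `colors`, `sizes` and `prices` were already factors, `data.frame()` keeps them as factor columns. Plain character columns are a different story: since R 4.0.0, `data.frame()` leaves them as character unless `stringsAsFactors = TRUE` is given. A short sketch (the names `df_chr` and `df_fac` are hypothetical, chosen for illustration):

```r
stringcolors = c("red", "red", "blue", "green", "red")

# Character column kept as character (the default since R 4.0.0)
df_chr = data.frame(color = stringcolors, stringsAsFactors = FALSE)
is.factor(df_chr$color)

# stringsAsFactors = TRUE converts character columns to factors
df_fac = data.frame(color = stringcolors, stringsAsFactors = TRUE)
is.factor(df_fac$color)
```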
In [70]:
summary(clothes)
Out[70]:
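For a factor column, `summary()` reports a count per level, which is one of the places where declaring a column categorical pays off. A sketch comparing it with `table()` and with a plain character vector (assuming a standard R session):

```r
colors = factor(c("red", "red", "blue", "green", "red"))

# For a factor, summary() counts the observations in each level
summary(colors)

# table() gives the same per-level counts
table(colors)

# A plain character vector only gets Length / Class / Mode
summary(c("red", "red", "blue", "green", "red"))
```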
In [71]:
xtabs(~colors+sizes, data=clothes)
Out[71]:
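The formula `~colors+sizes` asks `xtabs()` to count every combination of the two factors; `table()` on the same two vectors gives the same counts. The sketch below also prints the default levels of `sizes`, which are sorted alphabetically and therefore appear as L, M, S:

```r
colors = factor(c("red", "red", "blue", "green", "red"))
sizes  = factor(c("S", "M", "M", "L", "S"))

# Without explicit levels, the size labels are sorted alphabetically
levels(sizes)

# Cross-tabulate the two factors; rows are colors, columns are sizes
counts = table(colors, sizes)
counts

# Number of red items in size S
counts["red", "S"]
```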
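But the sizes come out in the wrong order (L, M, S): by default R sorts factor levels alphabetically, because it has no way to know that S < M < L. To tell R to treat a variable as ordinal, with its levels in a given order, we do this: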
In [72]:
sizes = factor(sizes,levels=c("S","M","L"), ordered=TRUE)
In [74]:
clothes = data.frame(colors,sizes,prices)
In [75]:
xtabs(~colors+sizes, data=clothes)
Out[75]:
Now the size levels are ordered S < M < L, and the table columns follow that order.
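An ordered factor also supports comparisons and `max()`/`min()`, which an unordered factor does not (for an unordered factor, `<` returns `NA` with a warning). A sketch, assuming a standard R session; `ordered()` is base R shorthand for `factor(..., ordered = TRUE)`:

```r
sizes = factor(c("S", "M", "M", "L", "S"), levels = c("S", "M", "L"), ordered = TRUE)

# ordered() builds the same object
same = ordered(c("S", "M", "M", "L", "S"), levels = c("S", "M", "L"))
identical(sizes, same)

# Comparisons use the level order S < M < L
sizes < "L"

# The largest size present
max(sizes)
```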