In [1]:
# Default plot size is 7 inches x 7 inches; change to 7 x 3
options(repr.plot.height=3)
In [ ]:
library(rpart) # CART tree models
library(rpart.plot) # Pretty plotting
library(vcd) # Spine plotting
In [3]:
titanic <- as.data.frame(Titanic)
head(titanic, n=5)
summary(titanic)
Out[3]:
In [4]:
Survival.by.Sex <- xtabs(Freq~Sex+Survived, data=titanic)
Survival.by.Class <- xtabs(Freq~Class+Survived, data=titanic)
Survival.by.Age <- xtabs(Freq~Age+Survived, data=titanic)
oldpar <- par(mfrow=c(1,3))
options(repr.plot.width=7)
spineplot(Survival.by.Sex, col=c(rgb(0, 0, 0.5), rgb(0.3, 0.3, 1)))
spineplot(Survival.by.Class, col=c(rgb(0, 0, 0.5), rgb(0.3, 0.3, 1)))
spineplot(Survival.by.Age, col=c(rgb(0, 0, 0.5), rgb(0.3, 0.3, 1)))
par(oldpar)
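The spine plots show survival proportions graphically. As a numeric cross-check, the same proportions can be read from the contingency table. This is a minimal sketch using base R only; the name survival.rate.by.sex is introduced here for illustration and is not part of the original notebook.

```r
# Rebuild the contingency table so this cell runs on its own
titanic <- as.data.frame(Titanic)
Survival.by.Sex <- xtabs(Freq ~ Sex + Survived, data = titanic)

# margin = 1 normalises each row, giving the share of each sex that did / did not survive
survival.rate.by.sex <- prop.table(Survival.by.Sex, margin = 1)
print(round(survival.rate.by.sex, 3))
```

Each row sums to 1, so the "Yes" column corresponds to the height of the lighter band in the matching spine plot bar.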
In [ ]:
cart.control <- rpart.control(minbucket=1, cp=0, maxdepth=5)
model.cart <- rpart(
    Survived ~ . ,
    data=titanic[ , -5],     # drop the Freq column from the predictors
    weights=titanic$Freq,    # each row stands for Freq passengers
    method="class",
    #xval=10,
    control=cart.control
)
print(model.cart)
printcp(model.cart)
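The weights argument is needed because each row of titanic describes a group of passengers rather than one person. One way to see what the weights stand for is to expand the table to one row per passenger. This is only a sketch in base R; row.index and titanic.long are hypothetical names, and a tree fitted on the expanded data need not match the weighted fit exactly, since settings such as minbucket count rows rather than weights (as far as I know).

```r
titanic <- as.data.frame(Titanic)

# Repeat each row index Freq times, so a group of n passengers becomes n rows
# (groups with Freq == 0 simply disappear)
row.index <- rep(seq_len(nrow(titanic)), times = titanic$Freq)
titanic.long <- titanic[row.index, c("Class", "Sex", "Age", "Survived")]
rownames(titanic.long) <- NULL

# The expanded data has as many rows as the sum of the frequencies
nrow(titanic.long) == sum(titanic$Freq)
table(titanic.long$Survived)
```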
In [6]:
# The standard Tree plot
plot(model.cart, margin=0.01)
text(model.cart, use.n=TRUE, cex=.8)
options(repr.plot.height=5)
In [7]:
# Better visualization using rpart.plot
prp(x=model.cart,
    fallen.leaves=TRUE, branch=.5, faclen=0, trace=1,
    extra=1, under=TRUE,
    branch.lty=3,
    split.box.col="whitesmoke", split.border.col="darkgray", split.round=0.4)
In [8]:
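# Confusion matrix given a probability cutoff
# Each row of titanic is a group of Freq passengers, so count passengers, not rows
threshold <- 0.8
predicted.survived <- predict(model.cart, titanic[,-5], type="prob")[,2] > threshold
cm <- xtabs(Freq ~ Survived + predicted.survived, data=titanic)
print(cm)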
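To make the layout of a confusion matrix on frequency data concrete, here is a small sketch that does not depend on the fitted tree. It swaps in a deliberately crude, hypothetical rule (predict survival for every female passenger) in place of the model, and weights the counts by Freq so that cells are in units of passengers. The names predicted.by.sex, cm.weighted and accuracy are assumptions made for this example.

```r
titanic <- as.data.frame(Titanic)

# Hypothetical stand-in classifier, NOT the rpart model: predict "survived" for females
predicted.by.sex <- titanic$Sex == "Female"

# Freq on the left-hand side makes xtabs sum passengers instead of counting rows
cm.weighted <- xtabs(Freq ~ Survived + predicted.by.sex, data = titanic)
print(cm.weighted)

# Rows are No/Yes and columns are FALSE/TRUE, so the diagonal holds the correct calls
accuracy <- sum(diag(cm.weighted)) / sum(cm.weighted)
print(round(accuracy, 3))
```

The same two closing lines can be applied to any 2 x 2 confusion matrix whose rows and columns are ordered negative first, positive second.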
This notebook also demonstrates loading an extra library, igraph.
The Docker container sets this up, so the student does not need to install anything
(grep for igraph).
We'll use an adjacency matrix to describe the network topology of caffeine, create the graph
using graph.adjacency(<the-adjacency-matrix>), and then demonstrate some standard selection and
plotting functions from R's igraph library. The chemical formula below demonstrates the use of inline LaTeX math markup, and the image demonstrates inline image placement.
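For readers new to adjacency matrices, here is a toy illustration in base R. It uses water rather than caffeine purely to keep the matrix small; the atom ordering and the names atoms and water.adjacency are assumptions made for this sketch and do not come from caffeine.txt.

```r
# Water (H2O): atom order O, H, H; the O atom is bonded to each H
atoms <- c("O", "H", "H")
water.adjacency <- matrix(0, nrow = 3, ncol = 3, dimnames = list(atoms, atoms))

# An undirected bond between atoms i and j sets both [i, j] and [j, i] to 1
water.adjacency[1, 2] <- water.adjacency[2, 1] <- 1
water.adjacency[1, 3] <- water.adjacency[3, 1] <- 1
print(water.adjacency)

# Undirected graphs have symmetric adjacency matrices
isSymmetric(water.adjacency)

# Row sums give the number of bonds (the degree) of each atom
rowSums(water.adjacency)
```

A matrix of this shape is what graph.adjacency expects; the caffeine matrix read below should follow the same pattern with 24 atoms.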
In [ ]:
library(igraph)
In [10]:
caffeine.adjacency <- as.matrix(read.table("caffeine.txt", sep=" "))
caffeine <- graph.adjacency(caffeine.adjacency, mode='undirected')
In [11]:
V(caffeine)$name <- strsplit('CHHHNCOCNCHHHCHNCNCHHHCO', '')[[1]]
V(caffeine)$color <- rgb(1, 1, 1)
V(caffeine)[name == 'C']$color <- rgb(0, 0, 0, 0.7)
V(caffeine)[name == 'O']$color <- rgb(1, 0, 0, 0.7)
V(caffeine)[name == 'N']$color <- rgb(0, 0, 1, 0.7)
plot(caffeine)
options(repr.plot.height=5, repr.plot.width=5)