The UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/machine-learning-databases/) is one of the more popular sites for online datasets. We are pulling in the iris dataset.
In [1]:
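# iris.data has no header row, so header=FALSE keeps the first record as data
flowers <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", header=FALSE)
flowers
By default, read.csv skips blank lines. Setting blank.lines.skip=FALSE adds the blank line in the file back in as a row of missing values. Notice that there is now one more row than before: the 150 flower records plus the empty one.
In [2]:
flowers <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data", header=FALSE, blank.lines.skip=FALSE)
flowers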
That empty row is not a real observation, so let's remove it explicitly.
In [3]:
flowers <- na.omit(flowers)
flowers
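The effect of blank.lines.skip and na.omit can be checked without downloading anything. The following is a minimal sketch, assuming a made-up three-line input (two records with a blank line between them) in place of the real file; the variable names are illustrative only.

```r
# Assumption: a tiny hand-written stand-in for iris.data, with one blank line.
csv_lines <- c("5.1,3.5,1.4,0.2,Iris-setosa",
               "",
               "4.9,3.0,1.4,0.2,Iris-setosa")

# Default behaviour: the blank line is skipped.
skipped <- read.csv(text = csv_lines, header = FALSE)

# blank.lines.skip = FALSE keeps the blank line as a row of missing values.
kept <- read.csv(text = csv_lines, header = FALSE, blank.lines.skip = FALSE)

# na.omit drops every row that contains at least one NA.
cleaned <- na.omit(kept)

print(c(skipped = nrow(skipped), kept = nrow(kept), cleaned = nrow(cleaned)))
```

Comparing the three row counts shows which step introduces the empty row and which one removes it.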
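Let's rename the columns. In this file the four numeric features are sepal length, sepal width, petal length and petal width (all in cm), followed by the species label.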
In [4]:
colnames(flowers) <- c("F1", "F2", "F3", "F4", "Label")
summary(flowers)
We don't know anything about this dataset yet, so let's run k-means clustering on the four numeric features to look for natural groups, much as we would eyeball a scatterplot.
In [5]:
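set.seed(42)  # make the random split and the k-means starts repeatable
indexes <- sample(seq_len(nrow(flowers)), size=floor(0.6*nrow(flowers)))
flowers.train <- flowers[indexes,]    # 60% of the rows, used for fitting
flowers.test <- flowers[-indexes,]    # the remaining 40%, held out
fit <- kmeans(flowers.train[,1:4], 5) # 5 clusters is an arbitrary first guess
fit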
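Since the data carries a Label column, one way to judge the clusters is to cross-tabulate them against the known labels. This is a minimal sketch, assuming R's built-in `iris` data frame as a stand-in for the downloaded file and using 3 centers (one per species) rather than the 5 used above; the seed and `nstart` values are arbitrary choices.

```r
# Assumption: the built-in iris data replaces the downloaded file,
# with its columns renamed to the F1..F4/Label names used in this notebook.
flowers <- iris
colnames(flowers) <- c("F1", "F2", "F3", "F4", "Label")

set.seed(42)  # fix the random starts so the run is repeatable
fit <- kmeans(flowers[, 1:4], centers = 3, nstart = 10)

# Rows are cluster ids, columns are the known species labels.
confusion <- table(cluster = fit$cluster, label = flowers$Label)
print(confusion)
```

A cluster whose row is concentrated in a single column corresponds closely to one species; rows spread over several columns show where k-means mixes species.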
Let's see what the clusters look like in graphical form, plotting the first two features and marking the cluster centers with stars.
In [6]:
plot(flowers.train[c("F1", "F2")], col=fit$cluster)
# one colour per center, matching the cluster ids used for the points above
points(fit$centers[,c("F1", "F2")], col=seq_len(nrow(fit$centers)), pch=8, cex=2)
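The held-out rows are not used above. One possible use is to assign each of them to the nearest fitted center. This is a rough sketch, assuming the built-in `iris` data as a stand-in, 3 centers, and a hypothetical helper `nearest_center` that is not part of base R or of this notebook.

```r
# Assumption: the built-in iris data replaces the downloaded file.
flowers <- iris
colnames(flowers) <- c("F1", "F2", "F3", "F4", "Label")

set.seed(1)
indexes <- sample(seq_len(nrow(flowers)), size = floor(0.6 * nrow(flowers)))
train <- flowers[indexes, ]
test <- flowers[-indexes, ]

fit <- kmeans(train[, 1:4], centers = 3, nstart = 10)

# Hypothetical helper: for each row of `points`, return the index of the
# center with the smallest squared Euclidean distance.
nearest_center <- function(points, centers) {
  apply(as.matrix(points), 1, function(p) {
    which.min(colSums((t(centers) - p)^2))
  })
}

test_cluster <- nearest_center(test[, 1:4], fit$centers)
print(table(cluster = test_cluster, label = test$Label))
```

The resulting table can be read the same way as a table built on the training rows: if the clusters generalise, each held-out species should again fall mostly into one cluster.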