2.2 Data Structures

Each object has a mode (or type), e.g. numeric, character, or logical, and a class.

A data frame is a data structure whose columns can each hold a different mode.

Factors are nominal or ordinal variables; R stores and treats them specially.
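
As a quick illustration of the difference between mode and class, the sketch below inspects a few objects with the base functions mode() and class(). The object names (num, chr, lgl, df_demo, fac) are made up for this example.

```r
# Hypothetical objects, one per kind of structure mentioned above
num     <- c(1.5, 2, 3)
chr     <- c("a", "b")
lgl     <- c(TRUE, FALSE)
df_demo <- data.frame(id = 1:2, label = c("x", "y"))
fac     <- factor(c("low", "high", "low"))

mode(num)       # "numeric"
class(num)      # "numeric"
mode(chr)       # "character"
mode(lgl)       # "logical"

mode(df_demo)   # "list": a data frame is stored as a list of columns
class(df_demo)  # "data.frame"

mode(fac)       # "numeric": a factor is stored as integer codes
class(fac)      # "factor"
```

The mode describes how the object is stored, while the class determines how functions such as print() and summary() treat it.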

2.2.1 vector

All elements of a vector must have the same mode (type); mixed values are coerced to a common type.


In [ ]:
a <- c(1, 2, 5, 3, -2, 6)
a

In [ ]:
a0 <- c(1, "one", TRUE)
print(a0)
class(a0)

In [ ]:
b <- c("one", "two", "three")
b

In [ ]:
c <- c(TRUE, FALSE, TRUE, TRUE)
c

In [ ]:
a[3]

In [ ]:
a[2:6]

In [ ]:
a[c(1, 3, 5)]

In [ ]:
c(2:6)

2.2.2 matrix


In [ ]:
y <- matrix(1:20, nrow = 4, ncol = 5)
y

In [ ]:
cells <- c(1, 26, 34, 28)
rname <- c("R1", "R2")
cname <- c("C1", "C2")
mymatrix <- matrix(cells, nrow = 2, ncol = 2, byrow = TRUE, dimnames = list(rname, cname))
mymatrix

In [ ]:
x <- matrix(1:10, nrow = 2)
x

In [ ]:
x[2,]

In [ ]:
x[,2]

In [ ]:
x[1, 4]

In [ ]:
x[1, c(4, 5)]

2.2.3 array


In [ ]:
dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")
z <- array(data = 1:24, dim = c(2, 3, 4), dimnames = list(dim1, dim2, dim3))
print(z)

2.2.4 data frame


In [ ]:
patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type1", "Type2", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")
patientdata <- data.frame(patientID, age, diabetes, status)
patientdata

In [ ]:
str(patientdata)

In [ ]:
patientdata[1:2]

In [ ]:
patientdata[c("diabetes", "status")]

In [ ]:
patientdata$age

In [ ]:
table(patientdata$diabetes, patientdata$status)

Attach, detach and with


In [ ]:
attach(mtcars)
summary(mpg)
plot(mpg, disp)
detach(mtcars)

In [ ]:
with(data = mtcars, {
    keepstats <<- summary(mpg)
    plot(mpg, disp)
}
)
print(keepstats)
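
The `<<-` in the cell above is needed because with() evaluates its expression in a temporary environment. The following sketch shows the difference; the names inner_only, kept_mean, and mpg_mean are hypothetical and chosen only to avoid clashing with earlier cells.

```r
with(mtcars, {
    inner_only <- mean(mpg)   # plain <- assigns inside with()'s temporary environment
    kept_mean <<- mean(mpg)   # <<- assigns outside it (here: the global environment)
})

exists("inner_only")   # FALSE, provided no such object was defined earlier
exists("kept_mean")    # TRUE

# An alternative that avoids <<-: let with() return the value and assign it outside
mpg_mean <- with(mtcars, mean(mpg))
```

Returning the value from with() is often easier to follow than `<<-`, since the assignment is visible at the top level.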

In [ ]:
patientdata <- data.frame(age, diabetes, status, row.names = patientID)
patientdata

2.2.5 factors

Factors represent categorical (nominal) and ordered categorical (ordinal) data.


In [ ]:
status <- c("Poor", "Improved", "Excellent", "Poor")
status <- factor(status, ordered = TRUE, levels = c("Poor", "Improved", "Excellent"))
str(status)

In [ ]:
patientdata$status <- status
str(patientdata)

In [ ]:
summary(patientdata)
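
To make the special storage of factors more concrete, here is a minimal sketch. It rebuilds the same ordered factor under a hypothetical name (status_demo) so that the objects above are not overwritten.

```r
status_demo <- factor(c("Poor", "Improved", "Excellent", "Poor"),
                      ordered = TRUE,
                      levels = c("Poor", "Improved", "Excellent"))

levels(status_demo)        # "Poor" "Improved" "Excellent"
as.integer(status_demo)    # 1 2 3 1: the integer codes stored internally

# Because the factor is ordered, comparisons follow the level order
status_demo < "Excellent"  # TRUE TRUE FALSE TRUE
```

Without the levels argument, R would sort the levels alphabetically (Excellent, Improved, Poor), which is rarely the intended order for ordinal data.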

2.2.6 lists


In [ ]:
g <- "My first list"
f <- c(25, 26, 18, 39)
j <- matrix(1:10, nrow = 5)
k <- c("one", "two", "three")
mylist <- list(title = g, ages = f, j, k)
mylist

In [ ]:
mylist[[2]]

In [ ]:
mylist[["ages"]]

In [ ]:
mylist$ages

Lists are important for two reasons:

  1. They let you organize and recall disparate information in a simple way.
  2. Many functions return lists.
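
As one example of a function that returns a list, the sketch below fits a linear model with lm() on the built-in mtcars data. The model itself is arbitrary and used only for illustration; the point is that the result can be indexed like any other list.

```r
fit <- lm(mpg ~ disp, data = mtcars)   # arbitrary model, for illustration only

is.list(fit)            # TRUE: the fitted model is stored as a list
names(fit)              # component names, including "coefficients"

# The same component, retrieved two ways
fit$coefficients
fit[["coefficients"]]
```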

In [ ]:
x <- c(8, 6, 4)
x[7] <- 10
x

Differences between R and other programming languages

  1. The period (.) has no special meaning in object names, but the dollar sign ($) plays a role somewhat like the period in other object-oriented languages.
  2. R doesn't have multiline comments.
  3. Assigning to a nonexistent element of a vector, matrix, or data frame expands the structure.
  4. R doesn't have scalar values; a scalar is a one-element vector.
  5. Indices start at 1, not 0.
  6. Variables can't be declared; they come into existence on first assignment.
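
The points in this list can be checked with a short sketch. All names (my.total, v, never.assigned.demo) are hypothetical, and the `if (FALSE)` block is one common workaround for the missing multiline comment, not a language feature.

```r
# 1. A period is just another character in a name
my.total <- 10

# 2. No multiline comment syntax; a block that never runs is one workaround
#    (the code inside must still be syntactically valid R)
if (FALSE) {
    print("this line is skipped")
}

# 3. Assigning past the end extends the vector, padding the gap with NA
v <- c("a", "b", "c")
v[5] <- "e"
v                    # "a" "b" "c" NA "e"

# 4. A "scalar" is a vector of length 1
length(my.total)     # 1
is.vector(my.total)  # TRUE

# 5. The first element has index 1
v[1]                 # "a"

# 6. No declaration step: a name does not exist until it is assigned
exists("never.assigned.demo")   # FALSE
```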

2.3.2 importing a delimited text file


In [ ]:
grades <- read.csv(file = "studentgrades.txt", header = TRUE, sep = ",")

In [ ]:
grades

In [ ]:
str(grades)

In [ ]:
grades <- read.table(file = "studentgrades.txt", header = TRUE, sep = ",", row.names = "StudentID",
                    colClasses = c("character", "character", "character",
                                  "numeric", "numeric", "numeric"))

In [ ]:
grades

In [ ]:
str(grades)
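
Since studentgrades.txt is not included here, the following self-contained sketch writes a small stand-in file to a temporary location and reads it both ways. Only the StudentID column name comes from the call above; the other column names and all data values are made up for illustration.

```r
# Write a tiny comma-delimited file with invented contents
tmp <- tempfile(fileext = ".txt")
writeLines(c(
    "StudentID,First,Last,Math,Science,History",
    "011,Bob,Smith,90,80,67",
    "012,Jane,Weary,75,72,80"
), tmp)

# Default import: StudentID is converted to integer, so leading zeros are lost
read.csv(tmp)$StudentID       # 11 12

# Explicit colClasses: StudentID stays character and becomes the row names
grades_demo <- read.table(tmp, header = TRUE, sep = ",",
                          row.names = "StudentID",
                          colClasses = c("character", "character", "character",
                                         "numeric", "numeric", "numeric"))
rownames(grades_demo)         # "011" "012"
str(grades_demo)

unlink(tmp)                   # remove the temporary file
```

Specifying colClasses is what keeps identifier-like columns from being converted to numbers.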

2.4.2 Value labels


In [ ]:
gender <- c(1, 1, 2, 2, 1)
# Map the numeric codes to value labels; gender is nominal, so the factor is left unordered
gender.factor <- factor(gender, levels = c(1, 2), labels = c("female", "male"))
str(gender.factor)