R Notes


1) R Basics
   1.1) Arithmetics: +, -, *, ^, %%
   1.2) Data types
   1.3) Vectors: names(), sum(), selection vector
   1.4) Matrices: matrix(), colnames(), rownames(), dimnames(), cbind(), rbind(), matrix arithmetics
   1.5) Factors: factor(), levels(), summary(), selection, comparison
   1.6) DataFrames: head(), tail(), str(), summary(), data.frame(), selection, subset(), order()
   1.7) Lists: create list, slice list, concat list

2) R Intermediate
   2.1) Relational Operators: ==, !=, >, <

1) R Basics

1.1) Arithmetics

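For simple arithmetic use:

Addition: + , Subtraction: - , Multiplication: * , Division: / , Exponentiation: ^ , Modulo: %%

For matrix multiplication use %*%. The plain operators (*, /, ...) work element by element, which is not the usual matrix arithmetic. Note that %/% is integer division, not a matrix operator.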
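
As a quick illustration of these operators, here is a minimal sketch. The variable names (power, remainder, quotient, m, elementwise, matprod) are made up for this example and do not appear elsewhere in the notes.

```r
# Scalar arithmetic
power     <- 2^3       # exponentiation
remainder <- 7 %% 3    # modulo: remainder of the division
quotient  <- 7 %/% 3   # integer division (not a matrix operator)

# Matrix arithmetic: * is element-wise, %*% is the matrix product
m <- matrix(1:4, nrow = 2)   # filled by column: rows are (1, 3) and (2, 4)
elementwise <- m * m         # each element squared
matprod     <- m %*% m       # usual row-by-column product
```

Comparing elementwise and matprod side by side shows why the two operators should not be confused.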


1.2) Basic Data Types

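Numerics : 4.5

Integers : 4L (a plain 4 is stored as a numeric; the L suffix makes it an integer)

Boolean (logical) : TRUE, FALSE (T and F also work, but they are ordinary variables that can be overwritten)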

Characters : 'this is a text', 'abc'

Vectors, Matrices, Factors and Lists are made using the basic data types.

To find the type of a variable use: class(variable_name)


In [95]:
var <- 4.1

In [96]:
class(var)


Out[96]:
'numeric'
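
To see how class() reports each basic type, a small sketch like the following can help. The names values and types are illustrative only.

```r
# One value of each basic type; note the difference between 4 and 4L
values <- list(4.5, 4, 4L, TRUE, "abc")

# class() applied to each element returns the type name as a string
types <- sapply(values, class)
types
```

A plain 4 is reported as numeric, while 4L is reported as integer.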

1.3) Vectors

To create a vector, use the c() function:


In [97]:
myvec <- c(1,3,5.5); myvec


Out[97]:
  1. 1
  2. 3
  3. 5.5

Name the elements of a vector with names():


In [98]:
names(myvec) <- c('a','b','c'); myvec


Out[98]:
a: 1
b: 3
c: 5.5

Summing the elements of a vector:


In [99]:
sum(myvec)


Out[99]:
9.5

Select elements of a vector by index:


In [100]:
myvec[2]        # counting starts from 1 not 0!


Out[100]:
b: 3

In [101]:
myvec[c(1,2)]   # multiple select


Out[101]:
a: 1
b: 3

In [102]:
myvec[1:2]      # slicing


Out[102]:
a: 1
b: 3

In [103]:
myvec[c('a','b')] # by name


Out[103]:
a: 1
b: 3

Selection by vector comparison:


In [104]:
poker_vector <- c(140, -50, 20, -120, 240);
selection <- poker_vector > 0;
selection


Out[104]:
  1. TRUE
  2. FALSE
  3. TRUE
  4. FALSE
  5. TRUE

In [105]:
winnings_vector <- poker_vector[selection]; winnings_vector


Out[105]:
  1. 140
  2. 20
  3. 240
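
The selection vector does not have to be stored first. The sketch below, with illustrative names (winnings, n_winning, win_days), shows the comparison used inline, plus two related helpers from base R: sum() on a logical vector and which().

```r
poker_vector <- c(140, -50, 20, -120, 240)

# Use the comparison directly inside the brackets
winnings <- poker_vector[poker_vector > 0]

# TRUE counts as 1, so sum() gives the number of winning days
n_winning <- sum(poker_vector > 0)

# which() returns the positions of the TRUE entries
win_days <- which(poker_vector > 0)
```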

1.4) Matrices

Using the matrix function we can make a 2D matrix:


In [106]:
matrix(1:9, byrow = TRUE, nrow =3) # byrow : fill by rows , nrow = # rows


Out[106]:
1 2 3
4 5 6
7 8 9

Create a matrix using vectors:


In [107]:
new_hope <- c(460.998, 314.4);
empire_strikes <- c(290.475, 247.900);
return_jedi <- c(309.306, 165.8);

star_wars_matrix <- matrix(c(new_hope, empire_strikes, return_jedi), nrow = 3,
                           byrow = TRUE);
star_wars_matrix


Out[107]:
460.998 314.400
290.475 247.900
309.306 165.800

Name the columns and rows:


In [108]:
colnames(star_wars_matrix) <- c("US", "non-US");
rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi");
star_wars_matrix


Out[108]:
                             US  non-US
A New Hope              460.998 314.400
The Empire Strikes Back 290.475 247.900
Return of the Jedi      309.306 165.800

Summing rows and cols:


In [109]:
colSums(star_wars_matrix)


Out[109]:
US: 1060.779
non-US: 728.1

In [110]:
rowSums(star_wars_matrix)


Out[110]:
A New Hope: 775.398
The Empire Strikes Back: 538.375
Return of the Jedi: 475.106

In [111]:
box_office_all <- c(461, 314.4, 290.5, 247.9, 309.3, 165.8);
movie_names <- c("A New Hope","The Empire Strikes Back","Return of the Jedi");
col_titles <- c("US","non-US")

#### NAME USING VECTOR NAMES
star_wars_matrix <- matrix(box_office_all, nrow = 3, byrow = TRUE, 
                           dimnames = list(movie_names, col_titles));

star_wars_matrix


Out[111]:
                           US non-US
A New Hope              461.0  314.4
The Empire Strikes Back 290.5  247.9
Return of the Jedi      309.3  165.8

Adding columns and rows : cbind(), rbind()


In [112]:
worldwide_vector <- rowSums(star_wars_matrix);
all_wars_matrix <- cbind(star_wars_matrix, worldwide_vector)
all_wars_matrix


Out[112]:
                           US non-US worldwide_vector
A New Hope              461.0  314.4            775.4
The Empire Strikes Back 290.5  247.9            538.4
Return of the Jedi      309.3  165.8            475.1

In [113]:
box_office_2 <- c(474, 552, 310, 338, 380, 468);
movie_names_2 <- c("The Phantom Menace","Attack of the Clones","Revenge of the Sith");
col_titles_2 <- c("US","non-US")
star_wars_matrix_2 <- matrix(box_office_2, nrow = 3, byrow = TRUE, 
                           dimnames = list(movie_names_2, col_titles_2));


star_wars_all <- rbind(star_wars_matrix,star_wars_matrix_2);
star_wars_all


Out[113]:
                           US non-US
A New Hope              461.0  314.4
The Empire Strikes Back 290.5  247.9
Return of the Jedi      309.3  165.8
The Phantom Menace        474    552
Attack of the Clones      310    338
Revenge of the Sith       380    468

Selecting elements of matrix


In [114]:
star_wars_all[1,2] #1st row, 2nd column


Out[114]:
314.4

In [115]:
star_wars_all[ ,2] # ALL rows from 2nd column


Out[115]:
A New Hope: 314.4
The Empire Strikes Back: 247.9
Return of the Jedi: 165.8
The Phantom Menace: 552
Attack of the Clones: 338
Revenge of the Sith: 468

In [116]:
star_wars_all[1, ] # 1st row, all columns


Out[116]:
US: 461
non-US: 314.4

In [117]:
star_wars_all[1:2,] #slicing


Out[117]:
                           US non-US
A New Hope              461.0  314.4
The Empire Strikes Back 290.5  247.9

In [118]:
mean(star_wars_all[1:2,])


Out[118]:
328.45

Arithmetics with matrices


In [119]:
star_wars_all *2 # multiply all elements by 2


Out[119]:
                           US non-US
A New Hope              922.0  628.8
The Empire Strikes Back 581.0  495.8
Return of the Jedi      618.6  331.6
The Phantom Menace        948   1104
Attack of the Clones      620    676
Revenge of the Sith       760    936

1.5) Factors

Factors are used for categorical variables, e.g. 'Gender'.

Given a vector whose elements fall into a small set of categories, you can turn it into a factor by calling the factor() function:


In [120]:
gender_vector <- c("Male", "Female", "Female", "Male", "Male");
factor_gender_vector <- factor(gender_vector);
factor_gender_vector


Out[120]:
  1. Male
  2. Female
  3. Female
  4. Male
  5. Male

There are two types of categorical variables: nominal (no natural order) and ordinal (ordered). Use the ordered and levels arguments of factor() to define the order of the levels of an ordinal variable.

For example:


In [121]:
animals_vector <- c("Elephant", "Giraffe", "Donkey", "Horse")
temperature_vector <- c("High", "Low", "High","Low", "Medium")

factor_animals_vector <- factor(animals_vector)
factor_animals_vector
factor_temperature_vector <- factor(temperature_vector, ordered = TRUE, levels = c("Low", "Medium", "High"))
factor_temperature_vector 

# in R terminal this shows also the levels :
# for the nominal it states: Levels: Donkey Elephant Giraffe Horse
# while for the ordinal : Levels: Low < Medium < High


Out[121]:
  1. Elephant
  2. Giraffe
  3. Donkey
  4. Horse
Out[121]:
  1. High
  2. Low
  3. High
  4. Low
  5. Medium

You can change the names of the levels with the levels() function:


In [122]:
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
factor_survey_vector


Out[122]:
  1. M
  2. F
  3. F
  4. M
  5. M

but I want 'Male' and 'Female' !


In [123]:
levels(factor_survey_vector) <- c('Female', 'Male')  # order must match the current levels: 'F', 'M'

In [124]:
factor_survey_vector


Out[124]:
  1. Male
  2. Female
  3. Female
  4. Male
  5. Male
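
Assigning to levels() relies on knowing that the existing levels are sorted alphabetically ("F" before "M"); listing the new names in the wrong order would silently swap the labels. One possible alternative, sketched here under the assumption that the raw codes are exactly "M" and "F", is to state the mapping explicitly with the levels and labels arguments of factor():

```r
survey_vector <- c("M", "F", "F", "M", "M")

# levels: the codes as they appear in the data
# labels: the names to show instead, in the same order as levels
factor_survey_vector <- factor(survey_vector,
                               levels = c("F", "M"),
                               labels = c("Female", "Male"))
factor_survey_vector
```

Because each label sits next to its code, the mapping is visible in the call itself.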

Summary of factors


In [125]:
summary(survey_vector)


Out[125]:
   Length     Class      Mode 
        5 character character 

In [126]:
summary(factor_survey_vector)


Out[126]:
Female: 2
Male: 3

Selection


In [127]:
factor_survey_vector[1]


Out[127]:
Male

In [128]:
factor_survey_vector[2]


Out[128]:
Female

Comparison of factors

Ordering comparisons (<, >) do not work for nominal factors (the result is NA with a warning), but they work fine for ordinal factors!


In [129]:
factor_survey_vector[1]>factor_survey_vector[2] ## NA! Doesn't work for 
                                                ## nominal factors!


Warning message:
In Ops.factor(factor_survey_vector[1], factor_survey_vector[2]): ‘>’ not meaningful for factors
Out[129]:
[1] NA

In [130]:
speed_vector <- c("Fast", "Slow", "Slow", "Fast", "Ultra-fast")
factor_speed_vector <- factor(speed_vector, ordered = TRUE, 
                              levels = c("Slow", "Fast", "Ultra-fast"))

compare_them <- factor_speed_vector[2]>factor_speed_vector[5]

# Is data analyst 2 faster than data analyst 5?
compare_them


Out[130]:
FALSE
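
A short sketch of what is and is not comparable. The names nominal, ordinal, same, lower and top are illustrative only.

```r
# Nominal factor: equality tests are fine, ordering tests are not
nominal <- factor(c("M", "F"))
same <- nominal[1] == nominal[2]

# Ordinal factor: <, > and functions such as max() follow the level order
ordinal <- factor(c("Low", "High"), ordered = TRUE,
                  levels = c("Low", "High"))
lower <- ordinal[1] < ordinal[2]
top   <- max(ordinal)
```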

1.6) DataFrame

Unlike matrices, data frames can hold columns of different types.


In [131]:
mtcars;  # dataframe included in R


Out[131]:
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4             21   6   160 110  3.9  2.62 16.46  0  1    4    4
Mazda RX4 Wag         21   6   160 110  3.9 2.875 17.02  0  1    4    4
Datsun 710          22.8   4   108  93 3.85  2.32 18.61  1  1    4    1
Hornet 4 Drive      21.4   6   258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8   360 175 3.15  3.44 17.02  0  0    3    2
Valiant             18.1   6   225 105 2.76  3.46 20.22  1  0    3    1
Duster 360          14.3   8   360 245 3.21  3.57 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69  3.19    20  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92  3.15  22.9  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92  3.44  18.3  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92  3.44  18.9  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07  4.07  17.4  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07  3.73  17.6  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07  3.78    18  0  0    3    3
Cadillac Fleetwood  10.4   8   472 205 2.93  5.25 17.98  0  0    3    4
Lincoln Continental 10.4   8   460 215    3 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8   440 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08   2.2 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835  19.9  1  1    4    1
Toyota Corona       21.5   4 120.1  97  3.7 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8   318 150 2.76  3.52 16.87  0  0    3    2
AMC Javelin         15.2   8   304 150 3.15 3.435  17.3  0  0    3    2
Camaro Z28          13.3   8   350 245 3.73  3.84 15.41  0  0    3    4
Pontiac Firebird    19.2   8   400 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4    79  66 4.08 1.935  18.9  1  1    4    1
Porsche 914-2         26   4 120.3  91 4.43  2.14  16.7  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513  16.9  1  1    5    2
Ford Pantera L      15.8   8   351 264 4.22  3.17  14.5  0  1    5    4
Ferrari Dino        19.7   6   145 175 3.62  2.77  15.5  0  1    5    6
Maserati Bora         15   8   301 335 3.54  3.57  14.6  0  1    5    8
Volvo 142E          21.4   4   121 109 4.11  2.78  18.6  1  1    4    2

Quick look at dataframe head(), tail(), str() [structure]


In [132]:
head(mtcars,2)


Out[132]:
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4             21   6   160 110  3.9  2.62 16.46  0  1    4    4
Mazda RX4 Wag         21   6   160 110  3.9 2.875 17.02  0  1    4    4

In [133]:
tail(mtcars,4)


Out[133]:
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Ford Pantera L      15.8   8   351 264 4.22  3.17  14.5  0  1    5    4
Ferrari Dino        19.7   6   145 175 3.62  2.77  15.5  0  1    5    6
Maserati Bora         15   8   301 335 3.54  3.57  14.6  0  1    5    8
Volvo 142E          21.4   4   121 109 4.11  2.78  18.6  1  1    4    2

In [134]:
str(mtcars) # means structure


'data.frame':	32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

In [135]:
summary(mtcars)  # returns summary statistics


Out[135]:
      mpg             cyl             disp             hp       
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
      drat             wt             qsec             vs        
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
       am              gear            carb      
 Min.   :0.0000   Min.   :3.000   Min.   :1.000  
 1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
 Median :0.0000   Median :4.000   Median :2.000  
 Mean   :0.4062   Mean   :3.688   Mean   :2.812  
 3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
 Max.   :1.0000   Max.   :5.000   Max.   :8.000  

Creating a data.frame()

You can create a dataframe using simple vectors by calling the data.frame() function


In [136]:
planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune");
type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883); 
rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67);
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE);

# Create the data frame:
planets_df  <- data.frame(planets, type, diameter, rotation, rings)
planets_df  # rows are numbered automatically; columns are named after the vectors


Out[136]:
  planets type               diameter rotation rings
1 Mercury Terrestrial planet    0.382    58.64 FALSE
2 Venus   Terrestrial planet    0.949  -243.02 FALSE
3 Earth   Terrestrial planet        1        1 FALSE
4 Mars    Terrestrial planet    0.532     1.03 FALSE
5 Jupiter Gas giant            11.209     0.41  TRUE
6 Saturn  Gas giant             9.449     0.43  TRUE
7 Uranus  Gas giant             4.007    -0.72  TRUE
8 Neptune Gas giant             3.883     0.67  TRUE

Slicing a data frame

As with matrices you can use [1,2], [1, ], [1:3, ] etc

OR USE THE NAME OF THE COLUMN!!!


In [137]:
planets_df[1:3,1]


Out[137]:
  1. Mercury
  2. Venus
  3. Earth

In [138]:
planets_df[3:8, "diameter"]


Out[138]:
  1. 1
  2. 0.532
  3. 11.209
  4. 9.449
  5. 4.007
  6. 3.883

In [139]:
planets_df$diameter # this returns a vector with the elements of the column
## similar to planets_df[ ,"diameter"]


Out[139]:
  1. 0.382
  2. 0.949
  3. 1
  4. 0.532
  5. 11.209
  6. 9.449
  7. 4.007
  8. 3.883

Selection of df entries using selector vector


In [140]:
selection_rings <- planets_df$rings

In [141]:
planets_with_rings <- planets_df[selection_rings, ] # all columns, 
                                                    #rows with True

In [142]:
planets_with_rings


Out[142]:
  planets type               diameter rotation rings
5 Jupiter Gas giant            11.209     0.41  TRUE
6 Saturn  Gas giant             9.449     0.43  TRUE
7 Uranus  Gas giant             4.007    -0.72  TRUE
8 Neptune Gas giant             3.883     0.67  TRUE

Subset()

Use the subset function to select entries from a df


In [143]:
planets_with_rings_2 <- subset(planets_df, subset = rings == TRUE);
planets_with_rings_2


Out[143]:
  planets type               diameter rotation rings
5 Jupiter Gas giant            11.209     0.41  TRUE
6 Saturn  Gas giant             9.449     0.43  TRUE
7 Uranus  Gas giant             4.007    -0.72  TRUE
8 Neptune Gas giant             3.883     0.67  TRUE
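
subset() can also combine several conditions and keep only some columns through its select argument. The sketch below uses a small stand-in data frame (df, with made-up column names planet, diameter and rings) so that it runs on its own; it is not the planets_df built above.

```r
# Small stand-in data frame for illustration
df <- data.frame(
  planet   = c("Earth", "Mars", "Jupiter", "Saturn"),
  diameter = c(1, 0.532, 11.209, 9.449),
  rings    = c(FALSE, FALSE, TRUE, TRUE)
)

# Rows: ringed planets with diameter > 10; columns: planet and diameter only
big_ringed <- subset(df, subset = rings & diameter > 10,
                     select = c(planet, diameter))

# The same selection written with plain bracket indexing
same_rows <- df[df$rings & df$diameter > 10, c("planet", "diameter")]
```

Inside subset() the column names can be used directly, without the df$ prefix needed in the bracket form.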

Order

The order() function returns the positions of the elements in ascending order, so a[order(a)] is the sorted vector:


In [144]:
a<- c(100, 150, 101)

In [145]:
order(a)


Out[145]:
  1. 1
  2. 3
  3. 2

This can be used with data frames, for example to sort the planets data frame with the largest planet at the top.

To do this, call order() with the decreasing = TRUE argument to get the row positions, then use them to index the rows of the data frame:


In [146]:
positions <- order(planets_df$diameter, decreasing=TRUE)

In [147]:
largest_first_df <- planets_df[positions, ]

In [148]:
largest_first_df


Out[148]:
  planets type               diameter rotation rings
5 Jupiter Gas giant            11.209     0.41  TRUE
6 Saturn  Gas giant             9.449     0.43  TRUE
7 Uranus  Gas giant             4.007    -0.72  TRUE
8 Neptune Gas giant             3.883     0.67  TRUE
3 Earth   Terrestrial planet        1        1 FALSE
2 Venus   Terrestrial planet    0.949  -243.02 FALSE
4 Mars    Terrestrial planet    0.532     1.03 FALSE
1 Mercury Terrestrial planet    0.382    58.64 FALSE
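
order() is easy to confuse with rank(). For the vector a used earlier the two happen to give the same answer, so the sketch below adds a second, made-up vector b where they differ. All names other than a are illustrative.

```r
a <- c(100, 150, 101)
idx    <- order(a)                      # positions that sort a
sorted <- a[idx]                        # same result as sort(a)
desc   <- order(a, decreasing = TRUE)   # positions for descending order

b <- c(30, 10, 20)
b_order <- order(b)   # where the smallest, next smallest, ... element sits
b_rank  <- rank(b)    # the rank of each element, in its original position
```

order() answers "which position comes first?", while rank() answers "which place does this element take?".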

1.7) Lists

Lists hold objects of different types and sizes together under one common name.

To make a list, call the list() function with the objects you want the list to hold as arguments:


In [149]:
vec <- 1:10;
mat <- matrix(1:9, byrow=TRUE, ncol=3)
mdf <- mtcars[1:10,]

In [151]:
my_list <- list(vec, mat, mdf)
my_list


Out[151]:
  1. 1 2 3 4 5 6 7 8 9 10
  2. 1 2 3
     4 5 6
     7 8 9
  3.                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
     Mazda RX4             21   6   160 110  3.9  2.62 16.46  0  1    4    4
     Mazda RX4 Wag         21   6   160 110  3.9 2.875 17.02  0  1    4    4
     Datsun 710          22.8   4   108  93 3.85  2.32 18.61  1  1    4    1
     Hornet 4 Drive      21.4   6   258 110 3.08 3.215 19.44  1  0    3    1
     Hornet Sportabout   18.7   8   360 175 3.15  3.44 17.02  0  0    3    2
     Valiant             18.1   6   225 105 2.76  3.46 20.22  1  0    3    1
     Duster 360          14.3   8   360 245 3.21  3.57 15.84  0  0    3    4
     Merc 240D           24.4   4 146.7  62 3.69  3.19    20  1  0    4    2
     Merc 230            22.8   4 140.8  95 3.92  3.15  22.9  1  0    4    2
     Merc 280            19.2   6 167.6 123 3.92  3.44  18.3  1  0    4    4

Assign to names(my_list) to name the objects your list holds:


In [155]:
names(my_list) <- c("vector", "matrix", "dataframe")

In [156]:
my_list


Out[156]:
$vector
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
$matrix
1 2 3
4 5 6
7 8 9
$dataframe
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4             21   6   160 110  3.9  2.62 16.46  0  1    4    4
Mazda RX4 Wag         21   6   160 110  3.9 2.875 17.02  0  1    4    4
Datsun 710          22.8   4   108  93 3.85  2.32 18.61  1  1    4    1
Hornet 4 Drive      21.4   6   258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8   360 175 3.15  3.44 17.02  0  0    3    2
Valiant             18.1   6   225 105 2.76  3.46 20.22  1  0    3    1
Duster 360          14.3   8   360 245 3.21  3.57 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69  3.19    20  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92  3.15  22.9  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92  3.44  18.3  1  0    4    4

Select elements from list

To select an object from the list use double brackets [[ ]]:


In [159]:
my_list[[1]] # returns the object 1 (ie the vector)


Out[159]:
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10

In [160]:
my_list[[2]]


Out[160]:
123
456
789

In [164]:
my_list[[1]][2] #from the list the 1st object (vec), 
                # from that the 2nd element


Out[164]:
2
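
Single brackets, double brackets and $ behave differently on lists. The following sketch rebuilds a smaller named list so that it runs on its own; sub_list, element, by_name and cell are illustrative names.

```r
my_list <- list(vector = 1:10,
                matrix = matrix(1:9, byrow = TRUE, ncol = 3))

sub_list <- my_list[1]        # single bracket: still a list, of length 1
element  <- my_list[[1]]      # double bracket: the stored object itself
by_name  <- my_list$vector    # by name, same as my_list[["vector"]]

# Chain selections: take the matrix, then its row 2, column 3
cell <- my_list[["matrix"]][2, 3]
```

Use [[ ]] or $ when you want the object, and [ ] when you want a shorter list.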

Append to list

To append an item to a list use the c() function. To also name the new item when concatenating, use the syntax:

c(list, name=new_var)


In [166]:
year <- 1980
my_list_2 <- c(my_list, myyear=year)
my_list_2


Out[166]:
$vector
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 7
  8. 8
  9. 9
  10. 10
$matrix
1 2 3
4 5 6
7 8 9
$dataframe
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4             21   6   160 110  3.9  2.62 16.46  0  1    4    4
Mazda RX4 Wag         21   6   160 110  3.9 2.875 17.02  0  1    4    4
Datsun 710          22.8   4   108  93 3.85  2.32 18.61  1  1    4    1
Hornet 4 Drive      21.4   6   258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8   360 175 3.15  3.44 17.02  0  0    3    2
Valiant             18.1   6   225 105 2.76  3.46 20.22  1  0    3    1
Duster 360          14.3   8   360 245 3.21  3.57 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69  3.19    20  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92  3.15  22.9  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92  3.44  18.3  1  0    4    4
$myyear
1980
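
Two related points, sketched with a smaller stand-in list and made-up item names (note, years): a new item can also be added by assigning to a new name, and c() spreads a multi-element vector over several list items unless the vector is wrapped in list().

```r
my_list <- list(vector = 1:3)

# Append with c(), as above
my_list_2 <- c(my_list, myyear = 1980)

# Append by assigning to a name that does not exist yet
my_list_2$note <- "added later"

# Pitfall: a vector of length 2 becomes two items, named years1 and years2
flat <- c(my_list, years = c(1980, 1983))

# Wrapping it in list() keeps the vector together as one item
nested <- c(my_list, list(years = c(1980, 1983)))
```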


2) Intermediate R

2.1) Relational Operators

These operators are ==, !=, <, >, <= and >=; each comparison returns TRUE or FALSE.

Some interesting cases:


In [167]:
3>5


Out[167]:
FALSE

In [168]:
FALSE == TRUE


Out[168]:
FALSE

In [169]:
FALSE < TRUE  # TRUE = 1, FALSE = 0


Out[169]:
TRUE

In [171]:
"Hello" > "Goodbye" # alphabetical order (G < H < I ...)


Out[171]:
TRUE

For vectors:


In [172]:
linkedin <- c (10,20,30,40,60)

In [173]:
linkedin > 30


Out[173]:
  1. FALSE
  2. FALSE
  3. FALSE
  4. TRUE
  5. TRUE

In [176]:
facebook <- c(10,50,20,56) # check the warning

In [177]:
linkedin<facebook


Warning message:
In linkedin < facebook: longer object length is not a multiple of shorter object length
Out[177]:
  1. FALSE
  2. TRUE
  3. FALSE
  4. TRUE
  5. FALSE
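
The warning comes from R's recycling rule: the shorter vector is repeated until it is as long as the longer one, and R warns when the lengths do not divide evenly. The sketch below makes the recycling explicit with rep(); the names recycled, manual and auto are illustrative.

```r
linkedin <- c(10, 20, 30, 40, 60)
facebook <- c(10, 50, 20, 56)

# What R does behind the scenes: reuse facebook from the start
recycled <- rep(facebook, length.out = length(linkedin))   # 10 50 20 56 10

manual <- linkedin < recycled

# The original comparison, with the warning silenced for this demo only
auto <- suppressWarnings(linkedin < facebook)
```

The fifth comparison is therefore 60 < 10, which is probably not what was intended, so it is worth checking that both vectors have the same length before comparing them.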
