R Notes

1) R Basics
1.1) Arithmetics: +, -, *, /, ^, %%
1.2) Data types
1.3) Vectors: names(), sum(), selection vector
1.4) Matrices: matrix(), colnames(), rownames(), dimnames(), cbind(), rbind(), matrix arithmetics
1.5) Factors: factor(), levels(), summary(), selection, comparison
1.6) DataFrames: head(), tail(), str(), summary(), data.frame(), selection, subset(), order()
1.7) Lists: create list, slice list, concat list

2) R Intermediate
2.1) Relational Operators: ==, !=, >, <

1) R Basics

1.1) Arithmetics

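For simple arithmetic use:

`+`, `-`, `*`, `/`, exponentiation: `^`, modulo: `%%`

For matrix multiplication use `%*%`. The ordinary operators (`*`, `/`, `^` and so on) work element by element on a matrix, which is not the usual matrix arithmetic. Note that `%/%` is integer division, not matrix division; the inverse of a square matrix comes from `solve()`.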
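A minimal sketch of how the scalar operators behave and how element-wise multiplication differs from `%*%`. The matrix `m` and the variable names are made up for illustration.

```r
# Scalar arithmetic
remainder <- 7 %% 3    # modulo: 1
quotient  <- 7 %/% 3   # integer division: 2
power     <- 2 ^ 3     # exponentiation: 8

# A 2x2 matrix, filled by column: rows are (1, 3) and (2, 4)
m <- matrix(1:4, nrow = 2)

elementwise <- m * m     # every entry multiplied by itself
matprod     <- m %*% m   # linear-algebra (row by column) product

elementwise
matprod
```

Here `elementwise` holds the squared entries, while `matprod` holds the row-by-column products, so the two results differ.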

1.2) Basic Data Types

Numerics : 4.5

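Integers : 4L (a bare 4 is stored as a numeric; the L suffix makes it an integer)

Logical (Boolean) : TRUE, FALSE (T and F are shorthand, but they are ordinary variables that can be overwritten)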

Characters : 'this is a text', 'abc'

Vectors, Matrices, Factors and Lists are made using the basic data types.

To find the type of a variable use: class(variable_name)

``````

In [95]:

var <- 4.1

``````
``````

In [96]:

class(var)

``````
``````

Out[96]:

'numeric'

``````
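As a quick check of the types listed above, this sketch calls `class()` on one literal of each kind. The variable name `types` is only illustrative.

```r
# One literal per basic type; sapply() applies class() to each of them
types <- sapply(list(4.5, 4, 4L, TRUE, "abc"), class)
types
# A bare 4 is numeric; only 4L is an integer
is.integer(4)
is.integer(4L)
```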

1.3) Vectors

To create a vector

``````

In [97]:

myvec <- c(1,3,5.5); myvec

``````
``````

Out[97]:

1
3
5.5

``````

Name the elements of the vector with `names()`:

``````

In [98]:

names(myvec) <- c('a','b','c'); myvec

``````
``````

Out[98]:

a: 1
b: 3
c: 5.5

``````

Summing the elements of a vector:

``````

In [99]:

sum(myvec)

``````
``````

Out[99]:

9.5

``````

Select a vector index:

``````

In [100]:

myvec[2]        # counting starts from 1 not 0!

``````
``````

Out[100]:

b: 3

``````
``````

In [101]:

myvec[c(1,2)]   # multiple select

``````
``````

Out[101]:

a: 1
b: 3

``````
``````

In [102]:

myvec[1:2]      # slicing

``````
``````

Out[102]:

a: 1
b: 3

``````
``````

In [103]:

myvec[c('a','b')] # by name

``````
``````

Out[103]:

a: 1
b: 3

``````

Selection by vector comparison:

``````

In [104]:

poker_vector <- c(140, -50, 20, -120, 240);
selection <- poker_vector > 0;
selection

``````
``````

Out[104]:

TRUE
FALSE
TRUE
FALSE
TRUE

``````
``````

In [105]:

winnings_vector <- poker_vector[selection]; winnings_vector

``````
``````

Out[105]:

140
20
240

``````
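The two steps above (build a logical vector, then index with it) can also be written in one expression. A small sketch using the same poker numbers; `winnings`, `total_winnings` and `winning_days` are illustrative names.

```r
poker_vector <- c(140, -50, 20, -120, 240)

# The comparison goes directly inside the brackets
winnings <- poker_vector[poker_vector > 0]

# Total of the positive days only
total_winnings <- sum(winnings)

# which() converts the logical vector into positions
winning_days <- which(poker_vector > 0)

winnings
total_winnings
winning_days
```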

1.4) Matrices

Using the matrix function we can make a 2D matrix:

``````

In [106]:

matrix(1:9, byrow = TRUE, nrow =3) # byrow : fill by rows , nrow = # rows

``````
``````

Out[106]:

1  2  3
4  5  6
7  8  9

``````

Create a matrix from vectors:

``````

In [107]:

new_hope <- c(460.998, 314.4);
empire_strikes <- c(290.475, 247.900);
return_jedi <- c(309.306, 165.8);

star_wars_matrix = matrix(c(new_hope,empire_strikes,return_jedi), nrow=3,
byrow=TRUE);
star_wars_matrix

``````
``````

Out[107]:

460.998  314.400
290.475  247.900
309.306  165.800

``````

``````

In [108]:

colnames(star_wars_matrix) <- c("US", "non-US");
rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back", "Return of the Jedi");
star_wars_matrix

``````
``````

Out[108]:

                              US   non-US
A New Hope               460.998  314.400
The Empire Strikes Back  290.475  247.900
Return of the Jedi       309.306  165.800

``````

Summing rows and cols:

``````

In [109]:

colSums(star_wars_matrix)

``````
``````

Out[109]:

US: 1060.779
non-US: 728.1

``````
``````

In [110]:

rowSums(star_wars_matrix)

``````
``````

Out[110]:

A New Hope: 775.398
The Empire Strikes Back: 538.375
Return of the Jedi: 475.106

``````
``````

In [111]:

box_office_all <- c(461, 314.4, 290.5, 247.9, 309.3, 165.8);
movie_names <- c("A New Hope","The Empire Strikes Back","Return of the Jedi");
col_titles <- c("US","non-US")

#### NAME USING VECTOR NAMES
star_wars_matrix <- matrix(box_office_all, nrow = 3, byrow = TRUE,
dimnames = list(movie_names, col_titles));

star_wars_matrix

``````
``````

Out[111]:

                            US  non-US
A New Hope               461.0   314.4
The Empire Strikes Back  290.5   247.9
Return of the Jedi       309.3   165.8

``````

Adding columns and rows : cbind(), rbind()

``````

In [112]:

worldwide_vector <- rowSums(star_wars_matrix);
all_wars_matrix <- cbind(star_wars_matrix, worldwide_vector)
all_wars_matrix

``````
``````

Out[112]:

                            US  non-US  worldwide_vector
A New Hope               461.0   314.4             775.4
The Empire Strikes Back  290.5   247.9             538.4
Return of the Jedi       309.3   165.8             475.1

``````
``````

In [113]:

box_office_2 <- c(474, 552, 310, 338, 380, 468);
movie_names_2 <- c("The Phantom Menace","Attack of the Clones","Revenge of the Sith");
col_titles_2 <- c("US","non-US")
star_wars_matrix_2 <- matrix(box_office_2, nrow = 3, byrow = TRUE,
dimnames = list(movie_names_2, col_titles_2));

star_wars_all <- rbind(star_wars_matrix,star_wars_matrix_2);
star_wars_all

``````
``````

Out[113]:

                            US  non-US
A New Hope               461.0   314.4
The Empire Strikes Back  290.5   247.9
Return of the Jedi       309.3   165.8
The Phantom Menace       474     552
Attack of the Clones     310     338
Revenge of the Sith      380     468

``````

Selecting elements of matrix

``````

In [114]:

star_wars_all[1,2] #1st row, 2nd column

``````
``````

Out[114]:

314.4

``````
``````

In [115]:

star_wars_all[ ,2] # ALL rows from 2nd column

``````
``````

Out[115]:

A New Hope: 314.4
The Empire Strikes Back: 247.9
Return of the Jedi: 165.8
The Phantom Menace: 552
Attack of the Clones: 338
Revenge of the Sith: 468

``````
``````

In [116]:

star_wars_all[1, ] # 1st row, all columns

``````
``````

Out[116]:

US: 461
non-US: 314.4

``````
``````

In [117]:

star_wars_all[1:2,] #slicing

``````
``````

Out[117]:

                            US  non-US
A New Hope               461.0   314.4
The Empire Strikes Back  290.5   247.9

``````
``````

In [118]:

mean(star_wars_all[1:2,])

``````
``````

Out[118]:

328.45

``````

Arithmetics with matrices

``````

In [119]:

star_wars_all *2 # multiply all elements by 2

``````
``````

Out[119]:

                            US  non-US
A New Hope               922.0   628.8
The Empire Strikes Back  581.0   495.8
Return of the Jedi       618.6   331.6
The Phantom Menace       948    1104
Attack of the Clones     620     676
Revenge of the Sith      760     936

``````

1.5) Factors

Factors are used for categorical variables, e.g. 'Gender'.

Having a vector with elements from two categories you can make a factor variable by calling the `factor()` function:

``````

In [120]:

gender_vector <- c("Male", "Female", "Female", "Male", "Male");
factor_gender_vector <- factor(gender_vector);
factor_gender_vector

``````
``````

Out[120]:

Male
Female
Female
Male
Male

``````

There are two types of categorical variables: nominal (no natural order) and ordinal (ordered). You can use the `ordered` and `levels` arguments of the factor function to define the order of the levels of an ordinal variable.

For example:

``````

In [121]:

animals_vector <- c("Elephant", "Giraffe", "Donkey", "Horse")
temperature_vector <- c("High", "Low", "High","Low", "Medium")

factor_animals_vector <- factor(animals_vector)
factor_animals_vector
factor_temperature_vector <- factor(temperature_vector, ordered = TRUE, levels = c("Low", "Medium", "High"))
factor_temperature_vector

# in R terminal this shows also the levels :
# for the nominal it states: Levels: Donkey Elephant Giraffe Horse
# while for the ordinal : Levels: Low < Medium < High

``````
``````

Out[121]:

Elephant
Giraffe
Donkey
Horse

Out[121]:

High
Low
High
Low
Medium

``````
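A short sketch of what the ordering buys you, reusing the temperature values from the cell above. The name `temp_factor` is illustrative.

```r
temperature_vector <- c("High", "Low", "High", "Low", "Medium")
temp_factor <- factor(temperature_vector, ordered = TRUE,
                      levels = c("Low", "Medium", "High"))

lvls    <- levels(temp_factor)      # the levels, in the order we declared
ordered <- is.ordered(temp_factor)  # TRUE for an ordinal factor
codes   <- as.integer(temp_factor)  # position of each value among the levels
hottest <- max(temp_factor)         # max() is defined because the factor is ordered

lvls
codes
hottest
```

Internally a factor stores the integer codes, and the levels map those codes back to labels.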

You can change the names of the levels by `levels()` function:

``````

In [122]:

survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
factor_survey_vector

``````
``````

Out[122]:

M
F
F
M
M

``````

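But I want 'Male' and 'Female'! Assign the new names in the order of the existing levels (alphabetical by default, so "F" comes before "M"):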

``````

In [123]:

levels(factor_survey_vector) <- c('Female', 'Male')

``````
``````

In [124]:

factor_survey_vector

``````
``````

Out[124]:

Male
Female
Female
Male
Male

``````
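Because assigning to `levels()` depends on getting the order right, an alternative is to state the mapping explicitly when the factor is created. A sketch using the `levels` and `labels` arguments of `factor()`; the name `factor_survey` is illustrative.

```r
survey_vector <- c("M", "F", "F", "M", "M")

# levels = values as they appear in the data, labels = names to show instead
factor_survey <- factor(survey_vector,
                        levels = c("F", "M"),
                        labels = c("Female", "Male"))

factor_survey
levels(factor_survey)
```

Each entry of `labels` is paired with the entry of `levels` in the same position, so the mapping is visible in the call itself.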

Summary of factors

``````

In [125]:

summary(survey_vector)

``````
``````

Out[125]:

Length     Class      Mode
5 character character

``````
``````

In [126]:

summary(factor_survey_vector)

``````
``````

Out[126]:

Female: 2
Male: 3

``````

Selection

``````

In [127]:

factor_survey_vector[1]

``````
``````

Out[127]:

Male

``````
``````

In [128]:

factor_survey_vector[2]

``````
``````

Out[128]:

Female

``````

Comparison of factors

Ordering comparisons (<, >) do not work for nominal factors, but work fine for ordinal ones!

``````

In [129]:

factor_survey_vector[1]>factor_survey_vector[2] ## NA! Doesn't work for
## nominal factors!

``````
``````

Warning message:
In Ops.factor(factor_survey_vector[1], factor_survey_vector[2]): ‘>’ not meaningful for factors

Out[129]:

[1] NA

``````
``````

In [130]:

speed_vector <- c("Fast", "Slow", "Slow", "Fast", "Ultra-fast")
factor_speed_vector <- factor(speed_vector, ordered = TRUE,
levels = c("Slow", "Fast", "Ultra-fast"))

compare_them <- factor_speed_vector[2]>factor_speed_vector[5]

# Is data analyst 2 faster than data analyst 5?
compare_them

``````
``````

Out[130]:

FALSE

``````
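A small sketch of which comparisons are available for each kind of factor. The vectors here are shortened versions of the ones above, and the variable names are illustrative.

```r
# Nominal factor: equality is defined, ordering is not
gender <- factor(c("Male", "Female", "Female"))
same <- gender[1] == gender[2]

# Ordinal factor: ordering comparisons use the declared level order
speed <- factor(c("Fast", "Slow", "Ultra-fast"), ordered = TRUE,
                levels = c("Slow", "Fast", "Ultra-fast"))
faster <- speed[3] > speed[1]

# An ordered factor can also be compared against a level name
slow_or_fast <- speed <= "Fast"

same
faster
slow_or_fast
```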

1.6) DataFrames

Unlike matrices, dataframes can hold data of various types: each column has a single type, but different columns can have different types.

``````

In [131]:

mtcars;  # dataframe included in R

``````
``````

Out[131]:

                      mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
Mazda RX4              21    6    160  110   3.9   2.62  16.46   0   1     4     4
Mazda RX4 Wag          21    6    160  110   3.9  2.875  17.02   0   1     4     4
Datsun 710           22.8    4    108   93  3.85   2.32  18.61   1   1     4     1
Hornet 4 Drive       21.4    6    258  110  3.08  3.215  19.44   1   0     3     1
Hornet Sportabout    18.7    8    360  175  3.15   3.44  17.02   0   0     3     2
Valiant              18.1    6    225  105  2.76   3.46  20.22   1   0     3     1
Duster 360           14.3    8    360  245  3.21   3.57  15.84   0   0     3     4
Merc 240D            24.4    4  146.7   62  3.69   3.19     20   1   0     4     2
Merc 230             22.8    4  140.8   95  3.92   3.15   22.9   1   0     4     2
Merc 280             19.2    6  167.6  123  3.92   3.44   18.3   1   0     4     4
Merc 280C            17.8    6  167.6  123  3.92   3.44   18.9   1   0     4     4
Merc 450SE           16.4    8  275.8  180  3.07   4.07   17.4   0   0     3     3
Merc 450SL           17.3    8  275.8  180  3.07   3.73   17.6   0   0     3     3
Merc 450SLC          15.2    8  275.8  180  3.07   3.78     18   0   0     3     3
Cadillac Fleetwood   10.4    8    472  205  2.93   5.25  17.98   0   0     3     4
Lincoln Continental  10.4    8    460  215     3  5.424  17.82   0   0     3     4
Chrysler Imperial    14.7    8    440  230  3.23  5.345  17.42   0   0     3     4
Fiat 128             32.4    4   78.7   66  4.08    2.2  19.47   1   1     4     1
Honda Civic          30.4    4   75.7   52  4.93  1.615  18.52   1   1     4     2
Toyota Corolla       33.9    4   71.1   65  4.22  1.835   19.9   1   1     4     1
Toyota Corona        21.5    4  120.1   97   3.7  2.465  20.01   1   0     3     1
Dodge Challenger     15.5    8    318  150  2.76   3.52  16.87   0   0     3     2
AMC Javelin          15.2    8    304  150  3.15  3.435   17.3   0   0     3     2
Camaro Z28           13.3    8    350  245  3.73   3.84  15.41   0   0     3     4
Pontiac Firebird     19.2    8    400  175  3.08  3.845  17.05   0   0     3     2
Fiat X1-9            27.3    4     79   66  4.08  1.935   18.9   1   1     4     1
Porsche 914-2          26    4  120.3   91  4.43   2.14   16.7   0   1     5     2
Lotus Europa         30.4    4   95.1  113  3.77  1.513   16.9   1   1     5     2
Ford Pantera L       15.8    8    351  264  4.22   3.17   14.5   0   1     5     4
Ferrari Dino         19.7    6    145  175  3.62   2.77   15.5   0   1     5     6
Maserati Bora          15    8    301  335  3.54   3.57   14.6   0   1     5     8
Volvo 142E           21.4    4    121  109  4.11   2.78   18.6   1   1     4     2

``````

Quick look at a dataframe: head(), tail(), str() [structure]

``````

In [132]:

head(mtcars, 2)

``````
``````

Out[132]:

                      mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
Mazda RX4              21    6    160  110   3.9   2.62  16.46   0   1     4     4
Mazda RX4 Wag          21    6    160  110   3.9  2.875  17.02   0   1     4     4

``````
``````

In [133]:

tail(mtcars,4)

``````
``````

Out[133]:

                      mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
Ford Pantera L       15.8    8    351  264  4.22   3.17   14.5   0   1     5     4
Ferrari Dino         19.7    6    145  175  3.62   2.77   15.5   0   1     5     6
Maserati Bora          15    8    301  335  3.54   3.57   14.6   0   1     5     8
Volvo 142E           21.4    4    121  109  4.11   2.78   18.6   1   1     4     2

``````
``````

In [134]:

str(mtcars) # means structure

``````
``````

'data.frame':	32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

``````
``````

In [135]:

summary(mtcars)  # returns summary statistics

``````
``````

Out[135]:

mpg             cyl             disp             hp
Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
Median :19.20   Median :6.000   Median :196.3   Median :123.0
Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
drat             wt             qsec             vs
Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000
1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000
Median :3.695   Median :3.325   Median :17.71   Median :0.0000
Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375
3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000
Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000
am              gear            carb
Min.   :0.0000   Min.   :3.000   Min.   :1.000
1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
Median :0.0000   Median :4.000   Median :2.000
Mean   :0.4062   Mean   :3.688   Mean   :2.812
3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
Max.   :1.0000   Max.   :5.000   Max.   :8.000

``````

Creating a data.frame()

You can create a dataframe using simple vectors by calling the `data.frame()` function

``````

In [136]:

planets <- c("Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune");
type <- c("Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Terrestrial planet", "Gas giant", "Gas giant", "Gas giant", "Gas giant")
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883);
rotation <- c(58.64, -243.02, 1, 1.03, 0.41, 0.43, -0.72, 0.67);
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE);

# Create the data frame:
planets_df  <- data.frame(planets, type, diameter, rotation, rings)
planets_df  # rows are numbered automatically; columns take the names of the vectors

``````
``````

Out[136]:

   planets  type                diameter  rotation  rings
1  Mercury  Terrestrial planet     0.382     58.64  FALSE
2  Venus    Terrestrial planet     0.949   -243.02  FALSE
3  Earth    Terrestrial planet     1          1     FALSE
4  Mars     Terrestrial planet     0.532      1.03  FALSE
5  Jupiter  Gas giant             11.209      0.41  TRUE
6  Saturn   Gas giant              9.449      0.43  TRUE
7  Uranus   Gas giant              4.007     -0.72  TRUE
8  Neptune  Gas giant              3.883      0.67  TRUE

``````

Slicing a data frame

As with matrices you can use [1,2], [1, ], [1:3, ] etc

OR USE THE NAME OF THE COLUMN!!!

``````

In [137]:

planets_df[1:3,1]

``````
``````

Out[137]:

Mercury
Venus
Earth

``````
``````

In [138]:

planets_df[3:8, "diameter"]

``````
``````

Out[138]:

1
0.532
11.209
9.449
4.007
3.883

``````
``````

In [139]:

planets_df$diameter # this returns a vector with the elements of the column
## similar to planets_df[ ,"diameter"]

``````
``````

Out[139]:

0.382
0.949
1
0.532
11.209
9.449
4.007
3.883

``````
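The different ways of selecting a column do not all return the same kind of object. A minimal sketch on a three-row data frame built from the planet values above; `df` and the other variable names are illustrative.

```r
df <- data.frame(planets  = c("Mercury", "Venus", "Earth"),
                 diameter = c(0.382, 0.949, 1),
                 rings    = c(FALSE, FALSE, FALSE))

as_vector  <- df[, "diameter"]               # one column with a comma: a plain vector
as_df      <- df["diameter"]                 # no comma: still a data frame
by_dollar  <- df$diameter                    # same vector as df[["diameter"]]
two_cols   <- df[, c("planets", "diameter")] # several columns: a data frame

is.data.frame(as_vector)
is.data.frame(as_df)
two_cols
```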

Selection of df entries using selector vector

``````

In [140]:

selection_rings <- planets_df$rings

``````
``````

In [141]:

planets_with_rings <- planets_df[selection_rings, ] # all columns,
#rows with True

``````
``````

In [142]:

planets_with_rings

``````
``````

Out[142]:

   planets  type       diameter  rotation  rings
5  Jupiter  Gas giant    11.209      0.41  TRUE
6  Saturn   Gas giant     9.449      0.43  TRUE
7  Uranus   Gas giant     4.007     -0.72  TRUE
8  Neptune  Gas giant     3.883      0.67  TRUE

``````

Subset()

Use the subset function to select entries from a df

``````

In [143]:

planets_with_rings_2 <- subset(planets_df, subset = rings == TRUE);
planets_with_rings_2

``````
``````

Out[143]:

   planets  type       diameter  rotation  rings
5  Jupiter  Gas giant    11.209      0.41  TRUE
6  Saturn   Gas giant     9.449      0.43  TRUE
7  Uranus   Gas giant     4.007     -0.72  TRUE
8  Neptune  Gas giant     3.883      0.67  TRUE

``````
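`subset()` also accepts numeric conditions and a `select` argument for choosing columns. A sketch on a reduced version of the planets data frame (only three of the columns are rebuilt here); the result names are illustrative.

```r
planets_df <- data.frame(
  planets  = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  diameter = c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883),
  rings    = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)
)

# Rows whose diameter is smaller than Earth's
small_planets <- subset(planets_df, subset = diameter < 1)

# Combine conditions with & and keep only some columns with select
big_ringed <- subset(planets_df, diameter > 4 & rings,
                     select = c(planets, diameter))

# The same row filter written with brackets
small_planets_2 <- planets_df[planets_df$diameter < 1, ]

small_planets
big_ringed
```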

Order

The order() function returns the positions of the elements in ascending order, so that a[order(a)] is the sorted vector:

``````

In [144]:

a<- c(100, 150, 101)

``````
``````

In [145]:

order(a)

``````
``````

Out[145]:

1
3
2

``````
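A short sketch that uses the positions returned by `order()` to actually sort the vector, in both directions. The names `idx`, `ascending` and `descending` are illustrative.

```r
a <- c(100, 150, 101)

idx <- order(a)                               # positions of the smallest, next, ... element
ascending  <- a[idx]                          # indexing with them sorts the vector
descending <- a[order(a, decreasing = TRUE)]  # largest first

idx
ascending
descending
```

Indexing with `order()` gives the same result as `sort(a)`, but the positions can be reused to reorder other objects, such as the rows of a data frame.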

This can be used on data frames. For example, order the planets df with the largest planet at the top.

To do this, call the order function with the decreasing = TRUE argument to get a vector of row positions, and then use that vector to index the rows of the df:

``````

In [146]:

positions <- order(planets_df$diameter, decreasing=TRUE)

``````
``````

In [147]:

largest_first_df <- planets_df[positions, ]

``````
``````

In [148]:

largest_first_df

``````
``````

Out[148]:

   planets  type                diameter  rotation  rings
5  Jupiter  Gas giant             11.209      0.41  TRUE
6  Saturn   Gas giant              9.449      0.43  TRUE
7  Uranus   Gas giant              4.007     -0.72  TRUE
8  Neptune  Gas giant              3.883      0.67  TRUE
3  Earth    Terrestrial planet     1          1     FALSE
2  Venus    Terrestrial planet     0.949   -243.02  FALSE
4  Mars     Terrestrial planet     0.532      1.03  FALSE
1  Mercury  Terrestrial planet     0.382     58.64  FALSE

``````

1.7) Lists

Lists hold together objects from different data types, sizes and flavors under one common name.

To make a list, call the `list()` function and pass the objects you want the list to hold as arguments:

``````

In [149]:

vec <- 1:10;
mat <- matrix(1:9, byrow=TRUE, ncol=3)
mdf <- mtcars[1:10,]

``````
``````

In [151]:

my_list <- list(vec, mat, mdf)
my_list

``````
``````

Out[151]:

1
2
3
4
5
6
7
8
9
10

1  2  3
4  5  6
7  8  9

                      mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
Mazda RX4              21    6    160  110   3.9   2.62  16.46   0   1     4     4
Mazda RX4 Wag          21    6    160  110   3.9  2.875  17.02   0   1     4     4
Datsun 710           22.8    4    108   93  3.85   2.32  18.61   1   1     4     1
Hornet 4 Drive       21.4    6    258  110  3.08  3.215  19.44   1   0     3     1
Hornet Sportabout    18.7    8    360  175  3.15   3.44  17.02   0   0     3     2
Valiant              18.1    6    225  105  2.76   3.46  20.22   1   0     3     1
Duster 360           14.3    8    360  245  3.21   3.57  15.84   0   0     3     4
Merc 240D            24.4    4  146.7   62  3.69   3.19     20   1   0     4     2
Merc 230             22.8    4  140.8   95  3.92   3.15   22.9   1   0     4     2
Merc 280             19.2    6  167.6  123  3.92   3.44   18.3   1   0     4     4

``````

Assign to `names(my_list)` to name the objects your list holds:

``````

In [155]:

names(my_list) <- c("vector", "matrix", "dataframe")

``````
``````

In [156]:

my_list

``````
``````

Out[156]:

$vector

1
2
3
4
5
6
7
8
9
10

$matrix

1  2  3
4  5  6
7  8  9

$dataframe

                      mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
Mazda RX4              21    6    160  110   3.9   2.62  16.46   0   1     4     4
Mazda RX4 Wag          21    6    160  110   3.9  2.875  17.02   0   1     4     4
Datsun 710           22.8    4    108   93  3.85   2.32  18.61   1   1     4     1
Hornet 4 Drive       21.4    6    258  110  3.08  3.215  19.44   1   0     3     1
Hornet Sportabout    18.7    8    360  175  3.15   3.44  17.02   0   0     3     2
Valiant              18.1    6    225  105  2.76   3.46  20.22   1   0     3     1
Duster 360           14.3    8    360  245  3.21   3.57  15.84   0   0     3     4
Merc 240D            24.4    4  146.7   62  3.69   3.19     20   1   0     4     2
Merc 230             22.8    4  140.8   95  3.92   3.15   22.9   1   0     4     2
Merc 280             19.2    6  167.6  123  3.92   3.44   18.3   1   0     4     4

``````

Select elements from list

To select a single object from the list use double brackets [[ ]] (single brackets [ ] return a sub-list instead):

``````

In [159]:

my_list[[1]] # returns the object 1 (ie the vector)

``````
``````

Out[159]:

1
2
3
4
5
6
7
8
9
10

``````
``````

In [160]:

my_list[[2]]

``````
``````

Out[160]:

123
456
789

``````
``````

In [164]:

my_list[[1]][2] #from the list the 1st object (vec),
# from that the 2nd element

``````
``````

Out[164]:

2

``````
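A sketch contrasting the ways of reaching into a list. It rebuilds a two-element named list like the one above (without the data frame), and the result names are illustrative.

```r
my_list <- list(vector = 1:10,
                matrix = matrix(1:9, byrow = TRUE, ncol = 3))

sub_list  <- my_list[1]            # single bracket: a list of length 1
vec       <- my_list[[1]]          # double bracket: the vector itself
by_name   <- my_list[["vector"]]   # double bracket with a name
by_dollar <- my_list$vector        # $ shorthand for a named element
cell      <- my_list$matrix[2, 3]  # then index inside the selected object

class(sub_list)
class(vec)
cell
```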

Append to list

To append an item to the list use the c() function. To also name the new item when concatenating, use the syntax:

`c(list, name=new_var)`

``````

In [166]:

year <- 1980
my_list_2 <- c(my_list, myyear=year)
my_list_2

``````
``````

Out[166]:

$vector

1
2
3
4
5
6
7
8
9
10

$matrix

1  2  3
4  5  6
7  8  9

$dataframe

                      mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
Mazda RX4              21    6    160  110   3.9   2.62  16.46   0   1     4     4
Mazda RX4 Wag          21    6    160  110   3.9  2.875  17.02   0   1     4     4
Datsun 710           22.8    4    108   93  3.85   2.32  18.61   1   1     4     1
Hornet 4 Drive       21.4    6    258  110  3.08  3.215  19.44   1   0     3     1
Hornet Sportabout    18.7    8    360  175  3.15   3.44  17.02   0   0     3     2
Valiant              18.1    6    225  105  2.76   3.46  20.22   1   0     3     1
Duster 360           14.3    8    360  245  3.21   3.57  15.84   0   0     3     4
Merc 240D            24.4    4  146.7   62  3.69   3.19     20   1   0     4     2
Merc 230             22.8    4  140.8   95  3.92   3.15   22.9   1   0     4     2
Merc 280             19.2    6  167.6  123  3.92   3.44   18.3   1   0     4     4

$myyear
1980

``````
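Besides `c()`, a named item can be added by assigning to it directly. The sketch below also shows one thing to watch for with `c()`: a vector of length greater than one is split into separate items unless it is wrapped in `list()`. The list contents and the names `note`, `years`, `split_up` and `kept` are made up for illustration.

```r
my_list <- list(vector = 1:3)

# Add named items by assignment
my_list$myyear <- 1980
my_list[["note"]] <- "hypothetical extra item"

# c() with a bare vector adds one list item per vector element
split_up <- c(my_list, years = c(1980, 1983))

# Wrapping the vector in list() keeps it together as a single item
kept <- c(my_list, list(years = c(1980, 1983)))

names(my_list)
length(split_up)
length(kept)
```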

2) Intermediate R

2.1) Relational Operators

These operators are ==, !=, <, >, <= and >=; each comparison returns TRUE or FALSE.

Some interesting cases:

``````

In [167]:

3>5

``````
``````

Out[167]:

FALSE

``````
``````

In [168]:

FALSE == TRUE

``````
``````

Out[168]:

FALSE

``````
``````

In [169]:

FALSE < TRUE  # TRUE = 1, FALSE = 0

``````
``````

Out[169]:

TRUE

``````
``````

In [171]:

"Hello" > "Goodbye" # alphabetical order (G < H < I ...)

``````
``````

Out[171]:

TRUE

``````

For vectors:

``````

In [172]:

``````
``````

In [173]:

``````
``````

Out[173]:

FALSE
FALSE
FALSE
TRUE
TRUE

``````
``````

In [176]:

facebook <- c(10,50,20,56) # check the warning

``````
``````

In [177]:

linkedin < facebook  # linkedin is the longer vector, defined in an earlier cell

``````
``````

Warning message:
In linkedin < facebook: longer object length is not a multiple of shorter object length

Out[177]:

FALSE
TRUE
FALSE
TRUE
FALSE

``````
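A sketch of how comparisons work element by element, and of the recycling that causes the warning above. The vectors `long` and `short` are hypothetical stand-ins with lengths 5 and 4, not the original data.

```r
long  <- c(16, 9, 13, 5, 2)
short <- c(10, 50, 20, 56)

# Comparing with a single number: the number is reused for every element
above_ten <- long > 10

# Lengths 5 and 4: short is recycled to c(10, 50, 20, 56, 10).
# R warns because 5 is not a multiple of 4; suppressWarnings() hides that here.
result <- suppressWarnings(long < short)

# Checking the lengths first avoids comparing mismatched vectors by accident
same_length <- length(long) == length(short)

above_ten
result
same_length
```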