Before we begin!


Although this course relies heavily on DataCamp, I am also going to add courses from the R swirl package. Why? Because I want to get the best of both worlds.

The swirl courses are a bit difficult at first, but nothing that can't be done. It's just my way of practising R programming.

Table of Contents



Swirl: R Programming

  1. Installing Swirl

  2. Basic Building Blocks

  3. The Workspace


Installing Swirl


"First things first!"

Step 1: Get R

(copy pasting stuff, why re-invent the wheel!)

  • In order to run swirl, you must have R 3.1.0 or later installed on your computer.

  • If you need to install R, you can do so here.
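If R is already installed, the version requirement above can be checked from inside R itself. This is only a small sketch using base R; the variable names are my own and not part of swirl.

```r
# Minimum R version that swirl asks for, as stated above
required_version <- "3.1.0"

# getRversion() returns the version of the running R session
current_version <- getRversion()

# TRUE when the running session is new enough for swirl
version_ok <- current_version >= required_version
version_ok
```

If this prints FALSE, R needs to be updated before swirl will run.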

For help installing R, check out one of the following videos (courtesy of Roger Peng at Johns Hopkins Biostatistics):

Step 2: Get RStudio!

RStudio is not mandatory. I am using an R kernel for Jupyter Notebook to add more interactivity, since I am a kinesthetic learner. Still, swirl courses can be taken in RStudio, which is an IDE for running R scripts and much more.

Here's a link to the IDE: RStudio

Step 3: Install Swirl

We might wonder: what is swirl? In short, it's an R package for learning R, inside R. Cool!

  • Inside RStudio, in the console, after the > prompt, type:

    install.packages("swirl")

  • Patience is key: it will take a minute to download the swirl package from the CRAN repository.

Step 4: Start Swirl

  • Now load the package by typing in the console:

    library(swirl)

  • Call the swirl() function to begin the interactive session.

    swirl()

Step 5: Installing an Interactive Course

This is outside the scope of what I am practising, but just to be helpful: we can install more interactive courses.

Following is the link to more courses.

Want more? Here's the whole Swirl Course Network for us.

That's it!

True Basics


RQ1: What are the three advantages of using R?

Ans:

  • Publication quality visualizations.

  • R packages.

  • Reproducibility, using R scripts.


RQ2: What are the two reasons for using R scripts?

Ans:

  • Reproducibility.

  • Automate your work.


RQ3: Which symbol should you use to place comments in R?

Ans: The pound sign #.


RQ4: Which R function should you use to remove a variable from the workspace in R?

Ans: To remove a variable (or any R object) from the workspace, use the rm() function. In the console, type: rm("variable/object name")

Note: Beware! Doing so completely removes that object from R's workspace.
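A quick sketch of that answer: as far as I know, rm() takes either the bare name or the name in quotes. The variable names below are made up for the example, and exists() is only used to confirm the objects are really gone.

```r
# Two throwaway variables (hypothetical names)
first_total <- 1
second_total <- 2

# rm() accepts the bare name ...
rm(first_total)

# ... or the name as a character string
rm("second_total")

# exists() reports whether a name can still be found
still_there <- c(exists("first_total"), exists("second_total"))
still_there
```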

Lab 1


Objective:

  • Learn about the pros and cons of R.
  • Perform hands-on R coding.
  • Use R as a basic calculator.
  • Perform variable assignment.
  • Learn to use R's workspace.

R: The true basics

  1. How it works

  2. Documenting your code

  3. Little arithmetic with R

  4. R's pros and cons

  5. Variable assignment

  6. Variable assignment 2

  7. Variable assignment 3

  8. The Workspace

  9. Build and Destroy your workspace

How it works



In [1]:
####################################################
# Title: How it works                              #
# -------------------------------------------------#
# About: Getting to know R                         #
# -------------------------------------------------#
# Instructions:                                    #
#   > Add a line of code that calculates the sum   #
#     of 6 and 12. Press Enter in RStudio, or      #
#     Ctrl + Enter in a Jupyter Notebook.          #
####################################################

6 + 12


18

Documenting your code


Preface:

Adding comments to your code is extremely important to make sure that you and others can understand what your code is about. R makes use of the # sign to add comments, just like Twitter!

It's important to note that comments are not run as R code, so they will not influence your result. For example, # Calculate 3 + 4 in the editor is a comment and is ignored during execution.

Instructions:

  • Add a comment in the editor or in Jupyter, right above the arithmetic we did earlier.

In [2]:
#####################################################
# Title: Documenting your code, and we are doing it!#
# --------------------------------------------------#
#####################################################

# Calculate 3 + 4
3 + 4

# Calculate 6 + 12
6 + 12


7
18

Little arithmetic with R


In its most basic form R can be used as a simple calculator. Consider the following arithmetic operators:

  • Addition: +
  • Subtraction: -
  • Multiplication: *
  • Division: /
  • Exponentiation: ^
  • Modulo: %%

The last two might need some explaining:

  • The ^ operator raises the number to its left to the power of the number to its right: for example 3^2 equals 9.
  • The modulo returns the remainder of the division of the number to the left by the number on its right, for example 5 modulo 3 or 5 %% 3 equals 2.
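To see how these operators combine, here is a small sketch. The %/% operator (integer division) is not in the list above; I am adding it only because it pairs naturally with %%. The variable names are just for illustration.

```r
# ^ binds tighter than unary minus, so this is -(2^2)
-2^2         # -4
(-2)^2       # 4

# Multiplication is evaluated before addition
2 + 3 * 4    # 14
(2 + 3) * 4  # 20

# %/% gives the whole-number part of a division, %% the remainder
dividend <- 28
divisor <- 6
quotient <- dividend %/% divisor    # 4
remainder <- dividend %% divisor    # 4

# Quotient and remainder together rebuild the original number
rebuilt <- quotient * divisor + remainder
rebuilt == dividend                 # TRUE
```

When in doubt about the order of evaluation, adding parentheses is the safer habit.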

Pop Quiz

[Swirl] Why do we use a programming language as opposed to a calculator?

Ans: "To automate some process, or avoid unnecessary repetition."


Instructions:

  • Type 2^5 in the editor to calculate 2 to the power 5.

  • Type 28 %% 6 to calculate 28 modulo 6.

  • Click 'Submit Answer' and have a look at the R output in the console.


In [3]:
#####################################################
# Title: Little arithmetic with R                   #
# --------------------------------------------------#
# About: Doing some basic arithmetic with R         #
#####################################################

# Addition
5 + 5 

# Subtraction
5 - 5 

# Multiplication
3 * 5

# Division
(5 + 5) / 2 

# Exponentiation
2^5

# Modulo
28 %% 6


10
0
15
5
32
4

R's pros and cons


There are things that make R the awesome and immensely popular language that it is today. On the other hand, there are also aspects about R that are less attractive.

Which of the following statements are true regarding this statistical programming language developed by Ihaka and Gentleman in the nineties?

  1. As opposed to SAS and SPSS, R is completely open-source.

  2. R is open-source, but it's hard to share your code with others since R uses a command-line interface.

  3. It typically takes a long time for new and updated R packages to be released and made available to the public.

  4. R is easy to use, but this comes at the cost of limited graphical abilities.

  5. R works well with large data sets, if the code is properly written and the data fits into the working memory.

Ans: Statement 1 and 5.

Variable assignment


As discussed earlier, part of the point of using a programming language is to automate processes. To do so, we create variables that store values for reuse in later work, which also makes our work reproducible.

  • In the simplest sense, a variable allows us to store a value / object in R.

  • In R we use the assignment operator <- to assign a value to a variable.

Instructions:

  • Assign the value 42 to x

  • Print out the value of the variable x


In [6]:
########################################################
# Title: Variable Assignment                           #
# -----------------------------------------------------#
# About: Assigning variables with the <- operator.     #
#        We can use = operator too!                    #
# -----------------------------------------------------#
# Useful Mnemonic, think of <- as an arrow!            #
########################################################

# Assign the value 42 to x
x <- 42

# Print out the value of the variable x
print(x)


[1] 42

Pinch:

Notice that R did not print anything for the line x <- 42. When you use the assignment operator, R assumes that you don't want to see the result immediately, but rather that you intend to use the result for something else later on. The value 42 only showed up because we called print(x).
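The printing behaviour can be tried out with a few lines. This is just a sketch: y is a name I made up, and wrapping an assignment in parentheses is a base R convenience, not something the exercise asks for.

```r
# Plain assignment: nothing is printed
x <- 42

# Typing the name (or calling print(x)) shows the stored value
x

# Wrapping the assignment in parentheses assigns and prints in one step
(y <- x * 2)
```

The last line stores 84 in y and shows it right away, which is handy while experimenting.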

Variable assignment 2


Suppose you have a fruit basket with five apples. As a data analyst in training, you want to store the number of apples in a variable with the name my_apples.

Instructions:

  • Using <-, assign the value 5 to my_apples below the first comment.

  • Type my_apples below the second comment. This will print out the value of my_apples.

  • Click 'Submit Answer', and look at the console: you see that the number 5 is printed. So R now links the variable my_apples to the value 5.


In [7]:
# Assign the value 5 to the variable called my_apples
my_apples <- 5

# Print out the value of the variable my_apples
my_apples


5

Variable assignment 3


Every tasty fruit basket needs oranges, so you decide to add six oranges. As a data analyst, your reflex is to immediately create the variable my_oranges and assign the value 6 to it.

Next, you want to calculate how many pieces of fruit you have in total. Since you have given meaningful names to these values, you can now code this in a clear way:

my_apples + my_oranges

Instructions:

  • Assign to my_oranges the value 6.

  • Add the variables my_apples and my_oranges and have R simply print the result.

  • Combine the variables my_apples and my_oranges into a new variable my_fruit, which is the total number of pieces of fruit in your fruit basket.


In [8]:
# Assign a value to the variables my_apples and my_oranges
my_apples <- 5
my_oranges <- 6

# Add these two variables together and print the result
my_apples + my_oranges

# Create the variable my_fruit
my_fruit <- my_apples + my_oranges


11

The Workspace


If you assign a value to a variable, this variable is stored in the workspace. It's the place where all user-defined variables and objects live. The command ls() lists the contents of this workspace. rm() allows you to remove objects from the workspace again. Try the following code in the console:

    a <- 1
    b <- 2
    ls()
    rm(a)
    ls()

  • The first two lines create the variables a and b.

  • Calling ls() shows all the objects currently in the workspace.

  • An object can be removed via rm("object name").

    • e.g. rm(a) removes object a; ls() will then show only one remaining object, b, in the workspace.
    • Both objects can be removed at once via rm(a, b).

  • If we want to remove everything from the workspace to declutter, we can use rm(list = ls()).
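The same steps can be rehearsed without touching the real workspace. In the sketch below a separate environment called ws stands in for the workspace. That is my own assumption for safety; the exercise itself works directly in the global workspace.

```r
# A separate environment stands in for the workspace (assumption of this
# sketch, so that the real workspace is left untouched)
ws <- new.env()

assign("a", 1, envir = ws)
assign("b", 2, envir = ws)

before <- ls(ws)                 # "a" "b"

rm("a", envir = ws)              # remove a single object
after_one <- ls(ws)              # "b"

rm(list = ls(ws), envir = ws)    # remove everything that is left
after_all <- ls(ws)              # character(0)

length(after_all)                # 0
```

Dropping the envir argument gives the plain rm(a) and rm(list = ls()) calls described above.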

Instructions:

  • List the contents of the workspace to check that the workspace is empty.
  • Create a variable, horses, equal to 3.
  • Create another variable, dogs, which you set to 7.
  • Create a new variable, animals, that is equal to the sum of horses and dogs.
  • Inspect the contents of the workspace again to see that indeed, these three variables are available.
  • Eliminate the dogs variable from the workspace.
  • Finally, inspect the objects in your workspace once more to see that only horses and animals remain.

In [9]:
# Clear the entire workspace
rm(list = ls())

# List the contents of your workspace
ls()

# Create the variable horses
horses <- 3

# Create the variable dogs
dogs <- 7

# Create the variable animals
animals <- horses + dogs

# Inspect the contents of the workspace again
ls()

# Remove dogs from the workspace
rm(dogs)

# Inspect the objects in your workspace once more
ls()


""
  1. "animals"
  2. "dogs"
  3. "horses"
  1. "animals"
  2. "horses"

Basic Data Types


The notes are available as handouts. The content ahead is just exercises and lab material.

Knowledge Checks

RQ1: Which R function should you use to get the variable type?

Ans: class()


RQ2: Which two of the following variables are logical values?

Ans:

  • TRUE

  • NA
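That NA counts as a logical may look surprising, so here is a quick check. The vector mixed is a made-up example to show that NA also fits inside vectors of other types.

```r
class(TRUE)      # "logical"
class(NA)        # "logical"
is.logical(NA)   # TRUE

# Inside a numeric vector, NA simply marks a missing number
mixed <- c(1.5, NA)
class(mixed)     # "numeric"
is.na(mixed)     # FALSE TRUE
```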


RQ3: What is the purpose of the is.*() function?

Ans: It is used to see whether variables are of certain type.


RQ4: What is the purpose of the as.*() function?

Ans: To transform the type of a variable to another type.



Lab 2


Objective:

  • Getting familiar with the basic data types in R.
  • Creating variables of different types in R.
  • Inspecting those types with the class() function.
  • Understanding Coercion / Type casting in R.

Basic Data Types

1. Basic Data Types

2. Back to Apples and Oranges

3. What's that data type?

4. Coercion: Taming your data

5. Coercion for the sake of cleaning


Basic Data Types


Preface

Some of R's most basic types to get started are:

  • Decimal values like 4.5 are called numerics.
  • Whole numbers like 4L are called integers. Integers are also numerics.
  • Boolean values (TRUE or FALSE) are called logical.
  • Text (or string) values are called characters.

Note how the quotation marks indicate that "some text" is a character.
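A short sketch to confirm each of the types listed above with class(). The vector name types is only there to collect the results in one place.

```r
# Collect the class of one example value per basic type
types <- c(
  decimal = class(4.5),          # "numeric"
  whole   = class(4L),           # "integer"
  plain   = class(4),            # "numeric" (no L suffix, so not an integer)
  boolean = class(TRUE),         # "logical"
  text    = class("some text")   # "character"
)
types

# Integers are also numerics
is.numeric(4L)                   # TRUE
```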


Instructions:

Change the value of the:

  • my_numeric variable to 42.
  • my_character variable to "forty-two". Note that the quotation marks indicate that "forty-two" is a character.
  • my_logical variable to FALSE.

Note that R is case sensitive!


In [ ]:
#####################################################
# Title: Basic Data Types                           #
# --------------------------------------------------#
# About: Practising with data types in R            #
#####################################################

# What is the answer to the universe?
my_numeric <- 42        # Interesting cue for the SETI project!

# The quotation marks indicate that the variable is of type character
my_character <- "forty-two"

# Change the value of my_logical
my_logical <- FALSE

Back to Apples and Oranges


Preface:

Common knowledge tells you not to add apples and oranges. But hey, that is what you just did, no :-)?

The my_apples and my_oranges variables both contained a number in the previous exercise. The + operator works with numeric variables in R. If you really tried to add "apples" and "oranges", and assigned a text value to the variable my_oranges (see the editor), you would be trying to assign the addition of a numeric and a character variable to the variable my_fruit. This is not possible.
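To look at the failure without stopping the notebook, the addition can be wrapped in tryCatch(). This is only a sketch: the outcome variable and the "Error caught:" prefix are my own additions.

```r
my_apples <- 5
my_oranges <- "six"   # text instead of a number

# tryCatch() runs the addition and hands any error to the handler
outcome <- tryCatch(
  my_apples + my_oranges,
  error = function(e) paste("Error caught:", conditionMessage(e))
)
outcome
```

In an English locale the message mentions a non-numeric argument to a binary operator, which is R's way of saying that + only works on numbers.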


Instructions:

  • Click 'Submit Answer' and read the error message. Make sure to understand why this did not work.
  • Adjust the code so that R knows you have 6 oranges and thus a fruit basket with 11 pieces of fruit.

In [1]:
# Assign a value to the variable called my_apples
my_apples <- 5 

# Print out the value of my_apples
my_apples       

# Assign a value to the variable my_oranges and print it out
my_oranges <- 6   #changed "six" to 6.
my_oranges 

# New variable that contains the total amount of fruit
my_fruit <- my_apples + my_oranges 
my_fruit


5
6
11

What's that data type?


Preface:

Do you remember that when you added 5 + "six", you got an error due to a mismatch in data types? You can avoid such embarrassing situations by checking the data type of a variable beforehand. You can do this as follows:

class(some_variable_name)

In the workspace (you can inspect it by typing ls() in the console), some variables have already been defined. Which statement concerning these variables is correct?


Ans: a's class is numeric, b is a character, c is a logical.
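The "check the type beforehand" idea from the preface can be sketched as a tiny helper. safe_add is a hypothetical function of my own, not part of R or the course; it returns NA instead of raising an error when either input is not numeric.

```r
# Hypothetical helper: add two values only when both are numeric
safe_add <- function(x, y) {
  if (!is.numeric(x) || !is.numeric(y)) {
    return(NA_real_)   # signal "cannot add" instead of raising an error
  }
  x + y
}

safe_add(5, 6)       # 11
safe_add(5, "six")   # NA
```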

Coercion: Taming your data


Preface:

Coercion lets you transform your data from one type to another. Next to the class() function and the is.*() functions, you can use the as.*() functions to force data to change type. For example,

var <- "3"

var_num <- as.numeric(var)

converts the character string "3" in var to a numeric 3 and assigns it to var_num. Beware however, that it is not always possible to convert the types without information loss or errors:

as.integer("4.5")

as.numeric("three")

The first line will convert the character string "4.5" to the integer 4. The second one will convert the character string "three" to an NA.
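Both cases can be checked directly. In this sketch suppressWarnings() is used only to keep the output tidy (the conversion of "three" normally raises a warning), and the variable names are my own.

```r
# Information loss: the decimal part is dropped
truncated <- as.integer("4.5")
truncated                        # 4

# Failed conversion: the result is NA (normally with a warning)
not_a_number <- suppressWarnings(as.numeric("three"))
is.na(not_a_number)              # TRUE

# as.integer() truncates toward zero rather than rounding
as.integer(4.9)                  # 4
as.integer(-4.9)                 # -4
```

Checking the result with is.na() after a conversion is a cheap way to spot values that did not survive.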


Instructions:

  • Convert var1, a logical, to a character and assign it to the variable var1_char.
  • Next, see whether var1_char actually is a character by using the is.character() function on it.
  • Convert var2, a numeric, to a logical and assign it to the variable var2_log.
  • Inspect the class of var2_log using class().
  • Finally, try to coerce var3 to a numeric and assign the result to var3_num. Was it successful?

In [3]:
# Create variables var1, var2 and var3
var1 <- TRUE
var2 <- 0.3
var3 <- "i"

# Convert var1 to a character: var1_char
var1_char <- as.character(var1)
var1_char

# See whether var1_char is a character
is.character(var1_char)

# Convert var2 to a logical: var2_log
var2_log <- as.logical(var2)
var2_log

# Inspect the class of var2_log
class(var2_log)

# Coerce var3 to a numeric: var3_num
var3_num <- as.numeric(var3)
var3_num


"TRUE"
TRUE
TRUE
"logical"
Warning message in eval(expr, envir, enclos):
"NAs introduced by coercion"
[1] NA

Coercion for the sake of cleaning


Preface:

When might coercion come in handy?

  • When dealing with "messy" datasets.

    • (say) numerical variables have been stored as character strings,
    • logicals have been stored as numericals.
  • To prepare ourselves for such problems, try this coding exercise: your first modest steps in data cleaning!

Instructions:

  • Use as.numeric() to convert the character age; assign the result to a new variable age_clean.
  • With the help of as.logical(), convert the numeric employed and store the result to a new variable employed_clean.
  • Using the as.numeric() function, convert the respondent's salary to a numeric; assign the resulting numeric to the variable salary_clean.

In [ ]:
# Convert age to numeric: age_clean
age_clean <- as.numeric(age)

# Convert employed to logical: employed_clean
employed_clean <- as.logical(employed)

# Convert salary to numeric: salary_clean
salary_clean <- as.numeric(salary)
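In the exercise, age, employed and salary are already defined in the workspace, so the cell above will not run on its own. Here is a self-contained sketch with made-up values (my assumption, not the exercise data) to see what the cleaned variables look like:

```r
# Hypothetical "messy" inputs: numbers stored as text, a logical stored as 1
age <- "27"
employed <- 1
salary <- "2100"

# Same conversions as in the exercise
age_clean <- as.numeric(age)
employed_clean <- as.logical(employed)
salary_clean <- as.numeric(salary)

# Check the resulting types
class(age_clean)        # "numeric"
class(employed_clean)   # "logical"
class(salary_clean)     # "numeric"
```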