Only the finer points are covered here; the full lecture notes are in the provided handouts.
Minuscules
Attaching labels to the vector elements:
Step 3: Use the names() function to attach labels.
Syntax:
names(<vector>) <- <name vector>
c( key = value, ... )
c( "key" = value, ... )
Under the hood: R vectors have "attributes" associated with them.
That is, setting the names of a vector is the same as setting the names attribute of the underlying object.
To verify this, we can call the str() function, which compactly displays the structure of an R object.
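A minimal sketch of the point above, assuming a small made-up vector (the name scores and its values are hypothetical, not from the lecture). It shows that names() and the "names" attribute refer to the same information:

```r
# Hypothetical example vector; the values are for illustration only
scores <- c(11, 12, 13)
names(scores) <- c("a", "b", "c")

# The names are stored as an attribute of the vector
attributes(scores)

# attr() reads the same attribute directly
identical(names(scores), attr(scores, "names"))   # TRUE

# str() shows the values together with the names attribute
str(scores)
```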
Vectors are homogeneous (i.e. they can only hold elements of the same type).
[RQ1]: What function should you use to create a vector?
Ans: The combine function, c().
[RQ2]: Which three of the following actions attach labels to the vector elements?
Ans:
Define the labels by using the names() function.
Define the labels inside c() by using the equals sign, without putting the label names between quotes.
Define the labels inside c() by using the equals sign, putting the label names between quotes.
[RQ3]: Which two of the following options are correct?
Ans: Atomic vectors can hold elements only of the same type, whereas lists can hold elements of a different type.
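A small sketch of the difference, using made-up values (mixed and mixed_list are hypothetical names). When types are mixed inside c(), R coerces all elements to one common type, whereas a list keeps each element as it is:

```r
# Mixing types in c() coerces everything to one common type
mixed <- c(1, "a", TRUE)
class(mixed)                # "character"

# A list keeps each element's own type
mixed_list <- list(1, "a", TRUE)
sapply(mixed_list, class)   # "numeric" "character" "logical"
```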
Preface:
Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data. You create a vector with the combine function c(). You place the vector elements, separated by commas, between the parentheses. For example:
numeric_vector <- c(1, 2, 3)
character_vector <- c("a", "b", "c")
boolean_vector <- c(TRUE, FALSE)
Instructions:
Build a vector, boolean_vector, that contains the three elements: TRUE, FALSE and TRUE (in that order).
In [1]:
#####################################################
# Title: Create a Vector #
# --------------------------------------------------#
# About: Title says it all! #
#####################################################
numeric_vector <- c(1, 10, 49)
character_vector <- c("a", "b", "c")
# Create boolean_vector
boolean_vector <- c( TRUE, FALSE, TRUE)
boolean_vector
Preface:
After one week in Las Vegas and still zero Ferraris in your garage, you decide that it is time to start using your data science superpowers.
Before doing a first analysis, you decide to first collect all the winnings and losses for the last week:
For poker_vector:
On Monday you won $140
Tuesday you lost $50
Wednesday you won $20
Thursday you lost $120
Friday you won $240
For roulette_vector:
On Monday you lost $24
Tuesday you lost $50
Wednesday you won $100
Thursday you lost $350
Friday you won $10
You only played poker and roulette, since there was a delegation of mediums that occupied the craps tables. To be able to use this data in R, you decide to create the variables poker_vector and roulette_vector.
Instructions:
Assign the winnings/losses for roulette to the variable roulette_vector.
In [2]:
#####################################################
# Title: Create a Vector 2 #
# --------------------------------------------------#
# About: Analysing a poker game. #
#####################################################
# Poker winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
# Roulette winnings from Monday to Friday: roulette_vector
roulette_vector <- c(-24, -50, 100, -350, 10)
Part of a data analyst's job is to have a clear view of the data being used.
Preface:
In the previous exercise, we created a vector with our winnings over the week. Each vector element refers to a day of the week, but it is hard to tell which element belongs to which day. It would be nice if we could show that in the vector itself. Remember the names() function to name the elements of a vector?
some_vector <- c("Johnny", "Poker Player")
names(some_vector) <- c("Name", "Profession")
We can do the same thing in our combine function, c():
some_vector <- c(Name = "Johnny", Profession = "Poker Player")
Instructions:
Go ahead and assign the days of the week as names to poker_vector and roulette_vector. In case you are not sure, the days of the week are: Monday, Tuesday, Wednesday, Thursday and Friday.
In [3]:
# Poker winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
# Roulette winnings from Monday to Friday
roulette_vector <- c(-24, -50, 100, -350, 10)
# Add names to both poker_vector and roulette_vector
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(roulette_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
Preface:
In the previous exercises you probably experienced that it is boring and frustrating to type and retype information such as the days of the week. However, there is a more efficient way to do this, namely, to assign the days of the week vector to a variable!
Just like we did with your poker and roulette returns, we can also create a variable that contains the days of the week. This way we can use and re-use it.
Instructions:
Create a variable days_vector that contains the days of the week, from Monday to Friday.
Use that variable days_vector to set the names of poker_vector and roulette_vector.
In [5]:
# Poker winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
# Roulette winnings from Monday to Friday
roulette_vector <- c(-24, -50, 100, -350, 10)
# Create the variable days_vector
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
# Assign the names of the day to roulette_vector and poker_vector
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Check
poker_vector
roulette_vector
The previous exercises outlined different ways of creating and naming vectors. Have a look at this chunk of code:
poker_vector1 <- c(140, -50, 20, -120, 240)
names(poker_vector1) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
poker_vector2 <- c(Monday = 140, -50, 20, -120, 240)
roulette_vector1 <- c(-24, -50, 100, -350, 10)
days_vector <- names(poker_vector1)
names(roulette_vector1) <- days_vector
roulette_vector2 <- c(-24, -50, 100, -350, 10)
names(roulette_vector2) <- "Monday"
Which of the following statements is true?
Ans: poker_vector1 and roulette_vector1 have the same names, while poker_vector2 and roulette_vector2 show a names mismatch.
Conclusion
We might expect the vectors poker_vector2 and roulette_vector2 to have the same names, but the two approaches treat missing name information differently. Also, notice how we can use names() to get the names of a vector!
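To see the mismatch between poker_vector2 and roulette_vector2 concretely, the sketch below repeats the two definitions from the chunk above and inspects their names. Naming only some elements inside c() fills the rest with empty strings, while assigning a too-short vector through names() fills the rest with NA:

```r
poker_vector2 <- c(Monday = 140, -50, 20, -120, 240)
roulette_vector2 <- c(-24, -50, 100, -350, 10)
names(roulette_vector2) <- "Monday"

names(poker_vector2)      # "Monday" "" "" "" ""
names(roulette_vector2)   # "Monday" NA NA NA NA

identical(names(poker_vector2), names(roulette_vector2))   # FALSE
```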
Nuances:
x <- c(3)
This raises a question: what if we have vectors of unequal lengths?
We will look into it a bit later in the course.
For now, let's see an example:
In [6]:
earnings <- c( 50, 89, 34 )
expenditure <- c( 24, 46, 43 )
earnings - expenditure
3 * earnings # 3 is a scalar; earnings is a vector
is.logical(earnings) # Interesting! Returns a single logical value
as.logical(earnings) # does what it says!
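As a tentative preview of the unequal-length question raised above (the course covers it properly later, and its treatment may differ), here is a sketch with made-up vectors; long_vector and short_vector are hypothetical names. R reuses, or "recycles", the shorter vector until it matches the longer one:

```r
long_vector  <- c(1, 2, 3, 4)
short_vector <- c(10, 20)

# The shorter vector is recycled: 1 + 10, 2 + 20, 3 + 10, 4 + 20
long_vector + short_vector   # 11 22 13 24
```

This also explains 3 * earnings: the scalar 3 is a vector of length one that gets recycled across every element.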
RQ1: Complete the following sentence: To calculate the sum of 2 vectors of equal length, __.
Ans: R adds the vectors element by element and returns a new vector of the same length.
RQ2: How are multiplication and division of vectors performed in R?
Ans: Element wise.
RQ3: Which two of the following statements are correct?
Ans:
When multiplying a vector with a scalar (single value) in R, every element in the vector will be multiplied by this scalar.
By using the sum() function, it is possible to add up all the elements in a vector.
Preface:
Now that you have the poker and roulette winnings nicely as a named vector, you can start doing some data science magic.
You want to find out the following type of information:
You'll have to do arithmetic calculations on vectors to solve these problems. Remember that this happens element-wise; the following three statements are completely equivalent:
c(1, 2, 3) + c(4, 5, 6)
c(1 + 4, 2 + 5, 3 + 6)
c(5, 7, 9)
Instructions:
Take the sum of A_vector and B_vector and assign it to total_vector. The result should be a vector.
Print total_vector to the console.
Subtract B_vector from A_vector and assign the result to diff_vector.
Print diff_vector to the console as well.
In [2]:
# A_vector and B_vector have already been defined for you
A_vector <- c(1, 2, 3)
B_vector <- c(4, 5, 6)
# Take the sum of A_vector and B_vector: total_vector
total_vector <- (A_vector + B_vector) # Overall profit or loss /day
# Note: Addition is performed element-wise.
# Print total_vector
print(total_vector)
# Calculate the difference between A_vector and B_vector: diff_vector
diff_vector <- (A_vector - B_vector) # Made loss
# Print diff_vector
print(diff_vector)
Preface:
Instructions:
Assign to the variable total_daily how much you won or lost on each day in total (poker and roulette combined).
In [4]:
# Casino winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Take a peek!
poker_vector
roulette_vector
# Calculate your daily earnings: total_daily
total_daily <- (poker_vector + roulette_vector)
Preface:
The sum() function calculates the sum of all elements of a vector.
Instructions:
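Calculate the total winnings with poker and assign them to total_poker.
Do the same for roulette and assign the result to total_roulette.
Calculate the overall total and assign it to total_week (which is the sum of all gains and losses of the week).
Print total_week.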
In [5]:
# Casino winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Total winnings with poker: total_poker
total_poker <- sum( poker_vector )
# Total winnings with roulette: total_roulette
total_roulette <- sum( roulette_vector )
# Total winnings overall: total_week
total_week <- sum( total_poker, total_roulette )
# Print total_week
print( total_week)
Conclusion: Looks like we are losing money!
Preface:
We rethink our strategy and realize that we might be less skilled at roulette than at poker. We check this by using the > operator.
Instructions:
Create a variable poker_better that tells whether your poker gains exceeded your roulette results on a daily basis.
Calculate total_poker and total_roulette as in the previous exercise.
Using total_poker and total_roulette, check if your total gains in poker are higher than for roulette by using a comparison. Assign the result of this comparison to the variable choose_poker and print it out. What do you conclude, should you focus on roulette or on poker?
In [6]:
# Casino winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Calculate poker_better
poker_better <- (poker_vector > roulette_vector)
# Calculate total_poker and total_roulette, as before
total_poker <- sum( poker_vector)
total_roulette <- sum( roulette_vector )
# Calculate choose_poker
choose_poker <- ( total_poker > total_roulette )
# Print choose_poker
print(choose_poker)
Conclusion: Looks like we truly are doing much better at poker.
Preface:
In the previous exercise, you found out that roulette is not really your forte. However, you have some vague memories from visits in Vegas where you actually excelled at this game. You plan to dig through your receipts of when you withdrew and cashed chips and find out about your actual performance in the previous week you were in Sin City. In that week, you also only played poker and roulette; the information is stored in poker_past and roulette_past. The information for the current week, with which you have been working all along, is in poker_present and roulette_present. All these variables are available in your workspace.
Instructions:
Use the sum() function twice in combination with the + operator to calculate the total gains for your entire past week in Vegas (this means for both poker and roulette). Assign the result to total_past.
Using the - operator, subtract poker_past from poker_present to calculate diff_poker. diff_poker should be a vector with 5 elements.
In [24]:
# Casino winnings from Monday to Friday
poker_past <- c(-70, 90, 110, -120, 30)
roulette_past <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_past) <- days_vector
names(roulette_past) <- days_vector
poker_past
roulette_past
poker_present <- c(140, -50, 20, -120, 240)
names(poker_present) <- days_vector
poker_present
# Calculate total gains for your entire past week: total_past
total_past <- sum( poker_past ) + sum( roulette_past )
# Difference of past to present performance: diff_poker
diff_poker <- poker_present - poker_past
print(diff_poker)
print("Mixed picture: better on Monday and Friday, worse on Tuesday and Wednesday.")
[RQ1]: Which kind of brackets should you use in order to subset a vector?
Ans: A pair of square brackets, [].
[RQ2]: When using the minus operator for subsetting a named vector, you can subset by?
Ans: Its index.
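A short sketch of why the minus operator works with indices only, using a shortened, made-up version of the poker vector. The tryCatch() wrapper is just there so the failing call does not stop the script:

```r
poker_vector <- c(Monday = 140, Tuesday = -50, Wednesday = 20)

# A negative index drops the element at that position
poker_vector[-1]

# The minus operator cannot be applied to a name; this raises an error
result <- tryCatch(poker_vector[-"Monday"], error = function(e) "error")
result   # "error"
```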
[RQ3]: Which two of the following statements are true?
Ans: The following are true:
Preface:
After you figured that roulette is not your forte, you decide to compare your performance at the beginning of the working week compared to the end of it. You did have a couple of Margarita cocktails at the end of the week…
To answer that question, you only want to focus on a selection of the total_vector. In other words, our goal is to select specific elements of the vector.
Instructions:
Assign the poker results of Wednesday to the variable poker_wednesday.
Assign the roulette results of Friday to the variable roulette_friday.
In [2]:
# Casino winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Poker results of Wednesday: poker_wednesday
poker_wednesday <- poker_vector[3]
poker_wednesday
# Roulette results of Friday: roulette_friday
roulette_friday <- roulette_vector[5]
roulette_friday
Preface:
How about analyzing your midweek results? Instead of using a single number to select a single element, you can also select multiple elements by passing a vector inside the square brackets. For example,
poker_vector[c(1,5)]
selects the first and the fifth element of poker_vector.
Instructions:
Assign the poker results of Tuesday, Wednesday and Thursday to the variable poker_midweek.
Assign the roulette results of Thursday and Friday to the variable roulette_endweek.
In [5]:
# Casino winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Mid-week poker results: poker_midweek
poker_midweek <- poker_vector[c(2, 3, 4)]
poker_midweek
# End-of-week roulette results: roulette_endweek
roulette_endweek <- roulette_vector[c(4, 5)]
roulette_endweek
Preface:
Now, selecting multiple successive elements of poker_vector with c(2,3,4) is not very convenient. Many statisticians are lazy people by nature, so they created an easier way to do this: c(2,3,4) can be abbreviated to 2:4, which generates a vector with all natural numbers from 2 up to 4. Try it out in the console!
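One small nuance, sketched here with the poker winnings used throughout these notes: 2:4 and c(2, 3, 4) select the same elements, but the two index vectors are not stored as the same type.

```r
poker_vector <- c(140, -50, 20, -120, 240)

# Both index vectors select the same elements
identical(poker_vector[2:4], poker_vector[c(2, 3, 4)])   # TRUE

# The index vectors themselves differ in type: integer vs. double
class(2:4)          # "integer"
class(c(2, 3, 4))   # "numeric"
```

For subsetting this difference does not matter; it only shows up when the two index vectors are compared directly with identical().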
Instructions:
Assign to roulette_subset the roulette results from Tuesday to Friday inclusive by making use of the : operator.
Print the resulting variable to the console.
In [8]:
# Casino winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Roulette results for Tuesday to Friday inclusive: roulette_subset
roulette_subset <- roulette_vector[2:5]
# Print roulette_subset
roulette_subset
print("The elements in poker_vector and roulette_vector also have names associated with them? You can also subset vectors using these names, remember?")
Preface:
Another way to tackle the previous exercise is by using the names of the vector elements (Monday, Tuesday, …) instead of their numeric positions. For example,
poker_vector["Monday"]
will select the first element of poker_vector, since "Monday" is the name of that first element.
Instructions:
Select the fourth element, corresponding to Thursday, from roulette_vector. Name it roulette_thursday.
Select Tuesday's poker gains using subsetting by name. Assign the result to poker_tuesday.
In [9]:
# Casino winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Select Thursday's roulette gains: roulette_thursday
roulette_thursday <- roulette_vector["Thursday"]
roulette_thursday
# Select Tuesday's poker gains: poker_tuesday
poker_tuesday <- poker_vector["Tuesday"]
poker_tuesday
Preface:
Just like selecting single elements using numerics extends naturally to selecting multiple elements, you can also use a vector of names. As an example, try:
roulette_vector[c("Monday","Wednesday")]
Of course, you can't use the colon trick here:
"Monday":"Wednesday" will generate an error.
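The sketch below confirms the error and shows one possible workaround, which is an assumption of these notes rather than something from the course: look up the positions of the two names with match() and use the colon on those positions. The names outcome, from and to are hypothetical.

```r
poker_vector <- c(Monday = 140, Tuesday = -50, Wednesday = 20,
                  Thursday = -120, Friday = 240)

# The colon operator needs numbers, so applying it to names fails
outcome <- tryCatch(suppressWarnings("Monday":"Wednesday"),
                    error = function(e) "error")
outcome   # "error"

# Workaround: convert the names to positions first
from <- match("Monday", names(poker_vector))
to   <- match("Wednesday", names(poker_vector))
poker_vector[from:to]
```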
Instructions:
Create a vector containing the poker gains for the first three days of the week; name it poker_start.
Using the function mean(), calculate the average poker gains during these first three days. Assign the result to a variable avg_poker_start.
In [10]:
# Casino winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Select the first three elements from poker_vector: poker_start
poker_start <- poker_vector[c("Monday", "Tuesday", "Wednesday")]
# Calculate the average poker gains during the first three days: avg_poker_start
avg_poker_start <- mean(poker_start)
avg_poker_start
Preface:
There are basically three ways to subset vectors: by using the indices, by using the names (if the vectors are named) and by using logical vectors. Filip already told you about the internals in the instructional video. As a refresher, have a look at the following statements to select elements from poker_vector, which are all equivalent:
poker_vector[c(1,3)]
poker_vector[c("Monday", "Wednesday")]
poker_vector[c(TRUE, FALSE, TRUE, FALSE, FALSE)]
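The equivalence of the three statements above can be checked directly. A minimal sketch, where by_index, by_name and by_logical are hypothetical variable names:

```r
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

by_index   <- poker_vector[c(1, 3)]
by_name    <- poker_vector[c("Monday", "Wednesday")]
by_logical <- poker_vector[c(TRUE, FALSE, TRUE, FALSE, FALSE)]

identical(by_index, by_name)      # TRUE
identical(by_index, by_logical)   # TRUE
```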
Instructions:
Assign the roulette results from the first, third and fifth day to roulette_subset.
Select the first three days from poker_vector using a vector of logicals. Assign the result to poker_start.
In [13]:
# Casino winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Roulette results for day 1, 3 and 5: roulette_subset
roulette_subset <- roulette_vector[c(1, 3, 5)]
roulette_subset
# Poker results for first three days: poker_start
poker_start <- poker_vector[c(TRUE, TRUE, TRUE, FALSE, FALSE)]
poker_start
Preface:
By making use of a combination of comparison operators and subsetting using logicals, you can investigate your casino performance in a more proactive way. The (logical) comparison operators known to R are:
< for less than
> for greater than
<= for less than or equal to
>= for greater than or equal to
== for equal to each other
!= for not equal to each other
Experiment with these operators in the console. The result will be a logical vector, which you can use to perform subsetting! This means that instead of selecting a subset of days to investigate yourself like before, you can simply ask R to return only those days where you realized a positive return for poker.
Instructions:
Check if your poker winnings are positive on the different days of the week (i.e. > 0), and assign this to selection_vector.
Assign the amounts that you won on the profitable days, so a vector, to the variable poker_profits, by using selection_vector.
In [15]:
# Casino winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Create logical vector corresponding to profitable poker days: selection_vector
selection_vector <- poker_vector > 0
selection_vector
# Select amounts for profitable poker days: poker_profits
poker_profits <- poker_vector[selection_vector]
poker_profits
Preface:
To fully prepare you for the challenge that's coming, you'll do a final analysis of your casino ventures. This time, you'll use your newly acquired skills to perform advanced selection on roulette_vector.
Along the way, you'll need the sum() function. You used it before to calculate the total winnings, so on a numeric vector. However, you can also use sum() on a logical vector; it simply counts the number of vector elements that are TRUE.
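A quick sketch of counting with sum(), using the roulette winnings from these notes (is_profit is a hypothetical name). Each TRUE counts as 1 and each FALSE as 0; as a side note beyond the exercise, mean() on the same logical vector gives the share of TRUE values:

```r
roulette_vector <- c(-24, -50, 100, -350, 10)

is_profit <- roulette_vector > 0
is_profit         # FALSE FALSE TRUE FALSE TRUE

sum(is_profit)    # 2
mean(is_profit)   # 0.4
```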
Instructions:
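Select the amounts for the profitable roulette days and assign them to roulette_profits. This vector thus contains the positive winnings of roulette_vector. You can do this with a one-liner!
Sum the amounts for the profitable roulette days and assign the result to roulette_total_profit.
Count the number of profitable roulette days using the sum() function. Store the result in a variable num_profitable_days.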
In [25]:
# Casino winnings from Monday to Friday
poker_vector <- c(140, -50, 20, -120, 240)
roulette_vector <- c(-24, -50, 100, -350, 10)
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
names(poker_vector) <- days_vector
names(roulette_vector) <- days_vector
# Select amounts for profitable roulette days: roulette_profits
roulette_profits <- roulette_vector[roulette_vector > 0]
roulette_profits
# Sum of the profitable roulette days: roulette_total_profit
roulette_total_profit <- sum(roulette_profits)
roulette_total_profit
# Number of profitable roulette days: num_profitable_days
num_profitable_days <- sum(roulette_vector > 0)
num_profitable_days
# Conclusion
print("roulette is not our game!")
Preface:
By now, you should have gained some insights on how your casino habits are actually working out for you. In fact, why not decide on changing your game completely? Let's dive into the world of Blackjack for once, and analyze some game outcomes here. In short, blackjack is a game where you have to ask for cards until you arrive at a sum that is as close to 21 as possible. However, if you exceed 21, you've lost. You can be greedy and go for 21, or you can be careful and settle for 16 or so. A player wins when his or her sum, or score, exceeds that of the house.
The sums for the player's last 7 games are stored in player; the house's scores are contained in house. Both are available in the workspace. In both cases, the scores were never higher than 21.
Instructions:
Select the player's score for the third game and assign it to player_third.
Subset the player vector to only select the scores that exceeded the scores of house, so the scores that had the player win. Use subsetting in combination with the relational operator >. Assign the subset to the variable winning_scores.
Count the number of times the score inside player was lower than 18. This time, you should use a relational operator in combination with sum(). Save the resulting value in a new variable, n_low_score.
In [30]:
player <- c(14, 17, 20, 21, 20, 18, 14)
house <- c(20, 15, 21, 20, 20, 17, 19)
# Select the player's score for the third game: player_third
player_third <- player[3]
player_third
# Select the scores where player exceeds house: winning_scores
winning_scores <- player[player > house]
winning_scores
# Count number of times player < 18: n_low_score
n_low_score <- sum( player < 18 )
n_low_score