Probability

  • Disjoint events: the probability that one of them occurs is P(A1) + P(A2) + .... For example, the probabilities of the six faces of a die add up to 1.
  • Event: a set of outcomes.
  • Addition rule, disjoint vs. not disjoint: for disjoint events P(A and B) is automatically zero, so the rule reduces to P(A) + P(B).
P(A or B) = P(A) + P(B) − P(A and B)
  • Probability distribution: the listed outcomes are disjoint, each probability is between 0 and 1, and the probabilities add up to 1.
  • Sample space: the list of all possible outcomes. For a die, if the event is {2,3}, its complement is {1,4,5,6}.
  • Independent events: knowing that one occurred tells us nothing about whether the other occurred. For independent events, the multiplication rule gives the probability that A and B both occur:
P(A and B) = P(A) × P(B)
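
A minimal sketch of the addition rule, the complement, and the multiplication rule on a die. It assumes a fair six-sided die; the helper `prob` and the second event `B` are made up for illustration and are not part of the notes.

```r
# Sample space of a six-sided die (assumed fair, so every outcome is equally likely)
sample_space <- 1:6

# Hypothetical helper: probability of an event given as a vector of outcomes
prob <- function(event) length(event) / length(sample_space)

A <- c(2, 3)      # the event used in the notes
B <- c(3, 4, 5)   # hypothetical second event; overlaps A at 3, so not disjoint

# General addition rule vs. counting the union directly
p_a_or_b_rule   <- prob(A) + prob(B) - prob(intersect(A, B))
p_a_or_b_direct <- prob(union(A, B))

# Complement of A and its probability
complement_A <- setdiff(sample_space, A)
p_complement <- 1 - prob(A)

# Multiplication rule: two separate rolls are independent
p_two_sixes <- prob(6) * prob(6)

print(c(p_a_or_b_rule, p_a_or_b_direct, p_complement, p_two_sixes))
```

Both ways of computing P(A or B) should agree, because the subtraction removes the outcome 3 that would otherwise be counted twice.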

  • For example, since gender, handedness, and a person's position in the sample (first, second, or third) are independent, we can multiply the probabilities together. For any one person, P(female and right-handed) = P(female) × P(right-handed). If the first two people are male and right-handed (MR) and the third is female and left-handed (FL), the probability is:
P(MR)^2 × P(FL)
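
The same calculation as a small sketch. The notes do not give the underlying probabilities, so `p_male` and `p_right` below are assumed values chosen only to make the arithmetic concrete.

```r
# Assumed probabilities (hypothetical, not from the notes)
p_male  <- 0.5
p_right <- 0.9

# Gender and handedness are treated as independent, so the joint
# probabilities are products of the individual ones
p_MR <- p_male * p_right               # male and right-handed
p_FL <- (1 - p_male) * (1 - p_right)   # female and left-handed

# The three people are independent of each other as well:
# first two MR, third FL
p_sequence <- p_MR^2 * p_FL
print(p_sequence)
```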


  • Marginal probability: a row or column total as a proportion of the total population. It does not take the other variable into account.
  • Joint probability: the intersection of two variables, as a proportion of the total population.
  • Conditional probability:
P(A|B) = P(A and B) / P(B), or equivalently P(A and B) = P(A|B) × P(B)
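
A sketch of how the three kinds of probability come out of a contingency table. The counts below are invented for illustration; only the formulas come from the notes.

```r
# Hypothetical 2x2 table of counts: gender (rows) by handedness (columns)
counts <- matrix(c(40, 10,
                   35, 15),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("male", "female"),
                                 hand   = c("right", "left")))
total <- sum(counts)

# Marginal: a row or column total over the grand total
p_female <- sum(counts["female", ]) / total
p_right  <- sum(counts[, "right"]) / total

# Joint: a single cell over the grand total
p_female_and_right <- counts["female", "right"] / total

# Conditional: P(female | right) = P(female and right) / P(right)
p_female_given_right <- p_female_and_right / p_right

print(c(p_female, p_right, p_female_and_right, p_female_given_right))
```

Multiplying `p_female_given_right` by `p_right` gives back the joint probability, which is the general multiplication rule.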

  • We can use the general multiplication rule together with marginal and conditional probabilities.
  • With replacement, the same item can be picked over and over again, so every pick is independent. Without replacement, the picks are dependent events.
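
A sketch contrasting the two sampling schemes. The bag of 5 red and 3 blue marbles is a made-up example; the second draw without replacement uses the general multiplication rule P(A and B) = P(B|A) × P(A).

```r
# Hypothetical bag: 5 red and 3 blue marbles
red   <- 5
blue  <- 3
total <- red + blue

# With replacement: the bag is unchanged, so the draws are independent
p_two_red_with <- (red / total) * (red / total)

# Without replacement: the second draw depends on the first,
# P(red second | red first) = (red - 1) / (total - 1)
p_two_red_without <- (red / total) * ((red - 1) / (total - 1))

print(c(p_two_red_with, p_two_red_without))
```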

Distributions

To calculate a z-score


In [3]:
%%R

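mean = 1500   # population mean
sd = 300      # population standard deviation
d = 1800      # observed value
(d-mean)/sd   # z-score: distance from the mean in units of sd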


[1] 1

To calculate the percentiles


In [9]:
%%R

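mean = 1500
sd = 300
point = 2100
LT = TRUE   # TRUE gives P(X <= point), FALSE gives P(X > point)

# percentile (cumulative probability) of the observed value
pnorm(point,mean=mean,sd=sd,lower.tail=LT)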


[1] 0.9772499

In [11]:
%%R

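mean = 1500
sd = 300
percentile = 0.4
LT = TRUE   # TRUE means the percentile is measured from the lower tail

# value at the given percentile (the inverse of pnorm)
qnorm(percentile,mean=mean,sd=sd,lower.tail=LT)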


[1] 1423.996
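
A short sketch showing that `pnorm` and `qnorm` undo each other, and that standardizing to a z-score first gives the same percentile. It reuses the mean 1500, sd 300, and point 2100 from the cells above; the names `mu` and `sigma` are chosen here to avoid masking R's `mean()` and `sd()` functions.

```r
mu    <- 1500
sigma <- 300
point <- 2100

# Value -> percentile
p <- pnorm(point, mean = mu, sd = sigma)

# Percentile -> value: recovers the original point
back <- qnorm(p, mean = mu, sd = sigma)

# Standardize first, then use the default standard normal (mean 0, sd 1)
z   <- (point - mu) / sigma
p_z <- pnorm(z)

print(c(p, back, z, p_z))
```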

Pnorm ranges


In [30]:
%%R

mean = 70
sd = 3.3
lower = 69
upper = 74
LT = TRUE
# IN RANGE: 1 minus the area above upper and the area below lower
1 - pnorm(upper,mean=mean,sd=sd,lower.tail=!LT) - pnorm(lower,mean=mean,sd=sd,lower.tail=LT)

# OUT RANGE: area above upper plus area below lower
# pnorm(upper,mean=mean,sd=sd,lower.tail=!LT) + pnorm(lower,mean=mean,sd=sd,lower.tail=LT)


[1] 0.5063336
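
An equivalent way to get the in-range probability is to subtract two lower-tail areas. This is a sketch using the same numbers as the cell above, with `mu` and `sigma` as stand-in names for the mean and sd.

```r
mu    <- 70
sigma <- 3.3
lower <- 69
upper <- 74

# In range: area below upper minus area below lower
p_in <- pnorm(upper, mean = mu, sd = sigma) - pnorm(lower, mean = mu, sd = sigma)

# Out of range: area below lower plus area above upper
p_out <- pnorm(lower, mean = mu, sd = sigma) +
  pnorm(upper, mean = mu, sd = sigma, lower.tail = FALSE)

print(c(p_in, p_out, p_in + p_out))
```

The in-range and out-of-range probabilities are complements, so they should sum to 1.
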
  • The standard normal distribution has mean 0 and sd 1. These are the default arguments of pnorm and qnorm in R.
  • Always draw the normal distribution and shade the area we're interested in.
  • When evaluating a distribution with a normal probability plot, imagine a straight line through the points. If the points deviate from the line at the high end, the data are right skewed; if they deviate at the low end, the data are left skewed.
  • To construct a normal probability plot, find the percentile of each observation, convert that percentile to the matching Z-score of a standard normal distribution, and make a scatter plot of the Z-scores (horizontal) against the observations (vertical).
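
A sketch of building a normal probability plot by hand. The data are simulated here (an assumption, since the notes contain no dataset), and `ppoints` is used for the percentiles so that none of them is exactly 0 or 1. Base R's `qqnorm` and `qqline` produce the same kind of plot directly.

```r
set.seed(1)                             # arbitrary seed, only for reproducibility
obs <- rnorm(50, mean = 70, sd = 3.3)   # simulated observations (hypothetical data)

n <- length(obs)
sorted_obs    <- sort(obs)
percentiles   <- ppoints(n)             # plotting positions strictly between 0 and 1
theoretical_z <- qnorm(percentiles)     # Z-score matching each percentile

# Theoretical Z-scores on the horizontal axis, observations on the vertical axis
plot(theoretical_z, sorted_obs,
     xlab = "Theoretical Z-score", ylab = "Observation")

# Reference line: observation = mean + sd * Z
abline(a = mean(obs), b = sd(obs))
```

Points that stay close to the line suggest the data are roughly normal; systematic bending at one end suggests skew.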
