Day 5: Introduction to Correlation

Objective

In this challenge, we practice calculating correlation. Check out the Resources tab to learn more!

Task

You are provided the popularity scores for a set of juices (the higher, the better): [10, 9.8, 8, 7.8, 7.7, 7, 6, 5, 4, 2]

These are the respective prices for the juices: [200, 44, 32, 24, 22, 17, 15, 12, 8, 4]

Write a program computing (or calculate manually) the Pearson coefficient and the Spearman Rank coefficient of correlation between these values.
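For the "calculate manually" route, the two coefficients can be sketched in plain Python without any libraries. This is a minimal illustration, not the notebook's own method: the helper names `pearson`, `ranks`, and `spearman` are hypothetical, and Spearman is computed here as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: co-deviation sum divided by the root of the product of squared-deviation sums."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

def ranks(values):
    """Rank values from 1 (smallest) upward; tied values get their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(xs), ranks(ys))

popularity = [10, 9.8, 8, 7.8, 7.7, 7, 6, 5, 4, 2]
price = [200, 44, 32, 24, 22, 17, 15, 12, 8, 4]

print(round(pearson(popularity, price), 3))   # 0.612
print(round(spearman(popularity, price), 3))  # 1.0
```

Both lists are strictly decreasing, so every juice has the same rank in both, which is why the rank-based coefficient comes out as exactly 1 while the Pearson value is pulled down by the outlying price of 200.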



In [5]:

    
# #Python Import Libraries
import scipy
from scipy import stats

# #Data
arr_popularity = [10, 9.8, 8, 7.8, 7.7, 7, 6, 5, 4, 2]
arr_price = [200, 44, 32, 24, 22, 17, 15, 12, 8, 4]



In [8]:

    
scipy.stats.pearsonr(arr_popularity, arr_price)

    Out[8]:

(0.61247219372084816, 0.05978461460708815)



In [9]:

    
scipy.stats.spearmanr(arr_popularity, arr_price)

    Out[9]:

(1.0, 0.0)
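The Spearman value of 1.0 can also be checked by hand. Since neither list contains ties, the usual no-ties shortcut formula applies, where d_i is the difference between the two ranks of juice i and n = 10. Both lists are strictly decreasing, so the ranks coincide and every d_i is 0:

```latex
\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^{2}}{n\,(n^{2} - 1)}
     = 1 - \frac{6 \cdot 0}{10\,(10^{2} - 1)}
     = 1
```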

Day 5: Introduction to Linear Regression

Objective

In this challenge, we practice using linear regression techniques. Check out the Resources tab to learn more!

Task

You are given the Math aptitude test (x) scores for a set of students, as well as their respective scores for a Statistics course (y). The students enrolled in Statistics immediately after taking the math aptitude test.

The scores (x, y) for each student are:

(95,85)

(85,95)

(80,70)

(70,65)

(60,70)

If a student scored an 80 on the Math aptitude test, what score would we expect her to achieve in Statistics?

Determine the equation of the best-fit line using the least squares method, and then compute the value of y when x=80.
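The least squares method can be worked through by hand before turning to a library. The following is a minimal sketch in plain Python, assuming the usual closed-form solution for a single predictor (slope = S_xy / S_xx, intercept = mean_y - slope * mean_x); the function name `least_squares_fit` is hypothetical.

```python
def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the least squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sum of squared x-deviations
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# (Math aptitude score, Statistics score) per student, from the task statement
scores = [(95, 85), (85, 95), (80, 70), (70, 65), (60, 70)]
xs = [x for x, _ in scores]
ys = [y for _, y in scores]

slope, intercept = least_squares_fit(xs, ys)
prediction = slope * 80 + intercept

print(round(slope, 4))       # 0.6438
print(round(intercept, 4))   # 26.7808
print(round(prediction, 1))  # 78.3
```

For this data the means are 78 and 77, S_xy = 470 and S_xx = 730, so the slope is 470/730 and the fitted line passes through (78, 77).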



In [60]:

    
# #Python Import Libraries
from scipy import stats



In [54]:

    
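# #Data: (x, y) = (Math aptitude score, Statistics score) per student
arr_data = [(95, 85), (85, 95), (80, 70), (70, 65), (60, 70)]
arr_x = [i[0] for i in arr_data]
arr_y = [i[1] for i in arr_data]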



In [57]:

    
stats.linregress(arr_x, arr_y)

    Out[57]:

(0.64383561643835618,
 26.780821917808218,
 0.69305252981930043,
 0.19446749009400915,
 0.38664772840212874)



In [58]:

    
m, c, r_val, p_val, err = stats.linregress(arr_x, arr_y)



In [61]:

    
# #y = mx + c
m*80 + c

    Out[61]:

78.287671232876704

Answer: 78.3


