About The SAT by Felix Gaye and Dainel Greenberg

Is there a relationship between household income per capita and SAT scores? May 2016

The SAT is a college entrance exam created by the College Board. A majority of colleges in the United States use it as a baseline metric for judging applicants. Because high schools vary in difficulty, it is often unfair to compare students based on GPA alone. An exam like the SAT serves as a way to level the playing field: everyone is given a chance to compete directly with their peers. The exam is administered seven times a year, and students can retake it as many times as they see fit.

Although this exam is meant to serve as an equalizer, we are interested in seeing whether average household income plays a role in how high SAT scores are. Since we assume that it does, the more important question is how large the effect is, and whether it is a big enough factor to account for when comparing students.

Packages Imported

We use matplotlib.pyplot to draw scatter plots and pandas for data analysis and manipulation. We also import numpy for numerical calculations and statsmodels.formula.api to fit the OLS regressions.



In [1]:

    
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
%matplotlib inline

Creating the Dataset

We used SAT scores from 2014. Our data is all downloadable directly from the College Board website. We organized our data alphabetically by state, and our analysis focuses on the combined total SAT score across reading, writing, and math. The data was organized into one column per subject, saved as a CSV file, and read using pandas.



In [2]:

    
file1 = 'C:/Users/felgaye/Documents/Data Bootcamp Data sheet.csv'
df1 = pd.read_csv(file1)
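
As a quick sanity check on the dataset description, the sketch below verifies that the Total column equals the sum of the three section scores. It rebuilds a three-row sample by hand from the data in this notebook (the names `sample`, `recomputed`, and `totals_match` are ours, not part of the original analysis); on the full data the same comparison could be run on `df1`.

```python
import pandas as pd

# Three rows copied from the dataset; `sample` is an illustrative name.
sample = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Arizona"],
    "Reading": [547, 507, 522],
    "Math": [538, 503, 525],
    "Writing": [532, 475, 500],
    "Total": [1617, 1485, 1547],
})

# Recompute the combined score and compare it with the stored Total column.
recomputed = sample[["Reading", "Math", "Writing"]].sum(axis=1)
totals_match = bool((recomputed == sample["Total"]).all())
print(totals_match)
```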



In [3]:

    
df1









    Out[3]:






  
    
      
    State                 Reading  Math  Writing  Total  Income
 0  Alabama                   547   538      532   1617   42278
 1  Alaska                    507   503      475   1485   67629
 2  Arizona                   522   525      500   1547   49254
 3  Arkansas                  573   571      554   1698   44922
 4  California                498   510      496   1504   60487
 5  Colorado                  582   586      567   1735   60940
 6  Connecticut               507   510      508   1525   70161
 7  Delaware                  456   459      444   1359   57522
 8  District of Columbia      440   438      431   1309   68277
 9  Florida                   491   485      472   1448   46140
10  Georgia                   488   485      472   1445   49555
11  Hawaii                    484   504      472   1460   71223
12  Idaho                     458   456      450   1364   53438
13  Illinois                  599   616      587   1802   54916
14  Indiana                   497   500      477   1474   48060
15  Iowa                      605   611      578   1794   57810
16  Kansas                    591   596      566   1753   53444
17  Kentucky                  589   585      572   1746   42786
18  Louisiana                 561   556      550   1667   42406
19  Maine                     467   471      449   1387   51710
20  Maryland                  492   495      481   1468   76165
21  Massachusetts             516   531      509   1556   63151
22  Michigan                  593   610      581   1784   52005
23  Minnesota                 598   610      578   1786   67244
24  Mississippi               583   566      565   1714   35521
25  Missouri                  595   597      579   1771   56630
26  Montana                   555   552      530   1637   51102
27  Nebraska                  589   587      569   1745   56870
28  Nevada                    495   494      469   1458   49875
29  New Hampshire             524   530      512   1566   73397
30  New Jersey                501   523      502   1526   65243
31  New Mexico                548   543      526   1617   46686
32  New York                  488   502      478   1468   54310
33  North Carolina            499   507      477   1483   46784
34  North Dakota              612   620      584   1816   60730
35  Ohio                      555   562      535   1652   49644
36  Oklahoma                  576   571      550   1697   47199
37  Oregon                    523   522      499   1544   58875
38  Pennsylvania              497   504      480   1481   55173
39  Rhode Island              497   496      487   1480   58633
40  South Carolina            488   490      465   1443   44929
41  South Dakota              604   609      579   1792   53053
42  Tennessee                 578   570      566   1714   43716
43  Texas                     476   495      461   1432   53875
44  Utah                      571   568      551   1690   63383
45  Vermont                   522   525      507   1554   60708
46  Virginia                  518   515      497   1530   66155
47  Washington                510   518      491   1519   59068
48  West Virginia             517   505      500   1522   39552
49  Wisconsin                 596   608      578   1782   58080
50  Wyoming                   590   599      573   1762   55690
51  NaN                       NaN   NaN      NaN    NaN     NaN
52  NaN                       NaN   NaN      NaN    NaN     NaN
53  NaN                       NaN   NaN      NaN    NaN     NaN
54  NaN                       NaN   NaN      NaN    NaN     NaN
55  NaN                       NaN   NaN      NaN    NaN     NaN
56  NaN                       NaN   NaN      NaN    NaN     NaN
57  NaN                       NaN   NaN      NaN    NaN     NaN
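
The seven all-NaN rows at the bottom of the output (index 51 to 57) are most likely empty spreadsheet rows picked up when the CSV was read. The regression summaries in this notebook report 51 observations, so those rows do not enter the fits, but dropping them explicitly keeps the table tidy. A minimal sketch, using a small hand-built frame in place of the real file (`raw` and `clean` are illustrative names):

```python
import numpy as np
import pandas as pd

# Two real rows from the table plus two empty rows, mimicking the CSV import.
raw = pd.DataFrame({
    "State": ["Alabama", "Alaska", np.nan, np.nan],
    "Total": [1617, 1485, np.nan, np.nan],
    "Income": [42278, 67629, np.nan, np.nan],
})

# Drop rows where every column is missing, then renumber the index.
clean = raw.dropna(how="all").reset_index(drop=True)
print(len(raw), len(clean))
```

On the full data the equivalent call would be `df1 = df1.dropna(how="all")`.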

Graphs

We looked at household income and its relationship to SAT scores for 2014. For each graph we put the SAT score on the x-axis and income on the y-axis. The resulting scatter plots are shown below.



In [6]:

    
#SAT Reading vs Income
df1.plot.scatter('Reading', 'Income', color = 'r')
plt.title('SAT 2014 Reading scores vs. Income')
plt.xlabel('SAT Reading Score')
plt.ylabel('Income')
plt.show()

#SAT Writing vs Income
df1.plot.scatter('Writing', 'Income', color = 'b')
plt.title('SAT 2014 Writing scores vs Income')
plt.xlabel('SAT Writing Score')
plt.ylabel('Income')
plt.show()

#SAT Math vs Income (plot the Math column, not Writing)
df1.plot.scatter('Math', 'Income', color = 'g')
plt.title('SAT 2014 Math scores vs Income')
plt.xlabel('SAT Math Score')
plt.ylabel('Income')
plt.show()

#SAT Total vs Income
df1.plot.scatter('Total', 'Income', color = 'g')
plt.title('SAT 2014 Total score vs Income')
plt.xlabel('SAT Total Score')
plt.ylabel('Income')
plt.show()



In [7]:

    
lm = smf.ols(formula='Reading ~ Income', data=df1).fit()
lm.params
lm.summary()









    Out[7]:





OLS Regression Results

  Dep. Variable:          Reading        R-squared:             0.038
  Model:                    OLS          Adj. R-squared:        0.018
  Method:              Least Squares     F-statistic:           1.936
  Date:              Sat, 30 Apr 2016    Prob (F-statistic):    0.170
  Time:                  16:56:19        Log-Likelihood:      -268.20
  No. Observations:           51         AIC:                   540.4
  Df Residuals:               49         BIC:                   544.3
  Df Model:                    1
  Covariance Type:       nonrobust

               coef      std err       t       P>|t|  [95.0% Conf. Int.]
  Intercept    590.6732     40.797     14.478   0.000    508.689   672.658
  Income        -0.0010      0.001     -1.391   0.170     -0.002     0.000

  Omnibus:        10.569    Durbin-Watson:         1.667
  Prob(Omnibus):   0.005    Jarque-Bera (JB):      2.933
  Skew:            0.081    Prob(JB):              0.231
  Kurtosis:        1.836    Cond. No.           3.44e+05
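
To make the coefficient table concrete, the fitted line for Reading can be evaluated by hand. The sketch below plugs the rounded coefficients from the Reading summary into the regression equation. Because the slope is printed to only four decimal places, the numbers are approximate, and the function name `predict_reading` is our own.

```python
# Rounded coefficients copied from the Reading ~ Income summary.
INTERCEPT = 590.6732
SLOPE = -0.0010  # change in Reading score per extra dollar of income


def predict_reading(income):
    """Fitted Reading score for a given household income in dollars."""
    return INTERCEPT + SLOPE * income


# Difference in fitted score between two states $10,000 apart in income.
change_per_10k = predict_reading(60000) - predict_reading(50000)
print(round(predict_reading(50000), 1), round(change_per_10k, 1))
```

Read this way, the slope of -0.0010 means a fitted Reading score roughly 10 points lower for every additional $10,000 of income, although the confidence interval for the slope includes zero.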



In [8]:

    
lm = smf.ols(formula='Writing ~ Income', data=df1).fit()
lm.params
lm.summary()









    Out[8]:





OLS Regression Results

  Dep. Variable:          Writing        R-squared:             0.026
  Model:                    OLS          Adj. R-squared:        0.006
  Method:              Least Squares     F-statistic:           1.300
  Date:              Sat, 30 Apr 2016    Prob (F-statistic):    0.260
  Time:                  16:57:37        Log-Likelihood:      -266.33
  No. Observations:           51         AIC:                   536.7
  Df Residuals:               49         BIC:                   540.5
  Df Model:                    1
  Covariance Type:       nonrobust

               coef      std err       t       P>|t|  [95.0% Conf. Int.]
  Intercept    562.1131     39.328     14.293   0.000    483.080   641.146
  Income        -0.0008      0.001     -1.140   0.260     -0.002     0.001

  Omnibus:        14.940    Durbin-Watson:         1.681
  Prob(Omnibus):   0.001    Jarque-Bera (JB):      3.343
  Skew:            0.037    Prob(JB):              0.188
  Kurtosis:        1.748    Cond. No.           3.44e+05



In [9]:

    
lm = smf.ols(formula='Math ~ Income', data=df1).fit()
lm.params
lm.summary()









    Out[9]:





OLS Regression Results

  Dep. Variable:           Math          R-squared:             0.012
  Model:                    OLS          Adj. R-squared:       -0.008
  Method:              Least Squares     F-statistic:          0.5978
  Date:              Sat, 30 Apr 2016    Prob (F-statistic):    0.443
  Time:                  16:58:38        Log-Likelihood:      -269.26
  No. Observations:           51         AIC:                   542.5
  Df Residuals:               49         BIC:                   546.4
  Df Model:                    1
  Covariance Type:       nonrobust

               coef      std err       t       P>|t|  [95.0% Conf. Int.]
  Intercept    569.5981     41.654     13.675   0.000    485.892   653.304
  Income        -0.0006      0.001     -0.773   0.443     -0.002     0.001

  Omnibus:         5.238    Durbin-Watson:         1.586
  Prob(Omnibus):   0.073    Jarque-Bera (JB):      2.233
  Skew:            0.153    Prob(JB):              0.327
  Kurtosis:        2.022    Cond. No.           3.44e+05



In [10]:

    
lm = smf.ols(formula='Total ~ Income', data=df1).fit()
lm.params
lm.summary()









    Out[10]:





OLS Regression Results

  Dep. Variable:           Total         R-squared:             0.024
  Model:                    OLS          Adj. R-squared:        0.004
  Method:              Least Squares     F-statistic:           1.218
  Date:              Sat, 30 Apr 2016    Prob (F-statistic):    0.275
  Time:                  16:59:35        Log-Likelihood:      -323.76
  No. Observations:           51         AIC:                   651.5
  Df Residuals:               49         BIC:                   655.4
  Df Model:                    1
  Covariance Type:       nonrobust

               coef      std err       t       P>|t|  [95.0% Conf. Int.]
  Intercept   1722.3845    121.258     14.204   0.000   1478.707  1966.062
  Income        -0.0024      0.002     -1.104   0.275     -0.007     0.002

  Omnibus:         9.661    Durbin-Watson:         1.640
  Prob(Omnibus):   0.008    Jarque-Bera (JB):      2.840
  Skew:            0.096    Prob(JB):              0.242
  Kurtosis:        1.860    Cond. No.           3.44e+05

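Correlation

The OLS regression of Total score on average household income gives an R-squared of 0.024, which corresponds to a correlation of about -0.16 (the fitted slope is negative). With a p-value of 0.275, the relationship is not statistically significant.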
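
The link between R-squared and the correlation coefficient can be checked directly: in a regression with a single predictor, R-squared is the square of the Pearson correlation r. The sketch below recomputes r for Total against Income with numpy, using the 51 state rows typed in from the table in this notebook (the array and variable names are ours, and numpy's `polyfit` stands in for the statsmodels fit).

```python
import numpy as np

# Total SAT score and income for the 51 rows, in the table's order.
total = np.array([
    1617, 1485, 1547, 1698, 1504, 1735, 1525, 1359, 1309, 1448, 1445,
    1460, 1364, 1802, 1474, 1794, 1753, 1746, 1667, 1387, 1468, 1556,
    1784, 1786, 1714, 1771, 1637, 1745, 1458, 1566, 1526, 1617, 1468,
    1483, 1816, 1652, 1697, 1544, 1481, 1480, 1443, 1792, 1714, 1432,
    1690, 1554, 1530, 1519, 1522, 1782, 1762,
])
income = np.array([
    42278, 67629, 49254, 44922, 60487, 60940, 70161, 57522, 68277, 46140,
    49555, 71223, 53438, 54916, 48060, 57810, 53444, 42786, 42406, 51710,
    76165, 63151, 52005, 67244, 35521, 56630, 51102, 56870, 49875, 73397,
    65243, 46686, 54310, 46784, 60730, 49644, 47199, 58875, 55173, 58633,
    44929, 53053, 43716, 53875, 63383, 60708, 66155, 59068, 39552, 58080,
    55690,
])

# Pearson correlation between income and total score.
r = np.corrcoef(income, total)[0, 1]

# Least-squares slope and intercept of Total on Income (degree-1 fit).
slope, intercept = np.polyfit(income, total, 1)

# R-squared from the fitted line's residuals, for comparison with r**2.
residuals = total - (intercept + slope * income)
r_squared = 1 - residuals.var() / total.var()

print(f"r = {r:.3f}, r^2 = {r**2:.3f}, R^2 = {r_squared:.3f}, slope = {slope:.4f}")
```

If the typed-in values match the notebook's data, the squared correlation should agree with the R-squared of 0.024 in the Total summary, and the sign of r should match the negative slope.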



In [19]:

    
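# Mean score per section across the states (pandas skips the empty NaN rows).
# Stored in a regular variable rather than as an attribute on the numpy module.
mean_scores = (df1.Reading.mean(), df1.Math.mean(), df1.Writing.mean())



In [20]:

    
print(mean_scores)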









    



(534.6666666666666, 537.8235294117648, 517.8627450980392)

Conclusion

Our results reveal that states with a higher average household income do not have higher average SAT scores. The fitted slopes are slightly negative for every section, but the correlation is weak (an R-squared of 0.024 for Total scores) and not statistically significant, so we cannot conclude that income drives state-level SAT scores in either direction. There is a lot of variability in both SAT scores and income, which is most notable when looking at our graphs. Mississippi, the state with the lowest income in our data, is surprisingly on the higher end of SAT scores.

Oddly enough, our data does not support our initial hypothesis. That being said, state-level income on its own isn't indicative of test scores; many other factors need to be taken into consideration in our evaluation. A factor as simple as test participation rate can have a huge impact on our data. Although the exam is taken nationwide, students in some states, specifically those in the Midwest, are more likely to take the ACT. Students in those states who do take the SAT are therefore likely to be more ambitious and to score higher. Discrepancies like these could be a reason our hypothesis was not borne out.

Further Questions

What would happen if we decided to look at each state on an individual level?

Assuming that income plays a large role at the state level, what does that mean for how we should compare SAT scores? Would this negate the SAT's purpose of serving as a metric for comparing students across the United States?
