About The SAT by Felix Gaye and Dainel Greenberg

Is there a relationship between household income per capita and SAT scores?

May 2016

The SAT is a college entrance exam created by the College Board. A majority of colleges in the United States use it as a baseline metric by which to judge applicants. Because high schools vary in difficulty, it is often unfair to compare students on GPA alone; an exam like the SAT levels the playing field by letting every student compete directly with their peers. The exam is administered seven times a year, and students may retake it as many times as they see fit.

Although this exam is meant to serve as an equalizer, we want to see whether average household income plays a role in SAT scores. Since we assume that it does, the more important questions are how large its impact is and whether it is a big enough factor to take into account when comparing students.

Packages Imported

We use matplotlib.pyplot to draw scatter plots and pandas for data analysis and manipulation. We also import numpy for scientific computing and mathematical calculations, and statsmodels.formula.api to fit the OLS regressions.


In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
%matplotlib inline

Creating the Dataset

We used SAT scores from 2014. All of our data can be downloaded directly from the College Board website. We organized the data alphabetically by state, and our analysis focuses on the combined total of the reading, writing, and math scores. The data was organized into one column per subject, saved as a CSV file, and read with pandas.


In [2]:
file1 = 'C:/Users/felgaye/Documents/Data Bootcamp Data sheet.csv'
df1 = pd.read_csv(file1)

In [3]:
df1


Out[3]:
   State                Reading  Math  Writing  Total  Income
0  Alabama                  547   538      532   1617   42278
1  Alaska                   507   503      475   1485   67629
2  Arizona                  522   525      500   1547   49254
3  Arkansas                 573   571      554   1698   44922
4  California               498   510      496   1504   60487
5  Colorado                 582   586      567   1735   60940
6  Connecticut              507   510      508   1525   70161
7  Delaware                 456   459      444   1359   57522
8  District of Columbia     440   438      431   1309   68277
9  Florida                  491   485      472   1448   46140
10 Georgia                  488   485      472   1445   49555
11 Hawaii                   484   504      472   1460   71223
12 Idaho                    458   456      450   1364   53438
13 Illinois                 599   616      587   1802   54916
14 Indiana                  497   500      477   1474   48060
15 Iowa                     605   611      578   1794   57810
16 Kansas                   591   596      566   1753   53444
17 Kentucky                 589   585      572   1746   42786
18 Louisiana                561   556      550   1667   42406
19 Maine                    467   471      449   1387   51710
20 Maryland                 492   495      481   1468   76165
21 Massachusetts            516   531      509   1556   63151
22 Michigan                 593   610      581   1784   52005
23 Minnesota                598   610      578   1786   67244
24 Mississippi              583   566      565   1714   35521
25 Missouri                 595   597      579   1771   56630
26 Montana                  555   552      530   1637   51102
27 Nebraska                 589   587      569   1745   56870
28 Nevada                   495   494      469   1458   49875
29 New Hampshire            524   530      512   1566   73397
30 New Jersey               501   523      502   1526   65243
31 New Mexico               548   543      526   1617   46686
32 New York                 488   502      478   1468   54310
33 North Carolina           499   507      477   1483   46784
34 North Dakota             612   620      584   1816   60730
35 Ohio                     555   562      535   1652   49644
36 Oklahoma                 576   571      550   1697   47199
37 Oregon                   523   522      499   1544   58875
38 Pennsylvania             497   504      480   1481   55173
39 Rhode Island             497   496      487   1480   58633
40 South Carolina           488   490      465   1443   44929
41 South Dakota             604   609      579   1792   53053
42 Tennessee                578   570      566   1714   43716
43 Texas                    476   495      461   1432   53875
44 Utah                     571   568      551   1690   63383
45 Vermont                  522   525      507   1554   60708
46 Virginia                 518   515      497   1530   66155
47 Washington               510   518      491   1519   59068
48 West Virginia            517   505      500   1522   39552
49 Wisconsin                596   608      578   1782   58080
50 Wyoming                  590   599      573   1762   55690
51 NaN                      NaN   NaN      NaN    NaN     NaN
52 NaN                      NaN   NaN      NaN    NaN     NaN
53 NaN                      NaN   NaN      NaN    NaN     NaN
54 NaN                      NaN   NaN      NaN    NaN     NaN
55 NaN                      NaN   NaN      NaN    NaN     NaN
56 NaN                      NaN   NaN      NaN    NaN     NaN
57 NaN                      NaN   NaN      NaN    NaN     NaN
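The last seven rows of Out[3] are entirely NaN, most likely empty rows left at the end of the spreadsheet export. The plots skip them and statsmodels drops them (the regressions report 51 observations), but removing them up front keeps the DataFrame tidy. A minimal sketch, assuming the blank rows come through as lines of bare commas; it uses an inline CSV with three rows copied from the table in place of the real file and also checks that Total equals the sum of the three section scores:

```python
import io

import pandas as pd

# Three rows copied from Out[3], followed by two all-empty rows that mimic
# the trailing blank rows in the exported spreadsheet (an assumption).
csv_text = """State,Reading,Math,Writing,Total,Income
Alabama,547,538,532,1617,42278
Alaska,507,503,475,1485,67629
Arizona,522,525,500,1547,49254
,,,,,
,,,,,
"""

raw = pd.read_csv(io.StringIO(csv_text))

# Drop rows in which every column is NaN, then restore integer dtypes
# (the NaN rows forced the numeric columns to float64).
df = raw.dropna(how="all").astype(
    {"Reading": int, "Math": int, "Writing": int, "Total": int, "Income": int}
)

# Sanity check: the combined score should equal the sum of the three sections.
sections = df[["Reading", "Math", "Writing"]].sum(axis=1)
totals_match = bool((sections == df["Total"]).all())

print(len(raw), len(df), totals_match)
```

In the notebook the equivalent one-liner would be `df1 = pd.read_csv(file1).dropna(how='all')`.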

Graphs

We looked at household income and its relationship to SAT scores in 2014. For each graph we put the SAT score on the x-axis and income on the y-axis. The resulting scatter plots are shown below.


In [6]:
#SAT Reading vs Income
df1.plot.scatter('Reading', 'Income', color = 'r')
plt.title('SAT 2014 Reading scores vs. Income')
plt.xlabel('SAT Reading Score')
plt.ylabel('Income')
plt.show()

#SAT Writing vs Income
df1.plot.scatter('Writing', 'Income', color = 'b')
plt.title('SAT 2014 Writing scores vs Income')
plt.xlabel('SAT Writing Score')
plt.ylabel('Income')
plt.show()

#SAT Math vs Income
df1.plot.scatter('Math', 'Income', color = 'g')
plt.title('SAT 2014 Math scores vs Income')
plt.xlabel('SAT Math Score')
plt.ylabel('Income')
plt.show()

#SAT Total vs Income
df1.plot.scatter('Total', 'Income', color = 'g')
plt.title('SAT 2014 Total score vs Income')
plt.xlabel('SAT Total Score')
plt.ylabel('Income')
plt.show()
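The four plotting blocks in this cell repeat the same five lines and differ only in the column, the axis label, and the colour, which makes it easy for a column name and its label to fall out of step. A minimal sketch that keeps each panel's column and label in one tuple and draws all four panels on a single 2x2 figure; `PANELS` and `plot_scores_vs_income` are our own illustrative names, and the sample rows are copied from Out[3] so the block runs on its own:

```python
import matplotlib.pyplot as plt
import pandas as pd

# (column, x-axis label, colour) for each panel. Keeping the column and its
# label in one tuple means they cannot drift apart.
PANELS = [
    ("Reading", "SAT Reading Score", "r"),
    ("Writing", "SAT Writing Score", "b"),
    ("Math", "SAT Math Score", "g"),
    ("Total", "SAT Total Score", "k"),
]


def plot_scores_vs_income(df):
    """Draw one scatter panel per SAT column: score on x, income on y."""
    fig, axes = plt.subplots(2, 2, figsize=(10, 8))
    for ax, (column, label, colour) in zip(axes.flat, PANELS):
        df.plot.scatter(x=column, y="Income", color=colour, ax=ax)
        ax.set_title(f"SAT 2014 {column} score vs. Income")
        ax.set_xlabel(label)
        ax.set_ylabel("Income")
    fig.tight_layout()
    return fig, axes


# A few rows copied from Out[3]; in the notebook, pass df1 instead.
sample = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Arizona", "Arkansas"],
    "Reading": [547, 507, 522, 573],
    "Math": [538, 503, 525, 571],
    "Writing": [532, 475, 500, 554],
    "Total": [1617, 1485, 1547, 1698],
    "Income": [42278, 67629, 49254, 44922],
})

fig, axes = plot_scores_vs_income(sample)
```

With `%matplotlib inline` active, the figure is displayed at the end of the cell without an explicit `plt.show()`.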



In [7]:
lm = smf.ols(formula='Reading ~ Income', data=df1).fit()
lm.params
lm.summary()


Out[7]:
OLS Regression Results
Dep. Variable: Reading R-squared: 0.038
Model: OLS Adj. R-squared: 0.018
Method: Least Squares F-statistic: 1.936
Date: Sat, 30 Apr 2016 Prob (F-statistic): 0.170
Time: 16:56:19 Log-Likelihood: -268.20
No. Observations: 51 AIC: 540.4
Df Residuals: 49 BIC: 544.3
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
Intercept 590.6732 40.797 14.478 0.000 508.689 672.658
Income -0.0010 0.001 -1.391 0.170 -0.002 0.000
Omnibus: 10.569 Durbin-Watson: 1.667
Prob(Omnibus): 0.005 Jarque-Bera (JB): 2.933
Skew: 0.081 Prob(JB): 0.231
Kurtosis: 1.836 Cond. No. 3.44e+05

In [8]:
lm = smf.ols(formula='Writing ~ Income', data=df1).fit()
lm.params
lm.summary()


Out[8]:
OLS Regression Results
Dep. Variable: Writing R-squared: 0.026
Model: OLS Adj. R-squared: 0.006
Method: Least Squares F-statistic: 1.300
Date: Sat, 30 Apr 2016 Prob (F-statistic): 0.260
Time: 16:57:37 Log-Likelihood: -266.33
No. Observations: 51 AIC: 536.7
Df Residuals: 49 BIC: 540.5
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
Intercept 562.1131 39.328 14.293 0.000 483.080 641.146
Income -0.0008 0.001 -1.140 0.260 -0.002 0.001
Omnibus: 14.940 Durbin-Watson: 1.681
Prob(Omnibus): 0.001 Jarque-Bera (JB): 3.343
Skew: 0.037 Prob(JB): 0.188
Kurtosis: 1.748 Cond. No. 3.44e+05

In [9]:
lm = smf.ols(formula='Math ~ Income', data=df1).fit()
lm.params
lm.summary()


Out[9]:
OLS Regression Results
Dep. Variable: Math R-squared: 0.012
Model: OLS Adj. R-squared: -0.008
Method: Least Squares F-statistic: 0.5978
Date: Sat, 30 Apr 2016 Prob (F-statistic): 0.443
Time: 16:58:38 Log-Likelihood: -269.26
No. Observations: 51 AIC: 542.5
Df Residuals: 49 BIC: 546.4
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
Intercept 569.5981 41.654 13.675 0.000 485.892 653.304
Income -0.0006 0.001 -0.773 0.443 -0.002 0.001
Omnibus: 5.238 Durbin-Watson: 1.586
Prob(Omnibus): 0.073 Jarque-Bera (JB): 2.233
Skew: 0.153 Prob(JB): 0.327
Kurtosis: 2.022 Cond. No. 3.44e+05

In [10]:
lm = smf.ols(formula='Total ~ Income', data=df1).fit()
lm.params
lm.summary()


Out[10]:
OLS Regression Results
Dep. Variable: Total R-squared: 0.024
Model: OLS Adj. R-squared: 0.004
Method: Least Squares F-statistic: 1.218
Date: Sat, 30 Apr 2016 Prob (F-statistic): 0.275
Time: 16:59:35 Log-Likelihood: -323.76
No. Observations: 51 AIC: 651.5
Df Residuals: 49 BIC: 655.4
Df Model: 1
Covariance Type: nonrobust
coef std err t P>|t| [95.0% Conf. Int.]
Intercept 1722.3845 121.258 14.204 0.000 1478.707 1966.062
Income -0.0024 0.002 -1.104 0.275 -0.007 0.002
Omnibus: 9.661 Durbin-Watson: 1.640
Prob(Omnibus): 0.008 Jarque-Bera (JB): 2.840
Skew: 0.096 Prob(JB): 0.242
Kurtosis: 1.860 Cond. No. 3.44e+05
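Each summary reports a condition number of 3.44e+05, and the Income coefficients look close to zero, largely because income is measured in dollars while scores are measured in points. A minimal sketch that refits all four regressions with income rescaled to thousands of dollars and collects slope, p-value, and R-squared in one table. Rescaling changes only the units of the slope (points per $1,000 instead of points per dollar); R-squared and p-values are unchanged. The `Income_k` column and the loop are our own illustrative additions; the data are the 51 complete rows of Out[3]:

```python
import pandas as pd
import statsmodels.formula.api as smf

# The 51 complete rows of Out[3], in the table's alphabetical order.
reading = [547, 507, 522, 573, 498, 582, 507, 456, 440, 491, 488, 484, 458,
           599, 497, 605, 591, 589, 561, 467, 492, 516, 593, 598, 583, 595,
           555, 589, 495, 524, 501, 548, 488, 499, 612, 555, 576, 523, 497,
           497, 488, 604, 578, 476, 571, 522, 518, 510, 517, 596, 590]
math_scores = [538, 503, 525, 571, 510, 586, 510, 459, 438, 485, 485, 504, 456,
               616, 500, 611, 596, 585, 556, 471, 495, 531, 610, 610, 566, 597,
               552, 587, 494, 530, 523, 543, 502, 507, 620, 562, 571, 522, 504,
               496, 490, 609, 570, 495, 568, 525, 515, 518, 505, 608, 599]
writing = [532, 475, 500, 554, 496, 567, 508, 444, 431, 472, 472, 472, 450,
           587, 477, 578, 566, 572, 550, 449, 481, 509, 581, 578, 565, 579,
           530, 569, 469, 512, 502, 526, 478, 477, 584, 535, 550, 499, 480,
           487, 465, 579, 566, 461, 551, 507, 497, 491, 500, 578, 573]
income = [42278, 67629, 49254, 44922, 60487, 60940, 70161, 57522, 68277,
          46140, 49555, 71223, 53438, 54916, 48060, 57810, 53444, 42786,
          42406, 51710, 76165, 63151, 52005, 67244, 35521, 56630, 51102,
          56870, 49875, 73397, 65243, 46686, 54310, 46784, 60730, 49644,
          47199, 58875, 55173, 58633, 44929, 53053, 43716, 53875, 63383,
          60708, 66155, 59068, 39552, 58080, 55690]

df = pd.DataFrame({"Reading": reading, "Math": math_scores,
                   "Writing": writing, "Income": income})
df["Total"] = df["Reading"] + df["Math"] + df["Writing"]

# Illustrative helper column (not in the original notebook): income in $1,000s.
df["Income_k"] = df["Income"] / 1000

rows = {}
for score in ["Reading", "Writing", "Math", "Total"]:
    fit = smf.ols(f"{score} ~ Income_k", data=df).fit()
    rows[score] = {
        "slope_per_1000": fit.params["Income_k"],  # points per $1,000
        "p_value": fit.pvalues["Income_k"],
        "r_squared": fit.rsquared,
    }

# One row per regression, in the same order as the notebook cells.
results = pd.DataFrame(rows).T
print(results.round(3))
```

Read this way, the Total slope of -0.0024 points per dollar becomes roughly -2.4 points per $1,000 of household income.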

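Correlation

The OLS results for Total ~ Income give an R-squared of 0.024, meaning that average household income explains only about 2.4% of the state-to-state variation in total SAT scores. With a single predictor, the correlation coefficient is the square root of R-squared with the sign of the slope; because the Income coefficient is negative (-0.0024 points per dollar), the correlation between total scores and average household income is about -0.15. With p = 0.275, it is not statistically significant. The reading, writing, and math regressions tell the same story: R-squared between 0.012 and 0.038, negative slopes, and p-values of 0.17 or higher.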
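A minimal sketch that computes the correlation between Total and Income directly from the 51 complete rows of Out[3], rather than reading it off the regression output. In a one-predictor OLS fit, r squared equals R-squared and r takes the sign of the slope, so r squared should match the 0.024 in Out[10] and r should be negative like the Income coefficient. `np.polyfit` is used here as a lightweight stand-in for the statsmodels fit; it should reproduce the same slope and intercept:

```python
import numpy as np

# Total score and income for the 51 complete rows of Out[3], same order.
total = np.array([
    1617, 1485, 1547, 1698, 1504, 1735, 1525, 1359, 1309,
    1448, 1445, 1460, 1364, 1802, 1474, 1794, 1753, 1746,
    1667, 1387, 1468, 1556, 1784, 1786, 1714, 1771, 1637,
    1745, 1458, 1566, 1526, 1617, 1468, 1483, 1816, 1652,
    1697, 1544, 1481, 1480, 1443, 1792, 1714, 1432, 1690,
    1554, 1530, 1519, 1522, 1782, 1762,
])
income = np.array([
    42278, 67629, 49254, 44922, 60487, 60940, 70161, 57522, 68277,
    46140, 49555, 71223, 53438, 54916, 48060, 57810, 53444, 42786,
    42406, 51710, 76165, 63151, 52005, 67244, 35521, 56630, 51102,
    56870, 49875, 73397, 65243, 46686, 54310, 46784, 60730, 49644,
    47199, 58875, 55173, 58633, 44929, 53053, 43716, 53875, 63383,
    60708, 66155, 59068, 39552, 58080, 55690,
])

# Pearson correlation coefficient between total score and income.
r = np.corrcoef(total, income)[0, 1]

# Least-squares line Total = intercept + slope * Income; polyfit returns the
# highest-degree coefficient first.
slope, intercept = np.polyfit(income, total, deg=1)

print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}, slope = {slope:.4f} points per dollar")
```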


In [19]:
# Mean Reading, Math and Writing score across the 51 rows (pandas skips the empty NaN rows)
means = (df1.Reading.mean(), df1.Math.mean(), df1.Writing.mean())

In [20]:
print(means)


(534.6666666666666, 537.8235294117648, 517.8627450980392)

Conclusion

Our results do not show that states with a higher average household income have higher SAT scores. The correlation between income and total score is weak and slightly negative (about -0.15), far too weak to support the conclusion that higher income causes higher SAT scores. There is a lot of variability in both SAT scores and income, which is most noticeable in our graphs: the state with the lowest income, Mississippi, is surprisingly on the higher end of SAT scores. Our data therefore do not support our initial hypothesis, and at the state level income is not a good indicator of test scores.

Many other factors need to be taken into consideration. Something as simple as the test participation rate can have a large impact on our data. Although the exam is offered nationwide, students in some states, particularly in the Midwest, are more likely to take the ACT. The students in those states who do take the SAT are likely to be a more ambitious, self-selected group who also score higher. Differences like these may explain why our hypothesis was not supported.
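The remark about the lowest-income state is easy to check in pandas. A minimal sketch that uses `idxmin` to find the row with the lowest income and `rank` to place its total score among all 51 rows; the data are copied inline from Out[3] so the block runs on its own, and the `TotalRank` column is our own illustrative addition:

```python
import pandas as pd

# State, Total and Income for the 51 complete rows of Out[3].
states = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "District of Columbia", "Florida", "Georgia",
    "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky",
    "Louisiana", "Maine", "Maryland", "Massachusetts", "Michigan",
    "Minnesota", "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada",
    "New Hampshire", "New Jersey", "New Mexico", "New York",
    "North Carolina", "North Dakota", "Ohio", "Oklahoma", "Oregon",
    "Pennsylvania", "Rhode Island", "South Carolina", "South Dakota",
    "Tennessee", "Texas", "Utah", "Vermont", "Virginia", "Washington",
    "West Virginia", "Wisconsin", "Wyoming",
]
total = [
    1617, 1485, 1547, 1698, 1504, 1735, 1525, 1359, 1309,
    1448, 1445, 1460, 1364, 1802, 1474, 1794, 1753, 1746,
    1667, 1387, 1468, 1556, 1784, 1786, 1714, 1771, 1637,
    1745, 1458, 1566, 1526, 1617, 1468, 1483, 1816, 1652,
    1697, 1544, 1481, 1480, 1443, 1792, 1714, 1432, 1690,
    1554, 1530, 1519, 1522, 1782, 1762,
]
income = [
    42278, 67629, 49254, 44922, 60487, 60940, 70161, 57522, 68277,
    46140, 49555, 71223, 53438, 54916, 48060, 57810, 53444, 42786,
    42406, 51710, 76165, 63151, 52005, 67244, 35521, 56630, 51102,
    56870, 49875, 73397, 65243, 46686, 54310, 46784, 60730, 49644,
    47199, 58875, 55173, 58633, 44929, 53053, 43716, 53875, 63383,
    60708, 66155, 59068, 39552, 58080, 55690,
]

df = pd.DataFrame({"State": states, "Total": total, "Income": income})

# Rank states by total score: 1 = highest, tied states share the better rank.
df["TotalRank"] = df["Total"].rank(ascending=False, method="min").astype(int)

# The row with the lowest household income.
poorest = df.loc[df["Income"].idxmin()]

print(f"{poorest['State']}: income {poorest['Income']}, total {poorest['Total']}, "
      f"rank {poorest['TotalRank']} of {len(df)}")
```

In the notebook, the same `rank` and `idxmin` calls would run on `df1` once its empty rows are dropped.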

Further Questions

What would happen if we decided to look at each state on an individual level?

Assuming that income plays a large role at the state level, what does that mean for how we should compare SAT scores? Would this negate the SAT's purpose of serving as a metric by which to compare students across the United States?