In this post, we look at the SAT (Scholastic Aptitude Test) scores of high schoolers in the US, along with other information. We will use Python for the data analysis.
The SAT is a test that high schoolers in the US take before applying to college, so it is fairly important. Moreover, SAT scores are used to indicate how good a school district is. There have been allegations that the SAT is unfair (to certain ethnic groups in the US, for example). Doing this analysis on New York City data will help shed some light on the fairness of the SAT.
In order to investigate the relation between the SAT and other factors such as ethnicity or class sizes, we need data. On the website of the City of New York, many data sets are available, from which the following have been selected: the SAT results by school, the class sizes at the school level, and the school demographics.
In what follows, we take a look at these data sets. Then, we clean and unify the individual data sets into a single one in order to work with the data more easily.
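As a preview of the loading cells below, the three CSV files could equivalently be read in a single loop. The following is a minimal sketch; the file names and dictionary keys are the ones used in the cells that follow:

#read the three selected CSV files into a dictionary of data frames
import pandas as pd

files = {"sat_results": "SAT_Results",
         "demographics": "Demographics_and_Accountability",
         "class_size": "Class_Size-School_level_detail"}
data = {}
for key, name in files.items():
    data[key] = pd.read_csv("data/{0}.csv".format(name))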
In [33]:
#in order to analyse the data sets, we load them into pandas data frames (using the pandas and numpy libraries)
import pandas as pd
import numpy as np
In [34]:
#initialize the dictionary that will contain the different data sets
data = {}
This data set contains the most recent SAT results for New York City, available at the school level.
Note that the SAT is divided into 3 sections, each of which is scored out of 800 points (the last 3 columns). The total score is therefore out of 2400.
Full data can be downloaded here in CSV format.
In [35]:
#read csv file and show the data head
data["sat_results"] = pd.read_csv("data/{0}.csv".format("SAT_Results"))
data["sat_results"].head()
Out[35]:
This data set contains the annual school accounts of NYC public school student populations, with several pieces of information including ethnicity and gender.
Full data can be downloaded here in CSV format.
In [36]:
#read csv file and show the data head
data["demographics"] = pd.read_csv("data/{0}.csv".format("Demographics_and_Accountability"))
data["demographics"].head()
Out[36]:
In [37]:
#read csv file and show the data head
data["class_size"] = pd.read_csv("data/{0}.csv".format("Class_Size-School_level_detail"))
data["class_size"].head()
Out[37]:
Having a consistent dataset will help us do analysis more quickly. In order to do this, we’ll first need to find a common column to unify them on. Looking at the output above, it appears that "DBN" (District Borough Number) might be that common column, as it appears in multiple datasets. DBN is a unique code for each school.
However, not every data set contains a "DBN" column in this form: the "class_size" data set, for instance, has no "DBN" column at all. Thus, we need to clean these data sets first.
If we look at the "class_size" data set and at the "DBN" column in the other data sets, it looks like the DBN is actually a combination of "CSD" and "SCHOOL CODE". There is no systematic way to find insights like this in data; it requires some exploration and playing around to figure out. Let us now create the "DBN" column.
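As a minimal illustration of the format (with hypothetical values), the CSD is zero-padded to two digits and prepended to the school code:

#hypothetical example of the DBN format
csd = 1
school_code = "M292"
print("{0:02d}{1}".format(csd, school_code))   #prints "01M292"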
In [38]:
#create the "DBN" column by combining other columns
data["class_size"]["DBN"] = data["class_size"].apply(lambda x: "{0:02d}{1}".format(x["CSD"], x["SCHOOL CODE"]), axis=1)
data["class_size"].head()
Out[38]:
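As a quick sanity check (a sketch), we could count how many of the newly built DBNs also appear in the SAT results data set:

#sanity check: overlap between the constructed DBNs and those in the SAT results
common_dbn = set(data["class_size"]["DBN"]) & set(data["sat_results"]["DBN"])
len(common_dbn)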
In order to combine these datasets, we’ll need to find a way to condense data sets like "class_size" to the point where there’s only a single row per high school. If not, there won’t be a way to compare SAT scores to class size. We can accomplish this by first understanding the data better, then by doing some aggregation. With the class_size dataset, it looks like GRADE and PROGRAM TYPE have multiple values for each school. By restricting each field to a single value, we can filter most of the duplicate rows.
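Before filtering, we could inspect which values these two fields take, for instance with a quick sketch like the following (note the trailing space in the "GRADE " column name):

#inspect the values taken by the GRADE and PROGRAM TYPE fields
print(data["class_size"]["GRADE "].value_counts())
print(data["class_size"]["PROGRAM TYPE"].value_counts())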
In [39]:
class_size_temp = data["class_size"]
#restrict the class_size data set to a specific grade and program type
class_size_temp = class_size_temp[class_size_temp["GRADE "] == "09-12"]
class_size_temp = class_size_temp[class_size_temp["PROGRAM TYPE"] == "GEN ED"]
#there are still duplicate DBNs, so we group the class_size data set by DBN and take the average of each numeric column
class_size_temp = class_size_temp.groupby("DBN").agg(np.mean)
#reset the index, so DBN is added back in as a column (otherwise, DBN is the index column)
class_size_temp.reset_index(inplace=True)
data["class_size_ge_09-12"] = class_size_temp
data["class_size_ge_09-12"].head()
Out[39]:
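As a quick check (a sketch), each DBN should now appear only once:

#check that there is exactly one row per school after the aggregation
data["class_size_ge_09-12"]["DBN"].is_unique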
Next, we’ll need to condense the demographics dataset. The data was collected over multiple years for the same schools, so we will only keep the rows where the schoolyear field is the most recent one available.
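To confirm which school years are present (and that 20112012 is indeed the most recent one), we could first inspect the schoolyear column, for example:

#list the school years present in the demographics data set
data["demographics"]["schoolyear"].value_counts().sort_index()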
In [40]:
#select only one year
demographics_temp = data["demographics"]
demographics_temp = demographics_temp[demographics_temp["schoolyear"] == 20112012]
data["demographics_2011-2012"] = demographics_temp
data["demographics_2011-2012"].head()
Out[40]:
Now, we compute the total SAT score from its 3 individual components.
In [41]:
#convert each of the SAT score columns from a string to a number
cols = ['SAT Math Avg. Score', 'SAT Critical Reading Avg. Score', 'SAT Writing Avg. Score']
for c in cols:
    data["sat_results"][c] = pd.to_numeric(data["sat_results"][c], errors='coerce')
#add together all of the columns to get the sat_score column, which is the total SAT score
data['sat_results']['sat_score'] = data['sat_results'][cols[0]] + data['sat_results'][cols[1]] + data['sat_results'][cols[2]]
data['sat_results'].head()
Out[41]:
In [42]:
#joining data sets
data_full = data['sat_results']
data_full = data_full.merge(data["class_size_ge_09-12"], on="DBN", how="inner")
data_full = data_full.merge(data["demographics_2011-2012"], on="DBN", how="inner")
#the structure of the global data set
data_full.shape
Out[42]:
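Since we used inner joins, schools missing from any of the data sets are dropped. A quick sketch to see how many schools survive the joins:

#compare the number of schools before and after the inner joins
print(data["sat_results"].shape[0], data_full.shape[0])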
A good way to explore a dataset and see what columns are related to the one you care about is to compute correlations. This will tell you which columns are closely related to the column you’re interested in:
In [43]:
#show the correlations of the columns with the "sat_score" column
data_full_corr = data_full.corr()['sat_score']
data_full_corr.sort_values(ascending=False)
Out[43]:
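Since the full list of correlations is long, one simple way (a sketch) to focus on the strongest relations is to filter by absolute value; the 0.25 threshold below is arbitrary:

#keep only the columns whose correlation with sat_score is reasonably strong
strong_corr = data_full_corr[data_full_corr.abs() > 0.25]
strong_corr.sort_values(ascending=False)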
This gives us a few insights that can be explored, for example the correlations of sat_score with the ethnicity percentages (white_per, asian_per, black_per, hispanic_per) and with the gender percentages (male_per, female_per). Each of these items is a potential angle to explore and tell a story about using the data.
In [44]:
#importing the matplotlib library
import matplotlib.pyplot as plt
%matplotlib inline
#bar chart of the correlations between sat_score and the ethnicity percentage columns
data_full.corr()["sat_score"][["white_per", "asian_per", "black_per", "hispanic_per"]].plot.bar()
Out[44]:
It looks like higher percentages of white and asian students correlate with higher SAT scores, whereas higher percentages of black and hispanic students correlate with lower SAT scores. For hispanic students, this may be due to the fact that there are more recent immigrants among them who are English learners.
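To dig deeper into one of these relations, we could for instance make a scatter plot of "hispanic_per" against "sat_score" (a sketch analogous to the plot used below for "female_per"):

#scatter plot of hispanic_per against sat_score
data_full.plot.scatter(x='hispanic_per', y='sat_score')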
In [45]:
#bar chart of the correlations between sat_score and the gender percentage columns
data_full.corr()["sat_score"][["male_per", "female_per"]].plot.bar()
Out[45]:
To dig deeper into this correlation, we can make a scatter plot of "female_per" against "sat_score".
In [46]:
#scatter plot
data_full.plot.scatter(x='female_per', y='sat_score')
Out[46]:
It looks like there’s a cluster of schools with a high percentage of females, and very high SAT scores (in the top right). We can get the names of the schools in this cluster.
In [47]:
data_full[(data_full["female_per"] > 65) & (data_full["sat_score"] > 1400)]["SCHOOL NAME"]
Out[47]:
Searching Google reveals that these are elite schools that focus on the performing arts. These schools tend to have higher percentages of females and higher SAT scores. This likely accounts for the correlation between higher female percentages and higher SAT scores, and for the inverse correlation between higher male percentages and SAT scores.
In this post, we have analysed the schools' SAT scores and their relation to other information such as ethnicity and gender. However, we have only scratched the surface of some angles and could have dug deeper. Furthermore, we could have combined our data with other data sets, such as school location information.
This case study is based on an article by Vik Paruchuri. I thank him for this example.