Creating Custom Groupings

We wanted the ability to compare school districts based on similar school districts as well as by statewide. This notebook creates the groupings. The output files for these groupings were used in the Tableau visualization.

The columns for these groupings were chosen based on the New York State Education Department's definition of similar schools. Link: http://www.p12.nysed.gov/repcrd2004/information/similar-schools/guide.shtml

Start with the standard imports we have used for every notebook in this class.


In [119]:
%matplotlib inline
import numpy as np
import scipy as sp
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pandas as pd
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
import seaborn as sns
sns.set_style("whitegrid")
sns.set_context("poster")

We need to use the cleaned file instead of the filtered file because we need to get some of the columns that we dropped from the filtered file.


In [120]:
districts = pd.read_csv("data/finaldata/cleaned.csv")

Apply the filtering logic that we used in the main notebook.


In [121]:
import math
print 'Total number of unique school districts: ' + str(len(np.unique(districts['Agency ID - NCES Assigned [District] Latest available year'])))
districts = districts[districts['Highest Grade Offered [District] 2009-10']=='12th Grade']
print 'Total number of school districts that have high schools: ' + str(len(districts))
districts = districts[districts['SURVYEAR']!='–']
print 'Total number of school districts that have a row on raw graduation data: ' + str(len(districts))
districts = districts[districts['AFGR']>=0]
print 'Total number of school districts with valid graduation data: ' + str(len(districts))


Total number of unique school districts: 19023
Total number of school districts that have high schools: 12955
Total number of school districts that have a row on raw graduation data: 12955
Total number of school districts with valid graduation data: 10785

In [122]:
districts=districts.replace([np.inf, -np.inf], np.nan)

Mean, min, max graduation rates by state for export to the visualization.


In [123]:
#CITATION: http://stackoverflow.com/questions/17578115/pass-percentiles-to-pandas-agg-function
def percentile(n):
    def percentile_(x):
        return np.percentile(x, n)
    percentile_.__name__ = 'percentile_%s' % n
    return percentile_

In [124]:
stateresults = districts.groupby('State Name [District] Latest available year')['AFGR'].aggregate([np.min, percentile(25), np.mean, percentile(75), np.max])
stateresults.to_csv("data/finaldata/stateresults.csv")
stateresults.head(100)


Out[124]:
amin percentile_25 mean percentile_75 amax
State Name [District] Latest available year
Alabama 32.8 69.150 73.879389 79.050 99.7
Alaska 23.3 59.150 71.076471 83.200 100.0
Arizona 2.0 63.800 74.582390 90.450 100.0
Arkansas 10.3 71.975 78.018548 84.900 100.0
California 7.5 71.750 78.743750 93.925 100.0
Colorado 12.5 72.500 80.590286 92.450 100.0
Delaware 45.8 62.000 73.138095 83.500 100.0
District of Columbia 48.1 55.300 65.814286 75.200 87.5
Florida 26.1 64.100 69.871233 76.800 97.5
Georgia 13.8 62.200 68.688827 74.500 100.0
Hawaii 75.4 75.400 75.400000 75.400 75.4
Idaho 32.7 74.650 82.307207 91.450 100.0
Illinois 0.9 79.000 85.419634 94.475 100.0
Indiana 12.9 74.900 80.259000 87.525 100.0
Iowa 54.5 85.200 90.833645 98.000 100.0
Kansas 50.0 79.000 86.439929 95.150 100.0
Kentucky 50.5 75.400 80.813018 86.000 100.0
Louisiana 21.4 60.350 67.958667 74.450 100.0
Maine 60.0 78.100 83.972941 91.400 100.0
Maryland 61.2 81.500 84.391667 90.650 94.3
Massachusetts 30.5 76.200 82.731950 92.300 100.0
Michigan 20.7 72.600 79.972775 90.600 100.0
Minnesota 27.6 82.100 86.419008 96.450 100.0
Mississippi 40.3 56.500 63.114189 69.500 92.8
Missouri 13.7 79.300 85.352328 93.500 100.0
Montana 38.5 77.900 85.141176 100.000 100.0
Nebraska 44.4 85.075 89.775833 100.000 100.0
Nevada 40.0 55.800 66.256250 71.700 96.3
New Hampshire 60.4 77.650 85.777941 93.375 100.0
New Jersey 44.4 86.650 90.510700 99.500 100.0
New Mexico 27.9 64.300 72.602247 81.300 100.0
New York 5.7 78.000 84.652346 94.075 100.0
North Carolina 43.8 71.350 75.571970 80.100 100.0
North Dakota 45.5 81.500 88.311679 100.000 100.0
Ohio 30.4 81.375 86.206019 94.125 100.0
Oklahoma 40.9 75.925 83.520892 92.800 100.0
Oregon 33.3 72.200 79.913295 89.100 100.0
Pennsylvania 27.3 81.700 86.660342 94.200 100.0
Rhode Island 50.0 73.800 78.796970 86.000 100.0
South Carolina 48.3 60.750 66.332143 70.950 97.0
South Dakota 36.5 80.000 86.419178 100.000 100.0
Tennessee 62.7 76.750 83.058824 88.900 100.0
Texas 19.6 75.000 82.116227 90.500 100.0
Utah 41.4 75.325 79.920455 88.575 100.0
Vermont 53.8 86.200 89.664151 96.800 100.0
Virginia 52.1 73.000 78.966165 85.500 100.0
Washington 40.5 71.900 80.136735 89.700 100.0
West Virginia 68.8 75.700 80.057143 83.400 100.0
Wisconsin 40.7 89.150 92.749728 100.000 100.0
Wyoming 19.2 76.150 80.606522 88.750 100.0

In [125]:
districts['Urban-centric Locale [District] 2009-10'].value_counts()


Out[125]:
42-Rural: Distant      2539
43-Rural: Remote       1939
21-Suburb: Large       1586
41-Rural: Fringe       1366
32-Town: Distant       1058
33-Town: Remote         801
13-City: Small          376
11-City: Large          292
31-Town: Fringe         275
22-Suburb: Mid-size     201
23-Suburb: Small        177
12-City: Mid-size       175
dtype: int64

Calculate the ratio of free and reduced lunch students to total students.


In [126]:
districts['r_lunch_free_reduced'] = districts['Total Free and Reduced Lunch Students [Public School] 2009-10']/districts['Total Students [Public School] 2009-10']

Determine the top quartile and bottom quartile boundaries for Large and Small school size.


In [127]:
topquartile = districts.groupby('Urban-centric Locale [District] 2009-10')[['Total Students [Public School] 2009-10']]
topq = topquartile.quantile(q=.75)
topq.head(20)


Out[127]:
Total Students [Public School] 2009-10
Urban-centric Locale [District] 2009-10
11-City: Large 34462.25
12-City: Mid-size 26544.50
13-City: Small 11804.75
21-Suburb: Large 7033.00
22-Suburb: Mid-size 6290.00
23-Suburb: Small 5624.00
31-Town: Fringe 3404.00
32-Town: Distant 3272.00
33-Town: Remote 2838.50
41-Rural: Fringe 3715.50
42-Rural: Distant 1290.50
43-Rural: Remote 616.00

In [128]:
bottomquartile = districts.groupby('Urban-centric Locale [District] 2009-10')[['Total Students [Public School] 2009-10']]
bottomq = bottomquartile.quantile(q=.25)
bottomq.head(20)


Out[128]:
Total Students [Public School] 2009-10
Urban-centric Locale [District] 2009-10
11-City: Large 534.00
12-City: Mid-size 2217.50
13-City: Small 4306.50
21-Suburb: Large 2192.50
22-Suburb: Mid-size 1955.00
23-Suburb: Small 1488.00
31-Town: Fringe 1367.50
32-Town: Distant 1298.50
33-Town: Remote 994.25
41-Rural: Fringe 1135.50
42-Rural: Distant 441.25
43-Rural: Remote 213.00

In [129]:
groups = []

for name, group in topquartile:
    groups.append(name)

districts['Student_Size'] = 'Medium'
for index in range(len(groups)):
    districts['Student_Size'] = np.where((districts['Urban-centric Locale [District] 2009-10']==groups[index]) & (districts['Total Students [Public School] 2009-10']>topq.iloc[index]['Total Students [Public School] 2009-10']), 'Large', districts['Student_Size'])
    districts['Student_Size'] = np.where((districts['Urban-centric Locale [District] 2009-10']==groups[index]) & (districts['Total Students [Public School] 2009-10']<=bottomq.iloc[index]['Total Students [Public School] 2009-10']), 'Small', districts['Student_Size'])

print districts['Student_Size'].value_counts()


Medium    5397
Small     2696
Large     2692
dtype: int64

Determine the top quartile and bottom quartile boundaries for free and reduced price lunch ratio.


In [130]:
topquartile = districts.groupby(['Urban-centric Locale [District] 2009-10', 'Student_Size'])[['r_lunch_free_reduced']]
topq = topquartile.quantile(q=.75)
topq.head(20)


Out[130]:
r_lunch_free_reduced
Urban-centric Locale [District] 2009-10 Student_Size
11-City: Large Large 0.760625
Medium 0.781110
Small 0.863215
12-City: Mid-size Large 0.699413
Medium 0.686311
Small 0.786781
13-City: Small Large 0.656074
Medium 0.662596
Small 0.749835
21-Suburb: Large Large 0.520991
Medium 0.414711
Small 0.544477
22-Suburb: Mid-size Large 0.551703
Medium 0.490063
Small 0.598169
23-Suburb: Small Large 0.567036
Medium 0.473632
Small 0.585396
31-Town: Fringe Large 0.487223
Medium 0.465843

In [131]:
bottomquartile = districts.groupby(['Urban-centric Locale [District] 2009-10', 'Student_Size'])[['r_lunch_free_reduced']]
bottomq = bottomquartile.quantile(q=.25)
bottomq.head(20)


Out[131]:
r_lunch_free_reduced
Urban-centric Locale [District] 2009-10 Student_Size
11-City: Large Large 0.516167
Medium 0.452896
Small 0.578877
12-City: Mid-size Large 0.473946
Medium 0.376341
Small 0.449552
13-City: Small Large 0.374962
Medium 0.354174
Small 0.385787
21-Suburb: Large Large 0.182504
Medium 0.097351
Small 0.141586
22-Suburb: Mid-size Large 0.263716
Medium 0.177061
Small 0.260663
23-Suburb: Small Large 0.338845
Medium 0.249453
Small 0.298682
31-Town: Fringe Large 0.241222
Medium 0.228426

In [132]:
groups = []

for name, group in topquartile:
    groups.append(name)

districts['Lunch_Status'] = 'Average'
for index in range(len(groups)):
    districts['Lunch_Status'] = np.where((districts['Student_Size']==groups[index][1]) & (districts['Urban-centric Locale [District] 2009-10']==groups[index][0]) & (districts['r_lunch_free_reduced']>topq.iloc[index]['r_lunch_free_reduced']), 'High', districts['Lunch_Status'])
    districts['Lunch_Status'] = np.where((districts['Student_Size']==groups[index][1]) & (districts['Urban-centric Locale [District] 2009-10']==groups[index][0]) & (districts['r_lunch_free_reduced']<=bottomq.iloc[index]['r_lunch_free_reduced']), 'Low', districts['Lunch_Status'])

print districts['Lunch_Status'].value_counts()


Average    5509
Low        2642
High       2634
dtype: int64

Check even distribution of grouped similar schools row counts.


In [133]:
districts.groupby(['Urban-centric Locale [District] 2009-10', 'Student_Size', 'Lunch_Status'])['AFGR'].count()


Out[133]:
Urban-centric Locale [District] 2009-10  Student_Size  Lunch_Status
11-City: Large                           Large         Average          41
                                                       High             16
                                                       Low              16
                                         Medium        Average          86
                                                       High             30
                                                       Low              30
                                         Small         Average          37
                                                       High             18
                                                       Low              18
12-City: Mid-size                        Large         Average          22
                                                       High             11
                                                       Low              11
                                         Medium        Average          43
                                                       High             22
                                                       Low              22
                                         Small         Average          26
                                                       High              9
                                                       Low               9
13-City: Small                           Large         Average          47
                                                       High             23
                                                       Low              24
                                         Medium        Average          95
                                                       High             46
                                                       Low              47
                                         Small         Average          50
                                                       High             22
                                                       Low              22
21-Suburb: Large                         Large         Average         200
                                                       High             98
                                                       Low              98
                                                                      ... 
33-Town: Remote                          Small         Average         104
                                                       High             48
                                                       Low              48
41-Rural: Fringe                         Large         Average         171
                                                       High             85
                                                       Low              85
                                         Medium        Average         350
                                                       High            167
                                                       Low             167
                                         Small         Average         179
                                                       High             81
                                                       Low              81
42-Rural: Distant                        Large         Average         322
                                                       High            156
                                                       Low             156
                                         Medium        Average         652
                                                       High            309
                                                       Low             310
                                         Small         Average         320
                                                       High            157
                                                       Low             157
43-Rural: Remote                         Large         Average         245
                                                       High            119
                                                       Low             119
                                         Medium        Average         495
                                                       High            238
                                                       Low             238
                                         Small         Average         247
                                                       High            119
                                                       Low             119
Name: AFGR, dtype: int64

In [134]:
similardistrictresults = districts.groupby(['Urban-centric Locale [District] 2009-10', 'Student_Size', 'Lunch_Status'])['AFGR'].aggregate([np.min, percentile(25), np.mean, percentile(75), np.max])
similardistrictresults.to_csv("data/finaldata/similardistrictresults.csv")
similardistrictresults.head(100)


Out[134]:
amin percentile_25 mean percentile_75 amax
Urban-centric Locale [District] 2009-10 Student_Size Lunch_Status
11-City: Large Large Average 45.3 61.800 68.821951 75.400 89.7
High 48.2 59.500 62.668750 66.325 81.9
Low 61.0 76.750 79.237500 85.025 93.7
Medium Average 31.2 55.825 68.116279 81.850 100.0
High 30.5 44.675 62.916667 82.975 100.0
Low 21.4 72.900 80.126667 94.225 100.0
Small Average 12.9 47.300 68.740541 95.000 100.0
High 3.4 49.625 68.216667 95.575 100.0
Low 53.3 61.650 75.605556 88.175 100.0
12-City: Mid-size Large Average 52.5 66.000 69.931818 76.400 81.1
High 49.6 53.400 59.036364 59.300 85.2
Low 54.0 79.400 84.872727 92.050 100.0
Medium Average 49.8 70.650 78.297674 86.500 100.0
High 43.8 54.075 61.931818 68.950 82.3
Low 0.9 83.450 83.250000 92.800 99.3
Small Average 32.4 57.375 75.342308 97.975 100.0
High 21.4 31.400 52.588889 81.500 100.0
Low 41.4 51.800 74.833333 99.000 100.0
13-City: Small Large Average 62.0 71.250 76.768085 83.850 94.3
High 42.2 57.500 64.504348 71.450 94.7
Low 68.1 79.400 86.179167 93.700 100.0
Medium Average 53.3 71.300 78.528421 86.600 100.0
High 42.7 55.125 62.930435 69.025 86.0
Low 78.0 86.700 90.714894 96.750 100.0
Small Average 7.5 69.925 74.256000 88.475 100.0
High 30.2 52.675 65.472727 68.550 100.0
Low 37.9 75.575 82.359091 95.200 100.0
21-Suburb: Large Large Average 2.0 78.225 82.886000 89.600 100.0
High 29.6 62.175 71.431633 79.900 100.0
Low 72.6 91.550 94.068367 97.475 100.0
... ... ... ... ... ... ... ...
32-Town: Distant Small High 16.0 71.600 76.779688 86.425 100.0
Low 53.3 86.200 90.853125 100.000 100.0
33-Town: Remote Large Average 43.8 71.100 76.486139 81.500 100.0
High 44.6 62.500 69.595918 80.000 88.7
Low 49.5 79.550 84.014000 90.250 100.0
Medium Average 54.9 75.500 81.783902 87.600 100.0
High 44.1 65.175 71.955102 77.925 98.4
Low 5.1 83.275 87.801020 95.575 100.0
Small Average 14.8 75.350 82.146154 90.400 100.0
High 23.5 64.800 73.681250 84.375 100.0
Low 4.5 87.025 88.864583 98.725 100.0
41-Rural: Fringe Large Average 5.8 74.000 78.780117 85.750 100.0
High 50.9 65.000 72.410588 80.600 96.2
Low 53.4 88.100 90.928235 95.300 100.0
Medium Average 6.0 79.825 84.741143 91.600 100.0
High 28.4 67.400 74.473653 82.350 100.0
Low 43.6 87.000 91.302994 97.450 100.0
Small Average 23.2 77.300 84.595531 95.900 100.0
High 13.7 66.700 76.081481 89.200 100.0
Low 33.3 84.100 88.254321 97.800 100.0
42-Rural: Distant Large Average 52.8 75.100 80.779814 86.300 100.0
High 32.8 63.975 70.751282 76.900 100.0
Low 65.1 82.575 87.375641 92.475 100.0
Medium Average 28.4 79.275 85.486810 92.750 100.0
High 37.3 73.000 79.825566 89.000 100.0
Low 69.1 85.225 90.581935 97.375 100.0
Small Average 32.7 77.725 84.822500 96.000 100.0
High 10.3 69.200 79.912102 94.100 100.0
Low 13.1 81.800 87.599363 100.000 100.0
43-Rural: Remote Large Average 33.3 72.800 80.016735 87.300 100.0

100 rows × 5 columns


In [136]:
districts.head()


Out[136]:
Agency Name State Name [District] Latest available year State Name [District] 2009-10 State Abbr [District] Latest available year Agency Name [District] 2009-10 Agency ID - NCES Assigned [District] Latest available year County Name [District] 2009-10 County Number [District] 2009-10 Race/Ethnicity Category [District] 2009-10 ANSI/FIPS State Code [District] Latest available year Total Number Operational Schools [Public School] 2009-10 Total Number Operational Charter Schools [Public School] 2009-10 Total Number of Public Schools [Public School] 2009-10 Years District Reported Data [District] Latest available year Years District Did Not Report Data [District] Latest available year Location Address [District] 2013-14 Location City [District] 2013-14 Location State Abbr [District] 2013-14 Location ZIP [District] 2013-14 Location ZIP4 [District] 2013-14 Mailing Address [District] 2013-14 Mailing City [District] 2013-14 Mailing State Abbr [District] 2013-14 Mailing ZIP [District] 2013-14 Mailing ZIP4 [District] 2013-14 Phone Number [District] 2013-14 Agency Name_DEL State Name [District] Latest available year_DEL Agency Type [District] 2009-10 School District Level Code (SCHLEV) [District Finance] 2009-10 Urban-centric Locale [District] 2009-10 Boundary Change Indicator Flag [District] 2009-10 CBSA Name [District] 2009-10 CBSA ID [District] 2009-10 CSA Name [District] 2009-10 CSA ID [District] 2009-10 Latitude [District] 2009-10 Longitude [District] 2009-10 State Agency ID [District] 2009-10 Supervisory Union (ID) Number [District] 2009-10 Agency Charter Status [District] 2009-10 Metro Micro Area Code [District] 2009-10 Congressional Code [District] 2009-10 Census ID (CENSUSID) [District Finance] 2009-10 Lowest Grade Offered [District] 2009-10 Highest Grade Offered [District] 2009-10 Total Students (UG PK-12) [District] 2009-10 PK thru 12th Students [District] 2009-10 Ungraded Students [District] 2009-10 Total Students [Public School] 2009-10 ... r_lrev_gst r_lrev_put r_lrev_it r_lrev_aot r_lrev_pgc r_lrev_cc r_lrev_oss r_lrev_tui r_lrev_trans r_lrev_slr r_lrev_ts r_lrev_sar r_lrev_osalserv r_lrev_sfns r_lrev_ie r_lrev_molr r_lrev_sp r_lrev_rr r_lrev_sale r_lrev_ff r_lrev_pc r_srev_gfa r_srev_sep r_srev_trans r_srev_sip r_srev_cbsp r_srev_vep r_srev_codsp r_srev_bep r_srev_gt r_srev_slp r_srev_aor r_srev_splea r_srev_osp r_srev_ns r_frev_title1 r_frev_dis r_frev_cna r_frev_ems r_frev_dfs r_frev_voc r_frev_ao r_frev_ns r_frev_ia r_frev_be r_frev_na r_frev_aofed r_lunch_free_reduced Student_Size Lunch_Status
1 21ST CENTURY CHARTER SCH OF GARY Indiana Indiana IN 21ST CENTURY CHARTER SCH OF GARY 1800046 MARION COUNTY 18097 Reported 5 categories 18 1 1 1 2004-2013 1986-2003 556 WASHINGTON ST GARY IN 46402 NaN 333 N PENNSYLVANIA SUITE 1000 INDIANAPOLIS IN 46202 NaN 3175361027 21ST CENTURY CHARTER SCH OF GARY Indiana 7-Charter school agency 03-Elementary/secondary school system 13-City: Small 1-No change since last report Indianapolis IN 26900 Indianapolis-Anderson-Columbus IN 294 39.771949 -86.155184 9545 0 1-All associated schools are charter schools 1-Metropolitan Area 1807 NaN Kindergarten 12th Grade 360 360 NaN 360 ... NaN NaN NaN NaN NaN 0.070312 0.390625 0.000000 0 0.062500 0.000000 0.000000 0.000000 0 0.000000 0.078125 0.007812 0.000000 0.390625 0 0.00000 0.964599 0.000000 0.000000 0.000000 0.003155 0.000000 0 0 0.003155 0.000000 0.029092 0.000000 0.000000 0 0.606477 0.000000 0.156035 0.000000 0.000000 0.000000 0.176644 0.060844 0 0 0 0.000000 0.894444 Small High
2 21ST CENTURY CYBER CS Pennsylvania Pennsylvania PA 21ST CENTURY CYBER CS 4200091 CHESTER COUNTY 42029 Reported 5 categories 42 1 1 1 2001-2013 1986-2000 805 SPRINGDALE DR EXTON PA 19341 3043 805 SPRINGDALE DR. EXTON PA 19341 3043 4848755400 21ST CENTURY CYBER CS Pennsylvania 7-Charter school agency 03-Elementary/secondary school system 21-Suburb: Large 1-No change since last report Philadelphia-Camden-Wilmington PA-NJ-DE-MD 37980 Philadelphia-Camden-Vineland PA-NJ-DE-MD 428 40.005030 -75.678564 124150002 0 1-All associated schools are charter schools 1-Metropolitan Area 4206 NaN 6th Grade 12th Grade 594 594 NaN 594 ... NaN NaN NaN NaN NaN 0.000000 0.983958 0.005893 0 0.000000 0.000000 0.000000 0.000000 0 0.000000 0.010149 0.000000 0.000000 0.000000 0 0.00000 0.000000 0.000000 0.000000 0.950617 0.000000 0.000000 0 0 0.000000 0.000000 0.049383 0.000000 0.000000 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.328283 Small Average
10 A+ ACADEMY Texas Texas TX A+ ACADEMY 4800203 DALLAS COUNTY 48113 Reported 5 categories 48 1 1 1 2000-2013 1986-1999 8225 BRUTON RD DALLAS TX 75217 NaN 8225 BRUTON RD DALLAS TX 75217 NaN 2143813226 A+ ACADEMY Texas 7-Charter school agency 03-Elementary/secondary school system 11-City: Large 1-No change since last report Dallas-Fort Worth-Arlington TX 19100 Dallas-Fort Worth TX 206 32.767535 -96.660866 057829 0 1-All associated schools are charter schools 1-Metropolitan Area 4830 NaN Prekindergarten 12th Grade 1033 1033 NaN 1033 ... NaN NaN NaN NaN NaN 0.000000 0.000000 0.000000 0 0.800000 0.000000 0.000000 0.000000 0 0.142857 0.000000 0.000000 0.057143 0.000000 0 0.00000 0.952145 0.000000 0.000000 0.000000 0.000000 0.000000 0 0 0.000000 0.000211 0.006233 0.040038 0.001373 0 0.351058 0.056430 0.238741 0.033641 0.000000 0.000000 0.320130 0.000000 0 0 0 0.000000 0.909971 Medium High
13 A-C CENTRAL CUSD 262 Illinois Illinois IL A-C CENTRAL CUSD 262 1700105 CASS COUNTY 17017 Reported 5 categories 17 3 0 3 1989-2013 1986-1988 501 EAST BUCHANAN ST ASHLAND IL 62612 7624 PO BOX 260 ASHLAND IL 62612 260 2174768112 A-C CENTRAL CUSD 262 Illinois 1-Local school district 03-Elementary/secondary school system 42-Rural: Distant 1-No change since last report NaN NaN NaN NaN 39.892187 -90.016057 46-009-2620-26 0 3-All associated schools are noncharter 0-New England (NECTA) or not reported 1718 1.450092e+13 Kindergarten 12th Grade 436 436 NaN 432 ... 0.000000 0.000000 0.000000 0 NaN 0.000000 0.001049 0.000000 0 0.060829 0.009963 0.014683 0.000000 0 0.005244 0.052438 0.027792 0.004195 0.000000 0 0.02517 0.642181 0.049572 0.068049 0.000000 0.004056 0.000901 0 0 0.000000 0.000901 0.015773 0.218567 0.000000 0 0.162651 0.306024 0.087952 0.038554 0.002410 0.000000 0.402410 0.000000 0 0 0 0.000000 0.398148 Small Average
14 A-H-S-T COMM SCHOOL DISTRICT Iowa Iowa IA A-H-S-T COMM SCHOOL DISTRICT 1904080 POTTAWATTAMIE COUNTY 19155 Reported 7 categories 19 2 0 2 1986-2013 NaN 768 SOUTH MAPLE ST AVOCA IA 51521 NaN BOX 158 AVOCA IA 51521 NaN 7123436364 A-H-S-T COMM SCHOOL DISTRICT Iowa 1-Local school district 03-Elementary/secondary school system 43-Rural: Remote 1-No change since last report Omaha-Council Bluffs NE-IA 36540 Omaha-Council Bluffs-Fremont NE-IA 420 41.471017 -95.341001 780441 000 0 3-All associated schools are noncharter 1-Metropolitan Area 1905 1.650780e+13 Prekindergarten 12th Grade 696 696 NaN 595 ... 0.101505 0.055343 0.061464 0 NaN 0.000000 0.043101 0.000000 0 0.034175 0.008926 0.004336 0.002805 0 0.004081 0.066820 0.000000 0.000000 0.004081 0 0.00000 0.859543 0.000000 0.000000 0.118656 0.000000 0.002144 0 0 0.000000 0.001072 0.018585 0.000000 0.000000 0 0.095816 0.107962 0.148448 0.043185 0.008097 0.005398 0.554656 0.000000 0 0 0 0.036437 0.344538 Medium Low

5 rows × 846 columns


In [137]:
highfull = districts[['Agency ID - NCES Assigned [District] Latest available year', 'Latitude [District] 2009-10', 'Longitude [District] 2009-10', 'Agency Name', 'State Name [District] Latest available year', 'Urban-centric Locale [District] 2009-10', 'Student_Size', 'Lunch_Status', 'AFGR', 'i_agency_type_regional_education_services', 'i_lgo_PK', 'i_lgo_K', 'Total Number Operational Schools [Public School] 2009-10', 'i_fin_sdlc_sec', 'r_frev_ao', 'r_frev_dis', 'Total Expenditures (TOTALEXP) per Pupil (V33) [District Finance] 2009-10', 'r_stud_reg_12_W_M', 'r_stud_reg_12_W_F']]

In [138]:
lowfull = districts[['Agency ID - NCES Assigned [District] Latest available year', 'i_fin_sdlc_elem_sec', 'i_fin_sdlc_voc', 'r_stud_reg_12_W_F', 'i_lgo_PK', 'r_stud_reg_12_W_M', 'i_lgo_K', 'i_agency_type_local_school_district', 'r_frev_ao', 'r_stud_re_B', 'r_stud_912']]

In [139]:
highnoge = districts[['Agency ID - NCES Assigned [District] Latest available year', 'i_lgo_K', 'i_lgo_PK', 'i_fin_sdlc_sec', 'r_st_TS', 'Total Number Operational Schools [Public School] 2009-10', 'Total Expenditures (TOTALEXP) per Pupil (V33) [District Finance] 2009-10', 'r_frev_ao', 'r_frev_dis', 'r_stud_912', 'i_fin_sdlc_voc']]

In [140]:
lownoge = districts[['Agency ID - NCES Assigned [District] Latest available year', 'i_lgo_K', 'i_lgo_PK', 'r_st_TS', 'i_fin_sdlc_voc', 'Total Expenditures (TOTALEXP) per Pupil (V33) [District Finance] 2009-10', 'r_stud_912', 'r_frev_ao', 'i_ma_metropolitan', 'i_ucl_city_small', 'i_ma_micropolitan']]

In [141]:
finalvisdistrict = highfull.merge(lowfull, 'left', 'Agency ID - NCES Assigned [District] Latest available year', suffixes=('', '_LOWFULL'))
finalvisdistrict = finalvisdistrict.merge(highnoge, 'left', 'Agency ID - NCES Assigned [District] Latest available year', suffixes=('', '_HIGHNOGE'))
finalvisdistrict = finalvisdistrict.merge(lownoge, 'left', 'Agency ID - NCES Assigned [District] Latest available year', suffixes=('', '_LOWNOGE'))

In [142]:
finalvisdistrict.head()


Out[142]:
Agency ID - NCES Assigned [District] Latest available year Latitude [District] 2009-10 Longitude [District] 2009-10 Agency Name State Name [District] Latest available year Urban-centric Locale [District] 2009-10 Student_Size Lunch_Status AFGR i_agency_type_regional_education_services i_lgo_PK i_lgo_K Total Number Operational Schools [Public School] 2009-10 i_fin_sdlc_sec r_frev_ao r_frev_dis Total Expenditures (TOTALEXP) per Pupil (V33) [District Finance] 2009-10 r_stud_reg_12_W_M r_stud_reg_12_W_F i_fin_sdlc_elem_sec i_fin_sdlc_voc r_stud_reg_12_W_F_LOWFULL i_lgo_PK_LOWFULL r_stud_reg_12_W_M_LOWFULL i_lgo_K_LOWFULL i_agency_type_local_school_district r_frev_ao_LOWFULL r_stud_re_B r_stud_912 i_lgo_K_HIGHNOGE i_lgo_PK_HIGHNOGE i_fin_sdlc_sec_HIGHNOGE r_st_TS Total Number Operational Schools [Public School] 2009-10_HIGHNOGE Total Expenditures (TOTALEXP) per Pupil (V33) [District Finance] 2009-10_HIGHNOGE r_frev_ao_HIGHNOGE r_frev_dis_HIGHNOGE r_stud_912_HIGHNOGE i_fin_sdlc_voc_HIGHNOGE i_lgo_K_LOWNOGE i_lgo_PK_LOWNOGE r_st_TS_LOWNOGE i_fin_sdlc_voc_LOWNOGE Total Expenditures (TOTALEXP) per Pupil (V33) [District Finance] 2009-10_LOWNOGE r_stud_912_LOWNOGE r_frev_ao_LOWNOGE i_ma_metropolitan i_ucl_city_small i_ma_micropolitan
0 1800046 39.771949 -86.155184 21ST CENTURY CHARTER SCH OF GARY Indiana 13-City: Small Small High 30.2 0 0 1 1 0 0.176644 0.000000 11197 0.000000 0.000000 1 0 0.000000 0 0.000000 1 0 0.176644 0.947222 0.277778 1 0 0 0.113889 1 11197 0.176644 0.000000 0.277778 0 1 0 0.113889 0 11197 0.277778 0.176644 1 1 0
1 4200091 40.005030 -75.678564 21ST CENTURY CYBER CS Pennsylvania 21-Suburb: Large Small Average 100.0 0 0 0 1 0 NaN NaN 8732 0.048822 0.122896 1 0 0.122896 0 0.048822 0 0 NaN 0.045455 0.813131 0 0 0 0.055303 1 8732 NaN NaN 0.813131 0 0 0 0.055303 0 8732 0.813131 NaN 1 0 0
2 4800203 32.767535 -96.660866 A+ ACADEMY Texas 11-City: Large Medium High 55.7 0 1 0 1 0 0.320130 0.056430 8864 0.002904 0.003872 1 0 0.003872 1 0.002904 0 0 0.320130 0.089061 0.220716 0 1 0 0.167986 1 8864 0.320130 0.056430 0.220716 0 0 1 0.167986 0 8864 0.220716 0.320130 1 0 0
3 1700105 39.892187 -90.016057 A-C CENTRAL CUSD 262 Illinois 42-Rural: Distant Small Average 70.7 0 0 1 3 0 0.402410 0.306024 13200 0.027778 0.034722 1 0 0.034722 0 0.027778 1 1 0.402410 0.018519 0.300926 1 0 0 0.095000 3 13200 0.402410 0.306024 0.300926 0 1 0 0.095000 0 13200 0.300926 0.402410 0 0 0
4 1904080 41.471017 -95.341001 A-H-S-T COMM SCHOOL DISTRICT Iowa 43-Rural: Remote Medium Low 95.7 0 1 0 2 0 0.554656 0.107962 11586 0.030252 0.043697 1 0 0.043697 1 0.030252 0 1 0.554656 0.001681 0.282353 0 1 0 0.118534 2 11586 0.554656 0.107962 0.282353 0 0 1 0.118534 0 11586 0.282353 0.554656 1 0 0

In [143]:
finalvisdistrict.to_csv("data/finaldata/tableaudistricts.csv", index=False)

In [ ]:


In [ ]: