Generate data needed for visualization in Tableau

Start with the standard imports we have used for every notebook in this class.



In [45]:

    
%matplotlib inline
import numpy as np
import scipy as sp
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pandas as pd
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
import seaborn as sns
sns.set_style("whitegrid")
sns.set_context("poster")

Each of the datasheets downloaded from ELSI had download metadata on the top of them and total and key information on the bottom of them that were not data rows. This metadata, total, and key information was manually deleted before import. Some of the files had ="0" instead of 0 in the cells. This was found/replaced before import using the sed -i '' 's/="0"/0/g' *.csv command from the terminal.



In [46]:

    
#CITATION: This is the data from National Center for Education Statistics on Schools
#Schools for all 50 states and Washington, D.C.
#http://nces.ed.gov/ccd/elsi/
#ELSI Root Data Source: U.S. Department of Education National Center for Education Statistics Common Core of Data (CCD) "Public Elementary/Secondary School Universe Survey" 2012-13 v.1a  2013-14 v.1a.
#KEY:
#† indicates that the data are not applicable.
#– indicates that the data are missing.
#‡ indicates that the data do not meet NCES data quality standards.

schoolinformation = pd.read_csv("data/rawdata/schools/2009-2010 SCHOOL Information Tab.csv", dtype=np.str)
schoolcharacteristicsa = pd.read_csv("data/rawdata/schools/2009-2010 SCHOOL CharacteristicsA Tab.csv", dtype=np.str)
schoolcharacteristicsb = pd.read_csv("data/rawdata/schools/2009-2010 SCHOOL CharacteristicsB Tab.csv", dtype=np.str)
schoolenrollment = pd.read_csv("data/rawdata/schools/2009-2010 SCHOOL Enrollments Tab.csv", dtype=np.str)
schoolenrollmentdetails = pd.read_csv("data/rawdata/schools/2009-2010 SCHOOL Enrollment Details Tab.csv", dtype=np.str)
#schoolenrollmentK3 = pd.read_csv("tempdata/2013-2014 SCHOOL Enrollment by Grade, Race-Ethnicity, and Gender Tab PreK-3.csv", dtype=np.str)
#schoolenrollment48 = pd.read_csv("tempdata/2013-2014 SCHOOL Enrollment by Grade, Race-Ethnicity and Gender Tab 4-8.csv", dtype=np.str)
#schoolenrollment912 = pd.read_csv("tempdata/2013-2014 SCHOOL Enrollment by Grade, Race-Ethnicity and Gender Tab 9-Ungraded.csv", dtype=np.str)
schoolteacherstaff = pd.read_csv("data/rawdata/schools/2009-2010 SCHOOL Teacher and Staff Tab.csv", dtype=np.str)

Check the lengths of the datasets to see if we have a row for every school.



In [47]:

    
print len(schoolinformation)
print len(schoolcharacteristicsa)
print len(schoolcharacteristicsb)
print len(schoolenrollment)
print len(schoolenrollmentdetails)
#print len(schoolenrollmentK3)
#print len(schoolenrollment48)
#print len(schoolenrollment912)
print len(schoolteacherstaff)

Drop all of the duplicate columns.



In [49]:

    
schoolcharacteristicsa = schoolcharacteristicsa.drop(schoolcharacteristicsa.columns[[0, 1, 20]], 1)
schoolcharacteristicsb = schoolcharacteristicsb.drop(schoolcharacteristicsb.columns[[0, 1]], 1)
schoolenrollment = schoolenrollment.drop(schoolenrollment.columns[[0, 1]], 1)
schoolenrollmentdetails = schoolenrollmentdetails.drop(schoolenrollmentdetails.columns[[0, 1]], 1)
#schoolenrollmentK3 = schoolenrollmentK3.drop(schoolenrollmentK3.columns[[0, 1]], 1)
#schoolenrollment48 = schoolenrollment48.drop(schoolenrollment48.columns[[0, 1]], 1)
#schoolenrollment912 = schoolenrollment912.drop(schoolenrollment912.columns[[0, 1, 72]], 1)



In [50]:

    
schoolinformation.head()









    Out[50]:






  
    
      
      School Name
      State Name [Public School] Latest available year
      State Name [Public School] 2009-10
      State Abbr [Public School] Latest available year
      School Name [Public School] 2009-10
      School ID - NCES Assigned [Public School] Latest available year
      Agency Name [Public School] 2009-10
      Agency ID - NCES Assigned [Public School] Latest available year
      County Name [Public School] 2009-10
      County Number [Public School] 2009-10
      Race/Ethnicity Category [Public School] 2009-10
      ANSI/FIPS State Code [Public School] Latest available year
      Years School Reported Data [Public School] Latest available year
      Years School Did Not Report Data [Public School] Latest available year
      Location Address [Public School] 2013-14
      Location City [Public School] 2013-14
      Location State Abbr [Public School] 2013-14
      Location ZIP [Public School] 2013-14
      Location ZIP4 [Public School] 2013-14
      Mailing Address [Public School] 2013-14
      Mailing City [Public School] 2013-14
      Mailing State Abbr [Public School] 2013-14
      Mailing ZIP [Public School] 2013-14
      Mailing ZIP4 [Public School] 2013-14
      Phone Number [Public School] 2013-14
    
  
  
    
      0
      EDWARD "DUKE" ELLINGTON @ BECKHAM
      Michigan
      Michigan
      MI
      EDWARD (DUKE) ELLINGTON CONSERVATORY OF MUSIC/ART
      261200004838
      DETROIT CITY SCHOOL DISTRICT
      2612000
      WAYNE COUNTY
      26163
      Reported 5 categories
      26
      1986-2013
      NaN
      9860 PARK DR
      DETROIT
      MI
      48213
      7102
      9860 PARK DRIVE
      DETROIT
      MI
      48213
      7102
      3138528500
    
    
      1
      J.M. "CHICK" BUCKBEE JUVENILE CENTER
      West Virginia
      West Virginia
      WV
      J.M. "CHICK" BUCKBEE JUVENILE CENTER
      540051101159
      INSTITUTIONAL EDUCATIONAL PROGRAMS
      5400511
      HAMPSHIRE COUNTY
      54027
      Reported 7 categories
      54
      2003-2013
      1986-2002
      1 JERRY LN
      AUGUSTA
      WV
      26704
      †
      #1 JERRY LANE
      AUGUSTA
      WV
      26704
      †
      3044961369
    
    
      2
      JAMES H. "TIGER" MORTON JUVENILE CENTER
      West Virginia
      West Virginia
      WV
      JAMES H. "TIGER" MORTON JUVENILE CENTER
      540051101146
      INSTITUTIONAL EDUCATIONAL PROGRAMS
      5400511
      KANAWHA COUNTY
      54039
      Reported 7 categories
      54
      2003-2013
      1986-2002
      60 MANFRED HOLLAND WAY
      DUNBAR
      WV
      25064
      †
      60 MANFRED HOLLAND WAY
      DUNBAR
      WV
      25064
      †
      3047663502
    
    
      3
      MARY HARRIS "MOTHER" JONES ELEMENTARY
      Maryland
      Maryland
      MD
      MARY HARRIS "MOTHER" JONES ELEMENTARY
      240051001257
      PRINCE GEORGE'S COUNTY PUBLIC SCHOOLS
      2400510
      PRINCE GEORGE'S COUNTY
      24033
      Reported 5 categories
      24
      2002-2013
      1986-2001
      2405 TECUMSEH ST
      ADELPHI
      MD
      20783
      1658
      2405 TECUMSEH ST
      ADELPHI
      MD
      20783
      1658
      3014087900
    
    
      4
      PEMAYETV EMAHAKV CHARTER "OUR WAY SCHOOL"
      Florida
      Florida
      FL
      PEMAYETV EMAHAKV "OUR WAY SCHOOL"
      120066004849
      GLADES
      1200660
      GLADES COUNTY
      12043
      Reported 5 categories
      12
      2006-2013
      1986-2005
      100 EAST HARNEY POND RD NE
      OKEECHOBEE
      FL
      34974
      2867
      100 E HARNEY POND RD NE
      OKEECHOBEE
      FL
      34974
      2867
      8634672501

Join all of the school datasets.



In [51]:

    
joinedschool = schoolteacherstaff.join([schoolcharacteristicsa, schoolcharacteristicsb, schoolenrollment, schoolenrollmentdetails])



In [52]:

    
joinedschool = schoolinformation.merge(joinedschool, 'left', 'School ID - NCES Assigned [Public School] Latest available year', suffixes=('', '_DEL'))



In [55]:

    
#Need to get rid of Excel syntax ="" from some of the columns
for i, col in enumerate(joinedschool.columns):
    joinedschool[col] = joinedschool[col].map(lambda x: str(x).lstrip('="').rstrip('"'))



In [ ]:

    
#If by chance any rows have NaN, replace with the ELSI standard for missing data '–'
joinedschool = joinedschool.fillna('–')
joinedschool = joinedschool.replace('nan', '–')



In [57]:

    
# Replacing Missing Data / NA / Bad Quality data with blank, later to be turned into NaN for float columns
# CITATIOIN : http://pandas.pydata.org/pandas-docs/version/0.15.2/missing_data.html

joinedschool = joinedschool.replace('\xe2\x80\x93', '') # Replace "-" (Missing Data) with blank
joinedschool = joinedschool.replace('\xe2\x80\xa0', '') # Replace "†" (Not Applicable) with blank
joinedschool = joinedschool.replace('\xe2\x80\xa1', '') # Replace "‡" (Bad Quality) with blank



In [62]:

    
joinedschool['i_agency_type_regional_education_services'] = np.where(joinedschool['Agency Type [District] 2009-10']=='4-Regional education services agency', 1, 0)
joinedschool['i_agency_type_local_school_district'] = np.where(joinedschool['Agency Type [District] 2009-10']=='1-Local school district', 1, 0)
joinedschool['i_lgo_PK'] = np.where(joinedschool['Lowest Grade Offered [Public School] 2009-10']=='Prekindergarten', 1, 0)
joinedschool['i_lgo_K'] = np.where(joinedschool['Lowest Grade Offered [Public School] 2009-10']=='Kindergarten', 1, 0)
joinedschool['Black Students [Public School] 2009-10'] = joinedschool['Black Students [Public School] 2009-10'].replace('', np.nan)
joinedschool['Grades 9-12 Students [Public School] 2009-10'] = joinedschool['Grades 9-12 Students [Public School] 2009-10'].replace('', np.nan)
joinedschool['Total Students [Public School] 2009-10'] = joinedschool['Total Students [Public School] 2009-10'].replace('', np.nan)
joinedschool['Black Students [Public School] 2009-10'] = joinedschool['Black Students [Public School] 2009-10'].astype(float)
joinedschool['Grades 9-12 Students [Public School] 2009-10'] = joinedschool['Grades 9-12 Students [Public School] 2009-10'].astype(float)
joinedschool['Total Students [Public School] 2009-10'] = joinedschool['Total Students [Public School] 2009-10'].astype(float)
joinedschool['r_stud_re_B'] = joinedschool['Black Students [Public School] 2009-10']/joinedschool['Total Students [Public School] 2009-10']
joinedschool['r_stud_912'] = joinedschool['Grades 9-12 Students [Public School] 2009-10']/joinedschool['Total Students [Public School] 2009-10']
joinedschool['r_st_TS'] = joinedschool['Pupil/Teacher Ratio [Public School] 2009-10']









    



/Users/ChaserAcer/anaconda/lib/python2.7/site-packages/pandas/core/common.py:516: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  mask = arr == x



In [65]:

    
joinedschool = joinedschool[['School Name [Public School] 2009-10', 'State Name [Public School] Latest available year', 'School ID - NCES Assigned [Public School] Latest available year', 'Agency ID - NCES Assigned [Public School] Latest available year', 'i_agency_type_regional_education_services', 'i_agency_type_local_school_district', 'i_lgo_PK', 'i_lgo_K', 'r_stud_re_B', 'r_stud_912', 'r_st_TS']]



In [66]:

    
joinedschool.head()









    Out[66]:






  
    
      
      School Name [Public School] 2009-10
      State Name [Public School] Latest available year
      School ID - NCES Assigned [Public School] Latest available year
      Agency ID - NCES Assigned [Public School] Latest available year
      i_agency_type_regional_education_services
      i_agency_type_local_school_district
      i_lgo_PK
      i_lgo_K
      r_stud_re_B
      r_stud_912
      r_st_TS
    
  
  
    
      0
      EDWARD (DUKE) ELLINGTON CONSERVATORY OF MUSIC/ART
      Michigan
      261200004838
      2612000
      0
      1
      0
      1
      0.990050
      NaN
      19.45
    
    
      1
      J.M. "CHICK" BUCKBEE JUVENILE CENTER
      West Virginia
      540051101159
      5400511
      0
      0
      0
      0
      0.000000
      0.888889
      4.50
    
    
      2
      JAMES H. "TIGER" MORTON JUVENILE CENTER
      West Virginia
      540051101146
      5400511
      0
      0
      0
      0
      0.266667
      0.800000
      7.50
    
    
      3
      MARY HARRIS "MOTHER" JONES ELEMENTARY
      Maryland
      240051001257
      2400510
      0
      1
      1
      0
      0.129076
      NaN
      13.83
    
    
      4
      PEMAYETV EMAHAKV "OUR WAY SCHOOL
      Florida
      120066004849
      1200660
      0
      1
      0
      1
      0.019900
      NaN
      9.14



In [67]:

    
joinedschool=joinedschool.replace([np.inf, -np.inf], np.nan)



In [68]:

    
joinedschool.fillna(value=0,inplace=True)



In [69]:

    
joinedschool.head()









    Out[69]:






  
    
      
      School Name [Public School] 2009-10
      State Name [Public School] Latest available year
      School ID - NCES Assigned [Public School] Latest available year
      Agency ID - NCES Assigned [Public School] Latest available year
      i_agency_type_regional_education_services
      i_agency_type_local_school_district
      i_lgo_PK
      i_lgo_K
      r_stud_re_B
      r_stud_912
      r_st_TS
    
  
  
    
      0
      EDWARD (DUKE) ELLINGTON CONSERVATORY OF MUSIC/ART
      Michigan
      261200004838
      2612000
      0
      1
      0
      1
      0.990050
      0.000000
      19.45
    
    
      1
      J.M. "CHICK" BUCKBEE JUVENILE CENTER
      West Virginia
      540051101159
      5400511
      0
      0
      0
      0
      0.000000
      0.888889
      4.50
    
    
      2
      JAMES H. "TIGER" MORTON JUVENILE CENTER
      West Virginia
      540051101146
      5400511
      0
      0
      0
      0
      0.266667
      0.800000
      7.50
    
    
      3
      MARY HARRIS "MOTHER" JONES ELEMENTARY
      Maryland
      240051001257
      2400510
      0
      1
      1
      0
      0.129076
      0.000000
      13.83
    
    
      4
      PEMAYETV EMAHAKV "OUR WAY SCHOOL
      Florida
      120066004849
      1200660
      0
      1
      0
      1
      0.019900
      0.000000
      9.14



In [71]:

    
joinedschool.to_csv("data/finaldata/tableauschools.csv")

This is the full list of all column names in the schools dataframe.



In [72]:

    
for col in joinedschool.columns:
    print col









    



School Name [Public School] 2009-10
State Name [Public School] Latest available year
School ID - NCES Assigned [Public School] Latest available year
Agency ID - NCES Assigned [Public School] Latest available year
i_agency_type_regional_education_services
i_agency_type_local_school_district
i_lgo_PK
i_lgo_K
r_stud_re_B
r_stud_912
r_st_TS



In [ ]:

	School Name	State Name [Public School] Latest available year	State Name [Public School] 2009-10	State Abbr [Public School] Latest available year	School Name [Public School] 2009-10	School ID - NCES Assigned [Public School] Latest available year	Agency Name [Public School] 2009-10	Agency ID - NCES Assigned [Public School] Latest available year	County Name [Public School] 2009-10	County Number [Public School] 2009-10	Race/Ethnicity Category [Public School] 2009-10	ANSI/FIPS State Code [Public School] Latest available year	Years School Reported Data [Public School] Latest available year	Years School Did Not Report Data [Public School] Latest available year	Location Address [Public School] 2013-14	Location City [Public School] 2013-14	Location State Abbr [Public School] 2013-14	Location ZIP [Public School] 2013-14	Location ZIP4 [Public School] 2013-14	Mailing Address [Public School] 2013-14	Mailing City [Public School] 2013-14	Mailing State Abbr [Public School] 2013-14	Mailing ZIP [Public School] 2013-14	Mailing ZIP4 [Public School] 2013-14	Phone Number [Public School] 2013-14
0	EDWARD "DUKE" ELLINGTON @ BECKHAM	Michigan	Michigan	MI	EDWARD (DUKE) ELLINGTON CONSERVATORY OF MUSIC/ART	261200004838	DETROIT CITY SCHOOL DISTRICT	2612000	WAYNE COUNTY	26163	Reported 5 categories	26	1986-2013	NaN	9860 PARK DR	DETROIT	MI	48213	7102	9860 PARK DRIVE	DETROIT	MI	48213	7102	3138528500
1	J.M. "CHICK" BUCKBEE JUVENILE CENTER	West Virginia	West Virginia	WV	J.M. "CHICK" BUCKBEE JUVENILE CENTER	540051101159	INSTITUTIONAL EDUCATIONAL PROGRAMS	5400511	HAMPSHIRE COUNTY	54027	Reported 7 categories	54	2003-2013	1986-2002	1 JERRY LN	AUGUSTA	WV	26704	†	#1 JERRY LANE	AUGUSTA	WV	26704	†	3044961369
2	JAMES H. "TIGER" MORTON JUVENILE CENTER	West Virginia	West Virginia	WV	JAMES H. "TIGER" MORTON JUVENILE CENTER	540051101146	INSTITUTIONAL EDUCATIONAL PROGRAMS	5400511	KANAWHA COUNTY	54039	Reported 7 categories	54	2003-2013	1986-2002	60 MANFRED HOLLAND WAY	DUNBAR	WV	25064	†	60 MANFRED HOLLAND WAY	DUNBAR	WV	25064	†	3047663502
3	MARY HARRIS "MOTHER" JONES ELEMENTARY	Maryland	Maryland	MD	MARY HARRIS "MOTHER" JONES ELEMENTARY	240051001257	PRINCE GEORGE'S COUNTY PUBLIC SCHOOLS	2400510	PRINCE GEORGE'S COUNTY	24033	Reported 5 categories	24	2002-2013	1986-2001	2405 TECUMSEH ST	ADELPHI	MD	20783	1658	2405 TECUMSEH ST	ADELPHI	MD	20783	1658	3014087900
4	PEMAYETV EMAHAKV CHARTER "OUR WAY SCHOOL"	Florida	Florida	FL	PEMAYETV EMAHAKV "OUR WAY SCHOOL"	120066004849	GLADES	1200660	GLADES COUNTY	12043	Reported 5 categories	12	2006-2013	1986-2005	100 EAST HARNEY POND RD NE	OKEECHOBEE	FL	34974	2867	100 E HARNEY POND RD NE	OKEECHOBEE	FL	34974	2867	8634672501