Charter school identities and outcomes in the accountability era:
Preliminary results

April 19th, 2017
By Jaren Haber, PhD CandidateDept. of Sociology, UC Berkeley

(this out-dated graphic courtesy of U.S. News & World Report, 2009)

Research questions

How are charter schools different from each other in terms of ideology? How do these differences shape their survival and their outcomes, and what does this reveal about current educational policy?

The corpus

Website self-descriptions of all 6,753 charter schools open in 2014-15 (identified using the NCES Public School Universe Survey)
Charter school websites are a publicly visible proclamation of identity attempting to impress parents, regulators, etc.
This study the first to use this contemporary, comprehensive data source on U.S. charter school identities
Me & research team working on using BeautifulSoup and requests.get to webscrape the full sample

Motivation

Too much focus on test scores in education, too little on organizational aspects
Are charter schools innovative? How?
How does educational policy shape ed. philosophy? Organization? Outcomes?
No one has studied charters' public image as expressed in their OWN words

Methods

NLP: Word frequencies, distinctive words, etc.
Supervised: Custom dictionaries
Unsupervised: Topic models, word embeddings
Later: statistical regression to test, e.g., how progressivist schools in liberal communities have higher performance than they do in other places

Preliminary analysis: website self-descriptions of non-random sample of 196 schools

Early-stage sample: NOT representative!
About half randomly selected, half tracked down (many through Internet Archive) because of missing URLs
Closed schools over-represented

Preliminary conclusions:

Word counts:

Website self-descriptions for schools in mid-sized cities and suburbs tend to be longest, followed by other urban and suburban schools, then schools in towns, and shortest tends to be rural schools
Charter schools in cities and suburbs have the highest textual redundancy (lowest ratio of types to tokens)

Word embeddings:

The two educational philosophies I'm interested in--progressivism and essentialism--can be distinguished using semantic vectors
Useful way for creating and checking my dictionaries

Topic modeling:

Urban charter schools' websites emphasize GOALS (topic 0)
Suburban charter schools' websites emphasize CURRICULUM (topic 1) in addition to goals

Next steps:

Working with custom dictionaries, POS tagging
Webscraping and parsing HTML to get full sample
Match website text with data on test scores and community characteristics (e.g., race, class, political leanings) --> test hypotheses with statistical regression
More long-term: Collect longitudinal mission statement data from the Internet Archive --> look at survival and geographic dispersion of identity categories over time (especially pre-NCLB if possible)



In [1]:

    
# The keyword categories to help parse website text:
mission = ['mission',' vision ', 'vision:', 'mission:', 'our purpose', 'our ideals', 'ideals:', 'our cause', 'cause:', 'goals', 'objective']
curriculum = ['curriculum', 'curricular', 'program', 'method', 'pedagogy', 'pedagogical', 'approach', 'model', 'system', 'structure']
philosophy = ['philosophy', 'philosophical', 'beliefs', 'believe', 'principles', 'creed', 'credo', 'value',  'moral']
history = ['history', 'our story', 'the story', 'school story', 'background', 'founding', 'founded', 'established', 'establishment', 'our school began', 'we began', 'doors opened', 'school opened']
general =  ['about us', 'our school', 'who we are', 'overview', 'general information', 'our identity', 'profile', 'highlights']

Initializing Python



In [2]:

    
#!/usr/bin/env python
# -*- coding: UTF-8



In [3]:

    
# IMPORTING KEY PACKAGES
import csv # for reading in CSVs and turning them into dictionaries
import re # for regular expressions
import os # for navigating file trees
import nltk # for natural language processing tools
import pandas # for working with dataframes
import numpy as np # for working with numbers



In [4]:

    
# FOR CLEANING, TOKENIZING, AND STEMMING THE TEXT
from nltk import word_tokenize, sent_tokenize # widely used text tokenizer
from nltk.stem.porter import PorterStemmer # an approximate method of stemming words (it just cuts off the ends)
from nltk.corpus import stopwords # for one method of eliminating stop words, to clean the text
stopenglish = list(stopwords.words("english")) # assign the string of english stopwords to a variable and turn it into a list
import string # for one method of eliminating punctuation
punctuations = list(string.punctuation) # assign the string of common punctuation symbols to a variable and turn it into a list



In [5]:

    
# FOR ANALYZING WITH THE TEXT
from sklearn.feature_extraction.text import CountVectorizer # to work with document-term matrices, especially
countvec = CountVectorizer(tokenizer=nltk.word_tokenize)
from sklearn.feature_extraction.text import TfidfVectorizer # for creating TF-IDFs
tfidfvec = TfidfVectorizer()
from sklearn.decomposition import LatentDirichletAllocation # for topic modeling

import gensim # for word embedding models
from scipy.spatial.distance import cosine # for cosine similarity
from sklearn.metrics import pairwise # for pairwise similarity
from sklearn.manifold import MDS, TSNE # for multi-dimensional scaling



In [6]:

    
# FOR VISUALIZATIONS
import matplotlib
import matplotlib.pyplot as plt

# Visualization parameters
% pylab inline 
% matplotlib inline
matplotlib.style.use('ggplot')









    



Populating the interactive namespace from numpy and matplotlib

Reading in preliminary data



In [7]:

    
sample = [] # make empty list
with open('../data_URAP_etc/mission_data_prelim.csv', 'r', encoding = 'Latin-1')\
as csvfile: # open file                      
    reader = csv.DictReader(csvfile) # create a reader
    for row in reader: # loop through rows
        sample.append(row) # append each row to the list



In [8]:

    
sample[0]









    Out[8]:





{'ADDRESS': '308 SOUTH BLAKE ST, PINE BLUFF, AR',
 'AM': '0',
 'AM01F': '-2',
 'AM01M': '-2',
 'AM02F': '-2',
 'AM02M': '-2',
 'AM03F': '-2',
 'AM03M': '-2',
 'AM04F': '-2',
 'AM04M': '-2',
 'AM05F': '0',
 'AM05M': '0',
 'AM06F': '0',
 'AM06M': '0',
 'AM07F': '0',
 'AM07M': '0',
 'AM08F': '0',
 'AM08M': '0',
 'AM09F': '-2',
 'AM09M': '-2',
 'AM10F': '-2',
 'AM10M': '-2',
 'AM11F': '-2',
 'AM11M': '-2',
 'AM12F': '-2',
 'AM12M': '-2',
 'AMALF': '0',
 'AMALM': '0',
 'AMKGF': '-2',
 'AMKGM': '-2',
 'AMPKF': '-2',
 'AMPKM': '-2',
 'AMUGF': '-2',
 'AMUGM': '-2',
 'AS01F': '-2',
 'AS01M': '-2',
 'AS02F': '-2',
 'AS02M': '-2',
 'AS03F': '-2',
 'AS03M': '-2',
 'AS04F': '-2',
 'AS04M': '-2',
 'AS05F': '0',
 'AS05M': '0',
 'AS06F': '0',
 'AS06M': '0',
 'AS07F': '0',
 'AS07M': '0',
 'AS08F': '0',
 'AS08M': '0',
 'AS09F': '-2',
 'AS09M': '-2',
 'AS10F': '-2',
 'AS10M': '-2',
 'AS11F': '-2',
 'AS11M': '-2',
 'AS12F': '-2',
 'AS12M': '-2',
 'ASALF': '0',
 'ASALM': '0',
 'ASIAN': '0',
 'ASKGF': '-2',
 'ASKGM': '-2',
 'ASPKF': '-2',
 'ASPKM': '-2',
 'ASUGF': '-2',
 'ASUGM': '-2',
 'BIES': '2',
 'BL01F': '-2',
 'BL01M': '-2',
 'BL02F': '-2',
 'BL02M': '-2',
 'BL03F': '-2',
 'BL03M': '-2',
 'BL04F': '-2',
 'BL04M': '-2',
 'BL05F': '5',
 'BL05M': '12',
 'BL06F': '13',
 'BL06M': '11',
 'BL07F': '14',
 'BL07M': '11',
 'BL08F': '13',
 'BL08M': '11',
 'BL09F': '-2',
 'BL09M': '-2',
 'BL10F': '-2',
 'BL10M': '-2',
 'BL11F': '-2',
 'BL11M': '-2',
 'BL12F': '-2',
 'BL12M': '-2',
 'BLACK': '90',
 'BLALF': '45',
 'BLALM': '45',
 'BLKGF': '-2',
 'BLKGM': '-2',
 'BLPKF': '-2',
 'BLPKM': '-2',
 'BLUGF': '-2',
 'BLUGM': '-2',
 'CDCODE': '504',
 'CHARTAUTH1': '0500A',
 'CHARTAUTH2': '0500B',
 'CHARTR': '1',
 'CONAME': 'JEFFERSON COUNTY',
 'CONUM': '5069',
 'CUSTOMID': 'AR3542702',
 'FIPST': '5',
 'FRELCH': '77',
 'FTE': '5.01',
 'G01': '-2',
 'G01OFFRD': '2',
 'G02': '-2',
 'G02OFFRD': '2',
 'G03': '-2',
 'G03OFFRD': '2',
 'G04': '-2',
 'G04OFFRD': '2',
 'G05': '18',
 'G05OFFRD': '1',
 'G06': '25',
 'G06OFFRD': '1',
 'G07': '25',
 'G07OFFRD': '1',
 'G08': '24',
 'G08OFFRD': '1',
 'G09': '-2',
 'G09OFFRD': '2',
 'G10': '-2',
 'G10OFFRD': '2',
 'G11': '-2',
 'G11OFFRD': '2',
 'G12': '-2',
 'G12OFFRD': '2',
 'GSHI': '8',
 'GSLO': '5',
 'HI01F': '-2',
 'HI01M': '-2',
 'HI02F': '-2',
 'HI02M': '-2',
 'HI03F': '-2',
 'HI03M': '-2',
 'HI04F': '-2',
 'HI04M': '-2',
 'HI05F': '1',
 'HI05M': '0',
 'HI06F': '1',
 'HI06M': '0',
 'HI07F': '0',
 'HI07M': '0',
 'HI08F': '0',
 'HI08M': '0',
 'HI09F': '-2',
 'HI09M': '-2',
 'HI10F': '-2',
 'HI10M': '-2',
 'HI11F': '-2',
 'HI11M': '-2',
 'HI12F': '-2',
 'HI12M': '-2',
 'HIALF': '2',
 'HIALM': '0',
 'HIKGF': '-2',
 'HIKGM': '-2',
 'HIPKF': '-2',
 'HIPKM': '-2',
 'HISP': '2',
 'HIUGF': '-2',
 'HIUGM': '-2',
 'HP01F': '-2',
 'HP01M': '-2',
 'HP02F': '-2',
 'HP02M': '-2',
 'HP03F': '-2',
 'HP03M': '-2',
 'HP04F': '-2',
 'HP04M': '-2',
 'HP05F': '0',
 'HP05M': '0',
 'HP06F': '0',
 'HP06M': '0',
 'HP07F': '0',
 'HP07M': '0',
 'HP08F': '0',
 'HP08M': '0',
 'HP09F': '-2',
 'HP09M': '-2',
 'HP10F': '-2',
 'HP10M': '-2',
 'HP11F': '-2',
 'HP11M': '-2',
 'HP12F': '-2',
 'HP12M': '-2',
 'HPALF': '0',
 'HPALM': '0',
 'HPKGF': '-2',
 'HPKGM': '-2',
 'HPPKF': '-2',
 'HPPKM': '-2',
 'HPUGF': '-2',
 'HPUGM': '-2',
 'ISFLE': 'PS',
 'ISFTEPUP': 'PS',
 'ISMEMPUP': 'PS',
 'ISPELM': 'PS',
 'ISPFEMALE': 'PS',
 'ISPWHITE': 'PS',
 'KG': '-2',
 'KGOFFRD': '2',
 'LATCOD': '34.2275',
 'LCITY': 'PINE BLUFF',
 'LEAID': '500410',
 'LEANM': 'RESPONSIVE ED SOLUTIONS QUEST MIDDLE SCHOOL OF PINE BLUFF',
 'LEVEL': '2',
 'LONCOD': '-92.0436',
 'LSTATE': 'AR',
 'LSTREE': '308 SOUTH BLAKE ST',
 'LZIP': '71601',
 'LZIP4': '',
 'MAGNET': '2',
 'MCITY': 'PINE BLUFF',
 'MEMBER': '92',
 'MSTATE': 'AR',
 'MSTREE': '308 SOUTH BLAKE ST',
 'MZIP': '71601',
 'MZIP4': '',
 'NCESSCH': '50041001581',
 'NSLPSTATUS': 'NSLPWOPRO',
 'PACIFIC': '0',
 'PHONE': '8703293310',
 'PK': '-2',
 'PKOFFRD': '2',
 'RECONSTF': '2',
 'RECONSTY': 'N',
 'REDLCH': '5',
 'SCHNAM': 'QUEST MIDDLE SCHOOL OF PINE BLUFF',
 'SCHNO': '1581',
 'SEARCH': 'QUEST MIDDLE SCHOOL OF PINE BLUFF 308 SOUTH BLAKE ST, PINE BLUFF, AR',
 'SEASCH': '3542702',
 'SFLE': '2',
 'SFTEPUP': '2',
 'SHARED': '2',
 'SMEMPUP': '2',
 'SPELM': '2',
 'SPFEMALE': '2',
 'SPWHITE': '2',
 'STATUS': '3',
 'STID': '3542700',
 'STITLI': '1',
 'SURVYEAR': '2013',
 'TITLEI': '1',
 'TITLEISTAT': '3',
 'TOTETH': '92',
 'TOTFRL': '82',
 'TR': '0',
 'TR01F': '-2',
 'TR01M': '-2',
 'TR02F': '-2',
 'TR02M': '-2',
 'TR03F': '-2',
 'TR03M': '-2',
 'TR04F': '-2',
 'TR04M': '-2',
 'TR05F': '0',
 'TR05M': '0',
 'TR06F': '0',
 'TR06M': '0',
 'TR07F': '0',
 'TR07M': '0',
 'TR08F': '0',
 'TR08M': '0',
 'TR09F': '-2',
 'TR09M': '-2',
 'TR10F': '-2',
 'TR10M': '-2',
 'TR11F': '-2',
 'TR11M': '-2',
 'TR12F': '-2',
 'TR12M': '-2',
 'TRALF': '0',
 'TRALM': '0',
 'TRKGF': '-2',
 'TRKGM': '-2',
 'TRPKF': '-2',
 'TRPKM': '-2',
 'TRUGF': '-2',
 'TRUGM': '-2',
 'TYPE': '1',
 'UG': '-2',
 'UGOFFRD': '2',
 'ULOCAL': '13',
 'UNION': '0',
 'URL': 'http://responsiveed.com/questpinebluff/',
 'VIRTUALSTAT': 'VIRTUALNO',
 'WEBTEXT': 'Quest Middle Schools¨ are schools focused on high expectations for behavior and academics. Students must work hard to meet their goals. To fully succeed in a Quest Middle School, students must consistently show leadership skills, good behavior, and a work ethic to meet expectations. Beyond this, Quest Schools provides curriculum designed to teach wisdom. Knowledge is crucial, but wisdom is a vital part of a middle school studentÕs growth and maturity. Character education is taught at all levels. Students are taught leadership skills through our 7 Habits of Highly Effective Teens* environment.\nOur administrators and teachers care about students and have a passion to see them reach their full potential. While providing quality education for all students, Quest educators collaborate to make sure each child receives the attention necessary to be successful. Quest provides a safe environment committed to learning. Educators work with students and parents to meet the rigorous academic standards. Quest Combines the Teaching of Knowledge and Wisdom.\nQuest Middle Schools use a variety of curriculum to ensure that middle school students have a solid foundation of content learning above traditional curriculum.\nBeyond this, Quest Middle Schools provide curriculum designed to teach wisdom. Knowledge is crucial, but wisdom is a vital part of a middle school studentÕs growth and maturity. Character education is taught at all levels. Students are taught leadership skills through our 7 Habits of Highly Effective Teens* environment.\nQuest has a Private School Atmosphere Without the Tuition Cost.\nThe campus is dedicated to the idea that education can have a great connection with the home and family. Though the atmosphere feels like a private school, there is no tuition to attend a Quest Middle School.\nQuest is a public school chartered by the State Board of Education. As a public school, the campus has the responsibility to ensure all students meet the standards created by the Texas Education Agency. ',
 'WH01F': '-2',
 'WH01M': '-2',
 'WH02F': '-2',
 'WH02M': '-2',
 'WH03F': '-2',
 'WH03M': '-2',
 'WH04F': '-2',
 'WH04M': '-2',
 'WH05F': '0',
 'WH05M': '0',
 'WH06F': '0',
 'WH06M': '0',
 'WH07F': '0',
 'WH07M': '0',
 'WH08F': '0',
 'WH08M': '0',
 'WH09F': '-2',
 'WH09M': '-2',
 'WH10F': '-2',
 'WH10M': '-2',
 'WH11F': '-2',
 'WH11M': '-2',
 'WH12F': '-2',
 'WH12M': '-2',
 'WHALF': '0',
 'WHALM': '0',
 'WHITE': '0',
 'WHKGF': '-2',
 'WHKGM': '-2',
 'WHPKF': '-2',
 'WHPKM': '-2',
 'WHUGF': '-2',
 'WHUGM': '-2'}



In [9]:

    
# Take a look at the most important contents and the variables list
# in our sample (a list of dictionaries)--let's look at just the first entry
print(sample[1]["SCHNAM"], "\n", sample[1]["URL"], "\n", sample[1]["WEBTEXT"], "\n")
print(sample[1].keys()) # look at all the variables!









    



THE ACADEMIES AT JONESBORO HIGH SCHOOL 
 http://www.jonesboroschools.net/schools/academies_at_jonesboro_high_school 
 The mission of the Academies at Jonesboro High School is to provide a high quality, research-based education for all students in order to equip them with the essential skills necessary to be successful in todayÕs changing global community. Through strong partnerships with business and community stakeholders, the Academies at Jonesboro High School will ensure high achievement in all subjects through an expanded curriculum and the use of data-driven methods to evaluate and implement proven instructional strategies. The Academies at JHS will foster respect for global diversity and maintain a commitment to create exceptional opportunities for the educational growth of every child.  Excellence is our Standard, not our Goal, for All Students   

dict_keys(['BL06F', 'BLPKM', 'HIALM', 'TR05F', 'HI12F', 'AM01F', 'HI01M', 'ASPKM', 'RECONSTY', 'AS02M', 'HPKGM', 'BL12F', 'BL10F', 'TR03F', 'FRELCH', 'PHONE', 'AM03M', 'AS05M', 'AS08M', 'SPELM', 'TR09F', 'HIPKF', 'AS08F', 'SHARED', 'AS11F', 'G01', 'AM06F', 'MSTATE', 'HP04M', 'SURVYEAR', 'AM04F', 'HI05F', 'HP09M', 'AS09F', 'AM11M', 'G10OFFRD', 'TR06F', 'HI10M', 'G07', 'HP12F', 'HPUGF', 'AM02F', 'HP07M', 'AS05F', 'FIPST', 'HIKGM', 'ASIAN', 'CHARTAUTH1', 'HI07F', 'HI06M', 'TRPKF', 'HP11M', 'BL09M', 'HI11F', 'LSTATE', 'LEAID', 'G02', 'TR08F', 'TR10F', 'AM05M', 'WH10M', 'AM07F', 'TR', 'WH02M', 'G12OFFRD', 'AM12M', 'TR01M', 'AS02F', 'HP06F', 'HP03M', 'TR06M', 'HP12M', 'RECONSTF', 'WH07F', 'HPKGF', 'WH09F', 'AS06M', 'G01OFFRD', 'STID', 'HIUGM', 'HIKGF', 'HI08F', 'SMEMPUP', 'MCITY', 'WH06F', 'ISPWHITE', 'BL08M', 'G06', 'AS11M', 'HIUGF', 'AM04M', 'HI04F', 'TYPE', 'G06OFFRD', 'WH12F', 'WEBTEXT', 'PKOFFRD', 'G05', 'AM10F', 'TR02M', 'HPALF', 'G03OFFRD', 'HP05M', 'TR11M', 'TRUGM', 'AS12F', 'BL03M', 'HI08M', 'TR02F', 'HI09F', 'BL02F', 'AM05F', 'AMKGF', 'HP01F', 'LEVEL', 'HI07M', 'STITLI', 'HP06M', 'TR04M', 'TITLEISTAT', 'WH08F', 'G10', 'AMUGM', 'BL09F', 'LATCOD', 'WH06M', 'WH12M', 'HIALF', 'HI09M', 'BL07F', 'HISP', 'G12', 'TRKGF', 'AS04M', 'HI04M', 'GSLO', 'TRKGM', 'HI03F', 'CONUM', 'WH01F', 'BL01M', 'ASUGF', 'MZIP', 'WH05M', 'BL04F', 'SCHNO', 'WH10F', 'HP05F', 'LZIP4', 'ISPFEMALE', 'HI05M', 'TR05M', 'HP04F', 'AM02M', 'ASUGM', 'TR08M', 'BL03F', 'AM', 'WHALF', 'BLPKF', 'AMALF', 'BL10M', 'WHPKF', 'TOTETH', 'CDCODE', 'WHUGF', 'CUSTOMID', 'HPPKF', 'WHKGF', 'G08', 'BLALF', 'AMKGM', 'WH03M', 'HI06F', 'WHITE', 'WHALM', 'TOTFRL', 'TR04F', 'ISMEMPUP', 'AS10F', 'WHPKM', 'G07OFFRD', 'G03', 'MSTREE', 'LEANM', 'WHUGM', 'AS06F', 'AS03M', 'SCHNAM', 'G04OFFRD', 'SPWHITE', 'ISFLE', 'SFTEPUP', 'G04', 'TR11F', 'AMPKF', 'SFLE', 'BL01F', 'HP02F', 'AM08M', 'HPPKM', 'HI01F', 'HPUGM', 'BL11M', 'MAGNET', 'AM06M', 'TRPKM', 'ISPELM', 'BL06M', 'BLUGM', 'BLACK', 'AMPKM', 'WH04M', 'ASKGF', 'MEMBER', 'HI10F', 'AM09F', 'PACIFIC', 'AMUGF', 'TR12M', 'WHKGM', 'HI02F', 'ULOCAL', 'CHARTR', 'HI12M', 'KGOFFRD', 'G11', 'AM09M', 'WH09M', 'HPALM', 'NSLPSTATUS', 'TR12F', 'ASPKF', 'WH11M', 'UGOFFRD', 'AM07M', 'WH04F', 'ISFTEPUP', 'HI11M', 'WH03F', 'STATUS', 'AM08F', 'BL04M', 'WH11F', 'HP01M', 'BL05F', 'UNION', 'UG', 'WH02F', 'TR09M', 'HIPKM', 'HP08F', 'HP08M', 'SEARCH', 'G05OFFRD', 'FTE', 'WH01M', 'REDLCH', 'AS01F', 'WH07M', 'AS12M', 'KG', 'ASKGM', 'VIRTUALSTAT', 'AS10M', 'HI03M', 'BL11F', 'LONCOD', 'BL07M', 'TR01F', 'AM11F', 'BL02M', 'BL05M', 'HP03F', 'CONAME', 'SPFEMALE', 'BIES', 'ASALF', 'AM03F', 'HP07F', 'LSTREE', 'TR03M', 'HP10M', 'G09', 'HP11F', 'TR10M', 'HP09F', 'TR07F', 'WH05F', 'PK', 'BL12M', 'AS03F', 'AS07F', 'ADDRESS', 'HP10F', 'MZIP4', 'TITLEI', 'BLUGF', 'G08OFFRD', 'HP02M', 'BLKGF', 'BL08F', 'LZIP', 'AS01M', 'NCESSCH', 'G02OFFRD', 'G11OFFRD', 'AS07M', 'AMALM', 'TR07M', 'CHARTAUTH2', 'URL', 'TRALF', 'LCITY', 'AM10M', 'SEASCH', 'AS09M', 'WH08M', 'AM01M', 'GSHI', 'TRALM', 'BLALM', 'G09OFFRD', 'BLKGM', 'HI02M', 'ASALM', 'AS04F', 'TRUGF', 'AM12F'])



In [10]:

    
# Read the data in as a pandas dataframe
df = pandas.read_csv("../data_URAP_etc/mission_data_prelim.csv", encoding = 'Latin-1')
df = df.dropna(subset=["WEBTEXT"]) # drop any schools with no webtext that might have snuck in (none currently)



In [11]:

    
# Add additional variables for analysis:
# PCTETH = percentage of enrolled students belonging to a racial minority
# this includes American Indian, Asian, Hispanic, Black, Hawaiian, or Pacific Islander
df["PCTETH"] = (df["AM"] + df["ASIAN"] + df["HISP"] + df["BLACK"] + df["PACIFIC"]) / df["MEMBER"]

df["STR"] = df["MEMBER"] / df["FTE"] # Student/teacher ratio
df["PCTFRPL"] = df["TOTFRL"] / df["MEMBER"] # Percent of students receiving FRPL

# Another interesting variable: 
# TYPE = type of school, where 1 = regular, 2 = special ed, 3 = vocational, 4 = other/alternative, 5 = reportable program



In [12]:

    
## Print the webtext from the first school in the dataframe
print(df.iloc[0]["WEBTEXT"])









    



Quest is a public school chartered by the State Board of Education. As a public school, the campus has the responsibility to ensure all students meet the standards created by the Texas Education Agency.

Descriptive statistics

How urban proximity is coded: Lower number = more urban (closer to large city)

More specifically, it uses two digits with distinct meanings:

the first digit:
- 1 = city
- 2 = suburb
- 3 = town
- 4 = rural
the second digit:
- 1 = large or fringe
- 2 = mid-size or distant
- 3 = small/remote



In [13]:

    
print(df.describe()) # get descriptive statistics for all numerical columns
print()
print(df['ULOCAL'].value_counts()) # frequency counts for categorical data
print()
print(df['LEVEL'].value_counts()) # treat grade range served as categorical
# Codes for level/ grade range served: 3 = High school, 2 = Middle school, 1 = Elementary, 4 = Other)
print()
print(df['LSTATE'].mode()) # find the most common state represented in these data
print(df['ULOCAL'].mode()) # find the most urbanicity represented in these data
# print(df['FTE']).mean() # What's the average number of full-time employees by school?
# print(df['STR']).mean() # And the average student-teacher ratio?









    



       SURVYEAR       NCESSCH       FIPST           LEAID         SCHNO  \
count       196  1.960000e+02  196.000000      196.000000    196.000000   
mean       2013  2.510655e+11   25.035714  2511089.642857   5845.489796   
std           0  1.771059e+11   17.759468  1771302.884773   4214.934368   
min        2013  4.001010e+10    4.000000   400101.000000     16.000000   
25%        2013  6.402526e+10    6.000000   640252.500000   2243.500000   
50%        2013  2.500000e+11   25.000000  2500284.000000   4600.500000   
75%        2013  4.200000e+11   42.000000  4200094.500000   8657.250000   
max        2013  5.510000e+11   55.000000  5514220.000000  13727.000000   

              PHONE          MZIP        MZIP4         LZIP        LZIP4  \
count  1.960000e+02    196.000000   141.000000    196.00000   141.000000   
mean   5.800643e+09  61181.617347  2800.822695  61034.77551  2700.843972   
std    2.457133e+09  27190.843275  2819.455282  27277.06302  2724.089340   
min    2.022488e+09   1035.000000     0.000000   1035.00000     0.000000   
25%    3.236360e+09  34111.750000     0.000000  34126.25000     0.000000   
50%    6.022434e+09  70553.000000  2513.000000  70553.00000  2230.000000   
75%    8.033513e+09  85339.000000  3941.000000  85356.25000  3941.000000   
max    9.854463e+09  97497.000000  9999.000000  97497.00000  9999.000000   

          ...         PACIFIC       HPALM       HPALF          TR       TRALM  \
count     ...      196.000000  196.000000  196.000000  196.000000  196.000000   
mean      ...        1.142857    0.525510    0.510204    9.219388    4.387755   
std       ...       10.247201    5.406014    5.001015   18.875240    9.955421   
min       ...       -9.000000   -9.000000   -9.000000   -9.000000   -9.000000   
25%       ...        0.000000    0.000000    0.000000    1.000000    0.000000   
50%       ...        0.000000    0.000000    0.000000    4.000000    2.000000   
75%       ...        0.000000    0.000000    0.000000   11.250000    5.250000   
max       ...      142.000000   74.000000   68.000000  220.000000  116.000000   

            TRALF       TOTETH      PCTETH         STR     PCTFRPL  
count  196.000000   196.000000  196.000000  196.000000  196.000000  
mean     4.724490   303.316327    0.625529  -10.081974    0.577878  
std      9.383325   274.841366    1.059312  112.926532    0.319359  
min     -9.000000    -9.000000  -11.250000 -635.000000   -0.022222  
25%      0.000000   105.750000    0.324186   10.822209    0.286746  
50%      2.000000   235.500000    0.703642   15.118598    0.653312  
75%      6.000000   398.000000    0.955846   20.376488    0.877253  
max    104.000000  1542.000000    5.000000  632.000000    1.000000  

[8 rows x 297 columns]

11    78
21    36
13    21
12    19
41    11
42     9
33     6
32     6
23     4
31     3
22     3
Name: ULOCAL, dtype: int64

1    82
3    46
4    41
2    26
N     1
Name: LEVEL, dtype: int64

0    CA
dtype: object
0    11
dtype: int64



In [14]:

    
# here's the number of schools from each state, in a graph:
grouped_state = df.groupby('LSTATE')
grouped_state['WEBTEXT'].count().sort_values(ascending=True).plot(kind = 'bar', title='Schools mostly in CA, TX, AZ, FL--similar to national trend')
plt.show()



In [15]:

    
# and here's the number of schools in each urban category, in a graph:
grouped_urban = df.groupby('ULOCAL')
grouped_urban['WEBTEXT'].count().sort_values(ascending=True).plot(kind = 'bar', title='Most schools are in large cities or large suburbs')
plt.show()

What these numbers say about the charter schools in the sample:

Most are located in large cities, followed by large suburbs, then medium and small city, and then rural.
The means for percent minorities and students receiving free- or reduced-price lunch are both about 60%.
Most are in CA, TX, AZ, and FL
Most of the schools in the sample are primary schools

This means that the sample reflects national averages. In that sense, this sample isn't so bad.

Cleaning, tokenizing, and stemming the text



In [16]:

    
# Now we clean the webtext by rendering each word lower-case then removing punctuation. 
df['webtext_lc'] = df['WEBTEXT'].str.lower() # make the webtext lower case
df['webtokens'] = df['webtext_lc'].apply(nltk.word_tokenize) # tokenize the lower-case webtext by word
df['webtokens_nopunct'] = df['webtokens'].apply(lambda x: [word for word in x if word not in list(string.punctuation)]) # remove punctuation



In [17]:

    
print(df.iloc[0]["webtokens"]) # the tokenized text without punctuation









    



['quest', 'middle', 'schools¨', 'are', 'schools', 'focused', 'on', 'high', 'expectations', 'for', 'behavior', 'and', 'academics', '.', 'students', 'must', 'work', 'hard', 'to', 'meet', 'their', 'goals', '.', 'to', 'fully', 'succeed', 'in', 'a', 'quest', 'middle', 'school', ',', 'students', 'must', 'consistently', 'show', 'leadership', 'skills', ',', 'good', 'behavior', ',', 'and', 'a', 'work', 'ethic', 'to', 'meet', 'expectations', '.', 'beyond', 'this', ',', 'quest', 'schools', 'provides', 'curriculum', 'designed', 'to', 'teach', 'wisdom', '.', 'knowledge', 'is', 'crucial', ',', 'but', 'wisdom', 'is', 'a', 'vital', 'part', 'of', 'a', 'middle', 'school', 'studentõs', 'growth', 'and', 'maturity', '.', 'character', 'education', 'is', 'taught', 'at', 'all', 'levels', '.', 'students', 'are', 'taught', 'leadership', 'skills', 'through', 'our', '7', 'habits', 'of', 'highly', 'effective', 'teens*', 'environment', '.', 'our', 'administrators', 'and', 'teachers', 'care', 'about', 'students', 'and', 'have', 'a', 'passion', 'to', 'see', 'them', 'reach', 'their', 'full', 'potential', '.', 'while', 'providing', 'quality', 'education', 'for', 'all', 'students', ',', 'quest', 'educators', 'collaborate', 'to', 'make', 'sure', 'each', 'child', 'receives', 'the', 'attention', 'necessary', 'to', 'be', 'successful', '.', 'quest', 'provides', 'a', 'safe', 'environment', 'committed', 'to', 'learning', '.', 'educators', 'work', 'with', 'students', 'and', 'parents', 'to', 'meet', 'the', 'rigorous', 'academic', 'standards', '.', 'quest', 'combines', 'the', 'teaching', 'of', 'knowledge', 'and', 'wisdom', '.', 'quest', 'middle', 'schools', 'use', 'a', 'variety', 'of', 'curriculum', 'to', 'ensure', 'that', 'middle', 'school', 'students', 'have', 'a', 'solid', 'foundation', 'of', 'content', 'learning', 'above', 'traditional', 'curriculum', '.', 'beyond', 'this', ',', 'quest', 'middle', 'schools', 'provide', 'curriculum', 'designed', 'to', 'teach', 'wisdom', '.', 'knowledge', 'is', 'crucial', ',', 'but', 'wisdom', 'is', 'a', 'vital', 'part', 'of', 'a', 'middle', 'school', 'studentõs', 'growth', 'and', 'maturity', '.', 'character', 'education', 'is', 'taught', 'at', 'all', 'levels', '.', 'students', 'are', 'taught', 'leadership', 'skills', 'through', 'our', '7', 'habits', 'of', 'highly', 'effective', 'teens*', 'environment', '.', 'quest', 'has', 'a', 'private', 'school', 'atmosphere', 'without', 'the', 'tuition', 'cost', '.', 'the', 'campus', 'is', 'dedicated', 'to', 'the', 'idea', 'that', 'education', 'can', 'have', 'a', 'great', 'connection', 'with', 'the', 'home', 'and', 'family', '.', 'though', 'the', 'atmosphere', 'feels', 'like', 'a', 'private', 'school', ',', 'there', 'is', 'no', 'tuition', 'to', 'attend', 'a', 'quest', 'middle', 'school', '.', 'quest', 'is', 'a', 'public', 'school', 'chartered', 'by', 'the', 'state', 'board', 'of', 'education', '.', 'as', 'a', 'public', 'school', ',', 'the', 'campus', 'has', 'the', 'responsibility', 'to', 'ensure', 'all', 'students', 'meet', 'the', 'standards', 'created', 'by', 'the', 'texas', 'education', 'agency', '.']



In [18]:

    
# Now we remove stopwords and stem. This will improve the results
df['webtokens_clean'] = df['webtokens_nopunct'].apply(lambda x: [word for word in x if word not in list(stopenglish)]) # remove stopwords
df['webtokens_stemmed'] = df['webtokens_clean'].apply(lambda x: [PorterStemmer().stem(word) for word in x])



In [19]:

    
# Some analyses require a string version of the webtext without punctuation or numbers.
# To get this, we join together the cleaned and stemmed tokens created above, and then remove numbers and punctuation:
df['webtext_stemmed'] = df['webtokens_stemmed'].apply(lambda x: ' '.join(char for char in x))
df['webtext_stemmed'] = df['webtext_stemmed'].apply(lambda x: ''.join(char for char in x if char not in punctuations))
df['webtext_stemmed'] = df['webtext_stemmed'].apply(lambda x: ''.join(char for char in x if not char.isdigit()))



In [20]:

    
df['webtext_stemmed'][0]









    Out[20]:





'quest middl schools¨ school focus high expect behavior academ student must work hard meet goal fulli succeed quest middl school student must consist show leadership skill good behavior work ethic meet expect beyond quest school provid curriculum design teach wisdom knowledg crucial wisdom vital part middl school studentõ growth matur charact educ taught level student taught leadership skill  habit highli effect teens environ administr teacher care student passion see reach full potenti provid qualiti educ student quest educ collabor make sure child receiv attent necessari success quest provid safe environ commit learn educ work student parent meet rigor academ standard quest combin teach knowledg wisdom quest middl school use varieti curriculum ensur middl school student solid foundat content learn tradit curriculum beyond quest middl school provid curriculum design teach wisdom knowledg crucial wisdom vital part middl school studentõ growth matur charact educ taught level student taught leadership skill  habit highli effect teens environ quest privat school atmospher without tuition cost campu dedic idea educ great connect home famili though atmospher feel like privat school tuition attend quest middl school quest public school charter state board educ public school campu respons ensur student meet standard creat texa educ agenc'



In [21]:

    
# Some analyses require tokenized sentences. I'll do this with the list of dictionaries.
# I'll use cleaned, tokenized sentences (with stopwords) to create both a dictionary variable and a separate list for word2vec

words_by_sentence = [] # initialize the list of tokenized sentences as an empty list
for school in sample:
    school["sent_toksclean"] = []
    school["sent_tokens"] = [word_tokenize(sentence) for sentence in sent_tokenize(school["WEBTEXT"])] 
    for sent in school["sent_tokens"]:
        school["sent_toksclean"].append([PorterStemmer().stem(word.lower()) for word in sent if (word not in punctuations)]) # for each word: stem, lower-case, and remove punctuations
        words_by_sentence.append([PorterStemmer().stem(word.lower()) for word in sent if (word not in punctuations)])



In [22]:

    
words_by_sentence[:2]









    Out[22]:





[['quest',
  'middl',
  'schools¨',
  'are',
  'school',
  'focus',
  'on',
  'high',
  'expect',
  'for',
  'behavior',
  'and',
  'academ'],
 ['student', 'must', 'work', 'hard', 'to', 'meet', 'their', 'goal']]

Counting document lengths



In [23]:

    
# We can also count document lengths. I'll mostly use the version with punctuation removed but including stopwords,
# because stopwords are also part of these schools' public image/ self-presentation to potential parents, regulators, etc.

df['webstem_count'] = df['webtokens_stemmed'].apply(len) # find word count without stopwords or punctuation
df['webpunct_count'] = df['webtokens_nopunct'].apply(len) # find length with stopwords still in there (but no punctuation)
df['webclean_count'] = df['webtokens_clean'].apply(len) # find word count without stopwords or punctuation



In [24]:

    
# For which urban status are website self-description the longest?
print(grouped_urban['webpunct_count'].mean().sort_values(ascending=False))









    



ULOCAL
12    941.421053
22    780.000000
21    593.472222
41    576.909091
11    571.500000
13    530.809524
31    408.666667
33    364.500000
32    292.333333
42    257.444444
23    210.000000
Name: webpunct_count, dtype: float64



In [25]:

    
# here's the mean website self-description word count for schools grouped by urban proximity, in a graph:
grouped_urban['webpunct_count'].mean().sort_values(ascending=True).plot(kind = 'bar', title='Schools in mid-sized cities and suburbs have longer self-descriptions than in fringe areas', yerr = grouped_state["webpunct_count"].std())
plt.show()



In [26]:

    
# Look at 'FTE' (proxy for # administrators) clustered by urban proximity and whether it explains this
grouped_urban['FTE'].mean().sort_values(ascending=True).plot(kind = 'bar', title='Title', yerr = grouped_state["FTE"].std())
plt.show()



In [27]:

    
# Now let's calculate the type-token ratio (TTR) for each school, which compares
# the number of types (unique words used) with the number of words (including repetitions of words).

df['numtypes'] = df['webtokens_nopunct'].apply(lambda x: len(set(x))) # this is the number of unique words per site
df['TTR'] =  df['numtypes'] / df['webpunct_count'] # calculate TTR



In [28]:

    
# here's the mean TTR for schools grouped by urban category:
grouped_urban = df.groupby('ULOCAL')
grouped_urban['TTR'].mean().sort_values(ascending=True).plot(kind = 'bar', title='Charters in cities and suburbs have higher textual redundancy than in fringe areas', yerr = grouped_urban["TTR"].std())
plt.show()

(Excessively) Frequent words



In [29]:

    
# First, aggregate all the cleaned webtext:
webtext_all = []
df['webtokens_clean'].apply(lambda x: [webtext_all.append(word) for word in x])
webtext_all[:20]









    Out[29]:





['quest',
 'middle',
 'schools¨',
 'schools',
 'focused',
 'high',
 'expectations',
 'behavior',
 'academics',
 'students',
 'must',
 'work',
 'hard',
 'meet',
 'goals',
 'fully',
 'succeed',
 'quest',
 'middle',
 'school']



In [30]:

    
# Now apply the nltk function FreqDist to count the number of times each token occurs.
word_frequency = nltk.FreqDist(webtext_all)

#print out the 50 most frequent words using the function most_common
print(word_frequency.most_common(50))









    



[('students', 1739), ('school', 1661), ('learning', 736), ('education', 539), ('charter', 476), ('community', 470), ('student', 466), ('high', 415), ('program', 395), ('academic', 390), ('schools', 344), ('academy', 342), ('curriculum', 340), ('college', 328), ('skills', 320), ('teachers', 295), ('children', 280), ('grade', 267), ('environment', 241), ('provide', 238), ('success', 226), ('educational', 224), ('every', 217), ('work', 213), ('support', 207), ('staff', 207), ('leadership', 198), ('year', 198), ('arts', 190), ('parents', 185), ('development', 185), ('develop', 184), ('learn', 183), ('state', 182), ('public', 182), ('grades', 176), ('standards', 174), ('needs', 170), ('instruction', 169), ('core', 169), ('science', 168), ('world', 153), ('mission', 152), ('programs', 152), ('life', 151), ('new', 150), ('opportunities', 150), ('social', 148), ('one', 148), ('also', 146)]

These are prolific, ritual, empty words and will be excluded from topic models!

Distinctive words (mostly place names)



In [31]:

    
sklearn_dtm = countvec.fit_transform(df['webtext_stemmed'])
print(sklearn_dtm)









    



  (0, 3848)	11
  (0, 3003)	8
  (0, 4223)	1
  (0, 4214)	13
  (0, 1883)	1
  (0, 2208)	1
  (0, 1703)	2
  (0, 456)	2
  (0, 25)	2
  (0, 4679)	9
  (0, 3130)	2
  (0, 5432)	3
  (0, 2157)	1
  (0, 2956)	4
  (0, 2045)	1
  (0, 1968)	1
  (0, 4716)	1
  (0, 1005)	1
  (0, 4380)	1
  (0, 2697)	3
  (0, 4438)	3
  (0, 2055)	1
  (0, 1646)	1
  (0, 480)	2
  (0, 3802)	4
  :	:
  (195, 1421)	1
  (195, 3963)	1
  (195, 3768)	1
  (195, 3894)	1
  (195, 2069)	1
  (195, 3020)	2
  (195, 4618)	1
  (195, 2060)	1
  (195, 3197)	1
  (195, 710)	1
  (195, 1605)	1
  (195, 1120)	1
  (195, 2224)	1
  (195, 401)	1
  (195, 5380)	2
  (195, 2102)	1
  (195, 4935)	1
  (195, 5576)	1
  (195, 320)	1
  (195, 5598)	1
  (195, 1704)	1
  (195, 3096)	1
  (195, 3391)	1
  (195, 4715)	1
  (195, 3810)	2



In [32]:

    
# What are some of the words in the DTM? 
print(countvec.get_feature_names()[:10])









    



['a', 'aaec', 'ab', 'abandon', 'abbi', 'abbott', 'abc', 'abernathi', 'abid', 'abil']



In [33]:

    
# now we can create the dtm, but with cells weigthed by the tf-idf score.
dtm_tfidf_df = pandas.DataFrame(tfidfvec.fit_transform(df.webtext_stemmed).toarray(), columns=tfidfvec.get_feature_names(), index = df.index)

dtm_tfidf_df[:20] # let's take a look!









    Out[33]:






  
    
      
      aaec
      ab
      abandon
      abbi
      abbott
      abc
      abernathi
      abid
      abil
      abl
      ...
      ômi
      ôno
      ôsave
      ôsearchõ
      ôsign
      ôsigninõ
      ôsuper
      ôtapestryõ
      ôthreadsõ
      ôwatt
    
  
  
    
      0
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      1
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      2
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      3
      0.413013
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      4
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      5
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      6
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      7
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      8
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.031577
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      9
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      10
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      11
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      12
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      13
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      14
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.047121
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      15
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      16
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      17
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      18
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
    
      19
      0.000000
      0
      0
      0
      0
      0
      0
      0
      0.000000
      0
      ...
      0
      0
      0
      0
      0
      0
      0
      0
      0
      0
    
  

20 rows × 5629 columns



In [34]:

    
# What are the 20 words with the highest TF-IDF scores?
print(dtm_tfidf_df.max().sort_values(ascending=False)[:20])









    



we                0.959053
telesi            0.877403
wra               0.787935
action            0.775454
swp               0.733787
ywlc              0.732152
tapestri          0.720522
quest             0.718819
bam               0.710280
englishspanish    0.699484
waysid            0.698282
slam              0.693069
treknorth         0.688578
renaiss           0.675415
rcsa              0.667000
graham            0.664289
taylion           0.662956
ivi               0.648762
somerset          0.648687
scholar           0.647563
dtype: float64

Like the frequent words above, these highly "unique" words are empty of meaning and will be excluded from topic models!

Word Embeddings with word2vec

Word2Vec features

Size: Number of dimensions for word embedding model
Window: Number of context words to observe in each direction
min_count: Minimum frequency for words included in model
sg (Skip-Gram): '0' indicates CBOW model; '1' indicates Skip-Gram
Alpha: Learning rate (initial); prevents model from over-correcting, enables finer tuning
Iterations: Number of passes through dataset
Batch Size: Number of words to sample from data during each pass
Worker: Set the 'worker' option to ensure reproducibility



In [35]:

    
# train the model, using a minimum of 5 words
model = gensim.models.Word2Vec(words_by_sentence, size=100, window=5, \
                               min_count=2, sg=1, alpha=0.025, iter=5, batch_words=10000, workers=1)



In [36]:

    
# dictionary of words in model (may not work for old gensim)
# print(len(model.vocab))
# model.vocab



In [37]:

    
# Find cosine distance between two given word vectors
print(model.similarity('college-prep','align')) # these two are close to essentialism
print(model.similarity('emot', 'curios')) # these two are close to progressivism









    



0.914012837905
0.934911449012



In [38]:

    
# create some rough dictionaries for our contrasting educational philosophies
essentialism = ['excel', 'perform', 'prep', 'rigor', 'standard', 'align', 'comprehens', 'content', \
                               'data-driven', 'market', 'research', 'research-bas', 'program', 'standards-bas']
progressivism = ['inquir', 'curios', 'project', 'teamwork', 'social', 'emot', 'reflect', 'creat',\
                'ethic', 'independ', 'discov', 'deep', 'problem-solv', 'natur']



In [39]:

    
# Let's look at two vectors that demonstrate the binary between these philosophies: align and emot
print(model.most_similar('align')) # words core to essentialism
print()
print(model.most_similar('emot')) # words core to progressivism









    



[('across', 0.962225079536438), ('common', 0.9599748253822327), ('design', 0.9472075700759888), ('compon', 0.9386098980903625), ('research-bas', 0.9384891986846924), ('sequenc', 0.9361364841461182), ('util', 0.9324259161949158), ('philosophi', 0.9307288527488708), ('exceed', 0.9298505187034607), ('pennsylvania', 0.9288857579231262)]

[('creativ', 0.9845424294471741), ('intellectu', 0.9743475914001465), ('strategi', 0.9633899927139282), ('tool', 0.9565004110336304), ('basic', 0.9564352035522461), ('compet', 0.9557984471321106), ('awar', 0.9557099342346191), ('practic', 0.9553041458129883), ('critic', 0.9519203901290894), ('studentsõ', 0.951266884803772)]



In [40]:

    
print(model.most_similar('emot')) # words core to progressivism









    



[('creativ', 0.9845424294471741), ('intellectu', 0.9743475914001465), ('strategi', 0.9633899927139282), ('tool', 0.9565004110336304), ('basic', 0.9564352035522461), ('compet', 0.9557984471321106), ('awar', 0.9557099342346191), ('practic', 0.9553041458129883), ('critic', 0.9519203901290894), ('studentsõ', 0.951266884803772)]



In [41]:

    
# Let's work with the binary between progressivism vs. essentialism
# first let's find the 50 words closest to each philosophy using the two 14-term dictionaries defined above
prog_words = model.most_similar(progressivism, topn=50)
prog_words = [word for word, similarity in prog_words]
for word in progressivism:
    prog_words.append(word)
print(prog_words[:20])









    



['deeper', 'acquir', 'appreci', 'disciplin', 'real-world', 'awar', 'cognit', 'trait', 'human', 'mind', 'differenti', 'defin', 'strengthen', 'play', 'authent', 'self-confid', 'show', 'studentsõ', 'explor', 'scientif']



In [42]:

    
ess_words = model.most_similar(essentialism, topn=50) # now let's get the 50 most similar words for our essentialist dictionary
ess_words = [word for word, similarity in ess_words]
for word in essentialism:
    ess_words.append(word)
print(ess_words[:20])









    



['acceler', 'blend', 'compon', 'rich', 'intens', 'sequenc', 'infus', 'coursework', 'framework', 'proven', 'college-preparatori', 'across', 'key', 'student-cent', 'fulli', 'industri', 'aim', 'util', 'self-pac', 'profici']



In [43]:

    
# construct an combined dictionary
phil_words = ess_words + prog_words



In [44]:

    
# preparing for visualizing this binary with word2vec
x = [model.similarity('emot', word) for word in phil_words]
y = [model.similarity('align', word) for word in phil_words]



In [45]:

    
# here's a visual of the progressivism/essentialism binary: 
# top-left half is essentialism, bottom-right half is progressivism
_, ax = plt.subplots(figsize=(20,20))
ax.scatter(x, y, alpha=1, color='b')
for i in range(len(phil_words)):
    ax.annotate(phil_words[i], (x[i], y[i]))
ax.set_xlim(.635, 1.005)
ax.set_ylim(.635, 1.005)
plt.plot([0, 1], [0, 1], linestyle='--');

Binary of essentialist (top-left) and progressivist (bottom-right) word vectors

Topic Modeling with scikit-learn

For documentation on this topic modeling (TM) package, which uses Latent Dirichlet Allocation (LDA), see here.

And for documentation on the vectorizer package, CountVectorizer from scikit-learn, see here.



In [46]:

    
####Adopted From: 
#Author: Olivier Grisel <olivier.grisel@ensta.org>
#         Lars Buitinck
#         Chyi-Kwei Yau <chyikwei.yau@gmail.com>
# License: BSD 3 clause

# Initialize the variables needed for the topic models
n_samples = 2000
n_topics = 3
n_top_words = 50

# Create helper function that prints out the top words for each topic in a pretty way
def print_top_words(model, feature_names, n_top_words):
    for topic_idx, topic in enumerate(model.components_):
        print("\nTopic #%d:" % topic_idx)
        print(" ".join([feature_names[i]
                        for i in topic.argsort()[:-n_top_words - 1:-1]]))
    print()



In [47]:

    
# Vectorize our text using CountVectorizer
print("Extracting tf features for LDA...")
tf_vectorizer = CountVectorizer(max_df=70, min_df=4,
                                max_features=None,
                                stop_words=stopenglish, lowercase=1
                                )

tf = tf_vectorizer.fit_transform(df.WEBTEXT)









    



Extracting tf features for LDA...



In [48]:

    
print("Fitting LDA models with tf features, "
      "n_samples=%d and n_topics=%d..."
      % (n_samples, n_topics))

# define the lda function, with desired options
lda = LatentDirichletAllocation(n_topics=n_topics, max_iter=20,
                                learning_method='online',
                                learning_offset=80.,
                                total_samples=n_samples,
                                random_state=0)
#fit the model
lda.fit(tf)









    



Fitting LDA models with tf features, n_samples=2000 and n_topics=3...






    Out[48]:





LatentDirichletAllocation(batch_size=128, doc_topic_prior=None,
             evaluate_every=-1, learning_decay=0.7,
             learning_method='online', learning_offset=80.0,
             max_doc_update_iter=100, max_iter=20, mean_change_tol=0.001,
             n_jobs=1, n_topics=3, perp_tol=0.1, random_state=0,
             topic_word_prior=None, total_samples=2000, verbose=0)



In [49]:

    
# print the top words per topic, using the function defined above.

print("\nTopics in LDA model:")
tf_feature_names = tf_vectorizer.get_feature_names()
print_top_words(lda, tf_feature_names, n_top_words)









    



Topics in LDA model:

Topic #0:
scholars character child leadership new achievement district vision believe day values help programs self excellence families teacher ensure leaders strong respect preparatory individual small responsibility safe best others quality free learners take also process teaching campus board become family successful time texas experience reading rigorous meet focus math building prepare

Topic #1:
arts science technology leadership reading language action services provides music also young art writing middle 12 district programs day math history class time study career use english campus elementary areas classes international using knowledge course teacher understanding including activities focus new teaching first content physical personal service may challenging ib

Topic #2:
science new arts leaders summit also teacher leadership time teaching young throughout programs achievement math study approach part building class believe families focus use language elementary small renaissance child help parent experiences technology future activities reading day provides values instructional process quality best behavior rigorous art service 12 thinking experience

These topics seem to mean:

topic 0 relates to GOALS,
topic 1 relates to CURRICULUM, and
topic 2 relates to PHILOSOPHY or learning process (but this topic less clear/ more mottled)



In [50]:

    
# Preparation for looking at distribution of topics over schools
topic_dist = lda.transform(tf) # transpose topic distribution
topic_dist_df = pandas.DataFrame(topic_dist) # turn into a df
df_w_topics = topic_dist_df.join(df) # merge with charter MS dataframe
df_w_topics[:20] # check out the merged df with topics!









    Out[50]:






  
    
      
      0
      1
      2
      SCHNAM
      ADDRESS
      URL
      SEARCH
      CUSTOMID
      WEBTEXT
      LEVEL
      ...
      webtokens
      webtokens_nopunct
      webtokens_clean
      webtokens_stemmed
      webtext_stemmed
      webstem_count
      webpunct_count
      webclean_count
      numtypes
      TTR
    
  
  
    
      0
      131.261273
      0.394623
      0.344103
      QUEST MIDDLE SCHOOL OF PINE BLUFF
      308 SOUTH BLAKE ST, PINE BLUFF, AR
      http://responsiveed.com/questpinebluff/
      QUEST MIDDLE SCHOOL OF PINE BLUFF 308 SOUTH BL...
      AR3542702
      Quest Middle Schools¨ are schools focused on h...
      2
      ...
      [quest, middle, schools¨, are, schools, focuse...
      [quest, middle, schools¨, are, schools, focuse...
      [quest, middle, schools¨, schools, focused, hi...
      [quest, middl, schools¨, school, focus, high, ...
      quest middl schools¨ school focus high expect ...
      200
      314
      200
      139
      0.442675
    
    
      1
      43.251431
      0.404550
      0.344018
      THE ACADEMIES AT JONESBORO HIGH SCHOOL
      301 HURRICANE DR, JONESBORO, AR
      http://www.jonesboroschools.net/schools/academ...
      THE ACADEMIES AT JONESBORO HIGH SCHOOL 301 HUR...
      AR1608703
      The mission of the Academies at Jonesboro High...
      3
      ...
      [the, mission, of, the, academies, at, jonesbo...
      [the, mission, of, the, academies, at, jonesbo...
      [mission, academies, jonesboro, high, school, ...
      [mission, academi, jonesboro, high, school, pr...
      mission academi jonesboro high school provid h...
      63
      111
      63
      71
      0.639640
    
    
      2
      23.290207
      0.367458
      0.342335
      A CHILD'S VIEW SCHOOL
      2846 DREXEL RD, TUCSON, AZ
      http://childcarecenter.us/provider_detail/a_ch...
      A CHILD'S VIEW SCHOOL 2846 DREXEL RD, TUCSON, AZ
      AZ87345
      We believe that every child needs a well-round...
      1
      ...
      [we, believe, that, every, child, needs, a, we...
      [we, believe, that, every, child, needs, a, we...
      [believe, every, child, needs, well-rounded, e...
      [believ, everi, child, need, well-round, educ,...
      believ everi child need wellround educ famili ...
      38
      77
      38
      61
      0.792208
    
    
      3
      13.599131
      5.047494
      0.353375
      AAEC - PARADISE VALLEY
      17811 NORTH 32ND ST, PHOENIX, AZ
      http://www.aaechighschools.com/
      AAEC - PARADISE VALLEY 17811 NORTH 32ND ST, PH...
      AZ6344
      AAEC Early College High School prepares young ...
      3
      ...
      [aaec, early, college, high, school, prepares,...
      [aaec, early, college, high, school, prepares,...
      [aaec, early, college, high, school, prepares,...
      [aaec, earli, colleg, high, school, prepar, yo...
      aaec earli colleg high school prepar young adu...
      31
      45
      31
      38
      0.844444
    
    
      4
      24.045600
      18.605429
      0.348971
      ABRAHAM LINCOLN TRADITIONAL SCHOOL
      10444 NORTH 39TH AVE, PHOENIX, AZ
      http://abrahamlincoln.wesdschools.org/
      ABRAHAM LINCOLN TRADITIONAL SCHOOL 10444 NORTH...
      AZ5274
      The mission of the Abraham Lincoln Traditional...
      1
      ...
      [the, mission, of, the, abraham, lincoln, trad...
      [the, mission, of, the, abraham, lincoln, trad...
      [mission, abraham, lincoln, traditional, schoo...
      [mission, abraham, lincoln, tradit, school, gu...
      mission abraham lincoln tradit school guid cha...
      75
      106
      75
      71
      0.669811
    
    
      5
      0.437738
      93.213344
      0.348919
      ACADEMY DEL SOL
      4525 EAST BROADWAY BLVD, TUCSON, AZ
      http://www.academydelsol.com/
      ACADEMY DEL SOL 4525 EAST BROADWAY BLVD, TUCSO...
      AZ90200
      Academy Del SolÕs mission is to provide a rigo...
      1
      ...
      [academy, del, solõs, mission, is, to, provide...
      [academy, del, solõs, mission, is, to, provide...
      [academy, del, solõs, mission, provide, rigoro...
      [academi, del, solõ, mission, provid, rigor, s...
      academi del solõ mission provid rigor superior...
      147
      226
      147
      140
      0.619469
    
    
      6
      196.920592
      33.728109
      0.351299
      ACADEMY OF TUCSON ELEMENTARY SCHOOL
      9209 EAST WRIGHTSTOWN RD, TUCSON, AZ
      http://www.academyoftucson.com/elementary-scho...
      ACADEMY OF TUCSON ELEMENTARY SCHOOL 9209 EAST ...
      AZ81130
      Mission:\rIt is the purpose of the Academy of ...
      1
      ...
      [mission, :, it, is, the, purpose, of, the, ac...
      [mission, it, is, the, purpose, of, the, acade...
      [mission, purpose, academy, tucson, provide, p...
      [mission, purpos, academi, tucson, provid, pre...
      mission purpos academi tucson provid prepar gr...
      356
      620
      356
      278
      0.448387
    
    
      7
      24.971913
      14.676298
      0.351789
      DESERT MOSAIC SCHOOL
      5757 WEST AJO HWY, TUCSON, AZ
      http://desertmosaic.com/Home_Page.php
      DESERT MOSAIC SCHOOL 5757 WEST AJO HWY, TUCSON...
      AZ79118
      Desert Mosaic School commits to creating a tea...
      4
      ...
      [desert, mosaic, school, commits, to, creating...
      [desert, mosaic, school, commits, to, creating...
      [desert, mosaic, school, commits, creating, te...
      [desert, mosaic, school, commit, creat, teach,...
      desert mosaic school commit creat teach enviro...
      67
      115
      67
      74
      0.643478
    
    
      8
      179.458463
      56.191420
      0.350117
      KAIZEN EDUCATION FOUNDATION DBA SUMMIT HIGH SC...
      728 EAST MCDOWELL RD, PHOENIX, AZ
      http://www.summiths.com/
      KAIZEN EDUCATION FOUNDATION DBA SUMMIT HIGH SC...
      AZ10749
      Summit High SchoolÕs Mission and Vision is to ...
      3
      ...
      [summit, high, schoolõs, mission, and, vision,...
      [summit, high, schoolõs, mission, and, vision,...
      [summit, high, schoolõs, mission, vision, prov...
      [summit, high, schoolõ, mission, vision, provi...
      summit high schoolõ mission vision provid safe...
      363
      580
      363
      275
      0.474138
    
    
      9
      47.900249
      20.751915
      0.347836
      OASIS HIGH SCHOOL
      8632 WEST NORTHERN AVE, GLENDALE, AZ
      https://web.archive.org/web/20120617204246/htt...
      OASIS HIGH SCHOOL 8632 WEST NORTHERN AVE, GLEN...
      AZ78955
      Omega's mission is to provide an optimal teach...
      3
      ...
      [omega, 's, mission, is, to, provide, an, opti...
      [omega, 's, mission, is, to, provide, an, opti...
      [omega, 's, mission, provide, optimal, teachin...
      [omega, 's, mission, provid, optim, teach, lea...
      omega s mission provid optim teach learn envir...
      133
      221
      133
      137
      0.619910
    
    
      10
      0.424860
      138.224560
      0.350580
      SAGE ACADEMY
      1055 EAST HEARN RD, SCOTTSDALE, AZ
      http://www.sage-academy.org/
      SAGE ACADEMY 1055 EAST HEARN RD, SCOTTSDALE, AZ
      AZ89415
      River Valley Charter School (RVCS) is a ...
      1
      ...
      [river, valley, charter, school, (, rvcs, ), i...
      [river, valley, charter, school, rvcs, is, a, ...
      [river, valley, charter, school, rvcs, public,...
      [river, valley, charter, school, rvc, public, ...
      river valley charter school rvc public charter...
      206
      341
      206
      185
      0.542522
    
    
      11
      401.374463
      198.273555
      0.351982
      SCHOOL FOR INTEGRATED ACADEMICS AND TECHNOLOGIES
      518 SOUTH 3RD ST, PHOENIX, AZ
      http://www.siatech.org/
      SCHOOL FOR INTEGRATED ACADEMICS AND TECHNOLOGI...
      AZ79450
      As a nonprofit organization, we create the env...
      3
      ...
      [as, a, nonprofit, organization, ,, we, create...
      [as, a, nonprofit, organization, we, create, t...
      [nonprofit, organization, create, environments...
      [nonprofit, organ, creat, environ, tool, techn...
      nonprofit organ creat environ tool techniqu re...
      964
      1476
      964
      634
      0.429539
    
    
      12
      9.411709
      89.237486
      0.350806
      SEQUOIA VILLAGE SCHOOL
      982 FULL HOUSE LN, SHOW LOW, AZ
      http://www.sequoiavillageschool.org/
      SEQUOIA VILLAGE SCHOOL 982 FULL HOUSE LN, SHOW...
      AZ10848
      Building a Better World One Student at a Time"...
      1
      ...
      [building, a, better, world, one, student, at,...
      [building, a, better, world, one, student, at,...
      [building, better, world, one, student, time, ...
      [build, better, world, one, student, time, '',...
      build better world one student time  seed publ...
      188
      263
      188
      167
      0.634981
    
    
      13
      52.244098
      0.404307
      0.351595
      SOUTH POINTE HIGH SCHOOL
      8325 SOUTH CENTRAL AVE, PHOENIX, AZ
      http://www.southpointehs.com/
      SOUTH POINTE HIGH SCHOOL 8325 SOUTH CENTRAL AV...
      AZ80990
      We nurture our students academically, behavior...
      3
      ...
      [we, nurture, our, students, academically, ,, ...
      [we, nurture, our, students, academically, beh...
      [nurture, students, academically, behaviorally...
      [nurtur, student, academ, behavior, emot, serv...
      nurtur student academ behavior emot serv produ...
      65
      117
      65
      71
      0.606838
    
    
      14
      91.595543
      80.053909
      0.350548
      SOUTH POINTE HIGH SCHOOL
      8325 SOUTH CENTRAL AVE, PHOENIX, AZ
      http://www.southpointehs.com/
      SOUTH POINTE HIGH SCHOOL 8325 SOUTH CENTRAL AV...
      AZ80990
      We nurture our students academically, behavior...
      3
      ...
      [we, nurture, our, students, academically, ,, ...
      [we, nurture, our, students, academically, beh...
      [nurture, students, academically, behaviorally...
      [nurtur, student, academ, behavior, emot, serv...
      nurtur student academ behavior emot serv produ...
      265
      466
      265
      233
      0.500000
    
    
      15
      14.281958
      0.371092
      0.346950
      SOUTH POINTE JUNIOR HIGH SCHOOL
      217 EAST OLYMPIC DR, PHOENIX, AZ
      http://www.southpointejh.com/
      SOUTH POINTE JUNIOR HIGH SCHOOL 217 EAST OLYMP...
      AZ79178
      To provide an education and support students d...
      2
      ...
      [to, provide, an, education, and, support, stu...
      [to, provide, an, education, and, support, stu...
      [provide, education, support, students, develo...
      [provid, educ, support, student, develop, self...
      provid educ support student develop selfreli s...
      28
      48
      28
      37
      0.770833
    
    
      16
      18.623108
      3.020419
      0.356473
      SOUTHSIDE COMMUNITY SCHOOL
      2701 SOUTH CAMPBELL AVE, TUCSON, AZ
      http://www.ade.az.gov/edd/NewDetails.asp?Entit...
      SOUTHSIDE COMMUNITY SCHOOL 2701 SOUTH CAMPBELL...
      AZ79432
      Southside Community School is a free, public c...
      4
      ...
      [southside, community, school, is, a, free, ,,...
      [southside, community, school, is, a, free, pu...
      [southside, community, school, free, public, c...
      [southsid, commun, school, free, public, chart...
      southsid commun school free public charter sch...
      29
      41
      29
      33
      0.804878
    
    
      17
      18.192244
      426.458469
      0.349286
      SUN VALLEY CHARTER SCHOOL
      5806 SOUTH 35TH AVE BLDG EAST, PHOENIX, AZ
      http://www.sunvalleycharterschool.com/
      SUN VALLEY CHARTER SCHOOL 5806 SOUTH 35TH AVE ...
      AZ90193
      Sun Valley Charter School has made a commitmen...
      1
      ...
      [sun, valley, charter, school, has, made, a, c...
      [sun, valley, charter, school, has, made, a, c...
      [sun, valley, charter, school, made, commitmen...
      [sun, valley, charter, school, made, commit, p...
      sun valley charter school made commit provid s...
      666
      986
      666
      383
      0.388438
    
    
      18
      71.231727
      0.418614
      0.349659
      TANQUE VERDE ELEMENTARY SCHOOL
      2600 NORTH FENNIMOREA AVE, TUCSON, AZ
      http://www.tanqueverdeschools.org/
      TANQUE VERDE ELEMENTARY SCHOOL 2600 NORTH FENN...
      AZ5829
      "Excellence is our goal, understanding our fou...
      1
      ...
      [``, excellence, is, our, goal, ,, understandi...
      [``, excellence, is, our, goal, understanding,...
      [``, excellence, goal, understanding, foundati...
      [``, excel, goal, understand, foundat, '', tvu...
      excel goal understand foundat  tvusd ensur ef...
      103
      153
      103
      109
      0.712418
    
    
      19
      49.240595
      0.410438
      0.348967
      TARTESSO ELEMENTARY SCHOOL
      29677 WEST INDIANOLA RD, BUCKEYE, AZ
      http://tartesso.smusd90.org/
      TARTESSO ELEMENTARY SCHOOL 29677 WEST INDIANOL...
      AZ89596
      We are located in Buckeye, Arizona and enjoy t...
      1
      ...
      [we, are, located, in, buckeye, ,, arizona, an...
      [we, are, located, in, buckeye, arizona, and, ...
      [located, buckeye, arizona, enjoy, benefits, a...
      [locat, buckey, arizona, enjoy, benefit, activ...
      locat buckey arizona enjoy benefit activ suppo...
      86
      145
      86
      88
      0.606897
    
  

20 rows × 347 columns



In [51]:

    
topic_columns = range(0,n_topics) # Set numerical range of topic columns for use in analyses, using n_topics from above



In [52]:

    
# Which schools are weighted highest for topic 0? How do they trend with regard to urban proximity and student class? 
print(df_w_topics[['LSTATE', 'ULOCAL', 'PCTETH', 'PCTFRPL', 0, 1, 2]].sort_values(by=[0], ascending=False))









    



    LSTATE  ULOCAL     PCTETH   PCTFRPL           0            1         2
73      FL      12   0.457797  0.440629  836.884989   178.763940  0.351071
87      IL      12   0.733624  0.532751  807.866177    85.781466  0.352358
52      CA      11   0.997093  0.973837  682.823243     0.824402  0.352355
84      FL      11   0.556028  0.173050  660.449380    78.198383  0.352236
51      CA      11   0.557576  0.232323  656.232871     0.417456  0.349673
23      AZ      13   0.174312 -0.009174  571.003730     8.642115  0.354154
127     NY      11   0.989899  0.818182  563.216136    10.432775  0.351089
44      CA      11   0.810409  0.483271  530.244377     0.404822  0.350801
20      AZ      13   0.297222  0.008333  515.216903    96.431358  0.351739
180     TX      12   0.894057  0.744186  473.231776     0.421162  0.347063
22      AZ      41   0.341346  0.552885  446.228030     0.420970  0.351000
11      AZ      11 -11.250000  0.000000  401.374463   198.273555  0.351982
165     TX      11   0.961957  0.739130  391.239505     0.412921  0.347574
128     NY      11   0.996667  0.930000  382.700076    95.953308  0.346616
76      FL      21   0.850288  0.382917  371.228569     0.421452  0.349978
149     PA      11   1.000000 -0.013575  359.454317    12.198365  0.347319
182     TX      21   0.994253  0.948276  352.248564     0.403317  0.348119
195     WI      11   0.353234  0.343284  345.233831     0.419037  0.347131
101     MI      12   0.884232  0.934132  328.253226     0.398792  0.347982
160     TX      21   0.882096  0.895197  319.249176     0.404244  0.346579
159     TX      11   0.976143  0.940358  303.255555     0.397995  0.346450
82      FL      21   0.976744  0.941860  292.288126     2.359365  0.352509
140     OH      12   0.328358  0.776119  291.066628    22.584135  0.349237
104     MI      12   0.753769  0.773869  274.249669     0.402176  0.348154
174     TX      21   0.744076  0.677725  273.262781     0.392607  0.344612
119     NC      21   0.088095  0.095238  266.783878   126.866013  0.350109
63      DE      13   0.965157  0.606272  264.251378     0.401792  0.346830
142     OR      41   0.168224  0.224299  263.240002     0.411085  0.348913
194     WI      22   0.354232  0.360502  262.266485     0.388329  0.345186
179     TX      21   0.991489  0.927660  260.245051     0.407997  0.346951
..     ...     ...        ...       ...         ...          ...       ...
154     SC      13   0.392857  0.535714    0.433510    68.220386  0.346105
72      FL      32   0.324503  0.675497    0.428338    67.225470  0.346192
37      CA      21   0.193878  0.034014    0.425801   200.225710  0.348489
10      AZ      11   0.369427  0.656051    0.424860   138.224560  0.350580
45      CA      11   0.967273  0.709091    0.422800     6.234783  0.342417
172     TX      21   0.337386  0.118541    0.422751   136.230226  0.347023
65      FL      22   0.208333  0.370833    0.422699   367.225716  0.351586
184     UT      41   0.060311  0.130350    0.422111   555.231439  0.346449
192     WI      32   0.107143  0.500000    0.419505    13.238635  0.341860
88      IL      11   0.995968  0.784274    0.417534   185.231030  0.351436
64      FL      21   0.978964  0.621359    0.415587    17.240277  0.344136
31      CA      33   0.286184  0.644737    0.412016    49.240774  0.347210
54      CA      22   0.420664  0.553506    0.407155   306.245094  0.347752
145     PA      21   0.140662  0.159574    0.400168    63.253037  0.346794
50      CA      12   0.494755  0.312937    0.397237   851.253728  0.349034
148     PA      11   0.931104  0.853090    0.396484   260.257830  0.345686
152     RI      12   0.900763  0.877863    0.396424   973.256774  0.346802
110     MN      21   0.084507  0.070423    0.396098  1895.255444  0.348458
67      FL      11   0.382022  0.292135    0.395258    24.261845  0.342896
81      FL      13   0.564706  0.701176    0.394519   142.251015  0.354467
68      FL      11   0.487395  0.291317    0.394138    24.263576  0.342285
97      MA      21   0.257764  0.177019    0.394035   390.259808  0.346158
150     PA      11   0.998494  0.936747    0.392511   660.261869  0.345620
153     SC      21   0.652174  0.231884    0.392209    93.262182  0.345609
74      FL      21   0.965517  0.775862    0.391582   126.259423  0.348995
181     TX      11   0.762821  0.588675    0.389915   468.265444  0.344641
191     WI      33   1.000000  0.000000    0.387120   450.268534  0.344346
168     TX      33   0.191057  0.069106    0.387115    27.269994  0.342890
151     PA      21   0.269565  0.608696    0.387079   308.265419  0.347502
95      LA      13   5.000000  1.000000    0.375334    74.275300  0.349365

[196 rows x 7 columns]



In [53]:

    
# Preparation for comparing total number of words aligned with each topic
# To weight each topic by its prevalenced in the corpus, multiply each topic by the word count from above

col_list = []
for num in topic_columns:
    col = "%d_wc" % num
    col_list.append(col)
    df_w_topics[col] = df_w_topics[num] * df_w_topics['webpunct_count']
    
df_w_topics[:20]









    Out[53]:






  
    
      
      0
      1
      2
      SCHNAM
      ADDRESS
      URL
      SEARCH
      CUSTOMID
      WEBTEXT
      LEVEL
      ...
      webtokens_stemmed
      webtext_stemmed
      webstem_count
      webpunct_count
      webclean_count
      numtypes
      TTR
      0_wc
      1_wc
      2_wc
    
  
  
    
      0
      131.261273
      0.394623
      0.344103
      QUEST MIDDLE SCHOOL OF PINE BLUFF
      308 SOUTH BLAKE ST, PINE BLUFF, AR
      http://responsiveed.com/questpinebluff/
      QUEST MIDDLE SCHOOL OF PINE BLUFF 308 SOUTH BL...
      AR3542702
      Quest Middle Schools¨ are schools focused on h...
      2
      ...
      [quest, middl, schools¨, school, focus, high, ...
      quest middl schools¨ school focus high expect ...
      200
      314
      200
      139
      0.442675
      41216.039781
      123.911766
      108.048453
    
    
      1
      43.251431
      0.404550
      0.344018
      THE ACADEMIES AT JONESBORO HIGH SCHOOL
      301 HURRICANE DR, JONESBORO, AR
      http://www.jonesboroschools.net/schools/academ...
      THE ACADEMIES AT JONESBORO HIGH SCHOOL 301 HUR...
      AR1608703
      The mission of the Academies at Jonesboro High...
      3
      ...
      [mission, academi, jonesboro, high, school, pr...
      mission academi jonesboro high school provid h...
      63
      111
      63
      71
      0.639640
      4800.908882
      44.905089
      38.186028
    
    
      2
      23.290207
      0.367458
      0.342335
      A CHILD'S VIEW SCHOOL
      2846 DREXEL RD, TUCSON, AZ
      http://childcarecenter.us/provider_detail/a_ch...
      A CHILD'S VIEW SCHOOL 2846 DREXEL RD, TUCSON, AZ
      AZ87345
      We believe that every child needs a well-round...
      1
      ...
      [believ, everi, child, need, well-round, educ,...
      believ everi child need wellround educ famili ...
      38
      77
      38
      61
      0.792208
      1793.345929
      28.294249
      26.359822
    
    
      3
      13.599131
      5.047494
      0.353375
      AAEC - PARADISE VALLEY
      17811 NORTH 32ND ST, PHOENIX, AZ
      http://www.aaechighschools.com/
      AAEC - PARADISE VALLEY 17811 NORTH 32ND ST, PH...
      AZ6344
      AAEC Early College High School prepares young ...
      3
      ...
      [aaec, earli, colleg, high, school, prepar, yo...
      aaec earli colleg high school prepar young adu...
      31
      45
      31
      38
      0.844444
      611.960911
      227.137229
      15.901860
    
    
      4
      24.045600
      18.605429
      0.348971
      ABRAHAM LINCOLN TRADITIONAL SCHOOL
      10444 NORTH 39TH AVE, PHOENIX, AZ
      http://abrahamlincoln.wesdschools.org/
      ABRAHAM LINCOLN TRADITIONAL SCHOOL 10444 NORTH...
      AZ5274
      The mission of the Abraham Lincoln Traditional...
      1
      ...
      [mission, abraham, lincoln, tradit, school, gu...
      mission abraham lincoln tradit school guid cha...
      75
      106
      75
      71
      0.669811
      2548.833565
      1972.175517
      36.990918
    
    
      5
      0.437738
      93.213344
      0.348919
      ACADEMY DEL SOL
      4525 EAST BROADWAY BLVD, TUCSON, AZ
      http://www.academydelsol.com/
      ACADEMY DEL SOL 4525 EAST BROADWAY BLVD, TUCSO...
      AZ90200
      Academy Del SolÕs mission is to provide a rigo...
      1
      ...
      [academi, del, solõ, mission, provid, rigor, s...
      academi del solõ mission provid rigor superior...
      147
      226
      147
      140
      0.619469
      98.928743
      21066.215642
      78.855615
    
    
      6
      196.920592
      33.728109
      0.351299
      ACADEMY OF TUCSON ELEMENTARY SCHOOL
      9209 EAST WRIGHTSTOWN RD, TUCSON, AZ
      http://www.academyoftucson.com/elementary-scho...
      ACADEMY OF TUCSON ELEMENTARY SCHOOL 9209 EAST ...
      AZ81130
      Mission:\rIt is the purpose of the Academy of ...
      1
      ...
      [mission, purpos, academi, tucson, provid, pre...
      mission purpos academi tucson provid prepar gr...
      356
      620
      356
      278
      0.448387
      122090.767038
      20911.427627
      217.805335
    
    
      7
      24.971913
      14.676298
      0.351789
      DESERT MOSAIC SCHOOL
      5757 WEST AJO HWY, TUCSON, AZ
      http://desertmosaic.com/Home_Page.php
      DESERT MOSAIC SCHOOL 5757 WEST AJO HWY, TUCSON...
      AZ79118
      Desert Mosaic School commits to creating a tea...
      4
      ...
      [desert, mosaic, school, commit, creat, teach,...
      desert mosaic school commit creat teach enviro...
      67
      115
      67
      74
      0.643478
      2871.770000
      1687.774263
      40.455737
    
    
      8
      179.458463
      56.191420
      0.350117
      KAIZEN EDUCATION FOUNDATION DBA SUMMIT HIGH SC...
      728 EAST MCDOWELL RD, PHOENIX, AZ
      http://www.summiths.com/
      KAIZEN EDUCATION FOUNDATION DBA SUMMIT HIGH SC...
      AZ10749
      Summit High SchoolÕs Mission and Vision is to ...
      3
      ...
      [summit, high, schoolõ, mission, vision, provi...
      summit high schoolõ mission vision provid safe...
      363
      580
      363
      275
      0.474138
      104085.908809
      32591.023578
      203.067613
    
    
      9
      47.900249
      20.751915
      0.347836
      OASIS HIGH SCHOOL
      8632 WEST NORTHERN AVE, GLENDALE, AZ
      https://web.archive.org/web/20120617204246/htt...
      OASIS HIGH SCHOOL 8632 WEST NORTHERN AVE, GLEN...
      AZ78955
      Omega's mission is to provide an optimal teach...
      3
      ...
      [omega, 's, mission, provid, optim, teach, lea...
      omega s mission provid optim teach learn envir...
      133
      221
      133
      137
      0.619910
      10585.954983
      4586.173248
      76.871769
    
    
      10
      0.424860
      138.224560
      0.350580
      SAGE ACADEMY
      1055 EAST HEARN RD, SCOTTSDALE, AZ
      http://www.sage-academy.org/
      SAGE ACADEMY 1055 EAST HEARN RD, SCOTTSDALE, AZ
      AZ89415
      River Valley Charter School (RVCS) is a ...
      1
      ...
      [river, valley, charter, school, rvc, public, ...
      river valley charter school rvc public charter...
      206
      341
      206
      185
      0.542522
      144.877221
      47134.575023
      119.547755
    
    
      11
      401.374463
      198.273555
      0.351982
      SCHOOL FOR INTEGRATED ACADEMICS AND TECHNOLOGIES
      518 SOUTH 3RD ST, PHOENIX, AZ
      http://www.siatech.org/
      SCHOOL FOR INTEGRATED ACADEMICS AND TECHNOLOGI...
      AZ79450
      As a nonprofit organization, we create the env...
      3
      ...
      [nonprofit, organ, creat, environ, tool, techn...
      nonprofit organ creat environ tool techniqu re...
      964
      1476
      964
      634
      0.429539
      592428.707375
      292651.766967
      519.525658
    
    
      12
      9.411709
      89.237486
      0.350806
      SEQUOIA VILLAGE SCHOOL
      982 FULL HOUSE LN, SHOW LOW, AZ
      http://www.sequoiavillageschool.org/
      SEQUOIA VILLAGE SCHOOL 982 FULL HOUSE LN, SHOW...
      AZ10848
      Building a Better World One Student at a Time"...
      1
      ...
      [build, better, world, one, student, time, '',...
      build better world one student time  seed publ...
      188
      263
      188
      167
      0.634981
      2475.279353
      23469.458783
      92.261863
    
    
      13
      52.244098
      0.404307
      0.351595
      SOUTH POINTE HIGH SCHOOL
      8325 SOUTH CENTRAL AVE, PHOENIX, AZ
      http://www.southpointehs.com/
      SOUTH POINTE HIGH SCHOOL 8325 SOUTH CENTRAL AV...
      AZ80990
      We nurture our students academically, behavior...
      3
      ...
      [nurtur, student, academ, behavior, emot, serv...
      nurtur student academ behavior emot serv produ...
      65
      117
      65
      71
      0.606838
      6112.559448
      47.303895
      41.136657
    
    
      14
      91.595543
      80.053909
      0.350548
      SOUTH POINTE HIGH SCHOOL
      8325 SOUTH CENTRAL AVE, PHOENIX, AZ
      http://www.southpointehs.com/
      SOUTH POINTE HIGH SCHOOL 8325 SOUTH CENTRAL AV...
      AZ80990
      We nurture our students academically, behavior...
      3
      ...
      [nurtur, student, academ, behavior, emot, serv...
      nurtur student academ behavior emot serv produ...
      265
      466
      265
      233
      0.500000
      42683.523220
      37305.121560
      163.355220
    
    
      15
      14.281958
      0.371092
      0.346950
      SOUTH POINTE JUNIOR HIGH SCHOOL
      217 EAST OLYMPIC DR, PHOENIX, AZ
      http://www.southpointejh.com/
      SOUTH POINTE JUNIOR HIGH SCHOOL 217 EAST OLYMP...
      AZ79178
      To provide an education and support students d...
      2
      ...
      [provid, educ, support, student, develop, self...
      provid educ support student develop selfreli s...
      28
      48
      28
      37
      0.770833
      685.533972
      17.812416
      16.653613
    
    
      16
      18.623108
      3.020419
      0.356473
      SOUTHSIDE COMMUNITY SCHOOL
      2701 SOUTH CAMPBELL AVE, TUCSON, AZ
      http://www.ade.az.gov/edd/NewDetails.asp?Entit...
      SOUTHSIDE COMMUNITY SCHOOL 2701 SOUTH CAMPBELL...
      AZ79432
      Southside Community School is a free, public c...
      4
      ...
      [southsid, commun, school, free, public, chart...
      southsid commun school free public charter sch...
      29
      41
      29
      33
      0.804878
      763.547425
      123.837179
      14.615396
    
    
      17
      18.192244
      426.458469
      0.349286
      SUN VALLEY CHARTER SCHOOL
      5806 SOUTH 35TH AVE BLDG EAST, PHOENIX, AZ
      http://www.sunvalleycharterschool.com/
      SUN VALLEY CHARTER SCHOOL 5806 SOUTH 35TH AVE ...
      AZ90193
      Sun Valley Charter School has made a commitmen...
      1
      ...
      [sun, valley, charter, school, made, commit, p...
      sun valley charter school made commit provid s...
      666
      986
      666
      383
      0.388438
      17937.552914
      420488.050703
      344.396383
    
    
      18
      71.231727
      0.418614
      0.349659
      TANQUE VERDE ELEMENTARY SCHOOL
      2600 NORTH FENNIMOREA AVE, TUCSON, AZ
      http://www.tanqueverdeschools.org/
      TANQUE VERDE ELEMENTARY SCHOOL 2600 NORTH FENN...
      AZ5829
      "Excellence is our goal, understanding our fou...
      1
      ...
      [``, excel, goal, understand, foundat, '', tvu...
      excel goal understand foundat  tvusd ensur ef...
      103
      153
      103
      109
      0.712418
      10898.454184
      64.048014
      53.497802
    
    
      19
      49.240595
      0.410438
      0.348967
      TARTESSO ELEMENTARY SCHOOL
      29677 WEST INDIANOLA RD, BUCKEYE, AZ
      http://tartesso.smusd90.org/
      TARTESSO ELEMENTARY SCHOOL 29677 WEST INDIANOL...
      AZ89596
      We are located in Buckeye, Arizona and enjoy t...
      1
      ...
      [locat, buckey, arizona, enjoy, benefit, activ...
      locat buckey arizona enjoy benefit activ suppo...
      86
      145
      86
      88
      0.606897
      7139.886291
      59.513470
      50.600239
    
  

20 rows × 350 columns



In [54]:

    
# Now we can see the prevalence of each topic over words for each urban category and state
grouped_urban = df_w_topics.groupby('ULOCAL')
for e in col_list:
    print(e)
    print(grouped_urban[e].sum()/grouped_urban['webpunct_count'].sum())

grouped_state = df_w_topics.groupby('LSTATE')
for e in col_list:
    print(e)
    print(grouped_state[e].sum()/grouped_state['webpunct_count'].sum())









    



0_wc
ULOCAL
11    252.460199
12    344.588359
13    240.022477
21    127.201643
22     79.306683
23    138.867584
31    100.670414
32    137.660762
33     34.277186
41    146.551647
42     67.639781
dtype: float64
1_wc
ULOCAL
11    171.386633
12    318.874117
13    124.322144
21    567.652927
22    237.680326
23      0.399967
31    252.846123
32     41.157828
33    231.109223
41    228.613868
42    324.782390
dtype: float64
2_wc
ULOCAL
11    0.349413
12    0.349343
13    0.350095
21    0.348660
22    0.348461
23    0.347925
31    0.348880
32    0.348457
33    0.347473
41    0.349770
42    0.349948
dtype: float64
0_wc
LSTATE
AR    108.275173
AZ    307.060573
CA    244.894963
CO    120.827397
DC     73.747499
DE    264.251378
FL    371.423011
GA     20.119871
IL    476.761298
IN    121.664297
LA     29.510303
MA     42.715775
MD    154.503844
MI    195.752094
MN     19.651640
MO    134.226657
NC    192.330028
NJ     97.425364
NM     62.503755
NV      1.570714
NY    351.901927
OH    174.489852
OR    203.211717
PA     81.033768
RI      0.396424
SC      0.411729
TN     74.079174
TX    210.712480
UT     38.839149
WI    154.229482
dtype: float64
1_wc
LSTATE
AR       0.397216
AZ      98.357483
CA     303.382596
CO       3.559631
DC     237.404967
DE       0.401792
FL     129.540301
GA       0.527551
IL     264.466043
IN      63.021684
LA      44.251730
MA     278.693377
MD     220.144010
MI     304.035136
MN    1563.666227
MO       0.421632
NC     136.675891
NJ      28.225592
NM     130.136175
NV      94.078004
NY      74.740746
OH      19.871231
OR       7.582450
PA     321.091370
RI     973.256774
SC      81.426507
TN       2.251475
TX     122.607045
UT     400.402277
WI     176.445612
dtype: float64
2_wc
LSTATE
AR    0.344081
AZ    0.351287
CA    0.350417
CO    0.349726
DC    0.349568
DE    0.346830
FL    0.351154
GA    0.352578
IL    0.351628
IN    0.348872
LA    0.349275
MA    0.348913
MD    0.352145
MI    0.349951
MN    0.348572
MO    0.351711
NC    0.350324
NJ    0.349044
NM    0.351285
NV    0.351282
NY    0.348722
OH    0.346984
OR    0.347862
PA    0.346778
RI    0.346802
SC    0.345843
TN    0.347922
TX    0.346956
UT    0.345979
WI    0.346329
dtype: float64



In [55]:

    
# Here's the distribution of urban proximity over the three topics:
fig1 = plt.figure()
chrt = 0
for num in topic_columns:
    chrt += 1 
    ax = fig1.add_subplot(2,3, chrt)
    grouped_urban[num].mean().plot(kind = 'bar', yerr = grouped_urban[num].std(), ylim=0, ax=ax, title=num)

fig1.tight_layout()
plt.show()



In [56]:

    
# Here's the distribution of each topic over words, for each urban category:
fig2 = plt.figure()
chrt = 0
for e in col_list:
    chrt += 1 
    ax2 = fig2.add_subplot(2,3, chrt)
    (grouped_urban[e].sum()/grouped_urban['webpunct_count'].sum()).plot(kind = 'bar', ylim=0, ax=ax2, title=e)

fig2.tight_layout()
plt.show()

	aaec	abil	...
0	0.000000	0.000000	...
1	0.000000	0.000000	...
2	0.000000	0.000000	...
3	0.413013	0.000000	...
4	0.000000	0.000000	...
5	0.000000	0.000000	...
6	0.000000	0.000000	...
7	0.000000	0.000000	...
8	0.000000	0.031577	...
9	0.000000	0.000000	...
10	0.000000	0.000000	...
11	0.000000	0.000000	...
12	0.000000	0.000000	...
13	0.000000	0.000000	...
14	0.000000	0.047121	...
15	0.000000	0.000000	...
16	0.000000	0.000000	...
17	0.000000	0.000000	...
18	0.000000	0.000000	...
19	0.000000	0.000000	...

	0	1	2	SCHNAM	ADDRESS	URL	SEARCH	CUSTOMID	WEBTEXT	LEVEL	...	webtokens	webtokens_nopunct	webtokens_clean	webtokens_stemmed	webtext_stemmed	webstem_count	webpunct_count	webclean_count	numtypes	TTR
0	131.261273	0.394623	0.344103	QUEST MIDDLE SCHOOL OF PINE BLUFF	308 SOUTH BLAKE ST, PINE BLUFF, AR	http://responsiveed.com/questpinebluff/	QUEST MIDDLE SCHOOL OF PINE BLUFF 308 SOUTH BL...	AR3542702	Quest Middle Schools¨ are schools focused on h...	2	...	[quest, middle, schools¨, are, schools, focuse...	[quest, middle, schools¨, are, schools, focuse...	[quest, middle, schools¨, schools, focused, hi...	[quest, middl, schools¨, school, focus, high, ...	quest middl schools¨ school focus high expect ...	200	314	200	139	0.442675
1	43.251431	0.404550	0.344018	THE ACADEMIES AT JONESBORO HIGH SCHOOL	301 HURRICANE DR, JONESBORO, AR	http://www.jonesboroschools.net/schools/academ...	THE ACADEMIES AT JONESBORO HIGH SCHOOL 301 HUR...	AR1608703	The mission of the Academies at Jonesboro High...	3	...	[the, mission, of, the, academies, at, jonesbo...	[the, mission, of, the, academies, at, jonesbo...	[mission, academies, jonesboro, high, school, ...	[mission, academi, jonesboro, high, school, pr...	mission academi jonesboro high school provid h...	63	111	63	71	0.639640
2	23.290207	0.367458	0.342335	A CHILD'S VIEW SCHOOL	2846 DREXEL RD, TUCSON, AZ	http://childcarecenter.us/provider_detail/a_ch...	A CHILD'S VIEW SCHOOL 2846 DREXEL RD, TUCSON, AZ	AZ87345	We believe that every child needs a well-round...	1	...	[we, believe, that, every, child, needs, a, we...	[we, believe, that, every, child, needs, a, we...	[believe, every, child, needs, well-rounded, e...	[believ, everi, child, need, well-round, educ,...	believ everi child need wellround educ famili ...	38	77	38	61	0.792208
3	13.599131	5.047494	0.353375	AAEC - PARADISE VALLEY	17811 NORTH 32ND ST, PHOENIX, AZ	http://www.aaechighschools.com/	AAEC - PARADISE VALLEY 17811 NORTH 32ND ST, PH...	AZ6344	AAEC Early College High School prepares young ...	3	...	[aaec, early, college, high, school, prepares,...	[aaec, early, college, high, school, prepares,...	[aaec, early, college, high, school, prepares,...	[aaec, earli, colleg, high, school, prepar, yo...	aaec earli colleg high school prepar young adu...	31	45	31	38	0.844444
4	24.045600	18.605429	0.348971	ABRAHAM LINCOLN TRADITIONAL SCHOOL	10444 NORTH 39TH AVE, PHOENIX, AZ	http://abrahamlincoln.wesdschools.org/	ABRAHAM LINCOLN TRADITIONAL SCHOOL 10444 NORTH...	AZ5274	The mission of the Abraham Lincoln Traditional...	1	...	[the, mission, of, the, abraham, lincoln, trad...	[the, mission, of, the, abraham, lincoln, trad...	[mission, abraham, lincoln, traditional, schoo...	[mission, abraham, lincoln, tradit, school, gu...	mission abraham lincoln tradit school guid cha...	75	106	75	71	0.669811
5	0.437738	93.213344	0.348919	ACADEMY DEL SOL	4525 EAST BROADWAY BLVD, TUCSON, AZ	http://www.academydelsol.com/	ACADEMY DEL SOL 4525 EAST BROADWAY BLVD, TUCSO...	AZ90200	Academy Del SolÕs mission is to provide a rigo...	1	...	[academy, del, solõs, mission, is, to, provide...	[academy, del, solõs, mission, is, to, provide...	[academy, del, solõs, mission, provide, rigoro...	[academi, del, solõ, mission, provid, rigor, s...	academi del solõ mission provid rigor superior...	147	226	147	140	0.619469
6	196.920592	33.728109	0.351299	ACADEMY OF TUCSON ELEMENTARY SCHOOL	9209 EAST WRIGHTSTOWN RD, TUCSON, AZ	http://www.academyoftucson.com/elementary-scho...	ACADEMY OF TUCSON ELEMENTARY SCHOOL 9209 EAST ...	AZ81130	Mission:\rIt is the purpose of the Academy of ...	1	...	[mission, :, it, is, the, purpose, of, the, ac...	[mission, it, is, the, purpose, of, the, acade...	[mission, purpose, academy, tucson, provide, p...	[mission, purpos, academi, tucson, provid, pre...	mission purpos academi tucson provid prepar gr...	356	620	356	278	0.448387
7	24.971913	14.676298	0.351789	DESERT MOSAIC SCHOOL	5757 WEST AJO HWY, TUCSON, AZ	http://desertmosaic.com/Home_Page.php	DESERT MOSAIC SCHOOL 5757 WEST AJO HWY, TUCSON...	AZ79118	Desert Mosaic School commits to creating a tea...	4	...	[desert, mosaic, school, commits, to, creating...	[desert, mosaic, school, commits, to, creating...	[desert, mosaic, school, commits, creating, te...	[desert, mosaic, school, commit, creat, teach,...	desert mosaic school commit creat teach enviro...	67	115	67	74	0.643478
8	179.458463	56.191420	0.350117	KAIZEN EDUCATION FOUNDATION DBA SUMMIT HIGH SC...	728 EAST MCDOWELL RD, PHOENIX, AZ	http://www.summiths.com/	KAIZEN EDUCATION FOUNDATION DBA SUMMIT HIGH SC...	AZ10749	Summit High SchoolÕs Mission and Vision is to ...	3	...	[summit, high, schoolõs, mission, and, vision,...	[summit, high, schoolõs, mission, and, vision,...	[summit, high, schoolõs, mission, vision, prov...	[summit, high, schoolõ, mission, vision, provi...	summit high schoolõ mission vision provid safe...	363	580	363	275	0.474138
9	47.900249	20.751915	0.347836	OASIS HIGH SCHOOL	8632 WEST NORTHERN AVE, GLENDALE, AZ	https://web.archive.org/web/20120617204246/htt...	OASIS HIGH SCHOOL 8632 WEST NORTHERN AVE, GLEN...	AZ78955	Omega's mission is to provide an optimal teach...	3	...	[omega, 's, mission, is, to, provide, an, opti...	[omega, 's, mission, is, to, provide, an, opti...	[omega, 's, mission, provide, optimal, teachin...	[omega, 's, mission, provid, optim, teach, lea...	omega s mission provid optim teach learn envir...	133	221	133	137	0.619910
10	0.424860	138.224560	0.350580	SAGE ACADEMY	1055 EAST HEARN RD, SCOTTSDALE, AZ	http://www.sage-academy.org/	SAGE ACADEMY 1055 EAST HEARN RD, SCOTTSDALE, AZ	AZ89415	River Valley Charter School (RVCS) is a ...	1	...	[river, valley, charter, school, (, rvcs, ), i...	[river, valley, charter, school, rvcs, is, a, ...	[river, valley, charter, school, rvcs, public,...	[river, valley, charter, school, rvc, public, ...	river valley charter school rvc public charter...	206	341	206	185	0.542522
11	401.374463	198.273555	0.351982	SCHOOL FOR INTEGRATED ACADEMICS AND TECHNOLOGIES	518 SOUTH 3RD ST, PHOENIX, AZ	http://www.siatech.org/	SCHOOL FOR INTEGRATED ACADEMICS AND TECHNOLOGI...	AZ79450	As a nonprofit organization, we create the env...	3	...	[as, a, nonprofit, organization, ,, we, create...	[as, a, nonprofit, organization, we, create, t...	[nonprofit, organization, create, environments...	[nonprofit, organ, creat, environ, tool, techn...	nonprofit organ creat environ tool techniqu re...	964	1476	964	634	0.429539
12	9.411709	89.237486	0.350806	SEQUOIA VILLAGE SCHOOL	982 FULL HOUSE LN, SHOW LOW, AZ	http://www.sequoiavillageschool.org/	SEQUOIA VILLAGE SCHOOL 982 FULL HOUSE LN, SHOW...	AZ10848	Building a Better World One Student at a Time"...	1	...	[building, a, better, world, one, student, at,...	[building, a, better, world, one, student, at,...	[building, better, world, one, student, time, ...	[build, better, world, one, student, time, '',...	build better world one student time seed publ...	188	263	188	167	0.634981
13	52.244098	0.404307	0.351595	SOUTH POINTE HIGH SCHOOL	8325 SOUTH CENTRAL AVE, PHOENIX, AZ	http://www.southpointehs.com/	SOUTH POINTE HIGH SCHOOL 8325 SOUTH CENTRAL AV...	AZ80990	We nurture our students academically, behavior...	3	...	[we, nurture, our, students, academically, ,, ...	[we, nurture, our, students, academically, beh...	[nurture, students, academically, behaviorally...	[nurtur, student, academ, behavior, emot, serv...	nurtur student academ behavior emot serv produ...	65	117	65	71	0.606838
14	91.595543	80.053909	0.350548	SOUTH POINTE HIGH SCHOOL	8325 SOUTH CENTRAL AVE, PHOENIX, AZ	http://www.southpointehs.com/	SOUTH POINTE HIGH SCHOOL 8325 SOUTH CENTRAL AV...	AZ80990	We nurture our students academically, behavior...	3	...	[we, nurture, our, students, academically, ,, ...	[we, nurture, our, students, academically, beh...	[nurture, students, academically, behaviorally...	[nurtur, student, academ, behavior, emot, serv...	nurtur student academ behavior emot serv produ...	265	466	265	233	0.500000
15	14.281958	0.371092	0.346950	SOUTH POINTE JUNIOR HIGH SCHOOL	217 EAST OLYMPIC DR, PHOENIX, AZ	http://www.southpointejh.com/	SOUTH POINTE JUNIOR HIGH SCHOOL 217 EAST OLYMP...	AZ79178	To provide an education and support students d...	2	...	[to, provide, an, education, and, support, stu...	[to, provide, an, education, and, support, stu...	[provide, education, support, students, develo...	[provid, educ, support, student, develop, self...	provid educ support student develop selfreli s...	28	48	28	37	0.770833
16	18.623108	3.020419	0.356473	SOUTHSIDE COMMUNITY SCHOOL	2701 SOUTH CAMPBELL AVE, TUCSON, AZ	http://www.ade.az.gov/edd/NewDetails.asp?Entit...	SOUTHSIDE COMMUNITY SCHOOL 2701 SOUTH CAMPBELL...	AZ79432	Southside Community School is a free, public c...	4	...	[southside, community, school, is, a, free, ,,...	[southside, community, school, is, a, free, pu...	[southside, community, school, free, public, c...	[southsid, commun, school, free, public, chart...	southsid commun school free public charter sch...	29	41	29	33	0.804878
17	18.192244	426.458469	0.349286	SUN VALLEY CHARTER SCHOOL	5806 SOUTH 35TH AVE BLDG EAST, PHOENIX, AZ	http://www.sunvalleycharterschool.com/	SUN VALLEY CHARTER SCHOOL 5806 SOUTH 35TH AVE ...	AZ90193	Sun Valley Charter School has made a commitmen...	1	...	[sun, valley, charter, school, has, made, a, c...	[sun, valley, charter, school, has, made, a, c...	[sun, valley, charter, school, made, commitmen...	[sun, valley, charter, school, made, commit, p...	sun valley charter school made commit provid s...	666	986	666	383	0.388438
18	71.231727	0.418614	0.349659	TANQUE VERDE ELEMENTARY SCHOOL	2600 NORTH FENNIMOREA AVE, TUCSON, AZ	http://www.tanqueverdeschools.org/	TANQUE VERDE ELEMENTARY SCHOOL 2600 NORTH FENN...	AZ5829	"Excellence is our goal, understanding our fou...	1	...	[``, excellence, is, our, goal, ,, understandi...	[``, excellence, is, our, goal, understanding,...	[``, excellence, goal, understanding, foundati...	[``, excel, goal, understand, foundat, '', tvu...	excel goal understand foundat tvusd ensur ef...	103	153	103	109	0.712418
19	49.240595	0.410438	0.348967	TARTESSO ELEMENTARY SCHOOL	29677 WEST INDIANOLA RD, BUCKEYE, AZ	http://tartesso.smusd90.org/	TARTESSO ELEMENTARY SCHOOL 29677 WEST INDIANOL...	AZ89596	We are located in Buckeye, Arizona and enjoy t...	1	...	[we, are, located, in, buckeye, ,, arizona, an...	[we, are, located, in, buckeye, arizona, and, ...	[located, buckeye, arizona, enjoy, benefits, a...	[locat, buckey, arizona, enjoy, benefit, activ...	locat buckey arizona enjoy benefit activ suppo...	86	145	86	88	0.606897

Charter school identities and outcomes in the accountability era: Preliminary results