In [1]:
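# Set up paths: remember the notebook directory, then work from the data directory
import os
import sys
this_path = os.getcwd()
os.chdir("../data")
# Keep the notebook directory importable after changing directory
sys.path.insert(0, this_path)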
In [2]:
import pandas as pd
In [3]:
infile="articles-n-forums-posts.csv"
df=pd.read_csv(infile,index_col=0)
df.head(1)
Out[3]:
In [4]:
from textstat.textstat import textstat
test_data = """Playing games has always been thought to be important to the development of well-balanced and creative children; however, what part, if any, they should play in the lives of adults has never been researched that deeply. I believe that playing games is every bit as important for adults as for children. Not only is taking time out to play games with our children and other adults valuable to building interpersonal relationships but is also a wonderful way to release built up tension."""
print(textstat.flesch_reading_ease(test_data))
print(textstat.smog_index(test_data))
print(textstat.flesch_kincaid_grade(test_data))
print(textstat.coleman_liau_index(test_data))
print(textstat.automated_readability_index(test_data))
print(textstat.dale_chall_readability_score(test_data))
print(textstat.difficult_words(test_data))
print(textstat.linsear_write_formula(test_data))
print(textstat.gunning_fog(test_data))
print(textstat.text_standard(test_data))
https://www.ahrq.gov/professionals/quality-patient-safety/talkingquality/resources/writing/tip6.html
http://webcraft.tools/using-readability-score-wordpress/
The most commonly used readability formulas are the Flesch Reading Ease, the Flesch-Kincaid Grade Level, and the FOG Index.[1][2][3]
Flesch readability score
The Flesch readability score combines the sentence length (number of words per sentence) and the number of syllables per word in an equation to calculate the reading ease. The lower the score, the more difficult the text is. Texts with a very high Flesch reading ease score (about 100) are very easy to read: they have short sentences and no words of more than two syllables. A reading ease of 60-70 is usually considered acceptable for web copy.
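The equation described above can be sketched directly from raw counts. This is a minimal illustration using the standard published Flesch Reading Ease constants; the helper name `reading_ease_from_counts` is hypothetical, and textstat's own result may differ because it has to estimate sentence and syllable counts from the text.

```python
def reading_ease_from_counts(total_words, total_sentences, total_syllables):
    """Flesch Reading Ease from raw counts (hypothetical helper, not part of textstat)."""
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    # Longer sentences and longer words both lower the score
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Short sentences with short words: high score (easy to read)
easy = reading_ease_from_counts(total_words=100, total_sentences=10, total_syllables=120)
# Long sentences with long words: low score (hard to read)
hard = reading_ease_from_counts(total_words=100, total_sentences=3, total_syllables=190)
print(round(easy, 1), round(hard, 1))
```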
Flesch-Kincaid score
The Flesch–Kincaid readability tests are readability tests designed to indicate how difficult a passage in English is to understand. There are two tests, the Flesch Reading Ease, and the Flesch–Kincaid Grade Level. Although they use the same core measures (word length and sentence length), they have different weighting factors.
Interpreting the result is straightforward because the score maps to a U.S. school grade level. For instance, a score of 5.0 indicates a grade-school level, and a score of 9.3 means that a ninth grader would be able to read the document. This makes it easier for teachers, parents, librarians, and others to judge the readability level of various books and texts for students.
Theoretically, the lowest grade level score could be -3.4, but since there are no real passages that have every sentence consisting of a one-syllable word, it is a highly improbable result in practice. http://www.readabilityformulas.com/flesch-grade-level-readability-formula.php
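The -3.4 lower bound can be checked with a short sketch of the grade-level formula. It uses the standard published Flesch-Kincaid Grade Level constants; the helper name `fk_grade_from_counts` is hypothetical and only works on counts supplied by hand, unlike `textstat.flesch_kincaid_grade`, which takes the text itself.

```python
def fk_grade_from_counts(total_words, total_sentences, total_syllables):
    """Flesch-Kincaid Grade Level from raw counts (hypothetical helper)."""
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

# Degenerate passage: every sentence is a single one-syllable word
lowest = fk_grade_from_counts(total_words=1, total_sentences=1, total_syllables=1)
print(round(lowest, 2))  # -3.4
```

Both ratios bottom out at 1 in this case, which is why no real passage can score lower.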
SMOG score
A 2010 study published in the Journal of the Royal College of Physicians of Edinburgh stated that “SMOG should be the preferred measure of readability when evaluating consumer-oriented healthcare material.” The study found that “The Flesch-Kincaid formula significantly underestimated reading difficulty compared with the gold standard SMOG formula.”[4] Applying SMOG to other languages lacks statistical validity.[5]
To make calculating a text's readability as simple as possible, an approximate formula was also given: count the words of three or more syllables in three 10-sentence samples, estimate the count's square root (from the nearest perfect square), and add 3.
Grade level: from 1 to 240. The higher the score, the higher the grade level. http://www.readabilityformulas.com/smog-readability-formula.php
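The approximate SMOG procedure described above can be sketched as follows. The function name `smog_grade_approx` is hypothetical, and the polysyllable count is assumed to have been tallied by hand from the three 10-sentence samples; this is the quick estimate only, not the regression formula that `textstat.smog_index` is expected to use.

```python
import math

def smog_grade_approx(polysyllable_count):
    """Approximate SMOG grade: square root of the nearest perfect square, plus 3."""
    root = math.isqrt(polysyllable_count)  # integer square root, rounded down
    # Move up to the next integer if its square is closer to the count
    if polysyllable_count - root ** 2 > (root + 1) ** 2 - polysyllable_count:
        root += 1
    return root + 3

# 30 polysyllabic words: the nearest perfect square is 25, so the grade is 5 + 3
print(smog_grade_approx(30))  # 8
```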
In [5]:
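# Optional filter for very short posts (disabled)
#df2=df.loc[df['text'].str.len() > 3 ]
text = df['text']
# Compute the Flesch-Kincaid grade level of each post
df_readability = text.apply(textstat.flesch_kincaid_grade)
df_readability.head(1)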
Out[5]:
In [13]:
df_readability.index.name='post id'
df_readability.name='Readability'
df_readability.to_csv("db-readability.csv",header=True)