Adapted from a lesson by Teddy Roland
Using only the tools we learned in the last tutorial, we can already do a good amount of text analysis: no special libraries or functions, just counting.
Now that we have some of Python's basics in our toolkit, we can immediately perform the kind of task that is the bread and butter of text analysis: counting. When we first meet a text in the wild, we often wish to find out a little about it before digging in deeply, so we start with simple questions like "How many words are in this text?" or "What is the average word length?"
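As a warm-up, here is a minimal sketch of both questions on a short sample phrase rather than a full novel. The variable names (sample_string, sample_words, and so on) are made up for this illustration, and splitting on whitespace is only a rough way to define a "word", since punctuation stays attached to its neighbor.

```python
# a short sample phrase, used only for illustration
sample_string = "It is a truth universally acknowledged"

# split the string on whitespace to get a list of words
sample_words = sample_string.split()

# how many words are in this text?
word_count = len(sample_words)
print(word_count)

# what is the average word length?
total_characters = 0
for word in sample_words:
    total_characters = total_characters + len(word)
average_length = total_characters / word_count
print(average_length)
```

The same steps should carry over to a much longer string, although a real text will need some thought about punctuation and capitalization before the counts mean much.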
Run the cell below to read in the text of "Pride and Prejudice" and assign it to the variable "austen_string", then read in the text of Louisa May Alcott's "A Garland for Girls," a children's book, and assign it to the variable "alcott_string". With these variables, print the answers to the following questions:
In [ ]:
#read in the texts
austen_string = open('../Data/Austen_PrideAndPrejudice.txt', encoding='utf-8').read()
alcott_string = open('../Data/Alcott_GarlandForGirls.txt', encoding='utf-8').read()
#print the first 100 characters of each text to make sure everything is in order
print(austen_string[:100])
print(alcott_string[:100])
In [ ]: