Exercise 06: store annotated text as JSON files

Much of the preceding NLP code has been collected into a small library, and we'll call functions from that library to keep these notebooks readable. Take a look at the source code in pynlp.py, then at this example of its usage:


In [ ]:
import pynlp

html_file = "html/article1.html"
json_file = "a1.json"

pynlp.full_parse(html_file, json_file)

This extracts the text from the HTML of the first article, then stores the parsed and annotated text as JSON, one line per sentence. Let's look at the first two sentences:


In [ ]:
%sx more a1.json
# representation of paragraphs, sentences, words, and sentence annotation
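
Because the file holds one JSON document per line, it can also be read back in Python rather than paged through with more. The following is a minimal sketch, assuming only that each non-empty line of the file is valid JSON. The helper name read_json_lines is hypothetical and not part of pynlp, and the structure of each record is whatever pynlp.full_parse wrote.

```python
import json
from itertools import islice


def read_json_lines(path, limit=None):
    """Parse up to `limit` non-empty lines of `path`, each as one JSON value.

    Hypothetical helper (not part of pynlp); limit=None reads every line.
    """
    with open(path, encoding="utf-8") as f:
        # skip blank lines so a trailing newline does not break json.loads
        lines = (line for line in f if line.strip())
        return [json.loads(line) for line in islice(lines, limit)]
```

For example, read_json_lines("a1.json", limit=2) would return the first two records as Python objects, ready to inspect or pretty-print.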

Now it's your turn. In the following code block, run the extract/parse/save-to-JSON step for each of the example HTML files:


In [ ]:
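
One possible approach, sketched under some assumptions: loop over every .html file in a directory and hand each input/output pair to the parsing function. The helper parse_all is hypothetical and not part of pynlp. It takes the parser as an argument, and it names each output after the input file's stem (article1.html becomes article1.json), which is a naming choice made here and differs from the a1.json name used above.

```python
from pathlib import Path


def parse_all(html_dir, parse, out_dir="."):
    """Call parse(html_file, json_file) for every .html file in html_dir.

    Hypothetical helper: output names reuse the input stem (an assumption),
    so article2.html is written to article2.json inside out_dir.
    """
    written = []
    for html_path in sorted(Path(html_dir).glob("*.html")):
        json_path = Path(out_dir) / (html_path.stem + ".json")
        parse(str(html_path), str(json_path))
        written.append(str(json_path))
    return written
```

In this notebook the call would look like parse_all("html", pynlp.full_parse), assuming the example articles all live in the html directory.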