This notebook introduces the basics of pyConTextNLP, a simple Python tool that we have used extensively to process clinical text from domains such as radiology and psychiatry.
pyConTextNLP is built around the concept of targets and modifiers: the target is the concept we are interested in identifying (like a cough or a pulmonary embolism); a modifier is a concept that changes the target in some sense (e.g. historical, severity, certainty, negation).
pyConTextNLP relies on regular expressions to identify concepts (both targets and modifiers) within a sentence and then uses simple lexical rules to assign relationships between the identified targets and modifiers. Internally, pyConTextNLP uses graphs. Targets and modifiers are nodes in the graph and relationships between modifiers and targets are edges in the graph.
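The idea of marking concepts with regular expressions and then linking modifiers to targets can be sketched with the standard library alone. The following is a simplified illustration, not pyConTextNLP's implementation: the `LEXICON` entries, the `mark_concepts` and `build_edges` helpers, and the single "forward" rule are all assumptions made for this example, and a plain list of edges stands in for the graph.

```python
import re

# Hypothetical lexicon for illustration: (literal, category, kind, direction).
# "forward" means the modifier applies to targets that appear after it.
LEXICON = [
    ("pulmonary embolism", "CRITICAL_FINDING", "target", None),
    ("pneumonia", "CRITICAL_FINDING", "target", None),
    ("no evidence of", "DEFINITE_NEGATED_EXISTENCE", "modifier", "forward"),
    ("history of", "HISTORICAL", "modifier", "forward"),
]


def mark_concepts(sentence):
    """Find every lexicon entry in the sentence with a word-bounded regex."""
    nodes = []
    for literal, category, kind, direction in LEXICON:
        pattern = re.compile(r"\b" + re.escape(literal) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(sentence):
            nodes.append({
                "text": match.group(0),
                "category": category,
                "kind": kind,
                "direction": direction,
                "span": match.span(),
            })
    # Order nodes by their position in the sentence.
    return sorted(nodes, key=lambda node: node["span"])


def build_edges(nodes):
    """Link each forward-looking modifier to every target that follows it."""
    modifiers = [n for n in nodes if n["kind"] == "modifier"]
    targets = [n for n in nodes if n["kind"] == "target"]
    edges = []
    for modifier in modifiers:
        for target in targets:
            starts_after = target["span"][0] >= modifier["span"][1]
            if modifier["direction"] == "forward" and starts_after:
                edges.append((modifier["category"], target["text"]))
    return edges


sentence = "There is no evidence of pulmonary embolism."
nodes = mark_concepts(sentence)
edges = build_edges(nodes)
print(edges)
```

Here the nodes are the matched concepts and each edge pairs a modifier's category with the target it modifies, so the negation is attached to "pulmonary embolism".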
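pyConTextNLP uses a four-tuple to represent concepts. Within the program we create an instance of the itemData class. Each item in an itemData consists of the following four attributes: a literal (the lexical form of the concept, e.g. "pulmonary embolism"), a category (e.g. "CRITICAL_FINDING"), a regular expression used to find the concept in text, and a rule describing how the concept acts within the sentence. In the examples in this notebook the last two are left as empty strings.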
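As a rough picture of this four-part structure, the sketch below models an item as a named tuple. The field names (`literal`, `category`, `regex`, `rule`) and the `compile_item` helper are hypothetical and exist only for this illustration. It also assumes that an empty regular expression falls back to the literal wrapped in word boundaries, and that "forward" is a valid rule value; both are assumptions here, not statements about the library's API.

```python
import re
from collections import namedtuple

# Illustrative stand-in for one item; the fields mirror the four positions
# in the lists passed to itemData in this notebook.
Item = namedtuple("Item", ["literal", "category", "regex", "rule"])


def compile_item(item):
    """Return a compiled pattern for the item.

    Assumption: when no regex is given, the literal wrapped in word
    boundaries is used as the pattern.
    """
    source = item.regex or (r"\b" + re.escape(item.literal) + r"\b")
    return re.compile(source, re.IGNORECASE)


# A target with no explicit regex or rule, as in the notebook's examples.
target = Item("pulmonary embolism", "CRITICAL_FINDING", "", "")

# A modifier with a (hypothetical) forward-looking rule.
modifier = Item("no evidence of", "DEFINITE_NEGATED_EXISTENCE", "", "forward")

# A target whose explicit regex also catches variants such as "emboli".
custom = Item("pulmonary embolism", "CRITICAL_FINDING",
              r"pulmonary\s(artery )?(embol[a-z]+)", "")

default_match = compile_item(target).search("No pulmonary embolism seen.")
custom_match = compile_item(custom).search("Bilateral pulmonary emboli.")
print(default_match.group(0), "|", custom_match.group(0))
```

Supplying a regular expression is useful when one concept has several surface forms, while the empty string keeps simple literals concise.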
In [ ]:
!pip install -U pycontextnlp==0.6.1.1
In [ ]:
import pyConTextNLP.pyConTextGraph as pyConText
import pyConTextNLP.itemData as itemData
In [ ]:
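# Each entry is [literal, category, regular expression, rule];
# the regular expression and rule are left empty here.
mytargets = itemData.itemData()
mytargets.extend([["pulmonary embolism", "CRITICAL_FINDING", "", ""],
                  ["pneumonia", "CRITICAL_FINDING", "", ""]])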
In [ ]:
print(mytargets)
In [ ]:
!pip install -U radnlp==0.2.0.8
In [ ]:
import pyConTextNLP.helpers as helpers
spliter = helpers.sentenceSplitter()
spliter.splitSentences("This is Dr. Chapman's first sentence. This is the 2.0 sentence.")
However, sentence splitting is a common NLP task, so most full-fledged NLP packages provide a sentence splitter. We usually rely on the sentence splitter that is part of the TextBlob package, which in turn relies on the Natural Language Toolkit (NLTK). Before proceeding, we need to download some NLTK resources with the following command.
In [ ]:
!python -m textblob.download_corpora
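To see why a dedicated sentence splitter is worth the extra dependency, consider what a naive rule does with the example sentence used earlier. The sketch below uses only the standard library; the `ABBREVIATIONS` set and the `split_sentences` helper are hypothetical, and this is not how pyConTextNLP, TextBlob, or NLTK split sentences.

```python
import re

text = "This is Dr. Chapman's first sentence. This is the 2.0 sentence."

# Naive rule: a sentence ends at ., ! or ? followed by whitespace.
naive = re.split(r"(?<=[.!?])\s+", text)

# Hypothetical abbreviation list, for illustration only.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "vs."}


def split_sentences(text):
    """Split on sentence-final punctuation, rejoining after known abbreviations."""
    sentences = []
    buffer = ""
    for piece in re.split(r"(?<=[.!?])\s+", text):
        if not piece:
            continue
        buffer = f"{buffer} {piece}".strip()
        # Only close the sentence if it does not end in an abbreviation.
        if buffer.split()[-1] not in ABBREVIATIONS:
            sentences.append(buffer)
            buffer = ""
    if buffer:
        sentences.append(buffer)
    return sentences


better = split_sentences(text)
print(naive)
print(better)
```

The naive rule breaks the text after "Dr.", while the abbreviation check keeps the first sentence whole. A hand-maintained abbreviation list does not scale, which is one reason to rely on an established splitter instead.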
In [ ]: