Simple Clinical Natural Language Processing with pyConTextNLP

In this notebook we introduce the basics of pyConTextNLP, a simple Python tool that we have used extensively for processing clinical text from domains such as radiology and psychiatry.

pyConTextNLP is built around the concept of targets and modifiers. A target is the concept we are interested in identifying (like a cough or a pulmonary embolism). A modifier is a concept that qualifies the target in some way (e.g. whether it is historical, how severe or certain it is, or whether it is negated).

pyConTextNLP relies on regular expressions to identify concepts (both targets and modifiers) within a sentence and then uses simple lexical rules to assign relationships between the identified targets and modifiers. Internally, pyConTextNLP uses graphs. Targets and modifiers are nodes in the graph and relationships between modifiers and targets are edges in the graph.
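The idea in the paragraph above can be sketched in a few lines of plain Python. This is only an illustration of the concept, not pyConTextNLP's implementation: the `LEXICON` entries, the `find_concepts` and `link_forward` helpers, and the "modifier applies to every later target" rule are all assumptions made for this example.

```python
import re

# Hypothetical mini-lexicon of (literal, category, kind).
# This is NOT pyConTextNLP's data format, just a stand-in for the sketch.
LEXICON = [
    ("pulmonary embolism", "CRITICAL_FINDING", "target"),
    ("no evidence of", "DEFINITE_NEGATED_EXISTENCE", "modifier"),
]


def find_concepts(sentence):
    """Return one graph node per regex match: (start, end, category, kind)."""
    nodes = []
    for literal, category, kind in LEXICON:
        pattern = r"\b" + re.escape(literal) + r"\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            nodes.append((match.start(), match.end(), category, kind))
    return sorted(nodes)


def link_forward(nodes):
    """Toy lexical rule: a modifier applies to every target that starts after it ends."""
    edges = []
    for mod in nodes:
        if mod[3] != "modifier":
            continue
        for tgt in nodes:
            if tgt[3] == "target" and tgt[0] >= mod[1]:
                edges.append((mod[2], tgt[2]))
    return edges


sentence = "There is no evidence of pulmonary embolism."
nodes = find_concepts(sentence)
edges = link_forward(nodes)
print(nodes)
print(edges)
```

Here the nodes are the matched spans and each edge is a (modifier category, target category) pair. A real rule set also needs backward-looking modifiers and terms that stop a modifier's scope, which this sketch leaves out.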

Specifying targets, modifiers, and rules

pyConTextNLP uses a four-tuple to represent concepts. Within the program we create an instance of an itemData class. Each itemData consists of the following four attributes:

  1. A literal (e.g. "pulmonary embolism", "no definite evidence of"): This is a linguistic representation of the target or modifier we want to identify.
  2. A category (e.g. "CRITICAL_FINDING", "PROBABLE_EXISTENCE"): This is the label we want applied to the literal when we see it in text.
  3. A regular expression that defines how to identify the literal concept. If no regular expression is specified, a regular expression will be built directly from the literal by wrapping it with word boundaries (e.g. r"""\bpulmonary embolism\b""").
  4. A rule that defines how the concept works in the sentence (e.g. a negation term that looks forward in the sentence). This only applies to modifiers.
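To make the four attributes concrete, here is a minimal stand-in written with a dataclass. It is a sketch, not the pyConTextNLP itemData class: the `ConceptItem` name, the `pattern` method, the use of `re.escape`, and the case-insensitive matching are assumptions for illustration, as are the category and rule values given to the modifier.

```python
import re
from dataclasses import dataclass


@dataclass
class ConceptItem:
    """Illustrative four-attribute record; not the pyConTextNLP class."""
    literal: str
    category: str
    regex: str = ""
    rule: str = ""

    def pattern(self):
        # With no explicit regex, fall back to the literal wrapped in word boundaries.
        source = self.regex or r"\b" + re.escape(self.literal) + r"\b"
        # Case-insensitive matching is an assumption made for this sketch.
        return re.compile(source, re.IGNORECASE)


# Targets leave the regex and rule empty; the modifier values are illustrative only.
target = ConceptItem("pulmonary embolism", "CRITICAL_FINDING")
modifier = ConceptItem("no definite evidence of", "PROBABLE_NEGATED_EXISTENCE", rule="forward")

print(bool(target.pattern().search("Acute pulmonary embolism is seen.")))
print(bool(target.pattern().search("pulmonary embolisms")))
```

The second search fails because the trailing word boundary does not match before the plural "s". Cases like this are where supplying an explicit regular expression (for example one ending in `s?`) is useful.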

In [ ]:
!pip install -U pycontextnlp==0.6.1.1

In [ ]:
import pyConTextNLP.pyConTextGraph as pyConText
import pyConTextNLP.itemData as itemData

The task: Identify patients with pulmonary embolism from radiology reports

Step 1: How is the concept of pulmonary embolism represented in the reports? Fill in the list below with the literals you want to use.


In [ ]:
mytargets = itemData.itemData()
mytargets.extend([["pulmonary embolism", "CRITICAL_FINDING", "", ""],
                  ["pneumonia", "CRITICAL_FINDING", "", ""]])

In [ ]:
print(mytargets)

In [ ]:
!pip install -U radnlp==0.2.0.8

Sentence Splitting

pyConTextNLP operates at the sentence level, so the first step is to split our document into individual sentences. pyConTextNLP comes with a simple sentence splitter class.


In [ ]:
import pyConTextNLP.helpers as helpers
splitter = helpers.sentenceSplitter()
splitter.splitSentences("This is Dr. Chapman's first sentence. This is the 2.0 sentence.")
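The example sentence above is chosen because it contains an abbreviation ("Dr.") and a decimal ("2.0"), both of which use a period that does not end a sentence. The sketch below shows how a deliberately naive splitter handles the same text. The `naive_split` function is hypothetical and is not how `helpers.sentenceSplitter` works.

```python
import re


def naive_split(text):
    """Split after '.', '!' or '?' when followed by whitespace (deliberately simplistic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


text = "This is Dr. Chapman's first sentence. This is the 2.0 sentence."
for sentence in naive_split(text):
    print(repr(sentence))
```

The naive rule leaves "2.0" intact because no whitespace follows the period, but it breaks the first sentence after "Dr.", producing three pieces instead of two. A practical splitter needs extra handling for abbreviations such as titles.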

However, sentence splitting is a common NLP task, so most full-fledged NLP applications provide sentence splitters. We usually rely on the sentence splitter that is part of the TextBlob package, which in turn relies on the Natural Language Toolkit (NLTK). Before proceeding, we need to download some NLTK resources with the following command.


In [ ]:
!python -m textblob.download_corpora

In [ ]: