Demonstration of Basic Sentence Markup with pyConTextNLP, Part 2.

An ever-so-slightly more complex sentence

Let's use a slightly more complex sentence that will illustrate pruning.


In [1]:
import pyConTextNLP.pyConTextGraph as pyConText
import pyConTextNLP.itemData as itemData
import networkx as nx

Sentences

These example reports are taken, with modification, from the MIMIC2 demo data set, a publicly available database of de-identified medical records for deceased individuals.


In [2]:
reports = [
    """IMPRESSION: Evaluation limited by lack of IV contrast; however, no evidence of
      bowel obstruction or mass identified within the abdomen or pelvis. Non-specific interstitial opacities and bronchiectasis seen at the right
     base, suggestive of post-inflammatory changes.""",
    """IMPRESSION: Evidence of early pulmonary vascular congestion and interstitial edema. Probable scarring at the medial aspect of the right lung base, with no
     definite consolidation."""
    ,
    """IMPRESSION:
     
     1.  2.0 cm cyst of the right renal lower pole.  Otherwise, normal appearance
     of the right kidney with patent vasculature and no sonographic evidence of
     renal artery stenosis.
     2.  Surgically absent left kidney.""",
    """IMPRESSION:  No pneumothorax.""",
    """IMPRESSION: No definite pneumothorax""",
    """IMPRESSION:  New opacity at the left lower lobe consistent with pneumonia."""
]

Read the itemData definitions


In [3]:
modifiers = itemData.instantiateFromCSVtoitemData(
    "https://raw.githubusercontent.com/chapmanbe/pyConTextNLP/master/KB/lexical_kb_05042016.tsv")
targets = itemData.instantiateFromCSVtoitemData(
    "https://raw.githubusercontent.com/chapmanbe/pyConTextNLP/master/KB/utah_crit.tsv")

We're going to work with a short sentence whose modifier, "no definite", spans two words.


In [4]:
reports[4]


Out[4]:
'IMPRESSION: No definite pneumothorax'

Marking up a sentence

We start by creating an instance of the ConTextMarkup class. This is a subclass of a NetworkX DiGraph. Information will be stored in the nodes and edges.


In [5]:
markup = pyConText.ConTextMarkup()

In [6]:
markup.setRawText(reports[4].lower())
print(markup)
print(len(markup.getRawText()))

markup.cleanText()
print(markup)
print(len(markup.getText()))


__________________________________________
rawText: impression: no definite pneumothorax
cleanedText: None
__________________________________________

36
__________________________________________
rawText: impression: no definite pneumothorax
cleanedText: impression: no definite pneumothorax
__________________________________________

36

Identify concepts in the sentence


In [7]:
markup.markItems(modifiers, mode="modifier")
markup.markItems(targets, mode="target")
print(markup.nodes(data=True))


[(<id> 114731505145955200893208409019887573968 </id> <phrase> pneumothorax </phrase> <category> ['pneumothorax'] </category> , {'category': 'target'}), (<id> 114707835890860389453644815748134525904 </id> <phrase> no </phrase> <category> ['definite_negated_existence'] </category> , {'category': 'modifier'}), (<id> 114696347015014195979537648822800276432 </id> <phrase> definite </phrase> <category> ['definite_existence'] </category> , {'category': 'modifier'}), (<id> 114708186079338702499480608197255289808 </id> <phrase> no definite </phrase> <category> ['definite_negated_existence'] </category> , {'category': 'modifier'})]

What does our initial markup look like?

  • We've identified four concepts in the sentence:
    1. "pneumothorax" (target)
    2. "no" (modifier)
    3. "definite" (modifier)
    4. "no definite" (modifier)
  • Here "no" and "definite" are not true concepts in the sentence; each is a subset of the concept "no definite"
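
The overlap arises because each lexical entry is matched against the text on its own, so a short phrase can match inside a longer one. The following is a minimal sketch of that idea in plain Python, not pyConTextNLP's actual implementation. The three-row `LEXICON` and the helper `find_matches` are hypothetical and stand in for the knowledge base loaded from the TSV file.

```python
import re

# Hypothetical mini-lexicon of (literal phrase, category) pairs.
# Illustrative only; the real entries come from the TSV knowledge base.
LEXICON = [
    ("no", "definite_negated_existence"),
    ("definite", "definite_existence"),
    ("no definite", "definite_negated_existence"),
]


def find_matches(text, lexicon):
    """Return sorted (start, end, phrase, category) tuples for whole-word matches."""
    matches = []
    for phrase, category in lexicon:
        # Each phrase is searched independently of the others.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for m in re.finditer(pattern, text):
            matches.append((m.start(), m.end(), phrase, category))
    return sorted(matches)


text = "impression: no definite pneumothorax"
matches = find_matches(text, LEXICON)
for match in matches:
    print(match)
```

All three modifier phrases are reported, even though two of them lie entirely inside the span of the third.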

Prune Marks

After identifying concepts, we prune concepts that are a subset of another identified concept.
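
As a rough illustration of what pruning by span means, here is a minimal sketch in plain Python. It assumes each concept is reduced to a hypothetical `(start, end, phrase)` tuple with an exclusive end offset, and the helper `prune_contained` is invented for this example; the library's own pruneMarks may differ in detail (for instance in how it separates modifiers from targets).

```python
def prune_contained(spans):
    """Keep only the spans that are not strictly contained in another span.

    Each span is a (start, end, phrase) tuple; end is exclusive.
    """
    kept = []
    for start, end, phrase in spans:
        contained = any(
            o_start <= start and end <= o_end and (o_end - o_start) > (end - start)
            for o_start, o_end, _ in spans
        )
        if not contained:
            kept.append((start, end, phrase))
    return kept


# Character offsets within "impression: no definite pneumothorax".
modifier_spans = [(12, 14, "no"), (15, 23, "definite"), (12, 23, "no definite")]
pruned = prune_contained(modifier_spans)
print(pruned)
```

Only the longest span survives; the shorter phrases it contains are discarded.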


In [8]:
markup.pruneMarks()
print(markup.nodes())


[<id> 114731505145955200893208409019887573968 </id> <phrase> pneumothorax </phrase> <category> ['pneumothorax'] </category> , <id> 114708186079338702499480608197255289808 </id> <phrase> no definite </phrase> <category> ['definite_negated_existence'] </category> ]

What is the effect of pruneMarks?

We've correctly dropped "no" and "definite" as identified concepts, leaving only "no definite" and "pneumothorax".

Apply modifiers

We now call the applyModifiers method of the ConTextMarkup object to identify any relationships between the nodes.
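
To make the idea of a modifier-to-target relationship concrete, here is a minimal sketch in plain Python. It rests on a simplifying assumption that is not taken from the library: every modifier acts "forward", meaning it applies to any target that starts at or after the modifier's end. The helper `link_modifiers` and the `(start, end, phrase)` tuples are hypothetical, and the real applyModifiers may apply additional rules.

```python
def link_modifiers(modifier_spans, target_spans):
    """Return (modifier_phrase, target_phrase) pairs as directed edges.

    Assumption: every modifier is 'forward' and reaches all later targets.
    """
    edges = []
    for m_start, m_end, m_phrase in modifier_spans:
        for t_start, t_end, t_phrase in target_spans:
            if t_start >= m_end:
                # The edge points from the modifier to the target it modifies.
                edges.append((m_phrase, t_phrase))
    return edges


# Spans that remain after pruning "impression: no definite pneumothorax".
pruned_modifiers = [(12, 23, "no definite")]
pruned_targets = [(24, 36, "pneumothorax")]
edges = link_modifiers(pruned_modifiers, pruned_targets)
print(edges)
```

The single resulting pair has the same orientation as the edge printed by the notebook cell: modifier first, target second.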


In [9]:
markup.applyModifiers()
print(markup.edges())


[(<id> 114708186079338702499480608197255289808 </id> <phrase> no definite </phrase> <category> ['definite_negated_existence'] </category> , <id> 114731505145955200893208409019887573968 </id> <phrase> pneumothorax </phrase> <category> ['pneumothorax'] </category> )]

Here is a notebook for Multisentence Documents