Processing Multisentence Documents


In [1]:
import pyConTextNLP.pyConTextGraph as pyConText
import pyConTextNLP.itemData as itemData
from textblob import TextBlob
import networkx as nx
import pyConTextNLP.display.html as html
from IPython.display import display, HTML

In [2]:
reports = [
    """IMPRESSION: Evaluation limited by lack of IV contrast; however, no evidence of
      bowel obstruction or mass identified within the abdomen or pelvis. Non-specific interstitial opacities and bronchiectasis seen at the right
     base, suggestive of post-inflammatory changes.""",
    """IMPRESSION: Evidence of early pulmonary vascular congestion and interstitial edema. Probable scarring at the medial aspect of the right lung base, with no
     definite consolidation."""
    ,
    """IMPRESSION:
     
     1.  2.0 cm cyst of the right renal lower pole.  Otherwise, normal appearance
     of the right kidney with patent vasculature and no sonographic evidence of
     renal artery stenosis.
     2.  Surgically absent left kidney.""",
    """IMPRESSION:  No pneumothorax.""",
    """IMPRESSION: No definite pneumothorax""",
    """IMPRESSION:  New opacity at the left lower lobe consistent with pneumonia."""
]

In [3]:
modifiers = itemData.instantiateFromCSVtoitemData(
    "https://raw.githubusercontent.com/chapmanbe/pyConTextNLP/master/KB/lexical_kb_05042016.tsv")
targets = itemData.instantiateFromCSVtoitemData(
    "https://raw.githubusercontent.com/chapmanbe/pyConTextNLP/master/KB/utah_crit.tsv")

Define markup_sentence

The function markup_sentence wraps the steps we went through in the previous two notebooks (BasicSentenceMarkup and BasicSentenceMarkupPart2). We add one step to the function: dropInactiveModifiers deletes any modifier node that does not get attached to a target node.


In [4]:
def markup_sentence(s, modifiers, targets, prune_inactive=True):
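    """
    Mark up a single sentence and return the resulting ConTextMarkup.

    s: the sentence text
    modifiers, targets: itemData collections used to tag the sentence
    prune_inactive: if True, drop modifiers that were not attached to anything
    """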
    markup = pyConText.ConTextMarkup()
    markup.setRawText(s)
    markup.cleanText()
    markup.markItems(modifiers, mode="modifier")
    markup.markItems(targets, mode="target")
    markup.pruneMarks()
    markup.dropMarks('Exclusion')
    # apply modifiers to any targets within the modifier's scope
    markup.applyModifiers()
    markup.pruneSelfModifyingRelationships()
    if prune_inactive:
        markup.dropInactiveModifiers()
    return markup
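
To make the pruning step concrete, here is a toy sketch that uses plain dictionaries instead of pyConTextNLP's graph classes. It assumes that "inactive" means a modifier with no edges at all; that is my reading, not a statement of the library's rule, and the function name, data layout, and the "unattached modifier" entry are all hypothetical.

```python
def drop_inactive_modifiers(nodes, edges):
    """Return the nodes that survive pruning.

    nodes: dict mapping a phrase to its mode ("modifier" or "target")
    edges: iterable of (modifier_phrase, modified_phrase) pairs
    """
    # Every phrase that appears at either end of an edge counts as connected
    connected = {phrase for edge in edges for phrase in edge}
    # Targets are always kept; modifiers are kept only if connected
    return {
        phrase: mode
        for phrase, mode in nodes.items()
        if mode != "modifier" or phrase in connected
    }


nodes = {
    "no evidence of": "modifier",
    "bowel obstruction": "target",
    "unattached modifier": "modifier",  # hypothetical: matched but modifies nothing
}
edges = [("no evidence of", "bowel obstruction")]

kept = drop_inactive_modifiers(nodes, edges)
print(sorted(kept))
```

Calling markup_sentence with prune_inactive=False skips this step, which can be useful when you want to see every modifier that was matched.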

In [5]:
report = reports[0]
print(report)


IMPRESSION: Evaluation limited by lack of IV contrast; however, no evidence of
      bowel obstruction or mass identified within the abdomen or pelvis. Non-specific interstitial opacities and bronchiectasis seen at the right
     base, suggestive of post-inflammatory changes.

Create a ConTextDocument

ConTextDocument is a class for organizing the markup of multiple sentences. It has a private attribute that is a NetworkX DiGraph representing the document structure. In this example we only use the ConTextDocument class to collect multiple sentence markups.


In [6]:
context = pyConText.ConTextDocument()

Split the document into sentences and process each sentence

pyConTextNLP comes with a simple sentence splitter in helper.py. I have not been maintaining it and have recently been using TextBlob to split sentences instead. A known problem with both sentence-splitting solutions is enumerated lists that don't use periods.


In [7]:
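# Lowercase the report and let TextBlob split it into sentences
blob = TextBlob(report.lower())
rslts = []
for s in blob.sentences:
    # s.raw is the sentence text as it appears in the report
    m = markup_sentence(s.raw, modifiers=modifiers, targets=targets)
    rslts.append(m)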

for r in rslts:
    context.addMarkup(r)
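
One possible workaround for the enumerated-list problem is to break the text at line-initial enumerators before handing each piece to the sentence splitter. The sketch below is not part of pyConTextNLP or TextBlob; the function name, the regular expression, and the sample text are made up for illustration, and it only handles items that start on their own line.

```python
import re

# A line that starts with an enumerator such as "1." or "2)" followed by whitespace.
# "2.0 cm" does not match because the period is followed by a digit.
ENUM_ITEM = re.compile(r"^\s*\d+[.)]\s+")


def split_enumerated(text):
    """Group lines into chunks, starting a new chunk at each enumerator."""
    chunks, current = [], []
    for line in text.splitlines():
        if ENUM_ITEM.match(line) and current:
            chunks.append(" ".join(current))
            current = []
        if line.strip():
            current.append(line.strip())
    if current:
        chunks.append(" ".join(current))
    return chunks


# Hypothetical list whose items do not end in periods
sample = """IMPRESSION:
 1.  No pneumothorax
 2.  Surgically absent left kidney"""

for chunk in split_enumerated(sample):
    print(chunk)
```

Each chunk could then be passed through TextBlob as above, so that ordinary sentences inside a list item are still split on periods.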

Displaying pyConTextNLP Markups

The display subpackage contains some functionality for visualizing the markups. Here I use HTML to color-code identified concepts.


In [8]:
clrs = {
    "bowel_obstruction": "blue",
    "inflammation": "blue",
    "definite_negated_existence": "red",
    "probable_negated_existence": "indianred",
    "ambivalent_existence": "orange",
    "probable_existence": "forestgreen",
    "definite_existence": "green",
    "historical": "goldenrod",
    "indication": "pink",
    "acute": "gold"
}

In [9]:
display(HTML(html.mark_document_with_html(context, colors=clrs, default_color="black")))


impression: evaluation limited by lack of iv contrast; however, no evidence of bowel obstruction or mass identified within the abdomen or pelvis. non-specific interstitial opacities and bronchiectasis seen at the right base, suggestive of post-inflammatory changes.
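
The general idea behind this kind of color-coding can be sketched with the standard library alone: wrap each tagged character span in a colored HTML span element. This is not how mark_document_with_html is implemented; the function below and its (start, stop, category) span format are hypothetical. The offsets are the spanStart and spanStop values that pyConTextNLP reports for the first sentence.

```python
from html import escape


def highlight_spans(text, spans, colors, default_color="black"):
    """Wrap each (start, stop, category) span in a colored <span> element.

    Spans are assumed not to overlap.
    """
    pieces, cursor = [], 0
    for start, stop, category in sorted(spans):
        # Untagged text between the previous span and this one
        pieces.append(escape(text[cursor:start]))
        color = colors.get(category, default_color)
        pieces.append(
            '<span style="color:{}">{}</span>'.format(color, escape(text[start:stop]))
        )
        cursor = stop
    pieces.append(escape(text[cursor:]))
    return "".join(pieces)


text = (
    "impression: evaluation limited by lack of iv contrast; however, "
    "no evidence of bowel obstruction or mass identified within the abdomen or pelvis."
)
spans = [
    (64, 78, "definite_negated_existence"),  # "no evidence of"
    (79, 96, "bowel_obstruction"),           # "bowel obstruction"
]
colors = {"bowel_obstruction": "blue", "definite_negated_existence": "red"}

marked = highlight_spans(text, spans, colors)
print(marked)
```

In a notebook, the returned string can be rendered with display(HTML(marked)).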

There is also a rich XML description of the ConTextDocument, which records each node's span, scope, and modifier relationships:


In [10]:
print(context.getXML())


<ConTextDocument>
impression: evaluation limited by lack of iv contrast; however, no evidence of bowel obstruction or mass identified within the abdomen or pelvis. non-specific interstitial opacities and bronchiectasis seen at the right base, suggestive of post-inflammatory changes. <section>
<sectionLabel> document </sectionLabel>
<sentence>
<sentenceNumber> 0 </sentenceNumber>
<sentenceOffset> 0 </sentenceOffset></sentence>

<ConTextMarkup>
<rawText> impression: evaluation limited by lack of iv contrast; however, no evidence of
      bowel obstruction or mass identified within the abdomen or pelvis. </rawText>
<cleanText> impression: evaluation limited by lack of iv contrast; however, no evidence of bowel obstruction or mass identified within the abdomen or pelvis. </cleanText>
<nodes>

<node>
<category> modifier </category>

<tagObject>
<id> 334788564763292261959162205299098966992 </id>
<phrase> evaluation </phrase>
<literal> evaluation </literal>
<category> ['indication'] </category>
<spanStart> 12 </spanStart>
<spanStop> 22 </spanStop>
<scopeStart> 0 </scopeStart>
<scopeStop> 55 </scopeStop>
</tagObject>
<modifiedBy>
<modifyingNode> 334822605143316515627910843029918243792 </modifyingNode>
<modifyingCategory> ['conj'] </modifyingCategory>
</modifiedBy>

</node>

<node>
<category> modifier </category>

<tagObject>
<id> 334822605143316515627910843029918243792 </id>
<phrase> however </phrase>
<literal> however </literal>
<category> ['conj'] </category>
<spanStart> 55 </spanStart>
<spanStop> 62 </spanStop>
<scopeStart> 0 </scopeStart>
<scopeStop> 145 </scopeStop>
</tagObject>
<modifies>
<modifiedNode> 334788564763292261959162205299098966992 </modifiedNode>
</modifies>

</node>

<node>
<category> modifier </category>

<tagObject>
<id> 334809205284190478100466625343833693136 </id>
<phrase> no evidence of </phrase>
<literal> no evidence of </literal>
<category> ['definite_negated_existence'] </category>
<spanStart> 64 </spanStart>
<spanStop> 78 </spanStop>
<scopeStart> 78 </scopeStart>
<scopeStop> 145 </scopeStop>
</tagObject>
<modifies>
<modifiedNode> 334829410050194865794993376561114569680 </modifiedNode>
</modifies>

</node>

<node>
<category> target </category>

<tagObject>
<id> 334829410050194865794993376561114569680 </id>
<phrase> bowel obstruction </phrase>
<literal> bowel obstruction </literal>
<category> ['bowel_obstruction'] </category>
<spanStart> 79 </spanStart>
<spanStop> 96 </spanStop>
<scopeStart> 0 </scopeStart>
<scopeStop> 145 </scopeStop>
</tagObject>
<modifiedBy>
<modifyingNode> 334809205284190478100466625343833693136 </modifyingNode>
<modifyingCategory> ['definite_negated_existence'] </modifyingCategory>
</modifiedBy>

</node>

</nodes>
<edges>

<edge>
<startNode> 334822605143316515627910843029918243792 </startNode>
<endNode> 334788564763292261959162205299098966992 </endNode>

</edge>

<edge>
<startNode> 334809205284190478100466625343833693136 </startNode>
<endNode> 334829410050194865794993376561114569680 </endNode>

</edge>

</edges>
</ConTextMarkup>
<sentence>
<sentenceNumber> 1 </sentenceNumber>
<sentenceOffset> 146 </sentenceOffset></sentence>

<ConTextMarkup>
<rawText> non-specific interstitial opacities and bronchiectasis seen at the right
     base, suggestive of post-inflammatory changes. </rawText>
<cleanText> non-specific interstitial opacities and bronchiectasis seen at the right base, suggestive of post-inflammatory changes. </cleanText>
<nodes>

<node>
<category> modifier </category>

<tagObject>
<id> 334836656416394745438541320585217498064 </id>
<phrase> suggestive </phrase>
<literal> suggest </literal>
<category> ['probable_existence'] </category>
<spanStart> 79 </spanStart>
<spanStop> 89 </spanStop>
<scopeStart> 0 </scopeStart>
<scopeStop> 119 </scopeStop>
</tagObject>
<modifies>
<modifiedNode> 334838179023221944569285845470032616400 </modifiedNode>
</modifies>

</node>

<node>
<category> target </category>

<tagObject>
<id> 334838179023221944569285845470032616400 </id>
<phrase> inflammatory </phrase>
<literal> inflammation </literal>
<category> ['inflammation'] </category>
<spanStart> 98 </spanStart>
<spanStop> 110 </spanStop>
<scopeStart> 0 </scopeStart>
<scopeStop> 119 </scopeStop>
</tagObject>
<modifiedBy>
<modifyingNode> 334836656416394745438541320585217498064 </modifyingNode>
<modifyingCategory> ['probable_existence'] </modifyingCategory>
</modifiedBy>

</node>

</nodes>
<edges>

<edge>
<startNode> 334836656416394745438541320585217498064 </startNode>
<endNode> 334838179023221944569285845470032616400 </endNode>

</edge>

</edges>
</ConTextMarkup>
</section>

</ConTextDocument>
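
Because each target node lists its modifiers in a modifiedBy element, the XML can be mined for findings with the standard library. The sketch below runs on an abbreviated copy of the first sentence's markup from the output above and assumes the string is well-formed XML; target_findings is a hypothetical helper, not part of pyConTextNLP.

```python
import xml.etree.ElementTree as ET

# Abbreviated from the getXML() output above (span and scope elements omitted)
xml_text = """
<ConTextMarkup>
<nodes>
<node>
<category> modifier </category>
<tagObject>
<id> 334809205284190478100466625343833693136 </id>
<phrase> no evidence of </phrase>
<category> ['definite_negated_existence'] </category>
</tagObject>
<modifies>
<modifiedNode> 334829410050194865794993376561114569680 </modifiedNode>
</modifies>
</node>
<node>
<category> target </category>
<tagObject>
<id> 334829410050194865794993376561114569680 </id>
<phrase> bowel obstruction </phrase>
<category> ['bowel_obstruction'] </category>
</tagObject>
<modifiedBy>
<modifyingNode> 334809205284190478100466625343833693136 </modifyingNode>
<modifyingCategory> ['definite_negated_existence'] </modifyingCategory>
</modifiedBy>
</node>
</nodes>
</ConTextMarkup>
"""


def target_findings(xml_string):
    """Return (target phrase, list of modifying categories) for each target node."""
    root = ET.fromstring(xml_string)
    findings = []
    for node in root.iter("node"):
        # The direct child <category> holds the node mode: modifier or target
        if node.findtext("category", "").strip() != "target":
            continue
        phrase = node.findtext("tagObject/phrase", "").strip()
        modifiers = [m.text.strip() for m in node.iter("modifyingCategory")]
        findings.append((phrase, modifiers))
    return findings


findings = target_findings(xml_text)
for phrase, modifiers in findings:
    print(phrase, modifiers)
```

Note that the category values are kept as the literal strings found in the XML, such as "['definite_negated_existence']", rather than being converted to Python lists.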