In this notebook we demonstrate basic document-level classification of reports with respect to a single finding (fever). We use Pandas to read our data from a MySQL database and then to add our classification as a new column in the dataframe.
Many of the common pyConTextNLP tasks have been wrapped into functions contained in the radnlp package. We import several modules that will allow us to write concise code.
In [12]:
from utils import *
import pandas as pd
options = {}
options['lexical_kb'] = ["https://raw.githubusercontent.com/chapmanbe/pyConTextNLP/master/KB/lexical_kb_nlm.tsv"]
options["schema"] = "https://raw.githubusercontent.com/chapmanbe/pyConTextNLP/master/KB/schema2.csv"
options["rules"] = "https://raw.githubusercontent.com/chapmanbe/pyConTextNLP/master/KB/classificationRules3.csv"
data = get_data()
data.head(5)
Out[12]:
targets and modifiers as demonstrated below. We now need to apply our schema to the reports. Since our data is in a Pandas data frame, the easiest way to process our reports is with the DataFrame apply method.
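As a minimal sketch of the row-wise apply pattern described above, the cell below builds a tiny data frame and adds a result column. The `classify_impression` function and the example reports are hypothetical stand-ins invented for illustration; the function only checks for a keyword and is not the pyConTextNLP analysis used in this notebook.

```python
import pandas as pd


def classify_impression(text):
    """Hypothetical stand-in for a real report classifier (keyword check only)."""
    if not isinstance(text, str):
        # Missing impressions (None/NaN) cannot be analyzed.
        return "no text"
    return "mentioned" if "pulmonary embol" in text.lower() else "not mentioned"


# Made-up example reports; real data would come from the database.
reports = pd.DataFrame({
    "impression": [
        "Pulmonary embolism in the right lower lobe.",
        "Clear lungs.",
        None,
    ]
})

# axis=1 passes each row to the lambda, so columns are accessed by name.
reports["pe rslt"] = reports.apply(
    lambda row: classify_impression(row["impression"]), axis=1
)
```

Because the stand-in function handles missing text explicitly, rows with empty impressions do not need to be dropped first; whether that is appropriate for a real classifier is a separate design decision.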
In [10]:
radnlp_rules = rules.read_rules(options["rules"])
myschema = schema.read_schema(options["schema"])
modifiers = itemData.itemData()
targets = itemData.itemData()
for kb in options['lexical_kb']:
    modifiers.extend( itemData.instantiateFromCSVtoitemData(kb) )
targets.extend([["pulmonary embolism", "PULMONARY_EMBOLISM", "", ""],
                ["pulmonary emboli", "PULMONARY_EMBOLISM", "", ""],
                ["pneumonia", "LUNG_DISEASE", "", ""]])
modifiers.extend((["no definite", "PROBABLE_NEGATED_EXISTENCE", "", "forward"],
                  ["no", "DEFINITE_NEGATED_EXISTENCE", "", "forward"],))
colors = {"pulmonary_embolism":"blue",
          "lung_disease":"turquoise",
          "probable_negated_existence":"pink",
          "definite_negated_existence":"red",
          "probable_existence":"green",
          "conj":"goldenrod",
          }
#data = data.dropna()
data["pe rslt"] = \
    data.apply(lambda x: analyze_report(x["impression"],
                                        modifiers,
                                        targets,
                                        radnlp_rules,
                                        myschema), axis=1)
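To give a feel for what the target and modifier entries in the cell above express, here is a toy sketch in plain Python. It assumes each entry is read as (phrase, category, regular expression, direction) and that "forward" means the modifier applies to targets appearing after it. The helper names `find_spans` and `toy_markup` are hypothetical, and the matching is far simpler than what pyConTextNLP actually does (no sentence splitting, scope termination, or custom regular expressions).

```python
import re

# Simplified copies of the entries above: (phrase, category[, direction]).
TARGETS = [("pulmonary embolism", "PULMONARY_EMBOLISM"),
           ("pneumonia", "LUNG_DISEASE")]
MODIFIERS = [("no definite", "PROBABLE_NEGATED_EXISTENCE", "forward"),
             ("no", "DEFINITE_NEGATED_EXISTENCE", "forward")]


def find_spans(sentence, items):
    """Return (start, end, item) matches, preferring longer phrases on overlap."""
    found = []
    for item in sorted(items, key=lambda it: -len(it[0])):
        pattern = r"\b" + re.escape(item[0]) + r"\b"
        for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            overlaps = any(m.start() < e and s < m.end() for s, e, _ in found)
            if not overlaps:
                found.append((m.start(), m.end(), item))
    return sorted(found, key=lambda f: f[0])


def toy_markup(sentence):
    """Map each target category to the forward modifiers that precede it."""
    modifiers = find_spans(sentence, MODIFIERS)
    result = {}
    for t_start, _, target in find_spans(sentence, TARGETS):
        result[target[1]] = [
            mod[1] for _, m_end, mod in modifiers
            if mod[2] == "forward" and m_end <= t_start
        ]
    return result


probable = toy_markup("No definite pulmonary embolism.")
mixed = toy_markup("Pneumonia is present, no pulmonary embolism")
```

In this sketch the longer phrase "no definite" takes precedence over the overlapping "no", so the first sentence is marked as a probable rather than a definite negation, and in the second sentence the negation attaches only to the target that follows it.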
In [11]:
view_markup(data, colors)