In [1]:

    
import fcFinder as fc
import pyConTextNLP.itemData as itemData
import os
import importlib



In [24]:

    
import sys
import os

sys.path
# NOTE: this appends the path of fcFinder.py itself rather than its directory
sys.path.append(os.path.join(os.path.expanduser('~'),'Box Sync', 'Bucher_Surgical_MIMICIII/pyConText_implement/fcFinder/fcFinder.py'))
#fdir



In [25]:

    
sys.path









    Out[25]:





['',
 '/Users/alec/anaconda/lib/python35.zip',
 '/Users/alec/anaconda/lib/python3.5',
 '/Users/alec/anaconda/lib/python3.5/plat-darwin',
 '/Users/alec/anaconda/lib/python3.5/lib-dynload',
 '/Users/alec/anaconda/lib/python3.5/site-packages',
 '/Users/alec/anaconda/lib/python3.5/site-packages/Sphinx-1.4.6-py3.5.egg',
 '/Users/alec/anaconda/lib/python3.5/site-packages/aeosa',
 '/Users/alec/anaconda/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg',
 '/Users/alec/anaconda/lib/python3.5/site-packages/IPython/extensions',
 '/Users/alec/.ipython',
 '/Users/alec/Box Sync/Bucher_Surgical_MIMICIII/pyConText_implement/fcFinder/fcFinder.py']
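
Entries on `sys.path` are directories (or zip archives) that Python searches for modules, so the last entry in the listing above, which points at `fcFinder.py` itself, does not contribute to the import. The module most likely imports anyway because the notebook runs from the `fcFinder` directory and `''` (the working directory) is first on the path. A minimal sketch of appending the parent directory instead, reusing the folder layout from the cell above; `add_module_dir` is a hypothetical helper name:

```python
import os
import sys


def add_module_dir(module_file):
    """Append the directory containing module_file to sys.path (once)."""
    module_dir = os.path.dirname(os.path.abspath(module_file))
    if module_dir not in sys.path:
        sys.path.append(module_dir)
    return module_dir


# Same location as in the cell above, but ending at the module file so that
# its parent directory is what lands on sys.path.
fc_path = os.path.join(os.path.expanduser('~'), 'Box Sync',
                       'Bucher_Surgical_MIMICIII', 'pyConText_implement',
                       'fcFinder', 'fcFinder.py')
fc_dir = add_module_dir(fc_path)
```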



In [5]:

    
importlib.reload(fc)









    Out[5]:





<module 'fcFinder' from '/Users/alec/Box Sync/Bucher_Surgical_MIMICIII/pyConText_implement/fcFinder/fcFinder.py'>



In [4]:

    
"""import pyConTextNLP.pyConTextGraph as pyConText
from pyConTextNLP.pyConTextGraph import ConTextMarkup

from textblob import TextBlob
import networkx as nx
import os

import glob

import re
import copy
import networkx as nx
import platform
import copy
import uuid
import datetime
import time"""
''









    Out[4]:





''



In [6]:

    
#import fcClasses

fcFinder

I'm designing a function that works as a pipeline built on the fcFinder module. Here is an outline of how it works:

As input, it takes:
- a file name (the report)
- an output path where the XML file is saved
- modifiers and targets as itemData objects
The function then follows these steps, as demonstrated below:
- Reads in the file
- Creates a ConText document made up of markup objects
- Classifies each tagObject as definitive evidence of fluid collection, negated evidence, or indication (this still needs to work for the relevant anatomic modifiers)
- Creates mentionAnnotation objects out of those tagObjects
- Writes one .txt.knowtator.xml file. This still needs work to be compatible with eHOST, but the basic structure is there.
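
The classification step in the outline can be pictured as a rule over the categories of the modifiers attached to each target. The sketch below is not the logic inside `fc.fluid_collection_classifier`; it is a hedged illustration in which the function name `classify_mention`, the `'negation'` category, the `'Fluid collection-negated'` label, and the precedence of negation over indication are all assumptions. Only the `'indication'` and `'anatomy'` categories and the other two labels appear in this notebook's output.

```python
def classify_mention(modifier_categories):
    """Return a mention class for one target, given its modifiers' categories.

    Assumed precedence: negation, then indication, then definitive.
    """
    categories = set(modifier_categories)
    if 'negation' in categories:           # assumed category name
        return 'Fluid collection-negated'  # assumed label
    if 'indication' in categories:
        return 'fluid collection-indication'
    # Anatomy modifiers alone do not change the class in this sketch.
    return 'Fluid collection-definitive'


print(classify_mention(['anatomy', 'indication']))
print(classify_mention(['anatomy']))
```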



In [7]:

    
#one annotation
modifiers = itemData.instantiateFromCSVtoitemData("/Users/alec/Box Sync/Bucher_Surgical_MIMICIII/pyConText_implement/modifiers.tsv")
targets = itemData.instantiateFromCSVtoitemData(
    "file:///Users/alec/Box Sync/Bucher_Surgical_MIMICIII/pyConText_implement/targets.tsv")



In [8]:

    
input_report = '/Users/alec/Box Sync/Bucher_Surgical_MIMICIII/Radiology_Annotation/Adjudication/Batch_3/corpus/Yes_28226_116465_05-29-93.txt'
report = ''
with open(input_report,'r') as f0:
    report += f0.read()
len(report)
report_name = os.path.basename(input_report)
report_name









    Out[8]:





'Yes_28226_116465_05-29-93.txt'



In [9]:

    
context = fc.create_context_doc(report, modifiers, targets)



In [10]:

    
len(context.getSectionMarkups())









    Out[10]:





32



In [11]:

    
#importlib.reload(fc)



In [12]:

    
#Apply logic, classify tagObjects
annotations1=fc.fluid_collection_classifier(context,"textfile.txt")









    



Definitive evidence: 5 
 Negated evidence: 0 
 Indication: 4



In [13]:

    
annotation_strings = []
print(type(annotations1))
print(type(annotations1[0]))
for annotation in annotations1:
    annotation_strings.append(annotation.getXML())
    print(annotation.getMentionClass())
#definitive_mention = fc.fluid_collection_classifier(context)[0][0]
#definitive_mention.has_edge()
#Will this work to catch the appropriate anatomical modifiers?
#[x.getXML() for x in annotations1]









    



<class 'list'>
<class 'fcFinder.mentionAnnotation'>
fluid collection-indication
fluid collection-indication
fluid collection-indication
Fluid collection-definitive
Fluid collection-definitive
fluid collection-indication
Fluid collection-definitive
Fluid collection-definitive
Fluid collection-definitive



In [14]:

    
#print(type(definitive_mention))
#tagObject1 = definitive_mention.nodes()[0]
#tagObject1.getSpan()

XML

This part of the pipeline takes all of the annotations from above and writes them to a Knowtator XML file. The file cannot yet be imported into eHOST.



In [15]:

    
#annotation1 = fcClasses.createAnnotations(definitive_mention,'fluid collection-definitive','Yes_28226_116465_05-29-93.txt')[0]
annotation1 = annotations1[0]
XML1 = annotation1.getXML()
print(XML1)









    



<annotation> 
                <mention id = "229389172551677878530093185737057295312" /> 
                <annotator id=>FC_FINDER</annotator> 
                <span start="0" end="152" />
                <spannedText>[**2659-8-24**] 2:25 pm ct abd w&w/o c; ct pelvis w/contrast clip # [**clip number (radiology) 67740**] reason: evaluation for abscess and pancreasitis.</spannedText>
                <creationDate>Thu Feb  2 13:53:41 2017</creationDate>
        </annotation>
        <classMention id="229389172551677878530093185737057295312">
                <mentionClass id="fluid collection-indication">[**2659-8-24**] 2:25 pm ct abd w&w/o c; ct pelvis w/contrast clip # [**clip number (radiology) 67740**] reason: evaluation for abscess and pancreasitis.</mentionClass>
        </classMention>
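
Two details in the output above would stop a strict XML parser: the `&` in `w&w/o` is not escaped, and `<annotator id=>` has an attribute with no value. Whether these are what blocks the eHOST import is an assumption, but they are worth ruling out. The sketch below builds the same two elements with the standard library's `xml.etree.ElementTree`, which escapes text automatically. The function name `build_annotation_xml` is hypothetical, the annotator `id` is filled with the annotator name as a placeholder because the original leaves it empty, and the element layout simply mirrors the output above rather than a verified eHOST schema.

```python
import time
import uuid
import xml.etree.ElementTree as ET


def build_annotation_xml(text, start, end, mention_class, annotator='FC_FINDER'):
    """Return <annotation> and <classMention> elements as one XML string."""
    mention_id = str(uuid.uuid1().int)

    annotation = ET.Element('annotation')
    ET.SubElement(annotation, 'mention', id=mention_id)
    # Placeholder id: the original output leaves this attribute empty.
    ET.SubElement(annotation, 'annotator', id=annotator).text = annotator
    ET.SubElement(annotation, 'span', start=str(start), end=str(end))
    ET.SubElement(annotation, 'spannedText').text = text
    ET.SubElement(annotation, 'creationDate').text = time.asctime()

    class_mention = ET.Element('classMention', id=mention_id)
    ET.SubElement(class_mention, 'mentionClass', id=mention_class).text = text

    # Serialising through ElementTree escapes '&', '<' and '>' in the text.
    return '\n'.join(ET.tostring(el, encoding='unicode')
                     for el in (annotation, class_mention))


xml_string = build_annotation_xml('ct abd w&w/o c', 0, 14,
                                  'fluid collection-indication')
print(xml_string)
```

A complete Knowtator file would wrap these elements in a single root element; the root name and attributes that eHOST expects are best checked against a file exported from eHOST itself.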



In [16]:

    
#XML1 = fcClasses.getXML(annotation1)



In [17]:

    
#print(XML1)
#importlib.reload(fc)



In [18]:

    
print(len(annotation_strings))
#print(len(fc.writeKnowtator(annotation_strings,report_name,os.path.join('..','XML_examples'))))
print(fc.writeKnowtator(annotation_strings,report_name,os.path.join('/Users/alec/Desktop/fcFinder_test','saved')))









    



9
/Users/alec/Desktop/fcFinder_test/saved/Yes_28226_116465_05-29-93.txt.knowtator.xml

One pipelined function for one file



In [19]:

    
input_report2 = '/Users/alec/Box Sync/Adjudication/Batch_3/corpus/No_69411_129942_04-30-58.txt'
os.getcwd()









    Out[19]:





'/Users/alec/Box Sync/Bucher_Surgical_MIMICIII/pyConText_implement/fcFinder'



In [20]:

    
def fcFinder(input_report, output_dir, modifiers=modifiers, targets=targets):
    # Create the 'saved' subdirectory (and output_dir itself) if missing
    save_dir = os.path.join(output_dir, 'saved')
    os.makedirs(save_dir, exist_ok=True)
    # Read the whole report into one string
    with open(input_report, 'r') as f0:
        report = f0.read()
    context = fc.create_context_doc(report, modifiers, targets)
    report_name = os.path.basename(input_report)
    annotations = fc.fluid_collection_classifier(context, input_report)
    XML_strings = [x.getXML() for x in annotations]
    return fc.writeKnowtator(XML_strings, report_name, save_dir)



In [21]:

    
fcFinder(input_report,'/Users/alec/Desktop/fcFinder_test')









    



Definitive evidence: 5 
 Negated evidence: 0 
 Indication: 4 






    Out[21]:





'/Users/alec/Desktop/fcFinder_test/saved/Yes_28226_116465_05-29-93.txt.knowtator.xml'

Batches of Documents (not ready yet)



In [127]:

    
### Need to work on this ###
def fcFinder_batches(batch_path, batch_schema,output_dir, targets=targets,modifiers=modifiers):
    batches = batch_path+'//%s'%batch_schema
    #return os.listdir(batch_path)
    listOfBatches = glob.glob(batches)
    counter = 0
    #return listOfBatches
    for batch in listOfBatches:
        #print(batch)
        corpus = os.path.join(batch,'corpus')
        for file in os.listdir(corpus):
            file_path = os.path.isfile(os.path.join(batch,corpus,file))
            #print(file)
            counter += 1
            if os.path.isfile(os.path.join(batch,corpus,file)):
                #print(file)
                if not os.path.exists(os.path.join(output_dir,batch)):
                    os.mkdir(os.path.exists(output_dir,bat))
                fcFinder(file_path,modifiers,targets,os.path.join(output_dir,batch))
                    
    return counter
    return output_dir
    return listOfBatches
fcFinder_batches('/Users/alec/Box Sync/Adjudication','Batch*','/Users/alec/Desktop/fcFinder_test')









    



---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-127-1588370114d1> in <module>()
     21     return output_dir
     22     return listOfBatches
---> 23 fcFinder_batches('/Users/alec/Box Sync/Adjudication','Batch*','/Users/alec/Desktop/fcFinder_test')

<ipython-input-127-1588370114d1> in fcFinder_batches(batch_path, batch_schema, output_dir, targets, modifiers)
     16                 if not os.path.exists(os.path.join(output_dir,batch)):
     17                     os.mkdir(os.path.exists(output_dir,bat))
---> 18                 fcFinder(file_path,modifiers,targets,os.path.join(output_dir,batch))
     19 
     20     return counter

<ipython-input-108-7c70eebd7375> in fcFinder(input_report, modifiers, targets, output_dir)
      7         report += f0.read()
      8     context = fc.create_context_doc(report, modifiers, targets)
----> 9     report_name = os.path.basename(input_report)
     10     annotations = fc.fluid_collection_classifier(context,input_report)
     11     XML_strings = [x.getXML() for x in annotations]

/Users/alec/anaconda/lib/python3.5/posixpath.py in basename(p)
    137     """Returns the final component of a pathname"""
    138     sep = _get_sep(p)
--> 139     i = p.rfind(sep) + 1
    140     return p[i:]
    141 

AttributeError: 'bool' object has no attribute 'rfind'
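
The `AttributeError` comes from `file_path = os.path.isfile(...)`: `os.path.isfile` returns a boolean, so `fcFinder` receives `True` instead of a path and `os.path.basename` fails on it. Other problems in the cell would surface next: `glob` is not imported in any live cell shown here, `os.path.exists(output_dir,bat)` passes two arguments and an undefined name, `os.path.join(output_dir,batch)` discards `output_dir` when `batch` is an absolute path, and only the first `return` is reachable. Below is a hedged sketch of a corrected loop. To keep it runnable on its own it takes the per-file function as a parameter (`process_file`, a hypothetical name, as is `run_batches`); in this notebook that would be `fcFinder` with the `(input_report, output_dir)` argument order from the definition above.

```python
import glob
import os


def run_batches(batch_path, batch_schema, output_dir, process_file):
    """Call process_file(report_path, batch_output_dir) for every file in
    each <batch>/corpus directory matching batch_schema; return the outputs."""
    results = []
    for batch in sorted(glob.glob(os.path.join(batch_path, batch_schema))):
        corpus = os.path.join(batch, 'corpus')
        if not os.path.isdir(corpus):
            continue
        # Use only the batch folder name; joining the absolute batch path
        # onto output_dir would discard output_dir.
        batch_out = os.path.join(output_dir, os.path.basename(batch))
        os.makedirs(batch_out, exist_ok=True)
        for name in sorted(os.listdir(corpus)):
            file_path = os.path.join(corpus, name)  # keep the path, not a bool
            if os.path.isfile(file_path):
                results.append(process_file(file_path, batch_out))
    return results
```

In this notebook the call might look like `run_batches('/Users/alec/Box Sync/Adjudication', 'Batch*', '/Users/alec/Desktop/fcFinder_test', fcFinder)`.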



In [71]:

    
fcFinder(input_report, modifiers, targets, os.getcwd())









    Out[71]:





'/Users/alec/Box Sync/Bucher_Surgical_MIMICIII/pyConText_implement/fcFinder/XML_examples/Yes_28226_116465_05-29-93.txt.knowtator.xml'



In [69]:

    
fcFinder(input_report2, modifiers, targets, os.getcwd())









    Out[69]:





'/Users/alec/Box Sync/Bucher_Surgical_MIMICIII/pyConText_implement/fcFinder/XML_examples/No_69411_129942_04-30-58.txt.knowtator.xml'

Catching anatomical modifiers and creating annotations out of them



In [57]:

    
markup2 = context.getSectionMarkups()[0]
markup2









    Out[57]:





(0, __________________________________________
 rawText: [**2659-8-24**] 2:25 pm
  ct abd w&w/o c; ct pelvis w/contrast                            clip # [**clip number (radiology) 67740**]
  reason: evaluation for abscess and pancreasitis.
 cleanedText: [**2659-8-24**] 2:25 pm ct abd w&w/o c; ct pelvis w/contrast clip # [**clip number (radiology) 67740**] reason: evaluation for abscess and pancreasitis.
 ********************************
 TARGET: <id> 59056097846300117885128082136604531664 </id> <phrase> abscess </phrase> <category> ['fluid_collection'] </category> 
 ----MODIFIED BY: <id> 59052832695266580023396077266003221456 </id> <phrase> pelvis </phrase> <category> ['anatomy'] </category> 
 ----MODIFIED BY: <id> 59055915463070010046424622230320634832 </id> <phrase> evaluation for </phrase> <category> ['indication'] </category> 
 __________________________________________)
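
In the markup above, the target `abscess` is modified by both an `anatomy` modifier (`pelvis`) and an `indication` modifier (`evaluation for`). One way to catch the anatomical modifiers is to filter each target's modifiers by category. The sketch below uses plain tuples in place of pyConText tagObjects so that it runs on its own; `anatomy_for_targets` and the dictionary layout are hypothetical, and with real markups the phrases and categories would come from the tagObjects instead.

```python
def anatomy_for_targets(modified_targets, category='anatomy'):
    """Map each target phrase to the phrases of its modifiers in `category`.

    modified_targets: dict of target phrase -> list of
    (modifier phrase, list of categories) tuples.
    """
    return {
        target: [phrase for phrase, cats in modifiers if category in cats]
        for target, modifiers in modified_targets.items()
    }


# Values copied from the markup shown above.
markup_summary = {
    'abscess': [('pelvis', ['anatomy']),
                ('evaluation for', ['indication'])],
}
print(anatomy_for_targets(markup_summary))
```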


