BioVida: Open-i

Open-i is an open-access biomedical search engine provided by the US National Institutes of Health. The service grants programmatic access to its more than 1.2 million images through a RESTful web API. BioVida provides an easy-to-use Python interface for this web API, located in the images subpackage.



In [1]:

    
from biovida.images import OpeniInterface









    



Using Theano backend.



In [2]:

    
opi = OpeniInterface()

We start by creating an instance of the class. All BioVida interfaces accept at least two parameters: verbose and cache_path. The first determines whether the class prints additional status updates as it works. The second is the location where data will be stored (or cached) on your computer. If left at its default, data will be cached in a directory named biovida_cache in your home directory. For most use cases, this should suffice.
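As a rough illustration of the default described above, the sketch below resolves a cache location: a directory named biovida_cache in the home directory unless another base path is given. The helper name resolve_cache_path is hypothetical and is not part of BioVida. Placing biovida_cache inside a user-supplied path is likewise an assumption made for this example.

```python
import os


def resolve_cache_path(cache_path=None):
    """Return the directory that would hold cached data (hypothetical helper)."""
    # Fall back to the user's home directory when no path is supplied.
    base = os.path.expanduser("~") if cache_path is None else cache_path
    # Assumption: the cache directory is always named 'biovida_cache'.
    return os.path.join(base, "biovida_cache")


print(resolve_cache_path())         # a path ending in 'biovida_cache'
print(resolve_cache_path("/data"))  # the same directory name under '/data'
```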

Searching

To search the Open-i database, we can use the OpeniInterface's search method. To explore valid values that can be passed to search, we can use options().



In [3]:

    
opi.options()









    



  - 'article_type'
  - 'collection'
  - 'exclusions'
  - 'fields'
  - 'image_type'
  - 'rankby'
  - 'specialties'
  - 'subset'
  - 'video'

The code above enumerates all of the parameters, apart from a specific query string, that can be passed to search(). Additionally, options() can be used to investigate the valid values for any one of these parameters.



In [4]:

    
opi.options('collection')









    



  - 'history_of_medicine'
  - 'indiana_u_xray'
  - 'medpix'
  - 'pubmed'
  - 'usc_anatomy'



In [5]:

    
opi.options('image_type')









    



  - 'ct'
  - 'graphic'
  - 'microscopy'
  - 'mri'
  - 'pet'
  - 'photograph'
  - 'ultrasound'
  - 'x_ray'

Let's go ahead and perform a search for X-ray and CT images of 'lung cancer' from the PubMed collection/database.



In [6]:

    
opi.search(query='lung cancer', image_type=('x_ray', 'ct'), collection='pubmed')









    



Results Found: 8,531.

Downloading Data

Now that we've defined a search, we can easily download some, or all, of the results found. For the sake of expediency, let's limit the number of results we download to the first 1500.



In [7]:

    
pull_df = opi.pull(download_limit=1500)









    



Number of Records to Download: 1,500 (chunk size: 30 records).

The text information associated with each image is referred to as a 'record'. Records are downloaded in 'chunks' of no more than 30 at a time.
Images, unlike records, are downloaded one by one. However, pull() checks the cache before downloading an image, in an effort to avoid redundant downloads.
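The chunking and cache check described above can be pictured with a small sketch. This is not BioVida's implementation. The function names chunk_bounds and needs_download are made up for illustration, and the cache check is assumed to be a simple file-existence test.

```python
import os


def chunk_bounds(n_records, chunk_size=30):
    """Return (start, stop) index pairs that cover n_records in chunks."""
    return [(start, min(start + chunk_size, n_records))
            for start in range(0, n_records, chunk_size)]


def needs_download(image_path):
    """Assumed cache check: only fetch an image that is not already on disk."""
    return not os.path.isfile(image_path)


bounds = chunk_bounds(1500)
print(len(bounds))  # 50 chunks of 30 records each
print(bounds[-1])   # (1470, 1500)
```

With download_limit=1500 and a chunk size of 30, records would arrive in 50 requests, while each image would be fetched individually and only if it is missing from the cache.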

The dataframe generated by pull() can be viewed using either opi.records_db, or the pull_df used above to capture the output of pull(). Both will be identical. We can also view an abbreviated dataframe, opi.records_db_short, which has several (typically unneeded) columns removed.



In [8]:

    
import numpy as np
def simplify_df(df):
    """This function simplifies dataframes
    for the purposes of this tutorial."""
    data_frame = df.copy()
    data_frame['cached_images_path'] = '/path/to/image'
    # np.nan is used here because the np.NaN alias was removed in NumPy 2.0.
    return data_frame[0:5].replace({np.nan: ''})



In [9]:

    
simplify_df(opi.records_db_short)









    Out[9]:






  
    
      
| | mesh_major | mesh_minor | problems | abstract | affiliate | article_type | authors | cc_license | doc_source | image_caption | ... | image_problems_from_text | imaging_modality_from_text | parsed_abstract | sex | modality_full | image_id_short | query | pull_time | cached_images_path | download_success |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | | lung cancer | Lung cancer is one of the leading causes of ca... | Department of Nuclear Medicine and Molecular I... | other | Purandare NC, Rangarajan V | byncsa | PMC | (A-D) Nodal disease. Right upper paratracheal ... | ... | (arrows, grids) | Computed Tomography (CT): chest | | male | Computed Tomography (CT) | 7 | {'subset': None, 'rankby': None, 'collection':... | 2017-04-10 15:24:37.098065 | /path/to/image | True |
| 1 | | | nsclc | Non-small cell lung cancer (NSCLC) accounts fo... | Cardiopulmonary Department, Sant'Andrea Hospit... | research_article | Pezzuto A, Piraino A, Mariotta S | by | PMC | Case 2: (A) CT scan showing the lung after sur... | ... | | Computed Tomography (CT): chest | | female | Computed Tomography (CT) | 2 | {'subset': None, 'rankby': None, 'collection':... | 2017-04-10 15:24:37.098065 | /path/to/image | True |
| 2 | (Adrenal Insufficiency/diagnosis*/drug therapy... | (Acute Disease, Adrenal Cortex Hormones/therap... | hypoxia; tachycardia | Background: Adrenal crisis after surgical pro... | Department of Orthopedic Surgery, Osaka Medica... | research_article | Naka N, Takenaka S, Nanno K, Moriguchi Y, Chun... | by | PMC | CT scan showing bilateral adrenal enlargement ... | ... | (asterisks,) | Computed Tomography (CT): chest | {'background': 'Adrenal crisis after surgical ... | male | Computed Tomography (CT) | 1 | {'subset': None, 'rankby': None, 'collection':... | 2017-04-10 15:24:37.098065 | /path/to/image | True |
| 3 | | | small cell carcinoma of lung | Subcutaneous swelling as first clinical presen... | Department of Medicine. | case_report | Kumar S, Gupta A, Diwan SK, Bhake A | | PMC | Computerized tomography of the chest showing p... | ... | | Computed Tomography (CT): chest | | male | Computed Tomography (CT) | 3 | {'subset': None, 'rankby': None, 'collection':... | 2017-04-10 15:24:37.098065 | /path/to/image | True |
| 4 | (Echocardiography/methods*, Lung/ultrasonograp... | (Aged, Humans, Incidental Findings, Male) | lung tumor | We present images of a rare case where a prima... | Department of Clinical Physiology and Nuclear ... | research_article | Dencker M, Cronberg C, Damm S, Valind S, Wadbo M | by | PMC | Display of CT-image. The arrow indicates the n... | ... | (arrows,) | Computed Tomography (CT): chest | | male | Computed Tomography (CT) | 2 | {'subset': None, 'rankby': None, 'collection':... | 2017-04-10 15:24:37.098065 | /path/to/image | True |

5 rows × 38 columns

This dataframe provides a lot of rich data, which is valuable independent of the images that have also been downloaded.

For instance, it is possible to quickly generate some descriptive statistics about our newly created 'lung cancer' dataset.



In [10]:

    
pull_df['age'].describe()









    Out[10]:





count    966.000000
mean      53.337267
std       21.999990
min        1.000000
25%       47.000000
50%       59.000000
75%       68.000000
max       91.400000
Name: age, dtype: float64



In [11]:

    
pull_df['sex'].value_counts(normalize=True)









    Out[11]:





male      0.722359
female    0.277641
Name: sex, dtype: float64

The age and sex columns are generated by analyzing the raw text provided by Open-i. These values are reasonably accurate, but mistakes are certainly possible.
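To give a feel for why text-derived values can be wrong, here is a deliberately simple sketch of pulling an age and sex out of free text with regular expressions. It is not the algorithm BioVida uses. The pattern, the word lists and the function names are all assumptions chosen for illustration.

```python
import re

# Matches phrases such as '48-year-old' or '62 year old'.
AGE_PATTERN = re.compile(r"(\d+(?:\.\d+)?)[\s-]*year[\s-]*old", re.IGNORECASE)
MALE_WORDS = {"man", "male", "boy", "he", "his"}
FEMALE_WORDS = {"woman", "female", "girl", "she", "her"}


def guess_age(text):
    """Return the first age mentioned in the text, or None."""
    match = AGE_PATTERN.search(text)
    return float(match.group(1)) if match else None


def guess_sex(text):
    """Return 'male' or 'female', or None if evidence is absent or mixed."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    is_male = bool(words & MALE_WORDS)
    is_female = bool(words & FEMALE_WORDS)
    if is_male == is_female:
        return None  # no evidence, or conflicting evidence
    return "male" if is_male else "female"


caption = "A 48-year-old man presented with a persistent cough."
print(guess_age(caption), guess_sex(caption))  # 48.0 male
```

A heuristic like this fails whenever the text mentions more than one person or an age that is not the patient's, which is the kind of error the caveat above warns about.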

It should also be mentioned that opi.records_db only contains data for the most recent search() and pull(). Conversely, cache_records_db provides a more complete account of all images in the cache, e.g., those obtained several sessions ago. Additionally, unlike opi.records_db, cache_records_db can contain duplicate rows. However, this is only allowed to occur if the queries that generated the rows are different.
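The duplicate rule described above (a row may repeat only if the query that produced it differs) can be sketched with plain dictionaries. The key names image_id and query, and the use of a string to stand in for the query, are assumptions for this example rather than BioVida's actual column layout.

```python
def add_to_cache(cache_rows, new_rows):
    """Append new_rows to cache_rows, skipping repeats of (image_id, query)."""
    seen = {(row["image_id"], row["query"]) for row in cache_rows}
    for row in new_rows:
        key = (row["image_id"], row["query"])
        if key not in seen:
            cache_rows.append(row)
            seen.add(key)
    return cache_rows


cache = [{"image_id": "img_1", "query": "lung cancer"}]
new = [
    {"image_id": "img_1", "query": "lung cancer"},  # same image, same query: skipped
    {"image_id": "img_1", "query": "pneumonia"},    # same image, new query: kept
]
cache = add_to_cache(cache, new)
print(len(cache))  # 2
```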

Images

Now that we've explored obtaining and reviewing data, we can finally turn our attention to the images themselves.



In [12]:

    
from utils import show_image
%matplotlib inline

Note: utils is a small script with some helpful functions located in the base of this directory.

Using the show_image function imported above, we can now look at a random image from those we pulled in the step above.



In [13]:

    
# show_image(opi.records_db['cached_images_path'].iloc[156])



In [14]:

    
opi.records_db['license_type'].iloc[156]









    Out[14]:





'open-access'

Let's also look at the age and sex of this subject.



In [15]:

    
age_sex = opi.records_db['age'].iloc[156], opi.records_db['sex'].iloc[156]
print("age: {0}, sex: {1}.".format(*age_sex))









    



age: 48.0, sex: male.

We can also easily check their diagnosis:



In [16]:

    
opi.records_db['diagnosis'].iloc[156]









    Out[16]:





'carcinoma; neurofibromatosis'

Please be advised that for collections other than 'MedPix'*, such as PubMed, diagnosis information is obtained by analyzing the text associated with the image. Errors are possible.

*MedPix explicitly provides diagnosis information, so it can be assumed to be accurate.

Automated Cleaning of Image Data (Experimental)

While the data may look OK so far, if we look more closely we will likely find several problems with the images we have downloaded.



In [17]:

    
# show_image(opi.records_db['cached_images_path'].iloc[100])



In [18]:

    
# show_image(opi.records_db['cached_images_path'].iloc[10])

The images above contain several clear problems. They both contain arrows, and the latter is actually a 'grid' of images. These are liable to confuse any model we attempt to train to detect disease. We could manually go through and remove these images or, alternatively, we can use the experimental OpeniImageProcessing class to try to eliminate them from our dataset automatically.



In [19]:

    
from biovida.images import OpeniImageProcessing

We initialize this class using our OpeniInterface instance. By default, it will extract the records_db DataFrame. Do note, however, that we can force it to extract the cache_records_db DataFrame by setting db_to_extract to 'cache_records_db'.



In [20]:

    
ip = OpeniImageProcessing(opi)

OpeniImageProcessing will automatically download a model for a Convolutional Neural Network (convnet) which has been trained to detect these kinds of problems. If you are unfamiliar with these kinds of models, you can read more about them here.

The OpeniImageProcessing class tries to detect problems in the images both by analyzing the text each image is associated with and by feeding the image through the convnet mentioned above. However, by default, the class will only use predictions from this model if it has been explicitly trained on images of that imaging modality.

We can easily check the modalities for which the model has been trained:



In [21]:

    
ip.trained_open_i_modality_types









    Out[21]:





['ct', 'mri', 'x_ray']

Luckily, we're working with X-rays and CTs.
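The gating rule described above amounts to a membership test. The sketch below is only an illustration. The function name use_convnet_prediction is hypothetical, and the list is copied from the output of trained_open_i_modality_types shown above.

```python
TRAINED_MODALITIES = ['ct', 'mri', 'x_ray']  # copied from Out[21] above


def use_convnet_prediction(modality, trained=TRAINED_MODALITIES):
    """Trust the convnet only for modalities it was trained on."""
    return modality in trained


for modality in ('x_ray', 'ct', 'ultrasound'):
    print(modality, use_convnet_prediction(modality))
```

Under this rule, an ultrasound image would still be checked using its associated text, but the convnet's output for it would be ignored.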

Now we're ready to analyze our images.



In [22]:

    
analysis_df = ip.auto()



In [23]:

    
simplify_df(analysis_df)









    Out[23]:






  
    
      
| | mesh_major | mesh_minor | problems | abstract | affiliate | article_type | authors | cc_license | detailed_query_url | doc_source | ... | grayscale | medpix_logo_bounding_box | hbar | hborder | vborder | upper_crop | lower_crop | visual_image_problems | invalid_image | invalid_image_reasons |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | | lung cancer | Lung cancer is one of the leading causes of ca... | Department of Nuclear Medicine and Molecular I... | other | Purandare NC, Rangarajan V | byncsa | https://openi.nlm.nih.gov/retrieve.php?img=PMC... | PMC | ... | False | | | | | | | [(grids, 0.864566), (arrows, 0.00571598), (tex... | True | (grayscale, image_problems_from_text, visual_i... |
| 1 | | | nsclc | Non-small cell lung cancer (NSCLC) accounts fo... | Cardiopulmonary Department, Sant'Andrea Hospit... | research_article | Pezzuto A, Piraino A, Mariotta S | by | https://openi.nlm.nih.gov/retrieve.php?img=PMC... | PMC | ... | True | | | | | | | [(valid_image, 0.858156), (text, 0.100862), (a... | False | |
| 2 | (Adrenal Insufficiency/diagnosis*/drug therapy... | (Acute Disease, Adrenal Cortex Hormones/therap... | hypoxia; tachycardia | Background: Adrenal crisis after surgical pro... | Department of Orthopedic Surgery, Osaka Medica... | research_article | Naka N, Takenaka S, Nanno K, Moriguchi Y, Chun... | by | https://openi.nlm.nih.gov/retrieve.php?img=PMC... | PMC | ... | True | | 327 | | | | 327 | [(valid_image, 0.914723), (arrows, 0.0369132),... | True | (image_problems_from_text,) |
| 3 | | | small cell carcinoma of lung | Subcutaneous swelling as first clinical presen... | Department of Medicine. | case_report | Kumar S, Gupta A, Diwan SK, Bhake A | | https://openi.nlm.nih.gov/retrieve.php?img=PMC... | PMC | ... | False | | 348 | (4, 351) | (191, 491) | 4 | 348 | [(arrows, 0.998476), (grids, 0.000149499), (va... | True | (grayscale, visual_image_problems) |
| 4 | (Echocardiography/methods*, Lung/ultrasonograp... | (Aged, Humans, Incidental Findings, Male) | lung tumor | We present images of a rare case where a prima... | Department of Clinical Physiology and Nuclear ... | research_article | Dencker M, Cronberg C, Damm S, Valind S, Wadbo M | by | https://openi.nlm.nih.gov/retrieve.php?img=PMC... | PMC | ... | False | | 294 | | (96, 487) | | 294 | [(grids, 0.993127), (arrows, 8.44916e-06), (va... | True | (grayscale, image_problems_from_text, visual_i... |

5 rows × 65 columns

This will generate several new columns:

'grayscale': this is simply an account of whether or not the image is grayscale.
'medpix_logo_bounding_box': images from the MedPix collection typically contain the organization's logo in the top right corner. Had we passed the class images from MedPix, it would have tried to 'draw' a bounding box around the logo's precise location (enabling it to be cropped out of the image).
'hbar': this denotes a 'horizontal bar' that is sometimes found at the bottom of images. If present, this column reports its height in pixels.
'hborder': this column provides an account of 'horizontal borders' on either side of the image.
'vborder': this column provides an account of 'vertical borders' on the top and bottom of the image.
'upper_crop': this is the location that has been selected to crop the top of the image. This decision is made by considering the 'medpix_logo_bounding_box' and 'vborder' columns.
'lower_crop': this is the location that has been selected to crop the bottom of the image. This decision is made by considering the 'hbar' and 'hborder' columns.
'visual_image_problems': this column contains the output of the convnet model, with the number following each label representing the probability that the image belongs to that category.
'invalid_image': this is a decision as to whether or not the image is invalid, e.g., has an arrow. This decision is made using the 'grayscale' and 'visual_image_problems' columns as well as the text associated with the image ('image_problems_from_text').
'invalid_image_reasons': in cases where the 'invalid_image' column is True, this column provides an account of why that decision was made.
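The way these columns might combine into an 'invalid_image' decision can be sketched as follows. This is an illustrative guess, not the class's actual logic. The function name, the 0.5 threshold and the rule of looking only at the most probable convnet label are assumptions. The inputs are the visible parts of the first four rows of Out[23] (the truncated tails of each list are omitted).

```python
def invalid_image_reasons(grayscale, text_problems, visual_problems, threshold=0.5):
    """Return a tuple of reasons for rejecting an image (empty if it looks valid)."""
    reasons = []
    if not grayscale:
        reasons.append("grayscale")  # colour images are treated as suspect
    if text_problems:
        reasons.append("image_problems_from_text")
    # visual_problems is a list of (label, probability) pairs from the convnet.
    top_label, top_prob = max(visual_problems, key=lambda pair: pair[1])
    if top_label != "valid_image" and top_prob >= threshold:
        reasons.append("visual_image_problems")
    return tuple(reasons)


rows = [
    (False, ("arrows", "grids"), [("grids", 0.864566), ("arrows", 0.00571598)]),
    (True, (), [("valid_image", 0.858156), ("text", 0.100862)]),
    (True, ("asterisks",), [("valid_image", 0.914723), ("arrows", 0.0369132)]),
    (False, (), [("arrows", 0.998476), ("grids", 0.000149499)]),
]
for grayscale, text_problems, visual_problems in rows:
    reasons = invalid_image_reasons(grayscale, text_problems, visual_problems)
    print(bool(reasons), reasons)
```

For these four rows the sketch agrees with the visible parts of the 'invalid_image' and 'invalid_image_reasons' values shown in the table, though the real class may weigh the evidence differently.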

We can use this analysis to construct a new dataframe, with invalid images removed and the remaining images cropped so that problematic features are removed.
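As a minimal sketch of the cropping step, assume that 'upper_crop' and 'lower_crop' are pixel row indices and that a missing value means no crop on that side. Neither assumption is confirmed by the text above, and crop_rows is a hypothetical helper. A nested list stands in for the image, though the same slice would work on a NumPy array.

```python
def crop_rows(image, upper_crop=None, lower_crop=None):
    """Keep only the pixel rows between upper_crop and lower_crop."""
    # None on either side leaves that edge of the image untouched.
    return image[upper_crop:lower_crop]


# A stand-in 400 x 4 'image'; values taken from row 3 of Out[23].
image = [[0] * 4 for _ in range(400)]
cropped = crop_rows(image, upper_crop=4, lower_crop=348)
print(len(cropped))  # 344 rows remain
```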



In [24]:

    
ip.clean_image_dataframe()

This 'cleaned' set should have fewer instances of problematic images.

Here's a random image from this new set:



In [25]:

    
# show_image(ip.image_dataframe_cleaned['cleaned_image'].iloc[180])

With time, the machinery used to detect these kinds of problems, particularly the convolutional neural network, will be improved. However, at the current time, this class is still considered to be very experimental.

Train, Validation and Test

Now that we've explored data harvesting, we can turn our attention to the final step before modeling: dividing data into training, validation and/or test sets.

Let's use images from the Indiana University Chest X-Ray collection* ('indiana_u_xray'). This set of images has been assembled 'by hand', and thus does not require complicated image cleaning procedures.
*License; images have not been modified.



In [26]:

    
opi.search(collection='indiana_u_xray')









    



Results Found: 7,470.

Let's go ahead and download this entire collection.
Please be advised that this will take some time, so feel free to adjust download_limit to suit your needs.



In [27]:

    
pull_df2 = opi.pull(download_limit=None)









    



Number of Records to Download: 7,470 (chunk size: 30 records).

Let's quickly inspect this newly downloaded data.



In [28]:

    
simplify_df(opi.records_db_short)









    Out[28]:






  
    
      
| | mesh_major | mesh_minor | problems | abstract | affiliate | article_type | authors | cc_license | doc_source | image_caption | ... | image_problems_from_text | imaging_modality_from_text | parsed_abstract | sex | modality_full | image_id_short | query | pull_time | cached_images_path | download_success |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (Calcified Granuloma/lung/upper lobe/right,) | | calcified granuloma | Comparison: Chest radiographs XXXX. Indicatio... | Indiana University | radiology_report | Kohli MD, Rosenman M | byncnd | CXR | PA and lateral chest x-XXXX XXXX. | ... | | Computed Tomography (CT): chest | {'impression': 'No acute cardiopulmonary proce... | male | X-Ray | 1 | {'subset': None, 'rankby': None, 'collection':... | 2017-04-10 15:30:34.796070 | /path/to/image | True |
| 1 | (Calcified Granuloma/lung/upper lobe/right,) | | calcified granuloma | Comparison: Chest radiographs XXXX. Indicatio... | Indiana University | radiology_report | Kohli MD, Rosenman M | byncnd | CXR | PA and lateral chest x-XXXX XXXX. | ... | | Computed Tomography (CT): chest | {'impression': 'No acute cardiopulmonary proce... | male | X-Ray | 2 | {'subset': None, 'rankby': None, 'collection':... | 2017-04-10 15:30:34.796070 | /path/to/image | True |
| 2 | (normal,) | | normal | Comparison: None. Indication: Positive TB tes... | Indiana University | radiology_report | Kohli MD, Rosenman M | byncnd | CXR | Xray Chest PA and Lateral | ... | | Computed Tomography (CT): chest | {'impression': 'Normal chest x-XXXX.', 'compar... | | X-Ray | 1 | {'subset': None, 'rankby': None, 'collection':... | 2017-04-10 15:30:34.796070 | /path/to/image | True |
| 3 | (normal,) | | normal | Comparison: None. Indication: Positive TB tes... | Indiana University | radiology_report | Kohli MD, Rosenman M | byncnd | CXR | Xray Chest PA and Lateral | ... | | Computed Tomography (CT): chest | {'impression': 'Normal chest x-XXXX.', 'compar... | | X-Ray | 2 | {'subset': None, 'rankby': None, 'collection':... | 2017-04-10 15:30:34.796070 | /path/to/image | True |
| 4 | (Markings/lung/bilateral/interstitial/diffuse/... | | markings; fibrosis | Comparison: None. Indication: dyspnea, subjec... | Indiana University | radiology_report | Kohli MD, Rosenman M | byncnd | CXR | CHEST 2V FRONTAL/LATERAL XXXX, XXXX XXXX PM | ... | | Computed Tomography (CT): chest | {'impression': 'Diffuse fibrosis. No visible f... | | X-Ray | 1 | {'subset': None, 'rankby': None, 'collection':... | 2017-04-10 15:30:34.796070 | /path/to/image | True |

5 rows × 37 columns

We can easily select a subset of these ~7000 images and divide them into training and test sets for some machine learning model using the image_divvy() tool.



In [29]:

    
from biovida.images import image_divvy

Let's imagine we're interested in building a model capable of distinguishing between 'normal' chest X-rays and those with signs of problematic calcium deposits, a disease formally known as 'calcinosis'.

We can define a rule to construct such a training and test set using a 'divvy_rule'. This rule tells image_divvy() how to 'divvy up', that is, how to categorize, the images in the cache.



In [30]:

    
def my_divvy_rule(row):
    if isinstance(row['diagnosis'], str):
        if 'normal' in row['diagnosis']:
            return 'normal'  # though this could be anything, e.g., 'super cool normal images'.
        elif 'calcinosis' in row['diagnosis']:
            return 'calcinosis'
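To see what such a rule produces, it can be applied to a few stand-in rows. The rows below are made-up dictionaries rather than real records, and the rule is repeated so the snippet runs on its own. Rows for which the rule returns None are assumed here to be left uncategorized.

```python
def my_divvy_rule(row):
    if isinstance(row['diagnosis'], str):
        if 'normal' in row['diagnosis']:
            return 'normal'
        elif 'calcinosis' in row['diagnosis']:
            return 'calcinosis'


rows = [
    {'diagnosis': 'normal'},
    {'diagnosis': 'calcinosis; granuloma'},  # made-up example value
    {'diagnosis': 'markings; fibrosis'},     # matches neither category
    {'diagnosis': float('nan')},             # missing diagnosis (not a string)
]
labels = [my_divvy_rule(row) for row in rows]
print(labels)  # ['normal', 'calcinosis', None, None]

# Caution: substring matching also fires on words that merely contain 'normal'.
print(my_divvy_rule({'diagnosis': 'abnormal'}))  # 'normal'
```

The last line shows a limitation of plain substring tests. If diagnoses such as 'abnormal' can occur in the data, an exact comparison or a word-boundary match may be a safer choice.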

Now that image_divvy() knows how we would like it to categorize the data we've downloaded, we can also pass it a dictionary specifying how to 'split' the data into training and testing sets. In this example, we'll use a standard 80% train, 20% test split and ask the function to return NumPy arrays (ndarrays) as output.
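The effect of a dictionary such as {'train': 0.8, 'test': 0.2} can be illustrated with a generic shuffle-and-slice sketch. This is not how image_divvy() is implemented. The function name split_items, the fixed seed and the choice to give any rounding remainder to the last group are assumptions.

```python
import random


def split_items(items, train_val_test_dict, seed=0):
    """Shuffle items, then slice them according to the given proportions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    names = list(train_val_test_dict)
    splits, start = {}, 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            stop = len(shuffled)  # the last group takes whatever remains
        else:
            stop = start + int(len(shuffled) * train_val_test_dict[name])
        splits[name] = shuffled[start:stop]
        start = stop
    return splits


splits = split_items(range(1000), {'train': 0.8, 'test': 0.2})
print(len(splits['train']), len(splits['test']))  # 800 200
```

Shuffling before slicing matters because records often arrive in a meaningful order, and an unshuffled split could place all images from one source in the test set.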



In [31]:

    
train_test = image_divvy(instance=opi,
                         divvy_rule=my_divvy_rule,
                         db_to_extract='records_db',
                         action='ndarray',
                         train_val_test_dict={'train': 0.8, 'test': 0.2})









    





 
 










    









    





 
 










    



Structure:

- 'train':
  - 'calcinosis'
  - 'normal'
- 'test':
  - 'calcinosis'
  - 'normal'

Before signing off, image_divvy() printed the structure of the nested dictionary it returned.
We can use this information to unpack the arrays nested within this data structure:



In [32]:

    
train_ca, test_ca = train_test['train']['calcinosis'], train_test['test']['calcinosis']
train_norm, test_norm = train_test['train']['normal'], train_test['test']['normal']
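Most modeling libraries expect one collection of samples and a parallel collection of labels rather than a nested dictionary. The sketch below shows one way to flatten a split. The helper flatten_split and the stand-in data (strings in place of image arrays) are illustrative only.

```python
def flatten_split(split):
    """Turn {'label': [items, ...]} into parallel (items, labels) lists."""
    items, labels = [], []
    for label in sorted(split):
        for item in split[label]:
            items.append(item)
            labels.append(label)
    return items, labels


# Stand-in with the same nesting as the structure printed above.
train_test_demo = {
    'train': {'calcinosis': ['ca_0', 'ca_1'], 'normal': ['n_0', 'n_1', 'n_2']},
    'test': {'calcinosis': ['ca_2'], 'normal': ['n_3']},
}
x_train, y_train = flatten_split(train_test_demo['train'])
print(x_train)  # ['ca_0', 'ca_1', 'n_0', 'n_1', 'n_2']
print(y_train)  # ['calcinosis', 'calcinosis', 'normal', 'normal', 'normal']
```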

Now that our data has been neatly unpacked, we can look at the number of samples the procedure generated.



In [33]:

    
# Normal
print("Train:", len(train_norm), "|", "Test:", len(test_norm))









    



Train: 2156 | Test: 540



In [34]:

    
# Calcinosis
print("Train:", len(train_ca), "|", "Test:", len(test_ca))









    



Train: 446 | Test: 112

Using the show_image() tool we imported above, we can take a quick look at an image from each category.



In [35]:

    
# Normal
# show_image(train_norm[99])



In [36]:

    
# Calcinosis
# show_image(train_ca[104])

Conclusion

Here we've explored how BioVida can be used to easily obtain and process data from the Open-i database.

In the next tutorial, we'll see how BioVida can be used to gain access to a database with orders of magnitude more images.
