This tutorial will cover the facilities BioVida offers to:
integrate image data with other kinds of biomedical data
manage cached resources.
While primarily focused on image data, BioVida also provides interfaces to other types of biomedical data, namely medical diagnostics and genomics data. This section shows how one or several image interfaces can be unified into a single DataFrame, complete with data from these additional sources.
We can start by collecting some data.
In [1]:
from biovida.images import OpeniInterface
opi = OpeniInterface()
opi.search(query='lung cancer')
pull_df1 = opi.pull()
Let's also get some data from the Cancer Imaging Archive.
In [2]:
from biovida.images import CancerImageInterface
cii = CancerImageInterface(api_key='YOUR_API_KEY_HERE')  # replace with your own API key
cii.search(cancer_type='lung')
pull_df2 = cii.pull(collections_limit=1) # only download the first collection/study
Next, we import the tool we will use to unify the data.
In [3]:
from biovida.unification import unify_against_images
In [4]:
unified_df = unify_against_images(instances=[opi, cii])
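The internals of unify_against_images are not covered in this tutorial. As a rough mental model only, the following toy sketch (plain pandas, not BioVida code) stacks the results of several sources into one DataFrame, tags each row with its origin, and left-joins external data on a shared key. The helper toy_unify and the columns disease, source_api and known_symptoms are hypothetical.

```python
import pandas as pd

# Hypothetical per-source results; real pull() DataFrames contain many more columns.
openi_df = pd.DataFrame({
    'article_title': ['Case A', 'Case B'],
    'disease': ['lung cancer', 'pneumonia'],
})
tcia_df = pd.DataFrame({
    'study_uid': ['1.2.3'],
    'disease': ['lung cancer'],
})

# Hypothetical external table (e.g., diagnostic data) keyed on disease.
disease_info = pd.DataFrame({
    'disease': ['lung cancer'],
    'known_symptoms': [['cough', 'hemoptysis']],
})


def toy_unify(frames, external):
    """Stack per-source frames, tag each row with its source name,
    then left-join the external data on the shared 'disease' column."""
    tagged = [df.assign(source_api=name) for name, df in frames.items()]
    stacked = pd.concat(tagged, ignore_index=True, sort=False)
    return stacked.merge(external, on='disease', how='left')


unified = toy_unify({'openi': openi_df, 'tcia': tcia_df}, disease_info)
print(unified[['source_api', 'disease', 'known_symptoms']])
```

A left join keeps every image row even when no external record matches, which is why the pneumonia row ends up with a missing value in known_symptoms.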
In [5]:
import numpy as np

def simplify_df(df):
    """This function simplifies dataframes
    for the purposes of this tutorial."""
    data_frame = df.copy()
    # Replace long file paths with a short placeholder for display.
    for c in ('source_images_path', 'cached_images_path'):
        data_frame[c] = 'path_to_image'
    # Show missing values as empty strings rather than NaN.
    return data_frame.replace({np.nan: ''})
To close out this section, we can take a quick look at the resultant DataFrame.
In [6]:
simplify_df(unified_df)[85:90]
Note: the 'mentioned_symptoms' column lists the symptoms known to be associated with the disease that were mentioned in the article.
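One plausible way to build a column like 'mentioned_symptoms' is to search each article's text for the symptoms known to be associated with its disease. This sketch assumes case-insensitive, whole-word matching; find_mentioned_symptoms is a hypothetical helper, not part of BioVida's API, and BioVida's own matching may differ.

```python
import re


def find_mentioned_symptoms(text, known_symptoms):
    """Return the known symptoms that appear in `text` as whole words or
    phrases (case-insensitive), in the order they are listed."""
    if not isinstance(text, str):
        return []  # missing abstracts (e.g., NaN/None) mention nothing
    found = []
    for symptom in known_symptoms:
        # \b anchors stop 'cough' from matching inside 'coughing'.
        pattern = r'\b' + re.escape(symptom) + r'\b'
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(symptom)
    return found


abstract = "The patient presented with a persistent Cough and weight loss."
print(find_mentioned_symptoms(abstract, ['cough', 'hemoptysis', 'weight loss']))
```

Whether whole-word matching is the right choice depends on the use case; it trades recall (missing "coughing") for fewer accidental substring hits.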
This section is intended to provide a brief overview of the ways in which data downloaded with BioVida can be removed from your computer.
1. The simplest way to delete BioVida data is to manually delete the biovida_cache folder, or some portion of the files (e.g., images) contained within it. Both OpeniInterface and CancerImageInterface check for deleted files each time they are instantiated.
2. While the first approach is straightforward, it is neither elegant nor precise. For situations that require more finesse, we can employ the image_delete tool.
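Returning briefly to the first approach: the check each interface performs on instantiation can be pictured as dropping cache records whose files no longer exist. This is an illustration under assumptions, not BioVida's implementation. drop_missing_records is hypothetical, and it assumes each record stores either a single path or a tuple of paths under 'cached_images_path', keeping a record only if all of its files are still present.

```python
import os
import tempfile


def drop_missing_records(records, path_key='cached_images_path'):
    """Keep only records whose cached file(s) all still exist on disk."""
    kept = []
    for record in records:
        paths = record[path_key]
        if isinstance(paths, str):
            paths = (paths,)  # normalise a single path to a tuple
        if all(os.path.isfile(p) for p in paths):
            kept.append(record)
    return kept


# Demo: one record points at a real file, the other at a file that was never created.
with tempfile.TemporaryDirectory() as cache_dir:
    present = os.path.join(cache_dir, 'present.png')
    open(present, 'wb').close()  # create an empty placeholder file
    records = [
        {'uid': 1, 'cached_images_path': present},
        {'uid': 2, 'cached_images_path': os.path.join(cache_dir, 'deleted.png')},
    ]
    kept = drop_missing_records(records)

print([r['uid'] for r in kept])
```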
In [7]:
from biovida.images import image_delete
Next, we define a function that tells image_delete which rows to delete.
In [8]:
def my_delete_rule(row):
    # Flag rows whose abstract mentions 'proteins' (case-insensitive).
    if isinstance(row['abstract'], str) and 'proteins' in row['abstract'].lower():
        return True
    return False
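Because image_delete removes images as well as rows, it can be worth previewing which rows a rule would flag first. This sketch applies the rule row by row to a small, hypothetical stand-in DataFrame using pandas' DataFrame.apply(..., axis=1). It assumes image_delete likewise passes each row to the rule with column access by name, as row['abstract'] suggests; the rule is repeated so the block runs on its own.

```python
import pandas as pd


def my_delete_rule(row):
    # Flag rows whose abstract mentions 'proteins' (case-insensitive).
    return isinstance(row['abstract'], str) and 'proteins' in row['abstract'].lower()


# Hypothetical stand-in for the records of a real pull.
records = pd.DataFrame({
    'uid': [1, 2, 3],
    'abstract': ['Serum Proteins in NSCLC', 'Chest CT findings', None],
})

# axis=1 passes each row to the rule as a Series indexed by column name.
to_delete = records[records.apply(my_delete_rule, axis=1)]
print(to_delete['uid'].tolist())
```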
In this example, we'll use the instance of OpeniInterface created above.
In [9]:
deleted_rows = image_delete(opi, delete_rule=my_delete_rule, only_recent=True)
This deletes not only the matching rows but also any images associated with them. Therefore, as a precaution, you will be asked to confirm the action before it is performed.
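The confirmation step follows a common pattern: describe what is about to happen and proceed only on an explicit yes. confirm_deletion is a hypothetical helper, not BioVida's actual prompt; taking the input function as a parameter makes the pattern easy to test without a live prompt.

```python
def confirm_deletion(n_rows, ask=input):
    """Ask before deleting; anything other than an explicit yes cancels."""
    answer = ask(f"{n_rows} row(s) and their images will be deleted. Proceed? [y/N] ")
    return answer.strip().lower() in {'y', 'yes'}


# Usage with a scripted answer instead of a live prompt:
print(confirm_deletion(3, ask=lambda prompt: 'yes'))
```

Defaulting to "no" on empty or unexpected input means an accidental Enter key press never deletes anything.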
Warning:
The default behavior of image_delete is to delete any rows for which your 'delete_rule' returns True, including those in cache_records_db which were not downloaded in the most recent pull(). The only_recent parameter can be used to limit deletion to data obtained in the most recent pull, as shown above.
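To make the effect of only_recent concrete, this toy sketch scopes a rule to the rows of the latest pull before applying it. It is not BioVida's implementation: rows_to_delete and the pull_time column are hypothetical, and it assumes each cached record carries the time of the pull that produced it.

```python
import pandas as pd


def rows_to_delete(records, rule, only_recent=False, pull_col='pull_time'):
    """Return the rows `rule` flags, optionally limited to the latest pull."""
    candidates = records
    if only_recent:
        candidates = records[records[pull_col] == records[pull_col].max()]
    if candidates.empty:
        return candidates
    # .eq(True) treats any non-True result (e.g., None) as "keep".
    mask = candidates.apply(rule, axis=1).eq(True)
    return candidates[mask]


records = pd.DataFrame({
    'uid': [1, 2, 3],
    'abstract': ['Proteins in older pull', 'Proteins in latest pull', 'Unrelated'],
    'pull_time': pd.to_datetime(['2017-01-01', '2017-06-01', '2017-06-01']),
})


def rule(row):
    return 'proteins' in str(row['abstract']).lower()


print(rows_to_delete(records, rule)['uid'].tolist())
print(rows_to_delete(records, rule, only_recent=True)['uid'].tolist())
```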
In this tutorial we have reviewed how to unify images obtained with BioVida both with each other and against external biomedical databases. Additionally, we have explored methods for deleting downloaded data.