Overview:
Annotations represent a significant time investment for the users who generate
them, and they should be backed up frequently. The simplest way to back up the
annotations in a DSA database is to perform a mongodump operation. While
frequent mongodump operations are always important to guard against failures,
they also have disadvantages. The approach described here backs up the
annotations in a girder folder as
.json files (most similar to the raw format),
tabular files (.csv), and/or an SQLite database. The SQLite database can easily be viewed using, for example, an offline SQLite viewer or even an online SQLite viewer.
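As a quick alternative to a graphical viewer, the tables in an SQLite file can also be listed from Python. The following is a minimal sketch using only the standard library. The helper name `list_tables` and the in-memory demo tables are illustrative assumptions and are not part of HistomicsTK; with a real backup you would connect to the path of the .sqlite file instead of `":memory:"`.

```python
import sqlite3


def list_tables(con):
    """Return the names of all tables visible through an open SQLite connection."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Demo on a throwaway in-memory database (hypothetical, simplified schema)
demo_con = sqlite3.connect(":memory:")
demo_con.execute('CREATE TABLE "folders" ("_id" TEXT, "name" TEXT)')
demo_con.execute('CREATE TABLE "items" ("_id" TEXT, "name" TEXT, "folderId" TEXT)')
tables = list_tables(demo_con)
print(tables)
demo_con.close()
```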
Where to look:
|_histomicstk/
   |_annotations_and_masks/
   |  |_annotation_database_parser.py
   |  |_annotation_and_mask_utils.py -> parse_slide_annotations_into_tables()
   |_tests/
      |_test_annotation_database_parser.py
      |_test_annotation_and_mask_utils.py -> test_parse_slide_annotations_into_table()
In [1]:
import os
import pandas as pd
import sqlalchemy as db
from histomicstk.utils.girder_convenience_utils import connect_to_api
from histomicstk.annotations_and_masks.annotation_database_parser import (
    dump_annotations_locally, parse_annotations_to_local_tables)
In [2]:
gc = connect_to_api(
    apiurl='http://candygram.neurology.emory.edu:8080/api/v1/',
    apikey='kri19nTIGOkWH01TbzRqfohaaDWb6kPecRqGmemb')
# This is the girder ID of the folder we would like to back up and parse locally
SAMPLE_FOLDER_ID = "5e24c20dddda5f8398695671"
# This is where the annotations and sqlite database will be dumped locally
savepath = '/home/mtageld/Desktop/tmp/concordance/'
This is the main function you will use to walk the folder and pull the annotations from the remote server:
In [3]:
print(dump_annotations_locally.__doc__)
It optionally calls the following function to parse the annotations into tables that are added to an SQLite database:
In [4]:
print(parse_annotations_to_local_tables.__doc__)
The simplest case is to back up the information about the girder folders, items, and annotations as .json files, with the folder structure replicated locally as it is in the girder database. The user may also elect to save the folder and item/slide information (but not the annotations) as the following tables in an SQLite database:
folders: all girder folders contained within the folder that the user wants to back up. This includes an 'absolute girder path' convenience column. The column '_id' is the unique girder ID.
items: all items (slides). The column '_id' is the unique girder ID, and each item is linked to the folders table by the 'folderId' column.
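To illustrate how the two tables relate, here is a minimal sketch that joins items to their parent folders through the 'folderId' column. It runs against a small in-memory database whose rows (IDs, slide names, paths) are made up for illustration, and the schema is simplified to the columns mentioned above; the real tables contain more columns.

```python
import sqlite3

# Toy stand-in for the backed-up database (hypothetical rows, simplified schema)
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE "folders" ("_id" TEXT PRIMARY KEY, "name" TEXT, "folder_path" TEXT);
CREATE TABLE "items" ("_id" TEXT PRIMARY KEY, "name" TEXT, "folderId" TEXT);
INSERT INTO "folders" VALUES ('f1', 'Concordance', 'Concordance/');
INSERT INTO "folders" VALUES ('f2', 'SLIDES', 'Concordance/SLIDES/');
INSERT INTO "items" VALUES ('i1', 'slide_A.svs', 'f2');
INSERT INTO "items" VALUES ('i2', 'slide_B.svs', 'f2');
""")

# Each item row points at its parent folder through "folderId"
rows = con.execute("""
    SELECT i."name", f."folder_path"
    FROM "items" AS i
    JOIN "folders" AS f ON i."folderId" = f."_id"
    ORDER BY i."name"
""").fetchall()
con.close()

for item_name, folder_path in rows:
    print(item_name, "->", folder_path)
```

The same SELECT statement should also work when passed to pd.read_sql_query together with the dbcon connection used in this notebook.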
In [5]:
# recursively save annotations -- JSONs + sqlite for folders/items
dump_annotations_locally(
    gc, folderid=SAMPLE_FOLDER_ID, local=savepath,
    save_json=True, save_sqlite=True)
In [6]:
!tree '/home/mtageld/Desktop/tmp/concordance/'
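The `tree` command may not be installed on every system. A portable way to inspect the dumped folder structure is sketched below with the standard library only. The helper name `list_dump` and the demo file names are hypothetical; with a real backup you would call `list_dump(savepath)`.

```python
import os
import tempfile


def list_dump(root):
    """Return the sorted relative paths of all files found under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            rel = os.path.relpath(os.path.join(dirpath, fname), root)
            # Normalize separators so the listing looks the same on all platforms
            found.append(rel.replace(os.sep, "/"))
    return sorted(found)


# Demo on a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "SLIDES"))
    for rel in ("Concordance.json", "SLIDES/slide_A.json"):
        with open(os.path.join(demo_root, *rel.split("/")), "w") as f:
            f.write("{}")
    demo_files = list_dump(demo_root)

print(demo_files)
```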
In [7]:
# Connect to the database
sql_engine = db.create_engine(
    'sqlite:///%s/Concordance.sqlite' % savepath)
dbcon = sql_engine.connect()
In [8]:
# folders table
folders_df = pd.read_sql_query(
    """
    SELECT "_id", "name", "folder_path"
    FROM "folders"
    ;""", dbcon)
folders_df
Out[8]:
In [9]:
# items table
items_df = pd.read_sql_query(
    """
    SELECT "_id", "name", "folderid"
    FROM "items"
    ;""", dbcon)
items_df
Out[9]:
In [10]:
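# cleanup: close the connection, then remove the local dump and
# recreate an empty folder for the next example
import shutil
dbcon.close()
shutil.rmtree(savepath)
os.mkdir(savepath)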
Beyond everything outlined above, we can also parse the annotations into tables in the SQLite database rather than only saving the raw JSON files. This is a little slower because it loops through each annotation element. Besides the tables above, the following extra tables are saved into the SQLite database:
annotation_docs: Information about all the annotation documents (one document is a collection of elements like polygons, rectangles, etc.). The column 'annotation_girder_id' is the unique girder ID, and is linked to the 'items' table by the 'itemid' column.
annotation_elements: Information about the annotation elements (polygons, rectangles, points, etc.). The column 'element_girder_id' is the unique girder ID, and is linked to the 'annotation_docs' table by the 'annotation_girder_id' column.
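The link between these two tables can be used to summarize elements per slide. The sketch below counts elements per group for each item by joining on 'annotation_girder_id'. It uses an in-memory database with made-up rows and group names, and only the columns named in this notebook; the real tables hold many more columns.

```python
import sqlite3

# Toy stand-in for the parsed tables (hypothetical rows, simplified schema)
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE "annotation_docs" (
    "annotation_girder_id" TEXT PRIMARY KEY, "itemId" TEXT, "item_name" TEXT);
CREATE TABLE "annotation_elements" (
    "element_girder_id" TEXT PRIMARY KEY, "annotation_girder_id" TEXT, "group" TEXT);
INSERT INTO "annotation_docs" VALUES ('a1', 'i1', 'slide_A.svs');
INSERT INTO "annotation_docs" VALUES ('a2', 'i2', 'slide_B.svs');
INSERT INTO "annotation_elements" VALUES ('e1', 'a1', 'tumor');
INSERT INTO "annotation_elements" VALUES ('e2', 'a1', 'tumor');
INSERT INTO "annotation_elements" VALUES ('e3', 'a1', 'stroma');
INSERT INTO "annotation_elements" VALUES ('e4', 'a2', 'tumor');
""")

# Count elements per (slide, group) by following element -> document
counts = con.execute("""
    SELECT d."item_name", e."group", count(*) AS n_elements
    FROM "annotation_elements" AS e
    JOIN "annotation_docs" AS d
      ON e."annotation_girder_id" = d."annotation_girder_id"
    GROUP BY d."item_name", e."group"
    ORDER BY d."item_name", e."group"
""").fetchall()
con.close()

for item_name, group, n_elements in counts:
    print(item_name, group, n_elements)
```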
In [11]:
# recursively save annotations -- parse sqlite
dump_annotations_locally(
    gc, folderid=SAMPLE_FOLDER_ID, local=savepath,
    save_json=False, save_sqlite=True,
    callback=parse_annotations_to_local_tables,
    callback_kwargs={
        'save_csv': False,
        'save_sqlite': True,
    }
)
In [12]:
!tree '/home/mtageld/Desktop/tmp/concordance/'
In [13]:
# Connect to the database
sql_engine = db.create_engine(
    'sqlite:///%s/Concordance.sqlite' % savepath)
dbcon = sql_engine.connect()
In [14]:
# annotation documents
docs_df = pd.read_sql_query(
    """
    SELECT "annotation_girder_id", "itemId", "item_name", "element_count"
    FROM "annotation_docs"
    ;""", dbcon)
docs_df.head()
Out[14]:
In [15]:
# annotation elements
elements_summary = pd.read_sql_query(
    """
    SELECT "group", count(*)
    FROM "annotation_elements"
    GROUP BY "group"
    ;""", dbcon)
elements_summary
Out[15]: