The goal of biobank is to maintain a consistent picture of physical and digital objects and of the chain of operations that connects them. The permanence of objects is guaranteed by a backend system, but what we describe here is, for the most part, the client-side point of view. In this notebook we cover the basic concepts of object creation and destruction, and how to represent transformations.
First of all we need to connect to the back-end. FIXME: in the actual example, it would make sense to have generic test accounts, rather than irgb :-)
In [1]:
import sys, os
from bl.vl.kb import KnowledgeBase

# Connection settings, overridable through environment variables
OME_HOST = os.getenv('OME_HOST', 'localhost')
OME_USER = os.getenv('OME_USER', 'test')
OME_PASSWD = os.getenv('OME_PASSWD', 'test')
CHECK_OME_VERSION = os.getenv('CHECK_OME_VERSION', "True") == "True"

BaseProxy = KnowledgeBase(driver='omero')

class Proxy(BaseProxy):
    def get_objects_dict(self, klass):
        # Map each object's label to the object itself
        return dict((o.label, o) for o in super(Proxy, self).get_objects(klass))

kb = Proxy(OME_HOST, OME_USER, OME_PASSWD, check_ome_version=CHECK_OME_VERSION)
kb.connect()
kb.start_keep_alive()

def cleanup():
    print "# disconnecting the kb"
    kb.disconnect()

# Disconnect when the interpreter exits (Python 2 hook)
sys.exitfunc = cleanup

print
print "### KB ENV PRELOADED ###"
print "# connected to %s" % OME_HOST
print "# knowledge base: kb"
print "# extra method: kb.get_objects_dict"
print "########################"
What the code above does is establish a connection to the omero.biobank server. To communicate with the server we will use kb, a proxy object.
Omero.biobank has models for multiple physical and digital objects and, if needed, it can easily be extended with new object models. One of the simplest models is DataSample. It represents a blob of digital data. In reality, one typically uses instances of more specific models derived from DataSample, and there are complexities related to the physical location of the data. However, for the time being, we will consider the simple, unadorned DataSample.
FIXME: omero.biobank models are proxied by python classes in the client. The API is based on providing access, via kb, to these proxy classes and to global functions that operate on them.
The first thing we will do is create a DataSample instance, which we will call data_sample. The creation is done using the create method of the kb factory (kb.factory). The method takes two parameters: the object class and a dict with the object's field assignments. All objects that correspond either to physical (FIXME: what about HardwareDevice?) or to digital data objects require the specification of a field called 'action'. Its purpose will be explained later. For the time being we will use 'kb.create_an_action()' to get a dummy placeholder.
To create a DataSample we need to assign a unique label and a status attribute, together with the action mentioned above.
In [2]:
action = kb.create_an_action()
data_sample = kb.factory.create(kb.DataSample,
                                conf = {'label': 'a foo label',
                                        'status': kb.DataSampleStatus.USABLE,
                                        'action': action})
In [3]:
data_sample.label, data_sample.status
Out[3]:
Now data_sample exists only in the client's RAM and we need to save it in omero.biobank. Invoking .is_mapped() tells us whether it exists in the db.
In [4]:
data_sample.is_mapped()
Out[4]:
In [5]:
data_sample.save()
Out[5]:
In [6]:
data_sample.is_mapped()
Out[6]:
All omero.biobank objects are uniquely identified by an id called a 'vid'. For instance,
In [7]:
data_sample.id
Out[7]:
Once objects have been created they can be destroyed.
In [8]:
kb.delete(action)
Well, they can be deleted, but only in the right order. Omero.biobank maintains referential integrity. Thus, we first have to delete data_sample and then action. FIXME: this will actually leave some debris behind, but explaining that would probably make things too complicated right now.
In [9]:
kb.delete(data_sample)
In [10]:
kb.delete(action)
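The ordering constraint can be illustrated without a server. The following is a minimal sketch, assuming a hypothetical in-memory `ToyStore` class (it is not part of omero.biobank and says nothing about how the backend enforces integrity): an object cannot be deleted while another object still references it.

```python
class ToyStore(object):
    """Hypothetical in-memory store; NOT the omero.biobank backend."""

    def __init__(self):
        self._refs = {}  # object name -> set of names it references

    def add(self, name, references=()):
        # Every referenced object must already be in the store.
        for target in references:
            if target not in self._refs:
                raise KeyError("unknown reference: %s" % target)
        self._refs[name] = set(references)

    def delete(self, name):
        # Refuse to delete an object that another object still points to.
        holders = sorted(n for n, refs in self._refs.items() if name in refs)
        if holders:
            raise ValueError("%s is still referenced by %s"
                             % (name, ", ".join(holders)))
        del self._refs[name]

    def names(self):
        return sorted(self._refs)


store = ToyStore()
store.add("action")
store.add("data_sample", references=["action"])

# Deleting the action first is rejected, because data_sample points to it.
try:
    store.delete("action")
    deleted_too_early = True
except ValueError:
    deleted_too_early = False

# Deleting in dependency order succeeds.
store.delete("data_sample")
store.delete("action")
```

In this toy model the first `delete("action")` raises, and the store is empty only after the two deletions are issued in the right order.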
Omero.biobank handles collections by defining a new object -- an instance of a class derived from VLCollection, the abstract base class -- and then associating specific objects with it using instances of a support class. FIXME: this is not the most elegant thing that can be done....
Consider, for instance, a collection of DataSamples. The concrete collection class is DataCollection and the way we would build the collection is as follows.
In [11]:
action = kb.create_an_action()
samples = [kb.factory.create(kb.DataSample,
                             {'label': "foodata_{}".format(i),
                              'status': kb.DataSampleStatus.USABLE,
                              'action': action}).save()
           for i in range(3)]
In [12]:
data_collection = kb.factory.create(kb.DataCollection,
                                    {'label': 'a collection', 'action': action}).save()
In [13]:
dcis = [kb.factory.create(kb.DataCollectionItem,
                          {'dataCollection' : data_collection, 'dataSample' : s}).save()
        for s in samples]
In [14]:
kb.get_data_collection_items(data_collection)
Out[14]:
Note that a collection is, essentially, a set.
In [15]:
kb.factory.create(kb.DataCollectionItem,
                  {'dataCollection' : data_collection,
                   'dataSample' : samples[0]}).save()
As you can see, trying to put the same object twice in a collection will fail. Just before we forget, let's clean up what we have just created.
In [16]:
for dci in kb.get_data_collection_items(data_collection):
    kb.delete(dci)
In [17]:
for s in samples:
    kb.delete(s)
In [18]:
kb.delete(data_collection)
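The set-like behaviour of collections seen above (a second item pointing to the same sample is rejected) can be mimicked with a few lines of plain Python. This is only a sketch under stated assumptions: `ToyCollection` and its methods are hypothetical names, and the real check is performed by omero.biobank when the item is saved, in a way not described here.

```python
class ToyCollection(object):
    """Hypothetical stand-in for a collection; NOT kb.DataCollection."""

    def __init__(self, label):
        self.label = label
        self._members = set()

    def add_item(self, sample_label):
        # A collection is essentially a set: reject duplicates explicitly.
        if sample_label in self._members:
            raise ValueError("%s is already in %s" % (sample_label, self.label))
        self._members.add(sample_label)

    def items(self):
        return sorted(self._members)


collection = ToyCollection("a collection")
for i in range(3):
    collection.add_item("foodata_{}".format(i))

# Adding the first sample a second time fails and leaves the collection unchanged.
try:
    collection.add_item("foodata_0")
    duplicate_accepted = True
except ValueError:
    duplicate_accepted = False
```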
Omero.biobank is designed to trace, as accurately as possible within the intrinsic operational noise (whatever we mean by that), the full sequence of operations that connects an object to its ancestors.
We can order all objects by the time of their creation, so we will use $o_j$ to indicate the $j$-th object in the temporal sequence. For all objects it is true that
$$o_i = F_{\alpha_i}[p_i](\Pi_i(\{o_k\}_{k < i})),$$
where $\Pi_i$ is a projection operator that selects, from the available objects, the subset that is used by the operator $F_\alpha$. We assume that we have a countable set of operators $F_\alpha$, here with $\alpha=\alpha_i$. With the symbol $p_i$ we indicate the set of parameters that specialize the application of $F_\alpha$.
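To make the formula concrete, here is a small sketch in plain Python. All the names (`build_sequence`, `constant`, `scaled_sum`) and the numeric "objects" are hypothetical and chosen only for illustration: each step is a triple of an operator $F_{\alpha_i}$, its parameters $p_i$ and a projection $\Pi_i$ that picks its inputs among the objects created so far.

```python
def build_sequence(steps):
    """Create objects one at a time; each step only sees earlier objects."""
    objects = []
    for operator, params, select in steps:
        inputs = select(objects)                   # Pi_i({o_k}_{k<i})
        objects.append(operator(params, inputs))   # F_{alpha_i}[p_i](...)
    return objects


def constant(params, inputs):
    # A "root" operator: it ignores its (empty) inputs.
    return params["value"]


def scaled_sum(params, inputs):
    # An operator specialized by a 'scale' parameter.
    return params["scale"] * sum(inputs)


steps = [
    (constant,   {"value": 2},  lambda objs: []),
    (constant,   {"value": 3},  lambda objs: []),
    (scaled_sum, {"scale": 10}, lambda objs: objs[:2]),
    (scaled_sum, {"scale": 1},  lambda objs: [objs[0], objs[2]]),
]

objects = build_sequence(steps)  # [2, 3, 50, 52]
```

The same operator (`scaled_sum`) appears twice with different parameters and different projections, which is exactly the role played by $p_i$ and $\Pi_i$ in the formula.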
To describe the $F_\alpha$, kb uses objects that are instances of classes derived from Device.
In [19]:
kb.Device.__fields__
Out[19]:
A Device is characterized by a label, a maker, a model and a release. Thus, for instance, we can now create a generic device.
In [20]:
device = kb.factory.create(kb.Device,
                           {'label': 'a-plain-device',
                            'maker': 'foo', 'model': 'foom',
                            'release': 'foor'}).save()
In [21]:
device.id
Out[21]:
In [22]:
device.label
Out[22]:
In [23]:
kb.delete(device)
In [24]:
for k in kb.__dict__:
    a = getattr(kb, k)
    if isinstance(a, type) and issubclass(a, kb.Device):
        print k
FIXME: Of the Devices above:
In the specific context of omero.biobank, one is not interested in describing the future but rather the history of events that resulted in a given object. In the figure below, we show a cartoon dependency graph, loosely patterned on a specific example. The graph is directed, with flow going from the root towards the leaves. While each object has one incoming edge, it could have multiple outgoing edges. It is thus natural to associate with each node of the graph, say $o_j$, an object, $a_j$, that saves sufficient information to describe how it has been produced, thus:
$$a_j = (F_{\alpha_j}, p_j, \Pi_j(\{o_k\}_{k < j})).$$
In principle, it would be enough to save $\Pi_j$ but it is more efficient to keep the actual list $\Pi_j(\{o_k\}_{k < j})$.
In other words, we are describing a transformation $T$ that goes from the set of the nodes of $G$, the dependency graph, to the set of the $\{a_k\}_k$.
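As an illustration of what such records allow, the sketch below stores one record per object and walks back from an object towards the root. It is a toy model under stated assumptions: `ActionRecord`, `history`, the object names and the operator names are all hypothetical, and each record is assumed to have at most one input, so that the history is a simple chain.

```python
from collections import namedtuple

# One record per object: which operator produced it, with which parameters,
# and from which earlier objects (the stored result of the projection).
ActionRecord = namedtuple("ActionRecord", ["operator", "params", "inputs"])

actions = {
    "o0": ActionRecord("import", {}, ()),
    "o1": ActionRecord("align", {"threads": 4}, ("o0",)),
    "o2": ActionRecord("call", {"min_quality": 30}, ("o1",)),
}


def history(name, actions):
    """Walk from `name` back to a root, following the single input of each record."""
    trail = []
    current = name
    while True:
        record = actions[current]
        trail.append((current, record.operator))
        if not record.inputs:
            return trail
        current = record.inputs[0]


trail = history("o2", actions)
# [('o2', 'call'), ('o1', 'align'), ('o0', 'import')]
```

Because each record keeps the actual list of inputs, the walk needs no further lookups beyond the records themselves.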
If we describe a path in the graph by its two endpoints, $P_{i,j}$ with $i<j$, the natural transformation of $P_{i,j}$ by $T$ is the reverse sequence $(a_j, a_{j-1}, a_{j-2}, \ldots, a_{i+1})$. In a way, $T$ is a sort of contravariant functor. If we think of the paths as functions from subsets of $O=\{ o_k\}_{k}$ to $O$, then a composition $f{\circ}g$ becomes a longer path, and
$$T[f{\circ}g](T(p))= (a_{f_j}, a_{f_{j-1}}, \ldots, a_{g_{j'}}, \ldots, a_{g_{i'+1}}) = (T[g]{\circ}T[f])(T(p))$$
FIXME: ok, this will have to be explained with a better notation.
In [1]:
from IPython.display import HTML
HTML('<iframe src=http://en.wikipedia.org/wiki/Covariance_and_contravariance_of_functors#Covariance_and_contravariance width=1000 height=200></iframe>')
Out[1]:
For historical reasons, the $a_k$ defined above are called action(s) in omero.biobank.
In [25]:
kb.Action.__fields__
Out[25]:
In [26]:
for e in kb.ActionCategory.__enums__:
    print e.enum_label()
The ActionCategory enum is basically a way to organize actions.
In [27]:
kb.Study.__fields__
Out[27]:
A Study is a poor man's mechanism to group things and provide a context.
In [28]:
kb.ActionSetup.__fields__
Out[28]:
An ActionSetup is a way to describe a parameter set. In practice, conf will contain a JSON encoding of the "parameters". The context in which these parameters will be interpreted is defined by the specific Device.
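A minimal sketch of the JSON round trip for a parameter set, using only the standard library. The `mode` key is a hypothetical parameter added for illustration; how a given device interprets the decoded values is not specified here.

```python
import json

# A parameter set; 'mode' is a made-up key used only for this example.
params = {"A": 232, "mode": "fast"}

# The string that would be stored in the 'conf' field.
conf = json.dumps(params, sort_keys=True)

# Decoding gives back an equivalent dict.
decoded = json.loads(conf)
```

Sorting the keys is a design choice for this sketch: it makes the encoded string reproducible, which helps when comparing two setups.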
In [29]:
for k in kb.__dict__:
    a = getattr(kb, k)
    if isinstance(a, type) and issubclass(a, kb.Action):
        print k
With respect to the $a_j$ defined above, we are still missing the object to which the operation (i.e., the device) has been applied to produce $o_j$, that is, the result of the projection $\Pi_j(\{o_k\}_{k < j})$. This is recorded using one of the ActionOnX classes.
In [30]:
kb.ActionOnDataSample.__fields__
Out[30]:
In [31]:
kb.ActionOnVessel.__fields__
Out[31]:
For purely historical reasons, again, the source object, $\Pi_j(\{o_k\}_{k < j})$, is called target.
We are now ready to consider the following case:
In [32]:
import json
with kb.context.sandbox():
    action0 = kb.create_an_action()
    ds0 = kb.factory.create(kb.DataSample,
                            conf = {'label': 'a foo label 0',
                                    'status': kb.DataSampleStatus.USABLE,
                                    'action': action0}).save()
    device = kb.factory.create(kb.Device, {'label': 'a-plain-device',
                                           'maker': 'foo', 'model': 'foom',
                                           'release': 'foor'}).save()
    context = kb.factory.create(kb.Study, {'label': 'one more test'}).save()
    action_setup = kb.factory.create(kb.ActionSetup,
                                     {'label': 'parameters1',
                                      'conf': json.dumps({'A': 232})}).save()
    action1 = kb.factory.create(kb.ActionOnDataSample,
                                {'actionCategory': kb.ActionCategory.PROCESSING,
                                 'context': context, 'device': device,
                                 'operator': 'Alfred E. Neumann',
                                 'setup': action_setup,
                                 'target': ds0}).save()
    ds1 = kb.factory.create(kb.DataSample,
                            conf = {'label': 'a foo label 1',
                                    'status': kb.DataSampleStatus.USABLE,
                                    'action': action1}).save()
    print (ds1.action.device.label, ds1.action.setup.label, ds1.action.target.label)
This is reasonably similar to $a_j = (F_{\alpha_j}, p_j, \Pi_j(\{o_k\}_{k < j}))$. Note that we are using a sandbox() context to automatically delete, in the right order, all the objects that we have created.
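The idea of a sandbox that cleans up after itself can be sketched with a standard context manager. This is an assumption-laden toy, not the implementation of kb.context.sandbox(): `ToyKB` and this `sandbox` function are hypothetical, and "the right order" is taken here to mean reverse order of creation, so that an object is always deleted before the objects it references.

```python
from contextlib import contextmanager


class ToyKB(object):
    """Hypothetical stand-in that only remembers what was saved and deleted."""

    def __init__(self):
        self.saved = []
        self.deleted = []

    def save(self, name):
        self.saved.append(name)
        return name

    def delete(self, name):
        self.saved.remove(name)
        self.deleted.append(name)


@contextmanager
def sandbox(kb):
    start = len(kb.saved)  # objects saved before the sandbox are left alone
    try:
        yield kb
    finally:
        # Delete newest first; the slice is a copy, so removal is safe.
        for name in reversed(kb.saved[start:]):
            kb.delete(name)


toy = ToyKB()
with sandbox(toy):
    for name in ("action0", "ds0", "action1", "ds1"):
        toy.save(name)

# After the block: nothing is left, and deletions ran in reverse creation order.
```

The `finally` clause means the cleanup also runs when the body of the `with` block raises, which is usually what one wants from a sandbox.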