Objects and Devices

The goal of biobank is to maintain a consistent picture of physical and digital objects and of the chain of operations that connect them. The persistence of objects is guaranteed by a back-end system, but what we describe here is, usually, the client-side point of view. In this notebook we describe the basic concepts of object creation and destruction, and how to represent transformations.

Talking to the back-end

First of all we need to connect to the back-end. FIXME: in the actual example, it would make sense to have generic test accounts, rather than irgb :-)


In [1]:
import sys, os
from bl.vl.kb import KnowledgeBase

OME_HOST = os.getenv('OME_HOST', 'localhost')
OME_USER = os.getenv('OME_USER', 'test')
OME_PASSWD = os.getenv('OME_PASSWD', 'test')
CHECK_OME_VERSION = os.getenv('CHECK_OME_VERSION', "True") == "True"

BaseProxy = KnowledgeBase(driver='omero')

class Proxy(BaseProxy):
  def get_objects_dict(self, klass):
    return dict((o.label, o) for o in super(Proxy, self).get_objects(klass))

kb = Proxy(OME_HOST, OME_USER, OME_PASSWD, check_ome_version=CHECK_OME_VERSION)
kb.connect()
kb.start_keep_alive()

def cleanup():
  print "# disconnecting the kb"
  kb.disconnect()

import atexit
atexit.register(cleanup)  # preferred to the deprecated sys.exitfunc hook

print
print "### KB ENV PRELOADED ###"
print "# connected to %s" % OME_HOST
print "# knowledge base: kb"
print "# extra method: kb.get_objects_dict"
print "########################"


### KB ENV PRELOADED ###
# connected to biobank04.crs4.it
# knowledge base: kb
# extra method: kb.get_objects_dict
########################

The code above establishes a connection to the omero.biobank server. To communicate with the server we will use kb, a proxy object.
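The extra method get_objects_dict simply indexes the objects returned by get_objects by their label. The sketch below reproduces that logic on plain stand-in objects (FakeObject is a hypothetical class, not part of bl.vl.kb). It also shows a caveat of keying by label: if two objects share a label, only the last one survives in the dict.

```python
from collections import namedtuple

# Hypothetical stand-in for a kb object: only 'label' matters for indexing.
FakeObject = namedtuple('FakeObject', ['label', 'vid'])

def get_objects_dict(objects):
    """Index objects by label, mirroring Proxy.get_objects_dict above."""
    return dict((o.label, o) for o in objects)

objs = [FakeObject('a', 'V1'), FakeObject('b', 'V2'), FakeObject('a', 'V3')]
by_label = get_objects_dict(objs)
print(sorted(by_label))        # ['a', 'b']
print(by_label['a'].vid)       # V3: the later object labelled 'a' wins
```

Labels are therefore a convenient lookup key only when they are unique within the class being queried.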

Object Creation and Deletion

Omero.biobank has models for many physical and digital objects and, if needed, it can easily be extended with new object models. One of the simplest models is DataSample, which represents a blob of digital data. In practice, one typically uses instances of more specific models derived from DataSample, and there are complexities related to the physical location of the data. For the time being, however, we will consider the simple, unadorned DataSample.

FIXME: omero.biobank models are proxied by Python classes in the client. The API provides access, via kb, to these proxy classes and to global functions that operate on them.

The first thing we will do is create a DataSample instance, which we will call data_sample. Objects are created with the create method of the kb factory (kb.factory). The method takes two parameters: the object class and a dict with the object's field assignments. All objects that correspond either to physical (FIXME: what about HardwareDevice?) or to digital data objects require a field called 'action'. Its purpose will be explained later; for the time being, we will use kb.create_an_action() to get a dummy placeholder.

To create a DataSample we need to assign a unique label and a status attribute, together with the action mentioned above.


In [2]:
action = kb.create_an_action()
data_sample = kb.factory.create(kb.DataSample,
                                conf={'label': 'a foo label',
                                      'status': kb.DataSampleStatus.USABLE,
                                      'action': action})

In [3]:
data_sample.label, data_sample.status


Out[3]:
('a foo label',
 <bl.vl.kb.drivers.omero.data_samples.DataSampleStatus at 0x274f390>)

At this point data_sample exists only in the client's RAM, and we need to save it to omero.biobank. Calling .is_mapped() tells us whether it exists in the database.


In [4]:
data_sample.is_mapped()


Out[4]:
False

In [5]:
data_sample.save()


Out[5]:
<bl.vl.kb.drivers.omero.data_samples.DataSample at 0x274fa10>

In [6]:
data_sample.is_mapped()


Out[6]:
True
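The is_mapped()/save() pair can be pictured with a toy in-memory model. In the sketch below, ToyStore and ToyObject are hypothetical classes, not the bl.vl.kb implementation: an object counts as "mapped" once the store has assigned it an id, and save() returns the object itself. This is consistent with Out[5], and it is what makes chained calls such as kb.factory.create(...).save() convenient.

```python
import itertools

class ToyStore(object):
    """Hypothetical back-end: assigns sequential ids to saved objects."""
    def __init__(self):
        self._ids = itertools.count(1)
        self.rows = {}

    def save(self, obj):
        if obj.id is None:                  # first save: allocate an id
            obj.id = 'V%04d' % next(self._ids)
        self.rows[obj.id] = obj
        return obj                          # return the object to allow chaining

class ToyObject(object):
    def __init__(self, store, label):
        self.store, self.label, self.id = store, label, None

    def is_mapped(self):
        return self.id is not None and self.id in self.store.rows

    def save(self):
        return self.store.save(self)

store = ToyStore()
ds = ToyObject(store, 'a foo label')
print(ds.is_mapped())       # False: the object exists only in client RAM
saved = ds.save()
print(saved is ds)          # True
print(ds.is_mapped())       # True
```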

All omero.biobank objects are uniquely identified by an id called a 'vid'. For instance,


In [7]:
data_sample.id


Out[7]:
'V00B08F1CB14BA4D6190736DA45FD1C1F4'

Once objects have been created they can be destroyed.


In [8]:
kb.delete(action)


---------------------------------------------------------------------------
KBError                                   Traceback (most recent call last)
<ipython-input-8-dc02a02867bd> in <module>()
----> 1 kb.delete(action)

/home/zag/.local/lib64/python2.7/site-packages/bl/vl/kb/drivers/omero/proxy_core.pyc in delete(self, kb_obj)
    287                                   kb_obj.ome_obj)
    288     except omero.ValidationException:
--> 289       raise kb.KBError("object is referenced by one or more objects")
    290     except omero.ApiUsageException:
    291       raise kb.KBError("trying to delete non-persistent object")

KBError: object is referenced by one or more objects

Well, they can be deleted, but in the right order. Omero.biobank maintains referential integrity, so we first have to delete data_sample and then action. FIXME: this will actually leave some debris behind, but dealing with it would make things too complicated right now.


In [9]:
kb.delete(data_sample)

In [10]:
kb.delete(action)
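The ordering rule can be illustrated with a toy store that refuses to delete an object while other stored objects still reference it. RefStore, RefError and the refs bookkeeping below are hypothetical; they only mimic the behavior seen in In [8] to In [10], where the action can be deleted only after the data sample that points to it.

```python
class RefError(Exception):
    pass

class RefStore(object):
    """Hypothetical store enforcing referential integrity on delete."""
    def __init__(self):
        self.refs = {}                      # name -> set of names it references

    def add(self, name, references=()):
        missing = [r for r in references if r not in self.refs]
        if missing:
            raise RefError('dangling references: %s' % missing)
        self.refs[name] = set(references)

    def delete(self, name):
        holders = [n for n, r in self.refs.items() if name in r]
        if holders:
            raise RefError('%s is referenced by %s' % (name, sorted(holders)))
        del self.refs[name]

store = RefStore()
store.add('action')
store.add('data_sample', references=['action'])
try:
    store.delete('action')                  # fails: data_sample still points to it
except RefError as e:
    print(e)                                # action is referenced by ['data_sample']
store.delete('data_sample')                 # the referring object first...
store.delete('action')                      # ...then the referenced one
print(sorted(store.refs))                   # []
```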

Object collections

Omero.biobank handles collections by defining a new object -- an instance of a class derived from VLCollection, the abstract base class -- and then associating specific objects with it through instances of a support class. FIXME: this is not the most elegant thing that can be done....
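Stripped of the back-end, this is a classic association (link) table: the collection object holds no members itself, and each item row pairs one collection with one member. A hypothetical in-memory sketch follows; Item, add_item and items_of are illustrative names only, with items_of playing roughly the role that kb.get_data_collection_items plays for DataCollection.

```python
from collections import namedtuple

# Hypothetical link record: one row per (collection, sample) membership.
Item = namedtuple('Item', ['collection', 'sample'])

items = []                                  # the "association table"

def add_item(collection, sample):
    item = Item(collection, sample)
    items.append(item)
    return item

def items_of(collection):
    """Return the link rows that belong to one collection."""
    return [i for i in items if i.collection == collection]

for k in range(3):
    add_item('a collection', 'foodata_{}'.format(k))
add_item('another collection', 'foodata_0')  # a sample can be in many collections

print([i.sample for i in items_of('a collection')])
# ['foodata_0', 'foodata_1', 'foodata_2']
```

Keeping membership in separate rows lets the same object belong to several collections without touching the object itself.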

Consider, for instance, a collection of DataSamples. The concrete collection class is DataCollection and the way we would build the collection is as follows.


In [11]:
action = kb.create_an_action()
samples = [kb.factory.create(kb.DataSample, 
                             {'label': "foodata_{}".format(i),
                              'status': kb.DataSampleStatus.USABLE, 
                              'action': action}).save()
           for i in range(3)]

In [12]:
data_collection = kb.factory.create(kb.DataCollection, 
                                    {'label': 'a collection', 'action': action}).save()

In [13]:
dcis = [kb.factory.create(kb.DataCollectionItem, 
                          {'dataCollection' : data_collection, 'dataSample' : s}).save() 
        for s in samples]

In [14]:
kb.get_data_collection_items(data_collection)


Out[14]:
[<bl.vl.kb.drivers.omero.objects_collections.DataCollectionItem at 0x496a410>,
 <bl.vl.kb.drivers.omero.objects_collections.DataCollectionItem at 0x4a29990>,
 <bl.vl.kb.drivers.omero.objects_collections.DataCollectionItem at 0x4a27250>]

Note that a collection is, essentially, a set.


In [15]:
kb.factory.create(kb.DataCollectionItem, 
                  {'dataCollection' : data_collection, 
                   'dataSample' : samples[0]}).save()


2013-11-22 09:10:31|ERROR   |omero.ValidationException: could not insert: [ome.model.vl.DataCollectionItem]; SQL [insert into datacollectionitem (dataCollection, dataCollectionItemUK, dataSample, creation_id, external_id, group_id, owner_id, permissions, update_id, version, vid, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]; constraint [datacollectionitem_datacollectionitemuk_key]; nested exception is org.hibernate.exception.ConstraintViolationException: could not insert: [ome.model.vl.DataCollectionItem]
2013-11-22 09:10:31|ERROR   |omero.ValidationException object: <class 'bl.vl.kb.drivers.omero.objects_collections.DataCollectionItem'>
---------------------------------------------------------------------------
KBError                                   Traceback (most recent call last)
<ipython-input-15-a19a04f8315f> in <module>()
      1 kb.factory.create(kb.DataCollectionItem, 
      2                   {'dataCollection' : data_collection, 
----> 3                    'dataSample' : samples[0]}).save()

/home/zag/.local/lib64/python2.7/site-packages/bl/vl/kb/drivers/omero/wrapper.pyc in save(self)
    152 
    153   def save(self):
--> 154     return self.proxy.save(self)
    155 
    156   def serialize(self, engine, shallow=False):

/home/zag/.local/lib64/python2.7/site-packages/bl/vl/kb/drivers/omero/proxy_core.pyc in save(self, obj)
    248       self.logger.error(msg)
    249       self.logger.error('omero.ValidationException object: %s' % type(obj))
--> 250       raise kb.KBError(msg)
    251     obj.ome_obj = result
    252     self.store_to_cache(obj)

KBError: omero.ValidationException: could not insert: [ome.model.vl.DataCollectionItem]; SQL [insert into datacollectionitem (dataCollection, dataCollectionItemUK, dataSample, creation_id, external_id, group_id, owner_id, permissions, update_id, version, vid, id) values (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]; constraint [datacollectionitem_datacollectionitemuk_key]; nested exception is org.hibernate.exception.ConstraintViolationException: could not insert: [ome.model.vl.DataCollectionItem]

As you can see, trying to put the same object twice in a collection will fail. Just before we forget, let's clean up what we have just created.


In [16]:
for dci in kb.get_data_collection_items(data_collection):
    kb.delete(dci)

In [17]:
for s in samples:
    kb.delete(s)

In [18]:
kb.delete(data_collection)
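The duplicate insertion in In [15] was rejected by a database uniqueness constraint (datacollectionitem_datacollectionitemuk_key in the log above), which evidently covers the (collection, sample) pair. A minimal sketch of the same set semantics, assuming a hypothetical UniqueLinks class that keeps a set of keys and rejects duplicates the way the constraint does:

```python
class DuplicateItem(Exception):
    pass

class UniqueLinks(object):
    """Hypothetical link table with a unique (collection, sample) key."""
    def __init__(self):
        self.keys = set()

    def add(self, collection, sample):
        key = (collection, sample)
        if key in self.keys:
            raise DuplicateItem('duplicate item: %r' % (key,))
        self.keys.add(key)

links = UniqueLinks()
for s in ['foodata_0', 'foodata_1', 'foodata_2']:
    links.add('a collection', s)
try:
    links.add('a collection', 'foodata_0')  # same pair twice: rejected
except DuplicateItem as e:
    print(e)
print(len(links.keys))                      # 3
```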

Representing operations on objects

Omero.biobank is designed to trace, as accurately as possible (of course within intrinsic operational noise, whatever we mean by that), the full sequence of operations that connect an object to its ancestors.

We can order all objects by their creation time, so we will use $o_j$ to indicate the $j$-th object in the temporal sequence. For every object it holds that

$$o_i = F_{\alpha_i}[p_i]\big(\Pi_i(\{o_k\}_{k < i})\big),$$

where $\Pi_i$ is a projection operator that selects, from the available objects, the subset used by the operator $F_{\alpha_i}$. We assume a countable set of operators $F_\alpha$; here $\alpha=\alpha_i$. The symbol $p_i$ denotes the set of parameters that specialize the application of $F_{\alpha_i}$.
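The recurrence can be made concrete with a toy simulation in which operators are plain Python functions. Everything below is hypothetical (operator names, parameters and index lists are illustrative); it only encodes the rule that each new object $o_i$ is computed by some $F_{\alpha_i}$, specialized by $p_i$, from a selection $\Pi_i$ of strictly earlier objects.

```python
# Hypothetical operators F_alpha: each takes parameters p and a list of inputs.
OPERATORS = {
    'import': lambda p, inputs: p['value'],          # no inputs: a root object
    'scale':  lambda p, inputs: inputs[0] * p['factor'],
    'sum':    lambda p, inputs: sum(inputs),
}

def run(steps):
    """steps[i] = (alpha_i, p_i, indices selected by the projection Pi_i)."""
    objects = []
    for i, (alpha, params, indices) in enumerate(steps):
        # Pi_i may only select objects created before o_i (k < i).
        if any(k >= i for k in indices):
            raise ValueError('o_%d would depend on a later object' % i)
        inputs = [objects[k] for k in indices]        # Pi_i({o_k}_{k<i})
        objects.append(OPERATORS[alpha](params, inputs))
    return objects

steps = [
    ('import', {'value': 2}, []),        # o_0
    ('scale', {'factor': 10}, [0]),      # o_1 = 10 * o_0
    ('sum', {}, [0, 1]),                 # o_2 = o_0 + o_1
]
print(run(steps))                        # [2, 20, 22]
```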

To describe the $F_\alpha$, kb uses objects that are instances of classes derived from Device.


In [19]:
kb.Device.__fields__


Out[19]:
{'label': ('string', 'required'),
 'maker': ('string', 'required'),
 'model': ('string', 'required'),
 'release': ('string', 'required'),
 'vid': ('vid', 'required')}

A Device is characterized by a label, a maker, a model and a release. For instance, we will now create a generic device.


In [20]:
device = kb.factory.create(kb.Device, 
                           {'label': 'a-plain-device', 
                            'maker': 'foo', 'model': 'foom', 
                            'release': 'foor'}).save()

In [21]:
device.id


Out[21]:
'V06BDC4721463E48479F6655AC39824A2C'

In [22]:
device.label


Out[22]:
'a-plain-device'

In [23]:
kb.delete(device)
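The __fields__ dictionaries, such as the one in Out[19], follow a simple (type, 'required' or 'optional') convention, so they can be used to sanity-check a conf dict before calling kb.factory.create. The helper below is a hypothetical sketch, not part of the kb API. It skips 'vid' on the assumption, suggested by In [20], that the id is assigned by the back-end rather than passed in by the caller.

```python
DEVICE_FIELDS = {                       # copied from Out[19]
    'label': ('string', 'required'),
    'maker': ('string', 'required'),
    'model': ('string', 'required'),
    'release': ('string', 'required'),
    'vid': ('vid', 'required'),
}

def missing_required(fields, conf, assigned_by_backend=('vid',)):
    """Return the required field names absent from conf (hypothetical helper)."""
    return sorted(name for name, (_, mode) in fields.items()
                  if mode == 'required'
                  and name not in assigned_by_backend
                  and name not in conf)

conf = {'label': 'a-plain-device', 'maker': 'foo', 'model': 'foom'}
print(missing_required(DEVICE_FIELDS, conf))   # ['release']
```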

In [24]:
for k in kb.__dict__:
    a = getattr(kb, k)
    if isinstance(a, type) and issubclass(a, kb.Device):
        print k


AnnotatedChip
SoftwareProgram
HardwareDevice
GenotypingProgram
Device
Scanner
Chip

FIXME: Of the Devices above:

  • Chip is now deprecated, since it has been replaced by a more sophisticated model in which a chip is an object rather than an operation;
  • GenotypingProgram, and actually SoftwareProgram too, should both be replaced by a SoftwareWorkflow (=> GalaxyWorkflow ?) Device;
  • AnnotatedChip is deprecated too.

On being contravariant (and contrarian)

In the specific context of omero.biobank, one is interested not in describing the future but in the history of events that resulted in a given object. In the figure below, we show a cartoon dependency graph, loosely patterned on a specific example. The graph is directed, with flow going from the root towards the leaves. While each object has one incoming edge, it can have multiple outgoing edges. It is thus natural to associate with each node of the graph, say $o_j$, an object $a_j$ that stores enough information to describe how $o_j$ was produced:

$$a_j = \big(F_{\alpha_j}, p_j, \Pi_j(\{o_k\}_{k < j})\big).$$

In principle, it would be enough to save $\Pi_j$ but it is more efficient to keep the actual list $\Pi_j(\{o_k\}_{k < j})$.

In other words, we are describing a transformation $T$ that goes from the set of the nodes of $G$, the dependency graph, to the set of the $\{a_k\}_k$.

If we describe a path in the graph by its two endpoints, $P_{i,j}$ with $i<j$, the natural image of $P_{i,j}$ under $T$ is the reversed sequence $(a_j, a_{j-1}, a_{j-2}, \ldots, a_{i+1})$. In this sense, $T$ behaves like a contravariant functor. If we think of paths as functions from subsets of $O=\{o_k\}_{k}$ to $O$, then a composition $f\circ g$ is a longer path, and

$$T[f\circ g](T(p)) = (a_{f_j}, a_{f_{j-1}}, \ldots, a_{g_{j'}}, \ldots, a_{g_{i'+1}}) = (T[g]\circ T[f])(T(p)).$$

FIXME: ok, this will have to be explained with a better notation.
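The reversal can be checked on a toy linear history. In the sketch below (hypothetical names throughout), a path is a list of node indices $[i, \ldots, j]$, T maps it to the action labels $(a_j, \ldots, a_{i+1})$, and compose(f, g) walks g first and then f. The image of the composite lists f's actions before g's, i.e. the opposite of the traversal order, which is the contravariance described above.

```python
def T(path):
    """Map a path [i, ..., j] to its actions (a_j, ..., a_{i+1})."""
    return ['a_%d' % k for k in reversed(path[1:])]

def compose(f, g):
    """f o g: follow path g, then continue along path f."""
    if g[-1] != f[0]:
        raise ValueError('g must end where f starts')
    return g + f[1:]

g = [0, 1, 2]                # o_0 -> o_1 -> o_2
f = [2, 3, 4]                # o_2 -> o_3 -> o_4
print(T(compose(f, g)))      # ['a_4', 'a_3', 'a_2', 'a_1']
print(T(f) + T(g))           # the same list: f's actions first, then g's
```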


In [1]:
from IPython.display import HTML
HTML('<iframe src=http://en.wikipedia.org/wiki/Covariance_and_contravariance_of_functors#Covariance_and_contravariance width=1000 height=200></iframe>')


Out[1]:

For historical reasons, the $a_k$ defined above are called actions in omero.biobank.


In [25]:
kb.Action.__fields__


Out[25]:
{'actionCategory': (bl.vl.kb.drivers.omero.action.ActionCategory, 'required'),
 'beginTime': ('timestamp', 'required'),
 'context': (bl.vl.kb.drivers.omero.action.Study, 'required'),
 'description': ('text', 'optional'),
 'device': (bl.vl.kb.drivers.omero.action.Device, 'optional'),
 'endTime': ('timestamp', 'optional'),
 'operator': ('string', 'required'),
 'setup': (bl.vl.kb.drivers.omero.action.ActionSetup, 'optional'),
 'vid': ('vid', 'required')}

In [26]:
for e in kb.ActionCategory.__enums__:
    print e.enum_label()


IMPORT
CREATION
EXTRACTION
UPDATE
ALIQUOTING
MEASUREMENT
PROCESSING

The ActionCategory enum is essentially a way to classify actions.


In [27]:
kb.Study.__fields__


Out[27]:
{'description': ('string', 'optional'),
 'endDate': ('timestamp', 'optional'),
 'label': ('string', 'required'),
 'startDate': ('timestamp', 'required'),
 'vid': ('vid', 'required')}

A Study is a poor man's mechanism to group things and provide a context.


In [28]:
kb.ActionSetup.__fields__


Out[28]:
{'conf': ('string', 'required'),
 'label': ('string', 'required'),
 'vid': ('vid', 'required')}

An ActionSetup is a way to describe a parameter set. In practice, conf contains a JSON encoding of the parameters. The context in which these parameters are interpreted is defined by the specific Device.
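In the worked example at the end of this section, conf is filled with json.dumps({'A': 232}). The sketch below shows the round trip; the interpret function and the meaning it gives to 'A' are purely hypothetical, standing in for whatever a specific Device does with its parameters.

```python
import json

conf = json.dumps({'A': 232})        # the string stored in ActionSetup.conf

def interpret(conf_string):
    """Hypothetical device-side decoding: parameters come back as a dict."""
    params = json.loads(conf_string)
    return params.get('A', 0) * 2    # illustrative use of one parameter

print(conf)                          # {"A": 232}
print(interpret(conf))               # 464
```

Storing parameters as an opaque JSON string keeps ActionSetup generic: only the Device that consumes them needs to know their structure.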


In [29]:
for k in kb.__dict__:
    a = getattr(kb, k)
    if isinstance(a, type) and issubclass(a, kb.Action):
        print k


ActionOnDataCollectionItem
ActionOnVessel
ActionOnDataSample
ActionOnIndividual
Action
ActionOnAction
ActionOnCollection

With respect to the $a_j$ defined above, we are still missing the object to which the operation (i.e., the device) was applied in order to obtain $o_j$, that is, the result of the projection $\Pi_j(\{o_k\}_{k < j})$. This is recorded using one of the ActionOnX classes.


In [30]:
kb.ActionOnDataSample.__fields__


Out[30]:
{'target': (bl.vl.kb.drivers.omero.data_samples.DataSample, 'required')}

In [31]:
kb.ActionOnVessel.__fields__


Out[31]:
{'target': (bl.vl.kb.drivers.omero.vessels.Vessel, 'required')}

For purely historical reasons, again, the source object, $\Pi_j(\{o_k\}_{k < j})$, is stored in a field called target.
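Since each action points to its source object through target, and each object points to the action that produced it, the history of an object can be recovered by alternating the two links. A minimal sketch with hypothetical plain-Python records: Act and Obj are illustrative, not the kb classes, and only the attribute names action, device and target mirror the kb fields.

```python
class Act(object):
    def __init__(self, device=None, target=None):
        self.device, self.target = device, target   # target = source object

class Obj(object):
    def __init__(self, label, action):
        self.label, self.action = label, action

def history(obj):
    """Walk back from obj, collecting (object label, device) pairs."""
    steps = []
    while obj is not None:
        steps.append((obj.label, obj.action.device))
        obj = obj.action.target      # None for a root object
    return steps

o0 = Obj('a foo label 0', Act())
o1 = Obj('a foo label 1', Act(device='a-plain-device', target=o0))
print(history(o1))
# [('a foo label 1', 'a-plain-device'), ('a foo label 0', None)]
```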

We are now ready to consider the following case:

  • we have a DataSample, labelled 'a foo label 0', that we call $o_0$;
  • we apply an operation that, starting from $o_0$, produces $o_1$ using a specific device with a given configuration;
  • we save the result, and we track how we have obtained it.

In [32]:
import json

with kb.context.sandbox():
    action0 = kb.create_an_action()
    ds0 = kb.factory.create(kb.DataSample, 
                            conf = {'label': 'a foo label 0', 
                                    'status': kb.DataSampleStatus.USABLE, 
                                    'action': action0}).save()
    device = kb.factory.create(kb.Device, {'label': 'a-plain-device', 
                                           'maker': 'foo', 'model': 'foom', 
                                           'release': 'foor'}).save()
    context = kb.factory.create(kb.Study, {'label': 'one more test'}).save()
    action_setup = kb.factory.create(kb.ActionSetup, 
                                     {'label': 'parameters1', 
                                      'conf': json.dumps({'A': 232})}).save()
    action1 = kb.factory.create(kb.ActionOnDataSample, 
                                {'actionCategory': kb.ActionCategory.PROCESSING, 
                                 'context': context, 'device': device,
                                 'operator': 'Alfred E. Neumann',
                                 'setup': action_setup,
                                 'target': ds0}).save()
    ds1 = kb.factory.create(kb.DataSample, 
                            conf = {'label': 'a foo label 1', 
                                    'status': kb.DataSampleStatus.USABLE, 
                                    'action': action1}).save()
    print (ds1.action.device.label, ds1.action.setup.label, ds1.action.target.label)


('a-plain-device', 'parameters1', 'a foo label 0')

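This is reasonably close to $a_j = \big(F_{\alpha_j}, p_j, \Pi_j(\{o_k\}_{k < j})\big)$: the device plays the role of $F_{\alpha_j}$, the setup that of $p_j$, and the target that of $\Pi_j(\{o_k\}_{k < j})$. Note that we are using a sandbox() context to automatically delete, in the right order, all the objects that we have created.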
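A sandbox of this kind can be sketched with contextlib: record every saved object and, on exit, delete them in reverse creation order. Under the model above, reverse order is a safe choice, because an object can only reference objects created before it. The sandbox function and the delete callback below are hypothetical; this is not the kb.context.sandbox implementation.

```python
from contextlib import contextmanager

@contextmanager
def sandbox(delete):
    """Yield a 'created' list; on exit, delete its items newest-first."""
    created = []
    try:
        yield created
    finally:
        for obj in reversed(created):
            delete(obj)

deleted = []
with sandbox(deleted.append) as created:
    for name in ['action0', 'ds0', 'device', 'action1', 'ds1']:
        created.append(name)         # stand-in for obj.save()
print(deleted)                       # ['ds1', 'action1', 'device', 'ds0', 'action0']
```

The try/finally ensures cleanup also runs when the body raises an exception.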