A new Pipeline Approach: CORRAL

The TOROS Pipeline's Engine

Collaborators: J. B. Cabral, M. Beroiz, TOROS Collaboration

What is a pipeline?

A pipeline can be understood as a collection of filters and connectors that consume data, transforming it in a linear and repetitive fashion.

This data is called a stream, and each filter is a transformation or operation.

The connectors between the filters form the schema of the pipeline, which imposes an always-forward structure.
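As a toy illustration (plain Python, not CORRAL code), filters can be written as generators that each apply one transformation, and the connectors are simply the chaining of one filter onto the next, so the stream only ever flows forward:

```python
# Toy pipeline sketch: each filter consumes the previous filter's output.
def calibrate(stream):           # filter: one transformation per datum
    for frame in stream:
        yield frame - 100        # e.g. subtract a bias level

def detect(stream):              # filter: consumes the calibrated stream
    for frame in stream:
        yield frame > 0          # e.g. flag frames with signal

stream = [90, 150, 210]                      # the raw data stream
result = list(detect(calibrate(stream)))     # connectors = generator chaining
# result == [False, True, True]
```

Because each filter only sees the output of the previous one, data can never flow backwards through the schema.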

Some Pipelines

  • SDSS
  • Pan-STARRS

Why do we do this?

We need this for TOROS for several reasons

  • Observational astronomers are facing a data tsunami, and synoptic surveys are among the most data-intensive tasks today.

  • Telescopes of any size can produce massive amounts of data, and on top of that TOROS needs real-time processing.

  • Processing is time-consuming for humans, and many of the tasks involved are prone to failure.

  • Handling metadata is key to a synoptic survey.

The TORITOS Project

  • The first observation campaign needs:

    • Where to host data
    • A database to record processing
    • Computational power (real-time analysis)
    • Automation of every process
    • A report system (lack of connectivity)
    • Error handling mechanisms


Corral is a framework designed to create fully integrated pipelines without tears and headaches.

It provides a structure where your pipeline metadata can be stored, mined, and analysed, while controlling the processing stages of the actual data.

It makes use of some computer science formalisms and tools:

  • MVC pattern
  • ETL operation routines
  • OOP (Python native)
  • SQL database fully integrated
  • Pipeline branching
  • Asynchronous and embarrassingly parallel processing

Testing and Code quality measurements integrated!

Toritos pipeline design

A main feature of this new pipeline design approach is the Model-View-Controller (MVC) pattern.

The TORITOS pipeline needs three central entities:

  • The Models
  • The Loader and Steps (as controllers)
  • The Alerts (as user views)

The models

These are the data structures that interact with the processing steps, and they store the metadata that the pipeline handles.

In other words, they are the SQL table definitions that the pipeline works with.

In [1]:
from sqlalchemy.orm import *
from sqlalchemy import *
from sqlalchemy.ext import declarative

Model = declarative.declarative_base(name="Model")

In [2]:
class Observatory(Model):
    """Model for observatories. SQLAlchemy Model object.

    __tablename__ = 'Observatory'

    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False, unique=True)
    latitude = Column(Float, nullable=False)
    longitude = Column(Float, nullable=False)
    description = Column(Text, nullable=True)

    def __repr__(self):
        return self.name

In [3]:
class Campaign(Model):

    __tablename__ = 'Campaign'

    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False, unique=True)
    description = Column(Text, nullable=True)

    observatory_id = Column(Integer, ForeignKey('Observatory.id'))
    observatory = relationship(
        "Observatory", backref=backref('campaigns', order_by=id))
    ccd_id = Column(Integer, ForeignKey('CCD.id'))
    ccd = relationship(
        "CCD", backref=backref('campaigns', order_by=id))

    def __repr__(self):
        return self.name
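The Campaign model above holds a ForeignKey to a 'CCD' table whose model is not shown in these slides. A minimal sketch of what it might look like, following the same style (the field names here are illustrative, not from the actual TORITOS code):

```python
from sqlalchemy import Column, Integer, String, Float
from sqlalchemy.orm import declarative_base

Model = declarative_base(name="Model")

class CCD(Model):
    """Hypothetical model for the detector referenced by Campaign."""

    __tablename__ = 'CCD'

    id = Column(Integer, primary_key=True)
    brand = Column(String(100), nullable=False)
    xpixels = Column(Integer, nullable=False)  # detector width in pixels
    ypixels = Column(Integer, nullable=False)  # detector height in pixels
    gain = Column(Float, nullable=True)        # e-/ADU, if known

    def __repr__(self):
        return self.brand
```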

The Steps

The processing operations performed during each stage of the pipeline are encapsulated in atomic Steps, i.e. each one either runs to completion or fails fatally.

They cannot have a mixed outcome, and every time a step process is executed a database entry is recorded, to help maintain reproducibility of results and to track errors.


The Loader is the very first step to be executed and the only one able to feed data into the pipeline's stream.

It must exist, and it can be hosted inside a cronjob or a similar task manager.
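An illustrative sketch of what a loader does (this is not CORRAL's actual API): each run, it registers whatever new data has arrived, so that later steps can pick it up from the database. Here a plain set stands in for the DB table of known files:

```python
# Hypothetical loader sketch: the only step that feeds new data into the
# pipeline's stream, by registering files that were not seen before.
import os

def load_new_images(registry, incoming_dir):
    """registry: set of already-registered paths (stands in for a DB table).
    Returns the paths newly registered on this run."""
    new = []
    for name in sorted(os.listdir(incoming_dir)):
        path = os.path.join(incoming_dir, name)
        if path not in registry:
            registry.add(path)   # in the real pipeline: session.add(...)
            new.append(path)
    return new
```

Because the loader is idempotent over already-seen files, it can safely be re-run by a cronjob at any interval.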

General Steps and branching

The general step entity is formalised in filters and connectors.

  • Filter: works by querying the SQL database, extracting data, and transforming it
  • Connector: works by loading the result of a filter operation into new database records

This can be seen as a closed ETL process.

Extract Transform and Load

This is a processing recipe in which data is queried from a DB, some processing takes place, and a new record is then registered into a DB (possibly the same one).
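The recipe can be sketched in a few lines; the example below is illustrative (it uses the stdlib sqlite3 module and invented table names rather than the pipeline's SQLAlchemy models), but it shows the full Extract-Transform-Load round trip into the same DB:

```python
# Minimal ETL round trip: Extract rows, Transform them, Load the results.
import math
import sqlite3

def etl_instrumental_mags(conn):
    # Extract: query the raw flux measurements
    rows = conn.execute("SELECT image_id, flux FROM raw_flux").fetchall()
    # Transform: convert each positive flux to an instrumental magnitude
    records = [(img, -2.5 * math.log10(flux)) for img, flux in rows if flux > 0]
    # Load: register the results as new records (the same DB, in this case)
    conn.executemany("INSERT INTO magnitudes (image_id, mag) VALUES (?, ?)",
                     records)
    conn.commit()
    return len(records)
```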


A line of the data stream can (and should!) be processed in parallel, as long as there is no concurrency conflict between sibling processes.

The Alerts

Whenever a previously defined condition is satisfied, or a preferred state is reached, an alert is triggered.

CORRAL can communicate these alerts by SMS, e-mail, or web services (like a Django web app).

Inside CORRAL any alert can be defined, and any channel accessible from Python can deliver the information.
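Conceptually, an alert is just a user-defined condition checked against the pipeline's records, with the delivery channel (SMS, e-mail, web service) as a separate, pluggable piece. A hypothetical sketch, with invented field names:

```python
# Hypothetical alert condition: fire when a detection is bright enough.
def transient_alert(detection, mag_limit=18.0):
    """Return an alert message if the candidate passes the cut, else None."""
    if detection["mag"] <= mag_limit:
        return ("ALERT: candidate at ra=%(ra).4f dec=%(dec).4f "
                "mag=%(mag).2f" % detection)
    return None
```

The message returned here could then be handed to any Python-compatible channel, e.g. smtplib for e-mail or an HTTP POST to a Django web app.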


This pipeline framework has been developed for TOROS, and it will be fully deployed in the next campaign.

  • CORRAL in general is a framework ideal for research, scientific DB hosting, and data mining.
  • It is able to work under many environment conditions, making it very flexible.
  • It is completely written in Python, so TORITOS does not depend on any other language or software.
  • We can guarantee code quality and reproducibility of results.
  • It provides parallelization of the processing chain out of the box.
  • It provides a customizable user interface without risking DB and process complexity.
  • It can grow organically, by adding new branches to existing pipelines.
  • It uses the state-of-the-art SQLAlchemy ORM, supporting many database backends.
  • It is One ring to rule them all