Experiment logger

The logger stores experimental data in a single SQLite database. It is intended to be fast and lightweight, while still recording all the necessary metadata and timestamps for experimental trials.

As a consequence, SQLiteBrowser can be used to browse the log results without writing any code.

Most of the entries are stored as JSON strings in the database tables; any object that can be serialised by Python's json module can be added directly.

Structure

  • Log ExperimentLog has a single master log which records all logged data as JSON (with a timestamp) in a single series. Entries are tagged with a stream name identifying the distinct sensor or input that produced them.

  • Session The log is indexed by sessions, where a session is a logical part of an experiment (a whole experiment, a condition, a repetition, etc.).

  • Metadata JSON metadata about any log stream, session, run or user, and about the whole dataset, can be recorded in the database, giving a single, consistent record of everything to do with the experimental trials.

Binding

ExperimentLog uses the idea of binding metadata to sessions. If a user takes part in an experiment, for example, you can create a metadata entry for that user and then bind it to the sessions that involve that user.

Session structures are hierarchical, and bindings apply to sessions and all of their children; so if a user is bound to an experiment, they are also bound to all the conditions, sub-conditions, repetitions, etc.
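
A minimal sketch of the pattern (the user name is illustrative, and the "Experiment" and "ConditionA" session prototypes are assumed to have been registered with create("SESSION", ...) as shown later in this document):

e = ExperimentLog(":memory:", ntp_sync=False)
# register metadata for a (hypothetical) user
e.create("USER", name="JS001", data={"age":30})
e.enter("Experiment", session="Experiment")
# bind the user to the experiment -- and thus to every condition,
# sub-condition and repetition inside it
e.bind("USER", "JS001")
e.enter("ConditionA", session="ConditionA")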

Runs

  • Run ExperimentLog also tracks runs of the experimental software. A run exists from the start of the experimental software until the process exits. Each session can be part of a single run, or a session can be spread over many runs (e.g. if only part of the data is collected at one time).
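
For example (a sketch; the filename is illustrative and run ids are allocated by the logger):

# first launch of the experimental software
e = ExperimentLog("study.db", ntp_sync=False)
print(e.run_id)   # the run id for this process
e.close()

# relaunching the software reopens the same database but starts a new run
e = ExperimentLog("study.db", ntp_sync=False)
print(e.run_id)   # a fresh run id
e.close()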

Debug logging

The logger also provides a custom logging handler for the standard Python logging module (via the get_logger() method), so that any debug messages can be stored in the DB and cross-referenced against experimental runs.

Simplest possible example

ExperimentLog allows the addition of metadata and structure, but doesn't mandate it. The simplest example would be something like:


In [1]:
from explogger import ExperimentLog
# log some JSON data
e = ExperimentLog(":memory:", ntp_sync=False)
e.log("mouse", data={"x":0, "y":0})
e.log("mouse", data={"x":0, "y":1})
e.log("mouse", data={"x":0, "y":2})
e.close()


09-20 15:05 [WARNI]  No stream mouse registered; creating a new blank entry

In [2]:
from explogger import extract
import logging

e = ExperimentLog(":memory:", ntp_sync=False)


# attach the SQL logging handler to the root logger
sql_handler = e.get_logger()
sql_handler.setLevel(logging.INFO)
log_formatter = logging.Formatter(fmt="%(asctime)s [%(levelname)-5.5s]  %(message)s",
                                 datefmt='%m-%d %H:%M')
sql_handler.setFormatter(log_formatter)
logging.getLogger().addHandler(sql_handler)

# use the logger
logging.info("Some information")
logging.info("Some more information")

# get the extracted logs as a line-separated single string
print("All logs")
print(extract.get_logs(e.cursor))

print("Just the log for this run")
print(extract.get_logs(e.cursor, run=e.run_id))

e.close()


09-20 15:05 [INFO ]  Some information
09-20 15:05 [INFO ]  Some more information
All logs
09-20 15:05 [INFO ]  Some information
09-20 15:05 [INFO ]  Some more information
Just the log for this run
09-20 15:05 [INFO ]  Some information
09-20 15:05 [INFO ]  Some more information

Using paths

A filesystem-like structure is provided to make it easy to separate data:


In [3]:
e = ExperimentLog(":memory:", ntp_sync=False)
e.cd("/Experiment/Condition1")
e.log("mouse", data={"x":0, "y":0})
e.log("mouse", data={"x":0, "y":1})
e.log("mouse", data={"x":0, "y":2})
e.cd("/Experiment/Condition2")
e.log("mouse", data={"x":0, "y":0})
e.log("mouse", data={"x":0, "y":1})
e.log("mouse", data={"x":0, "y":2})
e.close()


09-20 15:05 [WARNI]  No stream mouse registered; creating a new blank entry

In [4]:
import IPython.nbconvert
from IPython.core.display import HTML
def md_html(md):    
    return HTML(IPython.nbconvert.filters.markdown.markdown2html(md))


/Users/antoine/anaconda/envs/experimentlog/lib/python3.6/site-packages/IPython/nbconvert.py:13: ShimWarning: The `IPython.nbconvert` package has been deprecated since IPython 4.0. You should import from nbconvert instead.
  "You should import from nbconvert instead.", ShimWarning)

In [5]:
from explogger import report

In [6]:
md_html(report.string_report(e.cursor))


Out[6]:

Report generated for none


Report date: Wed Sep 20 15:05:05 2017


Runs

  • Number of runs: 1
  • Total duration recorded: 0.0 seconds
  • Dirty exits: 0


Users

  • Unique users: 0

Log

  • Log streams recorded: mouse

mouse

JSON

    null

  • Total entries: 6

A more complex example


In [7]:
import numpy as np
# open a connection to a database; it will be created if it does not exist.
# here we use an in-memory database so the results are not stored to disk
e = ExperimentLog(":memory:", ntp_sync=False)

Setting up the database

When a log is set up for the first time, the database needs to be configured for the experimental sessions.

Each sensor/information stream can be registered with the database. This could be individual sensors like a mouse (x,y) time series, or questionnaire results.


In [8]:
# check if we've already set everything up
# note we use the special .meta field to access persistent metadata
if e.meta.stage=="init":
    e.create("STREAM", name="mouse", description="A time series of x,y cursor positions",
                   # the data is optional, and can contain anything you want 
                  data={
                    "sample_rate": 60,
                    "dpi": 3000,
                    "mouse_device":"Logitech MX600"})
    
    # and a post-condition questionnaire
    e.create("STREAM", name="satisfaction", 
                   description="A simple satisfaction score",
                   # here, we store the questions used for future reference
                  data={
                    "questions":["How satisfied were you with your performance?",
                                "How satisfied were you with the interface?"]}
                    )

Sessions

ExperimentLog uses the concept of sessions to manage experimental data. Sessions are much like folders in a filesystem and usually form a hierarchy, for example:

/
    Experiment1/
        ConditionA/
            0/
            1/
            2/
        ConditionB/
            0/
            1/
            2/

    Experiment2/
        ConditionA/
            0/
            1/
            2/
            3/
        ConditionC/
            0/
            1/
            2/
            3/

Each session can have metadata attached to it; for example giving the parameters for a given condition.

When an experiment is run, instances of sessions are created, like files inside the filesystem.


In [9]:
if e.meta.stage=="init":
    # We'll register an experiment, with three different conditions
    e.create("SESSION", "Experiment", description="The main experiment", 
                           data={"target_size":40.0, "cursor_size":5.0})
    e.create("SESSION","ConditionA",description="Condition A:circular targets", 
                           data={"targets":["circle"]})
    e.create("SESSION","ConditionB", description="Condition B:square targets", 
                           data={"targets":["square"]})
    e.create("SESSION","ConditionC", description="Condition C:mixed targets", 
                           data={"targets":["circle","square"]})

We'd usually want to do this metadata creation only once for a given database; the fact that setup is complete can be recorded by changing the database stage:


In [10]:
# mark the database as ready to log data
# meta is a special field that looks like an object, but is actually backed
# by the database. Any field can be read or written, as long as the value
# can be dumped to JSON
e.meta.stage="setup"

Users

Each instance of a session (usually) involves experimental subjects. Each user should be registered, and then attached to a recording session. Multiple users can be attached to one session (e.g. for experiments with groups) but normally there will just be one user.

The pseudo module can generate pronounceable, random, verifiable pseudonyms for subjects.


In [11]:
from explogger import pseudo
user = pseudo.get_pseudo()
print(user)


HAZUN-OHAVA

In [12]:
# now register the user with the database
e.create("USER", name=user, data={"age":30, "leftright":"right"})

In [13]:
# note that passing the session= parameter automatically
# binds that session prototype's metadata as the session is entered
e.enter("Experiment", session="Experiment")
# attach the user to this experiment, and thus to all conditions, etc.
e.bind("USER", user)
e.enter("ConditionA", session="ConditionA")

# calling enter() without any argument creates a numbered repetition (in this case, 0)
e.enter()

In [14]:
print(e.session_path)
print(e.bindings)


/Experiment/ConditionA/0/
{MetaTuple(mtype='SESSION', name='Experiment', type='', description='The main experiment', json='{"target_size": 40.0, "cursor_size": 5.0}'), MetaTuple(mtype='SESSION', name='ConditionA', type='', description='Condition A:circular targets', json='{"targets": ["circle"]}'), MetaTuple(mtype='USER', name='HAZUN-OHAVA', type='', description='', json='{"age": 30, "leftright": "right"}')}

In [15]:
# log some data
e.log("mouse", data={"x":0, "y":10})
e.log("mouse", data={"x":0, "y":20})


Out[15]:
2

Test how fast we can write into the database:


In [16]:
%%timeit -n 50000
e.log("mouse", data={"x":20, "y":20})


32.8 µs ± 253 ns per loop (mean ± std. dev. of 7 runs, 50000 loops each)

In [17]:
# log questionnaire output
e.log("satisfaction", data={"q1":4,"q2":5})


Out[17]:
350003

In [18]:
# leave this repetition
e.leave() 

# move out of condition A
e.leave()

In [19]:
e.enter("ConditionB")

In [ ]:
# could log more stuff...

In [20]:
from explogger import ExperimentLog, np_to_str  # str_to_np is also available

x = np.random.uniform(-1,1,(16,16))
# if we need to attach binary data to a log entry (e.g. an image), we can do this;
# in general, it is best to avoid using blobs unless absolutely necessary
i = e.log("important_matrix", binary=np_to_str({"matrix":x}))


09-20 15:05 [WARNI]  No stream important_matrix registered; creating a new blank entry
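
To read the blob back later, look it up via the log entry id returned by log(). A sketch, assuming str_to_np is the inverse of np_to_str:

from explogger import str_to_np
# follow the log entry's binary reference into the binary table
blob = e.cursor.execute("SELECT binary FROM binary WHERE id=(SELECT binary FROM log WHERE id=?)",
                        (i,)).fetchone()[0]
matrix = str_to_np(blob)["matrix"]   # should recover the original array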

In [21]:
# back to the root -- here we mark this session (ConditionB) as being invalid.
e.leave(valid=False)
e.leave()

# end the run; normally you would not need to do this, since
# e.close() does this automatically -- but here we keep the DB
# open to make it quicker to demo querying it
e.end()

In [22]:
# print some results with raw SQL queries
mouse_log = e.cursor.execute("SELECT time, json FROM mouse", ())
print("\n".join([str(m) for m in mouse_log.fetchone()]))


1505916309.075445
{"x": 0, "y": 10}

In [27]:
md_html(report.string_report(e.cursor))


Out[27]:

Report generated for none


Report date: Wed Sep 20 15:05:37 2017


Runs

  • Number of runs: 1
  • Total duration recorded: 20.6 seconds
  • Dirty exits: 0


Users

  • Unique users: 1

HAZUN-OHAVA

JSON

    {
        "age": 30,
        "leftright": "right"
    }

Duration recorded: 51.933205127716064 seconds


Log

  • Log streams recorded: mouse,satisfaction,important_matrix

mouse

A time series of x,y cursor positions

JSON

    {
        "dpi": 3000,
        "mouse_device": "Logitech MX600",
        "sample_rate": 60
    }

  • Total entries: 350002

satisfaction

A simple satisfaction score

JSON

    {
        "questions": [
            "How satisfied were you with your performance?",
            "How satisfied were you with the interface?"
        ]
    }

  • Total entries: 1

important_matrix

JSON

    null

  • Total entries: 1

Post-processing

Once all data is logged, it is wise to add indices so that logs can be accessed quickly.


In [28]:
# should only do this when all data is logged; otherwise there may be
# a performance penalty
e.add_indices()

SQL format

There are a few basic tables in an ExperimentLog database:

Metadata

meta: 
    id,          Unique ID
    mtype,       Type of this metadata: one of STREAM, SESSION, USER, EQUIPMENT, DATASET, PATH
    name,        Name of the object, e.g. user pseudonym
    type,        (Optional) type tag
    description, (Optional) text description
    json,        (Optional) JSON string holding any other metadata.

Each row holds the metadata for a stream, session, user, path, piece of equipment or whole dataset; mtype specifies which kind of metadata it is. There are convenience views of this table:

stream, mtype=STREAM
users, mtype=USER
session_meta, mtype=SESSION
equipment, mtype=EQUIPMENT
dataset, mtype=DATASET
path, mtype=PATH

All have the same fields as above.
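
For example, the registered users can be listed straight from the users view:

# print every registered user and their JSON metadata
for name, json_data in e.cursor.execute("SELECT name, json FROM users"):
    print(name, json_data)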

Session

session: 
    id,          Unique ID
    start_time,  Time this session was started
    end_time,    Time this session was completed (if it was)
    test_run,    If this is a test run or not
    random_seed, Random seed used for this session can be stored here
    valid,       If this session was marked valid or not
    complete,    If this session was marked completed or not
    parent,      ID of the session this session is a subsession of
    path,        ID of the full path this session belongs to
    json,        Any additional metadata

run_session: (maps sessions to runs)
    id,          Unique ID
    run,         ID of the run
    session,     ID of the session

meta_session:
    id,          Unique ID
    meta,        ID of the metadata
    session,     Session this is bound to
    time,        Time at which this metadata was bound

Logs

log:
    id,         Unique ID
    time,       Timestamp
    valid,      Valid flag for this data (e.g. to mark faulty sensor data)
    stream,     ID of the stream this log belongs to
    session,    ID of the session this log entry belongs to
    json,       The log entry itself        
    tag,        (optional) tag for this log entry
    binary,     (optional) ID of the binary table entry 


binary:
    id,        Unique ID
    binary,    Blob representation of binary values
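
As a sketch of how these tables join together, the following raw query pulls the time and JSON of every "mouse" entry recorded in sessions marked valid (assuming valid is stored as 1/0):

rows = e.cursor.execute("""
    SELECT log.time, log.json
    FROM log
    JOIN meta ON log.stream = meta.id
    JOIN session ON log.session = session.id
    WHERE meta.name = 'mouse' AND session.valid = 1
""").fetchall()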


Debug logs

debug_logging:
    id,       Unique row ID
    time,     Timestamp
    record,   String from the log formatter
    run,      ID of the run this log belongs to
    level,    The numeric code of the debugging level (see the stdlib logging module docs for details)

Custom tables

If you want to log values in a custom table, where the fields are not just plain JSON, you can add a new table to the database and link its rows to the main log. The log() function returns the ID of the new log entry; use this as a foreign key in the new table.

Example:


In [29]:
# make the new table -- must have a reference to the main
# log table
e.execute("""CREATE TABLE accelerometer 
          (id INTEGER PRIMARY KEY, device INT, x REAL, y REAL, z REAL, log INT,
          FOREIGN KEY(log) REFERENCES log(id))
          """)

# register a new stream
e.create("STREAM", name="acc", description="A time series of accelerometer values")

# now log a new value, put it into the separate accelerometer table and link
# it to the main log
def log_acc(dev,x,y,z):
    log_id = e.log("acc")
    e.execute("INSERT INTO accelerometer VALUES (?,?,?,?,?)", 
              (dev, x, y, z, log_id))
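
For example, a single reading from a (hypothetical) device 0 would then be logged as:

log_acc(0, 0.01, -0.02, 9.81)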

Sync points

If you are recording media alongside an experimental trial (e.g. a video), the sync_ext() method can be used to record the link between external media and the master log.
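
The queries below assume that a sync point was logged with a tag when the recording started. A minimal sketch (the stream name and tag are illustrative; a tag= argument to log() corresponds to the tag column of the log table):

# at the instant the camera starts recording, log a tagged sync entry
e.log("video", tag="video_sync_mark")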

Usage:

# look up a log entry with a sync point
t = e.execute('SELECT time FROM log WHERE tag="video_sync_mark"').fetchone()[0]
sync_ext("videos/myvideo.mp4", start_time=t)

If you want to record a segment of a video as being aligned:

t = e.execute('SELECT time FROM log WHERE tag="video_sync_mark"').fetchone()[0]
# marks a synchronisation of myvideo.mp4, from 20.0 -> 25.0 to the log time starting at t
sync_ext("videos/myvideo_002.mp4", start_time=t, duration=5.0, media_start_time=20.0)

NTP

To ensure logs are consistent in their timings, ExperimentLog will try to sync to an NTP server on start-up and will record all times with the estimated clock offset already applied.

If you pass ntp_sync=False to the ExperimentLog constructor, this will be skipped. Custom NTP servers can also be passed as a list:

# don't sync to NTP (not recommended)
e = ExperimentLog(ntp_sync=False)

# use custom NTP servers
e = ExperimentLog(ntp_servers=["1.pool.ntp.org", "2.pool.ntp.org"])

Whole-dataset metadata

Dataset-wide metadata can be set using the special .meta field of ExperimentLog, which is backed to the database. The report auto-generator can use this to build automatic readme files suitable for deposit for open access.

The following fields should be set:

  • title (title of this dataset)
  • institution (institution(s) this dataset was recorded by)
  • authors (comma separated list of authors)
  • license (e.g. CC-BY-SA 2.0)
  • confidential (e.g. No, InternalOnly, ConsortiumOnly, Confidential, StrictlyConfidential, etc.)
  • funder (name of funder and project name/code)
  • ethics (ethics board approval number)
  • paper (full details of associated paper)
  • short_description (one sentence description of the data set)
  • description (longer description of the dataset)
  • doi (DOI of this dataset)

In [30]:
e.meta.title="TestSet-1"
e.meta.institution="University of Glasgow"
e.meta.funder="ABC:XXX:101"
e.meta.ethics="CSEnnnn"
e.meta.authors="John Williamson"
e.meta.license="CC-BY-SA 2.0"
e.meta.confidential="no"
e.meta.paper="'A good paper', Williamson J., Proceedings of Things International 2016, pp.44-46"
e.meta.description="A study of the experimental logging process. Includes numerous repetitive examples of simple logged data."
e.meta.short_description="A quick logging test."
e.meta.doi= "DOI:xxxxxx"

In [31]:
print(dir(e.meta))


['authors', 'confidential', 'description', 'doi', 'ethics', 'funder', 'institution', 'license', 'paper', 'short_description', 'stage', 'title']

In [32]:
md_html(report.string_readme(e.cursor))


Out[32]:

TestSet-1

A quick logging test.

DOI: DOI:xxxxxx


Report date: Wed Sep 20 15:05:50 2017


Authors: John Williamson

Institution: University of Glasgow

License: CC-BY-SA 2.0


Confidential: no


Funded by: ABC:XXX:101

Associated paper: 'A good paper', Williamson J., Proceedings of Things International 2016, pp.44-46

Ethics board approval number: CSEnnnn


Description A study of the experimental logging process. Includes numerous repetitive examples of simple logged data.


  • Data acquired from 20 September 2017 to 20 September 2017

  • Number of experimental runs: 1
  • Total duration recorded: 20.6 seconds
  • Number of users: 1
  • Total logged entries: 350004

