The logger stores experimental data in a single SQLite database. It is intended to be fast and lightweight, but record all necessary meta data and timestamps for experimental trials.
As a consequence, SQLiteBrowser can be used to browse the log results without having to do any coding.
Most of the entries are stored as JSON strings in the database tables; any object that can be serialised by Python's json
module can be added directly.
Log ExperimentLog has a single master log which records all logged data as JSON (with a timestamp) in a single series. The log is annotated with different streams that represent distinct sensors or inputs.
Session The log is indexed by sessions, where a session is a logical part of an experiment (a whole experiment, a condition, a repetition, etc.).
Metadata JSON Metadata about any log stream, session, run, user and the whole dataset can be recorded in the database, so there is a single, consistent record of everything to do with the experimental trials.
ExperimentLog uses the idea of binding metadata to sessions. So if you have a user who is doing an experiment, you can create a metadata entry for that user, and then bind it to the sessions that involve that user.
Session structures are hierarchical, and bindings apply to sessions and all of their children; so if a user is bound to an experiment, they are also bound to all the conditions, sub-conditions, repetitions, etc.
The logger also provides a custom logging handler for the standard Python logging
module (via the get_logger()
method), so that any debug messages can be stored in the DB and cross-referenced against experimental runs.
In [1]:
# import sqlexperiment as sqle
# from sqlexperiment import experimentlog
from explogger import ExperimentLog
# log some JSON data
e = ExperimentLog(":memory:", ntp_sync=False)
e.log("mouse", data={"x":0, "y":0})
e.log("mouse", data={"x":0, "y":1})
e.log("mouse", data={"x":0, "y":2})
e.close()
In [2]:
# from experimentlog import ExperimentLog
from explogger import extract
import logging
e = ExperimentLog(":memory:", ntp_sync=False)
## shows how to add the SQL logger
sql_handler = e.get_logger()
sql_handler.setLevel(logging.INFO)
log_formatter = logging.Formatter(fmt="%(asctime)s [%(levelname)-5.5s] %(message)s",
datefmt='%m-%d %H:%M')
sql_handler.setFormatter(log_formatter)
logging.getLogger().addHandler(sql_handler)
# use the logger
logging.info("Some information")
logging.info("Some more information")
# get the extracted logs as a line-separated single string
print("All logs")
print(extract.get_logs(e.cursor))
print("Just the log for this run")
print(extract.get_logs(e.cursor, run=e.run_id))
e.close()
In [3]:
e = ExperimentLog(":memory:", ntp_sync=False)
e.cd("/Experiment/Condition1")
e.log("mouse", data={"x":0, "y":0})
e.log("mouse", data={"x":0, "y":1})
e.log("mouse", data={"x":0, "y":2})
e.cd("/Experiment/Condition2")
e.log("mouse", data={"x":0, "y":0})
e.log("mouse", data={"x":0, "y":1})
e.log("mouse", data={"x":0, "y":2})
e.close()
In [4]:
import IPython.nbconvert
from IPython.core.display import HTML
def md_html(md):
return HTML(IPython.nbconvert.filters.markdown.markdown2html(md))
In [5]:
from explogger import report
In [6]:
md_html(report.string_report(e.cursor))
Out[6]:
In [7]:
# from experimentlog import ExperimentLog, np_to_str, str_to_np
import numpy as np
## open a connection to a database; will be created if it does not exist.
# here we use a memory database so the results are not stored to disk
e = ExperimentLog(":memory:", ntp_sync=False)
In [8]:
# check if we've already set everything up
# note we use the special .meta field to access persistent metadata
if e.meta.stage=="init":
e.create("STREAM", name="mouse", description="A time series of x,y cursor positions",
# the data is optional, and can contain anything you want
data={
"sample_rate": 60,
"dpi": 3000,
"mouse_device":"Logitech MX600"})
# and a post-condition questionnaire
e.create("STREAM", name="satisfaction",
description="A simple satisfaction score",
# here, we store the questions used for future reference
data={
"questions":["How satisfied were you with your performance?",
"How satisfied were you with the interface?"]}
)
Each session can have metadata attached to it; for example giving the parameters for a given condition.
When an experiment is run, instances of sessions are created, like files inside the filesystem.
In [9]:
if e.meta.stage=="init":
# We'll register an experiment, with three different conditions
e.create("SESSION", "Experiment", description="The main experiment",
data={"target_size":40.0, "cursor_size":5.0})
e.create("SESSION","ConditionA",description="Condition A:circular targets",
data={"targets":["circle"]})
e.create("SESSION","ConditionB", description="Condition B:square targets",
data={"targets":["square"]})
e.create("SESSION","ConditionC", description="Condition C:mixed targets",
data={"targets":["circle","square"]})
We'd usually only want to do this metadata creation once-ever; this setup procedure can be recorded by changing the database stage:
In [10]:
# mark the database as ready to log data
# meta is a special field that looks like an object, but is actually backed
# onto the database. Any field can be read or written to, as long as the value
# can be dumped to JSON
e.meta.stage="setup"
Each instance of a session (usually) involves experimental subjects. Each user should be registered, and then attached to a recording session. Multiple users can be attached to one session (e.g. for experiments with groups) but normally there will just be one user.
The pseudo
module can generate pronounceable, random, verifiable pseudonyms for subjects.
In [11]:
from explogger import pseudo
user = pseudo.get_pseudo()
print(user)
In [12]:
# now register the user with the database
e.create("USER", name=user, data={"age":30, "leftright":"right"})
In [13]:
# note that passing the session="" parameter automatically
# binds to that session prototype at the start of the session
e.enter("Experiment", session="Experiment")
# attach the user to this experiment, and thus to all conditions, etc.
e.bind("USER", user)
e.enter("ConditionA", session="ConditionA")
# calling enter() without any argument creates a numbered repetition (in this case, 0)
e.enter()
In [14]:
print(e.session_path)
print(e.bindings)
In [15]:
# log some data
e.log("mouse", data={"x":0, "y":10})
e.log("mouse", data={"x":0, "y":20})
Out[15]:
Test how fast we can write into the database:
In [16]:
%%timeit -n 50000
e.log("mouse", data={"x":20, "y":20})
In [17]:
# log questionnaire output
e.log("satisfaction", data={"q1":4,"q2":5})
Out[17]:
In [18]:
# leave this repetition
e.leave()
# move out of condition A
e.leave()
In [19]:
e.enter("ConditionB")
In [ ]:
# could log more stuff...
In [20]:
from explogger import ExperimentLog, np_to_str#, str_to_np
x = np.random.uniform(-1,1,(16,16))
# if we need to attach binary data to a log file (e.g. an image), we can do this:
# in general, it is best to avoid using blobs unless absolutely necessary
i = e.log("important_matrix", binary=np_to_str({"matrix":(x)}))
In [21]:
# back to the root -- here we mark this session (ConditionB) as being invalid.
e.leave(valid=False)
e.leave()
# end the run; normally you would not need to do this, since
# e.close() does this automatically -- but here we keep the DB
# open to make it quicker to demo querying it
e.end()
In [22]:
# print some results with raw SQL queries
mouse_log = e.cursor.execute("SELECT time, json FROM mouse", ())
print("\n".join([str(m) for m in mouse_log.fetchone()]))
In [25]:
from explogger import report
In [26]:
import IPython.nbconvert
from IPython.core.display import HTML
def md_html(md):
return HTML(IPython.nbconvert.filters.markdown.markdown2html(md))
In [27]:
md_html(report.string_report(e.cursor))
Out[27]:
In [28]:
# should only do this when all data is logged; otherwise there may be
# a performance penalty
e.add_indices()
There are a few basic tables in the ExperimentLog:
meta:
id, Unique ID
mtype, Type of this metadata: one of LOG, SESSION, USER, PATH
name, Name of the object, e.g. user pseudonym
type, (Optional) type tag
description, (Optional) text description
json (Optional) JSON string holding any other metadata.
The metadata for a log, session or user, path. mtype
specifies the kind of metadata it is. There are convenience views of this table:
stream, mtype=STREAM
users, mtype=USER
session_meta, mtype=SESSION
equipment, mtype=EQUIPMENT
dataset, mtype=DATASET
path, mtype=PATH
All have the same fields as above.
session:
id, Unique ID
start_time, Time this session was started
end_time, Time this session was completed (if it was)
test_run, If this is a test run or not
random_seed, Random seed used for this session can be stored here
valid, If this session was marked valid or not
complete, If this session was marked completed or not
parent, ID of the session this session is a subsession of
path, ID of the full path this session belongs to
json, Any additional metadata
run_session: (maps sessions to runs)
id, Unique ID
run, ID of the run
session, ID of the session
meta_session:
id, Unique ID,
meta, ID of the metadata
session, Session this is bound to
time, Time at which this metadata was bound
log:
id, Unique ID
time, Timestamp
valid, Valid flag for this data (e.g. to mark faulty sensor data)
stream, ID of the stream this log belongs to
session, ID of the session this log entry belongs to
json, The log entry itself
tag, (optional) tag for this log entry
binary, (optional) ID of the binary table entry
binary:
id, Unique ID
binary, Blob representation of binary values
debug_logging:
id, Unique row ID
time, Timestamp
record, String from the log formatter
run, ID of the run this log belongs to
level, The numeric code of the debugging level (see stdlib logging module docs for detail)
In [29]:
# make the new table -- must have a reference to the main
# log table
e.execute("""CREATE TABLE accelerometer
(id INTEGER PRIMARY KEY, device INT, x REAL, y REAL, z REAL, log INT,
FOREIGN KEY(log) REFERENCES log(id))
""")
# register a new stream
e.create("STREAM", name="acc", description="A time series of accelerometer values")
# now log a new value, put it into the separate accelerometer table and link
# it to the main log
def log_acc(dev,x,y,z):
log_id = e.log("acc")
e.execute("INSERT INTO accelerometer VALUES (?,?,?,?,?)",
(dev, x, y, z, log_id))
If you are recording media alongside an experimental trial (e.g. a video), you can use sync_ext() to record the link between external media and the master log.
Usage:
# look up a log entry with a sync point
t = e.execute('SELECT time FROM log WHERE tag="video_sync_mark"').fetchone()[0]
sync_ext("videos/myvideo.mp4", start_time=t)
If you want to record a segment of a video as being aligned:
t = e.execute('SELECT time FROM log WHERE tag="video_sync_mark"').fetchone()[0]
# marks a synchronisation of myvideo.mp4, from 20.0 -> 25.0 to the log time starting at t
sync_ext("videos/myvideo_002.mp4", start_time=t, duration=5.0, media_start_time=20.0)
To ensure logs are consistent in their timings, ExperimentLog
will try and sync to an NTP server on start-up and will record all times with the estimated clock offset already applied.
If you pass ntp_sync=False
to the ExperimentLog constructor, this will be skipped. Custom NTP servers can also be passed as a list:
# don't sync to NTP (not recommended)
e = ExperimentLog(ntp_sync=False)
# use custom NTP servers
e = ExperimentLog(ntp_servers=["1.pool.ntp.org", "2.pool.ntp.org"])
Dataset-wide metadata can be set using the special .meta
field of ExperimentLog, which is backed to the database. The report auto-generator can use this to build automatic readme files suitable for deposit for open access.
The following fields should be set:
In [30]:
e.meta.title="TestSet-1"
e.meta.institution="University of Glasgow"
e.meta.funder="ABC:XXX:101"
e.meta.ethics="CSEnnnn"
e.meta.authors="John Williamson"
e.meta.license="CC-BY-SA 2.0"
e.meta.confidential="no"
e.meta.paper="'A good paper', Williamson J., Proceedings of Things International 2016, pp.44-46"
e.meta.description="A study of the experimental logging process. Includes numerous repetitive examples of simple logged data."
e.meta.short_description="A quick logging test."
e.meta.doi= "DOI:xxxxxx"
In [31]:
print(dir(e.meta))
In [32]:
md_html(report.string_readme(e.cursor))
Out[32]:
In [ ]:
In [ ]: