In [ ]:
import sys
import inspect
from unittest import mock

orig_stack = inspect.stack

def stack():
    s = orig_stack()
    i = len(s) - 1
    while i > 0 and not s[i][1].startswith("<ipython-input"):
        i -= 1
    
    return s[:i + 1]

inspect.stack = stack

with mock.patch('seqtools.errors.inspect', inspect):
    import seqtools

Error handling and debugging

Mistakes and programming errors are relatively frequent when designing a transformation pipeline. SeqTools tries to recover from them and to report as much useful information as possible. This tutorial reviews some details of its internal error management and should facilitate your debugging sessions.

Tracing mapping errors

Due to on-demand execution, an error generated by mapping a function to a dataset will only happen when a problematic element is read, not when the data container is defined.

By default, SeqTools raises an EvaluationError in such cases, and sets the original exception as its cause. Let's observe this in action:


In [ ]:
import math
import random
import seqtools

def f1(x):
    return math.sqrt(x)  # this will fail for negative values

def f2(x):
    return x + 1

data = [random.randint(0, 100) for _ in range(100)]
data[-1] *= -1

out = seqtools.smap(f1, data)
out = seqtools.smap(f2, out)
out = seqtools.smap(f1, out)

Due to on-demand execution, we are still unaware of the upcoming failure for the last item. But when we evaluate the items...


In [ ]:
list(out)

The ValueError that caused the failure is detailed first. Unfortunately, the origin of the error is ambiguous because f1 is used twice in this pipeline. To alleviate this issue, the exception message of the resulting EvaluationError provides additional clarifications: it tells where the failing mapping was defined, here on the first mapping of f1.
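
The wrapping behaviour described above relies on standard Python exception chaining. The following is a minimal sketch of the idea using only the standard library; it is not SeqTools' implementation. `LazyMap` and the `EvaluationError` class defined here are hypothetical stand-ins introduced for illustration.

```python
import math


class EvaluationError(Exception):
    """Hypothetical stand-in for the wrapper exception; not SeqTools' own class."""


class LazyMap:
    """Hypothetical lazy mapping: func is applied only when an item is read."""

    def __init__(self, func, source):
        self.func = func
        self.source = source

    def __len__(self):
        return len(self.source)

    def __getitem__(self, index):
        try:
            return self.func(self.source[index])
        except Exception as error:
            # `raise ... from` stores the original exception in __cause__.
            raise EvaluationError(
                f"failed to evaluate item {index} with {self.func.__name__}"
            ) from error


values = LazyMap(math.sqrt, [4, 9, -1])  # no error yet: nothing is evaluated
first = values[0]  # valid items are read normally

try:
    values[2]  # the failure only surfaces when the bad item is read
except EvaluationError as error:
    cause = error.__cause__  # the original ValueError raised by math.sqrt
```

With chaining, the traceback printed for the wrapper also shows the original exception, and code that catches the wrapper can still reach the original through `__cause__`.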

If you prefer to work with the original exception directly and skip the EvaluationError, you can enable the 'passthrough' error mode, which does just that:


In [ ]:
seqtools.seterr(evaluation='passthrough')

list(out)

In [ ]:
seqtools.seterr(evaluation='wrap')  # revert to normal behaviour

Errors inside workers

Background workers used by prefetch or load_buffers do not share the execution space of the main program. As a result, exceptions raised while evaluating an element inside a worker happen asynchronously and separately from the main program.

To hide this aspect from end-users, SeqTools tries to store errors and report them when appropriate. Let's see this in action:


In [ ]:
import time

fast_out = seqtools.prefetch(out, max_buffered=10)

# evaluate all elements normally
for i in range(len(fast_out) - 1):
    fast_out[i]

time.sleep(1)

All valid elements were read normally. After waiting one extra second, we can be certain that the final element has been evaluated (and has raised a ValueError), but no error is reported until an explicit read is attempted:


In [ ]:
fast_out[-1]

Note that the workers will continue working just fine after the error. If desired, you can catch the exception and continue reading other values.
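
The same store-then-report pattern can be illustrated with the standard library alone. The sketch below uses `concurrent.futures` rather than SeqTools workers, so it only demonstrates the general idea: an exception raised in the background is kept with the failed item and re-raised when that item is read, while other items stay readable.

```python
import math
from concurrent.futures import ThreadPoolExecutor

data = [4, -1, 9]

# Submit every item to a background thread. An exception raised by a task
# is stored in its future instead of propagating immediately.
with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(math.sqrt, x) for x in data]

results = []
for future in futures:
    try:
        # result() re-raises the stored exception in the calling thread.
        results.append(future.result())
    except ValueError:
        results.append(None)  # record the failure and keep reading
```

Here the failing item is replaced by `None`; the items on either side of it are unaffected.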

This approach greatly facilitates debugging but has limitations to be aware of:

  • Process-based workers cannot save errors that cannot be pickled, in particular exception types defined inside a function.
  • Error tracebacks must be saved and then reconstructed. Some information is lost during this process and you won't be able to navigate inside the execution stack with a debugger.
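
Both limitations can be reproduced with the standard library, independently of SeqTools. In this sketch, `make_local_error` and `can_pickle` are hypothetical helpers written for the illustration: the first builds an exception whose type is defined inside a function, the second checks whether `pickle` can serialize an exception instance.

```python
import pickle
import traceback


def make_local_error():
    """Return an instance of an exception type defined inside a function."""

    class LocalError(Exception):
        pass

    return LocalError("defined inside a function")


def can_pickle(error):
    """Report whether an exception instance survives pickle.dumps."""
    try:
        pickle.dumps(error)
    except (pickle.PicklingError, AttributeError):
        # The exact exception type depends on the Python version.
        return False
    return True


module_level_ok = can_pickle(ValueError("math domain error"))
local_ok = can_pickle(make_local_error())

# A traceback can be saved as text, but the text no longer references the
# live frames that a debugger would need to inspect.
try:
    int("not a number")
except ValueError as error:
    saved = "".join(
        traceback.format_exception(type(error), error, error.__traceback__)
    )
```

Defining custom exception types at module level avoids the first limitation, since `pickle` locates classes by their importable name.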