
Today I worked on classifying the error logs by type. I wrote simple regex checks like this one:

import re

def has_missing_dependency(log):
    # search (not match) so the pattern is found anywhere in a line, not only at its start
    p = re.compile("ImportError:")
    return any(p.search(line) for line in log)

I then labelled each failure and counted the number of failures in each category. Here are the results: out of 500 recipe builds, 174 failed.

No recipe available: 21/174

No packages found in current linux-64 channels: 30/174

missing build dependency: 54/174

test failure: missing dependency: 37/174

test failure: other reasons: 14/174

invalid syntax: 4/174

unclassified: 14/174
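The labelling and counting step could look roughly like the sketch below. It is only an illustration: the `RULES` table, its patterns, the label names, and the example logs are all made up here, since the actual patterns for each category are not shown in this post. Each log is assumed to be a list of lines, and the first rule that matches wins.

```python
import re
from collections import Counter

# Hypothetical (label, pattern) pairs, checked in order.
# The real patterns used for each category are not given in the post.
RULES = [
    ("missing dependency", re.compile(r"ImportError:")),
    ("invalid syntax", re.compile(r"SyntaxError:")),
]

def classify(log):
    """Return the label of the first rule matching any line, else 'unclassified'."""
    for label, pattern in RULES:
        if any(pattern.search(line) for line in log):
            return label
    return "unclassified"

def count_failures(logs):
    """Count how many logs fall into each category."""
    return Counter(classify(log) for log in logs)

# Made-up example logs, each one a list of lines.
example_logs = [
    ["Traceback (most recent call last):", "ImportError: No module named foo"],
    ['  File "setup.py", line 3', "SyntaxError: invalid syntax"],
    ["something else went wrong"],
]
counts = count_failures(example_logs)
```

Keeping the rules in an ordered list means a log that matches several patterns is counted once, under the first matching label, so the per-category counts always add up to the number of failed builds.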

Classifying the logs brought more structure to the problem: now I know which type of error to target first. Then, step by step, I can tackle all of them, or rather most of them. This is the reductionist approach: break the problem into small, manageable chunks and eat them one at a time.
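To decide which category to target first, the counts above can simply be sorted. This is a small sketch using the figures from this post; the variable names are my own.

```python
# Failure counts per category, copied from the results above.
failure_counts = {
    "No recipe available": 21,
    "No packages found in current linux-64 channels": 30,
    "missing build dependency": 54,
    "test failure: missing dependency": 37,
    "test failure: other reasons": 14,
    "invalid syntax": 4,
    "unclassified": 14,
}

total = sum(failure_counts.values())  # 174 failed builds

# Largest category first: this is the order in which to attack the problem.
ranked = sorted(failure_counts.items(), key=lambda item: item[1], reverse=True)

for label, n in ranked:
    print(f"{label}: {n}/{total} ({n / total:.0%})")
```

The two missing-dependency categories together account for 91 of the 174 failures, so they look like the natural place to start.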