Today I worked on classifying the error logs by type of failure. I wrote simple regex checks like this one:
    import re

    def has_missing_dependency(log):
        # search, not match: the error text may not sit at the very start of a line
        p = re.compile("ImportError:")
        return any(p.search(line) for line in log)
I then labelled each failure and counted the number of failures in each category. Here are the results: out of 500 recipe builds, 174 failed.
No recipe available: 21/174
No packages found in current linux-64 channels: 30/174
Missing build dependency: 54/174
Test failure, missing dependency: 37/174
Test failure, other reasons: 14/174
Invalid syntax: 4/174
Unclassified: 14/174
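The labelling and counting step could look roughly like the sketch below. It is not the exact script I ran: only the "ImportError:" pattern comes from the check above, while the "SyntaxError:" pattern, the category names in `CATEGORIES`, and the helpers `classify` and `count_failures` are made up for illustration. Each log is assumed to be a list of lines.

```python
import re
from collections import Counter

# Ordered (label, pattern) pairs; the first pattern that hits wins.
# Only "ImportError:" is taken from the post; the rest is an assumption.
CATEGORIES = [
    ("missing dependency", re.compile(r"ImportError:")),
    ("invalid syntax", re.compile(r"SyntaxError:")),
]


def classify(log_lines):
    """Return the label of the first category found in any line of the log."""
    for label, pattern in CATEGORIES:
        if any(pattern.search(line) for line in log_lines):
            return label
    return "unclassified"


def count_failures(logs):
    """Count how many failed build logs fall into each category."""
    return Counter(classify(log_lines) for log_lines in logs)
```

Calling `count_failures` on the list of failed logs gives a `Counter` keyed by label, and `most_common()` on it lists the largest categories first, which is the order I want to attack them in.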
Classifying the logs brought more structure to the problem: now I know which type of error to target first. Then, step by step, I can take on all of them, or rather most of them. This is the reductionist approach: break the problem into small, manageable chunks and eat them one at a time.