Repair exports with bad mappings

This notebook repairs an export from a Raveler session with bad mappings. Specifically, it addresses what happens when a failed superpixel split causes superpixel bitmaps to be rewritten while the corresponding sp-seg maps are left unchanged. The session will fail to save fully and will be corrupt, although the corruption will be limited to the areas where the sp splits occurred. Those sp will probably show uncolored in Raveler, and clicking them produces an error.

In an export, the sp-seg map will not have entries for these superpixels. That will cause problems for downstream processors. In this case, DVID was unable to generate a label map for the session.

The solution is fairly straightforward. If you're working from an export, simply scan each of the exported superpixel images. For any superpixel IDs present on an image plane but not in the sp-seg map, create a new segment and new body, then append the mappings to the existing sp-seg and seg-body maps.

Warning

There is a flaw in these tools! When looking for unmapped superpixels, the functions below assume that every superpixel whose ID is above the max in the maps is new (which is true), but they also assume that every superpixel below the previous max is mapped. That turned out not to be true.

Therefore, instead of simply finding the maximum superpixel on each plane, one must find the exact list of IDs in the image and in the maps and compare them. That way the other rogue sp can be found as well.
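That set-based comparison can be sketched as follows (findunmappedsuperpixels is a hypothetical helper, not one of the tools used in the runs below; treating sp ID 0 as unmapped background is an assumption):

```python
import numpy

def findunmappedsuperpixels(spimage, mappedids):
    """
    Compare the exact set of superpixel IDs present on one image plane
    against the IDs the sp-seg map has for that plane; anything on the
    plane but absent from the map needs a new mapping, even when its
    ID is below the previous per-plane max.
    """
    onplane = set(numpy.unique(spimage).tolist())
    onplane.discard(0)  # assumption: 0 is background and never mapped
    return onplane - set(mappedids)
```

A plane containing sp 7 against a map that only knows 1-3 reports {7}, and a rogue sp 2 missing from the map would be reported the same way.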

General tools


In [1]:
# std lib
import os
import time

# other libs
import numpy

import Image  # PIL

We need to read and rewrite our sp-seg and seg-body maps. The files are large, so we don't want to read them fully into memory. On the read side, we want the max superpixel per plane, and the max seg and body IDs overall.


In [2]:
def findmaxsuperpixels(filepath):
    maxsp = {}
    currentmax = 0
    for line in open(filepath, 'rt'):
        line = line.strip()
        if line and not line.startswith("#"):
            items = line.split()
            z = int(items[0])
            spid = int(items[1])
            # don't care about seg ID at this point
            currentmax = maxsp.setdefault(z, 0)
            if spid > currentmax:
                maxsp[z] = spid
    return maxsp

In [3]:
def findmaxsegbody(filepath):
    maxseg = 0
    maxbody = 0
    for line in open(filepath, 'rt'):
        line = line.strip()
        if line and not line.startswith("#"):
            items = line.split()
            maxseg = max(maxseg, int(items[0]))
            maxbody = max(maxbody, int(items[1]))
    return maxseg, maxbody

Given max sp in each plane in the maps and in the sp images, and the max seg and body IDs, find the new mappings.


In [4]:
def findnewmappings(maxspimage, maxspmap, maxseg, maxbody):
    """
    input: {z: max sp} from bitmap images; {z: max sp} from text file; max seg and body IDs
    output: ({(z, spid): segid}, {segid: bodyid}) for sp in image but not text file
    """
    newspseg = {}
    newsegbody = {}
    for z in maxspimage:
        if maxspimage[z] > maxspmap[z]:
            # got to allow for more than one! start one past the mapped max
            for newsp in range(maxspmap[z] + 1, maxspimage[z] + 1):
                maxseg += 1
                maxbody += 1
                newspseg[z, newsp] = maxseg
                newsegbody[maxseg] = maxbody
    return newspseg, newsegbody
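On toy numbers, the allocation works like this (a self-contained restatement for illustration, under a hypothetical name; note the range starts one past the mapped max, so the superpixel already in the map is not re-allocated):

```python
def allocatenewmappings(maxspimage, maxspmap, maxseg, maxbody):
    # restated here so this cell runs on its own
    newspseg = {}
    newsegbody = {}
    for z in maxspimage:
        if maxspimage[z] > maxspmap[z]:
            for newsp in range(maxspmap[z] + 1, maxspimage[z] + 1):
                maxseg += 1
                maxbody += 1
                newspseg[z, newsp] = maxseg
                newsegbody[maxseg] = maxbody
    return newspseg, newsegbody

# plane 4749: image tops out at sp 102, but the map only knows up to 100
newspseg, newsegbody = allocatenewmappings({4749: 102}, {4749: 100}, 500, 200)
```

Each missing sp gets a fresh seg ID, and each fresh seg a fresh body ID: here sp 101 and 102 map to segs 501 and 502, which map to bodies 201 and 202.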

For writing, we'll just append lines at the end of the map files rather than read/rewrite:


In [5]:
def appendspmapping(filepath, mapping):
    # note 'a' = append
    with open(filepath, 'at') as f:
        for (z, spid), segid in mapping.items():
            f.write("%d\t%d\t%d\n" % (z, spid, segid))
def appendsegmapping(filepath, mapping):
    # note 'a' = append
    with open(filepath, 'at') as f:
        for segid, bodyid in mapping.items():
            f.write("%d\t%d\n" % (segid, bodyid))
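The tab-separated format this produces, exercised against a throwaway file (the function is reproduced so the cell runs on its own; the path is a stand-in, not the real map):

```python
import os
import tempfile

def appendspmapping(filepath, mapping):
    # same as above: append tab-separated z / superpixel / segment lines
    with open(filepath, 'at') as f:
        for (z, spid), segid in mapping.items():
            f.write("%d\t%d\t%d\n" % (z, spid, segid))

# throwaway stand-in for the real superpixel_to_segment_map.txt
path = os.path.join(tempfile.mkdtemp(), "superpixel_to_segment_map.txt")
with open(path, 'wt') as f:
    f.write("# existing contents stay untouched\n")
appendspmapping(path, {(4749, 25327): 78160115})
```

Appending leaves the existing map lines in place and simply adds the new ones at the end.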

Read a superpixel image plane and translate its RGBA pixels into superpixel IDs:


In [6]:
def readsuperpixelmap(filepath):
    """
    reads an RGBA sp map and returns an array of its superpixels,
    translated into 32 bit ints as we usually do it
    """
    im = Image.open(filepath)
    if im.mode != "RGBA":
        raise ValueError("can only handle RGBA sp maps")
    a = numpy.asarray(im)
    # note the ordering, and note we ignore the A channel
    return a[:, :, 0] + a[:, :, 1] * 256 + a[:, :, 2] * 256 * 256
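As a sanity check on that channel ordering (R is the low byte, B the high byte, A ignored), here is a round-trip sketch; encode_superpixels and decode_superpixels are illustrative helpers, not part of the repair:

```python
import numpy

def encode_superpixels(ids):
    """Pack 24-bit superpixel IDs into an RGBA array (A forced to 255)."""
    ids = numpy.asarray(ids, dtype=numpy.uint32)
    rgba = numpy.zeros(ids.shape + (4,), dtype=numpy.uint8)
    rgba[..., 0] = ids & 0xFF            # R = low byte
    rgba[..., 1] = (ids >> 8) & 0xFF     # G = middle byte
    rgba[..., 2] = (ids >> 16) & 0xFF    # B = high byte
    rgba[..., 3] = 255                   # A is ignored on read
    return rgba

def decode_superpixels(rgba):
    """Inverse translation, matching readsuperpixelmap above."""
    a = rgba.astype(numpy.uint32)
    return a[..., 0] + a[..., 1] * 256 + a[..., 2] * 256 * 256
```

Any ID up to 2**24 - 1 survives the round trip, which is why the A channel can be ignored.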

Find planes with modified tiles

Sometimes we will not know which planes have problems. Fortunately, rather than scan the whole stack, we can look in the session's tiles directory.


In [7]:
def findmodifiedplanes(sessiondir):
    """
    given path to session, return set of planes that have at least
    one modified tile
    """
    
    planes = set()
    for currdir, subdirs, filenames in os.walk(os.path.join(sessiondir, "tiles", "1024", "0")):
        for name in filenames:
            if name.endswith(".png"):
                planes.add(int(name[:-4]))
    return planes
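To see the directory convention concretely, the same walk run against a throwaway session tree (the function is reproduced so the cell runs on its own; the tile subdirectories and zero-padded filenames are assumptions mirroring the code above):

```python
import os
import tempfile

def findmodifiedplanes(sessiondir):
    # same walk as above: any .png under tiles/1024/0 flags its plane
    planes = set()
    for currdir, subdirs, filenames in os.walk(
            os.path.join(sessiondir, "tiles", "1024", "0")):
        for name in filenames:
            if name.endswith(".png"):
                planes.add(int(name[:-4]))
    return planes

# build a throwaway session with modified tiles on two planes
session = tempfile.mkdtemp()
tiledir = os.path.join(session, "tiles", "1024", "0", "c0", "r0")
os.makedirs(tiledir)
for z in (4749, 4750):
    open(os.path.join(tiledir, "%05d.png" % z), "w").close()
```

Only the filename matters: the tile's column/row subdirectories are walked but never parsed.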

FIB-25 fix, August 2014 -- from export

NOTE: this is obsolete. See version two, below.

In the first half of 2014, we were working on FIB-25. Our seeded body split tool at the time had a bug that produced exactly the kind of error discussed above. Here we fix the resulting export.

The text maps come from the exportdir, while the images are loaded from the imagedir.


In [7]:
exportdir = "/groups/flyem/data/medulla-FIB-Z1211-25-production/align2/stitched_layers/1-13_0-8089_extended_20140518T072142/raveler-export"
imagedir = "/groups/flyem/data/medulla-FIB-Z1211-25-production/align2/stitched_layers/export/shinya1-13_20140715/"

First, I copied the sp-seg and seg-body maps for safety (.orig suffix). That was done outside this notebook because it should not be repeated on later runs. Note that these files are rather large (2-3 GB).

Next, we are fortunate to be able to reduce the number of images we scan, because Bill's DVID ingestion script tells us which plane ranges have problems. That error log is in the same folder as this notebook.


In [8]:
# read it in before the default working dir changes:
errorfilename = "DVIDerrors-2014.txt"
errordata = open(errorfilename, 'rt').readlines()

In [9]:
errordata[:4]


Out[9]:
['2014/07/17 12:35:54 No mapping found for 0000128d000062ef (slice 4749) in block with Z 4736 to 4767\n',
 "2014/07/17 12:35:54 Aborting processing of 'superpixels' chunk using 'sp2body' labelmap\n",
 '2014/07/17 12:35:54 No mapping found for 0000128d000062ef (slice 4749) in block with Z 4736 to 4767\n',
 "2014/07/17 12:35:54 Aborting processing of 'superpixels' chunk using 'sp2body' labelmap\n"]

Bill notes that DVID processing for a block stops at an error, so we need to scan whole blocks of planes, not just the specifically noted planes. In other words, we need to extract the "4736 to 4767" parts and generate a list of planes.


In [10]:
# who needs elegant regexes when you have brute force...
planes = set()
for line in errordata:
    if "No mapping" in line:
        line = line.strip()
        items = line.split()
        z1 = int(items[-3])
        z2 = int(items[-1])
        planes = planes.union(range(z1, z2 + 1))

In [11]:
len(planes)


Out[11]:
128

As expected, that's a multiple of 32, our block size.

Scan image planes

First task is to scan the image planes.


In [12]:
maxsponplane = {}
print time.asctime()
for z in planes:
    imagepath = os.path.join(imagedir, "superpixel_maps", "sp_map.%05d.png" % z)
    data = readsuperpixelmap(imagepath)
    maxsponplane[z] = data.max()
print time.asctime()


Tue Aug  5 14:43:43 2014
Tue Aug  5 14:47:29 2014

In [13]:
print len(maxsponplane)


128

Adjust maps

Next, read the sp-seg map and identify which sp don't appear. This file is big, so it can take a while to parse.


In [14]:
print time.asctime()
maxspinmap = findmaxsuperpixels(os.path.join(exportdir, "superpixel_to_segment_map.txt"))
print time.asctime()


Tue Aug  5 14:48:34 2014
Tue Aug  5 14:55:55 2014

In [15]:
print len(maxspinmap)


8090

Likewise the seg-body map:


In [16]:
print time.asctime()
maxseg, maxbody = findmaxsegbody(os.path.join(exportdir, "segment_to_body_map.txt"))
print maxseg, maxbody
print time.asctime()


Tue Aug  5 14:56:02 2014
78160114 8900180
Tue Aug  5 14:59:55 2014

Now finish it up. Compare the max sp found in the images vs. the ones found in the text file maps. For each one missing from the map, create a new seg and body and related mappings. Review them before writing them.


In [17]:
newspmap, newsegmap = findnewmappings(maxsponplane, maxspinmap, maxseg, maxbody)

Upon examination, those look plausible.

Write maps

Last step: append those mappings to the end of the existing files. This has been tested with empty files.


In [19]:
appendspmapping(os.path.join(exportdir, "superpixel_to_segment_map.txt"), newspmap)
appendsegmapping(os.path.join(exportdir, "segment_to_body_map.txt"), newsegmap)

FIB-25 August 2014 -- from export, v2

It turns out we need to work from a later export, and Bill's error list will not give us the planes we need to scan. These are the directories we care about, the export and the session:


In [8]:
exportdir = "/groups/flyem/data/medulla-FIB-Z1211-25-production/align2/stitched_layers/export/shinya1-13_20140815/"
sessiondir = "/groups/flyem/data/medulla-FIB-Z1211-25-production/align2/stitched_layers/session/shinya1-13_20140522_extended-ST3/"

Even though we don't work on the session directly, we need to get the modified planes from it rather than some other source (like a DVID error log).


In [9]:
modplanes = findmodifiedplanes(sessiondir)
print len(modplanes)


3751

Ugh, that's almost half of them! Even so, we now just proceed like the first version.

First, preserve the sp-seg and seg-body maps for safety (.orig suffix). As before, that was done outside this notebook because it should not be repeated on later runs. Note that these files are rather large (2-3 GB).

Scan image planes

The first version and some testing show this takes about 2-3 s per plane, so for ~3700 planes, allow 3+ hours.


In [10]:
maxsponplane = {}
print time.asctime()
for z in modplanes:
    imagepath = os.path.join(exportdir, "superpixel_maps", "sp_map.%05d.png" % z)
    data = readsuperpixelmap(imagepath)
    maxsponplane[z] = data.max()
print time.asctime()


Thu Aug 21 07:42:44 2014
Thu Aug 21 09:40:36 2014

In [11]:
print len(maxsponplane)


3751

Adjust maps

Next, read the sp-seg map and identify which sp don't appear. This file is big, so it can take a while to parse.


In [12]:
print time.asctime()
maxspinmap = findmaxsuperpixels(os.path.join(exportdir, "superpixel_to_segment_map.txt"))
print time.asctime()


Thu Aug 21 09:44:27 2014
Thu Aug 21 09:52:29 2014

In [13]:
print len(maxspinmap)


8090

In [14]:
print time.asctime()
maxseg, maxbody = findmaxsegbody(os.path.join(exportdir, "segment_to_body_map.txt"))
print maxseg, maxbody
print time.asctime()


Thu Aug 21 09:54:57 2014
78267490 8915777
Thu Aug 21 09:58:56 2014

Now finish it up. Compare the max sp found in the images vs. the ones found in the text file maps. For each one missing from the map, create a new seg and body and related mappings. Review them before writing them.


In [15]:
newspmap, newsegmap = findnewmappings(maxsponplane, maxspinmap, maxseg, maxbody)

At a quick glance, they look plausible.

Write maps

Last step: append those mappings to the end of the existing files. This has been tested with empty files.


In [19]:
appendspmapping(os.path.join(exportdir, "superpixel_to_segment_map.txt"), newspmap)
appendsegmapping(os.path.join(exportdir, "segment_to_body_map.txt"), newsegmap)