Potential Host Detection

To state the classification problem as a binary classification problem, we need to be able to find potential hosts in an infrared image. Potential hosts are galaxies that may or may not contain the radio source we see in the associated radio image.

Previously, I have found potential hosts by looking for local maxima in the IR image. This finds too many potential hosts, mostly due to background noise. Julie suggested fitting Gaussians to the image, and scikit-image provides a few blob detection functions that do this. In this notebook, I will try skimage's blob detection for detecting potential hosts.


In [47]:
import collections
import itertools
import logging
import pprint
import sys
import warnings

import matplotlib.pyplot
import numpy
import skimage.feature

sys.path.insert(1, '..')
import crowdastro.data
import crowdastro.rgz_analysis.consensus
import crowdastro.show

%matplotlib inline
warnings.simplefilter('ignore', UserWarning)  # astropy always raises warnings on Windows.

Let's start by finding a test subject.


In [48]:
# subject = crowdastro.data.get_random_subject()
subject = crowdastro.data.db.radio_subjects.find_one({'zooniverse_id': 'ARG0003rq4'})
crowdastro.show.ir(subject)
matplotlib.pyplot.show()

ir = crowdastro.data.get_ir(subject)[:200, :200]  # This image is, for some reason, 1 px too big.


The most accurate blob detection skimage offers is Laplacian of Gaussian, so I will try using this. It's hard to tell what parameters to use, so I'll have to experiment a bit.


In [49]:
blobs = skimage.feature.blob_log(ir/255, min_sigma=5, max_sigma=15, num_sigma=10, threshold=0.00002, overlap=0.75, log_scale=False)
blobs[:, 2] = blobs[:, 2] * numpy.sqrt(2)  # Convert sigma to radii.

print('Found {} blobs.'.format(blobs.shape[0]))

crowdastro.show.ir(subject)
ax = matplotlib.pyplot.gca()
for y, x, r in blobs:
    c = matplotlib.pyplot.Circle((x, y), r, color='green', linewidth=1, fill=False)
    ax.add_patch(c)
matplotlib.pyplot.show()


Found 57 blobs.

That seems good enough for now. Let's try it on some random examples.


In [50]:
def plot_blobs(subject):
    ir = crowdastro.data.get_ir(subject)[:200, :200]
    blobs = skimage.feature.blob_log(ir/255, min_sigma=5, max_sigma=15, num_sigma=10, threshold=0.00002, overlap=0.75,
                                     log_scale=False)
    blobs[:, 2] = blobs[:, 2] * numpy.sqrt(2)  # Convert sigma to radii.

    print('Found {} blobs.'.format(blobs.shape[0]))

    crowdastro.show.ir(subject)
    ax = matplotlib.pyplot.gca()
    for y, x, r in blobs:
        c = matplotlib.pyplot.Circle((x, y), r, color='green', linewidth=1, fill=False)
        ax.add_patch(c)
    matplotlib.pyplot.show()

for _ in range(10):
    plot_blobs(crowdastro.data.get_random_subject())


Found 39 blobs.
Found 44 blobs.
Found 52 blobs.
Found 58 blobs.
Found 37 blobs.
Found 49 blobs.
Found 40 blobs.
Found 33 blobs.
Found 44 blobs.
Found 53 blobs.

In [ ]: