This notebook uses a fraction of the content database built for the OMERO.searcher Local Client
http://murphylab.web.cmu.edu/software/searcher/
The database contains 101 SLF33 feature vectors computed from images in the Human Protein Atlas.
In [20]:
import cPickle as pickle
from IPython.display import Image
import halcon
In [21]:
data = pickle.load( open( 'dataset.pkl', 'rb' ) )  # pickled files should be opened in binary mode
I will use the first image in the dataset as the query image.
In [22]:
url = data[0][0]
print url
Image(url=url,height=400,width=400,retina=True)
Out[22]:
I will also "cheat": the query image itself is included in the dataset. This way, I can verify that the most similar image found in the dataset is the query image itself.
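This sanity check can be sketched with a toy nearest-neighbour search. Note this uses plain Euclidean distance and made-up three-dimensional vectors standing in for real SLF33 features, not halcon's actual scoring: the query vector is at distance zero from itself, so it must rank first.

```python
import math

def euclidean(u, v):
    # Euclidean distance between two equal-length feature vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Toy stand-ins for SLF33 feature vectors, keyed by a fake image id
toy_dataset = {
    'img0.jpg': [0.1, 0.9, 0.3],   # this one will be the query
    'img1.jpg': [0.2, 0.8, 0.4],
    'img2.jpg': [0.9, 0.1, 0.7],
}
query = toy_dataset['img0.jpg']

# Rank images by distance to the query; the query itself is in the dataset,
# so it should come back first with distance zero
ranked = sorted(toy_dataset, key=lambda iid: euclidean(query, toy_dataset[iid]))
print(ranked[0])  # img0.jpg -- the query image itself ranks first
```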
Now, if we take a look at one of the records in the dataset, we see that each record is made of two elements: an image URL and its feature vector.
In [23]:
datum = data[0]
url = datum[0]
print "Elements in datum: " + str(len(datum))
print "Image URL: " + url
feature_vector = datum[1]
print "Number of features in SLF33 feature vector: " + str(len(feature_vector))
Now we need to reshape this dataset, since each element in a FALCON dataset has three parts: an image identifier, a numeric weight (set to 1 here), and the feature vector.
If you are interested in learning more about Subcellular Location Features (SLF), visit the Murphy Lab website (http://murphylab.web.cmu.edu/).
In [24]:
print "Preparing dataset"
dataset = []
for datum in data:
    dataset.append( [ datum[0], 1, datum[1] ] )

print "Preparing query image"
query_image = [dataset[0]]

[iids, scores] = halcon.search.query( query_image, dataset, normalization='standard' )
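For reference, 'standard' normalization conventionally means z-scoring: each feature is shifted and scaled across the dataset so that no single feature dominates the distance computation. A minimal sketch of that idea (an assumption about the intent of the option, not halcon's documented internals):

```python
def standardize(vectors):
    # z-score each feature column: (x - mean) / std across the dataset
    n = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / float(n) for d in range(dims)]
    stds = [(sum((v[d] - means[d]) ** 2 for v in vectors) / float(n)) ** 0.5
            for d in range(dims)]
    # guard against zero-variance features to avoid division by zero
    return [[(v[d] - means[d]) / stds[d] if stds[d] else 0.0
             for d in range(dims)] for v in vectors]

# Two features on very different scales become comparable after z-scoring
vecs = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
z = standardize(vecs)
print(z[0])  # both features of the first vector map to roughly -1.22
```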
Now, according to HALCON, the image most similar to the query image (after the query itself, which ranks first) is
In [25]:
url = iids[1]
print url
Image(url=url,height=400,width=400,retina=True)
Out[25]:
The top 10 images are:
In [26]:
for i in range(10):
    url = iids[i]
    print url