Human Protein Atlas Notebook

This notebook uses a subset of the content database built for the OMERO.searcher Local client:

http://murphylab.web.cmu.edu/software/searcher/

The database contains 101 SLF33 feature vectors computed from Human Protein Atlas images.



In [20]:

    
import cPickle as pickle
from IPython.display import Image
import halcon



In [21]:

    
# Open the pickle in binary mode and close the file once it has been read.
with open( 'dataset.pkl', 'rb' ) as f:
    data = pickle.load( f )
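The cells in this notebook use Python 2 syntax (`cPickle`, `print` statements). If you run it under Python 3, the pickle can usually still be read. The following is a minimal sketch, assuming the file was written by Python 2; the helper name `load_dataset` is hypothetical and not part of HALCON.

```python
import pickle


def load_dataset(path):
    """Load a pickled dataset, tolerating pickles written by Python 2."""
    with open(path, 'rb') as handle:
        # encoding='latin1' lets Python 3 decode Python 2 byte strings
        # (and NumPy arrays, if the feature vectors are stored that way).
        return pickle.load(handle, encoding='latin1')
```

Under this assumption, `data = load_dataset('dataset.pkl')` would replace the cell above.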

I will use the first image in the dataset as the query image.



In [22]:

    
url = data[0][0]
print url
Image(url=url,height=400,width=400,retina=True)

http://www.proteinatlas.org/images/10505/100_A12_1_blue_green.jpg

    Out[22]:

I will also "cheat": the query image is itself included in the dataset. This way, I can verify that the image ranked most similar to the query is the query image itself.

Now, if we take a look at one of the records in the dataset, we see that it is made of two elements:

An image URL
A feature vector



In [23]:

    
datum = data[0]
url = datum[0]
print "Elements in datum: " + str(len(datum))
print "Image URL: " + url
feature_vector = datum[1]
print "Number of features in SLF33 feature vector: " + str(len(feature_vector))

Elements in datum: 2
Image URL: http://www.proteinatlas.org/images/10505/100_A12_1_blue_green.jpg
Number of features in SLF33 feature vector: 162
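The inspection above looks at a single record. As a minimal sketch (written for Python 3, with a hypothetical helper name `summarize` and made-up toy data), the same check can be applied to every record to confirm that each one is a (URL, feature vector) pair and that all feature vectors have the same length:

```python
def summarize(data):
    """Return (number of records, feature vector length).

    Raises ValueError if a record is not a (url, feature_vector) pair
    or if the feature vectors differ in length.
    """
    lengths = set()
    for record in data:
        if len(record) != 2:
            raise ValueError(
                "expected (url, feature_vector), got %d elements" % len(record))
        url, features = record
        lengths.add(len(features))
    if len(lengths) > 1:
        raise ValueError(
            "feature vectors differ in length: %s" % sorted(lengths))
    return len(data), (lengths.pop() if lengths else 0)


# Toy records for illustration only; these are not Human Protein Atlas data.
toy = [("img_a.jpg", [0.1, 0.2, 0.3]), ("img_b.jpg", [0.4, 0.5, 0.6])]
print(summarize(toy))  # (2, 3)
```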

Now we need to reshape this dataset, since each element in HALCON has three parts:

A string identifier (in this case, the image URL)
An initial score (missing in this dataset, so the next cell sets it to 1 for every image)
A feature vector (in this case, an SLF33 feature vector)

If you are interested in learning more about Subcellular Location Features (SLF), visit:

http://murphylab.web.cmu.edu/services/SLF/
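The three-part layout described above can be sketched as a small helper. This is a minimal illustration written for Python 3; the function name `to_records` and the toy data are hypothetical, and the placeholder score of 1 simply mirrors the value used in this notebook. How HALCON interprets the initial score is not stated here.

```python
def to_records(data, initial_score=1):
    """Turn (url, feature_vector) pairs into [identifier, score, features] records."""
    return [[url, initial_score, features] for url, features in data]


# Toy records for illustration only; these are not Human Protein Atlas data.
toy = [("img_a.jpg", [0.1, 0.2]), ("img_b.jpg", [0.3, 0.4])]
records = to_records(toy)
print(records[0])  # ['img_a.jpg', 1, [0.1, 0.2]]
```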



In [24]:

    
print "Preparing dataset"
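# Each HALCON record is [identifier, initial score, feature vector].
# The dataset has no initial scores, so every image gets a placeholder score of 1.
dataset = [ [ datum[0], 1, datum[1] ] for datum in data ]

print "Preparing query image"
# The query is a list of records; here it holds only the first image.
query_image = [dataset[0]]

# iids holds the image identifiers in ranked order (most similar first, as used below);
# scores holds the corresponding scores.
iids, scores = halcon.search.query( query_image, dataset, normalization='standard' )

Preparing dataset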
Preparing query image
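HALCON's internals are not shown in this notebook. To make the idea of a content-based query concrete, the following is a minimal stand-in sketch written for Python 3. It assumes that `normalization='standard'` means z-scoring each feature across the dataset and that similarity is measured by Euclidean distance; neither assumption is confirmed here, and `rank_by_similarity` is a hypothetical name, not part of HALCON.

```python
import math


def rank_by_similarity(query, dataset):
    """Rank records by z-scored Euclidean distance to the query record.

    Records are [identifier, initial_score, feature_vector]; the initial
    score is ignored in this sketch.
    """
    vectors = [record[2] for record in dataset]
    n = len(vectors)
    dims = len(vectors[0])

    # Per-feature mean and standard deviation across the whole dataset.
    means = [sum(v[j] for v in vectors) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        variance = sum((v[j] - means[j]) ** 2 for v in vectors) / n
        # A constant feature has zero spread; use 1.0 to avoid dividing by zero.
        stds.append(math.sqrt(variance) or 1.0)

    def zscore(vector):
        return [(x - m) / s for x, m, s in zip(vector, means, stds)]

    target = zscore(query[2])
    scored = []
    for identifier, _, features in dataset:
        z = zscore(features)
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(z, target)))
        scored.append((distance, identifier))

    scored.sort()  # smallest distance (most similar) first
    ids = [identifier for _, identifier in scored]
    distances = [distance for distance, _ in scored]
    return ids, distances


# Toy records for illustration only; these are not Human Protein Atlas data.
toy = [["a.jpg", 1, [0.0, 1.0]],
       ["b.jpg", 1, [0.1, 1.1]],
       ["c.jpg", 1, [5.0, 9.0]]]
ids, distances = rank_by_similarity(toy[0], toy)
print(ids)  # ['a.jpg', 'b.jpg', 'c.jpg']
```

Because the query is itself part of the dataset, it comes back first with a distance of zero, which is the same sanity check this notebook relies on.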

Now, according to HALCON, the image most similar to the query image (other than the query image itself, which is ranked first) is



In [25]:

    
url = iids[1]
print url
Image(url=url,height=400,width=400,retina=True)

http://www.proteinatlas.org/images/10549/100_B12_2_blue_green.jpg

    Out[25]:

The top 10 images, starting with the query image itself, are:



In [26]:

    
for url in iids[:10]:
    print url

http://www.proteinatlas.org/images/10505/100_A12_1_blue_green.jpg
http://www.proteinatlas.org/images/10549/100_B12_2_blue_green.jpg
http://www.proteinatlas.org/images/9143/100_D8_2_blue_green.jpg
http://www.proteinatlas.org/images/8406/100_F4_2_blue_green.jpg
http://www.proteinatlas.org/images/8527/100_B5_1_blue_green.jpg
http://www.proteinatlas.org/images/8802/100_B7_2_blue_green.jpg
http://www.proteinatlas.org/images/8716/100_G1_1_blue_green.jpg
http://www.proteinatlas.org/images/8411/100_G3_2_blue_green.jpg
http://www.proteinatlas.org/images/6154/100_D5_1_blue_green.jpg
http://www.proteinatlas.org/images/8614/100_B3_1_blue_green.jpg