I want to automate the selection of intensity bands from gel images. The data I'm using in this notebook is from folding results of my 84X and 42X noodle structures. I've used a Cy3 object probe and a Cy5 defect probe for both strucutres. I used the Typhoon at 940V for Cy3 and 900V for Cy5
I will be using KMeans clustering to segment the gel images. There are many other clustering algorithms we could use, but I like K Means because it is not too computationally intensive. Here is a cool explanation of the differences in clustering algorithms from the folks at scikit-learn: http://scikit-learn.org/stable/modules/clustering.html#overview-of-clustering-methods
We start with some package imports:
In [1]:
%pylab inline
In [2]:
import sklearn
from sklearn.cluster import KMeans
from sklearn import cluster
from scipy import misc
from scipy.ndimage import label
Here we define some functions to use later on
In [40]:
def invert(img):
from copy import deepcopy
imin = np.min(img)
imax = np.max(img)
img_inv = imax - deepcopy(img)+imin
return img_inv
In [3]:
def kmeans(img,nlabels=5):
lena = img
X = lena.reshape((-1, 1)) # We need an (n_sample, n_feature) array
k_means = cluster.KMeans(n_clusters=nlabels, n_init=1)
k_means.fit(X)
values = k_means.cluster_centers_.squeeze()
labels = k_means.labels_
return labels.reshape(img.shape)
In [4]:
def get_labels(data, min_extent=5):
labels, nlabels = label(data)
for idx in range(1, nlabels+1):
if sum(labels==idx)<min_extent:
labels[labels==idx] = 0
nlabels = len(np.unique(labels))
return labels, nlabels
In [5]:
def get_centroids(labels):
labelnums = np.unique(labels)
centroids = []
for label in labelnums:
centroids.append(np.mean(np.asarray(np.nonzero(labels==label)), axis = 1))
return centroids
In [6]:
def make_boxplot(measure_names,cy5,cy3,title=''):
import scipy.stats as stats
ind = np.arange(len(measure_names))
width = 0.35
fig = plt.figure(figsize=(24,8))
ax = fig.add_subplot(111)
rects1 = ax.boxplot(cy3,positions = ind,widths=0.2)
rects2 = ax.boxplot(cy5,positions = [i+width for i in ind],widths=0.2)
ax.set_xticklabels(measure_names,rotation=0)
p1 = [plot(np.ones(len(cy3[i]))*ind[i],cy3[i],'ro',alpha=0.005) for i in range(len(measure_names))]
c1 = [plot(np.ones(len(cy5[i]))*ind[i]+width,cy5[i],'bo',alpha=0.005) for i in range(len(measure_names))]
for key in rects1.keys():
setp(rects1[key],color='red')
setp(rects2[key],color='blue')
ax.set_ylabel('Intensity')
ax.set_xlabel('MgCl2 concentration')
ax.set_title(title)
ax.set_xticks(ind+width/2)
ax.legend( (rects1[key][0], rects2[key][0]), ('Cy3', 'Cy5') )
In [43]:
cy3 = imread("../downloads/10282013_42XCy3_D_O_3H_70V-[Cy3].jpg")
cy5 = imread("../downloads/10282013_42XCy3_D_O_3H_70V-[Cy5].jpg")
fig,ax = subplots(ncols=2,nrows=1,figsize=(12,6))
ax[0].hist(cy3.ravel(),bins=100,color="black");
ax[0].set_title("Cy3 Histogram")
ax[1].hist(cy5.ravel(),bins=100,color="black");
ax[1].set_title("Cy5 Histogram")
Out[43]:
In [44]:
labels = kmeans(cy3)
In [45]:
fig,ax = subplots(ncols=2,nrows=1,figsize=(24,8))
ax[0].imshow(cy3,cmap=cm.Greys)
ax[0].set_title('Cy3 - Object')
cmap = cm.get_cmap('jet', 5)
l = ax[1].imshow(labels,cmap=cmap)
ax[1].set_title('K Means Clustering')
ax[1].figure.colorbar(l);
In this case, we choose cluster 3, or the orange clusters in the picture above. Unfortunately this value will change each time K Means is run. We also limit the size of our clusters to >1000 pixels
In [53]:
bandlabels, nlabels = get_labels(labels==3,1000)
In [54]:
fig,ax = subplots(ncols=4,nrows=1,figsize=(36,12))
ax[0].imshow(cy3,cmap=cm.Greys)
cmap = cm.get_cmap('jet', 5)
l = ax[1].imshow(labels,cmap=cmap)
ax[1].set_title('K Means Clustering')
#ax[1].figure.colorbar(l)
cmap = cm.get_cmap('jet', nlabels)
b=ax[2].imshow(bandlabels,cmap=cmap,alpha=0.3)
ax[2].set_title("ROI Labelling")
centroids = get_centroids(bandlabels)
for i,c in enumerate(centroids):
ax[2].annotate('%d'%i,c[::-1],color="black");
ax[3].imshow(cy5,cmap=cm.Greys)
ax[3].set_title('Cy5 - Defect')
ax[0].set_title('Cy3 - Object')
#ax[1].figure.colorbar(b)
Out[54]:
In [55]:
rois = [5,6,9,11,13,15,17]
roinames = ['10','12','14','16','18','20','22'] # These are the MgCl2 concentrations I used in the gel
In [56]:
labelnames = np.unique(bandlabels)
cy3vals = []
cy5vals = []
cy3inv = invert(cy3)
cy5inv = invert(cy5)
for idx in rois:
cy3vals.append(cy3inv[bandlabels==labelnames[idx]])
cy5vals.append(cy5inv[bandlabels==labelnames[idx]])
In [57]:
names = ['%d'%d for d in range(len(labelnames))]
make_boxplot(roinames,cy5vals,cy3vals,'Object (Cy3) and Defect(Cy5) Probe Intensities for 42X')
In [58]:
cy3 = imread("../downloads/10232013_84XCy3_D_O_3H_70V-[Cy3].jpg")
cy5 = imread("../downloads/10232013_84XCy3_D_O_3H_70V-[Cy5].jpg")
fig,ax = subplots(ncols=2,nrows=1,figsize=(12,6))
ax[0].hist(cy3.ravel(),bins=100,color="black");
ax[0].set_title("Cy3 Histogram")
ax[1].hist(cy5.ravel(),bins=100,color="black");
ax[1].set_title("Cy5 Histogram")
Out[58]:
For this case, I calculated just 3 clusters
In [59]:
labels = kmeans(cy3,3)
In [60]:
fig,ax = subplots(ncols=2,nrows=1,figsize=(24,8))
ax[0].imshow(cy3,cmap=cm.Greys)
ax[0].set_title('Cy3 - Object')
cmap = cm.get_cmap('jet', 3)
l = ax[1].imshow(labels,cmap=cmap)
ax[1].set_title('K Means Clustering')
ax[1].figure.colorbar(l);
In this case, we choose cluster 1 (the green clusters in the picture above). Unfortunately this value will change each time K Means is run. We also limit the size of our clusters to >500 pixes, since the resolution is different than for the 42X example
In [61]:
bandlabels, nlabels = get_labels(labels==1,500)
In [62]:
fig,ax = subplots(ncols=4,nrows=1,figsize=(36,12))
ax[0].imshow(cy3,cmap=cm.Greys)
cmap = cm.get_cmap('jet', 5)
l = ax[1].imshow(labels,cmap=cmap)
ax[1].set_title('K Means Clustering')
#ax[1].figure.colorbar(l)
cmap = cm.get_cmap('jet', nlabels)
b=ax[2].imshow(bandlabels,cmap=cmap,alpha=0.3)
ax[2].set_title("ROI Labelling")
centroids = get_centroids(bandlabels)
for i,c in enumerate(centroids):
ax[2].annotate('%d'%i,c[::-1],color="black");
ax[3].imshow(cy5,cmap=cm.Greys)
ax[3].set_title('Cy5 - Defect')
ax[0].set_title('Cy3 - Object');
#ax[1].figure.colorbar(b)
Out[62]:
In [63]:
rois = [14,12,13,11,9,8,10]
roinames = ['10','12','14','16','18','20','22'] # These are the MgCl2 concentrations I used in the gel
In [66]:
labelnames = np.unique(bandlabels)
cy3vals = []
cy5vals = []
cy3inv = invert(cy3)
cy5inv = invert(cy5)
for idx in rois:
cy3vals.append(cy3inv[bandlabels==labelnames[idx]])
cy5vals.append(cy5inv[bandlabels==labelnames[idx]])
In [67]:
make_boxplot(roinames,cy5vals,cy3vals,'Object (Cy3) and Defect(Cy5) Probe Intensities for 84X')
This processing pipeline is not quite fully automated. We still needed to define the following from visual inspection:
In [ ]: