This notebook investigates the entire pipeline to see where potential errors are occuring. I will be investigating our complete pipeline step by step on simulation data to see how we are clustering/detecting potential synapses.
Almost the exact same simulation that we used for Precision&Recall as well as Connected Components. I generated a smaller volume size so plots are easier to analyze.
Description: Validation testing will be performed on a a 50 x 50 x 50 volume. The synapse pixels will be grouped together in clusters as they would in the true data. Based on research into the true size of synapses, these synthetic synapse clusters will be given area of ~1 micron ^3, or about 139 voxels (assuming the synthetic data here and the real world data have identical resolutions). Pixel intensity distributions are ignored in this simulation because we just want to study how the synapses are being clustered. Therefore, the correct "total" number of synapses is relatively unimportant compared to the ratio of true positives/false positives/ false negatives of our pipeline. In this simulation, synapses make up 8% of the data.
Evaluation: I will be evaluating each step in the pipeline through plot.ly graphs (look at it) and evaluating the final algorithm using Precision and Recall.
In [2]:
import sys
sys.path.insert(0, '../code/functions/')
import connectLib as cLib
from random import randrange as rand
import itertools
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
import pylab
import math
from cluster import Cluster
from mpl_toolkits.mplot3d import Axes3D
import sys
sys.path.insert(0, '../functions/')
import cv2
import plosLib as pLib
import mouseVis as mv
import tiffIO as tIO
import cPickle as pickle
import hyperReg as hype
from scipy import ndimage
def generatePointSet():
center = (rand(0, 50), rand(0, 50), rand(0, 50))
toPopulate = []
for z in range(-1, 5):
for y in range(-1, 5):
for x in range(-1, 5):
curPoint = (center[0]+z, center[1]+y, center[2]+x)
#only populate valid points
valid = True
for dim in range(3):
if curPoint[dim] < 0 or curPoint[dim] >= 50:
valid = False
if valid:
toPopulate.append(curPoint)
return set(toPopulate)
def generateTestVolume():
#create a test volume
volume = np.zeros((50, 50, 50))
myPointSet = set()
clusterList = []
for _ in range(rand(60, 100)):
potentialPointSet = generatePointSet()
#be sure there is no overlap
while len(myPointSet.intersection(potentialPointSet)) > 0:
potentialPointSet = generatePointSet()
clusterList.append(Cluster(list(potentialPointSet)))
for elem in potentialPointSet:
myPointSet.add(elem)
#populate the true volume
for elem in myPointSet:
volume[elem[0], elem[1], elem[2]] = 60000
#introduce noise
noiseVolume = np.copy(volume)
for z in range(noiseVolume.shape[0]):
for y in range(noiseVolume.shape[1]):
for x in range(noiseVolume.shape[2]):
if not (z, y, x) in myPointSet:
noiseVolume[z][y][x] = rand(0, 10000)
return volume, noiseVolume, clusterList
testVolume = generateTestVolume()
foreground = testVolume[0]
combinedIm = testVolume[1]
trueClusterList = testVolume[2]
Why Our Simulation is Correct: Real microscopic images of synapses usually contain a majority of background noise and relatively few synapse clusters. As shown above, the generated test volume follows this expectation.
NOTE: Double check Richard Roth on synapse size (original: .2^3 microns, now 1 micron)
In [3]:
#displaying the random clusters
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
z, y, x = foreground.nonzero()
ax.scatter(x, y, z, zdir='z', c='r')
plt.title('Random Foreground Clusters')
plt.show()
#displaying the noise
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
z, y, x = combinedIm.nonzero()
ax.scatter(x, y, z, zdir='z', c='r')
plt.title('Random Noise + Foreground')
plt.show()
In [5]:
## Getting cluster members for true clusters to plot using plotly.
trueCentroids = []
for cluster in trueClusterList:
trueCentroids.append(cluster.getCentroid())
trueMembers = []
for cluster in trueClusterList:
for member in cluster.members:
trueMembers.append(member)
In [6]:
## Plotting all true cluster members and the centroids of the true positive clusters.
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot
from plotly.graph_objs import *
init_notebook_mode()
In [7]:
trace0 = Scatter3d(
x=[member[2] for member in trueMembers],
y=[member[1] for member in trueMembers],
z=[member[0] for member in trueMembers],
mode='markers',
marker=dict(
size=3,
color = 'red'
),
opacity=1
)
data = [trace0]
layout = Layout(
showlegend=False,
)
fig = dict( data=data, layout=layout )
iplot(fig)
In [8]:
import sys
sys.path.insert(0, '../functions/')
import cv2
import plosLib as pLib
import connectLib as cLib
from cluster import Cluster
import mouseVis as mv
import tiffIO as tIO
import cPickle as pickle
import hyperReg as hype
from scipy import ndimage
import matplotlib.pyplot as plt
def analyzeTimepoint(tiffImage, plosNeighborhood, plosLowerZBound, plosUpperZBound, debug=False):
#finding the clusters after plosPipeline
plosOut = pLib.pipeline(tiffImage, plosNeighborhood, plosLowerZBound, plosUpperZBound)
#binarize output of plos lib
bianOut = cLib.otsuVox(plosOut)
#dilate the output based on neigborhood size
for i in range(int((plosNeighborhood+plosUpperZBound+plosLowerZBound)/3.)):
bianOut = ndimage.morphology.binary_dilation(bianOut).astype(int)
#run connected component
connectList = cLib.connectedComponents(bianOut)
if debug:
print connectList, bianOut
return connectList
In [9]:
plosOut = pLib.pipeline(combinedIm)
In [10]:
# Evaluating plos output
#displaying the plos output
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
z, y, x = plosOut.nonzero()
ax.scatter(x, y, z, zdir='z', c='r')
plt.title('Plos Output')
plt.show()
In [ ]:
for z in range(50):
for y in range(50):
for x in range(50):
print plosOut[z][y][x]
In [11]:
#displaying distribution of plos outputs
fig = plt.figure()
plt.title("PLOS Distribution")
hist, bins = np.histogram(plosOut, bins=50, normed=True)
width = 0.7 * (bins[1] - bins[0])
center = (bins[:-1] + bins[1:]) / 2
plt.bar(center, hist, align='center', width=width)
Out[11]:
In [13]:
#Before trying Otsu's Binarization, I will first trying plotting values above a certain value.
plosMembersAboveThresh = []
thresh = np.percentile(plosOut, 92)
for z in range(50):
for y in range(50):
for x in range(50):
if (plosOut[x][y][z]) >= thresh:
plosMembersAboveThresh.append([z,y,x])
#print plosMembersAboveThresh
#print len(plosMembersAboveThresh)
In [14]:
#plotting PlOS Members
trace0 = Scatter3d(
x=[point[2] for point in plosMembersAboveThresh],
y=[point[1] for point in plosMembersAboveThresh],
z=[point[0] for point in plosMembersAboveThresh],
mode='markers',
marker=dict(
size=3,
),
opacity=1
)
data = [trace0]
layout = Layout(
showlegend=False,
)
fig = dict( data=data, layout=layout )
iplot(fig)
In [32]:
# plotting PLOS and True Members:
# Plos Members
trace0 = Scatter3d(
x=[point[2] for point in plosMembersAboveThresh],
y=[point[1] for point in plosMembersAboveThresh],
z=[point[0] for point in plosMembersAboveThresh],
mode='markers',
marker=dict(
size=3,
),
opacity=1
)
# True Members
trace1 = Scatter3d(
x=[member[2] for member in trueMembers],
y=[member[1] for member in trueMembers],
z=[member[0] for member in trueMembers],
mode='markers',
marker=dict(
size=3,
color = 'red'
),
opacity=.3
)
data = [trace0, trace1]
layout = Layout(
showlegend=False,
)
fig = dict( data=data, layout=layout )
iplot(fig)
In [15]:
#Stronger Thresh?
plosMembersAboveHarshThresh = []
thresh = np.percentile(plosOut, 95)
for z in range(50):
for y in range(50):
for x in range(50):
if (plosOut[x][y][z]) >= thresh:
plosMembersAboveHarshThresh.append([z,y,x])
#print plosMembersAboveHarshThresh
print len(plosMembersAboveHarshThresh)
In [16]:
# plotting PLOS and True Members:
# Plos Members
trace0 = Scatter3d(
x=[point[2] for point in plosMembersAboveHarshThresh],
y=[point[1] for point in plosMembersAboveHarshThresh],
z=[point[0] for point in plosMembersAboveHarshThresh],
mode='markers',
marker=dict(
size=3,
),
opacity=1
)
# True Members
trace1 = Scatter3d(
x=[member[2] for member in trueMembers],
y=[member[1] for member in trueMembers],
z=[member[0] for member in trueMembers],
mode='markers',
marker=dict(
size=3,
color = 'red'
),
opacity=.3
)
data = [trace0, trace1]
layout = Layout(
showlegend=False,
)
fig = dict( data=data, layout=layout )
iplot(fig)
PLOS is not working at all? The plos clusters seem to be formed randomly at places where NO true cluster even exists. Potentially, relatively bright background spots (~10,000 intensity range) might be avoiding getting degraded by the PLOS pipeline. Some true clusters are also just blanatly ignored. Studying the scatter plot, you can also see examples where two clusters close together actually get degraded into one cluster, which helps contribute to our false negative (recall) value. However, some other clusters are just being completely ignored by PLOS (the entire cluster gets degraded away).
The two major problems here that require indepth investigation are:
In [ ]: