In order to fully understand the following analysis, some key terms must be defined.
A Correct Detection of a synapse occurs when, after the binarization step, the pipeline identifies at least one voxel in a ground truth synapse cluster as synapse by scoring it with a 1. Each synapse cluster can only be correctly detected once.
An Incorrect Detection of a synapse occurs when, after the binarization step, the pipeline identifies a voxel as synapse by scoring it with a 1, even though that voxel is not in a ground truth synapse cluster.
The Probability of Detection (PD) is the fraction $\frac{\text{Correct Detections}}{\text{True Number of Synapses}}$. This is the same formula as recall.
The False Alarm Rate (FAR) is the fraction $\frac{\text{Incorrect Detections}}{\text{Total Number of Voxels}}$, which is how it is computed in the analysis code below. This can be thought of as the False Positive Rate.
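To make the two metrics concrete, the snippet below computes them from hypothetical counts; the numbers and variable names are purely illustrative and are not part of the pipeline.
In [ ]:
#hypothetical counts, purely for illustration
correctDetections = 640        #clusters containing at least one voxel scored 1
trueNumberOfSynapses = 650     #clusters present in the ground truth volume
incorrectDetections = 120      #voxels scored 1 that lie outside every ground truth cluster
totalVoxels = 100 * 100 * 100
print 'PD: ', float(correctDetections) / trueNumberOfSynapses
print 'FAR: ', float(incorrectDetections) / totalVoxels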
The easy case and failure case tests for our algorithm will use data that we created for prior unit tests. These files will be loaded from pickle files in our repository.
In [1]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import pickle
import sys
sys.path.insert(0,'../code/functions/')
easySim = pickle.load(open('../code/tests/synthDat/twoPerfectClusters.synth'))
hardSim = pickle.load(open('../code/tests/synthDat/all100.synth'))
For the easySim, I expect the algorithm to correctly identify both 'synapse' clusters in the data, and not identify any non-synapse voxels as synapse. This will result in a PD of 100% and a FAR of 0%.
For the hardSim, I expect the algorithm to identify both synapse clusters, and the noise in the data, as synapse. This would result in a PD of 100% and a FAR of ~99%.
I expect the algorithm to perform very well on easySim, since the volume is small, and the clusters are not obscured by any noise.
I expect the algorithm to perform very poorly on the hardSim, since the volume is fully saturated with noise, and there is no way for the pipeline to differentiate synapse from noise.
In addition to these base easy and difficult cases, we will generate data volumes with the following properties, in order to test how the algorithm performs on data that mimics real world data:
- A 100x100x100 voxel volume
- Between 500 and 800 'synapse' clusters, each a 3x3x3 block of intensity 60000, generated so that no two clusters share a voxel
- Background noise drawn uniformly from [0, 10000) at every non-synapse voxel

The code below will perform this generation routine.
In [2]:
from random import randrange as rand
from skimage.measure import label
def generatePointSet():
    center = (rand(0, 50), rand(0, 50), rand(0, 50))
    toPopulate = []
    for z in range(-1, 2):
        for y in range(-1, 2):
            for x in range(-1, 2):
                curPoint = (center[0]+z, center[1]+y, center[2]+x)
                #only populate valid points
                valid = True
                for dim in range(3):
                    if curPoint[dim] < 0 or curPoint[dim] >= 100:
                        valid = False
                if valid:
                    toPopulate.append(curPoint)
    return set(toPopulate)

def generateTestVolume():
    #create a test volume
    volume = np.zeros((100, 100, 100))
    myPointSet = set()
    for _ in range(rand(500, 800)):
        potentialPointSet = generatePointSet()
        #be sure there is no overlap
        while len(myPointSet.intersection(potentialPointSet)) > 0:
            potentialPointSet = generatePointSet()
        for elem in potentialPointSet:
            myPointSet.add(elem)
    #populate the true volume
    for elem in myPointSet:
        volume[elem[0], elem[1], elem[2]] = 60000
    #introduce noise
    noiseVolume = np.copy(volume)
    for z in range(noiseVolume.shape[0]):
        for y in range(noiseVolume.shape[1]):
            for x in range(noiseVolume.shape[2]):
                if not (z, y, x) in myPointSet:
                    noiseVolume[z][y][x] = rand(0, 10000)
    return volume, noiseVolume
The ground truth for each point set is returned as the first of two unpacked values from the generateTestVolume function. In this way, we can compare every test trial to the ground truth.
I expect the algorithm to perform very well on this data, since it was designed to filter out noise and identify synapses of the size present in the synthetic data.
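For reference, here is a minimal usage example of that unpacking convention; the cluster count print is just a sanity check I am adding for illustration, not part of the pipeline.
In [ ]:
#ground truth volume first, noisy test volume second
groundTruth, noisyVolume = generateTestVolume()
#sanity check: count the connected ground truth clusters
print 'Number of ground truth clusters: ', np.max(label(groundTruth))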
I will also generate a data volume to simulate poor data that the pipeline would not work well on, in order to test how the algorithm performs on very poor quality data. This data will have the following properties:
- A 100x100x100 voxel volume
- Between 500 and 800 'synapse' clusters, each only a 2x2x2 block of intensity 60000 (roughly a quarter of the volume of the clusters above), generated so that no two clusters share a voxel
- Background noise drawn uniformly from [0, 10000) at every non-synapse voxel

The code below will perform this generation routine.
In [16]:
def generatePoorPointSet():
    center = (rand(0, 99), rand(0, 99), rand(0, 99))
    toPopulate = []
    for z in range(-1, 1):
        for y in range(-1, 1):
            for x in range(-1, 1):
                curPoint = (center[0]+z, center[1]+y, center[2]+x)
                #only populate valid points
                valid = True
                for dim in range(3):
                    if curPoint[dim] < 0 or curPoint[dim] >= 100:
                        valid = False
                if valid:
                    toPopulate.append(curPoint)
    return set(toPopulate)

def generatePoorTestVolume():
    #create a test volume
    volume = np.zeros((100, 100, 100))
    myPointSet = set()
    for _ in range(rand(500, 800)):
        potentialPointSet = generatePoorPointSet()
        #be sure there is no overlap
        while len(myPointSet.intersection(potentialPointSet)) > 0:
            potentialPointSet = generatePoorPointSet()
        for elem in potentialPointSet:
            myPointSet.add(elem)
    #populate the true volume
    for elem in myPointSet:
        volume[elem[0], elem[1], elem[2]] = 60000
    #introduce noise
    noiseVolume = np.copy(volume)
    for z in range(noiseVolume.shape[0]):
        for y in range(noiseVolume.shape[1]):
            for x in range(noiseVolume.shape[2]):
                if not (z, y, x) in myPointSet:
                    noiseVolume[z][y][x] = rand(0, 10000)
    return volume, noiseVolume
The above data can be sampled in one of three ways
Based on our prior testing, I expect a mean PD of ~97% and a mean FAR of ~0% for the generateTestVolume data.
Based on our prior testing, I expect a mean PD of ~4% and a mean FAR of ~0% for the generatePoorTestVolume data.
My expectations for the easySim and hardSim data can be found in Simulation Data Section 1 - Functionality Testing - i
In [4]:
#The easy sim data
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
z, y, x = easySim.nonzero()
ax.scatter(x, y, z, zdir='z', c='r')
plt.show()
In [5]:
#The hard sim data
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
z, y, x = hardSim.nonzero()
ax.scatter(x, y, z, zdir='z', c='r')
plt.show()
In [6]:
#One instance of generated data
truth, test = generateTestVolume()
#The ground truth data
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
z, y, x = truth.nonzero()
ax.scatter(x, y, z, zdir='z', c='r')
plt.show()
In [7]:
#The test data (has noise)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
z, y, x = test.nonzero()
ax.scatter(x, y, z, zdir='z', c='r')
plt.show()
In [8]:
synapseDensity = len(zip(*truth.nonzero()))/float(100*100*100)
def getAveSynSize(volume):
    lVolume = label(volume)
    aveSum = []
    #starting at 1 since we don't want to count the background label
    for i in range(1, np.max(lVolume)+1):
        aveSum.append(len(zip(*np.where(lVolume == i))))
    return np.average(aveSum), aveSum
print 'Percent Synapse by Volume: ', synapseDensity*100., '%'
aveSize, sizeList = getAveSynSize(truth)
print 'Average Synapse Size: ', aveSize
In [11]:
#Generate a histogram of synapse sizes in voxels
fig = plt.figure()
plt.hist(sizeList)
plt.show()
In [17]:
#One instance of generated data
truth, test = generatePoorTestVolume()
#The ground truth data
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
z, y, x = truth.nonzero()
ax.scatter(x, y, z, zdir='z', c='r')
plt.show()
In [18]:
#The test data (has noise)
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
z, y, x = test.nonzero()
ax.scatter(x, y, z, zdir='z', c='r')
plt.show()
In [19]:
synapseDensity = len(zip(*truth.nonzero()))/float(100*100*100)
print 'Percent Synapse by Volume: ', synapseDensity*100., '%'
aveSize, sizeList = getAveSynSize(truth)
print 'Average Synapse Size: ', aveSize
In [20]:
#Generate a histogram of synapse sizes in voxels
fig = plt.figure()
plt.hist(sizeList)
plt.show()
The functionality data is confirmed to be correct by the visualizations above.
The validation data shows a percent synapse by volume of 2.1033%, and an average synapse size of 28.851 voxels. This is consistent with the density and size metrics of our actual data. The histogram of synapse sizes shows four peaks. Each of these peaks can be explained in an intuitive and expected way (a rough sanity check follows the list of peaks below):
The peak in the bin around 20 voxels consists of the clusters that were generated at the edges of the 100x100x100 volume, making their total volumes slightly smaller than the expected 27 voxels.
The peak around 27 is the large set of clusters that were generated without touching any other cluster. They have the full volume of 27 voxels.
The peak around 54 is a small set of pairs of clusters that were generated adjacent to each other, resulting in one large cluster. The algorithm will likely have trouble identifying these clusters as separate in some cases.
The peak around 81 is a small set of triplets of clusters that were generated adjacent to each other, resulting in one large cluster. The algorithm will likely have trouble identifying these clusters as separate in some cases.
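As a rough sanity check on these numbers: with between 500 and 800 clusters of up to 27 voxels each in a $100^3$ voxel volume, the expected density lies between $\frac{500 \times 27}{10^6} = 1.35\%$ and $\frac{800 \times 27}{10^6} = 2.16\%$, so the observed 2.1033% sits near the top of that range. Likewise, an observed average size of 28.851 voxels is what a base size of 27 voxels, pulled up by the occasional merged cluster, would produce.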
The failure validation data shows a percent synapse by volume of .4052%, and an average synapse size of 8.071 voxels. This is consistent with the reduced metrics that we were aiming for in the failure data. The histogram of synapse sizes shows three peaks. Each of these peaks can be explained in an intuitive and expected way (the same sanity check is repeated after the list below):
The peak in the bin around 4 voxels consists of the clusters that were generated at the edges of the 100x100x100 volume, making their total volumes slightly smaller than the expected 8 voxels.
The peak around 8 is the large set of clusters that were generated without touching any other cluster. They have the full volume of 8 voxels.
The peak around 16 is a small set of pairs of clusters that were generated adjacent to each other, resulting in one large cluster.
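Repeating the sanity check for the poor data: between 500 and 800 clusters of up to 8 voxels each gives an expected density between $\frac{500 \times 8}{10^6} = 0.40\%$ and $\frac{800 \times 8}{10^6} = 0.64\%$, which brackets the observed 0.4052%.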
INPUTS - A 3D volume containing voxels with intensities 0 to 65534
OUTPUTS - A 3D volume labeling synapse voxels as 1 and non-synapse voxels as 0
In [ ]:
######################################
###THIS IS PSEUDOCODE; WILL NOT RUN###
######################################
###STEP 1: COMPUTATION OF FOREGROUND PROBABILITY###
cdfMapVolume = []
for image in volume:
    #Get a distribution of intensities of the slice
    dist = generateDistribution(image)
    #Get the cdf for every voxel in the image slice
    cdfMap = zeros_like(image)
    for y in image.height:
        for x in image.width:
            cdfMap[y][x] = dist.cdf(image[y][x])
    cdfMapVolume.append(cdfMap)

###STEP 2: PROBABILITY OF 2D PUNCTA###
filteredVolume = zeros_like(cdfMapVolume)
for z in cdfMapVolume.depth:
    for y in cdfMapVolume.height:
        for x in cdfMapVolume.width:
            #NOTE: boxFilter applies a box filter of the given size at the given zyx.
            #It is equivalent to the product of all elements that are <= size away
            #from the zyx center and are on the same z slice.
            filteredVolume[z][y][x] = boxFilter(cdfMapVolume, z, y, x, size)

###STEP 3: PROBABILITY OF 3D PUNCTA###
finalVolume = zeros_like(filteredVolume)
for z in filteredVolume.depth:
    for y in filteredVolume.height:
        for x in filteredVolume.width:
            #getSquaredError calculates the sum of the squared error between the provided zyx voxel
            #and the voxels at the same yx position within size distance of z (i.e. up and down the z axis)
            finalVolume[z][y][x] = filteredVolume[z][y][x] * exp(-1 * getSquaredError(filteredVolume, z, y, x, size))
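To make the pseudocode above concrete, here is a minimal runnable sketch of the same three steps using only numpy and scipy. This is not the plosLib implementation: the Gaussian model for each slice's intensity distribution, the edge handling, and the default neighborhood size of 1 are simplifying assumptions made for illustration.
In [ ]:
import numpy as np
from scipy.stats import norm

def simplePlosSketch(volume, size=1):
    #ASSUMPTION: simplified illustrative sketch, not the plosLib pipeline
    volume = volume.astype(float)
    depth, height, width = volume.shape
    #Step 1: per-slice foreground probability via the CDF of a Gaussian fit to the slice
    cdfMapVolume = np.zeros_like(volume)
    for z in range(depth):
        image = volume[z]
        dist = norm(np.mean(image), np.std(image) + 1e-9)
        cdfMapVolume[z] = dist.cdf(image)
    #Step 2: probability of a 2D punctum as the product of CDF values in an
    #in-plane (2*size+1) x (2*size+1) window around each voxel
    filteredVolume = np.zeros_like(cdfMapVolume)
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                window = cdfMapVolume[z,
                                      max(y - size, 0):y + size + 1,
                                      max(x - size, 0):x + size + 1]
                filteredVolume[z, y, x] = np.prod(window)
    #Step 3: probability of a 3D punctum, down-weighting voxels whose in-plane
    #probability disagrees with the same (y, x) position in neighboring slices
    finalVolume = np.zeros_like(filteredVolume)
    for z in range(depth):
        neighborhood = filteredVolume[max(z - size, 0):z + size + 1]
        squaredError = np.sum((neighborhood - filteredVolume[z]) ** 2, axis=0)
        finalVolume[z] = filteredVolume[z] * np.exp(-squaredError)
    return finalVolume
The triple loop in step 2 is slow but tractable on the synthetic volumes used here; plosLib.pipeline is the version actually used in the analysis below.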
All code for this PLOS Pipeline can be found in our plosLib.py file, located here:
https://github.com/NeuroDataDesign/pan-synapse/blob/master/code/functions/plosLib.py
In [7]:
import plosLib as pLib
import connectLib as cLib
# success test
twoClusterResults = pLib.pipeline(easySim)
bianTwoClusterResults = cLib.otsuVox(twoClusterResults)
# failure test
uniformResults = pLib.pipeline(hardSim)
bianUniformResults = cLib.otsuVox(uniformResults)
In [26]:
#The binary two cluster results
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
z, y, x = bianTwoClusterResults.nonzero()
ax.scatter(x, y, z, zdir='z', c='r')
plt.show()
In [27]:
#The hard sim (uniform noise) pipeline results
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
z, y, x = uniformResults.nonzero()
ax.scatter(x, y, z, zdir='z', c='r')
plt.show()
I plan to use Probability of Detection (PD) and False Alarm Rate (FAR) to quantify performance. Definitions for these metrics can be found in Section 0: Preamble.
I will compute these metrics for a randomized labeling algorithm (each voxel independently assigned 0 or 1) on the same data as a null comparison.
The code below will be used to quantify the results for the validation data tests.
In [28]:
def executeTest():
    trueVolume, testVolume = generateTestVolume()
    labelVolume = label(trueVolume)
    maxLabel = np.max(labelVolume)
    #get pipeline results
    results = cLib.otsuVox(pLib.pipeline(testVolume))
    detected = zip(*np.where(results == 1))
    #get random results for p value comparison
    randomResults = np.zeros_like(testVolume)
    for z in range(randomResults.shape[0]):
        for y in range(randomResults.shape[1]):
            for x in range(randomResults.shape[2]):
                randomResults[z][y][x] = rand(0, 2)
    randomDetected = zip(*np.where(randomResults == 1))
    #score results
    numDetected = 0
    numMistaken = 0
    alreadyCounted = []
    for point in detected:
        labelPointVal = labelVolume[point[0], point[1], point[2]]
        if labelPointVal != 0 and not labelPointVal in alreadyCounted:
            numDetected += 1
            alreadyCounted.append(labelPointVal)
        if labelPointVal == 0:
            numMistaken += 1
    print "\tPipeline:"
    print "\t\tPD: ", float(numDetected)/maxLabel
    print "\t\tFAR: ", float(numMistaken)/(100 * 100 * 100)
    randNumDetected = 0
    randNumMistaken = 0
    alreadyCounted = []
    for point in randomDetected:
        labelPointVal = labelVolume[point[0], point[1], point[2]]
        if labelPointVal != 0 and not labelPointVal in alreadyCounted:
            randNumDetected += 1
            alreadyCounted.append(labelPointVal)
        if labelPointVal == 0:
            randNumMistaken += 1
    print "\tRandom:"
    print "\t\tPD: ", float(randNumDetected)/maxLabel
    print "\t\tFAR: ", float(randNumMistaken)/(100 * 100 * 100)
    return float(numDetected)/maxLabel, float(numMistaken)/(100 * 100 * 100), float(randNumDetected)/maxLabel, float(randNumMistaken)/(100 * 100 * 100)
In [29]:
def executeFailureTest():
    trueVolume, testVolume = generatePoorTestVolume()
    labelVolume = label(trueVolume)
    maxLabel = np.max(labelVolume)
    #get pipeline results
    results = cLib.otsuVox(pLib.pipeline(testVolume))
    detected = zip(*np.where(results == 1))
    #get random results for p value comparison
    randomResults = np.zeros_like(testVolume)
    for z in range(randomResults.shape[0]):
        for y in range(randomResults.shape[1]):
            for x in range(randomResults.shape[2]):
                randomResults[z][y][x] = rand(0, 2)
    randomDetected = zip(*np.where(randomResults == 1))
    #score results
    numDetected = 0
    numMistaken = 0
    alreadyCounted = []
    for point in detected:
        labelPointVal = labelVolume[point[0], point[1], point[2]]
        if labelPointVal != 0 and not labelPointVal in alreadyCounted:
            numDetected += 1
            alreadyCounted.append(labelPointVal)
        if labelPointVal == 0:
            numMistaken += 1
    print "\tPipeline:"
    print "\t\tPD: ", float(numDetected)/maxLabel
    print "\t\tFAR: ", float(numMistaken)/(100 * 100 * 100)
    randNumDetected = 0
    randNumMistaken = 0
    alreadyCounted = []
    for point in randomDetected:
        labelPointVal = labelVolume[point[0], point[1], point[2]]
        if labelPointVal != 0 and not labelPointVal in alreadyCounted:
            randNumDetected += 1
            alreadyCounted.append(labelPointVal)
        if labelPointVal == 0:
            randNumMistaken += 1
    print "\tRandom:"
    print "\t\tPD: ", float(randNumDetected)/maxLabel
    print "\t\tFAR: ", float(randNumMistaken)/(100 * 100 * 100)
    return float(numDetected)/maxLabel, float(numMistaken)/(100 * 100 * 100), float(randNumDetected)/maxLabel, float(randNumMistaken)/(100 * 100 * 100)
In [33]:
print 'Success Test:'
pd, far, rpd, rfar = executeTest()
print 'Failure Test:'
pd, far, rpd, rfar = executeFailureTest()
In [41]:
spd = []
sfar = []
srpd = []
srfar = []
for num in range(1, 11):
    print "\nExecuting Test: ", num
    pd, far, rpd, rfar = executeTest()
    spd.append(pd)
    sfar.append(far)
    srpd.append(rpd)
    srfar.append(rfar)
print '\n\nAverage Performance:'
print '\tPipeline:'
print '\t\tPD: ', np.average(spd)
print '\t\tFAR: ', np.average(sfar)
print '\tRandom: '
print '\t\tPD: ', np.average(srpd)
print '\t\tFAR: ', np.average(srfar)
In [40]:
fpd = []
ffar = []
frpd = []
frfar = []
for num in range(1, 11):
    print "\nExecuting Test: ", num
    pd, far, rpd, rfar = executeFailureTest()
    fpd.append(pd)
    ffar.append(far)
    frpd.append(rpd)
    frfar.append(rfar)
print '\n\nAverage Performance:'
print '\tPipeline:'
print '\t\tPD: ', np.average(fpd)
print '\t\tFAR: ', np.average(ffar)
print '\tRandom: '
print '\t\tPD: ', np.average(frpd)
print '\t\tFAR: ', np.average(frfar)
I will create a scatter plot detailing the PD and FAR of both algorithms over their 10 tests, for both the success and failure validation tests.
The population performance will simply be calculated as the average PD and FAR for each algorithm over the 10 tests.
Below is the PD of the pipeline (red) vs the random algorithm (blue) over the 10 success-test trials.
In [49]:
fig = plt.figure()
x = np.arange(10)
plt.scatter(x, spd, c='r')
plt.scatter(x, srpd, c='b')
plt.show()
Below is the FAR of the pipeline (red) vs the random algorithm (blue) over the 10 success-test trials.
In [50]:
fig = plt.figure()
x = np.arange(10)
plt.scatter(x, sfar, c='r')
plt.scatter(x, srfar, c='b')
plt.show()
Below is the PD of the pipeline (red) vs the random algorithm (blue) over the 10 failure-test trials.
In [51]:
fig = plt.figure()
x = np.arange(10)
plt.scatter(x, fpd, c='r')
plt.scatter(x, frpd, c='b')
plt.show()
Below is the FAR of the pipeline (red) vs the random algorithm (blue) over the 10 failure-test trials.
In [52]:
fig = plt.figure()
x = np.arange(10)
plt.scatter(x, ffar, c='r')
plt.scatter(x, frfar, c='b')
plt.show()
The population performance is calculated and output during the simulations. Please refer to Simulation Analysis Sections 10 & 11.
The probability of detection plot shows only a small deviation in PD between the random algorithm and the true pipeline, with both detecting over 97% of the synapses correctly. See the quantitative section for intuition as to why this performance is so high.
The deviation between PD measurements in the pipeline is quite small, showing that the pipeline produces stable results.
The false alarm rate plot shows a large difference in FAR between the random algorithm and the true pipeline. The true pipeline had a virtually 0% FAR, far below that of the random algorithm.
The deviation between FAR measurements in the pipeline is quite small, showing that the pipeline produces stable results.
The probability of detection plot shows a slightly larger deviation in PD between the random algorithm and the true pipeline than the success plot did. This deviation, though, is quite small compared to what was expected. Given the high PD of the random algorithm, this means that the synapse detection algorithm is performing better than expected.
The deviation between PD measurements in the pipeline is quite small, showing that the pipeline produces stable results.
The false alarm rate plot shows a large difference in FAR between the random algorithm and the true pipeline. The true pipeline had a virtually 0% FAR, far below that of the random algorithm.
The deviation between FAR measurements in the pipeline is quite small, showing that the pipeline produces stable results.
Compared to the random labeling, the actual pipeline performs ~2.2% worse in probability of detection, but its FAR is roughly 49 percentage points lower, on average.
Some intuition behind these numbers:
The volume of each test synapse is 27 voxels, meaning that the random algorithm has a $1-(0.5)^{27} \approx 0.999$ probability of labeling at least one voxel in the cluster as synapse. This explains the very high PD for the random algorithm.
The FAR of the random algorithm has a similarly simple intuition. Since 98% of the volume is 'not synapse', and the randomized algorithm has a 50% chance of labeling any one voxel as 'synapse', there is a $0.98 \times 0.5 = 0.49$ probability that any given voxel is incorrectly marked as synapse.
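A two-line check of this arithmetic, included purely for illustration:
In [ ]:
#probability that the random labeling hits at least one voxel of a 27 voxel cluster
print 'Random PD per cluster: ', 1 - 0.5 ** 27
#expected fraction of all voxels incorrectly labeled as synapse by the random labeling
print 'Random FAR: ', 0.98 * 0.5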
The PD of the pipeline hovers around 98%. I believe the reason it is not at 100% is that the algorithm does not attempt to classify voxels where any part of the computation in its pipeline is undefined. Because of this, it will not report a true positive on synapses that lie in the corners or on the edges of the volume. I intend to write a test suite to evaluate this hypothesis in the near future.
The FAR of the pipeline is extremely low, at essentially 0%. Under normal circumstances, this would provide evidence that hyperparameters could be tweaked to increase the PD at little to no cost in FAR. However, the current default hyperparameter set is the smallest that can be applied to this particular algorithm, meaning it cannot be tweaked to favor detection over false alarms any more than it already does. With this in mind, we could make a few modifications to the algorithm to optimize it for the circumstances at hand. I intend to test these tweaks once the full pipeline is together and functional.
With a mean detection rate of ~93% and a mean FAR of about 1% on the failure data, the pipeline performed much better than expected on the failure data.
Overall, the pipeline performed extremely well when compared to the random algorithm, detecting ~97% of synapses with ~0% false alarms.
Just to gain some insight into how the pipeline is performing, I wanted to run it on a select few of the real world images and generate some qualitative visualizations. Due to the size of our data, I will be loading it from a pickle file as opposed to directly from the tif images. Additionally, for processing speed reasons, I will only be running the pipeline on small stacks of five consecutive slices, displaying the third slice of each stack as the reference image.
In [13]:
realData = pickle.load(open('../code/tests/synthDat/realDataRaw_t0.synth', 'r'))
for i in range(0, 5):
    realDataSection = realData[5*i:5*i + 5]
    plt.imshow(realDataSection[2], cmap='gray')
    plt.show()
    out = cLib.otsuVox(pLib.pipeline(realDataSection))
    plt.imshow(out[2], cmap='gray')
    plt.show()