Working with image data from viz.neurodata

Preliminary


In [1]:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import urllib2
from __future__ import division

plt.style.use('ggplot')
np.random.seed(1)
url = ('https://raw.githubusercontent.com/Upward-Spiral-Science'
       '/data/master/syn-density/output.csv')
data = urllib2.urlopen(url)
csv = np.genfromtxt(data, delimiter=",",dtype='int')[1:] # don't want first row (labels)
data = csv
print data[::1000]
sizes = [len(np.unique(data[:, i])) for i in range(3)]
ranges = [(np.max(data[:, i]), np.min(data[:,i])) for i in range(3)]
ranges_diff = [np.max(data[:, i])-np.min(data[:,i]) for i in range(3)]
print np.max(data[:,3])
print sizes
print ranges
print ranges_diff
for i, ax in zip(range(3), ['x', 'y', 'z']):
    print ax + '-axis: '
    print 'unique bins (in data): ', np.unique(data[:, i]).size
    print np.unique(data[:, i])
    print


[[    19   1369     55   5063      0]
 [    58   2851   1165      0      0]
 [   136   2344   1054   3042      0]
 [   214   1837    943   3224      0]
 [   253   3358    832      0      0]
 [   331   2851    721  82193     87]
 [   409   2344    610 118055    147]
 [   487   1837    499 153461    205]
 [   526   3358    388      0      0]
 [   604   2851    277  61430     20]
 [   682   2344    166 132042    177]
 [   760   1837     55 153350    177]
 [   799   3319   1165  15316      7]
 [   877   2812   1054 152682    194]
 [   955   2305    943 138771    286]
 [  1033   1798    832 134847    186]
 [  1072   3319    721      0      0]
 [  1150   2812    610 126243    115]
 [  1228   2305    499 150331    178]
 [  1306   1798    388 150579    263]
 [  1345   3319    277      0      0]
 [  1423   2812    166 134814     83]
 [  1501   2305     55 133161    132]
 [  1579   1759   1165 159705    207]
 [  1618   3280   1054   5186      0]
 [  1696   2773    943 152424    120]
 [  1774   2266    832  57804     36]
 [  1852   1759    721 147498    206]
 [  1891   3280    610      0      0]
 [  1969   2773    499 158184    234]
 [  2047   2266    388 147637    183]
 [  2125   1759    277  32607     19]
 [  2164   3280    166      0      0]
 [  2242   2773     55 151533    164]
 [  2320   2227   1165 157801    156]
 [  2398   1720   1054 164268    231]
 [  2437   3241    943   2390      0]
 [  2515   2734    832 147526    197]
 [  2593   2227    721 144017    161]
 [  2671   1720    610 140025    108]
 [  2710   3241    499      0      0]
 [  2788   2734    388 105144     76]
 [  2866   2227    277 155142    226]
 [  2944   1720    166 137611    195]
 [  2983   3241     55      0      0]
 [  3061   2695   1165 156632    143]
 [  3139   2188   1054 161226    216]
 [  3217   1681    943 156828    251]
 [  3256   3202    832   1426      0]
 [  3334   2695    721 147537    173]
 [  3412   2188    610 130806    192]
 [  3490   1681    499 116872    223]
 [  3529   3202    388      0      0]
 [  3607   2695    277 150520    214]
 [  3685   2188    166 136854    176]
 [  3763   1681     55 100042     83]
 [  3802   3163   1165      0      0]
 [  3880   2656   1054     49      0]
 [  3958   2149    943  12275      1]
 [  4036   1642    832  11478      9]
 [  4075   3163    721      0      0]
 [  4153   2656    610  10647      3]]
165789
[108, 52, 11]
[(4192, 19), (3358, 1369), (1165, 55)]
[4173, 1989, 1110]
x-axis: 
unique bins (in data):  108
[  19   58   97  136  175  214  253  292  331  370  409  448  487  526  565
  604  643  682  721  760  799  838  877  916  955  994 1033 1072 1111 1150
 1189 1228 1267 1306 1345 1384 1423 1462 1501 1540 1579 1618 1657 1696 1735
 1774 1813 1852 1891 1930 1969 2008 2047 2086 2125 2164 2203 2242 2281 2320
 2359 2398 2437 2476 2515 2554 2593 2632 2671 2710 2749 2788 2827 2866 2905
 2944 2983 3022 3061 3100 3139 3178 3217 3256 3295 3334 3373 3412 3451 3490
 3529 3568 3607 3646 3685 3724 3763 3802 3841 3880 3919 3958 3997 4036 4075
 4114 4153 4192]

y-axis: 
unique bins (in data):  52
[1369 1408 1447 1486 1525 1564 1603 1642 1681 1720 1759 1798 1837 1876 1915
 1954 1993 2032 2071 2110 2149 2188 2227 2266 2305 2344 2383 2422 2461 2500
 2539 2578 2617 2656 2695 2734 2773 2812 2851 2890 2929 2968 3007 3046 3085
 3124 3163 3202 3241 3280 3319 3358]

z-axis: 
unique bins (in data):  11
[  55  166  277  388  499  610  721  832  943 1054 1165]
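A side note for anyone running this on Python 3, where urllib2 no longer exists: the same load works with urllib.request.urlopen, and the header-skipping logic is unchanged. A minimal sketch, using an in-memory stand-in for output.csv so it runs offline (the column names here are assumed from context, not taken from the file):

```python
import io
import numpy as np

# Stand-in for the first two lines of output.csv (column names assumed).
fake_csv = io.BytesIO(b"cx,cy,cz,unmasked,synapses\n19,1369,55,5063,0\n")
# Non-numeric header entries become the int fill value under dtype='int';
# slicing [1:] drops that row, exactly as in the urllib2-based cell above.
rows = np.genfromtxt(fake_csv, delimiter=",", dtype="int")[1:]
print(rows)
```

On Python 3, `data = urllib.request.urlopen(url)` would replace the `urllib2.urlopen(url)` call and the rest of the cell stays the same.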

Using image data

Where's the origin (xy-plane)? See blue marker (y axis points down)

converting from data units (cx, cy, cz) to pixels

From the data given, it seems that in the x-direction we have 108 bins, and in the y-direction 86. Further support for this claim: 108/86 ≈ 450/350, where 450x350 are the stated dimensions (µm) of the imaged region in Bock 2011.
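As a quick arithmetic check of that claim (the 450x350 µm figure is taken from the text above, not re-derived here, and 86 assumes a truncated y-axis):

```python
bins_x, bins_y = 108, 86        # 86 assumes the y-axis was truncated, per the text
ratio_bins = bins_x / float(bins_y)
ratio_um = 450.0 / 350.0        # stated field-of-view dimensions, Bock 2011
print(ratio_bins, ratio_um)     # ~1.26 vs ~1.29 -- close, as claimed
```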


In [2]:
xPix = 135424
yPix = 119808
xPixPerBin = xPix/108.0
yPixPerBin = yPix/86.0
print xPixPerBin, yPixPerBin
# Now, since adjacent bin centers are 39 data coordinates apart, we can define the following function to convert from data coordinates to pixels.


1253.92592593 1393.11627907

In [3]:
def coords_to_px(xcoord, ycoord):
    c_vec = np.array([xcoord, ycoord], dtype='float')
    c_vec /= 39.0
    return (c_vec[0]*xPixPerBin, c_vec[1]*yPixPerBin)

# check that max coordinate values are close to max pixels
print coords_to_px(4192, 3358)
# how big is a bin? (just a sanity check, should obviously 
# be identical to xPixPerBin and yPixPerBin)
print coords_to_px(39.0, 39.0)


(134780.96106362774, 119950.88372093023)
(1253.9259259259259, 1393.1162790697674)
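For completeness, a hypothetical inverse helper (not part of the original analysis) that maps pixels back to data coordinates; it simply inverts coords_to_px under the same constants:

```python
xPix, yPix = 135424, 119808
xPixPerBin, yPixPerBin = xPix / 108.0, yPix / 86.0

def px_to_coords(xpx, ypx):
    # Hypothetical inverse of coords_to_px: pixels back to data units.
    return (39.0 * xpx / xPixPerBin, 39.0 * ypx / yPixPerBin)

# Round-tripping the max-coordinate check from above:
print(px_to_coords(134780.96106362774, 119950.88372093023))
# should recover roughly (4192, 3358)
```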

Grabbing images from website

Looking through the JavaScript code on viz.neurodata.io, we see that it tiles with 512x512 .png images. At resolution 0, each image holds 512x512 "pixels" (corresponding to the brain images). This can be seen because the total number of tiles in the x-direction is 264, and 264x512=135168, approximately our max pixels; for the y-direction, max tiles is 233, and 233x512=119296, approximately the max pixels. Similarly, at resolution 1 each 512x512 .png now holds 1024x1024 "pixels", and so on as resolution goes up. Also note that at resolution 1 a single tile is very close in size to a single bin (looking at the second line of output of the previous code block, we see that bins are slightly larger).
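The tile arithmetic in that paragraph can be checked directly (tile counts 264 and 233 come from the viz JS, per the text):

```python
tiles_x, tiles_y = 264, 233     # from the viz.neurodata.io JS, per the text
xPix, yPix = 135424, 119808
print(tiles_x * 512, xPix)      # 135168 vs 135424
print(tiles_y * 512, yPix)      # 119296 vs 119808
```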


In [4]:
def get_tilenums_at_res(xcoord, ycoord, res):
    x, y = coords_to_px(xcoord, ycoord)
    x = np.floor(float(x)/(512*(2**res)))
    y = np.floor(float(y)/(512*(2**res)))
    return x,y

def get_image_url1(xcoord, ycoord, res, z):
    x, y = get_tilenums_at_res(xcoord, ycoord, res)
    x = int(x)
    y = int(y)
    end = '/'+reduce(lambda x, y: str(x) +'_'+str(y), [y, x, res])
    end += '.png'
    imgurl = 'http://openconnecto.me/ocp/catmaid/bock11/image/xy/'+str(z) +end
    return imgurl

print get_image_url1(2000, 2000, 0, 2917)


http://openconnecto.me/ocp/catmaid/bock11/image/xy/2917/139_125_0.png

Now we just need to figure out the z-axis. The z-values in the image data go from 2917 to 4156, which is a range of


In [5]:
print (2917-4156)*-1


1239

So it seems that the z-values in our data correspond approximately to the z-values in the image data, up to a translation of 2917. So let's redefine our function and put everything in one code block so it's easy for other people to use...
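A quick check that the translation keeps every data z-value inside the image stack's stated range of 2917-4156 (the z bin values below are the unique z values printed earlier):

```python
z_vals = [55, 166, 277, 388, 499, 610, 721, 832, 943, 1054, 1165]  # unique z bins
shifted = [z + 2917 for z in z_vals]
print(shifted[0], shifted[-1])   # 2972 ... 4082, both within [2917, 4156]
```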


In [6]:
# NOTE: you can copy just this block into your notebook to use the get_image_url() function 

xPix = 135424
yPix = 119808
xPixPerBin = xPix/108.0
yPixPerBin = yPix/86.0
max_tiles_x = 264 # found via inspection of viz html/JS code
max_tiles_y = 233 # found via inspection of viz html/JS code

def coords_to_px(xcoord, ycoord):
    c_vec = np.array([xcoord, ycoord], dtype='float')
    c_vec /= 39.0
    return (c_vec[0]*xPixPerBin, c_vec[1]*yPixPerBin)

def get_tilenums_at_res(xcoord, ycoord, res):
    x, y = coords_to_px(xcoord, ycoord)
    x = np.round(float(x)/(512*(2**res)))
    y = np.round(float(y)/(512*(2**res)))
    return x,y

def get_image_url(xcoord, ycoord, zcoord, res=1):
    """
    params: 
    - xcoord, ycoord, zcoord all in terms of coordinates in original data file
    - res = image resolution, default = 1
        (a res-1 tile holds 1024x1024 pixels, approx. the size of a bin--I think)
    returns: (string) url of image
    """
    zcoord += 2917
    z = int(zcoord)
    x, y = get_tilenums_at_res(int(xcoord), int(ycoord), res)
    x = int(x)
    y = int(y)
    if(x > max_tiles_x//(2**res)): x = max_tiles_x//(2**res)
    if(y > max_tiles_y//(2**res)): y = max_tiles_y//(2**res)
    end = '/' + reduce(lambda x, y: str(x) +'_'+str(y), [y, x, res])
    end += '.png'
    imgurl = 'http://openconnecto.me/ocp/catmaid/bock11/image/xy/' + str(z)
    return imgurl+end

Testing the image scraper w/ some exploratory questions


In [7]:
from IPython.display import Image, HTML, display

disp_dim = {'width': 200, 'height': 200} # just for quickly setting image width/height
m = np.max(data[:, -1])
a = np.where(data[:, -1]==m)
args = list(*data[a, (0, 1, 2)])+[0]
imgs = []
for r in range(3):
    args[-1] = r
    u = get_image_url(*args)
    print u
    imgs.append(Image(url=u, **disp_dim))

display(*imgs)


http://openconnecto.me/ocp/catmaid/bock11/image/xy/3971/131_173_0.png
http://openconnecto.me/ocp/catmaid/bock11/image/xy/3971/65_86_1.png
http://openconnecto.me/ocp/catmaid/bock11/image/xy/3971/33_43_2.png

The above should be images of the bin where the maximal number of synapses occurred. Note that changing the resolution lets us zoom in and out.

What do high density and low density regions look like?


In [8]:
dens_data = np.copy(data).astype(float)
dens_data = dens_data[np.where(dens_data[:,3] != 0)]
dens_data[:, 3] = dens_data[:, 4]/dens_data[:, 3]
dens_data = dens_data[:,:-1]
print np.average(dens_data[:,-1])
a = np.argsort(dens_data[:, -1])
urlsMin, urlsMax = zip(*[(get_image_url(*dens_data[a[i],:-1]),
                          get_image_url(*dens_data[a[-1-i],:-1]))  
                         for i in range(9)])
tagsMin = ''.join(["<img style='width: 80px; margin: 0px; padding-right: 3px; float: left;' src='%s' />" % str(u) 
                     for u in urlsMin ])
tagsMin += '<br> <br>'
tagsMax = ''.join(["<img style='width: 80px; margin: 0px; padding-right: 3px; float: left;' src='%s' />" % str(u) 
                     for u in urlsMax ])
display(HTML(tagsMin))
display(HTML(tagsMax))


0.0010395174988


The black regions are most likely masked regions, so it is actually not surprising to see large masked areas for both high- and low-density bins (a small unmasked value inflates density while also lowering the chance of observing synapses). Furthermore, note that the data is binned across many z-slices, while here we are only looking at one z-slice at a time, so it is plausible for a high-density bin to have an entire slice masked. This also suggests it would be beneficial to write a function that computes the pixel-wise average across a bin's z-slices and returns the corresponding image. We can also restrict attention to the more 'cleaned' data, since many boundary points are likely being picked up here.
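The z-averaging idea mentioned above can be sketched without any network access: assuming the tiles for a bin have already been fetched (via the get_image_url() URLs plus any image library) and decoded into equal-sized grayscale arrays, the averaging itself is just a pixel-wise mean. The fetch-and-decode step is deliberately left out, so the synthetic tiles below stand in for real ones:

```python
import numpy as np

def average_slices(tiles):
    # Pixel-wise mean over a list of equal-shaped grayscale tile arrays,
    # i.e. the per-bin average across z-slices proposed above.
    stack = np.stack([np.asarray(t, dtype=float) for t in tiles])
    return stack.mean(axis=0)

# Demonstration on synthetic "tiles" (real ones would come from the image URLs):
tiles = [np.full((4, 4), v) for v in (0.0, 100.0, 200.0)]
print(average_slices(tiles)[0, 0])   # 100.0
```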


In [9]:
# get the clean data
x_bounds = (409, 3529)
y_bounds = (1564, 3124)

def check_in_bounds(row, x_bounds, y_bounds):
    if row[0] < x_bounds[0] or row[0] > x_bounds[1]:
        return False
    if row[1] < y_bounds[0] or row[1] > y_bounds[1]:
        return False
    if row[3] == 0:
        return False
    return True

indices_in_bound, = np.where(np.apply_along_axis(check_in_bounds, 1, csv, x_bounds, y_bounds))
data_clean = csv[indices_in_bound]

dens_data = np.copy(data_clean).astype(float)
dens_data = dens_data[np.where(dens_data[:,3] != 0)]
dens_data[:, 3] = dens_data[:, 4]/dens_data[:, 3]
dens_data = dens_data[:,:-1]
print np.average(dens_data[:,-1])
a = np.argsort(dens_data[:, -1])
urlsMin, urlsMax = zip(*[(get_image_url(*dens_data[a[i],:-1]),
                          get_image_url(*dens_data[a[-1-i],:-1]))  
                         for i in range(9)])
tagsMin = ''.join(["<img style='width: 80px; margin: 0px; padding-right: 3px; float: left;' src='%s' />" % str(u) 
                     for u in urlsMin ])
tagsMin += '<br> <br>'
tagsMax = ''.join(["<img style='width: 80px; margin: 0px; padding-right: 3px; float: left;' src='%s' />" % str(u) 
                     for u in urlsMax ])
display(HTML(tagsMin))
display(HTML(tagsMax))


0.00115002980202


How about regions with high unmasked and low synapses, and regions with high unmasked and high synapses?


In [10]:
from itertools import count

avg_unmasked = np.average(data[:,3])
high_unmasked = data[np.where(data[:,3] > avg_unmasked)]

low_synapses = []
for s in count():
    low_synapses = high_unmasked[np.where(high_unmasked[:,-1]==s)]
    if low_synapses.size > 0:
        print s
        break
d_low = low_synapses[0]
print d_low
imgs = []
imgs.append(Image(url=get_image_url(*d_low[:3]), **disp_dim))
max_s = np.max(data[:, 4])
print max_s
high_synapses = []
for s in range(max_s, 0, -1):
    high_synapses = high_unmasked[np.where(high_unmasked[:,-1]==s)]
    if high_synapses.size > 0:
        print s
        break
d_high = high_synapses[0]
print d_high
imgs.append(Image(url=get_image_url(*d_high[:3]), **disp_dim))
display(*imgs)


9
[   838   2695   1054 107245      9]
507
507
[  2749   1876   1054 149991    507]

In [11]:
# zoom in a resolution
display(Image(url=get_image_url(*d_low[:3], res=0), **disp_dim), 
        Image(url=get_image_url(*d_high[:3], res=0), **disp_dim))


A significant number of bins below a given threshold on the y-axis were cut off before the data was given to us... What does that cut line look like?

Since it appears that the coordinates in our data correspond to the centers of bins, we can see that no cut was applied along the x-axis, since the data starts at 19, about half the bin spacing of 39 (so the first bin touches the origin), and similarly no cut along z, since the data starts at 55, about half the bin spacing of 111. But along the y-axis the data starts at 1369; 1369/39 ≈ 35, meaning roughly the first 35 bins along the y-axis were cut across all the data.
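The bin spacings can be read straight off the unique-value lists printed earlier; note the exact spacings are 39 along x and 111 along z:

```python
import numpy as np

xs = np.array([19, 58, 97, 136])      # first few unique x values from the data
zs = np.array([55, 166, 277, 388])    # first few unique z values
print(np.unique(np.diff(xs)))         # [39]  -> x bin spacing
print(np.unique(np.diff(zs)))         # [111] -> z bin spacing
print(1369 // 39)                     # ~35 y bins fit below the first data value
```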


In [12]:
# fairly arbitrarily, look at midpoint for x and z
midx, midz = [np.median(np.unique(data[:, i])) for i in [0,2]]
y = np.min(data[:, 1])
print midx, y, midz
Image(url=get_image_url(midx, y, midz, 2), **disp_dim)


2105.5 1369 610.0
Out[12]:

In [13]:
# nothing apparently notable, let's view across the entire x-axis
from itertools import groupby

urls = [k for k,_ in groupby(
        [get_image_url(x, y, midz, 3) 
         for x in np.sort(np.unique(data[:, 0]))])]
imgTags = ''.join( ["<img style='width: 20px; margin: 0px; padding-bottom: 3px; float: left;' src='%s' />" % str(u) 
                     for u in urls ])
display(HTML(imgTags))

# y value below cutoff
urls = [k for k,_ in groupby(
        [get_image_url(x, 39*10, midz, 3) 
         for x in np.sort(np.unique(data[:, 0]))])]
imgTags = ''.join( ["<img style='width: 20px; margin: 0px; padding-bottom: 3px; float: left;' src='%s' />" % str(u) 
                     for u in urls ])
display(HTML(imgTags))


The first row shows where the data was cut; the second is from beyond the cut (that is, y-values not included in the set)... Since these black regions probably correspond to heavily masked regions, perhaps this is why the data was cut here?


In [14]:
# what bins along this y value have unmasked = 0?
for row in data[np.where(data[:, 1] == y)]:
    if row[3] == 0: print row
print np.average(data[np.where(data[:, 1] == y+39*2), 3])


[  19 1369 1165    0    0]
[  58 1369 1165    0    0]
[  97 1369 1165    0    0]
[ 136 1369 1165    0    0]
[4114 1369 1165    0    0]
[4153 1369 1165    0    0]
[4192 1369 1165    0    0]
118932.739899

In [15]:
# do same thing for z = 1165, since we observe this is where all the unmasked = 0 bins occur

urls = [k for k,_ in groupby(
        [get_image_url(x, y, 1165, 3) 
         for x in np.sort(np.unique(data[:, 0]))])]
imgTags = ''.join( ["<img style='width: 20px; margin: 0px; padding-bottom: 3px; float: left;' src='%s' />" % str(u) 
                     for u in urls ])
display(HTML(imgTags))


Let's confirm that black regions do in fact correspond to low unmasked.

What's unmasked = 0 look like?


In [16]:
a, = np.where(data[:, 3] == 0)
# first unmasked = 0
args = list(data[a[0], (0, 1, 2)])
u = get_image_url(*args)
print u
Image(url=u, **disp_dim)


http://openconnecto.me/ocp/catmaid/bock11/image/xy/4082/48_1_1.png
Out[16]:

In [17]:
# middle one
args = list(data[a[a.size//2], (0, 1, 2)])
u = get_image_url(*args)
print u
Image(url=u, **disp_dim)


http://openconnecto.me/ocp/catmaid/bock11/image/xy/2972/116_75_1.png
Out[17]:

In [18]:
# zoom out middle
args += [7]
u = get_image_url(*args)
print u
Image(url=u, **disp_dim)


http://openconnecto.me/ocp/catmaid/bock11/image/xy/2972/1_1_7.png
Out[18]:

Looks like regions where unmasked = 0 correspond to the edges of the imaged volume..


In [19]:
# last
args = list(data[a[-1], (0, 1, 2)])
u = get_image_url(*args)
print u
Image(url=u, **disp_dim)


http://openconnecto.me/ocp/catmaid/bock11/image/xy/4082/116_132_1.png
Out[19]:

Seems to be working... Another sanity check: if at resolution 0 each returned tile contains 512x512 pixels, and that doubles along each axis as resolution increases, then a tile at resolution r should span 512x2^r pixels per side. Therefore resolution 8 should return approximately the entire cross section, since 512x2^8=131072, which is approximately the total number of pixels along the x-axis (135424). Let's test this out by grabbing the first z-layer's tile at the origin for resolution 8, i.e. the tile with url http://openconnecto.me/ocp/catmaid/bock11/image/xy/2917/0_0_8.png.
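The arithmetic behind that check:

```python
xPix = 135424               # total pixels along x
side = 512 * 2**8           # pixels per tile side at resolution 8
print(side, xPix)           # 131072 vs 135424: one res-8 tile nearly covers it
```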

Seems to be correct.

Images within context of previous analyses

TODO, i.e. stuff I thought I'd have time for this week:

  • Plot density clusterings on top of image data, will help visualize what high synaptic density areas look like, and (hopefully) help visualize the clustering results
  • Plot gradient estimates on top of image data
  • More ways to visualize data than just grabbing a single xy slice
    • Averaging across z slices for a bin (mentioned previously)
    • Sequentially or interactively present image data (could use embedded JavaScript/html)
  • Find a more accurate mapping from data coordinates to images (it seems OK at the moment, but I think it could be better... there's lots of empty space, since the origin (for the y-axis) starts well above the data... additionally, I would like to incorporate real measurements (see the Bock 2011 paper), since I mostly just focused on pixels and compared them to whatever units our data set is in)
  • etc.

In [ ]: