Introduction

This notebook facilitates the manual curation of sample alignment.


In [1]:
import deltascope.alignment as ut

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import h5py

import os
import re
import time
import tqdm

Enable interactive matplotlib plots so that we can update plots as needed.


In [6]:
%matplotlib widget

Alternatively, we can use the inline magic for static plots that flexibly resize to fit the current window.


In [2]:
%matplotlib inline

Setup

Parameters


In [3]:
# --------------------------------
# -------- User input ------------
# --------------------------------

param = {
    'gthresh':0.5,
    'scale':[1,1,1],
    'microns':[0.16,0.16,0.21],
    'mthresh':0.5,
    'radius':20,
    'comp_order':[0,2,1],
    'fit_dim':['x','z'],
    'deg':2,
    
    # Don't forget to modify this with a specific sample name
    'expname':'testvideo'
}

Directories


In [11]:
# --------------------------------
# -------- User input ------------
# --------------------------------

# Specify file paths to directories containing probability files
# after processing by ilastik
gfap = os.path.abspath('./C2')
at = os.path.abspath('./C1')

# Specify root directory where output will be saved
root = os.path.abspath('.')

In [10]:
gfap


Out[10]:
'/Users/morganschwartz/Code/deltascope/experiments/testvideo/C2'


In [5]:
root


Out[5]:
'/Users/morganschwartz/Code/deltascope/experiments/testvideo'

In [6]:
# Output directory with timestamp
outname = 'Output_'+time.strftime("%m-%d-%H-%M",
                                 time.localtime())

# Create output directory
outdir = os.path.join(root,outname)
os.mkdir(outdir)

Extract list of files


In [12]:
Dat = {}
for f in os.listdir(at):
    if 'h5' in f:
        num  = re.findall(r'\d+',f.split('.')[0])[-1]
        Dat[num] = os.path.join(at,f)
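The key-extraction step above pulls the last run of digits out of each filename stem. A minimal sketch of that behavior (the helper name `sample_key` is hypothetical, introduced only for illustration):

```python
import os
import re

def sample_key(fname):
    # Take the filename stem (before the first '.') and return the
    # last run of digits, matching the loop above
    stem = os.path.basename(fname).split('.')[0]
    return re.findall(r'\d+', stem)[-1]

print(sample_key('AT_01_Probabilities.h5'))              # prints 01
print(sample_key('/path/to/C1/AT_02_Probabilities.h5'))  # prints 02
```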

In [16]:
Dat


Out[16]:
{'01': '/Users/morganschwartz/Code/deltascope/experiments/testvideo/C1/AT_01_Probabilities.h5',
 '02': '/Users/morganschwartz/Code/deltascope/experiments/testvideo/C1/AT_02_Probabilities.h5'}

In [13]:
Dzrf = {}
for f in os.listdir(gfap):
    if 'h5' in f:
        num  = re.findall(r'\d+',f.split('.')[0])[-1]
        Dzrf[num] = os.path.join(gfap,f)

In [14]:
# Extract list of filename keys
klist = Dat.keys()

In [15]:
klist


Out[15]:
dict_keys(['01', '02'])

In [17]:
# Create dictionaries to contain the deltascope brain object for each sample
Dbat = {}
Dbzrf = {}

Import raw data and perform preprocessing


In [18]:
%%time
for k in tqdm.tqdm(klist):
    if k not in list(Dbat.keys()):
        Dbat[k] = ut.preprocess(Dat[k],param)
        Dbzrf[k] = ut.preprocess(Dzrf[k],param,pca=Dbat[k].pcamed,
                                 mm=Dbat[k].mm,vertex=Dbat[k].vertex)
    else:
        print(k,'already processed')


  0%|          | 0/2 [00:00<?, ?it/s]/Users/morganschwartz/anaconda3/envs/deltascope/lib/python3.7/site-packages/skimage/util/dtype.py:141: UserWarning: Possible precision loss when converting from float32 to uint8
  .format(dtypeobj_in, dtypeobj_out))
 50%|█████     | 1/2 [00:36<00:36, 36.42s/it]/Users/morganschwartz/anaconda3/envs/deltascope/lib/python3.7/site-packages/skimage/util/dtype.py:141: UserWarning: Possible precision loss when converting from float32 to uint8
  .format(dtypeobj_in, dtypeobj_out))
100%|██████████| 2/2 [01:20<00:00, 38.86s/it]
CPU times: user 2min 57s, sys: 7.7 s, total: 3min 5s
Wall time: 1min 20s

Define experiment specific functions


In [35]:
''' Define wrapper functions for starting and saving to minimize the number 
of inputs that the user needs to type for each call of the function.'''
def start(k):
    return(ut.start(k,Dbat,[Dbzrf],im=True))
def save_both(k,dfa,dfb):
    ut.save_both(k,dfa,dfb,outdir,param['expname'])

'''Save model parameters for each file to a dataframe that can be 
exported for later reference.'''
model = pd.DataFrame({'a':[],'b':[],'c':[]})
def save_model(k,mm,model):
    row = pd.Series({'a':mm[0],'b':mm[1],'c':mm[2]},name=k)
    model = model.append(row)
    return(model)

'''Define a function that can both fit a model and plot it on an existing plot'''
def fit_model(axi,df,mm=None):
    if mm is None:
        mm = np.polyfit(df.x,df.z,2)
    p = np.poly1d(mm)
    xrange = np.arange(np.min(df.x),np.max(df.x))
    axi.plot(xrange,p(xrange),c='m')
    return(mm)

'''Take a set of points and transform to a dataframe format for ease of access.'''
def pick_pts(x1,z1,vx,vz,x2,z2):
    pts = pd.DataFrame({'x':[x1,vx,x2],'z':[z1,vz,z2]})
    return(pts)

Example alignment

The start function initializes a sample for alignment. It displays a plot of the data with three projections (XY, XZ, and YZ) that should give a general sense of sample alignment. Note that the start function attempts to align the sample using PCA. In severely misaligned cases, it can be better to work on a sample without PCA alignment, in which case we can use the following code to initialize:

k = list(klist)[0]
df,Ldf = get_dfs(k)
ax = ut.make_graph([df]+Ldf)

In most cases, the following approach will be appropriate.

k,df,Ldf,ax = start(list(klist)[0])

k: The dictionary key that identifies this sample. It should be the sample number extracted from the filename.
df: The dataframe containing datapoints associated with the primary alignment channel, in this case, AT.
Ldf: A list of additional dataframes corresponding to other channels in the collection. In this template, we are assuming only one additional channel, ZRF.
ax: The array containing subplots for this sample. There should be three projections for each channel and a row per channel. In this example the dimensions of the array are 2x3.

Correction options

A: Rotation around the X axis

This rotation error can best be identified in the YZ projection where the line of the sample does not fall on either axis. In order to correct this error, we will select two points in YZ that will be used to calculate a line that fits the sample. We can then use this line to calculate the angle of rotation needed to align the sample with the Z axis.

To perform this rotation, we will use the ut.check_yz function, which will fit a line in the YZ plane to use for rotation. This function takes only df and Ldf as required inputs.

df1,Ldf1,ax,p = ut.check_yz(df,Ldf)

This function returns updated versions of df and Ldf following the rotation. I typically assign them to new variables df1 and Ldf1 to keep track of the changes. It also returns a plotting object ax that displays a before-and-after comparison of the alignment, with the points used for alignment plotted on the before panel for reference. Finally, it returns the np.poly1d object that contains the line that was used for alignment.

If the results of this alignment are good, we can proceed with df1 and Ldf1. Alternatively, we can manually assign an improved line and pass the resulting np.poly1d object to ut.check_yz as an optional argument: ut.check_yz(df,Ldf,mm=np.poly1d).
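One way to build such a line by hand (a sketch; the point values here are made up for illustration) is to fit a degree-1 polynomial through two hand-picked (y, z) points and wrap it as np.poly1d:

```python
import numpy as np

# Two hand-picked (y, z) points the sample should lie along (made-up values)
y_pts = np.array([-30.0, 30.0])
z_pts = np.array([-5.0, 4.0])

# Fit a line and wrap it in np.poly1d, the object type described above
mm = np.poly1d(np.polyfit(y_pts, z_pts, 1))
print(mm.coeffs)  # [slope, intercept]

# Then pass it along, e.g.:
# df1,Ldf1,ax,p = ut.check_yz(df, Ldf, mm=mm)
```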

B: Rotation around the Y axis

This error can be seen in the XZ projection where the parabola of the sample is tilted towards one side or the other. In order to correct this error, we will select two points that mark the endpoints of the parabola. The line between these two points will be used to calculate the angle of rotation to correct the parabola.

To perform this rotation, we will use the check_pts function, which will perform a rotation either in the XY or XZ plane. It requires three parameters: df, Ldf, and the second dimension of the plane of interest ('y' or 'z').

# Attempt rotation based on alignment points calculated from the data
df1,Ldf1,pts,ax = ut.check_pts(df,Ldf,'z')

In addition to typical outputs, this function returns pts, which is a pandas dataframe specifying two points in the XZ plane that were used for alignment.
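Conceptually, the rotation derived from two anchor points can be sketched as follows (an illustration only, not the actual ut.check_pts implementation):

```python
import numpy as np

def angle_from_endpoints(p1, p2):
    # Angle (radians) of the line through p1 and p2 relative to the X axis
    return np.arctan2(p2[1] - p1[1], p2[0] - p1[0])

def rotate_xz(points, theta):
    # Rotate an (n, 2) array of (x, z) points by theta about the Y axis
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return points @ R.T

# Level the sample by rotating through the negative of the measured angle
p1, p2 = (0.0, 0.0), (10.0, 10.0)
theta = angle_from_endpoints(p1, p2)
leveled = rotate_xz(np.array([p2]), -theta)
```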

If we are unhappy with the results of the alignment, we can manually adjust the anchor points and then recalculate the rotation.

# Assign new values to pts (use .loc to avoid chained-indexing issues)
pts.loc[0,'x'] = 10
pts.loc[1,'z'] = 50

# Replot the updated pts to check choices
ax[0,1].scatter(pts.x,pts.z,c='y')

If these pts look good, then we can use ut.revise_pts to recalculate the rotation.

df2,Ldf2,ax = ut.revise_pts(df,Ldf,'z',pts=pts)

If we are happy with these results, we could use df2 and Ldf2 to save as our final result.

C: Mismatched Y and Z axes

Here the parabola appears in the XY projection when we expect it in the XZ projection. We can correct this by simply switching the Y and Z axes. The ut.zyswitch function facilitates this process.

df1,Ldf1 = ut.zyswitch(df,Ldf)
# Plot data to check correction result
ax = ut.make_graph([df1]+Ldf1)

D: Upside down parabola

We expect the parabola to lie in the positive half of the Z dimension. To correct an upside down parabola, we rotate the sample by 180$^\circ$ around the X axis. Here, we will use the ut.flip function.

df1,Ldf1 = ut.flip(df,Ldf)
# Plot data to check correction result
ax = ut.make_graph([df1]+Ldf1)

Correct vertex

After performing the corrections described in the previous section, the vertex of the parabola may no longer be positioned at the origin. The function ut.ch_vertex attempts to reposition the vertex to the origin. This function also returns the math model mm, which describes the parabola of the data. We will save mm for future reference.
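For reference, the vertex of a quadratic model mm = [a, b, c] (z = a*x**2 + b*x + c) sits at x = -b/(2a). A small sketch, with made-up coefficients:

```python
import numpy as np

def parabola_vertex(mm):
    # Vertex (x, z) of z = a*x**2 + b*x + c for coefficients mm = [a, b, c]
    a, b, c = mm
    xv = -b / (2 * a)
    return xv, np.polyval(mm, xv)

# A parabola with its vertex at (5, 2): z = 0.01*(x - 5)**2 + 2
mm = np.array([0.01, -0.1, 2.25])
xv, zv = parabola_vertex(mm)
print(xv, zv)  # 5.0 2.0
```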

For this example, we will assume that we are happy with the alignment in df1 and Ldf1.

df2,Ldf2,mm,ax = ut.ch_vertex(df1,Ldf1)

If we disagree with the assignment of the vertex, we can manually pick three points that will be used to recalculate the vertex. These points should mark the two ends of the parabola (x1,z1) and (x2,z2) as well as the approximate vertex (vx,vz). We can specify and check these points before using them to shift the sample.

pts = pick_pts(-78,12,-36,0,0,10) #these are essentially random numbers for example
ax[0,1].scatter(pts.x,pts.z,c='m',s=50)

Finally if we like these points, we can run ut.ch_vertex again with these specific points.

df3,Ldf3,mm,ax = ut.ch_vertex(df2,Ldf2,pts=pts)

In order to complete the alignment process, we need to add the math model mm to the model dataframe for future reference and save the aligned data.

model = save_model(k,mm,model)
save_both(k,df3,Ldf3[0])

Process remaining samples

The workflow presented in the section above needs to be applied to all samples in klist. It is up to the user to decide which corrections are appropriate for each individual sample. I recommend also using this review as an opportunity to exclude samples with issues in the raw data, such as tears in the commissure or limited signal. The reason for rejecting a sample can be recorded in this notebook for future reference.
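One lightweight way to record exclusions (the sample keys and reasons below are hypothetical, purely for illustration) is a small dictionary exported next to the other output:

```python
import pandas as pd

# Hypothetical rejections; keys would match entries in klist
rejected = {
    '03': 'tear in the commissure',
    '07': 'limited signal',
}

reasons = pd.Series(rejected, name='reason')
reasons.to_csv('rejected_samples.csv')  # or os.path.join(outdir, 'rejected_samples.csv')
```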


In [20]:
klist


Out[20]:
dict_keys(['01', '02'])

01


In [22]:
k


Out[22]:
'01'

In [23]:
df.head()


Out[23]:
x y z
0 7.431914 3.414919 -8.114455
1 7.591783 3.411758 -8.120091
2 -14.569773 17.471701 9.446498
3 -14.409903 17.468540 9.440862
4 -14.250034 17.465379 9.435227

In [24]:
Ldf


Out[24]:
[                x          y          z
 0        3.131551  -8.910951 -23.256206
 1       -0.980481  -8.123329 -22.240900
 2       18.696199  -8.310608 -22.685848
 3       18.856069  -8.313769 -22.691483
 4       19.015938  -8.316931 -22.697119
 5       18.702567  -8.209833 -22.561736
 6       18.862437  -8.212994 -22.567372
 7       19.022306  -8.216155 -22.573007
 8       18.868804  -8.112218 -22.443260
 9        3.111772  -6.690723 -20.520114
 10       3.271642  -6.693884 -20.525750
 11       2.798401  -6.583625 -20.384731
 12       2.958271  -6.586786 -20.390367
 13       3.124508  -6.489172 -20.271891
 14       3.284377  -6.492333 -20.277526
 15       3.290745  -6.391557 -20.153415
 16       3.450615  -6.394718 -20.159050
 17       3.770353  -6.401041 -20.170321
 18       3.137243  -6.287620 -20.023668
 19       3.297113  -6.290781 -20.029303
 20       3.623220  -6.196328 -19.916463
 21       3.469718  -6.092391 -19.786715
 22      20.543237  -4.411966 -17.901849
 23      20.862976  -4.418289 -17.913120
 24      21.022845  -4.421450 -17.918756
 25       0.258242  -1.185608 -13.705375
 26       0.418111  -1.188769 -13.711010
 27       0.577981  -1.191931 -13.716646
 28      -0.055129  -1.078510 -13.569992
 29       0.264610  -1.084832 -13.581263
 ...           ...        ...        ...
 295243 -43.157879  12.267039  32.671925
 295244 -42.998010  12.263878  32.666289
 295245 -42.838140  12.260717  32.660654
 295246 -42.678271  12.257556  32.655018
 295247 -42.518401  12.254394  32.649383
 295248 -42.358532  12.251233  32.643747
 295249 -42.198662  12.248072  32.638112
 295250 -41.719054  12.238588  32.621205
 295251 -41.559184  12.235427  32.615569
 295252 -41.399315  12.232266  32.609934
 295253 -40.919707  12.222782  32.593027
 295254 -40.759837  12.219621  32.587392
 295255 -40.599968  12.216460  32.581756
 295256 -41.392947  12.333041  32.734046
 295257 -41.233078  12.329880  32.728410
 295258 -40.753469  12.320397  32.711503
 295259 -42.985274  12.465429  32.914512
 295260 -41.546449  12.436978  32.863793
 295261 -40.747102  12.421172  32.835615
 295262 -40.427363  12.414850  32.824344
 295263 -38.349060  12.373754  32.751082
 295264 -38.189190  12.370593  32.745446
 295265 -43.618384  12.578850  33.061166
 295266 -41.699951  12.540915  32.993540
 295267 -41.540081  12.537754  32.987904
 295268 -40.740734  12.521948  32.959727
 295269 -40.580864  12.518787  32.954091
 295270 -40.420995  12.515626  32.948456
 295271 -40.261125  12.512464  32.942820
 295272 -38.182822  12.471368  32.869558
 
 [295273 rows x 3 columns]]

In [21]:
k,df,Ldf,ax = start('01')



In [25]:
df1,Ldf1,pts,ax = ut.check_pts(df,Ldf,'z')



In [28]:
pts.iloc[0]['x'] = 45
pts.iloc[0]['z'] = 15
pts


Out[28]:
x z
0 45.000000 15.00000
1 -47.151239 17.27068

In [29]:
df2,Ldf2,ax = ut.revise_pts(df,Ldf,'z',pts=pts)



In [30]:
df3,Ldf3,ax,p = ut.check_yz(df2,Ldf2)



In [31]:
df4,Ldf4,mm,ax = ut.ch_vertex(df3,Ldf3)



In [32]:
model = save_model(k,mm,model)

In [36]:
save_both(k,df4,Ldf4[0])


Write to /Users/morganschwartz/Code/deltascope/experiments/testvideo/Output_01-04-14-43/AT_01_testvideo.psi complete
Write to /Users/morganschwartz/Code/deltascope/experiments/testvideo/Output_01-04-14-43/ZRF_01_testvideo.psi complete

02


In [37]:
k,df,Ldf,ax = start('02')


Wrapping up: after all samples are processed

Once all of the data has been processed, we want to save the model data we collected to a file for future reference.


In [39]:
model


Out[39]:
a b c

In [38]:
model.to_csv(os.path.join(outdir,'model.csv'))

Additionally, it can be helpful to export an HTML or PDF version of the notebook that preserves all of the plots generated during the alignment process. To do so, use the Jupyter Lab menu interface: File > Export Notebook As > HTML or PDF.