This project started with the idea of making atom identification and trajectory generation as flexible as possible. It therefore uses sliding windows and AdaBoost to efficiently find the features that best distinguish the available positive and negative samples.
Once the AdaBoost classifier is trained we can calculate atom positions in the available frames and generate trajectories from them.
Note that there is no reason not to try other classifiers as well; sklearn's API makes this very user-friendly.
To generate samples we use k-means clustering in a feature space. For each pixel we build a feature vector containing the pixel's intensity as well as the intensity differences to its neighboring pixels (here the first shell, but you can change that). Among the resulting clusters we choose the one with the highest average pixel intensity.
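A minimal sketch of this per-pixel feature construction and cluster selection on a synthetic image (the neighbourhood handling, parameters, and variable names here are illustrative assumptions, not AtomStalker's implementation):

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic image with one bright blob standing in for an atom.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
img[10:14, 10:14] += 2.0

# Per-pixel feature vector: [intensity, differences to the 4
# first-shell neighbours] (edges padded by replication).
pad = np.pad(img, 1, mode='edge')
feats = np.stack([
    img,
    img - pad[:-2, 1:-1],  # difference to upper neighbour
    img - pad[2:, 1:-1],   # difference to lower neighbour
    img - pad[1:-1, :-2],  # difference to left neighbour
    img - pad[1:-1, 2:],   # difference to right neighbour
], axis=-1).reshape(-1, 5)

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(feats)
# Pick the cluster whose members have the highest mean intensity.
best = max(range(4), key=lambda c: feats[labels == c, 0].mean())
mask = (labels == best).reshape(img.shape)
```

The pixels selected by `mask` then serve as candidate positive sample positions.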
In [1]:
%reset
%matplotlib inline
from AtomStalker import load_TEM_image, TEM_sliding_collector
path = "./example-subtracted.h5"
idx_frames = range(10)
xlim = [0,240]
ylim = [0,280]
width=8
height=8
xstep=12 #shift of pixels in x per iteration
ystep=12 #shift of pixels in y per iteration
padding=5
distance = 1
ix = 0
n_clusters = 4
n_frames_pca_km = 2
key = '/Experiments/__unnamed__/data' #old h5 file: 'dataset', new: '/Experiments/__unnamed__/data'
img_vec = [load_TEM_image(path,h5=True,frame=v,key=key) for v in idx_frames]
Tsc = TEM_sliding_collector()
Tsc.set_window_size(width=width,height=height)
Tsc.set_image(img_vec)
Tsc.create_samples(n_frames_pca_km,n_clusters,distance,xstep=xstep,ystep=ystep,max_iter=300,tol=1e-4,t=1,cluster_min_size=9)
Tsc.show_all_sliding_window_classifications()
The sample generation suggests classes, shown above with positive in green and negative in red. If any suggestions are wrong, please specify them below as a list of lists. Each list corresponds to one frame, with integer entries matching the numbers shown in the patches in the images above. The lists are interpreted frame by frame, so if there are no changes to be made for a frame, insert an empty list for that frame.
Tip: If you have mistakenly assigned a class to a window using the cell below, just execute the cell again to flip the classification back to its original value.
In [2]:
wrong_classifications = [[205],
[206,3]]
Tsc.set_classifications(wrong_classifications)
Tsc.show_all_sliding_window_classifications()
In [3]:
Tsc.multiply_all_samples(distance=1)
Tsc.show_all_sliding_window_classifications()
In [4]:
classes_save_path = "./classes_for_classifier_training.txt"
Tsc.save_sample_positions_and_class(classes_save_path)
In [5]:
%matplotlib inline
from AtomStalker import load_TEM_image, TEM_sliding_collector
path = "./example-subtracted.h5"
xlim = [0,240]
ylim = [0,280]
width=8
height=8
padding=5
key = '/Experiments/__unnamed__/data'
img_vec = [load_TEM_image(path,h5=True,frame=v,key=key) for v in range(10)]
Tsc = TEM_sliding_collector()
Tsc.set_window_size(width=width,height=height)
Tsc.set_image(img_vec)
Tsc.get_sub_image_vec(xlim=xlim,ylim=ylim,padding=padding)
classes_save_path = "./classes_for_classifier_training.txt"
Tsc.load_sample_positions_and_class(classes_save_path)
Tsc.show_all_sliding_window_classifications(default_class=0,colors=['r','g','y','m'],frames=range(10))
In [6]:
import sys
sys.path.append("D:/Anaconda/envs/python2/Lib/site-packages")
import pywt
print(pywt.wavelist())
As mentioned earlier we use an AdaBoost classifier here, but you can easily substitute another one if you so choose.
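For instance, any scikit-learn estimator with the usual fit/predict interface could be dropped in. A minimal sketch on toy data (purely illustrative; AtomStalker itself is not imported, and the features are random stand-ins for the extracted windows):

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

# Toy stand-in for the extracted window features and their classes.
rng = np.random.default_rng(1)
X = rng.random((200, 16))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Both classifiers expose the same fit/predict interface, so either
# could be handed to ClassificationTrainer in the same way.
accs = {}
for clf in (AdaBoostClassifier(n_estimators=50),
            RandomForestClassifier(n_estimators=50, random_state=0)):
    clf.fit(X, y)
    accs[type(clf).__name__] = (clf.predict(X) == y).mean()
```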
In [7]:
from AtomStalker import ClassificationTrainer
from sklearn.ensemble import AdaBoostClassifier
path_classifier = './classifier_test'
all_samples, all_classifications = Tsc.get_all_samples_and_classifications()
CT = ClassificationTrainer(wavelet='coif1')
CT.set_samples(all_samples)
CT.get_features()
ABC = AdaBoostClassifier(base_estimator=None, n_estimators=50, learning_rate=1.0, algorithm='SAMME.R', random_state=None)
CT.set_classifier_and_train(ABC,all_classifications)
CT.save_classifier2disk(path_classifier)
Quality of the classifier, evaluated on the training data.
In [8]:
new_classy = CT.get_predictions(all_samples)
tp = sum([v==1 for v,v2 in zip(all_classifications,new_classy) if v==v2])
tn = sum([v!=1 for v,v2 in zip(all_classifications,new_classy) if v==v2])
fn = sum([v!=1 for v,v2 in zip(all_classifications,new_classy) if v!=v2])
fp = sum([v==1 for v,v2 in zip(all_classifications,new_classy) if v!=v2])
R = tp/float(tp+fn)
P = tp/float(tp+fp)
F1 = 2*P*R/(P+R)
print("#{} positive classifications {} negative ones. tp {} fp {} tn {} fn {} F1 score = {}".format(sum(new_classy),len(new_classy)-sum(new_classy),tp,fp,tn,fn,F1))
print("all classy {} new classy {}".format(len(all_classifications),len(new_classy)))
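The same precision/recall/F1 bookkeeping is also available from `sklearn.metrics`; a minimal check on toy labels (not the notebook's actual predictions):

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
P, R, F1, _ = precision_recall_fscore_support(y_true, y_pred,
                                              average='binary')
```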
In [9]:
from AtomStalker import ImagePeakClassifier
path = "./example-subtracted.h5"
path_save_coordinates = 'coordinates.txt'
path_classifier = './classifier_test'
t = 5 #maximum spatial distance of found window centers to be clustered - if two atoms are identified as one, this value may be too large
xstep = 1
ystep = 1
num_frames = 5
width=8
height=8
Ipc = ImagePeakClassifier(wavelet='coif1')
#set classifier
classy = Ipc.load_classifier(path_classifier)
#Ipc.set_trained_classifier(CT.classifier)
#process data
Ipc.process_TEM_sequence(path,width,height,xstep=xstep,ystep=ystep,iclassy=[1],t=t,num_frames=num_frames,key='/Experiments/__unnamed__/data')
#save coordinates
Ipc.save_ClusterCenters2disk(path_save_coordinates,separator=' ')
#for i in range(num_frames):
# Ipc.show_window_for_single_img(i)
Ipc.show_window_for_single_img(1)
Ipc.show_window_for_single_img(2)
Ipc.show_window_for_single_img(3)
Ipc.show_window_for_single_img(4)
#tr.print_diff()
... or via the wrapping function Process_TEM.
In [2]:
from AtomStalker import ImagePeakClassifier, Process_TEM
path = "./example-subtracted.h5"
path_save_coordinates = 'coordinates.txt'
path_classifier = './classifier_test'
t = 5 #maximum spatial distance of found window centers to be clustered - if two atoms are identified as one, this value may be too large
xstep = 1
ystep = 1
num_frames = 5
width=8
height=8
centers, corners = Process_TEM(path,path_classifier,width,height,xstep=xstep,ystep=ystep,t=t,num_frames=num_frames,key='/Experiments/__unnamed__/data',
write_path=path_save_coordinates,bool_return=True,wavelet='coif1')
In [1]:
%matplotlib inline
from AtomStalker import load_coordinates
#reading real positions for inference
path = "./example_coordinates.txt"
loaded_positions = load_coordinates(path)
loaded_intensities = [[1]*pos.shape[1] for pos in loaded_positions] #faking some intensities (just for the IO - not actually included in the sigmoid)
Let's connect the dots...
In [2]:
%matplotlib inline
from AtomStalker import show_trajectories
show_trajectories(loaded_positions,None,None,show_tuple=False,t_lim=(0,10))
In [3]:
from AtomStalker import SigmoidConstructor
import numpy as np
space = {'r':np.linspace(0,50,100),'dt':np.linspace(0,5,5)} #ranges for the variables included into the sigmoid
SC = SigmoidConstructor(space)
SC.create()
SC.show_sigmoid()
splined_sigmoid = SC.get_splined_sigmoid()
Given two observed atoms we can collect their differences in $\vec{x}$, in our case $\vec{x} = (r,\theta,\Delta I,\dots)^T$. Using $\vec{x}$ we want to decide whether these two atoms form a link or not. This is equivalent to classifying $\vec{x}$ into either $\mathcal{C}=+$ for "yes, they form a link" or $\mathcal{C}=-$ otherwise. The idea is then either to use user input, manually specifying links from which the two classes are generated, or to provide an initial assumption about the distribution of those classes. Every so often these classes are updated to better match the actual observations.
This is based on the logistic sigmoid function $\sigma(a) = \frac{1}{1+\exp{(-a)}}$ with $a=\ln\frac{p(\vec{x}|C=+)p(C=+)}{p(\vec{x}|C=-)p(C=-)}$. Hence if $p(\vec{x}|C=+) = p(\vec{x}|C=-)$ then $\sigma = 1/2$. If $C=+$ is more likely then $\sigma > 1/2$ and vice versa.
Practically this means one creates a histogram for each class containing all available info, plugs both into the logistic sigmoid, and fits all sorts of things to it: splines (currently), Gaussian processes, ...
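A minimal sketch of this histogram-to-sigmoid construction, assuming synthetic distance distributions for the two classes and equal priors (the distributions, bins, and spline settings are illustrative, not AtomStalker's):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Assumed class-conditional distributions of the jump distance r:
# short jumps are typical for "link" (+), long ones for "no link" (-).
rng = np.random.default_rng(2)
r_pos = rng.normal(3.0, 1.0, 5000).clip(0)   # assumed p(r|C=+)
r_neg = rng.normal(12.0, 4.0, 5000).clip(0)  # assumed p(r|C=-)

bins = np.linspace(0, 25, 26)
h_pos, _ = np.histogram(r_pos, bins=bins, density=True)
h_neg, _ = np.histogram(r_neg, bins=bins, density=True)

# a = ln[p(x|C=+)/p(x|C=-)] (equal priors), sigma = logistic sigmoid.
eps = 1e-12
a = np.log((h_pos + eps) / (h_neg + eps))
sigma = 1.0 / (1.0 + np.exp(-a))

# Fit a smoothing spline through the per-bin sigmoid values.
centers = 0.5 * (bins[:-1] + bins[1:])
splined = UnivariateSpline(centers, sigma, s=0.01)
```

The resulting `splined` plays the role of the splined sigmoid returned by `SC.get_splined_sigmoid()` above.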
Given an atom at time $i$ and "neighboring" atoms at time $i+1$, there can be multiple $\vec{x}$ classified as $\mathcal{C}=+$. We therefore use different strategies to link them; three linking strategies are implemented:
The greedy strategy always chooses the link with the maximum sigmoid value.
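A simplified sketch of the greedy choice (illustrative, not the AtomStalker API; each atom independently takes the best candidate, and conflicts between atoms are ignored here):

```python
import numpy as np

def greedy_links(sigma_matrix):
    """sigma_matrix[i, j]: sigmoid value of linking atom i to candidate j.
    Returns, for each atom, the index of the candidate with maximum value."""
    return np.argmax(sigma_matrix, axis=1)

sigma = np.array([[0.9, 0.2, 0.1],
                  [0.3, 0.8, 0.4]])
links = greedy_links(sigma)  # atom 0 -> candidate 0, atom 1 -> candidate 1
```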
In [4]:
from AtomStalker import link_positions
dt = 5
t_max = 25
t_update = 5
LM_traj_greedy, link_values_greedy = link_positions(loaded_positions,loaded_intensities,splined_sigmoid,dt=dt,t_update=t_update,t_max=t_max,
link_type='greedy',link_paras=None,final_eval=True)
The $\varepsilon$-greedy strategy is a modification of the greedy strategy. An issue with the greedy strategy is that always choosing the link with the maximum sigmoid value between two atoms may not lead to the most likely trajectory when multiple trajectories are present. Hence one chooses an $\varepsilon \in (0,1)$. For every linking there is then a probability $\varepsilon$ of choosing a link completely at random, without regard for the assigned sigmoid value.
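A sketch of this selection rule (illustrative, not the AtomStalker API): with probability $\varepsilon$ pick one of the candidate links uniformly at random, otherwise fall back to the greedy argmax.

```python
import numpy as np

def eps_greedy_link(sigma_row, eps, rng):
    """Pick a candidate index for one atom from its sigmoid values."""
    if rng.random() < eps:
        return int(rng.integers(len(sigma_row)))  # explore: uniform pick
    return int(np.argmax(sigma_row))              # exploit: greedy pick

rng = np.random.default_rng(0)
sigma_row = np.array([0.1, 0.9, 0.3])
choices = [eps_greedy_link(sigma_row, 0.2, rng) for _ in range(1000)]
# The greedy candidate (index 1) dominates, but indices 0 and 2
# are still chosen occasionally via the random branch.
```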
In [5]:
dt = 5
t_max = 25
t_update = 5
#greedy
LM_traj_epsgreedy, link_values_epsgreedy = link_positions(loaded_positions,loaded_intensities,splined_sigmoid,dt=dt,t_update=t_update,t_max=t_max,
link_type='eps-greedy',link_paras={'eps':.2},final_eval=True)
The softmax strategy is a modification of the $\varepsilon$-greedy strategy. In the softmax strategy we assign a probability to each link available for a given position, based on the sigmoid values $\sigma$ we obtained. For this the Boltzmann distribution is used, $$ P(\sigma_k) = \frac{e^{-\frac{\sigma_k}{\tau}}}{\sum_{k^\prime}e^{-\frac{\sigma_{k^\prime}}{\tau}}} $$ yielding the probability for an atom and its possible link $k$ with corresponding $\sigma$ value.
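A minimal sketch of this Boltzmann weighting for the candidate links of one atom, implementing the formula above literally (function name illustrative; $\tau$ controls how sharply the probabilities concentrate):

```python
import numpy as np

def boltzmann_probs(sigmas, tau):
    """Normalized Boltzmann weights exp(-sigma_k/tau) / sum_k' exp(-sigma_k'/tau).
    Note: with the minus sign as written above, smaller sigma values
    receive larger weight."""
    w = np.exp(-np.asarray(sigmas) / tau)
    return w / w.sum()

p = boltzmann_probs([0.9, 0.5, 0.1], tau=1.0)
```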
In [6]:
dt = 5
t_max = 25
t_update = 5
#greedy
LM_traj_softmax, link_values_softmax = link_positions(loaded_positions,loaded_intensities,splined_sigmoid,dt=dt,t_update=t_update,t_max=t_max,
link_type='softmax',link_paras={'tau':1.},final_eval=True)
Some comparative analysis. Which star constellations do you see?
In [8]:
from AtomStalker import show_trajectories,show_values_vs_t
#trajectories
show_trajectories(loaded_positions,LM_traj_greedy,None,show_tuple=False,t_lim=(0,10),title='Greedy',legend=False)
show_trajectories(loaded_positions,LM_traj_epsgreedy,None,show_tuple=False,t_lim=(0,10),title='$\varepsilon$-Greedy',legend=False)
show_trajectories(loaded_positions,LM_traj_softmax,None,show_tuple=False,t_lim=(0,10),title='Softmax',legend=False)
#q vs t
show_values_vs_t(greedy=link_values_greedy,epsgreedy=link_values_epsgreedy,softmax=link_values_softmax)
In [9]:
from AtomStalker import write_trajectories
path = "./result.traj"
write_trajectories(path,LM_traj_greedy,loaded_positions,loaded_intensities)