MEG_AD_Thesis contains the dataset of 80 subjects as follows:
Equipment: 148 channels, 5-minute recordings, sampling frequency of 169.54 Hz
Start Matlab instance and add project folder to the Matlab search path:
In [1]:
%load_ext pymatbridge
Good for writeup: signal processing and others
OBS: Each subject has at least 4 clean epochs for the analysis, i.e. 40 seconds of recording time. There are a total of 1380 valid epochs.
We want to check if the recordings in _MEG_50863_noECG10s contain any values other than floats (e.g. NUL, SOH, etc.).
To do this, we would check each file within _MEG_50863_noECG10s folder by loading and inspecting the recordings. A Matlab function which gets all files within a directory (recursively) was employed: getAllFilesInDirectory(directoryName).
An epoch in _MEG_50863_noECG10s contains the following structures (for the _MEGnoECG_07segm9 example):
checkIntegrity.m loads each file in _MEG_50863_noECG10s and checks that:
Also checked that recordings for all 80 patients in IDs_AD-MCI-CON.m are present.
As we want to work with oscillations around 0, we need to subtract each channel's mean from every time sample. In other words, for a matrix of MEG data (channels x time samples), we want the average of each row to be 0 (DC offset removal). This can be done with the code below (partially from Matrix Algebra and Matlab)
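A minimal sketch of this demeaning (the matrix here is a stand-in for a real epoch):

In [ ]:
%%matlab
% Minimal sketch: remove the DC offset of each channel in a
% [channels x samples] epoch so that every row averages to zero.
epoch = randn(148, 1695);                           % stand-in for one 10 s epoch
epoch = epoch - repmat(mean(epoch, 2), 1, size(epoch, 2));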
Therefore, we need to remove the DC offset (more here: Brainstorm tutorials). From the website:
This option removes the baseline value of each sensor, i.e. the continuous (DC) offset that is added permanently on top of the recordings of interest. In MEG, the sensors record variations around a somewhat arbitrary level, therefore this operation is always needed, unless it was already applied during one of the pre-processing steps. Note that a high-pass filter with a very low frequency (for instance 0.3Hz) can replace efficiently this DC correction. If a high-pass filter has already been applied to the recordings, you may want to unselect this option.
It also seems this step is optional (as per the link above), as I look at frequencies higher than 0.5 Hz (the band-pass filter for the delta band already acts as a high-pass). To be safe, I will still remove the baseline value of each sensor: for each channel, take the mean of all samples and subtract it. (I can still use the code from Matrix Algebra and Matlab; it might need some transposes.)
I can use the demean option when doing ft_preprocessing: DC offset explanation. It subtracts the mean of the whole recording --> that's good, as the mean is not biased the way it would be if, say, only the first few samples were used.
In [8]:
%%matlab
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Code snippet from blc.m in FieldTrip %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% BLC does a baseline correction using the prestimulus interval of the data
%
% [data] = baseline(data, interval);
% [data] = baseline(data, begin, end);
%
% If no begin and end are specified, the whole timeinterval is
% used for baseline correction.
% determine the interval to use for baseline correction
%if nargin==1
% default is to use the whole timeinterval for baseline correction
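In practice, FieldTrip can do this correction while loading or preprocessing the data; a hedged sketch on an already-loaded raw data structure (cfg.demean and cfg.baselinewindow are the documented ft_preprocessing options):

In [ ]:
%%matlab
% Hedged sketch: demeaning via ft_preprocessing on existing data.
cfg                = [];
cfg.demean         = 'yes';   % subtract each channel's mean
cfg.baselinewindow = 'all';   % estimate the mean over the whole epoch
dataDemeaned = ft_preprocessing(cfg, data);   % data: existing raw structure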
State the problem and say that band-pass filtering should fix it, as the analysis is restricted to the interval (0.5, 45) Hz.
Following the preprocessing pattern of previous papers, I need to do 2 things:
I see two possible methods of doing the filtering:
OBS: It would be good to do the filtering with both methods so I can compare the results. I will go with FieldTrip for now, as it might pay off in the long run when it comes to computing functional connectivity with FieldTrip as well (the alternative being FDATool + filtfilt from Matlab).
When it comes to filtering, Javier agreed that an FIR filter would be better than an IIR filter, as it is more stable and has linear phase (it does not distort the phase; see FIR vs IIR). The recommended filter is an FIR designed using a Hamming window. The filter needs to be of high order to ensure that the transition from the pass band to the stop band is sharp. In FDATool, when selecting a window-based FIR, only the cut-off frequency(ies) need(s) to be selected. With a high enough order, the transition from pass to stop will be abrupt.
For a window-based FIR filter, an order of 560 is far more than enough. The filter will be very abrupt, you can check it in fdatool if you want to. As far as I remember, I have used the order of 560 most of the time. The reason is that Matlab requires the filter length (which is equal to the order+1) to be smaller than 1/3 of the signal length. So, 1695/3 = 565 --> with 560 you are pretty close to the maximum order you can have.
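Outside FDATool, the same design can be sketched with fir1 + filtfilt (the band edges here are the delta band; the epoch matrix is a placeholder):

In [ ]:
%%matlab
% Hedged sketch: order-560 Hamming-window band-pass FIR with zero-phase
% filtering. fs and the filter order follow the notes above.
fs    = 169.54;                        % sampling frequency (Hz)
order = 560;                           % 561 taps, just under 1695/3
Wn    = [0.5 4] / (fs/2);              % normalised band edges (delta band)
b     = fir1(order, Wn, 'bandpass', hamming(order + 1));
filtered = filtfilt(b, 1, epoch.').';  % filtfilt works columnwise, so transpose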
After the mean has been subtracted to remove the DC offset, the next step is to filter the data. As I said above, we will use FDAtool + filtfilt to first low-pass filter to remove frequencies >45Hz and then band-pass filter in bands of interest.
Ricardo suggested filtering the data with padding in order to remove edge effects. In their analysis, they pad the trials/epochs of interest with a number of data samples corresponding to the order of the filter. Normally, this padding is done with the raw data.
Problem: my cleaned epochs have been cleaned of the cardiac artefact - how should I do the padding?
Answer: From Javier: It is indeed important to minimise transients. Apparently, filtfilt already takes care of this by extrapolating the beginning and end of the data sequence using reflection. The slopes of the original and extrapolated sequences match at the end points, reducing transients (line 204 in filtfilt.m).
If I choose to pad:
it is best to match the amplitude value and the slope of the signal at the beginning and end. If you pad with 0s and the value of the signal is not 0 there, an artificial jump (i.e. a transient) will be introduced. Padding is more about making sure that the filter does not come across big jumps and discontinuities than about bringing in information from external epochs.
OBS: Javier thinks I don't need to pad and that I can use the whole length of the filtered epoch (all 10s).
Problem: It seems I don't need to pad. But when computing the WPLI, I need to take the Hilbert transform to get the phase. Ricardo recommended band-pass filtering with padding, then taking the Hilbert transform, then removing the initial padding. Is it ok not to pad when doing the Hilbert transform?
Answer:
The phase obtained from the Hilbert transform should be the instantaneous phase. Matlab computes the Hilbert transform via a DFT, so I am not sure padding will make much difference. I played a little with some data in Matlab and, as far as I can see, the differences between using padding or not are minimal.
-------------- EMAIL 6/07/2014 --------------------------------------
First of all, an important point I forgot to take into account in my previous email (and code) is that the computation of the phase using the Hilbert Transform is properly defined for narrow-band signals. I'm afraid this is not fulfilled for the example using random numbers in my previous email. Please, accept my apologies. If possible, I would recommend you applying a band-pass filter to your examples. For the MEGs, the bands of delta, theta and alpha are relatively narrow, but you may want to split beta and/or gamma into narrower sub-bands. However, I agree with your 2nd email that not using padding should be OK. If you decide to use padding, please, use the symmetric extension.
OBS: Checked Javier's code in Testing. Also checked how padding Javier's way and padding using symmetric extension affect the phase of MEG signals. It seems that padding changes the phase very little. I will not pad, unless the WPLI measure (sliding window for phase differences) affects the results.
Need to check the phases after taking the Hilbert transform of the band-pass-filtered signal, padded vs. unpadded; a sketch of that check follows.
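Assuming x is one band-pass-filtered channel and using a symmetric extension of filter-order length:

In [ ]:
%%matlab
% Hedged sketch: does symmetric padding change the Hilbert phase?
x    = filtered(1, :).';                                 % one filtered channel
pad  = 560;                                              % assumed padding length
xpad = [flipud(x(1:pad)); x; flipud(x(end-pad+1:end))];  % symmetric extension
phiPad   = angle(hilbert(xpad));
phiPad   = phiPad(pad+1:end-pad);                        % strip the padding again
phiNoPad = angle(hilbert(x));
maxDiff  = max(abs(angle(exp(1i*(phiPad - phiNoPad)))))  % wrapped phase difference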
Problem: In this project, I look at functional connectivity as measured by looking at phase synchronisation. It is important that the phase of the signals should not be distorted during the processing. The epochs I'm working with have been cleaned of the cardiac (ECG) artefact. Does this cleaning affect the phase?
Answer:
ICA might slightly affect the phase of the recorded activity, in the sense that a constrained ICA is used to estimate the cardiac artefact, and then this signal is subtracted from the MEG recordings. So, basically, one signal (the same signal!) resembling the cardiac artefact was subtracted from all channels. Because we subtract a signal, we modify the amplitude and phase of the MEGs. However, the signal is the same for all channels, so the same amplitude and phase is equally subtracted and, therefore, I believe this minimises the chances of inserting spurious results into the phase difference between signals. It is not like we are performing ICA on all MEG channels and then performing a partial reconstruction. However, the jury is still out on whether it is better to remove the cardiac artefact this way or not.
OBS: Javier suggested testing both approaches: analyse the signal with the cardiac artefact rejection and the raw data (50863) using the same epochs. Will do the comparison if I have the time.
In [ ]:
# seems that filtering will be done with FieldTrip in the end. See below.
The next step is to see how I can use FieldTrip to do the filtering with an FIR filter.
Ricardo said to use this to save the data (for faster reading? I don't think I need this...): Write data to a more efficient format
FT_PREPROCESSING is used to load the data. If trial definitions are not specified (I don't see a reason to do this), then ft_preprocessing would load all data and put them into a [channels x time] array of a single trial (e.g. in data.trial{1}). They say that sooner or later, I need to segment the data into trials. I guess the trials can just be the epochs.
I already have my clean epochs, so I don't need to do the segmentation.
Sample code
%%matlab
cfg         = [];                    % start from an empty configuration
cfg.trl     = trl;                   % trial definition, saved somewhere previously
cfg.dataset = 'exampledata.eeg';     % data file; you might also need to add the path to it
trialdata   = ft_preprocessing(cfg); % call preprocessing, putting the output in 'trialdata'
The first 2 columns from trl that described the start and end samples of a trial in the original data are now found in trialdata.sampleinfo.
In [12]:
from IPython.core.display import Image
Image(filename='./img/preprocessing.png')
Out[12]:
I have modified the header file from Ricardo. Now I need to load the data with FieldTrip and filter into the bands.
FT_PREPROCESSING:
Configurations for cfg structure:
Configurations for data:
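A hedged sketch of the band-pass call I have in mind (the dataset name is hypothetical; the cfg fields are standard ft_preprocessing options):

In [ ]:
%%matlab
% Hedged sketch: loading and band-pass filtering one band with FieldTrip.
cfg            = [];
cfg.dataset    = 'MEGnoECG_07segm9.mat';  % hypothetical epoch file name
cfg.bpfilter   = 'yes';
cfg.bpfreq     = [0.5 4];                 % delta band; repeat per band of interest
cfg.bpfilttype = 'fir';                   % window-based FIR
cfg.bpfiltord  = 560;                     % see the order discussion above
cfg.demean     = 'yes';                   % remove the DC offset as well
bandData = ft_preprocessing(cfg);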
Time vector
The clean epochs do not have a time vector associated with them. I need to generate that and supply it to the data structure. Code for this can be found in getTimeVectorFor10sEpoch.mat. Initially, knowing that my epochs have 1695 samples, I tried to take the time vector for the first 1695 samples from the raw data. The problem was that the values in the time vector were multiplied by 10^5. I understand from Alex that this is a problem with the data coming from Excel, where decimals in the Spanish data file used commas instead of dots. I simply multiplied the values by 10^(-5) and that solved the problem.
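Generating the time vector directly is simpler anyway; a minimal sketch:

In [ ]:
%%matlab
% Minimal sketch: time vector for one 10 s epoch.
fs   = 169.54;             % sampling frequency (Hz)
nSmp = 1695;               % samples per epoch
time = (0:nSmp-1) / fs;    % in seconds; spans ~0 to 9.99 s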
Welch's Method -- say how it works and maybe how you implemented it in FieldTrip (redefined the 10 s epochs into segments of 2 s length with 50% overlap; see the sketch below). Also, when performing frequency analysis in the gamma band, multitapers have been employed for better control over frequency smoothing: Multitapers (Frequency Analysis tutorial, FieldTrip)
Used +/-4 Hz for the gamma band, i.e. an 8 Hz smoothing box (references: Mitra and Pesaran (1999), Analysis of dynamic brain imaging data, Biophys J 76(2):691-708; Percival and Walden (1993), Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques, Cambridge, UK: Cambridge UP).
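A hedged sketch of both steps (the 2 s / 50% segmentation and the multitaper spectra; the gamma range [30 45] Hz is an assumption given the 45 Hz analysis limit):

In [ ]:
%%matlab
% Hedged sketch: Welch-style segmentation, then multitaper spectra.
cfg         = [];
cfg.length  = 2;                   % 2 s segments
cfg.overlap = 0.5;                 % 50% overlap
segData = ft_redefinetrial(cfg, processedData);

cfg           = [];
cfg.method    = 'mtmfft';
cfg.taper     = 'dpss';            % multitapers for the gamma band
cfg.foilim    = [30 45];           % assumed gamma range
cfg.tapsmofrq = 4;                 % +/-4 Hz smoothing box
freqGamma = ft_freqanalysis(cfg, segData);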
REFERENCE: From Good Practice for Conducting and Reporting MEG Research:
The bandwidth of the different frequency components in electrophysiological data increases with frequency. Also, across subjects, there is variability of the peak frequency (e.g. of the subject-specific gamma-band oscillatory activity) that is more pronounced for the higher frequency bands. For optimal sensitivity, it is therefore recommended to increase frequency smoothing at higher frequencies: for example, to apply little smoothing for frequencies below about 30 Hz (e.g., a few tapers implementing up to 2 Hz smoothing), but greater smoothing at higher frequencies, particularly above 50 Hz (e.g., multiple tapers implementing up to 10-15 Hz smoothing) (Hoogenboom et al., 2006; Schoffelen et al., 2011)
It is recommended for frequencies below 30Hz to use a single taper, e.g. Hanning. This is the one that was used for analysis in [delta-beta] bands.
No single functional connectivity measure is significantly better than the others (the paper on neural mass models says that). They advise looking at multiple measures simultaneously...
From Mailing List:
Just for the very very last time I will make the following statement: Single trial coherence estimates cannot be computed! Perhaps the font size and color will make it memorable ;-). This is due to the simple fact that coherence is defined across observations. If you have just a single observation, due to the mathematics involved, the coherence value will be 1.
From Mailing List:
For all these connectivity measures, we need the phase information from the Fourier transform (FFT) to compute the cross-spectral density matrix (CSD). The CSD can be regarded as the frequency-domain equivalent of the covariance matrix in the time domain; thus it is a measure of how a certain channel's activity is co-modulated with another channel's activity (for a particular frequency) - makes pretty much sense to make use of this when computing connectivity, right? :) As you might know, a Fourier transform returns complex numbers, where the imaginary part of this number contains phase information. When calling ft_freqanalysis, you can decide if it shall only return the (squared) real part of the frequency spectrum (cfg.output='pow'), or instead the full CSD (cfg.output='powandcsd'). Alternatively, you can also let ft_freqanalysis return the raw Fourier coefficients (cfg.output='fourier'). From the Fourier coefficients, however, you can easily obtain the CSD or the power spectrum, basically by multiplying the Fourier matrix with itself (transposed). All of these measures quantify the phase relation between channels across trials, and as phase information is coded in the imaginary part of the Fourier output, for connectivity analysis you need the CSD, which can be obtained either by 'powandcsd' or by 'fourier'.
From Javier, email 07/07/2014
For each MEG epoch, we do not have a clear, well-defined event that marks the beginning of the epoch; it is just ongoing activity that happened to be divided into epochs in that way. Still, it may be beneficial to compute the WPLI over shorter windows and then to average the results over the whole epoch, as measuring a phase difference over a relatively long signal of 10 s may be more challenging.
It was agreed to use the weighted phase lag index (WPLI) [Vinck et al., 2011] as a phase synchronisation measure, as it is superior in terms of volume conduction, noise and sample-size bias. Coherence is another classic connectivity measure that can be used as a reference for comparing the results of the WPLI.
Basically, if you filter your signal in a particular band (which is what is done in EEG/MEG/ECoG analysis) then you're forcing your data to assume a sinusoidal, oscillating state. Once you have this filtered (discrete) time-series signal you can then calculate the instantaneous phase at any given time point. (from Hilbert Quora)
We pretty much have zero phase at peaks and +/- pi phase at troughs. The Hilbert transform is used to extract those phase values.
Need to take the Hilbert transform of the signals to obtain the instantaneous phase. It is good that we prefiltered into specific frequency bands as this is needed for the Hilbert transform when restricting the analysis to certain frequencies (Phase synchronisation, Quian Quiroga, 2002).
Also understand Hilbert transform for brain waves and look at hilbert Matlab for summary.
In Matlab, hilbert(matrix) operates columnwise, finding the Hilbert transform of each column. As we need to compute the analytic signal for each channel over time, we need to transpose the 148 channels x 1695 time samples matrix, as in the sketch below.
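In code (the epoch matrix name is a placeholder):

In [ ]:
%%matlab
% Hedged sketch: instantaneous phase of every channel. hilbert() works
% columnwise, so the [148 x 1695] epoch is transposed first.
analytic = hilbert(epoch.');    % [1695 x 148] analytic signals
phase    = angle(analytic).';   % back to [148 x 1695] instantaneous phases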
The actual computation of the connectivity metric is done by ft_connectivityanalysis.
Need to figure which one is the best one, considering I also want to compute connectivity.
ft_connectivityanalysis() on frequency-domain data containing Fourier spectra, without specifying cfg.channelcmb, will result in yet another representation of frequency-domain data. For a quantity like the coherence spectrum, the values are symmetric across the diagonal.
If I want to use ft_connectivityplot(), I need to change some parameters when computing the frequency analysis: Connectivity Tutorial. Question is: do I want to use ft_connectivityplot()? :)
It is not necessary to compute the cross-spectral density at this stage, because the function used in the next step, ft_connectivityanalysis, contains functionality to compute the cross-spectral density from the fourier coefficients.
So the parameters I need to use are:
Normally I would set cfg.output to the powandcsd option, but ft_connectivityanalysis does the CSD for me. More info on why 'fourier' should be used: mailing list.
OBS: Also need to set cfg.keeptrials to 'yes', which doesn't average the frequency analysis over the trials. This is needed as the functional connectivity analysis needs the separate epochs as observations (e.g. coherence is only defined across observations, see above).
When it comes to gamma-band filtering, make sure to use multitapering for better control over frequency smoothing (see Time Frequency Analysis). For all other frequencies of interest (delta to beta), the Hanning window is fine, as the frequencies are below 30 Hz.
From Mailing List:
You should think of coherence more as a measure for the consistency of the phase relation between channels across trials, so coherence is just not defined per trial, neither would it make sense to compute coherence for a single trial.
I can calculate connectivity via the imaginary part of coherency (here) by specifying cfg.method = 'coh' in conjunction with cfg.complex = 'imag'.
Obs: The coherence measure is a symmetric measure, which means that it does not provide information regarding the direction of information flow between any pair of signals.
In [1]:
%%matlab
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% Code for computing coherence
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% frequency analysis
cfg = [];
cfg.method = 'mtmfft'; %% entire spectrum for the entire data length
cfg.taper = 'hanning';
cfg.foilim=[0.5 4];
cfg.output = 'fourier'; %% csd computed in ft_connectivityanalysis
cfg.channel = header.label;
cfg.keeptrials = 'yes'; %% keep individual trials for connectivity step
[freq] = ft_freqanalysis(cfg, processedData);
%%% connectivity
cfg = [];
cfg.channel = header.label;
cfg.method = 'coh'; %% connectivity measure; alternative: 'wpli_debiased'
stat = ft_connectivityanalysis(cfg, freq);
Layouts are an essential element in FieldTrip, and many FieldTrip functions rely on the user specifying a layout in the configuration (cfg.layout).
In order to plot coherence or other connectivity measures between channels, FieldTrip needs to be given a layout file. This file contains the 2D coordinates of the sensors for plotting. I will use the standard layout file included in FieldTrip, which is based on the MAGNES 2500 WH (4D Neuroimaging) machine used to collect the data.
Good for writing up (inspiration):
In machine learning and image processing, feature extraction and feature selection are a special form of dimensionality reduction. Feature extraction is a dimensionality reduction approach that projects (linearly or nonlinearly) a D-dimensional vector onto a d-dimensional vector (d < D), while feature selection is an approach that selects a small subset of the original features. The aim of feature extraction and feature selection is to prevent the curse of dimensionality problem [Guyon and Elisseeff, 2003] and to identify the relevant features that lead to better performance of learning models.
In the connectivity networks (matrices), there exists a large number of low-level features (i.e., n(n-1)/2, where n is the total number of ROIs) but only a small number of subjects. To reduce the dimensionality of the data and to find biomarkers for MCI diagnosis, it is crucial to extract meaningful features from the connectivity networks. The connectivity networks can be characterized at different levels, ranging from properties characterizing the whole network at a global scale to properties of network components at a local scale.
From Brain Network Analysis: Separating Cost from Topology Using Cost-Integration (found out about it here: Graph analysis of functional brain networks: practical issues in translational neuroscience), when reading the between-group comparison sections:
a second approach is to compare the properties of the brain graphs by integrating the topological metric over a collection of available thresholds, as in [97]
something like this: "From each association matrix, a binary adjacency matrix A was derived where aij was considered 1 if rij was greater than a specific threshold and zero otherwise. The diagonal elements of the constructed association matrix were also set to zero. The resultant adjacency matrix represented a binary undirected graph G in which regions i and j were connected if gij was unity" (from here)
When this is not the case, or when one is comparing two populations of weighted networks, the question of whether or not these networks have similar topological properties becomes arduous
We threshold the connectivity matrix with a range of values so as not to restrict the analysis to just a single value. (Cite here: M. Rubinov and O. Sporns, "Complex network measures of brain connectivity: uses and interpretations," NeuroImage, vol. 52, no. 3, pp. 1059-1069, 2010.) (Good paper for the writeup.)
(Connectivity networks are intrinsically fully connected; to reflect different levels of topological structure of the original connectivity network, we simultaneously threshold the connectivity network with different predefined values.)
To account for the different noise levels between subjects, we use a proportional threshold T, keeping the N strongest connections, where T is chosen as in Altered small-world properties of gray matter networks in breast cancer (a sketch follows this list):
T takes the value of the minimum density that satisfies the requirement that all individual subject graphs are connected (connectedness, the ability for every node to reach every other node in the network).
T takes a range of values
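A sketch of one such thresholding step, assuming the Brain Connectivity Toolbox is on the path and W is a subject's WPLI matrix (the proportion 0.20 is just an example value of T):

In [ ]:
%%matlab
% Hedged sketch: proportional thresholding with BCT, then binarising.
W = abs(W);                             % work with absolute weights
W(1:size(W,1)+1:end) = 0;               % zero the diagonal (no self-connections)
Wthr = threshold_proportional(W, 0.20); % keep the strongest 20% of connections
A    = double(Wthr > 0);                % binary undirected adjacency matrix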
Similarly to the study HERE.
The set of networks from a particular clinical group can be thresholded to create either equi-sparse graphs or equi-threshold graphs. We chose to construct equi-sparse graphs in order to ensure the most direct mathematical comparability of graph metric values (Bullmore and Bassett, 2011). [...] This "cumulative" thresholding procedure has been used extensively in previous functional brain network studies because it facilitates an investigation of a network composed of the x% statistically most significant connections (Bullmore and Bassett, 2011).
Similarly to the study HERE (go in paper and cite the papers they referenced themselves for this methodology),
With E equal to the number of edges (links), the network density (cost) D = E/[(N x (N-1))/2] represents the fraction of present connections out of all possible connections. Since thresholding the association matrices of different groups at an absolute threshold results in networks with a different number of nodes (and degrees), which might influence the network measures and reduce the interpretability of between-group results [37], we employed two approaches for thresholding the adjacency matrices: 1) thresholding the constructed association matrices at the minimum network density at which all nodes become fully connected in the brain networks of both groups
Certainly, a good threshold should neither filter all the possible links nor let them all survive. Indeed, analyzing brain graphs that are either almost empty or fully connected appears to be worthless [8] (From Graph analysis of functional brain networks: practical issues in translational neuroscience)
The following graph measures were used as features for the classification task (http://www.sciencedirect.com/science/article/pii/S1053811913002772; reference: Rubinov and Sporns, 2010, Complex network measures of brain connectivity):
"The small-worldness measure should not be regarded as a substitute for individual assesments of itnegration and segregation"! (Rubi, Spor)
To further evaluate the network structure, a graph analysis based on WPLI was used to determine clustering coefficient (C) and betweenness centrality (BC) as local coefficients as well as the characteristic path length (L) as a parameter for global interconnectedness. The network’s modular structure was also calculated to estimate functional segregation.
From BCT page:
Create 100 random networks from each connectivity matrix, respecting the number of nodes, the edge distribution and the node degrees; each edge of the initial matrix was rewired 1000 times (REFERENCE). Considering that we would have to generate 100 random networks for the connectivity matrix of each frequency band, for each threshold, for each subject (LIMITATION), the computational requirements of this task would be too demanding. For this reason, we have chosen to change the parameters to 10 generated networks with 40 rewiring iterations per edge; a sketch follows.
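With BCT's rewiring function (the metric shown is just an example):

In [ ]:
%%matlab
% Hedged sketch: 10 degree-preserving random surrogates per matrix,
% with ~40 rewiring iterations per edge, as decided above.
nRand = 10;
Crand = zeros(nRand, 1);
for k = 1:nRand
    R = randmio_und(A, 40);                  % BCT: rewire each edge ~40 times
    Crand(k) = mean(clustering_coef_bu(R));  % e.g. mean clustering coefficient
end
normC = mean(clustering_coef_bu(A)) / mean(Crand);  % normalised metric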
OBS/DISCUSSION: For low thresholds, the graph becomes more segmented and multiple graph components may appear. In consequence, network topology measures may be distorted by these individual connected components. This would be an explanation for the variability of the curve at these low threshold values.
No single agreed measure (here; maybe find a better ref). Like the paper just cited, we used the Louvain algorithm for fast community detection (Louvain paper). "Since modularity Q values can vary based on random differences in module assignments from run to run" (MUST SEE: modularity_und.m: "Ci and Q may vary from run to run, due to heuristics in the algorithm. Consequently, it may be worth to compare multiple runs").
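One way to deal with that run-to-run variability (the number of runs is an assumption, and I assume BCT's modularity_louvain_und is the function in use):

In [ ]:
%%matlab
% Hedged sketch: repeat the Louvain heuristic and keep the best partition.
nRuns = 50;                              % assumed number of repetitions
bestQ = -Inf;
for k = 1:nRuns
    [Ci, Q] = modularity_louvain_und(A); % module labels and modularity Q
    if Q > bestQ
        bestQ  = Q;
        bestCi = Ci;
    end
end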
Functional Data Analysis (FDA) for graph curves here
can also use ANOVA
OBS: Feature Scaling: explain Z-transform (StandardScaler in sklearn)
Feature scaling is done on each iteration of LOOCV: the mean and standard deviation of the features are computed from the training set of that specific iteration, and those statistics are then used to scale both the training set and the held-out subject; a sketch follows.
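A minimal sketch of that per-fold scaling (matrix names are placeholders; this mirrors what sklearn's StandardScaler does when fitted on the training fold only):

In [ ]:
%%matlab
% Hedged sketch: z-scoring inside one LOOCV fold. The statistics come
% from the training subjects only, then are applied to the held-out one.
mu    = mean(Xtrain, 1);
sigma = std(Xtrain, 0, 1);
XtrainZ = (Xtrain - repmat(mu, size(Xtrain,1), 1)) ./ repmat(sigma, size(Xtrain,1), 1);
XtestZ  = (Xtest - mu) ./ sigma;   % Xtest is the single held-out row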
From here:
One thing that we always do when running any predictive analysis is to perform a randomization test to determine the distribution of performance when the relation between the data and the outcome is broken (e.g., by randomly shuffling the outcomes)
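A sketch of such a randomization (permutation) test; runLOOCV is a hypothetical helper standing in for the whole cross-validated pipeline:

In [ ]:
%%matlab
% Hedged sketch: null distribution of accuracy under shuffled labels.
% observedAcc: accuracy obtained with the real labels (assumed computed earlier).
nPerm   = 1000;
nullAcc = zeros(nPerm, 1);
for k = 1:nPerm
    yPerm = y(randperm(numel(y)));    % break the data-outcome relation
    nullAcc(k) = runLOOCV(X, yPerm);  % hypothetical helper returning accuracy
end
p = (1 + sum(nullAcc >= observedAcc)) / (1 + nPerm);  % permutation p-value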
OBS: Might simply learn noise! Very likely not to be good for actual prediction... I don't know... It's certainly a limitation (discuss it a bit).
Used code from https://github.com/blacklab/nyan/blob/master/shared_modules/smote.py
Used K=5, as that was the value used in the original paper.
WRITEUP
Did not look at non-symmetrical measures such as Granger causality because the final goal was to develop a classifier to discriminate between the 3 categories and not ...
very good critical review of all steps in processing pipeline:
functional brain networks are the result of a tricky processing pipeline (Fig. 1), which basically consists in improving the quality of the recorded brain signals, constructing the network by means of measures of pairwise functional connectivity, retaining the most relevant links, extracting graph metrics describing topological properties of the network and finally applying statistical procedures to the extracted graph indices to determine differences between populations
Negative weights - page 4 in the above paper; explains why I take the absolute value of the correlation coefficients.
also consult GAT toolbox when describing pipeline
In [15]:
%unload_ext pymatbridge