To support adaptive behavior, activity in the brain must correspond in some way to relevant sensory events and planned movements, combine many sources of information into multimodal percepts, and recall traces of past events to inform predictions about the future. In other words, neural activity must somehow encode relevant quantities. For instance, it can be demonstrated behaviorally that many animals use estimates of their location and head direction to navigate towards a goal. Where, and how, are these quantities represented in the brain? What are the neural circuits that can compute and update these signals? How do place and direction estimates contribute to which way to go? Decoding approaches can help address such questions, and many more besides.
To develop an intuition for how decoding works, consider Hubel and Wiesel's demonstration that single cells in macaque V1 respond to bars of light not only within a particular region of visual space, but also with a specific orientation. Such cells are said to be tuned for orientation (of the bar) and a typical tuning curve would therefore look like this:
This tuning curve describes how the cell responds, on average, to different orientations of the stimulus. If the cell were to respond with the same firing rate across the range of stimulus orientations, then the cell is indifferent to this particular stimulus dimension: it does not encode it. However, because this cell clearly modulates its firing rate with stimulus orientation, it encodes, or represents (I use these terms interchangeably, but some disagree) this quantity in its activity.
We can turn this idea around and note that if orientation is encoded, this implies we can also decode the original stimulus from the cell's activity. For instance, if we noted that this cell was firing at a high rate, we would infer that the stimulus orientation is likely close to the cell's preferred direction. Note that this requires knowledge of the cell's tuning curve, and that based on one cell only, we are unlikely to be able to decode (or reconstruct, which means the same thing) the stimulus perfectly. The more general view is to say that the cell's activity provides a certain amount of information about the stimulus, or equivalently, that our (decoded) estimate of the stimulus is improved by taking the activity of this cell into account.
This module first explores some practical issues in estimating tuning curves of “place cells” recorded from the rat hippocampus. An introduction to a particular decoding method (Bayesian decoding) is followed by application to many simultaneously recorded place cells as a rat performs a T-maze task, described in van der Meer et al. (2017).
To run this notebook, I used the following steps:
conda create --name py35 python=3.5 anaconda
)cd "matlabroot\extern\engines\python"; python setup.py install
)python -m pip install imatlab; python -m imatlab install
)jupyter notebook
launches jupyter)Next, we set the MATLAB environment to use the vandermeerlab codebase, and specify the location of the data folder used for this tutorial (R042-2014-08-18
, from van der Meer et al. (2017). To obtain the data, try Datalad or see the download instructions here to access the mvdmlab server):
In [1]:
SET_GitHub_root = 'D:\My_Documents\GitHub'; % replace this with the location of your local GitHub root folder
SET_data_fd = 'D:\data\R042\R042-2013-08-18'; % replace this with the location of your local data folder
addpath(genpath(cat(2,SET_GitHub_root,'\vandermeerlab\code-matlab\shared'))); % warning is because this overloads the plot() function
imatlab_export_fig('print-png'); % set notebook output to be static png images
Next, we load the spike and position data (stored in .t
and .nvt
files respectively, as elaborated on here, you may need to unzip the position data first):
In [2]:
cd(SET_data_fd);
please = []; please.load_questionable_cells = 1;
S = LoadSpikes(please); % `please` variable overrides default LoadSpikes() options
pos = LoadPos([]); % empty input [] causes LoadPos() to use default options
The load_questionable_cells option in LoadSpikes()
results in the loading of ._t
spike time files, in addition to the familiar .t
files. The underscore extension indicates a cell with questionable isolation quality, likely contaminated with noise, spikes from other neurons, and/or missing spikes (see here for more information on the spike sorting process inherent in extracellular recordings such as this one). In general, you do not want to use such questionable neurons to make claims about single neuron properties; however, for this analysis we are not concerned with properties of individual neurons. We are instead interested in the information present in a population of neurons, and for this we will take everything we can get!
Before looking at the data, we will first exclude the pre- and post-run segments of the data:
In [3]:
LoadExpKeys; % annotation file containing some basic information about this data session
S = restrict(S,ExpKeys.TimeOnTrack,ExpKeys.TimeOffTrack); % restrict to on-track data only
pos = restrict(pos,ExpKeys.TimeOnTrack,ExpKeys.TimeOffTrack);
Now we can plot the position data (location of the rat as it runs the maze, in gray), and the position of the rat when an example neuron fired a spike (in red):
In [4]:
plot(getd(pos,'y'),getd(pos,'x'),'.','Color',[0.5 0.5 0.5],'MarkerSize',1); % note getd() gets data corresponding to a specific label (x and y here)
axis off; hold on;
iC = 7; % select cell 7 (out of 107 total)
spk_x = interp1(pos.tvec,getd(pos,'x'),S.t{iC},'linear');
spk_y = interp1(pos.tvec,getd(pos,'y'),S.t{iC},'linear');
h = plot(spk_y,spk_x,'.r'); axis tight;
Note the use of interp1()
here: it finds the corresponding x and y values for each spike time using linear interpolation. This cell seems to have a place field just to the left of the choice point on the track (a T-maze). There are also a few spikes on the pedestals, where the rat rests in between runs on the track.
This figure is a useful visualization of the raw data, but it is not a tuning curve, which captures the relationship between a variable of interest (e.g. position) to firing rate. A set of tuning curves can be thought of as an encoding model that specifies how the position variable is encoded by our population of place cells. As a first step to estmiating this model, we restrict our data to those times the rat is actually running on the track:
In [5]:
LoadMetadata; % loads experimenter-annotated file associated with each data session
% ENCoding variables: used for estimating tuning curves (encoding model)
ENC_S = restrict(S,metadata.taskvars.trial_iv); % trial_iv contains the start and end times of trials
ENC_pos = restrict(pos,metadata.taskvars.trial_iv);
% check for empties and remove
keep = ~cellfun(@isempty,ENC_S.t);
ENC_S = SelectTS([],ENC_S,keep);
% also set up DECoding variables for use later
DEC_S = SelectTS([],S,keep);
Now, we can compute tuning curves for each of our cells. The TuningCurves()
function does this in three steps:
The below code plots the output of the third step for our example cell:
In [6]:
cfg_tc = [];
cfg_tc.minOcc = 1; % minimum occupancy (in seconds) for bin to be included
cfg_tc.binEdges{1} = 80:10:660; cfg_tc.binEdges{2} = 0:10:520; % bin edges (in camera pixels)
tc = TuningCurves(cfg_tc,ENC_S,ENC_pos);
pcolor(sq(tc.tc2D(iC,:,:))); % sq() squeezes 3-D matrix down to 2-D for plotting
shading flat; axis off; colorbar;
As can be seen from the plot above, the (arbitrary) binning results in a noisy-looking tuning curve. As shown in van der Meer et al. (2017) applying some judicious smoothing can increase decoding accuracy substantially. This smoothing is kind of a hack, and can be viewed as an approximation of a more principled estimation process that starts with a prior on the likely shape of single cell tuning curves which is then updated using the encoding data. For now, we'll just do what works:
In [7]:
cfg_tc.smoothingKernel = gausskernel([4 4],2); % Gaussian kernel of 4x4 pixels, SD of 2 pixels (note this should sum to 1)
tc = TuningCurves(cfg_tc,ENC_S,ENC_pos);
pcolor(sq(tc.tc2D(iC,:,:))); shading flat; axis off; colorbar
You can inspect the tuning curves of all cells as follows:
In [8]:
ppf = 25; % plots per figure
for iC = 1:length(ENC_S.t)
nFigure = ceil(iC/ppf); figure(nFigure);
subtightplot(5,5,iC-(nFigure-1)*ppf);
pcolor(sq(tc.tc2D(iC,:,:))); shading flat; axis off;
caxis([0 10]); % set color axis from 0 to 10 Hz
end
You will see a some textbook “place cells” with a clearly defined single place field. There are also cells with other firing patterns.
This notebook computes tuning curves for location, but the idea is of course more general. For continuous variables in particular, it is a natural and powerful way to describe the relationship between two quantities – spikes and location in this case, but there is no reason why you couldn't do something like pupil diameter as a function of arm reaching direction, for instance!
Note: The tc
struct returned by TuningCurves()
contains a number of other fields in addition to the actual tuning curves plotted above. In a nutshell, these fields exist to speed up the decoding computations later, by only storing firing rates for bins with sufficient sampling (occupancy): as you can see from the tuning curves above, this is true for only a small subset of the full grid. The tc.good_idx
variable strores the indices (into the 2-D grid) of those bins, and tc.tc
contains tuning curves in this reduced format (i.e. is of size [nCells x length(good_idx)]).
As noted in the introduction above, given that we have neurons whose activity seems to encode some stimulus variable (location in this case, as evident from the tuning curves above), we can attempt to decode that variable based on the neurons' time-varying activity.
A popular approach to doing this is “one-step Bayesian decoding”, illustrated in this figure (from van der Meer et al. 2010):
For this particular experiment, the goal of decoding is to recover the location of the rat, given neural activity in some time window. More formally, we wish to know $P(\mathbf{x}|\mathbf{n})$, the probability of the rat being at each possible location $x_i$ ($\mathbf{x}$ in vector notation, to indicate that there are many possible locations) given a vector of spike counts $\mathbf{n}$.
If $P(\mathbf{x}|\mathbf{n})$ (the "posterior") is the same for every location bin $x_i$ (i.e. is uniform), that means all locations are equally likely and we don't have a good guess; in contrast, if most of the $x_i$ are zero and a small number have a high probability, that means we are confident predicting the most likely location. Of course, there is no guarantee that our decoded estimate will agree with the actual location; we will test this later on.
So how can we obtain $P(\mathbf{x}|\mathbf{n})$? We can start with Bayes' rule:
$P(\mathbf{x}|\mathbf{n})P(\mathbf{n}) = P(\mathbf{n}|\mathbf{x})P(\mathbf{x})$
If you have not come across Bayes' rule before, or the above equation looks mysterious to you, review the gentle intro by linked to at the top of the page. In general, it provides a quantitative way to update prior beliefs in the face of new evidence.
The key quantity to estimate is $P(\mathbf{n}|\mathbf{x})$, the probability of observing $n$ spikes in a given time window when the rat is at location $x$. At the basis of estimating this probability (the "likelihood" or evidence) lies the tuning curve: this tells us the average firing rate at each location. We need a way to convert a given number of spikes -- whatever we observe in the current time window for which we are trying to decode activity, 3 spikes for cell 1 in the figure above -- to a probability. In other words, what is the probability of observing 3 spikes in a 250ms time window, given that for this location the cell fires, say at 5Hz on average?
A convenient answer is to assume that the spike counts follow a Poisson distribution. Assuming this enables us to assign a probability to each possible spike count for a mean firing rate given by the tuning curve. For instance, here are the probabilities of observing different numbers of spikes $k$ (on the horizontal axis) for four different means ($\lambda = $1, 4 and 10):
In general, from the definition of the Poisson distribution, it follows that
$P(n_i|\mathbf{x}) = \frac{(\tau f_i(\mathbf{x}))^{n_i}}{n_i!} e^{-\tau f_i (x)}$
$f_i(\mathbf{x})$ is the average firing rate of neuron $i$ over $x$ (i.e. the tuning curve for position), $n_i$ is the number of spikes emitted by neuron $i$ in the current time window, and $\tau$ is the size of the time window used. Thus, $\tau f_i(\mathbf{x})$ is the mean number of spikes we expect from neuron $i$ in a window of size $\tau$; the Poisson distribution describes how likely it is that we observe the actual number of spikes $n_i$ given this expectation.
In reality, place cell spike counts are typically not Poisson-distributed so this is clearly a simplifying assumption. There are many other, more sophisticated approaches for the estimation of $P(n_i|\mathbf{x})$ (see for instance Paninski et al. 2007) but this basic method works well for many applications.
The above equation gives the probability of observing $n$ spikes for a given average firing rate for a single neuron. How can we combine information across neurons? Again we take the simplest possible approach and assume that the spike count probabilities for different neurons are independent. This allows us to simply multiply the probabilities together to give:
$P(\mathbf{n}|\mathbf{x}) = \prod_{i = 1}^{N} \frac{(\tau f_i(\mathbf{x}))^{n_i}}{n_i!} e^{-\tau f_i (x)}$
An analogy here is simply to ask: if the probability of a coin coming up heads is $0.5$, what is the probability of two coints, flipped simultaneously, coming up heads? If the coins are independent then this is simply $0.5*0.5$.
Combining the above with Bayes' rule, and rearranging a bit, gives
$P(\mathbf{x}|\mathbf{n}) = C(\tau,\mathbf{n}) P(\mathbf{x}) (\prod_{i = 1}^{N} f_i(\mathbf{x})^{n_i}) \: e (-\tau \sum_{i = 1}^N f_i(\mathbf{x}))$
This is more easily evaluated in vectorized MATLAB code. $C(\tau,\mathbf{n})$ is a normalization factor which we simply set to guarantee $\sum_x P(\mathbf{x}|\mathbf{n}) = 1$ (Zhang et al. 1998). For now, we assume that $P(\mathbf{x})$ (the "prior") is uniform, that is, we have no prior information about the location of the rat and let our estimate be completely determined by the likelihood.
The tuning curves take care of the $f_i(x)$ term in the decoding equations. Next, we need to get $\mathbf{n}$, the spike counts.
In [9]:
cfg_Q = [];
cfg_Q.binsize = 0.25;
cfg_Q.tvec_edges = metadata.taskvars.trial_iv.tstart(1):cfg_Q.binsize:metadata.taskvars.trial_iv.tend(end);
cfg_Q.tvec_centers = cfg_Q.tvec_edges(1:end-1)+cfg_Q.binsize/2;
Q = MakeQfromS(cfg_Q,DEC_S);
imagesc(Q.data);
Each column of this matrix is a vector of spike counts ($\mathbf{n}$ in the section above). You can use the interactive features in the plotly version to zoom in to specific columns in the above plot.
Note that this type of matrix is shared by many different kinds of data in neuroscience -- this one contains binned spike counts, but a similar matrix can be constructed for spike density functions (firing rates), BOLD signals for voxels measured by fMRI, calcium levels from fluorescent microscopy, and so on. Analyses such as representational similarity analysis, functional and effective connectivity, discovery of network structure (hubs, ensembles, cliques, etc.) take a matrix of this type as a starting point.
For the purposes of this tutorial, this matrix completes the set of ingredients required to perform decoding. Although the codebase provides configurable wrapper functions for this, a minimal version of the raw code follows below so you can see how it works:
In [10]:
Q = restrict(Q,metadata.taskvars.trial_iv); % for speed, only decode trials of running on track
nBins = length(tc.good_idx);
occUniform = repmat(1/nBins,[nBins 1]); % prior over locations, P(x) in section above
nActiveNeurons = sum(Q.data > 0);
len = length(Q.tvec);
p = nan(length(Q.tvec),nBins); % this variable will store the posterior
for iB = 1:nBins % loop over space bins (x_i)
tempProd = nansum(log(repmat(tc.tc(:,iB),1,len).^Q.data)); % these 3 lines implement the actual decoding computation
tempSum = exp(-cfg_Q.binsize*nansum(tc.tc(:,iB)',2)); % compare to the equations above!
p(:,iB) = exp(tempProd)*tempSum*occUniform(iB);
end
p = p./repmat(sum(p,2),1,nBins); % renormalize to 1 total probability
p(nActiveNeurons < 1,:) = 0; % ignore time bins with no activity
Now we want to display the results. Before we do so, we should convert the rat's actual position into our binned form, so that we can compare it to the decoded estimate:
In [11]:
xBinned = interp1(ENC_pos.tvec,tc.pos_idx(:,1),Q.tvec);
yBinned = interp1(ENC_pos.tvec,tc.pos_idx(:,2),Q.tvec);
Now we can visualize the decoding, shown here as a movie (see below for the code):
You can see the rat's true location, rendered as a white circle, moving around -- initially hanging around at the base of the maze, and then running up the stem to one of the reward locations at the end of the maze arms (food on the left, water on the right). The output of the decoder is plotted in pseudocolor -- blue colors indicate low decoding probability, and green/yellow/red indicate progressively higher probability. You should see that as the rat runs along the track, the decoder output tends to follow the rat's true location. In other words, based on neural activity in a given time bin, we can infer the location of the rat.
(Note there are some segments of the data for which no decoding output appears; this is due to restriction of the data to runs on the track only.)
To generate the above movie in MATLAB, execute the code below with the plotting commands uncommented:
In [12]:
dec_err = nan(length(Q.tvec),1); % variable to keep track of decoding error
% h = figure; set(h,'Position',[100 100 640 480]); % fix position of figure to help encoding of movie for export later
for iT = 1:length(Q.tvec) % loop over time bins
toPlot = nan(tc.nBins{1},tc.nBins{2}); % initialize empty 2-D grid
toPlot(tc.good_idx) = p(iT,:); % assign decoding probabilities to "good" bins
% cla; pcolor(toPlot); axis xy; hold on; caxis([0 0.5]); shading flat; axis off; % plot decoding output
% hold on; plot(yBinned(iT),xBinned(iT),'ow','MarkerSize',15); % overlay actual position
% get x and y coordinates of location with largest decoding probability ("maximum a posteriori" or MAP)
[~,idx] = max(toPlot(:)); [x_map,y_map] = ind2sub(size(toPlot),idx);
% plot(y_map,x_map,'g*','MarkerSize',5); % plot MAP location
% compute error: distance between actual position and MAP position
if nActiveNeurons(iT) > 0, dec_err(iT) = sqrt((yBinned(iT)-y_map).^2+(xBinned(iT)-x_map).^2); end
% h = title(sprintf('t %.2f, nCells %d, dist %.2f',Q.tvec(iT),nActiveNeurons(iT),dec_err(iT)));
% if nActiveNeurons(iT) == 0, set(h,'Color',[1 0 0]); else, set(h,'Color',[0 0 0]); end
% pause(0.1); drawnow;
% f(iT) = getframe(gcf); % uncomment this if exporting to video file
end
The above movie gives a useful visual impression of what the decoder output looks like, and that it roughly seems to track the rat's position. A more systematic approach to evaluating the performance of the decoder is to plot the error, i.e. the distance between the actual and decoded positions. If you haven't done so already, run the cell above to compute it. Then, the following will plot the error as a function of space:
In [13]:
dec_err_tsd = tsd(Q.tvec,dec_err); % create timestamped data struct for decoding error computed above
cfg = []; cfg.y_edges = cfg_tc.binEdges{2}; cfg.x_edges = cfg_tc.binEdges{1};
space_err = TSDbySpace(cfg,ENC_pos,dec_err_tsd);
pcolor(space_err); shading flat; axis off; colorbar; caxis([0 10]);
It looks like the decoding error is on average larger on the central stem, compared to the arms of the maze. Then there are some outlier points with high (>10 pixels) decoding error.
☛ What could be some reasons for this? Can you think of ways to test your suggestions?
So far, we have been using the same data -- runs on the track -- to estimate our encoding model (tuning curves) and to apply our decoder to. This "tautological" approach is of limited value because it does not do a good job of testing how well our decoder works on data that was not used to construct it. As a result, there is the risk of overfitting: capturing idiosyncratic features of data which do not generalize. A common approach to prevent overfitting is to use cross-validation, which for our data set can be visualized like this (from van der Meer et al. 2017):
Implementing such cross-validation schemes here would entail using different restrict()
calls to make non-overlapping ENC_S
and DEC_S
sets of data, decoding DEC_S
based on tuning curves obtained from ENC_S
. van der Meer et al. (2017) show that the distribution of decoding error across the track can differ substantially between tautological and cross-validated decoding, as illustrated here:
This is a methodologically important observation, because these distributions are the null hypothesis of what we expect when decoding internally generated activity such as occurs during memory recall and planning. For instance, suppose you are interested in what a subject recalls about some prior experience with the task. Perhaps you find a tendency for the subject to replay trajectories that cross the decision point of the T-maze -- a potentially exciting result, because such preferential replay of decision points is what has been shown to be effective in off-line reinforcement learning. However, the above figure provides an alternative, less interesting explanation: your ability to decode locations at the edges of the track is simply lower than for the center of the track! In other words, such an observation is in fact consistent with the non-uniform null hypothesis obtained from cross-validated decoding accuracy.
Now that we have validated our decoder on data where we know the correct answer, we can apply it to internally generated activity that occurs in the absence of overt behavior. The rodent and human hippocampus is known to "replay" experience during wakeful rest and sleep (Wilson and McNaughton 1994); Pezzulo et al. 2017), although this term has become misleading in the light of accumulating evidence that internally generated sequences in the hippocampus (and declarative memory more generally) are constructive, i.e. not limited to literal repeats of prior experience, but including shortcuts and trajectories towards a goal (Gupta et al. 2010; Pfeiffer and Foster 2013).
Hippocampal replay is known to occur during so-called sharp wave-ripple complexes, SWRs for short, which are network events visible in the local field potential that reflect the sudden, transient and highly synchronous activity of a large number of hippocampal neurons. Let's find some in our data! To make visualization and analysis easier, we will linearize the 2-D position data into 1-D "left" and "right" trials:
In [14]:
expCond(1).label = 'left'; % this is a T-maze, we are interested in 'left' and 'right' trials
expCond(2).label = 'right'; % these are just labels we can make up here to keep track of which condition means what
expCond(1).t = metadata.taskvars.trial_iv_L; % previously stored trial start and end times for left trials
expCond(2).t = metadata.taskvars.trial_iv_R;
expCond(1).coord = metadata.coord.coordL; % previously user input idealized linear track, used for linearization
expCond(2).coord = metadata.coord.coordR; % note, this is in units of "camera pixels", not cm
expCond(1).S = S; expCond(2).S = S;
% linearize paths (snap x,y position samples to nearest point on experimenter-drawn idealized track)
nCond = length(expCond);
for iCond = 1:nCond
cfg_linpos = []; cfg_linpos.Coord = expCond(iCond).coord;
expCond(iCond).linpos = LinearizePos(cfg_linpos,pos);
end
Next, we restrict the data to the corresponding trials, and only those times when the rat is running:
In [15]:
% find intervals where rat is running
spd = getLinSpd([],pos); % get speed (in "camera pixels per second")
cfg_spd = []; cfg_spd.method = 'raw'; cfg_spd.threshold = 10;
run_iv = TSDtoIV(cfg_spd,spd); % intervals with speed above 10 pix/s
% restrict (linearized) position data and spike data to desired intervals
for iCond = 1:nCond
fh = @(x) restrict(x,run_iv); % function to restrict input to run times only
expCond(iCond) = structfunS(fh,expCond(iCond),{'S','linpos'}); % apply function to S and linpos fields of input
fh = @(x) restrict(x,expCond(iCond).t); % function to restrict input to specific trials (left/right)
expCond(iCond) = structfunS(fh,expCond(iCond),{'S','linpos'});
end
Now we can compute our tuning curves, as before, except now they are 1-D rather than 2-D:
In [16]:
for iCond = 1:nCond
cfg_tc = []; % use the defaults
expCond(iCond).tc = TuningCurves(cfg_tc,expCond(iCond).S,expCond(iCond).linpos); % note, same function as above
end
To help with visualization, we are going to apply a simple (and arbitrary) rule for which of these tuning curves have a defined "place field" and get the location of that field:
In [17]:
for iCond = 1:nCond
expCond(iCond).fields = DetectPlaceCells1D([],expCond(iCond).tc.tc); % function to detect place fields and their location
end
We also want to load a local field potential, because it will be useful in detecting candidate replay events:
In [18]:
cfg = []; cfg.fc = ExpKeys.goodSWR(1); % note use of experimenter-annotated data file to use
lfp = LoadCSC(cfg); lfp = decimate_tsd([],lfp); % subsample LFP data (to 500 Hz) to speed up plotting
Now, we can plot the data using the fabulous MultiRaster()
function:
In [19]:
fh = figure('KeyPressFcn',@navigate); % enable keyboard shortcuts for figure navigation (see help navigate or press h for instructions)
for iCond = 1:nCond
ax(iCond) = subplot(2,1,iCond); % set up separate subplots for left and right place fields
S_temp = SelectTS([],S,expCond(iCond).fields.template_idx); % template_idx contains ordered place cells
cfg_mr = []; cfg_mr.openNewFig = 0; cfg_mr.lfp = lfp;
MultiRaster(cfg_mr,S_temp);
xlim([5485 5492]); % zoom in on specific time interval of interest
end
linkaxes(ax,'x')
The figure shows two separate panels whose time axis is linked. Each panel (top for cells with place fields on "left" trials, bottom for cells with fields on "right" trials) contains a rasterplot with tickmarks indicating spike times. Each row represents a cell. The panels also have the same local field potential, plotted underneath the raster.
The time interval shown corresponds to a single trial -- a run from the base of the maze to the end of the right arm. The cells have been ordered according to the location of their place field on the track, as apparent from the (mostly) ordered progression in which the cells are activated in time, starting with the cells at the bottom and ending with cells at the top. You can tell that this is a right trial because the ordered progression of cell activation breaks down at some point for the top plot (corresponding to left trials) but continues past the midpoint for the bottom (right) plot.
A number of interesting features of hippocampal activity are apparent. First, the activity of individual cells is strongly rhythmic: given a spike at a certain time, more spikes are likely to occur approximately 100-125 ms later. The local field potential shows a similar rhythmicity in the form of an ~8 Hz ("theta") oscillation. Closer inspection reveals that place cell spikes are not locked to a specific phase of the theta rhythm, but precess systematically (fire at progressively earlier phases) as the rat traverses the place field. (If you are running this in MATLAB, MultiRaster()
enables keypresses to navigate and zoom in; press h
to see the available commands.) At the ensemble level, coordinated phase precession results in theta sequences, which manifest as repeating diagonals in the plot, evident for instance between 5490.5 and 5491 s.
The theta rhythm and sustained activation of place cells is characteristic of an animal that is moving and/or attentive. Hippocampal activity looks quite different during wakeful rest:
In [20]:
fh = figure('KeyPressFcn',@navigate); % enable keyboard shortcuts for figure navigation (see help navigate or press h for instructions)
for iCond = 1:nCond
ax(iCond) = subplot(2,1,iCond); % set up separate subplots for left and right place fields
S_temp = SelectTS([],S,expCond(iCond).fields.template_idx); % template_idx contains ordered place cells
cfg_mr = []; cfg_mr.openNewFig = 0; cfg_mr.lfp = lfp;
MultiRaster(cfg_mr,S_temp);
xlim([5540 5541.5]); % different time
end
linkaxes(ax,'x')
The LFP no longer exhibits a sustained theta rhythm. It appears arrhythmic, but is periodically interrupted by large slow deflections, "sharp waves", that coincide with a fast ~150-200 Hz "ripple". These sharp wave-ripple complexes (SWRs) are associated with synchronous activity in a large number of place cells, which tend to be quiescent otherwise.
Four clear SWR events are visible in the above plot. The second one (around time 5540.5) appears to have place cells active in the same order (particularly apparent in the top panel) as they were active during run -- an example of replay!
The content of replay, i.e. what actual or constructed experience it best corresponds to, is a quantity of psychological interest. How does an animal segment a continuous environment or experience into discrete chunks? Are certain aspects of experience preferentially replayed? Can subsequent choices be predicted from replay content? To address such questions we need a systematic way of identifying candidate replay events, determining whether they reflect specific content (decoding), and summarizing that content. This can be accomplished in the following steps:
We will tackle an example approach for each of these three steps in turn below.
In [21]:
%% filter in SWR band
cfg = [];
cfg.f = [140 220];
cfg.display_filter = 0;
SWRf = FilterLFP(cfg,lfp);
%% obtain power and z-score it
SWRp = LFPpower([],SWRf);
SWRp_z = zscore_tsd(SWRp);
%% detect events
cfg = [];
cfg.method = 'raw';
cfg.threshold = 3;
cfg.operation = '>'; % return intervals where threshold is exceeded
cfg.merge_thr = 0.05; % merge events closer than this
cfg.minlen = 0.05; % minimum interval length
SWR_evt = TSDtoIV(cfg,SWRp_z);
%% to each event, add a field with the max z-scored power (for later selection)
cfg = [];
cfg.method = 'max'; % 'min', 'mean'
cfg.label = 'maxSWRp'; % what to call this in iv, i.e. usr.label
SWR_evt = AddTSDtoIV(cfg,SWR_evt,SWRp_z);
%% select only those events of >5 z-scored power
cfg = [];
cfg.operation = '>';
cfg.threshold = 5;
SWR_evt = SelectIV(cfg,SWR_evt,'maxSWRp');
There are a few options for visualizing what we have detected. The first is to simply plot all the events:
In [22]:
cfg = []; cfg.display = 'iv'; cfg.fgcol = 'r';
PlotTSDfromIV(cfg,SWR_evt,lfp);
We can also use MultiRaster() to highlight the detected events in a richer plot:
In [23]:
cfg = []; cfg.lfp = lfp; cfg.evt = SWR_evt;
MultiRaster(cfg,S_temp);
xlim([5042.3 5043.3]);
Note that you can use keyboard shortcuts to navigate between events. For instance, c
jumps to the center event, from which a
and d
take you to the previous and next events respectively. h
brings up a list of other useful keyboard shortcuts.
Inspecting a few detected SWR events should highlight the synchronous spiking that tends to accompany these events, which in some cases shows the clearly sequential order characteristic of replay (as in the above example).
☛ Optional: using the set operations for interval data listed here use the getMUA()
and getLinSpd()
functions to retain only those candidates that have multi-unit activity (MUA) at least 2 SDs above the mean, and that occur when the rat is stationary.
At this point, we could use the candidate events to apply some spike-based metrics to each event in turn, and determine which events pass some significance threshold (usually relative to a shuffled distribution in which the identity of the cells and/or their spike timing is resampled). Here we take an alternative approach, which is to decode the entire session and look for intervals in which the decoded location does not jump more than a specified distance between adjacent time steps.
To do this, we use a wrapper function, Generate_DecSeq()
which performs several of the steps covered above (linearize data, construct tuning curves, decode), plus an additional sequence detection step. There are a substantial number of parameters here, including those for decoding (time window width and step size; smoothing of tuning curves and spike trains) and for sequence detection (minimum number of active cells per time bin, maximum allowed jump distance). For now, we will use the defaults, which you can inspect by looking at the help.
In [27]:
addpath(genpath(cat(2,SET_GitHub_root,'\vandermeerlab\code-matlab\tasks\Alyssa_Tmaze\')));
out = Generate_DecSeq([]); % note, may take a couple of minutes to run!
The resulting output has a number of fields, including:
P
, the decoded posterior (and P2
, the same thing with bins not meeting thresholds removed)decode_map
, a tsd of the maximum a posteriori decoded locationsseq_iv
, intervals of detected sequences
In [29]:
out.expCond
We can plot some of the output as follows:
In [35]:
iCond = 2;
S_temp = SelectTS([],out.expCond(iCond).decS,out.expCond(iCond).fields.template_idx);
cfg_plot = [];
cfg_plot.lfp(1) = out.expCond(iCond).decode_map; cfg_plot.lfp(2) = lfp;
cfg_plot.evt = out.expCond(iCond).seq_iv;
MultiRaster(cfg_plot,S_temp);
xlim([5042.3 5043.3]);
The above plot shows decoded location as a broken up wiggly line, in between the spike raster at the top and the LFP at the bottom. Time bins which didn't pass all thresholds (number of cells active) are not shown. Red intervals highlight "sequences" detected by the no-jump criterion. The apparent replay event around time 5042.7 gets detected, albeit split up into two separate intervals, as does a smaller event at time 5042.9.
If you're following along in MATLAB you can scroll around (and/or use the f, l, a, d keys) to look at some other detected events. If you do, you can see a number of "theta sequences" that occur not during SWRs in the rest state, but while the animal is running along the track. Since we are interested in SWR-associated replay, we can use interval set operations to only keep decoded sequence events that overlap with the SWR events we detected earlier:
In [36]:
SWR_seq = IntersectIV([],out.expCond(iCond).seq_iv,SWR_evt);
cfg_plot.evt = SWR_seq;
MultiRaster(cfg_plot,S_temp);
xlim([5042.3 5043.3]);
Inspection of the SWR_seq
struct reveals that there are now only 72 events left that pass the sequence criteria, as well as the SWR detection threshold. As you can see in the plot above, the first part of the replay sequence above is now left out, even though a clear SWR is visible.
☛ Change the SWR detection to set the start/end times at 2 SDs above the mean, including only events that exceed 4 SDs, and try again. You should see that now the entire replay sequence is included correctly.
Nevertheless, the above snapshot illustrates some issues. The event at 5042.4 looks like a clear SWR, but is in fact not detected. These observations motivate more sophisticated SWR detection methods, such as those based on an adaptive threshold; alternatively, we can use a different criterion, such as the subject being stationary, to rule out false positives from theta sequences.
So far, we have analyzed left and right data separately (as different elements in the expCond
variable). However, it is possible that some events meet the criteria for both, either because they occur (partially) on the central stem, or because of some similarity in how the left and right arms are encoded.
☛ Use the interval set functions to create left_only
and right_only
iv variables in which events that meet both the left and right criteria are removed. What is the final count of left and right events?
(cognitive)
(methodology)
(neurophysiology)