2. Multidimensional Signal Detection Theory

  • Psygrammer cognitive modeling study group material [1]
  • 김무성

Contents

  • Abstract
  • Introduction
  • General Recognition Theory
  • The Multivariate Normal Model
  • Applying GRT to Data
  • Fitting the GRT Model to Identification Data
  • The Summary Statistics Approach
  • An Empirical Example
  • Extensions to Response Time
  • The RT-Distance Hypothesis
  • Process Models of RT
  • Neural Implementations of GRT
  • Conclusions

Abstract

  • Multidimensional signal detection theory
    • multivariate extension of signal detection theory
  • two fundamental assumptions,
    • every mental state is noisy
    • every action requires a decision
  • The most widely studied version is known as general recognition theory (GRT).
    • General recognition theory assumes
      • the percept on each trial can be modeled as a random sample from a multivariate probability distribution defined over the perceptual space.
      • Decision bounds divide this space into regions that are each associated with a response alternative.
  • General recognition theory rigorously defines and tests
    • a number of important perceptual and cognitive conditions,
    • including perceptual and decisional separability
    • and perceptual independence.
  • General recognition theory has been used to analyze data from identification experiments in two ways:
    • (1) fitting and comparing models that make different assumptions about perceptual and decisional processing,
    • (2) testing assumptions by computing summary statistics and checking whether these satisfy certain conditions.
  • Much has been learned recently about the neural networks that mediate the perceptual and decisional processing modeled by GRT, and this knowledge can be used to improve the design of experiments where a GRT analysis is anticipated.

Key Words

  • signal detection theory
  • general recognition theory
  • perceptual separability
  • perceptual independence
  • identification
  • categorization

Introduction

Note

What is signal detection theory? [2,3]

                     Respond "Absent"     Respond "Present"
  Stimulus Present   Miss                 Hit
  Stimulus Absent    Correct Rejection    False Alarm

  • Signal detection theory with one sensory dimension
  • -> Multidimensional signal detection theory
  • -> General recognition theory (the most widely studied version of multidimensional signal detection theory)
  • Moving to more than one dimension raises two problems:
    • First, with more than one dimension, it becomes necessary to model interactions (or the lack thereof) among those dimensions.
    • Second, modeling decision processes is far more difficult when the perceptual space is multidimensional than when there is only one sensory dimension.

General Recognition Theory

As an example,

  • consider an experiment in which participants are asked to categorize or identify faces that vary across trials on gender and age.
  • Suppose there are four stimuli (i.e., faces) that are created by factorially combining two levels of each dimension.
  • In this case we could denote
    • the two levels of the gender dimension by
      • A1 (male) and
      • A2 (female)
    • and the two levels of the age dimension by
      • B1 (teen) and
      • B2 (adult).
  • Then the four faces are denoted as
    • A1B1 (male teen),
    • A1B2 (male adult),
    • A2B1 (female teen), and
    • A2B2 (female adult).

As with signal detection theory, a fundamental assumption of GRT is that all perceptual systems are inherently noisy. There is noise both in the stimulus (e.g., photon noise) and in the neural systems that determine its sensory representation (Ashby & Lee, 1993).

perceptual separability

perceptual independence

decisional separability

  • In GRT, the relationship of the joint pdf to the marginal pdfs plays a critical role in determining whether the stimulus dimensions are perceptually integral or separable.
  • Component A is perceptually separable from component B if the subject’s perception of A does not change when the level of B is varied.
    • For example, age is perceptually separable from gender if the perceived age of the adult in our face experiment is the same for the male adult as for the female adult, and if a similar invariance holds for the perceived age of the teen.
  • If perceptual separability fails then A and B are said to be perceptually integral.
  • Another purely perceptual phenomenon is perceptual independence.
    • According to GRT, components A and B are perceived independently in stimulus AiBj if and only if the perceptual value of component A is statistically independent of the perceptual value of component B on AiBj trials.
  • Note that perceptual independence is a property of a single stimulus, whereas perceptual separability is a property of groups of stimuli.
  • A third important construct from GRT is decisional separability.
    • In our hypothetical experiment with stimuli A1B1, A1B2, A2B1, and A2B2, and two perceptual dimensions X1 and X2, decisional separability holds on dimension X1 (for example)
    • if the subject’s decision about whether stimulus component A is at level 1 or 2 depends only on the perceived value on dimension X1.
    • A decision bound is a line or curve that separates regions of the perceptual space that elicit different responses.
    • The only types of decision bounds that satisfy decisional separability are vertical and horizontal lines.
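
The decision-bound idea can be sketched in a few lines of Python. This is a minimal illustration, not code from the chapter: the function names, the bound positions at 0, and the diagonal rule are all assumptions made for the example. The first rule uses one vertical and one horizontal bound (decisional separability holds); the second lets the A decision depend on both dimensions (decisional separability fails on X1).

```python
def respond(percept, bound_x1=0.0, bound_x2=0.0):
    """Map a percept (x1, x2) to one of the four responses.

    Decisional separability: the decision about component A uses only x1
    and the decision about component B uses only x2, so the bounds are a
    vertical line (x1 = bound_x1) and a horizontal line (x2 = bound_x2).
    """
    x1, x2 = percept
    a = "a1" if x1 < bound_x1 else "a2"
    b = "b1" if x2 < bound_x2 else "b2"
    return a + b


def respond_diagonal(percept):
    """Hypothetical rule that violates decisional separability on X1.

    The A decision depends on x1 + x2, so the perceived value on X2
    changes the response about component A.
    """
    x1, x2 = percept
    a = "a1" if x1 + x2 < 0.0 else "a2"
    b = "b1" if x2 < 0.0 else "b2"
    return a + b


print(respond((-0.5, 1.2)))           # a1b2
print(respond_diagonal((-0.5, 1.2)))  # a2b2
```

The same percept gets a different A response under the two rules, which is exactly what a non-separable bound means.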

A is perceptually separable from B

B is perceptually separable from A

A and B are perceived independently in stimulus AiBj
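
The three labels above appear to caption equations that are not reproduced in these notes. A hedged reconstruction from the verbal definitions given earlier is shown here. The notation is an assumption: f denotes the joint pdf of the percept on A_iB_j trials and g its marginal pdf on the dimension named by the argument.

```latex
\begin{align}
% A is perceptually separable from B: the marginal on x_1 does not change with the level of B
g_{A_iB_1}(x_1) &= g_{A_iB_2}(x_1), \qquad i = 1, 2 \\
% B is perceptually separable from A: the marginal on x_2 does not change with the level of A
g_{A_1B_j}(x_2) &= g_{A_2B_j}(x_2), \qquad j = 1, 2 \\
% A and B are perceived independently in stimulus A_iB_j: the joint pdf factors into its marginals
f_{A_iB_j}(x_1, x_2) &= g_{A_iB_j}(x_1)\, g_{A_iB_j}(x_2)
\end{align}
```

The first two conditions compare distributions across stimuli, while the third involves a single stimulus, which matches the remark that separability is a property of groups of stimuli and independence a property of one.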

The Multivariate Normal Model

  • So far we have made no assumptions about the form of the joint or marginal pdfs.
  • Our only assumption has been that there exists some probability distribution associated with each stimulus and that these distributions are all embedded in some Euclidean space.
    • The most common additional assumption is that the percepts are multivariate normally distributed.
    • The multivariate normal distribution includes two assumptions.
      • First, the marginal distributions are all normal.
      • Second, the only possible dependencies are pairwise linear relationships.
      • Thus, in multivariate normal distributions, uncorrelated random variables are statistically independent.
    • The multivariate normal distribution has another important property.
      • When two stimuli both have multivariate normal perceptual distributions, it is straightforward to show that the decision boundary that maximizes accuracy is always linear or quadratic.
      • The optimal boundary is linear if the two perceptual distributions have equal variance-covariance matrices;
      • the optimal boundary is quadratic if the two variance-covariance matrices are unequal.
      • Thus, in the Gaussian version of GRT, the only decision bounds that are typically considered are either linear or quadratic.
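
The linear-versus-quadratic claim can be checked numerically. The sketch below assumes equal base rates, so the accuracy-maximizing bound is the set of points where the log-likelihood ratio h(x) = log f1(x) - log f2(x) equals zero. All parameter values and function names are made up for illustration. If h is linear in x1, its second difference along X1 is zero; if h is quadratic, it is not.

```python
import math


def log_pdf(x, mean, cov):
    """Log density of a bivariate normal; cov is ((s11, s12), (s12, s22))."""
    (s11, s12), (_, s22) = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form d' * inverse(cov) * d, using the closed-form 2x2 inverse.
    quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return -0.5 * quad - 0.5 * math.log(det) - math.log(2.0 * math.pi)


def discriminant(x, dist1, dist2):
    """h(x) = log f1(x) - log f2(x); with equal base rates the optimal bound is h(x) = 0."""
    return log_pdf(x, *dist1) - log_pdf(x, *dist2)


def curvature(dist1, dist2, x=(0.3, -0.2), step=0.5):
    """Second difference of h along X1; it vanishes when h is linear in x1."""
    left = discriminant((x[0] - step, x[1]), dist1, dist2)
    mid = discriminant(x, dist1, dist2)
    right = discriminant((x[0] + step, x[1]), dist1, dist2)
    return left - 2.0 * mid + right


# Hypothetical distributions: (mean vector, variance-covariance matrix).
shared_cov = ((1.0, 0.5), (0.5, 1.0))
stim1 = ((0.0, 0.0), shared_cov)
stim2_equal = ((1.0, 1.0), shared_cov)
stim2_unequal = ((1.0, 1.0), ((2.0, 0.0), (0.0, 1.0)))

print(curvature(stim1, stim2_equal))    # about 0: linear bound
print(curvature(stim1, stim2_unequal))  # clearly nonzero: quadratic bound
```

With equal covariance matrices the quadratic terms of the two log densities cancel, which is why only the linear part survives.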

  • Bivariate normal distributions, like those depicted in Figure 2.1, are each characterized by five parameters:
    • a mean on each dimension,
    • a variance on each dimension, and
    • a covariance or correlation between the values on the two dimensions.
  • These are typically catalogued in
    • a mean vector and
    • a variance-covariance matrix.
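
As a small sketch of how the five parameters map onto a mean vector and a variance-covariance matrix, the following Python code builds both and draws noisy percepts from the resulting distribution. The parameter values, function names, and the fixed seed are assumptions chosen for the example; the sampler uses a hand-written Cholesky factor so that only the standard library is needed.

```python
import math
import random


def make_distribution(mean1, mean2, sd1, sd2, rho):
    """Package the five parameters as a mean vector and a variance-covariance matrix."""
    mean = (mean1, mean2)
    cov12 = rho * sd1 * sd2  # covariance implied by the correlation
    cov = ((sd1 ** 2, cov12), (cov12, sd2 ** 2))
    return mean, cov


def sample_percept(mean, cov, rng):
    """Draw one percept using the Cholesky factor of the 2x2 covariance matrix."""
    (s11, s12), (_, s22) = cov
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
mean, cov = make_distribution(1.0, 0.5, 1.0, 2.0, 0.4)
percepts = [sample_percept(mean, cov, rng) for _ in range(20000)]
print(mean, cov)
```

Setting rho to 0 makes the off-diagonal entries zero, which for a bivariate normal is the perceptual independence case.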

Applying GRT to Data

confusion matrix

  • The entry in row i and column j lists the number of trials on which stimulus Si was presented and the subject gave response Rj.
    • Thus, the entries on the main diagonal give the frequencies of all correct responses and
    • the off-diagonal entries describe the various errors (or confusions).
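
A confusion matrix of this kind can be tallied from a trial log as sketched below. The stimulus and response labels follow the 2x2 face example; the trial list is hypothetical and the helper names are assumptions made for the illustration.

```python
STIMULI = ["A1B1", "A1B2", "A2B1", "A2B2"]
RESPONSES = ["a1b1", "a1b2", "a2b1", "a2b2"]


def confusion_matrix(trials):
    """Count responses per stimulus; rows are stimuli, columns are responses."""
    matrix = [[0] * len(RESPONSES) for _ in STIMULI]
    for stimulus, response in trials:
        matrix[STIMULI.index(stimulus)][RESPONSES.index(response)] += 1
    return matrix


def proportion_correct(matrix):
    """Correct responses sit on the main diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical trial log, not data from the chapter.
trials = [("A1B1", "a1b1"), ("A1B1", "a1b2"), ("A1B2", "a1b2"),
          ("A2B1", "a2b1"), ("A2B2", "a2b2"), ("A2B2", "a1b2")]
matrix = confusion_matrix(trials)
for name, row in zip(STIMULI, matrix):
    print(name, row)
```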

  • General recognition theory has been used to analyze data from confusion matrices in two different ways.
    • One is to fit the model to the entire confusion matrix.
    • The other method, which is arguably more popular, is to compute certain summary statistics from the empirical confusion matrix and then to check whether these satisfy certain conditions.

Fitting the GRT Model to Identification Data

  • ESTIMATING THE PARAMETERS
  • EVALUATING GOODNESS OF FIT

ESTIMATING THE PARAMETERS

EVALUATING GOODNESS OF FIT

The Summary Statistics Approach

  • MACRO-ANALYSES
  • MICRO-ANALYSES

The summary statistics approach draws inferences about

* perceptual independence,
* perceptual separability, and 
* decisional separability 

by using summary statistics that are easily computed from a confusion matrix.

MACRO-ANALYSES

  • Macro-analyses draw conclusions about perceptual and decisional separability from changes in
    • accuracy,
    • sensitivity, and
    • bias measures computed for one dimension across levels of a second dimension.
  • marginal response invariance

  • Marginal response invariance is closely related to perceptual and decisional separability.
    • Fig 2.3,
      • Dimension X1 is decisionally but not perceptually separable from dimension X2;
      • the distance between the means of the perceptual distributions along the X1 axis is much greater for the top two stimuli than for the bottom two stimuli.
      • The marginal distributions at the bottom of Figure 2.3 show that the proportion of correct responses, represented by the light-grey areas under the curves, is larger in the second level of X2 than in the first level.
        • The result would be similar if perceptual separability held and decisional separability failed, as would be the case for X2 if its decision bound were not perpendicular to its main axis.
    • To test marginal response invariance in dimension X1,
      • we estimate the various probabilities in Eq. 18 from the empirical confusion matrix that results from this identification experiment.
      • Next, equality between the two sides of Eq. 18 is assessed via a standard statistical test.
      • These computations are repeated for both levels of component A and if either of the two tests is significant,
      • then we conclude that marginal response invariance fails,
      • and, therefore, that either perceptual or decisional separability is violated.
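
The test procedure can be sketched as follows. The notes do not say which "standard statistical test" is meant, so the pooled two-proportion z test used here is an assumption, and the confusion matrix is made up (it is not Table 2.1). Each comparison estimates P(ai | AiB1) and P(ai | AiB2) by summing the two response columns that report level i of component A.

```python
from statistics import NormalDist

# Hypothetical 2x2 confusion matrix (NOT Table 2.1): rows are stimuli
# A1B1, A1B2, A2B1, A2B2; columns are responses a1b1, a1b2, a2b1, a2b2.
CONFUSION = [
    [140, 36, 34, 40],
    [89, 91, 4, 66],
    [85, 5, 90, 70],
    [20, 59, 8, 163],
]


def p_correct_a(row, level):
    """P(a_level | stimulus): sum over both B responses in one row."""
    cols = (0, 1) if level == 1 else (2, 3)
    return sum(row[c] for c in cols) / sum(row)


def two_proportion_z(p1, n1, p2, n2):
    """Pooled two-proportion z statistic and two-sided p-value."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = (pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2)) ** 0.5
    if se == 0.0:
        return 0.0, 1.0  # both proportions are 0 or 1: no evidence of a difference
    z = (p1 - p2) / se
    return z, 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def marginal_response_invariance(matrix, level):
    """Compare P(a_i | A_i B_1) with P(a_i | A_i B_2) for i = level."""
    row_b1, row_b2 = matrix[2 * (level - 1)], matrix[2 * (level - 1) + 1]
    return two_proportion_z(p_correct_a(row_b1, level), sum(row_b1),
                            p_correct_a(row_b2, level), sum(row_b2))


for level in (1, 2):
    z, p = marginal_response_invariance(CONFUSION, level)
    print(f"A{level}: z = {z:.3f}, p = {p:.3f}")
```

A significant result for either level would count as a failure of marginal response invariance.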

  • identification hit rate
    • The left side of Eq. 18 equals P(ai |AiB1) and the right side equals P(ai|AiB2).
    • These are the probabilities that component Ai is correctly identified and are analogous to “hit” rates in signal detection theory.
    • To emphasize this relationship, we define the identification hit rate of component Ai on trials when stimulus AiBj is presented as

  • false alarm rates
    • In Figure 2.3, note that the dark grey areas in the marginal distributions equal F_{a1|A2B2} (top) and F_{a1|A2B1} (bottom).
    • In signal detection theory,
      • hit and false-alarm rates are used to measure stimulus discriminability.
      • We can use the identification analogues to compute marginal discriminabilities for each stimulus component.

  • The equality between two d′ values can be tested using the following statistic
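
The statistic itself is not reproduced in these notes. One commonly used option, shown here as an assumption rather than as the chapter's exact equation, is the z test of Gourevitch and Galanter (1967), which divides the difference between two d′ estimates by the square root of the sum of their approximate sampling variances. The rates and trial counts below are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal


def d_prime(hit, fa):
    """Discriminability from a hit rate and a false-alarm rate (both strictly between 0 and 1)."""
    return STD.inv_cdf(hit) - STD.inv_cdf(fa)


def d_prime_variance(hit, fa, n_hit, n_fa):
    """Approximate sampling variance of d' (Gourevitch & Galanter, 1967)."""
    def term(p, n):
        return p * (1.0 - p) / (n * STD.pdf(STD.inv_cdf(p)) ** 2)
    return term(hit, n_hit) + term(fa, n_fa)


def d_prime_z(hit1, fa1, hit2, fa2, n):
    """z statistic for d'_1 - d'_2, with n trials behind every rate."""
    diff = d_prime(hit1, fa1) - d_prime(hit2, fa2)
    var = d_prime_variance(hit1, fa1, n, n) + d_prime_variance(hit2, fa2, n, n)
    return diff / var ** 0.5


# Hypothetical rates for component A at B1 and at B2, 250 trials per stimulus.
z = d_prime_z(hit1=0.704, fa1=0.36, hit2=0.72, fa2=0.316, n=250)
print(round(z, 3))
```

Rates of exactly 0 or 1 have no finite z-score, so `inv_cdf` raises an error for them; real data with such cells would need a correction before this computation.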

  • marginal response criterion
    • Marginal hit and false-alarm rates can also be used to compute a marginal response criterion.
    • Several measures of response criterion and bias have been proposed (see Chapter 2 of Macmillan & Creelman, 2005), but perhaps the most widely used criterion measure in recent years (due to Kadlec, 1999) is given by Eq. 25.
  • As shown in Figure 2.3, this measure represents the placement of the decision-bound relative to the center of the A2Bj distribution.

  • To test the difference between two c values, the following test statistic can be used (Kadlec, 1999):
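
Neither the criterion equation nor its test statistic is reproduced in these notes. The sketch below assumes that the criterion is the z-transformed false-alarm rate, c = Φ⁻¹(F), which is consistent with the description of c as the bound position relative to the center of the A2Bj distribution, and that two c values are compared with a z test analogous to the usual one for d′. Both are assumptions, and the numbers are hypothetical.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal


def criterion(fa):
    """Assumed criterion measure: c = inverse-normal of the false-alarm rate."""
    return STD.inv_cdf(fa)


def criterion_variance(fa, n):
    """Approximate sampling variance of a z-transformed proportion from n trials."""
    return fa * (1.0 - fa) / (n * STD.pdf(STD.inv_cdf(fa)) ** 2)


def criterion_z(fa1, n1, fa2, n2):
    """z statistic for the difference between two criterion estimates."""
    diff = criterion(fa1) - criterion(fa2)
    return diff / (criterion_variance(fa1, n1) + criterion_variance(fa2, n2)) ** 0.5


# Hypothetical false-alarm rates F(a1 | A2B1) and F(a1 | A2B2), 250 trials each.
print(round(criterion(0.36), 3), round(criterion(0.316), 3))
print(round(criterion_z(0.36, 250, 0.316, 250), 3))
```

A false-alarm rate of 0.5 gives c = 0, meaning the bound sits at the center of the A2Bj distribution.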

MICRO-ANALYSES

  • Macro-analyses focus on properties of the entire stimulus ensemble.
  • In contrast, micro-analyses test assumptions about perceptual independence and decisional separability by examining summary statistics computed for only one or two stimuli.
  • sampling independence
    • The most widely used test of perceptual independence is via sampling independence, which holds when the probability of reporting a combination of components P(aibj) equals the product of the probabilities of reporting each component alone, P(ai)P(bj).
    • For example, sampling independence holds for stimulus A1B1 if and only if
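
The equation that follows this sentence is not reproduced in these notes. Based on the verbal definition, sampling independence for stimulus A1B1 compares P(a1b1 | A1B1) with the product P(a1 | A1B1) × P(b1 | A1B1), where each marginal probability sums two cells of the A1B1 row of the confusion matrix. The Python sketch below computes the gap between the two sides for one row. The function name and the example rows are assumptions, and a formal statistical test of the gap would still be needed on real data.

```python
def sampling_independence_gap(row, a_level, b_level):
    """Return P(a_i b_j) - P(a_i) * P(b_j) for one stimulus.

    row holds response counts in the order [a1b1, a1b2, a2b1, a2b2].
    A gap of 0 means sampling independence holds exactly in the sample.
    """
    total = sum(row)
    p = [count / total for count in row]
    p_a = p[0] + p[1] if a_level == 1 else p[2] + p[3]
    p_b = p[0] + p[2] if b_level == 1 else p[1] + p[3]
    joint = p[2 * (a_level - 1) + (b_level - 1)]
    return joint - p_a * p_b


# Hypothetical rows: all four responses equally likely, then a row in
# which the two component reports tend to agree.
print(sampling_independence_gap([25, 25, 25, 25], 1, 1))  # 0.0
print(round(sampling_independence_gap([40, 10, 10, 40], 1, 1), 2))  # 0.15
```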

  • Sampling independence provides a strong test of perceptual independence if decisional separability holds.
  • In fact, if decisional separability holds on both dimensions, then sampling independence holds if and only if perceptual independence holds (Ashby & Townsend, 1986).
  • sampling independence
    • Figure 2.4A gives an intuitive illustration of this theoretical result.
      • Two cases are presented in which decisional separability holds on both dimensions and the decision bounds cross at the mean of the perceptual distribution.
      • In the distribution to the left, perceptual independence holds and it is easy to see that all four responses are equally likely.
      • Thus, the volume of this bivariate normal distribution in response region R4 = a2b2 is 0.25.
      • It is also easy to see that half of each marginal distribution lies above its relevant decision criterion (i.e., the two shaded regions), so P(a2) = P(b2) = 0.5.
      • As a result, sampling independence is satisfied, since P(a2b2) = 0.25 = 0.5 × 0.5 = P(a2) × P(b2).
  • discriminability and criterion measures
    • Perceptual independence can also be assessed through discriminability and criterion measures computed for one dimension conditioned on the perceived value on the other dimension.
    • Figure 2.4B shows
      • the perceptual distributions of two stimuli that share the same level of component B (i.e., B1) and have the same perceptual mean on dimension X2.
      • The decision bound perpendicular to X2 separates the perceptual plane into two regions:
        • percepts falling in the upper region elicit an incorrect response on component B (i.e., a miss for B),
        • whereas percepts falling in the lower region elicit a correct B response (i.e., a hit). The bottom of the figure shows the marginal distribution for each stimulus conditioned on whether B is a hit or a miss. When perceptual independence holds,

  • Note that if perceptual independence and decisional separability both hold, then the tests based on sampling independence and equal conditional d′ and c should lead to the same conclusion.
    • Conditional d′ and c values can be computed from hit and false alarm rates for two stimuli differing in one dimension, conditioned on the reported level of the second dimension.
  • If only one of these two tests holds and the other fails, this indicates a violation of decisional separability.

An Empirical Example

  • In this section we show with a concrete example how to analyze the data from an identification experiment using GRT.
    • We will first analyze the data by fitting GRT models to the identification confusion matrix,
    • -> and then we will conduct summary statistics analyses on the same data.
    • -> Finally, we will compare the results from the two separate analyses
  • Imagine that you are a researcher interested in how the age and gender of faces interact during face recognition.
    • You run an experiment in which subjects must identify four stimuli,
    • formed by combining two levels of age (teen and adult) and
    • two levels of gender (male and female).
    • Each stimulus is presented 250 times, for a total of 1,000 trials in the whole experiment.
  • The data to be analyzed are summarized in the confusion matrix displayed in Table 2.1.

  • These data were generated by random sampling from the model shown in Figure 2.5A.
    • The advantage of generating artificial data from this model is that we know in advance what conclusions should be reached by our analyses.
    • For example, note that decisional separability holds in the Figure 2.5A model.
    • Also, because the distance between the “male” and “female” distributions is larger for “adult” than for “teen,” gender is not perceptually separable from age.
    • In contrast, the “adult” and “teen” marginal distributions are the same across levels of gender, so age is perceptually separable from gender.
    • Finally, because all distributions show a positive correlation, perceptual independence is violated for all stimuli.
    • A hierarchy of models was fit to the data in Table 2.1 using maximum likelihood estimation.
    • Figure 2.5C shows the hierarchy of models used for the analysis,
      • together with the number of free parameters m for each of them.
      • In this figure,
        • PS stands for perceptual separability,
        • PI for perceptual independence,
        • DS for decisional separability and
        • 1_RHO describes a model with a single correlation parameter for all distributions.
        • Note that several other models could be tested, depending on specific research goals and hypotheses, or on the results from summary statistics analysis.
      • The arrows in Figure 2.5C connect models that are nested within each other.
        • The results of likelihood ratio tests comparing such
          • nested models are displayed next to each arrow,
          • with an asterisk representing significantly better fit for the more general model (lower in the hierarchy) and
          • n.s. representing a nonsignificant difference in fit.
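
A likelihood ratio test between two nested models can be sketched as follows. This is a generic illustration under the usual assumption that twice the difference in maximized log-likelihoods is approximately chi-square distributed, with degrees of freedom equal to the difference in the number of free parameters m. The log-likelihood values and parameter counts are made up and do not come from Figure 2.5C. The chi-square tail probability is computed by hand so that only the standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Closed form for even df: exp(-y) * sum_{i < df/2} y**i / i!
        total, term = 0.0, 1.0
        for i in range(df // 2):
            total += term
            term *= y / (i + 1)
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + exp(-y) * sum_{i < (df-1)/2} y**(i+0.5) / gamma(i+1.5)
    total, term = 0.0, math.sqrt(y) / math.gamma(1.5)
    for i in range((df - 1) // 2):
        total += term
        term *= y / (i + 1.5)
    return math.erfc(math.sqrt(y)) + math.exp(-y) * total


def likelihood_ratio_test(loglik_restricted, m_restricted, loglik_general, m_general):
    """Return (G2, df, p) for a restricted model nested within a more general one."""
    g2 = 2.0 * (loglik_general - loglik_restricted)
    df = m_general - m_restricted
    return g2, df, chi2_sf(g2, df)


# Hypothetical fits: a restricted model with m = 4 inside a general model with m = 6.
g2, df, p = likelihood_ratio_test(-1250.0, 4, -1244.0, 6)
print(g2, df, round(p, 4))
```

A small p-value corresponds to the asterisk case: the extra parameters of the more general model significantly improve the fit.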

Extensions to Response Time

  • The RT-Distance Hypothesis
  • Process Models of RT
  • There have been a number of extensions of GRT that allow the theory to account both for response accuracy and response time (RT).
    • One approach was to add the fewest and least controversial assumptions possible that would allow GRT to make RT predictions.
      • The resulting model succeeds, but it offers no process interpretation of how a decision is reached on each trial.
    • An alternative approach is to add enough theoretical structure to make RT predictions and to describe the perceptual and cognitive processes that generated that decision.

The RT-Distance Hypothesis

  • In standard univariate signal detection theory, the most common RT assumption is that RT decreases with the distance between the perceptual effect and the response criterion.
  • The obvious multivariate analog of this, which is known as the RT-distance hypothesis, assumes that RT decreases with the distance between the percept and the decision bound.
  • Efforts to incorporate the RT-distance hypothesis into GRT have been limited to two-choice experimental paradigms, such as
    • categorization or
    • speeded classification, which can be modeled with a single decision bound.
  • The most general form of the RT-distance hypothesis makes no assumptions about the parametric form of the function that relates RT and distance to bound.
  • The only assumption is that this function is monotonically decreasing.
  • For example, consider a filtering task
    • with stimuli A1B1, A1B2, A2B1, and A2B2,
    • and two perceptual dimensions X1 and X2,
    • in which the subject’s task on each trial is to name the level of component A.
    • Let PFA(RTi ≤ t | AiBj) denote the probability that
      • the RT is less than or equal to some value t on trials of a filtering task
      • when the subject correctly classified the level of component A.
      • Given this, then the RT analog of marginal response invariance, referred to as marginal RT invariance, can be defined as
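
The defining equation is not reproduced in these notes. Read from the verbal definition, marginal RT invariance requires the RT distribution for correct classifications of Ai to be the same at both levels of B, for every value of t. One simple way to inspect this in data, offered here as an assumption rather than as the chapter's method, is to compare the two empirical RT distribution functions and report their largest gap (the two-sample Kolmogorov-Smirnov statistic). The RT values below are hypothetical.

```python
def empirical_cdf(rts, t):
    """Proportion of response times less than or equal to t."""
    return sum(1 for rt in rts if rt <= t) / len(rts)


def max_cdf_gap(rts_b1, rts_b2):
    """Largest absolute gap between two empirical RT CDFs.

    The gap can only change at observed RTs, so it is enough to
    evaluate both CDFs at every observed value.
    """
    points = sorted(set(rts_b1) | set(rts_b2))
    return max(abs(empirical_cdf(rts_b1, t) - empirical_cdf(rts_b2, t))
               for t in points)


# Hypothetical correct-response RTs (seconds) for level A1 at B1 and at B2.
rts_a1_b1 = [0.41, 0.45, 0.52, 0.58, 0.63]
rts_a1_b2 = [0.43, 0.47, 0.55, 0.61, 0.72]
print(round(max_cdf_gap(rts_a1_b1, rts_a1_b2), 3))
```

Because the condition must hold at every t, a single large gap anywhere along the RT axis is evidence against marginal RT invariance.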

  • Ashby and Maddox (1994) showed that, when decisional separability and the RT-distance hypothesis both hold, perceptual separability holds if and only if marginal RT invariance holds for both correct and incorrect responses.
  • Note that this is an if and only if result, which was not true for marginal response invariance.
  • In particular, if decisional separability and marginal response invariance both hold, perceptual separability could still be violated.
  • But if decisional separability, marginal RT invariance, and the RT-distance hypothesis all hold, then perceptual separability must be satisfied.
  • The reason we get the stronger result with RTs is that marginal RT invariance requires that Eq. 29 holds for all values of t, whereas marginal response invariance only requires a single equality to hold.
  • A similar strong result could be obtained with accuracy data if marginal response invariance were required to hold for all possible placements of the response criterion (i.e., the point where the vertical decision bound intersects the X1 axis).

Process Models of RT

  • At least three different process models have been proposed that account for both RT and accuracy within a GRT framework.
    • Ashby (1989) proposed a stochastic interpretation of GRT that was instantiated in a discrete-time linear system.
    • Townsend, Houpt, and Silbert (2012) considerably generalized the stochastic model proposed by Ashby (1989) by extending it to a broad class of parallel processing models.
    • Ashby (2000) took a different approach. Rather than specify a processing architecture, he proposed that moment-by-moment fluctuations in the percept could be modeled via a continuous-time multivariate diffusion process.
      • This stochastic version of GRT is more biologically plausible than the Ashby (1989) version (e.g., see Smith & Ratcliff, 2004) and it establishes links to the voluminous work on diffusion models of decision making.

Neural Implementations of GRT

  • Of course, the perceptual and cognitive processes modeled by GRT are mediated by circuits in the brain.
  • During the past decade or two, much has been learned about the architecture and functioning of these circuits. Perhaps most importantly, there is now overwhelming evidence that humans have multiple neuroanatomically and functionally distinct learning systems.
    • The most complete description of two of the most important learning systems is arguably provided by the COVIS theory of category learning (Ashby, Alfonso-Reese, Turken, & Waldron, 1998; Ashby, Paul, & Maddox, 2011).
      • COVIS assumes separate
        • rule-based and
        • procedural-learning categorization systems that compete for access to response production.
        • The rule-based system uses executive attention and working memory to select and test simple verbalizable hypotheses about category membership.
        • The procedural system gradually associates categorization responses with regions of perceptual space via reinforcement learning.
        • COVIS assumes that rule-based categorization is mediated by a broad neural network that includes the prefrontal cortex, anterior cingulate, head of the caudate nucleus, and the hippocampus, whereas the key structures in the procedural-learning system are the striatum and the premotor cortex.
        • Virtually all decision rules that satisfy decisional separability are easily verbalized.
        • In fact, COVIS assumes that the rule-based system is constrained to use rules that satisfy decisional separability (at least piecewise).
        • In contrast, the COVIS procedural system has no such constraints. Instead, it tends to learn decision strategies that approximate the optimal bound.
  • As we have consistently seen throughout this chapter, decisional separability greatly simplifies applications of GRT to behavioral data. Thus, researchers who want to increase the probability

Conclusions

  • Multidimensional signal detection theory in general, and GRT in particular, make two fundamental assumptions, namely that every mental state is noisy and that every action requires a decision.
  • Multidimensional signal detection theory captures two fundamental features of almost all behaviors.
  • Beyond these two assumptions, however, the theory is flexible enough to model a wide variety of decision processes and sensory and perceptual interactions.
  • For these reasons, the popularity of multidimensional signal detection theory is likely to grow in the coming decades.

grtools


In [ ]:
#install.packages("devtools")
#devtools::install_github("fsotoc/grtools", dependencies="Imports")

In [1]:
library(grtools)

In [2]:
?grtools


Out[2]:
grtools-package {grtools}    R Documentation

General recognition theory tools for the analysis of perceptual independence

Description

Statistical tools from General Recognition Theory for psychophysical data analyses aimed at determining independent processing of perceptual dimensions

Details

Package: grtools
Type: Package
Version: 0.1.2
Date: 2015-03-26
License: GPL (>= 2)

grtools provides functions for the following analyses using general recognition theory:

1. Model-based analyses of separability and independence with GRT-wIND for the 2x2 identification experiment (Soto et al., 2015). See grt_wind_fit and grt_wind_fit_parallel

2. Model-based analyses of separability and independence with traditional GRT models for the 2x2 identification experiment (Ashby & Soto, 2015). See grt_hm_fit

3. Summary statistics analysis (i.e. Kadlec's MDSDA; see Kadlec & Townsend, 1992) for the 2x2 identification experiment. See sumstats_micro and sumstats_macro

4. Summary statistic analysis for the 2x2 Garner filtering task (Ashby & Maddox, 1994). See sumstats_garner

Author(s)

Fabian Soto, Emily Zheng

Maintainer: Fabian Soto <fabian.soto@psych.ucsb.edu>

References

Ashby, F. G., & Soto, F. A. (2015). Multidimensional signal detection theory. In J. R. Busemeyer, J. T. Townsend, Z. J. Wang, & A. Eidels (Eds.), Oxford handbook of computational and mathematical psychology (pp. 13-34). Oxford University Press: New York, NY.

Ashby, F. G., & Maddox, W. T. (1994). A response time theory of separability and integrality in speeded classification. Journal of Mathematical Psychology, 38(4), 423-466.

Kadlec, H., & Townsend, J. T. (1992). Signal detection analyses of multidimensional interactions. In F. G. Ashby (Ed.), Multidimensional models of perception and cognition (pp. 181–231). Hillsdale, NJ: Erlbaum.

Soto, F. A., Musgrave, R., Vucovich, L., & Ashby, F. G. (2015). General recognition theory with individual differences: A new method for examining perceptual and decisional interactions with an application to face perception. Psychonomic Bulletin & Review, 22(1), 88-111.

See Also

For applications of General Recognition Theory to perceptual categorization experiments, see grt


[Package grtools version 0.1.2 ]

References