
Decision Making, Models

Definition

Models of decision making attempt to describe, using stochastic differential equations which represent either neural activity or more abstract psychological variables, the dynamical process that produces a commitment to a single action/outcome as a result of incoming evidence that can be ambiguous as to the action it supports.

Background

Decision making can be separated into four processes (Doya, 2008):

1) Acquisition of sensory information to determine the state of the environment and the organism within it.

2) Evaluation of potential actions (options) in terms of the cost and benefit to the organism given its belief about the current state.

3) Selection of an action based on, ideally, an optimal tradeoff between the costs and benefits.

4) Use of the outcome of the action to update the costs and benefits associated with it.

Models of the dynamics of decision making have focused on perceptual decisions with only two possible responses available. The term two-alternative forced choice (TAFC) applies to such tasks when two stimuli are provided, but the term is now generally used for any binary-choice discrimination task.

In a perceptual decision, the response, or action, is directly determined by the current percept. Thus the decision in these tasks is essentially one of perceptual categorization, namely process (1) above, though the same models can be used for action selection given ambiguous information of the current state (process 3).

Evaluation of the possible responses in terms of their value or the resulting state's utility (process 2) (Sugrue et al., 2005) given both uncertainty in the current state, and uncertainty in the outcomes of an action given the state, is the subject of expected utility theory and prospect theory.

The necessary learning and updating of the values of different actions given the actual outcomes they produce (process 4) is the subject of instrumental conditioning and reinforcement learning, for example via temporal-difference learning (Seymour et al., 2004) and actor-critic models (Joel et al., 2002).

This article is primarily concerned with the dynamics of the production of either a single percept given unreliable sensory evidence (1), or a single action given uncertainty in the outcomes (3).

General features of discrimination tasks or TAFC tasks

In a TAFC task, a single decision variable can be defined, representing the likelihood ratio: the relative likelihood, given the evidence to date, of one alternative over the other. While TAFC tasks (Figure 1) have provided the dominant paradigm for analysis of choice behavior, the restriction to only two choices is lifted in many of the more recent models of decision making based on multiple variables, allowing for the fitting of a wider range of data sets.

The tasks can either be based on a free response paradigm, in which a subject responds after as much or little time as she wants, or an interrogation (forced response) paradigm, in which the stimulus duration is limited and the subject must make a response within a given time interval. The free response paradigm is perhaps more powerful, since each trial produces two types of information: accuracy (correct or incorrect) and response time. However, by variation of the time allowed when responses are forced, both paradigms are valuable for constraining models, since they can provide a distribution of response times for both correct and incorrect trials, as well as the proportion of trials that are correct or incorrect with a given stimulus. These behavioral data can be modified by task difficulty, task instructions (such as "respond rapidly" versus "respond accurately"), reward schedules and inter-trial intervals.

Most models of the dynamics of decision making focus on tasks where the time from stimulus onset to response is no more than one to two seconds, a timescale over which neural spiking can be maintained. Choices requiring much more time than this are likely to depend upon multiple memory stores, neural circuits and strategies, which become difficult to identify, extract and model in a dynamical systems framework (a state-based framework is more appropriate).

Figure 1. Scheme of the two-alternative forced choice (TAFC) task. Two streams of sensory input, each containing stimulus information, or a signal ($S_1$ and $S_2$), combined with noise ($\sigma_1\eta_1$ and $\sigma_2\eta_2$), are compared in a decision-making circuit. The circuit must produce one of two responses (A or B) indicating which of the two signals is the stronger. The optimal method for achieving this discrimination is via the sequential probability ratio test (SPRT), which requires the decision-making circuit to integrate inputs over time.

In the standard setup of the models, two parallel streams of noisy sensory input are available, with each stream supplying evidence in support of one of the two allowed actions (see Figure 1). The sensory inputs can be of either discrete or continuous quantities and can arrive discretely or continuously in time. The majority of models focus on continuous update in continuous time, so can be formulated as stochastic differential equations (Gillespie, 1992, Lawler, 2006). The sensory evidence, which is momentary, produces a decision variable, which indicates the likelihood of choosing one of the two alternatives given current evidence and all prior evidence. The primary difference between models is in how sensory evidence determines the decision variable. While most models incorporate a form of temporal integration of evidence (Cain and Shea-Brown, 2012) and include a negative interaction between the two sources of evidence, differences arise in the stability of initial states, which determines whether integration is perfect, and in the nature of the interaction: feedforward between the inputs, feedforward between outputs or feedback from outputs to decision variables (Bogacz et al., 2006). Models can also differ in their choice of decision threshold (the value of the decision variable at which a response is produced) in the free response paradigm (Simen et al., 2009, Deneve, 2012, Drugowitsch et al., 2012), and in particular whether this parameter or other model parameters, such as input gain, which also affect the response time distribution, are static or dynamic across a trial (Shea-Brown et al., 2008, Thura et al., 2012).

As the time available for acquisition of sensory information increases, so accuracy of responses increases in a perceptual discrimination task. Accuracy is measured as probability of choosing the response leading to more reward, which is equivalent to obtaining a veridical percept in these tasks. All of the models to be discussed below can produce such a speed-accuracy tradeoff by parameter adjustment. If parameters are adjusted so as to increase the mean response time, then accuracy increases. Such a tradeoff is observed in behavioral tasks, when either instructions or the schedule of reward and punishment encourages participants to respond as quickly as possible while being less concerned about making errors, or to respond as accurately as possible, while being less concerned about the time it takes to decide. The simplest way to effect such a tradeoff is to adjust the inter-trial interval, which, if long compared to the decision time, means that accuracy of responses impacts reward rate much more than the time for the decision itself. Models can replicate such behavior when optimal performance is based on the maximal reward rate. Typical parameter adjustments to increase accuracy while slowing responses would be a multiplicative scaling down of inputs (and the concurrent input noise) or a scaling up of the range across which the decision variable can vary by raising a decision threshold (Figs 2-3) (Ratcliff, 2002, Simen et al., 2009, Balci et al., 2011). A similar effect can be achieved in alternative, attractor-based models through the level of a global applied current, which affects the stability of the initial "undecided" state (Figs 6-9) (Miller and Katz, 2013).
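The dependence of the best threshold on the inter-trial interval can be illustrated with the drift diffusion model discussed later in this article. The sketch below is illustrative only: it assumes symmetric thresholds at plus and minus `threshold`, an unbiased starting point, and the standard closed-form expressions for error rate and mean decision time in that case. The reward-rate definition (one unit of reward per correct response, no penalty for errors, no non-decision time) and all function and parameter names are simplifying assumptions, not taken from the source.

```python
import math

def ddm_error_rate(drift, threshold, noise=1.0):
    """Error probability for thresholds at +/- threshold and an unbiased start."""
    return 1.0 / (1.0 + math.exp(2.0 * drift * threshold / noise ** 2))

def ddm_mean_decision_time(drift, threshold, noise=1.0):
    """Mean time to reach either threshold under the same assumptions."""
    return (threshold / drift) * math.tanh(drift * threshold / noise ** 2)

def reward_rate(drift, threshold, inter_trial_interval, noise=1.0):
    """Assumed reward rate: correct responses per unit of total trial time."""
    accuracy = 1.0 - ddm_error_rate(drift, threshold, noise)
    trial_time = ddm_mean_decision_time(drift, threshold, noise) + inter_trial_interval
    return accuracy / trial_time

def best_threshold(drift, inter_trial_interval, noise=1.0):
    """Coarse grid search for the threshold that maximizes the reward rate."""
    candidates = [0.01 * k for k in range(1, 301)]
    return max(candidates,
               key=lambda z: reward_rate(drift, z, inter_trial_interval, noise))

if __name__ == "__main__":
    for iti in (0.5, 2.0, 10.0):
        z = best_threshold(drift=1.0, inter_trial_interval=iti)
        print(f"inter-trial interval {iti}: best threshold {z:.2f}, "
              f"error rate {ddm_error_rate(1.0, z):.3f}")
```

Under these assumptions a longer inter-trial interval favors a higher threshold, so responses become slower and more accurate, in line with the tradeoff described above.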

From a neuroscience perspective, the decision variable is typically interpreted as either the mean firing rate of a group of neurons or a linear combination of rates of many neurons (Beck et al., 2008) (the difference between two groups being the simplest such combination). There has been remarkable progress in matching the observed firing patterns of neurons (Newsome et al., 1989, Shadlen and Newsome, 2001, Huk and Shadlen, 2005) with the dynamics of a decision variable in more mathematical models of decision making (Glimcher, 2001, Gold and Shadlen, 2001, Glimcher, 2003, Smith and Ratcliff, 2004, Gold and Shadlen, 2007, Ratcliff et al., 2007). This has led to the introduction of biophysically based models of neural circuits (Wang, 2008), which have accounted for much of the concordance between simple mathematical models, neural activity and behavior.

Optimal Decision Making

An optimal decision-making strategy either maximizes expected reward over a given time or minimizes risk. In TAFC perceptual tasks, a response is either correct or an error. In the interrogation paradigm, with fixed time per decision, the optimal strategy is the one leading to greatest accuracy, that is, the lowest expected error rate. In the free response paradigm the optimal strategy either delivers the greatest accuracy for a given mean response time, or produces the fastest mean response time for a given accuracy. In these tasks, the sequential probability ratio test (SPRT), introduced by Wald and Wolfowitz (Wald, 1947, Wald and Wolfowitz, 1948), and its continuous form, the drift diffusion model (DDM) (Ratcliff and Smith, 2004, Ratcliff and McKoon, 2008), lead to optimal choice behavior by any of these measures of optimality (see (Bogacz et al., 2006) for a thorough review).

Using SPRT in the interrogation paradigm, one simply accumulates over time the log-likelihood ratio of the probabilities of each alternative given the stream of evidence, where the observed sensory input per unit time has a certain probability given alternative A and another probability given alternative B. Integrating the log-likelihood ratio over time, after setting the initial condition as the log-likelihood ratio of the prior probabilities, log[P(A)/P(B)], leads to a quantity log[P(A|S)/P(B|S)] which is greater than zero if A is more likely than B given the stimulus and less than zero otherwise. Thus the optimal procedure is to choose A or B depending on the sign of the summed, or in the continuous limit, integrated, log-likelihood ratio.
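As an illustration of this procedure, the following sketch accumulates the log-likelihood ratio over a fixed set of discrete samples and reads out its sign. It assumes, purely for illustration, that each sample is Gaussian with mean `mean_a` under alternative A and `mean_b` under alternative B, with a common standard deviation `sd`; these names and the Gaussian evidence model are assumptions, not part of the source.

```python
import math

def gaussian_log_likelihood_ratio(x, mean_a, mean_b, sd):
    """log[p(x | A) / p(x | B)] for Gaussian evidence with equal variance."""
    return ((x - mean_b) ** 2 - (x - mean_a) ** 2) / (2.0 * sd ** 2)

def interrogation_sprt(samples, mean_a, mean_b, sd, prior_a=0.5):
    """Sum the log-likelihood ratio over all samples and choose by its sign."""
    # Initial condition: log-ratio of the prior probabilities, log[P(A)/P(B)].
    log_odds = math.log(prior_a / (1.0 - prior_a))
    for x in samples:
        log_odds += gaussian_log_likelihood_ratio(x, mean_a, mean_b, sd)
    # A tie at exactly zero is assigned to B here; that choice is arbitrary.
    choice = "A" if log_odds > 0 else "B"
    return choice, log_odds

if __name__ == "__main__":
    evidence = [0.4, -0.1, 0.9, 0.3]
    print(interrogation_sprt(evidence, mean_a=0.5, mean_b=-0.5, sd=1.0))
```

With equal variances the per-sample log-likelihood ratio is linear in the sample, which is why summing evidence and summing log-likelihood ratios give the same choice in this special case.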

In the free response paradigm a stopping criterion must be included. This is achieved by setting two thresholds for the integrated log-likelihood ratio, a positive one ($+a$) for choice A and a negative one ($-b$) for choice B. The further the thresholds are from the origin, the lower the chance of error, but the longer the integration time before reaching a decision. Thus the thresholds reflect the fraction of errors that can be tolerated, with

$$a = \log\frac{1-\beta}{\alpha} \quad \text{and} \quad -b = \log\frac{\beta}{1-\alpha},$$

where $\alpha$ is the probability of choosing A when B is correct and $\beta$ is the probability of choosing B when A is correct.
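The relation between tolerated error probabilities and thresholds can be sketched as follows. The code uses Wald's standard approximation, with `alpha` the probability of choosing A when B is correct and `beta` the probability of choosing B when A is correct, and writes the thresholds as `+a` and `-b`. The Gaussian evidence model and all function names are illustrative assumptions rather than anything specified in the source.

```python
import math
import random

def wald_thresholds(alpha, beta):
    """Thresholds (+a, -b) on the log-likelihood ratio for target error rates."""
    a = math.log((1.0 - beta) / alpha)   # upper threshold, choice A
    b = math.log((1.0 - alpha) / beta)   # lower threshold is -b, choice B
    return a, b

def free_response_sprt(true_mean, mean_a, mean_b, sd, a, b, rng, max_steps=100000):
    """Accumulate the log-likelihood ratio until it crosses +a or -b."""
    log_ratio = 0.0  # equal priors assumed
    for step in range(1, max_steps + 1):
        x = rng.gauss(true_mean, sd)
        log_ratio += ((x - mean_b) ** 2 - (x - mean_a) ** 2) / (2.0 * sd ** 2)
        if log_ratio >= a:
            return "A", step
        if log_ratio <= -b:
            return "B", step
    return None, max_steps  # no decision within the allowed number of samples

if __name__ == "__main__":
    rng = random.Random(0)
    a, b = wald_thresholds(alpha=0.05, beta=0.05)
    trials = [free_response_sprt(0.5, 0.5, -0.5, 1.0, a, b, rng) for _ in range(2000)]
    errors = sum(1 for choice, _ in trials if choice != "A") / len(trials)
    mean_steps = sum(steps for _, steps in trials) / len(trials)
    print(f"a = {a:.3f}, b = {b:.3f}, error rate = {errors:.3f}, mean samples = {mean_steps:.1f}")
```

Because evidence arrives in discrete samples, the accumulated value overshoots the threshold slightly, so the simulated error rate is typically somewhat below the nominal value.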

The Models

Accumulator Models

The first models of decision making in humans or animals were accumulator models, sometimes called counter models or race models. In these models, evidence accumulates separately for each possible outcome. This has the advantage that if many outcomes are possible, the models are simply extended by addition of one more variable for each additional alternative, with evidence for each alternative accumulating within its allotted variable. In the interrogation paradigm, one simply reads out the highest variable, so the choice depends on the sign of the difference of the two variables in the TAFC paradigm. Thus, if the difference in accumulated quantities matched the difference in integrated log probabilities of the two types of evidence, such readout from an accumulator model would be equivalent to an SPRT, so would be optimal.

In the free response paradigm, accumulator models produce a choice when any one of the accumulated variables reaches a threshold, so these models can be called "race to threshold models" or simply "race models". The original accumulator models included neither interaction between accumulators, nor ability for variables to decrease. However, for decisions in nature or in laboratory protocols, evidence in favor of one alternative is typically evidence against the other alternative. The lack of interaction is particularly problematic in the free response paradigm, because the time at which one variable reaches threshold and produces the corresponding choice is independent of evidence accumulated for other choices. Thus the behavior of simple accumulator models is not optimal. Comparisons of response time distributions of these models with behavioral responses showed the models to be inaccurate in this regard: observed response-time distributions are skewed with a long tail, whereas the response times of accumulator models were much more symmetric about the mean. These discrepancies led to the ascendance of Ratcliff's drift diffusion model (Ratcliff, 1978).
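A minimal sketch of such a race, assuming a simple counter formulation: at each time step each accumulator independently gains one count with a fixed probability, counts never decrease, the accumulators do not interact, and the first to reach the threshold determines the choice. The Bernoulli increments, the random tie-break and the names `p_a`, `p_b` and `threshold` are illustrative assumptions, not a specific published model.

```python
import random

def race_trial(p_a, p_b, threshold, rng, max_steps=100000):
    """Run one race between two independent, non-decreasing counters."""
    count_a = 0
    count_b = 0
    for step in range(1, max_steps + 1):
        # Each counter accumulates its own evidence; neither inhibits the other.
        if rng.random() < p_a:
            count_a += 1
        if rng.random() < p_b:
            count_b += 1
        a_done = count_a >= threshold
        b_done = count_b >= threshold
        if a_done and b_done:
            return rng.choice("AB"), step  # simultaneous arrival: random tie-break
        if a_done:
            return "A", step
        if b_done:
            return "B", step
    return None, max_steps

if __name__ == "__main__":
    rng = random.Random(1)
    trials = [race_trial(0.6, 0.4, 10, rng) for _ in range(1000)]
    fraction_a = sum(1 for choice, _ in trials if choice == "A") / len(trials)
    mean_time = sum(steps for _, steps in trials) / len(trials)
    print(f"fraction choosing A = {fraction_a:.3f}, mean response time = {mean_time:.1f} steps")
```

Note that the winner's finishing time here does not depend on the loser's count, which is the independence the text identifies as suboptimal.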

The Drift Diffusion Model

The drift diffusion model (DDM) is an integrator with thresholds (Figure 2), or more precisely, the decision variable, $X$, follows a Wiener process with two absorbing boundaries (Figure 3). It includes a deterministic (drift) term, $S$, proportional to the rate of incoming evidence and a diffusive noise term of variance $\sigma^2$, which produces variability in response times and can lead to errors:

$$\frac{dX}{dt} = S + \sigma\,\eta(t),$$

where $\eta(t)$ is a white noise term defined by $\langle \eta(t)\,\eta(t') \rangle = \delta(t - t')$.
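The dynamics described above can be simulated with a simple Euler-Maruyama scheme, sketched below. Here `drift` plays the role of the drift term S, `noise` the noise amplitude σ, and the absorbing boundaries sit at `+a` and `-b` relative to the starting point `x0`. The time step, the `max_time` cutoff and the function name are illustrative assumptions.

```python
import math
import random

def simulate_ddm(drift, noise, a, b, rng, dt=0.001, max_time=10.0, x0=0.0):
    """Integrate dX = drift*dt + noise*dW until X reaches +a or -b."""
    x = x0
    t = 0.0
    sqrt_dt = math.sqrt(dt)
    while t < max_time:
        # Euler-Maruyama step: the noise increment has variance noise**2 * dt.
        x += drift * dt + noise * sqrt_dt * rng.gauss(0.0, 1.0)
        t += dt
        if x >= a:
            return "A", t
        if x <= -b:
            return "B", t
    return None, t  # no boundary reached within max_time

if __name__ == "__main__":
    rng = random.Random(0)
    trials = [simulate_ddm(1.0, 1.0, 1.0, 1.0, rng, dt=0.005) for _ in range(500)]
    decided = [(c, t) for c, t in trials if c is not None]
    accuracy = sum(1 for c, _ in decided if c == "A") / len(decided)
    mean_rt = sum(t for _, t in decided) / len(decided)
    print(f"accuracy = {accuracy:.3f}, mean decision time = {mean_rt:.3f}")
```

A finite time step detects boundary crossings only at the end of each step, so simulated decision times are slightly biased relative to the continuous-time process; a smaller `dt` reduces this at the cost of run time.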

Figure 2. The drift diffusion model (DDM). The DDM is a one-dimensional model, so the two competing inputs and their noise terms are first combined: in this case $S = S_1 - S_2$ and $\sigma^2 = \sigma_1^2 + \sigma_2^2$. If the model is scaled to a given level of noise then its three independent parameters are drift rate ($S$) and positions of each of the two thresholds ($a$, $-b$) with respect to the starting point. When the model was introduced, these parameters were assumed fixed for a given subject in a specific task. The threshold spacing determines where one operates in the speed-accuracy tradeoff, so can be optimized as a function of the relative cost for making an incorrect response and the time between trials. Any starting point away from the midpoint represents bias or prior information. The drift rate is proportional to stimulus strength.

Figure 2 diagram labels: inputs $S_1$ and $S_2$ with noise terms $\sigma_1\eta_1(t)$ and $\sigma_2\eta_2(t)$; Choice A when $X > 0$ or $X = +T$; Choice B when $X < 0$ or $X = -T$.

Figure 3 labels: decision variable $X$, thresholds $+a$ and $-b$, starting point $X(0) = 0$, $St$, $P(X,t)$.
