Q-Learning

Machine Learning

Srihari

Sargur N. Srihari srihari@cedar.buffalo.edu

Topics in Q-Learning

• Overview
1. The Q Function
2. An algorithm for learning Q
3. An illustrative example
4. Convergence
5. Experimental strategies
6. Updating sequence

Task of Reinforcement Learning

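States $s$, actions $a$

In state $s_t$ the agent takes action $a_t$, receives reward $r_t$, and moves to state $s_{t+1}$:

$\delta(s_t, a_t) = s_{t+1}$, $r(s_t, a_t) = r_t$

Task of agent is to learn a policy $\pi : S \rightarrow A$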
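
The transition function, reward function, and policy can be sketched as plain Python functions. This is only an illustration: the three-state chain world, the action names, and the reward of 100 for entering the goal are assumptions made for the example, not taken from the slides.

```python
# Hypothetical 3-state chain world (an assumption for illustration):
# states 0, 1, 2, where state 2 is an absorbing goal.
STATES = [0, 1, 2]
ACTIONS = ["left", "right"]

def delta(s, a):
    """Deterministic transition function: delta(s_t, a_t) = s_{t+1}."""
    if s == 2:  # the goal state is absorbing
        return 2
    return min(s + 1, 2) if a == "right" else max(s - 1, 0)

def r(s, a):
    """Immediate reward r(s_t, a_t) = r_t: 100 for entering the goal, else 0."""
    return 100 if s != 2 and delta(s, a) == 2 else 0

def policy(s):
    """A fixed policy pi: S -> A that always moves right."""
    return "right"

def rollout(s0, steps):
    """Follow the policy from s0 and return the list of (s_t, a_t, r_t)."""
    trajectory = []
    s = s0
    for _ in range(steps):
        a = policy(s)
        trajectory.append((s, a, r(s, a)))
        s = delta(s, a)
    return trajectory

trajectory = rollout(0, 3)
```

Calling `rollout(0, 3)` yields the state, action, and reward observed at each of three steps; the agent itself only ever sees this sequence, not `delta` or `r` directly.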

Agent's Task is to learn

• The agent has to learn a policy $\pi$ that maximizes $V^{\pi}(s)$ for all states $s$

• Where

$V^{\pi}(s_t) = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots = \sum_{i=0}^{\infty} \gamma^i r_{t+i}$

• We will call such a policy an optimal policy $\pi^*$:

$\pi^* = \arg\max_{\pi} V^{\pi}(s), \ (\forall s)$

• We denote the value function $V^{\pi^*}(s)$ by $V^*(s)$

• It gives the maximum discounted cumulative reward that the agent can obtain starting from state $s$
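
The discounted cumulative reward can be computed directly for a finite reward sequence. A minimal sketch, assuming the infinite sum is truncated at the end of the supplied list and that the discount factor `gamma` lies between 0 and 1; the function name and the sample rewards are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Return sum_i gamma**i * rewards[i] for a finite reward sequence.

    This truncates the infinite sum at len(rewards), which is exact only
    if all later rewards are zero.
    """
    total = 0.0
    for i, reward in enumerate(rewards):
        total += (gamma ** i) * reward
    return total

# Rewards r_t, r_{t+1}, r_{t+2} observed from some state s_t (made-up values).
value = discounted_return([0, 100, 0], gamma=0.9)
```

With a smaller `gamma` the reward of 100 received one step later contributes less to the total, which is how the discount factor expresses a preference for earlier rewards.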

Role of an Evaluation Function

• How can an agent learn an optimal policy $\pi^*$ for an arbitrary environment?

• It is difficult to learn the function $\pi^* : S \rightarrow A$ directly

• Because the available training data does not provide training examples of the form $\langle s, a \rangle$

• Instead the only information available is the sequence of immediate rewards $r(s_i, a_i)$ for $i = 0, 1, 2, \ldots$

• Easier to learn a numerical evaluation function defined over states and actions

• And implement the optimal policy in terms of the evaluation function
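
One common way to implement a policy in terms of an evaluation function over states and actions is to pick, in each state, the action with the highest value. A minimal sketch, assuming the evaluation function is stored as a dictionary keyed by (state, action); the names `q_table` and `greedy_policy` and all numeric values are hypothetical and chosen only for illustration.

```python
ACTIONS = ["left", "right"]

# Hypothetical evaluation function over (state, action) pairs.
# The numbers are illustrative, not taken from the slides.
q_table = {
    (0, "left"): 81.0,
    (0, "right"): 90.0,
    (1, "left"): 81.0,
    (1, "right"): 100.0,
}

def greedy_policy(q_table, state, actions):
    """Return the action with the largest evaluation in the given state."""
    return max(actions, key=lambda a: q_table[(state, a)])

chosen = [greedy_policy(q_table, s, ACTIONS) for s in (0, 1)]
```

The agent never needs labeled examples of the form (state, correct action) here; it only needs the evaluation function, and the action choice follows from comparing values.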
