Q-Learning
Machine Learning
Srihari
Sargur N. Srihari srihari@cedar.buffalo.edu
Topics in Q-Learning
• Overview
  1. The Q Function
  2. An algorithm for learning Q
  3. An illustrative example
  4. Convergence
  5. Experimental strategies
  6. Updating sequence
Task of Reinforcement Learning
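[Figure: agent-environment loop. In state $s_t$ the agent takes action $a_t$; the environment returns reward $r_t$ and the next state $s_{t+1}$.]
$\delta(s_t, a_t) = s_{t+1}, \quad r(s_t, a_t) = r_t$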
The task of the agent is to learn a policy $\pi: S \to A$
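To make the roles of $\delta$, $r$, and $\pi$ concrete, here is a minimal Python sketch of a deterministic environment. The four-state corridor, the reward of 100 for entering the goal, and the names `delta`, `reward`, and `rollout` are hypothetical choices for illustration only; the slide specifies just the shape of the interaction.

```python
from typing import Dict, List, Tuple

# Hypothetical 1-D corridor (illustration only): states 0..3, state 3 is an absorbing goal.
STATES = [0, 1, 2, 3]
ACTIONS = ["left", "right"]
GOAL = 3


def delta(s: int, a: str) -> int:
    """Deterministic transition function delta(s, a) -> next state."""
    if s == GOAL:
        return s  # absorbing goal: the agent stays put
    if a == "right":
        return min(s + 1, GOAL)
    return max(s - 1, 0)


def reward(s: int, a: str) -> float:
    """Immediate reward r(s, a): 100 for stepping into the goal, 0 otherwise."""
    return 100.0 if s != GOAL and delta(s, a) == GOAL else 0.0


def rollout(policy: Dict[int, str], s0: int, steps: int) -> List[Tuple[int, str, float]]:
    """Follow policy pi from s0, recording (s_t, a_t, r_t) at each step."""
    trace = []
    s = s0
    for _ in range(steps):
        a = policy[s]
        trace.append((s, a, reward(s, a)))
        s = delta(s, a)
    return trace


always_right = {s: "right" for s in STATES}
print(rollout(always_right, 0, 4))
```

Following the always-right policy from state 0 yields `[(0, 'right', 0.0), (1, 'right', 0.0), (2, 'right', 100.0), (3, 'right', 0.0)]`: the agent never sees $\delta$ or $r$ as functions, only this stream of states, actions, and rewards.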
Agent's Task is to learn
• The agent has to learn a policy $\pi$ that maximizes $V^{\pi}(s)$ for all states $s$
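• where
  $V^{\pi}(s_t) = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots = \sum_{i=0}^{\infty} \gamma^i r_{t+i}$
  and $0 \le \gamma < 1$ is the discount factor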
• We will call such a policy an optimal policy $\pi^*$:
  $\pi^* \equiv \arg\max_{\pi} V^{\pi}(s), \quad (\forall s)$
• We denote the value function $V^{\pi^*}(s)$ by $V^*(s)$
• It gives the maximum discounted cumulative reward that the agent can obtain starting from state $s$
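The definitions of $V^{\pi}$, $\pi^*$, and $V^*$ can be checked numerically on a toy problem. This Python sketch assumes a hypothetical four-state corridor (goal reward 100) and an arbitrarily chosen $\gamma = 0.9$. It sums discounted rewards along each policy's trajectory and, because the problem is tiny, finds $V^*(s)$ by brute force over all 16 deterministic policies rather than by any learning method.

```python
from itertools import product
from typing import Dict, List

GAMMA = 0.9  # assumed discount factor (0 <= gamma < 1)
STATES = [0, 1, 2, 3]  # hypothetical corridor; state 3 is an absorbing goal
ACTIONS = ["left", "right"]
GOAL = 3


def delta(s: int, a: str) -> int:
    """Deterministic transition: move one cell, clipped to the corridor."""
    if s == GOAL:
        return s
    return min(s + 1, GOAL) if a == "right" else max(s - 1, 0)


def reward(s: int, a: str) -> float:
    """100 for stepping into the goal, 0 otherwise."""
    return 100.0 if s != GOAL and delta(s, a) == GOAL else 0.0


def discounted_return(rewards: List[float], gamma: float = GAMMA) -> float:
    """sum_i gamma^i * r_{t+i} over a finite reward sequence."""
    return sum((gamma ** i) * r for i, r in enumerate(rewards))


def value(policy: Dict[int, str], s: int, horizon: int = 50) -> float:
    """V^pi(s) truncated after `horizon` steps.

    Exact here: the goal, if reached at all, is reached within 3 steps,
    and every reward after that is 0.
    """
    rewards = []
    for _ in range(horizon):
        a = policy[s]
        rewards.append(reward(s, a))
        s = delta(s, a)
    return discounted_return(rewards)


# Every deterministic policy pi: S -> A (2^4 = 16 of them).
policies = [dict(zip(STATES, acts)) for acts in product(ACTIONS, repeat=len(STATES))]

# V*(s) = max over policies of V^pi(s).
v_star = {s: max(value(pi, s) for pi in policies) for s in STATES}

# A single policy that attains V*(s) in every start state simultaneously.
pi_star = next(pi for pi in policies if all(value(pi, s) == v_star[s] for s in STATES))

print(v_star)
print(pi_star)
```

Under these assumptions $V^*(0) = 0.9^2 \cdot 100 = 81$, $V^*(1) = 90$, $V^*(2) = 100$, and $V^*(3) = 0$, and one policy (move right in states 0, 1, 2) is optimal from every start state at once, as the $(\forall s)$ in the definition of $\pi^*$ asks for.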
Role of an Evaluation Function
• How can an agent learn an optimal policy $\pi^*$ for an arbitrary environment?
• It is difficult to learn the function $\pi^*: S \to A$ directly, because the available training data does not provide training examples of the form $\langle s, a \rangle$
• Instead, the only information available is the sequence of immediate rewards $r(s_i, a_i)$ for $i = 0, 1, 2, \ldots$
• It is easier to learn a numerical evaluation function defined over states and actions, and then implement the optimal policy in terms of that evaluation function
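One common choice of evaluation function over states and actions is $Q(s, a) = r(s, a) + \gamma V^*(\delta(s, a))$, from which the optimal policy follows as $\pi^*(s) = \arg\max_a Q(s, a)$. The Python sketch below, again on a hypothetical four-state corridor with an assumed $\gamma = 0.9$, computes $V^*$ by value iteration from the known $\delta$ and $r$ purely to build a $Q$ table. It is not the Q-learning algorithm itself, which has to estimate $Q$ from observed rewards; the point is the last step, where choosing actions greedily needs only the table, not $\delta$ or $r$.

```python
from typing import Dict, Tuple

GAMMA = 0.9  # assumed discount factor
STATES = [0, 1, 2, 3]  # hypothetical corridor; state 3 is an absorbing goal
ACTIONS = ["left", "right"]
GOAL = 3


def delta(s: int, a: str) -> int:
    """Deterministic transition: move one cell, clipped to the corridor."""
    if s == GOAL:
        return s
    return min(s + 1, GOAL) if a == "right" else max(s - 1, 0)


def reward(s: int, a: str) -> float:
    """100 for stepping into the goal, 0 otherwise."""
    return 100.0 if s != GOAL and delta(s, a) == GOAL else 0.0


def value_iteration(sweeps: int = 100) -> Dict[int, float]:
    """Approximate V*(s) by repeating V(s) <- max_a [r(s,a) + gamma * V(delta(s,a))]."""
    v = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        v = {s: max(reward(s, a) + GAMMA * v[delta(s, a)] for a in ACTIONS) for s in STATES}
    return v


def q_table(v_star: Dict[int, float]) -> Dict[Tuple[int, str], float]:
    """Evaluation function over (state, action): Q(s,a) = r(s,a) + gamma * V*(delta(s,a))."""
    return {(s, a): reward(s, a) + GAMMA * v_star[delta(s, a)] for s in STATES for a in ACTIONS}


def greedy_policy(q: Dict[Tuple[int, str], float]) -> Dict[int, str]:
    """pi*(s) = argmax_a Q(s,a); uses only the table (ties go to the first action listed)."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


q = q_table(value_iteration())
pi = greedy_policy(q)
print(pi)
```

Here $Q(1, \text{right}) = 90$ beats $Q(1, \text{left}) = 72.9$, so the greedy policy moves right in states 0, 1, and 2; in the goal state both actions score 0 and the tie is broken arbitrarily.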