Introduction to Reinforcement Learning
J. Zico Kolter Carnegie Mellon University
Agent interaction with environment
[Figure: the agent observes state s and reward r from the environment and responds with action a]
Of course, an oversimplification
Review: Markov decision process
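Recall a (discounted) Markov decision process $\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma)$:
$\mathcal{S}$: set of states
$\mathcal{A}$: set of actions
$P: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to [0,1]$: transition probability distribution $P(s' \mid s, a)$
$R: \mathcal{S} \to \mathbb{R}$: reward function, $R(s)$ is the reward for state $s$
$\gamma$: discount factor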
The RL twist: we don't know $P$ or $R$, or they are too big to enumerate (we only have the ability to act in the MDP and observe states and rewards)
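To make the tuple concrete, here is a minimal Python sketch of a finite MDP stored as nested lists. It assumes states are numbered 0..n-1 and actions 0..m-1. The `MDP` class, its `validate` method, and the two-state example are hypothetical illustrations, not part of the lecture. This is exactly the object the RL setting withholds: planning methods read `P` and `R` directly, while an RL agent only gets to act.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MDP:
    """Finite discounted MDP (S, A, P, R, gamma); states are 0..num_states-1, actions 0..num_actions-1."""
    num_states: int
    num_actions: int
    P: List[List[List[float]]]  # P[s][a][s2]: probability of landing in s2 after taking action a in state s
    R: List[float]              # R[s]: reward for state s
    gamma: float                # discount factor, assumed here to lie in [0, 1)

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError unless shapes match and every P[s][a] is a probability distribution."""
        if len(self.P) != self.num_states or len(self.R) != self.num_states:
            raise ValueError("P and R must have one entry per state")
        if not 0.0 <= self.gamma < 1.0:
            raise ValueError("gamma must lie in [0, 1)")
        for s in range(self.num_states):
            if len(self.P[s]) != self.num_actions:
                raise ValueError(f"P[{s}] must have one row per action")
            for a in range(self.num_actions):
                row = self.P[s][a]
                if len(row) != self.num_states or any(p < 0.0 for p in row) or abs(sum(row) - 1.0) > tol:
                    raise ValueError(f"P[{s}][{a}] is not a distribution over next states")


# Hypothetical example: action 0 = "stay", action 1 = "try to switch" (succeeds with prob. 0.8).
example = MDP(
    num_states=2,
    num_actions=2,
    P=[[[1.0, 0.0], [0.2, 0.8]],
       [[0.0, 1.0], [0.8, 0.2]]],
    R=[0.0, 1.0],
    gamma=0.9,
)
example.validate()  # passes silently
```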
Some important quantities in MDPs
(Deterministic) policy $\pi: \mathcal{S} \to \mathcal{A}$: mapping from states to actions
(Stochastic) policy $\pi: \mathcal{S} \times \mathcal{A} \to [0,1]$: distribution over actions for each state
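The two kinds of policy can be represented as follows. This is a minimal sketch, assuming a hypothetical two-state, two-action problem. A deterministic policy is a list of action indices, and a stochastic policy is a per-state list of action probabilities. The helper names `as_stochastic` and `sample_action` are illustrative only.

```python
import random
from typing import List

# Deterministic policy pi: S -> A, stored as one action index per state (hypothetical values).
det_policy: List[int] = [1, 0]

# Stochastic policy pi: S x A -> [0, 1], stored as pi[s][a]; each row sums to 1 (hypothetical values).
stoch_policy: List[List[float]] = [
    [0.3, 0.7],
    [0.9, 0.1],
]


def as_stochastic(policy: List[int], num_actions: int) -> List[List[float]]:
    """A deterministic policy is the special case that puts probability 1 on a single action."""
    return [[1.0 if a == policy[s] else 0.0 for a in range(num_actions)]
            for s in range(len(policy))]


def sample_action(pi: List[List[float]], s: int, rng: random.Random) -> int:
    """Draw an action a with probability pi[s][a]."""
    return rng.choices(range(len(pi[s])), weights=pi[s], k=1)[0]


print(as_stochastic(det_policy, 2))  # [[0.0, 1.0], [1.0, 0.0]]
```

Storing a deterministic policy in one-hot form lets the same code handle both cases. The cost is storing |S|·|A| numbers instead of |S|.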
Value of a policy $\pi$, $V^\pi(s)$: expected discounted reward if starting in state $s$ and following policy $\pi$ (expressed via the Bellman equation)
$$V^\pi(s) = R(s) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, \pi(s))\, V^\pi(s')$$
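The Bellman equation is linear in $V^\pi$, so it can be stacked over all states. Write $R$ and $V^\pi$ as vectors indexed by state, and $P_\pi$ for the $|\mathcal{S}| \times |\mathcal{S}|$ matrix with entries $P_\pi(s, s') = P(s' \mid s, \pi(s))$. This notation is introduced here for illustration. Under it, the equation becomes:

```latex
V^\pi = R + \gamma P_\pi V^\pi
\quad\Longrightarrow\quad
(I - \gamma P_\pi)\, V^\pi = R
\quad\Longrightarrow\quad
V^\pi = (I - \gamma P_\pi)^{-1} R
```

For $\gamma < 1$ the matrix $I - \gamma P_\pi$ is invertible. This is because $P_\pi$ is row-stochastic, so $\gamma P_\pi$ has spectral radius $\gamma < 1$.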
Optimal value function $V^\star$ (value function of the optimal policy $\pi^\star$, the policy with highest value)
$$V^\star(s) = R(s) + \gamma \max_{a \in \mathcal{A}} \sum_{s' \in \mathcal{S}} P(s' \mid s, a)\, V^\star(s')$$
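The Bellman optimality equation can be checked numerically, and a policy can be read off from it. Below is a minimal sketch on a hypothetical two-state MDP whose $V^\star$ is easy to solve by hand. The function names are illustrative. The greedy rule $\pi^\star(s) = \arg\max_a \sum_{s'} P(s' \mid s, a)\, V^\star(s')$ is the standard way to extract an optimal policy from $V^\star$; it is not stated on this slide.

```python
from typing import List

# Hypothetical two-state, two-action MDP: P[s][a][s2], R[s], gamma.
P = [[[1.0, 0.0], [0.2, 0.8]],
     [[0.0, 1.0], [0.8, 0.2]]]
R = [0.0, 1.0]
gamma = 0.9


def q_values(V: List[float], s: int) -> List[float]:
    """R(s) + gamma * sum_{s'} P(s'|s,a) V(s') for every action a."""
    return [R[s] + gamma * sum(p * v for p, v in zip(P[s][a], V))
            for a in range(len(P[s]))]


def bellman_optimality_residual(V: List[float]) -> float:
    """Largest violation of the Bellman optimality equation; zero exactly when V = V* (for gamma < 1)."""
    return max(abs(V[s] - max(q_values(V, s))) for s in range(len(V)))


def greedy_policy(V: List[float]) -> List[int]:
    """pi(s) = argmax_a of the one-step lookahead; R(s) does not depend on a, so it does not change the argmax."""
    policy = []
    for s in range(len(V)):
        q = q_values(V, s)
        policy.append(max(range(len(q)), key=q.__getitem__))
    return policy


# Solved by hand for this toy MDP: V*(1) = 1 / (1 - 0.9) = 10 (stay forever),
# and V*(0) = 0.9 * (0.2 * V*(0) + 0.8 * 10), i.e. V*(0) = 7.2 / 0.82.
V_star = [7.2 / 0.82, 10.0]
print(bellman_optimality_residual(V_star) < 1e-9)  # True
print(greedy_policy(V_star))                        # [1, 0]
```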
"Solving" an MDP
Policy evaluation: to find the value of a policy $\pi$, start with any $\hat V^\pi(s)$ and repeat:
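$$\forall s \in \mathcal{S}: \quad \hat V^\pi(s) \leftarrow R(s) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, \pi(s))\, \hat V^\pi(s')$$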
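The policy evaluation update translates directly into a loop. This is a minimal sketch, assuming a hypothetical two-state MDP stored as nested lists. The update is synchronous: all new values are computed from the previous sweep. The stopping tolerance `tol` and the iteration cap `max_iters` are illustrative choices, not from the lecture.

```python
from typing import List

# Hypothetical two-state, two-action MDP: P[s][a][s2], R[s], gamma.
P = [[[1.0, 0.0], [0.2, 0.8]],
     [[0.0, 1.0], [0.8, 0.2]]]
R = [0.0, 1.0]
gamma = 0.9


def policy_evaluation(policy: List[int], tol: float = 1e-10, max_iters: int = 10_000) -> List[float]:
    """Iterate V(s) <- R(s) + gamma * sum_{s'} P(s'|s,pi(s)) V(s') until the largest change is below tol."""
    V = [0.0] * len(R)  # any initial guess converges when gamma < 1
    for _ in range(max_iters):
        V_new = [R[s] + gamma * sum(p * v for p, v in zip(P[s][policy[s]], V))
                 for s in range(len(R))]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    return V


print(policy_evaluation([1, 0]))  # approximately [8.7805, 10.0]
```

Under the policy [1, 0], state 1 earns reward 1 forever, so its value is 1 / (1 - 0.9) = 10. The iteration recovers this, along with V(0) = 7.2 / 0.82.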
(alternatively, we can solve the above linear equation to determine $V^\pi$ directly)
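The direct route solves $(I - \gamma P_\pi) V^\pi = R$, where $P_\pi(s, s') = P(s' \mid s, \pi(s))$. Below is a minimal dependency-free sketch using Gaussian elimination on a hypothetical two-state MDP. In practice one would call a library solver such as `numpy.linalg.solve`. The cost is O(|S|^3), which is acceptable in the small, enumerable state spaces this lecture assumes.

```python
from typing import List

# Hypothetical two-state, two-action MDP: P[s][a][s2], R[s], gamma.
P = [[[1.0, 0.0], [0.2, 0.8]],
     [[0.0, 1.0], [0.8, 0.2]]]
R = [0.0, 1.0]
gamma = 0.9


def solve_linear(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting (A assumed nonsingular)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):  # eliminate entries below the pivot
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def policy_value_exact(policy: List[int]) -> List[float]:
    """Build I - gamma * P_pi, with P_pi[s][s2] = P(s2 | s, pi(s)), and solve for V^pi."""
    n = len(R)
    A = [[(1.0 if s == s2 else 0.0) - gamma * P[s][policy[s]][s2] for s2 in range(n)]
         for s in range(n)]
    return solve_linear(A, R)


print(policy_value_exact([1, 0]))  # approximately [8.7805, 10.0]
```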
Value iteration: to find the optimal value function, start with any $\hat V^\star(s)$ and repeat:
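$$\forall s \in \mathcal{S}: \quad \hat V^\star(s) \leftarrow R(s) + \gamma \max_{a \in \mathcal{A}} \sum_{s' \in \mathcal{S}} P(s' \mid s, a)\, \hat V^\star(s')$$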
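Here is value iteration in the same dependency-free style. This is a minimal sketch, assuming a hypothetical two-state MDP and a max-norm stopping rule; `tol` and `max_iters` are illustrative. It also returns the greedy policy with respect to the final estimate, which is a common extra step not shown on the slide.

```python
from typing import List, Tuple

# Hypothetical two-state, two-action MDP: P[s][a][s2], R[s], gamma.
P = [[[1.0, 0.0], [0.2, 0.8]],
     [[0.0, 1.0], [0.8, 0.2]]]
R = [0.0, 1.0]
gamma = 0.9


def backup(V: List[float]) -> List[List[float]]:
    """Q[s][a] = R(s) + gamma * sum_{s'} P(s'|s,a) V(s') for every state and action."""
    return [[R[s] + gamma * sum(p * v for p, v in zip(P[s][a], V)) for a in range(len(P[s]))]
            for s in range(len(R))]


def value_iteration(tol: float = 1e-10, max_iters: int = 10_000) -> Tuple[List[float], List[int]]:
    """Iterate V(s) <- max_a Q[s][a] until the largest change is below tol, then act greedily."""
    V = [0.0] * len(R)  # any initial guess converges when gamma < 1
    for _ in range(max_iters):
        V_new = [max(q) for q in backup(V)]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    policy = [max(range(len(q)), key=q.__getitem__) for q in backup(V)]
    return V, policy


V, policy = value_iteration()
print(policy)  # [1, 0]
```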
But how do we compute these quantities when $P$ and $R$ are unknown?
Overview of RL
Reinforcement learning
    Model-based methods
    Model-free methods
        Value-based methods
        Policy-based methods
Important note: the term "reinforcement learning" has also been co-opted to mean essentially "any kind of sequential decision-making problem involving some element of machine learning". This includes many domains beyond the ones above (imitation learning, learning control, inverse RL, etc.), but we're going to focus on the outline above.
Important note regarding domain size
For the purposes of this lecture (except for the last section), we're going to assume a discrete state / discrete action setting where we can enumerate all states.
In the last part of the lecture, we will talk about the case of large/continuous state and action spaces.
Think: grid-world, not Atari (yet)
[Figure: grid-world example; every cell has reward 0 except one cell with reward 1 and one cell with reward -100]