Introduction to Reinforcement Learning

J. Zico Kolter Carnegie Mellon University

Agent interaction with environment

[Figure: agent-environment loop. The Agent sends Action a to the Environment; the Environment returns State s and Reward r to the Agent.]
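
The interaction loop in the diagram can be sketched in Python. This is a minimal illustration under stated assumptions, not code from the lecture: `TwoStateEnv`, `run_episode`, and the reward rule (reward 1 for landing in state 1) are hypothetical choices made for the example.

```python
import random


class TwoStateEnv:
    """Hypothetical toy environment: states 0 and 1, actions 0 (stay) and 1 (switch)."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # Action 1 flips the state; action 0 keeps it.
        if action == 1:
            self.state = 1 - self.state
        # Assumed reward rule: 1 for landing in state 1, else 0.
        reward = 1.0 if self.state == 1 else 0.0
        return self.state, reward


def run_episode(env, policy, num_steps):
    """Run the agent-environment loop; return (state, action, reward, next_state) tuples."""
    trajectory = []
    state = env.state
    for _ in range(num_steps):
        action = policy(state)                  # agent chooses an action
        next_state, reward = env.step(action)   # environment responds
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory


rng = random.Random(0)
random_trajectory = run_episode(TwoStateEnv(), lambda s: rng.choice([0, 1]), num_steps=5)
switch_trajectory = run_episode(TwoStateEnv(), lambda s: 1, num_steps=3)
```

The agent only ever sees the tuples in the trajectory; it never reads the environment's internals, which is the setting the rest of the lecture assumes.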

Of course, an oversimplification

Review: Markov decision process

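Recall a (discounted) Markov decision process $\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma)$

$\mathcal{S}$: set of states
$\mathcal{A}$: set of actions
$P : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to [0,1]$: transition probability distribution $P(s' \mid s, a)$
$R : \mathcal{S} \to \mathbb{R}$: reward function, $R(s)$ is the reward for state $s$
$\gamma$: discount factor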
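
For a small discrete problem, the five components of the MDP can be written down directly as data. The sketch below is a hypothetical 3-state, 2-action example (the numbers, the names `P`, `R`, `gamma`, and the helper `is_valid_mdp` are all assumptions made for illustration). It also assumes a discount factor strictly below 1.

```python
# Hypothetical 3-state, 2-action MDP stored as plain Python lists.
states = [0, 1, 2]
actions = [0, 1]  # assumed meaning: 0 = "left", 1 = "right"

# P[s][a][s2] = probability of landing in s2 after taking action a in state s.
P = [
    [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]],  # from state 0
    [[0.9, 0.0, 0.1], [0.1, 0.0, 0.9]],  # from state 1
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],  # state 2 is absorbing
]
R = [0.0, 0.0, 1.0]  # R[s] = reward for being in state s
gamma = 0.9          # discount factor


def is_valid_mdp(P, R, gamma, tol=1e-9):
    """Check shapes, that each P[s][a] is a probability distribution, and 0 <= gamma < 1."""
    n = len(R)
    if not (0.0 <= gamma < 1.0) or len(P) != n:
        return False
    for s in range(n):
        for row in P[s]:
            if len(row) != n:
                return False
            if any(p < 0.0 for p in row):
                return False
            if abs(sum(row) - 1.0) > tol:
                return False
    return True


mdp_ok = is_valid_mdp(P, R, gamma)
```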

The RL twist: we don't know $P$ or $R$, or they are too big to enumerate (we only have the ability to act in the MDP and observe the resulting states and rewards)

Some important quantities in MDPs

(Deterministic) policy $\pi : \mathcal{S} \to \mathcal{A}$: mapping from states to actions

(Stochastic) policy $\pi : \mathcal{S} \times \mathcal{A} \to [0,1]$: distribution over actions for each state
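
The two kinds of policy differ only in what is stored per state. A minimal sketch, assuming three states and two actions (the tables `pi_det` and `pi_sto` and the helpers `sample_action` and `as_stochastic` are hypothetical names for this example):

```python
import random

actions = [0, 1]

# Deterministic policy: one action per state, pi_det[s] = a.
pi_det = [1, 0, 1]

# Stochastic policy: pi_sto[s][a] = probability of taking action a in state s.
pi_sto = [
    [0.5, 0.5],
    [0.9, 0.1],
    [0.0, 1.0],
]


def sample_action(pi, s, rng):
    """Draw an action for state s from the stochastic policy table pi."""
    return rng.choices(actions, weights=pi[s], k=1)[0]


def as_stochastic(pi_det, num_actions):
    """Embed a deterministic policy as a one-hot stochastic policy."""
    return [
        [1.0 if a == pi_det[s] else 0.0 for a in range(num_actions)]
        for s in range(len(pi_det))
    ]


rng = random.Random(0)
one_hot = as_stochastic(pi_det, len(actions))
samples = [sample_action(one_hot, s, rng) for s in range(len(pi_det))]
```

A deterministic policy is the special case of a stochastic policy that puts all probability on one action, so sampling from `one_hot` always reproduces `pi_det`.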

Value of a policy $\pi$: expected discounted reward if starting in some state and following policy $\pi$ (expressed via Bellman equation)

$$V^\pi(s) = R(s) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, \pi(s)) \, V^\pi(s')$$
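
To see where the Bellman equation comes from, one can unroll the definition of "expected discounted reward" by one step. The derivation below is a sketch that assumes the trajectory $s_0, s_1, \ldots$ is generated by following $\pi$ from $s_0 = s$; the time-indexed notation $s_t$ is introduced here for illustration.

```latex
\begin{align*}
V^\pi(s)
  &= \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t R(s_t) \,\middle|\, s_0 = s,\ a_t = \pi(s_t)\right] \\
  &= R(s) + \gamma \, \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_{t+1}) \,\middle|\, s_0 = s,\ a_t = \pi(s_t)\right] \\
  &= R(s) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, \pi(s)) \, V^\pi(s')
\end{align*}
```

The last step uses the Markov property: once the next state $s'$ is known, the remaining expected discounted reward is exactly $V^\pi(s')$.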

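Optimal value function (value function of the optimal policy $\pi^\star$, the policy with highest value)

$$V^\star(s) = R(s) + \gamma \max_{a \in \mathcal{A}} \sum_{s' \in \mathcal{S}} P(s' \mid s, a) \, V^\star(s')$$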

"Solving" an MDP

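Policy evaluation: To find the value of a policy $\pi$, start with any $\hat{V}^\pi(s)$ and repeat:

$$\forall s \in \mathcal{S}: \quad \hat{V}^\pi(s) := R(s) + \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, \pi(s)) \, \hat{V}^\pi(s')$$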
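
The update above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical 2-state MDP stored as nested lists `P[s][a][s2]` and a deterministic policy stored as a list; the function name `policy_evaluation`, the stopping tolerance, and the example numbers are assumptions, not part of the lecture.

```python
def policy_evaluation(P, R, gamma, policy, tol=1e-10, max_iters=10_000):
    """Repeat V(s) := R(s) + gamma * sum_s2 P(s2 | s, policy[s]) * V(s2) until it stops changing."""
    n = len(R)
    V = [0.0] * n  # any starting guess works
    for _ in range(max_iters):
        V_new = [
            R[s] + gamma * sum(P[s][policy[s]][s2] * V[s2] for s2 in range(n))
            for s in range(n)
        ]
        if max(abs(V_new[s] - V[s]) for s in range(n)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical MDP: state 1 is absorbing with reward 1.
# In state 0, action 0 stays put and action 1 moves to state 1.
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
R = [0.0, 1.0]
gamma = 0.9

V_move = policy_evaluation(P, R, gamma, policy=[1, 0])  # always move toward state 1
V_stay = policy_evaluation(P, R, gamma, policy=[0, 0])  # never leave state 0
```

For this example the values can be checked by hand: state 1 collects reward 1 forever, so its value is $1 / (1 - 0.9) = 10$, and moving from state 0 gives $0 + 0.9 \cdot 10 = 9$, while staying in state 0 gives 0.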

(alternatively, can solve the above linear equation to determine $V^\pi$ directly)
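
Written in matrix form, the linear equation looks as follows. This is a sketch that assumes $\gamma < 1$ and introduces the notation $P^\pi$ (the $|\mathcal{S}| \times |\mathcal{S}|$ matrix with entries $P^\pi_{s,s'} = P(s' \mid s, \pi(s))$), with $V^\pi$ and $R$ treated as vectors indexed by state.

```latex
\begin{align*}
V^\pi &= R + \gamma P^\pi V^\pi \\
(I - \gamma P^\pi) \, V^\pi &= R \\
V^\pi &= (I - \gamma P^\pi)^{-1} R
\end{align*}
```

Because each row of $P^\pi$ sums to 1 and $\gamma < 1$, the matrix $I - \gamma P^\pi$ is invertible, so the solution is unique.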

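Value iteration: To find the optimal value function, start with any $\hat{V}(s)$ and repeat:

$$\forall s \in \mathcal{S}: \quad \hat{V}(s) := R(s) + \gamma \max_{a \in \mathcal{A}} \sum_{s' \in \mathcal{S}} P(s' \mid s, a) \, \hat{V}(s')$$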
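
Value iteration can be sketched the same way. The code below is a minimal illustration on a hypothetical 2-state MDP; the names `value_iteration` and `greedy_policy`, the stopping tolerance, and the example numbers are assumptions. The greedy-policy step (pick the action that maximizes the expected next-state value) is added here to show how a policy can be read off the result.

```python
def value_iteration(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Repeat V(s) := R(s) + gamma * max_a sum_s2 P(s2 | s, a) * V(s2) until it stops changing."""
    n_states, n_actions = len(R), len(P[0])
    V = [0.0] * n_states  # any starting guess works
    for _ in range(max_iters):
        V_new = [
            R[s] + gamma * max(
                sum(P[s][a][s2] * V[s2] for s2 in range(n_states))
                for a in range(n_actions)
            )
            for s in range(n_states)
        ]
        delta = max(abs(V_new[s] - V[s]) for s in range(n_states))
        V = V_new
        if delta < tol:
            break
    return V


def greedy_policy(P, V):
    """For each state, pick the action with the highest expected next-state value."""
    n_states, n_actions = len(V), len(P[0])
    return [
        max(
            range(n_actions),
            key=lambda a: sum(P[s][a][s2] * V[s2] for s2 in range(n_states)),
        )
        for s in range(n_states)
    ]


# Hypothetical MDP: state 1 is absorbing with reward 1.
# In state 0, action 0 stays put and action 1 moves to state 1.
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
R = [0.0, 1.0]
gamma = 0.9

V_star = value_iteration(P, R, gamma)
pi_star = greedy_policy(P, V_star)
```

Here the optimal values are approximately 9 for state 0 and 10 for state 1, and the greedy policy moves from state 0 to state 1. Note that the code reads `P` and `R` directly, which is exactly what is unavailable in the RL setting.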

But how do we compute these quantities when $P$ and $R$ are unknown?

Overview of RL

Reinforcement learning
    Model-based methods
    Model-free methods
        Value-based methods
        Policy-based methods

Important note: the term "reinforcement learning" has also been co-opted to mean essentially "any kind of sequential decision-making problem involving some element of machine learning", including many domains different from the above (imitation learning, learning control, inverse RL, etc.), but we're going to focus on the above outline

Important note regarding domain size

For the purposes of this lecture (except for the last section), we're going to assume a discrete state / discrete action setting where we can enumerate all states

In the last part of the lecture, we will talk about the case of large/continuous state and action spaces

Think: grid-world, not Atari (yet)

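[Figure: grid-world example with per-cell rewards. Most cells have reward 0, one cell has reward 1, and one cell has reward -100.]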
