CS294-112 Deep Reinforcement Learning Plotting
and Visualization Handout
1 General Best Practices
Plotting and visualization are an important component of designing, debugging,
prototyping, and evaluating your deep reinforcement learning algorithms. The
following tips might help you structure your code in a way that makes it easy
to produce good plots:
• Your learning code should log results to an external file, such as a CSV or
pickle (pkl) file, rather than producing the final plot directly. This way, you
can run the learning process once and then experiment with different ways
to plot the results. It is a good idea to log more than you think
is strictly necessary, since you never know what information will be most
useful for understanding what happened. Keep an eye on file size, but
it is generally worth logging some of the following: average reward
or loss at each iteration, some of the sampled trajectories (for subsequent
visualization), and useful secondary metrics such as Bellman error or gradient
magnitudes.
• You should have a separate script that loads up one or more logs and
plots the results. If you run the algorithm multiple times with different
hyperparameters or random seeds, run different algorithms to compare,
or run variants of your method, it's a good idea to load up all of the data
together (perhaps from different files) and plot it on the same plot, with
an automatically generated legend and a color scheme that makes it easy
to distinguish different methods.
• Deep RL methods, especially the model-free methods that you'll learn about
in the course, tend to experience considerable variability between runs.
It's therefore a good idea to run each experiment several times with different
random seeds. When plotting the results for multiple runs, it may be a
good idea at least initially to plot all of the runs on the same plot, with
the average performance also plotted with a thicker line or in a different
color. When plotting many different methods, you may find it convenient
to summarize this into mean and standard deviation plots. However, the
distribution doesn't always follow a normal curve, so plotting all the runs,
at least initially, might give you a better sense for the variability between
random seeds.
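As a rough illustration of the logging advice above, here is a minimal sketch that writes one CSV row per iteration and reads the log back in a separate step. The function names (run_training, load_log), the column names, and the random placeholder values are assumptions made for this example; a real training loop would log its own measured quantities.

```python
import csv
import os
import random
import tempfile

FIELDNAMES = ["iteration", "seed", "avg_reward", "loss", "grad_norm"]


def run_training(log_path, seed, n_iterations=5):
    """Stand-in training loop (hypothetical) that logs one CSV row per iteration."""
    rng = random.Random(seed)
    with open(log_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        for it in range(n_iterations):
            # Placeholder numbers only; a real run would record measured values.
            writer.writerow({
                "iteration": it,
                "seed": seed,
                "avg_reward": rng.random(),
                "loss": rng.random(),
                "grad_norm": rng.random(),
            })
            f.flush()  # keep the log readable even if the run crashes later


def load_log(log_path):
    """Read a log back as a list of dicts with float values."""
    with open(log_path, newline="") as f:
        return [{k: float(v) for k, v in row.items()} for row in csv.DictReader(f)]


# Run "training" once, then load the log as a plotting script would.
log_path = os.path.join(tempfile.gettempdir(), "demo_seed0.csv")
run_training(log_path, seed=0)
rows = load_log(log_path)
print(len(rows), "iterations logged with columns", sorted(rows[0]))
```

Keeping the seed in every row makes it easier to merge logs from several runs later without relying on file names alone.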
2 Example Code
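In Python, matplotlib and seaborn are useful tools for plotting data. Here
is some example code for plotting the mean over runs with a shaded region
that indicates variability. Note that with its default settings, sns.tsplot
shades a 68% bootstrap confidence interval of the mean rather than the raw
standard deviation: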
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# This is just a dummy function to generate some arbitrary data.
def get_data():
    base_cond = [[18, 20, 19, 18, 13, 4, 1],
                 [20, 17, 12, 9, 3, 0, 0],
                 [20, 20, 20, 12, 5, 3, 0]]
    cond1 = [[18, 19, 18, 19, 20, 15, 14],
             [19, 20, 18, 16, 20, 15, 9],
             [19, 20, 20, 20, 17, 10, 0],
             [20, 20, 20, 20, 7, 9, 1]]
    cond2 = [[20, 20, 20, 20, 19, 17, 4],
             [20, 20, 20, 20, 20, 19, 7],
             [19, 20, 20, 19, 19, 15, 2]]
    cond3 = [[20, 20, 20, 20, 19, 17, 12],
             [18, 20, 19, 18, 13, 4, 1],
             [20, 19, 18, 17, 13, 2, 0],
             [19, 18, 20, 20, 15, 6, 0]]
    return base_cond, cond1, cond2, cond3

# Load the data.
results = get_data()
fig = plt.figure()

# We will plot iterations 0...6 (scaled by 1/5, as in the original listing).
xdata = np.array([0, 1, 2, 3, 4, 5, 6]) / 5.

# Plot each line
# (may want to automate this part e.g. with a loop).
# Note: sns.tsplot was deprecated in seaborn 0.9 and is missing from later
# releases, so this listing needs an older seaborn.
# condition= names each curve so that the legend below has entries.
sns.tsplot(time=xdata, data=results[0], color='r', linestyle='-', condition='base_cond')
sns.tsplot(time=xdata, data=results[1], color='g', linestyle='--', condition='cond1')
sns.tsplot(time=xdata, data=results[2], color='b', linestyle=':', condition='cond2')
sns.tsplot(time=xdata, data=results[3], color='k', linestyle='-.', condition='cond3')

# Our y-axis is "success rate" here.
plt.ylabel("Success Rate", fontsize=25)
# Our x-axis is iteration number.
plt.xlabel("Iteration Number", fontsize=25, labelpad=-4)
# Our task is called "Awesome Robot Performance"
plt.title("Awesome Robot Performance", fontsize=30)
# Legend ('lower left' is the valid matplotlib name for this position).
plt.legend(loc='lower left')
# Show the plot on the screen.
plt.show()
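Since sns.tsplot was deprecated in seaborn 0.9 and is not available in recent releases, the same kind of figure can be approximated with matplotlib alone. The sketch below is not the handout's code: the helper name plot_mean_std, the output file name, and the choice of the Agg backend (render to a file rather than a window) are assumptions made for this example. It shades exactly one standard deviation around the mean, using two of the dummy conditions from the listing above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: write the figure to a file, no display needed
import matplotlib.pyplot as plt
import numpy as np


def plot_mean_std(ax, xdata, runs, label, color, linestyle="-"):
    """Plot the mean over runs with a shaded band of +/- one standard deviation."""
    runs = np.asarray(runs, dtype=float)  # shape: (n_runs, n_iterations)
    mean = runs.mean(axis=0)
    std = runs.std(axis=0)  # population standard deviation (ddof=0)
    ax.plot(xdata, mean, color=color, linestyle=linestyle, label=label)
    ax.fill_between(xdata, mean - std, mean + std, color=color, alpha=0.2)
    return mean, std


# Two of the dummy conditions from the example data.
base_cond = [[18, 20, 19, 18, 13, 4, 1],
             [20, 17, 12, 9, 3, 0, 0],
             [20, 20, 20, 12, 5, 3, 0]]
cond2 = [[20, 20, 20, 20, 19, 17, 4],
         [20, 20, 20, 20, 20, 19, 7],
         [19, 20, 20, 19, 19, 15, 2]]

xdata = np.arange(7)  # iterations 0...6
fig, ax = plt.subplots()
base_mean, base_std = plot_mean_std(ax, xdata, base_cond, "base_cond", "r", "-")
cond2_mean, cond2_std = plot_mean_std(ax, xdata, cond2, "cond2", "b", ":")

ax.set_xlabel("Iteration Number")
ax.set_ylabel("Success Rate")
ax.set_title("Awesome Robot Performance")
ax.legend(loc="lower left")
fig.savefig("mean_std_example.png")
```

Computing the band explicitly makes it clear what the shaded region means, which is harder to see when a plotting library chooses the error estimate for you.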
Note that in practice, you may want to automate your code to load a set of
files, automatically draw a reasonable legend, and generate automatic colors.
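One possible way to automate this is sketched below. It assumes a hypothetical naming scheme of <method>_seed<k>.csv with a success_rate column, and it assumes every run of a method has the same number of iterations; none of these conventions come from the handout. To stay self-contained, the example first writes a few log files from the dummy data above, then loads them, draws every run as a thin line, and draws the per-method mean as a thicker line in the same color.

```python
import csv
import glob
import os
import tempfile
from collections import defaultdict

import matplotlib
matplotlib.use("Agg")  # assumption: write the figure to a file, no display needed
import matplotlib.pyplot as plt
import numpy as np


def load_runs(log_dir):
    """Group per-seed CSV logs by method name (the part before '_seed')."""
    runs = defaultdict(list)
    for path in sorted(glob.glob(os.path.join(log_dir, "*_seed*.csv"))):
        method = os.path.basename(path).rsplit("_seed", 1)[0]
        with open(path, newline="") as f:
            runs[method].append([float(r["success_rate"]) for r in csv.DictReader(f)])
    # Assumes all runs of a method have equal length.
    return {m: np.array(v) for m, v in runs.items()}


def plot_runs(runs, ax):
    """Thin line per run, thick line for the mean, one color per method."""
    colors = plt.rcParams["axes.prop_cycle"].by_key()["color"]
    for i, (method, data) in enumerate(sorted(runs.items())):
        color = colors[i % len(colors)]
        x = np.arange(data.shape[1])
        for run in data:
            ax.plot(x, run, color=color, linewidth=0.8, alpha=0.4)
        # Only the mean line gets a label, so the legend has one entry per method.
        ax.plot(x, data.mean(axis=0), color=color, linewidth=2.5, label=method)
    ax.legend(loc="best")


# Write demo logs from the dummy data so that there is something to load.
demo = {
    "base_cond": [[18, 20, 19, 18, 13, 4, 1],
                  [20, 17, 12, 9, 3, 0, 0],
                  [20, 20, 20, 12, 5, 3, 0]],
    "cond1": [[18, 19, 18, 19, 20, 15, 14],
              [19, 20, 18, 16, 20, 15, 9],
              [19, 20, 20, 20, 17, 10, 0],
              [20, 20, 20, 20, 7, 9, 1]],
}
log_dir = tempfile.mkdtemp()
for method, seeds in demo.items():
    for seed, values in enumerate(seeds):
        path = os.path.join(log_dir, f"{method}_seed{seed}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["iteration", "success_rate"])
            writer.writerows(enumerate(values))

runs = load_runs(log_dir)
fig, ax = plt.subplots()
plot_runs(runs, ax)
ax.set_xlabel("Iteration Number")
ax.set_ylabel("Success Rate")
fig.savefig(os.path.join(log_dir, "all_runs.png"))
```

Taking colors from matplotlib's default property cycle means that adding another method only requires dropping more log files into the directory.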