Gradient-based Hyperparameter Optimization with Reversible Learning

Dougal Maclaurin, David Duvenaud, Ryan Adams

Motivation

- Hyperparameters are everywhere
  - ... and sometimes hidden!
- Gradient-free optimization is hard
- Validation loss is a function of hyperparameters
- Why not take gradients?

Optimizing optimization

x_final = SGD(x_init, learn_rate, momentum, Loss(x, reg, Data))

[Figure: SGD trajectories starting from the initial weights, plotted in weight space (axes: Weight 1, Weight 2), built up over meta-iterations 1 through 3.]
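Because the entire SGD loop is differentiable, the validation loss becomes a differentiable function of the hyperparameters, and a "hypergradient" can be taken through the whole training run. The sketch below illustrates this idea in JAX; it is not the authors' memory-efficient reversible implementation, and the toy data, loss functions, and hyperparameter names are assumptions made for the example.

```python
import jax
import jax.numpy as jnp

def train_loss(w, log_reg, x, y):
    # Regularized squared error; exp keeps the penalty positive. (Illustrative choice.)
    return jnp.mean((x @ w - y) ** 2) + jnp.exp(log_reg) * jnp.sum(w ** 2)

def sgd(w_init, log_lr, momentum, log_reg, x, y, steps=50):
    # Plain SGD with momentum, written so the whole unrolled loop is differentiable.
    lr = jnp.exp(log_lr)
    grad_fn = jax.grad(train_loss)
    def step(carry, _):
        w, v = carry
        v = momentum * v - lr * grad_fn(w, log_reg, x, y)
        return (w + v, v), None
    (w_final, _), _ = jax.lax.scan(step, (w_init, jnp.zeros_like(w_init)), None, length=steps)
    return w_final

def val_loss(hypers, w_init, x_tr, y_tr, x_val, y_val):
    # Validation loss as a function of the hyperparameters: run SGD, then evaluate.
    log_lr, momentum, log_reg = hypers[0], hypers[1], hypers[2]
    w_final = sgd(w_init, log_lr, momentum, log_reg, x_tr, y_tr)
    return jnp.mean((x_val @ w_final - y_val) ** 2)

# Toy regression data (assumed for the example).
key1, key2 = jax.random.split(jax.random.PRNGKey(0))
x_tr, x_val = jax.random.normal(key1, (30, 5)), jax.random.normal(key2, (10, 5))
w_true = jnp.arange(1.0, 6.0)
y_tr, y_val = x_tr @ w_true, x_val @ w_true

hypers = jnp.array([-2.0, 0.9, -4.0])   # log learn rate, momentum, log regularization
w_init = jnp.zeros(5)

# Hypergradient: derivative of validation loss w.r.t. the hyperparameters,
# obtained by differentiating through the entire training run.
hypergrad = jax.grad(val_loss)(hypers, w_init, x_tr, y_tr, x_val, y_val)
print(hypergrad)
```

The hypergradient can then drive gradient-based meta-optimization of the learning rate, momentum, and regularization, which is what the repeated "meta-iterations" in the figure depict.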
