What are updaters?
The main difference among the updaters is how they treat the learning rate. Stochastic gradient descent, the most common learning algorithm in deep learning, relies on theta (the network's weights) and alpha (the learning rate). Different updaters adapt the learning rate during training to help the neural network converge to its most performant state.
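In its plain form this is the familiar textbook update rule, written here for reference with L the loss on the current minibatch:

\theta_{t+1} = \theta_t - \alpha \, \nabla_\theta L(\theta_t)

Every updater described below is a variation on this step that scales or redirects the gradient term.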
Usage
To use an updater, pass a new instance of it to the updater() method when configuring either a ComputationGraph or a MultiLayerNetwork.
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    .updater(new Adam(0.01))
    // add your layers and hyperparameters below
    .build();
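The same pattern applies to a MultiLayerNetwork. The following sketch uses Nesterovs with an illustrative learning rate and momentum; the layer list is left as a placeholder just like the example above:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .updater(new Nesterovs(0.1, 0.9)) // learning rate, momentum
    .list()
    // add your layers and hyperparameters below
    .build();
MultiLayerNetwork net = new MultiLayerNetwork(conf);
net.init();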
Available updaters
NadamUpdater
The Nadam updater. Reference: https://arxiv.org/pdf/1609.04747.pdf
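For reference, the update rule given in the survey linked above can be written as follows, where the bias-corrected moment estimates are as defined under AdamUpdater below. This is a textbook formulation, not necessarily line-for-line what the code does:

\theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon}\left(\beta_1 \hat{m}_t + \frac{(1-\beta_1)\, g_t}{1-\beta_1^t}\right)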
applyUpdater
public void applyUpdater(INDArray gradient, int iteration, int epoch)
Calculate the update based on the given gradient. The method returns void; the update is written back into the gradient array in place.
- param gradient the gradient to get the update for
- param iteration the current iteration
- param epoch the current epoch
NesterovsUpdater
Nesterov's momentum. Keeps track of the previous update (the velocity) and combines it with the current gradient to produce the next update.
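A common textbook formulation of Nesterov momentum maintains a velocity v alongside the parameters, with mu the momentum coefficient; the implementation may arrange the computation differently:

v_t = \mu\, v_{t-1} - \alpha\, \nabla_\theta L(\theta_{t-1} + \mu\, v_{t-1}), \qquad \theta_t = \theta_{t-1} + v_t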
applyUpdater
public void applyUpdater(INDArray gradient, int iteration, int epoch)
Get the Nesterov update. The method returns void; the update is written back into the gradient array in place.
- param gradient the gradient to get the update for
- param iteration the current iteration
- param epoch the current epoch
RmsPropUpdater
The RMSProp updater: divides the learning rate by a decaying average of recent gradient magnitudes, so each weight gets its own effective step size.
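In standard notation, with rho the decay rate and epsilon a small stabilizing constant (a textbook sketch, applied element-wise):

r_t = \rho\, r_{t-1} + (1-\rho)\, g_t^2, \qquad \theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{r_t} + \epsilon}\, g_t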
AdaGradUpdater
Vectorized learning rate used per connection weight.
Adapted from: http://xcorr.net/2014/01/23/adagrad-eliminating-learning-rates-in-stochastic-gradient-descent
See also: http://cs231n.github.io/neural-networks-3/#ada
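The standard AdaGrad rule accumulates squared gradients per weight and scales each step by that history (textbook form, element-wise):

r_t = r_{t-1} + g_t^2, \qquad \theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{r_t} + \epsilon}\, g_t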
applyUpdater
public void applyUpdater(INDArray gradient, int iteration, int epoch)
Gets feature-specific learning rates. AdaGrad keeps a history of the gradients passed in; each gradient passed in becomes adapted over time based on that history, hence the name adagrad.
- param gradient the gradient to get learning rates for
- param iteration the current iteration
- param epoch the current epoch
AdaMaxUpdater
The AdaMax updater, a variant of Adam. Reference: http://arxiv.org/abs/1412.6980
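AdaMax replaces Adam's second-moment estimate with an infinity-norm term u_t, as described in the referenced paper (textbook form; m_t is Adam's first-moment estimate):

u_t = \max(\beta_2\, u_{t-1},\, |g_t|), \qquad \theta_{t+1} = \theta_t - \frac{\alpha}{1-\beta_1^t}\cdot\frac{m_t}{u_t}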
applyUpdater
public void applyUpdater(INDArray gradient, int iteration, int epoch)
Calculate the update based on the given gradient. The method returns void; the update is written back into the gradient array in place.
- param gradient the gradient to get the update for
- param iteration the current iteration
- param epoch the current epoch
NoOpUpdater
NoOp updater: gradient updater that makes no changes to the gradient
AdamUpdater
The Adam updater. Reference: http://arxiv.org/abs/1412.6980
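Adam keeps exponential moving averages of the gradient and its element-wise square, applies bias correction, and scales the step accordingly (textbook form):

m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2

\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad \theta_{t+1} = \theta_t - \frac{\alpha\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}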
applyUpdater
public void applyUpdater(INDArray gradient, int iteration, int epoch)
Calculate the update based on the given gradient. The method returns void; the update is written back into the gradient array in place.
- param gradient the gradient to get the update for
- param iteration the current iteration
- param epoch the current epoch
AdaDeltaUpdater
AdaDelta updater. A more robust AdaGrad that keeps track of a moving-window average of the gradient rather than AdaGrad's ever-decaying learning rates.
References: http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf and https://arxiv.org/pdf/1212.5701v1.pdf
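AdaDelta keeps decaying averages of both the squared gradients and the squared updates, so no global learning rate is needed (textbook form, with decay rate rho):

E[g^2]_t = \rho\, E[g^2]_{t-1} + (1-\rho)\, g_t^2

\Delta\theta_t = -\frac{\sqrt{E[\Delta\theta^2]_{t-1} + \epsilon}}{\sqrt{E[g^2]_t + \epsilon}}\, g_t, \qquad E[\Delta\theta^2]_t = \rho\, E[\Delta\theta^2]_{t-1} + (1-\rho)\, \Delta\theta_t^2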
applyUpdater
public void applyUpdater(INDArray gradient, int iteration, int epoch)
Get the updated gradient for the given gradient and also update the internal AdaDelta state. The method returns void; the updated gradient is written back into the gradient array in place.
- param gradient the gradient to get the updated gradient for
- param iteration the current iteration
- param epoch the current epoch
SgdUpdater
The SGD updater applies a learning rate only; it performs no per-weight adaptation.
GradientUpdater
Gradient modifications: calculates an update and tracks related information about gradient changes over time for handling updates. This is the base interface implemented by the updaters above.
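Most users never call this interface directly, since the network applies it internally during training. As an illustration of the contract only, here is a minimal sketch; it assumes the IUpdater methods stateSize() and instantiate() for building a standalone updater, which is an assumption about the ND4J API rather than something documented on this page:

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.nd4j.linalg.learning.GradientUpdater;
import org.nd4j.linalg.learning.config.Adam;

public class UpdaterSketch {
    public static void main(String[] args) {
        Adam config = new Adam(0.01);
        long numParams = 10;

        // Assumed API: the config reports how much state it needs and binds it to a view array
        INDArray state = Nd4j.zeros(1, (int) config.stateSize(numParams));
        GradientUpdater updater = config.instantiate(state, true);

        // applyUpdater overwrites 'gradient' in place with the actual update step
        INDArray gradient = Nd4j.rand(1, (int) numParams);
        updater.applyUpdater(gradient, 0, 0);
        System.out.println(gradient);
    }
}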
AMSGradUpdater
The AMSGrad updater. Reference: On the Convergence of Adam and Beyond - https://openreview.net/forum?id=ryQu7f-RZ
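AMSGrad modifies Adam by taking a running maximum of the second-moment estimate, which keeps the effective step size from growing (textbook form; m_t and v_t as in Adam above):

\hat{v}_t = \max(\hat{v}_{t-1},\, v_t), \qquad \theta_{t+1} = \theta_t - \frac{\alpha\, m_t}{\sqrt{\hat{v}_t} + \epsilon}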