- What are activations?
- Usage
- Available activations
- ActivationRectifiedTanh
- ActivationELU
- ActivationReLU
- ActivationRationalTanh
- ActivationThresholdedReLU
- ActivationReLU6
- ActivationHardTanH
- ActivationSigmoid
- ActivationGELU
- ActivationPReLU
- ActivationIdentity
- ActivationSoftSign
- ActivationHardSigmoid
- ActivationSoftmax
- ActivationCube
- ActivationRReLU
- ActivationTanH
- ActivationSELU
- ActivationLReLU
- ActivationSwish
- ActivationSoftPlus
What are activations?
At a simple level, activation functions help decide whether a neuron should be activated, that is, whether the information the neuron receives is relevant to the output. An activation function is usually a non-linear transformation applied to a neuron's input signal; the transformed output is then passed on to the next neuron.
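As a rough illustration of the idea, here is a minimal sketch of a single neuron in plain Java, with no dependency on the library. The class name `NeuronSketch`, the weights, and the choice of sigmoid are assumptions made for this example only.

```java
// Illustrative sketch only: one neuron computed by hand.
class NeuronSketch {
    // Sigmoid squashes any real input into the range (0, 1).
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Weighted sum of the inputs plus a bias, passed through the activation.
    static double neuron(double[] inputs, double[] weights, double bias) {
        double z = bias;
        for (int i = 0; i < inputs.length; i++) {
            z += inputs[i] * weights[i];
        }
        return sigmoid(z);
    }

    public static void main(String[] args) {
        double[] inputs = {1.0, 2.0};
        double[] weights = {0.5, -0.25}; // hypothetical weights
        // The weighted sum is 0, so the sigmoid output is 0.5.
        System.out.println(neuron(inputs, weights, 0.0));
    }
}
```

Replacing `sigmoid` with any other function from the list below changes how strongly the neuron responds to the same weighted sum.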
Usage
The recommended method to use activations is to add an activation layer in your neural network, and configure your desired activation:
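import org.deeplearning4j.nn.conf.ComputationGraphConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.ActivationLayer;
import org.nd4j.linalg.activations.Activation;
ComputationGraphConfiguration conf = new NeuralNetConfiguration.Builder()
    .graphBuilder() // hyperparameters go before this call
    // add inputs and other layers
    .addLayer("softmax", new ActivationLayer(Activation.SOFTMAX), "previous_input")
    // add more layers and outputs
    .build();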
Available activations
ActivationRectifiedTanh
Rectified tanh
Essentially max(0, tanh(x))
Underlying implementation is in native code
ActivationELU
f(x) = alpha * (exp(x) - 1.0) for x < 0, f(x) = x for x >= 0
alpha defaults to 1, if not specified
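The ELU definition can be sketched in a few lines of plain Java. This is an illustration of the formula, not the library's implementation; the class name `EluSketch` is hypothetical.

```java
// Illustrative sketch of the ELU formula; not the library's implementation.
class EluSketch {
    static double elu(double x, double alpha) {
        // Negative inputs saturate smoothly towards -alpha; others pass through.
        return x < 0 ? alpha * (Math.exp(x) - 1.0) : x;
    }

    public static void main(String[] args) {
        double alpha = 1.0; // the default stated above
        System.out.println(elu(2.0, alpha));  // positive input is unchanged
        System.out.println(elu(-1.0, alpha)); // about -0.632
    }
}
```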
ActivationReLU
f(x) = max(0, x)
ActivationRationalTanh
Rational tanh approximation
From https://arxiv.org/pdf/1508.01292v3
f(x) = 1.7159 tanh(2x/3)
where tanh is approximated as follows:
tanh(y) ~ sgn(y) (1 - 1/(1 + |y| + y^2 + 1.41645 y^4))
Underlying implementation is in native code
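To see how close the rational approximation is, the following plain-Java sketch evaluates it next to `Math.tanh`. It only mirrors the formula given above; the class and method names are hypothetical and it is unrelated to the native implementation.

```java
// Illustrative sketch of the rational tanh approximation.
class RationalTanhSketch {
    // tanh(y) ~ sgn(y) * (1 - 1 / (1 + |y| + y^2 + 1.41645 * y^4))
    static double approxTanh(double y) {
        double y2 = y * y;
        double denom = 1.0 + Math.abs(y) + y2 + 1.41645 * y2 * y2;
        return Math.signum(y) * (1.0 - 1.0 / denom);
    }

    // f(x) = 1.7159 * tanh(2x / 3), using the approximation above.
    static double rationalTanh(double x) {
        return 1.7159 * approxTanh(2.0 * x / 3.0);
    }

    public static void main(String[] args) {
        for (double x : new double[]{-2.0, -0.5, 0.0, 0.5, 2.0}) {
            double exact = 1.7159 * Math.tanh(2.0 * x / 3.0);
            System.out.println(x + ": approx=" + rationalTanh(x) + " exact=" + exact);
        }
    }
}
```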
ActivationThresholdedReLU
Thresholded ReLU
f(x) = x for x > theta, f(x) = 0 otherwise. theta defaults to 1.0
ActivationReLU6
f(x) = min(max(x, cutoff), 6), where x is the input. With cutoff = 0 this is the standard ReLU6.
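The ReLU, thresholded ReLU and ReLU6 formulas differ only in where they clip. A minimal plain-Java sketch, with hypothetical class and method names, makes the difference visible:

```java
// Illustrative sketch of three rectifier variants; not the library's code.
class ReluVariantsSketch {
    static double relu(double x) {
        return Math.max(0.0, x);
    }

    // Passes x through only when it exceeds theta.
    static double thresholdedRelu(double x, double theta) {
        return x > theta ? x : 0.0;
    }

    // Clips from below at cutoff and from above at 6.
    static double relu6(double x, double cutoff) {
        return Math.min(Math.max(x, cutoff), 6.0);
    }

    public static void main(String[] args) {
        System.out.println(relu(0.5));                 // 0.5
        System.out.println(thresholdedRelu(0.5, 1.0)); // 0.0, below the threshold
        System.out.println(relu6(7.5, 0.0));           // 6.0, clipped at the top
    }
}
```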
ActivationHardTanH
f(x) = 1 if x > 1, f(x) = -1 if x < -1, f(x) = x otherwise
ActivationSigmoid
f(x) = 1 / (1 + exp(-x))
ActivationGELU
GELU activation function - Gaussian Error Linear Units
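No formula is given here for GELU. The usual definition is f(x) = x Phi(x), where Phi is the standard normal cumulative distribution function. The sketch below uses the common tanh-based approximation of that definition in plain Java. Whether the library evaluates the exact or the approximate form is not stated on this page, so treat this as an illustration only; the class name is hypothetical.

```java
// Illustrative sketch of the tanh approximation of GELU.
class GeluSketch {
    // f(x) ~ 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x^3)))
    static double gelu(double x) {
        double inner = Math.sqrt(2.0 / Math.PI) * (x + 0.044715 * x * x * x);
        return 0.5 * x * (1.0 + Math.tanh(inner));
    }

    public static void main(String[] args) {
        System.out.println(gelu(1.0));  // about 0.841
        System.out.println(gelu(-3.0)); // small negative value close to 0
    }
}
```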
ActivationPReLU
Parametrized Rectified Linear Unit (PReLU)
f(x) = alpha x for x < 0, f(x) = x for x >= 0
alpha has the same shape as x and is a learned parameter.
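Because alpha has the same shape as x, each element gets its own negative slope. The following plain-Java sketch shows that element-wise behaviour for a fixed alpha; learning alpha during training is out of scope here, and the class name and sample values are hypothetical.

```java
// Illustrative sketch of element-wise PReLU with a per-element alpha.
class PReluSketch {
    static double[] prelu(double[] x, double[] alpha) {
        if (x.length != alpha.length) {
            throw new IllegalArgumentException("alpha must have the same shape as x");
        }
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = x[i] < 0 ? alpha[i] * x[i] : x[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {-2.0, 3.0, -1.0};
        double[] alpha = {0.1, 0.1, 0.5}; // hypothetical learned slopes
        System.out.println(java.util.Arrays.toString(prelu(x, alpha)));
    }
}
```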
ActivationIdentity
f(x) = x
ActivationSoftSign
f_i(x) = x_i / (1 + |x_i|)
ActivationHardSigmoid
f(x) = min(1, max(0, 0.2x + 0.5))
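The hard sigmoid is a piecewise-linear stand-in for the sigmoid that saturates at x = -2.5 and x = 2.5. A minimal plain-Java sketch of the formula, with a hypothetical class name:

```java
// Illustrative sketch of the hard sigmoid formula.
class HardSigmoidSketch {
    static double hardSigmoid(double x) {
        return Math.min(1.0, Math.max(0.0, 0.2 * x + 0.5));
    }

    public static void main(String[] args) {
        System.out.println(hardSigmoid(0.0));  // 0.5
        System.out.println(hardSigmoid(3.0));  // 1.0, saturated
        System.out.println(hardSigmoid(-3.0)); // 0.0, saturated
    }
}
```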
ActivationSoftmax
f_i(x) = exp(x_i - shift) / sum_j exp(x_j - shift)
where shift = max_i(x_i)
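Subtracting the maximum before exponentiating leaves the result unchanged but keeps `exp` from overflowing on large inputs. The following plain-Java sketch of the formula illustrates this; it is not the library's implementation, and the class name is hypothetical.

```java
// Illustrative sketch of softmax with the max-shift for numerical stability.
class SoftmaxSketch {
    static double[] softmax(double[] x) {
        // shift = max_i(x_i)
        double shift = Double.NEGATIVE_INFINITY;
        for (double v : x) {
            shift = Math.max(shift, v);
        }
        double[] out = new double[x.length];
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            out[i] = Math.exp(x[i] - shift); // largest exponent is exp(0) = 1
            sum += out[i];
        }
        for (int i = 0; i < x.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    public static void main(String[] args) {
        // Without the shift, exp(1000) would overflow to Infinity.
        double[] p = softmax(new double[]{1000.0, 1001.0, 1002.0});
        System.out.println(java.util.Arrays.toString(p));
    }
}
```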
ActivationCube
f(x) = x^3
ActivationRReLU
f(x) = max(0,x) + alpha min(0, x)
alpha is drawn from uniform(l, u) during training and is set to (l + u)/2 during testing.
l and u default to 1/8 and 1/3 respectively.
See: Empirical Evaluation of Rectified Activations in Convolutional Network
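The difference between training and test behaviour can be sketched in plain Java as follows. The sketch reads the test-time value as the midpoint of l and u, which is an interpretation of the text above, and it draws one alpha per call, whereas a real implementation would typically draw one per element. All names are hypothetical.

```java
// Illustrative sketch of RReLU; alpha is random in training, fixed at test time.
class RReluSketch {
    static final double L = 1.0 / 8.0; // default lower bound
    static final double U = 1.0 / 3.0; // default upper bound

    static double rrelu(double x, boolean training, java.util.Random rng) {
        // Training: alpha ~ uniform(L, U). Test: alpha = midpoint of L and U.
        double alpha = training ? L + (U - L) * rng.nextDouble() : (L + U) / 2.0;
        return Math.max(0.0, x) + alpha * Math.min(0.0, x);
    }

    public static void main(String[] args) {
        java.util.Random rng = new java.util.Random(42);
        System.out.println(rrelu(-1.0, true, rng));   // varies between -1/3 and -1/8
        System.out.println(rrelu(-1.0, false, null)); // always -(L + U) / 2
        System.out.println(rrelu(2.0, true, rng));    // positive input is unchanged
    }
}
```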
ActivationTanH
f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
ActivationSELU
Scaled Exponential Linear Unit (SELU), from https://arxiv.org/pdf/1706.02515.pdf
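This page gives no formula for SELU. In the referenced paper, SELU is a scaled ELU: f(x) = lambda x for x > 0 and f(x) = lambda alpha (exp(x) - 1) otherwise, with fixed constants lambda and alpha. The plain-Java sketch below uses the constants from the paper, rounded to double precision. It illustrates the definition and is not taken from the library; the class name is hypothetical.

```java
// Illustrative sketch of SELU using the constants from the referenced paper.
class SeluSketch {
    static final double ALPHA = 1.6732632423543772;
    static final double LAMBDA = 1.0507009873554805;

    static double selu(double x) {
        return x > 0 ? LAMBDA * x : LAMBDA * ALPHA * (Math.exp(x) - 1.0);
    }

    public static void main(String[] args) {
        System.out.println(selu(1.0));   // about 1.0507
        System.out.println(selu(-10.0)); // close to -LAMBDA * ALPHA, about -1.758
    }
}
```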
ActivationLReLU
Leaky ReLU
f(x) = max(0, x) + alpha min(0, x)
alpha defaults to 0.01
ActivationSwish
f(x) = x sigmoid(x)
ActivationSoftPlus
f(x) = log(1+e^x)
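Evaluated literally, log(1 + e^x) overflows for large x because e^x becomes infinite. A common rewrite, max(x, 0) + log(1 + e^(-|x|)), is mathematically equivalent and stays finite. The plain-Java sketch below compares the two; it is an illustration, not a statement about how the library computes softplus, and the names are hypothetical.

```java
// Illustrative sketch comparing a naive and a numerically stable softplus.
class SoftPlusSketch {
    // Direct translation of the formula; overflows for large x.
    static double naive(double x) {
        return Math.log(1.0 + Math.exp(x));
    }

    // Equivalent form: max(x, 0) + log(1 + exp(-|x|)).
    static double stable(double x) {
        return Math.max(x, 0.0) + Math.log1p(Math.exp(-Math.abs(x)));
    }

    public static void main(String[] args) {
        System.out.println(stable(0.0));    // log(2), about 0.693
        System.out.println(naive(1000.0));  // Infinity
        System.out.println(stable(1000.0)); // 1000.0
    }
}
```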