AUTOGRAD
The training process of a neural network is powered by the backpropagation algorithm: the parameters are updated using the gradients of the loss function with respect to those parameters.
OneFlow provides an autograd engine, which can calculate the gradients of the parameters in a neural network automatically.
We will first introduce the basic concepts of the computation graph, which help in understanding the common settings and limitations of OneFlow's automatic differentiation. Then we will introduce OneFlow's common automatic differentiation interfaces.
Computation Graph
Computation graphs are composed of tensors and operators. The code below builds a simple example:
import oneflow as flow
def loss(y_pred, y):
    return flow.sum(1/2*(y_pred-y)**2)
x = flow.ones(1, 5)  # input
w = flow.randn(5, 3, requires_grad=True)
b = flow.randn(1, 3, requires_grad=True)
z = flow.matmul(x, w) + b
y = flow.zeros(1, 3) # label
l = loss(z,y)
Corresponding computation graph:
In a computation graph, the nodes with only output and no input are called leaf nodes, such as x, w, b, and y; the nodes with only input and no output are called root nodes, such as loss.
During the backpropagation process, the gradients of l with respect to w and b are required to update w and b. Therefore, we need to set requires_grad to True when creating them.
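As a quick check, the leaf/root distinction can also be inspected in code. The sketch below is an illustration rather than part of the original example, and it assumes that oneflow.Tensor exposes an is_leaf attribute:
print(x.is_leaf, w.is_leaf, b.is_leaf, y.is_leaf)  # leaf nodes: expected True True True True
print(z.is_leaf, l.is_leaf)                        # intermediate and root nodes: expected False False
print(w.requires_grad, b.requires_grad)            # parameters to be updated: True True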
Automatic Gradient
backward() and Gradients
During the backpropagation process, we need the gradients of l with respect to w and b respectively, namely $\frac{\partial l}{\partial w}$ and $\frac{\partial l}{\partial b}$. We only need to call the backward() method of l, and OneFlow will automatically calculate the gradients and store them in w.grad and b.grad.
l.backward()
print(w.grad)
print(b.grad)
tensor([[0.9397, 2.5428, 2.5377],
        [0.9397, 2.5428, 2.5377],
        [0.9397, 2.5428, 2.5377],
        [0.9397, 2.5428, 2.5377],
        [0.9397, 2.5428, 2.5377]], dtype=oneflow.float32)
tensor([[0.9397, 2.5428, 2.5377]], dtype=oneflow.float32)
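With the gradients stored in w.grad and b.grad, the parameters can be updated. Below is a minimal gradient-descent sketch (not part of the original example); it assumes that, as in other frameworks, in-place updates on leaf tensors are performed inside flow.no_grad(), which is introduced later in this article:
learning_rate = 0.01
with flow.no_grad():              # do not record the update itself in the computation graph
    w -= learning_rate * w.grad
    b -= learning_rate * b.grad
w.grad.zeros_()                   # clear the gradients before the next backward pass
b.grad.zeros_()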
Gradient for Non-leaf Nodes
By default, only the gradients of leaf nodes with requires_grad=True are retained. The grad of a non-leaf node is automatically freed during the call to backward() and cannot be viewed. Tensor.retain_grad() can be called to retain and view the grad of a non-leaf node.
from math import pi
n1 = flow.tensor(pi/2, requires_grad=True)
n2 = flow.sin(n1)
n2.retain_grad()
n3 = flow.pow(n2, 2)
n3.backward()
print(n1.grad)
print(n2.grad)
Using the code above, we get $\frac{\partial n_3}{\partial n_1}$ and $\frac{\partial n_3}{\partial n_2}$.
Output:
tensor(-8.7423e-08, dtype=oneflow.float32)
tensor(2., dtype=oneflow.float32)
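For contrast, here is a small sketch (an illustration based on the description above, not code from the original) showing that without retain_grad() the gradient of a non-leaf node is not kept:
m1 = flow.tensor(pi/2, requires_grad=True)
m2 = flow.sin(m1)      # non-leaf node; retain_grad() is NOT called this time
m3 = flow.pow(m2, 2)
m3.backward()
print(m1.grad)         # the leaf gradient is still available
print(m2.grad)         # expected: None, since the non-leaf grad was freed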
Call backward() Multiple Times on a Computation Graph
By default, we can only call backward() once for each computation graph. For example, the following code will raise an error:
n1 = flow.tensor(10., requires_grad=True)
n2 = flow.pow(n1, 2)
n2.backward()
n2.backward()
Error message:
Maybe you try to backward through the node a second time. Specify retain_graph=True when calling .backward() or autograd.grad() the first time.
If we need to call backward() multiple times on the same computation graph, retain_graph needs to be set to True.
n1 = flow.tensor(10., requires_grad=True)
n2 = flow.pow(n1, 2)
n2.backward(retain_graph=True)
print(n1.grad)
n2.backward()
print(n1.grad)
Output:
tensor(20., dtype=oneflow.float32)
tensor(40., dtype=oneflow.float32)
The above output shows that OneFlow accumulates the gradients calculated by multiple calls to backward(). The gradient can be cleared by calling zeros_():
n1 = flow.tensor(10., requires_grad=True)
n2 = flow.pow(n1, 2)
n2.backward(retain_graph=True)
print(n1.grad)
n1.grad.zeros_()
n2.backward()
print(n1.grad)
Output:
tensor(20., dtype=oneflow.float32)
tensor(20., dtype=oneflow.float32)
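In a typical training loop, clearing the gradients is usually delegated to an optimizer rather than done by hand. Below is a minimal sketch, assuming oneflow.optim.SGD provides the familiar step() and zero_grad() methods:
model = flow.nn.Linear(5, 3)
optimizer = flow.optim.SGD(model.parameters(), lr=0.01)
for _ in range(3):
    y_pred = model(flow.ones(1, 5))
    l = flow.sum(1/2 * (y_pred - flow.zeros(1, 3)) ** 2)
    l.backward()            # gradients accumulate into the parameters' .grad
    optimizer.step()        # update the parameters with the current gradients
    optimizer.zero_grad()   # clear .grad so gradients do not accumulate across iterations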
Disabling Gradient Calculation
By default, OneFlow traces and calculates gradients for tensors with requires_grad=True. However, in some cases we don't need OneFlow to keep tracking gradients, for example when we only want the forward pass for inference. In these cases we can use oneflow.no_grad() or oneflow.Tensor.detach().
z = flow.matmul(x, w)+b
print(z.requires_grad)
with flow.no_grad():
    z = flow.matmul(x, w) + b
print(z.requires_grad)
Output:
True
False
z_det = z.detach()
print(z_det.requires_grad)
Output:
False
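detach() is also commonly used to block gradient flow through part of a graph. The following brief sketch (an illustration, not from the original) shows that no gradient flows through the detached branch:
p = flow.tensor(2., requires_grad=True)
q = p * 3
r = q.detach() * 2      # r shares q's value but is cut off from the computation graph
s = q + r
s.backward()
print(p.grad)           # expected: tensor(3.), only the non-detached branch contributes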
Gradients for Non-Scalar Outputs
Usually, we call backward() on a scalar loss. However, if loss is a non-scalar tensor, an error will be raised when calling backward() on it.
x = flow.randn(1, 2, requires_grad=True)
y = 3*x + 1
y.backward()
Error message:
Check failed: IsScalarTensor(*outputs.at(i)) Grad can be implicitly created only for scalar outputs
We can get the gradient after calling y.sum():
x = flow.randn(1, 2, requires_grad=True)
y = 3*x + 1
y = y.sum()
y.backward()
print(x.grad)
Output:
tensor([[3., 3.]], dtype=oneflow.float32)
Please refer to the “Further Reading” section below for the analysis of the cause and solution of the error.
Further Reading
In the example above, Tensor x has two elements $x_1$ and $x_2$, and Tensor y has two elements $y_1$ and $y_2$. The relationship between them is:

$$y_1 = 3x_1 + 1, \qquad y_2 = 3x_2 + 1$$

We want to get

$$\frac{\partial [y_1, y_2]}{\partial [x_1, x_2]}$$

but for a vector output there is no single, well-defined gradient tensor with the same shape as x, so of course an error is reported. In fact, when the user calls y.backward(), the result desired is usually:

$$\frac{\partial y_1}{\partial x_1} \quad \text{and} \quad \frac{\partial y_2}{\partial x_2}$$

After calling sum() on y:

$$y = y_1 + y_2 = (3x_1 + 1) + (3x_2 + 1)$$

At this time, when calling backward(), the gradients of $x_1$ and $x_2$ can be calculated:

$$\frac{\partial y}{\partial x_1} = 3, \qquad \frac{\partial y}{\partial x_2} = 3$$

In addition to using sum(), the Vector Jacobian Product (VJP) is a more general method to calculate the gradient of a non-scalar root node. Using the above example, OneFlow will generate the Jacobian matrix according to the computation graph during the backpropagation process:

$$J = \begin{pmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} \\ \dfrac{\partial y_2}{\partial x_1} & \dfrac{\partial y_2}{\partial x_2} \end{pmatrix} = \begin{pmatrix} 3 & 0 \\ 0 & 3 \end{pmatrix}$$

To calculate the VJP, a vector $v = (v_1, v_2)$ with the same size as $y$ needs to be provided, and the product is:

$$v J = \left( v_1 \frac{\partial y_1}{\partial x_1} + v_2 \frac{\partial y_2}{\partial x_1}, \;\; v_1 \frac{\partial y_1}{\partial x_2} + v_2 \frac{\partial y_2}{\partial x_2} \right) = (3v_1, \; 3v_2)$$

If the vector $v$ is the gradient of the upper layer in the backpropagation, the result of the VJP is exactly the gradient required by the current layer.
backward() can accept a tensor as a parameter; that parameter is the vector $v$ in the VJP. So we can also use the following method to find the gradient of a non-scalar tensor:
x = flow.randn(1, 2, requires_grad=True)
y = 3*x + 1
y.backward(flow.ones_like(y))
print(x.grad)
Output:
tensor([[3., 3.]], dtype=oneflow.float32)
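To see the role of the vector v more concretely, here is a hedged sketch (not from the original tutorial) that passes a non-uniform v and compares the result with the Jacobian worked out above:
x = flow.randn(1, 2, requires_grad=True)
y = 3 * x + 1
v = flow.tensor([[1., 2.]])   # same shape as y
y.backward(v)
# For y_i = 3*x_i + 1 the Jacobian is diag(3, 3), so the VJP is (3*v_1, 3*v_2).
print(x.grad)                 # expected: tensor([[3., 6.]], dtype=oneflow.float32)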