STATIC GRAPH INTERFACE: NN.GRAPH
At present, there are two ways to run models in a deep learning framework: dynamic graph and static graph, which are called Eager Mode and Graph Mode in OneFlow, respectively.
There are pros and cons to both approaches, and OneFlow supports both, with Eager Mode as the default. If you are reading the tutorials of this basic topic in order, then all the code you have encountered so far is in Eager Mode.
In general, dynamic graphs are easier to use and static graphs have better performance. OneFlow offers nn.Graph so that users can build static graphs and train models with an eager-like programming style.
Eager Mode in OneFlow
OneFlow runs in Eager Mode by default.
The following script uses a third-order polynomial, y = a + b x + c x^2 + d x^3, to fit the sine function y = sin(x), and finds a set of approximate fitting parameters a, b, c, d.
This example is introduced to show how Eager Mode and Graph Mode are related in OneFlow (most of the code is reusable). Since readers should be quite familiar with OneFlow's Eager Mode by now, the code is not explained in detail here; interested readers can click on "Code" to expand it.
Note: This sample code is adapted from the PyTorch official tutorial.
Code
import math
import numpy as np
import oneflow as flow
device = flow.device("cuda")
dtype = flow.float32
# Create Tensors to hold input and outputs.
x = flow.tensor(np.linspace(-math.pi, math.pi, 2000), device=device, dtype=dtype)
y = flow.sin(x)
# For this example, the output y is a linear function of (x, x^2, x^3), so
# we can consider it as a linear layer neural network. Let's prepare the
# tensor (x, x^2, x^3).
xx = flow.cat(
    [x.unsqueeze(-1).pow(1), x.unsqueeze(-1).pow(2), x.unsqueeze(-1).pow(3)], dim=1
)
# The Linear Module
model = flow.nn.Sequential(flow.nn.Linear(3, 1), flow.nn.Flatten(0, 1))
model.to(device)
# Loss Function
loss_fn = flow.nn.MSELoss(reduction="sum")
loss_fn.to(device)
# Optimizer
optimizer = flow.optim.SGD(model.parameters(), lr=1e-6)
for t in range(2000):
    # Forward pass: compute predicted y by passing x to the model.
    y_pred = model(xx)

    # Compute and print loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.numpy())

    # Use the optimizer object to zero all of the gradients for the variables
    # it will update (which are the learnable weights of the model).
    optimizer.zero_grad()

    # Backward pass: compute gradient of the loss with respect to model
    # parameters.
    loss.backward()

    # Calling the step function on an Optimizer makes an update to its
    # parameters.
    optimizer.step()

linear_layer = model[0]
print(
    f"Result: y = {linear_layer.bias.numpy()[0]} + {linear_layer.weight[:, 0].numpy()[0]}*x + {linear_layer.weight[:, 1].numpy()[0]}*x^2 + {linear_layer.weight[:, 2].numpy()[0]}*x^3"
)
Out:
99 582.7045
...
1799 9.326502
1899 9.154123
1999 9.040091
Result: y = -0.0013652867637574673 + 0.8422811627388*x + 0.0002355352626182139*x^2 + -0.09127362817525864*x^3
Graph Mode in OneFlow
Customize a Graph
OneFlow provides the base class nn.Graph, which can be inherited to create a customized Graph class.
import oneflow as flow
import oneflow.nn as nn
class MyLinear(nn.Graph):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(flow.randn(in_features, out_features))
        self.bias = nn.Parameter(flow.randn(out_features))

    def build(self, input):
        return flow.matmul(input, self.weight) + self.bias
The simple example above contains the important steps needed to customize a Graph:
- Inherit nn.Graph.
- Call super().__init__() at the beginning of the __init__ method so that OneFlow performs the necessary initialization for the Graph.
- Define the structure and state of the neural network in the __init__ method.
- Describe the computation process in the build method.
You can then instantiate and call the Graph:
mygraph = MyLinear(4, 3)
input = flow.randn(1, 4)
out = mygraph(input)
print(out)
Out:
tensor([[ 4.0638, -1.4453, 3.9640]], dtype=oneflow.float32)
Note that a Graph is similar to a Module in that the object itself is callable; it is not recommended to call the build method explicitly. The definition of a Graph is very similar to the use of a Module; in fact, a Graph can directly reuse a defined Module, so the instructions in Build Network on how to build a neural network also apply in Graph Mode.
For example, use the model above as the network structure:
class ModelGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.model = model
        self.loss_fn = loss_fn

    def build(self, x, y):
        y_pred = self.model(x)
        loss = self.loss_fn(y_pred, y)
        return loss

model_graph = ModelGraph()
The major difference between a Module and a Graph is that a Graph describes the computation process in the build method rather than the forward method, because build can contain not only the forward computation but also the loss computation, the optimizer setup, and so on. You will see an example of using a Graph for training later.
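As a compact, hypothetical illustration of this naming difference only (not part of the tutorial's example), the same computation can be written as a Module with forward and wrapped by a Graph with build:
import oneflow as flow

class EagerModel(flow.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = flow.nn.Linear(3, 1)

    def forward(self, x):  # Module: the computation is described in forward
        return self.linear(x)

class LazyModel(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.m = EagerModel()  # a Graph can reuse a Module as-is

    def build(self, x):  # Graph: the computation is described in build
        return self.m(x)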
Inference in Graph Mode
The following example runs inference in Graph Mode, directly reusing the model that was already trained in Eager Mode at the beginning of this article.
class LinearPredictGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.model = model

    def build(self, x):
        return self.model(x)

linear_graph = LinearPredictGraph()
y_fit = linear_graph(xx)
Plot the original function outputs and the fitting results to see the difference:
import matplotlib.pyplot as plt

plt.plot(x.numpy(), y.numpy())
plt.plot(x.numpy(), y_fit.numpy())
plt.show()
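As an extra sanity check (not in the original tutorial), you can also compare the Graph Mode prediction with the Eager Mode prediction of the same model; the two should agree numerically:
# Hypothetical check: run the same model eagerly and compare with the graph output.
y_eager = model(xx)
print(np.allclose(y_eager.numpy(), y_fit.numpy(), atol=1e-5))  # expected to print True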
Training in Graph Mode
Graph can also be used for training. Click on "Code" below to see the detailed code.
Code
import math
import numpy as np
import oneflow as flow
device = flow.device("cuda")
dtype = flow.float32
# Create Tensors to hold input and outputs.
x = flow.tensor(np.linspace(-math.pi, math.pi, 2000), device=device, dtype=dtype)
y = flow.sin(x)
# For this example, the output y is a linear function of (x, x^2, x^3), so
# we can consider it as a linear layer neural network. Let's prepare the
# tensor (x, x^2, x^3).
xx = flow.cat(
    [x.unsqueeze(-1).pow(1), x.unsqueeze(-1).pow(2), x.unsqueeze(-1).pow(3)], dim=1
)
# The Linear Module
model = flow.nn.Sequential(flow.nn.Linear(3, 1), flow.nn.Flatten(0, 1))
model.to(device)
# Loss Function
loss_fn = flow.nn.MSELoss(reduction="sum")
loss_fn.to(device)
# Optimizer
optimizer = flow.optim.SGD(model.parameters(), lr=1e-6)
# The Linear Train Graph
class LinearTrainGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.model = model
        self.loss_fn = loss_fn
        self.add_optimizer(optimizer)

    def build(self, x, y):
        y_pred = self.model(x)
        loss = self.loss_fn(y_pred, y)
        loss.backward()
        return loss

linear_graph = LinearTrainGraph()
# linear_graph.debug()

for t in range(2000):
    # Print loss.
    loss = linear_graph(xx, y)
    if t % 100 == 99:
        print(t, loss.numpy())

linear_layer = model[0]
print(
    f"Result: y = {linear_layer.bias.numpy()} + {linear_layer.weight[:, 0].numpy()} x + {linear_layer.weight[:, 1].numpy()} x^2 + {linear_layer.weight[:, 2].numpy()} x^3"
)
Compared with inference, only a few things are unique to training:
# Optimizer
optimizer = flow.optim.SGD(model.parameters(), lr=1e-6)  # (1)

# The Linear Train Graph
class LinearTrainGraph(flow.nn.Graph):
    def __init__(self):
        # ...
        self.add_optimizer(optimizer)  # (2)

    def build(self, x, y):
        # ...
        loss.backward()  # (3)
        # ...
- Construct the optimizer object, which is the same as for training in Eager Mode, introduced in Backpropagation and Optimizer.
- Call self.add_optimizer in the Graph's __init__ method to add the optimizer object constructed in the previous step to the Graph.
- Call backward in the Graph's build method to trigger back propagation.
Debugging in Graph Mode
You can call print to show information about a Graph object.
print(linear_graph)
The output is slightly different depending on whether the Graph object has been called yet.
If print is used before the Graph object is called, the output describes the structure of the network. For example, printing linear_graph before it is called produces output like this:
(GRAPH:LinearTrainGraph_0:LinearTrainGraph): (
(MODULE:model:Sequential()): (
(MODULE:model.0:Linear(in_features=3, out_features=1, bias=True)): (
(PARAMETER:model.0.weight:tensor(..., device='cuda:0', size=(1, 3), dtype=oneflow.float32,
requires_grad=True)): ()
(PARAMETER:model.0.bias:tensor(..., device='cuda:0', size=(1,), dtype=oneflow.float32,
requires_grad=True)): ()
)
(MODULE:model.1:Flatten(start_dim=0, end_dim=1)): ()
)
(MODULE:loss_fn:MSELoss()): ()
)
If print is used after the Graph object has been called, the input and output tensors are printed in addition to the network structure. The console output looks like this:
(GRAPH:LinearTrainGraph_0:LinearTrainGraph): (
(INPUT:_LinearTrainGraph_0-input_0:tensor(..., device='cuda:0', size=(2000, 3), dtype=oneflow.float32))
(INPUT:_LinearTrainGraph_0-input_1:tensor(..., device='cuda:0', size=(2000,), dtype=oneflow.float32))
(MODULE:model:Sequential()): (
(INPUT:_model-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(2000, 3),
dtype=oneflow.float32))
(MODULE:model.0:Linear(in_features=3, out_features=1, bias=True)): (
(INPUT:_model.0-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(2000, 3),
dtype=oneflow.float32))
(PARAMETER:model.0.weight:tensor(..., device='cuda:0', size=(1, 3), dtype=oneflow.float32,
requires_grad=True)): ()
(PARAMETER:model.0.bias:tensor(..., device='cuda:0', size=(1,), dtype=oneflow.float32,
requires_grad=True)): ()
(OUTPUT:_model.0-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(2000, 1),
dtype=oneflow.float32))
)
(MODULE:model.1:Flatten(start_dim=0, end_dim=1)): (
(INPUT:_model.1-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(2000, 1),
dtype=oneflow.float32))
(OUTPUT:_model.1-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(2000,),
dtype=oneflow.float32))
)
(OUTPUT:_model-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(2000,),
dtype=oneflow.float32))
)
(MODULE:loss_fn:MSELoss()): (
(INPUT:_loss_fn-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(2000,),
dtype=oneflow.float32))
(INPUT:_loss_fn-input_1:tensor(..., device='cuda:0', is_lazy='True', size=(2000,),
dtype=oneflow.float32))
(OUTPUT:_loss_fn-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(), dtype=oneflow.float32))
)
(OUTPUT:_LinearTrainGraph_0-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(), dtype=oneflow.float32))
)
In addition, calling the debug method of a Graph object turns on the Graph's debug mode.
OneFlow prints debug information while it compiles the computation graph. If the comment on linear_graph.debug() in the example code above is removed, the output on the console looks like this:
Note that nn.Graph.debug() only prints debug information on rank 0.
(GRAPH:LinearTrainGraph_0:LinearTrainGraph) start building forward graph.
(INPUT:_LinearTrainGraph_0-input_0:tensor(..., device='cuda:0', size=(20, 3), dtype=oneflow.float32))
(INPUT:_LinearTrainGraph_0-input_1:tensor(..., device='cuda:0', size=(20,), dtype=oneflow.float32))
(MODULE:model:Sequential())
(INPUT:_model-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(20, 3),
dtype=oneflow.float32))
(MODULE:model.0:Linear(in_features=3, out_features=1, bias=True))
(INPUT:_model.0-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(20, 3),
dtype=oneflow.float32))
(PARAMETER:model.0.weight:tensor(..., device='cuda:0', size=(1, 3), dtype=oneflow.float32,
requires_grad=True))
(PARAMETER:model.0.bias:tensor(..., device='cuda:0', size=(1,), dtype=oneflow.float32,
requires_grad=True))
(OUTPUT:_model.0-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(20, 1),
dtype=oneflow.float32))
(MODULE:model.1:Flatten(start_dim=0, end_dim=1))
(INPUT:_model.1-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(20, 1),
dtype=oneflow.float32))
(OUTPUT:_model.1-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(20,), dtype=oneflow.float32))
(OUTPUT:_model-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(20,), dtype=oneflow.float32))
(MODULE:loss_fn:MSELoss())
(INPUT:_loss_fn-input_0:tensor(..., device='cuda:0', is_lazy='True', size=(20,), dtype=oneflow.float32))
(INPUT:_loss_fn-input_1:tensor(..., device='cuda:0', is_lazy='True', size=(20,), dtype=oneflow.float32))
(OUTPUT:_loss_fn-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(), dtype=oneflow.float32))
(OUTPUT:_LinearTrainGraph_0-output_0:tensor(..., device='cuda:0', is_lazy='True', size=(), dtype=oneflow.float32))
(GRAPH:LinearTrainGraph_0:LinearTrainGraph) end building forward graph.
(GRAPH:LinearTrainGraph_0:LinearTrainGraph) start compiling and init graph runtime.
(GRAPH:LinearTrainGraph_0:LinearTrainGraph) end compiling and init graph rumtime.
It displays the names of the layers in the computation graph and input/output tensor information, including shape, device information, data type, and so on.
The advantage of using debug is that the debug information is composed and printed as the graph is being built, which makes it easier to locate the problem if an error occurs during the graph-building process.
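For reference, here is a minimal sketch of how debug mode is enabled in this example (it simply uncomments the line from the training code above):
linear_graph = LinearTrainGraph()
linear_graph.debug()  # turn on debug mode before the first call

# The computation graph is traced and compiled on the first call,
# and the debug information shown above is printed at that point.
loss = linear_graph(xx, y)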
In addition to the methods described above, features such as getting the gradients of parameters during training and accessing the learning rate are under development and will be available soon.
Further Reading: Dynamic Graph vs. Static Graph
User-defined neural networks are transformed by deep learning frameworks into computation graphs, like the example in Autograd:
def loss(y_pred, y):
    return flow.sum(1/2*(y_pred-y)**2)

x = flow.ones(1, 5)  # input
w = flow.randn(5, 3, requires_grad=True)
b = flow.randn(1, 3, requires_grad=True)
z = flow.matmul(x, w) + b

y = flow.zeros(1, 3)  # label
l = loss(z, y)
The corresponding computation graph is:
Dynamic Graph
The characteristic of a dynamic graph is that it is defined by run: the graph is built as the code executes.
The code above runs like this (note: the figure below merges simple statements):
Because a dynamic graph is defined by run, it is very flexible and easy to debug: you can modify the graph structure at any time and get results immediately. However, since the deep learning framework never sees the complete graph (it can change at any time and can never be considered finished), it cannot perform full global optimization, so its performance is relatively poor.
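A minimal, hypothetical sketch (not from the tutorial) of what "defined by run" means in practice: ordinary Python control flow decides which operations execute, and each operation runs immediately:
import oneflow as flow

x = flow.randn(1)

# In Eager Mode every op executes as soon as its line runs, so the "graph"
# is simply whatever path the Python code happens to take this time.
if x.item() > 0:
    y = x * 2
else:
    y = x - 1

print(y)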
Static Graph
Unlike a dynamic graph, a static graph defines a complete computation graph. It requires the user to declare all compute nodes before the framework starts running. This can be understood as the framework acting as a compiler between the user code and the computation graph that ultimately runs.
In the case of OneFlow, the user's code is first converted to a full computation graph and then run by the OneFlow Runtime module.
Static graphs, which obtain the complete network first and then compile and run it, can be optimized in ways that dynamic graphs cannot, so they have an advantage in performance. A compiled computation graph is also easier to deploy across platforms.
However, when the actual computation takes place in a static graph, it is no longer directly related to the user's code, so debugging a static graph is less convenient.
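One way to observe this separation is the following hypothetical snippet (assuming the is_lazy tensor attribute that appears in the debug output above): while build is being traced, tensors are lazy placeholders that describe the graph being compiled rather than concrete data:
class LazyCheckGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.model = model

    def build(self, x):
        y = self.model(x)
        # During graph building, y carries shape/dtype information but no data
        # (cf. is_lazy='True' in the printed graph information).
        print("is_lazy:", y.is_lazy)
        return y

lazy_graph = LazyCheckGraph()
out = lazy_graph(xx)  # "is_lazy: True" is printed during the first (tracing) call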
The two approaches can be summarized as follows:
|                  | Dynamic Graph                            | Static Graph                                   |
| ---------------- | ---------------------------------------- | ---------------------------------------------- |
| Computation Mode | Eager Mode                               | Graph Mode                                     |
| Pros             | The code is flexible and easy to debug.  | Good performance, easy to optimize and deploy. |
| Cons             | Poor performance and portability.        | Not easy to debug.                             |
Eager Mode in OneFlow is aligned with PyTorch, which allows users familiar with PyTorch to get started easily with little extra effort.
Graph Mode in OneFlow is based on an object-oriented programming style, which allows developers familiar with the eager programming style to benefit from static graphs with minimal code changes.
Related Links
Building neural network in OneFlow Eager Mode: Build Network
PyTorch version of polynomial fitting example: PyTorch: nn