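Dynamic Graph Mechanism: DyGraph
PaddlePaddle's DyGraph mode is a dynamic graph execution mechanism: operations run immediately and return their results, without first building a complete computation graph. Unlike the earlier static graph mode, you do not have to wait for the whole graph to be built and executed before seeing a result. This makes building and debugging deep learning tasks in PaddlePaddle more intuitive, and it removes much of the code that static graph construction requires.
DyGraph is a more flexible and easier-to-use mode. It provides:
- More flexible code organization: use Python control flow and object-oriented model design.
- Easier debugging: print any intermediate result directly with Python's print, so you can inspect a running model and test changes quickly.
- Model code shared with static graph mode: the same model code can be debugged and run in DyGraph mode, and it can also run in the original static graph mode.
For more practical model examples that use the dynamic graph mechanism, see Paddle/models/dygraph.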
Setup and Basic Usage
- Upgrade to PaddlePaddle 1.6.0:
    pip install -q --upgrade paddlepaddle==1.6.0
- Use the fluid.dygraph.guard(place=None) context:
    import paddle.fluid as fluid
    with fluid.dygraph.guard():
        # write your executable dygraph code here
        pass
You can now run a network in DyGraph mode inside the fluid.dygraph.guard() context. DyGraph changes how PaddlePaddle executes operations: they now run immediately and return their results to Python.
DyGraph works well with NumPy. fluid.dygraph.to_variable(x) converts an ndarray to a fluid.Variable, and fluid.Variable.numpy() converts a result obtained at any point into a NumPy ndarray:
    import numpy as np
    import paddle.fluid as fluid

    x = np.ones([2, 2], np.float32)
    with fluid.dygraph.guard():
        inputs = []
        for _ in range(10):
            inputs.append(fluid.dygraph.to_variable(x))
        ret = fluid.layers.sums(inputs)
        print(ret.numpy())
Output:
    [[10. 10.]
     [10. 10.]]
Here we built a list of inputs from ndarray values, ran a sums operation, and printed the result directly.
Next, call reduce_sum, run the backward pass with Variable.backward(), and use Variable.gradient() to get the gradient as an ndarray once the backward pass has finished:
        # (continuing inside the fluid.dygraph.guard() block above)
        loss = fluid.layers.reduce_sum(ret)
        loss.backward()
        print(loss.gradient())
Output:
    [1.]
Building a Network with DyGraph
Object-oriented PaddlePaddle model code for DyGraph consists of two main parts. Note that if a layer you design contains parameters, its behavior must be described by a class that inherits from fluid.dygraph.Layer.
- Define an object-oriented network that can run in DyGraph mode by inheriting from fluid.dygraph.Layer and calling the base class's __init__ method. The constructor usually performs work such as parameter initialization and sub-network initialization, none of which depends on dynamic input information:
    class MyLayer(fluid.dygraph.Layer):
        def __init__(self, input_size):
            super(MyLayer, self).__init__()
            self.linear = fluid.dygraph.nn.Linear(input_size, 12)
- Implement a forward(self, *inputs) method that carries out the network's runtime logic. It is called in every training or prediction step. Here it runs a simple linear -> relu -> elementwise mul -> reduce sum:
        def forward(self, inputs):
            x = self.linear(inputs)
            x = fluid.layers.relu(x)  # apply relu to the linear output, not to the raw input
            self._x_for_debug = x     # keep a handle for inspecting the gradient later
            x = fluid.layers.elementwise_mul(x, x)
            x = fluid.layers.reduce_sum(x)
            return [x]
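Run it inside fluid.dygraph.guard():
- Build the input with NumPy:
    np_inp = np.array([1.0, 2.0, -1.0], dtype=np.float32)
- Convert the input ndarray to a Variable and run the forward network to get the result. fluid.dygraph.to_variable(np_inp) converts the NumPy input into an input that DyGraph accepts; my_layer(var_inp)[0] then calls the layer as a callable object and returns x; x.numpy() returns the computed value of x as an ndarray.
    with fluid.dygraph.guard():
        var_inp = fluid.dygraph.to_variable(np_inp)
        my_layer = MyLayer(np_inp.shape[-1])
        x = my_layer(var_inp)[0]
        dy_out = x.numpy()
- Compute gradients. Automatic differentiation is useful for implementing machine learning algorithms such as backpropagation for training neural networks. x.backward() runs the backward network starting from a given fluid.Variable, and my_layer._x_for_debug.gradient() returns, as an ndarray, the gradient of the intermediate x stored in the network:
        # (continuing inside the fluid.dygraph.guard() block above)
        x.backward()
        dy_grad = my_layer._x_for_debug.gradient()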
The complete code is as follows:
    import paddle.fluid as fluid
    import numpy as np


    class MyLayer(fluid.dygraph.Layer):
        def __init__(self, input_size):
            super(MyLayer, self).__init__()
            self.linear = fluid.dygraph.nn.Linear(input_size, 12)

        def forward(self, inputs):
            x = self.linear(inputs)
            x = fluid.layers.relu(x)
            self._x_for_debug = x
            x = fluid.layers.elementwise_mul(x, x)
            x = fluid.layers.reduce_sum(x)
            return [x]


    if __name__ == '__main__':
        np_inp = np.array([[1.0, 2.0, -1.0]], dtype=np.float32)
        with fluid.dygraph.guard():
            var_inp = fluid.dygraph.to_variable(np_inp)
            my_layer = MyLayer(np_inp.shape[-1])
            x = my_layer(var_inp)[0]
            dy_out = x.numpy()
            x.backward()
            dy_grad = my_layer._x_for_debug.gradient()
            my_layer.clear_gradients()  # zero the parameter gradients so the next training step is correct
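To make the value returned by my_layer._x_for_debug.gradient() concrete, the following is a minimal pure-Python sketch of the relu -> elementwise mul -> reduce sum tail of the forward pass. It does not use PaddlePaddle. For out = sum(x * x), the gradient with respect to each element of x is 2 * x. The function names and the pre-activation values are hypothetical and chosen only for illustration.

```python
def relu(values):
    # Element-wise max(0, v), as in the relu step of forward()
    return [max(0.0, v) for v in values]


def forward_tail(pre_activation):
    # relu -> elementwise mul (x * x) -> reduce sum
    x = relu(pre_activation)
    out = sum(v * v for v in x)
    return x, out


def grad_wrt_x(x):
    # d(sum(x * x)) / dx_i = 2 * x_i
    return [2.0 * v for v in x]


pre_activation = [0.5, -1.0, 2.0]  # hypothetical output of the linear layer
x, out = forward_tail(pre_activation)
grad = grad_wrt_x(x)
print(x, out, grad)  # [0.5, 0.0, 2.0] 4.25 [1.0, 0.0, 4.0]
```

Entries that relu clamps to zero get a zero gradient here, which is the pattern to expect in dy_grad for the real layer.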
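About Automatic Pruning
Every Variable has a stop_gradient attribute. It gives fine-grained control for excluding parts of the graph from the backward gradient computation, which improves efficiency.
If at least one input of an OP requires a gradient, the output of that OP requires a gradient too. Conversely, the output of an OP does not require a gradient only when none of its inputs do. In a subgraph where no Variable requires a gradient, the backward computation is skipped.
In dynamic graph mode, stop_gradient defaults to True for every Variable except parameters, and to False for parameters. This attribute drives automatic pruning and avoids unnecessary backward computation.
For example:
    import paddle.fluid as fluid
    import numpy as np

    with fluid.dygraph.guard():
        x = fluid.dygraph.to_variable(np.random.randn(5, 5))  # stop_gradient=True by default
        y = fluid.dygraph.to_variable(np.random.randn(5, 5))  # stop_gradient=True by default
        z = fluid.dygraph.to_variable(np.random.randn(5, 5))
        z.stop_gradient = False
        a = x + y
        a.stop_gradient  # True
        b = a + z
        b.stop_gradient  # False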
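The propagation rule described above can be sketched in plain Python, without PaddlePaddle. The helper output_stop_gradient is a hypothetical name; it only models the stated rule that an output skips gradient computation when every one of its inputs does.

```python
def output_stop_gradient(input_flags):
    # An OP's output has stop_gradient=True only if ALL inputs have stop_gradient=True
    return all(input_flags)


# Mirror the example: x and y keep the default (True), z is set to False
x_stop, y_stop, z_stop = True, True, False

a_stop = output_stop_gradient([x_stop, y_stop])  # a = x + y
b_stop = output_stop_gradient([a_stop, z_stop])  # b = a + z
print(a_stop, b_stop)  # True False
```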
This is very useful when you want to freeze part of your model, or when you know in advance that you will not use the gradients of certain parameters.
For example:
    import paddle.fluid as fluid
    import numpy as np

    with fluid.dygraph.guard():
        value0 = np.arange(26).reshape(2, 13).astype("float32")
        value1 = np.arange(6).reshape(2, 3).astype("float32")
        value2 = np.arange(10).reshape(2, 5).astype("float32")
        fc = fluid.dygraph.Linear(13, 5, dtype="float32")
        fc2 = fluid.dygraph.Linear(3, 3, dtype="float32")
        a = fluid.dygraph.to_variable(value0)
        b = fluid.dygraph.to_variable(value1)
        c = fluid.dygraph.to_variable(value2)
        out1 = fc(a)
        out2 = fc2(b)
        out1.stop_gradient = True  # no backward computation for the subgraph that produces out1
        out = fluid.layers.concat(input=[out1, out2, c], axis=1)
        out.backward()
        # the gradients of fc's parameters are all 0
        assert (fc.weight.gradient() == 0).all()
        assert (out1.gradient() == 0).all()
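Training a Model with DyGraph
Next, we use handwritten digit recognition, the most basic model, to show how to build and train a model in DyGraph mode.
For the theory behind handwritten digit recognition, see PaddleBook. We assume here that you already know the deep learning theory this model requires.
- Prepare the data. We use paddle.dataset.mnist as the training dataset:
    import paddle

    BATCH_SIZE = 64  # the batch size used by the training loops later in this guide
    train_reader = paddle.batch(
        paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)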
- Build the network. Although you can define every network structure yourself as described above, you can also use the basic building blocks that we provide as fluid.dygraph.Layer subclasses. Here we use fluid.dygraph.Conv2D and fluid.dygraph.Pool2D to build a basic SimpleImgConvPool:
    class SimpleImgConvPool(fluid.dygraph.Layer):
        def __init__(self,
                     num_channels,
                     num_filters,
                     filter_size,
                     pool_size,
                     pool_stride,
                     pool_padding=0,
                     pool_type='max',
                     global_pooling=False,
                     conv_stride=1,
                     conv_padding=0,
                     conv_dilation=1,
                     conv_groups=1,
                     act=None,
                     use_cudnn=False,
                     param_attr=None,
                     bias_attr=None):
            super(SimpleImgConvPool, self).__init__()

            self._conv2d = fluid.dygraph.Conv2D(
                num_channels=num_channels,
                num_filters=num_filters,
                filter_size=filter_size,
                stride=conv_stride,
                padding=conv_padding,
                dilation=conv_dilation,
                groups=conv_groups,
                param_attr=param_attr,
                bias_attr=bias_attr,
                act=act,
                use_cudnn=use_cudnn)

            self._pool2d = fluid.dygraph.Pool2D(
                pool_size=pool_size,
                pool_type=pool_type,
                pool_stride=pool_stride,
                pool_padding=pool_padding,
                global_pooling=global_pooling,
                use_cudnn=use_cudnn)

        def forward(self, inputs):
            x = self._conv2d(inputs)
            x = self._pool2d(x)
            return x
Note: define and construct sub-networks in __init__, and execute them in forward.
- Use the SimpleImgConvPool defined above to assemble the final MNIST network:
    class MNIST(fluid.dygraph.Layer):
        def __init__(self):
            super(MNIST, self).__init__()

            self._simple_img_conv_pool_1 = SimpleImgConvPool(
                1, 20, 5, 2, 2, act="relu")

            self._simple_img_conv_pool_2 = SimpleImgConvPool(
                20, 50, 5, 2, 2, act="relu")

            self.pool_2_shape = 50 * 4 * 4
            SIZE = 10
            scale = (2.0 / (self.pool_2_shape**2 * SIZE))**0.5
            self._fc = fluid.dygraph.Linear(
                self.pool_2_shape,
                10,
                param_attr=fluid.param_attr.ParamAttr(
                    initializer=fluid.initializer.NormalInitializer(
                        loc=0.0, scale=scale)),
                act="softmax")

        def forward(self, inputs, label=None):
            x = self._simple_img_conv_pool_1(inputs)
            x = self._simple_img_conv_pool_2(x)
            x = fluid.layers.reshape(x, shape=[-1, self.pool_2_shape])
            x = self._fc(x)
            if label is not None:
                acc = fluid.layers.accuracy(input=x, label=label)
                return x, acc
            else:
                return x
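The value pool_2_shape = 50 * 4 * 4 in the MNIST class can be checked by hand. The sketch below is plain Python and assumes the usual output-size formulas for convolution and pooling with floor rounding, no padding, and the kernel and stride values passed to SimpleImgConvPool above. The helper names conv_out and pool_out are hypothetical and are not PaddlePaddle functions.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    # Standard convolution output size (floor rounding)
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def pool_out(size, kernel, stride, padding=0):
    # Standard pooling output size (floor rounding)
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                    # MNIST images are 28 x 28
size = conv_out(size, kernel=5)              # first conv, filter_size=5  -> 24
size = pool_out(size, kernel=2, stride=2)    # first pool, 2 x 2 stride 2 -> 12
size = conv_out(size, kernel=5)              # second conv                -> 8
size = pool_out(size, kernel=2, stride=2)    # second pool                -> 4

num_filters = 50                             # output channels of the second block
flattened = num_filters * size * size
print(size, flattened)  # 4 800
```

Under these assumptions the flattened feature length is 800, which matches the first argument given to the final Linear layer.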
- Instantiate the configured MNIST network inside fluid.dygraph.guard(). Even before any training, you can call the model inside fluid.dygraph.guard() and inspect its output:
    with fluid.dygraph.guard():
        mnist = MNIST()
        train_reader = paddle.batch(
            paddle.dataset.mnist.train(), batch_size=32, drop_last=True)
        data = next(train_reader())  # take only the first batch
        dy_x_data = np.array(
            [x[0].reshape(1, 28, 28)
             for x in data]).astype('float32')
        img = fluid.dygraph.to_variable(dy_x_data)
        print("result is: {}".format(mnist(img).numpy()))
Output:
    result is: [[0.10135901 0.1051138 0.1027941 ... 0.0972859 0.10221873 0.10165327]
    [0.09735426 0.09970362 0.10198303 ... 0.10134517 0.10179105 0.10025002]
    [0.09539858 0.10213123 0.09543551 ... 0.10613529 0.10535969 0.097991 ]
    ...
    [0.10120598 0.0996111 0.10512722 ... 0.10067689 0.10088114 0.10071224]
    [0.09889644 0.10033772 0.10151272 ... 0.10245881 0.09878646 0.101483 ]
    [0.09097178 0.10078511 0.10198414 ... 0.10317434 0.10087223 0.09816764]]
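Every entry in the untrained output above is close to 0.1. A likely explanation is that the final layer applies softmax over 10 classes, and a freshly initialized network produces logits near zero, so the probabilities come out nearly uniform. The plain-Python sketch below illustrates that effect. The logit values are made up for illustration and are not taken from the model.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability, then normalize
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical near-zero logits for the 10 digit classes
logits = [0.01, -0.02, 0.0, 0.03, -0.01, 0.02, 0.0, -0.03, 0.01, -0.01]
probs = softmax(logits)
print([round(p, 3) for p in probs])
```

Each probability stays within about 0.01 of 1/10, and the row sums to 1, which is the pattern visible in the printed result.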
- Build the training loop. After each parameter update, call mnist.clear_gradients() to reset the gradients:
    with fluid.dygraph.guard():
        epoch_num = 5
        BATCH_SIZE = 64
        train_reader = paddle.batch(
            paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
        mnist = MNIST()
        adam = fluid.optimizer.AdamOptimizer(learning_rate=0.001, parameter_list=mnist.parameters())
        for epoch in range(epoch_num):
            for batch_id, data in enumerate(train_reader()):
                dy_x_data = np.array([x[0].reshape(1, 28, 28)
                                      for x in data]).astype('float32')
                y_data = np.array(
                    [x[1] for x in data]).astype('int64').reshape(-1, 1)

                img = fluid.dygraph.to_variable(dy_x_data)
                label = fluid.dygraph.to_variable(y_data)

                cost = mnist(img)

                loss = fluid.layers.cross_entropy(cost, label)
                avg_loss = fluid.layers.mean(loss)

                # compare integers with !=, not with "is not"
                if batch_id % 100 == 0 and batch_id != 0:
                    print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
                avg_loss.backward()
                adam.minimize(avg_loss)
                mnist.clear_gradients()
- Variables and the optimizer
Model parameters, or any other value you want to inspect, can be wrapped as attributes of the class, accessed through the object, and converted to an ndarray with numpy(). During training, mnist.parameters() returns all parameters of the network. You can also pick out a single parameter of a particular Layer, or call that layer's parameters() to get all of its parameters, and use numpy() to check their values at any time.
After the backward pass, call the minimize method of the Adam optimizer object defined earlier to update the parameters:
    with fluid.dygraph.guard():
        epoch_num = 5
        BATCH_SIZE = 64

        mnist = MNIST()
        adam = fluid.optimizer.AdamOptimizer(learning_rate=0.001, parameter_list=mnist.parameters())
        train_reader = paddle.batch(
            paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)

        np.set_printoptions(precision=3, suppress=True)
        for epoch in range(epoch_num):
            for batch_id, data in enumerate(train_reader()):
                dy_x_data = np.array(
                    [x[0].reshape(1, 28, 28)
                     for x in data]).astype('float32')
                y_data = np.array(
                    [x[1] for x in data]).astype('int64').reshape(BATCH_SIZE, 1)

                img = fluid.dygraph.to_variable(dy_x_data)
                label = fluid.dygraph.to_variable(y_data)
                label.stop_gradient = True

                cost = mnist(img)
                loss = fluid.layers.cross_entropy(cost, label)
                avg_loss = fluid.layers.mean(loss)

                dy_out = avg_loss.numpy()

                avg_loss.backward()
                adam.minimize(avg_loss)
                mnist.clear_gradients()

                # snapshot every parameter value as an ndarray
                dy_param_value = {}
                for param in mnist.parameters():
                    dy_param_value[param.name] = param.numpy()

                if batch_id % 20 == 0:
                    print("Loss at step {}: {}".format(batch_id, avg_loss.numpy()))
        print("Final loss: {}".format(avg_loss.numpy()))
        print("_simple_img_conv_pool_1_conv2d W's mean is: {}".format(mnist._simple_img_conv_pool_1._conv2d._filter_param.numpy().mean()))
        print("_simple_img_conv_pool_1_conv2d Bias's mean is: {}".format(mnist._simple_img_conv_pool_1._conv2d._bias_param.numpy().mean()))
Output:
    Loss at step 0: [2.302]
    Loss at step 20: [1.616]
    Loss at step 40: [1.244]
    Loss at step 60: [1.142]
    Loss at step 80: [0.911]
    Loss at step 100: [0.824]
    Loss at step 120: [0.774]
    Loss at step 140: [0.626]
    Loss at step 160: [0.609]
    Loss at step 180: [0.627]
    Loss at step 200: [0.466]
    Loss at step 220: [0.499]
    Loss at step 240: [0.614]
    Loss at step 260: [0.585]
    Loss at step 280: [0.503]
    Loss at step 300: [0.423]
    Loss at step 320: [0.509]
    Loss at step 340: [0.348]
    Loss at step 360: [0.452]
    Loss at step 380: [0.397]
    Loss at step 400: [0.54]
    Loss at step 420: [0.341]
    Loss at step 440: [0.337]
    Loss at step 460: [0.155]
    Final loss: [0.164]
    _simple_img_conv_pool_1_conv2d W's mean is: 0.00606656912714
    _simple_img_conv_pool_1_conv2d Bias's mean is: -3.4576318285e-05
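- Performance
When using fluid.dygraph.guard(), you can pass fluid.CUDAPlace(0) or fluid.CPUPlace() to choose the device on which DyGraph runs. If you pass nothing, the device is usually selected automatically to match your environment.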
Training a Model on Multiple GPUs
PaddlePaddle currently supports multi-GPU training through multiple processes, with one process per GPU. During training, the first time a forward operation that needs parameters is executed, the parameters on GPU 0 are broadcast to the other GPUs so that all GPUs hold identical parameters. After the backward pass, the resulting parameter gradients are aggregated across all GPUs. Finally, each GPU updates its own parameters.
    place = fluid.CUDAPlace(fluid.dygraph.parallel.Env().dev_id)
    with fluid.dygraph.guard(place):

        strategy = fluid.dygraph.parallel.prepare_context()
        epoch_num = 5
        BATCH_SIZE = 64
        mnist = MNIST()
        adam = fluid.optimizer.AdamOptimizer(learning_rate=0.001, parameter_list=mnist.parameters())
        mnist = fluid.dygraph.parallel.DataParallel(mnist, strategy)

        train_reader = paddle.batch(
            paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
        train_reader = fluid.contrib.reader.distributed_batch_reader(
            train_reader)

        for epoch in range(epoch_num):
            for batch_id, data in enumerate(train_reader()):
                dy_x_data = np.array([x[0].reshape(1, 28, 28)
                                      for x in data]).astype('float32')
                y_data = np.array(
                    [x[1] for x in data]).astype('int64').reshape(-1, 1)

                img = fluid.dygraph.to_variable(dy_x_data)
                label = fluid.dygraph.to_variable(y_data)
                label.stop_gradient = True

                cost, acc = mnist(img, label)

                loss = fluid.layers.cross_entropy(cost, label)
                avg_loss = fluid.layers.mean(loss)

                avg_loss = mnist.scale_loss(avg_loss)
                avg_loss.backward()
                mnist.apply_collective_grads()

                adam.minimize(avg_loss)
                mnist.clear_gradients()
                # compare integers with !=, not with "is not"
                if batch_id % 100 == 0 and batch_id != 0:
                    print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, avg_loss.numpy()))
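The interaction between scale_loss and apply_collective_grads in the loop above can be illustrated with plain arithmetic. The sketch below assumes that scaling the loss divides it by the number of processes and that gradient aggregation is a sum across processes. Under those assumptions the aggregated gradient equals the mean of the per-process gradients. The helper names and the gradient values are hypothetical, and no PaddlePaddle API is used.

```python
def scale_grad(local_grad, nranks):
    # Dividing the loss by nranks divides its gradient by nranks as well
    return [g / nranks for g in local_grad]


def all_reduce_sum(per_rank_grads):
    # Element-wise sum of the gradients held by every process
    return [sum(vals) for vals in zip(*per_rank_grads)]


# Hypothetical gradients of one 2-element parameter on 4 processes
local_grads = [[4.0, 8.0], [0.0, 4.0], [8.0, 0.0], [4.0, 4.0]]
nranks = len(local_grads)

scaled = [scale_grad(g, nranks) for g in local_grads]
aggregated = all_reduce_sum(scaled)

# Reference: the plain mean of the unscaled gradients
mean_grad = [sum(vals) / nranks for vals in zip(*local_grads)]
print(aggregated, mean_grad)  # [4.0, 4.0] [4.0, 4.0]
```

After this step every process holds the same aggregated gradient, so the separate per-GPU updates keep the parameters identical.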
Converting single-GPU dynamic graph training to multi-GPU training mainly requires four changes:
- Get the device ID from the environment variables:
    place = fluid.CUDAPlace(fluid.dygraph.parallel.Env().dev_id)
- Preprocess the original model:
    strategy = fluid.dygraph.parallel.prepare_context()
    mnist = MNIST()
    adam = fluid.optimizer.AdamOptimizer(learning_rate=0.001, parameter_list=mnist.parameters())
    mnist = fluid.dygraph.parallel.DataParallel(mnist, strategy)
- Read the data so that every process reads different data: the intersection of the data read by all processes must be empty, and their union must be the complete dataset:
    train_reader = paddle.batch(
        paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
    train_reader = fluid.contrib.reader.distributed_batch_reader(
        train_reader)
- Scale the loss and aggregate the parameter gradients:
    avg_loss = mnist.scale_loss(avg_loss)
    avg_loss.backward()
    mnist.apply_collective_grads()
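The data-reading requirement in the list above (empty intersection, complete union) can be sketched in plain Python. The round-robin assignment below is an assumption made for illustration; it is not necessarily how fluid.contrib.reader.distributed_batch_reader distributes batches, and shard_batches is a hypothetical helper.

```python
def shard_batches(batches, rank, nranks):
    # Give batch i to the process whose rank equals i modulo nranks
    return [b for i, b in enumerate(batches) if i % nranks == rank]


batches = list(range(8))  # stand-ins for 8 mini-batches
nranks = 4                # one process per GPU

shards = [shard_batches(batches, rank, nranks) for rank in range(nranks)]
print(shards)  # [[0, 4], [1, 5], [2, 6], [3, 7]]

seen = [b for shard in shards for b in shard]
no_overlap = len(seen) == len(set(seen))  # intersection of the shards is empty
full_cover = sorted(seen) == batches      # union of the shards is the whole dataset
print(no_overlap, full_cover)  # True True
```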
When launching multi-process, multi-GPU dynamic graph training, you must specify which GPUs to use. For example, to use GPUs 0, 1, 2, and 3, launch as follows:
    python -m paddle.distributed.launch --selected_gpus=0,1,2,3 --log_dir ./mylog train.py
The output is:
    ----------- Configuration Arguments -----------
    cluster_node_ips: 127.0.0.1
    log_dir: ./mylog
    node_ip: 127.0.0.1
    print_config: True
    selected_gpus: 0,1,2,3
    started_port: 6170
    training_script: train.py
    training_script_args: ['--use_data_parallel', '1']
    use_paddlecloud: True
    ------------------------------------------------
    trainers_endpoints: 127.0.0.1:6170,127.0.0.1:6171,127.0.0.1:6172,127.0.0.1:6173 , node_id: 0 , current_node_ip: 127.0.0.1 , num_nodes: 1 , node_ips: ['127.0.0.1'] , nranks: 4
The program then writes the log of each process to the ./mylog directory:
    .
    ├── mylog
    │   ├── workerlog.0
    │   ├── workerlog.1
    │   ├── workerlog.2
    │   └── workerlog.3
    └── train.py
If --log_dir is not specified, the program prints the output of all processes to the terminal:
    ----------- Configuration Arguments -----------
    cluster_node_ips: 127.0.0.1
    log_dir: None
    node_ip: 127.0.0.1
    print_config: True
    selected_gpus: 0,1,2,3
    started_port: 6170
    training_script: train.py
    training_script_args: ['--use_data_parallel', '1']
    use_paddlecloud: True
    ------------------------------------------------
    trainers_endpoints: 127.0.0.1:6170,127.0.0.1:6171,127.0.0.1:6172,127.0.0.1:6173 , node_id: 0 , current_node_ip: 127.0.0.1 , num_nodes: 1 , node_ips: ['127.0.0.1'] , nranks: 4
    grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
    grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
    grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
    grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
    I0923 09:32:36.423513 56410 nccl_context.cc:120] init nccl context nranks: 4 local rank: 1 gpu id: 1
    I0923 09:32:36.425287 56411 nccl_context.cc:120] init nccl context nranks: 4 local rank: 2 gpu id: 2
    I0923 09:32:36.429337 56409 nccl_context.cc:120] init nccl context nranks: 4 local rank: 0 gpu id: 0
    I0923 09:32:36.429440 56412 nccl_context.cc:120] init nccl context nranks: 4 local rank: 3 gpu id: 3
    W0923 09:32:42.594097 56412 device_context.cc:198] Please NOTE: device: 3, CUDA Capability: 70, Driver API Version: 9.0, Runtime API Version: 9.0
    W0923 09:32:42.605836 56412 device_context.cc:206] device: 3, cuDNN Version: 7.5.
    W0923 09:32:42.632463 56410 device_context.cc:198] Please NOTE: device: 1, CUDA Capability: 70, Driver API Version: 9.0, Runtime API Version: 9.0
    W0923 09:32:42.637948 56410 device_context.cc:206] device: 1, cuDNN Version: 7.5.
    W0923 09:32:42.648674 56411 device_context.cc:198] Please NOTE: device: 2, CUDA Capability: 70, Driver API Version: 9.0, Runtime API Version: 9.0
    W0923 09:32:42.654021 56411 device_context.cc:206] device: 2, cuDNN Version: 7.5.
    W0923 09:32:43.048696 56409 device_context.cc:198] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.0, Runtime API Version: 9.0
    W0923 09:32:43.053236 56409 device_context.cc:206] device: 0, cuDNN Version: 7.5.
    start data reader (trainers_num: 4, trainer_id: 2)
    start data reader (trainers_num: 4, trainer_id: 3)
    start data reader (trainers_num: 4, trainer_id: 1)
    start data reader (trainers_num: 4, trainer_id: 0)
    Loss at epoch 0 step 0: [0.57390565]
    Loss at epoch 0 step 0: [0.57523954]
    Loss at epoch 0 step 0: [0.575606]
    Loss at epoch 0 step 0: [0.5767452]
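Saving Model Parameters
In dynamic graph mode the model and the optimizer are stored in separate objects, so model parameters and optimizer state must be saved separately.
During training, use paddle.fluid.dygraph.save_dygraph(state_dict, model_path) to save either the dict of model parameters or the dict of optimizer state.
Likewise, paddle.fluid.dygraph.load_dygraph(model_path) returns the saved dict of model parameters and the saved dict of optimizer state.
Then call your_model_object.set_dict(para_dict) to restore the saved model parameters and resume training.
Call your_optimizer_object.set_dict(opti_dict) to restore the learning rate decay value saved from the optimizer.
The code below shows how to save parameters in the handwritten digit recognition task and then load them back to continue training.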
    import numpy as np
    import paddle
    import paddle.fluid as fluid

    with fluid.dygraph.guard():
        epoch_num = 5
        BATCH_SIZE = 64

        mnist = MNIST()
        adam = fluid.optimizer.Adam(learning_rate=0.001, parameter_list=mnist.parameters())
        train_reader = paddle.batch(
            paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)

        np.set_printoptions(precision=3, suppress=True)
        dy_param_init_value = {}
        for epoch in range(epoch_num):
            for batch_id, data in enumerate(train_reader()):
                dy_x_data = np.array(
                    [x[0].reshape(1, 28, 28)
                     for x in data]).astype('float32')
                y_data = np.array(
                    [x[1] for x in data]).astype('int64').reshape(BATCH_SIZE, 1)

                img = fluid.dygraph.to_variable(dy_x_data)
                label = fluid.dygraph.to_variable(y_data)
                label.stop_gradient = True

                cost = mnist(img)
                loss = fluid.layers.cross_entropy(cost, label)
                avg_loss = fluid.layers.mean(loss)

                dy_out = avg_loss.numpy()

                avg_loss.backward()
                adam.minimize(avg_loss)
                if batch_id == 20:
                    fluid.dygraph.save_dygraph(mnist.state_dict(), "paddle_dy")
                mnist.clear_gradients()

                if batch_id == 20:
                    # remember the values that were just saved, then load them back
                    for param in mnist.parameters():
                        dy_param_init_value[param.name] = param.numpy()
                    model, _ = fluid.dygraph.load_dygraph("paddle_dy")
                    mnist.set_dict(model)
                    break
            if epoch == 0:
                break
        restore = mnist.parameters()
        # check save and load
        success = True
        for value in restore:
            restored = value.numpy()
            # apply isfinite/isnan element-wise first, then reduce with all()/any()
            if (not np.array_equal(restored, dy_param_init_value[value.name])) or (not np.isfinite(restored).all()) or np.isnan(restored).any():
                success = False
        print("model save and load success? {}".format(success))
Note that with multi-GPU training only one process needs to save the model parameters, so you must specify which process saves them, for example:
    if fluid.dygraph.parallel.Env().local_rank == 0:
        fluid.dygraph.save_dygraph(mnist.state_dict(), "paddle_dy")
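Model Evaluation
To run prediction with your model in DyGraph mode, call YourModel.eval() once inside the fluid.dygraph.guard() context to switch to prediction mode. For the handwritten digit recognition model above, that call is mnist.eval(). The switch must be explicit because the fluid.dygraph.guard() context defaults to training mode, in which DyGraph automatically differentiates and adds the backward network while it runs the forward network. Prediction only needs the forward network, with no automatic differentiation and no backward network.
Note that if you run YourModel on a GPU without calling loss.backward (typically during prediction), you must call YourModel.eval() to avoid building the backward network; otherwise you may run out of GPU memory.
The code below trains a handwritten digit recognition model in DyGraph mode, saves it, and then uses the saved model for prediction.
Training and saving both happen inside a fluid.dygraph.guard() context. When you need to run prediction during training, switch to prediction mode with YourModel.eval(), and switch back with YourModel.train() afterwards to continue training.
inference_mnist opens another fluid.dygraph.guard() and, inside it, loads the previously saved checkpoint for prediction. Here too, YourModel.eval() must be called before prediction.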
    import os

    import numpy as np
    from PIL import Image
    import paddle
    import paddle.fluid as fluid


    def test_mnist(reader, model, batch_size):
        acc_set = []
        avg_loss_set = []
        for batch_id, data in enumerate(reader()):
            dy_x_data = np.array([x[0].reshape(1, 28, 28)
                                  for x in data]).astype('float32')
            y_data = np.array(
                [x[1] for x in data]).astype('int64').reshape(batch_size, 1)

            img = fluid.dygraph.to_variable(dy_x_data)
            label = fluid.dygraph.to_variable(y_data)
            label.stop_gradient = True
            prediction, acc = model(img, label)
            loss = fluid.layers.cross_entropy(input=prediction, label=label)
            avg_loss = fluid.layers.mean(loss)
            acc_set.append(float(acc.numpy()))
            avg_loss_set.append(float(avg_loss.numpy()))

        # get test acc and loss
        acc_val_mean = np.array(acc_set).mean()
        avg_loss_val_mean = np.array(avg_loss_set).mean()

        return avg_loss_val_mean, acc_val_mean


    def inference_mnist():
        with fluid.dygraph.guard():
            mnist_infer = MNIST()
            # load checkpoint
            model_dict, _ = fluid.dygraph.load_dygraph("paddle_dy")
            mnist_infer.set_dict(model_dict)
            print("checkpoint loaded")

            # start evaluate mode
            mnist_infer.eval()

            def load_image(file):
                im = Image.open(file).convert('L')
                im = im.resize((28, 28), Image.ANTIALIAS)
                im = np.array(im).reshape(1, 1, 28, 28).astype(np.float32)
                im = im / 255.0 * 2.0 - 1.0
                return im

            cur_dir = os.path.dirname(os.path.realpath(__file__))
            tensor_img = load_image(cur_dir + '/image/infer_3.png')

            results = mnist_infer(fluid.dygraph.to_variable(tensor_img))
            lab = np.argsort(results.numpy())
            print("Inference result of image/infer_3.png is: %d" % lab[0][-1])


    with fluid.dygraph.guard():
        epoch_num = 1
        BATCH_SIZE = 64
        mnist = MNIST()
        adam = fluid.optimizer.AdamOptimizer(learning_rate=0.001, parameter_list=mnist.parameters())
        test_reader = paddle.batch(
            paddle.dataset.mnist.test(), batch_size=BATCH_SIZE, drop_last=True)

        train_reader = paddle.batch(
            paddle.dataset.mnist.train(),
            batch_size=BATCH_SIZE,
            drop_last=True)

        for epoch in range(epoch_num):
            for batch_id, data in enumerate(train_reader()):
                dy_x_data = np.array([x[0].reshape(1, 28, 28)
                                      for x in data]).astype('float32')
                y_data = np.array(
                    [x[1] for x in data]).astype('int64').reshape(-1, 1)

                img = fluid.dygraph.to_variable(dy_x_data)
                label = fluid.dygraph.to_variable(y_data)
                label.stop_gradient = True

                cost, acc = mnist(img, label)

                loss = fluid.layers.cross_entropy(cost, label)
                avg_loss = fluid.layers.mean(loss)

                avg_loss.backward()

                adam.minimize(avg_loss)
                # reset gradients before the next step
                mnist.clear_gradients()

                if batch_id % 100 == 0:
                    print("Loss at epoch {} step {}: {:}".format(
                        epoch, batch_id, avg_loss.numpy()))

            # switch to prediction mode for testing, then back to training mode
            mnist.eval()
            test_cost, test_acc = test_mnist(test_reader, mnist, BATCH_SIZE)
            mnist.train()
            print("Loss at epoch {} , Test avg_loss is: {}, acc is: {}".format(
                epoch, test_cost, test_acc))

        # save checkpoint
        fluid.dygraph.save_dygraph(mnist.state_dict(), "paddle_dy")
        print("checkpoint saved")

        inference_mnist()
Output:
    Loss at epoch 0 step 0: [2.2991252]
    Loss at epoch 0 step 100: [0.15491392]
    Loss at epoch 0 step 200: [0.13315125]
    Loss at epoch 0 step 300: [0.10253005]
    Loss at epoch 0 step 400: [0.04266362]
    Loss at epoch 0 step 500: [0.08894891]
    Loss at epoch 0 step 600: [0.08999012]
    Loss at epoch 0 step 700: [0.12975612]
    Loss at epoch 0 step 800: [0.15257305]
    Loss at epoch 0 step 900: [0.07429226]
    Loss at epoch 0 , Test avg_loss is: 0.05995981965082674, acc is: 0.9794671474358975
    checkpoint saved
    No optimizer loaded. If you didn't save optimizer, please ignore this. The program can still work with new optimizer.
    checkpoint loaded
    Inference result of image/infer_3.png is: 3
Writing Compatible Models
Taking the handwritten digit recognition example from the previous step, the dynamic graph model code can be used unchanged as the model code in static graph mode; simply run it the usual static graph way. The example below reuses the earlier model code and runs it with the static graph Executor:
    epoch_num = 1
    BATCH_SIZE = 64
    exe = fluid.Executor(fluid.CPUPlace())

    mnist = MNIST()
    sgd = fluid.optimizer.SGDOptimizer(learning_rate=1e-3, parameter_list=mnist.parameters())
    train_reader = paddle.batch(
        paddle.dataset.mnist.train(), batch_size=BATCH_SIZE, drop_last=True)
    img = fluid.layers.data(
        name='pixel', shape=[1, 28, 28], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    cost = mnist(img)
    loss = fluid.layers.cross_entropy(cost, label)
    avg_loss = fluid.layers.mean(loss)
    sgd.minimize(avg_loss)

    out = exe.run(fluid.default_startup_program())

    for epoch in range(epoch_num):
        for batch_id, data in enumerate(train_reader()):
            static_x_data = np.array(
                [x[0].reshape(1, 28, 28)
                 for x in data]).astype('float32')
            y_data = np.array(
                [x[1] for x in data]).astype('int64').reshape([BATCH_SIZE, 1])

            fetch_list = [avg_loss.name]
            out = exe.run(
                fluid.default_main_program(),
                feed={"pixel": static_x_data,
                      "label": y_data},
                fetch_list=fetch_list)

            static_out = out[0]

            # compare integers with !=, not with "is not"
            if batch_id % 100 == 0 and batch_id != 0:
                print("epoch: {}, batch_id: {}, loss: {}".format(epoch, batch_id, static_out))