DataParallel

class paddle.fluid.dygraph.DataParallel(layers, strategy) [source code]


API attribute: imperative programming mode (dynamic graph)

Runs a dynamic graph model in data parallel mode.

Currently, DataParallel only supports running dynamic graph models in multi-process mode, launched as follows:

python -m paddle.distributed.launch --selected_gpus=0,1 dynamic_graph_test.py

where the dynamic_graph_test.py script can contain, for example, the example code shown below.
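Inside the launched script, each process can query its own device and rank through fluid.dygraph.ParallelEnv, as in the short check below (an illustrative sketch only, not part of the example code):

import paddle.fluid as fluid

# Each process started by paddle.distributed.launch gets its own settings;
# ParallelEnv exposes them inside the script.
env = fluid.dygraph.ParallelEnv()
print("device id:", env.dev_id)          # GPU assigned to this process
print("trainer rank:", env.local_rank)   # index of this trainer
print("trainer count:", env.nranks)      # total number of parallel trainers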

Parameters

  • layers (Layer) - The model to be run with data parallelism.
  • strategy (ParallelStrategy) - The data parallel strategy, including the configuration of the parallel execution environment.

Returns

A Layer that supports data parallel execution.

Return type

Layer instance

Code example

import numpy as np
import paddle.fluid as fluid

place = fluid.CUDAPlace(fluid.dygraph.ParallelEnv().dev_id)
with fluid.dygraph.guard(place):
    # prepare the data parallel context
    strategy = fluid.dygraph.prepare_context()

    linear = fluid.dygraph.Linear(1, 10, act="softmax")
    adam = fluid.optimizer.AdamOptimizer(
        learning_rate=0.001, parameter_list=linear.parameters())

    # make the module become the data parallelism module
    linear = fluid.dygraph.DataParallel(linear, strategy)

    x_data = np.random.random(size=[10, 1]).astype(np.float32)
    data = fluid.dygraph.to_variable(x_data)

    hidden = linear(data)
    avg_loss = fluid.layers.mean(hidden)

    # scale the loss according to the number of trainers
    avg_loss = linear.scale_loss(avg_loss)
    avg_loss.backward()

    # collect the gradients of trainers
    linear.apply_collective_grads()

    adam.minimize(avg_loss)
    linear.clear_gradients()

scale_loss(loss)

Scales the model loss value loss. In data parallel mode, the loss must be scaled according to the number of parallel training processes (trainers).

If not running in data parallel mode, the original loss is returned unchanged.

Parameters:

  • loss (Variable) - The current loss value of the model.

Returns: the scaled loss value.

Return type: Variable
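Conceptually, the scaling divides the loss by the number of trainers so that, once the gradients are later all-reduced, the update matches single-process training on the combined batch. A minimal sketch of that behavior, assuming the trainer count is read from ParallelEnv().nranks (this is an illustration of the semantics, not the actual DataParallel implementation):

import paddle.fluid as fluid

def scale_loss_sketch(loss):
    # assumed semantics: average the loss over all parallel trainers
    trainer_count = fluid.dygraph.ParallelEnv().nranks
    if trainer_count <= 1:
        # not running in data parallel mode: return the loss unchanged
        return loss
    return fluid.layers.scale(loss, scale=1.0 / trainer_count)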

Code example

import numpy as np
import paddle.fluid as fluid

place = fluid.CUDAPlace(fluid.dygraph.ParallelEnv().dev_id)
with fluid.dygraph.guard(place):
    # prepare the data parallel context
    strategy = fluid.dygraph.prepare_context()

    linear = fluid.dygraph.Linear(1, 10, act="softmax")
    adam = fluid.optimizer.AdamOptimizer(
        learning_rate=0.001, parameter_list=linear.parameters())

    # make the module become the data parallelism module
    linear = fluid.dygraph.DataParallel(linear, strategy)

    x_data = np.random.random(size=[10, 1]).astype(np.float32)
    data = fluid.dygraph.to_variable(x_data)

    hidden = linear(data)
    avg_loss = fluid.layers.mean(hidden)

    # scale the loss according to the number of trainers
    avg_loss = linear.scale_loss(avg_loss)
    avg_loss.backward()

    # collect the gradients of trainers
    linear.apply_collective_grads()

    adam.minimize(avg_loss)
    linear.clear_gradients()

apply_collective_grads()

AllReduce (aggregate) the parameter gradients across all trainers.

Returns: None
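Combined with scale_loss, summing the per-trainer gradients makes every process apply the same averaged gradient. A toy NumPy illustration of that arithmetic for two trainers (no real collective communication happens here, and the numbers are made up):

import numpy as np

nranks = 2
# gradients each trainer computed from its own loss scaled by 1/nranks
grad_trainer_0 = np.array([0.2, -0.4])
grad_trainer_1 = np.array([0.6, 0.8])

# apply_collective_grads is assumed to sum the gradients across trainers
# (AllReduce), so every trainer ends up holding the average of the
# unscaled per-trainer gradients
allreduced = grad_trainer_0 + grad_trainer_1
print(allreduced)  # [0.8 0.4]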

Code example

import numpy as np
import paddle.fluid as fluid

place = fluid.CUDAPlace(fluid.dygraph.ParallelEnv().dev_id)
with fluid.dygraph.guard(place):
    # prepare the data parallel context
    strategy = fluid.dygraph.prepare_context()

    linear = fluid.dygraph.Linear(1, 10, act="softmax")
    adam = fluid.optimizer.AdamOptimizer(
        learning_rate=0.001, parameter_list=linear.parameters())

    # make the module become the data parallelism module
    linear = fluid.dygraph.DataParallel(linear, strategy)

    x_data = np.random.random(size=[10, 1]).astype(np.float32)
    data = fluid.dygraph.to_variable(x_data)

    hidden = linear(data)
    avg_loss = fluid.layers.mean(hidden)

    # scale the loss according to the number of trainers
    avg_loss = linear.scale_loss(avg_loss)
    avg_loss.backward()

    # collect the gradients of trainers
    linear.apply_collective_grads()

    adam.minimize(avg_loss)
    linear.clear_gradients()