BuildStrategy

class paddle.static.BuildStrategy

BuildStrategy gives users finer control over how the computation graph is built in ParallelExecutor. This is done by setting the BuildStrategy member in ParallelExecutor.

Returns

BuildStrategy, an instance of BuildStrategy.

Code example

    import os
    import paddle
    import paddle.static as static

    paddle.enable_static()

    # Run on two CPU devices.
    os.environ['CPU_NUM'] = str(2)
    places = static.cpu_places()

    data = static.data(name="x", shape=[None, 1], dtype="float32")
    hidden = static.nn.fc(x=data, size=10)
    loss = paddle.mean(hidden)
    paddle.optimizer.SGD(learning_rate=0.01).minimize(loss)

    # Configure how the graph is built, then compile for data parallelism.
    build_strategy = static.BuildStrategy()
    build_strategy.enable_inplace = True
    build_strategy.memory_optimize = True
    build_strategy.reduce_strategy = static.BuildStrategy.ReduceStrategy.Reduce
    program = static.CompiledProgram(static.default_main_program())
    program = program.with_data_parallel(loss_name=loss.name,
                                         build_strategy=build_strategy,
                                         places=places)

debug_graphviz_path

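str. The path of the file to which the computation graph is written in graphviz format, which is useful for debugging. Defaults to an empty string.

Code example

    import paddle
    import paddle.static as static

    paddle.enable_static()

    build_strategy = static.BuildStrategy()
    # Write the graph in graphviz format to this path.
    build_strategy.debug_graphviz_path = "./graph"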

enable_sequential_execution

bool. If set to True, operators are executed in the same order in which they were defined. Defaults to False.

Code example

    import paddle
    import paddle.static as static

    paddle.enable_static()

    build_strategy = static.BuildStrategy()
    # Execute operators in their definition order.
    build_strategy.enable_sequential_execution = True

fuse_broadcast_ops

bool. Whether to fuse broadcast ops. This option only takes effect in Reduce mode, where it makes the program run faster. Defaults to False.

Code example

    import paddle
    import paddle.static as static

    paddle.enable_static()

    build_strategy = static.BuildStrategy()
    # Only takes effect when reduce_strategy is Reduce.
    build_strategy.fuse_broadcast_ops = True

fuse_elewise_add_act_ops

bool. Whether to fuse elementwise_add_op and activation_op, which makes overall execution faster. Defaults to False.

Code example

    import paddle
    import paddle.static as static

    paddle.enable_static()

    build_strategy = static.BuildStrategy()
    # Fuse elementwise_add with the activation that follows it.
    build_strategy.fuse_elewise_add_act_ops = True
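
To illustrate what fusing an elementwise add with an activation means, the following pure-Python sketch compares an unfused and a fused computation. It is only a conceptual illustration and not Paddle's implementation: the function names are hypothetical, and ReLU is assumed as the activation.

```python
def relu(value):
    """ReLU activation for a single number."""
    return value if value > 0.0 else 0.0


def add_then_relu_unfused(x, y):
    # Two passes: the sum is stored in an intermediate list
    # before the activation is applied.
    summed = [a + b for a, b in zip(x, y)]
    return [relu(s) for s in summed]


def add_relu_fused(x, y):
    # One pass: add and activate each element without an intermediate list.
    return [relu(a + b) for a, b in zip(x, y)]


x = [1.0, -2.0, 3.0]
y = [0.5, 1.0, -4.0]
# Both versions compute the same values; only the number of passes differs.
assert add_then_relu_unfused(x, y) == add_relu_fused(x, y)
```

The fused version produces the same result while avoiding the intermediate buffer and the second pass, which is the kind of saving this option aims for.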

fuse_relu_depthwise_conv

bool. Whether to fuse relu and depthwise_conv2d, which saves GPU memory and may speed up execution. This option only applies to GPU devices. Defaults to False.

Code example

    import paddle
    import paddle.static as static

    paddle.enable_static()

    build_strategy = static.BuildStrategy()
    # Only applies to GPU devices.
    build_strategy.fuse_relu_depthwise_conv = True

gradient_scale_strategy

paddle.static.BuildStrategy.GradientScaleStrategy. ParallelExecutor offers three ways to define the gradient of the loss ( loss@grad ): CoeffNumDevice, One, and Customized. By default, ParallelExecutor sets loss@grad according to the number of devices. To define loss@grad yourself, choose Customized. Defaults to CoeffNumDevice.

Code example

    import numpy
    import os
    import paddle
    import paddle.static as static

    paddle.enable_static()

    use_cuda = True
    place = paddle.CUDAPlace(0) if use_cuda else paddle.CPUPlace()
    exe = static.Executor(place)

    # NOTE: When running on CPU, set CPU_NUM explicitly. Otherwise Paddle
    # uses the number of logical cores as CPU_NUM, and the batch size of
    # the input must then be greater than CPU_NUM, or the process fails
    # with an exception.
    if not use_cuda:
        os.environ['CPU_NUM'] = str(2)
        places = static.cpu_places()
    else:
        places = static.cuda_places()

    data = static.data(name='X', shape=[None, 1], dtype='float32')
    hidden = static.nn.fc(x=data, size=10)
    loss = paddle.mean(hidden)
    paddle.optimizer.SGD(learning_rate=0.01).minimize(loss)

    exe.run(static.default_startup_program())

    build_strategy = static.BuildStrategy()
    build_strategy.gradient_scale_strategy = (
        static.BuildStrategy.GradientScaleStrategy.Customized)
    compiled_prog = static.CompiledProgram(
        static.default_main_program()).with_data_parallel(
            loss_name=loss.name, build_strategy=build_strategy,
            places=places)

    dev_count = len(places)
    x = numpy.random.random(size=(10, 1)).astype('float32')
    # With Customized, feed loss@grad yourself: one value per device.
    loss_grad = numpy.ones((dev_count)).astype("float32") * 0.01
    loss_grad_name = loss.name + "@GRAD"
    loss_data = exe.run(compiled_prog,
                        feed={"X": x, loss_grad_name: loss_grad},
                        fetch_list=[loss.name, loss_grad_name])
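
The difference between the three strategies can be sketched in plain Python. This is a conceptual illustration only, not Paddle code: `initial_loss_grad` is a hypothetical helper, and it assumes that CoeffNumDevice seeds loss@grad with 1 / (number of devices) on each device and that One seeds it with 1.

```python
def initial_loss_grad(strategy, num_devices, customized=None):
    """Return the assumed initial loss@grad value for each device."""
    if strategy == "CoeffNumDevice":
        # Assumption: scale by the device count so that summing the
        # per-device gradients yields an average.
        return [1.0 / num_devices] * num_devices
    if strategy == "One":
        # Assumption: every device starts from a gradient of 1.
        return [1.0] * num_devices
    if strategy == "Customized":
        # The user supplies one value per device, as in the feed above.
        if customized is None or len(customized) != num_devices:
            raise ValueError("Customized needs one loss@grad value per device")
        return list(customized)
    raise ValueError("unknown strategy: " + strategy)


print(initial_loss_grad("CoeffNumDevice", 4))
print(initial_loss_grad("One", 2))
print(initial_loss_grad("Customized", 2, customized=[0.01, 0.01]))
```

In the Paddle example above, the Customized values correspond to the `loss_grad` array fed under the name `loss.name + "@GRAD"`.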

memory_optimize

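bool or None. True enables an optimization that reduces total memory consumption, False disables it, and None lets the framework decide automatically whether to use it. Currently, None means the optimization is used when GC (garbage collection) cannot be used. Defaults to None.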
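
The three-state behaviour described above can be summarised with a small sketch. The helper `memory_optimize_enabled` and its `gc_available` argument are hypothetical names used for illustration; they are not part of Paddle's API.

```python
def memory_optimize_enabled(setting, gc_available):
    """Resolve the tri-state memory_optimize setting to a plain bool."""
    if setting is None:
        # None: the optimization is used only when GC cannot be used.
        return not gc_available
    # True or False: the explicit user choice wins.
    return bool(setting)


assert memory_optimize_enabled(True, gc_available=True) is True
assert memory_optimize_enabled(False, gc_available=False) is False
assert memory_optimize_enabled(None, gc_available=False) is True
assert memory_optimize_enabled(None, gc_available=True) is False
```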

reduce_strategy

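static.BuildStrategy.ReduceStrategy. ParallelExecutor has two strategies for aggregating parameter gradients: AllReduce and Reduce. Choose AllReduce if parameters should be updated independently on every device. With Reduce, the optimization of the parameters is distributed evenly across the devices, and the optimized parameters are then broadcast to the other devices. Defaults to AllReduce.

Code example

    import paddle
    import paddle.static as static

    paddle.enable_static()

    build_strategy = static.BuildStrategy()
    # Split parameter updates across devices, then broadcast the results.
    build_strategy.reduce_strategy = static.BuildStrategy.ReduceStrategy.Reduce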
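
The two strategies can be contrasted with a small simulation in plain Python. This is a sketch under stated assumptions, not Paddle's implementation: the function names are hypothetical, gradients are simply summed across devices, plain SGD is used as the update rule, and parameters are assigned to devices round-robin.

```python
def all_reduce_step(params, grads_per_device, lr):
    """AllReduce: every device gets the full gradient sum and updates all parameters."""
    num_devices = len(grads_per_device)
    summed = [sum(g[i] for g in grads_per_device) for i in range(len(params))]
    # Each device performs the same full update independently.
    return [[p - lr * s for p, s in zip(params, summed)]
            for _ in range(num_devices)]


def reduce_step(params, grads_per_device, lr):
    """Reduce: each parameter is updated on one owner device, then broadcast."""
    num_devices = len(grads_per_device)
    updated = list(params)
    for i in range(len(params)):
        owner = i % num_devices  # assumed round-robin assignment of parameter i
        # The gradient of parameter i is reduced onto its owner device only.
        reduced = sum(g[i] for g in grads_per_device)
        # Only the owner runs the optimizer step for parameter i.
        updated[i] = params[i] - lr * reduced
    # Broadcast: every device receives a copy of the updated parameters.
    return [list(updated) for _ in range(num_devices)]


params = [1.0, 2.0]
grads = [[0.5, 1.0], [1.5, -1.0]]  # one gradient list per device
assert all_reduce_step(params, grads, lr=0.5) == reduce_step(params, grads, lr=0.5)
```

Both functions end with identical parameters on every device. The difference lies in where the update work happens: replicated on every device for AllReduce, and divided among the devices for Reduce.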

remove_unnecessary_lock

bool. When set to True, some lock operations in GPU operations are removed, so ParallelExecutor runs faster. Defaults to True.

Code example

    import paddle
    import paddle.static as static

    paddle.enable_static()

    build_strategy = static.BuildStrategy()
    # True is already the default; shown here for completeness.
    build_strategy.remove_unnecessary_lock = True

sync_batch_norm

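bool. Whether to use synchronized batch normalization, that is, synchronizing the mean and variance across multiple devices during training. The current implementation does not support FP16 training or CPU, and synchronization is only supported within a single machine. Defaults to False.

Code example

    import paddle
    import paddle.static as static

    paddle.enable_static()

    build_strategy = static.BuildStrategy()
    # Synchronize batch-norm statistics across devices during training.
    build_strategy.sync_batch_norm = True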
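
To show why synchronizing the statistics matters, the sketch below compares per-device statistics with statistics computed over all devices together. It is a conceptual illustration in plain Python with hypothetical function names. It pools all values directly, whereas a real implementation would typically exchange partial sums between devices.

```python
def mean_and_variance(values):
    """Population mean and variance of a list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


def synced_stats(device_batches):
    """Statistics over the samples of all devices, as synchronized batch norm uses."""
    pooled = [v for batch in device_batches for v in batch]
    return mean_and_variance(pooled)


device_batches = [[1.0, 3.0], [5.0, 7.0]]  # one mini-batch per device

# Without synchronization, each device normalizes with its own statistics.
local_stats = [mean_and_variance(batch) for batch in device_batches]
assert local_stats == [(2.0, 1.0), (6.0, 1.0)]

# With synchronization, all devices share the statistics of the whole batch.
assert synced_stats(device_batches) == (4.0, 5.0)
```

With small per-device batches, the local statistics can differ noticeably from those of the full batch, which is the situation synchronized batch normalization addresses.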