RecomputeOptimizer

Note: This API only supports static graph mode.

  • class paddle.fluid.optimizer.RecomputeOptimizer(optimizer) [source]

Generally speaking, a deep learning training workflow contains three sub-steps: first, run the forward operators to compute the values of the Variables and the loss; second, run the backward operators to compute the gradients of the parameters; finally, apply an optimization algorithm to update the parameter values.
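In static-graph mode these sub-steps correspond to optimizer calls; a minimal sketch using the generic fluid Optimizer interface (build_network is a hypothetical helper that defines the forward graph and returns the loss Variable):

import paddle.fluid as fluid

loss = build_network()                    # 1. forward: define Variables and the loss (hypothetical helper)
opt = fluid.optimizer.Adam(learning_rate=0.01)
params_grads = opt.backward(loss)         # 2. backward: returns a list of (param, grad) pairs
opt.apply_gradients(params_grads)         # 3. apply the update rule to the parameters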

During the forward pass, every Variable that the backward pass will need is kept in memory; when the model is very deep, this consumes a large amount of memory.

Recompute splits the network into k parts (segments). In each segment, the backward pass first reruns the segment's forward computation. In recompute mode, the forward pass frees all temporary Variables except the checkpoints and a few special Variables that must stay in memory, which saves a great deal of memory.

The Variables that split the network into k segments are called checkpoints. Users must set the checkpoints before running RecomputeOptimizer; a conceptual sketch follows.
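To make the segment/checkpoint idea concrete, here is a framework-agnostic sketch of the strategy (hypothetical pseudocode, not Paddle's actual implementation; run_forward and run_backward are stand-ins for running one segment's operators):

def train_step_with_recompute(segments, x, grad_loss):
    # Forward pass: keep only each segment's output (the checkpoints).
    # Temporary Variables inside a segment are freed once it has run.
    checkpoints = [x]
    for seg in segments:
        x = run_forward(seg, x, keep_temporaries=False)
        checkpoints.append(x)

    # Backward pass: walk the segments in reverse. Rerun each segment's
    # forward from its input checkpoint to rebuild the activations its
    # backward pass needs, use them, then let them be freed again.
    grad = grad_loss
    for seg, ckpt in zip(reversed(segments), reversed(checkpoints[:-1])):
        activations = run_forward(seg, ckpt, keep_temporaries=True)
        grad = run_backward(seg, activations, grad)
    return grad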

  • Parameters:
    • optimizer (Optimizer) - the inner optimizer

Code Example

import paddle.fluid as fluid
import numpy as np

def gen_data():
    return {"x": np.random.random(size=(32, 32)).astype('float32'),
            "y": np.random.randint(2, size=(32, 1)).astype('int64')}

def mlp(input_x, input_y, hid_dim=128, label_dim=2):
    fc_1 = fluid.layers.fc(input=input_x, size=hid_dim)
    prediction = fluid.layers.fc(input=[fc_1], size=label_dim, act='softmax')
    cost = fluid.layers.cross_entropy(input=prediction, label=input_y)
    sum_cost = fluid.layers.reduce_mean(cost)
    return sum_cost, fc_1, prediction

input_x = fluid.layers.data(name="x", shape=[32], dtype='float32')
input_y = fluid.layers.data(name="y", shape=[1], dtype='int64')
cost, fc_1, pred = mlp(input_x, input_y)

# Wrap the inner Adam optimizer with RecomputeOptimizer and set the
# checkpoints before calling minimize.
sgd = fluid.optimizer.Adam(learning_rate=0.01)
sgd = fluid.optimizer.RecomputeOptimizer(sgd)
sgd._set_checkpoints([fc_1, pred])
sgd.minimize(cost)

print("Finished optimize")
place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
step = 10

for i in range(step):
    cost_val = exe.run(feed=gen_data(),
                       program=fluid.default_main_program(),
                       fetch_list=[cost.name])
    print("step=%d cost=%f" % (i, cost_val[0]))
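The minimize call above bundles two steps that are documented separately below: backward, which builds the recompute-aware backward pass and returns the (param, grad) list, and apply_optimize, which appends the update operators. If you need the intermediate gradients, the same update can be written explicitly; a sketch reusing the variables from the example above:

# Instead of sgd.minimize(cost):
params_grads = sgd.backward(
    cost,
    startup_program=None,
    parameter_list=None,
    no_grad_set=None,
    checkpoints=[fc_1, pred])
optimize_ops = sgd.apply_optimize(
    cost, startup_program=None, params_grads=params_grads)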
  • apply_gradients(params_grads)

Calls the apply_gradients function of the wrapped self._optimizer to append optimization operators for the given (param, grad) pairs.

  • Parameters:
    • params_grads (list) - a list of (param, grad) pairs to optimize

Returns: a list of optimization operators appended to the current Program

Return type: list

Code Example

import paddle.fluid as fluid
import paddle.fluid.framework as framework

def mlp(input_x, input_y, hid_dim=128, label_dim=2):
    fc_1 = fluid.layers.fc(input=input_x, size=hid_dim)
    prediction = fluid.layers.fc(input=[fc_1], size=label_dim, act='softmax')
    cost = fluid.layers.cross_entropy(input=prediction, label=input_y)
    sum_cost = fluid.layers.reduce_mean(cost)
    return sum_cost, fc_1, prediction

input_x = fluid.layers.data(name="x", shape=[32], dtype='float32')
input_y = fluid.layers.data(name="y", shape=[1], dtype='int64')
cost, fc_1, pred = mlp(input_x, input_y)
print("Finished FF")

sgd = fluid.optimizer.Adam(learning_rate=0.01)
sgd = fluid.optimizer.RecomputeOptimizer(sgd)

# Compute the (param, grad) pairs with the recompute-aware backward pass.
params_grads = sgd.backward(
    cost,
    startup_program=None,
    parameter_list=None,
    no_grad_set=None,
    checkpoints=[fc_1, pred])

program = cost.block.program
with framework.program_guard(program, None):
    optimize_ops = sgd.apply_gradients(params_grads)

print("Finished apply gradients")
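Note that the example wraps the apply_gradients call in framework.program_guard so that the optimization operators are appended to the Program that owns the loss, rather than to whatever the current default Program happens to be.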
  • apply_optimize(loss, startup_program, params_grads)

Calls the apply_optimize function of self._optimizer.

  • Parameters:
    • loss (Variable) – the loss Variable for the optimization process
    • startup_program (Program) – the startup_program that initializes the parameters in parameter_list
    • params_grads (list) - a list of (param, grad) pairs to optimize

Returns: a list of operators appended to the current Program

Return type: list

Code Example

import paddle.fluid as fluid

def mlp(input_x, input_y, hid_dim=128, label_dim=2):
    fc_1 = fluid.layers.fc(input=input_x, size=hid_dim)
    prediction = fluid.layers.fc(input=[fc_1], size=label_dim, act='softmax')
    cost = fluid.layers.cross_entropy(input=prediction, label=input_y)
    sum_cost = fluid.layers.reduce_mean(cost)
    return sum_cost, fc_1, prediction

input_x = fluid.layers.data(name="x", shape=[32], dtype='float32')
input_y = fluid.layers.data(name="y", shape=[1], dtype='int64')
cost, fc_1, pred = mlp(input_x, input_y)
print("Finished FF")

sgd = fluid.optimizer.Adam(learning_rate=0.01)
sgd = fluid.optimizer.RecomputeOptimizer(sgd)
params_grads = sgd.backward(
    cost,
    startup_program=None,
    parameter_list=None,
    no_grad_set=None,
    checkpoints=[fc_1, pred])

# Append the update operators for the computed (param, grad) pairs.
optimize_ops = sgd.apply_optimize(
    cost, startup_program=None, params_grads=params_grads)

print("Finished apply_optimize")
  • backward(loss, startup_program=None, parameter_list=None, no_grad_set=None, callbacks=None, checkpoints=None)

The backward function with checkpoint support: the forward activations needed in each segment are recomputed from the checkpoints rather than kept in memory.

  • Parameters:
    • loss (Variable) – the loss Variable to minimize
    • startup_program (Program, optional) – the Program that initializes the parameters in parameter_list. Defaults to None, in which case default_startup_program is used
    • parameter_list (list, optional) – a list of Parameters or Parameter.names to update. Defaults to None, in which case all Parameters are updated
    • no_grad_set (set, optional) – a set of Parameters or Parameter.names that should not be updated. Defaults to None
    • callbacks (list, optional) – a list of callables to run when appending the backward operators for a parameter
    • checkpoints (list, optional) – a list of Variables to use as checkpoints

Returns: a list of (param, grad) pairs, where param is a parameter and grad is its corresponding gradient

Return type: list

Code Example

import paddle.fluid as fluid

def mlp(input_x, input_y, hid_dim=128, label_dim=2):
    fc_1 = fluid.layers.fc(input=input_x, size=hid_dim)
    prediction = fluid.layers.fc(input=[fc_1], size=label_dim, act='softmax')
    cost = fluid.layers.cross_entropy(input=prediction, label=input_y)
    sum_cost = fluid.layers.reduce_mean(cost)
    return sum_cost, fc_1, prediction

input_x = fluid.layers.data(name="x", shape=[32], dtype='float32')
input_y = fluid.layers.data(name="y", shape=[1], dtype='int64')
cost, fc_1, pred = mlp(input_x, input_y)
print("Finished FF")

sgd = fluid.optimizer.Adam(learning_rate=0.01)
sgd = fluid.optimizer.RecomputeOptimizer(sgd)

# Build the backward pass, recomputing from the given checkpoints.
params_grads = sgd.backward(
    cost,
    startup_program=None,
    parameter_list=None,
    no_grad_set=None,
    checkpoints=[fc_1, pred])
print("Finished backward")
  • load(stat_dict)

RecomputeOptimizer does not currently support the load function; calling it raises NotImplementedError.

  • Parameters:
    • stat_dict – the dict loaded by the load_persistables method

Code Example

import paddle.fluid as fluid
import paddle.compat as cpt

def mlp(input_x, input_y, hid_dim=128, label_dim=2):
    fc_1 = fluid.layers.fc(input=input_x, size=hid_dim)
    prediction = fluid.layers.fc(input=[fc_1], size=label_dim, act='softmax')
    cost = fluid.layers.cross_entropy(input=prediction, label=input_y)
    sum_cost = fluid.layers.reduce_mean(cost)
    return sum_cost, fc_1, prediction

input_x = fluid.layers.data(name="x", shape=[32], dtype='float32')
input_y = fluid.layers.data(name="y", shape=[1], dtype='int64')
cost, fc_1, pred = mlp(input_x, input_y)
print("Finished FF")

sgd = fluid.optimizer.Adam(learning_rate=0.01)
sgd = fluid.optimizer.RecomputeOptimizer(sgd)
sgd._set_checkpoints([fc_1, pred])

# load is not implemented for RecomputeOptimizer, so this raises
# NotImplementedError.
try:
    stat_dict = {}
    sgd.load(stat_dict)
except NotImplementedError as e:
    print(cpt.get_exception_message(e))