Extending PyTorch

This article covers how to extend torch.nn and torch.autograd, and how to write custom C extensions using the C libraries.

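Extending torch.autograd

Adding an operation to autograd requires implementing a new Function subclass for that operation. Recall that Function is what autograd uses to compute results and gradients and to encode the operation history. Every new function requires you to implement two methods:

  • forward() - the code that performs the operation. It can take as many arguments as you want, and some of them can be optional if you give them default values. Any kind of Python object is accepted here. Variable arguments are converted to Tensors before the call, and their use is registered in the graph. Note that this logic does not traverse lists, dicts, or any other data structures; it only considers Variables that are passed directly as arguments to the call. You can return either a single Tensor or, if there are multiple outputs, a tuple of Tensors. Also, refer to the Function documentation for descriptions of useful methods that can only be called from forward().

  • backward() - the gradient formula. It is given as many Variable arguments as there were outputs, each representing the gradient w.r.t. the corresponding output. It should return as many Variables as there were inputs, each containing the gradient w.r.t. the corresponding input. If an input did not require a gradient (see the needs_input_grad attribute) or was a non-Variable object, you can return None for it. Also, if forward() has optional arguments, you can return more gradients than there were inputs, as long as the extra ones are all None.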
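To make the counting rule concrete (one incoming gradient per output, one returned gradient per input), here is a minimal sketch. The class name SumAndProduct and its two-output design are assumptions made up for illustration; they are not part of PyTorch.

```python
import torch
from torch.autograd import Function


class SumAndProduct(Function):
    """Hypothetical op: two inputs (a, b), two outputs (a + b, a * b)."""

    @staticmethod
    def forward(ctx, a, b):
        # Keep the inputs; backward needs them for the product rule.
        ctx.save_for_backward(a, b)
        return a + b, a * b

    @staticmethod
    def backward(ctx, grad_sum, grad_prod):
        # forward() returned two outputs, so two gradients arrive here.
        a, b = ctx.saved_tensors
        grad_a = grad_sum + grad_prod * b
        grad_b = grad_sum + grad_prod * a
        # forward() took two inputs, so two gradients are returned.
        return grad_a, grad_b


a = torch.tensor([1.0, 2.0], requires_grad=True)
b = torch.tensor([3.0, 4.0], requires_grad=True)
s, p = SumAndProduct.apply(a, b)
(s.sum() + p.sum()).backward()
print(a.grad, b.grad)  # a.grad equals 1 + b, b.grad equals 1 + a
```

Both outputs feed into the loss here, so both incoming gradients are tensors of ones and the result is easy to verify by hand.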

Below is the code for the Linear function from torch.nn, with additional comments:

    from torch.autograd import Function

    # Inherit from Function
    class LinearFunction(Function):

        # Note that both forward and backward are @staticmethods
        @staticmethod
        # bias is an optional argument
        def forward(ctx, input, weight, bias=None):
            ctx.save_for_backward(input, weight, bias)
            output = input.mm(weight.t())
            if bias is not None:
                output += bias.unsqueeze(0).expand_as(output)
            return output

        # This function has only a single output, so it gets only one gradient
        @staticmethod
        def backward(ctx, grad_output):
            # This is a pattern that is very convenient - at the top of backward
            # unpack saved_tensors and initialize all gradients w.r.t. inputs to
            # None. Thanks to the fact that additional trailing Nones are
            # ignored, the return statement is simple even when the function has
            # optional inputs.
            input, weight, bias = ctx.saved_tensors
            grad_input = grad_weight = grad_bias = None

            # These needs_input_grad checks are optional and there only to
            # improve efficiency. If you want to make your code simpler, you can
            # skip them. Returning gradients for inputs that don't require it is
            # not an error.
            if ctx.needs_input_grad[0]:
                grad_input = grad_output.mm(weight)
            if ctx.needs_input_grad[1]:
                grad_weight = grad_output.t().mm(input)
            if bias is not None and ctx.needs_input_grad[2]:
                # Sum over the batch dimension; the result has shape (out_features,)
                grad_bias = grad_output.sum(0)

            return grad_input, grad_weight, grad_bias

Now, to make these custom operations easier to use, we recommend aliasing their apply method:

    linear = LinearFunction.apply
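The aliasing pattern can be tried on a smaller op. The following is a sketch only: Square is a hypothetical stand-in Function (y = x * x) defined here so the snippet runs on its own, not something PyTorch ships.

```python
import torch
from torch.autograd import Function


class Square(Function):
    """Hypothetical stand-in op: y = x * x."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return 2 * x * grad_output


# Alias apply so the op reads like an ordinary function.
square = Square.apply

x = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
y = square(x)
y.sum().backward()
print(x.grad)  # equals 2 * x
```

Calling the alias goes through apply, which is what records the op in the graph; calling Square.forward directly would bypass autograd.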

Here is an additional example of a function that is parametrized by non-Variable arguments:

    class MulConstant(Function):
        @staticmethod
        def forward(ctx, tensor, constant):
            # ctx is a context object that can be used to stash information
            # for backward computation
            ctx.constant = constant
            return tensor * constant

        @staticmethod
        def backward(ctx, grad_output):
            # We return as many input gradients as there were arguments.
            # Gradients of non-Tensor arguments to forward must be None.
            return grad_output * ctx.constant, None
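The same pattern applies to other ops with a non-tensor argument. As a sketch under stated assumptions, PowConstant below is a hypothetical Function that raises a tensor to a fixed Python-number exponent; it stores the exponent on ctx and returns None in the exponent's gradient slot.

```python
import torch
from torch.autograd import Function


class PowConstant(Function):
    """Hypothetical op: raise a tensor to a fixed Python-number exponent."""

    @staticmethod
    def forward(ctx, tensor, exponent):
        ctx.save_for_backward(tensor)  # tensors go through save_for_backward
        ctx.exponent = exponent        # plain Python values can live on ctx
        return tensor ** exponent

    @staticmethod
    def backward(ctx, grad_output):
        (tensor,) = ctx.saved_tensors
        n = ctx.exponent
        grad_tensor = grad_output * n * tensor ** (n - 1)
        # The second slot corresponds to `exponent`, which is not a tensor.
        return grad_tensor, None


x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = PowConstant.apply(x, 3)
y.sum().backward()
print(x.grad)  # equals 3 * x ** 2
```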

You will probably want to check that the backward method you implemented actually computes the correct gradients. You can do this by comparing against numerical approximations obtained with small finite differences:

    import torch
    from torch.autograd import gradcheck

    # gradcheck takes a tuple of tensors as input, checks whether the gradients
    # evaluated with these tensors are close enough to numerical
    # approximations, and returns True if they all satisfy this condition.
    # Double precision keeps the finite-difference error small.
    input = (torch.randn(20, 20, dtype=torch.double, requires_grad=True),
             torch.randn(30, 20, dtype=torch.double, requires_grad=True))
    test = gradcheck(LinearFunction.apply, input, eps=1e-6, atol=1e-4)
    print(test)
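It can help to see gradcheck reject an incorrect gradient. The sketch below uses a hypothetical BadSquare Function whose backward deliberately drops a factor of 2, and passes raise_exception=False so that gradcheck returns False instead of raising.

```python
import torch
from torch.autograd import Function, gradcheck


class BadSquare(Function):
    """Hypothetical op with a deliberately wrong backward (missing factor 2)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return x * grad_output  # wrong on purpose: should be 2 * x * grad_output


# Fixed double-precision inputs keep the check deterministic.
x = torch.tensor([0.5, -1.0, 2.0], dtype=torch.double, requires_grad=True)

# raise_exception=False makes gradcheck return False instead of raising.
bad_ok = gradcheck(BadSquare.apply, (x,), eps=1e-6, atol=1e-4,
                   raise_exception=False)
# A built-in op with a correct gradient, for comparison.
good_ok = gradcheck(torch.sin, (x,), eps=1e-6, atol=1e-4)
print(bad_ok, good_ok)
```

With the default raise_exception=True, the failing case raises an error whose message shows the numerical and analytical Jacobians, which is usually more useful while debugging.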

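Extending torch.nn

nn exports two kinds of interfaces - modules and their functional versions. You can extend it in both ways, but we recommend using modules for any kind of layer that holds parameters or buffers, and the functional form for parameter-less operations such as activation functions and pooling.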
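The split between the two interfaces might look like the following in practice. This is an illustrative sketch: TinyNet and its layer sizes are made up for the example, while nn.Linear and torch.nn.functional.relu are the standard PyTorch APIs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyNet(nn.Module):
    """Hypothetical two-layer network used only for illustration."""

    def __init__(self):
        super().__init__()
        # Layers with parameters are kept as modules so they get registered.
        self.fc1 = nn.Linear(4, 8)
        self.fc2 = nn.Linear(8, 2)

    def forward(self, x):
        # ReLU has no parameters, so the functional form is enough.
        x = F.relu(self.fc1(x))
        return self.fc2(x)


net = TinyNet()
out = net(torch.randn(3, 4))
names = [name for name, _ in net.named_parameters()]
print(out.shape, names)
```

Only the two nn.Linear layers contribute entries to named_parameters(); the functional ReLU leaves no trace in the module's state.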

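Adding a functional version of an operation is already covered in the section above.

Adding a Module

Since nn relies heavily on autograd, adding a new Module requires implementing a Function that performs the operation and computes the gradient. From here on, assume that we want to implement a Linear module and that the Function is implemented as in the listing above. Very little code is needed to add it. Two functions need to be implemented:

  • __init__ (optional) - takes arguments such as kernel sizes, numbers of features, etc., and initializes the parameters and buffers.
  • forward() - calls the Function through its apply method to perform the operation. It is very similar to the functional wrapper shown above.

This is how a Linear module can be implemented: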

    import torch
    import torch.nn as nn

    class Linear(nn.Module):
        def __init__(self, input_features, output_features, bias=True):
            super(Linear, self).__init__()
            self.input_features = input_features
            self.output_features = output_features

            # nn.Parameter is a special kind of Variable, that will get
            # automatically registered as Module's parameter once it's assigned
            # as an attribute. Parameters and buffers need to be registered, or
            # they won't appear in .parameters() (doesn't apply to buffers), and
            # won't be converted when e.g. .cuda() is called. You can use
            # .register_buffer() to register buffers.
            # nn.Parameters require gradients by default.
            self.weight = nn.Parameter(torch.empty(output_features, input_features))
            if bias:
                self.bias = nn.Parameter(torch.empty(output_features))
            else:
                # You should always register all possible parameters, but the
                # optional ones can be None if you want.
                self.register_parameter('bias', None)

            # Not a very smart way to initialize weights
            self.weight.data.uniform_(-0.1, 0.1)
            # Check the registered parameter, not the boolean constructor flag
            if self.bias is not None:
                self.bias.data.uniform_(-0.1, 0.1)

        def forward(self, input):
            # See the autograd section for explanation of what happens here.
            return LinearFunction.apply(input, self.weight, self.bias)

        def extra_repr(self):
            # (Optional) Set the extra information about this module. You can test
            # it by printing an object of this class.
            return 'input_features={}, output_features={}, bias={}'.format(
                self.input_features, self.output_features, self.bias is not None
            )
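The comments in the listing distinguish parameters from buffers. A minimal sketch of that difference follows; ScaleShift is a hypothetical module invented for this example, while nn.Parameter, register_buffer, named_parameters, named_buffers, and state_dict are standard nn.Module APIs.

```python
import torch
import torch.nn as nn


class ScaleShift(nn.Module):
    """Hypothetical module: a learnable scale plus a fixed, non-learnable offset."""

    def __init__(self, features):
        super().__init__()
        # Learnable: appears in .parameters() and receives gradients.
        self.scale = nn.Parameter(torch.ones(features))
        # Not learnable: saved in state_dict and converted by .to()/.cuda(),
        # but never returned by .parameters().
        self.register_buffer('offset', torch.zeros(features))

    def forward(self, input):
        return input * self.scale + self.offset


m = ScaleShift(3)
param_names = [n for n, _ in m.named_parameters()]
buffer_names = [n for n, _ in m.named_buffers()]
state_keys = list(m.state_dict().keys())
m = m.double()  # dtype conversion reaches both the parameter and the buffer
print(param_names, buffer_names, state_keys, m.offset.dtype)
```

A tensor assigned as a plain attribute, without nn.Parameter or register_buffer, would be skipped by both the conversion and the state_dict.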

Writing custom C extensions

Coming soon. For now, you can find some examples on GitHub.

Translator credits

Username    Role           Signature
Song        Translation    "Life should always be in pursuit of something."