Module and Parameter
To create a model, we’ll need Module. To create Module, we’ll need Parameter, so let’s start there. Recall that in <> we said that the Parameter class “doesn’t actually add any functionality (other than automatically calling requires_grad_ for us). It’s only used as a ‘marker’ to show what to include in parameters.” Here’s a definition which does exactly that:
In [ ]:
class Parameter(Tensor):
    def __new__(self, x): return Tensor._make_subclass(Parameter, x, True)
    def __init__(self, *args, **kwargs): self.requires_grad_()
The implementation here is a bit awkward: we have to define the special __new__ Python method and use the internal PyTorch method _make_subclass because, as at the time of writing, PyTorch doesn’t otherwise work correctly with this kind of subclassing or provide an officially supported API to do this. This may have been fixed by the time you read this, so look on the book’s website to see if there are updated details.
Our Parameter now behaves just like a tensor, as we wanted:
In [ ]:
Parameter(tensor(3.))
Out[ ]:
tensor(3., requires_grad=True)
Now that we have this, we can define Module:
In [ ]:
class Module:
    def __init__(self):
        self.hook,self.params,self.children,self._training = None,[],[],False

    def register_parameters(self, *ps): self.params += ps
    def register_modules   (self, *ms): self.children += ms

    @property
    def training(self): return self._training
    @training.setter
    def training(self,v):
        self._training = v
        for m in self.children: m.training=v

    def parameters(self):
        return self.params + sum([m.parameters() for m in self.children], [])

    def __setattr__(self,k,v):
        super().__setattr__(k,v)
        if isinstance(v,Parameter): self.register_parameters(v)
        if isinstance(v,Module):    self.register_modules(v)

    def __call__(self, *args, **kwargs):
        res = self.forward(*args, **kwargs)
        if self.hook is not None: self.hook(res, args)
        return res

    def cuda(self):
        for p in self.parameters(): p.data = p.data.cuda()
The key functionality is in the definition of parameters:
self.params + sum([m.parameters() for m in self.children], [])
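The sum(..., []) idiom may look odd: starting from an empty list, it simply concatenates the per-child parameter lists into one flat list. A quick aside (not from the book) to illustrate:
# sum with an empty list as the start value concatenates lists rather than adding numbers
print(sum([[1, 2], [3], [4, 5]], []))   # [1, 2, 3, 4, 5]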
This means that we can ask any Module for its parameters, and it will return them, including for all its child modules (recursively). But how does it know what its parameters are? It’s thanks to implementing Python’s special __setattr__ method, which is called for us any time Python sets an attribute on a class. Our implementation includes this line:
if isinstance(v,Parameter): self.register_parameters(v)
As you see, this is where we use our new Parameter class as a “marker”—anything of this class is added to our params.
Python’s __call__ allows us to define what happens when our object is treated as a function; we just call forward (which doesn’t exist here, so it’ll need to be added by subclasses). After it returns, we call a hook on the result, if one is defined. Now you can see that PyTorch hooks aren’t doing anything fancy at all—they’re just calling any hooks that have been registered.
Other than these pieces of functionality, our Module also provides a cuda method and a training attribute, which we’ll use shortly.
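For instance, because the training setter loops over the registered children, setting it on a parent module flips every submodule too. Here’s a minimal sketch (the Parent and Child classes are made up purely for illustration):
class Child(Module):
    def forward(self, x): return x

class Parent(Module):
    def __init__(self):
        super().__init__()
        self.child = Child()   # registered as a child module via __setattr__

p = Parent()
p.training = True              # the setter propagates to registered children
print(p.child.training)        # True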
Now we can create our first Module, which is ConvLayer:
In [ ]:
class ConvLayer(Module):
    def __init__(self, ni, nf, stride=1, bias=True, act=True):
        super().__init__()
        self.w = Parameter(torch.zeros(nf,ni,3,3))
        self.b = Parameter(torch.zeros(nf)) if bias else None
        self.act,self.stride = act,stride
        init = nn.init.kaiming_normal_ if act else nn.init.xavier_normal_
        init(self.w)

    def forward(self, x):
        x = F.conv2d(x, self.w, self.b, stride=self.stride, padding=1)
        if self.act: x = F.relu(x)
        return x
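For reference, here’s a rough sketch (assuming a 3×3 kernel and the notebook’s usual torch/F imports; this is not the book’s questionnaire solution) of how a convolution can be expressed with unfold:
# A rough sketch of conv2d via unfold (assumes a 3x3 kernel)
def conv2d_unfold(x, w, b=None, stride=1, padding=1):
    bs,ci,h,wd = x.shape
    co = w.shape[0]
    # unfold extracts every 3x3 patch as a column: (bs, ci*9, n_patches)
    cols = F.unfold(x, kernel_size=3, stride=stride, padding=padding)
    out = w.view(co,-1) @ cols                     # (bs, co, n_patches)
    if b is not None: out = out + b[None,:,None]   # add per-channel bias
    oh = (h  + 2*padding - 3)//stride + 1
    ow = (wd + 2*padding - 3)//stride + 1
    return out.view(bs, co, oh, ow)
Comparing its output to F.conv2d on a random tensor is a good way to check it.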
We’re not implementing F.conv2d from scratch, since you should have already done that (using unfold, much as in the sketch above) in the questionnaire in <>. Instead, we’re just creating a small class that wraps it up along with bias and weight initialization. Let’s check that it works correctly with Module.parameters:
In [ ]:
l = ConvLayer(3, 4)
len(l.parameters())
Out[ ]:
2
And that we can call it (which will result in forward being called):
In [ ]:
xbt = tfm_x(xb)
r = l(xbt)
r.shape
Out[ ]:
torch.Size([128, 4, 64, 64])
In the same way, we can implement Linear:
In [ ]:
class Linear(Module):
    def __init__(self, ni, nf):
        super().__init__()
        self.w = Parameter(torch.zeros(nf,ni))
        self.b = Parameter(torch.zeros(nf))
        nn.init.xavier_normal_(self.w)

    def forward(self, x): return x@self.w.t() + self.b
and test if it works:
In [ ]:
l = Linear(4,2)
r = l(torch.ones(3,4))
r.shape
Out[ ]:
torch.Size([3, 2])
Let’s also create a testing module to check that if we include multiple parameters as attributes, they are all correctly registered:
In [ ]:
class T(Module):
    def __init__(self):
        super().__init__()
        self.c,self.l = ConvLayer(3,4),Linear(4,2)
Since we have a conv layer and a linear layer, each of which has weights and biases, we’d expect four parameters in total:
In [ ]:
t = T()
len(t.parameters())
Out[ ]:
4
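If you want to confirm exactly which tensors were collected, you can inspect their shapes; from the definitions above we’d expect (4,3,3,3) and (4,) from the conv layer, and (2,4) and (2,) from the linear layer (this little check isn’t in the book):
# weight and bias of ConvLayer(3,4), then weight and bias of Linear(4,2)
print([p.shape for p in t.parameters()])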
We should also find that calling cuda on this class puts all these parameters on the GPU:
In [ ]:
t.cuda()
t.l.w.device
Out[ ]:
device(type='cuda', index=5)
We can now use those pieces to create a CNN.
Simple CNN
As we’ve seen, a Sequential class makes many architectures easier to implement, so let’s make one:
In [ ]:
class Sequential(Module):
    def __init__(self, *layers):
        super().__init__()
        self.layers = layers
        self.register_modules(*layers)

    def forward(self, x):
        for l in self.layers: x = l(x)
        return x
The forward method here just calls each layer in turn. Note that we have to use the register_modules method we defined in Module, since otherwise the contents of layers won’t appear in parameters.
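To see why, here’s a small sketch (the BrokenSequential name is made up for illustration): because our __setattr__ only checks whether the value itself is a Parameter or a Module, modules hidden inside a tuple are never registered unless we do it explicitly:
class BrokenSequential(Module):
    def __init__(self, *layers):
        super().__init__()
        self.layers = layers   # just a tuple, so __setattr__ registers nothing

    def forward(self, x):
        for l in self.layers: x = l(x)
        return x

broken = BrokenSequential(ConvLayer(3,4), Linear(4,2))
print(len(broken.parameters()))   # 0 -- no parameters were registered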
important: All The Code is Here: Remember that we’re not using any PyTorch functionality for modules here; we’re defining everything ourselves. So if you’re not sure what register_modules does, or why it’s needed, have another look at our code for Module to see what we wrote!
We can create a simplified AdaptivePool that only handles pooling to a 1×1 output, and flattens it as well, by just using mean:
In [ ]:
class AdaptivePool(Module):
    def forward(self, x): return x.mean((2,3))
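Taking the mean over the last two dimensions is equivalent to PyTorch’s adaptive average pooling to a 1×1 output followed by a flatten; here’s a quick sanity check you could run (not from the book):
# mean over the spatial dims == adaptive average pool to 1x1, then flatten
x = torch.randn(2, 8, 5, 5)
a = x.mean((2,3))
b = F.adaptive_avg_pool2d(x, 1).view(2, 8)
print(torch.allclose(a, b))   # True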
That’s enough for us to create a CNN!
In [ ]:
def simple_cnn():
    return Sequential(
        ConvLayer(3 ,16 ,stride=2), #32
        ConvLayer(16,32 ,stride=2), #16
        ConvLayer(32,64 ,stride=2), # 8
        ConvLayer(64,128,stride=2), # 4
        AdaptivePool(),
        Linear(128, 10)
    )
Let’s see if our parameters are all being registered correctly (we’d expect two for each of the four convolutional layers, plus two for the final linear layer, ten in total):
In [ ]:
m = simple_cnn()
len(m.parameters())
Out[ ]:
10
Now we can try adding a hook. Note that we’ve only left room for one hook in Module; you could make it a list, or use something like Pipeline to run a few as a single function:
In [ ]:
def print_stats(outp, inp): print (outp.mean().item(),outp.std().item())
for i in range(4): m.layers[i].hook = print_stats
r = m(xbt)
r.shape
0.5239089727401733 0.8776043057441711
0.43470510840415955 0.8347987532615662
0.4357188045978546 0.7621666193008423
0.46562111377716064 0.7416611313819885
Out[ ]:
torch.Size([128, 10])
We have data and model. Now we need a loss function.