Putting It All Together
It’s time to implement the process we saw in <>. In code, our process will be implemented something like this for each epoch:
for x,y in dl:
    pred = model(x)
    loss = loss_func(pred, y)
    loss.backward()
    parameters -= parameters.grad * lr
First, let’s re-initialize our parameters:
In [ ]:
weights = init_params((28*28,1))
bias = init_params(1)
A DataLoader can be created from a Dataset:
In [ ]:
dl = DataLoader(dset, batch_size=256)
xb,yb = first(dl)
xb.shape,yb.shape
Out[ ]:
(torch.Size([256, 784]), torch.Size([256, 1]))
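As a quick aside, in case the batching behavior isn't obvious, here is a small illustration of our own (not one of this section's cells) of what a DataLoader does to any indexable collection; the exact printed output may differ slightly, but the idea is that items are grouped into batches:

coll = list(range(10))
list(DataLoader(coll, batch_size=4))
# roughly: [tensor([0, 1, 2, 3]), tensor([4, 5, 6, 7]), tensor([8, 9])]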
We’ll do the same for the validation set:
In [ ]:
valid_dl = DataLoader(valid_dset, batch_size=256)
Let’s create a mini-batch of size 4 for testing:
In [ ]:
batch = train_x[:4]
batch.shape
Out[ ]:
torch.Size([4, 784])
In [ ]:
preds = linear1(batch)
preds
Out[ ]:
tensor([[-11.1002],
        [  5.9263],
        [  9.9627],
        [ -8.1484]], grad_fn=<AddBackward0>)
In [ ]:
loss = mnist_loss(preds, train_y[:4])
loss
Out[ ]:
tensor(0.5006, grad_fn=<MeanBackward0>)
Now we can calculate the gradients:
In [ ]:
loss.backward()
weights.grad.shape,weights.grad.mean(),bias.grad
Out[ ]:
(torch.Size([784, 1]), tensor(-0.0001), tensor([-0.0008]))
Let’s put that all in a function:
In [ ]:
def calc_grad(xb, yb, model):
    preds = model(xb)
    loss = mnist_loss(preds, yb)
    loss.backward()
and test it:
In [ ]:
calc_grad(batch, train_y[:4], linear1)
weights.grad.mean(),bias.grad
Out[ ]:
(tensor(-0.0002), tensor([-0.0015]))
But look what happens if we call it twice:
In [ ]:
calc_grad(batch, train_y[:4], linear1)
weights.grad.mean(),bias.grad
Out[ ]:
(tensor(-0.0003), tensor([-0.0023]))
The gradients have changed! The reason for this is that loss.backward actually adds the gradients of loss to any gradients that are currently stored. So, we have to set the current gradients to 0 first:
In [ ]:
weights.grad.zero_()
bias.grad.zero_();
note: Inplace Operations: Methods in PyTorch whose names end in an underscore modify their objects in place. For instance, bias.zero_() sets all elements of the tensor bias to 0.
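If this accumulation behavior seems surprising, here is a tiny standalone sketch of our own (not one of the book's cells, assuming torch is imported, which it is via the fastai imports used in this chapter) that demonstrates it:

x = torch.tensor(3.0, requires_grad=True)
(x * 2).backward()
print(x.grad)   # tensor(2.)
(x * 2).backward()
print(x.grad)   # tensor(4.) -- the second backward() added to the stored gradient
x.grad.zero_()
print(x.grad)   # tensor(0.)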
Our only remaining step is to update the weights and biases based on the gradient and learning rate. When we do so, we have to tell PyTorch not to take the gradient of this step too; otherwise things will get very confusing when we try to compute the derivative at the next batch! If we assign to the data attribute of a tensor, then PyTorch will not take the gradient of that step. Here's our basic training loop for an epoch:
In [ ]:
def train_epoch(model, lr, params):
    for xb,yb in dl:
        calc_grad(xb, yb, model)
        for p in params:
            p.data -= p.grad*lr
            p.grad.zero_()
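As an aside, assigning through the data attribute is the approach used here; an equivalent way to keep the update out of the computational graph is to wrap it in torch.no_grad(). The function name train_epoch_nograd in this sketch is ours, not the book's:

def train_epoch_nograd(model, lr, params):
    for xb,yb in dl:
        calc_grad(xb, yb, model)
        with torch.no_grad():   # autograd does not track updates made inside this block
            for p in params:
                p -= p.grad*lr
                p.grad.zero_()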
We also want to check how we’re doing, by looking at the accuracy of the validation set. To decide if an output represents a 3 or a 7, we can just check whether it’s greater than 0. So our accuracy for each item can be calculated (using broadcasting, so no loops!) with:
In [ ]:
(preds>0.0).float() == train_y[:4]
Out[ ]:
tensor([[False],
        [ True],
        [ True],
        [False]])
That gives us this function to calculate our validation accuracy (note that taking the sigmoid and comparing against 0.5 is equivalent to comparing the raw output against 0, since sigmoid(0) = 0.5):
In [ ]:
def batch_accuracy(xb, yb):
    preds = xb.sigmoid()
    correct = (preds>0.5) == yb
    return correct.float().mean()
We can check it works:
In [ ]:
batch_accuracy(linear1(batch), train_y[:4])
Out[ ]:
tensor(0.5000)
and then put the batches together:
In [ ]:
def validate_epoch(model):
    accs = [batch_accuracy(model(xb), yb) for xb,yb in valid_dl]
    return round(torch.stack(accs).mean().item(), 4)
In [ ]:
validate_epoch(linear1)
Out[ ]:
0.5219
That’s our starting point. Let’s train for one epoch, and see if the accuracy improves:
In [ ]:
lr = 1.
params = weights,bias
train_epoch(linear1, lr, params)
validate_epoch(linear1)
Out[ ]:
0.6883
Then do a few more:
In [ ]:
for i in range(20):
    train_epoch(linear1, lr, params)
    print(validate_epoch(linear1), end=' ')
0.8314 0.9017 0.9227 0.9349 0.9438 0.9501 0.9535 0.9564 0.9594 0.9618 0.9613 0.9638 0.9643 0.9652 0.9662 0.9677 0.9687 0.9691 0.9691 0.9696
Looking good! We’re already about at the same accuracy as our “pixel similarity” approach, and we’ve created a general-purpose foundation we can build on. Our next step will be to create an object that will handle the SGD step for us. In PyTorch, it’s called an optimizer.
Creating an Optimizer
Because this is such a general foundation, PyTorch provides some useful classes to make it easier to implement. The first thing we can do is replace our linear1 function with PyTorch's nn.Linear module. A module is an object of a class that inherits from the PyTorch nn.Module class. Objects of this class behave identically to standard Python functions, in that you can call them using parentheses and they will return the activations of a model.
nn.Linear does the same thing as our init_params and linear together. It contains both the weights and biases in a single class. Here's how we replicate our model from the previous section:
In [ ]:
linear_model = nn.Linear(28*28,1)
Every PyTorch module knows what parameters it has that can be trained; they are available through the parameters method:
In [ ]:
w,b = linear_model.parameters()
w.shape,b.shape
Out[ ]:
(torch.Size([1, 784]), torch.Size([1]))
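To demystify how nn.Linear makes its weights and bias available this way, here is a minimal sketch of our own (the class name MyLinear and its simple initialization are ours, not nn.Linear's actual implementation) showing how a hand-rolled module can register tensors via nn.Parameter so that parameters() finds them:

class MyLinear(nn.Module):
    def __init__(self, n_in, n_out):
        super().__init__()
        # wrapping tensors in nn.Parameter registers them as trainable,
        # so parameters() will return them
        self.weight = nn.Parameter(torch.randn(n_out, n_in) / n_in**0.5)
        self.bias   = nn.Parameter(torch.zeros(n_out))
    def forward(self, xb):
        return xb @ self.weight.t() + self.bias

Calling MyLinear(28*28,1).parameters() would give tensors of the same shapes as shown above.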
We can use this information to create an optimizer:
In [ ]:
class BasicOptim:
    def __init__(self,params,lr): self.params,self.lr = list(params),lr

    def step(self, *args, **kwargs):
        for p in self.params: p.data -= p.grad.data * self.lr

    def zero_grad(self, *args, **kwargs):
        for p in self.params: p.grad = None
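Note that this zero_grad simply sets each gradient to None rather than zeroing it in place; the next backward() call will then allocate a fresh gradient instead of accumulating into an old one, which has the same effect for our purposes.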
We can create our optimizer by passing in the model’s parameters:
In [ ]:
opt = BasicOptim(linear_model.parameters(), lr)
Our training loop can now be simplified to:
In [ ]:
def train_epoch(model):
    for xb,yb in dl:
        calc_grad(xb, yb, model)
        opt.step()
        opt.zero_grad()
Our validation function doesn’t need to change at all:
In [ ]:
validate_epoch(linear_model)
Out[ ]:
0.4157
Let’s put our little training loop in a function, to make things simpler:
In [ ]:
def train_model(model, epochs):
    for i in range(epochs):
        train_epoch(model)
        print(validate_epoch(model), end=' ')
The results are the same as in the previous section:
In [ ]:
train_model(linear_model, 20)
0.4932 0.8618 0.8203 0.9102 0.9331 0.9468 0.9555 0.9629 0.9658 0.9673 0.9687 0.9707 0.9726 0.9751 0.9761 0.9761 0.9775 0.978 0.9785 0.9785
fastai provides the SGD class which, by default, does the same thing as our BasicOptim:
In [ ]:
linear_model = nn.Linear(28*28,1)
opt = SGD(linear_model.parameters(), lr)
train_model(linear_model, 20)
0.4932 0.852 0.8335 0.9116 0.9326 0.9473 0.9555 0.9624 0.9648 0.9668 0.9692 0.9712 0.9731 0.9746 0.9761 0.9765 0.9775 0.978 0.9785 0.9785
fastai also provides Learner.fit, which we can use instead of train_model. To create a Learner we first need to create a DataLoaders, by passing in our training and validation DataLoaders:
In [ ]:
dls = DataLoaders(dl, valid_dl)
To create a Learner without using an application (such as cnn_learner) we need to pass in all the elements that we've created in this chapter: the DataLoaders, the model, the optimization function (which will be passed the parameters), the loss function, and optionally any metrics to print:
In [ ]:
learn = Learner(dls, nn.Linear(28*28,1), opt_func=SGD,
                loss_func=mnist_loss, metrics=batch_accuracy)
Now we can call fit:
In [ ]:
learn.fit(10, lr=lr)
epoch | train_loss | valid_loss | batch_accuracy | time |
---|---|---|---|---|
0 | 0.636857 | 0.503549 | 0.495584 | 00:00 |
1 | 0.545725 | 0.170281 | 0.866045 | 00:00 |
2 | 0.199223 | 0.184893 | 0.831207 | 00:00 |
3 | 0.086580 | 0.107836 | 0.911187 | 00:00 |
4 | 0.045185 | 0.078481 | 0.932777 | 00:00 |
5 | 0.029108 | 0.062792 | 0.946516 | 00:00 |
6 | 0.022560 | 0.053017 | 0.955348 | 00:00 |
7 | 0.019687 | 0.046500 | 0.962218 | 00:00 |
8 | 0.018252 | 0.041929 | 0.965162 | 00:00 |
9 | 0.017402 | 0.038573 | 0.967615 | 00:00 |
As you can see, there’s nothing magic about the PyTorch and fastai classes. They are just convenient pre-packaged pieces that make your life a bit easier! (They also provide a lot of extra functionality we’ll be using in future chapters.)
With these classes, we can now replace our linear model with a neural network.