Putting It All Together
It’s time to implement the process we saw in <>. In code, our process will be implemented something like this for each epoch:
for x,y in dl:
    pred = model(x)
    loss = loss_func(pred, y)
    loss.backward()
    parameters -= parameters.grad * lr
First, let’s re-initialize our parameters:
In [ ]:
weights = init_params((28*28,1))
bias = init_params(1)
A DataLoader can be created from a Dataset:
In [ ]:
dl = DataLoader(dset, batch_size=256)
xb,yb = first(dl)
xb.shape,yb.shape
Out[ ]:
(torch.Size([256, 784]), torch.Size([256, 1]))
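As a quick aside, in case the batching behavior isn't obvious, here is a small illustration of our own (not one of this section's cells) of what a DataLoader does to any indexable collection; the exact printed output may differ slightly, but the idea is that items are grouped into batches:

coll = list(range(10))
list(DataLoader(coll, batch_size=4))
# roughly: [tensor([0, 1, 2, 3]), tensor([4, 5, 6, 7]), tensor([8, 9])]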
We’ll do the same for the validation set:
In [ ]:
valid_dl = DataLoader(valid_dset, batch_size=256)
Let’s create a mini-batch of size 4 for testing:
In [ ]:
batch = train_x[:4]
batch.shape
Out[ ]:
torch.Size([4, 784])
In [ ]:
preds = linear1(batch)
preds
Out[ ]:
tensor([[-11.1002],
        [  5.9263],
        [  9.9627],
        [ -8.1484]], grad_fn=<AddBackward0>)
In [ ]:
loss = mnist_loss(preds, train_y[:4])
loss
Out[ ]:
tensor(0.5006, grad_fn=<MeanBackward0>)
Now we can calculate the gradients:
In [ ]:
loss.backward()
weights.grad.shape,weights.grad.mean(),bias.grad
Out[ ]:
(torch.Size([784, 1]), tensor(-0.0001), tensor([-0.0008]))
Let’s put that all in a function:
In [ ]:
def calc_grad(xb, yb, model):
    preds = model(xb)
    loss = mnist_loss(preds, yb)
    loss.backward()
and test it:
In [ ]:
calc_grad(batch, train_y[:4], linear1)
weights.grad.mean(),bias.grad
Out[ ]:
(tensor(-0.0002), tensor([-0.0015]))
But look what happens if we call it twice:
In [ ]:
calc_grad(batch, train_y[:4], linear1)
weights.grad.mean(),bias.grad
Out[ ]:
(tensor(-0.0003), tensor([-0.0023]))
The gradients have changed! The reason for this is that loss.backward actually adds the gradients of loss to any gradients that are currently stored. So, we have to set the current gradients to 0 first:
In [ ]:
weights.grad.zero_()
bias.grad.zero_();
note: Inplace Operations: Methods in PyTorch whose names end in an underscore modify their objects in place. For instance, bias.zero_() sets all elements of the tensor bias to 0.
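If this accumulation behavior seems surprising, here is a tiny standalone sketch of our own (not one of the book's cells, assuming torch is imported, which it is via the fastai imports used in this chapter) that demonstrates it:

x = torch.tensor(3.0, requires_grad=True)
(x * 2).backward()
print(x.grad)   # tensor(2.)
(x * 2).backward()
print(x.grad)   # tensor(4.) -- the second backward() added to the stored gradient
x.grad.zero_()
print(x.grad)   # tensor(0.)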
Our only remaining step is to update the weights and biases based on the gradient and learning rate. When we do so, we have to tell PyTorch not to take the gradient of this step too; otherwise things will get very confusing when we try to compute the derivative at the next batch! If we assign to the data attribute of a tensor, then PyTorch will not take the gradient of that step. Here's our basic training loop for an epoch:
In [ ]:
def train_epoch(model, lr, params):
    for xb,yb in dl:
        calc_grad(xb, yb, model)
        for p in params:
            p.data -= p.grad*lr
            p.grad.zero_()
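As an aside, assigning through the data attribute is the approach used here; an equivalent way to keep the update out of the computational graph is to wrap it in torch.no_grad(). The function name train_epoch_nograd in this sketch is ours, not the book's:

def train_epoch_nograd(model, lr, params):
    for xb,yb in dl:
        calc_grad(xb, yb, model)
        with torch.no_grad():   # autograd does not track updates made inside this block
            for p in params:
                p -= p.grad*lr
                p.grad.zero_()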
We also want to check how we’re doing, by looking at the accuracy of the validation set. To decide if an output represents a 3 or a 7, we can just check whether it’s greater than 0. So our accuracy for each item can be calculated (using broadcasting, so no loops!) with:
In [ ]:
(preds>0.0).float() == train_y[:4]
Out[ ]:
tensor([[False],
        [ True],
        [ True],
        [False]])
That gives us this function to calculate our validation accuracy (note that taking the sigmoid and comparing against 0.5 is equivalent to comparing the raw output against 0, since sigmoid(0) = 0.5):
In [ ]:
def batch_accuracy(xb, yb):
    preds = xb.sigmoid()
    correct = (preds>0.5) == yb
    return correct.float().mean()
We can check it works:
In [ ]:
batch_accuracy(linear1(batch), train_y[:4])
Out[ ]:
tensor(0.5000)
and then put the batches together:
In [ ]:
def validate_epoch(model):
    accs = [batch_accuracy(model(xb), yb) for xb,yb in valid_dl]
    return round(torch.stack(accs).mean().item(), 4)
In [ ]:
validate_epoch(linear1)
Out[ ]:
0.5219
That’s our starting point. Let’s train for one epoch, and see if the accuracy improves:
In [ ]:
lr = 1.
params = weights,bias
train_epoch(linear1, lr, params)
validate_epoch(linear1)
Out[ ]:
0.6883
Then do a few more:
In [ ]:
for i in range(20):
    train_epoch(linear1, lr, params)
    print(validate_epoch(linear1), end=' ')
0.8314 0.9017 0.9227 0.9349 0.9438 0.9501 0.9535 0.9564 0.9594 0.9618 0.9613 0.9638 0.9643 0.9652 0.9662 0.9677 0.9687 0.9691 0.9691 0.9696
Looking good! We’re already about at the same accuracy as our “pixel similarity” approach, and we’ve created a general-purpose foundation we can build on. Our next step will be to create an object that will handle the SGD step for us. In PyTorch, it’s called an optimizer.
Creating an Optimizer
Because this is such a general foundation, PyTorch provides some useful classes to make it easier to implement. The first thing we can do is replace our linear1 function with PyTorch's nn.Linear module. A module is an object of a class that inherits from the PyTorch nn.Module class. Objects of this class behave identically to standard Python functions, in that you can call them using parentheses and they will return the activations of a model.
nn.Linear does the same thing as our init_params and linear together. It contains both the weights and biases in a single class. Here's how we replicate our model from the previous section:
In [ ]:
linear_model = nn.Linear(28*28,1)
Every PyTorch module knows what parameters it has that can be trained; they are available through the parameters method:
In [ ]:
w,b = linear_model.parameters()
w.shape,b.shape
Out[ ]:
(torch.Size([1, 784]), torch.Size([1]))
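To demystify how nn.Linear makes its weights and bias available this way, here is a minimal sketch of our own (the class name MyLinear and its simple initialization are ours, not nn.Linear's actual implementation) showing how a hand-rolled module can register tensors via nn.Parameter so that parameters() finds them:

class MyLinear(nn.Module):
    def __init__(self, n_in, n_out):
        super().__init__()
        # wrapping tensors in nn.Parameter registers them as trainable,
        # so parameters() will return them
        self.weight = nn.Parameter(torch.randn(n_out, n_in) / n_in**0.5)
        self.bias   = nn.Parameter(torch.zeros(n_out))
    def forward(self, xb):
        return xb @ self.weight.t() + self.bias

Calling MyLinear(28*28,1).parameters() would give tensors of the same shapes as shown above.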
We can use this information to create an optimizer:
In [ ]:
class BasicOptim:
    def __init__(self,params,lr): self.params,self.lr = list(params),lr

    def step(self, *args, **kwargs):
        for p in self.params: p.data -= p.grad.data * self.lr

    def zero_grad(self, *args, **kwargs):
        for p in self.params: p.grad = None
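Note that this zero_grad simply sets each gradient to None rather than zeroing it in place; the next backward() call will then allocate a fresh gradient instead of accumulating into an old one, which has the same effect for our purposes.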
We can create our optimizer by passing in the model’s parameters:
In [ ]:
opt = BasicOptim(linear_model.parameters(), lr)
Our training loop can now be simplified to:
In [ ]:
def train_epoch(model):
    for xb,yb in dl:
        calc_grad(xb, yb, model)
        opt.step()
        opt.zero_grad()
Our validation function doesn’t need to change at all:
In [ ]:
validate_epoch(linear_model)
Out[ ]:
0.4157
Let’s put our little training loop in a function, to make things simpler:
In [ ]:
def train_model(model, epochs):
    for i in range(epochs):
        train_epoch(model)
        print(validate_epoch(model), end=' ')
The results are the same as in the previous section:
In [ ]:
train_model(linear_model, 20)
0.4932 0.8618 0.8203 0.9102 0.9331 0.9468 0.9555 0.9629 0.9658 0.9673 0.9687 0.9707 0.9726 0.9751 0.9761 0.9761 0.9775 0.978 0.9785 0.9785
fastai provides the SGD class which, by default, does the same thing as our BasicOptim:
In [ ]:
linear_model = nn.Linear(28*28,1)
opt = SGD(linear_model.parameters(), lr)
train_model(linear_model, 20)
0.4932 0.852 0.8335 0.9116 0.9326 0.9473 0.9555 0.9624 0.9648 0.9668 0.9692 0.9712 0.9731 0.9746 0.9761 0.9765 0.9775 0.978 0.9785 0.9785
fastai also provides Learner.fit, which we can use instead of train_model. To create a Learner we first need to create a DataLoaders, by passing in our training and validation DataLoaders:
In [ ]:
dls = DataLoaders(dl, valid_dl)
To create a Learner without using an application (such as cnn_learner) we need to pass in all the elements that we've created in this chapter: the DataLoaders, the model, the optimization function (which will be passed the parameters), the loss function, and optionally any metrics to print:
In [ ]:
learn = Learner(dls, nn.Linear(28*28,1), opt_func=SGD,
                loss_func=mnist_loss, metrics=batch_accuracy)
Now we can call fit:
In [ ]:
learn.fit(10, lr=lr)
epoch | train_loss | valid_loss | batch_accuracy | time |
---|---|---|---|---|
0 | 0.636857 | 0.503549 | 0.495584 | 00:00 |
1 | 0.545725 | 0.170281 | 0.866045 | 00:00 |
2 | 0.199223 | 0.184893 | 0.831207 | 00:00 |
3 | 0.086580 | 0.107836 | 0.911187 | 00:00 |
4 | 0.045185 | 0.078481 | 0.932777 | 00:00 |
5 | 0.029108 | 0.062792 | 0.946516 | 00:00 |
6 | 0.022560 | 0.053017 | 0.955348 | 00:00 |
7 | 0.019687 | 0.046500 | 0.962218 | 00:00 |
8 | 0.018252 | 0.041929 | 0.965162 | 00:00 |
9 | 0.017402 | 0.038573 | 0.967615 | 00:00 |
As you can see, there’s nothing magic about the PyTorch and fastai classes. They are just convenient pre-packaged pieces that make your life a bit easier! (They also provide a lot of extra functionality we’ll be using in future chapters.)
With these classes, we can now replace our linear model with a neural network.