In [ ]:

    #hide
    !pip install -Uqq fastbook
    import fastbook
    fastbook.setup_book()

In [ ]:

    #hide
    from fastbook import *

[[chapter_accel_sgd]]

The Training Process

You now know how to create state-of-the-art architectures for computer vision, natural language processing, tabular analysis, and collaborative filtering, and you know how to train them quickly. So we’re done, right? Not quite yet. We still have to explore the training process a little more.

We explained in <> the basis of stochastic gradient descent: pass a mini-batch to the model, compare its predictions to our targets with the loss function, then compute the gradients of this loss with respect to each weight before updating the weights with the formula:

    new_weight = weight - lr * weight.grad
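To make the update rule above concrete, here is a minimal sketch in plain Python rather than PyTorch, so the arithmetic is visible. The one-weight model `pred = weight * x`, the toy data, and the helper names `mse_grad` and `sgd_step` are assumptions made for illustration; they are not part of fastai or PyTorch.

```python
# Plain-Python sketch of the SGD update rule (illustrative only).
# Assumed model: pred = weight * x, with mean squared error as the loss.

def mse_grad(weight, xs, ys):
    """Gradient of mean((weight * x - y) ** 2) with respect to weight."""
    n = len(xs)
    return sum(2 * (weight * x - y) * x for x, y in zip(xs, ys)) / n

def sgd_step(weight, grad, lr):
    """One SGD update: new_weight = weight - lr * grad."""
    return weight - lr * grad

# Toy data generated by y = 3 * x, split into two mini-batches.
batches = [([1.0, 2.0], [3.0, 6.0]), ([3.0, 4.0], [9.0, 12.0])]

weight, lr = 0.0, 0.01
for epoch in range(200):
    for xs, ys in batches:
        grad = mse_grad(weight, xs, ys)       # gradient on this mini-batch
        weight = sgd_step(weight, grad, lr)   # apply the update rule

print(round(weight, 3))  # the weight should end up very close to 3
```

Each pass over a mini-batch moves the weight a small step against the gradient, and the learning rate `lr` controls how large that step is.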

We implemented this from scratch in a training loop, and also saw that PyTorch provides a simple `torch.optim.SGD` class that does this calculation for each parameter for us. In this chapter we will build some faster optimizers on a flexible foundation. But that’s not all we might want to change in the training process. For any tweak to the training loop, we need a way to add code to the basic SGD loop. The fastai library has a system of callbacks for this, and we will teach you all about it.
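The idea of hooking extra code into a training loop can be sketched in a few lines of plain Python. This is only a rough illustration: the names `Callback`, `RecordWeights`, `train`, and the `state` dictionary are hypothetical and are not fastai's actual callback API, which is covered later in the chapter.

```python
# Hypothetical minimal callback hook; names are illustrative, not fastai's API.

class Callback:
    """Base class: subclasses override only the events they care about."""
    def before_step(self, state): pass
    def after_step(self, state): pass

class RecordWeights(Callback):
    """Example tweak: record the weight after every update."""
    def __init__(self):
        self.history = []

    def after_step(self, state):
        self.history.append(state["weight"])

def train(weight, grads, lr, callbacks=()):
    """Apply one SGD update per gradient, firing callbacks around each step."""
    state = {"weight": weight, "lr": lr}
    for grad in grads:
        state["grad"] = grad
        for cb in callbacks:
            cb.before_step(state)
        # The basic SGD update stays untouched in the loop.
        state["weight"] = state["weight"] - state["lr"] * state["grad"]
        for cb in callbacks:
            cb.after_step(state)
    return state["weight"]

recorder = RecordWeights()
final = train(1.0, grads=[0.5, 0.25], lr=0.1, callbacks=[recorder])
print(final, recorder.history)
```

The loop itself never changes; new behavior is added by passing in another callback object, which is the design idea the rest of the chapter builds on.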

Let’s start with standard SGD to get a baseline, then we will introduce the most commonly used optimizers.