Learner
We have data, a model, and a loss function; we only need one more thing before we can fit a model, and that’s an optimizer! Here’s SGD:
In [ ]:
class SGD:
    def __init__(self, params, lr, wd=0.): store_attr()
    def step(self):
        for p in self.params:
            # weight decay is folded directly into the gradient before the update
            p.data -= (p.grad.data + p.data*self.wd) * self.lr
            p.grad.data.zero_()   # reset the gradient ready for the next batch
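To see the update rule in isolation, here’s a quick sanity check on a single tensor standing in for a parameter (a toy snippet of our own, not from the notebook; it assumes PyTorch and the SGD class above are available):

p = torch.ones(3, requires_grad=True)
(2*p.sum()).backward()     # p.grad is now tensor([2., 2., 2.])
opt = SGD([p], lr=0.1)     # wd defaults to 0, so only the gradient term applies
opt.step()                 # p.data -= 2*0.1, giving tensor([0.8, 0.8, 0.8])
p, p.grad                  # the gradient has also been zeroed, ready for the next step

Because the weight decay term is added straight to the gradient, setting wd>0 simply shrinks each parameter a little toward zero on every step.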
As we’ve seen in this book, life is easier with a Learner. The Learner class needs to know our training and validation sets, which means we need DataLoaders to store them. We don’t need any other functionality, just a place to store them and access them:
In [ ]:
class DataLoaders:
    def __init__(self, *dls): self.train,self.valid = dls

dls = DataLoaders(train_dl,valid_dl)
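Since the class does nothing but store its two arguments, any pair of iterables can stand in for real data loaders (a toy illustration of our own):

toy = DataLoaders([1,2,3], [4,5])
toy.train, toy.valid    # ([1, 2, 3], [4, 5])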
Now we’re ready to create our Learner class:
In [ ]:
class Learner:
    def __init__(self, model, dls, loss_func, lr, cbs, opt_func=SGD):
        store_attr()
        for cb in cbs: cb.learner = self   # give every callback a reference to this Learner

    def one_batch(self):
        self('before_batch')
        xb,yb = self.batch
        self.preds = self.model(xb)
        self.loss = self.loss_func(self.preds, yb)
        if self.model.training:            # only backprop and step when training
            self.loss.backward()
            self.opt.step()
        self('after_batch')

    def one_epoch(self, train):
        self.model.training = train
        self('before_epoch')
        dl = self.dls.train if train else self.dls.valid
        for self.num,self.batch in enumerate(progress_bar(dl, leave=False)):
            self.one_batch()
        self('after_epoch')

    def fit(self, n_epochs):
        self('before_fit')
        self.opt = self.opt_func(self.model.parameters(), self.lr)
        self.n_epochs = n_epochs
        try:
            for self.epoch in range(n_epochs):
                self.one_epoch(True)       # training pass
                self.one_epoch(False)      # validation pass
        except CancelFitException: pass
        self('after_fit')

    def __call__(self, name):
        for cb in self.cbs: getattr(cb, name, noop)()
This is the largest class we’ve created in the book, but each method is quite small, so by looking at each in turn you should be able to follow what’s going on.
The main method we’ll be calling is fit. This loops with for self.epoch in range(n_epochs), and at each epoch calls self.one_epoch once with train=True and then once with train=False. Then self.one_epoch calls self.one_batch for each batch in dls.train or dls.valid, as appropriate (after wrapping the DataLoader in fastprogress.progress_bar). Finally, self.one_batch follows the usual set of steps to fit one mini-batch that we’ve seen throughout this book.
Before and after each step, Learner calls self, which calls __call__ (which is standard Python functionality). __call__ uses getattr(cb,name,noop) on each callback in self.cbs; getattr is a Python built-in function that returns the attribute (a method, in this case) with the requested name, falling back to the do-nothing noop if it isn’t defined. So, for instance, self('before_fit') will call cb.before_fit() for each callback where that method is defined.
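Here is that dispatch mechanism on its own, outside of Learner (PrintCB is just an illustrative stand-in; noop comes from fastcore, which the chapter’s imports already provide):

from fastcore.basics import noop

class PrintCB:
    def before_fit(self): print('before_fit ran')

cb = PrintCB()
getattr(cb, 'before_fit', noop)()   # the method exists, so it runs and prints
getattr(cb, 'after_fit',  noop)()   # missing, so noop is returned and nothing happens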
As you can see, Learner is really just using our standard training loop, except that it’s also calling callbacks at appropriate times. So let’s define some callbacks!
Callbacks
In Learner.__init__ we have:
for cb in cbs: cb.learner = self
In other words, every callback knows what learner it is used in. This is critical, since otherwise a callback can’t get information from the learner, or change things in the learner. Because getting information from the learner is so common, we make that easier by defining Callback as a subclass of GetAttr, with a default attribute of learner:
In [ ]:
class Callback(GetAttr): _default='learner'
GetAttr is a fastai class that implements Python’s standard __getattr__ and __dir__ methods for you, such that any time you try to access an attribute that doesn’t exist, it passes the request along to whatever you have defined as _default.
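To see roughly what that means, here is a much-simplified sketch of the idea (our own, not fastcore’s actual implementation). Python only calls __getattr__ when normal attribute lookup fails, so existing attributes are untouched and everything else is forwarded to the default object:

class SimpleGetAttr:
    "Simplified stand-in for fastcore's GetAttr, for illustration only"
    def __getattr__(self, k):
        # only runs when normal lookup fails; skip private names and the
        # default name itself to avoid infinite recursion
        if k.startswith('_') or k==self._default: raise AttributeError(k)
        return getattr(getattr(self, self._default), k)

A subclass would set _default (as Callback does with 'learner') and store that object as an attribute; with learner as the default, self.model inside a callback resolves to self.learner.model.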
For instance, we want to move all model parameters to the GPU automatically at the start of fit. We could do this by defining before_fit as self.learner.model.cuda(); however, because learner is the default attribute, and we have SetupLearnerCB inherit from Callback (which inherits from GetAttr), we can remove the .learner and just call self.model.cuda():
In [ ]:
class SetupLearnerCB(Callback):
    def before_batch(self):
        xb,yb = to_device(self.batch)
        self.learner.batch = tfm_x(xb),yb

    def before_fit(self): self.model.cuda()
In SetupLearnerCB we also move each mini-batch to the GPU, by calling to_device(self.batch) (we could also have used the longer to_device(self.learner.batch)). Note however that in the line self.learner.batch = tfm_x(xb),yb we can’t remove .learner, because here we’re setting the attribute, not getting it.
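We can see this asymmetry directly with a toy pair of classes (hypothetical names, just for illustration): reads of missing attributes are forwarded to the default, but assignments always land on the object itself:

class Owner: x = 1

class Delegate(GetAttr):
    _default = 'owner'
    def __init__(self, owner): self.owner = owner

d = Delegate(Owner())
d.x          # 1 -- 'x' is missing on d, so it is fetched from d.owner
d.x = 99     # ...but assignment creates an attribute on d itself
d.owner.x    # still 1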
Before we try our Learner out, let’s create a callback to track and print progress. Otherwise we won’t really know if it’s working properly:
In [ ]:
class TrackResults(Callback):
    def before_epoch(self): self.accs,self.losses,self.ns = [],[],[]

    def after_epoch(self):
        n = sum(self.ns)
        print(self.epoch, self.model.training,
              sum(self.losses).item()/n, sum(self.accs).item()/n)

    def after_batch(self):
        xb,yb = self.batch
        acc = (self.preds.argmax(dim=1)==yb).float().sum()
        self.accs.append(acc)
        n = len(xb)
        self.losses.append(self.loss*n)   # weight the loss by the batch size
        self.ns.append(n)
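The reason after_batch multiplies the loss by the batch size is that the last batch of an epoch can be smaller than the others, so after_epoch computes a weighted average rather than a plain mean of per-batch values. A quick illustration with made-up numbers:

losses,ns = [2.0,1.0],[64,16]                   # per-batch mean losses and batch sizes
sum(l*n for l,n in zip(losses,ns)) / sum(ns)    # 1.8, whereas the unweighted mean is 1.5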
Now we’re ready to use our Learner for the first time!
In [ ]:
cbs = [SetupLearnerCB(),TrackResults()]
learn = Learner(simple_cnn(), dls, cross_entropy, lr=0.1, cbs=cbs)
learn.fit(1)
0 True 2.1275552130636814 0.2314922378287042
0 False 1.9942575636942674 0.2991082802547771
It’s quite amazing to realize that we can implement all the key ideas from fastai’s Learner in so little code! Let’s now add some learning rate scheduling.
Scheduling the Learning Rate
If we’re going to get good results, we’ll want an LR finder and 1cycle training. These are both annealing callbacks; that is, they gradually change hyperparameters as we train. Here’s LRFinder:
In [ ]:
class LRFinder(Callback):
    def before_fit(self):
        self.losses,self.lrs = [],[]
        self.learner.lr = 1e-6             # start from a very small learning rate

    def before_batch(self):
        if not self.model.training: return
        self.opt.lr *= 1.2                 # grow the LR exponentially, batch by batch

    def after_batch(self):
        if not self.model.training: return
        if self.opt.lr>10 or torch.isnan(self.loss): raise CancelFitException
        self.losses.append(self.loss.item())
        self.lrs.append(self.opt.lr)
This shows how we’re using CancelFitException, which is itself an empty class, only used to signify the type of exception. You can see in Learner that this exception is caught. (You should add and test CancelBatchException, CancelEpochException, and so on yourself; a sketch of one possible approach follows.)
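Here is a sketch of one way to handle the batch-level case, assuming we keep the body of one_batch exactly as above and simply monkey-patch it onto Learner (CancelBatchException is a name we define here, not something from the chapter):

class CancelBatchException(Exception): pass

def one_batch(self):
    try:
        self('before_batch')
        xb,yb = self.batch
        self.preds = self.model(xb)
        self.loss = self.loss_func(self.preds, yb)
        if self.model.training:
            self.loss.backward()
            self.opt.step()
    except CancelBatchException: pass   # a callback can now skip the rest of the batch
    self('after_batch')

Learner.one_batch = one_batch

CancelEpochException could be handled the same way by wrapping the loop inside one_epoch.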
Let’s try out LRFinder by adding it to our list of callbacks:
In [ ]:
lrfind = LRFinder()
learn = Learner(simple_cnn(), dls, cross_entropy, lr=0.1, cbs=cbs+[lrfind])
learn.fit(2)
0 True 2.6336045582954903 0.11014890695955222
0 False 2.230653363853503 0.18318471337579617
And take a look at the results:
In [ ]:
plt.plot(lrfind.lrs[:-2],lrfind.losses[:-2])
plt.xscale('log')
Now we can define our OneCycle training callback:
In [ ]:
class OneCycle(Callback):
    def __init__(self, base_lr): self.base_lr = base_lr
    def before_fit(self): self.lrs = []

    def before_batch(self):
        if not self.model.training: return
        n = len(self.dls.train)
        bn = self.epoch*n + self.num       # index of the current batch across the whole fit
        mn = self.n_epochs*n               # total number of training batches in the fit
        pct = bn/mn                        # how far through training we are
        pct_start,div_start = 0.25,10
        if pct<pct_start:                  # first 25%: warm up from base_lr/10 to base_lr
            pct /= pct_start
            lr = (1-pct)*self.base_lr/div_start + pct*self.base_lr
        else:                              # remaining 75%: anneal linearly down to 0
            pct = (pct-pct_start)/(1-pct_start)
            lr = (1-pct)*self.base_lr
        self.opt.lr = lr
        self.lrs.append(lr)
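To convince ourselves that the formula does what we expect, we can evaluate the same schedule as a plain function at a few points (an illustrative check of our own, not part of the callback):

def sched(pct, base_lr=0.1, pct_start=0.25, div_start=10):
    "Learning rate at fractional position pct of training, mirroring before_batch"
    if pct < pct_start:                      # linear warmup from base_lr/10 up to base_lr
        pct /= pct_start
        return (1-pct)*base_lr/div_start + pct*base_lr
    pct = (pct-pct_start)/(1-pct_start)      # then linear decay from base_lr down to 0
    return (1-pct)*base_lr

[round(sched(p),3) for p in (0., 0.25, 0.625, 1.)]   # [0.01, 0.1, 0.05, 0.0]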
We’ll try an LR of 0.1:
In [ ]:
onecyc = OneCycle(0.1)
learn = Learner(simple_cnn(), dls, cross_entropy, lr=0.1, cbs=cbs+[onecyc])
Let’s fit for a while and see how it looks (we won’t show all the output in the book—try it in the notebook to see the results):
In [ ]:
#hide_output
learn.fit(8)
Finally, we’ll check that the learning rate followed the schedule we defined (as you see, we’re not using cosine annealing here):
In [ ]:
plt.plot(onecyc.lrs);