16 The Training Process - Establishing a Baseline - 《The fastai book》

Establishing a Baseline

First, we’ll create a baseline, using plain SGD, and compare it to fastai’s default optimizer. We’ll start by grabbing Imagenette with the same get_data we used in <>:

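from fastai.vision.all import *

def get_data(url, presize, resize):
    # Download and extract the dataset (cached after the first call)
    path = untar_data(url)
    return DataBlock(
        blocks=(ImageBlock, CategoryBlock), get_items=get_image_files,
        # Imagenette keeps its images in train/ and val/ folders
        splitter=GrandparentSplitter(valid_name='val'),
        # Label = parent folder name; each item is first resized to `presize`
        get_y=parent_label, item_tfms=Resize(presize),
        # Batch-level augmentation at the final `resize`, then ImageNet normalization
        batch_tfms=[*aug_transforms(min_scale=0.5, size=resize),
                    Normalize.from_stats(*imagenet_stats)],
    ).dataloaders(path, bs=128)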

dls = get_data(URLs.IMAGENETTE_160, 160, 128)

We’ll create a ResNet-34 without pretraining, and pass along any arguments received:

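def get_learner(**kwargs):
    # ResNet-34 trained from scratch (no pretrained weights), in mixed precision;
    # extra keyword arguments (such as opt_func) are forwarded to cnn_learner.
    # Note: newer fastai releases rename cnn_learner to vision_learner.
    return cnn_learner(dls, resnet34, pretrained=False,
                       metrics=accuracy, **kwargs).to_fp16()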

Here’s fastai’s default optimizer (Adam), with the usual 3e-3 learning rate:

learn = get_learner()
learn.fit_one_cycle(3, 0.003)

epoch	train_loss	valid_loss	accuracy	time
0	2.571932	2.685040	0.322548	00:11
1	1.904674	1.852589	0.437452	00:11
2	1.586909	1.374908	0.594904	00:11
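fit_one_cycle does not keep the learning rate fixed at 0.003. It uses that value as the peak of a schedule that first warms the learning rate up and then anneals it, while momentum moves in the opposite direction. The following is a minimal pure-Python sketch of such a one-cycle schedule. Its defaults (div=25, div_final=1e5, pct_start=0.25, moms=(0.95, 0.85, 0.95)) and the cosine interpolation between phases are our assumptions about fastai's settings. The helper names one_cycle and cos_interp are hypothetical and are not part of fastai's API.

```python
import math

def cos_interp(start, end, pct):
    """Cosine interpolation: returns start at pct=0 and end at pct=1."""
    return start + (1 + math.cos(math.pi * (1 - pct))) * (end - start) / 2

def one_cycle(pct, lr_max, div=25.0, div_final=1e5, pct_start=0.25,
              moms=(0.95, 0.85, 0.95)):
    """Return (lr, momentum) at fraction `pct` (0..1) of total training.

    Assumed defaults, chosen to resemble fastai's fit_one_cycle.
    """
    if pct < pct_start:
        # Warm-up phase: lr rises to lr_max while momentum falls.
        phase = pct / pct_start
        lr = cos_interp(lr_max / div, lr_max, phase)
        mom = cos_interp(moms[0], moms[1], phase)
    else:
        # Annealing phase: lr decays toward lr_max / div_final, momentum rises.
        phase = (pct - pct_start) / (1 - pct_start)
        lr = cos_interp(lr_max, lr_max / div_final, phase)
        mom = cos_interp(moms[1], moms[2], phase)
    return lr, mom

# Inspect the schedule at a few points for the 3e-3 run above.
for pct in (0.0, 0.25, 0.5, 1.0):
    lr, mom = one_cycle(pct, 3e-3)
    print(f"{pct:.2f} of training: lr={lr:.2e}, mom={mom:.3f}")
```

Moving momentum opposite to the learning rate is intended to keep the overall step size in check at the peak learning rate.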

Now let’s try plain SGD. We can pass opt_func (optimization function) to cnn_learner to get fastai to use any optimizer:

learn = get_learner(opt_func=SGD)
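Without momentum, plain SGD moves every parameter against its gradient, scaled by the learning rate: p <- p - lr * grad. The following minimal pure-Python sketch applies that rule to a toy one-parameter problem, minimizing (w - 3)^2. The function names and the toy loss are illustrative assumptions, not fastai code.

```python
def sgd_step(params, grads, lr):
    """Plain SGD: p <- p - lr * grad for every parameter."""
    return [p - lr * g for p, g in zip(params, grads)]

def toy_grad(w):
    # Gradient of the toy loss (w - 3)**2 with respect to w.
    return 2 * (w - 3)

w = [0.0]
for _ in range(100):
    w = sgd_step(w, [toy_grad(w[0])], lr=0.1)
print(w[0])  # approaches the minimum at w = 3
```

The learning rate alone controls how far each step goes, so plain SGD is sensitive to picking it well.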

The first thing to look at is lr_find:

learn.lr_find()

(0.017378008365631102, 3.019951861915615e-07)
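lr_find runs a learning rate range test. It trains for a number of mini-batches while increasing the learning rate exponentially, records the loss at each step, and stops once the loss blows up. You then choose a learning rate from the region where the loss is still falling. The following minimal sketch shows that idea on a toy one-parameter loss. The sweep settings (1e-7 to 10 over 100 steps, stopping once the loss exceeds four times the best loss seen) are assumptions meant to resemble fastai's defaults. fastai's version also smooths the loss and restores the original weights afterward, and this sketch does neither. lr_range_test and toy_loss_and_grad are hypothetical names.

```python
import math

def lr_range_test(loss_and_grad, w0, start_lr=1e-7, end_lr=10.0,
                  num_it=100, div_factor=4.0):
    """Sweep the learning rate exponentially; return (lrs, losses) recorded."""
    w, best = w0, math.inf
    lrs, losses = [], []
    for i in range(num_it):
        # Exponential interpolation between start_lr and end_lr.
        lr = start_lr * (end_lr / start_lr) ** (i / (num_it - 1))
        loss, grad = loss_and_grad(w)
        if math.isnan(loss) or loss > div_factor * best:
            break  # loss has diverged: stop the sweep early
        best = min(best, loss)
        lrs.append(lr)
        losses.append(loss)
        w = w - lr * grad  # one plain SGD step at this learning rate
    return lrs, losses

def toy_loss_and_grad(w):
    # Toy loss 5*(w - 2)**2 + 1 (minimum 1 at w = 2) and its gradient.
    return 5 * (w - 2) ** 2 + 1, 10 * (w - 2)

lrs, losses = lr_range_test(toy_loss_and_grad, w0=0.0)
print(f"stopped after {len(lrs)} steps; last lr tried: {lrs[-1]:.3g}")
```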

It looks like we’ll need to use a higher learning rate than we normally use:

learn.fit_one_cycle(3, 0.03, moms=(0,0,0))

epoch	train_loss	valid_loss	accuracy	time
0	2.969412	2.214596	0.242038	00:09
1	2.442730	1.845950	0.362548	00:09
2	2.157159	1.741143	0.408917	00:09

(Because accelerating SGD with momentum is such a good idea, fit_one_cycle uses momentum by default, so we turn it off here with moms=(0,0,0). We’ll discuss momentum shortly.)
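The following minimal sketch shows why moms=(0,0,0) gives plain SGD. It assumes the momentum form that PyTorch's torch.optim.SGD uses with dampening=0 and no Nesterov: v <- mom * v + grad, then p <- p - lr * v. With mom = 0, the velocity is just the current gradient, so every step is a plain SGD step. momentum_step and run are hypothetical names, and the toy loss (w - 3)^2 is illustrative.

```python
def momentum_step(params, grads, velocities, lr, mom):
    """One SGD-with-momentum step; returns (new_params, new_velocities)."""
    # Velocity accumulates past gradients, decayed by `mom`.
    new_vel = [mom * v + g for v, g in zip(velocities, grads)]
    # Parameters move against the accumulated velocity.
    new_params = [p - lr * v for p, v in zip(params, new_vel)]
    return new_params, new_vel

def run(mom, lr=0.05, steps=50):
    """Minimize the toy loss (w - 3)**2 from w = 0; return w after each step."""
    w, v = [0.0], [0.0]
    trajectory = []
    for _ in range(steps):
        grads = [2 * (w[0] - 3)]
        w, v = momentum_step(w, grads, v, lr, mom)
        trajectory.append(w[0])
    return trajectory

# With mom=0 the velocity equals the current gradient, so this is plain SGD.
no_momentum = run(mom=0.0)
plain_sgd, w = [], 0.0
for _ in range(50):
    w = w - 0.05 * (2 * (w - 3))
    plain_sgd.append(w)
print(no_momentum == plain_sgd)  # True
```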

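Clearly, plain SGD isn’t training as fast as we’d like. After three epochs it reaches about 41% accuracy, compared with about 59% for fastai’s default optimizer. So let’s learn some tricks to get accelerated training!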