Questionnaire
- How did we get to a single vector of activations in the CNNs used for MNIST in previous chapters? Why isn’t that suitable for Imagenette?
- What do we do for Imagenette instead?
- What is “adaptive pooling”?
- What is “average pooling”?
- Why do we need `Flatten` after an adaptive average pooling layer?
- What is a “skip connection”?
- Why do skip connections allow us to train deeper models?
- What does <> show? How did that lead to the idea of skip connections?
- What is “identity mapping”?
- What is the basic equation for a ResNet block (ignoring batchnorm and ReLU layers)?
- What do ResNets have to do with residuals?
- How do we deal with the skip connection when there is a stride-2 convolution? How about when the number of filters changes?
- How can we express a 1×1 convolution in terms of a vector dot product?
- Create a 1×1 convolution with `F.conv2d` or `nn.Conv2d` and apply it to an image. What happens to the `shape` of the image?
- What does the `noop` function return?
- Explain what is shown in <>.
- When is top-5 accuracy a better metric than top-1 accuracy?
- What is the “stem” of a CNN?
- Why do we use plain convolutions in the CNN stem, instead of ResNet blocks?
- How does a bottleneck block differ from a plain ResNet block?
- Why is a bottleneck block faster?
- How do fully convolutional nets (and nets with adaptive pooling in general) allow for progressive resizing?
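To make several of the questions above concrete, here is a minimal PyTorch sketch of a ResNet block and a small fully convolutional net. It uses plain `torch.nn` modules rather than the book's own library helpers, so `noop`, `conv_bn`, and `ResBlockSketch` are hypothetical names defined here for illustration.

The block computes `relu(convs(x) + idconv(pool(x)))`. The skip path is the identity (`noop`) unless the block changes something:

- When the stride is 2, the skip path uses average pooling to halve the spatial size.
- When the number of filters changes, the skip path uses a 1×1 convolution to change the channel count.

```python
import torch
from torch import nn
import torch.nn.functional as F

def noop(x):
    "Identity function: returns its input unchanged (hypothetical helper)."
    return x

def conv_bn(ni, nf, stride=1, act=True):
    "3x3 convolution, then batchnorm, then (optionally) ReLU."
    layers = [nn.Conv2d(ni, nf, kernel_size=3, stride=stride, padding=1, bias=False),
              nn.BatchNorm2d(nf)]
    if act:
        layers.append(nn.ReLU())
    return nn.Sequential(*layers)

class ResBlockSketch(nn.Module):
    "Residual block: relu(convs(x) + idconv(pool(x)))."
    def __init__(self, ni, nf, stride=1):
        super().__init__()
        # Residual branch: two 3x3 convs, the first one possibly stride 2
        self.convs = nn.Sequential(conv_bn(ni, nf, stride), conv_bn(nf, nf, act=False))
        # Skip path: a 1x1 conv only when the number of channels changes
        self.idconv = noop if ni == nf else nn.Conv2d(ni, nf, kernel_size=1)
        # Skip path: average pooling only when the block halves the grid size
        self.pool = noop if stride == 1 else nn.AvgPool2d(2, ceil_mode=True)

    def forward(self, x):
        return F.relu(self.convs(x) + self.idconv(self.pool(x)))

model = nn.Sequential(
    conv_bn(3, 16, stride=2),          # stem: a plain convolution, not a ResNet block
    ResBlockSketch(16, 32, stride=2),  # stride 2 and more filters: pool + 1x1 conv on skip
    ResBlockSketch(32, 32),            # same shape: identity skip
    nn.AdaptiveAvgPool2d(1),           # -> (batch, 32, 1, 1) whatever the input size
    nn.Flatten(),                      # -> (batch, 32), ready for the linear layer
    nn.Linear(32, 10),
)

print(model(torch.randn(2, 3, 64, 64)).shape)
print(model(torch.randn(2, 3, 96, 96)).shape)
```

Both calls should produce a `(2, 10)` output. The adaptive pooling layer reduces any grid size to 1×1, so the same weights accept 64×64 and 96×96 images. That is the property progressive resizing relies on.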
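For the 1×1 convolution questions, the short sketch below uses a random tensor as a stand-in image. At every pixel, a 1×1 convolution takes a dot product between that pixel's vector of input channels and each filter's weight vector. It therefore changes the number of channels but leaves the height and width alone. The tensor sizes are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
img = torch.randn(1, 3, 8, 8)      # (batch, channels_in, height, width)
weight = torch.randn(5, 3, 1, 1)   # 5 filters, each 3 channels deep and 1x1 spatially

out = F.conv2d(img, weight)
print(out.shape)  # torch.Size([1, 5, 8, 8]): channels change, spatial size does not

# Recompute one output value as a dot product over the input channels
i, j, f = 2, 4, 1
manual = torch.dot(weight[f, :, 0, 0], img[0, :, i, j])
print(torch.allclose(out[0, f, i, j], manual, atol=1e-5))
```

`nn.Conv2d(3, 5, kernel_size=1)` behaves the same way. It holds its own weight tensor of shape `(5, 3, 1, 1)`, plus a bias by default.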
Further Research
- Try creating a fully convolutional net with adaptive average pooling for MNIST (note that you’ll need fewer stride-2 layers). How does it compare to a network without such a pooling layer?
- In <> we introduce Einstein summation notation. Skip ahead to see how this works, and then write an implementation of the 1×1 convolution operation using `torch.einsum`. Compare it to the same operation using `torch.conv2d`.
- Write a “top-5 accuracy” function using plain PyTorch or plain Python.
- Train a model on Imagenette for more epochs, with and without label smoothing. Take a look at the Imagenette leaderboards and see how close you can get to the best results shown. Read the linked pages describing the leading approaches.
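Below is one possible plain-PyTorch approach to the top-k accuracy exercise. It is a sketch, not a reference solution. `top_k_accuracy` is a hypothetical name, and the predictions are made-up numbers chosen so that top-1 and top-5 disagree. A prediction counts as correct if the target class is among the `k` highest scores in its row.

```python
import torch

def top_k_accuracy(preds, targs, k=5):
    "Fraction of rows whose target is among the k highest-scoring classes."
    topk = preds.topk(k, dim=1).indices               # (n, k) predicted class indices
    hits = (topk == targs.unsqueeze(1)).any(dim=1)    # True where the target appears
    return hits.float().mean().item()

# Toy scores for 2 items and 7 classes (illustrative values only)
preds = torch.tensor([[0.1, 0.5, 0.2, 0.0, 0.9, 0.3, 0.4],
                      [0.8, 0.1, 0.0, 0.2, 0.3, 0.05, 0.6]])
targs = torch.tensor([2, 5])

print(top_k_accuracy(preds, targs, k=5))  # 0.5: first target is in its top 5, second is not
print(top_k_accuracy(preds, targs, k=1))  # 0.0: neither target is the single top score
```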
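For the `torch.einsum` exercise, the sketch below is one way the comparison might look. It assumes the standard weight layout `(c_out, c_in, 1, 1)` used by `torch.conv2d`, and `conv1x1_einsum` is a hypothetical helper name. The subscripts `'nchw,oc->nohw'` spell out the per-pixel dot product over the input channel `c`.

```python
import torch

def conv1x1_einsum(x, weight, bias=None):
    "1x1 convolution via einsum; x is (n, c_in, h, w), weight is (c_out, c_in, 1, 1)."
    # Sum over input channels c for every batch item n and pixel (h, w)
    out = torch.einsum('nchw,oc->nohw', x, weight[:, :, 0, 0])
    if bias is not None:
        out = out + bias[None, :, None, None]  # broadcast one bias per output channel
    return out

torch.manual_seed(0)
x = torch.randn(2, 3, 6, 6)
w = torch.randn(4, 3, 1, 1)
b = torch.randn(4)

ref = torch.conv2d(x, w, b)
mine = conv1x1_einsum(x, w, b)
print(mine.shape)  # torch.Size([2, 4, 6, 6])
print(torch.allclose(mine, ref, atol=1e-5))
```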