Questionnaire
- How is a grayscale image represented on a computer? How about a color image?
- How are the files and folders in the `MNIST_SAMPLE` dataset structured? Why?
- Explain how the “pixel similarity” approach to classifying digits works.
- What is a list comprehension? Create one now that selects odd numbers from a list and doubles them.
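One possible answer sketch for the list-comprehension exercise, in plain Python:

```python
# Select the odd numbers from a list and double them,
# all in a single list comprehension.
nums = [1, 2, 3, 4, 5, 6]
doubled_odds = [n * 2 for n in nums if n % 2 == 1]
print(doubled_odds)  # → [2, 6, 10]
```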
- What is a “rank-3 tensor”?
- What is the difference between tensor rank and shape? How do you get the rank from the shape?
- What are RMSE and L1 norm?
- How can you apply a calculation on thousands of numbers at once, many thousands of times faster than a Python loop?
- Create a 3×3 tensor or array containing the numbers from 1 to 9. Double it. Select the bottom-right four numbers.
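A sketch of one way to do this exercise using plain Python nested lists, so it runs without NumPy or PyTorch installed; with either library the same steps would be an `arange`/`reshape`, a `* 2`, and a `[1:, 1:]` slice:

```python
# A 3x3 "tensor" as nested lists, containing the numbers 1 to 9.
t = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# Double every element.
doubled = [[x * 2 for x in row] for row in t]

# Select the bottom-right four numbers: the last two rows
# and the last two columns.
bottom_right = [row[1:] for row in doubled[1:]]
print(bottom_right)  # → [[10, 12], [16, 18]]
```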
- What is broadcasting?
- Are metrics generally calculated using the training set, or the validation set? Why?
- What is SGD?
- Why does SGD use mini-batches?
- What are the seven steps in SGD for machine learning?
- How do we initialize the weights in a model?
- What is “loss”?
- Why can’t we always use a high learning rate?
- What is a “gradient”?
- Do you need to know how to calculate gradients yourself?
- Why can’t we use accuracy as a loss function?
- Draw the sigmoid function. What is special about its shape?
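As a starting point for the plotting question, here is a minimal standard-library sketch of the sigmoid and the shape properties worth noticing:

```python
import math

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1).
    return 1 / (1 + math.exp(-x))

# Its shape is a smooth, monotonic S-curve: nearly flat at 0 for very
# negative inputs, nearly flat at 1 for very positive inputs, and
# steepest at x = 0, where it crosses 0.5. That bounded output is what
# makes it useful for turning activations into something probability-like.
print(sigmoid(0))  # → 0.5
```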
- What is the difference between a loss function and a metric?
- What is the function to calculate new weights using a learning rate?
- What does the `DataLoader` class do?
- Write pseudocode showing the basic steps taken in each epoch for SGD.
- Create a function that, if passed two arguments `[1,2,3,4]` and `'abcd'`, returns `[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]`. What is special about that output data structure?
- What does `view` do in PyTorch?
- What are the “bias” parameters in a neural network? Why do we need them?
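One possible answer sketch for the pairing exercise; the built-in `zip` does the work, and a list of (input, label) tuples like this is the basic structure of a dataset:

```python
def pair(a, b):
    # zip pairs up corresponding elements; list() materializes the result.
    return list(zip(a, b))

print(pair([1, 2, 3, 4], 'abcd'))
# → [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
```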
- What does the `@` operator do in Python?
- What does the `backward` method do?
- Why do we have to zero the gradients?
- What information do we have to pass to `Learner`?
- Show Python or pseudocode for the basic steps of a training loop.
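A sketch of the basic training-loop steps in plain Python. The chapter's loop uses PyTorch autograd; here the gradient of the MSE loss is written out by hand for a toy one-parameter model so the example is self-contained:

```python
# Fit y = w * x to toy data with SGD.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]           # true relationship: y = 2x

w = 0.0                        # initialize the weight
lr = 0.05                      # learning rate

for epoch in range(50):
    # Predict, then compute the loss (mean squared error).
    preds = [w * x for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    # Compute the gradient d(loss)/dw analytically.
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    # Step the weight. (Zeroing gradients is implicit here, since
    # grad is recomputed from scratch each epoch.)
    w -= lr * grad

print(round(w, 2))  # → 2.0
```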
- What is “ReLU”? Draw a plot of it for values from `-2` to `+2`.
- What is an “activation function”?
- What’s the difference between `F.relu` and `nn.ReLU`?
- The universal approximation theorem shows that any function can be approximated as closely as needed using just one nonlinearity. So why do we normally use more?
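A minimal sketch of ReLU in plain Python to help with the plotting question (`F.relu` applies the same rule elementwise to tensors):

```python
def relu(x):
    # ReLU: the identity for positive inputs, zero otherwise.
    return max(0.0, x)

# Values from -2 to +2: the plot is flat at 0 on the left and
# the line y = x on the right, with a kink at 0.
print([relu(x) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]])
# → [0.0, 0.0, 0.0, 1.0, 2.0]
```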
Further Research
- Create your own implementation of `Learner` from scratch, based on the training loop shown in this chapter.
- Complete all the steps in this chapter using the full MNIST datasets (that is, for all digits, not just 3s and 7s). This is a significant project and will take you quite a bit of time to complete! You’ll need to do some of your own research to figure out how to overcome some obstacles you’ll meet on the way.