Loss
We’ve already seen how to define “negative log likelihood”:
In [ ]:
def nll(input, target): return -input[range(target.shape[0]), target].mean()
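To see what that integer-array indexing is doing, here's a small standalone sketch with made-up inputs (not the notebook's data); it should match PyTorch's F.nll_loss:
In [ ]:
# Standalone sketch with made-up log probabilities (not the notebook's data).
import torch
import torch.nn.functional as F

def nll(input, target): return -input[range(target.shape[0]), target].mean()

log_probs = torch.tensor([[-0.1, -2.3, -3.0],
                          [-1.2, -0.4, -2.5]])  # 2 samples, 3 classes
targets   = torch.tensor([0, 2])                # true class of each sample

# input[range(n), target] picks log_probs[0,0] and log_probs[1,2]:
# the log probability each sample assigns to its true class.
print(nll(log_probs, targets))         # tensor(1.3000)
print(F.nll_loss(log_probs, targets))  # same value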
Well, actually, there’s no log here, since we’re using the same definition as PyTorch’s nll_loss, which expects log probabilities as its input. That means we need to combine the log with the softmax:
In [ ]:
def log_softmax(x): return (x.exp()/(x.exp().sum(-1,keepdim=True))).log()
sm = log_softmax(r); sm[0][0]
Out[ ]:
tensor(-1.2790, grad_fn=<SelectBackward>)
Combining these gives us our cross-entropy loss:
In [ ]:
loss = nll(sm, yb)
loss
Out[ ]:
tensor(2.5666, grad_fn=<NegBackward>)
Note that the formula:
$$\log \left ( \frac{a}{b} \right ) = \log(a) - \log(b)$$
gives a simplification when we compute the log softmax, which was previously defined as (x.exp()/(x.exp().sum(-1,keepdim=True))).log():
In [ ]:
def log_softmax(x): return x - x.exp().sum(-1,keepdim=True).log()
sm = log_softmax(r); sm[0][0]
Out[ ]:
tensor(-1.2790, grad_fn=<SelectBackward>)
Then, there is a more stable way to compute the log of the sum of exponentials, called the LogSumExp trick. The idea is to use the following formula:
$$\log \left ( \sum_{j=1}^{n} e^{x_{j}} \right ) = \log \left ( e^{a} \sum_{j=1}^{n} e^{x_{j}-a} \right ) = a + \log \left ( \sum_{j=1}^{n} e^{x_{j}-a} \right )$$
where $a$ is the maximum of $x_{j}$.
Here’s the same thing in code:
In [ ]:
x = torch.rand(5)
a = x.max()
x.exp().sum().log() == a + (x-a).exp().sum().log()
Out[ ]:
tensor(True)
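To see why the shift by the maximum matters, here's a quick standalone illustration with made-up values: for large activations, exponentiating directly overflows to infinity, while the shifted version stays finite.
In [ ]:
# Made-up large activations: exp(1000.) overflows to inf in float32,
# while the shifted sum stays finite.
import torch

big = torch.tensor([1000., 1001., 1002.])

naive  = big.exp().sum().log()            # inf: exp() overflows
a      = big.max()
stable = a + (big - a).exp().sum().log()  # finite, ~1002.41

print(naive, stable)  # tensor(inf) tensor(1002.4076)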
We’ll put that into a function:
In [ ]:
def logsumexp(x):
    m = x.max(-1)[0]
    return m + (x-m[:,None]).exp().sum(-1).log()
logsumexp(r)[0]
Out[ ]:
tensor(3.9784, grad_fn=<SelectBackward>)
So we can use it for our log_softmax function. In fact, PyTorch already provides logsumexp as a tensor method, so we can call that directly:
In [ ]:
def log_softmax(x): return x - x.logsumexp(-1,keepdim=True)
This gives the same result as before:
In [ ]:
sm = log_softmax(r); sm[0][0]
Out[ ]:
tensor(-1.2790, grad_fn=<SelectBackward>)
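As a sanity check (on random data, not our r), this version should agree with PyTorch's built-in F.log_softmax:
In [ ]:
# Random data (not our `r`): check this log_softmax against PyTorch's built-in.
import torch
import torch.nn.functional as F

def log_softmax(x): return x - x.logsumexp(-1,keepdim=True)

t = torch.randn(4, 10)
print(torch.allclose(log_softmax(t), F.log_softmax(t, dim=-1)))  # True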
We can use these to create cross_entropy:
In [ ]:
def cross_entropy(preds, yb): return nll(log_softmax(preds), yb)
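As a quick standalone check (again on random data rather than our r and yb), this should match PyTorch's F.cross_entropy, which combines log_softmax and nll_loss in one function:
In [ ]:
# Random logits and labels (not our `r` and `yb`): the hand-built version
# should match F.cross_entropy, which fuses log_softmax and nll_loss.
import torch
import torch.nn.functional as F

def log_softmax(x): return x - x.logsumexp(-1,keepdim=True)
def nll(input, target): return -input[range(target.shape[0]), target].mean()
def cross_entropy(preds, yb): return nll(log_softmax(preds), yb)

preds = torch.randn(8, 10)          # batch of 8, 10 classes
yb    = torch.randint(0, 10, (8,))  # random class labels

print(torch.allclose(cross_entropy(preds, yb), F.cross_entropy(preds, yb)))  # True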
Let’s now combine all those pieces together to create a Learner.