Module 3 - Loss functions for classification


Loss functions for classification


0:00 Recap
2:25 How to choose your loss?
3:18 A probabilistic model for linear regression
7:50 Gradient descent, learning rate, SGD
11:30 PyTorch code for gradient descent
15:15 A probabilistic model for logistic regression
17:27 Notations (information theory)
20:58 Likelihood for logistic regression
22:43 BCELoss
23:41 BCEWithLogitsLoss
25:37 Beware of the reduction parameter
27:27 Softmax regression
30:52 NLLLoss
34:48 Classification in PyTorch
36:36 Why is maximizing accuracy directly hard?
38:24 Classification in deep learning
40:50 Regression without knowing the underlying model
42:58 Overfitting in polynomial regression
45:20 Validation set
48:55 Notion of risk and hypothesis space
54:40 Estimation error and approximation error

Slides and Notebook

Minimal working examples

BCELoss

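import torch
import torch.nn as nn
m = nn.Sigmoid()
loss = nn.BCELoss()
input = torch.randn(3, 4, 5)                # raw scores (logits)
target = torch.empty(3, 4, 5).random_(2)    # BCELoss expects targets in [0, 1]
loss(m(input), target)                      # sigmoid first: BCELoss takes probabilities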
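The 23:41 chapter covers BCEWithLogitsLoss. The following is a minimal sketch; the variable names and the seed are illustrative choices, not taken from the course notebook. It checks three things:

- Applying a sigmoid followed by BCELoss gives the same value as BCEWithLogitsLoss on the raw scores.
- Both match the formula written out by hand.
- The fused version behaves better when a logit is large.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # illustrative seed, for reproducibility only

logits = torch.randn(3, 4, 5)                # raw scores, any real value
target = torch.empty(3, 4, 5).random_(2)     # binary targets in {0, 1}

# Option 1: sigmoid first, then BCELoss on probabilities
loss_bce = nn.BCELoss()(torch.sigmoid(logits), target)

# Option 2: BCEWithLogitsLoss applies the sigmoid inside the loss
loss_logits = nn.BCEWithLogitsLoss()(logits, target)

# Same quantity written out, using log(1 - sigmoid(x)) = logsigmoid(-x):
# -mean[ y * log sigmoid(x) + (1 - y) * log(1 - sigmoid(x)) ]
loss_manual = -(target * F.logsigmoid(logits)
                + (1 - target) * F.logsigmoid(-logits)).mean()

print(loss_bce.item(), loss_logits.item(), loss_manual.item())

# A large logit with target 0: in float32, sigmoid(40.) rounds to exactly 1.0,
# so BCELoss must take log(0), which it clamps at -100 (loss saturates at 100).
# BCEWithLogitsLoss works on the logit directly and returns about 40.
big, zero = torch.tensor([40.0]), torch.tensor([0.0])
big_bce = nn.BCELoss()(torch.sigmoid(big), zero)
big_logits = nn.BCEWithLogitsLoss()(big, zero)
print(big_bce.item(), big_logits.item())
```

This is why a model trained with BCEWithLogitsLoss should output raw scores, with no final Sigmoid layer.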

NLLLoss and CrossEntropyLoss

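import torch
import torch.nn as nn
m = nn.LogSoftmax(dim=1)
loss1 = nn.NLLLoss()
loss2 = nn.CrossEntropyLoss()
C = 4                                                    # number of classes
input = torch.randn(3, C)                                # 3 samples, C raw scores each
target = torch.empty(3, dtype=torch.long).random_(0, C)  # class indices in [0, C)
# CrossEntropyLoss = LogSoftmax followed by NLLLoss (compare floats with a tolerance)
assert torch.allclose(loss1(m(input), target), loss2(input, target))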
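The snippet above only compares the two PyTorch losses. As a hedged sketch with illustrative names and seed, the following computes the same cross-entropy by hand. The loss for sample i is minus the log-softmax score of its target class. The sketch also shows what the `reduction` parameter changes, which the 25:37 chapter warns about.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)  # illustrative seed

C = 4
input = torch.randn(3, C)                                  # 3 samples, C scores each
target = torch.empty(3, dtype=torch.long).random_(0, C)    # one class index per sample

# Cross-entropy by hand: -log softmax(input)[i, target[i]] for each sample i
log_probs = F.log_softmax(input, dim=1)
per_sample = -log_probs[torch.arange(input.shape[0]), target]
manual_mean = per_sample.mean()

# The reduction parameter controls how per-sample losses are combined
loss_none = nn.CrossEntropyLoss(reduction='none')(input, target)  # shape (3,)
loss_mean = nn.CrossEntropyLoss(reduction='mean')(input, target)  # the default
loss_sum = nn.CrossEntropyLoss(reduction='sum')(input, target)

print(loss_none, loss_mean.item(), loss_sum.item())
```

With `reduction='sum'`, both the loss and its gradient grow with the batch size. A learning rate tuned for the default `'mean'` is therefore effectively multiplied by the batch size.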

Frequently asked questions

Because its derivative has a simple form: $\sigma' = \sigma(1 - \sigma)$.
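As a quick numerical sanity check of the identity $\sigma' = \sigma(1 - \sigma)$, here is a small sketch that compares PyTorch autograd against the closed form. The evaluation points are an arbitrary illustrative choice. The practical appeal is that the derivative reuses the already-computed forward value $\sigma(x)$.

```python
import torch

# Points where we evaluate the derivative of the sigmoid: -5, -4, ..., 5
x = torch.linspace(-5, 5, steps=11, requires_grad=True)
s = torch.sigmoid(x)

# Autograd: the gradient of sum(sigmoid(x)) is sigmoid'(x) elementwise
(grad,) = torch.autograd.grad(s.sum(), x)

# Closed form, reusing the forward value s = sigmoid(x)
closed_form = (s * (1 - s)).detach()

print(torch.max(torch.abs(grad - closed_form)).item())
```

The derivative peaks at $x = 0$, where $\sigma(0) = 1/2$ gives $\sigma'(0) = 1/4$.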