**Table of Contents**

0:00 Recap

2:25 How to choose your loss?

3:18 A probabilistic model for linear regression

7:50 Gradient descent, learning rate, SGD

11:30 PyTorch code for gradient descent

15:15 A probabilistic model for logistic regression

17:27 Notations (information theory)

20:58 Likelihood for logistic regression

22:43 BCELoss

23:41 BCEWithLogitsLoss

25:37 Beware of the reduction parameter

27:27 Softmax regression

30:52 NLLLoss

34:48 Classification in PyTorch

36:36 Why is maximizing accuracy directly hard?

38:24 Classification in deep learning

40:50 Regression without knowing the underlying model

42:58 Overfitting in polynomial regression

45:20 Validation set

48:55 Notion of risk and hypothesis space

54:40 Estimation error and approximation error

`BCELoss`

```
import torch
import torch.nn as nn

m = nn.Sigmoid()
loss = nn.BCELoss()
input = torch.randn(3, 4, 5)   # raw scores (logits)
target = torch.rand(3, 4, 5)   # BCELoss requires targets in [0, 1]
loss(m(input), target)         # apply the sigmoid first, then BCE
```
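
`BCEWithLogitsLoss` (23:41) fuses the sigmoid and the binary cross-entropy into a single, numerically more stable operation, so no explicit `nn.Sigmoid` is needed. A minimal sketch checking that it agrees with `nn.Sigmoid` followed by `nn.BCELoss`:

```
import torch
import torch.nn as nn

loss_fused = nn.BCEWithLogitsLoss()  # sigmoid + BCE in one stable op
loss_bce = nn.BCELoss()
m = nn.Sigmoid()
input = torch.randn(3, 4, 5)         # raw scores (logits)
target = torch.rand(3, 4, 5)         # targets in [0, 1]
assert torch.allclose(loss_fused(input, target), loss_bce(m(input), target))
```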

`NLLLoss` and `CrossEntropyLoss`

```
import torch
import torch.nn as nn

m = nn.LogSoftmax(dim=1)
loss1 = nn.NLLLoss()
loss2 = nn.CrossEntropyLoss()
C = 4                                                     # number of classes
input = torch.randn(3, C)                                 # raw scores (logits)
target = torch.empty(3, dtype=torch.long).random_(0, C)   # class indices in [0, C)
# CrossEntropyLoss = LogSoftmax + NLLLoss; compare with allclose
# rather than == to be robust to floating-point rounding
assert torch.allclose(loss1(m(input), target), loss2(input, target))
```
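
On the `reduction` parameter (25:37): all of these losses default to `reduction='mean'`, which averages over the batch, while `'sum'` and `'none'` behave differently; this matters when comparing loss values across batch sizes. A quick illustration (the tensors below are made up for the example):

```
import torch
import torch.nn as nn

input = torch.randn(3, 4)          # logits for 3 samples, 4 classes
target = torch.tensor([0, 2, 1])   # class indices

mean_loss = nn.CrossEntropyLoss()(input, target)                    # default: 'mean'
sum_loss = nn.CrossEntropyLoss(reduction='sum')(input, target)
per_sample = nn.CrossEntropyLoss(reduction='none')(input, target)   # shape (3,)

assert torch.allclose(sum_loss, per_sample.sum())
assert torch.allclose(mean_loss, per_sample.mean())
```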

The sigmoid is convenient because its gradient has a simple form: $\sigma' = \sigma(1 - \sigma)$.
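
As a sanity check (a minimal sketch, not from the lecture), this identity can be verified numerically with autograd:

```
import torch

x = torch.randn(5, requires_grad=True)
s = torch.sigmoid(x)
s.sum().backward()   # populates x.grad with d(sigma)/dx elementwise
# compare autograd's gradient against sigma * (1 - sigma)
assert torch.allclose(x.grad, (s * (1 - s)).detach())
```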