Binary Cross-Entropy Loss

machine-learning classification optimisation

Definition

Binary cross-entropy loss is a loss function for binary classification that measures how well a predicted Bernoulli probability matches a binary label.

For a true label $y \in \{0, 1\}$ and a predicted probability $q \in (0, 1)$ for class $1$, it is defined by
$\ell(y, q) = -\bigl(y \log q + (1 - y) \log(1 - q)\bigr).$
Here $q$ is usually interpreted as the conditional probability $P(Y = 1 \mid X = x)$ predicted by the model.
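
The definition translates almost directly into code. The following is a minimal sketch, not a reference implementation: the function names and the clipping constant `eps` are assumptions introduced here, the latter to keep `log` away from $0$ when a model outputs exactly $0$ or $1$.

```python
import math

def binary_cross_entropy(y: int, q: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one label y in {0, 1} and predicted probability q."""
    # Clip q into [eps, 1 - eps] so that log never receives 0.
    # The clipping is an assumed numerical safeguard, not part of the definition.
    q = min(max(q, eps), 1.0 - eps)
    return -(y * math.log(q) + (1 - y) * math.log(1.0 - q))

def mean_binary_cross_entropy(labels, probs):
    """Average of the per-example losses over a dataset."""
    losses = [binary_cross_entropy(y, q) for y, q in zip(labels, probs)]
    return sum(losses) / len(losses)
```

For an uninformative prediction $q = 0.5$ the loss is $\log 2$ regardless of the label, which is a convenient baseline when reading training curves.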

Cases

For a positive example $y = 1$, the loss becomes

$\ell(1, q) = -\log q.$

The loss is zero at $q = 1$ and grows without bound as $q \to 0$, so the model is penalised for assigning low probability to the positive class.

For a negative example $y = 0$, the loss becomes

$\ell(0, q) = -\log(1 - q).$

The loss is zero at $q = 0$ and grows without bound as $q \to 1$, so the model is penalised for assigning high probability to the positive class.
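
To get a feel for the two cases, it can help to tabulate both branches for a few values of $q$. This is an illustrative sketch; the helper names `loss_positive` and `loss_negative` and the chosen probabilities are made up for the example.

```python
import math

def loss_positive(q: float) -> float:
    """Loss for a positive example (y = 1): -log q."""
    return -math.log(q)

def loss_negative(q: float) -> float:
    """Loss for a negative example (y = 0): -log(1 - q)."""
    return -math.log(1.0 - q)

# Print both branches side by side for a range of predicted probabilities.
for q in (0.99, 0.9, 0.5, 0.1, 0.01):
    print(f"q={q:<5} y=1: {loss_positive(q):.3f}   y=0: {loss_negative(q):.3f}")
```

The two branches mirror each other, since $\ell(1, q) = \ell(0, 1 - q)$. A confident mistake is far more costly than a confident correct prediction is rewarding: for a positive example, $q = 0.01$ costs about $4.6$, while $q = 0.99$ costs about $0.01$.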

Interpretation

Binary cross-entropy is the negative log-likelihood of a Bernoulli distribution. If the model predicts

$Y \mid X = x \sim \mathrm{Bernoulli}(q),$

then minimising binary cross-entropy is equivalent to maximising the likelihood of the observed labels.
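
The equivalence can be checked numerically. The sketch below assumes independent examples, so that the likelihood is a product of per-example Bernoulli terms; the labels and probabilities are hypothetical values chosen only for illustration.

```python
import math

labels = [1, 0, 1, 1, 0]            # hypothetical observed labels
probs = [0.9, 0.2, 0.7, 0.6, 0.1]   # hypothetical predictions of P(Y = 1 | x)

def bernoulli_likelihood(labels, probs):
    """Product of q^y * (1 - q)^(1 - y) over all examples."""
    likelihood = 1.0
    for y, q in zip(labels, probs):
        likelihood *= q if y == 1 else 1.0 - q
    return likelihood

def total_bce(labels, probs):
    """Sum of per-example binary cross-entropy losses."""
    return sum(-(y * math.log(q) + (1 - y) * math.log(1.0 - q))
               for y, q in zip(labels, probs))

nll = -math.log(bernoulli_likelihood(labels, probs))
bce = total_bce(labels, probs)
print(nll, bce)  # the two values agree up to floating-point rounding
```

Because the negative logarithm is strictly decreasing, any parameters that lower the summed loss raise the likelihood, and vice versa.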

Binary cross-entropy is the standard loss for logistic regression and for binary neural classifiers with a sigmoid output, where $q = \sigma(z)$ is the sigmoid of the model's logit $z$.
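
With a sigmoid output, the loss is often evaluated directly on the logit $z$ rather than on $q = \sigma(z)$, because computing $\log \sigma(z)$ naively loses precision for large $|z|$. The sketch below uses the identity $\ell(y, \sigma(z)) = \max(z, 0) - yz + \log(1 + e^{-|z|})$; the function names are assumptions for this example and not taken from any particular library.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function, branching on the sign of z to avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce_from_logit(y: int, z: float) -> float:
    """Binary cross-entropy expressed in the logit z, where q = sigmoid(z).

    Uses l(y, sigmoid(z)) = max(z, 0) - y*z + log(1 + exp(-|z|)),
    which never exponentiates a positive number.
    """
    return max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))

# For moderate logits the two routes agree.
z = 0.3
print(bce_from_logit(1, z), -math.log(sigmoid(z)))
```

For an extreme logit such as $z = 1000$ with $y = 0$, the logit form returns a finite loss of $1000$, whereas going through $q$ would require $\log(1 - \sigma(z))$ with $\sigma(z)$ rounded to $1$.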

Relation to KL Divergence

For a soft target distribution with true conditional probability $p = P(Y = 1 \mid X = x)$ and predicted probability $q$, the KL divergence is

$D_{KL}(P \parallel Q) = p \log \frac{p}{q} + (1 - p) \log \frac{1 - p}{1 - q}.$

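This divergence equals the cross-entropy $-\bigl(p \log q + (1 - p) \log(1 - q)\bigr)$ minus the entropy of the target, $-\bigl(p \log p + (1 - p) \log(1 - p)\bigr)$. The entropy term does not depend on $q$, so minimising the divergence over $q$ is equivalent to minimising the cross-entropy. When the target is a hard label $y \in \{0, 1\}$, the entropy term vanishes (with the convention $0 \log 0 = 0$), and the divergence is exactly the binary cross-entropy $\ell(y, q)$.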
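
The relationship can be verified numerically through the identity $D_{KL}(P \parallel Q) = H(p, q) - H(p)$, where $H(p, q)$ is the cross-entropy and $H(p)$ the entropy of the target. This is a small sketch under the convention $0 \log 0 = 0$; the helper `xlogy` and the test values are assumptions made for the example.

```python
import math

def xlogy(x: float, y: float) -> float:
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0.0 else x * math.log(y)

def cross_entropy(p: float, q: float) -> float:
    """H(p, q) for two Bernoulli distributions with parameters p and q."""
    return -(xlogy(p, q) + xlogy(1.0 - p, 1.0 - q))

def entropy(p: float) -> float:
    """H(p), the entropy of a Bernoulli(p) target."""
    return cross_entropy(p, p)

def kl_divergence(p: float, q: float) -> float:
    """D_KL(P || Q) for Bernoulli parameters p and q, with 0 < q < 1."""
    return xlogy(p, p / q) + xlogy(1.0 - p, (1.0 - p) / (1.0 - q))

# Soft target: KL and cross-entropy differ by the constant H(p).
print(kl_divergence(0.3, 0.6), cross_entropy(0.3, 0.6) - entropy(0.3))

# Hard target: H(p) = 0, so KL coincides with binary cross-entropy.
print(kl_divergence(1.0, 0.8), cross_entropy(1.0, 0.8))
```

In both cases the gap between the divergence and the cross-entropy is independent of $q$, which is why the two objectives share the same minimiser.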

Lukas' Notes
