Sigmoid Turns a Score into a Probability

In a perceptron, the score

$$ z = \langle w, x \rangle $$

serves only as a test of sides. If $z$ is positive, the point lies on one side of the hyperplane. If $z$ is negative, it lies on the other side. The sign alone gives a hard decision.

The sigmoid function softens this. Instead of asking only whether $z$ is positive or negative, it maps the score to a number strictly between $0$ and $1$:

$$ \sigma(z) = \frac{1}{1 + e^{-z}}. $$

That number can be interpreted as a conditional probability:

$$ P(Y = 1 \mid X = x) = \sigma(\langle w, x \rangle). $$

So the precise statement is not $P(y = x \mid x)$. The input is $x$; the random label is $Y$. The model says: given this input $x$, the probability of class $1$ is the sigmoid of the score.

The sigmoid does not move the hyperplane. The point where the model is undecided is still

$$ \langle w, x \rangle = 0. $$

At that point, $\sigma(0) = 0.5$. The model is exactly balanced between class $0$ and class $1$. Position relative to the hyperplane now determines confidence, not just the label.
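A minimal sketch of the mapping from score to probability, in plain Python. The weight vector `w` and the three sample points are made-up values chosen so the scores come out to $0$, $3$ and $-3$; the helper names `score` and `prob_class_1` are illustrative, not part of any library.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid, branching on the sign of z to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def score(w, x):
    """Inner product <w, x> of two equal-length sequences."""
    return sum(wi * xi for wi, xi in zip(w, x))

def prob_class_1(w, x):
    """Model's P(Y = 1 | X = x) for weights w."""
    return sigmoid(score(w, x))

w = [2.0, -1.0]              # hypothetical weights
on_boundary = [1.0, 2.0]     # <w, x> = 0
positive_side = [2.0, 1.0]   # <w, x> = 3
negative_side = [0.0, 3.0]   # <w, x> = -3

for x in (on_boundary, positive_side, negative_side):
    print(score(w, x), prob_class_1(w, x))
```

The point on the boundary gets probability exactly $0.5$, and the two points at scores $3$ and $-3$ get probabilities that sum to $1$, which reflects the symmetry $\sigma(-z) = 1 - \sigma(z)$.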

Distance becomes confidence

The score $z$ measures signed position relative to the boundary: it is the signed distance to the hyperplane scaled by $\|w\|$. Large positive scores mean strong evidence for class $1$. Large negative scores mean strong evidence for class $0$.

A hard perceptron keeps only the sign of $z$. Logistic regression keeps the whole score and reads it as confidence. Near $z = 0$, small changes in the score move the probability a lot. Far away from $0$, the sigmoid saturates: the model is already very sure, so extra distance changes the probability only slightly.
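Saturation can be checked numerically. The sketch below compares how much the probability rises for the same one-unit increase in score, once starting at the boundary and once starting far from it. The helper `prob_gain` and the starting scores $0$ and $6$ are arbitrary choices for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid; adequate for the moderate scores used here."""
    return 1.0 / (1.0 + math.exp(-z))

def prob_gain(z: float, step: float = 1.0) -> float:
    """Increase in P(Y = 1) when the score moves from z to z + step."""
    return sigmoid(z + step) - sigmoid(z)

near = prob_gain(0.0)  # starting on the boundary: roughly 0.23
far = prob_gain(6.0)   # starting deep on the class-1 side: roughly 0.0016

print(near, far)
```

The same step in score is worth more than a hundred times as much probability near the boundary as it is at $z = 6$.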

The output is a biased coin

For binary classification, the sigmoid output parameterises a Bernoulli distribution over the label:

$$ Y \mid X = x \sim \mathrm{Bernoulli}(p), \qquad p = \sigma(\langle w, x \rangle). $$

This means

$$ P(Y = 1 \mid X = x) = p, \qquad P(Y = 0 \mid X = x) = 1 - p. $$

This is the small conceptual turn. The linear score $\langle w, x \rangle$ still comes from geometry: it says where the point lies relative to a boundary. The sigmoid wraps that geometry in probability. One side gives a high conditional probability $P(Y = 1 \mid X = x)$; the other side gives a low one; the boundary itself becomes uncertainty.
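The biased-coin reading can be simulated directly. This sketch assumes a single input whose score is $1$ (a made-up value), draws many labels from the resulting Bernoulli distribution, and compares the observed frequency of class $1$ with $p$. The function `sample_label` is a hypothetical helper, and the fixed seed only makes the run repeatable.

```python
import math
import random

def sigmoid(z: float) -> float:
    """Logistic sigmoid; adequate for the moderate score used here."""
    return 1.0 / (1.0 + math.exp(-z))

def sample_label(p: float, rng: random.Random) -> int:
    """Draw Y ~ Bernoulli(p): 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0

rng = random.Random(0)   # fixed seed for repeatability
p = sigmoid(1.0)         # assumed score <w, x> = 1, so p is about 0.73
labels = [sample_label(p, rng) for _ in range(10_000)]
frequency = sum(labels) / len(labels)

print(p, frequency)
```

Any single draw is just a $0$ or a $1$; the probability $p$ only shows up in the long-run frequency, which is what distinguishes this from the perceptron's hard decision.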

The hyperplane is still there. The sigmoid makes it probabilistic.

Lukas' Notes
