Lukas' Notes

In a perceptron, the score

$$z = w^\top x + b$$

is only a side-test. If $z$ is positive, the point is on one side of the hyperplane. If $z$ is negative, it is on the other side. The sign gives a hard decision.

The sigmoid function softens this. Instead of asking whether the score $z$ is merely positive or negative, it turns the score into a number between $0$ and $1$:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

That number can be interpreted as a conditional probability:

$$P(y = 1 \mid x) = \sigma(w^\top x + b)$$

So the precise statement is not $P(x) = \sigma(z)$. The input is $x$; the random label is $y$. The model says: given this input $x$, the probability of class $1$ is the sigmoid of the score.
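As a minimal sketch of the step from score to probability, the snippet below computes a linear score and passes it through the sigmoid. The weights, bias, and input are hypothetical values chosen only for illustration, and the helper names `score` and `sigmoid` are not from the notes.

```python
import math

def score(w, x, b):
    """Linear score z = w . x + b for one input vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical weights, bias, and input (illustration only).
w, b = [2.0, -1.0], 0.5
x = [1.0, 0.5]

z = score(w, x, b)          # the geometric side-test
hard = 1 if z > 0 else 0    # perceptron: keep only the sign
p = sigmoid(z)              # logistic regression: P(y = 1 | x)

print(f"z = {z:.3f}, hard decision = {hard}, P(y=1|x) = {p:.4f}")
```

The hard decision and the probability agree on which side the point is on; the probability additionally says how strongly.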

The sigmoid does not move the hyperplane. The point where the model is undecided is still

$$w^\top x + b = 0$$

At that point, $\sigma(0) = 0.5$. The model is exactly balanced between class $0$ and class $1$. Moving away from the hyperplane changes confidence, not just the label.
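One way to check this numerically is to start from a point on the hyperplane and step along the unit normal. The hyperplane below (`w`, `b`) and the starting point `x0` are assumed values for illustration; with them, a signed distance `d` from the boundary gives a score of `norm_w * d`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical hyperplane w . x + b = 0, with ||w|| = 5.
w, b = [3.0, 4.0], -5.0
norm_w = math.hypot(*w)
unit_normal = [wi / norm_w for wi in w]

# x0 lies on the hyperplane: 3 * 0.6 + 4 * 0.8 - 5 = 0.
x0 = [0.6, 0.8]

def prob_at_distance(d):
    """P(y = 1 | x) for the point at signed distance d from the boundary."""
    x = [x0i + d * ni for x0i, ni in zip(x0, unit_normal)]
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # equals norm_w * d
    return sigmoid(z)

probs = {d: prob_at_distance(d) for d in (-1.0, -0.2, 0.0, 0.2, 1.0)}
for d, p in probs.items():
    print(f"d = {d:+.1f}  ->  P(y=1|x) = {p:.4f}")
```

The probability is one half on the boundary and moves monotonically toward 0 or 1 as the point moves to either side.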

Distance becomes confidence

The score $z$ measures signed position relative to the boundary. Large positive scores mean strong evidence for class $1$. Large negative scores mean strong evidence for class $0$.

A hard perceptron keeps only the sign of the score $z$. Logistic regression keeps the whole score and reads it as confidence. Near $z = 0$, small changes matter a lot. Far away from $z = 0$, the sigmoid saturates: the model is already very sure, so extra distance changes the probability only slightly.
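The saturation can be made concrete with the sigmoid's derivative, $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, which peaks at $z = 0$ and shrinks toward zero as $|z|$ grows. The sketch below is illustrative only; the sample scores are arbitrary, and `sigmoid_slope` is a helper name introduced here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_slope(z):
    """Derivative of the sigmoid: sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Compare how much one extra unit of score buys at different positions.
for z in (0.0, 1.0, 3.0, 6.0):
    gain = sigmoid(z + 1.0) - sigmoid(z)
    print(f"z = {z:3.1f}  p = {sigmoid(z):.4f}  "
          f"slope = {sigmoid_slope(z):.4f}  gain from +1 = {gain:.4f}")
```

The same step in score produces a large change in probability near the boundary and an almost invisible one far from it.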

The output is a biased coin

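For binary classification, the sigmoid output parameterises a Bernoulli distribution over the label:

$$y \mid x \sim \mathrm{Bernoulli}(\sigma(z))$$

This means

$$P(y = 1 \mid x) = \sigma(z), \qquad P(y = 0 \mid x) = 1 - \sigma(z).$$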

This is the small conceptual turn. The linear score still comes from geometry: it says where the point lies relative to a boundary. The sigmoid wraps that geometry in probability. One side gives a high conditional probability $P(y = 1 \mid x)$; the other side gives a low one; the boundary itself becomes uncertainty.
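The biased-coin reading can be sketched directly: treat the sigmoid output as the coin's bias and draw labels from it. The value of `p` below is a hypothetical sigmoid output for some input, and the seed is fixed only so the run is repeatable.

```python
import random

def bernoulli_pmf(y, p):
    """P(y | x) for y in {0, 1} when P(y = 1 | x) = p."""
    return p if y == 1 else 1.0 - p

def sample_label(p, rng):
    """Draw one label: 1 with probability p, otherwise 0."""
    return 1 if rng.random() < p else 0

p = 0.88                  # hypothetical sigmoid output for some input x
rng = random.Random(0)    # fixed seed for repeatability
labels = [sample_label(p, rng) for _ in range(10_000)]
frequency = sum(labels) / len(labels)

print(f"P(y=1|x) = {bernoulli_pmf(1, p):.2f}, P(y=0|x) = {bernoulli_pmf(0, p):.2f}")
print(f"fraction of sampled labels equal to 1: {frequency:.3f}")
```

Over many draws the fraction of ones settles near `p`, which is what it means for the model's output to be a probability for the label rather than the label itself.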

The hyperplane is still there. The sigmoid makes it probabilistic.