The Geometry of the Perceptron Algorithm

The perceptron algorithm is often presented as an algebraic update: on a mistake, add the misclassified instance to the weight vector. The algebra is simple, but the geometry is what makes it work — and what makes it fail.

Start with a random hyperplane through the origin. The weight vector $w$ is the normal. Everything on the $w$ side is classified $+ 1$ ; everything opposite is $- 1$ . A random plane will make mistakes.

Two points are on the wrong side. A positive instance sits in the negative half-plane. A negative instance sits in the positive half-plane. The hyperplane needs to move.

The update rule for a misclassified instance $x$ with true label $y \in {- 1, + 1}$ is:

w \leftarrow w + y x

For a misclassified positive ( $y = + 1$ ), add $x$ to $w$ . Geometrically, this rotates the weight vector toward the point — and the decision boundary, always perpendicular to $w$ , swings with it.

Adding a positive misclassified instance pulls $w$ toward it. The boundary rotates, bringing the point closer to the correct side. One update may not fix everything — but it pushes in the right direction.

A misclassified negative ( $y = - 1$ ) calls for subtraction: $w \leftarrow w - x$ . This rotates the weight vector away from the offending point, swinging the boundary to push it back into the negative half-plane.

Each mistake nudges the hyperplane: pull toward the positives, push away from the negatives. The plane swings, corrects some points, perhaps misclassifies others, and swings again. The process repeats — cycle through the data, update on every error, until no errors remain.

If the data is linearly separable — if some separating hyperplane exists — the algorithm is guaranteed to converge in a finite number of steps. The bound is $(\frac{R}{γ})^{2}$ mistakes, where $R$ is the length of the longest instance vector and $γ$ is the margin — the distance from the best separating plane to the closest point.

There is a clean geometric shortcut: reflect every negative instance across the origin (negate all its coordinates). Now all instances want to be on the same side of the hyperplane. The problem reduces to finding a plane with every point on the positive side — a single half-space containing all the reflected data. If the classes are linearly separable, such a plane exists. The perceptron algorithm finds one.

When the data is not linearly separable, no plane works. The perceptron algorithm will swing forever, never settling. There is no separating hyperplane to converge to. The algorithm’s guarantee — and its geometry — depends entirely on that assumption.

The perceptron algorithm is geometric at its core. The update is not an arbitrary correction. It is a physical rotation: the weight vector leans toward the positives and recoils from the negatives, each mistake a nudge, until the hyperplane finds its place — or proves that no place exists.

Lukas' Notes

The Geometry of the Perceptron Algorithm

Backlinks