Lukas' Notes

A single perceptron draws a line. It fires if the weighted sum of its inputs crosses a threshold. This makes it a linear classifier: everything on one side is , everything on the other is .

Linear is enough for AND, OR, NOT — and even for majority gates that fire when at least inputs are . But linear is not enough for XOR.

XOR is not linearly separable. No single perceptron, no matter how you set its weights and threshold, can output for and while outputting for and . One layer is not enough.

But two layers are. An MLP with a single hidden layer solves XOR with three perceptrons:

The hidden layer does the work. fires when at least one input is (OR). fires when both are (AND). The output neuron subtracts: , firing only when exactly one input is . Three perceptrons, two layers, XOR solved.

This scales to any Boolean function. Every truth table can be written in disjunctive normal form — an OR of ANDs, one AND term per row where the output is . The first hidden layer computes the AND terms. The output neuron ORs them together. A single hidden layer is enough.

A one-hidden-layer MLP is a universal Boolean function.

But universality does not come cheap. The DNF construction can require exponentially many hidden neurons. For input variables, the worst-case irreducible function — a checkerboard pattern on a Karnaugh map — demands hidden perceptrons. That is exponential in .

Depth changes the arithmetic. The same checkerboard function can be built as a hierarchy of XORs. An XOR needs 3 perceptrons. An XOR of variables built hierarchically — pairwise, then pairwise again — needs only perceptrons arranged in about layers. Linear, not exponential.

The trade is stark. A shallow MLP pays an exponential price in width. A deep MLP pays linearly. The same function, the same Boolean truth table — but depth transforms the cost from impossible to manageable.

MLPs are universal Boolean functions. But universality is a blunt claim. What matters is the shape of the network that achieves it. Depth is not an implementation detail. It is the difference between a network that fits in the universe and one that does not.