MLPs are universal Boolean functions

A single perceptron draws a line. It fires if the weighted sum of its inputs crosses a threshold. This makes it a linear classifier: everything on one side is $1$ , everything on the other is $0$ .

Linear is enough for AND, OR, NOT — and even for majority gates that fire when at least $K$ inputs are $1$ . But linear is not enough for XOR.

XOR is not linearly separable. No single perceptron, no matter how you set its weights and threshold, can output $1$ for $(0, 1)$ and $(1, 0)$ while outputting $0$ for $(0, 0)$ and $(1, 1)$ . One layer is not enough.

But two layers are. An MLP with a single hidden layer solves XOR with three perceptrons:

The hidden layer does the work. $h_{1}$ fires when at least one input is $1$ (OR). $h_{2}$ fires when both are $1$ (AND). The output neuron subtracts: $y = h_{1} - h_{2}$ , firing only when exactly one input is $1$ . Three perceptrons, two layers, XOR solved.

This scales to any Boolean function. Every truth table can be written in disjunctive normal form — an OR of ANDs, one AND term per row where the output is $1$ . The first hidden layer computes the AND terms. The output neuron ORs them together. A single hidden layer is enough.

A one-hidden-layer MLP is a universal Boolean function.

But universality does not come cheap. The DNF construction can require exponentially many hidden neurons. For $N$ input variables, the worst-case irreducible function — a checkerboard pattern on a Karnaugh map — demands $2^{N - 1}$ hidden perceptrons. That is exponential in $N$ .

Depth changes the arithmetic. The same checkerboard function can be built as a hierarchy of XORs. An XOR needs 3 perceptrons. An XOR of $N$ variables built hierarchically — pairwise, then pairwise again — needs only $3 (N - 1)$ perceptrons arranged in about $2 lo g_{2} (N)$ layers. Linear, not exponential.

The trade is stark. A shallow MLP pays an exponential price in width. A deep MLP pays linearly. The same function, the same Boolean truth table — but depth transforms the cost from impossible to manageable.

MLPs are universal Boolean functions. But universality is a blunt claim. What matters is the shape of the network that achieves it. Depth is not an implementation detail. It is the difference between a network that fits in the universe and one that does not.

Lukas' Notes

MLPs are universal Boolean functions

Backlinks