Lukas' Notes

A perceptron is a small geometric instrument. It receives a point $x$, computes a score, and then asks only one question: is $w^\top x + b$ positive?

Here $w^\top x$ is a matrix-vector product: the weight vector is read as a row vector $w^\top$, the input is read as a column vector $x$, and the result is one scalar score.

That question cuts the space into two sides. The quiet boundary between them is the hyperplane $w^\top x + b = 0$, the set of points whose score is exactly zero.

In two dimensions this hyperplane looks like a line. In three dimensions it looks like a plane. In $n$ dimensions it is still the same idea: one flat boundary, with one side labelled positive and the other side labelled negative.
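A minimal sketch of that single question in plain Python may help. The function names `score` and `side`, the example line, and the choice to count boundary points as negative are assumptions made for illustration, not something fixed by these notes.

```python
def score(w, x, b):
    """Scalar score: weights times input, summed, plus the bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def side(w, x, b):
    """Return +1 on the positive side, -1 otherwise.

    Assumption: points exactly on the hyperplane are counted as negative.
    """
    return 1 if score(w, x, b) > 0 else -1


# Hypothetical 2D example: the boundary is the line x1 + x2 - 1 = 0.
w, b = [1.0, 1.0], -1.0
print(side(w, [2.0, 2.0], b))  # score is 3, so this prints 1
print(side(w, [0.0, 0.0], b))  # score is -1, so this prints -1
```

The same two functions work unchanged for inputs of any dimension, because nothing in them depends on the length of `w`.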

The weights and the bias are not separate boundaries. They are the knobs that place one boundary. In $\mathbb{R}^n$, one perceptron has $n$ weights and one bias, so it spends $n + 1$ parameters to draw one hyperplane.

Parameters ≠ Hyperplanes

This is the important correction when large networks are mentioned. Four billion parameters do not mean four billion hyperplanes. A hyperplane is made from a whole weight vector and a bias. Parameters are the coordinates of the tool, not the tool itself.
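The bookkeeping can be sketched in a few lines. The helper names `perceptron_params` and `layer_counts` are hypothetical, and the sketch only counts a single layer of perceptrons that all read the same input with n coordinates.

```python
def perceptron_params(n):
    """One perceptron on n-dimensional input: n weights plus one bias."""
    return n + 1


def layer_counts(n, m):
    """m perceptrons reading the same n-dimensional input.

    Returns (number of hyperplanes, number of parameters).
    """
    return m, m * perceptron_params(n)


print(layer_counts(2, 5))     # (5, 15): five lines in the plane cost 15 parameters
print(layer_counts(1000, 4))  # (4, 4004): four hyperplanes cost 4004 parameters
```

The number of hyperplanes grows with the number of neurons, while the number of parameters grows with neurons times input dimension, so the two counts drift apart quickly.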

A layer is a small fence-maker

Now put several perceptrons side by side. They all receive the same input point, but each has its own weight vector $w_i$ and bias $b_i$.

Each hidden perceptron draws one hyperplane in the input space. Together, a hidden layer draws a small arrangement of flat fences.

A convex pentagon is a good exam picture because it makes the count visible. Five edges need five half-plane tests. The output neuron can then behave like an AND gate: accept the point only if all five hidden neurons say that the point is on the correct side of their edge.

This is why the informal sentence “each neuron draws a line” is useful in two-dimensional examples. The precise version is:

each hidden perceptron defines one hyperplane and tests one side of it.

The drawing is cosy, but the condition is strict. If even one edge-test fails, the point is outside the pentagon.
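The pentagon picture can be tried out directly. In this sketch the vertex coordinates are a made-up example, the helper names are hypothetical, each hidden neuron answers 1 or 0, and the final AND is written as plain logic rather than as a trained output neuron.

```python
# Hypothetical convex pentagon, vertices listed counter-clockwise.
VERTICES = [(0, 0), (4, 0), (5, 3), (2, 5), (-1, 3)]


def edge_neurons(vertices):
    """Build one (w, b) pair per edge so that the interior has score >= 0."""
    neurons = []
    for i, (px, py) in enumerate(vertices):
        qx, qy = vertices[(i + 1) % len(vertices)]
        w = (-(qy - py), qx - px)     # edge direction rotated 90 degrees to the left
        b = -(w[0] * px + w[1] * py)  # makes the score zero on the edge itself
        neurons.append((w, b))
    return neurons


def hidden(x, neurons):
    """One half-plane test per hidden neuron: 1 if passed, 0 if failed."""
    return [1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0 for w, b in neurons]


def inside(x, neurons):
    """AND over the edge tests: every neuron must agree."""
    return all(hidden(x, neurons))


neurons = edge_neurons(VERTICES)
print(hidden((2, 2), neurons), inside((2, 2), neurons))  # [1, 1, 1, 1, 1] True
print(hidden((6, 1), neurons), inside((6, 1), neurons))  # [1, 0, 1, 1, 1] False
```

The second point passes four of the five tests and is still rejected, which is the strictness described above.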

The output combines; it does not redraw

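In a one-hidden-layer MLP, the hidden layer makes the geometric tests. The output layer sees only their results, the vector of hidden activations $h = (h_1, \dots, h_m)$, where $h_i$ records on which side of its hyperplane hidden neuron $i$ places the input.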

The output neuron also has weights and a bias, but its hyperplane lives in the hidden activation space, not directly in the original plane. It is deciding how to combine the answers produced by the hidden neurons.
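To make "a hyperplane in the hidden activation space" concrete, here is a small sketch. It assumes the hidden neurons answer with 0 or 1, and the names `output_neuron`, `v`, and `c` are chosen for illustration. With all output weights set to 1, only the bias decides whether the neuron behaves like an AND gate or an OR gate.

```python
def output_neuron(h, v, c):
    """A perceptron that reads hidden activations h, not the original input."""
    return 1 if sum(vi * hi for vi, hi in zip(v, h)) + c > 0 else 0


m = 5
v = [1.0] * m
and_bias = -(m - 0.5)  # positive score only when all m activations are 1
or_bias = -0.5         # positive score as soon as one activation is 1

all_pass = [1, 1, 1, 1, 1]
one_fail = [1, 0, 1, 1, 1]

print(output_neuron(all_pass, v, and_bias))  # 5 - 4.5 > 0, prints 1
print(output_neuron(one_fail, v, and_bias))  # 4 - 4.5 < 0, prints 0
print(output_neuron(one_fail, v, or_bias))   # 4 - 0.5 > 0, prints 1
```

The hidden answers are identical in the last two calls. Only the placement of the output hyperplane changed, and with it the rule for combining them.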

So the comforting mental model is this:

  • one perceptron gives one flat cut;
  • one hidden layer gives many flat cuts in the original input space;
  • the output layer combines those cuts into a region;
  • deeper layers repeat the idea in transformed feature spaces.

The first layer draws fences on the original ground. Later layers no longer walk on that ground directly. They walk on the map of previous answers. That is why deep networks can make curved, folded, and highly intricate decision regions without each parameter being its own boundary.
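A tiny hand-built network shows later layers working on previous answers. Everything in this sketch is an assumption made for illustration: the weights are set by hand rather than learned, the activations are 0/1 steps, and the two first-layer cuts are simply the coordinate axes. The accepted region is made of two opposite quadrants, which no single flat cut could produce.

```python
def step(z):
    return 1 if z > 0 else 0


def unit(inputs, weights, bias):
    """One perceptron with a 0/1 step activation."""
    return step(sum(w * a for w, a in zip(weights, inputs)) + bias)


def network(x):
    # Layer 1: two cuts on the original plane (x1 > 0 and x2 > 0).
    h = [unit(x, [1, 0], 0), unit(x, [0, 1], 0)]
    # Layer 2: cuts in the space of layer-1 answers.
    g = [
        unit(h, [1, -1], -0.5),  # first test passed, second failed
        unit(h, [-1, 1], -0.5),  # second test passed, first failed
    ]
    # Output: OR of the two layer-2 answers.
    return unit(g, [1, 1], -0.5)


for point in [(1, -1), (-1, 1), (1, 1), (-1, -1)]:
    print(point, network(point))
# (1, -1) 1
# (-1, 1) 1
# (1, 1) 0
# (-1, -1) 0
```

Only eleven parameters are used here when the output unit is counted, yet the accepted region already falls into two separate pieces.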

A perceptron does not draw a whole shape by itself. It draws one side of a possible shape. The network is the patient composition of many such sides.