Definition

One-Hot Encoding

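One-hot encoding is a preprocessing technique used to represent nominal (unordered categorical) variables as numerical vectors. Formally, for a categorical feature with a finite alphabet of $k$ unique symbols $\Sigma = \{s_1, s_2, \dots, s_k\}$, each symbol $s_i$ is mapped to a binary vector $\mathbf{e}_i \in \{0, 1\}^k$ where:

$$(\mathbf{e}_i)_j = \begin{cases} 1 & \text{if } j = i, \\ 0 & \text{otherwise,} \end{cases} \qquad j = 1, \dots, k.$$
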
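Because each category occupies its own dimension and none is assigned a larger numeric value than another, the encoding avoids implying an ordinal relationship between purely nominal categories, which an integer labelling (for example Red = 1, Green = 2, Blue = 3) would wrongly suggest to the learning algorithm.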
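
A short calculation illustrates why. Write $\mathbf{e}_i \in \{0, 1\}^k$ for the one-hot vector of the $i$-th of $k$ categories (a 1 in position $i$, 0 elsewhere). For any two distinct categories $i \neq j$:

$$\mathbf{e}_i \cdot \mathbf{e}_j = 0, \qquad \lVert \mathbf{e}_i - \mathbf{e}_j \rVert_2^2 = \sum_{m=1}^{k} \big( (\mathbf{e}_i)_m - (\mathbf{e}_j)_m \big)^2 = 1 + 1 = 2,$$

so every pair of distinct categories is orthogonal and lies at the same distance $\sqrt{2}$. By contrast, an integer labelling that maps the $i$-th category to $i$ (shown only for comparison) places categories at distances $|i - j|$ ranging from $1$ to $k - 1$, which implicitly ranks them.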

Applications

String Vectorisation: A sequence can be represented as a sparse matrix by stacking the one-hot vectors of its constituent characters or words as rows, one row per position in the sequence.

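Neural Networks: Frequently used to represent the discrete target labels in classification tasks, where the one-hot target is compared with the network's predicted class probabilities, typically through a cross-entropy loss.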
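
Both applications above can be sketched in plain Python. This is a minimal illustration, not a library API: `one_hot`, `vectorise` and `cross_entropy` are hypothetical helper names, the three-symbol vocabulary is made up, and the matrix is stored as dense lists for readability, whereas a practical pipeline would typically use a sparse format because each row holds a single non-zero entry.

```python
import math


def one_hot(index: int, size: int) -> list[int]:
    """Return a length-`size` binary vector with a single 1 at `index`."""
    vector = [0] * size
    vector[index] = 1
    return vector


def vectorise(tokens: list[str], vocabulary: list[str]) -> list[list[int]]:
    """Stack one one-hot row per token, in sequence order.

    The result is a len(tokens) x len(vocabulary) matrix (dense here for
    readability; each row has exactly one non-zero entry).
    """
    position = {symbol: i for i, symbol in enumerate(vocabulary)}
    return [one_hot(position[token], len(vocabulary)) for token in tokens]


def cross_entropy(target: list[int], probabilities: list[float]) -> float:
    """H(y, p) = -sum_j y_j * log(p_j) for a one-hot target y.

    Terms with y_j = 0 contribute nothing and are skipped, so only the
    probability of the true class must be positive.
    """
    return -sum(y * math.log(p) for y, p in zip(target, probabilities) if y)


# Character-level vectorisation with a hypothetical three-symbol vocabulary.
matrix = vectorise(list("cab"), ["a", "b", "c"])
print(matrix)  # [[0, 0, 1], [1, 0, 0], [0, 1, 0]]

# Classification target: the true class is index 2 of 3.
loss = cross_entropy(one_hot(2, 3), [0.2, 0.3, 0.5])
print(round(loss, 4))  # 0.6931
```

Because the target contains a single 1, the cross-entropy collapses to $-\log p$ of the true class, which is why the helper only evaluates the active entry.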

Example

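A categorical feature representing the colour of an object (e.g., Red, Green, Blue) can be encoded as:

$$\text{Red} \mapsto (1, 0, 0), \qquad \text{Green} \mapsto (0, 1, 0), \qquad \text{Blue} \mapsto (0, 0, 1),$$

where the active category is indicated by $1$ and all others by $0$.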
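
A minimal Python sketch of the colour example, assuming the categories are assigned positions in the order listed (Red, Green, Blue); any fixed order works as long as it is applied consistently. `COLOURS` and `encode` are illustrative names, not part of any library.

```python
# Assumed category order; it fixes which position each colour occupies.
COLOURS = ["Red", "Green", "Blue"]


def encode(colour: str) -> list[int]:
    """Return the one-hot vector for `colour`: 1 at its position, 0 elsewhere."""
    if colour not in COLOURS:
        raise ValueError(f"unknown colour: {colour!r}")
    return [1 if c == colour else 0 for c in COLOURS]


for colour in COLOURS:
    print(colour, encode(colour))
# Red [1, 0, 0]
# Green [0, 1, 0]
# Blue [0, 0, 1]
```

Rejecting unseen categories is a design choice of this sketch; an encoder could instead map them to an all-zero vector.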