One-Hot Encoding
Definition
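One-hot encoding is a preprocessing technique used to represent nominal (categorical) variables as numerical vectors. Formally, for a categorical feature with a finite alphabet $\Sigma = \{s_1, s_2, \dots, s_K\}$ of $K$ unique symbols, each symbol $s_i$ is mapped to a binary vector $\mathbf{e}_i \in \{0, 1\}^K$ where:
$$(\mathbf{e}_i)_j = \begin{cases} 1 & \text{if } j = i, \\ 0 & \text{otherwise,} \end{cases} \qquad j = 1, \dots, K.$$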
This transformation prevents the learning algorithm from inferring a spurious ordinal relationship between purely nominal categories, as it might if the categories were encoded as consecutive integers.
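A minimal sketch of the mapping in plain Python, assuming the alphabet is supplied as an ordered list of unique symbols (the helper names `one_hot` and `euclidean` are illustrative, not taken from any library). It also checks the property behind the ordinality argument: every pair of distinct one-hot vectors lies at the same Euclidean distance $\sqrt{2}$, whereas integer codes 0, 1, 2 would place the first and third symbols twice as far apart as neighbouring ones.

```python
import math


def one_hot(symbol, alphabet):
    """Map `symbol` to its one-hot vector over the ordered `alphabet`.

    Assumes `alphabet` contains no duplicates; list.index raises
    ValueError if `symbol` is not in the alphabet.
    """
    index = alphabet.index(symbol)
    return [1 if j == index else 0 for j in range(len(alphabet))]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


alphabet = ["a", "b", "c"]
vectors = {s: one_hot(s, alphabet) for s in alphabet}
print(vectors)  # {'a': [1, 0, 0], 'b': [0, 1, 0], 'c': [0, 0, 1]}

# All distinct pairs are equidistant, so no category is "closer" to another.
print(euclidean(vectors["a"], vectors["b"]))  # 1.4142135623730951
print(euclidean(vectors["a"], vectors["c"]))  # 1.4142135623730951
```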
Applications
String Vectorisation: Sequences can be transformed into sparse matrices by stacking the one-hot vectors of their constituent characters or words, one row per token; reading the rows end to end gives the concatenation of those vectors.
Neural Networks: Frequently used to represent discrete class labels as target vectors in classification tasks.
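The string-vectorisation case can be sketched with the standard library alone. This is an illustrative sketch rather than any particular library's API: `build_vocabulary`, `one_hot_sparse` and `to_dense` are hypothetical helpers, characters are assumed to be the tokens, and the sparse matrix is kept in coordinate (COO) form, storing only the (row, column) position of each 1. Each row is one character's one-hot vector.

```python
def build_vocabulary(sequences):
    """Assign each distinct token an index, in order of first appearance."""
    vocab = {}
    for seq in sequences:
        for token in seq:
            vocab.setdefault(token, len(vocab))
    return vocab


def one_hot_sparse(sequence, vocab):
    """Return (shape, entries) for a len(sequence) x len(vocab) 0/1 matrix.

    Row i is the one-hot vector of sequence[i]; only the positions of the
    1s are stored, as (row, column) pairs (COO-style).
    """
    entries = [(row, vocab[token]) for row, token in enumerate(sequence)]
    return (len(sequence), len(vocab)), entries


def to_dense(shape, entries):
    """Expand the sparse entries into a full list-of-lists matrix."""
    rows, cols = shape
    matrix = [[0] * cols for _ in range(rows)]
    for r, c in entries:
        matrix[r][c] = 1
    return matrix


vocab = build_vocabulary(["hello", "world"])  # 7 distinct characters
shape, entries = one_hot_sparse("hello", vocab)
print(shape)    # (5, 7)
print(entries)  # [(0, 0), (1, 1), (2, 2), (3, 2), (4, 3)]
```

Only `len(sequence)` positions are stored instead of `len(sequence) * len(vocab)` cells, which is why such matrices are usually kept sparse. In practice a library sparse type, such as SciPy's `coo_matrix` or `csr_matrix`, would normally take the place of the tuple list.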
Example
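A categorical feature representing the colour of an object (e.g., Red, Green, Blue) can be encoded as:
$$\text{Red} \mapsto (1, 0, 0), \qquad \text{Green} \mapsto (0, 1, 0), \qquad \text{Blue} \mapsto (0, 0, 1),$$
where the active category is indicated by $1$ and the others by $0$.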
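A minimal sketch of the colour example applied to a whole column of observations, assuming the category order Red, Green, Blue (the names `COLOURS`, `encode` and `decode` are illustrative). Each observation becomes one row containing a single 1; because exactly one position is active, the original label can be recovered from the index of that 1.

```python
COLOURS = ("Red", "Green", "Blue")  # assumed fixed category order
INDEX = {colour: i for i, colour in enumerate(COLOURS)}


def encode(values):
    """Return one row per observation and one column per colour."""
    rows = []
    for value in values:
        row = [0] * len(COLOURS)
        row[INDEX[value]] = 1  # KeyError for a colour outside the alphabet
        rows.append(row)
    return rows


def decode(rows):
    """Recover each colour from the position of its single 1."""
    return [COLOURS[row.index(1)] for row in rows]


observations = ["Green", "Red", "Green", "Blue"]
encoded = encode(observations)
print(encoded)          # [[0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(decode(encoded))  # ['Green', 'Red', 'Green', 'Blue']
```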