Definition
Categorical Data
Categorical data (or qualitative data) is a data type consisting of discrete values that represent distinct groups or categories. Formally, categorical data maps each observation to one label from a finite set of labels. Categorical data is subdivided based on the presence of an intrinsic ordering:
- Nominal Data: No inherent order exists between categories.
- Ordinal Data: A meaningful ranking or order exists between categories.
Unlike numerical data, categorical data does not support standard arithmetic operations, though it can be represented numerically through encoding schemes for processing by learning algorithms.
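The difference between nominal and ordinal labels can be sketched in a few lines of Python. This is a minimal illustration, not a standard API: the label sets and the names `NOMINAL_COLORS`, `ORDINAL_SIZES`, `SIZE_RANK`, and `is_larger` are assumptions made up for the example.

```python
# Hypothetical label sets, chosen only for illustration.
NOMINAL_COLORS = {"red", "green", "blue"}      # no order: only equality is meaningful
ORDINAL_SIZES = ["small", "medium", "large"]   # listed from lowest to highest rank

# Rank lookup derived from each label's position in the ordered list.
SIZE_RANK = {label: rank for rank, label in enumerate(ORDINAL_SIZES)}


def is_larger(a: str, b: str) -> bool:
    """Compare two ordinal labels by rank, not by their string values."""
    return SIZE_RANK[a] > SIZE_RANK[b]


# Nominal labels support equality checks only.
same_color = "red" == "red"

# Ordinal labels support ranking once an explicit order is supplied.
large_beats_small = is_larger("large", "small")

# A plain string comparison is alphabetical and ignores the intended ranking.
alphabetical = "large" > "small"
```

The rank lookup is what makes the comparison meaningful. Comparing the raw strings would order the labels alphabetically, which has nothing to do with size.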
Examples
Binary Classification: Tasks such as “Yes/No” or “Pass/Fail”, where the label set contains exactly two values.
Multi-class Categories: Sets of distinct objects like {dog, cat, bird, fish, none}.
Preprocessing
- Deal with missing values (delete or impute)
- Re-label ordinal values (e.g., small/medium/large → 0/1/2)
- One-hot encode nominal values (one binary indicator per category)
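The three preprocessing steps above can be sketched with the Python standard library. This is one possible approach under stated assumptions: missing values are represented as `None`, imputation uses the most frequent label (one of several options), and the helper names `impute_mode`, `encode_ordinal`, and `one_hot` are hypothetical, not taken from any library.

```python
from collections import Counter


def impute_mode(values):
    """Replace None entries with the most frequent observed label."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def encode_ordinal(values, order):
    """Map each ordinal label to its position in the given order."""
    rank = {label: i for i, label in enumerate(order)}
    return [rank[v] for v in values]


def one_hot(values, categories):
    """Return one 0/1 indicator vector per value, one slot per category."""
    return [[1 if v == c else 0 for c in categories] for v in values]


# Ordinal feature with a missing entry (assumed to be encoded as None).
sizes = ["small", None, "large", "small"]
sizes_filled = impute_mode(sizes)
size_codes = encode_ordinal(sizes_filled, ["small", "medium", "large"])

# Nominal feature: no order, so each category gets its own indicator.
animals = ["dog", "cat", "bird"]
animal_vectors = one_hot(animals, ["dog", "cat", "bird", "fish", "none"])
```

Integer codes suit ordinal values because the numeric order mirrors the ranking. For nominal values the same codes would suggest an order that does not exist, which is why one-hot vectors are used instead.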