Definition
k-fold Cross-Validation
k-fold cross-validation is a robust evaluation technique used to estimate the true risk of a model, particularly when the dataset is small. Formally, the dataset D is partitioned into k equally sized, disjoint folds D_1, ..., D_k. The procedure involves k iterations where, in iteration i:
- Fold D_i is held out as the test set.
- The remaining k - 1 folds are used as the training set.
- A performance metric M_i is computed on D_i.
The final performance estimate is the arithmetic mean of the metrics: M = (1/k) * (M_1 + ... + M_k).
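The procedure above can be sketched in plain Python. This is a minimal sketch, not a production implementation: `train_fn` and `metric_fn` are hypothetical placeholders supplied by the caller, and the random seed is an assumption for reproducibility.

```python
import random

def k_fold_cv(data, k, train_fn, metric_fn, seed=0):
    """Estimate performance by k-fold cross-validation.

    `train_fn(train)` returns a fitted model; `metric_fn(model, test)`
    returns a scalar score. Both are caller-supplied placeholders.
    """
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)       # shuffle before splitting
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    scores = []
    for i in range(k):
        test = [data[j] for j in folds[i]]     # fold i is held out
        train = [data[j] for f in folds[:i] + folds[i + 1:] for j in f]
        model = train_fn(train)
        scores.append(metric_fn(model, test))  # metric M_i on the held-out fold
    return sum(scores) / k                     # arithmetic mean of the k metrics
```

For example, with `train_fn` returning the training mean and `metric_fn` the mean absolute error on the held-out fold, the returned value is the cross-validated error estimate.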
Cross-Validation Variants
k-fold CV: The dataset is partitioned into k equally sized folds. For each fold, the model is trained on the remaining k - 1 folds and evaluated on the held-out fold. The overall metric is the average over these k iterations.
Leave-One-Out (LOO): A special case of k-fold cross-validation where k = n (the total number of samples). In each iteration, only a single observation is held out for testing. While the resulting estimate has low bias, it can exhibit high variance and is computationally expensive for large datasets, as it requires training n separate models.
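The LOO special case can be written directly, making the "n separate models" cost explicit. A minimal self-contained sketch, again with hypothetical `train_fn` and `metric_fn` placeholders:

```python
def loo_cv(data, train_fn, metric_fn):
    """Leave-one-out CV: k-fold cross-validation with k = n."""
    n = len(data)
    scores = []
    for i in range(n):
        test = [data[i]]               # a single held-out observation
        train = data[:i] + data[i + 1:]  # the remaining n - 1 samples
        model = train_fn(train)          # one model per sample: n trainings total
        scores.append(metric_fn(model, test))
    return sum(scores) / n
```

For instance, on data [1.0, 2.0, 3.0] with a training-mean model and mean absolute error, the three held-out errors are 1.5, 0.0, and 1.5, giving an LOO estimate of 1.0.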
Stratification
Stratified k-fold
In stratified k-fold cross-validation, the folds are constructed such that each fold contains approximately the same percentage of samples of each target class as the complete set, ensuring that each split is representative of the underlying class distribution.
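One simple way to build such folds is to group sample indices by class and distribute each class round-robin across the k folds. This is a minimal sketch of the fold-assignment step only; production splitters typically also shuffle within each class.

```python
from collections import defaultdict

def stratified_folds(labels, k):
    """Assign sample indices to k folds, preserving the class distribution.

    Round-robin within each class keeps every fold's class mix close to
    the overall distribution.
    """
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)          # group indices by target class
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)   # spread each class evenly over the folds
    return folds
```

With 6 samples of class "a" and 3 of class "b" and k = 3, every fold receives two "a" samples and one "b" sample, matching the 2:1 class ratio of the complete set.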