machine-learning statistics

Definition

Coefficient of Determination

Let be observed values with mean , and let be the corresponding predictions from a regression model. The coefficient of determination is:

The numerator is the sum of squared residuals (unexplained variance). The denominator is the total sum of squares (total variance).

Interpretation

measures the proportion of variance in the dependent variable that the model explains.

ValueMeaning
Perfect fit; all residuals are zero.
The model predicts no better than the mean .
The model is worse than predicting the mean; the numerator exceeds the denominator.

Properties

Not a measure of correctness

A high does not imply that the model is correct, only that it fits the observed data well. A misspecified model can still achieve if it has enough parameters or if the data happen to align.

Sensitive to outliers

Because is based on squared residuals, a single large outlier can substantially reduce its value.

Increases with added predictors

Adding any predictor to a linear regression never decreases , even if the predictor is irrelevant. The adjusted penalises extra parameters to compensate.