Definition
Coefficient of Determination
Let be observed values with mean , and let be the corresponding predictions from a regression model. The coefficient of determination is:
The numerator is the sum of squared residuals (unexplained variance). The denominator is the total sum of squares (total variance).
Interpretation
measures the proportion of variance in the dependent variable that the model explains.
| Value | Meaning |
|---|---|
| Perfect fit; all residuals are zero. | |
| The model predicts no better than the mean . | |
| The model is worse than predicting the mean; the numerator exceeds the denominator. |
Properties
Not a measure of correctness
A high does not imply that the model is correct, only that it fits the observed data well. A misspecified model can still achieve if it has enough parameters or if the data happen to align.
Sensitive to outliers
Because is based on squared residuals, a single large outlier can substantially reduce its value.
Increases with added predictors
Adding any predictor to a linear regression never decreases , even if the predictor is irrelevant. The adjusted penalises extra parameters to compensate.