machine-learning statistics

Definition

Gini Index

The Gini index (or Gini impurity) is a measure used in decision tree algorithms to quantify how often a randomly chosen element of a set would be mislabelled if it were labelled at random according to the distribution of labels in that set. Formally, for a set $S$ with $K$ classes and class probabilities $p_1, \dots, p_K$, the Gini impurity is:

$$G(S) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$$

For a split on an attribute, the resulting impurity is the weighted average of the impurities of the child nodes, with each child weighted by the fraction of elements it receives. A Gini index of 0 indicates a pure node where all elements belong to a single class.
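
As a minimal sketch of the definition, the following Python computes the impurity of a node as one minus the sum of squared class proportions, and the impurity of a split as the size-weighted average over its child nodes. The function names `gini_impurity` and `split_impurity`, the toy labels, and the convention that an empty node has impurity 0 are illustrative assumptions, not part of any particular library.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a collection of class labels: 1 - sum of p_k squared."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumed convention: an empty node is treated as pure
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def split_impurity(children):
    """Impurity of a split: child impurities weighted by relative child size."""
    total = sum(len(child) for child in children)
    if total == 0:
        return 0.0
    return sum(len(child) / total * gini_impurity(child) for child in children)


# Toy data (hypothetical): six elements, two equally frequent classes.
parent = ["a", "a", "a", "b", "b", "b"]
left, right = ["a", "a", "a", "b"], ["b", "b"]

print(gini_impurity(parent))          # two balanced classes give 0.5
print(gini_impurity(right))           # a single-class node gives 0.0
print(split_impurity([left, right]))  # approximately 0.25
```

In this toy case the split lowers the impurity from 0.5 to about 0.25, which is the kind of reduction a decision tree learner looks for when comparing candidate attributes.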