Definition

Clustering Algorithm

A clustering algorithm is a fundamental method in machine learning, specifically within the paradigm of unsupervised learning. Its primary objective is to partition a set of unlabeled data points into a number of groups, known as clusters, based on their intrinsic properties and a defined measure of similarity.

Let $X = \{x_1, x_2, \dots, x_n\}$ be a set of $n$ data points, where each $x_i$ is a vector in a $d$-dimensional feature space. A clustering algorithm aims to find a partition $C = \{C_1, C_2, \dots, C_k\}$ of the dataset into $k$ clusters, such that:

  1. Exhaustiveness: Every data point belongs to at least one cluster.
  2. Mutual Exclusivity (for hard clustering): No data point belongs to more than one cluster; combined with exhaustiveness, each data point belongs to exactly one cluster.
  3. Non-Emptiness: Each cluster must contain at least one data point.
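
The three conditions above can be checked mechanically. The following is a minimal sketch, assuming data points are identified by the integer indices 0 to n-1 and each cluster is given as a set of those indices; the function name `is_hard_partition` is hypothetical and chosen only for illustration.

```python
from typing import List, Set


def is_hard_partition(n_points: int, clusters: List[Set[int]]) -> bool:
    """Check whether `clusters` is a valid hard partition of indices 0..n_points-1."""
    # Non-emptiness: every cluster must contain at least one point.
    if any(len(cluster) == 0 for cluster in clusters):
        return False

    # Exhaustiveness: the union of all clusters must equal the full index set
    # (this also rejects indices that fall outside the dataset).
    covered = set().union(*clusters)
    if covered != set(range(n_points)):
        return False

    # Mutual exclusivity: given full coverage, the cluster sizes sum to
    # n_points only if no index appears in more than one cluster.
    return sum(len(cluster) for cluster in clusters) == n_points


valid = is_hard_partition(5, [{0, 1}, {2}, {3, 4}])
overlapping = is_hard_partition(5, [{0, 1, 2}, {2, 3, 4}])
print(valid, overlapping)
```

Soft (fuzzy) clustering would drop the mutual-exclusivity check, since a point may then belong to several clusters with different degrees of membership.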

The core principle is to maximize intra-cluster similarity and minimize inter-cluster similarity. The definition of “similarity” is crucial and is typically encoded using a distance metric (e.g., Euclidean distance), where smaller values indicate more similar points, or a similarity function (e.g., cosine similarity), where larger values do.
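
To make the two example measures concrete, here is a small sketch using only the Python standard library. The helper names and the toy clusters `cluster_a` and `cluster_b` are illustrative assumptions, not part of any particular library. It computes Euclidean distance and cosine similarity, then compares the average distance within a cluster to the average distance between two clusters.

```python
import math
from itertools import combinations, product


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; larger means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def mean_intra_distance(cluster):
    """Average pairwise distance between points of one cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:  # a single-point cluster has no pairs
        return 0.0
    return sum(euclidean_distance(p, q) for p, q in pairs) / len(pairs)


def mean_inter_distance(cluster_1, cluster_2):
    """Average distance between points drawn from two different clusters."""
    pairs = list(product(cluster_1, cluster_2))
    return sum(euclidean_distance(p, q) for p, q in pairs) / len(pairs)


# Toy data (assumed for illustration): two well-separated groups in 2-D.
cluster_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
cluster_b = [(5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]

print("intra A:", mean_intra_distance(cluster_a))
print("inter A-B:", mean_inter_distance(cluster_a, cluster_b))
```

For a good clustering under a distance metric, the intra-cluster average should be clearly smaller than the inter-cluster average, which is the distance-based reading of "high intra-cluster similarity, low inter-cluster similarity".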