Definition
Clustering Algorithm
A clustering algorithm is a fundamental method in machine learning, specifically within the paradigm of unsupervised learning. Its primary objective is to partition a set of unlabeled data points into a number of groups, known as clusters, based on their intrinsic properties and a defined measure of similarity.
Let be a set of data points, where each is a vector in a -dimensional feature space. A clustering algorithm aims to find a partition of the dataset into clusters, such that:
- Exhaustiveness: Every data point belongs to at least one cluster.
- Mutual Exclusivity (for hard clustering): Each data point belongs to exactly one cluster.
- Non-Emptiness: Each cluster must contain at least one data point.
The core principle is to maximize intra-cluster similarity and minimize inter-cluster similarity. The definition of “similarity” is crucial and is typically encoded using a distance metric (e.g., Euclidean distance) or a similarity function (e.g., cosine similarity).