Clustering Algorithm

Definition

Clustering Algorithm

A clustering algorithm is a fundamental method in machine learning, specifically within the paradigm of unsupervised learning. Its primary objective is to partition a set of unlabeled data points into a number of groups, known as clusters, based on their intrinsic properties and a defined measure of similarity.

Let $X = {x_{1}, x_{2}, \dots, x_{n}}$ be a set of $n$ data points, where each $x_{i} \in R^{d}$ is a vector in a $d$ -dimensional feature space. A clustering algorithm aims to find a partition $C = {C_{1}, C_{2}, \dots, C_{k}}$ of the dataset $X$ into $k$ clusters, such that:

Exhaustiveness: Every data point belongs to at least one cluster.
$⋃_{i = 1}^{k} C_{i} = X$

Mutual Exclusivity (for hard clustering): Each data point belongs to exactly one cluster.
$C_{i} \cap C_{j} = \emptyset for i \neq = j$

Non-Emptiness: Each cluster must contain at least one data point.
$C_{i} \neq = \emptyset for i = 1, \dots, k$

The core principle is to maximize intra-cluster similarity and minimize inter-cluster similarity. The definition of “similarity” is crucial and is typically encoded using a distance metric (e.g., Euclidean distance) or a similarity function (e.g., cosine similarity).

Lukas' Notes

Clustering Algorithm

Definition

Graph View

Backlinks