statistics machine-learning optimisation
Definition
Maximum Likelihood Estimation
Maximum Likelihood Estimation (MLE) is a frequentist method for estimating the parameters of a statistical model. Formally, given observed data $x_1, \dots, x_n$, MLE identifies the parameter values $\hat{\theta}$ that maximise the likelihood function $L(\theta; x_1, \dots, x_n)$, making the observed data as probable as possible under the model:

$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} L(\theta; x_1, \dots, x_n)$$
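For independent and identically distributed observations, the likelihood factorises into a product of per-sample densities, which is the form behind the log-likelihood trick discussed under Implementation Details:

$$L(\theta; x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid \theta)$$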
Example: Bernoulli Parameter
Consider a sequence of $n$ independent coin tosses in which $k$ heads are observed. The maximum likelihood estimate of the probability of heads is the relative frequency:

$$\hat{p} = \frac{k}{n}$$
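This estimate follows from maximising the Bernoulli log-likelihood: setting its derivative with respect to $p$ to zero recovers the relative frequency.

$$\log L(p) = k \log p + (n - k)\log(1 - p), \qquad \frac{d}{dp}\log L(p) = \frac{k}{p} - \frac{n - k}{1 - p} = 0 \;\Rightarrow\; \hat{p} = \frac{k}{n}$$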
Implementation Details
In practice, the log-likelihood is typically maximised instead of the likelihood itself: for independent samples it turns the product of probabilities into a sum, which is easier to differentiate and numerically more stable.
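As a minimal sketch of this workflow, the snippet below fits a Gaussian by numerically minimising the negative log-likelihood with SciPy. The choice of distribution, the simulated data, and the parameterisation via $\log\sigma$ are illustrative assumptions, not part of the definition above.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Simulated observations (assumed Gaussian purely for illustration)
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=500)

def negative_log_likelihood(params, x):
    """Negative log-likelihood of i.i.d. Gaussian samples."""
    mu, log_sigma = params            # optimise log(sigma) so that sigma stays positive
    sigma = np.exp(log_sigma)
    return -np.sum(norm.logpdf(x, loc=mu, scale=sigma))

# Maximise the log-likelihood by minimising its negative
result = minimize(negative_log_likelihood, x0=[0.0, 0.0], args=(data,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)              # should be close to the true values 2.0 and 1.5
```

For models with a closed-form solution (such as the Bernoulli example above), the numerical optimiser is unnecessary; the same log-likelihood objective simply has an analytic maximiser.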
Limitations
Overfitting: MLE is highly sensitive to the specific dataset provided and does not incorporate any prior knowledge. In settings with sparse data, this can lead to extreme parameter estimates (e.g., assigning zero probability to an event not seen in the training set).
Comparison with MAP: Maximum a posteriori (MAP) estimation weights the likelihood by a prior distribution over the parameters. MLE is equivalent to MAP under a uniform or non-informative prior, so it ignores any initial beliefs about the hypothesis space (see the comparison below).
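Written side by side, with the likelihood $L(\theta; x) = p(x \mid \theta)$ and $p(\theta)$ denoting the prior, the two estimators differ only in the prior term:

$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} p(x \mid \theta), \qquad \hat{\theta}_{\text{MAP}} = \arg\max_{\theta} p(x \mid \theta)\, p(\theta)$$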