Machine-Learning Attention

Definition

Attention

Attention mechanisms allow neural networks to dynamically focus on relevant parts of the input when producing each element of the output sequence. Instead of treating all input tokens equally, attention assigns different weights to different tokens, highlighting those most relevant to the current output step. For example, when translating a sentence, the model can weight the source words most relevant to the word it is currently generating.

Types

Soft Attention

Soft Attention: Computes a weighted sum of the input features, where the weights (attention scores) are learned and differentiable, so the model can be trained end to end with standard backpropagation.
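
In common notation (the symbols here are illustrative), soft attention over input features $h_i$ with learned scores $e_i$ forms a context vector:

$$
\alpha_i = \frac{\exp(e_i)}{\sum_j \exp(e_j)}, \qquad c = \sum_i \alpha_i h_i
$$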

Transformers use soft attention rather than hard attention.

Example: Scaled Dot-Product Attention
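
In the notation of "Attention Is All You Need", with queries $Q$, keys $K$, and values $V$:

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V
$$

A minimal NumPy sketch of this formula (function and variable names are illustrative, not from any particular library):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V.

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    """
    d_k = Q.shape[-1]
    # Similarity between each query and each key, scaled by sqrt(d_k)
    # so the softmax stays in a well-conditioned range.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the keys gives the attention weights for each query.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted sum of the value vectors.
    return weights @ V

# Example: 2 queries attending over 5 key/value pairs.
rng = np.random.default_rng(0)
out = scaled_dot_product_attention(
    rng.normal(size=(2, 4)),  # Q
    rng.normal(size=(5, 4)),  # K
    rng.normal(size=(5, 3)),  # V
)
print(out.shape)  # (2, 3)
```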

Hard Attention

Hard Attention: Selects a single part of the input (a discrete choice) for each output step. Because this selection is non-differentiable, hard attention is often trained with reinforcement learning rather than backpropagation.
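
A minimal sketch of the discrete selection step, assuming the attention probabilities have already been computed (names are illustrative); in practice the sampling is typically trained with a score-function estimator such as REINFORCE:

```python
import numpy as np

def hard_attention_select(features, probs, rng):
    """Sample one input position and return only that feature vector.

    features: (n, d) input features; probs: (n,) attention distribution.
    """
    # Discrete, non-differentiable choice of a single input position.
    idx = rng.choice(len(probs), p=probs)
    return features[idx], idx

rng = np.random.default_rng(0)
features = rng.normal(size=(5, 3))
probs = np.array([0.1, 0.6, 0.1, 0.1, 0.1])
selected, idx = hard_attention_select(features, probs, rng)
```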