Definition
Attention
Attention mechanisms allow neural networks to dynamically focus on relevant parts of the input when producing each element of the output sequence. Instead of treating all input tokens equally, attention assigns different weights to different tokens, highlighting how relevant each one is to the current output step.
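For intuition, here is a minimal sketch of this idea (the token vectors and weights below are made-up illustrative values, not part of any particular model): the output for one step is a weighted sum of the token representations.

import numpy as np

# Three token representations (hypothetical 4-dimensional embeddings)
tokens = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])

# Attention weights for one output step: they sum to 1 and express
# how relevant each token is to this step
weights = np.array([0.7, 0.2, 0.1])

# The context vector passed to the rest of the network is the
# weighted sum of the token representations
context = weights @ tokens   # shape: (4,)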
Types
Soft Attention
Soft Attention: Computes a weighted sum over all input features, where the weights (attention scores) are produced by a differentiable function (typically a softmax), so the mechanism can be learned end-to-end with gradient descent.
Transformers use soft attention rather than hard attention.
Example: Scaled Dot-Product Attention
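A minimal NumPy sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V; the shapes and the random inputs below are illustrative assumptions, not a production implementation.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) query, key, and value matrices
    d_k = Q.shape[-1]
    # Similarity of each query to each key, scaled to keep the softmax stable
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over the keys gives the (soft) attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted sum of the value vectors
    return weights @ V

# Example usage with random (illustrative) inputs
rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out = scaled_dot_product_attention(Q, K, V)   # shape: (5, 8)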
Hard Attention
Hard Attention: Selects a single part of the input (a discrete choice) for each output step; because this selection is non-differentiable, such models are often trained with reinforcement learning or other sampling-based estimators.
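A minimal sketch of the discrete selection step (the scores and input vectors are illustrative assumptions); the sampling itself blocks gradient flow, which is why training typically relies on reinforcement-learning-style estimators.

import numpy as np

# Scores over three input regions for one output step (illustrative values)
scores = np.array([2.0, 0.5, -1.0])

# Turn the scores into a categorical distribution over the inputs
probs = np.exp(scores - scores.max())
probs /= probs.sum()

# Hard attention: sample (or take the argmax of) a single input
# instead of averaging over all of them
rng = np.random.default_rng(0)
chosen = rng.choice(len(probs), p=probs)

# Only the selected input is passed on; this discrete step is not
# differentiable, hence the need for RL-style training
inputs = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
selected = inputs[chosen]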