Definition
Hardware Multi-Threading
Hardware multi-threading allows multiple threads to share the functional units of a single processor without duplicating the entire core.
Each thread has private state — a program counter and a register file — but all threads share the same execution resources. The processor switches between threads rapidly to hide pipeline and memory latencies, exploiting thread-level parallelism (TLP) to increase throughput.
This is distinct from instruction-level parallelism: TLP draws independent work from separate threads rather than from a single instruction stream.
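The split between private and shared state can be modelled in a few lines. This is a minimal sketch, not a description of any real core: the names `ThreadContext`, `Core`, and `add_immediate`, the 32-register file, and the 4-byte instruction size are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadContext:
    """Private per-thread state (hypothetical model: 32 integer registers)."""
    pc: int = 0
    registers: list = field(default_factory=lambda: [0] * 32)


@dataclass
class Core:
    """One core holding several thread contexts and a single shared ALU."""
    contexts: list
    alu_ops: int = 0  # counts uses of the one ALU that all threads share

    def add_immediate(self, tid, reg, value):
        ctx = self.contexts[tid]
        ctx.registers[reg] += value  # touches only this thread's register file
        ctx.pc += 4                  # private PC; 4-byte instructions assumed
        self.alu_ops += 1            # shared execution resource


core = Core(contexts=[ThreadContext(), ThreadContext()])
core.add_immediate(tid=0, reg=1, value=5)
core.add_immediate(tid=1, reg=1, value=7)
```

Both threads write "register 1", yet each sees only its own value, while the ALU counter reflects work from both.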
Types
Three approaches differ in when and how frequently the processor switches threads.

Coarse-Grained
Definition
Coarse-Grained Hardware Multi-Threading
Coarse-grained hardware multi-threading switches between threads only on costly stalls, typically L2 or L3 cache misses.
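Because switches happen only on such stalls, a thread that is running without stalls executes continuously. Each switch requires draining the pipeline and refilling it with the new thread's instructions, which adds start-up overhead, so this approach pays off only for long stalls.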
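The switching rule can be illustrated with a toy trace generator. This is a sketch under simplifying assumptions: each thread is a list of labels where `"miss"` marks a long-latency stall, a miss is assumed to have resolved by the time its thread is scheduled again, moving past a finished thread is not charged a penalty, and `switch_penalty` is a hypothetical parameter standing in for the drain-and-refill cost.

```python
def coarse_grained_trace(streams, switch_penalty=2):
    """Return a per-cycle trace: a thread id, or None for a bubble cycle."""
    pcs = [0] * len(streams)                  # one private PC per thread
    remaining = sum(len(s) for s in streams)  # instructions left overall
    trace = []
    current = 0
    while remaining:
        if pcs[current] == len(streams[current]):
            # This thread has finished; move on (no penalty modelled here).
            current = (current + 1) % len(streams)
            continue
        instr = streams[current][pcs[current]]
        pcs[current] += 1
        remaining -= 1
        trace.append(current)                 # one issue cycle for this thread
        if instr == "miss" and remaining:
            # Costly stall: pay the switch penalty, then run the next thread.
            trace.extend([None] * switch_penalty)
            current = (current + 1) % len(streams)
    return trace


no_stalls = coarse_grained_trace([["op"] * 4, ["op"]])
with_misses = coarse_grained_trace(
    [["op", "op", "miss", "op"], ["op", "miss", "op"]]
)
```

In `no_stalls`, thread 0 runs all four of its instructions before thread 1 gets a cycle. In `with_misses`, every miss is followed by two bubble cycles, which is the overhead that makes this scheme unsuitable for short stalls.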
Fine-Grained
Definition
Fine-Grained Hardware Multi-Threading
Fine-grained hardware multi-threading switches between threads on every clock cycle, typically in round-robin order, skipping threads that are currently stalled.
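Round-robin selection that skips stalled threads can be sketched as follows. The function name `fine_grained_schedule` and the input format (one set of stalled thread ids per cycle) are assumptions chosen for illustration; real hardware derives stall status from the pipeline itself.

```python
def fine_grained_schedule(stalled_by_cycle, num_threads):
    """Pick one thread per cycle in round-robin order, skipping stalled ones.

    Returns a list with one entry per cycle: the chosen thread id, or None
    when every thread is stalled in that cycle.
    """
    schedule = []
    last = num_threads - 1  # so that thread 0 is considered first
    for stalled in stalled_by_cycle:
        chosen = None
        for offset in range(1, num_threads + 1):
            candidate = (last + offset) % num_threads
            if candidate not in stalled:
                chosen = candidate
                break
        schedule.append(chosen)
        if chosen is not None:
            last = chosen  # the rotation resumes after the thread that ran
    return schedule


schedule = fine_grained_schedule(
    [set(), set(), {2}, set(), {0, 1, 2}, set()], num_threads=3
)
```

In the third cycle thread 2 is stalled, so the rotation skips it and picks thread 0. In the fifth cycle all threads are stalled and the slot goes unused.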
Simultaneous Multi-Threading (SMT)
Definition
Simultaneous Multi-Threading
Simultaneous multi-threading (SMT) issues instructions from multiple threads in the same cycle.
It is built on top of a dynamically scheduled, out-of-order (OoO) processor, which already supplies the hardware mechanisms SMT needs: issue buffers, reservation stations, and a reorder buffer.
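A single SMT issue cycle can be sketched as filling a fixed number of issue slots from whichever threads have ready instructions. This is an illustrative model only: `fill_issue_slots` is a hypothetical name, the per-thread count of ready instructions is taken as given, and the round-robin slot assignment is an assumption rather than the policy of any particular processor.

```python
def fill_issue_slots(ready, issue_width):
    """Assign issue slots for one cycle.

    ready[t] is the number of independent instructions thread t can offer
    this cycle. Returns the thread id occupying each filled slot.
    """
    slots = []
    remaining = list(ready)  # copy so the caller's list is not modified
    while len(slots) < issue_width and any(remaining):
        for tid in range(len(remaining)):
            if remaining[tid] > 0 and len(slots) < issue_width:
                slots.append(tid)
                remaining[tid] -= 1
    return slots


one_thread = fill_issue_slots([2, 0], issue_width=4)
two_threads = fill_issue_slots([1, 3], issue_width=4)
```

With only one thread able to issue, two of the four slots stay empty. When a second thread has ready instructions, they fill the slots the first thread cannot use, so instructions from both threads issue in the same cycle.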