Very Long Instruction Word Processor

Definition

A Very Long Instruction Word (VLIW) processor is a pipelined processor that issues one wide instruction word per cycle, where the word encodes multiple independent operations that execute in parallel on distinct functional units.
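The goal is a throughput of more than one operation per cycle, that is, $IPC > 1$ when IPC counts operations rather than instruction words.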
Scheduling is performed statically by the compiler: the compiler finds independent operations, packs them into an issue bundle, and the hardware executes the bundle without runtime dependence checking. This contrasts with a superscalar processor, where the hardware performs scheduling dynamically.

Instruction Format

A VLIW instruction contains several operation slots, each targeting a specific functional unit. The compiler must ensure that:

no two operations in the same bundle write the same register,
no operation reads a register that another operation in the same bundle writes,
any RAW dependence between operations is respected by placing dependent operations in a later bundle.
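
The two intra-bundle rules above can be checked mechanically. The following is a minimal sketch, not a real compiler pass: the `Op` record and the function name `bundle_is_legal` are hypothetical, and registers are modelled as plain strings.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Op(NamedTuple):
    """Hypothetical operation record used only for this illustration."""
    name: str                 # mnemonic, for readability only
    dst: Optional[str]        # register written, or None if nothing is written
    srcs: Tuple[str, ...]     # registers read


def bundle_is_legal(bundle: Sequence[Op]) -> bool:
    """Return True if the bundle satisfies both intra-bundle rules."""
    written = [op.dst for op in bundle if op.dst is not None]

    # Rule 1: no two operations in the bundle write the same register.
    if len(written) != len(set(written)):
        return False

    # Rule 2: no operation reads a register that ANOTHER operation
    # in the same bundle writes (an op may read its own destination).
    for i, reader in enumerate(bundle):
        for j, writer in enumerate(bundle):
            if i != j and writer.dst is not None and writer.dst in reader.srcs:
                return False
    return True


add = Op("add", "r1", ("r2", "r3"))
mul = Op("mul", "r4", ("r5", "r6"))
sub = Op("sub", "r7", ("r1", "r2"))   # reads r1, which add writes

print(bundle_is_legal([add, mul]))    # independent operations
print(bundle_is_legal([add, sub]))    # RAW dependence inside one bundle
```

In this sketch the dependent `sub` has to move to a later bundle, which is exactly the third rule in the list.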

If the compiler cannot find enough independent operations to fill a bundle, it inserts NOP operations in the unused slots, wasting issue bandwidth.
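
The cost of NOP padding can be made concrete with a small calculation. This is an illustrative sketch only: the string `"nop"`, the helper names `pad_bundle` and `slot_utilization`, and the 4-wide machine are assumptions, not part of any real toolchain.

```python
NOP = "nop"  # assumed placeholder for an empty slot


def pad_bundle(ops, width):
    """Fill the unused slots of one bundle with NOPs."""
    if len(ops) > width:
        raise ValueError("more operations than issue slots")
    return list(ops) + [NOP] * (width - len(ops))


def slot_utilization(bundles):
    """Fraction of issue slots that carry a useful (non-NOP) operation."""
    total = sum(len(bundle) for bundle in bundles)
    useful = sum(1 for bundle in bundles for op in bundle if op != NOP)
    return useful / total


# Three bundles on a hypothetical 4-wide machine with 3, 1 and 2 useful ops.
program = [
    pad_bundle(["ld", "ld", "add"], 4),
    pad_bundle(["mul"], 4),
    pad_bundle(["sub", "st"], 4),
]
print(slot_utilization(program))  # 6 useful slots out of 12
```

Here half of the issue bandwidth, and half of the encoded slots, are spent on NOPs.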

Static Scheduling

Compiler Responsibilities

The compiler performs at compile time the tasks that a superscalar processor's hardware performs at runtime:

Dependence analysis — identifies RAW, WAR, and WAW constraints.
Instruction reordering — reorders instructions across basic blocks to fill issue slots. Loop unrolling, software pipelining, and trace scheduling are common techniques.
Resource allocation — assigns operations to functional unit slots and ensures no structural hazard occurs within a bundle.
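
The three responsibilities above can be sketched together as a greedy scheduler for a single basic block. This is a simplified illustration under loud assumptions: every operation has unit latency, all slots are interchangeable (no per-unit slot types), and the names `Op`, `depends_on`, and `schedule` are hypothetical. Real VLIW compilers use more elaborate techniques such as those named above.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Op(NamedTuple):
    """Hypothetical operation record used only for this illustration."""
    name: str
    dst: Optional[str]        # register written, or None
    srcs: Tuple[str, ...]     # registers read


def depends_on(later: Op, earlier: Op) -> bool:
    """Dependence analysis: True if `later` must issue after `earlier`."""
    raw = earlier.dst is not None and earlier.dst in later.srcs
    waw = earlier.dst is not None and earlier.dst == later.dst
    war = later.dst is not None and later.dst in earlier.srcs
    return raw or waw or war


def schedule(ops: Sequence[Op], width: int) -> List[List[Op]]:
    """Pack ops (given in program order) into bundles of at most `width` ops."""
    if width < 1:
        raise ValueError("issue width must be at least 1")
    bundles: List[List[Op]] = []   # bundles[c] holds the ops issued in cycle c
    placed: List[int] = []         # placed[i] is the bundle index of ops[i]
    for i, op in enumerate(ops):
        # Reordering: the op may move up to just after its latest dependence.
        earliest = 0
        for j in range(i):
            if depends_on(op, ops[j]):
                earliest = max(earliest, placed[j] + 1)
        # Resource allocation: skip bundles whose slots are all taken.
        cycle = earliest
        while cycle < len(bundles) and len(bundles[cycle]) >= width:
            cycle += 1
        while len(bundles) <= cycle:
            bundles.append([])
        bundles[cycle].append(op)
        placed.append(cycle)
    return bundles


block = [
    Op("ld1", "r1", ("r0",)),
    Op("ld2", "r2", ("r0",)),
    Op("add", "r3", ("r1", "r2")),
    Op("mul", "r4", ("r5", "r6")),
    Op("sub", "r7", ("r3", "r4")),
]
for cycle, bundle in enumerate(schedule(block, width=2)):
    print(cycle, [op.name for op in bundle])
```

In this example `mul` is independent of everything before it, so the sketch hoists it into the earliest bundle that still has a free slot. Because dependent operations always land in a strictly later bundle, the result also respects the intra-bundle rules from the Instruction Format section.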

Hardware Simplicity

Because dependence checking and issue logic are moved to the compiler, VLIW hardware is simpler than superscalar hardware. There is no issue buffer, no wakeup/select logic, and no runtime hazard detection between parallel operations. The processor simply dispatches one bundle per cycle and lets the functional units execute in lockstep.
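
How little the hardware has to do can be seen in a toy execution model. The sketch below is an assumption-laden illustration, not a description of any real machine: a slot is a hypothetical tuple `(dst, function, src_a, src_b)`, `None` stands for a NOP, and every operation completes in one cycle. Note that the loop contains no dependence check at all; it relies on the compiler having produced legal bundles.

```python
import operator


def run_lockstep(bundles, registers):
    """Execute one bundle per cycle with no runtime hazard detection."""
    regs = dict(registers)  # copy so the caller's state is left untouched
    for bundle in bundles:
        # Phase 1: every slot reads its operands from the register state
        # at the start of the cycle and computes its result.
        results = []
        for slot in bundle:
            if slot is None:  # NOP slot: nothing to do
                continue
            dst, fn, src_a, src_b = slot
            results.append((dst, fn(regs[src_a], regs[src_b])))
        # Phase 2: all results are written back together.
        for dst, value in results:
            regs[dst] = value
    return regs


program = [
    [("r3", operator.add, "r1", "r2"), ("r4", operator.mul, "r1", "r2")],
    [("r5", operator.sub, "r4", "r3"), None],
]
final = run_lockstep(program, {"r1": 2, "r2": 3})
print(final["r3"], final["r4"], final["r5"])
```

If the compiler had wrongly placed the `sub` in the first bundle, this model would not stall or reorder anything; it would simply fail or compute with stale values, which is why correctness rests entirely on the static schedule.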

VLIW vs Superscalar

	VLIW	Superscalar
Scheduling	Compiler (static)	Hardware (dynamic)
Dependence check	Compile-time	Runtime
Hardware complexity	Low	High
Compiler complexity	Very high	Moderate
Code size	Larger (NOPs)	Smaller
Binary compatibility	Recompilation needed for different issue width	Same binary works on different implementations

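A VLIW binary is tied to the machine it was scheduled for: the bundle layout encodes the issue width and the mix of functional units, so running it efficiently on a processor with a different width requires recompilation. A superscalar binary needs no such change, because the hardware extracts parallelism from the same sequential instruction stream at runtime.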
