Lukas' Notes

computer-architecture

Definition

Very Long Instruction Word Processor

A Very Long Instruction Word (VLIW) processor is a pipelined processor that issues one wide instruction word per cycle, where the word encodes multiple independent operations that execute in parallel on distinct functional units.

Scheduling is performed statically by the compiler: the compiler finds independent operations, packs them into an issue bundle, and the hardware executes the bundle without runtime dependence checking. This contrasts with a superscalar processor, where the hardware performs scheduling dynamically.

Instruction Format

A VLIW instruction contains several operation slots, each targeting a specific functional unit. The compiler must ensure that:

  1. no two operations in the same bundle write the same register,
  2. no operation reads a register that another operation in the same bundle writes,
  3. any RAW dependence between operations is respected by placing dependent operations in a later bundle.

If the compiler cannot find enough independent operations to fill a bundle, it inserts NOP operations in the unused slots, wasting issue bandwidth.
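The two same-bundle rules and the NOP padding can be sketched in a few lines. This is only an illustration under assumed names: `Op`, `bundle_is_legal`, and `pad_with_nops` are hypothetical and do not correspond to any real VLIW toolchain, and registers are modelled as plain strings.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Op:
    """Hypothetical operation: a name, one optional destination, some sources."""
    name: str
    dst: Optional[str]            # register written, or None
    srcs: Tuple[str, ...] = ()    # registers read


NOP = Op("nop", None)


def bundle_is_legal(bundle: List[Op]) -> bool:
    """Check rules 1 and 2 from the list above for a single bundle."""
    written = [op.dst for op in bundle if op.dst is not None]
    # Rule 1: no two operations write the same register.
    if len(written) != len(set(written)):
        return False
    # Rule 2: no operation reads a register written by ANOTHER operation
    # in the same bundle (reading its own destination is allowed here).
    for i, op in enumerate(bundle):
        written_by_others = {
            other.dst
            for j, other in enumerate(bundle)
            if j != i and other.dst is not None
        }
        if written_by_others & set(op.srcs):
            return False
    return True


def pad_with_nops(bundle: List[Op], width: int) -> List[Op]:
    """Fill unused issue slots with NOPs so the bundle has exactly `width` slots."""
    if len(bundle) > width:
        raise ValueError("bundle has more operations than issue slots")
    return bundle + [NOP] * (width - len(bundle))
```

Rule 3 is not checked here because it concerns the order of bundles rather than the contents of one bundle.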

Static Scheduling

Compiler Responsibilities

The compiler performs the tasks that a superscalar processor does at runtime:

  • Dependence analysis — identifies RAW, WAR, and WAW constraints.
  • Instruction reordering — reorders instructions across basic blocks to fill issue slots. Loop unrolling, software pipelining, and trace scheduling are common techniques.
  • Resource allocation — assigns operations to functional unit slots and ensures no structural hazard occurs within a bundle.
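As a rough illustration of how these three tasks fit together, here is a minimal sketch of greedy list scheduling for a single basic block. It makes strong simplifying assumptions that are not from the notes: every operation has unit latency, every slot can execute every operation, and the names `Op`, `depends_on`, and `schedule` are hypothetical. Real VLIW compilers use far more elaborate techniques such as the trace scheduling and software pipelining mentioned above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Op:
    name: str
    dst: Optional[str]            # register written, or None
    srcs: Tuple[str, ...] = ()    # registers read


def depends_on(later: Op, earlier: Op) -> bool:
    """Dependence analysis: does `later` have to stay after `earlier`?"""
    raw = earlier.dst is not None and earlier.dst in later.srcs
    war = later.dst is not None and later.dst in earlier.srcs
    waw = later.dst is not None and later.dst == earlier.dst
    return raw or war or waw


def schedule(ops: List[Op], width: int) -> List[List[Op]]:
    """Pack `ops` (in program order) into bundles of at most `width` operations."""
    bundles: List[List[Op]] = []
    cycle_of: List[int] = []      # bundle index chosen for each operation
    for i, op in enumerate(ops):
        # Reordering: the operation may move up to just after its last dependence.
        earliest = 0
        for j in range(i):
            if depends_on(op, ops[j]):
                earliest = max(earliest, cycle_of[j] + 1)
        # Resource allocation: take the first bundle from there with a free slot.
        cycle = earliest
        while cycle < len(bundles) and len(bundles[cycle]) >= width:
            cycle += 1
        while len(bundles) <= cycle:
            bundles.append([])
        bundles[cycle].append(op)
        cycle_of.append(cycle)
    return bundles
```

The sketch conservatively places WAR-dependent operations in a later bundle as well, which matches rule 2 of the instruction format. Unused slots in each returned bundle would still have to be filled with NOPs.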

Hardware Simplicity

Because dependence checking and issue logic are moved to the compiler, VLIW hardware is simpler than superscalar hardware. There is no issue buffer, no wakeup/select logic, and no runtime hazard detection between parallel operations. The processor simply dispatches one bundle per cycle and lets the functional units execute in lockstep.
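To make the lockstep idea concrete, the following toy model executes one bundle per step: every operation in the bundle reads the register state from before the bundle, and all results are written back together. It deliberately contains no hazard checks, because the compiler is assumed to have produced legal bundles. The `Operation` tuple layout and the `execute` function are assumptions made for this sketch, not a description of real hardware.

```python
import operator
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding: (destination register, function, source registers)
Operation = Tuple[str, Callable[..., int], Tuple[str, ...]]


def execute(bundles: List[List[Operation]], regs: Dict[str, int]) -> Dict[str, int]:
    """Run bundles one per cycle with no runtime dependence checking."""
    regs = dict(regs)  # work on a copy so the caller's state is untouched
    for bundle in bundles:
        # Read phase: all operations see the registers as they were
        # before this bundle started.
        results = [
            (dst, fn(*(regs[src] for src in srcs)))
            for dst, fn, srcs in bundle
        ]
        # Write phase: all results become visible together.
        for dst, value in results:
            regs[dst] = value
    return regs


# Two independent operations share the first bundle; the dependent add
# has been placed in a later bundle by the (assumed) compiler.
program: List[List[Operation]] = [
    [("r4", operator.add, ("r1", "r2")), ("r5", operator.mul, ("r2", "r3"))],
    [("r6", operator.add, ("r4", "r5"))],
]
final = execute(program, {"r1": 2, "r2": 3, "r3": 4})
```

If the compiler had wrongly put the dependent add into the first bundle, this model would simply read a stale or missing `r4`, which is exactly the burden that VLIW shifts from hardware to software.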

VLIW vs Superscalar

Property              VLIW                                            Superscalar
Scheduling            Compiler (static)                               Hardware (dynamic)
Dependence check      Compile-time                                    Runtime
Hardware complexity   Low                                             High
Compiler complexity   Very high                                       Moderate
Code size             Larger (NOPs)                                   Smaller
Binary compatibility  Recompilation needed for different issue width  Same binary works on different implementations

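A VLIW binary is scheduled for one specific issue width, so it cannot run efficiently on a processor with a different width without recompilation. A superscalar binary needs no such change, because the hardware schedules the same instruction stream dynamically on each implementation.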
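The width dependence can be illustrated with a small sketch that tries to fit already-scheduled bundles into instruction words of a different width. Everything here is hypothetical: operations are plain strings, and `reencode` and `slot_utilization` are made-up helpers. Real VLIW encodings differ, and a real binary usually cannot simply be re-padded like this.

```python
from typing import List

NOP = "nop"


def reencode(bundles: List[List[str]], width: int) -> List[List[str]]:
    """Fit existing bundles into words of `width` slots without rescheduling."""
    out = []
    for bundle in bundles:
        ops = [op for op in bundle if op != NOP]
        if len(ops) > width:
            # A wider schedule does not fit a narrower machine; the code
            # would have to be rescheduled, i.e. recompiled.
            raise ValueError("bundle needs more slots than the target provides")
        out.append(ops + [NOP] * (width - len(ops)))
    return out


def slot_utilization(bundles: List[List[str]]) -> float:
    """Fraction of issue slots that hold a useful (non-NOP) operation."""
    total = sum(len(bundle) for bundle in bundles)
    useful = sum(1 for bundle in bundles for op in bundle if op != NOP)
    return useful / total


# Code scheduled for a 2-wide machine ...
narrow = [["add", "mul"], ["sub", NOP]]
# ... still has only two bundles on a 4-wide machine, but most slots are NOPs.
widened = reencode(narrow, 4)
```

In this example the utilization drops from 0.75 to 0.375 when the 2-wide schedule is carried over to the 4-wide word, and going the other way with a full 4-wide bundle raises an error. Only a rescheduling pass with the new width can use the extra slots.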