Lukas' Notes

computer-architecture

Definition

Global Branch Predictor

A global branch predictor is a dynamic branch prediction scheme that predicts a branch using the outcomes of all recent branches, not just the current one.

It maintains a global history register — a shift register of the last branch outcomes — and uses this history to index a pattern history table of 2-bit saturating counters. Because the history captures correlations between different branches, a global predictor can exploit patterns like if branch A is taken, branch B is likely not taken.

Examples

1-Bit Branch Prediction

Consider a 1-bit global predictor for the following code:

li s0, 1024
xloop: 
	li s1, 4
yloop: 
	mv a0, s0
	mv a1, s1
	jal ra, do_something
	addi s1, s1, -1
	bnez s1, yloop        # L09
	addi s0, s0, -1
	bnez s0, xloop        # L11

For the inner loop (L09), we have a branch pattern:

and a nested loop pattern:

We know that L09 loops for four iterations. The low bit of the PC for L09 and L11 happens to be the same, so both branches alias to the same BHT entry. With a 1-bit predictor, a single shared counter governs all predictions. At start, the shared entry is PNT (predict not taken).

BranchStartL09L09L09L09L11
BTB entry L09Y
BTB entry L11Y
Shared BHTPNT
PredictionNT
Direction-
Correct?-

First L09. BHT PNT predict NT. The branch is taken () — misprediction. The shared counter flips to PT.

BranchStartL09L09L09L09L11
BTB entry L09YY
BTB entry L11YY
Shared BHTPNTPNT
PredictionNTNT
Direction-T
Correct?-N

Second L09. BHT PT predict T. The branch is taken again () — correct. The counter stays PT.

BranchStartL09L09L09L09L11
BTB entry L09YYY
BTB entry L11YYY
Shared BHTPNTPNTPT
PredictionNTNTT
Direction-TT
Correct?-NY

Third L09. BHT PT predict T. The branch is taken () — correct. The counter stays PT.

BranchStartL09L09L09L09L11
BTB entry L09YYYY
BTB entry L11YYYY
Shared BHTPNTPNTPTPT
PredictionNTNTTT
Direction-TTT
Correct?-NYY

Fourth L09. BHT PT predict T. But the branch is not taken () — misprediction. The counter flips back to PNT.

BranchStartL09L09L09L09L11
BTB entry L09YYYYY
BTB entry L11YYYYY
Shared BHTPNTPNTPTPTPT
PredictionNTNTTTT
Direction-TTTNT
Correct?-NYYN

L11 (outer loop). BHT PNT predict NT. The branch is taken () — misprediction. The shared counter flips to PT.

BranchStartL09L09L09L09L11
BTB entry L09YYYYYY
BTB entry L11YYYYYY
Shared BHTPNTPNTPTPTPNTPNT
PredictionNTNTTTTNT
Direction-TTTNTT
Correct?-NYYNN

Misprediction rate branches ( inner outer). In the first iteration we mispredict of them (the first L09 and the L11). After L11, the shared counter is back at PT, so subsequent outer iterations start with a correct prediction for the first L09, giving misprediction rate overall.

Per outer loop iteration we execute

2-Bit Branch Prediction

Now replace the 1-bit counter with a 2-bit saturating counter. States: PWNT (weakly NT), PWT (weakly T), PST (strongly T). Same aliasing setup — L09 and L11 share one entry. Start: BHT PWNT.

BranchStartL09L09L09L09L11
BTB entry L09Y
BTB entry L11Y
Shared BHTPWNT
PredictionNT
Direction-
Correct?-

First L09. BHT PWNT (weakly NT) predict NT. Taken () — misprediction. Counter increments to PWT (weakly T).

BranchStartL09L09L09L09L11
BTB entry L09YY
BTB entry L11YY
Shared BHTPWNTPWNT
PredictionNTNT
Direction-T
Correct?-N

Second L09. BHT PWT (weakly T) predict T. Taken () — correct. Counter increments to PST (strongly T).

BranchStartL09L09L09L09L11
BTB entry L09YYY
BTB entry L11YYY
Shared BHTPWNTPWNTPWT
PredictionNTNTT
Direction-TT
Correct?-NY

Third L09. BHT PST (strongly T) predict T. Taken () — correct.

BranchStartL09L09L09L09L11
BTB entry L09YYYY
BTB entry L11YYYY
Shared BHTPWNTPWNTPWTPST
PredictionNTNTTT
Direction-TTT
Correct?-NYY

Fourth L09. BHT PST (strongly T) predict T. Not taken () — misprediction. Counter decrements to PWT (weakly T).

BranchStartL09L09L09L09L11
BTB entry L09YYYYY
BTB entry L11YYYYY
Shared BHTPWNTPWNTPWTPSTPST
PredictionNTNTTTT
Direction-TTTNT
Correct?-NYYN

L11 (outer loop). BHT PWT (weakly T) predict T. Taken () — correct. Counter increments to PST (strongly T).

BranchStartL09L09L09L09L11
BTB entry L09YYYYYY
BTB entry L11YYYYYY
Shared BHTPWNTPWNTPWTPSTPSTPWT
PredictionNTNTTTTT
Direction-TTTNTT
Correct?-NYYNY

Hysteresis at work ). After the first iteration the counter sits at PST (strongly T). In subsequent outer loops, only the inner loop exit is mispredicted (Y Y Y N Y — ). The 2-bit predictor converges to a better steady state than the 1-bit case because hysteresis prevents a single NT from flipping the prediction all the way back to NT — it only drops from PST to PWT, still in taken territory.

The first iteration mispredicts only the first and fourth L09 (N Y Y N Y —