As modern processor pipeline depth and issue width increase, the performance penalties caused by a branch misprediction also increase, motivating the need for highly accurate branch predictors. Virtually every modern processor contains a sophisticated branch predictor to improve processor performance. Branch predictors try to find patterns within a program’s execution history in order to accurately predict the
outcome of a given branch. The history used for prediction can be local to a given branch, global across the entire program, or
often a hybrid of both. The hybrid TAgged GEometric history length (TAGE) branch predictors are some of the
best predictors proposed in academia. Although several papers describe the performance of TAGE through trace- and
execution-driven simulation, there is limited prior work on detailed hardware implementations. The long global histories of TAGE-style predictors create a new challenge for optimizing the storage, retrieval, and recovery of history bits.
This work explores global history register (GHR) implementations for TAGE-style branch predictors with speculative updates. We break down the requirements for both updating and recovering the history registers of TAGE predictors during normal operation and after misspeculation, discussing where various designs incur large checkpointing and/or per-operation overheads. To reduce these inefficiencies, we introduce BADGR, a novel GHR design for TAGE predictors that lowers power consumption and chip area by 90% and 85%, respectively, compared to naive checkpointing techniques.
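To make the checkpoint/recovery problem concrete, the following is a minimal software sketch (not the paper's hardware design, and the class and method names are illustrative): a GHR held in a circular buffer, where a checkpoint records only the head pointer rather than a copy of the full history, so recovery from misspeculation is a single pointer restore.

```python
class CircularGHR:
    """Illustrative speculative global history register in a circular buffer.

    A naive design snapshots the entire history on every predicted branch;
    here a checkpoint is just the head pointer, since speculatively shifted-in
    bits past the restored head are simply overwritten later.
    """

    def __init__(self, size=64):
        self.buf = [0] * size
        self.head = 0          # index of the most recent history bit
        self.size = size

    def speculative_update(self, taken):
        """Shift in a predicted outcome; return a checkpoint (head only)."""
        checkpoint = self.head
        self.head = (self.head + 1) % self.size
        self.buf[self.head] = 1 if taken else 0
        return checkpoint

    def recover(self, checkpoint):
        """On a misprediction, roll back by restoring the head pointer."""
        self.head = checkpoint

    def history(self, n):
        """Most recent n bits, newest first (e.g. for TAGE index hashing)."""
        return [self.buf[(self.head - i) % self.size] for i in range(n)]
```

In a hardware realization the same idea avoids the power and area cost of copying hundreds of history bits per branch, which is the inefficiency of naive checkpointing that BADGR targets.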