1 Hardware
1.1 CPU
1.1.1 Caching
CPUs have small pools of fast memory (caches) that store the information the CPU is most likely to need next. The goal of the caching system is to ensure that the next piece of data the CPU needs is already loaded into the cache by the time the CPU goes looking for it.
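To make the effect concrete, here is a minimal sketch of a toy direct-mapped cache. The line size (64 bytes), the number of lines (64) and the function name `count_cache_hits` are assumptions chosen for illustration, not the parameters of any real CPU, and the model ignores hardware prefetching entirely. It only shows why reading neighbouring addresses tends to find the data already cached.

```python
def count_cache_hits(addresses, line_size=64, num_lines=64):
    """Toy direct-mapped cache (assumed sizes): return how many accesses hit."""
    cache = [None] * num_lines  # each slot remembers which memory line it holds
    hits = 0
    for addr in addresses:
        block = addr // line_size   # memory line this address falls in
        index = block % num_lines   # the one slot this line can occupy
        if cache[index] == block:
            hits += 1               # data was already in the cache
        else:
            cache[index] = block    # miss: load the line, evicting the old one
    return hits


# 1024 four-byte elements read in order: 16 elements share each 64-byte line.
sequential = [i * 4 for i in range(1024)]
# The same number of reads, but every read lands on a different line.
strided = [i * 64 for i in range(1024)]

seq_hits = count_cache_hits(sequential)
strided_hits = count_cache_hits(strided)
print(seq_hits, strided_hits)  # prints "960 0"
```

In this model the sequential walk misses only on the first element of each line, while the strided walk never finds its data in the cache.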
1.1.2 Branch-prediction
A branch predictor is a digital circuit that tries to guess which way a branch (e.g. an if-then-else structure) will go before it is known for sure.
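I presume the performance gain comes from the fact that, after a correct prediction, the CPU has already fetched the instructions of the predicted branch (and can preload the data they need into the CPU cache) instead of waiting for the branch to be resolved.
Apparently it also allows for speculative execution of the predicted branch.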
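As a rough illustration of how such a guess can be made, the sketch below simulates a two-bit saturating counter, a common textbook prediction scheme. It is an assumption for illustration only: real predictors are considerably more elaborate (for example, they track branch history), and the function name `count_correct_predictions` is made up for this example.

```python
def count_correct_predictions(outcomes):
    """Simulate a two-bit saturating counter on a sequence of branch outcomes.

    outcomes: iterable of booleans, True meaning the branch was taken.
    Returns the number of outcomes that were predicted correctly.
    """
    state = 2  # states 0-1 predict "not taken", 2-3 predict "taken"
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Move one step towards the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct


# A loop branch: taken 9 times, then falls through; the loop runs 10 times.
loop_branch = ([True] * 9 + [False]) * 10
# A branch that flips direction on every execution.
alternating = [True, False] * 50

loop_correct = count_correct_predictions(loop_branch)
alternating_correct = count_correct_predictions(alternating)
print(loop_correct, alternating_correct)  # prints "90 50"
```

The regular loop branch is predicted correctly 90 times out of 100, whereas this simple scheme is right only half the time on the alternating pattern.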
1.1.2.1 Misprediction
The time wasted by a branch misprediction, measured in clock cycles, is equal to the number of pipeline stages from the fetch stage to the execute stage. Modern microprocessors tend to have quite long pipelines, so the misprediction delay is between 10 and 20 clock cycles. As a result, the longer the pipeline, the more each misprediction costs and the greater the need for a more advanced branch predictor.
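The arithmetic behind that last point can be sketched as follows. The misprediction count is an invented figure for illustration (for instance, 5% of one million branches), and `wasted_cycles` is a hypothetical helper; only the 10 and 20 cycle penalties come from the range quoted above.

```python
def wasted_cycles(mispredictions, penalty_cycles):
    """Total cycles lost if every misprediction costs penalty_cycles."""
    return mispredictions * penalty_cycles


# Assumed workload: 50,000 mispredicted branches (made-up number).
mispredictions = 50_000

short_pipeline = wasted_cycles(mispredictions, 10)  # 10-cycle penalty
long_pipeline = wasted_cycles(mispredictions, 20)   # 20-cycle penalty
print(short_pipeline, long_pipeline)  # prints "500000 1000000"
```

Under these assumptions, doubling the penalty doubles the wasted cycles, so the predictor would have to halve its number of mispredictions just to break even.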