Branch prediction

Antoine Marandon
Jul 14 · 3 min read

Consider this code (description bellow):

Image for post
Image for post
  1. We initialize an array of random data
  2. In the first benchmark, we sum its higher than average numbers
  3. We sort the array
  4. We run the same benchmark again.

Sounds simple right ?

However, the results can be a little bit unexpected:

  • 0.03 sec for the first one
  • 0.01 sec for the second one

What’s going on here ?

If you can read a title, you probably know what’s going on:

Branch predictions.

Steve Jobs once tried to explain what is branch prediction:

“I don’t know what it does. It predicts branches”

“It’s a good thing”

Steve jobs, 2003

Thank you Steve !

I’m not sure if this was enough, so I’ll try to explain a bit more:

The pipeline

When a CPU execute instructions, they are fed into a pipeline.

Consider this graphic:

Image for post
Image for post

Here, each square represent one instruction.

Even tho this theorical CPU can execute one instruction per cycle, the execution of one instruction is spread over 4 cycles.

Here are some exemple of real world pipelines:

+------------+----------------+
|Architecture| Pipeline Depth |
+------------+----------------+
| Pentium 4 | Up to 31 |
| PPC G4 | 7 |
| Intel Core | 14 |
| ARM | 3 - 13 |
+------------+----------------+

Why have a pipeline at all ?

Splitting the work into smaller chunks makes it easier to increase the frequency (since each unit needs to process less things).

So… why not making it even longer ?

Well, there’s a practical limit on how much you can split an instruction… And there’s bubbles !

Bubbles

What if the next instruction needs the result from the current one ?

Well then this happens:

Image for post
Image for post

We need to wait for the previous instruction to produce the result, creating “bubble” within the pipeline, where the stages can do nothing useful.

The longer the pipeline, the worse the effect, since the bubble will span more and more stages.

If you thought that was bad, there’s worste to come:

Branches.

Conditional jumps

Some of the instructions are called “conditional jumps” (think if … else ).

What happen when the CPU encounter such instruction ? The next instruction is not waiting for the result of the previous one: the CPU litteraly does’nt know which instruction is next.

So instead of waiting for the result to resume the pipeline again, the CPUs trys to predict it.

If it made the right choice, it can keep on going, if not… it discards everything, and loads the correct branch. At best it gained several cycles, at worse… it looses nothing.

There are various kind of predictions algorithms:

  • Flip a coin (usually do not take the jump): 50% accuracy
  • Lookup past iterations results (usually with 1–2 bits of memory)
  • Replay a pattern

Modern branch predictors are estimated at over 95% accuracy !

Conclusion

So… What about our benchmark test ?

Since our array contains random data, it’s easy to see that no matter how good that branch prediction is, it can predict only with 50% accuracy.

But once the array is sorted, it becomes possible to “predict” the next branching result with >99% accuracy.

And here we can explain our performance difference

Eureka Engineering

Learn about Eureka’s engineering efforts, product…

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store