How sorted array process faster than Unsorted array? (in short Branch Prediction)
Hello readers…
Today, I was reading about Branch Prediction, and, suddenly, I have got something interesting about the timing concepts of sorting an array.
And, I am more interested in it because of more computer architecture concepts rather than talking about data structure or complexities.
So, I thought it would be a great idea to share it with you guys, it may be possible that lots of people already know about this stuff.
So, First of all, we’ll start from, What is Branch Predictor?
According to Wikipedia, “ In computer architecture, a branch predictor is a digital circuit that tries to guess which way a (e.g. an if-then-else structure) will go before this is known definitively. The purpose of the branch predictor is to improve the flow in the instruction pipeline. Branch predictors play a critical role in achieving high effective performance in many modern pipelined microprocessor architectures such as x86.”
Let me explain you via using some real life examples,
1.
Suppose, you are chit-chatting with someone on some social media sites, now, after some certain point of chatting, you come at that point when you can easily guess what will be the next message of the person. So, you have already written a big response and waiting for their message.
Now, what if you will receive some unexpected message, then, you need to delete full response that you have already written for them and then, you need to write it again. This process will take some more amount of time because first you need to delete the old one and then write a new response.
So, what if you will receive expected message, then, you can easily send that already written message that’s all, with some small amount of time like ΔT.
Same this process apply on modern processors, to make them effective and fast.
2.
Let me take another real life beautiful example (better than first one), that I found on StackOverflow.
Consider a railroad junction:


Now for the sake of argument, suppose this is back in the 1800s — before long distance or radio communication.
You are the operator of a junction and you hear a train coming. You have no idea which way it is supposed to go. You stop the train to ask the driver which direction they want. And then you set the switch appropriately.
Trains are heavy and have a lot of inertia. So they take forever to start up and slow down.
Is there a better way? You guess which direction the train will go!
- If you guessed right, it continues on.
- If you guessed wrong, the captain will stop, back up, and yell at you to flip the switch. Then it can restart down the other path.
If you guess right every time, the train will never have to stop.
If you guess wrong too often, the train will spend a lot of time stopping, backing up, and restarting.
Consider an if-statement: At the processor level, it is a branch instruction:

You are a processor and you see a branch. You have no idea which way it will go. What do you do? You halt execution and wait until the previous instructions are complete. Then you continue down the correct path.
Modern processors are complicated and have long pipelines. So they take forever to “warm up” and “slow down”.
Is there a better way? You guess which direction the branch will go!
- If you guessed right, you continue executing.
- If you guessed wrong, you need to flush the pipeline and roll back to the branch. Then you can restart down the other path.
If you guess right every time, the execution will never have to stop.
If you guess wrong too often, you spend a lot of time stalling, rolling back, and restarting.
This is branch prediction. I admit it’s not the best analogy since the train could just signal the direction with a flag. But in computers, the processor doesn’t know which direction a branch will go until the last moment.
So how would you strategically guess to minimize the number of times that the train must back up and go down the other path? You look at the past history! If the train goes left 99% of the time, then you guess left. If it alternates, then you alternate your guesses. If it goes one way every 3 times, you guess the same…
In other words, you try to identify a pattern and follow it. This is more or less how branch predictors work.
Most applications have well-behaved branches. So modern branch predictors will typically achieve >90% hit rates. But when faced with unpredictable branches with no recognizable patterns, branch predictors are virtually useless.
Further reading: “Branch predictor” article on Wikipedia.
As hinted from above, the culprit is this if-statement:
if (data[j] >= 50)
sum += data[j];Notice that the data is evenly distributed between 0 and 100. When the data is sorted, roughly the first half of the iterations will not enter the if-statement. After that, they will all enter the if-statement.
This is very friendly to the branch predictor since the branch consecutively goes the same direction many times. Even a simple saturating counter will correctly predict the branch except for the few iterations after it switches direction.
Quick visualization:
T = branch taken
N = branch not takendata[] = 0, 1, 2, 3, 4, ... 50, 51, 52, 53, 54, ... 95, 96, 97, ...
branch = N N N N N ... T T T T T ... T T T ... = NNNNNNNNNNNN ... TTTTTTTTT ... TTTTTTTTTT (easy to predict) However, when the data is completely random, the branch predictor is rendered useless because it can’t predict random data. Thus there will probably be around 50% misprediction. (no better than random guessing)
data[] = 55, 53, 60, 67, 70, 56, 98, 49, 99, 12, 32, 79, 78, 69, 92 ,1, ...
branch = T, T, N, T, T, T, T, N, T, N, N, T, T, T, N ... = TTNTTTTNTNNTTTN ... (completely random - hard to predict)For replacing this, you need to write some hacks like some bitwise operators or something else depends on your code.
I wrote a sample python code for demo purpose,


In above code, more than 98 % winner is sorted array.
Results with time:


Branch prediction and speculative execution affects Computer Security?
(
We are not going into depth of these vulnerabilities because that isn’t the aim of this content, but we’ll look into the basic idea or overview.
For In depth explanation of Meltdown and Spectre:
Meltdown and Spectre
)
In the history of computer architecture (or hardware), the two vulnerabilities became very famous and also an example of branch prediction and speculative execution bugs:
- Meltdown
- Spectre
Overview
Meltdown was issued a Common Vulnerabilities and Exposures ID of CVE-2017–5754, also known as Rogue Data Cache Load, in January 2018. It was disclosed in conjunction with another exploit, Spectre, with which it shares some, but not all characteristics
Meltdown exploits a race condition, inherent in the design of many modern CPUs. This occurs between memory access and privilege checking during instruction processing. Additionally, combined with a cache side-channel attack, this vulnerability allows a process to bypass the normal privilege checks that isolate the exploit process from accessing data belonging to the operating system and other running processes. The vulnerability allows an unauthorized process to read data from any address that is mapped to the current process’s memory space.
Since instruction pipelining is in the affected processors, the data from an unauthorized address will almost always be temporarily loaded into the CPU’s cache during out-of-order execution — from which the data can be recovered. This can occur even if the original read instruction fails due to privilege checking, and/or if it never produces a readable result.
Affected hardware
The Meltdown vulnerability primarily affects Intel microprocessors, but some ARM microprocessors are also affected. The vulnerability does not affect AMD microprocessors. Intel has countered that the flaws affect all processors, but AMD has denied this, saying “we believe AMD processors are not susceptible due to our use of privilege level protections within paging architecture”.
Researchers have indicated that the Meltdown vulnerability is exclusive to Intel processors, while the Spectre vulnerability can possibly affect some Intel, AMD, and ARM processors. However, ARM announced that some of their processors were vulnerable to Meltdown. Google has reported that any Intel processor since 1995 with out-of-order execution is potentially vulnerable to the Meltdown vulnerability (this excludes Itanium and pre-2013 Intel Atom CPUs). Intel introduced speculative execution to their processors with Intel’s P6 family microarchitecture with the Pentium Pro IA-32microprocessor in 1995.
Discoverers
Meltdown was discovered independently by Jann Horn from Google’s Project Zero, Werner Haas and Thomas Prescher from Cyberus Technology, as well as Daniel Gruss, Moritz Lipp, Stefan Mangard and Michael Schwarz from Graz University of Technology. The same research teams that discovered Meltdown also discovered a related CPU security vulnerability now called Spectre.
Patch Work
→ Performance Penalty is the main issue, here.
Several procedures to help protect home computers and related devices from the Meltdown and Spectre security vulnerabilities have been published. Meltdown patches may produce performance loss. Spectre patches have been reported to significantly reduce performance, especially on older computers; on the newer eighth-generation Core platforms, benchmark performance drops of 2–14 percent have been measured.
In June 2017, KASLR was found to have a large class of new vulnerabilities. Research at Graz University of Technology showed how to solve these vulnerabilities by preventing all access to unauthorized pages. A presentation on the resulting KAISER technique was submitted for the Black Hat conference in July 2017, but was rejected by the organizers. Nevertheless, this work led to kernel page-table isolation (KPTI, originally known as KAISER) in 2017, which was confirmed to eliminate a large class of security bugs, including the not-yet-discovered Meltdown — a fact confirmed by the Meltdown authors.
Note:
- I have tried to cover as most of what I have learned and materials that I found in google, In case I have forgotten or missed something, please feel free to update me.
- Branch Prediction and Speculative Execution, both are the backbone of Intel CPU bugs (discovered on 1 Jan 2018).
Credit goes to Mysticial’s answer on StackOverflow.