Spectre & Meltdown Processor Vulnerabilities: A Technical Introduction
The important technical aspects of Spectre and Meltdown
In a previous post we discussed the recently disclosed processor attacks called Spectre (CVE-2017-5753, CVE-2017-5715) and Meltdown (CVE-2017-5754). While we know that most of our readers are users of technology who simply want to be better at security (and aren’t interested in the technical details), we also know that some of our readers enjoy going a bit deeper into the technical weeds. In this post we’ll highlight some of the key technical lessons that come from Spectre and Meltdown.
Although this is a technical blog post, this is not meant to teach the attacks. Instead, it highlights some key technical principles that the attacks present. This blog is based on the technical write ups, and those should be consulted for a full understanding of these attacks.
Key terminology
Before we discuss what exactly is happening in Spectre and Meltdown, we have to understand some key terminology:
Side channel
A side channel in information security is a function or characteristic of a system that discloses unintended information.
An example in the physical world: visibility of lights within your home is a side channel for identifying if someone is in your house without actually observing a person in the house. Or if one light comes on every day at the same time and no other lights do, it’s a side channel for identifying that the occupants may be out of town. So a side channel looks out for information that is being disclosed, and attempts to exploit it. A side channel is not necessarily a technical weakness in a specific system.
An example of this in technology: noticing heat differences of a computer to deduce what the system is doing. Wired has a good article on this in practice.
Covert channel
A covert channel is more invasive than a side channel. While a side channel is simply “observed,” a covert channel usually alters the system it’s attacking. This alteration creates the ability to transfer data from a privileged context to a context that should not have access to that data.
An example in the physical world: an adversary bugs a house by hiding microphones in your walls. They have altered the house, and can now overhear privileged conversations.
An example of this in technology: a malicious program exfiltrates data by modifying the IP identification field of legitimate packets. The attacker then sniffs these packets off of the network. By encapsulating data within the IP identification field of legitimate packets, a malicious program may be able to bypass system monitoring.
In both cases the covert channel causes a “change” to the system, whereas a side-channel is passive. (Note that in common parlance, these terms are sometimes used interchangeably — simply to denote a method of unintended information disclosure).
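To make the IP identification example more concrete, here is a minimal sketch of just the encoding step. It relies on the fact that the IPv4 identification field is 16 bits wide, so it can carry two bytes per packet. The function names and the sample secret are our own inventions for illustration, and no packets are actually built or sent.

```python
import struct

def encode_to_ip_ids(secret):
    """Split a secret into 16-bit values, one per packet's identification field."""
    if len(secret) % 2:
        secret += b"\x00"  # pad so the data divides evenly into two-byte chunks
    return [struct.unpack("!H", secret[i:i + 2])[0] for i in range(0, len(secret), 2)]

def decode_from_ip_ids(ip_ids, length):
    """Reassemble the secret from sniffed identification values."""
    raw = b"".join(struct.pack("!H", value) for value in ip_ids)
    return raw[:length]  # drop any padding byte

secret = b"hunter2"  # hypothetical data to exfiltrate
ip_ids = encode_to_ip_ids(secret)
recovered = decode_from_ip_ids(ip_ids, len(secret))
print(ip_ids, recovered)
```

The receiver needs to know the original length (or agree on a terminator) to strip the padding, which is one of many details a real covert channel would have to handle.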
Kernel
Operating Systems have a portion designated as the “Kernel”. This is loaded into protected memory spaces and controls the interaction between the programs and the underlying hardware of the system.
Virtual memory
Computers isolate one process’ memory from other processes’ memory. But each program thinks it has access to all of the system’s memory. The Central Processing Unit (CPU) presents each process with a virtual memory space and maintains a “page table” that translates this virtual memory to physical memory.
Physical memory
These are the hardware based memory modules that sit physically within your computer and hold the actual bits that make up random access memory (RAM). The CPU maps virtual memory to physical memory.
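As a rough illustration of the virtual-to-physical mapping, the sketch below models a page table as a dictionary from virtual page number to physical frame number. The 4096-byte page size, the table contents, and the translate function are assumptions made for this example; real page tables are multi-level hardware structures.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

# Hypothetical per-process page tables: virtual page number -> physical frame number
page_table_a = {0: 7, 1: 3}
page_table_b = {0: 12, 1: 9}

def translate(page_table, virtual_address):
    """Translate a virtual address to a physical address using one process's table."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        raise MemoryError(f"page fault at virtual address {virtual_address:#x}")
    return page_table[page_number] * PAGE_SIZE + offset

# The same virtual address lands in different physical memory for each process.
physical_a = translate(page_table_a, 0x10)
physical_b = translate(page_table_b, 0x10)
print(hex(physical_a), hex(physical_b))
```

Both processes believe they own address 0x10, yet neither can name the other's physical frame, which is the isolation the rest of this post is concerned with.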
Pipelining, out-of-order/speculative execution, transient instructions
The architectural design of modern CPUs, specifically instruction pipelining, led to the Spectre and Meltdown vulnerabilities. Programs are written in human readable languages (e.g. Python, Ruby, JavaScript) and are ultimately translated, by a compiler or an interpreter, into machine code which can be executed by the CPU.
To improve efficiency, modern CPUs implement instruction pipelining. This means that the processor tries to keep every part of the CPU occupied with something. So while one part of the CPU is fetching data the CPU may move on to the next instruction, even if it depends on the result of the fetched data. This leads to out-of-order execution of instructions. The CPU makes a best guess on what instructions will be executed next. This is also known as speculative execution.
If the CPU takes an incorrect logic path the results of the speculative execution are disregarded. These disregarded instructions are known as “transient instructions”. Although these transient instructions do not change actual program flow, during the speculative execution of these transient instructions the “microarchitectural state” of the CPU changes. Specifically, the CPU cache is affected.
CPU cache
To further speed up access, the CPU maintains on-chip caches that store the values of frequently accessed memory.
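The property of caches that matters for the rest of this post is that a cached read is observably faster than an uncached one. The toy model below captures only that property. The ToyCache class and its cost units are assumptions for illustration, not measurements from real hardware.

```python
class ToyCache:
    """A toy cache: the first read of an address is slow, repeat reads are fast."""
    HIT_COST = 1     # assumed cost units for a cached read
    MISS_COST = 100  # assumed cost units for a read that goes to main memory

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # address -> cached value

    def read(self, address):
        if address in self.lines:
            return self.lines[address], self.HIT_COST
        value = self.memory[address]
        self.lines[address] = value  # keep a copy for next time
        return value, self.MISS_COST

    def flush(self, address):
        self.lines.pop(address, None)  # evict the address from the cache

cache = ToyCache({0x100: 42})
_, first_cost = cache.read(0x100)   # miss: value fetched from memory
_, second_cost = cache.read(0x100)  # hit: value served from the cache
cache.flush(0x100)
_, third_cost = cache.read(0x100)   # miss again after the flush
print(first_cost, second_cost, third_cost)
```

An observer who can only see the cost of a read can still tell whether the address was touched recently, which is exactly the kind of unintended disclosure a side channel exploits.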
Race condition
A race condition in information security is a vulnerability that requires an attacking system to be the “first” to respond to a data request or the first to access data before it disappears. A good technical example of a race condition was a 2017 vulnerability in MySQL. Essentially, if an attacker can write files to the system they can overwrite a temporary file that is generated during a privileged MySQL operation. If they replace the temporary file, after file creation but before operation completion, they can escalate privileges within MySQL.
User-space vs kernel-space memory
Finally, it’s important to make an explicit distinction between kernel-space memory and user-space memory. The user-space virtual memory holds the memory of running programs (i.e. processes). User-space virtual memory is mapped to a specific portion of physical memory.
The kernel has its own virtual memory space, but this kernel-space virtual memory is mapped to the entire physical memory (we’ll see the consequences of this when we discuss Meltdown).
The CPU maintains a page table that includes mappings to kernel-space and to user-space memory. This page table translates virtual memory to physical memory. It’s important to remember that kernel-space memory is mapped to all physical memory, which also contains user-space memory.
When switching between user-space operations and kernel-space operations the CPU denotes if the virtual memory in use is user-space virtual memory or kernel-space virtual memory. It does this by setting or unsetting a “supervisor bit”. This limits the memory access of the current CPU operation to specific pages of virtual memory.
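The access check described above can be sketched as follows. We model each page table entry with a flag saying whether user-space code may touch it, and we pass the CPU's current mode in as a parameter. The PageEntry class, the addresses, and the access function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass

@dataclass
class PageEntry:
    frame: int             # physical frame the page maps to
    user_accessible: bool  # False means the page is kernel-only

# Hypothetical page table holding one user-space page and one kernel-space page
page_table = {
    0x1000: PageEntry(frame=5, user_accessible=True),
    0xF000: PageEntry(frame=1, user_accessible=False),
}

def access(virtual_page, supervisor_mode):
    """Return the physical frame, or refuse if user-space code touches a kernel page."""
    entry = page_table[virtual_page]
    if not entry.user_accessible and not supervisor_mode:
        raise PermissionError(f"access violation at page {virtual_page:#x}")
    return entry.frame

print(access(0x1000, supervisor_mode=False))  # user code reading its own page
print(access(0xF000, supervisor_mode=True))   # kernel code reading a kernel page
```

Note that both kinds of page live in the same table here. The kernel page is present but guarded by a check, a detail that becomes important when we get to Meltdown.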
So with all of these terms defined, let’s look at how Spectre and Meltdown exploit modern CPU architecture design.
Isolation not respected
The key design flaw that both of these vulnerabilities take advantage of is that memory placed in caches is not protected from access by other processes that should not have access to the original, uncached memory.
With all of the vulnerabilities, a malicious program causes data to be written to the CPU cache, which is then read through a side channel attack. This side channel attack is dependent on a race condition of the malicious program reading the CPU cache before it is overwritten.
Spectre
The consequence of Spectre is that an adversarial program can read the process memory of other programs.
As we mentioned above, modern CPUs use speculative execution to increase processing efficiency. CPUs take a best guess approach to speculative execution. By observing recent execution patterns the CPU “trains” itself on the fly to follow certain logic paths more often than others.
Spectre takes advantage of this design and trains the CPU to follow desired speculative paths. There are two variants of Spectre: conditional branch misprediction exploitation and indirect branch exploitation.
Conditional branch misprediction exploitation
One of the powerful aspects of programming is the ability to write conditional logic, i.e. “if this then that”. This gets translated into “conditional branch” logic at a machine code level. To exploit Spectre, a malicious program performs conditional tests with the same result, true or false, repeatedly.
Pseudocode for a conditional branch operation:
x = 6
if x equals 6
    print 'x equals 6'
If we perform the above action repeatedly the CPU will train itself to follow the true path during speculative execution even if the result is not true in a later operation.
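One way to picture this training is a two-bit saturating counter, a textbook branch prediction scheme. We are not claiming this is what any particular CPU uses (real predictors are far more elaborate), and the TwoBitPredictor class is a name we made up for this sketch.

```python
class TwoBitPredictor:
    """Textbook two-bit saturating counter: values 0-1 predict false, 2-3 predict true."""

    def __init__(self):
        self.counter = 0

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

predictor = TwoBitPredictor()
for _ in range(10):
    predictor.update(True)  # the condition keeps evaluating to true

prediction_after_training = predictor.predict()
predictor.update(False)     # one later test evaluates to false
prediction_after_one_miss = predictor.predict()
print(prediction_after_training, prediction_after_one_miss)
```

Even after one false result, this predictor still guesses true, so the CPU would speculatively run the true path on the next attempt as well.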
Now let’s say during a later speculative branch operation we try to access privileged memory:
x = 2
if x equals 6
    y = my_array[x * 256] ; memory outside of the current program
The conditional test will fail. But if we’ve properly trained the CPU, before it fails the test, the CPU will perform the transient instructions of the true conditional branch. The processor will look up the value at the memory address of my_array[x * 256], which is 512 bytes ahead of my_array in memory (assuming one-byte elements), outside of the program’s memory space.
If the conditional branch actually evaluated to true this would cause a memory access violation error to be thrown and the program would crash. But if the conditional evaluates to false, no error is thrown and the result of the transient instruction is disregarded. However, the value at my_array[x * 256] now sits in the CPU cache.
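The sequence above can be simulated in a few lines. This is only a model of the behavior described in this post: the guarded_read function, the flat toy memory, and the predicted_taken flag are all assumptions for illustration, and nothing here executes speculatively on your actual CPU.

```python
memory = {address: 0 for address in range(2048)}  # toy flat memory
memory[512] = 0x53  # hypothetical value that lives outside the program's own data
cache = set()       # addresses whose values currently sit in the toy cache

def guarded_read(x, predicted_taken):
    address = x * 256
    if predicted_taken:
        # Transient execution: the load is issued before the comparison resolves,
        # so the address is pulled into the cache either way.
        cache.add(address)
    if x == 6:
        return memory[address]  # the branch really was true: result is committed
    return None                 # transient result discarded, cache left untouched

result = guarded_read(2, predicted_taken=True)
print(result, 512 in cache)
```

The function returns nothing to the program, yet address 512 is now cached. The program-visible state was rolled back, but the microarchitectural state was not.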
Indirect branch exploitation
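Indirect branch machine code instructions cause the CPU to jump to a target address that is only known at run time (for example, an address held in a register) and to execute the machine code instructions there instead of continuing the sequential processing of instructions.
Example machine code operation jumping to execute instructions at the memory address held in the eax register, here 0x12345678:
mov eax, 0x12345678
jmp eax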
If a malicious program can train the CPU to take speculative jumps to code that performs known computation, this computation can change the microarchitectural state of the CPU cache.
The key similarity between conditional branch misprediction exploitation and indirect branch exploitation is that both methods train the CPU to take a speculative logic path that results in execution desired by the exploiting program. The result of this execution is eventually disregarded, but before the discard happens it alters the CPU cache.
The key difference between the two is the location of the machine code. The code of conditional branch misprediction exploitation sits within the malicious program’s memory space. On the other hand, the code of indirect branch exploitation sits elsewhere in memory but still performs the desired computation that alters the CPU cache.
Harvesting data from CPU caches
The CPU cache is then used by the malicious program as a side channel. After either conditional branch misprediction exploitation or indirect branch exploitation the malicious program performs time based side-channel attacks to infer what changes were made to the caches, subsequently learning the privileged information stored in these caches. These attacks are documented in other research and are outside of the scope of this blog post.
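For intuition only, here is a simulation of one technique from that research, commonly called Flush+Reload. It assumes the transient code used the secret byte as an index into a probe array with one slot per possible byte value, so that exactly one slot ends up cached. The cost units, the 256-byte stride, and all function names are assumptions for this sketch, and the timings are simulated rather than measured.

```python
STRIDE = 256        # assumed spacing between probe slots, one slot per byte value
HIT, MISS = 1, 100  # assumed cost units for cached and uncached reads

def transient_gadget(secret_byte, cached_slots):
    # Stand-in for the transient instructions: touch the slot chosen by the secret.
    cached_slots.add(secret_byte * STRIDE)

def timed_read(slot, cached_slots):
    # Simulated timing: cached slots are fast, everything else is slow.
    return HIT if slot in cached_slots else MISS

def recover_byte(cached_slots):
    # Time a read of every slot and report the byte value whose slot was fastest.
    timings = {value: timed_read(value * STRIDE, cached_slots) for value in range(256)}
    return min(timings, key=timings.get)

cached_slots = set()                   # "flush": start with nothing cached
transient_gadget(0x53, cached_slots)   # hypothetical secret byte
recovered = recover_byte(cached_slots)
print(hex(recovered))
```

The attacker never reads the secret directly. They only learn which slot was fast, and the position of that slot is the secret.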
So as a full attack process, Spectre uses CPU caches to create a covert channel that moves the data from one process, within user-space memory, to a malicious program, also in user-space memory. In the end, Spectre discloses privileged information to a malicious program.
In contrast to the two Spectre vulnerabilities, the Meltdown attack has a single variant. It also uses the CPU cache as a covert channel, but with some important differences in how the attack is technically carried out.
Meltdown
The consequence of Meltdown is that a malicious program can read kernel-space memory. Since kernel-space memory is mapped to all physical memory this means that a malicious program can read all kernel-space memory and all user-space memory.
Meltdown is an attack tailored to the logic circuitry of the CPU, and for this reason it is mainly limited to Intel-based processors. Meltdown does not require code that is tailored to a specific software environment. With Spectre, by comparison, attacks must be tailored to the software environment of the victim’s system — but they can work on systems running Intel and non-Intel architecture.
Additionally, the Meltdown attack abuses a privilege escalation vulnerability: Intel CPUs allow the transient instructions of user-space processes to access kernel-space memory.
Exploitation
Instead of training the CPU to take specific logic branches, Meltdown directly attempts to access privileged kernel-space memory pages. When a user-space process attempts to access privileged kernel-space memory the CPU throws an access violation exception error. The malicious program “catches” the access violation exception. By catching this exception the malicious program doesn’t crash. However, the CPU still executes the transient instruction and kernel-space memory is placed in the CPU cache.
The malicious program then performs the same side channel attacks that Spectre uses to read CPU cache memory.
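The ordering problem can be modeled like this: the data is fetched and leaves a trace in the cache before the access violation is raised. The kernel address, the secret byte, and the user_mode_load function are hypothetical, and this is a simulation of the behavior described above rather than working exploit code.

```python
STRIDE = 256                        # assumed spacing between probe slots
kernel_memory = {0xFFFF0000: 0x4B}  # hypothetical kernel address and secret byte
cached_slots = set()                # probe slots currently in the toy cache

def user_mode_load(address):
    # Toy model of the flawed ordering: transient work happens first...
    value = kernel_memory[address]
    cached_slots.add(value * STRIDE)
    # ...and the privilege check only takes effect afterwards.
    raise PermissionError("access violation")

try:
    user_mode_load(0xFFFF0000)
    leaked_architecturally = True
except PermissionError:
    leaked_architecturally = False  # the program catches the exception and survives

# Side channel: find which probe slot is now cached.
recovered = next(value for value in range(256) if value * STRIDE in cached_slots)
print(leaked_architecturally, hex(recovered))
```

The program never received the kernel byte as a return value, but the cache state gives it away.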
With these two classes of attacks, there are different steps for mitigation.
Mitigation
Let’s look at the technical solutions for Spectre and Meltdown.
Meltdown mitigation
Meltdown is the easier of the two to mitigate. Some patches have already mitigated Meltdown.
Meltdown depends on accessing kernel-space memory, and because that memory is already designated as privileged, the CPU simply needs to respect this isolation when executing speculative instructions.
In June of 2017, before these vulnerabilities were disclosed, the Linux kernel implemented a patch called Kernel Address Isolation to have Side-channels Efficiently Removed (KAISER), to prevent the disclosure of random kernel-space memory via side channels. KAISER separated the kernel-space page table and the user-space page tables completely. With KAISER, the kernel-space page table is only available when performing kernel operations.
The goal of Meltdown is to go from random kernel-space memory access to systematic kernel-space memory access. Since the malicious program runs in user-space memory, and no longer has access to the kernel-space page table, the KAISER patch inadvertently mitigated Meltdown. Apple and Microsoft are both implementing their own versions of KAISER, an approach generally known as Kernel Page Table Isolation (KPTI).
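The effect of separating the page tables can be sketched with dictionaries. Before the patch a single table holds both user and kernel mappings. Afterwards, the table that is active while user code runs simply has no kernel entries for a transient instruction to follow. The addresses and names are hypothetical, and real implementations keep a small set of kernel mappings visible in user mode to handle transitions, which this sketch ignores.

```python
kernel_mappings = {0xFFFF0000: 1}  # hypothetical kernel page -> physical frame
user_mappings = {0x00001000: 5}    # hypothetical user page -> physical frame

# Before: one table containing everything, guarded only by a permission check.
shared_table = {**user_mappings, **kernel_mappings}

# After (KAISER/KPTI style): the table used in user mode omits kernel pages.
user_mode_table = dict(user_mappings)
kernel_mode_table = {**user_mappings, **kernel_mappings}

def transient_can_reach(table, virtual_page):
    # A transient load can only fetch data for which a translation exists.
    return virtual_page in table

print(transient_can_reach(shared_table, 0xFFFF0000))
print(transient_can_reach(user_mode_table, 0xFFFF0000))
```

With no translation available there is nothing to fetch, so nothing reaches the cache, regardless of when the privilege check happens.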
Spectre mitigation
Unfortunately, Spectre will probably “haunt” us for a long time. This is because attacks are happening within the same memory space (i.e. user-space memory), and it is hard to isolate memory access during speculative execution.
Researchers propose several possible steps for mitigation: 1) halting potentially sensitive speculative execution paths until earlier logic is resolved, and 2) flushing branch prediction models when switching between processes. The first solution would “severely degrade performance” and we’ve already seen benchmark performance take hits of 2%-14% with the patches provided at the time of writing this post. The second solution is currently stated as a hypothetical in the research, and the authors acknowledge that it may not mitigate all cases of Spectre.
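The second idea, flushing prediction state between processes, can be pictured as follows. The BranchHistory class, the two-bit counters, and the context_switch function are our own simplifications for illustration. As noted above, the research treats this mitigation as a hypothetical and it may not cover every case.

```python
class BranchHistory:
    """Toy predictor state shared by whatever process is running on the core."""

    def __init__(self):
        self.counters = {}  # branch address -> two-bit saturating counter

    def train(self, branch_address, taken):
        current = self.counters.get(branch_address, 0)
        step = 1 if taken else -1
        self.counters[branch_address] = max(0, min(3, current + step))

    def predict(self, branch_address):
        return self.counters.get(branch_address, 0) >= 2

    def flush(self):
        self.counters.clear()

def context_switch(history):
    # Hypothetical mitigation: drop all learned predictions between processes.
    history.flush()

history = BranchHistory()
for _ in range(5):
    history.train(0x4000, taken=True)  # attacker process trains the predictor

predicted_before_switch = history.predict(0x4000)
context_switch(history)                # victim process is scheduled
predicted_after_switch = history.predict(0x4000)
print(predicted_before_switch, predicted_after_switch)
```

The cost is that every process starts with a cold predictor after each switch, which hints at why mitigations of this kind tend to hurt performance.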
Conclusion
These are by no means easy vulnerabilities to understand. They delve deep into the core workings of CPUs and blur the lines between software and hardware. In addition to understanding everything written here, to perform the attacks, you would also have to understand the previous literature that details how to perform side channel attacks against CPU caches, which we take for granted here. (This is mainly because I don’t fully understand how to implement them on a level deeper than the theoretical).
I hope that this has helped you understand the deeper, technical aspects of these vulnerabilities. I myself am constantly learning, so if you notice a technical inaccuracy please leave a comment!
This is a post from Isaiah Sarju of Revis Solutions. Check out his other posts on the Revis Solutions Blog, and follow on Twitter @isaiahsarju, @revissolution