Is hardware the black hole of computing?

Part I: a descent into the abyss

Lukas
Linagora Engineering
11 min read · Sep 6, 2018


An illustration of a black hole, used as a metaphor for today’s hardware, alongside dazzling points of light, which can be associated with free and open source software. (© Alain r, CC BY-SA 2.5)

Introduction

Let’s face it, amid revelations of hardware-level vulnerabilities such as Meltdown and Spectre, deploying software in today’s world looks a lot like shooting stars next to a black hole: very entertaining and deeply unsustainable. Why? Because computer hardware is sick. The layer on top of which every information system on earth is built is flawed, perhaps beyond salvation. Are we doomed? No, some glimmers of hope can be seen on the horizon: it may even be a good time to wage a fight to take the power back at the most fundamental level, and to start building a truly open information ecosystem, from top to bottom. Shall we?

There are few places on the surface of the earth that haven’t been trodden by a human foot, explored, and carefully mapped. Below the surface, however, whether underground or undersea, it is a different story: darkness and ignorance still lurk there, oblivious to our attempts to shed light on this no-man’s Underland.

In that regard, computing devices are no different from the Earth. On the surface, at the level of the operating system, we can and do study the software, thanks to free and open operating systems such as Linux. But what runs beneath the operating system, before it takes control of the hardware, remains largely a mystery. Why is that the case? Why is the hardware such a black box nowadays, and why does it matter?

There is more than meets the eye. (The Earth is adapted from © Hawk-Eye, CC BY-SA 3.0; the Tux mascot was made by © Larry Ewing and GIMP; the rest is in the Public Domain)

This two-part article will explore what lies beneath the software you run every day, and make a case for moving to a radically more transparent planet, where artificial barriers to our understanding of reality are not shamelessly erected by hardware oligopolies and manufacturers.

In this part, we are going to describe some of the problems that hardware suffers from, with a special focus on the traditional x86 architecture, which is prevalent in the server and desktop computing markets. In the second part, we will illustrate some solutions, both short and long term. We will showcase hardware hackers who are cracking open the surface of the planet, to make apparent what was once invisible and to give control back to the user, where it should have remained.

Thanks to LINAGORA, which is currently looking for new talent, for its support and for making this article possible! This opinion piece doesn’t necessarily reflect my employer’s stance on the subject.

Could you please draw me a computer?

A computing device is usually made from three distinct elements:

  1. The hardware: the physical part of a computer, the foundation on top of which the software runs. It is composed of a central processing unit (CPU), some persistent and non-persistent memory, and a motherboard to hold everything together. The hardware is not made to change, hence the “hard”. Examples: the ARM architecture, used mostly in mobile phones and embedded devices, and the x86 architecture, largely used in desktop and server computers.
  2. The firmware: low-level code, often stored on the motherboard itself, which acts as an intermediary between the hardware and the software layer. It can be modified under certain conditions, but not as easily as the software, hence the “firm”. Computer parts such as solid state drives (SSDs) are also equipped with firmware. Examples: the Unified Extensible Firmware Interface (UEFI); the Basic Input/Output System (BIOS), which the UEFI supersedes.
  3. The software: the operating system that ultimately manages the hardware and allocates resources to other software so that everything works properly. The software part of a computer can be modified and reprogrammed easily, hence the “soft”. Examples: Linux; Windows; macOS.
The (U)EFI in the stack. In this article, we conflate the firmware and the (U)EFI. This scheme is more prevalent in the x86 architecture than in the ARM ecosystem, which is more fragmented and diverse. (Public Domain)
Not a smart watch, just good ol’ plain hardware. (Public Domain)
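
Even from the comfort of the surface-level operating system, you can catch a glimpse of the firmware layer described above. Here is a minimal C sketch, assuming a Linux machine that exposes its DMI data under /sys/class/dmi/id (as most x86 laptops and servers do); it simply prints what the firmware reports about itself:

```c
#include <stdio.h>

static void print_dmi(const char *label, const char *path) {
    char buf[128];
    FILE *f = fopen(path, "r");
    if (!f) {
        printf("%s: <unavailable>\n", label);
        return;
    }
    if (fgets(buf, sizeof(buf), f))
        printf("%s: %s", label, buf); /* sysfs values already end with '\n' */
    fclose(f);
}

int main(void) {
    /* Exposed by the Linux kernel's DMI support on most x86 machines. */
    print_dmi("Firmware vendor ", "/sys/class/dmi/id/bios_vendor");
    print_dmi("Firmware version", "/sys/class/dmi/id/bios_version");
    print_dmi("Firmware date   ", "/sys/class/dmi/id/bios_date");
    return 0;
}
```

Compile it with any C compiler and run it as a regular user: no special privileges are needed to read these particular entries.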

Fairytales in software Wonderland

In an ideal situation, when you buy a computer or a smartphone, it would run a free operating system with complete and transparent control over the hardware. It would be complete in the sense that no hardware components could escape the authority of the operating system; it would be transparent in the sense that the operating system could explore processes and devices attached to it, down to the firmware level. For the software to be able to truly control the hardware and the firmware, those elements would need to be open too, in order to be scrutinized and eventually trusted.

One of the key benefits of an open platform would be to allow the creation of a chain of trust whose root would ultimately reside in the user’s hands. In other words, such a platform would give control back to the user. An open platform would also allow more bugs to be discovered and corrected: “given enough eyeballs, all bugs are shallow”, a claim about software development known as Linus’s Law.

This ghost is going to hit you with its stick for the foreseeable future. (Public Domain)

Spoiler alert: what you buy is not what you get

In today’s world, the situation is grim, the exact opposite of the ideal scenario just described: hardware has gone wild. Computing devices have grown into full-blown deceiving machines, and are riddled with silicon-level bugs, as illustrated by the ongoing Meltdown and Spectre fiascoes. Is that really surprising?

Over the years, extra layers of complexity have been added at the firmware level, to the point that your surface-level operating system has lost control of the hardware.

According to a conservative estimate [PDF], on a typical x86 laptop such as the Ubuntu-powered HP ProBook 430 G4 that I currently use, there are 2½ operating systems or kernels running between the hardware and the surface-level operating system: this amounts to millions of lines of code you have no choice but to trust blindly. Those lurking operating systems are not only out of reach of the surface-level operating system, they also run with more privileges. On top of that, they are bloated, difficult if not impossible to patch or deactivate, and rely on security by obscurity. What could go wrong? Before we answer this question and uncover those hidden systems, the notion of protection rings needs to be introduced.

Rings of death

A traditional operating system has several protection rings, with various levels of privilege. The kernel’s code and processes are executed in Ring 0, the most privileged ring of all. User applications, such as web browsers and word processors, run in Ring 3, and are the least privileged. Device drivers, such as your graphics card driver, require deeper access to the hardware, and therefore sit in between, in Rings 1 and 2.

As many rings as you want. (© Hertzsprung at English Wikipedia, GFDL, CC-BY-SA-3.0)
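
To make these rings tangible, here is a small, x86-specific sketch (GCC or Clang on Linux is assumed): the two low bits of the CS segment register hold the Current Privilege Level of the code that reads them, so an ordinary program should print 3.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t cs;

    /* The two low bits of the CS selector encode the Current Privilege
     * Level (CPL) of the code that is executing right now. */
    __asm__ volatile("mov %%cs, %0" : "=r"(cs));
    printf("This program runs in Ring %d\n", cs & 3);
    return 0;
}
```

Only the kernel, running in Ring 0, would see 0 here; your applications are confined to the outermost ring.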

There is more to it: negative rings also exist. At Ring -1, you would typically find a Type-1 hypervisor, such as Xen; QubesOS, a desktop operating system geared towards security-minded people, relies on this. These kinds of virtual machine managers — often simply called hosts in this context — have greater privileges than the guest operating systems they maintain. This is a precondition for enforcing isolation between guests and preventing them from spying on each other.

On a side note, with the exception of Azure, every major public cloud provider relies on either Xen (Amazon Web Services) or KVM (Google Compute Engine, Alibaba Cloud, OVH, and more recently Amazon Web Services too): both are open source. Down to Ring -1, there is no controversy, as the code running there can be open.
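
Incidentally, the CPU itself will tell you whether a Ring -1 hypervisor sits below your kernel: CPUID leaf 1 sets bit 31 of ECX when a virtual machine monitor is present. A minimal sketch for x86/x86-64 with GCC or Clang:

```c
#include <stdio.h>
#include <cpuid.h>

int main(void) {
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 1 is not available on this CPU.");
        return 1;
    }
    /* Bit 31 of ECX is the "hypervisor present" bit. */
    if (ecx & (1u << 31))
        puts("A hypervisor is running below the kernel (Ring -1).");
    else
        puts("No hypervisor reported by CPUID.");
    return 0;
}
```

Run it on a cloud instance and on your own laptop, and compare the answers.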

Going down the rabbit hole

You have probably guessed it by now, but the lurking, closed operating systems mentioned before run below Ring 0 and even Ring -1, with higher privileges than the kernel and the hypervisor. We have already named one of these operating systems or kernels: it is the Unified Extensible Firmware Interface (UEFI). What does it do? It is responsible for initializing and checking the hardware at boot time so it can be used by your surface-level operating system. Among other things, the UEFI runs at Ring -2, has network capabilities, and can run full-blown applications, including the legacy BIOS, the EFI version of the GNU GRand Unified Bootloader (GRUB), and of course malware that will persist across operating system re-installations while remaining invisible to every antivirus product on the market. Some of its code is hardware-dependent and varies from one machine to another, some of it isn’t, but almost everything is closed source, with notable exceptions that we will cover in the second part of this article. So we have found the first operating system: there are 1½ kernels to go.
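
One visible trace of this Ring -2 environment is the set of EFI variables the firmware hands over to the operating system. The sketch below is Linux-specific and assumes the machine booted via UEFI, with efivarfs mounted at its usual location:

```c
#include <stdio.h>
#include <dirent.h>

int main(void) {
    DIR *d = opendir("/sys/firmware/efi/efivars");
    struct dirent *entry;
    int count = 0;

    if (!d) {
        puts("No efivarfs found: legacy BIOS boot, or efivarfs not mounted.");
        return 1;
    }
    while ((entry = readdir(d)) != NULL) {
        if (entry->d_name[0] == '.')
            continue;                        /* skip "." and ".." */
        if (count++ < 5)
            printf("  %s\n", entry->d_name); /* e.g. BootOrder-8be4df61-... */
    }
    closedir(d);
    printf("%d EFI variables exposed by the firmware.\n", count);
    return 0;
}
```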

At the same level of privilege as the UEFI resides the System Management Mode (SMM), a special CPU mode which is notably responsible for some power management — such as when you close your laptop’s lid — as well as some hardware monitoring. By design, it cannot be removed or disabled, and it has in the past been targeted by the NSA’s so-called software implants, which gave the agency persistent and covert access to infected systems. There goes half a kernel. Is that all? No, we are about to reach the darkest corner of the Intel x86 platform.
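
Even though SMM is out of reach, you can at least count how often it interrupts your system: Intel CPUs since Nehalem expose MSR_SMI_COUNT (0x34), the number of System Management Interrupts since reset. A hedged sketch, assuming an Intel CPU, root privileges, and the msr kernel module (modprobe msr):

```c
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

#define MSR_SMI_COUNT 0x34  /* SMI counter on Intel CPUs since Nehalem */

int main(void) {
    uint64_t smi_count;
    int fd = open("/dev/cpu/0/msr", O_RDONLY);

    if (fd < 0) {
        perror("open /dev/cpu/0/msr (root and the msr module are required)");
        return 1;
    }
    /* The MSR index doubles as the file offset in the msr device. */
    if (pread(fd, &smi_count, sizeof(smi_count), MSR_SMI_COUNT)
            != (ssize_t)sizeof(smi_count)) {
        perror("pread MSR_SMI_COUNT");
        close(fd);
        return 1;
    }
    close(fd);
    printf("System Management Interrupts since reset: %llu\n",
           (unsigned long long)smi_count);
    return 0;
}
```

A non-zero, steadily growing number is SMM quietly doing its work behind your kernel’s back.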

A computer inside your computer

Let’s go back in time. In the early nineties, Andrew Tanenbaum — the creator of MINIX and the author of the textbook from which a young Linus Torvalds learned about operating systems — had a now-famous argument with him over operating system design. Professor Tanenbaum was advocating a microkernel design, while Linus was defending a monolithic one. Linus remained unconvinced and continued the development of Linux — then only one year old — which later became the most popular monolithic operating system in the world. But have Tanenbaum and his microkernel-based approach really lost the battle? Without the unsolicited help of Intel, that would certainly have been the case…

Shipping with each of Intel’s processors and chipsets since 2008, behold the Intel Management Engine (Intel ME), which provides remote management capabilities for IT departments and runs at… Ring -3, owning the entire platform. Among other things, it has full access to the network stack. In the event of an operating system failure, the technology built on top of it, Intel Active Management Technology (AMT), can be used to re-image a computer over the network.
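
One hint that an ME is present, assuming the kernel’s MEI driver is loaded, is the character device the driver creates to talk to the engine. The sketch below only checks for that device node; note that its absence proves nothing:

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* Device nodes created by the kernel's MEI driver on Intel platforms. */
    const char *candidates[] = { "/dev/mei0", "/dev/mei" };
    unsigned int i;

    for (i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
        if (access(candidates[i], F_OK) == 0) {
            printf("Found %s: the Intel ME host interface is exposed.\n",
                   candidates[i]);
            return 0;
        }
    }
    puts("No MEI device node found (driver not loaded, or non-Intel platform).");
    return 0;
}
```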

Your typical Intel-based x86 computer illustrated, or as many rings as you don’t want. In an ideal world, the kernel would be running with the highest privileges, actually owning the hardware. But it appears that we are not living in the best possible world: in red, at the center of the platform, at the heart of your computer, you will stumble across closed-source code.

The Intel ME is quite literally a computer inside your computer. The most recent version is based on an Intel Quark CPU used in the embedded market, and until 2017, nobody knew which operating system it was running, including the aforementioned public cloud operators — who are not exactly amateurs. Deep inside your computer, unbeknownst to your surface-level operating system, let me introduce MINIX 3, a microkernel operating system whose father is… Professor Tanenbaum.

Meet Rocky Raccoon, MINIX’s mascot, lurking in your Intel computer since 2008. (MINIX, CC BY-SA 4.0)

So, after twenty-five years, Professor Tanenbaum not only got his revenge but won by K.O. against Linus, although he had to wait for press reports to learn the good news, as he didn’t know that Intel was using his creation (which doesn’t constitute a violation of the permissive license associated with MINIX, but would have been one in the case of Linux). The result: MINIX 3 runs on hundreds of millions of machines, and overnight became one of the most used operating systems on earth — likely triggering some job openings for MINIX 3 specialists at the NSA and its counterparts around the world…

AMD, Intel’s historic competitor, has its own version of the Management Engine, called the AMD Platform Security Processor (PSP). It is powered by an ARM processor, and surprisingly, it appears to do the exact opposite of what its name suggests.

This pervasive problem extends to your everyday devices and peripherals as well. Take, for instance, hard disk drives, which are equipped with a CPU, firmware, and a non-trivial amount of random-access memory (RAM). Unsurprisingly, the NSA has also successfully targeted their firmware, a technique that can be replicated by skilled hackers.
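
You can at least ask a drive which firmware revision it claims to run, through sysfs on Linux. A hedged sketch (the attribute name depends on the transport, and the device names used here are assumptions to adjust for your machine):

```c
#include <stdio.h>

/* Print the first line of a sysfs attribute; return 0 on success. */
static int print_attr(const char *label, const char *path) {
    char buf[64];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (fgets(buf, sizeof(buf), f))
        printf("%s: %s", label, buf);
    fclose(f);
    return 0;
}

int main(void) {
    /* SATA/SCSI disks usually expose "rev", NVMe drives "firmware_rev". */
    if (print_attr("sda firmware revision", "/sys/block/sda/device/rev") == 0)
        return 0;
    if (print_attr("nvme0n1 firmware revision",
                   "/sys/block/nvme0n1/device/firmware_rev") == 0)
        return 0;
    puts("Could not read a disk firmware revision from sysfs on this machine.");
    return 1;
}
```

What you cannot do, on most consumer drives, is audit the code behind that version string.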

What the NSA is capable of today is an indicator of what could be done by amateur groups in a decade or so, or even by your resentful, tech-savvy ex-boyfriend or girlfriend. The problem is that even ten years from now, it is not clear how we will be able to protect ourselves against security vulnerabilities affecting the hardware layer. As a society, we are not ready, and we aren’t preparing ourselves seriously enough.

The IT industry-wide problem illustrated. The privileges given to any layer of the platform stack are inversely proportional to that layer’s openness: hardware is as closed as it is privileged. As a result, any hardware bug has trickle-down effects and will affect the entire ecosystem for years to come… Humanity is building a stronghold on quicksand.

How did we get here?

So far, we have described some of the problems that exist in hardware today, focusing our attention on the x86 architecture. But what put us in this situation? Let’s list a few of the usual suspects:

  • The lack of competition in the x86 market, which has been dominated by Intel for more than twenty years
  • The technical debt associated with the backward-compatible x86 architecture, which dates from the late 70s
  • The sheer difficulty of designing and shipping your own silicon, and the cost associated with it
  • The suspected overall mediocre quality of the software code written by hardware manufacturers and Original Equipment Manufacturers (OEMs)
  • The lack of available technical documentation (and, when available, its legal protections), which makes it unnecessarily difficult to study or modify any of the code below the surface-level operating system
  • The ongoing miniaturization process, which allows more components to be embedded in a platform without the user’s knowledge
  • The weak customer demand for an open computing platform
  • The difficulty of patching exploit-ridden firmware

These are some of the possible reasons that led us to the present situation. Any successful solution to the problem will have to address some of these issues.

A personal view on computer hardware aptly illustrated. (Public Domain)

Hardware: the root of the problem

Over the last decade, large amounts of software have been freed or released under open source licenses. What was once the exception is becoming the rule. Although much remains to be done at the software level, it is critical to direct some of our efforts to freeing the hardware, including the firmware, so we can be assured that the foundations on top of which we build software infrastructures are trustworthy enough to sustain increasingly critical workloads. Any open society which relies on information systems to function requires open computing.

As the first part of this exploration has uncovered, there is a lot of work to be done. Nowadays, any desktop or even mobile computing device is more like a set of loosely coupled elements, each living a life of its own, untrustworthy and unaccountable. This situation is unacceptable.

We are required to use computing devices in our daily lives, at work, and increasingly as citizens interacting with the state. To some extent, smartphones and personal computers can even be viewed as extensions of our minds, and could someday be granted the same protections. However, these devices are flawed by design. What would life be like if we could not trust our own brains? Nobody should be forced to rely on a fundamentally flawed and untrustworthy machine as a vessel for their life.

How can we restore some sanity in the computing world?

In the second part of this article [now available here], we will showcase some techniques available today to mitigate the risks and recover some autonomy, and we will also illustrate longer-term solutions. Stay tuned, and don’t throw away your computer (yet).
