For years, we have learned to repeat that data has three vulnerable states: data-at-rest, data-in-transit, and data-in-use. And for years we have learned to repeat that data-in-use is hard to protect, well, because you are using it. A sword is kept in its scabbard to protect it; once drawn and put to use in battle, I’m sorry to say, it’s going to get gouged. And data, albeit mightier than the sword, suffers the same fate when in use. (If you wish to read about data leakage and prevention, here’s a shameless plug for my article on it.)
Not all roads lead to the cloud; some still lead to locally hosted bare-metal servers or, sometimes, slightly more advanced private clouds. If you strike up a conversation with a good samaritan who has opted to take the road less traveled, they will tell you that they had to, owing to data privacy concerns or legislative requirements such as data sovereignty. This is exactly the problem that the good folks at the Confidential Computing Consortium have managed to solve after many years of research ¹.
As we know, there are numerous technologies and practices available to protect data-at-rest and data-in-transit. So much so that if you don’t use them to protect your data, you deserve to be hacked. But that well-protected data has to be taken out of its protective enclosures when it needs to be processed or used. This is where confidential computing comes in. It allows you to execute your program and securely manipulate your data so that no third party, not even the VM provider, has access to your raw data. Now, some keen readers may ask: isn’t homomorphic encryption (and for the rest of you lot, no, it doesn’t sound like that, get your mind out of the gutter) the same thing? Good question! Let me answer that towards the end of the article.
Confidential Computing is the protection of data in use by performing computation in a hardware-based Trusted Execution Environment ³.
Traditionally, when you manipulate data, the data held in memory is visible to the operating system, and therefore risks being exposed to the hypervisor (or the VMM) and the VM provider. What Confidential Computing allows is for the data to be processed by the CPU within something called a Trusted Execution Environment (TEE). Only a trusted party with the correct credentials has access to the TEE. This reduces the need for trust between the cloud provider and the user by closing off that avenue for a data leak.
A Trusted Execution Environment (TEE) is commonly defined as an environment that provides a level of assurance of data integrity, data confidentiality, and code integrity. ⁴
The trickery behind this lies within the CPUs that power Confidential Computing VMs. Both Intel and AMD offer solutions in this space: Intel Software Guard Extensions (Intel SGX) ⁷ and AMD Secure Encrypted Virtualization (AMD SEV) ⁸ are the two contenders. Although groundbreaking in their achievement, the underlying principle is quite simple, really. I’ll give you two hints: attestation and PKI.
Attestation is the process by which one party, called a “Verifier”, assesses the trustworthiness of a potentially untrusted peer, i.e., the “Attester”. ³
What this enables is for a user to create a secure container within the CPU (an enclave) where the user can securely execute an application and its data. The enclave guarantees the confidentiality and the integrity of its contents while they are being used. The user can verify the “trustworthiness” of the contents of the enclave by means of a cryptographic signature.
Let’s assume a scenario where a user has an application with sensitive data to execute. First, the user generates a cryptographic hash of the application and the data (the contents of the enclave). Then the user uploads the application and data to the enclave. Upon initialization, the enclave gets the CPU to sign the hash using the private key stored within its tamper-resistant storage and returns the signature to the user. The user can use the corresponding public key to verify the authenticity of the hash, thereby attesting to the contents of the enclave.
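The flow above can be sketched in a few lines of Python. This is a simplified illustration under loose assumptions, not a real SGX or SEV API: all names are hypothetical, and an HMAC over a shared key stands in for the asymmetric signature a real CPU would produce with its fused private key.

```python
import hashlib
import hmac

# Hypothetical stand-in for the key held in the CPU's tamper-resistant
# storage (a real TEE signs with an asymmetric key and provides the
# public half for verification).
CPU_ATTESTATION_KEY = b"hardware-fused-secret"

def measure(application: bytes, data: bytes) -> str:
    """Cryptographic hash of the enclave contents (the 'measurement')."""
    return hashlib.sha256(application + data).hexdigest()

def enclave_attest(application: bytes, data: bytes) -> tuple:
    """The enclave asks the CPU to sign the measurement of its contents."""
    m = measure(application, data)
    signature = hmac.new(CPU_ATTESTATION_KEY, m.encode(), hashlib.sha256).hexdigest()
    return m, signature

def user_verify(expected_measurement: str, measurement: str, signature: str) -> bool:
    """The user checks the signature, then compares the measurement
    against the hash they computed locally before uploading."""
    good_sig = hmac.compare_digest(
        signature,
        hmac.new(CPU_ATTESTATION_KEY, measurement.encode(), hashlib.sha256).hexdigest(),
    )
    return good_sig and measurement == expected_measurement

app, data = b"analytics-app-v1", b"sensitive-records"
expected = measure(app, data)          # user hashes contents before upload
m, sig = enclave_attest(app, data)     # enclave returns a signed measurement
assert user_verify(expected, m, sig)   # attestation succeeds
assert not user_verify(expected, measure(b"tampered", data), sig)  # tampering detected
```

If the enclave contents were swapped out, the measurement (and hence the signature check) would no longer match what the user expects, and attestation fails.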
The practical applications of confidential computing shine when it comes to sharing sensitive data between multiple parties, yet not really sharing it. Confused? Imagine the top priority of every pharmaceutical company today: finding a vaccine for the coronavirus. The chances of success would greatly increase if they collaborated. However, that would mean sharing sensitive information, which could have adverse effects in the long term. The solution is to set up a multi-party analytics application on confidential computing VMs.
Each pharmaceutical company shares encrypted data with the analytics application. The data is decrypted and processed only within a secure enclave. None of the participants hold keys to the enclave, so none of them can view or manipulate the data inside it. No participant has access to another’s data, yet all benefit from an analytics model trained on the combined data set.
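Here is a minimal sketch of that flow, with everything hypothetical and deliberately simplified: a toy XOR stream cipher stands in for real transport encryption, the `Enclave` class stands in for hardware, key provisioning is hand-waved (in reality each party would establish its key over an attested channel, and no party ever sees another’s key), and the “analytics” is just an average over the pooled values.

```python
import secrets

def xor_cipher(key: bytes, payload: bytes) -> bytes:
    """Toy XOR stream cipher -- a stand-in for real encryption.
    Applying it twice with the same key recovers the plaintext."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(payload))

class Enclave:
    """Hypothetical enclave: holds per-party keys that participants
    never see in the clear (each party knows only its own)."""
    def __init__(self, parties):
        self._keys = {p: secrets.token_bytes(16) for p in parties}

    def key_for(self, party: str) -> bytes:
        # Stand-in for per-party key exchange over an attested channel.
        return self._keys[party]

    def combined_average(self, encrypted_submissions: dict) -> float:
        """Decrypt inside the enclave and compute over the pooled data."""
        values = []
        for party, blob in encrypted_submissions.items():
            plaintext = xor_cipher(self._keys[party], blob)
            values.extend(int(x) for x in plaintext.split(b","))
        return sum(values) / len(values)

enclave = Enclave(["pharma_a", "pharma_b"])
submissions = {
    "pharma_a": xor_cipher(enclave.key_for("pharma_a"), b"10,20,30"),
    "pharma_b": xor_cipher(enclave.key_for("pharma_b"), b"40,50"),
}
print(enclave.combined_average(submissions))  # 30.0
```

Each company sees only its own plaintext and the final result; the pooled raw values exist only inside the enclave.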
With the rise of the digital era come digital theft, fraud, and money laundering. Financial institutions are in a constant battle to find new ways to keep these threats at bay, and it is an area where machine learning is heavily used. However, if the data set used for training is not large enough, the model can flag false positives. These require human intervention to resolve, which sometimes involves contacting the actual customer to verify the transaction and either clear it or void it.
Federated machine learning, or collaborative learning, is a technique by which an algorithm is trained across multiple decentralized data sets (or servers). These collaborations can be orchestrated with a greater degree of confidentiality and integrity using confidential computing VMs.
Similar to the previous example, the participating financial institutions can share their encrypted customer data with a centrally hosted learning model without compromising its confidentiality. The result would be a model trained on a large and diverse data set, with high accuracy and fewer false positives.
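The aggregation step at the heart of federated learning can be sketched as a weighted average of locally trained model weights (the FedAvg idea). The institutions, weights, and sample counts below are invented for illustration; in the confidential-computing setup, this aggregation would run inside an attested enclave.

```python
def federated_average(local_weights, sample_counts):
    """Weighted average of locally trained model weights (FedAvg).
    Each institution trains on its own premises; only weight vectors,
    never raw customer data, reach the aggregator."""
    total = sum(sample_counts)
    n_params = len(local_weights[0])
    return [
        sum(w[i] * n for w, n in zip(local_weights, sample_counts)) / total
        for i in range(n_params)
    ]

# Two banks' locally trained fraud-model weights (hypothetical numbers).
bank_a = [0.2, 0.8]   # trained on 1,000 transactions
bank_b = [0.6, 0.4]   # trained on 3,000 transactions
global_model = federated_average([bank_a, bank_b], [1000, 3000])
print(global_model)
```

The bank with more training samples pulls the global weights proportionally further toward its own, so the combined model reflects the full, diverse transaction history without any bank exposing its ledger.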
Now let’s get back to homomorphic encryption. Homomorphic encryption gives the ability to perform calculations on encrypted data without the need to decrypt it. If that’s the case, why go to all the trouble of enclaves and TEEs? Well, since every computation has to happen on top of encrypted data, homomorphic encryption is very resource-intensive. There is also no way to perform attestation with homomorphic encryption, nor does it guarantee code confidentiality or code integrity. Although unique and intriguing in its own right, homomorphic encryption falls short as a practical solution for protecting data-in-use.
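To see what “calculating on encrypted data” actually means, here is a toy demonstration of the multiplicative homomorphism of textbook (unpadded) RSA: multiplying two ciphertexts yields the ciphertext of the product. The primes are deliberately tiny and the scheme is insecure; this only illustrates the principle, not a production homomorphic-encryption library.

```python
# Textbook RSA with toy primes -- insecure, purely to show that
# Enc(a) * Enc(b) decrypts to a * b without ever decrypting a or b.
p, q = 61, 53
n = p * q                           # modulus: 3233
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (modular inverse)

def enc(m: int) -> int:
    return pow(m, e, n)

def dec(c: int) -> int:
    return pow(c, d, n)

a, b = 7, 6
product_cipher = (enc(a) * enc(b)) % n  # computed on ciphertexts only
print(dec(product_cipher))              # 42 == a * b
```

The party doing the multiplication never sees 7 or 6, only their ciphertexts, which is exactly the property homomorphic schemes trade so much performance for.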
Confidential Computing offers a shimmering hope of securing the last leg of data states, making data leakage a thing of the past. However, never underestimate the power of poor design and implementation. You can have the most sophisticated deadbolt lock on your door, but if you leave the key under the doormat and leave a note on the gate giving the location of the key, well…
 Confidential Computing Deep Dive v1.0, Confidential Computing Consortium, October 2020 https://confidentialcomputing.io/wp-content/uploads/sites/85/2020/10/Confidential-Computing-Deep-Dive-white-paper.pdf
 Confidential Computing: Hardware-Based Trusted Execution for Applications and Data, Confidential Computing Consortium, November 2020 https://confidentialcomputing.io/wp-content/uploads/sites/85/2021/01/confidentialcomputing_outreach_whitepaper-8-5x11-1.pdf
 AMD SEV-SNP: Strengthening VM Isolation with Integrity Protection and More, AMD, January 2020 https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf