TEE-based Smart Contracts and Sealing Pitfalls

Last week the rest of the sgx.fail team and I posted a research preprint that included a vulnerability disclosure affecting Secret Network. Secret Network is the first smart contract system based on Trusted Execution Environments (TEEs) to go live in production. However, there are several rival projects with closely related tech that have launched public testnets, namely Oasis, Phala, and Obscuro. Our disclosure kicked off a broader discussion, with all these projects reaching out and/or making public statements (Phala’s) (Oasis’s) (Secret’s) explaining to what degree they would have been affected and what mitigations they have in development. The four projects have been building mostly independently of each other, but TEE/SGX compromise presents a common threat to all of them, suggesting an opportunity to work together.

Despite the challenges they face (which include the potential for new SGX vulnerabilities yet to come and the inherent reliance on a centralized service from Intel), this niche is poised to grow. A promising (though also challenging) future direction is a hybrid of TEEs alongside cryptographic techniques like multiparty computation (MPC). I’ll have more to say on this at some point.

But in the meantime, we need to talk about more basic design issues in TEE-based blockchains. Most importantly, I want to explain how SGX sealed data based on code signing policies can create an unwanted backdoor. Specifically, given Secret’s current design, the developer code signing key could be used to decrypt the master key, even without needing to exploit any bugs or side channels. Other projects exhibit this issue on their testnets to some degree as well. I’ll also point out several other areas where the TEE-based smart contract projects could probably work together to strengthen the niche as a whole.

Quick recap of TEE-based smart contracts

TEEs are a promising way to add confidentiality to smart contracts. A TEE, or enclave, is secure processor hardware that supports isolation and remote attestation. The most commonly used is Intel SGX. This approach is very flexible and efficient, supporting existing smart contract engines like EVM and CosmWasm without the overhead associated with cryptography like zkSNARKs and MPC. The approach was described in research papers including Ekiden, PDOs and Kaptchuk, Green, Miers. The first instantiation to go live in production has been Secret Network. This has been running since 2020 and is already the host to live applications including DeFi, high profile private NFTs, and others. Oasis, Phala, and Obscuro have launched public testnets and are preparing for a main network launch soon.

The basic principle in all of these is the same: the transactions and the contract state are encrypted, such that only secure TEE enclaves can access them. In the case of Secret, all of the decryption keys are derived from a master secret, the “consensus seed,” which is replicated to all the enclaves on the network. In order to get a copy of the consensus seed, an enclave must complete the remote attestation process. This involves obtaining a signed report from the Intel Attestation Service (IAS) that shows you’ve run the correct program on a processor with up-to-date microcode. The privacy of the overall system relies on the ability of TEEs to prevent even the node operator in physical possession of the machine from reading the enclave’s internal state while it runs.
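To make the "everything derives from one seed" point concrete, here is a minimal sketch of deriving per-purpose keys from a master secret. It is an illustration only: the `derive_key` helper, the purpose labels, and the use of HMAC-SHA256 as the derivation function are my assumptions, not Secret Network's actual key schedule.

```python
import hashlib
import hmac


def derive_key(consensus_seed: bytes, purpose: str, context: bytes = b"") -> bytes:
    """Derive a 32-byte subkey from the master seed.

    HMAC-SHA256 is used here as a simple key-derivation function (an assumption
    for illustration, not any project's real scheme).
    """
    info = purpose.encode() + b"\x00" + context
    return hmac.new(consensus_seed, info, hashlib.sha256).digest()


# Hypothetical seed; on a real network it would only ever exist inside enclaves.
consensus_seed = bytes(range(32))

tx_key = derive_key(consensus_seed, "tx-encryption")
state_key_a = derive_key(consensus_seed, "contract-state", b"contract-A")
state_key_b = derive_key(consensus_seed, "contract-state", b"contract-B")
```

The consequence is the one the rest of this post worries about: anyone who learns `consensus_seed` can recompute every key derived from it.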

A lot of the following discussion will come across as critical of Secret Network in particular. That is not my intent, and so before we get into it I want to temper things by saying I admire a lot of what Secret has done with their first mover advantage. They are already hosting several really compelling real-world applications that showcase the TEE privacy features. Their private NFTs and the marketplace Stashh are the best examples. You can buy, for instance, an NFT containing a secret access code to an unreleased movie from Kevin Smith. (The going rate is $50, secondhand on Stashh). In principle, any of the owners could leak this access code if they chose, however as far as I can tell no one has done so, since it doesn’t show up in any online searches. (see this reddit discussion). As a result, if you want to see the movie, buying the NFT and viewing the private metadata is the only way to do it. It’s a great demonstration of what private smart contracts can do.

Discussion 1. The need for TCB Recovery plans

TCB Recovery is roughly Intel’s name for everything that happens after an exploitable vulnerability is found. It’s probably the most poorly understood aspect of the SGX ecosystem. I’ll explain just enough about the recent AepicLeak and TCB recovery to understand what this has to do with blockchains, but for more details you should read our paper and the SGX.fail website. AepicLeak was publicly disclosed in August. (None of the SGX.fail team overlaps with the AepicLeak authors, so we heard about it for the first time along with everyone else. I read about it on the way back from the IC3 Blockchain camp, where a student-led team developed applications on both Oasis Sapphire and Secret Network.) Like other vulnerabilities found by researchers, the public disclosure of AepicLeak was coordinated so that Intel and many hardware vendors released microcode and BIOS patches contemporaneously with the announcement. This may have given the impression to some (as it did to me) that all the appropriate mitigations were already in place within IAS by the time the disclosure was published.

First, let me explain how I expected IAS would function by the AepicLeak disclosure date. Every time an enclave needs to generate a remote attestation, it must interact with IAS, sending some representation of the hardware and microcode state, and receiving an attestation report in response. The signed attestation report includes a status flag and optionally a detailed list of advisories. So, I expected that IAS, as the first line of defense, would return a warning of SA-00657 (the advisory for the xAPIC vulnerability) for any vulnerable platform I could find. When I went on a hardware purchasing spree, it seemed like a long shot.

*** Intel IAS can’t assess whether the attesting enclave has the necessary SW mitigations and Intel PCS doesn’t provide attestation collateral that allows assessment of whether the attesting enclave has the necessary SW mitigations.

If I had understood more about TCB Recovery at the time I would have interpreted the advisory differently. The advisory says that a TCB Recovery is planned, and estimated that it would complete in Q1 2023 (this was later improved to Q4 2022). This is a huge oversimplification, but you can think of the TCB Recovery date as the time when IAS requires microcode updates to be applied before issuing new attestations. One of the main takeaways from the sgx.fail paper is that it is difficult for IAS to pick a TCB Recovery schedule that satisfies all SGX applications. For example, if IAS returns new error codes before cloud service providers can patch, SGX applications could become unavailable. The delay between public disclosure date and the TCB Recovery date is the result of a difficult compromise between availability risk and confidentiality risk.

TEE-based blockchains must be prepared to carry out their own TCB Recovery plan without waiting for IAS. The most urgent action must be to block new vulnerable nodes from joining. While the vulnerability affects only some processors, the public disclosure serves as a “tip-off” for how to find hardware to use in mounting a siege. Secret Network devs were aware of AepicLeak, but they had estimated (just like I had) that there were no machines that were simultaneously a) vulnerable to AepicLeak and b) capable of passing the IAS-based attestation checks. So they decided to leave registration open. As a result, even though AepicLeak had already been publicly disclosed for nearly two months, we were able to find suitable hardware and breach the consensus seed directly. In response to our disclosure, Secret Network developers immediately halted the registration of new enclave nodes by revoking their project account with IAS.
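A rough sketch of what a chain-side registration policy might look like follows. The `AttestationReport` fields are a simplified stand-in for the status flag and advisory list in a signed IAS report, and the `RegistrationPolicy` class, its defaults, and the advisory string format are hypothetical, not taken from any project's code.

```python
from dataclasses import dataclass, field


@dataclass
class AttestationReport:
    """Simplified stand-in for the fields of a signed attestation report."""
    quote_status: str
    advisory_ids: list[str] = field(default_factory=list)


@dataclass
class RegistrationPolicy:
    """Hypothetical chain-side admission rules, independent of IAS timing."""
    frozen: bool = False
    blocked_advisories: set[str] = field(default_factory=set)
    accepted_statuses: set[str] = field(
        default_factory=lambda: {"OK", "SW_HARDENING_NEEDED"}
    )

    def admits(self, report: AttestationReport) -> bool:
        if self.frozen:  # emergency freeze overrides everything else
            return False
        if report.quote_status not in self.accepted_statuses:
            return False
        return not (set(report.advisory_ids) & self.blocked_advisories)


policy = RegistrationPolicy(blocked_advisories={"INTEL-SA-00657"})
flagged = AttestationReport("SW_HARDENING_NEEDED", ["INTEL-SA-00657"])
unflagged = AttestationReport("OK")  # what a vulnerable machine may still present

flagged_admitted = policy.admits(flagged)
unflagged_admitted_before_freeze = policy.admits(unflagged)
policy.frozen = True
unflagged_admitted_after_freeze = policy.admits(unflagged)
```

Filtering on advisories only helps once the attestation service actually reports them. A vulnerable machine whose report still looks clean passes that check, which is why the freeze switch is the part that matters on disclosure day.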

We’re optimistic that the window of opportunity was thereby closed and that this potential breach of user data will never materialize, though it is important to warn users about the possibility. It must be accepted that regardless of microcode updates, any deliberately-unpatched machines that were already on the network would continue to pose a threat today. Secret developers estimated that 4% of the nodes registered on the network at the time could have been vulnerable.

The first takeaway lesson is that every TEE-based blockchain needs a TCB Recovery plan in place for the next SGX vulnerability, and especially should plan to carry out the equivalent of a registration freeze at the earliest notice. This planning is an area where niche-wide best practices should emerge. I want to note that as far as I understand, none of the developer teams received advance warning about AepicLeak prior to the public announcement. The niche might join together to better negotiate/advocate with Intel for advance notice.

Part of Secret Network’s response has been to develop a key rotation mechanism. Key rotation means to stop using the old master key, migrate the current state of every contract to a new key, then continue processing transactions encrypted under the new key. Once keys are rotated, new transactions will no longer be at risk of decryption even if the old key ends up breached. Key rotation is something Phala has already implemented. Oasis Sapphire plans to launch their mainnet before finishing a key rotation mechanism, though it is in development.
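As a toy sketch of the migration step in key rotation, the snippet below decrypts each contract's state under the old key and re-encrypts it under a new one. The `keystream_xor` cipher is a deliberately simple stand-in built from SHA-256 (unauthenticated and not suitable for real use), and `rotate_state` and the sample data are hypothetical names for illustration only.

```python
import hashlib


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHA-256 counter keystream.

    Encryption and decryption are the same operation. Illustration only.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def rotate_state(state: dict[str, bytes], old_key: bytes, new_key: bytes) -> dict[str, bytes]:
    """Re-encrypt every contract's state from old_key to new_key."""
    rotated = {}
    for contract_id, ciphertext in state.items():
        plaintext = keystream_xor(old_key, contract_id.encode(), ciphertext)
        rotated[contract_id] = keystream_xor(new_key, contract_id.encode(), plaintext)
    return rotated


old_key = hashlib.sha256(b"old master key").digest()
new_key = hashlib.sha256(b"new master key").digest()
state = {"contract-A": keystream_xor(old_key, b"contract-A", b"balance=100")}

rotated = rotate_state(state, old_key, new_key)
recovered = keystream_xor(new_key, b"contract-A", rotated["contract-A"])
```

Rotation protects data written after the switch. Anything that was already encrypted under the old key and copied by an attacker stays exposed if that key leaks.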

Discussion 2: Compartmentalization.

Of course this is easier to say with hindsight immediately following a breach, but to me the choice to share the consensus seed with any node that passes attestation seems far too risky. Secret Developers have explained that this is a deliberate design tradeoff, since it generally favors liveness (more nodes would need to crash in order to halt the network) and makes it much easier for API endpoints and other services to operate.

How do the other TEE-based blockchains differ? We can infer from Obscuro’s documentation that their approach to master key replication is similar to Secret’s in that any “valid TEEs”, whether Aggregators (stakers) or Verifiers (no stake required), obtain a copy of the master seed.

One alternative is “compartmentalization” as explained in the 2018 Ekiden paper. Ekiden has two tiers of TEE nodes: key managers and workers. Workers receive contract-specific keys on a need-to-know basis, such that even if compromised they would only learn a portion of recent blockchain state. The key manager nodes still store a master key, but they are fewer in number and likely require some stake deposit or are chosen from known and trusted entities.
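The need-to-know idea can be sketched as a key manager that holds the master key and hands a worker only the keys for contracts assigned to it. The `KeyManager` class, its methods, and the HMAC-based derivation are hypothetical simplifications, not Ekiden's actual protocol (which also involves attestation between the tiers).

```python
import hashlib
import hmac


class KeyManager:
    """Holds the master key; releases per-contract keys on a need-to-know basis."""

    def __init__(self, master_key: bytes):
        self._master_key = master_key
        self._assignments: dict[str, set[str]] = {}  # worker id -> contract ids

    def assign(self, worker_id: str, contract_id: str) -> None:
        self._assignments.setdefault(worker_id, set()).add(contract_id)

    def contract_key(self, worker_id: str, contract_id: str) -> bytes:
        if contract_id not in self._assignments.get(worker_id, set()):
            raise PermissionError(f"{worker_id} is not assigned to {contract_id}")
        # One-way derivation: a contract key does not reveal the master key.
        return hmac.new(self._master_key, contract_id.encode(), hashlib.sha256).digest()


manager = KeyManager(master_key=bytes(32))  # placeholder key for the example
manager.assign("worker-1", "contract-A")

key_a = manager.contract_key("worker-1", "contract-A")
try:
    manager.contract_key("worker-1", "contract-B")
    leaked_other_contract = True
except PermissionError:
    leaked_other_contract = False
```

A compromised worker in this model exposes only the contract keys it was given, whereas a compromised key manager still exposes everything.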

Oasis and Phala both feature multiple tiers like Ekiden. In the Oasis Sapphire testnet, the key manager currently consists of 6 nodes, though their mainnet launch will likely feature a larger committee. Their post notes that they “restrict the membership of these committees to trusted operator partners as an additional measure to prevent unknown bad actors from trying to exploit vulnerabilities like Æpic.”

In Phala, similar to Oasis, the “gatekeepers” with access to the master key are appointed by their “council,” an on-chain governance process. While Phala too has only launched a testnet for their “Phat Contracts” system, the council is already in place and has 8 members and the motions to add new gatekeepers are noted here.

To summarize, at the current time, Oasis and Phala implement compartmentalization to limit the master keys to a small committee, while Obscuro and Secret distribute master keys more broadly. How much better is this? How much trust should we place in these committees, and what exactly are we trusting them to do? This question is related to but different from establishing validator committees for PoS blockchains. The committee members must be trusted not to exploit the next SGX vulnerability that occurs; there is no such analogue in ordinary proof-of-stake blockchains. It might help to define some niche-wide best practices on what information to expect from transparency reports about key manager nodes.

Discussion 3. Software upgrades and the problem with MRSIGNER

Now let’s get to the main point of this post. Every blockchain project needs to have a software upgrade process. This requires migrating sensitive data, like the master key, from an old enclave to the new enclave. The easiest way to do this in SGX is a feature called “sealing with MRSIGNER”. Unfortunately, this creates a backdoor the developers could use to peek at the master key. I’ll explain what all this means in a moment.

I carried out a quick investigation into all four TEE-based blockchain codebases (followed by asking for comments), and determined that Secret and Obscuro currently rely on MRSIGNER sealing, hence the developer’s code signing key could indeed be used to steal the master decryption key without leaving any evidence, while Oasis and Phala require both multiple signatories and a proof of on-chain publication before migration takes place.

To understand the issue we need to start with “sealed data” in TEEs and how this is used for software updates. Sealing is how data is persisted between invocations of an enclave program. For example, in Secret Network the consensus seed is stored in a sealed file, “consensus_seed.sealed”, which is created when the node first registers and thereafter is loaded whenever the node restarts. A sealed file abstraction is provided by SGX standard libraries. A sealed file created by an enclave can later be read by that same enclave, but not by the untrusted OS or any other enclaves.

Just one more gritty SGX detail before the issue becomes clear. SGX provides two sealing key policies: MRSIGNER, and MRENCLAVE. MRENCLAVE binds the sealed data to the hash of the program binary that created it. Only the exact same program that originally sealed the key can unseal it to access it. The alternative policy, MRSIGNER, binds the sealed data to the developer’s code signing key. The policy says that a sealed file can be accessed by ANY program binary signed by those same developers.

You don’t have to know about hardware side channels to see the problem here. If the developers chose to, or if they were coerced or had their keys stolen, they could code up an enclave program that spits out the consensus seed in plaintext, sign it, then run it on a node. The code signing key can be used to retrieve the master decryption key directly, without having to exploit any SGX vulnerabilities. This need not leave any trace, and there is no easy way for the developers to provide evidence they haven’t done this.
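The difference between the two policies can be modeled in a few lines. This is a simplified model, not the SGX SDK: real sealing keys are derived in hardware from a secret fused into the CPU, and the `Enclave` class, `seal_key` function, and example values here are hypothetical. What the model preserves is which identity the key is bound to.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Enclave:
    code: bytes           # the enclave program binary
    signer_pubkey: bytes  # the developer's code signing public key

    @property
    def mrenclave(self) -> bytes:
        return hashlib.sha256(self.code).digest()

    @property
    def mrsigner(self) -> bytes:
        return hashlib.sha256(self.signer_pubkey).digest()


def seal_key(cpu_secret: bytes, enclave: Enclave, policy: str) -> bytes:
    """Model of sealing-key derivation: bind to program hash or signer hash."""
    if policy == "MRENCLAVE":
        identity = enclave.mrenclave
    elif policy == "MRSIGNER":
        identity = enclave.mrsigner
    else:
        raise ValueError(f"unknown policy: {policy}")
    return hmac.new(cpu_secret, policy.encode() + identity, hashlib.sha256).digest()


cpu_secret = b"per-CPU secret (placeholder)"
dev_pubkey = b"developer signing key (placeholder)"
node = Enclave(code=b"official node software v1", signer_pubkey=dev_pubkey)
leaker = Enclave(code=b"print the consensus seed", signer_pubkey=dev_pubkey)

same_under_mrsigner = (
    seal_key(cpu_secret, node, "MRSIGNER") == seal_key(cpu_secret, leaker, "MRSIGNER")
)
same_under_mrenclave = (
    seal_key(cpu_secret, node, "MRENCLAVE") == seal_key(cpu_secret, leaker, "MRENCLAVE")
)
```

Under MRSIGNER the `leaker` program obtains the same sealing key as the official node software, so it can open the sealed consensus seed. Under MRENCLAVE it cannot.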

Why use MRSIGNER at all? Well, without MRSIGNER, it would be significantly more difficult to deploy software updates. All of the TEE-based blockchain projects are undergoing active development and maintenance. The upgrade process would be more complicated without MRSIGNER sealed data, since nodes running the new software would need to reregister. In developer discussion on Phala’s codebase, Shelven Zhou explains “The Problem with MRSIGNER” very clearly:

The MRSIGNER-based sealing is easier for implementation, but it enables anyone who owns the certificate which signs the program to decrypt its sealed data.

Let’s start with SecretNetwork. Their documentation is explicit that “The consensus_seed is sealed with MRSIGNER to a local file.” Looking at the code where this takes place (storage.rs), the consensus seed is sealed using the default options of SgxFile from the Teaclave library, which is indeed MRSIGNER.

Next, Obscuro uses the EGo library. This features two options, “seal_ex” (MRENCLAVE) and “seal” (MRSIGNER). Obscuro currently uses “seal”. Obscuro developers confirmed that this is an oversight and not an intentional design choice. Obscuro’s documentation mentions “a group of independent, reputable, and competent security auditors has to analyse the code and approve it by signing it carefully.” The developers plan to enforce this within the enclave by the time of their mainnet launch.

Phala’s current implementation seals the master key using the MRSIGNER key policy. However, their detailed discussion on the MRSIGNER issue also includes an implementation of a more involved upgrade process based on having a quorum of the council and/or gatekeepers sign off, and requires publishing the program on-chain. Their plan is to remove MRSIGNER prior to mainnet.

Oasis only uses MRENCLAVE for sealing. (code) For software updates, they implement a multi-key code signing mechanism that requires 2-of-3 code signers to approve of an update before the master key can be transferred from an existing node to a node with upgraded software. Additionally, as with Phala, the program binary must be published on-chain.
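An upgrade check along these lines (a threshold of code signers plus proof of publication) might be sketched as follows. This is not Oasis's or Phala's implementation: the function names are hypothetical, `published_hashes` stands in for an on-chain lookup, and HMAC tags stand in for real public-key signatures so that the example needs only the standard library.

```python
import hashlib
import hmac


def approve(signer_key: bytes, binary_hash: bytes) -> bytes:
    """A code signer's approval of one binary (HMAC as a signature stand-in)."""
    return hmac.new(signer_key, binary_hash, hashlib.sha256).digest()


def upgrade_allowed(binary: bytes, approvals: dict[str, bytes],
                    signer_keys: dict[str, bytes],
                    published_hashes: set[bytes], threshold: int = 2) -> bool:
    """Release the master key to new software only if the binary is published
    and at least `threshold` known signers approved exactly this binary."""
    binary_hash = hashlib.sha256(binary).digest()
    if binary_hash not in published_hashes:  # proof of publication
        return False
    valid = sum(
        1 for name, tag in approvals.items()
        if name in signer_keys
        and hmac.compare_digest(tag, approve(signer_keys[name], binary_hash))
    )
    return valid >= threshold


signer_keys = {"alice": b"key-a", "bob": b"key-b", "carol": b"key-c"}
new_binary = b"node software v2"
new_hash = hashlib.sha256(new_binary).digest()
approvals = {name: approve(signer_keys[name], new_hash) for name in ("alice", "bob")}

ok_published = upgrade_allowed(new_binary, approvals, signer_keys, {new_hash})
ok_unpublished = upgrade_allowed(new_binary, approvals, signer_keys, set())
ok_single = upgrade_allowed(new_binary, {"alice": approvals["alice"]}, signer_keys, {new_hash})
```

The check has to run inside the old enclave, which is sealed under MRENCLAVE, before it hands the master key to the new one. A check enforced only by off-chain process would not remove the backdoor.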

I’ll summarize everything discussed so far in a table (but please note this is non-exhaustive, there are many more design details we could discuss):

Non-exhaustive summary of selected key breach mitigations in TEE-based smart contracts. Missing mitigations in main networks are red, while for test nets they are yellow, since these can still apply the mitigations before any sensitive user data would be affected.

Secret Developers have explained that MRSIGNER is a deliberate design tradeoff. Besides ease of upgrading, a safety-minded argument is that a more complicated upgrade mechanism introduces more potential bugs that could brick the network or leak the consensus seed. However, I suspect Secret’s users and onlooking developers in general are unaware of this choice or have not fully understood the dilemma. Once this is discussed, I think there will be a clear consensus that this is an unwanted back door. My hope is that after this post the Secret community asks for this policy to change and another key rotation takes place such that the developers cannot unilaterally peek at the key. This would require implementing an upgrade mechanism that relies on some combination of a) developer code signing, b) a proof-of-publication, and c) multiple independent code signers rather than just one.

The significance of proof-of-publication is that requiring a trusted committee to sign off on code updates is only worthwhile if their reputations are truly at stake. Without requiring software updates to be published, the committee could collude to steal the master key without any way of getting caught. Even with proof of publication, it’s unclear what exactly the code signing committees are promising to do. The reason none of the projects earn a “green” for their code signing policy is that I haven’t yet seen documentation of what reputations are at stake and what process they follow when approving code updates.

Acknowledgements and disclosures:
I’m (Andrew Miller) a board member of Zcash Foundation, and a member of the Oasis Foundation TAC, but have no relationship with the other three projects discussed here. This post is entirely my personal viewpoints, and my intent is to be as neutral as possible.

I thank Shelven Zhou from Phala, Jernej Kos from Oasis, Tudor Malene from Obscuro, and Guy Zyskind from Secret for comments on this post. I started out by writing my own descriptions of the functionality of these codebases based on my code review and the code snippets referenced, but I did my best to incorporate corrections based on developer comments. Any remaining mistakes are my own, while the bits I got right are with their help.

--

Andrew Miller
The Initiative for CryptoCurrencies and Contracts (IC3)

Assistant Professor @ UIUC. Distributed systems, cryptography, programming languages. Zcash Foundation board member