Opaque: An Oblivious and Encrypted Distributed Analytics Platform

Frank Wang
MIT Security Seminar
May 31, 2017

Raluca Ada Popa came to MIT to give a talk on her recent NSDI paper, which presents a platform for oblivious distributed analytics.

The motivation for Opaque is that complex analytics frequently run on sensitive data. As an example, she noted that companies use Spark SQL to analyze user data. The problem is that the threat model assumes an attacker with full access to all cloud software. The question is how to protect data and computation while preserving functionality.

Overview of remote attestation available in Intel SGX.

Current cryptographic solutions are either too slow (fully homomorphic encryption) or too specialized (CryptDB, BlindSeer, Arx, etc.). An alternative is to use hardware enclaves. One important feature of the enclaves used in Intel SGX is remote attestation. Remote attestation lets a client verify what code runs in the enclave and perform a key exchange with it.
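To make the two roles of remote attestation concrete (checking what code is running, and binding key-exchange material to that code), here is a toy model in Python. It is only a sketch under loud assumptions: `HARDWARE_KEY`, `measure`, `enclave_quote`, and `verify_quote` are hypothetical names of my own, and the shared-secret HMAC stands in for the asymmetric signatures and Intel attestation infrastructure that real SGX uses. It is not the SGX API.

```python
import hashlib
import hmac
import os

# ASSUMPTION: a stand-in for the hardware-rooted key. Real SGX quotes are
# signed asymmetrically and verified through Intel's attestation service;
# a shared HMAC key is used here only to keep the sketch self-contained.
HARDWARE_KEY = os.urandom(32)


def measure(code: bytes) -> bytes:
    """Toy 'measurement': a hash of the code loaded into the enclave."""
    return hashlib.sha256(code).digest()


def enclave_quote(code: bytes, nonce: bytes, key_share: bytes) -> dict:
    """Enclave side: bind the measurement, the client's nonce, and the
    enclave's key-exchange share together under the hardware key."""
    measurement = measure(code)
    tag = hmac.new(HARDWARE_KEY, measurement + nonce + key_share,
                   hashlib.sha256).digest()
    return {"measurement": measurement, "nonce": nonce,
            "key_share": key_share, "tag": tag}


def verify_quote(quote: dict, expected_measurement: bytes, nonce: bytes) -> bool:
    """Client side: accept the key share only if the quote is authentic,
    fresh (matches our nonce), and reports the code we expect."""
    expected_tag = hmac.new(
        HARDWARE_KEY,
        quote["measurement"] + quote["nonce"] + quote["key_share"],
        hashlib.sha256,
    ).digest()
    return (hmac.compare_digest(quote["tag"], expected_tag)
            and hmac.compare_digest(quote["measurement"], expected_measurement)
            and hmac.compare_digest(quote["nonce"], nonce))
```

Once `verify_quote` succeeds, the client knows that the key share it received came from an enclave running the expected code, so a session key derived from that share is safe to use for sending encrypted data.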

Currently, Intel SGX assumes that there are no hardware attacks (timing attacks, side channels, etc.). Systems such as Haven and Scone use Intel SGX and support relational algebra. However, they are not distributed and they leak access patterns, which is especially problematic for data such as disease and genomic records, where access patterns alone can reveal a large amount of information.

The goal of Opaque is to perform oblivious distributed analytics. By oblivious, she means that the access patterns are independent of the data content. More specifically, Opaque is an oblivious and encrypted distributed analytics platform.
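A small sketch may help show what "access patterns independent of the data content" means. The Python below uses odd-even transposition sort, chosen only because it is short; it is not claimed to be the sorting algorithm Opaque uses. The function name `oblivious_sort` and the recorded `trace` are my own illustrative assumptions. The point is that the sequence of positions touched depends only on the input length, never on the values.

```python
def oblivious_sort(values):
    """Sort with odd-even transposition sort and record which index pairs
    were touched. The pairs depend only on len(values), not on the data."""
    data = list(values)
    trace = []  # illustrative stand-in for what an observer of memory sees
    n = len(data)
    for rnd in range(n):
        # Even rounds compare (0,1), (2,3), ...; odd rounds (1,2), (3,4), ...
        for i in range(rnd % 2, n - 1, 2):
            trace.append((i, i + 1))
            lo = min(data[i], data[i + 1])
            hi = max(data[i], data[i + 1])
            # Both slots are always rewritten, whether or not a swap happened.
            data[i], data[i + 1] = lo, hi
    return data, trace


sorted_a, trace_a = oblivious_sort([3, 1, 2, 0])
sorted_b, trace_b = oblivious_sort([9, 9, 1, 5])
print(sorted_a)            # [0, 1, 2, 3]
print(trace_a == trace_b)  # True: same length, same access pattern
```

A typical quicksort, by contrast, chooses what to touch next based on comparisons with a pivot, so its access trace varies with the data. Note that this sketch only models index-level accesses; Python's `min` and `max` make no constant-time guarantees at the hardware level.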

Informally, the security guarantees are the following:

  • data encryption and authentication
  • computation integrity: the client can check that the computation result was not affected by the attacker
  • obliviousness: the memory and network access patterns of a query are the same for any two inputs with the same size characteristics (input and output sizes).

Opaque components

The protocols for all the components are fairly involved, so I’ll refer you to the paper for the details.

I will briefly outline the insights for the rule-based and cost-based query planning optimizations.

For rule-based optimizations, they split each logical operator into smaller Opaque operators. They also take a global view across the plan to remove some Opaque operators.

For cost-based optimizations, the observation is that not all tables are sensitive, but sensitivity propagates: any result computed from a sensitive table must itself be treated as sensitive. Sensitivity propagation therefore introduces a new dimension to query optimization. They develop a SQL optimizer with a new cost model and sensitivity propagation.
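The idea of sensitivity propagation can be sketched as a pass over a query plan tree. This is a minimal illustration under my own assumptions, not Opaque's optimizer: `PlanNode`, `propagate`, `operator_mode`, and the mode labels are hypothetical, and the example tables are made up. A node is marked sensitive if any of its inputs is sensitive, and only sensitive operators need the more expensive oblivious execution.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    op: str                                   # e.g. "scan", "filter", "join"
    children: list = field(default_factory=list)
    sensitive: bool = False                   # set by hand only on base tables


def propagate(node: PlanNode) -> bool:
    """Mark every operator that depends on a sensitive input as sensitive."""
    if node.children:
        # Build a list first so every child is visited (no short-circuit).
        node.sensitive = any([propagate(child) for child in node.children])
    return node.sensitive


def operator_mode(node: PlanNode) -> str:
    """Hypothetical labels: sensitive operators run obliviously, the rest
    can run in the cheaper encrypted-only mode."""
    return "oblivious" if node.sensitive else "encrypted-only"


# Made-up plan: join a sensitive patient table with a filtered public table.
patients = PlanNode("scan patients", sensitive=True)
diseases = PlanNode("scan diseases")
public_filter = PlanNode("filter", [diseases])
join = PlanNode("join", [patients, public_filter])

propagate(join)
print(operator_mode(public_filter))  # encrypted-only
print(operator_mode(join))           # oblivious
```

This also hints at why the optimizer's search space changes: reordering the plan so that filters on non-sensitive tables run before the join keeps more of the work in the cheaper mode.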

Opaque system layout

In their evaluation, they seek to answer two main questions:

  1. How does Opaque compare to Spark SQL?
  2. How does Opaque compare to the state-of-the-art oblivious systems?

Here, I’ll present a summary of the results, and I’ll refer you to the paper for more details about the evaluation.

Comparison of Opaque with Spark SQL on a big data benchmark with and without obliviousness.
Comparison of Opaque to a state-of-the-art oblivious system, GraphSC, for graph processing.
Spectrum of systems and their performance-security tradeoffs.

Opaque is an interesting piece of work that uses hardware enclaves to improve performance for oblivious and encrypted data platforms. One downside is that the CPU must be trusted, but this is a reasonable trade-off for making these types of platforms more practical. If you’re interested in learning more, I encourage you to read the paper.
