Model Serving in Trusted Execution Environments: Improving the Integrity and Confidentiality of Machine Learning Models in the Cloud

mcapuccini · Published in scaleout · Jan 12, 2022 · 3 min read

Serving Machine Learning (ML) models over gRPC or REST APIs has become a common practice for many organizations. State-of-the-art practice advocates deploying model serving in cloud environments, which promise out-of-the-box resilience and scalability for these services. However, every time we move mission-critical logic outside of our organization we take a risk. Training a successful ML model notably requires large amounts of data and computation, along with a substantial R&D effort, which amounts to a considerable investment. Therefore, if a third party gets access to our served models with the intent of gaining an unfair advantage over our organization, a major loss may occur.

Trusted Execution Environment (TEE) solutions such as Intel Software Guard Extensions (SGX) come in handy in such a situation. On one hand, we want to be able to run model serving on a cloud provider, but on the other hand we don't trust such an environment. Intel hardware with SGX support allows us to achieve confidentiality and integrity in an untrusted environment by running our application in a so-called "enclave". An enclave's code and data exist solely in processor-reserved memory and cannot be accessed by any other peripheral or software. In this way, even if the host Operating System (OS) or hypervisor gets compromised, the confidentiality of our ML logic is still safeguarded.
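As a quick sanity check before going further, you can verify that the host CPU advertises SGX. Assuming a Linux kernel recent enough to expose the feature flag (built-in SGX support landed in 5.11), something like the following works:

```sh
# Prints "sgx" if the CPU advertises the extension. Note that SGX must
# also be enabled in the BIOS/firmware for enclaves to actually run.
grep -m1 -o 'sgx' /proc/cpuinfo
```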

Attacks from the OS or from the hypervisor can only be directed up the stack and cannot reach the enclave.

The reduced attack surface that we gain by running model serving in an enclave comes with added complexity. Keep in mind that the enclave is isolated from the rest of the stack, including the OS. This not only means that we would normally need to rewrite the service against an SGX SDK, but also that we cannot make use of low-level system calls. Furthermore, to load the model into the enclave we need to first encrypt it and ship it to the untrusted environment. Then, the secrets necessary to decrypt the model need to be provisioned into the enclave from a trusted environment, so that the model can be consumed by the serving API.
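To make the encryption step concrete, here is a sketch using Gramine's gramine-sgx-pf-crypt tool. The file names are hypothetical, and the exact flags may vary slightly between Gramine versions:

```sh
# In the trusted environment: generate a 16-byte wrap key and encrypt
# the plain model into Gramine's protected-file format.
dd if=/dev/urandom of=wrap_key bs=16 count=1
gramine-sgx-pf-crypt encrypt -w wrap_key -i model.bin -o model.enc

# Only model.enc is shipped to the untrusted cloud host; wrap_key stays
# in the trusted environment, to be provisioned into the enclave later.
```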

Luckily, the Gramine project comes to our aid. Gramine is a lightweight LibOS (or unikernel, if you prefer) that allows us to run unmodified Linux binaries in SGX enclaves. In addition, the Gramine project offers handy tools for encrypting/decrypting data and for secret provisioning. In the context of model serving, we can use the Gramine LibOS to run our serving API in the enclave and rely on remote secret provisioning to load and decrypt the model in the untrusted environment.
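To give a feel for what this looks like, below is a minimal, illustrative Gramine manifest for a hypothetical statically linked serve binary. The key names follow Gramine's manifest syntax at the time of writing, but they have changed across releases (for instance, protected files became "encrypted" mounts), so treat this as a sketch and consult the documentation for your version:

```toml
# serve.manifest.template -- illustrative, not a drop-in configuration
libos.entrypoint = "/serve"
loader.entrypoint = "file:{{ gramine.libos }}"
loader.log_level = "error"

fs.mounts = [
  { path = "/serve", uri = "file:serve" },
  # The encrypted model: Gramine decrypts it transparently once the
  # wrap key has been provisioned into the enclave.
  { path = "/model.enc", uri = "file:model.enc", type = "encrypted" },
]

sgx.enclave_size = "256M"  # must fit the binary plus the decrypted model
sgx.trusted_files = [
  "file:{{ gramine.libos }}",
  "file:serve",
]
```

The manifest is then expanded, signed and run with the usual Gramine toolchain (gramine-manifest, gramine-sgx-sign and gramine-sgx).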

The plain model is encrypted in a trusted environment and shipped to the untrusted premises. The unmodified serving API runs in the enclave by means of the Gramine LibOS. Gramine tools manage secret provisioning and decrypt the model when it is loaded into the enclave.
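The secret provisioning step deserves a closer look. Gramine ships an RA-TLS based secret provisioning example in which a preloaded client library attests the enclave to a server running in the trusted environment; the server verifies the SGX quote and, only then, releases the wrap key. The manifest additions below are modeled on that example; the variable names and values are version-dependent assumptions, not a definitive configuration:

```toml
# Enable remote attestation (DCAP-based in this sketch).
sgx.remote_attestation = "dcap"

# Let Gramine's secret provisioning client fetch the wrap key at startup,
# before the application runs, and use it as the protected-files key.
loader.env.LD_PRELOAD = "libsecret_prov_attest.so"
loader.env.SECRET_PROVISION_CONSTRUCTOR = "1"
loader.env.SECRET_PROVISION_SET_KEY = "default"
# Endpoint of the trusted-side server holding wrap_key (hypothetical host):
loader.env.SECRET_PROVISION_SERVERS = "secrets.example.org:4433"
```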

Is it all that simple? Not exactly. Mind that on SGX hardware of this generation the processor-reserved memory is typically 128MB, of which roughly 93MB is usable by enclaves; once that limit is exceeded paging occurs, and the performance of the application is heavily affected. Therefore, you most likely want to stay away from Python and implement a lightweight service using your compiled language of choice.
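As an illustration of how small such a service can be, here is a minimal REST serving sketch in Go. Everything here is hypothetical: /model.enc is the path where the manifest above mounts the encrypted model, and scoreModel stands in for an actual inference routine:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
)

// loadModel reads the model from the path where Gramine mounts the
// encrypted file. Inside the enclave, Gramine decrypts it transparently
// once the wrap key has been provisioned; the plaintext never touches
// untrusted storage.
func loadModel(path string) ([]byte, error) {
	return os.ReadFile(path)
}

// scoreModel is a placeholder for the actual inference routine.
func scoreModel(model []byte, input string) string {
	return fmt.Sprintf("scored %q with a %d-byte model", input, len(model))
}

func main() {
	model, err := loadModel("/model.enc")
	if err != nil {
		log.Fatalf("loading model: %v", err)
	}

	http.HandleFunc("/predict", func(w http.ResponseWriter, r *http.Request) {
		input := r.URL.Query().Get("input")
		fmt.Fprintln(w, scoreModel(model, input))
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```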

Running mission-critical software in a cloud environment can be risky, and model serving is no exception. TEE solutions come in handy for reducing the attack surface exposed to third parties. Are you interested in trying it yourself? We prepared a proof of concept that you can run: https://github.com/scaleoutsystems/tee-serve. At Scaleout we use this technology to improve security in our federated learning and MLOps platforms, i.e. FEDn and STACKn. If you are interested in learning more, please do not hesitate to contact us.
