Make Client-side Encrypted Data Searchable by SaaS Backend

We don’t know what it is that you are searching for, but still we are able to find it and hand it to you, without ever knowing what it is.

Published in CyberArk Engineering, Mar 18, 2020

There are many security benefits for a SaaS vendor in having its customers upload already-encrypted data, so that neither the backend nor the vendor can access the plaintext. However, client-side encryption of SaaS data is not useful if the backend can’t search it.

Today, I’m going to kick off a series of posts that will discuss how a backend can search encrypted data, without having to decrypt it — making client-side encryption practical.

The increase in privacy awareness and laws has caused a lot of discussion about Searchable Encryption. If you decide to go down this path, it’s likely you’ll find yourself at a crossroads. The problem, however, is that there are only a few actual implementations in the industry; most of the other work has been on the academic side. That’s why I want to provide a good overview of Searchable Encryption. I will describe some of the different searchable-encryption methods, the threats and risks of these solutions, practices for customers managing the encryption keys, techniques to reduce the performance impact, some optimization types, and more.

I am going to set the scene: explain what we are after, the main players, and the main challenges.

Key Advantages of Client-side Encryption

Okay, let’s first dive into the main benefits of having client-side encrypted data on a SaaS solution.

Achieve zero-trust between the client and the server. With client-side encryption, the server never gets access to sensitive data in plaintext — so even if the server is malicious, the customer’s data is safe.

Eliminate the risk of leaking customers’ data. For example, if a malicious tenant gains access to another tenant’s data, it only gets encrypted data without the decryption key (only the customer that owns the data has the key). Likewise, an adversary that takes full control of the server gains nothing more, because the server does not have the data decryption keys either.

Allow customers control of their data destruction. When a customer asks to delete their plaintext data from a cloud, they can’t be sure that all the data is deleted and that no one will ever be able to access at least some of their sensitive data. This contrasts with the scenario where the data is encrypted and the server doesn’t have the data decryption keys: there, the customer can destroy the key, which leaves every remaining copy of the data unreadable. The former is merely data deletion; the latter is equivalent to data destruction.

Data Integrity. The customer detects any malicious tampering with their data as soon as they try to decrypt it.

*. The above AEAD acronym formed accidentally! I happened to realize it after writing this section :). I plan to write about AEAD (Authenticated Encryption with Associated Data) as it is a fundamental aspect of these posts’ subject.
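To make the data-integrity benefit concrete, here is a minimal sketch of tamper detection with a keyed tag. It is not an AEAD cipher, only the authentication half of the idea, and the names `seal` and `open_sealed` are hypothetical. The random bytes stand in for data that the client has already encrypted.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # size of an HMAC-SHA256 tag in bytes


def seal(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Append a keyed tag so that later modification can be detected."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_sealed(mac_key: bytes, blob: bytes) -> bytes:
    """Return the ciphertext if the tag verifies, otherwise raise ValueError."""
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("stored data was modified")
    return ciphertext


mac_key = secrets.token_bytes(32)      # stays on the client
ciphertext = secrets.token_bytes(48)   # stand-in for already-encrypted data
stored = seal(mac_key, ciphertext)     # what the server keeps

# Simulate a server-side attacker flipping a single bit.
tampered = bytes([stored[0] ^ 0x01]) + stored[1:]

try:
    open_sealed(mac_key, tampered)
    tamper_detected = False
except ValueError:
    tamper_detected = True
```

Because the server never holds `mac_key`, it cannot produce a valid tag for modified data, so the client notices the change when it verifies the blob.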

Real-World Examples

What follows are two real-life examples where client-side searchable encryption could reduce the impact of security issues or mitigate them altogether.

Data Breaches

About 2 years ago a SaaS provider suffered a security breach that compromised encrypted customer data.

The provider claimed that the data on their servers had been encrypted.

So how were the hackers able to gain access to the cleartext data? While we don’t know what really happened in that specific breach, in theory there are several possibilities, including finding the key on one of the servers, eavesdropping on the communication between the machines in the backend (if the communication inside the account is not secure), finding cleartext data in the runtime memory of servers, setting up a malicious server that pretends to be a legitimate server working with the app-server, and more.

None of these scenarios could have happened if the data had been encrypted in a way where nothing in the backend is able to access the cleartext data, that is, where the encryption keys are not available to anything in the backend or to anyone who has direct access to the backend.

Competitive Intelligence

The fact that an organization’s data could reside on a cloud owned by one of its competitors would certainly deter it from using the solution. In fact, this is one of the main considerations that companies look at when choosing a vendor. Client-side Encryption ensures that competitors cannot gain access to sensitive data, which gives organizations more flexibility to use the best solution for their business.

The Goal of Searchable Encryption

The main goal of Searchable Encryption is to enable Client-side Encryption with all its advantages.

Without the ability to search on encrypted data, Client-side Encryption isn’t practical.

In Searchable Encryption the server returns the encrypted matching data without learning the plaintext query or plaintext data.

SaaS customers, for instance, can upload encrypted data to the SaaS application. The service can then, upon request, retrieve the right encrypted data and return it to the user — all without knowing what it was searching for and what data it found and returned.
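As a rough illustration of that flow, the sketch below uses one commonly described technique, a keyed-hash "blind index" for exact-match lookups. This is an assumption chosen for illustration, not necessarily a method covered later in this series, and every name in it (`blind_index`, `toy_encrypt`, `Server`) is hypothetical. The toy cipher is only a stand-in for real encryption.

```python
import hashlib
import hmac
import secrets


def blind_index(index_key: bytes, value: str) -> str:
    """Keyed hash of a field value; the server only ever sees this token."""
    return hmac.new(index_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from HMAC-SHA256 in counter mode."""
    stream, counter = b"", 0
    while len(stream) < length:
        stream += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return stream[:length]


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stand-in for real encryption: fresh nonce plus XOR keystream."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class Server:
    """Stores opaque tokens and ciphertext; it holds no keys."""

    def __init__(self) -> None:
        self.rows: list[tuple[str, bytes]] = []

    def insert(self, token: str, blob: bytes) -> None:
        self.rows.append((token, blob))

    def search(self, token: str) -> list[bytes]:
        return [blob for stored_token, blob in self.rows if stored_token == token]


# Client side: both keys stay with the customer.
enc_key = secrets.token_bytes(32)
index_key = secrets.token_bytes(32)

server = Server()
for email in ("alice@example.com", "bob@example.com"):
    server.insert(blind_index(index_key, email), toy_encrypt(enc_key, email.encode("utf-8")))

# The query is sent as a token, and the matches are decrypted on the client.
hits = server.search(blind_index(index_key, "bob@example.com"))
found = [toy_decrypt(enc_key, blob).decode("utf-8") for blob in hits]
```

The trade-off of this particular technique is that equal values produce equal tokens, so the server can tell which rows share a value even though it cannot read them.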

The picture below shows a ‘users’ database table that is in the SaaS backend.

The customer’s data in this table is fully encrypted and, although the backend servers cannot access it, the encrypted data is searchable by the database. In this example, the encryption of the data itself complies with the best practices for good encryption: the ciphertext is indistinguishable from random bytes, the same plaintext never encrypts to the same ciphertext twice, and so on.

For instance, the admin column holds Boolean values, yet the field holds much longer text. This is due to the nature of ciphertext, which may include padding (even an additional padding block), HMAC salts, and Initialization Vector data.

This column has the values ‘True’ for any admin user, and ‘False’ for all the other users. The picture shows how the backend database sees this encrypted data.
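The sketch below shows why a Boolean column can look like this. It is a toy encrypt-then-MAC construction built only from the Python standard library, assumed here for illustration; it is not the scheme behind the table in this post, and a real system would use a vetted cipher. All names are hypothetical.

```python
import hashlib
import hmac
import secrets

BLOCK = 16      # padding granularity in bytes
NONCE_LEN = 16  # plays the role of the Initialization Vector
TAG_LEN = 32    # HMAC-SHA256 tag


def _pad(data: bytes) -> bytes:
    """PKCS#7-style padding; a full extra block is added when already aligned."""
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def _unpad(data: bytes) -> bytes:
    return data[:-data[-1]]


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    stream, counter = b"", 0
    while len(stream) < length:
        stream += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return stream[:length]


def encrypt(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(NONCE_LEN)  # fresh randomness for every call
    padded = _pad(plaintext)
    stream = _keystream(enc_key, nonce, len(padded))
    body = bytes(p ^ s for p, s in zip(padded, stream))
    tag = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def decrypt(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    nonce, body, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("ciphertext was modified")
    stream = _keystream(enc_key, nonce, len(body))
    return _unpad(bytes(c ^ s for c, s in zip(body, stream)))


enc_key, mac_key = secrets.token_bytes(32), secrets.token_bytes(32)
first = encrypt(enc_key, mac_key, b"True")
second = encrypt(enc_key, mac_key, b"True")
other = encrypt(enc_key, mac_key, b"False")

print(first.hex())
print(second.hex())
print(len(first), len(other))
```

In this sketch the two encryptions of 'True' differ because each call draws a new nonce, and both 'True' and 'False' come out as 64 bytes (16 of nonce, 16 of padded data, 32 of tag), so neither the value nor its length gives the admin users away.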

Once the right data is downloaded to the customer’s premises, the data is decrypted with a key that only the customer has, and the end user gets to see the real plaintext values (in this case, the real plaintext data of the fetched user details).

By looking at this table, can you spot all the admin users?

Obviously, you can’t distinguish the admin users from the other users. So how can the database find them? We’ll discuss that in future posts.

Implementing Searchable Encryption

Enabling database search on securely encrypted database fields implies that the cryptographic keys used to encrypt/decrypt the data are not accessible to the database software.

The encryption/decryption keys are always kept on the client side.

The customer data is sent to the server as ciphertext, where the database has no access to the cleartext data at all.

This poses some challenges, including:

Use of strong cryptography
Reliable search on encrypted data
Securing encryption keys and their lifecycle by the customer
Key rotation
Minimizing performance impact
Minimizing functionality impact
Minimizing knowledge that an observer of the database may gain, by choice of method to use and by implementing deception mechanisms

In addition, with Client-side Searchable Encryption solutions, the encryption keys are owned by the client. This means that the encryption keys are kept and managed exclusively by the client, and the SaaS vendor never has access to the keys and cannot use them in any way. While this sounds great, it does present some challenges:

Do all the customers of this SaaS use best practices for securing those keys?
What is the risk if a malicious actor gets access to those keys?
What is the risk if the keys are lost?

Pinpointing the Cryptographic Challenge

With a secure cryptographic encryption scheme, the encrypted data is indistinguishable from random characters to anyone who does not hold the decryption key.

This means that the ciphertext reveals no information about the plaintext.

Semantic security means that only negligible information about the plaintext can feasibly be extracted from the ciphertext.

What is a good semantically secure encryption scheme?

Semantic security is usually defined through a game. An adversary chooses two plaintext messages of the same length, m1 and m2, and sends both to an oracle, which returns an encryption of one of them. The encryption scheme is semantically secure if the adversary cannot guess whether the given ciphertext is an encryption of m1 or m2 with probability meaningfully better than ½. This notion is also referred to as ciphertext indistinguishability under chosen-plaintext attack (IND-CPA).
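The game can be simulated in a few lines. The sketch below is an illustration under assumptions: both toy schemes are built from an HMAC-derived keystream, and the adversary uses one simple strategy (encrypt m1 itself through the oracle and compare the result with the challenge). All function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with an HMAC-SHA256 counter-mode keystream."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def deterministic_encrypt(key: bytes, message: bytes) -> bytes:
    """Fixed nonce: the same plaintext always yields the same ciphertext."""
    return _xor_stream(key, b"\x00" * 16, message)


def randomized_encrypt(key: bytes, message: bytes) -> bytes:
    """Fresh nonce per call: repeated plaintexts yield different ciphertexts."""
    nonce = secrets.token_bytes(16)
    return nonce + _xor_stream(key, nonce, message)


def adversary(oracle, m1: bytes, challenge: bytes) -> int:
    """Guess 0 (m1) if re-encrypting m1 reproduces the challenge, else 1 (m2)."""
    return 0 if oracle(m1) == challenge else 1


def play_round(encrypt) -> bool:
    key = secrets.token_bytes(32)
    m1, m2 = b"True ", b"False"  # equal length, as the game requires
    b = secrets.randbelow(2)     # the challenger's secret coin flip
    challenge = encrypt(key, (m1, m2)[b])
    oracle = lambda message: encrypt(key, message)  # chosen-plaintext access
    return adversary(oracle, m1, challenge) == b


def win_rate(encrypt, rounds: int = 2000) -> float:
    return sum(play_round(encrypt) for _ in range(rounds)) / rounds


print("deterministic:", win_rate(deterministic_encrypt))
print("randomized:   ", win_rate(randomized_encrypt))
```

Against the deterministic scheme this adversary wins every round, which is why deterministic encryption cannot be IND-CPA secure. Against the randomized scheme the comparison tells it nothing, so its win rate stays near ½.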

Requirements of Searchable Encryption

Below are the general requirements of a Searchable Encryption solution — although some methods we’ll discuss in later posts will not necessarily comply with this list.

Security Objectives

A Semantically secure system.

Plaintext data and queries are never directly exposed, but statistical inference is possible.

We allow some forms of statistical leakage, such as data access patterns (e.g. repeated retrieval, size info).

Protects against break-ins, cloud insiders, “surveillance attacks”, etc.
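To show what the permitted statistical leakage looks like in practice, here is a small sketch of a server that cannot read anything it stores but can still record how often each row is fetched and how large it is. The class and row names are hypothetical, and random bytes stand in for real ciphertext.

```python
from collections import Counter
import secrets


class ObservingServer:
    """Holds only opaque blobs, yet still sees access patterns and sizes."""

    def __init__(self) -> None:
        self.rows: dict[str, bytes] = {}
        self.fetch_counts: Counter = Counter()

    def put(self, row_id: str, blob: bytes) -> None:
        self.rows[row_id] = blob

    def get(self, row_id: str) -> bytes:
        self.fetch_counts[row_id] += 1  # repeated retrieval is visible
        return self.rows[row_id]

    def observed_sizes(self) -> dict[str, int]:
        return {row_id: len(blob) for row_id, blob in self.rows.items()}


server = ObservingServer()
server.put("row-1", secrets.token_bytes(64))   # stand-ins for ciphertext
server.put("row-2", secrets.token_bytes(64))
server.put("row-3", secrets.token_bytes(256))

for row_id in ("row-2", "row-1", "row-2", "row-2"):
    server.get(row_id)

most_fetched = server.fetch_counts.most_common(1)[0]
sizes = server.observed_sizes()
```

Here an observer learns that `row-2` is retrieved most often and that `row-3` is larger than the others, without ever seeing a plaintext value or query.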

Constraints

Relational database as first solution
NoSQL database as second solution

Requirements

(when moving from a SaaS where client plaintext data is accessible by the backend, to a SaaS with client-side encrypted data)

No compromising on security level
Minimal functionality impact
Minimal performance impact

Now that we’ve provided some good background on some of the benefits of Client-side Encryption and explained Searchable Encryption, we can dig a bit deeper in the next round of posts. We’ll discuss topics including data integrity, deception methods, risks and mitigation, and more.