An introduction to encryption: how to protect data in healthcare

Published in Doctolib · Mar 31, 2022

Doctolib provides healthcare software for practices, i.e. modern web applications that interact with a cloud infrastructure. The cloud enables us to adapt quickly and grow fast in different countries, thus serving more practitioners and patients. Scaling also means storing more and more data, and with great power comes great responsibility! Hosting such a large volume of sensitive medical data entails security and confidentiality risks.

The security team’s mission is to protect this data with the highest level of security by implementing the latest innovations. One of its responsibilities is to encourage every Doctolib employee to ask themselves: “Where are we most vulnerable, and therefore most likely to be attacked?” and “What measures should we deploy to safeguard against these threats?” This article presents data protection as a rigorous, iterative process that determines an encryption strategy. It explains why end-to-end encryption is intended for medical data, even though it is costly to implement in cloud applications.

What is sensitive data?

Protecting our users’ personal data is our North Star every day. Personal data is useful for providing services to patients such as online appointment booking, video consultation, or document sharing with healthcare professionals.

Not all data is equally sensitive, so each category requires an adequate level of protection. We distinguish three data sensitivity levels:

Technical data (e.g. dates and times, name of the practitioner)
Personally Identifiable Information or PII (e.g. name, email, phone number)
Personal Health Information (PHI), which corresponds to what the European General Data Protection Regulation (GDPR) calls data concerning health: “personal data related to the physical or mental health of a natural person”. In other words, all information related to a patient seeing a doctor is PHI. The same applies to medical documents, since they contain both PHI and other personal information about the patient.

PII and PHI are sensitive personal data and must remain private. They deserve a high level of security and confidentiality; otherwise, a compromise of this data could pose a real threat to people’s lives.

What are the threats regarding sensitive data?

The main motivation for attackers is money: they look for opportunities to blackmail owners of sensitive data in exchange for money or influence. The diversity and high volume of data, combined with the company’s high public exposure, could attract experienced attackers. This requires an ongoing effort to patch vulnerabilities, as a single one can sometimes open the door to large datasets. If we make it very difficult for attackers to leak sensitive data at scale, they may give up. Effective security is a race to deter attackers by making their attacks unprofitable.

However, the company itself can also be a threat, as can the cloud service provider or any subcontractor. Because of medical confidentiality, a single unauthorized employee accessing a single piece of sensitive data is enough to constitute a security breach. To mitigate this threat, we ensure users are in control of their data: they can easily decide and understand who has access to their personal data, for what purpose and for how long, and they can change their decisions at any time.
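The user-control principle can be modeled as a registry of access grants, each recording who may access the data, for what purpose and until when, and each revocable by the user. The sketch below is a hypothetical data model for illustration only: `AccessGrant`, `ConsentRegistry` and the example identifiers are invented names, not Doctolib’s implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class AccessGrant:
    """One user decision: who may access the data, why, and until when."""
    grantee: str          # hypothetical account identifier
    purpose: str          # e.g. "teleconsultation"
    expires_at: datetime  # access ends automatically after this instant
    revoked: bool = False


@dataclass
class ConsentRegistry:
    grants: list[AccessGrant] = field(default_factory=list)

    def grant(self, grantee: str, purpose: str, duration: timedelta) -> AccessGrant:
        g = AccessGrant(grantee, purpose, datetime.now(timezone.utc) + duration)
        self.grants.append(g)
        return g

    def revoke(self, grantee: str, purpose: str) -> None:
        # The user can change their decision at any time.
        for g in self.grants:
            if g.grantee == grantee and g.purpose == purpose:
                g.revoked = True

    def is_allowed(self, grantee: str, purpose: str, at: datetime | None = None) -> bool:
        # Access requires a matching purpose, no revocation and no expiry.
        at = at or datetime.now(timezone.utc)
        return any(
            g.grantee == grantee and g.purpose == purpose
            and not g.revoked and at < g.expires_at
            for g in self.grants
        )

    def who_has_access(self) -> list[tuple[str, str, datetime]]:
        # Lets the user see who can access their data, for what, and how long.
        now = datetime.now(timezone.utc)
        return [(g.grantee, g.purpose, g.expires_at)
                for g in self.grants if not g.revoked and now < g.expires_at]


registry = ConsentRegistry()
registry.grant("dr_martin", "teleconsultation", timedelta(days=30))
print(registry.is_allowed("dr_martin", "teleconsultation"))  # True
print(registry.is_allowed("dr_martin", "research"))          # False: other purpose
registry.revoke("dr_martin", "teleconsultation")
print(registry.is_allowed("dr_martin", "teleconsultation"))  # False: revoked
```

Tying every grant to a purpose and an expiry means access lapses by default, rather than relying on someone remembering to remove it.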

Regulations require companies to implement state-of-the-art technologies to protect users’ data from the companies themselves, third parties, subcontractors, and external attackers. In practice, we make the company’s investment in security pay off by setting priorities:

We continuously align the company on the most impactful threats, which need to be mitigated first (our methodology is ISO 27001 certified).
Our security team determines the most impactful and innovative protection techniques (called controls) to implement.

One of the main categories of protection techniques is data encryption, which keeps evolving with research. Anticipating threats to sensitive data therefore requires a continuous, iterative process: checking the effectiveness of existing security and strengthening it when necessary and possible.

How does encryption protect sensitive data?

The purpose of encryption is to prevent unauthorized third parties sitting between two people from reading their confidential data and, with authenticated encryption, from modifying it undetected. Encryption is very effective for security and privacy because it turns sensitive data into ciphertext that looks like random text and no longer makes sense. Reversing the transformation (decryption) requires a specific secret key (basically a very long random number): whoever possesses it can decrypt the ciphertext and recover the sensitive data. This is the domain of cryptography.
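To make the role of the secret key concrete, here is a minimal educational sketch of symmetric authenticated encryption using only Python’s standard library: without the key the ciphertext is unreadable, and any modification is detected. The construction (HMAC-SHA256 as a counter-mode keystream, then encrypt-then-MAC) and the function names are our own illustrative choices, not what Doctolib uses; production code should rely on a vetted AEAD such as AES-GCM from a maintained library.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

# Educational toy only: use a vetted AEAD (e.g. AES-GCM) in real systems.
NONCE_LEN, TAG_LEN = 16, 32


def _subkeys(key: bytes) -> tuple[bytes, bytes]:
    # Derive independent encryption and MAC keys from the single secret key.
    enc_key = hmac.new(key, b"enc", hashlib.sha256).digest()
    mac_key = hmac.new(key, b"mac", hashlib.sha256).digest()
    return enc_key, mac_key


def _keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    # Concatenate HMAC-SHA256(enc_key, nonce || counter) blocks (counter mode).
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream += hmac.new(enc_key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(stream[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(NONCE_LEN)  # fresh random nonce per message
    ks = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, ks))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()  # encrypt-then-MAC
    return nonce + ciphertext + tag


def decrypt(key: bytes, blob: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    nonce, ciphertext, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("wrong key or modified ciphertext")
    ks = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, ks))


key = secrets.token_bytes(32)  # the secret key: 256 random bits
blob = encrypt(key, b"Diagnosis: seasonal allergy")
print(decrypt(key, blob))      # b'Diagnosis: seasonal allergy'
try:
    decrypt(secrets.token_bytes(32), blob)  # someone without the right key
except ValueError as err:
    print(err)                 # wrong key or modified ciphertext
```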

A common misconception is to consider data either fully encrypted or unprotected, when in fact encryption can never ensure 100% security: it is not an on/off switch! The level of security of an encryption scheme depends on how and when you encrypt and decrypt data, and on how you protect the secret keys:

Encrypting 100% of data before saving it on hard drives (at-rest encryption) protects against physical access threats, but not against application bugs exploited by attackers (for example, a hack that allows querying forbidden data).
Encrypting 100% of data transmitted over the network (in-transit encryption, commonly HTTPS) protects only against traffic interception, not against a compromise of the application servers.
Additional server-side encryption (SSE) techniques further reduce data exposure on servers to hide it from attackers, but data must still be decrypted to be processed, at least in server memory.

One of the main disadvantages of encryption is that turning data into random-looking text makes it harder to process. For example, server-side encryption prevents the database server from searching or filtering encrypted data, which is necessary to let users query their data stored in cloud databases. Encrypting a web application’s traffic in transit can also prevent you from implementing another useful protection, detecting and blocking attacks from the Internet (e.g. with a Web Application Firewall), since this requires analyzing unencrypted requests.
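The search problem fits in a few lines. A secure cipher is randomized, so encrypting the same value twice yields two different ciphertexts, and a database that only stores ciphertext cannot answer a `WHERE email = ...` filter. The sketch simulates this with a hypothetical `toy_encrypt` helper built from HMAC-SHA256, an educational stand-in for a real cipher such as AES-GCM (it has no integrity protection).

```python
import hashlib
import hmac
import secrets


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Educational stand-in for a randomized cipher: a fresh random nonce
    # makes every ciphertext different, even for identical plaintexts.
    nonce = secrets.token_bytes(16)
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


key = secrets.token_bytes(32)  # held by the application, not by the database

# Simulated table: the database only ever sees ciphertext.
rows = [
    {"id": 1, "email": toy_encrypt(key, b"alice@example.com")},
    {"id": 2, "email": toy_encrypt(key, b"bob@example.com")},
]

# Equivalent of: SELECT id FROM rows WHERE email = 'alice@example.com'
query = toy_encrypt(key, b"alice@example.com")
matches = [r["id"] for r in rows if r["email"] == query]
print(matches)  # [] : same plaintext, different ciphertext
```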

The more encryption you add to the data lifecycle, the more complex it becomes to provide services. Data processing can sometimes be preserved at the price of higher complexity, and the cost of implementation then influences prioritization. In practice, implementing an encryption technique is therefore a trade-off between security, functionality and cost. This requires assessing the security of encryption models and their residual risks, a process known as threat modeling.
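One lightweight way to start such a threat model is to list the threats, record which encryption layer removes which (following the descriptions of each layer earlier in this section and of E2EE in the next one), and treat whatever remains uncovered as residual risk. The threat names and the mapping are our own simplified, hypothetical encoding, not an official Doctolib model.

```python
from __future__ import annotations

from enum import Enum


class Threat(Enum):
    PHYSICAL_ACCESS = "physical access to disks"
    TRAFFIC_INTERCEPTION = "network traffic interception"
    APPLICATION_BUG = "application bug exploited to query forbidden data"
    SERVER_COMPROMISE = "compromised application server"
    DEVICE_COMPROMISE = "compromised user device"


# Threats each layer fully removes (simplified, hypothetical mapping).
# Server-side encryption is deliberately absent: it reduces exposure, but
# since servers still decrypt data in memory it fully removes none of these.
MITIGATES: dict[str, frozenset[Threat]] = {
    "at_rest": frozenset({Threat.PHYSICAL_ACCESS}),
    "in_transit": frozenset({Threat.TRAFFIC_INTERCEPTION}),
    # With E2EE, servers, disks and the network only ever handle ciphertext.
    "e2ee": frozenset({
        Threat.PHYSICAL_ACCESS,
        Threat.TRAFFIC_INTERCEPTION,
        Threat.APPLICATION_BUG,
        Threat.SERVER_COMPROMISE,
    }),
}


def residual_risks(layers: set[str]) -> set[Threat]:
    """Threats left uncovered by the given combination of layers."""
    covered: set[Threat] = set()
    for layer in layers:
        covered |= MITIGATES[layer]
    return set(Threat) - covered


for threat in sorted(residual_risks({"at_rest", "in_transit"}), key=lambda t: t.name):
    print(threat.value)
```

Even E2EE leaves a compromised user device as residual risk, which is why complementary controls remain necessary.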

PHI requires end-to-end encryption

End-to-end encryption (E2EE) is a client-side encryption scheme: data is encrypted and decrypted on the user’s device (smartphone or browser) before it travels over the Internet. Each user owns a unique cryptographic identity composed of a public key, shareable with others, and a private key that remains on their device. Users’ keys are not used to encrypt data directly: each piece of data is encrypted with a dedicated key. This allows users to share pieces of their data with specific groups of users. These data encryption keys are then protected (i.e. encrypted) with the users’ keys. Our whitepaper details how this envelope technique works, combining symmetric and asymmetric encryption. In a nutshell, distributing so many different keys among users prevents an attacker from compromising a large volume of data at once. However, a small leak is still possible if data is decrypted on a compromised device.
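The envelope technique can be sketched as follows: each document gets its own random data key, the document is encrypted with that key, and the data key is then encrypted once per recipient with that recipient’s public key. This is a hypothetical, stdlib-only illustration. The Diffie-Hellman group (the Mersenne prime 2**521 - 1 with generator 3), the HMAC-based stream cipher and all names are toy choices for readability, not secure parameters, and not the scheme described in the whitepaper or used by Tanker.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

# Toy Diffie-Hellman group: arithmetically correct but NOT secure.
# Real systems use e.g. X25519 through a vetted library.
P = 2**521 - 1
G = 3


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy symmetric cipher (no integrity tag): XOR with an HMAC-SHA256 keystream.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def _kek(shared: int) -> bytes:
    # Hash the DH shared secret into a 32-byte key-encryption key.
    return hashlib.sha256(shared.to_bytes(66, "big")).digest()


class User:
    def __init__(self, name: str) -> None:
        self.name = name
        self._private = secrets.randbelow(P - 3) + 2  # stays on the device
        self.public = pow(G, self._private, P)        # shareable with others

    def unwrap(self, ephemeral_pub: int, wrapped: bytes) -> bytes:
        kek = _kek(pow(ephemeral_pub, self._private, P))
        return bytes(w ^ k for w, k in zip(wrapped, kek))


def wrap_for(recipient_public: int, data_key: bytes) -> tuple[int, bytes]:
    # Ephemeral-static DH: only the recipient's private key recovers the KEK.
    eph = secrets.randbelow(P - 3) + 2
    kek = _kek(pow(recipient_public, eph, P))
    return pow(G, eph, P), bytes(d ^ k for d, k in zip(data_key, kek))


def share_document(plaintext: bytes, recipients: list[User]) -> dict:
    data_key = secrets.token_bytes(32)  # dedicated key for this document only
    nonce = secrets.token_bytes(16)
    return {
        "nonce": nonce,
        "ciphertext": _xor_stream(data_key, nonce, plaintext),
        # One envelope per recipient: the data key encrypted for their public key.
        "envelopes": {u.name: wrap_for(u.public, data_key) for u in recipients},
    }


def open_document(doc: dict, user: User) -> bytes:
    eph_pub, wrapped = doc["envelopes"][user.name]
    data_key = user.unwrap(eph_pub, wrapped)
    return _xor_stream(data_key, doc["nonce"], doc["ciphertext"])


patient, doctor, stranger = User("patient"), User("doctor"), User("stranger")
doc = share_document(b"Blood test results", [patient, doctor])
print(open_document(doc, doctor))      # b'Blood test results'
print("stranger" in doc["envelopes"])  # False: no envelope, no access
```

Because each document has its own data key, stealing one envelope exposes one document, not a whole database.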

E2EE is particularly strong because it is designed so that only users (practitioners and patients) can decrypt their sensitive data, and no one else, not even the cloud service provider. Again, the many secret keys deployed on users’ devices give them ownership of their data and control over who can decrypt and process it. E2EE thus brings the expectations of medical privacy to the Internet, allowing companies like Doctolib to provide cloud hosting solutions without breaking medical confidentiality. This is all the more logical as Doctolib is a data processor for practitioners: they own their patients’ medical data, which means we cannot choose the purposes for which this data is processed.

Even after integrating several encryption techniques, including strong ones like E2EE, you have only solved half the problem. It remains to determine which encryption to apply to which data (PII, PHI) and when (client-side, server-side), using a data classification methodology. In general, encryption at rest applies to internal data, and server-side encryption models are more suitable for PII. E2EE is meant to secure PHI, and its high implementation cost is worth paying to encrypt as much PHI as possible, while other sensitive data can be protected with server-side encryption, for example. Apple’s iCloud, for instance, encrypts some data end-to-end only where it is possible and relevant. Obviously, the best option would be to encrypt everything end-to-end, but the implementation cost would be so high that we have to use other techniques to optimize it (Figure 1).
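A data classification methodology can be encoded as a table from sensitivity level to a minimum set of encryption layers, applied field by field. This sketch follows the general rule stated above (at rest for technical data, server-side for PII, E2EE for PHI) and additionally assumes in-transit encryption for everything; the field names and layer labels are hypothetical, and a real classification needs case-by-case review.

```python
from __future__ import annotations

from enum import Enum


class Sensitivity(Enum):
    TECHNICAL = 1  # e.g. practitioner name, available time slots
    PII = 2        # e.g. name, email, phone number
    PHI = 3        # e.g. anything revealing that a patient saw a doctor


# Minimum protection per level (assumption: HTTPS everywhere).
STRATEGY: dict[Sensitivity, list[str]] = {
    Sensitivity.TECHNICAL: ["at_rest", "in_transit"],
    Sensitivity.PII: ["at_rest", "in_transit", "server_side"],
    Sensitivity.PHI: ["at_rest", "in_transit", "e2ee"],
}

# Hypothetical field-level classification.
EXAMPLE_FIELDS: dict[str, Sensitivity] = {
    "practitioner_name": Sensitivity.TECHNICAL,
    "available_slot": Sensitivity.TECHNICAL,
    "patient_email": Sensitivity.PII,
    "patient_phone": Sensitivity.PII,
    "consultation_reason": Sensitivity.PHI,
    "prescription_pdf": Sensitivity.PHI,
}


def plan(fields: dict[str, Sensitivity]) -> dict[str, list[str]]:
    """Map each field to the encryption layers its sensitivity requires."""
    return {name: STRATEGY[level] for name, level in fields.items()}


for field_name, layers in plan(EXAMPLE_FIELDS).items():
    print(f"{field_name:20} {', '.join(layers)}")
```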

Figure 1: Higher security implies higher implementation cost

Choosing the right techniques to secure each stage of the data lifecycle is what we call an encryption strategy. Not having 100% of data encrypted at all times does not make it insecure, but specific sensitive data certainly requires appropriate techniques.

The challenge of implementing E2EE

Deciding to implement E2EE is like walking a ridgeline between the risks to mitigate and the high cost of implementation. Because E2EE prevents any third party from decrypting medical data, the complexity of data processing explodes. This is all the more impactful as cloud services rely heavily on server-side processing: vital tasks such as selecting, filtering and searching encrypted data break and must be redesigned. This is the main challenge of E2EE: not the cryptography itself, but implementing the technology without killing core features or weakening the security model. That’s why instant messaging applications such as WhatsApp or Signal were the first to offer this level of security, thanks to the relative simplicity of their features.
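As one example of such a redesign, a commonly used pattern for exact-match search is a “blind index”: the client computes a keyed hash of the normalized search value with a key the server never sees, and the server matches hashes instead of plaintext. This is a hypothetical sketch of that general pattern, not necessarily how Doctolib or Tanker handle search; `blind_index`, `search` and the table layout are invented names, and the ciphertext column is left as a placeholder.

```python
import hashlib
import hmac
import secrets
import unicodedata


def blind_index(index_key: bytes, value: str) -> str:
    # Normalize so that "  ALICE Martin" and "Alice Martin" produce the same token.
    normalized = unicodedata.normalize("NFKC", value).strip().lower()
    return hmac.new(index_key, normalized.encode(), hashlib.sha256).hexdigest()


# Shared among authorized users' devices; never sent to the server.
index_key = secrets.token_bytes(32)

# Server-side rows: an opaque ciphertext (placeholder here) plus the blind index.
server_rows = [
    {"doc_id": "d1", "ciphertext": b"<encrypted>", "name_idx": blind_index(index_key, "Alice Martin")},
    {"doc_id": "d2", "ciphertext": b"<encrypted>", "name_idx": blind_index(index_key, "Bob Durand")},
]


def search(rows: list, token: str) -> list:
    # The server compares opaque tokens without learning the searched name.
    return [r["doc_id"] for r in rows if r["name_idx"] == token]


print(search(server_rows, blind_index(index_key, "  alice martin")))  # ['d1']
```

The trade-off is typical of E2EE redesigns: only exact matches work, and the server learns which rows share a value, so the pattern itself needs threat modeling.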

To ensure compatibility and a seamless experience, Doctolib partnered with and acquired Tanker, the first provider of E2EE technology as a cloud service. By using Tanker’s API from their browsers, users register their cryptographic identity and share encryption keys with others without fear of losing data. In this model, the service provider is still in charge of hosting and serving the encrypted data; Tanker only provides the key management service between users, so they don’t have to worry about the encryption running in the background on their sensitive data. However, securing secret keys in cloud applications comes with inherent trade-offs:

The risk of data loss. In a classic E2EE scheme, users who lose their device (which holds their private key) may lose all their encrypted data. In the healthcare sector, however, data loss is an unacceptable risk. To let users change devices without losing their private key, we ask them for a password that is used to encrypt the private key before saving it in the cloud (the recovery key). There is still the risk of forgetting the password (and thus losing access to the recovery key), which experience shows happens often. We therefore implemented a second mechanism, based on double authentication, to recover the private key and ensure practitioners will always be able to decrypt their documents, even 10 years after encryption. We will detail in a future article how this mechanism prevents anyone but the user from recovering their secret key.
Data history. User groups are essential for new users to access the history of encrypted data; otherwise a new secretary or a replacement would not be able to work with data encrypted before their arrival. However, once a private group key has been shared with users, they can still decrypt the history of data encrypted before a key change, even if the key is rotated to remove them from the group. The more flexible access to sensitive data is, the easier it may also be for attackers. We therefore balance the residual risk with complementary measures such as server-side access control (an attacker could decrypt data but cannot download it) and rate limiting (an attacker cannot download much data in a short period of time).
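The password-protected recovery key from the first item can be sketched with Python’s standard library: the password is stretched with PBKDF2 into two keys that encrypt and authenticate the private key, and only the resulting blob is stored in the cloud. The blob format, function names and iteration count are assumptions for illustration; the sketch does not cover the second, double-authentication mechanism, and production code should use a vetted library and consider a memory-hard KDF such as scrypt or Argon2.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets

PBKDF2_ITERATIONS = 600_000  # assumption: a high cost to slow down brute force


def _derive(password: str, salt: bytes, iterations: int) -> tuple[bytes, bytes]:
    # Stretch the password into 64 bytes: 32 to encrypt, 32 to authenticate.
    material = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations, dklen=64)
    return material[:32], material[32:]


def _keystream(key: bytes, length: int) -> bytes:
    # No nonce needed: a fresh salt yields a fresh key for every blob.
    stream = bytearray()
    counter = 0
    while len(stream) < length:
        stream += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(stream[:length])


def make_recovery_blob(private_key: bytes, password: str,
                       iterations: int = PBKDF2_ITERATIONS) -> dict:
    salt = secrets.token_bytes(16)
    enc_key, mac_key = _derive(password, salt, iterations)
    ct = bytes(a ^ b for a, b in zip(private_key, _keystream(enc_key, len(private_key))))
    tag = hmac.new(mac_key, salt + ct, hashlib.sha256).digest()
    return {"salt": salt, "iterations": iterations, "ciphertext": ct, "tag": tag}


def recover_private_key(blob: dict, password: str) -> bytes:
    enc_key, mac_key = _derive(password, blob["salt"], blob["iterations"])
    expected = hmac.new(mac_key, blob["salt"] + blob["ciphertext"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, blob["tag"]):
        raise ValueError("wrong password or corrupted recovery blob")
    ct = blob["ciphertext"]
    return bytes(a ^ b for a, b in zip(ct, _keystream(enc_key, len(ct))))


private_key = secrets.token_bytes(32)  # generated and kept on the old device
blob = make_recovery_blob(private_key, "correct horse battery staple")  # saved in the cloud
restored = recover_private_key(blob, "correct horse battery staple")    # on a new device
print(restored == private_key)  # True
```

Since the stored blob can be attacked offline by whoever holds it, the password strength and KDF cost are what actually protect the private key.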

Cloud applications have great potential because of their centralized execution model. They still require some degree of trust from users, since the application code is controlled by the service provider. The challenge is to significantly improve security and privacy with E2EE while guaranteeing full availability of data, even if passwords are lost. At the same time, it should be impractical for Doctolib and Tanker to access users’ secret keys. This requires a set of rigorous technical and organizational countermeasures, known as a Chinese wall, such as segmented infrastructure, segregation of duties, live security monitoring and internal audits. In the end, though, our best initiative is to foster a strong privacy culture among every single employee. They are the real owners of Doctolib applications, and they are also users of the services they build, with high ethical expectations.

In short, end-to-end encryption drastically mitigates security and privacy issues for cloud-based applications. Its high implementation cost makes it a gradual process that requires long-term planning, even for mature companies like Facebook. Zoom rolled out phase 2 of a 4-phase plan in 2021, and the full plan will take years to complete. WhatsApp only started offering end-to-end encrypted backups in 2021, 5 years after encrypting messages! Doctolib has been working on the implementation since 2019, with a roadmap running at least through 2022. Our health-oriented products are end-to-end encrypted by design (phase I), and we are encrypting documents and other health data in our booking management software (phase II).

We hope this post shed some light on the dark corridors of encryption! End-to-end encryption is one example showing that practicing security in a fast-growing company is about finding the most relevant innovative techniques according to many criteria: security and privacy potential, the risks to mitigate, the cost of implementation, and sometimes functionality trade-offs. Stay tuned for future posts on other protection mechanisms if you are interested in Doctolib’s security experience.

We were recently invited on a podcast (in French) to describe how we organize security within the company. Our security and engineering teams are hiring in Paris and Berlin: join us!