WhatsApp’s End-to-End Encryption: How Does It Work?

Amit Panghal
10 min read · Oct 6, 2018

“Matrix movie still” by Markus Spiske on Unsplash

You must have heard that WhatsApp uses end-to-end encryption.

What is it?

In layman’s terms, every message that you send to your friend is encrypted on your device. The encrypted message passes through the network and a whole bunch of servers, reaches your friend’s device, and is finally decrypted there. So as long as the underlying cryptography is intact, you can be assured that no one other than your friend learns your dirty little secret.

Is it as simple as it sounds?

No. The purpose of this post is to give you a sneak peek at what actually happens behind the scenes.

Features of a secure messaging system

So what features should such a system have? Let me introduce a few characters: Ankita, Bud, and Mayank. (Cryptography literature uses Alice, Bob, and Mallory; the names are chosen based on their roles.) Ankita and Bud want to exchange messages. Mayank is evil: he wants to listen in on the conversation between Ankita and Bud, and possibly send messages to Ankita while posing as Bud, and to Bud while posing as Ankita. We want our system to have the following properties:

  1. Confidentiality: Mayank can’t learn what messages Ankita and Bud send to each other.
  2. Integrity: If Bud receives a message from Ankita, he can check whether the message was modified by Mayank on the way.
  3. Authenticity: When Bud receives a message from Ankita, he can be sure it is from Ankita and not from Mayank.

Another feature that we might want is deniability, i.e., if someone recovers Ankita’s or Bud’s old messages in the future, the messages can’t be cryptographically linked to them, and they can deny having sent them.

Property (1) can be achieved using encryption and decryption. Properties (2) and (3) can be achieved through MACs (message authentication codes) and digital signatures. These primitives require either the secure exchange of a secret (in symmetric key cryptography) or knowledge of each other’s public information (in public key cryptography). Public key cryptography is nice in the sense that you don’t need to share a secret with someone you want to send a message to, as long as you know each other’s public keys (identities). Ankita can communicate securely with Bud by encrypting her messages with Bud’s public key; only Bud, who holds the corresponding private key, can decrypt them. The problem with this approach is that Ankita needs to be assured that the key she has really is Bud’s and not Mayank’s. On the other hand, it is okay if Mayank learns Ankita’s and Bud’s public keys (they are public). In symmetric key cryptography, by contrast, Ankita and Bud need a mechanism to share the secret they will use for encryption and decryption (maybe meeting in person to exchange it).

WhatsApp’s E2E Encryption protocol

WhatsApp uses the open source Signal Protocol developed by Open Whisper Systems (which has its own messaging application, Signal). The Signal Protocol uses primitives such as the Double Ratchet algorithm, prekeys, Triple Diffie-Hellman, Curve25519, AES, and HMAC_SHA256.

I will summarize these primitives before putting them all together to understand the Signal Protocol.

Understanding Primitives

Prekeys

These are Curve25519 key pairs generated on the device at install time. There is one signed prekey pair and several one-time prekey pairs. The public key of the signed prekey pair is signed with the long-term identity secret key (the Curve25519 private key corresponding to the identity public key). During registration, the client sends the identity public key, the signed prekey’s public key with its signature, and the public keys of the one-time prekeys to the server, which stores them all together.
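To make the registration step concrete, here is a minimal sketch of what a client might upload and how a server might hand it out. The names `PrekeyBundle`, `KeyServer`, and `random_key` are made up for illustration, and the keys are just random bytes standing in for real Curve25519 public keys and a real signature. The sketch assumes the server gives out each one-time prekey only once and then removes it.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PrekeyBundle:
    """Public key material a client uploads at registration (illustrative layout)."""
    identity_key: bytes               # long-term identity public key
    signed_prekey: bytes              # signed prekey public key
    signed_prekey_signature: bytes    # signature made with the identity secret key
    one_time_prekeys: List[bytes] = field(default_factory=list)


class KeyServer:
    """Hypothetical in-memory stand-in for the server's key store."""

    def __init__(self) -> None:
        self._bundles: Dict[str, PrekeyBundle] = {}

    def register(self, user: str, bundle: PrekeyBundle) -> None:
        self._bundles[user] = bundle

    def fetch(self, user: str) -> Tuple[bytes, bytes, bytes, Optional[bytes]]:
        """Return (identity key, signed prekey, signature, one-time prekey or None)."""
        b = self._bundles[user]
        # Each one-time prekey is handed out once and then removed.
        otk = b.one_time_prekeys.pop(0) if b.one_time_prekeys else None
        return b.identity_key, b.signed_prekey, b.signed_prekey_signature, otk


def random_key() -> bytes:
    # Placeholder: 32 random bytes standing in for a real public key or signature.
    return secrets.token_bytes(32)


server = KeyServer()
server.register("bud", PrekeyBundle(random_key(), random_key(), random_key(),
                                    [random_key() for _ in range(3)]))
```

Once Bud’s three one-time prekeys have been fetched, `server.fetch("bud")` returns `None` in the last slot, which corresponds to the case where a session is set up without a one-time prekey.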

Curve25519

An elliptic curve used for the Diffie-Hellman key exchange. I will discuss elliptic curve cryptography in a separate post. Its security is based on the difficulty of the discrete logarithm problem in the group of points on the curve.

Diffie Hellman Key Agreement

Diffie-Hellman key agreement, the building block of the Triple Diffie-Hellman handshake, allows two parties to agree on a shared secret over a public channel (one where Mayank is able to listen to the messages exchanged by Ankita and Bud). It is based on modular arithmetic with a large prime p, and it basically works as follows:

0. Ankita and Bud agree on the protocol parameters: a large prime p and a generator g of the multiplicative group modulo p. The generator lies between 1 and p-1 and has this nice property: every element in [1, p-1] can be represented as g^k mod p for some k in [0, p-2].
1. Ankita selects a random number x between 1 and p-1, her private key.
2. Bud selects a random number y between 1 and p-1, his private key.
3. Ankita sends Bud x_p = g^x mod p, her public key.
4. Bud sends Ankita y_p = g^y mod p, his public key.
5. Ankita computes ss_a = y_p^x mod p.
6. Bud computes ss_b = x_p^y mod p.

After completion of the protocol, Ankita and Bud have a shared secret ss_a = ss_b = g^(xy) mod p, since (g^y)^x = (g^x)^y.

Mayank, on the network, sees x_p and y_p, but it is computationally infeasible for him to determine the shared secret without knowing x or y.
We assume here that Mayank only sees the messages on the channel between Bud and Ankita but doesn’t tamper with them.
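The steps above can be tried out with tiny numbers. The sketch below uses p = 23 and g = 5 purely for readability; these values and the helper name `keypair` are my choices for illustration, and a real deployment would use a far larger group (or an elliptic curve such as Curve25519).

```python
import secrets

# Toy parameters, for readability only.
p = 23   # small prime
g = 5    # a generator of the multiplicative group modulo 23


def keypair():
    private = secrets.randbelow(p - 2) + 1   # random integer in [1, p-2]
    public = pow(g, private, p)              # g^private mod p
    return private, public


x, x_p = keypair()   # Ankita's private and public key
y, y_p = keypair()   # Bud's private and public key

ss_a = pow(y_p, x, p)   # Ankita computes y_p^x mod p
ss_b = pow(x_p, y, p)   # Bud computes x_p^y mod p

# Both sides arrive at g^(xy) mod p without ever sending x or y.
assert ss_a == ss_b == pow(g, x * y, p)
```

With a prime this small, Mayank could simply try every exponent; the scheme only becomes hard to break when the group is large.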

Extended Triple Diffie Hellman Key Agreement (X3DH)

X3DH is an extension of the Diffie-Hellman protocol for the asynchronous setting. Imagine Ankita wants to establish a shared key with Bud so that she can send him an encrypted message. The agreement above works smoothly when Bud is online.

What if Bud is offline?

She would have to wait for Bud to come online.

Also, in the key agreement above, Ankita and Bud have no way to determine whether they are really talking to each other. They may both be talking to Mayank (a man in the middle) while thinking they are talking to each other. The protocol might end with Ankita and Mayank sharing one secret, and Mayank and Bud sharing another.
X3DH solves both these problems using a trusted third party and prekeys. Ankita and Bud register signed prekeys (signed using their long-term private keys) on a trusted server. Each time one of them wants to establish a shared secret with the other, the former fetches the prekey bundle of the latter. Let’s assume Ankita is the sender.

Now Ankita does her part: she performs the DH operation (raising a public key to the power of a private key, as in step 5 of the DH agreement) 3 or 4 times, depending on the prekeys she fetched from the server. These operations are between:

  1. Ankita’s long-term private key (IKa) and Bud’s signed prekey (SPKb).
  2. Ankita’s ephemeral private key (EKa), from a key pair generated specifically for this exchange and deleted afterwards, and Bud’s signed prekey (SPKb).
  3. Ankita’s ephemeral private key (EKa) and Bud’s long-term public key (IKb).
  4. (If the prekey bundle has a one-time public key) Ankita’s ephemeral private key (EKa) and Bud’s one-time public key (OPKb).

Figure: X3DH

The outputs of the above steps are combined and used as key material, i.e., the master secret. Keys derived from the master secret are used in the Double Ratchet, described below, to encrypt subsequent messages to Bud. Each of these messages includes, in its plaintext header, Ankita’s ephemeral public key, her long-term identity public key, and an indication of which of Bud’s one-time public keys was used (if step 4 applied). This ends X3DH on Ankita’s side. Bud can receive the messages right away if he is online; otherwise they are stored on the server, where he can fetch them later. When Bud receives the messages, he derives the same master secret using his private keys and Ankita’s public keys from the message header.
This scheme allows Ankita and Bud to authenticate each other and derive a shared secret to be used as key material.
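Here is a rough sketch of why both sides end up with the same master secret. To keep it runnable with only the standard library, it swaps in toy modular Diffie-Hellman for Curve25519 and a plain SHA-256 over the concatenated outputs for the real key derivation function. It also skips verifying the signature on Bud’s signed prekey. The names `P`, `G`, `keypair`, `dh`, and `kdf` are mine, and the DH outputs are combined in the order of the list above.

```python
import hashlib
import secrets

P = 2**127 - 1   # a Mersenne prime; toy-sized, NOT a secure choice
G = 3            # arbitrary base, chosen only for illustration


def keypair():
    priv = secrets.randbelow(P - 2) + 1
    return priv, pow(G, priv, P)


def dh(priv, pub):
    """One DH operation: raise the other party's public key to our private key."""
    return pow(pub, priv, P).to_bytes(16, "big")


def kdf(*parts):
    # Stand-in for the real key derivation function.
    return hashlib.sha256(b"".join(parts)).digest()


# Bud's registered keys (the private halves never leave his device).
ik_b, IK_b = keypair()     # identity key
spk_b, SPK_b = keypair()   # signed prekey
opk_b, OPK_b = keypair()   # one-time prekey

# Ankita's keys.
ik_a, IK_a = keypair()     # identity key
ek_a, EK_a = keypair()     # ephemeral key, used for this session only

# Ankita: the four DH operations, using her private keys and Bud's public keys.
master_a = kdf(dh(ik_a, SPK_b), dh(ek_a, SPK_b), dh(ek_a, IK_b), dh(ek_a, OPK_b))

# Bud: the same four values, using his private keys and Ankita's public keys
# (IK_a and EK_a arrive in the message header).
master_b = kdf(dh(spk_b, IK_a), dh(spk_b, EK_a), dh(ik_b, EK_a), dh(opk_b, EK_a))

assert master_a == master_b
```

Mayank sees every public key here, but each of the four DH outputs needs at least one private key that he doesn’t have.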

HMAC_SHA256

It is a keyed cryptographic hash function: apart from the value to be hashed, it also takes a key as input. Unlike a plain hash function, which anyone can compute for a given input, a keyed hash function can only be computed with knowledge of the key. Here it is used both as a key derivation function and as a MAC.
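Both uses can be shown with Python’s standard `hmac` module. The key, the message, and the labels `b"encryption"` and `b"authentication"` below are made-up values for illustration; the real protocol has its own inputs for key derivation.

```python
import hashlib
import hmac

key = b"secret shared by Ankita and Bud"
message = b"meet at 6pm"

# As a MAC: Ankita computes a tag and sends it along with the message.
tag = hmac.new(key, message, hashlib.sha256).digest()


def verify(key, message, tag):
    """Bud recomputes the tag with the shared key and compares."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)   # constant-time comparison


# As a key derivation step: the same secret with different labels
# gives different, unrelated-looking keys.
enc_key = hmac.new(key, b"encryption", hashlib.sha256).digest()
mac_key = hmac.new(key, b"authentication", hashlib.sha256).digest()
```

If Mayank changes the message in transit, `verify` returns `False`, because he cannot produce a matching tag without the key.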

Double Ratchet

A ratchet is a device that moves in only one direction. The Double Ratchet uses two cryptographic ratchets, i.e., it derives new keys from the current keys and moves forward while forgetting old keys. The two ratchets used are the Diffie-Hellman ratchet and the Hashing Ratchet. Each time the Diffie-Hellman ratchet moves forward, a secret is established between sender and receiver using the Diffie-Hellman agreement described above, and the secret is used to derive two new keys (a root key and a chain key). The Hashing Ratchet moves forward by applying a key derivation function to the chain key, which yields a message key to encrypt the message to be sent and a new chain key for the next ratchet movement. This ratcheting gives the protocol a useful property known as forward secrecy, i.e., if Ankita’s or Bud’s keys are compromised in the future, their previous messages can’t be compromised (decrypted), as long as the ratchet works as expected (old keys are deleted).
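The two ratchets can be sketched with HMAC_SHA256. The hashing ratchet step below uses the constants 0x01 and 0x02 to derive the message key and the next chain key. The DH ratchet step is a simplified stand-in for the real HKDF-based derivation, and the function names and placeholder inputs are mine.

```python
import hashlib
import hmac


def hash_ratchet_step(chain_key):
    """Advance the hashing ratchet: returns (message_key, next_chain_key)."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


def dh_ratchet_step(root_key, dh_output):
    """Mix a fresh DH output into the root key: returns (new_root_key, new_chain_key).
    Simplified stand-in for the real derivation."""
    new_root = hmac.new(root_key, dh_output + b"root", hashlib.sha256).digest()
    new_chain = hmac.new(root_key, dh_output + b"chain", hashlib.sha256).digest()
    return new_root, new_chain


root_key = bytes(32)   # placeholder for the master secret
root_key, chain_key = dh_ratchet_step(root_key, b"fresh DH output (placeholder)")

message_keys = []
for _ in range(3):
    # The old chain key is overwritten, so earlier message keys can't be re-derived.
    mk, chain_key = hash_ratchet_step(chain_key)
    message_keys.append(mk)
```

Because HMAC can’t be run backwards, someone who steals the current `chain_key` can derive future message keys on that chain but not the ones already used and deleted. The next DH ratchet step then replaces the chain altogether.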

Putting Everything Together

I have attempted to summarize the primitives used in the protocol, which can be difficult to digest all at once. Assuming the primitives work as expected, the following steps describe how the protocol works in much simpler language. These steps occur when Ankita wants to send a message to Bud using WhatsApp:

  1. Registration of the clients (the mobile apps on Ankita’s and Bud’s phones) with the WhatsApp server. This includes registering signed prekeys.

On Ankita’s Side

  1. Session setup by Ankita using X3DH.
  2. During session setup, Ankita calculates the master secret and, using a DH Ratchet step, derives a root key and a chain key for the Hashing Ratchet.
  3. Ankita derives a message key and the next chain key from the current chain key using the Hashing Ratchet.
  4. She encrypts her message using the message key (AES256 in CBC mode).
  5. Every time she sends a message, her Hashing Ratchet moves forward.
  6. Every time she receives a response from Bud, which includes a new public key in its header, she advances her DH Ratchet and calculates a new root key and a new chain key.

On Bud’s Side

  1. When he retrieves the first message from Ankita, he completes the session setup by deriving the master secret, root key, chain key, and message key.
  2. He uses the message key to decrypt the message.
  3. If he wants to send a message, he generates a new ephemeral key pair, moves his DH Ratchet forward using the root key and the ephemeral private key, and replaces his chain key and message key.
  4. A new message key is derived from the chain key, and the message is encrypted.
  5. The encrypted message is sent with Bud’s ephemeral public key in the header.

In the image below (assuming it’s Ankita’s phone), the white boxes are messages received and the green boxes are messages sent. Each pair of consecutive boxes can help you visualize the ratchet movements.

  1. White -> White: Bud’s Hashing Ratchet moves forward by 1 step (a new message key and the next chain key are derived).
  2. Green -> Green: Ankita’s Hashing Ratchet moves forward by 1 step (she derives a new message key and chain key).
  3. White -> Green: Ankita’s DH Ratchet moves forward by 1 step (she derives a new root key and chain key).
  4. Green -> White: Bud’s DH Ratchet moves forward by 1 step (he derives a new root key and chain key).
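The four rules above can be turned into a small lookup that walks through a chat and reports which ratchet moves between each pair of consecutive messages. The function name `ratchet_moves` and the sample chat are made up for illustration.

```python
def ratchet_moves(messages):
    """messages: list of 'white' (received from Bud) or 'green' (sent by Ankita),
    in chat order. Returns one description per pair of consecutive messages."""
    moves = {
        ("white", "white"): "Bud's Hashing Ratchet",
        ("green", "green"): "Ankita's Hashing Ratchet",
        ("white", "green"): "Ankita's DH Ratchet",
        ("green", "white"): "Bud's DH Ratchet",
    }
    return [moves[pair] for pair in zip(messages, messages[1:])]


chat = ["green", "green", "white", "white", "green"]
for step in ratchet_moves(chat):
    print(step)
```

For this sample chat, it prints Ankita’s Hashing Ratchet, Bud’s DH Ratchet, Bud’s Hashing Ratchet, and Ankita’s DH Ratchet, in that order.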

Media Attachments and Files

Media files are encrypted and uploaded to the blob store. When Ankita sends an image or video to Bud, a pointer to the location of the encrypted file in the blob store, together with the key needed to decrypt it, is encrypted and sent using the pairwise scheme above.

Group Messages

  1. Whenever a group member sends their first message, they generate a sender key, which is distributed to all group members using the one-to-one protocol described above.
  2. Subsequent messages are encrypted using a new message key derived by a Hashing Ratchet (a different one from the ratchet used in one-to-one chats).
  3. Since each member has every other member’s sender key, they can advance the ratchet corresponding to the sender to decrypt the message.
  4. Whenever a member leaves the group, sender keys are renegotiated (step 1).
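The sender key idea can be sketched as one hashing ratchet per sender, with every other member keeping a copy that they advance in step. This is a simplification: the `Member` class and the third member “Chen” are made up, the sender key here is only a chain key (the signature part is left out), and the key is handed over directly instead of over the pairwise encrypted sessions.

```python
import hashlib
import hmac
import secrets


def ratchet(chain_key):
    """One hashing ratchet step: returns (message_key, next_chain_key)."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


class Member:
    def __init__(self, name):
        self.name = name
        self.own_chain = None    # this member's own sender chain key
        self.peer_chains = {}    # sender name -> copy of that sender's chain key

    def send(self, group):
        if self.own_chain is None:
            # First message: create a sender key and give it to everyone else.
            # (In the real protocol this travels over each pairwise session.)
            self.own_chain = secrets.token_bytes(32)
            for member in group:
                if member is not self:
                    member.peer_chains[self.name] = self.own_chain
        message_key, self.own_chain = ratchet(self.own_chain)
        return message_key

    def receive(self, sender_name):
        # Advance our copy of the sender's ratchet to get the same message key.
        message_key, self.peer_chains[sender_name] = ratchet(self.peer_chains[sender_name])
        return message_key


ankita, bud, chen = Member("Ankita"), Member("Bud"), Member("Chen")
group = [ankita, bud, chen]

k1 = ankita.send(group)
assert bud.receive("Ankita") == k1 and chen.receive("Ankita") == k1
```

The point of this design is that Ankita encrypts each group message once, instead of once per member.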

Calling

Calling is synchronous and real-time. Whenever Ankita calls Bud, she generates a random SRTP secret.
This secret is sent to Bud using the pairwise system. If he answers the call, the encrypted session begins.

Miscellaneous

Apart from the end-to-end encryption, the channel between the WhatsApp client and the WhatsApp server is itself encrypted, so the plaintext headers are not visible to anyone listening on the channel. This is an extra layer of wrapping over Ankita’s already wrapped gift (message) to Bud.

Wrapping Up

You might have wondered at some point, based on an ad you saw or perhaps a WhatsApp forward, whether WhatsApp is reading your messages, since you sometimes see ads on your Facebook feed for things you happened to discuss in WhatsApp chats. Let’s go through some possibilities and conspiracy theories.

  1. If you trust that WhatsApp’s end-to-end encryption is actually implemented as per the Signal specification, then there is no way they can read your chats. Unlike the Signal application, WhatsApp’s code is not open source, so you can’t really confirm this.
  2. Assuming the Signal Protocol is implemented correctly, WhatsApp’s servers still know which WhatsApp user is interacting with which other user, how frequently, and how recently. The same is true for the Signal app’s servers. (It’s a difficult problem to solve.) If the mobile number you registered on Facebook is the same one you use for WhatsApp, they could (hypothetically) use this information to associate your friends’ activity on Facebook with you and serve you a relevant ad, which ends up surprising you.
  3. They might have enough information from other channels, and such an ad is merely a coincidence produced by their really good ad serving algorithms.

References

  1. Signal Developer Docs.
  2. Signal Blog posts.
  3. WhatsApp whitepaper.
  4. Diffie Hellman simple explanation.
  5. Off-the-Record Messaging (OTR).

Amit Panghal

Works @Kaleido | Worked @TQ| NYU Courant and IITB Alumnus | Writes about cryptocurrencies, blockchain and security