A look at how private messengers handle key changes

Last week The Guardian published a story about a “backdoor” in WhatsApp. I don’t think their findings are a backdoor, but the story did raise some good questions about how private messaging apps should respond to key changes.

I was curious how other private messengers handle the same question, so I checked them out and found some results I did not expect.

A friend and I used our two Android phones, representing Alice and Bob. Here is our process:

  1. Initial contact. Bob sends Alice a message.
  2. Message after key change. Alice uninstalls and re-installs the app, which results in new keys for Alice. Bob sends Alice a message.
  3. Key change while message in flight. Alice uninstalls the app, Bob sends Alice a message, then Alice re-installs the app. This is the one that got WhatsApp in trouble: what to do when a message is already in-flight and the keys change.
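
All three scenarios hinge on what the sender's app does when the key it has on record for a contact no longer matches the key the contact now presents. Here is a minimal sketch of that check in Python, assuming that a re-install simply generates a fresh random identity key. The `Install` class and `classify` function are hypothetical names for illustration, not taken from any of the apps tested.

```python
import secrets


class Install:
    """One installation of a messenger; each install gets a fresh identity key."""

    def __init__(self) -> None:
        # Stand-in for a real identity key pair: 32 random bytes.
        self.identity_key = secrets.token_bytes(32)


def classify(known_keys: dict, contact: str, presented_key: bytes) -> str:
    """Compare the key a contact presents with the one already on record."""
    recorded = known_keys.get(contact)
    if recorded is None:
        known_keys[contact] = presented_key  # first contact: remember the key
        return "first_use"
    if recorded == presented_key:
        return "unchanged"
    return "changed"  # what the app does next is the question being tested


bob_known_keys = {}
alice = Install()
first = classify(bob_known_keys, "alice", alice.identity_key)   # scenario 1
alice = Install()  # Alice uninstalls and re-installs
second = classify(bob_known_keys, "alice", alice.identity_key)  # scenarios 2 and 3
print(first, second)
```

The messengers differ only in how they react to the "changed" case: notify, block, or do nothing.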

WhatsApp

Let’s start with WhatsApp, since that is what set the tone for this entire discussion.

WhatsApp results:

  1. Initial contact: This is the first time Bob has contacted Alice, and the message is delivered without any warnings or user input. This appears to be standard across all the messengers I tested, except for Wire, which offers a weaker guarantee (see below).
  2. Message after key change: Alice uninstalled and re-installed. As soon as Alice re-installed, Bob’s chat with Alice showed a key change notification (the first instance of “Alice’s security code changed” in the screenshots above). This was really nice, giving Bob proactive notice of a change. It is unique to WhatsApp in my testing; none of the other messengers I tried do this. So far so good.
  3. Key change while message in flight: Alice uninstalled, then Bob sent his message. On the first image above, you can see that the message has only a single check, indicating that it hasn’t been delivered. Then Alice re-installs, and in the second image above you can see that the message is automatically delivered using the new key material, without asking for manual approval from Bob, although Bob is notified of the change (the second instance of “Alice’s security code changed.”)
  4. Verifying: The UI/UX is nice. WhatsApp e2e uses Signal inside, so they inherit the Signal numeric format, which seems well thought out and user centered, along with the ability to use a QR code for in-person comparisons.

WhatsApp summary:

Good: The UI/UX is nice and feels like it is designed to be used.

Good: The preemptive notice of a key change is lovely. This aspect is better than any other messenger I tried.

Good: WhatsApp messages get delivered under even extreme circumstances.

Bad: Automatically re-encrypting messages in flight to new keys, even though the user is notified. If they made this require user approval, WhatsApp would be the best messenger I tested. Figuring out how to make this work well in groups while preserving message ordering could be difficult.
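
The difference between the observed behavior and the approval step suggested above comes down to one decision, made when an undelivered message meets a new key. Here is a rough Python sketch of that decision, assuming a simple list of pending messages. `flush_pending` and its `require_approval` flag are made up for illustration and say nothing about how WhatsApp is actually implemented.

```python
def flush_pending(pending, key_changed, require_approval):
    """Decide what happens to undelivered messages when the recipient comes back.

    Returns (sent, held, notices).
    """
    sent, held, notices = [], [], []
    if key_changed:
        notices.append("security code changed")
    for message in pending:
        if key_changed and require_approval:
            held.append(message)  # wait for the sender to approve the new key
        else:
            sent.append(message)  # re-encrypt to the current key and deliver
    return sent, held, notices


# Behavior observed in scenario 3: delivered automatically, with a notice.
observed = flush_pending(["hi Alice"], key_changed=True, require_approval=False)
# Behavior with an approval step: the message is held until the sender approves.
suggested = flush_pending(["hi Alice"], key_changed=True, require_approval=True)
print(observed)
print(suggested)
```

This toy version ignores groups and message ordering, which is where the real difficulty lies.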


Signal

Figures 1, 2, 3
Figures 4 and 5

Signal results:

  1. Initial contact: Everything looks normal.
  2. Message after key change: Unlike WhatsApp, Signal does not preemptively notify Bob that Alice’s keys have changed. Everything looks normal until Bob goes to send Alice a message (Figure 1 above). Once Bob sends a message, however, it’s not just a notification, but a blocking process that requires manual approval (Figure 2 above). Once Bob approves the change, a notice is added to the conversation (Figure 3 above).
  3. Key change while message in flight: Alice uninstalls, Bob sends a message, and then Alice re-installs. Alice never receives the message, and it never shows a delivery receipt on Bob’s end of the conversation (Figure 5 above). However, the next time Bob goes to send Alice a message, he is notified that Alice’s keys have changed. 
    Signal is not vulnerable to WhatsApp’s auto-retransmit behavior, but the in-flight message is also never delivered. Maybe that’s alright, since this doesn’t seem like a very common scenario in normal conditions, and Signal “fails closed,” but a better UX for this corner case would be a nice touch.
  4. Verifying UI/UX: The experience here is the best of all the messengers I tried. Signal uses the nice Signal protocol numeric format, has a QR code for quick comparisons, and also includes some really fast sharing features. The app allows me to share the “safety number” through any out of band communication channel on my phone and verify safety numbers I receive easily by comparing with the clipboard. This was designed to be used, was well thought out, and has a nice feel to it.
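
Comparing a safety number against the clipboard takes very little logic. Here is a sketch in Python, assuming the number is a string of digits that may pick up spaces or line breaks when copied. The function names are hypothetical and this is not Signal's actual code.

```python
import hmac


def normalize(safety_number: str) -> str:
    """Keep only the digits, so spacing and line breaks do not matter."""
    return "".join(ch for ch in safety_number if ch.isdigit())


def matches_clipboard(displayed: str, pasted: str) -> bool:
    """True if the pasted number is the same as the one shown in the app."""
    a, b = normalize(displayed), normalize(pasted)
    # Refuse empty input; compare_digest avoids an early-exit comparison.
    return bool(a) and hmac.compare_digest(a.encode(), b.encode())


# Made-up digits for illustration only.
print(matches_clipboard("12345 67890 11223", "12345 67890\n11223"))
print(matches_clipboard("12345 67890 11223", "12345 67890 11224"))
```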

Signal summary:

Good: As to be expected, Signal is the most conservative and the most secure.

Good: The key verification UI/UX is the smoothest and easiest of all the apps I tried.

Would be nice: Proactive key change notifications like in WhatsApp.

Not great: Messages that are in flight during a key change get lost, and it’s up to the user to infer that based on the absence of a delivery receipt. It would be better if Signal alerted the user and asked if they would like to resend.


Telegram

Telegram results:

  1. Initial contact: What’s strange about Telegram “secret chats” is that they don’t appear to show delivery receipts. The first two messages in the first screenshot above were successfully received and read by Alice, but there’s no way for Bob to know that.
  2. Message after key change: This is where things really break down for Telegram. Alice uninstalled, reinstalled, and then Bob sent her the third message in the first screenshot above. Alice never received it. In fact, Alice never receives another message from Bob, no matter how many he sends. There is no visual indication given to Bob, since messages that do arrive on Alice’s device don’t show delivery receipts either. Bob is also never notified of a key change event. A usability failure case this severe makes me wonder if anyone has ever successfully used “secret chats” in Telegram, but maybe that’s the point.
  3. Key change while message in flight: This is broken too. Bob is not notified of a key change, and Bob’s messages never arrive. I’m not sure whether the messages that Bob sends are being encrypted with the old keys, the new keys, or someone else’s keys, but nobody would willingly use something this unreliable anyway. If Bob somehow discovers that he should create a totally new “secret chat,” he is not notified of a key change then either.
  4. Verification UI/UX: This doesn’t feel like something that was designed to be used. The strange image isn’t a QR code for machine comparison, but something that you’re supposed to compare by eye. It doesn’t seem hard to make an image which looks roughly the same and corresponds to a different key, but that’s a research project for another day. Visual comparison feels both slower and more difficult than the QR code scans in WhatsApp and Signal. The textual fingerprint is just a long, poorly formatted jumble of hex characters, which is consistent with the sloppiness of the entire screen.

Telegram summary:

Bad: Users are never notified of key changes.

Bad: The entire process is flaky and unreliable.

I would say “don’t use Telegram for secret chats,” but it looks like Telegram has already taken care of making it impossible.


Wire

Wire results:

Initial contact: Wire offers a weaker guarantee than the “trust on first use” semantics of other private messengers. Wire will always trust any key or any key change without any notification, blocking or otherwise, unless you explicitly verify all of a contact’s keys and they verify all of yours (very difficult to do, see below).

Message after key change: In my testing, I was not able to get consistent results. I have seen key changes without any warnings (as in the screenshots above), but have also seen a key change with a warning after the fact (this did not prevent Bob from sending the message to Alice), and others testing on Twitter (thanks Zaki Manian and @tqbf) have seen blocking warnings.
Wire has responded to say that the correct behavior is as follows: if both Alice and Bob have bidirectionally verified each other’s devices, Bob will see a blocking warning here before he can send a message to Alice after a key change. I think this behavior might not be rock solid, and the bidirectional verification requirement is very strange.
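
Going by Wire's description of the intended behavior, a warning depends on verification having happened in both directions, whereas trust on first use flags any change to a key that has been seen before. Here are the two rules side by side in Python. The function and parameter names are hypothetical, and the Wire rule is only one reading of their statement, not their code.

```python
def tofu_warns(key_changed: bool) -> bool:
    """Trust on first use: any change to a previously seen key is flagged."""
    return key_changed


def wire_warns(key_changed: bool, i_verified_them: bool, they_verified_me: bool) -> bool:
    """Wire's stated rule: flag a change only after bidirectional verification."""
    return key_changed and i_verified_them and they_verified_me


# A key change between two users who never verified each other.
print(tofu_warns(True), wire_warns(True, False, False))
```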

Key change while message in transit: Some messengers lose this message, others handle it insecurely. Wire loses the message, and I’m uncertain about the security. There is the same level of uncertainty as with the “message after key change” test above.

Verification UI/UX: Some messengers I tested felt sloppy here, but Wire feels actively user hostile. Unlike Signal’s protocol, where each device shares a common identity, Wire has made the awful decision to give each device its own identity. 
Wire also counts each re-install on a single physical device as a “new device,” which means that a user’s device list quickly starts to look like the screenshots above (this test started with one “device” and quickly ballooned to three “devices,” even though only one physical phone was ever involved).
This means that a user with three physical devices has to get all their devices together and make six comparisons between them every time they make a change on any of them, and users wishing to verify each other have to do an N² comparison between all of their respective devices.
The actual fingerprint format is 64 hexadecimal characters, formatted in a way that should come with an epilepsy warning. This means that two users with three devices each would have to compare an awful 1152 characters in order to verify each other.
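
Here is one way to arrive at the figures above, written out in Python. It assumes that every device checks every other device, that cross-user verification is done in both directions, and that each check means reading one full 64-character fingerprint. The function names are made up for illustration.

```python
def self_verification_comparisons(devices: int) -> int:
    """Each of a user's devices checks every other device of that user."""
    return devices * (devices - 1)


def cross_verification_characters(devices_a: int, devices_b: int,
                                  fingerprint_chars: int = 64) -> int:
    """Characters compared when two users verify each other's devices.

    Every device of A checks every device of B, and the same in reverse.
    """
    comparisons = 2 * devices_a * devices_b
    return comparisons * fingerprint_chars


print(self_verification_comparisons(3))     # 6
print(cross_verification_characters(3, 3))  # 1152
```

Because every re-install counts as a new device, both numbers keep growing even when nobody buys new hardware.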

Wire summary:

Very bad: Wire’s key change notification behavior is at best flaky.

Very bad: Wire’s design is hostile to secure communication at a structural level. Even if they fix the flaky key change issues, the way they’ve chosen to design their e2e encryption means that it will always be almost impossible for users to engage with Wire securely.

Very bad: Wire has a weaker guarantee than “trust on first use.” It is similar to messengers that require users to enable a setting for key change notifications, but instead of a setting, users have to complete an impossible task (like comparing 1152 characters), and also ensure that all their conversation partners do the same. This does not feel like key verification that was designed to be used.

Still very bad: I poked around some more and noticed two other critical vulnerabilities, which I’ll write up soon.

In short, I think Wire is the new Telegram.


Allo

Figures 1, 2, 3
Figures 4, 5, 6

Allo results:

  1. Initial contact: Everything seems normal.
  2. Message after key change: Alice uninstalled and re-installed. Again, unlike WhatsApp, Allo does not give Bob a preemptive notice of Alice’s key change. However, once Bob sends Alice a message, a key change notice will be inserted into the conversation before Bob’s outgoing message, and Bob’s outgoing message will fail (Figure 1 above). This is a blocking process, like Signal, but Allo is much less explicit about what’s going on. Merely tapping the blocked message will re-send it with the new key, and it’s up to the user to infer that the failure was because of the key change notice that was inserted in the conversation above it.
  3. Key change while message in transit: Alice uninstalled, Bob sent Alice a message, and then Alice re-installed (Figure 2 above). Bob’s outgoing message remains in the “pending” state, but the conversation history is re-written again to indicate that a key change has occurred (the second instance of “Alice’s conversation code has changed” is inserted before Bob’s outgoing message at the bottom of Figure 2 above).
    The message seems to get stuck in this state. Bob tries sending an additional message, which also gets stuck in this state for a few minutes.
    Finally, Alice tries sending Bob a message (Figure 3 above). When the message arrives, Bob’s two pending outgoing messages are immediately marked as failed, and it’s again up to the user to infer that they’ve failed because of the key change notice that was inserted above them (now quite a ways up the screen).
    Tapping on them immediately re-sends them, and re-orders them in the conversation (Figure 4 above). I tapped on them in the wrong order, so they were sent in the wrong order.
  4. Verify UI/UX: Allo uses Signal protocol, so they also inherit the good numeric format for their “conversation codes.” The screen feels like an afterthought, though. It doesn’t include a QR code, has no sharing functionality, and generally does not feel as if it was designed to be used.
    It’s also worth mentioning that all of this key change behavior is only visible if users opt in to a setting to be told about key changes (off by default, Figure 6 above). These “incognito chats” are already opt-in, so a double opt-in that users would need to discover is kind of a lot.

Allo summary:

Good: Allo almost handles the tricky case of key change while a message is in flight. Like Signal, it doesn’t automatically send with the new key, but unlike Signal, it also makes an effort to tell you that the message failed to be delivered. Unfortunately it seems like the functionality is a little buggy (outgoing messages stuck pending until a message is received), but it “fails closed” at the moment. This is probably easier for Allo than WhatsApp, though, since they don’t support encrypted groups and so don’t have to worry about how a blocking approval process affects consistent message ordering in a conversation.

Needs work: The verification UI/UX is clearly an afterthought at the moment.

Needs work: Requiring users to opt in twice, first by selecting “incognito chat” and then by enabling a setting to see key change notifications.

Bad: The approval process is a little light. It’s up to the user to infer why a message has failed to send, and a single exploratory tap will send the blocking message on its way.


Executive summary:

WhatsApp: Some nice features and functionality, but needs to change the behavior for key changes while messages are in flight. Getting this right while supporting groups is probably tricky.

Signal: Safest choice, but borrowing a few features from WhatsApp could be nice.

Telegram: Not even really usable, and you shouldn’t use it even if you can somehow figure it out.

Wire: Thumbs down. A perfect example of what messengers shouldn’t do.

Allo: Surprisingly, Allo looks promising. Fix some bugs, improve the defaults, and it could be a good option.