Email is like a fine wine …

What Google should make instead of their OpenPGP extension

Recently I read ‘The Dream Of Usable Email Encryption Is Still A Work In Progress’, by VICE News. It starts like this:

A year and three quarters is a long time to implement PGP in JavaScript. Looking at the project’s commit list shows the issue:

There is very little happening. In early January several changes were made to the code. In February only one change went in. In March there have been none. In fairness, April has been a lot busier so far … if you allow a definition of “busy” that encompasses only one developer.

I suspected this might happen when the end-to-end project was first announced. This is clearly not an issue caused by the fine engineers working on the project; it is an issue of resource allocation by management. And the root cause is one of those painfully obvious elephants in the room: end-to-end encryption is incompatible with the things that make Gmail popular. It’s simply not possible for Gmail as we know it and this tool to be heavily used together, so given a choice, Google management correctly bets on Gmail.

Here are some of the advanced features that routine use of PGP on Gmail would break:

  1. Spam filtering
  2. Labelling rules
  3. Importance filtering
  4. Category classification (social, updates, forums, finance, etc.)
  5. Google Now
  6. Access from any web browser given only a password
  7. Attachment anti-virus scanning
  8. SafeBrowsing integration
  9. Response prediction
  10. Contact group prediction
  11. Search

And of course the adverts which pay for the whole thing. Although as the new Inbox app doesn’t have ads in it and mobile never did, perhaps they’ve given up trying to profit off consumer Gmail.

The loss of just the first item in that list would be fatal; needing to chuck or re-engineer all of them makes the project unviable. And that’s ignoring the desperately difficult UX problems in PGP.

I’m sure the developers of the end-to-end product know this, and proceed anyway because they see no other way to get that user data encrypted.

But is there a way to get to mass adoption sooner?

What’s the min viable product?

Look at the list above and label each item according to whether it operates on new mail or old mail. If you’re feeling pedantic, let’s say “new” mail is anything received within the last week.

Here’s my list of the items that require access to old mail:

  1. Access from any web browser given only a password
  2. Search

All other items in the list are about the categorisation and utilisation of recent messages. Once your mail has been filtered, sorted, labelled, scanned, learned from and forwarded, the value Gmail adds is primarily archiving with search.

This leads to the following thought. What if, instead of yet another slaughter on the battlefields of unusable crypto just like the previous 10 attempts, Google were to take a different tack and focus exclusively on encrypting aged email? Transmission of emails between accounts would stay as it is today: protected by TLS and signed with DKIM, but otherwise sent in the clear. Users would be provisioned with a desktop app that generated a key for them, downloaded aged mail, indexed it using a good client-side search engine (like Lucene), encrypted it and re-uploaded it to Google for backup purposes. The app would provide historical search, and its online updates would be threshold signed not just by Google but by a group of security firms spread around the world. A majority of them would have to reproduce the build and then sign it for an update to take place.
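To make the flow concrete, here is a minimal Python sketch of the migration loop, assuming a one-week cutoff for “aged” mail. Everything in it is hypothetical: `Message`, `migrate_aged_mail` and the injected `encrypt`, `upload` and `delete_plaintext` callables stand in for real cryptography and a real mail API, and the whitespace tokenizer is a toy substitute for Lucene. The one design point it tries to capture is ordering: the server-side plaintext is erased only after the encrypted copy has been confirmed as uploaded.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable

# Working definition from above: anything older than a week counts as "aged".
AGE_CUTOFF = timedelta(days=7)


@dataclass
class Message:  # hypothetical record for one mail item
    msg_id: str
    received: datetime
    body: bytes


def migrate_aged_mail(
    messages: Iterable[Message],
    index: dict[str, set[str]],
    encrypt: Callable[[bytes], bytes],
    upload: Callable[[str, bytes], bool],
    delete_plaintext: Callable[[str], None],
    now: datetime,
) -> list[str]:
    """Index, encrypt and back up aged mail; return the ids that were migrated."""
    migrated = []
    for msg in messages:
        if now - msg.received < AGE_CUTOFF:
            continue  # recent mail stays in the clear for Gmail's filters
        # 1. Index locally (toy whitespace tokenizer standing in for Lucene).
        for term in msg.body.decode("utf-8", "replace").lower().split():
            index.setdefault(term, set()).add(msg.msg_id)
        # 2. Encrypt on the client, then re-upload the ciphertext as a backup.
        ciphertext = encrypt(msg.body)
        # 3. Erase the server's plaintext only once the backup is confirmed.
        if upload(msg.msg_id, ciphertext):
            delete_plaintext(msg.msg_id)
            migrated.append(msg.msg_id)
    return migrated
```

If an upload fails, the message simply stays on the server in plaintext and will be retried on the next pass.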

Would that be useful and what would it take to make that happen?

The threat model

Firstly, let’s consider whether it is useful. This depends on the threat model. Email encryption efforts nearly always assume a sophisticated active adversary who wishes to read and/or tamper with the messages you are sending right now. This is one possible threat model, but it’s not the only one. In reality, a huge amount of the valuable information in somebody’s email account comes from conversations they had in the past. Both criminal hackers and governments very often become interested in someone because of some event or action (e.g. becoming a celebrity, becoming politically popular), and then decide they’d like to dig through that person’s account history. Gmail now holds over 10 years of correspondence for many of its users, and there is a wealth of valuable and abusable information in there, often very old.

I spent several years working on the Gmail abuse team. In one example I’m familiar with, a certain type of scam artist phishes passwords and then searches through the victim’s mailbox looking for banking correspondence. If they find that the victim has been emailing their account manager, they read more historical mail to figure out the victim’s mannerisms, how they sign their mails and so on, before sending an often very convincing request to the bank to wire money abroad. As the email comes from the right account and sounds completely legit, banks do sometimes fall for this.

The recent tech industry push towards end-to-end encryption is primarily motivated by politics, not criminal activity. Snowden kicked off his leaks by describing what he called “turnkey totalitarianism”, and fears of a growing police state are what have motivated Silicon Valley to take on the FBI in Apple’s recent court case.

But once again, corrupt politicians will very often be just as interested in what someone was doing in the past as in what they are doing at the moment. When a possible challenger to the status quo appears, attempting to find weaknesses in their past is a time-honoured political technique. This is why the Stasi kept files on almost everyone, even people who weren’t considered a threat.

This is especially the case because someone who becomes politically important rarely knew that was their fate. “Times make the man”, as they say. Glenn Greenwald had no idea he might become of prime interest to the NSA for most of his career, so even if he had adopted PGP the moment Snowden asked him to (spoiler: it was too hard and he didn’t), that wouldn’t have helped with all the old communication sitting around. This is something frequently ignored by cryptographic developers who seem to implicitly assume that if all communication shifted to their awesome new system today, waiting a generation for the old mail to steadily become irrelevant is totally OK.

Is it pointless to encrypt old mail if a state-level adversary could just collect it all off the wire and build their own unencrypted copy? You can argue not:

  • With widespread SMTP-TLS, only organisations in a position to force provider cooperation could mount such an attack, as traditional wire taps wouldn’t work. Most countries have legal mechanisms for this only at a targeted (warranted) level.
  • It’d require enormous spending on storage and indexing, because the attackers want to do searches too! At the scale of modern webmail services this is not trivial, and would consume significant adversary resources.
  • Assuming it’s harder to phish somebody’s cryptographic key than their password (as people type their password much more often), it’d do severe damage to bank fraudsters and other scammers who benefit from impersonating their victims.

These three factors combined mean that for a lot of common adversaries that’d be the end of the road, unless they only cared about very recent communication. It’s a common mistake amongst encryption fans to assume that the only adversary that matters is the US government, but in reality … even if America does become a police state … there are countries in the world that already are that way. Rolling barrels down the hill at them has value too.

(Of course, it goes almost without saying that these two different approaches are complementary and can both be implemented at once.)

Implementation thoughts

Here’s how I’d implement this idea in a cheap and hopefully usable way.

I’d use the notion of ‘key words’, as popularised by Bitcoin wallets and now Keybase. In this scheme an elliptic curve private key (which is short) is represented using a series of words from a dictionary. This makes it much easier to write down and thus you can back up your key with a pen and paper. Bonus points for being able to use the exact same words and key as Keybase. This is (effectively) a symmetric key, i.e. a password. So it’s easily understood by the userbase.
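As an illustration of how the words encode a key, here is a sketch that follows the BIP39 layout used by many Bitcoin wallets: a 256-bit key plus an 8-bit SHA-256 checksum gives 264 bits, which split into 24 words of 11 bits each. Two loud assumptions: `WORDLIST` is a hypothetical placeholder rather than a real 2048-word dictionary, and the words encode the key bytes directly, whereas BIP39 proper stretches the words into a seed with PBKDF2. Nothing here should be read as Keybase’s actual format.

```python
from __future__ import annotations

import hashlib

# Hypothetical placeholder dictionary: a real scheme needs a fixed, published
# 2048-word list so the same words always map back to the same key.
WORDLIST = [f"word{i:04d}" for i in range(2048)]
WORD_INDEX = {w: i for i, w in enumerate(WORDLIST)}


def key_to_words(key: bytes) -> list[str]:
    """Encode a 32-byte key as 24 words (11 bits each; last 8 bits are checksum)."""
    if len(key) != 32:
        raise ValueError("expected a 256-bit key")
    checksum = hashlib.sha256(key).digest()[0]           # 256 / 32 = 8 bits
    bits = (int.from_bytes(key, "big") << 8) | checksum  # 264 bits in total
    # Peel off 11-bit groups, most significant first.
    return [WORDLIST[(bits >> (11 * i)) & 0x7FF] for i in reversed(range(24))]


def words_to_key(words: list[str]) -> bytes:
    """Decode 24 words back into the key, verifying the checksum."""
    if len(words) != 24:
        raise ValueError("expected 24 words")
    bits = 0
    for w in words:
        bits = (bits << 11) | WORD_INDEX[w]  # KeyError on an unknown word
    key = (bits >> 8).to_bytes(32, "big")
    if hashlib.sha256(key).digest()[0] != bits & 0xFF:
        raise ValueError("checksum mismatch: a word was probably mistyped")
    return key
```

The checksum is what makes pen-and-paper backup forgiving: most single-word transcription errors are caught on restore instead of silently producing the wrong key.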

WhatsApp recently rolled out end-to-end encryption, and they didn’t use ‘key words’ because apparently they don’t internationalise well. Rather than using words or numbers, perhaps you could use the new global alphabet of emoji instead.
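The same idea works with any alphabet. This sketch maps each key byte to one of 256 emoji and appends one checksum symbol, so a 256-bit key becomes 33 symbols instead of 24 words: longer, but with no dictionary to translate. The alphabet is a hypothetical placeholder taken from the contiguous code-point range U+1F400 to U+1F4FF; a real design would hand-pick symbols that are visually distinct and render consistently across platforms.

```python
import hashlib

# Hypothetical alphabet: 256 consecutive code points starting at U+1F400.
EMOJI = [chr(0x1F400 + i) for i in range(256)]
EMOJI_INDEX = {e: i for i, e in enumerate(EMOJI)}


def key_to_emoji(key: bytes) -> str:
    """One emoji per key byte, plus one for the first SHA-256 byte as a checksum."""
    checksum = hashlib.sha256(key).digest()[:1]
    return "".join(EMOJI[b] for b in key + checksum)


def emoji_to_key(text: str) -> bytes:
    """Invert key_to_emoji, rejecting unknown symbols and bad checksums."""
    data = bytes(EMOJI_INDEX[ch] for ch in text)  # KeyError on an unknown symbol
    key, checksum = data[:-1], data[-1:]
    if hashlib.sha256(key).digest()[:1] != checksum:
        raise ValueError("checksum mismatch: a symbol was probably mistyped")
    return key
```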

There are three ways to do the index:

  1. Build it locally in the background when the app is first installed, with the emails being erased from the mailbox when the backup of that chunk of data is complete. In this approach if the user reinstalls the app on a new machine, they must wait whilst the old emails are re-downloaded and the index is rebuilt.
  2. Alternatively, the index can also be backed up. But search indexes are often of similar size to the mailbox itself, so there’d still be a long delay, albeit shorter than rebuilding from scratch. This might be acceptable for security-conscious people, as it’d happen only rarely.
  3. The most ambitious approach would involve being able to search the encrypted index without downloading it. You could try using some of the new searchable encryption schemes that have been developed, or you could use a more pedestrian filesystem block encryption approach where the goal is to avoid as many “disk seeks” (i.e. remote server requests) as possible. But these are complex and potentially fragile: you’d only want to try this if you had cryptographic researchers on your team, and then only with a health warning attached. At best, it should be an opt-in feature.
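For the third option, the simplest member of the searchable-encryption family is a keyed-token index, sketched below as a swapped-in illustration rather than a recommendation. The client replaces each term with an HMAC under its secret key, so the server stores and matches opaque tokens without learning the words themselves. It still leaks a lot (which queries repeat, how many messages match, which message ids come back), which is exactly why the health warning applies. All class and function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac


def token(key: bytes, term: str) -> bytes:
    """Client side: turn a search term into an opaque keyed token."""
    return hmac.new(key, term.lower().encode("utf-8"), hashlib.sha256).digest()


class EncryptedIndexServer:
    """Provider side: stores token -> message ids, never sees the terms."""

    def __init__(self) -> None:
        self.postings: dict[bytes, set[str]] = {}

    def add(self, tok: bytes, msg_id: str) -> None:
        self.postings.setdefault(tok, set()).add(msg_id)

    def lookup(self, tok: bytes) -> set[str]:
        return set(self.postings.get(tok, ()))


def client_index(server: EncryptedIndexServer, key: bytes, msg_id: str, text: str) -> None:
    """Upload one token per distinct term in the message."""
    for term in set(text.lower().split()):
        server.add(token(key, term), msg_id)


def client_search(server: EncryptedIndexServer, key: bytes, term: str) -> set[str]:
    """Search without revealing the term: send only its token."""
    return server.lookup(token(key, term))
```

Because the token for a word never changes, this works from a thin client such as a phone, but it also means the server can see when two searches are for the same thing.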

To save money and prove out the concept, I’d probably write it as a desktop Java app so I can use Lucene for search and UpdateFX for threshold-signed updates. Modern JVMs can be tuned to minimise memory use and will release garbage-collected memory back to the OS when not using it; they also come with a CSS-styleable, 3D-accelerated UI toolkit (JavaFX), so this approach isn’t as suicidal as it may seem.

I’d find the best independent security auditing firms in a variety of countries where the people are educated and the governments are unlikely to cooperate unless there’s a convincing need: perhaps America, Germany, Russia, Iran and Argentina. I’d use in-app sandboxing to minimise the amount of code that would require a threshold signature, allowing me to iterate quickly on non-security-sensitive parts of the app whilst still providing some easily understood social assurances about the software’s security.
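The update rule itself is simple to state in code. Below is a minimal sketch of a strict-majority check over a list of auditing firms, each of which is assumed to have reproduced the build and signed its SHA-256 hash. The `verify` callable is a hypothetical hook: a real client would check public-key signatures against auditor keys pinned inside the app, and this is not UpdateFX’s API. The sketch treats Google as just one more entry in the auditor list.

```python
from __future__ import annotations

import hashlib
from typing import Callable, Mapping


def update_allowed(
    artifact: bytes,
    auditors: list[str],
    signatures: Mapping[str, bytes],
    verify: Callable[[str, bytes, bytes], bool],
) -> bool:
    """Accept an update only if a strict majority of auditors signed its hash."""
    digest = hashlib.sha256(artifact).digest()
    unique = set(auditors)  # one vote per auditor, however often it is listed
    valid = sum(
        1
        for a in unique
        if a in signatures and verify(a, digest, signatures[a])
    )
    return valid > len(unique) // 2
```

With five firms, any three valid signatures suffice, so no single government can block or force an update on its own.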

The app would present a simple search UI that looked like Inbox. It’d issue searches to both the server and the local index in parallel. Opening a message would download and decrypt it for local rendering.
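Running both searches concurrently is straightforward. This sketch assumes two hypothetical callables, one for the server’s search over recent plaintext mail and one for the local index over aged mail, each returning `(message_id, timestamp)` pairs; it de-duplicates by id and lists the newest hits first. A small thread pool is a reasonable default because the server call spends most of its time waiting on the network.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def parallel_search(
    query: str,
    search_server: Callable[[str], list[tuple[str, int]]],
    search_local: Callable[[str], list[tuple[str, int]]],
) -> list[tuple[str, int]]:
    """Run both searches at once, de-duplicate by id, newest first."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        server_future = pool.submit(search_server, query)
        local_future = pool.submit(search_local, query)
        hits = server_future.result() + local_future.result()
    newest: dict[str, int] = {}
    for msg_id, ts in hits:
        newest[msg_id] = max(ts, newest.get(msg_id, ts))
    return sorted(newest.items(), key=lambda hit: hit[1], reverse=True)
```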

The hardest problem would be supporting mobile. You pretty much have to query an encrypted index remotely to do that. But this is at least a pure computer science problem of the type Google excels at, rather than a hard UI and social problem. And it isn’t unique to this approach: they’d face the same issue if the end-to-end project took off as well.