We don’t want your data — Pushing boundaries in data collection and end-to-end encryption for apps
Guilty as charged. That’s how I felt when we set about building Tap. You see, like so many before me, I have previously built products that sucked up data, piled it into a database and then tracked user activity.
So, we thought differently with Tap and were lucky to be starting from scratch — a blank canvas, lush! Here are the two overarching principles we wanted to follow:
- Collect as little data as possible from the user in the first instance. This is sometimes known as privacy by design or, going a step further, privacy by default.
- Don’t hold anything personal that isn’t encrypted and ensure that only the intended parties can decrypt it.
These both sound pretty straightforward, but the reality is far from simple.
So, starting from the moment you open the app:
Apps generally need user profiles. User profiles need data.
Common practice would suggest we need to collect an email address and password. This immediately violates principle 1 and leans strongly towards violating principle 2 at the same time.
Our solution? We generate a user profile on the fly without your intervention. We assign a random email address to you (this will be used in some later functionality but for now, it’s not relevant). We also create a super secure password that you’ll never know and never need.
Great. You haven’t given us any data. You have an account. Your app can log in. We have no idea who you are!
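As a rough illustration of the idea, here is a minimal Python sketch of generating an account without asking the user for anything. It is not Tap's actual code: the function name `create_anonymous_account` and the domain `users.example.com` are made up for the example, and where the app keeps the credentials afterwards is left out.

```python
import secrets
import uuid

# Hypothetical placeholder domain; the real one is not stated in this post.
ACCOUNT_DOMAIN = "users.example.com"


def create_anonymous_account() -> dict:
    """Build login credentials without collecting any personal data."""
    # A random UUID gives an address that says nothing about the person.
    email = f"{uuid.uuid4().hex}@{ACCOUNT_DOMAIN}"
    # 32 random bytes, URL-safe encoded; the user never sees or types this.
    password = secrets.token_urlsafe(32)
    return {"email": email, "password": password}


account = create_anonymous_account()
print(account["email"])
```

Because both values come from a cryptographically secure random source, nothing in the account can be traced back to the person holding the device.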
So, the basic principle of Tap is this. You send a request to an organisation, they send back your data.
Before we begin…
Asymmetric cryptography, also known as public key cryptography, requires a public and private key pair. The encryption/decryption process goes something like this:
Alice wants to send Bob some data securely. Alice takes Bob’s public key (this is the bit Bob can divulge to anyone) and uses it to encrypt the data with a given encryption algorithm (also known as a cipher). Bob can then decrypt the data using his private key (this is the bit that he should never divulge to anyone) and the same cipher. And voilà: the data in all its clear-text glory! It looks like this:
Security now largely comes down to key management. Anyone who holds the private key can decrypt the data. Encryption makes getting at the data trickier, but if an attacker also has access to the key, the data may as well be plain text. This is important, because a lot of systems will say “we store the data encrypted” but offer no information about how they manage their keys.
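To make the Alice and Bob exchange concrete, here is a toy Python sketch using "textbook" RSA with tiny primes. It is purely illustrative and completely insecure (real systems use a vetted library, large keys and padding), and it is not the cipher Tap uses. The names `encrypt`, `decrypt`, `BOB_PUBLIC_KEY` and `BOB_PRIVATE_KEY` are invented for the example.

```python
# Toy textbook RSA with tiny primes: for illustration only, never for real data.
P, Q = 61, 53
N = P * Q                   # modulus, shared by both keys
PHI = (P - 1) * (Q - 1)
E = 17                      # public exponent
D = pow(E, -1, PHI)         # private exponent (modular inverse, Python 3.8+)

BOB_PUBLIC_KEY = (N, E)     # Bob can hand this to anyone
BOB_PRIVATE_KEY = (N, D)    # Bob must never share this


def encrypt(message: bytes, public_key: tuple) -> list:
    """Alice's side: encrypt each byte with Bob's public key."""
    n, e = public_key
    return [pow(byte, e, n) for byte in message]


def decrypt(ciphertext: list, private_key: tuple) -> bytes:
    """Bob's side: recover the bytes with his private key."""
    n, d = private_key
    return bytes(pow(c, d, n) for c in ciphertext)


ciphertext = encrypt(b"my data request", BOB_PUBLIC_KEY)
print(decrypt(ciphertext, BOB_PRIVATE_KEY))
```

Note that `decrypt` works for whoever holds `BOB_PRIVATE_KEY`, not just Bob, which is exactly why where the private key lives matters so much.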
Sending a request
When a request (e.g. a data subject access request) is sent from a user to an organisation, we want to encrypt it with the organisation’s public key (as in the example above) so that only the organisation can read it. The problem is that organisations don’t generally know about key management, and giving every team member a copy of the keys is a problem in itself as well as a usability nightmare.
How do we manage this? Firstly, we issue the keys to the organisation and store them ourselves. The organisation is largely unaware of this (although we do let them back the keys up). This means the user can send an encrypted request. However, as stated previously, we have the keys! So, at this stage, encryption is a nice extra line of defence compared with holding plain-text data should the database be leaked for any reason, but not a lot of use if the database is leaked along with the keys!
The way we solve this is to let the organisation set an optional shared team password (we might make this mandatory at some stage) that is used to encrypt their keys. So, although we still hold the keys, we can’t use them. Furthermore, this is all done in the organisation’s browser, so we never know what that shared password is.
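The first step of that password protection can be sketched as follows. This is an assumption-laden illustration rather than Tap's implementation: the post does not say which key derivation function or cipher is used, the real thing runs in the browser rather than in Python, and `derive_wrapping_key` is a made-up name. The sketch uses PBKDF2-HMAC-SHA256 from Python's standard library to turn the shared team password into a 32-byte key; that key would then be fed to an authenticated cipher such as AES-GCM to encrypt the organisation's private key, a step not shown here because it needs a third-party library.

```python
import hashlib
import secrets


def derive_wrapping_key(team_password: str, salt: bytes,
                        iterations: int = 600_000) -> bytes:
    """Stretch the shared team password into a 32-byte key (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac(
        "sha256", team_password.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is random but not secret; it can be stored next to the encrypted keys.
salt = secrets.token_bytes(16)

# Derived on the organisation's side; password and key never leave the browser.
wrapping_key = derive_wrapping_key("example shared team password", salt)

# A team member typing the same password later re-derives the same key.
assert wrapping_key == derive_wrapping_key("example shared team password", salt)
```

Under this scheme the server would only ever store the salt and the encrypted private key, so a leaked database would not be enough to decrypt anything without the team password.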
The request data is sent from our server to the organisation encrypted, and it’s decrypted in the organisation’s browser (unless they’re using Internet Explorer). You can actually see this happening:
We could eliminate this flash of encrypted data but I kind of like it. It is a glimpse under the hood and shows things are secure.
Getting back a response
Receiving a response from the organisation securely is a bit easier because we don’t have to worry about key management in the same way. The keys are generated on the user’s device and are used transparently, without the user needing to know about them, and we never hold the private key.
When the organisation uploads data or files, these are also encrypted client-side in the browser, so they never hit our servers as clear text.
We’re really proud of what we’ve done to minimise our personal data footprint.
The purpose of our product is to help people take back control of their data and it felt simply wrong for us to then be sitting on all that data.
It hasn’t been easy adding this level of encryption into a product, and it would be difficult to retrofit into something pre-existing. It has impacted almost every aspect of our build and, at a conservative estimate, has added a 50% overhead to both time and budget.
The lack of data we hold also means we can’t rely on a business model that monetises people’s data in the traditional sense. I’m OK with that.