Using cryptography (TweetNaCl-js) to protect user data

Cryptography is a powerful tool for protecting the ownership and control of data, a necessity for any self-sovereign identity solution. This page gives a short introduction to the design principles and considerations behind a scheme designed to decentralise the control of user data on a network.

Cryptography In Practice

Protecting data requires the careful planning of software, hardware and network infrastructure in combination with cryptographic algorithms which obscure data with secret keys. As far as implementation goes, there’s a fair warning: use existing, well-used protocols where possible and don’t invent your own cryptography.

Symmetric vs public-key cryptography

Symmetric-key algorithms use the same key for both the encryption and decryption of a message. This key, in practice, represents a shared secret or a password with which two parties can transmit messages securely. The requirement that both parties have access to the secret key is one of the main drawbacks of symmetric-key encryption, since the secret key must first be shared in person or over an existing secure channel.

Public-key (asymmetric) cryptography uses pairs of keys: public keys, which may be shared widely, and private keys, which are known only to their owner. This lets two parties communicate securely without prior knowledge of a shared secret. A message can be encrypted using a public key in such a way that only the matching private key can be used to decrypt that message. This way a message can be transmitted securely to an intended recipient simply by encrypting it with their public key.

Key management

The generation, distribution, storage, use and destruction of cryptographic keys are highly sensitive parts of any data protection scheme, and the topic deserves a much longer article. The main challenge for a self-sovereign identity solution is to distribute keys in such a way that only the true owners of personal data can access or control it. The most obvious solution is to have each owner generate and store their own key pair and encrypt their data on a local machine, without ever needing to share or communicate their private key. Distributing encryption and key generation amongst a network of users enables the control of data to be truly decentralised. This solution isn’t trivial, however: it requires an application or browser extension to manage the local storage of keys and the encrypt/decrypt functions.

TweetNaCl-js

TweetNaCl-js is a popular JavaScript library for encrypting data asymmetrically. It is a port of TweetNaCl, a compact implementation of the NaCl cryptography library. TweetNaCl was co-designed by the highly regarded cryptographer Daniel J. Bernstein.

Why TweetNaCl-js:

  • Meets the minimum requirements for most use cases: asymmetric cryptography, a high-level API, authenticated encryption, nonce (initialisation vector) support and a small footprint
  • Has been audited by security firm Cure53
  • Popular in the crypto community, e.g. MetaMask, Stellar, Peerio, Keybase, uPort
  • Widely used, with many downloads on npm

Diffie-Hellman protocol:

TweetNaCl uses the Diffie-Hellman key exchange (over Curve25519) to achieve asymmetric encryption:

Alice’s public key is combined with Bob’s private key to derive a shared key. This key can also be derived by combining Alice’s private key with Bob’s public key — hence it’s a shared key. Alice and Bob can now communicate securely using this shared key to encrypt and decrypt messages using a symmetric cryptographic algorithm.
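The symmetry described above can be illustrated with a toy Diffie-Hellman exchange over small integers. This is a sketch for intuition only: the prime, generator and private values are textbook-sized assumptions, and modPow is a hypothetical helper introduced here. TweetNaCl-js performs the equivalent exchange on Curve25519, for example via nacl.box.before(theirPublicKey, mySecretKey).

```typescript
// Toy Diffie-Hellman over small integers (illustration only, NOT secure).
// p and g are public parameters; the values are textbook-sized assumptions.
const p = 23 // public prime modulus
const g = 5 // public generator

// modPow is a hypothetical helper: computes (base ^ exponent) mod modulus
function modPow(base: number, exponent: number, modulus: number): number {
  let result = 1
  let b = base % modulus
  let e = exponent
  while (e > 0) {
    if (e % 2 === 1) result = (result * b) % modulus
    b = (b * b) % modulus
    e = Math.floor(e / 2)
  }
  return result
}

// Each party keeps a private key and publishes g^private mod p.
const alicePrivate = 6
const bobPrivate = 15
const alicePublic = modPow(g, alicePrivate, p)
const bobPublic = modPow(g, bobPrivate, p)

// Alice combines her private key with Bob's public key, and vice versa.
const sharedByAlice = modPow(bobPublic, alicePrivate, p)
const sharedByBob = modPow(alicePublic, bobPrivate, p)

console.log(sharedByAlice === sharedByBob) // true
```

Neither private value is ever transmitted, yet both sides arrive at the same number, which can then serve as the key for a symmetric algorithm.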

Ephemeral (temporary) keys:

Using the Diffie-Hellman protocol to derive a shared key means that both Alice and Bob can decrypt any of their previous communications at any point in the future. This effect is not always desired. For example, a self-sovereign identity platform might need to encrypt personal data in such a way that only the subject could ever decrypt it, regardless of who performs the encryption. In this case the sender (i.e. whoever is carrying out the encryption) generates a temporary key pair for that one message, sends its public key along with the encrypted data, and destroys the private key immediately after encryption. Only the intended recipient (i.e. the subject) can then derive the ‘shared’ key required to decrypt the message.

Initialisation Vector And Nonce:

Using the same key to encrypt data repeatedly can reveal relationships between segments of encrypted messages, particularly if the content of these messages is in any way predictable. To combat this, a separate value (known as an initialisation vector (IV) or, in some cases, a nonce) is incorporated into the encryption process so that the same input does not produce the same output twice. The initialisation vector is unique to each message and is required for decryption, but it is not a secret key and can be transmitted in plain view. TweetNaCl takes a nonce as a parameter of every encryption call, so the developer is free to choose how it is generated, provided a nonce is never reused with the same key. For example, a counter value which increments with each message can be used to detect messages that have been repeated or delayed spuriously (see replay attacks).
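As a rough sketch of the counter idea above, the snippet below packs a message counter into a 24-byte nonce (the nonce length used by nacl.box) and rejects any nonce whose counter is not strictly greater than the last one accepted. The names counterToNonce, nonceToCounter and ReplayGuard are hypothetical, and the big-endian layout in the last 8 bytes is an assumption made for illustration, not something TweetNaCl-js prescribes.

```typescript
const NONCE_LENGTH = 24 // matches nacl.box.nonceLength in TweetNaCl-js

// Write the counter big-endian into the last 8 bytes of a zeroed nonce.
function counterToNonce(counter: number): Uint8Array {
  if (!Number.isSafeInteger(counter) || counter < 0) {
    throw new Error("counter must be a non-negative safe integer")
  }
  const nonce = new Uint8Array(NONCE_LENGTH)
  let remaining = counter
  for (let i = NONCE_LENGTH - 1; i >= NONCE_LENGTH - 8; i--) {
    nonce[i] = remaining % 256
    remaining = Math.floor(remaining / 256)
  }
  return nonce
}

// Read the counter back out of the last 8 bytes of a nonce.
function nonceToCounter(nonce: Uint8Array): number {
  let counter = 0
  for (let i = NONCE_LENGTH - 8; i < NONCE_LENGTH; i++) {
    counter = counter * 256 + nonce[i]
  }
  return counter
}

// Tracks the highest counter seen so far for one sender.
class ReplayGuard {
  private lastSeen = -1

  // Returns true and records the counter if the nonce is fresh; false otherwise.
  accept(nonce: Uint8Array): boolean {
    const counter = nonceToCounter(nonce)
    if (counter <= this.lastSeen) return false
    this.lastSeen = counter
    return true
  }
}

const guard = new ReplayGuard()
console.log(guard.accept(counterToNonce(1))) // true
console.log(guard.accept(counterToNonce(2))) // true
console.log(guard.accept(counterToNonce(1))) // false: replayed message
```

In a real receiver the guard would only be consulted after the message has been authenticated and decrypted successfully, and each sender would need its own counter.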

Self-sovereign identity implementation — ZINC

Key generation & storage:

Each user generates a key pair to be used in encryption. Key pairs are stored as a keystore file in which the private key is protected with a password.
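The exact keystore format is not described here, so the following is only a sketch of how a private key might be protected with a password. It assumes Node.js and uses the built-in crypto module (scrypt for key derivation, AES-256-GCM for encryption) rather than TweetNaCl-js; the Keystore shape and the names lockPrivateKey and unlockPrivateKey are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "crypto"

// Hypothetical keystore shape; every field is hex-encoded.
interface Keystore {
  salt: string
  iv: string
  authTag: string
  ciphertext: string
}

// Encrypt a private key under a key derived from the password with scrypt.
function lockPrivateKey(privateKey: Uint8Array, password: string): Keystore {
  const salt = randomBytes(16)
  const key = scryptSync(password, salt, 32) // 32-byte AES-256 key
  const iv = randomBytes(12) // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv)
  const ciphertext = Buffer.concat([cipher.update(privateKey), cipher.final()])
  return {
    salt: salt.toString("hex"),
    iv: iv.toString("hex"),
    authTag: cipher.getAuthTag().toString("hex"),
    ciphertext: ciphertext.toString("hex")
  }
}

// Recover the private key; throws if the password is wrong or the data was altered.
function unlockPrivateKey(keystore: Keystore, password: string): Uint8Array {
  const key = scryptSync(password, Buffer.from(keystore.salt, "hex"), 32)
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(keystore.iv, "hex"))
  decipher.setAuthTag(Buffer.from(keystore.authTag, "hex"))
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(keystore.ciphertext, "hex")),
    decipher.final()
  ])
  return new Uint8Array(plaintext)
}
```

The resulting object can be serialised to JSON and written to disk. The salt, IV and authentication tag are not secret, but without the password the private key cannot be recovered.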

Encryption:

Personal data (work identity claims) is encrypted using a key which is derived from the subject’s public key and an ephemeral private key. Only the subject’s private key can be used to decrypt the data in future.

Key extraction & decryption:

The subject’s key pair is extracted from their keystore file. The subject’s data can then be decrypted using the subject’s private key.

Sharing & re-encryption:

The subject’s data is decrypted using the subject’s private key. It can then be re-encrypted using the intended recipient’s public key.

Example code:

This is an example script using TweetNaCl-js. This is not audited code and should be treated with caution.

import nacl = require("tweetnacl") // cryptographic functions
import util = require("tweetnacl-util") // encoding & decoding

// encrypted message interface (all binary fields are base64-encoded)
interface IEncryptedMsg {
  ciphertext: string
  ephemPubKey: string
  nonce: string
  version: string
}

// encrypt a UTF-8 message for the holder of receiverPublicKey (base64)
function encrypt(receiverPublicKey: string, msgParams: string): IEncryptedMsg {
  // a fresh key pair per message; its secret key is never stored or returned
  const ephemeralKeyPair = nacl.box.keyPair()
  const pubKeyUInt8Array = util.decodeBase64(receiverPublicKey)
  const msgParamsUInt8Array = util.decodeUTF8(msgParams)
  // random nonce, sent alongside the ciphertext
  const nonce = nacl.randomBytes(nacl.box.nonceLength)
  const encryptedMessage = nacl.box(
    msgParamsUInt8Array,
    nonce,
    pubKeyUInt8Array,
    ephemeralKeyPair.secretKey
  )
  return {
    ciphertext: util.encodeBase64(encryptedMessage),
    ephemPubKey: util.encodeBase64(ephemeralKeyPair.publicKey),
    nonce: util.encodeBase64(nonce),
    version: "x25519-xsalsa20-poly1305"
  }
}

// decrypt with the receiver's private key (base64)
function decrypt(receiverPrivKey: string, encryptedData: IEncryptedMsg): string {
  const receiverPrivKeyUint8Array = util.decodeBase64(receiverPrivKey)
  const nonce = util.decodeBase64(encryptedData.nonce)
  const ciphertext = util.decodeBase64(encryptedData.ciphertext)
  const ephemPubKey = util.decodeBase64(encryptedData.ephemPubKey)
  const decryptedMessage = nacl.box.open(
    ciphertext,
    nonce,
    ephemPubKey,
    receiverPrivKeyUint8Array
  )
  // nacl.box.open yields no plaintext if authentication fails
  // (wrong key or tampered data)
  if (!decryptedMessage) {
    throw new Error("Decryption failed")
  }
  return util.encodeUTF8(decryptedMessage)
}

Known Pitfalls to Beware:

  1. Padding: TweetNaCl does not incorporate padding by default. This means that the length of the encrypted output directly reveals the length of the input. A bad actor might learn the contents of an encrypted message simply by comparing its length to some known inputs (e.g. “yes” will produce a longer encrypted output than “no”). To overcome this, a string of random length could be appended to each message before encryption (and stripped after decryption) to introduce a degree of randomisation in the length of the output.
  2. Storing and using keys: Cryptographic keys should not be stored as plain text; they should themselves be encrypted and secured at rest. Another general rule of thumb is to use each key as few times as possible.
  3. Nonce: Using an incremental nonce to prevent replay attacks has the side effect of revealing valuable traffic information to snoopers, such as how many messages have been sent.
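One possible way to address the padding pitfall is sketched below. Note that it swaps the random-length approach for a deterministic one: every message is padded up to the next multiple of a fixed block size by appending a 0x80 marker byte followed by zero bytes (the scheme used by ISO/IEC 7816-4), so all messages that fall in the same size bucket encrypt to the same length. The function names pad and unpad and the block size of 16 are assumptions made for illustration.

```typescript
// Pad to a multiple of blockSize: append 0x80, then zero bytes.
// At least one byte of padding is always added.
function pad(message: Uint8Array, blockSize: number): Uint8Array {
  if (!Number.isInteger(blockSize) || blockSize < 1) {
    throw new Error("blockSize must be a positive integer")
  }
  const paddedLength = (Math.floor(message.length / blockSize) + 1) * blockSize
  const padded = new Uint8Array(paddedLength) // zero-filled by default
  padded.set(message)
  padded[message.length] = 0x80
  return padded
}

// Remove the padding: skip trailing zeros, then strip the 0x80 marker.
// Sketch only: this loop is not constant-time.
function unpad(padded: Uint8Array): Uint8Array {
  let end = padded.length - 1
  while (end >= 0 && padded[end] === 0) end--
  if (end < 0 || padded[end] !== 0x80) throw new Error("invalid padding")
  return padded.slice(0, end)
}

const yes = pad(new Uint8Array([121, 101, 115]), 16) // bytes of "yes"
const no = pad(new Uint8Array([110, 111]), 16) // bytes of "no"
console.log(yes.length === no.length) // true: both are 16 bytes
```

The padded bytes would be passed to the encryption function in place of the raw message, and unpad applied to the output of decryption. Larger block sizes hide more about the length at the cost of extra bandwidth.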

Alternative JS libraries

Libsodium (a NaCl fork)

This is a working document and I invite your help!