The Absolute Minimum Every Web Developer Should Know About Cryptography
Abstraction is the magic sauce of software development because it allows building ever more complex systems without having to be too concerned with the underlying complexities. It doesn’t allow a total disregard for what is abstracted away, however. Abstractions fail, and when they do, it is the job of the software developer to examine those hidden complexities in order to find and fix what failed.
The same thing applies to the real world. If I turn the ignition of my car and nothing happens, I know something is most likely wrong in the electrical system — either the battery, alternator or ignition. Knowing this I can try and fix it myself, or send it to a mechanic. Even if I don’t fix it myself, I can better communicate to the mechanic what is wrong, in his own lingo. All because I know a few details underlying the abstraction of a car.
This post is about peeling back a bit of the cryptographic abstractions of the web. When thinking about cryptography on the web, HTTPS is usually the first thing to come to mind, so let’s start there…
What is HTTPS?
HTTP is the language of the web: the protocol describing how browsers and servers communicate. HTTPS adds encryption on top of HTTP through the SSL/TLS cryptographic protocol. SSL is the predecessor of TLS, and the two names are often used interchangeably, although every version of SSL itself is now considered insecure. To make matters more confusing, there are multiple versions of TLS. TLS 1.2 is fairly standard, with 1.3 currently being rolled out.
HTTPS has three goals:
- Communication between the browser and server is confidential, i.e. a network eavesdropper cannot interpret what is being said between the two parties.
- Communication between the browser and server has integrity, i.e. a network attacker cannot secretly alter the communication.
- The server is authentic, i.e. a network attacker cannot spoof the legitimate server without the client’s knowledge.
1 — Looking Under the Hood of HTTPS — It All Starts with a “Hello”
Navigating to an ‘https’ site within the browser kicks off the SSL session: the browser sends a Client Hello to the server. The Client Hello is the browser’s way of saying, “Hey, I want to establish an SSL session,” along with the protocol versions and cipher suites it supports. The server responds with a Server Hello, picking the version and cipher suite to use, and then sends its Server Certificate, which contains its Public Key. This exchange is the start of what is referred to as the handshake.
Server Authenticity through SSL Certificates
Every server providing SSL has its own public key used to establish SSL sessions, but how can the browser be sure the public key belongs to the website it is trying to connect to? SSL Server Certificates are the answer. The browser does not just trust the public key the server sends. Instead, it checks that the key is authentic by verifying the certificate’s digital signature, confirming that the certificate was issued by a Certificate Authority the browser trusts and that it was issued for the domain being visited.
SSL Certificates bind a public key to an identity on the web.
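Outside the browser, TLS client libraries perform the same checks. Here is a minimal sketch using Python’s standard `ssl` module. The helper name `make_client_context` is hypothetical. The sketch only shows which verification settings a default client context turns on; it does not open a connection.

```python
import ssl

def make_client_context() -> ssl.SSLContext:
    # create_default_context() loads the system's trusted CA certificates and
    # enables the same certificate checks a browser performs.
    ctx = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx

ctx = make_client_context()
# CERT_REQUIRED: the handshake fails unless the certificate chains to a trusted CA.
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
# check_hostname: the certificate must also be issued for the host being contacted.
print(ctx.check_hostname)  # True
```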
Generating SSL Certificates
In order for a company to obtain a certificate for their site, they must validate their identity with a Certificate Authority (CA). The Certificate Authority then digitally signs a certificate containing the company’s domain name and public key, using a private key that only the Certificate Authority knows. When a browser verifies the public key of a site, it uses the Certificate Authority’s public key, which comes preinstalled in the browser’s or operating system’s list of trusted CAs.
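The issue-then-verify flow can be sketched with textbook RSA. This is a toy under loud assumptions: the CA’s key is built from two well-known Mersenne primes, there is no padding, and the certificate is a plain dict. The names `ca_issue`, `browser_verify` and "Toy CA" are all hypothetical. Real certificates are X.509 structures signed with properly padded RSA or ECDSA. Still, the roles match the paragraph above: only the CA can sign, while anyone holding the CA’s public values `(N, E)` can verify.

```python
import hashlib
import json

# Toy CA key pair: textbook RSA over two public Mersenne primes.
# Completely insecure (anyone can factor N); for illustration only.
P, Q = 2**127 - 1, 2**521 - 1
N = P * Q
E = 65537                             # CA public exponent
D = pow(E, -1, (P - 1) * (Q - 1))     # CA private exponent (Python 3.8+)

def digest(body: dict) -> int:
    # Hash a canonical JSON encoding of the certificate fields.
    data = json.dumps(body, sort_keys=True).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

def ca_issue(domain: str, server_public_key: str) -> dict:
    # The CA signs the binding of domain name to public key with its private key.
    body = {"subject": domain, "public_key": server_public_key, "issuer": "Toy CA"}
    return {"body": body, "signature": pow(digest(body), D, N)}

def browser_verify(cert: dict, expected_domain: str) -> bool:
    # The browser needs only the CA's public values (N, E).
    if cert["body"]["subject"] != expected_domain:
        return False
    return pow(cert["signature"], E, N) == digest(cert["body"])

cert = ca_issue("example.com", "server-public-key-123")
print(browser_verify(cert, "example.com"))  # True

# An attacker swaps in their own key but cannot re-sign without D.
forged = {"body": dict(cert["body"], public_key="attacker-key"),
          "signature": cert["signature"]}
print(browser_verify(forged, "example.com"))  # False
```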
Problems with SSL Certificates
There are a number of problems with certificates in general. Firstly, it is hard to revoke a digitally signed certificate once it has been issued. Revocation mechanisms such as Certificate Revocation Lists (CRLs) and the Online Certificate Status Protocol (OCSP) exist, but browsers apply them inconsistently. Certificates also have an expiration date, so expiration can function as a mode of revocation. Unfortunately, a certificate can be valid for multiple years, so revocation through expiration is slow.
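Expiration works as a fallback partly because it is so simple to check. A client just compares the current time against the certificate’s validity window. Here is a minimal sketch using the standard library’s `ssl.cert_time_to_seconds`. The function name `is_currently_valid` and the dates are hypothetical. The date strings follow the format Python’s `getpeercert()` reports.

```python
import ssl
import time
from typing import Optional

def is_currently_valid(not_before: str, not_after: str,
                       now: Optional[float] = None) -> bool:
    """Return True if `now` falls inside the certificate's validity window.

    Dates use the format ssl.SSLSocket.getpeercert() reports,
    e.g. "Aug 15 00:00:00 2019 GMT".
    """
    if now is None:
        now = time.time()
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    return start <= now <= end

# Hypothetical certificate valid from January 2018 to August 2019.
june_2019 = ssl.cert_time_to_seconds("Jun 15 00:00:00 2019 GMT")
print(is_currently_valid("Jan 15 00:00:00 2018 GMT",
                         "Aug 15 00:00:00 2019 GMT", now=june_2019))  # True
```

Notice that nothing here can stop a compromised certificate before its end date. That gap is exactly why expiration is a poor substitute for real revocation.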
An even bigger problem is that CAs can issue Rogue Certificates. A Rogue Certificate is a certificate issued to an imposter. It allows the imposter to spoof the site with a man-in-the-middle attack, decrypting and re-encrypting all traffic between the victim and the real site. This happens in the real world. For example, the Symantec CA issued a rogue certificate for google.com.
Browsers have incorporated some defenses against Rogue Certificates. One is HTTP Public Key Pinning (HPKP), which lets a server declare in a response header which public keys (its own, or those of the CAs it uses) its certificate chain must contain. HPKP proved easy to misconfigure and has since been deprecated, and Chrome has removed support for it. A stronger defense is Certificate Transparency, which requires CAs to publicly log all certificates issued, allowing companies and other identities on the web to monitor the logs for rogue certificates.
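Certificate Transparency logs are built as append-only Merkle trees (RFC 6962). As a result, a single root hash commits to every certificate in the log, and any altered or inserted entry changes that root. Below is a minimal sketch of the RFC 6962 tree-hash computation using only `hashlib`. The log entries are hypothetical byte strings. Real logs also sign their tree heads and serve inclusion and consistency proofs, which this sketch omits.

```python
import hashlib
from typing import List

def leaf_hash(entry: bytes) -> bytes:
    # RFC 6962 prefixes leaves with 0x00 so a leaf can't masquerade as a node.
    return hashlib.sha256(b"\x00" + entry).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    # Interior nodes are prefixed with 0x01.
    return hashlib.sha256(b"\x01" + left + right).digest()

def merkle_tree_hash(entries: List[bytes]) -> bytes:
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    # Split at the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(merkle_tree_hash(entries[:k]), merkle_tree_hash(entries[k:]))

log = [b"cert for example.com", b"cert for shop.example", b"cert for mail.example"]
root = merkle_tree_hash(log)
# A monitor that remembers `root` notices if any logged entry is later altered.
print(merkle_tree_hash(log) == root)  # True
```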
The Power of Digital Signatures
Digital Signatures bind a message to its author. In the case of server certificates, the CA’s signature binds a public key to its server. But anything can be digitally signed. The basic idea is that the signer signs the message with their private key, and anyone can then verify the signature using the signer’s public key. Unlike a signature in the real world, a digital signature is bound to the contents of the document itself, so if the document changes, the signature becomes invalid.
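You can see the “sign with the private key, verify with the public key” idea without any crypto library by using a Lamport one-time signature, a hash-based scheme. It was chosen here only because it needs nothing beyond `hashlib`; HTTPS certificates use RSA or ECDSA instead. One important limitation: each Lamport key pair may safely sign just one message. The function names below are hypothetical.

```python
import hashlib
import secrets

def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def keygen():
    # Private key: 256 pairs of random 32-byte values, one pair per digest bit.
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    # Public key: the hash of every private value.
    public = [(_h(zero), _h(one)) for zero, one in private]
    return private, public

def _digest_bits(message: bytes):
    value = int.from_bytes(_h(message), "big")
    return [(value >> (255 - i)) & 1 for i in range(256)]

def sign(private, message: bytes):
    # Reveal one private value per bit of the message's SHA-256 digest.
    return [private[i][bit] for i, bit in enumerate(_digest_bits(message))]

def verify(public, message: bytes, signature) -> bool:
    # Each revealed value must hash to the public value for that digest bit.
    bits = _digest_bits(message)
    return len(signature) == 256 and all(
        _h(revealed) == public[i][bit]
        for i, (revealed, bit) in enumerate(zip(signature, bits))
    )

private, public = keygen()
sig = sign(private, b"Pay Bob $10")
print(verify(public, b"Pay Bob $10", sig))    # True
print(verify(public, b"Pay Bob $1000", sig))  # False: the document changed
```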
2 — From Public Key to Shared Session Key
Once the public key is obtained and verified by the browser, public-key cryptography can be used to establish a shared session key with the server. Public-key cryptography uses a public key to encrypt a message and a secret (private) key to decrypt it. In the classic RSA key exchange, the browser generates a secret, encrypts it with the server’s public key and sends it over. The server then decrypts the message using its secret key, establishing the secret between the browser and the server, which both sides use to derive the shared session key.
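There are a few public key algorithms used by HTTPS. A couple of popular ones are RSA and Diffie–Hellman. Diffie–Hellman works differently from the RSA exchange: the browser and server each send a public value, then each combines the other’s value with its own private value to compute the same secret, which never travels over the wire. TLS 1.3 drops the RSA key exchange in favor of Diffie–Hellman variants.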
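Here is a toy sketch of the Diffie–Hellman exchange using Python’s built-in modular exponentiation. The group parameters are assumptions picked for brevity: a Mersenne prime modulus and base 3. Real TLS uses standardized groups or elliptic curves (ECDHE). The SHA-256 step is only a stand-in for TLS’s actual key derivation (a PRF in TLS 1.2, HKDF in TLS 1.3). All function names are hypothetical.

```python
import hashlib
import secrets

# Toy public parameters, known to everyone (not a vetted DH group).
P = 2**521 - 1   # prime modulus
G = 3            # base

def keypair():
    private = secrets.randbelow(P - 3) + 2   # random value in [2, P - 2]
    public = pow(G, private, P)              # safe to send over the network
    return private, public

def shared_session_key(my_private: int, their_public: int) -> bytes:
    # (G^b)^a mod P == (G^a)^b mod P, so both sides get the same number.
    shared = pow(their_public, my_private, P)
    # Stand-in for TLS key derivation: hash the shared secret into 32 bytes.
    return hashlib.sha256(shared.to_bytes(66, "big")).digest()

browser_private, browser_public = keypair()
server_private, server_public = keypair()
# Only the public values cross the wire; each side combines them with its own secret.
print(shared_session_key(browser_private, server_public)
      == shared_session_key(server_private, browser_public))  # True
```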
3 — From Shared Session Key to Encrypted Traffic
Public-key cryptography is great for key exchange but is far too slow to encrypt all of the traffic. With the shared session key established during the handshake, symmetric-key cryptography can be used to encrypt data between the browser and server. Symmetric-key cryptography uses the same shared secret key to encrypt the plaintext and to decrypt the ciphertext. AES is the standard symmetric-key algorithm used.
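The defining property of symmetric-key cryptography is that one shared key both encrypts and decrypts. To show that with only the standard library, the sketch below swaps in a one-time pad (XOR with a random key) in place of AES. A one-time pad is impractical on the web, because its key must be as long as the message and must never be reused. For that reason, treat this purely as an illustration of the symmetric property. The function name is hypothetical.

```python
import secrets

def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR is its own inverse, so the same function encrypts and decrypts.
    if len(key) < len(data):
        raise ValueError("one-time pad key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"GET /account HTTP/1.1"
key = secrets.token_bytes(len(message))   # the shared secret; never reuse it
ciphertext = xor_cipher(message, key)     # encrypt
plaintext = xor_cipher(ciphertext, key)   # decrypt with the *same* key
print(plaintext == message)  # True
```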
4 — MACs for Integrity
Public-key and symmetric-key cryptography do not solve the problem of integrity on their own. A network attacker could still alter ciphertext in transit without the receiver’s knowledge. The solution to the integrity problem is to attach a Message Authentication Code (MAC) to each message. A MAC is a checksum generated with a secret key. Because only the browser and server hold that key, an attacker who alters a message cannot produce a matching MAC, and the receiver detects the change by recomputing the MAC with the shared secret key. When the MAC is computed over the ciphertext (encrypt-then-MAC), tampering is caught before anything is decrypted. A widely used type of MAC on the web is the Hash-based Message Authentication Code (HMAC).
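Here is a minimal encrypt-then-MAC sketch using Python’s standard `hmac` module with SHA-256. The ciphertext is just placeholder bytes, the random key stands in for the shared session key, and the helper names are hypothetical. It shows the two halves described above: the sender appends a tag, and the receiver recomputes it and rejects any mismatch.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # HMAC-SHA256 tags are 32 bytes

def append_mac(key: bytes, ciphertext: bytes) -> bytes:
    """Sender: compute an HMAC-SHA256 tag over the ciphertext and append it."""
    tag = hmac.new(key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag

def check_mac(key: bytes, received: bytes) -> bytes:
    """Receiver: recompute the tag and reject the message if it doesn't match."""
    ciphertext, tag = received[:-TAG_LEN], received[-TAG_LEN:]
    expected = hmac.new(key, ciphertext, hashlib.sha256).digest()
    # compare_digest is designed not to leak, via timing, where the tags differ.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC mismatch: message was altered in transit")
    return ciphertext

key = secrets.token_bytes(32)  # stands in for the shared session key
sent = append_mac(key, b"\x8f\x02 opaque ciphertext bytes")
print(check_mac(key, sent) == b"\x8f\x02 opaque ciphertext bytes")  # True
```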
Other Things to Keep In Mind
JWT Tokens Only Provide Integrity
JSON Web Tokens (JWTs) are used all over the web these days. Most are signed rather than encrypted: the header and payload are Base64url-encoded, and an HMAC (or a digital signature) of the encoded value is appended. This provides message integrity but not confidentiality, since anyone holding the token can decode the payload. Therefore, sensitive data should never be included in the token.
Use Authenticated Encryption For Confidentiality + Integrity
In the old days, you had to “roll your own” when you needed confidentiality and integrity, combining a MAC algorithm with an encryption cipher like AES. These days there are packaged authenticated encryption modes like AES-GCM, which encrypt and authenticate data in a single operation.
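The sketch below shows AES-GCM through the third-party Python `cryptography` package (installed with `pip install cryptography`), using its `AESGCM` class. The payload and associated-data values are hypothetical. The one rule you must follow yourself is that a nonce must never be reused with the same key.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)

nonce = os.urandom(12)     # 96-bit nonce; must be unique for every message under this key
header = b"record-header"  # associated data: authenticated but not encrypted
ciphertext = aesgcm.encrypt(nonce, b"secret payload", header)  # ciphertext + 16-byte tag

print(aesgcm.decrypt(nonce, ciphertext, header))  # b'secret payload'

# Flipping a single bit makes decryption fail instead of returning garbage.
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]
try:
    aesgcm.decrypt(nonce, tampered, header)
except InvalidTag:
    print("tampering detected")
```

Because integrity checking is built into the mode, there is no separate MAC key or MAC ordering decision to get wrong.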
Beware of Insecure Algorithms and Never Roll Your Own
AES took almost four years to develop and vet, with teams of some extremely smart people involved. Writing a secure encryption algorithm is a task of titanic proportions, and even the secure algorithms of today may not stay secure in the future. For example, DES was a symmetric-key encryption algorithm developed in the 70s. Back in the day, it was considered secure, but its 56-bit key is small enough to brute-force with modern hardware. In 1998, the EFF’s purpose-built “Deep Crack” machine recovered a DES key in about 56 hours.
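Key length is what separates DES from AES. Here is the back-of-the-envelope arithmetic for a worst-case brute-force search. The attacker speed of one trillion keys per second is a hypothetical round number, not a measured figure.

```python
def brute_force_seconds(key_bits: int, keys_per_second: float) -> float:
    """Worst-case time to try every possible key."""
    return 2 ** key_bits / keys_per_second

RATE = 1e12                      # hypothetical: one trillion keys per second
SECONDS_PER_YEAR = 365 * 24 * 3600

print(f"DES (56-bit): {brute_force_seconds(56, RATE) / 3600:.1f} hours")
# DES (56-bit): 20.0 hours
print(f"AES-128:      {brute_force_seconds(128, RATE) / SECONDS_PER_YEAR:.1e} years")
# AES-128:      1.1e+19 years
```

Each extra key bit doubles the work, so the 72 additional bits in AES-128 multiply the search by 2^72.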
Viewing Encryption Information for a Site Within the Browser
A cool feature of modern browsers is that you can easily view the certificate, the TLS version in use, etc. In Chrome, just open up the Developer Tools and navigate to the Security tab. This is the SSL Server Certificate for Medium.com:
As you can see, their certificate is issued by DigiCert. The certificate expires on August 30th, 2019 and is signed using SHA-256 with RSA, i.e. the certificate contents are hashed using SHA-256, and that hash is signed with the CA’s RSA private key. The certificate’s public key algorithm is RSA-2048 (RSA with a 2048-bit key).
You can also view more detailed connection info:
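It’s using version 1.2 of the TLS protocol, with AES-GCM providing both encryption and message integrity. ECDHE RSA describes the key exchange. ECDHE stands for Elliptic Curve Diffie-Hellman Ephemeral: Diffie-Hellman performed over elliptic curves, which gives security comparable to classic DHE with much smaller keys and faster computation. “Ephemeral” means fresh keys are generated for every session, which provides forward secrecy. RSA means the server proves it owns the exchange by signing its part with the RSA key from its certificate.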
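The same connection details can be read programmatically. Here is a minimal sketch using Python’s standard `socket` and `ssl` modules; the function name `describe_connection` is hypothetical. It performs a real handshake, so it needs network access. The results depend on the server and on your local OpenSSL build. Python reports cipher suites with OpenSSL-style names (such as `ECDHE-RSA-AES128-GCM-SHA256`), which differ from the labels Chrome shows.

```python
import socket
import ssl

def describe_connection(host: str, port: int = 443) -> dict:
    """Perform a TLS handshake with `host` and report what was negotiated."""
    ctx = ssl.create_default_context()  # verifies the certificate chain and hostname
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
            suite_name, _protocol, secret_bits = tls.cipher()
            return {
                "tls_version": tls.version(),  # e.g. "TLSv1.2" or "TLSv1.3"
                "cipher_suite": suite_name,
                "secret_bits": secret_bits,
                # issuer is a tuple of RDNs, each a tuple of (field, value) pairs
                "issuer": {k: v for rdn in cert["issuer"] for k, v in rdn},
                "expires": cert["notAfter"],
            }
```

For example, `print(describe_connection("medium.com"))` prints the negotiated version, cipher suite, issuer and expiry date for that site.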
The End
I hope this peek under the hood of HTTPS was enjoyable and helpful. Please let me know what you think.