TCP and TLS handshake: What happens from typing in a URL to displaying a website? (Part 2)

Apr 23, 2022

After you type in the URL of a website, the browser first looks up the IP address of the server that hosts it. This process is called DNS lookup, which I discussed in my previous article.

In this article, we look at what happens after the DNS lookup completes. The browser can now start connecting to the server, in the following steps:

TCP three-way handshake (SYN, SYN-ACK, ACK)
TLS handshake (for websites served over HTTPS)

TCP three-way handshake and TLS handshake

Image from Cloudflare

After the DNS lookup, the browser (client) starts to establish a connection with the server. HTTP traditionally runs over TCP rather than UDP, because TCP provides reliable, in-order delivery. The server must already be listening on a port, such as 443 for HTTPS or 8080 for a development server, so that it is ready to accept TCP connections from clients. How many concurrent TCP connections a server can handle depends on its resources, such as CPU and memory.

To start a TCP connection, the first step is the three-way handshake.
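To make the listening step concrete, here is a minimal sketch using Python's standard socket module. The loopback address, the OS-assigned port and the function name loopback_connection are assumptions for illustration. A real web server would listen on a fixed, publicly reachable port.

```python
import socket


def loopback_connection(payload: bytes = b"hello") -> bytes:
    """Open a TCP connection to a local listener and send one message."""
    # Server side: bind and listen before any client arrives.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 asks the OS for any free port
    server.listen(1)
    port = server.getsockname()[1]

    # Client side: connect() returns once the kernel has completed
    # the three-way handshake with the listening socket.
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))

    # accept() hands the established connection to the server application.
    conn, _addr = server.accept()
    try:
        client.sendall(payload)
        received = b""
        while len(received) < len(payload):
            chunk = conn.recv(len(payload) - len(received))
            if not chunk:
                break
            received += chunk
        return received
    finally:
        conn.close()
        client.close()
        server.close()
```

Note that the application code never builds SYN or ACK packets itself. The operating system performs the handshake inside connect() and accept().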

TCP three-way handshake

Image from wikimedia

SYN: The browser sends a SYN packet to the server with a random initial sequence number x. The packet also carries TCP flags and options.
SYN-ACK: The server receives the SYN packet and replies with a SYN-ACK packet that carries two numbers. The acknowledgment number is x+1, which acknowledges the client's SYN. The sequence number is a random initial value y picked by the server.
ACK: The client receives the SYN-ACK packet and acknowledges it by incrementing the server's sequence number, i.e. y+1. The client then sends an ACK packet to the server with acknowledgment number y+1 and sequence number x+1.

Now the TCP connection is established and the server reserves some memory to handle this connection.
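The numbering in the three steps above can be traced with a small simulation. This is only a sketch of the bookkeeping, not a real TCP stack. The function name three_way_handshake and the dictionary layout are assumptions for illustration.

```python
import random

SEQ_SPACE = 2**32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(x=None, y=None):
    """Return the three handshake segments as dictionaries."""
    # Each side picks a random initial sequence number unless one is given.
    x = random.randrange(SEQ_SPACE) if x is None else x
    y = random.randrange(SEQ_SPACE) if y is None else y

    syn = {"flags": "SYN", "seq": x}
    syn_ack = {"flags": "SYN-ACK", "seq": y, "ack": (x + 1) % SEQ_SPACE}
    ack = {"flags": "ACK", "seq": (x + 1) % SEQ_SPACE, "ack": (y + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]


for segment in three_way_handshake(x=100, y=300):
    print(segment)
```

With x=100 and y=300, the final ACK carries sequence number 101 and acknowledgment number 301.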

Before sending out HTTP request

The TCP connection is established and we can start transmitting data now. However, for a website with HTTPS, we have one more step to do, which is to ensure the data is transmitted securely by encrypting it. That is why TLS comes into play before transmitting data.

Why do we need TLS?

Answer from RFC 8446 (TLS 1.3):

The primary goal of TLS is to provide a secure channel between two communicating peers; the only requirement from the underlying transport is a reliable, in-order data stream.

For example, if the user sends a POST request with credit card information over an unencrypted connection, the information can be read by an attacker who eavesdrops on the session. Hence, we need to keep the data secret by using the TLS protocol.

TLS properties

The secure channel provided by TLS has these properties:

Authentication The server is always authenticated, while the client is optionally authenticated, using asymmetric cryptography (RSA, ECDSA…)
Confidentiality Data is only visible to the endpoints.
Integrity Data cannot be modified by an attacker without detection.

Keep these properties in mind. They help us understand later why the TLS handshake works with certain steps.

Steps of TLS handshake (TLS 1.2)

TLS 1.3 is the latest version, and its handshake is simpler than that of TLS 1.2. Yet, for a better understanding, let’s start by looking into TLS 1.2.

Message flow for a full handshake

Full handshake of TLS 1.2

Image from RFC5246
“*” Indicates optional or situation-dependent messages that are not always sent.

The diagram above describes both essential and optional steps of TLS handshake. A shortened version is also provided below, which shows only the essential steps:

Image from RFC5246

A few important actions take place in the TLS handshake:

Negotiate the cipher suite used in the communication between server and client, so that they agree on a specific way to encrypt and decrypt data
The server sends its SSL certificate to the client for verification

Why do we need these actions? For the first bullet point, the answer lies in one of the properties of TLS I mentioned before:

Confidentiality: Data is only visible to the endpoints.

To ensure data is only visible to the endpoints while it travels between client and server, both sides need to agree on the same cipher suite. In other words, they negotiate which algorithms to use, for example how to exchange a key and which symmetric cipher then encrypts and decrypts the data with it.
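As a rough sketch of the negotiation, the client offers a list of cipher suites and the server picks one that both sides support. The function name select_cipher_suite is hypothetical, and honouring the client's preference order is an assumption. Real servers often apply their own preference order instead.

```python
from typing import List, Optional


def select_cipher_suite(client_offered: List[str],
                        server_supported: List[str]) -> Optional[str]:
    """Return the first client-offered suite the server also supports."""
    supported = set(server_supported)
    for suite in client_offered:
        if suite in supported:
            return suite
    return None  # no overlap: the handshake fails


client = ["TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256", "TLS_RSA_WITH_AES_128_CBC_SHA"]
server = ["TLS_RSA_WITH_AES_128_CBC_SHA", "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256"]
chosen = select_cipher_suite(client, server)
print(chosen)
```

The suite name itself encodes the agreement: the key exchange (ECDHE), the authentication algorithm (RSA), the symmetric cipher (AES_128_GCM) and the hash (SHA256).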

Similarly, for the second point, the server needs to prove its identity by sending its SSL certificate to the client. This action makes sense in light of the following property of TLS:

Authentication The server is always authenticated, while the client is optionally authenticated, using asymmetric cryptography (RSA, ECDSA…)

To understand how the browser verifies an SSL certificate, this Stack Overflow discussion may be helpful.
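Outside the browser, the same verification can be observed with Python's standard ssl module. The sketch below is an illustration, not how a browser is implemented. The function names make_client_context and fetch_server_certificate are hypothetical, and calling the second one needs network access to a real HTTPS host.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    # create_default_context() loads the system's trusted CA certificates,
    # requires a valid certificate chain and checks the hostname.
    return ssl.create_default_context()


def fetch_server_certificate(host: str, port: int = 443) -> dict:
    """Connect to host and return its verified certificate as a dict."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=5) as sock:
        # The TLS handshake, including certificate verification, runs here.
        # A failed verification raises ssl.SSLCertVerificationError.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


context = make_client_context()
print(context.verify_mode, context.check_hostname)
```

The returned dictionary includes fields such as the subject, the issuer and the validity period of the certificate.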

Now we understand the important steps in TLS handshake and how they relate to the properties of TLS protocol. Let’s go through the details of TLS handshake.

TLS handshake (TLS 1.2)

The diagram below is a shortened version of the TLS handshake with the essential steps and an RSA-based cipher suite. In total, the TLS handshake here takes 2 RTTs. The first covers steps 1–4 and the second covers steps 5–9. During the handshake, the server and client exchange messages for negotiation and authentication.

For a detailed description of each step, this article and RFC 5246 give clear explanations. Hence, I will not go through all the details. Instead, I focus on a few questions:

What are TLS handshake protocol and TLS record protocol?
What is pre-master secret?
What is master secret? Why do we need it?
What does Change cipher spec mean?

What are TLS handshake protocol and TLS record protocol?

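The TLS protocol is made up of two layers: the handshake protocol, which negotiates the security parameters and authenticates the peers, and the record protocol, which uses those parameters to protect the data.

The book High Performance Browser Networking describes the record protocol workflow as follows: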

Record protocol receives application data.
Received data is divided into blocks: maximum of 2^14 bytes, or 16 KB per record.
Message authentication code (MAC) or HMAC is added to each record.
Data within each record is encrypted using the negotiated cipher.

The data comes from the application layer, is passed to the session layer (TLS), and is finally handed to TCP. On the receiving side, the same process runs in reverse.
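The fragmentation and MAC steps of this workflow can be sketched with Python's standard hmac module. This is a simplification under stated assumptions: the encryption step is left out, the MAC covers only a sequence number and the fragment (real TLS also covers the record type, version and length), and the names to_records and reassemble are hypothetical.

```python
import hashlib
import hmac

MAX_RECORD = 2**14  # 16 KB limit per record


def to_records(data: bytes, mac_key: bytes):
    """Split data into fragments and attach an HMAC to each one."""
    records = []
    for seq, start in enumerate(range(0, len(data), MAX_RECORD)):
        fragment = data[start:start + MAX_RECORD]
        # Including the sequence number ties each MAC to its position.
        mac = hmac.new(mac_key, seq.to_bytes(8, "big") + fragment,
                       hashlib.sha256).digest()
        records.append((fragment, mac))
    return records


def reassemble(records, mac_key: bytes) -> bytes:
    """Receiver side: verify each MAC, then join the fragments."""
    data = b""
    for seq, (fragment, mac) in enumerate(records):
        expected = hmac.new(mac_key, seq.to_bytes(8, "big") + fragment,
                            hashlib.sha256).digest()
        if not hmac.compare_digest(mac, expected):
            raise ValueError(f"record {seq} failed its integrity check")
        data += fragment
    return data


records = to_records(b"a" * 40000, mac_key=b"demo-mac-key")
print([len(fragment) for fragment, _ in records])
```

A 40000-byte message becomes three records of 16384, 16384 and 7232 bytes. If any fragment is changed in transit, reassemble raises an error, which is the integrity property in action.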

What is pre-master secret?

1. The purpose of pre-master secret

The only purpose of the pre-master secret is to generate the master secret. In other words, we don’t encrypt or decrypt data with the pre-master secret.

2. How is the pre-master secret created?

How the pre-master secret is created depends on the cipher suite that the server and client agree to use. For example, if RSA is the key exchange method, the client generates a 48-byte pre-master secret. The client then encrypts the value with the public key extracted from the server’s SSL certificate and sends it to the server.

3. How does server decrypt the value?

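The server decrypts it with the private key that matches the public key in its SSL certificate. The private key itself is never part of the certificate and stays on the server. A value encrypted with the public key can only be decrypted with the matching private key. That is the key concept of RSA.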
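The round trip of the pre-master secret can be illustrated with textbook RSA in plain Python. This is a toy sketch and is not secure: the key is built from two publicly known Mersenne primes (an assumption made only to avoid a third-party library), and no padding is applied, whereas real TLS uses randomly generated primes and PKCS#1 padding. The function names are hypothetical.

```python
import os

# Toy key pair built from two known Mersenne primes. Never do this in practice.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q                          # public modulus, part of the certificate
E = 65537                          # public exponent, part of the certificate
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, stays on the server


def make_pre_master_secret() -> bytes:
    # 2 bytes of protocol version (0x0303 for TLS 1.2) + 46 random bytes.
    return bytes([3, 3]) + os.urandom(46)


def client_encrypt(pre_master: bytes) -> int:
    """Client side: encrypt with the server's public key (E, N)."""
    return pow(int.from_bytes(pre_master, "big"), E, N)


def server_decrypt(ciphertext: int) -> bytes:
    """Server side: decrypt with the private exponent D."""
    return pow(ciphertext, D, N).to_bytes(48, "big")


pre_master = make_pre_master_secret()
recovered = server_decrypt(client_encrypt(pre_master))
print(recovered == pre_master)
```

Only the holder of D can recover the value, which is why a successful handshake also proves that the server owns the private key matching its certificate.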

Master secret

1. The purpose of master secret

The server and client do not encrypt data with the master secret directly. Instead, they derive the session keys from it, and use those keys to encrypt and authenticate data before transmitting it to each other.

RFC 5246 describes how the master secret is used:

The master secret is expanded into a sequence of secure bytes, which is then split to a client write MAC key, a server write MAC key, a client write encryption key, and a server write encryption key. Each of these is generated from the byte sequence in that order.

We have 4 keys generated from the master secret:

client write key
server write key
client write MAC key
server write MAC key

To understand what these keys are for, I recommend reading Cloudflare’s article. The explanation is short and simple.

2. How is the master secret created?
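The split described in the RFC quote can be sketched as simple slicing. The function name split_key_block is hypothetical, and the key block is taken as an input here rather than derived. The lengths in the usage example (32-byte MAC keys, 16-byte encryption keys) are assumptions that would match a suite using HMAC-SHA256 and AES-128.

```python
from typing import Dict


def split_key_block(key_block: bytes, mac_key_len: int,
                    enc_key_len: int) -> Dict[str, bytes]:
    """Cut the expanded byte sequence into the four keys, in RFC order."""
    names_and_sizes = [
        ("client_write_MAC_key", mac_key_len),
        ("server_write_MAC_key", mac_key_len),
        ("client_write_key", enc_key_len),
        ("server_write_key", enc_key_len),
    ]
    needed = sum(size for _, size in names_and_sizes)
    if len(key_block) < needed:
        raise ValueError("key block is too short")

    keys, offset = {}, 0
    for name, size in names_and_sizes:
        keys[name] = key_block[offset:offset + size]
        offset += size
    return keys


# Placeholder bytes stand in for the real expanded master secret.
keys = split_key_block(bytes(range(96)), mac_key_len=32, enc_key_len=16)
for name, value in keys.items():
    print(name, len(value))
```

Each direction gets its own keys, so data written by the client is protected with different keys than data written by the server.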

master_secret = PRF(pre_master_secret, "master secret",
                  ClientHello.random + ServerHello.random)
                  [0..47];

After step 5, both server and client have already obtained the necessary values for producing a master secret. They can now generate the same master secret.
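The PRF in the formula above can be written in a few lines with Python's hmac module. This sketch follows the P_hash construction from RFC 5246 with SHA-256, which TLS 1.2 uses as the default PRF hash. The random inputs below are placeholders, not values from a real handshake.

```python
import hashlib
import hmac
import os


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """P_hash from RFC 5246: chain HMACs until enough bytes are produced."""
    out = b""
    a = seed  # A(0) = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def prf(secret: bytes, label: bytes, seed: bytes, length: int) -> bytes:
    return p_sha256(secret, label + seed, length)


def master_secret(pre_master_secret: bytes, client_random: bytes,
                  server_random: bytes) -> bytes:
    # The master secret is always the first 48 bytes of the PRF output.
    return prf(pre_master_secret, b"master secret",
               client_random + server_random, 48)


pre_master = os.urandom(48)
client_random, server_random = os.urandom(32), os.urandom(32)
print(master_secret(pre_master, client_random, server_random).hex())
```

Because both random values go into the seed, the same pre-master secret produces a different master secret on every connection.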

What does Change cipher spec mean?

In steps 6 and 8, the client and server each send a ChangeCipherSpec message. RFC 5246 explains when it is sent:

The ChangeCipherSpec message is sent during the handshake after the security parameters have been agreed upon, but before the verifying Finished message is sent.

So, what is it used for?

The ChangeCipherSpec message is sent by both the client and the server to notify the receiving party that subsequent records will be protected under the newly negotiated CipherSpec and keys. Reception of this message causes the receiver to instruct the record layer to immediately copy the read pending state into the read current state. Immediately after sending this message, the sender MUST instruct the record layer to make the write pending state the write active state.

In short, the ChangeCipherSpec message notifies the other side: from now on, the messages I send to you will be encrypted with the symmetric keys we have just agreed on.
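The pending and current states in the RFC text can be modelled with a tiny class. This is only a sketch of the state switch: the class name RecordLayer and the strings standing in for cipher specs are assumptions, and no actual encryption happens.

```python
class RecordLayer:
    """Tracks current and pending cipher states for one endpoint."""

    def __init__(self) -> None:
        # Before the handshake finishes, records are sent unprotected.
        self.read_current = "unencrypted"
        self.write_current = "unencrypted"
        self.read_pending = None
        self.write_pending = None

    def set_pending(self, cipher_spec: str) -> None:
        # Called once the security parameters have been agreed upon.
        self.read_pending = cipher_spec
        self.write_pending = cipher_spec

    def send_change_cipher_spec(self) -> None:
        # The sender switches its write state right after sending.
        self.write_current = self.write_pending

    def receive_change_cipher_spec(self) -> None:
        # The receiver switches its read state on reception.
        self.read_current = self.read_pending


client, server = RecordLayer(), RecordLayer()
for side in (client, server):
    side.set_pending("AES_128_GCM with negotiated keys")

client.send_change_cipher_spec()
server.receive_change_cipher_spec()
print(client.write_current, "|", client.read_current)
```

After the client's message, only the client-to-server direction is protected. The other direction switches when the server sends its own ChangeCipherSpec.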

DH and RSA

The way the pre-master secret is exchanged above applies to RSA-based cipher suites, where the client encrypts it with the server's public key. Diffie-Hellman (DH) is another choice: each side has a DH public key and private key, and both sides compute the same pre-master secret from their own private key and the peer's public key, so the secret itself is never sent.

RSA key exchange cannot achieve forward secrecy. This means that if an attacker records the traffic and later obtains the server's private key, all the recorded sessions can be decrypted. In contrast, DH in ephemeral mode (i.e. DHE) has forward secrecy. To improve security, RSA key exchange is removed and DHE is kept in TLS 1.3.

Now we understand the overview of TLS 1.2. It’s time to get into the latest TLS handshake, TLS 1.3.

The difference between TLS 1.3 and TLS 1.2

Having understood TLS 1.2, we can easily see why TLS 1.3 is faster and safer. Let’s start by looking at an overview of the full TLS handshake in version 1.3.

Full TLS handshake 1.3

Here are the 3 main phases of the handshake mentioned in RFC 8446:

Key Exchange: Establish shared keying material and select the cryptographic parameters. Everything after this phase is encrypted.
Server Parameters: Establish other handshake parameters (whether the client is authenticated, application-layer protocol support, etc.).
Authentication: Authenticate the server (and, optionally, the client) and provide key confirmation and handshake integrity.

Key Exchange

1. Client Hello

RFC 8446

I will not cover all the details. Instead, I would like to focus on the notable differences compared to TLS 1.2. As in TLS 1.2, the first step is always Client Hello, but the parameters look different here.

What is key_share?

RFC 8446:

In the Key Exchange phase, the client sends the ClientHello (Section 4.1.2) message, which contains … a list of symmetric cipher/HKDF hash pairs; either a set of Diffie-Hellman key shares (in the “key_share” (Section 4.2.8) extension), a set of pre-shared key labels (in the “pre_shared_key” (Section 4.2.11) extension), or both;

We can conclude that these are the supported key exchange modes:

(EC)DHE, i.e. Diffie-Hellman ephemeral over either finite fields or elliptic curves
PSK-only
PSK with (EC)DHE

What is (EC)DHE?

Before we jump into (EC)DHE, we have to know what Diffie-Hellman (DH) is. In short, Diffie-Hellman is a key exchange method. The RFC mentions 3 types of Diffie-Hellman:

DH: static Diffie-Hellman (removed)
DHE: Diffie-Hellman ephemeral
ECDHE: Elliptic-Curve Diffie-Hellman ephemeral

The first type, static DH, is removed in TLS 1.3 due to the lack of perfect forward secrecy, which means all past messages can be decrypted if an attacker gets the long-term private key.

For the second and third types, unlike static DH which reuses the same private key, ephemeral Diffie-Hellman generates a new key pair for each connection and exchanges only the public values during the handshake. ECDHE differs from DHE in that it works over elliptic curves, which allows smaller keys at a similar security level.
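The ephemeral idea can be shown with finite-field Diffie-Hellman in plain Python. This is a toy sketch: the modulus 2^127 - 1 and the generator 3 are assumptions chosen for brevity and are far too weak for real use, where TLS relies on standardized groups. The function names are hypothetical.

```python
import secrets

P = 2**127 - 1  # toy prime modulus, NOT a real TLS group
G = 3           # toy generator


def ephemeral_keypair():
    """Generate a fresh private/public pair for one connection."""
    private = secrets.randbelow(P - 3) + 2  # random value in [2, P-2]
    public = pow(G, private, P)
    return private, public


def shared_secret(own_private: int, peer_public: int) -> int:
    return pow(peer_public, own_private, P)


def handshake():
    # Both sides create new keys for every connection and exchange
    # only the public halves (the "key shares").
    client_private, client_public = ephemeral_keypair()
    server_private, server_public = ephemeral_keypair()
    client_secret = shared_secret(client_private, server_public)
    server_secret = shared_secret(server_private, client_public)
    return client_secret, server_secret


client_secret, server_secret = handshake()
print(client_secret == server_secret)
```

Since the private values are discarded after each connection, stealing the server's long-term key later does not reveal past shared secrets. That is where forward secrecy comes from.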

Diffie-Hellman parameters

Compared to TLS 1.2, TLS 1.3 still uses Diffie-Hellman, but it limits the Diffie-Hellman parameters to a fixed set of named groups, since not all custom parameters are secure.

What is PSK (Pre-Shared Key)?

Unlike TLS 1.2, TLS 1.3 uses a PSK to let the client and server create a new connection based on their previous connection.

The top is the full handshake and the bottom is the shortened handshake with using PSK:

+  Indicates noteworthy extensions sent in the previously noted message.
*  Indicates optional or situation-dependent messages/extensions that are not always sent.
() Indicates messages protected using keys derived from a client_early_traffic_secret.
{} Indicates messages protected using keys derived from a [sender]_handshake_traffic_secret.
[] Indicates messages protected using keys derived from [sender]_application_traffic_secret_N.

How does it work?

After the client and server finish the first full handshake, the server sends the client a session ticket, which contains a PSK identity. Both sides derive the PSK from the secrets of that first connection.
When the client comes back and connects to the server again, it sends the PSK identity in its Client Hello to propose using the PSK for the new connection.
If the server accepts, the connection reuses the previous cryptographic context instead of going through the full handshake again. For example, the server doesn’t need to send its certificate to the client for verification.

This IBM documentation gives a simple step-by-step explanation of it. Be aware that resumption is not a totally new feature. In TLS 1.2, this is achieved by using session IDs and session tickets.
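The resumption flow above can be sketched as a lookup plus a proof of possession. This is a heavily simplified model: the class ResumptionServer and its methods are hypothetical, the PSK is generated randomly instead of being derived from the first connection's secrets, and the binder is a plain HMAC over the Client Hello bytes rather than the exact TLS 1.3 computation.

```python
import hashlib
import hmac
import os


class ResumptionServer:
    def __init__(self) -> None:
        self._psks = {}  # PSK identity -> PSK

    def issue_ticket(self):
        """After a full handshake: hand out an identity for a new PSK."""
        identity, psk = os.urandom(16), os.urandom(32)
        self._psks[identity] = psk
        # Simplification: in real TLS 1.3 the PSK itself is not sent.
        return identity, psk

    def handle_client_hello(self, hello: bytes, identity=None,
                            binder=None) -> str:
        psk = self._psks.get(identity)
        if psk is None:
            return "full handshake"  # unknown or missing identity
        expected = hmac.new(psk, hello, hashlib.sha256).digest()
        if binder is None or not hmac.compare_digest(binder, expected):
            return "abort"  # client could not prove it knows the PSK
        return "resumed"


server = ResumptionServer()
identity, psk = server.issue_ticket()

hello = b"ClientHello for the second connection"
binder = hmac.new(psk, hello, hashlib.sha256).digest()
print(server.handle_client_hello(hello, identity, binder))
```

The client only sends the identity and the binder. The server finds the matching PSK on its side, which is why no certificate exchange is needed on the resumed connection.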

How does PSK bring in 0-RTT?

If the client and server share a PSK, the client can send data on its first flight (“early data”). The client uses the PSK to encrypt the early data and to authenticate the server.

Yet, this handshake comes with security concerns of its own: early data is not forward secret and can be replayed by an attacker, as mentioned in the Cloudflare article.

2. Server Hello

Let’s get back to step 2 of the full handshake. Now that the server has received the Client Hello message, it sends back a Server Hello message.

The Server Hello message is similar to the one in TLS 1.2. The server checks whether the parameters sent by the client are acceptable and replies with its own key share. It also sends its certificate to the client for verification, which in TLS 1.3 is already encrypted.

3. Finished message

Finally, after verifying the server's Finished message, the client sends its own Finished message to the server. Then the client and server start exchanging encrypted application data.

TLS 1.3 improvement summary

Improve speed performance TLS 1.3 only takes 1-RTT (or even 0-RTT!), but TLS 1.2 takes 2-RTT.
Remove insecure cryptography RSA key exchange is removed because it has known vulnerabilities and lacks forward secrecy. Only (EC)DHE and PSK-based key exchange remain in TLS 1.3.

For more details about the improvements, Cloudflare’s article gives a good explanation.
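To see what the RTT savings mean in practice, the round trips can be added up before the first HTTP request leaves the browser. The sketch assumes one extra RTT for the TCP handshake and ignores DNS, packet loss and processing time. The names and the 50 ms example RTT are illustrative only.

```python
TCP_HANDSHAKE_RTTS = 1
TLS_HANDSHAKE_RTTS = {
    "1.2": 2,        # full TLS 1.2 handshake
    "1.3": 1,        # full TLS 1.3 handshake
    "1.3-0rtt": 0,   # TLS 1.3 resumption with early data
}


def round_trips_before_request(mode: str) -> int:
    """Round trips spent before the first HTTP request can be sent."""
    return TCP_HANDSHAKE_RTTS + TLS_HANDSHAKE_RTTS[mode]


def delay_before_request_ms(rtt_ms: float, mode: str) -> float:
    return rtt_ms * round_trips_before_request(mode)


for mode in TLS_HANDSHAKE_RTTS:
    print(mode, delay_before_request_ms(50, mode), "ms")
```

On a link with a 50 ms round trip, this gives 150 ms for TLS 1.2, 100 ms for TLS 1.3 and 50 ms with 0-RTT.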

Recap of the difference between TLS 1.2 and TLS 1.3

The left is TLS 1.2 using RSA based cipher suite, the right is TLS 1.3 using DHE.

Summary

I summarise my learning about TCP 3-way handshake and TLS handshake (1.2 and 1.3 version) in this article, including these key concepts:

When the browser starts connecting to the server, it goes through the TCP 3-way handshake and the TLS handshake, and then sends an HTTP request to the server
TCP 3-way handshake includes SYN, SYN-ACK, ACK, with sequence numbers that each side acknowledges by incrementing them by one
After the TCP 3-way handshake, if the website uses HTTPS, the browser and server start the TLS handshake
The main purposes of the TLS handshake are to negotiate the cryptography used to encrypt data and to verify the server based on its SSL certificate
RSA and DH are the common key exchange methods in the TLS 1.2 handshake. With the former, the client encrypts the pre-master secret with the server's public key, while with the latter, both sides compute it from DH public and private keys
TLS 1.3 major improvements include reducing RTT (from 2-RTT to 1-RTT) and enhancing security by removing insecure cryptography