Benchmarking QUIC

Summer 2020

Introduction

General overview of the differences between QUIC and TCP. It’s important to note that QUIC is built on top of UDP, which means it can be implemented in user space as well as in the kernel. The image is slightly outdated, since TLS 1.3 is now being adopted for both TCP and QUIC. (Image from the Uber blog post linked above.)
This is a basic overview of how a TCP packet traverses a Linux machine. For packets without any application data (e.g. SYN, empty ACKs, CLOSE, WND_UPDATE, etc.), the kernel does not need to copy data from kernel socket buffers to protocol receive buffers. Since QUIC is built on top of UDP, all packets are copied to user space, so QUIC requires more data copying (which is expensive). Here is an interesting video on Facebook’s experience with connection-establishment performance between QUIC and TCP in production.
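To make the data-copy difference concrete, here is a minimal sketch (mine, not from the original setup) of the kind of user-space UDP receive loop a QUIC library sits on top of. Every incoming QUIC packet, even a handshake or ACK-only one, must be copied out of the kernel into an application buffer before the QUIC stack can do anything with it, whereas a TCP stack consumes such control packets entirely in the kernel. The port number and buffer size below are arbitrary.

    import socket

    # A user-space QUIC stack ultimately sits on a plain UDP socket like this.
    # The port and buffer size are arbitrary, for illustration only.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 4433))

    while True:
        # Every incoming QUIC packet (Initial, Handshake, ACK-only, anything)
        # arrives as a UDP datagram and is copied into this user-space buffer.
        # A TCP stack, by contrast, can consume pure ACKs and SYNs in the kernel.
        data, addr = sock.recvfrom(65535)
        # ...a QUIC library would now parse, decrypt, and update the
        # connection state associated with addr...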

Benchmarking QUIC Clients

Preface

Setup

  • Google Chrome Canary (H2 + H3)
  • Curl (H2)
  • Ngtcp2 (H3)
  • Facebook Proxygen (H3)
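To give a flavor of how the command-line numbers can be collected (the exact invocations from the original runs are not reproduced here), the sketch below times repeated curl fetches over H2 and, on a curl build with HTTP/3 support, over H3, then reports the mean and standard deviation. The URL is a placeholder; the ngtcp2 and Proxygen clients were driven analogously with their own example binaries.

    import statistics
    import subprocess
    import time

    URL = "https://example.com/speedtest-10MB"  # placeholder endpoint
    ITERATIONS = 15                             # matching the ~10-20 iterations per data point used below

    def time_fetch(extra_args):
        """Fetch URL once with curl, discard the body, and return elapsed seconds."""
        start = time.monotonic()
        subprocess.run(["curl", "-s", "-o", "/dev/null", *extra_args, URL], check=True)
        return time.monotonic() - start

    for label, args in [("curl H2", ["--http2"]),
                        ("curl H3", ["--http3"])]:  # --http3 needs an HTTP/3-enabled curl build
        samples = [time_fetch(args) for _ in range(ITERATIONS)]
        print(f"{label}: mean={statistics.mean(samples):.3f}s "
              f"stdev={statistics.stdev(samples):.3f}s")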
Loading a 10 MB page in the foreground
Loading a 10 MB page in the background. Crazy difference! Also note that the protocol being used here is H2, so this ‘feature’ applies regardless of the network protocol used.
TCP window size over time when loading a 10 MB web page in the Chrome foreground
TCP window size over time when loading a 10 MB web page in the Chrome background. Note the X-axis scale difference!
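For reference, a window-size-over-time plot like the two above can be rebuilt from a packet capture. The sketch below is my own; it assumes a capture file name and that the server listens on port 443, and uses tshark to dump the receive window Chrome advertises on the packets it sends.

    import subprocess

    PCAP = "chrome_10MB.pcap"  # assumed capture file name

    # Packets destined for port 443 are sent by the client, so they carry
    # Chrome's advertised receive window. tcp.window_size is the scaled
    # (calculated) window as dissected by Wireshark/tshark.
    out = subprocess.check_output(
        ["tshark", "-r", PCAP, "-Y", "tcp.dstport == 443",
         "-T", "fields", "-e", "frame.time_relative", "-e", "tcp.window_size"],
        text=True)

    for line in out.splitlines():
        t, win = line.split("\t")
        print(f"{float(t):8.3f}s  window={win} bytes")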

Network Simulation

Results

Bandwidth

Graph showing page-load times under 0% loss, 0 ms added delay, and 10 MB bandwidth. The Y-axis represents elapsed time; the X-axis represents the distinct server endpoints tested. Each dot on the graph represents the mean page-load time over ~10–20 iterations, and the arrows represent the standard deviation. The dotted line is purely a visual guide. Firefox is left out for being difficult to work with.
Same network conditions as the graph above. The only difference is the set of endpoints shown on the X-axis, which are much larger than the endpoints in the graph above.
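The loss, delay, and bandwidth conditions used throughout these experiments can be emulated on Linux with tc and netem; whether that is exactly what was used here is not shown, so treat the sketch below as one common way to set up such conditions. The interface name and the 10 Mbit/s interpretation of the rate cap are assumptions, and the commands require root privileges.

    import subprocess

    IFACE = "eth0"  # assumed network interface

    def shape(loss_pct=0, delay_ms=0, rate="10mbit"):
        """Apply (or replace) a netem qdisc emulating loss, delay, and a rate cap."""
        subprocess.run(
            ["tc", "qdisc", "replace", "dev", IFACE, "root", "netem",
             "loss", f"{loss_pct}%", "delay", f"{delay_ms}ms", "rate", rate],
            check=True)

    def clear():
        """Remove the emulation and restore the default qdisc."""
        subprocess.run(["tc", "qdisc", "del", "dev", IFACE, "root"], check=True)

    # Baseline condition from the graphs above: no loss, no added delay, capped rate.
    shape(loss_pct=0, delay_ms=0, rate="10mbit")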
Wireshark capture of QUIC packets for 5 consecutive requests to speedtest-0B using Chrome. As you can see, the QUIC handshake is performed only once, at the top of the capture. The same DCID (Destination Connection ID) is also used for all requests; the DCID is intended to be a random sequence of bytes chosen for each QUIC connection.
Wireshark capture of QUIC packets for 5 consecutive requests to speedtest-0B using ngtcp2. You can see the QUIC handshake being performed multiple times (once for each request), and different DCIDs are used throughout the packet capture, which indicates separate connections.
Wireshark capture of QUIC packets for 5 consecutive requests to speedtest-0B using Proxygen. Again, we see multiple sets of Initial packets and different DCIDs in the capture, which shows that multiple connections were established.
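A quick way to check for connection reuse (or the lack of it) across a capture without scrolling through Wireshark is to dump the DCIDs carried by Initial packets: one distinct DCID suggests a single reused connection, several suggest a fresh handshake per request. A sketch, assuming a capture file name and a Wireshark build that dissects QUIC:

    import subprocess

    PCAP = "client_5_requests.pcap"  # assumed capture file name

    # quic.long.packet_type == 0 selects Initial packets; quic.dcid is the
    # Destination Connection ID as dissected by Wireshark.
    out = subprocess.check_output(
        ["tshark", "-r", PCAP, "-Y", "quic.long.packet_type == 0",
         "-T", "fields", "-e", "quic.dcid"],
        text=True)

    dcids = {line.strip() for line in out.splitlines() if line.strip()}
    print(f"{len(dcids)} distinct DCID(s) seen in Initial packets:")
    for d in sorted(dcids):
        print(" ", d)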

Delay

As stated before, Chrome has an advantage over the command-line clients since it reuses the same connection. This explains why Chrome performs considerably better on small web pages compared to the other clients. Given that the initial handshake for QUIC or TCP takes even longer when delay is introduced, the gaps shown in this graph are to be expected.
We can see the effects of the handshake delay for large web pages too, albeit less pronounced due to the scale of the graph.
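To put rough numbers behind that reasoning: a fresh HTTPS fetch over TCP with TLS 1.3 spends about two round trips on handshakes before the request can go out, a fresh QUIC connection about one, and a reused connection none. The sketch below is my own arithmetic with an assumed added delay, not a figure from the experiments.

    # Rough handshake-cost model; the 100 ms added one-way delay is an assumption.
    added_delay_ms = 100
    rtt_ms = 2 * added_delay_ms  # ignoring the path's base RTT

    handshake_rtts = {
        "new TCP + TLS 1.3 connection": 2,  # TCP handshake, then TLS handshake
        "new QUIC connection": 1,           # combined transport + crypto handshake
        "reused connection (Chrome)": 0,    # no handshake before the request
    }

    for label, rtts in handshake_rtts.items():
        print(f"{label}: ~{rtts * rtt_ms} ms of handshake latency per request")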

Loss

A couple of interesting things stand out in this graph. First, ngtcp2 shows an absurd amount of variation; its std. dev. arrows do not even fit on the graph. Curl’s performance is also inconsistent. Lastly, Proxygen seems to catch up to Chrome in the face of loss. It might be worth examining Proxygen’s loss-handling logic in the future to see how it can match Chrome’s performance despite undergoing a handshake on each iteration.
The visualization above shows the sequence of packets between the ngtcp2 client (left) and the Facebook CDN server (right). We can see that ngtcp2 undergoes exponential backoff when resending the handshake packet. In this case, the client did not receive an ACK from the server for any of its 8 handshake attempts, so the default timeout was hit and the connection was aborted.
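For intuition, the sketch below prints an exponential backoff schedule of the sort described above. The initial probe timeout is an assumed round number, not ngtcp2’s actual default; the point is that each unanswered handshake roughly doubles the wait, so a handful of lost handshake packets quickly pushes a connection toward its timeout.

    # Illustrative backoff schedule; the 1-second initial timeout is an assumption.
    initial_timeout_s = 1.0
    elapsed = 0.0
    for attempt in range(1, 9):  # 8 handshake attempts, as in the capture above
        wait = initial_timeout_s * 2 ** (attempt - 1)
        print(f"attempt {attempt} sent at t={elapsed:.1f}s; "
              f"retransmit if no ACK within {wait:.1f}s")
        elapsed += wait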
The std. dev. may look just as large as in the previous graph, but notice the difference in Y-axis scale. Performance is much more consistent here.
In this graph, Curl continues to perform worse, while the rest of the implementations perform about equally.

Loss + Delay

Curl continues to perform poorly in the face of loss. This is the first time we see Chrome H2 perform poorly relative to other clients.
We finally see an instance where all QUIC implementations clearly perform better than any TCP implementation!

Conclusion
