Published in Tenable TechBlog

Let’s Reverse Engineer Discord

Quick note: This research was a joint effort between Joseph Bingham and David Wells.

The Discord chat app was the target of our latest research project. While this blog will not cover any exploits, we will share what we learned about the Discord call protocol, our insights into the audio/video privacy of Discord, and how we were able to prove that Discord audio/video calls are decrypted and inspected by Discord servers.

Reverse Engineering Discord

The capture below shows an RTP packet from a Discord client that is sending audio data. The RTP payload contains the encrypted audio data (Salsa20-encrypted), an authentication tag, and a packet sequence number, which is used as the nonce.
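To make the packet layout concrete, here is a minimal sketch of splitting a captured packet into the standard 12-byte RTP fixed header (as defined in RFC 3550) and the bytes that follow it. The names `RtpHeader` and `parse_rtp` are illustrative and not taken from Discord's client. How the ciphertext, authentication tag, and nonce are arranged after the header is specific to Discord and not spelled out here, so the sketch treats everything after the header as opaque bytes.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int


def parse_rtp(packet: bytes):
    """Split a packet into its RTP fixed header and the remaining bytes."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    header = RtpHeader(
        version=b0 >> 6,              # top two bits
        padding=bool(b0 & 0x20),      # P bit
        extension=bool(b0 & 0x10),    # X bit: an extension header is present
        csrc_count=b0 & 0x0F,         # number of 32-bit CSRC identifiers
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )
    offset = 12 + 4 * header.csrc_count
    if len(packet) < offset:
        raise ValueError("packet truncated inside the CSRC list")
    # Everything after the fixed header and CSRC list is returned untouched.
    return header, packet[offset:]


# Hypothetical sample packet, not a real Discord capture.
sample = struct.pack("!BBHII", 0x90, 0x78, 1234, 5678, 0xDEADBEEF) + b"opaque"
header, rest = parse_rtp(sample)
```

The sequence number recovered here is the field that the payload's nonce is tied to, which is why a receiver needs the cleartext header before it can decrypt anything.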

If we decrypt the payload section, we can see the audio data, which we found to be encoded with the Opus codec. In the case of video streaming, this would be H.264 MPEG codec data.

Once we grasped the server interactions, protocol, and encryption methods, we were able to develop a minimal “mock” Discord client, which allowed us to initiate calls and properly fuzz protocol data to find bugs. While we won’t be going over bugs, one interesting side effect we noticed was evidence that Discord servers decrypt and inspect all users’ audio/video data server-side in real time.

Discord Inspects Users’ Traffic

Our Testing

We tested this by crafting a malformed audio packet in our “mock” Discord client (Client 1), properly encrypting it, and sending it along with our existing mock audio stream. All “valid” audio data passed through the server to Client 2. However, we witnessed the server drop the malformed audio packet (which was also encrypted), so it was never delivered to Client 2.

Below, we can see our mock Discord client sending a valid RTP one-byte extension header along with Opus audio data to our remote Discord client.

After encrypting the entire stream and sending it with an RTP header, we can see this packet received and decrypted by our remote Discord client, which is running under a debugger.

Back in our mock Discord client, we now malform this data by changing the length field byte in the RTP one-byte extension header to a length larger than expected.
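The kind of mutation described above can be sketched against the generic one-byte extension header format from RFC 8285: a `0xBEDE` marker, a 16-bit length counted in 32-bit words, and then elements that each start with a byte holding a 4-bit ID and a 4-bit length (stored as the data length minus one). This is only an illustration under assumptions: the helper names are hypothetical, the sample element is made up, and the strict parser stands in for whatever validation Discord's servers actually perform, which we cannot see.

```python
import struct

ONE_BYTE_PROFILE = 0xBEDE  # marker for the one-byte header form (RFC 8285)


def build_one_byte_extension(elements):
    """Serialize (id, data) pairs into a one-byte extension header block."""
    body = b""
    for ext_id, data in elements:
        if not 1 <= ext_id <= 14:
            raise ValueError("element ID must be in 1..14")
        if not 1 <= len(data) <= 16:
            raise ValueError("element data must be 1..16 bytes")
        body += bytes([(ext_id << 4) | (len(data) - 1)]) + data
    body += b"\x00" * (-len(body) % 4)  # pad to a 32-bit boundary
    return struct.pack("!HH", ONE_BYTE_PROFILE, len(body) // 4) + body


def parse_one_byte_extension(block):
    """Strictly parse a block; raise ValueError if a length does not fit."""
    if len(block) < 4:
        raise ValueError("extension block too short")
    profile, words = struct.unpack("!HH", block[:4])
    if profile != ONE_BYTE_PROFILE:
        raise ValueError("not a one-byte extension header")
    body = block[4:4 + 4 * words]
    if len(body) != 4 * words:
        raise ValueError("extension block truncated")
    elements, i = [], 0
    while i < len(body):
        if body[i] == 0:          # padding byte
            i += 1
            continue
        ext_id = body[i] >> 4
        if ext_id == 15:          # reserved ID: stop processing
            break
        length = (body[i] & 0x0F) + 1
        data = body[i + 1:i + 1 + length]
        if len(data) != length:
            raise ValueError("element length overruns the extension block")
        elements.append((ext_id, data))
        i += 1 + length
    return elements


# Hypothetical element: ID 1 carrying a single data byte.
valid = build_one_byte_extension([(1, b"\x7f")])

# Malform it: keep the ID but claim 16 data bytes instead of 1.
malformed = bytearray(valid)
malformed[4] = (1 << 4) | 0x0F
malformed = bytes(malformed)
```

Calling `parse_one_byte_extension(valid)` returns the single element, while the same call on `malformed` raises `ValueError`. A validator can only make that distinction if it sees the plaintext, which is the point the test relies on.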

After sending this encrypted data to our remote Discord client, we can no longer see the packet being received under the debugger.

This effect can also be seen in Wireshark: fewer packets reach our remote Discord client than were sent, which certainly means there is some MITM decryption, validation, and dropping occurring at Discord servers.
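The same comparison can be made outside Wireshark by diffing the RTP sequence numbers sent by one client against those seen by the other. The sketch below assumes the sequence numbers have already been exported from each capture into plain lists; the function name and all sample values are hypothetical and are not measurements from our testing.

```python
def dropped_sequences(sent, received):
    """Return the sent sequence numbers that never reached the receiver."""
    received_set = set(received)
    return [seq for seq in sent if seq not in received_set]


# Hypothetical example: five packets sent, one of them malformed.
sent = [100, 101, 102, 103, 104]
malformed = {102}
received = [100, 101, 103, 104]

dropped = dropped_sequences(sent, received)
only_malformed_dropped = set(dropped) == malformed
```

If the set of dropped packets matches the set of malformed packets across repeated runs, ordinary packet loss becomes an unlikely explanation for the missing traffic.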

We dispatched this malformed audio packet at various points during a voice call and consistently saw every malformed audio packet dropped by the server, which means that Discord servers are actively decrypting and inspecting all audio/video communications in real time, and not just some.

Final Thoughts

Whatever the reason for this inspection, it must be very important to Discord, given the latency and computational overhead required for its servers to decrypt and inspect all audio/video communications in real time (two and a half million concurrent voice users a day). Discord provides a policy regarding user privacy, which explains that it may capture “transient VOIP data”. While it’s a bit unclear what this may entail, our research shows that this “data” includes all voice and video data. Even if Discord’s original intention was to hold user privacy in high regard, they have recently finished their Series F funding round, raising more than $150M from several diverse corporations whose intentions with the platform are unknown.