Vuvuzela Reproduction

Sam Ginzburg
Princeton Systems Course
May 24, 2019

Sam Ginzburg and Benjamin Kuykendall

Introduction

Vuvuzela [1] is a secure messaging system that protects client metadata. The system maintains very strong privacy guarantees compared to the anonymity system we learned about in class: even if all but one of the servers are controlled by a malicious party, Vuvuzela completely hides client data and metadata. The main challenge is to minimize latency while still maintaining these stringent security properties in a scalable system.

A demo of our system in action, run with μ = 100 for demonstration purposes

We tasked ourselves with reproducing the Vuvuzela system. We wanted to verify that latency scales linearly with the number of users; this is the property that makes Vuvuzela scalable. Further, we wanted to understand more precisely where the overhead in Vuvuzela comes from: how much do IO, mixing, and public key cryptography each contribute to latency?

System overview

The system uses three tools to ensure privacy: mixing and deaddrops to hide who is participating in a conversation, onion encryption to protect the content of messages, and noise to hide when a conversation begins and ends.

Imagine our system has users Alice, Bob, and Charlie. Alice and Bob are talking to each other; Charlie is not talking to anyone. Every round, all three users will send a message. This is important for security: if a user did not send a message, then an eavesdropper would discover that he or she was not talking to anyone.

Mixing and Deaddrops

Alice and Bob will agree on a secret deaddrop location (really just an integer) and attach it to their messages. Charlie chooses his own random deaddrop location as well. Then, a server sorts pointers to each message according to their deaddrop locations. If two pointers end up in the same location, the underlying messages are swapped. This procedure correctly swaps the messages of any conversing users, but it does not hide anything.

Messages being sorted into their deaddrop locations. m₀ and m₁ will be swapped
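
As a rough illustration, the deaddrop exchange can be sketched in a few lines of Rust. The `Envelope` type and function here are illustrative stand-ins, not our actual wire format:

```rust
use std::collections::HashMap;

/// A message tagged with its deaddrop location (illustrative type, not our wire format).
struct Envelope {
    deaddrop: u64,
    payload: Vec<u8>,
}

/// Group message indices by deaddrop; whenever exactly two messages share a
/// location, swap their payloads so each sender receives the other's message.
fn exchange_at_deaddrops(messages: &mut [Envelope]) {
    let mut buckets: HashMap<u64, Vec<usize>> = HashMap::new();
    for (i, m) in messages.iter().enumerate() {
        buckets.entry(m.deaddrop).or_default().push(i);
    }
    for indices in buckets.values() {
        if let [a, b] = indices[..] {
            // Two parties met at this deaddrop: swap their payloads in place.
            let (lo, hi) = (a.min(b), a.max(b));
            let (left, right) = messages.split_at_mut(hi);
            std::mem::swap(&mut left[lo].payload, &mut right[0].payload);
        }
        // A lone message (like Charlie's) is simply echoed back unchanged.
    }
}
```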

To start hiding who is talking to whom, we introduce some more servers. Before the messages reach the “deaddrop server”, they pass through two “mixing servers” that randomly permute the order of the messages. This way, even if the first or “head” server knows that Alice sent the first message and Bob sent the second, by the time the messages reach the deaddrop server, the order will be random.

The list of messages is shuffled by each server prior to the deaddrop

Once the deaddrop server swaps the messages, the messages are sent in the other direction, and the mixing servers invert the permutation. Once they reach the head server, they will be in the original order but with messages swapped as appropriate.
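
A minimal sketch of this round trip, with encryption and networking stripped away (the function names and the choice to track the permutation as a `Vec<usize>` are ours for illustration, not our actual server code):

```rust
use rand::seq::SliceRandom;

/// Forward pass: apply a fresh random permutation and remember it so the
/// backward pass can invert it. Real mixing servers also re-encrypt.
fn mix_forward<T>(items: &mut Vec<T>) -> Vec<usize> {
    let mut perm: Vec<usize> = (0..items.len()).collect();
    perm.shuffle(&mut rand::thread_rng());
    // Apply the permutation: output position i takes the element at perm[i].
    let mut staged: Vec<Option<T>> = items.drain(..).map(Some).collect();
    *items = perm.iter().map(|&src| staged[src].take().unwrap()).collect();
    perm
}

/// Backward pass: the element now at position i goes back to position perm[i],
/// so the head server sees replies in the original order.
fn mix_backward<T>(items: Vec<T>, perm: &[usize]) -> Vec<T> {
    let mut restored: Vec<Option<T>> = (0..items.len()).map(|_| None).collect();
    for (i, item) in items.into_iter().enumerate() {
        restored[perm[i]] = Some(item);
    }
    restored.into_iter().map(Option::unwrap).collect()
}
```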

This step successfully hides who is talking to whom, assuming that the mixing servers cannot read the deaddrop locations. This will be addressed in the next step.

Onion encryption

Onion encryption allows users to securely pass messages through the mixing servers. It is called onion encryption because the repeated encryptions can be visualized as layers around the message. In order to implement onion encryption, we need a primitive called key exchange, which lets a user and a server share a derived key.

Key exchange lets Alice and Bob both derive the same key using different information

To encrypt her message, Alice samples a fresh public/private key pair. She combines her private key with the public key associated with the third and final server. She uses the derived key to encrypt her message and attaches her public key to it.

One layer of onion encryption

When the third server receives this message, it combines Alice’s public key with its private key to obtain the same derived key, which it can use to decrypt the message.

One layer of onion decryption
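
The essential property can be demonstrated with a toy Diffie-Hellman exchange over a small prime group. This is purely illustrative and not secure; the real system uses a proper key exchange from the crypto library:

```rust
/// Toy Diffie-Hellman: combining (my secret, their public) on either side
/// yields the same derived key. NOT secure; for illustration only.
const P: u64 = 2_147_483_647; // small prime modulus (2^31 - 1)
const G: u64 = 5;             // public base element

fn pow_mod(mut base: u64, mut exp: u64, modulus: u64) -> u64 {
    let mut acc = 1u64;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % modulus;
        }
        base = base * base % modulus;
        exp >>= 1;
    }
    acc
}

fn main() {
    let alice_secret = 123_456_789u64;  // Alice's private key
    let server_secret = 987_654_321u64; // server's private key
    let alice_public = pow_mod(G, alice_secret, P);   // attached to her message
    let server_public = pow_mod(G, server_secret, P); // published ahead of time

    // Each side combines its own secret with the other's public value...
    let key_at_alice = pow_mod(server_public, alice_secret, P);
    let key_at_server = pow_mod(alice_public, server_secret, P);
    // ...and both arrive at the same derived key.
    assert_eq!(key_at_alice, key_at_server);
}
```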

The trick of onion encryption is to apply the above scheme recursively: first encrypting the message for the third server, then for the second, and finally for the first. Then, as each server processes the messages, it peels off its layer of encryption. On the return route, layers are re-applied similarly. This step hides both the content of the messages and the deaddrop locations.

Onion encryption can be viewed as nested private channels: the user speaks to the last server in the chain without revealing anything to the intermediate servers
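
The layering itself is easy to sketch. In this illustrative example, `encrypt` and `decrypt` stand in for a real cipher keyed by each per-server derived key; the XOR “cipher” exists only so the example runs and is not real encryption:

```rust
/// Placeholder symmetric cipher: XOR with a one-byte key. NOT real encryption.
fn encrypt(key: u8, data: &[u8]) -> Vec<u8> {
    data.iter().map(|b| b ^ key).collect()
}
fn decrypt(key: u8, data: &[u8]) -> Vec<u8> {
    encrypt(key, data) // XOR is its own inverse
}

fn main() {
    // Derived keys the client shares with servers 1, 2, and 3.
    let keys = [0x17u8, 0x2a, 0x3c];
    let message = b"meet at deaddrop 42".to_vec();

    // Client: encrypt for server 3 first, then 2, then 1, so the outermost
    // layer is the one the first server can remove.
    let mut onion = message.clone();
    for &k in keys.iter().rev() {
        onion = encrypt(k, &onion);
    }

    // Servers 1, 2, 3: each peels its own layer as the message passes through.
    for &k in keys.iter() {
        onion = decrypt(k, &onion);
    }
    assert_eq!(onion, message);
}
```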

Noise for differential privacy

It might seem like the first two steps suffice: after all, onion encryption hides what Alice is saying, and mixing with deaddrops hides that she is saying it to Bob. However, one thing is still leaked. The deaddrop server can deduce the number of active conversations by counting the number of deaddrop locations containing two messages. This might seem like a very minor issue, but it can lead to very real attacks. Imagine an adversary that can cause network partitions. He can cut Alice off from the servers. If the number of conversations goes down, then he concludes that Alice was indeed in the middle of a conversation.

Plot of true number of conversations, clearly revealing that Alice and Bob stopped communicating in round 4

To prevent this attack, we add random noise: each server injects fake conversations (pairs of dummy messages at the same deaddrop) and fake lone messages. If the level of cover traffic is high enough, it should be hard to tell the end of Alice’s conversation from random noise.

Plot of noisy number of conversations, which seems to reveal less

In fact, there is a formal framework called differential privacy that tells us exactly how much noise the servers have to add. Thus we get a formal proof that almost no information about whether or not Alice is conversing is leaked.
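
As a sketch of what this looks like for a server, the helper below draws a rounded, non-negative Laplace sample with mean μ and scale b (the noise shape used for this kind of differential privacy); the function name and the use of the rand crate’s 0.8-style `gen_range` are assumptions for illustration:

```rust
use rand::Rng;

/// Draw how many fake deaddrop accesses to inject this round: a Laplace
/// sample with mean `mu` and scale `b`, rounded and clamped at zero.
fn noise_count(mu: f64, b: f64) -> u64 {
    let u: f64 = rand::thread_rng().gen_range(-0.5..0.5);
    // Inverse-CDF sampling of the Laplace distribution.
    let laplace = mu - b * u.signum() * (1.0 - 2.0 * u.abs()).ln();
    laplace.round().max(0.0) as u64
}

fn main() {
    // One draw for fake conversations (dummy pairs), one for fake lone messages.
    let fake_pairs = noise_count(100.0, 10.0);
    let fake_singles = noise_count(100.0, 10.0);
    println!("adding {} dummy pairs and {} dummy singles", fake_pairs, fake_singles);
}
```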

Implementation

We reproduced the core Vuvuzela functionality in about 3,000 lines of Rust. For basic cryptographic functionality, we used the Ring crate, which in turn is built on top of Google’s BoringSSL fork of OpenSSL. For communication, we used the Tarpc framework. Our source code is available on GitHub.

For each server, we implemented the Vuvuzela protocol as a sequence of futures executing on top of a tokio threadpool. The RPC interface the servers use to communicate is more or less self-explanatory: there are RPCs for forwarding batches of messages between servers and for ending rounds. The head server exposes a ‘put’ RPC, which returns a future that resolves to the round response on the client. We discovered only near the end of the project, during performance analysis, that our RPC library (tarpc) spawns one physical thread per connection.
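
For a sense of the shape of that interface, here is a rough sketch using tarpc’s attribute-style service macro. The method names and types are illustrative, and this modern macro style may differ from the tarpc version we actually used:

```rust
// Rough sketch of the RPC surface; names and types are illustrative.
#[tarpc::service]
pub trait VuvuzelaRpc {
    /// Forward a batch of onion-encrypted messages to the next server in the chain.
    async fn forward(batch: Vec<Vec<u8>>);
    /// Signal the end of a round so servers can unwind their permutations.
    async fn end_round(round: u64);
    /// Client-facing call on the head server: submit a message for this round
    /// and eventually receive the round's response.
    async fn put(onion: Vec<u8>) -> Vec<u8>;
}
```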

On the client side, we implemented two clients. The first, for evaluation purposes, assumes the identities of many users and sends a random message from each to the server; this lets us emulate many users in a single process. The second, for demonstration purposes, is closer to what a user would need in practice: it assumes a single identity and provides a user interface for message input and output. Neither client manages cryptographic keys; key distribution must be handled outside of the Vuvuzela system. For our purposes, we generated all keys prior to initialization and cloned them to each evaluation machine. We also omit “dialing”: users must agree when to communicate outside of the system.

Evaluation

We had two main goals: to show linear scaling, and to understand the main contributors to system latency.

Setup

As described above, a specialized client was written to simulate large amounts of traffic. This client was run on a single laptop computer. Client latency is minimal; the onion encryption and decryption performed by the client take only ~10 milliseconds. Thus for the purposes of our experiments, we only analyze server latency. This allows accurate figures using only simulated users.

Our server processes run on a single computer. Because the servers operate in sequence, the system should be no less performant than if each process had its own box. Consequently, we also ignore network latency. However, this is easy to account for: in a system with three servers and per-link network latency t, each round crosses six links (client to head server, two hops to the deaddrop server, and the same three hops back), adding 6t of latency. We also do not account for the case when the network is saturated; since the system does nothing to mitigate such situations, that test case would be uninteresting.

Linear scaling

The first result from the original paper we sought to replicate was Figure 9.

Data from the original Vuvuzela paper [1] (not our data)
Benchmarking data for end-to-end round latency

This figure demonstrates linear scaling with respect to the number of users connected to the system. Linear scaling is possible because the amount of noise in the system is constant and does not need to grow with the number of connected users. The authors of the Vuvuzela paper ran their benchmarks with b = 0 in order to simplify the results and make a nicer graph. Since our hardware was not as powerful as theirs (a 4-core laptop vs. four 36-core servers with 60 GB of RAM), we ran our benchmarks with μ = 100, 200, and 300, exactly 1000x less than the original paper. Since noise scales linearly with μ, we were still able to show linear scaling, as seen in our results above.

Another key difference is that we used significantly smaller user counts. This is largely due to unforeseen implementation details in the RPC library we used, which created one thread per connection (user). This caused stability issues in our head server, which made benchmarking with more than a few hundred concurrently connected users difficult. We are still able to demonstrate linear scaling over the user counts we could support. We also know that we can run our system with much larger μ values, such as μ = 100,000. The main reason we did not use these larger values to generate our graph is that, at that point, we are unable to add enough user load to show linear scaling: the overhead of sending hundreds of thousands of messages would dwarf the overhead of 200 users, so the graphs would have looked fairly flat.

Latency breakdown

The next major part of our evaluation was to break down round latency and identify which code caused the most overhead. As seen above, we found that network IO formed the largest share of overhead, with almost 57% of end-to-end execution time spent waiting on network IO. Generating noise during forwarding operations, along with deriving keys and decryption, took the next largest chunk, about 41% in total. We arrived at conclusions similar to the original authors’: cryptographic operations and inter-server communication accounted for 98% of end-to-end round time.

Conclusion

In conclusion, we believe we replicated the Vuvuzela system adequately. While our choice of libraries led to unforeseen performance impacts that limited the number of concurrently connected users, we were able to implement the Vuvuzela protocol and to reproduce the performance characteristics of the original system.

References

[1] Jelle van den Hooff, David Lazar, Matei Zaharia, and Nickolai Zeldovich. Vuvuzela: Scalable private messaging resistant to traffic analysis. In Proceedings of the 25th Symposium on Operating Systems Principles, SOSP ’15, pages 137–152. ACM, 2015.
