# A Crash Course on MPC, Part 2

## Two-Party Computation using Beaver/Multiplication Triples

So far, I have discussed some of the multiple concepts that pop up in the MPC literature constantly, and at this point, hopefully, you feel comfortable with these terms. I have not presented any hint, however, as to why MPC is possible in a first place, and if you think about it, it’s not entirely trivial: How can the parties compute a function without leaking the data that this function is supposed to be computed on! The whole idea doesn’t make much sense. It only gets worse when you start considering notions like perfect security… It’s already hard to imagine the notion of MPC, how in earth can it be ‘unbreakable’?

My goal with this post is to convince you via an extremely simple protocol, that there is no contradiction on the idea of MPC at all, even with perfect security. Consider two parties, Alejandra and Bonifacio, who want to compute the following function: *z = f(x, y) = (x-y)*(x+y) mod 7*, where *x* is Alejandra’s input and *y* is Bonifacio’s. These values lie in the set *{0, 1, …, 6}*. The goal is to do this in such a way that Bonifacio does learn Alejandra’s input *x*, and the other way around, beyond what is leaked already by the output *z*.

An astute reader may notice that *z = x²-y² mod 7*, and therefore, since Bonifacio knows *y*, learning *z* allows him to learn *x² mod 7 *(also, Alejandra learns *y² mod 7*), which leaks a lot about *x*! There are two observations to be made, however. First, this doesn’t leak *all *of *x*: The value* x² mod 7*, unless equal to 0,* *still leaves the door open to whether *x *lies in *{1, 2, 3}* or *{4, 5, 6}*, since every square modulo 7 has two roots, one in each of these sets. The second observation is that, we just don’t care what is leaked by *z *about the parties’ inputs! Even if *z *happens to leak *x *or *y *in their entirety, that’s just not our problem¹ given that the security definition for MPC requires that the adversary doesn’t learn anything about the honest parties’ inputs, *beyond what is leaked already from the output*. You can’t really hide the inputs if the output, which is intended to be the result of the computation, leaks them.

# Hiding the Data

Recall that the task is to design a protocol, that is, a set of rules for Alejandra and Bonifacio to follow, in such a way that the function *z = (x-y)*(x+y)* is computed and nothing else is leaked. Alejandra has her input *x, *and Bonifacio has his input *y*. At first glance, it seems like, to compute *z*, the parties need to know both *x *and *y*, which are the values defining the result. With these two values known, the computation would proceed simply as follows (all the arithmetic from now on happens modulo 7):

- Compute
*u = x+y*and*v = x-y* - Multiply
*z = u*v*, and return*z*.

Now, I hope you are familiar with encryption schemes: These allow data to be transformed into a ‘hidden’ representation, in such a way that it can be recovered later on via some secret key. Imagine for a moment that we have way to encrypt the inputs as *Enc(x)* and *Enc(y)*, and that the encryption schemed used is such that, whenever you have encryptions of two values, you can add/subtract/multiply these two to get an encryption of the corresponding operation between the plaintexts. In this case, we could run the exact same steps as above, except we include an encryption step at the beginning to encrypt the inputs, and decryption step at the end to recover the output:

- Alejandra encrypts
*x*as*Enc(x)*, and Bonifacio encrypts*y*as*Enc(y).* - Compute
*Enc(u) = Enc(x)+Enc(y)*and*v = Enc(x)-Enc(y)* - Multiply
*Enc(z) = Enc(u)*Enc(v)* - Decrypt
*z*, and return*z*.

This is conceptually simple, and it hopefully convinces you that the idea of computing on ‘hidden’ data may not be as crazy as it sounds! You ‘only’ need a method of hiding the data in such a way that you can still perform operations on it… but… did we actually make progress? Let me say it once again: A method to hide the data in such a way that you can still compute on it… I think this still sounds as crazy as the idea of MPC itself, and indeed, it is! But it helps me convey the idea that, to compute a function securely, you ‘just’ have to switch to an alternative representation that doesn’t leak anything about the underlying values while you’re processing them.

Encryption schemes with the properties sketched above do exist! These are part of a field known as *Fully Homomorphic Encryption*, that has produced more and more efficient constructions during the last decade. However, even though using one of these schemes in the way I described above would yield a protocol for Alejandra and Bonifacio to learn *z*, that’s definitely not the way you want to do this in practice. The main reason is that these schemes are not as efficient as one would like.² Also, they are quite an overkill: The computation, except perhaps the last decryption step, can be carried out *locally *by anyone, even ‘outsiders’. In an MPC protocol we can exploit the fact that the parties can communicate and help each other carry out the computation. This will allow us to obtain enhanced efficiency, as I discuss next.

## Additive Secret Sharing

Instead of obtaining a ‘hidden’ representation of the data using an encryption scheme, we will *distribute *the data between Alejandra and Bonifacio in such a way that none of them can actually know what the data is, but the two together could figure it out if they wanted to. Notice that this very different to encrypting the data! When the data is encrypted, no one, except the owner of the decryption key, can know what the data is. This is more ‘transferable’, in the sense that this ciphertext can be passed around without leaking anything. In the distributed setting we’re discussing here, privacy is guaranteed as long as the adversary cannot get a hold on enough “pieces”, formally known as shares, but these pieces together completely define the underlying message.

To illustrate how this would work, suppose that Alejandra’s input is *x = 3*. Alejandra will sample a random number in {0, …, 6}, say, 4, and she will subtract this (modulo 7) from her input to get 6. Alejandra sends the first number, 4, to Bonifacio, and she keeps the other piece, or share, 6. These two values together, when added (again, modulo 7), lead to 3, which is Alejandra’s input. However, Bonifacio cannot tell what Alejandra’s input is just from the random number he got. Furthermore, in this case Alejandra knows her input from the beginning, but if we imagine that she forgets this value (and Bonifacio’s share too), she won’t be able to know what it is either from her share alone because it was obtained by subtracting from it a random value that she doesn’t know.

More abstractly, to distribute, or *secret-share*, a value *w* in *{0, …, 6} *between Alejandra and Bonifacio, the *dealer *(that is, the party knowing *x*) proceeds as follows:

- Sample
*w2*in*{0, …, 6}*uniformly at random - Set
*w1 = w-w2 mod 7* - Send
*w1*to Alejandra and*w2*to Bonifacio.

Whenever Alejandra and Bonifacio have shares of a value *w*, we will denote this by *[w]*. Think of this as the analogue of *Enc(w)* from above! It’s a hidden representation of *w*, except this time, it’s not just a string that hides *w*, but instead, it’s a distributed representation that guarantees that no individual party can learn anything about *w*, which is what we care about in our MPC setting.

With this new concept, we can translate the secure protocol from above using encryption, to our new representation. The only missing step is to show that values can be operated on, even when they are in this hidden representation. We will deal with that during the rest of this post.

- Alejandra
*secret-shares x*as*[x]*, and Bonifacio*secret-shares y*as*[y]*. - Compute
*[u] = [x]+[y]*and*[v] = [x]-[y]* - Multiply
*[z] = [u]*[v]* - Reconstruct
*[z]*, and return*z*.

The final step, reconstructing, involves the two parties sending each other their share, since the two shares together can be used to reconstruct the secret. This is the analogue to decrypting. To completely finish the description of the protocol, we need to show how to handle the operations *[u] = [x]+[y]*, *[v] = [x]-[y] *and *[z] = [u]*[v] *used above. That is, we need to show that there are methods to add, subtract and multiply values that are “hidden” with our secret representation, just like the hypothetical encryption scheme from above allowed us to do.

It turns out that addition and subtraction are actually quite simple. Think for example of the first part of the second step above. The parties have shares of *x* and shares of *y*, say that Alejandra has *x1* and *y1*, and Bonifacio has *x2* and *y2*, such that *x = x1+x2 *and *y = y1+y2*. To get shares of *x+y*, it suffices that Alejandra adds together *x1 *and *y1*, and Bonifacio adds *x2* and *y2*. This is because *x+y = (x1+y1)+(x2+y2)*, so the shares *x1+y1* and *x2+y2* add together to the right secret, *x+y*. A similar observation exists with subtraction. Notice that to perform these operations, Alejandra and Bonifacio don’t even need to talk! Interaction is costly, and the more we can reduce it, the better.

Now, let me be the first one who tells you this: **Multiplication will always be the problem**. Unfortunately, it’s not true that Alejandra and Bonifacio can locally multiply their shares of *u* and *v *to get shares of the product of *u*v*, as I hope you can verify as some simple exercise. In general, as we will see throughout this series, the “distribution” mechanisms we use will always enable additions, subtractions, and in general linear operations, to be performed locally, and multiplication will always be the operation that requires more care, and also the operation that requires communication (and it often represents the bottleneck). The rest of this post is completely devoted to handling multiplications.

# Multiplying Secret-Shared Values

Imagine Alejandra and Bonifacio have shared values *[u]* and *[v]*, and they want to obtain shares *[u*v]*, without leaking *u*, nor *v*, in the process. Very much like in step 3 in the protocol above. There are many (many!) ways of dealing with this problem, and in general, you should know that this is the bottleneck of most MPC protocols and a lot of recent research works that improve the efficiency of MPC protocols focus only on this concrete task. The one I will show here is based on the so called *multiplication triples* idea, due to Beaver, which constitutes a quite clever and simple way of tackling the problem.

## Offline-Online Paradigm

First, we split the whole execution into two parts, and *offline phase*, and an* online phase*. In the first phase, also known as the *preprocessing phase*, the parties execute everything that does not depend on the inputs. The second phase is the part of the computation that requires the inputs to be defined. To get an idea, setting up communication channels (e.g. via key exchange) can be considered part of the offline phase in some way. This separation is meaningful since, in many use-cases, one cares more about the performance from the time the inputs are provided to the time the output is obtained, and anything that happens before that is less vital in terms of efficiency.

Beaver’s idea is very simple and general, but it does not solve the problem of secure multiplication completely. In a nutshell, the idea is to *assume* that the parties spent enough time and effort in the offline phase producing certain shared data (that is independent of the inputs). Then, in the online phase, once the inputs are provided, this preprocessed material can be used to securely compute multiplications in a very simple way.

## Multiplication Triples

The secret-shared values that the parties need to preprocess are known as *multiplication triples*, or sometimes *Beaver triples*. This a tuple of shared values *([a], [b], [c])*, where *c=a*b* and *a, b* are uniformly random and unknown to any party. To illustrate a potential way to generate these *[a]* (and *[b]*), think of Alejandra and Bonifacio sampling each some *a1 *and *a2* at random, respectively. Then, *a* can be defined as *a1+a2*. Alejandra doesn’t know *a* since she doesn’t know *a2*, and Bonifacio doesn’t know *a* since he doesn’t know *a1*. Obtaining *[c]* is harder since this is again the problem of computing a shares of the product of two secret values.

Fortunately, the beauty of Beaver’s method is that we don’t really need to care about how these triples are generated! As long as they satisfy the conditions mentioned above, they can be used in the online phase to handle multiplications. This modularity has many advantages. First of all, the online phase we will describe below, which uses these triples to perform secure multiplication, enjoys good properties like high efficiency and perfect security.³ Given this, to improve the efficiency of the overall protocol it is typically enough to focus only on the concrete task of generating multiplication triples which, although not particularly easy, is much less daunting than the task of securely computing a function.

Another cool aspect of this separation is that the generation of the multiplication triples doesn’t even need to be carried out by a protocol, at least if you’re ok with having a “partially” trusted third party. For example, Alejandra and Bonifacio may consider a mutual friend, Camilo, whose only task is to sample a triple *(a, b, c)*, and distribute shares of these values to Alejandra and Bonifacio. We require Camilo to be honest in the sense that these triples must be correctly computed, and he shouldn’t tell Alejandra nor Bonifacio what the original values are. Once Camilo has distributed this preprocessing data, he is not needed anymore. Notice in particular that he is not trusted with the actual inputs from the parties, and also, he is not required to participate in the computation at all.

## Online Phase

Suppose that Alejandra and Bonifacio have shares of two values, *[u]* and *[v]*. Think of these as depending somehow on the input data, such as in the protocol we have considered so far where* u = x+y* and *v = x-y*. Also, assume they got a multiplication triple *([a], [b], [c])* somehow; again, we don’t actually care how. This multiplication triple is used to compute shares *[u*v]* as follows:

- The parties compute (locally)
*[d] = [u]-[a]*and*[e] = [v]-[b]*. - The parties reconstruct
*d*and*e*, turning them into publicly known values. - The parties compute (again, locally, using the fact that linear operations can be handled without communication)
*[u*v] = d*[b]+e*[a]+[c]+d*e.⁴*

There are two main things to argue about this protocol: Correctness (shares of the right value are produced) and privacy (nothing is leaked about *u *nor *v* in the process). The first one is actually very easy to show: Just replace *u* by *d+a* and *v* by *e+b* in *u*v*, and you will get the right-hand side of the equation being computed in step 3. For the privacy property, just notice that the only values that are ever revealed are *d* and *e*. Now, these do not leak anything about *u* and *v*, can you see why? (HINT: Recall that *a* and *b* are uniformly random and unknown to any party).

Notice that, although the protocol above is relatively simple, it does involve communication (in step 2, where the parties need to announce their shares to reconstruct *d* and *e*). We’ll never get around this: Secure multiplication will always require some sort of interaction.

Also, observe that the template above is quite general and it’s not really restricted to two parties. We’ll discuss later on how to use this paradigm in a more general setting.

# Is this Protocol Actively Secure?

I don’t know, you tell me :) No, but seriously, try to think of what can break if suddenly one of the parties starts to act arbitrarily. It turns out the protocol breaks. The simplest part to analyze is the final step of the overall protocol where the result *[z]* is reconstructed, but you get bonus points if you see what breaks specifically in the multiplication protocol.

We will see how to add active security in a subsequent post. Like what we saw here, it will be easier to first ensure active security for the online phase “only”, that is, assuming that anything we need from a preprocessing phase is given to us for free. Ensuring active security for the offline phase is quite a hassle, and I’m not even sure we’ll get to discuss that in this series.

# What About other Functions?

Finally, I want to discuss briefly what to do when you face a different function than *f(x, y) = (x+y)(x-y) mod 7*. For example, in the famous millionaires’ problem, Alejandra and Bonifacio want to compute the function that outputs a bit indicating which of the two inputs (representing salaries) is the largest one. In the private-set intersection problem, parties have sets an inputs and they want the output to be the intersection of these sets. How can the techniques I presented here be used to compute such functions?

Let me first give you the theoretician’s answer. In this post, more than seeing how to compute *f(x, y)* as defined above, we saw how to securely compute additions and multiplications (modulo 7). Well, it turns out that if you can compute additions and multiplications over a field (which is what integers modulo 7 are), then you can compute **any** function over that field (and hence any function in general by encoding data into that field). This is because every function taking tuples over the field and returning a field element can be expressed as a multivariate polynomial, which is just formed by additions and multiplications. Computer scientists don’t tend to talk about multivariate polynomials, but rather arithmetic circuits made up by addition and multiplication gates, but they’re really one and the same.

This answer may not be completely satisfactory since, in general, many functions that, under “normal” conditions, would be easy to compute, may become overly complex when expressed in this unnatural field-multivariate-polynomial representation. This is not the way to handle things in practice. The idea when deploying MPC for an actual meaningful computation is to express it, up to the extent as possible, as a combination of additions and multiplications. If it involves integer arithmetic then one can use a large enough modulus that avoids any wrap-around. If it involves arithmetic with real numbers then one can use fixed-point arithmetic, which requires performing operations like truncation and can be handled via specific protocols for this task. Same if the computation requires, for example, performing a comparison between two shared values or evaluating a mathematical function like *e^x*: These are computed using special-purpose “sub”-protocols.

Unfortunately, I won’t get the time to discuss the fundamental primitives that enable MPC for practical applications. For now, you will have to be satisfied with the first-grade operations of additions and multiplications. At the end of the day these are, theoretically-speaking, sufficient! :)

# Footnotes

- ¹ It’s not that we don’t care about this, it’s just that the (highly relevant) issue of ensuring the function being computed leaks little about the inputs is completely orthogonal to MPC. This problem must be taken seriously, and there are multiple ways to mitigate what the output of a function may leak about its inputs, using
*Differential Privacy*. - ² Don’t misinterpret me. Running a very complex computation ‘in FHE’ can be prohibitive (although the barrier is constantly shrinking thanks to progress in the field). However, for simple operations like the one we’re considering here, FHE schemes can work just fine. In fact, as we will see towards the end of these lectures, we use FHE in MPC a lot, but we try to minimize its use by restricting to certain specific parts in which it can perform very well.
- ³ This doesn’t mean the whole protocol is perfectly secure! Recall for example that perfectly secure protocols with dishonest majority cannot exist. In these cases, the offline phase must involve computational assumptions somehow, and in particular, it will usually be the bottleneck of the whole protocol.
- ⁴ This is actually an
*affine*combination since the last value being added is public. I leave it as an exercise to find a method for Alejandra and Bonifacio to add a public value to some given shared value in such a way that they get shares of the sum of these two. The method should not involve any communication.

# Notes

- Thanks to Wyatt Howe for pointing out a typo on the online use of the triples.