Data Science without Seeing Data: How to Set Microsoft Open Source SEAL Parameters

Published in

Intuit Engineering

7 min readJun 16, 2022

By Boaz Sapir

As trusted stewards of our customers’ data, we’re proud to live by a long-held set of Data Stewardship Principles in service to 100M+ consumers and small businesses who use Intuit products and services, such as QuickBooks, TurboTax and Mint. We’re equally proud to innovate in our security practices with the same rigor as we innovate in our product development. This includes exploring advanced privacy technologies.

Fully homomorphic encryption (FHE) is a relatively new and promising technology with the potential to solve one of the biggest data privacy concerns: the need to decrypt sensitive data to use it for analysis. By using FHE to secure the data, it becomes possible to perform computations directly on the encrypted data to avoid the risk of exposure. In essence, data science without ever seeing the data.

At Intuit, for example, we’ve experimented with FHE to implement machine learning algorithms that use encrypted sensitive data without having to decrypt it. In recent years we’ve seen an uptick in industry adoption, and commercial FHE-based solutions are beginning to come to market, such as IBM’s HELayer library.

FHE is still an emerging technology, and writing code that correctly uses one of the available FHE libraries is a challenging task even for experienced software developers.

In this blog I’ll focus on Microsoft SEAL (simple encrypted arithmetic library), one of the most popular free and open source cross-platform software libraries developed by Microsoft Research for implementing FHE encryption schemes. I’ll explain the importance of correctly choosing the parameters for your specific computation when working with SEAL, and what can go wrong, then provide detailed instructions on how to do it right.

Experienced software developers who work on code that performs computations on sensitive data — and would like to experiment with FHE as an alternative to decrypting the data before using it — will benefit most from this post. A major use case is code that implements machine learning algorithms.

SEAL parameters and why they are important

The SEAL API is rather complex, and there are many details to consider when writing code that uses it, as explained in SEAL examples.

The first challenge, and the focus of this blog, is setting the SEAL parameters correctly. There are several numeric parameters that must be set within the code. Understanding the implications of these parameters, and the relations between them, requires detailed attention. The right values depend on the specific computation you want to perform over encrypted data, and there is no option to simply use the defaults. If you choose wrong values, most likely your computation will fail with a rather cryptic error message from SEAL.

In what follows, I’ll lay out an approach that will help you determine which parameter values are correct for your computation.

Important Notes:

SEAL supports 2 different encryption schemes. This blog is only about CKKS (homomorphic encryption scheme proposed by Cheon, Kim, Kim and Song), which is the one best suited for machine learning models.
SEAL is implemented in C++ and its native API language is C++. However, there are several wrappers available in other languages, including CSharp and Python, and the instructions below also apply to them

First thing’s first — understanding your multiplicative depth

The nature of FHE means that any computation you might consider implementing on top of SEAL is a sequence of additions and multiplications. The critical characteristics of your computation for choosing the right parameters is its multiplicative depth. The multiplicative depth of a computation is the length of the longest chain of consecutive multiplications in it (where the output of one multiplication is an input for the next one).

For example:

The length of the chain X1*X2*…*Xn is n.
The multiplicative depth of the computation X1*X2*…*Xn + X1*X2*…*Xm is max(m,n).

Note that the multiplicative depth depends on the implementation. X1*X2*X3*X4, for instance, is the same computation as (X1*X2)*(X3*X4), but the depth of the first implementation is 3, while the depth of the second implementation is 2.

The first step for deciding on SEAL parameter values is to analyze your computation and understand the exact multiplicative depth of the code that implements it. Since a smaller depth will give you more flexibility, and eventually better performance, it is recommended to look for an implementation with the smallest depth possible. For example, choose (X1*X2)*(X3*X4) over X1*X2*X3*X4.

Working within tight boundaries

The math behind CKKS is complex, and we will not get into the details here. What you do need to know is that every multiplication of CKKS-encrypted numbers adds some noise to the encrypted result. There is a limit to how much noise an encrypted number can have, and when there is a chain of multiplications, the noise keeps growing. This is why there is a limit on the multiplicative depth of a computation over encrypted data.

Here, we get to the main point: you can control the limit on multiplicative depth by your choice of SEAL parameters’ values. If you choose values that increase the limit, you will pay a price in performance. So, what you’re looking for is a limit that is just big enough to enable your computation’s multiplicative depth while sacrificing as little as possible in performance.

Setting the parameters

The multiplicative depth vs. performance tradeoff is controlled by two parameters:

poly_modulus_degree (the size of the irreducible polynomial)
coeff_modulus: (array of sizes of primes, in bits)

We will refer to the first and last primes as “outer primes” and to the rest of the primes as “inner primes.” In the following code example, the outer primes are of size 35 bits, while the inner ones are of size 25 bits. (SEAL generates random primes according to the sizes provided.)

Code example — setting poly_modulus_degree and coeff_modulus:

size_t poly_modulus_degree = 8192;parms.set_poly_modulus_degree(poly_modulus_degree);parms.set_coeff_modulus(CoeffModulus::Create(poly_modulus_degree, { 35, 25, 25, 25, 25, 25, 25, 25, 25, 35 }));

These are the implications you should be aware of:

The multiplicative depth limit is the number of inner primes.
The total bit count of all primes (sum of sizes in the array) is limited by poly_modulus_degree, according to Table 1 below.
Larger total size of primes, and (more significantly) bigger poly_modulus_degree, mean worse performance (i.e. longer runtime of the computation and bigger memory consumption).
Prime sizes determine the precision, before and after the decimal point.

Table1, Limit on total bit count of primes by poly_modulus_degree (from Microsoft APSI documentation)

Now, let’s summarize the rules you should follow:

Decide on the sizes of the primes.

Inner primes: minimal size is 20. For reasons explained in SEAL’s CKKS basic examples, all inner primes should have the same size.
Outer primes: bigger than inner primes by at least 10 bits, and both of them (first and last) should have the same size.
Precision before the decimal point (precision of integer part), in bits, is equal to the difference between sizes of inner and outer primes. For example, if the inner primes are of size 25 and the outer are of size 35, the precision is 10 bits. Meaning that if the inputs or outputs of your computation can be bigger than 2¹⁰, you should choose sizes with a greater difference to avoid corruption of the computation results.
Bit precision after decimal point is roughly equal to the difference between the size of the inner primes and the precision before the decimal point. In our example of sizes 25 and 35, it will be 25–10 = 15. If you need precision of more than 15 bits after the point, you need to enlarge the size of the inner primes (and, consequently, also the outer ones) to keep the difference between them.

2. Construct the coeff_modulus array with your multiplicative depth as the number of inner primes.
For our example, if the depth is 4, the array will be {35, 25, 25, 25, 25, 35}

3. Compute the total bit count of the primes.
In our example, it will be 2*35 + 4*25 = 170.

4. Find, in Table1, the smallest possible value for poly_modulus_degree that supports the primes bit count.
In our example, the value is 8192, which allows a bit count of up to 218 > 170.

Congratulations! You have chosen your values for the parameters.

Notes:

The size of the inner parameters should also be used as the power of 2 for calculating the scale parameter, which is passed as a parameter to CKKSEncode::Encode(). In our example, the scale should be 2³⁵.
If you use SEAL’s batching capability (operating simultaneously on an array of numbers), you should keep in mind that the maximum size of a batch is half of poly_modulus_degree. Therefore, changing the value of poly_modulus_degree may affect the design of your code. Also note that the ability to use bigger batches may mitigate the performance degradation that results from a bigger poly_modulus_degree.
Before starting to use SEAL, it is highly recommended to familiarize yourself with these SEAL examples. If you intend to use CKKS, focus on the CKKS basics examples.

Looking ahead

Using SEAL and other FHE libraries is still quite complex, requiring attention to many details, while overcoming performance challenges. However, the potential is huge, and we’ve seen significant progress in recent years.

At Intuit, we’re continuing to explore new ways to benefit from this technology, both internally as part of our data protection efforts, as well as externally for collaboration with other organizations. Research in this area is dynamic, so we’ve made it a high priority to stay up-to-date on the latest developments, and expect to make more progress as an industry in the near future.