Announcing Concrete Numpy
Zama is very excited to announce the public beta release of Concrete Numpy. Building on the efficiency, usability, and simplicity of the Concrete library, we are releasing this open source compiler together with a Numpy frontend.
Learning from our past prototype
In October 2021, we showcased our HNP (Homomorphic Numpy) prototype. It allowed data scientists without any prior knowledge of cryptography to automatically turn Numpy functions into their FHE equivalents. Concrete Numpy incorporates many of the things we learned from the original HNP prototype.
HNP: A great API to build encrypted programs
HNP (see examples) is far more user friendly than APIs aimed at cryptographers. Install the tool, open a Jupyter notebook, and import the HNP package in Python. Then write Numpy as you normally would. No prior understanding of cryptography is required to create an equivalent program running over encrypted values. You only have to provide the shapes and data types of the program's inputs, and state whether each input is clear or encrypted at runtime.
We believe that over-complex systems can easily introduce vulnerabilities, which is why we provide an automatically secured configuration. Of course, if you are curious, you can still check out Homomorphic Encryption 101.
Concrete Numpy: A fully fledged toolkit for data scientists
With our new package, more information is inferred from the function itself, making things even more user-friendly. How? In HNP, we used a dataset in the compilation call. The dataset is a set of representative entries to the function that lets the compiler know the typical dynamic range of the data in each of the intermediate computations. Knowing this range, Concrete Numpy is able to compute the appropriate FHE parameters. Our package also uses this (unlabelled) dataset to automatically recover information like the shapes or bit widths of inputs. You only have to define what is encrypted and what is clear, and that’s it!
The call is as simple as a single function invocation. You can then use the resulting compiler object to calibrate with the dataset.
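To make the workflow concrete, here is an illustrative pure-Python mock of the compile-then-calibrate pattern. The name `compile_numpy_function` and the `"encrypted"` label are hypothetical stand-ins, not the real Concrete Numpy API; the mock only shows how a representative inputset lets the compiler infer the dynamic range and bit width of the data.

```python
import numpy as np

def compile_numpy_function(func, input_kinds):
    """Hypothetical stand-in for a compiler entry point (illustrative only).

    `input_kinds` maps each argument name to "encrypted" or "clear";
    shapes and bit widths are inferred later, during calibration.
    """
    class Compiler:
        def calibrate(self, inputset):
            # Infer the dynamic range of the inputs from the
            # representative (unlabelled) dataset.
            samples = [np.asarray(s) for s in inputset]
            lo = min(int(s.min()) for s in samples)
            hi = max(int(s.max()) for s in samples)
            # Bit width needed to represent every observed value;
            # the real compiler derives FHE parameters from this.
            self.bit_width = max(1, int(np.ceil(np.log2(hi - lo + 1))))
            self.input_kinds = input_kinds
            return self

        def run(self, *args):
            # A real circuit would execute over encrypted data;
            # the mock simply evaluates the original function.
            return func(*args)

    return Compiler()

def f(x):
    return x + 42

circuit = compile_numpy_function(f, {"x": "encrypted"}).calibrate([np.arange(16)])
```

Here the inputset spans 0 to 15, so the mock infers a 4-bit input range; the real compiler uses the same kind of information to choose its cryptographic parameters.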
Difficulties with precision
With our previous HNP prototype, the user had nothing to manage: everything was done under the hood. The downside was that if anything went wrong (such as an accuracy drop during compilation), nothing could be done about it. These errors are notably due to the fact that FHE has limited precision; at Zama, we currently compare FHE to an 8-bit CPU.
With our new Concrete Numpy, your job is to convert high-precision ML models (typically, machine learning algorithms using float32 or float64) into models with lower precision, i.e. with smaller and more discretized values. This is common practice in the ML world, and many tools (as well as academic literature) exist to help you perform model quantization and compression. We will soon release another blog post about quantization and its use for FHE-friendly models, so make sure to subscribe to our newsletter.
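As a toy illustration of what such quantization looks like (a minimal affine scheme for the sake of the example, not Zama's tooling), float weights can be mapped onto small unsigned integers and back:

```python
import numpy as np

def quantize(values, n_bits=8):
    """Affine-quantize a float array onto unsigned n_bits integers."""
    lo, hi = float(values.min()), float(values.max())
    scale = (hi - lo) / (2**n_bits - 1)
    q = np.round((values - lo) / scale).astype(np.int64)
    return q, scale, lo

def dequantize(q, scale, lo):
    """Map the integers back to approximate float values."""
    return q * scale + lo

weights = np.array([-0.51, 0.0, 0.27, 0.98], dtype=np.float32)
q, scale, zero_point = quantize(weights, n_bits=8)
recovered = dequantize(q, scale, zero_point)
```

The integer tensor `q` is what an FHE circuit can actually compute on; the price is a small, bounded rounding error (at most half a quantization step per value).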
Approximate vs Exact approaches
A worthwhile change from the deprecated HNP that can be tricky to understand for a non-cryptographer is the difference between the approximate approach and the exact approach.
Our HNP prototype used the approximate approach: while it accepted a larger dynamic range for data, it also allowed some minor errors during computation (this is due to the so-called drift in programmable bootstrapping, which you can read more about here).
Such errors can be acceptable for some ML models with favorable parameter distributions (e.g. certain neural networks), which can absorb tiny differences in intermediate values. But the stochastic nature of the approximate approach made debugging model accuracy issues very difficult when problems arose. It also made noise management (a critical cryptographic parametrization for security) much more complicated.
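The debugging difficulty is easy to see in a toy simulation: injecting a small random error into an intermediate value (a stand-in for bootstrapping drift; the Gaussian noise model here is purely illustrative, not TFHE's actual noise) makes the output non-deterministic, so an observed accuracy drop cannot be reproduced run after run.

```python
import numpy as np

rng = np.random.default_rng(0)

def approximate_relu(x, drift=0.05):
    # Each "bootstrapped" evaluation picks up a small random error
    # before the non-linearity (illustrative noise model).
    return np.maximum(x + rng.normal(0.0, drift, size=x.shape), 0.0)

def exact_relu(x):
    # The exact approach: same input, same output, every time.
    return np.maximum(x, 0.0)

x = np.linspace(-1.0, 1.0, 5)
a = approximate_relu(x)
b = approximate_relu(x)  # differs slightly from `a` on the same input
c, d = exact_relu(x), exact_relu(x)  # always identical
```

With the approximate variant, two evaluations of the same model on the same input disagree; with the exact variant they are bit-identical, which is what makes failures reproducible and debuggable.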
The new Concrete Compiler and its exact approach (from 40:41) accept a much smaller range for data, and only support integers. In a way, we are back to an 8-bit CPU, but this restricted data type guarantees the exactness of computations (with a high degree of probability). This exactness simplifies the choice of FHE parameters, and the compiled functions are turned into bit-exact FHE equivalents.
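The exact approach leans on a simple consequence of working with small integers: any univariate function over, say, 4-bit values can be precomputed as a 16-entry lookup table. The sketch below evaluates such a table in plain numpy; in FHE, the same table evaluation would be performed by a programmable bootstrapping (the function `x**2 // 4` is just an arbitrary example).

```python
import numpy as np

BITS = 4  # small integer precision, in the spirit of the "8-bit CPU" analogy

# Precompute f(x) = x**2 // 4 for every representable input value.
table = np.array([(x * x) // 4 for x in range(2**BITS)])

def exact_lookup(x):
    # Exact evaluation: the same input always yields the same output,
    # so the compiled circuit is a bit-exact equivalent of f.
    return table[x]

x = np.arange(2**BITS)
y = exact_lookup(x)
```

Because every representable input has a fixed table entry, there is no drift to account for, which is what lets the compiler pick FHE parameters that guarantee correctness.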
Splitting the work: You take care of the data science, we do the cryptography
With this new version, we have split the task done by our previous HNP prototype into two parts:
- Data science: you are now responsible for adapting your models so that they respect FHE constraints, such as using low-precision values.
- Cryptography: we take care of all the tasks related to cryptography.
We encourage you to focus on what you do best: data science. This gives you more control over your ML models, while we, on our part, can guarantee the best accuracy in FHE. We will continue to focus on our main job: compilation into FHE, handling every aspect of FHE security and optimizing for execution speed and RAM usage. We will also deliver examples and tricks on how to perform these model optimizations, so you will be able to deal with FHE constraints. We are also working on other ML tools that will simplify the work of data scientists. Stay tuned for more information, and rest assured that, as promised, we can’t wait to open source them.
And while you can already start playing with Concrete Numpy, next week we will show you how to build an FHE-enabled insurance incident predictor with this tool, so stay tuned.