Indicating progress of RSA key pair generation: the practical approach

Published in

neXenio

6 min readJul 23, 2018

Asymmetric encryption is used abundantly to transfer confidential data through an untrusted infrastructure like the internet. The cornerstone of this end-to-end encryption is the availability of a pair of associated keys — a private and a public key. It is crucial that the private key is generated on the end-user’s device and is confined there as it allows anyone to read the messages encrypted with the easily distributable public key. Unfortunately, the generation of such a key pair can take anything from a few seconds to minutes depending on several factors.

This poses a user experience problem to applications using asymmetric cryptography: How do we update the waiting user on the progress of the key pair generation?

Since our goal at neXenio is to make the usage of cryptography as seamless as possible, some live progress estimation similar to the following one would be great:

Quick and dirty primer to RSA

There are several crypto systems implementing asymmetric crypto, but in this post we focus on RSA, a commonly used and well understood algorithm. Lets have a quick look at how RSA key pairs are structured at a conceptual level: The mathematics behind RSA key pairs relies on the “prime factorisation problem” a so-called trapdoor function. These are functions that are easily computable in one direction but extremely hard in the other.

Leaving out a few details for simplicity: It is easy to multiply two large random prime numbers (p and q) to obtain their composite product (n) but prohibitively hard to reconstruct those primes (aka. the private key) while knowing only their product (aka. the public key).

For our application’s 4096bit RSA key pairs, we are talking about prime numbers with about 600 decimal digits.

Why is generating an RSA key pair unpredictable?

Generating an RSA key pair means finding two large and purely random prime numbers. In practice, one utilizes a cryptography-grade Random Number Generator (RNG) to repeatedly generate random numbers with 600 digits. For each generated number, the probabilistic Miller-Rabin test is used to check whether the number is prime or not. This process is repeated until two suitable prime numbers are found. If we are lucky, we find a p and a q in the first trial. If we are unlucky, we (theoretically) never succeed.

Therefore, the time it takes to find a p and a q is unpredictable due to several factors such as: We have no way of knowing upfront how many random draws we will need before finding a p and a q. Also, it is unclear how long the Miller-Rabin test will run for each random draw because of its probabilistic nature. We don’t know how fast our customer’s computer can actually crunch those numbers or whether it can do it in parallel. And so forth.

Get some statistics on the generation process

But there is hope! Let’s look at RSA key pair generations empirically. Once a generation is successful, we can count the number of random draws the algorithm needed to find a p and a q from the RNG.

Hence: Generating an RSA key pair is a random experiment and the number of prime candidates drawn from our RNG is a random variable (e.g. X). And as such, we can obtain an understanding about this process by running many key pair generations and looking at the outcome:

Visualising the count of random number draws while generating 4,096bit RSA key pairs in 100 bins with a log-normal probability density function fitted to it (using Botan 2.6.0 to generate 24,000 RSA key pairs)

In our setup, it seems extremely unlikely that a suitable p and q will be found in less than 150 draws. On the other hand, most key pair generations were finished after less than 600 draws. However, the distribution shows a considerably long tail that needs to be incorporated into our final generation progress estimation.

Building a Graphical Progress Indicator from this

The orange graph above is the probability density function (PDF) from a fitted log-normal distribution and describes the likelihood of finding an RSA key pair with exactly X random draws. That is nice but it does not help us much when trying to estimate “how much” of the key pair has already been generated. What would be more helpful is the likelihood of finding a p and a q after X random draws because it would give us a notion of “progress”. Enter the cumulative distribution function (CDF) which is the integral of the PDF:

The cumulative distribution function (blue) describes the likelihood that a 4,096bit RSA key pair was generated after X generated random numbers

In practical terms, we could interpret the CDF (blue graph) as “percentage of computation performed to generate an RSA key pair given the number of random draws X so far”. That is a rough estimate of the key pair generation progress based on a value that we can measure on the go and that is independent of the customer’s computer’s performance. We simply need to instrument the RNG and count how often it is invoked by the code that generates the key pair. Most cryptographic libraries should allow you to do that without too much hacking.

Feeding this “percentage” into a graphical progress indicator should make it start moving slowly (0 < x < 200), speed up when successfully generating a key pair is most likely (200 < x < 400; cf. the PDF in orange) and gradually slow down again (x > 400) to account for the long tail of the distribution. The result may look like that for a single key pair generation trial:

Correcting the estimation for better UX

Using the plain CDF as the input for our graphical progress indicator is a good start, but it visually shows two obvious problems:

In the beginning, the indicator hangs at 0% because the CDF is practically 0.0 for x< 150
When the key pair generation finishes before x > 400, the indicator visually “jumps” to 100%

To correct the first problem, we need to depart from the fully data-driven estimation and adapt the estimation function to our needs rather than using the empirically obtained CDF directly. I will spare you the details, but we gradually blended the CDF with a function that initially rises faster than the actual CDF to generate an estimation function that is shaped more “realistically”. For technical details, please refer to the provided code snippets. The progress indicator at the beginning of this post is based on this corrected estimation function:

The corrected CDF (dark blue) provides a better visual progress indication for X < 200 despite departing from the empirical data

To address the second issue, we considered extrapolating the progress indicator from the position where suitable primes for p and q were found up to 100%. That would somewhat smoothen out the motion of the indicator by artificially deferring the result of the key pair generation for a split second. In practice, however, we found that the jumping indicator at the end of the estimation does not harm the UX enough to warrant such an extrapolation.

Conclusion

Obviously, there is still no way of knowing precisely how long our users’ RSA key pair generation might take because of the inherent randomness involved. Nevertheless, compared to the ordinary spinner animation we had before, our users now have a more tangible notion of how long they will have to wait. This makes for a significantly better UX than a boring spinner with the label “this process may take a few minutes”.