In the past few years alone, the world has seen many massive data breaches. Most recently, Marriott Hotels exposed the data of 500 million people (7% of the world’s population!).
Security is important in every single facet of the internet. Imagine if Amazon couldn’t keep your credit card secure, or if your Facebook messages were public, or if anyone could read the emails from your healthcare provider. A world without secure communications is a scary one. But most of us have no idea why our communication is secure.
This article begins to answer this question. We’ll look at public key cryptosystems by means of an introduction and an example. Next, we’ll discuss a specific public key cryptosystem — the RSA (Rivest–Shamir–Adleman) algorithm. This algorithm is named after Ron Rivest (1942), Adi Shamir (1952), and Leonard Adleman (1945). RSA has been around for 50 years and it is still used today!
Public Key Cryptosystems
A cryptosystem is a set of processes used to securely transfer information between parties. A great example of a cryptosystem is writing. You and I agree that a set of squiggles is the encryption of an object. These squiggles are more commonly referred to as letters.
Now, you and I can send messages about that object. These messages are secure against everyone who cannot read our language. This certainly seems trivial now. But if you are Julius Caesar and only 5% of Romans can read or write, this just might do the trick.
What about this public key business? What is that? In a cryptosystem, a key is a piece of information (usually a number) that is inputted into a function that produces a specific output. A public key is a key whose value is open knowledge. Some cryptosystems use a private key, which is a key whose value remains secret.
RSA is a public key cryptosystem because it uses a public encryption key and a private decryption key. As you can guess, an encryption key takes messages in plaintext and converts them to secure ciphertext. And a decryption key undoes this process.
Many of us have used a public key cryptosystem in the form of a ballot box. We’ll use a modified ballot box below so we can use our terms from above. Let’s assume we have an outer box that is locked and inside this outer box is our ballot box.
We first start with the outer box. The outer box is currently locked. But the key is on the outside of the box, so anyone can open it. The key on the outside of the outer box is our public key. Everyone has access to it and everyone can use it.
So, we open the outer box and we find a ballot box inside. The ballot box is below. Let’s assume the slit at the top is perfectly one way (i.e. you can’t pull any ballots out). There is a lock on this box but there is no key. The key to the inner lock is the private key. It must be kept private or anyone can read the ballots.
And this is precisely a public key cryptosystem. Assuming the private key stays private, anyone can drop off a ballot safely and securely. It can only be read by the person with the key to the inner box.
In RSA Encryption, anyone can send a message safely and securely using the public key but these messages can only be read by the person with the private key.
The Benefit of Public Key Cryptosystems
The reason public key cryptosystems are so important is that they do not require any shared prior information. You and your local ballot official do not agree to share a secret private key, you do not exchange any information, and you don’t even have to meet. This is invaluable for modern systems as establishing a private key is impractical.
Imagine having to go to a brick-and-mortar Amazon store, get a secret password in person, go back to your computer, and then start shopping. And imagine having to do that for Facebook, Google, Twitter, Venmo, iMessage, Outlook, and all the rest. It would be impossible. Public key cryptosystems solve this problem.
4 Key Concepts in the RSA Algorithm
Before we can dive into the RSA algorithm itself, we need to cover four key concepts: modular arithmetic, the greatest common divisor, Euler’s Totient Function, and Euler’s Theorem.
(1) Modular arithmetic is often referred to as clock arithmetic. A standard clock only has 12 hours. The 13th hour doesn’t exist. The clock just circles back to the first hour. Clocks work with the modulus 12. It’s equivalent to say clocks work modulo 12. Let’s say I have some number x = 376. And say we want to know what x is congruent to modulo 59. We look at the remainder of x when it is divided by our modulus. Our result is shown below. Therefore x = 376 ≡ 22 mod 59.
Another way to think of this is as 376 = 59•6+22. In fact, any number can be written as x = n•q+r for integers q and r with 0 ≤ r < n. Here, x is congruent to r modulo n. This is an important component of RSA encryption.
(2) The greatest common divisor of two numbers is the largest integer that divides both numbers. It is also known as the greatest common factor. For example, gcd(15, 21) = 3 since 3 is a factor of both 15 and 21 and it is the largest such factor. Similarly, gcd(8, 12) = 4. Although 2 is also a factor of both 8 and 12, 4 is larger. If the two numbers share no factors, like 14 and 25, then the gcd(14,25) = 1. For more on gcd, see here.
(3) When two numbers don’t share any factors (like 14 and 25), we say these numbers are relatively prime and their gcd equals 1. For example, 19 and 8 are relatively prime. And 271 and 22 are as well. Next, we define a function ϕ(n). This function returns the number of integers less than n that are relatively prime to n. Some examples are below. This function is called Euler’s Totient Function. It is named after Leonhard Euler (1707). Note that if n is prime, then ϕ(n) = (n-1) as all numbers less than n are relatively prime to n.
(4) The final key concept comes from Euler as well. Euler’s Theorem (also known as the Fermat-Euler Theorem) underlies the success of the RSA algorithm. The theorem is stated below. The proof contains a little too much number theory for this article. But for those interested in the proof, see here.
The RSA Algorithm
Alright! Now we have all the tools we need to discuss the RSA algorithm. Let’s say Alice wants to send a message to Bob. We’ll call this message m.
We can translate m to a number using ASCII values. For example, the message “Hello” in hexadecimal ASCII is “48 65 6C 6C 6F “. We can then convert from hexadecimal (0x48656C6C6F) to base 10 (310939249775). Now our message m is 310939249775. Once we have a valid message, Bob has to do a little work before Alice can send Bob a message. The RSA algorithm is outlined below.
How do we know that last step will return the original message? We rely on Euler’s Theorem as shown below. For our purposes, we can assume gcd(m, n) = 1 as the only factors of n are p and q. The odds m = p or m = q are very small.
An Implementation of RSA Using Python
The following code uses RSA to allow for the encryption and decryption of a given message m (so long as m, when converted to an integer, is less than n). This code is certainly not robust and is meant only as an example. It should not be used as a means of secure communication. For those looking to edit the codebase, the GitHub repository is here.
We’ll first need two methods — one to convert words to integers and one to convert integers to words i.e. “ Hello” ⇆ 310939249775. These two functions are below.
The next step is to handle the setup. We’ll need to define n and ϕ(n). This function takes a parameter num_digits, which represents the number of digits in our primes p and q. The code below uses 50 digit primes. This implies n is approximately 100 digits long. Real RSA encryption uses an n with ~600 digits.
Bob is almost there! All he needs to do is define e and d, which is done in the below methods. The generation of d requires modular inverses. Rosetta Code has a great implementation that we use. We use an e with 25 digits, which should be sufficient for our purposes.
And we’re all set. We can now encrypt and decrypt a message m using the two methods below.
We implement a main() method below to drive our code and we’re all set.
And we see it works!
This article was only meant as an introduction to public key cryptosystems and RSA. But there is certainly much more to cover! There are restrictions and standard protocols for choosing the values of p, q, e, d, and m. There are efficient ways of performing the above calculations. For those interested in learning and exploring more, I recommend Introduction To Cryptography (2nd Edition).
While building RSA is certainly interesting, breaking RSA is even more interesting. The strength of RSA lies in our inability to factor numbers that are a product of two large, distinct primes. The two methods that attempt to do this are the Quadratic Sieve and the General Number Field Sieve. These are fascinating methods and we’ll hopefully have an article on them soon.