Cryptography for Dummies — Part 3: Polyalphabetic Ciphers

Niloo Ravaei
Published in Blockgeeks
Oct 25, 2018

Check out part 2 here.

Breaking cryptographic codes has a lot to do with detecting patterns. As we saw in our last post, the Caesar Cipher was ultimately broken because it couldn’t hide the pattern of letter frequencies inherent in any given language. So to make a cipher unbreakable, or at least less breakable, code creators have to try to eliminate the presence of patterns in the ciphertext.

In the case of the Caesar Cipher, this means evening out the distribution of letter frequencies in the encrypted message, so that when you plot the number of times each letter occurs in the ciphertext, you get a straight line, as opposed to the pattern of the language’s fingerprint.

The red line in the graph above shows the perfect cipher, because every letter of the alphabet is equally likely to appear in the encrypted message. In other words, letters are selected at random. There is no pattern or system around how each letter is selected, which means that there’s nothing for the code breaker to reverse-engineer.
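If you want to see this counting step in code, here is a rough Python sketch (the function name is my own, not something from the original post). It tallies how often each letter appears in a piece of text: a Caesar-encrypted message keeps the bumpy fingerprint of its language, while a perfect cipher would come out flat.

```python
from collections import Counter
import string

def letter_frequencies(text):
    """Count how often each letter A-Z appears, ignoring case and non-letters."""
    letters = [ch for ch in text.upper() if ch in string.ascii_uppercase]
    counts = Counter(letters)
    return {ch: counts.get(ch, 0) for ch in string.ascii_uppercase}

# The same message before and after a Caesar shift has the same overall shape
# of counts, just relabelled, which is exactly what a code breaker plots.
print(letter_frequencies("THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG"))
```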

A perfectly random cipher like this also means there's no reusable key. A key is essentially a pattern that we use to lock our message, and that we share with others so they can unlock it. Without a pattern, a cipher is unusable; with a simple, repeating pattern, it's easily breakable. That's quite a conundrum, and it's the reason cryptography is hard.

Cryptographers have to create ciphers that simulate randomness, while still relying on a sharable pattern. Modern cryptography uses complex algorithms to do this, but we can look at a much simpler example to get a sense of the process involved.

The Caesar Cipher is about as simple as it gets. It uses a single number as its key. This key is easily shareable, but also betrays a pattern. There is no randomness here, simulated or otherwise.
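For reference, here is what that looks like in a few lines of Python (a sketch of my own, not code from the original series): one number, and the same shift applied to every letter.

```python
def caesar_encrypt(message, key):
    """Shift every letter of the message forward by the same single-number key."""
    result = []
    for ch in message.upper():
        if ch.isalpha():
            result.append(chr((ord(ch) - ord("A") + key) % 26 + ord("A")))
        else:
            result.append(ch)  # leave spaces and punctuation alone
    return "".join(result)

print(caesar_encrypt("HELLO", 3))  # KHOOR: both Ls land on the same letter
```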

On the other end of the spectrum, we have the cipher plotted above. The key for this cipher is a never-ending sequence of random numbers. This means that to encrypt the message, every letter is shifted forward by a different number. Completely unbreakable, but also, as you can probably tell, completely unshareable.

The middle-ground would be to create a cipher whose key is a sequence of numbers that’s easily shareable. That’s where Polyalphabetic Ciphers come in. Sometime in the 15th century, cryptographers came up with the idea of using words, instead of numbers, as keys. Let’s see how this would work.

Imagine you want to share secret messages with the members of your family. You still use the operation from the Caesar Cipher (i.e. shift letters forward), but instead of using a single number, like 3, you use a word, let’s say, “FAMILY”.

Every letter in the alphabet has a position, and a number associated with that position. As the first letter in the alphabet, A is represented by 1, B by 2, C by 3 and so on. We can use this system to turn the word “FAMILY” into a sequence of numbers.
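In code, turning a keyword into its sequence of numbers is a one-liner. Here is a small Python sketch (again my own illustration, not the post's):

```python
def word_to_shifts(word):
    """Map A to 1, B to 2, ..., Z to 26 for each letter of the keyword."""
    return [ord(ch) - ord("A") + 1 for ch in word.upper() if ch.isalpha()]

print(word_to_shifts("FAMILY"))  # [6, 1, 13, 9, 12, 25]
```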

You can already see how this can be useful. “FAMILY” is a lot easier to remember than “6–1–13–9–12–25”. And you could easily have a longer sequence by using a longer word, or even a sentence. You are now, in a way, simulating randomness.

So, how do you use this pseudo-random key? To encrypt a message, you would shift each letter forward by the next number in the key, repeating the sequence when you run out of numbers.
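Here is that rule as a short Python sketch. One assumption worth flagging: the post doesn't say what happens when a shift runs past Z, so this version wraps around to the start of the alphabet.

```python
from itertools import cycle

def keyword_encrypt(message, keyword):
    """Shift each letter forward by the next key number, repeating the key as needed."""
    shifts = cycle(ord(ch) - ord("A") + 1 for ch in keyword.upper() if ch.isalpha())
    result = []
    for ch in message.upper():
        if ch.isalpha():
            shift = next(shifts)  # the key only advances on actual letters
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))  # wrap past Z
        else:
            result.append(ch)
    return "".join(result)

print(keyword_encrypt("HELLO", "FAMILY"))  # NFYUA: the two Ls become different letters
```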

Notice how the two Ls in the above example translate into different letters in the ciphertext. If we were still using the Caesar Cipher, they would translate into the same letter, already leaking a pattern in this short message.

The longer a message is, the harder it becomes to simulate randomness. "HELLO" is only five letters long, shorter than our six-number key, so the key sequence never gets a chance to repeat. But imagine if we wanted to encrypt a paragraph using the same key. Our key sequence would repeat many times over, making the ciphertext susceptible to patterns and, therefore, to code breakers.

The only difference now is that instead of simply plotting letter frequencies to spot a pattern, code breakers first have to use trial and error to figure out the length of the key sequence. Once they know, for example, that every sixth letter in the encrypted message uses the same shift, they can break the code by solving six Caesar Ciphers in a row: shift the first letter and every sixth letter after it backwards by 6, the second letter and every sixth letter after it backwards by 1, the third by 13, and so on.

What makes breaking this cipher hard, or rather time-consuming, is the trial-and-error part. To find out how long a key sequence is, code breakers have to plot letter frequencies at every interval until they find one that produces the language's fingerprint. In our example, the code breaker would have to plot graphs for key lengths of one, two, three and so on, up to six, before finding the pattern. Time-consuming, but still very doable.
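To make that trial-and-error step concrete, here is a rough Python sketch (my own illustration; the stand-in ciphertext below is hypothetical). It guesses a key length, pulls out every n-th letter, and counts frequencies in each slice. When the guess matches the real key length, each slice is just a Caesar Cipher, and the language's fingerprint shows up again.

```python
from collections import Counter

def slice_frequencies(ciphertext, key_length):
    """Split the ciphertext into key_length slices (every n-th letter)
    and count letter frequencies within each slice."""
    letters = [ch for ch in ciphertext.upper() if ch.isalpha()]
    slices = ["".join(letters[i::key_length]) for i in range(key_length)]
    return [Counter(s) for s in slices]

ciphertext = "NFYUA"  # stand-in: a real attack needs a much longer encrypted message
for guess in range(1, 7):  # test key lengths 1 through 6
    print(guess, slice_frequencies(ciphertext, guess))
```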

To make our cipher more secure, we'd have to use a longer key sequence. The longer the sequence, the more intervals the code breaker has to test. Our goal as the code creator is to make this process of trial and error as long as possible, to the point that it becomes infeasible for the code breaker to actually go through with it. Imagine if it took the code breaker 100 tries, or 1,000 tries, instead of six, to break our cipher. Now imagine if it took them a hundred years. It would be practically impossible for anyone to commit the time it takes to break our code.

What this means is that a cipher doesn't have to be perfect to be considered unbreakable. It just has to be impractical to solve. The real question for cryptographers is how close to perfect a cipher can get without losing its usability. We'll explore this question in our next post, as we continue to look at how these early ciphers evolved.

Check out part 4 here!
