Mono-Alphabetic Substitution ciphers

Omar Elhadidi
20 min read · Jul 24, 2024

In my first blog, where I talked about how and when monoalphabetic substitution was invented and where it appeared historically, I mentioned only two ciphers (the Atbash cipher and the Caesar cipher). There are two other well-known monoalphabetic substitution ciphers that matter less historically but are commonly taught academically:

  • Affine Cipher
  • Mixed Monoalphabetic cipher

So let’s start by explaining these two ciphers, and then look at how to break each monoalphabetic substitution cipher in turn.

[+] Affine Cipher

Overview

What is an Affine Cipher

The Affine Cipher is a classical monoalphabetic substitution cipher. It is a combination of a multiplicative cipher and an additive cipher and uses mathematical functions to encrypt and decrypt messages.

Additive Cipher (Shift Cipher / Caesar Cipher)

The simplest mono-alphabetic cipher is an additive cipher. It is also referred to as the ‘Shift Cipher’ or ‘Caesar Cipher’. As the name suggests, the key is added to each plaintext letter modulo n (the size of the alphabet, 26 for English) to obtain the ciphertext. I explained the Caesar Cipher in my previous blog, but this is an algebraic representation that helps explain how the affine cipher works:

C = (M + k) mod n
M = (C - k) mod n

where

C -> cipher-text
M -> message/plain-text
k -> key
n -> size of the alphabet (26 for English)

The key space is 26 (including the trivial key k = 0, which leaves the text unchanged). Thus, it is not very secure and can be broken by a brute-force attack.
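The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, assuming a lowercase a-z alphabet numbered from 0 and leaving every other character untouched; the function names are my own and not part of any library.

```python
import string

ALPHABET = string.ascii_lowercase  # n = 26 letters, a=0 ... z=25


def shift_encrypt(message: str, k: int) -> str:
    """Apply C = (M + k) mod n to every letter; other characters pass through."""
    n = len(ALPHABET)
    out = []
    for ch in message.lower():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + k) % n])
        else:
            out.append(ch)
    return "".join(out)


def shift_decrypt(cipher: str, k: int) -> str:
    """Apply M = (C - k) mod n, i.e. shift back by the same key."""
    return shift_encrypt(cipher, -k)


print(shift_encrypt("hello", 3))  # khoor
print(shift_decrypt("khoor", 3))  # hello
```

Python's % operator always returns a non-negative result for a positive modulus, which is why decryption can simply reuse the encryption function with -k.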

Multiplicative Cipher

The multiplicative cipher is similar to the additive cipher, except that each plaintext symbol is multiplied by the key during encryption. Likewise, for decryption the ciphertext is multiplied by the multiplicative inverse of the key to get the plaintext back.

C = (M * k) mod n
M = (C * k^-1) mod n
where,
k^-1 -> multiplicative inverse of k (key) modulo n

The key space of the multiplicative cipher is 12. Thus, it is also not very secure.
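Here is a small sketch of the multiplicative cipher, assuming lowercase a-z with a=0 and Python 3.8 or newer (where the built-in pow(k, -1, n) returns the modular inverse). The helper names are illustrative only.

```python
from math import gcd

N = 26  # alphabet size, assuming a=0 ... z=25


def mult_encrypt(message: str, k: int) -> str:
    """Apply C = (M * k) mod n to each lowercase letter."""
    if gcd(k, N) != 1:
        raise ValueError("k must be coprime with 26, otherwise it has no inverse")
    return "".join(
        chr((ord(ch) - 97) * k % N + 97) if "a" <= ch <= "z" else ch
        for ch in message.lower()
    )


def mult_decrypt(cipher: str, k: int) -> str:
    """Apply M = (C * k^-1) mod n by encrypting with the inverse key."""
    return mult_encrypt(cipher, pow(k, -1, N))


print(mult_encrypt("hello", 5))  # judds
print(mult_decrypt("judds", 5))  # hello

# Counting the usable keys reproduces the key space of 12.
print(sum(1 for k in range(1, N) if gcd(k, N) == 1))  # 12
```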

Note: The Affine Cipher is an application of linear congruences. It combines the multiplicative cipher and the additive cipher to enlarge the key space. The key space of the affine cipher is 26 * 12 (key space of additive * key space of multiplicative), i.e. 312. It is somewhat more secure than the two ciphers above because the key space is larger. Do you know where these numbers came from? Let’s keep going.

Encryption Process

The encryption process is substantially mathematical. The whole process relies on working modulo m (the length of the alphabet used). By performing a calculation on the plaintext letters, we encipher the plaintext.

Step 1: transform the alphabet letters to the corresponding integer

The first step in the encryption process is to transform each of the letters in the plaintext alphabet to the corresponding integer in the range 0 to m-1.

Step 2: Choose your 2 keys (a,b)

The value of “b” can be any integer from 0 to 25, whereas the value of “a” must be coprime with m, in other words “gcd(a,26) = 1”.

The restriction gcd(a,26) = 1 stems from the fact that the key parameter a needs to be inverted for decryption. We recall that an element a and the modulus must be relatively prime for the inverse of a to exist. Thus, a must be in the set:

a ∈ {1,3,5,7,9,11,15,17,19,21,23,25}

But how do we find a^−1? There is a quick way to check whether an inverse exists for a given m and a (relying on an advanced mathematics topic called group theory). For now, we can simply compute it by trial and error: for a given a, we try all possible values of a^−1 until we obtain:

a · a^−1 ≡ 1 mod 26
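The trial-and-error search can be sketched as follows. This is a minimal illustration with a hypothetical helper name; it returns None when no inverse exists.

```python
def mod_inverse_by_trial(a: int, m: int = 26):
    """Try every candidate until a * candidate leaves remainder 1 modulo m."""
    for candidate in range(1, m):
        if (a * candidate) % m == 1:
            return candidate
    return None  # no inverse: a and m share a common factor


print(mod_inverse_by_trial(5))  # 21, because 5 * 21 = 105 = 4 * 26 + 1
print(mod_inverse_by_trial(3))  # 9, because 3 * 9 = 27 = 26 + 1
print(mod_inverse_by_trial(4))  # None
```

With only 25 candidates to test, the brute-force search is perfectly adequate for a 26-letter alphabet.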

Why “a” must be coprime

One of the peculiarities of the Affine Cipher is the fact that not all keys will work. Try using the keys a = 4 and b = 5 to generate the ciphertext alphabet, and check the answers you get. This key creates a situation where more than one plaintext letter is encrypted to the same ciphertext letter (for example, both “e” and “r” encipher to “V”). This means that when it comes to decrypting, the recipient will be unable to know which one of the plaintext letters was used. Clearly this is a huge problem in using the Affine Cipher, and it is essential for the key to be chosen carefully.

This problem occurs since the multiplicative inverse of a does not exist modulo m. That is, there is no number that can be multiplied by 4 to get 1 modulo 26: 4 times anything is even, and an even number reduced modulo 26 stays even, so it can never equal 1.

The inverse of a modulo m exists if and only if a and m are coprime (that is they have no common factor other than 1). Hence, a = 4 does not work when m = 26, as they have a common factor of 2 (2 goes into both 4 and 26), but a = 5 does work since 5 and 26 are coprime.
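To see the collisions concretely, the sketch below lists every ciphertext letter that is hit by more than one plaintext letter for a given key. It assumes the standard 26-letter alphabet with a=0, and the function name is my own.

```python
from collections import defaultdict
import string


def affine_collisions(a: int, b: int, m: int = 26) -> dict:
    """Map each ciphertext letter to its plaintext letters, keeping only clashes."""
    images = defaultdict(list)
    for x, letter in enumerate(string.ascii_lowercase[:m]):
        images[(a * x + b) % m].append(letter)
    return {
        string.ascii_uppercase[y]: letters
        for y, letters in images.items()
        if len(letters) > 1
    }


print(affine_collisions(4, 5)["V"])  # ['e', 'r']
print(len(affine_collisions(4, 5)))  # 13 ciphertext letters, each used twice
print(affine_collisions(5, 8))       # {} because 5 is coprime with 26
```

With a = 4, the plaintext letters x and x + 13 always land on the same ciphertext letter, since 4 * 13 = 52 is a multiple of 26.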

Step 3: Encrypt the plaintext

Compute E(x) = (ax + b) mod 26. In general, the encryption process for each letter is given by

  • E(x) = (ax + b) mod m

where a and b are the keys for the cipher. This means that we multiply our integer value for the plaintext letter by a, and then add b to the result. Finally, we take this modulo m (that is, we take the remainder when the result is divided by m, or equivalently we keep taking away the length of the alphabet until we get a number less than this length).

Step 4: Convert the integers back to letters to get the ciphertext

Write down the letter for each computed integer, using the same letter-to-integer table as in Step 1. That’s our ciphertext.

Decryption Process

In deciphering the ciphertext, we must perform the opposite (or inverse) functions on the ciphertext to retrieve the plaintext. Once again, the first step is to convert each of the ciphertext letters into their integer values. We must now perform the following calculation on each integer

D(x) = c(x - b) mod m

where c is the modular multiplicative inverse of a. That is, a x c = 1 mod m (c is the number such that when you multiply a by it, and keep taking away the length of the alphabet, you get to 1).

Example

As an example, let us encrypt the plaintext “affine cipher”, using the key a = 5, b = 8. Firstly we must find the integer value of each of the letters in the plaintext alphabet (the standard alphabet of 26 letters in this case). The table below gives these values.

The standard values for the alphabet of 26 letters. Notice we start at 0, not 1

With the integer values of the plaintext letters found, the next step is to perform the calculations on those values. In this instance, the calculation needed is (5x+8). Finally, we must ensure that all our answers are calculated mod 26 and convert the integers back to ciphertext letters. All this information is shown in the table below.

The affine cipher with a = 5, b = 8. We work out the values of letters, then do the calculations, before converting numbers back to letters.

Thus the ciphertext produced is “IHHWVC SWFRCP”.
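The worked example can be reproduced with a short sketch. It assumes the 26-letter alphabet numbered from 0, writes ciphertext in uppercase as the article does, and leaves spaces and punctuation unchanged; affine_encrypt is an illustrative name.

```python
import string


def affine_encrypt(plaintext: str, a: int, b: int, m: int = 26) -> str:
    """Apply E(x) = (a*x + b) mod m to each letter and return uppercase ciphertext."""
    out = []
    for ch in plaintext.lower():
        if ch in string.ascii_lowercase:
            x = ord(ch) - ord("a")  # letter -> integer
            out.append(string.ascii_uppercase[(a * x + b) % m])  # integer -> letter
        else:
            out.append(ch)  # keep spaces and punctuation as they are
    return "".join(out)


print(affine_encrypt("affine cipher", 5, 8))  # IHHWVC SWFRCP
```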

Continuing our example, we shall decrypt the ciphertext “IHHWVC SWFRCP”, using a key of a = 5, b = 8. The first step here is to find the inverse of a, which in this case is 21 (since 21 x 5 = 105 = 1 mod 26, as 26 x 4 = 104, and 105 - 104 = 1). We must now perform the inverse calculations on the integer values of the ciphertext. In this case the calculation is 21(y - 8). Once again, we must take these answers modulo 26, and finally convert the integers back to plaintext letters. This is shown in the table below.

The decryption process for a key of a = 5, b = 8. We first had to find the inverse of a, which is 21.

We retrieve our plaintext of “affine cipher”.
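The decryption half of the example can be sketched the same way. This assumes Python 3.8 or newer for pow(a, -1, m), which computes the modular inverse; the function name is illustrative.

```python
import string


def affine_decrypt(ciphertext: str, a: int, b: int, m: int = 26) -> str:
    """Apply D(y) = c * (y - b) mod m, where c is the inverse of a modulo m."""
    c = pow(a, -1, m)  # 21 when a = 5 and m = 26
    out = []
    for ch in ciphertext.upper():
        if ch in string.ascii_uppercase:
            y = ord(ch) - ord("A")
            out.append(string.ascii_lowercase[(c * (y - b)) % m])
        else:
            out.append(ch)
    return "".join(out)


print(affine_decrypt("IHHWVC SWFRCP", 5, 8))  # affine cipher
```

If a is not coprime with m, pow raises a ValueError, which matches the rule that such keys cannot be used.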

Key Space

Key Space Calculation

For the affine cipher, a and b are the keys. At first glance we have 25 possible values for 𝒂 and 26 for 𝒃. However, 𝒂 and 26 need to be coprime (meaning a has no common factors with 26 other than 1).

  • Value of a: For a to be valid, it must be coprime with 26. The numbers that are coprime with 26 (from 1 to 25) are: 1, 3, 5, 7, 9, 11, 15, 17, 19, 21, 23, 25. There are 12 such numbers.
  • Value of b: The value of b can be any integer from 0 to 25, giving us 26 possible values.
  • Total Key Space: Each pair (a,b) forms a unique key, there are 12 numbers less than 26 which are coprime to 26, and for each of these there are 26 possibilities for the value of b, Therefore, the total number of possible keys (keyspace) is:

12 (choices for a)×26 (choices for b)=312 possible keys for the Affine Cipher
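The count can be confirmed by enumerating every valid key pair. This is a quick sketch for the 26-letter alphabet only.

```python
from math import gcd

M = 26

# Every a between 1 and 25 that shares no factor with 26.
valid_a = [a for a in range(1, M) if gcd(a, M) == 1]

# Each valid a can be paired with any b from 0 to 25.
keys = [(a, b) for a in valid_a for b in range(M)]

print(valid_a)       # [1, 3, 5, 7, 9, 11, 15, 17, 19, 21, 23, 25]
print(len(valid_a))  # 12
print(len(keys))     # 312
```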

Enhancing keyspace

Due to this relatively low number of possible keys (we shall compare this with more secure ciphers later), the Affine Cipher is once again susceptible to a Brute Force Attack, especially in the age of computers, and is hence not a particularly secure cipher.

Given this, we can make the cipher a bit more secure by choosing an alphabet with a prime number of elements (since then all the numbers less than our prime are coprime to it, by definition). Thus, with an alphabet of 31 elements (the 26 letters, space, and 4 punctuation marks), we have 30 possible values for a and 31 values for b (0 to 30), and hence there are 30 x 31 = 930 possible keys for this alphabet. Although this is a larger key space than with the standard alphabet, with computing power we can still perform a brute-force attack (trying every possible key) within a few minutes.

[+] Mixed Monoalphabetic Substitution Cipher

Overview

The Mixed Alphabet Cipher is another example of a monoalphabetic substitution Cipher, and the way it works is exactly the same as with those already encountered, except in one way. The difference, once again, is how we create the ciphertext alphabet. Unlike all the other ciphers we have seen so far (Atbash, Pigpen, Morse, Shift, and Affine), the Mixed Alphabet Cipher does not use a number as a key, but rather a keyword or key phrase.

The first point to make here is that every Monoalphabetic Substitution Cipher using letters is a special case of the Mixed Alphabet Cipher. The Atbash, Shift, and Affine Ciphers are all cases of this much larger class of cipher. Each is a way of reordering the ciphertext alphabet by a given rule, rather than using a keyword.

Encryption Mechanism

With this cipher, rather than performing a mathematical operation on the values of each letter, or just shifting the alphabet, we create a random order for the ciphertext alphabet.

The table below shows one such mixed ciphertext alphabet, where the order of the ciphertext letters has been selected randomly.

Clearly, it is very important to ensure that each letter appears in the ciphertext alphabet once and only once so that two plaintext letters are not enciphered to the same ciphertext letter.

With the ciphertext alphabet generated, the encryption process is the same as with every other form of Monoalphabetic Substitution Cipher. That is, each occurrence of a plaintext letter is replaced with the ciphertext letter that has been assigned to that plaintext letter.

Using Keywords

Although it is possible to generate a completely random ordering of the letters of the ciphertext alphabet, as in the table above, it would require both sender and recipient to remember a random string of 26 letters: not an easy task! For this reason, as with most ciphers, a keyword is often used. The ciphertext alphabet is then generated using this keyword as follows: the keyword is written first, ignoring any repeated letters, and then the remaining letters of the alphabet are written in alphabetical order. For example, if we took the keyword monoalphabetic we would get the alphabets given in the table below.

The ciphertext alphabet is generated using the keyword “monoalphabetic”. Notice that the second “o” is skipped as it has already appeared in the ciphertext alphabet.

This example demonstrates the ignoring of repeated letters (the second “O” of “MONO” is dropped) and how the rest of the alphabet that has not already appeared follows. It also shows a weakness in the system straight away: in this example “u” encrypts to “U”, “v” to “V” and so on to “z”. This problem occurs if the keyword does not contain any letters from near the end of the plaintext alphabet. To combat this problem, we can choose a keyword with a letter from near the end of the alphabet.

Although above we have talked of a keyword for generating the ciphertext alphabet, we could also use a key phrase or even sentence, removing any characters (such as spaces or punctuation) that do not appear in the alphabet being used.
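Generating the ciphertext alphabet from a keyword or key phrase can be sketched like this. It assumes the 26-letter alphabet and silently drops anything that is not a letter; keyed_alphabet is a name I chose for illustration.

```python
import string


def keyed_alphabet(key_phrase: str) -> str:
    """Keyword letters first (repeats skipped), then the unused letters in order."""
    cipher_alphabet = []
    for ch in key_phrase.upper() + string.ascii_uppercase:
        if ch in string.ascii_uppercase and ch not in cipher_alphabet:
            cipher_alphabet.append(ch)
    return "".join(cipher_alphabet)


print(string.ascii_lowercase)            # abcdefghijklmnopqrstuvwxyz
print(keyed_alphabet("monoalphabetic"))  # MONALPHBETICDFGJKQRSUVWXYZ
```

Lining the two printed rows up shows the weakness mentioned above: from “u” onwards, each plaintext letter maps to itself.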

Decryption Mechanism

As with the other ciphers of this type, the decryption process is similar to the encryption process. The first step is to generate the ciphertext alphabet in the same way as with the encryption process. We then do the opposite, finding the ciphertext letter in the ciphertext alphabet, and replacing this with the corresponding plaintext letter.
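Encryption and decryption are then just two lookup tables pointing in opposite directions. The sketch below hard-codes the ciphertext alphabet that the keyword “monoalphabetic” produces and uses Python's built-in str.maketrans and str.translate; the function names are illustrative.

```python
import string

PLAIN = string.ascii_lowercase
CIPHER = "MONALPHBETICDFGJKQRSUVWXYZ"  # alphabet from the keyword "monoalphabetic"

ENCRYPT_TABLE = str.maketrans(PLAIN, CIPHER)  # plaintext letter -> ciphertext letter
DECRYPT_TABLE = str.maketrans(CIPHER, PLAIN)  # the same table read backwards


def mixed_encrypt(plaintext: str) -> str:
    return plaintext.lower().translate(ENCRYPT_TABLE)


def mixed_decrypt(ciphertext: str) -> str:
    return ciphertext.upper().translate(DECRYPT_TABLE)


print(mixed_encrypt("hello world"))  # BLCCG WGQCA
print(mixed_decrypt("BLCCG WGQCA"))  # hello world
```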

Key Space

Permutation

A dramatic increase in the key space can be achieved by allowing an arbitrary substitution. Before proceeding, we define the term permutation. A permutation of a finite set of elements S is an ordered sequence of all the elements of S, with each element appearing exactly once. For example, if S = {a, b, c}, there are six permutations of S:

  • abc, acb, bac, bca, cab, cba

In general, there are n! permutations of a set of n elements, because the first element can be chosen in one of n ways, the second in n - 1 ways, the third in n - 2 ways, and so on.

Key space of Mixed Alphabet Cipher

The next point for discussion is the number of possible keys for the Mixed Alphabet Cipher, using a standard alphabet of 26 letters. First, we realize that there are 26 possible choices for the first letter in the ciphertext alphabet. Now, for the second letter, we can use any letter APART from the letter we have already selected for the first position, so there are 25 choices for the second position. For the third position we can choose any letter, apart from either the letter in the first position or the second position, and hence there are 24 choices here. Thus, for the first 3 places, there are 26 x 25 x 24 possible choices. Continuing in this way, we quickly find that there are 26! (26 factorial, where factorial means multiplying together all the whole numbers from 1 up to 26) possible keys for this cipher. This number is far larger than its compact notation suggests. In fact:

26! = 403,291,461,126,605,635,584,000,000

This is an absurdly large number. If every person on earth (say 8 billion) were to try one key a second, then it would still take 1,598,536,043 (that’s one and a half billion) years to try every possible combination.

This is 10 orders of magnitude greater than the key space for DES (2^56 keys) and would seem to eliminate brute-force techniques for cryptanalysis.
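These figures are easy to sanity-check. The sketch below assumes 8 billion people each testing one key per second, a 365-day year, and the 56-bit DES key size for the comparison.

```python
import math

total_keys = math.factorial(26)
print(f"{total_keys:,}")  # 403,291,461,126,605,635,584,000,000

people = 8_000_000_000               # assumed world population
seconds_per_year = 365 * 24 * 3600   # assuming a 365-day year
years = total_keys / people / seconds_per_year
print(f"{years:,.0f} years")         # roughly 1.6 billion years

des_keys = 2 ** 56                   # DES uses a 56-bit key
print(math.log10(total_keys / des_keys))  # a little under 10 orders of magnitude
```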

[+] Frequency Analysis in Practice

We won’t explain the frequency analysis technique from scratch. If you are not familiar with this method, please read my blog here first, where I explained it in depth. Instead, we will see how to use frequency analysis in practice and which shortcuts are available for each cipher. We will use the following paragraph as our plaintext and encrypt it once with the Caesar cipher, then with the affine cipher, and lastly with the mixed monoalphabetic cipher.

“The system should be, if not theoretically unbreakable, unbreakable in practice. The design of the system should not require secrecy and compromise of the system should not inconvenience the correspondents. This is the most famous principle, often summarized as The security of a cipher must depend only on the secrecy of the key, not the secrecy of the algorithm.” This idea is now known as Kerckhoff’s principle. the security of a cryptographic system shouldn’t rely on the secrecy of the algorithm. Instead, it should be based on the secrecy of the cryptographic key. A good cryptographic system should remain secure even if the algorithm used is known. The key should be memorable without notes and should be easily changeable.The cryptograms should be transmissible by telegraph. The system should be portable and its use should not require more than one person not require the concourse of several people.The system should be easy to use and should neither require knowledge of a long list of rules nor involve mental strain.”

Frequency Analysis of Caesar Cipher

We encrypted the previous plaintext with Caesar cipher but we don't know which key we have used.

“Ocz ntnozh ncjpgy wz, da ijo oczjmzodxvggt piwmzvfvwgz, piwmzvfvwgz di kmvxodxz. Ocz yzndbi ja ocz ntnozh ncjpgy ijo mzlpdmz nzxmzxt viy xjhkmjhdnz ja ocz ntnozh ncjpgy ijo dixjiqzidzixz ocz xjmmznkjiyzion. Ocdn dn ocz hjno avhjpn kmdixdkgz, jaozi nphhvmduzy vn Ocz nzxpmdot ja v xdkczm hpno yzkziy jigt ji ocz nzxmzxt ja ocz fzt, ijo ocz nzxmzxt ja ocz vgbjmdoch.” Ocdn dyzv dn ijr fijri vn Fzmxfcjaa’n kmdixdkgz. ocz nzxpmdot ja v xmtkojbmvkcdx ntnozh ncjpgyi’o mzgt ji ocz nzxmzxt ja ocz vgbjmdoch. Dinozvy, do ncjpgy wz wvnzy ji ocz nzxmzxt ja ocz xmtkojbmvkcdx fzt. V bjjy xmtkojbmvkcdx ntnozh ncjpgy mzhvdi nzxpmz zqzi da ocz vgbjmdoch pnzy dn fijri. Ocz fzt ncjpgy wz hzhjmvwgz rdocjpo ijozn viy ncjpgy wz zvndgt xcvibzvwgz.Ocz xmtkojbmvhn ncjpgy wz omvinhdnndwgz wt ozgzbmvkc. Ocz ntnozh ncjpgy wz kjmovwgz viy don pnz ncjpgy ijo mzlpdmz hjmz ocvi jiz kzmnji ijo mzlpdmz ocz xjixjpmnz ja nzqzmvg kzjkgz.Ocz ntnozh ncjpgy wz zvnt oj pnz viy ncjpgy izdoczm mzlpdmz fijrgzybz ja v gjib gdno ja mpgzn ijm diqjgqz hziovg nomvdi”

Of course, we could use a brute-force attack, as there are only 25 non-trivial keys to try (keyspace=25), but we will use frequency analysis this time.

Step 1: Count Letters and their Frequencies in Ciphertext:

Fortunately, we don't have to do it manually; we can use an automated tool for that. There are a lot of websites out there, but we can use this one right now.

Step 2: Compare with Expected Frequencies

As we can see, “Z” is the letter with the highest frequency in the ciphertext and “J” is the second highest. If we compare this with the letter frequencies of standard English, we find that “e” and “t” are the two most frequent letters, in that order.

Step 3: Make Substitutions

Let's Substitute The Letter “Z” with “e” and see what will happen. Again we will use the same website.

Step 4: Deduce the Shift number (Key)

If our guess is right, and since we know this ciphertext was encrypted with the Caesar cipher, we can deduce the key from the distance between the ciphertext letter “Z” and the plaintext letter “e”. Counting forward from Z to e (wrapping around the end of the alphabet) takes 5 shifts, so shifting every ciphertext letter forward by 5 decrypts the message. Equivalently, the encryption shift was 26 - 5 = 21.

Step 5: Decrypt with the key Found

Finally, let's decrypt the ciphertext with the key we found. If the key is right, our plaintext will make sense.
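The five steps can also be scripted instead of using a website. The sketch below runs them on the first sentence of the ciphertext above; it assumes the most frequent ciphertext letter stands for “e”, which is a guess that can fail on short or unusual texts. The helper name caesar_shift is my own.

```python
from collections import Counter

ciphertext = (
    "Ocz ntnozh ncjpgy wz, da ijo oczjmzodxvggt piwmzvfvwgz, "
    "piwmzvfvwgz di kmvxodxz."
)


def caesar_shift(text: str, k: int) -> str:
    """Shift every letter forward by k positions, preserving case."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + k) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


# Steps 1 and 2: count the letters and take the most frequent one.
counts = Counter(ch for ch in ciphertext.lower() if ch.isalpha())
top_letter = counts.most_common(1)[0][0]
print(top_letter)  # z

# Steps 3 and 4: assume it stands for "e" and derive the shift.
decrypt_shift = (ord("e") - ord(top_letter)) % 26
print(decrypt_shift)               # 5
print((26 - decrypt_shift) % 26)   # 21, the shift used for encryption

# Step 5: decrypt and check that the result reads as English.
print(caesar_shift(ciphertext, decrypt_shift))
```

The last line prints the opening sentence of the plaintext paragraph, which confirms the guess.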

Frequency Analysis of Affine Cipher

Again we will use the same Plaintext but this time we will use the affine cipher and also we don’t know which key we have used.

“Zrc uyuzcq uraelx nc, wh vaz zrcapczwsilly evnpciginlc, evnpciginlc wv fpiszwsc. Zrc xcuwmv ah zrc uyuzcq uraelx vaz pckewpc ucspcsy ivx saqfpaqwuc ah zrc uyuzcq uraelx vaz wvsavjcvwcvsc zrc sappcufavxcvzu. Zrwu wu zrc qauz hiqaeu fpwvswflc, ahzcv ueqqipwdcx iu Zrc ucsepwzy ah i swfrcp qeuz xcfcvx avly av zrc ucspcsy ah zrc gcy, vaz zrc ucspcsy ah zrc ilmapwzrq.” Zrwu wxci wu vao gvaov iu Gcpsgrahh’u fpwvswflc. zrc ucsepwzy ah i spyfzampifrws uyuzcq uraelxv’z pcly av zrc ucspcsy ah zrc ilmapwzrq. Wvuzcix, wz uraelx nc niucx av zrc ucspcsy ah zrc spyfzampifrws gcy. I maax spyfzampifrws uyuzcq uraelx pcqiwv ucsepc cjcv wh zrc ilmapwzrq eucx wu gvaov. Zrc gcy uraelx nc qcqapinlc owzraez vazcu ivx uraelx nc ciuwly srivmcinlc.Zrc spyfzampiqu uraelx nc zpivuqwuuwnlc ny zclcmpifr. Zrc uyuzcq uraelx nc fapzinlc ivx wzu euc uraelx vaz pckewpc qapc zriv avc fcpuav vaz pckewpc zrc savsaepuc ah ucjcpil fcaflc.Zrc uyuzcq uraelx nc ciuy za euc ivx uraelx vcwzrcp pckewpc gvaolcxmc ah i lavm lwuz ah pelcu vap wvjaljc qcvzil uzpiwv.”

Of course, we could use a brute-force attack, as there are only 312 possible keys (keyspace=312), but we will use frequency analysis this time.

Step 1: Count Letters and their Frequencies in Ciphertext

Let's use a different online tool in this one to have some variety.

Step 2: Compare with Expected Frequencies

As we can see, “C” is the letter with the highest frequency in the ciphertext and “A” is the second highest. If we compare this with the letter frequencies of standard English, we find that “e” and “t” are the two most frequent letters, in that order.

Step 3: Make Substitutions

Let’s Substitute The Letter “C” with “e” and see what will happen. Again we will use the same website.

Step 4: Look For Digrams & Trigrams

Unfortunately, it is not as simple as with the Caesar cipher: here we need at least two known plaintext-ciphertext letter pairs to be able to solve for the key. So let's continue with normal frequency analysis and look for frequent digrams and trigrams in the ciphertext, comparing them with the common ones in English.

As the picture below tells us, “ZR” is the most frequent digram, so let’s substitute it with the most common English digram, “TH”, and see if it helps.

As you can see, the famous English trigram “THE” starts to become obvious and the plaintext starts to be revealed, confirming our guesses at this point.
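Counting digrams and trigrams is also easy to script. The sketch below counts letter groups inside words only (it does not pair letters across spaces), which is one reasonable convention among several, and runs on a short fragment of the ciphertext above.

```python
from collections import Counter
import re


def ngram_counts(text: str, n: int) -> Counter:
    """Count every run of n consecutive letters inside each word."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


sample = "Zrc xcuwmv ah zrc uyuzcq"  # fragment of the affine ciphertext

print(ngram_counts(sample, 2)["zr"])          # 2
print(ngram_counts(sample, 3).most_common(1)) # [('zrc', 2)]
```

On a fragment this short the counts are tiny; run it on the full ciphertext to get the ranking the online tool shows.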

Step 5: Decrypt Using Known-Plaintext Attack and Frequency Analysis

This method is a mix of a known-plaintext attack, which uses known correspondences between plaintext and ciphertext letters to create linear equations, and frequency analysis, which uses the most frequent letters to make the initial guesses for those known pairs.

At this point, we just need to solve 2 linear equations

  • The Affine cipher encryption formula is: E(x) = (ax + b) mod 26
  • The Affine cipher decryption formula is: D(y) = a^−1 (y − b) mod 26

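Forming the Linear Equations

‘Z’ (25) corresponds to ‘t’ (19)
‘R’ (17) corresponds to ‘h’ (7)
‘C’ (2) corresponds to ‘e’ (4)

Taking the pairs for ‘e’ and ‘h’ (plaintext value in, ciphertext value out):

E(4) = 2
E(7) = 17

Subtracting the first equation from the second:

7a + b = 17 mod 26
-
4a + b = 2 mod 26
--------------------
3a = 15 mod 26
a = 15 * 3^−1 mod 26 = 15 * 9 mod 26 = 135 mod 26 = 5

Here “dividing by 3” means multiplying by the inverse of 3 modulo 26, which is 9 (since 3 * 9 = 27 = 1 mod 26).

Using this a in the first equation:
4a + b = 2 mod 26
20 + b = 2 mod 26
b = -18 mod 26
b = 8

As a check, the third pair also fits: E(19) = 5 * 19 + 8 = 103 = 25 mod 26, which is ‘Z’.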

Now we have deduced the keys a = 5 and b = 8, which are the original keys. If we now compute the inverse of “a” (21, as found earlier), we can substitute it into the decryption equation of the affine cipher and get our plaintext back.
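Solving the two congruences can be sketched in code as well. The function below is an illustrative helper, not a library call; it assumes Python 3.8 or newer for pow(x, -1, m) and raises ValueError if the difference between the two plaintext values has no inverse modulo 26, in which case a different pair of letters is needed.

```python
def recover_affine_key(x1: int, y1: int, x2: int, y2: int, m: int = 26):
    """Solve y1 = a*x1 + b and y2 = a*x2 + b (mod m) for the key (a, b)."""
    dx = (x2 - x1) % m
    dy = (y2 - y1) % m
    a = (dy * pow(dx, -1, m)) % m  # "divide" by dx using its modular inverse
    b = (y1 - a * x1) % m
    return a, b


# e (4) -> C (2) and h (7) -> R (17), as guessed from the frequency analysis
a, b = recover_affine_key(4, 2, 7, 17)
print(a, b)  # 5 8

# The third guess, t (19) -> Z (25), is consistent with the recovered key.
print((a * 19 + b) % 26)  # 25
```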

Frequency Analysis of Mixed monoalphabetic cipher

Again we will use the same Plaintext but this time we will use the Mixed monoalphabetic cipher and also we don’t know which key we have used.

“Zit lnlztd ligxsr wt, oy fgz zitgktzoeqssn xfwktqaqwst, xfwktqaqwst of hkqezoet. Zit rtlouf gy zit lnlztd ligxsr fgz ktjxokt ltekten qfr egdhkgdolt gy zit lnlztd ligxsr fgz ofegfctfotfet zit egkktlhgfrtfzl. Ziol ol zit dglz yqdgxl hkofeohst, gyztf lxddqkomtr ql Zit ltexkozn gy q eohitk dxlz rthtfr gfsn gf zit ltekten gy zit atn, fgz zit ltekten gy zit qsugkozid.” Ziol ortq ol fgv afgvf ql Atkeaigyy’l hkofeohst. zit ltexkozn gy q eknhzgukqhioe lnlztd ligxsrf’z ktsn gf zit ltekten gy zit qsugkozid. Oflztqr, oz ligxsr wt wqltr gf zit ltekten gy zit eknhzgukqhioe atn. Q uggr eknhzgukqhioe lnlztd ligxsr ktdqof ltexkt tctf oy zit qsugkozid xltr ol afgvf. Zit atn ligxsr wt dtdgkqwst vozigxz fgztl qfr ligxsr wt tqlosn eiqfutqwst.Zit eknhzgukqdl ligxsr wt zkqfldollowst wn ztstukqhi. Zit lnlztd ligxsr wt hgkzqwst qfr ozl xlt ligxsr fgz ktjxokt dgkt ziqf gft htklgf fgz ktjxokt zit egfegxklt gy ltctkqs htghst.Zit lnlztd ligxsr wt tqln zg xlt qfr ligxsr ftozitk ktjxokt afgvstrut gy q sgfu solz gy kxstl fgk ofcgsct dtfzqs lzkqof.”

Unfortunately, unlike the previous two, we can’t use a brute-force attack here, as there are 26! possible keys (keyspace=26!), which makes it infeasible. In addition, we have to go through all the frequency analysis steps (no shortcuts ^_^).

Step 1: Count Letters and their Frequencies in Ciphertext

Let’s use our online tool to count our Letters Frequencies.

Step 2: Compare with Expected Frequencies

As we can see, “T” is the letter with the highest frequency in the ciphertext and “G” is the second highest. If we compare this with the letter frequencies of standard English, we find that “e” and “t” are the two most frequent letters, in that order.

Step 3: Make Substitutions

Let’s substitute the letter “T” with “e” and see what will happen. Again we will use the same website.

Step 4: Look For Digrams & Trigrams

Next, we will look for frequent digrams and trigrams in the ciphertext and compare them with the common ones in English.

As the picture below tells us, “ZI” is the most frequent digram, so let’s substitute it with the most common English digram, “TH”, and see if it helps.

As you can see, the famous English trigram “THE” starts to become obvious and the plaintext starts to be revealed, confirming our guesses at this point.

Step 5: Analyze and Refine

Once you make initial guesses, look for common words or letter patterns in the partially deciphered text. Adjust your substitutions based on the context and letter patterns until the entire ciphertext makes sense.

If we substitute G~O, more of the plaintext is revealed, and you will start to see the trigram “Fot” a lot, which hints at the English word “not”, so we set F~N. You will then start to see “oYten” and “oY”, which can be “often” and “of”, so we set Y~F.
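This guess-and-refine loop is easier to follow with a small helper that applies the guesses made so far. In the sketch below, guessed letters are shown in lowercase plaintext and unresolved letters stay in uppercase ciphertext; the convention and the function name are my own, and the sample is a fragment of the ciphertext above.

```python
def apply_partial_key(ciphertext: str, guesses: dict) -> str:
    """Replace guessed ciphertext letters; leave the rest in uppercase."""
    out = []
    for ch in ciphertext:
        if ch.isalpha():
            out.append(guesses.get(ch.upper(), ch.upper()))
        else:
            out.append(ch)
    return "".join(out)


# Guesses collected so far: Z~t, I~h, T~e, G~o, F~n, Y~f
guesses = {"Z": "t", "I": "h", "T": "e", "G": "o", "F": "n", "Y": "f"}
sample = "Zit rtlouf gy zit lnlztd ligxsr fgz"

print(apply_partial_key(sample, guesses))
# the ReLOUn of the LNLteD LhoXSR not
```

From a partial result like this, words such as “LNLteD” and “LhoXSR” suggest the next guesses, and each new entry in the dictionary reveals more of the text.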

And if we continue like this eventually we will reach our Original Plaintext.

Finally, I hope you enjoyed reading this one and learned new techniques, not just theoretically but also practically. Hopefully, see you in the next one, where we go through the well-known transposition ciphers and their famous attacks.
