Linguistics: The Enemy of Security

Hemanth Chitti
The Fun Of Cryptography
4 min read · Apr 6, 2020

In the last post (https://bit.ly/2UE67O5) we saw two different ways of encrypting a message. These methods must have a flaw somewhere: as I explained in the first post, every system has some vulnerability. Besides, you wouldn’t be studying cryptography if these methods were everything we needed. So let’s look at the main factor that leads to many ciphers being broken.

Frustrated student in front of laptop
A science nerd in language class, who’s about to find out what he did wasn’t pointless after all.

(1) “Weethaut rily anderstund is santans strukchar haf a is herd yu noteeced a that to?”

Oh you didn’t understand that at all? Let me try again.

(2) “Really understand without sentence structure is a have you noticed hard to that a?”

Damn, it’s still not good enough? Okay, I promise I’ll do better this time, and you can leave this post if you still don’t get it.

(3) “Have you noticed that a sentence is really hard to understand without a structure?”

Well, even if you didn’t notice that fact before, I hope you did now through this frustrating exercise. :D

A language* has rules to play by, and breaking them leads to sentences like the first two nonsensical ones. Let’s check what’s wrong with each of them, and we will probably arrive at a reasonable conclusion.

In (1), both the rules of spelling and grammar are thrown out of the window. That’s why you get that painful abomination which you probably wouldn’t have understood until you saw (3).

(2) shows some progress from (1): at least the spelling is right, so we can understand each word. But that doesn’t mean we’d understand the sentence as a whole, because it doesn’t follow the rules of grammar at all.

At this point I think it’s understood why (3) is legible. It’s because it follows the rules of spelling and grammar, or in other words, the rules of linguistics.

These rules are the basis of any language in the world. They might differ from language to language, but for anything to count as a language it needs some agreed-upon rules so that anybody who learns it can understand it. And these rules are the nail in the coffin of security. Why so?

The fact that there is a structure significantly constrains the number of possible ways to hide a message, for the simple reason that a message still has to be meaningful upon decryption. If the final output on decryption looks like (2), the encryption has failed, because the intended reader cannot make sense of it. This also means that an attacker can guess some phrases, because they know the words have to follow a particular order.

And this isn’t limited to just phrases! It turns out that everything you learnt in those boring literature and grammar classes from elementary school could be really useful!

For example, if I gave you the phrase “sla aol jha vba vm aol ihn” and you found out that the first three words are “let the cat”, it would be easy to guess that the rest are “out of the bag”, because you know the common idiom. Or, if I tell you that a letter in a word is q, the next letter is almost certainly u (look at nearly any English word containing q, excluding names and a few loanwords).
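The idiom example can be sketched in a few lines of Python. This is only an illustration under one assumption that the text does not state: that the phrase was produced by shifting every letter by the same fixed amount. The helper names `shift_from_crib` and `decrypt` are made up for this sketch. It uses one known word pair to work out the shift, then applies that shift to the whole phrase.

```python
import string

ALPHABET = string.ascii_lowercase


def shift_from_crib(cipher_word: str, plain_word: str) -> int:
    """Return the single shift that turns plain_word into cipher_word.

    Assumption: every letter was shifted by the same amount.
    Raises ValueError if the guessed word does not fit that assumption.
    """
    if len(cipher_word) != len(plain_word):
        raise ValueError("crib and ciphertext word differ in length")
    shifts = {
        (ALPHABET.index(c) - ALPHABET.index(p)) % 26
        for c, p in zip(cipher_word, plain_word)
    }
    if len(shifts) != 1:
        raise ValueError("crib is not consistent with a single shift")
    return shifts.pop()


def decrypt(ciphertext: str, shift: int) -> str:
    """Undo a fixed shift on lowercase letters; leave other characters alone."""
    out = []
    for ch in ciphertext:
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) - shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


ciphertext = "sla aol jha vba vm aol ihn"
shift = shift_from_crib("sla", "let")   # the guessed first word
recovered = decrypt(ciphertext, shift)
print(shift, recovered)
```

Guessing a single word is enough here to recover the whole phrase, which is the point of the example: the structure of the language hands the attacker a starting point.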

This is probably the most important tool to use against a general unknown cryptosystem. We can always combine knowledge of linguistics and common phrases with some known weakness of a system to find out more about it. This is how the famous Enigma encryption was attacked: many messages contained predictable phrases, such as a closing “Heil Hitler”, and Allied codebreakers used these guessed fragments in combination with other weaknesses of the machine to crack it.

Enigma machine
Most complex machine of the time, defeated by a military salute

This is why people sometimes put a lot of effort into obscuring these relations between words. They may purposely misspell some words so that the intended reader can still understand them but an attacker has a harder time. One example is using pliz instead of please, u instead of you, etc. in the plaintext. Once these are encrypted, it is harder for an attacker to figure out what they are. Another example is using numbers or symbols instead of certain letters: thi$ i$ r3411y wh4t I’m t41king 4bout. These are all different ways of obscuring the structure of the sentences to make unauthorized decryption harder.
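As a rough sketch, the symbol substitution from the sample sentence could be applied to a plaintext before it is encrypted. The mapping below (s, e, a, l) is inferred from that one sentence, and the names `SYMBOL_MAP` and `obscure` are made up for this illustration.

```python
# Letter-to-symbol mapping inferred from the sample sentence (an assumption).
SYMBOL_MAP = {"s": "$", "e": "3", "a": "4", "l": "1"}


def obscure(text: str) -> str:
    """Swap selected lowercase letters for look-alike symbols before encryption."""
    return "".join(SYMBOL_MAP.get(ch, ch) for ch in text)


obscured = obscure("this is really what I'm talking about")
print(obscured)
```

Because the mapping is fixed, this is only a light layer of disguise applied on top of whatever cipher follows it.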

We’ll study this in more depth later when we get to frequency analysis. For now, we will study some classic ciphers and why they don’t work.

*We will be dealing with the English language only in our study of cryptography. However, the concepts we learn can always be applied to other languages.
