Overview of Cryptography

Piyush Deshmukh
12 min read · Jan 19, 2017


A short introduction to the concepts of cryptography

This is a theoretical article that introduces terms and concepts of cryptography you may have heard of or should know. Be warned that it does not go into the mathematical details; it is meant to introduce the field of cryptography to a novice. I have only scratched the surface of the topics, and a lot remains untouched: the aim is a wider perspective rather than a deeper look at each topic.

Cryptography is the study of techniques for secure communication, i.e. the transmission of a message from a legitimate sender to the intended receiver without allowing an adversary to interfere. This may sound like a vague definition, but it is roughly what we want to achieve.

These techniques may be relatively new to computer science, but from a mathematical perspective they have been in use for a long time, as tools to protect national secrets and strategies, most often by militaries and governments. David Kahn’s great book The Codebreakers surveys many techniques used in the past, from centuries ago up to the 20th-century methods of the time it was written.

Cryptography has found uses in the past and the present, and will undoubtedly remain useful in the future. Built on the basic idea of ensuring secure communication between trusted parties, it is a means of providing information security.

What does it mean for a message to be secure?

Now comes the million-dollar question: what does the word ‘secure’ signify?

A fundamental goal of cryptography is to address the following four objectives:

  1. Confidentiality, aka privacy, keeps the content of the information accessible only to the intended parties.
  2. Integrity refers to the requirement that data must not be manipulated by unauthorized parties; manipulation covers insertion, deletion, and substitution.
  3. Authentication, aka identification, ensures that the parties on both sides of the communication channel are the authorized entities. It also covers data origin authentication, which implicitly provides data integrity: if the message has been modified, its source has effectively changed.
  4. Non-Repudiation prevents an entity from denying previous commitments and actions.

Based on these objectives, cryptographic tools are designed to provide information security. The tools and techniques themselves should be evaluated against five criteria:

  • Level of Security
  • Functionality
  • Methods of Operation
  • Performance
  • Ease of Implementation

Talking about cryptographic systems, the Dutch cryptographer Auguste Kerckhoffs stated six design principles. Most of them are no longer relevant today, but one remains essential:

A cryptosystem should be secure even if everything about the system, except the key, is public knowledge.

A similar statement was made by Claude Shannon:

The enemy knows the system.

Encryption systems should not rely on the obscurity of an algorithm. That approach has two problems. First, if the algorithm becomes known to an adversary, your whole system turns useless, and, more importantly, you have no way to find out whether someone has obtained the details of the algorithm at any given moment; you would also have to design a new algorithm from scratch, since the old one has been rendered useless. Second, if security relies on keys instead of the secrecy of the algorithm, then you can talk to other people, discuss your algorithm and theirs, and finally agree upon a stronger one: in effect, a standard algorithm that can be shown to be more secure than the alternatives.

Then everybody can use this algorithm. Even if an attacker knows it, he cannot feasibly do anything as long as he lacks knowledge of the keys.

Basic Terminology

Before proceeding, it should be clear what the general terms and symbols used in the rest of the article mean.

  • Message Space M consists of all the possible plaintext messages.
  • Ciphertext Space C consists of all the possible ciphertext messages.
  • Let K be the Key Space, the set of keys. Each key maps each plaintext message to exactly one ciphertext message, and this mapping is a bijection.
  • Let e denote a key from the key space used for encryption of the plaintext message. Let E denote the encryption algorithm.
  • Let d denote a key from the key space used for decryption of the ciphertext message. Let D denote the decryption algorithm.
  • Encryption is the process of converting a plaintext message m into a ciphertext message c using an encryption key e from the key space K.

c = E(m, e)

  • Decryption is the process of converting a ciphertext message c back into the plaintext using a decryption key d from the key space K, which is related to the key used for encryption.

m = D(c, d)

  • The Consistency Equation must be satisfied by any cipher (E, D) for messages to be encrypted and decrypted successfully.

m = D(E(m, e), d)
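
To make this notation concrete, here is a minimal Python sketch using a toy shift (Caesar) cipher. The names E, D, e, d, m, and c mirror the symbols above; this illustrates the interface only and is not a secure scheme.

# A toy shift cipher: e is the forward shift, d = 26 - e is the related decryption key.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def E(m: str, e: int) -> str:
    """Encrypt plaintext m by shifting each letter e positions forward."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + e) % 26] for ch in m)

def D(c: str, d: int) -> str:
    """Decrypt ciphertext c by shifting each letter d positions forward."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + d) % 26] for ch in c)

e = 3
d = 26 - e
m = "attackatdawn"
c = E(m, e)                      # c = E(m, e)
assert D(c, d) == m              # consistency equation: m = D(E(m, e), d)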

Concept of Perfect Secrecy

Claude Shannon, the father of information theory, introduced the concept of perfect secrecy: given a ciphertext, it should reveal no knowledge about its corresponding plaintext.

In mathematical terms, for all messages m and m’ in the message space M and every ciphertext c in the ciphertext space C,

P[ E(m, k) = c ] = P[ E(m’, k) = c ]

where the key k is drawn uniformly at random from the key space K.

In simpler terms, given only the ciphertext, the probability that it corresponds to any particular message in M is the same for every message. Hence there are no ciphertext-only attacks against schemes like the one-time pad, which achieve perfect secrecy!
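
As a quick illustration, here is a minimal one-time pad sketch in Python (the messages are placeholders of my own). The last lines show why the ciphertext reveals nothing: for any candidate plaintext of the same length there exists a key that maps it to the very same ciphertext.

import secrets

message = b"attack at dawn"
key = secrets.token_bytes(len(message))        # truly random, as long as the message, used once

ciphertext = bytes(m ^ k for m, k in zip(message, key))
assert bytes(c ^ k for c, k in zip(ciphertext, key)) == message   # decryption works

# Perfect secrecy in action: this other plaintext is just as consistent with the ciphertext.
other = b"defend at dusk"
other_key = bytes(c ^ m for c, m in zip(ciphertext, other))       # a key that "decrypts" to `other`
assert bytes(c ^ k for c, k in zip(ciphertext, other_key)) == other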

Symmetric-key Encryption

Symmetric-key encryption refers to encryption schemes in which the keys e and d are closely related: they share a simple mathematical relationship, so knowing the value of e, the value of d can be deduced easily (often they are the same key).

[Figure: Symmetric key encryption and decryption]

Symmetric key encryption is also known as the conventional encryption scheme. It is of two types:

  • Block Cipher
  • Stream Cipher

Block Cipher

A block cipher is an encryption scheme that breaks the given plaintext message into smaller strings of fixed length, called blocks, and encrypts them one at a time.

Block ciphers can be broadly classified into two types:

  • Substitution Cipher - A substitution cipher encrypts a plaintext by replacing each unit with another unit, where a unit may be a single character or a group of characters. Note that in substitution ciphers the units themselves are modified, but their positions are kept the same: the starting and ending position of a unit is the same in the plaintext and in the ciphertext. Examples include the Caesar cipher and the Vigenère cipher. Beyond simple substitution, such ciphers can be classified as homophonic or polyalphabetic substitution ciphers.
  • Transposition Cipher - A transposition cipher, unlike a substitution cipher, changes the positions of the units without altering their values; the ciphertext is simply a permutation of the plaintext. One of the simplest examples is the Rail Fence cipher; a minimal sketch of it follows this list. Not to our surprise, cryptanalysis techniques can be employed against such ciphers, making them weak!
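
Here is the promised sketch of a transposition cipher in Python, a rail-fence style zigzag across a chosen number of rows; it is for illustration only and offers no real security.

def rail_fence_encrypt(plaintext: str, rails: int = 3) -> str:
    """Write the text in a zigzag over `rails` rows, then read the rows off in order."""
    rows = ["" for _ in range(rails)]
    rail, step = 0, 1
    for ch in plaintext:
        rows[rail] += ch
        if rail == 0:
            step = 1
        elif rail == rails - 1:
            step = -1
        rail += step
    return "".join(rows)

# Every character of the plaintext survives; only the positions change.
print(rail_fence_encrypt("WEAREDISCOVERED", 3))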

Examples of block ciphers include 3DES and AES.
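
As a hedged sketch of a modern block cipher in use, here is AES in CBC mode via the third-party cryptography package (my choice of library; the article does not prescribe one). The message is exactly one 16-byte block so that the padding a real application would need can be left out.

import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)    # 256-bit key shared by sender and receiver
iv = os.urandom(16)     # fresh initialisation vector per message

message = b"exactly 16 bytes"          # one AES block; longer messages need padding

encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
ciphertext = encryptor.update(message) + encryptor.finalize()

decryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
assert decryptor.update(ciphertext) + decryptor.finalize() == message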

Stream Cipher

A stream cipher is a symmetric cipher scheme that encrypts a message with a pseudorandom keystream. A fixed-length key is taken as input and used as the seed of a generator that produces a keystream of suitable length, which is then XORed with the plaintext message, one bit at a time.

Note, however, that the keystream is pseudorandom, not truly random. Ideally, given the first k bits of the keystream, it should be infeasible to guess the (k+1)-th bit; that is the unpredictability we want from it. If the generator is predictable, this becomes a vulnerability: by repeatedly predicting the next bit, an attacker who learns part of the keystream can reconstruct the rest of it.

One might hope that the attacker cannot obtain any keystream bits to start from, but that hope fails, because attackers often know the starting bits of your data, such as the header of a packet, whose format is the same regardless of what you transmit. Once the attacker guesses which protocol you are using, he can guess those starting bits, XOR this known plaintext with the start of the ciphertext, and recover the starting bits of the keystream. If the generator is predictable, the whole keystream can then be derived and all of the ciphertext decrypted. This is why, ideally, every bit of the keystream should be unpredictable: even with knowledge of the previous bits, obtaining the next bit should not be feasible.
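
The following Python sketch illustrates the attack just described; the header, the stand-in keystream, and the message are all hypothetical. XORing the known plaintext prefix with the ciphertext prefix exposes the corresponding keystream bytes, which is all a predictable generator needs to be broken.

# Attacker knows the fixed protocol header but not the rest of the plaintext.
header = b"GET / HTTP/1.1\r\n"
secret = b"Cookie: session=12345"
plaintext = header + secret

keystream = bytes((7 * i + 13) % 256 for i in range(len(plaintext)))   # stand-in keystream
ciphertext = bytes(p ^ k for p, k in zip(plaintext, keystream))

# Known plaintext XOR ciphertext = keystream prefix.
recovered = bytes(c ^ p for c, p in zip(ciphertext, header))
assert recovered == keystream[:len(header)]
# If the generator is predictable, this prefix lets the attacker extend the
# keystream and decrypt everything that follows.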

Public-key Cryptography

Public-key cryptography employs a technique much like symmetric-key encryption, except that d is not equal to e: the key used for encryption is different from the key used for decryption. Your gut feeling that the two keys are related to each other is correct, but knowing only one of them, obtaining the other is not practically feasible.

One of these keys has to stay secret, hence the name private key. The other is made public, hence the name public key. The public key is known to everyone, while the private key is kept by the communicating party to itself. If the private key is accidentally leaked, the security mechanism should be assumed compromised and a new pair of keys should be generated.

The convenient thing about these keys is that they are generated so that either one can be used for encryption, with the other used for decryption: a message encrypted with the private key can be decrypted with the corresponding public key, and a message encrypted with the public key can be decrypted with the corresponding private key.

Encryption using Public Key

For instance, let’s say Alice wants to send a message to Bob. Since Bob’s public key is known to everyone, Alice can use it to encrypt the message and send it to Bob. On receiving the message, Bob decrypts it using his private key. Since Bob’s private key is known only to Bob, it is guaranteed that no one other than Bob can decrypt the message. In this approach, the message is encrypted using the public key of the receiver.
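
A hedged sketch of this exchange using RSA-OAEP from the third-party cryptography package (my choice of algorithm and library, not the article’s): Bob publishes a public key, Alice encrypts with it, and only Bob’s matching private key can decrypt.

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Bob generates a key pair and publishes the public half.
bob_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
bob_public = bob_private.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Alice encrypts with Bob's public key; anyone can do this.
ciphertext = bob_public.encrypt(b"meet me at noon", oaep)

# Only Bob, holding the private key, can decrypt.
assert bob_private.decrypt(ciphertext, oaep) == b"meet me at noon"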

Encryption using Private Key

Let’s turn the tables: this time it is Bob who wants to send the message. He could use the approach above, or he could encrypt the message using his private key and send it. Now anyone with Bob’s public key (basically everyone) can decrypt this message. This strategy may seem useless, but it is important in many applications, such as the digital signatures discussed below.

Hash Functions

Cryptographic hash functions are used to compute MACs, or Message Authentication Codes. An HMAC guarantees not only that the message is authentic but also that its integrity has been preserved. It combines a hash function with a secret key shared by the communicating parties.
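
A minimal sketch with Python’s standard hmac and hashlib modules; the shared key and message are placeholders. Only someone holding the same secret key can produce or verify the tag.

import hashlib
import hmac

shared_key = b"secret shared by sender and receiver"
message = b"transfer 100 coins to Bob"

tag = hmac.new(shared_key, message, hashlib.sha256).digest()

# The receiver recomputes the tag with the same key and compares in constant time.
expected = hmac.new(shared_key, message, hashlib.sha256).digest()
assert hmac.compare_digest(tag, expected)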

Ideally, hash functions map arbitrary-sized data to a fixed-size binary string. Files that are several MBs or GBs in size can thus be condensed to an n-bit string that, in practice, uniquely identifies the whole file. This n-bit string may be 128 bits, 160 bits, or some other length, depending on the algorithm used.

The output of a hash function is known as the message digest or hash value. An algorithm that outputs an n-bit hash value has a total of 2^n possible outputs, while the input may be of arbitrary size. The benefit is that instead of comparing whole files gigabytes in size, we can simply compare their hash values, which takes far less time when verifying integrity. But since we are mapping a larger domain onto a smaller space, collisions are bound to occur. A cryptographic hash function is said to be broken once collisions are found, i.e. two messages yield the same hash value.
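
A small sketch of that idea with SHA-256 from Python’s hashlib; the byte strings below stand in for large files.

import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()   # always 256 bits, whatever the input size

original = b"x" * 10_000_000                  # pretend this is a multi-megabyte file
copy = b"x" * 10_000_000
tampered = copy[:-1] + b"y"

assert digest(original) == digest(copy)       # compare short digests instead of whole files
assert digest(original) != digest(tampered)   # any change almost surely changes the digest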

When an algorithm is broken, we need to look for a stronger one, perhaps by replacing it with one whose hash size is larger. A larger value of n reduces the chance of collisions, but computing a larger hash value requires more iterations and therefore more time. The value of n should thus be chosen so that it is neither so small that collisions occur frequently nor so large that computing the hash takes too long: there is a trade-off between the time spent generating the hash value and the likelihood of collisions.

Digital Signatures

A digital signature is a cryptographic primitive that allows an entity to bind its identity to a piece of information. It provides authentication, authorization, and non-repudiation. It is best understood by analogy with a handwritten signature on paper; we are simply interested in its electronic equivalent.

For all of you who think we could just use an electronic pen to sign digital documents: copy-pasting your signature from one document to another would be a piece of cake. So that approach ain’t gonna work!

On paper, a signature is something you possess; it is unique and belongs only to you. You can use the same signature to sign multiple documents, but only you can do so, because only you know how to make your signature exactly right. So how about, in the digital world, you having a long binary string that only you possess and nobody else knows? Let’s call this your private key.

Now, a private key is something you don’t share with anyone; as its name implies, it is private to the holder. Its usefulness lasts only as long as its secrecy.

Another thing to consider: when we sign a document in the real world, we know what information the document contains and what it means, and we intend to sign that paper and no other. Long story short, our signature is attached to the information itself. In the digital world, documents come in varying sizes, some small and some very large. To deal with the size differences, we can map every possible document to a fixed-length binary string and work with that. This mapping is theoretically not perfect, but it is practically usable, and it is safe to assume that any digital document (or any information you can think of) can be converted to a fixed-length binary string, say a 128-bit string. This is achieved using hash functions, and the mapped strings can be referred to as hash values.

Now that the information can be uniquely identified by its hash value, we can encrypt that hash value using our private key; the resulting binary string is the digital signature. The digital signature is sent along with the document for verification.

To verify a document, we have the document itself and its corresponding digital signature. The verification proceeds as follows:

  • Find the hash value of the document again using a hash function.
  • Decrypt the digital signature using the public key of the signer. This gives you the hash value of the document the signer had signed.
  • Match both the hash values and declare success if they are equal.
[Figure: Working of digital signatures]
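
Here is a hedged sketch of signing and verification using RSA-PSS from the third-party cryptography package (my choice of algorithm, not the article’s). The library hashes the document and applies the padding internally, so the verification steps listed above all happen inside verify, which raises InvalidSignature if the document or the signature has been tampered with.

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

signer_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
signer_public = signer_private.public_key()

document = b"I, Alice, agree to pay Bob 10 coins."
pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH)

# Sign with the private key (only the signer can do this).
signature = signer_private.sign(document, pss, hashes.SHA256())

# Anyone with the public key can verify the signature against the document.
signer_public.verify(signature, document, pss, hashes.SHA256())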

Because private keys are involved, digital signatures provide authentication. They also ensure that the integrity of the data is preserved, since the hash values will not match if the document has been tampered with during transmission. And the sender cannot deny having signed the document, because signing involves his private key, which only he knows; non-repudiation is therefore guaranteed.

Truth be told, this has been a very short introduction to a very deep subject. A number of things have been simplified and many more not even mentioned. If you have any suggestions, please let me know.

If you are interested in delving further into the topic, I highly recommend Professor Dan Boneh’s lectures on Coursera, Cryptography I and Cryptography II. The second course has not been released yet, but the first one covers plenty of material in detail.

Besides this, you may also subscribe to Bruce Schneier’s blog, which has many interesting articles on computer security.

In case you loved this article, please recommend it. I am Piyush Deshmukh; I have taken dozens of MOOCs and am interested in Data Science and Machine Learning, although this topic was a deviation and slightly out of my league. I am planning to write articles on ML and DS. You can follow me for updates and new articles, and if you want to connect with me, I am on LinkedIn too.

Keep Learning folks!!
