Encrypting/Decrypting a file using OpenSSL EVP

The missing README for OpenSSL encryption/decryption in C Language

Oct 6, 2017

Background

OpenSSL provides a large, full-featured, general-purpose cryptographic toolkit. It is often said that crypto modules are hard to write. Even talented cryptographers who know the math behind an algorithm find it hard to implement securely. You may know the XOR operations, the modular arithmetic, the bit manipulations and the theory behind them, but it is very easy to trip up and write unsafe code when there is no community of cryptographers reviewing your implementation. This is where OpenSSL shines. Cryptographic operations have well-defined math, and to make them efficient many implementations are written in assembly. Here’s an example on Github. These instructions are carefully written to use the optimized instruction sets of the microprocessor and perform the large computations efficiently. Learn more with an example: AES-NI in Laymen’s Terms (trust me, it’s a wonderful article).

So, the whole point of this argument is to use the optimized, safe code provided by the OpenSSL EVP functions. Let’s first get the acronym out of the way.

What does EVP stand for?
Well, it looks like whoever coined it didn’t make it search engine friendly. The most probable meaning is “envelope” (as in envelope encryption). Let me know in the comments if you know anyone from the OpenSSL community who has publicly written about it.

Okay, so what about EVP?
The EVP interface supports authenticated encryption and decryption, as well as the option to attach unencrypted, associated data to the message. It provides a set of user-level functions that can be used to perform various cryptographic operations. Here are the signatures on Github.

In case you need a quick intro to some interesting functions, here it is:

EVP_CIPHER_CTX_new() : Creates a cipher context
EVP_CIPHER_CTX_free() : Clears all information from a cipher context and frees any allocated memory associated with it, including the context itself
EVP_EncryptInit_ex() : Sets up a cipher context for encryption with the given cipher type, optionally taken from an ENGINE implementation

and so on.

Prerequisites

Just get a vanilla Ubuntu machine and you’re good to go.

I always prefer to use a Docker image for demonstrations since it lets me show every installation step without polluting my current machine. So, if you love Docker too, feel free to use one, or you can perform the steps directly on your Ubuntu machine.

Setup

[Optional] Get a simple ubuntu image from Dockerhub
$ docker run -it --name openssl_evp_demo ubuntu:16.04 /bin/bash
Install basic tools required for this demo and OpenSSL
$ sudo apt-get update && sudo apt-get install -y vim openssl libssl-dev

NOTE: You don’t need sudo if you’re using a docker image since you will be logged in as root

Make sure you have the crypto libraries installed:

# cd /usr/lib/x86_64-linux-gnu/
# ls -a *crypto*
libcrypto.a  libcrypto.so  libhcrypto.so.4  libhcrypto.so.4.1.0  libk5crypto.so.3  libk5crypto.so.3.1

openssl package contains the openssl binary and related tools.
libssl-dev package contains SSL development libraries, header files and documentation

Preface

Let’s develop the functions one by one and take it slow. Let’s begin with the main() function which will be followed by the encryption/decryption function.

If you observed that I referred to encryption/decryption as one single function, you’re a good reader 👍 Yes, we can do that!

main()

We need a structure to pass around the key, iv, encryption flag and the cipher type:

typedef struct _cipher_params_t{
    unsigned char *key;
    unsigned char *iv;
    unsigned int encrypt;
    const EVP_CIPHER *cipher_type;
}cipher_params_t;

Here are the key operations to perform:

Initialize the key and initialization vector
Open the given file for reading and open another file for writing the ciphertext(encrypted file)
Encrypt the given file and close the file descriptors of both encrypted file and given file
Open the encrypted file for reading and open another file for writing the decrypted file
Decrypt the encrypted file and close the file descriptors of both decrypted and encrypted file

What if I want to use a different cipher?

Change the cipher assigned in the below line:
params->cipher_type = EVP_aes_256_cbc();

to available options in https://github.com/openssl/openssl/blob/master/include/openssl/evp.h

NOTE: You also need to make suitable changes to key size and function calls if you are using other algorithms.

Where did that struct cipher_params_t come from?

It’s a user defined struct for transporting variety of pointers and data conveniently. You may choose not to use it.

What are those constants?

/* 32 byte key (256 bit key) */
#define AES_256_KEY_SIZE 32

/* 16 byte block size (128 bits) */
#define AES_BLOCK_SIZE 16

What is RAND_bytes() , where is that function?

#include <openssl/rand.h>
/* It's a library function available in the OpenSSL toolkit */
int RAND_bytes(unsigned char *buf, int num);

RAND_bytes() puts num cryptographically strong pseudo-random bytes into buf. An error occurs if the PRNG has not been seeded with enough randomness to ensure an unpredictable byte sequence.

What is params->encrypt and why do we set it?

It’s like a flag that we use to decide if we want to

encrypt(params->encrypt = 1) or
decrypt(params->encrypt = 0)

You will see how and where it is used when we move on to discussing encrypt/decrypt functions.

Hope all your questions on main() have been answered 😃

file_encrypt_decrypt()

Steps:

Create a new context for encryption/decryption. Think of it as a data structure that holds a lot of vital information related to your encryption/decryption
Initialize the context with key, initialization vector and the encryption/decryption algorithm
Read up to BUFSIZE bytes at a time from the input file until the end of the file
Encrypt/Decrypt the read bytes based on the flag params->encrypt
Write the encrypted/decrypted bytes into the output file
Apply the cipher on the final block to make sure padding is taken care of and write the output bytes into the output file
Clear all information from the cipher context and free up any allocated memory associated with it.

Let’s start with the questions you may have.

Why is the size of out_buf equal to (BUFSIZE + cipher_block_size)?

Great question! In fact, this is one of the most important questions, because it lets you understand some interesting internals of the EVP functions.

This question requires a bit more effort to understand what’s going on. Hang tight. Let’s first start with basics of AES block length.

For AES, NIST selected three members of the Rijndael family, each with a block size of 128 bits, but three different key lengths: 128, 192 and 256 bits. When using AES cipher in any mode with any size key, the block size is 128 bits (16 bytes). AES block cipher encrypts 16 bytes at a time. So, it requires the input to be multiple of this block size. Since not all input will be a multiple of this block size, you need to use padding to add additional bits to the input data to make it a multiple of block size before doing encryption using a block cipher.

For example: If your file size is 8 bits (1 byte), we require 15 bytes of padding. If your file size is 15 bytes, we require 1 byte of padding.

If padding is enabled (the default) then EVP_EncryptFinal_ex() encrypts the “final” data, that is any data that remains in a partial block. It uses PKCS#7 padding. The encrypted final data is written to out_buf which should have sufficient space for one cipher block.
It may not be useful here but I’m sure you might have pondered for a second, what on earth is this PKCS#7 padding and how does it work. Here’s a quick answer:
The PKCS#7 padding string consists of a sequence of bytes, each of which has value equal to the total number of padding bytes added.

The following example shows how this padding works. Given a block size of 16 bytes, if the input data length is 12 bytes, then 4 padding bytes are added, each having the value 04.

Data: FF FF FF FF FF FF FF FF FF FF FF FF
PKCS7 padding: FF FF FF FF FF FF FF FF FF FF FF FF 04 04 04 04
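If you prefer code to prose, here is a small sketch of the padding rule. OpenSSL does this for you internally; pkcs7_pad() is a hypothetical helper written only to illustrate the arithmetic.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only. Copies in[0..in_len) to out
 * and appends PKCS#7 padding. out must have room for in_len + block_size
 * bytes. Returns the padded length. */
size_t pkcs7_pad(const unsigned char *in, size_t in_len,
                 unsigned char *out, size_t block_size)
{
    /* Always between 1 and block_size, never 0. */
    size_t pad = block_size - (in_len % block_size);

    memcpy(out, in, in_len);
    /* Every padding byte carries the padding count as its value. */
    memset(out + in_len, (int)pad, pad);
    return in_len + pad;
}
```

With 12 bytes of input and a 16 byte block, pad is 4 and the output ends in 04 04 04 04, matching the example above. With an input length that is already a multiple of 16, pad is 16 and a whole extra block is added.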

When a full block is received, the block is encrypted. At any one time, between 0 and 15 bytes are still buffered (if 16 bytes were buffered, that would be a full block and it would be encrypted right away). When the “finalize” function is called, between 1 and 16 bytes are appended to the buffered data (to reach a length of 16), and that final block is encrypted like all the others. If 0 bytes were buffered at that point, 16 padding bytes are appended (all of numerical value 16), and that extra block is encrypted. The question is: if 0 bytes were buffered at that point, why was there a need to add the extra 16 bytes?

Good question.
Answer: This is necessary so the deciphering algorithm can determine with certainty whether the last byte of the last block is a pad byte indicating the number of padding bytes added or part of the plaintext message. Consider a plaintext message that is an integer multiple of 16 bytes with the last byte of plaintext being 01. With no additional information, the deciphering algorithm will not be able to determine whether the last byte is a plaintext byte or a pad byte. However, by adding 16 bytes each of value 16 after the 01 plaintext byte, the deciphering algorithm can always treat the last byte as a pad byte and strip the appropriate number of pad bytes off the end of the decrypted data; the number of bytes to strip is given by the value of the last byte.

I understand that can be a little overwhelming but sit back and imagine this for a minute or two. Get a whiteboard and write this for a better visualization.

This is the reason why PKCS#7 “always” has padding bytes (1–16).
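The receiving side can be sketched like this. Again, OpenSSL strips the padding for you in EVP_DecryptFinal_ex(); pkcs7_unpad_len() is a hypothetical helper that only illustrates why the last byte can always be trusted to be a pad byte.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only. Given decrypted data of
 * length len, returns the plaintext length once PKCS#7 padding is
 * stripped, or -1 if the padding is malformed. */
long pkcs7_unpad_len(const unsigned char *buf, size_t len, size_t block_size)
{
    if (len == 0 || len % block_size != 0)
        return -1;

    /* The last byte is always a pad byte and says how many to strip. */
    size_t pad = buf[len - 1];
    if (pad == 0 || pad > block_size)
        return -1;

    /* Every pad byte must carry the same value. */
    for (size_t i = len - pad; i < len; i++) {
        if (buf[i] != pad)
            return -1;
    }
    return (long)(len - pad);
}
```

Take a 16 byte message whose last byte is 01. With the extra block of sixteen 16s appended, the function strips exactly that block and returns 16. Without the extra block, the same function would read the trailing 01 as padding and return 15, silently eating a byte of real data.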

Official note:

The amount of data written depends on the block alignment of the encrypted data: as a result the amount of data written may be anything from zero bytes to (BUFSIZE + cipher_block_size - 1), so out_buf should contain sufficient room.

Also, consider a simple example file of 16 bytes, and suppose a developer decides to use an in_buf and out_buf of, let’s say, 5 bytes each.
Remember that the EVP functions for encryption and decryption always wait until they have at least 1 full block to work with. All data passed in before that point is buffered, and the count is maintained in ctx->buf_len:

+-------------------+--------------+
| fread bytes count | ctx->buf_len |
+-------------------+--------------+
| 0                 | 0            |
+-------------------+--------------+
| 5                 | 5            |
+-------------------+--------------+
| 5                 | 10           |
+-------------------+--------------+
| 5                 | 15           |
+-------------------+--------------+
| 1                 | 16           |
+-------------------+--------------+

When that last byte is read and EVP_CipherUpdate() is called, things start to move since we have reached the magic number of 16 (1 block). Now, if the out_buf passed to EVP_CipherUpdate() cannot hold the 16 encrypted bytes, the write overflows the buffer; this is undefined behavior and typically shows up as stack smashing and a core dump.

For similar reasons, the EVP documentation also mentions:

The buffer passed to EVP_DecryptUpdate() should have sufficient room for (inl + cipher_block_size) bytes unless the cipher block size is 1 in which case inl bytes is sufficient.

Since we are using the same out_buf for both encryption and decryption, it makes sense to have its size as BUFSIZE + cipher_block_size. Hope I have answered your question. I understand it was a bit long but I guess it’s worth the knowledge.

What’s with this `EVP_CIPHER_CTX_new()` ?

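EVP_CIPHER_CTX_new() returns a pointer to a newly created EVP_CIPHER_CTX on success and NULL on failure. Of course, your next question would be: what is EVP_CIPHER_CTX? Here you go (note that since OpenSSL 1.1.0 this struct is opaque: it lives in an internal header, and application code cannot access its fields directly):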

typedef struct evp_cipher_ctx_st EVP_CIPHER_CTX;
struct evp_cipher_ctx_st {
    const EVP_CIPHER *cipher;
    ENGINE *engine;           /* functional reference if 'cipher' is
                               * ENGINE-provided */
    int encrypt;              /* encrypt or decrypt */
    int buf_len;              /* number we have left */
    unsigned char oiv[EVP_MAX_IV_LENGTH]; /* original iv */
    unsigned char iv[EVP_MAX_IV_LENGTH]; /* working iv */
    unsigned char buf[EVP_MAX_BLOCK_LENGTH]; /* saved partial block */
    int num;                    /* used by cfb/ofb/ctr mode */
    void *app_data;             /* application stuff */
    int key_len;                /* May change for variable length cipher */
    unsigned long flags;        /* Various flags */
    void *cipher_data;          /* per EVP data */
    int final_used;
    int block_mask;
    unsigned char final[EVP_MAX_BLOCK_LENGTH]; /* possible final block */
} /* EVP_CIPHER_CTX */ ;

How will I know when I have reached the End of File (EOF)?

The main breaking condition is:

if (num_bytes_read < BUFSIZE) {            
    /* Reached End of file */            
    break;        
}

Let’s consider some simple examples to see why it works:
BUFSIZE = 100
Filesize = 10
The first call to fread reads all 10 bytes (num_bytes_read = 10) and we move on to encrypting them. At the end of the while loop, 10 < 100 and we are done.

BUFSIZE = 100
Filesize = 102
The first call to fread reads 100 bytes (num_bytes_read = 100) and we move on to encrypting them. At the end of the while loop, 100 is not less than 100, so we continue reading. The second call reads the remaining 2 bytes, which move on to the encryption buffers. At the end of the while loop, 2 < 100 and we exit. Remember that fread is our friend and will only read till EOF.
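Here is the same loop stripped of all crypto so you can count the reads yourself. The function name count_reads() is made up for this sketch, and I have added an ferror() check, since a short read can also mean an I/O error rather than EOF.

```c
#include <stdio.h>

#define BUFSIZE 100

/* Hypothetical helper: reads fp to the end in BUFSIZE chunks and stores
 * the total byte count in *total. Returns the number of fread() calls
 * made, or -1 on a read error. */
int count_reads(FILE *fp, long *total)
{
    unsigned char buf[BUFSIZE];
    int calls = 0;

    *total = 0;
    for (;;) {
        size_t num_bytes_read = fread(buf, 1, BUFSIZE, fp);
        calls++;
        if (ferror(fp))
            return -1;
        *total += (long)num_bytes_read;
        /* The real code would hand buf to EVP_CipherUpdate() here. */
        if (num_bytes_read < BUFSIZE) {
            /* Reached end of file */
            break;
        }
    }
    return calls;
}
```

A 10 byte file takes one call and a 102 byte file takes two, as in the examples above. A file of exactly 200 bytes takes three: the last fread returns 0 bytes, which is still a short read and ends the loop.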

How is it possible to use the same EVP function calls to encrypt and decrypt?

Good question again. Let the source code answer your question:

int EVP_CipherUpdate(EVP_CIPHER_CTX *ctx, unsigned char *out, int *outl, const unsigned char *in, int inl)
{
    if (ctx->encrypt)
        return EVP_EncryptUpdate(ctx, out, outl, in, inl);
    else
        return EVP_DecryptUpdate(ctx, out, outl, in, inl);
}

int EVP_CipherFinal_ex(EVP_CIPHER_CTX *ctx, unsigned char *out, int *outl)
{
    if (ctx->encrypt)
        return EVP_EncryptFinal_ex(ctx, out, outl);
    else
        return EVP_DecryptFinal_ex(ctx, out, outl);
}

EVP_CipherInit_ex(), EVP_CipherUpdate() and EVP_CipherFinal_ex() are functions that can be used for decryption or encryption. The operation performed depends on the value of the enc parameter. It should be set to 1 for encryption, 0 for decryption.

Where did we set the `enc` parameter for the `ctx` to know?

Check out the below line in code:
EVP_CipherInit_ex(ctx, NULL, NULL, params->key, params->iv, params->encrypt)

Looks like EVP_EncryptUpdate() is a function that can be used for updating the ciphertext file for encrypted bytes. Am I right?

EVP_EncryptUpdate() encrypts num_bytes_read bytes from in_buf and writes the encrypted version to out_buf. This function can be called multiple times to encrypt successive blocks of data. The amount of data written depends on the block alignment of the encrypted data: as a result the amount of data written may be anything from zero bytes to (num_bytes_read + cipher_block_size - 1), so out_buf should contain sufficient room. The actual number of bytes written is placed in out_len.

I am guessing `EVP_EncryptFinal_ex()` will finalize the encryption for any partial blocks?

EVP_EncryptFinal_ex() encrypts the “final” data, that is any data that remains in a partial block. The encrypted final data is written to out_buf which should have sufficient space for one cipher block. The number of bytes written is placed in out_len. After this function is called, the encryption operation is finished and no further calls to EVP_EncryptUpdate() should be made.

There are so many signatures of all these EVP functions, how do I know that? Where do I start?

Relax, all the signatures are here: evp.h

I have seen some examples/demos in which `EVP_EncryptUpdate()` and `EVP_DecryptUpdate()` are used. Which one shall I choose?

Totally up to you. You have already seen under the hood what EVP_CipherUpdate() does. If you just wish to write your encryption and decryption as separate functions, you are welcome to choose those specific EVP functions.

Looks like your code has more safety-net lines than actual encryption/decryption logic. I mean, what are all those if conditions, cleanup and fprintf calls? Do we really need that?

I would highly recommend it but you may choose to *believe* that all calls will work flawlessly and do away with all the if checks. In case a function fails and you are forced to exit prematurely, you will leave around a lot of open files, allocated memory on heap and some other mess.

BTW, this is probably just the simple primary layer of safety net you just saw. Organizations with production code have more sophisticated layers than this to fail gracefully.

In general, which functions should I be wary of?

Just keep in mind that file I/O functions, memory allocation functions and all EVP functions have the potential to fail.

I am kind of a beginner with basic knowledge, and I am looking for something more basic: text encryption in which I can set up a key and IV and encrypt/decrypt small lines of text. Do you have anything for me?

That’s a lot of information to process, can I just have a quick demo to see how it’s working?

Sure,

Github : OpenSSL EVP Demo of file encryption and decryption

Your feedback/comments for any changes/improvements are always welcome (BTW, appreciations are also welcome 😬)