Last couple of months we’ve seen a rise in ransomware related incidents, mostly due to the increase of remote work because of COVID-19. Nevertheless, not all ransomware works in the same way, and in order to have a better incident response in the event of a successful attack, we should have a good understanding of its inner workings. This can hopefully help you to reverse the encryption mechanism of the ransomware, or at least prevent further infection.
One of the best ways of learning how something truly works is to try to build it yourself (and this is what I did). So in this two part write up we’ll spend the first explaining principles and concepts you need to understand. In the second we’ll actually code our own ransomware, applying this principles.
Note: a couple of friends with far better common sense than I have indicated that publishing the full code for a ransomware might not be the best idea, so we’ll just code SOME parts to illustrate key points.
The most important concept that you need to have when it comes to ransomware is the type of encryption used. There are two main ones, and both will be used on any decent ransomware you might encounter in the wild. We’ll briefly explain them and give references for studying the concepts on your own.
Symmetric encryption is the one that most people are familiar with. You have one secret, or key, and you use that one secret to encrypt and decrypt the data. You do both operations with the same key. This is the encryption that you would use for something like a zip file, or an Office document. The same password you use to encrypt the file is the one used to decrypt it.
Asymmetric encryption is the other concept with which most people get confused with. But its very understandable if you isolate it from specific implementations.
Basically, asymmetric encryption uses two keys instead of one. You can use any key to encrypt the file, but you need to use the other one to decrypt it. That’s it.
Now you might have heard about secret and public key. These are keys that are simply kept, like their name states, public, or secret. So anybody can access your public key, use it to encrypt something, and only you will be able to decrypt it (since it needs your secret key to be decrypted). And you can use your secret key to encrypt any piece of information, and everybody will be able to read it, since it needs your public key to be decrypted. This action is also called signing, since its purpose is not to be secret, but rather prove that you are the one that sent that message (just like a signature works in real life).
Ransomware related applications
Let’s think for a minute the flow of a regular infection. The ransomware payload gets delivered through one of multiple vectors (phishing, vulnerable software, etc…) and it gets ran on a computer. All the files get encrypted. After that a note gets shown, demanding some sort of payment form in exchange for a way to decrypt the files. How can we achieve this?
Our first instinct could be using a symmetric encryption on the files. This would be wrong, and any half-decent ransomware avoids doing this for one important reason. When the ransomware is encrypting our victim’s files, the encryption key will need to be present somewhere. If you’re using symmetric encryption, the encryption key used for encrypting can also be used for decryption. This means that a competent forensic analyst could recover the key used for encrypting during the infection, and then use it to decrypt the files. When you’re using asymmetric encryption, you use different keys for encryption and decryption, so having the encryption key in the victims computer isn’t a big deal, as long as you keep the decryption key safe.
Another important thing we need to consider is that we, as attackers, need to have that key in case the victim decides to pay the ransom. When using symmetric encryption we would need to either hardcode the key on the code of our binary (bad idea, there are multiple ways to reverse this), or generate it on the fly, and find some way to transmit it to our attacking server (also a bad idea, since it could get intercepted during transmission, and if the computer looses connectivity, we will be unable to help the victim since we wont have the key). Schemes like this were used by the first strains of CryptoDefense and allowed the victims to decrypt their files since the key was both transmitted to the server after generation and also accidentally left inside the local filesystem.
This surely means we need to use asymmetric encryption to encrypt the victims files, right? We could generate a key pair, hardcode the public one on the code, and encrypt everything with that one (keeping the private one well… secret)? No, we cant.
You see, asymmetric encryption is orders of magnitude slower than symmetric encryption. When you’re encrypting the victims hard drive you need the process to encrypt everything as quickly as possible. If the full encryption were to take more than a couple of minutes, the victim could notice that their files are getting encrypted and simply turn off the computer. This would allow him to retrieve the remaining files from the hard drive.
So what should we use?
The hybrid approach.
In order to resolve this issues we can use a hybrid approach. When we generate the payload, we also generate a set of public/private keys associated with that payload. We hardcode the public key in that payload and whenever the infection happens, the payload generates a key that will be use for the symmetric encryption. After the encryption, we encrypt the symmetric key with the hardcoded public (and of course the plaintext symmetric key gets destroyed from memory/disk). This encrypted symmetric key gets saved somewhere on the machine and we ask the victim for this key on the ransom note. But we still have another problem.
Key re utilization / chosen plaintext attack
Chosen plaintext attack is a cryptographic attack in which the attacker knows the plaintext before encryption, and given a large enough sample of encrypted files, the key could theoretically be derived from the encrypted result. This can happen with the headers of most files, which have a known format. If we use the same key for all files, given certain conditions, it could theoretically be recovered. This is actually what happened to DirCrypt. Due to a combination of bad cryptographic implementations and key re-utilization, the encryption was reversed.
This issue can be solved by using a different key for each file. We can generate the symmetric key for each file, encrypt the file, encrypt the key with the public key in the payload, write the encrypted symmetric key somewhere and delete the plaintext symmetric key.
A lot of ransomware uses this approach, where they generate a text file with each encrypted file name + the encrypted public key associated with it. When the decryption tool is used, it will read the text file, decrypt each key with the private key, and then use the decrypted key to decrypt the text file. We will use something different though.
Technical note: This type of attack doesn’t actually affect the symmetric encryption cipher that we’ll use (AES-256) since by default it uses a different randomized initialization vector (IV) for each file stream, but I wanted to explained general concepts that apply to all ransomware. This might help you recover your data one day if the malware developer made a mistake…
This attack could in fact affect RSA encryption since, in its most basic form, isn’t randomized. This wont be an issue in our case because:
a) the only thing we’ll encrypt with RSA are the AES encryption keys and these do not constitute neither a big nor a homogeneous sample to analyze, and
b) we’ll be using RSA coupled with Optimal asymmetric encryption padding which adds randomization to the encryption.
Another advantage of using a different key for each file is that the encryption key can be deleted after encrypting each file, so if any analyst tries to recover the key, he would only be able to recover the one used for the last file. If we were to use the same key for all the files, the key could be recovered in any part of the encryption process and all the files would be recoverable.
When coding my ransomware, the original version used all of the features I explained so far, but had somewhat disappointing results. When using a 32 bit key (AES-256), my initial benchmarks showed an encryption speed of approximately 1GB per minute. Granted, this speed is heavily dependent on the victims hardware, and of course I was using VMs since I didn’t want to accidentally encrypt my development computer, but still, taking 16 minutes to encrypt a simple 1TB hard drive was not desired.
So, how can modern ransomware encrypt several gigabytes of information in seconds? The answer is in the file structure.
You see, you don’t actually need to encrypt the full file to make it unusable. Depending on the file format, encrypting the headers and most of the initial bytes is enough to make the file unreadable. We could probably get away with encrypting the first 5 megabytes of each file. I know what you’re thinking and yes, simple files such as txt/ascii files will still be readable using something like strings, but most of the time these files don’t weight more than a couple kb of memory. Besides, the most precious files for the victim are usually docs, pictures and videos. You could still try to do a forensic analysis on the files and recover some of the content, but this is a manual approach that would need to be made on each individual file, and not practical at all.
Changing the ending of the file would be ideal too, and we can take advantage of this by appending two structures to the end.
- Initialization vector: When encrypting files with AES, you need something called an initialization vector. This gets generated at the beginning of the encryption process.
- Encrypted decryption key: We can also append the encrypted decryption key to the end of each file. This would eliminate the necessity of a text file with each file decryption key stored.
An encrypted file structure would end up becoming something like this:
Another advantage of the “encrypting part of the file” approach is that it allows us to work on the same file, rather than generating a new encrypted file and deleting the old one. This would be helpful in border cases where we have permissions to write to an existing file but not creating a new one. It also allows us to process very large files (something like a 500gb MySQL database) in a quick way.
After choosing the appropriate encryption for each process, we would need to design our framework to distribute this malware. This would involve automating a process that creates a set of keys for each payload, since we don’t want all the victims to share the same key (if one pays the ransom, it can distribute the key and everyone would be able to decrypt their info). We would also need to keep a database of the keys associated with each victim.
Since this part of the process is not educative at all, and can only be used for malicious purposes, I didn’t code it, nor I will explain how it could be done. The idea of this write up is to show to blue teamers things that previous ransomware has gotten wrong, and that could stop an infection from happening / spreading.
To generate the asymmetric keys for a single payload I’ll use this ssh-keygen command:
ssh-keygen -b 2048 -m pem -f pem
This will generate both a public (pem.pub) and a private key (pem) in pem format. We’ll bundle the pem.pub with our resulting executable to test our concept malware.
This is it for part one. In part two we’ll code (some parts of) the malware and benchmark its execution to demonstrate how quickly a malware can encrypt your files.