Understanding how viruses work!
HISTORY
In 1949, John von Neumann gave a talk on self-replicating automata. He was fascinated by the concept of self-replication: organisms that could create copies of themselves.
He envisioned the Universal Constructor, a machine that, when run, would replicate itself. It had the following parts:
- A blueprint of itself
- A mechanism that could read the blueprint and construct a machine
- A mechanism that could copy the blueprint
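The three parts can be seen in miniature in a quine, a program whose output is its own source code. In this Python sketch (our own illustration, not von Neumann's design), the string s is the blueprint, formatting s % s reads that blueprint to build the program text, and the %r placeholder copies the blueprint itself into the result. Comments are deliberately left out of the block: any extra line would also have to appear inside the blueprint for the copy to be exact.

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Running it prints exactly the two lines of the block, so its output could be run again to produce another identical copy.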
Following this, in 1971, a program named Creeper was written whose main purpose was to move between computers across ARPANET, the precursor of the internet.
Now, the name might sound familiar to Minecraft fans.
But that’s a different story. Let’s get back to ours.
Later, Creeper was modified to copy itself onto the computers it moved through and to display a message.
It did create a mess, so a similar program called Reaper was written to move between computers and delete any copies of Creeper. This inspired the development of Core War, the first programming game, in which two programs loaded into memory execute one instruction per turn, each trying to terminate the other!
This started a dangerous trend that later became the foundation of the malware and anti-virus programs that fight each other to survive.
This article aims to explain one particular type of malware: the virus.
A virus is a piece of code that can replicate itself and propagate to different hosts by embedding itself in various files, possibly modifying other programs or files in the process.
Professor Len Adleman coined the term virus. During an interview, when asked about the origin of the name, he said:
“I was at a conference on cryptography and ran into a reporter; he asked me what was going on. I said, ‘Not much. I’ve got a student who is researching something we’re calling computer viruses, but the research is embryotic and we haven’t got much now.’”
Later in the interview, he says:
“saying the name ‘computer virus’ to a journalist when nobody knew about them was planting the seed”
And the rest is history.
Much like its biological counterpart, it requires a host to survive, and it can spread from one system to another even if they are six feet apart!
In fact, it is rumoured that when the term was first coined, a reporter asked computer scientists whether people could catch a virus from an infected computer.
The best way to understand what viruses do is to create one.
To begin with, this virus has three functions:
- Infect: append the line ‘Congratulations, you have been infected’ to every file with a .txt extension.
- Copy: copy the virus code, starting from the #Virus line. This marker will also be the key to creating an anti-virus.
- Spread: write the virus code into every file with a .py extension.
Executing this virus would append the text to every .txt file in the same folder as the program, and would also copy the virus code into the other .py programs there.
Though not as harmful as a real virus, it gives us the basic idea of how a virus propagates.
Real-world viruses don’t leave their code in plain view; they would typically be encrypted.
Viruses, like any program, have what is known as a static signature. These signatures are unique to each program, just like a fingerprint. If a program is classified as a virus, its static signature can be flagged as malicious, and any anti-virus software that encounters that signature can then terminate the program.
The anti-virus program has two functions:
1. Search: find all files with a .py extension in the same folder as the program.
2. Destroy: delete every file returned by the search function that contains the line #Virus.
This anti-virus uses the #Virus line as a signature to identify the virus and delete the file that contains it.
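The two functions above can be sketched in Python as follows. This is a minimal sketch, not the article’s original program: the function names search, is_infected and destroy, and the dry_run switch (which only reports infected files unless set to False), are our own assumptions. It treats a file as infected only if one of its lines is exactly #Virus, so a scanner that merely mentions the marker inside a string is not flagged.

```python
from pathlib import Path

SIGNATURE = "#Virus"  # marker line carried by the toy virus


def search(folder):
    """Search: return every .py file directly inside `folder`."""
    return sorted(Path(folder).glob("*.py"))


def is_infected(path):
    """True if any line, ignoring surrounding whitespace, is exactly the signature."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return any(line.strip() == SIGNATURE for line in f)


def destroy(paths, dry_run=True):
    """Destroy: delete every infected file; with dry_run=True only report them."""
    infected = [p for p in paths if is_infected(p)]
    if not dry_run:
        for p in infected:
            p.unlink()  # remove the infected file from disk
    return infected


if __name__ == "__main__":
    # Report only; call destroy(..., dry_run=False) to actually delete.
    for p in destroy(search(".")):
        print("infected:", p)
```

Defaulting to a dry run is a safety choice for experimenting: deleting files is irreversible, and a signature match can be wrong.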
In real life, though, static signatures can be cryptographic hashes of a program’s code, which anti-virus software compares against a list of known signatures that have been classified as threats.
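A hash-based static signature can be sketched with Python’s standard hashlib module. The choice of SHA-256 and the helper names signature_of, is_known_threat and scan_file are assumptions for illustration; real products maintain large signature databases and often use several hash types.

```python
import hashlib
from pathlib import Path


def signature_of(data: bytes) -> str:
    """Static signature: the SHA-256 hash of the program's bytes, as hex."""
    return hashlib.sha256(data).hexdigest()


def is_known_threat(data: bytes, known_bad: set) -> bool:
    """Compare the signature against signatures already classified as threats."""
    return signature_of(data) in known_bad


def scan_file(path, known_bad: set) -> bool:
    """Hash a file from disk and check it against the known-bad set."""
    return is_known_threat(Path(path).read_bytes(), known_bad)
```

Because changing even a single byte produces a completely different hash, a virus that alters its own bytes with every copy slips past this kind of check.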
EVOLUTION OF VIRUSES
As the battle between viruses and anti-viruses raged on, viruses began to evolve into different forms.
- POLYMORPHIC VIRUS
These are viruses that change their final encrypted form without changing their functions. A polymorphic virus can be compared to a zebra changing its stripes.
This is achieved using a polymorphic engine, which makes the virus encrypt itself with a different key each time it replicates.
A polymorphic engine can be thought of as an encryption algorithm that uses a key to encrypt and decrypt data. The encrypted output is unique to each key, so every copy of the virus looks different.
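The idea can be sketched with a repeating-key XOR cipher, a deliberately simple stand-in for whatever encryption a real polymorphic engine uses (real engines also vary the small decryption routine itself). PAYLOAD is a harmless placeholder, and xor_cipher and new_generation are hypothetical names.

```python
import os

# Harmless stand-in for a virus body; only the way its bytes change matters here.
PAYLOAD = b"#Virus\nprint('hello')\n"


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte with a repeating key; applying it twice with the same key restores the data."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def new_generation(payload: bytes):
    """Each copy picks a fresh random key, so its encrypted form differs from the last copy."""
    key = os.urandom(8)
    return key, xor_cipher(payload, key)


key1, body1 = new_generation(PAYLOAD)
key2, body2 = new_generation(PAYLOAD)
print(body1 != body2)                      # True unless both random keys happen to be identical
print(xor_cipher(body1, key1) == PAYLOAD)  # True: the same code underneath
```

The two encrypted bodies share no fixed byte pattern, yet both decrypt to identical code, which is exactly what defeats a static signature taken over the file on disk.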
THE CURE
Code is code. The virus has to decrypt itself in memory before it can execute its instructions. Applying static signatures to the program once it has been decrypted for execution therefore lets the anti-virus detect it.
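A small simulation makes the point. A byte string stands in for the virus, a single-byte XOR stands in for its encryption, and scan is a hypothetical substring check for the #Virus signature. The signature is invisible in the encrypted copy on disk but reappears once the code is decrypted to run, which is when the anti-virus inspects it.

```python
SIGNATURE = b"#Virus"  # static signature from the toy example


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; the same call both encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def scan(buffer: bytes) -> bool:
    """Static-signature check: does the signature appear anywhere in the buffer?"""
    return SIGNATURE in buffer


payload = b"#Virus\nprint('hello')\n"
key = bytes([0x55])                   # any key works; the virus picks a new one per copy
on_disk = xor_cipher(payload, key)    # what a scan of the file sees
in_memory = xor_cipher(on_disk, key)  # what exists after the virus decrypts itself to run

print(scan(on_disk), scan(in_memory))  # False True
```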
- METAMORPHIC VIRUS
These are much more complicated than their predecessors. They change not only their encrypted form but also the code that carries out their functions. It could be compared to a leopard that turns into a tiger.
Let’s take an example. Let us say that the code executes the following:
n = n / 2
In a metamorphic virus, the next generation of the same code might instead execute
n = n * 2 / 4
Not only does the code’s encrypted form change, as in a polymorphic virus, but the algorithm itself changes, while still producing a logically equivalent result.
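The example can be checked in Python. The three generation_* functions are hypothetical rewrites of the same calculation; real metamorphic engines work on machine code with tricks such as instruction substitution and junk-code insertion, but the principle is the same: identical behaviour, different code.

```python
def generation_1(n):
    return n / 2


def generation_2(n):
    return n * 2 / 4  # different operations, same result


def generation_3(n):
    return n - n / 2  # yet another rewrite


# Same behaviour for every input tried...
for n in [0, 1, 7, 10, -3, 2.5]:
    assert generation_1(n) == generation_2(n) == generation_3(n)

# ...but the compiled bytecode differs, so a signature taken over it differs too.
print(generation_1.__code__.co_code != generation_2.__code__.co_code)  # True
```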
THE CURE
The anti-virus cannot use a static signature to detect them, because each generation’s code differs from the one before it.
Instead, anti-viruses use a ‘dynamic signature’. A program has to communicate with the operating system through system calls for many tasks, whether it is telling the time, getting a list of files, or writing to a file. A dynamic signature is built by analysing the system calls a program makes to the operating system.
Thus malicious patterns of system calls can be identified, and the anti-virus software can terminate the program that made them.
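Here is a minimal sketch of matching a dynamic signature, assuming the anti-virus has already recorded a trace of calls made by a program. The call names and the two RULES are hypothetical simplifications, not the real system-call names of any operating system.

```python
# Hypothetical behaviour rules: each is an ordered pattern of simplified calls.
RULES = {
    "mass-infection": ("list_directory", "open_file", "write_file"),
    "disable-antivirus": ("open_process", "terminate_process"),
}


def contains_in_order(trace, pattern):
    """True if the pattern's calls appear in the trace in this order (gaps allowed)."""
    calls = iter(trace)
    # `in` on an iterator consumes it, so each search resumes after the previous match.
    return all(call in calls for call in pattern)


def dynamic_scan(trace):
    """Return the names of every rule the observed call trace matches."""
    return [name for name, pattern in RULES.items() if contains_in_order(trace, pattern)]


observed = ["get_time", "list_directory", "open_file", "read_file", "write_file", "close_file"]
print(dynamic_scan(observed))  # ['mass-infection']
```

A rule this simple would also match plenty of harmless programs, such as a text editor saving files, so in practice behaviour-based detection combines many signals before terminating anything.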
THE BEST DEFENCE IS A GOOD OFFENCE
Viruses then turned to attacking the anti-virus itself, disabling the parts of its code that threaten them.
Let’s say our anti-virus has a loop that goes through all the files in a directory, checks whether each one is a virus, and removes it if it is:
import os
directory = r'C:\Windows\System32'
for file in os.listdir(directory):
    path = os.path.join(directory, file)
    if is_virus(path):  # is_virus() stands in for the actual detection check
        os.remove(path)
Now, if the virus can tamper with this loop so that its own file is skipped, the anti-virus software will never detect it:
import os
directory = r'C:\Windows\System32'
for file in os.listdir(directory):
    if file == 'virus.exe':  # line injected by the virus: skip its own file
        continue
    path = os.path.join(directory, file)
    if is_virus(path):
        os.remove(path)
Such attacks can happen at a much deeper level, even inside the operating system, by tampering with the system calls that are made so that the virus avoids detection. After all, the OS itself relies on algorithms and data structures.
If those are compromised, the virus may not be visible even when you open the folder that contains it.
This leads us down a rabbit hole in which viruses and anti-viruses attack each other at ever deeper levels.
POLYGLOTS
Every application you open a file with, be it a text editor or Photoshop, is just a bunch of code. The file, on the other hand, can be thought of as code that stores your data and displays it when opened by the correct application.
This is similar to how a Python program can be understood only by the Python interpreter and not by a C compiler.
Polyglots exploit this separation. A polyglot file is valid in more than one format at once, so the application that opens it treats it as an ordinary file, while a different interpreter reads something else entirely in it.
To understand this, see the following code.
#!/bin/bash
#<?php
echo "Hello world \n This is a new line \n in php not in bash \n"
#?>
This code is a mixture of two languages, bash and PHP. Run it with bash and then with the PHP interpreter, and it prints the message differently:
- In bash, # starts a comment, so bash ignores every line that begins with it.
- PHP only executes code between the <?php and ?> tags, so anything outside them is printed as-is (apart from the shebang line, which the PHP command-line interpreter skips).
- Similarly, bash’s built-in echo does not interpret \n as a newline unless given the -e option, but PHP does interpret it inside double-quoted strings.
Hence, when bash runs the file, it sees only the bash code; similarly, the PHP interpreter looks only at the valid PHP code.
Many malware families abuse file formats this way to spread, embedding themselves inside ordinary files. This may cause barely noticeable changes, such as a slight colour shift in an image or faint noise in an audio file, so when anti-virus software scans the file, it looks normal. Yet to the OS, it is also an executable file.
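One crude variant of this trick is to append a payload after the point where an image format officially ends, since image viewers typically stop reading there. The sketch below, a heuristic of our own rather than how any particular anti-virus works, walks the chunks of a PNG file (an 8-byte signature followed by chunks of length, type, data and CRC) and returns any bytes left after the final IEND chunk. It does not catch payloads hidden inside the pixel data itself, which is the subtler case described above. The make_chunk helper exists only to build a test file.

```python
import struct
import zlib

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def trailing_data(png: bytes) -> bytes:
    """Return any bytes stored after the PNG's IEND chunk (empty if none)."""
    if not png.startswith(PNG_MAGIC):
        raise ValueError("not a PNG file")
    pos = len(PNG_MAGIC)
    while pos + 8 <= len(png):
        # Each chunk: 4-byte big-endian length | 4-byte type | data | 4-byte CRC
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        pos += 12 + length  # skip length, type, data and CRC
        if chunk_type == b"IEND":
            return png[pos:]
    raise ValueError("no IEND chunk found")


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk with a correct CRC (used here only to create test files)."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


# Not a viewable image, just enough structure for the parser.
fake = PNG_MAGIC + make_chunk(b"IHDR", bytes(13)) + make_chunk(b"IEND", b"") + b"echo hidden"
print(trailing_data(fake))  # b'echo hidden'
```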
CONCLUSION
The foundation of what is now malware was never laid with malicious intent. People thought it was fun and exciting to create code that could propagate through the network and replicate itself.
Humans are the weakest link when it comes to cybersecurity. Keeping our software updated, using good anti-virus software and not installing malicious apps are some of the countermeasures we can take to protect our space in the cyber domain. There is nothing stopping malware from taking control of your system other than your permission.
Remember, the Internet was never built with security in mind!
This article is published as a part of the ‘Hacker Series’ under Spider Research and Development Club, NIT Trichy on a Web Wednesday!