Welcome to my attempt at a deep static analysis of the Mirai botnet source code. This blog will serve as a running journal of that analysis. It is my first attempt at an analysis like this, so the blog will document my process as it unfolds. I expect to make some assumptions that further analysis reveals to be incorrect, and I will correct them as I move deeper through the source code. This will be an unedited look at my process, and I would love suggestions from those more experienced as I go.
Although this source code has been public for several years, new Mirai infections are still being discovered and code reuse from this bot remains prevalent. IoT devices will continue to be soft targets on the internet for some time to come, and studying what will soon be legacy botnets such as this one is crucial to understanding future malware and attacks.
As I move through the code, I will also need to explore the C and Go languages at a much deeper level than I currently understand them. By the end, I hope to gain further insight into how botnets are structured, the techniques this bot uses to defend its territory and attack other systems, and the protocols it uses to communicate with the rest of its network. Along the way, I also hope to uncover some indicators of compromise on an infected system, build a general profile of the author’s coding style, and maybe even discover some of the same coding patterns in other malware source that can be found through simple OSINT work.
I want to use all the information available to me, both about the source code and inside it. Why shortchange myself, right? That would just be inefficient analysis.
Here is my general plan of attack:
1. Work to gain a general overview of the code base. I figure a great place to start is the build.sh scripts that the author shipped with the project. These should give me a good idea of the finished binaries and how the modules are compiled together, as well as my first little nuggets about any measures the author took to protect the bot from being analyzed in production, such as stripping the binaries. I also hope to learn which architectures the author designed the bot for.
2. Next, I want to begin mapping the botnet’s main components and build a general picture of how I think these components work together and what each one does. I think it’s best to start by identifying any comments and any function, variable, and file names that give clues about the bot’s components and functionality before diving deeper. For example, start_bot() would presumably start the bot. (This function isn’t in the code; it is just an example.) The end result will be a tentative mapping of the botnet’s components and each component’s functionality. At this stage the mapping is only an assumption. My assumptions about these functions and how they interact can, and most likely will, change as I dig deeper.
3. After I make the above mapping, my next goal will be to scan the code for common anti-debugging techniques and for functionality that obfuscates the running binaries, making a list of the defenses the author may use. A compiled list of these techniques will help me begin to build a profile of the author, as well as some possible indicators of compromise that can be identified in network traffic.
4. My next step will be to dig deeper into each component to map its control flow, and then use this knowledge to refine my component diagram. Along the way I will add to my running list of techniques that help develop a profile of the author. These techniques could include the protocols used for networking, or any custom functions that handle basic tasks the C library already provides a function for. A simple example would be if the author has written their own function to copy data in memory when the task could have been accomplished with a standard function like memcpy(). This list will of course evolve as I go along.
5. Finally, I hope to have developed a strong overview of the code by this point, so that I can expand my research to searching for malware samples that reuse parts of the code or employ similar techniques. This will mainly be done through Google dorking, GitHub dorking, and so on. I will of course welcome any samples from the community.
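To make the example in step 4 concrete, here is a minimal sketch in C of what a hand-rolled copy routine might look like next to the standard memcpy(). The names custom_copy and copies_match are hypothetical, made up purely for illustration. They are not taken from the Mirai source, and I am not claiming the author wrote anything like this.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hand-rolled copy routine (illustration only, not from Mirai).
 * Copies len bytes one at a time, which is what memcpy() already does for
 * non-overlapping buffers. */
void *custom_copy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len--)
        *d++ = *s++;

    return dst;
}

/* Hypothetical helper: returns 1 if custom_copy() and memcpy() produce
 * byte-identical output for the given input, 0 otherwise. Inputs longer
 * than the 64-byte scratch buffers are rejected with 0. */
int copies_match(const void *src, size_t len)
{
    unsigned char a[64];
    unsigned char b[64];

    if (len > sizeof a)
        return 0;

    custom_copy(a, src, len);
    memcpy(b, src, len);

    return memcmp(a, b, len) == 0;
}
```

If I run into a routine shaped like custom_copy during the review, it goes on the list: it behaves the same as the library call, so the choice to rewrite it may say something about the author’s habits or build constraints.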
I’m going to restrict my analysis to static review, but I may employ some dynamic methods if I am unsure about a particular functionality. The steps above are iterative, like any good reconnaissance and analysis: one step may provide information that sends me back to a previous step, and I refine my analysis with each iteration. I may also add steps as new components are discovered.
I hope the reader enjoys this journey with me and even brings a bit to the table. Who knows, maybe my analysis can add to the rest of the community’s research.