Botnet Detection with ML

Mar 23, 2018

A botnet is an organized, automated army of zombie machines that can be used to launch DDoS attacks, flood inboxes with spam, or spread viruses. In practice, this army consists of a large number of computers. Attackers use it for malicious purposes, and the owners of the zombie machines are generally not even aware that their computers are being used this way.

Zombies have been used extensively to send spam mail; as of 2005, an estimated 50–80% of all spam was sent by zombie computers worldwide. This allows spammers to avoid detection and presumably reduces their bandwidth costs, since the owners of the zombies pay for their own bandwidth. The general structure of a botnet attack is given below.

This process is carried out through a centralized command-and-control entity (C&C), which is run by the botmaster. The botmaster uses it to initiate, manage, or suspend attacks carried out by all the infected machines (bots). The aim of the C&C mechanism is therefore to increase the number of zombie machines and to coordinate them for a wide range of destructive operations. What distinguishes a botnet from other types of network attacks is the presence of a C&C in the network. The bots receive instructions from the C&C and act on them; these commands range from launching a worm or spam attack over the Internet to disrupting legitimate user requests.

A botnet can do almost anything you can imagine doing with many computers connected to a network. Its power comes from its distributed resources.

With advances in technology, every personal computer now has a considerable amount of processing power (CPU, GPU) and bandwidth. So every personal computer that joins a botnet makes the botnet more powerful.

Tasks that require a lot of processing power can easily be done in distributed networks. In such a network, the work is divided into sub-tasks, and each sub-task is assigned to an individual machine. The main purpose of a botnet is to combine these multiple resources into one incredibly powerful resource, whether that is bandwidth or processing capacity. Once attackers have built a botnet with enough bots, they can use it for many malicious purposes. Some examples are:

Distributed Denial-of-Service Attacks (DDoS)
Spamming
Sniffing Traffic & Keylogging
Infecting New Hosts
Identity Theft
Attacking IRC Chat Networks
Hosting of Illegal Software
Google AdSense Abuse & Advertisement Addons
Click Fraud
Manipulating Online Polls
Remote Use of Computers
Attacking Bank Computers (ATMs or any other machines, since they are also networked)
Manipulating Games
Exploiting Private Documents

The world has witnessed many botnets and their attacks. Each of them has caused material damage to the targeted firms, and some botnets have grown enormously and caused very large damage across the world. The best-known ones are described below.

1. Zeus

What if the zombie infection did not just affect humans, but affected pets and farm animals too? Zeus was not only a successful botnet on Windows machines; it also had a component that stole online banking codes from a variety of infected mobile devices (Symbian, Windows Mobile, Android, and BlackBerry). In 2012, the US Marshals and their tech-industry partners took down the botnet. But the original authors took pieces of their creation and brought it back to life as Gameover Zeus, which the FBI and its partners took down in the summer of 2014. That wasn't the end of the story either: its creators have rebuilt their zombie network. (Source: http://www.welivesecurity.com/2014/10/23/top-5-scariest-zombie-botnets/ )

2. Cutwail

Cutwail is on the list because of its sheer scale. In 2009, the botnet controlled up to 2 million computers and sent 74 billion spam emails per day, or nearly a million per second. This accounted for 46.5% of the world's entire spam volume at the time. In 2010, researchers disabled two-thirds of Cutwail's control servers. (Source: http://www.welivesecurity.com/2014/10/23/top-5-scariest-zombie-botnets/ )

3. Srizbi

Srizbi BotNet, also known by its aliases Cbeplay and Exchanger, was considered one of the world's largest botnets and was responsible for sending out more than half of the spam messages sent by all the major botnets. The botnet consisted of computers infected by the Srizbi trojan, which sent spam on command. The botnet suffered a significant setback in November 2008 when the hosting provider McColo was taken down; global spam volumes dropped sharply as a result. The size of the Srizbi botnet is estimated at around 450,000 compromised machines, with estimates differing by less than 5% among various sources. The botnet was reported to be capable of sending around 60 billion spam messages a day, more than half of the approximately 100 billion spam messages sent every day. By comparison, the highly publicized Storm botnet only reached around 20% of the total amount of spam sent during its peak periods. (Source: https://en.wikipedia.org/wiki/Srizbi_botnet )

4. Mirai

Mirai (“the future” in Japanese) is malware that turns networked devices running Linux into remotely controlled “bots” that can be used as part of a botnet in large-scale network attacks. This botnet was used in the 2016 Dyn cyberattack. The attack took place on October 21, 2016, and involved multiple denial-of-service (DoS) attacks targeting systems operated by the Domain Name System (DNS) provider Dyn, which made major Internet platforms and services unavailable to large numbers of users in Europe and North America.

The distributed denial-of-service (DDoS) attack was carried out through a large number of DNS lookup requests from millions of IP addresses. It is believed that the attack was executed through a botnet consisting of a large number of Internet-connected devices, such as printers, IP cameras, residential gateways, and baby monitors, that had been infected with the Mirai malware. According to experts, with an estimated load of 1.2 terabits per second, the attack was the largest DDoS attack on record. The most affected areas are shown in the figure above. (Source of the figure: https://en.wikipedia.org/wiki/2016_Dyn_cyberattack )

Notably, the zombie machines in this botnet were mostly IoT devices rather than personal computers. Mirai identifies vulnerable IoT devices using a table of more than 60 common factory-default usernames and passwords, and logs into them to infect them with the Mirai malware. These devices can be hacked more easily than personal computers because they lack security infrastructure. Infected devices continue to function normally, except for occasional slowness and increased bandwidth use. A device remains infected until it is rebooted. After a reboot, unless the login password is changed immediately, the device will be reinfected within minutes. (Sources: Mirai, 2016 Dyn cyberattack)
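Because Mirai relies on factory-default logins, one simple defensive check is to audit a device inventory for credentials that still match known defaults. The sketch below assumes a hypothetical inventory format (device name mapped to its configured username and password) and a small illustrative list of default pairs; it is not Mirai's actual credential table.

```python
# Illustrative factory-default (username, password) pairs.
# Hypothetical list for the example; NOT Mirai's real table.
KNOWN_DEFAULTS = {
    ("admin", "admin"),
    ("root", "root"),
    ("admin", "password"),
    ("user", "user"),
}


def find_default_credentials(inventory, defaults=KNOWN_DEFAULTS):
    """Return the names of devices whose configured login is a known default.

    `inventory` maps a device name to its configured (username, password)
    tuple; the format is an assumption made for this sketch.
    """
    return sorted(
        name
        for name, creds in inventory.items()
        if creds in defaults
    )
```

For example, an inventory in which a lobby camera still uses ("admin", "admin") would have that camera reported, and the remedy is the one described above: change the password.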

Hacker News users reported that the following sites were down: Twitter, Etsy, GitHub, SoundCloud, Spotify, Heroku, PagerDuty, Shopify, Intercom.

Netflix, Slack, Imgur, HBO Now, PayPal, PlayStation Network, Yammer, Seamless, and many other services also experienced interruptions on the day of the attack. Mirai is certainly not the only IoT botnet, and we are likely to witness further IoT botnet attacks in the near future.

Botnet detection and elimination are therefore important and challenging tasks in the cyber security domain. Large, security-conscious companies have made great efforts to detect and eliminate botnets. For example, the ZeuS botnet malware package, which runs on Microsoft Windows, operated in this manner for over three years, eventually leading to an estimated $70 million in stolen funds and the arrest of over a hundred individuals by the FBI in 2010. ZeuS remained active even after these arrests. Microsoft, which suffered the most from this botnet, made a great effort to eliminate ZeuS. Eventually, in March 2012, Microsoft announced that it had succeeded in shutting down the “majority” of ZeuS's C&C servers.

Detecting a zombie machine is not an easy task. Even if one zombie machine is detected, what about the rest of the network? Identifying the entire network of a specific botnet is harder still, and it becomes even harder when the zombies are IoT devices.

How can we detect a botnet? This is where machine learning comes into play.

Botnet detection is somewhat different from the detection mechanisms used by other malware and anomaly detection systems. Before explaining botnet detection techniques, we want to explain the differences and similarities between botnet detection and malware/anomaly detection, so that the distinction is clear.

The term anomaly detection refers to the problem of finding exceptional communication patterns in network traffic that do not conform to the expected normal behavior. Each category of anomaly detection techniques rests on its own assumption about what counts as normal and what counts as anomalous data.
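A minimal way to make "does not conform to the expected normal behavior" concrete is a z-score test: learn the mean and standard deviation of a traffic metric (for example, bytes per minute) from a period assumed to be clean, then flag new values that lie too many standard deviations away. This is only one simple assumption about what "normal" means, chosen for illustration; the function name and the default threshold of 3 are our own choices.

```python
from statistics import mean, stdev


def zscore_anomalies(baseline, observations, threshold=3.0):
    """Return the indices of observations that deviate strongly from the baseline.

    baseline: values of a traffic metric assumed to be normal (at least two).
    observations: new values to test against that baseline.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the clean period
    if sigma == 0:
        # Constant baseline: any differing value is treated as anomalous.
        return [i for i, x in enumerate(observations) if x != mu]
    return [
        i for i, x in enumerate(observations)
        if abs(x - mu) / sigma > threshold
    ]
```

With a baseline of roughly 100 units per minute, a sudden reading of 400 is flagged while readings of 97 or 101 are not. The choice of "normal" model is exactly the per-category assumption mentioned above; other techniques would replace the mean and standard deviation with clusters, rules, or learned models.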

In contrast to other types of attack detection, botnet detection refers to detecting malicious/anomalous activities that are coordinated within a controlled network environment. Malware distributors use botnets as a means of spreading malicious and anomalous activities around the globe. As a result, botnets, which consist of remotely controlled networks of hijacked computers, have become popular.

The basic aim of this distributed, coordinated network is to carry out various malicious activities over the network, including phishing, click fraud, spam generation, copyright violations, keylogging, and, most importantly, DoS attacks (further examples are listed above). Botnets are recognized as a serious threat to network resources on the Internet.

In summary, the attacks detected by IDSs/IPSs typically represent an individual pattern launched from one specific source, whereas attacks produced by a botnet are part of a large network. The goal of botnet detection is therefore to identify all the compromised assets of the botnet and to take down its C&C servers.
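Since the goal is to find every compromised asset rather than a single attack source, one common step is to pivot from a bot that has already been identified to other hosts that talk to the same destinations. The sketch below assumes flow records reduced to hypothetical (source_host, destination) pairs, and it treats every destination of the known bot as a candidate C&C address; in a real network, shared services such as DNS resolvers or update servers would have to be whitelisted first.

```python
def hosts_contacting(flows, suspicious_destinations):
    """Return the hosts that contacted any of the suspicious destinations.

    flows: iterable of (source_host, destination) pairs, e.g. from flow logs.
    """
    suspicious = set(suspicious_destinations)
    return sorted({src for src, dst in flows if dst in suspicious})


def expand_from_known_bot(flows, known_bot):
    """Find other hosts that share destinations with one confirmed bot.

    Hypothetical heuristic: every destination the known bot contacted is
    treated as a candidate C&C address.
    """
    flows = list(flows)  # allow a generator to be scanned twice
    candidate_cc = {dst for src, dst in flows if src == known_bot}
    return [h for h in hosts_contacting(flows, candidate_cc) if h != known_bot]
```

Starting from a single detected zombie, the function returns its likely peers, which is a first step toward mapping the rest of the botnet.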

Here, we focus only on botnet detection techniques developed using machine learning and give a brief explanation of how they work. There are many such techniques in the literature. The general structure of botnet detection techniques is given below.

Botnet detection techniques are classified into two broad categories: IDSs and honeynets. A honeynet is used to collect information from bots for further analysis, in order to measure the technology used, the botnet's characteristics, and the intensity of the attack. The information collected from bots is also used to discover the C&C system, unknown vulnerabilities, the techniques and tools used by the attacker, and the attacker's motivation. In addition, a honeynet is used to collect the binaries of bots that penetrate it. However, intruders have developed novel methods to evade honeynet traps. The key component of a honeynet trap is the honeywall, which separates the honeypots from the rest of the world.
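As an illustration of how data collected by a honeynet can point toward the C&C system, the sketch below counts how many infected honeypots independently connected to each outbound destination. A destination reached from several separate honeypots is a plausible C&C candidate. The log format (honeypot name mapped to a list of "ip:port" strings) and the min_honeypots cutoff are assumptions made for the example.

```python
from collections import Counter


def rank_candidate_cc(honeypot_logs, min_honeypots=2):
    """Rank outbound destinations by how many honeypots contacted them.

    honeypot_logs: dict mapping a honeypot name to the destinations its
    captured bot connected to. Returns (destination, count) pairs, highest
    count first, keeping only destinations seen from >= min_honeypots hosts.
    """
    counts = Counter()
    for destinations in honeypot_logs.values():
        counts.update(set(destinations))  # count each honeypot at most once
    ranked = [(dst, n) for dst, n in counts.items() if n >= min_honeypots]
    # Sort by descending count, then by destination for a stable order.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))
```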

The other category of botnet detection techniques is based on IDSs. An IDS is a software application or hardware device that monitors system services for malicious activity. IDS detection techniques are further classified into two approaches: signature-based and anomaly-based.

In signature-based systems, botnet signatures provide information about the behavior of specific botnets. However, these techniques cannot detect an unknown botnet for which no signature has yet been created.
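The limitation described above is easy to see in a toy signature matcher. The byte patterns below are hypothetical placeholders, not real botnet signatures, and real IDS rule languages are far more expressive than plain substring matching.

```python
# Hypothetical signatures: name -> byte pattern expected in bot traffic.
SIGNATURES = {
    "example-irc-bot": b"JOIN #botchan",
    "example-http-bot": b"GET /cmd?id=",
}


def match_signatures(payload, signatures=SIGNATURES):
    """Return the names of all signatures whose pattern occurs in the payload."""
    return sorted(
        name for name, pattern in signatures.items() if pattern in payload
    )
```

A payload containing "JOIN #botchan" matches, but a bot that joins any other channel slips through, because no signature exists for it yet.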

Anomaly-based detection is a prominent research domain in botnet detection. The basic idea is to analyze network traffic irregularities, including traffic passing through unusual ports, high network latency, increased traffic volume, and system behavior indicating malicious activity in the network. Anomaly-based approaches are further divided into host-based and network-based approaches. In host-based approaches, individual machines are monitored to find suspicious actions. Despite the importance of host-based monitoring, this approach is not scalable, as every machine must be fully equipped with effective monitoring tools.
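One of the irregularities mentioned above, traffic passing through unusual ports, can be checked per host by comparing current connections against the destination ports that host used during a period assumed to be clean. The data layout here is an assumption made for the sketch.

```python
def unusual_port_events(baseline_ports, connections):
    """Flag connections to destination ports missing from a host's baseline.

    baseline_ports: dict host -> set of destination ports seen while clean.
    connections: list of (host, dst_port) pairs observed now.
    Hosts without any baseline are flagged for every connection.
    """
    flagged = []
    for host, port in connections:
        if port not in baseline_ports.get(host, set()):
            flagged.append((host, port))
    return flagged
```

A workstation that normally uses only ports 53, 80 and 443 would be flagged the first time it connects to port 6667, a port historically associated with IRC, which many early bots used for C&C.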

Unlike the other techniques, network-based approaches analyze network traffic and extract information about botnets using machine learning. A network monitoring tool examines network behavior based on characteristics such as bandwidth, burst rate (as evidence of botnet C&C), and packet timing. It filters out traffic that is unlikely to be part of botnet activity and classifies the remaining traffic into a group that is likely to be part of a botnet.
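The filter-then-classify pipeline described above can be sketched with a single packet-timing feature. The assumption here is that a bot polling its C&C connects at nearly regular intervals, so the coefficient of variation (standard deviation divided by mean) of its inter-arrival gaps is small, while human-driven traffic is burstier. The whitelist, the 0.1 cutoff, and the function names are hypothetical; a real system would also use features such as bandwidth and burst rate, and typically a trained classifier rather than a fixed threshold.

```python
from statistics import mean, pstdev


def is_periodic(timestamps, max_cv=0.1, min_events=5):
    """Return True if inter-arrival times look machine-regular (beaconing)."""
    if len(timestamps) < min_events:
        return False  # too few events to judge
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    avg = mean(gaps)
    if avg == 0:
        return False
    return pstdev(gaps) / avg <= max_cv


def suspicious_flows(flows, whitelist):
    """Two-stage sketch: filter likely-benign traffic, then classify the rest.

    flows: dict mapping (src, dst) -> list of connection timestamps (seconds).
    whitelist: destinations assumed benign (hypothetical, site-specific).
    """
    # Stage 1: drop traffic unlikely to be botnet-related.
    remaining = {k: v for k, v in flows.items() if k[1] not in whitelist}
    # Stage 2: keep flows whose timing looks like automated C&C polling.
    return sorted(k for k, v in remaining.items() if is_periodic(v))
```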

Machine learning techniques are widely used in both kinds of anomaly-based approaches, host-based and network-based. Techniques in use include decision trees, neural networks, graph theory, artificial immune systems, clustering-based techniques, data-mining-based techniques, correlation, and entropy, among others.
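Of the techniques listed, entropy is the easiest to show in a few lines. One way it can be applied, assumed here for illustration, is to measure how spread out a host's destination ports (or destination addresses) are: a host scanning many ports produces a high entropy value, while normal traffic concentrated on a few services produces a low one.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For instance, four connections to ports 1, 2, 3 and 4 give 2.0 bits, while four connections to port 80 give 0 bits. A detector would compare such values against a per-host baseline rather than a fixed number.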

Host Based Botnet Detection Techniques

In host-based anomaly detection techniques, the behavior of bots is investigated by scanning the processes related to specific applications installed on the host machine. Each bot independently executes the commands it receives from the C&C system, and each command has certain parameters, specific types, and a predetermined execution order.

There are many studies on host-based approaches that detect botnets on the client side. Some of them are explained below; these examples are explained in detail in the article. We drew heavily on that article when writing this Botnet Detection article.

BotSwat (Stinson and Mitchell, 2007) is a tool for monitoring home operating systems (such as Windows XP, Windows 2000, and Windows 7) and identifying home machines suspected of being bots. BotSwat first acts as a scanner, monitoring the execution status of the Win32 library and observing the runtime system calls made by a process. It tries to discover bots through generic properties, regardless of the particular C&C architecture, communication protocols, or botnet structure. The problem with this approach is the lack of security for system calls. (This paragraph is taken from this source.)
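To our understanding, the BotSwat work looks at whether the arguments of a process's system calls come from data the process received over the network, which is a sign of remote control. The toy model below illustrates that idea only: it assumes each system call is recorded as a name plus a tuple of string arguments, and it treats any argument that exactly equals a string received from a socket as "tainted". The SysCall type and the function name are hypothetical; this is not BotSwat's implementation, which tracks tainted data at a much finer grain.

```python
from dataclasses import dataclass


@dataclass
class SysCall:
    name: str    # system call / API name, e.g. "CreateFile"
    args: tuple  # argument values (strings in this toy model)


def remote_controlled_calls(syscalls, network_input):
    """Return names of calls whose arguments reuse data received from the network.

    network_input: set of strings the process read from sockets. Exact-match
    reuse is a crude stand-in for real taint tracking.
    """
    tainted = set(network_input)
    return [c.name for c in syscalls if any(a in tainted for a in c.args)]
```

A process that reads "payload.exe" from a socket and then passes that same string to a process-creation call would be reported, while its ordinary file access would not.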

Masud et al. (2008) developed an effective host-based botnet detection technique that uses a flow-based detection method, correlating multiple log files on the host machines. Because bots normally respond more quickly than humans, they can easily be recognized by mining and correlating multiple log files. The authors propose that this technique can be applied efficiently to both IRC and non-IRC bots, by correlating several host-based log files to detect C&C traffic. (This paragraph is taken from this source.)
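The intuition that bots respond faster than humans can be sketched as a correlation between two host log files: for each incoming message, look for outgoing activity that follows it within a very short delay. The two-timestamp-list format and the 0.5-second threshold are assumptions for this example, not values from Masud et al.

```python
def fast_responses(incoming, outgoing, max_delay=0.5):
    """Pair incoming messages with outgoing activity that follows them quickly.

    incoming / outgoing: sorted lists of timestamps (seconds) taken from two
    different host log files. For each incoming message, the first outgoing
    event at or after it is checked; if it occurs within `max_delay`
    seconds, the pair is reported as a too-fast-for-a-human response.
    """
    pairs = []
    j = 0
    for t_in in incoming:
        # Skip outgoing events that happened before this incoming message.
        while j < len(outgoing) and outgoing[j] < t_in:
            j += 1
        if j < len(outgoing) and outgoing[j] - t_in <= max_delay:
            pairs.append((t_in, outgoing[j]))
    return pairs
```

A host where most incoming messages are answered within a fraction of a second, consistently, is a better bot candidate than one with a single fast reply.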

The multi-agent bot detection system (MABDS) (Szymczyk, 2009) is a hybrid technique that combines an event-log analyzer with a host-based intrusion detection system (HIDS). It uses multi-agent technology that brings together an administrative agent, a user agent, a honeypot agent, system analysis, and a knowledge database. The main problem with this technique is the slow convergence of new signatures into the knowledge base. (This paragraph is taken from this source.)

Network Based Botnet Detection Techniques

In a network-based botnet detection strategy, malicious traffic is captured by observing network traffic across different parameters, including traffic behavior, traffic patterns, response time, network load, and link characteristics. Network-based approaches are further classified into two types: active monitoring and passive monitoring.

Active monitoring: In an active monitoring policy, new packets are injected into the network in order to detect malicious activity.

Passive monitoring: In passive monitoring, network traffic is sniffed as data passes through the medium and is then analyzed by applying different anomaly detection techniques. Passive monitoring techniques draw on a variety of models, including statistical approaches, graph theory, machine learning, correlation, entropy, stochastic models, decision trees, discrete time series, Fourier transformation, group-based analysis, data mining, clustering, neural networks, visualization, and combinations of these.
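As a sketch of the group-based and clustering approaches listed above, passively observed hosts can be grouped by how similar their sets of contacted destinations are, measured with Jaccard similarity. Hosts that land in the same group behave alike, which group-based methods treat as a hint of coordinated activity. The input format and the 0.5 threshold are assumptions; the grouping is plain single-linkage clustering, computed as connected components of the similarity graph with a union-find structure.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_similar_hosts(host_destinations, threshold=0.5):
    """Group hosts whose destination sets are similar (single linkage).

    host_destinations: dict host -> set of destinations it contacted.
    Returns a sorted list of groups, each a sorted list of host names.
    """
    hosts = sorted(host_destinations)
    parent = {h: h for h in hosts}  # union-find forest

    def find(h):
        while parent[h] != h:
            parent[h] = parent[parent[h]]  # path halving
            h = parent[h]
        return h

    for i, a in enumerate(hosts):
        for b in hosts[i + 1:]:
            if jaccard(host_destinations[a], host_destinations[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for h in hosts:
        groups.setdefault(find(h), []).append(h)
    return sorted(groups.values())
```

The pairwise loop is quadratic in the number of hosts, which is fine for a sketch; large deployments would use indexing or sampling before comparing hosts.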

BotProb (Tokhtabayev and Skormin, 2007) is considered an active monitoring strategy: it injects packets into the network payload to determine whether suspicious activity is caused by humans or by bots. Bots usually respond to commands in a predetermined pattern, which reflects the cause-and-effect correlation between the C&C and the bots. This command-and-response structure makes it easy to determine the existence of bots, because the responses follow the predetermined command behavior. (This paragraph is taken from this source.)
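The command-and-response reasoning above can be sketched as a consistency test: re-inject the same suspected command several times and check whether the client always replies in exactly the same way. The log format and the min_repeats parameter are hypothetical, and this is a simplification for illustration rather than the published BotProb algorithm.

```python
from collections import defaultdict


def deterministic_responder(probe_log, min_repeats=3):
    """Decide whether a client answers repeated probes in a fixed pattern.

    probe_log: list of (command, response) pairs collected by re-injecting
    suspected commands. Returns True only if at least one command was seen
    `min_repeats` times and every such command always got the same response.
    """
    responses = defaultdict(set)
    counts = defaultdict(int)
    for command, response in probe_log:
        responses[command].add(response)
        counts[command] += 1
    tested = [c for c in counts if counts[c] >= min_repeats]
    return bool(tested) and all(len(responses[c]) == 1 for c in tested)
```

A human is unlikely to answer the same message identically every time, whereas a bot executing a fixed command handler usually does.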

Written by Ebubekir Büber