Machine Learning in Cyber Security Workshop

A Review of "Botnet Detection Using Graph‑based Feature Clustering" by Chowdhury et al.

Yenson Lau
Aggregate Intellect

--


This project is part of the Machine Learning in Cyber Security workshop organized by Aggregate Intellect. This post serves as a proof of work and summarizes some of the concepts covered in the workshop, along with advanced topics pursued by students.

Paper Reviewed: "Botnet detection using graph-based feature clustering," Chowdhury et al., 2017 [7].

Contributors: Tryambak Kaushik, Nick, Ildar Abdrashitov, and Rouzbeh Afrasiabi

Editors: Willy Rempel, Susan Shu Chang

Introduction

In March 2019, the Los Angeles and Salt Lake City power grids came under cyber attack, and similar threats were reported in other parts of California and in Wyoming. Although no power outages resulted from these attacks, the Department of Energy reported that they caused a "disruption in normal operation of the system". Various Ecuadorian government websites and public institutions experienced similar attacks in June 2019 [2]. Private businesses and service providers such as Telegram and Ubisoft (both in June 2019) have also been on the receiving end of cyber attacks, resulting in poor customer service and loss of business.

To put this in perspective, approximately 20% of all internet traffic is generated by bad bots, and approximately three-quarters of bad bots are Advanced Persistent Bots (APBs), i.e., bots that can switch IP addresses, and with them their identities and behavior [1].

Botnets are networks of compromised machines ("zombies"), usually under the control of a single remote attacker known as the botmaster, who directs the compromised hosts through a command and control channel (C2 or C&C). Because a botnet attack operates as a distributed system, i.e., several infected machines acting synchronously or asynchronously toward a common goal, these types of cyber threats are increasingly difficult to mitigate.

Classical botnet detection methods such as signature- or anomaly-based detection quickly become obsolete as bot algorithms, encryption, packet structure, or data volumes change. In such circumstances, detecting botnets by identifying topological features with graph-based methods offers a promising way to identify the affected hosts and take corrective action. The method by Chowdhury et al. does not need to cross-compare data flows between different connections in a network; instead, it identifies structural or community-based patterns, such as social correlations, community disconnections, temporal co-occurrences, and graph clusters. Furthermore, botnets are by design resistant to node-level takedown and thus display assortativity. Graph features can capture assortativity, and thus have a distinct advantage over statistical methods [6].

Chowdhury et al. (2017) built a weighted directed graph using IP addresses as nodes and connections as edges, then used graph features for unsupervised clustering to filter out benign (non-bot) nodes. This captured botnet behavior more representatively and thus yielded a higher detection success rate.

The authors used the CTU-13 dataset [3], which consists of 13 subsets, or scenarios, each corresponding to a different botnet type and attack setup. The dataset is summarized in Table 1.

Table 1: Botnet Scenarios in CTU-13 dataset
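
For readers who want to follow along, here is a minimal sketch of loading one scenario with pandas, assuming the bidirectional NetFlow (.binetflow) files distributed with CTU-13; the file name below is hypothetical.

```python
import pandas as pd

# Load one CTU-13 scenario. The .binetflow files are CSVs whose columns
# include SrcAddr, DstAddr, and TotPkts, among others (file name is
# hypothetical; substitute the scenario you downloaded).
flows = pd.read_csv("capture20110810.binetflow")

# Keep only the fields needed to build the graph: source IP, destination
# IP, and total packets exchanged (used later as edge weights).
edges = flows[["SrcAddr", "DstAddr", "TotPkts"]]
print(edges.head())
```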

Graph Features

The paper discusses extracting seven graph-based features from the dataset, listed below (a short code sketch after the list shows one way to compute them):

For the sake of simplicity, we will assume that we are dealing with directed graphs, i.e., graphs in which each connection between nodes has a direction.

  • “In degree feature” for a particular node counts its “incoming” connections, i.e., the number of edges directed towards that node. A high in degree indicates that neighboring nodes tend to establish many connections with that particular node.
  • “Out degree feature” is the opposite of the in degree feature and counts the number of connections that a particular node initiates.
  • “In degree weight feature” represents the total number of packets received by a node from its connected neighbors. This can be useful if we assume that bots usually receive the same types of commands and the same volume of information.
  • “Out degree weight feature” is the opposite of the in degree weight and counts how many packets a node sends to its connected neighbors, under the assumption that bots send the same types of commands and the same information.
  • “Node betweenness centrality” measures the number of shortest paths between pairs of nodes that pass through a particular node. It helps quantify how interconnected the nodes are, which aids botnet detection.
  • “Local clustering coefficient” quantifies how close a node’s neighbors are to forming a fully connected graph (clique) among themselves.
  • “Eigenvector centrality” measures the influence of a node in a graph: a node is influential if it is connected to other influential nodes.
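
Here is a minimal sketch of how these seven features might be computed with NetworkX (the paper does not name an implementation; the tiny edge list below is synthetic):

```python
import networkx as nx

# Build a small weighted directed graph: nodes are IP addresses, edges are
# connections, and weights are packet counts (synthetic illustration).
edges = [
    ("10.0.0.1", "10.0.0.2", 25),
    ("10.0.0.2", "10.0.0.3", 10),
    ("10.0.0.3", "10.0.0.1", 5),
    ("10.0.0.4", "10.0.0.1", 8),
]
G = nx.DiGraph()
G.add_weighted_edges_from(edges)

in_degree = dict(G.in_degree())                   # F1: in degree
out_degree = dict(G.out_degree())                 # F2: out degree
in_weight = dict(G.in_degree(weight="weight"))    # F3: in degree weight
out_weight = dict(G.out_degree(weight="weight"))  # F4: out degree weight
betweenness = nx.betweenness_centrality(G)        # F5: betweenness centrality
clustering = nx.clustering(G.to_undirected())     # F6: local clustering coefficient
eigenvector = nx.eigenvector_centrality(G.to_undirected(), max_iter=1000)  # F7

# Assemble the 7-dimensional feature vector for each node.
features = {
    n: [in_degree[n], out_degree[n], in_weight[n], out_weight[n],
        betweenness[n], clustering[n], eigenvector[n]]
    for n in G.nodes()
}
```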

Clustering

The authors used a two-step approach to cluster the dataset and separate the benign nodes from the bots. First, after creating a graph from the NetFlow data and extracting the graph-based features, they filtered out all single-degree nodes. This removed a large amount of data and drastically reduced the search space.

Single-degree nodes are very unlikely to be bots, since bots typically establish many connections in order to attack normal nodes. Next, the authors used a Self-Organizing Map (SOM) to cluster the data nodes. Once the SOM algorithm has run, the largest cluster is removed. Although the search space is now much smaller, classification tools or subject matter experts are still needed to search the remaining data nodes for bots. The specifics of the bot detection algorithm are as follows:

1. Create a directed graph using IP addresses as the nodes and connections as the edges.

2. Compute the 7 graph-based features (F1–F7) for each node in the graph.

3. Filter out single-degree nodes.

4. Run the SOM to cluster the remaining nodes.

5. Arrange the map nodes in ascending order of how many data nodes they contain.

6. Remove the largest map node.

7. Starting with the smallest remaining cluster, investigate all remaining nodes in the rest of the clusters (using classification tools or subject matter experts).

8. Stop the algorithm when bots are detected.
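
A compact sketch of steps 3–7, reusing `G` and `features` from the earlier NetworkX sketch and assuming the third-party MiniSom library (the paper does not name an implementation):

```python
import numpy as np
from minisom import MiniSom

# Step 3: filter out single-degree nodes (total degree == 1).
candidates = [n for n in G.nodes() if G.in_degree(n) + G.out_degree(n) > 1]
X = np.array([features[n] for n in candidates], dtype=float)

# Normalize each feature dimension to [0, 1] so no feature dominates.
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-9)

# Step 4: train a 5x5 SOM on the 7-dimensional feature vectors.
som = MiniSom(5, 5, 7, sigma=1.0, learning_rate=0.5, random_seed=42)
som.random_weights_init(X)
som.train_random(X, num_iteration=1000)

# Steps 5-6: group data nodes by their best matching unit (BMU), order
# the clusters by size, and discard the largest one.
clusters = {}
for node, x in zip(candidates, X):
    clusters.setdefault(som.winner(x), []).append(node)
ordered = sorted(clusters.values(), key=len)
suspicious = ordered[:-1]  # drop the largest cluster

# Step 7: hand the smallest clusters to a classifier or an analyst first.
for cluster in suspicious:
    print(len(cluster), cluster[:5])
```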

In the second step, the authors used the SOM to separate the nodes into benign and non-benign groups. The algorithm first initializes each map node on an M×N grid with a small random vector. In the second step, each data point is mapped to the closest map node (distinguishing a map node, which is part of the SOM, from a data node, which is part of the data). This map node is called the Best Matching Unit (BMU). In the third step, the algorithm updates all map nodes in a certain neighbourhood of the BMU (including the BMU itself). These steps are then repeated for all data points.

In the paper, the authors create a 5×5 grid of map nodes, each initialized with a 7-dimensional vector to match the data vectors holding the seven graph-based features. For example, suppose we have a data node with in degree = 1, out degree = 2, in degree weight = 25 MB (an illustrative placeholder), out degree weight = 20 MB (likewise), node betweenness centrality of 0.4, local clustering coefficient of 0.01, and eigenvector centrality of 0.3.

After each dimension is normalized, this gives us the vector (0.8, 0.6, 0.25, 0.3, 0.4, 0.01, 0.3). We compare this vector to all map nodes and find that the closest one is (3,4), with weight vector (0.83, 0.57, 0.31, 0.8, 0.2, 0, 0.27). We then update this BMU and all map nodes within its neighbourhood radius, as described above.

The neighbourhood radius is a monotonically decreasing function of time (time advances as each datum is presented). As time passes, the SOM moves from globally changing nodes to locally changing nodes. The update rule warps the SOM's map nodes to approximate the distribution of the data; each data point can then be clustered onto the map node closest to it. The map nodes thereby become a 2D approximation of the data distribution, which is why SOMs can also be used for dimensionality reduction.
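
Here is a minimal NumPy sketch of one SOM update step, using a Gaussian neighbourhood whose radius and learning rate decay over time (the decay constants are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, D = 5, 5, 7                # 5x5 grid of map nodes, 7-d weight vectors
W = rng.random((M, N, D)) * 0.1  # small random initialization

def som_update(W, x, t, lr0=0.5, sigma0=2.0, tau=1000.0):
    """One SOM step: find the BMU for sample x, then pull the BMU and its
    grid neighbours towards x with a time-decaying Gaussian neighbourhood."""
    # Best Matching Unit: the map node whose weight vector is closest to x.
    dists = np.linalg.norm(W - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)

    # Learning rate and neighbourhood radius both shrink with time,
    # moving the map from global adjustments to local fine-tuning.
    lr = lr0 * np.exp(-t / tau)
    sigma = sigma0 * np.exp(-t / tau)

    # Gaussian neighbourhood weights based on grid distance to the BMU.
    ii, jj = np.meshgrid(np.arange(M), np.arange(N), indexing="ij")
    grid_dist2 = (ii - bmu[0]) ** 2 + (jj - bmu[1]) ** 2
    h = np.exp(-grid_dist2 / (2 * sigma ** 2))

    # Pull every map node towards x, weighted by its neighbourhood value.
    return W + lr * h[:, :, None] * (x - W)

# Present the normalized example vector from the text at time step t = 0.
x = np.array([0.8, 0.6, 0.25, 0.3, 0.4, 0.01, 0.3])
W = som_update(W, x, t=0)
```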

Figure 1: SOM warping to fit the data distribution [4].

As noted above, they also include the filtering step that removes all nodes with only one edge. Together, these techniques consistently filter out more than 98% of the nodes in every dataset, even though the botnet nodes have very different topologies across datasets. The results stay consistent because the method filters out “normal” nodes, which drastically reduces the search space.

These results look quite impressive, but the method is not very practical to deploy. It is computationally expensive, which makes it very hard to catch a botnet “in the act”. On top of this, there are no clear guidelines on the time window over which to apply the method, since the datasets vary in both size and duration. This makes the method more useful for looking backwards at data when an attack is already suspected than for proactively stopping one.

To be proactive, a method should be computationally cheap and able to operate on streaming data. One way to achieve this would be to chunk incoming NetFlow data into small time segments and then apply the above method to each segment. Each smaller segment is far more computationally manageable and would help catch bots close to the time they attack. A list of trusted IPs can also be kept and updated so that those IPs are always filtered out, even if they do not end up in the largest cluster.
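
One possible shape for that chunking step, assuming the `flows` DataFrame from the earlier loading sketch with a StartTime column (the 5-minute window and the whitelist entry are illustrative assumptions):

```python
import pandas as pd

# Hypothetical trusted-IP whitelist, maintained outside this pipeline.
TRUSTED_IPS = {"10.0.0.254"}

flows["StartTime"] = pd.to_datetime(flows["StartTime"])

# Split the stream into 5-minute windows and analyze each independently.
for window_start, chunk in flows.groupby(
        pd.Grouper(key="StartTime", freq="5min")):
    # Always drop flows touching trusted IPs, regardless of clustering.
    chunk = chunk[~chunk["SrcAddr"].isin(TRUSTED_IPS)
                  & ~chunk["DstAddr"].isin(TRUSTED_IPS)]
    if chunk.empty:
        continue
    # Build the graph, extract the 7 features, and run the SOM pipeline
    # from the earlier sketches on this window alone.
    print(window_start, len(chunk), "flows to analyze")
```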

Conclusions:

The main contribution of Chowdhury et al.’s paper is the use of graph features of NetFlow data as inputs to an unsupervised learning algorithm. By building a directed graph in which the nodes are IP addresses and the edges are the connections between them, and by feeding the graph features into an SOM, they are able to filter out close to 99% of the nodes. Even though the graph features of the bots vary considerably across the datasets, their method works consistently because it captures the abnormality of the bots relative to the normal nodes.

Although very useful, the algorithm presented in the paper has several drawbacks. First, being computationally expensive, it cannot give meaningful results quickly in real time. Second, it requires a subject matter expert and/or other statistical tools to sort through the smaller clusters and find the bots. A possible way around these problems is to chunk incoming data into small batches before applying the algorithm. This would keep the data at a manageable size while making it easier to catch active bots.

References:

  1. https://www.zdnet.com/article/bad-bots-focus-on-financial-targets-make-up-20-percent-of-web-traffic/
  2. https://securelist.com/ddos-report-q2-2019/91934/
  3. Garcia, Sebastian, et al. “An empirical comparison of botnet detection methods.” Computers & Security 45 (2014): 100–123.
  4. Kohonen, Teuvo. “Self-organized formation of topologically correct feature maps.” Biological cybernetics 43.1 (1982): 59–69.
  5. https://en.wikipedia.org/wiki/Self-organizing_map
  6. Venkatesh, Bharath, et al. “BotSpot: fast graph based identification of structured P2P bots.” Journal of Computer Virology and Hacking Techniques 11.4 (2015): 247–261.
  7. Chowdhury, Sudipta, et al. “Botnet detection using graph-based feature clustering.” Journal of Big Data 4.1 (2017): 14.
