Lenses SQL for your Intrusion Detection System

Published in lenses.io · 7 min read · Sep 27, 2018

Over the last decade, more and more applications have had to deal with high-throughput data, and to do so in near real time. Take Network Intrusion Detection Systems (NIDS), for example: a crucial tool in network security, whatever your definition of security is. Until a few years ago, such systems required expensive proprietary hardware, tied to the hardware vendor’s (often poor) software tools. With the advent of cheap, powerful hardware and open source networking solutions, NIDS is within the grasp of organisations of all sizes: route your traffic through a Linux-powered router, use a tool such as MantisNet Software Defined Network to capture the traffic, and finally send the data to a high-performance streaming framework such as Kafka, where you can use scalable SQL to analyse and react to threats in time using Lenses.

Continuous SQL queries on streaming data with Lenses® make it easy to build our own Intrusion Detection System (IDS). But before jumping into writing the SQL that detects intrusions, we need to understand what exactly an IDS is.

What is an IDS?

An intrusion detection system (IDS) is a system that monitors network traffic for suspicious activity and issues alerts when such activity is discovered. While anomaly detection and reporting is the primary function, some intrusion detection systems are capable of taking actions when malicious activity or anomalous traffic is detected, including blocking traffic sent from suspicious IP addresses.

Different types of IDS?

Intrusion detection systems can use different kinds of methods to detect suspicious activity, including the following:

Network intrusion detection (NIDS)
Host intrusion detection (HIDS)
Signature-based intrusion detection
Anomaly-based intrusion detection

Intrusion detection systems are also categorised as follows:

Active: also called an intrusion detection and prevention system, it generates alerts and log entries, but can also be configured to take actions.
Passive: it only detects malicious activity and generates alerts or log entries, but does not take any action.

In this post we will look at a Passive IDS, focusing mainly on NIDS and anomaly-based detection methods.

Lenses SQL

To demonstrate the power of continuous SQL queries in Lenses and, at the same time, build a simple IDS, we are going to focus on DNS traffic and its various vulnerabilities.

Capture DNS traffic

MantisNet provides a great Docker image which captures all DNS and DHCP traffic and, in parallel, can send the data to a Kafka topic. Once the data is present in a topic, you can use SQL queries in Lenses to create a Passive IDS applying NIDS and anomaly-based detection methods. The SQL code required is quite simple.

Let’s see how we can create a few rules for IDS detection using Lenses.

Run MantisNet DNS collector

MantisNet collects the data, but to process it as soon as it is available, the data needs to be pushed to Kafka. You therefore need to set up the DNS collector to send the data to a Kafka topic.

Firstly, you need to run the DNS collector with the following command:

DNS validation

Rule #1 — DNS Length validation

Classic DNS over UDP limits the payload used to transport a DNS message to 512 bytes. We can create a continuous SQL query to find all the requests which exceed this size and flag them as a violation of the DNS protocol.

The continuous query topology/planner shows that we filter the traffic of the topic DNS_DHCP_TRAFFIC to validate our first DNS validation rule; the filtered data will be sent to a new topic, which will be auto-created.

You are done: you just created your very first IDS rule, which validates the length of the DNS UDP payload.
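As a rough illustration of the filter logic only (this is plain Python, not the Lenses SQL query), the rule can be sketched as follows. The field names `source_address` and `payload_length` are assumptions made for this example; the real field names depend on what the collector writes to the topic.

```python
from typing import Iterable, Iterator

# Classic DNS-over-UDP payload limit, in bytes.
MAX_UDP_DNS_PAYLOAD = 512


def oversized_dns_requests(
    records: Iterable[dict], limit: int = MAX_UDP_DNS_PAYLOAD
) -> Iterator[dict]:
    """Yield only the records whose DNS payload is larger than `limit` bytes."""
    for record in records:
        # "payload_length" is a hypothetical field name used for illustration.
        if record.get("payload_length", 0) > limit:
            yield record


# Hand-made sample records standing in for messages read from the source topic.
sample = [
    {"source_address": "10.0.0.5", "payload_length": 74},
    {"source_address": "10.0.0.9", "payload_length": 1432},
    {"source_address": "10.0.0.7", "payload_length": 512},
]

# In a streaming setup these would be written to the output topic.
violations = list(oversized_dns_requests(sample))
```

Only the 1432-byte message is kept here; a payload of exactly 512 bytes is still within the limit.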

Note that this is just an example of a DNS validation rule. To make it more accurate, we should take into account other data, such as whether both server and client support EDNS, or whether the request is part of a zone transfer, both of which can use larger payloads. Supporting larger payloads over UDP is not advisable: if you do, you may be exposed to amplification attacks that leverage nameservers.
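One possible way to fold these refinements into the rule is sketched below in plain Python. It assumes, purely for illustration, that each record carries a `transport` field ("udp" or "tcp") and, when EDNS is in use, an `edns_udp_size` field with the advertised UDP payload size; neither field name comes from the collector's actual schema.

```python
from typing import Optional

CLASSIC_UDP_LIMIT = 512  # bytes


def allowed_udp_payload(record: dict) -> Optional[int]:
    """Return the payload limit for this message, or None if the rule does not apply."""
    # Zone transfers (and other TCP exchanges) are not bound by the UDP limit.
    if record.get("transport") == "tcp":
        return None
    # With EDNS, the sender advertises the UDP payload size it can accept.
    # Advertised values below 512 are treated as 512.
    edns_size = record.get("edns_udp_size")
    if edns_size:
        return max(CLASSIC_UDP_LIMIT, edns_size)
    return CLASSIC_UDP_LIMIT


def is_length_violation(record: dict) -> bool:
    """True when a UDP message is larger than its applicable limit."""
    limit = allowed_udp_payload(record)
    return limit is not None and record["payload_length"] > limit
```

With this sketch, a 1432-byte UDP message is a violation without EDNS but is accepted when the record advertises a 4096-byte EDNS size.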

DNS Tunneling

DNS was never intended to be used for data transfer; however, individuals with malicious intent have used it for this purpose for years.

A DNS tunnel can be established by hiding data (in base64-encoded URLs) inside DNS requests, which can then be turned back into real data on the destination DNS server. This becomes a real threat when malicious software uses DNS to get data out of the company network, or even to receive commands/updates from a command and control server.

DNS uses a hierarchical system to determine the correct IP address for a domain, as the following image shows:

So in the above example, instead of resolving blog.landoop.com we could send a different request:

ZG9tYWluPWxhbmRvb3A7dXNlcj1zcGlyb3M7cGFzc3dvcmQ9MTIzNDU2.landoop.com

This request will be sent to the landoop.com domain. The difference here is that the string before landoop.com is base64 encoded and will decode to

“domain=landoop;user=spiros;password=123456”

As you can understand it’s really easy to use DNS for data transmission.
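To make this concrete, here is a minimal Python sketch of both ends of such a tunnel, using only the standard library. The helper names `tunnel_query_name` and `extract_secret` are made up for this illustration and do not correspond to any real tunneling tool.

```python
import base64


def tunnel_query_name(secret: str, domain: str) -> str:
    """Hide `secret` as a base64 label in front of `domain`."""
    label = base64.b64encode(secret.encode("utf-8")).decode("ascii")
    return f"{label}.{domain}"


def extract_secret(query_name: str, domain: str) -> str:
    """Recover the hidden text on the receiving DNS server."""
    # Strip the trailing ".<domain>" to get the base64 label back.
    label = query_name[: -(len(domain) + 1)]
    return base64.b64decode(label).decode("utf-8")


secret = "domain=landoop;user=spiros;password=123456"
query = tunnel_query_name(secret, "landoop.com")
recovered = extract_secret(query, "landoop.com")
```

For this 42-character secret the resulting query name is 68 characters long, far longer than an ordinary host name, which is what the detection rules in the following sections rely on.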

Rule #1 — Query URL name length

A typical DNS request is not that long: google.com, mail.google.com, landoop.com, blog.landoop.com, etc. With DNS tunneling, the requested URL is longer, as the previous example illustrates. With more information packed into the request, it can easily exceed 70 characters, which is quite uncommon for a domain name. We can create a Lenses Continuous Query that flags all requests whose length exceeds a chosen threshold as a potential DNS tunneling threat.

We start by grouping the live records on “Source Address” and “URL” in order to keep the unique DNS messages sent by the same host for the same URL. The query keeps only those records whose URL length exceeds 60 characters. You can see the rule in Lenses as a Stream Topology.

The minimum URL length can be fine-tuned to fit your environment and use cases. You could start low (~50) and increase it if you get too many false positives.
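The grouping and length check can be sketched in plain Python as shown below. This is not the Lenses SQL query itself, and the field names `source_address` and `url` are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable


def long_query_counts(records: Iterable[dict], min_length: int = 60) -> Counter:
    """Count DNS queries per (source address, URL) whose URL exceeds `min_length`."""
    counts: Counter = Counter()
    for record in records:
        url = record["url"]
        # Keep only suspiciously long names; the threshold is tunable.
        if len(url) > min_length:
            counts[(record["source_address"], url)] += 1
    return counts


long_name = "ZG9tYWluPWxhbmRvb3A7dXNlcj1zcGlyb3M7cGFzc3dvcmQ9MTIzNDU2.landoop.com"
sample = [
    {"source_address": "10.0.0.5", "url": "blog.landoop.com"},
    {"source_address": "10.0.0.9", "url": long_name},
    {"source_address": "10.0.0.9", "url": long_name},
]

suspects = long_query_counts(sample)
```

The two identical long requests from 10.0.0.9 collapse into a single entry with a count of 2, while the ordinary blog.landoop.com lookup is ignored.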

Rule #2 — Pattern of requests

Another DNS tunneling rule checks how many digits the URL contains. A typical URL doesn’t contain many digits. But when the data is encoded using base64 (a group of binary-to-text encoding schemes that represent binary data as an ASCII string by translating it into a radix-64 representation), the URL can potentially contain a lot of digits, so we can use this to detect potential DNS tunneling.

This rule checks all DNS queries and determines whether the URL contains more than 4 digits, while safely excluding DNS queries that consist of IP addresses, which typically have more than 4 digits. Of course, the number can be fine-tuned to fit your environment. Regular expression matching is an expensive operation, but LSQL can scale linearly using Connect or Kubernetes. You can see the rule in Lenses as a Stream Topology.
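The logic of this check, independent of Lenses, might look like the following Python sketch. How IP-address queries actually appear in the captured data is an assumption here: the pattern treats a bare dotted IPv4 address, optionally followed by `.in-addr.arpa` (the reverse-lookup form), as an IP query.

```python
import re

# A dotted IPv4 address, optionally in reverse-lookup form (assumed shape).
IPV4_NAME = re.compile(r"^(?:[0-9]{1,3}\.){3}[0-9]{1,3}(?:\.in-addr\.arpa)?\.?$")
DIGIT = re.compile(r"[0-9]")


def looks_like_tunnel(url: str, max_digits: int = 4) -> bool:
    """True when a non-IP query name contains more than `max_digits` digits."""
    if IPV4_NAME.match(url):
        return False  # IP-address queries naturally contain many digits
    return len(DIGIT.findall(url)) > max_digits
```

For example, `looks_like_tunnel("x9f3k2p7q1z8.landoop.com")` is true because the name contains six digits, while `mail2.google.com` and `192.168.10.254` are both left alone.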

What’s Next

There is always a next level you can take your solution to. The word AI is around every corner. You can leverage the data to apply stream data mining and machine learning models in real time. You can train your IDS to recognise non-signature-based attacks, which are not predictable, boost your anomaly-based detection, or even classify the severity of incidents. You can use our Lenses Python library to bring the data into Jupyter and put your data science team to work quickly.

Apart from the above, with Lenses SQL helping you quickly identify suspicious network traffic, you are a small step away from building your real-time alerts dashboard. The Lenses platform provides a JavaScript library (Redux middleware) that lets you hook into the real-time alerts.

Conclusion

As you can see, applying an IDS to large, high-dimensional data streams is a challenging task. Data streams have characteristics quite distinct from those of statistical databases, and these greatly affect the performance of the anomaly-based ID algorithms used in the detection process. These characteristics include, but are not limited to, the need to process large volumes of data as they arrive (in real time), the dynamic nature of data streams, the curse of dimensionality, limited memory capacity and high complexity. Thankfully, the Lenses SQL Engine can scale the processing step linearly using Connect or Kubernetes. Data and security engineers can therefore focus on the real problem, which in this case is ensuring their systems are secure.

Visit Lenses docs to find out more on Lenses SQL Engine.