AI mission: How self-learning systems detect cyber attacks

Isabell Claus
Published in thinkers.ai
3 min read, Jan 31, 2019

Self-learning systems are currently being trialed in many areas. One of the most exciting applications is the detection of cyberattacks. Research is at an advanced stage and practical trials are under way; the details are now being worked out.

Machine learning uses algorithms to recognize patterns or relationships in existing data. These algorithms are underpinned by statistical methods, including classical inferential statistics, Bayesian models and clustering. On this basis, systems referred to as “self-learning” or “behavior-based” automatically draw conclusions, calculate probabilities for different scenarios and make predictions.

Such behavior-based systems are used to detect cyberattacks in the IT infrastructure of companies and public institutions. Conventional rule- or signature-based technologies, by contrast, only recognize malware, for instance, if they were given exact information about its properties in advance. Even minimal deviations from these specifications are enough to outwit the tools and render them ineffective. Attackers focus specifically on this. They find and exploit new weaknesses in a company’s infrastructure or make use of previously unknown malware. This is why specialists are needed today to detect modern attacks. Only machine learning can gradually take over a human’s ability to analyze facts and draw conclusions.
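To illustrate why minimal deviations defeat exact signatures, here is a minimal sketch in Python. It assumes the simplest possible signature, a SHA-256 hash of a known payload; real signature engines use richer patterns, and the payload bytes and the names `KNOWN_BAD` and `signature_match` are made up for this example.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known malicious payloads.
KNOWN_BAD = {hashlib.sha256(b"malicious-payload-v1").hexdigest()}


def signature_match(payload: bytes) -> bool:
    """Return True only if the payload's digest is already in the database."""
    return hashlib.sha256(payload).hexdigest() in KNOWN_BAD


original = b"malicious-payload-v1"
variant = b"malicious-payload-v2"  # a single byte differs from the original

print(signature_match(original))  # True: the exact payload is known
print(signature_match(variant))   # False: the modified payload slips through
```

The variant differs by one byte, yet its digest no longer matches anything in the database, so the check stays silent.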

An example

Data are stolen from a company’s network. In technical jargon, this process is referred to as “data exfiltration”. A signature-based system may identify a specific URL pattern for uploads to a potentially dangerous website, or it may identify already known malware. Seasoned attackers can, however, get around this easily. Behavior-based systems, on the other hand, recognize that a file upload is under way. They can also report whether the upload originates from a computer that rarely uploads data or whether the destination is unusual. It is very difficult for attackers to conceal the main objective of their attack, the uploading of files.
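The behavior-based check described in this example could look roughly like the following Python sketch. It is only one simple way to do it, not the method of any particular product: it assumes a per-computer history of daily upload volumes and a set of previously seen destinations, and flags an upload whose volume lies far above that computer's own baseline (a z-score test) or whose destination is new. The function name, the threshold of 3 and the sample values are all assumptions for illustration.

```python
from statistics import mean, stdev


def upload_alerts(history_mb, known_destinations, upload_mb, destination,
                  z_threshold=3.0):
    """Return the reasons why an upload looks unusual for this computer.

    history_mb needs at least two values so that a spread can be computed.
    """
    reasons = []
    mu = mean(history_mb)
    sigma = stdev(history_mb)

    if sigma > 0:
        z = (upload_mb - mu) / sigma
    else:
        # Constant history: any upload above the usual volume counts as extreme.
        z = float("inf") if upload_mb > mu else 0.0

    if z > z_threshold:
        reasons.append("unusual upload volume")
    if destination not in known_destinations:
        reasons.append("unusual destination")
    return reasons


# Hypothetical baseline for one computer: daily uploads in MB.
history = [2, 3, 1, 2, 4, 3, 2]
known = {"backup.example.com"}

print(upload_alerts(history, known, 500, "files.example.net"))
print(upload_alerts(history, known, 3, "backup.example.com"))
```

The first call reports both reasons, the second returns an empty list. Note that nothing here depends on knowing the malware or the URL pattern in advance; the check only looks at the upload itself.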

The status quo of research and application

Although machine learning was being discussed in research as early as 1999, long computing times and the need for high-performance processors meant that it was barely an issue for many years. But now that the technical conditions are in place, the topic is one of the most promising approaches to automating workflows that IT security experts currently carry out “manually”.

Success in practice depends heavily on the quality of the underlying data. This is not specific to cyber security but a general statistical problem. While highly significant events are easy for an automated system to detect, the art lies in automatically separating low-significance events into those that are “important from a cyber security point of view” and those that are “not important”. No model has yet come out on top here in practice. As a result, the focus of research is on further developing behavior-based systems. If you add the findings obtained from other data sources, such as signature-based systems, to the information provided by these systems, you get high-quality data on anomalies in a company’s network. This makes it possible to automatically assess whether an event in the company’s network is an actual security incident.
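One possible way to combine a behavior-based finding with a signature-based finding is the odds form of Bayes' rule, sketched below in Python. This is an illustration under strong assumptions, not an established model: the two findings are treated as independent, and the prior and both likelihood ratios are invented numbers.

```python
def posterior_probability(prior, likelihood_ratios):
    """Combine independent findings using the odds form of Bayes' rule."""
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# All three values are assumptions chosen for illustration only.
PRIOR = 0.001          # share of network events that are real incidents
LR_BEHAVIOR = 50.0     # how much more likely an anomaly is during an incident
LR_SIGNATURE = 200.0   # how much more likely a signature hit is during an incident

behavior_only = posterior_probability(PRIOR, [LR_BEHAVIOR])
both = posterior_probability(PRIOR, [LR_BEHAVIOR, LR_SIGNATURE])

print(round(behavior_only, 3))  # 0.048
print(round(both, 3))           # 0.909
```

With these assumed numbers, a behavioral anomaly on its own leaves the event at roughly a 5 percent chance of being a real incident, while the same anomaly confirmed by a second data source rises to roughly 91 percent.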
