Machine Learning and Its Disruptions in the Information Security

Arun Prabhakar
Published in The Startup
Nov 11, 2020

One of the lovely things that excites a child is playing with bubbles. The limitless fun and joy that children experience is entertaining and memorable, too. One such picture is what we see here: many beautiful bubbles, formed by the right mix of soap and water. I thought this photo was a good way to understand the disruptive effect of Machine Learning on Information Security. How is this picture analogous to our topic? Well, let us see. We all know how Machine Learning has been disrupting every domain in the Information Technology sector. InfoSec disciplines are no exception, and many practitioners and enterprises are early adopters. And yes, the right mix of these two technologies has given rise to many emerging concepts and practices that address several business challenges. I thought it would be good to share some of these and how Machine Learning has been applied to solve such challenges in Information Security.

RegTech

RegTech, also known as Regulatory Technology, is an emerging class of technology solutions that mostly benefits professionals in the audit and compliance domain. The primary focus of RegTech is to help businesses comply with regulations efficiently and thereby improve existing regulatory processes. Many technologies go into building RegTech solutions, and ML and AI play a significant role among them. Though RegTech solutions are used across the Information Technology space, Information Security practitioners leverage them to address many challenges in Risk Management, Identity Management, and Compliance Monitoring and Reporting.

A common business scenario where RegTech solutions can be very helpful for compliance professionals is a regulatory audit whose documents and requirements run to many pages. Going over these lengthy documents and identifying actions for non-compliance is highly difficult with traditional methods. Past data, including its relevant attributes, must also be considered when deciding on non-compliance and its legal implications. This is not only time-consuming but also adds complexity to the whole process.

When RegTech solutions are applied to this challenge, they often make use of deep learning algorithms that can process large volumes of regulatory requirements. We can design models using ensemble techniques and Natural Language Processing (NLP) algorithms for text classification and text extraction, and derive meaningful insights from the data. This lets regulatory professionals carry out their tasks more efficiently and helps organizations complete a compliance audit in a relatively short time. RegTech solutions are commonly used by FinTech companies to address the challenges of Know Your Customer (KYC) checks and Anti-Money Laundering (AML), and their adoption in other domains is rising strongly.
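The article does not say which NLP model a RegTech tool uses. The minimal sketch below covers only the text-classification step. It swaps in multinomial Naive Bayes, a simple classical technique, in place of the deep learning or ensemble models mentioned above, and labels regulatory clauses as obligations or informational text. The clauses, the labels and the class name `NaiveBayesClauseClassifier` are hypothetical, and only the Python standard library is used.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep alphabetic tokens only (digits and punctuation are dropped).
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClauseClassifier:
    """Hypothetical clause classifier: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the log likelihood of every known token.
            score = math.log(count / n_docs)
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are ignored
                    num = self.word_counts[label][tok] + 1
                    score += math.log(num / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled clauses.
train_texts = [
    "The institution shall verify the identity of each customer",
    "Firms must report suspicious transactions within 30 days",
    "Records shall be retained for five years",
    "This section describes the background of the regulation",
    "The following definitions apply throughout this document",
    "This chapter provides an overview of the framework",
]
train_labels = ["obligation"] * 3 + ["informational"] * 3

clf = NaiveBayesClauseClassifier().fit(train_texts, train_labels)
print(clf.predict("Banks must verify customer identity"))  # obligation
print(clf.predict("This section provides definitions"))    # informational
```

Clauses flagged as obligations could then be routed to a compliance professional for review. A real system would need far more labelled data and proper evaluation.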

Threat Intelligence

Threat intelligence is information about the context of threats, the motivation of threat agents, the assets in our organization that interest those agents, and the defensive measures that need to be taken in advance. Many consider threat intelligence a component of Security Intelligence. Going by the definition given by Gartner research, “Threat intelligence is evidence-based knowledge, including context, mechanisms, indicators, implications, and action-oriented advice, about an existing or emerging menace or hazard to assets that can be used to inform decisions regarding the subject’s response to that menace or hazard.”

Many disciplines within Security deal with huge volumes of data. This is especially true of SOC analysts, DLP professionals and firewall administrators, who are at the frontline of defense. They need to work with the right characteristics of incoming data, whether for further analysis or for integration with other tools, rather than spend time separating information from noise in data coming from many different sources. This is where data engineering techniques augmented with machine learning algorithms are very helpful. Data cleaning, analysis, transformation and visualization methods are not only prerequisites for ML algorithms but also help pin down the problem and focus on the objective. They are followed by feature selection and feature extraction techniques, which help build effective models and discover hidden patterns. These techniques also help identify similar data, so that administrators can cluster it to gain further insights.
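The sketch below illustrates the transformation and clustering steps described above. It assumes each alert has already been reduced to two hypothetical numeric features: failed logins and distinct destination ports. It min-max scales the features, then groups the alerts with k-means (Lloyd's algorithm) using deterministic farthest-point seeding. The alert data and function names are illustrative and not taken from any particular SOC tool.

```python
import math


def min_max_scale(rows):
    """Rescale every feature column to [0, 1] so no single feature dominates distances."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def farthest_point_seeds(points, k):
    """Deterministic seeding: start from the first point, then add the point farthest from all seeds."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return [list(s) for s in seeds]


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm: alternate assignment and centroid-update steps until stable."""
    centroids = farthest_point_seeds(points, k)
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no alert changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


# Hypothetical alerts: [failed_logins, distinct_destination_ports]
alerts = [[2, 3], [1, 2], [3, 4], [40, 3], [35, 2], [2, 200], [1, 180]]
labels, _ = kmeans(min_max_scale(alerts), k=3)
print(labels)  # [0, 0, 0, 2, 2, 1, 1]
```

With this sample data, the routine alerts, the login-heavy alerts and the port-heavy alerts land in three separate clusters, which an analyst could then investigate as groups. In practice, the number of clusters and the features themselves are design choices to validate against real data.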

Threat and Vulnerability Management professionals sometimes use application assessment tools that report a high number of false positives, whereas tools built by leveraging threat intelligence help surface only the relevant threats. Models built with machine learning algorithms learn from the characteristics of past data and can be configured to report only threats that meet a required level of accuracy or an equivalent classification metric. The concept of threat intelligence is very broad, with several types (from strategic to operational) implemented to serve different purposes. Many threat intelligence platforms (TIPs) apply different machine learning concepts to build models that support the complete intelligence lifecycle, so that the process is repeatable.
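Reporting only threats that clear a configured metric can be sketched as picking a score threshold from historically labelled findings. The sketch assumes a model already outputs a score per finding and that analysts have confirmed which past findings were real threats. Precision is used as the metric here. The code returns the lowest threshold whose precision meets a target. That threshold also maximises recall under the constraint, because recall can only fall as the threshold rises. The scores, labels and function names are hypothetical.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall if every finding scoring >= threshold is reported."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0  # nothing reported: no false alerts
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(scores, labels, min_precision):
    """Lowest observed score whose precision meets the target, i.e. best recall under that floor."""
    for t in sorted(set(scores)):
        precision, recall = precision_recall_at(scores, labels, t)
        if precision >= min_precision:
            return t, precision, recall
    return None  # no threshold reaches the requested precision


# Hypothetical past findings: model score and whether an analyst confirmed the threat.
scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20]
confirmed = [True, True, False, True, False, False, True, False]
print(pick_threshold(scores, confirmed, min_precision=0.75))  # (0.7, 0.75, 0.75)
```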

Zero Trust

The term was first coined by Forrester Research about a decade ago, and many enterprises have adopted the concept in recent years. Zero Trust is a security concept that simply means trusting no one by default, whether the entities access the organizational network from inside or outside it. Zero Trust models require everyone to go through a thorough identity check and follow strict access control procedures before accessing resources within the network. This applies to insiders, such as employees, partners, vendors and company-approved devices. It applies equally to outsiders, such as customers and third parties.

At its core, a Zero Trust architecture applies the fundamental security principles of identity, access control, and authentication. The model requires a well-defined process to be executed successfully. However, there are many implementation-level challenges, and overcoming them requires advanced technologies. The most common challenge arises when granting entities access to resources within the network. Many parameters must be considered before letting a person in, including user privileges, group memberships, job type, time of day, location, history of commands executed, recent password changes and many others. This evaluation repeats for every access attempt, for every user, at any point in time. Therefore, any product implementing Zero Trust has to handle a tremendous amount of data.
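Before a classifier can weigh the parameters listed above, each access attempt has to be turned into numbers. Below is a minimal sketch of that encoding step. The attribute set and the role vocabulary are hypothetical, and so is the arbitrary choice to treat 19:00 to 07:00 as off-hours.

```python
from dataclasses import dataclass

ROLES = ["admin", "engineer", "contractor"]  # hypothetical role vocabulary


@dataclass
class AccessAttempt:
    role: str
    hour: int                  # local hour of day, 0-23
    usual_country: bool        # request comes from the user's usual country
    managed_device: bool       # device is company-approved
    failed_logins_24h: int
    days_since_pwd_change: int


def encode(attempt: AccessAttempt) -> list[float]:
    """Turn one access attempt into a fixed-length numeric feature vector."""
    role_vec = [1.0 if attempt.role == r else 0.0 for r in ROLES]  # one-hot; unknown role -> zeros
    off_hours = 1.0 if attempt.hour < 7 or attempt.hour >= 19 else 0.0
    return role_vec + [
        off_hours,
        float(attempt.usual_country),
        float(attempt.managed_device),
        min(attempt.failed_logins_24h, 10) / 10,             # capped, scaled to [0, 1]
        1.0 if attempt.days_since_pwd_change <= 1 else 0.0,  # very recent password change
    ]


print(encode(AccessAttempt("engineer", 23, False, True, 3, 0)))
# [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.3, 1.0]
```

Because every attempt becomes a vector of the same length, millions of attempts can be stored and fed to a model in the same shape.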

As a data scientist assigned such a task, after applying the necessary data wrangling techniques, my first step would be to choose a classification algorithm, since the task is to approve or deny access (or to take some other required action). Second, because the target class is known in advance and must be predicted, I would use a supervised learning algorithm. Finally, since we are dealing with many parameters, one approach I can think of is a Decision Tree classifier. It predicts the target class by evaluating the attributes discussed above one at a time, level by level, to reach the right decision. Those implementing Zero Trust in their products understand the value of applying ML, as models can learn continuously and adapt as behavior evolves. Context plays a vital role in making Zero Trust possible, since we are dealing with the behavior of users and devices. Incorporating AI/ML makes this achievable because there is so much data to process.
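The Decision Tree approach above can be sketched with a small CART-style tree that chooses splits by Gini impurity, written with only the standard library. The four features are managed device, usual location, failed logins in the last 24 hours and off-hours. These features and the eight historical decisions are made up for illustration. A real deployment would use a library implementation, far more data and held-out evaluation.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) with the lowest weighted Gini, or None if nothing improves."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a small CART-style tree; leaves hold the majority decision."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Evaluate one attribute per level until a leaf (allow/deny) is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical history: [managed_device, usual_location, failed_logins_24h, off_hours]
rows = [
    [1, 1, 0, 0], [1, 1, 1, 1], [1, 0, 0, 0], [0, 1, 0, 0],
    [0, 0, 0, 1], [1, 1, 6, 0], [0, 0, 4, 1], [1, 0, 5, 1],
]
labels = ["allow"] * 4 + ["deny"] * 4

tree = build_tree(rows, labels)
print(predict(tree, [1, 0, 0, 1]))  # allow
print(predict(tree, [0, 0, 0, 0]))  # deny
```

On this toy history, the tree first splits on failed logins, then on device and then on location. This mirrors the idea of evaluating one attribute at each level.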

Many IDaaS (Identity as a Service) providers achieve Zero Trust security in their products or offerings by combining Next-Gen Access with machine learning techniques. Identity and Access Management (IDAM) professionals benefit greatly from this approach.

Security 4.0

We are all familiar with the term Industry 4.0, which is playing a key role in the ongoing, massive transformation of the manufacturing industry. About a dozen technologies are being used to achieve this transformation. Cyber security is certainly one of them and plays a noteworthy role. Industry 4.0 is used to build many types of security solutions as well. From a development and support viewpoint, three technologies are especially important: Industrial IoT, machine learning and cyber security. The Internet of Things acts as the platform through which all these devices communicate with each other. Because many devices and humans are involved, a lot of data is generated, and machine learning and analytics are leveraged to extract the required intelligence from it. Finally, cyber security technologies and tools are required to design such systems with the highest security standards and regulations incorporated.
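The sketch below illustrates the analytics side. It flags unusual readings in an industrial sensor stream using an exponentially weighted moving mean and variance. This is a simple statistical baseline, not a learned model. The parameters (`alpha`, `k`, `warmup`) and the temperature readings are hypothetical.

```python
import math


class EwmaDetector:
    """Flags sensor readings far from an exponentially weighted running mean."""

    def __init__(self, alpha=0.1, k=4.0, warmup=10):
        self.alpha = alpha    # weight of the newest reading in the running statistics
        self.k = k            # alert when |reading - mean| > k * standard deviation
        self.warmup = warmup  # number of readings to observe before alerting
        self.mean = None
        self.var = 0.0
        self.n = 0

    def update(self, x):
        """Return True if x looks anomalous, then fold x into the running mean and variance."""
        self.n += 1
        if self.mean is None:
            self.mean = x
            return False
        deviation = x - self.mean
        anomalous = self.n > self.warmup and abs(deviation) > self.k * math.sqrt(self.var)
        # Incremental exponentially weighted updates of mean and variance.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return anomalous


# Hypothetical temperature readings from one industrial sensor.
readings = [50.0, 50.2, 49.9, 50.1, 50.0, 49.8, 50.2, 50.1, 49.9, 50.0, 50.1, 49.9, 65.0, 50.0]
detector = EwmaDetector()
flags = [detector.update(r) for r in readings]
print([i for i, flagged in enumerate(flags) if flagged])  # [12]
```

Folding the spike into the running statistics widens the tolerance band for the readings that follow. A production detector might exclude flagged readings from the update.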

Security Architects play a major role in designing these critical infrastructures, especially those that deal with safety and security. They must be exceptionally skilled at using IoT and machine learning, both when developing solutions and when assessing such systems. Security 4.0 is a practice in which specialized security services are offered to secure products developed using Industry 4.0. It also involves security professionals end-to-end in designing and building safety and security solutions in the cyber-physical domain.

Remember that the threat landscape of these systems, including the defense mechanisms required, is completely different from that of traditional systems. Consider an intrusion detection and access monitoring system that must perform face verification of individuals gaining access. The Security Architect must be able to decide which similarity learning algorithm works best for the scenario. Accordingly, they must determine the classification metrics relevant to the business, including the right threshold levels for granting access. These thresholds depend on the regulatory requirements around security labels and ensure the right course of action is taken. Security 4.0 is an emerging practice in product manufacturing companies.
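The threshold decision described above can be sketched as follows. The sketch assumes that a similarity learning model (for example, a Siamese network, which is not implemented here) has already mapped each face image to an embedding vector. Verification compares embeddings by cosine similarity. The false accept rate (FAR) and false reject rate (FRR), computed from hypothetical genuine and impostor scores, show what a given threshold costs.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def verify(probe, enrolled, threshold):
    """Grant access only if the live face embedding is close enough to the enrolled one."""
    return cosine_similarity(probe, enrolled) >= threshold


def far_frr(genuine_scores, impostor_scores, threshold):
    """False accept rate (impostors accepted) and false reject rate (genuine users rejected)."""
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


# Hypothetical similarity scores from a validation set.
genuine = [0.91, 0.88, 0.95, 0.79, 0.86]   # same person
impostor = [0.42, 0.55, 0.81, 0.30, 0.61]  # different people
for t in (0.80, 0.85):
    print(t, far_frr(genuine, impostor, t))
# 0.8 (0.2, 0.2)
# 0.85 (0.0, 0.2)
```

On these sample scores, raising the threshold from 0.80 to 0.85 removes the only false accept but keeps one false reject. Choosing a point on that trade-off is the decision the Security Architect has to tie back to the regulatory requirements.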

Others

There are other areas of security where machine learning has proved very impactful. One of them is SOAR (Security Orchestration, Automation and Response). SOAR helps the many elements involved in the security incident response process collaborate, streamlining the process and achieving better outcomes. SOAR solutions are combined with SIEM tools so that the former orchestrates and automates different third-party tools while the latter collates and analyzes data to generate precise alerts. Machine learning is a critical component of SOAR, helping at every stage of the process: analyzing incidents, triaging, suggesting remediation, and making escalations.
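The sketch below makes the triage and escalation stages more concrete by showing how a SOAR playbook might act on a machine learning score. It assumes a hypothetical model has already estimated the probability that each incident is a true positive, and combines that estimate with asset criticality. The thresholds, action names and incidents are illustrative only and do not reflect any vendor's API.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    incident_id: str
    tp_probability: float   # hypothetical model estimate that the alert is a true positive
    asset_criticality: int  # 1 (low) to 5 (business critical), e.g. from an asset inventory


def triage(incident: Incident, close_below: float = 0.2, escalate_at: float = 3.0) -> str:
    """Map a scored incident to a playbook action; thresholds are illustrative only."""
    if incident.tp_probability < close_below:
        return "close"           # likely false positive: auto-close with an audit note
    if incident.tp_probability * incident.asset_criticality >= escalate_at:
        return "escalate"        # probable and high impact: hand to a human responder
    return "auto_remediate"      # probable but lower impact: run a containment playbook


queue = [Incident("INC-1", 0.05, 5), Incident("INC-2", 0.90, 4), Incident("INC-3", 0.60, 2)]
# Work the queue in order of expected impact (probability x criticality).
ordered = sorted(queue, key=lambda i: i.tp_probability * i.asset_criticality, reverse=True)
actions = [(i.incident_id, triage(i)) for i in ordered]
print(actions)
# [('INC-2', 'escalate'), ('INC-3', 'auto_remediate'), ('INC-1', 'close')]
```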

User and Entity Behavior Analytics (UEBA) is another area where we are witnessing many new developments. UEBA primarily helps discover insider attacks. It uses machine learning to understand the actions of (trustworthy) users within the network and to find anomalies in their routine behavior that may indicate a threat. UEBA works alongside Security Information and Event Management (SIEM) and Data Loss Prevention (DLP) tools.
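The sketch below shows the baselining idea behind UEBA. Each user's normal activity is summarised from history, and a new day is flagged when it deviates strongly from that user's own baseline. A simple per-user z-score stands in for the richer machine learning models that UEBA products use. The users, counts and threshold are hypothetical.

```python
import statistics


def build_baselines(history):
    """history maps each user to past daily event counts (e.g. files accessed per day)."""
    return {
        user: (statistics.mean(counts), statistics.pstdev(counts))
        for user, counts in history.items()
    }


def is_anomalous(baselines, user, todays_count, z_threshold=3.0):
    """Flag a day whose count is more than z_threshold standard deviations from the user's mean."""
    mean, std = baselines[user]
    if std == 0:
        return todays_count != mean  # perfectly regular user: any change stands out
    return abs(todays_count - mean) / std > z_threshold


# Hypothetical week of activity for two users.
history = {
    "alice": [20, 22, 19, 21, 23, 20, 21],
    "bob": [200, 210, 190, 205, 195, 200, 200],
}
baselines = build_baselines(history)
print(is_anomalous(baselines, "alice", 210))  # True
print(is_anomalous(baselines, "bob", 210))    # False
```

The same count of 210 events is routine for bob but anomalous for alice. This is why UEBA compares users against their own behavior rather than a single global limit.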

Final thoughts

This is by no means an exhaustive list; it is a collection of some of the common and emerging practices in the Information Security domain. The business cases I have discussed are elementary ones, chosen to make it easy for anyone to follow along, and they provide a bird’s eye view without getting into technicalities. In many more complicated scenarios, researchers are joining hands to build great platforms and products. Though the benefits are many, there are challenges as well: engineers and consultants face challenges in execution, and businesses face challenges in adoption. To address them, we have Security Data Scientists. Ohh, who are they? Well, in short, they are people skilled in Security, Data Science and machine learning who work to build many of these practices. What they resolve, what their responsibilities are, how they benefit the business, and what challenges they face … well, that would be a separate blog altogether.

For now, let’s predict how far these bubbles will travel: will they grow bigger, merge with other bubbles, or even burst? Just like the kids having fun with them, technology practitioners are going to have a fantastic time working on applications of Machine Learning in the Information Security domain. This holds not just from an execution and usage standpoint but also for researching, contributing to the development of new practices and, importantly, sharing with the wider community.


Arun is a DevSecOps consultant with a strong interest in Product security and Security Data Science.