Machine Learning and Web Security

Published in

Reblaze Blog

Nov 19, 2018

Machine learning has become a popular topic lately.

As the name implies, machine learning (ML) techniques allow computers to learn. Computers are fed large amounts of data, and predictive models are built from the relationships among the variables, using a blend of statistics, data mining, pattern recognition, and discovery techniques. Over time, the machine’s “intelligence” (for this particular use case) gets progressively better as it sees more data, without being explicitly reprogrammed.

Machine learning is genuinely powerful, and so many marketing claims are being made about it. But unlike some of the overhyped terms out there, it really does have the potential to change the way organizations view, and ultimately use, their data.

Machine learning is already helping organizations identify previously unnoticed connections and trends within data. It also offers the potential for features and benefits that would otherwise be cost-prohibitive for businesses to provide. As a result, it is steadily making inroads into our daily lives. It’s what Netflix uses to make viewing recommendations, it helps filter fake news out of your Facebook feed, and it’s training self-driving cars to make split-second decisions.

Malicious Use of Machine Learning

Unfortunately, anything that can be used for good can also be used for bad. An early 2018 paper written by researchers from Yale, Stanford, Oxford, and Cambridge sums up the issue well: machine learning is ethically neutral, so there is nothing to stop malicious actors from using these techniques for their own benefit.

There are many ways attackers can harness machine learning for malicious purposes, and as techniques and algorithms mature, more use cases are likely to emerge. Here are some ways machine learning techniques are already being used, or explored, for illicit purposes.

Hivenets are scalable networks of bots that use swarms of compromised devices to wage coordinated attacks. Criminals are now beginning to explore machine learning as a way to greatly increase their effectiveness. Trained with an ML technique called reinforcement learning, hivenets could become self-learning groups of attackers that share data among themselves to customize and coordinate attacks, changing their behavior to suit the circumstances of each attack.
Attackers are using machine learning to make predictions regarding how to best launch their attacks. Once they have set their sights on a certain organization, they can use collected historical data, such as where breaches have occurred within that network and how similar organizations have been attacked, to make predictions about where and how to attack their target.
Other potential ML applications include phishing attacks, which can be made far more convincing with natural language processing (NLP). As NLP advances, attackers will be able to create fraudulent emails that are nearly impossible to distinguish from the real thing.

The paper’s authors conclude that we’re only at the beginning of ML use by cybercriminals. Just as ML offers powerful benefits for legitimate uses, it has many potential applications for malicious actors as well.

Using Machine Learning for Web Security

The increasing use of ML by threat actors might seem discouraging, but it’s only half of the picture. ML-related techniques have been used successfully for years to enhance security; for example, Bayesian email filtering has been around since the 1990s. And recent advances in ML suggest it will play an even more important role in the future.
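To make the Bayesian filtering example concrete, here is a minimal sketch of a multinomial naive Bayes classifier, the technique behind classic Bayesian spam filters. The four training messages and the whitespace tokenizer are invented for illustration; production filters use far larger corpora and much richer features.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real filters normalize text far more carefully
    return text.lower().split()


class NaiveBayesFilter:
    """Multinomial naive Bayes spam filter with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        # Count how often each word appears in messages with this label
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def log_score(self, text, label):
        # log P(label) + sum of log P(word | label); smoothing keeps unseen words from zeroing it out
        if self.doc_counts[label] == 0:
            return float("-inf")
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        label_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in tokenize(text):
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (label_words + vocab_size))
        return score

    def classify(self, text):
        return max(("spam", "ham"), key=lambda label: self.log_score(text, label))


nb = NaiveBayesFilter()
nb.train("win free money now", "spam")
nb.train("claim your free prize", "spam")
nb.train("meeting agenda for monday", "ham")
nb.train("lunch on monday with the team", "ham")
print(nb.classify("free money prize"))  # spam
print(nb.classify("monday meeting"))    # ham
```

Working in log space avoids numeric underflow when a message contains many words, which is why the scores are summed logarithms rather than multiplied probabilities.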

Reblaze is a pioneer in applying machine learning for accurate threat detection within HTTP/S traffic. Most organizations today have web assets (sites and web apps) to protect; here are a few of the benefits that ML offers for this purpose.

Anomaly detection and monitoring

Unusual processes and events are often the result of hostile activity, but these outliers can go unnoticed for significant periods of time. Machine learning can provide effective anomaly detection, allowing organizations to spot even subtle deviations in their traffic so that the sources can be blocked.
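As a minimal sketch of one common anomaly-detection approach, assume per-client request rates have already been aggregated (the IP addresses and counts below are made up). The function flags clients using the modified z-score, built on the median and the median absolute deviation, with a commonly used cutoff of 3.5. This illustrates the general idea only; it is not how Reblaze detects anomalies.

```python
from statistics import median


def flag_anomalous_clients(requests_per_minute, threshold=3.5):
    """Return clients whose request rate is an unusually high outlier.

    Uses the modified z-score (median and median absolute deviation), which,
    unlike a mean/standard-deviation z-score, is not dragged around by the
    very outliers it is trying to find.
    """
    rates = list(requests_per_minute.values())
    med = median(rates)
    mad = median(abs(r - med) for r in rates)
    if mad == 0:
        # Almost every client behaves identically: flag anyone above the median
        return sorted(c for c, r in requests_per_minute.items() if r > med)
    return sorted(
        client
        for client, rate in requests_per_minute.items()
        if 0.6745 * (rate - med) / mad > threshold
    )


rates = {
    "10.0.0.1": 12, "10.0.0.2": 15, "10.0.0.3": 11,
    "10.0.0.4": 14, "10.0.0.5": 13, "203.0.113.9": 480,
}
print(flag_anomalous_clients(rates))  # ['203.0.113.9']
```

Only unusually high rates are flagged here, since those are the sources worth blocking; a real system would look at many more signals than request volume.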

Defeating zero-day exploits

Legacy approaches to web security tend to rely on various forms of signature detection. This has several drawbacks, the worst of which is an inability to recognize new attacks that have not been seen before.

Machine learning allows organizations to take advantage of a key fact: even though unusual incoming traffic is not always the result of a threat actor, an attacker’s actions always fall outside the scope of legitimate user activity within that web application.

Thus, a web security platform such as Reblaze can train itself to recognize legitimate users, by learning how they behave. Shortly after its initial deployment for a new site or web app, Reblaze becomes quite sophisticated in its ability to discern whether a “user” is acting legitimately or not. Since it knows what legitimate user activity looks like, any activity that does not match it can be flagged and blocked.
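The idea of learning what legitimate users do and flagging everything else can be illustrated with a much simpler positive security model than a real platform would use. In this hypothetical sketch, `LegitimateBehaviorProfile` records, for each endpoint seen in known-good traffic, which parameters were sent and how long their values were. The class name, the `slack` factor, and the sample requests are all assumptions made for the example.

```python
from collections import defaultdict


class LegitimateBehaviorProfile:
    """Learns what legitimate traffic looks like and flags anything outside it.

    A deliberately simple positive security model: for each (method, path)
    seen in known-good traffic, remember which parameters were sent and the
    longest value observed for each.
    """

    def __init__(self, slack=2.0):
        self.slack = slack  # how much longer than anything seen a value may be
        self.endpoints = defaultdict(dict)  # (method, path) -> {param: max length}

    def learn(self, method, path, params):
        seen = self.endpoints[(method, path)]
        for name, value in params.items():
            seen[name] = max(seen.get(name, 0), len(value))

    def is_suspicious(self, method, path, params):
        key = (method, path)
        if key not in self.endpoints:
            return True  # legitimate users never requested this endpoint
        seen = self.endpoints[key]
        for name, value in params.items():
            if name not in seen:
                return True  # legitimate users never sent this parameter here
            if len(value) > self.slack * max(seen[name], 1):
                return True  # value is far longer than anything observed
        return False


profile = LegitimateBehaviorProfile()
profile.learn("GET", "/search", {"q": "running shoes"})
profile.learn("GET", "/search", {"q": "socks"})
profile.learn("POST", "/login", {"user": "alice", "password": "hunter22"})

print(profile.is_suspicious("GET", "/search", {"q": "trail shoes"}))          # False
print(profile.is_suspicious("GET", "/admin", {}))                             # True
print(profile.is_suspicious("GET", "/search", {"q": "shoes", "debug": "1"}))  # True
print(profile.is_suspicious("GET", "/search", {"q": "' OR 1=1 -- " * 10}))    # True
```

Note that none of the flagged requests needed a signature: the injection attempt is caught simply because no legitimate user ever sent a search query that long.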

Enhancing analysis through automation

Taking menial tasks off analysts’ plates frees them for higher-level work. For example, Reblaze scrubs incoming traffic for the sites and web apps that it’s protecting, logging every single request. This is done automatically, with no human intervention required (although human operators can step in whenever they wish).

Reblaze uses machine learning to continually reshape its security posture in response to current conditions, adapting to new traffic patterns as they arise. Human analysts therefore don’t need to do this by hand and can focus on higher-value activities. For example, since all details of all requests are logged and available through a full API, analysts can construct whatever queries they want, gleaning business insights from their traffic data.
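Adapting to new traffic patterns can be sketched with an exponentially weighted baseline: the expected value of a traffic metric (say, requests per second to one site) drifts along with real conditions, so a sustained change eventually becomes the new normal while a sudden spike is still caught. The class name and the `alpha`, `k`, and `warmup` values are illustrative assumptions, not settings from any real product.

```python
import math


class AdaptiveRateBaseline:
    """Exponentially weighted baseline of a traffic metric that tracks shifting patterns."""

    def __init__(self, alpha=0.1, k=4.0, warmup=10):
        self.alpha = alpha    # weight given to each new observation
        self.k = k            # how many standard deviations count as abnormal
        self.warmup = warmup  # observations to absorb before flagging anything
        self.mean = None
        self.var = 0.0
        self.count = 0

    def observe(self, value):
        """Update the baseline with value; return True if value looked abnormal."""
        self.count += 1
        if self.mean is None:
            self.mean = float(value)
            return False
        std = math.sqrt(self.var)
        abnormal = self.count > self.warmup and value > self.mean + self.k * max(std, 1.0)
        # Incremental exponentially weighted updates of the mean and variance
        diff = value - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        return abnormal


baseline = AdaptiveRateBaseline()
for i in range(200):
    baseline.observe(100 + i % 5)  # normal traffic, roughly 100 to 104 req/s
print(baseline.observe(1000))      # True: sudden spike
for _ in range(300):
    baseline.observe(500)          # traffic settles at a new, higher level
print(baseline.observe(500))       # False: the new level is now the baseline
```

Letting every observation, including flagged ones, update the baseline is what makes it adaptive, but it also means a patient attacker could ramp traffic slowly enough to drag the baseline upward, so real systems typically add safeguards against that.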

Predicting attacks

Machine learning is helping organizations predict attacks. Supervised learning models such as neural networks and logistic regression look at features of a website, such as the number of pages it has, the language it’s written in, and whether it runs on a local or cloud server. The models also take into account the organization’s attack history: Where has it been vulnerable in the past? How many times has it been breached? How many times have similar organizations been breached, and how? Using this information, predictive models can be built to assess where the next attacks are likely to occur, allowing organizations to take preventive action.
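Here is a small sketch of the logistic regression approach, trained by plain batch gradient descent. The three features (page count, self-hosted versus cloud, past breaches) stand in for the kinds of attributes listed above, and the eight labeled rows are invented toy data, so the resulting probabilities mean nothing beyond the example.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Invented toy data, one row per organization:
# [pages (thousands), self-hosted (1) or cloud (0), past breaches]
X = [
    [0.2, 0, 0], [0.5, 0, 0], [1.0, 0, 1], [0.3, 1, 0],
    [2.0, 1, 2], [3.0, 1, 3], [1.5, 1, 2], [2.5, 0, 3],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = attacked during the following quarter

w, b = train_logistic_regression(X, y)
new_site = [1.8, 1, 2]
print(f"estimated attack probability: {predict_proba(w, b, new_site):.2f}")
```

A categorical feature such as the site’s implementation language would normally be one-hot encoded into several 0/1 columns before being fed to a model like this.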

New approaches to web security

So far, this article has described a few of the many benefits ML offers for robust web security. There’s much more that could be said about threat detection at the web application level; however, ML also makes possible new approaches to web security overall.

For example, Reblaze currently has hundreds of customer deployments all over the world, all of which stream their (anonymized) traffic data to a central Big Data trove. The data is continually analyzed with machine learning to identify new traffic patterns and trends. As conditions change, all deployments are updated automatically in real time. Whenever a new attack is encountered anywhere in the world, the platform learns from it, adapts to it, and hardens all global deployments against it.

In the past, every site or network exposed to the Internet had to defend itself on an individual basis. Not so with Reblaze, which is, in effect, a globally distributed, machine-intelligent security network. Once any deployment anywhere encounters a new form of attack, all other Reblaze customers become protected against it, even before they encounter it themselves.
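The shared-learning architecture can be sketched with a hub that distributes attack fingerprints to every registered deployment. `ThreatIntelHub`, `Deployment`, and the fingerprint strings are hypothetical names for illustration; a real system would involve streaming infrastructure, anonymization, and far richer models than an exact-match blocklist.

```python
class ThreatIntelHub:
    """Hypothetical central service that shares attack fingerprints across deployments."""

    def __init__(self):
        self.deployments = []
        self.known_fingerprints = set()

    def register(self, deployment):
        self.deployments.append(deployment)
        # A newly added deployment immediately inherits everything learned so far
        deployment.blocklist |= self.known_fingerprints

    def report(self, fingerprint):
        if fingerprint in self.known_fingerprints:
            return
        self.known_fingerprints.add(fingerprint)
        for deployment in self.deployments:  # push the update to every deployment
            deployment.blocklist.add(fingerprint)


class Deployment:
    def __init__(self, name, hub):
        self.name = name
        self.hub = hub
        self.blocklist = set()
        hub.register(self)

    def handle(self, fingerprint, detected_locally=False):
        """Decide on a request, given a fingerprint of its attack-relevant features."""
        if fingerprint in self.blocklist:
            return "blocked"
        if detected_locally:
            self.hub.report(fingerprint)
            return "blocked"
        return "allowed"


hub = ThreatIntelHub()
shop = Deployment("eu-shop", hub)
bank = Deployment("us-bank", hub)

print(shop.handle("new-exploit-7f3a", detected_locally=True))  # blocked, and reported
print(bank.handle("new-exploit-7f3a"))                         # blocked before ever seeing it
late = Deployment("apac-media", hub)
print(late.handle("new-exploit-7f3a"))                         # blocked on day one
print(bank.handle("ordinary-request"))                         # allowed
```

The key property is that detection happens once, at whichever deployment sees the attack first, while protection applies everywhere.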

In the past, an attacker merely had to outwit the individual WAF (or other defense) protecting his intended target. But with Reblaze, an attacker must contend with a distributed security network:

Which has the computing capacity of the global cloud at its disposal…
And has probably seen his intended form of attack before, and knows how to defeat it…
And knows what legitimate users will do, and are allowed to do, on the networks it’s protecting, which lets it block even a never-before-seen zero-day exploit.

Truly, ML opens up exciting new possibilities in the effort to make the web a safer place for everyone.

Final note

This article has focused on machine learning, which is only one aspect of Reblaze. The platform offers many other benefits which are beyond the scope of this article.

We’d love to show you more of what the platform can do. To get a demo, get in touch with us.