Robust Detection of Evasive Malware — Part 3

Loss Landscape vs. Decision Landscape for Adversarial Robustness

Abdullah Al-Dujaili
alfagroup-csail-mit
5 min read · Aug 29, 2018


In our last post, we observed from Table I that models hardened by different inner maximizers performed differently when tested against attack methods that were not incorporated in training. Put differently, some of the models (e.g., the rFGSM-hardened model) generalized well to unseen attacks: the BGA attack achieved an evasion rate (5.9%) similar to that of the rFGSM attack on the rFGSM-hardened model. This was not the case for the BCA-hardened model, where the BGA attack achieved a 91.8% evasion rate compared to the BCA attack’s 7.9%.

Why do hardened models have different evasion rates?

Intuitively, one can argue that inner maximizer methods (equivalently, attacks) vary in their strength, so a model that is robust to the strongest adversary should also be robust to a weaker one; the converse does not hold.

So suppose we had developed only one attack (one inner maximizer) instead of four. Would we be able to say anything about the corresponding hardened model’s performance against unseen attacks (the other three attacks in our case)? This is reminiscent of generalization, the ability of a model to perform well on unseen data points. The difference here is that we are concerned with the model’s ability to perform well against unseen attacks. We distinguish these two concepts with the terms standard generalization and robust generalization, respectively.

Standard generalization refers to the model’s performance on previously unseen data. Robust generalization, as used above, refers to the model’s performance on previously unseen data and unseen attacks.

Standard generalization is subsumed by robust generalization: the model is expected to handle natural data samples (both benign and malicious PEs) as well as their adversarial versions. For models trained on natural data points, flatness (sharpness) of the loss landscape around a model’s parameters has been associated with good (poor) standard generalization (see here and here). Equipped with these findings, we were motivated to check whether this extends to robust generalization. In other words, let’s look for the visual hallmark of standard generalization (loss landscape flatness) and see whether it is associated with robust generalization (good performance against unseen attacks).

The loss landscape refers to the relationship between a model’s parameters and the loss measured on its training set (for natural training: malicious PEs + benign PEs; for hardened saddle-point training: adversarial malicious PE versions + benign PEs). For each model, we plotted the loss in the immediate neighborhood of the trained parameters θ spanned by two random directions (α and β), using a filter-wise normalization technique.
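
To make the setup concrete, here is a minimal sketch (not the exact code behind our figures) of how such a 2-D loss surface can be computed in PyTorch with filter-wise normalized random directions; `model`, `loss_fn`, and `data_loader` are placeholders for the detector, its loss, and the (adversarially augmented) training set.

```python
# Sketch: loss surface around trained parameters theta along two random
# directions, rescaled filter-wise as in Li et al.'s visualization technique.
import torch
import numpy as np

def random_direction(model):
    """One random direction, rescaled filter-wise to match the norm of each parameter filter."""
    direction = []
    for p in model.parameters():
        d = torch.randn_like(p)
        if p.dim() > 1:                      # weight matrices / conv filters
            for d_row, p_row in zip(d, p):   # per-filter (per-row) rescaling
                d_row.mul_(p_row.norm() / (d_row.norm() + 1e-10))
        else:                                # biases and other 1-D parameters
            d.mul_(p.norm() / (d.norm() + 1e-10))
        direction.append(d)
    return direction

def loss_surface(model, loss_fn, data_loader, steps=21, span=1.0):
    """Evaluate the loss on a (steps x steps) grid spanned by two random directions."""
    theta = [p.detach().clone() for p in model.parameters()]
    alpha_dir, beta_dir = random_direction(model), random_direction(model)
    alphas = betas = np.linspace(-span, span, steps)
    surface = np.zeros((steps, steps))
    with torch.no_grad():
        for i, a in enumerate(alphas):
            for j, b in enumerate(betas):
                # Move to theta + a * d_alpha + b * d_beta and measure the loss.
                for p, t, da, db in zip(model.parameters(), theta, alpha_dir, beta_dir):
                    p.copy_(t + a * da + b * db)
                total, n = 0.0, 0
                for x, y in data_loader:
                    total += loss_fn(model(x), y).item() * len(y)
                    n += len(y)
                surface[i, j] = total / n
        for p, t in zip(model.parameters(), theta):  # restore original parameters
            p.copy_(t)
    return alphas, betas, surface
```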

Since the natural model was trained naturally, the flatness of its loss landscape (top left of the figure below) relates to standard generalization, so one cannot comment on its association with robust generalization. We observe a bumpy loss landscape for the BCA-hardened model, which could be an indicator of a poorly hardened model. However, these plots still do not help in ranking the remaining hardened models, rFGSM, BGA, and dFGSM, in terms of their robustness.

Loss landscape of the trained models.

We took a step back and asked: why do adversarial versions exist in the first place? In high dimensions, neural net models can behave in a linear fashion, so the decision landscape over blind spots (regions from which no data point was presented during training) is extrapolated, setting up the opportunity for adversarial versions to occupy the “wrong” side of the decision boundary. The model’s linear responses are overly confident at points that do not occur in the data distribution (blind spots), and these confident predictions are often highly incorrect. The presence of adversarial versions changes the shape of the decision boundary, and so we sought a way of capturing this change.
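
As a toy illustration of this point (not taken from our experiments), consider how a linear classifier’s confidence keeps growing as we move along its weight vector into regions the training distribution never covers:

```python
# Toy example: a logistic-regression model extrapolates its linear logit into
# a "blind spot" and becomes arbitrarily confident there, despite never having
# seen any data that far from the two training clusters.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, size=(100, 2)),   # class 0 cluster
               rng.normal(+1, 0.3, size=(100, 2))])  # class 1 cluster
y = np.array([0] * 100 + [1] * 100)
clf = LogisticRegression().fit(X, y)

# Walk along the (unit-normalized) weight vector, far beyond any training point.
w = clf.coef_[0] / np.linalg.norm(clf.coef_)
for scale in [1, 5, 20, 100]:
    p = clf.predict_proba([scale * w])[0, 1]
    print(f"distance {scale:>3}: P(class 1) = {p:.6f}")
# The predicted probability approaches 1.0 in regions the data never covered.
```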

We use Self-Organizing Maps (SOMs) to visualize the decision landscape of the natural and hardened models and superimpose the data points and adversarial versions on it. This makes false positives and false negatives easy to see. A self-organizing map is a neural network trained with unsupervised learning to map a set of inputs onto a lower-dimensional grid. The pseudo-code below outlines the approach (more details are here).
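
The original pseudo-code figure is not reproduced here; the following is a rough sketch of the idea, assuming the MiniSom library and a detector `model` that outputs a maliciousness logit per sample: fit a SOM on the natural inputs, color each SOM node by the model’s confidence at that node’s codebook vector, and superimpose the samples and their adversarial versions at their winning nodes.

```python
# Sketch of a SOM-based decision landscape (MiniSom assumed; X_nat/y_nat are
# natural feature vectors and labels, X_adv are adversarial versions).
import numpy as np
import matplotlib.pyplot as plt
import torch
from minisom import MiniSom

def decision_landscape(model, X_nat, y_nat, X_adv, grid=20, iters=5000):
    # 1) Fit a SOM on the natural data to get a 2-D map of the input space.
    som = MiniSom(grid, grid, X_nat.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
    som.train_random(X_nat, iters)

    # 2) Color each SOM node by the model's confidence at its codebook vector
    #    (assumes the model returns a single maliciousness logit).
    codebook = som.get_weights().reshape(-1, X_nat.shape[1])
    with torch.no_grad():
        conf = torch.sigmoid(model(torch.tensor(codebook, dtype=torch.float32)))
    landscape = conf.numpy().reshape(grid, grid)
    plt.pcolormesh(landscape.T, cmap="coolwarm")  # decision landscape background

    # 3) Superimpose benign (o), malicious (s), and adversarial (x) samples
    #    at their winning SOM nodes.
    groups = [(X_nat[y_nat == 0], "o"), (X_nat[y_nat == 1], "s"), (X_adv, "x")]
    for samples, marker in groups:
        for x in samples:
            i, j = som.winner(x)
            plt.plot(i + 0.5, j + 0.5, marker, color="k", markersize=4)
    plt.show()
```

Coloring the map by the model’s output at the codebook vectors is what turns the SOM from a plain data visualization into a decision-landscape visualization on which the samples can be overlaid.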

The figure below shows the computed decision landscapes; we observe that for the most robust model, rFGSM, the adversarial versions (denoted by x) sit relatively far from the decision boundary between benign and malicious points, compared to the rest of the hardened models. As we move from more to less robust models, the adversarial versions get closer to the decision boundary, as detailed in the figure caption.

Decision landscape of the trained models using SOMs. Close to the decision boundary, the rFGSM model has one adversarial version, the BGA model has around three, and the dFGSM model has around seven. It gets even worse for the poorly hardened model (BCA), where the adversarial versions step into regions that belong to the benign class with medium to high confidence.

Our empirical observations of the least and most robust models support the proposition that the decision boundary becomes more complicated in the presence of adversarial versions, as shown below.

In summary, the location of the decision boundary relative to the adversarial versions appears to have a stronger association with a hardened model’s robustness than the geometry of the loss landscape around the model’s parameters.

In this series of posts, we investigated methods that reduce the adversarial blind spots of neural network malware detectors. We approached this as a saddle-point optimization problem in the binary domain and trained the model with multiple inner maximization methods. We asked how robust generalization can be visually discerned and whether a concise view of the interactions between a hardened decision landscape and input samples is possible. To this end, we presented a SOM-based visualization method.

Code can be found here (training) and here (visualizing loss and decision landscapes). For more details, refer to the papers here and here. This work was supported by the MIT-IBM Watson AI Lab, CrowdStrike, and the CSAIL CyberSecurity Initiative.
