Robust Detection of Evasive Malware — Part 2

Abdullah Al-Dujaili
alfagroup-csail-mit
3 min read · Aug 29, 2018

In our last post, we saw how crafted adversarial versions of malicious PEs could break a naturally trained model and we concluded by asking whether we can make our model more robust. Fortunately, we can!

Using the saddle-point formulation, we can incorporate the crafted adversarial versions into the training procedure of our model. The saddle-point formulation is a composition of an outer minimization problem (here, the model's loss minimization) and an inner maximization problem (here, the crafting of adversarial malware versions).

This basically changes our training objective for θ, with respect to the loss function L given data points {(x = feature vector, y = label)}, from

min_θ E_(x,y) [ L(θ, x, y) ]

to

min_θ E_(x,y) [ max_{x̄ ∈ S(x)} L(θ, x̄, y) ],

where x̄ is an adversarial version of the malicious file's feature vector x, and S(x) is the set of allowed adversarial perturbations of x (Constraints I & II in the last post).

In other words, we use the attack methods described in the last post as inner maximizers to create adversarial versions of the malicious PEs before minimizing the loss on them (the outer minimization in the formulation above), as shown below.
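For concreteness, here is a minimal sketch of one such training step, assuming a PyTorch binary classifier and an inner maximizer craft_adversarial that stands in for any of the last post's attacks; the names and the data handling are illustrative, not our exact code.

```python
import torch
import torch.nn.functional as F

def adversarial_train_step(model, optimizer, x, y, craft_adversarial):
    """One saddle-point training step (sketch, not the exact implementation)."""
    # Inner maximization: replace each malicious feature vector with an
    # adversarial version from S(x); benign samples are left untouched.
    malicious = (y == 1)
    x_adv = x.clone()
    if malicious.any():
        x_adv[malicious] = craft_adversarial(model, x[malicious], y[malicious])

    # Outer minimization: one gradient step on the loss evaluated at the
    # inner maximizers.
    optimizer.zero_grad()
    logits = model(x_adv).squeeze(-1)
    loss = F.binary_cross_entropy_with_logits(logits, y.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```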

You have probably observed that in both the inner and the outer problems we make use of the loss function's gradient: for the former, we take the gradient with respect to the input feature vector; for the latter, with respect to the neural net's parameters. Given that our outer minimization can be non-convex and our inner maximization is discrete (the feature vectors are binary), does using the corresponding gradients really harden our model and lead us to the robust optimal θ*?
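As a toy illustration of the inner gradient, a single gradient-guided bit-setting step over binary features could look like the sketch below. It is a simplified stand-in, not the exact rFGSM/dFGSM/BGA/BCA update rules; it only respects the "add features, never remove them" flavor of the constraints from the last post.

```python
import torch
import torch.nn.functional as F

def bit_setting_step(model, x, y):
    """Simplified inner-maximization step over binary features (illustrative)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.binary_cross_entropy_with_logits(model(x).squeeze(-1), y.float())
    # Gradient with respect to the *input* feature vector (inner problem),
    # as opposed to the gradient w.r.t. the model parameters (outer problem).
    grad = torch.autograd.grad(loss, x)[0]
    # Set the bits whose gradient points towards a larger loss; clamping
    # keeps already-set bits at 1, so no original feature is ever removed.
    return torch.clamp(x.detach() + (grad > 0).float(), max=1.0)
```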

While there is no theoretical guarantee on the above, the saddle-point formulation was of particular interest to us for two reasons. First, in the case of a loss function that is continuously differentiable in the model parameters θ, Danskin's theorem states that gradients at inner maximizers correspond to descent directions for the saddle-point problem (see Appendix A of this for a formal proof). Second, it has been shown empirically that one can still reliably optimize the saddle-point problem for learning tasks with continuous feature spaces, even with loss functions that are not continuously differentiable (see this and this).

Hardened Models

Using the saddle-point formulation above, we trained 4 models (with the same architecture as the naturally trained model from the last post), one for each of the last post's 4 attacks as the inner maximizer: rFGSM, dFGSM, BGA, and BCA.

To test the hardened models, we measured their evasion rates on the malicious test set as well as on its 4 adversarial versions (crafted by the 4 attacks), as shown in the table below. The rows correspond to the trained models (denoted by their inner maximizer methods); the columns correspond to the crafted test versions.

Table I. Evasion Rates (in %) for naturally trained (standard) and hardened models.
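The entries of the table can be computed along the following lines; models and adv_test_sets are hypothetical dictionaries holding the trained classifiers and the crafted malicious feature vectors, keyed by attack name.

```python
import torch

def evasion_rate(model, x_mal):
    """Fraction (in %) of malicious samples classified as benign."""
    with torch.no_grad():
        evaded = torch.sigmoid(model(x_mal).squeeze(-1)) < 0.5
    return evaded.float().mean().item() * 100.0

def evasion_table(models, adv_test_sets):
    """Rows: trained models; columns: adversarially crafted test sets."""
    return {model_name: {attack_name: evasion_rate(model, x_adv)
                         for attack_name, x_adv in adv_test_sets.items()}
            for model_name, model in models.items()}
```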

From the table, we see that each hardened model is most robust against the attack it was trained with (cf. the outlined cells). This is consistent with the saddle-point formulation! Among all the trained models, the rFGSM-hardened model brings the evasion rate across all the tested attacks down from 99.7% (natural model) to 7%. The performance of the BCA-hardened model, however, is not that different from the natural model's. Why is that? Let's try to answer this in the next post. Stay tuned!
