“Computational Asymmetries in Robust Classification” Explained
TL;DR
- Adversarial attacks are very effective, but defenses are often broken mere months after publication
- Adversarial attacks are NP-hard (i.e. difficult), while adversarial defenses are Sigma2P-hard (i.e. very difficult)
- This asymmetry only happens with defenses that are run before seeing the actual input to be classified
- If we use inference-time defenses, the asymmetry can be reversed in favor of the defender
Premise
Usually, when reading a new paper, I first check if there’s already someone else who took the time to read it and explain it in a less “papery” manner. Since “Computational Asymmetries in Robust Classification”, the paper I wrote with my great advisor Michele Lombardi, was accepted at ICML 2023, I thought “Why not do the same for my own paper?”.
Obvious disclaimer: since this is meant to be a gentle introduction to the paper, I’m going to ignore a lot of nuance and mathematical details. Refer to the original paper for all the formal explanations.
Introduction
Adversarial attacks. You take an image, add a small perturbation to it, and boom, your classifier goes crazy.
Over the years, researchers have developed a lot of defenses against adversarial attacks: adversarial detectors, adversarial training, adversarial-proof preprocessors.
And then Athalye et al. showed that a lot of defenses can be fooled with a few simple tricks. And then Carlini broke 10 adversarial detectors in a single paper.
This happened again, and again, and again, and again, and again. Countless defenses, which showed a lot of promise, were broken mere months after publication. Even adversarial training, which is the most common defense against adversarial attacks, is far from perfect, and it often comes at the cost of accuracy on regular inputs.
Why does this happen? Why are attacks so effective and defenses so ineffective?
Let’s consider two different scenarios.
Adversarial Attacks
Suppose that we need to find an adversarial example for a model f. In other words, we need to find an input x′ that is not too different from the original input x (usually according to an Lp norm) and that is misclassified by the model. Mathematically, we can see it as a constraint satisfaction problem:

find x′ such that ‖x′ − x‖p ≤ ε and f(x′) ≠ f(x)

where p is a norm order (e.g. 2 for the Euclidean norm, or ∞ for the maximum norm) and ε is the maximum size of the perturbation.
Usually, this problem is solved by using an optimization algorithm, but for the sake of simplicity, we’ll focus on a simple yes/no question: “Is there a valid solution to this problem?” Or, more formally:

given f, x and ε: ∃ x′ such that ‖x′ − x‖p ≤ ε and f(x′) ≠ f(x)?
It turns out that this problem is NP-complete: in other words, it’s in the same complexity class as SAT, the Travelling Salesman Problem, or the Knapsack Problem. These are the kinds of problems that might require a supercomputer, depending on how big your input is. Still, for most practical situations you can solve them in a reasonable amount of time, provided you have enough AWS credits.
Of course, in practice most researchers are fine with using heuristic adversarial attacks, which are not optimal but are much more efficient in exchange. We will focus on heuristic attacks later: for now, assume that all the attacks and defenses we’re considering are exact.
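To give a flavor of what a heuristic attack looks like before we set them aside, here is a minimal PyTorch sketch of one of the simplest ones, the Fast Gradient Sign Method (FGSM). The function name and interface are just for illustration; the paper considers a range of standard heuristic attacks, so treat this as a sketch rather than the exact method used in the experiments.

```python
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps):
    """One-step heuristic attack (FGSM): nudge x by eps in the direction
    that increases the loss, as measured by the L-infinity norm."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # The sign of the gradient gives the steepest-ascent direction under L-infinity.
    x_adv = x_adv + eps * x_adv.grad.sign()
    return x_adv.detach().clamp(0, 1)  # keep the image in a valid pixel range
```

An exact attack would instead search for the provably closest adversarial example, which is far more expensive.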
Adversarial Defenses
Let’s now consider another problem: we have a classifier f and we want to find the parameters θ that make f robust against adversarial attacks on a single input. This is a simplification compared to real-world applications: we often want to be robust on multiple inputs, and we want to do it without sacrificing too much of our model’s accuracy. Still, it’s good enough for our purposes.
As before, we can formalize it as a constraint satisfaction problem:

find θ such that fθ(x′) = fθ(x) for all x′ ∈ Bp(x, ε)

where Bp(x, ε) is the ball of radius ε centered at x (i.e. the set of all inputs within distance ε of x) and fθ is the classifier with parameters θ. Again, we can rewrite it as a yes/no question:

given f, x and ε: ∃ θ such that ∀ x′ ∈ Bp(x, ε), fθ(x′) = fθ(x)?
This problem is Sigma2P-complete. This is believed to be a much harder complexity class: even if we could solve NP-complete problems in constant time, we would still need a supercomputer to solve Sigma2P-complete problems. This difficulty asymmetry, in our opinion, explains why defenses are much harder than attacks.
For an intuition on why there’s this difference, compare the two problems: the first only requires finding an input (so it’s an ∃ query), while the second requires finding a parameter value such that, for all possible inputs in the ball, the model gives the same output (so it’s an ∃∀ query). This is the core of the computational asymmetry between attacks and defenses.
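If it helps, here is a toy, fully discretized Python sketch of that quantifier structure (the names and the brute-force search are purely illustrative, not the paper’s formulation): the attack is a single loop over candidate inputs, while the defense wraps a universal check over the whole ball inside a search over parameters.

```python
def attack_exists(f, x, ball):
    # ∃ query: is there some x' in the ball that changes the model's output?
    return any(f(xp) != f(x) for xp in ball)

def robust_theta_exists(f_family, thetas, x, ball):
    # ∃∀ query: is there some theta such that *every* x' in the ball
    # gets the same output as x?
    for theta in thetas:
        f = f_family(theta)
        if all(f(xp) == f(x) for xp in ball):
            return True
    return False
```

Note that the inner all(...) check is essentially the negation of an attack, which is exactly why the defense problem sits one level higher in the polynomial hierarchy.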
Designing Asymmetry-Free Defenses
Ok, defenses are much harder tasks than attacks. Is all hope lost? Of course not!
It turns out that defenses are Sigma2P-hard only if we execute them at training time (i.e. before seeing the actual input we need to classify). If, instead, we run them at inference time (i.e. once we know the actual input to be classified), it’s possible to break free from this asymmetry.
For the sake of proving this, we came up with a very simple defense:
- Try to make the model as robust as possible, using regular training-time defenses
- When you receive an input, try to find an adversarial example for it
- If you find one, the model is by definition not robust on that input, so you flag it as unsafe and send it to a more robust classifier (e.g. a human)
We call this defense Counter-Attack (CA). To be clear, CA is far from perfect: it can’t magically know the correct class of the input; it can only raise an alarm on inputs that other defenses failed to “fix”. It’s essentially a last bastion against adversarial attacks: if the other defenses made the model 100% robust, CA would never flag an input; however, if an adversarial attack manages to sneak past other defenses, CA will at least let you know that the model cannot be trusted on that specific input.
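In code, the idea could look something like this minimal sketch (the attack function is a placeholder for whatever adversarial attack you prefer; in the exact version of CA it would be an exact attack):

```python
def counter_attack(model, x, attack, eps):
    """Inference-time check: flag x as unsafe if an adversarial example
    is found within distance eps of x."""
    x_adv = attack(model, x, eps)   # assumed to return None when it finds nothing
    if x_adv is not None:
        return "unsafe"             # defer to a more robust classifier (e.g. a human)
    return model(x)                 # no adversarial example found: trust the prediction
```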
CA has an interesting property: running it is NP-hard (since we need to run an adversarial attack), but fooling a model defended by CA is Sigma2P-hard. We have reversed the asymmetry!
Heuristic Relaxation
The rest of the paper deals with what happens when we replace exact adversarial attacks with the much more common heuristic attacks. By doing so, instead of finding the optimal adversarial example (i.e. the closest one), we might end up with a slightly sub-optimal one, which means that CA might fail to flag an unsafe input (although it will never flag a safe input as unsafe).
Comparing exact adversarial examples (computed using the Galileo100 supercomputer) with those found by heuristic attacks (computed using my PC), we made some very interesting findings:
- Heuristic attacks find surprisingly close-to-optimal adversarial examples. This is useful not only for CA (since it means that the heuristic variant will be less likely to be fooled), but also for future defenses that might use heuristic attacks in different and interesting ways
- When used in defenses, weak but reliable adversarial attacks are often better than strong but unreliable adversarial attacks
- Some heuristic attacks compensate for each other’s weak spots, which means that (small) pools of heuristic attacks are better robustness tools than single attacks (see the sketch after this list)
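As an example of that last point, the heuristic version of the check above can simply run a small pool of attacks and flag the input if any of them succeeds (again a sketch with placeholder names):

```python
def counter_attack_pool(model, x, attacks, eps):
    # Flag the input as unsafe if *any* attack in the pool finds an adversarial
    # example: different heuristic attacks cover each other's weak spots.
    return any(attack(model, x, eps) is not None for attack in attacks)
```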
We also compiled all the adversarial examples we found into a new dataset, UG100, so that you can take a look at them and use them for your own research.
Conclusions
To sum up:
- Training-time defenses are harder than attacks
- Inference-time defenses can flip the computational asymmetry
- Heuristic attacks are surprisingly good
Overall, CA is not meant to be the be-all and end-all of inference-time defenses: our hope is that, by showing that the asymmetry can be flipped, other researchers will develop better inference-time defenses without falling into the same traps that caused other defenses to fail.
If you have any questions about the paper, you can also find me on Twitter as @MarroSamuele and on my personal webpage.
And of course, if you want to talk in-person about adversarial defenses, be sure to drop by at ICML 2023! Our poster session (n° 4) will be on Thursday 27th, but you can find us there all week.