A Sight for Obscured Eyes
Adversary, Optics, and Illusions
I’ve seen a few people in the machine learning community draw comparisons between computer vision adversarial examples and optical illusions, often referring to the latter as adversarial examples for the human visual cortex. These optical illusions come in many different flavors, such as those that distort perception of angle, height, color, or movement in a still image (to name just a few). I’d like to use this post to explore similarities between the two and hopefully learn a little in the process about what one can teach us about the other (and vice versa). Fair warning: I’ll probably venture down a tangent or two along the way.
Part 1 — Adversary
I’ll start with an introduction to adversarial examples in the world of machine learning, but before doing so it is probably worth a brief refresher on convolutional neural networks, the architecture that is traditionally the vehicle of choice for addressing image data. This blog has previously addressed convolutional networks for image processing in a machine learning context, so I will take for granted that a reader has some familiarity with the concept, but as a hedge for those who might be less fluent I’ll offer a quick sketch here. Convolutional neural networks are a distinct neural network architecture well suited to data structured in a bounded grid topology (such as an image). Through supervised training, such networks develop the ability to recognize features of increasing complexity in each subsequent hidden layer: an early layer may detect edges, another layer may combine those features to recognize lines or curves, another may take that output to recognize shapes, and so on until the deeper layers can recognize or categorize sophisticated features of an image. One application for convolutional networks is the classification of features in an image, and that will be the primary use case discussed in this essay.
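To make that sketch concrete, here is a minimal illustration, in plain NumPy rather than any particular framework, of the convolution operation behind an early-layer edge detector. The image and filter are toy values of my own invention; a trained network would learn filters like this rather than have them hand-coded.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, the core operation of a conv layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A toy 6x6 "image": dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A hand-crafted vertical-edge detector; training would discover
# filters of this flavor in a network's early layers.
vertical_edge = np.array([[-1.0, 1.0],
                          [-1.0, 1.0]])

activation = conv2d(image, vertical_edge)
print(activation)
# The filter fires only at the column where dark meets bright.
```

Sweeping the same small filter across every position of the image is what lets a single learned feature detector find its feature anywhere in the grid.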
In the context of convolutional neural networks applied to image classification, an adversarial example is an image derived from a particular (trained) network via some (even minute) obfuscation of the picture such that, when fed back through that same network, the output classification is intentionally incorrect — in fact the obfuscation can potentially be crafted to produce an arbitrary, specific, desired incorrect classification. The fact that it is possible to fool a classifier should not be too surprising; after all, even humans aren’t 100% accurate at interpreting images. What is surprising is, in some cases, how little it takes to produce a false classification. An adversarial image may be completely recognizable to a human but still yield a seemingly nonsensical output classification.
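As a rough illustration of how little it can take, here is a hedged sketch using a toy linear scoring model (not a real convolutional classifier): a per-pixel nudge of just 0.01, invisible in any single pixel, shifts the overall score dramatically once summed across many input dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A fixed "trained" linear scorer over a 784-dim input (e.g. a flattened
# 28x28 image): score = w . x, with sign-of-score as the classification.
w = rng.choice([-1.0, 1.0], size=784)

x = rng.random(784)   # some input image, pixel values in [0, 1)
score = float(w @ x)

# Nudge every pixel by epsilon in the direction that lowers the score.
# Each individual step is tiny, but the combined effect on the dot
# product is epsilon times the L1 norm of w.
epsilon = 0.01
x_adv = x - epsilon * np.sign(w)

print(score, float(w @ x_adv))  # the score drops by epsilon * 784 = 7.84
```

The same arithmetic is the intuition behind the fast gradient sign method: many tiny, coordinated per-dimension changes add up to a large change in a high-dimensional dot product.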
The presence of an adversarial example is a learning moment: it reveals underlying structures and shortcomings of a model, including instances of overfit. The fact that a classifier can be fooled on images so immediately recognizable to humans reveals that our computer’s encoded knowledge is not fully generalizable, that it is taking shortcuts in the type of features it tests for in evaluation (such as focusing on idiosyncratic features of a particular training data set, for instance). I suspect these shortcomings could partly stem from the max-pooling operations between feature detector layers scrubbing information as the image is fed through the network: each pixel grid of inspection retains only the maximum value of detector activation, in an output of correspondingly reduced dimensions, as it is swept through the image. This part of the architecture is necessary to manage the increasing width of evaluation as the number (and hence diversity) of parallel feature detectors scales up in each subsequent layer of the network. Thus the final layers performing our classification are not evaluating the entire contents of the image, only those features filtered and selected as most relevant during training (an issue partly addressed by Geoffrey Hinton’s recent invention of capsule networks, a variant of convolution which preserves more information between layers in a dynamically routed vectorized activation instead of a traditional convolution’s scalar).
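To see the information scrubbing concretely, here is a small sketch (toy values, plain NumPy) in which two visibly different activation maps become indistinguishable after a single 2x2 max-pooling pass:

```python
import numpy as np

def max_pool(a, size=2):
    """Non-overlapping max pooling: keep only the max of each size x size tile."""
    h, w = a.shape
    return a.reshape(h // size, size, w // size, size).max(axis=(1, 3))

# Two distinct 4x4 activation maps: the strong activations sit in
# different positions within each pooling tile.
a = np.array([[9, 0, 0, 7],
              [0, 0, 0, 0],
              [0, 0, 0, 0],
              [5, 0, 0, 8]], dtype=float)

b = np.array([[0, 9, 7, 0],
              [0, 0, 0, 0],
              [0, 0, 0, 0],
              [0, 5, 8, 0]], dtype=float)

assert not np.array_equal(a, b)
print(max_pool(a))
print(max_pool(b))  # identical: downstream layers cannot tell a and b apart
```

Everything below the per-tile maximum is discarded, so any perturbation that leaves the tile maxima unchanged is invisible to the layers that follow.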
The Deep Learning text offers another explanation for a network’s susceptibility to adversarial examples (one that is actually research based, so it should certainly be trusted much more than my rampant speculations about max-pooling and capsule networks above): a primary cause of a model’s susceptibility arises from excessive linearity. Although linear models may be easier to train, neural networks built primarily out of linear building blocks can change their output very rapidly as a function of their numerous inputs. While activation functions like the ReLU can certainly introduce elements of nonlinearity into a network, I take the inference from the text to be that these elements alone are not sufficient to address adversarial susceptibility. In fact the primary means of mitigation offered by the text is the use of a series of derived adversarial examples for supplemental supervised training runs, aka “adversarial training.” I think it can thus be inferred that this process of training a model, deriving from the trained model a series of adversarial examples, and then using those same examples to augment the set of labeled points for further training runs could be treated like a shampoo bottle’s lather > rinse > repeat infinite loop for a model’s continuous improvement.
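That lather > rinse > repeat loop might be sketched as follows. This is only a toy rendition, assuming logistic regression on synthetic blobs and fast-gradient-sign perturbations, not the full procedure from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: two Gaussian blobs in 2D, labels in {0, 1}.
X = np.vstack([rng.normal(-1.0, 1.0, (200, 2)), rng.normal(1.0, 1.0, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
X0, y0 = X.copy(), y.copy()   # keep the clean set around for evaluation

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, steps=500, lr=0.1):
    """Plain logistic regression fit by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def fgsm(X, y, w, eps=0.5):
    """Fast-gradient-sign adversarial examples for the logistic model:
    the input-gradient of the loss for each row is (p - y) * w."""
    grad_x = np.outer(sigmoid(X @ w) - y, w)
    return X + eps * np.sign(grad_x)

# train, derive adversarial examples from the trained model,
# fold them back into the labeled set, and train again.
w = train(X, y)
for _ in range(3):
    X_adv = fgsm(X, y, w)
    X, y = np.vstack([X, X_adv]), np.concatenate([y, y])
    w = train(X, y)
```

Each pass hardens the model against the previous model's blind spots, though of course the new model has adversarial examples of its own, hence the infinite loop.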
Part 2 — Optics
Turning discussion back to convolutional networks, it is of note that this architecture is a striking example of machine learning algorithms mirroring some of the workings of the human brain. The training of the algorithmic version follows a very different tack than a brain’s: a machine learning algorithm is trained in a batch process of supervised learning on labeled training data, whereas the brain selectively and continuously adapts individual neurons’ firing intensity based on real-time exposure to related concepts in a process known as selective adaptation (a largely unsupervised approach to learning). Even so, many of the aspects of the architecture called out in my earlier quick sketch can be found in elements of brain operation.
A lot of the early research characterizing the processing of visual perception in the human visual cortex was conducted by the researchers Hubel and Wiesel in the 1950s and 60s, work for which they were awarded the Nobel Prize in Physiology or Medicine in 1981 (shared with Roger Sperry). In one important experiment the researchers anesthetized laboratory cats, implanted electrodes to measure neuron firings in specific regions, and then gauged those firings in response to viewings of specific images. The findings suggested that in a region of the brain known as the striate cortex there are specific neurons (laid out in a grid parallel to spatial regions of an image) that fire in response to specific image features, such as different orientations of moving lines. The lines were of note because they indicate the presence of specific neuron ‘edge detectors’, and the movement aspect was of note because it demonstrated a cat brain’s propensity (more so than a human’s) to focus attention (via neuron firings) on select portions of an image that are undergoing movement. Picture how some animals will freeze when suspecting the presence of a predator, such as a field mouse being hunted by a cat: that behavior was adapted precisely because the cat’s visual acuity is so sensitive to motion. Note that although edge detection is certainly common in the early layers of a trained convolutional neural network, this concept of motion-based attention in image evaluation is, to my understanding, missing from mainstream modern architectures.
The idea that distinct striate neurons fire in response to specific rotation configurations of detected lines raises the question of what range of granularity in the detected line rotation angle will still trigger the same neuron. It turns out that the range of activation for a single cortical neuron covers only a narrow band of rotation, subject to an orientation tuning curve that ramps the firing up or down based on the difference between the detected and preferred angles, and thus there exist comparable neurons detecting similar features at other ranges of rotation. A spatial rotation measure is incorporated into the vectorized activation output of a capsule network feature detector, with the values for a single capsule neuron covering the full 360 degrees of rotation — thus perhaps allowing for a more efficient activation representation than even that in a brain.
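A simple model of such an orientation tuning curve can be sketched as follows; the Gaussian shape and the 20-degree tuning width here are illustrative assumptions of mine, not measured values:

```python
import numpy as np

def tuning_curve(theta, preferred, width=20.0):
    """Firing rate of a model orientation-selective neuron (Gaussian tuning).
    Angles in degrees; line orientation is periodic with period 180 degrees."""
    # smallest circular difference between stimulus and preferred orientation
    d = (theta - preferred + 90) % 180 - 90
    return np.exp(-0.5 * (d / width) ** 2)

# A bank of neurons with preferred orientations tiling the full range,
# loosely analogous to parallel feature detectors in a conv layer.
preferred_angles = np.arange(0, 180, 30)

stimulus = 45.0
rates = tuning_curve(stimulus, preferred_angles)
print(dict(zip(preferred_angles.tolist(), np.round(rates, 3).tolist())))
# The neurons tuned near 45 degrees respond strongly, ramping down as the
# mismatch grows; the neuron tuned 90 degrees away stays nearly silent.
```

The bank of overlapping curves means any orientation activates a small population of neurons, and the pattern across that population encodes the angle.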
In modern experiments to evaluate visual perception, a certainly more humane variant of these neuron-firing measurements can be performed using medical imaging devices such as PET or fMRI, which detect the spatial regions and intensity of firing neurons based on changes in blood flow and oxygenation near active regions of the brain. Through such experiments researchers have found that in addition to the striate cortex firing in evaluation of low-level features of an image, other regions of the brain may be triggered by more sophisticated features of a viewing, such as different regions that specialize in recognizing faces, types of objects, and so on. The presence of this layering of evaluation can be considered analogous to a convolutional neural network’s layers of convolution (although there are certainly material differences, an important one being that the computer evaluates via a feedforward network whereas a brain has more collective (albeit sparse in many dimensions) interconnectedness between neurons).
Part 3 — Illusions
Having drawn some parallels between the machine learning architecture of convolutional neural networks and a brain’s visual cortex, let’s now try extending the analogy by comparing machine learning adversarial examples to their human equivalent: optical illusions. First, to illustrate the type of obfuscations that fool a brain’s interpretations, let’s try zooming in on the blue lines at odds that opened this essay.
A close inspection reveals that the illusion of angular blue lines appears to be derived from the smaller embedded shapes, with some slight complexity of arrangement, interconnection, and abutted smaller diagonal checkered patterns. The resulting illusion of angular distortion turns out to be quite strong and persistent even at different magnifications, apparently assaulting our interpretation through multiple vectors of distraction. I’m left to wonder whether breaking the illusion down into each of its constituent element categories would leave a viewer with a different type of illusory interpretation for each. It’s certainly possible to fool us with much simpler elements than can be found in this complex illusion.
Each of these examples of illusion vectors use different channels to distort the brain’s representation of objective reality, and just like an adversarial example derived from computer vision may fool an algorithmic classifier, after exposure to these channels a brain may be left with a representation of circumstances that jumps to conclusions not yet supported by reality.
The ability for a brain to jump to conclusions or assumptions, to color the perception of elements based on context of surroundings, is a useful adapted property of consciousness due to the limitations of any model or representation of our circumstances. Just like the inhabitants in Plato’s cave, no agent (human or machine) will ever be able to fully model our surroundings without some scale of shortcuts in representation or coarse grainings — the map is not the territory. Even between two people sitting side by side taking in the same sunset view our perceptions may be colored by our experiences. We may have read different books that taught us different things about such a scene. We may have different beliefs or doubts that are remnants of our own personal brushes with religion.
This propensity for a consciousness to derive meaning and inferences from an incomplete representation is not even limited to our visual experience. Here is an example of just how useful these shortcuts can be under circumstances of inaccurate representation. Csneodir taht one can raed aritabry smcbaelrd txet — based simply on the first and last letters matching the correct spelling. Some authors or poets may extend this effect beyond the granularity of individual words to whole passages or even book-length works — if you ever want a wild ride, open a copy of Finnegans Wake to a random page and see what hilarity awaits you.
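For the curious, generating such scrambled-but-readable text takes only a few lines of code: this sketch shuffles each word’s interior letters while pinning the first and last in place.

```python
import random

def scramble(word, rng):
    """Shuffle a word's interior letters, keeping first and last in place."""
    if len(word) <= 3:
        return word  # nothing interior to shuffle
    interior = list(word[1:-1])
    rng.shuffle(interior)
    return word[0] + "".join(interior) + word[-1]

rng = random.Random(42)
sentence = "consider that one can read arbitrary scrambled text"
print(" ".join(scramble(w, rng) for w in sentence.split()))
```

Every output word is an anagram of the original with its endpoints fixed, which is apparently enough of an anchor for the reader's pattern-matching to fill in the rest.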
When faced with such increasingly nonsensical representations our brains will work to infer intent, potentially even assigning meanings to randomness (what I’ve previously described as the gypsy fallacy after those who would try to read the tea leaves or the like — a type of Rorschach test that may even reveal hidden proclivities or desires). On a first meeting with someone we are still just getting to know, we may draw conclusions about the other that may or may not pan out over time. We all see different things when gazing into a cloudy sky.
These vectors of inference are I believe the very things that make us susceptible to the illusions of optics or otherwise. Some of these channels may be universal to a population, others may be derived from our unique experiences or upbringing. I would offer that the latter are the very channels from which we draw our uniqueness, our creativity, or perhaps even our ability to cope in novel circumstances of incomplete information or trauma. Consider what happens when life throws curve balls that take us out of our expected paths or routines. My parents have both had recent run-ins with doctors and the medical establishment, placing the whole family in conditions well outside our ranges of experience. I believe the best families are antifragile — in circumstances of uncertainty they pull closer together.
Consider it a joy, my brethren, when you encounter various trials, knowing that the testing of your faith produces endurance. And let endurance have its perfect result, so that you may be perfect and complete, lacking in nothing.
The conditions of adversary are learning moments, we learn from our mistakes of interpretation, we learn from our errors in decisions or omission, we even learn from those things that befall us well outside of our control. Through these challenges and obstacles we have the opportunity to grow, to learn more about ourselves and the world around us, to fill in the details of a map to better navigate the territories to come or possibly even to help others facing similar struggles.
It is perhaps a silly comparison, but this navigation of our respective territories has an analogous architecture in the machine learning realm of reinforcement learning, in which an algorithmic agent forms a navigation policy for some environment or system derived from experimentation with actions weighed against a resulting reward function. Such a reward function may provide continuous feedback, such as the points tally in an Atari game, or the expected impact on some far-distant binary reward state, such as the outcome of a game of Go. The selection of such a reward metric turns out to have its own challenges, such as the implications of Goodhart’s Law. Equally perilous is the assumption that a territory itself may be static or subject to only linear transformations. Consider that in modern applications of reinforcement learning to video games, adversarial examples can be generated in which altering merely a single pixel of the territory, one that might otherwise seem out of place, transforms the whole trajectory of a navigation.
This single pixel transformation demonstrates that to alter a navigation we don’t even need to alter the policy derived from training, only the territory upon which it is applied. There are certainly security implications of this vulnerability. As more automation encroaches into our society the ability to subvert decisions derived from machine learning — whether reinforcement models, image classifiers, or other modalities — will leave an agent with the ability to exploit static models. Consider that the generation of these adversarial examples doesn’t even require access to the internals of a model, merely access to a sufficient set of inputs and resulting derived outputs. It is even possible using this same method to derive adversarial examples across multiple categories of classifiers simultaneously, so even an ensemble may be fooled.
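As a toy demonstration that input/output access alone can suffice, the sketch below estimates a hidden linear model’s input gradient purely from queries (via finite differences) and then applies the same sign-based perturbation. Real black-box attacks on deep classifiers are more involved, but the principle is similar:

```python
import numpy as np

rng = np.random.default_rng(2)
w_secret = rng.normal(size=10)

def black_box(x):
    """The only access an attacker has: inputs in, a score out. No internals."""
    return float(w_secret @ x)

def estimate_gradient(f, x, h=1e-4):
    """Finite-difference gradient estimate using only input/output queries."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = rng.normal(size=10)
g = estimate_gradient(black_box, x)
x_adv = x - 0.1 * np.sign(g)   # same sign-based step, no internals needed
print(black_box(x), black_box(x_adv))  # the score drops
```

Two queries per input dimension were enough to recover the hidden model's sensitivities, which is exactly why withholding a model's internals is not, by itself, a defense.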
The fool doth think he is wise, but the wise man knows himself to be a fool.
Following the advice of Shakespeare could be our best bet. Every model will have exploits, so we need to maintain humility about what these algorithms are capable of. It may help to study the weaknesses of our inferences: for every category of illusion we identify, we can at least find a way to mitigate that particular flavor. Thus it seems wise to keep searching for vulnerabilities — after all, life isn’t a state, it is a process, and in the end we are each fools in our own way.
For further readings please check out my Table of Contents, Book Recommendations, and Music Recommendations.
Books that were referenced here or otherwise inspired this post:
Finnegans Wake — James Joyce
Deep Learning — Ian Goodfellow, Yoshua Bengio, and Aaron Courville
Sensation and Perception — Bruce Goldstein
(As an Amazon Associate I earn from qualifying purchases.)
Albums that were referenced here or otherwise inspired this post:
Band on the Run — Wings
Ringo — Ringo Starr
Imagine — John Lennon
All Things Must Pass — George Harrison
Hi, I’m an amateur blogger writing for fun. If you enjoyed or got some value from this post feel free to like, comment, or share. I can also be reached on linkedin for professional inquiries or twitter for personal.