Words jump-start our visual system

Bastien Boutonnet
Superhero Neuroscience
10 min read · Jun 24, 2015


In Short

Can language turn you into a better see-er?

In my recent paper with Gary Lupyan (UW-Madison), I show that language can alter the earliest stages of visual perception. In a simple cue-picture verification task, participants first heard either a word (e.g., “dog”) or a sound (e.g., a dog barking) and then saw a picture that matched the cue 50% of the time. They were faster and more accurate when the cue was a word. We call this “the label advantage”. By recording our participants’ brain activity with EEG, we were able to identify the source of this label advantage: words literally jump-start your visual system by preparing it to expect certain visual properties. When cued by a word, the visual system was able to tell dogs from non-dogs within 100 ms. This was not the case when people were cued by a sound. In that situation their visual system was just as slow, and just as hard at work, as when it tries to process something it had not expected.
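To make the design concrete, here is a minimal sketch of the trial structure in Python. The category names, trial counts, and helper function are illustrative stand-ins of my choosing, not the actual materials or code from the paper:

```python
import random

# Illustrative cue-picture verification trial list (not the actual stimuli).
# Each trial: an auditory cue (a spoken label or a natural sound),
# then a picture that matches the cued category on 50% of trials.
categories = ["dog", "cow", "motorcycle", "frog"]  # hypothetical categories

def make_trials(n_per_cell=10):
    trials = []
    for category in categories:
        for cue_type in ("label", "sound"):   # word vs. natural sound
            for match in (True, False):       # 50% match rate
                for _ in range(n_per_cell):
                    picture = category if match else random.choice(
                        [c for c in categories if c != category])
                    trials.append({"cue_type": cue_type,
                                   "cue": category,
                                   "picture": picture,
                                   "match": match})
    random.shuffle(trials)
    return trials

trials = make_trials()
print(len(trials), trials[0])
```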


When it comes to detecting objects in our environment, words give our visual system a head start in the race. We believe these findings revolutionise our understanding of human cognition, especially for aspects of cognition thought to depend on the representation of categories, such as inference, rule following, formal reasoning, and learning.

In Longer

Our brains are like extremely busy highways. Millions and millions of passengers travel through them with one aim: getting from A to B in the quickest and most economical way possible. Just like on a busy highway, should anything go wrong you get a major crash; but should the route be optimised, you get a flow of traffic more fluid and efficient than you’ve ever dreamed of.

Start paying Attention!

Visual perception alone produces an awful lot of traffic in our brains, so how do they cope? Well, the human brain has more than one trick up its sleeve. The most obvious one is attention: the process by which our brain avoids devoting full computational energy to all of the incoming sensory input. As a result, the brain can perform a task even in the presence of distracting or irrelevant input, without constantly being side-tracked from one thing to the other.

This has three consequences: (a) you can direct your attention (e.g., choose to register the colour of a word rather than its meaning, something you need to do to perform well on a Stroop task); (b) attention can be disrupted (as in the famous “cocktail party effect”); and (c) you can literally become blind to unattended stimuli (as in the video below).

Selective attention test: count the number of passes the white team makes.

Attention is perhaps one of the most studied processes, and yet psychologists still do not know much about it. Perhaps, at least some suggest, it is because we are looking at the wrong thing. We think that the effects I listed above are attentional, but maybe they are not. Or, rather, maybe attention itself is part of a different process: a bigger, more fundamental one.

It’s all about prediction!

An alternative way to understand how our brain works is to imagine that it is a very simple prediction device, one that constantly compares incoming sensory information against everything it already knows about that information, with a single goal: reducing the amount of computation it has to perform.

Our brains predict the future.

Our brains don’t just react to the environment but constantly predict it. Essentially our brain has a model, a sort of tape of what is most likely to happen, and it plays that tape along. When the tape doesn’t conform with the sensory input coming from the environment, that’s fine, it updates it by altering the tape and everything else that comes forth. Each time our brain has to update the tape it learns. It’s as simple as that.

Flexible and Über-connected

What this conception buys us is that, unlike what we thought only a decade ago, our brains are extremely well connected, and systems like vision, which we thought were so basic, are actually much more malleable and open to influence from other, more “elaborate” regions. This is what makes us so efficient at making sense of the world around us, and yet so prone to being fooled by even the simplest optical illusion. Sensory information is never integrated as it is, but rather as what it probably is, based on what we know about the world, as in Adelson’s famous checkerboard illusion.

Here’s a little poll for you

Do A and B appear different to you?

Now play the video

EXPLANATION:
What is happening in this illusion is that we know (a) that checkerboards typically alternate between light and dark squares, and (b) that surfaces in shadow always appear darker than they really are. Knowing this, our brain generates predictions about the colours present in the sensory input. These predictions carry so much authority that they override the sensory evidence, rather as if we turned the brightness knob for square B all the way up. As the video shows, if you take enough of the image away and disrupt the checkerboard pattern (as I have crudely done in this animation), the colour of B is recovered and evaluated for what it is: a square of exactly the same colour as A.

Prior expectations evoke stimulus-specific templates in visual cortex

The very same phenomenon behind the visual illusion I just described happens every second of our lives. What we see is largely shaped by what our brain predicts (the “mental tape”). If you want to know more about how researchers think this prediction mechanism works, head over to the review I wrote of a recent paper from the group of Floris de Lange at the Donders Institute.

The Power of Words

humans speak!

Now that we understand a crucial aspect of how our brain handles visual perception, we can start playing with how different pieces of information, different types of cues, affect how our brain perceives the world. This is exactly what I did in collaboration with Gary Lupyan. But first, let’s point out something so obvious that we often forget it: humans speak! Yep! It is estimated that we utter about 16,000 words a day, which means that in 79 days you will have uttered enough words to fill the entire series of books In Search of Lost Time by Marcel Proust (about 1,264,000 words: 16,000 × 79).

Not only do we use language to speak every day, but we can quite literally shape each other’s behaviour using, simply, words. What’s more, we learn a lot of what we know about the world we live in through language. Of course dogs are four-legged animals that bark, wag their tails when they’re happy, and are very friendly and helpful to humans, but we also know that dogs are dogs because they are called dogs.

So: 1. Can language affect very basic things like vision? 2. If so, how does language affect vision?

Why do we care? Why is it important?

Language is by far one of the most prominent human skills, yet it is often thought to have a very limited effect on our brain. While no one doubts that language influences how we make decisions or how we reason, these effects are often seen as “high-level”. That means that language affects other processes at a very late stage in the brain, but is held to be incapable of affecting “low-level” functions like vision. This debate (see Klemfuss et al., 2012) needs to be resolved!

Why? Why does that matter? Why do we care? Because the possibility that one function in our brain may be capable of affecting another tremendously changes our picture of how intricate even the most basic processes (like vision) really are. Furthermore, understanding the depth of language’s reach may help us understand why we are, as humans, such a successful species, and why, under certain circumstances, we completely fail.

Words vs. sounds as cues to our environment

Let’s imagine your task is to detect objects in your environment, say dogs. Which cue do you think will be more effective?

1. Hearing the word “dog”?

2. Hearing the sound of an actual dog barking?


Well, it turns out that in our study, as well as in a previous one using a similar design (Lupyan & Thompson-Schill, 2012), when it comes to detecting objects in our environment, being cued by a word makes you more efficient. A word, this completely abstract thing compared with the sound of an actual dog barking, turns people into better dog (or anything else) detectors.

How do words turn you into a better object detector?

There are actually two mutually exclusive hypotheses one can entertain:

  1. Words are, simply, better at invoking knowledge about objects than sounds are. So it’s no wonder that after hearing “dog” you’d do better: your brain just knows better what you mean. Consequence: the effect of words has nothing to do with vision; it is not a perceptual effect.
  2. Words are no better at invoking knowledge about objects than sounds are, but words are capable of activating features that are more typical of the cued object, and these activations are partly visual, thereby facilitating visual perception at its earliest stages. Consequence: language can directly affect visual processing; it is a perceptual effect.

Our results strongly support Hypothesis 2. Indeed, in the trials where participants were cued by a word, their visual system was able to distinguish within 100 ms whether the picture they were seeing matched the cue they had just heard. When they were cued with a word and the picture was a match, their visual system was essentially able to complete processing earlier than when it was not a match. This difference did not appear when participants were cued by a sound (right part of the graph): there, processing the image took equally long whether the picture was a match or a mismatch.

The bars represent the point in time (ms) at which the activity reached its peak. In electrophysiology, a peak in electrical activity is taken to indicate the completion of a process. We can see that when participants were cued by a label (left half of the graph), the activity peaked earlier when the picture was a match than when it wasn’t. This, however, was not the case when participants were cued by a sound (right half of the graph).
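For readers curious about what such a measurement looks like in practice, here is a minimal numpy sketch of a peak-latency comparison. The waveforms, sampling rate, and time window are simulated stand-ins, not our data or our actual analysis code:

```python
import numpy as np

# Made-up evoked waveforms sampled every 2 ms over 0-300 ms (not real data).
times = np.arange(0, 0.3, 0.002)  # seconds

def peak_latency_ms(evoked, times, tmin=0.05, tmax=0.15):
    """Latency of the maximum within an early window (e.g., the P1 range)."""
    window = (times >= tmin) & (times <= tmax)
    return times[window][np.argmax(evoked[window])] * 1000

# Simulated Gaussian "P1" peaks: earlier for match than for mismatch trials.
match    = np.exp(-((times - 0.095) / 0.02) ** 2)
mismatch = np.exp(-((times - 0.115) / 0.02) ** 2)

print(peak_latency_ms(match, times))     # ~95 ms
print(peak_latency_ms(mismatch, times))  # ~115 ms
```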

Brain activity predicts behaviour

The reason we think our results strongly support the idea that the effect of language happens very early is that the brain measurements we collected on each trial (within the first 100 ms) reliably predict how fast and how well a participant is going to perform.
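Schematically, a brain-behaviour link of this kind is tested by regressing the behavioural measure on the single-trial brain measure. The sketch below uses simulated numbers and made-up effect sizes; it is not our dataset or analysis pipeline:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated single trials: early (first ~100 ms) visual amplitude per trial,
# and the reaction time on that same trial (values are invented).
n_trials = 200
p1_amplitude = rng.normal(5.0, 1.0, n_trials)                          # microvolts
reaction_time = 600 - 20 * p1_amplitude + rng.normal(0, 30, n_trials)  # ms

# If the early brain response carries the effect, amplitude should
# reliably predict behaviour trial by trial.
result = stats.linregress(p1_amplitude, reaction_time)
print(f"slope={result.slope:.1f} ms/unit, r={result.rvalue:.2f}, p={result.pvalue:.3g}")
```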

Our brains predict the future!

It all comes back to our brain being a prediction machine. When we hear a word (or a sound), a series of predictions is generated, and part of these predictions are visual. That is, specific visual features of dogs, such as their basic shape, orientation, and contrast, are activated in the brain’s visual areas (just as expecting a series of lines oriented at 45° made the 45° neurons fire even in the absence of 45° lines on the screen, in the paper by Peter Kok and colleagues I referred to earlier and reviewed in a previous post). When sensory information that matches the prediction reaches our visual cortex, it is more easily processed, whereas sensory information that does not match the predictions requires a bit more work, just as if you were trying to push a triangle through the circular hole of one of those baby shape-sorter toys.

The difference between labels and sounds lies in the content of what they activate rather than in their capacity to activate information. Labels are categorical: the word “dog” refers to all dogs. Sounds are necessarily more particular: the sound of a dog is always the sound of a particular dog barking. So when it comes to cueing a category, words are simply better! And when it comes to seeing the world around us, merely knowing what objects are called may change how you see them!

References

Boutonnet, B., & Lupyan, G. (2015). Words jump-start vision: A label advantage in object recognition. Journal of Neuroscience, 35(25), 9329–9335. DOI: 10.1523/JNEUROSCI.5111-14.2015

Klemfuss, N., Prinzmetal, W., & Ivry, R. B. (2012). How does language change perception: A cautionary note. Frontiers in Psychology, 3. DOI: 10.3389/fpsyg.2012.00078

Kok, P., Failing, M. F., & de Lange, F. P. (2014). Prior expectations evoke stimulus templates in the primary visual cortex. Journal of Cognitive Neuroscience, 26(7), 1546–1554. DOI: 10.1162/jocn_a_00562

Lupyan, G., & Thompson-Schill, S. L. (2012). The evocative power of words: Activation of concepts by verbal and nonverbal means. Journal of Experimental Psychology: General, 141(1), 170–186. DOI: 10.1037/a0024904

Footnote: The views expressed in this post are not necessarily all shared by Gary Lupyan; only the points made in the research report published in The Journal of Neuroscience are shared by both Gary and me.

Originally published at www.bastienboutonnet.com on June 24, 2015.
