Brian on the Brain: how humans process sound


When there is life, there is sound: where there is an atmosphere, there are molecules that vibrate in response to movement. These vibrations give us the Oxford Dictionary definition of ‘sound’ — “continuous rapid movements that travel through air or water and can be heard when they reach a person’s or an animal’s ear”. However, this doesn’t give us the full picture.

Neuroscientist, author and music producer Daniel Levitin answered the philosophical question “if a tree falls in the forest and no one hears it, does it make a sound?” with a resounding “no”. ‘Sound’ is not made by things; rather, it is something we perceive in the brain.

Professor of psychology Mark Grimshaw agrees, saying “sound is perception”. It isn’t located in any one place, as by definition it is always moving. After entering the ear as vibrations, sound waves are turned into electrical impulses fired by neurons into the auditory cortex, where they are processed into component parts: pitch, rhythm, timbre and so on. The information is then sent to other parts of the brain for further processing, so we can decide on our reaction — “do I run or stay?” — and how we feel emotionally — “do I like it or not?”. Sound does this by stimulating many parts of the brain in both hemispheres.


Sound waves picked up by the pinnae set the ear drum vibrating, and these vibrations are amplified and transferred to the basilar membrane in the cochlea via the tiny bones of the middle ear. When a sound wave reaches the inner ear, it displaces tiny hair cells that are selective for particular frequencies. Similar to a MIDI keyboard, the basilar membrane carries a map of different pitches called the tonotopic map. The hair cells closest to the ear canal detect the higher frequencies, and they are also the ones most likely to get damaged as we get older. This is why we lose hearing of higher-frequency sounds as we age.

The brain mirrors the tonotopic map in the auditory cortex. Neurons fire at the same frequency as the sound, so a 440 Hz tone produces 440 Hz of electrical activity. If you converted that neural firing into sound and amplified it, it would resemble the sound we hear. (See the research conducted by Nina Kraus in her ‘Brainvolts’ lab on what our brain waves sound like.)
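As a rough illustration of that frequency-following idea (a sketch, not the method used in the Kraus lab), the snippet below generates a 440 Hz “stimulus” and recovers its frequency by counting zero-crossings, the way a phase-locked train of neural impulses would track it. The sample rate is an arbitrary choice:

```python
import math

SAMPLE_RATE = 8000  # samples per second (arbitrary choice)

def tone(freq_hz, duration_s):
    """Generate a sine tone, standing in for both the stimulus
    and the phase-locked neural response it evokes."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n)]

def dominant_freq(signal):
    """Estimate frequency by counting upward zero-crossings per second."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if a < 0 <= b)
    return crossings / (len(signal) / SAMPLE_RATE)

stimulus = tone(440, 1.0)
print(dominant_freq(stimulus))  # ~440: the "neural" firing rate mirrors the stimulus
```

The estimate comes out within a cycle or two of 440 Hz; the point is simply that the rate of the periodic signal carries the pitch.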

The brain is separated into different parts that specialise in different functions.

Frontal lobe — planning, self-control, perceptual organisation/Gestalt grouping, working memory (very short term)

Temporal lobe — auditory cortex, hearing and memory

Posterior/Cerebellum — motor movements, spatial skill, emotions and planning movements (oldest part of the brain)

Occipital lobe — vision, reading music and watching performances

Limbic System — motivation, emotion (amygdala/nucleus accumbens), learning, and memory (hippocampus).


The 20th-century French-American composer and grandfather of electronic and experimental music, Edgard Varèse, said “music is organised sound”. This is a key part of music’s definition — just like sound, music is about perception too: “One man’s music is another man’s noise”.

We are all expert musical listeners. We may no longer all participate in the performance of music as our ancestors did (the barrier between expert performer and listener is a relatively modern development in our wider history), but as listeners most of us have earned our 10,000 hours badge: the idea that we can become expert in almost anything once we have dedicated that amount of time to it. We all easily surpass this number of hours of listening in our lifetime.

Perhaps because of this, we all display ‘declarative knowledge’ when it comes to music: the ability to talk about music despite a lack of formal training.

Music is Universal

Across all cultures, and throughout history and pre-history, we create the same types of song: dance, healing, love and lullaby.

Music, and our love of it, is universal, but throughout history there has also been a huge and often misinformed fear of music. For example, the Catholic Church moved against polyphony in the 14th century on the basis that it “intoxicates the ear without satisfying it” and creates “a sensuous and indecent atmosphere” during the liturgy, thereby clouding the unity of God. It took until the 16th century for polyphony to be fully welcomed back!

This fear of music re-emerges at various points throughout history, and it usually reflects the ruling powers’ innate understanding of music’s potential to affect the public. Whether it’s the church banning the augmented 4th/tritone and calling it the ‘devil’s tone’, suspicion of jazz in the 20s/30s, the gramophone in the 40s, heavy metal and Satanism in the 80s, rave culture and drugs in the 90s, or binaural beats in the 2000s: all are examples of musical genres and formats that struck fear into the ruling powers.


Music might be universal, but why does it exist? Is there an evolutionary basis? Is music adaptive?

This topic is hotly debated. On one hand, Steven Pinker argued music is merely a by-product of language, calling it “auditory cheesecake”: although we enjoy the sweetness of cheesecake, it serves no evolutionary purpose. He goes on to say, “Music is useless. It shows no sign of design for attaining a goal such as long life, grandchildren, or accurate perception or prediction of the world. Music could vanish and our lives wouldn’t be changed.” Dan Sperber called it “an evolutionary parasite”, and John Barrow said it plays no role in the survival of the species. But does it?

On the other hand, there is a weight of evidence suggesting these views are wrong and short-sighted, including:

  1. Darwin’s Sexual Selection — where music is a key element of acquiring a mate.


Throughout history, music has often been described as ‘sound painting’ (see Scriabin, Ravel, Stevie Wonder, Paul Simon, Lindsey Buckingham and others). Notes and melodies give the shape and form, and timbre the shading and colour. The temporal dimension (rhythm and meter) is the key difference from painting, as music unfolds and changes over time.

Musical activity involves almost every aspect of the brain. There is no single music centre in the brain — the theory of ‘music being a right-brain activity’ and ‘language being left-brain’ has been disproved. Instead, it is better to think of the brain as a ‘parallel processing machine’, handling different elements of a song simultaneously. For example, it doesn’t have to wait to identify the pitch before making judgements based on all the other information available.

So what key information does our auditory cortex extract when listening to a song?

Frequency: Pitch and Intervals

Pitch information is one of the most significant and well-understood aspects of the musical brain, and it includes the related concepts of intervals, melody, and harmony. The brain processes pitch both locally and globally: local processing refers to the intervals between pitches, while global processing refers to the overall contour of the melody.

Emotional information: Consonance versus Dissonance

Could all music be based on maths? Pythagoras and the ancient Greeks thought so. Simple frequency ratios — 2:1 = octave, 3:2 = perfect fifth — represent consonant intervals, which are “uplifting, happy, healing, peaceful — a ‘sonic cuddle’, anti-depressant, known as the most harmonious interval.” e.g. Twinkle Twinkle Little Star

On the other hand, dissonant intervals such as the tritone (45:32), which as we heard before is known as the “devil’s tone”, can be considered “harsh, alien, restless, cold, dangerously exciting, thrilling, dark.”
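These ratios are easy to play with numerically. Here is a minimal sketch using the standard just-intonation ratios (the table of intervals is general music theory, not something specific to this article):

```python
from fractions import Fraction

# Just-intonation frequency ratios for some common intervals
INTERVALS = {
    "unison": Fraction(1, 1),
    "octave": Fraction(2, 1),        # most consonant
    "perfect fifth": Fraction(3, 2),
    "major third": Fraction(5, 4),
    "tritone": Fraction(45, 32),     # the "devil's tone"
}

def interval_freq(root_hz, interval):
    """Frequency of the note that lies `interval` above a root note."""
    return float(root_hz * INTERVALS[interval])

for name in INTERVALS:
    print(f"{name}: {interval_freq(440, name):.2f} Hz above A440")
```

The simpler the ratio, the more the two waveforms line up — an octave above A440 lands at exactly 880 Hz, a fifth at 660 Hz, while the tritone falls at the awkward 618.75 Hz.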

Music is made richer by harmonics: when instruments play, each note contains multiple frequencies at once, creating overtones. These overtones are integer multiples of the fundamental frequency, and this simple relationship creates synchronous neural firing in the brain.
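That integer-multiple relationship can be sketched directly: each note carries overtones at whole-number multiples of its fundamental, and a consonant pair such as a perfect fifth (3:2) shares several of them — one simplified account of why such pairs blend. The helper functions here are hypothetical:

```python
def overtones(fundamental_hz, count=8):
    """Harmonic series: integer multiples of the fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]

def shared_overtones(f1, f2, count=8):
    """Overtone frequencies the two notes have in common."""
    return sorted(set(overtones(f1, count)) & set(overtones(f2, count)))

# A perfect fifth (200 Hz and 300 Hz, a 3:2 ratio) shares overtones early on...
print(shared_overtones(200, 300))  # [600, 1200]
# ...while an unrelated pair shares none within the first 8 harmonics.
print(shared_overtones(200, 281))  # []
```

Shared overtones mean shared frequencies of neural firing, which fits the “synchronous firing” picture above.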

After being processed in the auditory cortex, music signals work through a network of regions involved in emotion: arousal, pleasure and fear all take place in the limbic system. If the music is pleasant and enjoyable, perhaps because it is consonant, it leads to the production of dopamine, activating the nucleus accumbens. Playing sad music when you are feeling down can release prolactin and oxytocin, soothing hormones similar to those released when a mother comforts her baby.

Timing: Rhythm and Timbre

Time information, which includes rhythm, tempo and meter, timbre, meaning, and emotion, is less well understood, although even animals such as elephant seals use timbre and rhythm to communicate with one another. Musical timbre is one of the most critical components of music, yet it remains one of the most mysterious of all human perceptual attributes. In a 2012 study, Patil et al. examined the neural underpinnings of musical timbre in order to understand the underlying processes of timbre recognition, observing how timbre is represented in the mammalian primary auditory cortex to predict human sound-source recognition.

Motor Systems: Rhythmical Entrainment

Taken together, both behavioural and neuroimaging results demonstrate that brain activation in motor areas and other brain circuits induced by music can in turn have a significant impact on motor or cognitive performance. Neurons in the brain fire in synchrony with the tempo of the music you listen to.

Music appears endowed with the remarkable power to entrain neural activity in several brain circuits — but to different degrees depending on its rhythmic structure as well as its emotional content. Such entrainment may explain why music exposure or music training can influence various tasks such as running, even when these are unrelated to music and when music is not directly relevant to the task.
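One toy way to picture entrainment (purely illustrative, not a model from the studies referenced above) is an internal oscillator whose period is gradually nudged toward the period of an external beat until the two lock:

```python
def entrain(internal_period, beat_period, steps=50, rate=0.2):
    """Nudge an internal oscillator's period toward an external
    beat period: a toy stand-in for neural entrainment."""
    period = internal_period
    for _ in range(steps):
        period += rate * (beat_period - period)
    return period

# An internal "tempo" of 90 BPM (period 0.667 s) drifts toward
# a 120 BPM beat (period 0.5 s) and locks onto it.
print(round(entrain(60 / 90, 60 / 120), 3))  # → 0.5
```

The gap shrinks geometrically on every step, which is roughly the intuition behind “neurons fire in synchrony with the tempo”: the internal rhythm converges on the external one.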

An example of this is the work we do with Open Ear — adjusting playlists throughout the day based on energy or tempo to influence customer behaviour. Multiple studies have shown that music has a behavioural effect on the listener: music can increase or decrease dwell time, encourage you to buy a certain product, and so on.

COGNITION - How do we process sound information?

Music and the subconscious mind

The work we do with Open Ear (in the industry formerly known as ‘background music’) is an obvious example of how music works on your subconscious. In bars, restaurants and cafes, our conscious brain is taken up by conversation and thirst, not the music. However, our brains use only about 5–10% of their capacity for this conscious thought processing; much more is going on within our subconscious in parallel.

“Passive Frame Theory” is the idea that most of our brain’s work is conducted in different lobes and regions at the unconscious level, completely without our knowledge. In experiments where subjects were shown subliminal images for milliseconds at a time, the image content was picked up by the visual system and triggered an emotional response, but was not picked up by the cognitive system in the frontal lobe: so if we see a scary image we feel fear but don’t know why.

We are, like it or not, biological machines, and the simpler we keep things, the less chance there is for a mistake or a breakdown. The mind, as our most complex part, needs the streamlining more than anything else.

Auditory Scene Analysis: Cutting through the noise

Closely related to the Cocktail Party Effect, the job of Auditory Scene Analysis (a term coined by psychologist Albert Bregman) is to group incoming sensory information into an accurate mental representation of the individual sounds. When the auditory system groups sounds into a perceived sequence, distinct from other co-occurring sequences, each perceived sequence forms an “auditory stream”. We can then choose which stream is most pertinent to us and pay attention to it.

Top Down versus Bottom Up

Music cognition involves a number of complex neural computations interacting with memory. The ear is open and continually receiving sound from multiple noise/sound sources at once. Different brain circuits interpret these all at once in parallel: bottom-up processing via the cochlea, brain stem, auditory cortex, and cerebellum. At the same time, the higher levels of the cortex are predicting what comes next. When listening to music, the brain processes information such as:

  1. What has come before it in the piece of music — anticipation

Our auditory system exploits the harmonic series by grouping sounds together. Helmholtz called this ‘unconscious inference’: our brains decide that it is unlikely that every harmonic comes from a different source and automatically group them together using the ‘likelihood principle’. This is why we hear a ‘trumpet’ rather than its individual overtones. We use processes such as ‘temporal positioning’ to make decisions about grouping: two instruments starting to play at slightly different times are separated into groups on the basis of timing. We also group using spatial location, timbre, loudness, and frequency/pitch.
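Helmholtz’s likelihood principle can be caricatured in a few lines: given a jumble of detected partials, assume as few sources as possible by assigning each partial to the lowest candidate fundamental it is a near-integer multiple of. This is a deliberately naive sketch, not a real auditory scene analysis algorithm:

```python
def group_partials(partials, tolerance=0.01):
    """Greedily assign each partial to the lowest fundamental whose
    (approximate) integer multiple it is; otherwise start a new source."""
    sources = {}  # fundamental -> list of partials assigned to it
    for p in sorted(partials):
        for f in sources:
            ratio = p / f
            if abs(ratio - round(ratio)) < tolerance:
                sources[f].append(p)
                break
        else:
            sources[p] = [p]  # no existing source fits: posit a new one
    return sources

# Interleaved partials of a 220 Hz "trumpet" and a 330 Hz "flute".
# Note: 660 Hz belongs to both series; the greedy rule gives it
# to the lower fundamental.
mix = [220, 330, 440, 660, 880, 990]
print(group_partials(mix))  # {220: [220, 440, 660, 880], 330: [330, 990]}
```

Six partials collapse into two inferred sources — the “unlikely that every harmonic is from a different source” bet, made explicit.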

Gestalt principles of grouping

Gestalt psychology explains how elements come together to form wholes: objects that are qualitatively different from the sum of their parts and cannot be understood in terms of their parts; for example, how we recognise a melody played at different pitches. An orchestra is usually interpreted as a single group. If several orchestras are playing at once (different stages at a festival, for example), you can focus on just one at a time, the same as with multiple conversations taking place in the same room.

Musical schemas — are another way to think about grouping processes. The brain creates rules about how objects sound, based on listening experience that begins before birth and builds up over our lifetime, which help frame our understanding. Within music listening, schemas are the system we use to determine the elements and interpretations of an aesthetic object, i.e. musical genres, artists and songs. If we have built a schema for a style of music, we can interpret it more easily and appreciate its compositional nuances more. The brain then uses a process called ‘gap-fill’, which enables it to interpret a song from minimal data based on the schemas held in its memory. This means that working memory and conscious cognitive processes are not overloaded.

However, this doesn’t always work seamlessly. Top-down processing, which needs to be as parsimonious and efficient as possible, can cause us to misperceive things. This is demonstrated visually by the Ponzo illusion: no matter how many times you tell your brain that both lines are the same length, it will continue to tell you that the top one is longer. The same happens with audio. When a recording of speech has words replaced by bursts of white noise and subjects are asked to fill in the missing words, they group the white noise separately from the speech content. Perception, therefore, is a process of inference via the analysis of possibilities, and the brain doesn’t always get it right! Music can be thought of as a type of ‘perceptual illusion’ in which our brain imposes structure and order on a sequence of sounds.


In Utero

Our ears are fully developed by the third trimester (around three months before birth). We listen first to the underwater rhythms of our mother’s heartbeat, before tuning into her speech rhythms and frequencies and then the sounds of the world around her. We start developing our musical preferences at that point: children prefer music they were exposed to in the womb.


Music listening, and more importantly music playing, from an early age helps coordinate and recruit neural structures in both left and right brain hemispheres. This supports the acquisition of motor skills and accounts for the larger cerebellum found in musicians compared with non-musicians. Music listening and music therapy can help overcome a broad range of psychological and physical problems.

At this early stage, infants are already demonstrating preference for consonance over dissonance.

Communication — infant-directed speech (‘parentese’ or baby talk) exaggerates the prosody of speech: pitch, contour and rhythm. By doing so, mothers create musical ‘prototypes’ for questions, declarations, warnings and so on.

All of us have the innate capacity to learn the linguistic and musical distinctions of our culture, but it is our experience of the music of that specific culture that shapes our neural pathways which in turn creates an internal set of rules in line with that musical tradition.

Young Children

Young children start to show a preference for the music of their culture by age 2. We first like simple songs, because young children cannot filter out unwanted sensory stimuli.


Around 14 we develop our taste: music becomes a real interest and can stay with us for the rest of our lives. This is cultural and social (bonds with friend groups, exploration of different cultures and people) and is fully formed around 18–20 years old. After that, people become less open to new experiences, and any new music we hear is assimilated into the framework of the music we listened to during our teenage years. Neural circuits become structured around the experiences we’ve had and the music we’ve been exposed to. Many of our future likes and dislikes are consequences of schemas formed through our childhood listening. Early exposure is often the most profound.

Is that it?

And then, by the age of 20, we stop. Or rather, it becomes much harder for adults to develop new circuitry as easily as we did when we were younger (similar to language learning). The neural circuits we create for tunes we love during childhood are hardwired into our brains for the rest of our lives. The songs we love in our teenage years stay with us, and are often inseparable from the memories we have of that time.

This is one of the reasons music can be so helpful for unlocking memories in people with dementia and Alzheimer’s: we use those hardwired musical connections to unlock parts of memory.

As music plays, the brain thinks ahead to the possibilities using the schemas we’ve learned. The expertise of a composer or producer lies in lulling us into a state of trust and security by rewarding us with completions of our expectations, while also rewarding us by challenging those expectations. This happens particularly during live performance, where we can watch a musician play a different ‘live version’, but it can even happen when we are listening passively (such as in a restaurant or bar setting).

This balance of complexity versus liking follows an inverted-U function describing arousal when listening. When music is too simple, we tend not to like it, as it doesn’t challenge us cognitively in any way. When it is too complex, we can’t find a schema for it, so it is not grounded in anything familiar.
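The inverted-U can be sketched as a toy function (the exact shape here is an arbitrary illustration in the spirit of the Wundt curve, not a fitted model):

```python
import math

def liking(complexity):
    """Toy inverted-U: liking rises with complexity, then falls.
    Peaks at complexity = 1 in these arbitrary units."""
    return complexity * math.exp(1 - complexity)

# Too simple, just right, too complex:
for c in (0.1, 1.0, 4.0):
    print(f"complexity {c}: liking {liking(c):.2f}")
```

Liking is low at both extremes and peaks in the middle, matching the idea that the sweet spot lies between boredom and unintelligibility.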

Each time we hear a musical pattern that is new, our brains try to make an association with any other sensory cues (visual, auditory etc.) that accompany it. These associations are incredibly powerful and can immediately convey many different meanings, both intended and unintended.


Award-winning music producer and DJ, founder of music strategy company Open Ear, music psychologist, sound designer and trainee sound therapist.
