The Neuroscience and Machine Learning perspectives of Jazz Improvisation…

Owen Chen
Feb 3, 2016

Hello Medium.

Jazz is a highly esoteric art form, and I would love to share my perspectives and experience on this journey through the lens of my two disciplines of academic study. This is my first official blog post, and I plan to keep publishing in the hope that what I pick up offers people some insight into these topics. I hope to continue reading and reflecting on my interests within the cognitive sciences, philosophy, technology, and music, and I in no way consider myself an expert in any of these fields. I'm just hoping to spread some love for these topics, so enjoy!

From my research paper in a music cognition and neuroscience course: The Neuroscience and Machine Learning perspectives of Jazz Improvisation


Music is a constantly evolving art form that has permeated cultures and traditions since the beginnings of human civilization. It is passed on through the generations by transcribing the masters of a prior generation and incorporating their language into the formation of a new genre. These stages of careful analysis, imitation, and generation of new material form a process that is formulaic and algorithmic in nature, particularly in forms of music with highly structured theoretical rules and compositional techniques, such as classical music. Improvisational jazz is one of the most challenging forms of spontaneous artistic creativity, and it is even more difficult for musicians coming from a classical background. Unlike the structured contexts of non-improvisational music, in which the performer follows sheet music with the aim of replicating the exact notes in a score, improvisational music involves very little algorithmic and rigid processing, relying more on the human brain’s capacity to tap into its unconscious mind.

Formal, rigorous scientific studies have offered insights into the neurological mechanisms at play in the brain during jazz improvisation via fMRI and sCIA analysis techniques. These findings, in which regions of the brain linked to working memory and decision making were shown to be deactivated, suggest that professional-level jazz musicians are not fully consciously aware of the music they produce, relying more on an intuitive subconscious process that is difficult to explain with a purely statistical or neurological model.

One example of a subconscious ability found in seasoned musicians is the intuitive ability to tell apart the playing of Charlie Parker, Coltrane, Brecker, or other jazz greats from a single note, based solely on the timbre of the sound. In a similar vein, master violinists are able to instantly differentiate between a Stradivarius violin and a Guarneri from a single note, whereas average listeners are unable to identify any difference. This is related to the concept of schema taxonomies, in which an individual’s underlying cognitive schemas for specific genres are refined through repeated exposure, yielding a more specific analytical framework for identifying these sounds or context cues, whereas the average listener will have a more abstract, generalized schematic framework when attempting to analyze a specific sound.

With the advent of computers in the information age, statistical methods of “data mining” and computational methods of “learning from data” have given rise to the fields of Machine Learning and Artificial Intelligence, which rely on algorithms that replicate mankind’s ability to reason through problems logically. Such techniques, which include Markov models, neural networks, clustering, and dimensionality reduction, can all be applied to learning and predicting musical data.

The schematic and formulaic nature of music, with the theoretical frameworks of harmony that provide rough schematic guidelines for a musician, makes it suitable for supervised learning systems that predict and generate music based on quality training datasets of valid musical phrasings. Given what neuroscience reveals about jazz improvisation, however, realistic computationally generated jazz improvisation is a much harder task. In this paper, I would like to explore computational methods of generating jazz improvisation through a holistic view of its theoretical, neurological, and computational aspects.

Jazz Harmony Background

I will begin this exploration of computationally generated jazz improvisation models with relevant research and understandings in jazz harmony and neuroscience, then proceed to explore specific computational methods of generating improvisation.

Subsets of music are established and constantly evolving. Through a lifetime of exposure to musical idioms in patterns, scales, lyrics, timbre, and instrumentation, we establish musical schemas, or cognitive frameworks that form for the purpose of accurately predicting and categorizing the musical stimuli we encounter in the future. These schemas are the reason we find familiar music pleasurable, as described in David Huron’s Theory of Musical Expectation. They are also the reason nostalgic childhood songs evoke strong emotions, and the reason different cultures develop different tastes in music. Various forms of cognitive expectation, which include schematic, veridical, and dynamic expectations and are based on different forms of memory, serve as our mental inference models, which learn over time from prior experience in order to predict musical stimuli. The successful expectation and prediction of familiar musical stimuli is what gives music its pleasurable feeling.

Each time we receive a new musical pattern, our brains attempt to contextualize any surrounding visual, auditory, or other sensory cues to create memory links for future occurrences of the same musical pattern. This is comparable to supervised learning algorithms in Machine Learning, in which a system is trained on a training data set to construct its predictive model, producing classifiers or regression functions that predict subsequent data. This generalization of prior experiences is very similar to how our brains conceptualize musical genres and rules. For largely cultural reasons, we associate musical modes with major qualities (major 3rds, natural 7ths) with happiness, and modes with minor qualities (minor 3rds, flat 7ths) with sadness. Subsets of these tonalities include the pentatonic scales, largely associated with blues, rock, and other Western music. Moving into dominant, diminished, and melodic minor harmony, these various intervallic relationships trigger additional feelings, images, and memories that aren’t necessarily happy or sad. Whole-tone scales, for example, conjure up the image of thought bubbles or dreaming states found in movies, the timbre of plucked violins will typically evoke feelings of sneakiness, and a Spanish Phrygian dominant scale will bring up associations with a gypsy-esque culture. All these context cues and established mental models are typically well understood by film composers, who use music to evoke certain feelings in a scene.

Within jazz harmony, dominant scale harmony and the derivatives of the melodic minor scale traditionally have a great deal of influence in creating tension in jazz chord progressions. A common chord progression in jazz, the ii-7 - V7 - Imaj7, is ubiquitous in jazz standards throughout the ages, as it’s a perfect representation of tension and resolution for the listener. Embellishments to the dominant V7 chord, such as a b9 or b13 interval in relation to the root note, create extra tension that leads the listener to expect the major tonic resolution chord to follow. This is similar to how the ITPRA Theory of Expectation, misattribution, and the prediction effect explain the pleasurable qualities of metric downbeats and drum fills. Both give the listener an auditory cue to prepare for and expect a particular outcome, and both reduce the uncertainty of making an accurate prediction. The positive valence generated from the accurate prediction, as described by Huron’s Theory of Expectation, is what causes the ii-V-I progression and drum fills to sound pleasing. These auditory cues are stored in established schematic models so that a listener can make these inferences in the future.
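
To make the ii-V-I concrete, here is a minimal Python sketch that spells out the three chords in a given key and then adds a b9 and b13 to the dominant. The names `NOTE_NAMES`, `CHORD_SHAPES`, `chord`, and `two_five_one` are my own for illustration, and the note spelling is simplified to flats only, so treat it as a rough sketch rather than a proper theory engine.

```python
# Pitch classes spelled with flats only (a simplification of real enharmonic spelling).
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]

# Chord qualities as semitone offsets above the chord root.
CHORD_SHAPES = {
    "m7": [0, 3, 7, 10],    # minor seventh
    "7": [0, 4, 7, 10],     # dominant seventh
    "maj7": [0, 4, 7, 11],  # major seventh
}


def chord(root_pc, quality, extensions=()):
    """Return note names for a chord built on pitch class root_pc."""
    offsets = CHORD_SHAPES[quality] + list(extensions)
    return [NOTE_NAMES[(root_pc + offset) % 12] for offset in offsets]


def two_five_one(key):
    """Return the ii-7, V7, and Imaj7 chords of a major key."""
    tonic = NOTE_NAMES.index(key)
    return [
        ("ii-7", chord(tonic + 2, "m7")),   # built a whole step above the tonic
        ("V7", chord(tonic + 7, "7")),      # built a perfect fifth above the tonic
        ("Imaj7", chord(tonic, "maj7")),
    ]


progression = two_five_one("C")

# The V7 of C with a b9 (13 semitones up) and a b13 (20 semitones up) added.
altered_dominant = chord(7, "7", extensions=(13, 20))
```

In C major, `progression` holds Dm7, G7, and Cmaj7, and `altered_dominant` adds Ab and Eb on top of the plain G7, which are the tension notes described above.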

Other idiomatic jazz qualities, such as turnarounds, bebop scales, and complex chord substitutions, are subsets of this jazz schema that a jazz musician’s ear and brain must develop over an extended period of time. The act of jazz improvisation is a result of these well-defined schemas being ingrained in a musician’s subconscious, from which they then draw during spontaneous improvisation. For example, a seasoned jazz accompanist will be able to instinctively recognize the chord tone qualities a soloist is playing and accompany with adapted chord harmonies on the fly, all without conscious thought. Unlike other genres such as classical and rock, the improvisational nature of jazz causes the brain to operate in a different fashion, and replicating the act with computational models is less straightforward.

Neuroscience of Jazz Improvisation

A recent publication from Johns Hopkins University set out to investigate the neural correlates of musical improvisation, since studies of music cognition outside the context of jazz improvisation have long been established. By comparing fMRI data from participants performing memorized jazz standards and then freely improvising over each standard, the findings support the hypothesis that syntactic language areas of the brain, such as Broca’s area in the inferior frontal gyrus, are engaged, while areas associated with semantic processing, such as the dorsolateral prefrontal cortex, are deactivated. Regions linked with semantic processing include areas that regulate decision making, conscious reflection and analysis of impulse and thought, and working memory. These areas of the prefrontal cortex model complex thought and action in the same way statistical algorithms generate predictions by assessing input data. The inhibition and deactivation of these areas during jazz improvisation indicates that a subconscious system drives the production of a musician’s improvisation, built on a store of schematic material that the musician draws on to generate new music. The operations of these biological systems are less well understood and are harder to replicate in mathematical models.

Musical improvisation also activated other multi-sensory regions of the brain, including the somatosensory system for feeling, the motor system for the production of movement, and the limbic system for emotion regulation (such as the amygdala, which processes the associations between music, emotions, and memories), as well as the visual and auditory systems. The trigger cues for happiness, sadness, nostalgia, and other emotional states, such as the dissonant tension found in jazz diminished and dominant scale harmony, are processed through the limbic system. The altered dominant scale, for example, is the seventh mode of the melodic minor scale, in which every tone other than the root, 3rd, and 7th that define a dominant chord is altered, including the 5th. These alterations, such as the b9 interval, are biologically linked to sensations of tension, thanks to the limbic system processing our emotional responses and memory. In order to build up and refine a cognitive schema for jazz, a musician undergoes repeated and long-term exposure to these musical phrases and relationships, strengthening the associations between a musical stimulus, its schema classification, and the reaction response.
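
The relationship between the melodic minor scale and the altered scale can be checked with a few lines of Python. This is only a sketch: `mode_intervals`, `ALTERED`, and `INTERVAL_LABELS` are names I made up, and the scale is treated purely as semitone distances from the root.

```python
# Whole/half-step pattern of the ascending (jazz) melodic minor scale.
MELODIC_MINOR_STEPS = [2, 1, 2, 2, 2, 2, 1]


def mode_intervals(steps, degree):
    """Semitone distances from the root for the mode starting on `degree` (1-based)."""
    rotated = steps[degree - 1:] + steps[:degree - 1]
    intervals = [0]
    for step in rotated[:-1]:  # the last step only returns to the octave
        intervals.append(intervals[-1] + step)
    return intervals


# Seventh mode of melodic minor, i.e. the altered scale.
ALTERED = mode_intervals(MELODIC_MINOR_STEPS, 7)

# Conventional jazz names for each distance (8 semitones is also called #5).
INTERVAL_LABELS = {0: "root", 1: "b9", 3: "#9", 4: "3", 6: "b5", 8: "b13", 10: "b7"}
altered_labels = [INTERVAL_LABELS[i] for i in ALTERED]
```

The result contains the root, major 3rd, and flat 7th of a dominant chord, while the natural 9th, perfect 5th (7 semitones), and natural 13th are all missing, replaced by their flattened or sharpened neighbours.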

Various forms of memory come into play in the process of jazz improvisation as well. Schematic expectations are correlated with long-term semantic memory, veridical expectations with episodic memory, and dynamic expectations with short-term working memory. Semantic memory serves as the basis for the inference and predictive rules that make up the jazz genre framework, while working memory and dynamic expectation are used heavily in the act of improvisation, when notes are being created on the spot. A blunder in a musician’s improvisation will show up as an incongruence with dynamic expectations, but the blunder will most likely be an incorrect chord tone over the current chord, in which case semantic memory and schematic expectations are violated as well.

Computational Machine Learning Models of Jazz Improvisation

From an expectational standpoint, the probabilistic tendencies of notes have been categorized with terms such as “regression to the mean”, “post-skip reversal”, and “step inertia”. These observations and classifications are general rules of thumb for predicting the melodic contour of a piece, generalized to music as a whole. These statistical heuristics are a manifestation of collective tendencies in musical phrasing, and they can also adapt dynamically as new musical stimuli occur. More rigorous mathematical models in the fields of Artificial Intelligence and Machine Learning attempt to replicate human neurology in algorithms in order to take these schema formulations, data analyses, and generative predictions even further.
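
Heuristics like these are easy to measure on a melody. The sketch below assumes a melody is a list of MIDI pitch numbers and that a "skip" is any interval of three or more semitones; both the threshold and the function names are my own choices for illustration, and the example melody is made up.

```python
def interval_pairs(pitches):
    """Pairs of (current interval, following interval) in semitones."""
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    return list(zip(intervals, intervals[1:]))


def post_skip_reversal_rate(pitches, skip_size=3):
    """Fraction of skips that are followed by a change of direction."""
    skips = [(cur, nxt) for cur, nxt in interval_pairs(pitches)
             if abs(cur) >= skip_size and nxt != 0]
    if not skips:
        return None  # no skips to measure
    reversals = sum(1 for cur, nxt in skips if (cur > 0) != (nxt > 0))
    return reversals / len(skips)


def step_inertia_rate(pitches, skip_size=3):
    """Fraction of steps that are followed by motion in the same direction."""
    steps = [(cur, nxt) for cur, nxt in interval_pairs(pitches)
             if 0 < abs(cur) < skip_size and nxt != 0]
    if not steps:
        return None  # no steps to measure
    continued = sum(1 for cur, nxt in steps if (cur > 0) == (nxt > 0))
    return continued / len(steps)


# A made-up melody as MIDI pitches (60 = middle C).
melody = [60, 64, 67, 65, 64, 62, 69, 67]
reversal_rate = post_skip_reversal_rate(melody)
inertia_rate = step_inertia_rate(melody)
```

Run over a large corpus of transcribed solos, rates like these would give a rough statistical fingerprint of how strongly a style follows each rule of thumb.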

Neural networks, such as multi-layer perceptrons, are loosely modeled on the workings of the brain’s biological neural networks. Parallel computations on input data, together with the back-propagation algorithm, are used to minimize the discrepancy between the system’s predictions and the expected outputs by readjusting synapse weights for a refined prediction. The network of perceptrons, arranged in layers connected by synapse links, is typically initialized with small random weights on each link between perceptrons, since identical weights would leave the units in a layer unable to learn different features. A forward pass then pushes input training data through activation functions to produce an output prediction. Back-propagation sends the error back through the network to readjust the synapse weights until the prediction error of the system is minimized. There have been projects that use Recurrent Neural Networks to train a system to generate blues improvisation, in which such systems successfully learned a blues form and performed a passable blues solo.
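
The forward pass and weight update can be sketched in plain Python. This is a toy illustration, not the recurrent architecture used in the blues projects: the `TinyMLP` class is hypothetical, it has a single hidden layer with sigmoid activations, and it is trained on the logical OR function rather than on musical data.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """One hidden layer, one output unit, trained by back-propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Small random starting weights; the last entry of each row is the bias.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        """Push an input list through the network; returns (hidden, output)."""
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
                  for row in self.w_hidden]
        output = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, output

    def train_step(self, x, target, lr=0.5):
        """One forward pass plus one weight update; returns the squared error."""
        hidden, output = self.forward(x)
        # Error signal at the output unit (derivative of 0.5 * (output - target)^2).
        delta_out = (output - target) * output * (1.0 - output)
        # Error signals sent back to the hidden units, using the old output weights.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, v in enumerate(hidden + [1.0]):
            self.w_out[j] -= lr * delta_out * v
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * delta_hidden[j] * v
        return 0.5 * (output - target) ** 2


# Toy training set: logical OR (a stand-in, not musical data).
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

net = TinyMLP(n_in=2, n_hidden=3)
epoch_losses = []
for _ in range(2000):
    epoch_losses.append(sum(net.train_step(x, t) for x, t in DATA) / len(DATA))
```

Comparing `epoch_losses[0]` with `epoch_losses[-1]` shows the prediction error shrinking as the weights are readjusted. A music-generating network would follow the same loop, with notes or chords encoded as the inputs and targets.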

Probabilistic music generation with Markov models has also been done, for example by the pyKov-Music Python library, which takes MIDI-formatted musical phrases and outputs a generated piece. A Markov Chain is essentially a set of states in which each state, X, has a conditional probability of transitioning to a subsequent state, Y. Solo generation with Markov Chains would treat the harmonic context as the state, with the current chord and the last k beats making up the current state of the chain. A program would look up viable notes in a transition probability matrix to generate the next phrase based on past states. Hidden Markov Models are a variation on regular Markov models in which the states themselves are not directly observed, but observable outputs that depend on the states can be used to infer the underlying structure. For example, an improvised jazz solo can be the input to a Hidden Markov Model, in which the Viterbi algorithm is used to find the most probable sequence of hidden states that produced that specific solo. This could be the sequence of chord qualities that defined the solo, which could then be used to generate another variation of the observed solo. This process is analogous to a jazz musician’s journey of transcribing famous, well-established jazz solos in order to ingrain particular styles of phrasing and language and incorporate them into their own playing.
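
Here is a minimal sketch of the Markov Chain idea in plain Python. It does not use pyKov-Music, and it simplifies the state to the previous note only (rather than the current chord plus the last k beats); the function names and the three training phrases are invented for illustration.

```python
import random
from collections import Counter, defaultdict


def build_transitions(phrases, order=1):
    """Estimate P(next note | last `order` notes) by counting in the phrases."""
    counts = defaultdict(Counter)
    for phrase in phrases:
        for i in range(len(phrase) - order):
            state = tuple(phrase[i:i + order])
            counts[state][phrase[i + order]] += 1
    return {
        state: {note: n / sum(counter.values()) for note, n in counter.items()}
        for state, counter in counts.items()
    }


def generate(transitions, start, length, seed=None):
    """Random walk through the transition table, beginning from `start`."""
    rng = random.Random(seed)
    order = len(start)
    melody = list(start)
    while len(melody) < length:
        options = transitions.get(tuple(melody[-order:]))
        if not options:
            break  # state never seen in training, so nothing to sample from
        notes = list(options)
        weights = [options[note] for note in notes]
        melody.append(rng.choices(notes, weights=weights)[0])
    return melody


# Made-up training phrases; real input would come from transcribed solos.
PHRASES = [
    ["C", "E", "G", "E", "C"],
    ["C", "E", "G", "A", "G"],
    ["C", "D", "E", "G", "E"],
]

transitions = build_transitions(PHRASES)
new_phrase = generate(transitions, start=["C"], length=16, seed=1)
```

Every step in `new_phrase` is a move that occurred somewhere in the training phrases, which is why the output sounds like a reshuffling of the source material. Raising `order` makes the chain remember more context at the cost of needing more training data.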

Lastly, another potential method of computationally generating jazz improvisation is the use of clustering algorithms. K-means clustering is an example of this kind of unsupervised machine learning algorithm, in which the characteristics of each data point are represented in an n-dimensional space and Euclidean distances are measured between points. These distances are then used as a similarity metric for grouping similar data points together. In musical terms, this could mean analyzing vast numbers of phrasings and grouping similar ones together with k-means clustering. One application to jazz improvisation would be finding the unique stylistic phrasings of famous jazz musicians in their recorded solos and using those insights to build a transition probability matrix for a Markov Chain that generates a new jazz solo.
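
As a rough sketch of k-means applied to phrases, suppose each phrase has been reduced to two hypothetical features: its average interval size in semitones and its notes per beat. The feature choice, the numbers, and the `kmeans` function below are all assumptions made for illustration (`math.dist` requires Python 3.8 or later).

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (centroids, label for each point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
              for p in points]
    return centroids, labels


# Made-up phrase features: (average interval in semitones, notes per beat).
PHRASE_FEATURES = [
    (1.5, 4.0), (1.8, 3.5), (1.2, 4.2),  # dense, stepwise lines
    (5.0, 1.0), (6.0, 1.5), (5.5, 0.8),  # sparse, wide-interval lines
]

centroids, labels = kmeans(PHRASE_FEATURES, k=2)
```

With features this well separated, the first three phrases land in one cluster and the last three in the other. Phrases from each cluster could then be fed separately into a transition matrix, giving one Markov Chain per phrasing style.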

Concluding Discussion

While these computational methods can generate passable improvisation, there is a wide range of additional factors that influence the development of a jazz musician’s improvisation and stylistic identity. The manipulation of micro-timing, such as playing ahead of or behind the beat, and the control of timbre on an instrument can make each improvisation vary greatly. Jazz has branched into various sub-genres such as smooth jazz, nu jazz, and modern jazz, each with its own unique characteristics built on a foundation of jazz roots. These subtle nuances and deviations arise from the creative nature of the human mind, and computational models that replicate jazz improvisation will not be able to fully capture all of these features. I believe that the coming age of technological intersections between machine learning, data analytics, electronic music production, and cognitive neuroscience has tremendous potential to drive the rapid evolution of truly avant-garde musical genres in the years to come.