On a Language of Musical Thought

Published in The Sound of AI · 12 min read · Feb 28, 2019

(This two-part blog series will have a light philosophical background, delving into various areas of music theory. I’ll aim to connect it to recent work in computer music and AI music.)

You might’ve come across the phrase ‘music speaks to the soul’. Despite the problematic nature of this phrase (let’s leave the ‘soul’ debate for another time), there’s some truth to the notion that music speaks, or is a language. And by language, I don’t just mean a vehicle for communication, but a kind of internal communication of thought. An important issue, not just for music theory but for any area of music research, such as music information retrieval (MIR) and music AI, is how (and if) music is represented in brains. This issue concerns the way we think about music, or more specifically, how we think in music.

The music inside our heads

Whichever way we internalise music, it seems likely that it’s the same for everyone. This must be so, because, broadly speaking, music is open to everyone across the globe no matter their cultural background. This means that music doesn’t need to be, or in fact cannot be, translated, because musical tones ‘speak’ to everyone. Just as everyone has eyes to see, so they have a universal internal musical ear to hear. But only humans can hear and whisper this magical language — it’s one of the few mental capacities that distinguishes us from the rest of the animal kingdom.

If only this dog could tell it was listening to Eleanor Rigby.

It’s also possible that there isn’t an organised internal representational system for music at all, and that musical thought is embodied, non-centralised, or constructed from the ground up, as some music theorists have proposed (Gjerdingen and Bourne, 2015). I suspect that such bottom-up theories are unlikely to be right, because how we think in music seems to be very much like an internal language, what I call a language of musical thought (LMT). The main idea behind a LMT is that we don’t just hear or communicate in music, but are able to speak to ourselves internally using this language. The key attraction of a LMT is that it represents musical elements in the external world and, importantly, also manipulates these representations, in the same way that a computer manipulates information. I think this is necessary for us to cognise music, since the basic building blocks of music (its basic concepts) are combined to form complex structures in a way that works (more or less) logically and systematically. But I think that for a LMT to be a reasonable hypothesis, the building blocks must operate at low levels and be combined less rigidly and systematically than in natural language.

The core representational system

This topic isn’t as straightforward as it might seem, especially if you’re expecting the usual broad analogy that music, like language, is just another form of communication. That simple analogy isn’t really what I mean by a LMT. The LMT hypothesis presented here is a more specific claim: that there is a core internal computational and representational system by which music is mediated. It seems obvious that we don’t actually think in rhythms, pitches or chords, for example. If we do internally represent music at all, we must represent it differently, but how we do this is a bit of a mystery. What seems clear is that a LMT probably represents music of the external world non-iconically. It seems also to be a computational system, where representations are manipulated according to a generative syntax. This is necessary for musical thinking to cope with the combinatorial structure of music. This framework accords with many theories of grammar in language and in music, such as the one put forward by the music theorist Fred Lerdahl and the linguist Ray Jackendoff in A Generative Theory of Tonal Music (1983) (GTTM).

So, the basic argument is that the LMT is a representational and computational language, since musical thought is combinatorial and generative. The reason is that many different musical building blocks, such as chords, pitches and rhythms, are assembled in a way that’s sensitive to context as well as productive and systematic. This hints at a much more complex system than the associative, behavioural, or statistical models of cognition put forward in years past by empiricists, such as John Locke, David Hume, and Burrhus F. Skinner, which were quite wrong for music and many other capacities. In music, there’s no straightforward conditioned relationship or series of associative relations between ideas. Associative or statistical explanations of thought are simply not descriptive enough.

The empiricist position is that the musical brain is simply a machine that churns out commonly associated or statistically probable utterances. But such thinking would not be able to model musical thought, which manipulates musical representations the way that a classical computational system does. I think that the neural net or deep learning models of contemporary computational musicologists automatically espouse an empiricist metaphysics, because their machines encapsulate information directly from the environment, arriving at outcomes based more or less entirely on data, without a generative or rational mediation. While these methods have their advantages, such as being informationally rich, a ‘data-driven’ process doesn’t capture the core generative principle of language.

Recursive operations

One of the most important characteristics of a LMT is that it is recursive. Recursion is a special characteristic that occurs in various languages; it’s the process by which repetition of a functional procedure results in self-embedding. Using recursive processes, representations, which are the building blocks of languages, can be manipulated according to logical operations, producing complex edifices. Recursive operations are important for the serial and parallel embedding of musical ideas, attesting to the idea that musical thought is language-like.

Recursion in natural languages occurs as follows. I can say: the man went to the shop. I can also say: the man went to the shop with his son. And again, with more embedding: the man, who never likes to go out and who was a little tipsy, went to the shop with his son. And so on, ad infinitum. There’s no theoretical limit to the amount of recursion permissible in natural language. Owing to performance limitations, however, it’s never actually infinite. (Although some people’s endless rabbiting on about things might make you a little more sceptical of this.) Natural language is recursive in particular ways, since embedding occurs according to the rules of the grammar. The generative and rationalist picture of natural language is largely a product of Noam Chomsky, the high priest of twentieth-century linguistics and philosophy. He views language as a specialised, universal, internal generative system, which can generate an infinite number of sentences using finite means, based on the principle of recursion. Music may be similar to this, but there may be some important differences as well.
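The embedding just described can be sketched as a recursive function in Python. This is a toy illustration rather than a model of English grammar: the single rule, the wording of the embedded clause, and the names `noun_phrase` and `sentence` are all assumptions made for the example.

```python
def noun_phrase(depth: int) -> str:
    """Build a noun phrase that contains `depth` embedded noun phrases."""
    # Base case: a plain noun phrase with nothing embedded in it.
    if depth == 0:
        return "the man"
    # Recursive case: the rule calls itself, so a noun phrase
    # ends up nested inside another noun phrase.
    return f"the man who knows {noun_phrase(depth - 1)}"


def sentence(depth: int) -> str:
    """Wrap the noun phrase in a fixed predicate."""
    return f"{noun_phrase(depth)} went to the shop"


if __name__ == "__main__":
    for depth in range(3):
        print(sentence(depth))
```

Nothing in the rule itself stops `depth` from growing. Only practical limits do, which mirrors the point about performance limitations.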

How musical thought is language-like

Usually, grammar is the vehicle that structures thoughts. For example, the rules and structures of natural language are what organise words into sentences. But if thought itself is language-like, then it’s reasonable to propose that thought must have its own generative, logical and constituent structure, like the logical and recursive operations in natural language and computational languages. This argument has been most conspicuously developed by Jerry Fodor in The Language of Thought (1975) and LOT 2: The Language of Thought Revisited (2008), and by various other writers. Fodor sees the LOT as a close metaphor for the natural language capacity; I see a LMT as analogous to the LOT. Following Fodor, musical thinking might entail an internal representation system whereby complex molecular concepts are constructed out of basic atomic concepts, using a combinatorial syntax.

A plausible LMT might read as follows. The building blocks of a LMT are finite, since there are effectively a finite number of pitches, rhythms, timbres, etc., that we infer from the continuous musical environment in the world. How we infer them is just a fact about the way humans are. Our physical, chemical, biological, and psychological make-up limits the type of basic concepts that we can individuate. These finite elements can be put together in infinitely many new ways to make complex (and very complex) concepts, right up to new, original compositions. This is the most important thing that recursion in music is good for.
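The step from a finite stock of building blocks to unboundedly many combinations can be made concrete with a small Python sketch. It is only an illustration under loud assumptions: the seven letter names A to G stand in for whatever the real basic concepts turn out to be, and a complex concept is reduced to a flat sequence of pitches, which ignores rhythm, timbre and any hierarchical structure. The names `PITCHES`, `sequences` and `count_sequences` are hypothetical.

```python
from itertools import product

# Assumption: seven letter names stand in for the finite set of basic concepts.
PITCHES = ("A", "B", "C", "D", "E", "F", "G")


def sequences(length):
    """Lazily yield every pitch sequence of the given length."""
    return product(PITCHES, repeat=length)


def count_sequences(length):
    """Number of distinct sequences of that length: 7 ** length."""
    return len(PITCHES) ** length


if __name__ == "__main__":
    # The inventory stays fixed while the number of combinations keeps growing.
    for n in range(1, 6):
        print(n, count_sequences(n))
```

The inventory never changes, yet the count has no upper bound as the length increases. That is the sense in which finite elements can be combined in infinitely many new ways.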

Intuitively compositional

So recursion seems to be centrally important for an internal LMT. It buttresses the claim that musical thinking is compositional. It corresponds with the intuitive realisation that music has simple building blocks that are used to construct complex edifices. Compositionality has two important related principles that are found in natural language: systematicity and productivity. When a system is productive, it creates infinitely many strings using finite means. When it’s systematic, it rigidly manipulates its lexical symbols using a battery of syntactic operations.

Let’s focus on productivity now though. As noted, a productive LMT can turn a finite set of basic concepts into an infinite set of complex concepts. This means it’s possible for finite brains to understand a potentially infinite number of musical expressions. But for the argument of musical productivity, we need to establish how it’s possible to individuate basic concepts first. It’s clear that basic music concepts should have something to do with music’s key attributes: pitch and time. And for a LMT to be an internal language, these must be innate and low-level enough to be rigid in all musical languages, to permit music to be concretely rooted in psychology. One group of basic concepts could be a type of musical alphabet, such as a musical scale — for example, A, B, C, D, E, F, G — which can be recursively used as blocks to build larger edifices, namely, chords, keys, and musical works.

The problem with this is that such representations surely cannot be endowed in musical thought from birth, because the thousands of scales in the world vary between cultures. (For instance, the musical scales of non-Western cultures, such as the sléndro or pélog scales of gamelan music, divide the octave up differently to Irish folk scales.) An internal, innate musical alphabet of pitches would need to be fundamental and cross-cultural, such as very low-level representations and relations of pitch frequency. Indeed, there would have to be low-level representations in all parameters of music, including of course rhythm and metre. In the realm of metre, the innate representation would probably be at the level of beats, from which metre and hypermetre are constrained and elaborated. For rhythm, we would conceptualise low-level innate rhythmic relations, by which phrases and larger sections are generated. If all this is true, it suggests that music may be productive, which supports the idea that there is a LMT.

As mentioned, musical thought that combines low-level representations through some sort of grammatical operations could be termed systematic. Fodor (1975, 2008) argues that thinking is systematic because it has a generalisable syntax. It’s therefore computationally efficient, because a battery of logical operations is used again and again to build complex propositions. While natural language is more feasibly systematic (although this is also disputed), it’s much more questionable whether music has grammatical rules that govern its structural make-up; music seems to be more flexible, which is problematic for the LMT hypothesis. I’ll talk more about systematicity in the second post of this blog series, because it’s a difficult notion; the discussion will be aided by an examination of recent music research. For the moment, I’ll further explore the productive aspect of compositionality.

As noted, part of the way that natural language is compositional is by being productive, in the special sense of this word already defined. It can build any number of sentences or constructions out of other basic and complex concepts, in a way that’s infinitely creative, at least in theory. In thought and natural language this seems to happen all the time. For example, we can think the concept BROWN COW because we already have the more basic concepts BROWN and COW. This means that we’re able to understand a complex original sentence, because we can separate the concepts out into basic things we already know. Note, however, that BROWN COW is just an example. In actuality the concepts BROWN or COW might break down into smaller concepts, although it’s questionable whether the BROWN concept can be broken down further; probably not, since colours seem to be fundamental.

There is probably a similar internal decomposition process that occurs with music. We understand the complex concepts of music once we are able to separate out its basic concepts. But how are we able to separate out its basic concepts? As mentioned above, I think we can do this because our internal representation system has a direct connection with the low-level external elements of music. This supports two claims about the LMT. First, we can understand music because it’s made out of the basic building blocks that we innately understand. And second, we can generate and think about infinitely complex musical edifices because they are put together using finite building blocks in relatively systematic ways.

Chords and context

This type of conception gives us insight into musical thinking, such as why we’re able to understand that a chord is a chord. Let’s consider a common chord, like a C major chord. For those unfamiliar with music theory, this is made out of the pitches C, E, and G. I would say that this chord is a complex concept because it’s made out of the basic pitch concepts C, E, and G. Some might argue that complex concepts are often fuzzy. This rings true, for when we have a single sustained chord it can also contain (some) non-chord pitches, but we often still bracket this complex as a discrete chord category or concept. That is, folks internally ‘label’ it as a C CHORD even though it has some non-chord pitches, depending of course on the degree of inclusion of non-chord pitches (as well as a consideration of other contextual parameters). The admission of non-chord tones is problematic for a language of thought, though, because the whole point about a LMT is that representations are stable. However, I don’t think all is lost, because it’s clear that there must be some flexibility in the representation system at higher levels of abstraction. The internal language, or ‘grammar’, must be fixed at lower levels, allowing rigid building blocks for the language to work with. Flexibility is then permitted at higher levels, to build new complex concepts.
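One way to picture this mix of rigid low-level blocks and flexible higher-level labels is the following Python sketch. It is a deliberately crude illustration, not a claim about how listeners actually categorise chords: the rule that all three chord tones must be present, the 0.75 threshold, and the names `chord_tone_share` and `label_as_c_chord` are assumptions made for the example, and context such as rhythm is ignored entirely.

```python
# Fixed low-level building blocks: the pitch concepts of a C major chord.
C_MAJOR = frozenset({"C", "E", "G"})


def chord_tone_share(pitches, chord=C_MAJOR):
    """Fraction of the sounded pitches that belong to the chord."""
    pitches = list(pitches)
    if not pitches:
        return 0.0
    return sum(p in chord for p in pitches) / len(pitches)


def label_as_c_chord(pitches, threshold=0.75):
    """Label the sonority C CHORD if every chord tone is present and
    chord tones make up at least `threshold` of what is sounding.

    The threshold is an arbitrary, hypothetical value standing in for
    'the degree of inclusion of non-chord pitches'.
    """
    pitches = list(pitches)
    has_all_chord_tones = C_MAJOR <= set(pitches)
    return has_all_chord_tones and chord_tone_share(pitches) >= threshold


if __name__ == "__main__":
    print(label_as_c_chord(["C", "E", "G"]))            # pure triad
    print(label_as_c_chord(["C", "E", "G", "D"]))       # one non-chord pitch
    print(label_as_c_chord(["C", "E", "G", "D", "F"]))  # two non-chord pitches
```

The pitch set itself never changes; only the tolerance applied when labelling the complex does. That is roughly the division of labour suggested here between fixed lower levels and flexible higher ones.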

The contextual issue I was talking about is that for a chord to be labelled as a chord it requires the interaction of other parameters, such as basic and complex rhythmic concepts. And this issue is exacerbated when concepts are even more abstract. What about key, for instance? What does it mean to be in a key? The concept of key is perhaps one of the greatest musical abstractions ever conceived. It refers not just to, for example, the C pitch, or the C chord, but to a very large set of chords, pitch relations, and rhythmic relations that give a sense of belonging in the key of C, whatever this means. I’ll consider the difficulty of describing such aspects for the thesis of an internal LMT in the next post, when I work my way through the canon of music theories. I think we have a good enough grounding in the issues for the moment. Regardless of these thorny setbacks, I don’t believe that such complexity and abstraction diminish the core idea of a LMT, because, as discussed, the representational system of the language seems to operate at a low level, and so higher-level constructions should naturally have expressive freedom.

Final thoughts for now

So, that concludes this post in the Thoughts on a Language of Musical Thought series. I hope I’ve given some credence to the idea that the LMT is at least a feasible hypothesis of music cognition. The LMT appears to be recursive, and has a compositional structure that is productive, allowing for infinite use by finite means; it seems also to be relatively systematic at low levels. In the next post, I’ll go deeper into some of the issues I’ve raised here, looking at various generative and non-generative music theories, and relating them to models put forward in computational musicology.