On a Language of Musical Thought [Part 2]

Trevor Rawbone
Published in The Sound of AI
Mar 7, 2019 · 12 min read

This post follows on from the first in this two-part blog. Before you dive in, I recommend reading that post so that you're fully equipped to navigate the argument here. The central issue we're dealing with is music representation: specifically, the hypothesis of an internal language-like system of music representation, a language of musical thought (LMT). The idea is that internal musical representations are manipulated according to some syntactic system that's sensitive to their structure. In this post, I'll examine the effectiveness of generative music theories and generative computational musicology in connection with the LMT, taking A Generative Theory of Tonal Music (1983) (hereafter, GTTM) as the central example. I'll counterbalance this by looking at empiricist theories, such as the schema theory of music and probabilistic approaches in computer music.

Fluent in a language you’ve never learned

As indicated in the last post, generative music theorists argue that the innate human musical capacity is not primarily about communication. Rather, being able to communicate in language and music is predicated upon being able to speak the internal language of mentalese, or, more fittingly, musalese: the LMT. This points towards something very striking and radical: in a sense, you already know the raw content of music before it's expressed to you. That is, you have a battery of basic concepts already in your head, available for use. Of course, for everyday musicking, they're combined in different ways to make complex concepts, so you can't predict what someone will say musically before they say it. As Cartesians often say, language is only trivially for communication; the LMT is really about the internal combinatorial system.

Fred Lerdahl, with the help of the linguist Ray Jackendoff, has provided perhaps the definitive nativist story about how a LMT might work in GTTM; and by and large, it's a pretty good story. Its sheer popularity across almost all music disciplines testifies to this. I say 'almost all' because the exception, ironically enough for a work of music theory, is the discipline of music theory itself, which doesn't seem much interested in generative models. For some reason music theorists don't like them, but more on that later. In computational musicology, and almost everywhere else in music research, GTTM has pride of place: witness Masatoshi Hamanaka's generative programme in Japan, David Temperley's (2001) classic The Cognition of Basic Musical Structures, and Alan Marsden's papers on musical reduction.

As I argued in the last post, a LMT should have representational structure at low levels and across all parameters. This is necessary for mentalese to be innate, i.e., rooted in humans' basic conceptual structure. Higher-level musical structure, involving complex concepts, seems to be synonymous with rationalist recombination and perhaps also cultural variation. Generative theories often propose a fairly rigid high-level innate structuring, which doesn't sit well with the LMT hypothesis.

The generative story on melodic grouping

Drawing on Gestalt psychology, GTTM proposes that the overall rhythmic and tonal shape of a phrase is a percept. It argues that rhythmic and pitch shapes are automatically received into human consciousness as non-reducible 'wholes'. But is this a good use of Gestalt psychology?

Gestalt psychology proposes that there’s a propensity to perceive certain patterns or shapes in a direct way, usually without too much cognitive reasoning. Two Gestalt principles, similarity and proximity, are often used in generative models of music.

The principle of similarity states that similar types of things tend to be grouped together.

We tend to see black lines and white lines

The principle of proximity states that things close together tend to be grouped together.

On the left, we have a homogenous group; on the right, we have three separate groups

Now, GTTM effectively says that these perceptual principles are universal and automatic. As Fred Lerdahl (2013, p. 270) points out, '[s]ome of the rules appear to be psychologically universal, especially those that incorporate Gestalt principles, whereas others are style-specific.' The problem is that if perception delivers conceptual wholes (i.e., Gestalts), then there's no opportunity for a more sensitive re-rendering of those shapes when musical contexts demand it. Gestalts would make good axioms if they were the product of laws, but they're arguably just principles that produce tendencies towards a certain grouping. Of course we have a propensity to group things that are close together or similar, but the real question is: do we necessarily group them regardless of other factors? The answer is probably no. They're therefore dubious as basic axioms of a formal system, and perhaps shouldn't be used as such in generative theories of music.

Indeed, the Gestalts proposed in generative theories are not sensible levels of conceptual structure for a systematic and productive LMT. The following notational analysis, taken from GTTM (Lerdahl and Jackendoff, 1983, p. 37), shows grouping structure segmented into Gestalts. Don’t worry if you can’t read the notes — what should be visible is that they’re grouped according to rhythm/pitch similarity and/or proximity.

Grouping according to similarity and proximity

The problem is that with different harmony, different performing conditions, different textures, or a change in almost any other parameter, these phrases could be grouped in almost any other way. GTTM says that if a melodic line has a particular shape, we automatically cognise it as a self-contained Gestalt, effectively without the possibility of cognising it differently; otherwise, why talk about perceptual inference? This is surely dodgy thinking, because different parametric interactions would certainly result in different 'Gestalts'. Grouping Gestalts therefore cannot be part of a LMT. Once again, this boils down to the fact that representations in the LMT must occur at low levels. And it's no surprise that computational models of GTTM face difficulties when trying to pin down hard-and-fast grouping rules, as Hamanaka discovered when implementing GTTM: rules about grouping are very difficult to come by.
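To make the difficulty concrete, here's a minimal sketch of a proximity-style grouping rule in Python (the function and its threshold are my own illustrative choices, not GTTM's formalism or Hamanaka's implementation). It places a boundary wherever a gap between note onsets is noticeably larger than average:

```python
# A crude 'proximity' rule: start a new group wherever the gap between
# note onsets is much larger than the average gap.
def group_by_proximity(onsets, gap_factor=1.5):
    gaps = [b - a for a, b in zip(onsets, onsets[1:])]
    threshold = gap_factor * sum(gaps) / len(gaps)
    groups, current = [], [onsets[0]]
    for note, gap in zip(onsets[1:], gaps):
        if gap > threshold:  # a comparatively large gap marks a boundary
            groups.append(current)
            current = []
        current.append(note)
    groups.append(current)
    return groups

# Quarter-note onsets with one longer gap after the fourth note:
print(group_by_proximity([0.0, 1.0, 2.0, 3.0, 5.5, 6.5, 7.5]))
# [[0.0, 1.0, 2.0, 3.0], [5.5, 6.5, 7.5]]
```

Notice what the rule can't see: harmony, texture, dynamics, performance context. Any fixed threshold that segments one passage sensibly will mis-segment another, which is exactly the trouble with treating Gestalt tendencies as axioms.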

Gestalt psychology: did you automatically see a triangle, or did you have to think about it for a little while?

The generative story on hierarchies and grids

Another major issue with generative programmes concerns the notion of a metrical hierarchical grid, which is an idealisation or Platonic form: it's universal, absolute, and independent of culture. Common grid-like structures are found in standard time signatures, such as 4/4 or 3/4. Most levels of the grid are generated using multiples of two, as in this example from GTTM (Lerdahl and Jackendoff, 1983, p. 73), which uses a 'dot structure'. Again, don't worry if you can't read the notes; the main thing is to see that the onsets of note events cue the metrical structure in a way that's 'grid-like'.

The dot structure below the stave suggests that the metrical structure is like a grid
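To see what 'grid-like' means here, the following is a minimal sketch of such a dot structure for one bar of 4/4 at sixteenth-note resolution (the function and its encoding are my own illustrative choices, not GTTM's notation). The strength of every position is fixed entirely by powers of two:

```python
# Dot counts per position in an idealised metrical grid: each level
# places dots twice as sparsely as the level below it.
def metrical_grid(levels=5, positions=16):
    strengths = []
    for pos in range(positions):
        dots = 1
        while dots < levels and pos % (2 ** dots) == 0:
            dots += 1
        strengths.append(dots)
    return strengths

# Print the grid as rows of dots, strongest level on top: the downbeat
# gets five dots, the half-bar four, the quarter-note beats three, etc.
grid = metrical_grid()
for level in range(max(grid), 0, -1):
    print(''.join('.' if s >= level else ' ' for s in grid))
```

The rigidity is the point: every level is a strict multiple of the one below, so accents that don't fall on powers of two simply have no home in the grid.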

GTTM says that the grid is a well-formed hierarchy. But is the grid just an abstraction that exists only in theorists' heads? More flexible hierarchical metrical systems are commonly found in many subcultures, and the grid idealisation is realised only in particular musical subcultures, in both Western and non-Western contexts. For listeners, texture, the combined landscape of pitch and rhythm events, is most important for metrical induction. Texture is often irregular at various levels, so it doesn't support grid-like metre; the metrical structure that emerges from such textures is irregular or 'ill-formed'. Lots of texturally sophisticated music must therefore be metrically ill-formed. (Rawbone (2017) and Rawbone and Jan (2019), the latter forthcoming in the journal Music Analysis, discuss the subtleties of metrical induction for such textures.) The basic point is that generative theories are not applicable to lots of music because they don't show representation at the level at which it actually occurs in cognition. There can't be such high-level representations in a LMT, since the LMT is innate. Representations must occur at low levels.

Fixed, low-level metrical representations in the LMT concern beat-level interactions. When beat-level relations are combined, they form complex concepts, which can be coherently understood by decomposing them back into their basic concepts. Even if a metrical grid were a Platonic ideal present in cognition, metrical structure would still have to be internally understood using some sort of bottom-up LMT.

Empiricism and the LMT

After the inception of generative music theory in the 1980s with GTTM, we might have expected improvements to the programme through the 1990s, 2000s and beyond. Alas, it wasn't to be. Music theorists were too preoccupied with things such as Sonata Theory, schema theory (which I'll talk about shortly) and Nordic music to worry about improving generative theory. It was Fred Lerdahl, again, who came up with the twenty-first-century gem of generative theory, Tonal Pitch Space (2001).

While I think rationalist music research requires further development to provide a viable explanation of a LMT, empiricism’s mark on music research is perhaps of more concern. Broadly speaking, the empiricist music researcher tries to capture music-making through leveraging the most probable, most commonly associated, or most statistically prevalent arrangement of features. Which is good as far as it goes. But how does it help explain music cognition, and specifically the LMT?

The sub-discipline occupying most music theorists' time these days is schema theory. I think this corner of music theory is actually closer to empiricism than rationalism. In its early form, in Robert Gjerdingen's A Classic Turn of Phrase (1988), it provided a counterposition to the generative enterprise. More recently, schema theorists have preferred to emphasise historical and cultural modes of practice, looking at the ins and outs of the partimenti tradition in eighteenth-century Italy rather than examining music cognition per se. Music in the Galant Style (2007), also by Gjerdingen, is a prime example of this, and a very interesting exploration of a cultural niche.

The schema theory of Leonard Meyer, Robert Gjerdingen, and Vasili Byros (in that order, and counting, since it's a popular method) looks at the 'psychology of convention' through the use of voice-leading schemas in the eighteenth century. One of the underlying contentions is that schemas are the product of time and place. So while much of the work is carried out under the umbrella of cognitive science, it's also bona fide empiricism. The conclusion is that a good part of music cognition boils down to brute experience, or statistical inference. But to paraphrase Fred Lerdahl in his 2011 presentation at Rice University (shown below), musical understanding and cognition must come down to something more than statistics. Indeed, what about the internal system? What empiricist musicology tells us about music cognition, while in some sense interesting, is different from working out the fundamental representational system of music cognition; it's a different aspect of cognition from that which concerns a LMT.

Fred Lerdahl in action

Computational musicology might be said to have a similar empiricist bent. For example, Temperley's Bayesian approach in Music and Probability (2007) arguably doesn't give us a great deal of insight into music cognition, because it deals purely in probabilistic data. The same might be said of the multiple viewpoints research paradigm, kicked off by Darrell Conklin and Ian Witten's paper, Multiple Viewpoint Systems for Music Prediction (1995). Multiple viewpoint systems exploit statistically common relations between musical features to generate new pieces in a style. This is useful, but questionable with regard to what it tells us about cognition. My point is that empiricist music theory and computational musicology can't get us very far until they aim to account for underlying cognitive processes. Noam Chomsky said as much many years ago, and it seems just as applicable to music research today. How exactly music research should proceed is a hard problem, but there's a strong case for considering the representational quirks of a LMT.
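For a flavour of what such models trade in, here's a minimal sketch of a first-order Markov model over pitch names (a deliberately crude stand-in; the corpus, functions and names are my own illustrative choices, not Temperley's models or Conklin and Witten's viewpoint machinery):

```python
from collections import Counter, defaultdict

# Count pitch-to-pitch transitions in a toy corpus of melodies.
def train(melodies):
    counts = defaultdict(Counter)
    for melody in melodies:
        for prev, curr in zip(melody, melody[1:]):
            counts[prev][curr] += 1
    return counts

# Estimate P(curr | prev) from the transition counts.
def prob(counts, prev, curr):
    total = sum(counts[prev].values())
    return counts[prev][curr] / total if total else 0.0

corpus = [['C', 'D', 'E', 'D', 'C'], ['C', 'E', 'D', 'C']]
model = train(corpus)
print(prob(model, 'D', 'C'))  # 2/3: D moves to C two times out of three
```

A model like this captures what's statistically prevalent in a corpus, and only that: nothing in the transition table says anything about the combinatorial system that assembles the representations in the first place.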

A music module in the mind

I think that low-level music representations are innately manipulated by some sort of designated module. Jerry Fodor most conspicuously put forward the idea of a modular mind in his celebrated The Modularity of Mind (1983). He proposes that cognition comprises a number of perceptually and informationally encapsulated units. These are domain-specific, obligatory-firing, fast, limited-access, and neurally fixed. However, they're connected to a general central system that is domain-unspecific. Somewhat mysteriously (which is okay, because human intelligence is mysterious), the central system draws from the bottom-up perceptual modules and mid-level modules to enable intelligent interaction between the modules and the environment.

The music module is in the bottom corner, right by the ear

Some sort of encapsulated or semi-encapsulated module might describe the LMT, because the representations it deals in are quite special. As I’ve said a few times, this needs to be a low-level module, because it concerns the manipulation of basic concepts. But such a modular view has its detractors, specifically in cognitive science and evolutionary psychology. The general consensus among them is that there are various faculties from which music draws, but there are no specific capacities for music — there isn’t a dedicated and self-contained music faculty.

Steven Pinker's infamous chapter on music in How the Mind Works (1997, pp. 528–545) claims that music is like auditory cheesecake: it tastes lovely and sweet but hasn't had much of a role to play in the grandiose and gradualist process of cognitive adaptation. It's simply a piggyback trait on our evolutionary skill set. Pinker (1997, p. 528) says, "[c]ompared with language, vision, social reasoning, and physical know-how, music could vanish from our species and the rest of our lifestyle would be virtually unchanged." For Pinker, music acts just like a plug-in in the digital world; it didn't have to be here, but it's an amusing thing to add to the main business of life. Many have rightly criticised this on account of its wilful philosophical blindness. It rests on a faulty assumption about how life might otherwise have been, when we obviously can't philosophise about this, because music emerged out of the same starting conditions as everything else.

Music, like love, provides a vital rationalist core to our humanity and, directly contra Pinker's point, might indeed have had a measurable effect on 'our lifestyle', whatever that means (as Prince Charles once said about love). Just as worrying is Pinker's overall moral: if you've done nothing for an organism's evolution, you haven't done anything good. But human beings, through rational thinking, can transcend brute mechanisms. Music is perhaps the most subtle of cognitive capacities, and doesn't need to score points with evolutionary psychology.

Noam Chomsky has been coy about attributing the natural language capacity to natural selection. Hauser, Chomsky, and Fitch (2002) play down the role of natural selection in the formation of natural language. Rationality, and the mathematically recursive ability to compute information (first pinpointed by Alan Turing), appears to be an all-or-nothing capacity. Like riding a bike or telling jokes, you can either do it or you can't. Evolution might have acknowledged recursion and rationality as good ideas after they'd emerged, and thereafter selected them, but natural selection didn't gradually shape recursion and rationality; that can't happen. Either you have them or you don't. This seems like a really important and decisive argument about the nature of human intelligence, and it should be applied 'wholesale' to music. Since music is recursive and rational, this suggests that a LMT is the result of a sudden saltation, not gradualistic selection. You don't get a bit of music one year and then wait ten thousand years for the next bit. Once you can understand Joplin, you can understand Janis, so to speak. A LMT might use some of the same systems as natural language and rational thought, but I think it's a fully dedicated system at its core. A dedicated module that permits recursive and rational musical structure is a very strong argument for a LMT.

Janis Joplin

Final thoughts

That concludes my thinking on a LMT. Hopefully it's provided you with some insight into the issues facing music researchers in the various fields dealing with music representation. I've argued that many empiricist, inductive, and probabilistic models can't, seemingly by definition, show how music is cognised from the perspective of a LMT. This position also contrasts somewhat with models in generative music theory and computational musicology, which propose higher-level representations. The main contention is that a LMT must operate at low levels of representation for it to be viable.
