State of the Union: Music and Computers
Question: What does it mean to make music using computers?
Answer: It can mean almost anything. Computers are used by musicians of all kinds during different stages of the creative process, and this means it is difficult to make broad generalizations. Nonetheless, here is a broadly generalized list of (frequently overlapping) categories:
Audio Sequencers: These applications usually feature a GUI that displays renderings of layered waveforms and MIDI data as blocks stretched across metered time, and an extensible system for recording, mixing, and processing different aspects of the audio directly within that environment.
MIDI Sequencers & Notation Engravers: Like audio sequencers, these applications display musical data stretched across metered time, but because they only handle MIDI data, the data displayed is usually a more granular, note-by-note representation of the music. It may be in a layered block form like the audio sequencers, or it may be conveyed as a full graphical rendering of traditional music notation. In the latter case, the emphasis of the software is usually on the creation or ‘engraving’ of notation, rather than simply sequencing the MIDI data.
Software Synthesizers & Audio Effects: These are frequently provided as extensions to those systems in the “audio sequencer” category. They usually take MIDI streams as input (either from a live musician or a pre-sequenced MIDI track) and output the synthesized or processed audio synchronized with the audio sequencer’s playback. The focus here is usually not organizing sound within a composition, but rather generating and affecting the aural characteristics of audio waveforms themselves, usually with minimal effort. The examples of these are innumerable. The marketplace is a name-brand wonderland of proprietary magic, packaged with bright colors, flashing lights, and lots of buttons that do all the stuff and all the things. These extensions promise to help you sing in tune, in 32-part harmony, in a football arena, on top of a mountain, with the reincarnated ghost of Michael Jackson.
Visual Music Programming Environments: These systems usually offer a GUI work space with programmable and connectable nodes which do all sorts of different things, from generating data, to playing sounds, to synthesizing new sounds.
Machine Learning Experiments: Naturally, music has captured the attention of those who study machine learning, as illustrated by Google’s ‘Magenta’ project. As you might expect, training data goes in, music comes out.
So, what is the point?
The above examples couldn’t be more different from one another. They all touch different parts of the process, and work to achieve entirely different types of results. However, if you think about it, they do share some common characteristics.
Visual Interfaces: One such characteristic is that music software is dominated by GUIs. Even a majority of music programming environments are notably ‘visual’ in their design. While it is true that most applications are graphical these days, in the case of music programming software, I think the reason for preferring graphical systems over textual systems comes from a different motivation than comparable software used in other disciplines. Whereas a word processor may help ease the mechanical burden of typing and sharing text information, and a photo processing application may help to ease the technical burdens of processing photos, music software is frequently designed to ease the conceptual burden of dealing with music-related information. It does this either by creating traditional renderings of music (standard notation), which abstract away complicated concepts through their use of familiar symbols, or by creating more novel, modern renderings which do the same (blocks and grids). Each of these renderings may cater to a different type of musician with a different type and level of musical literacy, as even many trained musicians are ill-prepared to express musical data in a more generalized, numerical data format.
This is not the fault of musicians. Rather, I think it is because the numerical relationships between musical objects are actually relatively complex, and the notation systems used by musicians are able to convey these relationships in much more intuitive ways which also yield dramatically increased density of musical information as compared to text-based representations.
Consider the following measure of music:
This does not sound all that great, but it is immediately comprehensible to most trained musicians. They can count the rhythms, and can probably ‘hear’ the pitches without even playing it.
Now, consider the following text-based, numerical equivalent:
(Pitch: 440 Hz, Duration: 0.5)
(Pitch: 493.8833012561241 Hz, Duration: 0.25)
(Pitch: 554.3652619537443 Hz, Duration: 0.125)
(Pitch: 587.3295358348153 Hz, Duration: 0.0625)
(Pitch: 659.2551138257401 Hz, Duration: 0.03125)
(Pitch: 739.988845423269 Hz, Duration: 0.015625)
(Pitch: 830.6093951598906 Hz, Duration: 0.0078125)
(Pitch: 880 Hz, Duration: 0.0078125)
It doesn’t exactly roll right off the tongue, does it?
Why do these numbers appear so complex, while the notation equivalent appears so straightforward? The answer, it seems, is that standard notation hides the complexity of exponential relationships between musical objects. These relationships exist for both pitch and rhythm. These are not complex calculations, but the resulting values are too precise for most humans to ‘feel’ intuitively. Like programmers, musicians rely on a significant degree of abstraction in the systems they use in order to make very complex concepts much simpler to work with.
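To make the exponential relationship concrete, here is a minimal Python sketch that regenerates the numbers in the listing above. It assumes twelve-tone equal temperament with A4 tuned to 440 Hz, and it assumes the measure is an ascending A major scale (inferred from the frequencies listed, not stated anywhere in the notation). The names `frequency` and `halving_durations` are made up for illustration.

```python
A4_HZ = 440.0  # assumed reference pitch; it matches the first note in the listing

# Semitone offsets above A4 for an ascending major scale (an assumption
# inferred from the listed frequencies).
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11, 12]


def frequency(semitones_above_a4: int) -> float:
    """Equal temperament: every semitone multiplies the frequency by 2**(1/12)."""
    return A4_HZ * 2 ** (semitones_above_a4 / 12)


def halving_durations(count: int) -> list:
    """Return 1/2, 1/4, 1/8, ... with the final value repeated once.

    Repeating the last value makes the durations sum to exactly 1.0,
    as they do in the listing.
    """
    values = [2.0 ** -(i + 1) for i in range(count - 1)]
    return values + [values[-1]]


pitches = [frequency(step) for step in MAJOR_SCALE_STEPS]
durations = halving_durations(len(pitches))

for hz, duration in zip(pitches, durations):
    print(f"(Pitch: {hz} Hz, Duration: {duration})")
```

Two short formulas produce every value in the listing, which is the point: the arithmetic is simple, but the results are not something a musician can read at a glance.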
Granularity of Objects: Just as a word processor operates mostly on characters, and a photo editor operates mostly on pixels, music software primarily operates on units of either notes (MIDI data) or waves (audio data), which makes perfect sense for a majority of use cases. This strategy means that the musician (rather than the software) is fully in control of note selection, which is obviously the divine right of the humans, and should not be trodden upon by the machines. (I’m talking to you, Magenta!)
This is a very granular approach, and I believe that it is the best default approach for making software tools for musicians. However, it does frequently fail to acknowledge the macro-level musical structures that the sequences of notes are actually building. Musicians train themselves for years to rise above and beyond the minutiae of individual note reading in order to interpret these larger structures efficiently. Where a computer sees a sequence of note data, a trained human musician tries to see much more: phrases, harmonic progressions, voice leading, and compositional forms.
More than that, humans also see analogies for human thoughts and emotions, narrative arcs, and evocations of human anatomy in motion through dance, and it is all encoded within the strange medium that is music.
Even in the machine learning space, there appears to be a (non-universal) tendency towards a somewhat naive, note-by-note concept of music in the way that training data is prepared. Amazingly, trained computers still produce compelling, recognizable results this way, but in the roughest spots you can frequently recognize the effect that this naivety is having on the generated output. As with an AI trained on a corpus of works from a great writer, what you get is a form of gibberish with a familiar structure. It is frankly more palatable with music due to its more abstract nature, but no less noticeable.
So, what’s a computer to do?
It’s a beautiful thing to be human. However, we can’t all be so lucky. Computers want to make music, too.
As a human that is significantly dumber than a trained computer, I have always relied on explicit sets of rules and relationship definitions to help me understand music, and to encode and decode information into its structure. Humans in certain parts of the globe call this field ‘Music Theory’.
As part of my enrollment in the ChiPy Mentorship Program (Fall 2018 semester), I will attempt to build a music theory library from the ground up in the Python programming language. The focus of this project will be on minimizing the pain points in interacting with textual representations of music data through specific object-oriented design choices, and on exploring the interaction between computers and the macro-level musical structures previously described.
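To give a rough sense of what an object-oriented, text-friendly representation could look like, here is a small hypothetical sketch. It is not the design of the library described in this post; the `Note` class, its fields, and the note-name format (scientific pitch notation with a single-digit octave, A4 = 440 Hz, equal temperament) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction

# Semitone distance of each natural note from A within the same octave number.
# Octave numbers change at C, so C through G sit below the A of that octave.
_SEMITONES_FROM_A = {"C": -9, "D": -7, "E": -5, "F": -4, "G": -2, "A": 0, "B": 2}


@dataclass(frozen=True)
class Note:
    """Hypothetical note: a readable name plus an exact fractional duration."""

    name: str           # e.g. "A4" or "C#5"; single-digit octaves only
    duration: Fraction  # e.g. Fraction(1, 4)

    @property
    def semitones_from_a4(self) -> int:
        # Split "C#5" into letter "C", accidental "#", octave 5.
        letter, accidental, octave = self.name[0], self.name[1:-1], int(self.name[-1])
        offset = _SEMITONES_FROM_A[letter] + 12 * (octave - 4)
        return offset + accidental.count("#") - accidental.count("b")

    @property
    def frequency(self) -> float:
        # Equal temperament relative to an assumed A4 = 440 Hz.
        return 440.0 * 2 ** (self.semitones_from_a4 / 12)


measure = [
    Note("A4", Fraction(1, 2)),
    Note("B4", Fraction(1, 4)),
    Note("C#5", Fraction(1, 4)),
]

for note in measure:
    print(note.name, note.duration, note.frequency)
```

The idea is that the person typing works with names like "C#5" and fractions like 1/4, while the long decimals are derived only when something actually needs them.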
As a proof of concept, I plan to focus my composition-specific efforts towards generating Baroque fugues from varied and limited human input. I’ve selected this form specifically for its strict and procedural nature, which I believe will increase my chances of being able to reliably generate something that ‘sounds good’. However, I hope that in the longer term, I will be able to explore generating multiple styles and forms, and I’d specifically love to explore generating less structured forms after the mentorship program has concluded.
I will chronicle my progress and challenges in a series of several blog posts in the coming weeks and months. The code is posted to GitHub at the link below. As this project is more of a personal exploration than anything, I welcome anyone interested in the subject matter (either musically or technologically) to reach out to me to talk, critique, or contribute!