The Arrival of Quantum Computer Music

Published in The Riff, May 13, 2020

1. Introduction

People seldom realize that musicians started experimenting with computing long before the emergence of the vast majority of the scientific, industrial, and commercial computing applications in existence today.

For instance, in the 1950s Lejaren Hiller and Leonard Isaacson, at the University of Illinois at Urbana-Champaign, programmed the ILLIAC computer to compose a string quartet entitled Illiac Suite. The ILLIAC, short for Illinois Automatic Computer, was one of the first mainframe computers built in the USA, comprising thousands of vacuum tubes. The Illiac Suite is often cited as a pioneering piece of algorithmic computer music.

Universities and companies all over the world have been welcoming musicians to join their research laboratories ever since. A notable early example is AT&T’s Bell Laboratories, in New Jersey, where in the 1960s composer Max Mathews developed programming languages for synthesising sounds.

The great majority of computer music pioneers were composers interested in inventing new music and/or innovative approaches to composing. They unwittingly paved the way for the development of a thriving global music industry. Nowadays, computers are omnipresent in almost every aspect of music.

Quantum Computing will most certainly have an impact on the way in which we create and distribute music in time to come. Hence the arrival of Quantum Computer Music is a natural progression for music technology.

2. Algorithmic Computer Music

Understandably, those early pioneers were interested in developing algorithms to generate music. Hence the term ‘algorithmic computer music’. Essentially, the art of algorithmic computer music consists of (a) harnessing algorithms to produce patterns of data and (b) developing ways to translate these patterns into musical notes or synthesised sound.

An early approach to algorithmic music is to program the computer with rules to generate sequences of notes [1]. Such rules can be expressed in a number of ways, including graphs, set algebra, Boolean expressions, finite state automata, and transition matrices, to cite but a few. For instance, consider the following set of 8 notes: {C3, D3, E3, F3, G3, A3, B3, C4}. Let us define the following rules for establishing which notes are allowed to follow a given note within the set:

Rule 1: if C3, then either C3, D3, E3, G3, or C4

Rule 2: if D3, then either C3, E3 or G3

Rule 3: if E3, then either D3 or F3

Rule 4: if F3, then either C3, E3 or G3

Rule 5: if G3, then either C3, F3, G3, or A3

Rule 6: if A3, then B3

Rule 7: if B3, then C4

Rule 8: if C4, then either A3 or B3

Each of the above rules defines the transition probabilities for the next note to occur in a sequence. For example, after C3, each of the five notes C3, D3, E3, G3, and C4 has a 20% chance of occurring.

The rules above can be expressed in terms of probability arrays. For instance, the probability array for note C3 is p(C3) = [0.2, 0.2, 0.2, 0.0, 0.2, 0.0, 0.0, 0.2].

The probability arrays for all 8 rules can be arranged in a two-dimensional matrix, thus forming a transition matrix, as shown in Figure 1.

Figure 1: A transition matrix representation of musical rules.
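
As an illustration, the rules and their transition matrix can be sketched in Python as follows. This is a minimal sketch and not the code of any system discussed here. It assumes that every allowed successor of a note is equally likely (the text states this only for C3) and that each row of the matrix holds the probability array of one current note. The names `RULES`, `probability_array`, `next_note` and `generate` are made up for this example.

```python
import random

NOTES = ["C3", "D3", "E3", "F3", "G3", "A3", "B3", "C4"]

# Rules 1 to 8 from the text: the notes allowed to follow each note.
RULES = {
    "C3": ["C3", "D3", "E3", "G3", "C4"],
    "D3": ["C3", "E3", "G3"],
    "E3": ["D3", "F3"],
    "F3": ["C3", "E3", "G3"],
    "G3": ["C3", "F3", "G3", "A3"],
    "A3": ["B3"],
    "B3": ["C4"],
    "C4": ["A3", "B3"],
}


def probability_array(note):
    """Equal probability for each allowed successor (assumption), zero elsewhere."""
    allowed = RULES[note]
    return [1.0 / len(allowed) if n in allowed else 0.0 for n in NOTES]


# Assumed layout: one row per current note, one column per candidate next note.
TRANSITION_MATRIX = [probability_array(n) for n in NOTES]


def next_note(current, rng=random):
    """Draw the next note according to the row of the current note."""
    row = TRANSITION_MATRIX[NOTES.index(current)]
    return rng.choices(NOTES, weights=row, k=1)[0]


def generate(start, length, rng=random):
    """Generate a sequence of the given length, beginning with the start note."""
    sequence = [start]
    while len(sequence) < length:
        sequence.append(next_note(sequence[-1], rng))
    return sequence


melody = generate("C3", 12)
```

Here `probability_array("C3")` reproduces the array p(C3) given above, and `next_note("C4")` can only return A3 or B3, in line with Rule 8.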

As computers became increasingly portable and faster, musicians started to program them to create music interactively, during a performance. For instance, given the transition matrix above, if the system hears the note C4, then it responds with either A3 or B3.

A sensible approach to get started with Quantum Computer Music activity is to revisit tried-and-tested algorithmic methods with a view to re-designing them in terms of quantum computing. Sooner or later new quantum-specific algorithms are bound to emerge.

3. qSyn: Quantum Sound Synthesis

qSyn is an interactive additive sound synthesiser with parameters supplied by a quantum hyper-die.

Additive synthesis is informed by Fourier theory. It is based on the notion that the sounds of music can be modelled as a sum of simple sinusoidal waves (Figure 2). Each sinusoidal wave is characterised by its amplitude and frequency. Different values produce perceptible differences in the timbre of the resulting sound [2].

Figure 2: An example of additive sound synthesis with 3 sinusoidal waves.
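
The principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three amplitude and frequency pairs in `PARTIALS` are hypothetical values chosen for the example (they are not the ones plotted in Figure 2), and the function names are mine.

```python
import math


def additive_sample(partials, t):
    """Value at time t (in seconds) of a sum of sinusoids.

    partials is a list of (amplitude, frequency_in_hertz) pairs.
    """
    return sum(a * math.sin(2.0 * math.pi * f * t) for a, f in partials)


def render(partials, duration_s, sample_rate=44100):
    """Render the summed sinusoids as a list of samples."""
    n = round(duration_s * sample_rate)
    return [additive_sample(partials, i / sample_rate) for i in range(n)]


# Hypothetical partials: a fundamental plus two harmonics.
PARTIALS = [(0.5, 220.0), (0.3, 440.0), (0.2, 660.0)]
samples = render(PARTIALS, 0.01)
```

Changing the amplitudes or frequencies in `PARTIALS` changes the shape of the summed wave, which is what is heard as a change of timbre.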

qSyn comprises 8 oscillators, each of which produces a sinusoidal wave. Each oscillator is controlled by two linear functions, one to handle the amplitude of the produced sinusoid and another to handle its frequency.

The linear functions are used to vary amplitudes and frequencies from initial to end values through the duration of the sinusoids. The outputs from the oscillators are summed. Then, a low-frequency oscillator and an ADSR envelope are applied to add naturalness to the sound [2].

Figure 3: Additive synthesiser with 8 oscillators.
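
A rough sketch of one such oscillator, and of the summation of several of them, is given below. It is not qSyn's actual code: the low-frequency oscillator and the ADSR envelope are left out, the names `ramp`, `oscillator` and `mix` are hypothetical, and the pairing of start and end values in the example call is arbitrary, chosen only to exercise the functions.

```python
import math


def ramp(start, end, i, n):
    """Linear function going from start (sample 0) to end (sample n - 1)."""
    if n <= 1:
        return start
    return start + (end - start) * i / (n - 1)


def oscillator(f_start, f_end, a_start, a_end, n, sample_rate=44100):
    """Sinusoid whose frequency and amplitude both vary linearly over n samples."""
    out = []
    phase = 0.0
    for i in range(n):
        freq = ramp(f_start, f_end, i, n)
        amp = ramp(a_start, a_end, i, n)
        out.append(amp * math.sin(phase))
        # Accumulate phase so that the frequency can change smoothly.
        phase += 2.0 * math.pi * freq / sample_rate
    return out


def mix(oscillator_params, n, sample_rate=44100):
    """Sum the outputs of several oscillators, sample by sample."""
    voices = [oscillator(*p, n, sample_rate) for p in oscillator_params]
    return [sum(v[i] for v in voices) for i in range(n)]


# Two oscillators only, for brevity: (f_start, f_end, a_start, a_end).
sound = mix([(55.0, 277.18, 0.06, 0.2), (82.4, 369.99, 0.08, 0.1)], 4410)
```

The phase is accumulated sample by sample rather than computed as 2πft, because the frequency is not constant over the duration of the sound.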

At the core of the quantum hyper-die is a quantum circuit that puts 9 qubits in superposition, using the Hadamard gate [3], and measures them (Figure 4). This results in a set of 9 measurements, which is fed into an algorithm that calculates binary triplets. Then, these triplets are used to retrieve parameters for the synthesiser.

Figure 4: The circuit for the hyper-die.
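
For readers without access to quantum hardware, the behaviour of the hyper-die can be mimicked classically. The sketch below is a stand-in and not a quantum program: a qubit prepared in ⎮0⟩, passed through a Hadamard gate and then measured yields 0 or 1 with equal probability, so each qubit is replaced here by a pseudo-random bit. The function name `hyper_die` is hypothetical.

```python
import random


def hyper_die(n_qubits=9, rng=random):
    """Classical stand-in for the circuit: one fair random bit per qubit.

    The returned list is read as [c8, c7, ..., c0] when n_qubits is 9.
    """
    return [rng.randint(0, 1) for _ in range(n_qubits)]


def roll(times, n_qubits=9, rng=random):
    """Run the die several times, one list of measurements per run."""
    return [hyper_die(n_qubits, rng) for _ in range(times)]


measurements = roll(4)
```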

For instance, consider the list of measurements C = [c8, c7, c6, c5, c4, c3, c2, c1, c0] and a given list of 8 frequencies Freq = [f0, f1, f2, f3, f4, f5, f6, f7]. The triplets are formed by combining 3 elements from the list C. For example, (c8 c7 c6), (c6 c7 c8), (c5 c4 c3), (c3 c4 c5), (c2 c1 c0), (c0 c1 c2) and so forth. The decimal value of a triplet gives an index to retrieve a frequency fn from the list Freq. For instance, the triplet (0 1 0), which yields the decimal number 2, would retrieve f2.

Each of qSyn’s synthesis parameters is coupled with a unique triplet formation (Table 1). For instance, (c8 c7 c6) is coupled with the starting frequency for oscillator number 1 and (c6 c7 c8) with the ending frequency for this oscillator, and so on.

Let us assume that there are 8 arrays of frequencies available (Freq_n), one for each oscillator. And there is an array of amplitudes Amp, which serves all oscillators, as follows:

Freq_1 = [55.0, 277.18, 220.0, 329.63, 164.81, 277.18, 220.0, 329.63]

Freq_2 = [82.4, 369.99, 293.67, 196.0, 466.16, 369.99, 293.67, 196.0]

Freq_3 = [87.3, 349.23, 277.18, 440.0, 87.3, 349.23, 277.18, 440.0]

Freq_4 = [92.49, 415.3, 329.63, 233.08, 523.25, 329.63, 233.08, 523.25]

Freq_5 = [435.53, 1468.32, 1038.26, 1959.97, 2330.81, 1468.32, 1038.26, 1959.97]

Freq_6 = [440.0, 2217.46, 1760.0, 2637.02, 1318.51, 2217.46, 1760.0, 2637.02]

Freq_7 = [435.53, 1746.14, 1385.91, 2200.0, 435.53, 1746.14, 1385.91, 2200.0]

Freq_8 = [741.66, 2354.63, 1571.52, 3143.05, 3960.0, 2354.63, 1571.52, 3143.05]

Amp = [0.06, 0.08, 0.1, 0.12, 0.14, 0.16, 0.18, 0.2]

Frequencies are given in Hertz and amplitudes as scaling values between 0.0 and 1.0.

Suppose that we want the system to synthesise 4 sounds. Thus, the system runs the hyper-die 8 times: 4 times to produce a set of C measurements to retrieve frequencies and 4 times to produce a set of D measurements to retrieve amplitudes and durations (the duration array is not shown).

Let us assume the following measurements:

C = {[0, 0, 0, 0, 0, 1, 0, 0, 1], [0, 1, 1, 1, 1, 1, 0, 1, 0], [0, 0, 1, 0, 1, 1, 1, 1, 1], [1, 1, 1, 0, 1, 0, 0, 1, 1]}

D = {[0, 0, 1, 0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 1, 1, 1, 1, 0], [1, 1, 0, 1, 1, 1, 0, 0, 0], [0, 0, 1, 1, 1, 0, 1, 0, 0]}

Next, the system produces the triplets and retrieves the parameters for the synthesiser. For instance, in the first run, the triplet (c8 c7 c6) is equal to 000. Therefore, it retrieves the first element of Freq_1 for the starting frequency of oscillator 1, which is 55.0 Hz. Table 1 shows the retrieved frequency and amplitude values for the first sound.

Table 1: Retrieved parameter values for synthesising a sound.
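
The retrieval step for oscillator 1 can be sketched as follows. The sketch assumes that the first bit of a triplet is the most significant one; the example in the text, (0 1 0) giving 2, is compatible with this reading but does not settle it. Only the two couplings stated in the text are used, and the helper names `bit` and `triplet_index` are hypothetical.

```python
FREQ_1 = [55.0, 277.18, 220.0, 329.63, 164.81, 277.18, 220.0, 329.63]

# First run of the hyper-die from the text, ordered [c8, c7, ..., c0].
C_RUN_1 = [0, 0, 0, 0, 0, 1, 0, 0, 1]


def bit(measurements, k):
    """Return c_k from a list ordered [c8, c7, ..., c0]."""
    return measurements[len(measurements) - 1 - k]


def triplet_index(measurements, positions):
    """Decimal value of the triplet (c_a c_b c_c), first bit most significant."""
    a, b, c = (bit(measurements, k) for k in positions)
    return 4 * a + 2 * b + c


# (c8 c7 c6) gives the starting frequency of oscillator 1,
# (c6 c7 c8) gives its ending frequency.
start_frequency = FREQ_1[triplet_index(C_RUN_1, (8, 7, 6))]
end_frequency = FREQ_1[triplet_index(C_RUN_1, (6, 7, 8))]
```

For this run both triplets are 000, so both the starting and the ending frequency of oscillator 1 are 55.0 Hz.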

Figure 5 plots the cochleagram of the resulting sounds. Notice the effect of the synthesiser’s linear functions on the spectrum of the sounds. Salient components of the spectrum are clearly shown sliding upwards or downwards.

Figure 5: The resulting 4 sounds. Salient spectral components sliding upwards or downwards can be clearly seen on the cochleagram.

4. qSeq: Machine Learning of Musical Form

qSeq is an intelligent musical sequencer that listens to a tune and responds with a sequence of notes.

The system extracts 3 features from the tune: the pitches of the notes, their durations, and their loudness. With this information, the system builds transition matrices representing how often those features occurred in the tune.

The matrices are converted into a list of angles [ϑ1, ϑ2, …, ϑ18], which are used to rotate qubits in a quantum circuit representing transition probabilities in terms of quantum states (Figure 6). This method is based on a system proposed by James Weaver.

The gate RY(ϑ) rotates a qubit by a given angle ϑ around the y-axis on the Bloch sphere [3]. The circuit is armed and measured as many times as the required number of notes to be produced for a response. Each time the circuit is measured it generates a note. For instance, if qSeq is to generate 20 notes, then the circuit is armed and measured 20 times. To start with, all qubits are put in superposition before being gated through the rest of the circuit. Then, after the first round, the resulting measurements are used to arm the qubits for the next round, and so on.
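
To give a feel for what a rotation angle means in terms of probabilities, the sketch below computes the effect of a single RY(ϑ) gate on a qubit prepared in ⎮0⟩, using the standard result RY(ϑ)⎮0⟩ = cos(ϑ/2)⎮0⟩ + sin(ϑ/2)⎮1⟩. It deals with one isolated qubit only and does not reproduce the circuit of Figure 6; the function names are hypothetical.

```python
import math


def ry_on_zero(theta_degrees):
    """Amplitudes (of |0> and of |1>) after applying RY(theta) to |0>."""
    half = math.radians(theta_degrees) / 2.0
    return math.cos(half), math.sin(half)


def probability_of_one(theta_degrees):
    """Probability of measuring 1 after RY(theta) has been applied to |0>."""
    _, amplitude_one = ry_on_zero(theta_degrees)
    return amplitude_one ** 2


p_half_turn = probability_of_one(180)    # qubit flipped to |1>
p_quarter_turn = probability_of_one(90)  # equal superposition
```

An angle of 180 degrees therefore flips the qubit, whereas an angle of 90 degrees leaves it equally likely to be measured as 0 or 1.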

As a demonstration, let us consider the case where the system listened to the tune shown in Figure 7.

Figure 7: The opening theme of Beethoven’s 5th Symphony.

In this case the system extracted the following information (P = pitches, D = durations and L = loudnesses):

P = [67, 67, 67, 63, 65, 65, 65, 62, 67, 67, 67, 63, 68, 68, 68, 67, 75, 75, 75, 72]

D = [298, 301, 302, 1798, 302, 297, 301, 1799, 302, 303, 296, 302, 297, 302, 298, 301, 297, 301, 297, 1799]

L = [113, 113, 113, 105, 113, 113, 113, 107, 61, 61, 61, 57, 64, 63, 63, 61, 70, 68, 67, 60]

Pitches and loudnesses are represented in terms of MIDI codes [6] and durations in terms of milliseconds.

The system uses 4 x 4 matrices. Higher dimensions would require circuits with greater depth (i.e., a greater number of gate operations), which would increase the effect of quantum decoherence [4]. Improved quantum hardware and better error correction methods will allow for circuits with greater depth in the future.

Thus, the amount of musical information needs to be reduced: the system processes only the first four different elements of each list. For instance, the first four different elements in P are 67, 63, 65, and 62. The system keeps these in the list and removes all others, as follows:

P’ = [67, 67, 67, 63, 65, 65, 65, 62, 67, 67, 67, 63, 67]
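
This reduction step can be sketched as follows; the function name `keep_first_distinct` is hypothetical, and the code is an illustration of the step as described rather than the system's own implementation.

```python
P = [67, 67, 67, 63, 65, 65, 65, 62, 67, 67,
     67, 63, 68, 68, 68, 67, 75, 75, 75, 72]


def keep_first_distinct(values, limit=4):
    """Keep only the elements equal to one of the first `limit` distinct values.

    Returns the reduced list and the list of retained values, in order of
    first appearance.
    """
    kept = []
    for v in values:
        if v not in kept and len(kept) < limit:
            kept.append(v)
    return [v for v in values if v in kept], kept


P_reduced, pitch_states = keep_first_distinct(P)
```

Applied to P, it yields the list P’ shown above, together with the four retained pitches 67, 63, 65 and 62.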

Next, the system builds matrices counting the number of times a certain element in the horizontal axis followed another in the vertical axis (Figure 8); e.g., pitch 63 followed pitch 67 twice.
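
The counting can be sketched as below. The sketch assumes that rows stand for the preceding element and columns for the element that follows it; the function name `transition_counts` is hypothetical.

```python
P_REDUCED = [67, 67, 67, 63, 65, 65, 65, 62, 67, 67, 67, 63, 67]
STATES = [67, 63, 65, 62]  # the four retained pitches, in order of appearance


def transition_counts(sequence, states):
    """counts[i][j] = number of times states[j] followed states[i]."""
    index = {s: i for i, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for previous, following in zip(sequence, sequence[1:]):
        counts[index[previous]][index[following]] += 1
    return counts


pitch_counts = transition_counts(P_REDUCED, STATES)
# pitch_counts[0][1] is the number of times 63 followed 67.
```

For the list P’ the entry for 63 following 67 is 2, which matches the example given in the text.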

The next step is to convert each of the 3 matrices into a bistochastic matrix. Figure 9 shows the matrix for pitches.

Figure 9: Bistochastic matrix for pitches.

Then, 3 identity matrices are created, defining vector spaces with 6 degrees of freedom each. Bistochastic and respective identity matrices are gradually rotated until the difference between the corresponding entries of each matrix is minimised. The resulting angles for each degree-of-freedom rotation are the angles ϑ for the RY gates of the quantum circuit in Figure 6.

The resulting angles in degrees for our example are as follows:

ϑP’ = [243, 197, 243, 186, 180, 249]

ϑD’ = [237, 203, 128, 203, 169, 249]

ϑL’ = [220, 180, 157, 180, 140, 123]

The complete list of angles for the 18 RY gates of the quantum circuit is obtained by concatenating ϑP’ ⨆ ϑD’ ⨆ ϑL’. Thus ϑ = [243, 197, 243, 186, 180, 249, 237, 203, 128, 203, 169, 249, 220, 180, 157, 180, 140, 123].

To begin with, the 6 qubits are put in superposition. In subsequent rounds, the circuit is armed with qubits in states echoing the measurements produced in the most recent round. Thus, if a given round produced C = [c5, c4, c3, c2, c1, c0], then the circuit will be armed for the next round with |c4⟩|c5⟩|c2⟩|c3⟩|c0⟩|c1⟩. For instance, consider the first 2 rounds of our example:

Round 1:

Initial default quantum state: H(|0⟩) ⨂ H(|0⟩) ⨂ H(|0⟩) ⨂ H(|0⟩) ⨂ H(|0⟩) ⨂ H(|0⟩)

Results from measurements: C = [0, 1, 1, 0, 1, 0]

Round 2:

Quantum state: |1⟩|0⟩|0⟩|1⟩|0⟩|1⟩

Results from measurements: C = [1, 1, 1, 1, 0, 0]
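
The re-arming rule amounts to swapping the two bits within each consecutive pair of measurements. A minimal sketch, with the hypothetical function name `arm_next_round`:

```python
def arm_next_round(measurements):
    """Given [c5, c4, c3, c2, c1, c0], return [c4, c5, c2, c3, c0, c1].

    The returned bits are the basis states used to prepare the qubits
    for the next round.
    """
    armed = []
    for i in range(0, len(measurements), 2):
        armed.extend([measurements[i + 1], measurements[i]])
    return armed


round_1 = [0, 1, 1, 0, 1, 0]
state_for_round_2 = arm_next_round(round_1)
```

With the measurements from Round 1 this gives 1, 0, 0, 1, 0, 1, that is, the quantum state listed for Round 2.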

The measurements from each round embody binary codes that are used to generate a musical response. In order to generate a note, the system retrieves its pitch and duration from pre-established sets of pitches (Figure 11) and durations (Figure 10), respectively.

Figure 10: The set of durations contains an eighth note (1/2 beat), a quarter note (1 beat), a half note (2 beats), and a dotted half note (3 beats).

Let us consider the result from the first round of measurements shown above: C = [0, 1, 1, 0, 1, 0]. The binary codes are defined as follows:

(c0 c1) = code to establish the pitch subset to retrieve the pitch of a note from

(c4 c5) = code to retrieve the pitch of a note

(c2 c3) = code to retrieve the duration of a note

In this case, code (c0 c1) = 01 indicates that pitch will be retrieved from subset B, and code (c4 c5) = 10 determines that the pitch of this note is Eb. And the code (c2 c3) = 01 determines that this is a quarter note. The full response to Beethoven’s tune is shown in Figure 12.

Figure 12: The musical response consisting of 20 notes.
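
The extraction of the three codes from a round of measurements can be sketched as follows. Two assumptions are made: the duration code is read as a binary number that indexes the durations in the order listed in the caption of Figure 10 (this agrees with 01 giving a quarter note), and the pitch sets of Figure 11 are not reproduced, so the sketch returns the subset and pitch codes without looking up a pitch. The names `bit`, `code` and `decode_round` are hypothetical.

```python
# Order taken from the caption of Figure 10 (assumed to match codes 00 to 11).
DURATIONS = ["eighth note", "quarter note", "half note", "dotted half note"]


def bit(measurements, k):
    """Return c_k from a list ordered [c5, c4, c3, c2, c1, c0]."""
    return measurements[len(measurements) - 1 - k]


def code(measurements, first, second):
    """Two-bit code (c_first c_second) as a string such as '01'."""
    return f"{bit(measurements, first)}{bit(measurements, second)}"


def decode_round(measurements):
    """Split one round of measurements into the three codes from the text."""
    duration_code = code(measurements, 2, 3)
    return {
        "subset_code": code(measurements, 0, 1),
        "pitch_code": code(measurements, 4, 5),
        "duration_code": duration_code,
        "duration": DURATIONS[int(duration_code, 2)],
    }


note = decode_round([0, 1, 1, 0, 1, 0])
```

For the first round this gives the subset code 01, the pitch code 10 and the duration code 01, which the text maps to subset B, the pitch Eb and a quarter note, respectively.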

5. The composition Zeno

Zeno is a piece for bass clarinet and electronic sounds. During the performance, the system listens to the bass clarinet and generates responses on the fly.

The interactive system consists of two components: a client and a server. The client runs on a standard laptop computer and the server runs on Rigetti’s Forest quantum computer, which is located in Berkeley, California. The server runs the two quantum systems introduced above. And the client takes care of the music and sound processing tasks.

The client listens to musical phrases via a microphone. Then, it extracts information from the audio signal, prepares the data, and relays them to the server. The server receives the data, runs the required quantum circuit, and sends the measurement results back to the client. Next, the client uses the measurements to compose the response. Together with the data relayed to the server, the client also sends an instruction indicating which circuit to run and other practical instructions. Upon receiving the measurements, the client then activates the respective generative algorithm and/or synthesiser to produce the response (Figure 13).

In addition to the additive qSyn synthesiser introduced above, the client also holds a synthesiser (qWav) that simulates the sounds of the clarinet using the Waveguides synthesis method [2].

The musical score contains performance instructions. It pre-defines which circuit to use and when. In addition to prescribing pre-set musical phrases to be played on the bass clarinet, there are moments where the performer is asked to improvise with a given set of musical notes. It is these improvisations that are listened to and processed by the system.

The extract in Figure 14 shows an example where the performer is asked to improvise with a given set of 7 notes for 5 seconds. The system is programmed to enter listening mode for 5 seconds (indicated by a box labelled “#5 QC Listening” on the lower staff). The sound it hears is processed, the client and server perform their jobs, and a sound response is produced on the fly. As soon as the response starts, the performer is asked to pick notes from set A or B and improvise with them along with the quantum response.

Figure 14: Extract from the musical score showing an example of a performer’s choice.

5.1. Sarah’s Trick

Monophonic musical instruments, such as the flute or the clarinet, produce 1 note at a time. However, skilled players are able to produce multiphonics on these instruments; that is, more than one note at once. On a wind instrument, this can be achieved by blowing air through the device in such a way that the resulting spectrum is split around more than one prominent, or fundamental, frequency. The bass clarinet is an interesting case in point [5].

Sarah Watts, the performer I wrote this piece for, is able to play two distinct multiphonics variants with the same fingering position on the keys of the bass clarinet. But it is often the case that she cannot predict accurately which of the two sets of multiphonics will result. Figure 15 shows an example of this. Given a base note B2, the same fingering position produces either an additional A5 (top stave) or a slightly sharped D4 (bottom stave).

Figure 15: Two distinct multiphonics variants are produced with the same fingering position on the keys of the bass clarinet.

Whereas the unstable nature of such multiphonics might be regarded as an inconvenience, it actually comes in handy for a quantum-compliant musical scenario.

Let us imagine a situation where the player encounters an instruction to produce multiphonics with a specific fingering position. In a way, the act of following this instruction is analogous to measuring a quantum system: the performer is not certain which multiphonics will be produced until she actually plays the instrument. Even if the performer steers her efforts towards producing certain multiphonics, this would only increase the probability of obtaining them. Metaphorically, the multiphonics for that specific fingering position are in superposition until a performer plays them, or observes them, if you like. The extract in Figure 16 shows an example of this in Zeno.

In Figure 16, the performer is asked to play multiphonics using a specific fingering position. The 2 possible top multiphonics are shown within brackets on the left side of the figure. After a brief moment of uncertainty (indicated by the wavy line), the multiphonics settles into one or the other, indicated as “a” or “b”. After a few moments, the performer then picks either note set A or B to improvise. However, this choice is dictated here by multiphonics. If the multiphonics settled with the higher note A5 then the performer should improvise for 15 seconds with notes from set A. Otherwise, she should improvise with notes from set B. The system listens to this improvisation (#16 QC Listening) and synthesises a sound. In this case, it does so using the Waveguides synthesiser.

Figure 16: Extract from the musical score showing an example using multiphonics.

An example of a section using qSeq is shown in Figure 17. In this case, the system listens to the performer improvising with notes from a given set for 30 seconds. Then, it processes the listened sequence and generates a response lasting for 1 minute. The system’s response is a sequence of musical notes encoded and saved as MIDI information [6].

Figure 17: Extract from the musical score with free improvisation alongside materials generated by qSeq.

The MIDI file was loaded into music production software and channeled straight into a stack of electronic music instruments, drum machines, and samplers. Each MIDI note triggers a few of these in polyphony.

6. Final Remarks

It is often said that today’s quantum computers are at a development stage comparable to that of the clunky mainframes built in the middle of the last century. The time is ripe for musicians to embrace this emerging technology. We at the Interdisciplinary Centre for Computer Music Research (University of Plymouth, UK) are on a mission to develop unprecedented new uses for quantum computing technology for music and creativity. (Incidentally, should you wish to sign up to study with us, please do not hesitate to get in touch.)

Admittedly, the quantum systems introduced above could as well be implemented on standard digital computers. At this stage, I am not advocating any quantum advantage for musical applications. What I advocate, however, is that musicians interested in using technology should be quantum-ready for when quantum computing hardware becomes more widely available and possibly advantageous. In the process of learning and experimenting with emerging quantum computing systems, novel approaches, creative ideas, and innovative applications are bound to emerge. Watch this space.

I would like to thank Rigetti Computing for supporting this research with priority access to quantum computing hardware and technical advice. Special thanks to Amy Brown, Tushar Mittal and Tom Lubowe for fruitful discussions.

Many thanks to bass clarinetist Sarah Watts for sharing her knowledge on multiphonics and contributing to the composition of Zeno.

Also, thanks to James Hefford, James R. Wootton and James Weaver for comments and suggestions on the full version of this article.

A full version of this article is available on arXiv: arXiv:2005.05832

References

[1] Miranda, E. R. (2001). Composing Music with Computers. Oxford (UK): Elsevier / Focal Press.

[2] Miranda, E. R. (2002). Computer Sound Design: Synthesis techniques and programming. Oxford (UK): Elsevier / Focal Press.

[3] Rigetti’s documentation for the Forest SDK: http://docs.rigetti.com/

[4] Bernhardt, C. (2019). Quantum Computing for Everyone. Cambridge, MA: The MIT Press.

[5] Watts, S. (2016). Spectral Immersions: A comprehensive Guide to the Theory and Practice of Bass Clarinet Multiphonics with Fingering Charts. Puurs (Belgium): Metropolis Music Publishers.

[6] The official MIDI specification. MIDI Association: https://www.midi.org/