AI/ML music series 2: Imitation Game by Artemi-Maria Gioti

sandris murins · Published in 25 composers · Feb 23, 2021

Read my interview with Artemi-Maria Gioti about her piece “Imitation Game” (2018), an interactive composition for a human and a robotic percussionist that incorporates machine learning and musical robotics. The composition is based on a dynamic form, shaped in real time by decisions made by the musician and the robotic percussionist. The robotic percussionist interacts with the human by means of machine listening, specifically a neural network trained to recognize different instruments (cymbals, bongos and cowbells) and playing techniques (strokes, scraping and bowing), while the musician interprets a non-linear score that allows him or her to adapt to the robotic percussionist’s actions in real time.

The piece was part of her doctoral research, which, in turn, is part of a larger artistic research project called Inter-Agency, funded by the Austrian Science Fund. The composer says that both her doctoral research and this project explore artificial intelligence and machine learning as a means of enhancing human-machine communication in musical works for instrumentalists, computer and live electronics.

The text version of the interview was created by Armands Stefans Sargsuns.

What is the main message of this piece?

I don’t know if there is an essential message as such, but the piece is really about the negotiation of intentions between two actors. In this case the two actors are the human and the robotic percussionist, so the piece is based on a reciprocal and conversational interaction between the two. The idea here is that the computer can choose among following the human percussionist’s lead, playing something similar and imitating the human, or just initiating musical changes and proposing new sound material. Ultimately, the robotic percussionist can choose among three interaction scenarios — imitate, repeat and initiate. The musician also follows a non-linear score that allows him to adapt to the robotic percussionist’s actions in real time, so the piece is based on a dynamic form. The form of the piece results from these decisions that the musician and the robotic percussionist make in real time; it’s a result of the musical negotiation between them.

How did you build the robot?

The robot is very much a DIY project. I built both the hardware and the software for this piece. The hardware is mostly servomotors, controlled by an Arduino Uno. These motors control or move different mallets and some wire brushes. There are also two permanent magnets suspended over a cymbal and controlled by electromagnets underneath, which creates a sort of irregular scraping sound on the cymbal. The software is written in SuperCollider, and the neural networks, as well as all the training, were done in Python.
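
The piece’s control software is written in SuperCollider, so the following is only a rough illustration of the host-to-Arduino signal path described above: a minimal Python sketch using pyserial, with a made-up port name and an assumed one-byte command protocol rather than the piece’s actual code.

```python
# Hedged sketch only: sending a hypothetical one-byte servo command to an
# Arduino Uno over USB serial. Port name, baud rate and protocol are assumptions.
import time

import serial  # pyserial

arduino = serial.Serial("/dev/ttyACM0", baudrate=9600, timeout=1)
time.sleep(2)  # give the Arduino time to reset after the port opens


def strike(servo_angle: int) -> None:
    """Send a single servo angle (0-180) as one byte; the Arduino firmware is
    assumed to map it to a mallet position."""
    arduino.write(bytes([max(0, min(180, servo_angle))]))


strike(90)  # raise the mallet
strike(30)  # let it fall onto the instrument
```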

Essentially, the robot uses a neural network that has been trained to recognize the different types of percussion instruments and different playing techniques, such as strokes, scraping and bowing. Based on this sound-event-level recognition, the robotic percussionist then calculates some metrics of musical contrast that concern larger musical phrases, and it monitors how musical contrast evolves through the piece. Let’s say, for example, that rhythmic contrast has been constant for a while; this means the musician has been playing the same rhythm for too long, so the robotic percussionist becomes more likely to introduce different rhythms and different sound material in order to make their interaction a bit more interesting and more variable. That’s the auditory processing and decision-making stage of the robot percussionist in a nutshell.
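
The interview does not spell out the contrast metrics or decision rules, so the Python sketch below is only a hedged illustration of that monitor-and-decide loop; the contrast formula, window sizes and probabilities are placeholders of my own, not the composer’s.

```python
# Illustrative sketch of the decision logic described above; all formulas and
# thresholds are assumptions for demonstration, not the piece's implementation.
import random
from collections import deque

recent_iois = deque(maxlen=16)  # inter-onset intervals of classified strokes


def rhythmic_contrast(iois) -> float:
    """Toy contrast measure: normalised spread of inter-onset intervals."""
    if len(iois) < 2:
        return 0.0
    mean = sum(iois) / len(iois)
    var = sum((x - mean) ** 2 for x in iois) / len(iois)
    return (var ** 0.5) / mean if mean > 0 else 0.0


def choose_scenario(contrast_history) -> str:
    """Pick among the three interaction scenarios; the longer contrast stays
    flat, the more likely the robot is to initiate new material."""
    stagnation = sum(1 for c in contrast_history[-8:] if c < 0.1)
    p_initiate = min(0.8, 0.1 + 0.1 * stagnation)
    r = random.random()
    if r < p_initiate:
        return "initiate"
    return "imitate" if r < p_initiate + 0.5 * (1 - p_initiate) else "repeat"


# Example: a long stretch of identical note values pushes toward "initiate".
recent_iois.extend([0.25] * 12)
history = [rhythmic_contrast(list(recent_iois))] * 10
print(choose_scenario(history))
```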

Watch the full interview:

You will find the video recording of Imitation Game below.

Were you the one who did the training?

I trained the robotic percussionist myself with the help of a musician. I have to say that the percussionist who performed this piece is Manuel Alcaraz Clemente and I’m very grateful to him for his participation in this part of the process.

We would just get together and record a lot of examples of the different classes that we wanted the neural network to recognize. Manuel would play a lot of strokes on the cowbell, then a lot of strokes on the cymbals, the same for the bongos. We would use different types of mallets to help the neural network better differentiate. It was just the process of recording examples, training the neural network, doing some error analysis and then recording more examples until it reached a level of accuracy that I considered was good enough for the purposes of the piece.

How big was the database that you needed to create for training the robot?

I have to be honest, I don’t remember. It was definitely a few hours of material, because there are two components. There are the different instruments and then the different playing techniques, so there’s definitely a few hours of material, but I don’t remember exactly how many training examples that added up to, if I’m honest.

What kind of software did you use?

That was all done in Python, actually. I programmed the neural network from scratch and did all the training in Python, but the piece runs in SuperCollider, which is a programming language made specifically for music.
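
The composer’s actual network is not reproduced here, but as a hedged sketch of what “programming a neural network from scratch in Python” can look like, here is a minimal NumPy classifier with one hidden layer trained by gradient descent; the layer sizes, features and labels are toy stand-ins, not the data or architecture used in the piece.

```python
# Minimal from-scratch classifier sketch; everything here is a placeholder.
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 20-dimensional feature vectors, 3 classes
# (think cymbal / bongo / cowbell).
X = rng.normal(size=(300, 20))
y = rng.integers(0, 3, size=300)
Y = np.eye(3)[y]  # one-hot labels

W1 = rng.normal(scale=0.1, size=(20, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 3));  b2 = np.zeros(3)


def forward(X):
    """One hidden tanh layer followed by a softmax output."""
    h = np.tanh(X @ W1 + b1)
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return h, e / e.sum(axis=1, keepdims=True)


lr = 0.1
for _ in range(300):
    h, p = forward(X)
    g_logits = (p - Y) / len(X)               # softmax + cross-entropy gradient
    gW2, gb2 = h.T @ g_logits, g_logits.sum(axis=0)
    g_h = (g_logits @ W2.T) * (1 - h ** 2)    # tanh derivative
    gW1, gb1 = X.T @ g_h, g_h.sum(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, p = forward(X)
print("training accuracy:", (p.argmax(axis=1) == y).mean())
```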

Would you say that you needed programming skills in order to create this piece?

I guess you could say that. I think there would have been an easier way to do this. I don’t think that in order to work with machine learning you have to be able to program your own neural networks in Python, but for me it’s something that I was really interested in and I really wanted to be able to understand the algorithms. I thought that I could do that, if I was able to program them from scratch. This is because it’s not my only piece that involves AI and machine learning. It’s something that I have been interested in for a long time now, so understanding the inner workings of the algorithms is very central to my work and I think it, also, influenced some of the work I did later on. In this piece the neural network is doing something very standard, a normal task for neural networks, but later in my work I decided that I wanted to explore more of the specificities of neural networks, even the limitations. Recently, I finished a piece, dealing with the concept of AI bias, for example. I think that it is this sort of knowledge of the algorithms and the ability to really go into depth in them that led my compositional thoughts to these ideas later.

What kind of learning technique did you use?

It is supervised learning. I provided the neural network with training examples and labels, so I gave it examples of a certain sound together with a label saying what it is, for example, a bongo stroke. So it is supervised learning, specifically classification.

Why did you choose this specific setup for the piece?

If you notice in the video of the piece, the human and the robotic percussionist have basically an almost identical setup. This is meant to accentuate the symmetrical relationship between them. The piece is really based on an equal and reciprocal interaction between them, meaning that the form of the piece and what happens next depends on both the musician and the robotic percussionist. Both can decide to introduce new sound material to initiate musical changes and so on. This setup is mirroring that relationship between them in the piece.

How did you come up with the idea of the piece?

If you want to work with musical robotics, percussion is a really good way to do that, because it would be much harder to build a robot that plays flute, for example. It was definitely this idea of putting both the human and the computer on the same domain, that they are both producing acoustic sounds, that was very interesting to me. Again, it has to do with the concept of the piece which is this equal interaction, it’s reciprocal interaction between human and machine.

What was the process of composing?

It usually starts with an idea. For me this idea can be quite abstract at first. Then, of course, there was a technical part, which means that I had to collect sound examples, I had to train the neural networks. This has to be done before I can start working on the piece, because it was really based on this idea of instrument and playing technique recognition. Then I would say that the largest part of the compositional process is a reciprocal process between software and hardware development and artistic experimentation. This process involved Manuel, the percussionist, as well, so what I would do during the compositional process is that I would come up with some interaction scenarios, some specific behaviours for the robotic percussionist and I would implement them, I would make like a first version of the software and then I would invite the musician to participate in what Hsu and Sosnick call a “naive rehearsal”.

A “naive rehearsal” is basically an improvisation in which the musician is not given any information on how the robotic percussionist is going to respond to his actions. I just asked him to improvise with the robotic percussionist for 10 minutes and I collected data from these sessions through observation. I observed their interaction, but I also conducted a semi-structured interview with the musician and asked him to fill in a questionnaire. During this process I was interested in exploring any gaps between what I intended and what the musician perceived, that is, how he perceived and interpreted the interaction or the responses of the robotic percussionist. I was also interested in experimenting aesthetically and just trying out my ideas and seeing if they worked musically, or if they needed to be revised. Of course, as a part of this process a lot of these initial ideas were revised, some were rejected and I even came up with new ideas. It was really this iterative process of implementing things, trying them out and assessing them aesthetically.

How did you decide on the musical language for this piece?

So that’s interesting… I don’t know if I can really describe that as a language, but I think I understand what you mean. When I program these things I have specific interactions in mind, but I also want to see what the musician might discover. Sometimes I design things for a certain purpose, but then the musician makes something else out of it, so I want to explore that thing. I think this can really be helpful for creative discovery, so at this point of the compositional process I have to say I was not super interested in the musician doing what I had planned for the piece, but just doing his own thing and allowing any unintended affordances to emerge, and things that I might not have necessarily intended to go this way. Some of them were actually very interesting for the creative process.

There was one instance, in particular, where during one of these “naive rehearsals” the musician mistakenly thought that the robotic percussionist was repeating his actions one by one. It just happened that the musician, for example, played a few strokes on the bongos and the computer did the same and it led him to think that the robotic percussionist was really repeating after him. That was not what the robotic percussionist was doing, it was just coincidence at that moment, but this misinterpretation led the musician to behave in a very specific way and that created a very interesting counterpoint between the two. Later on, this gave me the idea to create the repetition scenario and integrate it into the composition. In its final variation in the composition this scenario is this sort of improvised interaction in which the computer will repeat some of the musician’s actions, but not all of them. This is just an example of how this collaboration with the musician really informed the compositional process.

Could one say that the whole setup has an improvisational element to it?

There are some things that are notated, but this is very much the idea and this is what I’m looking for. It’s this element of surprise. And not just for the audience, but also for me, because every time I hear this piece it’s different. It’s not so different that you can’t recognize it as the same piece anymore, but it is a different experience. This is something I’m very interested in.

Also it has to do with the musician being a part of this process and the musician becoming a co-creator during the performance. The work is the product of a co-creative effort that involves me as a composer, the musician and, of course, the robot, because the decisions and actions that the robotic percussionist performs also shape the performance.

What is the form of the piece?

There is no specific form to the piece, except for the end. The piece is ended by the robotic percussionist — the performer has no influence on that. It’s the robotic percussionist’s decision. The robotic percussionist will end the piece after a certain amount of time, somewhere around 15 minutes — it’s not the same at every performance. However, for the main body of the piece there is no predetermined form, so the form is really the result of the in-the-moment interaction between the two.
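
Purely as an illustration of that single fixed formal decision, a tiny Python sketch; the distribution and bounds are assumptions, not taken from the piece.

```python
# Hypothetical sketch: the robot ends the piece after roughly 15 minutes,
# with the exact duration varying from performance to performance.
import random


def draw_end_time_seconds(mean_min: float = 15.0, spread_min: float = 1.5) -> float:
    return 60.0 * random.uniform(mean_min - spread_min, mean_min + spread_min)


print(draw_end_time_seconds())
```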

Could you imagine the piece without AI, but with pre-programmed algorithms, for example?

Actually, it’s a very good question. No, I cannot imagine that, because it would be a very, very different listening experience, it would be a very different concept. The use of machine learning in this piece is not something that is secondary or unimportant. I think that this piece would not have been possible without AI and machine learning, because how else can you teach the computer to distinguish among these perceptual categories, different instruments, different playing techniques? It is actually impossible to describe these in hand-coded rules, so this is one of these cases where the ability of machine learning algorithms to learn by example comes in really handy and enables compositional concepts that otherwise would have been unimaginable. I mean I really don’t think that I would have been able to make the exact same piece without this technology and I think this also shows that these tools are not just tools, they’re actually more than that. They are a part of the creative ideation, they are a part of compositional thinking really.

What was the biggest surprise in terms of working with machine learning for this piece?

Actually, I learned a lot during this process. There were some things that were related to the machine learning part, but I think they are more technical and less interesting. You know, I learned a lot by trying, to some extent, to simulate aesthetically informed decisions that humans make when they play music with each other. I’m not saying that the robotic percussionist in this case fully simulates what a musician would do in this situation, but it creates the illusion of the sort of agency that humans have in music-making. I’ve learned a lot by trying to simulate all these ideas of following the musician, then for some reason deciding to take the lead and introduce new sound material, asking which criteria these decisions should be based on and when they make sense musically. All those things are very hard to describe in propositional terms, they’re very hard to describe verbally. It is just a matter of programming and letting the robotic percussionist and the musician interact with each other, observing and just revising the code, correcting things and so on. That is not the type of knowledge that is very easy to describe or put into words, but it was definitely a very important part of the compositional process, just sort of fine-tuning these behaviours to make them look more believable and more musically meaningful.

What was the hardest part of working with machine learning?

I guess it’s the amount of data that you need to train a neural network to perform these classification tasks that are very, very easy for a human. It is very easy for us to tell, for example, that this was a cowbell, this was a bongo, but, you know, for the machine learning algorithm it takes a considerable amount of data.

I really remember Manuel and me recording these examples and using all kinds of different mallets, because the sound that you produce when you use a soft mallet on the cowbell is completely different than when you use a hard mallet on the same instrument. I don’t know if it was necessarily a challenge, but it was definitely a part of the process, because you have to be aware of the gap between human and computational perception and you have to try to bridge that gap. You have to try to find the right features, you have to really analyse your own listening — what do I hear in this sound that makes me think this is a cowbell or a bongo? — and try to find the right features to describe these perceptual features to the computer.

I would say that labelling is not a very hard part of the process. Feature engineering was difficult in terms of just finding the right features. It definitely involves a lot of iterations of trying out different features and then assessing the accuracy of the model, adding new features and assessing the accuracy.
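
As a rough illustration of that iterate-and-assess loop, and not the pipeline actually used for the piece, candidate feature subsets might be compared by cross-validated classification accuracy, for example with scikit-learn on placeholder data like this:

```python
# Hedged sketch: score candidate feature subsets with cross-validated accuracy.
# The feature matrix, labels and column groupings below are stand-ins.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_all = rng.normal(size=(200, 20))   # stand-in feature matrix
y = rng.integers(0, 3, size=200)     # stand-in instrument/technique labels

candidate_sets = {
    "onsets_only": [0],
    "onsets_plus_mfccs": list(range(14)),
    "all_features": list(range(X_all.shape[1])),
}

for name, cols in candidate_sets.items():
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000)
    acc = cross_val_score(clf, X_all[:, cols], y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.2f}")
```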

What features did you use for the machine learning in this piece?

The features are fairly technical; it’s music information retrieval, so I don’t know if they are very relevant here. There are a few features that I used in this piece. I used onset detection, which just detects whether there is an attack in the sound, which should be there in the case of a stroke, but perhaps not in the case of scraping or other playing techniques. There were also some spectral descriptors, such as MFCCs, that describe the timbre of the sound. I can’t say that I remember the exact list of features that I ended up using for the piece, but there’s a publication about it. I have detailed everything that is involved in the machine learning part, and also the compositional aspects of the piece, in that paper.
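
The definitive feature list is in the publication she mentions; as a hedged illustration of the kinds of descriptors named above, here is how onset information and MFCC-based timbre features might be extracted with librosa on a synthetic stand-in signal, rather than with the piece’s own analysis chain.

```python
# Illustrative feature extraction with librosa; the signal below is a synthetic
# stand-in, and this is not the feature pipeline used in the piece.
import numpy as np
import librosa

sr = 44100
t = np.arange(sr) / sr
y = np.zeros(sr)
# Two short decaying noise bursts, roughly stroke-like.
for onset in (0.1, 0.6):
    idx = int(onset * sr)
    y[idx:idx + 2000] += np.random.randn(2000) * np.exp(-np.linspace(0, 8, 2000))

onset_frames = librosa.onset.onset_detect(y=y, sr=sr)   # attack detection
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)      # timbre descriptors

feature_vector = np.concatenate([[len(onset_frames)], mfcc.mean(axis=1)])
print(feature_vector.shape)  # 1 onset-count feature + 13 mean MFCCs = (14,)
```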

What advice would you give to an up-and-coming composer who wants to work with machine learning?

For composers, for example, who are interested in machine learning and want to explore this area, I think that nowadays there are so many great tools available. For example, Rebecca Fiebrink’s Wekinator is an open-source software tool designed specifically for artists. You don’t need any machine learning background, and you do not need to go to the effort of programming neural networks from scratch to use machine learning in your works. I would advise them to start there, to just start experimenting with these algorithms and training them. I really don’t think that it is necessary for artists to be programmers to be able to use machine learning in their work. At the same time, I think that artists should get more involved with these technologies. I think that this will have a significant impact, and not only on art itself.

For example, what I mentioned before is that these tools really change the way we think and they really end up shaping our musical thinking basically. In my last piece I explored this concept of AI bias and I essentially trained a neural network to predict my aesthetic preferences. I recorded a lot of sound material and then I labelled it on a scale from one to five, based on how interesting I found it. Then I trained a neural network and integrated this neural network in an interactive music system that interacts with a clarinetist. What happens during the piece is that the machine learning algorithm determines whether the computer music system will respond to the musician, whether it will remain silent or propose new sound material. Essentially, the computer has its own aesthetic preferences and acts based on them. Again, this is an example of a compositional concept that was only possible and imaginable because of these tools, because of neural networks and what these algorithms afford.
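
The model and thresholds of that later piece are not detailed here, so the following Python fragment is only an assumed illustration of the final decision step she describes: mapping a predicted preference score, on the 1-to-5 scale she mentions, to one of the three behaviours. The cut-off values and action names are placeholders.

```python
# Hypothetical mapping from a predicted aesthetic-preference score (1-5)
# to the system's behaviour; thresholds are assumptions for illustration.
def choose_action(predicted_preference: float) -> str:
    if predicted_preference >= 4.0:
        return "respond"        # engage with the musician's material
    if predicted_preference >= 2.5:
        return "propose"        # introduce new sound material instead
    return "remain_silent"      # withhold a response


print(choose_action(4.3), choose_action(3.0), choose_action(1.5))
```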

I think that there are two components to this relationship between the arts and AI or machine learning. The first is the transformative potential that these algorithms can have on the arts, on artistic thought, on artistic practices. The second is that artistic applications of these tools, artistic applications of AI and machine learning, can really help demystify these tools, so they can really help us understand what these tools can and can’t do and what their capabilities and limitations are.

For example, the piece that I just mentioned, Bias, is a piece that explores a limitation of machine learning algorithms. What I wanted to do in this piece was not simulate my aesthetic judgments. It was about letting the algorithm make its own assumptions about my judgments. The algorithm will make some erroneous assumptions, it will make some arbitrary assumptions about my aesthetic judgments, so what we have in the end is not something that is a simulation of my agency. It’s a new hybrid agency that has a little bit of my bias and a little bit of algorithmic bias. That’s what I find interesting. These compositional concepts that are really informed by the capabilities and limitations of these technologies. I think they can really help nowadays, they can help us understand what these algorithms can do and also what they can’t do.

Artemi-Maria Gioti is a composer and artistic researcher working in the fields of artificial intelligence, musical robotics, collaborative and participatory sound art. Her compositions include works for solo instruments, ensemble, live and interactive electronics and have been performed in Greece, Austria, Portugal, Germany, Denmark, Canada, The Netherlands and the USA. She studied Composition at the University of Macedonia (Greece), Electroacoustic Composition at the University of Music and Performing Arts of Vienna, and Composition — Computer Music at the Institute for Electronic Music and Acoustics (IEM) of the University of Music and Performing Arts of Graz. She is currently pursuing her doctoral degree at the same university in the field of Music and AI.

source: artemigioti.com
