Dawn Patrol EP—Surfing with AI-generative music models

Nao Tokui
Qosmo Lab
Nov 6, 2022
Dawn Patrol EP

As a tangible outcome of my AI-based music production endeavors over the past few years, I am pleased to announce the release of a 12" vinyl record, the Dawn Patrol EP! Nearly a year after I initially announced its release and began taking pre-orders… Finally, the record has arrived in my hands.

The COVID pandemic, increased demand for vinyl records, and the situation in Ukraine caused logistical problems with raw materials and delayed the pressing process. In addition, the agency I had asked to arrange the pressing of the records went bankrupt (!!). The release of this album has been delayed due to a series of unfortunate events. I am very sorry for the inconvenience and worry this has caused those who were among the first to order the record (we even announced that we would start the refund procedure at one point). I sincerely appreciate your patience.

You can order it here: https://naotokui.bandcamp.com/album/dawn-patrol-ep

I want to say, “Let the work speak for itself.” Still, I would like to add some comments on the songs in this EP, because I believe it is a practical example of the theme I wrote about in my book “Creating with AI” (published in Japanese in 2021; I’m working on an English version). The purpose of this article is to briefly explain my intentions and how I used AI technologies.

The first song, “Vox Yt-populi,” is an experiment in a machine learning-based sampling method: I downloaded a dataset of music videos from YouTube and extracted only the parts containing human voices, using a sound classification model based on a convolutional neural network (CNN). After segmenting the audio at sound attacks (onsets), another CNN identified the pitch of each segment. In this way, I could construct a musical scale made up of human voices sampled from YouTube videos, spanning from the lowest to the highest key on the keyboard. Each time a key is pressed, my sampler plays a different sound of the same pitch. (I used the same idea in rhythm pattern programming and created a website called Neural Beatbox.)

Neural Beatbox — Early experiment of machine learning-based sampling
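The pitch-mapping step described above can be sketched roughly as follows. This is a toy numpy version, not the actual pipeline: the article uses CNN classifiers for voice detection and pitch identification, while this sketch substitutes a simple FFT-peak pitch estimator and assumes segments have already been extracted.

```python
import numpy as np

SR = 22050  # assumed sample rate for this sketch

def estimate_pitch(segment, sr=SR):
    """Estimate the fundamental frequency of a segment via an FFT peak.
    (A simplified stand-in for the CNN pitch classifier in the article.)"""
    spectrum = np.abs(np.fft.rfft(segment * np.hanning(len(segment))))
    freqs = np.fft.rfftfreq(len(segment), 1.0 / sr)
    return freqs[np.argmax(spectrum)]

def freq_to_midi(freq):
    """Convert a frequency in Hz to the nearest MIDI note number."""
    return int(round(69 + 12 * np.log2(freq / 440.0)))

def build_sample_map(segments, sr=SR):
    """Group voice segments by estimated pitch, so that each key press
    can trigger a different sample of the same pitch."""
    sample_map = {}
    for seg in segments:
        note = freq_to_midi(estimate_pitch(seg, sr))
        sample_map.setdefault(note, []).append(seg)
    return sample_map

# Demo: two synthetic "voice" segments at A4 (440 Hz) and C5 (~523 Hz)
t = np.arange(SR) / SR
segments = [np.sin(2 * np.pi * 440.0 * t), np.sin(2 * np.pi * 523.25 * t)]
smap = build_sample_map(segments)
print(sorted(smap))  # MIDI notes 69 (A4) and 72 (C5)
```

With a real dataset, each MIDI note would accumulate many voice samples, and the sampler picks a different one on every key press.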

For the drums, I used rhythm patterns generated by a Variational Autoencoder-based model. This AI model was released as a Max for Live device, so anyone can easily train and use their own AI model for music production simply by dragging and dropping MIDI files. (I have talked in various places about the significance of being able to create your own AI models.) Please give it a try. The title of the song is taken from the Latin Vox Populi, meaning “voice of the people” (YT stands for “YouTube”), as the piece is a sampling of various voices.

Tokui, Nao. 2020. “Towards Democratizing Music Production with AI-Design of Variational Autoencoder-Based Rhythm Generator as a DAW Plugin.” arXiv [eess.AS]. arXiv. http://arxiv.org/abs/2004.01525.
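The core mechanics of such a rhythm VAE can be sketched as follows. This is a minimal numpy illustration with random weights standing in for a trained model; the 9-instrument × 16-step grid and the layer shapes are assumptions for the sketch, not the actual architecture of the Max for Live device.

```python
import numpy as np

rng = np.random.default_rng(0)

# A drum pattern as a binary grid: 9 instruments x 16 sixteenth-note steps,
# flattened to a 144-dim vector (an assumed representation for this sketch).
PATTERN_DIM, LATENT_DIM = 9 * 16, 2

# Random weights stand in for a trained encoder/decoder.
W_enc = rng.normal(0, 0.1, (PATTERN_DIM, 2 * LATENT_DIM))
W_dec = rng.normal(0, 0.1, (LATENT_DIM, PATTERN_DIM))

def encode(x):
    """Map a pattern to the mean and log-variance of a latent Gaussian."""
    h = x @ W_enc
    return h[:LATENT_DIM], h[LATENT_DIM:]

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (the reparameterization trick)."""
    return mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)

def decode(z, threshold=0.5):
    """Map a latent point back to a binary drum grid."""
    probs = 1.0 / (1.0 + np.exp(-(z @ W_dec)))  # sigmoid
    return (probs > threshold).astype(int)

x = rng.integers(0, 2, PATTERN_DIM)          # a random seed pattern
mu, logvar = encode(x.astype(float))
new_pattern = decode(reparameterize(mu, logvar))
print(new_pattern.shape)  # (144,)
```

Because decoding samples around the latent point of an input pattern, the model proposes variations that stay near, but never exactly reproduce, the patterns it was trained on.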

For the second piece, “A Mission,” I used a different rhythm generation model based on Generative Adversarial Networks (GANs). A GAN is a framework for adversarial training of a Generator, which attempts to generate data that looks just like the training data, and a Discriminator, which distinguishes the “real” training data from the generated “fake” data (it is often compared to a game between a forger and a connoisseur/detective). It is a groundbreaking framework that became a hot topic for its ability to generate realistic-looking images (faces, for example), and it paved the way for current image generation models such as Stable Diffusion and DALL-E.
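The adversarial tug-of-war can be made concrete with the two standard loss functions. This is a generic illustration of the GAN objective in numpy, not code from the rhythm model itself:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: the discriminator wants to score
    real data near 1 and generated (fake) data near 0."""
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    """The generator wants the discriminator to score its fakes as real."""
    return -np.mean(np.log(d_fake))

# As fakes become more convincing (d_fake rises), the generator's loss
# falls while the discriminator's loss rises: each improves by making
# the other's job harder.
print(generator_loss(np.array([0.1])) > generator_loss(np.array([0.9])))  # True
```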

In this work, I modified this framework slightly to generate new data (in this case, rhythm patterns) that is not only realistic but also deviates slightly from existing genres. Specifically, I added a new discriminator that identifies the genre of the generated rhythm, in addition to the original discriminator that judges whether it is realistic. Then I trained the generator to “confuse” this additional genre discriminator as well. The following video provides a good overview of this framework.

Tokui, Nao. 2020. “Can GAN Originate New Electronic Dance Music Genres? -- Generating Novel Rhythm Patterns Using GAN with Genre Ambiguity Loss.” arXiv [cs.SD]. arXiv. http://arxiv.org/abs/2011.13062.
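One way to understand the extra objective: reward the generator when the genre classifier cannot place a generated rhythm in any single genre. The sketch below expresses this as entropy maximization over the classifier's output; it is a simplified reading of the genre ambiguity loss in the cited paper, and the exact formulation there may differ.

```python
import numpy as np

def genre_ambiguity_loss(genre_probs):
    """Negative entropy of the genre classifier's output distribution.
    Minimizing this pushes the generator toward rhythms the classifier
    can't confidently assign to one genre (a sketch of the idea behind
    the genre ambiguity loss, not the paper's exact formula)."""
    eps = 1e-9  # avoid log(0)
    entropy = -np.sum(genre_probs * np.log(genre_probs + eps), axis=-1)
    return -entropy.mean()

confident = np.array([[0.97, 0.01, 0.01, 0.01]])  # clearly one genre
ambiguous = np.full((1, 4), 0.25)                 # equally all genres
print(genre_ambiguity_loss(ambiguous) < genre_ambiguity_loss(confident))  # True
```

Combined with the realism discriminator, this pulls generated rhythms toward patterns that sound plausible yet sit between established genres.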

“Mono-oto 2006” is a piece I made with the AKAI MPC2000 in 2006, when I was living in Paris, and it is the only piece on the EP in which I did not use AI. It is a simple minimal techno track; I like its sounds sampled from a hair dryer and other objects in my room, and the thick bass made with the AKAI sampler’s built-in sine wave. I really wanted to use it for DJing, so I added it to the collection this time ;-)

Models like Stable Diffusion and Midjourney, which generate images from text, are currently creating a lot of buzz. Their high fidelity and ease of use (they can be easily tried on the web) have led more people than ever before to ponder the impact of AI on expression and creativity. There is no doubt that such models can have a positive effect, such as opening doors for people who have never been involved in creative activities or helping to improve efficiency. On the other hand, many issues need to be considered, such as the rights of painters and illustrators whose works are used as training data without permission.

my attempt was a trial to create something that slightly deviates from the existing expressions with the help of AI

I will leave the discussion of these points for another time. Still, I would like to point out that, unlike these models, which aim to efficiently generate high-quality output by recombining existing expressions, my attempt was a trial to create something that slightly deviates from the existing expressions with the help of AI.

As I wrote in “Creating with AI,” I compare the ideal relationship with AI in art and music to “surfing.” Surfing is a sport (although I think it is more than a sport) based on the perfect balance between the passive act of being swept along by the waves and the proactive action of choosing waves and making turns at the right time. The rhythms and sampled tones generated/proposed by AI are unexpected, different, and sometimes outside the bounds of my creativity. I created the music on this EP by proactively selecting, accepting, and being influenced (swept away) by AI’s outputs.

I summarized these trials in a talk at the AIMC (AI Music Creativity) Conference 2022. Here is a recording of my talk:

My keynote speech at AI Music Creativity Conference 2022

The name of the EP, Dawn Patrol, refers to surfers going to the beach at dawn every day to check the conditions of the day’s waves. The words “dawn patrol” evoke a sense of anticipation (and frustration when the waves disappoint) and excitement at finding unexpectedly good waves. This EP is a diary of my daily “dawn patrol” on AI from 2019 to 2021. I hope you enjoy it.

The image on the album cover was also created by intentionally destroying parts of a simple AI model trained on photos of my face (a technique known as network bending). Special thanks to Naoki Ise for the cover design.
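The idea behind network bending can be shown with a toy model: deliberately corrupt some of a trained network's weights and let the distortion propagate into the output. Everything here is a stand-in; the architecture and the particular corruption applied to the face model are not specified in the article.

```python
import numpy as np

rng = np.random.default_rng(42)

# A toy two-layer "generator" standing in for the face model
# (the real model's architecture is an assumption here).
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 4))

def generate(z, w1, w2):
    """Forward pass: latent vector -> ReLU hidden layer -> output."""
    return np.maximum(z @ w1, 0) @ w2

def bend(weights, fraction=0.5, scale=5.0):
    """Network bending: corrupt a random fraction of one layer's
    weights (here by rescaling them) to distort the model's output."""
    bent = weights.copy()
    mask = rng.random(bent.shape) < fraction
    bent[mask] *= scale
    return bent

z = rng.normal(size=8)
original = generate(z, W1, W2)
distorted = generate(z, bend(W1), W2)
print(np.allclose(original, distorted))  # False: the bent model diverges
```

Applied to an image model, the same kind of targeted damage produces the glitched, half-recognizable visuals used on the cover.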

Album cover animation