update: June 11, 2018 — AI DJ Project won an “Honorary Mention” at Prix Ars Electronica 2018.
update: March 2018 — AI DJ Project received a Jury Selection award at the Japan Media Arts Festival.
“AI DJ Project” is a live performance featuring an Artificial Intelligence (AI) DJ playing alongside a human DJ. Utilizing various deep neural networks, the software (the AI DJ) selects vinyl records and mixes songs. Playing alternately, each DJ selects one song at a time, embodying a dialogue between the human and the AI through music. DJ-ing “Back to Back” serves as a critical investigation into the unique relationship between humans and machines.
In the performance, AI is not a replacement for the human DJ. Instead, it is a partner that can think and play alongside its human counterpart, bringing forth a broader perspective of our relationship to contemporary technologies.
1. Background and Related Projects
A DJ (or disc jockey) is a person who mixes different sources of pre-existing recorded music, usually for a live audience in a nightclub. Selecting appropriate tracks and mixing them smoothly and pleasantly is regarded as a highly creative process.
The art of DJing has been one of many testbeds for computational creativity. ‘AlgoRhythms’ is a Turing-test competition in which DJ software mixes given music automatically and tries to convince human evaluators that a human DJ made the mixes. ‘2045’ is an AI-themed DJ party where each DJ brings his/her custom DJ algorithm and lets it play in their stead.
Unlike these previous attempts, our AI DJ Project doesn’t aim to automate the whole DJ process; rather, it tries to achieve a successful collaboration between the AI and the human DJ. Hence, in our DJ sessions the software and the human DJ play alternately, one track at a time (usually referred to as Back to Back, or B2B).
2. Our method
In B2B, the AI system and the human DJ perform under conditions as similar as possible. For example, the AI uses the same physical vinyl records and turntables as the human DJ. The system listens to tracks played by the human DJ and chooses the next record to be played. (Finding the selected record and setting it on the turntable is a task for human assistants.)
After a record is set, the AI begins the process again, adjusting the tempo of the next track to the tempo of the track played by its human counterpart. The beats of both tracks are matched by controlling the pitch (rotation speed) of the turntable. For this purpose, we built a custom DJ turntable and a robot finger, which can be plugged into a computer and manipulated via the OSC protocol.
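The tempo-matching step above boils down to simple arithmetic: the required pitch offset is the ratio of the two tempi. A minimal sketch, assuming illustrative BPM values and a typical +/- 8% pitch-fader range (both assumptions, not measurements from the actual system):

```python
# Sketch: the pitch offset needed to bring one turntable's tempo
# to match the other's. BPM values below are hypothetical.

def pitch_offset_percent(current_bpm: float, target_bpm: float) -> float:
    """Percent change in rotation speed that maps current_bpm to target_bpm.

    DJ turntables typically expose pitch as a percentage clamped to
    about +/- 8%, so we clamp the result accordingly.
    """
    offset = (target_bpm / current_bpm - 1.0) * 100.0
    return max(-8.0, min(8.0, offset))

# The human DJ is playing at 126 BPM; the AI's record is cut at 124 BPM.
offset = pitch_offset_percent(124.0, 126.0)
print(f"pitch: {offset:+.2f}%")  # pitch: +1.61%
```

In the performance this offset would be sent to the custom turntable over OSC; matching the beat *phase* on top of the tempo is the harder problem discussed in the next section.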
A good DJ also needs to pay attention to the energy of the audience. We utilize a deep-learning-based motion-tracking technique to quantify how much people in the audience dance to the music the AI plays.
2.1 Music Selection
The minimum requirement for a DJ is to maintain the “flow” of the music, so it is common practice to select a next track that sounds somewhat similar to what is being played but at the same time offers something new in its rhythm structure or sound texture. DJs also often use instruments, or sometimes prominent drum-machine sounds, as clues for music selection (e.g., from a track with a piano solo to a track with an organ riff, or two tracks that both use a Roland TR-808 snare).
Based on these observations, we trained three different neural networks. Our models and datasets used for each model are the following:
- Genre Inference (wasabeat dance music dataset)
- Instrument Inference (IRMAS dataset)
- Drum Machine Inference (200.Drum.Machines dataset)
Each model is a convolutional neural network similar to the transfer-learning model of Choi et al., which takes spectrogram images of sounds and infers genres (minimal techno, tech house, hip-hop, …), instruments (piano, trumpet, …) and drum machines (TR-808, TR-909, …).
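As a rough illustration of what such a network consumes, the sketch below computes a log-magnitude spectrogram with plain NumPy. The window size and hop length are arbitrary assumptions; the actual system may use a different front end (e.g., mel scaling).

```python
import numpy as np

def log_spectrogram(audio: np.ndarray, n_fft: int = 1024, hop: int = 512) -> np.ndarray:
    """Hann-windowed short-time Fourier transform, log-compressed.

    Returns a (frequency, time) image suitable as CNN input.
    """
    window = np.hanning(n_fft)
    frames = [
        np.fft.rfft(window * audio[i:i + n_fft])
        for i in range(0, len(audio) - n_fft + 1, hop)
    ]
    mag = np.abs(np.array(frames)).T   # shape: (n_fft // 2 + 1, n_frames)
    return np.log1p(mag)               # compress the dynamic range

# One second of a 440 Hz tone at 22050 Hz, purely as a shape check.
sr = 22050
t = np.arange(sr) / sr
spec = log_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (513, 42)
```

A classifier then treats `spec` like a grayscale image, which is what lets ordinary image-style CNN architectures work on audio.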
Once the networks are trained, we can use the same models to extract auditory features as high-dimensional vectors. While the human DJ is playing, the system feeds the incoming audio into the model and generates a feature vector. The vector is compared with those of all tracks in our pre-selected record box (currently over 350 tracks), and the system selects the closest track, which presumably has a similar musical tone, mood and texture, as the next track to play.
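The selection step can be sketched as a nearest-neighbor search over feature vectors. Everything below is illustrative: the toy 3-D vectors stand in for the CNN features, the track names are made up, and cosine similarity is one plausible choice of distance, not necessarily the one used in the system.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def select_next_track(current_features, record_box):
    """Pick the record whose features are closest to what is playing.

    `record_box` maps track names to pre-computed feature vectors,
    standing in for the ~350 pre-analyzed records.
    """
    return max(record_box,
               key=lambda name: cosine_similarity(current_features, record_box[name]))

# Toy 3-D features; real vectors come from the trained networks.
record_box = {
    "track_a": [0.9, 0.1, 0.0],
    "track_b": [0.1, 0.9, 0.2],
    "track_c": [0.5, 0.5, 0.5],
}
print(select_next_track([0.8, 0.2, 0.1], record_box))  # track_a
```

In practice the currently playing track (and recently played ones) would be excluded from the candidates before taking the maximum.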
It’s worth noting that we initially collected and analyzed a DJ playlist dataset (visualized in the image) and used it to select the most likely candidate, as in collaborative filtering. We soon realized, however, that this led to banal music selections, so we decided to ignore all metadata associated with the music (genre, artist name, label, etc.) and focus only on the audio data. (An interactive version of the DJ playlist visualization is available here.)
2.2 Beat Matching
The second task for the AI DJ is to control the pitch (i.e., speed) of the turntable to match the beat of the music the human DJ plays. We used reinforcement learning (RL) to teach the model, through trial and error, how to speed up or slow down and nudge or pull the turntable to align the downbeats. We use various beat-tracking evaluation metrics (Davies and Böck) to compute rewards for the model.
We have found that it is relatively easy to match the tempi of two tracks, but very difficult to align the “phase” of the beats at the same time, due to a long-term dependency: the effect of any manipulation can be observed as a change in tempo only after several bars. Hence, beat matching through RL remains an open challenge.
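A minimal sketch of how such a reward might be shaped, assuming one term for tempo error and one for downbeat-phase error. The functional form and weighting are arbitrary assumptions standing in for the actual beat-tracking metrics; they only illustrate why the phase term is the hard part.

```python
def beatmatch_reward(tempo_a: float, tempo_b: float, phase_offset: float) -> float:
    """Toy reward for an RL beat-matching agent.

    tempo_* are in BPM; phase_offset is the downbeat misalignment in
    beats, wrapped to [-0.5, 0.5). A perfect match scores 1.0.
    """
    tempo_err = abs(tempo_a - tempo_b) / max(tempo_a, tempo_b)
    phase_err = abs(phase_offset)  # 0 = exactly on the downbeat
    # Phase is weighted more lightly here: its effect on the audio is
    # only observable over several bars, which is what makes the RL
    # credit assignment difficult.
    return 1.0 - tempo_err - 0.5 * phase_err

print(beatmatch_reward(126.0, 126.0, 0.0))   # 1.0 — perfect match
print(beatmatch_reward(126.0, 124.0, 0.25))  # lower: off-tempo and off-phase
```

The delayed observability mentioned above means the agent must connect an action (a nudge) to a reward change that arrives many steps later, which is exactly the long-term dependency that keeps this an open challenge.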
2.3 Crowd Reading
“A good DJ is always looking at the crowd, seeing what they like, seeing whether it’s working; communicating with them, smiling at them. And a bad DJ is always looking down at the decks and just doing whatever they practiced in their bedroom, regardless of whether the crowd are enjoying it or not.” — Norman Cook, aka Fatboy Slim
At the latest AI DJ performance, in December 2017, we introduced a new feature: “reading” the crowd. It is an essential role of a DJ to read the audience and play music suited to the atmosphere. In the performance, we deployed a camera system to track the movement of bodies in the crowd using the OpenPose library. The system quantifies how much the audience appreciates (i.e., dances to) the music being played and uses that information in the music selection process.
During the music selection process, the system tries to select tracks with a similar mood, as described above, as long as the amount of body movement stays above a given threshold. Once the index falls below the threshold, random noise, inversely proportional to the amount of body movement, is added to the feature vectors of the incoming music, so that the system can explore new musical realms and (hopefully) stimulate a seemingly bored audience.
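The crowd-reading loop can be sketched as follows. The movement index, threshold, and noise scale are illustrative assumptions (a linear ramp stands in for the inversely proportional noise), and the keypoints would come from a pose tracker such as OpenPose rather than being hand-written.

```python
import random

def movement_index(prev_pose, curr_pose):
    """Mean per-joint displacement between two frames of (x, y) keypoints."""
    dists = [
        ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        for (x0, y0), (x1, y1) in zip(prev_pose, curr_pose)
    ]
    return sum(dists) / len(dists)

def perturb_features(features, movement, threshold=0.1, scale=0.05):
    """Add exploration noise to a feature vector when the crowd stops moving.

    Above the threshold the vector passes through unchanged; below it,
    the noise amplitude grows as movement falls toward zero.
    """
    if movement >= threshold:
        return list(features)
    amp = scale * (threshold - movement) / threshold
    return [f + random.uniform(-amp, amp) for f in features]

# A lively crowd: the feature vector (and hence the selection) is untouched.
print(perturb_features([0.8, 0.2, 0.1], movement=0.5))  # [0.8, 0.2, 0.1]
# A still crowd: the vector is jittered, pushing selection somewhere new.
print(perturb_features([0.8, 0.2, 0.1], movement=0.0))
```

The perturbed vector would then be fed to the same nearest-neighbor selection as before, which is how the noise translates into more adventurous track choices.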
Unsurprisingly, this randomness worked as a feedback loop in the performance: the randomness brought more confusion to the audience, which in turn led to more randomness. It ended up demonstrating how difficult it is to maintain the subtle balance between regularity and unexpectedness in a DJ’s music selection process.
We also tried to visualize the aforementioned three processes during the performance.
3. Conclusion and Future Directions
We have performed several times in different locations in Japan and Europe. Sometimes the music selection of the system suits the atmosphere very well, but at times it appears alien to the human DJ and the audience. This slight unpredictability brings an amusing tension into the performance, which is only made possible by the interaction between the human and AI DJs.
We also found that the audience still needs a physical embodiment of the AI DJ onto which they can project an emotional bond with the music. We introduced a very minimalistic head-only robot, which keeps nodding to the beat. A GoPro camera attached to the robot provided the robot’s first-person view and gave the audience an impression of “autonomy”. The design of physical artifacts embodying the existence of AI is an open and intriguing question to investigate in the future.
4. Call for collaborators
If you are interested in working on this project (better beat-matching algorithms, the crowd-reading process, etc.), please feel free to drop me a line:
tokui (at) qosmo.jp
■ 2045 × LIFE PAINT Supported by VOLVO CAR JAPAN
2016/10/27 @Daikanyama UNIT, Tokyo
■ YCAM presents AI DJ / Wildbunch 2017
2017/8/19 @ Kirara Memorial Park, Yamaguchi, Japan
■ Google I/O 2019 Keynote Preshow
2019/5/7 @ Shoreline Amphitheatre, CA, USA
Concept/Machine Learning: Nao Tokui
Visualization: Shoya Dozono
Project Management: Miyu Hosoi
Assistant Programmer: Yuma Kajihara, Robin Jungers
Robot: TASKO, inc.
Customized turntable for AI: Mitsuhito Ando (YCAM InterLab)
Production Support: YCAM InterLab
AlgoRhythm — Neukom Institute Turing Tests in the Creative Arts. http://bregman.dartmouth.edu/turingtests/, 2016.
Keunwoo Choi, György Fazekas and Mark Sandler. Transfer Learning for Music Classification and Regression Tasks. March 2017.
Matthew E. P. Davies and Sebastian Böck. Evaluating the Evaluation Measures for Beat Tracking. ISMIR, 2014.
Daito Manabe and Nao Tokui. 2045 AI DJ Party. http://2045.rhizomatiks.com/, 2014.