Interactive AI

Alex Han
4 min read · Feb 21, 2023


I have been enthralled with the idea of interactive artificial intelligence ever since I read Ge Wang’s article for Stanford’s Institute for Human-Centered Artificial Intelligence (HAI). I have always felt frustrated with the way AI systems are treated as black boxes that simply produce answers and solutions — what Ge calls the “big red button” model. This is especially unsatisfying when considering the use of AI in the arts — as impressive as the latest generative models are, pressing the big red button to spit out a song feels as though it defeats the purpose. Such an approach treats music as an end-product, and almost by definition obscures (or at least excludes human decision-making from) the process. The idea of interweaving layers of human interaction into the iterative cycle of machine learning seems like such an attractive and elegant solution.

It was similarly inspiring to read this 2014 paper outlining and advocating for interactive machine learning as a distinct and promising field. The authors provide a range of case studies and examples of how a system that integrates human feedback into the learning process can result in measurable performance improvements. However, these examples mostly involved systems that used AI for prediction and classification, as opposed to the kind of generative or “creative” models being used to make music. I wonder to what extent the interactive machine learning framework can benefit generative AI music systems, and what such designs would look like. Perhaps a reinforcement learning model could be used to let users give feedback in the form of qualitative evaluations (“song is too fast”, “I don’t like that high melody”, “I want more bass”) and nudge subsequent output. This does let the human user shape the iterative process more than a simple “big red button” model does, but it falls short, in my eyes, of sufficiently centering human creative decisions.
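
To make that concrete for myself, here is a minimal sketch of what such a feedback loop might look like, assuming nothing more than a toy parametric “generator”: the parameter names, feedback phrases, and nudge amounts are all hypothetical, and a real system would replace the lookup table with an actual learned reward or preference model.

```python
import random

# Hypothetical generator parameters: tempo (BPM), bass gain, and melody register.
params = {"tempo": 120.0, "bass_level": 0.5, "melody_register": 0.7}

# Map qualitative feedback phrases to parameter nudges (all values are made up).
FEEDBACK_RULES = {
    "song is too fast":              ("tempo", -8.0),
    "song is too slow":              ("tempo", +8.0),
    "i want more bass":              ("bass_level", +0.1),
    "i don't like that high melody": ("melody_register", -0.1),
}

def generate(params):
    """Stand-in for a generative model: just report the current settings."""
    return (f"~{params['tempo']:.0f} BPM, bass {params['bass_level']:.2f}, "
            f"register {params['melody_register']:.2f}")

def apply_feedback(params, feedback):
    """Nudge parameters based on a qualitative evaluation from the user."""
    key, delta = FEEDBACK_RULES.get(feedback.lower(), (None, 0.0))
    if key is not None:
        # Add a little noise so repeated feedback explores slightly different outputs.
        params[key] += delta + random.uniform(-0.01, 0.01)
    return params

# A human-in-the-loop iteration: listen, comment, regenerate.
for feedback in ["song is too fast", "I want more bass"]:
    print("output:", generate(params))
    params = apply_feedback(params, feedback)
print("output:", generate(params))
```

Even in this toy form, the limitation shows: the user can only steer along the axes the designer anticipated, which is exactly why this kind of loop still falls short of centering creative decisions.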

More promising to me were the ways Rebecca Fiebrink used Wekinator to train mappings between gesture and sound in real time. Here, the fundamental creative decision-making is still done by the human user — but the complex, unconventional coupling of physical gesture and sound synthesis is handled by machine learning. Ultimately, the usefulness of machine learning is its power to…well, learn. It seems that a fruitful way to use AI as a creative tool is to establish complex, multifaceted relationships between signals that would otherwise be too intricate to define by hand in real time. This allows for cross-domain transfer of all kinds of input sources (gesture, image, text, other sound) into music — in a robust way that allows for interpolation and extrapolation to an endless sea of novel outputs. In this way, AI serves as the computational muscle that opens up new channels of expression for the user. Crucially, the actual creative decision-making is done by the human user.
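
As a rough illustration of that kind of mapping (not Wekinator itself, just the underlying idea), the sketch below fits a small neural-network regressor on a handful of user-supplied gesture-to-synth-parameter pairs and then interpolates for new gestures. The gesture features, synth parameters, and training values are all hypothetical, and it assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Training examples paired by the user: each gesture (e.g. hand x/y position and speed)
# is associated with the synth settings they want to hear for it.
gestures = np.array([      # [hand_x, hand_y, hand_speed] -- hypothetical features
    [0.1, 0.2, 0.0],
    [0.9, 0.8, 0.1],
    [0.5, 0.5, 0.9],
    [0.2, 0.9, 0.4],
])
synth_params = np.array([  # [filter_cutoff, pitch, grain_density] -- hypothetical targets
    [0.10, 0.30, 0.05],
    [0.90, 0.70, 0.20],
    [0.50, 0.50, 0.95],
    [0.30, 0.85, 0.40],
])

# Small multilayer perceptron regressor, similar in spirit to Wekinator's default models.
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=10000, random_state=0)
model.fit(gestures, synth_params)

# At performance time, each incoming gesture frame is mapped to synth parameters,
# interpolating (and extrapolating) smoothly between the trained examples.
new_gesture = np.array([[0.4, 0.6, 0.5]])
print(model.predict(new_gesture))  # e.g. send these values on to a synth
```

The human still supplies the examples and decides what the mapping should feel like; the model just makes the coupling continuous and fast enough to play.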

This research is incredibly exciting to me, both because it aims to address so many of the inadequacies of more traditional AI systems in music, and because it seems to be an actively emerging field with many unexplored possibilities. I am eager to dream up new ways to involve humans in the loop, and to put those ideas to use in the music I create. Here are some ideas that I want to develop further and potentially try to implement:

  • A generative audiovisual piece whose output is actively modulated and nudged in real time by members of the audience (entering feedback on their smartphones in the form of text, sliders, buttons, etc.)
  • A synth timbre transformation tool for producers that allows for iterative sculpting of timbre based on subjective qualities as opposed to direct sonic features (e.g. “make it brighter, more retro, and kind of hollow sounding” vs. “add more resonance to the filter, decrease the attack of the amp envelope”)
  • A smart arpeggiator that maps gesture or other input signals to changes in its sequence or sound design in real time
  • A telepathic piano piece where gesture or drawing is mapped to different parameters driving an algorithmic composition as it plays in real time
  • Related to the first idea: a “reverse film scoring” system where a live ensemble plays music, and input from audience members via smartphone interaction drives the generation of a single visual scene
  • A more nuanced, full-spectrum synesthesia visualizer, where I could provide examples of the particular mix of colors I see for different pitches, intervals, and chords, and then play piano live with generative visuals using interpolated colors
  • An even better Spotify recommendation system that doesn’t just rely on likes, saves, and plays, but instead allows a range of intermittent user feedback (“too jazzy”, “more calming”, “sick guitar solo”) to influence recommendations
  • A musical analysis tool for harmonic function, phrase boundaries, and long-form structure, all of which rely on some degree of subjective human interpretation
  • Simple track mixing for untrained users, where generic feedback (“piano too loud”, “vocals sound thin”, “drums weak”) drives automatic mixing via gain, EQ, compression, etc. (a rough sketch follows this list)
  • A tool for designing and transforming acoustic space/reverb, where user input and feedback can affect impulse responses under the hood
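
For the track-mixing idea above, a first experiment might be nothing more than translating generic feedback phrases into broad mixing moves. The sketch below is purely illustrative: the track names, feedback phrases, and adjustment values are made up, and a real version would need fuzzier language handling plus actual DSP behind each move.

```python
# Hypothetical mapping from untrained-listener feedback to broad mixing moves.
# Values are gain changes in dB and coarse EQ shelf adjustments -- all made up.
MIX_RULES = {
    "piano too loud":    {"track": "piano",  "gain_db": -3.0},
    "vocals sound thin": {"track": "vocals", "low_shelf_db": +2.0},
    "drums weak":        {"track": "drums",  "gain_db": +2.0, "high_shelf_db": +1.5},
}

def feedback_to_moves(feedback_phrases):
    """Collect the mixing moves implied by a list of feedback phrases."""
    moves = []
    for phrase in feedback_phrases:
        rule = MIX_RULES.get(phrase.lower())
        if rule:
            moves.append(rule)
    return moves

print(feedback_to_moves(["Piano too loud", "Drums weak"]))
```

But the shape of the interaction stays the same: subjective comments in, concrete parameter changes out.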
