Etude #2: Featured Artist

Guinness Chen
14 min read · Feb 1, 2024


Phase 1

I experimented with six combinations of features. My philosophy for tuning the feature set was to keep the feature dimension as low as possible, so as to avoid overfitting. To this end, I would add a feature if and only if it improved cross-validation accuracy. Similarly, I experimented with dropping features, and if cross-validation accuracy improved after a drop, I would confirm the drop. Here are the results from my six trials.

Trial 1: Centroid, flux, RMS, MFCC (20 coefficients, 10 filters)

fold accuracies: 0.4181, 0.3966, 0.4025, 0.3779, 0.4157 (mean 0.4022)

Trial 2: Centroid, flux, RMS, MFCC (13 coefficients, 25 filters)

fold accuracies: 0.4304, 0.4309, 0.3961, 0.4211, 0.4157 (mean 0.4188)

Trial 3: Centroid, flux, RMS, MFCC (7 coefficients, 25 filters)

fold accuracies: 0.4049, 0.3990, 0.3735, 0.4216, 0.4025 (mean 0.4003)

Trial 4: Centroid, RMS, MFCC (13 coefficients, 25 filters)

fold accuracies: 0.4083, 0.4225, 0.3922, 0.3691, 0.3985 (mean 0.3981)

Trial 5: Centroid, RMS, MFCC (13 coefficients, 25 filters), flux, chroma

fold accuracies: 0.3971, 0.4103, 0.3784, 0.3858, 0.4196 (mean 0.3982)

Trial 6: Centroid, RMS, MFCC (13 coefficients, 25 filters), flux, kurtosis

fold accuracies: 0.4632, 0.4382, 0.4147, 0.4373, 0.4397 (mean 0.4386)

Analysis:

The best results came from the final trial, where my features were Centroid, RMS, MFCC (13 coefficients, 25 filters), flux, and kurtosis, giving a mean cross-validation accuracy of about 0.439.
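For concreteness, here is that winning feature set wired up as a ChucK unit analyzer network; the same wiring appears in the full program in Phase 3:

// winning feature set from Trial 6 as a unit analyzer network
SndBuf input => FFT fft;
FeatureCollector combo => blackhole;
fft =^ Centroid centroid =^ combo;  // spectral centroid
fft =^ Flux flux =^ combo;          // spectral flux
fft =^ RMS rms =^ combo;            // energy
fft =^ MFCC mfcc =^ combo;          // timbre
fft =^ Kurtosis kurtosis =^ combo;  // spectral kurtosis
13 => mfcc.numCoeffs;               // 13 coefficients
25 => mfcc.numFilters;              // 25 mel filters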

Phase 2

For my mosaic tool, I experimented with many different types of audio for my dataset, including rap, pop, a cappella, and classical music. I also experimented with the number of voices and with the KNN K parameter.

It was very difficult to get my final output to sound coherent and tasteful rather than chaotic and overwhelming. From my experimentation, it was clear that the timing of the audio samples mattered a great deal: if the samples were beat aligned, the resulting audio sounded far more intentional. I haven't implemented beat alignment for this milestone (a sketch of the idea is below), but I hope to finish it by the end of Phase 3.
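To give a sense of what I mean, here is a minimal, hypothetical sketch of beat-aligned triggering (none of this is in my code yet), assuming a known, fixed tempo for the driving track:

// hypothetical sketch: quantize mosaic triggers to a beat grid
120.0 => float BPM;                // assumed tempo of the driving track
(60.0 / BPM)::second => dur BEAT;  // duration of one beat
while( true )
{
    // ... KNN matching would happen here ...
    // wait until the next beat boundary so the sample lands on the beat
    BEAT - (now % BEAT) => now;
    // spork ~ synthesize( match );
}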

For Phase 2, my demo is mostly a proof of concept of driving one song with another. I created a database of short sounds from Runaway by Kanye West and drove it with Anti-Hero by Taylor Swift (Illenium remix). I chose these particular songs because Kanye and Taylor have infamously feuded in the past, so I thought it would be funny to bring them together in the same song. Also, Runaway and Anti-Hero are in the same key, so playing them at the same time doesn't sound terribly dissonant.

Anyway, this is just a proof of concept, and I expect to have a much more polished and tasteful piece by the end of Phase 3. Enjoy the demo!

Phase 3

The extra week of work did wonders for my project. Over the past week, I developed a coherent vision for what I wanted my etude to be. In one sentence: my etude is a KNN-driven remix engine and visualizer. I chose two songs, Runaway by Kanye West and Anti-Hero by Taylor Swift, and my ChucK program remixes them together in a way that actually doesn't sound horrible. I had two main goals for Phase 3: tastefulness and visualization.

First, I wanted to make my music actually enjoyable to listen to. Call me old fashioned, but I think CCRMA-Core music isn't always the most tasteful music ever. To be fair, it's tough to get ChucK-generated music to sound great. But I found some low-hanging fruit that I could easily fix. My Phase 2 demo was extremely cacophonous because the two audio sources clashed too much: even though the songs were in the same key, the background instrumentals disagreed on chords, and neither song was beat aligned. To fix that, I decided to use only the instrumental from Runaway. Then I found isolated a cappella vocals for both Anti-Hero and Runaway. Finally, I slightly stretched the BPM of Anti-Hero and aligned the starts of the tracks in Logic so that all three files (the instrumental and both vocal stems) were beat aligned. I also calibrated the FFT size and hop size so that the program analyzes one full measure at a time and then pulls one full measure from the vocals.
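The analysis parameters in the code below fall straight out of this calibration. As a quick sanity check, assuming a shared tempo of roughly 87 BPM in 4/4 at a 44.1 kHz sample rate (my estimate from the numbers, not something stated in the code):

// sanity check: one 4/4 measure at an assumed ~87 BPM, 44.1 kHz
87.0 => float BPM;                               // assumed shared tempo
(60.0 / BPM) * 4.0 => float measureSec;          // ~2.759 seconds per measure
(measureSec * 44100.0) $ int => int windowSamps; // ~121655 samples
<<< "one measure:", measureSec, "sec =", windowSamps, "samples" >>>;

This is where the 2.7586::second hop and the 121654-sample FFT size in remix.ck come from.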

My second goal was to create a visualization for the music. I got feedback after the milestone that my etude would be more engaging to listen to and watch with an accompanying visualization, so I created one in Processing. My visualization has two main components. In the foreground, images of Kanye and Taylor fade in and out across the screen as their vocals play. In the background, a particle simulation is synced to the RMS of the instrumental track, so you can see the energy of the music. I wanted to create something captivating to watch, and I think I succeeded: it's genuinely entertaining to watch the particles and listen to the music, since you can hear and see every beat.
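The particle sync boils down to one OSC message per tick: ChucK measures the instrumental's RMS and Processing maps it onto particle speed and flow. Distilled from the full program below:

// distilled from remix.ck: stream the instrumental's RMS to Processing
SndBuf input => FFT fft =^ RMS rms => blackhole;
"runaway_backing.wav" => input.read;
input => dac;
OscOut xmit;
xmit.dest( "localhost", 12000 );  // the Processing sketch listens on 12000
while( true )
{
    rms.upchuck();                   // compute the current RMS
    xmit.start( "/mosaic/energy" );  // address the sketch listens for
    rms.fval(0) => xmit.add;
    xmit.send();
    50::ms => now;                   // steady rate for smooth visuals
}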

Without further ado, here is the demo video for my project. Enjoy the video!

//------------------------------------------------------------------------------
// name: remix.ck
// desc: real-time audio mosaic synthesis with similarity retrieval.
//
// USAGE: run with INPUT model file
// > chuck remix.ck:INPUT
//
// date: Winter 2024
// author: Guinness Chen
//
// starter code by: Ge Wang (https://ccrma.stanford.edu/~ge/)
//                  Yikai Li
//------------------------------------------------------------------------------

// input: pre-extracted model file
string FEATURES_FILE;
if( me.args() > 0 )
{
    me.arg(0) => FEATURES_FILE;
}
else
{
    // print usage and bail out (a model file is required)
    <<< "usage: chuck remix.ck:INPUT", "" >>>;
    <<< "   |- INPUT: model file (.txt) containing extracted feature vectors", "" >>>;
    me.exit();
}

//------------------------------------------------------------------------------
// expected model file format; each VALUE is a feature value
// (feel free to adapt and modify the file format as needed)
//------------------------------------------------------------------------------
// filePath windowStartTime VALUE VALUE ... VALUE
// filePath windowStartTime VALUE VALUE ... VALUE
// ...
// filePath windowStartTime VALUE VALUE ... VALUE
//------------------------------------------------------------------------------

//------------------------------------------------------------------------------
// unit analyzer network: *** this must match the features in the features file
//------------------------------------------------------------------------------

// audio input into a FFT
SndBuf input => FFT fft;
// a thing for collecting multiple features into one vector
FeatureCollector combo => blackhole;
// add spectral feature: Centroid
fft =^ Centroid centroid =^ combo;
// add spectral feature: Flux
fft =^ Flux flux =^ combo;
// add spectral feature: RMS
fft =^ RMS rms =^ combo;
// add spectral feature: MFCC
fft =^ MFCC mfcc =^ combo;
// add spectral feature: Kurtosis
fft =^ Kurtosis kurtosis =^ combo;

// set up a separate analysis chain for energy
input => FFT fftEnergy =^ RMS rmsEnergy => blackhole;

//------------------------------------------------------------------------------
// setting up our synthesized audio input to be analyzed and mosaic'ed
//------------------------------------------------------------------------------

// if we want to hear our audio input
input => Gain g => dac;

// scale the volume
0.7 => g.gain;

// load sound (by default it will start playing from SndBuf)
"runaway_backing.wav"=> input.read;

//-----------------------------------------------------------------------------
// setting analysis parameters -- these should also match what was used during extraction
//-----------------------------------------------------------------------------

// set number of coefficients in MFCC (how many we get out)
13 => mfcc.numCoeffs;
// set number of mel filters in MFCC
25 => mfcc.numFilters;

// do one .upchuck() so FeatureCollector knows how many total dimensions there are
combo.upchuck();
// get number of total feature dimensions
combo.fvals().size() => int NUM_DIMENSIONS;

// set FFT size (approximately one full measure of audio; see Phase 3 notes)
121654 => fft.size;
// set window type and size
Windowing.hann(fft.size()) => fft.window;
// our hop size (how often to perform analysis): one measure per hop
2.7586::second => dur HOP;
// how many frames to aggregate before averaging?
// (this does not need to match extraction; might play with this number)
1 => int NUM_FRAMES;
// how much time to aggregate features for each file
fft.size()::samp * NUM_FRAMES => dur EXTRACT_TIME;

//------------------------------------------------------------------------------
// unit generator network: for real-time sound synthesis
//------------------------------------------------------------------------------

// how many max at any time?
12 => int NUM_VOICES;
// a number of audio buffers to cycle between
SndBuf buffers[NUM_VOICES]; ADSR envs[NUM_VOICES];

// setup pans
Gain leftPans[NUM_VOICES];
Gain rightPans[NUM_VOICES];

// setup reverb
NRev leftReverb[NUM_VOICES];
NRev rightReverb[NUM_VOICES];

// set parameters
for( int i; i < NUM_VOICES; i++ )
{
    // setup pan (random constant-sum split between left and right)
    Math.random2f(0,1) => float leftPan;
    leftPan => leftPans[i].gain;
    1 - leftPan => rightPans[i].gain;

    // setup reverb
    .16 => leftReverb[i].mix;
    1 => leftReverb[i].gain;
    .16 => rightReverb[i].mix;
    1 => rightReverb[i].gain;

    // connect audio
    buffers[i] => envs[i];
    envs[i] => leftPans[i];
    envs[i] => rightPans[i];
    leftPans[i] => leftReverb[i] => dac.left;
    rightPans[i] => rightReverb[i] => dac.right;

    // set chunk size (how much to load at a time)
    fft.size() => buffers[i].chunks;

    // set envelope parameters
    envs[i].set( EXTRACT_TIME, EXTRACT_TIME/256, 1, EXTRACT_TIME );
}

//------------------------------------------------------------------------------
// load feature data; read important global values like numPoints and numCoeffs
//------------------------------------------------------------------------------

// values to be read from file
0 => int numPoints; // number of points in data
0 => int numCoeffs; // number of dimensions in data
// file read PART 1: read over the file to get numPoints and numCoeffs
loadFile( FEATURES_FILE ) @=> FileIO @ fin;
// check
if( !fin.good() ) me.exit();
// check that the dimensions match
if( numCoeffs != NUM_DIMENSIONS )
{
    // error
    <<< "[error] expecting:", NUM_DIMENSIONS, "dimensions; but features file has:", numCoeffs >>>;
    // stop
    me.exit();
}

//------------------------------------------------------------------------------
// each Point corresponds to one line in the input file, which is one audio window
//------------------------------------------------------------------------------

class AudioWindow
{
    // unique point index (use this to lookup feature vector)
    int uid;
    // which file did this come from (index into the files array)
    int fileIndex;
    // starting time in that file (in seconds)
    float windowTime;

    // set
    fun void set( int id, int fi, float wt )
    {
        id => uid;
        fi => fileIndex;
        wt => windowTime;
    }
}

// array of all points in model file
AudioWindow windows[numPoints];
// unique filenames; we will append to this
string files[0];
// map of filenames loaded
int filename2state[0];
// feature vectors of data points
float inFeatures[numPoints][numCoeffs];
// generate array of unique indices
int uids[numPoints]; for( int i; i < numPoints; i++ ) i => uids[i];

// use this for new input
float features[NUM_FRAMES][numCoeffs];
// average values of coefficients across frames
float featureMean[numCoeffs];


//------------------------------------------------------------------------------
// read the data
//------------------------------------------------------------------------------

readData( fin );

//------------------------------------------------------------------------------
// set up our KNN object to use for classification
// (KNN2 is a fancier version of the KNN object)
// -- run KNN2.help(); in a separate program to see its available functions --
//------------------------------------------------------------------------------

KNN2 knn;
// k nearest neighbors
25 => int K;
// results vector (indices of k nearest points)
int knnResult[K];
// knn train
knn.train( inFeatures, uids );

// used to rotate sound buffers
0 => int which;

//------------------------------------------------------------------------------
// SYNTHESIS!!
// this function is meant to be sporked so it can be stacked in time
//------------------------------------------------------------------------------

fun void synthesize( int uid )
{
    // remember which voice we are using (before advancing the counter)
    which => int voice;
    // get the buffer to use
    buffers[voice] @=> SndBuf @ sound;
    // get the envelope to use
    envs[voice] @=> ADSR @ envelope;
    // increment and wrap if needed
    which++; if( which >= buffers.size() ) 0 => which;

    // get a reference to the audio fragment to synthesize
    windows[uid] @=> AudioWindow @ win;
    // get filename
    files[win.fileIndex] => string filename;
    // load into sound buffer
    filename => sound.read;
    // seek to the window start time
    ((win.windowTime::second)/samp) $ int => sound.pos;

    // print what we are about to play
    chout <= "synthesizing window: ";
    // print label
    chout <= win.uid <= "["
          <= win.fileIndex <= ":"
          <= win.windowTime <= ":POSITION="
          <= sound.pos() <= "]";
    // endline
    chout <= IO.newline();

    // tell Processing which sample we are playing (and where it is panned)
    sendWindow( win.fileIndex, leftPans[voice].gain() );

    // open the envelope, overlap add this into the overall audio
    envelope.keyOn();
    // wait
    (EXTRACT_TIME*2) - envelope.releaseTime() => now;
    // start the release
    envelope.keyOff();
    // wait
    envelope.releaseTime() => now;
}

//------------------------------------
// setup OSC messages to Processing
//------------------------------------

// destination host name
"localhost" => string hostname;
// destination port number
12000 => int port;

// sender object
OscOut xmit;

// aim the transmitter at destination
xmit.dest( hostname, port );

// send OSC message: current file index and startTime, uniquely identifying a window
fun void sendWindow( int fileIndex, float pan )
{
    // start the message...
    xmit.start( "/mosaic/window" );

    // add int argument
    fileIndex => xmit.add;
    // add float argument
    pan => xmit.add;
    // send it
    xmit.send();
}

fun void sendEnergy( float energy )
{
    // start the message...
    xmit.start( "/mosaic/energy" );

    // add float argument
    energy => xmit.add;
    // send it
    xmit.send();
}

//------------------------------------------------------------------------------
// real-time similarity retrieval loop
//------------------------------------------------------------------------------

HOP / 50 => dur smallIncrement;
while( true )
{
    // aggregate features over a period of time
    for( int frame; frame < NUM_FRAMES; frame++ )
    {
        // stream energy to Processing while waiting out one hop
        for( int i; i < 50; i++ )
        {
            // get energy
            rmsEnergy.upchuck();
            sendEnergy( rmsEnergy.fval(0) );
            // wait
            smallIncrement => now;
        }

        //-------------------------------------------------------------
        // a single upchuck() will trigger analysis on everything
        // connected upstream from combo via the upchuck operator (=^)
        // the total number of output dimensions is the sum of
        // dimensions of all the connected unit analyzers
        //-------------------------------------------------------------
        combo.upchuck();
        // get features
        for( int d; d < NUM_DIMENSIONS; d++ )
        {
            // store them in current frame
            combo.fval(d) => features[frame][d];
        }

        // time has already advanced by the 50 small increments above
        // (50 * smallIncrement == HOP), so no explicit advance is needed here
    }

    // compute means for each coefficient across frames
    for( int d; d < NUM_DIMENSIONS; d++ )
    {
        // zero out
        0.0 => featureMean[d];
        // loop over frames
        for( int j; j < NUM_FRAMES; j++ )
        {
            // add
            features[j][d] +=> featureMean[d];
        }
        // average
        NUM_FRAMES /=> featureMean[d];
    }

    //-------------------------------------------------
    // search using KNN2; results are filled in knnResult,
    // which holds the indices of the k nearest points
    //-------------------------------------------------
    knn.search( featureMean, K, knnResult );

    // SYNTHESIZE THIS: play a random one of the k nearest windows
    spork ~ synthesize( knnResult[Math.random2(0,knnResult.size()-1)] );
}

//------------------------------------------------------------------------------
// end of real-time similarity retrieval loop
//------------------------------------------------------------------------------

//------------------------------------------------------------------------------
// function: load data file
//------------------------------------------------------------------------------
fun FileIO loadFile( string filepath )
{
    // reset
    0 => numPoints;
    0 => numCoeffs;

    // load data
    FileIO fio;
    if( !fio.open( filepath, FileIO.READ ) )
    {
        // error
        <<< "cannot open file:", filepath >>>;
        // close
        fio.close();
        // return
        return fio;
    }

    string str;
    string line;
    // read through the file, counting non-empty lines
    while( fio.more() )
    {
        // read each line
        fio.readLine().trim() => str;
        // check if empty line
        if( str != "" )
        {
            numPoints++;
            str => line;
        }
    }

    // a string tokenizer
    StringTokenizer tokenizer;
    // tokenize the last non-empty line
    tokenizer.set( line );
    // start negative (to account for the filePath and windowTime fields)
    -2 => numCoeffs;
    // count the remaining tokens: these are the feature values
    while( tokenizer.more() )
    {
        tokenizer.next();
        numCoeffs++;
    }

    // see if we made it past the initial fields
    if( numCoeffs < 0 ) 0 => numCoeffs;

    // check
    if( numPoints == 0 || numCoeffs <= 0 )
    {
        <<< "no data in file:", filepath >>>;
        fio.close();
        return fio;
    }

    // print
    <<< "# of data points:", numPoints, "dimensions:", numCoeffs >>>;

    // done for now
    return fio;
}


//------------------------------------------------------------------------------
// function: read the data
//------------------------------------------------------------------------------
fun void readData( FileIO fio )
{
    // rewind the file reader
    fio.seek( 0 );

    // a line
    string line;
    // a string tokenizer
    StringTokenizer tokenizer;

    // points index
    0 => int index;
    // file index
    0 => int fileIndex;
    // file name
    string filename;
    // window start time
    float windowTime;
    // coefficient
    int c;

    // read the file, one non-empty line at a time
    while( fio.more() )
    {
        // read each line
        fio.readLine().trim() => line;
        // check if empty line
        if( line != "" )
        {
            // tokenize this line
            tokenizer.set( line );
            // file name
            tokenizer.next() => filename;
            // window start time
            tokenizer.next() => Std.atof => windowTime;
            // have we seen this filename yet?
            if( filename2state[filename] == 0 )
            {
                // make a new string (<< appends by reference)
                filename => string sss;
                // append
                files << sss;
                // new id
                files.size() => filename2state[filename];
            }
            // get fileindex
            filename2state[filename]-1 => fileIndex;
            // set
            windows[index].set( index, fileIndex, windowTime );

            // zero out
            0 => c;
            // for each dimension in the data
            repeat( numCoeffs )
            {
                // read next coefficient
                tokenizer.next() => Std.atof => inFeatures[index][c];
                // increment
                c++;
            }

            // increment global index
            index++;
        }
    }
}

And here is the Processing sketch for the visualizer:

//----------------------------------------
// Processing sketch for the visualization;
// receives OSC messages from remix.ck
//----------------------------------------
import oscP5.*;
import netP5.*;

//-------------------------------------
// Display an image of Kanye or Taylor
//-------------------------------------

class ImageDisplay {
    int imageType;
    int posX;
    float alpha = 0;
    float scale;
    boolean fadingIn = true;
    float displayStartTime;
    boolean fadingOut = false;

    // constructor
    ImageDisplay(int imageType, int posX) {
        this.imageType = imageType;
        this.posX = posX;
        this.displayStartTime = millis();
        this.scale = random(1, 2);
    }

    // update the opacity of the image and display it
    boolean updateAndDisplay() {
        // Calculate elapsed time since the image was displayed
        float elapsedTime = millis() - displayStartTime;

        // Implement fade out after 4 seconds
        if (elapsedTime > 4000 && !fadingOut) {
            fadingOut = true;
        }

        // fade out
        if (fadingOut) {
            alpha -= 5; // Fade out speed
            if (alpha < 0) alpha = 0;
        }
        else {
            // Fade in until fully opaque
            if (alpha < 255) {
                alpha += 5; // Fade in speed
                if (alpha > 255) alpha = 255;
            }
        }

        // Draw the image
        PImage img = (imageType == 1) ? img1 : img2;
        tint(255, alpha);

        // Compute the size
        float newWidth = img.width * scale;
        float newHeight = img.height * scale;
        // Draw scaled image
        image(img, posX, height/2 - newHeight/2, newWidth, newHeight);

        // Check if the image has been displayed for 5 seconds
        if (elapsedTime > 5000) {
            return false; // Indicate that the image should be removed
        }

        return true; // Continue displaying the image
    }
}

//----------------------------------------
// Particles that react to the current RMS
//----------------------------------------

class Particle {
    PVector position;
    PVector velocity;
    PVector acceleration;
    float maxSpeed = 2;
    int particleColor;

    // constructor
    Particle() {
        position = new PVector(random(width), random(height));
        velocity = new PVector(0, 0);
        acceleration = new PVector(0, 0);

        // assign a random color
        if (random(1) < 0.5) {
            particleColor = color(192, 192, 192, 200); // Silver with semi-transparency
        } else {
            particleColor = color(255, 215, 0, 200); // Gold with semi-transparency
        }
    }

    // update the position, velocity, acceleration, etc.
    void update() {
        maxSpeed = map(currentEnergy, 0, 0.5, 2, 10);
        velocity.add(acceleration);
        velocity.limit(maxSpeed);
        position.add(velocity);
        acceleration.mult(0); // Reset acceleration
    }

    // apply a force to the particle
    void applyForce(PVector force) {
        acceleration.add(force);
    }

    // display the particle
    void display() {
        stroke(particleColor); // semi-transparent silver or gold
        strokeWeight(2);
        point(position.x, position.y);
    }

    // have the edges wrap
    void edges() {
        if (position.x > width) position.x = 0;
        if (position.x < 0) position.x = width;
        if (position.y > height) position.y = 0;
        if (position.y < 0) position.y = height;
    }
}

//----------------------------------------
// Constants and Global Variables
//----------------------------------------

// initialize particles and flow field
Particle[] particles;
int particleCount = 2000; // Adjust based on performance and visual preference
PVector[] flowField;
int scale = 20; // Scale of the noise grid

// initialize OSC
OscP5 oscP5;

// initialize energy global variable
float currentEnergy = 0;

// initialize image ArrayList
ArrayList<ImageDisplay> images = new ArrayList<ImageDisplay>();
PImage img1, img2;

//----------------------------------------
// Main Loop
//----------------------------------------

void setup() {
    size(800, 600);

    // setup OSC
    oscP5 = new OscP5(this, 12000);

    // setup particle visualization
    particles = new Particle[particleCount];
    for (int i = 0; i < particles.length; i++) {
        particles[i] = new Particle();
    }
    flowField = new PVector[(width / scale) * (height / scale)];

    // load images
    img1 = loadImage("kanye_1.png");
    img1.resize(img1.width / 3, img1.height / 3);
    img2 = loadImage("taylor_1.png");
    img2.resize(img2.width / 3, img2.height / 3);
}

void draw() {
    background(0, 50);
    // draw background visuals
    calculateFlowField();
    for (Particle particle : particles) {
        particle.update();
        particle.edges();
        particle.display();
    }

    // display foreground images
    for (int i = images.size() - 1; i >= 0; i--) {
        ImageDisplay imgDisplay = images.get(i);
        boolean continueDisplay = imgDisplay.updateAndDisplay();
        if (!continueDisplay) {
            images.remove(i); // Remove image after display duration
        }
    }
}

void oscEvent(OscMessage theOscMessage) {
    if (theOscMessage.addrPattern().equals("/mosaic/window")) {
        int imageType = theOscMessage.get(0).intValue();
        float leftGain = theOscMessage.get(1).floatValue();
        int positionX = (int) (leftGain * (width - img2.width));
        images.add(new ImageDisplay(imageType, positionX));
        println(leftGain, positionX);
    }
    else if (theOscMessage.addrPattern().equals("/mosaic/energy")) {
        currentEnergy = theOscMessage.get(0).floatValue() * 60;
    }
}

//----------------------------------------
// Helper functions
//----------------------------------------

// Calculate the particle flow field
void calculateFlowField() {
    int index = 0;
    for (int y = 0; y < height; y += scale) {
        for (int x = 0; x < width; x += scale) {
            // Use Perlin noise to generate a vector direction
            float angle = noise(x * 0.1, y * 0.1, currentEnergy * 0.1) * TWO_PI * 2;
            PVector vector = PVector.fromAngle(angle);

            // Modulate the vector magnitude based on the RMS value;
            // the map function parameters (0, 0.5, 0.5, 2) are placeholders.
            // Adjust them based on the expected RMS range and desired visual effect
            float vectorMag = pow(map(currentEnergy, 0, 0.5, 0.5, 2), 2); // Exponential scaling
            vector.setMag(vectorMag);

            flowField[index] = vector;
            index++;
        }
    }

    // Apply the flow field vectors to the particles
    for (Particle particle : particles) {
        int xIndex = int(particle.position.x / scale);
        int yIndex = int(particle.position.y / scale);
        int arrayIndex = xIndex + yIndex * (width / scale);

        // Ensure the index is within the bounds of the flowField array
        if (arrayIndex >= 0 && arrayIndex < flowField.length) {
            PVector force = flowField[arrayIndex];
            particle.applyForce(force);
        }
    }
}
