CS 470 — Programming Étude #2

JP
5 min read · Feb 7, 2023

--

Phase One

In my experiments, I tested each feature individually to assess how helpful each of them is in distinguishing the 10 selected genres. Here are the different feature configurations as well as their corresponding accuracy results on the test set:

1. All 4 features

fold 0 accuracy: 0.4377
fold 1 accuracy: 0.4480
fold 2 accuracy: 0.3804
fold 3 accuracy: 0.4466
fold 4 accuracy: 0.4431

2. Only MFCC

fold 0 accuracy: 0.4157
fold 1 accuracy: 0.3868
fold 2 accuracy: 0.3985
fold 3 accuracy: 0.4029
fold 4 accuracy: 0.3956

3. Only Centroid

fold 0 accuracy: 0.1662
fold 1 accuracy: 0.1789
fold 2 accuracy: 0.1750
fold 3 accuracy: 0.1672
fold 4 accuracy: 0.1946

4. Only Flux

fold 0 accuracy: 0.1495
fold 1 accuracy: 0.1608
fold 2 accuracy: 0.1485
fold 3 accuracy: 0.1436
fold 4 accuracy: 0.1686

5. Only RMS

fold 0 accuracy: 0.1946
fold 1 accuracy: 0.1882
fold 2 accuracy: 0.1843
fold 3 accuracy: 0.2020
fold 4 accuracy: 0.1975

Results

With MFCC features alone, the final accuracy is quite close to the accuracy obtained using all features. Individually, centroid, flux, and RMS give poor results, with RMS performing slightly better than the other two. Based on these results, MFCCs appear to be the features that carry the most “genre meaning” on their own. This makes sense, since MFCCs broadly correspond to “timbre.” It seems reasonable that the selected genres cannot be classified solely based on amplitude (RMS), pitch (centroid), or sharpness/change of brightness (flux).
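As a sketch of what three of these features measure, here is a minimal NumPy implementation of per-frame RMS, spectral centroid, and spectral flux. MFCCs need a mel filterbank and a DCT, so in practice a library such as librosa is the easier route; the frame and hop sizes below are illustrative assumptions, not the étude’s actual settings.

```python
import numpy as np

def frame_features(signal, sr=22050, frame=1024, hop=512):
    """Return per-frame RMS, spectral centroid (Hz), and spectral flux."""
    n = 1 + (len(signal) - frame) // hop
    window = np.hanning(frame)
    freqs = np.fft.rfftfreq(frame, 1.0 / sr)
    rms, centroid, flux = [], [], []
    prev_mag = None
    for i in range(n):
        chunk = signal[i * hop : i * hop + frame]
        # RMS: average signal energy in the frame.
        rms.append(np.sqrt(np.mean(chunk ** 2)))
        mag = np.abs(np.fft.rfft(chunk * window))
        # Centroid: magnitude-weighted mean frequency ("brightness").
        centroid.append(np.sum(freqs * mag) / (np.sum(mag) + 1e-12))
        # Flux: how much the spectrum grew since the previous frame.
        if prev_mag is not None:
            flux.append(np.sum((mag - prev_mag).clip(min=0)))
        prev_mag = mag
    return np.array(rms), np.array(centroid), np.array(flux)

# Sanity check: a pure 440 Hz tone should have a centroid near 440 Hz.
sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
rms, centroid, flux = frame_features(tone, sr)
```

Averaging these per-frame values over a clip gives one feature vector per track, which is what a genre classifier would consume.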

Overall, a 0.44 accuracy score may seem mediocre for a classifier, but I am pleasantly surprised, given that the model relies on only a few spectral features. Additionally, genre is a coarse categorical label: music can be difficult to classify, as a piece can have multiple influences and distinct passages in various styles. The human baseline is certainly below 1 (i.e., below 100% accuracy).

When live-testing these different models on several pieces of music, I generally found tendencies similar to those observed on the test set. For example, results for Beethoven’s “classical” 9th and the Beatles’ “pop” (?) Let It Be are shown below.

From left to right, classification results of Beethoven’s Symphony №9 based on: all features, MFCC, centroid, flux, RMS
From left to right, classification results of the Beatles’ Let It Be based on: all features, MFCC, centroid, flux, RMS

Phase Two

The core of the algorithm retrieves drum samples from a drum solo based on the microphone input: each incoming frame is compared against the dataset of drum samples using all of the previously mentioned features, and the closest match is played.
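The matching step can be sketched as a nearest-neighbor lookup over feature vectors. The sample names, feature layout, and toy values below are made-up illustrations, not the actual drum library; the z-scoring is one reasonable way to keep large-valued features (like centroid in Hz) from dominating the distance.

```python
import numpy as np

def nearest_sample(query_vec, sample_vecs, sample_names):
    """Return the name of the drum sample closest to the query features."""
    feats = np.asarray(sample_vecs, dtype=float)
    # Z-score each feature dimension so no single feature dominates.
    mu, sigma = feats.mean(axis=0), feats.std(axis=0) + 1e-12
    feats_n = (feats - mu) / sigma
    q = (np.asarray(query_vec, dtype=float) - mu) / sigma
    # Euclidean distance in normalized feature space.
    dists = np.linalg.norm(feats_n - q, axis=1)
    return sample_names[int(np.argmin(dists))]

# Toy library: [centroid (Hz), flux, RMS] per sample (invented values).
library = [[3000, 0.9, 0.8], [800, 0.2, 0.9], [5000, 0.7, 0.3]]
names = ["snare", "kick", "hi-hat"]
print(nearest_sample([900, 0.25, 0.85], library, names))  # → kick
```

A low-centroid, low-flux microphone frame lands nearest the kick, which is the kind of timbral matching the algorithm relies on.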

In addition, keyboard inputs can be used to add or remove sound layers:
- Ominous harmonics sampled based on the microphone input
- A heartbeat that can be accelerated and decelerated
- Chirping bird noises

One key triggers a bright triangle sound that removes the drums and harmonics layers. I am still trying to trigger this with a voice command by training a classifier that recognizes a specific word.
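The layer logic above amounts to a small state machine: keys toggle layers on and off, and the triangle key clears the drums and harmonics in one stroke. This is a hedged sketch, with layer names taken from the text but the class and method names invented for illustration.

```python
class LayerMixer:
    """Toy model of the keyboard-controlled sound layers."""

    def __init__(self):
        self.active = set()

    def toggle(self, layer):
        # Pressing a layer's key turns it on if off, off if on.
        self.active.symmetric_difference_update({layer})

    def triangle_hit(self):
        # The bright triangle removes the drums and harmonics layers.
        self.active.difference_update({"drums", "harmonics"})
        self.active.add("triangle")

mixer = LayerMixer()
for layer in ("drums", "harmonics", "heartbeat"):
    mixer.toggle(layer)
mixer.triangle_hit()
print(sorted(mixer.active))  # → ['heartbeat', 'triangle']
```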

Phase Three (Milestone)

My mosaic tool is inspired by Alejandro Iñárritu’s movie Birdman (2014). The drumming provides a driving, persistent beat throughout the film, creating a sense of urgency and adding to its overall energy.

Like in the movie, I built this responsive drumming part to reflect the internal turmoil of an input voice. The final performance will feature a user or actor telling a story about their anxieties about the world.

At the start, the audience only hears chirping noises. As the story progresses, the heartbeat accelerates and the drumming intensifies along with the narrator’s voice. An added layer of ominous harmonics reinforces the energy of the scene. The turmoil suddenly stops when the narrator comes back to their senses. *Chirping birds*

Phase Three (Final deliverables)

For the final deliverables, I decided to slightly modify the sounds and their features, add visual elements, and read a poem as the main driver of the performance.

1. Sound Features

The harmonics were replaced by randomly modulated sounds of an orchestra tuning up. Features were re-extracted, and the sound features and overall rhythm were adjusted. Keyboard controls make it possible to accelerate the heartbeat and raise the overall volume to match the rising intensity of the poem, until the final triangle hit brings everything to an end.
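The heartbeat acceleration and volume rise can be sketched as two clamped ramps driven by key presses. The decay factor, floor, and gain step below are illustrative assumptions, not the values used in the performance.

```python
def accelerate(interval_s, factor=0.9, floor_s=0.25):
    """Each key press shortens the gap between heartbeats, down to a floor."""
    return max(interval_s * factor, floor_s)

def raise_volume(gain, step=0.05):
    """Each key press nudges the overall gain up, capped at full volume."""
    return min(gain + step, 1.0)

# Simulate ten key presses as the poem's intensity rises.
interval, gain = 1.0, 0.3
for _ in range(10):
    interval = accelerate(interval)
    gain = raise_volume(gain)
```

Clamping both ramps keeps the heartbeat from degenerating into a buzz and the output from clipping, however many times the keys are pressed.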

2. Visual elements

The lines of the poem appear as the performer speaks them. I tried to make this voice-controlled, but I couldn’t find a feature specific enough, so it is keyboard-controlled. In addition, letters that are not in the current line appear in response to the performer’s voice. Under the hood, each line corresponds to one edited video. I intended the visuals to match the fear, disorientation, and trouble addressed in the poem.

3. Poem

Naturally, the poem was generated by ChatGPT, closely guided to obtain this structure and content. The poem is about being ready to face upcoming challenges and to enjoy life.

4. Reflections

I had a lot of fun creating this piece. I felt like I had to adapt a lot to the exercise constraints and to my lack of expertise. This is not the result I originally imagined, but I am satisfied with it.

Piecing samples together seems to be the foundation of most crafts, at least from a computational point of view. This in-between work with low-level features, still understandable to humans, really helped me better understand how algorithms can learn and produce. At the atomic level, everything seems reproducible. I was impressed by the voice generator impersonating Ge, and I wondered how well I could reproduce a drum solo. With basic time-beat alignment, I was able to produce rhythms that sounded OK to me. While experimenting with these tools, even though most sound associations felt unnatural, I found some of them really compelling, and I hope I can reuse some of these samples in my own future works.

Milestone:
https://drive.google.com/drive/folders/1OwOHQw0Vu-86dN4sNfA5e5Tmo3R4038S?usp=share_link

Final Deliverables:
https://drive.google.com/drive/folders/1TFsMjulbTxigUC-HQcykqTvU5MejrRaB?usp=share_link

— Jean-Peïc Chou
