CS 470 Project 2 Milestone 1

Jan 30, 2024

Phase 1:

Syeds-MacBook-Air:CS470_Proj2 murtaza$ chuck --silent x-validate.ck:output_features_config1.txt
# of data points: 1000 dimensions: 23
fold 0 accuracy: 0.4064
fold 1 accuracy: 0.4368
fold 2 accuracy: 0.4289
fold 3 accuracy: 0.4235
fold 4 accuracy: 0.4824

The 4 features extracted in question were: Centroid, Flux, RMS, MFCC

(base) Syeds-MacBook-Air:CS470_Proj2 murtaza$ chuck --silent x-validate.ck:output_features_config2.txt
# of data points: 1000 dimensions: 15
fold 0 accuracy: 0.3181
fold 1 accuracy: 0.3093
fold 2 accuracy: 0.2931
fold 3 accuracy: 0.3324
fold 4 accuracy: 0.3314

The 4 features extracted in question were: Centroid, Flux, RollOff, Chroma

(base) Syeds-MacBook-Air:CS470_Proj2 murtaza$ chuck --silent x-validate.ck:output_features_config3.txt
# of data points: 1000 dimensions: 2
fold 0 accuracy: 0.2211
fold 1 accuracy: 0.2270
fold 2 accuracy: 0.2211
fold 3 accuracy: 0.2304
fold 4 accuracy: 0.2289

The 2 features extracted in question were: Flux, RollOff

(base) Syeds-MacBook-Air:CS470_Proj2 murtaza$ chuck --silent x-validate.ck:output_features_config4.txt
# of data points: 1000 dimensions: 23
fold 0 accuracy: 0.4181
fold 1 accuracy: 0.4137
fold 2 accuracy: 0.3730
fold 3 accuracy: 0.3740
fold 4 accuracy: 0.4333

The 4 features extracted in question were: Kurtosis, ZeroX, RMS, MFCC

(base) Syeds-MacBook-Air:CS470_Proj2 murtaza$ chuck --silent x-validate.ck:output_features_config5.txt
# of data points: 1000 dimensions: 22
fold 0 accuracy: 0.1275
fold 1 accuracy: 0.0931
fold 2 accuracy: 0.0882
fold 3 accuracy: 0.1127
fold 4 accuracy: 0.1127

The 3 features extracted in question were: ZeroX, RollOff, MFCC
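
The "dimensions" figure in each run is larger than the feature count because some features are vectors rather than single numbers. Here is a minimal Python sketch that checks the reported totals. It assumes every feature except MFCC and Chroma contributes one value, and that MFCC contributes 20 and Chroma 12. Those two sizes are inferred from the totals above, not read from the extraction script, and the names `FEATURE_DIMS`, `CONFIGS`, and `total_dims` are made up for illustration.

```python
# Assumed per-feature vector sizes. The single-valued features are taken as
# scalars; MFCC = 20 and Chroma = 12 are inferred from the reported totals,
# not read from the extraction script.
FEATURE_DIMS = {
    "Centroid": 1, "Flux": 1, "RMS": 1, "RollOff": 1,
    "Kurtosis": 1, "ZeroX": 1, "MFCC": 20, "Chroma": 12,
}

# (features, dimensions reported by x-validate.ck) for each configuration.
CONFIGS = {
    "config1": (["Centroid", "Flux", "RMS", "MFCC"], 23),
    "config2": (["Centroid", "Flux", "RollOff", "Chroma"], 15),
    "config3": (["Flux", "RollOff"], 2),
    "config4": (["Kurtosis", "ZeroX", "RMS", "MFCC"], 23),
    "config5": (["ZeroX", "RollOff", "MFCC"], 22),
}


def total_dims(features):
    """Sum the assumed vector size of each extracted feature."""
    return sum(FEATURE_DIMS[name] for name in features)


for name, (features, reported) in CONFIGS.items():
    print(f"{name}: computed={total_dims(features)} reported={reported}")
```

Under these assumptions, all five computed totals match the reported ones. It also shows that "number of features" and "number of dimensions" are different things: configuration 5 has only 3 features but 22 dimensions.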

In your experiment, what configuration yielded the highest score in cross-validation?
Configuration 1, the 4-feature combination of Centroid, Flux, RMS, and MFCC, yielded the highest score of all the tested configurations, averaging about 0.44 across the five folds.
How do different (and different numbers of) features affect the classification results?
MFCC, with its tuned parameters, seemed to contribute greatly to classifier accuracy: the two best configurations (1 and 4) both include it. It is not sufficient on its own, though, since configuration 5 also includes MFCC and scored lowest. Extracting more features generally seemed to correlate with improved accuracy, but this trend may not always hold: configuration 3, with only 2 features, was more accurate than configuration 5, with 3.
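
To compare configurations on one number rather than five, the fold accuracies can be averaged. The following is a rough Python sketch, not part of the project code: the accuracies are copied from the runs above, and the names `fold_accuracies` and `summarize` are made up for illustration.

```python
from statistics import mean, stdev

# Fold accuracies copied from the x-validate.ck runs above.
fold_accuracies = {
    "config1": [0.4064, 0.4368, 0.4289, 0.4235, 0.4824],
    "config2": [0.3181, 0.3093, 0.2931, 0.3324, 0.3314],
    "config3": [0.2211, 0.2270, 0.2211, 0.2304, 0.2289],
    "config4": [0.4181, 0.4137, 0.3730, 0.3740, 0.4333],
    "config5": [0.1275, 0.0931, 0.0882, 0.1127, 0.1127],
}


def summarize(results):
    """Return {name: (mean, sample standard deviation)} per configuration."""
    return {name: (mean(folds), stdev(folds)) for name, folds in results.items()}


summary = summarize(fold_accuracies)

# Print configurations from best to worst mean accuracy.
ranked = sorted(summary.items(), key=lambda item: item[1][0], reverse=True)
for name, (avg, spread) in ranked:
    print(f"{name}: mean={avg:.4f} stdev={spread:.4f}")
```

Ranked by mean, the order is configuration 1, 4, 2, 3, 5. The standard deviation gives a rough sense of how much each result moves between folds.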

Phase 2:

The idea I wanted to explore was advertisements, since by nature they are already designed to be catchy soundbites. The artistic choice I wanted to go for, however, was contrasting advertising’s odious positivity with some audio which shows its inherent deceptiveness. I was also hopeful that the constant repetition of jingles would devolve into a cacophony, but wanted to ensure that the slogans and associated brands would not be lost on the audience.

Audio Choice: At this early stage, I went with an audio clip from the pilot episode of AMC’s Emmy Award-winning TV show “Mad Men.” The show follows the lives of ad executives in the 60s and explores themes of deception and capitalism; the audio clip in question is of the series protagonist, Don Draper, explaining advertising to his fellow executives. My interpretation of the clip is that it rightfully portrays advertising as an idyllic escape, a soother for all our choices. I echo McLuhan’s take on this: “They (people) read things (ads) to feel reassured that they bought the right thing.” The question this brief demo explores is what happens when that feeling of comfort fades away and ads are exposed for what they truly are in their most basic form: attention-grabbing soundbites.

The ad choice for the feature vector extraction was a jingle compilation from the early 2000s (linked here) and some more vintage clips from a 50s-70s ad compilation (linked here). The former were chosen for relatability with the class, and the latter to create a sense of being out of place and to hark back to the beginning of the era of the jingle. The time period of the latter compilation also aligns with the setting of Mad Men, so I thought it would be a nice touch.

Future work may involve the construction of an AI poem about advertising featuring the compilation companies. I briefly experimented with various audio pitches and options using ElevenLabs’ AI voice tools, and it may be a cool option to dig deeper into. Other options I considered were keyboard input and mic vocal synthesis, but given the diversity of soundbites, controlling pitch and creating tunes with these proved immensely difficult.

Visual Element: I would be interested in perhaps exploring a visual element to the advertising soundbites. The YouTube jingle compilations lend themselves to displaying the images of brands as their soundbites come on. I have not yet explored this option in the interest of time, but went forward with a visual element in the form of an anti-advertising slogan which slowly scrolls down as the demo proceeds. Other options considered included generating similar words, as was done in Etude 1, and having these appear on screen, or alternatively just manually curating a set of images. More options will be brainstormed, and feasibility will depend on time constraints.

Project Demo:

Code can be found here

Final Piece:

The final piece wound up having a life of its own. I wanted to focus on advertisements as sound bites, but also situate the audio I’m using within the context of a broader story: American capitalism in a post-WWII society. My reason for doing so is that many of the innovations which made widespread advertising possible (TV, and widespread access to radio thanks to cheaper models, are two noteworthy examples) were a direct result of the postwar boom, and the boom itself was a direct result of the military-industrial complex’s role in WWII.

The piece tells the story of the US state as one whose economy (and therefore its consumerism) hinges on armed conflict and tragedy. Paying homage to the initial sound clip from Mad Men, the soundbites advance through major US news headlines in the latter half of the 20th century, before eventually reaching the 2000s where the background ads are playing. The wars in Korea, Vietnam, and the Gulf were in large part caused by corporate interests driven by profit—and advertisements are a major piece of the puzzle in shaping positive sentiment towards these companies as they engage in war profiteering.

All in all, the piece was meant to be a whirlwind tour of the US during the latter half of the 20th century, seen primarily through the lens of advertisements trying to promote a vision of a happy, healthy society, while the veneer chips in the background in the form of jarring news clips and political audio.

Written by Murtaza Hassan