Wekinate Your World

shreya
Feb 22, 2024


FINAL DELIVERABLES

1. stem_split.ck

I think the music production process is fascinating, and I love watching YouTube videos breaking down the different stems in songs. I wanted to see if I could create an interactive program to explore different stems in a song that I believe has quite a few distinct elements — “A-Punk” by Vampire Weekend.

I trained a Wekinator model on x-y coordinates sent from a Processing sketch, outputting a 4-float vector at each timestep that drives the sound. With the square in the middle, all stems are heard (i.e. the original song), and each quadrant corresponds to one of the four stems. Placing the square on the boundary between two quadrants lets two stems be heard at the same time.
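As a minimal sketch of what the ChucK side could look like (assuming Wekinator's default of sending its outputs as one /wek/outputs message to port 12000, and placeholder stem filenames), each of the four floats can directly drive the gain of one looping stem:

```
// stem mixer sketch: four looping stems, gains driven by Wekinator
SndBuf stems[4];
Gain faders[4];
// placeholder filenames
["drums.wav", "bass.wav", "guitar.wav", "vocals.wav"] @=> string files[];

for (0 => int i; i < 4; i++)
{
    stems[i] => faders[i] => dac;
    files[i] => stems[i].read;
    1 => stems[i].loop;    // keep all stems running in sync
    0 => faders[i].gain;   // start silent
}

// receive Wekinator's 4-float output vector
OscIn oin;
OscMsg msg;
12000 => oin.port;
oin.addAddress("/wek/outputs, f f f f");

while (true)
{
    oin => now;
    while (oin.recv(msg))
    {
        // each output (0..1) sets the matching stem's volume
        for (0 => int i; i < 4; i++)
        {
            Math.min(1.0, Math.max(0.0, msg.getFloat(i))) => faders[i].gain;
        }
    }
}
```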

2. drums.ck

This program uses the MotionSender iOS app to send six inputs to Wekinator. I trained a model to trigger kick, snare, and hi-hat drum sounds from different movements, mimicking playing a drum.

The minimal output here reflects how much trouble I had creating distinguishable inputs. I tried different phone positions and rotations in various combinations and found that simple acceleration/velocity produced the most deterministic output; it turns out that a phone that could be at any orientation and position while you're holding it isn't the best input! Acceleration is fairly limited if you want to preserve your phone, so I settled on three outputs.
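A sketch of how the triggering could work, assuming three continuous Wekinator outputs (one per drum) on the default /wek/outputs message and placeholder sample filenames; each drum fires when its output crosses a threshold, with a simple re-arm guard so a sustained high value doesn't retrigger:

```
// drum trigger sketch: three one-shot samples fired by Wekinator outputs
SndBuf drums[3];
["kick.wav", "snare.wav", "hihat.wav"] @=> string files[];  // placeholders

for (0 => int i; i < 3; i++)
{
    drums[i] => dac;
    files[i] => drums[i].read;
    drums[i].samples() => drums[i].pos;  // park at end so nothing plays yet
}

OscIn oin;
OscMsg msg;
12000 => oin.port;
oin.addAddress("/wek/outputs, f f f");

0.5 => float threshold;     // fire a drum when its output crosses this
[0, 0, 0] @=> int armed[];  // avoid retriggering while the value stays high

while (true)
{
    oin => now;
    while (oin.recv(msg))
    {
        for (0 => int i; i < 3; i++)
        {
            msg.getFloat(i) => float v;
            if (v > threshold && armed[i] == 0)
            {
                1 => armed[i];
                0 => drums[i].pos;  // rewind = trigger the one-shot
            }
            else if (v < threshold)
            {
                0 => armed[i];
            }
        }
    }
}
```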

3. justin_bieber.ck

This improves on my milestone implementation by adding more smoothing between clips. It uses FaceOSC to map facial expressions to five different Justin Bieber songs, a ChucK relay file to send the input to Wekinator, and another ChucK file to configure the outputs.
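A minimal sketch of the smoothing idea: instead of hard-cutting when the model picks a new song, crossfade the clip gains over a short window. This assumes a single Wekinator classifier output (classes 1 to 5, sent as a float on /wek/outputs) and placeholder clip filenames:

```
// clip-switching sketch with crossfade smoothing
5 => int NUM_CLIPS;
SndBuf clips[NUM_CLIPS];
Gain faders[NUM_CLIPS];

for (0 => int i; i < NUM_CLIPS; i++)
{
    clips[i] => faders[i] => dac;
    "clip" + i + ".wav" => clips[i].read;  // placeholder filenames
    1 => clips[i].loop;
    0 => faders[i].gain;
}
1 => faders[0].gain;
0 => int current;

// ramp the old clip down and the new one up instead of hard-cutting
fun void crossfade(int from, int to, dur length)
{
    50 => int steps;
    for (1 => int s; s <= steps; s++)
    {
        (s $ float) / steps => float t;
        1 - t => faders[from].gain;
        t => faders[to].gain;
        length / steps => now;
    }
}

OscIn oin;
OscMsg msg;
12000 => oin.port;
oin.addAddress("/wek/outputs, f");

while (true)
{
    oin => now;
    while (oin.recv(msg))
    {
        // Wekinator sends the class label (1..5) as a float
        (msg.getFloat(0) $ int) - 1 => int next;
        if (next >= 0 && next < NUM_CLIPS && next != current)
        {
            // note: very rapid class changes could overlap fades;
            // fine for a sketch
            spork ~ crossfade(current, next, 500::ms);
            next => current;
        }
    }
}
```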

Code: https://github.com/shreyadsouza/wekinator

Reflection

In my past assignments, a lot of my execution felt hard-coded and not interactive. I am pleased that all of these are heavily interactive, actual systems that can be modified to fit different tasks.

I admittedly felt a dearth of creativity when doing this assignment; I found it hard to come up with three different systems. I wanted to explore different inputs into ChucK, but ran into difficulties creating a relay for these different inputs. For example, I was interested in using word2vec2osc (https://github.com/fiebrink1/word2vec2osc) in some way, but could not figure out what kinds of inputs to send to Wekinator to give an appropriate output. I also experimented with TouchpadOSC and VisionOSC.

Wekinator allowed me to focus only on the input and output modalities without having to worry about training a model myself. However, figuring out the inputs proved to be a challenge. FaceOSC did not detect all the expressions I was trying to make, especially in certain lighting. It was hard to let go and allow the outputs to not be predetermined to the extent I would have hoped.

I do wish I had been given more time to work on this project and develop my ideas for each final deliverable further; they are all janky in some way. I also learned how many different kinds of libraries can be imported into Processing; it would have been interesting to produce a more customized visual for the first deliverable above. I also ended up using .wav inputs instead of generating sounds in ChucK; it would have been cool to experiment with different sounds, pitches, and effects.

I see minimal real-world applications for my last two deliverables. But if I extended my first deliverable with automated stem-splitting for any song, I think it would be a really useful application for aspiring music producers and instrumentalists.

MILESTONE

Video

https://drive.google.com/file/d/1AfRaphBK07m60iH4tWmPlfgCuwJOMAQU/view?usp=sharing

Code: https://github.com/shreyadsouza/wekinator

Explorations

  • FaceOSC only exposes a finite set of measurements, which are not necessarily able to accurately distinguish different expressions. I experimented with different numbers of inputs and found a set that worked reasonably well, but producing five different sets of outputs from facial expressions alone was quite difficult, as is evident from the output not always changing with my expression.
  • Different OSC sources (pde/trackpad/fingers) will require me to process the inputs differently; see the relay sketch after this list. My initial goal for this checkpoint was to use the Processing video input so I could detect more than just changes in my facial expressions. I will be working on this for one of my deliverables; I'm currently working on getting all 54 values read from Processing.
  • My ideas for the next deliverable:
      • Creating an instrument (maybe printing out piano keys to store position?)
      • DJing using keyboard keys or trackpad
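For reference, here is one way such a ChucK relay could look: a minimal sketch assuming FaceOSC's default output port (8338) and two of its gesture addresses, forwarding to Wekinator's default input port (6448) on /wek/inputs. Which values get repackaged into the input vector is exactly what changes per input source:

```
// relay sketch: forward a few FaceOSC gesture values to Wekinator
OscIn oin;
OscMsg msg;
8338 => oin.port;  // FaceOSC's default output port
oin.addAddress("/gesture/mouth/width, f");
oin.addAddress("/gesture/mouth/height, f");

OscOut xmit;
xmit.dest("localhost", 6448);  // Wekinator's default input port

0.0 => float mouthWidth;
0.0 => float mouthHeight;

while (true)
{
    oin => now;
    while (oin.recv(msg))
    {
        // cache the latest value for each address
        if (msg.address == "/gesture/mouth/width")  msg.getFloat(0) => mouthWidth;
        if (msg.address == "/gesture/mouth/height") msg.getFloat(0) => mouthHeight;

        // repackage everything as one Wekinator input vector
        xmit.start("/wek/inputs");
        mouthWidth  => xmit.add;
        mouthHeight => xmit.add;
        xmit.send();
    }
}
```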

Acknowledgements:

  • 2023 classwork
  • Ge and Andrew, as always!
