Who’s afraid of creative code?

Visualizing chromesthesia

polyclick
DENKWERK STORIES

--

We’ve all been there. You have this killer idea for an installation or interactive experience, but realizing it…well, that’s the tricky part. The devil is in the details: where do you start when you don’t know where to start? For most of my work, the answer always lies in creative code — it’s there, you just have to find it. The good thing about creative code: the possibilities are endless. The bad thing about creative code: the possibilities are endless. For many, this situation is quite overwhelming, even scary, but in the spirit of Ghostbusters (or in my case, “code-busters”): I ain’t afraid of no code! With patience, perseverance and resourcefulness, a solution always presents itself eventually. So dare to be digital and don’t fear the code!

So, here’s a little background on this killer idea. A while back I worked with denkwerk on a project in cooperation with the German Synesthesia Association (DSG e.V.) to raise awareness about the fascinating phenomenon that is their namesake. Never heard of it? Well, synesthesia is a neurological phenomenon where the stimulation of one sense automatically triggers another. In the project, we worked closely with four synesthetes and the DSG to create an interactive and informative website as well as an immersive VR experience.

It was here that I had the pleasure of working with synesthete and concert violinist Silja Müller. Her synesthetic perceptions manifest themselves in various intriguing combinations, most notably (and fittingly for her profession) in the form of chromesthesia. This kind of synesthesia allows her to see colors when she hears (or plays) music. She also composes music synesthetically by arranging sounds, colors, and tastes so that they harmonize with each other. I was really inspired by Silja’s use of her condition to create unique art. Maybe it was the complete mastery of her craft, or the fact that music composition and coding are more alike than people realize, but the end result was that for one night, the worlds of code and composition collided for an exclusive concert.

My biggest challenge here was creating an accurate representation of what Silja perceives. The very nature of this phenomenon makes it difficult, if not impossible, to put into words or show to others. How do you explain the inexplicable, express the inexpressible, and make the invisible visible? Like I said before, the answer is creative code.

In this article, I’ll give you a behind-the-scenes look into how I made this seemingly impossible task possible by creating an installation that allows non-synesthetes to see the world as Silja does in the world’s first interactive synesthetic experience.

Converting music into an interpretable format

The mission: create a concert experience where concertgoers can see the world through a synesthete’s eyes. Well, that’s easier said than done. One of the difficulties in connecting our analog, physical world with the digital one is finding a way to convert something analog into an interpretable, digital format which we can then use as input for a digital experience. I mulled over the same question over and over: how could I take the music Silja was playing on her acoustic violin and turn it into compatible digital input so that 30 smartphones could display some amazing audio-reactive visual content? I didn’t have all the answers yet, but one thing was quite clear: I needed to find a way to convert raw audio data into sequential MIDI (musical note) information.

Now, there are a thousand ways I could’ve built this particular setup, but for live audio-reactive visuals, one thing in particular is super challenging: low latency. The lower the latency between the original music source and the visual output on the phones, the more the end user feels like these two worlds are connected. My setup had two clearly defined parts where reducing the latency was imperative:

  1. Analyzing the raw audio data
  2. Sending the converted data in sync to 30 Samsung phones

Analyzing the raw audio

I figured the fastest way to analyze raw audio data would be to send the analog input signal through some C++ code. Why? Speed. C++ is still one of the fastest-performing programming languages out there, and since my goal was getting as close to zero latency as possible, this was definitely the way to go.

Enter the marvelous world of OpenFrameworks. If you’re ever going to build anything in the realm of creative coding, mastery of this open-source toolkit is essential. It’s written completely in C++ and built specifically to process huge chunks of raw input and output data — exactly what I needed.

I started off by including some essential add-ons into our project environment: ofxAubio, ofxMidi and ofxGui. These add-ons allow you to easily reuse pieces of code other genius developers have written. I then used ofxAubio to process the incoming audio signal and convert it to MIDI messages. The great thing about ofxAubio is that it can detect several essential audio elements with great accuracy: it can do beat detection, onset detection and pitch detection, and even extract the mel-frequency bands of the raw audio signal.

Next step: pitch detection. ofxAubio’s pitch detection algorithm determines the pitch of the sound currently being played and maps it to a classical keyboard layout, assigning each note a corresponding MIDI pitch number between 21 (A0) and 108 (C8), the range of an 88-key piano.
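
For reference, the mapping itself is simple. The snippet below is not ofxAubio’s actual code, just a sketch of the standard conversion from a detected frequency in Hz to a MIDI note number, clamped to that 21 to 108 piano range (A4 at 440 Hz corresponds to MIDI note 69). It’s written in JavaScript to match the rest of the web stack described later.

    // Standard frequency-to-MIDI mapping (A4 = 440 Hz = note 69),
    // clamped to the 88-key piano range (21 = A0, 108 = C8).
    function frequencyToMidiNote(frequencyHz) {
      const note = Math.round(69 + 12 * Math.log2(frequencyHz / 440));
      return Math.min(108, Math.max(21, note));
    }

    console.log(frequencyToMidiNote(440));    // 69 (A4)
    console.log(frequencyToMidiNote(659.25)); // 76 (E5, the violin's open E string)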

After determining which MIDI notes were being played, I needed a way to send that information to the next part of the installation. For this, I used the ofxMidi add-on, which enabled me to send MIDI messages over a virtual MIDI channel to any listening client device.

Next, I used ofxGui to build a small graphical user interface to control the flow of the overall experience and have some visual feedback during the performance.

Connecting & syncing 30 phones

So far, so good. I had live music input that was being analyzed and converted in my OpenFrameworks app. These MIDI notes were then sent through a virtual MIDI channel. Now, I just had to figure out how to send these MIDI messages over a wifi connection to all the smartphones and keep everything in sync during the whole performance. node.js saves the day once again! This runtime environment allows you to use JavaScript server-side and is perfect for data-intensive, real-time applications that run across distributed devices. It made it super easy for me to set up a web server, open a bidirectional socket connection and send some messages to a bunch of clients in no time. With 50 lines of code, I had everything up and running (there’s a trimmed-down sketch after the play-by-play below).

Here’s the play-by-play:

  1. Start webserver
  2. Open bidirectional socket connection and listen for connecting clients
  3. Listen for incoming MIDI messages on a virtual port
  4. Format incoming MIDI messages and forward them to each individual connected client
  5. Sit back, relax and watch the show
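
The original code isn’t included here, so take the following as a minimal sketch of that relay rather than the real thing. It assumes packages I picked for illustration (express, socket.io for the bidirectional socket connection, easymidi for the virtual MIDI port), and the port name is hypothetical; it would have to match whatever the OpenFrameworks app opens.

    // Minimal sketch of the relay server. Assumed packages (my choice, not
    // necessarily the original setup): express, socket.io and easymidi.
    const express = require('express');
    const http = require('http');
    const socketIO = require('socket.io');
    const easymidi = require('easymidi');

    const app = express();
    app.use(express.static('public'));          // serves the WebGL client to the phones

    const server = http.createServer(app);
    const io = socketIO(server);

    // Steps 1 + 2: start the web server and listen for connecting clients.
    io.on('connection', (socket) => {
      console.log('client connected:', socket.id);
    });

    // Step 3: listen for incoming MIDI messages on the virtual port
    // (hypothetical name; it has to match the port the openFrameworks app opens).
    const midiIn = new easymidi.Input('of-virtual-midi');

    // Step 4: format the messages and forward them to every connected client.
    midiIn.on('noteon', (msg) => {
      io.emit('noteon', { note: msg.note, velocity: msg.velocity });
    });

    server.listen(8080, () => console.log('listening on port 8080'));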

The visual experience: challenges

Phew! You’ve made it this far. I promise to get to the fun part soon, but before we do, let’s recap: Silja is playing her acoustic violin; the sounds coming from her violin are being analyzed and converted into MIDI notes. These MIDI messages are sent to a node.js server, which then forwards these messages over a socket connection to all connected clients — all in real time, all in sync and with a latency close to zero.

You see, the thing is that if you want to augment the user’s environment, you need access to the phone’s back-facing camera. Now, accessing the camera isn’t that hard; it’s just a few lines of code. The problem is that most browsers out there won’t grant access to the front or back camera unless the page is served from a trusted (HTTPS) origin.
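
For context, here’s roughly what those few lines look like using the standard getUserMedia API. This is the generic browser API rather than the project’s exact code, and it only runs when the page comes from a secure origin, which is exactly the catch.

    // Request the back-facing camera and pipe it into a <video> element.
    // Standard browser API; only granted on a secure (HTTPS) origin.
    const video = document.querySelector('video');

    navigator.mediaDevices
      .getUserMedia({ video: { facingMode: 'environment' }, audio: false })
      .then((stream) => {
        video.srcObject = stream;
        return video.play();
      })
      .catch((error) => {
        console.error('Camera access denied or unavailable:', error);
      });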

A secured website asking for access to your camera.

The lock symbol in the address bar is so much more than a proverbial “green light” assuring you that everything is ok. It tells you that the server’s identity has been validated by an external certificate authority, that it holds a valid key/certificate pair, and that all data being sent and received is encrypted.

The challenge here was getting a valid certificate for a server that never connects to the internet. After days of trying and testing all sorts of different techniques (including self-signed certificates), I finally realized there was no way to get a trusted client-server connection over a local area network using just a local IP. The green light turned yellow and then red before my eyes.

This was extremely frustrating, but I wasn’t about to give up just yet. I reoriented my thought process away from trying to make the client-server connection secure and realized I still had a few tricks up my sleeve. For one, denkwerk owned all the smartphones that would eventually be used at the concert, which gave me the opportunity to explore solutions on the individual devices instead of on the server.

So how did I finally do it? Well, the great thing about Android is that you can change and override a lot of the core functionality by installing a so-called ‘kiosk’ application. This kind of application takes over the whole Android experience and gives you, as the developer, the ability to force the end user into a particular behavior. I used Kiosk Browser Lockdown, which in my opinion is by far the best of its kind. I was able to use all the recent web technologies through a Chrome WebView, suppress the certificate errors (yay!) and restrict user access to the home button, back button, lock button, etc.

After this breakthrough, I could finally start creating the visual experience. I started off by building an augmented reality prototype using a tracking marker placed in close vicinity to the violinist, but quickly abandoned this idea when I realized that both she and the crowd would remain in fixed positions in the venue for the entire concert. That meant I could simply use the compass as a reference for where the artist was located in relation to each user. Dropping the tracking marker also freed up a lot of processing power I could then use for the WebGL experience.

The visual experience: welcome to the wonderful world of WebGL

A person affected by synesthesia sees, feels, hears and tastes all sorts of different things based on particular triggers. These perceptions are very personal and no two synesthetes have the same experience. As I mentioned earlier, Silja (the violinist) sees colors pop into her field of view when playing her violin — and each musical note has a specific corresponding color. She also uses these perceptions to compose music; for her, music has to both look and sound good. My goal was to mimic this phenomenon so that concert visitors could get an idea of what it’s like for a synesthete and experience the world through her eyes.

I already had the phones and infrastructure in place to send the musical information to each of the phones. Now it was time to implement the visual experience. I decided to split the experience into two independent layers. The first layer (in the background) shows the camera feed and manipulates its color by running it through a fragment shader. The second layer (in the foreground) displays a WebGL 3D scene that augments the user’s environment.

The first layer

Here, I wanted to turn the original camera feed into something more synesthetic, so I decided to manipulate its colors by running it through a filter. This filter works like the ones you already know from Instagram: it changes the color of each individual pixel using mathematical pixel operations. Now, let’s do some quick math. Say your camera output is 1920 x 1080 pixels, which means you’ve got about 2 million pixels spanning your whole screen. That also means 2 million pixel operations per rendered frame, 60 times per second, or roughly 120 million pixel operations per second. If you know anything about JavaScript, you know that this is WAY too many calculations for a scripting language to handle on the CPU.

I needed to find a better solution for performing so many pixel operations at once. Enter the world of fragment shaders. These little programs are just one small part of the entire WebGL ecosystem. Fragment shaders allow you to do pixel operations all at once on the graphics card, instead of manipulating each pixel 1-by-1 on the processor. By doing all 2 million pixel operations simultaneously on the graphics card, I only needed a fraction of a millisecond to process the whole image. This made a huge difference in performance and responsiveness. By offloading massive amounts of pixel operations to the graphics card, I could easily achieve the highly desired, silky-smooth 60 frames per second.

1-by-1 pixel manipulation on the processor (left) vs. simultaneous processing on the graphics card (right).

The following snippet shows how the fragment shader was built. The original video feed (videoTexture) is blended together with two colors (colorA, colorB) using an ‘overlay’ blend, comparable to the ‘Overlay’ blend mode designers love to use in Photoshop.
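
The embedded snippet didn’t survive here, so what follows is a reconstruction of the idea rather than the original shader. The uniform names videoTexture, colorA and colorB come from the description above; everything else (the vUv varying, and mixing the two note colors by the pixel’s brightness before the overlay blend) is my own assumption. The GLSL lives in a JavaScript string, the way you’d pass it to a WebGL or three.js material.

    // Reconstructed sketch of the first layer's fragment shader (not the original code).
    const fragmentShader = `
      precision mediump float;

      uniform sampler2D videoTexture;  // the live camera feed
      uniform vec3 colorA;             // first note color
      uniform vec3 colorB;             // second note color
      varying vec2 vUv;                // assumed texture coordinates from the vertex shader

      // Photoshop-style 'overlay' blend, applied per channel.
      vec3 overlay(vec3 base, vec3 blend) {
        return mix(
          2.0 * base * blend,
          1.0 - 2.0 * (1.0 - base) * (1.0 - blend),
          step(0.5, base)
        );
      }

      void main() {
        vec3 video = texture2D(videoTexture, vUv).rgb;

        // Assumption: pick between the two note colors based on the pixel's brightness.
        float luma = dot(video, vec3(0.299, 0.587, 0.114));
        vec3 tint = mix(colorA, colorB, luma);

        gl_FragColor = vec4(overlay(video, tint), 1.0);
      }
    `;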

The second layer

Here, my goal was to augment the user’s field of view by adding a sense of depth. It’s important to note that people with synesthesia sometimes see objects appear in their field of view as if these objects had a real, physical presence in our three-dimensional world. To mimic this phenomenon, I decided to go with audio-reactive spheres. These spheres would spawn from the area around Silja’s position each time she played a different note on her violin.

To work out where Silja was relative to each user, the smartphone’s orientation sensors (compass and gyroscope) came into play. They tell you how the phone is oriented in relation to the rest of the world. So I knew which way the viewer was pointing their phone, and where Silja was playing her violin (as an offset from true north). I used both angles to figure out where the 3D scene’s origin should be and, from there, spawned colorful spheres that flew around the users. To accurately portray Silja’s perceptions, these spheres were tinted with the same colors she sees when playing certain musical notes. The viewer could also take in the entire scene by rotating their device 360°.
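
A rough sketch of that orientation handling is below. It assumes a three.js camera (the article never names the 3D library), the deviceorientation events browsers expose, and a made-up compass heading for the stage; the exact alpha-to-heading conversion varies per browser, so treat it as illustrative.

    // Sketch: keep the 3D scene anchored on the artist using the phone's orientation.
    // Assumes three.js is loaded; rendering setup is omitted.
    const camera = new THREE.PerspectiveCamera(70, innerWidth / innerHeight, 0.1, 100);

    // Hypothetical value: the compass heading of the stage, measured from true north.
    const ARTIST_HEADING_DEG = 140;

    window.addEventListener('deviceorientationabsolute', (event) => {
      if (event.alpha === null) return;

      // Convert alpha into a compass heading (common Android/Chrome convention).
      const phoneHeading = 360 - event.alpha;

      // How far the viewer has turned away from the artist, in radians.
      const deltaRad = ((ARTIST_HEADING_DEG - phoneHeading) * Math.PI) / 180;

      // Rotating the camera by the same angle keeps the sphere cluster
      // pinned to Silja's position in the room.
      camera.rotation.y = deltaRad;
    });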

For an added visual effect, I decided to implement a small physics engine. This allowed me to create more organic movement, making the spheres seem “alive”. Using magnetic forces, associated spheres became attracted to one another, eventually forming clusters of related musical notes.
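
That attraction step boils down to a few lines per frame. Here’s a hedged sketch (the data layout and constants are mine, not the installation’s actual values): every sphere pulls on the other spheres that share its musical note, with some damping so the clusters settle instead of exploding.

    // Sketch of the 'magnetic' attraction between spheres of the same note.
    // Each sphere: { note, position: THREE.Vector3, velocity: THREE.Vector3 }
    const ATTRACTION = 0.5;   // pull strength between related spheres (made-up value)
    const DAMPING = 0.98;     // bleeds off velocity so clusters settle

    function updatePhysics(spheres, deltaTime) {
      for (const a of spheres) {
        for (const b of spheres) {
          if (a === b || a.note !== b.note) continue;   // only same-note spheres attract

          const toB = b.position.clone().sub(a.position);
          const dist = Math.max(toB.length(), 0.5);     // avoid huge forces up close

          // Simple inverse-square pull toward the other sphere.
          const pull = toB.normalize().multiplyScalar(ATTRACTION / (dist * dist));
          a.velocity.add(pull.multiplyScalar(deltaTime));
        }
        a.velocity.multiplyScalar(DAMPING);
        a.position.add(a.velocity.clone().multiplyScalar(deltaTime));
      }
    }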

Conclusion

It’s a great feeling when your hard work finally pays off. There were definitely some ups and downs, but ultimately the interactive synesthesia concert went off without a hitch. Mission accomplished — and all made possible by using web technologies for creative coding.

For several years now, I’ve been a big supporter of doing exactly this, especially because setting up visual experiences for the web has almost become child’s play these days. This, in combination with the enormous rise of WebGL, makes it possible to push the boundaries of interactive visual content in the browser.

But beware: the era in which creative coding toolkits like OpenFrameworks or Processing become obsolete isn’t quite here yet. We still need these frameworks to do a lot of the heavy lifting. This becomes painfully clear when setting up communication with external devices, sensors, or any other piece of hardware. Browser vendors aren’t focused on serving this part of the creative coding community (installations); they’re more concerned with building safe and secure browsers for everyday use.

My advice is to play around and experiment with various web technologies, build some awesome prototypes, but be careful when trying to set up communication between different hardware devices…and don’t be afraid of creative code!

