Making Kantor: Human-machine Collaboration to Build Artistic VR

Valerio Velardo
Published in The Sound of AI
Oct 22, 2019

Kantor is a short artistic VR experience for Oculus Rift and Rift S. It started as a simple hack and turned into a series of 9 abstract worlds inspired by the shapes and colours of Russian painter Wassily Kandinsky. In Kantor, players fly freely through space, exploring evolving 3d paintings. The visual and audio elements fuse to evoke a profound emotional experience.

[You can download Kantor for free on Itch.io]

Composition 8 by Wassily Kandinsky. Kantor borrows its visual style from Kandinsky’s work.

The human-machine co-creation process

With Kantor, we set out to explore in VR a few philosophical themes that are part of our daily research at Melodrive. Investigating the creative relationship between humans and machines is what interested us the most. Can AI extend human creative horizons? We’ve tangentially reflected on the implications of this question while building Melodrive Indie — a piece of intelligent software that generates music for video games using AI. In Kantor, however, we wanted to tackle the issue of human-machine co-creation head-on. So, we implemented two custom tools: a simple generative system to co-compose music, and a procedural templating system that generates Kandinskian worlds from units of colour and shape.

A Kandinsky-inspired world in Kantor.

Our co-creation approach, detailed below for music and visuals in isolation, includes four iterative steps. Initially, we pass instructions to the generative software. In this phase, we configure the machine and constrain its “imaginative” potential. You can compare this step to writing a film script. A script doesn’t map perfectly to the finalised movie. There are still numerous decisions the director can take that aren’t covered on the written page. Should the furniture in the hotel where the action takes place be modern or in a mid-century style? Should the actor who shoots his lover in cold blood feel disgust, fear, or, perhaps, excitement? The generative software we leveraged in Kantor has creative freedom similar to that of the film director and actors. In step two of the co-creation pipeline, it uses our instructions as the equivalent of a script for content generation. Once the machine has produced the content, in phase three, we re-appropriate the creative process by polishing the generated material. Finally, we take an educated guess at what might not be working in the generative system, based on its creative output. As a result of this diagnostic phase, we change the initial instructions. In more radical circumstances, we tweak the code of the system to implement our desiderata.

The human-machine co-creation approach we used to create Kantor.

Kandinsky and compositionality

Why did we reference Kandinsky’s work? For one, we love his art. This would be a good enough reason in itself, but there’s more to our choice than that. Kandinsky’s creative practice is an excellent example of the principle of compositionality in the visual arts. The meaning of a Kandinsky painting emerges through the interaction (composition) of single patterns, such as lines, circles and squares. When considered individually, these patterns don’t display much artistic quality. They are simple geometries. However, when enough patterns are wisely composed together, an aesthetic quality emerges. Re-purposing an old adage, we can say that a Kandinsky painting is more than the sum of its single shapes.

Kantor’s trailer.

With Kantor, we aimed to create a piece in virtual reality where the visual and sonic levels emerge as the result of the composition of a multitude of intertwined units. Similarly to what happens in Kandinsky’s paintings, the audio-visual meaning of the worlds in Kantor can be appreciated when different building blocks are experienced in conjunction. Unlike with a 2d canvas, however, players flying through the abstract 3d worlds can only perceive limited selections of the whole composition, creating a unique, ever-changing experience of each island. Generative techniques add a layer of unpredictability to the compositions. Every time players enter Kantor’s world, they experience islands with slightly different arrangements and sonic profiles.

Co-creating music with machines

Compositionality has been a guiding principle for producing Kantor’s soundtrack. The different islands have unique sonic profiles, achieved through different arrangements of instrumental ensembles. Each geometrical pattern in an island is an audio source that broadcasts an instrumental part. The sound in the experience is spatialised and surrounds the player. The polyphonic music emerges through the interaction between the music associated with each shape and the player’s position. By flying across an island, the player can experience infinite, slightly different renditions of the same piece. The music isn’t static. It’s a living being that evolves as a function of the player’s position and the dynamic distances between the geometrical shapes.
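To make the idea concrete, here’s a minimal Python sketch of how distance-based mixing of shape-bound audio sources could work. The function names and the rolloff curve are our illustrative assumptions, not the actual spatialiser Kantor uses in-engine:

```python
import math

def source_gain(listener, source, rolloff=1.0):
    """Distance-based attenuation: gain falls off as 1 / (1 + rolloff * d).

    `listener` and `source` are (x, y, z) positions. A simplified stand-in
    for a game engine's spatialised audio rolloff.
    """
    d = math.dist(listener, source)
    return 1.0 / (1.0 + rolloff * d)

def mix_levels(listener, sources):
    """Return the per-part gain for every shape broadcasting an instrumental line."""
    return {name: source_gain(listener, pos) for name, pos in sources.items()}

# Two shapes, each broadcasting one instrumental part.
levels = mix_levels((0, 0, 0), {"circle": (0, 0, 1), "line": (0, 0, 9)})
# The nearby circle dominates the mix; the distant line is barely audible.
```

As the player flies across the island, the listener position changes every frame, so the balance between the parts — and therefore the perceived piece — is continuously re-mixed.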

To create the music, we implemented a generative system in the Python programming language. The system is loosely inspired by the notion of formalised music introduced by the composer Iannis Xenakis. Like in Xenakis’s compositional practice, we use stochastic (random) mathematical functions to generate musical sequences. Our system produces musical events, defined by pitch, duration and intensity. A human composer can direct the random generation by injecting constraints into the system to account for several musical domains. Before generating some music, we can, for example, choose the number of sections included in a piece. We can also set the number of independent musical voices and decide which parts to silence in which sections. We can tweak the density of the musical events in the piece and, additionally, specify a range of pitches, durations and intensities the system can pick from in the production process. When considered altogether, these instructions represent the script we pass to our co-creator in silico.
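As an illustrative sketch — the names and constraint fields below are hypothetical, not our engine’s actual schema — a stochastic generator of this kind can be written in a few lines of Python:

```python
import random

# Hypothetical constraint set; the real engine reads these from a config file.
CONFIG = {
    "n_sections": 2,
    "events_per_section": 4,                 # controls event density
    "pitch_range": (60, 72),                 # MIDI note numbers to pick from
    "duration_choices": [0.25, 0.5, 1.0],    # durations, in beats
    "intensity_range": (40, 100),            # MIDI velocities
}

def generate_voice(config, seed=None):
    """Stochastically sample (pitch, duration, intensity) events, section by section."""
    rng = random.Random(seed)
    piece = []
    for _ in range(config["n_sections"]):
        section = [
            (
                rng.randint(*config["pitch_range"]),
                rng.choice(config["duration_choices"]),
                rng.randint(*config["intensity_range"]),
            )
            for _ in range(config["events_per_section"])
        ]
        piece.append(section)
    return piece

piece = generate_voice(CONFIG, seed=42)
```

The constraints act exactly like the “script” from the film metaphor: they bound the space of possible pieces without dictating any single one. Changing the seed — or leaving it unset — yields a different realisation of the same script.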

Once we developed the generative system, we started sketching out the different pieces we would need for the experience — 9 in total. We began each piece by choosing an overall idea for how the music should unfold. Should the composition be very dissonant or consonant? Predictable or more random? Should we use complex rhythms or simple patterns?

Instruction data passed to the music generative system.

While engaging with these high-level compositional choices, we also tried to envision the sonic qualities of the music and the relationships between different musical dimensions. We soon realised that steering the generative system in a musical domain like pitch or rhythm may or may not have a snowball effect on all the other dimensions. For example, a very rhythmical piece with a ton of drum parts is inherently dissonant, so restricting the available notes to a musical scale isn’t of much use. By contrast, if the piece has numerous melodic lines, constraining the notes available to the system for generation becomes vital. It enables us to modulate the level of dissonance in the composition.

For each piece, we ended up thinking about the aesthetic direction we intended to follow. To achieve the desired result, we implemented constraints in the music engine, updating a configuration file. Once we had done this, we used the engine to generate a piece, rendering all the parts, typically 30–50 different monophonic lines, into MIDI files. Working with MIDI files made it easy to map the notes to any instrument we wanted.

To produce a piece, we imported the generated MIDI sequences into a DAW, splitting up the parts onto several tracks. We used a wide variety of sample instruments, synthesis techniques and audio processing tricks to get an array of exciting timbres to fill up the soundscapes.

MIDI files in Logic for 7 different parts composed by the generative music system.

The compositional process has been iterative for most pieces. We would start with a high-level idea, tweak the engine, generate a MIDI file and render the sequence in a DAW. At this point, we would evaluate the sonic results, and come up with new ideas on how to improve the music. We would then go back to re-tweak the engine to implement the new musical intuitions and, therefore, restart the compositional loop.

We went back and forth like this until we had all 9 musical pieces. In sum, the co-creation process was a continuous negotiation between two strategies: we embraced what the system spat out, but we also fed ideas back into the engine to nudge the frame of the composition in a slightly different direction.

If you’d like to listen to the sonic results of our musical co-creations with the machine, check out the Kantor soundtrack on Bandcamp. It’s free!

Co-creating visuals with machines

For the visuals, we took a more human-centred creation approach. First, we studied the shapes and colours used by Kandinsky in his paintings. Once we felt we had a sufficient understanding, we went on to create a series of models and textures that re-create in 3d the signature patterns used by the Russian artist.

Examples of Kandinsky-inspired 3d models.

To spice up the experience, we coded a few shaders which added an extra level of dynamism. Of these, we ended up including in the experience only a dissolve shader, which dematerialises the textures of the 3d objects.

An island rendered with our custom dissolve shader.

After the visual assets were ready, we enabled the models to rotate on an axis, translate, or revolve around a coordinate in world space. At this point, we realised that we could use the 3d models, motions and shaders as the building blocks of our creative visual palette. By combining these elements in different permutations, we were able to elicit an aesthetic experience in the player. You may recognise in this process the enactment of the principle of compositionality unconsciously used by Kandinsky to build up his paintings out of simple units.
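As a toy example of one of these motion building blocks, here’s how revolving a model around a coordinate could be computed per frame. This is a hand-rolled sketch for illustration — in practice, a game engine’s transform system handles this:

```python
import math

def revolve(point, pivot, angle):
    """Revolve `point` around `pivot` in the XZ plane by `angle` radians.

    A minimal stand-in for one of the three motions (rotate, translate,
    revolve) applied to the models each frame.
    """
    x, y, z = point
    px, py, pz = pivot
    dx, dz = x - px, z - pz
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return (px + dx * cos_a - dz * sin_a, y, pz + dx * sin_a + dz * cos_a)

# A quarter-turn around the origin.
moved = revolve((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), math.pi / 2)
```

Applying a small angle every frame produces the slow orbital drift; composing several such motions on different shapes is what generates the permutations mentioned above.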

As the next step, we developed a procedural template system that can generate an island of shapes following a set of instructions. When considered together, these instructions form the configuration of an island class that constrains the generation process. In these instructions, we can specify, for example, the number and type of shapes to render, the motions to apply to the models, and the generation area. The generative system leverages this configuration to create template islands in real time, while the player explores the world. In the film metaphor we used earlier, these configurations correspond once again to a script.
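A minimal Python sketch of such a template system might look like the following. The configuration fields are hypothetical stand-ins for the actual island class, chosen to mirror the instructions described above:

```python
import random

# Hypothetical island-class configuration; the field names are ours,
# not the actual schema used in Kantor.
ISLAND_TEMPLATE = {
    "shapes": {"circle": 3, "line": 2},          # number and type of shapes
    "motions": ["rotate", "revolve", "static"],  # motions to apply to models
    "area": ((-10.0, 10.0), (0.0, 5.0), (-10.0, 10.0)),  # generation bounds (x, y, z)
}

def generate_island(template, seed=None):
    """Instantiate a template: place each shape randomly inside the area
    and assign it a random motion, so no two islands are ever identical."""
    rng = random.Random(seed)
    (x0, x1), (y0, y1), (z0, z1) = template["area"]
    island = []
    for shape, count in template["shapes"].items():
        for _ in range(count):
            island.append({
                "shape": shape,
                "position": (rng.uniform(x0, x1),
                             rng.uniform(y0, y1),
                             rng.uniform(z0, z1)),
                "motion": rng.choice(template["motions"]),
            })
    return island

island = generate_island(ISLAND_TEMPLATE)
```

Each call produces a fresh arrangement that still respects the template’s constraints — the same “script, different performance” relationship as in the music engine.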

A tad of Kantor’s “gameplay”.

Unlike with the generative music system, with the procedural island template system we renounced having the last word over machine generation. We did this to inject an element of unpredictability into the experience. Even though players can learn to recognise the defining traits of the 9 template islands encountered in Kantor, they’ll never experience the exact same island twice. The implementation details, like the number and type of models, their position in space, and the configuration of their motions, will always differ. The ever-changing distribution of the models guarantees that the music profile itself, which depends on the relative positions of the geometrical patterns, will never repeat exactly. Paradoxically, in Kantor players can experience almost infinite audio-visual configurations, even though they’re condemned to loop through the same 9 island templates. Can they learn to appreciate the slight differences between re-implementations of the same world templates? We’re not sure.

To enable players to explore different islands, we added portals which transfer players from one world to another. The portals are implemented as paintings which display a 2d dynamic view of the island the players will be accessing next. Every time players venture into a new island, they recursively enter a deeper meta-abstraction level. When they’re exploring the third island, for example, players are in the painting of the painting of the painting. (Georg) Kantor is a door into infinity.

In Kantor, players access new islands using 2d paintings as portals.

Conclusion

We had a blast developing Kantor. For audio people like us, building a full VR experience presented an ideal opportunity to venture into the fascinating world of visual art. Implementing an end-to-end human-machine co-creation pipeline in VR for both the audio and visual domains has been the climax of this exciting journey. When we collaborated with the generative systems, we appreciated the transformative potential of AI for co-creation. Even though we’re experienced in building AI creative systems, this was the first time we co-operated as creatives with a machine on an art piece. The feeling of sharing creative ideas with a machine can’t really be put into words.

We’d like to conclude this article by sharing some of the questions we’ve asked ourselves while working on Kantor.

  • Are there some inherent differences in the way we can perceive art in VR compared to other, more traditional media?
  • When do art objects/audio signals grouped together cease to be perceived individually and become a meaningful unit?
  • Can we use compositionality and procedural generation in VR to get a glimpse into infinity?
  • Will composers, artists and technologists work in collaboration with creative AI systems to fuel their practice in the near future?

We plan to do more research before addressing these questions in a future post or, perhaps, in our next VR experience. However, we’d love to start a debate around these topics. So, let us know what you think!

Finally, if you have an Oculus Rift (S) and are intrigued by our work, you can check out Kantor on Itch.io — it’s free as in free beer!
