A Brief History of Machine-Assisted Music in Video Games

Cárthach Ó Nuanáin
Published in The Sound of AI · Mar 15, 2019

The video game embodies the modern-day adventure: an enthralling, expansive, player-focused experience that captivates our imaginations. Consequently, game audio developers continually push the limits of the sonic and musical experience in games, particularly for the next generation of VR immersion. But game audio has come a long way since the early frontier days of tapes and cartridges. Let’s look back at the history of machine-assisted, machine-generated and procedural music, highlighting its challenges and innovations through some key examples.

A sonic VR experience, brought to you by Melodrive.

Procedural content generation

In general terms, procedural generation refers to any aspect of game development that’s deferred to computer algorithms rather than manual creation by a game developer or designer. For instance, in a space simulator one might decide to write code that generates a huge galaxy automatically with some element of randomness, instead of exhaustively determining the appearance, physics or attributes of each planet or system. In fact, this is exactly how the ground-breaking Elite series managed to create its rich, sprawling planetary systems for player exploration. In these early games, processing power and memory were at a premium, and procedural generation enabled huge experiences to spawn from a single floppy disk.
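To see why this approach was so memory-friendly, here’s a toy sketch in Python. The names and attributes are invented for illustration; Elite’s real generator relied on its own compact pseudo-random routines rather than a modern RNG:

```python
import random

def generate_system(galaxy_seed, system_index):
    """Derive one star system's attributes deterministically from a seed."""
    # Seeding per system makes the result reproducible on every run.
    rng = random.Random(f"{galaxy_seed}:{system_index}")
    name = "".join(
        rng.choice("lmnrstv" if i % 2 == 0 else "aeiou") for i in range(6)
    ).capitalize()
    return {
        "name": name,
        "economy": rng.choice(["agricultural", "industrial", "high-tech"]),
        "planets": rng.randint(1, 12),
    }

# The same seed always rebuilds the same galaxy, so none of it
# needs to be stored on disk:
galaxy = [generate_system(galaxy_seed=1984, system_index=i) for i in range(256)]
print(galaxy[0])
```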

Nowadays, as game designers, we’re probably limited less by computing resources than by our own imaginations, and procedural generation is mostly used to create unique, dynamic experiences for players. Maxis’ Spore takes inspiration from DNA sequencing and fractal theory to create exotic creatures on the fly. Minecraft uses Perlin noise to generate vast swathes of landscape and terrain, complete with complex flora and fauna.
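To get a feel for how noise-based terrain works, here’s a compact 2D Perlin noise implementation in Python. It’s a bare-bones sketch; Minecraft’s actual generator layers multiple octaves of noise and a great deal more besides:

```python
import math
import random

def permutation(seed):
    """A shuffled 0-255 table, doubled so indexing never wraps."""
    rng = random.Random(seed)
    p = list(range(256))
    rng.shuffle(p)
    return p + p

def fade(t):
    return t * t * t * (t * (t * 6 - 15) + 10)  # Perlin's smooth quintic

def lerp(a, b, t):
    return a + t * (b - a)

def grad(h, x, y):
    """Dot product with one of four diagonal gradients, picked by hash."""
    gx = 1 if h & 1 else -1
    gy = 1 if h & 2 else -1
    return gx * x + gy * y

def perlin(x, y, perm):
    xi, yi = math.floor(x) & 255, math.floor(y) & 255
    xf, yf = x - math.floor(x), y - math.floor(y)
    u, v = fade(xf), fade(yf)
    # Blend the four corner gradients of the grid cell containing (x, y).
    n00 = grad(perm[perm[xi] + yi],         xf,     yf)
    n10 = grad(perm[perm[xi + 1] + yi],     xf - 1, yf)
    n01 = grad(perm[perm[xi] + yi + 1],     xf,     yf - 1)
    n11 = grad(perm[perm[xi + 1] + yi + 1], xf - 1, yf - 1)
    return lerp(lerp(n00, n10, u), lerp(n01, n11, u), v)  # roughly [-1, 1]

perm = permutation(seed=42)
heightmap = [[perlin(x * 0.1, y * 0.1, perm) for x in range(64)]
             for y in range(64)]
```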

Procedural generation of music

Procedural music — a term often used in conjunction with non-linear, dynamic, interactive or adaptive music — refers to programmed music within a game that can change or respond to different states and events to varying degrees, usually in real time. In games we seek a non-linear audio experience, unlike a film, where audio events follow one another in a fixed sequence matching the prescribed events on screen. A player is likely to spend far more time in a game environment, retracing and replaying specific stages many times over. An unchanging linear soundtrack can easily become tiresome, or worse, irritating and distracting for the player.

Karen Collins distinguishes between interactive and adaptive audio. Interactive audio is directly influenced by the player’s actions or input in the game (for example, Super Mario hitting a question-mark block to reveal a coin triggers that instantly recognisable sound effect). Adaptive audio is not directly influenced by the player’s actions, but rather by more complex states and events, such as in-game time of day, location or other factors that might not always be transparent to the player.
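The distinction is easy to see in code. A minimal Python sketch, with a made-up music interface standing in for a real engine:

```python
from dataclasses import dataclass, field

@dataclass
class GameState:
    time_of_day: str = "day"
    danger: float = 0.0        # 0.0 (safe) .. 1.0 (boss fight)

@dataclass
class MusicSystem:
    """Hypothetical stand-in for a game's music engine."""
    layer_gains: dict = field(default_factory=lambda: {"pads": 0.2, "drums": 1.0})
    tempo: float = 100.0

# Interactive audio: a one-to-one, immediate response to a player action.
def on_question_block_hit():
    print("play coin.wav")     # stands in for triggering the sound effect

# Adaptive audio: follows broader game state, not any single input.
def update_music(state: GameState, music: MusicSystem):
    music.layer_gains["pads"] = 1.0 if state.time_of_day == "night" else 0.2
    music.tempo = 90 + 40 * state.danger   # 90 BPM when calm, 130 when tense

music = MusicSystem()
update_music(GameState(time_of_day="night", danger=0.5), music)
print(music.tempo)   # 110.0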

At Melodrive, the approach we use is described as deep adaptive music, reflecting how our system can dynamically react to deeper multi-dimensional states that encompass the human facets of emotion and arousal. You can read more about our ongoing research related to that on this very blog.

As a final note on this topic, in wider academic and musical contexts, procedurally adaptive music is inextricably related to algorithmic or generative music, which likewise uses a formalised set of rules or procedures to relinquish musical and compositional decision-making to a machine. Many techniques exist for generating music automatically, often drawing inspiration from natural or biological systems. For example, Al Biles borrows the evolutionary principle of natural selection to trade jazz fours using genetic algorithms, and Google recently made headlines with a deep learning network that generates piano phrases, a notable moment in AI composition.
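As a taste of the genetic-algorithm idea (a toy sketch only, not Biles’ actual GenJam system): evolve a pool of random phrases toward a simple, hand-picked musical fitness function.

```python
import random

MINOR_PITCH_CLASSES = {0, 2, 3, 5, 7, 8, 10}   # C natural minor

def fitness(phrase):
    """Toy fitness: reward in-scale notes and stepwise motion."""
    in_scale = sum(n % 12 in MINOR_PITCH_CLASSES for n in phrase)
    smooth = sum(abs(a - b) <= 4 for a, b in zip(phrase, phrase[1:]))
    return in_scale + smooth

def evolve(pop_size=30, generations=60, length=8, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(55, 79) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]           # survival of the fittest
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = a[:cut] + b[cut:]            # one-point crossover
            if rng.random() < 0.3:               # mutation: nudge one note
                child[rng.randrange(length)] += rng.choice([-2, -1, 1, 2])
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

print(evolve())   # eight MIDI notes biased toward C minor and small leaps
```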

Early days

The biggest challenge in developing games is that sometimes you have to compromise your great ideas owing to the limitations of the platform you’re using. While early video game sound and music may seem crude and primitive to today’s sophisticated ears (even chiptune, which revels in its retro appeal), many groundbreaking innovations from this era have profoundly influenced subsequent generations of game audio techniques.

LucasArts

Any review of the historical use of procedural music invariably begins with LucasArts, the video gaming arm of George Lucas’ media empire, which also encompasses Skywalker Sound and Industrial Light & Magic. In the early 90s, the studio released a string of point-and-click adventure games with rich storytelling, fiendish puzzles and unique humour. Building on Peter Langston’s earlier work on BallBlazer in the 80s, composers Michael Land and Peter McConnell developed the iMUSE system, which let them smoothly transition through variations on themes depending on different game scenarios. You can hear iMUSE across many titles, such as X-Wing, Sam & Max and Grim Fandango, but perhaps most memorably in the Monkey Island series, where endless pastiches of Caribbean ‘easy listening’ accompany the adventures of the hapless pirate Guybrush Threepwood.

Horizontal and vertical stem mixing

Part of the reason early games feel dated today is that the audio was synthesised artificially using the hardware of the time, rather than played back from recorded or sampled sound. With the proliferation of digital audio, the compact disc and MPEG compression, ‘real’ audio soon became available on consumer PCs and consoles. I still fondly remember my amazement on hearing, for the first time, actual big beat tracks by Leftfield and The Chemical Brothers on my PlayStation on the high-octane racetracks of Wipeout, or Public Enemy as I jumped 360s in Tony Hawk’s Pro Skater. But after a while, I felt wiped out hearing the same tracks umpteen times, and soon availed myself of the trick of swapping the disc for another to get something different.

Game developers quickly realised that fixed, linear tracks of existing music were not flexible or interactive enough for the purposes of modern video game music — especially once the novelty wore off with the public. To remedy this, they worked with composers and sound designers to use stems of music rather than final mixes.

A stem is audio parlance for a group of one or more tracks that contributes to a final mix or song. The drums could be considered one stem, but so could all the guitars, all the keyboards, or the guitars and keyboards together; the exact contents of a stem are up to those using them. Stems can be combined in many different ways to produce endless arrangements. Stacking different stems on top of each other so they are heard at the same time is known as vertical layering.
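In code, vertical layering is little more than a weighted sum of time-aligned buffers. A minimal NumPy sketch, with hypothetical stem names and intensity hook:

```python
import numpy as np

def mix_vertical(stems, gains):
    """Sum time-aligned stems, each scaled by its own gain (0.0 mutes it)."""
    out = np.zeros_like(next(iter(stems.values())))
    for name, audio in stems.items():
        out += gains.get(name, 0.0) * audio
    return out

# Fake one-second mono stems for the example:
n = 44100
stems = {name: np.random.randn(n).astype(np.float32) * 0.1
         for name in ("drums", "bass", "pads")}

# Fade the drums in as in-game intensity rises; keep bass and pads steady:
intensity = 0.7
mix = mix_vertical(stems, {"drums": intensity, "bass": 1.0, "pads": 1.0})
```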

Stems can also be repeated (in small loops or sections) or changed and branched sequentially in various combinations, known as horizontal mixing (or re-sequencing). By combining horizontal and vertical layering, we can begin to envisage building an extended, dynamic arrangement of ever-changing music from a very small set of elements. This is exactly how it’s done, for example, in the racing game series Forza Motorsport. (Full disclosure: I worked on several of these titles.) Audio lead Mike Caviezel says:

The last couple of Forza titles, 3, 4, and 5, were very EDM-friendly. Electronic music lends itself well to being remixed. So if you’re an electronic artist who wants to pitch your music to a game, make sure that you have your individual stems available, so that if we ever would want to create alternate mixes or pull out the drums in the middle of a racing game, we can do that on the fly. That’s very important for us.
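Returning to horizontal re-sequencing: at its simplest, it’s a transition graph over loops, with the next section chosen at each branch point. A toy sketch, with section names invented for the example:

```python
import random

# Which loops may legally follow each section:
TRANSITIONS = {
    "intro":  ["verse"],
    "verse":  ["verse", "chorus"],
    "chorus": ["verse", "bridge"],
    "bridge": ["chorus"],
}

def next_section(current, rng=random):
    return rng.choice(TRANSITIONS[current])

section = "intro"
for _ in range(8):                 # an ever-changing arrangement from 4 loops
    section = next_section(section)
    print(section)                 # stands in for queueing that loop's audio
```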

Composer Lance Hayes has a good video of how this works in practice:

But it need not be just for EDM. One of the most compelling applications of stem mixing in recent years is Rockstar’s Red Dead Redemption, documented beautifully in the video below. Gathering a host of legendary musicians and a wealth of authentic instruments, the team perfectly recreates the sepia-tinged deserts of Leone and Morricone’s Wild West fantasy. Watch the video, though, and you’ll notice that they constrain everything to a single key, A minor, so that any combination of stems remains harmonically compatible.

Modern audio middleware systems, such as FMOD and Wwise, serve as a bridge between developers and audio teams. One of their unique strengths is facilitating exactly the kind of non-linear, responsive musical behaviour that interactive games demand — features inherently lacking in traditional timeline-focused environments such as Pro Tools or Logic.

Pure procedural music

Stem mixing and layering of audio assets certainly increase engagement and variety in the game music experience, but they can still fall short compared with ‘pure’ procedural audio systems that link every parameter of the music generation process to in-game events. Game audio expert and researcher Andy Farnell has often argued for a revival of early “embedded” synthesis techniques that can be controlled procedurally: fine-grained control over all parameters of a sound can potentially offer the richest integration with a complex game world.
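A toy example of the principle (not one of Farnell’s actual models): an additive ‘engine’ tone whose pitch and brightness are resynthesised from a live game parameter each audio block, rather than triggering a fixed sample.

```python
import numpy as np

SAMPLE_RATE = 44100

def engine_tone(rpm, duration=0.25):
    """Map one game parameter (rpm) onto every synthesis parameter."""
    t = np.arange(int(SAMPLE_RATE * duration)) / SAMPLE_RATE
    f0 = 20.0 + rpm / 50.0                   # fundamental tracks engine speed
    brightness = min(rpm / 8000.0, 1.0)      # upper partials grow with revs
    signal = np.zeros_like(t)
    for k in range(1, 9):                    # eight harmonics
        signal += (brightness ** (k - 1)) * np.sin(2 * np.pi * k * f0 * t)
    return (0.2 / 8) * signal                # crude output gain

# Each audio block is computed fresh from live game state:
block = engine_tone(rpm=5500.0)
```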

Some modern games use procedural content generation to create the entire musical composition. The demoscene shooter .kkrieger harnesses PCG in every aspect of the game, with the soundtrack streaming MIDI data to its own V2 synthesiser. Rez Infinite quantises note events to ensure that the EDM-heavy music stays in sync with the player’s actions. Spore teamed up with Brian Eno and used a heavily customised version of the interactive media software Pure Data to create its immersive ambient soundscape. Finally, No Man’s Sky’s audio director Paul Weir joined forces with UK post-rock outfit 65daysofstatic, who drew on their experience with live coding and modular synthesis to create a truly unique musical experience.
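Rez’s exact implementation isn’t public, but the quantisation trick is simple to sketch: hold each triggered note until the next point on a tempo grid.

```python
import math

def quantise(event_time, bpm=128.0, division=4):
    """Delay an event to the next grid point; division=4 is a 16th-note grid."""
    step = (60.0 / bpm) / division       # grid resolution in seconds
    return math.ceil(event_time / step) * step

# A shot fired at t = 1.03 s actually sounds at the next 16th note:
print(quantise(1.03))   # 1.0546875 at 128 BPM
```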

The best of both worlds

To build truly rich and compelling soundtracks for modern games and interactive media, we need interactive, ever-changing experiences that react to the myriad in-game minutiae and to player progression and development. Game music also needs to draw on a wealth of musical knowledge and sources, comprising high-quality sampled and synthesised sound.

At Melodrive, we’re hard at work on a marriage of both spheres: combining sophisticated algorithmic composition and parametric synthesis of new sounds with complex samples of real-world instruments, processed with state-of-the-art DSP effects. Hundreds of gamers have told us that they’d love to create their own music, if only they had the means to do it.
