Signals & Sorcery

Creating Procedural Music, Part 2: Standing on the Shoulders of Giants

Steve Hiehn
3 min read · Feb 11, 2017

Before digging into the implementation details, I think it will be helpful to summarize my tech stack. Over the last few years I’ve rewritten many of the components several times with different languages and frameworks, and as the project continues I’m certain that trend will hold, so bear in mind this is just a snapshot in time.

From about 10,000 feet, it is a distributed system currently hosted on a public cloud.

Permanent data store:

I’m using MySQL as the disk storage for all data, including analytics. Most of the data is normalized, with some exceptions where I treat tables as NoSQL-style key-to-JSON stores. MySQL is easy to use, free, and battle tested.
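
To make the key-to-JSON idea concrete, here is a minimal sketch using Spring’s JdbcTemplate and Jackson. The table and column names are hypothetical, not the actual schema.

// A minimal sketch of the key-to-JSON pattern, assuming a hypothetical
// "pattern_blob" table with columns (pattern_key VARCHAR, payload TEXT).
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.jdbc.core.JdbcTemplate;

public class JsonBlobStore {

    private final JdbcTemplate jdbc;
    private final ObjectMapper mapper = new ObjectMapper();

    public JsonBlobStore(JdbcTemplate jdbc) {
        this.jdbc = jdbc;
    }

    // Store any POJO as a JSON string keyed by a simple varchar.
    public void put(String key, Object value) throws Exception {
        String json = mapper.writeValueAsString(value);
        jdbc.update(
            "REPLACE INTO pattern_blob (pattern_key, payload) VALUES (?, ?)",
            key, json);
    }

    // Read the JSON back and rehydrate it into the requested type.
    public <T> T get(String key, Class<T> type) throws Exception {
        String json = jdbc.queryForObject(
            "SELECT payload FROM pattern_blob WHERE pattern_key = ?",
            String.class, key);
        return mapper.readValue(json, type);
    }
}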

In-Memory data store:

The next piece is completely essential to the system: the in-memory data grid. There are multiple options, but I find Hazelcast ( https://hazelcast.com/ ) very easy to use, and it stores Java objects, so I chose to work with it. The data grid is warmed at a regular interval with almost the entire database as POJOs. As you will notice, the system is extremely read-heavy; there is no way I could afford the hardware required to serve this volume of reads straight from SQL, nor would it make sense to try.
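
Here is a minimal sketch of that warming pattern, assuming Hazelcast 3.x imports and a loader that maps database rows to Serializable POJOs; the map name, interval, and loader are placeholders rather than the real implementation.

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class GridWarmer {

    private final HazelcastInstance hazelcast = Hazelcast.newHazelcastInstance();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Periodically reload the grid from a loader that pulls rows out of MySQL
    // (e.g. a DAO's findAll() mapped to id -> POJO). Values must be Serializable
    // so Hazelcast can distribute them across members.
    public void start(String mapName, Supplier<Map<Long, ? extends Serializable>> loader) {
        scheduler.scheduleAtFixedRate(() -> {
            IMap<Long, Serializable> grid = hazelcast.getMap(mapName);
            grid.putAll(loader.get());
        }, 0, 10, TimeUnit.MINUTES);
    }
}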

Application Core:

Next is what I’m calling the application core. It’s a Spring Boot Java application ( https://projects.spring.io/spring-boot/ ) which is essentially responsible for all the plumbing: assembling layers of audio, processing effects, normalization, generating and rendering MIDI, and pushing audio to a content delivery network. It’s also used for extracting features and passing them to the machine learning workers.
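
As a rough illustration of that plumbing, here is a sketch of how such a pipeline could be wired together. The stage interfaces and their ordering are my assumptions, not the actual code.

import java.nio.file.Path;

public class RenderPipeline {

    // Each stage is a hypothetical collaborator wrapping the tools described below.
    interface MidiRenderer  { Path renderToAudio(Path midiFile) throws Exception; }
    interface AudioMixer    { Path mixLayers(Path... stems) throws Exception; }
    interface Normalizer    { Path normalize(Path audio) throws Exception; }
    interface CdnPublisher  { String publish(Path audio) throws Exception; }

    private final MidiRenderer renderer;
    private final AudioMixer mixer;
    private final Normalizer normalizer;
    private final CdnPublisher publisher;

    public RenderPipeline(MidiRenderer renderer, AudioMixer mixer,
                          Normalizer normalizer, CdnPublisher publisher) {
        this.renderer = renderer;
        this.mixer = mixer;
        this.normalizer = normalizer;
        this.publisher = publisher;
    }

    // Take the MIDI layers chosen by the ML workers and turn them into a
    // single normalized track hosted on the CDN, returning its public URL.
    public String render(Path... midiLayers) throws Exception {
        Path[] stems = new Path[midiLayers.length];
        for (int i = 0; i < midiLayers.length; i++) {
            stems[i] = renderer.renderToAudio(midiLayers[i]);
        }
        Path mixed = mixer.mixLayers(stems);
        Path normalized = normalizer.normalize(mixed);
        return publisher.publish(normalized);
    }
}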

Machine Learning Workers:

The machine learning workers are where the magic happens. They receive extracted data features from the application core and decide whether the potential options are going to sound ‘good’ or not. When I began the project I started by learning the R language and attempting to write the workers in R. It was easy to use initially, but it ended up being horrible as a concurrent web service. After aborting the mission with R I switched to Python, which did work well and is also a standard in data science. However, the data access was already written in Java, and I became frustrated with the amount of time I was wasting porting data models and data access modules from the existing Spring Java app. This frustration led me to start playing with the Deeplearning4j framework ( https://deeplearning4j.org/ ). After testing out the DL4J demo apps and realizing it was easy enough to use, I decided to adopt it. So currently, the ML workers are essentially a Spring Boot web API wrapper around DL4J. The ML workers have been designed to be horizontally scalable; I usually have about four instances running concurrently.
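
As an illustration, here is a minimal sketch of what a scoring endpoint wrapping DL4J might look like, assuming a pre-trained model saved to disk; the model path, endpoint, and threshold are placeholders, not the real service.

import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.util.ModelSerializer;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

import java.io.File;
import java.io.IOException;

@RestController
public class ScoringController {

    private final MultiLayerNetwork model;

    public ScoringController() throws IOException {
        // Load a model previously trained via the training interface (path is illustrative).
        this.model = ModelSerializer.restoreMultiLayerNetwork(new File("/models/harmony-scorer.zip"));
    }

    // The core posts extracted features; the worker replies with "good enough or not".
    @PostMapping("/score")
    public boolean score(@RequestBody double[] features) {
        INDArray input = Nd4j.create(new double[][]{features});
        double probability = model.output(input).getDouble(0);
        return probability > 0.5;   // arbitrary threshold for illustration
    }
}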

Content Delivery Network:

I would imagine any CDN (content delivery network) would do, but I currently use Amazon S3 simply because of familiarity.
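
Pushing a rendered track to S3 is straightforward with the AWS Java SDK; a minimal sketch, with placeholder bucket and key names:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

import java.io.File;

public class CdnUploader {

    private final AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

    // Upload the finished mix so the end-user site can stream it.
    public String upload(File audioFile) {
        String key = "renders/" + audioFile.getName();
        s3.putObject("my-rendered-audio-bucket", key, audioFile);
        return "https://s3.amazonaws.com/my-rendered-audio-bucket/" + key;
    }
}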

Training interface:

This is a Spring Boot / Angular GUI for analytics and model training.

End User Site:

A public-facing Node.js/Angular website used to pipe audio from S3 to users.

Here is a list of the libraries, frameworks, and technologies that are significant:

FFmpeg:

https://ffmpeg.org/

Used for many things: encoding, normalization, merging of audio, and some FX.
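
For example, merging two stems and normalizing the result can be done by shelling out to ffmpeg; a minimal sketch, with illustrative filter settings rather than the exact ones used:

import java.io.IOException;

public class FfmpegMixer {

    public void mix(String stemA, String stemB, String outputPath)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(
                "ffmpeg", "-y",
                "-i", stemA,
                "-i", stemB,
                // Sum the two inputs, then apply EBU R128 loudness normalization.
                "-filter_complex", "amix=inputs=2:duration=longest,loudnorm",
                outputPath)
                .inheritIO()
                .start();
        if (process.waitFor() != 0) {
            throw new IOException("ffmpeg failed for " + outputPath);
        }
    }
}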

TarsosDSP:

https://github.com/JorenSix/TarsosDSP

This is the signal processing library I’m using to extract EQ profiles, pitches, and rhythmic templates. There are a few different options for digital signal processing libraries, but I chose this one because it is Java-based and seamless to integrate.
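
A minimal sketch of pulling pitch estimates out of a WAV file with TarsosDSP; the file name, buffer sizes, and sample rate are assumptions:

import be.tarsos.dsp.AudioDispatcher;
import be.tarsos.dsp.io.jvm.AudioDispatcherFactory;
import be.tarsos.dsp.pitch.PitchProcessor;

import java.io.File;

public class PitchExtractor {

    public static void main(String[] args) throws Exception {
        int sampleRate = 44100;
        int bufferSize = 2048;
        int overlap = 1024;

        AudioDispatcher dispatcher =
                AudioDispatcherFactory.fromFile(new File("loop.wav"), bufferSize, overlap);

        // YIN is one of several pitch estimation algorithms TarsosDSP ships with.
        dispatcher.addAudioProcessor(new PitchProcessor(
                PitchProcessor.PitchEstimationAlgorithm.YIN,
                sampleRate, bufferSize,
                (result, audioEvent) -> {
                    if (result.getPitch() != -1) {
                        System.out.printf("%.2fs  %.2f Hz%n",
                                audioEvent.getTimeStamp(), result.getPitch());
                    }
                }));

        dispatcher.run();   // blocks until the file has been processed
    }
}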

JFugue:

http://www.jfugue.org/

This library saved me an enormous amount of work. As someone with formal music training, I find it easiest to think in traditional Western harmony. JFugue allows me to interface with MIDI via Western note and chord names like ‘Cmaj7’. JFugue is used to generate MIDI files from the note patterns selected by the machine learning workers.
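
A minimal sketch of turning chord symbols into a MIDI file with JFugue; the progression here is just an example, not one chosen by the ML workers:

import org.jfugue.midi.MidiFileManager;
import org.jfugue.pattern.Pattern;

import java.io.File;
import java.io.IOException;

public class ChordToMidi {

    public static void main(String[] args) throws IOException {
        // Western chord names go straight into the pattern string.
        Pattern progression = new Pattern("Cmaj7 Am7 Dm7 G7")
                .setTempo(90)
                .setInstrument("Piano");

        MidiFileManager.savePatternToMidi(progression, new File("progression.mid"));
    }
}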

MrsWatson:

https://github.com/teragonaudio/MrsWatson

This is a command-line VST host that I use for two purposes (a sketch follows the list):

1) Rendering the MIDI files into audio via open-source synthesizers.

2) Adding FX to audio.
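
A minimal sketch of driving MrsWatson from Java for the first purpose, rendering a MIDI file through a VST instrument; the flags and plugin name should be checked against the MrsWatson documentation for your setup:

import java.io.IOException;

public class VstRenderer {

    public void renderMidi(String midiPath, String pluginName, String outputWav)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(
                "mrswatson",
                "--midi-file", midiPath,
                "--plugin", pluginName,      // an open-source synth VST
                "--output", outputWav)
                .inheritIO()
                .start();
        if (process.waitFor() != 0) {
            throw new IOException("MrsWatson failed to render " + midiPath);
        }
    }
}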

Please listen to the latest output from my system here:

http://signalsandsorcery.com/

Thanks!

Part 3: Order of Operations
