Play Bach: Let a Neural Network Play For You. Part 4
I do not know how to play music. But I can still play with music.
This article is part of a series that explores many aspects of this project, including static MIDI file generation, real-time streaming, TensorFlow/Keras sequential and functional models, LSTMs, over- and underfitting, the attention mechanism, embedding layers, multi-head models, probability distributions, conversion to TensorFlow Lite, use of TPUs/hardware accelerators, and running the application on multiple platforms (Raspberry Pi, edge devices)…
The entire code is available on GitHub.
So far we have generated a MIDI file and played it from our desktop. A bit boring?
Let’s build a slightly nicer user interface from which we can stream generated music in real time, using any device (smartphone, tablet, …), from anywhere on the internet.
The use case is bragging to your friends: “see what I built”.
You can also listen to (hopefully) interesting music being generated, at this very moment, just for you.
The real-time prediction and streaming application is structured as two Python threads, communicating through a queue and controlled by a GUI (Graphical User Interface):
- The Predict thread loads an already-trained model, runs inference, and converts the predicted MIDI note/chord to a PCM audio sample. PCM (Pulse Code Modulation) is a technique for digitally encoding an analog sound signal. This ‘chunk of sound’ is put on a queue, and the Predict thread proceeds to the next inference.
- The Streaming thread reads audio samples from the queue and sends them to the smartphone’s browser. The browser recognises the data as PCM and plays it using the smartphone/tablet/desktop audio system. The Streaming thread then reads the next audio sample from the queue.
- The GUI is used to configure inference parameters, start the Predict thread, and start playing the music.
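To make the PCM encoding mentioned above concrete, here is a minimal sketch (not the project’s actual code) that renders a single pure tone as raw 16-bit PCM bytes — the kind of ‘chunk of sound’ the Predict thread would put on the queue:

```python
import math
import struct

SAMPLE_RATE = 44100  # samples per second, a common audio rate

def note_to_pcm(frequency_hz, duration_s=0.5, amplitude=0.3):
    """Render a pure tone as 16-bit signed little-endian PCM bytes."""
    n_samples = int(SAMPLE_RATE * duration_s)
    samples = []
    for n in range(n_samples):
        value = amplitude * math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE)
        samples.append(int(value * 32767))  # scale to the int16 range
    return struct.pack("<%dh" % n_samples, *samples)

chunk = note_to_pcm(440.0)  # 440 Hz = A4
```

In the real application the sound comes from a SoundFont-rendered MIDI note rather than a sine wave, but the output is the same kind of byte buffer.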
The GUI and the Predict and Streaming threads run on a server powerful enough to run inference. The client only needs a browser and a sound system.
The objective of this dual-thread approach is to maximize parallelism and improve overall server performance. We will discuss performance and running on low-end platforms in future articles.
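The dual-thread pattern can be sketched with Python’s `threading` and `queue` modules. This is a simplified illustration, not the project’s actual code; the worker names and the fake byte chunks are placeholders:

```python
import queue
import threading

audio_queue = queue.Queue(maxsize=8)  # bounded: Predict can't run too far ahead
SENTINEL = None

def predict_worker(n_chunks):
    """Stand-in for the Predict thread: produce audio chunks."""
    for i in range(n_chunks):
        chunk = b"pcm-%d" % i      # real code: inference + MIDI-to-PCM conversion
        audio_queue.put(chunk)     # blocks when the queue is full
    audio_queue.put(SENTINEL)      # signal end of stream

played = []

def stream_worker():
    """Stand-in for the Streaming thread: consume and 'send' chunks."""
    while True:
        chunk = audio_queue.get()
        if chunk is SENTINEL:
            break
        played.append(chunk)       # real code: write to the HTTP response

producer = threading.Thread(target=predict_worker, args=(3,))
consumer = threading.Thread(target=stream_worker)
producer.start(); consumer.start()
producer.join(); consumer.join()
```

The bounded queue is what decouples the two threads: inference keeps running while earlier audio is still being streamed, and blocks only when it gets too far ahead.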
The Streaming thread uses Flask, a Python micro web framework.
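In Flask, streaming is typically done by returning a `Response` that wraps a generator. Below is a minimal sketch of how the Streaming thread’s endpoint could look; the route name and MIME type are illustrative assumptions, not necessarily what the project uses:

```python
import queue

from flask import Flask, Response

app = Flask(__name__)
audio_queue = queue.Queue()

def pcm_stream():
    """Yield PCM chunks from the queue until a None sentinel arrives."""
    while True:
        chunk = audio_queue.get()
        if chunk is None:
            break
        yield chunk

@app.route("/stream")
def stream():
    # Flask sends each yielded chunk to the client as it becomes available
    return Response(pcm_stream(), mimetype="audio/x-wav")
```

The browser starts playing as soon as the first chunks arrive, which is what makes the “generated at this very moment” experience possible.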
The GUI is developed using REMI, a Python GUI library that renders its interface in the browser.
The GUI, available in the smartphone’s browser, lets you:
- Load a trained model. So far we have trained a model on the suites for solo cello, but what about Brandenburg Concerto №1 in F Major — BWV 1046, or the Christmas Oratorio, BWV 248…
- Select an instrument to play the music. A typical SoundFont comes with dozens of instruments.
- Select the tempo, i.e. the “speed” of the music, in BPM (beats per minute).
- Select a ‘temperature’ which, in a way, increases randomness in the network’s predictions. This may generate interesting, surprising, or weird music.
Hit Play, et voilà… music will play on your device.
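Temperature sampling is typically implemented by dividing the network’s logits by the temperature before the softmax. Here is a minimal pure-Python sketch of the idea (illustrative, not the project’s exact code):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0):
    """Pick an index from logits; low T approaches argmax, high T flattens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs, k=1)[0]
```

At a very low temperature the most likely note is almost always chosen, which sounds safe but repetitive; at a high temperature the distribution flattens and surprising (or weird) notes appear.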
A few monitoring functions are also included: a server test to ‘ping’ the inference/streaming server, an audio test to check the audio subsystem on the server, and some statistics from the Flask web server.
It is possible to change the inference parameters while music is playing. The updated parameters are taken into account once all previously queued predictions have been played.
A JSON file is used to configure the GUI: credentials, IP address or DNS name, and ports for the GUI and the web server, e.g.:
"dns" : "http://192.168.1.61:"
To start the inference/streaming/GUI server, run:
python play_bach.py -st
My experience is that most of the complexity (and lines of code) of such an application lies not in the deep learning model definition, but in preparing the data so that it can be fed to the model, and in designing how to use the predictions in a real-world application. Getting a prediction (a softmax output — see Part 3) is good, but using a prediction to drive an application that impacts the real world and ‘touches’ the end user is much better. Unless the music is really awful, of course.
The next article will cover another aspect of this project.
In the meantime, stay tuned!
— — — — — Do not cross this line if you are not interested in details — — — — —
This whole series of articles is about supervised learning, i.e. training a model with examples. In the case of training a model to distinguish a dog from a cat, that would be many pictures of cats and dogs, each labeled ‘cat’ or ‘dog’. This implies that someone, at some point, somewhere, did the work of labelling the pictures, and this had to be done by a human looking at each picture. Of course, databases of labeled pictures now exist.
Our case is indeed supervised learning, but we are lucky enough not to have to explicitly label the training data. The very structure of our data automatically provides the label (i.e. the right answer), because it is simply the 41st note following a sequence of 40.
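In code, this self-labelling amounts to sliding a window over the note sequence: each window of 40 notes is an input, and the note immediately after it is the label. A minimal sketch (the function name is illustrative):

```python
SEQ_LEN = 40  # the network sees 40 notes and must predict the 41st

def make_training_pairs(notes, seq_len=SEQ_LEN):
    """Slice a note sequence into (input window, next-note label) pairs."""
    pairs = []
    for i in range(len(notes) - seq_len):
        window = notes[i:i + seq_len]
        label = notes[i + seq_len]  # the note right after the window is the label
        pairs.append((window, label))
    return pairs
```

No human labelling is needed: the corpus of Bach scores labels itself.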
The other types of learning (not covered here) are:
- Unsupervised learning: when no labels are available, because it would be too costly to create them or because they simply do not exist, the model can use unsupervised learning methods to extract information from the raw data, e.g. clustering the data into groups that share a common characteristic.
- Reinforcement learning: a baby quickly learns that ‘fire burns’. They try (possibly at random) to interact with their environment and get feedback — in that case a negative one — that they will remember. In reinforcement learning, the model interacts with an environment, takes actions that impact the environment, and in return receives a ‘reward’, which can be positive or negative. The model is trained to maximise the long-term value of the reward. A typical example is training a model to play a game. Fascinating stuff…
So, was crossing the line worthwhile?