How AI evolved in the world of music: From Markov Chains to LSTMs and NSynth.

Dhruv Kabra
Published in Version 1
May 11, 2023

Hey there, folks! Have you ever wondered how artificial intelligence (AI) is changing the world of music? Well, let me tell you, it’s pretty amazing. AI is being used to create music that was once impossible for humans to make without large groups of musicians and orchestras. And the best part? Anyone can use these tools to make music, regardless of their musical abilities. It all started when the composer Iannis Xenakis used a Markov chain to generate music way back in 1958. Since then, the technology has come a long way, and we now have deep learning models that can generate music that sounds like it was created by humans. In this blog post, I will take you through the evolution of AI-generated music and show you how we got to where we are today.

Markov Chains

Markov chains are a statistical model that can be used to generate sequences of events based on the probability of each event occurring. In the case of music, a Markov chain can be used to generate new melodies or chord progressions by analysing the patterns and structures of existing music. Essentially, the algorithm looks at a piece of music and identifies the most common transitions between notes or chords. It then uses these patterns to create new music that has a similar structure.

While Markov chains can be useful for generating simple melodies or chord progressions, they are limited in their ability to create longer and more complex pieces of music. This is because they can only grasp short-term sequences and are unable to learn long-term patterns or structures.

Here is a simple code snippet that generates a melody using a Markov chain:

import random

# Build a first-order Markov chain from a text file of space-separated notes
markov_dict = {}

with open("music_data.txt", "r") as f:
    data = f.read()

notes = data.split()

# Record which notes follow each note in the training data
for i in range(len(notes) - 1):
    current_note = notes[i]
    next_note = notes[i + 1]
    if current_note not in markov_dict:
        markov_dict[current_note] = []
    markov_dict[current_note].append(next_note)

# Generate a 20-note melody by walking the chain
melody = []
current_note = random.choice(list(markov_dict.keys()))
for i in range(20):
    melody.append(current_note)
    if current_note not in markov_dict:
        # Dead end: this note never appears as a "current" note, so restart randomly
        current_note = random.choice(list(markov_dict.keys()))
    else:
        current_note = random.choice(markov_dict[current_note])

print(melody)

The disadvantage of Markov chains is that they can only recombine notes seen in the training samples according to the learned probability distribution; they can never infer genuinely new notes or invent music beyond the patterns they were given.
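To make that probability distribution explicit, the transition lists built in the snippet above can be normalised into actual probabilities and sampled with weights. This is just a small illustrative extension of the same idea; the helper names to_probabilities and next_note are my own, not from any library:

import random

# Turn the raw transition lists (markov_dict from the snippet above) into
# explicit probability distributions, e.g. {'C4': {'D4': 0.75, 'E4': 0.25}, ...}
def to_probabilities(markov_dict):
    probs = {}
    for note, next_notes in markov_dict.items():
        total = len(next_notes)
        probs[note] = {n: next_notes.count(n) / total for n in set(next_notes)}
    return probs

# Sample the next note according to the learned distribution
def next_note(probs, current_note):
    choices = list(probs[current_note].keys())
    weights = list(probs[current_note].values())
    return random.choices(choices, weights=weights)[0]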

The first attempts to generate music with AI were limited by their short-term coherence. They could only grasp short-term sequences, which made it difficult to create longer and more complex pieces of music. However, with the development of recurrent neural networks, things started to change. These networks are designed to learn from sequences, making them perfect for generating music.

The rise of LSTMs in the late 1990s gave neural networks new ways to hold a memory of their previous sequences.

Long short-term memory networks (LSTMs) are a type of recurrent neural network that are designed to learn from sequences of data. Unlike traditional neural networks, which are designed to work with fixed-sized inputs, LSTMs are capable of processing sequences of data of arbitrary length. This makes them ideal for generating music, as music is essentially a sequence of notes or chords.

LSTMs work by processing each note or chord in a sequence, and then using the information from previous notes or chords to predict what should come next. This allows the model to learn long-term patterns and structures in the music, and to generate longer and more complex pieces of music.

# Import the necessary libraries
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense, LSTM, Dropout

# Load the music data
# This can be in any format, such as MIDI or a text file of notes and durations
# For example, you can use the music21 library to load a MIDI file:
# from music21 import converter
# score = converter.parse('path/to/midi/file.mid')
# notes = score.flat.notesAndRests
# durations = [n.duration.quarterLength for n in notes]
# pitches = [n.pitch.midi for n in notes]
# music_data = list(zip(pitches, durations))
music_data = [(60, 1.0), (62, 0.5), (64, 0.5), (60, 1.0), (62, 0.5), (64, 0.5), (62, 1.0), (60, 1.0), (62, 0.5), (64, 0.5), (62, 1.0), (60, 1.0), (62, 0.5), (64, 0.5), (60, 1.0), (62, 0.5), (64, 0.5), (62, 1.0), (60, 1.0), (62, 0.5), (64, 0.5), (62, 1.0), (60, 1.0), (62, 0.5), (64, 0.5), (60, 1.0), (62, 0.5), (64, 0.5), (62, 1.0)]

# Define the vocabulary and sequence length
# Each unique (pitch, duration) pair is treated as one class for prediction
vocab = sorted(set(music_data))
vocab_size = len(vocab)
note_to_int = {note: i for i, note in enumerate(vocab)}
int_to_note = {i: note for note, i in note_to_int.items()}
seq_length = 20

# Create input/output sequences
input_seqs = []
output_seqs = []
for i in range(len(music_data) - seq_length):
    input_seqs.append(music_data[i:i + seq_length])
    output_seqs.append(note_to_int[music_data[i + seq_length]])

# Reshape the input sequences to (samples, timesteps, features)
x = np.reshape(input_seqs, (len(input_seqs), seq_length, 2))

# One-hot encode the output sequences (class index of the next note)
y = keras.utils.to_categorical(output_seqs, num_classes=vocab_size)

# Define the LSTM model
model = Sequential()
model.add(LSTM(units=128, input_shape=(seq_length, 2), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=128))
model.add(Dropout(0.2))
model.add(Dense(units=vocab_size, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy')

# Train the model
model.fit(x, y, batch_size=32, epochs=100)

# Generate new music: start from a seed sequence and repeatedly predict the next note
start_seq = music_data[:seq_length]
generated = list(start_seq)
for _ in range(30):
    x_input = np.reshape(generated[-seq_length:], (1, seq_length, 2))
    prediction = model.predict(x_input, verbose=0)
    next_note = int_to_note[int(np.argmax(prediction))]
    generated.append(next_note)

print(generated)

Fast forward to the 2010s: Generative Adversarial Networks (GANs) were first introduced in 2014 by a group of researchers led by Ian Goodfellow. The technique has since become incredibly popular and powerful in the field of deep learning, and has been used to create realistic images and videos and to synthesize images from text.

A generative adversarial network (GAN) offers a novel way to generate data. A GAN has two parts, the generator and the discriminator, both of which are neural networks. The generator’s job is to create new data, while the discriminator’s job is to determine whether that data is real or fake. Plain recurrent networks, by contrast, struggle to learn long-term sequences because of the vanishing gradient problem; LSTMs were developed to improve on that architecture, but GANs take a fundamentally different, adversarial approach to generation.
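As a rough illustration of this setup, here is a minimal Keras sketch of a generator and discriminator for short note sequences. It is purely illustrative: the representation (fixed-length sequences of pitches scaled to the 0–1 range) and names such as seq_length, latent_dim and train_step are assumptions for this example, not taken from Magenta or any particular paper.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

seq_length = 20   # length of each generated note sequence
latent_dim = 16   # size of the random noise vector fed to the generator

# Generator: turns random noise into a note sequence (values in 0-1)
generator = Sequential([
    Dense(64, activation='relu', input_shape=(latent_dim,)),
    Dense(seq_length, activation='sigmoid'),
])

# Discriminator: decides whether a note sequence is real or generated
discriminator = Sequential([
    Dense(64, activation='relu', input_shape=(seq_length,)),
    Dense(1, activation='sigmoid'),
])
discriminator.compile(optimizer='adam', loss='binary_crossentropy')

# Combined model: the discriminator is frozen here, so training this model
# only updates the generator's weights
discriminator.trainable = False
gan = Sequential([generator, discriminator])
gan.compile(optimizer='adam', loss='binary_crossentropy')

# One training step of the adversarial game (real_batch: array of real sequences)
def train_step(real_batch):
    batch_size = len(real_batch)
    noise = np.random.normal(size=(batch_size, latent_dim))
    fake_batch = generator.predict(noise, verbose=0)
    # Train the discriminator on real (label 1) and fake (label 0) sequences
    discriminator.train_on_batch(real_batch, np.ones((batch_size, 1)))
    discriminator.train_on_batch(fake_batch, np.zeros((batch_size, 1)))
    # Train the generator to make the discriminator output "real"
    gan.train_on_batch(noise, np.ones((batch_size, 1)))

Freezing the discriminator inside the combined gan model is the standard trick: the discriminator still learns when trained directly, while gan.train_on_batch pushes only the generator toward fooling it.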

The Magenta team at Google Brain has been developing code for generating music using machine learning, and they are really on the bleeding edge of this stuff. Through a cat-and-mouse game between the generator and the discriminator, the generator becomes so good at creating music that even the discriminator can’t tell whether it’s real or fake. This approach is especially useful for creating monophonic music, but GANs are also being developed to generate polyphonic music (music with chords). Fast forward to today, and we have NSynth, which is built on top of WaveNet. NSynth is a sound maker that lets you try out different instruments and combine them to make all-new sounds.

It’s pretty amazing to think about how far we’ve come in just a few decades. From Markov chains to deep learning models, AI-generated music has come a long way. AI in music is just one example of how technology is changing how we create and consume art. It can also be used in areas like advertising, film scores, and game soundtracks. The democratization of AI tools means that anyone can use them to create music, no matter their background or experience.

Of course, there are still some important questions to answer in this space. How do we decide on a proper representation of music? What music data should we use? Whose music counts? These are all important questions, but for now, let’s just appreciate how far we’ve come. We may not be at the point where AI-generated music can replace human musicians, but we’re getting there. Who knows what the future holds?

So next time you listen to a song, take a moment to think about the technology behind it and how AI is changing the game. Who knows, maybe the next hit song will be created by a computer.

About the Author:
Dhruv Kabra is a Python Developer here at Version 1.
