AI Music Composer

Manish Pawar · Published in All things AI · Aug 20, 2018

Assuming you have prior knowledge of deep learning, I'd like to dive straight into the topic.

Since music is sequential data, we'll use RNNs (LSTMs or GRUs), as per tradition. We'll work with a MIDI file and do all the preprocessing on it with mido, using the documentation here. Let me explain it piece by piece. The entire code is in my repo.

Let's code:

STEP 1: Since I was doing this on Google Colab, I needed the snippet below to import my .mid file. It's a Final Fantasy theme called Suteki Da Ne (piano version). NOTE: mido usually isn't preinstalled on Colab servers, but you can install it with !pip install mido.

from google.colab import files

uploaded = files.upload()

for fn in uploaded.keys():
    print('User uploaded file "{name}" with length {length} bytes'.format(
        name=fn, length=len(uploaded[fn])))

from mido import MidiFile, MidiTrack, Message
from keras.layers import LSTM, Dense, Activation, Dropout
from keras.preprocessing import sequence
from keras.models import Sequential
from keras.optimizers import RMSprop, Adam, SGD, Adagrad
import numpy as np
import mido

midi = MidiFile('Suteki_Da_Ne_(Piano_Version).mid')
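
As a quick sanity check that the file loaded (the exact numbers will differ for other MIDIs):

print(midi.length)       # total playback time in seconds
print(len(midi.tracks))  # number of tracks in the file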

STEP 2: We do some preprocessing. A track is a list of messages plus meta messages; if we exclude the meta messages, the song becomes easier to play back on any port. Our note is then the data of each message (msg.bytes()).

notes = []
time = float(0)
prev = float(0)

# The tracks attribute is a list of tracks; each track is a list of messages
# and meta messages, with each message's time attribute set to its delta time
# in ticks. Iterating over the MidiFile itself, as below, instead yields the
# messages in playback order with time converted to seconds.
for msg in midi:
    time += msg.time
    if not msg.is_meta:  # easier to play back on a port
        # only interested in the piano channel
        if msg.channel == 0:
            if msg.type == 'note_on':
                note = msg.bytes()
                # only keep note and velocity; a note message has the
                # form [type, note, velocity]
                note = note[1:3]
                note.append(time - prev)
                prev = time
                notes.append(note)
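
To see what these messages look like before we strip them down, a quick inspection sketch:

# print the raw bytes of the first few note_on messages
count = 0
for msg in midi:
    if not msg.is_meta and msg.type == 'note_on':
        print(msg.bytes(), msg.time)  # e.g. [144, 69, 77] 0.0
        count += 1
        if count == 5:
            break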

STEP 3: We need to scale our data into the 0–1 range, so e.g. the note [69, 77, 0] becomes [0.5113636363636364, 0.6062992125984252, 0.0]. We will have to unroll this back later (step 7).

# need to scale notes
n = []
for note in notes:
    note[0] = (note[0] - 24) / 88
    note[1] = note[1] / 127
    n.append(note[2])

max_n = max(n)  # scale based on the longest time gap of any note
for note in notes:
    note[2] = note[2] / max_n
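
A quick check that the scaling is invertible; this mirrors the unrolling formulas used in step 7:

# invert the scaling for the first note to recover the original values
note = notes[0]
print(note)                     # scaled, e.g. [0.5113..., 0.6062..., 0.0]
print([int(88 * note[0] + 24),  # back to the MIDI note number (69)
       int(127 * note[1]),      # back to the velocity (77)
       note[2] * max_n])        # back to the time gap in seconds (0.0)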

STEP 4: Our input (x) will be 20×3 NumPy arrays, with the output (y) being the next note to be predicted.

x = []
y = []
n_p = 20  # window size: 20 previous notes

for i in range(len(notes) - n_p):
    current = notes[i:i+n_p]
    next_note = notes[i+n_p]
    x.append(current)
    y.append(next_note)

x = np.array(x)  # convert to numpy arrays to pass through the model
y = np.array(y)
print(x[1])
print(y[1])
OUTPUT:
[[0.51136364 0. 0.12408759]
[0.52272727 0.50393701 0.00104275]
[0.52272727 0. 0.12408759]
[0.60227273 0.50393701 0.00104275]
[0.60227273 0. 0.12408759]
[0.48863636 0.50393701 0.00104275]
[0.48863636 0. 0.12408759]
[0.51136364 0.50393701 0.00104275]
[0.51136364 0. 0.12408759]
[0.59090909 0.50393701 0.00104275]
[0.59090909 0. 0.12408759]
[0.46590909 0.50393701 0.00104275]
[0.46590909 0. 0.12408759]
[0.54545455 0.50393701 0.00104275]
[0.54545455 0. 0.12408759]
[0.56818182 0.60629921 0.00104275]
[0.56818182 0. 0.12408759]
[0.45454545 0.50393701 0.00104275]
[0.45454545 0. 0.12408759]
[0.46590909 0.50393701 0.00104275]]
[0.46590909 0. 0.12408759] <-- this is my y
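
It's worth confirming the shapes before training (the first dimension depends on how many notes the file yields):

print(x.shape)  # (len(notes) - 20, 20, 3)
print(y.shape)  # (len(notes) - 20, 3)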

STEP 5: Now we build an LSTM network. We use RMSprop since it works well with RNNs (I suggest you play around with this, since val_acc can be pushed beyond the 95% reached here).

model = Sequential()
model.add(LSTM(512, input_shape=(20, 3), return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(512, return_sequences=True))
model.add(Dropout(0.3))
model.add(LSTM(512))  # the last LSTM returns only its final output
model.add(Dense(256))
model.add(Dropout(0.3))
model.add(Dense(3, activation="softmax"))  # output: one 3-value note

model.compile(loss="categorical_crossentropy", optimizer="RMSprop", metrics=["accuracy"])
model.fit(x, y, epochs=1000, batch_size=200, validation_split=0.1)
OUTPUT:
Epoch 999/1000
955/955 [==============================] - 1s 1ms/step - loss: 0.5739 - acc: 0.9476 - val_loss: 0.6992 - val_acc: 0.9159
Epoch 1000/1000
955/955 [==============================] - 1s 1ms/step - loss: 0.5732 - acc: 0.9372 - val_loss: 0.7183 - val_acc: 0.9533
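
As one example of playing around: swapping the optimizer is a one-line change. A sketch with Adam (the learning rate here is just a starting point, not a tuned value):

from keras.optimizers import Adam

# same model, different optimizer
model.compile(loss="categorical_crossentropy", optimizer=Adam(lr=0.001), metrics=["accuracy"])
model.fit(x, y, epochs=1000, batch_size=200, validation_split=0.1)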

STEP 6: Note that when generating text from an RNN, we randomly pick a word and append the predicted word to it (check my post on generating text with RNNs here). BUT for audio we can't append directly, so we squeeze x, concatenate it with the predicted note, and drop the oldest note to keep the window size fixed.

seed = notes[0:n_p]
x = seed
x = np.expand_dims(x, axis=0)  # shape (1, 20, 3)
print(x)

predict = []
for i in range(2000):
    p = model.predict(x)           # predicted next note, shape (1, 3)
    x = np.squeeze(x)              # squeeze to (20, 3) so we can concatenate
    x = np.concatenate((x, p))     # append the prediction -> (21, 3)
    x = x[1:]                      # drop the oldest note -> (20, 3)
    x = np.expand_dims(x, axis=0)  # expand back to (1, 20, 3) for the next predict
    p = np.squeeze(p)
    predict.append(p)
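
After the loop, predict holds the generated sequence, still in scaled form:

print(len(predict))  # 2000 generated notes
print(predict[0])    # a length-3 array of scaled [note, velocity, time]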

STEP 7: Unrolling the notes back from the step 3 scaling: each predicted a[0] becomes a MIDI note number again, a[1] a velocity, and a[2] a time gap.

# unrolling back from the conversion in step 3
for a in predict:
    a[0] = int(88 * a[0] + 24)  # note number
    a[1] = int(127 * a[1])      # velocity
    a[2] *= max_n               # time gap in seconds
    # clamp values that fall out of range (note: 24-102, velocity: 0-127, time: >= 0)
    if a[0] < 24:
        a[0] = 24
    elif a[0] > 102:
        a[0] = 102
    if a[1] < 0:
        a[1] = 0
    elif a[1] > 127:
        a[1] = 127
    if a[2] < 0:
        a[2] = 0

STEP 8: Now we decode the arrays back into a MIDI file with just a snippet. And if you want to download your generated song from Colab, simply write files.download('Ai_song.mid').

from mido import MidiFile, MidiTrack, Message

# saving a track built from the bytes data
m = MidiFile()
track = MidiTrack()
m.tracks.append(track)

for note in predict:
    # 147 (0x93) is the status byte for note_on on channel 3
    note = np.insert(note, 0, 147)
    note_bytes = note.astype(int)
    print(note)
    msg = Message.from_bytes(note_bytes[0:3])
    time = int(note[3] / 0.001025)  # rescale seconds to MIDI delta ticks (arbitrary factor)
    msg.time = time
    track.append(msg)

m.save('Ai_song.mid')
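
To actually hear the result on a local machine (this won't work on Colab), mido can stream the file to a MIDI output port. A minimal sketch, assuming a default output port and a backend such as python-rtmidi is available:

import mido

# play the generated file in real time through the default MIDI output port
with mido.open_output() as port:
    for msg in mido.MidiFile('Ai_song.mid').play():
        port.send(msg)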

Voila! You have just made an outstanding AI theme composer. Going further, we could generate a lyrical song by considering pitch, vocals, etc. in our array.

A music album called I AM AI, the featured single of which is set to release on August 21st, is the first album entirely composed and produced by an artificial intelligence, Amper. Amper works in collaboration with a human artist, who provides inputs that it uses as composing parameters.

Credits to Siraj Raval. I have modified the model and merely created a wrapper to aid understanding.
