How to Generate Real Music Using ChatGPT?

With the help of an incredible app that you probably didn’t know about

XQ
The Research Nest


Photo by Marcela Laskoski on Unsplash

Wait, what? ChatGPT only generates text at the moment. How is music even possible?

The answer to that question is also the reason why I love what I do. I am not a musician or a magician, but a programmer. As the saying goes, programming is the closest thing to magic we have in our world. If you can imagine something coherent, you can make it happen with code. It’s a powerful tool for creating anything you can think of, including music.

Do you know who else is very good at programming? ChatGPT!

So, here’s the answer.

We can make ChatGPT write code that can generate music as we imagine it in our prompt. But how do we convert code to music?

Fortunately, we don’t have to start from scratch. Sam Aaron has built an incredible tool called Sonic Pi, a coding environment for creating music live.

I think this is a very intuitive way to create music. Instead of predicting the next sequence of an audio file, we are directly “computing” music, which gives it more human context. ChatGPT has already learned a very large number of representations and patterns about our world. It knows about our trends, our culture, our music, and everything in between from the point of view of “natural language.”

It also knows about Sonic Pi from the documentation and tutorials already present in its training datasets. ChatGPT can also generate code really well. This makes it a perfect candidate to act as a bridge between natural human language and Sonic Pi, translating our thoughts into code. With OpenAI recently launching plugins, one could even build a plugin that connects the chat interface to a Sonic Pi-like tool and creates music from within it.

I feel this is a better way to create music than training a generative model over audio data to predict the next sequence. However, I doubt there is enough data for ChatGPT to perform really well at this task. Sonic Pi is very niche software, so its documentation, tutorials, and examples are scarcer than those of more popular tools.

This presents an exciting opportunity to create large, high-quality datasets of Sonic Pi code and the corresponding natural language descriptions so that models learn it better (I don’t know if such datasets already exist). Imagine if all the world’s music were transcribed into Sonic Pi code while maintaining its correlation with human context, tone, and emotions, and you trained a model on it. Would we get a Midjourney of music?
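
Such a dataset could be as simple as pairs of natural language descriptions and Sonic Pi snippets. Here’s a minimal sketch of what one record might look like (a hypothetical format of my own, not an existing dataset):

# One hypothetical training record: a description paired with Sonic Pi code
record = {
  description: "A slow, melancholic four-note piano phrase in A minor",
  code: <<~SONIC_PI
    use_bpm 60
    use_synth :piano
    play_pattern_timed [:a3, :c4, :e4, :a4], [1, 1, 1, 2]
  SONIC_PI
}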

Let’s see how well ChatGPT performs in this task at the moment.

You can download Sonic Pi from the link above. It’s a pretty straightforward installation. Then, all you need to do is copy and paste the code that ChatGPT generates and click the Run button.
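
To sanity-check the setup, paste a couple of lines into a buffer and hit Run. If you hear two piano notes one second apart, everything works (a trivial warm-up of my own, not ChatGPT’s):

# Play middle C, wait one beat, then play E
use_synth :piano
play :c4
sleep 1
play :e4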

I have created five prompts of increasing complexity to test it.

Prompt 1: Write a sonic pi program to compose the complete music of the twinkle twinkle little star.

# Set the tempo and use the piano synth
use_bpm 60
use_synth :piano

# Define the notes and their durations
notes = [:c4, :c4, :g4, :g4, :a4, :a4, :g4,
         :f4, :f4, :e4, :e4, :d4, :d4, :c4]

durations = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1,
             0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1]

# Play the notes with their durations in a loop
live_loop :twinkle_twinkle do
  index = 0
  14.times do
    play notes[index]
    sleep durations[index]
    index = index + 1
  end
end

It gets it right. This is an easy task, and it does well on most simple tunes.
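
As a side note, Sonic Pi has a built-in helper, play_pattern_timed, that plays a list of notes against a list of durations, so the same tune can be written more compactly (my refactor, not ChatGPT’s output):

# The same melody, using Sonic Pi's built-in play_pattern_timed
use_bpm 60
use_synth :piano

notes = [:c4, :c4, :g4, :g4, :a4, :a4, :g4,
         :f4, :f4, :e4, :e4, :d4, :d4, :c4]
durations = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1,
             0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1]

play_pattern_timed notes, durations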

Prompt 2: Write a sonic pi program to create the piano tune for the song Payphone by Maroon5

# Set the tempo and synth
use_bpm 120
use_synth :piano
use_synth_defaults amp: 0.5

# Define the notes for the intro
intro_notes = [:E5, :B4, :C5, :D5, :C5, :B4, :A4, :B4]

# Define the notes for the verse
verse_notes = [:E5, :B4, :C5, :D5, :C5, :B4, :A4, :B4]

# Define the notes for the chorus
chorus_notes = [:E5, :G5, :A5, :B5, :C6, :B5, :A5, :B5]

# Define the durations for each note
note_durations = [0.5, 0.5, 1, 1, 1, 0.5, 0.5, 2]

# Play the intro
intro_notes.each_with_index do |note, index|
  play note, sustain: note_durations[index]
  sleep note_durations[index]
end

# Play the verse
verse_notes.each_with_index do |note, index|
  play note, sustain: note_durations[index]
  sleep note_durations[index]
end

# Play the chorus
chorus_notes.each_with_index do |note, index|
  play note, sustain: note_durations[index]
  sleep note_durations[index]
end

This sounds coherent but doesn’t exactly sound like Payphone. I try one more approach: I give the model the music transcript of the song from here and ask it to convert that into a Sonic Pi program. It creates more complex code that plays well, but it doesn’t exactly sound like Payphone, either.

Prompt 3: Write a sonic pi program to play Beethoven’s Piano Sonata №14 in C-sharp minor “Quasi una fantasia”, popularly known as the Moonlight Sonata’s third and last movement, Presto Agitato.

This prompt is inspired by one of my favorite anime, “Your Lie in April.” In the opening scene, the protagonist plays this piece. Here’s one of the results ChatGPT gives. It doesn’t sound accurate. At this point, it is clear that ChatGPT cannot reproduce complex tunes in full. However, the result is still compelling, following similar notes and beats.

# Set up the tempo and key signature
use_bpm 120
use_synth :piano
use_transpose -1

# Define the notes of the melody
notes = [:Cs5, :D5, :Cs5, :D5, :Cs5, :D5, :Cs5, :D5,
         :Cs5, :D5, :Cs5, :D5, :Cs5, :D5, :Cs5, :D5,
         :Cs5, :D5, :Cs5, :D5, :Cs5, :D5, :Cs5, :D5,
         :Cs5, :D5, :Cs5, :D5, :Cs5, :D5, :Cs5, :D5,
         :B4, :Cs5, :B4, :Cs5, :B4, :Cs5, :B4, :Cs5,
         :B4, :Cs5, :B4, :Cs5, :B4, :Cs5, :B4, :Cs5,
         :B4, :Cs5, :B4, :Cs5, :B4, :Cs5, :B4, :Cs5,
         :B4, :Cs5, :B4, :Cs5, :B4, :Cs5, :B4, :Cs5]

# Define the durations of each note
durations = [0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25,
             0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25,
             0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25,
             0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25,
             0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25,
             0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25,
             0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25,
             0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25]

# Play the melody
play_pattern_timed notes, durations
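
For reference, the real third movement is built on rapid rising arpeggios rather than a two-note oscillation. A rough gesture in that direction looks something like this (my own sketch of the texture, not a transcription of the score):

# Rapid rising C-sharp minor arpeggios, loosely evoking the movement's opening texture
use_bpm 160
use_synth :piano

live_loop :presto_agitato do
  2.times do
    (chord :cs4, :minor).each do |n|  # :cs4, :e4, :gs4
      play n, release: 0.2
      sleep 0.125
    end
  end
  play :cs5, release: 0.3  # an accented top note, a loose approximation
  sleep 0.25
end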

Let’s get more creative. For prompt 4, I asked ChatGPT to write a poem about the spring season, think about music that would go well with it, and then write a Sonic Pi program for that music. Here’s the poem it generated.

The winter's icy grip begins to thaw,
As springtime's warmth begins to draw,
The flowers from the earth's deep roots,
With vibrant colors, it's spring that shoots.

The trees begin to bud with green,
As new life bursts forth on the scene,
The birds return from their migration,
Filling the air with sweet elation.

The world awakens from its slumber,
And all around, new wonders outnumber,
The sound of rain upon the ground,
A symphony of life, all around.

The sun's rays grow stronger every day,
As the cold of winter fades away,
The grass grows green, the air turns mild,
Nature's beauty, now fully compiled.

Oh, spring, you are a welcome sight,
A time of new beginnings and delight,
A season of hope, of joy, of love,
A gift to us from heaven above.

Here’s the corresponding music it created.

# Set up the tempo and synth
use_bpm 90
use_synth :piano
use_synth_defaults attack: 0.2, release: 1

# Define the notes and chords
spring_notes = [:C4, :D4, :E4, :F4, :G4, :A4, :B4, :C5]
spring_chords = [(chord :C4, :major), (chord :D4, :major),
                 (chord :E4, :major), (chord :F4, :major),
                 (chord :G4, :major), (chord :A4, :major),
                 (chord :B4, :major), (chord :C5, :major)]

# Define the melody and chord progression
melody = [spring_notes[0], spring_notes[1], spring_notes[2], spring_notes[3],
          spring_notes[4], spring_notes[5], spring_notes[6], spring_notes[7],
          spring_notes[6], spring_notes[5], spring_notes[4], spring_notes[3],
          spring_notes[2], spring_notes[1], spring_notes[0]]
chords = [spring_chords[0], spring_chords[1], spring_chords[2], spring_chords[3],
          spring_chords[4], spring_chords[5], spring_chords[6], spring_chords[7]]

# Play the chords
in_thread do
  2.times do
    chords.each do |c|
      play_chord c
      sleep 1
    end
  end
end

# Play the melody
in_thread do
  melody.each do |n|
    play n
    sleep 0.5
  end
end

# Add some gentle strings in the background
in_thread do
  use_synth :blade
  use_synth_defaults amp: 0.2
  2.times do
    chords.each do |c|
      play_chord c
      sleep 2
    end
  end
end

# Keep the main thread alive while the music plays for about a minute
sleep 60

Now, this is quite something. Check out the output I got below.

For our fifth test, let us do some prompt engineering.

The prompt: Act as an expert musician and suggest, in detail, the type and flow of music that can be composed for a given scene and context. Here’s the scene. A 60 second clip of a boxer doing training and various exercises to prepare themselves for their next match. What background music should we compose?

It gives a very detailed response.

“For this scene, I would recommend a high-energy, fast-paced instrumental track with a driving rhythm that complements the boxer’s movements and helps to build tension and excitement. Here are some specific suggestions for the type and flow of music that could be composed:

  1. Tempo: A fast tempo, around 120–140 beats per minute, would work well to match the boxer’s movements and give the sense of intensity and urgency. This could be achieved with electronic drums, a driving bassline, and a repeating melody or riff.
  2. Instrumentation: A combination of electronic and acoustic instruments could be used to create a dynamic and powerful sound. Drums and bass guitar would provide a strong foundation, while electric guitar or keyboard could add melody and texture. Brass or strings could be added to create a more cinematic or epic feel.
  3. Dynamics: The music should have a sense of ebb and flow, rising and falling in intensity to match the boxer’s movements and the progression of their workout routine. There could be moments of quiet and contemplation, followed by explosive bursts of energy as the boxer performs intense exercises or spars with a partner.
  4. Mood: The music should convey a sense of determination and focus, as well as the physical and mental challenges of training for a big fight. This could be achieved with a mix of minor and major chords, and a combination of aggressive and more introspective musical elements.
  5. Structure: The music could be structured in a way that follows the boxer’s workout routine, building in intensity as the exercises become more challenging. This could be achieved by starting with a simple rhythm and melody, and gradually adding more instruments and layers of complexity as the scene progresses.

Overall, the goal of the music in this scene should be to enhance the emotional impact of the boxer’s training, creating a sense of excitement, tension, and anticipation for their upcoming match.”

I then asked a follow-up question, “Write a Sonic Pi program to compose music meeting the above requirements.”

It gave very detailed code. At first, the code threw errors; I copy-pasted them back into ChatGPT, and it fixed them. Here’s the running code.

# Set up the tempo and scheduling
use_bpm 130
use_sched_ahead_time 0.05

# Define the instruments to be used
live_loop :drums do
  sample :drum_bass_hard, amp: 1.5
  sleep 0.5
  sample :drum_snare_hard, amp: 1.5
  sleep 0.5
end

live_loop :bass do
  use_synth :fm
  play_pattern_timed [:c2, :g2, :e2, :g2], [0.5, 0.5, 1, 1.5], amp: 1.5
end

live_loop :guitar do
  use_synth :pluck
  play_pattern_timed [:c4, :d4, :e4, :d4], [0.25, 0.25, 0.5, 1], amp: 0.8
end

# Define the structure and progression of the music
in_thread do
  8.times do
    play chord(:c2, :minor), release: 0.5, amp: 0.5
    sleep 0.5
  end

  4.times do
    play chord(:f2, :major), release: 0.5, amp: 0.5
    sleep 0.5
  end

  4.times do
    play chord(:g2, :major), release: 0.5, amp: 0.5
    sleep 0.5
  end

  4.times do
    play chord(:c2, :minor), release: 0.5, amp: 0.5
    sleep 0.5
  end

  4.times do
    play chord(:f2, :major), release: 0.5, amp: 0.5
    sleep 0.5
  end

  4.times do
    play chord(:g2, :major), release: 0.5, amp: 0.5
    sleep 0.5
  end
end

# Add dynamics and variation to the music
in_thread do
  sleep 16

  with_fx :distortion do
    4.times do
      sample :drum_snare_soft, amp: 1.2
      sleep 0.5
    end
  end

  with_fx :reverb do
    8.times do
      play chord(:c4, :minor), release: 0.5, amp: 0.7
      sleep 0.5
    end
  end

  with_fx :echo do
    8.times do
      play chord(:g3, :major), release: 0.5, amp: 0.7
      sleep 0.5
    end
  end
end

The music does match the vibe and context to an extent. It’s fascinating that even this much is possible today. I would not call it top-notch, but we may be just a good training dataset away from a robust model.

Here is the music on Spotify.

Here’s another interesting BGM I created with ChatGPT’s help and actually used in one of my Instagram Reels. Run it in Sonic Pi yourself to listen to it. The results are exciting.

# Set the BPM and synth defaults
use_bpm 120
use_synth :piano
use_synth_defaults release: 0.3, amp: 0.6

# Define the drum beats
drum_beats = (ring :bd_haus, :sn_dolf, :drum_cymbal_closed, :drum_snare_hard)

# Define the melody for the music
live_loop :melody do
  with_fx :reverb, mix: 0.5, room: 0.8 do
    2.times do
      play_pattern_timed [:C4, :E4, :G4, :B4, :C5], [0.25, 0.25, 0.25, 0.25, 0.5]
      play_pattern_timed [:A3, :C4, :E4, :G4, :A4], [0.25, 0.25, 0.25, 0.25, 0.5]
    end
    2.times do
      play_pattern_timed [:C4, :E4, :G4, :B4, :C5], [0.25, 0.25, 0.25, 0.25, 0.5]
      play_pattern_timed [:G3, :B3, :D4, :G4, :B4], [0.25, 0.25, 0.25, 0.25, 0.5]
    end
  end
end

# Define the drum loop
live_loop :drums do
  16.times do
    sample drum_beats.tick
    sleep 0.25
  end
end

# Build up the music
live_loop :build_up do
  sync :melody
  2.times do
    use_synth :pluck
    play_pattern_timed [:C5, :E5, :G5], [0.25, 0.25, 0.5], amp: 0.5
    sleep 0.5
  end
  2.times do
    use_synth :pluck
    play_pattern_timed [:A4, :C5, :E5, :G5], [0.25, 0.25, 0.25, 0.5], amp: 0.5
    sleep 0.5
  end
end

# Define the beat drop
live_loop :beat_drop do
  sync :drums
  with_fx :slicer, mix: 0.4 do
    with_fx :echo, mix: 0.2, decay: 2 do
      use_synth :piano
      play_pattern_timed [:C4, :E4, :G4, :B4, :C5], [0.25, 0.25, 0.25, 0.25, 0.5], release: 0.1, amp: 1.5
      sleep 0.5
    end
  end
end

# Fade out for a smooth finish
live_loop :slow_down do
  sleep 32
  17.times do |i|
    set_volume! 1.0 - (i / 16.0)  # gradually lower the master volume to zero
    sleep 0.25
  end
  stop
end

One thing I notice is that we can develop scene-by-scene music concepts in intricate detail and then write the Sonic Pi program for them. It feels like this has real potential to replace musicians. ChatGPT gives fantastic suggestions on tone, modulation, melody, loops, build-ups, beat drops, and more.

Before writing this article, I had zero knowledge of the technical terms related to music. I also had zero knowledge about the programming language used in Sonic Pi.

Yet, here we are. Thanks to AI.

Things to explore further

  1. Create a ChatGPT plugin that integrates with Sonic Pi.
  2. Create more datasets pairing music compositions, human context, and Sonic Pi code.
  3. Can we create a live AI orchestra with ChatGPT + Sonic Pi?
  4. Can we make an automated program that generates varying music based on weather or world events as they happen in real time and streams it continually on YouTube? Theoretically, it is very much possible: make periodic calls to ChatGPT integrated with the new browsing plugin, use that information to create new Sonic Pi programs that flow well as music, and stream the result to YouTube (I don’t know if anyone has already done this). A rough sketch of the Sonic Pi side appears after this list.
  5. Can we use the same approach to create audio effects, say for video games, transitions, and other such props?
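
On point 4, Sonic Pi can already receive OSC messages (it listens on port 4560 by default), so an external script polling a weather or news API could cue it with fresh data. Here is a minimal sketch of the Sonic Pi side, assuming a hypothetical /weather cue that carries a temperature value:

# React to an incoming OSC cue of the form /weather <temperature>
# (the /weather address and the temperature-to-music mapping are
# hypothetical choices; any external script could send this cue)
live_loop :weather_music do
  use_real_time
  temp = (sync "/osc*/weather")[0]  # first OSC argument
  use_bpm temp > 25 ? 140 : 90     # hotter weather, faster tempo
  key = temp > 25 ? :major_pentatonic : :minor_pentatonic
  play_pattern_timed scale(:c4, key), [0.25]
end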

The possibilities are endless, limited by our imagination.
