Song-to-Music Video Generator

jl33.ai
DSCubed Blogs
2 min read · Dec 26, 2023


Since the dawn of YouTube, people have made a killing adding a little bit of content over a song. OpenAI’s Dall-E 2.0 endpoint gave me the opportunity to finally try YouTube automation.

My idea was simple: string together the Genius Lyrics and Dall-E APIs, along with some automated video compiling, to get a song-to-music-video generator, which is really just an abstraction over a text-to-image model.

Rough architecture

This was more of an exercise in automation and data pipelining, rather than machine learning per se.
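To make the data flow concrete, here is a minimal sketch of how such a pipeline could be wired together. It is not the code from the repo. The names `Scene`, `plan_scenes` and `build_video` are hypothetical, the three pipeline stages are passed in as plain functions so that no real API is called, and giving every lyric line an equal share of the song is an assumption made purely for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    lyric: str            # lyric line, doubling as the image prompt
    start: float          # seconds from the start of the song
    end: float
    image_path: str = ""  # filled in once an image has been generated


def plan_scenes(lyrics: str, song_seconds: float) -> List[Scene]:
    """Split lyrics into lines and give each an equal slice of the song.

    Equal slices are an assumption of this sketch, not the author's method.
    """
    lines = [ln.strip() for ln in lyrics.splitlines()]
    # Drop blank lines and section markers such as "[Chorus]".
    lines = [ln for ln in lines if ln and not re.fullmatch(r"\[.*\]", ln)]
    if not lines:
        return []
    slot = song_seconds / len(lines)
    return [Scene(ln, i * slot, (i + 1) * slot) for i, ln in enumerate(lines)]


def build_video(
    query: str,
    song_seconds: float,
    fetch_lyrics: Callable[[str], str],          # e.g. a Genius API wrapper
    generate_image: Callable[[str], str],        # e.g. a Dall-E wrapper, returns a file path
    compile_video: Callable[[List[Scene]], str], # e.g. a video-editing wrapper
) -> str:
    """Run the three stages in order: lyrics, then images, then video."""
    scenes = plan_scenes(fetch_lyrics(query), song_seconds)
    for scene in scenes:
        scene.image_path = generate_image(scene.lyric)
    return compile_video(scenes)
```

Passing the stages in as functions keeps the sketch runnable with stubs. In a real run, each stub would be replaced by a thin wrapper around the corresponding API client or video library.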

Does it work?

Yes. The pipeline is able to fully convert any song into a full music/lyric video.

Input

input() = "Sandstorm by Darude"

The search occurs within the surprisingly good Genius API, meaning strings do not need to match exactly.

It costs ~$1.79 per song, and takes ~5 minutes to render. This is what the terminal output looks like:

Processing

Console

Output

Unfortunately, the YouTube lyric video market is pretty oversaturated, so it doesn’t look like I’ll be making any money off this.

Stable diffusion

At the time of writing, a multiverse of GPT-4 wrappers has been spun into existence, in the form of personal projects and even entire companies.

To be honest, I used this relatively shallow project as an excuse to:

a) Listen to more Olivia Rodrigo

b) Learn more about the underlying deep learning theory: Stable Diffusion.

With the help of my friend Sai Kumar — a machine learning engineer at Canva — we look at how stable diffusion transforms song lyrics into pictures from first principles:

Appendix

And for those interested, here’s Sai explaining his paper on how to make stable diffusion models less racist:

Repo

Images:

Thumbnail

Thanks for reading :)
