The Making Of An AI Music Video: Neon City Nights

Ape/rture
Published in Deus Ex DAO
8 min read · Jul 4, 2023

Recent developments in AI image and video generation enable artists to push the limits of their creativity and explore new forms of visual storytelling. From generating realistic landscapes to breathing life into fantastical characters, these technologies hold great promise for artists seeking to explore multimedia creations.

As an artist who has been producing music for years, I have always felt my music had a visual component, which I tried to express through animation. However, my journey into video production was filled with hurdles and frustrations. Despite my lack of animation skills, I stubbornly attempted to create videos, which often turned into time-consuming projects. Unfortunately, this took time and energy away from my music-making, as hours that should have been spent composing were consumed by animating visuals. The latest innovations in AI, however, made me rethink the animation process. These new image and video technologies have empowered me to create animations and bring my music to life.

Over the last year, the members of Deus Ex DAO have enthusiastically experimented with AI tooling to build a brand style, create content, and increase their productivity. With the cyberpunk brand style of Deus Ex DAO as inspiration and starting point, I started the Neon City Nights project. In this article I will describe the reasoning behind and challenges of creating this multimedia expression with the help of AI. If you are keen to listen and watch the result, visit this link for the narrated version or this link for the song version.

The Process

Generating the story (ChatGPT)

The branding of Deus Ex DAO has slowly evolved over time, but since the start, the DAO's brand assets have revolved around a cyberpunk theme. In addition to this motif, I needed a story that contained parallels with the vision and work of the DAO members. With a well-defined prompt, ChatGPT not only sparked the initial idea for the story but wrote the full story visualized in the video. The only edits needed were removing two small paragraphs to better fit the song structure. Since prompt engineering is serious business these days, I have included the prompt below as inspiration.

Act like you are a renowned sci-fi writer. Write a story in a 80s cyberpunk theme with maximum 500 words. General structure of the story: The story takes place in a cyberpunk dystopian city. The city has been captured by an AI overlord who has the finances of the inhabitants in its control by running schemes and obscure financial tricks. An underground group of friends starts to plot against the AI and comes up with a plan. They hack into the mainframe to implement good behavior in the AI overlord to beat it. The inhabitants of the city will be financially free after the implementation.
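Prompts like the one above follow a reusable pattern: a persona, constraints, and a list of story beats. As a purely illustrative sketch (the helper name and the paraphrased beats below are my own, not part of the original workflow), such a prompt can be assembled from a small template so the theme or word limit is easy to vary:

```python
# Build a story prompt from a persona, constraints, and story beats.
# The beats below paraphrase the Neon City Nights outline; adjust freely.
def build_story_prompt(persona, theme, max_words, beats):
    lines = [
        f"Act like you are {persona}.",
        f"Write a story in an {theme} theme with maximum {max_words} words.",
        "General structure of the story:",
    ]
    lines += [f"- {beat}" for beat in beats]
    return "\n".join(lines)

prompt = build_story_prompt(
    persona="a renowned sci-fi writer",
    theme="80s cyberpunk",
    max_words=500,
    beats=[
        "The story takes place in a cyberpunk dystopian city.",
        "An AI overlord controls the inhabitants' finances through obscure schemes.",
        "An underground group hacks the mainframe to implement good behavior in the AI.",
        "After the implementation, the inhabitants are financially free.",
    ],
)
print(prompt)
```

The resulting string can then be pasted into ChatGPT (or sent through an API) unchanged.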

Narrating the story (Speechify)

For the narration of the story I used Speechify, a text-to-voice AI tool that offers an opportunity to breathe life into stories, even ones crafted by ChatGPT. Speechify transforms written narratives into captivating audio experiences. I chose the voice named Jane, whose calm, clear, and slightly mysterious tone was perfect for narrating the story, resonating with the essence of Mara, the female protagonist of Neon City Nights.

The Speechify interface

The Music (Cubase)

The basis for the music was the tune from the Deus Ex DAO podcast. When we started the podcast in January 2022, I created a short tune which we still use today. The tune feels upbeat, which made it a great fit for the story's positive ending. I had to work backwards, adding mysterious elements and an 80s sci-fi feeling to make the song work as a soundtrack, which I did by using typical 80s arpeggiated synthesizers and long, stretched pads.

While almost all elements in this project utilized generative AI, music production was still a human endeavor. Ever since samplers and synthesizers became widely available in the 1980s, people have still picked up guitars, drumsticks, and violins to express themselves. Although at some point we will listen to music generated by AI, people will still enjoy creating and expressing themselves through music, probably assisted more and more by new technologies.

Specifically for this project, it is important to note that even in this human-led process, elements of machine learning and computer-assisted workflows were integrated into my production. VSTs (digital instrument plugins) already exist that can automatically generate sounds or presets: by analyzing the audio input, such a VST intelligently suggests settings and parameters, enabling me to quickly experiment with different sounds and textures. While the core of my music production remains a manual and creative process, the integration of machine learning and computer-assisted features has enriched my output and increased my productivity.
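To make the idea of analysis-driven parameter suggestions concrete, here is a deliberately toy sketch (not how any actual plugin works): it measures a crude "brightness" of a signal via its zero-crossing rate and maps that to a suggested low-pass filter cutoff. All names and the mapping are invented for illustration.

```python
import math

def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs that change sign."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)

def suggest_cutoff(samples, sample_rate, lo=200.0, hi=8000.0):
    """Map signal 'brightness' (zero-crossing rate) to a low-pass cutoff in Hz.

    Purely illustrative: real plugins use far richer spectral analysis.
    """
    zcr = zero_crossing_rate(samples)
    # For a pure tone, zcr is roughly 2 * f / sample_rate, so rescale
    # back to Hz and open the filter a few times above the tone.
    est_freq = zcr * sample_rate / 2
    return max(lo, min(hi, est_freq * 4))

# A 440 Hz sine, one second at 8 kHz: suggested cutoff lands near 1760 Hz.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
print(round(suggest_cutoff(tone, sr)))
```

Real "smart" presets do something in this spirit, just with much better features than a zero-crossing count.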

Storyboard & Characters (Midjourney)

To study the characters and places in Neon City further, I created a storyboard with the help of Midjourney, the text-to-image AI tool. With its continuous development over the past year, Midjourney has evolved from generating acceptable pictures to producing truly astonishing visuals, making it a great choice for visualizing the story. The storyboard served as additional input for the Runway Gen2 text-to-video tool, since I assumed these pictures would increase the likelihood of Gen2 producing a coherent style across the generated videos. During the creation of the storyboard, I tried capturing several angles of the city and the characters, which was at times problematic when trying to keep the styling, color schemes, and character forms consistent. Luckily, Bob Ross taught us there are only happy little accidents, which led me to embrace the changing colors and characters and actually use them in the story progression. Below you can see a few pictures generated to breathe life into Nexus, the AI overlord ruling Neon City.

Nexus in evil form, shot far away.
Nexus in evil form, close up shot.
Nexus in hacked and good form, close up shot.

Video generation (Runway Gen1 and Gen2)

The release of Runway Gen2 marked another key moment in generative AI for me, and made me instantly want to experiment with the tool. I used Gen2 to generate video segments from text to visualize the story I had in mind, steering the generation in the desired direction with simple prompts like "Flyover shot of a city, cyberpunk" combined with the visuals created in Midjourney. In cases where the animation had the correct composition and movement but the styling wasn't to my taste, I used the Gen2 video as input for Gen1 together with a Midjourney image. Gen1 is fairly flexible, showing a few previews before rendering the final result. Overall, the process was still time-consuming, but the enjoyment of seeing impressive results and useful videos for the project made up for that. In total, more than 130 videos were generated with Gen2 and 30 with Gen1, which resulted in a 3-minute video after editing.

An example of the Gen2 prompt interface loaded with Midjourney pictures

Limitations of Runway Gen2

During the creation of the video I ran into several limitations when generating the videos:

1. Character and style consistency between videos is hard to maintain, which makes it challenging to create a coherent, longer video. The workarounds I used are:
  • Generating videos in Gen2 and reprocessing them in Gen1 with Midjourney pictures as input, although this consumes far more credits and the styling can still diverge.
  • Adjusting colors, hues, and saturation to achieve more consistent color schemes between videos.
  • Utilizing different color schemes to signify story changes, such as Nexus transforming from red to green and from robot to humanoid.

2. Inconsistency between frames can produce artifacts, which offer a cool effect but are not always realistic. I worked around this limitation by incorporating additional layers of abstract, digital-looking animations generated by Gen2 and Gen1 to create an artistic effect.

3. Movements of characters and the camera in Runway are limited. The tool has constraints when it comes to generating complex or intricate character movements, even when given specific instructions. The team seems to be aware of this problem, since it appears as a feedback option when you rate a generated video.

4. Videos created with Runway Gen2 are limited to four seconds in length, which can be too short for some shots, especially if artifacts become visible halfway through the video.

5. The maximum resolution for generated videos in Runway Gen2 is lower than the desired 1080p standard, even at the highest resolution setting. The lower resolution degrades video quality when the editor wants to animate additional scaling or movement.

6. Gen2 takes fairly long to generate a video, with no hint of the result until completion. Gen1 generates several previews, which gives the user a sense of the output video. Simply generating a few preview frames in Gen2 before a full render would help both the user and Runway save compute resources.
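The hue and saturation adjustments from the first workaround above can also be scripted rather than done by hand in an editor. As a minimal per-pixel sketch using only Python's standard library (a real pipeline would apply this per frame with an image library or with ffmpeg's hue filter; the function name is my own):

```python
import colorsys

def shift_hue(rgb, hue_shift, sat_scale=1.0):
    """Shift the hue (0-1, wraps around) and scale the saturation of one
    RGB pixel with float components in [0, 1].

    Applying the same shift to every pixel of a clip nudges it toward a
    shared palette, helping mismatched generations look consistent.
    """
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + hue_shift) % 1.0
    s = min(1.0, s * sat_scale)
    return colorsys.hsv_to_rgb(h, s, v)

# Rotate a pure red pixel a third of the way around the color wheel,
# turning it pure green, e.g. for Nexus's red-to-green transformation.
print(tuple(round(c, 6) for c in shift_hue((1.0, 0.0, 0.0), 1 / 3)))
# -> (0.0, 1.0, 0.0)
```

The same idea extends to desaturating overly vivid clips by passing `sat_scale` below 1.0.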

Conclusion

The journey of creating the AI-powered music video "Neon City Nights" has been an enjoyable experience. Leveraging the advancements in AI image and video generation, I was able to overcome previous hurdles and bring my music to life while showcasing the possibilities of working with AI tooling.

Midjourney's astonishing visuals in particular aided in creating the city and the characters, providing additional input for the Runway Gen2 text-to-video tool. Even when faced with limitations such as character and style inconsistency, I enjoyed finding creative workarounds.

The fusion of human creativity and machine assistance has made it possible to breathe life into the world of Neon City. As AI continues to evolve, I am looking forward to the endless possibilities these new technologies will offer to artists and creators. Through experimentation and an open attitude we keep pushing the boundaries of multimedia expression.


Ape/rture
Always improving, always learning | Counsel member and building at Deus Ex DAO