Unleashing the power of AI in media and entertainment

Raju K
Published in XRPractices
3 min read · Feb 10, 2023

The recent explosion of advances in generative AI enables even a non-expert to do a lot of creative work. In this article, we will explore how far we can go in automating the creation of animated content using the open-source models and tools available today.

For this example, I will keep things simple and produce an animated clip of my avatar along with a few characters inspired by Marvel movies. The workflow is as follows:

  1. DreamBooth Training
  2. Audio Creation
  3. Character Generation
  4. Deepfake Animation

Step 1: DreamBooth Training

The methodology behind Stable Diffusion differs from that of older artistic style transfer models. Stable Diffusion cannot work directly on our photos, so we must fine-tune the model using 10–16 portrait pictures of ourselves and associate the training with a keyword. In this example, we use the Google Colab DreamBooth training script and train the model with our photos. The following training parameters are used during DreamBooth training:

"instance_prompt":      "photo of SDRaju",
"class_prompt": "photo of a man",
--save_sample_prompt="photo of SDRaju" \

The keyword I used for my pictures in Stable Diffusion is “SDRaju”. Whenever Stable Diffusion sees this keyword in a prompt, it will try to bring my likeness into the generated image.
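As a rough sketch of how these parameters fit together, DreamBooth-style Colab scripts commonly take a concepts list pairing each instance prompt with a class prompt. The helper below is hypothetical (not part of the notebook itself), and the data directory paths are illustrative assumptions:

```python
import json

def make_concepts(instance_token: str, class_noun: str) -> str:
    """Build a concepts-list JSON of the kind DreamBooth training scripts consume.

    The instance prompt teaches the model a rare token ("SDRaju") for our
    photos, while the class prompt ("photo of a man") drives prior-preservation
    so the model does not forget what a generic man looks like.
    """
    concepts = [{
        "instance_prompt": f"photo of {instance_token}",
        "class_prompt": f"photo of a {class_noun}",
        "instance_data_dir": f"/content/data/{instance_token}",  # our 10-16 portraits (assumed path)
        "class_data_dir": f"/content/data/{class_noun}",         # auto-generated class images (assumed path)
    }]
    return json.dumps(concepts, indent=2)

print(make_concepts("SDRaju", "man"))
```

The rare token matters: a common word would collide with what the model already knows, while "SDRaju" gives the fine-tuning a clean slot to attach my face to.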

Step 2: Create Audio

We will use a text-to-speech AI model, such as Speechify, to generate audio for our animation. We will provide a prompt, such as a personal introduction, and Speechify will output the audio in mp3 format.

The following text is used for the audio generation

Hi! My name is Raju Kandasamy. I'm a developer at Thoughtworks. This is my animated character. Every pixel of this is generated by artificial intelligence. This may transform the content creation for media and entertainment and may boost the productivity.

To prepare the audio for the next step, we will convert it from mp3 to wav format using a tool such as Audacity.
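Lip-sync models generally expect uncompressed mono audio, which is why the mp3 is converted first. As a stand-in for the Audacity export, the stdlib `wave` module can write a mono 16 kHz WAV file; the sine tone below is just a placeholder for the real speech track, and the 16 kHz mono assumption is mine:

```python
import math
import struct
import wave

def write_test_wav(path: str, seconds: float = 1.0, rate: int = 16000) -> None:
    """Write a mono 16-bit PCM WAV file containing a 440 Hz sine tone.

    Mono 16 kHz PCM is a common input format for lip-sync models,
    hence the mp3-to-wav conversion before the deepfake step.
    """
    n_frames = int(seconds * rate)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit samples
        wf.setframerate(rate)
        for i in range(n_frames):
            sample = int(32767 * 0.3 * math.sin(2 * math.pi * 440 * i / rate))
            wf.writeframes(struct.pack("<h", sample))

write_test_wav("intro.wav")
```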

Step 3: Generate Character using Stable Diffusion Prompt

The character was generated using the prompt below:

a portrait of SDRaju as Thanos, face only, concept art, trending in artstation
Thanos avatar of me

Step 4: Using Deepfake models for animation

Deepfake AI models are capable of generating lip sync, eye blinks, and slight head movements. Using the LIHQ Colab notebook, I generated the animation for the character above. LIHQ uses the Wav2Lip model for lip syncing. For some odd reason, LIHQ fails if the image is square (i.e. 256x256 or 512x512); I had to change the dimensions to portrait (e.g. 512w x 576h).
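One way to get the portrait dimensions is to pad the square Stable Diffusion render onto a taller canvas with Pillow before feeding it to LIHQ. The function below is my own workaround sketch, not part of the notebook:

```python
from PIL import Image

def pad_to_portrait(img: Image.Image, width: int = 512, height: int = 576) -> Image.Image:
    """Pad a square character render onto a portrait canvas.

    The source image is centred horizontally and anchored to the top so the
    face stays in frame; the extra rows are filled with black.
    """
    canvas = Image.new("RGB", (width, height), (0, 0, 0))
    x = (width - img.width) // 2
    canvas.paste(img, (x, 0))
    return canvas

square = Image.new("RGB", (512, 512), (128, 64, 64))  # stand-in for the SD render
portrait = pad_to_portrait(square)
print(portrait.size)  # (512, 576)
```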

The lip sync is still not perfect. The frame interpolation needs improvement.

LIHQ first uses the First Order Motion Model to turn the still photograph into a video, then uses Wav2Lip to sync the lip movements with the audio.

After Wav2Lip

As you can see, the lip movement has a lot of artifacts, cuts, and tearing. Hence, the frames are extracted and interpolation is applied to them to smooth the animation and reduce artifacts.
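Real frame interpolators estimate motion between frames, but the basic idea can be illustrated with a toy linear blend that inserts an in-between frame halfway between each pair of extracted frames. This averaging sketch is my simplification, not the model LIHQ uses:

```python
import numpy as np

def blend_midframe(frame_a: np.ndarray, frame_b: np.ndarray) -> np.ndarray:
    """Return a naive in-between frame: the pixel-wise average of two frames."""
    return ((frame_a.astype(np.float32) + frame_b.astype(np.float32)) / 2).astype(np.uint8)

def interpolate_2x(frames: list) -> list:
    """Double the frame rate by inserting a blended frame between each pair."""
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        out.append(blend_midframe(a, b))
    out.append(frames[-1])
    return out

# Three solid-colour 512x576 frames stand in for the extracted video frames.
clip = [np.full((576, 512, 3), v, dtype=np.uint8) for v in (0, 100, 200)]
smooth = interpolate_2x(clip)
print(len(smooth))  # 5
```

Averaging like this softens hard cuts but smears fast motion, which is why motion-aware interpolation models give noticeably better results on talking-head footage.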

Future:

What we have seen so far could personalize our entertainment consumption. For instance, customizing a character in a movie as yourself and watching it streamed is technically feasible, limited only by computing complexity and infrastructure. How great would it be if a text-prompted video-to-video AI model generated a personalized version of a movie? For instance:

The Dark Knight - Robert Downey Jr as Batman, Marlon Brando as Joker, Emma Watson as Rachel

Conclusion:

AI has the power to enhance productivity in the field of media and entertainment, but it should not be seen as a replacement for creativity. As AI continues to evolve and influence the creative process, it becomes increasingly important to embrace its role and be prepared for the shift towards prompt engineering. By doing so, we can fully harness the potential of AI and use it to drive innovation and progress in this exciting industry.


Innovator | XR | AR | VR| Robotics Enthusiast | Thoughtworks