One frame from the 4K AI-generated video.

I had an AI hallucinating over the text of the Bible — Here’s how. [2/4]

A tutorial on how I used four neural networks to generate a 4K, 15-minute-long audiovisual piece, with only the text of the Bible as the input

px studio

--

Preview of the final result of the tutorial


1. Introduction

Welcome back! In the first part of the tutorial we had a look at the back story of my artwork, and at the general concept of how text-based AI video generation works. Now we’re ready to generate the video. Let’s go!

2. Choosing the right alternative

One of the many fantastic things about the open source community is that work and innovation can build up tremendously quickly. The pivotal work of Ryan Murdock paved the way for several different variations and extensions, such as Pixray Panorama and VQGAN + CLIP with augmentations, to name a few.

Nonetheless, my long-time favourite has always been pytti, an outstanding work by sportsracer48. It's accessible via its Patreon for less than 5€, and I really recommend checking it out. This is the main resource I used to generate my piece, and if you want to follow the precise steps, I suggest getting access to it. If not, don't worry: you can still get the general idea and apply it to other implementations, such as those linked above.

When opening pytti on colab we are welcomed by this interface:

The first thing we notice is the number of parameters in the Instructions section: there are almost one hundred. This already gives an insight into the complexity of the system. As both a machine learning engineer and an artist, I believe this is the very reason pytti is such a unique resource.

All the parameters require a lot of testing and trial and error to master. I have spent entire weeks trying many different possibilities and combinations, so with this tutorial I will spare you the time and point directly to the parameters that were important for the creation of my artwork, always keeping in mind the concept of an AI autonomously hallucinating over the Bible. At the same time, I will keep the discussion quite open so as not to limit your creative freedom to experiment with them yourself.

3. Expanding pytti to generate a video of an arbitrary length

When approaching the creation of my artwork through pytti, I stumbled upon the problem of setting an arbitrary output length, as well as using a very long text as the input for the system. I knew I wanted my video to be 15 minutes long because of the exhibition's requirements, but there was no direct way to do it in the system. Nothing that an extra bit of code can't solve :)

I first added a very simple cell that creates a file with a text of choice as its content (for example, the first chapter of the Bible):
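A minimal sketch of such a cell could look like the following. The filename and the verse are illustrative assumptions; use any text and path you like:

```python
# Hypothetical example cell: write the input text of choice to a file.
bible_text = (
    "In the beginning God created the heaven and the earth. "
    "And the earth was without form, and void; and darkness was "
    "upon the face of the deep."
)

# The filename is an arbitrary choice; later cells just need to read it back.
with open("input_text.txt", "w") as f:
    f.write(bible_text)
```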

Now that we have the text written to a file, we can read it and process the content so that it can be correctly used as input for pytti. Another code cell does that:
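A self-contained sketch of this preprocessing cell follows. The text is inlined here so the snippet runs on its own (in the notebook it is read from the file written by the previous cell); the variable name `ret` is the one the later cells reference, and the exact cleaning rules are an assumption:

```python
import re

# Inlined sample text; in the notebook this comes from the input file.
raw = ("In the beginning God created the heaven and the earth. "
       "And the earth was without form, and void.")

# Remove unwanted characters, keeping only letters, digits,
# spaces, and basic punctuation.
cleaned = re.sub(r"[^A-Za-z0-9 .,]", "", raw)

# Split into sentences and rejoin them into one long prompt string
# separated by the special character ||.
sentences = [s.strip() for s in cleaned.split(".") if s.strip()]
ret = " || ".join(sentences)
print(ret)
```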

As we can see, this code reads the input text we provided, removes unwanted characters, and unifies all the sentences into a single long one separated by the special character ||.

The text is ready! Now we just need a way to set an arbitrary output length and we're good to go. For that, we need one last very simple code cell:
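The arithmetic here is simple: desired duration times fps gives the number of frames, and frames times steps-per-frame gives the total optimization steps, which we then spread over the scenes. A sketch, where all the values and the variable name `resulting_x` are illustrative assumptions (how pytti divides steps across scenes may differ in detail):

```python
# Desired output properties (illustrative values).
video_length_minutes = 15
fps = 12
steps_per_frame = 50   # optimization steps per generated frame
num_scenes = 100       # e.g. the number of "||"-separated sentences

# Total frames needed for the desired duration, and total steps.
total_frames = video_length_minutes * 60 * fps   # 10800 frames
total_steps = total_frames * steps_per_frame     # 540000 steps

# Steps per scene, so the whole text spans exactly the whole video.
resulting_x = total_steps // num_scenes
print(resulting_x)
```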

Here we can change parameters such as the length of the video, the fps, and how many steps per frame to calculate, and the code outputs a corresponding value that will make the system generate exactly the number of frames we need. Fantastic!

Just before we move on to the next step, here comes a tip: one might think that a low frame rate like 12 fps would result in a very choppy video, and this is generally true. However, I discovered during my exhibitions that people appreciated it this way: since every generated frame is so full of detail, they had the chance to focus on it better.

4. Having fun with parameters

As mentioned above, there are almost one hundred parameters in the system, and their combination is what really creates the magic of the generated video. We will now go through the most important ones for my piece, and I will explain their role and importance in making the concept of my artwork come to life.

4.1 Prefix style

One of the most important is certainly prefix_style: this parameter entirely changes the aesthetic of your video. In the case of my piece, I wanted the AI to be completely free to imagine anything about the Bible, but I nonetheless wanted to give it a slight hint of a sacred, classical painting texture. I used this parameter to achieve that effect, and if we zoom in and analyze one of the frames, we can see the impact it had on the generation: the texture is definitely there.
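In the notebook this is just a Colab form field. The exact string I used belongs to the artwork, but an illustrative value along these lines would push the output toward that look:

```python
# Hypothetical value; the actual string used for the piece differs.
prefix_style = "a sacred classical oil painting of" #@param{type:"string"}
```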

Close-up detail of the texture

Feel free to experiment with this parameter to give your video a unique look.

4.2 Image motion

Some other very relevant parameters are the ones driving the motion of the video. I modified translate_x, translate_y, and translate_z_3d to create a movement that appears to go into the video as it progresses, making it more immersive for the spectator.
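These are also plain form fields in the notebook. An illustrative combination (the values are assumptions, not the exact ones from my piece) that keeps the camera centered while zooming forward could be:

```python
# Hypothetical motion settings: no lateral drift, steady forward
# motion along z, which makes the camera appear to travel into the scene.
translate_x = "0"      #@param{type:"string"}
translate_y = "0"      #@param{type:"string"}
translate_z_3d = "50"  #@param{type:"string"}
```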

If we analyze a short snippet of the video, we can see the movement happening:

4.3 Other parameters

I tried all the other parameters, and many are definitely interesting, such as pixel_size, latent_image_prompts, suffix_style, and semantic_stabilization_weight, to name a few. However, in the case of my piece I wanted to leave the AI as free as possible to autonomously think and imagine, so I left everything either blank or at its default value.

4.4 Boring but necessary parameters

We tried to avoid it as much as we could, but now it’s time: we must set some of the parameters to the values we calculated in Section 3 of this tutorial.

The first parameter is text_prompts: it needs to be set to ret, the variable containing the pre-processed long text we want to use as the input. We can do this very simply by changing part of the code of Cell 2.1 to:

text_prompts = ret #@param{type:"raw"}

Fantastic! The input is set. Now we need to include the resulting_x value we calculated earlier to ensure that the video will have the desired length. We simply assign it to the steps_per_scene parameter, as follows:
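Mirroring the change above, the modified line would plausibly look like this (a sketch assuming the same Cell 2.1 form-field style):

```python
steps_per_scene = resulting_x #@param{type:"raw"}
```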

Hurray! Now we have our desired long text set as the input, and we are sure that the video will be exactly as long as we want it to be. This means it is time for the real excitement: firing up the AI.

5. Fireworks

Now that everything is set, quickly check that all the parameters are entered correctly. When you are sure and ready, take a breath, and start the process. You should get an output similar to this:

Congratulations! The generation has started, and now all you have to do is wait — for quite a long time.

Machine learning unfortunately requires a lot of computational power, and even more so if we want to generate a 15-minute-long video. By default, Google will connect you to a weak GPU (e.g. a Tesla K80), which makes the process extremely slow. Luckily, there's a way to get access to powerful GPUs for very cheap: a Colab Pro subscription. For 10€ a month, you get access to much more powerful compute units (e.g. a P100), as well as more runtime. This is among the cheapest options on the market. To follow the tutorial and get an idea of how it works you don't need it, but if you plan on generating a long video like I did for my piece, I recommend buying it, as it will make the generation process at least three times faster.

Furthermore, I'm not generating the video directly at 4K resolution; quite the opposite. I set the output dimensions to 480x300 px, which is close to a 16:9 proportion, and yes, it's very small. Consider this: even at this size, with a Colab Pro+ subscription it took around 100 hours to generate the video. That is the computational complexity we're talking about; it's quite substantial. Any larger size would drive the generation time to months.
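To put that in numbers, here is a quick back-of-the-envelope calculation. The 100-hour figure is from my run; the per-frame time is simply derived from it:

```python
# 15 minutes at 12 fps gives the total number of frames; dividing the
# roughly 100 hours of generation time by that count gives the average
# time spent on each frame.
frames = 15 * 60 * 12
seconds_per_frame = 100 * 3600 / frames
print(frames, round(seconds_per_frame, 1))  # about 33 seconds per frame
```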

In the next part of the tutorial I will show how to use another neural network to perform super-resolution and go from 480x300 to 3840x2160, in other words, 4K.

6. Conclusion

Hurray! If you made it this far, you must have a fantastic AI video slowly cooking in your browser. How exciting!

If you enjoyed this tutorial and want to learn how to increase the resolution of your video, as well as how to generate the audio part of the piece, follow me on Twitter at @p_x_studio, where I will post the next parts of the tutorial soon.

Finally, I would love to see the artworks you create. Please tag me on Twitter, send me a message, write a comment — anything works, let’s just spread the art love!

The same applies if you have any questions or comments: I will always be happy to answer :) That's all for now. A hug, and see you soon!

--


px studio

Pietro Bolcato aka px is a multidisciplinary new media artist exploring the relationship between society, algorithms, and art.