ZetaForge Use Case Example: A Multimedia AI-generated Storybook — Part III

Zahra Golpayegani
Zetane
16 min read · Jun 13, 2024

Creating Your First ZetaForge Block

Now that we have a general idea of what ZetaForge is through reading Part I and Part II, and we have already run a quick demo, it’s time to create something new.

Now, let’s go back to our storybook example and start with a story generator Block. There are two ways to create this Block: the easy way, which is a simple API call using an OpenAI API key, and the cost-free way, which is based on an open-source text generation model from HuggingFace. We will go with the former method for this first example to take advantage of its simplicity and keep the focus on ZetaForge.

Story Generation with OpenAI API

First, get your OpenAI API key if you don’t have one.

Now, to create a new Block, drag and drop a New Python Block from the Block Library to your workspace. The New Python Block is just a template Block that you can use to speed up the Block creation process. Once you drop the New Python Block into your workspace, you can edit its name and replace the template code with your custom code.

To edit the code, click on the </> icon and open the Block Editor sidebar. You will see three sections inside the Block Editor: Files, Specs, and Test Block, as shown in the screenshot below (note that I closed the Block Library sidebar on the left for more visibility):

New Python Block with the Block Editor opened. Note the three different tabs inside the Block Editor, namely, Files, Specs, and Test Block.

The Files tab includes all the files and folders inside the New Python Block. As you can see, some of the files are highlighted in orange; those are the files that you can edit directly. We take care of the specs.json file, so you don’t need to worry about that. You can add files or a folder directly from the Block Editor to your Block as well.

Next, there is the Specs tab, which is where you can change a Block’s name, add some description, and modify a few other Block configurations.

Finally, you can run a test for a selected Block using the test function you can customize under the Test Block tab.

We will cover all of these features with examples in this article.

The Block Editor includes three tabs: Files, Specs, and Test Block.

Okay, back to the Story Generator code.

After you open the Block Editor, click on the computations.py file. This file contains the main logic of your Block in the compute function. For this example, we want to make a call to the OpenAI API with a prompt tailored to our moral story. Before writing the code, let’s think about the inputs and outputs of our Block.

As input, we want to take the OpenAI API key, as well as a description of our story, and for output, we need the generated story to be returned. To reflect the inputs and outputs on the visual structure of the Block, we only need to change the signature of the compute function, and then the specs file for this Block would automatically update with the new information, resulting in an updated Block input/output structure on the UI. Let’s see how it works in action.

Here’s the code that we will be using for the story generation:

from openai import OpenAI
import json


def tell_story(sentence, api_key):
    client = OpenAI(api_key=api_key)
    completion = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {
                "role": "user",
                "content": """
You are an educator and writer who teaches children good morals by incorporating those morals and good
habits into stories in illustrated books. You are given an input that is a moral or good habit that a parent
wants to teach their children. Create a short and simple story with at least two characters that will teach that moral to the children.
The story should come in six panels that are easy to illustrate. When referring to a character for the
first time in the story, describe how they look. No two panels should be exactly the same. Keep the story simple
to follow and illustrate. Format your story according to the following example:
input prompt: I want to teach my son to eat more carrots because they are healthy.
your output:
'''
1) Panel one
Once upon a time, there was a beautiful princess with long golden hair and brown eyes, named Diana. Diana
had a dear friend who was not like other people at the castle; he was a dragon! His name was Mushu. He had
bright green scales and large purple wings that spread wide. Diana and Mushu played in the castle's garden
and always had fun when they were together.

2) Panel two
Mushu loved carrots. So, Diana took Mushu to pick up carrots from the garden every day. They would run
in the carrot garden until they were thirsty. Then, they drank some water from the fountain and ate some
carrots together to power up and go back to their game.

3) Panel three
But things changed one day when they went to play in the garden. Although Diana was so energetic and ready
to run and play more and more, Mushu was very tired from the beginning and couldn't play with Diana. Mushu
and Diana were both sad that Mushu couldn't run in the garden like every day.

4) Panel four
Diana was looking for a way to make Mushu strong again so they could play together. She went into the
garden and picked some fresh orange carrots for Mushu by herself. Diana brought a basket full of carrots
to Mushu.

5) Panel five
Mushu started munching on the carrots. As he ate more carrots, he felt stronger, happier, and healthier.
When Mushu finished eating the carrots Diana brought, he started laughing again and wanted to run in the
garden and play with Diana like every day!

6) Panel six
Diana and Mushu learned together that carrots are healthy and full of nutrients that are good for both
humans and dragons. Eating carrots helped them play longer in the beautiful garden and stay full of joy!
'''
After you write your story, reformat it in JSON format according to this example:
{
"prompt": "I want to teach my son to eat more carrots because they are healthy.",
"response":
{
"page1": {"text": "Once upon a time, there was a beautiful..."},
"page2": {"text": "..."},
...
"page6": {"text": "..."}
}
}

Don't include panel titles, such as "1) Panel one", in the output dictionary. Output only the dictionary with no explanations so that it is convertible to a JSON object as is.
Here is the input prompt:
"""
                + sentence,
            }
        ],
        temperature=0.0,
        response_format={"type": "json_object"},
    )
    return json.loads(completion.choices[0].message.content)


def compute(story_description, api_key):
    """
    Generates a story based on a prompt.
    """
    story = tell_story(story_description, api_key)
    return {"story": story}


def test():
    """Test the compute function."""
    print("Running test")

Let’s break it down.

tell_story : Uses the OpenAI API to call the gpt-4-turbo model with a prompt specifically designed for this example: writing a moral story. Note that the prompt divides the story into several panels; later on, we will generate an image for each panel to visualize our story and make our storybook look more interesting. Feel free to experiment with other models or prompts.

compute : This is the main function that should always exist in the computations.py file in [almost]¹ every Block. The input and output nodes of a Block in the UI are defined by the parameters of the compute function and the dictionary it returns, respectively. Here, we take story_description and api_key as inputs. To specify the outputs, we return a dictionary containing node names as keys and the corresponding objects (integer, string, JSON, etc.) as values. You can make as many function calls as you want here, but make sure you keep the same input and output signatures as shown in this example (inputs are defined as regular parameters, and outputs are returned as a dictionary). Note that we don’t support default arguments yet; all arguments need to be provided with a value.
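As a minimal sketch of this contract (the function body here is a placeholder, not the real story generator):

```python
# Hypothetical minimal Block, illustrating the compute contract only;
# the body is a stub, not the actual GPT Story Teller logic.
def compute(story_description, api_key):
    # Each parameter here becomes an input node on the Block in the UI.
    story = {"response": {"page1": {"text": "Once upon a time..."}}}
    # Each key of the returned dictionary becomes an output node.
    return {"story": story}

# This Block would expose two input nodes and one output node named "story".
print(sorted(compute("teach kindness", "fake-key").keys()))  # ['story']
```

Renaming a parameter or a dictionary key is all it takes to rename the corresponding node in the UI.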

test : You can add test functions to your Block using this function. This is completely optional but can be very useful for unit testing.
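For instance, one way such a test could look, with the API call stubbed out so it runs offline (the stub below is hypothetical and only mimics the output structure):

```python
def compute(story_description, api_key):
    # Stub standing in for the real OpenAI call, so the test needs no API key.
    pages = {f"page{i}": {"text": "..."} for i in range(1, 7)}
    return {"story": {"prompt": story_description, "response": pages}}


def test():
    """Check that compute returns the six-page structure downstream Blocks expect."""
    out = compute("I want to teach my son to eat more carrots.", "fake-key")
    assert "story" in out
    assert len(out["story"]["response"]) == 6
    print("Test passed")
```

A structural check like this catches signature or key-name regressions before the Block is wired into a larger Pipeline.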

To use this code, after you have opened the Block Editor, double-click on the computations.py file, click on the edit button to make the code editable, paste the code inside the code box, and save the changes by clicking on the save button. To help you not lose track of your previous code versions, we use a code selector that allows you to easily switch to another version of your code and compare different code versions effortlessly. To apply your latest changes, make sure you select the current version using the code selector option button.

After you make these modifications to the compute function and save your changes, the inputs and outputs of the Block will get updated in your workspace. At this point, here’s what you should see:

Block structure updates automatically when you modify the New Python Block with your own customized compute function.

But, you probably need to rename this Block. Here’s where the Specs tab comes in handy. Head over to Specs and find the name field. You can replace the current name (which is New Python most probably) with something more representative of what you are building, in this case, something like GPT Story Teller . You can modify the description here as well as the block version, etc. Don’t forget to save your changes.

Modifying the Block name, description, and other configurations through the Specs tab in Block Editor.

We are almost done with this Block. To test it inside a container, navigate to the Test Block tab and click on the Run Test button. This creates a new container for your Block and runs the test function. This part is optional, and we encourage you to try it out yourself. Currently, since we only have a print statement in the test function, we should only see Running test as output, along with the container build logs.

Other than the computations.py file, we need to edit two more files: the Dockerfile and the requirements.txt file. The template Dockerfile is a pretty standard and general one that can be used without any modifications in many cases. But, if you need to add another step to your Dockerfile, for example, if you need to copy your model checkpoints file to the container, you can edit it directly on ZetaForge. Save your changes to make sure they take effect. In our case, we only modify the base image from python:3.9 to python:3.12-slim to make it more efficient.
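For reference, a Dockerfile of this shape is typical (the paths and steps below are illustrative; check the actual template that ships with the New Python Block):

```dockerfile
# Slimmer base image, as described above
FROM python:3.12-slim

WORKDIR /app

# Install Python dependencies first to take advantage of Docker layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the Block code; add extra COPY steps here if you need model
# checkpoints or other assets inside the container
COPY computations.py .
```

Putting the dependency installation before the code copy means rebuilding after a code-only change reuses the cached pip layer.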

For the requirements.txt , we only have one package that requires installation, which is the openai Python package. We will add that to our requirements and save the changes as usual. Note that to make sure your Block keeps working in the future and doesn’t break, you should specify the exact version. Otherwise, if you only write the package name, the latest published version of the package will be installed, which requires active maintenance on your part. We recommend that you always pin the exact package version to avoid surprises in the future. Whenever you are developing, you can explore what’s new in the latest version and choose to upgrade or stick with the previous one. Here, we add openai==1.25.0 to the requirements.txt file for consistency.

To create a new Block, use the New Python Block as a template. Then, you need to edit three files: 1) computations.py , which contains a function called compute that implements the main logic of your application; its parameters and the keys of its returned dictionary are automatically set as the input/output node names on the Block interface. 2) The Dockerfile , which is general enough for many cases but may need modification, and 3) the requirements.txt file, which lists the required packages, preferably pinned to specific versions.

Now that the Block is ready, let’s attach some inputs to it! We will need a Password Block for our API key and a text Block for the story description. To visualize the output story, we will use the view text core Block that is already in the Block Library. Assemble the Pipeline, provide the API key (you need to enclose it in quotation marks) and the story description, and run the Pipeline by clicking on the Run button; you will see something like this:

Creating a GPT-powered story generator Pipeline using ZetaForge.

Voilà! Now we have our paginated story.

Let’s add voiceovers to our storybook next. We will use HuggingFace for the text-to-speech model instead of an API to experiment with another coding style.

Story Voiceover with HuggingFace Models

Now that you have some ZetaForge experience, let’s try a more challenging piece of code that will also demonstrate how easy it is to integrate other tools into ZetaForge. To find a suitable text-to-speech model on HuggingFace, you can explore HuggingFace Models and filter your search by selecting the text-to-speech tag in the left-hand section. I will use Microsoft’s model for this example, called microsoft/speecht5_tts , but feel free to experiment with other options.

To create the Block for this task, similar to what we saw before, drag and drop a New Python Block into your workspace and paste the following code into the computations.py file. Quick reminder: you can find computations.py by clicking on the </> icon to open the Block Editor sidebar and under the Files tab, double-click on computations.py to open it on the ZetaForge code editor.

Here’s the code that we will be using:

from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from datasets import load_dataset
import torch
import soundfile as sf


def text_to_speech(text, output_file_path, processor, model, speaker_embeddings, vocoder):
    inputs = processor(text=text, return_tensors="pt")
    speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
    sf.write(output_file_path, speech.numpy(), samplerate=16000)
    return output_file_path


def compute(text_dict):
    result = []
    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
    # load xvector containing speaker's voice characteristics from a dataset
    embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
    speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)
    for key in text_dict["response"].keys():
        text = text_dict["response"][key]["text"]
        output_file_path = f"audio_{key[4:]}.wav"
        result.append(text_to_speech(text, output_file_path, processor, model, speaker_embeddings, vocoder))
    return {"result": result}


def test():
    """Test the compute function."""
    print("Running test")

Go ahead and click on the Edit button and paste this code into the computations.py file. Save your modifications and select this new code version using the radio button just above your code. It should update the Block structure according to your code, as seen below:

Selecting a code version will automatically update the structure of a Block according to the function signature.

But we also need to change the Block name. We can do that by clicking on the Specs tab and modifying the Block name. I will rename this Block to Text to Speech .

Next, we need to modify the requirements.txt . We select it from the Block Editor and add our new requirements, which are the following:

torch
transformers
datasets
soundfile

Since we don’t need to add any new steps to the Dockerfile, our job is done for creating this Block.

It is not strictly necessary to pin the package versions here, but if you choose not to, be prepared to maintain your code as new versions of these packages are released.
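For example, a pinned version of the list above might look like this (the version numbers are illustrative only; pin the versions you actually developed and tested against):

```text
torch==2.2.0
transformers==4.40.0
datasets==2.19.0
soundfile==0.12.1
```

Pinning all four makes container builds reproducible even as these libraries publish new releases.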

This Block takes in a text dictionary and is designed to work with the same structure as the output of the Story Generator Block we just made. So, we can connect the output node of the Story Teller Block to the input node of this Block, as shown below, and we connect the output of the Text to Speech Block to an Audio Player Block to visualize what we have so far:
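Concretely, the Text to Speech Block expects a dictionary of the shape the story Block produces, matching the JSON template in the GPT Story Teller prompt (only two pages shown here for brevity):

```python
# Shape of the dictionary passed from the GPT Story Teller Block to the
# Text to Speech Block (a shortened two-page example).
story = {
    "prompt": "I want to teach my son to eat more carrots.",
    "response": {
        "page1": {"text": "Once upon a time..."},
        "page2": {"text": "Mushu loved carrots..."},
    },
}

# The compute loop derives one audio file name per page: "page1" -> "audio_1.wav".
file_names = [f"audio_{key[4:]}.wav" for key in story["response"]]
print(file_names)  # ['audio_1.wav', 'audio_2.wav']
```

Because both Blocks agree on this structure, connecting them is purely a matter of wiring the output node to the input node.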

Connecting the Text to Speech Block to the GPT Story Teller Block on ZetaForge to build a Pipeline that generates a story based on a prompt and narrates the story afterward. This screenshot shows this Pipeline before the execution; the Audio Player Block will show the audio file once the Pipeline is executed.

Click on the Run button on the top to start the execution of your Pipeline. After a few moments, the audio files are generated and can be viewed on top of the Audio Player Block.

Adding More Blocks

So far, we have built a Pipeline that generates a story based on a prompt using the OpenAI API and then adds voiceovers to the story using an open-source text-to-speech model from HuggingFace.

You can get creative and add any Block you like to this Pipeline. To create a multimodal AI-generated storybook, we added an illustrator Block that utilizes Stable Diffusion API to draw the characters and the story. Also, we added a few other Blocks to aggregate all forms of generated media and put them in a book format, then publish the book on GitHub Pages using a GitHub access token, and finally create a QR code that can be scanned to access the published book. Here’s the Pipeline we built that we demonstrated at our Workshop at World Summit AI 2024 (don’t worry if you missed it, here’s the link to the Workshop video):

A complete Pipeline that creates an AI-generated storybook based on an input prompt using ZetaForge.

As you can see in the screenshot above, there is also a Block that reviews the content generated by the “GPT Story Teller” Block to ensure the story is suitable for children. Also, we added a “Story to SD Prompt” Block to turn the generated story panels into stable diffusion prompts while keeping the visual appearance of characters consistent, because if you feed the story to stable diffusion directly, each panel’s drawing could end up illustrating something different, which is not desirable for a storybook.

Here are some examples to show you what we built with this Pipeline:

Storybook Example #1

Prompt: I want to teach my son to eat more carrots.

Screenshot:

Storybook example #1.

Link:

Storybook Example #2

Prompt: I want a child to learn that it is important to be attentive to detail. The main character is called Luka and it is in a world of Lego. Make the story rhyme.

Screenshot:

Storybook example #2.

Link:

Storybook Example #3

Prompt: I want to teach my child not to forget his personal belongings. He loves to pretend he is a ninja, so use this in your story.

Screenshot:

Storybook example #3.

Link:

Feel free to explore each storybook using the provided link. As you can see, the characters are consistent throughout the whole book, which is achievable by prompt engineering and some additional touches.

When we were building this Pipeline at Zetane, we assigned each Block to one of our developers to get it done faster. The best part was that no one had to worry about which tools or packages their colleagues were working with, and everyone could focus on their own task.

When we put all of the Blocks together to build the Pipeline, it worked on all of our computers without having to troubleshoot anything or add an extra step for integrating all the code into one Pipeline.

The Pipeline we built took all the data and deployed a React website based on the content, and everyone could view the created storybook using the same link. It was straightforward to deploy our storybook to a website using ZetaForge. Best of all, if we ever want to create a very different Pipeline that also needs to deploy a React website at the end, we can reuse the same Block for the deployment step with minimal adjustments.

Also, when we wanted to present our Pipeline at the Workshop, it was easy to explain what this complex Pipeline does by having those view Blocks that show intermediate results within the Pipeline. It also made our Pipeline explainable and less complicated.

Next Steps

Where can we go from here?

Well, you are only limited by your imagination and a few technical constraints! Currently, we support Pipelines that can be represented by a Directed Acyclic Graph (DAG) and are deterministic, meaning that the Pipeline flow is not conditional. Moreover, objects transmitted between Blocks need to be JSON serializable. That’s pretty much it!
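A quick way to sanity-check a Block’s outputs against the serializability constraint (this is a general Python check, not a ZetaForge API):

```python
import json


def is_json_serializable(obj):
    """Return True if obj survives json.dumps, the property values passed
    between Blocks need to have."""
    try:
        json.dumps(obj)
        return True
    except (TypeError, ValueError):
        return False


print(is_json_serializable({"story": {"page1": {"text": "..."}}}))  # True
print(is_json_serializable({"model": object()}))                    # False
```

If a Block produces something non-serializable, such as a model object or a tensor, write it to a file and pass the file path between Blocks instead.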

So, if you are interested in what ZetaForge offers, such as easy collaboration with your colleagues, reusing code that has worked before, and running code consistently in containers, use your creativity: think of any Pipeline that saves time and money, deploy your research project as a ZetaForge Pipeline, or use ZetaForge to create a Pipeline that explains how an AI system works to people with no background in AI.

ZetaForge Pipelines are designed to run on any Kubernetes cluster, either remotely or on-premises; therefore, GPUs can be utilized if a Pipeline is computationally expensive. Also, since it allows you to replace Blocks surgically, it’s an ideal tool for comparing different methods and doing benchmarking while providing visualizations.

For example, this is a Pipeline we presented at a Workshop on Trustworthy Artificial Intelligence that compares a quantized and unquantized network. This Pipeline quantizes the model using the activation values of layers. The calculated activation values are visualized using a bar plot. On the top branch, we have the full-precision network; on the bottom, there is the quantized network. We provide 3D visualizations for both models that help us present the Pipeline to others more interactively and understand it better.

Utilizing ZetaForge to compare a full-precision model (top branch) with a quantized model (bottom branch) with visualizations and explanations. The Pipeline can run on any Kubernetes cluster.

Conclusion

In this blog series, we described some of the most common pain points in today’s software development. We introduced ZetaForge as a promising tool built to solve some of those challenges. Then, we explored ZetaForge and its features through an interactive example that deploys an AI-generated multimedia storybook. We saw how to create custom Blocks on ZetaForge and how to run a ZetaForge Pipeline. Finally, we provided some examples and discussed some limitations in terms of what ZetaForge offers.

References

  • Here’s where you can find the official documentation for ZetaForge: https://zetane.com/docs/.
  • If you have any ZetaForge Block/Pipeline creations you want to share or if you have any questions about this tool, you can ask us on our Discord channel: https://discord.gg/zetaforge.
  • Check out ZetaForge on GitHub: https://github.com/zetane/ZetaForge
    Feel free to open a new issue if you notice a bug or want to request a feature, and if you want to contribute directly, we are open to your contributions!
  • Subscribe to our news and updates and learn more about ZetaForge on ZetaForge’s website: https://www.zetane.com/zetaforge.
  • Learn more about who we are at Zetane: https://zetane.com.

[1] In Parameter Blocks, we don’t need to have a computations.py file because nothing needs to be computed inside a container.
