Community Spotlight

Write your story and DiscoArt will make it come to life!

Using (Distil)BART models and DiscoArt to generate an interactive storyboard from text!

Marco Franzon
Jina AI

On the left, a fragment of the story of Moby Dick. On the right, the opening scene of Frankenstein. Both were generated with DiscoArt from a summary produced by DistilBART.

Overview

In this article, we show how to combine DistilBART models, which summarize long text, with the capabilities of DiscoArt to create a storyboard from a long or short story.

This project aims to create images from a summary of a long text, for example a chapter of a book or a paragraph of a fairy tale.

First of all, we need an NLP model well suited to summarization. I chose distilbart-cnn-12-6, which seems to me a good trade-off between performance and computational cost.

The entire process of building an end-to-end DiscoArt dashboard from text input alone can be summarized in two simple steps:

Text Summarization

First, we summarize the long text taken from the story so that it is short enough to be given as the prompt to our art generation notebook. Here is a very minimalistic notebook to understand and play with the summarization process I used in my little project (four lines of code is all you need!).
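To give a feel for this step, here is a minimal sketch that uses the Hugging Face transformers summarization pipeline with the sshleifer/distilbart-cnn-12-6 checkpoint. It is not the content of the notebook: the helper names split_into_chunks and summarize_story, the 400-word chunk size, and the length limits are assumptions made for illustration, and it assumes a transformers version that ships the summarization pipeline.

```python
from typing import List


def split_into_chunks(text: str, max_words: int = 400) -> List[str]:
    """Split text into pieces of at most max_words whitespace-separated words.

    Hypothetical helper: BART-style models accept a limited input length,
    so a long chapter is summarized piece by piece.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_story(text: str, model_name: str = "sshleifer/distilbart-cnn-12-6") -> str:
    """Summarize each chunk and join the partial summaries into one string."""
    # Imported lazily so the chunking helper works without transformers installed.
    from transformers import pipeline

    summarizer = pipeline("summarization", model=model_name)
    parts = []
    for chunk in split_into_chunks(text):
        # max_length and min_length are counted in tokens; the values are illustrative.
        result = summarizer(
            chunk, max_length=60, min_length=15, do_sample=False, truncation=True
        )
        parts.append(result[0]["summary_text"].strip())
    return " ".join(parts)
```

Calling summarize_story(chapter_text) downloads the checkpoint on first use and returns a short text that can serve as a prompt.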

Art Generation

The next step is to take the summarized output and send it as input to the DiscoArt notebook. Whatever we get as a summary is used as the prompt for generating the artwork. Here is the notebook to use DiscoArt for generating images from text. The notebook is straightforward to follow and gives you an idea of the possibilities DiscoArt offers.
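As a rough sketch of this hand-off, the snippet below passes a summary to DiscoArt's create function as the text prompt. The helpers build_discoart_args and generate_artwork are hypothetical names introduced here, the optional style suffix is my own assumption, and running the generation itself requires the discoart package and a GPU.

```python
from typing import Any, Dict


def build_discoart_args(summary: str, style: str = "") -> Dict[str, Any]:
    """Turn a summary into keyword arguments for discoart.create.

    Hypothetical helper; the values are illustrative, not the settings
    used for the images in this article.
    """
    prompt = summary.strip()
    if style:
        prompt = f"{prompt}, {style.strip()}"
    return {
        "text_prompts": prompt,  # the summary becomes the prompt
        "n_batches": 1,          # generate a single image
    }


def generate_artwork(summary: str, style: str = ""):
    """Run DiscoArt on the summary and return whatever create() returns."""
    # Imported lazily: discoart is heavy and is only needed for the real run.
    from discoart import create

    return create(**build_discoart_args(summary, style))
```

Keeping the argument building separate from the call makes it easy to inspect or tweak the prompt before spending GPU time on it.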

StoryBoard from text

I always wanted to paint one of my favorite books, The History of Tom Jones, a Foundling by Henry Fielding, but I never had the means or skills to do it until I came across DiscoArt, which made me an artist overnight!

Let’s look at how I used AI text summarization and art generation to turn my idea into a picture.

The first step is to generate the summary of a section from the first chapter of The History of Tom Jones, a Foundling by Henry Fielding, which we will then use as input to generate an artistic image with DiscoArt.

Summary from DistilBART

“Thomas Allworthy, a rich landowner in south-west England, is a widower with three adult children . He lives in a large house with his sister Bridget who is middle-aged, plain and unmarried.”

Artwork using DiscoArt

Using the previous summary as input to DiscoArt, I get this amazing image generated by an AI on the fly!

Illustration from DiscoArt, using the previous summary as input

Here are the parameters for the DiscoArt prompt used to generate the above image:

Since DiscoArt lets you choose between different diffusion models, I chose the one with the lowest computational cost. I also generated the image at low resolution to speed up the generation process.
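A low-cost configuration along these lines could look like the sketch below. These are not the parameters used for the image above: the model name 256x256_diffusion_uncond, the step count, and the helper names are assumptions, and the rounding reflects the fact that Disco Diffusion-style pipelines typically expect image sides that are multiples of 64.

```python
from typing import Any, Dict


def snap_to_multiple(value: int, base: int = 64) -> int:
    """Round value down to a multiple of base, but never below base itself."""
    return max(base, (value // base) * base)


def low_cost_settings(width: int, height: int) -> Dict[str, Any]:
    """Build illustrative keyword arguments for a cheap, low-resolution run."""
    return {
        # Assumed name of a lighter diffusion checkpoint.
        "diffusion_model": "256x256_diffusion_uncond",
        # A small canvas keeps memory use and run time down.
        "width_height": [snap_to_multiple(width), snap_to_multiple(height)],
        # Fewer diffusion steps trade detail for speed; the value is illustrative.
        "steps": 100,
    }
```

The resulting dictionary could be merged with the prompt and passed to discoart.create as keyword arguments.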

A few minutes later, you have an image like this from an input like the one above. Next, I take a deeper dive into the two main technologies that power the application, so that you understand the generative tools used to build this multimodal application.

DistilBART and summarization

The DistilBART model is derived from the BART model, which is, by definition:

“a denoising autoencoder for pretraining sequence-to-sequence models”

BART is trained by corrupting text with an arbitrary noising function and training a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT.
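To make the idea of a noising function concrete, here is a toy sketch that corrupts a sentence by masking words and pairs it with the original as the reconstruction target. The BART paper describes several noising schemes (token masking, token deletion, text infilling, sentence permutation, document rotation). This example shows only simple token masking on whitespace-separated words, and the names mask_tokens and make_training_pair are hypothetical.

```python
import random
from typing import List, Optional, Tuple

MASK = "<mask>"


def mask_tokens(
    tokens: List[str], mask_prob: float, rng: Optional[random.Random] = None
) -> List[str]:
    """Replace each token with MASK independently with probability mask_prob."""
    if rng is None:
        rng = random.Random()
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def make_training_pair(
    sentence: str, mask_prob: float = 0.3, seed: int = 0
) -> Tuple[str, str]:
    """Return (corrupted_input, original_target) for a denoising objective."""
    tokens = sentence.split()
    corrupted = mask_tokens(tokens, mask_prob, random.Random(seed))
    # The model would be trained to map the corrupted text back to the original.
    return " ".join(corrupted), sentence
```

A sequence-to-sequence model trained on many such pairs learns to reconstruct the clean text from its corrupted version.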

BART has more than one hundred million parameters, which makes it hard to run on consumer hardware. Distillation addresses this by aiming for almost the same accuracy at a lower computational cost.

The concept of distillation is quite intuitive: it is the process of training a small student model to mimic a larger teacher model as closely as possible. Distillation comes in whenever we want to port a model onto smaller hardware, such as a modest laptop or a cellphone, because a distilled model runs faster and takes less space.
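The student-mimics-teacher idea can be sketched as a loss function: soften both models' output distributions with a temperature and measure how far the student is from the teacher. This is the generic logit-matching form of distillation, written with the standard library only, and not necessarily the exact recipe used to produce the DistilBART checkpoints. The function names and the temperature value are illustrative.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Convert logits to probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    teacher_logits: List[float],
    student_logits: List[float],
    temperature: float = 2.0,
) -> float:
    """KL divergence from the softened teacher to the softened student."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, so minimizing it pushes the student toward the teacher's behavior.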

If you want to understand a little bit more, check out these links →

DiscoArt and generative art

DiscoArt is, at the moment, one of the most powerful and comprehensive ways to create artistic images using diffusion models.

What is a diffusion model?

Diffusion Models are generative models used to generate data similar to the data on which they are trained. Diffusion Models work by destroying training data through the successive addition of Gaussian noise and then learning to recover the data by reversing this noising process. After training, we can use the Diffusion Model to generate data by simply passing randomly sampled noise through the learned denoising process.
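The noising half of this process is simple enough to sketch in a few lines. The example below follows the common DDPM-style formulation with a linear noise schedule, where a noisy sample at step t can be drawn directly from the clean data. The schedule values are widely used defaults that I am assuming here, the function names are illustrative, and the learned denoising network is left out entirely.

```python
import math
import random
from typing import List


def linear_betas(
    num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02
) -> List[float]:
    """Noise variances that grow linearly from beta_start to beta_end."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """Running product of (1 - beta): the share of signal left at each step."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(
    x0: List[float], t: int, alpha_bars: List[float], rng: random.Random
) -> List[float]:
    """Sample x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * gaussian_noise."""
    a_bar = alpha_bars[t]
    return [
        math.sqrt(a_bar) * value + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0)
        for value in x0
    ]
```

At early steps the sample is almost the original data, while at the last step it is almost pure Gaussian noise. Generation runs the other way: it starts from noise and applies the learned denoiser step by step.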

Diffusion models are the heart of DiscoArt. On top of that, DiscoArt has a modern API with which you can easily interact and create artistic images. It also introduces handy features such as result recovery and persistence, gRPC/HTTP serving, and many others that are well explained in the official repository.

If you have made it this far, you might be curious about what went into building the DiscoArt StoryBoard. Wait no more! Here is the source code 👇

This is mostly a PoC, but it works well enough to give you a taste of generative AI and its potential. You can run this web app locally: its parameters are tuned to perform reasonably well on a local machine.

Conclusion

DiscoArt Storyboard is an example of how AI can be used to generate creative art from text. It is simple to use and low-code, requiring only a few lines of code. With generative AI, the possibilities are endless, and it is a great way to bring your favorite stories to life.
