Animating the News with AI

Felix van Deelen
NOS Digital
Mar 23, 2023

In this post we discuss potential integrations of Stable Diffusion into our news platform for youth. We — data scientists and developers at the public news organisation NOS — explored ways to automatically generate animations based on textual features from news articles.

$ whoami
I am writing this blog as the Data Scientist at NOS, an independent public media organisation in the Netherlands that reports on news and sports. We are a broadcaster by origin, and over the last few decades we have watched news become a digital and mobile service. We have dedicated teams of professionals who create digital services for our various brands.

Stable Diffusion to encourage youth to follow the news

Jeugdjournaal is our news brand aimed at children aged 8 to 12. The daily broadcast informs hundreds of thousands of children about the news, in classrooms and at home. The Jeugdjournaal website and app are playful products where kids read, watch and engage with the daily news. However, qualitative research into our platform showed that young users read fewer articles, while their interest in video is increasing. We do not want to discourage them from reading, but we do want to encourage them to keep up with the news. This led us to think about using AI to generate content that may fit our audience better.

Left: Developer interest in Stable Diffusion over time. Right: Example of an image generated with Stable Diffusion

Our journalists already produce informative videos for the platforms, but these videos can take a lot of work to make, and for some topics there may be no footage available at all. As a result, many articles on our platform are text-based, despite our audience's preference for video. In addition, visuals are becoming ever more important as social media platforms focus increasingly on video content. At the same time we see the rise of Stable Diffusion (visible, for instance, in the graph above), a deep learning text-to-image model that can quickly and accurately generate stunning images from textual input. An example of what Stable Diffusion can do is shown above on the right: we input a prompt requesting a painting of Jupiter and its moons and get an image back. Given Stable Diffusion's potential to generate visuals at low cost, and our users' demand for video, we posed ourselves the following question:

Can we use Stable Diffusion to create a new type of content for our youth news platform by automatically generating animations from news articles?

We decided to run an experiment during an internal hackathon in search of a method to create animations from news articles in a fully automated pipeline, which we will walk through in this blog. A sneak preview of a generated animation for a news article about a wolf hunt is shown below. Read on to learn more about the pipeline we developed (news2animation, if you will) that automatically generates Stable Diffusion-powered animations from the textual features of news articles.

An animation generated for a news article on a wolf hunt

The experiment

We set out with the ambitious goal of developing a pipeline that automatically generates animations for news articles. Our experiment started with an algorithm that converts a news article into useful prompts, i.e. the text used to guide the model on how the output image should look. The blog then goes into detail on how we moved from generative images to generative animations. Finally, we dedicate a section to how we put it all together and extended our pipeline with additional functionality, such as the incorporation of text2speech, text2starwars and even a 3D version of our animation, which in the end enabled us to fully automatically create new types of content that may catch on better with our younger audience. An overview of our full pipeline from article to animation is shown below.

Overview of our pipeline for generating animations for news articles

The prompt construction algorithm

The most important factor for creating accurate and aesthetic visuals with Stable Diffusion is the prompt. The prompt can tell the model what objects to generate, their properties and the scene they should appear in, but it can also guide the model towards an art style, type of medium or colour palette. The first step of our prompt construction algorithm is to translate the articles from Dutch to English [1], as Stable Diffusion models generally work best on English texts. We then tested several methods to construct sets of prompts for an article, starting by simply using full sentences from the article as prompts. We also tested a machine learning model that generates prompts from texts automatically [2]. Finally, we tested a method that extracts the important words from the article text and uses them to fill out a prompt template.

We qualitatively evaluated these three methods by generating images for the same input article. Below we show an example of the prompts and images generated by the three prompt construction algorithms for an article on the king of the Netherlands obtaining his pilot license. In general we saw that the ML-based method often created overly dramatic scenes and sometimes added extra topics to the prompt, generating images that are not representative of the article. The sentence-based method sometimes gave good results, but the real subject of an article was often obscured by the many additional words in the prompt. In the end we decided that the template-based method yielded the best and most consistent images for our use case.

Example of images generated for an article on the king of the Netherlands obtaining his pilot license using various prompt construction algorithms

Template-based prompt construction

Based on our prior experience with Stable Diffusion and literature on prompt building [3, 4], we came up with the following template for constructing our prompts:

A [art medium] of a [main subject and its properties], [style cues]

- Art medium: oil painting / 3d render / photograph
- Main subject: noun phrases extracted from the article text
- Style cues: cyberpunk / psychedelic / renaissance, depending on the article category

As the [art medium] we tested a few options, such as photograph, 3d render and sketch, but decided on oil painting. We did not want the results to be too realistic, so as not to confuse our users about whether the animation is actually real, but at the same time we wanted the subject of the article to be visualised accurately, and oil painting seemed to strike that balance best. To fill out the subject in the prompt template we used Spacy's Named Entity Recognition model to extract noun phrases from the article [5]. To add style cues to the prompt, we used the categories our items are annotated with: for instance cyberpunk for the category technology and psychedelic for the category remarkable.
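To make this more concrete, below is a minimal sketch of what template-based prompt construction could look like in Python. It uses spaCy's noun chunks as a simple stand-in for the noun-phrase extraction described above, and the category-to-style mapping and helper names are purely illustrative; this is not our exact implementation.

```python
# Minimal sketch of template-based prompt construction (illustrative, not our exact code).
# Assumes the article text has already been translated to English.
import spacy

# Small English pipeline; provides the noun chunks we use as prompt subjects
nlp = spacy.load("en_core_web_sm")

# Illustrative mapping from article category to style cue
STYLE_CUES = {
    "technology": "cyberpunk",
    "remarkable": "psychedelic",
}

ART_MEDIUM = "oil painting"  # chosen over photograph / 3d render / sketch


def build_prompts(article_text_en: str, category: str, max_prompts: int = 10) -> list[str]:
    """Fill the template 'A [art medium] of a [main subject], [style cues]' with noun phrases."""
    doc = nlp(article_text_en)
    style = STYLE_CUES.get(category, "")
    prompts = []
    for chunk in doc.noun_chunks:
        subject = chunk.text.strip()
        if len(subject.split()) < 2:  # skip single-word chunks, which are usually uninformative
            continue
        prompt = f"An {ART_MEDIUM} of {subject}"  # 'An' matches 'oil painting'
        if style:
            prompt += f", {style}"
        prompts.append(prompt)
        if len(prompts) >= max_prompts:
            break
    return prompts


# Example usage
prompts = build_prompts(
    "The king of the Netherlands obtained his pilot license after years of training.",
    category="remarkable",
)
print(prompts)
# e.g. ['An oil painting of The king, psychedelic', 'An oil painting of his pilot license, psychedelic']
```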

Some articles may not fit the use-case

Animation generated for an article on bones found at an archaeological site.

As the animations are made for children, they should not be scary or inappropriate, which proved to be somewhat problematic for some specific articles, as can be seen in the example above. This animation was generated for an article on animal bones found at an archaeological site, and gave us some properly scary frames. We experimented with modifying the prompt to make it more appropriate, but noticed that it could occasionally still produce inappropriate frames. To solve this, one could train a custom Stable Diffusion model, for example on children's illustrations only. For now we decided that an editor should check the final animation before publishing it.

From prompts to images to animations

Now that we have a way to construct a set of prompts from an article, we are ready to move on to generating images. For this, one could for instance post each prompt separately to the DreamStudio API [6] or run their own Stable Diffusion model, e.g. using a model from HuggingFace [7]. Summarised in a strongly simplified manner, these services perform the following steps to generate an image (see this page for a more in-depth explanation):

  1. The prompt is converted to a numeric representation (also called embeddings) that the model can interpret
  2. A random image is generated in a strongly reduced dimensional space (also called the latent space)
  3. A deep learning model predicts which parts of the randomly generated image are noise, guided by the embeddings
  4. The predicted noise is subtracted from the initial image and the result is decoded back to regular image dimensions.

Such models are trained on a large set of prompt-image combinations, from which they learn how to denoise and generate images.
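For readers who want to experiment themselves, the sketch below shows roughly how a set of prompts can be turned into images with the Hugging Face diffusers library and the pretrained RunwayML model [7]. The model id and generation parameters are common defaults, not necessarily the exact configuration we used.

```python
# Sketch: one image per prompt with the Hugging Face diffusers library [7].
# Model id and parameters are common defaults, not necessarily our exact setup.
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # pretrained model from RunwayML [7]
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

prompts = [
    "An oil painting of a rocket landing on the moon, cyberpunk",
    "An oil painting of the lunar surface seen from orbit, cyberpunk",
]

for i, prompt in enumerate(prompts):
    # Each call runs the text encoding, iterative denoising and decoding steps described above
    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
    image.save(f"image_{i:04d}.png")
```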

However, a set of images is not yet an animation. We explored two directions that use Stable Diffusion for animation generation: the Deforum notebook [8] and running our own Stable Diffusion server. Deforum is an organisation that provides a notebook, runnable on Google Colab, that generates animations from a set of prompts. It slowly shifts from prompt to prompt by interpolating the text embeddings. A very cool feature is that it can create a 3D scene and lets you specify instructions to simulate a camera moving through the scene. An example of an animation generated with Deforum is shown on the left in the figure below.
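To give an idea of what interpolating text embeddings means in practice, the sketch below linearly blends the encoded embeddings of two prompts and feeds the intermediate embeddings to the pipeline via the prompt_embeds argument available in recent diffusers versions. This is a simplified illustration of the idea, not Deforum's actual implementation.

```python
# Simplified illustration of prompt-embedding interpolation, the idea behind Deforum's
# smooth prompt-to-prompt transitions (not Deforum's actual code).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")


def encode(prompt: str) -> torch.Tensor:
    """Encode a prompt into CLIP text embeddings using the pipeline's own text encoder."""
    tokens = pipe.tokenizer(
        prompt,
        padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    ).input_ids.to(pipe.device)
    with torch.no_grad():
        return pipe.text_encoder(tokens)[0]


emb_a = encode("An oil painting of a rocket on the launch pad")
emb_b = encode("An oil painting of a rocket landing on the moon")

n_steps = 8
for i in range(n_steps):
    t = i / (n_steps - 1)
    # Linear blend between the two prompt embeddings; Deforum uses more refined schedules
    emb = (1 - t) * emb_a + t * emb_b
    image = pipe(
        prompt_embeds=emb,
        num_inference_steps=30,
        generator=torch.Generator(device="cuda").manual_seed(42),  # fixed seed keeps the scene stable
    ).images[0]
    image.save(f"interp_{i:02d}.png")
```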

A downside of using a notebook for generating animations is that it requires manual input and therefore cannot be integrated directly into our pipeline. For this reason we looked into running our own Stable Diffusion server, built on top of a pretrained model from RunwayML [7]. Our server takes a set of prompts as input and generates animation frames from them, using the previous frame as the starting point for the next image. To make the set of frames coherent, we process all images so that they have a colour palette similar to the first frame of the animation. An example of an animation generated by this process is shown below on the right.
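In broad strokes, this frame-chaining approach can be sketched with the diffusers img2img pipeline as below. The strength value and the simple per-channel mean/std colour matching are illustrative stand-ins for our actual post-processing, not the exact settings of our server.

```python
# Sketch of frame chaining: each new frame starts from the previous one (img2img) and is
# colour-matched to the first frame. Parameters and the colour matching are illustrative.
import numpy as np
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline, StableDiffusionPipeline

device = "cuda"
txt2img = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
img2img = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)


def match_palette(frame: Image.Image, reference: Image.Image) -> Image.Image:
    """Shift the frame's per-channel mean and standard deviation towards the reference frame."""
    f = np.asarray(frame).astype(np.float32)
    r = np.asarray(reference).astype(np.float32)
    for c in range(3):
        f[..., c] = (f[..., c] - f[..., c].mean()) / (f[..., c].std() + 1e-6)
        f[..., c] = f[..., c] * r[..., c].std() + r[..., c].mean()
    return Image.fromarray(np.clip(f, 0, 255).astype(np.uint8))


prompts = ["An oil painting of a rocket landing on the moon"] * 24  # one prompt per frame

# The first frame is generated from text alone
frames = [txt2img(prompts[0], num_inference_steps=50).images[0]]

for prompt in prompts[1:]:
    # Use the previous frame as the starting point; a lower strength means a smaller change per frame
    frame = img2img(prompt=prompt, image=frames[-1], strength=0.45, guidance_scale=7.5).images[0]
    frames.append(match_palette(frame, frames[0]))

for i, frame in enumerate(frames):
    frame.save(f"frame_{i:04d}.png")
```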

Animations generated for an article about a rocket landing on the moon. Left was generated using Deforum, right using our own Stable Diffusion server.

Putting it all together

Combining the algorithm that constructs theme-specific prompts from article texts with our server that generates animations from a set of prompts, our pipeline for creating Stable Diffusion animations is complete. We configured it to generate animations of 200 frames, at a frame rate of 12 fps and a frame size of 512 by 512 pixels. When running the pipeline locally on a MacBook Pro, it takes about 30 seconds to generate a single frame, which means a single animation takes more than an hour and a half. We also tried running the pipeline on a GPU cluster, which significantly sped up the process, enabling us to generate animations in about 15 minutes.
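As an illustration of the final assembly step, the generated frames can be stitched into a 12 fps video with, for example, imageio and its ffmpeg backend; this is just one possible tool, not necessarily the one we used.

```python
# Assembling the generated frames into a 12 fps video.
# imageio with the imageio-ffmpeg backend is one possible tool, not necessarily the one we used.
import glob
import imageio

frame_paths = sorted(glob.glob("frame_*.png"))  # 200 frames of 512 by 512 pixels

with imageio.get_writer("animation.mp4", fps=12) as writer:
    for path in frame_paths:
        writer.append_data(imageio.imread(path))

# 200 frames at 12 fps gives an animation of roughly 17 seconds
```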

We decided to integrate a few more features into our pipeline. To better convey the message of an article, we wanted to further incorporate the article text. For this purpose we used a text2speech service [9], which recites the article text during playback of the animation. We took this one step further by overlaying the article text on top of the animation (not in the slightest inspired by a well-known media franchise). We came up with one even more experimental integration, where we play the animation on top of a 3D model of a globe. We integrated these types of automated animated articles into a version of our news app, all shown below. After putting it all together, we were able to create a completely new type of content for our platform.
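As a hypothetical sketch of the narration step: the text2speech service we used [9] is not named here, but the idea can be illustrated with gTTS for the speech synthesis and moviepy (1.x API) for laying the audio over the animation; neither is necessarily what we used.

```python
# Illustrative sketch of adding spoken narration to the animation.
# gTTS and moviepy are stand-ins to show the idea; they are not the text2speech service [9]
# or the video tooling we actually used.
from gtts import gTTS
from moviepy.editor import AudioFileClip, VideoFileClip  # moviepy 1.x API

article_text = "Dit is de tekst van het nieuwsartikel."  # placeholder Dutch article text

# Synthesise Dutch narration for the article text
gTTS(article_text, lang="nl").save("narration.mp3")

# Lay the narration over the generated animation
video = VideoFileClip("animation.mp4")
audio = AudioFileClip("narration.mp3")
video.set_audio(audio).write_videofile("animation_with_narration.mp4")
```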

Example of how the new types of content could be integrated into our app

Conclusion

We don't know yet exactly what the arrival of Stable Diffusion will mean for us, but we are experimenting with it to find new ways to support our storytelling. We found that it is possible to automate the creation of new types of news content powered by Stable Diffusion. We could use this pipeline to generate animations for news articles, with an editor checking the quality and deciding whether to publish. The advantage of such a pipeline is that an animation can be generated quickly and cheaply compared to a conventional, human-made video. The content generated by our pipeline might also be useful for creating engaging posts for social media, as many social media platforms focus mainly on visuals. We certainly are fascinated by the animations that can be created with Stable Diffusion, so we included a few more examples below. If you have any questions or feedback, please let us know in the comments!

Left: Moon rocket flies at record distance from Earth. Middle: Robot dog Spot helps as a guard in archaeological city. Right: Gelderland is going to shoot at wolves with paintball guns.
Left & Middle: Moon rocket flies at record distance from Earth. Right: Robot dog Spot helps as a guard in archaeological city.
Left: Robot dog Spot helps as a guard in archaeological city. Middle & Right: Moon rocket flies at record distance from Earth.
Left: Tourist dances through Europe. Right: What snacks did the Romans eat in the theatre?


Felix van Deelen
NOS Digital

Data Scientist at the Dutch public news organisation, live coding musician and AI art enthusiast