AI photography in the newsroom — are we there yet?

Stable Diffusion has shown impressive capabilities in photo-realistic image generation. But is it ready to take on a role in journalism? We put the model to the test.

Jens Finnäs
Newsworthy.se
Dec 7, 2022

--

We have all seen text-to-image models progress at a jaw-dropping pace in the past year. It is only a matter of time before these models disrupt much of the content creation industry, including journalism, our own field.

AI models seem to be able to produce just about any image. But while the photos of horse-riding astronauts are certainly impressive, they are not necessarily useful in a journalistic context.

We wanted to find out if current state-of-the-art AI models can actually generate images of publishable quality. In our case, "publishable" means images that can serve as illustrations for news articles at Newsworthy, our news service for local, data-driven and highly automated news in Sweden.

We are a small news organization with limited resources for photography. We currently accompany our articles with open license stock photos from sites such as Unsplash, Wikimedia and Flickr. Most of our photos are generic enough to be replaceable by AI images. At least in theory.

With the help of Andrea Breuer and Isac Jonsson, two statistics students at Uppsala University, we have spent some time exploring the possibilities and limits of Stable Diffusion as a tool for producing stock photos for journalism, where the realism of the output matters more than its coolness.

Stable Diffusion is the leading open-source alternative among text-to-image AI models. It was released to the public in August 2022 by Stability.ai and has been shown to generate fascinating images. Compared to previous image-generating models it performs substantially better and operates at relatively low computational cost.
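
For readers who want to try this themselves, the sketch below shows roughly how such images can be generated in Python. It assumes the open Hugging Face diffusers library, the public v1.5 weights and a CUDA-capable GPU; this is not necessarily the exact setup we used, and the model id, prompt and output path are only examples.

```python
# A minimal sketch of generating an image with Stable Diffusion,
# using the Hugging Face diffusers library and the public v1.5 weights.
# The model id, prompt and output path are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # Stable Diffusion v1.5 checkpoint
    torch_dtype=torch.float16,
).to("cuda")  # requires a CUDA-capable GPU

prompt = ("a wooden house on a small town street, sweden, light yellow, "
          "hyperrealism, 4k, photo realistic, hd")
image = pipe(prompt).images[0]  # one 512x512 image by default
image.save("house.png")
```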

For this project we picked three article types for which we thought that Stable Diffusion would be able to generate decent images:

1) An article about house prices (as we publish monthly reports on property sales).

2) An article about a company filing for bankruptcy (for our weekly coverage of bankruptcies).

3) An article about an unusually warm autumn week (for our weekly and monthly coverage of weather events).

Case 1: House prices

This is what an article about housing prices looks like today:

Our first attempt was to feed the model a basic text prompt. These were the initial results:

“a wooden house on a small town street, sweden, light yellow, hyperrealism, 4k, photo realistic, hd”

As you can see, the model struggles with windows and doors, just as we have seen AI models struggle with the limbs of people and animals. We also found it difficult to capture a whole street of houses; the more elements in the picture, the more things can go wrong. Instead we tried to zoom in and be more specific:

“steps leading to an apartment door, dull colours, small town, Scandinavia, flower pot, autumn 4k, hd”

There are some obvious unrealistic elements remaining in these pictures, but the results are certainly better.

Are these images publishable? Maybe not quite. We found it hard to capture the feel of a Swedish house. The stairways above all have something of a foreign look and feel. But we were certainly getting closer to something publishable.

As an alternative strategy we tried to generate some artistic images, rather than photo-realistic ones, inspired by stock photos such as these:

Here is what Stable Diffusion offered us:

“tiny red house surrounded by gold coin stacks, board game, 4k, hd”

With an artistic image we can allow ourselves a lesser degree of photo-realism. Already in the first batch we were able to generate a few more or less publishable images:

But a lot of these images were disqualified by the coins that the model had a surprisingly hard time capturing:

The vast majority of the houses were also too distorted to be publishable:

Case 2: A company filing for bankruptcy

Every Monday Newsworthy publishes articles about the companies that have filed for bankruptcy in the past week. Here is an example:

We knew that images of people are challenging. Still, we gave it a try to see what the model would produce.

At a quick first glance, some of these images did not look too bad. Look closer, though, and you will notice that the hands are wonky, with either too many or too few fingers. In general the model has a hard time generating realistic hands; there always seems to be something wrong with them. The faces aren't too great either.

“Man signing a contract, in an office, high resolution”

As a next step, we tried to generate images of people seen from behind, hoping we would not have to worry about hands and faces. However, something still feels a bit uncanny. All the briefcases look very strange, and the limbs and poses of the person remain somewhat non-human (sidenote: notice that there is no uncertainty about the gender of “a person in a suit”).

“a person in a suit seen from behind, holding a briefcase, walking down a busy street, 4k, hd”

We gave up on humans altogether and tried a different approach. The idea was to illustrate the articles with images of moving boxes and plants:

“two cardboard boxes filled with papers, on top of a table, a plant, folders, in an office, seen from the side, hyperrealism, 4k, photo realistic, hd”

There are some comical traces of a well-known e-commerce company, but the model does a pretty good job.

With some fine-tuning, the prompt “cardboard boxes and plants, on a table, in an office, folders, sideview, 4k, hd” seemed to produce images of plain boxes without any text. Why this prompt removed the text we don’t know, but it did.

And the result? Almost publishable at least.

Case 3: Weather articles

Lastly, for an article about an unusually warm October we wanted a sunny nature picture. This was a pretty good use case for Stable Diffusion. Several of the images in the first batch look publishable:

“a forest in Sweden by a lake, autumn, sun is shining, hyperrealism, 4k, photo realistic, hd”

Many of the nature shots were decent, even good, pictures in terms of quality and composition and, more importantly, they mostly passed the realism test. On the other hand, they were often a bit too artistic in style to suit a news context:

But in general, this is the category that we were most successful in. We were able to produce several images that we could see ourselves publishing.

Strategies and learnings

So what did we learn? Is Stable Diffusion ready to step in as a stock photographer in the newsroom?

Maybe not quite yet.

With some patience it is most certainly possible to generate publishable images. But it is not necessarily as straightforward as snappy tweets and Instagram posts would lead you to believe.

We found that it takes quite a lot of tinkering to come up with useful prompts. Since randomness plays a big part in Stable Diffusion, one cannot expect every generated image to look good.

We experimented with adding keywords to the prompts to adjust the look of the images and make them more realistic. Examples of such words were high resolution, highly detailed, hyperrealism and photorealism. The two keywords that most consistently pushed the images toward a realistic look were 4k and hd.

We also found that some popular keywords, like trending on artstation or concept art, instantly turn the images into art. These keywords should perhaps be avoided when generating photorealistic images.

After finding a reasonable base prompt for each of the three article types, we generated a total of 500 images with different settings (additional keywords, number of iterations etc.). We considered about 1 in 20 images publishable, or almost publishable.

Our preliminary results suggest that the number of iterations (how many times the image passes through the neural network) does not have a significant effect on image quality. Using more iterations than the default of 50 is thus redundant and a waste of computational power. Nor does the use of extra keywords seem to significantly affect the outcome, so do not feel pressured to add things along the lines of “Studio lighting”, “Hyperrealistic” or “16k” to your prompt. The guidance level (i.e. how much the AI listens to the prompt), on the other hand, does seem to have some effect on the result.
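
To make the comparison concrete, here is a sketch of the kind of parameter sweep we are describing: the prompt is held fixed while the number of iterations (num_inference_steps in diffusers) and the guidance level (guidance_scale) vary. The specific values and file names are illustrative, not our actual settings.

```python
# Sketch of a parameter sweep over iterations and guidance level.
# Same kind of pipeline as in the earlier sketch; values are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = ("cardboard boxes and plants, on a table, in an office, "
          "folders, sideview, 4k, hd")

for steps in (25, 50, 100):            # passes through the denoising loop
    for guidance in (5.0, 7.5, 12.0):  # how closely the model follows the prompt
        image = pipe(
            prompt,
            num_inference_steps=steps,
            guidance_scale=guidance,
        ).images[0]
        image.save(f"boxes_steps{steps}_cfg{guidance}.png")
```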

The main takeaway, however, is that the base prompt has the single biggest effect on whether a picture turns out well. You can tweak the model parameters, but that won’t help much if you are trying to depict something that Stable Diffusion has a hard time generating.

Prompts can be developed not only with keywords but also with so-called seeds. A seed pins down the random starting point of an image, and reusing it lets you generate new variations of that same image. As the word suggests, it allows you to grow a new “crop” from an existing one.

Here we have an example where the only difference between the images is that the prompt says “auburn” for the images on top and “light yellow” for the images at the bottom.
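
In code, the same effect can be achieved by fixing the random seed, so the starting noise (and therefore the composition) stays the same while one word of the prompt changes. A sketch, again assuming the diffusers pipeline from above; the seed value and prompt wording are illustrative:

```python
# Sketch of reusing a seed to create colour variations of the same image.
# Same kind of pipeline as in the earlier sketches; seed and prompt are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

base_prompt = ("a wooden house on a small town street, sweden, {colour}, "
               "4k, photo realistic, hd")

for colour in ("auburn", "light yellow"):
    # Re-seeding the generator gives the same starting noise for both runs.
    generator = torch.Generator("cuda").manual_seed(1234)
    image = pipe(base_prompt.format(colour=colour), generator=generator).images[0]
    image.save(f"house_{colour.replace(' ', '_')}.png")
```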

This is one of the real opportunities of AI imagery for a site like Newsworthy, which currently depends heavily on re-using a limited number of stock photos.

Awaiting models to come

This project has been limited to a handful of use cases. We have seen that the capacity to generate images that are realistic enough to be used in journalism varies a lot from context to context.

The results you get from an AI model will always reflect its training data. Stable Diffusion was trained on images that are generally seen as “beautiful”, rather than images optimized for journalism. It has likely not been trained on pictures of ordinary houses or people signing contracts. Stunning landscapes, however, it has probably seen before.

That said, we still think it is a question of when, rather than if, AI photography will make its way into journalism. We eagerly await future models trained on a wider range of images, but at version 1.5 we don’t see Stable Diffusion replacing too many photographers in the newsroom.

Jens Finnäs
Founder of Newsworthy

Andrea Breuer
Student of statistics at Uppsala University

Isac Jonsson
Student of statistics at Uppsala University
