Creating and Illustrating a children’s book with Dall-E in less than a week

Ray Wang
5 min read · Aug 22, 2022


Ten years ago I had the idea to write a children’s book about the joy and difficulties of building something. I wanted something with the approachability of Dr. Seuss that also carries a message that scales with age. The idea was there, but the hassle and cost of illustration always kept me from finishing the book.

Alternatives such as hiring an artist on platforms like Fiverr and Upwork take a lot of work, money, and human interaction. When Dall-E launched, I was excited because the barrier to producing a children’s book had dropped dramatically.

To see the result now, scroll to the end.

How Dall-E works

Dall-E takes a single prompt and generates four square pictures at 1024x1024. The image quality is extremely high, though, and my guess is that the real pixel count is probably 4x that.

The prompt that generated my cover graphics

Creating a cohesive style

The biggest initial problem I found was creating a consistent style. The whole project needed 17 images in the same style. Because Dall-E generates each image from a fresh prompt, with no control over style and no way to save one, it’s hard to maintain consistency.

The best solution is to find a set of keywords that keeps generating images that look the same. However, I discovered that you can’t use just any style. If both the subject matter and the style are very unusual, the generated image will lose the style first. The style has to be popular enough that Dall-E has already seen tons of images in it, so that it holds the same style throughout.

Below is a style that failed to make the cut:

Prompt: “monk holding a snowglobe sitting in a temple, ukiyo-e style, detailed”. As you can see, this did not come out in the clean Japanese woodblock-print style that I wanted, but in more of an oil or watercolor style.

In the end, I chose the style descriptor “digital art smooth”. It produces a minimalistic, clean style similar to someone drawing in Photoshop, and there are apparently so many examples of this style in Dall-E’s training data that it consistently produced images that look roughly the same.

Prompt: “"A boy facing a broken bridge that he cannot cross on top of a cliff" digital art smooth”
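One way to keep the descriptor identical across all the prompts is to append it in code rather than retyping it each time. The sketch below is only an illustration: `styled_prompt` is a made-up helper name, the subjects are prompts quoted in this post, and nothing here talks to Dall-E itself.

```python
STYLE = "digital art smooth"  # the style descriptor chosen above


def styled_prompt(subject: str, style: str = STYLE) -> str:
    """Wrap the subject in quotes and append the shared style descriptor."""
    return f'"{subject.strip()}" {style}'


# Subjects taken from prompts quoted in this post.
subjects = [
    "A boy facing a broken bridge that he cannot cross on top of a cliff",
    "A boy on the dock looking back at a ship in the water, sunset in the background",
]

prompts = [styled_prompt(s) for s in subjects]
for p in prompts:
    print(p)
```

Keeping the style in one constant means a change of descriptor applies to every page at once, which is the closest thing to a "saved style" available here.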

Other issues during image generation

1. Faces are screwed up

I’m not sure whether others have run into this, but across more than 200 renders, over 50% of the human faces came out distorted enough to make the drawing unusable.

Faces are the weakest point of Dall-E, especially human faces; 50% or more of human faces look weird

Solution: Add “faceless” as a descriptor to the subject, or use the word “abstract”. Otherwise, just spend more credits and re-render until you get a good face.

2. Size of the main subject is way too large

You want most images to show a small figure in a big landscape, which leaves room for your text. By default, Dall-E makes the subject HUGE.

The subjects are too big; you want them much smaller so you can fit in text for the book

Solution: Add more objects or details about the background, such as “next to a tree”, “with the sunset in the background”, or “with the universe in the background”. These add complexity to the drawing and force Dall-E to move away from drawing one big subject.

“A boy on the dock looking back at a ship in the water, sunset in the background” digital art smooth

3. Quality is highly inconsistent

You get renderings that are almost masterpieces next to third-grader, crayon-level stuff.

The first image is phenomenal; the second looks like something a kid drew in Microsoft Paint

Solution: Sometimes re-rendering gets you what you want. But sometimes the drawings get progressively worse in detail and quality, and then the only solution is to reword the text prompt.
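The three workarounds above amount to a small loop: re-render a few times, and if nothing usable comes back, reword the prompt and try again. The sketch below is a hypothetical outline of that loop, not how Dall-E is actually driven. `render` and `is_usable` are placeholders standing in for submitting a prompt and for a human looking at the results, and the limit of 3 tries per wording is an arbitrary assumption.

```python
from typing import Callable, Iterable, List, Optional

STYLE = "digital art smooth"


def prompt_variants(subject: str, background: str) -> List[str]:
    """Progressively reworded prompts using the fixes described above."""
    return [
        f'"A {subject}" {STYLE}',
        # Fix 2: background detail pushes Dall-E toward a smaller subject.
        f'"A {subject}, {background}" {STYLE}',
        # Fix 1: "faceless" sidesteps distorted human faces.
        f'"A faceless {subject}, {background}" {STYLE}',
    ]


def first_usable(
    prompts: Iterable[str],
    render: Callable[[str], List[str]],
    is_usable: Callable[[str], bool],
    tries_per_prompt: int = 3,
) -> Optional[str]:
    """Re-render each wording a few times before moving to the next one."""
    for prompt in prompts:
        for _ in range(tries_per_prompt):
            for image in render(prompt):
                if is_usable(image):
                    return image
    return None  # every wording was exhausted without a usable image


def fake_render(prompt: str) -> List[str]:
    # Placeholder: pretend each render returns four image labels.
    return [f"{prompt} #{i}" for i in range(4)]


def fake_review(image: str) -> bool:
    # Placeholder for a person judging the picture by eye.
    return "faceless" in image


variants = prompt_variants(
    "boy on the dock looking back at a ship", "sunset in the background"
)
print(first_usable(variants, fake_render, fake_review))
```

Capping the tries per wording reflects the observation that repeated re-renders of the same prompt can get worse rather than better, so at some point rewording is the cheaper move.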

Results and conclusion

In the end, it took about 20 hours to figure out Dall-E and generate the 17 images that go along with the book content I wrote. I think it’s an amazing tool. The negatives are clear: a lack of consistent art style and a lack of control.

The positives far outweigh the negatives:
1. The cost of art creation is low: in total it took about 30 dollars for around 230 tries to get the 17 images

2. New iterations or changes are instantaneous; there is no need to work with a human artist and wait a day or more for new images

3. The best of the images exceed the level of artist that I would be able to hire. The level of “artistry” is way better than that of the average artist on Fiverr or Upwork.
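For a rough sense of scale, the figures quoted above (about 30 dollars, around 230 tries, 17 final images) can be broken down per try and per image. All inputs are the approximate numbers from this post, so the results are estimates only.

```python
total_cost_usd = 30   # approximate total spend
total_tries = 230     # approximate number of tries
final_images = 17     # images used in the book

cost_per_try = total_cost_usd / total_tries
tries_per_image = total_tries / final_images
cost_per_image = total_cost_usd / final_images

print(f"cost per try:          ${cost_per_try:.2f}")
print(f"tries per final image: {tries_per_image:.1f}")
print(f"cost per final image:  ${cost_per_image:.2f}")
```

That works out to roughly 13 cents per try, between 13 and 14 tries per finished image, and under 2 dollars per finished image.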

Anyway, here’s the final book. You can also view it on Amazon and Kindle Unlimited.


Ray Wang

I am an entrepreneur working on ScaleDesk.io. Cornell CS grad and ex-Goldman programmer. Can mediocrely code, design and sell things :)