Creating and illustrating a children’s book with Dall-E in less than a week
Ten years ago I had the idea to write a children’s book about the joy and difficulties of building something. I wanted something with the approachability of Dr. Seuss that also carries a message that scales with age. The idea stuck around, but the hassle and cost of illustration always kept me from finishing the book.
Alternatives like hiring an artist on platforms like Fiverr and Upwork take a lot of work, money, and human interaction. When Dall-E launched, I was excited because the barrier to producing a children’s book dropped dramatically.
To see the result now, scroll to the end.
How Dall-E works
Dall-E takes a single prompt and generates 4 square pictures at 1024x1024 pixels. The image quality is extremely high, though, and my guess is that the real pixel count is probably 4x that.
Creating a cohesive style
The biggest initial problem I found was creating a consistent style. The entire project requires 17 images in the same style. Because Dall-E generates each new image from a new prompt, with no control over style and no way to save a style, it’s hard to maintain consistency.
The best solution I found is a set of keywords that keeps generating images that look alike. However, I discovered you can’t use just any style: if both the subject matter and the style are very unusual, the generated image loses the style first. The style has to be popular enough that Dall-E has already seen tons of images in it, so that it holds the same style throughout.
Below is a style that failed to make the cut:
In the end, I chose the style descriptor “digital art smooth”. It created a minimalistic, clean style similar to someone drawing in Photoshop, and there are apparently so many examples of this style in Dall-E’s training data that it consistently produced images that look relatively similar.
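One way to keep the descriptor identical on every page is to stop typing it by hand. The snippet below is only a sketch of that idea: the helper name `with_style` and the two scene descriptions are made up for illustration, and it does nothing more than build the prompt text that you would then paste into Dall-E.

```python
# Hypothetical helper: end every prompt with the same style descriptor.
STYLE = "digital art smooth"  # the descriptor chosen in this post


def with_style(subject: str, style: str = STYLE) -> str:
    """Return the scene description followed by the shared style descriptor."""
    # Drop surrounding spaces and any trailing comma/period before appending.
    cleaned = subject.strip().rstrip(",.")
    return f"{cleaned}, {style}"


# Made-up scene descriptions, for illustration only.
scenes = [
    "a child stacking wooden blocks next to a tree",
    "a small figure looking up at a tall tower.",
]

prompts = [with_style(scene) for scene in scenes]
for prompt in prompts:
    print(prompt)
```

Keeping the style in one constant means a change of descriptor applies to all 17 pages at once instead of being retyped per prompt.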
Other issues during image generation
1. Faces are screwed up
Not sure if other people encountered this, but in over 200 renders, more than 50% had human faces bad enough to make the drawing unusable.
Solution: Add “faceless” as a descriptor to the subject, or use the word “abstract”. Otherwise, just spend more credits and re-render until you get a good face.
2. Size of the main subject is way too large
You want most images to feel like a small figure in a big landscape to make room for your text; Dall-E by default makes the subject HUGE.
Solution: Add more objects or details about the background, such as “next to a tree”, “with the sunset in the background”, “with the universe in the background”. These add complexity to the drawing and force Dall-E to move away from drawing one big subject.
3. Quality is highly inconsistent
You get renderings that range from almost masterpieces to 3rd-grader crayon-level stuff.
Solution: Sometimes re-rendering gets you what you want. But sometimes the drawings get progressively worse in detail and quality, and then the only solution is to rewrite the text prompt.
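The first two workarounds above are really just edits to the prompt text, so they can be written down once and reused. This is a rough sketch under my own assumptions: the `Scene` class, its fields, and the example subject are hypothetical, and the output is only a string to paste into Dall-E, not a call to any Dall-E interface.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """Hypothetical container for one page's prompt ingredients."""
    subject: str
    background: list[str] = field(default_factory=list)
    faceless: bool = True  # workaround 1: avoid broken faces

    def to_prompt(self) -> str:
        # Prefix the subject with "faceless" when requested.
        subject = f"faceless {self.subject}" if self.faceless else self.subject
        # Workaround 2: background details push Dall-E away from
        # filling the whole frame with the subject.
        return " ".join([subject, *self.background])


page = Scene(
    subject="child holding a hammer",  # made-up subject
    background=["next to a tree", "with the sunset in the background"],
)
print(page.to_prompt())
```

For the third problem there is nothing to automate here: when quality keeps dropping, the subject or background wording itself has to be rewritten by hand.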
Results and conclusion
In the end, it took about 20 hours to figure out Dall-E and generate the 17 images that go along with the book content I wrote. I think it’s an amazing tool. The negatives are clear: a lack of consistent art style and a lack of control.
The positives far outweigh the negatives:
1. The cost of art creation is great: in total it took about 30 dollars for around 230 tries to get the 17 images
2. New iterations or changes are instantaneous; no need to work with a human artist and wait a day or more for new images
3. The best of the images exceed the level of the artists I would be able to hire. The level of “artistry” is way better than the average artist on Fiverr or Upwork.
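To put the cost numbers in per-image terms, here is the back-of-the-envelope arithmetic. The inputs are the rounded figures from this post (about 30 dollars, around 230 tries, 17 images, about 20 hours), so the results are approximations, not exact billing data.

```python
# Rounded figures quoted in this post.
total_cost_usd = 30.0
total_tries = 230
final_images = 17
total_hours = 20.0

cost_per_try = total_cost_usd / total_tries      # dollars per render attempt
tries_per_image = total_tries / final_images     # attempts per usable image
cost_per_image = total_cost_usd / final_images   # dollars per usable image
hours_per_image = total_hours / final_images     # includes learning the tool

print(f"cost per try:    ${cost_per_try:.2f}")     # $0.13
print(f"tries per image: {tries_per_image:.1f}")   # 13.5
print(f"cost per image:  ${cost_per_image:.2f}")   # $1.76
print(f"hours per image: {hours_per_image:.1f}")   # 1.2
```

So each finished illustration took roughly 13 to 14 attempts and cost under two dollars.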
Anyway, here’s the final book. You can also view it on Amazon and Kindle Unlimited.