Creating a children’s book with AI

Tarwin Stroh-Spijer
Maker of Things


Update 2024: Progress in both image and video generation has only accelerated over the last year, with speed, quality and controllability getting better by leaps and bounds. ChatGPT has introduced DALL·E 3, Midjourney's quality is just insane, and there are tools that let you draw what you want, such as Alpaca, and new competition like Lexica. Check out what Martin Nebelong is doing to see what could possibly be the future of art creation for actual artists. Also check out ARTiV3RSE, the insane speed of FastSDXL, which creates images as you type, and new kinds of image editors such as Playground!

You have probably heard about the amazing jumps in the power of AI in recent years and, in the last few months, the advent of AI-generated imagery from tools such as Midjourney, DALL·E and Stable Diffusion, and of text tools such as ChatGPT.

I decided to have a play with the technology and see how easy it would be to get the AI to write, and illustrate, children’s books. You can see some of the outputs here:

Ethics

I’m not going to go into the weeds about the ethics of using AI-generated imagery. You should know that these tools have been trained using billions of images taken from the Internet, in most cases without the artists’ permission. You should do some reading before you start.

A very short primer can be seen in this video by Cleo Abram.

Generating your own book

I will go through the process I used to create this book:

“Writing” the story

ChatGPT can be used to write a short story. It’s pretty good, especially with a little bit of guidance.

I asked ChatGPT to write me a story, giving it the name of my main character, “Colah”, and a vague idea of what I wanted to happen in the story.

Pretty good for a children’s story, eh?

Now we want illustrations. I’m sure you could imagine each of these scenes, and then write a nice description of each. But we can be really lazy and ask ChatGPT to do that for us as well.

This blew me away when it actually worked! This already feels like magic; now think what this technology can do a year from now, 3 years, 10 years!

“Drawing” the illustrations

There are many text-to-image AI generators now available, all with their own strengths and weaknesses. A few of them:

  • Stable Diffusion: A fully open-source generator that has really led to a lot of progress recently.
  • Midjourney: Only accessible through Discord, although there is a version of Stable Diffusion that has been trained on Midjourney output, which I will use here.
  • DALL·E: The original publicly available tool.

I will explain scene-by-scene how I used these generators.

Scene 1

In the first scene, we see Colah the Koala climbing up into her eucalyptus tree. She is small and cute, with big, fluffy ears and soft, furry fur. The tree is tall and green, with branches reaching up into the sky.

I started out with the version of Stable Diffusion trained on Midjourney output.

“a fluffy koala is asleep in a eucalyptus tree. Close up of the koala. Blurry background.”

Not really what I was looking for. Let’s try adding some more information for it to go on:

“an illustration for children’s book of fluffy koala is asleep in a eucalyptus tree. Close up of the koala. Blurry background.”

Still not what I was looking for. Even when I asked it to show only a single koala, it kind of ignored me. “Smarter” generators that will probably solve problems like this should be available really soon, but right now it can be hard to get what you want.

Next I tried basic Stable Diffusion. Eventually I came up with the following prompt, which gave me what I was looking for. Note that I ask for the output in the “style of May Gibbs”. It mostly ignores this, and I am happy that I’m not just ripping off a famous artist, even if she is dead.

“A watercolor painting of a koala asleep in a eucalyptus tree. In the style of May Gibbs. You can see the whole koala.”
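
The prompts in this article share a rough shape: a medium, a subject, a style hint, and a framing note. If you end up writing one prompt per scene, it can help to build them from parts so they stay consistent. Here is a minimal sketch in JavaScript. Note that `buildPrompt` and its field names are hypothetical helpers I made up for illustration; they are not part of any image generator's API.

```javascript
// Hypothetical helper: joins the parts of a text-to-image prompt into one string.
// The field names (medium, subject, style, extras) are illustrative assumptions.
function buildPrompt({ medium, subject, style, extras = [] }) {
  const sentences = [`${medium} of ${subject}.`];
  if (style) {
    // The style hint is optional; generators may mostly ignore it anyway.
    sentences.push(`In the style of ${style}.`);
  }
  for (const extra of extras) {
    // Make sure every extra instruction ends with a full stop.
    sentences.push(extra.endsWith(".") ? extra : `${extra}.`);
  }
  return sentences.join(" ");
}

// Rebuilds the Scene 1 prompt from its parts.
const scene1 = buildPrompt({
  medium: "A watercolor painting",
  subject: "a koala asleep in a eucalyptus tree",
  style: "May Gibbs",
  extras: ["You can see the whole koala"],
});
console.log(scene1);
```

Keeping the medium and style fixed while only changing the subject is one simple way to make the illustrations across a book feel like they belong together.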

Scene 3

In the third scene, we see Colah trying to explain to Kiki that she just wants to sleep. She looks tired and annoyed, while Kiki looks excited and eager.

I wasn’t able to get Stable Diffusion to create an image that showed both a Koala and a Kookaburra in the same scene.

So I moved over to using OpenAI’s DALL·E.

“A watercolor painting of a Kookaburra and Koala in a eucalyptus tree. In the style of May Gibbs.”

This turned out a lot better than with the other tools. The only problem is that I had created “widescreen” images previously. No worries, DALL·E allows you to “outpaint”.

Creating the book

Make sure you save each of your illustrations as you go. These will likely be saved as PNG files — these files keep the full quality of your images, but really are too large to put on the web.

Compress your images

There is an amazing free online tool called Squoosh, which you can use to get your images into a format such as WebP. Easy.

Putting it all together

I created HTML / JS / CSS that would display each of the images, along with the related text in a browser.

You can download the source for this book here:

In the README you will find instructions on how to use it, but the basics are:

  • replace images with your images in /img/pages
  • replace /img/icon.png and /img/og.webp with your own images. These are used when sharing on social media or adding to a phone’s home screen
  • edit /book.js — set details you want on your cover, the text of your story, and the images you generated
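
The real format of /book.js is described in the README, so check there first. Purely to illustrate the idea of pairing each page's text with an image, here is a rough sketch. Every name below (`book`, `cover`, `pages`, `renderPage`, `escapeHtml`) and every file name is a hypothetical assumption, not the actual structure of the downloadable source.

```javascript
// Hypothetical sketch of a book definition; the real /book.js may look different.
const book = {
  cover: { title: "Your book title", author: "Your name" },
  pages: [
    { image: "img/pages/scene-1.webp", text: "First page of your story." },
    { image: "img/pages/scene-2.webp", text: "Second page of your story." },
  ],
};

// Escapes characters that have special meaning in HTML so story text is shown literally.
function escapeHtml(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds the HTML for one page: the illustration followed by its text.
function renderPage(page, index) {
  const alt = `Illustration for page ${index + 1}`;
  return [
    `<section class="page">`,
    `  <img src="${escapeHtml(page.image)}" alt="${alt}">`,
    `  <p>${escapeHtml(page.text)}</p>`,
    `</section>`,
  ].join("\n");
}

// Render every page in order and join them into one HTML string.
const html = book.pages.map(renderPage).join("\n");
console.log(html);
```

Keeping the story as plain data like this means you only touch one file when you change the text or swap an illustration.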

You can load the index.html into any browser, and refresh after each change you make to play with how it works.

Hosting your HTML

You’re probably going to want to share this book, at least to make it easily accessible on a phone or tablet.

To do this you’ll need to upload it to a host, so it can be accessed on the Internet.

My favorite host for this kind of thing is Netlify. You’ll be able to start using them for free as well.

I’m not going to go into the details of how to do this; there are many great tutorials online. But know that you will also have to learn Git.

Update: Adding video + audio

You can take this further by creating a voice / video clone of yourself:

  • Create an AI-generated avatar of yourself using a Dreambooth model. There are many services that make this easier, such as DYVO and Neural.love, or you can use a model on Replicate. Be careful of those that are free though, because then you are the product! You could also just take a normal-old-boring photo of yourself and use that.
  • Use an AI voice cloner / generator such as ElevenLabs to create a voice-over. Again, you could just record your voice directly.
  • Use an avatar video generator such as D-ID to combine the generated image and audio, and get a funny talking head.

An example here — note, I do not have an American accent, it’s just that a lot of these voice generators are trained on American English:

Done?

I guess that’s about it. Good luck making your own. And please share your creations with me, and include the tag #ai-childrens-book or similar if you post about it.
