How to combine ChatGPT with DALL·E (text to image) to increase your productivity

Key Hui
3 min read · Feb 25, 2023

Nowadays, almost everyone knows ChatGPT and uses it regularly. In this article, I combine ChatGPT with a text-to-image API to generate AI images.

DALL·E 2 is a new AI system that can create realistic images and art from a description in natural language.

My prompt is “Describe a Gundam”.

For example, ChatGPT returned:

“A Gundam is a giant robot from the Japanese anime series of the same name. It is typically depicted as a large, humanoid mecha with a variety of weapons and armor. The most iconic Gundam is the RX-78-2, which is a white and blue robot with a red and white V-fin on its head. It is equipped with a beam rifle, beam saber, and shield. It also has a variety of other weapons, such as a bazooka, missile launchers, and a beam cannon. The Gundam is piloted by a human, usually a young man, and is capable of performing a variety of maneuvers and attacks. It is often used to defend Earth from alien threats.”

After I got the result, I passed it to DALL·E to generate my Gundam image.

It works great.
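One practical detail when passing a long ChatGPT description straight to DALL·E is prompt length. OpenAI's API reference lists a 1000-character limit for DALL·E 2 prompts; treat that number as an assumption to check against the current documentation. The helper below, `fit_image_prompt`, is a hypothetical name and is not part of the original project. It is a minimal sketch of trimming a description to fit, cutting at a sentence boundary when possible.

```python
def fit_image_prompt(description: str, max_chars: int = 1000) -> str:
    """Trim a description to at most max_chars, ending on a full sentence if one fits."""
    # Collapse newlines and repeated spaces so the limit is spent on real text.
    text = " ".join(description.split())
    if len(text) <= max_chars:
        return text

    clipped = text[:max_chars]
    last_stop = clipped.rfind(". ")
    if last_stop != -1:
        # Keep everything up to and including the last complete sentence.
        return clipped[: last_stop + 1]
    # No sentence boundary inside the limit: fall back to a hard cut.
    return clipped
```

The Gundam description above is well under the limit, so it would pass through unchanged apart from whitespace cleanup.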

If I use DALL·E directly with the prompt “A Gundam”, it returns a similar image.

That made me think I was misusing ChatGPT here: the extra description adds little when the short prompt already works, so this is not a good example.

Let me try another one.

The prompt is “Describe 3 different Goblin characters”.

This time I take the 3 descriptions that come back and generate 3 images in one go.

First, the ChatGPT response:

  1. Grunthor the Greedy: Grunthor is a small, green-skinned goblin with a large appetite for gold and treasure. He is always on the lookout for a way to get his hands on more wealth, and will do whatever it takes to get it. He is a master of disguise and can often be found lurking in dark corners, waiting for the perfect opportunity to strike.
  2. Krog the Cunning: Krog is a tall, blue-skinned goblin with a sharp mind and a knack for getting out of tight spots. He is a master of manipulation and can often be found using his wits to get what he wants. He is also a master of disguise and can often be found in the shadows, waiting for the perfect moment to strike.
  3. Grom the Fierce: Grom is a large, red-skinned goblin with a fierce temper and a love of battle. He is a master of weapons and can often be found leading a small band of goblins into battle. He is a master of intimidation and can often be found intimidating his enemies into submission.

Then I use the 3 descriptions as prompts and generate one image for each.
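To feed each description to the image model separately, the reply first has to be split into one prompt per character. The sample code later in this article splits on blank lines; the sketch below is an alternative that keys on the "1.", "2.", "3." numbering instead. The function name `split_numbered_descriptions` is hypothetical and not taken from the original project.

```python
import re


def split_numbered_descriptions(text: str) -> list[str]:
    """Split a '1. ... 2. ... 3. ...' style reply into one prompt per item."""
    prompts: list[str] = []
    for line in text.splitlines():
        # A new item starts with a number followed by '.' or ')'.
        match = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if match:
            prompts.append(match.group(1))
        elif line.strip() and prompts:
            # A non-empty line without a number continues the previous item.
            prompts[-1] += " " + line.strip()
    return prompts
```

Any text before the first numbered item is ignored, so a leading blank line or a short introduction from the model does not end up as an image prompt.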

Interestingly, I get 3 different styles of Goblin, each with its own image. This is really helpful for producing game assets in a short time.

The sample code is simple, with no complicated logic:

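import openai

# `app`, `ChatModel` and `text_to_image` are not shown in this snippet;
# they are presumably defined elsewhere in the linked source code.


@app.post('/top_results')
async def top_results(model: ChatModel):
    # Ask the text model for the descriptions.
    response = openai.Completion.create(
        model=model.model,
        prompt=model.prompt,
        temperature=model.temperature,
        max_tokens=512,
        top_p=0.1,
        frequency_penalty=0.0,
        presence_penalty=0.0,
    )
    result: str = response.choices[0].text
    # Drop the leading blank line, then treat each blank-line-separated
    # block as one prompt (str.removeprefix needs Python 3.9+).
    prompts = result.removeprefix('\n\n').split('\n\n')
    print(prompts)
    # Generate one image per description.
    images = []
    for p in prompts:
        result_image = text_to_image(p)
        images.append(result_image)
    return {'prompts': prompts, 'images': images}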
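The endpoint above calls `text_to_image`, which is not shown in the article. The real helper lives in the linked repository and may look different; the version below is only a sketch of what such a helper could be. It assumes the legacy `openai` Python SDK (before 1.0), the same generation that provides `openai.Completion.create`, where images are requested through `openai.Image.create`. The `create` parameter is an addition for illustration so the function can be exercised without calling the API.

```python
from typing import Callable, Optional


def text_to_image(
    prompt: str,
    create: Optional[Callable[..., dict]] = None,
    size: str = "512x512",
) -> str:
    """Return the URL of one generated image for the given prompt."""
    if create is None:
        # Assumption: legacy openai SDK (<1.0) with openai.api_key already set.
        import openai
        create = openai.Image.create

    # Request a single image; the response lists results under "data".
    response = create(prompt=prompt, n=1, size=size)
    return response["data"][0]["url"]
```

Called as `text_to_image(p)` inside the loop, this would return one image URL per description, which matches how the endpoint collects results into `images`.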

In conclusion

Judging from the results above, this technique may not be mature enough to use directly, but it can help with research work and increase your productivity.

Source code reference.

https://github.com/k-hui/chatgpt-text-to-image-aws-lambda

Key Hui

Full Stack Software Architect | AWS & GCP Cloud Computing | Mobile Development | Web Development | Frontend Development | Backend Development