Naive art plus Generative AI (VQGAN + CLIP). Practical experiments

4 min read · May 14, 2024

This post focuses on applying generative models to various art styles rather than on the details of the models themselves. I’ve already discussed the implementation concepts of VQGAN, a powerful image-generation model that can be paired with CLIP to produce images from text prompts, in a separate post (linked). Here, I revisit the interactive VQGAN + CLIP system, which modifies an input image according to a text prompt. As inputs, I use my own naive art (watercolor paintings in a style similar to “zentangle”) together with this Google Colab notebook, originally created by Katherine Crowson (https://github.com/crowsonkb, https://twitter.com/RiversHaveWings).

The picture below illustrates the input settings I configured in the Colab:

Each run lasts 80 transformation steps, and the Colab displays an intermediate output image every 5 steps. We can then select the best result from these intermediate images.
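To make the preview schedule concrete, here is a minimal sketch of which steps would produce a displayed image with 80 steps and a preview every 5. The function name `preview_steps` and its parameters are hypothetical, and the counting convention (steps numbered from 0, final step included) is an assumption rather than something read from the notebook itself.

```python
def preview_steps(total_steps=80, display_every=5):
    """Return the step indices at which a preview image would be shown.

    Assumption: steps are counted from 0 and the last step (total_steps)
    is included; the actual notebook may count differently.
    """
    return [step for step in range(total_steps + 1) if step % display_every == 0]


steps = preview_steps()
print(len(steps), steps[:4], steps[-1])  # 17 [0, 5, 10, 15] 80
```

Under these assumptions a single run yields 17 candidate images to choose from, including the untouched starting point at step 0.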

Below, several images illustrate how various inputs are transformed by different text prompts. Each one shows the input painting on the left and the transformed image, along with the exact text prompt, on the right. I’ve selected the most interesting results and kept them at the minimum resolution that still allows for detailed viewing. Enjoy!