Qosmo Mask 2022

Ryosuke Nakajima
Published in Qosmo Lab · 6 min read · Dec 9, 2021

➝ The Japanese version is available here

Qosmo has designed an original mask to greet the New Year 2022, just as we did last year. This year, we used CLIP, a model that learns the relationship between images and text, together with an image generation model. In this article, I will describe the concept behind the design and the method we used.

Concept

Once again: "Stay safe with an AI-generated tiger mask. Best wishes for 2022!"

2022 is the Year of the Tiger in the Chinese zodiac, so Qosmo has designed an AI-generated tiger mask. To create it, we used a text-to-image approach that combines CLIP with an image generation model, both cutting-edge topics in AI. The tiger texture looks like a real photograph, but it was generated entirely by AI.

A typical tiger texture is yellow and black; we translated these colors into navy, Qosmo's identity color, which makes the mask well suited for everyday use.

Technology

For this year's Qosmo Mask, we used CLIP and an image generation model to generate the pattern.

CLIP is a model developed by OpenAI that learns the relationship between images and texts. About 400 million image-text pairs were collected from the web, and the model is trained to predict which texts belong to which images. CLIP stands for Contrastive Language-Image Pre-training, and it is characterized by its use of contrastive learning. In contrastive learning, put simply, the model is trained so that similar data produce similar vector representations and different data produce different vector representations. Through this training process, CLIP learns to associate images with text.

The model vectorizes images and text. It learns to map similar data to similar vectors and different data to different vectors. (from https://openai.com/blog/clip/)
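To make the contrastive objective concrete, here is a minimal sketch in PyTorch (our choice of framework; this is not OpenAI's actual training code) of the symmetric cross-entropy loss over a batch of image and text embeddings: matching pairs on the diagonal are pushed together, and all other pairs are pushed apart.

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Normalize so the dot product equals cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Similarity matrix: entry (i, j) compares image i with text j.
    logits = image_emb @ text_emb.t() / temperature

    # The correct text for image i sits at index i (the diagonal).
    targets = torch.arange(logits.shape[0], device=logits.device)
    loss_images = F.cross_entropy(logits, targets)      # image -> text
    loss_texts = F.cross_entropy(logits.t(), targets)   # text -> image
    return (loss_images + loss_texts) / 2

# Toy usage with random vectors standing in for encoder outputs.
image_emb = torch.randn(8, 512)
text_emb = torch.randn(8, 512)
print(clip_style_contrastive_loss(image_emb, text_emb))
```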

CLIP also focuses on improving the accuracy of zero-shot classification. Zero-shot in this context means that the model performs well on data it sees for the first time. In the past, image classification models improved their accuracy by focusing on a specific task with a specific dataset. However, such models do not work well on data that is new to them (i.e. not part of the training dataset). Therefore, in recent years, research has focused on models that generalize to data outside the training set. In the natural language domain, GPT-2 and GPT-3 have become a hot topic due to their versatility, and CLIP aims to achieve the same versatility in the image domain.
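As a concrete example of what zero-shot classification with CLIP looks like, the sketch below follows the usage pattern of OpenAI's clip package: candidate labels are written as captions, the image and the captions are embedded, and the cosine similarities act as classification scores. The image path and the labels are hypothetical.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical image and labels; none of them need to come from a
# dedicated classification dataset seen during training.
image = preprocess(Image.open("tiger.jpg")).unsqueeze(0).to(device)
labels = ["a tiger", "a house cat", "a zebra"]
text = clip.tokenize([f"a photo of {label}" for label in labels]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```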

As you can see, CLIP is a model that associates images with text and is highly accurate even in the zero-shot setting. So, what kind of expression does CLIP make possible in the field of art?

Since the appearance of CLIP, many artists have experimented with it. Among these experiments, the combination of CLIP with image generation models to produce images that match an input text has attracted much attention. GANs and other image generation models allow us to control the generated image by changing the input vector. In addition, as mentioned above, CLIP can vectorize images and texts as features and compute their similarity as a numerical value. This makes it possible to calculate how well an image produced by the generation model matches the input text. We first generate an image from a random input vector, compute its similarity to the text with CLIP, and then optimize the input vector via backpropagation of the gradient so that the generated image better matches the text. By repeating this process many times, we can find an input vector that matches the input text, and that vector can then be used to generate the final image.

System diagram of CLIP + image generation model (from https://ml.berkeley.edu/blog/posts/clip-art/)
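A rough sketch of this optimization loop is shown below, assuming PyTorch and the openai/CLIP package. The tiny untrained generator is only a stand-in so the loop runs end to end; in practice it would be a pretrained model such as a GAN or VQGAN decoder, and real CLIP-art pipelines add extra tricks (image augmentations, CLIP's input normalization) that are omitted here.

```python
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device, jit=False)
model = model.float().eval()

prompt = "dark blue tiger texture pattern, trending on ArtStation"
with torch.no_grad():
    text_tokens = clip.tokenize([prompt]).to(device)
    text_features = F.normalize(model.encode_text(text_tokens), dim=-1)

# Toy stand-in generator so the sketch is self-contained; a real pipeline
# would use a pretrained generator whose parameters stay frozen.
latent_dim = 128
generator = torch.nn.Sequential(
    torch.nn.Linear(latent_dim, 3 * 64 * 64),
    torch.nn.Sigmoid(),
    torch.nn.Unflatten(1, (3, 64, 64)),
).to(device)

latent = torch.randn(1, latent_dim, device=device, requires_grad=True)
optimizer = torch.optim.Adam([latent], lr=0.05)

for step in range(300):
    image = generator(latent)                                 # (1, 3, 64, 64) in [0, 1]
    image = F.interpolate(image, size=224, mode="bilinear")   # CLIP's input resolution
    image_features = F.normalize(model.encode_image(image), dim=-1)

    # Maximize cosine similarity between the generated image and the prompt.
    loss = -(image_features * text_features).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```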

This technique has been explored by many artists. Including words such as "unreal engine" or "trending on ArtStation" in the prompt is known to improve the quality of the generated images. The artist can interact with the system by trying different input texts and observing the generated images.

CLIP art, which combines CLIP and image generation models, was previously introduced on our website Create with AI. Please refer to it as well.

Experiments

For this year's Qosmo Mask, we generated the pattern using the combination of CLIP and an image generation model introduced above. There are various image generation models such as VQGAN and StyleGAN, but this time we used a diffusion model, which, unlike GANs, generates an image gradually by repeatedly removing noise from an input signal. A diffusion model takes longer to generate an image than a GAN, but it can produce images of higher quality. Below are the texts we used as input and the resulting generated images.
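For intuition about this "removing noise step by step" behavior, here is a minimal, self-contained DDPM-style sampling loop in PyTorch. The noise-prediction network is a dummy placeholder (a real model uses a large trained U-Net), and the CLIP guidance that steers sampling toward the prompt is omitted, so this only illustrates the iterative denoising idea, not the exact model we used.

```python
import torch

# Standard DDPM linear noise schedule (Ho et al., 2020).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)

def predict_noise(x_t, t):
    # Placeholder for the trained noise-prediction network eps_theta(x_t, t).
    return torch.zeros_like(x_t)

# Reverse process: start from pure noise and denoise step by step.
x = torch.randn(1, 3, 64, 64)
for t in reversed(range(T)):
    eps = predict_noise(x, t)
    # Estimate the slightly less noisy x_{t-1} from the current x_t.
    mean = (x - betas[t] / torch.sqrt(1.0 - alphas_cumprod[t]) * eps) / torch.sqrt(alphas[t])
    noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
    x = mean + torch.sqrt(betas[t]) * noise
# After the loop, x would be the generated image (meaningless here,
# because the noise predictor is a dummy).
```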

“Tiger texture, trending on ArtStation”

“dark blue tiger texture pattern, trending on ArtStation”

“animal stripe, trending on ArtStation”

Adding the words "trending on ArtStation" yields more realistic images. ArtStation is a website where creators from all over the world publish high-quality CG-related artwork. By adding "trending on ArtStation", you can steer the model toward the style of ArtStation works in the generated image. By adding the words "dark blue", we can constrain the colors to blue. In this way, we experimented with a variety of input texts to generate the pattern.

We already planned to adjust the colors afterwards, so we wanted a tiger texture that people could recognize even without its colors. I then selected three textures generated from the prompt "Tiger texture, trending on ArtStation".

Layouts using the original textures:

The colors rearranged to dark navy:

The mask is now well suited for everyday use.

We selected the last one and finalized it by adjusting the color to Qosmo’s navy.

This is how we created the Qosmo Mask for 2022. Stay safe in 2022!

We are Qosmo!

Thank you for reading to the end. We are Qosmo, Inc., a collective of Artists, Designers, Engineers and Researchers. Read other Medium articles from Qosmo Lab, and if you are intrigued and want to find out more, get in touch with us here. We are actively searching for new members, collaborators and clients who are passionate about pushing the boundaries of AI and creativity. Ciao!
