Published in AI Network
DALL·E 2 : meaning, limitations, and solutions

Table of Contents

√ What is DALL·E?
√ DALL·E 2, what's different?
√ What technology has been used in DALL·E 2?
√ Significance of DALL·E 2
√ Limits of DALL·E 2
√ Solution: the problem is openness. Let's deal with it using open resources!

A polar bear playing the guitar? A robot that looks like a Picasso? A koala dunking a basketball? Can you instantly draw a picture that pops into your head? You're probably thinking of an image you've seen somewhere before, or perhaps of something created by your own imagination. Even for us humans, it's not easy to turn a thought or an idea into a specific picture. Here is a model that makes this possible: 'DALL·E 2', recently released by OpenAI. If you type in 'a polar bear playing the guitar', you will actually get a picture of a polar bear playing the guitar. The same goes for 'a koala dunking a basketball': you will get a picture of a koala dunking a basketball.

What is DALL·E?

In January 2021, OpenAI released the original DALL·E, which made headlines at the time. In short, it is an AI model that generates images from text. Does the name 'DALL·E' ring a bell? Perhaps it reminds you of Salvador Dalí, the surrealist painter. That's right: OpenAI named it DALL·E after the Spanish artist and the animated robot movie WALL-E. Many earlier efforts to create images from text, such as GAN-based models, still produced results that look unnatural in places. There has also been a lot of progress in transformer-based generative models such as GPT-3, and DALL·E is a representative model in that line of research.

DALL·E 2, What’s Different?

DALL·E 2 was recently released, and it is bolder still: it brings imagination firmly into the realm of AI. The community that witnessed the birth of the original DALL·E has already responded enthusiastically. So, what's different now?

First, DALL·E 2 does not just recognize individual objects; it also considers the relationships between them. It does not merely detect a koala and a motorcycle separately. When you write 'a koala riding a motorcycle', DALL·E 2 also captures the relationship between the two.

Second, the resolution was improved. If you prompt it with 'avocado chair', you get a much clearer image, with 4x the resolution of the original DALL·E (compare the avocado chair images on OpenAI's website).

Lastly, it can also transform and reprocess existing images, for example editing part of an image or generating variations of it.

What technology has been used in DALL·E 2?

DALL·E 2 uses an image-text model called CLIP (Contrastive Language-Image Pre-training), also developed by OpenAI. CLIP is trained with the objective of predicting whether an image and a piece of text match. DALL·E 2 produces images with a method its authors call 'unCLIP'. A frozen CLIP model supplies image and text embeddings: raw text is fed into the CLIP text encoder, which outputs a text embedding. From there, a prior (either autoregressive or diffusion-based) predicts the CLIP image embedding corresponding to that text embedding. Finally, a diffusion decoder generates the output image from the predicted image embedding.
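To make CLIP's matching objective concrete, here is a minimal NumPy sketch. The embeddings below are synthetic stand-ins for the outputs of trained CLIP encoders (the real encoders are transformers); we only illustrate the scoring rule, which is cosine similarity in a shared embedding space, with matched (image, caption) pairs landing close together.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(v):
    # Project embeddings onto the unit sphere so dot product = cosine similarity.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Synthetic stand-ins for trained encoder outputs: 4 caption embeddings,
# and 4 image embeddings lying near their matching captions.
text_emb = normalize(rng.normal(size=(4, 64)))
image_emb = normalize(text_emb + 0.1 * rng.normal(size=(4, 64)))

# CLIP's core scoring rule: cosine similarity between every (image, text) pair.
logits = image_emb @ text_emb.T          # shape (4, 4)

# Training pushes the diagonal (matched pairs) up and everything else down;
# at inference, the best caption for image i is the row-wise argmax.
predicted_caption = logits.argmax(axis=1)
print(predicted_caption)                 # → [0 1 2 3]
```

In unCLIP, this shared space is what ties the stages together: the prior maps a CLIP text embedding to a plausible CLIP image embedding, and the decoder renders an image from it.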

If you’re unfamiliar with diffusion models, they work by corrupting the training images by adding Gaussian noise for a set number of iterations. This is done until the training image is approximately pure noise. The model is then trained to reverse this process and gradually denoise the image until a clean sample is produced.
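The forward (noising) half of that process has a convenient closed form: instead of adding noise step by step, you can jump straight to step t. A small sketch, using the standard linear beta schedule (the schedule values here are common defaults, not DALL·E 2's exact ones):

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule: beta_t is the variance of the noise added at step t.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)   # cumulative product: how much signal survives to step t

def q_sample(x0, t):
    """Closed-form forward process: x_t = sqrt(abar_t)*x0 + sqrt(1-abar_t)*eps."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = np.ones(16)                 # a toy "image" of constant pixels
x_early = q_sample(x0, 10)       # still mostly signal
x_late = q_sample(x0, T - 1)     # approximately pure Gaussian noise

# The signal coefficient sqrt(alpha_bar[t]) decays toward zero as t grows,
# so by the final step almost nothing of x0 remains.
print(np.sqrt(alpha_bar[10]), np.sqrt(alpha_bar[T - 1]))
```

The denoising model is trained to invert this: given x_t and t, predict the noise eps, which lets sampling walk the chain backwards from pure noise to a clean image.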

Significance of DALL·E 2

The significance of the DALL·E 2 model can be summarized as follows:

First, we can now express things that were hard to express before. As explained earlier, we can generate images that are difficult even for humans to draw, using only text as input. DALL·E has made it possible for AI to enter the realm of 'creation'.

Second, it offers a way to test whether AI can produce something beyond its training data or merely recombine what it was given. We can use this model to probe whether AI can do more than what humans have explicitly taught it.

Finally, we will be able to see how well AI understands the human world. We will see whether the world that DALL·E 2 depicts stays within the bounds of our imagination or goes beyond it. It is also important to watch whether the model is used ethically or unethically, and that observation will be a useful input for social consensus on safe and useful AI development. OpenAI claims that the DALL·E model will help develop safe and useful AI.

Limits of DALL·E 2

AI models are still evolving, and it is hard to say at this point that DALL·E 2 is a perfect model. For example, if data labeling is incorrect, it can produce wrong results, just like a person who learned the wrong word. And when it receives text it has never seen before, it will try to produce something similar to what it saw during training, but the result may be far off. It will be exciting to watch DALL·E develop over time and see how what it has learned can be applied to new areas.

The problem is openness. Let’s deal with open resources!

We find it unfortunate that the model is not open to everyone who wants to try it. Just like GPT-3, which was disclosed only to a select few, DALL·E 2 is said to be released first to a small group of users, because it could be misused for deepfakes or to create violent and provocative images. OpenAI says it has considered the 'impact' of releasing the model hastily, but we still think it is a pity.

In response to the monopoly on GPT-3, the open-source community created GPT-Neo, an open-source counterpart. What if DALL·E 2 develops similarly? We hope to help solve the computational-resource problem with our platform, AI Network, which advocates 'open resources'. If we can all build an 'open source' DALL·E together, a model that surpasses DALL·E 2 could arrive sooner. The 'side effects' OpenAI worries about would also be easier to manage if the model lived in the AI Network community, a distributed cloud infrastructure built on blockchain.

You can try a small version of DALL·E, released by Phil Wang, on Ainize. The model is small, so its performance does not match OpenAI's latest version, but it gives you a sense of what kind of model this is.

AI Network is a blockchain-based platform and aims to innovate the AI development environment. It represents a global back-end infrastructure with millions of open source projects deployed live.

If you want to know more about us,

AI Network website: https://ainetwork.ai/

AI Network Official Telegram Group (English): https://t.me/ainetwork_en

Ainize: https://ainize.ai

AI Network YouTube: https://www.youtube.com/channel/UCnyBeZ5iEdlKrAcfNbZ-wog

AI Network Facebook: https://www.facebook.com/ainetworkofficial

AI Network Twitter: https://twitter.com/ai__network

AI Network official account. Contact: info@ainetwork.ai