Uncle Aaroh Testing
6 min read · Oct 15, 2023


Imagine a system that turns any description you provide into a strikingly realistic image. You type, ‘an astronaut riding a horse’, and instantly it creates a distinctive, brand-new image of just that. Or perhaps ‘teddy bears shopping for groceries’ and, before you know it, you have an image to match your odd yet delightful description. How about ‘a bowl of soup that is a portal to another dimension’? No problem, it renders that fantastical scenario into being.

This isn’t from a science fiction novel. The technology actually exists, and it is called DALL·E 2. Developed by OpenAI, a company co-founded by Elon Musk, DALL·E 2 is an AI system that creates original, realistic images and art from a text description.

The brilliance behind DALL·E 2 is a combination of two AI technologies: ‘CLIP’ and ‘diffusion’. CLIP matches images to text, which gives the system a basic, conceptual understanding of the request. Diffusion, impressively, teaches a computer to ‘corrupt’ an image by adding Gaussian noise and then to un-corrupt, or enhance, that image by removing the noise again.
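The corrupt-then-restore idea can be sketched in a few lines of Python. This is only a toy illustration, not OpenAI's implementation: the four-pixel "image", the function names and the `keep` parameter are all made up for the example, and the restore step is handed the exact noise that was added, whereas a real diffusion model has to learn to predict that noise.

```python
import math
import random


def add_noise(pixels, keep, rng):
    """Blend an image with Gaussian noise.

    `keep` (between 0 and 1, exclusive of 0) is the share of the original
    signal that survives. It is a hypothetical parameter for this sketch.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [
        math.sqrt(keep) * p + math.sqrt(1.0 - keep) * n
        for p, n in zip(pixels, noise)
    ]
    return noisy, noise


def remove_noise(noisy, predicted_noise, keep):
    """Undo the blend, given an estimate of the noise that was added."""
    return [
        (x - math.sqrt(1.0 - keep) * n) / math.sqrt(keep)
        for x, n in zip(noisy, predicted_noise)
    ]


rng = random.Random(0)          # fixed seed so the run is repeatable
image = [0.1, 0.5, 0.9, 0.3]    # toy 4-pixel "image" (assumption)

noisy, noise = add_noise(image, keep=0.5, rng=rng)

# Here we cheat and pass in the true noise. In a trained model, a neural
# network would supply its own prediction of the noise instead.
restored = remove_noise(noisy, noise, keep=0.5)
```

With the true noise supplied, `restored` matches `image` up to floating-point rounding. The hard part, and the part that training is for, is producing a good noise estimate when the original image is unknown.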

It takes the gist of an image, understands the concept, and then, using a process similar to corrupting and enhancing an image, generates new, high-resolution images. It is much like the website thispersondoesnotexist.com, where an AI has looked at thousands of faces and generates a new face based on what it learned. DALL·E 2 is an advanced version of this, generalised to turn any concept into an image.
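The "matching images to text" step mentioned earlier can be pictured as comparing lists of numbers (embeddings) and picking the closest pair. The sketch below assumes that setup and uses cosine similarity as the comparison. The three-number vectors are invented for illustration, real systems use learned embeddings with hundreds of dimensions, and `best_caption` is a hypothetical helper rather than part of any library.

```python
import math


def cosine_similarity(a, b):
    """Return how closely two vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption(image_vec, caption_vecs):
    """Pick the caption whose embedding is most similar to the image's."""
    return max(
        caption_vecs,
        key=lambda caption: cosine_similarity(image_vec, caption_vecs[caption]),
    )


# Made-up embeddings (assumption): a trained model would produce these.
image_vec = [0.9, 0.1, 0.2]
caption_vecs = {
    "an astronaut riding a horse": [0.8, 0.2, 0.1],
    "teddy bears shopping for groceries": [0.1, 0.9, 0.3],
}

match = best_caption(image_vec, caption_vecs)
print(match)
```

In this toy data the image vector points almost the same way as the astronaut caption's vector, so that caption is chosen.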

At this stage, DALL·E 2 is not widely available, and OpenAI has limited access to a handpicked group. However, the possibilities for this technology are fascinating, and they raise questions about the future of AI and its potential to transform the way we produce and consume visual content.

Discovering the Wonders of DALL·E 2

When the OpenAI team graciously offered me the opportunity to try DALL·E 2’s capabilities for myself, my expectations were sky-high. It turns out I still wasn’t prepared for how capable this AI is.

I started with a simple task: a blue apple and a bowl of oranges. The result was startling! The vibrant colours, extreme sharpness, and remarkable detail exceeded all my expectations, and the image was difficult to distinguish from a real photo.

Next, I wanted something more eccentric: an elderly kangaroo, to be precise. I had imagined an older kangaroo looking greyer, but the AI’s version was satisfactory. Maybe not what I had in mind, but the output was nevertheless a well-executed image of an aged kangaroo.

Moving beyond plain animal portraits, I tried a mystical one, hoping for a wise elephant gazing at the moon. DALL·E 2 did not disappoint. Although the moon looked slightly dubious, the elephant was entirely convincing, cloaked in an aura of sagacity.

Moving away from reality a little, I asked for a teddy bear performing surgery on a grape, all in a nostalgic 1990s cartoon style. The request was met with a highly imaginative, though not exactly flawless, image. My instrument of choice for the operation would be a scalpel, but DALL·E 2 opted for scissors. Then again, we’re talking about teddy bear surgeons, aren’t we?

I was curious about a picture of a dog operating a camera on a film set. Unexpectedly, the AI managed to render a delightful image of a Cooker Hunch, and the precision with which it depicted the breed astounded me. A closer look, however, might give away the AI’s touch.

In the realm of sci-fi, I requested a robotic woman firmly guarding a wall of computers. The rendering of this scene was nothing short of awe-inspiring, and it reflected the AI’s intelligent interpretation of the word ‘guarding’.

New challenge: a tiger discovering the lost city of Atlantis. Since no real-world reference exists for such a scene, DALL·E 2’s result was closer to an art piece, relying heavily on the AI’s creativity.

Finally, I asked for a painting inspired by the Mona Lisa but depicting a goat using an iPad. DALL·E 2 not only created goats with hands but also managed to maintain the Mona Lisa style while integrating the iPad into the scene. This fun finale hinted at the boundless possibilities of DALL·E 2.

In short, DALL·E 2 pushes the boundaries of what’s possible with AI, forcing us to reimagine the limits of creativity.

Introducing DALL·E 2: The Visual Research Project by OpenAI

Fascinated by technology and AI? Have you ever imagined a Cyclops on a tractor wearing over-the-ear headphones, drawn in the Simpsons style? Well, an AI tool like DALL·E 2 can help bring such vivid images to life! Amusingly, it does have its quirks, such as mixing up earbuds and over-the-ear headphones.

DALL·E 2 is a research project within OpenAI’s portfolio, not a consumer product. It belongs to the impressive league of highly specific AI systems, each built for one kind of task, from detecting cancer in X-rays to navigating autonomous cars or even sharpening photos in Photoshop. The complexity grows enormously when we venture into the realm of General AI, which requires tremendous amounts of data to be competent across many different situations. Building something akin to a Tesla robot that wanders the earth doing tasks for you is a completely separate challenge!

Reality Check: DALL·E 2 isn’t Perfect

Of course, DALL·E 2 is not without its flaws and limitations. Some are intentional, while others are just incidental quirks. For instance, DALL·E 2 intentionally refrains from generating images containing adult content, illegal activities or violence. It also restricts the use of specific people’s identities to avoid harmful consequences.

The unintentional shortcomings include quirks with variable binding, that is, keeping track of which attribute belongs to which object. The relative position of objects in an image can also be tricky for DALL·E 2. For example, if asked for a red cube on top of a blue cube, it might mix up the order. Interestingly, it also struggles to render written text.

A Glimpse into the Future: DALL·E 3

The research team is working hard to iron out these kinks for the next version, DALL·E 3. In the meantime, some clever extras make this tool even more interesting. For example, because it is built on a diffusion method, DALL·E 2 can iteratively transform an existing image towards any description you give it!

You can seamlessly turn a plain jacket into a Jackson Pollock painting or morph a picture of a cat into a samurai master! Or better yet, un-modernise a piece of tech, converting the latest iPhone model into older and older versions.

While concerns about AI replacing jobs persist, DALL·E 2 is, at this point, more of a useful tool than an employment threat. Stay tuned for more exciting developments in the world of AI!

If you’ve ever wondered about the practical implications and potential of artificial intelligence, particularly in graphic design, we recently addressed that very topic in a studio video. Essentially, we experimented by putting DALL·E 2, a cutting-edge AI, head-to-head against Tim, our resident graphic designer.

The primary aim of both DALL·E 2 and Tim is essentially the same: to turn a verbal description into a visually appealing image. Spoiler alert: given enough time, Tim can make something superior. But in just 10 seconds, DALL·E 2 can churn out a multitude of different variations.

True, some images created by DALL·E 2 are rough around the edges, have unusual text placement, or look pixelated when you zoom in on faces, hands, or objects. Yet even in its current state, this AI tool is stunningly efficient for brainstorming ideas and concepts. It can create things that would otherwise take significantly longer to devise.

Further, the images that DALL·E 2 produces are not necessarily intended to be the finished product. Instead, they provide a fantastic starting point for later work. We used this very approach for the thumbnail of our video, which began as a robot hand sketch created by DALL·E 2.

I expect that future generations of DALL·E will produce even higher-resolution, more photo-realistic images, and possibly even quick animations, video clips, and, who knows, perhaps entire movies. The march towards General AI continues, and it’s an incredible time to be a part of this journey. Thanks for reading, and until next time, stay curious and keep exploring.
