Are You A Snakke Oil Salleman? Then Stop DALL-E 3 From Producing Text


One of the funniest news items I’ve seen for a while was about the “Willy’s Chocolate Experience” in Glasgow, which promised so much:

and delivered so little:

The text on the left-hand side of the promotional image was generated by DALL-E 3, and contains the rather strange words “Catgacting”, “Cartchy tuns”, “exarserday lollipops” and “a pasadise of sweet teats”. I’m not even going to start analysing that text.

DALL-E 3 is a significant step forward from the earlier versions of the image creator, but it is not good with text. For example, with the prompt “Illustrate the question of whether it is Cheaper To Subscribe To ChatGPT and DALL-E 3 Or To Use APIs?”, we get poor spelling:

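This is a limitation of the way that DALL-E 3 works (broadly, it draws letters as visual shapes within the image rather than spelling out words), and it is to be hoped that DALL-E 4 will improve the way that text is rendered.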

The way around this is to stop DALL-E 3 from using text, and instead to tell more of a story about the event. If we try:

Illustrate a Snake Oil Salesman who is selling cryptocurrency and cybersecurity.

we get “Snakke”, “Salleman” and “Cryourency”:

But we can now define more elements of the illustration with:

Illustrate a Snake Oil Salesman and an engaged audience. Show the salesman as having a bottle in his hand and with an illustration of a snake. There should be a table with bottles that have labels representing cryptocurrency and cybersecurity, such as the Bitcoin and Ethereum symbols and a virus symbol on the label.

This moves DALL-E 3 away from illustrating elements of text, as we now define how the salesman should look and which objects represent cryptocurrency and cybersecurity:
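The same idea can be captured in a small prompt-building helper: rather than asking for words on the image, each concept is swapped for a visual stand-in before the prompt is sent. This is a minimal sketch; `VISUAL_STANDINS` and `build_scene_prompt` are hypothetical names, the symbol descriptions simply reuse the ones from the prompt above, and the closing "Do not show any text." instruction is one that DALL-E 3 does not always respect.

```python
# Hypothetical mapping from abstract concepts to objects DALL-E 3 can draw
# without spelling anything; the descriptions reuse the prompt above.
VISUAL_STANDINS = {
    "cryptocurrency": "bottles with the Bitcoin and Ethereum symbols on their labels",
    "cybersecurity": "bottles with a virus symbol on their labels",
}


def build_scene_prompt(subject: str, concepts: list[str],
                       details: tuple[str, ...] = ()) -> str:
    """Build a descriptive prompt that shows concepts as symbols, not words."""
    sentences = [f"Illustrate {subject}."]
    sentences.extend(details)
    for concept in concepts:
        if concept not in VISUAL_STANDINS:
            # Refuse rather than pass the bare word through, since DALL-E 3
            # may then try to render it as (misspelt) text.
            raise KeyError(f"no visual stand-in for {concept!r}")
        sentences.append(f"On the table there are {VISUAL_STANDINS[concept]}.")
    # Ask explicitly for no lettering at all.
    sentences.append("Do not show any text.")
    return " ".join(sentences)


prompt = build_scene_prompt(
    "a Snake Oil Salesman and an engaged audience",
    ["cryptocurrency", "cybersecurity"],
    details=("The salesman holds a bottle with an illustration of a snake.",),
)
print(prompt)
```

Raising an error for an unknown concept is a deliberate choice: it forces you to think of a drawable symbol, rather than letting the word itself slip into the prompt.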

The slight downside of this more descriptive approach is that the DALL-E 3 cost model is based on the number of tokens in the text, which roughly equals the number of words in the request. But the main cost in image generation is the image itself, so the overhead of adding more text is fairly minimal. Overall, it is around 3 cents for each image.
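As a back-of-the-envelope check, the sketch below takes the two approximations above (one token per word, and around 3 cents per image) and compares a short prompt with a longer, descriptive one. `PRICE_PER_TOKEN` is a hypothetical placeholder used only to show the arithmetic, not a published rate; set it to the current figure, or to zero if prompt text is not billed separately.

```python
IMAGE_COST = 0.03          # dollars per image: the rough figure quoted above
PRICE_PER_TOKEN = 0.00001  # HYPOTHETICAL placeholder, not a published rate


def estimate_tokens(prompt: str) -> int:
    # Approximation from the text: one token per word.
    return len(prompt.split())


def estimate_cost(prompt: str, images: int = 1) -> float:
    """Rough dollar cost of generating `images` pictures from one prompt."""
    return images * (IMAGE_COST + estimate_tokens(prompt) * PRICE_PER_TOKEN)


short_prompt = ("Illustrate a Snake Oil Salesman who is selling "
                "cryptocurrency and cybersecurity.")
long_prompt = ("Illustrate a Snake Oil Salesman and an engaged audience. Show the salesman "
               "holding a bottle with an illustration of a snake. On a table there are bottles "
               "whose labels represent cryptocurrency and cybersecurity, such as the Bitcoin "
               "and Ethereum symbols and a virus symbol.")

for name, text in (("short", short_prompt), ("long", long_prompt)):
    words = estimate_tokens(text)
    cost = estimate_cost(text)
    share = words * PRICE_PER_TOKEN / cost  # fraction of the cost due to the prompt
    print(f"{name}: {words} words, ${cost:.5f}, prompt share {share:.1%}")
```

Whatever value is used for the placeholder, the per-image term dominates unless prompts run to thousands of words, which is the point being made above.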

Logos

It does seem that the logos of major companies are fairly well defined, and DALL-E 3 gets these right. This would make sense, as an incorrect company logo could cause copyright problems. For example, if we try for a Tesla-related image:

Illustrate a user entering their details into their app in order to get access to their Tesla car

We get the Tesla logo in all its glory, and even get the word “Secure” spelt correctly:

So, let’s try Google, and see if it can give us the Google logo. Here we want to illustrate the way that Google scanned books against the wishes of authors and publishers, and we can define the emotions of those involved with “illustrate when Google Steamrolled In Our Digital World by scanning books. Show book authors being angry about the scanning”. There is no need to add extra text to the image, as the emotions of the people become the representative part of the image, and so we get:

Sometimes, you can strike it lucky and it manages to get the text correct, such as with:

a reviewer of software code looks angry and announces, “YOUR CODE IS GARBAGE!”

Try and try again

The key to reducing the need for text is to be descriptive. But sometimes you have to find out what DALL-E 3 is able to render and what it isn’t. For example, the following does not give any pointers to the actual graphic elements to be produced:

Illustrate a centre of excellence in digital trust and distributed ledger technology

The generated result unfortunately has some text which is misspelt:

There’s obviously no such word as “Distrrubated”. We can now give DALL-E 3 more to work on by defining the scene we want to capture:

Illustrate a centre of excellence in digital trust and distributed ledger technology. Digital trust should be represented with a padlock on a message, and distributed ledger technology by a chain of code which is linked together. There should be some technical programmers working together and pointing to parts of the chain of code. This should be set in a laboratory type environment.

That doesn’t work, but at least “Centre of Excellence” has been rendered correctly:

So we try again:

Illustrate a centre of excellence in digital trust and distributed ledger technology. Digital trust should be represented with a padlock on a message, and distributed ledger technology by a chain of code which is linked together. There should be some technical programmers working together and pointing to parts of the chain of code. This should be set in a laboratory type environment. The banner for the centre should have an image of two encryption keys, and not show any text.

But that doesn’t work either:

We can see that it just doesn’t like the words in “Distributed Ledger Technology”, so let’s drop them:

Illustrate a centre of excellence in digital trust. Digital trust should be represented with a padlock on a message, and with a chain of code which is linked together. There should be some technical programmers working together and pointing to parts of the chain of code. This should be set in a laboratory-type environment. The banner for the centre should have an image of two encryption keys, and not show any text.

And then you pull your hair out, as it trips up on “Centre”:
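Because the words DALL-E 3 trips over seem to end up on the banner anyway, it can help to keep a note of them and check each new prompt before paying for another image. A minimal sketch; `TRIP_UP_PHRASES` and `preflight` are hypothetical names, and the list only records the phrases seen in this experiment.

```python
# Phrases that came out as misspelt text in the experiments above. This is a
# hypothetical, hand-maintained list, not something published by OpenAI.
TRIP_UP_PHRASES = ["distributed ledger technology", "centre of excellence"]


def preflight(prompt: str) -> list[str]:
    """Return warnings about a prompt before paying for another image."""
    lowered = prompt.lower()
    warnings = [f"may be rendered as misspelt text: {phrase!r}"
                for phrase in TRIP_UP_PHRASES if phrase in lowered]
    # The later prompts ask explicitly for no text on the banner.
    if "not show any text" not in lowered:
        warnings.append("no instruction to avoid text")
    return warnings


print(preflight("Illustrate a centre of excellence in digital trust "
                "and distributed ledger technology"))
```

A plain substring check is crude, but it is enough to catch the phrases that have already caused trouble, and the list can grow as new misspellings appear.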

So, the best way forward is to generate a graphic which does not contain any text, and then use another package to add the text in:

This should be set in a laboratory-type environment and a place which is skilled in digital trust. Digital trust should be represented with a padlock on a message, and with a chain of code which is linked together. There should be some technical programmers working together and pointing to parts of the chain of code. The banner for the centre should have an image of two encryption keys, and not show any text.
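For anyone scripting these attempts rather than using the ChatGPT interface, here is a sketch using the official OpenAI Python library (v1 or later). It assumes the `openai` package is installed and the `OPENAI_API_KEY` environment variable is set; `build_request` and `generate_image` are hypothetical helper names. The request is built separately so it can be checked without making a paid API call.

```python
def build_request(prompt: str, size: str = "1024x1024",
                  quality: str = "standard") -> dict:
    """Keyword arguments for a single DALL-E 3 image request."""
    return {
        "model": "dall-e-3",
        "prompt": prompt,
        "size": size,        # DALL-E 3 also accepts "1792x1024" and "1024x1792"
        "quality": quality,  # "standard" or "hd"
        "n": 1,              # DALL-E 3 generates one image per request
    }


def generate_image(prompt: str) -> tuple[str, str]:
    """Generate one image and return (image URL, prompt as revised by DALL-E 3)."""
    # Imported here so build_request can be used without the SDK installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.images.generate(**build_request(prompt))
    image = response.data[0]
    return image.url, image.revised_prompt
```

Printing the revised prompt is worth doing while iterating: DALL-E 3 rewrites prompts before drawing, which may help explain why text you did not ask for still turns up. The returned URL is only valid for a limited time, so download the image before editing it.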

And so we get something with some elements of misspelling, but these have a nice artistic blur:

We can then import this into a package which can produce the text in the right way:
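Any image editor will do for this final step. As a scriptable alternative, a minimal sketch using only the Python standard library wraps the generated image in an SVG file and draws the banner text on top, so the spelling is exactly what you type. The helper name `overlay_banner`, the file name and the styling are assumptions for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def overlay_banner(image_href: str, text: str, width: int = 1024,
                   height: int = 1024, banner_height: int = 120) -> str:
    """Return SVG markup showing the image with a text banner across the top."""
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'xmlns:xlink="http://www.w3.org/1999/xlink" '
        f'width="{width}" height="{height}">\n'
        # The generated picture fills the whole canvas.
        f'  <image xlink:href={quoteattr(image_href)} '
        f'width="{width}" height="{height}"/>\n'
        # A semi-transparent strip so the text stays readable.
        f'  <rect width="{width}" height="{banner_height}" fill="white" opacity="0.85"/>\n'
        # The banner text itself, escaped so characters like "&" stay valid XML.
        f'  <text x="{width // 2}" y="{banner_height // 2}" text-anchor="middle" '
        f'dominant-baseline="middle" font-family="sans-serif" font-size="48">'
        f'{escape(text)}</text>\n'
        f'</svg>\n'
    )


svg = overlay_banner("centre.png", "Centre of Excellence in Digital Trust")
print(svg)
```

Saving the returned string with an `.svg` extension next to the downloaded image and opening it in a browser shows the banner; most vector editors can then export it as a PNG.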

Conclusions

The original DALL-E system was terrible, and DALL-E 2 was a vast improvement. With DALL-E 3 we now get images that are representative of the story we describe, and which can be fairly engaging. But the misspelling of text is a problem, and the only current way to reduce it is to stop DALL-E 3 from using text as a main banner element, and then add your own text using another package. I feel, though, that DALL-E 4 will address this issue, and who knows where we will end up. Perhaps with a whole Internet of social media filled with DALL-E images?


Prof Bill Buchanan OBE FRSE
ASecuritySite: When Bob Met Alice

Professor of Cryptography. Serial innovator. Believer in fairness, justice & freedom. Based in Edinburgh. Old World Breaker. New World Creator. Building trust.