Can DALL·E take over Medium?

Stock Photos from Unsplash vs AI-Generated Images

Julia Turc
6 min read · Jul 1, 2022

I used Craiyon (a text-to-image AI that replicates DALL·E 2 at a smaller scale) to generate pictures for my past Medium articles. Here’s how they compare to Unsplash.

Since April 2022, when OpenAI announced their latest model, DALL·E 2 — capable of generating photorealistic images from a text prompt — my Twitter feed has been inundated with reviews from early access users and even some mild drama (does DALL·E 2 have a secret language or doesn't it?). But beyond the specifics of how it works under the hood and nitpicks about its quality, the larger question is how its emergence will change the way we do things.

As an occasional Medium blogger, it made me wonder: Can I finally escape the Unsplash borefest?

For those not familiar, Medium is a blogging platform that prioritizes simple and elegant design. Bloggers can focus on their content, while Medium takes care of how it gets displayed. In order to play well with the rendering algorithm, writers are encouraged to include at least one image, even if the content doesn’t necessarily call for one. In 2013, one year after its birth, Medium started integrating directly with Unsplash, a collection of high-quality stock photos.

The Rise (and Fall?) of Unsplash

The integration with Unsplash created a certain distinctive aesthetic for Medium, which worked well in the early 2010s. For instance, if you were to write a listicle about the benefits of coffee, this picture of a latte-art cappuccino would have made your article look sophisticated back in the day:

Stock photo by Anubhav Arora on Unsplash — Admittedly high quality, but reminiscent of the early 2010s.

However, things started to change towards the end of the decade. As this article elaborates, the very existence of Unsplash decreased the value proposition of stock photography: free = not premium. The omnipresence of these (admittedly high-quality) photos created a sense of surplus, which diminished their perceived value and clickability.

Another problem is that abstract content is awkward to illustrate with photography. If you're writing about hybrid cars, birds, or New York City, then photographs fit right in. However, Medium hosts many publications (think of them as communities) that focus on abstract topics — personal growth, economics, startups, etc. In such cases, writers resort to visual metaphors (e.g. a ladder for career growth, a robot for AI). But there are only so many pictures of ladders and robots; if you spend enough time on TowardsDataScience, you come to learn them all by heart.

Stock photo by Andy Kelly on Unsplash — Overused in tech articles on TowardsDataScience.

The Incoming Disruption of Generative AI

The promise of DALL·E 2 and its competitors (e.g. Imagen) is that creators will be able to produce unique images with very little effort or visual design ability. At the very least, each article in TowardsDataScience could now get its own robot. But more notably, such models seem capable of illustrating abstract concepts in non-trivial ways — your article about trends in machine learning could finally be robot-free, and perhaps show an interconnected network of computers and human brains on a Matrix-like background.

I was curious what my own Medium articles would look like in a world where Medium integrated directly with DALL·E 2 instead of Unsplash. Since DALL·E 2 is not yet open to the public, I used an open-source replica, Craiyon (initially called DALL·E Mini, a name OpenAI did not appreciate). Note that Craiyon (a two-person project) understates DALL·E 2's capabilities: it has almost 9x fewer parameters (400M vs 3.5B) and was trained on roughly 43x less data (15M vs 650M image-text pairs). Reviews from early access users do indeed suggest that the full model's quality is orders of magnitude better than the open-source version's.
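If you want to script this experiment rather than type prompts by hand, the workflow is simple: feed each article title to a text-to-image model and save the result. Below is a minimal sketch in Python. It assumes the dalle-mini checkpoint is reachable through Hugging Face's hosted Inference API and returns raw image bytes; the endpoint, token, and response handling are my assumptions for illustration, not the exact pipeline behind the gallery below.

```python
import requests

# Assumption: dalle-mini is served via Hugging Face's hosted Inference API
# and responds with raw image bytes. Swap in whatever text-to-image endpoint
# you actually have access to.
API_URL = "https://api-inference.huggingface.co/models/dalle-mini/dalle-mini"
HEADERS = {"Authorization": "Bearer <your-hf-token>"}  # placeholder token

ARTICLE_TITLES = [
    "Why GPT Won't Tell You the Truth",
    "Why Text Summarization Is Still Hard",
    "The Switch Transformer",
]

def generate_cover_image(title: str) -> bytes:
    """Use the article title itself as the prompt and return image bytes."""
    response = requests.post(API_URL, headers=HEADERS, json={"inputs": title})
    response.raise_for_status()
    return response.content

for title in ARTICLE_TITLES:
    filename = title.lower().replace(" ", "_") + ".png"
    with open(filename, "wb") as f:
        f.write(generate_cover_image(title))
```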

The gallery below contains two columns of images: on the left-hand side, the pictures I took from Unsplash and placed in my Medium articles; on the right-hand side, images generated by Craiyon from the title of each article.

Why GPT Won’t Tell You the Truth. Left: Stock photo (Unsplash). Right: AI-generated (Craiyon).
Why Text Summarization Is Still Hard. Left: Stock photo (Unsplash). Right: AI-generated (Craiyon).
Unconstrained Chatbots Condone Self-Harm. Left: Stock photo (Unsplash). Right: AI-generated (Craiyon).
The Switch Transformer. Left: Stock photo (Unsplash). Right: AI-generated (Craiyon).
TernaryBERT: Quantization Meets Distillation. Left: Stock photo (Unsplash). Right: AI-generated (Craiyon).
From von Neumann to Memory-Augmented Neural Networks. Left: Stock photo (Unsplash). Right: AI-generated (Craiyon).
AMBERT: A Multi-grained BERT. Left: Stock photo (Unsplash). Right: AI-generated (Craiyon).
Unsolved Problems in Natural Language Datasets. Left: Stock photo (Unsplash). Right: AI-generated (Craiyon).
Trends in Model Pre-training for Natural Language Understanding. Left: Stock photo (Unsplash). Right: AI-generated (Craiyon).

I personally find the AI-generated images promising, especially considering how grossly Craiyon understates the quality of the full DALL·E 2 model. The generated images are relevant enough to each article's title not to seem arbitrary, yet vague enough to spark the reader's curiosity and invite interpretation. In my book, that is the definition of engagement. I'm curious what you think.

Remaining Hurdles to Overcome

Realistically, Medium is unlikely to implement such a feature anytime soon, even if its design team unanimously agreed on the direction. There are hurdles to overcome on many fronts:

  • Ethical: While OpenAI has already drawn the line on what it considers unethical to generate (violence, nudity, the faces of real people), there is a huge gray area where making a binary judgment is difficult even in theory. And because AI can produce images at a volume no human review process can keep up with, we cannot simply reuse the moderation practices we apply to human-generated imagery.
  • Legal: It's still unclear who owns the copyright of an AI-generated image. The entity (company or person) that built the model? The user who wrote the text prompt? How different do two generated images need to be to count as distinct works from a copyright perspective? Or do we need an entirely new concept that replaces copyright?
  • Technical: Running inference (i.e., generating an image from a prompt) is not yet instantaneous; it can take seconds or even minutes. Would creators have the patience for it? Compute cost is also non-trivial, certainly more than an API call to Unsplash; a back-of-the-envelope estimate follows this list.
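To put a rough number on the cost point above, here is a back-of-the-envelope sketch; the GPU rate and per-image latency are assumed, illustrative figures, not measurements of Craiyon or DALL·E 2.

```python
# Back-of-the-envelope cost per generated image.
# Both inputs are assumptions, not measurements.
gpu_hourly_rate_usd = 1.50  # assumed cloud GPU price per hour
seconds_per_image = 30.0    # assumed generation latency

cost_per_image = gpu_hourly_rate_usd / 3600 * seconds_per_image
print(f"~${cost_per_image:.4f} per image")  # roughly a cent per image
```

A cent or two per image sounds negligible, but at Medium's scale, and with users regenerating until they like the result, it adds up quickly compared to a free Unsplash lookup.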

Despite these hurdles, the incoming disruption of image-generative AI still feels inevitable. It’s a matter of how and when it will happen, rather than if.
