Multimodal language models from a designer’s perspective

Felix Hoegner
ELCA IT
Mar 8, 2024

To me, it is almost hard to believe that ChatGPT has been around for a little more than a year as of writing this article. The first widespread image generators emerged a few months earlier. The changes that OpenAI, Midjourney, Stability.ai and many more inspired, the creativity they sparked, and the questions they raised simultaneously in mere months are so profound that they could easily fill a decade. The pace of innovation and development is unprecedented, and we are still at the beginning! Granted, the initial hype is cooling down a bit. Simply delegating human work to AI can lead to painful learning experiences. But used cleverly as a tool, it holds much potential that is yet to be discovered.

Free floating artificial brain surrounded by documents
Generated with Midjourney

One of the more recent novelties in generative AI is the advancement of Multimodal Large Language Models (Multimodal LLMs). These models can “see”, which opens up a plethora of opportunities. It is an important development because it will potentially pave the way to break free from text-based interactions like chats and prompts, and it will lead to concrete use cases in more visual disciplines like design that go beyond generating images. The underlying interaction is still prompt- or natural-language-based, but it can be automated far more and integrated into existing workflows: the AI tool could generate prompts in the background based on images and specified actions. None of this is particularly new, but the magic lies in the combination. Below, I will share different examples of how to work with AI tools in general and give an outlook on what the future might look like.

Examples of Multimodal language models in the design process

Extracting a color palette from images

You can test multimodal language models with ChatGPT if you have a subscription. Here is an example I tried with the iPhone app:

ChatGPT created a UI color palette based on a photo

I uploaded a beautiful photo of an autumn morning, which I took on my way to work, and asked ChatGPT to create a UI colour palette from the scene. Not only was it able to identify “autumnal hues” and translate them into a pretty accurate palette, it also understood the “UI” context and advised me to use colours with accessibility in mind.

Takeaway: Multimodal LLMs provide us with new ways to include them in visual workflows. This is a simple example but it shows how well the LLM “understands” the context of the image.
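
If you want to automate this instead of going through the chat UI, the same idea can be scripted against the OpenAI API. The following is only a rough sketch of the same experiment: the file name, prompt wording and model name are my own assumptions, not what ChatGPT does internally.

```python
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Hypothetical local photo standing in for the autumn-morning shot.
with open("autumn_morning.jpg", "rb") as f:
    photo_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable chat model; the name may need updating
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": (
                        "Create a UI colour palette from this scene. "
                        "Return 5-6 hex values with suggested roles "
                        "(background, primary, accent) and keep "
                        "accessibility and contrast in mind."
                    ),
                },
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{photo_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```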

Reviewing design mockups

Here is another, more sophisticated example I tried with the ChatGPT web app:

ChatGPT provided clear and actionable feedback for an uploaded app design

I uploaded a visual design with apparent flaws and asked ChatGPT to suggest improvements. Note how I assigned it the role of a visual designer and the job of reviewing the design. With minimal context, it could provide correct and actionable suggestions. It identified the UI elements correctly, for example the “product items”. Even the circular selector was recognized despite its awkward, unusual position and lack of contrast.

If you have a ChatGPT Plus subscription I invite you to try Brutal Feedback, where a GPT impersonates a grumpy senior designer who delivers honest feedback without holding back. Upload a screenshot or image of what you want to review and be prepared.

GPTs provide an easy way to create tailored agents or assistants

Takeaway: These examples demonstrate how AI tools can help us with visual “micro”-tasks, inspiration and feedback. I call them “micro” because they are isolated parts in a hypothetical design process. With the tooling we have at hand today, the AI doesn’t know the general context of our project or the thing we are designing for. Yet, the potential is already visible.
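
For completeness, here is a rough sketch of how such a reviewer could be scripted via the API: the role assignment simply becomes the system message. The system prompt, file name and model are placeholders of my own, not the actual Brutal Feedback GPT.

```python
import base64
from openai import OpenAI

client = OpenAI()

# Hypothetical screenshot of the mockup to review.
with open("mockup.png", "rb") as f:
    mockup_b64 = base64.b64encode(f.read()).decode("utf-8")

review = client.chat.completions.create(
    model="gpt-4o",  # vision-capable chat model
    messages=[
        # The role assignment from the chat experiment becomes the system message.
        {
            "role": "system",
            "content": (
                "You are a senior visual designer. Review UI mockups and give "
                "honest, concrete, prioritised suggestions for improvement."
            ),
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Review this app screen and suggest improvements."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{mockup_b64}"}},
            ],
        },
    ],
)

print(review.choices[0].message.content)
```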

UX storyboards

AI tools enable us to do things that were too time-consuming or downright impossible before, like UX storyboards. UX storyboards are a great medium to make a story or journey more relatable and understandable. They can elevate a customer journey from an abstract to a more symbolic level. Storyboards capture the whole context of personas and demonstrate how people might interact with a product or service. That is useful for pitch presentations and stakeholder demos. The problem is that good storyboards are challenging to do well and take up a lot of time. Nobody would hire a professional artist just for such a disposable artefact. A talented designer may be on the team, but their time is valuable. Most likely, they won’t be assigned to storyboarding. But with AI tools and a little hackery, it is possible to create visually compelling storyboards with consistent characters in little time.

An example of a storyboard for a fictitious business of a presentation trainer. The images have been created with Midjourney v4

Takeaway: It’s possible to force Midjourney into creating consistent characters, but in general we again face the missing overall context. However, this is a solvable problem, and making compelling storyboards is more feasible than ever before. In this case I don’t mind that you can easily spot the AI-generated images, because it doesn’t really matter for a throwaway artefact. The AI-generated storyboard serves the purpose of making a story or process understandable and relatable quite perfectly.

Turn crude drawings into anything

It will soon be possible to transform drawings into wireframes, designs and code prototypes. I encourage you to try https://makereal.tldraw.com (you will need an OpenAI API key).

makereal turns a quick sketch and a description into a working code demo with tailwind

Another great tool is https://www.krea.ai, where you can currently sign up for free.

Krea creates illustrations based on a sketch and a prompt in real-time.

Takeaway: Drawing on a whiteboard and turning simple drawings into beautiful illustrations or turning crude wireframes with a short description into working code demos is really cool. But what does this mean? Will designers become some sort of curator in the future? Or act as a business analyst describing what has to be implemented by the AI and testing and integrating the results it spits out? It may not be too far-fetched, given that many design tools are already moving towards AI-assisted design. A part of the role and work of designers will be challenged.

Content generation and content review

Another area that is likely to be affected by AI is content generation and content review. Here is a simple example that I put together a while ago:

A UX writing assistant based on guidelines.

This assistant can create and review text based on guidelines. It proposes improvements and explains each suggested change. You can find it on my GitHub if you want to try it and understand how it works. Of course, this example is not multimodal, but it does demonstrate the future of assistant-supported design. It works incredibly well and is relatively simple: a piece of information (the “system” role) is passed to the language model along with the prompt and hidden from the user. Even ChatGPT has a global system prompt, which might actually hold it back, but that is unverified territory. By the way, the models retrieved via the API don’t come with such a predefined system prompt, which is another good reason not to rely on ChatGPT alone! In effect, the system prompt assigns a role to the LLM, providing it with context and steering it into a specific lane. With the recently announced GPTs, this became even easier: you don’t have to touch a line of Python code and can even upload your own data. If you build it yourself, you have just a little more control.
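
In code, the whole mechanism boils down to a hidden system message that travels with every request. Here is a minimal sketch of that pattern (not the actual code from my repository; the guideline text, function name and model are placeholders):

```python
from openai import OpenAI

client = OpenAI()

# Illustrative guidelines; in a real assistant these would come from your own guideline document.
GUIDELINES = """You are a UX writing assistant.
Rewrite the given microcopy according to these guidelines:
- use sentence case
- be concise and action-oriented
- avoid jargon and blaming the user ("invalid input", "you failed to ...")
Explain every change you make in one short sentence."""

def review_copy(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",  # any chat model reachable via the API
        messages=[
            {"role": "system", "content": GUIDELINES},  # hidden from the user
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

print(review_copy("ERROR: The operation could not be completed at this time."))
```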

Takeaway: It’s important to understand how this example is designed: it doesn’t simply do your job, it helps you do it better. Your content guidelines run in the background, and the LLM proposes improvements. It’s up to you what you do with them. Don’t let the AI do your job.

Conversational content creation

I want to share two further examples which I built with GPTs that address the “listening” part of multimodal models. Story Builder lets kids (or their parents) build their own story: it asks a few questions about the main characters and places and generates a short novel. I instructed the GPT about the length of the story (not more than 20 minutes), about the number of questions to ask (too many would be annoying) and even to create a coloring page with the main characters at the end of the story. Disclaimer: this is purely experimental. I would never let my children use this unsupervised, and I strongly discourage anyone from replacing reading real books with AI-generated content. I was curious how my kids would react, and they don’t really seem to mind.

Story builder creates short novels for kids and makes a coloring page at the end.

The police sketch artist was another idea of mine: this GPT asks some basic questions about the physical features of a face and then creates sketches that can be refined iteratively. It is not perfect, and I would not recommend it to the police just yet, but it shows some interesting potential. What DALL·E still seems to fail at is not the quality, which is certainly good enough. It simply can’t take a previously created image as a base for a new image and modify only parts of it. There is no reliable consistency, but I’m certain there will be solutions in the future.

The police sketch artist generates police sketches iteratively, similar to a situation in a police station.

The iterative improvement based on instructions is a strength of LLMs and this also gives a certain amount of control to the user.

Takeaway: There are clearly unsolved ethical questions concerning AI. Things that are possible today may not be possible in the future, because we will have learned and gained a better understanding of the risks that AI tools may pose. I will not let AI bots read stories to my kids, and the police shouldn’t introduce helpers like this without thoroughly checking and testing them in a controlled environment. However, achieving user control over an inherently uncontrollable output by including a constant feedback loop in the design of an assistant is promising, and I will explore it further.
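
For those curious how such a feedback loop looks in code: it is essentially a growing conversation history. The following is an illustrative sketch of the pattern, not the actual GPT, which runs entirely inside ChatGPT; the sketch-artist role and prompts are my own placeholders.

```python
from openai import OpenAI

client = OpenAI()

# The running conversation is the feedback loop: every new instruction
# refines the previous answer because the full history is resent.
messages = [
    {
        "role": "system",
        "content": (
            "You are a sketch artist. Keep a running textual description of "
            "the face and update it with every new instruction."
        ),
    }
]

def refine(instruction: str) -> str:
    messages.append({"role": "user", "content": instruction})
    reply = client.chat.completions.create(model="gpt-4", messages=messages)
    answer = reply.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})  # keep the history
    return answer

print(refine("Round face, short dark hair, glasses."))
print(refine("Make the glasses thinner and add a small scar on the chin."))
```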

A glimpse into the future

The next part is highly speculative and primarily based on my ideas and observations. However, I hope it will spark new ideas in other people’s minds. We should start rethinking human-computer interaction from the users’ perspective and the creators’ — our own! — perspective. What are our tools going to look like?

The key to using AI for design lies in something other than replacing human labour or creativity. AI will instead act as an augmenter of the capabilities we already have, individually or as a team. Even though it was initially impressive to see how Midjourney or DALL·E produced almost realistic replications of oil paintings, those images became stale, and we have seen it all. I can spot AI-generated illustrations instantly when I see them (at least for now). If anything, I appreciate outstanding human artwork even more. If we want to use AI in design, we must do it wisely, ethically and thoughtfully. Otherwise, we risk producing junk, which will ultimately lead to significant damage to our profession. We already have to justify investments in user research and UX. It won’t help if our sponsors see us achieving “similar results” with a few prompts, which they could just as well do themselves.

The true power of AI lies in new possibilities that we didn’t have before and in the simplification of complex tasks that have only now become feasible. Here are a few paradigms that future tools, methods and approaches may follow:

Data-driven

Personas are a common artifact in an early design phase, so I’ll pick this example to explain the concept. AI can create personas, and it is not even bad at it, but we should not delegate this task. Personas must always be based on data that is gathered or researched methodically. But once personas exist and the AI has access to them, it can refer back to them at any point in the design process. Today, without AI, this is often done inconsistently, or personas are forgotten over time altogether. Now think of the persona content as a form of system prompt. AI could constantly review design concepts in the background and hint at how well your concept matches the different personas, or why a specific persona could have a problem with your decisions. It helps you think on various levels in parallel, but it doesn’t do the job for you.
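
As a thought experiment, here is what the persona-as-system-prompt idea could look like in code. The personas, prompt and model are all made up for illustration:

```python
from openai import OpenAI

client = OpenAI()

# Persona content is invented here; in practice it comes from real research.
PERSONAS = """Persona "Maria", 58, branch accountant:
- works mostly with the keyboard and dislikes frequent UI changes
- needs large, high-contrast text

Persona "Jonas", 27, field technician:
- uses the app on a phone, often outdoors and one-handed
- wants the three most common actions reachable with one tap"""

def review_against_personas(design_decision: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": (
                    "Review design decisions against the following personas and "
                    "point out likely friction for each of them:\n" + PERSONAS
                ),
            },
            {"role": "user", "content": design_decision},
        ],
    )
    return response.choices[0].message.content

print(review_against_personas(
    "We move the export button into an overflow menu to declutter the toolbar."))
```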

Non-linear

Especially when working on complex professional applications, we start by doing research and collecting as much research data as possible. This could be papers on related topics, interviews, spreadsheets, or basically anything, collected in a knowledge base. Today we use tools like Miro or Confluence to collect data in a common place. The AI can quickly deliver insights and context for your decision-making and conceptual work throughout the design process. In a way, the research phase no longer has a clear end, because new insights can roll in continuously. The double diamond, as a model for a linear approach, may have to be adapted to accommodate these new possibilities.
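
One plausible way to wire such a knowledge base to an LLM is simple embedding-based retrieval: research snippets are embedded once, and the most relevant ones are pulled into the prompt whenever a question comes up. This is a rough sketch under those assumptions, not a production setup:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

# Illustrative research snippets; a real setup would use a proper vector store.
notes = [
    "Interview 3: users export reports weekly, mostly as PDF.",
    "Survey: 70% of respondents work on laptops with small screens.",
    "Stakeholder workshop: compliance requires an audit trail for edits.",
]

def embed(texts):
    result = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in result.data])

note_vectors = embed(notes)  # compute once and store alongside the notes

def most_relevant(question: str, top_k: int = 2):
    q = embed([question])[0]
    # cosine similarity between the question and every note
    scores = note_vectors @ q / (
        np.linalg.norm(note_vectors, axis=1) * np.linalg.norm(q)
    )
    return [notes[i] for i in np.argsort(scores)[::-1][:top_k]]

# The retrieved snippets would then be handed to the LLM as context.
print(most_relevant("What do we know about how people export data?"))
```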

Iterative with instructions

Right now, we can use text prompts to jump directly to screen designs, which are static images with mostly nonsensical text. This is suitable for mood boards at best. But multimodal LLMs can take images as input as well. Soon it will be possible to start from wireframes or sketches and let the LLM (which also knows the business context from your research data) iterate on design proposals. With the help of LLMs, it is possible to explore many more directions in much less time. That could mean that we work more like an art director in the future. With an instruction-based workflow, we can quickly iterate on different design solutions.
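
Here is a speculative sketch of how that workflow could already be scripted today: a wireframe photo plus the business context goes in, a design proposal comes out, and further instructions refine it. File names, context and model are assumptions of mine.

```python
import base64
from openai import OpenAI

client = OpenAI()

# Hypothetical whiteboard photo and project context.
with open("wireframe.jpg", "rb") as f:
    wireframe_b64 = base64.b64encode(f.read()).decode("utf-8")

CONTEXT = "B2B dashboard for field technicians; mobile-first; brand colour #1B4F72."

messages = [
    {
        "role": "system",
        "content": "You turn wireframes into concrete layout and component "
                   "proposals. Business context: " + CONTEXT,
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Propose a layout and component structure for this wireframe."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{wireframe_b64}"}},
        ],
    },
]

first = client.chat.completions.create(model="gpt-4o", messages=messages)
messages.append({"role": "assistant", "content": first.choices[0].message.content})

# The art-director part: iterate with plain instructions.
messages.append({"role": "user", "content": "Make the primary action more prominent "
                                            "and move the filters into a bottom sheet."})
second = client.chat.completions.create(model="gpt-4o", messages=messages)
print(second.choices[0].message.content)
```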

Automated and interpolated

A lot of design work today involves laborious tasks like building design systems, developing typographical systems, building components, and so on. Once they exist, they speed up our workflow and provide clarity and consistency. But to reach that point we do the same work over and over again. A lot of that will be done by AI tools in the future. And it is also in the small things: creating credible and specific mock data, mock imagery, and so on.
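
Even the small things can be scripted, for example asking the model for credible, domain-specific mock data in a structured format. A minimal sketch, with field names and prompt invented purely for illustration:

```python
import json
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # ask for machine-readable JSON
    messages=[
        {
            "role": "system",
            "content": "You generate realistic mock data for UI prototypes. "
                       "Answer with a single JSON object.",
        },
        {
            "role": "user",
            "content": "Give me 5 fictitious maintenance tickets for an elevator "
                       "service app: fields id, site, reported_at, severity, summary.",
        },
    ],
)

tickets = json.loads(response.choices[0].message.content)
print(json.dumps(tickets, indent=2))
```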

Conclusion

The future design tool, if it even exists as a single tool, would likely consist of multiple agents, each individually created when needed and optimised for specific tasks. These agents would be connected via a central knowledge base and communicate with each other. Ideally, you as the designer won’t see much of them. They would work in the background and support you across the different design phases. You maintain the knowledge base, take care of the interviews and are the creative lead. You and your team are ultimately responsible for the deliverables.

For me, as a designer, these are exciting times. Nothing is defined yet. We can figure it out together. So many untapped ideas and potentials are left to discover. It is comparable to the early days of the web until the mid-2000s, before everything was streamlined, standardised and made to fit a framework, and before conversion-driven design decisions took over the field.

The fact is that AI won’t just quietly go away. We will have to deal with it in one way or another. Our chance and responsibility as early adopters is to build the world we want to live in. With the right approach, new opportunities and careers will be created, leading to better projects and a better design experience.
