Creating a color scale with AI, or: How to co-hallucinate with AI to prototype an abstract idea

Dan Hiester
6 min read · Jan 17, 2024

Over the holiday break, my portfolio site got to the point where I needed to ensure color harmony by bringing in a color scale. When I started looking around at color scales, though, I realized I didn’t want to do my color scale with colors that blend like a linear gradient.

Early in my career, I noticed that boosting the saturation of the very dark and very light colors makes the hue easier to perceive, and connects those colors to the brand color more effectively.

Keeping this in mind, I had an idea for a “dream” color scale design tool. It might look something like this:

An image of an HSL color selector where a blue hue has been chosen, with a bezier curve layered over the top that has nine points.
A rough mockup of what my dream color scale tool might look like.

I wanted to draw a bezier curve that would allow me to quickly choose a color scale. The white dots represent colors that I chose manually (the darkest, middle, and lightest colors), and the gray dots represent the colors in between, chosen for me by the tool.

There’s just one problem: I couldn’t find a color scale tool that works like this. (If you know one, I’d love to hear about it!)

So, instead of trying to choose all of these colors by hand, I had an idea: What if I prototype the concept for my color scale using an LLM?

At worst, I’ll have a funny story to tell. At best, I’ll get the AI to co-hallucinate a color scale with me!

First draft of my ChatGPT prompt

Asking an AI to create a color scale — here’s how it started:

Help me create a color scale based around the color #38A0CD, using HSL color. I want there to be nine steps to the scale. I want the nine steps to use HSL color, and form a parabolic curve, where the lighter and darker colors have more saturation. Let’s call the color brand blue. Output this color scale as a series of CSS custom properties.

In the first draft of the scale it gave me, I wasn’t satisfied with the colors ChatGPT chose as the start and end points of the scale, so I asked it a follow-up question:

Let’s re-do the parabolic curve a little. Can we make the dark point of the curve hsl(198, 97, 7), the middle point of the curve hsl(198, 60, 51), and the light point of the curve hsl(198, 95, 95)?

From this, I got:

--brand-blue-1: hsl(198, 97%, 7%); /* Darkest */
--brand-blue-2: hsl(198, 88%, 25%);
--brand-blue-3: hsl(198, 78%, 34%);
--brand-blue-4: hsl(198, 69%, 43%);
--brand-blue-5: hsl(198, 60%, 51%); /* Middle */
--brand-blue-6: hsl(198, 73%, 68%);
--brand-blue-7: hsl(198, 80%, 78%);
--brand-blue-8: hsl(198, 87%, 87%);
--brand-blue-9: hsl(198, 95%, 95%); /* Lightest */

Two sets of nine color samples. One was sampled from a gradient, and one uses colors from ChatGPT. The ChatGPT colors are slightly easier to recognize as being different from each other.

For the purposes of the idea I wanted to test, this did what I wanted. The difference between each shade is easier to notice, and this can help each shade stand out better when placed close to another similarly colored object.
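For comparison, the same kind of curve can be computed directly instead of prompted for. The following is a minimal sketch, not the method ChatGPT used: it assumes a quadratic bezier curve (which is a parabolic arc) through the three colors from the follow-up prompt, and it samples nine evenly spaced points along it. The names `bezier_scale` and `to_css` are made up for illustration.

```python
# Sketch: sample a quadratic bezier (a parabolic arc) through three HSL colors.
# Function and variable names are illustrative, not from any existing tool.

def bezier_scale(dark, middle, light, steps=9):
    """Return `steps` (h, s, l) tuples along a quadratic bezier curve that
    starts at `dark`, passes through `middle` at t = 0.5, and ends at `light`."""
    # Choose the control point so the curve hits `middle` exactly at t = 0.5:
    # B(0.5) = 0.25*P0 + 0.5*P1 + 0.25*P2  =>  P1 = 2*middle - (P0 + P2) / 2
    control = tuple(2 * m - (d + l) / 2 for d, m, l in zip(dark, middle, light))
    scale = []
    for i in range(steps):
        t = i / (steps - 1)
        point = tuple(
            (1 - t) ** 2 * d + 2 * (1 - t) * t * c + t ** 2 * l
            for d, c, l in zip(dark, control, light)
        )
        scale.append(point)
    return scale


def to_css(scale, name="brand-blue"):
    """Format the scale as CSS custom properties, rounding to whole numbers."""
    return "\n".join(
        f"--{name}-{i}: hsl({round(h)}, {round(s)}%, {round(l)}%);"
        for i, (h, s, l) in enumerate(scale, start=1)
    )


# The dark, middle, and light points from the follow-up prompt.
brand_blue = bezier_scale(
    dark=(198, 97, 7), middle=(198, 60, 51), light=(198, 95, 95)
)
print(to_css(brand_blue))
```

Because the three lightness values (7, 51, 95) are evenly spaced, lightness comes out as a straight line here, and only saturation bends: it dips to 60% at the middle step and climbs back toward both ends. The sketch blends hue as a plain number, which is fine when the hue is constant but would need extra care for scales that cross the 0°/360° boundary.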

This could have been the end of the story. But as I found out through repeated experimentation, this was an extraordinary case of beginner’s luck.

If at first you think you succeed, try, try again 😂

When I evaluate a new technology or technique for work, it often isn’t enough to see that it works once. I need to understand how it works. Or at least, I need to see it work more than once.

Also, I’m working with OKLCH colors on my project, so my next step was to ask the LLM to generate the scale with OKLCH a few more times, with a slightly different prompt each time, to see what I get.

This is where it started to get weird.

A set of 11 color swatches, ranging from dark to light. Near the lighter end of the scale, the colors become very saturated and overly bright, such that they don’t blend into the last step in the color scale.
What’s wrong with this color scale? 🤣

When I asked the LLMs to set the scale in OKLCH colors, I kept getting colors at the lighter end of the scale that didn’t even come close to blending into the final step of the scale. I tried dozens of times to reframe my prompt in such a way that I could get a different result. I tried ChatGPT 3.5 via OpenAI, ChatGPT 4 via Microsoft Copilot, and even Google Bard. But every color set came back looking like this.

Along the way, there were a handful of other responses that were not what I wanted.

For example, this code sample used variables that it didn’t declare, so none of these colors worked:

--brand-blue-darkest: oklch(20.2% 0.048 223);
--brand-blue-lightest: oklch(96.4% 0.027 221.6);
--brand-blue-middle: oklch(66.1% 0.14 224.6);

--brand-blue-step-1: oklch(
  calc(20.2% + (var(--brand-blue-middle-luminance) - var(--brand-blue-darkest-luminance)) / 5)
  calc(0.048 + (var(--brand-blue-middle-saturation) - var(--brand-blue-darkest-saturation)) / 5)
  calc(223 + (var(--brand-blue-middle-hue) - var(--brand-blue-darkest-hue)) / 5)
);
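A quick script can catch this kind of mistake before the output goes into a stylesheet. This is a rough sketch that assumes simple regular expressions are good enough for short snippets like the one above; it is not a real CSS parser, and the function name `undeclared_custom_properties` is made up for illustration.

```python
import re

def undeclared_custom_properties(css):
    """Return the custom properties that are read with var() but never declared."""
    # A declaration looks like `--some-name: value`.
    declared = set(re.findall(r"(--[\w-]+)\s*:", css))
    # A reference looks like `var(--some-name)` or `var(--some-name, fallback)`.
    referenced = set(re.findall(r"var\(\s*(--[\w-]+)", css))
    return referenced - declared


llm_output = """
--brand-blue-darkest: oklch(20.2% 0.048 223);
--brand-blue-lightest: oklch(96.4% 0.027 221.6);
--brand-blue-middle: oklch(66.1% 0.14 224.6);

--brand-blue-step-1: oklch(
  calc(20.2% + (var(--brand-blue-middle-luminance) - var(--brand-blue-darkest-luminance)) / 5)
  calc(0.048 + (var(--brand-blue-middle-saturation) - var(--brand-blue-darkest-saturation)) / 5)
  calc(223 + (var(--brand-blue-middle-hue) - var(--brand-blue-darkest-hue)) / 5)
);
"""

missing = undeclared_custom_properties(llm_output)
for name in sorted(missing):
    print("never declared:", name)
```

Run against the sample above, it reports all six of the `-luminance`, `-saturation`, and `-hue` variables, none of which were ever defined.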

This sample used the same color for the last three steps. Also, on step 8, it set the lightness value (the L in OKLCH) above 100%, which is outside the 0% to 100% range that OKLCH lightness allows.

--brand-blue-1: oklch(20.2% 0.048 223); /* Darkest */
--brand-blue-2: oklch(31.6% 0.077 224.2);
--brand-blue-3: oklch(42.9% 0.109 224.4);
--brand-blue-4: oklch(54.3% 0.134 224.5);
--brand-blue-5: oklch(66.1% 0.14 224.6); /* Middle */
--brand-blue-6: oklch(77.9% 0.134 224.5);
--brand-blue-7: oklch(89.3% 0.109 224.4);
--brand-blue-8: oklch(100.6% 0.077 224.2);
--brand-blue-9: oklch(96.4% 0.027 221.6); /* Lightest */
--brand-blue-10: oklch(96.4% 0.027 221.6);
--brand-blue-11: oklch(96.4% 0.027 221.6);
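
Both problems are easy to check for mechanically. The sketch below assumes every declaration has the exact shape `--name: oklch(L% C H);` as in the sample above, and it only looks for out-of-range lightness and repeated colors. The function name `check_oklch_scale` is made up for illustration.

```python
import re

# Matches declarations shaped like `--name: oklch(66.1% 0.14 224.6)`.
OKLCH_DECL = re.compile(
    r"(--[\w-]+)\s*:\s*oklch\(\s*([\d.]+)%\s+([\d.]+)\s+([\d.]+)\s*\)"
)

def check_oklch_scale(css):
    """Return a list of human-readable problems found in an OKLCH color scale."""
    problems = []
    seen = {}  # maps a (lightness, chroma, hue) tuple to the first step that used it
    for name, l, c, h in OKLCH_DECL.findall(css):
        color = (float(l), float(c), float(h))
        if not 0 <= color[0] <= 100:
            problems.append(f"{name}: lightness {l}% is outside 0% to 100%")
        if color in seen:
            problems.append(f"{name}: same color as {seen[color]}")
        else:
            seen[color] = name
    return problems


llm_scale = """
--brand-blue-1: oklch(20.2% 0.048 223);
--brand-blue-2: oklch(31.6% 0.077 224.2);
--brand-blue-3: oklch(42.9% 0.109 224.4);
--brand-blue-4: oklch(54.3% 0.134 224.5);
--brand-blue-5: oklch(66.1% 0.14 224.6);
--brand-blue-6: oklch(77.9% 0.134 224.5);
--brand-blue-7: oklch(89.3% 0.109 224.4);
--brand-blue-8: oklch(100.6% 0.077 224.2);
--brand-blue-9: oklch(96.4% 0.027 221.6);
--brand-blue-10: oklch(96.4% 0.027 221.6);
--brand-blue-11: oklch(96.4% 0.027 221.6);
"""

problems = check_oklch_scale(llm_scale)
for problem in problems:
    print(problem)
```

For this scale it flags step 8 for its lightness and steps 10 and 11 as repeats of step 9.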

Also, I found that converting colors from HSL to OKLCH is not straightforward. In Figma, I use a plugin called OkColor, which itself uses a JavaScript library called Culori. The LLMs apparently hadn’t learned this library’s secrets as of the 2023 holiday season. In general, I found that LLMs don’t generate good color scales in OKLCH.
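
To give a sense of why the conversion is not a one-liner, here is a minimal sketch of the steps involved: HSL to sRGB, sRGB to linear light, linear light to Oklab, and Oklab to its polar form, OKLCH. It is not the code that OkColor or Culori runs. The matrix coefficients are the ones from Björn Ottosson's published Oklab definition, reproduced here on the assumption that they are transcribed correctly, so treat a dedicated color library as the source of truth. The function name `hsl_to_oklch` is made up for illustration.

```python
import colorsys
import math

def hsl_to_oklch(h_deg, s_pct, l_pct):
    """Convert an HSL color to (lightness %, chroma, hue in degrees) in OKLCH."""
    # 1. HSL -> gamma-encoded sRGB. colorsys takes H, L, S (in that order), all 0..1.
    r, g, b = colorsys.hls_to_rgb(h_deg / 360, l_pct / 100, s_pct / 100)

    # 2. Gamma-encoded sRGB -> linear-light sRGB.
    def to_linear(u):
        return u / 12.92 if u <= 0.04045 else ((u + 0.055) / 1.055) ** 2.4

    r, g, b = to_linear(r), to_linear(g), to_linear(b)

    # 3. Linear sRGB -> cone-like response, then a cube root (Oklab's nonlinearity).
    l = 0.4122214708 * r + 0.5363325363 * g + 0.0514459929 * b
    m = 0.2119034982 * r + 0.6806995451 * g + 0.1073969566 * b
    s = 0.0883024619 * r + 0.2817188376 * g + 0.6299787005 * b
    l_, m_, s_ = l ** (1 / 3), m ** (1 / 3), s ** (1 / 3)

    # 4. Cube-rooted response -> Oklab (L, a, b).
    ok_l = 0.2104542553 * l_ + 0.7936177850 * m_ - 0.0040720468 * s_
    ok_a = 1.9779984951 * l_ - 2.4285922050 * m_ + 0.4505937099 * s_
    ok_b = 0.0259040371 * l_ + 0.7827717662 * m_ - 0.8086757660 * s_

    # 5. Oklab -> OKLCH: chroma is the distance from gray, hue is the angle.
    chroma = math.hypot(ok_a, ok_b)
    hue = math.degrees(math.atan2(ok_b, ok_a)) % 360
    return ok_l * 100, chroma, hue


lightness, chroma, hue = hsl_to_oklch(198, 60, 51)
print(f"oklch({lightness:.1f}% {chroma:.3f} {hue:.1f})")
```

None of these steps is a simple rescaling of the HSL numbers, which may be part of why an LLM guessing at OKLCH values from HSL ones lands so far off.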

Finally, I realized a parabolic curve was too rigid and literal a concept to ask an LLM for. Going forward, I ask for a bezier curve instead.

The results, after so much tinkering

Out of all the color scales I generated, no two were the same. And even though I kept trying to improve the prompt, none of the prompts I tested generated a scale I liked better than the first one.

Pro tip: In addition to having the LLM generate color scales for you, you can also have it generate code with color swatches that demonstrate the colors, so you can rapidly see them next to each other.
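
If you would rather build the swatches yourself, a few lines of code will do it. This is a minimal sketch that assumes the colors are already valid CSS color strings and simply lays them out in a row; the function name `swatch_page` and the sizes are arbitrary choices for illustration.

```python
from html import escape

def swatch_page(colors, title="Color scale preview"):
    """Return a small HTML page showing one square swatch per CSS color."""
    swatches = "\n".join(
        f'  <div style="background: {escape(c, quote=True)}; '
        f'width: 6rem; height: 6rem;" title="{escape(c, quote=True)}"></div>'
        for c in colors
    )
    return (
        "<!doctype html>\n"
        f"<title>{escape(title)}</title>\n"
        '<div style="display: flex;">\n'
        f"{swatches}\n"
        "</div>\n"
    )


# The first HSL scale from earlier in this post.
scale = [
    "hsl(198, 97%, 7%)",
    "hsl(198, 88%, 25%)",
    "hsl(198, 78%, 34%)",
    "hsl(198, 69%, 43%)",
    "hsl(198, 60%, 51%)",
    "hsl(198, 73%, 68%)",
    "hsl(198, 80%, 78%)",
    "hsl(198, 87%, 87%)",
    "hsl(198, 95%, 95%)",
]

html_page = swatch_page(scale)
print(html_page)  # save the output as an .html file and open it in a browser
```

Hovering over a swatch shows its color value, which makes it easier to tell which step looks out of place.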

What did I learn here?

First of all, I was reminded that LLMs are not perfect. I could use the exact same prompt several times and get different color values from each attempt. ChatGPT 3.5, especially, had a tendency to tell me it did something that it didn’t actually do, but I had the same problem to a lesser degree with the other LLMs, too.

Also, I learned that LLMs don’t do math the way computers normally do. If an LLM has been trained on a lot of math, it doesn’t calculate the answer; it recalls what it memorized and hallucinates something that looks like the math it learned. (Which is pretty relatable, isn’t it?!)

For now, LLMs can do a first draft faster than us, but the first draft is never perfect. Learning to work well with an LLM means figuring out how to set up the LLM to set us up for success. It frequently won’t do what you want on the first try, so it helps to learn how to ask it for revisions.

But my big takeaway here is: Don’t just ask an LLM to give you a thing. Bring your personal spark to the prompt. For me, that spark was asking for the color curve. You might have your own, different strategy for setting up a color scale. One strategy may work well for one project, and another strategy may work well for a different project. As a human, it’s still up to you to own the vision!

Dan Hiester

A Product Designer who leverages technical skills and product strategy to elevate design.