Prototyping 5 Novel Interactions for AI

Dec 29, 2023

Towards the end of 2023, I wanted to conclude the year by wrapping up my explorations in creating novel interfaces and experiences for generative AI. This is something I’ve been thinking about a lot as a creative person who has always wanted to push beyond the boundaries of simple chat/prompting interfaces. I set a goal to create a set of AI interface prototypes by the end of the year, and was lucky to share this idea with Kawandeep Virdee. We started an #AIxUIdailies challenge on X, where each of us created one prototype per day for a week, with a unique prompt every day exploring different topics related to AI & UI. I gained a lot through this challenge, and I want to share my thoughts and process with you in the following deep dive:

Prompt 1: Probabilistic Metaphor (Dec 18th)

I came up with this prompt after being inspired by Jason Yuan, who, in a talk at AI Engineer, described AI as a probabilistic design material, different from traditional deterministic software features. He showcased a design where two photos selected by a user to be merged into a new image (similar to Midjourney’s “/blend” function) light up to signify that they afford merging. Yuan cited the physics metaphor of white light, which can be refracted into a rainbow of different colors (different possibilities). I found the metaphor both delightful and effective. It serves as a clear signifier of affordance, particularly for a new AI feature that users might not be familiar with, or for a complex feature primarily used by experts that can also bring significant value to novices.

As I brainstormed other AI capabilities and their related physics metaphors, the first feature that came to mind was “variation,” the process that flattens the learning curve users climb, through prompt engineering and experimentation, to reach a result they find satisfactory in generative AI applications. (See Linus Lee’s talk at Lightning AI.) This got me thinking about how AI-generated media, such as image output from text2image models, is malleable and probabilistic, open to further variation, rather than a deterministic final result. The metaphor of water droplets hitting the ground and breaking into smaller droplets came to mind. Or perhaps one liquid transforming into several other materials through a chemical reaction.

This led to my prototype #1, which uses liquid-like blob animations and ripple effects to signify that the user can generate variations from this specific image.

Prototype #1
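For a rough sense of how such an effect can be built, here is a minimal p5.js sketch (a simplified approximation, not the prototype’s actual source) in which expanding, fading circles ripple out from an image placeholder to suggest it can “break” into further variations:

```javascript
// Minimal p5.js sketch: ripples radiating from an image thumbnail,
// hinting that the image can "break" into further variations.
let ripples = [];

function setup() {
  createCanvas(400, 400);
}

function draw() {
  background(15);
  // A placeholder rectangle standing in for the generated image.
  fill(60);
  rect(150, 150, 100, 100, 8);
  noFill();

  // Spawn a new ripple from the image center every half second.
  if (frameCount % 30 === 0) {
    ripples.push({ r: 50, alpha: 255 });
  }

  // Expand and fade each ripple, like a droplet hitting the ground.
  for (const ripple of ripples) {
    stroke(120, 180, 255, ripple.alpha);
    circle(200, 200, ripple.r * 2);
    ripple.r += 1.5;
    ripple.alpha -= 3;
  }
  ripples = ripples.filter((ripple) => ripple.alpha > 0);
}
```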

More ideas: What would be the physics metaphor for, e.g., video generation with camera controls, image2image, controllable image generation technologies, prompt engineering, custom model training, etc.?

Prompt 2: XY Input (Dec 19th)

The second prompt, according to Kawandeep Virdee, was inspired by prototypes that navigate generative AI outputs through latent space. However, I gave it my own spin to reimagine the AI image generation experience from “prompt -> output” to “output -> spatial exploration of new visual ideas based on semantic elements,” sort of like a visual improv of “yes, and …”, or an infinite generative visual canvas conducive to creative flow state and divergent thinking. I was also inspired by the problem that advanced multimodal models such as GPT-4 Vision still struggle to describe the precise position of elements within an image, and I wondered what creativity enhancements that knowledge would enable.

Where is the bird😳

As a result, I leveraged the P5.js and ML5.js libraries (amazingly fast to prototype with), plus the DALL-E 3 API, to build my prototype #2, an infinite “dream collage” canvas:

Prototype #2
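To give a sense of the plumbing, here is a rough sketch of how p5.js, ml5.js, and the DALL-E 3 API can be wired together: ml5’s COCO-SSD object detector locates an element in the current image, and that element’s label seeds a new DALL-E 3 generation placed nearby on the canvas. This is not the prototype’s actual source; the ml5 callback shape and the prompt wording are my assumptions:

```javascript
// Rough sketch (assumptions noted in comments): detect an element in the
// current image with ml5.js, then generate a related image with DALL-E 3
// and drop it next to the detected element on the canvas.
let detector;
let baseImg;
let collage = []; // { img, x, y } entries placed on the canvas

function preload() {
  baseImg = loadImage("seed.png"); // placeholder seed image
  detector = ml5.objectDetector("cocossd"); // ml5 COCO-SSD detector
}

function setup() {
  createCanvas(1024, 1024);
  image(baseImg, 0, 0, 512, 512);

  // Assumed ml5 callback signature: (error, results) with
  // results[i] = { label, x, y, width, height, confidence }.
  detector.detect(baseImg, async (err, results) => {
    if (err || results.length === 0) return;
    const { label, x, y, width } = results[0];
    const url = await generateVariation(`a dreamlike scene featuring a ${label}`);
    loadImage(url, (img) => {
      // Place the new generation to the right of the detected element.
      collage.push({ img, x: x + width + 20, y });
    });
  });
}

function draw() {
  for (const piece of collage) {
    image(piece.img, piece.x, piece.y, 256, 256);
  }
}

// Call the OpenAI image generation endpoint with DALL-E 3.
async function generateVariation(prompt) {
  const res = await fetch("https://api.openai.com/v1/images/generations", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${OPENAI_API_KEY}`, // placeholder for your key
    },
    body: JSON.stringify({ model: "dall-e-3", prompt, n: 1, size: "1024x1024" }),
  });
  const data = await res.json();
  return data.data[0].url;
}
```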

More ideas: What if we connected this with latent space exploration? What if we leveraged more powerful computer vision models?

Prompt 3: Latent Consistency Model (LCM) (Dec 20th)

This prompt addresses the explosively popular LCM, which dramatically speeds up diffusion models’ image generation to allow for near-real-time generation, unlocking various novel creative workflows (e.g., the tool by Krea.ai, or demos like the following).

Having created 3D indie games solo myself, I thought of a potential pain point for 3D artists and game devs: ideation when building complex imaginary worlds. Therefore, I added some UI that allows the user to enter world elements, see a rough 3D scene composed of basic geometry rendered in real time, and add or remove any distinct element as they ideate on the virtual world they are building. See prototype #3, a world-building ideation assistant built with the Fal.ai API; the 3D editor is A-Frame:

Prototype #3: Though I am still planning to have other people try the tool, I found myself already entering a creative flow state while using it.
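To illustrate the A-Frame side of such a tool, here is a sketch of adding and removing basic-geometry “world elements” in the scene. The actual prototype also streams LCM renders through the Fal.ai API, which I’ve abstracted behind a hypothetical requestLcmRender(prompt) helper rather than showing a real Fal.ai call:

```javascript
// Sketch of the A-Frame editor side: each world element the user types in
// becomes a primitive (box/sphere/cylinder) in the scene, and can be removed
// again as the idea evolves. LCM rendering via Fal.ai is abstracted behind a
// hypothetical requestLcmRender(prompt) helper, not a real Fal.ai call.
const scene = document.querySelector("a-scene");
const elements = new Map(); // element name -> A-Frame entity

const PRIMITIVES = ["a-box", "a-sphere", "a-cylinder"];

function addWorldElement(name) {
  // Pick a primitive and scatter it roughly in front of the camera.
  const tag = PRIMITIVES[Math.floor(Math.random() * PRIMITIVES.length)];
  const entity = document.createElement(tag);
  entity.setAttribute("position", `${(Math.random() - 0.5) * 6} 1 ${-3 - Math.random() * 4}`);
  entity.setAttribute("color", "#8ecae6");
  scene.appendChild(entity);
  elements.set(name, entity);
  refreshLcmPreview();
}

function removeWorldElement(name) {
  const entity = elements.get(name);
  if (entity) {
    scene.removeChild(entity);
    elements.delete(name);
    refreshLcmPreview();
  }
}

function refreshLcmPreview() {
  // Compose a prompt from the current element names and ask the (assumed)
  // LCM endpoint for a near-real-time concept render of the scene.
  const prompt = `concept art of a world containing ${[...elements.keys()].join(", ")}`;
  requestLcmRender(prompt); // hypothetical wrapper around the Fal.ai API
}
```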

More ideas: Sending LCM generations back into the 3D workspace

Prompt 4: Dynamically Generated UI (Dec 21st)

I suggested this prompt after this demo of Google’s Gemini generating an interactive interface in response to user input blew my mind. While earlier dynamically generated UI focused on personalization, in the context of generative AI this form of interface opens up possibilities for making LLMs better communicators (responding with dynamically generated UI rather than long paragraphs of ChatGPT text), and for making applications more context-aware and better fitted to the user’s usage scenario.

Gemini generating bespoke UIs

Passionate about creative support tools, I started brainstorming by drawing parallels between Gemini’s generative UI feature (the user enters a prompt, but instead of text output, Gemini organizes the output information as interactive UI) and text2image applications (the user enters a prompt, but instead of image output, the app could generate drawing UIs for the user to create an image themselves; would that enhance users’/artists’ creativity even more?). Furthermore, I wanted to break out of the chatbox paradigm with this prototype, so I thought, why don’t I use the camera as input? This resulted in prototype #4, which I built with React and OpenAI’s GPT-4 Vision API. The app lets you take a photo of anything and generates buttons of emojis it thinks would be a fitting addition to your photo; you can press a button to add that emoji to the photo (another “yes, and…” idea):

Prototype #4
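As a sketch of the vision call behind those emoji buttons (the prompt wording, helper name, and response parsing are my assumptions, not the prototype’s actual code), the captured photo can be sent to GPT-4 Vision as a data URL and the reply parsed into a list of emojis:

```javascript
// Sketch: send the captured photo (as a data URL) to GPT-4 Vision and ask
// for a short list of emojis that would fit the scene.
async function suggestEmojis(photoDataUrl) {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${OPENAI_API_KEY}`, // placeholder for your key
    },
    body: JSON.stringify({
      model: "gpt-4-vision-preview",
      max_tokens: 100,
      messages: [
        {
          role: "user",
          content: [
            {
              type: "text",
              text: "Suggest 5 emojis that would be a fun addition to this photo. Reply with only the emojis, separated by spaces.",
            },
            { type: "image_url", image_url: { url: photoDataUrl } },
          ],
        },
      ],
    }),
  });
  const data = await res.json();
  // e.g. "🌵 🌞 🦎 🎸 🌈" -> ["🌵", "🌞", "🦎", "🎸", "🌈"]
  return data.choices[0].message.content.trim().split(/\s+/);
}

// In the React UI, each returned emoji becomes a button; pressing it stamps
// the emoji onto the photo (for example, by drawing onto a <canvas> overlay).
```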

More ideas: Emojis were a fun and easy element given the time constraints; what other digital drawing/painting UIs could be generated?

Prompt 5: Multi-User Interface (Dec 22nd)

Rather than single human + AI collaboration, what would a multi-user collaborative interface with AI look like? Again, I aimed to think beyond text-based input and to incorporate the idea of embedding space exploration. So I made a playful 3D interface for multiple users to make ambient sounds together. I extracted the prompts describing some recorded or generated natural sounds, obtained their embeddings with OpenAI’s Embeddings API, and reduced them to a 3D representation (x, y, z coordinates) with t-SNE. I then used the x, y, z coordinates as positions for cubes within a Three.js scene, with each cube representing a sound, so that similar sounds are located close to each other. I used Socket.IO to manage multi-user interaction within the same scene. The result is prototype #5 below:

Prototype #5
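Here is a sketch of that embedding-to-cube pipeline. The OpenAI Embeddings call and the Three.js/Socket.IO pieces follow their standard APIs, while tsneTo3D stands in for whatever t-SNE implementation does the 3D reduction; none of this is the prototype’s actual source:

```javascript
// Sketch of the embedding-to-cube pipeline. The embeddings call would run
// server-side; tsneTo3D is a hypothetical stand-in for any t-SNE
// implementation that reduces the embeddings to x, y, z coordinates.
import OpenAI from "openai";
import * as THREE from "three";
import { io } from "socket.io-client";

const openai = new OpenAI();

// 1. Embed the prompts that describe each ambient sound.
async function embedSoundPrompts(prompts) {
  const res = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: prompts,
  });
  return res.data.map((d) => d.embedding); // one high-dimensional vector per prompt
}

// 2. Place one cube per sound at its reduced 3D coordinates, so semantically
//    similar sounds end up near each other in the scene.
function buildSoundCubes(scene, prompts, coords3d) {
  return coords3d.map(([x, y, z], i) => {
    const cube = new THREE.Mesh(
      new THREE.BoxGeometry(0.5, 0.5, 0.5),
      new THREE.MeshBasicMaterial({ color: 0x74c69d })
    );
    cube.position.set(x, y, z);
    cube.userData.soundPrompt = prompts[i]; // used to look up the audio clip
    scene.add(cube);
    return cube;
  });
}

// 3. Broadcast interactions so every connected user hears the same sound.
const socket = io("https://example-sound-server"); // placeholder server URL
function onCubeClicked(cube) {
  socket.emit("play-sound", { soundPrompt: cube.userData.soundPrompt });
}
socket.on("play-sound", ({ soundPrompt }) => {
  // play the corresponding audio clip locally
});

// Assumed overall flow:
//   const embeddings = await embedSoundPrompts(prompts);
//   const coords3d = tsneTo3D(embeddings); // hypothetical t-SNE reducer
//   buildSoundCubes(scene, prompts, coords3d);
```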

More ideas: Visualizing larger embedding space with a larger dataset, or even a latent space.

That’s it! Stay tuned for more prototypes of novel AI interactions! Here is the Google Doc containing details about the #AIxUIdailies coding challenge; feel free to check it out: https://docs.google.com/document/d/1tKCDKBwXTx-IRhZpcIUT4V0ouwg5lGCUOGcqrqHfGHk/edit

Written by Yvonne Fang

Exploring AI, creativity, HCI, games, climate, and more | MHCI '23 @CMU
