ChatGPT Vision — Five Real Use Cases for Product Managers (Part One)

Angus Allan
6 min read · Oct 19, 2023


A few weeks ago, OpenAI announced that “ChatGPT can now see, hear, and speak”. These new capabilities finally bring multi-modality to life: you can now speak to a realistic synthetic voice embodiment of ChatGPT via the mobile app (I’ll detail this in a future post) and add images directly to your conversations.

While it sounds like a cool gimmick (and it is cool), after a week of testing I can say that this may be one of the most exciting developments since GPT-4 was released earlier this year.

How to Access ChatGPT Vision

OpenAI is gradually rolling out ChatGPT Vision (and DALL·E 3 and ChatGPT Voice) to ChatGPT Plus users. If you’re subscribed, chances are you already have access.

Make sure you are:

  • Using GPT-4
  • In default mode (Vision is not yet available when using Advanced Data Analysis, Browse with Bing, plugins, or DALL·E 3)

You should then see an image icon in the input bar.

ChatGPT on web browser

On mobile, the addition of Vision is much more noticeable, with camera and image icons in the bottom left of the input bar.

Once you have access to Vision, the only limitation is four images per prompt. As you’ll soon see, this is plenty, even for large-volume data input.

Don’t have access to ChatGPT Vision? Give Bing a try for free

No, really. Microsoft has partnered with OpenAI, meaning Bing has access to GPT-4, DALL·E 3, and Vision. From my extensive testing, it doesn’t seem quite as good as ChatGPT, but it is very close. If you don’t want to pay for ChatGPT Plus, give Bing a go.

Use this link to try Bing Chat. I recommend Creative mode for almost all use cases. Everything we cover from here on will work in either ChatGPT or Bing. Thanks, Microsoft!

Bing Chat includes a Vision model. Use the camera icon in the bottom left of the input bar.

Use Case 1: Transcribe & Transform Notes

Imagine this: you’re in a workshop. It’s been great. Everyone is engaged, and you have dozens of sticky notes and sheets of paper to transcribe. Sure, you could do this by hand, but Vision can be a superpower.

OCR (Optical Character Recognition) has existed for years, and can probably do a lot of the transcription for you. Heck, most smartphones can now copy text by just looking at a photo. But the real value in using Vision is taking advantage of ChatGPT’s inherent understanding of language to not only transcribe your notes but transform them into detailed task-oriented conversations.

Photo by David Travis on Unsplash

Let’s start with a simple example. I found this stock photo showing eight sticky notes, and Vision transcribed them effortlessly:

ChatGPT Vision Conversation

Great! It even picked up the seven on the wall, despite them being out of focus.

Let’s up the difficulty…

2x2 Prioritisation Method by Miro

This matrix has not only items to transcribe but also meaning to infer: the position of each sticky note indicates where it sits on the prioritisation matrix. Vision knocks this one out of the park as well:

ChatGPT Vision successfully transcribes each sticky note and understands the meaning of each quadrant of the matrix

This approach can be scaled up to include diagrams and content with far more nuance and depth. And, once this content is inside ChatGPT, you can have a conversation with it. How about we turn the outputs of that 2x2 matrix into actions we could take in a sprint plan?

ChatGPT turning our prioritised tasks into a two-week plan (chat image edited to fit on screen)

The really incredible part is that ChatGPT intuitively understood that “create video” is not a task in itself. It automatically split it into three subtasks (planning & scripting, shooting, and editing) and then planned them across multiple days. This is where the value of AI really comes into its own. I could take this even further and say “let’s begin with day 1 of the sprint, help me plan out my social media post” and so on. By creating a coherent, task-oriented conversation, you can extract incredible value from a simple input.

As one last trick up my sleeve, let’s ask ChatGPT to add these all to my calendar so I can track the tasks more easily:

ChatGPT can generate code. Calendar import files are code. So it is entirely possible to ask ChatGPT to add to your calendar based on your natural language conversations.
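To make that concrete, here is a minimal Python sketch of the kind of .ics file ChatGPT produced for me. The task names, dates, and UID domain are placeholders I have made up for illustration, not the exact output of my conversation:

```python
from datetime import date, timedelta

# Placeholder sprint tasks (not ChatGPT's exact output)
tasks = [
    "Plan & script product video",
    "Shoot product video",
    "Edit product video",
    "Draft social media post",
]

start = date(2023, 10, 23)  # illustrative sprint start date

# Build a minimal iCalendar (.ics) file with one all-day event per task
lines = ["BEGIN:VCALENDAR", "VERSION:2.0", "PRODID:-//Sprint Plan//EN"]
for i, task in enumerate(tasks):
    day = start + timedelta(days=i)
    lines += [
        "BEGIN:VEVENT",
        f"UID:sprint-task-{i}@example.com",              # every event needs a unique ID
        f"DTSTAMP:{day.strftime('%Y%m%d')}T090000Z",     # creation timestamp
        f"DTSTART;VALUE=DATE:{day.strftime('%Y%m%d')}",  # all-day event
        f"SUMMARY:{task}",
        "END:VEVENT",
    ]
lines.append("END:VCALENDAR")

# The iCalendar spec expects CRLF line endings
with open("sprint_plan.ics", "w", newline="") as f:
    f.write("\r\n".join(lines) + "\r\n")
```

Running this produces a sprint_plan.ics that Google Calendar will accept via its Import & export settings.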

And then, after importing that .ics file into Google Calendar, I have two weeks of events planned. I can rinse and repeat this approach as needed (or ask ChatGPT to output a different format, maybe a .csv for JIRA; see the sketch below).

After importing the .ics file into Google Calendar it successfully added the tasks from the 2x2 matrix in an order that makes sense
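For the JIRA route mentioned above, the same trick works with a CSV. Here is a rough sketch, again with placeholder tasks; JIRA's CSV importer lets you map columns to fields during import, so the exact headers below are just a sensible starting point:

```python
import csv

# Placeholder tasks carried over from the sprint plan (not actual output)
rows = [
    {"Summary": "Plan & script product video", "Issue Type": "Task", "Due Date": "2023-10-23"},
    {"Summary": "Shoot product video", "Issue Type": "Task", "Due Date": "2023-10-24"},
    {"Summary": "Edit product video", "Issue Type": "Task", "Due Date": "2023-10-25"},
    {"Summary": "Draft social media post", "Issue Type": "Task", "Due Date": "2023-10-26"},
]

# Write a CSV that JIRA's external import wizard can ingest
with open("sprint_plan.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Summary", "Issue Type", "Due Date"])
    writer.writeheader()
    writer.writerows(rows)
```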

You can take this even further. Now that the days are planned, bring ChatGPT into the conversations that make the work happen. I often advocate treating ChatGPT like a co-worker on your team: talk to them, ask them questions, get their opinion, and work together to make your creations better and faster. The longer you chat about a single context (and fill up the context window), the more relevant and insightful ChatGPT becomes.

Why Is This Important?

As product managers, we are responsible for building products that solve customer problems, that people love to use, and that drive our businesses forward. To do this, we often talk to customers, create roadmaps, run workshops and ideation sessions, map customer journeys, build prototypes, and more. You name it, a product manager is probably doing it. And yet, so many of these tasks are visual. Sure, you could represent most of them in text, but humans are multi-modal: we use text, but we also see, hear, and touch the real world. When we can interact with AI agents and systems in the modality that makes sense to us, I would argue that not only can we be more productive, but we can also have more genuine interactions. Showing ChatGPT a diagram and iterating on it in conversation is more natural than writing a block of text and receiving one in return.

When you can take the output of a workshop or a product artefact and ChatGPT understands it, you can accelerate every step of the process. In some cases, like transcription, you are literally saving time; you can invest that time back into the work that matters. And in cases where you can transform your work, you can become a Hybrid PM: working side by side with AI to move faster, more creatively, and in new and interesting ways.

What’s Next?

This is just the tip of the iceberg. I have been using ChatGPT Vision extensively since it became available to me, and I have several in-depth use cases to share that I believe can transform the way product managers work. I will be releasing another in-depth use case every week for a total of five weeks, with each article going live one week early on my Substack. Subscribe there to get immediate access, for free.


Angus Allan

AI Product Manager and ex-founder building and scaling products for top UK brands. Join my newsletter at http://aiangus.com/