“Hey ChatGPT, do I look lonely?”

Losers (and winners) from the GPT-4o announcement

Some products are more toast than others

Mike Young
7 min read · May 13, 2024


Normally, I cover research papers, tools, and models for you. But today, I want to put my product hat on and share who I think the biggest winners and losers are from yesterday's GPT-4o announcement. In case you missed it, this latest OpenAI model processes and generates text, audio, and images in real time. This "omni" model is a major step toward more natural human-computer interaction.

Here’s a look at who I think stands to gain and who might lose out from this development.

Subscribe or follow me on Twitter for more content like this!

Context

GPT-4o, with the "o" standing for "omni," handles text, audio, and images in a single model. It can respond to audio inputs in as little as 232 milliseconds, comparable to human conversational response times. It's faster, more accurate, and 50% cheaper in the API than GPT-4 Turbo. The model also excels at non-English languages and multimodal tasks, making it a game-changer in the AI arms race.

By integrating text, audio, and visual processing into a single system (and, from what I can tell, doing the inference directly from the inputs as opposed to first transcribing everything), GPT-4o sets a new standard for conversational AI. This model…
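To make the "single system" point concrete, here's a minimal sketch of what a multimodal request to GPT-4o looks like in OpenAI's Chat Completions message format, where text and an image go into one `content` array rather than through separate transcription or vision pipelines. The image URL is a placeholder, and nothing is actually sent over the network here.

```python
import json

# Sketch of a multimodal Chat Completions request body for GPT-4o.
# Field names follow OpenAI's public chat API; the image URL is a
# stand-in, and this only builds the payload rather than calling the API.
payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                # Text and image parts live side by side in one message.
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
}

# Serialized, this is the JSON body an HTTP client (or the official
# openai SDK) would POST to the chat completions endpoint.
body = json.dumps(payload)
```

The same message structure is what lets one model reason over mixed inputs in a single forward pass, which is where the latency and cost wins come from.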


Mike Young

Writing in-depth beginner tutorials on AI, software development, and startups. Follow me on Twitter @mikeyoung44!