The Incarnation of Her: ChatGPT 4o, Voice AI, and applications

After years of Siri, Google Assistant, Alexa will we finally see a next-gen voice assistant?

Kavir Kaycee
The Discourse Co
7 min readMay 17, 2024

--

Hey folks, Kavir here. welcome to yet another edition of The Discourse. I am glad you could join me today. It’s a great time to be an AI optimist. Yesterday’s OpenAI event was fantastic and today’s Google IO event is much awaited to see what the sleeping AI giant has for consumers. Today’s piece is on yesterday’s announcements from OpenAI centered around Voice AI.

ChatGPT 4o Omni and Voice AI

OpenAI Spring Update: Voice AI, GPT 4 ‘Omni’, ChatGPT Mac App and more

We knew before the event that voice was touted to be the big feature to be released. The Information covered it and there were many hints including Sam Altman liking a tweet mentioning that OpenAI was launching ‘her’.

And then after the event tweeting just the word ‘her’.

Voice is a useful medium and I’ve lately found myself using it regularly for use cases including writing first drafts (including this article), listing my next day’s tasks, brainstorming, venting, and more use cases. I’ll write more on these, so subscribe if you haven’t.

Voice to text AI using Audiopen

But with the introduction of GPT-4o, particularly its advanced voice interaction and unified Omni model, represents a step function upgrade to voice interactions and poised to redefine interactive assistants’ roles and capabilities in our daily lives and professional environments.

In this piece, we’ll discuss my personal experience with ChatGPT voice, the new omni model, comparison with existing voice assistants, the new ChatGPT Mac app and the new text model.

Let’s start with voice.

Personal Experience with ChatGPT Voice

I’ve used it extensively ever since it launched, from venting out my concerns and getting empathetic and supportive responses to brainstorming ideas, and thinking through solutions — often on walks when I don’t want to type and they usually are 5–10 minute conversations.

Since I speak slowly with pauses, I end up triggering ChatGPT voice to respond before I am done. So I end up holding onto the screen and using it as a walkie-talkie rather than a normal conversation.

There have been a few things that break the experience.

Sometimes the connection breaks and you have to repeat what you say. That’s a big no-no. The latency between you finishing speaking and the response is long enough to break the flow. In yesterday’s event, they deconstructed the way it used to work with the GPT4 model. The AI had to first transcribe your voice to text, feed the model with text, get an output, and then convert the output from text to speech.

I found that the voice interaction was already more human-like than any other interaction I had before.

This changed a bit when Hume, the empathic voice agent, launched. It was real-time, could decipher emotions through a combination of text and voice tone, was able to emote, and handle interruptions.

Hume Voice AI

Yesterday’s update basically shipped all of that, with a way more powerful reasoning model.

The Omni Model

The simplified version of yesterday’s update is that they have managed to create a model that natively supports text, audio, vision — all in one. It takes the input as is, processes it, and returns some intelligence, reasoning, and response based on that.

This unlocks a lot of use cases that I’ll talk about in a bit. But before that, let’s look at the existing ecosystem of voice assistants and how they have all flattered to deceive.

Comparisons to Siri, Alexa, and Other Assistants

Voice assistants are nothing new. Siri, Apple’s virtual assistant, launched on October 4, 2011, alongside the iPhone 4S. 2011! That’s a whole 12.5 years ago now.

Amazon Alexa launched on November 6, 2014, with the release of the Amazon Echo smart speaker. Google Assistant was introduced on May 18, 2016, during Google’s I/O developer conference and became available on the Google Pixel smartphones later that year.

It works for simple tasks like setting a timer, setting an alarm, telling me a joke, playing a song on Spotify, asking a fact, or interacting with your home devices — like switching on and off lights. I use Alexa for playing music and have set a night-time routine for sleep sounds.

Some do some things better than others. MKBHD released this comparison video in 2023. Worth a watch. https://youtu.be/Q2MGqmuEdtU

But none of them have leveled up their intelligence to the point of it being useful. All of them are built on previous tech. Here is Siri’s response to me asking it to call me an Uber.

Siri Voice AI

ChatGPT voice would change things. And if rumor has it, if it’s integrated into Siri and onto 1.3B iPhones , it would be a game changer. We’ll know more at WWDC on June 10th, 2024.

Applications of Voice AI

Some of the applications that OpenAI demoed were interesting. Live language translation, bedtime stories for kids, jokes, homework assistants, interview prep, and what I use it for: assistant to create tasks, write emails and messages, write first drafts, act as a life coach, act as a career coach, and more.

Apart from these, I would be really interested in applications that were demoed by Google 6 years ago! https://youtu.be/D5VN56jQMWM

Google Duplex allowed the AI to make calls for you — to make restaurant, hair cut, etc bookings.

And also allows you to screen calls and filter out scam and spam calls. The AI could talk to these callers and screen them for me, so I only pick up the worthwhile ones.

I would love to not have to speak on the phone for these tasks ever again.

Even though Google demoed this a while ago, we’ve not seen it live yet, which has been disappointing.

Maybe another feature for OpenAI to integrate into the iPhone. Of course, Google will be forced to finally launch this publicly soon at Google IO (May 14th, 2024).

The new ChatGPT Mac App

Now, coming to the laptop, they have announced a new ChatGPT app, which I’ve got access to. It’s an improved UI with a search filter and direct access to voice conversations, with a keyboard shortcut to access ChatGPT similar to Spotlight or Raycast. You can drag and drop images onto the chat and ask questions about them. This makes the previous ChatGPT apps on your Mac obsolete.

Here is how you can get access.

In a future iteration — You can also live share your screen, and the AI will respond, but that’s not yet available. This ability to view what’s on your screen and have an ambient assistant always available in terms of voice is going to be really powerful.

The new text model GPT 4o

Impressions on GPT 4o are that it’s extremely fast, faster than 3.5 turbo, and it seems to be more intelligent. I’ve used it in some of my business APIs, which have provided mixed results because I was used to the previous version. This changes things a bit-in some ways better, some ways worse — but I think that can be fixed with better prompt engineering and giving a few examples.

This update might reduce my usage of Claude now because things are more integrated-it has memory, it’s fast, it’s smarter, and it has code interpreter. It can use my writing style to write things and do a lot of things. It can access the GPT store, and I can create more GPTs for my use cases.

Final Thoughts

Overall, it’s not the big bang update. It’s not GPT-5. It’s not a search engine. But it is incremental and good in a way. It also allows free users to access GPT4, driving further adoption and acceptance of AI. 3.5 wasn’t cutting it anymore and it was now a lousy first impression of AI for people. For example, I can’t wait to get my parents on it. Everyone can benefit from an IQ boost and an intelligent partner.

We still don’t have access to the live voice updates yet, but when that’s out, I can’t wait to test it out and improve my quality of life with the new intelligent voice assistant.

Thanks to ChatGPT 4o and Lex.page for providing feedback on early drafts of this piece.

That’s it for today, thanks for reading!

What do you think ofVoice AI and the new updates?

Comment below, and I’ll reply to you. Give feedback and vote on the next topic here.

Talk to you soon!

Follow me on Twitter @kavirkaycee

Originally published at https://thediscourse.co.

--

--

Kavir Kaycee
The Discourse Co

Product Manager | Ex-entrepreneur | ISB grad | Former football writer