GPT-4o by OpenAI : Things to know

Understanding the new features and comparison with existing GPT-4

Mehul Gupta
Data Science in your pocket

--

The Generative AI race is heating up as OpenAI, suprisingly, has released a new variant of GPT-4 called GPT-4o (O stands for Omni)

The word Omni reference the words Omniscient (knowing everything) & Omnipotent (having unlimited power) referring to its capability to handle audio, images,video, text formats altogether using a single model. This is actually a monumental achievement as most of the time, when any audio is fed to an LLM (you must have used the mic option on ChatGPT), it internally uses a combination of models to

Convert audio to text

LLM works on the text

Text output mapped to an audio

Hence this isn’t handled by a single model. But with the incoming of GPT-4o, this is actually handled by a single model.

My debut book, “LangChain in your Pocket” is out !

In this new offering, OpenAI has came out with a few strong features including

  • A strong focus on audio & video alongside images and text making its multi-modal functionality even more diverse. Remember that GPT-4 isn’t well known for audio & video but for just images & text. Also, given the early verdict, the audio & video synthesis looks terrific !
  • GPT-4o has extremely low latency, with the ability to respond to audio inputs in as little as 232 milliseconds on average, similar to human conversation response times. This real-time performance enables seamless voice interactions.
  • The voice and tones used are no mechanical but resembles human, making it more realistic. It does giggle, cry, become serious as and when require.
  • While matching GPT-4’s performance on English text and code, GPT-4o shows significant improvements in understanding and generating text in non-English languages compared to previous models.
  • GPT-4o is available for free to all ChatGPT users on mobile and desktop, democratizing access to this advanced AI technology. Though, not all functionalities are available in the free version.
  • It is also 50% cheaper than GPT-4 when accessed through the API.
  • As measured on traditional benchmarks, GPT-4o achieves GPT-4 Turbo-level performance on text, reasoning, and coding intelligence, while setting new high standards on multilingual, audio, and vision capabilities.

I assume by the time you must have seen some of the demo videos, including the one where Sal Khan from Khan Academy had a real time conversation (video conferencing) with GPT-4o. Looking at the response rate and accuracy, this looks to be a killer development.

How to access GPT-4o?

You just need to visit https://chatgpt.com/ and login into your account. Switching to the free GPT-4o is quite easy as shown in the image below.

So, would GPT-4 become obsolete?

Maybe. Given the results, GPT-4o right now is similar to GPT-4 on text and is looking way better on audio & video alongside improved latency and cost (about 50% less). But we must remember that even in the recent past, many firms made some claims with decorated demos like Devin but didn’t go well and hence until unless models like GPT-4o aren’t tested on real-world usecases, we shall wait before getting too excited.

With this, its a wrap, see you soon with some other exciting Generative AI development !

--

--