OpenAI Introduces GPT-4o with Real-time Conversation and New Vision Capabilities

Emanuele
3 min read · May 13, 2024

The "o" stands for omni, from the Latin omnis, meaning "all" or "everything". Is this AI capable of everything? Maybe not yet, but OpenAI has given us a new, improved model with fresh vision and voice capabilities, named GPT-4o.

Screenshot from OpenAI’s Spring Update

The big news today is the launch of a new model, named GPT-4o, which brings GPT-4 level intelligence to everyone, including free users.

There’s a strong emphasis on user interaction with this new model, focusing on understanding tone of voice and nuance, elements that have been very hard for an AI to grasp. Until now.

GPT-4o operates across text, voice, and vision. And it’s efficient enough that it will be available to all ChatGPT users for free.
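For developers, the text and vision sides are the easiest to try right away. Here is a minimal sketch of sending a prompt plus an image to GPT-4o through the OpenAI Python SDK; the prompt wording and the image URL are placeholders of mine, not something from the announcement.

```python
# Minimal sketch: text + image input with GPT-4o via the OpenAI Python SDK.
# Assumes OPENAI_API_KEY is set in the environment; the image URL is a placeholder.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this picture."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```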

OpenAI has also made significant improvements in quality and speed across 50 different languages, showing their commitment to broadening access to as many people as possible. That’s also useful for translations!
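Since the same endpoint handles all of those languages, a translation call is just another prompt. A quick, hedged sketch, with the target language and the example sentence chosen arbitrarily:

```python
# Minimal sketch: using GPT-4o for translation (language and text are arbitrary examples).
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Translate the user's text into Italian."},
        {"role": "user", "content": "GPT-4o brings GPT-4 level intelligence to everyone."},
    ],
)

print(response.choices[0].message.content)
```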

Faster audio capabilities

During the live demo, the new voice capabilities were revealed: there is almost no latency during the conversation, and the responses sound remarkably natural. Additionally, you don’t have to wait for ChatGPT to finish its sentence; you can interrupt it! This is possible thanks to the almost real-time…

