[Image by Author and DALL-E]

GPT-4o and Gemini 1.5 Flash Are Here: A Look at the Future of AI

Chandler K
The AI Archives

--

The past week has changed the future of AI. From OpenAI’s new GPT-4o model to Google’s AI Overviews in Search and Gemini 1.5 Flash, massive strides have been made toward making Generative AI a daily part of our lives. With so many announcements packed into one week, this article will highlight the most important developments.

Specifically, this article will cover the following:

  • OpenAI’s new GPT-4o model
  • Google I/O’s AI updates
  • Keymate.AI’s newest features

GPT-4o

This new “omni” model is OpenAI’s latest LLM (Large Language Model), designed as an improved version of the GPT-4 Turbo model. It has a greater understanding of text, audio, images, and video (hence the “omni” name). Together, these capabilities deliver the most “natural human-computer interaction” currently available through AI. But that’s not where the upgrades stop. GPT-4o is not only faster than its predecessors, but also 50% cheaper when developing with the model through the API. These changes make GPT-4o the obvious choice for developers looking to build cutting-edge products with OpenAI’s latest advances. As the OpenAI demonstrations show, many of the improvements come in the form of multi-modal features.
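To make the 50% price cut concrete, here is a back-of-the-envelope cost estimator. The per-million-token prices below are the approximate launch list prices and may change at any time; treat them as illustrative numbers consistent with the stated 50% reduction, not as authoritative pricing.

```python
# Rough API cost comparison. Prices are approximate launch list prices
# in USD per 1 million tokens and may change; illustrative only.
PRICES = {
    "gpt-4-turbo": {"input": 10.00, "output": 30.00},
    "gpt-4o":      {"input": 5.00,  "output": 15.00},  # ~50% cheaper
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one API call for the given token counts."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a call with 2,000 prompt tokens and 500 completion tokens.
turbo_cost = estimate_cost("gpt-4-turbo", 2000, 500)
omni_cost = estimate_cost("gpt-4o", 2000, 500)
print(f"GPT-4 Turbo: ${turbo_cost:.4f}  GPT-4o: ${omni_cost:.4f}")
```

At scale, halving per-token cost compounds quickly: an app making a million such calls would see its bill drop accordingly.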

Many of ChatGPT’s users rely on the text generation feature, so by adding a model with improved text abilities, OpenAI is upgrading the experience for millions of daily users. Each of the benchmark evaluations below measures the performance of GPT-4o against other state-of-the-art models, and the published results show GPT-4o to be one of the most powerful models currently on the market.

  • MMLU: (Massive Multitask Language Understanding) assesses the model’s performance across a wide range of academic subjects.
  • GPQA: (Graduate-Level Google-Proof Q&A) measures the model’s ability to answer difficult, expert-written science questions.
  • MATH: evaluates the model’s proficiency in solving competition-level math problems.
  • HumanEval: measures the model’s ability to generate working code, scored by whether its solutions pass unit tests.
  • MGSM: (Multilingual Grade School Math) assesses the model’s ability to solve grade-school math problems in non-English languages.
  • DROP: (Discrete Reasoning Over Paragraphs) ensures the model can perform complex reasoning over multiple paragraphs of text.

The last major announcement from this event is that GPT-4o will be FREE for everyone. OpenAI is giving this model to all users. Along with this, OpenAI is making GPTs (and the GPT store) open to all users as well. This means the gap between the free tier and ChatGPT Plus has narrowed significantly. However, there are still several benefits exclusive to Plus members, including a 5x higher message limit as well as the new “Voice Mode” for GPT-4o.

Google I/O Updates

Similarly to OpenAI, Google has unveiled a number of major updates in the past week. These include new and improved models such as Gemini 1.5 Flash and the upgraded Gemini 1.5 Pro. Google has also updated Search itself, building Gemini directly into the search experience. This new way to search will put Gemini in front of millions of users on a daily basis.

Here are some highlights from Google I/O that are worth mentioning:

  • Gemini 1.5 Flash: Flash is meant to be fast, efficient, and cheap. This model is the inexpensive alternative to other LLMs (mainly Gemini 1.5 Pro). The reduced price doesn’t equate to poor performance. Although not as “intelligent” as Pro, this model can still solve most user queries. This powerful model is currently 35 cents per 1 million tokens.
  • Gemini 1.5 Pro Updates: Gemini 1.5 Pro got a host of updates and improvements this week. These new changes ensure that Pro can better tackle complex and nuanced problems. A 2 million token context window will be coming soon for developers using Gemini 1.5 Pro.

Both of these updated models are publicly available to preview through Google AI Studio with a one million token context window. They are also now fully multi-modal and can receive images, audio, and video inputs.
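Those context windows are enormous compared to earlier models, and a quick sanity check shows what they can hold. The sketch below uses the common (but rough) heuristic of ~4 characters per token for English text; real token counts come from the provider’s tokenizer, and the window sizes are the figures announced above.

```python
# Rough check of whether a document fits in a model's context window.
# The 4-characters-per-token ratio is a rule of thumb for English text,
# not an exact tokenizer; real counts come from the provider's API.
CONTEXT_WINDOWS = {
    "gemini-1.5-flash": 1_000_000,   # public preview window
    "gemini-1.5-pro":   2_000_000,   # announced as coming for developers
}

def rough_token_count(text: str) -> int:
    """Approximate token count using the ~4 chars/token heuristic."""
    return max(1, len(text) // 4)

def fits_in_context(model: str, text: str, reserve: int = 8_192) -> bool:
    """True if the text, plus a reserved output budget, fits the window."""
    return rough_token_count(text) + reserve <= CONTEXT_WINDOWS[model]

# A ~3.6M-character document (~900k tokens) fits the 1M-token window.
doc = "x" * 3_600_000
print(fits_in_context("gemini-1.5-flash", doc))
```

In practice that 1M-token window is roughly a few novels’ worth of text in a single prompt, which is what makes the “data extraction from long documents” use case below plausible.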

“1.5 Flash excels at summarization, chat applications, image and video captioning, data extraction from long documents and tables, and more. This is because it’s been trained by 1.5 Pro through a process called “distillation,” where the most essential knowledge and skills from a larger model are transferred to a smaller, more efficient model.” — Google I/O
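The distillation idea in that quote can be sketched in a few lines: the large “teacher” model’s output distribution becomes the training target for the smaller “student,” which learns to mimic it. This is a generic, minimal illustration of the technique, not Google’s actual training pipeline; the logits and temperature are made-up values.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature softens them."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                              # subtract max for stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how far the student's q is from the teacher's p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher_logits = [4.0, 1.0, 0.5]   # hypothetical teacher outputs
student_logits = [3.0, 1.5, 0.5]   # hypothetical student outputs

# A temperature > 1 exposes the teacher's relative confidence across
# "wrong" answers too, which the student is trained to reproduce.
soft_targets = softmax(teacher_logits, temperature=2.0)
student_probs = softmax(student_logits, temperature=2.0)
loss = kl_divergence(soft_targets, student_probs)  # minimized during training
print(f"distillation loss: {loss:.4f}")
```

Training drives this loss toward zero, which is how the “essential knowledge and skills” transfer from the larger model to the smaller one.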

A number of AI tools have been released that integrate directly into your Google experience. From AI Overviews to AI-assisted browsing, the search experience is being shaken up by these new developments. These features and more can be explored and enabled (yes, you have to enable them) through Google Search Labs. To help foster both interest and innovation, Google has launched a competition challenging developers to build unique AI apps with the Gemini API. It’s a great way to push the boundaries of what’s possible with AI.

Keymate.AI Updates

In keeping with this week’s major AI news, Keymate has begun rolling out a series of new features and UI updates. These changes center on Keymate Memory and significantly expand the actions users can take when managing and working with their saved information. The Keymate Memory Manager now lets users store web pages, PDFs, YouTube videos, and ChatGPT responses, all of which can be recalled and used with ease.

The UI changes allow users to organize their saved documents into personalized collections. Each saved item can also carry notes that only the user can view; this allows for better organization without having to worry about a user’s reminders or notes affecting the model’s responses.

Rarely is an ecosystem completely changed in a single week. Yet, in the past week, multiple industry leading AI companies have announced substantially upgraded models as well as groundbreaking new features. While all of these updates sound impressive, we still have to wait to see the impact these changes will have on the AI industry as a whole.


Harvard, UPenn, prev. NASA. Writing about AI, game development, and more.