Gemini is like Leonardo da Vinci, while GPT-4 and PaLM2-Large (text-unicorn) are like Wisława Szymborska and Stanisław Lem

Olejniczak Lukasz
Google Cloud - Community
6 min read · Dec 18, 2023

When I joined Google in June 2021, I was really interested in one of the GCP announcements: a new Google Cloud service named Matching Engine (today Vertex AI Vector Search), based on technology developed by Google engineers for Google Search.

Matching Engine was all about embeddings, vector databases, and fast nearest-neighbor search. I was doubly surprised to see limited interest in the service. Limited, meaning that it appealed primarily to companies that were already very good at handling data. That was almost 3 years ago, and when you open the internet today, vector databases are all the rage. It seems like every other post is about them and about different RAG techniques. It also happens that people ask me whether Google has anything to offer in this space ;)
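For readers new to the idea, here is a minimal sketch of what nearest-neighbor search over embeddings means. It is a brute-force toy in plain Python, not how Matching Engine or Vertex AI Vector Search is implemented (those rely on approximate search to stay fast at scale). The three-dimensional vectors and the names `toy_index` and `nearest_neighbors` are made up for illustration; real embeddings come from an embedding model and have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, index, k=2):
    """Return the ids of the k vectors most similar to the query (brute force)."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy "embeddings": similar concepts point in similar directions.
toy_index = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

query = [1.0, 0.0, 0.0]
top = nearest_neighbors(query, toy_index, k=2)
print(top)
```

The brute-force loop compares the query against every stored vector, which is fine for three items and hopeless for billions; that gap is the reason dedicated vector search services exist.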

It feels like déjà vu when I read different posts on GPT-4 vs Gemini: whether Gemini is way better or only a little better, and how Microsoft used some heavy prompting technique to level the score on the MMLU benchmark. It seems we all just got used to living with text-only models, and some even assumed that AI is about building chatbots.

With the announcement of Gemini there is an opportunity to clarify that Gemini is not a text-only model. It is a multimodal experience and the v1.0 implementation of Google’s vision of a single, everything-to-everything model capable of handling text, images, video, audio, code, and probably many more modalities in the future. Gemini is a different category, and it is totally fine that GPT-4+ may one day join this club. It is just the beginning of AI: models will get better and smarter, yet they will remain just tools, and it is up to business leaders and developers to picture the future and use these amazing tools to get there. But ‘stay hungry, stay foolish’: the future will not be just about text, and Gemini is a great manifestation of that future.

How can multimodal models like Gemini help? I saw a funny video in which someone reproduced one of the Gemini demos using OCR and a text-to-speech model. So if we can build similar things with simpler tools, why would we even need multimodal models?

There are many reasons and I will not even try to list them here. Here is one that I think is quite enlightening.

One of the amazing phenomena observed when training larger language models is so-called emergent abilities https://arxiv.org/abs/2206.07682:

Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot be predicted simply by extrapolating the performance of smaller models. The existence of such emergence implies that additional scaling could further expand the range of capabilities of language models.

Model size is one knob that brings emergent abilities. Additional modalities may bring them as well. Gemini supports 5 modalities (text, images, video, audio, code) https://deepmind.google/technologies/gemini/#introduction and, when compared on the academic benchmarks reported by Google, it outperforms state-of-the-art models that each focus on a single modality. So it is better:

  • with text than any other text model (GPT-4),
  • with video than any other video model (Flamingo, SeViLA),
  • with images than any other image model (GPT-4V),
  • with audio than the best audio model (Whisper v3).

GPT-4 is a very capable text model with quite impressive reasoning skills, but so is PaLM2-Large, available on Vertex AI as text-unicorn:

text-unicorn is the largest text-only model from the PaLM2 family of models released by Google in May 2023 https://arxiv.org/abs/2305.10403. Google Cloud users have access to two sizes: unicorn and bison.

If we assume text-only language models are the scribes of the digital age, then PaLM2-Large (text-unicorn) is like the cosmic architect, crafting universes as vast and enigmatic as Lem’s dystopias, and GPT-4 is like the lyrical maestro of AI, weaving tales as intricate as Szymborska’s sonnets. Both can summarize information or provide concise answers to questions, weaving together different sources and perspectives. Both are used when we really need exceptional reasoning capabilities.

Gemini, however, is like Leonardo da Vinci. Just as Leonardo mastered diverse disciplines, working as a painter, draughtsman, engineer, scientist, theorist, sculptor, and architect, Gemini excels across various modalities.

How about Gemini Pro, which is available on Google Cloud now? It handles text, images and video, which already enables quite a lot of new use cases.

To me, Gemini Pro is like Marie Skłodowska-Curie, the first person to win Nobel Prizes in two different fields: physics and chemistry. Curie’s groundbreaking multidisciplinary research not only revolutionized our understanding of radioactivity but also paved the way for countless other women to pursue careers in STEM fields. Therefore, when discussing Gemini Pro’s potential to unveil the unseen and push boundaries, let’s also remember Marie Curie’s legacy as a trailblazing scientist who shattered glass ceilings and opened doors for future generations.

Gemini isn’t just a language model. It’s a Renaissance in your pocket, a reminder that the limits of creativity lie not in silicon or synapses, but in our own willingness to dream. Have fun!

This article is authored by Lukasz Olejniczak, Customer Engineer at Google Cloud. The views expressed are those of the author and don’t necessarily reflect those of Google.

Please clap for this article if you enjoyed reading it. For more about Google Cloud, data science, data engineering, and AI/ML, follow me on LinkedIn.

You may want to check my previous articles on applied Generative AI on Google Cloud:

  • Google Imagen (through GCP Vertex AI Studio) as fashion design assistant

https://medium.com/google-cloud/google-imagen-through-gcp-vertex-ai-studio-as-fashion-design-assistant-a7c1cce547ab

  • Build Flutter application in python to chat in ANY language with Google Cloud LLMs available on Vertex AI

https://medium.com/google-cloud/build-flutter-application-in-python-to-chat-in-any-language-with-google-cloud-llms-available-on-574599cce85c

  • Flutter for data engineering and data science! Flet.dev — running Flutter apps built in Python on Google Cloud with Cloud Run

https://medium.com/google-cloud/flutter-for-data-engineering-and-data-science-1ab54381f9d3
