Introducing the New Google Gemini API: A Comparative Analysis with ChatGPT in the AI Revolution

Lawrence Teixeira
Dec 17, 2023

Google’s recent announcement of the Gemini API marks a major step forward in artificial intelligence technology. The API gives developers access to Gemini, a family of models developed by Google DeepMind, and reflects Google’s stated commitment to advancing AI and making it accessible and beneficial for everyone. This blog post explores the features, potential applications, and impact of the Google Gemini API, as described in Google’s official blog posts and announcements.

What is Google Gemini?

Google Gemini is a family of highly capable, multimodal artificial intelligence models developed by Google. It represents a significant step forward in AI capabilities, especially in understanding and processing a wide range of data types.

Extract from the Google Gemini official website

Gemini’s Position in the AI Landscape

Gemini is a direct competitor to OpenAI’s GPT-3.5 and GPT-4 models. It differentiates itself through its native multimodal capabilities and its focus on seamlessly processing and combining different types of information. Its launch was met with significant anticipation and speculation, and it is seen as a crucial development in the AI arms race between major tech companies.

Below is Google’s comparison of text and multimodal capabilities between Gemini Ultra, which has not yet been officially launched, and OpenAI’s GPT-4.

Key Features of Gemini

  1. Multimodal Capabilities: Gemini’s groundbreaking design allows it to process and comprehend various data types seamlessly, from text and images to audio and video, facilitating sophisticated multimodal reasoning and advanced coding capabilities.
  2. Three Model Sizes: Gemini comes in three versions (Ultra, Pro, and Nano), each optimized for different scales and types of tasks, ranging from complex data center workloads to efficient on-device applications.
  3. State-of-the-Art Performance: According to Google, Gemini models exceed previous state-of-the-art results on numerous academic benchmarks, and Gemini Ultra is the first model reported to outperform human experts on the MMLU benchmark, showcasing advanced reasoning and problem-solving abilities.
  4. Diverse Application Spectrum: The versatility of Gemini allows for its integration across a wide array of sectors, including healthcare, finance, and technology, enhancing functionalities like predictive analytics, fraud detection, and personalized user experiences.
  5. Developer and Enterprise Accessibility: Gemini Pro is now available to developers and enterprises, with features such as function calling, semantic retrieval, and chat. Developers can integrate it into their applications through Google AI Studio and Vertex AI.
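To make function calling concrete: you declare functions to the model, and instead of plain text the model can reply with the name of one of those functions plus arguments; your own code runs the function and sends the result back so the model can finish its answer. The sketch below covers only the application-side dispatch step, in plain Python. The `{"name": ..., "args": {...}}` shape and the `celsius_to_fahrenheit` tool are hypothetical simplifications, not the Gemini SDK’s actual objects.

```python
from typing import Any, Callable, Dict


def celsius_to_fahrenheit(celsius: float) -> float:
    """Hypothetical tool the model is allowed to call."""
    return celsius * 9 / 5 + 32


# Only functions registered here can be triggered by model output.
TOOLS: Dict[str, Callable[..., Any]] = {
    "celsius_to_fahrenheit": celsius_to_fahrenheit,
}


def dispatch_function_call(call: Dict[str, Any]) -> Dict[str, Any]:
    """Execute a function-call request of the assumed form {"name": str, "args": dict}.

    The returned dict carries the result (or an error) that the application
    would send back to the model so it can compose its final answer.
    """
    name = call.get("name")
    func = TOOLS.get(name)
    if func is None:
        # Never run names that were not explicitly registered.
        return {"name": name, "error": f"unknown function: {name}"}
    try:
        result = func(**call.get("args", {}))
    except TypeError as exc:  # typically wrong or missing arguments from the model
        return {"name": name, "error": str(exc)}
    return {"name": name, "response": result}


if __name__ == "__main__":
    print(dispatch_function_call({"name": "celsius_to_fahrenheit", "args": {"celsius": 100}}))
```

Keeping an explicit registry means the model can only ever trigger functions you chose to expose.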

The New Google Gemini API

The Gemini API gives developers access to Google’s most capable and comprehensive AI model to date. Google describes Gemini as multimodal from the ground up: instead of relying on separate models for each data type, it is designed to understand, operate on, and combine text, code, audio, images, and video, which marks a new level of sophistication in AI technology.

Benefits for Developers and Creatives:

Gemini’s versatility unlocks a plethora of possibilities for developers and creatives alike. Imagine:

  • Building AI-powered applications: Gemini can power chatbots, virtual assistants, and personalized learning platforms.
  • Boosting your creative workflow: Generate song lyrics, script ideas, or even marketing copy with Gemini’s innovative capabilities.
  • Simplifying coding tasks: Let Gemini handle repetitive coding tasks or write entire code snippets based on your instructions.
  • Unlocking new research avenues: Gemini’s multimodal abilities open doors for exploring the intersection of language, code, and other modalities in AI research.

How to Use the Google Gemini API

Using the Google Gemini API involves a few steps and works across several programming languages and platforms. Here’s an overview based on the Google AI for Developers documentation:

Setting Up Your Project

  1. Obtain an API Key: First, create an API key in Google AI Studio (formerly MakerSuite). Keep the key secret and never check it into your version control system; instead, pass it to your app at runtime (for example, through an environment variable) before initializing the model.
  2. Initialize the Generative Model: Import the SDK and create a GenerativeModel instance, specifying the model name (e.g., gemini-pro for text, gemini-pro-vision for multimodal input) and supplying your API key.
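A minimal sketch of these two steps, assuming the key is stored in an environment variable named `GOOGLE_API_KEY` (the name is my choice; any name works). `make_model` uses the same google-generativeai calls as the examples later in this post (`genai.configure` and `genai.GenerativeModel`) and imports the SDK lazily, so the key-loading helper can be tried without the package installed. Both helper names are hypothetical.

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Read the Gemini API key from an environment variable instead of source code."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Google AI Studio API key."
        )
    return key


def make_model(model_name: str = "gemini-pro"):
    """Configure the SDK and return a GenerativeModel (requires google-generativeai)."""
    import google.generativeai as genai  # lazy import: load_api_key works without the SDK

    genai.configure(api_key=load_api_key())
    # Pass "gemini-pro-vision" instead when the prompt includes images.
    return genai.GenerativeModel(model_name)
```

On the command line, `export GOOGLE_API_KEY=...` before running the script keeps the key out of the code and out of version control.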

You can follow the Python quickstart in Google Colab.

Implementing Use Cases

The Gemini API allows you to implement different use cases:

  1. Text-Only Input: Use the gemini-pro model with the generateContent method (generate_content in the Python SDK) for text-only prompts.
  2. Multimodal Input (Text and Image): Use the gemini-pro-vision model. Make sure to review the image requirements for input.
  3. Multi-Turn Conversations (Chat): Use the gemini-pro model and start the chat by calling startChat() (start_chat() in Python). Use sendMessage() (send_message() in Python) to send new user messages.
  4. Streaming for Faster Interactions: Use the generateContentStream method (generate_content(..., stream=True) in Python) to handle partial results as they arrive instead of waiting for the complete response.
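The chat and streaming use cases can be sketched in Python as two small helpers. They assume `model` is a `genai.GenerativeModel("gemini-pro")`; in the Python SDK the chat methods are `start_chat()` and `send_message()`, and streaming is `generate_content(prompt, stream=True)`, which yields chunks with a `.text` attribute. The helper names `run_chat` and `stream_reply` are my own.

```python
from typing import Callable, Iterable, List


def run_chat(model, user_messages: Iterable[str]) -> List[str]:
    """Multi-turn chat: the chat session keeps the conversation history between turns."""
    chat = model.start_chat(history=[])
    replies = []
    for message in user_messages:
        response = chat.send_message(message)
        replies.append(response.text)
    return replies


def stream_reply(model, prompt: str, on_chunk: Callable[[str], None] = print) -> str:
    """Streaming: handle each partial result as soon as it arrives, then return the full text."""
    parts = []
    for chunk in model.generate_content(prompt, stream=True):
        on_chunk(chunk.text)
        parts.append(chunk.text)
    return "".join(parts)
```

For example, `stream_reply(model, "Explain multimodal AI in one paragraph.")` prints the answer piece by piece. Because the helpers rely only on those method names, they can also be exercised offline with a stand-in object.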

Gemini Pro

```python
"""
At the command line, run this once to install the package via pip:

$ pip install google-generativeai
"""

import google.generativeai as genai

# Replace with your own key; in real projects, load it from an environment variable.
genai.configure(api_key="YOUR_API_KEY")

# Set up the model: sampling parameters and a cap on the length of the reply
generation_config = {
    "temperature": 0.9,
    "top_p": 1,
    "top_k": 1,
    "max_output_tokens": 2048,
}

# Block content with a medium or high probability of being unsafe, per category
safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
]

model = genai.GenerativeModel(
    model_name="gemini-pro",
    generation_config=generation_config,
    safety_settings=safety_settings,
)

prompt_parts = [
    "Write a 10-paragraph text about Gemini's functionalities:",
]

response = model.generate_content(prompt_parts)
print(response.text)
```
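With the safety settings above, a prompt or reply may be blocked, in which case there is no text to print. In the google-generativeai versions I have seen, reading `response.text` then raises a `ValueError`, and `response.prompt_feedback` explains whether the prompt was blocked; treat that behavior as an assumption and check it against your SDK version. A small defensive wrapper (the name `response_text_or_reason` is hypothetical):

```python
def response_text_or_reason(response) -> str:
    """Return the generated text, or a short explanation if no usable text came back.

    Assumes `response.text` raises ValueError when the prompt or the candidate
    was blocked, e.g. by the safety_settings passed to GenerativeModel.
    """
    try:
        return response.text
    except ValueError:
        # prompt_feedback reports whether (and why) the prompt itself was blocked.
        feedback = getattr(response, "prompt_feedback", None)
        return f"[no text returned; prompt_feedback={feedback}]"
```

In the example above, the last line would become `print(response_text_or_reason(response))`.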

Gemini Pro Vision

```python
"""
At the command line, run this once to install the package via pip:

$ pip install google-generativeai
"""

from pathlib import Path

import google.generativeai as genai

# Replace with your own key; in real projects, load it from an environment variable.
genai.configure(api_key="YOUR_API_KEY")

# Set up the model
generation_config = {
    "temperature": 0.4,
    "top_p": 1,
    "top_k": 32,
    "max_output_tokens": 4096,
}

# Block content with a medium or high probability of being unsafe, per category
safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_MEDIUM_AND_ABOVE"},
]

model = genai.GenerativeModel(
    model_name="gemini-pro-vision",
    generation_config=generation_config,
    safety_settings=safety_settings,
)

# Validate that an image is present
if not (img := Path("image0.jpeg")).exists():
    raise FileNotFoundError(f"Could not find image: {img}")

# Images are passed inline as raw bytes together with their MIME type
image_parts = [
    {
        "mime_type": "image/jpeg",
        "data": img.read_bytes(),
    },
]

prompt_parts = [
    image_parts[0],
    "\nTell me about this image. What colors do we have here? How many people are in it?",
]

response = model.generate_content(prompt_parts)
print(response.text)
```
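The example above hard-codes one JPEG file. A small generalization builds the same `{"mime_type": ..., "data": ...}` part for any image path, guessing the MIME type from the file extension with Python’s standard `mimetypes` module. The accepted set of types (PNG, JPEG, WEBP, HEIC, HEIF) reflects my reading of the Gemini image requirements and should be checked against the current documentation; `image_part` is a hypothetical helper name.

```python
import mimetypes
from pathlib import Path

# Image types assumed to be accepted by gemini-pro-vision; verify against the
# current image requirements before relying on this list.
SUPPORTED_IMAGE_TYPES = {"image/png", "image/jpeg", "image/webp", "image/heic", "image/heif"}


def image_part(path: str) -> dict:
    """Build an inline image part in the same dict format as the example above."""
    img = Path(path)
    if not img.exists():
        raise FileNotFoundError(f"Could not find image: {img}")
    # Guess the MIME type from the file extension (e.g. ".png" -> "image/png").
    mime_type, _ = mimetypes.guess_type(img.name)
    if mime_type not in SUPPORTED_IMAGE_TYPES:
        raise ValueError(f"Unsupported or unrecognized image type for {img}: {mime_type}")
    return {"mime_type": mime_type, "data": img.read_bytes()}
```

A prompt such as `[image_part("image0.jpeg"), "Tell me about this image."]` can then be passed to `model.generate_content` as before.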

Implementing in Various Languages

The Gemini API supports several programming languages, each with its specific implementation details:

  • Python, Go, Node.js, Web, Swift, Android, cURL: Each language requires specific code structures and methods for initializing the model, sending prompts, and handling responses. Examples include setting up the Generative Model, defining prompts, and processing the generated content.
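As an illustration of the cURL route, the same text-only request can be sent from Python’s standard library straight to the REST endpoint. This sketch assumes the `v1beta` `generateContent` endpoint, the `?key=` query parameter, and the JSON shapes used in the REST quickstart around the time of writing; the version and fields may change, so verify them in the API reference. The helper names are my own.

```python
import json
import urllib.parse
import urllib.request

# Assumed REST base URL (v1beta at the time of writing).
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_generate_request(prompt: str, api_key: str, model: str = "gemini-pro") -> urllib.request.Request:
    """Build (but do not send) a generateContent POST request for a text-only prompt."""
    url = f"{BASE_URL}/{model}:generateContent?" + urllib.parse.urlencode({"key": api_key})
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Concatenate the text parts of the first candidate in a generateContent response."""
    parts = response_json["candidates"][0]["content"]["parts"]
    return "".join(part.get("text", "") for part in parts)


def generate(prompt: str, api_key: str) -> str:
    """Send the request and return the generated text (requires network access)."""
    with urllib.request.urlopen(build_generate_request(prompt, api_key)) as resp:
        return extract_text(json.load(resp))
```

Splitting request building and response parsing from the network call keeps both halves easy to inspect and test offline.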

Further Reading and Resources

  • The Gemini API documentation and API reference on Google AI for Developers provide detailed information, including safety settings, guides on large language models, and embedding techniques.
  • For specific language implementations and more advanced use cases like token counting, refer to the respective quickstart guides available on Google AI for Developers.
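Token counting helps keep prompts within a model’s input limit. A minimal sketch, assuming a `genai.GenerativeModel` whose `count_tokens()` returns an object with a `total_tokens` attribute, as in the google-generativeai Python SDK. The budget is a number you supply (look up the limit for the model you use), the helper names are hypothetical, and each `count_tokens()` call is an API request.

```python
from typing import List, Sequence


def fits_token_budget(model, contents, max_input_tokens: int) -> bool:
    """Return True if `contents` fits within max_input_tokens (one count_tokens call)."""
    return model.count_tokens(contents).total_tokens <= max_input_tokens


def trim_history_to_budget(model, messages: Sequence[str], max_input_tokens: int) -> List[str]:
    """Drop the oldest messages until the remaining ones fit the budget."""
    kept = list(messages)
    while kept and not fits_token_budget(model, kept, max_input_tokens):
        kept.pop(0)  # discard the oldest message first
    return kept
```

Dropping the oldest messages first is a simple policy for long chats; summarizing them instead would preserve more context at the cost of an extra model call.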

By following these steps and referring to the detailed documentation, you can effectively utilize the Google Gemini API for various applications ranging from simple text generation to more complex multimodal interactions.

Gemini vs. ChatGPT: The Ultimate Multimodal Mind Showdown

The world of large language models (LLMs) is heating up, and two titans stand at the forefront: Google’s Gemini and OpenAI’s ChatGPT. Both boast impressive capabilities, but which one reigns supreme? Let’s dive into a head-to-head comparison.

Google Gemini API: Pricing

1. Free for Everyone Plan:

  • Rate Limits: 60 QPM (queries per minute)
  • Price (input): Free
  • Price (output): Free
  • Input/output data used to improve our products: Yes
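To stay under a limit such as 60 queries per minute, a client can throttle itself before each call. The class below is my own minimal sliding-window sketch, not part of any Google SDK; the clock and sleep functions are injectable so the logic can be tested without actually waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class MinuteRateLimiter:
    """Client-side sliding-window limiter: at most max_calls per `period` seconds."""

    def __init__(
        self,
        max_calls: int = 60,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()  # timestamps of recent calls, oldest first

    def acquire(self) -> None:
        """Block until one more call fits in the window, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have left the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))


limiter = MinuteRateLimiter()  # 60 calls per 60 seconds, matching the 60 QPM limit
```

Calling `limiter.acquire()` right before each `model.generate_content(...)` keeps the client at or below 60 calls in any rolling 60-second window. Server-side limits still apply, so handling rate-limit errors remains worthwhile.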

2. Pay-as-you-go Plan (coming soon to Google AI Studio):

  • Rate Limits: Starts at 60 QPM
  • Price (input): $0.00025 / 1K characters, $0.0025 / image
  • Price (output): $0.0005 / 1K characters
  • Input/output data used to improve our products: No

Source: Gemini API Pricing | Google AI for Developers
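Because the pay-as-you-go plan bills per character and per image rather than per token, estimating a request’s cost is simple arithmetic. A sketch using the prices listed above (which may change); `Decimal` avoids floating-point rounding in money calculations, and `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Pay-as-you-go prices listed above, in USD; check the pricing page for current values.
PRICE_INPUT_PER_1K_CHARS = Decimal("0.00025")
PRICE_PER_IMAGE = Decimal("0.0025")
PRICE_OUTPUT_PER_1K_CHARS = Decimal("0.0005")


def estimate_cost(input_chars: int, output_chars: int, images: int = 0) -> Decimal:
    """Estimate the pay-as-you-go cost of one request from character and image counts."""
    return (
        Decimal(input_chars) / 1000 * PRICE_INPUT_PER_1K_CHARS
        + images * PRICE_PER_IMAGE
        + Decimal(output_chars) / 1000 * PRICE_OUTPUT_PER_1K_CHARS
    )


if __name__ == "__main__":
    # 10,000 input characters, 2 images, 4,000 output characters
    print(estimate_cost(10_000, 4_000, images=2))
```

For that example the total is 10 × $0.00025 + 2 × $0.0025 + 4 × $0.0005 = $0.0095.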

OpenAI ChatGPT API: Pricing

GPT-4 Turbo

With a 128K context window, fresher knowledge, and the broadest set of capabilities, GPT-4 Turbo is more capable than GPT-4 and is offered at a lower price.

GPT-4

With broad general knowledge and domain expertise, GPT-4 can follow complex instructions in natural language and solve difficult problems accurately.

GPT-3.5 Turbo

GPT-3.5 Turbo models are capable and cost-effective.

gpt-3.5-turbo: the family’s flagship model, optimized for dialog, with a 16K context window.

gpt-3.5-turbo-instruct: an instruction-following model that supports only a 4K context window.

Source: Pricing (openai.com)

Strengths of Gemini:

  • Multimodality: Gemini shines in its ability to handle text, code, images, and even audio. This opens doors for applications like generating image captions or translating spoken language.
  • Function Calling: Gemini fits into existing workflows through function calling: the model can return a structured request to call a function the developer has declared, and the developer’s code then executes that function.
  • Embeddings and Retrieval: The Gemini API’s embeddings capture semantic relationships between words and passages, which supports more accurate information retrieval and question answering.
  • Custom Knowledge: Gemini can be fine-tuned with your own data, making it a powerful tool for specialized tasks.
  • Creative Formats: Beyond plain prose, Gemini can produce creative formats such as poems, scripts, and musical pieces.
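Semantic retrieval with embeddings comes down to comparing vectors. The sketch below ranks documents by cosine similarity to a query vector using only the standard library. In practice the vectors would come from an embedding model, for example `genai.embed_content(model="models/embedding-001", content=text)["embedding"]` in the google-generativeai SDK (an assumption to verify against your SDK version); the helper names are my own.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs most similar to the query embedding."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

With made-up two-dimensional vectors, `rank_documents([1.0, 0.1], {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}, top_k=2)` ranks "a" first and "c" second. For large collections, a vector database replaces this linear scan.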

Strengths of ChatGPT:

  • Accessibility: ChatGPT is widely available through various platforms and APIs, with free and paid options, whereas Gemini’s most capable model, Gemini Ultra, has not yet been released.
  • Creative Writing: ChatGPT excels in creative writing tasks, producing engaging stories, poems, and scripts.
  • Large Community: ChatGPT has a well-established user community that offers extensive resources and tutorials.

An experiment comparing the Gemini and ChatGPT APIs using the Sparse Priming Representations (SPR) technique

I conducted an experiment with the OpenAI ChatGPT and Google Gemini APIs, applying the Sparse Priming Representations (SPR) prompt-engineering technique to compress a text and then decompress it. The experimental code I created is available in Google Colab.
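The compress-then-decompress loop behind this kind of experiment can be sketched as follows. This is not the author’s notebook: the two prompt templates are simplified versions written for this illustration, and `generate` stands for any function that sends a prompt to a model and returns its text, so the same code can run against both Gemini and ChatGPT.

```python
from typing import Callable, Dict

# Simplified prompt templates written for this sketch (not the original experiment's prompts).
COMPRESS_PROMPT = (
    "Distill the following text into a Sparse Priming Representation: a short list of "
    "succinct statements, assertions, associations, and concepts that capture its meaning."
    "\n\n{text}"
)
DECOMPRESS_PROMPT = (
    "Expand the following Sparse Priming Representation back into a full, coherent text, "
    "inferring the missing detail.\n\n{spr}"
)


def spr_roundtrip(text: str, generate: Callable[[str], str]) -> Dict[str, str]:
    """Compress `text` to an SPR and decompress it again with the same model."""
    spr = generate(COMPRESS_PROMPT.format(text=text))
    restored = generate(DECOMPRESS_PROMPT.format(spr=spr))
    return {"original": text, "spr": spr, "restored": restored}
```

For Gemini, `generate` could be `lambda p: model.generate_content(p).text`; for ChatGPT, a function wrapping OpenAI’s chat completions call. Comparing `restored` with `original` for each model shows how much meaning survived the round trip.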

The outcome was interesting: both APIs handled the test very well. The table below shows some contextual differences between their outputs, but both performed the task satisfactorily.

If you want to learn more about Sparse Priming Representations (SPR), I’ve written an entire post discussing it. Here it is below:

Prompt Engineering: Compressing Text to Ideas and Decompressing Back with Sparse Priming Representations (SPR) — Tech News & Insights (lawrence.eti.br)

Conclusion

In the rapidly evolving landscape of artificial intelligence, the Google Gemini API represents a significant milestone. Its introduction heralds a new era where AI transcends traditional boundaries, offering multimodal capabilities far beyond the text-centric focus of models like ChatGPT. Google Gemini’s ability to process and integrate diverse data types — from images to audio and video — not only sets it apart but also showcases the future direction of AI technology.

While ChatGPT excels in textual creativity and enjoys widespread accessibility and community support, Gemini’s native multimodal functionality and advanced features like function calling and semantic retrieval position it as a more versatile and comprehensive tool. This distinction is crucial in an AI landscape where the needs range from simple text generation to complex, multimodal interactions and specialized tasks.

As we embrace this new phase of AI development, it’s clear that both ChatGPT and Google Gemini have unique strengths and applications. The choice between them hinges on specific needs and project requirements. Gemini’s launch is not just a technological breakthrough; it’s a testament to the ever-expanding possibilities of AI, promising to revolutionize various sectors and redefine our interaction with technology. With such advancements, the future of AI seems boundless, limited only by our imagination and the ethical considerations of its application.

That’s it for today!

Lawrence Teixeira

CIO | CDO | Data Enthusiast | Chief Transformation Officer at Licks Attorneys