Exploring Google’s Gemini AI: A Hands-On Guide to Leveraging the Latest Large Language Model
Gemini is Google’s advanced AI model with multimodal capabilities. Built from the ground up for multimodality, it reasons seamlessly across text, images, video, audio, and code, marking a significant leap in the fields of artificial intelligence and natural language processing.
Gemini, as an LLM, is part of a burgeoning family of AI models that specialize in understanding, generating, and interacting with human language. What sets Gemini apart is its advanced algorithms and expansive dataset, allowing it to grasp context, generate more coherent and relevant responses, and offer improved accuracy in language understanding.
Key Features of Gemini
- Enhanced Contextual Understanding: Gemini is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most popular methods to test the knowledge and problem-solving abilities of AI models.
- Multimodality: Gemini is built from the ground up for multimodality — reasoning seamlessly across text, images, video, audio, and code.
- Anything to anything: Gemini is natively multimodal, which gives you the potential to transform any type of input into any type of output.
- Customizability: Users can fine-tune Gemini for specific tasks or industries.
Gemini comes in three sizes
- Nano — Most efficient model for on-device tasks.
- Pro — Best model for scaling across a wide range of tasks.
- Ultra — The most capable and largest model for highly complex tasks.
Gemini API: Quickstart with Python — A Basic Example
This quickstart demonstrates how to use the Python SDK for the Gemini API, which gives you access to Google’s Gemini large language models. In this quickstart, you will learn how to:
- Set up your development environment and API access to use Gemini.
- Generate text responses from text inputs.
- Generate text responses from multimodal inputs (text and images).
- Use Gemini for multi-turn conversations (chat).
- Generate embeddings from text.
Prerequisites — To complete this quickstart locally, ensure that your development environment meets the following requirements:
- Python 3.9+
- An installation of `jupyter` to run the notebook.
Setup
Code taken from the official quickstart guide by Google. Here’s the step-by-step guide.
Install the Python SDK
The Python SDK for the Gemini API is contained in the `google-generativeai` package. Install the dependency using pip:
!pip install -q -U google-generativeai
Import packages
Import the necessary packages.
import pathlib
import textwrap
import google.generativeai as genai
# Used to securely store your API key
from google.colab import userdata
from IPython.display import display
from IPython.display import Markdown
def to_markdown(text):
    text = text.replace('•', ' *')
    return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))
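The helper blockquotes every line of the model's response; the `predicate=lambda _: True` matters because `textwrap.indent` would otherwise skip whitespace-only lines. A quick standalone check of that indenting behavior:

```python
import textwrap

# Every line, including blank ones, receives the '> ' blockquote prefix
sample = "First line\n\nSecond line"
quoted = textwrap.indent(sample, '> ', predicate=lambda _: True)
print(quoted)
```

Without the predicate, the empty middle line would be left unprefixed and the blockquote would break in two when rendered.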
Set up your API key
Before you can use the Gemini API, you must first obtain an API key. If you don’t already have one, create a key with one click in Google AI Studio.
Once you have the API key, pass it to the SDK. You can do this in two ways:
- Put the key in the `GOOGLE_API_KEY` environment variable (the SDK will automatically pick it up from there).
- Pass the key to `genai.configure(api_key=...)`.
# Or use `os.getenv('GOOGLE_API_KEY')` to fetch an environment variable.
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)
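The `google.colab.userdata` call above only works inside Colab. Outside Colab, a common pattern (a sketch, assuming you have exported the `GOOGLE_API_KEY` environment variable) is:

```python
import os

import google.generativeai as genai

# Read the key from the environment instead of Colab's userdata store
api_key = os.getenv('GOOGLE_API_KEY')  # None if the variable is unset
if api_key is None:
    raise RuntimeError('Set the GOOGLE_API_KEY environment variable first')

genai.configure(api_key=api_key)
```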
List models
Now you’re ready to call the Gemini API. Use `list_models` to see the available Gemini models:
- `gemini-pro`: optimized for text-only prompts.
- `gemini-pro-vision`: optimized for text-and-image prompts.
for m in genai.list_models():
if 'generateContent' in m.supported_generation_methods:
print(m.name)
Generate text from text inputs
For text-only prompts, use the `gemini-pro` model:
model = genai.GenerativeModel('gemini-pro')
The `generate_content` method can handle a wide variety of use cases, including multi-turn chat and multimodal input, depending on what the underlying model supports. The available models support only text and images as input, and text as output.
In the simplest case, you can pass a prompt string to the `GenerativeModel.generate_content` method:
%%time
response = model.generate_content("What is the meaning of life?")
In simple cases, the `response.text` accessor is all you need. To display formatted Markdown text, use the `to_markdown` function:
to_markdown(response.text)
If the API fails to return a result, use `GenerateContentResponse.prompt_feedback` to see whether the prompt was blocked due to safety concerns.
response.prompt_feedback
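The quickstart goals above also include multi-turn conversations. Here is a minimal chat sketch, assuming the same `gemini-pro` model instance from earlier; it requires a configured API key, so the printed replies will vary:

```python
# Start a chat session; the SDK accumulates history across turns
chat = model.start_chat(history=[])

response = chat.send_message("In one sentence, what is a large language model?")
print(response.text)

# Follow-up turns can rely on the earlier context
response = chat.send_message("Now explain it to a five-year-old.")
print(response.text)

# Inspect the full conversation so far
for message in chat.history:
    print(message.role, ':', message.parts[0].text)
```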
As the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), Gemini represents the cutting edge in the world of Large Language Models.
As Gemini continues to evolve, it’s crucial to stay updated with its advancements and understand how they can be leveraged in various domains. The potential of Gemini in fields like automated content creation, language translation, and even complex problem-solving is vast and still unfolding.
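The quickstart goals also mention embeddings. A hedged sketch using the SDK's `embed_content` function follows; the `models/embedding-001` model name and the `task_type` value reflect the quickstart-era API and may change over time:

```python
import google.generativeai as genai

# Embed a single piece of text; the result is a dict with an 'embedding' list
result = genai.embed_content(
    model="models/embedding-001",
    content="What is the meaning of life?",
    task_type="retrieval_query",
)

# The length of the list is the dimensionality of the embedding vector
print(len(result['embedding']))
```

Embeddings like these are typically stored and compared by cosine similarity for search and retrieval use cases.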
Conclusion
Google has unveiled Gemini, an innovative AI model that stands out for its multimodal capabilities. This advanced technology is trained natively on diverse data types, such as text, images, and audio. Gemini excels in complex reasoning, efficiently processing and understanding multiple forms of data simultaneously. Its proficiency extends to intricate fields like mathematics, physics, and coding in various programming languages. During development, Google focused on scalability, efficiency, and ensuring safety by conducting extensive evaluations for potential biases and toxicity. Gemini’s future integration into Google’s product ecosystem promises to significantly enhance functionalities, particularly in areas requiring complex reasoning and deeper understanding. For a more detailed exploration, visit Google’s blog.