Introduction to Gemini AI
Google recently made waves by announcing Gemini, its latest AI model, which the company calls its largest and most capable AI model.
Intrigued by this, and having tried ChatGPT in the past, I wanted to give it a try and learned that one of the models has been released in public preview.
In this post, I will share how you can get your hands dirty with the model.
Quick Intro
Gemini is a series of foundation models that Google has built from the ground up to be multimodal, which means it can generalize, understand and operate across different types of input such as text, audio, video and images.
The first version, Gemini 1.0, is optimized for three different sizes:
- Gemini Ultra — largest and most capable model for highly complex tasks.
- Gemini Pro — best model for scaling across a wide range of tasks.
- Gemini Nano — most efficient model for on-device tasks.
Of these three, Gemini Pro is the model that has been made available in public preview, and it is the one we will go through in this post.
Getting started with Gemini Pro
We will use the Gemini Pro API in Python today to play around with the model.
To begin, you will need to generate an API key to access the models programmatically and install the dependency to use in your code.
To install the dependency, use pip as shown below:
pip install -q -U google-generativeai
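Since the -q flag keeps pip quiet, it can help to confirm that the package actually landed in the active environment. The snippet below is a minimal sketch using only the standard library; the helper name installed_version is my own and not part of any Google tooling.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "google-generativeai") -> Optional[str]:
    """Return the installed version of a pip package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("google-generativeai is not installed; run the pip command above.")
    else:
        print(f"google-generativeai {version} is installed.")
```

If the script reports that the package is missing, the usual cause is that pip installed it into a different Python environment than the one running the script.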
To generate the API key, go to https://makersuite.google.com/ and log in with your Google account if you are not logged in already. This will take you to Google AI Studio, where you will see an option to generate your API key, as shown below.
Once generated, copy the API key and keep it handy, as you will need it in your code later.
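One way to keep the key handy without pasting it into source files is to export it in your shell and read it at run time. The helper below is a small sketch of that idea; the function name load_api_key is hypothetical, and GOOGLE_API_KEY is simply the variable name used in this post.

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Read the API key from an environment variable and fail early if it is absent."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Gemini API key first."
        )
    return key
```

With the variable exported in the shell (for example, export GOOGLE_API_KEY="..."), the key can then be passed along as genai.configure(api_key=load_api_key()), which keeps it out of version control.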
Generate text from text inputs
We will pass a simple text question to the model and get Gemini to provide an answer.
Below is the code snippet I used to generate an answer:
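import os
import google.generativeai as genai

# Store the API key in an environment variable (replace the placeholder with your own key)
os.environ['GOOGLE_API_KEY'] = "Your API Key here"
# Configure the client library with the key
genai.configure(api_key=os.environ['GOOGLE_API_KEY'])

# Create a handle to the text-only Gemini Pro model
model = genai.GenerativeModel('gemini-pro')
# Send a text prompt and print the generated answer
response = model.generate_content("List the top 5 generative AI models")
print(response.text)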
We instantiate the GenerativeModel class from genai, which creates a model object and takes the model name as its input. Here we use gemini-pro, a text generation model that takes text as input and produces text as output. This model has an input context length of 32k tokens and an output length of 2k tokens.
Running the above code produces the output shown below.
As you can see, it is very simple to set this up and try Gemini on your desktop. Gemini promises many more capabilities than just text generation. This post from Google will give you an idea of these capabilities: https://blog.google/technology/ai/google-gemini-ai/
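The token limits mentioned earlier for gemini-pro can also be respected in code. The sketch below is one possible approach, not the only one: the helper names are my own, the two limit constants are the approximate figures quoted in this post rather than authoritative values, and ask_gemini needs network access and a valid API key, so it is defined here but not called.

```python
# Approximate limits quoted in this post (assumptions, not authoritative values).
INPUT_TOKEN_LIMIT = 32_000
OUTPUT_TOKEN_LIMIT = 2_000


def build_generation_config(max_output_tokens: int = 512, temperature: float = 0.2) -> dict:
    """Build a generation config dict, rejecting output sizes beyond the assumed limit."""
    if not 1 <= max_output_tokens <= OUTPUT_TOKEN_LIMIT:
        raise ValueError(f"max_output_tokens must be between 1 and {OUTPUT_TOKEN_LIMIT}")
    return {"max_output_tokens": max_output_tokens, "temperature": temperature}


def ask_gemini(prompt: str, api_key: str, **config_kwargs) -> str:
    """Send a prompt to gemini-pro after checking its token count (needs network access)."""
    # Imported lazily so the helper above can be used without the package installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel("gemini-pro")

    # Ask the service how many tokens the prompt uses before sending it.
    token_count = model.count_tokens(prompt).total_tokens
    if token_count > INPUT_TOKEN_LIMIT:
        raise ValueError(f"Prompt uses {token_count} tokens, above the assumed input limit")

    response = model.generate_content(
        prompt, generation_config=build_generation_config(**config_kwargs)
    )
    return response.text
```

A call such as ask_gemini("List the top 5 generative AI models", api_key, max_output_tokens=256) would then cap the length of the answer while keeping the rest of the flow the same as in the earlier snippet.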
I have just touched upon one aspect of Gemini. In the next post, I will share how we can generate text output from images and also show how Gemini has established foundations for Responsible AI.