Getting Started with Gemini 1.5 Pro and Google AI Studio:

A beginners guide for 2024

Chandler K
The AI Archives
Published in
6 min readApr 19, 2024

--

Google is taking a big step into the world of Large Language models with Gemini 1.5 Pro. Last week, the model became available to developers through the Gemini API and Google AI Studio. Similarly to OpenAI’s Playground (which I recently wrote an article about) the Google AI Studio is a simple interface to test Google’s latest Gemini models. This product shouldn’t be confused with the similarly named “Vertex AI Studio” which is an entirely different product built for large Enterprises.

In this article, I will cover the following:

  • What is Google AI Studio and the basics of using it
  • What are the different modes?
  • How to use the multimodal features available in Google AI Studio
  • When to use Google AI Studio vs Gemini

What is Google AI Studio?

Google AI Studio is a web-based environment where developers can write, run, and test prompts using Google’s Gemini models. Additionally, if you want to use the Gemini API, you can get your API key from inside Google AI Studio. Broadly, it is designed to be a simple entry point for developers to not just use models but also get started building with the Gemini API. If you don’t want to use the Gemini API, you can skip the API key step all together and just test the models.

The Google AI Studio Basics

If you’re familiar with OpenAI’s Playground, then much of what is about to be discussed will be familiar. Let’s walk through the basic UI as is shown below:

Regardless of what mode you select, the “Run Settings” will be the same.

  • Model: Currently, Google is offering three different models, Gemini 1.0 Pro, Gemini 1.5 Pro, and Gemini 1.0 Pro 001 (Tuning). Each of these models has their own unique benefits. For example, Gemini 1.5 Pro allows the user to insert images in addition to video, audio, and other files. You can learn more about Google’s LLMs in their documentation.
  • Temperature: This variable controls the “creativity level” of the model. By increasing this value, the model will choose less statistically likely tokens when creating responses. The best way to understand the impact of this variable is to experiment with it yourself and see how the outputs change.
  • Stop Sequence: This variable makes the model stop generating tokens when a specific word / phrase is generated. For example, if my stop sequence was “world” and I prompted the model to say “Hello world”, the output generated would be “Hello”. This means that the stop sequence will ever be displayed / created by the model.
  • Safety Settings: Due to the nature of Large Language Models, responses can sometimes be unpredictable. While steps have been taken to ensure the model generates appropriate responses, Google created this management tool to ensure developers can better control outputs.
  • Top K: This variable determines if the model will select the most probable next tokens. A high Top K value will have the model consider a larger number of (lower probability) tokens while a higher Top K value will make the outputs more deterministic. In other words, the higher the Top K value, the more predictable the model is.
  • Top P: This setting is available under the “Advanced settings” in the bottom right corner. This variable influences the amount of tokens that the model considers when generating a response. To be clear, this variable does not impact the context window, because the context window is used in the input phase (before the generation has started) and the Top P only impacts how the response is generated. In other words, the Top P value dictates the randomness of the model’s output.

Google AI Studio’s different Modes

Currently, Google AI Studio is offering three distinct modes when creating with the Gemini API. These options can be selected by clicking “Create new” in the top left corner, as seen below.

Each of these modes is meant to address a specific use case.

  • Chat Prompt: This commonly seen prompt type is used in chatbots like ChatGPT and Google’s Gemini chatbot. It is used to respond to user queries in a conversational manner. This is where you can customize the chatbot to speak or act in a certain way. Want a friendly custom service chatbot? A sarcastic chatbot that talks down to you? This is where you would tell the model to act in a certain way.
  • Freeform Prompt: This prompt type is used for open-ended responses. Creative writing, brainstorming, and learning assistance can all excel using this mode. Many writing tools and products use this or similar processes.
  • Structured Prompt: Uniquely, this prompt type has the user provide examples (sample data) of queries and responses of the desired behavior. In the below example, I gave sample questions and responses about which US cities are best. Obviously this question will be different for everyone, but we can see that the model followed the examples it was given.

How to use the multimodal features available in Google AI Studio

One of the most unique features of the Google AI Studio is that various file types can be used in the environment. These include images, videos, audio, and files from Google Drive. This means that developers can easily test out if their idea works, and how to work through any bugs. For example, if we use the Chat Prompt from above and add a video we want summarized, the model will access the inserted video and accomplish its task. Let’s take a look.

In this case, we used a five minute video that showcased various dinosaur fossils. When prompted, the model can interact with the video and produce a summary of its contents. These types of use cases can’t be done in other AI Playgrounds (OpenAI for example).

When to use Google AI Studio vs Gemini

While the Google AI Studio is a powerful tool, it’s important to understand when you should use it vs Google’s Gemini chatbot. The Gemini chatbot is Google’s equivalent to OpenAI’s ChatGPT. Users can expect to engage with the model through conversation and with limited control of the reasoning and response. Alternatively, if users intend to make changes to the way the Gemini model responds or need an API key, Google AI Studio is the tool to achieve this. After testing and creating a project in the Studio, users can export their work directly to code by clicking “Get Code” in the top right corner. Once outside the Studio and connected to the Gemini API, users can connect to other APIs like the Keymate.AI API. By doing so, they can utilize features like Keymate Memory and Keymate’s Confidence Scoring to help identify hallucinations. Overall, it was impressive to see what the Gemini models were capable of, especially around multi-modal use cases.

--

--

Chandler K
The AI Archives

Harvard, UPenn, prev NASA , writing about AI, game development, and more..