Gemini has entered the chat: Understanding basic LLM settings

Maciej Strzelczyk
Google Cloud - Community
5 min read · Jul 3, 2024

In the previous post, I showed you how to get started with a basic Discord bot that uses Gemini to communicate with chat users. I used this simple code to instantiate a model object that was later used to generate chat responses:

import vertexai.generative_models as genai

model = genai.GenerativeModel(
    model_name="gemini-1.0-pro",
    generation_config=genai.GenerationConfig(max_output_tokens=1900),
)

In this article, I want to provide more insight into the configuration that controls the model’s behavior. I will not provide code samples for every setting, as sample code can be easily generated in Vertex AI Studio using the < > GET CODE button.

From the GenerativeModel reference page, we can see that the following parameters are available:

  • model_name
  • generation_config
  • safety_settings
  • tools
  • tool_config
  • system_instruction

Let’s look at them one by one.

Model name

This is the only required parameter: the identifier of the LLM you are going to use. The model is what is responsible for generating responses to your queries. There are many models available, and it’s up to you which one to use. For the sake of simplicity, we’ll stick with the Gemini model family, which is easy to use and fits the Discord bot use case best. To find more information about available models, visit the Google Cloud Model Garden.

The models you might want to try:

  • Gemini 1.0 Pro (gemini-1.0-pro)
  • Gemini 1.5 Flash (gemini-1.5-flash)
  • Gemini 1.0 Ultra (gemini-1.0-ultra)

Generation config

This parameter controls the behavior of the selected model. You can see on its reference page that it provides settings similar to the ones available in the Vertex AI Studio or Google AI Studio. Those settings are:

Temperature

A float value, usually from 0.0 to 1.0. In LLMs, temperature controls the randomness of output. A lower temperature leads to predictable responses, while a higher temperature encourages creativity. LLMs generate text by randomly sampling words from a probability distribution. The temperature modulates this process, with lower values favoring common words and higher values increasing the likelihood of less likely words. The aim is to balance coherence and creativity. Low temperatures are useful for tasks like summarizing documents, while high temperatures are suitable for creative writing.
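If you want to set it from code rather than generating the sample in Vertex AI Studio, temperature goes into the GenerationConfig. A minimal sketch, with a purely illustrative value:

# Low temperature for predictable, factual replies; values closer to 1.0
# make sampling more adventurous and the answers more creative.
config = genai.GenerationConfig(temperature=0.2, max_output_tokens=1900)

model = genai.GenerativeModel(
    model_name="gemini-1.0-pro",
    generation_config=config,
)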

Top_p

A float value from 0.0 to 1.0. Top-p changes how the model selects tokens for output. Tokens are selected from most probable to least probable, until the sum of their probabilities equals the top-p value. For example, if tokens A, B, and C have a probability of 0.3, 0.2, and 0.1, and the top-p value is 0.5, then the model will select either A or B as the next token (using temperature).
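To make the cutoff concrete, here is a toy Python sketch of the selection logic from the example above (this is not SDK code, just an illustration):

probs = {"A": 0.3, "B": 0.2, "C": 0.1}  # example token probabilities
top_p = 0.5

candidates = []
cumulative = 0.0
# Walk tokens from most to least probable until the cumulative
# probability reaches top_p; only these tokens can be sampled.
for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
    candidates.append(token)
    cumulative += p
    if cumulative >= top_p:
        break

print(candidates)  # ['A', 'B'], so the next token is sampled from these two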

Top_k

An integer value from 1 to 40. Top-k, just like top-p, affects the way the next token is selected by the model. A top-k of 1 means the selected token is the most probable among all tokens in the model’s vocabulary (also called greedy decoding), while a top-k of 3 means that the next token is selected from among the 3 most probable tokens (using temperature).
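Both knobs can be set together in the same GenerationConfig; a short sketch with made-up values:

# Illustrative values only: consider at most the 3 most probable tokens
# (top_k), narrow them further with the top_p cutoff, then sample using
# the configured temperature.
config = genai.GenerationConfig(temperature=0.7, top_p=0.8, top_k=3)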

Candidate_count

An integer describing how many answers the model should generate for your prompt. Passing a value higher than 1 results in slower responses and higher cost. You are not guaranteed to receive the exact number of responses you asked for, as some might be blocked by safety filters or other policies.
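If you do ask for more than one candidate, iterate over whatever comes back instead of assuming a fixed count. A rough sketch (note that not every Gemini model accepts a candidate_count greater than 1):

config = genai.GenerationConfig(candidate_count=2)
response = model.generate_content(
    "Suggest a name for my chess club.",
    generation_config=config,
)

# Some candidates may have been filtered out, so loop over what arrived.
for candidate in response.candidates:
    print(candidate.content.parts[0].text)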

Max_output_tokens

An integer limiting the length of the response, counted in tokens. This is useful when you want your responses to be short. Note that the limit applies to tokens, not characters, so if your application imposes a hard limit on the response length, you will need to check the length yourself.
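For a Discord bot, that check can be as simple as trimming the text before sending it. A minimal sketch, assuming Discord’s standard 2,000-character message cap:

MAX_DISCORD_MESSAGE = 2000  # Discord's character limit for one message

response = model.generate_content("Summarize the rules of chess.")
text = response.text

# max_output_tokens caps tokens, not characters, so trim the text
# yourself before sending it to the channel.
if len(text) > MAX_DISCORD_MESSAGE:
    text = text[:MAX_DISCORD_MESSAGE - 3] + "..."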

Stop_sequences

A list of strings that, when generated by the LLM, will terminate the response generation. This is useful if you ask for structured output. For example, if you ask for HTML code for an <article> element, your stop_sequences should contain </article>. In the case of open-text responses, it can be left empty.
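Using the HTML example, here is a sketch of how that could look in code (the prompt is made up):

config = genai.GenerationConfig(
    max_output_tokens=1900,
    stop_sequences=["</article>"],  # stop as soon as the element is closed
)

response = model.generate_content(
    "Write the HTML for an <article> element describing a chess opening.",
    generation_config=config,
)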

Presence_penalty and frequency_penalty

Both of these parameters are used to encourage the LLM not to repeat certain phrases or words. They are not supported by the Gemini models, so I won’t go into details here.

Response_mime_type

A string specifying the MIME type of the response you expect to get. This is useful when working with models that generate non-text responses like images, sounds or videos. It is ignored in the case of Gemini models, which generate text-only responses.

Safety settings

Gemini has certain safety features enforced by Google by default. There are topics it will not respond to and information it will not share. However, those default limitations are only a baseline. You might want better control over the types of content you receive from Gemini. Safety settings allow you to put additional restrictions on the LLM’s answers. There are four safety categories:

  • hate speech
  • dangerous content
  • sexually explicit content
  • harassment content

To learn more about the safety settings, visit the official documentation, where you can find detailed information on their meaning and how to use them.
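If you prefer to configure this in code rather than in Vertex AI Studio, the SDK exposes the categories and block thresholds as enums. A minimal sketch, assuming the vertexai.generative_models helpers, with purely illustrative thresholds:

safety_settings = [
    genai.SafetySetting(
        category=genai.HarmCategory.HARM_CATEGORY_HARASSMENT,
        threshold=genai.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    ),
    genai.SafetySetting(
        category=genai.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
        threshold=genai.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE,
    ),
]

model = genai.GenerativeModel(
    model_name="gemini-1.0-pro",
    safety_settings=safety_settings,
)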

Tools and tool config

The tool configuration allows your GenAI model to reach out and interact with the world. This configuration will be covered in an upcoming article, as it is a big topic in itself.

System_instruction

A string with human language instructions. This parameter allows you to control the behavior of your model, giving it additional guidance on how to reply to user requests. You can add extra instructions or context information so the model can give better answers. However, do not treat it as a security solution that will prevent users from abusing your Discord bot.

Do not confuse this parameter with the prompt that’s being sent to the AI to generate a response. The information you include here will be attached to all requests made through this model object. Here are a few examples of what you might want to include:

  • You are an insurance expert. Provide answers in the most professional and polite manner.
  • You are a chat bot designed to answer questions about company X. Do not say anything negative about X. Do not use emojis.
  • Always talk like a pirate!

Those instructions, combined with the user prompt and other settings, will produce the final answer for the users.

Note: This parameter is not available for every Gemini model. Check the documentation to see which models are supported.

Making use of the new knowledge

Let’s now use this new knowledge to improve the bot you created previously. Here’s an example of how you can enhance the user experience with a couple of simple lines of code.

model = genai.GenerativeModel(
    model_name="gemini-1.0-pro",
    generation_config=genai.GenerationConfig(
        temperature=1,
        max_output_tokens=1000,
    ),
    system_instruction="You are a Discord bot named GeminiBot. "
    "Your task is to provide useful information to "
    "users interacting with you. "
    "You should be positive, cheerful and polite. "
    "Feel free to use the default Discord emojis.",
)

And let’s see how this affects your bot’s replies!

Before the change:

[Screenshot: Maciek asks “Hey @GeminiBot who are you?” and GeminiBot replies with a generic message about being a large language model. This is the answer before introducing the system instructions.]

After the change:

[Screenshot: Maciek asks the same question and GeminiBot answers that it’s a friendly Discord bot, here to help, with a couple of nice emojis. This is the answer after introducing the system instructions from the code block above.]

Now feel free to play around with the various settings, like temperature, and see how they affect the answers you get.

I have published my basic version of the bot in a GitHub repository; you can use it as a base for your own creation. 🙂
