Image Generated by Google Gemini Advanced

Token counting | What is it? Why it is required? How to use? — Explained to you in simple words

Maddula Sampath Kumar
Google Cloud - Community
4 min read · Aug 1, 2024


In this post, you will learn what a token is in GenAI, and how to use the count tokens feature when working with the Gemini Pro and Gemini Flash models on the Vertex AI platform.

In simple words, the count tokens feature lets you check how many tokens your input contains and how many tokens the model returns in its response. So let us first understand what tokens are and how they relate to GenAI and LLMs.

What is a token?

When you prompt the AI model with queries, you receive a response. Often, the simplest form of a prompt is a text input, but it can also include other types of data, such as images, audio, or video. But how does the AI model understand these inputs? Computers understand only 1s and 0s, right?

True. Computers only understand 1s and 0s, so we need a way to translate human language into something computers can process. We do this by breaking text down into smaller pieces called tokens. Each token is assigned a unique number, which can then easily be converted into 1s and 0s for the computer to work with. For images, audio, and video, a similar process converts the information into tokens that LLMs can understand and process.
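To make this concrete, here is a toy word-level "tokenizer" that assigns each word a unique integer ID. This is only an illustration: real LLM tokenizers (such as BPE or SentencePiece) split text into subword pieces rather than whole words, but the core idea of mapping text to numbers is the same.

```python
# Toy word-level tokenizer: maps each distinct word to an integer ID.
# Real LLM tokenizers use subword pieces, not whole words.
def build_vocab(texts):
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab

def tokenize(text, vocab):
    return [vocab[word] for word in text.lower().split()]

vocab = build_vocab(["hello world", "what's the weather today"])
print(tokenize("hello world", vocab))                # [0, 1]
print(tokenize("what's the weather today", vocab))   # [2, 3, 4, 5]
```

Notice that the token count here equals the word count only because of the simplistic word-level scheme; a real tokenizer often produces more tokens than words.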

Here is a simple diagram showing how user prompts, tokens, the AI model, and responses relate to one another.

If you are curious about how these tokens are generated, a good place to start is learning about Transformers. Based on my past experience as a data scientist, I think it is great that most of the process of converting data into tokens, or tokenization, is handled under the hood.

Why? What is the use of knowing this information?

Many AI services cost money to use. It takes a lot of money to buy hardware, set it up, and maintain it over time. The costs of electricity and internet connectivity also add up.

Token information gives you a way to estimate your usage of AI services and calculate the billing costs associated with it.
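As a sketch of how that estimation works, the snippet below turns token counts into a dollar estimate. The per-token prices are made-up placeholders for illustration only; always check the Vertex AI pricing page for the actual rates of the model you use.

```python
# Rough cost estimate from token counts.
# These prices are hypothetical placeholders, NOT real Vertex AI rates.
INPUT_PRICE_PER_1K = 0.000125    # $ per 1,000 input tokens (assumed)
OUTPUT_PRICE_PER_1K = 0.000375   # $ per 1,000 output tokens (assumed)

def estimate_cost(input_tokens, output_tokens):
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
         + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# e.g. 10,000 input tokens and 2,000 output tokens:
print(f"${estimate_cost(10_000, 2_000):.6f}")  # $0.002000
```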

How to calculate the token count for a text prompt using the Vertex AI SDK?

Here is a Python code sample for you. In this code, we use the Vertex AI SDK to count the number of tokens in a simple text input.

from vertexai.preview.tokenization import get_tokenizer_for_model

# using the local tokenizer
tokenizer = get_tokenizer_for_model("gemini-1.5-flash")

prompt = "hello world"
response = tokenizer.count_tokens(prompt)
print(f"Prompt Token Count: {response.total_tokens}")

prompt = ["hello world", "what's the weather today"]
response = tokenizer.count_tokens(prompt)
print(f"Prompt Token Count: {response.total_tokens}")

Source: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/count_token/count_token_example.py#L20-L35C33

Using the SDK's local token counter is the recommended approach, as it avoids network latency.

How to calculate the token count for a text prompt using an AI model?

Here is a Python code sample, similar to the one above, but it uses the AI model itself to count tokens.

import vertexai
from vertexai.generative_models import GenerativeModel

# TODO(developer): Update project & location
vertexai.init(project=PROJECT_ID, location=LOCATION)

# using the Vertex AI model as the tokenizer
model = GenerativeModel("gemini-1.5-flash")

# count tokens for a simple text string
prompt = "hello world"
response = model.count_tokens(prompt)
print(f"Prompt Token Count: {response.total_tokens}")
print(f"Prompt Character Count: {response.total_billable_characters}")

# count tokens for a list of strings
prompt = ["hello world", "what's the weather today"]
response = model.count_tokens(prompt)
print(f"Prompt Token Count: {response.total_tokens}")
print(f"Prompt Character Count: {response.total_billable_characters}")

Source: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/count_token/count_token_example.py#L38-L59C33

How to calculate the token count for a text prompt and text response using an AI model?

Now that you have learned how to count tokens for the user prompt, let us learn how to count tokens for model responses.

import vertexai
from vertexai.generative_models import GenerativeModel

# TODO(developer): Update and un-comment below line
# project_id = "PROJECT_ID"
vertexai.init(project=project_id, location="us-central1")
model = GenerativeModel("gemini-1.5-flash")

# count tokens for the prompt
prompt = "Why is the sky blue?"
response = model.count_tokens(prompt)
print(f"Prompt Token Count: {response.total_tokens}")
print(f"Prompt Character Count: {response.total_billable_characters}")

# count tokens for the response
response = model.generate_content(prompt)
usage_metadata = response.usage_metadata
print(f"Prompt Token Count: {usage_metadata.prompt_token_count}")
print(f"Candidates Token Count: {usage_metadata.candidates_token_count}")
print(f"Total Token Count: {usage_metadata.total_token_count}")

Source: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/gemini_count_token_example.py#L20-L47
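When you make many calls in a session, you may want to sum the counts that `usage_metadata` reports. Below is a hypothetical helper (not part of the SDK) that accumulates `prompt_token_count` and `candidates_token_count` across calls; the `SimpleNamespace` objects stand in for real response metadata so the sketch runs on its own.

```python
from types import SimpleNamespace

# Hypothetical helper that accumulates token usage across multiple
# generate_content calls, to estimate total usage for a session.
class UsageTracker:
    def __init__(self):
        self.prompt_tokens = 0
        self.candidates_tokens = 0

    def add(self, usage_metadata):
        # usage_metadata is what each response exposes, with
        # prompt_token_count and candidates_token_count fields
        self.prompt_tokens += usage_metadata.prompt_token_count
        self.candidates_tokens += usage_metadata.candidates_token_count

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.candidates_tokens

# Stand-ins with the same fields as real usage_metadata objects:
tracker = UsageTracker()
tracker.add(SimpleNamespace(prompt_token_count=7, candidates_token_count=18))
tracker.add(SimpleNamespace(prompt_token_count=5, candidates_token_count=10))
print(f"Total Token Count: {tracker.total_tokens}")  # 40
```

In real code you would call `tracker.add(response.usage_metadata)` after each `generate_content` call.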

How to calculate the token count for a multimodal prompt and response using an AI model?

As I mentioned earlier, a prompt can be a combination of text, images, video, and audio. Here is another Python code sample that shows you how to count tokens for multimodal inputs.

import vertexai
from vertexai.generative_models import GenerativeModel, Part

# TODO(developer): Update and un-comment below lines
# project_id = "PROJECT_ID"

vertexai.init(project=project_id, location="us-central1")
model = GenerativeModel("gemini-1.5-flash-001")

# count tokens for a multimodal prompt
contents = [
    Part.from_uri(
        "gs://cloud-samples-data/generative-ai/video/pixel8.mp4",
        mime_type="video/mp4",
    ),
    "Provide a description of the video.",
]
response = model.count_tokens(contents)
print(f"Prompt Token Count: {response.total_tokens}")
print(f"Prompt Character Count: {response.total_billable_characters}")

# count tokens for the response
response = model.generate_content(contents)
usage_metadata = response.usage_metadata
print(f"Prompt Token Count: {usage_metadata.prompt_token_count}")
print(f"Candidates Token Count: {usage_metadata.candidates_token_count}")
print(f"Total Token Count: {usage_metadata.total_token_count}")

Source: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/generative_ai/gemini_count_token_example.py#L52-L84

Please note that in this example, the video (referenced by the `gs://` URI in the code) is stored in a Cloud Storage bucket. To access the file, go to https://console.cloud.google.com/storage/browser/cloud-samples-data/generative-ai/video/pixel8.mp4.

Counting tokens is a good way to start estimating your usage. However, there can be other costs associated with it as well. For example, you may use Google Cloud Storage to store your images, your model may be deployed on the other side of the world, or you may use multiple context caches. To learn more about token counting and billing details, see the following references:

References:



Software Developer - Developer Relations at Google Cloud, Poland. To learn more about me, check https://www.linkedin.com/in/msampathkumar/