Understanding the pricing for Vertex AI Text Bison Foundation Model

Romin Irani
Google Cloud - Community
8 min read · Aug 30, 2023

The Text Bison Foundation Model in Vertex AI, as the documentation states, is fine-tuned to follow natural language instructions and is suitable for a variety of language tasks, such as:

  • Classification
  • Sentiment Analysis
  • Entity extraction
  • Extractive Question Answering
  • Summarization
  • Re-writing text in a different style
  • Ad copy generation
  • Concept ideation

Several demos and tutorials show how easy it is to get started with this Foundation Model. As an example, in Vertex AI Generative AI Studio I ran the sample prompt "I have a day to spare in Mumbai. Give me a day-long itinerary with places to visit" against the model.

I have gone with the defaults for the model and chosen the text-bison@001 model.

The equivalent curl command to invoke the model is straightforward too:

API_ENDPOINT="us-central1-aiplatform.googleapis.com"
PROJECT_ID="YOUR_GCP_PROJECT_ID"
MODEL_ID="text-bison@001"

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://${API_ENDPOINT}/v1/projects/${PROJECT_ID}/locations/us-central1/publishers/google/models/${MODEL_ID}:predict" -d \
$'{
  "instances": [
    {
      "content": "I have a day to spare in Mumbai. Give me a day-long itinerary with places to visit?"
    }
  ],
  "parameters": {
    "candidateCount": 1,
    "maxOutputTokens": 256,
    "temperature": 0.2,
    "topP": 0.8,
    "topK": 40
  }
}'
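For those who prefer Python, the same request can be sketched with just the standard library. This is a minimal sketch assuming you are authenticated via gcloud; the `build_predict_payload` and `predict` helper names are my own, not part of any SDK:

```python
import json
import subprocess
import urllib.request

API_ENDPOINT = "us-central1-aiplatform.googleapis.com"


def build_predict_payload(prompt: str) -> dict:
    """Build the same request body the curl command above sends."""
    return {
        "instances": [{"content": prompt}],
        "parameters": {
            "candidateCount": 1,
            "maxOutputTokens": 256,
            "temperature": 0.2,
            "topP": 0.8,
            "topK": 40,
        },
    }


def predict(project_id: str, prompt: str, model_id: str = "text-bison@001") -> dict:
    """POST the payload to the predict endpoint (requires gcloud auth)."""
    token = subprocess.check_output(
        ["gcloud", "auth", "print-access-token"], text=True
    ).strip()
    url = (
        f"https://{API_ENDPOINT}/v1/projects/{project_id}"
        f"/locations/us-central1/publishers/google/models/{model_id}:predict"
    )
    req = urllib.request.Request(
        url,
        data=json.dumps(build_predict_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `predict("YOUR_GCP_PROJECT_ID", "I have a day to spare in Mumbai...")` mirrors the curl command exactly, parameter for parameter.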

So far so good.

Let’s talk about pricing

It is important that you understand both the capabilities of the model and its pricing. For capabilities, we will keep things simple and stick to the model's documented limits.

At the time of writing this blog post, the documentation states the following:

Max input tokens: 8192
Max output tokens: 1024
Training data: Up to Feb 2023

A token is typically considered to be around 4 characters. Keep this figure in mind for what follows.
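That rule of thumb can be sketched as a back-of-the-envelope estimator. This is only a rough heuristic for English text, not the model's actual tokenizer:

```python
def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Rough estimate using the ~4 characters per token rule of thumb."""
    return max(1, round(len(text) / chars_per_token))


# 32-character prompt -> roughly 8 tokens under this heuristic.
print(estimate_tokens("I have a day to spare in Mumbai."))  # -> 8
```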

So, how much are you going to be charged for your query? In the query, you provided the following prompt:

I have a day to spare in Mumbai. Give me a day-long itinerary with places to visit?

You could try to tokenize the statement roughly yourself, counting the number of words and then the number of characters, and so on.

Is there a better way to do this? Does Vertex AI provide a method to determine this for us? Yes, it does. Enter Get token count and billable characters.

Get token count and billable characters

This feature is available in Preview and is well documented. It is currently supported in the following models:

  • text-bison
  • chat-bison
  • textembedding-gecko

Let's do a quick run (I suggest you do the same for the prompt you are interested in). We will use the following, more elaborate prompt:

I have a day to spare in Mumbai. Could you build out an itenary for a single day trip in Mumbai for someone who is new to the city. I would like to use public transport as much as possible.

Assuming you have the gcloud environment set up on your machine (or better still, use Cloud Shell), this is the command we will execute to get the token and character counts:

export GCP_PROJECT="Your GCP PROJECT ID"
export MODEL_ID="text-bison"


curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://us-central1-aiplatform.googleapis.com/v1beta1/projects/$GCP_PROJECT/locations/us-central1/publishers/google/models/$MODEL_ID:countTokens"

The request.json that we will pass in the above curl command is shown below:

{
  "instances": [
    { "prompt": "I have a day to spare in Mumbai. Could you build out an itenary for a single day trip in Mumbai for someone who is new to the city. I would like to use public transport as much as possible." }
  ]
}

We get the following output:

{
  "totalTokens": 45,
  "totalBillableCharacters": 150
}

If I run the above prompt through the OpenAI tokenizer tool, I get approximately the same numbers.

Nice! This is a good way to understand the totalTokens and totalBillableCharacters that you will eventually be charged for.

This is not the end of it, since you are also charged for the output tokens. For the above prompt, the model returned the following response:

**Day 1:**

* Morning:
* Start your day with a visit to the Chhatrapati Shivaji Maharaj Terminus (CST), a UNESCO World Heritage Site and one of the busiest railway stations in the world. Take some time to explore the station's beautiful architecture, including its soaring dome and intricate carvings.
* After visiting the CST, head to the nearby Haji Ali Dargah, a Muslim shrine that is one of the most popular tourist destinations in Mumbai. Take a walk along the seashore and enjoy the views of the Arabian Sea.
* Afternoon:
* In the afternoon, visit the Dharavi slum, the largest slum in Asia. Learn about the lives of the people who live in Dharavi and see how they have made a home for themselves in this challenging environment.
* After visiting Dharavi, take a rickshaw ride through the city's streets. This is a great way to get a feel for the city and see some of its most iconic landmarks, such as the Gateway of India and the Taj Mahal Palace Hotel.
* Evening:
* In the evening, head to the Colaba Causeway, a popular shopping street. Browse the stalls for souvenirs and gifts, or grab a bite to eat at one of the many restaurants.
* After dinner, catch a Bollywood movie at a local theater. This is a great way to experience Indian culture and see some of the country's most popular films.

This is just a suggested itinerary, of course, and you can customize it to fit your interests and time constraints. But no matter how you spend your day in Mumbai, you're sure to have a memorable experience.

Running this response through the OpenAI tokenizer tool, the output comes to roughly 1,500 characters.

Vertex AI Pricing

You should bookmark the Pricing for Generative AI support on Vertex AI page.

Before we get to the specifics for the Text Bison model, a few general notes as mentioned in the documentation:

  • You are charged by every 1,000 characters of input (prompt) and every 1,000 characters of output (response).
  • Characters are counted by UTF-8 code points and white space is excluded from the count.
  • Prediction requests that lead to filtered responses are charged for the input only.
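The character-counting rule above can be reproduced in a few lines of Python. This is a sketch based on my reading of the rule (code points, whitespace excluded), applied to the elaborate prompt from earlier:

```python
def billable_characters(text: str) -> int:
    """Count Unicode code points, excluding whitespace, per the pricing rule."""
    return sum(1 for ch in text if not ch.isspace())


prompt = (
    "I have a day to spare in Mumbai. Could you build out an itenary for a "
    "single day trip in Mumbai for someone who is new to the city. I would "
    "like to use public transport as much as possible."
)
print(billable_characters(prompt))  # -> 150, matching the countTokens output
```

Note that this local count agrees with the `totalBillableCharacters: 150` returned by the countTokens endpoint for the same prompt.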

Now, let us look at the pricing for the Text Bison model: the price per 1,000 characters is $0.001 at the time of this blog post.

Let’s look at our previous example and the input / output tokens/characters returned:

Input: ~150 characters
Output: ~1,500 characters

Assuming that 10 calls are made, the total cost works out as follows (using the template provided by the official documentation):

Input cost:
150 input characters x 10 prompts = 1,500 total input characters;
1,500 total input characters x ($0.001 / 1000) = $0.0015 input cost.

Output cost:
1500 output characters x 10 prompts = 15,000 total output characters;
15,000 total output characters x ($0.001 / 1000) = $0.015 output cost.

Total cost:
$0.0015 input cost + $0.015 output cost = $0.0165 total cost.
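The arithmetic above generalizes into a small helper, useful for plugging in your own character counts and call volumes. The `generation_cost` name is mine; the formula follows the documentation's template:

```python
def generation_cost(input_chars: int, output_chars: int, calls: int,
                    price_per_1000_chars: float = 0.001) -> float:
    """Total cost: (input + output) characters x calls x price per 1,000 chars."""
    total_in = input_chars * calls
    total_out = output_chars * calls
    return (total_in + total_out) * price_per_1000_chars / 1000


# The worked example above: 150 input chars, 1,500 output chars, 10 calls.
print(generation_cost(150, 1500, 10))  # ~ $0.0165
```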

I hope this helps you understand the pricing of the Text Bison model. The goal of this article is to make sure you are fully aware of how the pricing could add up if this scales to hundreds of users making hundreds of queries per hour or per day.

A few considerations

A couple of points are important to understand here. The first is how token and character counts can vary when the text is not in English. For example, if I pass the same prompt in Hindi and get the output in Hindi, we are looking at the following numbers:

The prompt in Hindi is as follows:

मुंबई में मेरे पास एक दिन खाली है। क्या आप किसी ऐसे व्यक्ति के लिए मुंबई की एक दिवसीय यात्रा की योजना बना सकते हैं जो शहर में नया है। मैं यथासंभव सार्वजनिक परिवहन का उपयोग करना चाहूँगा।

The Vertex AI token count endpoint returns the following output:

{
  "totalTokens": 62,
  "totalBillableCharacters": 148
}

Try this with different languages and you will find that totalTokens and totalBillableCharacters differ across languages (due to language density and other factors). This also raises an interesting question of whether charging by tokens or by characters is more beneficial, but that is a completely different discussion; I recommend you experiment with it, especially if you have to support several languages.

The next point concerns the just-announced Text Bison 32K model. Its properties are as follows:

You can see a huge bump in the token limits supported: a total of 32K tokens, with both input and output tokens counting towards the 32K maximum. The points to be made here are the following:

  1. It makes a great demo to insert an entire 40–50 page document into the prompt itself and query the document directly.
  2. If you are going to do that, do consider measuring first what your per-call cost is going to be. Thousands of characters across hundreds of users can add up quickly.
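To make that concrete, here is a quick back-of-the-envelope estimate. The ~3,000 characters per page figure is my own assumption for illustration, and the helper name is hypothetical; the $0.001 per 1,000 characters rate is the one discussed above:

```python
def long_prompt_input_cost(pages: int, chars_per_page: int = 3000,
                           price_per_1000_chars: float = 0.001) -> float:
    """Input-side cost of stuffing a whole document into a single prompt."""
    return pages * chars_per_page * price_per_1000_chars / 1000


# A hypothetical 50-page document (~3,000 characters/page assumed):
cost_per_call = long_prompt_input_cost(50)     # ~ $0.15 per call, input only
cost_at_scale = cost_per_call * 100 * 100      # 100 users x 100 calls each
print(cost_per_call, cost_at_scale)            # ~ $0.15 and ~ $1,500
```

Fifteen cents per call sounds tiny, but at 100 users making 100 calls each it is already about $1,500, before counting any output characters.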

I hope this helps clarify the pricing model. The above are just my views as a developer on how I went about understanding pricing for the Text Bison model. I am not a pricing expert and would love to hear your views and any other points that could be of interest to all of us.
