A Comprehensive Guide to Google Cloud Generative AI Studio

Rubens Zimbres
Google Cloud - Community
25 min read · Jul 25, 2023

It’s been some time since I wanted to write a comprehensive walkthrough of Google Cloud Generative AI Studio. Here I present most of its options, including Python code for both the server and the client for many applications, which can easily be tested locally and deployed.

The last part of this article presents the fine-tuning of a large language model in Generative AI Studio.

Generative AI Studio contains the following services:

  • Language
  • Speech
  • Video

In Language, you will be able to test, tune, and deploy generative AI language models. You can use the PaLM API for question answering, code generation, content generation, chat, summarization, classification, and writing.


CHAT

First, let’s explore the chat for customers with basic company information. This is basically a question-answering chatbot, like the one I built in my previous article, found here. The Q&A app is available here, on my website.

In the Google Cloud console, you add a plain-text context with company information (products, prices, location, working hours), add some examples of user input and AI response, and then run inference:

At the top right of the console, you will find the model type (PaLM 2 chat-bison@001) and its hyperparameters. To tune these hyperparameters, refer to this link.

Let’s talk about Python code. The second example is a customer service chatbot. The setup is exactly the same: you have a context and examples, and you can interact with the chatbot.

First, install the necessary libraries:

pip3 install google-cloud-aiplatform==1.25
pip3 install Flask==2.2.2
pip3 install watchdog

On the top right of the console, click <> View Code. First, we will get the code in Python. We will do the same for all examples. Insert the code in a file chat.py:

import vertexai
from vertexai.preview.language_models import ChatModel, InputOutputTextPair
import os

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/home/user/key.json'

vertexai.init(project="your-project", location="us-central1")
chat_model = ChatModel.from_pretrained("chat-bison@001")
parameters = {
    "temperature": 0.2,
    "max_output_tokens": 1024,
    "top_p": 0.8,
    "top_k": 40
}
chat = chat_model.start_chat(
    context="""You are a customer service representative of Cymbal.
Here is Cymbal policy: How many days do I have to return my purchase?
We offer free returns and exchanges within 30 days of your delivery, with exceptions as described in our Returns Policy. Certain items are designated as final sale and not eligible for returns or exchanges. All on-sale purchases are final.
""",
    examples=[
        InputOutputTextPair(
            input_text="""I was in a car accident last month and couldn't return the item in the 30 days time window. Can you make an exception for me?""",
            output_text="""I understand that you had an emergency and were unable to return your order within the 30-day window. I would be happy to make an exception for you. Please contact our customer service department at 1-555-010-2667 and they will be able to assist you with your return.
Once your return is approved, you will be issued a return label. Please pack the item carefully and ship it back to us. We will process your refund within 3-5 business days of receiving the returned item."""
        ),
        InputOutputTextPair(
            input_text="""I forgot to return the item within 30 days. Can you make an exception for me?""",
            output_text="""I understand that you want to return the item, but we are unable to return your order since you have missed the 30-day window. Please let me know anything else I can assist you with."""
        )
    ]
)

data = "What products does Cymbal sell?"  # sample user message
response = chat.send_message("""{}""".format(data), **parameters)
print(response.text)

Then, wrap this code in a Flask application. It will look like this:

import vertexai
from vertexai.preview.language_models import ChatModel, InputOutputTextPair
import os
from flask import Flask, request, jsonify
import json

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/home/user/key.json'

vertexai.init(project="your-project", location="us-central1")
chat_model = ChatModel.from_pretrained("chat-bison@001")
parameters = {
    "temperature": 0.2,
    "max_output_tokens": 1024,
    "top_p": 0.8,
    "top_k": 40
}
chat = chat_model.start_chat(
    context="""You are a customer service representative of Cymbal.
Here is Cymbal policy: How many days do I have to return my purchase?
We offer free returns and exchanges within 30 days of your delivery, with exceptions as described in our Returns Policy. Certain items are designated as final sale and not eligible for returns or exchanges. All on-sale purchases are final.
""",
    examples=[
        InputOutputTextPair(
            input_text="""I was in a car accident last month and couldn't return the item in the 30 days time window. Can you make an exception for me?""",
            output_text="""I understand that you had an emergency and were unable to return your order within the 30-day window. I would be happy to make an exception for you. Please contact our customer service department at 1-555-010-2667 and they will be able to assist you with your return.
Once your return is approved, you will be issued a return label. Please pack the item carefully and ship it back to us. We will process your refund within 3-5 business days of receiving the returned item."""
        ),
        InputOutputTextPair(
            input_text="""I forgot to return the item within 30 days. Can you make an exception for me?""",
            output_text="""I understand that you want to return the item, but we are unable to return your order since you have missed the 30-day window. Please let me know anything else I can assist you with."""
        )
    ]
)

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    if request.get_json():
        x = json.dumps(request.get_json())
        x = json.loads(x)
    else:
        x = {}
    data = x["text"]
    response = chat.send_message("""{}""".format(data), **parameters)
    response = jsonify(response.text)
    print(response)

    return response

if __name__ == "__main__":
    app.run(port=8080, host='0.0.0.0', debug=True)

Note that we are using ChatModel.from_pretrained(“chat-bison@001”)

As we are testing locally, you will need to set os.environ[‘GOOGLE_APPLICATION_CREDENTIALS’] = ….. in the script, plus:

gcloud auth login
gcloud config set project your-project

For Python code deployed in containers you won’t need to define Google Application Credentials, as a service account will be used to access the API. OK, we just set up the server. Now let’s create a file that calls this server, named call_flask.py:

import requests

url = 'http://127.0.0.1:8080/predict'
r = requests.post(url, json={"text": "What products does Cymbal sell?"})
print(r.json())

Now, supposing you are using VS Code, split the terminal in two, one left for the server and one right for the client:

In the terminal on the left you will run:

python3 chat.py

In the terminal on the right, you will run:

python3 call_flask.py

This will run the Flask server on the left and submit your question “What products does Cymbal sell?” to it:

You will get an HTTP 200 (success) status code and the response in the right terminal.

“Cymbal sells a variety of products, including clothing, shoes, accessories, and home goods. We offer a wide selection of brands and styles to choose from, so you’re sure to find something you love.”

It’s done. Now you have a chatbot with Generative AI! You may ask: does it replace Dialogflow? If you just want a simple Q&A like the one on my website, yes. Otherwise the answer is no, because Dialogflow CX also takes care of the flows in the conversation. With this Flask application, you have one endpoint, called via webhook, for the conversation, Q&A, etc. You may want another endpoint that effectively queries a database of products or adds orders in another part of the conversation. These endpoints are called by the Dialogflow webhook.

In Dialogflow, you submit the raw user message from the chat to the server endpoint via webhook and get the response back. However, the response payload you return to Dialogflow must be configured in the server’s Flask application as:

response = jsonify({
    "fulfillment_response": {
        "messages": [{
            "text": {
                "text": [
                    response.text
                ]
            }
        }]
    }
})
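To make this concrete, here is a minimal sketch of such a webhook endpoint, assuming the Dialogflow CX WebhookRequest format (where the raw user utterance arrives in the text field) and reusing the chat object and parameters from the server above:

@app.route('/webhook', methods=['POST'])
def webhook():
    req = request.get_json()
    user_message = req.get("text", "")  # raw end-user utterance sent by Dialogflow CX
    result = chat.send_message(user_message, **parameters)
    # Wrap the model output in the fulfillment_response payload Dialogflow expects
    return jsonify({
        "fulfillment_response": {
            "messages": [{"text": {"text": [result.text]}}]
        }
    })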

SUMMARIZATION

In Summarization, you have Freeform (open field) and Structured (table) inputs. Both use the PaLM API model text-bison@001.

The Freeform is as simple as clicking a button:

Note that the last line of the text ends with Summary:

In Structured summarization, we provide examples of text summarization to the LLM. You may want to summarize text with a specific emphasis, such as quantitative or qualitative advantages, a simple summary, etc.:

Then, if we want to summarize a phone call conversation, we enter the phone call transcript (obtained via the Speech-to-Text API) and generate the summary. Now, instead of ChatModel.from_pretrained("chat-bison@001"), we will use TextGenerationModel.from_pretrained("text-bison@001"). In <> View Code:

import vertexai
import os
from vertexai.language_models import TextGenerationModel
from flask import Flask, request, jsonify
import json

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/home/user/key.json'

vertexai.init(project="your-project", location="us-central1")
parameters = {
    "temperature": 0.2,
    "max_output_tokens": 256,
    "top_p": 0.8,
    "top_k": 40
}
model = TextGenerationModel.from_pretrained("text-bison@001")

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    if request.get_json():
        x = json.dumps(request.get_json())
        x = json.loads(x)
    else:
        x = {}
    data = x["text"]
    print(data)

    response = model.predict("""Provide a summary with about two sentences for the following article: Beyond our own products, we think it's important to make it easy, safe and scalable for others to benefit from these advances by building on top of our best models. Next month, we'll start onboarding individual developers, creators and enterprises so they can try our Generative Language API, initially powered by LaMDA with a range of models to follow. Over time, we intend to create a suite of tools and APIs that will make it easy for others to build more innovative applications with AI. Having the necessary compute power to build reliable and trustworthy AI systems is also crucial to startups, and we are excited to help scale these efforts through our Google Cloud partnerships with Cohere, C3.ai and Anthropic, which was just announced last week. Stay tuned for more developer details soon.
Summary: Google is making its AI technology more accessible to developers, creators, and enterprises. Next month, Google will start onboarding developers to try its Generative Language API, which will initially be powered by LaMDA. Over time, Google intends to create a suite of tools and APIs that will make it easy for others to build more innovative applications with AI. Google is also excited to help scale these efforts through its Google Cloud partnerships with Cohere, C3.ai, and Anthropic.

Provide a summary with about two sentences for the following article: The benefits of electric kitchens go beyond climate impact, starting with speed. The first time I ever cooked on induction (electric) equipment, the biggest surprise was just how incredibly fast it is. In fact, induction boils water twice as fast as traditional gas equipment and is far more efficient — because unlike a flame, electric heat has nowhere to escape. At Bay View, our training programs help Google chefs appreciate and adjust to the new pace of induction. The speed truly opens up whole new ways of cooking.
Summary: Electric kitchens are faster, more efficient, and better for the environment than gas kitchens. Induction cooking is particularly fast, boiling water twice as fast as traditional gas equipment. This speed opens up whole new ways of cooking. Google chefs are trained to appreciate and adjust to the new pace of induction cooking at Bay View.

Provide a summary with about two sentences for the following article: We're also using AI to forecast floods, another extreme weather pattern exacerbated by climate change. We've already helped communities to predict when floods will hit and how deep the waters will get — in 2021, we sent 115 million flood alert notifications to 23 million people over Google Search and Maps, helping save countless lives. Today, we're sharing that we're now expanding our coverage to more countries in South America (Brazil and Colombia), Sub-Saharan Africa (Burkina Faso, Cameroon, Chad, Democratic Republic of Congo, Ivory Coast, Ghana, Guinea, Malawi, Nigeria, Sierra Leone, Angola, South Sudan, Namibia, Liberia, and South Africa), and South Asia (Sri Lanka). We've used an AI technique called transfer learning to make it work in areas where there's less data available. We're also announcing the global launch of Google FloodHub, a new platform that displays when and where floods may occur. We'll also be bringing this information to Google Search and Maps in the future to help more people to reach safety in flooding situations.
Summary: Google is using AI to forecast floods in South America, Sub-Saharan Africa, South Asia, and other parts of the world. The AI technique of transfer learning is being used to make it work in areas where there's less data available. Google FloodHub, a new platform that displays when and where floods may occur, has also been launched globally. This information will also be brought to Google Search and Maps in the future to help more people reach safety in flooding situations.

Provide a summary with about two sentences for the following article: In order to learn skiing, you must first be educated on the proper use of the equipment. This includes learning how to properly fit your boot on your foot, understand the different functions of the ski, and bring gloves, goggles etc. Your instructor starts you with one-footed ski drills. Stepping side-to-side, forward-and-backward, making snow angels while keeping your ski flat to the ground, and gliding with the foot not attached to a ski up for several seconds. Then you can put on both skis and get used to doing them with two skis on at once. Next, before going down the hill, you must first learn how to walk on the flat ground and up small hills through two methods, known as side stepping and herringbone. Now it's time to get skiing! For your first attempted run, you will use the skills you just learned on walking up the hill, to go down a small five foot vertical straight run, in which you will naturally stop on the flat ground. This makes you learn the proper athletic stance to balance and get you used to going down the hill in a safe, controlled setting. What do you need next? To be able to stop yourself. Here, your coach will teach you how to turn your skis into a wedge, also commonly referred to as a pizza, by rotating legs inward and pushing out on the heels. Once learned, you practice a gliding wedge down a small hill where you gradually come to a stop on the flat ground thanks to your wedge. Finally, you learn the necessary skill of getting up after falling, which is much easier than it looks, but once learned, a piece of cake.
Summary: Skiing is a great way to enjoy the outdoors and get some exercise. It can be a little daunting at first, but with a little practice, you'll be skiing like a pro in no time.

Provide a summary with about two sentences for the following article: Yellowstone National Park is an American national park located in the western United States, largely in the northwest corner of Wyoming and extending into Montana and Idaho. It was established by the 42nd U.S. Congress with the Yellowstone National Park Protection Act and signed into law by President Ulysses S. Grant on March 1, 1872. Yellowstone was the first national park in the U.S. and is also widely held to be the first national park in the world.The park is known for its wildlife and its many geothermal features, especially the Old Faithful geyser, one of its most popular. While it represents many types of biomes, the subalpine forest is the most abundant. It is part of the South Central Rockies forests ecoregion.
Summary: Yellowstone National Park is the first national park in the United States and the world. It is located in the western United States, largely in the northwest corner of Wyoming and extending into Montana and Idaho. The park is known for its wildlife and its many geothermal features, especially the Old Faithful geyser.

Provide a summary with about two sentences for the following article: {}
Summary:""".format(data), **parameters)
    response = jsonify(response.text)
    return response

if __name__ == "__main__":
    app.run(port=8080, host='0.0.0.0', debug=True)

The Flask application is identical.
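Calling it is the same as before; for example (the article text below is just an illustrative placeholder):

import requests

url = 'http://127.0.0.1:8080/predict'
article = """Google Cloud announced new generative AI capabilities in Vertex AI, including new models and tooling for developers."""  # any article you want summarized
r = requests.post(url, json={"text": article})
print(r.json())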

CLASSIFICATION

Once again, we have Freeform and Structured inputs. The Freeform takes a context and is able to predict sentiment, classify articles into topics, etc.:

If we click <> View Code, we will see we are using TextGenerationModel.from_pretrained("text-bison@001") again. The Flask app remains the same in predict.py:

import vertexai
import os
from vertexai.language_models import TextGenerationModel
from flask import Flask, request, jsonify
import json

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/home/user/key.json'

vertexai.init(project="your-project", location="us-central1")
parameters = {
    "temperature": 0.2,
    "max_output_tokens": 256,
    "top_p": 0.8,
    "top_k": 40
}
model = TextGenerationModel.from_pretrained("text-bison@001")

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    if request.get_json():
        x = json.dumps(request.get_json())
        x = json.loads(x)
    else:
        x = {}
    data = x["text"]
    print(data)

    response = model.predict("""
Multi-choice problem: What is the topic of this text?
- entertainment
- technology
- politics
- sports
- business
- health
- fun
- culture
- science

Text: {}""".format(data), **parameters)
    response = jsonify(response.text)
    return response

if __name__ == "__main__":
    app.run(port=8080, host='0.0.0.0', debug=True)

And we will call this Flask app with call_flask2.py, with the same dynamics as the first test:

import requests

url = 'http://127.0.0.1:8080/predict'
r = requests.post(url, json={"text": """Samba, is a name or prefix used for several rhythmic variants, such as samba urbano carioca (urban Carioca samba), samba de roda (sometimes also called rural samba), recognized as part of the Intangible Cultural Heritage of Humanity by UNESCO, amongst many other forms of Samba, mostly originated in the Rio de Janeiro and Bahia States. Samba is a broad term for many of the rhythms that compose the better known Brazilian music genres that originated in the Afro-Brazilian communities of Bahia in the late 19th century and early 20th century, having continued its development on the communities of Rio de Janeiro in the early 20th century. Having its roots in the Afro-Brazilian Candomblé, as well as other Afro-Brazilian and Indigenous folk traditions, such as the traditional Samba de Caboclo, it is considered one of the most important cultural phenomena in Brazil and one of the country's symbols. Present in the Portuguese language at least since the 19th century, the word "samba" was originally used to designate a "popular dance". Over time, its meaning has been extended to a "batuque-like circle dance", a dance style, and also to a "music genre". This process of establishing itself as a musical genre began in the 1910s and it had its inaugural landmark in the song "Pelo Telefone", launched in 1917. Despite being identified by its creators, the public, and the Brazilian music industry as "samba", this pioneering style was much more connected from the rhythmic and instrumental point of view to maxixe than to samba itself."""})
print(r.json())

You can also do sentiment analysis by providing some examples in the Structured form (few-shot learning):

Let’s see the code:

import vertexai
from vertexai.language_models import TextGenerationModel
import os
from flask import Flask, request, jsonify
import json

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/home/user/key.json'

vertexai.init(project="your-project", location="us-central1")
parameters = {
    "temperature": 0.2,
    "max_output_tokens": 256,
    "top_p": 0.8,
    "top_k": 40
}
model = TextGenerationModel.from_pretrained("text-bison@001")

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    if request.get_json():
        x = json.dumps(request.get_json())
        x = json.loads(x)
    else:
        x = {}
    data = x["text"]
    print(data)

    response = model.predict("""input: I had to compare two versions of Hamlet for my Shakespeare class and unfortunately I picked this version. Everything from the acting (the actors deliver most of their lines directly to the camera) to the camera shots (all medium or close up shots...no scenery shots and very little back ground in the shots) were absolutely terrible. I watched this over my spring break and it is very safe to say that I feel that I was gypped out of 114 minutes of my vacation. Not recommended by any stretch of the imagination.
Classify the sentiment of the message: negative

input: This Charles outing is decent but this is a pretty low-key performance. Marlon Brando stands out. There's a subplot with Mira Sorvino and Donald Sutherland that forgets to develop and it hurts the film a little. I'm still trying to figure out why Charlie want to change his name.
Classify the sentiment of the message: negative

input: My family has watched Arthur Bach stumble and stammer since the movie first came out. We have most lines memorized. I watched it two weeks ago and still get tickled at the simple humor and view-at-life that Dudley Moore portrays. Liza Minelli did a wonderful job as the side kick - though I'm not her biggest fan. This movie makes me just enjoy watching movies. My favorite scene is when Arthur is visiting his fiancée's house. His conversation with the butler and Susan's father is side-spitting. The line from the butler, "Would you care to wait in the Library" followed by Arthur's reply, "Yes I would, the bathroom is out of the question", is my NEWMAIL notification on my computer.
Classify the sentiment of the message: positive

input: {}
Classify the sentiment of the message:
""".format(data), **parameters)
    response = jsonify(response.text)
    return response

if __name__ == "__main__":
    app.run(port=8080, host='0.0.0.0', debug=True)
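Calling it follows the same pattern; for example, with a made-up review:

import requests

url = 'http://127.0.0.1:8080/predict'
r = requests.post(url, json={"text": "The movie was a complete waste of two hours of my life."})
print(r.json())  # expected: "negative"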

Given the recent advancements in Generative AI, our greatest limitation is no longer technology but rather our creativity.

EXTRACTION

For Question Answering, please refer to my other article, Deploying a Google Cloud Generative AI App in a Website with Cloud Run.

For Troubleshooting Documentation, we will use the Freeform:

Now, the code from <> View Code. The only change is in the response variable: the customer just has to tell support what color the light is to get help.

import vertexai
import os
from vertexai.language_models import TextGenerationModel
from flask import Flask, request, jsonify
import json

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/home/user/key.json'

vertexai.init(project="your-project", location="us-central1")
parameters = {
    "temperature": 0.2,
    "max_output_tokens": 256,
    "top_p": 0.8,
    "top_k": 40
}
model = TextGenerationModel.from_pretrained("text-bison@001")

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    if request.get_json():
        x = json.dumps(request.get_json())
        x = json.loads(x)
    else:
        x = {}
    data = x["text"]
    print(data)

    response = model.predict("""Answer the question using the text below. Respond with only the text provided.
Question: What should I do to fix my disconnected wifi? The light on my Google Wifi router is {}.

Text:
Color: No light
What it means: Router has no power or the light was dimmed in the app.
What to do:
Check that the power cable is properly connected to your router and to a working wall outlet.
If your device is already set up and the light appears off, check your light brightness settings in the app.
If there's still no light, contact Wifi customer support.

Color: Solid white, no light, solid white
What it means: Device is booting up.
What to do:
Wait for the device to boot up. This takes about a minute. When it's done, it will slowly pulse white, letting you know it's ready for setup.

Color: Slow-pulsing white
What it means: Device is ready for set up.
What to do:
Use the Google Home app to set up your router.

Color: Solid white
What it means: Router is online and all is well.
What to do:
You're online. Enjoy!

Color: Slowly pulsing yellow
What it means: There is a network error.
What to do:
Check that the Ethernet cable is connected to both your router and your modem and both devices are turned on. You might need to unplug and plug in each device again.

Color: Fast blinking yellow
What it means: You are holding down the reset button and are factory resetting this device.
What to do:
If you keep holding down the reset button, after about 12 seconds, the light will turn solid yellow. Once it is solid yellow, let go of the factory reset button.

Color: Solid yellow
What it means: Router is factory resetting.
What to do:
This can take up to 10 minutes. When it's done, the device will reset itself and start pulsing white, letting you know it's ready for setup.

Color: Solid red
What it means: Something is wrong. Critical failure.
What to do:
Factory reset the router. If the light stays red, contact Wifi customer support.
""".format(data), **parameters)
    response = jsonify(response.text)
    return response

if __name__ == "__main__":
    app.run(port=8080, host='0.0.0.0', debug=True)

The client (call_flask.py) is identical again.

import requests

url = 'http://127.0.0.1:8080/predict'
r = requests.post(url, json={"text": """yellow and blinking slowly"""})
print(r.json())

The response:

“Check that the Ethernet cable is connected to both your router and your modem and both devices are turned on. You might need to unplug and plug in each device again.”

Note the pattern in the code presented so far: the scripts are extremely similar; only the prompts change.

The interesting thing here is that a similar result for Q&A can be obtained using embeddings. However, with Gen AI you get few-shot learning, so you don’t need a full dataset. If you opt for embeddings, you must create a full dataset of question-answer pairs. Then you use the Two Towers model, as I wrote here:

Another way to create embeddings is to use the textembedding-gecko model in Google Cloud Model Garden, which is very simple:

from vertexai.preview.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("textembedding-gecko")

embeddings = model.get_embeddings(["Dinner in New York City"])
for embedding in embeddings:
    vector = embedding.values
    print(vector)

Instead of using the user-item or user-movie pair, you create the embeddings for each question-answer pair, feed the embeddings into the Two Towers neural network, and make inference wherever you want: Cloud Run, Vertex AI, or Matching Engine (as it is a TensorFlow model, .pb).
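If you don’t want to train a Two Towers model, a simple nearest-neighbor lookup over gecko embeddings already illustrates the retrieval idea. Here is a minimal sketch with hypothetical FAQ pairs, using plain NumPy cosine similarity instead of Matching Engine:

import numpy as np
from vertexai.preview.language_models import TextEmbeddingModel

model = TextEmbeddingModel.from_pretrained("textembedding-gecko")

# Hypothetical FAQ: question -> answer
faq = {
    "How long do refunds take?": "Refunds are processed within 3-5 business days.",
    "Do you ship internationally?": "We currently ship to the US and Canada only.",
}
questions = list(faq.keys())
question_vecs = np.array([e.values for e in model.get_embeddings(questions)])

# Embed the user query and return the answer of the closest stored question
query_vec = np.array(model.get_embeddings(["When will I get my money back?"])[0].values)
scores = question_vecs @ query_vec / (
    np.linalg.norm(question_vecs, axis=1) * np.linalg.norm(query_vec)
)
print(faq[questions[int(np.argmax(scores))]])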

WRITING AND IDEATION

These are very similar. In Writing, you generate text according to a set of instructions and use it to write an email, announce a product, create a job posting, etc. The Flask app is the same (again!).

Server:

import vertexai
from vertexai.language_models import TextGenerationModel
import os
from flask import Flask, request, jsonify
import json

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/home/user/key.json'

vertexai.init(project="your-project", location="us-central1")
parameters = {
    "temperature": 0.2,
    "max_output_tokens": 256,
    "top_p": 0.8,
    "top_k": 40
}
model = TextGenerationModel.from_pretrained("text-bison@001")

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    if request.get_json():
        x = json.dumps(request.get_json())
        x = json.loads(x)
    else:
        x = {}
    data = x["text"]
    response = model.predict("""{}""".format(data), **parameters)
    response = jsonify(response.text)
    print(response)
    return response

if __name__ == "__main__":
    app.run(port=8080, host='0.0.0.0', debug=True)

Client:

import requests

url = 'http://127.0.0.1:8080/predict'
r = requests.post(url, json={"text": """Write an ad copy for a part-time data entry job targeting college students. The job pays $15/hour and you can work from home."""})
print(r.json())

Response:

**Are you a college student looking for a part-time job that pays well and allows you to work from home?**

If so, then you’re in luck! We are looking for data entry workers to join our team. The job is simple and easy to learn, and you can earn up to $15 per hour.

**What are the requirements?**

* You must be a college student
* You must have a reliable computer and internet connection
* You must be able to work independently and meet deadlines

**What are the benefits?**

* The job is flexible and you can set your own hours
* You can work from home, so you can stay in your pajamas all day if you want
* The pay is competitive and you can earn up to $15 per hour

**If you’re interested, please send us your resume and we will be in touch.**

**We look forward to hearing from you!**

In Ideation, we generate ideas based on a specific topic:

“Give me ten interview questions for the role of program manager.”

<> View Code:

import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project="your-project", location="us-central1")
parameters = {
    "temperature": 0.2,
    "max_output_tokens": 256,
    "top_p": 0.8,
    "top_k": 1
}
model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict(
    """Give me ten interview questions for the role of program manager.""",
    **parameters
)
print(f"Response from Model: {response.text}")

Regarding language, we also have Codey, a chatbot that provides code iteratively. Codey is an AI that provides coding features for Colab, like code completion, natural-language-to-code generation, and even a code-assisting chatbot. It uses codechat-bison@001 and also allows chat interactions to build and run your code, as it remembers the history of past interactions:

Google also has Duet AI for these tasks; it is not GA yet, but it is very powerful. If you check <> View Code, Codey follows the same dynamics as the chat models, but for code, with repeated chat.send_message calls and memory:

import vertexai
from vertexai.preview.language_models import CodeChatModel

vertexai.init(project="your-project", location="us-central1")
chat_model = CodeChatModel.from_pretrained("codechat-bison@001")
parameters = {
    "temperature": 0.2,
    "max_output_tokens": 1024
}
chat = chat_model.start_chat()
response = chat.send_message("""I want the code to create a function that sums 3 numbers""", **parameters)
print(f"Response from Model: {response.text}")
response = chat.send_message("""now, given numbers 1 , 4 and 6, run sum()""", **parameters)
print(f"Response from Model: {response.text}")

Text-To-Speech

Text-to-Speech is very useful for voicebots, where an interaction defined in text becomes voice, using Dialogflow.

<> View Code

"""Synthesizes speech from the input string of text."""
from google.cloud import texttospeech
import os

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/home/user/key.json'

client = texttospeech.TextToSpeechClient()

input_text = texttospeech.SynthesisInput(text="This research experimentally evaluates the ability of a drone-mounted wireless attack platform (DWAP) equipped with a directional antenna to conduct wireless attacks effectively.")

# Note: the voice can also be specified by name.
# Names of voices can be retrieved with client.list_voices().
voice = texttospeech.VoiceSelectionParams(
language_code="en-US",
name="en-US-Studio-O",
)

audio_config = texttospeech.AudioConfig(
audio_encoding=texttospeech.AudioEncoding.LINEAR16,
speaking_rate=1
)

response = client.synthesize_speech(
request={"input": input_text, "voice": voice, "audio_config": audio_config}
)

# The response's audio_content is binary.
with open("output.mp3", "wb") as out:
out.write(response.audio_content)
print('Audio content written to file "output.mp3"')

To run this code in Python, you must enable the Text-To-Speech API first. There are hundreds of supported voices and languages. You can find the complete list here.
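For example, you can quickly inspect the available voices for a given language with client.list_voices():

from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()
for voice in client.list_voices(language_code="en-US").voices:
    print(voice.name, texttospeech.SsmlVoiceGender(voice.ssml_voice_gender).name)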

Speech-To-Text

This model uses Chirp, a version of Google’s Universal Speech Model (USM) with 2B+ parameters that can transcribe over 100 languages. Chirp achieves state-of-the-art Word Error Rate (WER) on a variety of public test sets and languages. Use cases:

  • Google Cloud Contact Center AI solution (CCAI). Check my article here.
  • Turning audio containing speech into formatted text representation.
  • Captioning of videos for providing subtitles in English and other languages, when associated with Google Cloud Video Intelligence.
  • Content transcription for entity extraction, content classification (also in CCAI).

In this Codelab you have the code details to run speech-to-text:

Using the Speech-to-Text API with Python

from google.cloud import speech


def speech_to_text(
    config: speech.RecognitionConfig,
    audio: speech.RecognitionAudio,
) -> speech.RecognizeResponse:
    client = speech.SpeechClient()

    # Synchronous speech recognition request
    response = client.recognize(config=config, audio=audio)

    return response


def print_response(response: speech.RecognizeResponse):
    for result in response.results:
        print_result(result)


def print_result(result: speech.SpeechRecognitionResult):
    best_alternative = result.alternatives[0]
    print("-" * 80)
    print(f"language_code: {result.language_code}")
    print(f"transcript: {best_alternative.transcript}")
    print(f"confidence: {best_alternative.confidence:.0%}")


config = speech.RecognitionConfig(
    language_code="en",
)
audio = speech.RecognitionAudio(
    uri="gs://cloud-samples-data/speech/brooklyn_bridge.flac",
)

response = speech_to_text(config, audio)
print_response(response)
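Note that this sample uses the Speech-to-Text v1 API. Chirp itself is served through the v2 API on a regional endpoint with model="chirp". Here is a sketch following the v2 client library; the project ID and audio file are placeholders:

from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

# Chirp is served from regional endpoints such as us-central1
client = SpeechClient(
    client_options=ClientOptions(api_endpoint="us-central1-speech.googleapis.com")
)

with open("audio.wav", "rb") as f:
    content = f.read()

config = cloud_speech.RecognitionConfig(
    auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
    language_codes=["en-US"],
    model="chirp",
)
request = cloud_speech.RecognizeRequest(
    recognizer="projects/your-project/locations/us-central1/recognizers/_",
    config=config,
    content=content,
)
response = client.recognize(request=request)
for result in response.results:
    print(result.alternatives[0].transcript)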

You can also get the timestamps of each word, in case you are creating subtitles for a movie or video:
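A sketch, reusing the speech_to_text function above: enable enable_word_time_offsets in the RecognitionConfig and read the word-level offsets from each result:

config = speech.RecognitionConfig(
    language_code="en",
    enable_word_time_offsets=True,
)
response = speech_to_text(config, audio)
for result in response.results:
    for word in result.alternatives[0].words:
        # start_time/end_time are offsets relative to the start of the audio
        print(f"{word.word}: {word.start_time.total_seconds():.2f}s - {word.end_time.total_seconds():.2f}s")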

Speech-to-text is a discipline in itself: there are audio requirements and transcription hyperparameters, and you can alter the frequency and channels or filter/alter the audio with ffmpeg, among other possibilities to increase output quality. Take a look at Speech Studio, also in the Google Cloud console.
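For example, a typical ffmpeg preprocessing step is converting an arbitrary recording to 16 kHz mono 16-bit PCM, a format the Speech-to-Text API handles well:

ffmpeg -i input.mp3 -ac 1 -ar 16000 -c:a pcm_s16le output.wav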

IMAGEN

With Imagen, the Generative AI Vision engine, you can:

  • Generate novel images using only a text prompt (text-to-image generation).
  • Edit an entire uploaded or generated image with a text prompt.
  • Edit only parts of an uploaded or generated image using a mask area you define.
  • Upscale generated images.
  • Fine-tune a model with a specific subject (for example, a specific handbag or shoe) for image generation.
  • Get text descriptions of images with visual captioning.
  • Get answers to a question about an image with Visual Question Answering (VQA).

Access the Generative AI Studio Image Models documentation here.

As of July 2023, some features are not ready yet. Currently, in Generative AI Studio you upload an image and the interface generates a caption for it:

You can also ask questions about the image (Q&A):

Tune a Foundation Model in Vertex AI

All the models we used so far are pre-trained. You can also opt for tuning your own foundation model, as long as you provide a JSONL dataset with the data you are interested in, in the proper format.

Let’s give it a try. I’ll use the Hugging Face datasets library to get the tatsu-lab/alpaca dataset:

import json
from datasets import load_dataset

train_dataset = load_dataset("tatsu-lab/alpaca", split="train")

# Convert the Hugging Face dataset to a pandas DataFrame
df = train_dataset.to_pandas()

df["input_text"] = df.text.astype(str) + ': ' + df.instruction.astype(str)
df["output_text"] = df.output.astype(str)
df = df[["input_text", "output_text"]]
data_list = df.to_dict(orient='records')
with open('output_alpaca.jsonl', 'w') as file:
    for example in data_list:
        file.write(json.dumps(example) + '\n')

Then transfer this file to a Google Cloud Storage bucket.

gsutil cp output_alpaca.jsonl gs://alpaca-dataset/

Data has this format, one example per row:
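Each line is one JSON object. An illustrative, abbreviated row from the Alpaca-based file looks like this (the preamble comes from the dataset’s text field):

{"input_text": "Below is an instruction that describes a task. Write a response that appropriately completes the request. ...: Give three tips for staying healthy.", "output_text": "1. Eat a balanced diet. 2. Exercise regularly. 3. Get enough sleep."}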

You can do the fine-tuning using Python code or via the Google Cloud console:

import vertexai
from vertexai.preview.language_models import TextGenerationModel

training_data = "gs://alpaca-dataset/output_alpaca.jsonl"
vertexai.init(project="your-project", location="us-central1")
model = TextGenerationModel.from_pretrained("text-bison@001")
model.tune_model(
    training_data=training_data,
    train_steps=300,
    tuning_job_location="europe-west4",
    tuned_model_location="us-central1",
)
print(model._job.status)

Tuning jobs in us-central1 use eight A100 80GB GPUs. Tuning jobs in europe-west4 use 64 cores of the TPU v3 pod custom model training resource, which is only available upon request. Doing a quick calculation, eight A100 80GB GPUs cost 40.22 USD/hour, and the 64-core TPU v3, assuming it costs double the 32-core configuration, costs 64 USD/hour. Check your A100 80GB quota:

When you tune a foundation model, you run a pipeline composed of steps.

Wait for it to finish and the model will be stored in Vertex AI Model Registry.
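Once the pipeline finishes, you can also list your tuned models from the SDK with list_tuned_model_names():

from vertexai.preview.language_models import TextGenerationModel

model = TextGenerationModel.from_pretrained("text-bison@001")
print(model.list_tuned_model_names())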

Then, in a later step of the pipeline, the model is deployed to an endpoint for serving, so that you can generate predictions using model.predict. This whole process is laid out in my article Two Towers Model: A Custom Pipeline in Vertex AI Using Kubeflow. Here you don’t have to worry about writing Kubeflow components.

The model will be available for use and the rest is trivial:

from google.cloud.aiplatform.private_preview.language_models import TextGenerationModel

model = TextGenerationModel.get_tuned_model("fine-tuned-model-name-here")

# `parameters` is the same hyperparameter dict used in the previous examples
data = "Give three tips for staying healthy."  # sample instruction
response = model.predict("""Below is an instruction that describes a task.
{}.
output_text:
""".format(data), **parameters)

print(response.text)

For more information on tuning a foundation model, check these papers, kindly shared by colleagues at Google:

Parameter-Efficient Transfer Learning for NLP

Adaptation of Large Foundation Models

PRICING

For current rates, check the official Vertex AI pricing page.


I’m a Senior Data Scientist and Google Developer Expert in ML and GCP. I love studying NLP algos and Cloud Infra. CompTIA Security +. PhD. www.rubenszimbres.phd