Generative AI — PaLM-2 model deployment with Cloud Run
This post shows a Gradio frontend, deployed on Cloud Run, that exposes text-bison@001, one of the foundation models based on PaLM-2 available in Vertex AI, along with its main parameters (temperature, output tokens, top-P, and top-K).
The text-bison@001 model is fine-tuned for language tasks such as classification, summarization, and entity extraction.
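To make these task types concrete, here are a few illustrative prompts (these example strings are written for illustration and are not taken from the original post):

```python
# Illustrative prompts for the task types text-bison@001 is tuned for.
# The prompt wording is an example, not from the original post.
example_prompts = {
    "classification": ("Classify the sentiment of this review as positive or "
                       "negative: 'The battery life is terrible.'"),
    "summarization": ("Summarize the following support ticket in one sentence: "
                      "'My order arrived late and the box was damaged...'"),
    "entity extraction": ("List the person and company names mentioned in: "
                          "'Sundar Pichai presented new Google Cloud features.'"),
}

for task, prompt in example_prompts.items():
    print(f"{task}: {prompt}")
```

Each of these strings can be passed as the prompt argument in the predict call shown in the next section.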
The frontend is deployed as a Gradio app on Cloud Run. A screenshot of the app follows:
PaLM-2 in Vertex AI
text-bison@001 is one of the foundation models available in Vertex AI, based on PaLM-2 and fine-tuned for certain language tasks. Details about PaLM-2 can be found in the technical report [1]. Using the Vertex AI SDK, you can easily call the publisher endpoint for this model:
import vertexai
from vertexai.language_models import TextGenerationModel

vertexai.init(project=PROJECT_ID, location=LOCATION)
model = TextGenerationModel.from_pretrained("text-bison@001")
response = model.predict(
    prompt,
    max_output_tokens=max_output_tokens,  # default 128
    temperature=temperature,              # default 0
    top_p=top_p,                          # default 1
    top_k=top_k)                          # default 40
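A frontend exposing these parameters may want to validate them before calling predict. The helper below is a hypothetical sketch, not part of the Vertex AI SDK; the bounds follow the defaults noted above and the documented ranges for text-bison@001, and should be checked against the current documentation:

```python
def validate_params(max_output_tokens=128, temperature=0.0, top_p=1.0, top_k=40):
    """Validate text-bison@001 sampling parameters (hypothetical helper).

    Raises ValueError when a value falls outside its assumed documented range.
    """
    if not 1 <= max_output_tokens <= 1024:
        raise ValueError("max_output_tokens must be in [1, 1024]")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be in [0.0, 1.0]")
    if not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be in [0.0, 1.0]")
    if not 1 <= top_k <= 40:
        raise ValueError("top_k must be in [1, 40]")
    return {"max_output_tokens": max_output_tokens, "temperature": temperature,
            "top_p": top_p, "top_k": top_k}
```

The returned dict can then be unpacked into the SDK call, e.g. model.predict(prompt, **validate_params(temperature=0.2)).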
User-managed service account for Cloud Run
Since the application is deployed on Cloud Run, by default it uses the permissions of the Compute Engine default service account to call the model. It is recommended to use a dedicated service account with the minimum permissions instead. To do that, create a service account, allow your user to impersonate it, and grant it two extra roles: roles/aiplatform.user (to call predictions) and roles/logging.logWriter (to write logs):
# Create service account
gcloud iam service-accounts create cloud-run-llm \
--description="Service account to call LLM models from Cloud Run" \
--display-name="cloud-run-llm"
# add aiplatform.user role
gcloud projects add-iam-policy-binding argolis-rafaelsanchez-ml-dev \
--member="serviceAccount:cloud-run-llm@argolis-rafaelsanchez-ml-dev.iam.gserviceaccount.com" \
--role="roles/aiplatform.user"
# add logging.logWriter role
gcloud projects add-iam-policy-binding argolis-rafaelsanchez-ml-dev \
--member="serviceAccount:cloud-run-llm@argolis-rafaelsanchez-ml-dev.iam.gserviceaccount.com" \
--role="roles/logging.logWriter"
# add permission to impersonate the service account (iam.serviceAccounts.actAs), since this is a user-managed service account
gcloud iam service-accounts add-iam-policy-binding \
cloud-run-llm@argolis-rafaelsanchez-ml-dev.iam.gserviceaccount.com \
--member="user:<REPLACE_WITH_YOUR_USER_ACCOUNT>" \
--role="roles/iam.serviceAccountUser"
Build and deploy in Cloud Run
To build and deploy the Gradio app, build the container image, push it to Artifact Registry, and then deploy it to Cloud Run. Note the --allow-unauthenticated flag (no authentication is required to access the app) and the --service-account flag, which points to the service account configured earlier:
gcloud auth configure-docker europe-west4-docker.pkg.dev
gcloud builds submit --tag europe-west4-docker.pkg.dev/argolis-rafaelsanchez-ml-dev/ml-pipelines-repo/genai-text-demo
gcloud run deploy genai-text-demo \
  --port 7860 \
  --image europe-west4-docker.pkg.dev/argolis-rafaelsanchez-ml-dev/ml-pipelines-repo/genai-text-demo \
  --service-account=cloud-run-llm@argolis-rafaelsanchez-ml-dev.iam.gserviceaccount.com \
  --allow-unauthenticated \
  --region=europe-west4 \
  --platform=managed \
  --project=argolis-rafaelsanchez-ml-dev
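The gcloud builds submit step expects a Dockerfile at the repo root. As a sketch of what such a Dockerfile might look like (the app.py entry point and requirements.txt file names are assumptions; see the repo for the actual file):

```dockerfile
# Hypothetical Dockerfile for the Gradio app (file names are assumptions)
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Gradio listens on 7860, matching the --port flag in gcloud run deploy
EXPOSE 7860
CMD ["python", "app.py"]
```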
Conclusions
This post showed how to deploy a simple Gradio app on Cloud Run that exposes a PaLM-2 model for text generation. Use cases for text-bison@001 include dialog summarization, text generation, scoring for marketing, and many others.
You can find the repo with all the code in this link.
References
[1] PaLM-2 technical report
[2] YouTube video: Generative AI on Google Cloud
[3] YouTube video: Build, tune, and deploy foundation models with Vertex AI
[4] YouTube video: Build, tune, and deploy foundation models with Generative AI Support in Vertex AI
[5] Overview of Generative AI support on Vertex AI