Build a Flutter application in Python to chat in ANY language with Google Cloud LLMs available on Vertex AI

Olejniczak Lukasz
Google Cloud - Community
9 min read · Aug 25, 2023

When Google announced their new family of Large Language Models (LLMs) named PaLM 2, they emphasized significant improvements in multilingual capabilities:

“PaLM 2 is more heavily trained on multilingual text, spanning more than 100 languages. This has significantly improved its ability to understand, generate and translate nuanced text — including idioms, poems and riddles — across a wide variety of languages, a hard problem to solve. PaLM 2 also passes advanced language proficiency exams at the “mastery” level” [1]

[1] https://blog.google/technology/ai/google-palm-2-ai-large-language-model/

Table 21 in the PaLM 2 Technical Report lists the top 50 languages out of hundreds, with their associated percentages in the multilingual web documents subcorpus. One of them is my native language: Polish.

It is hard nowadays to open LinkedIn, Twitter, YouTube, Medium etc. without encountering something about Generative AI. However, all these demos, examples and inspirational prototypes use LLMs with English. For many companies outside the USA and the UK, support for their native languages is as important as data privacy guarantees.

In this article, we want to demo the PaLM 2 models available on Google Cloud with non-English languages like Polish. I focus on Polish because it is my native language; however, everything explained in this article applies to other non-English languages as well. So feel free to follow the steps below in your native language and let me know in the comments how it worked in Spanish, Ukrainian, Hindi, etc.

We will build upon our previous article, entitled Flutter for data engineering and data science! Flet.dev — running Flutter apps built in Python on Google Cloud with Cloud Run, where we show how, with just a few lines of code, one can build beautiful web, mobile and desktop apps based on Flutter using Python — the language of data engineers and data scientists. All this is possible thanks to a project called Flet.dev, which has the potential to bridge the software and data engineering/science worlds and become a standard for building both cross-platform, interactive multi-user applications and data products! On top of this, we show how to quickly host such an application on Cloud Run — a fully managed compute environment for deploying and scaling serverless HTTP containers.

We will assume familiarity with all the details necessary to build a Flet application and deploy it on Cloud Run, as explained in the referenced article. Here we go one step further and show how to build a Flet application to chat with Google's Large Language Models that are designed for multi-turn conversations and available on Vertex AI in Google Cloud. Specifically, we will chat with chat-bison, and we want to chat in non-English languages. The high-level architecture with all core components and communication flows is shown in the following diagram:

Vertex AI comes with Generative AI Studio where we can quickly prototype prompts and play with the language, vision and speech models. Vertex AI also enables you to fine-tune Google’s LLMs to make them work better for your use case.

For this article, however, we will assume that the prototyping phase is behind us and focus on building and running our own chat application. And remember — we want to create an app that can chat in non-English languages, so the expected end result is as follows:

Thanks to Flutter and Flet.dev, all that is needed to build such a user interface and communicate with the chat-bison model available on Vertex AI is this short Python script.

from vertexai.preview.language_models import ChatModel, InputOutputTextPair
import flet as ft
import os


class Message():
    def __init__(self, user_type: str, text: str):
        self.user_type = user_type
        self.text = text


class ChatMessage(ft.Row):
    def __init__(self, message: Message):
        super().__init__()
        self.vertical_alignment = "start"
        self.controls = [
            ft.CircleAvatar(
                content=ft.Text(self.get_initials(message.user_type)),
                color=ft.colors.WHITE,
                bgcolor=self.get_avatar_color(message.user_type),
            ),
            ft.Column(
                [
                    ft.Text(message.user_type, weight="bold"),
                    ft.Text(message.text, selectable=True, no_wrap=False),
                ],
                tight=True,
                spacing=5,
                expand=True,
            ),
        ]

    def get_initials(self, user_type: str):
        return user_type[:1].capitalize()

    def get_avatar_color(self, user_type: str):
        if user_type == "user":
            return ft.colors.BLUE
        else:
            return ft.colors.LIME


def main(page: ft.Page):
    page.title = "GCP Generative AI"

    chat_model = ChatModel.from_pretrained("chat-bison@001")
    # TODO developer - override these parameters as needed:
    parameters = {
        "temperature": 0.2,  # Temperature controls the degree of randomness in token selection.
        "max_output_tokens": 256,  # Token limit determines the maximum amount of text output.
        "top_p": 0.95,  # Tokens are selected from most probable to least until the sum of their probabilities equals the top_p value.
        "top_k": 40,  # A top_k of 1 means the selected token is the most probable among all tokens.
    }

    chat_session = chat_model.start_chat(
        # Polish: "My name is Lukasz. I try to inspire."
        context="Nazywam sie Lukasz. Staram sie inspirować.",
        examples=[],
    )

    def send_message_click(e):
        # Display the user's message right away, before calling the model.
        user_message = Message("user", new_message.value)
        user_ui_message = ChatMessage(user_message)
        chat_ui.controls.append(user_ui_message)
        page.update()

        response = chat_session.send_message(
            new_message.value, **parameters
        )

        bot_message = Message("bot", response.text)
        bot_ui_message = ChatMessage(bot_message)
        chat_ui.controls.append(bot_ui_message)

        new_message.value = ""
        new_message.focus()
        page.update()

    chat_ui = ft.ListView(
        expand=True,
        spacing=10,
        auto_scroll=True,
    )

    # A new message entry form
    new_message = ft.TextField(
        hint_text="Wyslij...",  # Polish: "Send..."
        autofocus=True,
        shift_enter=True,
        min_lines=1,
        max_lines=5,
        filled=True,
        expand=True,
        on_submit=send_message_click,
    )

    page.add(
        ft.Container(
            content=chat_ui,
            border=ft.border.all(1, ft.colors.OUTLINE),
            border_radius=5,
            padding=10,
            expand=True,
        ),
        ft.Row(
            [
                new_message,
                ft.IconButton(
                    icon=ft.icons.SEND_ROUNDED,
                    tooltip="Send message",
                    on_click=send_message_click,
                ),
            ],
        ),
    )


ft.app(target=main, port=int(os.environ.get("PORT", 8090)), view=ft.AppView.WEB_BROWSER)

Let us explain it. To communicate with Vertex AI, we use the Vertex AI Python SDK.

from vertexai.preview.language_models import ChatModel

We need to instantiate an object representing the chat-bison model:

chat_model = ChatModel.from_pretrained("chat-bison@001")
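
One note: the script above relies on the runtime environment to supply the Google Cloud project and region. When running outside GCP (for example, locally during development), you may need to initialize the SDK explicitly. A minimal sketch, where the project ID and region are placeholders to replace with your own:

import vertexai

# Placeholders: substitute your own project ID and region.
vertexai.init(project="your-project-id", location="us-central1")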

Then we need to initialize the chat. This is the moment where we can specify the so-called context and examples for chat-bison — both can greatly help the model generate better responses.

chat_session = chat_model.start_chat(
    context="Nazywam sie Lukasz. Staram sie inspirować.",
    examples=[],
)

For our application we will just provide a very basic context with no examples, but careful prompt engineering is a critical element of every LLM-based application, and many advanced methods have been proposed to get as much out of language models as possible. For example, Google researchers showed that properly prompted language models, via chain-of-thought, demonstrate emergent capabilities that carry out self-conditioned reasoning traces to derive answers from questions, excelling at various arithmetic, commonsense, and symbolic reasoning tasks. In "ReAct: Synergizing Reasoning and Acting in Language Models", Google researchers proposed a general paradigm that combines reasoning and acting advances to enable language models to solve various language reasoning and decision-making tasks. They demonstrated that the Reason+Act (ReAct) paradigm systematically outperforms reasoning-only and acting-only paradigms, when prompting bigger language models and fine-tuning smaller language models. Such a tight integration of reasoning and acting is also more aligned with human task-solving trajectories, which improves interpretability, diagnosability, and controllability.
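
For reference, here is a sketch of how few-shot examples could be passed to start_chat using the InputOutputTextPair class that our script already imports. The Polish question/answer pair below is purely illustrative:

from vertexai.preview.language_models import InputOutputTextPair

chat_session = chat_model.start_chat(
    context="Nazywam sie Lukasz. Staram sie inspirować.",
    examples=[
        # Illustrative pair. Polish: "What is Vertex AI?" ->
        # "Vertex AI is the machine learning platform on Google Cloud."
        InputOutputTextPair(
            input_text="Czym jest Vertex AI?",
            output_text="Vertex AI to platforma uczenia maszynowego w Google Cloud.",
        ),
    ],
)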

We are now ready to have a conversation with chat-bison. Every message from the user needs to be sent to chat-bison using the send_message method:

response = chat_session.send_message(
    new_message.value, **parameters
)

It takes the user's question or comment as text, but also a dictionary of parameters that control the model's responses:

parameters = {
    "temperature": 0.2,  # Temperature controls the degree of randomness in token selection.
    "max_output_tokens": 256,  # Token limit determines the maximum amount of text output.
    "top_p": 0.95,  # Tokens are selected from most probable to least until the sum of their probabilities equals the top_p value.
    "top_k": 40,  # A top_k of 1 means the selected token is the most probable among all tokens.
}
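
To get a feel for how these parameters influence responses, one option is to send the same prompt at two different temperatures and compare the replies. The sketch below is illustrative only; the Polish prompt means "Tell me a joke about the cloud":

for temperature in (0.0, 0.9):
    session = chat_model.start_chat(context="Nazywam sie Lukasz. Staram sie inspirować.")
    response = session.send_message(
        "Opowiedz dowcip o chmurze.",  # Polish: "Tell me a joke about the cloud."
        temperature=temperature,
        max_output_tokens=256,
    )
    print(f"temperature={temperature}: {response.text}")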

Please note that the same parameters are available in Generative AI Studio in the Google Cloud Console, where you can play and experiment with the models:

From the application UI perspective, chat messages are added to a ListView:

chat_ui = ft.ListView(
    expand=True,
    spacing=10,
    auto_scroll=True,
)

...
user_ui_message = ChatMessage(user_message)
chat_ui.controls.append(user_ui_message)
...
bot_ui_message = ChatMessage(bot_message)
chat_ui.controls.append(bot_ui_message)

User inputs are collected using the new_message TextField. They are sent for processing when the user either presses Enter in this field (on_submit):

new_message = ft.TextField(
    hint_text="Wyslij...",
    autofocus=True,
    shift_enter=True,
    min_lines=1,
    max_lines=5,
    filled=True,
    expand=True,
    on_submit=send_message_click,
)

or when the user clicks the Send button (on_click):

ft.IconButton(
    icon=ft.icons.SEND_ROUNDED,
    tooltip="Send message",
    on_click=send_message_click,
),

The on_click and on_submit actions trigger events which call the send_message_click method. send_message_click then adds a new ChatMessage control to the list of ListView controls and clears the new_message TextField so that the user can ask a new question.

bot_message = Message("bot", response.text)
bot_ui_message = ChatMessage(bot_message)
chat_ui.controls.append(bot_ui_message)

new_message.value = ""

In send_message_click we have two calls to the update() function. The first one displays the user's question in the list of conversation messages without waiting for the response from chat-bison. The second update() is used to display the response from chat-bison.

user_message = Message("user", new_message.value)
user_ui_message = ChatMessage(user_message)
chat_ui.controls.append(user_ui_message)
page.update()

...

bot_message = Message("bot", response.text)
bot_ui_message = ChatMessage(bot_message)
chat_ui.controls.append(bot_ui_message)

...

page.update()

ChatMessage is a Row containing a CircleAvatar with the initial of the user type (one of: user, bot) and a Column that contains two Text controls: the user type and the message text. This design is adapted from one of the Flet.dev tutorials: creating a realtime chat app in Python.

Source: https://flet.dev/docs/tutorials/python-realtime-chat/

class ChatMessage(ft.Row):
    def __init__(self, message: Message):
        super().__init__()
        self.vertical_alignment = "start"
        self.controls = [
            ft.CircleAvatar(
                content=ft.Text(self.get_initials(message.user_type)),
                color=ft.colors.WHITE,
                bgcolor=self.get_avatar_color(message.user_type),
            ),
            ft.Column(
                [
                    ft.Text(message.user_type, weight="bold"),
                    ft.Text(message.text, selectable=True, no_wrap=False),
                ],
                tight=True,
                spacing=5,
                expand=True,
            ),
        ]

We are ready to deploy our chat application to Cloud Run! Please start with our previous article Flutter for data engineering and data science! Flet.dev — running Flutter apps built in Python on Google Cloud with Cloud Run, where we explain this very simple deployment in detail.

You will learn that you basically need two additional files:

  • Dockerfile — use exactly the same definition (a minimal sketch is also shown after this list)
  • requirements.txt — a file which lists the dependencies needed for our application to function. The dependency on the Flet project remains, but we must also declare a dependency on the Vertex AI SDK:
flet>=0.2.4
google-cloud-aiplatform>=1.25
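
If you do not have the previous article at hand, a minimal Dockerfile along the same lines could look as follows. The Python base image and the main.py entrypoint name are assumptions; adjust them to match your project:

FROM python:3.10-slim

WORKDIR /app
COPY . .

RUN pip install --no-cache-dir -r requirements.txt

# main.py is assumed to contain the script shown above.
CMD ["python", "main.py"]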

There is one more thing we need to address here. Some of you probably noticed that in order to communicate with Vertex AI and send requests to chat-bison, we first need to be authenticated, and then GCP validates whether we are authorized to execute the requested actions. But the word 'WE' is not precise here. Requests to chat-bison will be sent by a specific GCP IAM identity. Therefore, for our application, we will create a service account.
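
Creating a service account can be done in the Cloud Console or with gcloud. A minimal sketch, where the account name flet-chat-app is a placeholder:

gcloud iam service-accounts create flet-chat-app \
    --display-name="Flet chat application"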

When creating a new Cloud Run service, you have the option to specify which service account will represent this service when communicating with GCP:

The last thing is to make sure our service account is authorized to use LLMs on Vertex AI. Following the documentation: to give generative AI access to service accounts, you can grant the service account the Vertex AI Service Agent role (roles/aiplatform.serviceAgent):
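
With gcloud, granting this role could look as follows; the project ID and service account email are placeholders:

gcloud projects add-iam-policy-binding your-project-id \
    --member="serviceAccount:flet-chat-app@your-project-id.iam.gserviceaccount.com" \
    --role="roles/aiplatform.serviceAgent"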

A nice thing is that you do not need to do anything in your code to authenticate this service account before sending requests to chat-bison. GCP will handle that behind the scenes.

Another nice thing is that when deploying to Cloud Run, specifying the service account can be done with just one extra attribute in the gcloud command:

gcloud run deploy \
  --allow-unauthenticated \
  --service-account=<your_service_account> \
  ...
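
For completeness, a full invocation might look like the sketch below; the service name, region and --source flag are assumptions based on a standard deployment from source:

gcloud run deploy flet-chat-app \
    --source . \
    --region us-central1 \
    --allow-unauthenticated \
    --service-account=<your_service_account>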

When the deployment is finished, you should be able to chat with chat-bison in any language you like. Have fun!

I hope this article shows how easily and quickly you can build and run beautiful Flutter applications in Python with Flet.dev and Cloud Run, and integrate them with Google's generative AI models available on Vertex AI.

In the next article, we will show how to extend this demo to build applications that can chat about private company documents using chat-bison and GCP Enterprise Search.

This article is authored by Lukasz Olejniczak — Customer Engineer at Google Cloud. The views expressed are those of the author and don't necessarily reflect those of Google.

Please clap for this article if you enjoyed reading it. For more about Google Cloud, data science, data engineering, and AI/ML, follow me on LinkedIn.
