Build a serverless Telegram bot on GCP with Cloud Run
--
Or “In praise of streaming architectures and Cloud Pub/Sub”
I’ve always liked the Telegram platform and wanted to do something with it, but I probably lacked the right motivation. The occasion came when I got GCP-certified and wanted to try out GCP services with a concrete DIY project. Long story short, I created a Telegram bot you can send pictures of your friends to, and it sends back chat stickers of them!
I picture Telegram as a user-friendly channel for exposing rich functionality, much like an API. One could also extend this with AI features, and the chatbot does in fact use a matting model to extract people from pictures.
The goal was to build the bot, but also to try as many GCP products and features as possible. Maybe this resulted in some questionable architectural choices, but I must admit I had a blast building it!
Not to mention, I will try to keep the bot running, so if you want to try it, here it is: https://t.me/pic_to_stickers_bot.
The architecture
I designed the Telegram bot as a streaming application made of individual specialized components.
Did I really need to do this? Of course not. The scope of the application is narrow, and I could have built a single component carrying the whole load. But this allowed me to take a look at Cloud Pub/Sub, Google’s serverless messaging service. Cloud Pub/Sub connects the individual components, all Python applications packaged into containers and running on Cloud Run.
For readers not familiar with GCP products, Cloud Run is a serverless computing platform that allows developers to deploy and run stateless containers. Cloud Run applications are triggered by incoming HTTP requests: when a request is received, the platform automatically starts up a new instance of the container to handle it, and if the rate increases, Cloud Run automatically spins up additional instances to handle the load.
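To make the “stateless container answering HTTP requests” idea concrete, here is a minimal sketch of what a Cloud Run service boils down to. It uses only the standard library; the actual services in the repo presumably use a proper web framework, so treat this as an illustration of the runtime contract, not the project’s code:

```python
import os
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """Minimal WSGI app: Cloud Run only requires a container that
    answers HTTP on the port given in the PORT environment variable."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def serve() -> None:
    # Cloud Run injects PORT at runtime; 8080 is the conventional default.
    port = int(os.environ.get("PORT", "8080"))
    make_server("", port, app).serve_forever()
```

When no requests arrive, Cloud Run scales the service down to zero instances, which is what makes the pay-per-use model work.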
Another central component is Cloud Storage, Google’s object storage service, which is responsible for storing the images as they are processed into stickers.
To recap, we will have:
- one component that receives user messages, fetches the input pictures, and feeds the data pipelines
- one component that runs a matting model on the pictures to extract the people
- one component that post-processes the images, converts them to a format supported by Telegram, and sends them back to the users.
An aspect worth mentioning is Cloud Pub/Sub’s push delivery mechanism. This is, in fact, crucial when designing serverless applications, as there is no need to poll for updates. Having the messaging platform manage the delivery of updates also simplifies application logic: you only have to process incoming messages and return acks. Pub/Sub handles retries (with exponential backoff, if configured) and funnels messages that could not be processed correctly into a dedicated dead-letter topic.
This feature, coupled with Telegram’s webhook mechanism to push updates to the bot, enables a truly serverless application, which wastes no resources if no incoming messages are received.
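To make the ack-based flow concrete, here is a sketch of how a subscriber service unwraps a push delivery. Pub/Sub push subscriptions POST a JSON envelope whose `message.data` field is the base64-encoded payload; the function name is mine, not the repo’s:

```python
import base64
import json


def decode_push_envelope(body: dict) -> dict:
    """Extract and decode the JSON payload from a Pub/Sub push envelope.

    The envelope looks like:
      {"message": {"data": "<base64>", "messageId": "..."},
       "subscription": "projects/.../subscriptions/..."}
    """
    raw = base64.b64decode(body["message"]["data"]).decode("utf-8")
    return json.loads(raw)

# In the HTTP handler, returning any 2xx status acknowledges the message;
# a non-2xx response makes Pub/Sub redeliver it (with backoff, if configured).
```

This is all the “messaging plumbing” a push subscriber needs: decode, process, return a status code.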
Here is a fairly complete architecture diagram. Let me expand on the individual steps that take place.
- Users interact with Telegram servers to communicate with the bot.
- Telegram’s webhook mechanism notifies the bot of each new message at a specific endpoint, exposed by the Bot service on Cloud Run.
- As the container spins up, it interacts with the user, asking for a picture. When a picture is sent, the service writes it to a Cloud Storage bucket.
- The Bot service publishes a new event to the incoming images Pub/Sub topic.
- Pub/Sub pushes new events to the endpoint exposed by the Matting service, which has subscribed to the topic.
- The Matting service mattes the image and writes the result to the Cloud Storage bucket.
- If the image is successfully written, it also publishes a new message to the matted images topic.
- Pub/Sub pushes new events to the post-processing service, which has subscribed to the topic.
- The matted image is read from the bucket and post-processed to Telegram’s sticker standards.
- The output sticker is written to the images bucket.
- The post-processing service sends the sticker back to Telegram’s servers…
- …and the sticker finally makes it to the user’s phone!
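As a small illustration of the post-processing step: Telegram requires a static sticker to have one side of exactly 512 px (the other 512 px or less). The resize calculation could be sketched like this; the helper is my own, not taken from the repo:

```python
def sticker_size(width: int, height: int) -> tuple[int, int]:
    """Scale image dimensions so the longer side becomes exactly 512 px,
    preserving the aspect ratio, as Telegram requires for static stickers."""
    scale = 512 / max(width, height)
    return round(width * scale), round(height * scale)
```

The actual resizing and conversion to a Telegram-supported format would then be done with an imaging library, using these target dimensions.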
References
Here’s the GitHub repo of the project: https://github.com/riccamini/p2sb
In the initial phase of the project, I worked in a local development environment using Docker and the Cloud Pub/Sub emulator. In the repo you can find helper scripts to start the Cloud Pub/Sub emulator and create the topics with their subscriptions. You can then configure the individual components to use local storage instead of Cloud Storage for intermediate results, giving you the complete deployment on your laptop to experiment with. You will find more details in the project README file.
I would also recommend you take a look at FastDeploy, the AI model deployment toolkit I used to deploy the matting model at the heart of the bot. FastDeploy is part of PaddlePaddle, an open-source deep learning platform where you can find some quality implementations of common AI tasks.
Wrap up
There are probably too many aspects of building this app that I would have liked to talk about. I will try to hold back and expand on just some of them.
I love streaming architectures
There are plenty of ways I thought of extending this simple bot, and none of them would have any impact on the core functions. That, for me, is the beauty of streaming architectures: you plug in a new component and start working with the data. For example, we could sink events into a database for analytics, trigger actions on another component (like warming it up), and so on. You just need to change the topology of your event flow.
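A toy in-memory stand-in for Pub/Sub can illustrate the topology point: adding an analytics consumer touches neither the producer nor the existing consumer, only the subscription list. This is purely illustrative, not how the real client library works:

```python
from collections import defaultdict
from typing import Callable


class Bus:
    """Toy in-memory message bus, standing in for Pub/Sub topics."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self.subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self.subscribers[topic]:
            handler(event)


bus = Bus()
seen = []
bus.subscribe("matted-images", lambda e: seen.append(("postprocess", e)))
# Plugging in analytics later changes only the topology, not the producer:
bus.subscribe("matted-images", lambda e: seen.append(("analytics", e)))
bus.publish("matted-images", {"object": "img/42.png"})
```

With real Pub/Sub, the equivalent change is just creating a new subscription on the existing topic.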
Design for the cloud = keep an eye on the cost
Cloud platforms are a great sandbox for our DIY projects. We can create anything we can imagine with an unprecedented level of ease. But, although it should not be the main focus of the design phase, you should always keep an eye on the cost implications of the architectural choices you make. To put this into context, let’s imagine a trivial example: suppose we had not chosen Pub/Sub’s push delivery mechanism and instead pulled new events from the topics. We would have a completely different cost profile, with containers always running to ensure updates are processed, even when nobody is using the bot. Locality is another huge factor in the cost profile. Costs may vary between regions and zones, and having services that are not colocated would make costs soar (in most cases due to traffic charges). Not to mention, the overall app performance would deteriorate.
Managing configuration with environment variables may be helpful
Managing configuration through environment variables can help you transition from a development version on your laptop to the production one in the cloud. Nothing too fancy here, but I think getting these aspects right can save you a lot of time. You may want to use a library like python-dotenv to help with this: it loads environment variables from a .env file if one is found. You can then run and debug the application on your laptop simply by providing a .env file (not checked into source control), while still reading configuration from environment variables. When it comes to running the app in a container, just set the environment variables in the run command. On the cloud, you will find options to configure them in the GUI or in deployment commands. Still not buying it? It’s one of the pillars of 12-factor apps.
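A minimal sketch of the pattern (the variable names are made up for illustration): python-dotenv is loaded only if installed, so the same module works on a laptop with a .env file and in a container with injected environment variables:

```python
import os

# Optional: load a local .env file during development.
# In production the variables come from the container environment instead.
try:
    from dotenv import load_dotenv  # pip install python-dotenv
    load_dotenv()
except ImportError:
    pass

# Read settings from the environment, with local-development defaults.
BUCKET_NAME = os.environ.get("BUCKET_NAME", "local-images")
PUBSUB_TOPIC = os.environ.get("PUBSUB_TOPIC", "incoming-images")
USE_LOCAL_STORAGE = os.environ.get("USE_LOCAL_STORAGE", "true").lower() == "true"
```

The rest of the code imports these settings and never needs to know which environment it is running in.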
Improvements — IaC
I would have liked to try Terraform or an alternative IaC tool on GCP. At the beginning of this project, the idea was to also provide the infrastructure, so one could simply check out the GitHub repo and try the application on their own. At the same time, I also wanted to try the GCP console, and I did not know in advance how the architecture would take shape, as I was still evaluating options. It’s definitely something I will try.
Improvements — Bot features
Regarding the bot features, it would be good to integrate bot commands, maybe expanding the functionality with captions, background swapping, and so on. Another feature worth trying is Pub/Sub schemas. At the moment, messages on the topics are exchanged as plain JSON. I wrapped this in a Python dataclass with serde utilities, but having a central authority manage and enforce schemas is key to facilitating the consumption of data streams among different applications.
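For context, the dataclass-plus-serde approach mentioned above looks roughly like this; the event shape here is made up for illustration and is not the repo’s actual class:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ImageEvent:
    """Payload exchanged on the image topics (illustrative shape)."""
    chat_id: int
    object_path: str

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "ImageEvent":
        return cls(**json.loads(raw))
```

This keeps producer and consumer in sync only by convention; a Pub/Sub schema attached to the topic would enforce the contract centrally and reject malformed messages at publish time.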
If you made it to the end of the article, or if you just looked at the pictures…thank you for reading!
Please feel free to write any suggestions/improvements!