Multi-modal Gemini Pro Vision Slack chat bot

Published in

Google Cloud - Community

2 min readMar 27, 2024

In my last article, we built a PaLM-based chat bot. But since then, the state of art has moved onwards and it’s time to make some improvements to our chat bot and move it over to Gemini.

We also want to some super powers to the bot, namely multi-modal dialogue support. As always, a fully deployable example is provided.

Deploying the bot

To deploy the bot, check out the code from GoogleCloudPlatform/pubsub2inbox and follow the README instructions.

This time deploying the bot is even easier, as there is now a Slack manifest for getting the permissions correct the first time.

Once deployed and added to your Slack workspace, you can discuss with the app directly or add it to a channel and tag it. Some examples of what it can do:

Normal text conversation

Explaining an image

Well, it isn’t fully automatic, but close.

Explaining a PDF

You can also provide a prompt to be added to a discussion (prompt parameter in Slack processors processMessages part).

Update: Now the bot can also integrate with Vertex Search (or any other REST service) directly using Gemini function calling! See example here: Integrating Vertex AI Search

Information sourced from Vertex AI Search via Gemini Pro function call.

That’s it — you can customize the bot as you wish. Interested to hear about your use cases! You can reach out to me on X for example: twitter.com/rosmo

Multi-modal Gemini Pro Vision Slack chat bot

Deploying the bot

Normal text conversation

Explaining an image

Explaining a PDF

Written by Taneli Leppä